Wednesday, April 29, 2009

Apache Mahout

The end of a semester is always a rough time, since you have continuous deadlines :(

With my interest in data mining and machine learning, I went looking for open source projects that focus on ML. The Weka workbench seems to be quite popular, but I don't see much active work happening around it, since most of the ML algos have already been implemented in it. Through GSoC 2009, I got to know about Apache Mahout. The goal of this project is to implement scalable ML algos, so they have chosen to build them on top of Hadoop.

I am new to Hadoop and was just reading a tutorial on MapReduce, since Hadoop is an open source implementation of the MapReduce concept. It is quite interesting to see how parallelization can be achieved. One catch that I see is that we need to be clever about structuring the problem so that the data can be processed in parallel. For example, computing the Fibonacci series step by step can't easily be made parallel, since each value depends on the previous two. There is also a video series on MapReduce.
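To make the parallel vs. sequential point concrete, here is a minimal sketch in plain Python. It does not use the Hadoop or Mahout APIs at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`, `fibonacci`) are my own, made up for illustration. It only mimics the shape of a MapReduce word count on one machine and contrasts it with the step-by-step Fibonacci loop.

```python
from collections import defaultdict


def map_phase(document):
    """Emit (word, 1) pairs for one document.

    This needs nothing from any other document, so in a real cluster
    each document could be handled by a different machine.
    """
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine the values for one key; each key is independent of the others."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a single machine."""
    mapped = []
    for document in documents:  # iterations do not depend on each other
        mapped.extend(map_phase(document))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


def fibonacci(n):
    """Return the first n Fibonacci numbers.

    Each step needs the two previous values, so the loop iterations
    cannot simply be handed out to separate workers.
    """
    series = []
    a, b = 0, 1
    for _ in range(n):
        series.append(a)
        a, b = b, a + b
    return series


if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat"]))
    print(fibonacci(7))
```

In the word count, every call to `map_phase` and every call to `reduce_phase` can run without waiting for the others, which is what lets a framework like Hadoop spread the work out. In the `fibonacci` loop, iteration k cannot start until iteration k-1 has finished.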

If you are interested, please join the Mahout gang.
