Monday, November 03, 2014

Hortonworks data platform

I had earlier taken an introductory class on Udacity on Hadoop and MapReduce. It was conducted by Cloudera and was really interesting. Recently I landed on the Hortonworks website and downloaded their Hortonworks Data Platform (HDP) VirtualBox VM. I booted this VM on my Mac and got HDP up and running. Their tutorials were really good and I loved the UI that they have provided.
HDP's interface gives you access to the entire Hadoop ecosystem. You can upload files into HDFS, create a table, run Hive queries and so on, right from within the web interface provided by HDP. It was fun to play around with HDP.

Monday, August 09, 2010

Are mobile apps really apps ??

I'd say more than 50% of the apps that I use are not "self-sustaining" apps. They won't work without the internet. They are dumb apps that are driven by web services / web data. I am really wondering if mobile apps are really apps or just an extension of web data onto our mobiles. Many pundits claim that this is the age of mobility and that mobile computing / technologies will dominate over traditional computing systems. In a way we all see that. We see a lot more smart mobile devices being used compared to 4 - 5 years ago. Sure, they are a revolution, but I think the internet / web is a more powerful phenomenon than the mobile revolution. Instead of using browsers on a desktop / laptop to access data from the cloud, we (or at least I) are using so-called mobile apps to access the same cloud. I use Facebook / Twitter more from my iPhone than from my laptop. In fact, mobile computing is boosting cloud computing.
Sometimes I am at the crossroads of technology wondering which bus (mobile / cloud / ..) to take. As a lover of CS and a passionate programmer, I'd always want to get my hands & feet wet with every major technology. Of course I want to write mobile apps and of course I want to get much more involved with the cloud, but IMHO the cloud will last longer than mobile. The cloud will be the base and will power much more than the mobile world. So why is the cloud growing in power? Simple: because it has the data. I am reminded of Deep Throat advising Bob Woodward, "Follow the money"... For us it's "Follow the data".

Monday, July 12, 2010

Google's DIY tool for Android

Google has released a Do-It-Yourself tool for Android with which anyone can develop apps for Android. They claim you don't really need programming skills. The video showcasing this tool portrayed a very simple app, and I am not sure if this tool can help build complex apps. I guess programming with a programmer's mindset is still the better way any day. Yes, tools are important but they can only help you so much. I am curious to see if a game-changing app will come out of "App Inventor". Maybe Google just wants to get their app count competitive with Apple's.

Wednesday, June 30, 2010

Getting back on track

It's been almost 6 months since I wrote something.. With a new look, I am hoping to get back on track :)
Some of the interesting things I learnt or worked on over the last 6 months are..
1. Association Rule Mining using R - It was quite simple to perform Apriori-based mining with the statistical tool R. R was fun to learn but I'm still a beginner.. need to explore a lot
2. Trying to delve into Mahout and Nutch
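
For anyone curious what Apriori-based mining boils down to, here is a minimal sketch. It is in Python rather than R, the shopping baskets are made up, and the function names (`apriori`, `rules`) and the thresholds are illustrative assumptions, not the interface of any R package.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its k-1 subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Derive (lhs, rhs, confidence) rules from the frequent itemsets."""
    found = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(itemset, size):
                lhs = frozenset(lhs)
                # confidence(lhs -> rhs) = support(itemset) / support(lhs)
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, confidence))
    return found


# Made-up baskets, purely for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent_itemsets = apriori(baskets, min_support=0.5)
strong_rules = rules(frequent_itemsets, min_confidence=0.9)
```

With these baskets the only rule that clears the 0.9 confidence bar is "butter implies bread", since every basket with butter also has bread.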

Couple of important events since my last post.
1. marriage
2. graduation

Hopefully I'll start getting active on my blog..

ciao.. Joe

Monday, November 30, 2009

Google App Engine vs Amazon EC2

Our term project, titled Hoo, is a question answering system, and we are using a named entity tagger which tags the person names in any given sentence. This tagger demands a lot of memory, so we had to find a hosting service that would address our memory needs. Initially, I was trying out Google App Engine (GAE). It has a nice plugin for Eclipse, so developing / deploying a Java web app is very easy. The major issue I faced was with the various constraints set by GAE. Limits such as a request not being allowed to take more than 30 seconds and a static file not being allowed to exceed 10 MB gave us a hard time deploying the app in GAE's production environment. As I was analyzing other solutions, I hit upon Amazon EC2. This is just too good. EC2 also has a great Eclipse plugin from which I can start my EC2 instance, manage and monitor it and more. This was a perfect solution, as we got a machine with 1.7GB RAM and a 3 GHz processor for 10 cents / hr and a static IP for 1 cent. Over a long period of time this solution gets expensive, but if you have a lucrative website and you expect growth, EC2 is probably one of the best options..
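
To put "expensive over a period of time" into numbers, here is a rough back-of-the-envelope sketch. It only uses the 10 cents / hr instance price quoted above and assumes the instance runs around the clock; the static IP charge, storage and bandwidth are left out, and the function name is just illustrative.

```python
HOURLY_RATE_USD = 0.10  # instance price quoted in the post


def running_cost(hours, rate=HOURLY_RATE_USD):
    """Cost in USD of keeping one instance up for the given number of hours."""
    return round(hours * rate, 2)


per_month = running_cost(24 * 30)   # 30 days, always on
per_year = running_cost(24 * 365)   # one year, always on
```

That works out to 72 dollars for a 30-day month and 876 dollars for a year, which is fine for a term project but adds up for a hobby site.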

Saturday, October 31, 2009

Still alive and kicking

I am still alive and kicking. It's been almost 2 months since I blogged. That's the longest break I have taken since I started blogging.. I am in my final sem @ UIUC and working on an Advanced DBMS course. This class is quite hectic with projects, assignments and study guide problems (SGP). SGP is a new concept that I am getting used to. We basically go through all the landmark papers in the database world. Right before every class we need to read the paper, frame a question / answer and post it on our class SGP page. This guides / helps other classmates to learn broadly. I might benefit from someone else's SGP QA while someone could learn from my SGP QA. We also vote on these SGP QAs and get graded on these ratings. It's great teamwork. I've been learning a lot in this class. For example, I thought MapReduce was a groundbreaking idea from Google, but after reading about the Gamma database, I was shattered to see a similar idea from 20 years ago. I am constantly reminded of my ignorance as I keep learning.
Also our Prof. Kevin Chang has a search engine called iWisdm and we are working on building a quick and dirty version of a search service. My teammate and I are working on a "Who" search engine called "Hoo". You can ask a "Who" question like "Who is the father of computers" and it would answer "Charles Babbage" and provide an image of Mr. Charles Babbage. We still have a long way to go in refining the results etc. but we are progressing well so far..

Sunday, August 16, 2009

Google Code Jam 2009

I have registered for Google Code Jam this year. Meanwhile I am trying to solve some Facebook puzzles and it's getting quite interesting.

Wednesday, August 05, 2009

Triple Boot on MacBook Pro

Boy, Am I excited ????!!!!
This is something that I was wanting to do since I got my MBP last year. With the recent HDD upgrade, I was able to allocate more space and configure triple boot (Ubuntu 9.04, OSX Leopard, Windows XP). I just mostly followed the instructions at this Ubuntu community site. The high level steps that I followed are

1. Installed rEFIt on OSX Leopard
2. Using Boot Camp, created a partition for Windows
3. Restarted OSX Leopard with the Win XP CD (hold the C key to boot from CD)
4. Formatted the Win partition as NTFS and installed Windows
5. Restarted and logged onto OSX
6. Using Disk Utility, split my Mac partition into one for Mac and one for Ubuntu. I just left the new partition as HFS and not ext3 or anything.
7. Restarted and booted from the Ubuntu 9 CD.
8. Installed Ubuntu onto the newly created partition. Specified the partition to be / with the ext3 file system. Also I didn't create a partition for swap since rEFIt can't handle more than 4 partitions. Ubuntu will warn about this, but you can safely ignore it
9. Restart and voila !! rEFIt will present you with the 3 OSes and you choose the one you like.

The part I liked best was that Ubuntu just recognized my wireless card. My previous Dell laptop had a Broadcom wireless card with a restricted driver, and I always had to spend time configuring it.

Although I can use only 1 OS at a time, it just feels cool and nice to have 3 OSes on my MBP. It's hard to explain unless you experience it I guess..

Tuesday, July 28, 2009

Hard drive upgrade on MacBook Pro

I was thinking of selling my MBP to buy the latest version. The main reason was that I had only a 120GB HDD and I wanted to try Ubuntu on the MBP. But after listing it on eBay and Craigslist, I decided not to take a huge financial loss. So I got a new HDD (Western Digital 320GB). I went through a couple of sites with info on the process of upgrading hard drives and was reluctant to do it myself. I was specifically not comfortable removing the whole keyboard unit and tinkering with small wires inside my MBP. But then the "engineer" side of me pushed me to try it out myself. So I got the screwdrivers and stuff and started off. The guide I used was very helpful. It had great step-by-step notes with pictures, so I just followed it to get the HDD upgraded. It was a sweet experience and it really helped me gain more confidence. Getting my data back was also very easy. Prior to the HDD upgrade, I took a backup of my data using Time Machine. I installed Leopard on the new drive and on the first boot, I was given an option to restore from Time Machine backups. So I was able to get every bit of my data and settings back in no time. Thanks to Time Machine, I don't really have to worry about losing data..

Wednesday, July 08, 2009

Google Chrome OS - does it matter ?

This came as shocking news to me today. Google has made a very significant move in the MS vs Google war with the announcement of Google Chrome OS. Speculation about a Google OS has been in the wild for a long time. Two questions that came to my mind after reading this news were
1. why now ?
2. why netbook / browser centric ?

why now ?
Maybe they are playing tit for tat with MS. You Bing, I Chrome. With Bing hitting headlines and possibly stealing some of Google's precious search share, how do you counterattack and keep your publicity ratings high ? As far as I remember, this is the first time that the otherwise secretive Google has broken news on its vision for a netbook / browser centric OS. Probably Bing has forced Google Chrome OS to go on air.

why netbook / browser centric ?
Interestingly, netbook sales have been up during this recession. I think GOOG wants to target those sectors where it stands a chance. Converting desktop WIN users to a new OS would drain any company. The netbook is designed for customers who need a computer for mail, chat, photos and the like, so positioning in this market should give GOOG a better shot.
Android -> mobile and other portable devices.
Chrome OS -> cloud computers.
Looks like the strategy is to break pawn chains, kill the bishop and rook before targeting the King.
Everyone knows that Google loves the cloud and the internet. A browser-based OS is the right choice for their philosophy of getting things done in the cloud.

This is probably just the beginning of the battle. Google Chrome OS definitely does matter and it'll hold a spot in history.

Tuesday, July 07, 2009

Predicting SSN from public data

CMU researchers have published this paper which talks about statistical methods to predict SSNs from public data.

Overview of the paper :
The SSN Nomenclature :
SSN (9 digits) = AN (3 digits) + GN (2 digits)+ SN (4 digits)
AN - Area Number. It is assigned based on the ZIP code of the mailing address provided in the SSN application form
GN - Group Number. Within each SSA area, GNs are assigned in a precise but nonconsecutive order between 01 and 99
SN - Serial Number. Within each GN, SNs are assigned "consecutively from 0001 through 9999"
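
The nomenclature above is easy to sketch in code. This is only an illustration of the AN / GN / SN split: the function and class names are my own assumptions, the sample number is an obvious placeholder, and nothing here checks whether a number was ever actually issued.

```python
import re
from typing import NamedTuple


class SSNParts(NamedTuple):
    area: str    # AN, 3 digits
    group: str   # GN, 2 digits
    serial: str  # SN, 4 digits


def split_ssn(ssn: str) -> SSNParts:
    """Split a 9-digit SSN string (dashes or spaces allowed) into AN, GN, SN."""
    digits = re.sub(r"[-\s]", "", ssn)
    if not re.fullmatch(r"[0-9]{9}", digits):
        raise ValueError("expected exactly 9 digits")
    return SSNParts(area=digits[:3], group=digits[3:5], serial=digits[5:])


parts = split_ssn("123-45-6789")  # placeholder value, not a real person's SSN
```

Here `parts.area` is "123", `parts.group` is "45" and `parts.serial` is "6789".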

The prediction algorithm exploits the fact that people who were born in the same area are likely to have SSNs that are close to each other.
step 1: Use the Death Master File (it's a public file containing the SSNs and place / date of birth of deceased people) to form clusters of people.
step 2: Now with the person's place / date of birth from social networking sites like Facebook or Orkut or whatever, identify his / her cluster. This will reveal his / her AN and GN.
step 3: Use regression to predict the SN.

Conclusion :
The US Government is already working on randomizing SSNs to defend against statistical attacks, but the SSNs that we already hold are prone to prediction with a certain accuracy as outlined above.

In the paper, they mention that aliens who got their SSN long after their birth are outliers and won't be predicted. I am safe :) but nevertheless I will always remain skeptical & critical about the privacy of social networking sites

Excerpt from the Wired article
"With just two attempts, the researchers correctly guessed the first five digits of SSNs for 60 percent of deceased Americans born between 1989 and 2003. With fewer than 1,000 attempts, they could identify the entire nine digits for 8.5 percent of the group."

Monday, June 29, 2009

HTML5 - Interesting features and improvements

As I couldn't attend Google I/O 2009, I watched the keynote speech online and learnt a few interesting facts about HTML 5. It is the next version of the HTML standard and is currently a work in progress. Interestingly, most of the leading browsers have already incorporated many of its features. Five key features were discussed and they are
1. Geo Location - A new API for locating you (i.e. your browser). I was awestruck by this idea of a browser being able to identify its geo-location and process that information. I was worried about privacy but they promise not to track you without your consent.

2. Video tag - You can embed a video simply by using this "video" tag, and since it is a DOM element, you can manipulate it however you want. It was interesting to see a demo of this

3. Application cache - Google Gears uses this standard to store data offline, making web-apps work offline. This is also a brilliant idea which can make a web-app more attractive.

4. Web Workers - This is like threading in a browser. So if you have any heavy computation in your front-end JS, you can hand it off to a new thread so your page doesn't really freeze up.

5. Canvas - I am not an artist and this didn't really interest me.

Google has been gently pushing developers to adopt HTML5 and make the web an even better place. Time will tell how fast it gets adopted

Saturday, June 27, 2009

Kindle 1 - yet to mature

I recently bought a Kindle 1 reader and have been enjoying certain aspects of it.
Some of my thoughts...

1. Once you start reading on a Kindle, your eyes really seem to forget that it is a machine, and it feels more like reading from paper (thanks to e-ink technology).
2. I also like the Whispernet wireless which helps me download books from anywhere. Most of the Kindle books have a solid sample chapter, which you don't generally get to read before buying a paper-based book.
3. Kindle books are cheaper by a significant margin compared to their paper counterparts.
4. Due to the easy access to several categories of books, it is helping me get on track with books other than tech / subject books.

1. NO TOUCH SCREEN. I am sure Amazon is working on a touch screen version of the Kindle. And if they are not ?? Nah, I don't think so. Amazon is smart. They should come up with a touch screen version.
2. I find it difficult to navigate pages. With a regular book I would just flip through, and it's not all that simple here
3. The screen flickers when we move from one page to another. This is annoying.

Overall, I am glad to see the efforts taken to move away from paper, but it sure has to mature a lot, and hopefully in the years to come the Kindle will become more compelling. "Save paper, Save the Environment"

Tuesday, May 19, 2009

Phone App for Attendance

I just got back from our Tuesday church service. I really enjoy this service and have been able to learn a lot. After the service was over we were having a little chat, and a friend of mine was talking about how cumbersome it gets to note the attendance. She casually asked if there would be a better way of doing this than pen and paper. That got me thinking and I got this idea.. Think of a phone app that would scan the faces of people who come in (through a built-in video camera) and, using facial recognition, match them to their names and mark their attendance. If someone is new, it would just show their image with no tag / name, and once we assign the name, it's all set for next week. The rough part would be that we have to make sure that the phone scans everyone who comes in.

Wednesday, April 29, 2009

Apache Mahout

The end of a semester is always a rough time since you have continuous deadlines :(

With my interest in data mining and machine learning, I looked for open source projects that focus on ML. The Weka workbench seems to be quite popular but I don't see any active work happening around it, since most of the ML algos have already been implemented in it. Through GSoC 2009, I got to know about Apache Mahout. The goal of this project is to implement scalable ML algos, so they have chosen to implement ML algos on top of Hadoop. I am new to Hadoop and was just reading a tutorial on MapReduce, since Hadoop is an open source implementation of the MapReduce concept. It is quite interesting to see how parallelization can be achieved. One catch that I see is that we need to be clever to make sure that the data can be processed in a parallel fashion. For example, computing the Fibonacci series term by term can't be made parallel since each value depends on the previous 2 values. There is also a video series on MapReduce
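
To make the MapReduce idea a bit more concrete, here is a tiny word count sketch. It is plain single-process Python and not the Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are my own illustrative choices. The point is only that each map call looks at one line and each reduce call looks at one key, so in a real cluster those calls could be spread across machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    # Each line is mapped independently of every other line.
    mapped = [pair for line in lines for pair in map_phase(line)]
    # Each key is reduced independently of every other key.
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


counts = word_count(["the quick fox", "The lazy dog", "the end"])
```

Word count fits this model because no line needs to know about any other line. A Fibonacci-style computation does not, since every step needs the result of the step before it.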

If you are interested, please join the Mahout gang.