The Fifth Elephant
Hadoop, MapReduce and friends
Hadoop has become nearly synonymous with Big Data. A large community backs this open-source project, which has made it the foundation for several other open-source projects and tools.
In this hacknight, you will work through short tutorials on getting started with Hadoop: how to build it, how to set up a development environment on your laptop, and how to write MapReduce programs in that environment.
With the help of a few convenience scripts, pre-packaged config files and some hand-holding from Gopal Vijayaraghavan, you can download Hadoop, build it and set up a single-node cluster without much trouble. This gives you a very simple, non-secure Hadoop cluster that is easy to extend to a few nodes. A multi-node setup is mainly useful for debugging data locality and schedulers. For most HDFS/Hadoop development, a single-node cluster works wonders.
Once you have learnt how to set up a cluster on your system, you can dig deeper into the architecture and try to modify Hadoop itself to suit your needs. Perhaps something in there has annoyed you for months? Maybe you think Hadoop would be better off with a DualPivotQuickSort instead of the default sort? Actually, that sounds like a good idea. And it's not restricted to Java: perhaps you are a JavaScript wizard and want to fix something in the jQuery-based web UI?
Or maybe you've got a Big Data problem and Hadoop looks like a good way to solve it. Maybe R on Hadoop fits your problem. The solution might be a quick ETL phase to cut your logs down to size, where Pig might fit in neatly. Or perhaps you want to analyze your A/B tests and funnels, where Hive does a good job. If you already know a lot of NumPy, Pydoop might fit the bill. And if all else fails, you might as well do it in Java and call it a day.
And in case one of those doesn't work exactly right, fixing it is always an option, after cursing, yelling and debugging (not always in that order). Or write your own.
The hacknight is open to anyone who wants to work with Hadoop, on Hadoop, and on everything in the neighbourhood. Come prepared with a laptop, a GitHub account, Java skills and enough sleep to tide you over the night.
Mentors
Gopal Vijayaraghavan has spent many years making PHP faster at Yahoo! and Zynga. He sees performance problems as opportunities to contribute. He has turned to Hadoop for new challenges and is chasing performance at both ends of the scale spectrum at Hortonworks. Beyond work, he's a photographer by day, a biker by sunset, and by night he posts random thoughts.
Rohit Chattar is a Senior Architect at Yahoo! in the Advertising and Data Platform group. He is a thought leader specializing in designing solutions that use huge amounts of data. He has a deep understanding of various usage models involving traditional databases and newer Big Data platforms, which he applies to build customer-centric and cost-effective solutions.
What is a hacknight?
A hacknight is a space where you can work on your pet project, try a new idea or test the waters of new tools, technologies and languages. At hacknights, you meet new people, make friends and network. Overall, a hacknight is an amazing, hands-on learning experience. HasGeek's hacknights give developers the opportunity to work in an easy, relaxed, peer-to-peer environment over a period of 12-20 hours.
This hacknight is part of The Fifth Elephant 2013, an annual conference on big data, analytics and cloud computing. For more details, visit: http://fifthelephant.in/
Next steps:
- Register for the hacknight. You will receive a confirmation from us.
- Propose an app idea, and/or join a team.
- Comment on other projects.
- Come to the hacknight with your computer.
Other Participants
Log Analysis using MapReduce and Pig
3 members. Preventive report generation based on log analysis.
Hadoop job helpers for Go-lang
1 member. Write Hadoop tasks in Go to do your processing and run them within a cluster.
Bloom filter pushdown predicates for Hive's map-joins
2 members. Improve Hive's performance when processing left outer joins in queries.
[HADOOP-9335] Including UNIX-like sort options for the ls shell command
2 members. hadoop fs -ls is very limited today: it supports no native sort options.