The Fifth Elephant
Data Visualization hacknight
It is commonly known that a picture is worth a thousand words. Making sense of the information you have extracted from raw data and representing it in easily consumable formats is critical. To do this, you need to produce crisp and attractive visualizations.
The Fifth Elephant presents visualization tutorials and a hacknight. You will learn how to use D3.js, R and Python Pandas during the tutorials. The trainers include Sameer Segal, S. Anand and Baan Bapat. They will also be present throughout the hacknight to mentor projects. Additionally, Pallav Nadhani of FusionCharts will talk to participants about approaching data visualization holistically: the input-output process, slicing and dicing visualizations based on the role and function of the viewer, and the pillars of visualization on which to build.
Register for the tutorials and hacknight and learn how to build compelling visualizations.
Tools and languages covered in the tutorials
1). D3.js: D3.js is a new JavaScript library that helps you manipulate data and a browser's DOM elements. With D3, you can produce graphs and do a whole lot more. D3.js at its core boils down to a set of functions that can understand data items and track whether these data items are newly created, updated or deleted. It is how we string these functions together that determines the kind of visualizations we can build. Say goodbye to Excel-like Google visualizations and get ready to work closer to the metal!
2). R: R is an elegant and comprehensive statistical and graphical programming environment and language. There are plenty of reasons to use R. It provides an unparalleled platform for programming statistical methods in an easy and straightforward manner. It is largely developed and maintained by academia and contains advanced, peer-reviewed statistical routines not yet available in other software. The rapid rate of development and the popularity of R can be gauged by the number of contributed packages it now has: 4300. R traces its roots back to the S language and environment developed at Bell Labs / Lucent Technologies by John Chambers and colleagues, and has now far outgrown its parent. It has state-of-the-art graphics capabilities.
3). Pandas: Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.
Format
We will start the day with tutorials on each of the three technologies mentioned above, from 11:00AM onwards. For the experts among you who can't wait to start hacking, the Hacknight space will be open from 2:00PM onwards.
Venue
Tutorial Venue: The Energy & Resources Institute, 4th Main, 2nd Cross, Domlur, 2nd Stage, Bangalore
Hacknight Venue: The Centre for Internet and Society (CIS), 2nd C Cross, Domlur 2nd Stage, Near Domlur Club, Bangalore
About the tutorials and requirements
Installation instructions: https://docs.google.com/a/hasgeek.in/document/d/1om2NE5z3lSpimpZBIZ72fCJkkkqndfdF5cApOdzrN_o/edit
1. D3.js - Instructed by Sameer Segal
D3 is extremely efficient at manipulating DOM elements in a browser and parsing relatively large amounts of data. The two together make for amazing possibilities, the most obvious being graphs and visualizations. We will cover the basic philosophy of D3, the basics of the API, some basic data-driven charts, and end with an introduction to layouts. We will look through some examples and try to create our own versions.
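To give a feel for the idea of tracking which data items are newly created, updated or deleted, here is a minimal, language-agnostic sketch written in Python. It is not D3's API: the function name `data_join` and the `id` key are hypothetical, chosen only to illustrate the concept.

```python
def data_join(bound, incoming, key=lambda d: d["id"]):
    """Split items into (enter, update, exit) groups by comparing keys.

    `bound` is the data currently shown on screen; `incoming` is the new data.
    Illustration only: the names here are hypothetical, not part of D3.
    """
    old = {key(d): d for d in bound}
    new = {key(d): d for d in incoming}
    enter = [new[k] for k in new if k not in old]   # newly created items
    update = [new[k] for k in new if k in old]      # items present in both
    exit_ = [old[k] for k in old if k not in new]   # items that were removed
    return enter, update, exit_


# Example with made-up data.
on_screen = [{"id": 1, "v": 10}, {"id": 2, "v": 20}]
fresh = [{"id": 2, "v": 25}, {"id": 3, "v": 30}]
enter, update, exit_ = data_join(on_screen, fresh)
```

In a visualization, each group would typically be handled differently: draw new elements for the first group, adjust existing ones for the second, and remove elements for the third.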
Prerequisites:
a) The tutorial will be for a beginner to intermediate audience.
b) Participants are expected to have basic knowledge of HTML, JavaScript/jQuery and CSS.
c) You need the following set-up on your laptops:
- Chrome
- Git
- GitHub account
- An HTTP server such as nginx, Apache2, WAMP or Python's built-in one.
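If you pick the Python option, the following is a minimal sketch of serving a project folder with the standard library, assuming Python 3.7 or later. The function name `make_static_server`, the port and the path in the usage comment are placeholders, not part of the official setup instructions.

```python
import functools
import http.server


def make_static_server(directory=".", port=8000):
    """Build (but do not start) a server for the files under `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to localhost only, so the files are not exposed on the network.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (blocks until interrupted with Ctrl+C):
#   make_static_server("path/to/project").serve_forever()
```

From a terminal, running `python -m http.server 8000` inside the project folder gives a similar result without writing any code.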
Time - 11:00am to 12:00pm
2. R - Instructed by Baan Bapat
The tutorial will take the participants through a couple of the following use cases. We will update this space with the exact content of the tutorial soon.
- Univariate - continuous & categorical data
- Bivariate - continuous vs continuous (special case: time series & financial data), continuous vs categorical, categorical vs categorical
- Multivariate - similar combinations - specifically we will discuss the features offered by lattice and ggplot2 at this stage
- Specific modelling methods and associated plot methods (linear models, decision trees, clusters, etc.)
- Geo-spatial data plotting examples
- Interactive graphs with R - iplots and ggobi
- A couple of success stories
Prerequisites:
a) The tutorial will be for a beginner to intermediate audience.
b) Participants are expected to have a basic working knowledge of R and its data structures.
c) Follow the setup instructions here: http://cran.r-project.org/bin to get R up and running on your machine
d) Install the following libraries/packages subsequent to your installation of R:
- ggplot2
- vcd, vcdExtra
- RgoogleMaps, rgdal, rgl
- iplots, rggobi
- colorspace, RColorBrewer
Time - 2:00pm to 3:00pm
3. Pandas - Instructed by S. Anand
Pandas is emerging as the de facto data manipulation tool in the Python world, thanks to its richness and speed. This tutorial will cover:
- Loading data from files, databases and the web
- Performing simple analysis (e.g. top 10, most common, etc.) with data
- Working with Pandas like you'd work in Excel
- Plotting the data
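As a rough illustration of the kind of operations listed above, here is a small sketch using Pandas. The dataset, column names and values are invented for this example and are not the tutorial's actual material; the actual session may use different data and methods.

```python
import io

import pandas as pd

# Hypothetical sample data, invented purely for illustration.
CSV_TEXT = """city,category,sales
Bangalore,Books,120
Chennai,Books,80
Bangalore,Music,200
Mumbai,Books,150
Mumbai,Music,90
Bangalore,Books,60
"""

# Loading: read_csv accepts a file path, a URL or a file-like object.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Top N: the two rows with the largest sales values.
top2 = df.nlargest(2, "sales")

# Most common: how often each city appears, most frequent first.
city_counts = df["city"].value_counts()

# Excel-style pivot: total sales per city (rows) and category (columns).
pivot = df.pivot_table(
    index="city", columns="category", values="sales",
    aggfunc="sum", fill_value=0,
)
```

Plotting is left out of the sketch because `DataFrame.plot` relies on matplotlib being installed; with it available, something like `pivot.plot(kind="bar")` would chart the pivot table.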
Prerequisites:
a) The tutorial will be for an intermediate audience.
b) The participants are expected to know Python and how to write basic Python programs.
c) Please make sure you follow the setup instructions here http://continuum.io/downloads.html and have the environment set up before coming to the tutorial.
Time - 3:30pm to 4:30pm
Instructors and Mentors
S. Anand has advised and designed IT systems for organisations such as the Aditya Birla Group, Citigroup, Honda, ICICI, IBM, Oracle, RBS, SAP, Steelcase and Tesco, among others. He holds an MBA degree from IIM Bangalore with two gold medals and a B.Tech from IIT Madras. He has worked at IBM, Lehman Brothers, The Boston Consulting Group and Infosys Consulting. He blogs at http://s-anand.net.
Sameer Segal is passionate about inclusive technology. He started artoo in 2010 and is a self-taught geek who works on the entire technology stack from Android to Cloud. He has been recognized as one of Asia-Pacific’s most promising young social entrepreneurs by the Foundation for Youth Social Entrepreneurship’s Paragon 100 Fellowship.
Baan Bapat is a consultant and trainer with 14 years of work experience as an analytics solution specialist across services and manufacturing industries. He has worked at Jigsaw Academy, DecisionCraft Analytics, Techbooks, etc. as project and process manager as well as process engineer. He is proficient with analytics software such as R, CPLEX, COIN-OR and GLPK.
Pallav Nadhani is the co-founder and CEO of FusionCharts, and an angel investor. He started FusionCharts at the age of 17 as a way to make pocket money. Today, FusionCharts has over 21,000 customers and 450,000 users in 118 countries. He holds a Master's in Computer Science from the University of Edinburgh and loves traveling, beer and food.
Next steps
1. Register for the hacknight. You will receive a confirmation from us.
2. Propose a project idea, and/or join a team.
3. Comment on other projects.
4. Come to the hacknight with your computer and the software set up for the tutorials.
Other Participants
- Visualize Music data from Facebook open graph (1 member): Visual history of music you've listened to
- UIDAI / Aadhaar (2 members): Analyse / Visualize / project UIDAI data
- Realtime Twitter hashtag/keyword visualization (3 members): Insightful visualization of twitter hashtags/keywords
- India Inflation (1 member): Visual summary of inflation
- Indian Census Data: Ask Interesting Questions (3 members): We'll be analysing the 2011 Census data, possibly mashing it up with other data sources.
- Movie ratings (4 members): What can we learn from movie ratings by people?
- Agri prices in India (3 members): What can we learn from the commodity prices in India?
- IPLT20 (1 member): Scrape and analyse the extraordinarily rich information from http://www.iplt20.com/
- Elections (6 members): Analyse current and past election data
- Analysis and Visualisation of what tools people use to get their work done using data from usesthis.com (3 members): Let's analyse what software, operating system and hardware, hackers use to do their work.