Data Visualization hacknight
It is commonly known that a picture is worth a thousand words. Making sense of the information you have extracted from raw data and representing it in easily consumable formats is critical. To do this, you need to produce crisp and attractive visualizations.
The Fifth Elephant presents visualization tutorials and a hacknight. You will learn how to use D3.js, R and Python Pandas during the tutorials. The trainers include Sameer Segal, S. Anand and Baan Bapat. They will also be present throughout the hacknight to mentor projects. Additionally, Pallav Nadhani of FusionCharts will talk to participants about how to approach data visualization holistically, the input-output process, how to slice and dice visualizations based on the role and function of the viewer, and the pillars of visualization to build on.
Register for the tutorials and hacknight and learn how to build compelling visualizations.
Tools and languages covered in the tutorials
2). R: R is an elegant and comprehensive statistical and graphical programming environment and language. There are plenty of reasons why you would want to use R. It provides an unparalleled platform for programming statistical methods in an easy and straightforward manner. It is largely developed and maintained by academia and contains advanced, peer-reviewed statistical routines not yet available in other software. The rapid rate of development and popularity of R can be gauged by the number of contributed packages -- 4300 -- it now has. R traces its roots back to the S language and environment developed at Bell Labs / Lucent Technologies by John Chambers and colleagues, and has now far outgrown its parent. It has state-of-the-art graphics capabilities.
3). Pandas: Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.
We will start the day with tutorials on each of the three technologies mentioned above from 11:00AM onwards. For the experts among you who can't wait to start hacking, the Hacknight space will be open from 2:00PM onwards.
Tutorial Venue: The Energy & Resources Institute, 4th Main, 2nd Cross, Domlur, 2nd Stage, Bangalore
Hacknight Venue: The Centre for Internet and Society (CIS), 2nd C Cross, Domlur 2nd Stage, Near Domlur Club, Bangalore
About the tutorials and requirements
Installation instructions: https://docs.google.com/a/hasgeek.in/document/d/1om2NE5z3lSpimpZBIZ72fCJkkkqndfdF5cApOdzrN_o/edit
1. D3.js - Instructed by Sameer Segal
D3 is extremely efficient at manipulating DOM elements in a browser and at parsing relatively large amounts of data. The two together make for amazing possibilities, the most obvious being graphs and visualizations. We will cover the basic philosophy of D3, the basics of the API, some basic data-driven charts, and end with an introduction to layouts. We will look through some examples and try to create our own versions.
a) The tutorial will be for a beginner to intermediate audience.
c) You need the following set-up on your laptops:
- GitHub account
- An HTTP server such as nginx, Apache2, WAMP or Python's built-in server.
Time - 11:00am to 12:00pm
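For the HTTP server requirement above, Python's standard library alone can serve a folder of D3 examples locally. The following is only a minimal sketch, assuming Python 3.7 or newer; the function name `start_static_server` and the default port are illustrative choices, not part of the tutorial material.

```python
import functools
import http.server
import threading


def start_static_server(directory=".", port=8000):
    """Serve the files in `directory` over HTTP on localhost.

    Hypothetical helper for illustration. The server runs in a background
    daemon thread; the returned server object can be stopped with
    server.shutdown(). Passing port=0 lets the OS pick a free port.
    """
    # SimpleHTTPRequestHandler serves static files; `directory` selects the root.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

From a terminal, running `python -m http.server 8000` inside the project folder should give much the same result without writing any code. Either way, the pages would then be reachable at http://127.0.0.1:8000/.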
2. R - Instructed by Baan Bapat
The tutorial will take the participants through a couple of the following use cases. We will update this space with the exact content of the tutorial soon.
- Univariate - continuous & categorical data
- Bivariate - continuous vs. continuous (special case: time series and financial data), continuous vs. categorical, categorical vs. categorical
- Multivariate - similar combinations; specifically, we will discuss the features offered by lattice and ggplot2 at this stage
- Specific modelling methods and associated plot methods (linear models, decision trees, clusters, etc.)
- Geo-spatial data plotting examples
- Interactive graphs with R - iplots and ggobi
- A couple of success stories
a) The tutorial will be for a beginner to intermediate audience.
b) Participants are expected to have a basic working knowledge of R and its data structures.
c) Follow the setup instructions here: http://cran.r-project.org/bin to get R up and running on your machine
d) Install the following libraries/packages subsequent to your installation of R:
- vcd, vcdExtra
- RgoogleMaps, rgdal, rgl
- iplots, rggobi
- colorspace, RColorBrewer
Time - 2:00pm to 3:00pm
3. Pandas - Instructed by S. Anand
Pandas is emerging as the de facto data manipulation tool in the Python world, thanks to its richness and speed. This tutorial will cover:
- Loading data from files, databases and the web
- Performing simple analysis (e.g. top 10, most common, etc.) with data
- Working with Pandas like you'd work in Excel
- Plotting the data
a) The tutorial will be for an intermediate audience.
b) Participants are expected to know Python and how to write basic Python programs.
c) Please make sure you follow the setup instructions here http://continuum.io/downloads.html and have the environment set up before coming to the tutorial.
Time - 3:30pm to 4:30pm
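To give a rough feel for the kinds of operations listed above (loading data, "most common" and "top N" analysis, and Excel-style pivoting), here is a small sketch. It is not taken from the tutorial: the movie ratings data is invented for illustration, and the variable names are assumptions.

```python
import io

import pandas as pd

# Hypothetical sample data, invented for this sketch (not a real dataset).
csv_text = """movie,user,rating
A,u1,4
B,u1,3
A,u2,5
C,u2,2
B,u3,4
A,u3,3
"""

# Loading: read_csv accepts a file path, a URL or any file-like object.
ratings = pd.read_csv(io.StringIO(csv_text))

# Most common: number of ratings per movie, most frequent first.
most_rated = ratings["movie"].value_counts()

# Top N: movies with the highest average rating (N=2 for this tiny sample).
top_movies = ratings.groupby("movie")["rating"].mean().nlargest(2)

# Excel-like pivot table: one row per user, one column per movie.
pivot = ratings.pivot_table(index="user", columns="movie", values="rating")

print(most_rated)
print(top_movies)
print(pivot)
```

Plotting is left out to keep the sketch free of extra dependencies; with matplotlib installed, calling `.plot()` on a result such as `top_movies` would be the usual next step.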
Instructors and Mentors
S. Anand has advised and designed IT systems for organisations such as the Aditya Birla Group, Citigroup, Honda, ICICI, IBM, Oracle, RBS, SAP, Steelcase and Tesco, among others. He holds an MBA degree from IIM Bangalore with two gold medals and a B.Tech from IIT Madras. He has worked at IBM, Lehman Brothers, The Boston Consulting Group and Infosys Consulting. He blogs at http://s-anand.net.
Sameer Segal is passionate about inclusive technology. He started artoo in 2010 and is a self-taught geek who works on the entire technology stack from Android to Cloud. He has been recognized as one of Asia-Pacific’s most promising young social entrepreneurs by the Foundation for Youth Social Entrepreneurship’s Paragon 100 Fellowship.
Baan Bapat is a consultant and trainer with 14 years of work experience as an analytics solution specialist across the services and manufacturing industries. He has worked at Jigsaw Academy, DecisionCraft Analytics, Techbooks, etc. as a project and process manager as well as a process engineer. He is proficient with analytics software such as R, CPLEX, COIN-OR and GLPK.
Pallav Nadhani is the co-founder and CEO of FusionCharts, and an angel investor. He started FusionCharts at the age of 17 as a way to make pocket money. Today, FusionCharts has over 21,000 customers and 450,000 users in 118 countries. He holds a Master's in Computer Science from the University of Edinburgh and loves his traveling, beer and food.
1. Register for the hacknight. You will receive a confirmation from us.
2. Propose a project idea, and/or join a team.
3. Comment on other projects.
4. Come to the hacknight with your computer, with the software for the tutorials already set up.
- Visualize Music data from Facebook open graph (1 member): Visual history of music you've listened to
- UIDAI / Aadhaar (2 members): Analyse / visualize / project UIDAI data
- Realtime Twitter hashtag/keyword visualization (3 members): Insightful visualization of Twitter hashtags/keywords
- India Inflation (1 member): Visual summary of inflation
- Indian Census Data: Ask Interesting Questions (3 members): We'll be analysing the 2011 Census data, possibly mashing it up with other data sources.
- Movie ratings (4 members): What can we learn from movie ratings by people?
- Agri prices in India (3 members): What can we learn from the commodity prices in India?
- IPLT20 (1 member): Scrape and analyse the extraordinarily rich information from http://www.iplt20.com/
- Elections (6 members): Analyse current and past election data
- Analysis and Visualisation of what tools people use to get their work done using data from usesthis.com (3 members): Let's analyse what software, operating systems and hardware hackers use to do their work.