Because of community requests, Keystone has initiated a census of the Alukuruma, Irula, Kota and Toda communities in the Nilgiris, covering 160 hamlets over a large geographical region. This census is conducted once every two to three years.

They eventually want to enter this data into a queryable database, currently under development by a volunteer, in order to analyze the data and use it to advocate for these vulnerable populations. To this end, Keystone needs your help to check and clean up this census data. Because they are a small organization in a small town, they don’t have the capacity to do this task themselves.

The data

The datasets for this hackathon are from the latest census, taken over a period of months in 2010-11. They are in seven Excel spreadsheets, each of which contains data from a particular area. There are 38 fields in each of these datasets. The data, collected in Tamil on paper, has been entered by volunteers who are not well-versed in data handling, and as a consequence there are various errors in the datasets.

The details

The data needs to be looked at closely to identify errors and rectify them. For example, all of the village names from the different sheets need to be extracted into a single sheet to create a separate village list. It can have three columns: village name, area name, and a unique ID for each village.
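As a starting point, the village list could be built along the following lines. This is only a sketch: it assumes the village-name column of each spreadsheet has already been read into a list of strings per area, and the function name, the `V001`-style ID format, and the whitespace and capitalization clean-up are illustrative choices, not part of the brief. Spelling variants of the same village would still need a manual check.

```python
def build_village_list(sheets):
    """Build a de-duplicated village list.

    `sheets` is assumed to map an area name to the list of village-name
    strings found in that area's spreadsheet (one entry per census row).
    Returns a list of dicts with village_id, village_name and area_name.
    """
    seen = set()
    villages = []
    for area in sorted(sheets):            # sorted so IDs are reproducible
        for raw in sheets[area]:
            # Collapse stray spaces and normalize capitalization so that
            # " village a " and "Village A" count as the same village.
            name = " ".join(raw.split()).title()
            if not name or (area, name) in seen:
                continue
            seen.add((area, name))
            villages.append({
                # Hypothetical ID format: "V" plus a 3-digit running number.
                "village_id": f"V{len(villages) + 1:03d}",
                "village_name": name,
                "area_name": area,
            })
    return villages


# Placeholder names, not real census data.
example = {
    "Area 2": ["Village C"],
    "Area 1": ["Village A", " village a ", "Village B"],
}
village_list = build_village_list(example)
```

Three digits are enough for the roughly 160 hamlets mentioned above; the resulting rows can be written out as the three-column sheet described here.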

Also, the Entry ID column needs to be populated with unique family IDs. Each individual in a family must have the same Entry ID. This ID can be a combination of the village ID and a number.
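One way to populate the Entry ID column is sketched below. It assumes each row already carries a village ID (such as a hypothetical `V001`) and some marker that says which rows belong to the same family; the field name `household_no` is invented for illustration, since the brief does not say how families are distinguished in the sheets. The `village ID + running number` format is likewise just one possible combination.

```python
def assign_entry_ids(rows):
    """Give every person in the same family the same Entry ID.

    Each row is assumed to be a dict with a `village_id` and a
    hypothetical `household_no` that groups family members.
    """
    family_count = {}   # village_id -> number of families seen so far
    assigned = {}       # (village_id, household_no) -> Entry ID
    result = []
    for row in rows:
        village = row["village_id"]
        key = (village, row["household_no"])
        if key not in assigned:
            # First member of a new family: take the next number in this village.
            family_count[village] = family_count.get(village, 0) + 1
            assigned[key] = f"{village}-{family_count[village]:03d}"
        result.append({**row, "entry_id": assigned[key]})
    return result


# Placeholder rows, not real census data.
people = [
    {"village_id": "V001", "household_no": 1},
    {"village_id": "V001", "household_no": 1},
    {"village_id": "V001", "household_no": 2},
    {"village_id": "V002", "household_no": 1},
]
entry_ids = [r["entry_id"] for r in assign_entry_ids(people)]
```

Because the family number restarts in each village, the Entry ID stays unique only in combination with the village ID, which is why the village ID is kept as a prefix.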

Additionally, the datasets need to be anonymized to avoid exposing sensitive information to the world at large. We would like suggestions on how to do this, so that we can open up this data. Also welcome are ideas on analyzing and visualizing the data.
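As one possible suggestion, and not a complete anonymization scheme, direct identifiers could be dropped and exact ages replaced by age bands before the data is opened up. The field names used below (`name`, `father_name`, `age`) are assumptions, because the 38 actual fields are not listed here. In very small hamlets a combination of remaining fields may still identify a person, so the output would need review before release.

```python
def anonymize(rows, drop_fields=("name", "father_name"), age_field="age", band=5):
    """Remove direct identifiers and coarsen ages into bands.

    `drop_fields` and `age_field` are hypothetical column names.
    """
    cleaned = []
    for row in rows:
        # Copy every field except the direct identifiers.
        record = {k: v for k, v in row.items() if k not in drop_fields}
        age = record.pop(age_field, None)
        if isinstance(age, int) and age >= 0:
            low = (age // band) * band
            record["age_band"] = f"{low}-{low + band - 1}"
        else:
            # Missing or unreadable age: keep the column, leave it empty.
            record["age_band"] = None
        cleaned.append(record)
    return cleaned


# Placeholder row, not real census data.
sample = [{"entry_id": "V001-001", "name": "Person X", "age": 37}]
public_rows = anonymize(sample)
```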

The benefit

Your assistance in this task would help Keystone unlock the patterns in this data and help them advocate from a strong position for various efforts to improve the lives of the indigenous communities. Importantly, it would also enable them to inform the indigenous communities of their state of affairs so that they are better able to seek their rights and entitlements.

Keystone also plans to make this dataset the basis of their interventions with the communities, tracking who is benefiting from which interventions and who is being left out. This is a longstanding dream of theirs, but because they lack the capacity to build and maintain databases, it has remained pending for a long time.

Project Members