
Parse the reviews, extract product names from the URLs or the review text, and analyze the reviews (possibly with sentiment analysis) to find the most reviewed products, the products with the highest and lowest positive sentiment, the websites with the most reviewed products, and other interesting statistics.

Each of these tasks is quite challenging. For example, to parse the records, your date parser will have to handle all possible date formats.
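One common way to approach the date problem is to try a list of candidate formats in turn and accept the first one that matches. The sketch below is only an illustration: the formats in CANDIDATE_FORMATS and the function name parse_review_date are assumptions, not taken from the dataset, and the real data will almost certainly need more entries.

```python
from datetime import datetime
from typing import Optional

# Hypothetical candidate formats; extend this tuple after inspecting the data.
CANDIDATE_FORMATS = (
    "%Y-%m-%d %H:%M:%S",  # 2011-03-05 14:30:00
    "%Y-%m-%d",           # 2011-03-05
    "%d/%m/%Y",           # 05/03/2011 (day first is an assumption)
    "%B %d, %Y",          # March 5, 2011
    "%d %b %Y",           # 05 Mar 2011
)


def parse_review_date(text: str) -> Optional[datetime]:
    """Return the first successful parse of text, or None if no format fits."""
    cleaned = text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            # This format did not match; try the next one.
            continue
    return None
```

Returning None instead of raising lets you count and inspect the unparsed values, which is a practical way to discover which formats are still missing. Note that ambiguous strings such as 05/03/2011 are resolved by the order of the list, so that order is itself a decision about the data.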

The dataset has the following fields:

Rating, Summary, Date Time, Type, Review, Reviewer, ItemName, URL
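As a starting point for the "most reviewed products" statistic, the following minimal sketch counts reviews per ItemName. It assumes the records are comma-separated, that the columns appear in the order listed above, and that there is no header row; none of this is confirmed by the description, so check it against the sample file first. The function name most_reviewed is hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable, List, Tuple

# Column order is assumed to match the field list in the description.
FIELDS = ["Rating", "Summary", "Date Time", "Type",
          "Review", "Reviewer", "ItemName", "URL"]


def most_reviewed(lines: Iterable[str], top_n: int = 10) -> List[Tuple[str, int]]:
    """Count reviews per ItemName and return the top_n (name, count) pairs."""
    # Passing fieldnames means every row, including the first, is treated as data.
    reader = csv.DictReader(lines, fieldnames=FIELDS)
    counts: Counter = Counter()
    for row in reader:
        name = (row.get("ItemName") or "").strip()
        if name:  # skip records with a missing or empty product name
            counts[name] += 1
    return counts.most_common(top_n)
```

Because the function accepts any iterable of lines, it can be fed an open file object (for example one returned by gzip.open in text mode) and processes the data row by row without loading the whole file into memory.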

You can download the dataset from here:

http://beevolve-is-hiring-engineers.s3.amazonaws.com/beevolve_reviews_dataset.txt.gz (1 GB compressed)

Sample Data (5000 records): http://beevolve-is-hiring-engineers.s3.amazonaws.com/beevolve_sample_data.csv (< 5 MB)


Project Members