The Apache log format is very common, and logs in this format are generated in large volumes. They are mostly stored in cloud storage services such as S3. Come unleash the power of Linux text-processing tools to process these logs and generate cool statistics such as the day-of-week distribution of traffic and errors, the status code distribution, and browser statistics, all with very thin logic.

You can concentrate on user interaction and business logic, while the data extraction is delegated to CloudInfra's APIs using single-line MapReduce jobs.
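
As a rough local sketch of two of the statistics mentioned above, the following shell snippet computes a status code distribution and a day-of-week distribution with awk. It does not use CloudInfra's APIs. The sample log entries and the function names status_counts and weekday_counts are hypothetical, and the snippet assumes every line follows the Apache combined log format with a three-token request line.

```shell
#!/bin/sh
# Hypothetical sample data in Apache combined log format (made-up entries).
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"
192.0.2.2 - - [10/Oct/2000:14:01:02 -0700] "GET /missing HTTP/1.0" 404 209 "-" "curl/8.0.1"
192.0.2.1 - - [11/Oct/2000:09:12:45 -0700] "GET /about.html HTTP/1.0" 200 1480 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"
192.0.2.3 - - [11/Oct/2000:10:30:00 -0700] "POST /login HTTP/1.0" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"
EOF

# Status code distribution. Assumption: the status is the 9th
# whitespace-separated field, which holds when the request line
# has exactly three tokens (method, path, protocol).
status_counts() {
    awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' "$1" | sort
}

# Day-of-week distribution. The weekday is derived from the logged date
# with the Sakamoto day-of-week formula (0 = Sunday); the timestamp is
# used as logged, with no timezone conversion.
weekday_counts() {
    awk '
    BEGIN {
        split("0 3 2 5 0 3 5 1 4 6 2 4", offset, " ")
        split("Sun Mon Tue Wed Thu Fri Sat", name, " ")
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", mon, " ")
        for (i = 1; i <= n; i++) month[mon[i]] = i
    }
    {
        # $4 looks like "[10/Oct/2000:13:55:36"; drop the leading bracket
        split(substr($4, 2), p, "[/:]")
        d = p[1] + 0; m = month[p[2]]; y = p[3] + 0
        if (m < 3) y--
        dow = (y + int(y / 4) - int(y / 100) + int(y / 400) + offset[m] + d) % 7
        count[name[dow + 1]]++
    }
    END { for (k in count) print k, count[k] }' "$1" | sort
}

echo "Status codes:"
status_counts "$log"
echo "Weekdays:"
weekday_counts "$log"
```

On the four sample lines this reports three 200 responses and one 404, with two requests each on Tuesday and Wednesday. Real logs with malformed request lines would need a more defensive field extraction.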

For example, the following commands would generate browser distribution statistics:

map:     awk -F'"' '{ print $(NF-1) }'
groupby: awk -F'"' '{ print $(NF-1) }'
reduce:  wc -l

(Not just shell: you could use command snippets from Python or any other language.)
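
For comparison, the same map, group-by, and reduce steps can be approximated on a single machine with a plain pipeline. This is only a sketch and says nothing about how CloudInfra's APIs actually run the stages: the sample log entries and the function name browser_counts are hypothetical, and sort piped into uniq -c stands in for grouping and counting. "Browser" here means the raw user-agent string, not a parsed browser family.

```shell
#!/bin/sh
# Hypothetical sample data in Apache combined log format (made-up entries).
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"
192.0.2.2 - - [10/Oct/2000:14:01:02 -0700] "GET /missing HTTP/1.0" 404 209 "-" "curl/8.0.1"
192.0.2.1 - - [11/Oct/2000:09:12:45 -0700] "GET /about.html HTTP/1.0" 200 1480 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"
192.0.2.3 - - [11/Oct/2000:10:30:00 -0700] "POST /login HTTP/1.0" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"
EOF

browser_counts() {
    # map: split each line on double quotes and emit the second-to-last
    #      field, which is the user-agent string in the combined format
    # group-by: sort places identical user agents next to each other
    # reduce: uniq -c counts each run of identical lines
    # final sort -rn lists the most frequent user agent first
    awk -F'"' '{ print $(NF-1) }' "$1" | sort | uniq -c | sort -rn
}

browser_counts "$log"
```

The width of the count column printed by uniq -c differs between implementations, so any downstream parsing should strip leading spaces first.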


Project Members