Log Analysis using MapReduce and Pig


For large enterprises, analysing logs on a daily basis is a difficult task. Generally, developers and management analyse the logs manually, and only when the need arises.

If logs could be analysed upfront, and all stakeholders received a preventive report based on that analysis, it would help prevent errors and let teams act on issues in time.
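As a rough illustration of the kind of preventive report meant here, the sketch below counts ERROR and FATAL lines per component using a map step and a reduce step. It is a plain, single-machine Python imitation of the MapReduce pattern, not Hadoop or Pig code. The log line format, the names `map_line`, `reduce_counts`, and `error_report`, and the `threshold` parameter are all assumptions made for the example.

```python
import re
from collections import Counter
from itertools import chain

# Assumed (hypothetical) log format: "<date> <time> <LEVEL> <component>: <message>"
LINE_RE = re.compile(r"^(\S+) (\S+) (DEBUG|INFO|WARN|ERROR|FATAL) (\w+): (.*)$")
ALERT_LEVELS = {"ERROR", "FATAL"}


def map_line(line):
    """Map step: emit (component, 1) for each ERROR/FATAL line; skip the rest."""
    match = LINE_RE.match(line)
    if match and match.group(3) in ALERT_LEVELS:
        yield match.group(4), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per component."""
    totals = Counter()
    for component, count in pairs:
        totals[component] += count
    return totals


def error_report(lines, threshold=2):
    """Return sorted (component, count) pairs whose count reaches the threshold."""
    totals = reduce_counts(chain.from_iterable(map_line(line) for line in lines))
    return sorted(item for item in totals.items() if item[1] >= threshold)


# Made-up sample lines, for illustration only.
sample = [
    "2013-06-07 10:00:01 ERROR booking: timeout calling fares service",
    "2013-06-07 10:00:02 INFO booking: request served",
    "2013-06-07 10:00:05 ERROR booking: timeout calling fares service",
    "2013-06-07 10:00:09 FATAL search: out of memory",
    "a malformed line that the parser skips",
]

print(error_report(sample))
```

On a real cluster the map and reduce steps would run in parallel across many nodes. The point here is only the shape of the computation that turns raw log lines into a short list of components needing attention.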


Comments

t3rmin4t0r · Fri, 7 Jun

More often than not, this is done with tools like Graylog2 rather than pig+hadoop.

- Nitin Kumar t3rmin4t0r · Fri, 7 Jun
  
  Hi, do you not think that handling terabytes or petabytes of logs (which a company like Amadeus, which owns a GDS, and others have) in a cluster environment, using MapReduce to cut execution time, is a proper use case?
  
  Later on, Titan and Faunus could be used to enhance this model and make it more useful. But maybe I am wrong, or thinking in the wrong direction, so any further inputs or ideas are most welcome.
  
  
  t3rmin4t0r Nitin Kumar · Fri, 7 Jun
  
  A lot of post-mortem analysis and large-scale ETL happens on Hadoop and Pig, which is what Pig is meant for in the Yahoo! context. The real trouble with Pig+Hadoop is that they handle only moderately unstructured data, still operating on row+column style inputs.
  
  Most of the preventive up-front analysis of unstructured data is covered by Splunk on the commercial side (which will use Hadoop YARN inside, very soon) and Graylog2/Elasticsearch on the open-source side. I have seen tools like Esper used to do complex event processing on input logs immediately, letting HDFS spool them onto disk for deeper inspection later.
  
  Batch systems are always second in line to CEP layers or index+search layers with alerting mechanisms, particularly if time is of importance.
  

The Fifth Elephant: Hadoop, MapReduce and friends
