Monthly Archives: March 2014

Apache Tez for Hadoop >= 2.0

The Apache Tez project is aimed at building an application framework which allows for a complex directed-acyclic-graph of tasks for processing data. It is currently built atop Apache Hadoop YARN.

The 2 main design themes for Tez are:

  • Empowering end users by:
  • Expressive dataflow definition APIs
  • Flexible Input-Processor-Output runtime model
  • Data type agnostic
  • Simplifying deployment
  • Execution Performance
    • Performance gains over Map Reduce
    • Optimal resource management
    • Plan reconfiguration at runtime
    • Dynamic physical data flow decisions

Old way was:

References: