The current resource allocation model is quite static.

Processing – MapReduce and Beyond

YARN is fully functional today, and future releases will extend its current capabilities. Perhaps the most notable of these extensions will be the ability to specify and control container resources along more dimensions. Currently, only location, memory, and CPU can be specified; this will be expanded into areas such as storage and network I/O.
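To make the idea of a multi-dimensional container request concrete, here is a minimal, self-contained Java sketch. It is not the YARN API (in YARN itself, memory and virtual cores are carried by the `Resource` record). The classes `ContainerRequestSketch`, `NodeCapacity`, and `ResourceFitDemo` are hypothetical, and the sketch treats a location preference as a hard constraint, which is a simplification of how YARN handles locality.

```java
import java.util.Optional;

/** Hypothetical model of a container request; not part of the YARN API. */
final class ContainerRequestSketch {
    final int memoryMb;
    final int vcores;
    final Optional<String> preferredHost; // empty means "any node"

    ContainerRequestSketch(int memoryMb, int vcores, Optional<String> preferredHost) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
        this.preferredHost = preferredHost;
    }
}

/** Hypothetical view of the free capacity on one cluster node. */
final class NodeCapacity {
    final String host;
    final int freeMemoryMb;
    final int freeVcores;

    NodeCapacity(String host, int freeMemoryMb, int freeVcores) {
        this.host = host;
        this.freeMemoryMb = freeMemoryMb;
        this.freeVcores = freeVcores;
    }

    /** A request fits only if every dimension fits and the location (if given) matches. */
    boolean fits(ContainerRequestSketch req) {
        boolean locationOk = req.preferredHost.map(h -> h.equals(host)).orElse(true);
        return locationOk
                && req.memoryMb <= freeMemoryMb
                && req.vcores <= freeVcores;
    }
}

class ResourceFitDemo {
    public static void main(String[] args) {
        NodeCapacity node = new NodeCapacity("node1", 4096, 2);
        System.out.println(node.fits(new ContainerRequestSketch(2048, 1, Optional.empty())));     // true
        System.out.println(node.fits(new ContainerRequestSketch(2048, 4, Optional.empty())));     // false: too many vcores
        System.out.println(node.fits(new ContainerRequestSketch(1024, 1, Optional.of("node2")))); // false: wrong host
    }
}
```

In this model, adding a dimension such as storage or network I/O would mean adding one more field to each class and one more comparison in `fits`.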

• How MapReduce was the only processing model available in Hadoop 1 and its conceptual model
• The Java API to MapReduce, and how to use this to build some examples, from a word count to sentiment analysis of Twitter hashtags
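The conceptual model summarized above (a map phase, a grouping of intermediate pairs by key, then a reduce phase) can be illustrated without a cluster. The following is a single-process Java sketch of word count that mimics the three phases with plain collections. It does not use Hadoop's `Mapper` or `Reducer` classes; the class `WordCountSketch` and its methods `map`, `shuffle`, `reduce`, and `wordCount` are hypothetical names chosen for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    /** Map phase: emit (word, 1) for every whitespace-separated token, lowercased. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    /** Shuffle phase: group emitted values by key; a TreeMap keeps keys sorted. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    /** Reduce phase: sum all the counts collected for one word. */
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    /** Runs map over every line, then shuffle, then reduce once per key. */
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(emitted).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("the cat sat", "The dog sat")));
        // prints {cat=1, dog=1, sat=2, the=2}
    }
}
```

In a real job, the map and reduce calls run in parallel across many tasks, and the framework performs the grouping step between them; the sketch only shows the data flow.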

In the next two chapters, we will move away from strictly batch processing and delve into the world of near real-time and iterative processing, using two of the YARN-hosted frameworks we introduced in this chapter, namely Samza and Spark.


• What Samza is and how it integrates with YARN and other projects such as Apache Kafka
• How Samza provides a simple callback-based interface for stream processing
• How Samza composes multiple stream processing jobs into more complex workflows
• How Samza supports persistent local state within tasks and how this greatly enriches what it can enable
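As a rough illustration of the callback-based style and the per-task local state listed above, here is a self-contained Java sketch. It is not Samza's API: in Samza the corresponding interface is `StreamTask`, whose `process` method is invoked once for each incoming message. The names `OutputCollector`, `MessageCallback`, `CountingTask`, and `CallbackDemo` below are hypothetical, and the in-memory `HashMap` only stands in for a persistent local store (no durability or fault tolerance is modeled).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sink for messages a task emits to a named output stream. */
interface OutputCollector {
    void send(String stream, String key, Object value);
}

/** Hypothetical callback: the framework calls process() once per incoming message. */
interface MessageCallback {
    void process(String key, String message, OutputCollector collector);
}

/** Counts messages per key, keeping running totals as task-local state. */
class CountingTask implements MessageCallback {
    // Stand-in for a persistent local key-value store; here it lives only in memory.
    private final Map<String, Integer> store = new HashMap<>();

    @Override
    public void process(String key, String message, OutputCollector collector) {
        int updated = store.merge(key, 1, Integer::sum); // increment this key's count
        collector.send("counts", key, updated);          // emit the new total downstream
    }
}

class CallbackDemo {
    public static void main(String[] args) {
        List<String> emitted = new ArrayList<>();
        OutputCollector collector = (stream, key, value) -> emitted.add(stream + ":" + key + "=" + value);
        MessageCallback task = new CountingTask();

        // Simulated framework loop: deliver each (key, message) pair to the callback in order.
        String[][] input = {{"#hadoop", "tweet 1"}, {"#yarn", "tweet 2"}, {"#hadoop", "tweet 3"}};
        for (String[] msg : input) {
            task.process(msg[0], msg[1], collector);
        }
        System.out.println(emitted);
        // prints [counts:#hadoop=1, counts:#yarn=1, counts:#hadoop=2]
    }
}
```

Because the task writes to a named output stream rather than returning a result, another job could consume `counts` as its input; in Samza, jobs are typically chained in this way through Kafka topics.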