
optimizations, see “Working on a Per-Partition Basis” on page 107). Some tasks may spend almost all of their time reading data from an external storage system, and will not benefit much from additional optimization in Spark since they are bottlenecked on input read.

Storage: Information for RDDs that are persisted

Environment: Debugging Spark’s configuration

This page enumerates the set of active properties in the environment of your Spark application. The configuration here represents the “ground truth” of your application’s configuration. It can be helpful if you are debugging which configuration flags are enabled, especially if you are using multiple configuration mechanisms. This page will also enumerate JARs and files you’ve added to your application, which can be useful when you’re tracking down issues such as missing dependencies.
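To illustrate why a single "ground truth" view helps when several configuration mechanisms are in play, here is a minimal sketch in plain Python. It does not call Spark. It only mimics the precedence order Spark documents (values set in code on SparkConf override spark-submit flags, which override the defaults file). The function name and the sample property values are hypothetical, chosen for illustration.

```python
# Plain-Python illustration only; nothing here talks to Spark.
# Layers are merged from lowest to highest precedence:
# defaults file < spark-submit flags < values set in code on SparkConf.

def effective_config(defaults_file, submit_flags, set_in_code):
    """Merge three hypothetical configuration layers; later layers win."""
    merged = {}
    for layer in (defaults_file, submit_flags, set_in_code):
        merged.update(layer)
    return merged

# Made-up sample values for each mechanism.
defaults_file = {"spark.executor.memory": "1g", "spark.ui.port": "4040"}
submit_flags = {"spark.executor.memory": "2g"}
set_in_code = {"spark.app.name": "My App"}

config = effective_config(defaults_file, submit_flags, set_in_code)
for key in sorted(config):
    print(key, "=", config[key])
```

The merged dictionary plays the role of the Environment page here: whichever mechanism supplied a value, the page shows only the one that is actually in effect.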


• In Mesos, logs are stored in the work/ directory of a Mesos slave, and accessible from the Mesos master UI.

• In YARN mode, the easiest way to collect logs is to use YARN’s log collection tool (running yarn logs -applicationId <app ID>) to produce a report containing logs from your application. This will work only after an application has fully finished, since YARN must first aggregate these logs together. For viewing logs of a running application in YARN, you can click through the ResourceManager UI to the Nodes page, then browse to a particular node, and from there, a particular container. YARN will give you the logs associated with output produced by Spark in that container. This process is likely to become less roundabout in a future version of Spark with direct links to the relevant logs.
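The YARN log collection step can be scripted. The following is a minimal sketch, assuming the yarn command-line tool is on the PATH and the application has already finished. The helper names, the output path argument, and the application ID shown are hypothetical and used only for illustration.

```python
import subprocess

def yarn_logs_command(app_id):
    """Build the argument list for: yarn logs -applicationId <app ID>."""
    if not app_id:
        raise ValueError("an application ID is required")
    return ["yarn", "logs", "-applicationId", app_id]

def collect_yarn_logs(app_id, output_path):
    """Run the command and save its report to output_path.

    Assumes the `yarn` CLI is installed and the application has fully
    finished, since YARN aggregates logs only after completion.
    """
    with open(output_path, "w") as out:
        subprocess.run(yarn_logs_command(app_id), stdout=out, check=True)

# Made-up application ID; substitute the ID of your own application.
cmd = yarn_logs_command("application_1400000000000_0001")
print(" ".join(cmd))
```

Calling collect_yarn_logs with a real application ID would write the aggregated report to a local file. For an application that is still running, the ResourceManager UI route described above remains the way to reach the logs.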
