It is not possible to override the default SparkContext class, nor to create a new one inside a running Spark shell. It is, however, possible to choose which master the context connects to by setting the MASTER environment variable.
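For example, assuming the spark-shell launch script that ships with the distribution, the shell can be started against a particular master like this:

MASTER=local[4] ./spark-shell            # local mode with four worker threads
MASTER=spark://host:7077 ./spark-shell   # connect to a standalone cluster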
There are a few key differences between the Java and Scala APIs:
• Java (before Java 8) does not support anonymous functions, so operations are passed as instances of function classes such as Function, FlatMapFunction, and PairFunction.
• To preserve type safety, the Java API provides specialized RDD classes, such as those that handle key pairs and doubles: JavaPairRDD and JavaDoubleRDD (a short sketch follows this list).
• Key-value pairs, written simply as (key, value) in Scala, are represented in Java by the scala.Tuple2 class.
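As a concrete illustration of the specialized classes, here is a minimal, self-contained sketch; the class name, sample data, and the Spark 1.x-style mapToDouble call are our own choices for illustration, not code from the word count example:

import java.util.Arrays;
import org.apache.spark.api.java.JavaDoubleRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.DoubleFunction;

public class DoubleRDDExample {
  public static void main(String[] args) {
    // A local context, for illustration only
    JavaSparkContext sc = new JavaSparkContext("local", "DoubleRDDExample");
    JavaRDD<String> words = sc.parallelize(Arrays.asList("spark", "handles", "doubles"));
    // mapToDouble returns a JavaDoubleRDD, which adds numeric helpers
    // such as sum(), mean(), and stats()
    JavaDoubleRDD lengths = words.mapToDouble(new DoubleFunction<String>() {
      public double call(String s) { return s.length(); }
    });
    System.out.println(lengths.mean()); // average word length
    sc.stop();
  }
}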
First of all, we create a context using the JavaSparkContext class:
JavaSparkContext sc = new JavaSparkContext(master, "JavaWordCount",
    System.getenv("SPARK_HOME"), JavaSparkContext.jarOfClass(JavaWordCount.class));

The snippets that follow assume the usual imports: org.apache.spark.api.java.*, org.apache.spark.api.java.function.*, scala.Tuple2, and java.util.Arrays. Next, we load the input file and split each line into individual words:

JavaRDD<String> lines = sc.textFile("/tmp/sample.txt");
JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
  public Iterable<String> call(String s) {
    return Arrays.asList(s.split(" "));
  }
});
Each word is then paired with the number one, using the scala.Tuple2 class described above:

// On Spark 1.0 and later, this method is called mapToPair
JavaPairRDD<String, Integer> ones = words.map(new PairFunction<String, String, Integer>() {
  @Override
  public Tuple2<String, Integer> call(String s) {
    return new Tuple2<String, Integer>(s, 1);
  }
});
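The word count itself is the final step: the pairs are summed with reduceByKey. The original excerpt stops before this step, so the following is a minimal sketch, assuming a counts variable and the org.apache.spark.api.java.function.Function2 interface:

JavaPairRDD<String, Integer> counts = ones.reduceByKey(
    new Function2<Integer, Integer, Integer>() {
      // Add together the partial counts accumulated for each word
      public Integer call(Integer a, Integer b) { return a + b; }
    });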
In PySpark, the same word count is considerably more concise, because lambdas take the place of the function classes:

tweets = sc.textFile("/tmp/sample.txt")
counts = tweets.flatMap(lambda tweet: tweet.split(' ')) \
               .map(lambda word: (word, 1)) \
               .reduceByKey(lambda m, n: m + n)
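Note that flatMap, map, and reduceByKey are all lazy transformations, so nothing is actually computed until an action such as counts.collect() or counts.saveAsTextFile(path) forces evaluation.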