It is not possible to override the default SparkContext class, nor to create a new one inside a running Spark shell. It is, however, possible to choose which master the context connects to by setting the MASTER environment variable.
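For example, assuming the spark-shell launch script that ships with the distribution, the shell can be started against a particular master like this:

MASTER=local[4] ./spark-shell            # local mode with four worker threads
MASTER=spark://host:7077 ./spark-shell   # connect to a standalone cluster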
There are a few key differences between the Java and Scala APIs:
• Java (before Java 8) does not support anonymous functions, so operations are passed as instances of function classes such as Function, FlatMapFunction, and PairFunction.
• To preserve type safety, the Java API provides specialized RDD classes, such as those that handle key pairs and doubles: JavaPairRDD and JavaDoubleRDD (a short sketch follows this list).
• Key-value pairs, written simply as (key, value) in Scala, are represented in Java by the scala.Tuple2 class.
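As a concrete illustration of the specialized classes, here is a minimal, self-contained sketch; the class name, sample data, and the Spark 1.x-style mapToDouble call are our own choices for illustration, not code from the word count example:

import java.util.Arrays;
import org.apache.spark.api.java.JavaDoubleRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.DoubleFunction;

public class DoubleRDDExample {
  public static void main(String[] args) {
    // A local context, for illustration only
    JavaSparkContext sc = new JavaSparkContext("local", "DoubleRDDExample");
    JavaRDD<String> words = sc.parallelize(Arrays.asList("spark", "handles", "doubles"));
    // mapToDouble returns a JavaDoubleRDD, which adds numeric helpers
    // such as sum(), mean(), and stats()
    JavaDoubleRDD lengths = words.mapToDouble(new DoubleFunction<String>() {
      public double call(String s) { return s.length(); }
    });
    System.out.println(lengths.mean()); // average word length
    sc.stop();
  }
}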
First of all, we create a context using the JavaSparkContext class:
JavaSparkContext sc = new JavaSparkContext(master, "JavaWordCount",
    System.getenv("SPARK_HOME"), JavaSparkContext.jarOfClass(JavaWordCount.class));

The snippets that follow assume the usual imports: org.apache.spark.api.java.*, org.apache.spark.api.java.function.*, scala.Tuple2, and java.util.Arrays. Next, we load the input file and split each line into individual words:

JavaRDD<String> lines = sc.textFile("/tmp/sample.txt");
JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
  public Iterable<String> call(String s) {
    return Arrays.asList(s.split(" "));
  }
});
Each word is then paired with the number one, using the scala.Tuple2 class described above:

// On Spark 1.0 and later, this method is called mapToPair
JavaPairRDD<String, Integer> ones = words.map(new PairFunction<String, String, Integer>() {
  @Override
  public Tuple2<String, Integer> call(String s) {
    return new Tuple2<String, Integer>(s, 1);
  }
});
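The word count itself is the final step: the pairs are summed with reduceByKey. The original excerpt stops before this step, so the following is a minimal sketch, assuming a counts variable and the org.apache.spark.api.java.function.Function2 interface:

JavaPairRDD<String, Integer> counts = ones.reduceByKey(
    new Function2<Integer, Integer, Integer>() {
      // Add together the partial counts accumulated for each word
      public Integer call(Integer a, Integer b) { return a + b; }
    });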
In PySpark, the same word count is considerably more concise, because lambdas take the place of the function classes:

tweets = sc.textFile("/tmp/sample.txt")
counts = tweets.flatMap(lambda tweet: tweet.split(' ')) \
               .map(lambda word: (word, 1)) \
               .reduceByKey(lambda m, n: m + n)
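Note that flatMap, map, and reduceByKey are all lazy transformations, so nothing is actually computed until an action such as counts.collect() or counts.saveAsTextFile(path) forces evaluation.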