implementing your own serialization routines for your data types (e.g., using the java.io.Externalizable interface for Java Serialization, or using the `__reduce__()` method to define custom serialization for Python’s pickle library).
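As a rough illustration of the Python option, the sketch below defines `__reduce__()` on a small class so that pickle writes only the fields needed to rebuild it. The class name `CallSignRecord`, its fields, and the `_cache` attribute are hypothetical and are not part of the original example code.

```python
import pickle


class CallSignRecord:
    """Hypothetical record type; `_cache` stands in for state not worth serializing."""

    def __init__(self, callsign, contact_count):
        self.callsign = callsign
        self.contact_count = contact_count
        self._cache = {}  # derived state that can be rebuilt on demand

    def __reduce__(self):
        # Tell pickle how to rebuild the object: call
        # CallSignRecord(callsign, contact_count) when loading.
        # Only these two fields are written; `_cache` is skipped.
        return (CallSignRecord, (self.callsign, self.contact_count))


original = CallSignRecord("W8PAL", 3)
original._cache["lookup"] = "expensive result"

# Round-trip through pickle; the cache comes back empty.
restored = pickle.loads(pickle.dumps(original))
print(restored.callsign, restored.contact_count, restored._cache)
```

Whether this helps in practice depends on how much of the object is derived state; for small, flat records the default pickling behavior is often adequate.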
Working on a Per-Partition Basis

    def fetchCallSigns(input):
        """Fetch call signs"""
        return input.mapPartitions(lambda callSigns: processCallSigns(callSigns))

    contactsContactList = fetchCallSigns(validSigns)


    // Use mapPartitions to reuse setup work.
    JavaPairRDD<String, CallLog[]> contactsContactLists =
      validCallSigns.mapPartitionsToPair(
        new PairFlatMapFunction<Iterator<String>, String, CallLog[]>() {
          public Iterable<Tuple2<String, CallLog[]>> call(Iterator<String> input) {
            // List for our results.

| | | Function signature on RDD[T] |
|---|---|---|
| Iterator of the elements in that partition | return elements | f: (Iterator[T]) → |
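To make the signature in the table concrete, here is a minimal sketch of a function with the per-partition shape: it receives an iterator over one partition's elements and returns an iterator of results. The function name `sum_and_count` and the sample data are hypothetical, and plain Python lists stand in for partitions so the sketch runs without a Spark installation.

```python
def sum_and_count(nums):
    """Called once per partition with an iterator; returns an iterator."""
    total = 0
    count = 0
    for num in nums:
        total += num
        count += 1
    # One (sum, count) pair per partition rather than one result per element.
    return iter([(total, count)])


# Two plain lists stand in for two partitions of an RDD (an assumption
# made for illustration only).
partitions = [[1, 2, 3], [4, 5]]
partials = [pair for part in partitions for pair in sum_and_count(iter(part))]

# Combine the per-partition results into an overall mean.
total = sum(s for s, _ in partials)
count = sum(c for _, c in partials)
print(partials, total / count)
```

With a real RDD, the same function could presumably be passed directly, as in `rdd.mapPartitions(sum_and_count)`, and the per-partition pairs combined with a reduce step.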