Making Development Easier
try:
    # Convert the document frequency and the total number of documents to floats
    df = float(df)
    num_doc = float(num_doc)
except ValueError:
    # Skip records whose numeric fields cannot be parsed
    logger.warning("Invalid record %s" % line)

/usr/bin/hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar \
  -input tweets.json \
  -output tweets.cnt \
  -mapper /bin/cat \
  -reducer /usr/bin/wc

The mapper source code can be found at https://github.com/learninghadoop2/book-examples/blob/master/ch9/streaming/tf-idf/python/tf-idf.py.
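To illustrate how a Python script like the tf-idf fragment above fits the Hadoop Streaming model (records in on stdin, results out on stdout, malformed input logged and skipped), here is a minimal sketch. It is not the book's tf-idf.py: the tab-separated record layout (term, tf, df, num_doc) and the names process, tf_idf and main are hypothetical, and it assumes the classic weighting tf × log(num_doc / df).

```python
import logging
import math
import sys
from typing import Iterable, Iterator, TextIO

# Warnings go to stderr, which Hadoop Streaming keeps in the task logs.
logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("tfidf_sketch")


def tf_idf(tf: float, df: float, num_doc: float) -> float:
    """Classic tf-idf weight: term frequency times log(N / document frequency)."""
    return tf * math.log(num_doc / df)


def process(lines: Iterable[str]) -> Iterator[str]:
    """Turn records in a HYPOTHETICAL 'term<TAB>tf<TAB>df<TAB>num_doc' layout
    (an assumption, not tf-idf.py's real format) into 'term<TAB>tf-idf' lines,
    logging and skipping malformed records."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            logger.warning("Invalid record %s", line)
            continue
        term = fields[0]
        try:
            # Same idea as the fragment above: numeric fields must parse as floats
            tf, df, num_doc = (float(f) for f in fields[1:])
        except ValueError:
            logger.warning("Invalid record %s", line)
            continue
        if df <= 0 or num_doc <= 0:
            # log(num_doc / df) is undefined for non-positive counts
            logger.warning("Invalid record %s", line)
            continue
        yield "%s\t%f" % (term, tf_idf(tf, df, num_doc))


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Streaming entry point: read records from stdin, write results to stdout."""
    for out in process(stdin):
        stdout.write(out + "\n")
```

When shipped as a Streaming mapper or reducer, the script would call main() from an `if __name__ == "__main__":` guard. Keeping the parsing in a generator that takes any iterable of lines makes the logic easy to test locally without a cluster.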
On Cloudera's QuickStart VM, Kite JARs can be found at /opt/cloudera/parcels/CDH/lib/kite/.
Kite Data is organized in a number of subprojects, some of which we'll describe in the following sections.
Implementations of the DatasetReader<E> interface are used to read data from an underlying storage system and produce deserialized entities of type E. A dataset's newReader() method returns an appropriate implementation for that dataset:
public interface DatasetReader<E> extends Iterator<E>, Iterable<E>, Closeable {
void open();