Copy the files from the virtual Linux machine into HDFS (hadoop fs -put)

1. In this assignment, you will work with part of the MovieLens dataset. I selected two files for this assignment, which can be found in movielens.zip
The MovieLens dataset is a collection of movie ratings that has been widely used in industry and academia for experimenting with recommendation algorithms. Many publications use it to benchmark the performance of their algorithms.

For access to the full-sized MovieLens data, go to http://grouplens.org/datasets/movielens/

-----------------------------
-- Table description "u.data"
--
-- field_1 userid
-- field_2 movieid
-- field_3 rating
-- field_4 unixtime
-----------------------------
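To make the field layout concrete, here is a minimal Python sketch that parses one "u.data" line into the four fields listed above. It assumes the file is tab-separated and that unixtime is seconds since the epoch in UTC. The Rating class, the parse_rating helper and the sample line are illustrative, not part of the assignment files.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Rating:
    userid: int
    movieid: int
    rating: int
    unixtime: int

    @property
    def rated_at(self) -> datetime:
        # Interpret unixtime as seconds since the epoch, in UTC (assumption).
        return datetime.fromtimestamp(self.unixtime, tz=timezone.utc)


def parse_rating(line: str, sep: str = "\t") -> Rating:
    # Assumption: fields are tab-separated and appear in the documented order.
    userid, movieid, rating, unixtime = line.rstrip("\n").split(sep)
    return Rating(int(userid), int(movieid), int(rating), int(unixtime))


# Illustrative sample line, not guaranteed to match the assignment files.
sample = parse_rating("196\t242\t3\t881250949\n")
print(sample, sample.rated_at.isoformat())
```

If the split fails with a "not enough values to unpack" error, the delimiter assumption is probably wrong for your copy of the file.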

--> u.item -- Information about the items (movies). The file has 24 pipe ("|") separated columns. This is a list of:

7. Create one table for u.data and one table for u.item in Hive and load the data.
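As a local dry run of the "create two tables, load two delimited files" step, the following Python sketch uses sqlite3 as a stand-in for Hive. It is not Hive syntax: in Hive the delimiter is normally declared on the table itself (ROW FORMAT DELIMITED FIELDS TERMINATED BY ...) and the file is loaded with LOAD DATA. The table names u_data and u_item, the two-column version of u.item and all rows are assumptions made up for illustration.

```python
import csv
import io
import sqlite3

# Tiny in-memory stand-ins for the two files; every row here is made up.
U_DATA = "1\t10\t5\t881250949\n2\t10\t4\t891717742\n"
# Only 2 of the 24 u.item columns are modelled in this sketch.
U_ITEM = "10|Example Movie A (1995)\n11|Example Movie B (1996)\n"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE u_data (userid INTEGER, movieid INTEGER, rating INTEGER, unixtime INTEGER)"
)
conn.execute("CREATE TABLE u_item (movieid INTEGER, title TEXT)")

# u.data is assumed tab-separated; u.item is pipe-separated per the description.
conn.executemany(
    "INSERT INTO u_data VALUES (?, ?, ?, ?)",
    csv.reader(io.StringIO(U_DATA), delimiter="\t"),
)
conn.executemany(
    "INSERT INTO u_item VALUES (?, ?)",
    csv.reader(io.StringIO(U_ITEM), delimiter="|", quoting=csv.QUOTE_NONE),
)

# Sanity check: row counts should match the line counts of the inputs.
n_ratings = conn.execute("SELECT COUNT(*) FROM u_data").fetchone()[0]
n_items = conn.execute("SELECT COUNT(*) FROM u_item").fetchone()[0]
print(n_ratings, n_items)
```

Comparing the table row count with the line count of the source file is a cheap way to confirm the load worked, and the same check applies in Hive.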

------------------------
-- Assignment Questions
------------------------

5. Find the highest rated sci_fi movie. Explain how you define "highest rating".
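One possible definition of "highest rated", shown here only as a sketch: the highest average rating among sci-fi movies that have at least a minimum number of ratings, so that a movie with a single 5-star vote does not win by default. The example runs on sqlite3 with made-up rows. The threshold MIN_RATINGS, the table names and the sci_fi flag column are assumptions, and other definitions (for example, the most 5-star ratings) are equally defensible if you explain them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE u_data (userid INTEGER, movieid INTEGER, rating INTEGER, unixtime INTEGER);
CREATE TABLE u_item (movieid INTEGER, title TEXT, sci_fi INTEGER);
INSERT INTO u_item VALUES (1, 'Sci-Fi A', 1), (2, 'Sci-Fi B', 1), (3, 'Drama C', 0);
INSERT INTO u_data VALUES
  (1, 1, 5, 0),
  (1, 2, 4, 0), (2, 2, 5, 0), (3, 2, 4, 0),
  (1, 3, 5, 0), (2, 3, 5, 0), (3, 3, 5, 0);
""")

# Hypothetical cutoff; pick a value that suits the real data and justify it.
MIN_RATINGS = 3

query = """
SELECT i.title, AVG(d.rating) AS avg_rating, COUNT(*) AS n_ratings
FROM u_data d
JOIN u_item i ON i.movieid = d.movieid
WHERE i.sci_fi = 1
GROUP BY i.movieid, i.title
HAVING COUNT(*) >= ?
ORDER BY avg_rating DESC, n_ratings DESC
LIMIT 1
"""
best = conn.execute(query, (MIN_RATINGS,)).fetchone()
print(best)
```

In this toy data 'Sci-Fi A' has a perfect average but only one rating, so the threshold excludes it and 'Sci-Fi B' is returned instead.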

BONUS: Are there any movies with no ratings? (Hint: outer join and IS NULL)
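The hint describes an anti-join: keep every movie, attach its ratings where they exist, and then select the rows where nothing matched. Below is a small sketch of that pattern on sqlite3 with made-up rows. The table names and the two-column u_item are assumptions for illustration, and whether the real dataset contains any unrated movies is left for you to find out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE u_data (userid INTEGER, movieid INTEGER, rating INTEGER, unixtime INTEGER);
CREATE TABLE u_item (movieid INTEGER, title TEXT);
INSERT INTO u_item VALUES (1, 'Rated Movie'), (2, 'Unrated Movie');
INSERT INTO u_data VALUES (7, 1, 4, 0);
""")

# The left outer join keeps all movies; movies without a matching rating row
# get NULLs in the u_data columns, which the WHERE clause then selects.
unrated = conn.execute("""
SELECT i.movieid, i.title
FROM u_item i
LEFT OUTER JOIN u_data d ON d.movieid = i.movieid
WHERE d.movieid IS NULL
ORDER BY i.movieid
""").fetchall()
print(unrated)
```

The IS NULL test is applied to a column from the ratings side of the join, because that is the side that is empty when no match exists.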

Screenshot of the result

2. Submit using Assessment -> Assignments -> Hive Assignment 1