Copy the data files from the virtual Linux machine into HDFS using hadoop fs -put
1. In this assignment, you will work with part of the MovieLens
dataset. I selected two files for this assignment, u.data and u.item, which can
be found in movielens.zip.
The MovieLens dataset is a collection of movie ratings. It has been
widely used in industry and academia to experiment with
recommendation algorithms, and many publications use this
dataset to benchmark the performance of their algorithms.
For access to the full-sized MovieLens datasets, go to http://grouplens.org/datasets/movielens/
-----------------------------
-- Table description "u.data"
--
-- field_1 userid
-- field_2 movieid
-- field_3 rating
-- field_4 unixtime
-----------------------------
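As a quick check of the u.data layout, the Python sketch below parses one line into the four fields listed above and turns unixtime into a readable date. It assumes the fields are tab-separated, as in the standard ML-100k distribution of u.data; the sample line is made up, and the Rating and parse_rating names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Rating:
    userid: int
    movieid: int
    rating: int
    unixtime: int  # seconds since 1970-01-01 00:00 UTC

    @property
    def rated_at(self) -> datetime:
        """The rating time as a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.unixtime, tz=timezone.utc)


def parse_rating(line: str) -> Rating:
    # Assumption: the four fields are tab-separated, as in the standard ML-100k u.data.
    userid, movieid, rating, unixtime = (int(f) for f in line.rstrip("\n").split("\t"))
    return Rating(userid, movieid, rating, unixtime)


r = parse_rating("196\t242\t3\t881250949\n")
print(r, r.rated_at.isoformat())
```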
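-----------------------------
-- Table description "u.item"
--
-- Information about the items (movies). The file has 24 pipe-separated ("|") columns:
-- field_1 movieid
-- field_2 movie_title
-- field_3 release_date
-- field_4 video_release_date
-- field_5 imdb_url
-- field_6 to field_24 one 0/1 flag per genre (1 = the movie belongs to that genre):
--   unknown, Action, Adventure, Animation, Children's, Comedy, Crime,
--   Documentary, Drama, Fantasy, Film-Noir, Horror, Musical, Mystery,
--   Romance, Sci-Fi, Thriller, War, Western
-----------------------------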
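The genre flags are what make the sci-fi question answerable: a movie counts as sci-fi when its Sci-Fi flag is 1, and a movie can have several flags set. The sketch below assumes the standard ML-100k u.item layout (five descriptive columns followed by 19 genre flags, which puts Sci-Fi in the 21st column, index 20 counting from 0). The parse_item function and the sample line are hypothetical.

```python
# The 19 genre flags in column order (field_6 .. field_24), assuming the standard ML-100k order.
GENRES = ["unknown", "Action", "Adventure", "Animation", "Children's", "Comedy",
          "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
          "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western"]


def parse_item(line):
    """Return (movieid, title, genres) for one pipe-separated u.item line."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != 5 + len(GENRES):
        raise ValueError(f"expected 24 columns, got {len(fields)}")
    movieid, title = int(fields[0]), fields[1]
    # A flag of "1" means the movie belongs to that genre; several can be set at once.
    genres = [g for g, flag in zip(GENRES, fields[5:]) if flag == "1"]
    return movieid, title, genres


# Made-up sample line with only the Sci-Fi flag set.
flags = ["0"] * len(GENRES)
flags[GENRES.index("Sci-Fi")] = "1"
sample = "|".join(["9999", "Example Movie (1999)", "01-Jan-1999", "",
                   "http://example.com"] + flags)
print(parse_item(sample))
```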
7. Create one table for u.data and one table for u.item in Hive and load the data.
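Hive itself is not used in this sketch. In HiveQL you would normally declare the delimiter in CREATE TABLE with ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' for u.data (or '|' for u.item) and then run LOAD DATA INPATH on the files copied into HDFS. The runnable sketch below mirrors those two steps, create then bulk-load, with Python's built-in sqlite3 module as a stand-in so the table layout can be checked without a cluster. The table names u_data and u_item, the snake_case genre column names, and the sample rows are all assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical snake_case names for the 19 genre flags (field_6 .. field_24),
# assuming the standard ML-100k genre order.
GENRE_COLS = ["unknown", "action", "adventure", "animation", "childrens", "comedy",
              "crime", "documentary", "drama", "fantasy", "film_noir", "horror",
              "musical", "mystery", "romance", "sci_fi", "thriller", "war", "western"]

U_DATA_DDL = ("CREATE TABLE u_data "
              "(userid INTEGER, movieid INTEGER, rating INTEGER, unixtime INTEGER)")
# Column names are double-quoted so words like "action" are never read as SQL keywords.
U_ITEM_DDL = "CREATE TABLE u_item ({})".format(", ".join(
    ['"movieid" INTEGER', '"title" TEXT', '"release_date" TEXT',
     '"video_release_date" TEXT', '"imdb_url" TEXT']
    + [f'"{g}" INTEGER' for g in GENRE_COLS]))


def create_and_load(conn, udata_file, uitem_file):
    """Create both tables, then bulk-load them from open text files."""
    conn.execute(U_DATA_DDL)
    conn.execute(U_ITEM_DDL)
    # u.data is tab-separated and u.item pipe-separated (assumed ML-100k format).
    # QUOTE_NONE makes the reader treat '"' inside titles as an ordinary character.
    data_rows = csv.reader(udata_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    item_rows = csv.reader(uitem_file, delimiter="|", quoting=csv.QUOTE_NONE)
    conn.executemany("INSERT INTO u_data VALUES (?, ?, ?, ?)", data_rows)
    placeholders = ", ".join(["?"] * (5 + len(GENRE_COLS)))
    conn.executemany(f"INSERT INTO u_item VALUES ({placeholders})", item_rows)
    conn.commit()


# Tiny made-up sample standing in for the real files.
flags = ["0"] * len(GENRE_COLS)
flags[GENRE_COLS.index("sci_fi")] = "1"
udata = io.StringIO("1\t10\t4\t881250949\n2\t10\t5\t881250950\n")
uitem = io.StringIO("|".join(["10", "Example Movie (1999)", "01-Jan-1999", "",
                              "http://example.com"] + flags) + "\n")

conn = sqlite3.connect(":memory:")
create_and_load(conn, udata, uitem)
print(conn.execute("SELECT COUNT(*) FROM u_data").fetchone())
```

To load the real files instead, pass open file objects such as open("u.item", newline="", encoding="latin-1"); the latin-1 encoding is an assumption, since some titles in u.item may not be valid UTF-8.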
------------------------
-- Assignment Questions
------------------------
5. Find the highest-rated sci_fi movie. Explain how you define "highest rated".
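"Highest rated" needs a definition because a movie rated 5 by a single user would otherwise beat films with hundreds of ratings. The sketch below shows one possible definition, not the required one: the highest average rating among sci-fi movies with at least MIN_RATINGS ratings, with ties broken by the number of ratings. It runs on sqlite3 with trimmed-down, made-up tables; MIN_RATINGS = 2 is an arbitrary value picked for the toy data, and the JOIN / GROUP BY / HAVING / ORDER BY / LIMIT shape of the query should carry over to HiveQL.

```python
import sqlite3

# Hypothetical threshold: ignore movies with too few ratings to trust their average.
MIN_RATINGS = 2

QUERY = """
SELECT i.movieid, i.title, AVG(d.rating) AS avg_rating, COUNT(*) AS n_ratings
FROM u_item AS i
JOIN u_data AS d ON d.movieid = i.movieid
WHERE i.sci_fi = 1
GROUP BY i.movieid, i.title
HAVING COUNT(*) >= ?
ORDER BY avg_rating DESC, n_ratings DESC
LIMIT 1
"""

conn = sqlite3.connect(":memory:")
# Trimmed-down tables with only the columns this query needs.
conn.execute("CREATE TABLE u_item (movieid INTEGER, title TEXT, sci_fi INTEGER)")
conn.execute("CREATE TABLE u_data (userid INTEGER, movieid INTEGER, rating INTEGER, unixtime INTEGER)")
conn.executemany("INSERT INTO u_item VALUES (?, ?, ?)", [
    (1, "Sci-Fi A", 1),  # a single 5-star rating: filtered out by HAVING
    (2, "Sci-Fi B", 1),
    (3, "Drama C", 0),   # highest average overall, but not sci-fi
])
# Rows are (userid, movieid, rating); unixtime is set to 0 for the toy data.
conn.executemany("INSERT INTO u_data VALUES (?, ?, ?, 0)", [
    (1, 1, 5),
    (1, 2, 4), (2, 2, 5), (3, 2, 4),
    (1, 3, 5), (2, 3, 5),
])
best = conn.execute(QUERY, (MIN_RATINGS,)).fetchone()
print(best)
```

A weighted or Bayesian average that pulls sparsely rated movies toward the overall mean is another common way to define "highest rated"; whichever you choose, state it in your answer.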
BONUS: Are there any movies with no ratings? (Hint: outer join and IS NULL)
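Following the hint, a LEFT OUTER JOIN keeps every movie from u_item; movies with no matching row in u_data get NULL in all u_data columns, so filtering on IS NULL leaves only the unrated movies. This is a minimal sqlite3 sketch with made-up rows, and the table and column names are assumptions.

```python
import sqlite3

QUERY = """
SELECT i.movieid, i.title
FROM u_item AS i
LEFT OUTER JOIN u_data AS d ON d.movieid = i.movieid
WHERE d.movieid IS NULL   -- no rating row matched this movie
ORDER BY i.movieid
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE u_item (movieid INTEGER, title TEXT)")
conn.execute("CREATE TABLE u_data (userid INTEGER, movieid INTEGER, rating INTEGER, unixtime INTEGER)")
conn.executemany("INSERT INTO u_item VALUES (?, ?)",
                 [(1, "Rated movie"), (2, "Unrated movie"), (3, "Another unrated movie")])
# Only movie 1 receives a rating.
conn.execute("INSERT INTO u_data VALUES (10, 1, 4, 0)")

unrated = conn.execute(QUERY).fetchall()
print(unrated)
```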
Include a screenshot of the result.
2. Submit using Assessment -> Assignments -> Hive Assignment 1