Write a Python program assignment

The Twitter data file is in text format (one tweet per line). However, each tweet is in JSON format, which you will need to convert explicitly to a Python data object. See the tips below for some pointers.

1) Write a Python program that computes the sentiment score of each tweet in the given set, based on the sentiment scores of the terms in the tweet's message data. The sentiment of a tweet is equivalent to the sum of the sentiment scores for each term in the tweet.
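As a minimal sketch of the summation described above, assuming a dictionary that maps terms to scores is already available: the function name tweet_sentiment and the sample scores are hypothetical, and splitting on whitespace is a simplification (it does not strip punctuation or match multi-word phrases).

```python
def tweet_sentiment(text, scores):
    """Sum the scores of the whitespace-separated terms in text.

    Terms missing from the scores dictionary contribute 0.
    """
    total = 0
    for term in text.lower().split():
        total += scores.get(term, 0)
    return total


# Hypothetical scores, for illustration only
example_scores = {"good": 3, "love": 3, "bad": -3}

print(tweet_sentiment("I love this good coffee", example_scores))
```

A more careful solution would probably also strip punctuation before looking up each term.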

The file 'Sentiment Data' contains a list of pre-computed sentiment scores. Each line in the file contains a word or phrase followed by a sentiment score. Each word or phrase found in a tweet but not in 'Sentiment Data' should be given a sentiment score of 0. See the file 'Sentiment Data - README' for more information.

To use the data in the sentiment file, you may find it useful to build a dictionary. Note that the sentiment file format is tab-delimited, meaning that the term and the score are separated by a tab character. A tab character can be written as "\t".
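One way to build such a dictionary is sketched below. The function name load_sentiment_scores and the sample lines are made up for illustration, and the sketch assumes every score can be parsed as a number.

```python
import io


def load_sentiment_scores(lines):
    """Build a {term: score} dictionary from tab-delimited lines."""
    scores = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        # Split on the last tab so the term itself may contain spaces
        term, score = line.rsplit("\t", 1)
        scores[term] = float(score)
    return scores


# Hypothetical sample standing in for the real sentiment file
sample = io.StringIO(u"abandon\t-2\ncan't stand\t-3\nhappy\t3\n")
scores = load_sentiment_scores(sample)
print(scores["happy"])
```

With the real data, an open file object can be passed in place of the sample, since iterating over a file yields one line at a time.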

2) Write a Python program that computes the name of the happiest state as a string. (Build on your solution to 1 above)
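The assignment does not define "happiest", so the sketch below assumes it means the state with the highest average tweet score; other definitions (for example, the highest total) are equally possible. The function name happiest_state and the sample pairs are hypothetical.

```python
from collections import defaultdict


def happiest_state(scored_tweets):
    """Return the state with the highest mean score, or None if there is no data.

    scored_tweets is an iterable of (state, score) pairs; pairs whose
    state is None are ignored.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, score in scored_tweets:
        if state is None:
            continue
        totals[state] += score
        counts[state] += 1
    if not counts:
        return None
    return max(counts, key=lambda s: totals[s] / counts[s])


# Made-up (state, score) pairs for illustration
pairs = [("WA", 3), ("WA", 1), ("TX", 5), ("TX", -4), (None, 10)]
print(happiest_state(pairs))
```

Averaging rather than summing keeps states with many tweets from winning on volume alone.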

There are several objects within the tweet that you can use to determine its origin, for example coordinates, place, user, and country. You are free to develop your own strategy for determining the state that each tweet originates from.
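One possible strategy, sketched under assumptions: it relies only on the place object and assumes that place carries a country_code field and a full_name of the form "City, ST". Tweets that do not fit this pattern are simply left unassigned. The function name state_from_place is hypothetical.

```python
US_STATE_CODES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY".split()
)


def state_from_place(tweet):
    """Return a two-letter state code, or None if it cannot be determined."""
    place = tweet.get("place") or {}  # place may be missing or null
    if place.get("country_code") != "US":
        return None
    parts = (place.get("full_name") or "").split(", ")
    if len(parts) == 2 and parts[1] in US_STATE_CODES:
        return parts[1]
    return None


# Made-up tweet fragment for illustration
print(state_from_place({"place": {"country_code": "US", "full_name": "Seattle, WA"}}))
```

Falling back to coordinates or the user's profile location would assign more tweets, at the cost of more code.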

Limit the tweets you analyze to those in the United States. Note: Not every tweet dictionary will have a text key -- real data is dirty. Be prepared to debug, and feel free to throw out tweets that your code can't handle (for example, non-English tweets) to get something working.
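A defensive filter along these lines might look like the sketch below. The function name usable_tweets and the sample lines are hypothetical, and the check on a lang field assumes that tweets carry one; tweets without it are treated as English here, which is a guess.

```python
import json


def usable_tweets(lines):
    """Yield parsed tweets that have a text key, skipping anything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except ValueError:
            continue  # not valid JSON: throw it out
        if not isinstance(tweet, dict) or "text" not in tweet:
            continue  # no message text to score
        if tweet.get("lang", "en") != "en":
            continue  # assumed language field; skip non-English tweets
        yield tweet


# Made-up input lines for illustration
sample_lines = [
    '{"text": "great day", "lang": "en"}',
    '{"delete": {"status": {"id": 1}}}',
    'not json at all',
    '{"text": "buen dia", "lang": "es"}',
]
kept = list(usable_tweets(sample_lines))
print(len(kept))
```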

TIPS

Unicode strings
===============

Strings in the Twitter data prefixed with the letter "u" are Unicode strings. For example: u"This is a string"

Unicode is a standard for representing a much larger variety of characters beyond the Roman alphabet (Greek, Russian, mathematical symbols, logograms from non-phonetic writing systems such as kanji, etc.).

In most circumstances, you will be able to use a unicode object just like a string. If you encounter an error involving printing unicode, you can use the encode method to properly print the international characters, like this:

unicode_string = u"aaaàçççñññ"
encoded_string = unicode_string.encode('utf-8')
print encoded_string

JSON
====

The format of a tweet is JSON, which stands for JavaScript Object Notation. It is a simple format for representing nested structures of data: lists of lists of dictionaries of lists of ... you get the idea.

You can use the json package that comes with the Python standard library. Using this library, the JSON data is parsed and converted to a Python dictionary representing the entire result set. The "results" key of this dictionary holds the actual tweets; each tweet is itself another dictionary.

To use it, import the package:

import json

and call the json.loads() function to convert a JSON string into a Python data structure.
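For instance, assuming one tweet per line as described at the top of the assignment, a single line could be parsed as in the sketch below. The sample line is made up, and real tweets contain many more fields.

```python
import json

# A made-up line standing in for one line of the tweet file
line = '{"text": "Hello from Seattle", "user": {"screen_name": "example_user"}, "place": null}'

tweet = json.loads(line)  # JSON object -> Python dictionary

print(tweet["text"])                 # top-level field
print(tweet["user"]["screen_name"])  # nested dictionary
print(tweet["place"] is None)        # JSON null becomes Python None
```

Nested JSON objects become nested dictionaries, so fields are reached by chaining keys.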