The positive list and negative list words exist the keywords sheet
CS 2034 Data Analytics: Principles and Tools
Assignment 3
1.Data Munging
• | Download the tweetsSample.csv file (from OWL) that contains the tweets, and store it as an |
---|---|
xlsm file (YourUserID_Assign3.xlsm)
If you fail to save the file as .xlsm some of your work will be lost (e.g., functions/calculations)• Name the worksheet rawData
• First use Excel’s remove duplicate feature to remove the duplicate tweets based on column A (text). This should reduce the tweets from 5000 to about 3600.
2. Sentiment Analysis
• Copy the data in the keyword.csv file (downloaded
from OWL) and add it as a new sheet, with the name
keywords in your workbook.
• Next, create a VBA function that determines the sentiment of each tweet based on its contents • The function header is:
Function sentimentCalc(tweet1 As String) As Integer
• This function should check each word in the tweet and if the word exists as one of the keywords in the positive list or negative list it should impact the overall sentiment value. The positive list and negative list words exist in the keywords sheet. Access the keywords as ranges within your VBA code. The case of the word is inconsequential. For instance, happy, HAPPY, or happy are all treated as positive words regardless of their case.• Use this function in your processedData worksheet to calculate the sentiment value for each tweet (store in Column E)
• Create a function with the header:
Function sentimentCategory(sentVal As Integer) As String
oif the sentiment value is greater than 0, the Sentiment Category is “Positive” oIf the sentiment value is less than 0, its category is “Negative”.oIf the sentiment value is equal to 0, its category is “Neutral”.
Another alternative is to use the AVERAGEIF(S)/COUNTIF(S) function and create a separate column with
only one instance of each topic/group.
topic (There are over 39 topics in the dataset)
• TASK B: Calculate the overall sentiment value for each group.
• TASK E: Determine the number of Positive, Negative and Neutral Tweets for each topic
Figure 1: Layout of AnalyzedData worksheet
1.Show the hierarchical structure of the tweet topics and groups
2.Show the hierarchical structure and volume of tweets based on the topics and groups
3.Show the rank of groups based on the overall group sentiment value (sorted by rank)
4.Show the rank of topics based on the overall topic sentiment value (sorted by rank)
5.Show the flow of tweets (between group and topic) based on the number of positive number of tweets.
You don’t need to use all the 3 visualization applications. Just determine which ones have the visualizations that are relevant to your tasks.
and search the web.
• Your infographic should:
oExplain the data set, and the images you included from part 4. oHave at least 3 facts.
oThe second sheet will be the imported keyword data (keywords)
oThe third sheet which uses the sentimentCalc and sentimentCategory functions (processedData)
oThe fourth sheet which provides the summary information (analyzedData)•YourUserID_Assign3.pdf (PART 4)
oA PDF that includes all five images.•YourUserID_Assign3Infographic.png, YourUserID_Assign3Infographic.jpg or