Classification Clustering And Association Rules
Assignment – 6:
Classification, Clustering and Association Rules
Learning Objectives: Learn to apply Supervised (Classification) and Unsupervised Learning Methods (Segmentation / Association Rules) to business problems.
Datasets: (1) Loan Offer Sample Data.jmp; (2) Utilities.jmp; (3) Vegetable Stand.xlsx
Directions/ Submission Requirements:
- There are three parts in this assignment. Each Part is worth 30 points. The summary document (point 4 below) carries 10 points.
- Save all scripts in the Data Table; upload it in the Assignment section.
- Cut/paste JMP outputs into the working document. Submit / upload this in the assignment section. Penalty for non-submission of working document = 5 points
- (10 points) Provide a Summary Document with highlights of your answers. It should include the steps taken to solve each problem. Submit / upload this in the assignment section (and also submit it in Turnitin). Penalty for non-submission of summary document = 25%
A. Classification: (30 points) Dataset: Loan Offer Sample Data.jmp;
1. Your Task: Develop a model for “Loan Offer Acceptance” using the “classification tree” method (using JMP). For this, partition the Data (60-40) using Seed = 123.
- (5 points) Use all variables and the Validation column, then click “Go”. What is the optimum number of splits? Provide Fit Details (misclassification rates for Training and for Validation).
Answer the following for Number of Splits = 5. Provide Fit Details and the Leaf Report:
- (5 points) What is the model? Write the very first rule in simple English.
- (5 points) Which are the top two attributes [Hint: column contributions] in this model?
- (5 points) Which variables appear to be not relevant for classification?
- (5 points) What level of accuracy might be expected in practice when you use this model? [Hint: Look at Fit Details]
- (5 points) Assume Cutoff = 0.5. Will a client with the following profile accept the loan offer: Income = 100; Education = 1; Family = 3; CCavg = 2? [Hint: Save the Prediction Formula and add a new row with this data. Read the outcome.]
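For intuition about the cutoff and the misclassification rates reported in Fit Details, here is a minimal Python sketch (not JMP output). It shows how predicted probabilities become class labels at a cutoff of 0.5 and how a misclassification rate is computed. The function names and the example probabilities are hypothetical, and the sketch assumes a probability at or above the cutoff is classified as "accept".

```python
def classify(prob_accept, cutoff=0.5):
    """Return 1 (accept) if the predicted probability meets the cutoff, else 0."""
    return 1 if prob_accept >= cutoff else 0


def misclassification_rate(actual, predicted):
    """Share of rows whose predicted class differs from the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Hypothetical predicted probabilities and actual outcomes (not from the dataset).
probs = [0.92, 0.10, 0.55, 0.30, 0.48]
actual = [1, 0, 0, 0, 1]

predicted = [classify(p) for p in probs]
rate = misclassification_rate(actual, predicted)
print(predicted, rate)
```

In JMP, the saved Prediction Formula supplies the probability for the new row; the comparison against the cutoff is the same idea.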
B. Clustering / Segmentation: (30 points) Dataset: Utilities.jmp
2. Your Task: Segment the data for k = 3, 4, and 5. (Use only Cost, Demand, Sales, and Nuclear.)
- (5 points) Which is the optimum number of clusters?
Now, for k = 3, save the clusters and save the Cluster Formula in the data table. Then answer the following questions for k = 3:
- (5 points) Suitably name each cluster based on segment characteristics. Why did you choose these names?
- (5 points) Create parallel plots and show the means → Share your insights in a couple of sentences.
- (5 points) Create scatterplots → Look at the Sales vs. Cost scatterplot. Which cluster does not overlap with the others? What are the utilities in that cluster?
- (5 points) Which is the smallest cluster? What companies are in that cluster?
- (5 points) For the values Cost = 120, Demand = 8.7, Sales = 9500 and Nuclear = 25 → Which cluster would this row belong to? [Hint: Add a new row and look up the answer in the formula]
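The saved Cluster Formula assigns a row to the cluster whose center is closest. The Python sketch below illustrates that idea under stated assumptions: the centroids, means, and standard deviations are hypothetical (two variables only, not taken from Utilities.jmp), and it assumes columns are standardized before distances are measured, which in JMP depends on the options chosen in the K Means platform.

```python
import math


def standardize(row, means, stds):
    """Convert raw values to z-scores so large-valued columns do not dominate."""
    return tuple((x - m) / s for x, m, s in zip(row, means, stds))


def nearest_cluster(point, centroids):
    """Return the label of the centroid with the smallest Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


# Hypothetical values for illustration only.
means = (160.0, 8000.0)
stds = (20.0, 2000.0)
centroids = {1: (-1.0, 0.5), 2: (0.0, -1.0), 3: (1.5, 1.0)}

new_row = (150.0, 9000.0)
z = standardize(new_row, means, stds)
cluster = nearest_cluster(z, centroids)
print(z, cluster)
```

Without standardizing, a column measured in thousands (such as Sales) would drive almost all of the distance, which is why scaling is usually applied before k-means.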
C. Association Rules/ Affinity Analysis (30 points)
3. Scenario: A supermarket database has 10,000 point-of-sale transactions, of which 2,000 include both items A and B, and 800 of these also include item C. The total number of transactions that include C is 5,000.
Consider the association rule “If A and B are purchased, then C is also purchased.” Answer the following questions (15 points):
- (5 points) What is the support for this rule?
- (5 points) What is the confidence for this rule?
- (5 points) What is the lift for this rule?
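As a reminder of how the three measures relate, the Python sketch below computes them from transaction counts. It assumes the common convention that the support of a rule is the share of all transactions containing both the antecedent and the consequent; check this against the definition used in class. The function name and the counts are hypothetical and deliberately differ from the scenario above.

```python
def rule_measures(n_total, n_antecedent, n_both, n_consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    n_total       total number of transactions
    n_antecedent  transactions containing the antecedent
    n_both        transactions containing antecedent and consequent
    n_consequent  transactions containing the consequent
    """
    support = n_both / n_total
    confidence = n_both / n_antecedent
    # Benchmark confidence: how often the consequent occurs overall.
    benchmark = n_consequent / n_total
    lift = confidence / benchmark
    return support, confidence, lift


# Hypothetical counts for illustration only.
support, confidence, lift = rule_measures(
    n_total=500, n_antecedent=100, n_both=60, n_consequent=150
)
print(support, confidence, lift)
```

A lift above 1 suggests the antecedent makes the consequent more likely than its overall rate; a lift below 1 suggests the opposite.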
4. Scenario: A local farmer has set up a roadside vegetable stand and is offering the following items for sale: [Asparagus, Beans, Broccoli, Corn, Green peppers, Squash, and Tomatoes]. One by one, customers pull over, pick up a basket, and purchase various combinations of items. The transactions [T] and Items Purchased [D] are given in the file “Vegetable Stand.xlsx”. Consider the association rule, “If Squash is bought, then Beans are also bought”. Answer the following questions (15 points):
- (3 points) How many transactions are there where Squash was bought?
- (3 points) In how many of those transactions were Beans also bought?
- (3 points) What is the support for this association rule?
- (3 points) What is the confidence for this rule?
- (3 points) What is the lift for this rule?
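When the data arrive as one basket per transaction, the counts can be tallied directly. The Python sketch below does this for a hypothetical rule ("If Corn is bought, then Tomatoes are also bought") on made-up baskets that are not the contents of Vegetable Stand.xlsx. The function name is an assumption, and support is taken to mean the share of all transactions containing both sides of the rule.

```python
def basket_rule_metrics(baskets, antecedent, consequent):
    """Count matching baskets and return
    (n_antecedent, n_both, support, confidence, lift)."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(baskets)
    # A basket "contains" an itemset when the itemset is a subset of the basket.
    n_ante = sum(1 for b in baskets if antecedent <= b)
    n_cons = sum(1 for b in baskets if consequent <= b)
    n_both = sum(1 for b in baskets if (antecedent | consequent) <= b)
    support = n_both / n
    confidence = n_both / n_ante
    lift = confidence / (n_cons / n)
    return n_ante, n_both, support, confidence, lift


# Hypothetical baskets for illustration only.
baskets = [
    {"Corn", "Tomatoes", "Beans"},
    {"Corn", "Squash"},
    {"Tomatoes", "Broccoli"},
    {"Corn", "Tomatoes"},
    {"Asparagus", "Beans"},
]

n_ante, n_both, support, confidence, lift = basket_rule_metrics(
    baskets, {"Corn"}, {"Tomatoes"}
)
print(n_ante, n_both, support, confidence, lift)
```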