Classification Clustering And Association Rules

Assignment – 6:

Classification, Clustering and Association Rules

Learning Objectives: Learn to apply Supervised (Classification) and Unsupervised Learning Methods (Segmentation / Association Rules) to business problems.

Datasets: (1) Loan Offer Sample Data.jmp; (2) Utilities.jmp; (3) Vegetable Stand.xlsx

Directions/ Submission Requirements:

  1. There are three parts in this assignment. Each Part is worth 30 points. The summary document (point 4 below) carries 10 points.
  2. Save all scripts in the Data Table; upload it in the Assignment section.
  3. Cut/paste JMP outputs into the working document. Submit / upload this in the assignment section. Penalty for non-submission of working document = 5 points.
  4. (10 points) Provide a Summary Document: provide highlights of the answers. It should include the steps taken to solve each problem. Submit / upload this in the assignment section, and also submit it in Turnitin. Penalty for non-submission of summary document = 25%.

A. Classification: (30 points) Dataset: Loan Offer Sample Data.jmp

1. Your Task: Develop a model for “Loan Offer Acceptance” using the “classification tree” method (using JMP). For this, partition the Data (60-40) using Seed = 123.

  1. (5 points) Use all variables and the Validation column. Click “Go”. What is the optimum number of splits? Provide Fit Details (misclassification rates for training and for validation). Answer the following for Number of Splits = 5, and provide Fit Details and the Leaf Report:
  2. (5 points) What is the model? Write the very first rule in simple English.
  3. (5 points) Which are the top two attributes [Hint: column contributions] in this model?
  4. (5 points) Which variables appear not to be relevant for classification?
  5. (5 points) What level of accuracy might be expected in practice when you use this model? [Hint: Look at Fit Details]
  6. (5 points) Assume Cutoff = 0.5. Will a client with the following profile accept the loan offer: Income = 100; Education = 1; Family = 3; CCavg = 2? [Hint: Save the Prediction Formula and add a new row with this data. Read the outcome.]
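A saved prediction formula from a classification tree behaves like a set of nested if/else rules that return the acceptance probability of the leaf a row falls into; the cutoff then turns that probability into a class. The Python sketch below only illustrates that idea. The function names, split variables, thresholds, and leaf probabilities are all invented for illustration and are not the tree JMP will produce for this dataset.

```python
def leaf_probability(income, education, family, ccavg):
    """Return Prob(accept) for a HYPOTHETICAL tree (invented splits and leaves)."""
    if income < 110:
        if ccavg < 3:
            return 0.02  # hypothetical leaf probability
        return 0.30      # hypothetical leaf probability
    if education >= 2:
        return 0.95      # hypothetical leaf probability
    if family >= 3:
        return 0.90      # hypothetical leaf probability
    return 0.05          # hypothetical leaf probability


def classify(prob, cutoff=0.5):
    """Turn a leaf probability into a class label using the cutoff."""
    return "Accept" if prob >= cutoff else "Reject"


# Hypothetical client profiles (not the profile asked about in the assignment).
p_high = leaf_probability(income=150, education=1, family=4, ccavg=1.5)
p_low = leaf_probability(income=60, education=1, family=2, ccavg=1.0)
print(p_high, classify(p_high))
print(p_low, classify(p_low))
```

Adding a new row in JMP does the same thing: the saved formula evaluates the rules for that row and the predicted class follows from the cutoff.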

B. Clustering / Segmentation: (30 points) Dataset: Utilities.jmp

  1. Your Task: Segment the data for k = 3, 4, and 5. (Use only Cost, Demand, Sales, and Nuclear.)
  2. (5 points) What is the optimum number of clusters?

Now, for k = 3, save the clusters and save the Cluster Formula in the data table. Then answer the following questions for k = 3:

  1. (5 points) Suitably name each cluster based on segment characteristics. Why did you choose these names?
  2. (5 points) Create parallel plots and show Mean → Share your insights in a couple of sentences.
  3. (5 points) Create Scatterplots → Look at the Sales vs. Cost scatterplot. Which cluster does not overlap with the others? What are the utilities in that cluster?
  4. (5 points) Which is the smallest cluster? What companies are in that cluster?
  5. (5 points) For the values Cost = 120, Demand = 8.7, Sales = 9500 and Nuclear = 25 → Which cluster would this belong to? [Hint: Add a new row and look up the answer in the formula]
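For k-means, a saved cluster formula in effect assigns a row to the cluster whose center is closest. The Python sketch below illustrates that assignment step, assuming the four variables were standardized before clustering. The means, standard deviations, cluster centers, and the new row are all hypothetical values made up for illustration; they are not taken from Utilities.jmp.

```python
import math

# Variable order: Cost, Demand, Sales, Nuclear.
# HYPOTHETICAL column means and standard deviations (not from Utilities.jmp).
MEANS = [150.0, 3.0, 9000.0, 12.0]
STDS = [40.0, 3.0, 3500.0, 17.0]

# HYPOTHETICAL cluster centers, expressed in standardized units.
CENTROIDS = {
    1: [-0.5, 0.2, -0.3, 1.2],
    2: [0.8, -0.4, -0.6, -0.7],
    3: [-0.6, 0.5, 1.5, -0.6],
}


def standardize(row, means, stds):
    """Convert raw values to z-scores, column by column."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def nearest_cluster(point, centroids):
    """Return the label of the centroid with the smallest Euclidean distance."""
    distances = {label: math.dist(point, c) for label, c in centroids.items()}
    return min(distances, key=distances.get)


# A hypothetical new row (not the values asked about in the assignment).
new_row = [200.0, 1.0, 7000.0, 0.0]
z = standardize(new_row, MEANS, STDS)
print(nearest_cluster(z, CENTROIDS))
```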

C. Association Rules/ Affinity Analysis (30 points)

3. Scenario: A supermarket database has 10,000 point-of-sale transactions, of which 2,000 include both items A and B, and 800 of these also include item C. The total number of transactions that include C is 5,000.

Consider the association rule “If A and B are purchased, then C is also purchased.” Answer the following questions (15 points):

  1. (5 points) What is the support for this rule?
  2. (5 points) What is the confidence for this rule?
  3. (5 points) What is the lift for this rule?
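As a reminder of how the three measures relate, the Python sketch below computes them from transaction counts. It assumes support is expressed as a fraction of all transactions (some texts report the raw count instead), confidence as the share of antecedent transactions that also contain the consequent, and lift as confidence divided by the consequent's overall share. The function name and the counts are hypothetical and deliberately differ from the scenario above.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    n_total:      all transactions
    n_antecedent: transactions containing the antecedent items
    n_consequent: transactions containing the consequent item
    n_both:       transactions containing antecedent AND consequent
    """
    support = n_both / n_total              # fraction of all transactions
    confidence = n_both / n_antecedent      # estimate of P(consequent | antecedent)
    benchmark = n_consequent / n_total      # P(consequent) with no rule applied
    lift = confidence / benchmark
    return support, confidence, lift


# HYPOTHETICAL counts, not the numbers from the supermarket scenario.
support, confidence, lift = rule_metrics(
    n_total=1000, n_antecedent=200, n_consequent=400, n_both=100
)
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A lift above 1 suggests the antecedent makes the consequent more likely than its baseline rate; a lift below 1 suggests the opposite.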

4. Scenario: A local farmer has set up a roadside vegetable stand and is offering the following items for sale: [Asparagus, Beans, Broccoli, Corn, Green peppers, Squash, and Tomatoes]. One by one, customers pull over, pick up a basket, and purchase various combinations of items. The transactions [T] and Items Purchased [D] are given in the file “Vegetable Stand.xlsx”. Consider the association rule, “If Squash is bought, then Beans are also bought”. Answer the following questions (15 points):

  1. (3 points) How many transactions are there where Squash was bought?
  2. (3 points) In how many of those transactions were Beans also bought?
  3. (3 points) What is the support for this association rule?
  4. (3 points) What is the confidence for this rule?
  5. (3 points) What is the lift for this rule?
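The same counts can be obtained directly from a list of baskets. The Python sketch below shows one way to do it, with each transaction stored as a set of items. The five baskets are hypothetical and are not the contents of “Vegetable Stand.xlsx”; the helper name `count_containing` is also just an illustration.

```python
# HYPOTHETICAL baskets; replace with the transactions from the actual file.
transactions = [
    {"Squash", "Beans", "Corn"},
    {"Beans", "Tomatoes"},
    {"Squash", "Broccoli"},
    {"Squash", "Beans"},
    {"Asparagus", "Corn"},
]


def count_containing(baskets, items):
    """Count the baskets that contain every item in `items`."""
    return sum(1 for basket in baskets if items <= basket)


n_total = len(transactions)
n_squash = count_containing(transactions, {"Squash"})
n_beans = count_containing(transactions, {"Beans"})
n_both = count_containing(transactions, {"Squash", "Beans"})

support = n_both / n_total                  # fraction containing Squash and Beans
confidence = n_both / n_squash              # share of Squash baskets that have Beans
lift = confidence / (n_beans / n_total)     # confidence relative to Beans' baseline

print(n_squash, n_both)
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```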