# Classification Clustering And Association Rules

**Assignment – 6:**

Classification, Clustering and Association Rules

**Learning Objectives**: Learn to apply Supervised (Classification) and Unsupervised Learning Methods (Segmentation / Association Rules) to business problems.

**Datasets:** (1) Loan Offer Sample Data.jmp; (2) Utilities.jmp; (3) Vegetable Stand.xlsx

**Directions/ Submission Requirements**:

- There are three parts in this assignment. Each Part is worth 30 points. The summary document (point 4 below) carries 10 points.
- Save all scripts in the Data Table; upload it in the Assignment section.
- Cut/paste JMP outputs into the working document. Submit/upload this in the Assignment section. Penalty for non-submission of the working document = 5 points.
- **(10 points)** Provide a Summary Document with highlights of your answers. It should include the steps taken to solve each problem. Submit/upload this in the Assignment section (and also submit it in Turnitin). Penalty for non-submission of the summary document = 25%.

**A. Classification: (30 points)** Dataset: Loan Offer Sample Data.jmp

1. Your Task: Develop a model for “Loan Offer Acceptance” using the “classification tree” method (using JMP). For this, partition the Data (60-40) using Seed = 123.

- (5 points) Use all variables and the Validation column, then click “Go”. What is the optimum number of splits? Provide Fit Details (misclassification rates for training and for validation).

**Answer the following for Number of Splits = 5. Provide Fit Details and the Leaf Report:**

- (5 points) What is the model? Write the very first rule in simple English.
- (5 points) Which are the top two attributes [Hint: column contributions] in this model?
- (5 points) Which variables appear to be not relevant for classification?
- (5 points) What level of accuracy might be expected in practice when you use this model? [Hint: Look at Fit Details]
- (5 points) Assume Cutoff = 0.5. Will a client with the following profile accept the loan offer? Income = 100; Education = 1; Family = 3; CCavg = 2. [Hint: Save the Prediction Formula and add a new row with this data. Read the outcome.]
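
As an optional illustration of how a cutoff and a misclassification rate fit together, here is a minimal Python sketch. The outcomes and probabilities are made up, the function names are hypothetical, and treating a probability equal to the cutoff as an acceptance is an assumption. JMP's Fit Details report remains the source for your answers.

```python
def classify(prob_accept, cutoff=0.5):
    """Return 1 (accept) if the predicted probability reaches the cutoff, else 0.

    Assumption: a probability exactly equal to the cutoff counts as acceptance.
    """
    return 1 if prob_accept >= cutoff else 0


def misclassification_rate(actual, predicted):
    """Share of rows whose predicted class differs from the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Hypothetical validation rows: actual outcome and predicted Prob(accept).
actual = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
probs = [0.82, 0.10, 0.55, 0.40, 0.05, 0.20, 0.91, 0.30, 0.02, 0.15]

predicted = [classify(p) for p in probs]
rate = misclassification_rate(actual, predicted)
accuracy = 1 - rate  # accuracy is the complement of the misclassification rate
print(f"misclassification rate = {rate:.2f}, accuracy = {accuracy:.2f}")
```

The validation misclassification rate, rather than the training rate, is the better guide to accuracy in practice, because the validation rows were not used to grow the tree.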

**B. Clustering / Segmentation: (30 points)** Dataset: Utilities.jmp

2. Your Task: Segment the data for k = 3, 4, and 5. (Use only Cost, Demand, Sales and Nuclear.)

- (5 points) Which is the optimum number of clusters?

Now, for k = 3, save the clusters and save the Cluster Formula in the data table. Then answer the following questions for k = 3:

- (5 points) Suitably name each cluster based on segment characteristics. Why did you choose these names?
- (5 points) Create parallel plots and show the means → Share your insights in a couple of sentences.
- (5 points) Create scatterplots → Look at the Sales vs. Cost scatterplot. Which cluster does not overlap with the others? What are the utilities in that cluster?
- (5 points) Which is the smallest cluster? What companies are in that cluster?
- (5 points) For the values Cost = 120, Demand = 8.7, Sales = 9500 and Nuclear = 25 → Which cluster would this row belong to? [Hint: Add a new row and look up the answer in the formula.]
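
To see what a saved cluster formula does conceptually, the following minimal Python sketch assigns a new row to the nearest cluster center. It assumes that assignment is by smallest Euclidean distance on columns scaled by their standard deviations. The centers, standard deviations, new row, and function names are all hypothetical and are not taken from Utilities.jmp or from JMP output.

```python
import math

COLUMNS = ["Cost", "Demand", "Sales", "Nuclear"]

# Hypothetical column standard deviations, used to put columns on one scale.
STDS = {"Cost": 40.0, "Demand": 3.0, "Sales": 3500.0, "Nuclear": 17.0}

# Hypothetical cluster centers in original units (not JMP output).
CENTERS = {
    1: {"Cost": 150.0, "Demand": 2.0, "Sales": 6000.0, "Nuclear": 0.0},
    2: {"Cost": 200.0, "Demand": 4.0, "Sales": 9000.0, "Nuclear": 40.0},
    3: {"Cost": 170.0, "Demand": 6.0, "Sales": 15000.0, "Nuclear": 5.0},
}


def scaled_distance(row, center):
    """Euclidean distance after dividing each column difference by its std."""
    return math.sqrt(
        sum(((row[c] - center[c]) / STDS[c]) ** 2 for c in COLUMNS)
    )


def nearest_cluster(row):
    """Return the label of the center with the smallest scaled distance."""
    return min(CENTERS, key=lambda k: scaled_distance(row, CENTERS[k]))


# Hypothetical new row (deliberately not the values from the question).
new_row = {"Cost": 195.0, "Demand": 3.5, "Sales": 9500.0, "Nuclear": 35.0}
print("assigned cluster:", nearest_cluster(new_row))
```

Scaling matters here because Sales is measured in thousands while Demand is in single digits; without it, Sales would dominate the distance.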

**C. Association Rules/ Affinity Analysis (30 points)**

3. Scenario: A supermarket database has 10,000 point-of-sale transactions, of which 2,000 include both items A and B, and 800 of these also include item C. The total number of transactions that include C is 5,000.

Consider the association rule “**If A and B are purchased, then C is also purchased**.” Answer the following questions (15 points):

- (5 points) What is the support for this rule?
- (5 points) What is the confidence for this rule?
- (5 points) What is the lift for this rule?
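
For reference, the three measures can be computed from four counts. The sketch below uses the common definitions: support is the share of all transactions containing both antecedent and consequent, confidence is the share of antecedent transactions that also contain the consequent, and lift is the confidence divided by the consequent's overall share. The function name `rule_metrics` is hypothetical and the counts are made up; they are not the scenario's figures.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    n_total      : all transactions
    n_antecedent : transactions containing the antecedent items
    n_consequent : transactions containing the consequent items
    n_both       : transactions containing antecedent and consequent together
    """
    support = n_both / n_total
    confidence = n_both / n_antecedent
    benchmark = n_consequent / n_total  # confidence expected if independent
    lift = confidence / benchmark
    return support, confidence, lift


# Hypothetical counts for illustration only.
support, confidence, lift = rule_metrics(
    n_total=1000, n_antecedent=200, n_consequent=400, n_both=50
)
print(f"support={support:.3f}, confidence={confidence:.3f}, lift={lift:.3f}")
```

A lift above 1 suggests the antecedent makes the consequent more likely than its baseline rate; a lift below 1 suggests the opposite.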

4. Scenario: A local farmer has set up a roadside vegetable stand and is offering the following items for sale: [Asparagus, Beans, Broccoli, Corn, Green peppers, Squash, and Tomatoes]. One by one, customers pull over, pick up a basket, and purchase various combinations of items. The transactions [T] and Items Purchased [D] are given in the file “**Vegetable Stand.xlsx**”. Consider the association rule, “**If Squash is bought, then Beans are also bought**”. Answer the following questions (15 points):

- (3 points) How many transactions are there where Squash was bought?
- (3 points) In how many of these transactions were Beans also bought?
- (3 points) What is the support for this association rule?
- (3 points) What is the confidence for this rule?
- (3 points) What is the lift for this rule?
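
When the data arrive as a list of baskets rather than as counts, the same measures can be obtained by counting first. The following is a minimal Python sketch under that assumption. The baskets and item names are made up and are not the contents of Vegetable Stand.xlsx, and the function name is hypothetical.

```python
def rule_metrics_from_baskets(baskets, antecedent, consequent):
    """Count baskets, then return (support, confidence, lift) for the rule."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(baskets)
    # A basket matches when it contains every item in the item set.
    n_ante = sum(1 for b in baskets if antecedent <= set(b))
    n_cons = sum(1 for b in baskets if consequent <= set(b))
    n_both = sum(1 for b in baskets if (antecedent | consequent) <= set(b))
    if n_total == 0 or n_ante == 0 or n_cons == 0:
        raise ValueError("rule is undefined for these baskets")
    support = n_both / n_total
    confidence = n_both / n_ante
    lift = confidence / (n_cons / n_total)
    return support, confidence, lift


# Hypothetical baskets for illustration only.
baskets = [
    {"Bread", "Milk"},
    {"Bread", "Apples"},
    {"Milk", "Apples"},
    {"Bread", "Milk", "Apples"},
    {"Apples"},
]

s, c, l = rule_metrics_from_baskets(baskets, {"Bread"}, {"Milk"})
print(f"support={s:.3f}, confidence={c:.3f}, lift={l:.3f}")
```

In this made-up data, Bread appears in 3 of 5 baskets and Bread with Milk in 2, so the confidence is 2/3 against a Milk baseline of 3/5.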