Predicting the class of a medium-income customer with Bayes' theorem, given training data
CSE 5243 Intro. to Data Mining
Huan Sun, CSE@The Ohio State University
Classification: Basic Concepts
Decision Tree Induction
Model Evaluation and Selection
Practical Issues of Classification
Bayes' theorem:

P(H | X) = P(X | H) × P(H) / P(X)

P(H) (prior probability): the initial probability
◼ E.g., X will buy computer, regardless of age, income, …
P(X) (evidence): probability that the sample data X is observed
P(X | H) (likelihood): probability of observing X given that hypothesis H holds
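The relationship above can be sketched as a one-line function; the prior, likelihood, and evidence values below are illustrative assumptions, not taken from the lecture's dataset.

```python
# Minimal sketch of Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X).
def posterior(p_x_given_h, p_h, p_x):
    """Return the posterior P(H|X) from likelihood, prior, and evidence."""
    return p_x_given_h * p_h / p_x

# Assumed example values: likelihood 0.5, prior 0.6, evidence 0.4
print(posterior(0.5, 0.6, 0.4))  # 0.75
```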
Classification Is to Derive the Maximum Posteriori
Since P(X) is constant for all classes, only P(X | Ci) × P(Ci) needs to be maximized; the class Ci with the largest product is predicted.
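This maximum-a-posteriori rule can be sketched as an argmax over classes. The likelihood and prior values below are assumptions chosen for illustration (the 0.044 likelihood echoes the worked example later in the deck).

```python
# MAP classification sketch: P(X) is the same for every class, so we only
# compare P(X|C_i) * P(C_i) across classes and return the maximizer.
def map_class(likelihoods, priors):
    """likelihoods/priors: dicts mapping class label -> probability."""
    return max(priors, key=lambda c: likelihoods[c] * priors[c])

likelihoods = {"yes": 0.044, "no": 0.019}   # P(X|C_i), assumed values
priors      = {"yes": 0.643, "no": 0.357}   # P(C_i), assumed values
print(map_class(likelihoods, priors))  # yes
```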
Naïve Bayes Classifier
A simplified assumption: attributes are conditionally independent given the class (i.e., no dependence relation between attributes): P(X | Ci) = P(x1 | Ci) × P(x2 | Ci) × … × P(xn | Ci)
If attribute Ak is continuous-valued, P(xk | Ci) is usually computed with a Gaussian distribution with mean μ and standard deviation σ:

g(x, μ, σ) = (1 / (√(2π) σ)) · e^(−(x − μ)² / (2σ²))

and P(xk | Ci) = g(xk, μ_Ci, σ_Ci)

Here, mean μ and standard deviation σ are estimated based on the values of attribute Ak for the training samples of class Ci.
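The Gaussian density and the per-class parameter estimation above can be sketched as follows; the sample age values for class "yes" are hypothetical, not from the lecture's table.

```python
import math

def gaussian(x, mu, sigma):
    """Gaussian density g(x, mu, sigma) used for a continuous attribute."""
    return (1.0 / (math.sqrt(2 * math.pi) * sigma)) * \
        math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def estimate_params(values):
    """Estimate mu and sigma from one class's training values of an attribute."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return mu, sigma

ages_yes = [25, 32, 38, 40, 35]   # hypothetical ages of class "yes" samples
mu, sigma = estimate_params(ages_yes)
print(gaussian(30, mu, sigma))    # density used as P(age=30 | Ci)
```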
P(X|Ci): P(X | buys_computer = "yes")
= P(age = "<=30" | buys_computer = "yes")
× P(income = "medium" | buys_computer = "yes")
× P(student = "yes" | buys_computer = "yes")
× P(credit_rating = "fair" | buys_computer = "yes")
= 0.044
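The product above can be reproduced in a few lines. The individual conditional probabilities (2/9, 4/9, 6/9, 6/9) are assumed from the standard 14-tuple buys_computer textbook example; only the final 0.044 appears on the slide.

```python
# Conditional independence: P(X|C_i) is the product of per-attribute
# probabilities P(x_k|C_i). Fractions below are assumed counts /
# class size from the standard buys_computer example (9 "yes" tuples).
cond_probs = {
    "age=<=30":      2 / 9,
    "income=medium": 4 / 9,
    "student=yes":   6 / 9,
    "credit=fair":   6 / 9,
}
p_x_given_yes = 1.0
for attr, p in cond_probs.items():
    p_x_given_yes *= p
print(round(p_x_given_yes, 3))  # 0.044
```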