Predicting whether a medium-income customer buys a computer, using Bayes' theorem and given training data
CSE 5243 INTRO. TO DATA MINING
Huan Sun, CSE@The Ohio State University
Classification: Basic Concepts
Decision Tree Induction
Model Evaluation and Selection
Practical Issues of Classification
Bayes' theorem:

P(H|X) = P(X|H) × P(H) / P(X)
P(H) (prior probability): the initial probability
◼ E.g., X will buy a computer, regardless of age, income, …
P(X): probability that sample data is observed
P(X|H) (likelihood): probability of observing sample X, given that hypothesis H holds
P(H|X) (posterior probability): probability that H holds given the observed sample X, computed by Bayes' theorem:

P(H|X) = P(X|H) × P(H) / P(X)
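As a minimal sketch, Bayes' theorem can be evaluated directly once the three right-hand-side quantities are known. The probabilities below are illustrative numbers chosen for the example, not values from the lecture's dataset:

```python
def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)."""
    return likelihood * prior / evidence

# Illustrative numbers (assumptions, not from the slides):
# P(X|H) = 0.6, P(H) = 0.5, P(X) = 0.4  =>  P(H|X) = 0.75
p_h_given_x = posterior(likelihood=0.6, prior=0.5, evidence=0.4)
```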
Classification Is to Derive the Maximum Posteriori
Since P(X) is constant for all classes, only P(X|Ci) × P(Ci) needs to be maximized.
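Because P(X) cancels out of the comparison, the predicted class is simply the Ci that maximizes P(X|Ci) × P(Ci). A sketch with priors and likelihoods passed in as plain dicts; the numbers are assumed for illustration (9 "yes" and 5 "no" training tuples, with the two class likelihoods for sample X):

```python
def map_class(priors, likelihoods):
    """Return the class Ci maximizing P(X|Ci) * P(Ci).
    P(X) is constant across classes, so it is omitted."""
    return max(priors, key=lambda c: likelihoods[c] * priors[c])

# Assumed numbers: P(yes) = 9/14, P(no) = 5/14,
# P(X|yes) = 0.044, P(X|no) = 0.019
prediction = map_class(priors={"yes": 9 / 14, "no": 5 / 14},
                       likelihoods={"yes": 0.044, "no": 0.019})
```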
Naïve Bayes Classifier
A simplified assumption: attributes are conditionally independent (i.e., no dependence relation between attributes):
P(X|Ci) = ∏ₖ₌₁ⁿ P(xₖ|Ci)

If attribute Aₖ is continuous-valued, P(xₖ|Ci) is usually computed based on a Gaussian distribution:

g(x, μ, σ) = (1 / (√(2π) σ)) · e^(−(x−μ)² / (2σ²))

and P(xₖ|Ci) is

P(xₖ|Ci) = g(xₖ, μ_Ci, σ_Ci)

Here, mean μ and standard deviation σ are estimated based on the values of attribute Aₖ for the training samples of class Ci.
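The Gaussian density above, together with the per-class estimation of μ and σ, can be sketched as follows (function names and the sample attribute values are my own, chosen for illustration):

```python
import math

def gaussian(x, mu, sigma):
    """g(x, mu, sigma) = 1 / (sqrt(2*pi) * sigma) * exp(-(x - mu)^2 / (2 * sigma^2))"""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def estimate_mu_sigma(values):
    """Estimate mean and standard deviation of one attribute
    from the training samples of a single class Ci."""
    mu = sum(values) / len(values)
    variance = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(variance)

# Example: assumed ages of class Ci's training tuples
mu, sigma = estimate_mu_sigma([25, 30, 35, 40])
p_xk_given_ci = gaussian(30, mu, sigma)
```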
P(X|Ci):
P(X | buys_computer = "yes")
  = P(age = "<=30" | buys_computer = "yes")
  × P(income = "medium" | buys_computer = "yes")
  × P(student = "yes" | buys_computer = "yes")
  × P(credit_rating = "fair" | buys_computer = "yes")
  = 0.044
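The 0.044 above is the product of the four per-attribute conditional probabilities. The counts below (2/9, 4/9, 6/9, 6/9, out of 9 "yes" tuples) are assumed from the classic AllElectronics training table, which is not reproduced in these slides:

```python
from math import prod

# Assumed per-attribute conditional probabilities for buys_computer = "yes":
cond_probs_yes = [
    2 / 9,  # P(age = "<=30"           | yes)
    4 / 9,  # P(income = "medium"      | yes)
    6 / 9,  # P(student = "yes"        | yes)
    6 / 9,  # P(credit_rating = "fair" | yes)
]

p_x_given_yes = prod(cond_probs_yes)  # ~0.0439, i.e. 0.044 when rounded
```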