Predicting the class of a medium-income customer with Bayes' theorem, given training data
CSE 5243 Intro. to Data Mining
Huan Sun, CSE@The Ohio State University
Classification: Basic Concepts
Decision Tree Induction
Model Evaluation and Selection
Practical Issues of Classification
Bayes' theorem:

P(H | X) = P(X | H) × P(H) / P(X)

P(H) (prior probability): the initial probability
◼ E.g., X will buy computer, regardless of age, income, …
P(X) (evidence): probability that the sample data X is observed
P(X | H) (likelihood): probability of observing X given that hypothesis H holds
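The relationship above can be sketched as a one-line function; the prior, likelihood, and evidence values below are illustrative assumptions, not taken from the lecture's dataset.

```python
# Minimal sketch of Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X).
def posterior(p_x_given_h, p_h, p_x):
    """Return the posterior P(H|X) from likelihood, prior, and evidence."""
    return p_x_given_h * p_h / p_x

# Assumed example values: likelihood 0.5, prior 0.6, evidence 0.4
print(posterior(0.5, 0.6, 0.4))  # 0.75
```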
Classification Is to Derive the Maximum Posteriori
Since P(X) is constant for all classes, only P(X | Ci) × P(Ci) needs to be maximized; the class Ci with the largest product is predicted.
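This maximum-a-posteriori rule can be sketched as an argmax over classes. The likelihood and prior values below are assumptions chosen for illustration (the 0.044 likelihood echoes the worked example later in the deck).

```python
# MAP classification sketch: P(X) is the same for every class, so we only
# compare P(X|C_i) * P(C_i) across classes and return the maximizer.
def map_class(likelihoods, priors):
    """likelihoods/priors: dicts mapping class label -> probability."""
    return max(priors, key=lambda c: likelihoods[c] * priors[c])

likelihoods = {"yes": 0.044, "no": 0.019}   # P(X|C_i), assumed values
priors      = {"yes": 0.643, "no": 0.357}   # P(C_i), assumed values
print(map_class(likelihoods, priors))  # yes
```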
Naïve Bayes Classifier
A simplified assumption: attributes are conditionally independent given the class (i.e., no dependence relation between attributes): P(X | Ci) = P(x1 | Ci) × P(x2 | Ci) × … × P(xn | Ci)
If attribute Ak is continuous-valued, P(xk | Ci) is usually computed with a Gaussian distribution with mean μ and standard deviation σ:

g(x, μ, σ) = (1 / (√(2π) σ)) · e^(−(x − μ)² / (2σ²))

and P(xk | Ci) = g(xk, μ_Ci, σ_Ci)

Here, mean μ and standard deviation σ are estimated based on the values of attribute Ak for the training samples of class Ci.
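The Gaussian density and the per-class parameter estimation above can be sketched as follows; the sample age values for class "yes" are hypothetical, not from the lecture's table.

```python
import math

def gaussian(x, mu, sigma):
    """Gaussian density g(x, mu, sigma) used for a continuous attribute."""
    return (1.0 / (math.sqrt(2 * math.pi) * sigma)) * \
        math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def estimate_params(values):
    """Estimate mu and sigma from one class's training values of an attribute."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return mu, sigma

ages_yes = [25, 32, 38, 40, 35]   # hypothetical ages of class "yes" samples
mu, sigma = estimate_params(ages_yes)
print(gaussian(30, mu, sigma))    # density used as P(age=30 | Ci)
```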
P(X|Ci): P(X | buys_computer = "yes")
= P(age = "<=30" | buys_computer = "yes")
× P(income = "medium" | buys_computer = "yes")
× P(student = "yes" | buys_computer = "yes")
× P(credit_rating = "fair" | buys_computer = "yes")
= 0.044
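The product above can be reproduced in a few lines. The individual conditional probabilities (2/9, 4/9, 6/9, 6/9) are assumed from the standard 14-tuple buys_computer textbook example; only the final 0.044 appears on the slide.

```python
# Conditional independence: P(X|C_i) is the product of per-attribute
# probabilities P(x_k|C_i). Fractions below are assumed counts /
# class size from the standard buys_computer example (9 "yes" tuples).
cond_probs = {
    "age=<=30":      2 / 9,
    "income=medium": 4 / 9,
    "student=yes":   6 / 9,
    "credit=fair":   6 / 9,
}
p_x_given_yes = 1.0
for attr, p in cond_probs.items():
    p_x_given_yes *= p
print(round(p_x_given_yes, 3))  # 0.044
```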