
Theory: Probabilistic Classifiers

Lemma 2.14 For any $0 \le \varepsilon \le H(P_m^P)$, let $S_\varepsilon \subseteq S$ be the set of all samples

claim:

Claim 1 Let $P_m$ be a product distribution defined by the marginals $P_i$ over

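$$
\begin{aligned}
D(P \,\|\, P_m^P) &= \sum_{x} P(x) \log P(x) - \sum_{x} P(x) \log P_m^P(x) \\
&= -H(P) - \sum_{(x_1,\ldots,x_n)} P(x_1,\ldots,x_n) \log P_m^P(x_1,\ldots,x_n) \\
&= -H(P) - \sum_{i=1}^{n} \sum_{x_i} \{\log P_i(x_i)\} \sum_{(x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_n) \in X^{n-1}} P(x_1,\ldots,x_n) \\
&= -H(P) - \sum_{i=1}^{n} \sum_{x_i} P_i(x_i) \log P_i(x_i) \\
&= -H(P) + \sum_{i=1}^{n} H(P_i) = H(P_m^P) - H(P),
\end{aligned}
$$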


			(2.34)
and by integrating over the range [0, ε] we get Theorem 2.12.
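Claim 1 can be checked numerically. The sketch below is only an illustration under assumptions not stated in the text: the variables are binary ($X = \{0, 1\}$), logarithms are natural (the identity holds in any base), and the helper names (`random_joint`, `product_of_marginals`, and so on) are hypothetical. It draws a random joint distribution over three variables, forms the product of its marginals $P_m^P$, and compares $D(P \,\|\, P_m^P)$ with $H(P_m^P) - H(P)$ and with $\sum_i H(P_i)$.

```python
import itertools
import math
import random


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def marginals(joint, n):
    """Return [P_i(0), P_i(1)] for each coordinate i of a joint over {0,1}^n."""
    margs = [[0.0, 0.0] for _ in range(n)]
    for x, p in joint.items():
        for i, xi in enumerate(x):
            margs[i][xi] += p
    return margs


def product_of_marginals(margs):
    """Product distribution P_m^P(x) = prod_i P_i(x_i) over {0,1}^n."""
    n = len(margs)
    return {
        x: math.prod(margs[i][xi] for i, xi in enumerate(x))
        for x in itertools.product((0, 1), repeat=n)
    }


def kl(p, q):
    """D(p || q) in nats.

    If p(x) > 0 then every marginal P_i(x_i) >= p(x) > 0, so q(x) > 0
    whenever q is the product of p's own marginals and the log is defined.
    """
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


def random_joint(n, rng):
    """Hypothetical helper: a random strictly positive joint over {0,1}^n."""
    keys = list(itertools.product((0, 1), repeat=n))
    weights = [rng.random() + 1e-3 for _ in keys]
    total = sum(weights)
    return {k: w / total for k, w in zip(keys, weights)}


rng = random.Random(0)
n = 3
P = random_joint(n, rng)
margs = marginals(P, n)
Pm = product_of_marginals(margs)

d = kl(P, Pm)                                      # D(P || P_m^P)
gap = entropy(Pm.values()) - entropy(P.values())   # H(P_m^P) - H(P)
sum_h = sum(entropy(m) for m in margs)             # sum_i H(P_i)
print(f"D = {d:.6f}, H(Pm) - H(P) = {gap:.6f}")
print(f"H(Pm) = {entropy(Pm.values()):.6f}, sum_i H(P_i) = {sum_h:.6f}")
```

The two printed pairs agree up to floating-point rounding. Since $H(P) \ge 0$, the identity also shows $D(P \,\|\, P_m^P) \le H(P_m^P)$, which is why Lemma 2.14 restricts $\varepsilon$ to $[0, H(P_m^P)]$.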
5.1	Distributional Density

In the previous section, Theorem 2.12 was phrased in terms of the number of sample sets that share the same marginal distribution and thus yield the same classifier. We now prove a similar result directly in terms of the number of joint distributions at a certain distance from their induced product distribution. We assume a fixed resolution τ for the representation of real numbers; two real numbers are indistinguishable if their difference is smaller than τ.


$$
\frac{|\mathcal{P}_{\geq \varepsilon}|}{|\mathcal{P}|} \leq A \exp\left(-\sqrt{B\varepsilon}\right). \qquad (2.35)
$$
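The ratio on the left of (2.35) can be explored empirically in the two-variable binary case. The following is a sketch under stated assumptions, not the construction used in the proof: the set $\mathcal{P}$ is approximated by all $2 \times 2$ joint distributions whose entries are multiples of $\tau = 1/k$, cells are indexed as $P_{x_1 x_2}$, and `grid_distributions` and `fraction_far` are hypothetical helpers. For two variables, $D(P \,\|\, P_m^P)$ is the mutual information $I(X_1; X_2)$. The constants $A$ and $B$ are not estimated here.

```python
import math


def kl_to_product(p11, p10, p01, p00):
    """D(P || P_m^P) in nats for a 2x2 joint with index order P_{x1 x2}."""
    pa = p11 + p10  # P(X1 = 1)
    pb = p11 + p01  # P(X2 = 1)
    cells = [
        (p11, pa * pb),
        (p10, pa * (1 - pb)),
        (p01, (1 - pa) * pb),
        (p00, (1 - pa) * (1 - pb)),
    ]
    # A product cell is positive whenever its joint cell is, so log is defined.
    return sum(p * math.log(p / q) for p, q in cells if p > 0)


def grid_distributions(k):
    """All 2x2 joints whose entries are multiples of tau = 1/k (hypothetical)."""
    for a in range(k + 1):
        for b in range(k + 1 - a):
            for c in range(k + 1 - a - b):
                d = k - a - b - c
                yield a / k, b / k, c / k, d / k


def fraction_far(k, eps):
    """Fraction of grid distributions with D(P || P_m^P) >= eps (hypothetical)."""
    dists = list(grid_distributions(k))
    far = sum(1 for P in dists if kl_to_product(*P) >= eps)
    return far / len(dists)


if __name__ == "__main__":
    k = 50  # resolution tau = 1/50
    for eps in (0.01, 0.05, 0.1, 0.2, 0.4):
        print(f"eps = {eps:.2f}  fraction = {fraction_far(k, eps):.4f}")
```

The printed fractions can only decrease as $\varepsilon$ grows, and they vanish for $\varepsilon > \ln 2$, the largest mutual information two binary variables can have. Whether the decay matches the exponential form of (2.35) is not something this small grid establishes.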

Consider a joint probability distribution $P \in \mathcal{P}$ over $X \times X$ with marginals $p_a = P(X_1 = 1)$ and $p_b = P(X_2 = 1)$. Define $\delta = P_{11} - p_a p_b$. By simple algebra we get

$$
P = \begin{bmatrix} P_{11} & P_{10} \\ P_{01} & P_{00} \end{bmatrix}
$$