This chapter introduces probability generating functions (PGFs) for discrete random variables, starting from three standard power series: the geometric, binomial, and exponential series. PGFs give a convenient way of handling sums of independent random variables. For some stochastic processes, they also have a special role
in telling us whether a process will ever reach a particular state.
1. Geometric Series
$$1 + r + r^2 + r^3 + \dots = \sum_{x=0}^{\infty} r^x = \frac{1}{1-r}, \quad \text{when } |r| < 1.$$

This formula proves that $\sum_{x=0}^{\infty} P(X = x) = 1$ when $X \sim \text{Geometric}(p)$:

$$P(X = x) = p(1-p)^x \;\Rightarrow\; \sum_{x=0}^{\infty} P(X = x) = \sum_{x=0}^{\infty} p(1-p)^x = \frac{p}{1 - (1-p)} \quad (\text{because } |1-p| < 1) = 1.$$
2. Binomial Theorem: for any $p, q \in \mathbb{R}$ and integer $n$,

$$(p + q)^n = \sum_{x=0}^{n} \binom{n}{x} p^x q^{n-x}.$$

This formula proves that $\sum_{x=0}^{n} P(X = x) = 1$ when $X \sim \text{Binomial}(n, p)$:

$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x} \;\Rightarrow\; \sum_{x=0}^{n} P(X = x) = \big(p + (1-p)\big)^n = 1^n = 1.$$
3. Exponential Power Series
For any $\lambda \in \mathbb{R}$,

$$\sum_{x=0}^{\infty} \frac{\lambda^x}{x!} = e^{\lambda}.$$

This formula proves that $\sum_{x=0}^{\infty} P(X = x) = 1$ when $X \sim \text{Poisson}(\lambda)$:

$$P(X = x) = \frac{\lambda^x}{x!} e^{-\lambda} \;\Rightarrow\; \sum_{x=0}^{\infty} P(X = x) = e^{-\lambda} \sum_{x=0}^{\infty} \frac{\lambda^x}{x!} = e^{-\lambda} e^{\lambda} = 1.$$
4.2 Probability Generating Functions
The probability generating function (PGF) is a useful tool for dealing with discrete random variables taking values 0, 1, 2, . . .. Its particular strength is that it gives us an easy way of characterizing the distribution of X + Y when X and Y are independent. In general it is difficult to find the distribution of a sum using the traditional probability function. The PGF transforms a sum into a product and enables it to be handled much more easily.
Definition: Let $X$ be a discrete random variable taking values in $\{0, 1, 2, \dots\}$. The probability generating function of $X$ is

$$G_X(s) = E\left(s^X\right) = \sum_{x=0}^{\infty} s^x P(X = x), \quad \text{for all } s \in \mathbb{R} \text{ for which the sum converges.}$$
Properties of the PGF:
1. $G_X(0) = P(X = 0)$: setting $s = 0$ kills every term of the sum except the $x = 0$ term.

2. $G_X(1) = 1$:

$$G_X(1) = \sum_{x=0}^{\infty} 1^x P(X = x) = \sum_{x=0}^{\infty} P(X = x) = 1.$$
Example 1: Binomial distribution. Let $X \sim \text{Binomial}(n, p)$, so $P(X = x) = \binom{n}{x} p^x q^{n-x}$ for $x = 0, 1, \dots, n$, where $q = 1 - p$. Then

$$G_X(s) = \sum_{x=0}^{n} s^x \binom{n}{x} p^x q^{n-x} = \sum_{x=0}^{n} \binom{n}{x} (ps)^x q^{n-x} = (ps + q)^n \quad \text{(by the Binomial Theorem).}$$
Thus $G_X(s) = (ps + q)^n$ for all $s \in \mathbb{R}$.
Example 2: Poisson distribution. Let $X \sim \text{Poisson}(\lambda)$, so $P(X = x) = \dfrac{\lambda^x}{x!} e^{-\lambda}$ for $x = 0, 1, 2, \dots$. Then

$$G_X(s) = \sum_{x=0}^{\infty} s^x \frac{\lambda^x}{x!} e^{-\lambda} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda s)^x}{x!} = e^{-\lambda} e^{\lambda s} = e^{\lambda(s - 1)} \quad \text{for all } s \in \mathbb{R}.$$

50 40 30 20 10 0
G(s) 5 4 3 2 1 0
−1.0 −0.5 0.0 0.5 1.0 1.5 2.0
∞ X ~ Geom(0.8)
$$G_X(s) = \sum_{x=0}^{\infty} s^x p q^x = p \sum_{x=0}^{\infty} (qs)^x = \frac{p}{1 - qs} \quad \text{for all } s \text{ such that } |qs| < 1.$$
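As a quick numerical sanity check (not part of the original derivations), the Python sketch below compares truncated versions of the three sums above with their closed forms; the parameter values and truncation points are arbitrary choices.

```python
import math

def binomial_pgf_sum(n, p, s):
    """Direct sum of s^x * C(n, x) * p^x * q^(n - x) over x = 0, ..., n."""
    q = 1 - p
    return sum(s**x * math.comb(n, x) * p**x * q**(n - x) for x in range(n + 1))

def poisson_pgf_sum(lam, s, terms=100):
    """Truncated sum of s^x * lam^x / x! * e^(-lam)."""
    return sum(s**x * lam**x / math.factorial(x) * math.exp(-lam) for x in range(terms))

def geometric_pgf_sum(p, s, terms=200):
    """Truncated sum of s^x * p * q^x, with q = 1 - p."""
    q = 1 - p
    return sum(s**x * p * q**x for x in range(terms))

s = 0.7
print(binomial_pgf_sum(4, 0.2, s),  (0.2 * s + 0.8) ** 4)      # (ps + q)^n
print(poisson_pgf_sum(3.0, s),      math.exp(3.0 * (s - 1)))   # e^(lambda(s - 1))
print(geometric_pgf_sum(0.8, s),    0.8 / (1 - 0.2 * s))       # p / (1 - qs)
```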
4.3 Using the PGF to calculate probabilities

The probability generating function gets its name because the power series can be expanded and differentiated to reveal the individual probabilities. Thus,
$$G_X(s) = \sum_{x=0}^{\infty} s^x P(X = x) = p_0 + p_1 s + p_2 s^2 + p_3 s^3 + \dots, \quad \text{where } p_x = P(X = x).$$
Thus $p_0 = P(X = 0) = G_X(0)$.

First derivative: $G_X'(s) = p_1 + 2p_2 s + 3p_3 s^2 + 4p_4 s^3 + \dots$, so $p_1 = P(X = 1) = G_X'(0)$.

Second derivative: $G_X''(s) = 2p_2 + (3 \times 2)p_3 s + (4 \times 3)p_4 s^2 + \dots$, so $p_2 = P(X = 2) = \frac{1}{2!} G_X''(0)$.

Third derivative: $G_X'''(s) = (3 \times 2 \times 1)p_3 + (4 \times 3 \times 2)p_4 s + \dots$, so $p_3 = P(X = 3) = \frac{1}{3!} G_X'''(0)$.

In general:

$$p_n = P(X = n) = \frac{1}{n!} G_X^{(n)}(0), \quad \text{where } G_X^{(n)} \text{ denotes the } n\text{th derivative of } G_X.$$
Example: Let $X$ be a discrete random variable with PGF $G_X(s) = \frac{2}{5}s + \frac{3}{5}s^3$. Find the distribution of $X$.

$$G_X(s) = \tfrac{2}{5}s + \tfrac{3}{5}s^3: \qquad G_X(0) = P(X = 0) = 0.$$
$$G_X'(s) = \tfrac{2}{5} + \tfrac{9}{5}s^2: \qquad G_X'(0) = P(X = 1) = \tfrac{2}{5}.$$
$$G_X''(s) = \tfrac{18}{5}s: \qquad \tfrac{1}{2!}G_X''(0) = P(X = 2) = 0.$$
$$G_X'''(s) = \tfrac{18}{5}: \qquad \tfrac{1}{3!}G_X'''(0) = P(X = 3) = \tfrac{3}{5}.$$
$$G_X^{(r)}(s) = 0 \;\; \forall\, r \geq 4: \qquad \tfrac{1}{r!}G_X^{(r)}(0) = P(X = r) = 0 \;\; \forall\, r \geq 4.$$

Thus $X$ takes the value 1 with probability $\tfrac{2}{5}$ and the value 3 with probability $\tfrac{3}{5}$.

Uniqueness of the PGF: the calculation above shows that the whole sequence of probabilities $p_0, p_1, p_2, \dots$ is determined by the values of the PGF and its derivatives at $s = 0$. It follows that the PGF specifies a unique set of probabilities.

Fact: If two power series agree on any interval containing 0, however small, then all terms of the two series are equal.
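The recipe $P(X = n) = \frac{1}{n!}G_X^{(n)}(0)$ can be checked mechanically with a computer algebra system; the sketch below uses sympy (an implementation choice for this illustration, not something the notes assume) on the PGF $\frac{2}{5}s + \frac{3}{5}s^3$ from the example above.

```python
import sympy as sp

s = sp.symbols('s')
G = sp.Rational(2, 5) * s + sp.Rational(3, 5) * s**3   # PGF from the example above

# P(X = n) = G^(n)(0) / n! : differentiate n times, evaluate at s = 0, divide by n!
for n in range(6):
    p_n = sp.diff(G, s, n).subs(s, 0) / sp.factorial(n)
    print(n, p_n)   # prints 0, 2/5, 0, 3/5, 0, 0 -- matching the distribution found above
```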
4.4 Expectation and moments from the PGF
As well as calculating probabilities, we can use the PGF to calculate the moments of the distribution of $X$. Let $X$ be a discrete random variable with PGF $G_X(s)$. Then:

1. $E(X) = G_X'(1)$.
2. $E\{X(X-1)(X-2)\cdots(X-k+1)\} = G_X^{(k)}(1)$, the $k$th factorial moment of $X$.

Proof of 1:

$$G_X(s) = \sum_{x=0}^{\infty} s^x p_x \;\Rightarrow\; G_X'(s) = \sum_{x=1}^{\infty} x s^{x-1} p_x \;\Rightarrow\; G_X'(1) = \sum_{x=1}^{\infty} x\, p_x = E(X).$$
Proof of 2: differentiating the power series $k$ times gives

$$G_X^{(k)}(s) = \sum_{x=k}^{\infty} x(x-1)\cdots(x-k+1)\, s^{x-k} p_x, \quad \text{so} \quad G_X^{(k)}(1) = \sum_{x=k}^{\infty} x(x-1)\cdots(x-k+1)\, p_x = E\{X(X-1)\cdots(X-k+1)\}. \;\square$$
Example: Let $X \sim \text{Poisson}(\lambda)$. The PGF of $X$ is $G_X(s) = e^{\lambda(s-1)}$. Find $E(X)$ and $\text{Var}(X)$.

[Figure: $G(s)$ plotted against $s$ for $X \sim \text{Poisson}(4)$.]

Solution: $G_X'(s) = \lambda e^{\lambda(s-1)}$, so $E(X) = G_X'(1) = \lambda$.

For the variance, consider $E\{X(X-1)\} = G_X''(1) = \lambda^2 e^{\lambda(s-1)}\big|_{s=1} = \lambda^2$. Then

$$\text{Var}(X) = E(X^2) - (EX)^2 = E\{X(X-1)\} + EX - (EX)^2 = \lambda^2 + \lambda - \lambda^2 = \lambda.$$
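A small symbolic check of this calculation, again using sympy as an assumed tool: differentiate $e^{\lambda(s-1)}$ and evaluate the derivatives at $s = 1$.

```python
import sympy as sp

s, lam = sp.symbols('s lambda', positive=True)
G = sp.exp(lam * (s - 1))                # PGF of Poisson(lambda)

EX    = sp.diff(G, s).subs(s, 1)         # E(X) = G'(1)
EXXm1 = sp.diff(G, s, 2).subs(s, 1)      # E{X(X-1)} = G''(1)
VarX  = sp.simplify(EXXm1 + EX - EX**2)  # Var(X) = E{X(X-1)} + E(X) - (E(X))^2

print(EX, VarX)                          # both print lambda
```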
Theorem 4.5: Suppose that X1, . . . , Xn are independent random variables, and
let Y = X1 + . . . + Xn. Then
$$G_Y(s) = \prod_{i=1}^{n} G_{X_i}(s).$$

Proof:

$$\begin{aligned}
G_Y(s) = E\left(s^{X_1 + \dots + X_n}\right) &= E\left(s^{X_1} s^{X_2} \cdots s^{X_n}\right)\\
&= E\left(s^{X_1}\right) E\left(s^{X_2}\right) \cdots E\left(s^{X_n}\right) && \text{(because the $X_i$ are independent)}\\
&= \prod_{i=1}^{n} G_{X_i}(s). \qquad \square
\end{aligned}$$
Example: Suppose $X \sim \text{Poisson}(\lambda)$ and $Y \sim \text{Poisson}(\mu)$ are independent. Find the distribution of $X + Y$.

Solution:

$$G_{X+Y}(s) = G_X(s) \cdot G_Y(s) = e^{\lambda(s-1)} e^{\mu(s-1)} = e^{(\lambda + \mu)(s-1)}.$$

But this is the PGF of a Poisson($\lambda + \mu$) random variable, so by uniqueness of PGFs, $X + Y \sim \text{Poisson}(\lambda + \mu)$.
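A direct numerical cross-check of this conclusion (the parameter values 2.0 and 3.5 are arbitrary): convolve the two Poisson probability functions and compare with the Poisson($\lambda + \mu$) probabilities.

```python
import math

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam**k / math.factorial(k)

lam, mu = 2.0, 3.5
for t in range(8):
    # P(X + Y = t) by direct convolution of the two probability functions
    conv = sum(poisson_pmf(lam, x) * poisson_pmf(mu, t - x) for x in range(t + 1))
    print(t, round(conv, 10), round(poisson_pmf(lam + mu, t), 10))  # the two columns agree
```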
4.6 Randomly stopped sum

So far we have dealt with sums that have a fixed number of terms. Often the number of terms is itself random. For example, suppose a random number $N$ of customers use a cash machine during the day, and customer $i$ withdraws amount $X_i$. The total amount withdrawn during the day is $T_N = X_1 + \dots + X_N$: a randomly stopped sum. Using the PGF, we can find not just the mean and variance of $T_N$ but the whole distribution of $T_N$.
Theorem 4.6: Let $X_1, X_2, \dots$ be a sequence of independent and identically distributed random variables with common PGF $G_X$. Let $N$ be a random variable, independent of the $X_i$'s, with PGF $G_N$, and let $T_N = X_1 + \dots + X_N$. Then the PGF of $T_N$ is

$$G_{T_N}(s) = G_N\big(G_X(s)\big).$$

Proof:

$$\begin{aligned}
G_{T_N}(s) = E\left(s^{T_N}\right) &= E\left(s^{X_1 + \dots + X_N}\right)\\
&= E_N\left\{ E\left(s^{X_1 + \dots + X_N} \,\middle|\, N\right)\right\} && \text{(conditional expectation)}\\
&= E_N\left\{ E\left(s^{X_1} \cdots s^{X_N} \,\middle|\, N\right)\right\}\\
&= E_N\left\{ E\left(s^{X_1}\right) \cdots E\left(s^{X_N}\right)\right\} && \text{($X_i$'s are indept of $N$ and of each other)}\\
&= E_N\left\{ \big(G_X(s)\big)^N \right\}\\
&= G_N\big(G_X(s)\big) && \text{(by definition of $G_N$).} \qquad \square
\end{aligned}$$
Example: Heron goes fishing. A heron visits a fish pond a random number $N \sim \text{Geometric}(1-\theta)$ of times, so $P(N = n) = (1-\theta)\theta^n$ for $n = 0, 1, 2, \dots$, and $G_N(s) = \dfrac{1-\theta}{1-\theta s}$. On each visit it catches a fish with probability $p$, independently of the other visits and of $N$. Find the distribution of the total number of fish caught, $T$.

Let

$$X_i = \begin{cases} 1 & \text{if the heron catches a fish on visit } i,\\ 0 & \text{otherwise,}\end{cases}$$

so $X_i \sim \text{Binomial}(1, p)$ with PGF $G_X(s) = q + ps$, where $q = 1 - p$. Then $T = X_1 + X_2 + \dots + X_N$ (randomly stopped sum), so

$$G_T(s) = G_N\big(G_X(s)\big).$$
So

$$G_T(s) = \frac{1-\theta}{1 - \theta\, G_X(s)} = \frac{1-\theta}{1 - \theta(q + ps)} = \frac{1-\theta}{(1 - \theta q) - \theta p s} = \frac{\left(\dfrac{1-\theta}{1-\theta+\theta p}\right)}{1 - \left(\dfrac{\theta p}{1-\theta+\theta p}\right)s},$$

where the last step divides numerator and denominator by $1 - \theta q = 1 - \theta + \theta p$. This is the PGF of the Geometric distribution with parameter $\dfrac{1-\theta}{1-\theta+\theta p}$, so by uniqueness of PGFs,

$$T \sim \text{Geometric}\left(\frac{1-\theta}{1-\theta+\theta p}\right).$$
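A simulation sketch of this result (the values $\theta = 0.6$ and $p = 0.4$ are arbitrary): simulate the randomly stopped sum directly and compare its empirical distribution with the Geometric$\left(\frac{1-\theta}{1-\theta+\theta p}\right)$ probabilities.

```python
import random

def simulate_T(theta, p):
    """One realisation of T = X_1 + ... + X_N, where P(N = n) = (1 - theta) * theta**n
    and each X_i is Bernoulli(p), independent of N and of each other."""
    n = 0
    while random.random() < theta:   # each further visit happens with probability theta
        n += 1
    return sum(1 for _ in range(n) if random.random() < p)

theta, p, reps = 0.6, 0.4, 200_000
counts = {}
for _ in range(reps):
    t = simulate_T(theta, p)
    counts[t] = counts.get(t, 0) + 1

pi = (1 - theta) / (1 - theta + theta * p)   # Geometric parameter found via the PGF
for t in range(5):
    print(t, round(counts.get(t, 0) / reps, 4), round(pi * (1 - pi)**t, 4))
```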
Why did we need the PGF? Restating the heron problem:

• Let $N \sim \text{Geometric}(1-\theta)$;
• Let $X_1, X_2, \dots$ be independent Binomial$(1, p)$ random variables, independent of $N$;
• Let $T = X_1 + \dots + X_N$ be the randomly stopped sum;
• Find the distribution of $T$.

Suppose we try to work with the probability function directly:
• if N = n is fixed, then T = X1 + . . . + Xn is a sum of n independent Binomial(1, p) random variables, so (T | N = n) ∼ Binomial(n, p).
For most distributions of $X$, it would be difficult or impossible to write down the distribution of $X_1 + \dots + X_n$. Back to the heron problem: we are lucky in this case that we know the distribution of $(T \mid N = n)$ is Binomial$(n, p)$, so by the law of total probability,

$$P(T = t) = \sum_{n=t}^{\infty} P(T = t \mid N = n)\, P(N = n) = \sum_{n=t}^{\infty} \binom{n}{t} p^t (1-p)^{n-t}\,(1-\theta)\theta^n. \qquad (\star\star)$$
Without the PGF, we have two major difficulties:

1. finding the conditional distribution of $(T \mid N = n)$, which for most choices of the $X_i$'s has no convenient closed form; and
2. evaluating the infinite sum $(\star\star)$.

The PGF deals with both problems at once, by converting the randomly stopped sum into the single composition $G_T(s) = G_N\big(G_X(s)\big)$.
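As a sketch of how the two routes agree, the code below evaluates the sum $(\star\star)$ numerically (truncated at $n = 400$, an arbitrary cutoff) and compares it with the Geometric probabilities obtained via the PGF; the parameter values are again arbitrary.

```python
import math

theta, p = 0.6, 0.4
q = 1 - p
pi = (1 - theta) / (1 - theta + theta * p)   # parameter of T found via the PGF

for t in range(5):
    # (**): P(T = t) = sum over n >= t of C(n, t) p^t q^(n - t) (1 - theta) theta^n
    direct = sum(math.comb(n, t) * p**t * q**(n - t) * (1 - theta) * theta**n
                 for n in range(t, 400))
    print(t, round(direct, 10), round(pi * (1 - pi)**t, 10))   # the two columns agree
```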
Summary of the properties of the PGF established so far:

• $G_X(s) = E(s^X) = \sum_{x=0}^{\infty} s^x P(X = x)$;
• $G_X(0) = P(X = 0)$ and $G_X(1) = 1$;
• $P(X = n) = \frac{1}{n!} G_X^{(n)}(0)$;
• $E(X) = G_X'(1)$ and $E\{X(X-1)\cdots(X-k+1)\} = G_X^{(k)}(1)$;
• if $X$ and $Y$ are independent, then $G_{X+Y}(s) = G_X(s)\, G_Y(s)$;
• for a randomly stopped sum, $G_{T_N}(s) = G_N\big(G_X(s)\big)$.
4.8 Convergence of PGFs

We have been using PGFs throughout this chapter without paying much attention to their mathematical properties. For example, are we sure that the power series $G_X(s) = \sum_{x=0}^{\infty} s^x P(X = x)$ converges? Can we differentiate and integrate the infinite power series term by term as we did in Section 4.4? When we said in Section 4.4 that $E(X) = G_X'(1)$, can we be sure that $G_X(1)$ and its derivative $G_X'(1)$ even exist?
The radius of convergence of a PGF is the number $R$ such that the power series converges for $|s| < R$ and diverges for $|s| > R$. It is always $\geq 1$: every PGF converges for at least $s \in (-1, 1)$, because $\sum_{x} |s|^x P(X = x) \leq \sum_{x} P(X = x) = 1$ when $|s| \leq 1$. The radius of convergence could be anything from $R = 1$ to $R = \infty$.

91
The figure shows the PGF of the Geometric(p = 0.8) distribution, with its radius of convergence R = 5. Note that although the convergence set (−5, 5) is symmetric about 0, the function GX(s) = p/(1 − qs) = 4/(5 − s) is not.
At the limits of convergence, strange things happen:

• At the positive end, as $s \uparrow 5$, both $G_X(s)$ and $p/(1 - qs) = 4/(5 - s)$ approach infinity.

• At the negative end, as $s \downarrow -5$, the function $4/(5 - s)$ approaches $4/10 = 0.4$, but $G_X(-5) = 0.8\sum_{x=0}^{\infty}(-1)^x$ does not exist. Unlike the positive end, this means that $G_X(s)$ is not (right-)continuous at $-R$:

$$\lim_{s \downarrow -5} G_X(s) = 0.4 \neq G_X(-5).$$
Example: Binomial$(n, p)$. The PGF

$$G_X(s) = \sum_{x=0}^{n} s^x \binom{n}{x} p^x q^{n-x} = (ps + q)^n$$

is a finite polynomial in $s$, so it converges for every $s$: the radius of convergence is $R = \infty$.
Abel’s Theorem for continuity of power series at s = 1
Recall from above that if $X \sim \text{Geometric}(0.8)$, then $G_X(s)$ is not continuous at the negative end of its range of convergence, $-R$. Abel's Theorem guarantees that this sort of bad behaviour can never happen at $s = 1$.
Theorem 4.8: Abel's Theorem. Let $G(s) = \sum_{i=0}^{\infty} p_i s^i$ for any $p_0, p_1, p_2, \dots$ with $p_i \geq 0$. Then $G(s)$ is left-continuous at $s = 1$:

$$\lim_{s \uparrow 1} G(s) = \sum_{i=0}^{\infty} p_i = G(1),$$

whether or not this sum is finite.
Abel’s Theorem means that for any PGF, we can write GX(1) as shorthand for
lims↑1 GX(s).
In particular, Abel's Theorem justifies reading $E(X) = G_X'(1)$ as a left-hand limit. We have

$$G_X(s) = \sum_{x=0}^{\infty} s^x p_x, \quad \text{so} \quad G_X'(s) = \sum_{x=1}^{\infty} x s^{x-1} p_x \quad \text{(term-by-term differentiation: see below).}$$

The derivative $G_X'(s)$ is itself a power series with non-negative coefficients, so Abel's Theorem applies to it too, giving

$$E(X) = \sum_{x=1}^{\infty} x\, p_x = G_X'(1) = \lim_{s \uparrow 1} G_X'(s),$$

whether or not this limit is finite.
We have stated that the PGF converges for all $|s| < R$ for some $R$. In fact, the probability generating function converges absolutely if $|s| < R$. Absolute convergence is stronger than ordinary convergence: it means the terms of the series can be rearranged and multiplied together without changing the limit. This is useful because $G_X(s) \times G_Y(s) = G_{X+Y}(s)$ if $X$ and $Y$ are independent, and multiplying out two power series involves rearranging their terms.

The PGF also converges uniformly on any set $\{s : |s| \leq R'\}$ where $R' < R$: for every $\epsilon > 0$ there is an $N$ such that, for all $n \geq N$ and all $s$ in the set,

$$\left| \sum_{x=0}^{n} s^x P(X = x) - G_X(s) \right| < \epsilon.$$

Uniform convergence allows us to differentiate or integrate the PGF term by term.
For $|s| < R$,

$$\frac{d}{ds} G_X(s) = \sum_{x=0}^{\infty} \frac{d}{ds}\big(s^x P(X = x)\big) = \sum_{x=1}^{\infty} x s^{x-1} P(X = x),$$

and for $-R < a < b < R$,

$$\int_a^b G_X(s)\, ds = \sum_{x=0}^{\infty} \int_a^b s^x P(X = x)\, ds = \sum_{x=0}^{\infty} P(X = x)\, \frac{b^{x+1} - a^{x+1}}{x + 1}.$$
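A numerical illustration of term-by-term integration for the Geometric(0.8) PGF on $[a, b] = [0, 1/2]$, well inside the radius of convergence; the truncation at 200 terms is an arbitrary choice.

```python
import math

p, q = 0.8, 0.2
a, b = 0.0, 0.5

# Term-by-term: sum over x of P(X = x) * (b^(x+1) - a^(x+1)) / (x + 1)
termwise = sum(p * q**x * (b**(x + 1) - a**(x + 1)) / (x + 1) for x in range(200))

# Closed form: integral of p / (1 - q s) ds = -(p / q) * ln(1 - q s)
closed = -(p / q) * (math.log(1 - q * b) - math.log(1 - q * a))

print(termwise, closed)   # both approximately 0.42144
```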
4.9 Special Process: the Random Walk
The (symmetric) random walk moves on the unrestricted set of integers $\{\dots, -2, -1, 0, 1, 2, 3, \dots\}$, stepping up by 1 with probability $1/2$ and down by 1 with probability $1/2$ at each time step.

[Diagram: states $\dots, -2, -1, 0, 1, 2, 3, \dots$ with arrows of probability $1/2$ between each pair of neighbouring states, in both directions.]
Question:
What is the key difference between the random walk and the gambler’s ruin?
In the gambler’s ruin, states are not equal: the states closest to 0 are more likely to end in ruin than the states closest to winning. By contrast, the random walk has no end-points, so (for example) the distribution of the time to reach state 5 starting from state 0 is exactly the same as the distribution of the time to reach state 1005 starting from state 1000. We can exploit this fact to solve some problems for the random walk that would be much more difficult to solve for the gambler’s ruin.
In Chapter 3 we saw how to find expected reaching times: the expected number of steps taken to reach a particular state. We used the law of total expectation and first-step analysis (Section 3.5).
However, the expected or average reaching time doesn't tell the whole story. Think back to the model for gene spread in Section 3.7. If there is just one animal out of 100 with the harmful allele, the expected number of generations to fixation is quite large, at 10.5, even though the allele will usually die out after one or two generations. The high average is caused by a small chance that the allele will take hold and grow, requiring a very large number of generations before it either dies out or saturates the population. In most stochastic processes, the average is of limited use by itself, without having some idea about the variance and skew of the distribution.

97
Define Tij to be the number of steps taken to reach state j, starting at state i.
Problem: find the PGF of the reaching time $T_{01}$, that is, $H(s) = E\left(s^{T_{01}}\right)$.
Solution: Let

$$Y_n = \begin{cases} +1 & \text{with probability } 0.5,\\ -1 & \text{with probability } 0.5, \end{cases}$$

be the step taken at time $n$, and write $H(s) = E\left(s^{T_{01}}\right)$. Condition on the first step $Y_1$ (first-step analysis):

$$H(s) = E\left(s^{T_{01}}\right) = \tfrac{1}{2}\, E\left(s^{T_{01}} \mid Y_1 = 1\right) + \tfrac{1}{2}\, E\left(s^{T_{01}} \mid Y_1 = -1\right). \qquad (\spadesuit)$$

Now if $Y_1 = 1$, then $T_{01} = 1$ definitely, so $E\left(s^{T_{01}} \mid Y_1 = 1\right) = s$.

If $Y_1 = -1$, then $T_{01} = 1 + T_{-1,0} + T_{01}'$: after stepping down to $-1$, the walk must first get from $-1$ back to $0$, and then from $0$ to $1$. Now $T_{-1,0}$ and $T_{01}'$ are independent (Markov property). Also, they have the same distribution as $T_{01}$ because the process is translation invariant (i.e. all states are equivalent). So

$$E\left(s^{T_{01}} \mid Y_1 = -1\right) = E\left(s^{1 + T_{-1,0} + T_{01}'}\right) = s\, H(s)^2.$$
Thus, substituting into $(\spadesuit)$,

$$H(s) = \tfrac{1}{2}s + \tfrac{1}{2}s\, H(s)^2,$$

which is a quadratic in $H(s)$. Solving it,

$$H(s) = \frac{1 \pm \sqrt{1 - 4 \cdot \tfrac{1}{2}s \cdot \tfrac{1}{2}s}}{2 \cdot \tfrac{1}{2} s} = \frac{1 \pm \sqrt{1 - s^2}}{s}.$$

Which root do we take? We know that $H(0) = P(T_{01} = 0) = 0$, because at least one step is needed to go from 0 to 1. With the positive root, $\lim_{s \to 0} H(s) = \lim_{s \to 0} \frac{1 + \sqrt{1 - s^2}}{s} = \infty$; so we take the negative root instead.
(To check that the negative root does give $H(0) = 0$, note the limit is of $0/0$ form and apply L'Hôpital's rule, $\lim_{s \to 0} \frac{f(s)}{g(s)} = \lim_{s \to 0} \frac{f'(s)}{g'(s)}$, with $f(s) = 1 - \sqrt{1 - s^2}$ and $g(s) = s$: the limit is $\lim_{s \to 0} \frac{s/\sqrt{1 - s^2}}{1} = 0$, as required.)
Notation for quick solutions of first-step analysis for finding PGFs
As before, let $T = T_{01}$ and $H(s) = E(s^T)$. The first-step decomposition can be written compactly as

$$T = 1 + \begin{cases} 0 & \text{with probability } \tfrac{1}{2} \quad \text{(first step up)},\\ T' + T'' & \text{with probability } \tfrac{1}{2} \quad \text{(first step down)}, \end{cases}$$

where $T'$ and $T''$ are independent copies of $T$. Taking expectations of $s^T$:

$$H(s) = E(s^T) = s\left\{\tfrac{1}{2} + \tfrac{1}{2} H(s)^2\right\}.$$

Thus:

$$s H(s)^2 - 2H(s) + s = 0,$$

and taking the negative root as before,

$$H(s) = \frac{1 - \sqrt{1 - s^2}}{s}.$$
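A simulation sketch of this formula for the symmetric walk: estimate $E(s^{T_{01}})$ by Monte Carlo and compare it with $(1 - \sqrt{1 - s^2})/s$. The step cap is an arbitrary safeguard; for the values of $s$ used, walks that exceed it contribute essentially nothing to $s^{T_{01}}$.

```python
import math
import random

def reach_time_01(max_steps=100_000):
    """Steps taken by a symmetric random walk from 0 to first hit 1 (None if capped)."""
    pos, steps = 0, 0
    while pos < 1 and steps < max_steps:
        pos += 1 if random.random() < 0.5 else -1
        steps += 1
    return steps if pos == 1 else None

reps = 10_000
times = [reach_time_01() for _ in range(reps)]

for s in (0.3, 0.6, 0.9):
    # capped walks would contribute s**T ~ 0, so treating them as 0 is harmless here
    estimate = sum(s**t for t in times if t is not None) / reps
    exact = (1 - math.sqrt(1 - s**2)) / s
    print(s, round(estimate, 3), round(exact, 3))
```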
4.10 Defective random variables
A random variable is said to be defective if it can take the value $\infty$. For a reaching time $T_{ij}$, this happens when there is a positive chance that the process never reaches state $j$ from state $i$:

$$P(T_{ij} = \infty) = P(\text{we NEVER reach state } j, \text{ starting from state } i).$$

In other cases, we will always reach state $j$ eventually, starting from state $i$, and $T_{ij}$ is finite with probability 1. In general, $T$ is defective if $P(T = \infty) > 0$.
If $T$ is defective, then

$$\sum_{t=0}^{\infty} P(T = t) < 1,$$

because we are missing the positive value of $P(T = \infty)$. All probabilities of $T$ must still sum to 1, so we have

$$1 = \sum_{t=0}^{\infty} P(T = t) + P(T = \infty), \qquad \text{i.e.} \qquad P(T = \infty) = 1 - \sum_{t=0}^{\infty} P(T = t).$$
PGFs for defective random variables
When T is defective, the PGF of T is defined as the power series
$$H(s) = \sum_{t=0}^{\infty} s^t P(T = t), \quad \text{for } |s| < 1.$$
Because $H(s)$ is a power series satisfying the conditions of Abel's Theorem, we know that:

• $H(s)$ is left-continuous at $s = 1$, i.e. $\lim_{s \uparrow 1} H(s) = H(1)$.

This is different from the behaviour of $E(s^T)$ when $T$ is defective. For $|s| < 1$ we have $E(s^T) = H(s)$, because the term corresponding to $T = \infty$ contributes $s^{\infty} = 0$. But we know what $E(s^T)$ is when $s = 1$:

• $E(s^T)$ is always 1 when $s = 1$, whether $T$ is defective or not: $E(1^T) = 1$ for ANY random variable $T$.

So if $T$ is defective, $E(s^T)$ jumps from $H(s)$ (for $s < 1$) to 1 at $s = 1$: it is not left-continuous at $s = 1$, even though $H(s)$ is.
Fact: Let $H(s) = \sum_{t=0}^{\infty} s^t P(T = t)$ be the power series representing the PGF of $T$, for $|s| < 1$. Then $T$ is defective if and only if $H(1) < 1$.
In particular, $H(1) = \sum_{t=0}^{\infty} P(T = t) = P(T < \infty)$, so $P(T = \infty) = 1 - H(1)$. The event that we never reach state $j$, starting from state $i$, is the same as the event that $T = \infty$. (If we wait an infinite length of time, we never get there.) So
P(never reach state j | start at state i) = P(T = ∞).
Overall:
P(never reach state j | start at state i) = P(T = ∞) = 1 − H(1).
When you are asked to find $E(T)$ in a context where $T$ might be defective:

• first check whether $T$ is defective, by checking whether $H(1) < 1$;
• if $T$ is defective, then $E(T) = \infty$, because $T$ takes the value $\infty$ with positive probability;
• only if $T$ is not defective (i.e. $H(1) = 1$) can you use $E(T) = H'(1)$. If you apply $E(T) = H'(1)$ to a defective $T$, you will get the wrong answer.
In the random walk in Section 4.9, we defined the first reaching time T01 as the
number of steps taken to get from state 0 to state 1.
Questions:

a) What is the probability that we never reach state 1, starting from state 0?

b) What is the expected number of steps to reach state 1, $E(T_{01})$?
Solution: a) The PGF of $T_{01}$ is $H(s) = \dfrac{1 - \sqrt{1 - s^2}}{s}$, so

$$H(1) = \frac{1 - \sqrt{1 - 1^2}}{1} = 1.$$

Thus $P(\text{never reach state 1} \mid \text{start at state 0}) = P(T_{01} = \infty) = 1 - H(1) = 0$: for the symmetric random walk, $T_{01}$ is not defective.
b) Because $T_{01}$ is not defective, we can find $E(T_{01})$ by differentiating the PGF: $E(T_{01}) = \lim_{s \uparrow 1} H'(s)$. Writing $H(s) = s^{-1} - \left(s^{-2} - 1\right)^{1/2}$,

$$H'(s) = -s^{-2} - \tfrac{1}{2}\left(s^{-2} - 1\right)^{-1/2}\left(-2s^{-3}\right) = -s^{-2} + \frac{s^{-3}}{\sqrt{s^{-2} - 1}} \;\to\; \infty \quad \text{as } s \uparrow 1,$$

so $E(T_{01}) = \infty$.
This result is striking. Even though we will definitely reach state 1, the expected time to do so is infinite! In general, we can prove the following results for a random walk that steps up with probability $p$ and down with probability $q = 1 - p$ at each step:

                      p > q          p = q = 1/2      p < q
  Reach state 1?      Guaranteed     Guaranteed       Not guaranteed
  P(T01 = ∞)          0              0                1 − p/q
  E(T01)              1/(p − q)      ∞                ∞
Note: it is an interesting exercise to see what happens when the walk drifts downwards ($p < q$), so that $T_{01}$ is defective. The same first-step analysis gives the quadratic $qs\,H(s)^2 - H(s) + ps = 0$, and $E(s^T) = H(s)$ equals the $-$ root of this quadratic for $s < 1$. At $s = 1$, the solution for $E(s^T)$ suddenly flips from the $-$ root to the $+$ root of the quadratic, because $E(1^T) = 1$ while $H(1) = p/q < 1$. This explains how $E(s^T)$ can be discontinuous as $s \uparrow 1$, even though the negative root for $H(s)$ is continuous as $s \uparrow 1$ and all the conditions of Abel's Theorem hold for the power series $H(s)$.
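A simulation sketch of the defective case ($p = 0.3$ up, $q = 0.7$ down, both arbitrary choices): walks that wander far below 0 are counted as never reaching state 1, which approximates $P(T_{01} = \infty) = 1 - p/q$.

```python
import random

def hits_one(p, max_steps=100_000, give_up_at=-40):
    """True if a walk from 0 (up prob p, down prob 1 - p) ever hits state 1.

    For p < 1 - p, the chance of reaching 1 from a state x < 1 is (p / (1 - p))**(1 - x),
    which is negligible once the walk is far below 0, so we stop early in that case."""
    pos = 0
    for _ in range(max_steps):
        pos += 1 if random.random() < p else -1
        if pos == 1:
            return True
        if pos <= give_up_at:
            return False
    return False

p, reps = 0.3, 50_000
never = sum(1 for _ in range(reps) if not hits_one(p)) / reps
print(round(never, 4), round(1 - p / (1 - p), 4))   # both approximately 0.5714
```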




81
84

88
91
93
95
96
97
99
100
102
104
106