
Continuous Random Variables and Moment Generating Functions OPRE 7310 Lecture Notes by Metin Çakanyıldırım

Compiled at 18:26 on Wednesday 27th September, 2017

A cdf F_X is absolutely continuous if ∑_{i=1}^n (F_X(b_i) − F_X(a_i)) can be bounded by ϵ for every given δ that bounds ∑_{i=1}^n (b_i − a_i) for finitely many disjoint intervals (a_i, b_i). Hence, an absolutely continuous function maps sufficiently tight intervals to arbitrarily tight intervals. In the special case, an absolutely continuous function maps intervals of length zero to intervals of length zero (p. 413 Billingsley 1995). In other words, you cannot make something out of nothing by passing it through an absolutely continuous function. The Cantor function, in contrast, maps the Cantor set of length zero onto the interval [0, 1] of length one, so it is not absolutely continuous.

[Table: Cantor set construction — for each iteration n, the available intervals, the length of the available intervals, and the length of the intervals removed in this iteration.]

The discrete random variable X has the probability mass function P(X = a), which is mostly nonzero. For an absolutely continuous random variable, we must set P(X = a) = 0. Otherwise, P(X = a) > 0 implies that the set A = [a, a] of Lebesgue measure zero has P(A) > 0, which contradicts the absolute continuity of X or of its cdf FX(a) = P(X ≤ a). FX is absolutely continuous if and only if there exists a nonnegative function fX such that

FX(a) = ∫_{−∞}^{a} fX(u) du.

Example: Consider the absolutely continuous cdf F(x) = 1I_{0≤x≤1} x + 1I_{x>1}. Let f0(x) = 1I_{0≤x≤1} and let fa(x) = f0(x) + a 1I_{x=a} for a ∈ [0, 1]. Since an alteration at a single point or alterations at countably many points do not affect the integral, f0 and every fa serve as a density for F; a cdf does not determine its density uniquely.

We define the cdf F(x) := P(X ≤ x) without any reference to a density. A density is often an outcome of this process but not the input. The connection to a density, if it exists, is through the integral. This puts the integral at the center of probability computations.


iii) Nonlinearity of variance: For constant c, V(c g(X)) = c² V(g(X)).

iv) V(X) = E(X²) − (E(X))².

If the joint density factors as f_{X,Y}(x, y) = f_X(x) f_Y(y) for every x and y, the random variables X and Y are said to be independent.


Example: For two independent random variables, we check the equality E(X1X2) = E(X1)E(X2):

E(X1X2) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x1 x2) fX1(x1) fX2(x2) dx1 dx2 = (∫_{−∞}^{∞} x1 fX1(x1) dx1)(∫_{−∞}^{∞} x2 fX2(x2) dx2) = E(X1)E(X2).

A similar argument yields V(X1 + X2) = V(X1) + V(X2). In summary, we have

v) For two independent random variables X1 and X2, E(X1X2) = E(X1)E(X2).
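As a quick numerical sanity check of property v), here is a minimal NumPy sketch that estimates both sides of the equality by simulation; the choices Expo(1) for X1 and U(0, 1) for X2, and the sample size, are illustrative assumptions, not from the notes.

```python
# Monte Carlo check of E(X1 X2) = E(X1) E(X2) for independent X1, X2.
import numpy as np

rng = np.random.default_rng(seed=0)
n = 10**6
x1 = rng.exponential(scale=1.0, size=n)  # E(X1) = 1
x2 = rng.uniform(0.0, 1.0, size=n)       # E(X2) = 0.5

print(np.mean(x1 * x2))                  # ~0.5
print(np.mean(x1) * np.mean(x2))         # ~0.5
```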

A uniform random variable X takes values between x̲ and x̄, and the probability of X being in any subinterval of [x̲, x̄] of a given length δ is the same. Its density f must therefore be a constant c over [x̲, x̄]:

1 = ∫_{x̲}^{x̄} f(u) du = ∫_{x̲}^{x̄} c du = c(x̄ − x̲).

Hence, f(u) = 1/(x̄ − x̲) for x̲ ≤ u ≤ x̄.

Example: For two independent uniform random variables X1, X2 ∼ U(0, 1), find the pdf of max{X1, X2}.

P(max{X1, X2} ≤ a) = P(X1 ≤ a, X2 ≤ a) = P(X1 ≤ a)P(X2 ≤ a) = a² for 0 ≤ a ≤ 1.

Differentiating the cdf, the pdf is f(a) = 2a for 0 ≤ a ≤ 1.
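A simulation makes this concrete; the sketch below compares the empirical cdf of max{X1, X2} with a² (the sample size and evaluation points are our choices).

```python
# Check P(max{X1, X2} <= a) = a^2 for X1, X2 ~ U(0, 1).
import numpy as np

rng = np.random.default_rng(seed=1)
m = rng.uniform(size=(10**6, 2)).max(axis=1)
for a in (0.25, 0.5, 0.9):
    print(a, np.mean(m <= a), a**2)      # empirical cdf vs. a^2
```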

An exponential random variable X satisfies

P(a ≤ X ≤ a + δ) = exp(−λ(a − b)) P(b ≤ X ≤ b + δ) for 0 ≤ a ≤ b and δ ≥ 0,
where the decay parameter λ is the parameter of the distribution. First letting δ → ∞ and then setting a = 0, we obtain the cdf

P(a ≤ X) = exp(−λ(a − b)) P(b ≤ X). Setting a = 0 gives 1 = exp(λb) P(b ≤ X), so P(b ≤ X) = exp(−λb) and FX(b) = P(X ≤ b) = 1 − exp(−λb). FX is differentiable and its derivative yields the density fX(x) = λ exp(−λx) for x ≥ 0.

From the cdf, we can obtain the complementary cdf F̄(x) = exp(−λx), which gives us the memoryless property:

P(X > a + b | X > a) = P(X > a + b)/P(X > a) = F̄(a + b)/F̄(a) = F̄(b) = P(X > b).

Translated into the lifetime of a machine, the memoryless property implies the following: If a machine is functioning at time a (has not failed over [0, a]), the probability that it will be functioning at time a + b (will not fail over [0, a + b]) is exactly the same as the probability that it functioned for the first b time units after its installation. This implies that if the machine has not failed, it is as good as new. The "as good as new" assumption is used in reliability theory along with other assumptions such as "new better than used".
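The memoryless property is easy to confirm by simulation; in this sketch the rate λ and the times a, b are arbitrary illustrative values.

```python
# Check P(X > a + b | X > a) = P(X > b) for X ~ Expo(lambda).
import numpy as np

rng = np.random.default_rng(seed=2)
lam, a, b = 0.5, 1.0, 2.0
x = rng.exponential(scale=1/lam, size=10**6)

print(np.mean(x > a + b) / np.mean(x > a))   # conditional survival probability
print(np.exp(-lam * b))                      # P(X > b) = exp(-lambda*b)
```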

Since 1 = P(X ≥ x̲), we have no choice but to set

P(X ≥ a) = (x̲/a)^α for a ≥ x̲.

E(X) = ∫_{x̲}^{∞} x (α x̲^α / x^{α+1}) dx = α x̲^α ∫_{x̲}^{∞} x^{−α} dx = α x̲^α [ x^{1−α}/(1 − α) ]_{x=x̲}^{∞} = (α/(α − 1)) x̲ for α > 1.

For α = 1,

E(X) = ∫_{x̲}^{∞} x (x̲/x²) dx = x̲ ∫_{x̲}^{∞} (1/x) dx = ∞.
This also implies that E(X) diverges for any α ≤ 1. The Pareto random variable presents an example where even the first moment does not exist. Note that when the first moment does not exist, the higher moments do not exist either.

For fixed λ and sufficiently large x, ln(x)/x ≤ λ(1 − λ). This inequality implies (1/(1 − λ)) ln(x) ≤ λx and in turn x^{−1/(1−λ)} ≥ e^{−λx}. When λ = 0.2, the inequality ln(x)/x ≤ λ(1 − λ) holds for x ≥ 19, e.g., ln(19)/19 = 0.1549 ≤ 0.16 = 0.2(1 − 0.2). When λ = 0.4, it holds for x ≥ 10, e.g., ln(10)/10 = 0.2302 ≤ 0.24 = 0.4(1 − 0.4). For every λ, we can find an interval of the form [x, ∞) such that the Pareto random variable places more probability in this interval than the Exponential random variable. In summary, the Pareto random variable has a heavier tail than the Exponential random variable.
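The two inequalities above can be checked numerically; this sketch scans a grid up to x = 1000 (an arbitrary cutoff) for the thresholds quoted in the text.

```python
# Verify ln(x)/x <= lam*(1-lam) and x**(-1/(1-lam)) >= exp(-lam*x) on [x0, 1000].
import numpy as np

for lam, x0 in ((0.2, 19.0), (0.4, 10.0)):
    xs = np.linspace(x0, 1000.0, 5000)
    assert np.all(np.log(xs) / xs <= lam * (1 - lam))
    assert np.all(xs**(-1 / (1 - lam)) >= np.exp(-lam * xs))
    print("inequalities hold for lambda =", lam, "on [", x0, ", 1000 ]")
```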

Distributions whose tails are heavier than Exponential are called heavy tail distributions. Pareto is an example of heavy tail distributions. Heavy tail distributions leave significant probability to their tails where the random variable takes large values. Multiplication of these large values by large tail probabilities can yield infinite mean and variance. This may be disappointing but does not indicate any inconsistency within probability theory. If anything, you should be careful when applying results requiring finite moments to arbitrary random variables.

On page 501 of Pareto (1897), he says:

… every endeavor must be made to discover if the distribution of wealth presents any uniformity at all. Fortunately the figures representing the distribution of wealth group themselves according to a very simple law, which I have been enabled to deduce from unquestioned statistical data. … N = A/(x + b)^α, in which N represents the number of individuals having an income greater than x.

For the sum X = X1 + X2 of two independent Expo(λ) random variables,

P(X ≤ x) = P(X1 + X2 ≤ x) = ∫_0^x P(X2 ≤ x − u) λe^{−λu} du = ∫_0^x (1 − e^{−λ(x−u)}) λe^{−λu} du
= ∫_0^x λe^{−λu} du − ∫_0^x λe^{−λx} du = 1 − e^{−λx} − λx e^{−λx}.

More generally, the Gamma(α, λ) density is

f(x) = λ e^{−λx} (λx)^{α−1}/Γ(α) for x ≥ 0 and α > 0.

You can specialize this formula to α = 2 to see that it is consistent with the result of the last exercise. We use Gamma(α, λ) to denote a Gamma random variable.

The probabilities for the second floor are P(Gamma(5, λ) ≥ 10) and P(Gamma(5, λ) ≥ 10 | Gamma(5, λ) ≥ 8).

Example: For integer α, find E(Gamma(α, λ)) and V(Gamma(α, λ)). We can start with the first two moments of the Exponential random variable: E(Expo(λ)) = 1/λ and E(Expo(λ)²) = 2/λ², which imply V(Expo(λ)) = 1/λ². Since α is an integer, we can think of Gamma as a sum of α independent exponential random variables, so we can apply the formulas for the expected value and variance of a sum of independent random variables: E(Gamma(α, λ)) = α/λ and V(Gamma(α, λ)) = α/λ².
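Since the notes build Gamma(α, λ) from α independent Expo(λ) random variables, a direct simulation should reproduce the mean α/λ and variance α/λ²; the parameter values below are illustrative.

```python
# Gamma(alpha, lam) as a sum of alpha iid Expo(lam) draws.
import numpy as np

rng = np.random.default_rng(seed=3)
alpha, lam = 5, 2.0
g = rng.exponential(scale=1/lam, size=(10**6, alpha)).sum(axis=1)
print(g.mean(), alpha / lam)             # ~2.5
print(g.var(), alpha / lam**2)           # ~1.25
```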

[Figure: Gamma densities plotted over 0 ≤ x ≤ 8; the vertical axis shows the density from 0.0 to 1.0.]

The Beta function is defined as B(k, m) = ∫_0^1 u^{k−1}(1 − u)^{m−1} du. As we show in the Appendix, B(k, m) = Γ(k)Γ(m)/Γ(k + m).

Having defined the Beta function, let us consider an experiment which helps us relate the Beta random variable to the other random variables we have already studied. We consider Bernoulli trials where the success probability is constant but unknown. That is, let P be a random variable denoting the success probability and suppose that the success probability in each of the Bernoulli trials is P. After n trials, suppose that we observe m successes. Given these m successes, we can update the distribution of P; we are seeking the conditional density f_{P|Bin(n,P)=m}(p):

f_{P|Bin(n,P)=m}(p) = f_P(p) P(Bin(n, p) = m) / ∫_0^1 f_P(u) P(Bin(n, u) = m) du,

where f_P is the pdf of the success probability P. To simplify further, suppose that f_P is the uniform density; then the binomial coefficients cancel and

f_{P|Bin(n,P)=m}(p) = p^m (1 − p)^{n−m} / ∫_0^1 u^m (1 − u)^{n−m} du = p^m (1 − p)^{n−m} / B(m + 1, n − m + 1),

which is the Beta(m + 1, n − m + 1) density.

Beta random variable can be used to represent percentages and proportions.
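The posterior derivation above can be checked numerically: normalizing p^m (1 − p)^{n−m} directly should match the Beta(m + 1, n − m + 1) density from scipy. The values n = 10, m = 7 are an arbitrary illustration.

```python
# Posterior of P under a uniform prior after m successes in n trials.
from scipy.integrate import quad
from scipy.stats import beta

n, m = 10, 7
like = lambda p: p**m * (1 - p)**(n - m)
Z, _ = quad(like, 0.0, 1.0)              # equals B(m+1, n-m+1)

for p in (0.3, 0.5, 0.7):
    print(like(p) / Z, beta.pdf(p, m + 1, n - m + 1))  # should agree
```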

Example: Find E(X) for X ∼ Beta(k, m), whose density is x^{k−1}(1 − x)^{m−1}/B(k, m) for 0 ≤ x ≤ 1:

E(X) = ∫_0^1 x x^{k−1}(1 − x)^{m−1}/B(k, m) dx = B(k + 1, m)/B(k, m) = k/(k + m).

3.6 Normal Random Variable

The normal random variable plays a central role in probability and statistics. Its density has light tails, so many observations are expected to be distributed around its mean. The density is symmetric and has a range over the entire real line, so the normal random variable can take negative values. The density for a normally distributed X with mean µ and variance σ² is

fX(x) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)} for −∞ < x < ∞.

The cumulative distribution function at an arbitrary x cannot be found analytically, so we refer to published tables or software for probability computation. Nevertheless, we can analytically integrate over the entire range. We use Normal(µ, σ²) to denote a normal random variable with mean µ and standard deviation σ.

Example: Establish that the normal density integrates to 1 over ℝ. We first establish by using z = (x − µ)/σ that

∫_{−∞}^{∞} (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)} dx = ∫_{−∞}^{∞} (1/√(2π)) e^{−z²/2} dz.

To evaluate the right-hand side, consider the product of two copies of it:

(∫_{−∞}^{∞} (1/√(2π)) e^{−z1²/2} dz1)(∫_{−∞}^{∞} (1/√(2π)) e^{−z2²/2} dz2) = (1/(2π)) ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−(z1²+z2²)/2} dz1 dz2.

Now consider the transformation r² = z1² + z2² and θ = arccos(z1/r). Hence, z1 = r cos θ and z2 = r sin θ. The Jacobian for this transformation is

J(r, θ) = [ cos θ  −r sin θ ; sin θ  r cos θ ],

whose determinant is just r. Thus,

(1/(2π)) ∫_0^{2π} dθ ∫_0^{∞} r e^{−r²/2} dr = (1/(2π)) (2π) (1) = 1.

Example: Find E(X). Using z = (x − µ)/σ,

E(X) = ∫_{−∞}^{∞} x (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)} dx = ∫_{−∞}^{∞} (µ + σz) (1/√(2π)) e^{−z²/2} dz = µ + (σ/√(2π)) ∫_{−∞}^{∞} z e^{−z²/2} dz = µ.

The last integral is zero: substituting z = −u gives ∫_{−∞}^{∞} z e^{−z²/2} dz = ∫_{∞}^{−∞} (−u) e^{−(−u)²/2} (−du) = −∫_{−∞}^{∞} u e^{−u²/2} du, so the integral equals its own negative and must vanish.

From the last example, we can see that any random variable can be standardized by deducting its mean and dividing by its standard deviation; the standard normal random variable is Normal(0, 1).

X is a lognormally distributed random variable if ln X is Normal(µ, σ²). That is, X can be expressed as

X = e^Y for Y ∼ Normal(µ, σ²).

P(X ≤ x) = P(Y ≤ ln x) = ∫_{−∞}^{ln x} (1/(√(2π) σ)) e^{−(y−µ)²/(2σ²)} dy.

Taking the derivative with respect to x yields the density

fX(x) = (1/(x √(2π) σ)) e^{−(ln x − µ)²/(2σ²)} for x > 0.

Example: Find E(X^k). Since X^k = e^{kY}, completing the square in the exponent gives

E(X^k) = ∫_{−∞}^{∞} e^{ky} (1/(√(2π) σ)) e^{−(y−µ)²/(2σ²)} dy = e^{kµ + k²σ²/2}.
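The moment formula can be verified by simulation; µ, σ, and the sample size below are illustrative choices.

```python
# Check E(X^k) = exp(k*mu + k^2*sigma^2/2) for X = e^Y, Y ~ Normal(mu, sigma^2).
import numpy as np

rng = np.random.default_rng(seed=4)
mu, sigma = 0.1, 0.3
y = rng.normal(mu, sigma, size=10**6)
for k in (1, 2):
    print(np.mean(np.exp(k * y)), np.exp(k * mu + k**2 * sigma**2 / 2))
```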

Given two random variables X and Y, their mixture Z can be obtained as

Z = X with probability p and Z = Y with probability 1 − p.

We can easily obtain the pdf of Z if both X and Y are continuous random variables: fZ(z) = p fX(z) + (1 − p) fY(z).

When one of the mixing distributions is discrete and another one is continuous, we do not have a pdf.

Random variables derived from others through max or min operators can have masses at the ends of their ranges. Product returns to a retailer due to unmet expectations can be treated as negative demand; then the demand D takes values in ℝ. If you want only the new customer demand for products, you should consider max{0, D}, which has a mass at 0. When a person buys a product, you know that his willingness to pay W is more than the price p. The willingness-to-pay of customers who already bought the product is max{p, W}. When you have a capacity of c and accept only demand that is less than the capacity, the accepted demand becomes min{c, D}. The last two derived random variables have masses at p and c.

4 Moment Generating Functions

The moment generating function of a random variable X is mX(t) = E(exp(tX)), that is,

mX(t) = ∑_{x=−∞}^{∞} exp(tx) pX(x) and mX(t) = ∫_{−∞}^{∞} exp(tx) fX(x) dx

for discrete and continuous random variables, respectively. Expanding the exponential in the discrete case,

mX(t) = ∑_{x=−∞}^{∞} (1 + tx + (t²/2!) x² + ⋯ + (tⁱ/i!) xⁱ + ⋯) pX(x)
= ∑_x pX(x) + t ∑_x x pX(x) + (t²/2!) ∑_x x² pX(x) + ⋯ + (tⁱ/i!) ∑_x xⁱ pX(x) + ⋯
= 1 + t E(X) + (t²/2!) E(X²) + ⋯ + (tⁱ/i!) E(Xⁱ) + ⋯

From the last equality, we obtain mX(t = 0) = 1. To pick up only the E(X) term on the right-hand side, we consider the derivative of mX(t) with respect to t. Provided that the derivative can be pushed into the summation in the last equation above, we can obtain

For example,

d mX(t)/dt = E(X) + ∑_{i=2}^{∞} E(Xⁱ) tⁱ⁻¹/(i − 1)! and d² mX(t)/dt² = E(X²) + ∑_{i=3}^{∞} E(Xⁱ) tⁱ⁻²/(i − 2)!.

Then evaluating the derivatives at t = 0 picks up E(X) and E(X²); in general,

E(X^k) = (d^k/dt^k) mX(t) |_{t=0}.

For independent random variables X and Y, mX+Y(t) = E(exp(t(X + Y))) = E(exp(tX))E(exp(tY)) = mX(t)mY(t),

where independence of exp(tX) and exp(tY) is used in the middle equality.

Example: For an exponential random variable X, find the moment generating function mX(t) and the moments E(X^k).

mX(t) = E(e^{tX}) = ∫_0^{∞} e^{tx} λe^{−λx} dx = λ ∫_0^{∞} e^{−(λ−t)x} dx = λ (−1/(λ − t)) e^{−(λ−t)x} |_{x=0}^{∞} = λ/(λ − t) for t < λ.

Differentiating k times and setting t = 0 gives E(X^k) = k!/λ^k. Then

mGamma(n,λ)(t) = ∏_{i=1}^{n} mExpo(λ)(t) = (λ/(λ − t))^n.
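Differentiating the mgf at t = 0 is mechanical, so a computer algebra system can reproduce the moments; this sympy sketch recovers E(X^k) = k!/λ^k from the exponential mgf.

```python
# Moments of Expo(lambda) from derivatives of m_X(t) = lambda/(lambda - t) at t = 0.
import sympy as sp

t, lam = sp.symbols("t lambda", positive=True)
m = lam / (lam - t)
for k in (1, 2, 3):
    print(k, sp.simplify(sp.diff(m, t, k).subs(t, 0)))  # 1/lambda, 2/lambda**2, 6/lambda**3
```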

Example: Use the moment generating function to obtain the distribution of the sum X1 + X2 of two independent Normal random variables, where X1 ∼ Normal(µ1, σ1²) and X2 ∼ Normal(µ2, σ2²). For X ∼ Normal(µ, σ²), substituting z = (x − µ)/σ,

mX(t) = ∫_{−∞}^{∞} e^{tx} (1/(√(2π) σ)) e^{−0.5(x−µ)²/σ²} dx = e^{µt} e^{0.5σ²t²} ∫_{−∞}^{∞} (1/√(2π)) e^{−0.5(z−σt)²} dz = e^{µt + 0.5σ²t²}.

Hence,

mX1+X2(t) = mX1(t) mX2(t) = e^{(µ1+µ2)t + 0.5(σ1²+σ2²)t²},

so X1 + X2 ∼ Normal(µ1 + µ2, σ1² + σ2²).

5 Application: From Lognormal Distribution to Geometric Brownian Motion
If St denotes the price of a security (a stock, a bond, or their combinations) at time t ≥ 0 for given S0 = s0 known now, we can consider
St/S0 = (St/St−1)(St−1/St−2) ⋯ (S2/S1)(S1/S0).

Suppose that each ratio Si/Si−1 is independent of the history Si−1, Si−2, . . . , S1, S0 and is given by a Lognormal distribution with parameters µ and σ². Then

St = s0 Xt,

where Xt is a lognormal random variable with parameters µt and σ²t, since ln Xt = ∑_{i=1}^{t} ln(Si/Si−1) is the sum of t independent Normal(µ, σ²) random variables. Then we can immediately obtain the mean and variance of the security price:

E(St) = s0 e^{µt + σ²t/2} and V(St) = s0² e^{2µt + σ²t}(e^{σ²t} − 1).
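A short simulation can confirm the mean formula; s0, µ, σ, and t below are illustrative values, not from the notes.

```python
# Simulate S_t = s0 * X_t with ln X_t ~ Normal(mu*t, sigma^2*t).
import numpy as np

rng = np.random.default_rng(seed=5)
s0, mu, sigma, t = 100.0, 0.01, 0.05, 12
st = s0 * np.exp(rng.normal(mu * t, sigma * np.sqrt(t), size=10**6))
print(st.mean(), s0 * np.exp(mu * t + sigma**2 * t / 2))   # should agree
```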


6 Exercises

Xi? Can you use mXi(t) to easily compute mX(t) for the Binomial random variable X = ∑_{i=1}^{n} Xi? If so, how?

7. When X is demand and a is inventory, E(1IX≥a(X − a)) is the expected inventory shortage. For X = Normal(µ, σ2) and Z = Normal(0, 1), establish that

Inventory Shortage: E(1I_{X≥a}(X − a)) = σ fZ((a − µ)/σ) − (a − µ)(1 − FZ((a − µ)/σ)).
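One way to convince yourself of this identity before proving it is a numerical check; the sketch below compares the closed form against a direct simulation of E(1I_{X≥a}(X − a)) for illustrative values µ = 100, σ = 20, a = 110.

```python
# Shortage formula vs. simulation for X ~ Normal(mu, sigma^2), Z ~ Normal(0, 1).
import numpy as np
from scipy.stats import norm

mu, sigma, a = 100.0, 20.0, 110.0
z = (a - mu) / sigma
formula = sigma * norm.pdf(z) - (a - mu) * (1 - norm.cdf(z))

rng = np.random.default_rng(seed=6)
x = rng.normal(mu, sigma, size=10**6)
print(formula, np.mean(np.maximum(x - a, 0.0)))   # should agree
```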

11. We set October as month 0, and use S1 and S2 to denote stock prices for the months of November and December. We assume that S2/S1 and S1/S0 have independent Lognormal distributions. In other words, the price St at the end of quarter t is given by St = Mt St−1 for S0 = s0. The logarithm of the multiple Mt has a normal distribution whose variance remains constant at σ² over the quarters but its mean increases proportionally: E(ln Mt) = µt. What is the distribution of S4 at the end of the year? Suppose that

for appropriate numbers m, n ≤ k. Find out what m, n are in terms of k. Specialize your formula to k = 1 and k = 2 and provide these special versions.

14. A lake hosts N = 3 frogs, which start sunbathing together every day. Frogs are very particular about the dryness of their skin, so they jump into the water after a while and finish their sunbathing. Each frog sunbathes for an Expo(λ) amount of time independent of the other frogs. Let Xn be the duration of the time at least n frogs are sunbathing together. Find the cdfs of X1 and X3. Does X1 or X3 have one of the common distributions?

d) Suppose that the Xi's are independent. Is it always true that each Xi is an exponential random variable with parameter λ? If yes, prove this statement. Otherwise, provide a counterexample.

16. Suppose that Xi is the wealth of individuals in country i, whose population is ni, for i = 1, 2. Let X1, X2 be iid Pareto random variables such that P(Xi ≥ a) = 1/a for a ≥ 1. P(Xi ≥ a) can also be interpreted as the percentage of the population with wealth a or more. We make some groups of individuals by picking ñ1 individuals from country 1 and ñ2 individuals from country 2 such that ñ1/ñ2 = n1/n2: the representation of each country in these groups is proportional to its size. Let X be the average wealth of individuals in any one of these groups: X = (ñ1X1 + ñ2X2)/(ñ1 + ñ2). Groups have identically distributed average wealths and we are interested in the distribution of this wealth.


a) Evaluate the tail probability as one of the countries becomes very large, i.e., ni → ∞.

b) Evaluate the tail probability when the countries have the same size.


7 Appendix: Change of Variables in Double Integrals and Beta Function

Consider the change of variables x = X(u, v) and y = Y(u, v) for (x, y) ∈ S, (u, v) ∈ T.

Note that X and Y are functions of (u, v), so we can think of partial derivatives of the form ∂X/∂u, ∂X/∂v, ∂Y/∂u, ∂Y/∂v, collected in the Jacobian matrix

J(u, v) = [ ∂X/∂u  ∂X/∂v ; ∂Y/∂u  ∂Y/∂v ].

The change of variables rule is

∫∫_S f(x, y) dx dy = ∫∫_T f(X(u, v), Y(u, v)) |J(u, v)| du dv,

where |J(u, v)| denotes the absolute value of the determinant of the Jacobian matrix.

Example: Use the change of variables rule above to establish the connection between the Beta function and the Gamma function. Let u = x/(x + y) and v = x + y so that x = X(u, v) = uv and y = Y(u, v) = v(1 − u). Then the Jacobian matrix is

J(u, v) = [ v  u ; −v  1 − u ],

whose determinant is v(1 − u) + uv = v. Applying the change of variables rule,

Γ(k)Γ(m) = ∫_0^∞ ∫_0^∞ x^{k−1} y^{m−1} e^{−(x+y)} dx dy = ∫_0^1 ∫_0^∞ (uv)^{k−1} (v(1 − u))^{m−1} e^{−v} v dv du
= ∫_0^1 u^{k−1} (1 − u)^{m−1} du ∫_0^∞ v^{k+m−1} e^{−v} dv = B(k, m) Γ(k + m).
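The identity B(k, m) = Γ(k)Γ(m)/Γ(k + m) can also be confirmed numerically; the parameter pairs below are arbitrary test values.

```python
# Numerical check of B(k, m) = Gamma(k)Gamma(m)/Gamma(k + m).
from math import gamma
from scipy.integrate import quad

for k, m in ((2.0, 3.0), (0.5, 0.5), (4.0, 1.5)):
    b, _ = quad(lambda u: u**(k - 1) * (1 - u)**(m - 1), 0.0, 1.0)
    print(b, gamma(k) * gamma(m) / gamma(k + m))
```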

8 Appendix: Induction over Natural Numbers

In these notes, we take the natural numbers as N = {0, 1, 2, . . . }, choosing to include 0 in N. However, some authors exclude 0 and start the natural numbers at 1.

P5: If S ⊂ N is such that i) 0 ∈ S and ii) x ∈ N ∩ S =⇒ δ(x) ∈ S, then S = N.

By P1 and P3, 0 can be thought of as the initial element of the natural numbers in the sense that 0 is not a successor of any other natural number. P2 introduces the successor function and P4 requires that if the successors of two natural numbers are equal, then the numbers themselves are equal.

A natural number cannot be its own successor: If x ∈ N, then δ(x) ̸= x.
Each natural number except 0 has a unique predecessor: If x ∈ N \ {0}, then there exists y such that x = δ(y).

There exists a unique function f : N × N → N such that f(x, 1) = δ(x) and f(x, δ(y)) = δ(f(x, y)) for all x, y ∈ N.

Theorem (Induction): If i) S(0) holds and ii) S(k) =⇒ S(k + 1), then S(n) holds for every n ∈ N.

One cannot help but notice the similarity between the theorem statement and axiom P5, so the proof is based on P5. Let A = {n ∈ N : S(n) is true}. Then by condition i) in the theorem statement, 0 ∈ A. By ii) in the theorem statement and S(k = 0) being true, we obtain that S(k + 1 = 1) is true. Then 1 ∈ A. In general, k ∈ A implies k + 1 ∈ A. Hence A = N by P5. That is, S(n) holds for every n ∈ N. Does the induction extend to infinity? To answer this, we ask a more fundamental question: Is infinity a natural number?

There can be other versions of the induction theorem (by initializing with S(m) for m > 0 or by assuming S(m) for all m ≤ k in the kth induction step). I have even found a description of induction over real numbers by considering intervals and proceeding from a tighter interval towards a broader one (see The In…).

PageId: DOCB08E81F