adding random variables
adding RVs
brian avery
September 16, 2018
library(ggplot2)
set.seed(87349)
adding UNIF
First, simulate six independent \(UNIF(0,1)\) RVs (U through Z) and the sums of the first two, three, four, and five of them, and store them all in a data frame.
uniforms <- data.frame(U=runif(100000), V=runif(100000), W=runif(100000),
X=runif(100000), Y=runif(100000), Z=runif(100000))
uniforms$two <- uniforms$U + uniforms$V
uniforms$three <- uniforms$U + uniforms$V + uniforms$W
uniforms$four <- uniforms$U + uniforms$V + uniforms$W + uniforms$X
uniforms$five <- uniforms$U + uniforms$V + uniforms$W + uniforms$X + uniforms$Y
head(uniforms)
## U V W X Y Z
## 1 0.2237303 0.212718492 0.21313572 0.8592435 0.230055725 0.39315550
## 2 0.6732303 0.049370517 0.09932315 0.5116051 0.323103104 0.51446665
## 3 0.1352415 0.004280132 0.06291041 0.2264902 0.005689052 0.61851831
## 4 0.8180958 0.846841856 0.58342319 0.5776902 0.170271991 0.04613788
## 5 0.4162543 0.663753846 0.88186666 0.1426889 0.240065767 0.08535646
## 6 0.5642494 0.099078418 0.60979669 0.2343633 0.838472871 0.17989737
## two three four five
## 1 0.4364488 0.6495845 1.5088279 1.7388837
## 2 0.7226008 0.8219240 1.3335291 1.6566322
## 3 0.1395216 0.2024320 0.4289222 0.4346113
## 4 1.6649377 2.2483608 2.8260510 2.9963230
## 5 1.0800081 1.9618748 2.1045637 2.3446294
## 6 0.6633279 1.2731245 1.5074878 2.3459607
Each pair of graphs is:
Left side: histogram of the RV.
Right side: density of the simulated RV in blue, and a Normal density with the same mean and sd as the simulated RV in orange.
single \(UNIF(0,1)\)
ggplot(uniforms, aes(Z)) + geom_histogram(boundary=0) + theme_minimal()
ggplot(uniforms, aes(Z)) + geom_density(color="blue", size=1) +
stat_function(fun=dnorm, args=list(mean=mean(uniforms$Z),
sd=sd(uniforms$Z)), color="orange", size=1) +
theme_minimal()
two \(UNIF(0,1)\)
ggplot(uniforms, aes(two)) + geom_histogram() + theme_minimal()
ggplot(uniforms, aes(two)) + geom_density(color="blue", size=1) +
stat_function(fun=dnorm, args=list(mean=mean(uniforms$two),
sd=sd(uniforms$two)), color="orange", size=1) +
theme_minimal()
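The sum of two independent \(UNIF(0,1)\) RVs has an exact triangular density, \(f(x) = 1 - |x - 1|\) for \(0 \le x \le 2\) and 0 otherwise. This is a minimal sketch of that density as a hypothetical helper `dtri2()`, checked against a fresh simulation. In the plot above, `stat_function(fun = dtri2)` could be overlaid the same way as the `dnorm` curve.

```r
# exact density of U + V for independent U, V ~ UNIF(0,1): triangular on [0, 2]
dtri2 <- function(x) ifelse(x >= 0 & x <= 2, 1 - abs(x - 1), 0)

# a density should integrate to 1
total <- integrate(dtri2, lower = 0, upper = 2)$value

# P(U + V <= 0.5) is the area of the small triangle, 0.5^2 / 2 = 0.125
p_exact <- integrate(dtri2, lower = 0, upper = 0.5)$value

# the same probability estimated from a fresh simulation
set.seed(1)
s <- runif(100000) + runif(100000)
p_sim <- mean(s <= 0.5)

c(total = total, p_exact = p_exact, p_sim = p_sim)
```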
three \(UNIF(0,1)\)
ggplot(uniforms, aes(three)) + geom_histogram() + theme_minimal()
ggplot(uniforms, aes(three)) + geom_density(color="blue", size=1) +
stat_function(fun=dnorm, args=list(mean=mean(uniforms$three),
sd=sd(uniforms$three)), color="orange", size=1) +
theme_minimal()
The exact density of the sum of three \(UNIF(0,1)\) is built from parabolic pieces, but by eye the overall shape looks much more like a Normal than like a parabola.
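For reference, the exact density of the sum \(S_3\) of three independent \(UNIF(0,1)\) RVs (the Irwin-Hall distribution with \(n = 3\)) is piecewise quadratic:

\[
f_{S_3}(x) =
\begin{cases}
\tfrac{1}{2}x^2, & 0 \le x \le 1,\\
\tfrac{1}{2}\left(-2x^2 + 6x - 3\right), & 1 \le x \le 2,\\
\tfrac{1}{2}(3 - x)^2, & 2 \le x \le 3,\\
0, & \text{otherwise.}
\end{cases}
\]

The three parabolas join with matching values and slopes at \(x = 1\) and \(x = 2\), so the curve is smooth. Its peak \(f_{S_3}(1.5) = 0.75\) is close to the peak of the Normal with the same mean \(1.5\) and sd \(\sqrt{3/12} = 0.5\), which is \(1/(0.5\sqrt{2\pi}) \approx 0.798\).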
four \(UNIF(0,1)\)
ggplot(uniforms, aes(four)) + geom_histogram() + theme_minimal()
ggplot(uniforms, aes(four)) + geom_density(color="blue", size=1) +
stat_function(fun=dnorm, args=list(mean=mean(uniforms$four),
sd=sd(uniforms$four)), color="orange", size=1) +
theme_minimal()
five \(UNIF(0,1)\)
ggplot(uniforms, aes(five)) + geom_histogram() + theme_minimal()
ggplot(uniforms, aes(five)) + geom_density(color="blue", size=1) +
stat_function(fun=dnorm, args=list(mean=mean(uniforms$five),
sd=sd(uniforms$five)), color="orange", size=1) +
theme_minimal()
As you add more \(UNIF(0,1)\) RVs together, the sum looks more and more Normal, as the Central Limit Theorem predicts.
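To put a number on "more and more Normal", one option is to compare the exact density of the sum of \(n\) uniforms (the Irwin-Hall density, coded here as a hypothetical helper `dirwinhall()`) with the Normal of the same mean \(n/2\) and sd \(\sqrt{n/12}\), and record the largest vertical gap over a grid. This is a sketch, not part of the original analysis. The gap should shrink as \(n\) grows.

```r
# Irwin-Hall density: exact density of the sum of n independent UNIF(0,1) RVs
# f(x) = 1/(n-1)! * sum_{k=0}^{floor(x)} (-1)^k * choose(n, k) * (x - k)^(n - 1), 0 <= x <= n
dirwinhall <- function(x, n) {
  sapply(x, function(xi) {
    if (xi < 0 || xi > n) return(0)
    k <- 0:floor(xi)
    k <- k[k <= n]
    sum((-1)^k * choose(n, k) * (xi - k)^(n - 1)) / factorial(n - 1)
  })
}

# largest gap between the exact density and the matching Normal, on a grid over [0, n]
max_gap <- sapply(1:5, function(n) {
  x <- seq(0, n, length.out = 2001)
  max(abs(dirwinhall(x, n) - dnorm(x, mean = n / 2, sd = sqrt(n / 12))))
})
round(max_gap, 3)
```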
adding Normals
standard Normals (0,1)
First, simulate six independent \(N(0,1)\) RVs (U through Z) and the sums of the first two, three, four, and five of them, and store them all in a data frame.
norms <- data.frame(U=rnorm(100000), V=rnorm(100000), W=rnorm(100000),
X=rnorm(100000), Y=rnorm(100000), Z=rnorm(100000))
norms$two <- norms$U + norms$V
norms$three <- norms$U + norms$V + norms$W
norms$four <- norms$U + norms$V + norms$W + norms$X
norms$five <- norms$U + norms$V + norms$W + norms$X + norms$Y
head(norms)
## U V W X Y Z
## 1 0.9581680 0.1310740 -1.3868298 -2.8754473 1.00020538 0.006235532
## 2 0.5813510 -0.1671726 0.2535913 -1.8112824 -0.02783789 -0.157884604
## 3 1.1681468 -1.3054014 0.5745714 1.1506819 2.65099417 1.064256780
## 4 0.6596708 -0.1528759 -1.3677272 -0.8540412 -1.64139015 1.436712707
## 5 -0.7457130 -1.5081185 -1.4254729 0.1393246 -1.53245196 -2.052007159
## 6 0.2144402 0.3832123 1.5384686 -0.7112149 0.61895148 1.614482291
## two three four five
## 1 1.0892420 -0.2975879 -3.173035 -2.172830
## 2 0.4141783 0.6677696 -1.143513 -1.171351
## 3 -0.1372546 0.4373167 1.587999 4.238993
## 4 0.5067949 -0.8609323 -1.714974 -3.356364
## 5 -2.2538314 -3.6793043 -3.539980 -5.072432
## 6 0.5976525 2.1361211 1.424906 2.043858
Each pair of graphs is:
Left side: histogram of the RV.
Right side: density of the simulated RV in blue, and the theoretical Normal density of the sum (mean 0, sd \(\sqrt n\) for a sum of \(n\) independent \(N(0,1)\) RVs) in orange.
one \(N(0,1)\)
ggplot(norms, aes(Z)) + geom_histogram() + theme_minimal()
ggplot(norms, aes(Z)) + geom_density(color="blue", size=1) +
stat_function(fun=dnorm, args=list(mean=0,
sd=1), color="orange", size=1) +
theme_minimal()
two \(N(0,1)\)
ggplot(norms, aes(two)) + geom_histogram() + theme_minimal()
ggplot(norms, aes(two)) + geom_density(color="blue", size=1) +
stat_function(fun=dnorm, args=list(mean=0,
sd=sqrt(2)), color="orange", size=1) +
theme_minimal()
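The orange curves in this section use the theoretical sd rather than the sample sd. If \(U, V, \dots\) are independent \(N(0,1)\), their variances add, so the sum of \(n\) of them has variance \(n\) and sd \(\sqrt n\). That is why `sd=sqrt(2)` appears above. A quick simulation check in base R (a sketch with its own fresh draws; `z` and `sample_sd` are illustrative names):

```r
set.seed(2)
n_sims <- 100000

# each column is an independent N(0,1) RV
z <- matrix(rnorm(5 * n_sims), ncol = 5)

# sample sd of the sum of the first n columns, for n = 1..5
sample_sd <- sapply(1:5, function(n) sd(rowSums(z[, 1:n, drop = FALSE])))

rbind(sample_sd = sample_sd, theory_sd = sqrt(1:5))
```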
three \(N(0,1)\)
ggplot(norms, aes(three)) + geom_histogram() + theme_minimal()
ggplot(norms, aes(three)) + geom_density(color="blue", size=1) +
stat_function(fun=dnorm, args=list(mean=0,
sd=sqrt(3)), color="orange", size=1) +
theme_minimal()
four \(N(0,1)\)
ggplot(norms, aes(four)) + geom_histogram() + theme_minimal()
ggplot(norms, aes(four)) + geom_density(color="blue", size=1) +
stat_function(fun=dnorm, args=list(mean=0,
sd=sqrt(4)), color="orange", size=1) +
theme_minimal()
five \(N(0,1)\)
ggplot(norms, aes(five)) + geom_histogram() + theme_minimal()
ggplot(norms, aes(five)) + geom_density(color="blue", size=1) +
stat_function(fun=dnorm, args=list(mean=0,
sd=sqrt(5)), color="orange", size=1) +
theme_minimal()
Densities of the sums of 1 (dark grey), 2 (green), 3 (orange), 4 (blue), and 5 (red) \(N(0,1)\) RVs, all plotted on the same axes.
ggplot(norms) + geom_density(aes(Z), color="grey20", size=1) +
geom_density(aes(two), color="green", size=1) +
geom_density(aes(three), color="orange", size=1) +
geom_density(aes(four), color="blue", size=1) +
geom_density(aes(five), color="red", size=1) +
xlab("simulated value") + xlim(c(-12,12)) + theme_minimal()
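The curves get lower as well as wider because the \(N(0,\sqrt n)\) density peaks at \(x = 0\) with height \(1/\sqrt{2\pi n}\). Each extra \(N(0,1)\) spreads the same total area of 1 over a wider range. A small check of those peak heights using `dnorm()`:

```r
# peak height of the N(0, sqrt(n)) density, for sums of n = 1..5 independent N(0,1) RVs
n <- 1:5
peak <- dnorm(0, mean = 0, sd = sqrt(n))

# compare with the closed form 1 / sqrt(2 * pi * n)
cbind(n = n, peak = round(peak, 3), formula = round(1 / sqrt(2 * pi * n), 3))
```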
Normals other than standard
Simulate some different Normals:
A is \(N(1,1)\)
B is \(N(0,2)\)
[note: I’m writing \(N(\mu, sd)\), with the standard deviation rather than the variance as the second parameter, since that is how R's rnorm() and dnorm() take their arguments.]
Also make sums of 2 to 5 iid A-like RVs (\(N(1,1)\)) and sums of 2 to 5 iid B-like RVs (\(N(0,2)\)).
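To make the \(N(\mu, sd)\) convention concrete: `rnorm(n, mean=0, sd=2)` draws from a Normal with standard deviation 2, so its variance is \(2^2 = 4\), not 2. A small check (the name `b` is illustrative):

```r
set.seed(3)
b <- rnorm(100000, mean = 0, sd = 2)

# the sample sd is close to 2, so the sample variance is close to 4
c(sd = sd(b), var = var(b))

# dnorm() uses the same convention: the peak of N(0, sd = 2) is 1 / (2 * sqrt(2 * pi))
all.equal(dnorm(0, mean = 0, sd = 2), 1 / (2 * sqrt(2 * pi)))
```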
norms$A <- rnorm(100000, mean=1, sd=1)
norms$B <- rnorm(100000, mean=0, sd=2)
norms$Atwo <- norms$A + rnorm(100000, mean=1, sd=1)
norms$Athree <- norms$Atwo + rnorm(100000, mean=1, sd=1)
norms$Afour <- norms$Athree + rnorm(100000, mean=1, sd=1)
norms$Afive <- norms$Afour + rnorm(100000, mean=1, sd=1)
norms$Btwo <- norms$B + rnorm(100000, mean=0, sd=2)
norms$Bthree <- norms$Btwo + rnorm(100000, mean=0, sd=2)
norms$Bfour <- norms$Bthree + rnorm(100000, mean=0, sd=2)
norms$Bfive <- norms$Bfour + rnorm(100000, mean=0, sd=2)
head(norms[11:20])
## A B Atwo Athree Afour Afive Btwo
## 1 -1.0438201 -4.8441826 0.8650604 1.3168871 2.318044 2.741057 -3.0955626
## 2 1.1912393 0.9785544 2.5942400 5.6255168 7.080638 9.948361 -0.1011114
## 3 1.9041271 -1.0798495 4.0626890 2.8878648 4.379048 5.087002 -5.1020909
## 4 0.5565502 -0.3527550 2.6218529 4.6465501 4.608646 4.213038 -2.8222264
## 5 -1.8627518 2.8063106 -0.4423007 -0.5511588 1.139664 1.968801 5.8245019
## 6 0.8872037 -5.0739639 1.5333508 4.2351763 5.117495 7.656440 -3.2865228
## Bthree Bfour Bfive
## 1 -1.062202 -2.798807 -4.693949
## 2 -2.637423 -6.391763 -4.758033
## 3 -3.980746 -5.641978 -8.529242
## 4 -4.274471 -3.920905 -1.883908
## 5 9.955951 7.319866 6.562164
## 6 -5.537575 -5.019152 -4.418313
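The running sums `Atwo` through `Afive` (and likewise for B) are built by adding one fresh independent draw at a time. A more compact way to build the same kind of running sums is base R's `Reduce()` with `accumulate = TRUE`, sketched below with fresh draws. The names `draws_A` and `sums_A` are illustrative. Element \(k\) of the result is the sum of the first \(k\) draws, so its mean and sd should be close to \(k\) and \(\sqrt k\).

```r
set.seed(4)
n_sims <- 100000

# five independent N(1, 1) draws, kept as a list of vectors
draws_A <- lapply(1:5, function(i) rnorm(n_sims, mean = 1, sd = 1))

# running sums: element k is draw 1 + ... + draw k (analogous to A, Atwo, ..., Afive)
sums_A <- Reduce(`+`, draws_A, accumulate = TRUE)

rbind(mean = sapply(sums_A, mean), sd = sapply(sums_A, sd))
```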
Each pair of graphs is:
Left side: histogram of the RV.
Right side: density of the simulated RV in blue, and the theoretical Normal density (means and variances added across the summed RVs) in orange.
one A (\(N(1,1)\))
ggplot(norms, aes(A)) + geom_histogram() + theme_minimal()
ggplot(norms, aes(A)) + geom_density(color="blue", size=1) +
stat_function(fun=dnorm, args=list(mean=1,
sd=1), color="orange", size=1) +
theme_minimal()
five As (\(N(1,1)\))
ggplot(norms, aes(Afive)) + geom_histogram() + theme_minimal()
ggplot(norms, aes(Afive)) + geom_density(color="blue", size=1) +
stat_function(fun=dnorm, args=list(mean=5,
sd=sqrt(5)), color="orange", size=1) +
theme_minimal()
one B (\(N(0,2)\))
ggplot(norms, aes(B)) + geom_histogram() + theme_minimal()
ggplot(norms, aes(B)) + geom_density(color="blue", size=1) +
stat_function(fun=dnorm, args=list(mean=0,
sd=2), color="orange", size=1) +
theme_minimal()
five Bs (\(N(0,2)\))
ggplot(norms, aes(Bfive)) + geom_histogram() + theme_minimal()
ggplot(norms, aes(Bfive)) + geom_density(color="blue", size=1) +
stat_function(fun=dnorm, args=list(mean=0,
sd=sqrt(20)), color="orange", size=1) +
theme_minimal()
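The parameters of the orange curves for the five-A and five-B sums come from adding the means and adding the variances of independent RVs (still writing \(N(\mu, sd)\)):

\[
\begin{aligned}
\textstyle\sum_{i=1}^{5} A_i &: \quad \mu = 5 \cdot 1 = 5, \qquad sd = \sqrt{5 \cdot 1^2} = \sqrt5 \approx 2.24,\\
\textstyle\sum_{i=1}^{5} B_i &: \quad \mu = 5 \cdot 0 = 0, \qquad sd = \sqrt{5 \cdot 2^2} = \sqrt{20} = 2\sqrt5 \approx 4.47.
\end{aligned}
\]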
Comparing A (\(N(1,1)\)) in blue and B (\(N(0,2)\)) in green, with \(N(0,1)\) in dashed grey for reference.
ggplot(norms) + geom_density(aes(A), color="blue", size=1) +
geom_density(aes(B), color="green", size=1) +
stat_function(fun=dnorm, args=list(mean=0,
sd=1), color="grey70", size=1, linetype=5) +
theme_minimal() + xlab("simulated value")
Comparing one A (dashed) with the sum of 2 As in blue, and one B (dashed) with the sum of 2 Bs in green, with \(N(0,1)\) in dashed grey in each.
ggplot(norms) + stat_function(fun=dnorm, args=list(mean=0,
sd=1), color="grey70", size=1, linetype=5) +
geom_density(aes(Atwo), color="blue", size=1) +
geom_density(aes(A), color=alpha("blue",0.6), size=1, linetype=2) +
theme_minimal() + labs(x="simulated value", y="density")
ggplot(norms) + stat_function(fun=dnorm, args=list(mean=0,
sd=1), color="grey70", size=1, linetype=5) +
geom_density(aes(Btwo), color="green", size=1) +
geom_density(aes(B), color=alpha("green",0.6), size=1, linetype=2) +
theme_minimal() + labs(x="simulated value", y="density")
Comparing one A (dashed) with the sums of 2, 3, 4, and 5 As in progressively lighter shades of blue, with \(N(0,1)\) in dashed grey.
ggplot(norms) + stat_function(fun=dnorm, args=list(mean=0,
sd=1), color="grey70", size=1, linetype=5) +
geom_density(aes(A), color="#0000ff", size=1, linetype=2) +
geom_density(aes(Atwo), color="#3333ff", size=1) +
geom_density(aes(Athree), color="#6666ff", size=1) +
geom_density(aes(Afour), color="#9999ff", size=1) +
geom_density(aes(Afive), color="#ccccff", size=1) +
theme_minimal() + labs(x="simulated value", y="density")
Comparing one B (dashed) with the sums of 2, 3, 4, and 5 Bs in progressively lighter shades of green, with \(N(0,1)\) in dashed grey.
ggplot(norms) + stat_function(fun=dnorm, args=list(mean=0,
sd=1), color="grey70", size=1, linetype=5) +
geom_density(aes(B), color="#008000", size=1, linetype=2) +
geom_density(aes(Btwo), color="#00b300", size=1) +
geom_density(aes(Bthree), color="#00e600", size=1) +
geom_density(aes(Bfour), color="#1aff1a", size=1) +
geom_density(aes(Bfive), color="#4dff4d", size=1) +
theme_minimal() + labs(x="simulated value", y="density")
Now add three different Normals:
\(N(0,1)\) + \(N(1,1)\) + \(N(0,2)\)
norms$ZAB <- norms$Z + norms$A + norms$B
plot Z+A+B
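which should be \(N(1,\sqrt6)\), since a sum of independent Normals is again Normal, with the means adding and the variances adding:
\(\mu = 0 + 1 + 0 = 1\) and
\(sd = \sqrt{1^2 + 1^2 + 2^2} = \sqrt6\)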
Histogram on the left; on the right, the density of ZAB in blue and \(N(1,\sqrt6)\) in orange:
ggplot(norms, aes(ZAB)) + geom_histogram() + theme_minimal()
ggplot(norms, aes(ZAB)) + geom_density(color="blue", size=1) +
stat_function(fun=dnorm, args=list(mean=1,
sd=sqrt(6)), color="orange", size=1) +
theme_minimal()
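The rule "means add, variances add" used for Z+A+B can be packaged as a small helper and checked against a fresh simulation. `sum_normals()` is a hypothetical helper for illustration, not a function from R or ggplot2, and it uses the same \(N(\mu, sd)\) convention as above.

```r
# hypothetical helper: mean and sd of a sum of independent Normals given as N(mean, sd)
sum_normals <- function(means, sds) {
  c(mean = sum(means), sd = sqrt(sum(sds^2)))
}

# Z ~ N(0,1), A ~ N(1,1), B ~ N(0,2)
theory <- sum_normals(means = c(0, 1, 0), sds = c(1, 1, 2))

# fresh simulation of Z + A + B
set.seed(5)
n_sims <- 100000
zab <- rnorm(n_sims, mean = 0, sd = 1) + rnorm(n_sims, mean = 1, sd = 1) +
  rnorm(n_sims, mean = 0, sd = 2)

rbind(theory = theory, simulated = c(mean(zab), sd(zab)))
```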
Overlay of Z (\(N(0,1)\)) in light orange, A (\(N(1,1)\)) in light green, and B (\(N(0,2)\)) in light red, with the simulated Z+A+B in blue and the theoretical \(N(1,\sqrt6)\) as a dotted grey line.
ggplot(norms) + geom_density(aes(A), color="#99ff99", size=1) +
geom_density(aes(B), color="#ff8080", size=1) +
geom_density(aes(Z), color="#ffd280", size=1) +
geom_density(aes(ZAB), color="blue", size=1) +
stat_function(fun=dnorm, args=list(mean=1,
sd=sqrt(6)), color="grey70", size=1, linetype=3) +
theme_minimal() + xlab("simulated value")