7.1. Hypotheses about mean of a population

\[ \begin{align} H_0 : \mu \le \mu_0 \\ H_1 : \mu > \mu_0 \end{align} \]

\[ \begin{align} H_0 : \mu \ge \mu_0 \\ H_1 : \mu < \mu_0 \end{align} \]

\[ \begin{align} H_0 : \mu = \mu_0 \\ H_1 : \mu \ne \mu_0 \end{align} \]

Let’s consider the following example. The number of living cells in 5 wells under some conditions are given below. In a reference literature source authors claimed a mean quantity of 5000 living cells under the same conditions. Is our result significantly different?

x =c(5128,4806,5037,4231,4222)

# one sample t-test
t.test(x,mu=5000)

## 
##  One Sample t-test
## 
## data:  x
## t = -1.622, df = 4, p-value = 0.1801
## alternative hypothesis: true mean is not equal to 5000
## 95 percent confidence interval:
##  4145.255 5224.345
## sample estimates:
## mean of x 
##    4684.8

7.2. Hypotheses about mean of a population

\[ \begin{align} H_0 : \pi \le \pi_0 \\ H_1 : \pi > \pi_0 \end{align} \]

\[ \begin{align} H_0 : \pi \ge \pi_0 \\ H_1 : \pi < \pi_0 \end{align} \]

\[ \begin{align} H_0 : \pi = \pi_0 \\ H_1 : \pi \ne \pi_0 \end{align} \]

R can help testing hypotheses about proportions.

Example: During a study of a new drug against viral infection, you have found that 70 out of 100 mice survived, whereas the survival after the standard therapy is 60% of the infected population. Is this enhancement statistically significant?

## make analysis by prop.test(). Approximate!
prop.test(x=70,n=100,p=0.6,alternative="greater")

## 
##  1-sample proportions test with continuity correction
## 
## data:  70 out of 100, null probability 0.6
## X-squared = 3.7604, df = 1, p-value = 0.02624
## alternative hypothesis: true p is greater than 0.6
## 95 percent confidence interval:
##  0.6149607 1.0000000
## sample estimates:
##   p 
## 0.7

## make analysis by binom.test(). Exact!
binom.test(x=70,n=100,p=0.6,alternative="greater")

## 
##  Exact binomial test
## 
## data:  70 and 100
## number of successes = 70, number of trials = 100, p-value =
## 0.02478
## alternative hypothesis: true probability of success is greater than 0.6
## 95 percent confidence interval:
##  0.6157794 1.0000000
## sample estimates:
## probability of success 
##                    0.7

7.3. Comparing means of 2 unmatched samples

\[ \begin{align} H_0 : \mu_1 \le \mu_2 \\ H_1 : \mu_1 > \mu_2 \end{align} \]

\[ \begin{align} H_0 : \mu_1 \ge \mu_2 \\ H_1 : \mu_1 < \mu_2 \end{align} \]

\[ \begin{align} H_0 : \mu_1 = \mu_2 \\ H_1 : \mu_1 \ne \mu_2 \end{align} \]

Parametric version of this comparison is done unig t.test(). Non-parametric - wilcox.test().

Mice=read.table("http://edu.sablab.net/data/txt/mice.txt",header=T,sep="\t")

# ending weight for male and female
xm = Mice$Ending.weight[Mice$Sex=="m"]
xf = Mice$Ending.weight[Mice$Sex=="f"]
t.test(xm,xf)

## 
##  Welch Two Sample t-test
## 
## data:  xm and xf
## t = 13.582, df = 759.53, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  5.265756 7.045119
## sample estimates:
## mean of x mean of y 
##  26.77665  20.62121

wilcox.test(xm,xf)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  xm and xf
## W = 120510, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0

# weight change for male & female
xm = Mice$Weight.change[Mice$Sex=="m"]
xf = Mice$Weight.change[Mice$Sex=="f"]
t.test(xm,xf)

## 
##  Welch Two Sample t-test
## 
## data:  xm and xf
## t = 3.2067, df = 682.86, p-value = 0.001405
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.009873477 0.041059866
## sample estimates:
## mean of x mean of y 
##  1.119401  1.093934

wilcox.test(xm,xf)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  xm and xf
## W = 84976, p-value = 0.02989
## alternative hypothesis: true location shift is not equal to 0

# bleeding time male & female
xm = Mice$Bleeding.time[Mice$Sex=="m"]
xf = Mice$Bleeding.time[Mice$Sex=="f"]
t.test(xm,xf)

## 
##  Welch Two Sample t-test
## 
## data:  xm and xf
## t = -1.1544, df = 722.78, p-value = 0.2487
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -7.213644  1.871515
## sample estimates:
## mean of x mean of y 
##  59.66667  62.33773

wilcox.test(xm,xf)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  xm and xf
## W = 65030, p-value = 0.01781
## alternative hypothesis: true location shift is not equal to 0

7.4. Matched samples

Example. The systolic blood pressures of n=12 women between the ages of 20 and 35 were measured before and after usage of a newly developed oral contraceptive.

BP=read.table("http://edu.sablab.net/data/txt/bloodpressure.txt",header=T,sep="\t")
str(BP)

## 'data.frame':    12 obs. of  3 variables:
##  $ Subject  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ BP.before: int  122 126 132 120 142 130 142 137 128 132 ...
##  $ BP.after : int  127 128 140 119 145 130 148 135 129 137 ...

## Unpaired
t.test(BP$BP.before,BP$BP.after)

## 
##  Welch Two Sample t-test
## 
## data:  BP$BP.before and BP$BP.after
## t = -0.83189, df = 21.387, p-value = 0.4147
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -9.034199  3.867532
## sample estimates:
## mean of x mean of y 
##  130.6667  133.2500

wilcox.test(BP$BP.before,BP$BP.after)

## Warning in wilcox.test.default(BP$BP.before, BP$BP.after): cannot compute
## exact p-value with ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  BP$BP.before and BP$BP.after
## W = 59.5, p-value = 0.487
## alternative hypothesis: true location shift is not equal to 0

## Paired
t.test(BP$BP.before,BP$BP.after,paired=T)

## 
##  Paired t-test
## 
## data:  BP$BP.before and BP$BP.after
## t = -2.8976, df = 11, p-value = 0.01451
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.5455745 -0.6210921
## sample estimates:
## mean of the differences 
##               -2.583333

wilcox.test(BP$BP.before,BP$BP.after,paired=T)

## Warning in wilcox.test.default(BP$BP.before, BP$BP.after, paired = T):
## cannot compute exact p-value with ties

## Warning in wilcox.test.default(BP$BP.before, BP$BP.after, paired = T):
## cannot compute exact p-value with zeroes

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  BP$BP.before and BP$BP.after
## V = 5, p-value = 0.02465
## alternative hypothesis: true location shift is not equal to 0

7.5. Comparing proportions in two populations

\[ \begin{align} H_0 : \pi_1 \le \pi_2 \\ H_1 : \pi_1 > \pi_2 \end{align} \]

\[ \begin{align} H_0 : \pi_1 \ge \pi_2 \\ H_1 : \pi_1 < \pi_2 \end{align} \]

\[ \begin{align} H_0 : \pi_1 = \pi_2 \\ H_1 : \pi_1 \ne \pi_2 \end{align} \]

x = Mice$Sex[Mice$Strain == "SWR/J"]
y = Mice$Sex[Mice$Strain == "MA/MyJ"]
prop.test(c(sum(x=="m"),sum(y=="m")),n=c(length(x),length(y)))

## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  c(sum(x == "m"), sum(y == "m")) out of c(length(x), length(y))
## X-squared = 0.72283, df = 1, p-value = 0.3952
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.5236857  0.1667063
## sample estimates:
##    prop 1    prop 2 
## 0.4736842 0.6521739

7.6. Comparing variances of two populations

\[ \begin{align} H_0 : \sigma^2_1 = \sigma^2_2 \\ H_1 : \sigma^2_1 \ne \sigma^2_2 \end{align} \]

Example. A school is selecting a company to probide school bus to its pupils. They would like to choose the most punctual one. The times between bus arrival of wto companies were measured and stored in schoolbus dataset. Please test whether the difference is significant with \(\alpha = 0.1\).

Bus=read.table("http://edu.sablab.net/data/txt/schoolbus.txt",header=T,sep="\t")
str(Bus)

## 'data.frame':    26 obs. of  2 variables:
##  $ Milbank  : num  35.9 29.9 31.2 16.2 19 15.9 18.8 22.2 19.9 16.4 ...
##  $ Gulf.Park: num  21.6 20.5 23.3 18.8 17.2 7.7 18.6 18.7 20.4 22.4 ...

var.test(Bus[,1], Bus[,2])

## 
##  F test to compare two variances
## 
## data:  Bus[, 1] and Bus[, 2]
## F = 2.401, num df = 25, denom df = 15, p-value = 0.08105
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.8927789 5.7887880
## sample estimates:
## ratio of variances 
##           2.401036

7.7. Significance of correlation

Example. A malacologist interested in the morphology of West Indian chitons, Chiton olivaceous, measured the length and width of the eight overlapping plates composing the shell of 10 of these animals.

Chiton =read.table("http://edu.sablab.net/data/txt/chiton.txt",header=T,sep="\t")
str(Chiton)

## 'data.frame':    10 obs. of  2 variables:
##  $ Length: num  10.7 11 9.5 11.1 10.3 10.7 9.9 10.6 10 12
##  $ Width : num  5.8 6 5 6 5.3 5.8 5.2 5.7 5.3 6.3

plot(Chiton,pch=19,main=sprintf("r = %.4f",cor(Chiton[,1],Chiton[,2])))

cor.test(Chiton$Length,Chiton$Width)

## 
##  Pearson's product-moment correlation
## 
## data:  Chiton$Length and Chiton$Width
## t = 11.136, df = 8, p-value = 3.781e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8713272 0.9929217
## sample estimates:
##      cor 
## 0.969226

Exercises

Assume, that an average survival time for glioblastoma patients (early state, age<50) is 18 months. You developed a new treatment, which should increase the survival time and performed a pilot clinical trial (10 patients) in order to determine the positive effect. As a result, you observed the average survival of 20 months. Standard deviation was equal to 5 months. Is the survival improving significant?

See “glio” dataset http://edu.sablab.net/data/txt/glio.txt . Assume, that an average survival time for glioblastoma patients (early state, age<50) is 18 months. You developed a new treatment, which should increase the survival time and performed a pilot clinical trial (10 patients) in order to determine the positive effect. Is the survival improve significant?

Based on complete “mice” data set, means of which parameters (except weight) are significantly different for male and female populations?

Look for dataset “cancer” at http://edu.sablab.net/data/txt/cancer.txt This data contains results of survey aimed at survival and life quality of patients with advanced lung cancer, performed by the North Central Cancer Treatment Group (Loprinzi CL et al, J. of Clinical Oncology. 12(3):601-7, 1994). Look for the survival time (column time, given in days) and compare the survival for male and female populations . Apply both parametric and non-parametric testing.

See “leukemia” data at http://edu.sablab.net/biostat2/leukemia.txt This dataset contains information about two groups of patients who died of acute myelogenous leukemia. Patients were classified into the two groups according to the presence or absence of a morphologic characteristic of white cells. Patients termed AG positive were identified by the presence of Auer rods and/or significant granulature of the leukemic cells in the bone marrow at diagnosis. For AG-negative patients, these factors were absent. Leukemia is a cancer characterized by an overproliferation of white blood cells; the higher the white blood count (WBC), the more severe the disease. Investigate the data using parametric and non-parametric tests.

Statistical Tests

Petr V. Nazarov, LIH

2017-05-10