Difference between revisions of "Lr2"

Revision as of 09:40, 23 April 2020

The topic of this lecture are statistical tests in R.

Beginners in statistics: listen to lecture, then do tasks A, B, C
If you know basics of statistical tests, do tasks B, C, D
More information on this topic in the 1-EFM-340 Computer Statistics course

Introduction to statistical tests: sign test

[1]
Two friends A and B have played their favourite game n=10 times, A has won 6 times and B has won 4 times.
A claims that he is a better player, B claims that such a result could easily happen by chance if they were equally good players.
Hypothesis of player B is called the null hypothesis that the pattern we see (A won more often than B) is simply a result of chance
The null hypothesis reformulated: we toss coin n times and compute value X: the number of times we see head. The tosses are independent and each toss has equal probability of being head or tail
Similar situation: comparing programs A and B on several inputs, counting how many times is program A better than B.

# simulation in R: generate 10 psedorandom bits
# (1=player A won)
sample(c(0,1), 10, replace = TRUE)
# result e.g. 0 0 0 0 1 0 1 1 0 0

# directly compute random variable X, i.e. the sum of bits
sum(sample(c(0,1), 10, replace = TRUE))
# result e.g. 5

# we define a function which will m times repeat 
# the coin tossing experiment with n tosses 
# and returns a vector with m values of random variable X
experiment <- function(m, n) {
  x = rep(0, m)     # create vector with m zeroes
  for(i in 1:m) {   # for loop through m experiments
    x[i] = sum(sample(c(0,1), n, replace = TRUE)) 
  }
  return(x)         # return array of values     
}
# call the function for m=20 experiments, each with n tosses
experiment(20,10)
# result e.g.  4 5 3 6 2 3 5 5 3 4 5 5 6 6 6 5 6 6 6 4
# draw histograms for 20 experiments and 1000 experiments
png("hist10.png")  # open png file
par(mfrow=c(2,1))  # matrix of plots with 2 rows and 1 column
hist(experiment(20,10))
hist(experiment(1000,10))
dev.off() # finish writing to file

It is easy to realize that we get binomial distribution (binomické rozdelenie)
The probability of getting k ones out of n coin tosses is $\Pr(X=k)={n \choose k}2^{-n}$
The p-value of the test is the probability that simply by chance we would get k the same or more extreme than in our data.
In other words, what is the probability that in n=10 tosses we see head 6 times or more (one sided test)
P-value for k ones out of n coin tosses $\sum _{j=k}^{n}{n \choose k}2^{-n}$
If the p-value is very small, say smaller than 0.01, we reject the null hypothesis and assume that player A is in fact better than B

# computing the probability that we get exactly 6 heads in 10 tosses
dbinom(6, 10, 0.5) # result 0.2050781
# we get the same as our formula above:
7*8*9*10/(2*3*4*(2^10)) # result 0.2050781

# entire probability distribution: probabilities 0..10 heads in 10 tosses
dbinom(0:10, 10, 0.5)
# [1] 0.0009765625 0.0097656250 0.0439453125 0.1171875000 0.2050781250
# [6] 0.2460937500 0.2050781250 0.1171875000 0.0439453125 0.0097656250
# [11] 0.0009765625

# we can also plot the distribution
plot(0:10, dbinom(0:10, 10, 0.5))
barplot(dbinom(0:10, 10, 0.5))

# our p-value is the sum for k=6,7,8,9,10
sum(dbinom(6:10, 10, 0.5))
# result: 0.3769531
# so results this "extreme" are not rare by chance,
# they happen in about 38% of cases

# R can compute the sum for us using pbinom 
# this considers all values greater than 5
pbinom(5, 10, 0.5, lower.tail=FALSE)
# result again 0.3769531

# if probability is too small, use log of it
pbinom(9999, 10000, 0.5, lower.tail=FALSE, log.p = TRUE)
# [1] -6931.472
# the probability of getting 10000x head is exp(-6931.472) = 2^{-100000}

# generating numbers from binomial distribution 
# - similarly to our function experiment
rbinom(20, 10, 0.5)
# [1] 4 4 8 2 6 6 3 5 5 5 5 6 6 2 7 6 4 6 6 5

# running the test
binom.test(6, 10, p = 0.5, alternative="greater")
#
#        Exact binomial test
#
# data:  6 and 10
# number of successes = 6, number of trials = 10, p-value = 0.377
# alternative hypothesis: true probability of success is greater than 0.5
# 95 percent confidence interval:
# 0.3035372 1.0000000
# sample estimates:
# probability of success
#                   0.6

# to only get p-value, run
binom.test(6, 10, p = 0.5, alternative="greater")$p.value
# result 0.3769531

Comparing two sets of values: Welch's t-test

Let us now consider two sets of values drawn from two normal distributions with unknown means and variances
The null hypothesis of the Welch's t-test is that the two distributions have equal means
The test computes test statistics (in R for vectors x1, x2):
- (mean(x1)-mean(x2))/sqrt(var(x1)/length(x1)+var(x2)/length(x2))
This test statistics is approximately distributed according to Student's t-distribution with the degree of freedom obtained by

n1=length(x1)
n2=length(x2)
(var(x1)/n1+var(x2)/n2)**2/(var(x1)**2/((n1-1)*n1*n1)+var(x2)**2/((n2-1)*n2*n2))

Luckily R will compute the test for us simply by calling t.test

x1 = rnorm(6, 2, 1)
# 2.70110750  3.45304366 -0.02696629  2.86020145  2.37496993  2.27073550

x2 = rnorm(4, 3, 0.5)
# 3.258643 3.731206 2.868478 2.239788
t.test(x1, x2)
# t = -1.2898, df = 7.774, p-value = 0.2341
# alternative hypothesis: true difference in means is not equal to 0
# means 2.272182  3.024529

x2 = rnorm(4, 5, 0.5)
# 4.882395 4.423485 4.646700 4.515626
t.test(x1,x2)
# t = -4.684, df = 5.405, p-value = 0.004435
# means 2.272182  4.617051

# to get only p-value, run 
t.test(x1,x2)$p.value

We will apply Welch's t-test to microarray data

Data from the GEO database, publication Abbott et al 2007: [Generic and specific transcriptional responses to different weak organic acids in anaerobic chemostat cultures of Saccharomyces cerevisiae
Gene expression measurements under 5 conditions:
- Reference: yeast grown in a normal environment
- 4 different acids added so that cells grow 50% slower (acetic, propionic, sorbic, benzoic)
From each condition (reference and each acid) we have 3 replicates
Together our table has 15 columns (3 replicates from 5 conditions) and 6398 rows (genes)
We will test statistical difference between the reference condition and one of the acids (3 numbers vs other 3 numbers)
See Task B in the exercises

Multiple testing correction

When we run t-tests on the reference vs. acetic acid on all 6398 genes, we get 118 genes with p-value at most 0.01
Purely by chance this would happen in 1% of cases (from the definition of the p-value)
So purely by chance we would expect to get about 64 genes with p-value at most 0.01
So perhaps roughly half of our detected genes (maybe less, maybe more) are false positives
Sometimes false positives may even overwhelm the results
Multiple testing correction tries to limit the number of false positives among the results of multiple statistical tests
Many different methods
The simplest one is Bonferroni correction, where the threshold on the p-value is divided by the number of tested genes, so instead of 0.01 we use threshold 0.01/6398 = 1.56e-6
This way the expected overall number of false positives in the whole set is 0.01 and so the probability of getting even a single false positive is also at most 0.01 (by Markov inequality)
We could instead multiply all p-values by the number of tests and apply the original threshold 0.01 - such artificially modified p-values are called corrected
After Bonferroni correction we get only 1 significant gene

# the results of t-tests are in vector pa of length 6398
# manually multiply p-values by length(pa), count those that have value <= 0.01
sum(pa * length(pa) <= 0.01)
# in R you can use p.adjust for multiple testing correction
pa.adjusted = p.adjust(pa, method ="bonferroni")
# this is equivalent to multiplying by the length and using 1 if the result > 1
pa.adjusted = pmin(pa*length(pa),rep(1,length(pa)))

# there are less conservative multiple testing correction methods, 
# e.g. Holm's method, but in this case we get almost the same results
pa.adjusted2 = p.adjust(pa, method ="holm")

Another frequently used correction is false discovery rate (FDR), which is less strict and controls the overall proportion of false positives among results.

@@ Line 47: / Line 47: @@
 </syntaxhighlight>
 * It is easy to realize that we get [https://en.wikipedia.org/wiki/Binomial_distribution binomial distribution] (binomické rozdelenie)
-* <math>\Pr(X=k) = {n \choose k} 2^{-n}</math>
+* The probability of getting ''k'' ones out of ''n'' coin tosses is  <math>\Pr(X=k) = {n \choose k} 2^{-n}</math>
 * The ''p-value'' of the test is the probability that simply by chance we would get ''k'' the same or more extreme than in our data.
-* In other words, what is the probability that in 10 tosses we see head 6 times or more (one sided test)
+* In other words, what is the probability that in ''n=10'' tosses we see head 6 times or more (one sided test)
-* <math>\sum_{j=k}^n {n \choose k} 2^{-n}</math>
+* P-value for ''k'' ones out of ''n'' coin tosses <math>\sum_{j=k}^n {n \choose k} 2^{-n}</math>
 * If the p-value is very small, say smaller than 0.01, we reject the null hypothesis and assume that player ''A'' is in fact better than ''B''
@@ Line 69: / Line 69: @@
 barplot(dbinom(0:10, 10, 0.5))
-# our p-value is the sum for 6,7,8,9,10
+# our p-value is the sum for k=6,7,8,9,10
 sum(dbinom(6:10, 10, 0.5))
 # result: 0.3769531

Difference between revisions of "Lr2"

Revision as of 09:40, 23 April 2020

Introduction to statistical tests: sign test

Comparing two sets of values: Welch's t-test

Multiple testing correction

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools