Lecture Practical 3: Categorical Model Benchmarks

Benchmark the performances of different testing methods

To know exactly how different tests worked on actual data, we need to perform a “benchmark” to measure their performances. Usually, the performance comparison on statistical testing is firstly conducted with the simulated data set. The advantage of the simulated data is that we can hold the “ground truth” and compute the absolute evaluation metrics for different methods.

In this exercise, we will use a simulation model for gene expression data. The simulated differential gene expression are now in both directions (increase and decrease), and the genes having 0 counts in both conditions are dropped to avoid errors in some testing methods.

Please do note that the probabilistic model we used for simulation can in general hugely influence the testing results. Hence, our assumed data generation process must be as realistic as possible, and often a good simulation work along is an influential scientific paper.

The rule of thumb to select a suitable simulation method is to see the goodness of fit (e.x. using QQ-plot) of your simulation on the real data you want to study, other model selection strategies such as BIC can also be used to choose between different modeling candidates.

### Initialize data generation parameters
set.seed(123)
number_of_genes <- 20000 # Total number of genes
number_of_diff_genes <- 1000 # Total number of deferentially expressed genes
seq_depth_ctrl <- 10 # The sequencing depth size factor in control condition
seq_depth_treatment <- 13 # The sequencing depth size factor in treatment condition

### Simulate the distribution parameters for both the control and treatment groups
expression_level_ctrl <- rgamma(number_of_genes, shape = 2, rate = 1) 
# Gamma distribution is used here to simulate RNA expression level since it has the support of [0,Infty]
diff_expressed_genes_indx <- sample(number_of_genes, number_of_diff_genes) 
# Sample a index for the deferentially expressed genes, now the change can be in both sides
effect_size_FC <- exp(rnorm(number_of_diff_genes))
# Generate the effect sizes of differential gene expressions in fold changes (only consider positive change)
expression_level_treatment <- expression_level_ctrl
expression_level_treatment[diff_expressed_genes_indx] <- expression_level_treatment[diff_expressed_genes_indx] * effect_size_FC #Multiply the effect sizes on the deferentially expressed genes

###Simulate the list of read counts for both groups
Read_count_ctrl <- rpois(number_of_genes, lambda = expression_level_ctrl*seq_depth_ctrl)
Read_count_treatment <- rpois(number_of_genes, lambda = expression_level_treatment*seq_depth_ctrl)
exprs_matrix <- data.frame(Ctrl = Read_count_ctrl, Treatment = Read_count_treatment)
exprs_matrix[rowSums(exprs_matrix)==0,] <- exprs_matrix[rowSums(exprs_matrix)==0,]+1 #add 1 for 0 counts in both samples

1. Implement testing functions

Next, we need to implement 6 functions to compute p-values using different methods of statistical testing. All of the functions take the input value of an gene expression count matrix, and the output is a vector of p-values with the length equal to the number of genes.

Task 1.1: Fill the following function to implement the Poisson test (two-sided) using plug-in estimates for lambda in the control group.

#Define functions for different tests, the input is an expression matrix, and the output is a vector of p-values for each gene (row of the matrix)
PoissonPlugin <- function(exprs_matrix){
seq_depth_est <- colSums(exprs_matrix)
lambda_ctrl_est <- exprs_matrix[,"Ctrl"]
lambda_null <- lambda_ctrl_est/seq_depth_est["Ctrl"]*seq_depth_est["Treatment"]
pvalue <- mapply(function(x,y) poisson.test(x, r=y, alternative = "two.sided")$p.value, x = exprs_matrix[,"Treatment"], y = lambda_null)
return(pvalue)
}

PoissonPlugin(exprs_matrix[1:10,]) #Check sample outputs

##  [1] 0.066010594 0.607733312 0.723510395 0.009089373 0.124635839 0.588534057
##  [7] 0.003340712 1.000000000 0.659281629 0.427001542

Task 1.2: Fill the following function to implement the exact binomial test (two-sided).

ExactBinomial <- function(exprs_matrix){
seq_depth_est <- colSums(exprs_matrix)
pi_null <- seq_depth_est["Treatment"]/(seq_depth_est["Treatment"]+seq_depth_est["Ctrl"])
pvalue <- mapply(function(x,y,z) binom.test(x=x,n=z,p=y, alternative = "two.sided")$p.value, x = exprs_matrix[,"Treatment"], y = pi_null, z = rowSums(exprs_matrix))
return(pvalue)
}

ExactBinomial(exprs_matrix[1:10,]) #Check sample outputs

##  [1] 0.16754531 0.81391066 1.00000000 0.09564304 0.23217615 0.76903115
##  [7] 0.12103468 1.00000000 0.70457453 0.64884704

Task 1.3: Fill the following function to implement the Fisher’s exact test (two-sided).

FisherTest <- function(exprs_matrix){
seq_depth_est <- colSums(exprs_matrix)
pvalue <- mapply(function(x,y){
  tb <- rbind(c(x,y),c(seq_depth_est["Treatment"]-x,seq_depth_est["Ctrl"]-y))
  fisher.test(tb, alternative = "two.sided")$p.value
  }, x = exprs_matrix[,"Treatment"], 
    y = exprs_matrix[,"Ctrl"])
return(pvalue)
}

FisherTest(exprs_matrix[1:10,]) #Check sample outputs

##  [1] 0.15643947 0.79290976 1.00000000 0.08281403 0.17439784 0.75358497
##  [7] 0.11848095 1.00000000 0.67766651 0.62802022

Task 1.4: Fill the following function to implement the chi-squared test (two-sided, without correction).

ChisqrTest <- function(exprs_matrix){
seq_depth_est <- colSums(exprs_matrix)
pvalue <- suppressWarnings( mapply(function(x,y){
  tb <- rbind(c(x,y),c(seq_depth_est["Treatment"]-x,seq_depth_est["Ctrl"]-y))
  chisq.test(tb,correct = FALSE)$p.value
  }, x = exprs_matrix[,"Treatment"], 
    y = exprs_matrix[,"Ctrl"]) )
return(pvalue)
}

ChisqrTest(exprs_matrix[1:10,]) #Check sample outputs

##  [1] 0.10949051 0.71461117 0.98256000 0.07148737 0.16848367 0.69352310
##  [7] 0.09424580 0.57490159 0.64317663 0.57363954

Task 1.5: Fill the following function to implement the chi-squared test with Yates’ continuity correction (2-sided).

ChisqrTestCorrected <- function(exprs_matrix){
seq_depth_est <- colSums(exprs_matrix)
pvalue <- suppressWarnings( mapply(function(x,y){
  tb <- rbind(c(x,y),c(seq_depth_est["Treatment"]-x,seq_depth_est["Ctrl"]-y))
  chisq.test(tb)$p.value
  }, x = exprs_matrix[,"Treatment"], 
    y = exprs_matrix[,"Ctrl"]) )
return(pvalue)
}

ChisqrTestCorrected(exprs_matrix[1:10,]) #Check sample outputs

##  [1] 0.1722859 0.8147851 1.0000000 0.1076420 0.2102479 0.8130858 0.2070308
##  [8] 1.0000000 0.7459528 0.6888500

Task 1.6: Fill the following function to implement the logistic regression with Wald test p-value.

Hint: the tb is just the same of Fisher’s test, and the logistic regression belong to the GLM (generalized linear model) of what family?

LogisticRegression <- function(exprs_matrix){
seq_depth_est <- colSums(exprs_matrix)
pvalue <- suppressWarnings( mapply(function(x,y){
  tb <- rbind(c(x,y),c(seq_depth_est["Treatment"]-x,seq_depth_est["Ctrl"]-y))
  summary(glm(tb~c(0,1),family = "binomial"))$coefficients[2,4]
  }, x = exprs_matrix[,"Treatment"], 
    y = exprs_matrix[,"Ctrl"]) )
return(pvalue)
}#This function is a little bit slow

LogisticRegression(exprs_matrix[1:10,]) #Check sample outputs

##  [1] 0.1175757 0.7146666 0.9825603 0.0763937 0.1694824 0.6936489 0.1329273
##  [8] 0.5819608 0.6433107 0.5740082

2 Compute and summarize performance metrices

Task 2.1: Define a helper function that can generate a 3 elements vector of Type I error, Power, and FDR from the input confusion matrix.

The confu_matrix of the function input has the following structure: \[\begin{pmatrix} \text{H0 true, retain H0} & \text{H1 true, retain H0} \\ \text{H0 true, reject H0} & \text{H1 true, reject H0} \\ \end{pmatrix}\]

helper_func <- function(conf_matrix){
  Type_I <- conf_matrix[2,1]/sum(conf_matrix[1,1]+conf_matrix[2,1])
  Power <- conf_matrix[2,2]/sum(conf_matrix[1,2]+conf_matrix[2,2])
  FDR <- conf_matrix[2,1]/(conf_matrix[2,1]+conf_matrix[2,2])
  return(c(Type_I, Power, FDR))
}

pvalues_lst <- lapply(c("PoissonPlugin",
                        "ExactBinomial",
                        "FisherTest",
                        "ChisqrTest",
                        "ChisqrTestCorrected",
                        "LogisticRegression"), function(x)do.call(x,list(exprs_matrix)))

names(pvalues_lst) <- c("PoissonPlugin",
                        "ExactBinomial",
                        "FisherTest",
                        "ChisqrTest",
                        "ChisqrTestCorrected",
                        "LogisticRegression") 

ground_truth <- rep(FALSE, number_of_genes)
ground_truth[diff_expressed_genes_indx] <- TRUE

helper_func <- function(conf_matrix){
  Type_I <- conf_matrix[2,1]/sum(conf_matrix)
  Power <- conf_matrix[2,2]/sum(conf_matrix[1,2]+conf_matrix[2,2])
  FDR <- conf_matrix[2,1]/(conf_matrix[2,1]+conf_matrix[2,2])
  return(c(Type_I, Power, FDR))
}

performance_metrics <- function(pvalues_lst, alpha=0.05, ground_truth){
  perf_list <- lapply(pvalues_lst,function(x){
    pred_pvalue <- x < alpha
    pred_BH <- p.adjust(x, method = "BH") < alpha
    conf_matrix_p <-table(pred_pvalue, ground_truth)
    conf_matrix_BH <-table(pred_BH, ground_truth)
   c(helper_func(conf_matrix_p),helper_func(conf_matrix_BH))
  })
  perf_df <- data.frame(value=unlist(perf_list),
                        method=rep(names(pvalues_lst),each=length(pvalues_lst)),
                        metric=rep(c("Type-I error rate","Power","FDR"),length(pvalues_lst)*2),
                        correction=rep(rep(c("p-value","BH corrected"),each=3),length(pvalues_lst)))
  return(perf_df)
}

plot_df <- performance_metrics(pvalues_lst,0.05,ground_truth)
plot_df$correction <- factor(plot_df$correction,levels = c("p-value","BH corrected"))
plot_df$method <- factor(plot_df$method,levels = rev(c("PoissonPlugin",
                                                       "ExactBinomial",
                                                       "FisherTest",
                                                       "ChisqrTest",
                                                       "ChisqrTestCorrected",
                                                       "LogisticRegression")))

library(ggplot2)

## Warning: The packages `ellipsis` (>= 0.3.2) and `vctrs` (>= 0.3.8) are required
## as of rlang 1.0.0.

## Warning: replacing previous import 'lifecycle::last_warnings' by
## 'rlang::last_warnings' when loading 'tibble'

## Warning: replacing previous import 'ellipsis::check_dots_unnamed' by
## 'rlang::check_dots_unnamed' when loading 'tibble'

## Warning: replacing previous import 'ellipsis::check_dots_used' by
## 'rlang::check_dots_used' when loading 'tibble'

## Warning: replacing previous import 'ellipsis::check_dots_empty' by
## 'rlang::check_dots_empty' when loading 'tibble'

ggplot(plot_df,aes(x=method,y=value,fill=method))+geom_bar(stat="identity",width=0.8)+
  geom_hline(yintercept = 0.05, colour="darkblue", linetype=2) +
  facet_grid(metric~correction,scales = "free_y")+scale_fill_brewer(palette = "Spectral")+theme_classic()+coord_flip()

The result shows that, except for the Poisson plug-in test that basically fail to control the Type-I error rate and FDR, the rest of the tests all have comparable performances. Particularly, the Fisher’s exact test and exact binomial test have reasonably high power as well as tightly controlled false discoveries after using BH correction.

Lecture Practical 3: Categorical Model Benchmarks

Zhen Wei

March 8, 2023

Benchmark the performances of different testing methods

1. Implement testing functions

2 Compute and summarize performance metrices

Session Info