Lecture practical 2: modeling and removal of batch effects in next-generation sequencing data

0. Practical setup
- 0.1 Simulation
- 0.2 Data structure
1. Exploratory Data Analysis
2. Knowledge-based correction
3. Kowledge free correction
- 3.1 Combat / multiple linear regression
- 3.2. Unsupervised batch effect correction by surrogate variable analysis
4. Performance evaluation
Session Info

0. Practical setup

This exercise demonstrates the use of several core methods in batch effect removal by using a simulated RNA-Seq data set. The performance of different correction methods will be measured by their effects on gene expression level quantification and the differential gene expression analysis.

0.1 Simulation

The count matrix is sampled from the negative binomial distributions, where the batch effects are specified using the GC content bias model implemented by the polyester package. We will simulate 30 samples under 3 batches and 2 biological conditions, and a unique GC bias trend is added for each batch.

10% of genes are set to be deferentially expressed between the 2 conditions (WT and treatment); the effect sizes for differential expression was set to be either 1/2 or 2 with equal probability.

# Extract the genes on chr1 of the human genome
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb         <- TxDb.Hsapiens.UCSC.hg19.knownGene
genome       <- BSgenome.Hsapiens.UCSC.hg19
exbg_chr1    <- keepSeqlevels(exonsBy(txdb, by = "gene"), "chr1", pruning.mode="coarse")
geneSeq_chr1 <- extractTranscriptSeqs(Hsapiens, exbg_chr1) #Gene sequences (exons only)
gnum         <- length(geneSeq_chr1) #Total number of genes

# Simulate a count matrix containing 30 samples
sample_num  <- 30
bio_label   <- rep(c("WT", "Treatment"), each = 15)               # Setup biological conditions
batch_label <- rep(c("Batch1", "Batch2", "Batch3"), c(10, 8, 12)) # Setup batches

set.seed(158)
gexpr_wt  <- rgamma(gnum,shape = 4,scale = 10)
gexpr_trt <- gexpr_wt
num_diff  <- round(gnum*0.1) # 10% of genes are deferentially expressed
diff_indx <- sample.int(gnum, num_diff)
diff_size <- sample(c(0.5,2), size = num_diff, replace = TRUE) # The fold changes are either 1/2 or 2
gexpr_trt[diff_indx] <- gexpr_trt[diff_indx] * diff_size
mu_gexpr  <- c(rep(gexpr_wt,15), rep(gexpr_trt,15))            # Ground truth gene expression level
#Generate gene count using NB distribution
readmat   <- matrix(rnbinom(sample_num*gnum, mu = mu_gexpr, size = 5), ncol=sample_num) 


# Add GC content bias using built-in bias models in package polyester
library(polyester)
gc_trends <- rep(1, sample_num)
gc_trends[batch_label=="Batch2"] <- 4 
gc_trends[batch_label=="Batch3"] <- 7 #Specify different GC biases for different batches

set.seed(137)
readmat_biased <- add_gc_bias(readmat, gc_trends, geneSeq_chr1)
readmat_biased[is.na(readmat_biased)] <- 0
readmat_biased <- pmax(readmat_biased, 0)

# Make a summarized experiment object to store the gene expression data
library(SummarizedExperiment)
se <- SummarizedExperiment(assay   = list(count = readmat_biased), 
                           colData = DataFrame( bio_label = bio_label,
                                              batch_label = batch_label ))
rowRanges(se) <- exbg_chr1
diff_genes    <- rep(FALSE, gnum)
diff_genes[diff_indx] <- TRUE
rowData(se)   <- data.frame(GC = letterFrequency(geneSeq_chr1, "GC", as.prob = TRUE)[,1],
                 true_gexpr_wt = gexpr_wt,
                true_gexpr_trt = gexpr_trt,
                    diff_genes = diff_genes) #Store the GC contents and the "ground truth" under rowData()
colnames(se)  <- paste0("sample_", 1:sample_num)

0.2 Data structure

The experimental designs and count matrix can be accessed by colData() and assay(), respectively.

se

## class: RangedSummarizedExperiment 
## dim: 2338 30 
## metadata(0):
## assays(1): count
## rownames(2338): 10000 100126331 ... 9970 998
## rowData names(4): GC true_gexpr_wt true_gexpr_trt diff_genes
## colnames(30): sample_1 sample_2 ... sample_29 sample_30
## colData names(2): bio_label batch_label

colData(se) #Biological and batch labels

## DataFrame with 30 rows and 2 columns
##             bio_label batch_label
##           <character> <character>
## sample_1           WT      Batch1
## sample_2           WT      Batch1
## sample_3           WT      Batch1
## sample_4           WT      Batch1
## sample_5           WT      Batch1
## ...               ...         ...
## sample_26   Treatment      Batch3
## sample_27   Treatment      Batch3
## sample_28   Treatment      Batch3
## sample_29   Treatment      Batch3
## sample_30   Treatment      Batch3

rowData(se) #Gene's gc content and the "ground truth" labels

## DataFrame with 2338 rows and 4 columns
##                  GC true_gexpr_wt true_gexpr_trt diff_genes
##           <numeric>     <numeric>      <numeric>  <logical>
## 10000      0.400992       52.2877        52.2877      FALSE
## 100126331  0.441860       40.5478        40.5478      FALSE
## 100126346  0.341772       28.6339        28.6339      FALSE
## 100126348  0.637500       47.6883        47.6883      FALSE
## 100126349  0.553571       38.9102        38.9102      FALSE
## ...             ...           ...            ...        ...
## 9928       0.382713       23.3776        23.3776      FALSE
## 9939       0.438617       25.3532        25.3532      FALSE
## 9967       0.482265       49.4985        49.4985      FALSE
## 9970       0.529275       15.2699        15.2699      FALSE
## 998        0.417894       49.7120        49.7120      FALSE

1. Exploratory Data Analysis

Since the aim of EDA is to discover unexpected issues in your data, EDA is the key step within a counter batch effect work-flow. EDA can provide regular “sanity checks” while conducting complex data science projects; it is often absolutely necessary to ensure the validity of a project (although it may not guarantee to always make the result more “positive”).

1.2 Clustering

Task 1.2: Apply hierarchically cluster over samples using the transformed assay of log(count+1); use the euclidean distance as the input metric, and set the clustering method to “ward.D2”.

hc <- hclust(dist(t(log(assay(se)+1))), method = "ward.D2")

Next, the hclust dendrograms are plotted under 2 types of labels.

library(rafalib)
myplclust(hc, labels = se$batch_label, lab.col = as.fumeric(as.character(se$batch_label)), cex = 0.5)

myplclust(hc, labels = se$bio_label,   lab.col = as.fumeric(as.character(se$bio_label)),   cex = 0.5)

Judging from the tree structure, we could see that both the biological and batch factors are significantly contributing to the column-wised distances. However, the experimental conditions seem to have a lager effect, since the partitions under k = 2 can perfectly classify the WT and treatment labels.

1.3 Principal component analysis

Task 1.3: Compute the empirical covarance matrix on the transformed matrix (log(count + 1)), then calculate the 1st and 2nd principal components from the covariance matrix. Store the PCs into the variable names of PC1 and PC2.

cov_mat <- cov(log(assay(se)+1))
decomp <- eigen(cov_mat)
PC1 <- decomp$vectors[,1]
PC2 <- decomp$vectors[,2]

image(cov_mat)

The covariance matrix clearly showing chunks explained by the 3 batches.

library(ggplot2)
ggplot(data.frame(PC1 = PC1, 
                  PC2 = PC2, 
                  batch_label = batch_label,
                  bio_label = bio_label),
         aes(x=PC1,y=PC2,colour=batch_label,shape=bio_label))+
         geom_point()+
         theme_classic()+
         scale_color_brewer(palette = "Dark2")

par(mfrow = c(1,2))
boxplot(PC1~batch_label)
boxplot(PC2~batch_label)

dev.off()

## null device 
##           1

From the dot plots and the box-plots above, we could see that both the PC1 and PC2 are learning the batch labels pretty well. It is good in a sense that it indicates the PCA approach can capture the batch factors induced by the GC-content biases. We can therefore able to correct the batch effect directly using factor analysis approach such as SVA. (will be done in 3.2)

1.4 Differential gene expression analysis

Before implementing any correction measures, we will perform DGEA over rows using t-test, and its accuracy will be used as a baseline for the lateral corrections.

Task 1.4.1: For all genes, compute the t-test p-values (for the difference in means) between the WT and the treatment groups. Then, adjust the p-values using the “BH” approach.

Note: use the rowttests(), which is the vectorized t-test function exported from the package genefilter. The t-test should be based on the transformed count of log(count+1).

library(genefilter)
t_test_pvalues <- rowttests(log(assay(se)+1), factor(se$bio_label))$p.value
padj <- p.adjust(t_test_pvalues, method = "BH") 
alpha <- 0.05
sig_diff_genes <- padj <= alpha
true_diff_genes <- rowData(se)$diff_genes
table(sig_diff_genes, true_diff_genes)

##               true_diff_genes
## sig_diff_genes FALSE TRUE
##          FALSE  1971   72
##          TRUE    132  162

Question 1.4.2: Based on the computed confusion matrix, could the alpha value (0.05) control the FDR? If not, what might be the cause of it?

Your answer:

2. Knowledge-based correction

In the section bellow, we will correct the data through the knowledge-based approach, which is to model the generation process of the technical artifact. For NGS, the predominant source of artifact often comes from the biased amplification induced by the GC contents of cDNA fragments.

Often, such bias can be effectively traced by drawing the smoothed regression fits between the GC contents (of the gene features) and the read counts.

poisson_smooth <- function(...) {
  geom_smooth(method = "glm", method.args = list(family = "poisson"), ...)
}

plot_df           <- reshape2::melt(assay(se))[,-1]
colnames(plot_df) <- c("sample","gene_count")
plot_df$GC        <- rowData(se)$GC
plot_df$batch     <- rep(se$batch_label, each = gnum)

ggplot(plot_df, aes(x = GC, 
                    y = gene_count,
                group = sample,
                color = batch)) +  
                poisson_smooth(formula = y ~ splines::ns(x, 5), size = 0.5, se = FALSE) +
                theme_classic() + scale_color_brewer(palette = "Dark2") +
               labs(x = "GC content",
                    y = "Rate of read count",
                 main = "Poisson Regression")

The diagram above shows the Poisson GLM fits, in which the Y-axis represented the fitted expected count (\(\lambda\) of Poisson distribution). We could see that after smoothing the covariate with splines, there are 3 different trends under each batches. For all of the samples, the likelihood of observing reads coming from genes with extremely high / low GC contents is very small.

From the lecture, we know that the GC bias in NGS is often modeled by the following statistical model.

For gene \(i\), the count \(Y_i\) follows a Poisson distribution:

\[Y_i \sim \mathrm{Poisson}(\lambda_i)\] The mean \(\lambda\) is decomposed into the product of \(\theta_i \times f(X_i) \times s\).

We have discussed in the lecture that the \(f(X_i)\) is estimated from the fitted values of the log Poisson linear models (same as the model fits in the above diagram). Afterward, the corrected read count can be derived by \(Y_i^{\mathrm{correct}}=Y_i / \hat f(X_i)\). (P.S. our simulation here does not include differences in sequencing depths (\(s\)), hence we can treat the count after correction \(Y_i^\mathrm{correct}\) equal to the expression value estimate \(\hat \theta_i\))

However, for identifiability, we often imply the constraint \(\frac{1}{n}\sum_{\forall i} \hat f(X_i)=1\) on the GC bias term. To realize such constraint, we can center the linear model fits on the log scale (i.e. the scale of the linear combination).

P.S. The identifiability constraints for multiplicative terms are often by mean = 1, and for additive terms are often by mean = 0. Please note that the multiplicative terms can become additive on log scale.

Task 2.1: Finish the code chunk bellow, so that a matrix of correction offsets can be derived at the end. The step you need to complete is to center (i.e. x-mean(x) ) the GLM fits for the identifiability constraint.

correction_offsets <- matrix(NA, nrow = gnum, ncol = sample_num)
for(i in 1:ncol(se)){
 glm_fit_i <- glm(y ~ splines::ns(x, 5), data = data.frame(x=rowData(se)$GC, y=assay(se)[,i]), family = "poisson")
 fitted_values_i <- predict(glm_fit_i) #Calculate the glm fits on the linear combination scale
 correction_offsets[,i] <- exp(fitted_values_i - mean(fitted_values_i)) #Center the glm fits and exponentiate it to get the offsets on the Poisson rate scale
}
correction_offsets[1:2,1:3]

##          [,1]     [,2]     [,3]
## [1,] 1.006335 1.010293 1.010734
## [2,] 1.057867 1.064325 1.055998

Task 2.2: Using the correction offsets calculated above, derive the GC corrected count matrix in assay_correct_knowledge.

#Create functions to compute confusion matrix and PCA plots from count matrix X
DGEAcfm <- function(X,biolab,diff_genes){
Ttest <- rowttests(log(X+1), factor(biolab))
t_test_result <- p.adjust(Ttest$p.value, method = "fdr") < 0.05
table(t_test_result, diff_genes)  
}

DGEAcfm(assay_correct_knowledge, se$bio_label, rowData(se)$diff_genes)

##              diff_genes
## t_test_result FALSE TRUE
##         FALSE  2092   70
##         TRUE     11  164

PCAplot <- function(X, biolab, batchlab){
require(ggplot2)
cov_mat <- cov(log(X+1))
decomp <- eigen(cov_mat)
PC1 <- decomp$vectors[,1]
PC2 <- decomp$vectors[,2]
ggplot(data.frame(PC1 = PC1, 
                  PC2 = PC2, 
                  batch_label = batchlab,
                  bio_label = biolab),
                  aes(x = PC1, y = PC2, colour = batch_label, shape = bio_label))+
                  geom_point() +
                  theme_classic() +
                  scale_color_brewer(palette = "Dark2")  
}

PCAplot(assay_correct_knowledge, se$bio_label, se$batch_label)

Question 2.3: Based on the confusion matrix, could the t-test now better control over the FDR after GC correction?

Your answer:

3. Kowledge free correction

3.1 Combat / multiple linear regression

In the class, we have discussed that when the batch factors are given, we can directly fit multiple linear regression models for each genes by treating the batch labels as the covariates for adjustment. The regression fits and corresponding p values are consequently adjusted over the batch confounding.

Task 3.1: Fill the regression design in the code chunk below to complete the Combat adjustment.

lm_coefficients_combat <- matrix(NA, nrow = gnum, ncol = 4)
lm_pvalues_combat      <- rep(NA, gnum)
se$bio_label           <- factor(se$bio_label, levels = c("WT","Treatment")) #Set WT as the reference level in linear model fits

for(i in 1:gnum){
  model_fit_i  <- lm(y~b+w, data = data.frame(y = log(assay(se)[i,] + 1), 
                                              b = se$bio_label, 
                                              w = se$batch_label))
  lm_coefficients_combat[i,] <- summary(model_fit_i)$coefficients[,1]
  lm_pvalues_combat[i]       <- summary(model_fit_i)$coefficients["bTreatment","Pr(>|t|)"]
}

colnames(lm_coefficients_combat) <- names(coefficients(model_fit_i))
lm_pvalues_combat[is.na(lm_pvalues_combat)] <- 1

alpha <- 0.05
mlr_test_result_combat <- p.adjust(lm_pvalues_combat,method = "fdr") < alpha
table(mlr_test_result_combat, diff_genes)

##                       diff_genes
## mlr_test_result_combat FALSE TRUE
##                  FALSE  2103  232
##                  TRUE      1    2

From the confusion matrix computed using the p-values on coefficient of bTreatment, we could see that there are almost no genes left after the adjustment over the confounding batch factors. The Combat approach results in an “over-kill” on our data because the biological label is heavily confounded with the batch label.

3.2. Unsupervised batch effect correction by surrogate variable analysis

SVA is a factor analysis based approach that can estimate the latent batch factors without knowing the batch / technical factors. SVA has an advantage for the new technical domains in which the artifact generation process is poorly understood.

For gene expression data, the input for SVA is a transformed assay so that the noise can be approximately Gaussian.

Task 3.1: Perform SVA on our expression data using the sva() function in package sva, store the learned surrogate variable in SV.

library(sva)
transformed_assay <- log(assay(se)+1)
transformed_assay <- transformed_assay[rowVars(transformed_assay) != 0,]
sva_fit <- sva(transformed_assay, model.matrix(~se$bio_label))

## Number of significant surrogate variables is:  1 
## Iteration (out of 5 ):1  2  3  4  5

SV <- sva_fit$sv

boxplot(SV~batch_label)

lm_coefficients_sva <- matrix(NA, nrow = gnum, ncol = 3)
linear_model_pvalues_sva <- rep(NA, gnum)
for(i in 1:gnum){
  model_matrix_i <- data.frame(y = log(assay(se)[i,]+1),
                               b = se$bio_label,
                               w = SV)
  model_fit_i <- lm(y~b+w,  data = model_matrix_i) #Fit multiple linear regression by treating the SV as the adjustment variable
  lm_coefficients_sva[i,]     <- summary(model_fit_i)$coefficients[,1]
  linear_model_pvalues_sva[i] <- summary(model_fit_i)$coefficients["bTreatment","Pr(>|t|)"]
}
colnames(lm_coefficients_sva) <- names(coefficients(model_fit_i))
linear_model_pvalues_sva[is.na(linear_model_pvalues_sva)] <- 1

mlr_test_result_sva <- p.adjust(linear_model_pvalues_sva,method = "fdr") < 0.05
table(mlr_test_result_sva, diff_genes)

##                    diff_genes
## mlr_test_result_sva FALSE TRUE
##               FALSE  2096  142
##               TRUE      8   92

We could see that the SV is highly correlated with the batch labels. After regressing the SV onto genes, the p-values over the treatment effects become much more sensitive and also able to control the FDR.

4. Performance evaluation

This section display the codes that compare the correction performances among several methods for batch effect adjustment.

Before comparing with the ground truth values, we need to first compute the corrected expression matrix for the knowledge-free methods; recall from the lecture slides, the correction for combat / sva is made by:

\[X^{\mathrm{correct}}=X-\beta W^T\] columns of \(W\) are the known batch effect factors for Combat or the surrogate variables for SVA, and rows in \(\beta\) are linear model coefficients fitted for the corresponding columns in \(W\) for each genes.

Task 4.1: Calculate the corrected expression matrix for Combat and SVA using coefficients \(\beta\) and batch effect factors \(W\), those matrices are either known or fitted in the previous sections.

X <- log(assay(se)+1)
#Check the regression fits over genes
head(lm_coefficients_combat,3)

##      (Intercept) bTreatment    wBatch2    wBatch3
## [1,]    3.623387  0.1227384  0.2713830  0.4180212
## [2,]    3.807989  0.6540371 -0.2826288 -0.5602858
## [3,]    3.033766  0.1685644  0.1372033 -0.2337485

head(lm_coefficients_sva,3)

##      (Intercept) bTreatment           w
## [1,]    3.775098  0.2984718 -0.54539548
## [2,]    3.628120  0.4148114  0.76275349
## [3,]    3.076015 -0.0297570  0.03103741

#Calculate the corrected (transformed) expression matrix for combat and sva
corrected_X_sva <- X - lm_coefficients_sva[,"w"] %*% t(SV)
corrected_X_combat <- X - lm_coefficients_combat[,c("wBatch2","wBatch3")] %*% t(cbind(se$batch_label == "Batch2",
                                                                                      se$batch_label == "Batch3"))

corrected_X_sva[1:3, 1:3]

##           sample_1 sample_2 sample_3
## 10000     3.540082 4.252870 2.825335
## 100126331 4.010457 3.831488 3.549546
## 100126346 2.882468 3.325049 2.701376

corrected_X_combat[1:3, 1:3]

##           sample_1 sample_2 sample_3
## 10000     3.401197 4.127134 2.708050
## 100126331 4.204693 4.007333 3.713572
## 100126346 2.890372 3.332205 2.708050

library(rafalib)
PC_df <- function(x, method){
cov_mat <- cov(x)
decomp <- eigen(cov_mat)
data.frame(PC1 = decomp$vectors[,1], 
           PC2 = decomp$vectors[,2], 
           batch_label = se$batch_label,
           bio_label = se$bio_label,
           method = method)
}

plot_df <- rbind(PC_df(log(assay(se)+1),"no correction"),
                 PC_df(log(assay_correct_knowledge+1),"GC correction"),
                 PC_df(corrected_X_combat,"Combat"),
                 PC_df(corrected_X_sva,"SVA"))

plot_df$method <- factor(plot_df$method, levels = c("no correction", "GC correction", "Combat", "SVA"))

library(ggplot2)
ggplot(plot_df,
         aes(x=PC1,y=PC2,colour=batch_label,shape=bio_label))+
         facet_wrap(~method,nrow=2) +
         geom_point()+
         theme_classic()+
         scale_color_brewer(palette = "Dark2")

# Add another newly published method for comparison, see ?ComBat_seq for details
corrected_assay_sva <- exp(corrected_X_sva)-1
corrected_assay_combat <- exp(corrected_X_combat)-1
corrected_assay_combat_seq <- ComBat_seq(
  assay(se),
  se$batch_label,
  se$bio_label
)

## Found 3 batches
## Using full model in ComBat-seq.
## Adjusting for 1 covariate(s) or covariate level(s)
## Estimating dispersions
## Fitting the GLM model
## Shrinkage off - using GLM estimates for parameters
## Adjusting the data

GT_assay <- matrix(c(rep(gexpr_wt,15),rep(gexpr_trt,15)), ncol=sample_num)
plot_df2 <- data.frame(MSE = c( apply(GT_assay - assay(se), 2, function(x) mean((x)^2)),
                                apply(GT_assay - assay_correct_knowledge, 2, function(x) mean((x)^2)),
                                apply(GT_assay - corrected_assay_sva, 2, function(x) mean((x)^2)),
                                apply(GT_assay - corrected_assay_combat, 2, function(x) mean((x)^2)),
                                apply(GT_assay - corrected_assay_combat_seq, 2, function(x) mean((x)^2))),
                       Methods = rep(c("un-corrected","GC corrected","SVA corrected", "Combat corrected", "Combat seq corrected"), each = sample_num))

plot_df2$Methods <- factor(plot_df2$Methods, levels = c("un-corrected", "Combat corrected", "Combat seq corrected", "SVA corrected", "GC corrected"))

ggplot(plot_df2, aes(x=Methods, y=MSE, color=Methods))+ geom_boxplot(outlier.alpha = 0) + geom_jitter(width = 0.1, alpha=0.8,size = 0.5) + theme_bw() + scale_color_brewer(palette = "Dark2") + theme(axis.text.x = element_text(angle = 310, vjust = -0.5) )

The results show that although all of the 3 correction methods can lead to improvements in the PCA plots, the MSEs evaluated between the corrected matrix and the ground truth demonstrate that the knowledge based correction method outperform the rest in terms of estimating the true expression levels.

Session Info

sessioninfo::session_info()

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 4.0.3 (2020-10-10)
##  os       Ubuntu 18.04.5 LTS          
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       Asia/Shanghai               
##  date     2023-03-08                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package                           * version  date       lib source        
##  annotate                            1.67.0   2020-04-27 [2] Bioconductor  
##  AnnotationDbi                     * 1.51.0   2020-04-27 [2] Bioconductor  
##  askpass                             1.1      2019-01-13 [2] CRAN (R 4.0.0)
##  assertthat                          0.2.1    2019-03-21 [2] CRAN (R 4.0.0)
##  Biobase                           * 2.49.0   2020-04-27 [2] Bioconductor  
##  BiocFileCache                       1.13.0   2020-04-27 [2] Bioconductor  
##  BiocGenerics                      * 0.35.2   2020-05-06 [2] Bioconductor  
##  BiocParallel                      * 1.23.0   2020-04-27 [2] Bioconductor  
##  biomaRt                             2.45.0   2020-04-27 [2] Bioconductor  
##  Biostrings                        * 2.57.0   2020-04-27 [2] Bioconductor  
##  bit                                 1.1-15.2 2020-02-10 [2] CRAN (R 4.0.0)
##  bit64                               0.9-7    2017-05-08 [2] CRAN (R 4.0.0)
##  bitops                              1.0-6    2013-08-17 [2] CRAN (R 4.0.0)
##  blob                                1.2.1    2020-01-20 [2] CRAN (R 4.0.0)
##  BSgenome                          * 1.57.0   2020-04-27 [2] Bioconductor  
##  BSgenome.Hsapiens.UCSC.hg19       * 1.4.3    2020-05-13 [2] Bioconductor  
##  cli                                 2.0.2    2020-02-28 [2] CRAN (R 4.0.0)
##  codetools                           0.2-16   2018-12-24 [4] CRAN (R 4.0.0)
##  colorspace                          1.4-1    2019-03-18 [2] CRAN (R 4.0.0)
##  crayon                              1.3.4    2017-09-16 [2] CRAN (R 4.0.0)
##  curl                                4.3      2019-12-02 [2] CRAN (R 4.0.0)
##  DBI                                 1.1.0    2019-12-15 [2] CRAN (R 4.0.0)
##  dbplyr                              1.4.3    2020-04-19 [2] CRAN (R 4.0.0)
##  DelayedArray                      * 0.15.1   2020-05-01 [2] Bioconductor  
##  digest                              0.6.25   2020-02-23 [2] CRAN (R 4.0.0)
##  dplyr                               0.8.5    2020-03-07 [2] CRAN (R 4.0.0)
##  edgeR                               3.32.1   2021-01-14 [2] Bioconductor  
##  ellipsis                            0.3.0    2019-09-20 [2] CRAN (R 4.0.0)
##  evaluate                            0.14     2019-05-28 [2] CRAN (R 4.0.0)
##  fansi                               0.4.1    2020-01-08 [2] CRAN (R 4.0.0)
##  farver                              2.0.3    2020-01-16 [2] CRAN (R 4.0.0)
##  genefilter                        * 1.71.0   2020-04-27 [2] Bioconductor  
##  GenomeInfoDb                      * 1.25.0   2020-04-27 [2] Bioconductor  
##  GenomeInfoDbData                    1.2.3    2020-05-12 [2] Bioconductor  
##  GenomicAlignments                   1.25.0   2020-04-27 [2] Bioconductor  
##  GenomicFeatures                   * 1.41.0   2020-04-27 [2] Bioconductor  
##  GenomicRanges                     * 1.41.1   2020-05-02 [2] Bioconductor  
##  ggplot2                           * 3.3.0    2020-03-05 [2] CRAN (R 4.0.0)
##  glue                                1.4.0    2020-04-03 [2] CRAN (R 4.0.0)
##  gtable                              0.3.0    2019-03-25 [2] CRAN (R 4.0.0)
##  hms                                 0.5.3    2020-01-08 [2] CRAN (R 4.0.0)
##  htmltools                           0.4.0    2019-10-04 [2] CRAN (R 4.0.0)
##  httr                                1.4.1    2019-08-05 [2] CRAN (R 4.0.0)
##  IRanges                           * 2.23.4   2020-05-03 [2] Bioconductor  
##  knitr                               1.28     2020-02-06 [2] CRAN (R 4.0.0)
##  labeling                            0.3      2014-08-23 [2] CRAN (R 4.0.0)
##  lattice                             0.20-41  2020-04-02 [4] CRAN (R 4.0.0)
##  lifecycle                           0.2.0    2020-03-06 [2] CRAN (R 4.0.0)
##  limma                               3.46.0   2020-10-27 [2] Bioconductor  
##  locfit                              1.5-9.4  2020-03-25 [2] CRAN (R 4.0.0)
##  logspline                           2.1.16   2020-05-08 [2] CRAN (R 4.0.0)
##  magrittr                            1.5      2014-11-22 [2] CRAN (R 4.0.0)
##  Matrix                              1.4-1    2022-03-23 [1] CRAN (R 4.0.3)
##  matrixStats                       * 0.56.0   2020-03-13 [2] CRAN (R 4.0.0)
##  memoise                             1.1.0    2017-04-21 [2] CRAN (R 4.0.0)
##  mgcv                              * 1.8-33   2020-08-27 [4] CRAN (R 4.0.2)
##  munsell                             0.5.0    2018-06-12 [2] CRAN (R 4.0.0)
##  nlme                              * 3.1-149  2020-08-23 [4] CRAN (R 4.0.2)
##  openssl                             1.4.1    2019-07-18 [2] CRAN (R 4.0.0)
##  pillar                              1.4.4    2020-05-05 [2] CRAN (R 4.0.0)
##  pkgconfig                           2.0.3    2019-09-22 [2] CRAN (R 4.0.0)
##  plyr                                1.8.6    2020-03-03 [2] CRAN (R 4.0.0)
##  polyester                         * 1.26.0   2020-10-27 [2] Bioconductor  
##  prettyunits                         1.1.1    2020-01-24 [2] CRAN (R 4.0.0)
##  progress                            1.2.2    2019-05-16 [2] CRAN (R 4.0.0)
##  purrr                               0.3.4    2020-04-17 [2] CRAN (R 4.0.0)
##  R6                                  2.4.1    2019-11-12 [2] CRAN (R 4.0.0)
##  rafalib                           * 1.0.0    2015-08-09 [2] CRAN (R 4.0.0)
##  rappdirs                            0.3.1    2016-03-28 [2] CRAN (R 4.0.0)
##  RColorBrewer                        1.1-2    2014-12-07 [2] CRAN (R 4.0.0)
##  Rcpp                                1.0.9    2022-07-08 [1] CRAN (R 4.0.3)
##  RCurl                               1.98-1.2 2020-04-18 [2] CRAN (R 4.0.0)
##  reshape2                            1.4.4    2020-04-09 [2] CRAN (R 4.0.0)
##  rlang                               1.0.5    2022-08-31 [1] CRAN (R 4.0.3)
##  rmarkdown                           2.1      2020-01-20 [2] CRAN (R 4.0.0)
##  Rsamtools                           2.5.0    2020-04-27 [2] Bioconductor  
##  RSQLite                             2.2.0    2020-01-07 [2] CRAN (R 4.0.0)
##  rstudioapi                          0.11     2020-02-07 [2] CRAN (R 4.0.0)
##  rtracklayer                       * 1.49.1   2020-05-02 [2] Bioconductor  
##  S4Vectors                         * 0.27.5   2020-05-06 [2] Bioconductor  
##  scales                              1.1.1    2020-05-11 [2] CRAN (R 4.0.0)
##  sessioninfo                         1.1.1    2018-11-05 [2] CRAN (R 4.0.0)
##  stringi                             1.4.6    2020-02-17 [2] CRAN (R 4.0.0)
##  stringr                             1.4.0    2019-02-10 [2] CRAN (R 4.0.0)
##  SummarizedExperiment              * 1.19.2   2020-05-01 [2] Bioconductor  
##  survival                            3.2-7    2020-09-28 [4] CRAN (R 4.0.2)
##  sva                               * 3.38.0   2020-10-27 [2] Bioconductor  
##  tibble                              3.0.1    2020-04-20 [2] CRAN (R 4.0.0)
##  tidyselect                          1.1.0    2020-05-11 [2] CRAN (R 4.0.0)
##  TxDb.Hsapiens.UCSC.hg19.knownGene * 3.2.2    2020-05-13 [2] Bioconductor  
##  vctrs                               0.3.0    2020-05-11 [2] CRAN (R 4.0.0)
##  withr                               2.2.0    2020-04-20 [2] CRAN (R 4.0.0)
##  xfun                                0.13     2020-04-13 [2] CRAN (R 4.0.0)
##  XML                                 3.99-0.3 2020-01-20 [2] CRAN (R 4.0.0)
##  xtable                              1.8-4    2019-04-21 [2] CRAN (R 4.0.0)
##  XVector                           * 0.29.0   2020-04-27 [2] Bioconductor  
##  yaml                                2.2.1    2020-02-01 [2] CRAN (R 4.0.0)
##  zlibbioc                            1.35.0   2020-04-27 [2] Bioconductor  
## 
## [1] /home/zhen/R/x86_64-pc-linux-gnu-library/4.0
## [2] /usr/local/lib/R/site-library
## [3] /usr/lib/R/site-library
## [4] /usr/lib/R/library