This exercise demonstrates the use of several core methods in batch effect removal by using a simulated RNA-Seq data set. The performance of different correction methods will be measured by their effects on gene expression level quantification and the differential gene expression analysis.
The count matrix is sampled from the negative binomial distributions, where the batch effects are specified using the GC content bias model implemented by the polyester
package. We will simulate 30 samples under 3 batches and 2 biological conditions, and a unique GC bias trend is added for each batch.
10% of genes are set to be deferentially expressed between the 2 conditions (WT and treatment); the effect sizes for differential expression was set to be either 1/2 or 2 with equal probability.
# Extract the genes on chr1 of the human genome
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
genome <- BSgenome.Hsapiens.UCSC.hg19
exbg_chr1 <- keepSeqlevels(exonsBy(txdb, by = "gene"), "chr1", pruning.mode="coarse")
geneSeq_chr1 <- extractTranscriptSeqs(Hsapiens, exbg_chr1) #Gene sequences (exons only)
gnum <- length(geneSeq_chr1) #Total number of genes
# Simulate a count matrix containing 30 samples
sample_num <- 30
bio_label <- rep(c("WT", "Treatment"), each = 15) # Setup biological conditions
batch_label <- rep(c("Batch1", "Batch2", "Batch3"), c(10, 8, 12)) # Setup batches
set.seed(158)
gexpr_wt <- rgamma(gnum,shape = 4,scale = 10)
gexpr_trt <- gexpr_wt
num_diff <- round(gnum*0.1) # 10% of genes are deferentially expressed
diff_indx <- sample.int(gnum, num_diff)
diff_size <- sample(c(0.5,2), size = num_diff, replace = TRUE) # The fold changes are either 1/2 or 2
gexpr_trt[diff_indx] <- gexpr_trt[diff_indx] * diff_size
mu_gexpr <- c(rep(gexpr_wt,15), rep(gexpr_trt,15)) # Ground truth gene expression level
#Generate gene count using NB distribution
readmat <- matrix(rnbinom(sample_num*gnum, mu = mu_gexpr, size = 5), ncol=sample_num)
# Add GC content bias using built-in bias models in package polyester
library(polyester)
gc_trends <- rep(1, sample_num)
gc_trends[batch_label=="Batch2"] <- 4
gc_trends[batch_label=="Batch3"] <- 7 #Specify different GC biases for different batches
set.seed(137)
readmat_biased <- add_gc_bias(readmat, gc_trends, geneSeq_chr1)
readmat_biased[is.na(readmat_biased)] <- 0
readmat_biased <- pmax(readmat_biased, 0)
# Make a summarized experiment object to store the gene expression data
library(SummarizedExperiment)
se <- SummarizedExperiment(assay = list(count = readmat_biased),
colData = DataFrame( bio_label = bio_label,
batch_label = batch_label ))
rowRanges(se) <- exbg_chr1
diff_genes <- rep(FALSE, gnum)
diff_genes[diff_indx] <- TRUE
rowData(se) <- data.frame(GC = letterFrequency(geneSeq_chr1, "GC", as.prob = TRUE)[,1],
true_gexpr_wt = gexpr_wt,
true_gexpr_trt = gexpr_trt,
diff_genes = diff_genes) #Store the GC contents and the "ground truth" under rowData()
colnames(se) <- paste0("sample_", 1:sample_num)
The experimental designs and count matrix can be accessed by colData()
and assay()
, respectively.
se
## class: RangedSummarizedExperiment
## dim: 2338 30
## metadata(0):
## assays(1): count
## rownames(2338): 10000 100126331 ... 9970 998
## rowData names(4): GC true_gexpr_wt true_gexpr_trt diff_genes
## colnames(30): sample_1 sample_2 ... sample_29 sample_30
## colData names(2): bio_label batch_label
colData(se) #Biological and batch labels
## DataFrame with 30 rows and 2 columns
## bio_label batch_label
## <character> <character>
## sample_1 WT Batch1
## sample_2 WT Batch1
## sample_3 WT Batch1
## sample_4 WT Batch1
## sample_5 WT Batch1
## ... ... ...
## sample_26 Treatment Batch3
## sample_27 Treatment Batch3
## sample_28 Treatment Batch3
## sample_29 Treatment Batch3
## sample_30 Treatment Batch3
rowData(se) #Gene's gc content and the "ground truth" labels
## DataFrame with 2338 rows and 4 columns
## GC true_gexpr_wt true_gexpr_trt diff_genes
## <numeric> <numeric> <numeric> <logical>
## 10000 0.400992 52.2877 52.2877 FALSE
## 100126331 0.441860 40.5478 40.5478 FALSE
## 100126346 0.341772 28.6339 28.6339 FALSE
## 100126348 0.637500 47.6883 47.6883 FALSE
## 100126349 0.553571 38.9102 38.9102 FALSE
## ... ... ... ... ...
## 9928 0.382713 23.3776 23.3776 FALSE
## 9939 0.438617 25.3532 25.3532 FALSE
## 9967 0.482265 49.4985 49.4985 FALSE
## 9970 0.529275 15.2699 15.2699 FALSE
## 998 0.417894 49.7120 49.7120 FALSE
Since the aim of EDA is to discover unexpected issues in your data, EDA is the key step within a counter batch effect work-flow. EDA can provide regular “sanity checks” while conducting complex data science projects; it is often absolutely necessary to ensure the validity of a project (although it may not guarantee to always make the result more “positive”).
Task 1.2: Apply hierarchically cluster over samples using the transformed assay of log(count+1); use the euclidean distance as the input metric, and set the clustering method to “ward.D2”.
hc <- hclust(dist(t(log(assay(se)+1))), method = "ward.D2")
Next, the hclust dendrograms are plotted under 2 types of labels.
library(rafalib)
myplclust(hc, labels = se$batch_label, lab.col = as.fumeric(as.character(se$batch_label)), cex = 0.5)
myplclust(hc, labels = se$bio_label, lab.col = as.fumeric(as.character(se$bio_label)), cex = 0.5)
Judging from the tree structure, we could see that both the biological and batch factors are significantly contributing to the column-wised distances. However, the experimental conditions seem to have a lager effect, since the partitions under k = 2 can perfectly classify the WT and treatment labels.
Task 1.3: Compute the empirical covarance matrix on the transformed matrix (log(count + 1)), then calculate the 1st and 2nd principal components from the covariance matrix. Store the PCs into the variable names of PC1
and PC2
.
cov_mat <- cov(log(assay(se)+1))
decomp <- eigen(cov_mat)
PC1 <- decomp$vectors[,1]
PC2 <- decomp$vectors[,2]
image(cov_mat)
The covariance matrix clearly showing chunks explained by the 3 batches.
library(ggplot2)
ggplot(data.frame(PC1 = PC1,
PC2 = PC2,
batch_label = batch_label,
bio_label = bio_label),
aes(x=PC1,y=PC2,colour=batch_label,shape=bio_label))+
geom_point()+
theme_classic()+
scale_color_brewer(palette = "Dark2")
par(mfrow = c(1,2))
boxplot(PC1~batch_label)
boxplot(PC2~batch_label)
dev.off()
## null device
## 1
From the dot plots and the box-plots above, we could see that both the PC1 and PC2 are learning the batch labels pretty well. It is good in a sense that it indicates the PCA approach can capture the batch factors induced by the GC-content biases. We can therefore able to correct the batch effect directly using factor analysis approach such as SVA. (will be done in 3.2)
Before implementing any correction measures, we will perform DGEA over rows using t-test, and its accuracy will be used as a baseline for the lateral corrections.
Task 1.4.1: For all genes, compute the t-test p-values (for the difference in means) between the WT and the treatment groups. Then, adjust the p-values using the “BH” approach.
Note: use the rowttests()
, which is the vectorized t-test function exported from the package genefilter
. The t-test should be based on the transformed count of log(count+1).
library(genefilter)
t_test_pvalues <- rowttests(log(assay(se)+1), factor(se$bio_label))$p.value
padj <- p.adjust(t_test_pvalues, method = "BH")
alpha <- 0.05
sig_diff_genes <- padj <= alpha
true_diff_genes <- rowData(se)$diff_genes
table(sig_diff_genes, true_diff_genes)
## true_diff_genes
## sig_diff_genes FALSE TRUE
## FALSE 1971 72
## TRUE 132 162
Question 1.4.2: Based on the computed confusion matrix, could the alpha value (0.05) control the FDR? If not, what might be the cause of it?
Your answer:
In the section bellow, we will correct the data through the knowledge-based approach, which is to model the generation process of the technical artifact. For NGS, the predominant source of artifact often comes from the biased amplification induced by the GC contents of cDNA fragments.
Often, such bias can be effectively traced by drawing the smoothed regression fits between the GC contents (of the gene features) and the read counts.
poisson_smooth <- function(...) {
geom_smooth(method = "glm", method.args = list(family = "poisson"), ...)
}
plot_df <- reshape2::melt(assay(se))[,-1]
colnames(plot_df) <- c("sample","gene_count")
plot_df$GC <- rowData(se)$GC
plot_df$batch <- rep(se$batch_label, each = gnum)
ggplot(plot_df, aes(x = GC,
y = gene_count,
group = sample,
color = batch)) +
poisson_smooth(formula = y ~ splines::ns(x, 5), size = 0.5, se = FALSE) +
theme_classic() + scale_color_brewer(palette = "Dark2") +
labs(x = "GC content",
y = "Rate of read count",
main = "Poisson Regression")
The diagram above shows the Poisson GLM fits, in which the Y-axis represented the fitted expected count (\(\lambda\) of Poisson distribution). We could see that after smoothing the covariate with splines, there are 3 different trends under each batches. For all of the samples, the likelihood of observing reads coming from genes with extremely high / low GC contents is very small.
From the lecture, we know that the GC bias in NGS is often modeled by the following statistical model.
For gene \(i\), the count \(Y_i\) follows a Poisson distribution:
\[Y_i \sim \mathrm{Poisson}(\lambda_i)\] The mean \(\lambda\) is decomposed into the product of \(\theta_i \times f(X_i) \times s\).
We have discussed in the lecture that the \(f(X_i)\) is estimated from the fitted values of the log Poisson linear models (same as the model fits in the above diagram). Afterward, the corrected read count can be derived by \(Y_i^{\mathrm{correct}}=Y_i / \hat f(X_i)\). (P.S. our simulation here does not include differences in sequencing depths (\(s\)), hence we can treat the count after correction \(Y_i^\mathrm{correct}\) equal to the expression value estimate \(\hat \theta_i\))
However, for identifiability, we often imply the constraint \(\frac{1}{n}\sum_{\forall i} \hat f(X_i)=1\) on the GC bias term. To realize such constraint, we can center the linear model fits on the log scale (i.e. the scale of the linear combination).
P.S. The identifiability constraints for multiplicative terms are often by mean = 1, and for additive terms are often by mean = 0. Please note that the multiplicative terms can become additive on log scale.
Task 2.1: Finish the code chunk bellow, so that a matrix of correction offsets can be derived at the end. The step you need to complete is to center (i.e. x-mean(x) ) the GLM fits for the identifiability constraint.
correction_offsets <- matrix(NA, nrow = gnum, ncol = sample_num)
for(i in 1:ncol(se)){
glm_fit_i <- glm(y ~ splines::ns(x, 5), data = data.frame(x=rowData(se)$GC, y=assay(se)[,i]), family = "poisson")
fitted_values_i <- predict(glm_fit_i) #Calculate the glm fits on the linear combination scale
correction_offsets[,i] <- exp(fitted_values_i - mean(fitted_values_i)) #Center the glm fits and exponentiate it to get the offsets on the Poisson rate scale
}
correction_offsets[1:2,1:3]
## [,1] [,2] [,3]
## [1,] 1.006335 1.010293 1.010734
## [2,] 1.057867 1.064325 1.055998
Task 2.2: Using the correction offsets calculated above, derive the GC corrected count matrix in assay_correct_knowledge
.
#Create functions to compute confusion matrix and PCA plots from count matrix X
DGEAcfm <- function(X,biolab,diff_genes){
Ttest <- rowttests(log(X+1), factor(biolab))
t_test_result <- p.adjust(Ttest$p.value, method = "fdr") < 0.05
table(t_test_result, diff_genes)
}
DGEAcfm(assay_correct_knowledge, se$bio_label, rowData(se)$diff_genes)
## diff_genes
## t_test_result FALSE TRUE
## FALSE 2092 70
## TRUE 11 164
PCAplot <- function(X, biolab, batchlab){
require(ggplot2)
cov_mat <- cov(log(X+1))
decomp <- eigen(cov_mat)
PC1 <- decomp$vectors[,1]
PC2 <- decomp$vectors[,2]
ggplot(data.frame(PC1 = PC1,
PC2 = PC2,
batch_label = batchlab,
bio_label = biolab),
aes(x = PC1, y = PC2, colour = batch_label, shape = bio_label))+
geom_point() +
theme_classic() +
scale_color_brewer(palette = "Dark2")
}
PCAplot(assay_correct_knowledge, se$bio_label, se$batch_label)
Question 2.3: Based on the confusion matrix, could the t-test now better control over the FDR after GC correction?
Your answer:
In the class, we have discussed that when the batch factors are given, we can directly fit multiple linear regression models for each genes by treating the batch labels as the covariates for adjustment. The regression fits and corresponding p values are consequently adjusted over the batch confounding.
Task 3.1: Fill the regression design in the code chunk below to complete the Combat adjustment.
lm_coefficients_combat <- matrix(NA, nrow = gnum, ncol = 4)
lm_pvalues_combat <- rep(NA, gnum)
se$bio_label <- factor(se$bio_label, levels = c("WT","Treatment")) #Set WT as the reference level in linear model fits
for(i in 1:gnum){
model_fit_i <- lm(y~b+w, data = data.frame(y = log(assay(se)[i,] + 1),
b = se$bio_label,
w = se$batch_label))
lm_coefficients_combat[i,] <- summary(model_fit_i)$coefficients[,1]
lm_pvalues_combat[i] <- summary(model_fit_i)$coefficients["bTreatment","Pr(>|t|)"]
}
colnames(lm_coefficients_combat) <- names(coefficients(model_fit_i))
lm_pvalues_combat[is.na(lm_pvalues_combat)] <- 1
alpha <- 0.05
mlr_test_result_combat <- p.adjust(lm_pvalues_combat,method = "fdr") < alpha
table(mlr_test_result_combat, diff_genes)
## diff_genes
## mlr_test_result_combat FALSE TRUE
## FALSE 2103 232
## TRUE 1 2
From the confusion matrix computed using the p-values on coefficient of bTreatment
, we could see that there are almost no genes left after the adjustment over the confounding batch factors. The Combat approach results in an “over-kill” on our data because the biological label is heavily confounded with the batch label.
SVA is a factor analysis based approach that can estimate the latent batch factors without knowing the batch / technical factors. SVA has an advantage for the new technical domains in which the artifact generation process is poorly understood.
For gene expression data, the input for SVA is a transformed assay so that the noise can be approximately Gaussian.
Task 3.1: Perform SVA on our expression data using the sva()
function in package sva, store the learned surrogate variable in SV
.
library(sva)
transformed_assay <- log(assay(se)+1)
transformed_assay <- transformed_assay[rowVars(transformed_assay) != 0,]
sva_fit <- sva(transformed_assay, model.matrix(~se$bio_label))
## Number of significant surrogate variables is: 1
## Iteration (out of 5 ):1 2 3 4 5
SV <- sva_fit$sv
boxplot(SV~batch_label)
lm_coefficients_sva <- matrix(NA, nrow = gnum, ncol = 3)
linear_model_pvalues_sva <- rep(NA, gnum)
for(i in 1:gnum){
model_matrix_i <- data.frame(y = log(assay(se)[i,]+1),
b = se$bio_label,
w = SV)
model_fit_i <- lm(y~b+w, data = model_matrix_i) #Fit multiple linear regression by treating the SV as the adjustment variable
lm_coefficients_sva[i,] <- summary(model_fit_i)$coefficients[,1]
linear_model_pvalues_sva[i] <- summary(model_fit_i)$coefficients["bTreatment","Pr(>|t|)"]
}
colnames(lm_coefficients_sva) <- names(coefficients(model_fit_i))
linear_model_pvalues_sva[is.na(linear_model_pvalues_sva)] <- 1
mlr_test_result_sva <- p.adjust(linear_model_pvalues_sva,method = "fdr") < 0.05
table(mlr_test_result_sva, diff_genes)
## diff_genes
## mlr_test_result_sva FALSE TRUE
## FALSE 2096 142
## TRUE 8 92
We could see that the SV is highly correlated with the batch labels. After regressing the SV onto genes, the p-values over the treatment effects become much more sensitive and also able to control the FDR.
This section display the codes that compare the correction performances among several methods for batch effect adjustment.
Before comparing with the ground truth values, we need to first compute the corrected expression matrix for the knowledge-free methods; recall from the lecture slides, the correction for combat / sva is made by:
\[X^{\mathrm{correct}}=X-\beta W^T\] columns of \(W\) are the known batch effect factors for Combat or the surrogate variables for SVA, and rows in \(\beta\) are linear model coefficients fitted for the corresponding columns in \(W\) for each genes.
Task 4.1: Calculate the corrected expression matrix for Combat and SVA using coefficients \(\beta\) and batch effect factors \(W\), those matrices are either known or fitted in the previous sections.
X <- log(assay(se)+1)
#Check the regression fits over genes
head(lm_coefficients_combat,3)
## (Intercept) bTreatment wBatch2 wBatch3
## [1,] 3.623387 0.1227384 0.2713830 0.4180212
## [2,] 3.807989 0.6540371 -0.2826288 -0.5602858
## [3,] 3.033766 0.1685644 0.1372033 -0.2337485
head(lm_coefficients_sva,3)
## (Intercept) bTreatment w
## [1,] 3.775098 0.2984718 -0.54539548
## [2,] 3.628120 0.4148114 0.76275349
## [3,] 3.076015 -0.0297570 0.03103741
#Calculate the corrected (transformed) expression matrix for combat and sva
corrected_X_sva <- X - lm_coefficients_sva[,"w"] %*% t(SV)
corrected_X_combat <- X - lm_coefficients_combat[,c("wBatch2","wBatch3")] %*% t(cbind(se$batch_label == "Batch2",
se$batch_label == "Batch3"))
corrected_X_sva[1:3, 1:3]
## sample_1 sample_2 sample_3
## 10000 3.540082 4.252870 2.825335
## 100126331 4.010457 3.831488 3.549546
## 100126346 2.882468 3.325049 2.701376
corrected_X_combat[1:3, 1:3]
## sample_1 sample_2 sample_3
## 10000 3.401197 4.127134 2.708050
## 100126331 4.204693 4.007333 3.713572
## 100126346 2.890372 3.332205 2.708050
library(rafalib)
PC_df <- function(x, method){
cov_mat <- cov(x)
decomp <- eigen(cov_mat)
data.frame(PC1 = decomp$vectors[,1],
PC2 = decomp$vectors[,2],
batch_label = se$batch_label,
bio_label = se$bio_label,
method = method)
}
plot_df <- rbind(PC_df(log(assay(se)+1),"no correction"),
PC_df(log(assay_correct_knowledge+1),"GC correction"),
PC_df(corrected_X_combat,"Combat"),
PC_df(corrected_X_sva,"SVA"))
plot_df$method <- factor(plot_df$method, levels = c("no correction", "GC correction", "Combat", "SVA"))
library(ggplot2)
ggplot(plot_df,
aes(x=PC1,y=PC2,colour=batch_label,shape=bio_label))+
facet_wrap(~method,nrow=2) +
geom_point()+
theme_classic()+
scale_color_brewer(palette = "Dark2")
# Add another newly published method for comparison, see ?ComBat_seq for details
corrected_assay_sva <- exp(corrected_X_sva)-1
corrected_assay_combat <- exp(corrected_X_combat)-1
corrected_assay_combat_seq <- ComBat_seq(
assay(se),
se$batch_label,
se$bio_label
)
## Found 3 batches
## Using full model in ComBat-seq.
## Adjusting for 1 covariate(s) or covariate level(s)
## Estimating dispersions
## Fitting the GLM model
## Shrinkage off - using GLM estimates for parameters
## Adjusting the data
GT_assay <- matrix(c(rep(gexpr_wt,15),rep(gexpr_trt,15)), ncol=sample_num)
plot_df2 <- data.frame(MSE = c( apply(GT_assay - assay(se), 2, function(x) mean((x)^2)),
apply(GT_assay - assay_correct_knowledge, 2, function(x) mean((x)^2)),
apply(GT_assay - corrected_assay_sva, 2, function(x) mean((x)^2)),
apply(GT_assay - corrected_assay_combat, 2, function(x) mean((x)^2)),
apply(GT_assay - corrected_assay_combat_seq, 2, function(x) mean((x)^2))),
Methods = rep(c("un-corrected","GC corrected","SVA corrected", "Combat corrected", "Combat seq corrected"), each = sample_num))
plot_df2$Methods <- factor(plot_df2$Methods, levels = c("un-corrected", "Combat corrected", "Combat seq corrected", "SVA corrected", "GC corrected"))
ggplot(plot_df2, aes(x=Methods, y=MSE, color=Methods))+ geom_boxplot(outlier.alpha = 0) + geom_jitter(width = 0.1, alpha=0.8,size = 0.5) + theme_bw() + scale_color_brewer(palette = "Dark2") + theme(axis.text.x = element_text(angle = 310, vjust = -0.5) )
The results show that although all of the 3 correction methods can lead to improvements in the PCA plots, the MSEs evaluated between the corrected matrix and the ground truth demonstrate that the knowledge based correction method outperform the rest in terms of estimating the true expression levels.
sessioninfo::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.0.3 (2020-10-10)
## os Ubuntu 18.04.5 LTS
## system x86_64, linux-gnu
## ui X11
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz Asia/Shanghai
## date 2023-03-08
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date lib source
## annotate 1.67.0 2020-04-27 [2] Bioconductor
## AnnotationDbi * 1.51.0 2020-04-27 [2] Bioconductor
## askpass 1.1 2019-01-13 [2] CRAN (R 4.0.0)
## assertthat 0.2.1 2019-03-21 [2] CRAN (R 4.0.0)
## Biobase * 2.49.0 2020-04-27 [2] Bioconductor
## BiocFileCache 1.13.0 2020-04-27 [2] Bioconductor
## BiocGenerics * 0.35.2 2020-05-06 [2] Bioconductor
## BiocParallel * 1.23.0 2020-04-27 [2] Bioconductor
## biomaRt 2.45.0 2020-04-27 [2] Bioconductor
## Biostrings * 2.57.0 2020-04-27 [2] Bioconductor
## bit 1.1-15.2 2020-02-10 [2] CRAN (R 4.0.0)
## bit64 0.9-7 2017-05-08 [2] CRAN (R 4.0.0)
## bitops 1.0-6 2013-08-17 [2] CRAN (R 4.0.0)
## blob 1.2.1 2020-01-20 [2] CRAN (R 4.0.0)
## BSgenome * 1.57.0 2020-04-27 [2] Bioconductor
## BSgenome.Hsapiens.UCSC.hg19 * 1.4.3 2020-05-13 [2] Bioconductor
## cli 2.0.2 2020-02-28 [2] CRAN (R 4.0.0)
## codetools 0.2-16 2018-12-24 [4] CRAN (R 4.0.0)
## colorspace 1.4-1 2019-03-18 [2] CRAN (R 4.0.0)
## crayon 1.3.4 2017-09-16 [2] CRAN (R 4.0.0)
## curl 4.3 2019-12-02 [2] CRAN (R 4.0.0)
## DBI 1.1.0 2019-12-15 [2] CRAN (R 4.0.0)
## dbplyr 1.4.3 2020-04-19 [2] CRAN (R 4.0.0)
## DelayedArray * 0.15.1 2020-05-01 [2] Bioconductor
## digest 0.6.25 2020-02-23 [2] CRAN (R 4.0.0)
## dplyr 0.8.5 2020-03-07 [2] CRAN (R 4.0.0)
## edgeR 3.32.1 2021-01-14 [2] Bioconductor
## ellipsis 0.3.0 2019-09-20 [2] CRAN (R 4.0.0)
## evaluate 0.14 2019-05-28 [2] CRAN (R 4.0.0)
## fansi 0.4.1 2020-01-08 [2] CRAN (R 4.0.0)
## farver 2.0.3 2020-01-16 [2] CRAN (R 4.0.0)
## genefilter * 1.71.0 2020-04-27 [2] Bioconductor
## GenomeInfoDb * 1.25.0 2020-04-27 [2] Bioconductor
## GenomeInfoDbData 1.2.3 2020-05-12 [2] Bioconductor
## GenomicAlignments 1.25.0 2020-04-27 [2] Bioconductor
## GenomicFeatures * 1.41.0 2020-04-27 [2] Bioconductor
## GenomicRanges * 1.41.1 2020-05-02 [2] Bioconductor
## ggplot2 * 3.3.0 2020-03-05 [2] CRAN (R 4.0.0)
## glue 1.4.0 2020-04-03 [2] CRAN (R 4.0.0)
## gtable 0.3.0 2019-03-25 [2] CRAN (R 4.0.0)
## hms 0.5.3 2020-01-08 [2] CRAN (R 4.0.0)
## htmltools 0.4.0 2019-10-04 [2] CRAN (R 4.0.0)
## httr 1.4.1 2019-08-05 [2] CRAN (R 4.0.0)
## IRanges * 2.23.4 2020-05-03 [2] Bioconductor
## knitr 1.28 2020-02-06 [2] CRAN (R 4.0.0)
## labeling 0.3 2014-08-23 [2] CRAN (R 4.0.0)
## lattice 0.20-41 2020-04-02 [4] CRAN (R 4.0.0)
## lifecycle 0.2.0 2020-03-06 [2] CRAN (R 4.0.0)
## limma 3.46.0 2020-10-27 [2] Bioconductor
## locfit 1.5-9.4 2020-03-25 [2] CRAN (R 4.0.0)
## logspline 2.1.16 2020-05-08 [2] CRAN (R 4.0.0)
## magrittr 1.5 2014-11-22 [2] CRAN (R 4.0.0)
## Matrix 1.4-1 2022-03-23 [1] CRAN (R 4.0.3)
## matrixStats * 0.56.0 2020-03-13 [2] CRAN (R 4.0.0)
## memoise 1.1.0 2017-04-21 [2] CRAN (R 4.0.0)
## mgcv * 1.8-33 2020-08-27 [4] CRAN (R 4.0.2)
## munsell 0.5.0 2018-06-12 [2] CRAN (R 4.0.0)
## nlme * 3.1-149 2020-08-23 [4] CRAN (R 4.0.2)
## openssl 1.4.1 2019-07-18 [2] CRAN (R 4.0.0)
## pillar 1.4.4 2020-05-05 [2] CRAN (R 4.0.0)
## pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.0.0)
## plyr 1.8.6 2020-03-03 [2] CRAN (R 4.0.0)
## polyester * 1.26.0 2020-10-27 [2] Bioconductor
## prettyunits 1.1.1 2020-01-24 [2] CRAN (R 4.0.0)
## progress 1.2.2 2019-05-16 [2] CRAN (R 4.0.0)
## purrr 0.3.4 2020-04-17 [2] CRAN (R 4.0.0)
## R6 2.4.1 2019-11-12 [2] CRAN (R 4.0.0)
## rafalib * 1.0.0 2015-08-09 [2] CRAN (R 4.0.0)
## rappdirs 0.3.1 2016-03-28 [2] CRAN (R 4.0.0)
## RColorBrewer 1.1-2 2014-12-07 [2] CRAN (R 4.0.0)
## Rcpp 1.0.9 2022-07-08 [1] CRAN (R 4.0.3)
## RCurl 1.98-1.2 2020-04-18 [2] CRAN (R 4.0.0)
## reshape2 1.4.4 2020-04-09 [2] CRAN (R 4.0.0)
## rlang 1.0.5 2022-08-31 [1] CRAN (R 4.0.3)
## rmarkdown 2.1 2020-01-20 [2] CRAN (R 4.0.0)
## Rsamtools 2.5.0 2020-04-27 [2] Bioconductor
## RSQLite 2.2.0 2020-01-07 [2] CRAN (R 4.0.0)
## rstudioapi 0.11 2020-02-07 [2] CRAN (R 4.0.0)
## rtracklayer * 1.49.1 2020-05-02 [2] Bioconductor
## S4Vectors * 0.27.5 2020-05-06 [2] Bioconductor
## scales 1.1.1 2020-05-11 [2] CRAN (R 4.0.0)
## sessioninfo 1.1.1 2018-11-05 [2] CRAN (R 4.0.0)
## stringi 1.4.6 2020-02-17 [2] CRAN (R 4.0.0)
## stringr 1.4.0 2019-02-10 [2] CRAN (R 4.0.0)
## SummarizedExperiment * 1.19.2 2020-05-01 [2] Bioconductor
## survival 3.2-7 2020-09-28 [4] CRAN (R 4.0.2)
## sva * 3.38.0 2020-10-27 [2] Bioconductor
## tibble 3.0.1 2020-04-20 [2] CRAN (R 4.0.0)
## tidyselect 1.1.0 2020-05-11 [2] CRAN (R 4.0.0)
## TxDb.Hsapiens.UCSC.hg19.knownGene * 3.2.2 2020-05-13 [2] Bioconductor
## vctrs 0.3.0 2020-05-11 [2] CRAN (R 4.0.0)
## withr 2.2.0 2020-04-20 [2] CRAN (R 4.0.0)
## xfun 0.13 2020-04-13 [2] CRAN (R 4.0.0)
## XML 3.99-0.3 2020-01-20 [2] CRAN (R 4.0.0)
## xtable 1.8-4 2019-04-21 [2] CRAN (R 4.0.0)
## XVector * 0.29.0 2020-04-27 [2] Bioconductor
## yaml 2.2.1 2020-02-01 [2] CRAN (R 4.0.0)
## zlibbioc 1.35.0 2020-04-27 [2] Bioconductor
##
## [1] /home/zhen/R/x86_64-pc-linux-gnu-library/4.0
## [2] /usr/local/lib/R/site-library
## [3] /usr/lib/R/site-library
## [4] /usr/lib/R/library