In the first section of this practical, we will implement various dimensional reduction methods on a scRNA-Seq data set. The sequencing data comes from cells of mouse neocortex, which is a brain region governing higher-level cognitive and perceptual functions. The data set is originally produced by the Allen Institute.
The gene expression assay is saved in the RDS file of scRNA_se.rds
under the current directory.
The code chunk below read the RDS into R and examine its contents. We could see that the data is a SummarizedExperiment
object having 45768 genes and 511 samples (cells). The expression array can be accessed by assays(scRNA_se)[["RPKM"]]
, and the ground truth label of the cell types can be retrieved by scRNA_se$cell_type
.
library(SummarizedExperiment)
scRNA_se <- readRDS("scRNA_se.rds")
scRNA_se
## class: SummarizedExperiment
## dim: 45768 511
## metadata(0):
## assays(1): RPKM
## rownames: NULL
## rowData names(1): gene_id
## colnames: NULL
## colData names(1): cell_type
table(scRNA_se$cell_type) #Biological conditions (experimental design)
##
## CellType_1 CellType_2 CellType_3 CellType_4 CellType_5
## 68 68 108 215 52
str(rowData(scRNA_se)$gene_id) #Gene ids
## chr [1:45768] "0610005C13Rik" "0610006L08Rik" "0610007P14Rik" ...
str(assays(scRNA_se)[["RPKM"]]) #Expression array (in RPKM)
## num [1:45768, 1:511] 0 0 134 165 0 ...
table(colMeans(assays(scRNA_se)[["RPKM"]])) #Check the column averages
##
## 21.8493270407271
## 511
We will first visualize the cellular gene expression profiles by plotting the 1st and 2nd PCs of the data.
Task 1.1.1, Mark: 10%: Transform the expression values by log(RPKM+1) and calculate the PCs using R functions cov()
and eigen()
. Save the results of eigen()
into an object named by decomp
.
Hint: cov()
can calculate the empirical covariance matrix between rows of a matrix input. eigen()
can compute the eigen-decomposition of a matrix. The output of eigen()
contains 2 parts: the eigen vector matrix (columns are PCs/factors) and the eigen values (variance explained along the PCs). The eigen vector matrix can be accessed by $vectors
, and the eigen values can be accessed by $values
.
decomp = eigen(cov(log(assays(scRNA_se)[["RPKM"]]+1))) #Implement eigen-decomposition on covariance matrix of log(RPKM+1)
dim(decomp$vectors) #Check dimensionalities of W
## [1] 511 511
PC12_df <- data.frame(PC1=decomp$vectors[,1],
PC2=decomp$vectors[,2]) #Make a data.frame of PC1 and PC2
ggplot(cbind(PC12_df,label=scRNA_se$cell_type),
aes(x=PC1,y=PC2,colour=label))+
geom_point()+
theme_classic()+
scale_color_brewer(palette = "Spectral")+
labs(title = "PCA of log(RPKM+1)")
Note that the empirical covariance matrix along is sufficient to calculate all the principal components. Hence, PCA can be understood as an algorithm to summarize the covariance structure between data dimensions using a linear model.
cov_matrix <- cov(log(assays(scRNA_se)[["RPKM"]]+1))
sample_ids <- seq_len(ncol(scRNA_se))
image(cov_matrix, x = sample_ids, y = sample_ids) #The covariance structure between columns (samples)
Next, we will try to define performance measures for the dimensional reduction outcome.
Task 1.1.2, Mark: 5% First, run K means clustering (K=5) on the matrix of the top 2 PCs (already stored in the variable PC12_df
) with function kmeans()
. Subsequently, as we have done in the lab practical 2, compare the clustering result with the ground truth label in scRNA_se$cell_type
using metrics of mutual information and chi-squared test statistics.
set.seed(123)
partition <- kmeans(PC12_df, centers = 5)$cluster #Save the clustering results
table(partition, scRNA_se$cell_type) #Contingency table
##
## partition CellType_1 CellType_2 CellType_3 CellType_4 CellType_5
## 1 0 0 22 182 0
## 2 4 67 0 0 0
## 3 0 0 86 33 0
## 4 0 0 0 0 50
## 5 64 1 0 0 2
DescTools::MutInf(partition, scRNA_se$cell_type) #Mutual information
## [1] 1.630551
chisq.test(partition, scRNA_se$cell_type) #Chi-squared statistics
##
## Pearson's Chi-squared test
##
## data: partition and scRNA_se$cell_type
## X-squared = 1630.2, df = 16, p-value < 2.2e-16
The obtained clustering partitions are highly correlated with the true cell categories. However, the cell type 3 & 4 are still not successfully identified from the PCs using K-means.
In this section, we will examine the effect of scaling on PCA.
The log transformation was previously applied, as it is a natural transformation for count data.
Task 1.2.1, Mark: 5% Run PCA without log transformation and compute the mutual information on the K-means clustering result as we have done in Task 1.3.
decomp2 = eigen(cov(assays(scRNA_se)[["RPKM"]])) #Run PCA directly on RPKM
PC12_df <- data.frame(PC1=decomp2$vectors[,1],
PC2=decomp2$vectors[,2])
ggplot(cbind(PC12_df,label=scRNA_se$cell_type),
aes(x=PC1,y=PC2,colour=label))+
geom_point()+
theme_classic()+
scale_color_brewer(palette = "Spectral")+
labs(title = "PCA of RPKM")
set.seed(123)
partition <- kmeans(PC12_df, centers = 5)$cluster
DescTools::MutInf(partition, scRNA_se$cell_type)
## [1] 0.5451725
Without log transformation, the dimensional reduction outcome is compromised immediately.
Task 1.2.2, Mark: 5% Run PCA on log(RPKM+1), and replace the empirical covariance matrix with the matrix of Pearson correlation (called by R function cor()
). Again, compute the mutual information on the clustering result as we have done previously.
decomp3 = eigen(cor(log(assays(scRNA_se)[["RPKM"]]+1))) #Run normalized PCA on log(RPKM+1)
PC12_df <- data.frame(PC1=decomp3$vectors[,1],
PC2=decomp3$vectors[,2])
ggplot(cbind(PC12_df,label=scRNA_se$cell_type),
aes(x=PC1,y=PC2,colour=label))+
geom_point()+
theme_classic()+
scale_color_brewer(palette = "Spectral")+
labs(title = "PCA of log(RPKM+1) using correlation")
set.seed(123)
partition <- kmeans(PC12_df, centers = 5)$cluster
DescTools::MutInf(partition, scRNA_se$cell_type)
## [1] 1.676422
The PCA using Pearson correlation matrix is equivalent to a regular PCA after re-scaling the columns into unit variances (by dividing column standard deviations), such normalization is important when the columns are in different scales since the dimensions with very high variances can dominate the pattern in the empirical covariance matrix.
In our case of scRNA-Seq, the column standardization is less significant, since the samples are in the same scale (all RPKMs). Also, the variances of columns may contain biological information related to the cell lines, which should be estimated as the factors.
Similarly, the PCA can be performed on the row centered data (with X-rowMeans(X)
), such normalization approach can estimate the factors which are independent of the average of the observations. In general, one should always be aware of the interpretation of the factors/PCs when designing the normalization procedures before running the PCA algorithm.
There are non-linear dimensional reduction techniques that generally work better than PCA for non UMI(unique molecular indexed) scRNA-Seq. We will try two of the most commonly used non-linear embedding methods: tSNE and UMAP.
Task 1.3.1, Mark: 5% Run tSNE on log(RPKM+1) and compute the mutual information as it is done by the previous steps.
Hint: call tSNE with tsne()
in the bioconductor package M3C, the input for this function is the expression assay. Save the output of tsne()
in tsne_results
.
library(M3C)
set.seed(123)
tsne_results <- tsne(log(assays(scRNA_se)[["RPKM"]]+1))
ggplot(cbind(tsne_results$data,
label=scRNA_se$cell_type),
aes(x=X1,y=X2,colour=label))+
geom_point()+
theme_classic()+
scale_color_brewer(palette="Spectral")+
labs(title = "tSNE on log(RPKM+1)")
set.seed(123)
partition <- kmeans(tsne_results$data, centers = 5)$cluster
DescTools::MutInf(partition, scRNA_se$cell_type)
## [1] 1.864341
Task 1.3.2, Mark: 5% Run UMAP on log(RPKM+1) and compute the mutual information as it is done by the previous steps.
Hint: use umap()
from package M3C.
set.seed(123)
umap_results <- umap(log(assays(scRNA_se)[["RPKM"]]+1))
ggplot(cbind(umap_results$data,
label=scRNA_se$cell_type),
aes(x=X1,y=X2,colour=label))+
geom_point()+
theme_classic()+
scale_color_brewer(palette="Spectral")+
labs(title = "UMAP on log(RPKM+1)")
set.seed(123)
partition <- kmeans(umap_results$data, centers = 5)$cluster
DescTools::MutInf(partition, scRNA_se$cell_type)
## [1] 1.986195
We could see that both of the non-linear embedding methods can successfully differentiate most of the cell types defined by the ground truth labels.
The objective of the 2nd section is to motivate and illustrate the usage of DESeq2, a powerful software for differential gene expression analysis (DGEA). We will first explore the statistical properties of NGS count data using an RNA-Seq data set that has 48 biological replicates under 2 biological conditions sequenced by the following study:
Schurch, N. J., Schofield, P., Gierliński, M., Cole, C., Sherstnev, A., Singh, V., … & Barton, G. J. (2016). How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use?. Rna, 22(6), 839-851.
The purpose for designing such data set in the paper is to investigate the relationship between the # of biological replicates and the performances of different testing methods for DGEA. The results of their analysis supports the usage of DESeq2 and DESeq in general situations.
The data set is stored in the ERX425102_se.rds
file under the practical folder, which is a SummarizedExperiment
object containing the count matrix and the experimental design.
library(SummarizedExperiment)
ERX425102_se <- readRDS("ERX425102_se.rds")
colData(ERX425102_se) #Sample information, 2 biological conditions with 48 biological replicates for each condition
## DataFrame with 96 rows and 2 columns
## condition Replicate
## <factor> <character>
## 1 Snf2 Rep_1
## 2 Snf2 Rep_2
## 3 Snf2 Rep_3
## 4 Snf2 Rep_4
## 5 Snf2 Rep_5
## ... ... ...
## 92 WT Rep_44
## 93 WT Rep_45
## 94 WT Rep_46
## 95 WT Rep_47
## 96 WT Rep_48
ERX425102_se #Gene expression count over 7131 genes (rows) and 96 samples (columns)
## class: SummarizedExperiment
## dim: 7131 96
## metadata(0):
## assays(1): ''
## rownames(7131): 15S_rRNA 21S_rRNA ... not_aligned alignment_not_unique
## rowData names(0):
## colnames: NULL
## colData names(2): condition Replicate
Since the observations generated by biological replication can be viewed as i.i.d. samples coming from a particular distribution. In this part, we will try to fit the data with 2 commonly used distribution families for counts.
The count values for gene LSR1 under WT condition are extracted and stored in the variable x
.
indx_WT <- colData(ERX425102_se)$condition == "WT"
x <- assay(ERX425102_se)["LSR1",indx_WT]
x #The count for gene LSR1 over 48 biological replicates of condition WT
## [1] 60 163 233 163 193 375 194 84 211 158 65 138 199 131 99 314 206 411 116
## [20] 465 194 424 226 139 277 401 201 209 153 177 265 80 264 263 223 223 100 55
## [39] 66 385 232 149 114 81 109 132 243 128
Function fitdistr()
in package MASS
can fit MLE for commonly used univariate parametric distributions. For example, the code chunk bellow fits the MLE on x
using Poisson distribution. The log likelihood of the fit can be accessed by logLik()
.
library(MASS)
set.seed(123)
ff <- fitdistr(x, "Poisson") #MLE of Poisson
ff
## lambda
## 197.72917
## ( 2.02962)
logLik(ff) #log likelihood of Poisson fit
## 'log Lik.' -1382.263 (df=1)
x
; access its MLE and log likelihood as it is done in the Poisson case. Save your distribution fit into the variable name of ff2
.ff2 <- fitdistr(x, "negative binomial")
ff2
## size mu
## 3.9015361 197.7291667
## ( 0.7821531) ( 14.5906488)
logLik(ff2)
## 'log Lik.' -285.2799 (df=2)
values <- 0:600
plot_df <- data.frame(prob = c(dnbinom(values,size=ff2$estimate[1],mu = ff2$estimate[2]),dpois(values,lambda=ff$estimate)),
values = rep(values,2),
model_fit = rep(c("NB","Poisson"),each=601))
library(ggplot2)
ggplot(plot_df) + geom_line(aes(x=values,y=prob,color=model_fit)) + theme_classic() +
scale_color_manual(values=c("blue","red")) + ylab("p(x)") + xlab("x") +
geom_point(aes(x=obs,y=y), data=data.frame(obs=x,y=0), shape = 3)
Comment: The over-dispersion properties of NB can effectively capture both the biological and technical variations between replicates, while Poisson distribution can only account for the fragment sampling noise of the NGS library (shoot noise).
In the following exercises, we will examine the difference between the mean expression levels of the two biological conditions.
After drawing a boxplot for the expression levels of gene “YGR107W” under the two conditions, we could see that their means are very likely to be significantly different.
count_YGR107W <- data.frame(count = assay(ERX425102_se)["YGR107W",],
condition = ERX425102_se$condition)
count_YGR107W$condition <- factor(count_YGR107W$condition, levels = c("WT","Snf2")) #Set "WT" to the reference level
ggplot(count_YGR107W, aes(x=condition,y=log(count+1)))+geom_boxplot(outlier.alpha = 0, width = 0.5)+
geom_jitter(aes(x=condition,y=log(count+1),colour=condition),width = 0.1, alpha=0.8)+theme_bw()+
scale_colour_brewer(palette = "Set2")
WT <- log(assay(ERX425102_se)["YGR107W",][ERX425102_se$condition=="WT"] + 1)
Snf2 <- log(assay(ERX425102_se)["YGR107W",][ERX425102_se$condition=="Snf2"] + 1)
mean(WT)
## [1] 2.057573
mean(Snf2) - mean(WT)
## [1] 1.533114
Next, we will check the statistical significance of the difference using t-test.
t.test()
on the difference between 2 sample means.t.test(WT, Snf2, alternative = "two.sided")
##
## Welch Two Sample t-test
##
## data: WT and Snf2
## t = -21.475, df = 91.6, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.674908 -1.391321
## sample estimates:
## mean of x mean of y
## 2.057573 3.590687
Next, we will use an alternative approach than t-test to evaluate differential expression. We can fit a linear regression model on log(count + 1) for gene YGR107W; the covariate should be a dummy variable indicating the conditions, with the level “WT” set as the reference group (=0).
\[\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_{96} \\ \end{pmatrix} =\begin{pmatrix} 1 & 0 \\ 1 & 0 \\ \vdots & \vdots \\ 1 & 1 \\ \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \end{pmatrix} +\begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_{96} \\ \end{pmatrix}=\beta_0\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \\ \end{pmatrix}\] As it is shown in the matrix notation above, the response variable (y) is the vector of log(count+1) for all the replicates of the gene YGR107W. The covariates (X) is a design matrix for testing the difference between 2 sample means, the first column of the design matrix is a vector of 1 for intercept, and the second column of the design matrix is an indicator variable for biological condition (1 if condition == “Snf2”, and 0 otherwise).
count_YGR107W
, fit a linear regression with lm()
using the design described as above. Store the fitted regression model (output of lm()
) intolm_fit
.#Use a linear regression model
lm_fit <- lm(formula = log(count + 1) ~ condition, data = count_YGR107W)
lm_fit #See the fitted model coefficients
##
## Call:
## lm(formula = log(count + 1) ~ condition, data = count_YGR107W)
##
## Coefficients:
## (Intercept) conditionSnf2
## 2.058 1.533
We could see that the fitted coefficients for \(\beta_0\) and \(\beta_1\) are 2.058 and 1.533, respectively.
As it is demonstrated in the diagram above, under this regression design, \(\hat \beta_0\) captures the average expression level of WT (the reference), and \(\hat \beta_1\) captures the difference between the average gene expression levels.
summary()
.summary(lm_fit)
##
## Call:
## lm(formula = log(count + 1) ~ condition, data = count_YGR107W)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.9590 -0.2234 0.0469 0.2450 0.6860
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.05757 0.05048 40.76 <2e-16 ***
## conditionSnf2 1.53311 0.07139 21.48 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3497 on 94 degrees of freedom
## Multiple R-squared: 0.8307, Adjusted R-squared: 0.8289
## F-statistic: 461.2 on 1 and 94 DF, p-value: < 2.2e-16
Sequencing count data can be modeled using a Poisson generalized linear model (GLM) without the need for log transformation.
In the following code chunk, you will fit a Poisson GLM using glm() to model the count data directly, using the same design matrix as in task 2.3.1. The response variable (y) will now be the count data without log transformation. After fitting the model, compute the log fold change (LFC) between the means of the two groups. In the end, check the relationship between the estimated coefficient of the grouping variable (\(\hat{\beta_1}\)) and the log fold change (LFC).
glm_fit <- glm(formula = count ~ condition, family = "poisson", data = count_YGR107W) #Fit a Poisson GLM with log link
summary(glm_fit) #Checking GLM coefficients and statistics
##
## Call:
## glm(formula = count ~ condition, family = "poisson", data = count_YGR107W)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.6913 -1.0250 -0.0103 0.6753 4.9432
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.99527 0.05322 37.49 <2e-16 ***
## conditionSnf2 1.61734 0.05827 27.76 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1273.99 on 95 degrees of freedom
## Residual deviance: 232.07 on 94 degrees of freedom
## AIC: 676.55
##
## Number of Fisher Scoring iterations: 4
log(mean(assay(ERX425102_se)["YGR107W",][ERX425102_se$condition=="Snf2"]))-log(mean(assay(ERX425102_se)["YGR107W",][ERX425102_se$condition=="WT"])) #Calculate the log fold change of 2 group means
## [1] 1.617339
Performing differential expression analysis for many genes using R can be challenging due to two main issues related to the t.test()
and glm()
functions.
The t.test()
function performs well only when there are at least 30 replicates for each group (for CLT to work out) or the mean counts of the gene are over 20 in all replicates (where log transformed NB distributed variables converges to Gaussian variables). However, meeting these conditions is difficult in practice. Additionally, running t.test() using a for loop over thousands of genes is time-consuming.
The glm()
function also has several issues. The Poisson GLM cannot account for the over-dispersed nature of biological replicates, and estimating over-dispersion using NB GLM is inaccurate with limited replicates. Furthermore, running glm() using a for loop over thousands of genes is also slow.
To overcome these challenges, the DESeq2 R package provides an excellent solution. It has a vectorized function that can fit multiple NB GLMs at once using its core code written in C++. Additionally, DESeq2 implements a regularized method for estimating dispersion parameters by sharing information over all genes, which generally improves the accuracy of tests.
Hint: see ?DESeq2::DESeq
for the usage of its core function, and use browseVignettes("DESeq2")
to look at the user’s guide.
library(DESeq2)
dds <- DESeqDataSet(se = ERX425102_se, design = ~ condition)
## renaming the first element in assays to 'counts'
dds <- DESeq(dds) #Fit n NB GLMs, where n = the # of genes
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## -- replacing outliers and refitting for 7 genes
## -- DESeq argument 'minReplicatesForReplace' = 7
## -- original counts are preserved in counts(dds)
## estimating dispersions
## fitting model and testing
res <- results(dds) #Access to the results of the analysis
res
## log2 fold change (MLE): condition Snf2 vs WT
## Wald test p-value: condition Snf2 vs WT
## DataFrame with 7131 rows and 6 columns
## baseMean log2FoldChange lfcSE stat pvalue
## <numeric> <numeric> <numeric> <numeric> <numeric>
## 15S_rRNA 18.33566 0.167494 0.3358680 0.498689 6.17998e-01
## 21S_rRNA 107.32079 -0.093113 0.2781711 -0.334733 7.37827e-01
## HRA1 2.52619 -0.792982 0.2226176 -3.562082 3.67926e-04
## ICR1 141.57494 0.215399 0.0370318 5.816605 6.00550e-09
## LSR1 207.52279 -0.136647 0.1581332 -0.864123 3.87520e-01
## ... ... ... ... ... ...
## no_feature 681521 -0.186217 0.0470641 -3.95667 7.60021e-05
## ambiguous 899512 -0.389648 0.1455694 -2.67672 7.43475e-03
## too_low_aQual 0 NA NA NA NA
## not_aligned 0 NA NA NA NA
## alignment_not_unique 0 NA NA NA NA
## padj
## <numeric>
## 15S_rRNA 6.73744e-01
## 21S_rRNA 7.82652e-01
## HRA1 6.67187e-04
## ICR1 1.62434e-08
## LSR1 4.49962e-01
## ... ...
## no_feature 0.000146702
## ambiguous 0.011574683
## too_low_aQual NA
## not_aligned NA
## alignment_not_unique NA
Question 2.3.2, Mark: 5% How many genes are significantly deferentially expressed between the 2 conditions under the criteria of FDR < 0.05? Which additional column in the output can be used to select a subset of more biologically significant DEGs among the list of statistically significant candidates?
Answer: There are 2174 genes with an adjusted p-value less than 0.05, which can be considered as significant DEGs. padj
column can be used to select a subset of more biologically significant DEGs among the list of statistically significant candidates
Hint: When controlling FDR, directly use the column padj
in the DESeq2 output, which is the adjusted p-value using the Benjamini Hochberg approach.
sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] DESeq2_1.29.0 MASS_7.3-53
## [3] M3C_1.12.0 SummarizedExperiment_1.19.2
## [5] DelayedArray_0.15.1 matrixStats_0.56.0
## [7] Biobase_2.49.0 GenomicRanges_1.41.1
## [9] GenomeInfoDb_1.25.0 IRanges_2.23.4
## [11] S4Vectors_0.27.5 BiocGenerics_0.35.2
## [13] ggplot2_3.3.0
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-6 bit64_0.9-7 doParallel_1.0.16
## [4] RColorBrewer_1.1-2 tools_4.0.3 R6_2.4.1
## [7] DBI_1.1.0 colorspace_1.4-1 withr_2.2.0
## [10] tidyselect_1.1.0 Exact_2.1 bit_1.1-15.2
## [13] compiler_4.0.3 expm_0.999-6 labeling_0.3
## [16] scales_1.1.1 mvtnorm_1.1-0 genefilter_1.71.0
## [19] askpass_1.1 stringr_1.4.0 digest_0.6.25
## [22] rmarkdown_2.1 XVector_0.29.0 pkgconfig_2.0.3
## [25] htmltools_0.4.0 umap_0.2.7.0 rlang_0.4.6
## [28] rstudioapi_0.11 RSQLite_2.2.0 farver_2.0.3
## [31] jsonlite_1.6.1 BiocParallel_1.23.0 dplyr_0.8.5
## [34] RCurl_1.98-1.2 magrittr_1.5 GenomeInfoDbData_1.2.3
## [37] Matrix_1.2-18 Rcpp_1.0.4.6 DescTools_0.99.40
## [40] munsell_0.5.0 reticulate_1.20-9000 lifecycle_0.2.0
## [43] stringi_1.4.6 yaml_2.2.1 rootSolve_1.8.2.1
## [46] zlibbioc_1.35.0 Rtsne_0.15 matrixcalc_1.0-3
## [49] grid_4.0.3 blob_1.2.1 crayon_1.3.4
## [52] doSNOW_1.0.19 lmom_2.8 lattice_0.20-41
## [55] splines_4.0.3 annotate_1.67.0 locfit_1.5-9.4
## [58] knitr_1.28 pillar_1.4.4 boot_1.3-25
## [61] gld_2.6.2 corpcor_1.6.9 geneplotter_1.67.0
## [64] codetools_0.2-16 XML_3.99-0.3 glue_1.4.0
## [67] evaluate_0.14 vctrs_0.3.0 png_0.1-7
## [70] foreach_1.5.0 gtable_0.3.0 openssl_1.4.1
## [73] purrr_0.3.4 assertthat_0.2.1 xfun_0.13
## [76] xtable_1.8-4 e1071_1.7-3 RSpectra_0.16-0
## [79] survival_3.2-7 class_7.3-17 tibble_3.0.1
## [82] snow_0.4-3 iterators_1.0.12 memoise_1.1.0
## [85] AnnotationDbi_1.51.0 cluster_2.1.0 ellipsis_0.3.0