The Outline for Lecture Practical 1

The major goal of this assignment is to reproduce the core analysis of a paper published in 2016. Next-Generation Sequencing (NGS) is currently the most popular data type in bioinformatics: tens of thousands of NGS datasets are deposited in public repositories every year, capturing the molecular profiles of DNA, RNA, and epigenetic marks in cells. In this project, our goal is to conduct a low-level exploratory data analysis (EDA) of an RNA-Seq dataset generated by 2 sequencing centers. We will identify the major source of technical variation in this dataset, and then apply a statistical tool to correct the technical error.

You can check the paper below for more background information:

Love, M. I., Hogenesch, J. B., & Irizarry, R. A. (2016). Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation. Nature Biotechnology, 34(12), 1287–1291.

1. Check the Experiment Design and Data Directories

Our RNA-Seq dataset contains 4 samples sequenced at 2 centers on the Illumina HiSeq 2000 platform. All of the samples originate from the same cell line and the same biological condition in the GEUVADIS project. The reads were previously aligned to the genome assembly hg19 using STAR, and the resulting BAM files are saved under the working directory of Lecture Practical 1.

The sample information is summarized in the table samples.csv, and we can construct the BAM file paths from this table. Please use the string manipulation function paste0() to construct the names of the sorted BAM files from the run column of the table, then check whether all of the files exist under the current working directory with file.exists().

Code Chunk1

## === Hint code, fill the "___" ============= ##
#samples <- read.csv(___)
#bamfiles <- paste0(___) #Paste the sample id with "_sort.bam"
#file.exists(bamfiles) #Check if the bam files exist or not
## ===== Enter your code below  =============== ##
library(GenomicAlignments) # provides the BAM reading utilities used throughout this practical
samples <- read.csv("samples.csv")
bamfiles <- paste0(samples$run, "_sort.bam")
file.exists(bamfiles)
## [1] TRUE TRUE TRUE TRUE

2. Extract the Longest Transcript of the Gene USF2

In the following analysis, we want to investigate the RNA-Seq fragment coverage on a transcript of the gene USF2. First, please extract the exon regions of the longest transcript of USF2 on hg19. The expected output of this step is a GRangesList object of length 1, which contains all exons of the transcript. Name the resulting variable usf2_longest.

To achieve this, you first need to use the select() function to find the transcript IDs and transcript names associated with the gene USF2. You can use keytypes(Homo.sapiens) and columns(Homo.sapiens) to check all of the ID conversion options available in the human OrganismDb package. The keys are the identifiers that can be converted from, and the columns are the identifiers that can be converted to.

USF2 is a gene symbol, so it should have the keytype of "SY___" in the OrganismDb object, and the corresponding columns should be c("TX___", "TX___"), so that we can retrieve both the transcript ID and the transcript name from the database.
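As a minimal sketch of how to inspect the database before filling in the blanks (these calls only print the available options):

library(Homo.sapiens)
keytypes(Homo.sapiens) # identifiers you can convert from; the gene symbol keytype is among them
columns(Homo.sapiens)  # identifiers you can convert to, including the transcript-related columns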

The output of exonsBy(txdb, by = "tx") is a GRangesList, in which each element corresponds to the exons of one transcript. You can obtain the number of ranges in each list element with elementNROWS().
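For a toy illustration of elementNROWS() (the list below is made up), consider a small IntegerList:

library(IRanges)
il <- IntegerList(a = 1:3, b = 1:5)
elementNROWS(il) # per-element lengths: a = 3, b = 5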

Code Chunk2

## === Hint code, fill the "___" ============= ##
# library(TxDb.___.UCSC.___.knownGene)
# library(Homo.sapiens)
# txdb <- TxDb.___.UCSC.___.knownGene
# g <- list()
# g[["USF2"]] <- select(Homo.sapiens, "USF2", ___ , ___)
# ebt <- exonsBy(txdb, by="tx")
# head(names(ebt)) #Check the structure of exonsByTranscript, the names are TXIDs
# usf2 <- ebt[ g[["USF2"]]$TXID ]
# usf2_longest <- usf2[which.max(___)] #Keep only the longest transcript among the isoforms
## ===== Enter your code below  =============== ##
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(Homo.sapiens)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
g <- list()
g[["USF2"]] <- select(Homo.sapiens, "USF2", c("TXID","TXNAME"), "SYMBOL")
ebt <- exonsBy(txdb, by="tx")
head(names(ebt))
## [1] "1" "2" "3" "4" "5" "6"
usf2 <- ebt[ g[["USF2"]]$TXID ] # subset the exons of the USF2 isoforms by transcript id
usf2_longest <- usf2[which.max(sum(width(usf2)))] # sum(width(.)) returns the total exonic length of each isoform

SAQ1

3. Load the BAM Files as Fragments on Transcript

Next, we want to write a function that takes as inputs a BAM file path and the GRangesList of a transcript, and returns a GRanges of the fragments “on the transcript coordinates” (i.e. with introns removed).

The first step in this function is to read the BAM file using readGAlignmentPairs(). At this step, we need to set the param= argument with ScanBamParam() to define the filters applied when reading the BAM file: the loaded reads should lie within the range of the target transcript, and all reads with mapping quality scores less than 30 will be discarded. Please see ?ScanBamParam on how to set the BAM filtering parameters.

Subsequently, we need to keep only the aligned reads that are compatible with the exons of the transcript. This can be achieved using the function findCompatibleOverlaps(). Finally, because the coordinates of the loaded reads are genomic, we need to convert them to transcript coordinates using the function map___Transcripts().
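As a minimal sketch of the coordinate conversion (the two exons below are made up for illustration), a genomic position inside the second exon maps to a transcript position with the intron removed:

library(GenomicFeatures)
toy_tx <- GRangesList(tx1 = GRanges("chr1", IRanges(c(101, 201), c(150, 250)), strand = "+"))
mapToTranscripts(GRanges("chr1", IRanges(210, 210)), toy_tx)
## position 210 maps to position 60 on tx1: 50 bases of exon 1 plus 10 bases into exon 2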

Please fill in the missing portions of the code chunk below to complete the function readTxFragments().

Code Chunk3

readTxFragments <- function(file, transcript) {
  r <- range(transcript[[1]]) # Get the range of the target transcript on the genome
  r <- keepStandardChromosomes(r) # Keep only the ranges on standard chromosomes
  
  
  # Read the BAM files with required filters, and find the compatible reads
  ## === Hint code, fill the "___" ============= ##
  # suppressWarnings({
  # gap <- readGAlignmentPairs(file, param=ScanBamParam(which = ___, mapqFilter = ___))
  # }) # suppress warnings about alignments with ambiguous pairing
  # fco <- findCompatibleOverlaps(___, ___)
  ## ===== Enter your code below  =============== ##
  suppressWarnings({
    gap <- readGAlignmentPairs(file, param=ScanBamParam(which=r, mapqFilter=30))
  })
  fco <- findCompatibleOverlaps(gap, transcript)
  ## ===== Your code is finished  =============== ##
  idx <- queryHits(fco) # get index for the compatible reads
  gr <- as(gap[idx],"GRanges") # subset the reads with index and convert it into GRanges
  strand(gr) <- "*" # set the strand to unstranded (*); this fills the insert and merges each read pair into 1 range.
  
  # Convert the paired-end reads into fragments on the exons by mapping the two ends of each pair to transcript coordinates.
  ## === Hint code, fill the "___" ============= ##
  # m2tx.start <- map___Transcripts(resize(___, width=1, fix="start"), ___)
  # m2tx.end <- map___Transcripts(resize(___, width=1, fix="end"), ___)
  ## ===== Enter your code below  =============== ##
  m2tx.start <- mapToTranscripts(resize(gr, width=1, fix="start"),
                                 transcript)
  m2tx.end <- mapToTranscripts(resize(gr, width=1, fix="end"),
                               transcript)
  ## ===== Your code is finished  =============== ##
  #Flip the start & end if the transcript is on the negative strand.
  tx.strand <- as.character(strand(transcript)[[1]][1])
  if (tx.strand == "+") {
    m2tx <- GRanges(seqnames(m2tx.start),
                    IRanges(start(m2tx.start),start(m2tx.end)))
  } else {
    m2tx <- GRanges(seqnames(m2tx.start),
                    IRanges(start(m2tx.end),start(m2tx.start)))
  }
  return(m2tx)
}

Fragment_list <- lapply(bamfiles, readTxFragments, transcript = usf2_longest)

SAQ2

4. Plot the Fragment Length Distribution

After obtaining the aligned fragments on the transcript, we want to compare the lengths of the fragments across samples. In the following code chunk, please manipulate Fragment_list with another lapply() call to calculate the fragment lengths for each sample. Then, run the following code to plot the fragment length distributions.

Code Chunk4

## === Hint code, fill the "___" ============= ##
#Fragment_len <- lapply(Fragment_list, ___)
## ===== Enter your code below  ============== ##
Fragment_len <- lapply(Fragment_list, width)
## ===== Your code is finished  =============== ##
plot_df <- data.frame( fragment_length = unlist( Fragment_len ),
                   sample = rep(samples$run, elementNROWS(Fragment_list)),
                   batch = rep(paste0("center ",samples$center), elementNROWS(Fragment_list)) )

library(ggplot2)
library(ggsci) #A package for good looking colours

ggplot(plot_df, aes(x = fragment_length, color = sample, linetype = batch)) + geom_density() +
  theme_classic() + scale_color_npg() + labs(x = "Fragment Lengths")

Next, please calculate the average fragment length for each sample (BAM file), and store the resulting vector in a variable named average_frag_len.

Code Chunk5, Mark: 3%

## === Hint code, fill the "___" ============= ##
# average_frag_len <- sapply(Fragment_len, ___ )
## ===== Enter your code below  ============== ##
average_frag_len <- sapply(Fragment_len, mean) # average the per-fragment widths computed above

SAQ3, Mark: 8%

P.S. Please see this link to clarify the differences between fragment, read, insert, and inner distance.

The definition of a fragment in our project actually corresponds to the ‘insert’ region mentioned in the link above. The terms insert and fragment are often used interchangeably, because the adapters must be removed before the reads can be aligned to the genome. However, we should keep in mind that the “full” fragments in NGS do carry adapter sequences at their ends.

5. Check the Fragment Coverage on the Transcript

In the next step, we want to visualize the “coverage” of the sequencing fragments on the target transcript. Coverage can be understood as the height obtained by stacking all of the features along the coordinate axis. In Bioconductor, we can compute the coverage of a GRanges object with the function coverage().
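As a toy sketch of coverage() on three overlapping ranges (the ranges below are made up):

library(GenomicRanges)
toy <- GRanges("chr1", IRanges(start = c(1, 3, 5), end = c(6, 8, 10)))
coverage(toy)
## an RleList; on chr1 the run values are 1 2 3 2 1 with run lengths 2 2 2 2 2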

Again, for each list element of Fragment_list, compute the coverage and convert it into an integer vector of counts. Store the coverage list in a variable named coverage_lst.

Code Chunk6

## === Hint code, fill the "___" ============= ##
# coverage_lst <- lapply(___, function(x) as.vector(___[[1]]))
## === Enter your code below ================= ##
coverage_lst <- lapply(Fragment_list, function(x) as.vector(coverage(x)[[1]]))
## === Your code is finished  ================ ##
plot_df <- data.frame( tx_position = unlist(lapply(coverage_lst,seq_along)),
                       coverage = unlist( coverage_lst ),
                       sample = rep(samples$run, elementNROWS(coverage_lst)),
                       batch = rep(paste0("center ",samples$center), elementNROWS(coverage_lst)) )

ggplot(plot_df, aes(x = tx_position, y = coverage, color = sample, linetype = batch)) + geom_line() +
  theme_classic() + scale_color_npg() + labs(x = "transcript coordinate")

SAQ4

6. Explore Fragment-Level GC Content Distributions

Using the Fragment_list generated before, extract the fragment GC contents on the longest transcript for the 4 samples. The sequence of a transcript can be retrieved with extractTranscriptSeqs(). Then, apply Views() to extract the sequences underneath the fragment ranges. Store the GC contents of each list element in a variable named GC_list.
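As a toy sketch of the sequence extraction (the 10-mer below is made up): Views() slices a sequence without copying it, and letterFrequency() then computes the GC proportion of each view.

library(Biostrings)
toy_seq <- DNAString("ACGTACGTAC")
v <- Views(toy_seq, start = c(1, 5), end = c(4, 8)) # views "ACGT" and "ACGT"
letterFrequency(DNAStringSet(v), "GC", as.prob = TRUE) # 0.5 for both views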

Code Chunk7

## === Hint code, fill the "___" ============= ##
#library(BSgenome.___.UCSC.___)
#usf2.seq <- extractTranscriptSeqs(___,___)[[1]]
#GC_list <- lapply(Fragment_list,function(x) ___(___(Views(___, ranges(x)) ), ___, as.prob=___))
## === Enter your code below ================= ##
library(BSgenome.Hsapiens.UCSC.hg19)
usf2.seq <- extractTranscriptSeqs(Hsapiens, usf2_longest)[[1]]
GC_list <- lapply(Fragment_list, function(x) letterFrequency(DNAStringSet( Views(usf2.seq, ranges(x)) ), "GC", as.prob=TRUE))
## === Your code is finished  ================ ##
plot_df <- data.frame( GC_content = unlist( GC_list ),
                       sample = rep(samples$run, elementNROWS(GC_list)),
                       batch = rep(paste0("center ",samples$center), elementNROWS(GC_list)) )

ggplot(plot_df, aes(x = GC_content, color = sample, linetype = batch)) + geom_density() +
  theme_classic() + scale_color_npg() + labs(x = "Fragment GC content")

SAQ5, Mark: 8%

7. Fit the Linear Relationships Between GC and Fragment Coverages

In the previous question, we explored the “marginal distributions” of fragment GC content among samples. Next, we will investigate the “joint distribution” between GC content and coverage. If the model of the joint distribution is good enough, we can estimate “offsets” for the coverage, which can then be used to correct the GC content bias.

To model the relationship between GC content and coverage, we first need to identify all of the potential positions of the sequencing fragments, because many fragments that could potentially be generated have 0 counts in our data. The following code chunk will help us generate the coverage of the potential fragments.

You are required to fill in the central step of this procedure, which is to use the function slidingWindows() to generate the potential fragments on the coordinates of the target transcript. Set the window size to the average fragment length calculated for each sample, and set the step to 1L (move one base at a time).
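A toy sketch of slidingWindows() on a made-up 6-bp range: with width 3 and step 1L, the windows advance one base at a time.

library(GenomicRanges)
toy_rng <- GRanges("tx1", IRanges(1, 6))
slidingWindows(toy_rng, width = 3L, step = 1L)[[1]]
## four windows: 1-3, 2-4, 3-5, 4-6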

Code Chunk8

#Convert each fragment into its two end positions (POS) and compute the coverage of the ends
POS <- lapply(Fragment_list, function(x) c(resize(x,1,fix = "start"),
                                           resize(x,1,fix = "end")) )

POS_coverage <- lapply(POS, function(x) as.numeric(coverage(x)[[1]]))

#Get the range of the transcript on its own coordinate
usf2_tx <- range(mapToTranscripts(
                     unlist(usf2_longest),
                            usf2_longest)) 

#Find potential fragments for each sample 

## === Hint code, fill the "___" ============= ##
#PotentialFragments <- lapply( round(average_frag_len),
#                             function(x) slidingWindows(___,___,___)[[1]] )
## === Enter your code below ================= ##
PotentialFragments <- lapply( round(average_frag_len),
                             function(x) slidingWindows(usf2_tx,x,1L)[[1]] )
## === Your code is finished  ================ ##

#Count both ends of the potential fragments
Fragment_counts <- mapply(function(x,y) y[start(x)]+y[end(x)],
                             PotentialFragments,
                             POS_coverage) 

#Calculate GC on potential fragments
GC_listp <- lapply(PotentialFragments,
                   function(x) letterFrequency(DNAStringSet( 
                                  Views(usf2.seq, ranges(x)) ), 
                                        "GC", as.prob=TRUE))

#Build a data.frame for plotting
plot_df <- data.frame( GC_content = unlist( GC_listp ),
                       Fragment_count = unlist(Fragment_counts),
                       sample = rep(samples$run, elementNROWS(GC_listp)),
                       batch = rep(paste0("center ",samples$center),
                                   elementNROWS(GC_listp)) )

Then, we will fit an ordinary linear regression model using natural cubic splines with 5 degrees of freedom. The x and y variables are the GC content and the end coverage of the potential fragments, respectively.
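To see what the spline term does, here is a small sketch (the x values are made up): splines::ns(x, 5) expands x into a 5-column natural cubic spline basis, so lm() can fit a smooth curve while remaining a linear model.

library(splines)
x <- seq(0, 1, length.out = 10)
dim(ns(x, df = 5)) # a 10 x 5 basis matrix; lm() estimates one coefficient per column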

Gaussian Regression:

## define the smooth function for ordinary linear regression
lm_smooth <- function(...) {
  geom_smooth(method = "lm", ...)
}

ggplot(plot_df, aes(x = GC_content, y = Fragment_count,
                    color = sample, linetype = batch)) + 
        lm_smooth(formula = y ~ splines::ns(x, 5)) +
        theme_classic() + scale_color_npg() + 
        labs(x = "Fragment GC content", 
             y = "Fragment count",
             title = "Gaussian Regression")

We have now obtained the fitted curves between GC content and coverage for the different samples. Subsequently, we will try an alternative model called Poisson regression. Please fill out the function poisson_smooth() so that ggplot2 can fit a Poisson regression on our data.

P.S. Poisson regression is a type of generalized linear model (GLM). In R, a Poisson GLM can be fitted with the function glm() and family = “poisson”. The difference between Poisson regression and ordinary linear regression is that Poisson regression treats the noise as Poisson distributed, while ordinary linear regression treats the noise as zero-centered normal distributions.
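As a toy sketch of a Poisson GLM on simulated counts (the data below are made up, not from our experiment):

set.seed(1)
toy <- data.frame(x = runif(100), y = rpois(100, lambda = 5))
fit <- glm(y ~ x, data = toy, family = poisson)
coef(fit) # the coefficients are on the log (link) scale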

Poisson Regression:

Code Chunk9

poisson_smooth <- function(...) {
## === Hint code, fill the "___" ============= ##
#geom_smooth(method = ___, method.args = list(family = ___), ...)
## === Enter your code below ================= ##
geom_smooth(method = "glm", method.args = list(family = "poisson"), ...)
## === Your code is finished  ================ ##  
}

ggplot(plot_df, aes(x = GC_content, y = Fragment_count,
                    color = sample, linetype = batch)) +  
        poisson_smooth(formula = y ~ splines::ns(x, 5)) +
        theme_classic() + scale_color_npg() + 
        labs(x = "Fragment GC content",
             y = "Fragment count",
             title = "Poisson Regression")

SAQ6

  • SAQ6: The grey areas around the curves indicate the confidence intervals for the estimates of the conditional expectations (the regression curves). In your opinion, which model fits better, and why? Can you give some reasons behind the model choice? (Please think about the nature of the read coverage data type.)

  • Answer: The Poisson regression fits better because its parameter estimates have smaller standard errors (narrower confidence intervals). We choose the Poisson model because the response variable in our model is the read coverage in NGS. Read coverages are by nature small count values that exhibit Poisson-distributed shot noise.

8. Correct the Fragment Coverage by Removing the Linear GC Content Effects

Using your selected model, please complete the following code to obtain the predicted values of read coverage given fragment GC content. The fitted values will then be used as offsets to normalize the sequencing data. Hopefully, this measure can remove the technical errors induced by GC content.
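One detail worth remembering: predict() on a Poisson GLM returns values on the log (link) scale by default, which is why the offsets are exponentiated below. A quick sketch with simulated data (the toy data frame is made up):

set.seed(1)
toy <- data.frame(x = 1:20, y = rpois(20, lambda = 10))
fit <- glm(y ~ x, data = toy, family = poisson)
head(predict(fit))                     # log (link) scale, the scale of our offsets
head(predict(fit, type = "response"))  # count scale, equal to exp() of the above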

Code Chunk10

## === Hint code, fill the "___" ============= ##
#offsets <- mapply(function(x,y){predict(___(y~splines::ns(x,5), 
#                  ___ = ___),data.frame(x))},
#                  GC_listp, Fragment_counts) #mapply() is an iterative functional with 2 inputs, see ?mapply
## === Enter your code below ================= ##
offsets <- mapply(function(x,y){predict(glm(y~splines::ns(x,5), 
                  family = poisson),data.frame(x))},
                  GC_listp, Fragment_counts) 
## === Your code is finished  ================ ##

#The read coverage is normalized by dividing the original fragment counts by the exponentiated offsets; the exponentiation is necessary because the predictions of the Poisson GLM are made on the log (link) scale.
normalized_counts <- mapply(function(x,y) x/exp(y), Fragment_counts, offsets)


library(zoo)


#The normalized coverage is calculated by the rolling sums (rollsum) of the normalized counts, while the ends are computed by cumulative sums (cumsum).
normalized_coverages <- mapply(function(x,y) 
                                c(cumsum(x[1:y]),rollsum(x,y),
                                  rev(cumsum(x[length(x):(length(x)-y)])))/2,
                               normalized_counts,
                               round(average_frag_len)) # rollsum() expects an integer window size
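# (Aside, a toy sketch of the rolling sum used above, with made-up numbers:
#  rollsum(1:6, 3) returns 6 9 12 15, i.e. the sums of 1:3, 2:4, 3:5, and 4:6.)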

#Finally, plot the normalized read coverage with ggplot2.
plot_df <- data.frame( tx_position = unlist(lapply(normalized_coverages,seq_along)),
                       coverage = unlist( normalized_coverages  ),
                       sample = rep(samples$run, elementNROWS(normalized_coverages)),
                       batch = rep(paste0("center ",samples$center), elementNROWS(normalized_coverages)) )

ggplot(plot_df, aes(x = tx_position, y = coverage, color = sample, linetype = batch)) + geom_line() +
  theme_classic() + scale_color_npg() + labs(x = "transcript coordinate", y = "normalized coverage")

SAQ7

Session Info

sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] zoo_1.8-9                              
##  [2] BSgenome.Hsapiens.UCSC.hg19_1.4.3      
##  [3] BSgenome_1.57.0                        
##  [4] rtracklayer_1.49.1                     
##  [5] ggsci_2.9                              
##  [6] ggplot2_3.3.0                          
##  [7] Homo.sapiens_1.3.1                     
##  [8] org.Hs.eg.db_3.11.1                    
##  [9] GO.db_3.11.1                           
## [10] OrganismDbi_1.31.0                     
## [11] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
## [12] GenomicFeatures_1.41.0                 
## [13] AnnotationDbi_1.51.0                   
## [14] GenomicAlignments_1.25.0               
## [15] Rsamtools_2.5.0                        
## [16] Biostrings_2.57.0                      
## [17] XVector_0.29.0                         
## [18] SummarizedExperiment_1.19.2            
## [19] DelayedArray_0.15.1                    
## [20] matrixStats_0.56.0                     
## [21] Biobase_2.49.0                         
## [22] GenomicRanges_1.41.1                   
## [23] GenomeInfoDb_1.25.0                    
## [24] IRanges_2.23.4                         
## [25] S4Vectors_0.27.5                       
## [26] BiocGenerics_0.35.2                    
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-149           bitops_1.0-6           bit64_0.9-7           
##  [4] progress_1.2.2         httr_1.4.1             tools_4.0.3           
##  [7] R6_2.4.1               DBI_1.1.0              mgcv_1.8-33           
## [10] colorspace_1.4-1       withr_2.2.0            tidyselect_1.1.0      
## [13] prettyunits_1.1.1      bit_1.1-15.2           curl_4.3              
## [16] compiler_4.0.3         graph_1.67.0           cli_2.0.2             
## [19] labeling_0.3           scales_1.1.1           RBGL_1.65.0           
## [22] askpass_1.1            rappdirs_0.3.1         stringr_1.4.0         
## [25] digest_0.6.25          rmarkdown_2.1          pkgconfig_2.0.3       
## [28] htmltools_0.4.0        dbplyr_1.4.3           rlang_1.0.5           
## [31] rstudioapi_0.11        RSQLite_2.2.0          farver_2.0.3          
## [34] BiocParallel_1.23.0    dplyr_0.8.5            RCurl_1.98-1.2        
## [37] magrittr_1.5           GenomeInfoDbData_1.2.3 Matrix_1.4-1          
## [40] Rcpp_1.0.9             munsell_0.5.0          fansi_0.4.1           
## [43] lifecycle_0.2.0        stringi_1.4.6          yaml_2.2.1            
## [46] zlibbioc_1.35.0        BiocFileCache_1.13.0   grid_4.0.3            
## [49] blob_1.2.1             crayon_1.3.4           lattice_0.20-41       
## [52] splines_4.0.3          hms_0.5.3              knitr_1.28            
## [55] pillar_1.4.4           biomaRt_2.45.0         XML_3.99-0.3          
## [58] glue_1.4.0             evaluate_0.14          BiocManager_1.30.10   
## [61] vctrs_0.3.0            gtable_0.3.0           openssl_1.4.1         
## [64] purrr_0.3.4            assertthat_0.2.1       xfun_0.13             
## [67] tibble_3.0.1           memoise_1.1.0          ellipsis_0.3.0