The muscData package contains a set of publicly available single-cell RNA sequencing (scRNA-seq) datasets with complex experimental designs, i.e., datasets that contain multiple samples (e.g., individuals) measured across multiple experimental conditions (e.g., treatments), formatted into SingleCellExperiment (SCE) Bioconductor objects. Data objects are hosted through Bioconductor’s ExperimentHub web resource.
The table below gives an overview of currently available datasets, including a unique identifier (ID) that can be used to load the data (see next section), a brief description, the original data source, and a reference. Dataset descriptions may also be viewed from within R via ?ID (e.g., ?Kang18_8vs8).
ID | Description | Availability | Reference |
---|---|---|---|
Kang18_8vs8 | 10x droplet-based scRNA-seq PBMC data from 8 Lupus patients before and after 6h-treatment with INF-beta (16 samples in total) | Gene Expression Omnibus (GEO) accession GSE96583 | Kang et al. (2018) |
Crowell19_4vs4 | Single-nuclei RNA-seq data of 8 CD-1 male mice, split into 2 groups with 4 animals each: vehicle and peripherally lipopolysaccharide (LPS) treated mice | Figshare DOI:10.6084/m9.figshare.8976473.v1 | Crowell et al. (2019) |
All datasets available within muscData may be loaded either via named functions that directly refer to the object names, or by using the ExperimentHub interface. Both methods are demonstrated below.
The datasets listed above may be loaded into R by their ID. All provided SCEs contain unfiltered raw counts in their assay slot, and any available gene and cell metadata in the rowData and colData slots, respectively.
library(muscData)
## Warning: replacing previous import 'utils::findMatches' by
## 'S4Vectors::findMatches' when loading 'AnnotationDbi'
Kang18_8vs8()
## class: SingleCellExperiment
## dim: 35635 29065
## metadata(0):
## assays(1): counts
## rownames(35635): MIR1302-10 FAM138A ... MT-ND6 MT-CYB
## rowData names(2): ENSEMBL SYMBOL
## colnames(29065): AAACATACAATGCC-1 AAACATACATTTCC-1 ... TTTGCATGGTTTGG-1
## TTTGCATGTCTTAC-1
## colData names(5): ind stim cluster cell multiplets
## reducedDimNames(1): TSNE
## mainExpName: NULL
## altExpNames(0):
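As a minimal sketch of how the slots described above might be inspected, the snippet below loads the Kang et al. dataset and pulls out counts and metadata using standard SingleCellExperiment accessors. The variable names sce and n_cells are illustrative only; the column names ind and stim are taken from the printed colData names above.

```r
library(muscData)
library(SingleCellExperiment)

# load the dataset (downloaded and cached on first use)
sce <- Kang18_8vs8()

# raw counts for the first 5 genes and 3 cells
counts(sce)[1:5, 1:3]

# gene-level and cell-level metadata
head(rowData(sce))
head(colData(sce))

# number of cells per individual (rows) and stimulation condition (columns)
n_cells <- table(sce$ind, sce$stim)
n_cells
```

Tabulating cells by sample and condition in this way is a quick check that the multi-sample, multi-condition design is represented as expected before any filtering.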
ExperimentHub
Besides using an accession function as demonstrated above, we can browse ExperimentHub records (using query) or package-specific records (using listResources), and then load the data of interest. The key difference between these approaches is that query will search all of ExperimentHub, while listResources facilitates data discovery within the specified package (here, muscData).
query
We first initialize a Hub instance with the ExperimentHub function, which lets us search for and load available data, and store the complete list of >2000 records in a variable eh. Using query, we then identify any records made available by muscData, as well as their accession IDs (of the form EH1234). Finally, we can load the data into R via eh[[id]].
# create Hub instance
library(ExperimentHub)
eh <- ExperimentHub()
(q <- query(eh, "muscData"))
## ExperimentHub with 2 records
## # snapshotDate(): 2023-04-24
## # $dataprovider: GEO, F. Hoffmann-La Roche Ltd.
## # $species: Mus musculus, Homo sapiens
## # $rdataclass: SingleCellExperiment
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## # rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["EH2259"]]'
##
## title
## EH2259 | Kang18_8vs8
## EH3297 | Crowell19_4vs4
# load data via accession ID
eh[["EH2259"]]
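Rather than copying the accession ID by hand, it could also be looked up from the query result by record title. The following is a sketch under the assumption that exactly one record carries the title of interest; the variable names id and sce are illustrative.

```r
library(ExperimentHub)

# create Hub instance and search for muscData records
eh <- ExperimentHub()
q <- query(eh, "muscData")

# accession ID of the record titled "Kang18_8vs8"
# (assumes a single matching record)
id <- names(q)[q$title == "Kang18_8vs8"]
id

# load the corresponding dataset
sce <- eh[[id]]
```

Looking the ID up by title keeps the code readable and avoids hard-coding an identifier that carries no meaning on its own.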
list/loadResources
Alternatively, available records may be viewed via listResources. To then load a specific dataset or subset thereof using loadResources, we require a character vector of metadata search terms to filter by.
Available metadata can be accessed from the ExperimentHub records found by query via mcols(), or viewed using the accessor functions shown above with the option metadata = TRUE. In the example below, we use "PBMC" and "INF-beta" to select the Kang18_8vs8 dataset. However, note that any metadata keyword(s) that uniquely identify the data of interest could be used (e.g., "Lupus" or "GSE96583").
listResources(eh, "muscData")
## [1] "Kang18_8vs8" "Crowell19_4vs4"
# view metadata
mcols(q)
Kang18_8vs8(metadata = TRUE)
# load data using metadata search terms
loadResources(eh, "muscData", c("PBMC", "INF-beta"))
The scater (McCarthy et al. 2017) package provides an easy-to-use set of visualization tools for scRNA-seq data.
For interactive visualization, we recommend the iSEE (interactive SummarizedExperiment Explorer) package (Rue-Albrecht et al. 2018), which provides a Shiny-based graphical user interface for exploration of single-cell data in SummarizedExperiment format (installation instructions and user guides are available from the package's Bioconductor landing page).
When available, a great tool for interactive exploration and comparison of dimension-reduced embeddings is sleepwalk (Ovchinnikova and Anders 2019).
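As a rough illustration of the tools mentioned above, the sketch below plots the precomputed t-SNE stored in the Kang et al. dataset with scater and, in an interactive session, opens the same object in iSEE. It assumes that scater (and optionally iSEE) is installed; the names TSNE and cell are taken from the reducedDimNames and colData names printed earlier, while sce, p, and app are illustrative variable names.

```r
library(muscData)
library(scater)

sce <- Kang18_8vs8()

# plot the precomputed t-SNE embedding, colored by the 'cell' annotation
p <- plotReducedDim(sce, dimred = "TSNE", colour_by = "cell")
p

# optionally explore the object in iSEE
# (opens a Shiny app, so only run in interactive sessions)
if (interactive() && requireNamespace("iSEE", quietly = TRUE)) {
  app <- iSEE::iSEE(sce)
  shiny::runApp(app)
}
```

Because the embedding ships with the data, no normalization or dimension reduction is needed for this first look; any downstream analysis would still start from the unfiltered raw counts.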
sessionInfo()
## R version 4.3.0 RC (2023-04-13 r84269)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] muscData_1.14.0 SingleCellExperiment_1.22.0
## [3] SummarizedExperiment_1.30.0 Biobase_2.60.0
## [5] GenomicRanges_1.52.0 GenomeInfoDb_1.36.0
## [7] IRanges_2.34.0 S4Vectors_0.38.0
## [9] MatrixGenerics_1.12.0 matrixStats_0.63.0
## [11] ExperimentHub_2.8.0 AnnotationHub_3.8.0
## [13] BiocFileCache_2.8.0 dbplyr_2.3.2
## [15] BiocGenerics_0.46.0 BiocStyle_2.28.0
##
## loaded via a namespace (and not attached):
## [1] KEGGREST_1.40.0 xfun_0.39
## [3] bslib_0.4.2 lattice_0.21-8
## [5] bitops_1.0-7 vctrs_0.6.2
## [7] tools_4.3.0 generics_0.1.3
## [9] curl_5.0.0 tibble_3.2.1
## [11] fansi_1.0.4 AnnotationDbi_1.62.0
## [13] RSQLite_2.3.1 blob_1.2.4
## [15] pkgconfig_2.0.3 Matrix_1.5-4
## [17] GenomeInfoDbData_1.2.10 lifecycle_1.0.3
## [19] compiler_4.3.0 Biostrings_2.68.0
## [21] httpuv_1.6.9 htmltools_0.5.5
## [23] sass_0.4.5 RCurl_1.98-1.12
## [25] yaml_2.3.7 interactiveDisplayBase_1.38.0
## [27] pillar_1.9.0 later_1.3.0
## [29] crayon_1.5.2 jquerylib_0.1.4
## [31] ellipsis_0.3.2 DelayedArray_0.26.0
## [33] cachem_1.0.7 mime_0.12
## [35] tidyselect_1.2.0 digest_0.6.31
## [37] purrr_1.0.1 dplyr_1.1.2
## [39] bookdown_0.33 BiocVersion_3.17.1
## [41] grid_4.3.0 fastmap_1.1.1
## [43] cli_3.6.1 magrittr_2.0.3
## [45] utf8_1.2.3 withr_2.5.0
## [47] filelock_1.0.2 promises_1.2.0.1
## [49] rappdirs_0.3.3 bit64_4.0.5
## [51] rmarkdown_2.21 XVector_0.40.0
## [53] httr_1.4.5 bit_4.0.5
## [55] png_0.1-8 memoise_2.0.1
## [57] shiny_1.7.4 evaluate_0.20
## [59] knitr_1.42 rlang_1.1.0
## [61] Rcpp_1.0.10 xtable_1.8-4
## [63] glue_1.6.2 DBI_1.1.3
## [65] BiocManager_1.30.20 jsonlite_1.8.4
## [67] R6_2.5.1 zlibbioc_1.46.0
Crowell, Helena L, Charlotte Soneson, Pierre-Luc Germain, Daniela Calini, Ludovic Collin, Catarina Raposo, Dheeraj Malhotra, and Mark D Robinson. 2019. “On the Discovery of Population-Specific State Transitions from Multi-Sample Multi-Condition Single-Cell RNA Sequencing Data.” bioRxiv 713412.
Kang, Hyun Min, Meena Subramaniam, Sasha Targ, Michelle Nguyen, Lenka Maliskova, Elizabeth McCarthy, Eunice Wan, et al. 2018. “Multiplexed Droplet Single-Cell RNA-Sequencing Using Natural Genetic Variation.” Nat Biotechnol 36 (1): 89–94. https://doi.org/10.1038/nbt.4042.
McCarthy, Davis J, Kieran R Campbell, Quin F Wills, and Aaron T L Lun. 2017. “Scater: Pre-Processing, Quality Control, Normalization and Visualization of Single-Cell RNA-Seq Data in R.” Bioinformatics 33 (8): 1179–86. https://doi.org/10.1093/bioinformatics/btw777.
Ovchinnikova, Svetlana, and Simon Anders. 2019. “Exploring Dimension-Reduced Embeddings with Sleepwalk.” bioRxiv. https://doi.org/10.1101/603589.
Rue-Albrecht, Kévin, Federico Marini, Charlotte Soneson, and Aaron T L Lun. 2018. “iSEE: Interactive SummarizedExperiment Explorer.” F1000Res 7: 741. https://doi.org/10.12688/f1000research.14966.1.