This package provides four datasets from 10x Genomics in the barcode, UMI, and set (BUS) format. The datasets are listed by listResources() below.

The original FASTQ files have already been processed into the BUS format, which is a table with the following columns: barcode, UMI, equivalence class (set), and count (i.e., the number of reads with the same barcode, UMI, and set). The datasets have been uploaded to ExperimentHub. This vignette demonstrates how to download the first of these datasets with this package. See the BUSpaRse website for more detailed vignettes.
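To make the table structure concrete, here is a minimal base-R sketch that reads a BUS table into a data frame. It assumes the plain-text version of the BUS file (output.sorted.txt in the download) is tab-separated, has no header, and stores the four columns in the order described above. read_bus_text is a hypothetical helper for illustration, not a function exported by TENxBUSData or BUSpaRse.

```r
# Hypothetical helper (not part of TENxBUSData or BUSpaRse).
# Assumes one record per line with four tab-separated columns in the order
# barcode, UMI, equivalence class (set), count, and no header line.
read_bus_text <- function(path) {
  utils::read.table(
    path,
    sep = "\t",
    header = FALSE,
    col.names = c("barcode", "UMI", "ec", "count"),
    # Keep barcodes and UMIs as character so no type conversion is attempted
    colClasses = c("character", "character", "integer", "integer")
  )
}
```

After downloading, a call such as `bus <- read_bus_text("./out_hgmm100/output.sorted.txt")` followed by `head(bus)` should show one row per barcode, UMI, and set, if the file follows the assumed layout.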

library(TENxBUSData)
library(ExperimentHub)
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#>     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#>     pmin.int, rank, rbind, rownames, sapply, saveRDS, setdiff, table,
#>     tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: AnnotationHub
#> Loading required package: BiocFileCache
#> Loading required package: dbplyr

See which datasets are available with this package.

eh <- ExperimentHub()
listResources(eh, "TENxBUSData")
#> [1] "100 1:1 Mixture of Fresh Frozen Human (HEK293T) and Mouse (NIH3T3) Cells"              
#> [2] "1k 1:1 Mixture of Fresh Frozen Human (HEK293T) and Mouse (NIH3T3) Cells (v3 chemistry)"
#> [3] "1k PBMCs from a Healthy Donor (v3 chemistry)"                                          
#> [4] "10k Brain Cells from an E18 Mouse (v3 chemistry)"

In this vignette, we download the 100-cell dataset (hgmm100). Setting force = TRUE re-downloads the files even if they are already present.

# Download into the current directory; the files end up in ./out_hgmm100
TENxBUSData(".", dataset = "hgmm100", force = TRUE)
#> see ?TENxBUSData and browseVignettes('TENxBUSData') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
#> The downloaded files are in /tmp/RtmpeooQUu/Rbuild2013b030771b27/TENxBUSData/vignettes/out_hgmm100
#> [1] "/tmp/RtmpeooQUu/Rbuild2013b030771b27/TENxBUSData/vignettes/out_hgmm100"

Which files were downloaded?

list.files("./out_hgmm100")
#> [1] "matrix.ec"         "output.sorted"     "output.sorted.txt"
#> [4] "transcripts.txt"
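The files matrix.ec and transcripts.txt together describe which transcripts each equivalence class in the BUS table refers to. The following sketch maps each equivalence class to transcript names, assuming the kallisto layout: each line of matrix.ec is an equivalence class index, a tab, then comma-separated 0-based transcript indices, and transcripts.txt lists one transcript name per line in that index order. read_ec_map is a hypothetical helper, not part of TENxBUSData or BUSpaRse.

```r
# Hypothetical helper: returns a named list, one element per equivalence
# class, holding the names of the transcripts in that class.
# Assumes matrix.ec lines look like "<ec>\t<t1>,<t2>,..." with 0-based
# transcript indices into transcripts.txt.
read_ec_map <- function(ec_path, tx_path) {
  tx <- readLines(tx_path)
  fields <- strsplit(readLines(ec_path), "\t", fixed = TRUE)
  ec_ids <- vapply(fields, function(f) f[1], character(1))
  tx_idx <- lapply(fields, function(f) {
    as.integer(strsplit(f[2], ",", fixed = TRUE)[[1]])
  })
  # Shift the 0-based indices to R's 1-based indexing
  ec_map <- lapply(tx_idx, function(i) tx[i + 1L])
  names(ec_map) <- ec_ids
  ec_map
}
```

For this download, the call would be `read_ec_map("./out_hgmm100/matrix.ec", "./out_hgmm100/transcripts.txt")`.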

The downloaded files listed above should be sufficient to construct a sparse matrix with the BUSpaRse package.
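As a rough illustration of the idea (not BUSpaRse's method), the sketch below builds an equivalence-class-by-barcode matrix of UMI counts with Matrix::sparseMatrix. It assumes the BUS table is already in a data frame with columns barcode, UMI, ec, and count, for example the text version of output.sorted.txt read with read.table. It skips collapsing equivalence classes to genes and any UMI correction, so it is not the gene count matrix BUSpaRse produces. bus_to_ec_matrix is a hypothetical name.

```r
library(Matrix)

# Hypothetical sketch: rows are equivalence classes, columns are barcodes,
# and each entry is the number of distinct UMIs seen for that pair.
bus_to_ec_matrix <- function(bus) {
  # One row per distinct (barcode, UMI, ec), so each UMI is counted once
  # regardless of how many reads support it
  umis <- unique(bus[, c("barcode", "UMI", "ec")])
  bc <- factor(umis$barcode)
  ec <- factor(umis$ec)
  # sparseMatrix() adds up x values that share the same (i, j), so each
  # cell ends up holding the number of distinct UMIs for that pair
  sparseMatrix(
    i = as.integer(ec),
    j = as.integer(bc),
    x = 1,
    dims = c(nlevels(ec), nlevels(bc)),
    dimnames = list(levels(ec), levels(bc))
  )
}
```

Putting equivalence classes in rows and barcodes in columns follows the usual Bioconductor convention of features by cells.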