omicplotR
is an R package containing a
Shiny
app used to visually explore omic datasets, where the
input is a table of read counts from high-throughput sequencing runs. It
integrates the ALDEx2
1 package for compositional analysis of
differential abundance. omicplotR
is intended facilitate
exploring high-throughput sequencing datasets by providing a graphical
user interface for users with and without experience in R.
High-throughput sequencing (HTS) instruments generate an amount of
reads that is constrained by limitations of the sequencing instrument
itself, and do not represent the absolute number of DNA molecules in a
sample. For example, an Illumina NextSeq can deliver up to 400 million
single-end reads, whereas an Illumina MiSeq2 can only deliver up to 15
million single-end reads2. This type of data, which is constrained by
an arbitrary or constant sum, is referred to as compositional data, and
high-throughput sequencing data must be treated as such3. See
ALDEx2
for more information.
Although several R packages exist for exploring high-throughput
sequencing data, they are typically command line based, which presents a
barrier for users without any significant command line or scripting
experience. omicplotR
was created to facilitate the
exploratory phase of high-throughput sequencing data analysis allowing
the generation of basic exploratory plots automatically with adjustable
features and filters.
This vignette provides an overview of the R package
omicplotR
and the input requirements. A tutorial for each
component of the Shiny
app is available on the wiki: https://github.com/dgiguer/omicplotR/wiki.
omicplotR
was developed for several types of HTS datasets
including RNASeq, meta-RNASeq, and 16s rRNA gene sequencing, and in
principle, can be used for nearly any type of data generated by HTS that
contains a tables of counts per feature for each sample.
omicplotR
provides a graphical user interface using the
Shiny
package for the following visualizations for HTS
data:
Additional features include:
ALDEx2
ALDEx2
tables and colour points by
rownames for large datasetsInstall the latest version of omicplotR
using
BiocManager
. Make sure you have the newest version of R,
ALDEx2
, and other dependancies. omicplotR
requires you to have at least R version 3.5. The most up to date version
is available at www.github.com/dgiguer/omicplotr/, and is the dev
branch.
First, load the omicplotR
package. All other
dependencies will be loaded automatically. This will launch the
Shiny
app in your default browser. For this vignette, we
will be using the example data and metadata provided. Example data and
metadata are accessible by data(otu_table)
and
data(metadata)
. They are also available as .txt files in
~/omicplotR/shiny-app/
.
install.packages("BiocManager")
BiocManager::install("omicplotR")
library(omicplotR)
omicplotr.run()
After launching the Shiny
app, click the ‘Input data’
tab to get started.
The ‘Data’ tab on the sidebar panel allows you to choose your own data and metadata by clicking ‘Browse’. To follow along with this vignette, please click the ‘Example data’ tab on the sidebar panel, and click the checkbox for the ‘Vaginal dataset’. This dataset, which includes associated metadata, is from a study that characterized the changes in the vaginal microbiome following antibiotic and probiotic treatment by 16s rRNA gene sequencing4. Return to the ‘Data’ tab on the sidebar panel to view the data and metadata by clicking ‘Show data’ and ‘Show metadata’. The tabs on the main panel allow you to switch between displaying your data and metadata tables.
When choosing your own data set, input requirements are as follows: for both metadata and data, each sample and feature name (operational taxonomic unit - OTU) must be unique. An example of an appropriately formatted data file is shown in Figure 2.
Your metadata file must follow a similar format. An example of an appropriate metadata file is shown in Figure 3.
The ‘Example data’ tab on the sidebar panel provides access to two example datasets. We will be using the provided ‘Vaginal dataset’, which contains both an OTU table and associated metadata. The ‘Selex dataset’ is from a selective growth experiment giving the differential abundance of 1600 enzyme variants5. After selecting the ‘Vaginal dataset’, return to the ‘Data’ tab to and click ‘Show data’ to view the data. You can view the metadata by clicking ‘Show metadata’ and switching the tab in the main panel to ‘Metadata’. Click the ‘PCA Biplots’ main tab to proceed.
The ‘Filtering’ tab within the sidebar panel allows you to choose filtering options for your dataset. Colouring options for a coloured PCA biplot are available under the ‘Colouring options’ tab. The tabs within the main panel allow you to switch between displaying a biplot under ‘Biplot’, a biplot coloured by metadata under ‘Coloured Biplot’, and visualizations of the removed samples/features from filtering under the ‘Removed data’ tab.
“Reasonable” filtering of sparse features (rows) from your dataset should not affect any conclusions made from the PCA biplots. Filtering is exploratory, and should be experimented with to see how removing sparse data affects the structure of your dataset within the PCA biplot. By default, no filtering is applied. We will use a minimum count per OTU filter of 10 for this vignette.
Under the ‘Filtering’ tab within the sidebar panel, the following options are available to filter your data.
Minimum count per OTU This filter removes any rows with a maximum sample count less than the input. It is recommended to remove sparse features. Try setting this filter to 10.
Minimum count sum per sample This filter removes any columns that have a lower sum of counts than the filter. You can visualize which samples are being removed when applying this filter using the ‘Filtering by counts per sample’ graph under the ‘Removed data’ tab.
Minimum proportional abundance This filter calculates frequency for each sample (by column) after filters 1 and 2 are applied, and removes any rows that have all frequencies below the threshold.
Maximum proportional abundance This filter calculates frequency for each sample, and removes any rows that have no frequencies below the threshold.
Minimum count sum per OTU This filter removes any rows that have a lower sum of counts than the filter. You can visualize which rows are removed with the ‘Filtering by counts per row’ graph under the ‘Removed data’ tab.
Variance cutoff This filter calculates the variance for each OTU (row), and removes any rows that have lower variance than the filter. Variance is calculated after all other filters, and after the reads have been transformed by the centre-log ratio.
Adjust scale This changes the scale of the biplot. When set to zero, it shows the relationship between samples (columns), while being set to 1 shows the relationship between OTUs (rows). Data is not filtered from this operation.
To view the coloured PCA biplot, click the ‘Coloured Biplot’ tab within the main panel. Clicking the ‘Colouring options’ panel within the side bar panel allows you to choose which metadata to colour your plot by. We will be colouring the samples based on whether or not they were positive for bacterial vaginosis as defined by Nugent status. Choose the column ‘n_status’ to colour the data by Nugent status. You will need to click the ‘nonnumeric metadata’ radio button since the values of the column are categorical. The sample names that are black (ie, 128bvvc) on the PCA biplot represent samples without metadata.
Under the ‘Colouring options’ tab in the sidebar panel, click on ‘Filter by metadata values’ to replot samples belonging to certain groups according to the metadata. This will generate a pop-up (Figure 5). Filter the dataset to replot only the samples at time zero by selecting the ‘time’ column, and inputting ‘0’ into the text field for Value 1. This should update the coloured PCA biplot to show samples that were taken from time = 0. Click ‘Update Filter’ to update the filter, or ‘Reset Filter’ to reset it. This filter is applied to the data for all other plots (relative abundance, effect plots).
Click the ‘Relative abundance plots’ main tab to continue. The sidebar panel allows you to filter the data by choosing an abundance cutoff, select the clustering method, and select the distance matrix method. The main panel displays a dendrogram above a stacked barplot, and both can be zoomed in by dragging your mouse over a selected region and double clicking.
Any filtering by metadata will be reflected in the dendrogram and stacked barplot as well.
To generate effect plots, click on the ‘Effect plots’ main tab. Here,
you can choose to calculate the ALDEx2
table manually (ie,
choosing which columns to compare) or by using metadata. For this
example, we will use the included metadata to compare samples that were
positive for bacterial vaginosis or not according to Nugent status. More
information on choosing manually for your own data set is available on
the GitHub wiki.
Click on the ‘From metadata’ radio button since we are selecting conditions using our metadata. Select the ‘n_status’ column from metadata. For Group 1 choose ‘bv’, and for Group 2 choose ‘n’. After these have been selected, click the ‘Generate effect plot’ button. An effect plot and Bland-Altman plot will be generated. For large datasets, this will take a long time. It takes about 10 seconds for the example data, which has 77 features for 297 samples.
By hovering your cursor over a point on the effect plot, a stripchart
of the expected CLR abundance (see ALDEx2
) for each sample
will be shown, allowing you to compare the differences between your
samples for a given feature (Figure 6). It also displays information
from the effect table generated by ALDEx2
, such as the
effect size, median difference between groups, and median difference
within groups.
If you are using your own data and want to select groups by columns, you will need to reorder your file to have the first n columns as group 1, and the last n columns as group 2.
When running omicplotR
, your R console will be be
unavailable. To quit the app, click the ‘More’ main tab and click ‘Quit
omicplotR’. Alternatively, you can click command
.
on Mac.
Daniel J Giguere wrote the original omicplotR
code and
designed the Shiny
app, with help from Brandon Lieng, Jean
Macklaim, and Greg Gloor. Brandon Lieng wrote the function
clr.strip()
needed for the strip charts. Jean Macklaim
conceptualized this project, contributed ideas for the design, and wrote
the original code for the taxonomic distribution and dendrograms. Greg
contributed ideas for features, and played a role in designing and
implementing omicplotR
.
Currently version 0.99.4.
For more information about how to use omicplotR
, please
visit the wiki on my Github page:
Fernandes et al (2013) PLOS ONE https://doi.org/10.1371/journal.pone.0067019↩︎
https://www.illumina.com/systems/sequencing-platforms/nextseq/specifications.html↩︎
Gloor et al (2017) Front. Microbiol. https://doi.org/10.3389/fmicb.2017.02224↩︎
Macklaim et al (2015) Microb. Ecol. Heal. Dis. https://doi.org/10.3402/mehd.v26.27799.↩︎
McMurrough et al (2014) PNAS doi:10.1073/pnas.1322352111↩︎