In the last years we have witnessed a dramatic change in the clinical treatment
of patients thanks to molecular and personalized medicine. Many medical
institutes are starting to adopt routine genome-wide screenings to complement
and help diagnosis and treatment choices. As the number of datasets grows, we
need to adapt and improve the methods to cope with the complexity, amount and
multi-level structure of available information. Integrating these type of data
remains challenging due to their dimensionality and diverse features.
Moreover, focusing on dysregulated biological processes rather than individual
genes can offer deeper insights into complex diseases like cancer.
MOSClip
is a multi-omic statistical approach based on pathway topology that
can deal with this complexity. It integrates multiple omics - such as
expression, methylation, mutation, or copy number variation - using various
dimensionality reduction strategies. The analysis can be performed at the level
of entire pathways or pathway modules, allowing for a more detailed examination
of the dysregulated mechanisms within large biological processes.
MOSClip
pathway analysis serves two primary purposes: testing the survival
association of pathways or modules using the Cox proportional hazard model,
and conducting a two-class analysis with a generalized linear model.
Additionally, the package offers valuable graphical tools to visualize and
interpret the results.
In recent years many efforts have been dedicated on multi-omic data integration,
with several tools available for pathway analysis in a multi-omic framework;
however, their purpose is mainly limited to two-class comparison.
MOSClip
(Multi-Omic Survival Clip) was originally developed for survival
pathway analysis, combining both multi-omic data integration and graphical
model theory to keep track of gene interactions among a
pathway (Martini et al. 2019).
With this purpose, MOSClip
allows to test survival association of pathways or
their connected components, that we called modules, in a multi-omic
framework.
A second test has been implemented to perform also two-class comparison, to
investigate pathways or modules association with a specific condition.
Multi-omics gene measurements are tested as covariates of a Cox proportional
hazard model or a generalized linear model, after dimensionality reduction of
data.
MOSClip
is highly flexible thanks to its modular structure, allowing the use
of one or multiple different omics, as well as different data reduction
strategies and tests.
In brief, MOSClip
comprises four main components: pathway topology,
multi-omic data and survival or two-class analysis. The final goal is to find
biological processes impacting patient’s survival or patient’s association with
a specific condition.
Furthermore, several graphical tools have been implemented in MOSClip
to
browse, manage and provide help in the interpretation of the results.
In this vignette, we will show an example of survival analysis on four omics: transcriptome, methylome, genomic mutations and genomic copy number variations, testing if these omics can be synergically involved in pathways with survival or two-class prognostication power.
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("MOSClip")
We start by loading example data provided by MOSClip
package to run our
analysis.
reactSmall
object is a list of three interesting Reactome pathways in
graphite::Pathway
format that we will use in our analysis. More information
can be found on the object documentation (?reactSmall
).
multiOmics
object is a multi-omic dataset containing matrices,
genes per patients, of methylation status, somatic mutations, CNVs, and
transcripts expression levels, for 50 ovarian cancer patients from TCGA.
Genes and patients were manually selected to generate this small dataset
intended to illustrate the functionality of the package, allowing users to
explore its methods and tools in a simplified context.
Information on how the dataset was generated can be found in package
documentation with ?multiOmics
. Moreover, additional information on how to
pre-process datasets for a MOSClip
analysis are available in this GitHub
tutorial
or in the Materials and Methods section of the paper (Martini et al. 2019).
library(MOSClip)
library(MultiAssayExperiment)
library(kableExtra)
data("reactSmall")
data("multiOmics")
multiOmics
is an Omics
class object. This MOSClip
specific object derives
from a MultiAssayExperiment
object. It contains standard slots
ExperimentList
, sampleMap
, colData
, and specific slots for MOSClip
analysis: modelInfo
, where the user should specify the dimensionality
reduction method to process each omic dataset, and specificArgs
, where
additional parameters can be set for each method, as described in the help of
each reduction function.
An Omics
class object can be generated with the function makeOmics
starting
from data matrices and additional annotations.
Available methods for dimensionality reduction can be conveniently visualized
with availableOmicMethods
function.
availableOmicMethods()
#> [1] "summarizeToBinaryEvents"
#> [2] "summarizeToNumberOfEvents"
#> [3] "summarizeWithPca"
#> [4] "summarizeInCluster"
#> [5] "summarizeToBinaryDirectionalEvents"
#> [6] "summarizeToNumberOfDirectionalEvents"
In this vignette, we chose to use PCA for expression data, cluster analysis
for methylation data, vote counting for mutations and CNVs (for detail see
MOSClip
paper (Martini et al. 2019)). This data transformations are easily
applied calling MOSClip
functions, thus, here we need only to provide the
name of the needed function.
Given the nature of the methylation data, we expect to have more than a CpG
cluster associated to a gene. For this reason, MOSClip
provides the
possibility to include a dictionary to associate the methylation level of
multiple CpG clusters to a single gene. Thus, in the methylation specific
arguments you need to provide the dictionary to convert cluster names into
genes.
We are now ready to perform the survival analysis on modules using the
function multiOmicsSurvivalModuleTest
. Required inputs are a multi-omic
dataset and a pathway in Pathway
or graphNEL
format. Alternatively, it is
possible to run the analysis also on gene-sets; in this case the topological
information is lost, thus, only the function for pathway test must be used.
In this example, we choose to test only genes that have at least expression
data, as specified with useTheseGenes
. This a priori filter, however, is not
mandatory.
Moreover, it could be useful to set a seed in order to have reproducible
results.
genesToConsider <- row.names(experiments(multiOmics)$exp)
moduleSurv <- lapply(reactSmall, function(g) {
set.seed(1234)
fcl = multiOmicsSurvivalModuleTest(multiOmics, g,
useTheseGenes = genesToConsider)
fcl
})
Once the analysis is complete, we can plot the tabular summary of the top 10
modules selected by p-value of the Cox proportional hazard model using the
function multiPathwayModuleReport
.
moduleSummary <- multiPathwayModuleReport(moduleSurv)
module | pvalue | cnvNEG | cnvPOS | expPC1 | expPC2 | expPC3 | met2k2 | met3k2 | met3k3 | mut | |
---|---|---|---|---|---|---|---|---|---|---|---|
Activation of Matrix Metalloproteinases.4 | 4 | 0.0008290 | NA | 0.0026686 | 0.0509299 | 0.0002584 | 0.5032458 | 0.0037543 | NA | NA | 0.7128964 |
FGFR1 mutant receptor activation.3 | 3 | 0.0027138 | NA | 0.1441293 | 0.2163497 | NA | NA | 0.0010768 | NA | NA | 0.1113289 |
FGFR1 mutant receptor activation.2 | 2 | 0.0044104 | NA | 0.7453814 | 0.2635201 | NA | NA | NA | 0.0051868 | 0.2507027 | 0.9970709 |
Activation of Matrix Metalloproteinases.6 | 6 | 0.0232918 | NA | 0.0083832 | 0.0500642 | 0.0008553 | 0.5538676 | 0.1345511 | NA | NA | 0.5097630 |
FGFR1 mutant receptor activation.1 | 1 | 0.0248414 | 0.3413743 | 0.8067974 | 0.6359349 | 0.4539220 | NA | NA | 0.4315512 | 0.0028007 | 0.0874373 |
Activation of Matrix Metalloproteinases.5 | 5 | 0.0513868 | NA | 0.0239773 | 0.0623145 | 0.0028872 | 0.6663908 | 0.4852597 | NA | NA | 0.7326374 |
Activation of Matrix Metalloproteinases.7 | 7 | 0.0652934 | 0.8248342 | 0.0360290 | 0.9254547 | 0.0017038 | 0.1847143 | 0.3620709 | NA | NA | NA |
Activation of Matrix Metalloproteinases.9 | 9 | 0.0727559 | NA | 0.0783999 | 0.0225378 | 0.0094491 | 0.6212418 | NA | 0.7186513 | 0.4393132 | 0.2449149 |
VEGFA-VEGFR2 Pathway.1 | 1 | 0.0825774 | 0.4602803 | 0.2725537 | 0.0850171 | 0.3649652 | 0.0081958 | 0.2580115 | NA | NA | 0.0478975 |
VEGFA-VEGFR2 Pathway.8 | 8 | 0.1015215 | NA | 0.0354221 | 0.1335834 | 0.1266076 | NA | 0.1198248 | NA | NA | 0.6545505 |
MOSClip
has plenty of functions to visually explore the results.
In the following paragraphs we will show some examples.
We can visualize the table of module test results for a specific pathway, e.g., Activation of Matrix Metalloproteinases.
plotModuleReport(moduleSurv[["Activation of Matrix Metalloproteinases"]],
MOcolors = c(exp = "red", met = "green",
cnv = "yellow", mut = "blue"))
As you can see, among others, the module number 4 of this pathway is significant, and, in particular, covariates expPC2, met2k2 and cnvPOS are driving this significance.
We can have a look at the pathway graph and the module position in the pathway
using plotModuleInGraph
.
plotModuleInGraph(moduleSurv[["Activation of Matrix Metalloproteinases"]],
pathList = reactSmall,
moduleNumber = 4)
Then, we can check at the differences in survival using Kaplan-Meier curves dividing patients in groups with different omics patterns. Here we consider only covariates expPC2 and met2k.
plotModuleKM(moduleSurv[["Activation of Matrix Metalloproteinases"]],
moduleNumber = 4,
formula = "Surv(days, status) ~ expPC2 + met2k",
paletteNames = "Paired", risk.table = TRUE, inYears = TRUE)
We can explore the most important genes of the pathway and their original profiles across the omics using an heatmap plot of the original values. The most important genes are selected as described in Martini et al. (Martini et al. 2019).
additionalA <- colData(multiOmics)
additionalA$status[additionalA$status == 1] <- "event"
additionalA$status[additionalA$status == 0] <- "no_event"
additionalA$PFS <- as.factor(additionalA$status)
additionalA$status <- NULL
additionalA$years <- round(additionalA$days/365.24, 0)
additionalA$days <- NULL
plotModuleHeat(moduleSurv[["Activation of Matrix Metalloproteinases"]],
moduleNumber = 4,
paletteNames = c(exp = "red", met = "green",
cnv = "yellow", mut = "blue"),
additionalAnnotations = additionalA,
additionalPaletteNames = list(PFS = "violet", years = "teal"),
sortBy = c("expPC2", "met2k", "PFS", "years"),
withSampleNames = FALSE)
In second instance, we can ask if two or more omics are significant in the same
module simultaneously and if this omic interaction is more frequent than those
expected by chance. To perform this test we use the runSupertest
function.
A circle plot is returned with the frequency of all significant omic
combinations and their significance levels.
runSupertest(moduleSummary, pvalueThr = 0.05, zscoreThr = 0.05,
excludeColumns = c("pathway", "module"))
In pathway test the pathway topology (the in and out connections of the pathway genes) can be exploited to guide the data reduction step. To do that, we suggest to use the topological PCA instead of the sparse PCA changing the settings in the omics object.
Then, we are ready to run the analysis using the function
multiOmicsSurvivalPathwayTest
.
data("multiOmicsTopo")
pathwaySurv <- lapply(reactSmall, function(g) {
set.seed(1234)
fcl = multiOmicsSurvivalPathwayTest(multiOmicsTopo, g,
useTheseGenes = genesToConsider)
})
We can plot a report of the first 10 pathways, sorted by pvalue of the Cox proportional hazard model.
plotMultiPathwayReport(pathwaySurv,
MOcolors = c(exp = "red", mut = "blue",
cnv = "yellow", met = "green"))
Finally, we can look at the predictive genes using a heatmap with patient additional annotations.
plotPathwayHeat(pathwaySurv[["Activation of Matrix Metalloproteinases"]],
sortBy = c("expPC1", "cnvPOS", "PFS"),
paletteNames = c(exp = "red", met = "green",
mut = "blue", cnv = "yellow"),
additionalAnnotations = additionalA,
additionalPaletteNames = list(PFS = "violet", years = "teal"),
withSampleNames = FALSE)
Then, we can also check for differences in survival using Kaplan-Meier curves dividing patients in groups with different omics patterns (e.g. patients with different methylation pattern and high and low levels of PC2 in expression).
plotPathwayKM(pathwaySurv[["Activation of Matrix Metalloproteinases"]],
formula = "Surv(days, status) ~ expPC1 + cnvPOS",
paletteNames = "Paired")
MOSClip
gives the possibility to prioritize most important and
stable pathway or module results, running a resampling procedure that can be
found on the
extended tutorial on GitHub.
More tutorials are also available on how to perform a two-class analysis with
MOSClip
, as well as examples of more plots that were not shown in this
vignette.
sessionInfo()
#> R Under development (unstable) (2024-10-21 r87258)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] kableExtra_1.4.0 MultiAssayExperiment_1.33.0
#> [3] SummarizedExperiment_1.37.0 Biobase_2.67.0
#> [5] GenomicRanges_1.59.0 GenomeInfoDb_1.43.0
#> [7] IRanges_2.41.0 S4Vectors_0.45.0
#> [9] BiocGenerics_0.53.0 MatrixGenerics_1.19.0
#> [11] matrixStats_1.4.1 MOSClip_1.1.0
#> [13] BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] splines_4.5.0 BiocIO_1.17.0 bitops_1.0-9
#> [4] ggplotify_0.1.2 tibble_3.2.1 qpgraph_2.41.0
#> [7] graph_1.85.0 XML_3.99-0.17 lifecycle_1.0.4
#> [10] rstatix_0.7.2 doParallel_1.0.17 lattice_0.22-6
#> [13] MASS_7.3-61 flashClust_1.01-2 SuperExactTest_1.1.0
#> [16] NbClust_3.0.1 backports_1.5.0 magrittr_2.0.3
#> [19] sass_0.4.9 rmarkdown_2.28 jquerylib_0.1.4
#> [22] yaml_2.3.10 DBI_1.2.3 RColorBrewer_1.1-3
#> [25] multcomp_1.4-26 abind_1.4-8 zlibbioc_1.53.0
#> [28] purrr_1.0.2 elasticnet_1.3 RCurl_1.98-1.16
#> [31] yulab.utils_0.1.7 TH.data_1.1-2 rappdirs_0.3.3
#> [34] sandwich_3.1-1 circlize_0.4.16 GenomeInfoDbData_1.2.13
#> [37] KMsurv_0.1-5 ggrepel_0.9.6 gRbase_2.0.3
#> [40] pheatmap_1.0.12 annotate_1.85.0 commonmark_1.9.2
#> [43] svglite_2.1.3 codetools_0.2-20 DelayedArray_0.33.0
#> [46] ggtext_0.1.2 xml2_1.3.6 DT_0.33
#> [49] tidyselect_1.2.1 shape_1.4.6.1 farver_2.1.2
#> [52] UCSC.utils_1.3.0 GenomicAlignments_1.43.0 jsonlite_1.8.9
#> [55] GetoptLong_1.0.5 Formula_1.2-5 survival_3.7-0
#> [58] iterators_1.0.14 emmeans_1.10.5 coxrobust_1.0.1
#> [61] systemfonts_1.1.0 foreach_1.5.2 tools_4.5.0
#> [64] Rcpp_1.0.13 glue_1.8.0 gridExtra_2.3
#> [67] SparseArray_1.7.0 xfun_0.48 dplyr_1.1.4
#> [70] withr_3.0.2 BiocManager_1.30.25 fastmap_1.2.0
#> [73] fansi_1.0.6 digest_0.6.37 R6_2.5.1
#> [76] gridGraphics_0.5-1 estimability_1.5.1 colorspace_2.1-1
#> [79] Cairo_1.6-2 markdown_1.13 RSQLite_2.3.7
#> [82] utf8_1.2.4 tidyr_1.3.1 generics_0.1.3
#> [85] data.table_1.16.2 corpcor_1.6.10 rtracklayer_1.67.0
#> [88] httr_1.4.7 htmlwidgets_1.6.4 S4Arrays_1.7.0
#> [91] scatterplot3d_0.3-44 graphite_1.53.0 pkgconfig_2.0.3
#> [94] gtable_0.3.6 blob_1.2.4 ComplexHeatmap_2.23.0
#> [97] XVector_0.47.0 survMisc_0.5.6 htmltools_0.5.8.1
#> [100] carData_3.0-5 bookdown_0.41 multcompView_0.1-10
#> [103] clue_0.3-65 scales_1.3.0 leaps_3.2
#> [106] png_0.1-8 knitr_1.48 km.ci_0.5-6
#> [109] rstudioapi_0.17.1 rjson_0.2.23 coda_0.19-4.1
#> [112] checkmate_2.3.2 curl_5.2.3 org.Hs.eg.db_3.20.0
#> [115] cachem_1.1.0 zoo_1.8-12 GlobalOptions_0.1.2
#> [118] stringr_1.5.1 parallel_4.5.0 AnnotationDbi_1.69.0
#> [121] restfulr_0.0.15 pillar_1.9.0 grid_4.5.0
#> [124] reshape_0.8.9 vctrs_0.6.5 maxstat_0.7-25
#> [127] ggpubr_0.6.0 car_3.1-3 xtable_1.8-4
#> [130] cluster_2.1.6 Rgraphviz_2.51.0 evaluate_1.0.1
#> [133] tinytex_0.53 magick_2.8.5 GenomicFeatures_1.59.0
#> [136] mvtnorm_1.3-1 cli_3.6.3 compiler_4.5.0
#> [139] Rsamtools_2.23.0 rlang_1.1.4 crayon_1.5.3
#> [142] ggsignif_0.6.4 labeling_0.4.3 survminer_0.4.9
#> [145] plyr_1.8.9 fs_1.6.4 stringi_1.8.4
#> [148] viridisLite_0.4.2 BiocParallel_1.41.0 munsell_0.5.1
#> [151] Biostrings_2.75.0 qtl_1.70 Matrix_1.7-1
#> [154] lars_1.3 bit64_4.5.2 ggplot2_3.5.1
#> [157] KEGGREST_1.47.0 FactoMineR_2.11 highr_0.11
#> [160] gridtext_0.1.5 exactRankTests_0.8-35 igraph_2.1.1
#> [163] broom_1.0.7 memoise_2.0.1 bslib_0.8.0
#> [166] bit_4.5.0