Introduction
library(ggtree)
library(ggtreeDendro)
library(aplot)
scale_color_subtree <- ggtreeDendro::scale_color_subtree
Clustering is an important method for classifying items into categories and for inferring function, since similar objects tend to behave similarly. More than 200 Bioconductor packages implement clustering algorithms or employ clustering methods for omics data analysis.
Although these methods are important for data analysis, support for visualizing their results is quite limited. Most of the packages can only visualize the hierarchical tree structure using stats:::plot.hclust(). This package is designed to visualize hierarchical tree structures together with associated data (e.g., clinical information collected with the samples) using the in-house developed ggtree package.
This package implements a set of autoplot() methods to display tree structures, and more autoplot() methods will be added to support more objects. The output of these autoplot() methods is a ggtree object, which can be further annotated by adding layers using ggplot2 syntax. Annotating the tree with associated data is also supported through the ggtreeExtra package.
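As a minimal sketch of adding layers, the snippet below builds a tree from an hclust object and then adds a tip-label layer and a title with the usual `+` syntax. The USArrests data set, the label size, and the title text are illustrative choices, not part of the package.

```r
library(ggtree)
library(ggtreeDendro)

# Illustrative input: average-linkage clustering of the built-in USArrests data
hc <- hclust(dist(USArrests), method = "average")

# autoplot() returns a plot object that accepts further layers
p <- autoplot(hc)

# Add tip labels (ggtree layer) and a title (ggplot2 layer) with `+`
p_annotated <- p +
  geom_tiplab(size = 2) +
  ggplot2::ggtitle("USArrests, average linkage")

p_annotated
```

Because the result is still a plot object, further ggplot2 or ggtree layers can be added in the same way.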
Here are some demonstrations of using autoplot() methods to visualize common hierarchical clustering tree objects.
hclust and dendrogram objects
These two classes are defined in the stats package.
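A short sketch of plotting both classes side by side, assuming the USArrests data as example input. The same clustering is plotted once as an hclust object and once as a dendrogram object, and the two plots are arranged with plot_list() from aplot.

```r
library(ggtree)
library(ggtreeDendro)
library(aplot)

# Illustrative input: distances between US states in the USArrests data
d <- dist(USArrests)

hc  <- hclust(d, method = "average")  # class "hclust"
den <- as.dendrogram(hc)              # class "dendrogram"

p1 <- autoplot(hc) + geom_tiplab()    # tree with tip labels
p2 <- autoplot(den)                   # tree without extra layers

# Arrange the two plots in one row
plot_list(p1, p2, ncol = 2)
```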
linkage object
The class linkage is defined in the mdendro package.
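The following sketch clusters the USArrests distances with mdendro and draws the result in a circular layout. It rests on three assumptions that are worth checking against the package help pages: that the `layout` argument is accepted by this autoplot() method, that complete linkage is a reasonable choice for this data, and that `scale_color_subtree(4)` colors the tree by four subtrees.

```r
library(ggtree)
library(ggtreeDendro)
library(mdendro)

# Illustrative input: distances between US states in the USArrests data
d <- dist(USArrests)

# Complete-linkage clustering; the result has class "linkage"
lnk <- linkage(d, method = "complete")

# Circular tree with tip labels; branches colored by 4 subtrees (assumed meaning)
p <- autoplot(lnk, layout = "circular") +
  geom_tiplab() +
  ggtreeDendro::scale_color_subtree(4) +
  theme_tree()

p
```

The explicit `ggtreeDendro::` prefix makes clear which package the color scale is taken from, in the same spirit as the assignment at the top of this document.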
agnes, diana and twins objects
These classes are defined in the cluster package.
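A minimal sketch for the cluster package, assuming the built-in mtcars data as example input. agnes() performs agglomerative clustering and diana() performs divisive clustering, and both results also carry the twins class.

```r
library(ggtree)
library(ggtreeDendro)
library(aplot)
library(cluster)

# Illustrative input: the built-in mtcars data set
x1 <- agnes(mtcars)  # agglomerative nesting, class c("agnes", "twins")
x2 <- diana(mtcars)  # divisive analysis, class c("diana", "twins")

p1 <- autoplot(x1) + geom_tiplab()
p2 <- autoplot(x2) + geom_tiplab()

# Compare the two clusterings side by side
plot_list(p1, p2, ncol = 2)
```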
pvclust object
The pvclust class is defined in the pvclust package.
library(pvclust)
data(Boston, package = "MASS")
set.seed(123)
result <- pvclust(Boston, method.dist="cor", method.hclust="average", nboot=1000, parallel=TRUE)
## Creating a temporary cluster...done:
## socket cluster with 71 nodes on host 'localhost'
## Multiscale bootstrap... Done.
The pvclust object contains two types of p-values: the AU (Approximately Unbiased) p-value and the BP (Bootstrap Probability) value. These values will be automatically labelled on the tree.
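A sketch of plotting a pvclust result is given below. To keep it quick to run, it uses fewer bootstrap replicates (`nboot = 100`) and no parallel cluster, so its p-values will differ from those of the run above. The `label_edge` and `pvrect` arguments are assumptions about this autoplot() method (edge labels for the p-values and rectangles around well-supported clusters) and should be confirmed in the package documentation.

```r
library(ggtree)
library(ggtreeDendro)
library(pvclust)

data(Boston, package = "MASS")

set.seed(123)
# Reduced nboot and serial execution: chosen only to keep this sketch fast
result <- pvclust(Boston, method.dist = "correlation",
                  method.hclust = "average", nboot = 100, parallel = FALSE)

# label_edge and pvrect are assumed arguments; see the lead-in above
p <- autoplot(result, label_edge = TRUE, pvrect = TRUE) + geom_tiplab()

p
```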
Session information
Here is the output of sessionInfo() on the system on which this document was compiled:
## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] pvclust_2.2-0 cluster_2.1.6 mdendro_2.2.1 aplot_0.2.3
## [5] ggtreeDendro_1.9.0 ggtree_3.15.0 yulab.utils_0.1.7
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.9 utf8_1.2.4 generics_0.1.3 tidyr_1.3.1
## [5] prettydoc_0.4.1 ggplotify_0.1.2 lattice_0.22-6 digest_0.6.37
## [9] magrittr_2.0.3 evaluate_1.0.1 grid_4.5.0 fastmap_1.2.0
## [13] jsonlite_1.8.9 ape_5.8 purrr_1.0.2 fansi_1.0.6
## [17] scales_1.3.0 lazyeval_0.2.2 jquerylib_0.1.4 cli_3.6.3
## [21] rlang_1.1.4 munsell_0.5.1 tidytree_0.4.6 withr_3.0.2
## [25] cachem_1.1.0 yaml_2.3.10 tools_4.5.0 parallel_4.5.0
## [29] dplyr_1.1.4 colorspace_2.1-1 ggplot2_3.5.1 vctrs_0.6.5
## [33] R6_2.5.1 gridGraphics_0.5-1 lifecycle_1.0.4 fs_1.6.4
## [37] ggfun_0.1.7 treeio_1.31.0 pkgconfig_2.0.3 pillar_1.9.0
## [41] bslib_0.8.0 gtable_0.3.6 glue_1.8.0 Rcpp_1.0.13
## [45] highr_0.11 xfun_0.48 tibble_3.2.1 tidyselect_1.2.1
## [49] knitr_1.48 farver_2.1.2 htmltools_0.5.8.1 nlme_3.1-166
## [53] patchwork_1.3.0 labeling_0.4.3 rmarkdown_2.28 compiler_4.5.0