The GrafGen package classifies Helicobacter pylori genomes according to
their genetic distance from nine reference populations, as defined by
equation 2 in Jin (2019). The main function in this package is grafGen(),
which requires a file of genotypes in either PLINK bed or VCF format.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("GrafGen")
Before using the GrafGen package, it must be loaded into an R session.
library(GrafGen)
The GrafGen package includes example data, a subset of the reference data that was used to train the model. The data are stored in the package's extdata folder.
dir <- system.file("extdata", package="GrafGen", mustWork=TRUE)
geno.file <- file.path(dir, "data.vcf.gz")
print(geno.file)
## [1] "/tmp/RtmpLve1Qu/Rinste46785216f55e/GrafGen/extdata/data.vcf.gz"
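Since grafGen() expects either a PLINK bed file or a VCF file, it can help to check a path before passing it on. The following is only a minimal sketch in base R: genoFileType() is a hypothetical helper, not part of GrafGen, and it assumes the file type can be recognized from the extension (.bed, .vcf or .vcf.gz).

```r
# Hypothetical helper (not part of GrafGen): classify a genotype file
# by its extension, assuming .bed, .vcf or .vcf.gz naming.
genoFileType <- function(path) {
    if (!file.exists(path)) stop("File not found: ", path)
    if (grepl("\\.bed$", path, ignore.case = TRUE)) return("bed")
    if (grepl("\\.vcf(\\.gz)?$", path, ignore.case = TRUE)) return("vcf")
    stop("Expected a PLINK .bed file or a VCF (.vcf or .vcf.gz) file: ", path)
}

# Demonstration on empty temporary files, so the sketch runs without
# GrafGen being installed.
tmp.vcf <- tempfile(fileext = ".vcf.gz")
tmp.bed <- tempfile(fileext = ".bed")
invisible(file.create(tmp.vcf, tmp.bed))
genoFileType(tmp.vcf)
genoFileType(tmp.bed)
```

With the example data, the same helper could be called as genoFileType(geno.file) before running grafGen().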
The grafGen() function returns a list of class “grafpop” with two
objects: table and vertex. The object table is a data frame containing
hypothetical ancestry percentages (F_percent, E_percent and A_percent)
based on known African, European and Asian samples, respectively;
normalized genetic distance scores (GD1_x, GD2_y, GD3_z); the predicted
reference population (Refpop); the nearest neighboring reference
population (Nearest_neighbor); the percent separation
(Separation_percent) as defined in the user manual; and the genetic
distances to each reference population (hpgpAfrica, hpgpAfrica-distant,
hpgpAfroamerica, hpgpEuroamerica, hpgpMediterranea, hpgpEurope,
hpgpEurasia, hpgpAsia, and hpgpAklavik86-like).
The object vertex is a list containing the (fixed) x-y coordinates of
the African, European and Asian vertex population centroids.
ret <- grafGen(geno.file, print=0)
ret$table[seq_len(5), ]
## Sample N_SNPs GD1_x GD2_y GD3_z F_percent E_percent A_percent
## 1 HpGP-ALG-002 35528 1.325330 1.246303 -0.008719 27.79 72.21 0
## 2 HpGP-ALG-004 35528 1.355911 1.264769 0.004511 19.76 80.24 0
## 3 HpGP-ALG-005 35528 1.350071 1.267337 -0.003531 19.70 80.30 0
## 4 HpGP-ALG-006 35528 1.340957 1.265292 -0.002128 21.14 78.86 0
## 5 HpGP-ALG-010 35528 1.343997 1.266336 0.003096 20.57 79.43 0
## hpgpAfrica hpgpAfrica-distant hpgpAfroamerica hpgpEuroamerica
## 1 0.398096 0.661930 0.324232 0.276226
## 2 0.429534 0.660384 0.336875 0.279032
## 3 0.420432 0.655047 0.331030 0.275570
## 4 0.416124 0.658398 0.327399 0.275852
## 5 0.422221 0.657674 0.333455 0.279073
## hpgpMediterranea hpgpEurope hpgpEurasia hpgpAsia hpgpAklavik86-like
## 1 0.277100 0.305868 0.395434 0.577032 0.597266
## 2 0.260454 0.289007 0.380532 0.564095 0.587367
## 3 0.256573 0.284675 0.379566 0.565708 0.589778
## 4 0.257503 0.288340 0.383189 0.573169 0.592772
## 5 0.261808 0.288747 0.384723 0.573536 0.594683
## Refpop Nearest_neighbor Separation_percent
## 1 hpgpEuroamerica hpgpMediterranea 0.32
## 2 hpgpMediterranea hpgpEuroamerica 7.13
## 3 hpgpMediterranea hpgpEuroamerica 7.40
## 4 hpgpMediterranea hpgpEuroamerica 7.13
## 5 hpgpMediterranea hpgpEuroamerica 6.59
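To see how the distance columns relate to Refpop and Nearest_neighbor, the sketch below ranks the nine distances for each of the five samples printed above. It assumes that the predicted population is the one with the smallest distance and the nearest neighbor the one with the second smallest; this is consistent with these five rows, but it is an assumption here rather than a statement of the package's rule. The distances are copied from the output above, so the code runs without GrafGen.

```r
# Population names, in the order of the distance columns in ret$table.
pops <- c("hpgpAfrica", "hpgpAfrica-distant", "hpgpAfroamerica",
          "hpgpEuroamerica", "hpgpMediterranea", "hpgpEurope",
          "hpgpEurasia", "hpgpAsia", "hpgpAklavik86-like")

# Distances copied from the first five rows of ret$table shown above.
dists <- matrix(c(
    0.398096, 0.661930, 0.324232, 0.276226, 0.277100, 0.305868, 0.395434, 0.577032, 0.597266,
    0.429534, 0.660384, 0.336875, 0.279032, 0.260454, 0.289007, 0.380532, 0.564095, 0.587367,
    0.420432, 0.655047, 0.331030, 0.275570, 0.256573, 0.284675, 0.379566, 0.565708, 0.589778,
    0.416124, 0.658398, 0.327399, 0.275852, 0.257503, 0.288340, 0.383189, 0.573169, 0.592772,
    0.422221, 0.657674, 0.333455, 0.279073, 0.261808, 0.288747, 0.384723, 0.573536, 0.594683),
    nrow = 5, byrow = TRUE, dimnames = list(NULL, pops))

# For each sample, keep the two populations with the smallest distances.
ranked <- t(apply(dists, 1, function(d) pops[order(d)][1:2]))
closest <- ranked[, 1]
runner.up <- ranked[, 2]

data.frame(closest, runner.up)
```

For these five samples, closest and runner.up agree with the Refpop and Nearest_neighbor columns in the printed table.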
Printing the object returned by grafGen() displays a table of frequency
counts of the predicted reference populations for the user input data.
print(ret)
##
## Predicted reference population counts:
## hpgpAfrica hpgpAfrica-distant hpgpAfroamerica hpgpEuroamerica
## 15 1 10 28
## hpgpMediterranea hpgpEurope hpgpEurasia hpgpAsia
## 44 57 13 36
## hpgpAklavik86-like
## 2
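The counts can also be expressed as percentages of all classified samples. This is a small base R sketch using the counts copied from the output above; the variable names are illustrative only.

```r
# Counts copied from the print(ret) output above.
counts <- c("hpgpAfrica" = 15, "hpgpAfrica-distant" = 1,
            "hpgpAfroamerica" = 10, "hpgpEuroamerica" = 28,
            "hpgpMediterranea" = 44, "hpgpEurope" = 57,
            "hpgpEurasia" = 13, "hpgpAsia" = 36,
            "hpgpAklavik86-like" = 2)

# Total number of samples and the share assigned to each population.
total <- sum(counts)
percent <- round(100 * counts / total, 1)

# Most frequent predicted populations first.
sort(percent, decreasing = TRUE)
```

With a real result, a similar vector of counts could presumably be built from the Refpop column, for example with table(ret$table$Refpop), although populations with no samples would then be omitted.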
Plotting the return object displays the genetic distance scores (GD1_x
vs GD2_y) for the user input data and the reference data. Additional
plots can be obtained by calling the grafGenPlot() function.
plot(ret)
The functions interactiveReferencePlot and interactivePlot create
interactive plots for the reference data and the user input data,
respectively. A call to interactiveReferencePlot shows the results for
all samples in the reference data. Hovering over a point in the plot
displays three lines of information. Line 1 contains the type and id of
that sample. Line 2 contains the sample’s reference population, next
nearest reference population, and separation percent to the next nearest
reference population as defined in the user manual. Line 3 contains the
percent African, European and Asian ancestry for that sample. The legend
shows the types (which are the source countries in
interactiveReferencePlot) for all samples, and clicking the name of a
type adds or removes those samples from the plot.
if (interactive()) interactiveReferencePlot()
The GrafGen package also includes an R Shiny app to view and filter the
plot using up to two variables. The function createApp returns a list
containing the app and the data objects the app needs. The app can then
be launched with the runApp function.
tmp <- createApp(ret)
if (interactive()) {
    reference_results <- tmp$reference_results
    user_results <- tmp$user_results
    user_metadata <- tmp$user_metadata
    shiny::runApp(tmp$app)
}
sessionInfo()
## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] GrafGen_1.3.0
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 xfun_0.48 bslib_0.8.0
## [4] ggplot2_3.5.1 htmlwidgets_1.6.4 rstatix_0.7.2
## [7] vctrs_0.6.5 tools_4.5.0 generics_0.1.3
## [10] stats4_4.5.0 tibble_3.2.1 fansi_1.0.6
## [13] highr_0.11 pkgconfig_2.0.3 data.table_1.16.2
## [16] RColorBrewer_1.1-3 S4Vectors_0.45.0 GenomeInfoDbData_1.2.13
## [19] lifecycle_1.0.4 farver_2.1.2 compiler_4.5.0
## [22] stringr_1.5.1 munsell_0.5.1 carData_3.0-5
## [25] httpuv_1.6.15 GenomeInfoDb_1.43.0 htmltools_0.5.8.1
## [28] sass_0.4.9 yaml_2.3.10 lazyeval_0.2.2
## [31] Formula_1.2-5 plotly_4.10.4 pillar_1.9.0
## [34] later_1.3.2 car_3.1-3 ggpubr_0.6.0
## [37] jquerylib_0.1.4 tidyr_1.3.1 MASS_7.3-61
## [40] cachem_1.1.0 abind_1.4-8 mime_0.12
## [43] tidyselect_1.2.1 digest_0.6.37 stringi_1.8.4
## [46] dplyr_1.1.4 purrr_1.0.2 cowplot_1.1.3
## [49] fastmap_1.2.0 grid_4.5.0 colorspace_2.1-1
## [52] cli_3.6.3 magrittr_2.0.3 utf8_1.2.4
## [55] broom_1.0.7 withr_3.0.2 scales_1.3.0
## [58] UCSC.utils_1.3.0 promises_1.3.0 backports_1.5.0
## [61] XVector_0.47.0 rmarkdown_2.28 httr_1.4.7
## [64] ggsignif_0.6.4 shiny_1.9.1 evaluate_1.0.1
## [67] knitr_1.48 GenomicRanges_1.59.0 IRanges_2.41.0
## [70] viridisLite_0.4.2 rlang_1.1.4 Rcpp_1.0.13
## [73] xtable_1.8-4 glue_1.8.0 BiocGenerics_0.53.0
## [76] jsonlite_1.8.9 R6_2.5.1 zlibbioc_1.53.0