rgoslin 1.11.0
This project is a parser, validator and normalizer implementation for shorthand lipid nomenclatures, base on the Grammar of Succinct Lipid Nomenclatures project.
Goslin defines multiple grammars for different sources of shorthand lipid nomenclature. This allows to generate parsers based on the defined grammars, which provide immediate feedback whether a processed lipid shorthand notation string is compliant with a particular grammar, or not.
NOTE: Please report any issues you might find to help improve it!
Here, rgoslin 2.0 uses the Goslin grammars and the cppgoslin parser to support the following general tasks:
The package can be installed with the BiocManager
package as follows:
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("rgoslin")
In order to use the provided translation functions of rgoslin, you first need to load the library. If rgoslin is not yet available, please follow the instructions in the previous section on “Installation”.
library(rgoslin)
If you want to check, which grammars are supported, use the following command:
listAvailableGrammars()
#> [1] "Shorthand2020" "Goslin" "FattyAcids" "LipidMaps"
#> [5] "SwissLipids" "HMDB"
To check, whether a given lipid name can be parsed by any of the parsers, you can use the isValidLipidName
method. It will return TRUE
if the given name can be parsed by any of the available parsers and FALSE
if the name was not parseable.
isValidLipidName("PC 32:1")
#> [1] TRUE
Using parseLipidNames
with a lipid name returns a data frame of properties of the parsed lipid name as columns.
df <- parseLipidNames("PC 32:1")
Normalized.Name | Original.Name | Grammar | Message | Adduct | Adduct.Charge | Lipid.Maps.Category | Lipid.Maps.Main.Class | Species.Name | Extended.Species.Name | Molecular.Species.Name | Sn.Position.Name | Structure.Defined.Name | Full.Structure.Name | Functional.Class.Abbr | Functional.Class.Synonyms | Level | Total.C | Total.OH | Total.O | Total.DB | Mass | Sum.Formula |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PC 32:1 | PC 32:1 | Shorthand2020 | NA | NA | 0 | GP | PC | PC 32:1 | PC | NA | NA | NA | NA | [PC] | [PC, GPC, GPCho, PlsCho] | SPECIES | 32 | 0 | 0 | 1 | 731.5465 | C40H78NO8P |
If you want to set the grammar to parse against manually, this is also possible as the second argument:
originalName <- "TG(16:1(5E)/18:0/20:2(3Z,6Z))"
tagDf <- parseLipidNames(originalName, grammar = "LipidMaps")
Normalized.Name | Original.Name | Grammar | Message | Adduct | Adduct.Charge | Lipid.Maps.Category | Lipid.Maps.Main.Class | Species.Name | Extended.Species.Name | Molecular.Species.Name | Sn.Position.Name | Structure.Defined.Name | Full.Structure.Name | Functional.Class.Abbr | Functional.Class.Synonyms | Level | Total.C | Total.OH | Total.O | Total.DB | Mass | Sum.Formula |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TG 16:1(5E)/18:0/20:2(3Z,6Z) | TG(16:1(5E)/18:0/20:2(3Z,6Z)) | LipidMaps | NA | NA | 0 | GL | TG | TG 54:3 | TG | TG 16:1_18:0_20:2 | TG 16:1/18:0/20:2 | TG 16:1(5)/18:0/20:2(3,6) | TG 16:1(5E)/18:0/20:2(3Z,6Z) | [TG] | [TG, TAG] | FULL_STRUCTURE | 54 | 0 | 0 | 3 | 884.7833 | C57H104O6 |
Normalized.Name | FA1.Position | FA1.C | FA1.OH | FA1.O | FA1.DB | FA1.Bond.Type | FA1.DB.Positions | FA2.Position | FA2.C | FA2.OH | FA2.O | FA2.DB | FA2.Bond.Type | FA2.DB.Positions | FA3.Position | FA3.C | FA3.OH | FA3.O | FA3.DB | FA3.Bond.Type | FA3.DB.Positions | FA4.Position | FA4.C | FA4.OH | FA4.O | FA4.DB | FA4.Bond.Type | FA4.DB.Positions | LCB.Position | LCB.C | LCB.OH | LCB.O | LCB.DB | LCB.Bond.Type | LCB.DB.Positions |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TG 16:1(5E)/18:0/20:2(3Z,6Z) | 1 | 16 | 0 | 0 | 1 | ESTER | [5E] | 2 | 18 | 0 | 0 | 0 | ESTER | [] | 3 | 20 | 0 | 0 | 2 | ESTER | [3Z, 6Z] | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
If you want to parse multiple lipid names, use the parseLipidNames
method with a vector of lipid names. This returns a data frame of properties of the parsed lipid names with one row per lipid.
NOTE: Omitting the grammar argument will test all available parsers, the first one to successfully parse the name wins. This will consequently take longer than explicitly setting the grammar to select the parser for.
multipleLipidNamesDf <- parseLipidNames(c("PC 32:1","LPC 34:1","TG(18:1_18:0_16:1)"))
Normalized.Name | Original.Name | Grammar | Message | Adduct | Adduct.Charge | Lipid.Maps.Category | Lipid.Maps.Main.Class | Species.Name | Extended.Species.Name | Molecular.Species.Name | Sn.Position.Name | Structure.Defined.Name | Full.Structure.Name | Functional.Class.Abbr | Functional.Class.Synonyms | Level | Total.C | Total.OH | Total.O | Total.DB | Mass | Sum.Formula |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PC 32:1 | PC 32:1 | Shorthand2020 | NA | NA | 0 | GP | PC | PC 32:1 | PC | NA | NA | NA | NA | [PC] | [PC, GPC, GPCho, PlsCho] | SPECIES | 32 | 0 | 0 | 1 | 731.5465 | C40H78NO8P |
LPC 34:1 | LPC 34:1 | Shorthand2020 | NA | NA | 0 | GP | LPC | LPC 34:1 | LPC | LPC 34:1 | NA | NA | NA | [LPC] | [LPC, LysoPC, lysoPC] | MOLECULAR_SPECIES | 34 | 0 | 0 | 1 | 745.5985 | C42H84NO7P |
TG 18:1_18:0_16:1 | TG(18:1_18:0_16:1) | LipidMaps | NA | NA | 0 | GL | TG | TG 52:2 | TG | TG 18:1_18:0_16:1 | NA | NA | NA | [TG] | [TG, TAG] | MOLECULAR_SPECIES | 52 | 0 | 0 | 2 | 858.7676 | C55H102O6 |
Normalized.Name | FA1.Position | FA1.C | FA1.OH | FA1.O | FA1.DB | FA1.Bond.Type | FA1.DB.Positions | FA2.Position | FA2.C | FA2.OH | FA2.O | FA2.DB | FA2.Bond.Type | FA2.DB.Positions | FA3.Position | FA3.C | FA3.OH | FA3.O | FA3.DB | FA3.Bond.Type | FA3.DB.Positions | FA4.Position | FA4.C | FA4.OH | FA4.O | FA4.DB | FA4.Bond.Type | FA4.DB.Positions | LCB.Position | LCB.C | LCB.OH | LCB.O | LCB.DB | LCB.Bond.Type | LCB.DB.Positions |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PC 32:1 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
LPC 34:1 | -1 | 34 | 0 | 0 | 1 | ESTER | [] | -1 | 0 | 0 | 0 | 0 | ESTER | [] | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
TG 18:1_18:0_16:1 | -1 | 18 | 0 | 0 | 1 | ESTER | [] | -1 | 18 | 0 | 0 | 0 | ESTER | [] | -1 | 16 | 0 | 0 | 1 | ESTER | [] | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Finally, if you want to parse multiple lipid names and want to use one particular grammar, simply add its name as the “grammar” argument.
originalNames <- c("PC 32:1","LPC 34:1","TAG 18:1_18:0_16:1")
multipleLipidNamesWithGrammar <- parseLipidNames(originalNames, grammar = "Goslin")
Normalized.Name | Original.Name | Grammar | Message | Adduct | Adduct.Charge | Lipid.Maps.Category | Lipid.Maps.Main.Class | Species.Name | Extended.Species.Name | Molecular.Species.Name | Sn.Position.Name | Structure.Defined.Name | Full.Structure.Name | Functional.Class.Abbr | Functional.Class.Synonyms | Level | Total.C | Total.OH | Total.O | Total.DB | Mass | Sum.Formula |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PC 32:1 | PC 32:1 | Goslin | NA | NA | 0 | GP | PC | PC 32:1 | PC | NA | NA | NA | NA | [PC] | [PC, GPC, GPCho, PlsCho] | SPECIES | 32 | 0 | 0 | 1 | 731.5465 | C40H78NO8P |
LPC 34:1 | LPC 34:1 | Goslin | NA | NA | 0 | GP | LPC | LPC 34:1 | LPC | LPC 34:1 | NA | NA | NA | [LPC] | [LPC, LysoPC, lysoPC] | MOLECULAR_SPECIES | 34 | 0 | 0 | 1 | 745.5985 | C42H84NO7P |
TG 18:1_18:0_16:1 | TAG 18:1_18:0_16:1 | Goslin | NA | NA | 0 | GL | TG | TG 52:2 | TG | TG 18:1_18:0_16:1 | NA | NA | NA | [TAG] | [TG, TAG] | MOLECULAR_SPECIES | 52 | 0 | 0 | 2 | 858.7676 | C55H102O6 |
Normalized.Name | FA1.Position | FA1.C | FA1.OH | FA1.O | FA1.DB | FA1.Bond.Type | FA1.DB.Positions | FA2.Position | FA2.C | FA2.OH | FA2.O | FA2.DB | FA2.Bond.Type | FA2.DB.Positions | FA3.Position | FA3.C | FA3.OH | FA3.O | FA3.DB | FA3.Bond.Type | FA3.DB.Positions | FA4.Position | FA4.C | FA4.OH | FA4.O | FA4.DB | FA4.Bond.Type | FA4.DB.Positions | LCB.Position | LCB.C | LCB.OH | LCB.O | LCB.DB | LCB.Bond.Type | LCB.DB.Positions |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PC 32:1 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
LPC 34:1 | -1 | 34 | 0 | 0 | 1 | ESTER | [] | -1 | 0 | 0 | 0 | 0 | ESTER | [] | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
TG 18:1_18:0_16:1 | -1 | 18 | 0 | 0 | 1 | ESTER | [] | -1 | 18 | 0 | 0 | 0 | ESTER | [] | -1 | 16 | 0 | 0 | 1 | ESTER | [] | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
LIPID MAPS has a number of fatty acids that use names following the IUPAC-IUB Fatty Acids nomenclature. These can now also be parsed and converted to the updated lipid shorthand nomenclature. We are using LMFA01020216 and LMFA08040030 as examples here:
originalNames <- c("LMFA01020216"="5-methyl-octadecanoic acid", "LMFA08040030"="N-((+/-)-8,9-dihydroxy-5Z,11Z,14Z-eicosatrienoyl)-ethanolamine")
normalizedFattyAcidsNames <- parseLipidNames(originalNames, "FattyAcids")
Normalized.Name | Original.Name | Grammar | Message | Lipid.Maps.Category | Lipid.Maps.Main.Class | Species.Name | Extended.Species.Name | Molecular.Species.Name | Sn.Position.Name | Structure.Defined.Name | Full.Structure.Name | Functional.Class.Abbr | Functional.Class.Synonyms | Level | Total.C | Total.OH | Total.O | Total.DB | Mass | Sum.Formula | FA1.Position | FA1.C | FA1.OH | FA1.O | FA1.DB | FA1.Bond.Type | FA1.DB.Positions |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
FA 18:0;5Me | 5-methyl-octadecanoic acid | FattyAcids | NA | FA | FA | FA 19:0 | FA | FA 19:0 | FA 19:0 | FA 18:0;Me | FA 18:0;5Me | [FA] | [FA, fatty acid] | FULL_STRUCTURE | 19 | 0 | 0 | 0 | 298.2872 | C19H38O2 | 1 | 18 | 0 | 0 | 0 | ESTER | [] |
NAE 20:3(5Z,11Z,14Z);8OH,9OH | N-((+/-)-8,9-dihydroxy-5Z,11Z,14Z-eicosatrienoyl)-ethanolamine | FattyAcids | NA | FA | NAE | NAE 20:3;O2 | NAE | NAE 20:3;O2 | NAE 20:3;O2 | NAE 20:3(5,11,14);(OH)2 | NAE 20:3(5Z,11Z,14Z);8OH,9OH | [NAE] | [NAE] | FULL_STRUCTURE | 20 | 2 | 2 | 3 | 381.2879 | C22H39NO4 | 1 | 20 | 2 | 2 | 3 | ESTER | [5Z, 11Z, 14Z] |
The Goslin parser also support reading of lipid shorthand names with adducts:
originalNames <- c("PC 32:1[M+H]1+", "PC 32:1 [M+H]+","PC 32:1")
lipidNamesWithAdduct <- parseLipidNames(originalNames, "Goslin")
This will populate the columns “Adduct” and “AdductCharge” with the respective values. Please note that we recommend to use the adduct and its charge in full IUPAC recommended nomenclature:
Normalized.Name | Original.Name | Grammar | Message | Adduct | Adduct.Charge | Lipid.Maps.Category | Lipid.Maps.Main.Class | Species.Name | Extended.Species.Name | Molecular.Species.Name | Sn.Position.Name | Structure.Defined.Name | Full.Structure.Name | Functional.Class.Abbr | Functional.Class.Synonyms | Level | Total.C | Total.OH | Total.O | Total.DB | Mass | Sum.Formula |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PC 32:1 | PC 32:1[M+H]1+ | Goslin | NA | [M+H]1+ | 1 | GP | PC | PC 32:1 | PC | NA | NA | NA | NA | [PC] | [PC, GPC, GPCho, PlsCho] | SPECIES | 32 | 0 | 0 | 1 | 732.5538 | C40H79NO8P |
PC 32:1 | PC 32:1 [M+H]+ | Goslin | NA | [M+H]1+ | 1 | GP | PC | PC 32:1 | PC | NA | NA | NA | NA | [PC] | [PC, GPC, GPCho, PlsCho] | SPECIES | 32 | 0 | 0 | 1 | 732.5538 | C40H79NO8P |
PC 32:1 | PC 32:1 | Goslin | NA | NA | 0 | GP | PC | PC 32:1 | PC | NA | NA | NA | NA | [PC] | [PC, GPC, GPCho, PlsCho] | SPECIES | 32 | 0 | 0 | 1 | 731.5465 | C40H78NO8P |
lipidr is a Bioconductor package with specific support for QC checking, uni- and multivariate analysis and visualization of lipidomics data acquired with Skyline or from metabolomics workbench. It uses a custom implementation for lipid name handling that does not yet support the updated shorthand nomenclature.
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("lipidr")
After installation of lipidr, we need to load the following libraries for this example:
library(rgoslin)
library(lipidr)
library(stringr)
library(ggplot2)
We will use lipidr’s workflow example to illustrate how to apply rgoslin in such a use-case. We read in an example dataset exported from Skyline based on a targeted lipidomics experiment:
datadir = system.file("extdata", package="lipidr")
filelist = list.files(datadir, "data.csv", full.names = TRUE) # all csv files
d = read_skyline(filelist)
#> Successfully read 3 methods.
#> Your data contain 58 samples, 10 lipid classes, 277 lipid molecules.
clinical_file = system.file("extdata", "clin.csv", package="lipidr")
d = add_sample_annotation(d, clinical_file)
This dataset contains the lipid names in the Molecule
column and the name preprocessed with lipidr
in the clean_name
column. In this excerpt, you can see PE 34:1 NEG
in row 5, which indicates measurement of this lipid in negative mode. Please note that this is specific to this example and not a generally applied naming convention.
filename | Molecule | Precursor.Mz | Precursor.Charge | Product.Mz | Product.Charge | clean_name | ambig | not_matched | istd | class_stub | chain1 | l_1 | s_1 | chain2 | l_2 | s_2 | chain3 | l_3 | s_3 | chain4 | l_4 | s_4 | total_cl | total_cs | Class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A1_data.csv | PE 32:0 | 692.5 | 1 | 551.5 | 1 | PE 32:0 | FALSE | FALSE | FALSE | PE | 32:0 | 32 | 0 | NA | NA | NA | NA | NA | NA | 32 | 0 | PE | |||
A1_data.csv | PE 32:1 | 690.5 | 1 | 549.5 | 1 | PE 32:1 | FALSE | FALSE | FALSE | PE | 32:1 | 32 | 1 | NA | NA | NA | NA | NA | NA | 32 | 1 | PE | |||
A1_data.csv | PE 32:2 | 688.5 | 1 | 547.5 | 1 | PE 32:2 | FALSE | FALSE | FALSE | PE | 32:2 | 32 | 2 | NA | NA | NA | NA | NA | NA | 32 | 2 | PE | |||
A1_data.csv | PE 34:1 | 718.5 | 1 | 577.5 | 1 | PE 34:1 | FALSE | FALSE | FALSE | PE | 34:1 | 34 | 1 | NA | NA | NA | NA | NA | NA | 34 | 1 | PE | |||
A1_data.csv | PE 34:1 NEG | 716.5 | 1 | 196.0 | 1 | PE 34:1 | FALSE | FALSE | FALSE | PE | 34:1 | 34 | 1 | NA | NA | NA | NA | NA | NA | 34 | 1 | PE | |||
A1_data.csv | PE 34:2 | 716.5 | 1 | 575.5 | 1 | PE 34:2 | FALSE | FALSE | FALSE | PE | 34:2 | 34 | 2 | NA | NA | NA | NA | NA | NA | 34 | 2 | PE | |||
A1_data.csv | PE 34:3 | 714.5 | 1 | 573.5 | 1 | PE 34:3 | FALSE | FALSE | FALSE | PE | 34:3 | 34 | 3 | NA | NA | NA | NA | NA | NA | 34 | 3 | PE | |||
A1_data.csv | PE 36:0 | 748.6 | 1 | 607.6 | 1 | PE 36:0 | FALSE | FALSE | FALSE | PE | 36:0 | 36 | 0 | NA | NA | NA | NA | NA | NA | 36 | 0 | PE | |||
A1_data.csv | PE 36:1 | 746.6 | 1 | 605.6 | 1 | PE 36:1 | FALSE | FALSE | FALSE | PE | 36:1 | 36 | 1 | NA | NA | NA | NA | NA | NA | 36 | 1 | PE | |||
A1_data.csv | PE 36:1 NEG | 744.6 | 1 | 196.0 | 1 | PE 36:1 | FALSE | FALSE | FALSE | PE | 36:1 | 36 | 1 | NA | NA | NA | NA | NA | NA | 36 | 1 | PE |
Now, let’s try to parse the clean lipid names to enrich the data table.
Note: In this case, we expect to see error messages, since some lipid names use a) unsupported head group names or b) unsupported suffixes to indicate isotopically labeled lipids.
lipidNames <- parseLipidNames(rowData(d)$clean_name)
#> Encountered an error while parsing 'So1P 17:1': Expecting a single string value: [type=character; extent=4].
#> Encountered an error while parsing 'So1P 18:1': Expecting a single string value: [type=character; extent=4].
We see that lipidr’s example dataset uses the (d9)
suffix to indicate isotopically labeled lipids (d=Deuterium) that are used as internal standards for quantification.
We need to convert Sa1P
, sphinganine-1-phosphate, to SPBP
, as well as So1P
, sphingosine-1-phosphate to align with recent nomenclature updates.
# lipidr stores original lipid names in the Molecule column
old_names <- rowData(d)$Molecule
# split lipid prefix from potential (d7) suffix for labeled internal standards
new_names <- rowData(d)$clean_name %>% str_match(pattern="([\\w :-]+)(\\(\\w+\\))?")
# extract the first match group (the original word is at column index 1)
normalized_new_names <- new_names[,2] %>% str_replace_all(c("Sa1P"="SPBP","So1P"="SPBP")) %>% parseLipidNames(.)
We will receive a number of warnings in the next step, since lipidr currently checks lipid class names against a predefined, internal list that does not contain the updated class names according to the new shorthand nomenclature. The updated Molecule names will however appear in downstream visualizations.
updated <- update_molecule_names(d, old_names, normalized_new_names$Normalized.Name)
#> Joining with `by = join_by(Molecule)`
#> Warning in annotate_lipids(updated_names$Molecule): Some lipid names couldn't be parsed because they don't follow the pattern 'CLS xx:x/yy:y'
#> PE O-34:0, PE O-34:1, PE O-34:2, PE O-36:0, PE O-36:1, PE O-36:2, PE O-36:3, PE O-36:4, PE O-36:5, PE O-38:0, PE O-38:1, PE O-38:2, PE O-38:3, PE O-38:4, PE O-38:5, PE O-40:5, PE O-40:6, PE O-40:7, PE O-32:1, PE O-34:3, PE O-38:6, PE O-38:7, PI O-34:1, SPBP 18:0;OH, SPBP 17:1;O2, SPBP 18:1;O2, SM 18:1;O2, SM 18:0;O2, Cer 18:0;O2, Cer 18:1;O2, Cer 18:2;O2, PC O-32:0, PC O-34:1, PC O-34:2, PC O-34:3, PC O-36:2, PC O-36:3, PC O-36:4, PC O-36:5, PC O-38:0, PC O-38:2, PC O-38:3, PC O-38:4, PC O-38:5, PC O-40:0, PC O-40:1, PC O-40:2, PC O-40:3, PC O-40:4, PC O-40:5, PC O-40:6, PC O-40:7, PC O-32:1, PC O-34:4, PC O-36:6, PC O-38:1, PC O-38:6, PC O-38:7
#> Joining with `by = join_by(Molecule)`
We can augment the class column using rgoslin’s LipidMaps main class, since row order in rgoslin’s output is the same as in its input. Additionally, further information may be interesting to include, such as the mass and sum formula of the uncharged lipids. We will select the same lipid classes as in the lipidr targeted lipidomics vignette: Ceramides (Cer), Lyso-Phosphatidylcholines (LPC) and Phosphatidylcholines (PC).
rowData(updated)$Class <- normalized_new_names$Lipid.Maps.Main.Class
rowData(updated)$Category <- normalized_new_names$Lipid.Maps.Category
rowData(updated)$Molecule <- normalized_new_names$Normalized.Name
rowData(updated)$LipidSpecies <- normalized_new_names$Species.Name
rowData(updated)$Mass <- normalized_new_names$Mass
rowData(updated)$SumFormula <- normalized_new_names$Sum.Formula
# select Ceramides, Lyso-Phosphatidylcholines and Phosphatidylcholines (includes plasmanyls and plasmenyls)
lipid_classes <- rowData(updated)$Class %in% c("Cer","LPC", "PC")
d <- updated[lipid_classes,]
In the next step, we use a non-exported method provided by lipidr to convert the row data into a format more suitable for plotting with ggplot. We will plot the area distribution of lipid species as boxplots, colored by lipid class and facetted by filename, similar to some plot examples in lipidr’s vignette.
ddf <- lipidr:::to_long_format(d)
ggplot(data=ddf, mapping=aes(x=Molecule, y=Area, fill=Class)) + geom_boxplot() + facet_wrap(~filename, scales = "free_y") + scale_y_log10() + coord_flip()
#> Warning in scale_y_log10(): log-10 transformation introduced infinite values.
#> Warning: Removed 15 rows containing non-finite outside the scale range
#> (`stat_boxplot()`).
In the same manner, other columns of lipidr’s rowData
can be updated with columns from rgoslin’s output data frame. Further downstream processing is then possible with lipidr’s own functions.
If you find any issues with the library, would like to have other functionality included, require help or would like to contribute, please contact us via our GitHub Project or via the LIFS support page.
#> R Under development (unstable) (2024-10-21 r87258)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] ggplot2_3.5.1 stringr_1.5.1
#> [3] lipidr_2.21.0 SummarizedExperiment_1.37.0
#> [5] Biobase_2.67.0 GenomicRanges_1.59.0
#> [7] GenomeInfoDb_1.43.0 IRanges_2.41.0
#> [9] S4Vectors_0.45.0 BiocGenerics_0.53.0
#> [11] MatrixGenerics_1.19.0 matrixStats_1.4.1
#> [13] kableExtra_1.4.0 knitr_1.48
#> [15] dplyr_1.1.4 rgoslin_1.11.0
#> [17] BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.1 viridisLite_0.4.2
#> [3] farver_2.1.2 fastmap_1.2.0
#> [5] digest_0.6.37 lifecycle_1.0.4
#> [7] statmod_1.5.0 magrittr_2.0.3
#> [9] compiler_4.5.0 rlang_1.1.4
#> [11] sass_0.4.9 tools_4.5.0
#> [13] utf8_1.2.4 yaml_2.3.10
#> [15] calibrate_1.7.7 data.table_1.16.2
#> [17] S4Arrays_1.7.0 DelayedArray_0.33.0
#> [19] xml2_1.3.6 ropls_1.39.0
#> [21] abind_1.4-8 norm_1.0-11.1
#> [23] purrr_1.0.2 withr_3.0.2
#> [25] grid_4.5.0 fansi_1.0.6
#> [27] colorspace_2.1-1 MASS_7.3-61
#> [29] scales_1.3.0 tinytex_0.53
#> [31] MultiAssayExperiment_1.33.0 cli_3.6.3
#> [33] mvtnorm_1.3-1 tmvtnorm_1.6
#> [35] rmarkdown_2.28 crayon_1.5.3
#> [37] generics_0.1.3 rstudioapi_0.17.1
#> [39] httr_1.4.7 cachem_1.1.0
#> [41] zlibbioc_1.53.0 impute_1.81.0
#> [43] BiocManager_1.30.25 XVector_0.47.0
#> [45] vctrs_0.6.5 Matrix_1.7-1
#> [47] sandwich_3.1-1 jsonlite_1.8.9
#> [49] bookdown_0.41 imputeLCMD_2.1
#> [51] qqman_0.1.9 magick_2.8.5
#> [53] systemfonts_1.1.0 limma_3.63.0
#> [55] tidyr_1.3.1 jquerylib_0.1.4
#> [57] MultiDataSet_1.35.0 glue_1.8.0
#> [59] stringi_1.8.4 gmm_1.8
#> [61] gtable_0.3.6 UCSC.utils_1.3.0
#> [63] munsell_0.5.1 tibble_3.2.1
#> [65] pillar_1.9.0 pcaMethods_1.99.0
#> [67] htmltools_0.5.8.1 GenomeInfoDbData_1.2.13
#> [69] R6_2.5.1 evaluate_1.0.1
#> [71] lattice_0.22-6 highr_0.11
#> [73] bslib_0.8.0 Rcpp_1.0.13
#> [75] svglite_2.1.3 SparseArray_1.7.0
#> [77] xfun_0.48 zoo_1.8-12
#> [79] forcats_1.0.0 pkgconfig_2.0.3