GitHub - Asa12138/ReporterScore: Generalized Reporter Score-based Functional Enrichment Method for Omics Data (original) (raw)

Read this in other languages: English,中文.

ReporterScore

Inspired by the classic ‘RSA’, we developed the improved ‘Generalized Reporter Score-based Analysis (GRSA)’ method, implemented in the R package ‘ReporterScore’, along with comprehensive visualization methods and pathway databases.

‘GRSA’ is a threshold-free method that works well with all types of biomedical features, such as genes, chemical compounds, and microbial species. Importantly, the ‘GRSA’ supports multi-group and longitudinal experimental designs, because of the included multi-group-compatible statistical methods.

The HTML documentation of the latest version is available at Github page.

Citation

To cite ReporterScore in publications use:

C. Peng, Q. Chen, S. Tan, X. Shen, C. Jiang, Generalized Reporter Score-based Enrichment Analysis for Omics Data. Briefings in Bioinformatics (2024). https://doi.org/10.1093/bib/bbae116.

Installation

You can install the released version of ReporterScore fromCRAN with:

install.packages("ReporterScore")

You can install the development version of ReporterScore fromGitHub with:

install.packages("devtools")

devtools::install_github("Asa12138/pcutils") devtools::install_github("Asa12138/ReporterScore")

Usage

1. Inputdata (Feature abundance table and metadata)

For transcriptomic, scRNA-seq, and related gene-based omics data of a specific species, a complete gene abundance table can be used.
For metagenomic and metatranscriptomic data, which involve many different species, a KO abundance table can be used, generated using Blast, Diamond, or KEGG official mapper software to align the reads or contigs to the KEGG or the EggNOG database
For metabolomic data, an annotated compound abundance table can be used, but the standardization of compound IDs (e.g., convert compound IDs to C numbers in the KEGG database) is required.

Format of abundance table:

⚠️Importantly, the input abundance table should not be prefiltered to retain the background information, as the ‘GRSA’ is a threshold-free method.

The rownames are feature ids (e.g. “K00001” (KEGG K number) if feature=“ko”; “PEX11A” (gene symbol) if feature=“gene”; “C00024” (KEGG C number) if feature=“compound”).
The colnames are samples.
The abundance value can be read counts or normalized values (e.g., TPM, FPKM, RPKM, or relative abundance, corresponds to suitable statistical test method).

An example code tailored for a KO abundance table is as follows:

data("KO_abundance_test") head(KO_abundance[, 1:6]) #> WT1 WT2 WT3 WT4 WT5 WT6 #> K03169 0.002653545 0.005096380 0.002033923 0.000722349 0.003468322 0.001483028 #> K07133 0.000308237 0.000280458 0.000596527 0.000859854 0.000308719 0.000878098 #> K03088 0.002147068 0.002030742 0.003797459 0.004161979 0.002076596 0.003091182 #> K03530 0.003788366 0.000239298 0.000445817 0.000557271 0.000222969 0.000529624 #> K06147 0.000785654 0.001213630 0.001312569 0.001662740 0.002387006 0.001725797 #> K05349 0.001816325 0.002813642 0.003274701 0.001089906 0.002371921 0.001795214

And you should also offer a experimental metadata:

Format of metadata table:

The rownames are samples, columns are groups.
The grouping variable can be categories (at least two categories, for differential abundance analysis).
The grouping variable can also be multiple time points (for longitudinal analysis).
The grouping variable can also be continuous (for correlation analysis).

head(metadata) #> Group Group2 #> WT1 WT G3 #> WT2 WT G3 #> WT3 WT G3 #> WT4 WT G3 #> WT5 WT G3 #> WT6 WT G1

⚠️Importantly, the rownames of metadata and the colnames of feature abundance table should be matching or partial matching!

The ReporterScore will automatically match the samples based on the rownames of metadata and the colnames of feature abundance table.

all(rownames(metadata) %in% colnames(KO_abundance))

TRUE

intersect(rownames(metadata), colnames(KO_abundance))>0

TRUE

2. Pathway databases

The ReporterScore package has built-in KEGG pathway, module, gene, compound, and GO databases and also allows customized databases, making it compatible with feature abundance tables from diverse omics data.

You can choose any of the following methods to load the database based on your analysis needs:

ReporterScore has built-in KEGG pathway-KO and module-KO databases (2023-08 version) for KO abundance table. You can useload_KOlist() to have a look and use update_KO_file() to update these databases (by KEGG API) as using latest database is very important.
ReporterScore has built-in KEGG pathway-compound and module-compound databases (2023-08 version) for compound abundance table. You can use load_CPDlist() to have a look and useupdate_KO_file() to update these databases (by KEGG API).
ReporterScore has built-in pathway-ko, pathway-gene, and pathway-compound databases of human (hsa) and mouse (mmu) for ko/gene/compound abundance table. You can usecustom_modulelist_from_org() to have a look. Useupdate_org_pathway() to update these databases and download other organism databases (by KEGG API).
ReporterScore has built-in GO-gene database, You can useload_GOlist() to have a look and use update_GOlist() to update these databases (by KEGG API).
You can just customize your own pathway databases (gene set of interest) by using custom_modulelist().

1. KEGG pathway-KO and module-KO databases

KOlist <- load_KOlist() head(KOlist$pathway)

2. KEGG pathway-compound and module-compound databases

CPDlist <- load_CPDlist() head(CPDlist$pathway)

3. human (hsa) pathway-ko/gene/compound databases

hsa_pathway_gene <- custom_modulelist_from_org( org = "hsa", feature = c("ko", "gene", "compound")[2] ) head(hsa_pathway_gene)

4. GO-gene database

GOlist <- load_GOlist() head(GOlist$BP)

5. customize your own pathway databases

?custom_modulelist()

3. One step GRSA

Use function GRSA or reporter_score can get the reporter score result by one step.

⚠️There are some important arguments for analysis:

mode: “mixed” or “directed” (directed mode only for two groups differential analysis or multi-groups correlation analysis.), see details in pvalue2zs.
method: the method of statistical test for calculating p-value. Default is wilcox.test:
- t.test (parametric) and wilcox.test (non-parametric). Perform comparison between two groups of samples.
- anova (parametric) and kruskal.test (non-parametric). Perform one-way ANOVA test or Kruskal-Wallis rank sum test comparing multiple groups.
- “pearson”, “kendall”, or “spearman” (correlation), see ?cor.
- “none”: use “none” for step by step enrichment. You can calculate the p-value by other methods like “DESeq2”, “Edger”, “Limma”, “ALDEX”, “ANCOM” yourself.
type: choose the built-in pathway database:
- ‘pathway’ or ‘module’ for default KEGG database for microbiome.
- ‘CC’, ‘MF’, ‘BP’, ‘ALL’ for default GO database for homo sapiens.
- org in listed in https://www.genome.jp/kegg/catalog/org_list.htmlsuch as ‘hsa’ (if your kodf is come from a specific organism, you should specify type here)
modulelist: customize database. A dataframe containing ‘id’,‘K_num’,‘KOs’,‘Description’ columns. Take the KOlist as example, use custom_modulelist to build a customize database.
feature: one of “ko”, “gene”, “compound”.

The first level will be set as the control group, you can change the factor level to change your comparison.

For example, we want to compare two groups ‘WT-OE’, and use the “directed” mode as we just want know which pathways are enriched or depleted in OE group:

KO-pathway

cat("Comparison: ", levels(factor(metadata$Group)), "\n") #> Comparison: WT OE

for microbiome!!!

reporter_res <- GRSA(KO_abundance, "Group", metadata, mode = "directed", method = "wilcox.test", perm = 999, type = "pathway", feature = "ko" ) #> ================================Use feature: ko================================= #> ===============================Checking rownames================================ #> Some of your ko_stat are not KO id, check the format! (e.g. K00001) #> 52.7% of your kos in the modulelist! #> 30 samples are matched for next step. #> ===========================Removing all-zero rows: 0============================ #> ===================================1.KO test==================================== #> =================================Checking group================================= #> 30 samples are matched for next step. #> ===========================Removing all-zero rows: 0============================ #> ==============================Calculating each KO=============================== #> ===========================Using method: wilcox.test============================ #> 1000 features done. #> 2000 features done. #> 3000 features done. #> 4000 features done. #> #> Compared groups: WT, OE #> Total KO number: 4535 #> Compare method: wilcox.test #> Time use: 1.179 #> =========================2.Transfer p.value to Z-score========================== #> ==========================3.Calculating reporter score========================== #> ==================================load KOlist=================================== #> ===================KOlist download time: 2023-08-14 16:00:52==================== #> If you want to update KOlist, use update_KO_file() #> ============================Calculating each pathway============================ #> 100 pathways done. #> 400 pathways done. #> ID number: 481 #> Time use: 1.689 #> ====================================All done====================================

Gene-pathway

When you use the gene abundance table of a specific species (e.g. human), remember to set the feature and type!!! Or give the database through modulelist.

data("genedf")

Method 1: Set the `feature` and `type`!

reporter_res_gene <- GRSA(genedf, "Group", metadata, mode = "directed", method = "wilcox.test", perm = 999, type = "hsa", feature = "gene" ) #> ===============================Use feature: gene================================ #> ===============================Checking rownames================================ #> please make sure your input table rows are gene symbol! #> 100% of your genes in the modulelist! #> 30 samples are matched for next step. #> ===========================Removing all-zero rows: 0============================ #> ===================================1.KO test==================================== #> =================================Checking group================================= #> 30 samples are matched for next step. #> ===========================Removing all-zero rows: 0============================ #> ==============================Calculating each KO=============================== #> ===========================Using method: wilcox.test============================ #> 1000 features done. #> #> Compared groups: WT, OE #> Total KO number: 1000 #> Compare method: wilcox.test #> Time use: 0.224 #> =========================2.Transfer p.value to Z-score========================== #> ==========================3.Calculating reporter score========================== #> ================================load hsa pathway================================ #> =================hsa pathway download time: 2023-08-14 23:28:13================= #> If you want to update hsa pathway, use update_org_pathway('hsa') #> please assgin this custom modulelist to reporter_score(modulelist=your_modulelist) to do a custom enrichment. #> You choose the feature: 'gene', make sure the rownames of your input table are right. #> ============================Calculating each pathway============================ #> 100 pathways done. #> 150 pathways done. #> 200 pathways done. #> 250 pathways done. #> 300 pathways done. #> ID number: 343 #> Time use: 1.558 #> ====================================All done====================================

Method 2: Give the database through `modulelist`, same to Method 1.

hsa_pathway_gene <- custom_modulelist_from_org( org = "hsa", feature = "gene" )

reporter_res_gene <- GRSA(genedf, "Group", metadata, mode = "directed", method = "wilcox.test", perm = 999, modulelist = hsa_pathway_gene )

library(patchwork) p1 <- plot_report_bar(reporter_res_gene, rs_threshold = 2)

Use `modify_description` to remove the suffix of pathway description

reporter_res_gene2 <- modify_description(reporter_res_gene, pattern = " - Homo sapiens (human)") p2 <- plot_report_bar(reporter_res_gene2, rs_threshold = 2)

Use `ggplot_translator` to translate pathway description

p3 <- pcutils::ggplot_translator(p2) #> Please set the font family to make the labels display well. #> see how_to_set_font_for_plot().

p1 / p2 / p3

Compound-pathway

When you use the compound abundance table, remember to set the featureand type!!! Or give the database through modulelist.

reporter_res_gene <- GRSA(chem_df, "Group", metadata, mode = "directed", method = "wilcox.test", perm = 999, type = "hsa", feature = "compound" )

Output

The result is a “reporter_score” object:

elements	description
kodf	your input KO_abundance table
ko_stat	ko statistics result contains p.value and z_score
reporter_s	the reporter score in each pathway
modulelist	default KOlist or customized modulelist dataframe
group	The comparison groups in your data
metadata	sample information dataframe contains group

The important result is reporter_res$reporter_s, which is a dataframe contains the reporter score in each pathway:

view data.frame in Rstudio

View(reporter_res$reporter_s)

export result as .csv and check using Excel:

export_report_table(reporter_res, dir_name = "~/Downloads/")

4. Visualization

After we get the reporter score result, we can visualize the result in various ways.

When we focus on the whole result:

Plot the most significantly enriched pathways:

You can set the rs_threshold to filter the pathways, the defaultrs_threshold is 1.64, which is be considered as significant at the 0.05 level.

View(reporter_res$reporter_s)

plot_report_bar(reporter_res, rs_threshold = c(-2.5, 2.5), facet_level = TRUE) #> ==============================load Pathway_htable=============================== #> ===============Pathway_htable download time: 2024-01-12 00:52:39================ #> If you want to update Pathway_htable, use update_htable(type='pathway')

⚠️In the directed mode, Enriched in one group means depleted in another group.

Plot the most significantly enriched pathways (circle packing):

plot_report_circle_packing(reporter_res, rs_threshold = c(-2.5, 2.5)) #> ==============================load Pathway_htable=============================== #> ===============Pathway_htable download time: 2024-01-12 00:52:39================ #> If you want to update Pathway_htable, use update_htable(type='pathway') #> Non-leaf weights ignored #> Scale for fill is already present. #> Adding another scale for fill, which will replace the existing scale.

When we focus on one pathway, e.g. “map00780”:

Plot boxes and lines

plot_KOs_in_pathway(reporter_res, map_id = "map00780")

Plot the distribution of Z-scores

plot_KOs_distribution(reporter_res, map_id = "map00780")

Plot as a network:

plot_KOs_network(reporter_res, map_id = c("map00780", "map00785", "map00900"), main = "", mark_module = TRUE )

Plot the KOs abundance in a pathway:

plot_KOs_box(reporter_res, map_id = "map00780", only_sig = TRUE) #> geom_smooth() using formula = 'y ~ x'

Plot the KOs abundance in a pathway (heatmap):

plot_KOs_heatmap(reporter_res, map_id = "map00780", only_sig = TRUE, heatmap_param = list(cutree_rows = 2) )

Plot the KEGG pathway map:

plot_KEGG_map(reporter_res, map_id = "map00780", color_var = "Z_score")

Example for multi-group or longitudinal

If our experimental design is more than two groups or longitudinal, we can choose multi-groups comparison (or correlation):

cat("Comparison: ", levels(factor(metadata$Group2))) #> Comparison: G1 G2 G3

reporter_res2 <- GRSA(KO_abundance, "Group2", metadata, mode = "directed", method = "spearman", perm = 999 ) #> ================================Use feature: ko================================= #> ===============================Checking rownames================================ #> Some of your ko_stat are not KO id, check the format! (e.g. K00001) #> 52.7% of your kos in the modulelist! #> 30 samples are matched for next step. #> ===========================Removing all-zero rows: 0============================ #> ===================================1.KO test==================================== #> =================================Checking group================================= #> 30 samples are matched for next step. #> ===========================Removing all-zero rows: 0============================ #> ==============================Calculating each KO=============================== #> =============================Using method: spearman============================= #> Using correlation analysis: spearman, the groups will be transform to numeric, note the factor feature of group. #> 1000 features done. #> 2000 features done. #> 3000 features done. #> 4000 features done. #> #> Compared groups: G1, G2, G3 #> Total KO number: 4535 #> Compare method: spearman #> Time use: 0.549 #> =========================2.Transfer p.value to Z-score========================== #> ==========================3.Calculating reporter score========================== #> ==================================load KOlist=================================== #> ===================KOlist download time: 2023-08-14 16:00:52==================== #> If you want to update KOlist, use update_KO_file() #> ============================Calculating each pathway============================ #> 100 pathways done. #> 400 pathways done. #> ID number: 481 #> Time use: 1.643 #> ====================================All done====================================

plot_KOs_in_pathway(reporter_res2, map_id = "map02060") + scale_y_log10()

Example for specified pattern

For example, groups “G1”, “G2”, and “G3” can be set as 1, 10, and 100 if an exponentially increasing trend is expected.

We use 1,5,1 to found pathways with the down-up-down pattern

reporter_res3 <- GRSA(KO_abundance, "Group2", metadata, mode = "directed", perm = 999, method = "pearson", pattern = c("G1" = 1, "G2" = 5, "G3" = 1) ) plot_report_bar(reporter_res3, rs_threshold = 3, show_ID = TRUE)

plot_KOs_in_pathway(reporter_res3, map_id = "map00860")

To explore potential patterns within the data, clustering methods, such as C-means clustering, can be used.

rsa_cm_res <- RSA_by_cm(KO_abundance, "Group2", metadata, method = "pearson", k_num = 3, perm = 999 )

show the patterns

plot_c_means(rsa_cm_res, filter_membership = 0.7)

plot_report_bar(rsa_cm_res, rs_threshold = 2.5, y_text_size = 10)

Details

Step by step

The one step function reporter_score/GRSA consists of three parts：

data("KO_abundance_test") ko_pvalue <- ko.test(KO_abundance, "Group", metadata, method = "wilcox.test") ko_stat <- pvalue2zs(ko_pvalue, mode = "directed") reporter_s1 <- get_reporter_score(ko_stat, perm = 499)

ko.test: this function help to calculate p-value for KO_abundance by various built-in methods such as differential analysis (t.test, wilcox.test, kruskal.test, anova) or correlation analysis (pearson, spearman, kendall). You can also calculate this p-value for KO_abundance by other methodslike “DESeq2”, “Edger”, “Limma”, “ALDEX”, “ANCOM” and do a p.adjust yourself, then skip ko.test step go to step2…
pvalue2zs: this function transfers p-value of KOs to Z-score (select mode: “mixed” or “directed”).
get_reporter_score this function calculate reporter score of each pathways in a specific database. You can use a custom database here.

Take the “Limma” as an example:

1-1. Calculate p-value by Limma

ko_pvalue <- ko.test(KO_abundance, "Group", metadata, method = "none")

ko_Limma_p <- pctax::diff_da(KO_abundance, group_df = metadata["Group"], method = "limma")

1-2. Replace the p-value in ko_pvalue, remember to match the KO_ids

ko_pvalue$p.value <- ko_Limma_p[match(ko_pvalue$KO_id, ko_Limma_p$tax), "pvalue"]

2. Use `pvalue2zs` to get Z-score

ko_stat <- pvalue2zs(ko_pvalue, mode = "directed")

3. Use `get_reporter_score` to get reporter score

reporter_s1 <- get_reporter_score(ko_stat, perm = 499)

4. Combine the result

reporter_res1 <- combine_rs_res(KO_abundance, "Group", metadata, ko_stat, reporter_s1)

Then the reporter_res1 can be used for visualization

Other commonly used enrichment methods

Category	Method	Tools	Notes
ORA	Hypergeometric test / Fisher’s exact test	DAVID (website) , clusterProfiler (R package)	The most common methods used in enrichment analysis. Selecting a list of genes is required.
FCS	Gene set enrichment analysis (GSEA)	GSEA (website)	GSEA creatively uses gene ranking, rather than selecting a list of genes, to identify statistically significant and concordant differences across gene sets.
FCS	Generalized reporter score-based analysis (GRSA/RSA)	ReporterScore (R package developed in this study)	Find significant metabolites (first report), pathways, and taxonomy based on the p-values for multi-omics data.
FCS	Significance Analysis of Function and Expression (SAFE)	safe (R package)	SAFE assesses the significance of gene categories by calculating both local and global statistics from gene expression data.
FCS	Gene Set Analysis (GSA)	GSA (R Package)	GSA was proposed as an improvement of GSEA, using the “maxmean” statistic instead of the weighted sign KS statistic.
FCS	Pathway Analysis with Down-weighting of Overlapping Genes (PADOG)	PADOG (R package)	PADOGA assumes that genes associated with fewer pathways have more significant effects than genes associated with more pathways.
FCS	Gene Set Variation Analysis (GSVA)	GSVA (R package)	A nonparametric, unsupervised method that transforms gene expression data into gene set scores for downstream differential pathway activity analysis.
PT	Topology-based pathway enrichment analysis (TPEA)	TPEA (R package)	Integrate topological properties and global upstream/downstream positions of genes in pathways.

Commonly used enrichment methods for omics data.

ReporterScore also provides other enrichment methods likeKO_fisher(fisher.test), KO_enrich(fisher.test, fromclusterProfiler), KO_gsea (GSEA, from clusterProfiler), KO_gsa(GSA, from GSA), KO_safe (SAFE, from safe), KO_padog (PADOG, from PADOG), KO_gsva (GSVA, from GSVA).

The input data is from reporter_score, and also supports custom databases, so you can easily compare the results of various enrichment methods and conduct a comprehensive analysis:

View(reporter_res2$reporter_s)

reporter_score

filter(reporter_res$reporter_s, abs(ReporterScore) > 1.64, p.adjust < 0.05) %>% pull(ID) -> RS

fisher

fisher_res <- KO_fisher(reporter_res) filter(fisher_res, p.adjust < 0.05) %>% pull(ID) -> Fisher

enricher

enrich_res <- KO_enrich(reporter_res) filter(enrich_res, p.adjust < 0.05) %>% pull(ID) -> clusterProfiler

GESA

set.seed(1234) gsea_res <- KO_gsea(reporter_res, weight = "Z_score") #> Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (57.15% of the list). #> The order of those tied genes will be arbitrary, which may produce unexpected results. #> Warning in fgseaMultilevel(pathways = pathways, stats = stats, minSize = #> minSize, : For some pathways, in reality P-values are less than 1e-10. You can #> set the eps argument to zero for better estimation. filter(data.frame(gsea_res), p.adjust < 0.05) %>% pull(ID) -> GSEA

venn_res <- list(GRSA = RS, Fisher = Fisher, CP = clusterProfiler, GSEA = GSEA) library(pcutils) venn(venn_res, "network")

Other features

uplevel the KOs

KEGG BRITE is a collection of hierarchical classification systems capturing functional hierarchies of various biological objects, especially those represented as KEGG objects.

We collected k00001 KEGG Orthology (KO) table so that you can summaries each levels abundance. Use load_KO_htable to get KO_htable and useupdate_KO_htable to update. Use up_level_KO can upgrade to specific level in one of “pathway”, “module”, “level1”, “level2”, “level3”, “module1”, “module2”, “module3”.

KO_htable <- load_KO_htable() #> =================================load KO_htable================================= #> ==================KO_htable download time: 2024-01-12 00:49:03================== #> If you want to update KO_htable, use update_htable(type='ko') head(KO_htable) #> level1_name level2_name level3_id level3_name #> 1 Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis #> 2 Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis #> 3 Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis #> 4 Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis #> 5 Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis #> 6 Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis #> KO_id KO_name #> 1 K00844 HK; hexokinase [EC:2.7.1.1] #> 2 K12407 GCK; glucokinase [EC:2.7.1.2] #> 3 K00845 glk; glucokinase [EC:2.7.1.2] #> 4 K25026 glk; glucokinase [EC:2.7.1.2] #> 5 K01810 GPI, pgi; glucose-6-phosphate isomerase [EC:5.3.1.9] #> 6 K06859 pgi1; glucose-6-phosphate isomerase, archaeal [EC:5.3.1.9] plot_htable(type = "ko") #> =================================load KO_htable================================= #> ==================KO_htable download time: 2024-01-12 00:49:03================== #> If you want to update KO_htable, use update_htable(type='ko') #> Warning: Vectorized input to element_text() is not officially supported. #> ℹ Results may be unexpected or may change in future versions of ggplot2.

KO_level1 <- up_level_KO(KO_abundance, level = "level1", show_name = TRUE) #> =================================load KO_htable================================= #> ==================KO_htable download time: 2024-01-12 00:49:03================== #> If you want to update KO_htable, use update_htable(type='ko') pcutils::stackplot(KO_level1[-which(rownames(KO_level1) == "Unknown"), ]) + ggsci::scale_fill_d3() + theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

CARD for ARGs

For convenience, I also included the CARD database fromhttps://card.mcmaster.ca/download/0/broadstreet-v4.0.0.tar.bz2 .

CARDinfo <- load_CARDinfo() #> =================================load CARDinfo================================== #> ==================CARDinfo download time: 2024-01-12 01:12:11=================== #> If you want to update CARDinfo, use update_GOlist() head(CARDinfo$ARO_index) #> ARO Accession CVTERM ID Model Sequence ID Model ID #> 3005099 ARO:3005099 43314 6143 3831 #> 3002523 ARO:3002523 38923 8144 1781 #> 3002524 ARO:3002524 38924 85 746 #> 3002525 ARO:3002525 38925 4719 1246 #> 3002526 ARO:3002526 38926 228 1415 #> 3002527 ARO:3002527 38927 5510 2832 #> Model Name #> 3005099 23S rRNA (adenine(2058)-N(6))-methyltransferase Erm(A) #> 3002523 AAC(2')-Ia #> 3002524 AAC(2')-Ib #> 3002525 AAC(2')-Ic #> 3002526 AAC(2')-Id #> 3002527 AAC(2')-Ie #> ARO Name #> 3005099 23S rRNA (adenine(2058)-N(6))-methyltransferase Erm(A) #> 3002523 AAC(2')-Ia #> 3002524 AAC(2')-Ib #> 3002525 AAC(2')-Ic #> 3002526 AAC(2')-Id #> 3002527 AAC(2')-Ie #> Protein Accession DNA Accession AMR Gene Family #> 3005099 AAB60941.1 AF002716.1 Erm 23S ribosomal RNA methyltransferase #> 3002523 AAA03550.1 L06156.2 AAC(2') #> 3002524 AAC44793.1 U41471.1 AAC(2') #> 3002525 CCP42991.1 AL123456.3 AAC(2') #> 3002526 AAB41701.1 U72743.1 AAC(2') #> 3002527 CAC32082.1 AL583926.1 AAC(2') #> Drug Class #> 3005099 lincosamide antibiotic;macrolide antibiotic;streptogramin antibiotic #> 3002523 aminoglycoside antibiotic #> 3002524 aminoglycoside antibiotic #> 3002525 aminoglycoside antibiotic #> 3002526 aminoglycoside antibiotic #> 3002527 aminoglycoside antibiotic #> Resistance Mechanism CARD Short Name #> 3005099 antibiotic target alteration Spyo_ErmA_MLSb #> 3002523 antibiotic inactivation AAC(2')-Ia #> 3002524 antibiotic inactivation AAC(2')-Ib #> 3002525 antibiotic inactivation AAC(2')-Ic #> 3002526 antibiotic inactivation AAC(2')-Id #> 3002527 antibiotic inactivation AAC(2')-Ie

CAZy for CAZymes

For convenience, I also included the CAZy database fromhttps://bcb.unl.edu/dbCAN2/download/Databases/V12/ .

CAZy_info <- load_CAZy_info() #> =================================load CAZy_info================================= #> ==================CAZy_info download time: 2025-03-14 01:10:20================== #> If you want to update CAZy_info, use update_CAZy_info() head(CAZy_info$fam_subfam_ec) #> Sub_family Accession EC #> 1 AA1_1 AAA33103.1 1.10.3.2 #> 2 AA1_1 AAA33104.1 1.10.3.2 #> 3 AA1_1 AAA86659.1 1.10.3.2 #> 4 AA1_1 AAC41686.1 1.10.3.2 #> 5 AA1_1 AAC41687.1 1.10.3.2 #> 6 AA1_1 AAC49828.1 1.10.3.2

Enzymes

For convenience, I also included the Enzymes database fromhttps://ftp.expasy.org/databases/enzyme/ .

Enzyme_info <- load_Enzyme_info() #> ================================load Enzyme_info================================ #> =================Enzyme_info download time: 2025-03-24 14:51:45================= #> If you want to update Enzyme_info, use update_Enzyme_info() head(Enzyme_info$enzclass_df) #> EC Description #> 1 1.-.-.- Oxidoreductases #> 2 1.1.-.- Acting on the CH-OH group of donors #> 3 1.1.1.- With NAD(+) or NADP(+) as acceptor #> 4 1.1.2.- With a cytochrome as acceptor #> 5 1.1.3.- With oxygen as acceptor #> 6 1.1.4.- With a disulfide as acceptor

Or use the KEGG enzyme htable:

load_Enzyme_htable() -> Enzyme_htable library(pctax) ann_tree(distinct(Enzyme_htable[, 1:3])) -> Enzyme_tr easy_tree(Enzyme_tr, highlight = "Enzyme1_name", add_abundance = F, add_tiplab = F)

Reference

Patil, K. R. & Nielsen, J. Uncovering transcriptional regulation of metabolism by using metabolic network topology. Proc Natl Acad Sci U S A 102, 2685–2689 (2005).
L. Liu, R. Zhu, D. Wu, Misuse of reporter score in microbial enrichment analysis. iMeta. 2, e95 (2023).
https://github.com/wangpeng407/ReporterScore

GitHub - Asa12138/ReporterScore: Generalized Reporter Score-based Functional Enrichment Method for Omics Data (original) (raw)

ReporterScore

Citation

Installation

install.packages("devtools")

Usage

1. Inputdata (Feature abundance table and metadata)

Format of abundance table:

Format of metadata table:

TRUE

TRUE

2. Pathway databases

1. KEGG pathway-KO and module-KO databases

2. KEGG pathway-compound and module-compound databases

3. human (hsa) pathway-ko/gene/compound databases

4. GO-gene database

5. customize your own pathway databases

3. One step GRSA

KO-pathway

for microbiome!!!

Gene-pathway

Method 1: Set the feature and type!

Method 2: Give the database through modulelist, same to Method 1.

Use modify_description to remove the suffix of pathway description

Use ggplot_translator to translate pathway description

Compound-pathway

Output

view data.frame in Rstudio

export result as .csv and check using Excel:

4. Visualization

View(reporter_res$reporter_s)

Example for multi-group or longitudinal

Example for specified pattern

show the patterns

Details

Step by step

1-1. Calculate p-value by Limma

1-2. Replace the p-value in ko_pvalue, remember to match the KO_ids

2. Use pvalue2zs to get Z-score

3. Use get_reporter_score to get reporter score

4. Combine the result

Then the reporter_res1 can be used for visualization

Other commonly used enrichment methods

View(reporter_res2$reporter_s)

reporter_score

fisher

enricher

GESA

Other features

uplevel the KOs

CARD for ARGs

CAZy for CAZymes

Enzymes

Reference

Method 1: Set the `feature` and `type`!

Method 2: Give the database through `modulelist`, same to Method 1.

Use `modify_description` to remove the suffix of pathway description

Use `ggplot_translator` to translate pathway description

2. Use `pvalue2zs` to get Z-score

3. Use `get_reporter_score` to get reporter score