Title: | microbiome biomarker analysis toolkit |
---|---|
Description: | To date, a number of methods have been developed for microbiome marker discovery based on metagenomic profiles, e.g. LEfSe. However, all of these methods have its own advantages and disadvantages, and none of them is considered standard or universal. Moreover, different programs or softwares may be development using different programming languages, even in different operating systems. Here, we have developed an all-in-one R package microbiomeMarker that integrates commonly used differential analysis methods as well as three machine learning-based approaches, including Logistic regression, Random forest, and Support vector machine, to facilitate the identification of microbiome markers. |
Authors: | Yang Cao [aut, cre] |
Maintainer: | Yang Cao <[email protected]> |
License: | GPL-3 |
Version: | 1.9.0 |
Built: | 2024-11-09 02:40:29 UTC |
Source: | https://github.com/yiluheihei/microbiomeMarker |
marker_table
objectOperators acting on marker_table
to extract parts.
## S4 method for signature 'marker_table,ANY,ANY,ANY' x[i, j, ..., drop = TRUE]
## S4 method for signature 'marker_table,ANY,ANY,ANY' x[i, j, ..., drop = TRUE]
x |
a |
i , j
|
elements to extract. |
... |
see |
drop |
ignored now. |
a marker_table
object.
Extract taxa abundances from phyloseq objects.
abundances(object, transform = c("identity", "log10", "log10p"), norm = FALSE) ## S4 method for signature 'otu_table' abundances(object, transform = c("identity", "log10", "log10p"), norm = FALSE) ## S4 method for signature 'phyloseq' abundances(object, transform = c("identity", "log10", "log10p"), norm = FALSE) ## S4 method for signature 'microbiomeMarker' abundances(object, transform = c("identity", "log10", "log10p"))
abundances(object, transform = c("identity", "log10", "log10p"), norm = FALSE) ## S4 method for signature 'otu_table' abundances(object, transform = c("identity", "log10", "log10p"), norm = FALSE) ## S4 method for signature 'phyloseq' abundances(object, transform = c("identity", "log10", "log10p"), norm = FALSE) ## S4 method for signature 'microbiomeMarker' abundances(object, transform = c("identity", "log10", "log10p"))
object |
|
transform |
transformation to apply, the options inclulde:
|
norm |
logical, indicating whether or not to return the normalized taxa abundances. |
abundance matrix with taxa in rows and samples in columns.
otu_table
, phyloseq
,
microbiomeMarker
,transform_abundances
data(caporaso) abd <- abundances(caporaso)
data(caporaso) abd <- abundances(caporaso)
Summarize phyloseq data into a higher phylogenetic level.
aggregate_taxa(x, level, verbose = FALSE)
aggregate_taxa(x, level, verbose = FALSE)
x |
|
level |
Summarization level (from |
verbose |
verbose |
This provides a convenient way to aggregate phyloseq OTUs (or other taxa) when the phylogenetic tree is missing. Calculates the sum of OTU abundances over all OTUs that map to the same higher-level group. Removes ambiguous levels from the taxonomy table. Returns a phyloseq object with the summarized abundances.
Summarized phyloseq object
Contact: Leo Lahti [email protected]
See citation('microbiome')
data(caporaso) caporaso_phylum <- aggregate_taxa(caporaso, "Phylum")
data(caporaso) caporaso_phylum <- aggregate_taxa(caporaso, "Phylum")
Assign a new OTU table in microbiomeMarker object
## S4 replacement method for signature 'microbiomeMarker,otu_table' otu_table(x) <- value ## S4 replacement method for signature 'microbiomeMarker,phyloseq' otu_table(x) <- value ## S4 replacement method for signature 'microbiomeMarker,microbiomeMarker' otu_table(x) <- value
## S4 replacement method for signature 'microbiomeMarker,otu_table' otu_table(x) <- value ## S4 replacement method for signature 'microbiomeMarker,phyloseq' otu_table(x) <- value ## S4 replacement method for signature 'microbiomeMarker,microbiomeMarker' otu_table(x) <- value
x |
|
value |
a microbiomeMarker
object.
Calculating power, false discovery rates, false positive rates and auc ( area under the receiver operating characteristic (ROC) curve) for various DA methods.
compare_DA( ps, group, taxa_rank = "none", methods, args = list(), n_rep = 20, effect_size = 5, k = NULL, relative = TRUE, BPPARAM = BiocParallel::SnowParam(progressbar = TRUE) )
compare_DA( ps, group, taxa_rank = "none", methods, args = list(), n_rep = 20, effect_size = 5, k = NULL, relative = TRUE, BPPARAM = BiocParallel::SnowParam(progressbar = TRUE) )
ps , group , taxa_rank
|
main arguments of all differential analysis
methods. |
methods |
character vector, differential analysis methods to be compared, available methods are "aldex", "ancom", "ancombc", "deseq2", "edger", "lefse", "limma_voom", "metagenomeseq", "simple_stat". |
args |
named list, which used to set the extra arguments of the
differential analysis methods, so the names must be contained in |
n_rep |
integer, number of times to run the differential analyses. |
effect_size |
numeric, the effect size for the spike-ins. Default 5. |
k |
numeric vector of length 3, number of features to spike in each
tertile (lower, mid, upper), e.g. |
relative |
logical, whether rescale the total number of individuals
observed for each sample to the original level after spike-in. Default
|
BPPARAM |
|
To make this function support for different arguments for a certain DA method
args
allows list of list of list e.g. args = list(lefse = list(list(norm = "CPM"), list(norm = "TSS")))
, which specify to compare the different norm
arguments for lefse analysis.
For taxa_rank
, only taxa_rank = "none"
is supported, if this argument is
not "none", it will be forced to "none" internally.
an compareDA
object, which contains a two-length list of:
metrics
: data.frame
, FPR, AUC and spike detection rate for each run.
mm
: differential analysis results.
Confounding variables may mask the actual differential features. This function utilizes constrained correspondence analysis (CCA) to measure the confounding factors.
confounder( ps, target_var, norm = "none", confounders = NULL, permutations = 999, ... )
confounder( ps, target_var, norm = "none", confounders = NULL, permutations = 999, ... )
ps |
a |
target_var |
character, the variable of interest |
norm |
norm the methods used to normalize the microbial abundance data. See
|
confounders |
the confounding variables to be measured, if |
permutations |
the number of permutations, see |
... |
extra arguments passed to |
a data.frame
contains three variables: confounder,
pseudo-F and p value.
data(caporaso) confounder(caporaso, "SampleType", confounders = "ReportedAntibioticUsage")
data(caporaso) confounder(caporaso, "SampleType", confounders = "ReportedAntibioticUsage")
16S read counts and phylogenetic tree file of 34 Illumina samples derived from Moving Pictures of the Human Microbiome (Caporaso et al.) Group label: gut, left palm, right palm, and tongue - indicating different sampled body sites.
a phyloseq::phyloseq object
Yang Cao
Data was downloaded from https://www.microbiomeanalyst.ca
Caporaso, et al. Moving pictures of the human microbiome. Genome Biol 12, R50 (2011).
https://doi.org/10.1186/gb-2011-12-5-r50
Data from a cohort of 94 Bone Marrow Transplant patients previously published on in CID
a phyloseq::phyloseq object
Yang Cao
https://github.com/ying14/yingtools2/tree/master/data
Ying, et al. Intestinal Domination and the Risk of Bacteremia in Patients Undergoing Allogeneic Hematopoietic Stem Cell Transplantation, Clinical Infectious Diseases, Volume 55, Issue 7, 1 October 2012, Pages 905–914,
https://academic.oup.com/cid/article/55/7/905/428203
The data from a subset of the Early Childhood Antibiotics and the Microbiome (ECAM) study, which tracked the microbiome composition and development of 43 infants in the United States from birth to 2 years of age, identifying microbiome associations with antibiotic exposure, delivery mode, and diet.
a phyloseq::phyloseq
object
Bokulich, Nicholas A., et al. "Antibiotics, birth mode, and diet shape microbiome maturation during early life." Science translational medicine 8.343 (2016): 343ra82-343ra82.
https://github.com/FrederickHuangLin/ANCOM/tree/master/data
The data contains 22 European metagenomes from Danish, French, Italian, and Spanish individuals, and 13 Japanese and 4 American.
a phyloseq::phyloseq
object
Yang Cao
Arumugam, Manimozhiyan, et al. Enterotypes of the human gut microbiome. nature 473.7346 (2011): 174-180.
The data from a study on colorectal cancer. Samples that had no DIAGNOSIS
attribute assigned and with less than 500 reads (counts) were removed, and
191 samples remains (91 healthy and 86 Tumors).
a phyloseq::phyloseq
object
Yang Cao
Kostic et al. Genomic analysis identifies association of Fusobacterium with colorectal carcinoma. Genome research, 2012, 22(2), 292-298.
A small subset of the HMP 16S dataset for finding biomarkers characterizing different level of oxygen availability in different bodysites
a phyloseq::phyloseq
object
Yang Cao
http://huttenhower.sph.harvard.edu/webfm_send/129
43 pediatric IBD stool samples obtained from the Integrative Human Microbiome Project Consortium (iHMP). Group label: CD and Controls.
a phyloseq::phyloseq
object
Yang Cao
https://www.microbiomeanalyst.ca/MicrobiomeAnalyst/resources
The dataset contains 30 abundance profiles (obtained processing the 16S reads with RDP) belonging to 10 rag2 (control) and 20 truc (case) mice.
a phyloseq::phyloseq
object
Yang Cao
http://www.huttenhower.org/webfm_send/73
This function extracts the results of posthoc test.
extract_posthoc_res(object, features = NULL)
extract_posthoc_res(object, features = NULL)
object |
a |
features |
either |
a IRanges::SimpleDFrameList
object.
require(IRanges) pht <- postHocTest( result = DataFrameList( featureA = DataFrame( comparisons = c("group2-group1", "group3-group1", "group3-group2"), diff_mean = runif(3), pvalue = rep(0.01, 3), ci_lower = rep(0.01, 3), ci_upper = rep(0.011, 3) ), featureB = DataFrame( comparisons = c("group2-group1", "group3-group1", "group3-group2"), diff_mean = runif(3), pvalue = rep(0.01, 3), ci_lower = rep(0.01, 3), ci_upper = rep(0.011, 3) ) ), abundance = data.frame( featureA = runif(3), featureB = runif(3), group = c("group1", "group2", "grou3") ) ) extract_posthoc_res(pht, "featureA")[[1]]
require(IRanges) pht <- postHocTest( result = DataFrameList( featureA = DataFrame( comparisons = c("group2-group1", "group3-group1", "group3-group2"), diff_mean = runif(3), pvalue = rep(0.01, 3), ci_lower = rep(0.01, 3), ci_upper = rep(0.011, 3) ), featureB = DataFrame( comparisons = c("group2-group1", "group3-group1", "group3-group2"), diff_mean = runif(3), pvalue = rep(0.01, 3), ci_lower = rep(0.01, 3), ci_upper = rep(0.011, 3) ) ), abundance = data.frame( featureA = runif(3), featureB = runif(3), group = c("group1", "group2", "grou3") ) ) extract_posthoc_res(pht, "featureA")[[1]]
Import the output of dada2 into phyloseq object
import_dada2( seq_tab, tax_tab = NULL, sam_tab = NULL, phy_tree = NULL, keep_taxa_rows = TRUE )
import_dada2( seq_tab, tax_tab = NULL, sam_tab = NULL, phy_tree = NULL, keep_taxa_rows = TRUE )
seq_tab |
matrix-like, ASV table, the output of
|
tax_tab |
matrix, taxonomy table, the output of
|
sam_tab |
data.frame or |
phy_tree |
|
keep_taxa_rows |
logical, whether keep taxa in rows or not in the
|
The output of the dada2 pipeline is a feature table of amplicon sequence variants (an ASV table): A matrix with rows corresponding to samples and columns to ASVs, in which the value of each entry is the number of times that ASV was observed in that sample. This table is analogous to the traditional OTU table. Conveniently, taxa names are saved as ASV1, ASV2, ..., in the returned phyloseq object.
phyloseq::phyloseq
object hold the taxonomy info,
sample metadata, number of reads per ASV.
seq_tab <- readRDS(system.file("extdata", "dada2_seqtab.rds", package = "microbiomeMarker" )) tax_tab <- readRDS(system.file("extdata", "dada2_taxtab.rds", package = "microbiomeMarker" )) sam_tab <- read.table(system.file("extdata", "dada2_samdata.txt", package = "microbiomeMarker" ), sep = "\t", header = TRUE, row.names = 1) ps <- import_dada2(seq_tab = seq_tab, tax_tab = tax_tab, sam_tab = sam_tab) ps
seq_tab <- readRDS(system.file("extdata", "dada2_seqtab.rds", package = "microbiomeMarker" )) tax_tab <- readRDS(system.file("extdata", "dada2_taxtab.rds", package = "microbiomeMarker" )) sam_tab <- read.table(system.file("extdata", "dada2_samdata.txt", package = "microbiomeMarker" ), sep = "\t", header = TRUE, row.names = 1) ps <- import_dada2(seq_tab = seq_tab, tax_tab = tax_tab, sam_tab = sam_tab) ps
Import the output of picrust2 into phyloseq object
import_picrust2( feature_tab, sam_tab = NULL, trait = c("PATHWAY", "COG", "EC", "KO", "PFAM", "TIGRFAM", "PHENO") )
import_picrust2( feature_tab, sam_tab = NULL, trait = c("PATHWAY", "COG", "EC", "KO", "PFAM", "TIGRFAM", "PHENO") )
feature_tab |
character, file path of the prediction abundance table of functional feature. |
sam_tab |
character, file path of the sample meta data. |
trait |
character, options are picrust2 function traits (including "COG", "EC", "KO", "PFAM", "TIGRFAM", and "PHENO") and "PATHWAY". |
PICRUSt2 is a software for predicting abundances of functional profiles based on marker gene sequencing data. The functional profiles can be predicted from the taxonomic profiles using PICRUSt2. "Function" usually refers to gene families such as KEGG orthologs and Enzyme Classification numbers, but predictions can be made for any arbitrary trait.
In the phyloseq object
, the predicted function abundance profile is stored
in otu_table
slot. And the functional trait is saved in tax_table
slot,
if the descriptions of function features is not added to the predicted table,
tax_table
will have only one rank Picrust_trait
to represent the function
feature id, or if the desciptions are added one more rank
Picrust_description
will be added to represent the description of
function feature.
phyloseq::phyloseq
object.
sam_tab <- system.file( "extdata", "picrust2_metadata.tsv", package = "microbiomeMarker") feature_tab <- system.file( "extdata", "path_abun_unstrat_descrip.tsv.gz", package = "microbiomeMarker") ps <- import_picrust2(feature_tab, sam_tab, trait = "PATHWAY") ps
sam_tab <- system.file( "extdata", "picrust2_metadata.tsv", package = "microbiomeMarker") feature_tab <- system.file( "extdata", "path_abun_unstrat_descrip.tsv.gz", package = "microbiomeMarker") ps <- import_picrust2(feature_tab, sam_tab, trait = "PATHWAY") ps
Import the qiime2 artifacts, including feature table, taxonomic table, phylogenetic tree, representative sequence and sample metadata into phyloseq object.
import_qiime2( otu_qza, taxa_qza = NULL, sam_tab = NULL, refseq_qza = NULL, tree_qza = NULL )
import_qiime2( otu_qza, taxa_qza = NULL, sam_tab = NULL, refseq_qza = NULL, tree_qza = NULL )
otu_qza |
character, file path of the feature table from qiime2. |
taxa_qza |
character, file path of the taxonomic table from qiime2,
default |
sam_tab |
character, file path of the sample metadata in tsv format,
default |
refseq_qza |
character, file path of the representative sequences from
qiime2, default |
tree_qza |
character, file path of the phylogenetic tree from
qiime2, default |
phyloseq::phyloseq
object.
otuqza_file <- system.file( "extdata", "table.qza", package = "microbiomeMarker" ) taxaqza_file <- system.file( "extdata", "taxonomy.qza", package = "microbiomeMarker" ) sample_file <- system.file( "extdata", "sample-metadata.tsv", package = "microbiomeMarker" ) treeqza_file <- system.file( "extdata", "tree.qza", package = "microbiomeMarker" ) ps <- import_qiime2( otu_qza = otuqza_file, taxa_qza = taxaqza_file, sam_tab = sample_file, tree_qza = treeqza_file ) ps
otuqza_file <- system.file( "extdata", "table.qza", package = "microbiomeMarker" ) taxaqza_file <- system.file( "extdata", "taxonomy.qza", package = "microbiomeMarker" ) sample_file <- system.file( "extdata", "sample-metadata.tsv", package = "microbiomeMarker" ) treeqza_file <- system.file( "extdata", "tree.qza", package = "microbiomeMarker" ) ps <- import_qiime2( otu_qza = otuqza_file, taxa_qza = taxaqza_file, sam_tab = sample_file, tree_qza = treeqza_file ) ps
This is the recommended function for both building and accessing microbiome
marker table (marker_table
).
marker_table(object) ## S4 method for signature 'data.frame' marker_table(object) ## S4 method for signature 'microbiomeMarker' marker_table(object)
marker_table(object) ## S4 method for signature 'data.frame' marker_table(object) ## S4 method for signature 'microbiomeMarker' marker_table(object)
object |
an object among the set of classes defined by the
microbiomeMarker package that contain |
a marker_table
object.
data(enterotypes_arumugam) mm <- run_limma_voom( enterotypes_arumugam, "Enterotype", contrast = c("Enterotype 3", "Enterotype 2"), pvalue_cutoff = 0.05, p_adjust = "fdr" ) marker_table(mm)
data(enterotypes_arumugam) mm <- run_limma_voom( enterotypes_arumugam, "Enterotype", contrast = c("Enterotype 3", "Enterotype 2"), pvalue_cutoff = 0.05, p_adjust = "fdr" ) marker_table(mm)
This Class is inherit from data.frame
. Rows represent the microbiome
markers and variables represents feature of the marker.
names,row.names
a character vector, inherited from the input data.frame
.data
a list, each element corresponding the each column of the input data.frame
.S3Class
character, the S3 class marker_table
inherited from:
"data.frame
"
Yang Cao
object
This function replace the marker_table
slot of object
with value
.
marker_table(object) <- value
marker_table(object) <- value
object |
a |
value |
new value to replace the |
a microbiomeMarker
object.
data(enterotypes_arumugam) mm <- run_limma_voom( enterotypes_arumugam, "Enterotype", contrast = c("Enterotype 3", "Enterotype 2"), pvalue_cutoff = 0.1, p_adjust = "fdr" ) mm_marker <- marker_table(mm) mm_marker marker_table(mm) <- mm_marker[1:2, ] marker_table(mm)
data(enterotypes_arumugam) mm <- run_limma_voom( enterotypes_arumugam, "Enterotype", contrast = c("Enterotype 3", "Enterotype 2"), pvalue_cutoff = 0.1, p_adjust = "fdr" ) mm_marker <- marker_table(mm) mm_marker marker_table(mm) <- mm_marker[1:2, ] marker_table(mm)
This the constructor to build the microbiomeMarker
object, don't
use the new()
constructor.
microbiomeMarker( marker_table = NULL, norm_method = NULL, diff_method = NULL, ... )
microbiomeMarker( marker_table = NULL, norm_method = NULL, diff_method = NULL, ... )
marker_table |
a |
norm_method |
character, method used to normalize the input |
diff_method |
character, method used for microbiome marker identification. |
... |
arguments passed to |
a microbiomeMarker
object.
microbiomeMarker( marker_table = marker_table(data.frame( feature = c("speciesA", "speciesB"), enrich_group = c("groupA", "groupB"), ef_logFC = c(-2, 2), pvalue = c(0.01, 0.01), padj = c(0.01, 0.01), row.names = c("marker1", "marker2") )), norm_method = "TSS", diff_method = "DESeq2", otu_table = otu_table(matrix( c(4, 1, 1, 4), nrow = 2, byrow = TRUE, dimnames = list(c("speciesA", "speciesB"), c("sample1", "sample2")) ), taxa_are_rows = TRUE ), tax_table = tax_table(matrix( c("speciesA", "speciesB"), nrow = 2, dimnames = list(c("speciesA", "speciesB"), "Species") )), sam_data = sample_data(data.frame( group = c("groupA", "groupB"), row.names = c("sample1", "sample2") )) )
microbiomeMarker( marker_table = marker_table(data.frame( feature = c("speciesA", "speciesB"), enrich_group = c("groupA", "groupB"), ef_logFC = c(-2, 2), pvalue = c(0.01, 0.01), padj = c(0.01, 0.01), row.names = c("marker1", "marker2") )), norm_method = "TSS", diff_method = "DESeq2", otu_table = otu_table(matrix( c(4, 1, 1, 4), nrow = 2, byrow = TRUE, dimnames = list(c("speciesA", "speciesB"), c("sample1", "sample2")) ), taxa_are_rows = TRUE ), tax_table = tax_table(matrix( c("speciesA", "speciesB"), nrow = 2, dimnames = list(c("speciesA", "speciesB"), "Species") )), sam_data = sample_data(data.frame( group = c("groupA", "groupB"), row.names = c("sample1", "sample2") )) )
microbiomeMarker-class
is inherited from the phyloseq::phyloseq
by adding a custom slot microbiome_marker
to save the differential analysis
results. And it provides a seamless interface with phyloseq, which makes
microbiomeMarker simple and easy to use. For more details on see the
document of phyloseq::phyloseq
.
## S4 method for signature 'microbiomeMarker' show(object)
## S4 method for signature 'microbiomeMarker' show(object)
object |
a |
a microbiomeMarker
object.
marker_table
a data.frame, a marker_table
object.
norm_method
character, method used to normalize the input phyloseq
object.
diff_method
character, method used for marker identification.
phyloseq::phyloseq
, marker_table
,
summarize_taxa()
Get the number of microbiome markers
nmarker(object) ## S4 method for signature 'microbiomeMarker' nmarker(object) ## S4 method for signature 'marker_table' nmarker(object)
nmarker(object) ## S4 method for signature 'microbiomeMarker' nmarker(object) ## S4 method for signature 'marker_table' nmarker(object)
object |
a |
an integer, the number of microbiome markers
mt <- marker_table(data.frame( feature = c("speciesA", "speciesB"), enrich_group = c("groupA", "groupB"), ef_logFC = c(-2, 2), pvalue = c(0.01, 0.01), padj = c(0.01, 0.01), row.names = c("marker1", "marker2") )) nmarker(mt)
mt <- marker_table(data.frame( feature = c("speciesA", "speciesB"), enrich_group = c("groupA", "groupB"), ef_logFC = c(-2, 2), pvalue = c(0.01, 0.01), padj = c(0.01, 0.01), row.names = c("marker1", "marker2") )) nmarker(mt)
It is critical to normalize the feature table to eliminate any bias due to differences in the sampling sequencing depth.This function implements six widely-used normalization methods for microbial compositional data.
For rarefying, reads in the different samples are randomly removed until the same predefined number has been reached, to assure all samples have the same library size. Rarefying normalization method is the standard in microbial ecology. Please note that the authors of phyloseq do not advocate using this rarefying a normalization procedure, despite its recent popularity
TSS simply transforms the feature table into relative abundance by dividing the number of total reads of each sample.
CSS is based on the assumption that the count distributions in each sample are equivalent for low abundant genes up to a certain threshold. Only the segment of each sample’s count distribution that is relatively invariant across samples is scaled by CSS
RLE assumes most features are not differential and uses the relative abundances to calculate the normalization factor.
TMM calculates the normalization factor using a robust statistics based on the assumption that most features are not differential and should, in average, be equal between the samples. The TMM scaling factor is calculated as the weighted mean of log-ratios between each pair of samples, after excluding the highest count OTUs and OTUs with the largest log-fold change.
In CLR, the log-ratios are computed relative to the geometric mean of all features.
norm_cpm
: This normalization method is from the original LEfSe algorithm,
recommended when very low values are present (as shown in the LEfSe galaxy).
## S4 method for signature 'phyloseq' normalize(object, method = "TSS", ...) ## S4 method for signature 'otu_table' normalize(object, method = "TSS", ...) ## S4 method for signature 'data.frame' normalize(object, method = "TSS", ...) ## S4 method for signature 'matrix' normalize(object, method = "TSS", ...) norm_rarefy( object, size = min(sample_sums(object)), rng_seed = FALSE, replace = TRUE, trim_otus = TRUE, verbose = TRUE ) norm_tss(object) norm_css(object, sl = 1000) norm_rle( object, locfunc = stats::median, type = c("poscounts", "ratio"), geo_means = NULL, control_genes = NULL ) norm_tmm( object, ref_column = NULL, logratio_trim = 0.3, sum_trim = 0.05, do_weighting = TRUE, Acutoff = -1e+10 ) norm_clr(object) norm_cpm(object)
## S4 method for signature 'phyloseq' normalize(object, method = "TSS", ...) ## S4 method for signature 'otu_table' normalize(object, method = "TSS", ...) ## S4 method for signature 'data.frame' normalize(object, method = "TSS", ...) ## S4 method for signature 'matrix' normalize(object, method = "TSS", ...) norm_rarefy( object, size = min(sample_sums(object)), rng_seed = FALSE, replace = TRUE, trim_otus = TRUE, verbose = TRUE ) norm_tss(object) norm_css(object, sl = 1000) norm_rle( object, locfunc = stats::median, type = c("poscounts", "ratio"), geo_means = NULL, control_genes = NULL ) norm_tmm( object, ref_column = NULL, logratio_trim = 0.3, sum_trim = 0.05, do_weighting = TRUE, Acutoff = -1e+10 ) norm_clr(object) norm_cpm(object)
object |
|
method |
the methods used to normalize the microbial abundance data. Options includes:
|
... |
other arguments passed to the corresponding normalization methods. |
size , rng_seed , replace , trim_otus , verbose
|
extra arguments passed to
|
sl |
The value to scale. |
locfunc |
a function to compute a location for a sample. By default, the median is used. |
type |
method for estimation: either "ratio"or "poscounts" (recommend). |
geo_means |
default |
control_genes |
default |
ref_column |
column to use as reference |
logratio_trim |
amount of trim to use on log-ratios |
sum_trim |
amount of trim to use on the combined absolute levels ("A" values) |
do_weighting |
whether to compute the weights or not |
Acutoff |
cutoff on "A" values to use before trimming |
the same class with object
.
edgeR::calcNormFactors()
,DESeq2::estimateSizeFactorsForMatrix()
,
metagenomeSeq::cumNorm()
metagenomeSeq::calcNormFactors()
DESeq2::estimateSizeFactorsForMatrix()
data(caporaso) normalize(caporaso, "TSS")
data(caporaso) normalize(caporaso, "TSS")
phyloseq-class
object to DESeqDataSet-class
objectThis function convert [phyloseq::phyloseq-class] to [
DESeq2::DESeqDataSet-class], which can then be tested using [
DESeq2::DESeq()'].
phyloseq2DESeq2(ps, design, ...)
phyloseq2DESeq2(ps, design, ...)
ps |
the [phyloseq::phyloseq-class |
design |
a |
... |
additional arguments passed to
|
a DESeq2::DESeqDataSet
object.
DESeq2::DESeqDataSetFromMatrix()
,DESeq2::DESeq()
data(caporaso) phyloseq2DESeq2(caporaso, ~SampleType)
data(caporaso) phyloseq2DESeq2(caporaso, ~SampleType)
DGEList
objectThis function convert phyloseq::phyloseq
object to
edgeR::DGEList
object, can then can be used to perform
differential analysis using the methods in edgeR.
phyloseq2edgeR(ps, ...)
phyloseq2edgeR(ps, ...)
ps |
a |
... |
optional, additional named arguments passed to
|
A edgeR::DGEList
object.
data(caporaso) dge <- phyloseq2edgeR(caporaso)
data(caporaso) dge <- phyloseq2edgeR(caporaso)
MRexperiment
objectThe phyloseq data is converted to the relevant
metagenomeSeq::MRexperiment
object, which can then be tested in
the zero-inflated mixture model framework in the metagenomeSeq package.
phyloseq2metagenomeSeq(ps, ...) otu_table2metagenomeSeq(ps, ...)
phyloseq2metagenomeSeq(ps, ...) otu_table2metagenomeSeq(ps, ...)
ps |
|
... |
optional, additional named arguments passed to
|
A metagenomeSeq::MRexperiment
object.
metagenomeSeq::fitTimeSeries()
,
metagenomeSeq::fitLogNormal()
,metagenomeSeq::fitZig()
,
metagenomeSeq::MRtable()
,metagenomeSeq::MRfulltable()
data(caporaso) phyloseq2metagenomeSeq(caporaso)
data(caporaso) phyloseq2metagenomeSeq(caporaso)
plot the abundances of markers
plot_abundance(mm, label_level = 1, max_label_len = 60, markers = NULL, group)
plot_abundance(mm, label_level = 1, max_label_len = 60, markers = NULL, group)
mm |
a |
label_level |
integer, number of label levels to be displayed, default
|
max_label_len |
integer, maximum number of characters of feature label,
default |
markers |
character vector, markers to display, default |
group |
character, the variable to set the group |
a ggplot2::ggplot
object.
data(enterotypes_arumugam) mm <- run_limma_voom( enterotypes_arumugam, "Enterotype", contrast = c("Enterotype 3", "Enterotype 2"), pvalue_cutoff = 0.01, p_adjust = "none" ) plot_abundance(mm, group = "Enterotype")
data(enterotypes_arumugam) mm <- run_limma_voom( enterotypes_arumugam, "Enterotype", contrast = c("Enterotype 3", "Enterotype 2"), pvalue_cutoff = 0.01, p_adjust = "none" ) plot_abundance(mm, group = "Enterotype")
plot cladogram of micobiomeMaker results
plot_cladogram( mm, color, only_marker = FALSE, branch_size = 0.2, alpha = 0.2, node_size_scale = 1, node_size_offset = 1, clade_label_level = 4, clade_label_font_size = 4, annotation_shape = 22, annotation_shape_size = 5, group_legend_param = list(), marker_legend_param = list() )
plot_cladogram( mm, color, only_marker = FALSE, branch_size = 0.2, alpha = 0.2, node_size_scale = 1, node_size_offset = 1, clade_label_level = 4, clade_label_font_size = 4, annotation_shape = 22, annotation_shape_size = 5, group_legend_param = list(), marker_legend_param = list() )
mm |
a microbiomeMarker object |
color |
a color vector, used to highlight the clades of microbiome biomarker. The values will be matched in order (usually alphabetical) with the groups. If this is a named vector, then the colors will be matched based on the names instead. |
only_marker |
logical, whether show all the features or only
markers in the cladogram, default |
branch_size |
numeric, size of branch, default |
alpha |
alpha parameter for shading, default |
node_size_scale |
the parameter 'a' controlling node size:
|
node_size_offset |
the parameter 'b' controlling node size:
|
clade_label_level |
max level of taxa used to label the clade, other level of taxa will be shown on the side. |
clade_label_font_size |
font size of the clade label, default 4. |
annotation_shape |
shape used for annotation, default |
annotation_shape_size |
size used for annotation shape, default |
group_legend_param , marker_legend_param
|
a list specifying
extra parameters of group legend and marker legend, such as |
a ggtree object
Chenhao Li, Guangchuang Yu, Chenghao Zhu, Yang Cao
This function is modified from clada.anno
from microbiomeViz.
data(kostic_crc) kostic_crc_small <- phyloseq::subset_taxa( kostic_crc, Phylum %in% c("Firmicutes") ) mm_lefse <- run_lefse( kostic_crc_small, wilcoxon_cutoff = 0.01, group = "DIAGNOSIS", kw_cutoff = 0.01, multigrp_strat = TRUE, lda_cutoff = 4 ) plot_cladogram(mm_lefse, color = c("darkgreen", "red"))
data(kostic_crc) kostic_crc_small <- phyloseq::subset_taxa( kostic_crc, Phylum %in% c("Firmicutes") ) mm_lefse <- run_lefse( kostic_crc_small, wilcoxon_cutoff = 0.01, group = "DIAGNOSIS", kw_cutoff = 0.01, multigrp_strat = TRUE, lda_cutoff = 4 ) plot_cladogram(mm_lefse, color = c("darkgreen", "red"))
bar and dot plot of effect size microbiomeMarker data. This function returns
a ggplot2
object that can be saved or further customized using ggplot2
package.
plot_ef_bar(mm, label_level = 1, max_label_len = 60, markers = NULL) plot_ef_dot(mm, label_level = 1, max_label_len = 60, markers = NULL)
plot_ef_bar(mm, label_level = 1, max_label_len = 60, markers = NULL) plot_ef_dot(mm, label_level = 1, max_label_len = 60, markers = NULL)
mm |
a |
label_level |
integer, number of label levels to be displayed, default
|
max_label_len |
integer, maximum number of characters of feature label,
default |
markers |
character vector, markers to display, default |
a ggplot project
data(enterotypes_arumugam) mm <- run_limma_voom( enterotypes_arumugam, "Enterotype", contrast = c("Enterotype 3", "Enterotype 2"), pvalue_cutoff = 0.01, p_adjust = "none" ) plot_ef_bar(mm)
data(enterotypes_arumugam) mm <- run_limma_voom( enterotypes_arumugam, "Enterotype", contrast = c("Enterotype 3", "Enterotype 2"), pvalue_cutoff = 0.01, p_adjust = "none" ) plot_ef_bar(mm)
Display the microbiome marker using heatmap, in which rows represents the marker and columns represents the samples.
plot_heatmap( mm, transform = c("log10", "log10p", "identity"), cluster_marker = FALSE, cluster_sample = FALSE, markers = NULL, label_level = 1, max_label_len = 60, sample_label = FALSE, scale_by_row = FALSE, annotation_col = NULL, group, ... )
plot_heatmap( mm, transform = c("log10", "log10p", "identity"), cluster_marker = FALSE, cluster_sample = FALSE, markers = NULL, label_level = 1, max_label_len = 60, sample_label = FALSE, scale_by_row = FALSE, annotation_col = NULL, group, ... )
mm |
a |
transform |
transformation to apply, for more details see
|
cluster_marker , cluster_sample
|
logical, controls whether to perform
clustering in markers (rows) and samples (cols), default |
markers |
character vector, markers to display, default |
label_level |
integer, number of label levels to be displayed, default
|
max_label_len |
integer, maximum number of characters of feature label,
default |
sample_label |
logical, controls whether to show the sample labels in
the heatmap, default |
scale_by_row |
logical, controls whether to scale the heatmap by the
row (marker) values, default |
annotation_col |
assign colors for the top annotation using a named
vector, passed to |
group |
character, the variable to set the group |
... |
extra arguments passed to |
a ComplexHeatmap::Heatmap
object.
transform_abundances
,ComplexHeatmap::Heatmap()
data(kostic_crc) kostic_crc_small <- phyloseq::subset_taxa( kostic_crc, Phylum %in% c("Firmicutes") ) mm_lefse <- run_lefse( kostic_crc_small, wilcoxon_cutoff = 0.01, group = "DIAGNOSIS", kw_cutoff = 0.01, multigrp_strat = TRUE, lda_cutoff = 4 ) plot_heatmap(mm_lefse, group = "DIAGNOSIS")
data(kostic_crc) kostic_crc_small <- phyloseq::subset_taxa( kostic_crc, Phylum %in% c("Firmicutes") ) mm_lefse <- run_lefse( kostic_crc_small, wilcoxon_cutoff = 0.01, group = "DIAGNOSIS", kw_cutoff = 0.01, multigrp_strat = TRUE, lda_cutoff = 4 ) plot_heatmap(mm_lefse, group = "DIAGNOSIS")
postHocTest
plotVisualize the result of post-hoc test using ggplot2
plot_postHocTest(pht, feature, step_increase = 0.12)
plot_postHocTest(pht, feature, step_increase = 0.12)
pht |
a |
feature |
character, to plot the post-toc test result of this feature |
step_increase |
numeric vector with the increase in fraction of total
height for every additional comparison to minimize overlap, default |
a ggplot
object
data(enterotypes_arumugam) ps <- phyloseq::subset_samples( enterotypes_arumugam, Enterotype %in% c("Enterotype 3", "Enterotype 2", "Enterotype 1") ) %>% phyloseq::subset_taxa(Phylum == "Bacteroidetes") pht <- run_posthoc_test(ps, group = "Enterotype") plot_postHocTest(pht, feature = "p__Bacteroidetes|g__Alistipes")
data(enterotypes_arumugam) ps <- phyloseq::subset_samples( enterotypes_arumugam, Enterotype %in% c("Enterotype 3", "Enterotype 2", "Enterotype 1") ) %>% phyloseq::subset_taxa(Phylum == "Bacteroidetes") pht <- run_posthoc_test(ps, group = "Enterotype") plot_postHocTest(pht, feature = "p__Bacteroidetes|g__Alistipes")
Show the ROC curve of the microbiome marker calculated by run_sl
.
plot_sl_roc(mm, group, nfolds = 3, nrepeats = 3, tune_length = 5, ...)
plot_sl_roc(mm, group, nfolds = 3, nrepeats = 3, tune_length = 5, ...)
mm |
a microbiomeMarker object. |
group , nfolds , nrepeats , tune_length , ...
|
same with the |
a ggplot2::ggplot
object.
data(enterotypes_arumugam) # small example phyloseq object for test ps_s <- phyloseq::subset_taxa( enterotypes_arumugam, Phylum %in% c("Firmicutes", "Bacteroidetes") ) set.seed(2021) mm <- run_sl( ps_s, group = "Gender", taxa_rank = "Genus", nfolds = 2, nrepeats = 1, top_n = 15, norm = "TSS", method = "LR", ) plot_sl_roc(mm, group = "Gender")
data(enterotypes_arumugam) # small example phyloseq object for test ps_s <- phyloseq::subset_taxa( enterotypes_arumugam, Phylum %in% c("Firmicutes", "Bacteroidetes") ) set.seed(2021) mm <- run_sl( ps_s, group = "Gender", taxa_rank = "Genus", nfolds = 2, nrepeats = 1, top_n = 15, norm = "TSS", method = "LR", ) plot_sl_roc(mm, group = "Gender")
Plotting DA comparing result
## S3 method for class 'compareDA' plot(x, sort = c("score", "auc", "fpr", "power"), ...)
## S3 method for class 'compareDA' plot(x, sort = c("score", "auc", "fpr", "power"), ...)
x |
an |
sort |
character string specifying sort method. Possibilities are
"score" which is calculated as |
... |
extra arguments, just ignore it. |
a ggplot2::ggplot
object containing 4 subplots: "auc", "fdr",
"power"and "score" plot.
This function is used for create postHocTest
object, and is only used for
developers.
postHocTest( result, abundance, conf_level = 0.95, method = "tukey", method_str = paste("Posthoc multiple comparisons of means: ", method) )
postHocTest( result, abundance, conf_level = 0.95, method = "tukey", method_str = paste("Posthoc multiple comparisons of means: ", method) )
result |
a |
abundance |
data.frame. |
conf_level |
numeric, confidence level. |
method |
character, method for posthoc test. |
method_str |
character, illustrates which method is used for posthoc test. |
a postHocTest
object.
require(IRanges) pht <- postHocTest( result = DataFrameList( featureA = DataFrame( comparisons = c("group2-group1", "group3-group1", "group3-group2"), diff_mean = runif(3), pvalue = rep(0.01, 3), ci_lower = rep(0.01, 3), ci_upper = rep(0.011, 3) ), featureB = DataFrame( comparisons = c("group2-group1", "group3-group1", "group3-group2"), diff_mean = runif(3), pvalue = rep(0.01, 3), ci_lower = rep(0.01, 3), ci_upper = rep(0.011, 3) ) ), abundance = data.frame( featureA = runif(3), featureB = runif(3), group = c("group1", "group2", "grou3") ) ) pht
require(IRanges) pht <- postHocTest( result = DataFrameList( featureA = DataFrame( comparisons = c("group2-group1", "group3-group1", "group3-group2"), diff_mean = runif(3), pvalue = rep(0.01, 3), ci_lower = rep(0.01, 3), ci_upper = rep(0.011, 3) ), featureB = DataFrame( comparisons = c("group2-group1", "group3-group1", "group3-group2"), diff_mean = runif(3), pvalue = rep(0.01, 3), ci_lower = rep(0.01, 3), ci_upper = rep(0.011, 3) ) ), abundance = data.frame( featureA = runif(3), featureB = runif(3), group = c("group1", "group2", "grou3") ) ) pht
The postHocTest Class, represents the result of post-hoc test result among multiple groups
## S4 method for signature 'postHocTest' show(object)
## S4 method for signature 'postHocTest' show(object)
object |
a |
a postHocTest
object.
result
a IRanges::DataFrameList
, each DataFrame
consists
of five variables:
comparisons
: character, specify which two groups to test (the group names
are separated by "_)
diff_mean
: numeric, difference in mean abundances
pvalue
: numeric, p values
ci_lower
and ci_upper
: numeric, lower and upper confidence interval of
difference in mean abundances
abundance
abundance of each feature in each group
conf_level
confidence level
method
method used for post-hoc test
method_str
method illustration
Yang Cao
Perform differential analysis using ALDEx2
run_aldex( ps, group, taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "none", norm_para = list(), method = c("t.test", "wilcox.test", "kruskal", "glm_anova"), p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), pvalue_cutoff = 0.05, mc_samples = 128, denom = c("all", "iqlr", "zero", "lvha"), paired = FALSE )
run_aldex( ps, group, taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "none", norm_para = list(), method = c("t.test", "wilcox.test", "kruskal", "glm_anova"), p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), pvalue_cutoff = 0.05, mc_samples = 128, denom = c("all", "iqlr", "zero", "lvha"), paired = FALSE )
ps |
a |
group |
character, the variable to set the group |
taxa_rank |
character to specify taxonomic rank to perform
differential analysis on. Should be one of
|
transform |
character, the methods used to transform the microbial
abundance. See
|
norm |
the methods used to normalize the microbial abundance data. See
|
norm_para |
arguments passed to specific normalization methods |
method |
test method, options include: "t.test" and "wilcox.test" for two groups comparison, "kruskal" and "glm_anova" for multiple groups comparison. |
p_adjust |
method for multiple test correction, default |
pvalue_cutoff |
cutoff of p value, default 0.05. |
mc_samples |
integer, the number of Monte Carlo samples to use for underlying distributions estimation, 128 is usually sufficient. |
denom |
character string, specifiy which features used to as the denominator for the geometric mean calculation. Options are:
|
paired |
logical, whether to perform paired tests, only worked for method "t.test" and "wilcox.test". |
a microbiomeMarker
object.
Fernandes, A.D., Reid, J.N., Macklaim, J.M. et al. Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis. Microbiome 2, 15 (2014).
data(enterotypes_arumugam) ps <- phyloseq::subset_samples( enterotypes_arumugam, Enterotype %in% c("Enterotype 3", "Enterotype 2") ) run_aldex(ps, group = "Enterotype")
data(enterotypes_arumugam) ps <- phyloseq::subset_samples( enterotypes_arumugam, Enterotype %in% c("Enterotype 3", "Enterotype 2") ) run_aldex(ps, group = "Enterotype")
Perform significant test by comparing the pairwise log ratios between all features.
run_ancom( ps, group, confounders = character(0), taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "TSS", norm_para = list(), p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), pvalue_cutoff = 0.05, W_cutoff = 0.75 )
run_ancom( ps, group, confounders = character(0), taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "TSS", norm_para = list(), p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), pvalue_cutoff = 0.05, W_cutoff = 0.75 )
ps |
a |
group |
character, the variable to set the group. |
confounders |
character vector, the confounding variables to be adjusted.
default |
taxa_rank |
character to specify taxonomic rank to perform
differential analysis on. Should be one of
|
transform |
character, the methods used to transform the microbial
abundance. See
|
norm |
the methods used to normalize the microbial abundance data. See
|
norm_para |
named |
p_adjust |
method for multiple test correction, default |
pvalue_cutoff |
significance level for each of the statistical tests, default 0.05. |
W_cutoff |
lower bound for the proportion for the W-statistic, default 0.7. |
In an experiment with only two treatments, this tests the following
hypothesis for feature :
where and
are the mean abundances for feature
in the two groups.
The developers of this method recommend the following significance tests
if there are 2 groups, use non-parametric Wilcoxon rank sum test
stats::wilcox.test()
. If there are more than 2 groups, use nonparametric
stats::kruskal.test()
or one-way ANOVA stats::aov()
.
a microbiomeMarker object, in which the slot
of
marker_table
contains four variables:
feature
, significantly different features.
enrich_group
, the class of the differential features enriched.
effect_size
, differential means for two groups, or F statistic for more
than two groups.
W
, the W-statistic, number of features that a single feature is tested
to be significantly different against.
Huang Lin, Yang Cao
Mandal et al. "Analysis of composition of microbiomes: a novel method for studying microbial composition", Microbial Ecology in Health & Disease, (2015), 26.
data(enterotypes_arumugam) ps <- phyloseq::subset_samples( enterotypes_arumugam, Enterotype %in% c("Enterotype 3", "Enterotype 2") ) run_ancom(ps, group = "Enterotype")
data(enterotypes_arumugam) ps <- phyloseq::subset_samples( enterotypes_arumugam, Enterotype %in% c("Enterotype 3", "Enterotype 2") ) run_ancom(ps, group = "Enterotype")
Differential abundance analysis for microbial absolute abundance data. This
function is a wrapper of ANCOMBC::ancombc()
.
run_ancombc( ps, group, confounders = character(0), contrast = NULL, taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "none", norm_para = list(), p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), prv_cut = 0.1, lib_cut = 0, struc_zero = FALSE, neg_lb = FALSE, tol = 1e-05, max_iter = 100, conserve = FALSE, pvalue_cutoff = 0.05 )
run_ancombc( ps, group, confounders = character(0), contrast = NULL, taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "none", norm_para = list(), p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), prv_cut = 0.1, lib_cut = 0, struc_zero = FALSE, neg_lb = FALSE, tol = 1e-05, max_iter = 100, conserve = FALSE, pvalue_cutoff = 0.05 )
ps |
a |
group |
the name of the group variable in metadata. Specifying
|
confounders |
character vector, the confounding variables to be adjusted.
default |
contrast |
this parameter only used for two groups comparison while there are multiple groups. For more please see the following details. |
taxa_rank |
character to specify taxonomic rank to perform
differential analysis on. Should be one of
|
transform |
character, the methods used to transform the microbial
abundance. See
|
norm |
the methods used to normalize the microbial abundance data. See
|
norm_para |
named |
p_adjust |
method to adjust p-values by. Default is "holm".
Options include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY",
"fdr", "none". See |
prv_cut |
a numerical fraction between 0 and 1. Taxa with prevalences
less than |
lib_cut |
a numerical threshold for filtering samples based on library
sizes. Samples with library sizes less than |
struc_zero |
whether to detect structural zeros. Default is FALSE. |
neg_lb |
whether to classify a taxon as a structural zero in the corresponding study group using its asymptotic lower bound. Default is FALSE. |
tol |
the iteration convergence tolerance for the E-M algorithm. Default is 1e-05. |
max_iter |
the maximum number of iterations for the E-M algorithm. Default is 100. |
conserve |
whether to use a conservative variance estimate of the test statistic. It is recommended if the sample size is small and/or the number of differentially abundant taxa is believed to be large. Default is FALSE. |
pvalue_cutoff |
level of significance. Default is 0.05. |
contrast
must be a two length character or NULL
(default). It is only
required to set manually for two groups comparison when there are multiple
groups. The order determines the direction of comparison, the first element
is used to specify the reference group (control). This means that, the first
element is the denominator for the fold change, and the second element is
used as baseline (numerator for fold change). Otherwise, users do required
to concern this parameter (set as default NULL
), and if there are
two groups, the first level of groups will set as the reference group; if
there are multiple groups, it will perform an ANOVA-like testing to find
markers which difference in any of the groups.
a microbiomeMarker
object.
Lin, Huang, and Shyamal Das Peddada. "Analysis of compositions of microbiomes with bias correction." Nature communications 11.1 (2020): 1-11.
data(enterotypes_arumugam) ps <- phyloseq::subset_samples( enterotypes_arumugam, Enterotype %in% c("Enterotype 3", "Enterotype 2") ) run_ancombc(ps, group = "Enterotype")
data(enterotypes_arumugam) ps <- phyloseq::subset_samples( enterotypes_arumugam, Enterotype %in% c("Enterotype 3", "Enterotype 2") ) run_ancombc(ps, group = "Enterotype")
Differential expression analysis based on the Negative Binomial distribution using DESeq2.
run_deseq2( ps, group, confounders = character(0), contrast = NULL, taxa_rank = "all", norm = "RLE", norm_para = list(), transform = c("identity", "log10", "log10p"), fitType = c("parametric", "local", "mean", "glmGamPoi"), sfType = "poscounts", betaPrior = FALSE, modelMatrixType, useT = FALSE, minmu = ifelse(fitType == "glmGamPoi", 1e-06, 0.5), p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), pvalue_cutoff = 0.05, ... )
run_deseq2( ps, group, confounders = character(0), contrast = NULL, taxa_rank = "all", norm = "RLE", norm_para = list(), transform = c("identity", "log10", "log10p"), fitType = c("parametric", "local", "mean", "glmGamPoi"), sfType = "poscounts", betaPrior = FALSE, modelMatrixType, useT = FALSE, minmu = ifelse(fitType == "glmGamPoi", 1e-06, 0.5), p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), pvalue_cutoff = 0.05, ... )
ps |
a |
group |
character, the variable to set the group, must be one of the var of the sample metadata. |
confounders |
character vector, the confounding variables to be adjusted.
default |
contrast |
this parameter only used for two groups comparison while there are multiple groups. For more please see the following details. |
taxa_rank |
character to specify taxonomic rank to perform
differential analysis on. Should be one of
|
norm |
the methods used to normalize the microbial abundance data. See
|
norm_para |
arguments passed to specific normalization methods. Most users will not need to pass any additional arguments here. |
transform |
character, the methods used to transform the microbial
abundance. See
|
fitType , sfType , betaPrior , modelMatrixType , useT , minmu
|
these seven
parameters are inherited form
For more details, see |
p_adjust |
method for multiple test correction, default |
pvalue_cutoff |
pvalue_cutoff numeric, p value cutoff, default 0.05. |
... |
extra parameters passed to |
Note: DESeq2 requires the input is raw counts (un-normalized counts), as
only the counts values allow assessing the measurement precision correctly.
For more details see the vignette of DESeq2 (vignette("DESeq2")
).
Thus, this function only supports "none", "rarefy", "RLE", "CSS", and "TMM" normalization methods. We strongly recommend using the "RLE" method (default normalization method in the DESeq2 package). The other normalization methods are used for expert users and comparisons among different normalization methods.
For two groups comparison, this function utilizes the Wald test (defined by
DESeq2::nbinomWaldTest()
) for hypothesis testing. A Wald test statistic
is computed along with a probability (p-value) that a test statistic at least
as extreme as the observed value were selected at random. contrasts
are
used to specify which two groups to compare. The order of the names
determines the direction of fold change that is reported.
Likelihood ratio test (LRT) is used to identify the genes that significantly changed across all the different levels for multiple groups comparisons. The LRT identified the significant features by comparing the full model to the reduced model. It is testing whether a feature removed in the reduced model explains a significant variation in the data.
contrast
must be a two length character or NULL
(default). It is only
required to set manually for two groups comparison when there are multiple
groups. The order determines the direction of comparison, the first element
is used to specify the reference group (control). This means that, the first
element is the denominator for the fold change, and the second element is
used as baseline (numerator for fold change). Otherwise, users do required
to concern this parameter (set as default NULL
), and if there are
two groups, the first level of groups will set as the reference group; if
there are multiple groups, it will perform an ANOVA-like testing to find
markers which difference in any of the groups.
a microbiomeMarker
object.
Love, Michael I., Wolfgang Huber, and Simon Anders. "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2." Genome biology 15.12 (2014): 1-21.
DESeq2::results()
,DESeq2::DESeq()
data(enterotypes_arumugam) ps <- phyloseq::subset_samples( enterotypes_arumugam, Enterotype %in% c("Enterotype 3", "Enterotype 2")) %>% phyloseq::subset_taxa(Phylum %in% c("Firmicutes")) run_deseq2(ps, group = "Enterotype")
data(enterotypes_arumugam) ps <- phyloseq::subset_samples( enterotypes_arumugam, Enterotype %in% c("Enterotype 3", "Enterotype 2")) %>% phyloseq::subset_taxa(Phylum %in% c("Firmicutes")) run_deseq2(ps, group = "Enterotype")
Differential expression analysis based on the Negative Binomial distribution using edgeR.
run_edger( ps, group, confounders = character(0), contrast = NULL, taxa_rank = "all", method = c("LRT", "QLFT"), transform = c("identity", "log10", "log10p"), norm = "TMM", norm_para = list(), disp_para = list(), p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), pvalue_cutoff = 0.05, ... )
run_edger( ps, group, confounders = character(0), contrast = NULL, taxa_rank = "all", method = c("LRT", "QLFT"), transform = c("identity", "log10", "log10p"), norm = "TMM", norm_para = list(), disp_para = list(), p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), pvalue_cutoff = 0.05, ... )
ps |
ps a |
group |
character, the variable to set the group, must be one of the var of the sample metadata. |
confounders |
character vector, the confounding variables to be adjusted.
default |
contrast |
this parameter only used for two groups comparison while there are multiple groups. For more please see the following details. |
taxa_rank |
character to specify taxonomic rank to perform
differential analysis on. Should be one of
|
method |
character, used for differential analysis, please see details below for more info. |
transform |
character, the methods used to transform the microbial
abundance. See
|
norm |
the methods used to normalize the microbial abundance data. See
|
norm_para |
arguments passed to specific normalization methods. Most users will not need to pass any additional arguments here. |
disp_para |
additional arguments passed to |
p_adjust |
method for multiple test correction, default |
pvalue_cutoff |
numeric, p value cutoff, default 0.05 |
... |
extra arguments passed to the model. See |
Note that edgeR is designed to work with actual counts. This means that transformation is not required in any way before inputting them to edgeR.
There are two test methods for differential analysis in edgeR, likelihood ratio test (LRT) and quasi-likelihood F-test (QLFT). The QLFT method is recommended as it allows stricter error rate control by accounting for the uncertainty in dispersion estimation.
contrast
must be a two length character or NULL
(default). It is only
required to set manually for two groups comparison when there are multiple
groups. The order determines the direction of comparison, the first element
is used to specify the reference group (control). This means that, the first
element is the denominator for the fold change, and the second element is
used as baseline (numerator for fold change). Otherwise, users do required
to concern this parameter (set as default NULL
), and if there are
two groups, the first level of groups will set as the reference group; if
there are multiple groups, it will perform an ANOVA-like testing to find
markers which difference in any of the groups.
a microbiomeMarker
object.
Yang Cao
Robinson, Mark D., and Alicia Oshlack. "A scaling normalization method for differential expression analysis of RNA-seq data." Genome biology 11.3 (2010): 1-9.
Robinson, Mark D., Davis J. McCarthy, and Gordon K. Smyth. "edgeR: a Bioconductor package for differential expression analysis of digital gene expression data." Bioinformatics 26.1 (2010): 139-140.
edgeR::glmFit()
,edgeR::glmQLFit()
,edgeR::estimateDisp()
,normalize()
data(enterotypes_arumugam) ps <- phyloseq::subset_samples( enterotypes_arumugam, Enterotype %in% c("Enterotype 3", "Enterotype 2") ) run_edger(ps, group = "Enterotype")
data(enterotypes_arumugam) ps <- phyloseq::subset_samples( enterotypes_arumugam, Enterotype %in% c("Enterotype 3", "Enterotype 2") ) run_edger(ps, group = "Enterotype")
Perform Metagenomic LEFSe analysis based on phyloseq object.
run_lefse( ps, group, subgroup = NULL, taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "CPM", norm_para = list(), kw_cutoff = 0.05, lda_cutoff = 2, bootstrap_n = 30, bootstrap_fraction = 2/3, wilcoxon_cutoff = 0.05, multigrp_strat = FALSE, strict = c("0", "1", "2"), sample_min = 10, only_same_subgrp = FALSE, curv = FALSE )
run_lefse( ps, group, subgroup = NULL, taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "CPM", norm_para = list(), kw_cutoff = 0.05, lda_cutoff = 2, bootstrap_n = 30, bootstrap_fraction = 2/3, wilcoxon_cutoff = 0.05, multigrp_strat = FALSE, strict = c("0", "1", "2"), sample_min = 10, only_same_subgrp = FALSE, curv = FALSE )
ps |
a |
group |
character, the column name to set the group |
subgroup |
character, the column name to set the subgroup |
taxa_rank |
character to specify taxonomic rank to perform
differential analysis on. Should be one of
|
transform |
character, the methods used to transform the microbial
abundance. See
|
norm |
the methods used to normalize the microbial abundance data. See
|
norm_para |
named |
kw_cutoff |
numeric, p value cutoff of kw test, default 0.05 |
lda_cutoff |
numeric, lda score cutoff, default 2 |
bootstrap_n |
integer, the number of bootstrap iteration for LDA, default 30 |
bootstrap_fraction |
numeric, the subsampling fraction value for each
bootstrap iteration, default |
wilcoxon_cutoff |
numeric, p value cutoff of wilcoxon test, default 0.05 |
multigrp_strat |
logical, for multiple group tasks, whether the test is
performed in a one-against one (more strict) or in a one-against all
setting, default |
strict |
multiple testing options, 0 for no correction (default), 1 for independent comparisons, 2 for independent comparison. |
sample_min |
integer, minimum number of samples per subclass for performing wilcoxon test, default 10 |
only_same_subgrp |
logical, whether perform the wilcoxon test only
among the subgroups with the same name, default |
curv |
logical, whether perform the wilcoxon test using the
Curtis's approach, defalt |
a microbiomeMarker object, in which the slot
of
marker_table
contains four variables:
feature
, significantly different features.
enrich_group
, the class of the differential features enriched.
lda
, logarithmic LDA score (effect size)
pvalue
, p value of kw test.
Yang Cao
Segata, Nicola, et al. Metagenomic biomarker discovery and explanation. Genome biology 12.6 (2011): R60.
data(kostic_crc) kostic_crc_small <- phyloseq::subset_taxa( kostic_crc, Phylum == "Firmicutes" ) mm_lefse <- run_lefse( kostic_crc_small, wilcoxon_cutoff = 0.01, group = "DIAGNOSIS", kw_cutoff = 0.01, multigrp_strat = TRUE, lda_cutoff = 4 )
data(kostic_crc) kostic_crc_small <- phyloseq::subset_taxa( kostic_crc, Phylum == "Firmicutes" ) mm_lefse <- run_lefse( kostic_crc_small, wilcoxon_cutoff = 0.01, group = "DIAGNOSIS", kw_cutoff = 0.01, multigrp_strat = TRUE, lda_cutoff = 4 )
Differential analysis using limma-voom
run_limma_voom( ps, group, confounders = character(0), contrast = NULL, taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "none", norm_para = list(), voom_span = 0.5, p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), pvalue_cutoff = 0.05, ... )
run_limma_voom( ps, group, confounders = character(0), contrast = NULL, taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "none", norm_para = list(), voom_span = 0.5, p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), pvalue_cutoff = 0.05, ... )
ps |
ps a |
group |
character, the variable to set the group, must be one of the var of the sample metadata. |
confounders |
character vector, the confounding variables to be adjusted.
default |
contrast |
this parameter only used for two groups comparison while there are multiple groups. For more please see the following details. |
taxa_rank |
character to specify taxonomic rank to perform
differential analysis on. Should be one of
|
transform |
character, the methods used to transform the microbial
abundance. See
|
norm |
the methods used to normalize the microbial abundance data. See
|
norm_para |
arguments passed to specific normalization methods. Most users will not need to pass any additional arguments here. |
voom_span |
width of the smoothing window used for the lowess
mean-variance trend for |
p_adjust |
method for multiple test correction, default |
pvalue_cutoff |
cutoff of p value, default 0.05. |
... |
extra arguments passed to |
contrast
must be a two length character or NULL
(default). It is only
required to set manually for two groups comparison when there are multiple
groups. The order determines the direction of comparison, the first element
is used to specify the reference group (control). This means that, the first
element is the denominator for the fold change, and the second element is
used as baseline (numerator for fold change). Otherwise, users do required
to concern this parameter (set as default NULL
), and if there are
two groups, the first level of groups will set as the reference group; if
there are multiple groups, it will perform an ANOVA-like testing to find
markers which difference in any of the groups.
a microbiomeMarker
object.
Law, C. W., Chen, Y., Shi, W., & Smyth, G. K. (2014). voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome biology, 15(2), 1-17.
data(enterotypes_arumugam) mm <- run_limma_voom( enterotypes_arumugam, "Enterotype", contrast = c("Enterotype 3", "Enterotype 2"), pvalue_cutoff = 0.01, p_adjust = "none" ) mm
data(enterotypes_arumugam) mm <- run_limma_voom( enterotypes_arumugam, "Enterotype", contrast = c("Enterotype 3", "Enterotype 2"), pvalue_cutoff = 0.01, p_adjust = "none" ) mm
run_marker
is a wrapper of all differential analysis functions.
run_marker( ps, group, da_method = c("lefse", "simple_t", "simple_welch", "simple_white", "simple_kruskal", "simple_anova", "edger", "deseq2", "metagenomeseq", "ancom", "ancombc", "aldex", "limma_voom", "sl_lr", "sl_rf", "sl_svm"), taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "none", norm_para = list(), p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), pvalue_cutoff = 0.05, ... )
run_marker( ps, group, da_method = c("lefse", "simple_t", "simple_welch", "simple_white", "simple_kruskal", "simple_anova", "edger", "deseq2", "metagenomeseq", "ancom", "ancombc", "aldex", "limma_voom", "sl_lr", "sl_rf", "sl_svm"), taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "none", norm_para = list(), p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), pvalue_cutoff = 0.05, ... )
ps |
a |
group |
character, the variable to set the group |
da_method |
character to specify the differential analysis method. The options include:
|
taxa_rank |
character to specify taxonomic rank to perform
differential analysis on. Should be one of
|
transform |
character, the methods used to transform the microbial
abundance. See
|
norm |
the methods used to normalize the microbial abundance data. See
|
norm_para |
arguments passed to specific normalization methods |
p_adjust |
method for multiple test correction, default |
pvalue_cutoff |
numeric, p value cutoff, default 0.05. |
... |
extra arguments passed to the corresponding differential analysis
functions, e.g. |
This function is only a wrapper of all differential analysis functions, We recommend to use the corresponding function, since it has a better default arguments setting.
a microbiomeMarker
object.
run_lefse()
,run_simple_stat()
,run_test_two_groups()
,
run_test_multiple_groups()
,run_edger()
,run_deseq2()
,
run_metagenomeseq
,run_ancom()
,run_ancombc()
,run_aldex()
,
run_limma_voom()
,run_sl()
Differential expression analysis based on the Zero-inflated Log-Normal mixture model or Zero-inflated Gaussian mixture model using metagenomeSeq.
run_metagenomeseq( ps, group, confounders = character(0), contrast = NULL, taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "CSS", norm_para = list(), method = c("ZILN", "ZIG"), p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), pvalue_cutoff = 0.05, ... )
run_metagenomeseq( ps, group, confounders = character(0), contrast = NULL, taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "CSS", norm_para = list(), method = c("ZILN", "ZIG"), p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), pvalue_cutoff = 0.05, ... )
ps |
ps a |
group |
character, the variable to set the group, must be one of the var of the sample metadata. |
confounders |
character vector, the confounding variables to be adjusted.
default |
contrast |
this parameter only used for two groups comparison while there are multiple groups. For more please see the following details. |
taxa_rank |
character to specify taxonomic rank to perform
differential analysis on. Should be one of |
transform |
character, the methods used to transform the microbial
abundance. See
|
norm |
the methods used to normalize the microbial abundance data. See
|
norm_para |
arguments passed to specific normalization methods. |
method |
character, which model used for differential analysis, "ZILN" (Zero-inflated Log-Normal mixture model)" or "ZIG" (Zero-inflated Gaussian mixture model). And the zero-inflated log-normal model is preferred due to the high sensitivity and low FDR. |
p_adjust |
method for multiple test correction, default |
pvalue_cutoff |
numeric, p value cutoff, default 0.05 |
... |
extra arguments passed to the model. more details see
|
metagnomeSeq provides two differential analysis methods, zero-inflated
log-normal mixture model (implemented in
metagenomeSeq::fitFeatureModel()
) and zero-inflated Gaussian mixture
model (implemented in metagenomeSeq::fitZig()
). We recommend
fitFeatureModel over fitZig due to high sensitivity and low FDR. Both
metagenomeSeq::fitFeatureModel()
and metagenomeSeq::fitZig()
require
the abundance profiles before normalization.
For metagenomeSeq::fitZig()
, the output column is the coefficient of
interest, and logFC column in the output of
metagenomeSeq::fitFeatureModel()
is analogous to coefficient. Thus,
logFC is really just the estimate the coefficient of interest in
metagenomeSeq::fitFeatureModel()
. For more details see
these question Difference between fitFeatureModel and fitZIG in metagenomeSeq.
contrast
must be a two length character or NULL
(default). It is only
required to set manually for two groups comparison when there are multiple
groups. The order determines the direction of comparison, the first element
is used to specify the reference group (control). This means that, the first
element is the denominator for the fold change, and the second element is
used as baseline (numerator for fold change). Otherwise, users do required
to concern this paramerter (set as default NULL
), and if there are
two groups, the first level of groups will set as the reference group; if
there are multiple groups, it will perform an ANOVA-like testing to find
markers which difference in any of the groups.
Of note, metagenomeSeq::fitFeatureModel()
is not allows for multiple
groups comparison.
a microbiomeMarker
object.
Yang Cao
Paulson, Joseph N., et al. "Differential abundance analysis for microbial marker-gene surveys." Nature methods 10.12 (2013): 1200-1202.
data(enterotypes_arumugam) ps <- phyloseq::subset_samples( enterotypes_arumugam, Enterotype %in% c("Enterotype 3", "Enterotype 2") ) run_metagenomeseq(ps, group = "Enterotype")
data(enterotypes_arumugam) ps <- phyloseq::subset_samples( enterotypes_arumugam, Enterotype %in% c("Enterotype 3", "Enterotype 2") ) run_metagenomeseq(ps, group = "Enterotype")
Multiple group test, such as anova and Kruskal-Wallis rank sum test, can be used to uncover the significant feature among all groups. Post hoc tests are used to uncover specific mean differences between pair of groups.
run_posthoc_test( ps, group, transform = c("identity", "log10", "log10p"), norm = "TSS", norm_para = list(), conf_level = 0.95, method = c("tukey", "games_howell", "scheffe", "welch_uncorrected") )
run_posthoc_test( ps, group, transform = c("identity", "log10", "log10p"), norm = "TSS", norm_para = list(), conf_level = 0.95, method = c("tukey", "games_howell", "scheffe", "welch_uncorrected") )
ps |
a |
group |
character, the variable to set the group |
transform |
character, the methods used to transform the microbial
abundance. See
|
norm |
the methods used to normalize the microbial abundance data. See
|
norm_para |
arguments passed to specific normalization methods |
conf_level |
confidence level, default 0.95 |
method |
one of "tukey", "games_howell", "scheffe", "welch_uncorrected", defining the method for the pairwise comparisons. See details for more information. |
a postHocTest object
postHocTest, run_test_multiple_groups()
data(enterotypes_arumugam) ps <- phyloseq::subset_samples( enterotypes_arumugam, Enterotype %in% c("Enterotype 3", "Enterotype 2", "Enterotype 1") ) %>% phyloseq::subset_taxa(Phylum == "Bacteroidetes") pht <- run_posthoc_test(ps, group = "Enterotype") pht
data(enterotypes_arumugam) ps <- phyloseq::subset_samples( enterotypes_arumugam, Enterotype %in% c("Enterotype 3", "Enterotype 2", "Enterotype 1") ) %>% phyloseq::subset_taxa(Phylum == "Bacteroidetes") pht <- run_posthoc_test(ps, group = "Enterotype") pht
Perform simple statistical analysis of metagenomic profiles. This function
is a wrapper of run_test_two_groups
and run_test_multiple_groups
.
run_simple_stat( ps, group, taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "TSS", norm_para = list(), method = c("welch.test", "t.test", "white.test", "anova", "kruskal"), p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), pvalue_cutoff = 0.05, diff_mean_cutoff = NULL, ratio_cutoff = NULL, eta_squared_cutoff = NULL, conf_level = 0.95, nperm = 1000, ... )
run_simple_stat( ps, group, taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "TSS", norm_para = list(), method = c("welch.test", "t.test", "white.test", "anova", "kruskal"), p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), pvalue_cutoff = 0.05, diff_mean_cutoff = NULL, ratio_cutoff = NULL, eta_squared_cutoff = NULL, conf_level = 0.95, nperm = 1000, ... )
ps |
a |
group |
character, the variable to set the group |
taxa_rank |
character to specify taxonomic rank to perform
differential analysis on. Should be one of
|
transform |
character, the methods used to transform the microbial
abundance. See
|
norm |
the methods used to normalize the microbial abundance data. See
|
norm_para |
arguments passed to specific normalization methods |
method |
test method, options include: "welch.test", "t.test" and "white.test" for two groups comparison, "anova"and "kruskal" for multiple groups comparison. |
p_adjust |
method for multiple test correction, default |
pvalue_cutoff |
numeric, p value cutoff, default 0.05 |
diff_mean_cutoff , ratio_cutoff
|
only used for two groups comparison,
cutoff of different means and ratios, default |
eta_squared_cutoff |
only used for multiple groups comparison, numeric,
cutoff of effect size (eta squared) default |
conf_level |
only used for two groups comparison, numeric, confidence level of interval. |
nperm |
integer, only used for two groups comparison, number of permutations for white non parametric t test estimation |
... |
only used for two groups comparison, extra arguments passed to
|
a microbiomeMarker
object.
run_test_two_groups()
,run_test_multiple_groups()
data(enterotypes_arumugam) ps <- phyloseq::subset_samples( enterotypes_arumugam, Enterotype %in% c("Enterotype 3", "Enterotype 2") ) run_simple_stat(ps, group = "Enterotype")
data(enterotypes_arumugam) ps <- phyloseq::subset_samples( enterotypes_arumugam, Enterotype %in% c("Enterotype 3", "Enterotype 2") ) run_simple_stat(ps, group = "Enterotype")
Identify biomarkers using logistic regression, random forest, or support vector machine.
run_sl( ps, group, taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "none", norm_para = list(), nfolds = 3, nrepeats = 3, sampling = NULL, tune_length = 5, top_n = 10, method = c("LR", "RF", "SVM"), ... )
run_sl( ps, group, taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "none", norm_para = list(), nfolds = 3, nrepeats = 3, sampling = NULL, tune_length = 5, top_n = 10, method = c("LR", "RF", "SVM"), ... )
ps |
a |
group |
character, the variable to set the group. |
taxa_rank |
character to specify taxonomic rank to perform
differential analysis on. Should be one of
|
transform |
character, the methods used to transform the microbial
abundance. See
|
norm |
the methods used to normalize the microbial abundance data. See
|
norm_para |
named |
nfolds |
the number of splits in CV. |
nrepeats |
the number of complete sets of folds to compute. |
sampling |
a single character value describing the type of additional
sampling that is conducted after resampling (usually to resolve class
imbalances). Values are "none", "down", "up", "smote", or "rose". For
more details see |
tune_length |
an integer denoting the amount of granularity in the
tuning parameter grid. For more details see |
top_n |
an integer denoting the top |
method |
supervised learning method, options are "LR" (logistic regression), "RF" (rando forest), or "SVM" (support vector machine). |
... |
extra arguments passed to the classification. e.g., |
Only support two groups comparison in the current version. And the marker was selected based on its importance score. Moreover, The hyper-parameters are selected automatically by a grid-search based method in the N-time K-fold cross-validation. Thus, the identified biomarker based can be biased due to model overfitting for small datasets (e.g., with less than 100 samples).
The argument top_n
is used to denote the number of markers based on the
importance score. There is no rule or principle on how to select top_n
,
however, usually it is very useful to try a different top_n
and compare
the performance of the marker predictions for the testing data.
a microbiomeMarker object.
Yang Cao
caret::train()
,caret::trainControl()
data(enterotypes_arumugam) # small example phyloseq object for test ps_small <- phyloseq::subset_taxa( enterotypes_arumugam, Phylum %in% c("Firmicutes", "Bacteroidetes") ) set.seed(2021) mm <- run_sl( ps_small, group = "Gender", taxa_rank = "Genus", nfolds = 2, nrepeats = 1, top_n = 15, norm = "TSS", method = "LR", ) mm
data(enterotypes_arumugam) # small example phyloseq object for test ps_small <- phyloseq::subset_taxa( enterotypes_arumugam, Phylum %in% c("Firmicutes", "Bacteroidetes") ) set.seed(2021) mm <- run_sl( ps_small, group = "Gender", taxa_rank = "Genus", nfolds = 2, nrepeats = 1, top_n = 15, norm = "TSS", method = "LR", ) mm
Statistical test for multiple groups
run_test_multiple_groups( ps, group, taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "TSS", norm_para = list(), method = c("anova", "kruskal"), p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), pvalue_cutoff = 0.05, effect_size_cutoff = NULL )
run_test_multiple_groups( ps, group, taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "TSS", norm_para = list(), method = c("anova", "kruskal"), p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), pvalue_cutoff = 0.05, effect_size_cutoff = NULL )
ps |
a |
group |
character, the variable to set the group |
taxa_rank |
character to specify taxonomic rank to perform
differential analysis on. Should be one of
|
transform |
character, the methods used to transform the microbial
abundance. See
|
norm |
the methods used to normalize the microbial abundance data. See
|
norm_para |
arguments passed to specific normalization methods |
method |
test method, must be one of "anova" or "kruskal" |
p_adjust |
method for multiple test correction, default |
pvalue_cutoff |
numeric, p value cutoff, default 0.05. |
effect_size_cutoff |
numeric, cutoff of effect size default |
a microbiomeMarker
object.
run_posthoc_test()
,run_test_two_groups()
,run_simple_stat()
data(enterotypes_arumugam) ps <- phyloseq::subset_samples( enterotypes_arumugam, Enterotype %in% c("Enterotype 3", "Enterotype 2", "Enterotype 1") ) mm_anova <- run_test_multiple_groups( ps, group = "Enterotype", method = "anova" )
data(enterotypes_arumugam) ps <- phyloseq::subset_samples( enterotypes_arumugam, Enterotype %in% c("Enterotype 3", "Enterotype 2", "Enterotype 1") ) mm_anova <- run_test_multiple_groups( ps, group = "Enterotype", method = "anova" )
Statistical test between two groups
run_test_two_groups( ps, group, taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "TSS", norm_para = list(), method = c("welch.test", "t.test", "white.test"), p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), pvalue_cutoff = 0.05, diff_mean_cutoff = NULL, ratio_cutoff = NULL, conf_level = 0.95, nperm = 1000, ... )
run_test_two_groups( ps, group, taxa_rank = "all", transform = c("identity", "log10", "log10p"), norm = "TSS", norm_para = list(), method = c("welch.test", "t.test", "white.test"), p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"), pvalue_cutoff = 0.05, diff_mean_cutoff = NULL, ratio_cutoff = NULL, conf_level = 0.95, nperm = 1000, ... )
ps |
a |
group |
character, the variable to set the group |
taxa_rank |
character to specify taxonomic rank to perform
differential analysis on. Should be one of
|
transform |
character, the methods used to transform the microbial
abundance. See
|
norm |
the methods used to normalize the microbial abundance data. See
|
norm_para |
arguments passed to specific normalization methods |
method |
test method, must be one of "welch.test", "t.test" or "white.test" |
p_adjust |
method for multiple test correction, default |
pvalue_cutoff |
numeric, p value cutoff, default 0.05 |
diff_mean_cutoff , ratio_cutoff
|
cutoff of different means and ratios,
default |
conf_level |
numeric, confidence level of interval. |
nperm |
integer, number of permutations for white non parametric t test estimation |
... |
extra arguments passed to |
a microbiomeMarker
object.
Yang Cao
run_test_multiple_groups()
,run_simple_stat
data(enterotypes_arumugam) mm_welch <- run_test_two_groups( enterotypes_arumugam, group = "Gender", method = "welch.test" ) mm_welch
data(enterotypes_arumugam) mm_welch <- run_test_two_groups( enterotypes_arumugam, group = "Gender", method = "welch.test" ) mm_welch
Subset markers based on an expression related to the columns and values
within the marker_table
slot of mm
.
subset_marker(mm, ...)
subset_marker(mm, ...)
mm |
a |
... |
the subsetting expression passed to |
a subset object in the same class with mm
.
data(enterotypes_arumugam) mm <- run_limma_voom( enterotypes_arumugam, "Enterotype", contrast = c("Enterotype 3", "Enterotype 2"), pvalue_cutoff = 0.01, p_adjust = "none" ) subset_marker(mm, pvalue < 0.005)
data(enterotypes_arumugam) mm <- run_limma_voom( enterotypes_arumugam, "Enterotype", contrast = c("Enterotype 3", "Enterotype 2"), pvalue_cutoff = 0.01, p_adjust = "none" ) subset_marker(mm, pvalue < 0.005)
Provides summary information of the representation of a taxonomic levels within each sample.
summarize_taxa(ps, level = rank_names(ps)[1], absolute = TRUE, sep = "|")
summarize_taxa(ps, level = rank_names(ps)[1], absolute = TRUE, sep = "|")
ps |
a |
level |
taxonomic level to summarize, default the top level rank of the
|
absolute |
logical, whether return the absolute abundance or
relative abundance, default |
sep |
a character string to separate the taxonomic levels. |
a phyloseq::phyloseq
object, where each row represents a
taxa, and each col represents the taxa abundance of each sample.
data(enterotypes_arumugam) summarize_taxa(enterotypes_arumugam)
data(enterotypes_arumugam) summarize_taxa(enterotypes_arumugam)
Summary differential analysis methods comparison results
## S3 method for class 'compareDA' summary( object, sort = c("score", "auc", "fpr", "power"), boot = TRUE, boot_n = 1000L, prob = c(0.05, 0.95), ... )
## S3 method for class 'compareDA' summary( object, sort = c("score", "auc", "fpr", "power"), boot = TRUE, boot_n = 1000L, prob = c(0.05, 0.95), ... )
object |
an |
sort |
character string specifying sort method. Possibilities are
"score" which is calculated as |
boot |
logical, whether use bootstrap for confidence limites of the
score, default |
boot_n |
integer, number of bootstraps, default 1000L. |
prob |
two length numeric vector, confidence limits for score, default
|
... |
extra arguments affecting the summary produced. |
a data.frame
containing measurements for differential analysis
methods:
call
: differential analysis commands.
auc
: area under curve of ROC.
fpr
: false positive rate
power
: empirical power.
fdr
: false discover7y rate.
score
: score whch is calculated as .
score_*
: confidence limits of score.
otu_table
sample by sampleTransform the taxa abundances in otu_table
sample by sample, which means
the counts of each sample will be transformed individually.
transform_abundances(object, transform = c("identity", "log10", "log10p"))
transform_abundances(object, transform = c("identity", "log10", "log10p"))
object |
|
transform |
transformation to apply, the options inclulde:
|
A object matches the class of argument object
with the transformed
otu_table
.
data(oxygen) x1 <- transform_abundances(oxygen) head(otu_table(x1), 10) x2 <- transform_abundances(oxygen, "log10") head(otu_table(x2), 10) x3 <- transform_abundances(oxygen, "log10p") head(otu_table(x3), 10)
data(oxygen) x1 <- transform_abundances(oxygen) head(otu_table(x1), 10) x2 <- transform_abundances(oxygen, "log10") head(otu_table(x2), 10) x3 <- transform_abundances(oxygen, "log10p") head(otu_table(x3), 10)