Title: | Dirichlet-Multinomial Mixture Model Machine Learning for Microbiome Data |
---|---|
Description: | Dirichlet-multinomial mixture models can be used to describe variability in microbial metagenomic data. This package is an interface to code originally made available by Holmes, Harris, and Quince, 2012, PLoS ONE 7(2): 1-15, as discussed further in the man page for this package, ?DirichletMultinomial. |
Authors: | Martin Morgan [aut, cre] |
Maintainer: | Martin Morgan <[email protected]> |
License: | LGPL-3 |
Version: | 1.47.2 |
Built: | 2024-11-13 11:17:41 UTC |
Source: | https://github.com/mtmorgan/DirichletMultinomial |
Dirichlet-multinomial mixture models can be used to describe variability in microbial metagenomic data. This package is an interface to code originally made available by Holmes, Harris, and Qunice, 2012, PLoS ONE 7(2): 1-15.
The estimation routine is from the LGPL-licensed (as stated on the corresponding googlecode page) source http://microbedmm.googlecode.com/files/MicrobeDMMv1.0.tar.gz, retrieved 17 Feburary 2012.
The algorithm is described in Holmes I, Harris K, Quince C, 2012 Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics. PLoS ONE 7(2): e30126. doi:10.1371/journal.pone.0030126.
Maintainer: Martin Morgan mailto:[email protected]
Run cross-validation on Dirichlet-Multinomial generative classifiers.
cvdmngroup(ncv, count, k, z, ..., verbose = FALSE, .lapply = parallel::mclapply)
cvdmngroup(ncv, count, k, z, ..., verbose = FALSE, .lapply = parallel::mclapply)
ncv |
|
count |
|
k |
named |
z |
True group assignment. |
... |
Additional arguments, passed to |
verbose |
|
.lapply |
A function used to perform the outer cross-vaildation
loop, e.g., |
A data.frame
summarizing classifications of test samples in
cross-validation groups. Columns are:
group |
The cross-validation group in which the indivdual was used for testing. |
additional columns |
Named after classification groups, giving the posterior probability of assignment. |
Martin Morgan mailto:[email protected]
dmn
, DirichletMultinomial-package,
vignette("DirichletMultinomial")
data(xval) ## result of following commands head(xval) ## Not run: ## count matrix fl <- system.file(package="DirichletMultinomial", "extdata", "Twins.csv") count <- t(as.matrix(read.csv(fl, row.names=1))) ## phenotype fl <- system.file(package="DirichletMultinomial", "extdata", "TwinStudy.t") pheno0 <- scan(fl) lvls <- c("Lean", "Obese", "Overwt") pheno <- factor(lvls[pheno0 + 1], levels=lvls) names(pheno) <- rownames(count) ## subset keep <- c("Lean", "Obese") count <- count[pheno pheno <- factor(pheno[pheno ## cross-validation, single Dirichlet component for Lean, 3 for Obese xval <- cvdmngroup(nrow(count), count, c(Lean=1, Obese=3), pheno, verbose=TRUE, mc.preschedule=FALSE) ## End(Not run)
data(xval) ## result of following commands head(xval) ## Not run: ## count matrix fl <- system.file(package="DirichletMultinomial", "extdata", "Twins.csv") count <- t(as.matrix(read.csv(fl, row.names=1))) ## phenotype fl <- system.file(package="DirichletMultinomial", "extdata", "TwinStudy.t") pheno0 <- scan(fl) lvls <- c("Lean", "Obese", "Overwt") pheno <- factor(lvls[pheno0 + 1], levels=lvls) names(pheno) <- rownames(count) ## subset keep <- c("Lean", "Obese") count <- count[pheno pheno <- factor(pheno[pheno ## cross-validation, single Dirichlet component for Lean, 3 for Obese xval <- cvdmngroup(nrow(count), count, c(Lean=1, Obese=3), pheno, verbose=TRUE, mc.preschedule=FALSE) ## End(Not run)
These data objects correspond to steps in a typical work flow, as
described in the vignette to this package. fit
corresponds to
dmn
fits to different values of k
. bestgroup
is
the result of the two-group generative classifier. xval
summarizes leave-one-out cross validation of the classifier.
data(fit) data(bestgrp) data(xval)
data(fit) data(bestgrp) data(xval)
fit
is a list of seven DMN
objects.
bestgrp
is a DMNGroup
object.
xval
is a data.frame
with columns corresponding to the
cross-validation group membership and the Lean and Obese posterior
probabilities.
data(fit); fit[1:2] plot(sapply(fit, laplace), type="b") data(bestgrp); bestgrp data(xval); head(xval, 3)
data(fit); fit[1:2] plot(sapply(fit, laplace), type="b") data(bestgrp); bestgrp data(xval); head(xval, 3)
Fit Dirichlet-Multinomial models to a sample x taxon count matrix.
dmn(count, k, verbose = FALSE, seed = runif(1, 0, .Machine$integer.max))
dmn(count, k, verbose = FALSE, seed = runif(1, 0, .Machine$integer.max))
count |
|
k |
|
verbose |
|
seed |
|
This implements Dirichlet-multinomial mixture models describe in the package help page, DirichletMultinomial-package.
An object of class dmn
, with elements (elements are usually
retrieved via functions defined in the package, not directly).
GoodnessOfFit |
NLE, LogDet, Laplace, AIC, and BIC criteria assessing goodness-of-fit. |
Group |
|
Mixture |
|
Fit |
|
Martin Morgan mailto:[email protected]
Holmes I, Harris K, Quince C, 2012 Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics. PLoS ONE 7(2): e30126. doi:10.1371/journal.pone.0030126.
DirichletMultinomial-package,
vignette("DirichletMultinomial")
data(fit) ## k = 1:7; full example in vignette lplc <- sapply(fit, laplace) plot(lplc, type="b") fit[[which.min(lplc)]]
data(fit) ## k = 1:7; full example in vignette lplc <- sapply(fit, laplace) plot(lplc, type="b") fit[[which.min(lplc)]]
"DMN"
Result from fitting a Dirichlet-Multinomial model.
Objects can be created by calls to dmn
..
The contents of a slot is usually retrieved via the methods described
on the mixture
help page.
NLE, LogDet, Laplace, AIC, and BIC criteria assessing goodness-of-fit.
matrix
of dimension samples x k
,
providing the Dirichlet parameter vectors.
numeric()
of length k
, with relative
weight of each component.
matrix()
of dimension taxa x k
with
95% lower bounds on Dirichlet component vector estimates.
matrix()
of dimension taxa x k
with
Dirichlet component vector estimates.
matrix()
of dimension taxa x k
with
95% upper bounds on Dirichlet component vector estimates.
See the mixture
help page.
Martin Morgan mailto:[email protected]
data(fit) fit[[4]]
data(fit) fit[[4]]
Fit Dirichlet-Multinomial generative classifiers to groups (rows) within a sample x taxon count matrix.
dmngroup(count, group, k, ..., simplify = TRUE, .lapply = parallel::mclapply)
dmngroup(count, group, k, ..., simplify = TRUE, .lapply = parallel::mclapply)
count |
|
group |
|
k |
|
... |
Additional arguments, passed to |
simplify |
Return only the best-fit model for each group? |
.lapply |
An |
This function divided count
into groups defined by
group
, creates all combinations of group
x k
,
and evaluates each using dmn
. When simplify=TRUE
,
the best (Laplace) fit is selected for each group.
An object of class dmngroup
, a list of fitted models of class
dmn
. When simplify=TRUE
, elements are named by
the group to which they correspond.
Martin Morgan mailto:[email protected]
Holmes I, Harris K, Quince C, 2012 Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics. PLoS ONE 7(2): e30126. doi:10.1371/journal.pone.0030126.
dmn
, DirichletMultinomial-package,
vignette("DirichletMultinomial")
## best fit for groups 'Lean' and 'Obese'; full example in vignette. ## Not run: bestgrp <- dmngroup(count, pheno, k=1:5, verbose=TRUE, mc.preschedule=FALSE) ## End(Not run) data(bestgrp) bestgrp bestgrp[["Obese"]]
## best fit for groups 'Lean' and 'Obese'; full example in vignette. ## Not run: bestgrp <- dmngroup(count, pheno, k=1:5, verbose=TRUE, mc.preschedule=FALSE) ## End(Not run) data(bestgrp) bestgrp bestgrp[["Obese"]]
"DMNGroup"
Result from fitting a Dirichlet-Multinomial generative classifier.
Objects can be created by calls to dmngroup
.
All slots in this class are inheritted from SimpleList
; see
‘Methods’, below, for information on how to manipulate this
object.
Class "SimpleList"
, directly.
Class "List"
, by class "SimpleList", distance 2.
Class "Vector"
, by class "SimpleList", distance 3.
Class "Annotated"
, by class "SimpleList", distance 4.
See the mixture
help page for functions that operate on
DMNGroup
and DMN
.
DMNGroup
can be manipulated as a list; see
SimpleList
for a description of typical
list-like functions.
Martin Morgan mailto:[email protected]
mixture
, DMN
,
SimpleList
.
data(bestgrp) bestgrp bestgrp[[1]]
data(bestgrp) bestgrp bestgrp[[1]]
Produce a heat map summarizing count data, grouped by Dirichlet component.
heatmapdmn(count, fit1, fitN, ntaxa = 30, ..., transform = sqrt, lblwidth = 0.2 * nrow(count), col = .gradient)
heatmapdmn(count, fit1, fitN, ntaxa = 30, ..., transform = sqrt, lblwidth = 0.2 * nrow(count), col = .gradient)
count |
A matrix of sample x taxon counts, as supplied to
|
fit1 |
An instance of class |
fitN |
An instance of class |
ntaxa |
The |
... |
Additional arguments, ignored. |
transform |
Transformation to apply to count data prior to visualization; this does not influence mixture membership or taxnomic ordering. |
lblwidth |
The proportion of the plot to dedicate to taxanomic labels, as a fraction of the number of samples to be plotted. |
col |
The colors used to display (possibly transformed, by
|
Columns of the heat map correspond to samples. Samples are grouped by Dirichlet component, with average (Dirichlet) components summarized as a separate wide column. Rows correspond to taxonomic groups, ordered based on contribution to Dirichlet components.
Martin Morgan mailto:[email protected]
## counts fl <- system.file(package="DirichletMultinomial", "extdata", "Twins.csv") count <- t(as.matrix(read.csv(fl, row.names=1))) ## all and best-fit clustering data(fit) lplc <- sapply(fit, laplace) best <- fit[[which.min(lplc)]] heatmapdmn(count, fit[[1]], best, 30)
## counts fl <- system.file(package="DirichletMultinomial", "extdata", "Twins.csv") count <- t(as.matrix(read.csv(fl, row.names=1))) ## all and best-fit clustering data(fit) lplc <- sapply(fit, laplace) best <- fit[[which.min(lplc)]] heatmapdmn(count, fit[[1]], best, 30)
The accessors mixture
and mixturewt
return information
about the estimated Dirichlet components of the fitted
model. Return values are described in the Values section, below.
mixture(object, ..., assign=FALSE) mixturewt(object, ...) goodnessOfFit(object, ...) laplace(object, ...) ## S4 method for signature 'DMN' AIC(object, ..., k = 2) ## S4 method for signature 'DMN' BIC(object, ...) ## S4 method for signature 'DMN' fitted(object, ..., scale=FALSE) ## S4 method for signature 'DMN' predict(object, newdata, ..., logevidence=FALSE) ## S4 method for signature 'DMNGroup' fitted(object, ...) ## S4 method for signature 'DMNGroup' predict(object, newdata, ..., assign=FALSE) ## S4 method for signature 'DMNGroup' summary(object, ...)
mixture(object, ..., assign=FALSE) mixturewt(object, ...) goodnessOfFit(object, ...) laplace(object, ...) ## S4 method for signature 'DMN' AIC(object, ..., k = 2) ## S4 method for signature 'DMN' BIC(object, ...) ## S4 method for signature 'DMN' fitted(object, ..., scale=FALSE) ## S4 method for signature 'DMN' predict(object, newdata, ..., logevidence=FALSE) ## S4 method for signature 'DMNGroup' fitted(object, ...) ## S4 method for signature 'DMNGroup' predict(object, newdata, ..., assign=FALSE) ## S4 method for signature 'DMNGroup' summary(object, ...)
object |
An instance of class |
newdata |
A |
... |
Additional arguments, available to methods, when applicable. |
assign |
|
scale |
|
logevidence |
|
k |
ignored. |
mixture
with assign=FALSE
returns a matrix of sample x
Dirichlet component estimates. With assign=TRUE
mixture
returns a named vector indexing the maximal Dirichlet component of
each sample.
mixturewt
returns a matrix with rows corresponding to mixture
components, and columns pi
(component weight) and theta
(component variability). Small values of theta
correspond to
highly variable components.
goodnessOfFit
returns a named numeric vector of measures of
goodness of fit.
laplace
, AIC
, and BIC
return the corresponding
measures of goodness of fit.
Martin Morgan mailto:[email protected]
data(fit) best <- fit[[4]] mixturewt(best) head(mixture(best), 3) head(mixture(best, assign=TRUE), 3) goodnessOfFit(best) fl <- system.file(package="DirichletMultinomial", "extdata", "Twins.csv") count <- t(as.matrix(read.csv(fl, row.names=1))) data(bestgrp) bestgrp head(predict(bestgrp, count))
data(fit) best <- fit[[4]] mixturewt(best) head(mixture(best), 3) head(mixture(best, assign=TRUE), 3) goodnessOfFit(best) fl <- system.file(package="DirichletMultinomial", "extdata", "Twins.csv") count <- t(as.matrix(read.csv(fl, row.names=1))) data(bestgrp) bestgrp head(predict(bestgrp, count))
Returns a data.frame
summarizing the cummulative true- and
false-positive probabilities from expected and observed
classifications.
roc(exp, obs, ...)
roc(exp, obs, ...)
exp |
|
obs |
Predicted probability of assignment to the group identified
by |
... |
Additional arguments, available to methods. |
A data.frame
with columns
TruePositive |
Cummulative probability of correct assignment. |
FalsePositive |
Cummulative probability of incorrect assignment. |
Martin Morgan mailto:[email protected]
library(lattice) ## count matrix fl <- system.file(package="DirichletMultinomial", "extdata", "Twins.csv") count <- t(as.matrix(read.csv(fl, row.names=1))) ## phenotype fl <- system.file(package="DirichletMultinomial", "extdata", "TwinStudy.t") pheno0 <- scan(fl) lvls <- c("Lean", "Obese", "Overwt") pheno <- factor(lvls[pheno0 + 1], levels=lvls) names(pheno) <- rownames(count) ## count data used for cross-validation, and cross-validation count <- csubset(c("Lean", "Obese"), count, pheno) data(bestgrp) ## true, false positives from single-group classifier bst <- roc(pheno[rownames(count)] == "Obese", predict(bestgrp, count)[,"Obese"]) head(bst) ## lattice plot xyplot(TruePostive ~ FalsePositive, bst, type="l", xlab="False Positive", ylab="True Positive")
library(lattice) ## count matrix fl <- system.file(package="DirichletMultinomial", "extdata", "Twins.csv") count <- t(as.matrix(read.csv(fl, row.names=1))) ## phenotype fl <- system.file(package="DirichletMultinomial", "extdata", "TwinStudy.t") pheno0 <- scan(fl) lvls <- c("Lean", "Obese", "Overwt") pheno <- factor(lvls[pheno0 + 1], levels=lvls) names(pheno) <- rownames(count) ## count data used for cross-validation, and cross-validation count <- csubset(c("Lean", "Obese"), count, pheno) data(bestgrp) ## true, false positives from single-group classifier bst <- roc(pheno[rownames(count)] == "Obese", predict(bestgrp, count)[,"Obese"]) head(bst) ## lattice plot xyplot(TruePostive ~ FalsePositive, bst, type="l", xlab="False Positive", ylab="True Positive")
csubset
creates a subset of a count matrix, based on identity
of column phenotypes to a specified value.
csubset(val, x, pheno, cidx = TRUE)
csubset(val, x, pheno, cidx = TRUE)
val |
|
x |
A matrix of counts, with rows corresponding to samples and columns to taxonomic groups. |
pheno |
A |
cidx |
A |
A matrix
of counts, with rows satisfying pheno %in%
val
and with columns equal either to ncol(x)
(when
cidx=TRUE
) or the number of columns with non-zero counts after
row subsetting (cidx=FALSE
).
Martin Morgan mailto:[email protected]
## count matrix fl <- system.file(package="DirichletMultinomial", "extdata", "Twins.csv") count <- t(as.matrix(read.csv(fl, row.names=1))) ## phenotype fl <- system.file(package="DirichletMultinomial", "extdata", "TwinStudy.t") pheno0 <- scan(fl) lvls <- c("Lean", "Obese", "Overwt") pheno <- factor(lvls[pheno0 + 1], levels=lvls) names(pheno) <- rownames(count) ## subset dim(count) sum("Lean" == pheno) dim(csubset("Lean", count, pheno)) dim(csubset("Lean", count, pheno, cidx=FALSE))
## count matrix fl <- system.file(package="DirichletMultinomial", "extdata", "Twins.csv") count <- t(as.matrix(read.csv(fl, row.names=1))) ## phenotype fl <- system.file(package="DirichletMultinomial", "extdata", "TwinStudy.t") pheno0 <- scan(fl) lvls <- c("Lean", "Obese", "Overwt") pheno <- factor(lvls[pheno0 + 1], levels=lvls) names(pheno) <- rownames(count) ## subset dim(count) sum("Lean" == pheno) dim(csubset("Lean", count, pheno)) dim(csubset("Lean", count, pheno, cidx=FALSE))