Package 'DirichletMultinomial' reference manual

Title:	Dirichlet-Multinomial Mixture Model Machine Learning for Microbiome Data
Description:	Dirichlet-multinomial mixture models can be used to describe variability in microbial metagenomic data. This package is an interface to code originally made available by Holmes, Harris, and Quince, 2012, PLoS ONE 7(2): 1-15, as discussed further in the man page for this package, ?DirichletMultinomial.
Authors:	Martin Morgan [aut, cre]
Maintainer:	Martin Morgan <[email protected]>
License:	LGPL-3
Version:	1.47.2
Built:	2024-11-13 11:17:41 UTC
Source:	https://github.com/mtmorgan/DirichletMultinomial

Dirichlet-Multinomial Mixture Model Machine Learning for Microbiome Data

Description

Dirichlet-multinomial mixture models can be used to describe variability in microbial metagenomic data. This package is an interface to code originally made available by Holmes, Harris, and Qunice, 2012, PLoS ONE 7(2): 1-15.

Details

The estimation routine is from the LGPL-licensed (as stated on the corresponding googlecode page) source http://microbedmm.googlecode.com/files/MicrobeDMMv1.0.tar.gz, retrieved 17 Feburary 2012.

The algorithm is described in Holmes I, Harris K, Quince C, 2012 Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics. PLoS ONE 7(2): e30126. doi:10.1371/journal.pone.0030126.

Author(s)

Maintainer: Martin Morgan mailto:[email protected]

Cross-validation on Dirichlet-Multinomial classifiers.

Description

Run cross-validation on Dirichlet-Multinomial generative classifiers.

Usage

cvdmngroup(ncv, count, k, z, ..., verbose = FALSE,
    .lapply = parallel::mclapply)
cvdmngroup(ncv, count, k, z, ..., verbose = FALSE,
    .lapply = parallel::mclapply)

Arguments

`ncv`	`integer(1)` number of cross-validation groups, between 2 and `nrow(count)`.
`count`	`matrix` of sample x taxon counts, subsets of which are used for training and cross-validation.
`k`	named `integer()` vector of groups and number of Dirichlet components; e.g., `c(Lean=1, Obese=3)` performs cross-validation for models with `k=1` Dirichlet components for the ‘Lean’ group, `k=3` Dirichlet components for ‘Obese’.
`z`	True group assignment.
`...`	Additional arguments, passed to `dmn` during each cross-validation.
`verbose`	`logical(1)` indicating whether progress should be reported
`.lapply`	A function used to perform the outer cross-vaildation loop, e.g., `lapply` for calculation on a single processor, `parallel::mclapply` for parallel evaluation.

Value

A data.frame summarizing classifications of test samples in cross-validation groups. Columns are:

`group`	The cross-validation group in which the indivdual was used for testing.
`additional columns`	Named after classification groups, giving the posterior probability of assignment.

Author(s)

Martin Morgan mailto:[email protected]

Examples

data(xval)  ## result of following commands
head(xval)

## Not run: 
## count matrix
fl <- system.file(package="DirichletMultinomial", "extdata",
                  "Twins.csv")
count <- t(as.matrix(read.csv(fl, row.names=1)))

## phenotype
fl <- system.file(package="DirichletMultinomial", "extdata",
                  "TwinStudy.t")
pheno0 <- scan(fl)
lvls <- c("Lean", "Obese", "Overwt")
pheno <- factor(lvls[pheno0 + 1], levels=lvls)
names(pheno) <- rownames(count)

## subset
keep <- c("Lean", "Obese")
count <- count[pheno 
pheno <- factor(pheno[pheno 

## cross-validation, single Dirichlet component for Lean, 3 for Obese
xval <- cvdmngroup(nrow(count), count, c(Lean=1, Obese=3), pheno,
                   verbose=TRUE, mc.preschedule=FALSE)

## End(Not run)data(xval)  ## result of following commands
head(xval)

## Not run: 
## count matrix
fl <- system.file(package="DirichletMultinomial", "extdata",
                  "Twins.csv")
count <- t(as.matrix(read.csv(fl, row.names=1)))

## phenotype
fl <- system.file(package="DirichletMultinomial", "extdata",
                  "TwinStudy.t")
pheno0 <- scan(fl)
lvls <- c("Lean", "Obese", "Overwt")
pheno <- factor(lvls[pheno0 + 1], levels=lvls)
names(pheno) <- rownames(count)

## subset
keep <- c("Lean", "Obese")
count <- count[pheno 
pheno <- factor(pheno[pheno 

## cross-validation, single Dirichlet component for Lean, 3 for Obese
xval <- cvdmngroup(nrow(count), count, c(Lean=1, Obese=3), pheno,
                   verbose=TRUE, mc.preschedule=FALSE)

## End(Not run)

Data objects used for examples and the vignette

Description

These data objects correspond to steps in a typical work flow, as described in the vignette to this package. fit corresponds to dmn fits to different values of k. bestgroup is the result of the two-group generative classifier. xval summarizes leave-one-out cross validation of the classifier.

Usage

data(fit)
data(bestgrp)
data(xval)
data(fit)
data(bestgrp)
data(xval)

Format

fit is a list of seven DMN objects.

bestgrp is a DMNGroup object.

xval is a data.frame with columns corresponding to the cross-validation group membership and the Lean and Obese posterior probabilities.

Examples

data(fit); fit[1:2]
plot(sapply(fit, laplace), type="b")
data(bestgrp); bestgrp
data(xval); head(xval, 3)
data(fit); fit[1:2]
plot(sapply(fit, laplace), type="b")
data(bestgrp); bestgrp
data(xval); head(xval, 3)

Fit Dirichlet-Multinomial models to count data.

Description

Fit Dirichlet-Multinomial models to a sample x taxon count matrix.

Usage

dmn(count, k, verbose = FALSE, seed = runif(1, 0, .Machine$integer.max))
dmn(count, k, verbose = FALSE, seed = runif(1, 0, .Machine$integer.max))

Arguments

`count`	`matrix()` of sample x taxon counts.
`k`	`integer(1)`, the number of Dirichlet components to fit.
`verbose`	`logical(1)` indicating whether progress in fit should be reported.
`seed`	`numeric(1)` random number seed.

Details

This implements Dirichlet-multinomial mixture models describe in the package help page, DirichletMultinomial-package.

Value

An object of class dmn, with elements (elements are usually retrieved via functions defined in the package, not directly).

`GoodnessOfFit`	NLE, LogDet, Laplace, AIC, and BIC criteria assessing goodness-of-fit.
`Group`	`matrix` of dimension samples x `k`, providing the Dirichlet parameter vectors.
`Mixture`	Weight `numeric()` of length `k`, with relative weight of each component.
`Fit`	Lower `matrix()` of dimension taxa x `k` with 95% lower bounds on Dirichlet component vector estimates. Estimate `matrix()` of dimension taxa x `k` with Dirichlet component vector estimates. Upper `matrix()` of dimension taxa x `k` with 95% upper bounds on Dirichlet component vector estimates.

Author(s)

Martin Morgan mailto:[email protected]

References

Holmes I, Harris K, Quince C, 2012 Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics. PLoS ONE 7(2): e30126. doi:10.1371/journal.pone.0030126.

Examples

data(fit)
## k = 1:7; full example in vignette
lplc <- sapply(fit, laplace)
plot(lplc, type="b")
fit[[which.min(lplc)]]
data(fit)
## k = 1:7; full example in vignette
lplc <- sapply(fit, laplace)
plot(lplc, type="b")
fit[[which.min(lplc)]]

Class `"DMN"`

Description

Result from fitting a Dirichlet-Multinomial model.

Objects from the Class

Objects can be created by calls to dmn..

Slots

The contents of a slot is usually retrieved via the methods described on the mixture help page.

goodnessOfFit

NLE, LogDet, Laplace, AIC, and BIC criteria assessing goodness-of-fit.

group

matrix of dimension samples x k, providing the Dirichlet parameter vectors.

mixture

Weight: numeric() of length k, with relative weight of each component.

fit

Lower: matrix() of dimension taxa x k with 95% lower bounds on Dirichlet component vector estimates.
Estimate: matrix() of dimension taxa x k with Dirichlet component vector estimates.
Upper: matrix() of dimension taxa x k with 95% upper bounds on Dirichlet component vector estimates.

Methods

See the mixture help page.

Author(s)

Martin Morgan mailto:[email protected]

Examples

data(fit)
fit[[4]]
data(fit)
fit[[4]]

Dirichlet-Multinomial generative classifiers.

Description

Fit Dirichlet-Multinomial generative classifiers to groups (rows) within a sample x taxon count matrix.

Usage


dmngroup(count, group, k, ..., simplify = TRUE,
    .lapply = parallel::mclapply)

dmngroup(count, group, k, ..., simplify = TRUE,
    .lapply = parallel::mclapply)

Arguments

`count`	`matrix()` of sample x taxon counts.
`group`	`factor()` or vector to be coerced to a factor, with as many elements as there are rows in `count`, indicating the group to which the corresponding sample belongs.
`k`	`integer()`, the number(s) of Dirichlet components to fit.
`...`	Additional arguments, passed to `dmn`.
`simplify`	Return only the best-fit model for each group?
`.lapply`	An `lapply`-like function for application of group x k fits.

Details

This function divided count into groups defined by group, creates all combinations of group x k, and evaluates each using dmn. When simplify=TRUE, the best (Laplace) fit is selected for each group.

Value

An object of class dmngroup, a list of fitted models of class dmn. When simplify=TRUE, elements are named by the group to which they correspond.

Author(s)

Martin Morgan mailto:[email protected]

References

Holmes I, Harris K, Quince C, 2012 Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics. PLoS ONE 7(2): e30126. doi:10.1371/journal.pone.0030126.

Examples

## best fit for groups 'Lean' and 'Obese'; full example in vignette.
## Not run: bestgrp <- dmngroup(count, pheno, k=1:5, verbose=TRUE, 
                    mc.preschedule=FALSE)

## End(Not run)
data(bestgrp)
bestgrp
bestgrp[["Obese"]]
## best fit for groups 'Lean' and 'Obese'; full example in vignette.
## Not run: bestgrp <- dmngroup(count, pheno, k=1:5, verbose=TRUE, 
                    mc.preschedule=FALSE)

## End(Not run)
data(bestgrp)
bestgrp
bestgrp[["Obese"]]

Class `"DMNGroup"`

Description

Result from fitting a Dirichlet-Multinomial generative classifier.

Objects from the Class

Objects can be created by calls to dmngroup.

Slots

All slots in this class are inheritted from SimpleList; see ‘Methods’, below, for information on how to manipulate this object.

Extends

Class "SimpleList", directly. Class "List", by class "SimpleList", distance 2. Class "Vector", by class "SimpleList", distance 3. Class "Annotated", by class "SimpleList", distance 4.

Methods

See the mixture help page for functions that operate on DMNGroup and DMN.

DMNGroup can be manipulated as a list; see SimpleList for a description of typical list-like functions.

Author(s)

Martin Morgan mailto:[email protected]

Examples

data(bestgrp)
bestgrp
bestgrp[[1]]
data(bestgrp)
bestgrp
bestgrp[[1]]

Heatmap representation of samples assigned to Dirichlet components.

Description

Produce a heat map summarizing count data, grouped by Dirichlet component.

Usage

heatmapdmn(count, fit1, fitN, ntaxa = 30, ...,
    transform = sqrt, lblwidth = 0.2 * nrow(count), col = .gradient)
heatmapdmn(count, fit1, fitN, ntaxa = 30, ...,
    transform = sqrt, lblwidth = 0.2 * nrow(count), col = .gradient)

Arguments

`count`	A matrix of sample x taxon counts, as supplied to `dmn`.
`fit1`	An instance of class `dmn`, from a model fit to a single Dirichlet component, `k=1` in `dmn`.
`fitN`	An instance of class `dmn`, from a model fit to `N != 1` components, `k=N` in `dmn`.
`ntaxa`	The `ntaxa` most numerous taxa to display counts for.
`...`	Additional arguments, ignored.
`transform`	Transformation to apply to count data prior to visualization; this does not influence mixture membership or taxnomic ordering.
`lblwidth`	The proportion of the plot to dedicate to taxanomic labels, as a fraction of the number of samples to be plotted.
`col`	The colors used to display (possibly transformed, by `transform`) count data, as used by `image`.

Details

Columns of the heat map correspond to samples. Samples are grouped by Dirichlet component, with average (Dirichlet) components summarized as a separate wide column. Rows correspond to taxonomic groups, ordered based on contribution to Dirichlet components.

Author(s)

Martin Morgan mailto:[email protected]

Examples

## counts
fl <- system.file(package="DirichletMultinomial", "extdata",
                  "Twins.csv")
count <- t(as.matrix(read.csv(fl, row.names=1)))

## all and best-fit clustering
data(fit)
lplc <- sapply(fit, laplace)
best <- fit[[which.min(lplc)]]

heatmapdmn(count, fit[[1]], best, 30)
## counts
fl <- system.file(package="DirichletMultinomial", "extdata",
                  "Twins.csv")
count <- t(as.matrix(read.csv(fl, row.names=1)))

## all and best-fit clustering
data(fit)
lplc <- sapply(fit, laplace)
best <- fit[[which.min(lplc)]]

heatmapdmn(count, fit[[1]], best, 30)

Access model components.

Description

The accessors mixture and mixturewt return information about the estimated Dirichlet components of the fitted model. Return values are described in the Values section, below.

Usage

mixture(object, ..., assign=FALSE)
mixturewt(object, ...)
goodnessOfFit(object, ...)
laplace(object, ...)
## S4 method for signature 'DMN'
AIC(object, ..., k = 2)
## S4 method for signature 'DMN'
BIC(object, ...)

## S4 method for signature 'DMN'
fitted(object, ..., scale=FALSE)
## S4 method for signature 'DMN'
predict(object, newdata, ..., logevidence=FALSE)
## S4 method for signature 'DMNGroup'
fitted(object, ...)
## S4 method for signature 'DMNGroup'
predict(object, newdata, ..., assign=FALSE)
## S4 method for signature 'DMNGroup'
summary(object, ...)
mixture(object, ..., assign=FALSE)
mixturewt(object, ...)
goodnessOfFit(object, ...)
laplace(object, ...)
## S4 method for signature 'DMN'
AIC(object, ..., k = 2)
## S4 method for signature 'DMN'
BIC(object, ...)

## S4 method for signature 'DMN'
fitted(object, ..., scale=FALSE)
## S4 method for signature 'DMN'
predict(object, newdata, ..., logevidence=FALSE)
## S4 method for signature 'DMNGroup'
fitted(object, ...)
## S4 method for signature 'DMNGroup'
predict(object, newdata, ..., assign=FALSE)
## S4 method for signature 'DMNGroup'
summary(object, ...)

Arguments

`object`	An instance of class `dmn`.
`newdata`	A `matrix` of new sample x taxon data to be fitted to the model of `object`.
`...`	Additional arguments, available to methods, when applicable.
`assign`	`logical(1)` indicating whether the maximum per-sample mixture component should be returned (`assign=FALSE`), or the full mixture matrix (`assign=TRUE`).
`scale`	`logical(1)` indicating whether fitted values should be returned unscaled (default, `scaled=FALSE`) or scaled by the variability of `mixturewt` parameter `theta`.
`logevidence`	`logical(1)` indicating whether posterior probability (default, `logevidence=FALSE`) or log evidence `logical=TRUE` should be returned.
`k`	ignored.

Value

mixture with assign=FALSE returns a matrix of sample x Dirichlet component estimates. With assign=TRUE mixture returns a named vector indexing the maximal Dirichlet component of each sample.

mixturewt returns a matrix with rows corresponding to mixture components, and columns pi (component weight) and theta (component variability). Small values of theta correspond to highly variable components.

goodnessOfFit returns a named numeric vector of measures of goodness of fit.

laplace, AIC, and BIC return the corresponding measures of goodness of fit.

Author(s)

Martin Morgan mailto:[email protected]

Examples

data(fit)
best <- fit[[4]]
mixturewt(best)
head(mixture(best), 3)
head(mixture(best, assign=TRUE), 3)
goodnessOfFit(best)

fl <- system.file(package="DirichletMultinomial", "extdata",
                  "Twins.csv")
count <- t(as.matrix(read.csv(fl, row.names=1)))
data(bestgrp)
bestgrp
head(predict(bestgrp, count))
data(fit)
best <- fit[[4]]
mixturewt(best)
head(mixture(best), 3)
head(mixture(best, assign=TRUE), 3)
goodnessOfFit(best)

fl <- system.file(package="DirichletMultinomial", "extdata",
                  "Twins.csv")
count <- t(as.matrix(read.csv(fl, row.names=1)))
data(bestgrp)
bestgrp
head(predict(bestgrp, count))

Summarize receiver-operator characteristics

Description

Returns a data.frame summarizing the cummulative true- and false-positive probabilities from expected and observed classifications.

Usage

roc(exp, obs, ...)
roc(exp, obs, ...)

Arguments

`exp`	`logical()` vector of expected classifications to a particular group.
`obs`	Predicted probability of assignment to the group identified by `TRUE` values in `exp`. The length of `exp` and `obs` must be identical.
`...`	Additional arguments, available to methods.

Value

A data.frame with columns

`TruePositive`	Cummulative probability of correct assignment.
`FalsePositive`	Cummulative probability of incorrect assignment.

Author(s)

Martin Morgan mailto:[email protected]

Examples

library(lattice)

## count matrix
fl <- system.file(package="DirichletMultinomial", "extdata",
                  "Twins.csv")
count <- t(as.matrix(read.csv(fl, row.names=1)))

## phenotype
fl <- system.file(package="DirichletMultinomial", "extdata",
                  "TwinStudy.t")
pheno0 <- scan(fl)
lvls <- c("Lean", "Obese", "Overwt")
pheno <- factor(lvls[pheno0 + 1], levels=lvls)
names(pheno) <- rownames(count)

## count data used for cross-validation, and cross-validation
count <- csubset(c("Lean", "Obese"), count, pheno)
data(bestgrp)

## true, false positives from single-group classifier
bst <- roc(pheno[rownames(count)] == "Obese",
           predict(bestgrp, count)[,"Obese"])
head(bst)

## lattice plot
xyplot(TruePostive ~ FalsePositive, bst, type="l",
       xlab="False Positive", ylab="True Positive")
library(lattice)

## count matrix
fl <- system.file(package="DirichletMultinomial", "extdata",
                  "Twins.csv")
count <- t(as.matrix(read.csv(fl, row.names=1)))

## phenotype
fl <- system.file(package="DirichletMultinomial", "extdata",
                  "TwinStudy.t")
pheno0 <- scan(fl)
lvls <- c("Lean", "Obese", "Overwt")
pheno <- factor(lvls[pheno0 + 1], levels=lvls)
names(pheno) <- rownames(count)

## count data used for cross-validation, and cross-validation
count <- csubset(c("Lean", "Obese"), count, pheno)
data(bestgrp)

## true, false positives from single-group classifier
bst <- roc(pheno[rownames(count)] == "Obese",
           predict(bestgrp, count)[,"Obese"])
head(bst)

## lattice plot
xyplot(TruePostive ~ FalsePositive, bst, type="l",
       xlab="False Positive", ylab="True Positive")

Helpful utility functions

Description

csubset creates a subset of a count matrix, based on identity of column phenotypes to a specified value.

Usage

csubset(val, x, pheno, cidx = TRUE)
csubset(val, x, pheno, cidx = TRUE)

Arguments

`val`	`character(1)` specifying the subset of `phenotype` to select.
`x`	A matrix of counts, with rows corresponding to samples and columns to taxonomic groups.
`pheno`	A `character()` vector of length equal to the number of rows in `count`, indicating the phenotype of the corresponding sample.
`cidx`	A `logical(1)` indicating whether columns (taxa) with zero counts in the count matrix following removal of taxa not satisfying `pheno %in% val` should be removed. `cidx=FALSE` removes the 0-count columns.

Value

A matrix of counts, with rows satisfying pheno %in% val and with columns equal either to ncol(x) (when cidx=TRUE) or the number of columns with non-zero counts after row subsetting (cidx=FALSE).

Author(s)

Martin Morgan mailto:[email protected]

Examples

## count matrix
fl <- system.file(package="DirichletMultinomial", "extdata",
                  "Twins.csv")
count <- t(as.matrix(read.csv(fl, row.names=1)))

## phenotype
fl <- system.file(package="DirichletMultinomial", "extdata",
                  "TwinStudy.t")
pheno0 <- scan(fl)
lvls <- c("Lean", "Obese", "Overwt")
pheno <- factor(lvls[pheno0 + 1], levels=lvls)
names(pheno) <- rownames(count)

## subset
dim(count)
sum("Lean" == pheno)
dim(csubset("Lean", count, pheno))
dim(csubset("Lean", count, pheno, cidx=FALSE))
## count matrix
fl <- system.file(package="DirichletMultinomial", "extdata",
                  "Twins.csv")
count <- t(as.matrix(read.csv(fl, row.names=1)))

## phenotype
fl <- system.file(package="DirichletMultinomial", "extdata",
                  "TwinStudy.t")
pheno0 <- scan(fl)
lvls <- c("Lean", "Obese", "Overwt")
pheno <- factor(lvls[pheno0 + 1], levels=lvls)
names(pheno) <- rownames(count)

## subset
dim(count)
sum("Lean" == pheno)
dim(csubset("Lean", count, pheno))
dim(csubset("Lean", count, pheno, cidx=FALSE))

Package 'DirichletMultinomial'

Help Index

Dirichlet-Multinomial Mixture Model Machine Learning for Microbiome Data

Description

Details

Author(s)

Cross-validation on Dirichlet-Multinomial classifiers.

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Data objects used for examples and the vignette

Description

Usage

Format

Examples

Fit Dirichlet-Multinomial models to count data.

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Class "DMN"

Description

Objects from the Class

Slots

Methods

Author(s)

See Also

Examples

Dirichlet-Multinomial generative classifiers.

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Class "DMNGroup"

Description

Objects from the Class

Slots

Extends

Methods

Author(s)

See Also

Examples

Heatmap representation of samples assigned to Dirichlet components.

Description

Usage

Arguments

Details

Author(s)

Examples

Access model components.

Description

Usage

Arguments

Value

Author(s)

Examples

Summarize receiver-operator characteristics

Description

Usage

Arguments

Value

Author(s)

Examples

Helpful utility functions

Description

Usage

Class `"DMN"`

Class `"DMNGroup"`