Fitnoise

Poster describing Fitnoise

presented at ABiC 2014
Fitnoise is R+ software for Empirical Bayesian linear modelling. It has capabilities very similar to Limma but allows parametric noise models with an arbitrary number of parameters. Parameters are fitted using REML.

31 October 2014, from Nesoni version 0.128:

fitnoise.R is distributed as part of Nesoni, but also works as a standalone R+ module.

A unique feature of Fitnoise is its ability to detect differential poly(A) tail length in PAT-Seq data. See also Tail Tools.

Basic usage

Say you have a dgelist, mydgelist, and a design matrix mydesign, and a set of coefficients, testcoefs, to test:

H1 is all columns in the design matrix. H0 is those columns not named in testcoefs.

Run R+ with:

OPENBLAS_NUM_THREADS=1 R

...

source("fitnoise.R")

# voom from limma to calculate weights
myelist <- voom(mydgelist, mydesign)

myfit <- fit.elist(myelist, mydesign, model=model.t.standard, cores=8)

#examine noise fit
myfit

#produce a toptable-like result data-frame
result <- test.fit(myfit, coefs=testcoefs)

When examining myfit, if the "noise combined p-value" is low this indicates a poor fit by the noise model. A more accurate noise model may be required.

Noise models

The novel feature of Fitnoise is the availability of different noise models. Above we used model.t.standard, which performs moderated t tests or F tests similarly to limma. Available are:

Writing new noise models is straightforward.

Fitnoise handles missing data gracefully.

Use without replicates

Say mydesign represents a null hypothesis H0. We seek genes that are outliers given a noise model. t-distribution based models allow for outliers, so we instead use a normal-distribution based model. This is effectively asserting that genes have fixed biological variability. This is a poor assertion to make.
myfit <- fit.elist(myelist, mydesign, model=model.normal.standard, cores=16)

#produce a toptable-like result data-frame based on surprising deviation 
#from the noise model in individual genes
result <- test.fit(myfit)

Note how we never specified an alternative hypothesis H1.

Equivalently, you can call fit.elist with H1 as the design and an extra parameter "noise.design" giving H0, then call test.fit normally. This will give identical p-values, and the output will include coefficient estimates.

This produces a ranking of genes by interest, taking into account uncertainty in their expression levels. p-values / FDR-values are for ranking purposes only. This method can not distinguish highly variable genes from truly differentially expressed genes. Now go and tell your collaborator to produce replicates next time.

Experimental: negative control features

A negative control feature is one in which the noise distribution is typical, and which is believed to not have been changed by the experiment. Constitutively expressed housekeeping genes would make good negative controls. Spiked in RNA would not make good negative controls as they will not have typical biological noise. Empirically chosen negative controls might underestimate or misestimate the noise distribution, use with caution (eg features chosen from non-significant features from a differential test without negative controls).

A logical vector of negative control features can be passed to fit.elist as parameter "controls". You can also pass your null hypothesis design as "control.design" (this defaults to a single column of ones).

Negative controls, if known, allow noise to be fitted more accurately. For example they may be useful for model.t.per.sample.var if you have few samples. If the noise model is covariant between samples with covariance spanning experimental groups, negative controls are absolutely necessary (none of the current noise models use this.)