Wombac

Description

Synopsis: Wombac rapidly finds core genome SNPs across a set of samples and produces an alignment of those SNPs, which can be used to build a phylogenomic tree. It can handle hundreds of samples and makes efficient use of multiple CPUs on a single system. Computations can be re-used when new samples are added and a new tree is built, which saves a lot of time. Wombac only looks for substitution SNPs, not indels, and it may miss some SNPs, but it will find enough to build high-resolution trees.
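
To make the idea of a "core genome SNP" concrete, here is a minimal Python sketch. It is not Wombac's actual code. It assumes each sample has already been laid out as a string aligned position-by-position to the reference, and it treats a column as core when every sample has an A, C, G or T there, and as a SNP when those bases differ. The function names and the toy data are hypothetical.

```python
def core_snp_columns(aligned):
    """Return indices of columns that are core (A/C/G/T in every sample) and variable."""
    seqs = list(aligned.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    keep = []
    for i in range(length):
        column = {s[i].upper() for s in seqs}
        # Core: no gaps or ambiguous calls. SNP: more than one distinct base.
        if column <= set("ACGT") and len(column) > 1:
            keep.append(i)
    return keep


def core_snp_alignment(aligned):
    """Reduce each sequence to its core SNP columns only."""
    cols = core_snp_columns(aligned)
    return {name: "".join(seq[i] for i in cols) for name, seq in aligned.items()}


# Toy data (hypothetical): column 2 has a gap in s2, so it is not core.
toy = {"ref": "ACGTAC", "s1": "ACGTAT", "s2": "AC-TAT", "s3": "TCGTAC"}
print(core_snp_columns(toy))    # [0, 5]
print(core_snp_alignment(toy))  # {'ref': 'AC', 's1': 'AT', 's2': 'AT', 's3': 'TC'}
```

Only substitutions are considered in this sketch, in line with the synopsis: a column containing a gap is dropped rather than reported as an indel.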

Input: Wombac needs a reference genome in FASTA format (can be in multiple contigs) and a series of samples. A sample can be any of:

  1. a folder containing FASTQ short reads: eg. R1.fq.gz R2.fq.gz
  2. a multi-FASTA file: eg. contigs.fa or NC_273461.fna
  3. a .tar.gz file containing FASTA contig files: eg. Ecoli_K12mut.contig.tar.gz (from EBI/NCBI)
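
The three sample kinds above can be told apart from the path alone. The following Python sketch shows one way to do that. It is an illustration only and not how Wombac itself decides. The function name `classify_sample` and the returned labels are hypothetical.

```python
import os


def classify_sample(path):
    """Guess which of the three sample kinds a path refers to."""
    if os.path.isdir(path):
        # Kind 1: a folder expected to hold FASTQ read files.
        return "reads_folder"
    if path.endswith(".tar.gz"):
        # Kind 3: an archive of FASTA contig files.
        return "contig_archive"
    # Kind 2: anything else is assumed to be a multi-FASTA file.
    return "multi_fasta"


for sample in ["EHEC.contigs.fa", "K12mut.contigs.tar.gz", os.getcwd()]:
    print(sample, "->", classify_sample(sample))
```

The sketch does not open the files, so a misnamed input would be classified wrongly. A real check would also need to look at file contents.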

Output: Wombac produces standards-compliant output files: BAM, VCF (per sample) and an overall .ALN (FASTA aligned core SNPs).
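
Because the overall alignment is plain aligned FASTA, it can be inspected with a few lines of code before building a tree. The Python sketch below parses aligned FASTA text and counts pairwise SNP differences. It is an illustration under assumptions: the helper names are hypothetical, the sample names and sequences are made up, and a real run would read the text from the alignment file instead of an inline string.

```python
from itertools import combinations


def read_fasta(text):
    """Parse FASTA text into a dict of name -> sequence."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return {n: "".join(parts) for n, parts in seqs.items()}


def snp_distances(seqs):
    """Count differing columns for every pair of aligned sequences."""
    dist = {}
    for a, b in combinations(sorted(seqs), 2):
        dist[(a, b)] = sum(x != y for x, y in zip(seqs[a], seqs[b]))
    return dist


# Toy alignment (hypothetical), standing in for the contents of an .aln file.
toy_aln = ">A\nACGT\n>B\nACGA\n>C\nTCGA\n"
print(snp_distances(read_fasta(toy_aln)))
# {('A', 'B'): 1, ('A', 'C'): 2, ('B', 'C'): 1}
```

A table of pairwise distances like this is a quick sanity check: samples expected to be closely related should differ at few core SNP positions.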

Etymology: The name Wombac combines bac (for "Bacteria") with Wombat, an animal with a very solid core, chosen to represent the tool's Australian origin!

Download

wombac-2.0.tar.gz - 27 Jan 2015 - GitHub

Usage

% ls -R
K12.fna 
EcPoo.fasta 
EHEC.contigs.fa 
UPEC/R1.fq.gz UPEC/R2.fq.gz
EPEC/R1.fastq EPEC/R2.fastq
APEC/s_1_sequence.txt
K12mut.contigs.tar.gz

% wombac --outdir Tree --ref K12.fna --run EcPoo.fasta EHEC.contigs.fa UPEC/ EPEC/ APEC/ K12mut.contigs.tar.gz
(wait a while)

% SplitsTree -i Tree/core.aln
(play with tree)

License

Wombac is free software, released under the GPL (version 3).

Contact