Vica: Software to identify highly divergent DNA and RNA viruses and phages in microbiomes
Vica is designed to identify highly divergent viruses and phages, including those representing new families or orders, in assembled metagenomic and metatranscriptomic data. It does this by combining information from across the spectrum, from sequence composition to homology. The current version of Vica uses three feature sets: 5-mers, codon usage in all three frames, and minhash sketches of long k-mers (k = 24 and 31). The classifier is a jointly trained deep neural network and logistic model implemented in TensorFlow. The software is designed to identify both DNA and RNA viruses and phages in metagenomes and metatranscriptomes.
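To make the three feature sets more concrete, here is a minimal, standard-library-only Python sketch of how each could be computed for a single contig. It is an illustration under stated assumptions, not Vica's implementation: all function names are hypothetical, k-mers are counted on the forward strand only (no reverse-complement merging), codon usage is taken over the three forward frames, and the minhash sketch is a simple bottom-k sketch built on BLAKE2b hashes rather than whatever hashing Vica's own pipeline uses.

```python
import hashlib
from collections import Counter
from itertools import product

BASES = "ACGT"


def kmer_frequencies(seq: str, k: int = 5) -> list[float]:
    """Relative frequency of every ACGT k-mer (4**k values, lexicographic order).

    Hypothetical helper: forward strand only; k-mers containing N or other
    symbols are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


def codon_usage_three_frames(seq: str) -> list[float]:
    """Codon frequencies for forward frames 0, 1 and 2 (3 x 64 = 192 values)."""
    seq = seq.upper()
    codons = ["".join(p) for p in product(BASES, repeat=3)]
    features: list[float] = []
    for frame in range(3):
        # Non-overlapping codons starting at offset `frame`.
        counts = Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        total = sum(counts[c] for c in codons)
        features.extend(counts[c] / total if total else 0.0 for c in codons)
    return features


def minhash_sketch(seq: str, k: int = 24, size: int = 100) -> list[int]:
    """Bottom-k minhash sketch: the `size` smallest 64-bit hashes of distinct k-mers."""
    seq = seq.upper()
    hashes = {
        int.from_bytes(hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big")
        for i in range(len(seq) - k + 1)
    }
    return sorted(hashes)[:size]


def sketch_jaccard(a: list[int], b: list[int], size: int = 100) -> float:
    """Estimate the Jaccard similarity of two k-mer sets from their bottom-k sketches."""
    # Among the `size` smallest hashes of the union, count those present in both sketches.
    union_bottom = sorted(set(a) | set(b))[:size]
    if not union_bottom:
        return 0.0
    shared = set(a) & set(b)
    return sum(h in shared for h in union_bottom) / len(union_bottom)
```

For a contig string, `kmer_frequencies` yields a 1,024-value composition vector and `codon_usage_three_frames` a 192-value vector, while comparing `minhash_sketch` outputs with `sketch_jaccard` gives a rough, homology-like similarity between sequences. The first two are composition signals and the third sits closer to the homology end of the spectrum, which is the kind of combination described above.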
The current release does not include trained models, but we will add them in the future to allow rapid identification of viruses without model training.
This package can classify assembled data and train new classification models. Most users will only need Vica's classification functionality, which can be invoked with the command:
vica classify -infile contigs.fasta -out classifications.txt -modeldir modeldir
The package also includes a suite of tools to prepare data and to train and evaluate new classification models. Many of the workflows for doing this can be invoked through the same subcommand interface:
vica split
vica get_features
vica train
vica evaluate
For details see the Tutorial.
The package relies on a number of Python dependencies that are resolved automatically when the package is installed with pip.
The non-python dependencies are:
- BBTools > v37.75 - https://jgi.doe.gov/data-and-tools/bbtools/
- Prodigal > v2.6.3 - https://github.com/hyattpd/Prodigal
- GNU Coreutils - http://www.gnu.org/software/coreutils/coreutils.html
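Before running Vica, it can help to confirm that these non-Python tools are reachable on `PATH`. The Python sketch below does that with the standard-library `shutil.which`. The executable names are assumptions for illustration only: `prodigal` is Prodigal's usual binary name, while the BBTools scripts and coreutils programs listed are hypothetical stand-ins, not a list taken from Vica. The check covers presence only, not the minimum versions listed above.

```python
import shutil

# Hypothetical mapping: executable names assumed to represent each dependency.
# Vica itself may call other BBTools scripts or coreutils programs.
REQUIRED = {
    "BBTools": ["bbduk.sh", "comparesketch.sh"],
    "Prodigal": ["prodigal"],
    "GNU Coreutils": ["sort", "split"],
}


def missing_executables(names: list[str]) -> list[str]:
    """Return the names that shutil.which cannot resolve (not on PATH, or not executable)."""
    return [name for name in names if shutil.which(name) is None]


def check_dependencies(required: dict[str, list[str]] = REQUIRED) -> dict[str, list[str]]:
    """Map each dependency to its missing executables; an empty dict means all were found."""
    report: dict[str, list[str]] = {}
    for package, names in required.items():
        missing = missing_executables(names)
        if missing:
            report[package] = missing
    return report
```

Calling `check_dependencies()` returns `{}` when every assumed executable is found, or, for example, `{"Prodigal": ["prodigal"]}` when Prodigal is not installed.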