Velvet Assembly¶
Velvet was one of the first de novo genomic assemblers specially designed for short read sequencing technologies. It was developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI). Velvet currently takes in short read sequences, removes errors then produces high quality unique contigs. It then uses paired-end read and long read information, when available, to retrieve the repeated areas between contigs. See the Velvet GitHub page for more info.
Step 1: velveth¶
velveth
takes in a number of sequence files, produces a hashtable, then
outputs two files in an output directory (creating it if necessary), Sequences
and Roadmaps, which are necessary for running velvetg
in the next step.
Let’s create multiple hashtables using kmer-lengths of 31 and 51. We are going to run two jobs in parallel:
cd /mnt/WGS-data
velveth velvet_31 31 -shortPaired -fastq -separate read1.fq read2.fq &
velveth velvet_51 51 -shortPaired -fastq -separate read1.fq read2.fq &
Once the two jobs are finished (use top to monitor your jobs), you should have two output directories for the two different kmer-lengths: velvet_31 and velvet_51.
Step 2: velvetg¶
Now we have to start the actual assembly using
velvetg
. velvetg
is the core of Velvet where the de Bruijn
graph is built then manipulated. Let’s run assemblies for both
kmer-lengths. See the Velvet manual for more
info about parameter settings. Again, we submit the job to the compute
cluster:
velvetg velvet_31 -cov_cutoff auto -ins_length 270 -min_contig_lgth 500 -exp_cov auto &
velvetg velvet_51 -cov_cutoff auto -ins_length 270 -min_contig_lgth 500 -exp_cov auto &
The contig sequences are located in the velvet_31 and velvet_51
directories in file contigs.fa. Let’s get some very basic statistics
on the contigs. The script getN50.pl
reads the contig file and
computes the total length of the assembly, number of contigs, N50 and
largest contig size. In our example we will exclude contigs shorter
than 500bp (option -s 500):
getN50.pl -s 500 -f velvet_31/contigs.fa
getN50.pl -s 500 -f velvet_51/contigs.fa