Assembly Complete; on to polishing and BUSCO QC'ing
Using two SMRT cells' worth of PacBio Sequel II long-read data (N50 > 32,000 bp) and the wtdbg2 (a.k.a. Redbean) assembler (https://github.com/ruanjue/wtdbg2), we have assembled the wood frog genome, all 6+ Gbp of it, in only 33 hours! We did it on a standalone computer, not a supercomputing cluster. Velocity Micro built the machine, and we have since upgraded and modified it. Briefly, it's a Phanteks Enthoo Primo case that houses dual 1300 W PSUs. These power a Supermicro H11DSi-NT motherboard carrying two AMD EPYC 7601 CPUs, each with 32 cores and 64 threads, for 64 cores and 128 threads in total (124 threads were used for the assembly). The rest of the build is 2 TB of ECC DDR4 2666 MHz RAM, 25 TB of HDD storage in RAID 10, and dual EKWB liquid-cooled NVIDIA TITAN RTX GPUs. We're running Ubuntu 18.04 LTS as the OS. The RAM modules generate a lot of heat, so we had to modify the cooling. We added four G.Skill Turbulence III RAM coolers and a few extra fans to increase push/pull airflow throughout the case. We chose the new AMD EPYC processors because they run two hardware threads per core (simultaneous multithreading) and, pound for pound, they outclass the Intel CPUs by every metric.
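To make the "N50 > 32,000 bp" figure concrete, here is a minimal sketch of how N50 is computed from read lengths. N50 is the largest length L such that reads of length L or longer together contain at least half of all sequenced bases. The FASTA reader is a simple helper written for illustration and is not part of wtdbg2. It assumes the Sequel II reads have already been converted to plain FASTA. Tools differ slightly in how they break ties at exactly the halfway mark.

```python
from typing import Iterable, Iterator, List, Optional


def fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each record in FASTA-formatted text."""
    length: Optional[int] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if length is not None:
                yield length  # finished the previous record
            length = 0
        elif length is not None:
            length += len(line)  # sequence may be wrapped over several lines
    if length is not None:
        yield length  # last record


def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that reads of length >= L hold at least
    half of all sequenced bases."""
    ordered: List[int] = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("no sequence data")
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # crossed the halfway mark of all bases
            return length
    raise AssertionError("unreachable: running sum always reaches total")
```

Usage would look like `with open("reads.fasta") as fh: print(n50(fasta_lengths(fh)))`, where the file name is hypothetical. For a gzipped file, open it with `gzip.open(path, "rt")` instead.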
Next we will "polish" (correct potential errors in) the assembly using a single lane of Illumina HiSeq 3000 2x150 bp paired-end, high-quality short-read data. This is necessary to compensate for the lower per-base accuracy of the MUCH longer reads generated on the PacBio Sequel II instrument. Why use both? The Sequel II gave us VERY long single-molecule reads: more than half of the sequenced bases came in reads longer than 32,000 base pairs. Reads that long can span repetitive regions, which helps us put the genome together in the correct order and orientation. This matters most for genomes where we suspect large amounts of repeats. For polishing we will use ntEdit (https://academic.oup.com/bioinformatics/article/35/21/4430/5490204), a newly released, ultrafast polisher that can polish the 20 Gbp white spruce genome in 25 minutes. We will experiment and will likely polish iteratively, evaluating the results after each round with BUSCO (https://busco.ezlab.org/).
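The plan to polish iteratively and check each round with BUSCO can be sketched as a loop. This is a minimal sketch under our own assumptions, not the exact procedure we will run. `polish` and `evaluate` are hypothetical callables you would write yourself. One would wrap ntEdit, which works from a Bloom filter of read k-mers built with ntHits. The other would wrap a BUSCO run. The stopping rule is an arbitrary choice: stop once BUSCO completeness gains less than `min_gain` percentage points, or after `max_rounds` rounds. The one concrete piece is a parser for the one-line summary BUSCO reports, in the form `C:...%[S:...%,D:...%],F:...%,M:...%,n:...`.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# BUSCO one-line summary; example with made-up numbers:
# "C:95.1%[S:94.0%,D:1.1%],F:2.0%,M:2.9%,n:3354"
_SUMMARY = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


@dataclass(frozen=True)
class BuscoScore:
    complete: float    # C: complete BUSCOs (single + duplicated), percent
    single: float      # S: complete and single-copy
    duplicated: float  # D: complete and duplicated
    fragmented: float  # F: fragmented
    missing: float     # M: missing
    n: int             # number of BUSCO groups searched


def parse_busco_summary(text: str) -> BuscoScore:
    """Extract the first BUSCO summary line found in text."""
    m = _SUMMARY.search(text)
    if m is None:
        raise ValueError("no BUSCO summary line found")
    return BuscoScore(float(m["C"]), float(m["S"]), float(m["D"]),
                      float(m["F"]), float(m["M"]), int(m["n"]))


def polish_until_plateau(
    draft: str,
    polish: Callable[[str, int], str],      # hypothetical: (assembly, round) -> polished assembly
    evaluate: Callable[[str], BuscoScore],  # hypothetical: assembly -> BUSCO score
    max_rounds: int = 5,                    # assumed cap, not a recommendation
    min_gain: float = 0.1,                  # assumed threshold, in percentage points of C
) -> Tuple[str, List[BuscoScore]]:
    """Polish repeatedly; keep a round only if BUSCO completeness rises
    by at least min_gain percentage points, otherwise stop."""
    best_asm, best = draft, evaluate(draft)
    history = [best]
    for rnd in range(1, max_rounds + 1):
        candidate = polish(best_asm, rnd)
        score = evaluate(candidate)
        history.append(score)
        if score.complete - best.complete < min_gain:
            break  # plateau: this round did not pay off, keep the previous best
        best_asm, best = candidate, score
    return best_asm, history
```

The loop keeps the previous best assembly whenever a round fails to improve completeness, which guards against a polishing round that makes things worse. Because a BUSCO run on a 6+ Gbp assembly is not free, `max_rounds` also caps the total compute.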