Genome data arrived!
Dear Azolla Team,
We have just received the first batch of Azolla genome sequencing results from BGI (formerly Beijing Genomics Institute)!
BGI scientists Shifeng Cheng, Xin Liu, and Bo Song prepped the DNA we sent and produced a staggering 64,000,000,000 nucleotides of data (64 "gigabases"). The data files are so huge that I couldn't even open them in a regular text editor (and they also maxed out my MacBook's storage space...)!
Check out the file size on the right... (many files are already compressed!)

Here is what the data actually look like:

The above figure shows five sequences, one of which is highlighted with a blue background. For each sequence, the first line is the sequence ID (blue), the second line is the actual DNA sequence of a hundred nucleotides (green), and the fourth line is the "quality score" associated with each nucleotide readout (orange), with "B" the lowest and "h" the highest quality. (The third line, which begins with "+", is just a separator.) This data format is called "fastq".
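For the curious, here is a minimal Python sketch of how one could read such four-line records and turn the quality letters into numbers. It is only an illustration, not the pipeline we actually use. The names `FastqRecord`, `read_fastq` and `phred_scores` are made up for this example, the sample read is invented, and the offset of 64 is an assumption based on the "B" to "h" range described above (other fastq files use a different offset).

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str   # line 1, without the leading "@"
    sequence: str  # line 2, the nucleotides
    quality: str   # line 4, one quality character per nucleotide


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record per group of four lines, without loading the whole file."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        header = header.rstrip("\n")
        if not header:          # tolerate stray blank lines
            continue
        sequence = handle.readline().rstrip("\n")
        handle.readline()       # line 3: the "+" separator, ignored here
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed record near {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 64) -> List[int]:
    """Convert quality characters to integers (offset 64 is an assumption)."""
    return [ord(char) - offset for char in quality]


# Invented example data, NOT a real Azolla read.
example = io.StringIO("@example_read_1\nACGTACGTAC\n+\nBBBhhhhhhh\n")
for record in read_fastq(example):
    print(record.read_id, record.sequence, phred_scores(record.quality))
```

Because the reader goes through the file one record at a time, this kind of approach works even on files far too large to open in a text editor.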
We have *millions* of such sequences, and right now we are trying to piece them together into longer contiguous fragments ("contigs", as we often say). This will be our first pass at exploring the Azolla genome space.
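To give a feel for what "piecing together" means, here is a toy Python sketch that repeatedly merges the two fragments sharing the longest end-to-start overlap. This greedy approach is a simplification chosen for illustration, not the method our assembly actually uses: real assemblers have to cope with sequencing errors, repeats, and vastly more data. The function names, the `min_overlap` parameter, and the three short reads are all made up for this example.

```python
from typing import List


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    longest_possible = min(len(left), len(right))
    for size in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: List[str], min_overlap: int = 3) -> List[str]:
    """Merge the best-overlapping pair until no pair overlaps enough."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        # Join the pair, writing the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Invented toy reads, NOT real Azolla data.
toy_reads = ["ATGGCGT", "GCGTACC", "TACCTTA"]
print(greedy_assemble(toy_reads))  # ['ATGGCGTACCTTA']
```

Here the three 7-letter reads overlap by four letters each and collapse into a single 13-letter contig. Comparing every pair like this would be hopelessly slow on millions of reads, which is part of why genome assembly needs serious computing power.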
BGI scientists and our collaborators at Utrecht University in the Netherlands are cranking up the "gene machines" to decode more Azolla genome sequence data. Very exciting, isn't it?
Please stay tuned for more updates! And thank you for your kind support in making this possible!!
Cheers!
Fay-Wei