Fay-Wei Li

Dec 09, 2014


Genome data arrived!

Dear Azolla Team, 

We have just received the first batch of Azolla genome sequencing results from BGI (formerly Beijing Genomics Institute)!

BGI scientists Shifeng Cheng, Xin Liu, and Bo Song prepped the DNA we sent and produced a staggering 64,000,000,000 nucleotides of sequence data (64 "giga bases"). The data files are so huge that I couldn't even open them in a regular text editor! (They also maxed out my MacBook's storage space...)

Check out the file size on the right... (many files are already compressed!)
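For anyone curious how files too big for a text editor can still be inspected, here is a minimal Python sketch that reads a compressed file as a stream, one line at a time, instead of loading it whole. It assumes gzip-compressed text in which every sequence occupies exactly four lines with the nucleotides on the second one; the function name and the file name in the comment are hypothetical, not the actual BGI deliverables.

```python
import gzip
from typing import Tuple


def count_reads_and_bases(path: str) -> Tuple[int, int]:
    """Stream a gzip-compressed sequence file and tally reads and nucleotides.

    Assumption: each record spans exactly four lines and the second line
    of every record holds the nucleotide sequence.
    """
    reads = 0
    bases = 0
    # "rt" decompresses on the fly and yields text lines, so memory use
    # stays small no matter how large the file is.
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each 4-line record
                reads += 1
                bases += len(line.strip())
    return reads, bases


# Usage with a hypothetical file name:
#   reads, bases = count_reads_and_bases("azolla_batch1.fq.gz")
```

Because only one line is held in memory at a time, this kind of loop works on a laptop even when the uncompressed file would not fit on its disk.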

Here is what the data actually look like: 

The above figure shows information from 5 sequences, one of which is highlighted with a blue background. The first line is the sequence ID (blue), the second line is the actual DNA sequence of a hundred nucleotides (green), and the fourth line is the "quality score" associated with each nucleotide readout (orange), with "B" the lowest and "h" the highest quality. This data format is called "fastq".
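To make the four-line layout concrete, here is a minimal Python sketch of a fastq reader. It assumes the quality characters follow the older Illumina convention in which the character's ASCII code minus 64 gives the score, which is consistent with "B" being the lowest and "h" the highest letter described here. The read ID and sequence in the example are made up for illustration, not taken from the Azolla data.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    qualities: List[int]


# Assumption: ASCII code minus 64 gives the quality score ("B" -> 2, "h" -> 40).
QUALITY_OFFSET = 64


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield one FastqRecord per four-line fastq entry."""
    iterator = iter(lines)
    while True:
        header = next(iterator, None)
        if header is None:
            return  # end of input
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        sequence = next(iterator, None)   # line 2: nucleotides
        separator = next(iterator, None)  # line 3: "+" separator
        quality = next(iterator, None)    # line 4: one character per nucleotide
        if sequence is None or separator is None or quality is None:
            raise ValueError("truncated fastq record")
        sequence, quality = sequence.strip(), quality.strip()
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed fastq record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        scores = [ord(char) - QUALITY_OFFSET for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


# A made-up record with four nucleotides.
example = ["@example_read_1", "ACGT", "+", "Bhhh"]
for record in parse_fastq(example):
    print(record.read_id, record.sequence, record.qualities)
# prints: example_read_1 ACGT [2, 40, 40, 40]
```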

We have *millions* of such sequences, and right now we are trying to piece them together to get longer contiguous fragments ("contigs" as we often say). This will be our first pass to explore the Azolla genome space.
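The idea of piecing short sequences into contigs can be illustrated with a toy Python sketch that repeatedly merges the two fragments sharing the longest end-to-start overlap. This is only an illustration of the principle, not the software used for Azolla: real assemblers rely on much more sophisticated methods and must cope with sequencing errors, repeats, and both DNA strands. The function names, the minimum overlap of 3, and the example reads are all made up.

```python
from typing import List


def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    longest_possible = min(len(left), len(right))
    for size in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: List[str], min_overlap: int = 3) -> List[str]:
    """Merge reads greedily by longest overlap until no pair overlaps enough."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        # Join the pair, writing the shared stretch only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three made-up, overlapping reads collapse into a single contig.
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))
# prints: ['ATGGCGTACCTTA']
```

Comparing every pair like this is far too slow for hundreds of millions of reads, which is one reason genome assembly needs dedicated software and serious computing power.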

BGI scientists and our collaborators at Utrecht University in the Netherlands are cranking up "gene machines" to decode more Azolla genome sequence data. Very exciting, isn't it?

Please stay tuned for more updates! And thank you for your kind support in making this possible!!

Cheers!

Fay-Wei

Comments
  • David Rabanus (Backer)
    What a great data set! Have you considered formulating a data mining challenge on kaggle.com? That would put very experienced brains on the task of finding sequence correlations, commonalities, deviations, etc. in your samples... Best regards and all the best! - David
    Dec 10, 2014
  • Oscar Jasklowski (Backer)
    Hi Fay-Wei, I'm curious about David's question. I thought I'd comment here to make sure you saw it :)
    Mar 15, 2015

About This Project

Azolla is a symbiotic superorganism that captures all the nitrogen fertilizer it needs to grow from the air around it. Asia’s farmers have long known this, growing Azolla together with rice to provide a natural fertilizer to bolster rice productivity.

Genome sequencing of Azolla is a big step toward potentially helping crops to use less synthetic nitrogen that would benefit farmers' bottom lines, the environment, and the prices we pay for food.


A biology project funded by 123 people
