Sequence editing COMPLETE! Plus first gene alignment metadata
Finally, the fun stuff begins.
Over the past few days, I’ve finished editing and assembling the last remaining sequences, which means that at long last, this project is moving on to its main analytical steps: phylogenetic analysis and barcode-based identification of the algae I collected.
After finishing the final assemblies and reviewing the sequences I already have, I created this chart to help visualize most of the genetic data available for each strain and barcode (using Microsoft Office tools, in my classic fashion):

In total, a full-length sequence for at least one barcode was obtained from 41 strains of algae used in this experiment: the cultured strains JIAC1-36, excepting JIAC16; four collected filamentous algae (Fila1, Fila2, OedoSF, and Spiro); frozen, previously cultured Selenastraceae material (Sel); and collected Volvocaceae material (VolvFL). A simple visual inspection of this chart makes it obvious that these sequences were not distributed evenly among the strains; some strains yielded more successfully sequenced genes than others. Three strains (JIACs 3, 4, and 10) yielded sequences for all six markers tested in this project, and four more (JIACs 2, 15, 26, and 31) yielded sequences for five of the six. At the other end of the spectrum, only two of the six markers were amplified and sequenced successfully from seven strains (JIACs 6, 7, 9, 11, and 13i, plus Sel and Spiro; the rbcL sequenced from Fila2, while complete, is a Spirogyra gene), and four strains are represented by only a single gene’s worth of data: JIACs 8, 14, and 17, plus Fila1 (the rbcL sequence for OedoSF, while complete, is Spirogyra in origin).
In general, the strains with the most thorough molecular documentation tend to be those in the family Scenedesmaceae or the order Volvocales, whereas those with the fewest complete sequences belong to the family Selenastraceae, the class Trebouxiophyceae, or the polyphyletic group of filamentous green algae (which includes both streptophyte and chlorophyte lineages). That being said, I believe the difficulty in amplifying and sequencing their barcodes stems from the physical attributes of the cultures, not from any incompatibility with the barcodes themselves. I’ll discuss those attributes later, when I give my final guesses as to the identity of each algal culture/strain/species.
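As a quick arithmetic check on the total of 41, the strain list above can be rebuilt and counted in a few lines of Python. This is only a sketch: the names simply follow the labels used in this post, and I’m assuming that the JIAC13i listed above counts as the JIAC13 entry within the JIAC1-36 range.

```python
# Hypothetical sketch: rebuild the strain list from the post's labels and count it.
# Cultured strains JIAC1-36, excluding JIAC16 (JIAC13i assumed to be the JIAC13 entry).
cultured = [f"JIAC{n}" for n in range(1, 37) if n != 16]
# Collected filamentous algae
filamentous = ["Fila1", "Fila2", "OedoSF", "Spiro"]
# Frozen Selenastraceae material and collected Volvocaceae material
other = ["Sel", "VolvFL"]

strains = cultured + filamentous + other
print(len(cultured), len(filamentous), len(other), len(strains))  # → 35 4 2 41
```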
In addition to cataloging my full sequences, I also created a separate FASTA file for each gene containing all of the full-length sequences for that gene, aligned each file using the ClustalX plugin in MEGA7 with its default settings, and then trimmed the ends of the alignments to minimize gaps. Some sample pictures:

This is a portion of the UPA alignment; in total, it covers 358 positions (bases and gaps).
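Trimming the ragged ends of an alignment, as described above, and counting the remaining positions (bases plus gaps) could also be scripted. The following is a minimal Python sketch, not what I actually did in MEGA7: the functions `read_fasta` and `trim_alignment_ends`, the `max_gap_fraction` threshold (here 0, so trimming stops at the first gap-free column from each end), and the toy three-sequence alignment are all hypothetical.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the sequence name
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def trim_alignment_ends(aln, max_gap_fraction=0.0):
    """Remove leading and trailing columns whose gap fraction exceeds the threshold.

    Interior columns are kept even if they contain gaps; only the ragged ends go.
    The threshold value is an assumption, not a standard setting.
    """
    if not aln:
        return {}
    seqs = list(aln.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must be the same length")

    def gap_fraction(col):
        # Fraction of sequences with a gap ('-') in this column
        return sum(s[col] == "-" for s in seqs) / len(seqs)

    start = 0
    while start < length and gap_fraction(start) > max_gap_fraction:
        start += 1
    end = length
    while end > start and gap_fraction(end - 1) > max_gap_fraction:
        end -= 1
    return {n: s[start:end] for n, s in aln.items()}


# Toy alignment (made up for illustration, not real data)
toy = """>A
--ACGT-ACGT--
>B
-TACGTAACG---
>C
GTACGT-ACGTA-
"""
trimmed = trim_alignment_ends(read_fasta(toy))
for name, seq in trimmed.items():
    print(name, seq)
print("positions:", len(trimmed["A"]))  # → positions: 8
```

Raising `max_gap_fraction` would cut less; interior gappy columns are deliberately left alone, since they may reflect real insertions or deletions rather than ragged sequence ends.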
The ITS2 region is the only one for which I haven’t yet done an alignment. Although each sequence can be used to identify its respective strain via BLAST, the sequences don’t align well with each other in ClustalX, and they may prove too variable to align at all. I’ll have to experiment with selectively removing a few sequences from the alignment to see whether that helps the rest align, since a few of them are easily spotted as divergent from the others.
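One way to pick which ITS2 sequences to set aside is to rank them by how different they are from all the others, without needing an alignment first. The sketch below swaps in an alignment-free technique, k-mer Jaccard distance, which isn’t something I’ve actually used here; the function names, the choice of k = 5, and the toy sequences are all assumptions for illustration, not real ITS2 data. Sequences with the highest mean distance would be the first candidates to remove before re-running ClustalX.

```python
from itertools import combinations
from statistics import mean


def kmers(seq, k=5):
    """Set of all length-k substrings of a sequence, ignoring any gap characters."""
    seq = seq.upper().replace("-", "")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b):
    """1 - size(A & B) / size(A | B): 0 means identical k-mer sets, 1 means none shared."""
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def mean_distances(seqs, k=5):
    """Mean k-mer Jaccard distance from each sequence to every other sequence."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences to compare")
    sets = {name: kmers(s, k) for name, s in seqs.items()}
    dists = {name: [] for name in seqs}
    for x, y in combinations(seqs, 2):
        d = jaccard_distance(sets[x], sets[y])
        dists[x].append(d)
        dists[y].append(d)
    return {name: mean(ds) for name, ds in dists.items()}


# Toy unaligned sequences (made up; s4 is deliberately unrelated to the others)
toy = {
    "s1": "ACGTACGTTAGCCGATAGCTAGGCT",
    "s2": "ACGTACGTTAGCCGATAGCTAGGCA",
    "s3": "ACGTACGATAGCCGATAGCTAGGCT",
    "s4": "TTTTGGGGCCCCAAAATTTTGGGGC",
}
# Rank from most to least divergent; the top entries are candidates to drop
for name, d in sorted(mean_distances(toy).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}\t{d:.2f}")
```

Because this screen only compares shared short substrings, it is a rough filter for spotting outliers, not a substitute for the alignment itself.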

The tufA alignment, pictured above, is currently my largest in terms of the number of sequences, but it aligns very well, with a total of 867 positions.
The rbcL alignment came out a little shorter than I would have liked, at 1298 positions, but it looks good so far, so hopefully it works well!
18S, as a well-conserved ribosomal RNA marker (like UPA and 26S), aligned quite well, and is decently long at 1721 positions.

26S isn’t a very commonly used phylogenetic marker, but hopefully my alignment (pictured above) works well. At 2072 positions, it’s the longest of my alignments, and I hope it proves informative.
And that’s a wrap for sequence editing and alignment! Over the next couple of weeks, I hope to post updates on my first phylogenetic trees and any other analyses I run, as well as to check the identity of each sequence and then reach a consensus on what each of my strains is, using both visual and molecular data. Stay tuned.