Our assignment for the past week has been to create phylogenetic trees from multiple sequence alignments based on clusters of orthologous genes (COGs). Specifically, we had to work out why a simple BLAST search was unable to accurately place a subject gene from Cryptosporidium parvum into a COG category.
I think this assignment is an interesting exercise in ‘real’ bioinformatics: where data is messy, the programs are challenging to install and use, and in the end you’re not quite sure what you ended up with, but it’s enormous fun anyway!
While we were supposed to run these analyses on Bio-Linux, where everything is pre-installed, I decided to be difficult and use my Ubuntu install instead.
Our assignment is to use:
- jalview and T-COFFEE or seaview and MUSCLE for aligning and editing the sequences
- RAxML for producing the phylogenetic tree
- treeview for viewing the tree
Although jalview was more highly recommended by my professor, seaview was in the Ubuntu repository, so I used that instead. Treeview was also in the repository (under treeviewx). RAxML was not.
Multiple Sequence Alignment (seaview)
The first step in generating a phylogenetic tree is to create a multiple sequence alignment. I placed all the sequences that I wanted to align into a FASTA file and loaded that into seaview (File > Open Fasta > Show: All files). The alignment was easy to do with the built-in MUSCLE implementation (Align > Align all).
The next step was surprising to me: remove every gapped column from the alignment. In this way, the tree generator only has to consider residue substitutions instead of indels.
The way that seaview allows you to remove sections of the multiple sequence alignment is slightly obscure. Essentially, you must save the sections you want to keep to another file.
- Create a set composed of all sites. (Sites > Create set, click Ok)
- Below your multiple sequence alignment is now a row of X’s with the sequence title ‘all sites’. By clicking on an X, you can de-select it and turn it into a -. You can expand the gapped region by clicking on an adjacent X and dragging the mouse pointer in the direction of choice. The de-selected columns in the alignment above turn gray. Deselect all the unconserved regions.
- Save the set (Sites > Save set). Not really sure what this does, but occasionally the next step doesn’t work without it.
- Save your selection (File > Save selection) as a phylip file (for RAxML).
- Open the file you just saved to check if you missed any small gaps, and repeat steps 1-4 if necessary.
I could not convince seaview to de-select the first and last residues from the sequence, so I had to edit the sequence by selecting Props > Allow seq editing. Make sure to delete every beginning/ending residue in every sequence!
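The gap-removal step can also be scripted. The following is a minimal Python sketch, not the seaview workflow described above: it drops every alignment column in which at least one sequence has a gap, which is stricter and less selective than de-selecting unconserved regions by hand. The function names (`read_fasta`, `drop_gapped_columns`, `to_phylip`) are made up for illustration, and the output assumes that a relaxed sequential PHYLIP layout (name, whitespace, sequence on one line) is acceptable to the tree program.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            # Use the first word of the header as the sequence name.
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def drop_gapped_columns(records, gap_chars="-."):
    """Keep only the columns in which no sequence has a gap character."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("sequences are not aligned to the same length")
    columns = zip(*(seq for _, seq in records))
    keep = [i for i, col in enumerate(columns)
            if not any(c in gap_chars for c in col)]
    return [(name, "".join(seq[i] for i in keep)) for name, seq in records]


def to_phylip(records):
    """Format records as relaxed sequential PHYLIP text (an assumption)."""
    n_sites = len(records[0][1])
    lines = [f"{len(records)} {n_sites}"]
    lines += [f"{name} {seq}" for name, seq in records]
    return "\n".join(lines) + "\n"
```

Reading an aligned FASTA file exported from seaview, passing its text through `read_fasta` and `drop_gapped_columns`, and writing the result of `to_phylip` to a `.phylip` file would give a gap-free alignment. Because whole columns are removed, leading and trailing gapped positions are handled the same way as internal ones.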
Phylogenetic Trees (RAxML)
To create the phylogenetic tree, our instructors told us to use RAxML, a fast maximum likelihood algorithm. Installation was slightly tricky. I downloaded the tar.bz2 from the website and unzipped it. Unfortunately, there were no installation instructions inside.
Fortunately, a quick Google brought me to a forum where someone asked that exact question. This make invocation worked for me:
make -f Makefile.gcc
Oh make, we don’t miss you or like you at all. Fortunately, it installed without any problems. After moving all of my phylip files created by seaview into the RAxML folder, I used a little script written by a former co-worker, Cedric Simillion, to run RAxML:
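#!/bin/tcsh
#Created by Cedric Simillion on Thu Jan 29 15:24:04 GMT 2009
# Strip the directory and the .phylip extension from the input file name
set b=`basename $1 .phylip`
# Name the RAxML run after the input file
set o=$b.tre
./raxmlHPC -f a -n $o -s $1 -m PROTCATWAG -c 25 -p 5464 -x 88 -\# 100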
Save that to a text file, make it executable, and then run it with the name of a phylip file as its only argument.
Less than an hour later, I had my maximum likelihood trees (I ran several because I had different MSAs with added and removed sequences).
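With several alignments to process, the same RAxML call can be generated for each file. The sketch below is a hedged Python equivalent of the tcsh script, reusing exactly the options shown there; the function names `raxml_command` and `run_all` are hypothetical, and the default `./raxmlHPC` path assumes the script is run from inside the RAxML folder, as above.

```python
import os
import subprocess


def raxml_command(phylip_path, raxml="./raxmlHPC"):
    """Build the argument list the tcsh script would pass for one alignment."""
    base = os.path.basename(phylip_path)
    if base.endswith(".phylip"):
        base = base[: -len(".phylip")]
    run_name = base + ".tre"  # same naming scheme as the tcsh script
    return [
        raxml, "-f", "a", "-n", run_name, "-s", phylip_path,
        "-m", "PROTCATWAG", "-c", "25", "-p", "5464", "-x", "88", "-#", "100",
    ]


def run_all(phylip_paths, raxml="./raxmlHPC"):
    """Run RAxML on each alignment in turn, stopping at the first failure."""
    for path in phylip_paths:
        subprocess.run(raxml_command(path, raxml), check=True)
```

Calling `run_all` with a list of `.phylip` file names would run the alignments one after another. Passing the arguments as a list avoids the shell entirely, so the `#` in `-#` needs no escaping.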
Phylogenetic tree visualisation
I would love to show my trees to you, but unfortunately RAxML’s trees keep crashing treeviewx. So you’ll have to wait in fevered anticipation for my update, hopefully tomorrow after I consult with an expert.