IGF1R (insulin-like growth factor 1 receptor) is a gene central to several growth pathways. It was found to be under positive selection in the marmoset genome and is likely related to the small stature of marmosets.
Marmoset Genome Sequencing and Analysis Consortium. The common marmoset genome provides insight into primate biology and evolution. Nature Genetics, 46(8):850-857, 2014.
In this exercise we will attempt to reconstruct some of the findings from the paper.
All data files are in the data subdirectory.
If you have not installed them previously, you will need some additional packages (install as root):
sudo apt-get install muscle paml seaview pymol bioperl
We will start from the alignment file data/igf1r.phy, which contains aligned DNA sequences of IGF1R from several mammals in Phylip format.
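A Phylip file starts with a header line giving the number of sequences and the alignment length, followed by one record per sequence. The minimal sketch below reads the sequential (non-interleaved) variant into a Python dict. It assumes a two-number header and names without whitespace; read_phylip is a hypothetical helper, not the parser used by PAML or seaview.

```python
def read_phylip(text):
    """Parse a sequential (non-interleaved) Phylip alignment into {name: sequence}.

    Assumptions: the first line holds the number of taxa and the alignment
    length; every record starts with a name containing no whitespace; a
    sequence may be split by spaces or line breaks until it reaches the
    stated length.
    """
    tokens = text.split()
    ntax, nchar = int(tokens[0]), int(tokens[1])
    seqs = {}
    i = 2
    for _ in range(ntax):
        name = tokens[i]
        i += 1
        chunks, length = [], 0
        # Keep consuming tokens until this sequence reaches nchar characters.
        while length < nchar:
            chunks.append(tokens[i])
            length += len(tokens[i])
            i += 1
        seq = "".join(chunks).upper()
        if len(seq) != nchar:
            raise ValueError(f"{name}: expected {nchar} characters, got {len(seq)}")
        seqs[name] = seq
    return seqs


if __name__ == "__main__":
    # Toy input with hypothetical sequences; for the real data use
    # read_phylip(open("data/igf1r.phy").read()).
    example = """3 6
hg19    ATG GCC
calJac3 ATG
GCA
mm9     ATG-CC
"""
    for name, seq in read_phylip(example).items():
        print(name, seq)
```

Splitting on all whitespace lets the same code accept sequences written as space-separated codons, which is common in PAML input files.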
Species are named using UCSC Genome Browser nomenclature: hg - human, panTro - chimpanzee, ponAbe - orangutan, rheMac - macaque, calJac - marmoset, mm9 - mouse, rn4 - rat, canFam - dog.
Question: Explore this file (either look at the file directly or open it in the seaview viewer) and examine the differences between the individual sequences.
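If you want a numeric summary alongside the visual inspection, a small sketch like the following counts pairwise differences and lists the variable alignment columns. It assumes the alignment is already loaded as a Python dict mapping names to aligned sequences; the function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def pairwise_differences(aln, skip="-N?"):
    """Count mismatching columns for every pair of aligned sequences.

    `aln` maps names to aligned sequences of equal length. Columns where
    either sequence has a character from `skip` (gaps and ambiguous bases by
    default) are ignored. Returns {(name1, name2): (differences, compared)}.
    """
    result = {}
    for a, b in combinations(sorted(aln), 2):
        diff = compared = 0
        for x, y in zip(aln[a], aln[b]):
            if x in skip or y in skip:
                continue
            compared += 1
            diff += x != y
        result[(a, b)] = (diff, compared)
    return result


def variable_columns(aln):
    """Return the 1-based columns that are not identical in all sequences
    (a gap counts as a different character)."""
    return [i + 1 for i, column in enumerate(zip(*aln.values())) if len(set(column)) > 1]


if __name__ == "__main__":
    toy = {  # hypothetical sequences, not taken from igf1r.phy
        "hg19": "ATGGCC",
        "panTro2": "ATGGCC",
        "calJac3": "ATGGCA",
        "mm9": "ATG-CT",
    }
    for pair, (d, n) in pairwise_differences(toy).items():
        print(pair, f"{d}/{n} sites differ")
    print("variable columns:", variable_columns(toy))
```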
Look at the file data/tree_marmoset.nh, which contains the phylogenetic tree we will use in the rest of the analysis. Note that in tree_marmoset.nh the branch leading to marmoset is marked with the label #1.
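PAML reads labels such as #1 in the tree file to mark foreground branches. To check which branch carries the mark, the sketch below walks a Newick string and returns the set of leaves below each marked branch. It assumes plain Newick (no leading header line, unquoted names, no internal node names); marked_clades is a hypothetical helper and the example tree is a toy.

```python
import re

# Tokens: structural characters, '#N' labels, ':length' values, or names.
TOKEN = re.compile(r"\s*([(),;]|#\d+|:[^,();#\s]+|[^,();#:\s]+)")


def marked_clades(newick, mark="#1"):
    """Return the leaf sets of all branches carrying `mark` in a Newick string.

    Handles labels placed after a taxon name or after a closing parenthesis,
    with or without branch lengths.
    """
    stack = [[]]   # leaf lists of the clades currently being built
    last = None    # leaf set of the most recently completed node
    found = []
    newick = newick.strip()
    pos = 0
    while pos < len(newick):
        m = TOKEN.match(newick, pos)
        if not m:
            raise ValueError(f"cannot parse near position {pos}")
        tok, pos = m.group(1), m.end()
        if tok == "(":
            stack.append([])
            last = None
        elif tok == ")":
            leaves = stack.pop()
            stack[-1].extend(leaves)
            last = frozenset(leaves)
        elif tok in (",", ";"):
            last = None
        elif tok.startswith("#"):
            if tok == mark and last is not None:
                found.append(last)
        elif tok.startswith(":"):
            pass  # branch length: not needed here
        else:
            stack[-1].append(tok)
            last = frozenset([tok])
    return found


if __name__ == "__main__":
    # Toy tree; for the real file use marked_clades(open("data/tree_marmoset.nh").read()).
    print(marked_clades("((hg19,panTro2),(rheMac2,calJac3 #1),mm9);"))
```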
Optional question: Use the alignment to build a phylogenetic tree (e.g. with the program phyml). Does this tree differ from what you would expect? Are there any unusual branch lengths?
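A quick way to spot an unexpectedly divergent sequence, before or alongside running phyml, is to look at pairwise distances: a sequence far from all others tends to end up on a long branch. The sketch below computes Jukes-Cantor (JC69) corrected distances. It is a simple distance method swapped in for illustration, not the maximum-likelihood inference phyml performs; the function names and toy sequences are hypothetical.

```python
import math
from itertools import combinations


def jc69_distance(s1, s2, skip="-N?"):
    """Jukes-Cantor (JC69) distance between two aligned DNA sequences.

    p is the fraction of differing sites among columns where neither sequence
    has a gap or ambiguous base; d = -3/4 * ln(1 - 4/3 * p). Returns inf when
    p >= 0.75, where the correction is undefined (saturation).
    """
    pairs = [(x, y) for x, y in zip(s1.upper(), s2.upper()) if x not in skip and y not in skip]
    if not pairs:
        raise ValueError("no comparable columns")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(aln):
    """JC69 distances for all pairs in a {name: aligned sequence} dict."""
    return {(a, b): jc69_distance(aln[a], aln[b]) for a, b in combinations(sorted(aln), 2)}


if __name__ == "__main__":
    toy = {"hg19": "ATGGCCAAAT", "calJac3": "ATGGCAAAAT", "mm9": "ATGACTAAGT"}  # hypothetical
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 3))
```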
In this step, we will try to identify sites under positive selection in the marmoset lineage, using the Bayes Empirical Bayes (BEB) method from the PAML package.
./run_beb.pl igf1r.phy tree_marmoset.nh hg19 0.5 myout
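BEB fits codon-substitution models on the tree and reports, for each codon, a posterior probability of belonging to the positively selected class; run_beb.pl presumably drives PAML to do this. As a much cruder, hypothetical complement, the sketch below only lists codons where the marmoset amino acid differs from an amino acid shared by all other species. It assumes the alignment is in frame from its first column and uses the standard genetic code; on its own it says nothing about selection.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate an in-frame coding sequence; codons with gaps or ambiguity codes become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def lineage_specific_sites(aln, focal):
    """Return (codon position, amino acid in the others, amino acid in `focal`)
    for codons where all other sequences share one amino acid and `focal` differs."""
    prot = {name: translate(s) for name, s in aln.items()}
    others = [p for name, p in prot.items() if name != focal]
    sites = []
    for i, aa in enumerate(prot[focal]):
        other_aas = {p[i] for p in others if p[i] != "X"}
        if aa != "X" and len(other_aas) == 1 and aa not in other_aas:
            sites.append((i + 1, other_aas.pop(), aa))
    return sites


if __name__ == "__main__":
    toy = {  # hypothetical sequences
        "hg19": "ATGGCCAAA",
        "panTro2": "ATGGCCAAA",
        "mm9": "ATGGCTAAG",
        "calJac3": "ATGACCAAA",
    }
    print(lineage_specific_sites(toy, "calJac3"))
```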
Question: How would you test for positive selection on the branch leading to macaque instead of marmoset? (You will need the myout.* files later, so if you run run_beb.pl again, change the last parameter.)
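One way to approach this is to move the foreground label in the tree. The hypothetical helper below removes every existing #1 from a Newick string and attaches one to the chosen tip. It assumes plain Newick with unquoted tip names and places the label right after the name, before any branch length; compare with how tree_marmoset.nh places #1 and adjust if needed. The macaque name rheMac2 is also an assumption; use the name that actually appears in igf1r.phy.

```python
import re


def mark_foreground(newick, taxon, mark="#1"):
    """Return `newick` with all existing `mark` labels removed and `mark`
    attached to the terminal branch of `taxon` (placed right after the name)."""
    # Drop old labels such as ' #1' (but not the '#1' prefix of '#10').
    cleaned = re.sub(r"\s*" + re.escape(mark) + r"(?!\d)", "", newick)
    # Match the tip name only when it is delimited as a whole Newick label.
    pattern = r"(?<=[(,\s])" + re.escape(taxon) + r"(?=[\s:,);])"
    result, n = re.subn(pattern, lambda m: m.group(0) + " " + mark, cleaned)
    if n != 1:
        raise ValueError(f"expected exactly one tip named {taxon!r}, found {n}")
    return result


if __name__ == "__main__":
    toy = "((hg19,panTro2),(rheMac2,calJac3 #1),mm9);"  # hypothetical tree
    print(mark_foreground(toy, "rheMac2"))
```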
./remap-prot.pl myout.fa 1igr.fa myout.list >remapped.list
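The arguments of remap-prot.pl are not explained here; judging by the names, it presumably transfers the selected sites from myout.fa onto the protein sequence in 1igr.fa (likely the sequence of PDB entry 1IGR, an IGF1R structure that pymol can display). The core of such a remapping is coordinate bookkeeping through an alignment. The sketch assumes the two protein sequences are already pairwise aligned (for example with muscle), uses '-' for gaps, and numbers residues from 1 in each ungapped sequence; these indices need not match the residue numbering inside the PDB file. Both function names are hypothetical.

```python
def column_to_residue(aligned):
    """Map each 1-based alignment column to the 1-based residue number of
    the ungapped sequence, or None where the sequence has a gap."""
    mapping = {}
    residue = 0
    for col, ch in enumerate(aligned, start=1):
        if ch == "-":
            mapping[col] = None
        else:
            residue += 1
            mapping[col] = residue
    return mapping


def remap_positions(aligned_a, aligned_b, positions):
    """Translate residue numbers of sequence A into residue numbers of
    sequence B through a pairwise alignment of A and B (None = gap in B)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Invert A's mapping: residue number -> alignment column.
    col_of_a = {r: c for c, r in column_to_residue(aligned_a).items() if r is not None}
    b_at_col = column_to_residue(aligned_b)
    return {p: b_at_col[col_of_a[p]] for p in positions}


if __name__ == "__main__":
    # Toy pairwise alignment (hypothetical sequences).
    print(remap_positions("MK-TLV", "-KQTL-", [1, 2, 3, 5]))
```

Positions that map to None fall into a gap of the second sequence, i.e. they have no counterpart in the structure and cannot be shown on it.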