Lecture 4

project review

Don't use megablast - megablast is only for finding highly similar sequences
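use tblastn - do the search at the protein level (protein query vs. a nucleotide database translated in all six reading frames) with
word size 2
BLOSUM45 (lower-numbered BLOSUM matrices suit more distantly related sequences)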
sequence conservation at the DNA level - the simplest explanation is that it's a gene
if you search a protein database, the introns will be gone (proteins are translated from spliced mRNA)
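tblastn compares a protein query against a nucleotide database by translating the database in all six reading frames. The Python sketch below shows only that translation step, assuming the standard genetic code (NCBI table 1); it is not how BLAST is implemented internally, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon; unknown codons (e.g. containing N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict[str, str]:
    """Translate the three forward and three reverse reading frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

The search then aligns the protein query against each translated frame, which is why a gene split by introns in genomic DNA can still be found piece by piece at the protein level.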

profiles - weight certain positions more heavily - build the profile iteratively (as PSI-BLAST does)
if you have a representation of the conserved motif, you can identify the catalytic residues
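A profile (position-specific scoring matrix, PSSM) is what lets conserved positions count more than variable ones. A minimal sketch, assuming an ungapped motif alignment, a pseudocount of 1 and a uniform background of 1/20 per amino acid; real tools weight sequences and estimate pseudocounts more carefully. The motif sequences and function names are made up for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Log-odds PSSM from equal-length, ungapped motif sequences.

    Assumes a uniform background frequency of 1/20 for every amino acid.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(alignment[0])):
        column = [seq[pos] for seq in alignment]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        # Conserved residues get large positive scores; absent ones go negative.
        pssm.append({
            aa: math.log2((column.count(aa) + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_hit(pssm, sequence):
    """Slide the PSSM along a sequence; return (best score, start index)."""
    width = len(pssm)
    best = (-math.inf, -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Non-standard residues (e.g. X) contribute 0.
        score = sum(pssm[i].get(aa, 0.0) for i, aa in enumerate(window))
        best = max(best, (score, start))
    return best


pssm = build_pssm(["GDSGG", "GDSGG", "GDAGG"])
print(best_hit(pssm, "MKTAGDSGGPLV"))
```

Here the best hit starts at index 4, where the toy motif sits; the fully conserved G and D positions contribute more to the score than the S/A position.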
prebuilt conserved-motif profiles can be searched with RPS-BLAST
now = profile-profile matching: match conserved residues to conserved residues
uses a profile on both sides of the search
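Profile-profile comparison scores one alignment column against another column rather than against a single residue. This sketch assumes a uniform background f(a) = 1/20 and uses one simple column score, log2 of the sum over amino acids of p(a)q(a)/f(a), which is high when both columns conserve the same residue; it then tries every ungapped offset. HHpred itself aligns profile HMMs with gaps and adds secondary structure, so treat this only as an illustration of the idea; all names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption for this sketch: uniform background frequencies f(a) = 1/20.
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def build_profile(alignment, pseudocount=0.5):
    """Per-column amino-acid frequencies (with pseudocounts) from an ungapped alignment."""
    profile = []
    for pos in range(len(alignment[0])):
        column = [seq[pos] for seq in alignment]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (column.count(aa) + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def column_score(p, q):
    """log2 of sum_a p(a) * q(a) / f(a): high when both columns favour the same residues."""
    return math.log2(sum(p[aa] * q[aa] / BACKGROUND[aa] for aa in AMINO_ACIDS))


def best_ungapped_offset(prof_a, prof_b):
    """Slide profile B along profile A without gaps; return (score, offset)."""
    best = (-math.inf, 0)
    for offset in range(-(len(prof_b) - 1), len(prof_a)):
        score = sum(
            column_score(prof_a[offset + j], col_b)
            for j, col_b in enumerate(prof_b)
            if 0 <= offset + j < len(prof_a)
        )
        best = max(best, (score, offset))
    return best


prof_a = build_profile(["AWCHY", "AWCHY", "AWCHY"])
prof_b = build_profile(["WCH", "WCH", "WCH"])
print(best_ungapped_offset(prof_a, prof_b))
```

The best offset is 1, where W, C and H in one profile line up with the same conserved residues in the other.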
HHpred
more sensitive, more specific, better overall
HMM scoring: at each position, probabilities over the 20 amino acids, plus insertion and deletion states
ROC curve (receiver operating characteristic curve) - plots sensitivity (true positive rate) against 1 - specificity (false positive rate)
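To make the ROC curve concrete: given search scores on a benchmark where the true homologs are known, sweep the score threshold from high to low and record sensitivity and false positive rate at each step. A minimal sketch, assuming at least one positive and one negative label; the area under the curve (AUC, trapezoid rule) is a common single-number summary added here for illustration, and the function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points, sweeping the threshold from high to low.

    labels: True for true homologs, False for non-homologs.
    sensitivity = TPR = TP / P; FPR = FP / N = 1 - specificity.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Tied scores are processed together: one threshold, one point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


print(auc(roc_points([0.9, 0.8, 0.7, 0.6], [True, False, True, False])))
```

A method that ranks every true homolog above every non-homolog reaches AUC 1.0; random ranking gives about 0.5.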
best - HHpred + SS (secondary structure)
secondary structure is the general three-dimensional form of local segments of biopolymers such as proteins and nucleic acids (DNA/RNA), e.g. alpha helices and beta sheets in proteins.
secondary structure is predicted by PSIPRED
HHpred blows the competition out of the water when a related family is known; when none is known, you're out of luck

the protein folding problem has not been solved
yet HHpred-style homology detection even works well for tertiary structure prediction
ergo: sequence really does beget structure, and structure begets function
sequence is enough to define structure
proteins (at least globular ones, the ones we can crystallize) have a defined structure, in contrast to RNA, which is floppy

SCOP = Structural Classification of Proteins, a hierarchical classification scheme
GO = Gene Ontology, also hierarchical
orthologs: an evolutionary classification of genes
orthologs are related by speciation
paralogs are related by duplication - not orthologs! e.g. olfactory receptors: the copies diverge and perform different functions
alpha, beta, gamma (fetal) hemoglobin - fetal hemoglobin binds oxygen more tightly, so the fetus can pull oxygen away from the mother
paralogs can be co-orthologous to a gene: individually they are not orthologous to it, but as a set they are
orthologs typically perform the same function
xenologs: genes that arise by horizontal gene transfer from another organism
e.g. alpha-proteobacteria - mitochondria (mitochondria descend from an alpha-proteobacterial endosymbiont)
in-paralogs: two genes produced by speciation, then duplication (the duplication comes after the speciation)
co-orthologs: collectively orthologous
orthologous group: the collection of all descendants of an ancestral gene - a simplifying principle
databases: FlyBase, WormBase, yeast genome browser
pick the database that's prettiest and works best

tree reconciliation: reconcile a species tree with a gene tree
Occam's razor: place a duplication event toward the leaves of the tree; a duplication further up the tree followed by loss events in the other species is less likely (it needs more events)
a different gene-tree topology can suggest a duplication before speciation: "out-paralogs" ... two separate gene units
in-paralogs: duplication AFTER the speciation event
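The reconciliation argument can be sketched with LCA mapping, a standard parsimony-style method: each gene-tree node maps to the smallest species-tree clade containing all of its genes, and a node is called a duplication when it maps to the same clade as one of its children. This places duplications as close to the leaves as the gene tree allows, in the spirit of the Occam's razor point. Trees are nested Python tuples here; the species names, gene names and functions are hypothetical, and the gene tree is assumed rooted and correct.

```python
def leaf_set(tree):
    """All leaf names under a nested-tuple tree node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def all_clades(tree):
    """Leaf sets of every node in the species tree (leaves included)."""
    if isinstance(tree, str):
        return [leaf_set(tree)]
    found = [leaf_set(tree)]
    for child in tree:
        found.extend(all_clades(child))
    return found


def lca(species, clades):
    """Smallest species-tree clade containing every species in the set."""
    return min((clade for clade in clades if species <= clade), key=len)


def reconcile(gene_tree, species_of, clades, events):
    """Map a gene-tree node to a species-tree clade, recording events.

    An internal node is labelled a duplication if it maps to the same clade
    as one of its children; otherwise it is labelled a speciation.
    """
    if isinstance(gene_tree, str):
        return lca(frozenset([species_of[gene_tree]]), clades)
    child_maps = [reconcile(c, species_of, clades, events) for c in gene_tree]
    node_map = lca(frozenset().union(*child_maps), clades)
    event = "duplication" if node_map in child_maps else "speciation"
    events.append((leaf_set(gene_tree), event))
    return node_map


species_tree = (("human", "mouse"), "fly")
gene_tree = ((("human_a", "mouse_a"), ("human_b", "mouse_b")), "fly_1")
# Hypothetical naming convention: species is the prefix before "_".
species_of = {gene: gene.split("_")[0] for gene in leaf_set(gene_tree)}

events = []
reconcile(gene_tree, species_of, all_clades(species_tree), events)
for genes, event in events:
    print(event, sorted(genes))
```

In this toy example the single duplication falls after the split from fly but before the human-mouse speciation. Relative to fly_1, the four human and mouse genes are in-paralogs (co-orthologs of fly_1); relative to the human-mouse speciation, human_a and human_b are out-paralogs.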