Lecture 2

From Colettapedia
  • BIOF518
  • finding correspondences between amino acid "residues"
  • assessing similarity w/ substitution matrices
  • substitution: probability that one amino acid mutates into another
  • some substitutions are easier than others: some need only a one-step (single-nucleotide) mutation, others need two- or three-step changes to the codon
  • valine, leucine, isoleucine can substitute for each other and it won't matter. evolutionarily speaking, doesn't disrupt function of protein
  • phenylalanine, tryptophan, tyrosine (the aromatics) are a similar group
  • simplest substitution matrix goes one-for-one: reward match, penalize mismatch
  • many mutations don't matter
  • add: insertion/deletion penalty (indel penalty)
    • bigger penalty to open a gap, smaller penalty to extend one (e.g. a longer insertion)
  • most people use BLOSUM matrices
  • nucleotide
    • amino acid alphabet (20 letters) has more inferential statistical power than nucleotide alphabet (4 letters)
    • phenotype is determined at the protein level; nucleotide sequence is far less conserved
    • translate to protein, align, then map backwards to nucleotides
  • BLOSUM62 - most important matrix - covers protein query lengths of 85-300 amino acids
  • for more divergent sequences (human -> bacteria): BLOSUM45 - best for detecting long and weak alignments
  • human to another human -- BLOSUM90
  • example: hemoglobin - paralogs (in same organism)
  • if you ever run BLAST - look at the "consensus"
  • Identical
  • stronger: invariant
  • similar - like the plus in consensus
  • 60% identical is different than 60% similar (similarity is a much weaker inference)
  • structure is conserved even when sequence is not
  • homology is like being pregnant: two sequences either are homologous or they aren't
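
The scoring ideas above (reward match, penalize mismatch, gap open costs more than gap extend, identical vs. similar) can be sketched in a few lines. This is only a toy: the numeric scores are made up rather than taken from BLOSUM, the two "similar" groups are just the ones named in these notes, and taking percentages over ungapped columns is one assumed convention among several.

```python
# Toy scoring parameters (assumed for illustration, NOT BLOSUM values).
MATCH = 2
MISMATCH = -1
GAP_OPEN = -5     # charged for the first position of a gap
GAP_EXTEND = -1   # charged for each further position of the same gap

# Interchangeable groups mentioned in the notes.
SIMILAR_GROUPS = [set("VLI"), set("FWY")]


def is_similar(a, b):
    """True if a and b are different residues from the same group."""
    return a != b and any(a in g and b in g for g in SIMILAR_GROUPS)


def score_alignment(seq_a, seq_b):
    """Score two equal-length aligned strings; '-' marks a gap.

    Assumes no column has a gap in both rows.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    gap_row = None  # row holding the gap we are currently inside, if any
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            row = "a" if a == "-" else "b"
            score += GAP_EXTEND if gap_row == row else GAP_OPEN
            gap_row = row
        else:
            score += MATCH if a == b else MISMATCH
            gap_row = None
    return score


def identity_and_similarity(seq_a, seq_b):
    """Return (% identical, % identical-or-similar) over ungapped columns."""
    cols = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    identical = sum(a == b for a, b in cols)
    similar = sum(is_similar(a, b) for a, b in cols)
    n = len(cols)
    return 100.0 * identical / n, 100.0 * (identical + similar) / n


aligned_a = "MKVLF---AW"
aligned_b = "MKILYGGSAW"
total = score_alignment(aligned_a, aligned_b)
ident, simil = identity_and_similarity(aligned_a, aligned_b)
print(total, round(ident, 1), simil)
```

In this made-up example the three-residue gap is charged once for opening and twice for extending, and the V/I and F/Y columns count as similar but not identical, so percent similarity comes out higher than percent identity.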

alignment algorithms

  1. set up corresponding positions
  2. score w/ substitution matrix
  • Needleman-Wunsch - dynamic programming - global alignment only, gets tripped up easily
  • most of the time you want local alignment
  • Smith-Waterman - use it if at all possible. picks the best local path (don't have to start from the corners; finds the best "ribbon" in the grid, the ribbon path with the highest scores)
    • fills in a matrix M x K (M and K = lengths of the two sequences)
    • use it for pairwise sequence alignment
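
A minimal sketch of Smith-Waterman local alignment, to show the "best ribbon in the grid" idea. Assumptions: simple match/mismatch scores and a single linear gap penalty (not a real substitution matrix, not the open/extend scheme); the function name and default values are made up for illustration.

```python
def smith_waterman(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # H[i][j] = best score of a local alignment ending at seq_a[i-1], seq_b[j-1]
    # (0 if nothing scoring above zero ends there).
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start fresh: this is what makes it local
                H[i - 1][j - 1] + sub,  # align the two residues
                H[i - 1][j] + gap,      # gap in seq_b
                H[i][j - 1] + gap,      # gap in seq_a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTGATTACAGGG", "CCGATTACACC"))
```

The only difference from a global (Needleman-Wunsch style) fill is the `0` option in the `max` and starting the traceback at the highest cell instead of the corner, which is why the unrelated flanking letters in the toy input are simply left out of the result.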

multiple sequence alignments

  • do whenever you can - accuracy is much better
  • use multiple species - get better statistical power
  • ClustalW - don't use this. builds a guide tree - aligns the easy local stuff first
  • MUSCLE - use this. builds draft alignment, then improves. get to the multiple alignment as quickly as possible, have correspondence of positions.
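
Once a multiple alignment gives correspondence of positions, it is easy to ask how conserved each column is. A small sketch, assuming the alignment is already available as equal-length strings with `-` for gaps; the four sequences are invented, and `column_conservation` is a hypothetical helper, not part of MUSCLE or ClustalW.

```python
from collections import Counter


def column_conservation(aligned_seqs):
    """For each alignment column, return (most common symbol, its fraction)."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*aligned_seqs):
        symbol, count = Counter(column).most_common(1)[0]
        result.append((symbol, count / len(column)))
    return result


# Invented toy alignment: one row per species.
msa = [
    "MKV-LH",
    "MKI-LH",
    "MRVALH",
    "MKL-LH",
]

conservation = column_conservation(msa)
# Columns where every sequence has the same residue (gap-only columns excluded).
invariant = [i for i, (sym, frac) in enumerate(conservation)
             if frac == 1.0 and sym != "-"]
print(invariant)
```

With only two sequences every matching column looks "conserved"; adding more species is what separates truly invariant positions from chance agreement.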

protein structure prediction

  • neural network implies we don't know what's going on underneath, black box
  • secondary protein structure - tool "psipred"
  • tertiary structure:
    • ab initio
      • we can't even model water: electrons are delocalized
    • if modeling atoms as balls on springs, can get pretty far though
  • better: homology modelling - IF YOU CAN FIND AN ALIGNMENT

steps

  1. identify homologs with known structure
  • identify structure of your protein by comparing it with a homologous protein of known structure
  • multiple sequence alignment better than pairwise
  • make alignments as accurate as possible
  • paste sequence into Phyre - "find me a structure", and it does!
  • align: best practice to align domains - a unit of vertical descent

finding domains

  • domains: the basic unit of homology
  • may have different evolutionary history, not vertical descent from common ancestor
  • active site residues vs. catalytic site residues.

sequence motifs

  • only a few key sequences
  • most pro
  • only a few proteins are absolutely necessary
  • some proteins just exist to pad the core
  • sequence motif - a regular expression for amino acids
  • sequence motif - confers the function of protein
  • even better: structural motifs - structural template
  • X arrangement will cause protein to be a peptidase, for example; you will never find certain structures that don't perform a certain function
  • could have two different proteins that have exact same active site ... may be functional analogs, but
  • you can tell the important amino acids by looking at what's conserved across species; if not conserved, probably not important
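
"A regular expression for amino acids" can be taken literally. The sketch below uses the PROSITE-style N-glycosylation pattern N-{P}-[ST]-{P} as an example motif; the choice of motif, the test sequence, and the `find_motif` helper are assumptions for illustration, not something from the lecture.

```python
import re

# N-{P}-[ST]-{P}: N, then anything but P, then S or T, then anything but P.
# Wrapped in a lookahead so that overlapping hits are all reported.
N_GLYC = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(sequence, pattern=N_GLYC):
    """Return (0-based start, matched text) for every hit, overlaps included."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


seq = "MKNGSAANPTLNNTSQ"  # invented sequence
hits = find_motif(seq)
print(hits)
```

Note that `NPTL` is skipped because of the proline in the second position, while `NNTS` and `NTSQ` overlap and are both reported. A sequence hit like this only says the pattern is present; whether the site is functional is a separate question.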