Lecture 1

From Colettapedia

general

  • "Pera Poogibo"
  • theoretical: inside the bioinformatics black box... overview of methods
  • applied bioinformatics
  • NCBI - evolutionary genomics research group
  • Pere, David, Ben, Yoo-Ah Sijung
  • 4 projects - deadline is day of journal club
  • ben.busby@gmail.com - use this
    • applied bioinformatics class in library
    • two scripting courses - perl python
    • medical genomics course
    • theoretical genomics
    • database design

bioinformatics

  • wet fishing expeditions are costly
  • PubMed literature searches
    • boolean operators AND OR
    • "search phrase"
    • wildcard oxida*
    • fields [bracket delimited]
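
The search conventions above (boolean operators, quoted phrases, the * wildcard, bracket-delimited fields) can be combined into one query string. A minimal sketch follows; the helper names pubmed_term and combine and the example search terms are made up for illustration and are not part of any NCBI tool.

```python
def pubmed_term(text, field=None, phrase=False, wildcard=False):
    """Format one search term using the conventions listed in the notes."""
    if phrase:
        text = f'"{text}"'          # quoted phrase search
    elif wildcard:
        text = f"{text}*"           # truncation wildcard, e.g. oxida*
    if field:
        text = f"{text}[{field}]"   # bracket-delimited field restriction
    return text


def combine(operator, *terms):
    """Join terms with a boolean operator (AND / OR), wrapped in parentheses."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    return "(" + f" {operator} ".join(terms) + ")"


# Illustrative query only; the terms are assumptions, not from the lecture.
query = combine(
    "AND",
    pubmed_term("oxida", wildcard=True),
    combine(
        "OR",
        pubmed_term("carbamoyl phosphate synthetase", phrase=True),
        pubmed_term("CPS1", field="Title"),
    ),
)
print(query)
```

The resulting string can be pasted into the PubMed search box as is.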

dna sequencing information

  • NCBI - Gene db with information about molecule
    • key field restrictions: organism, gene, (official symbol of gene locus), chromosome, title, accession
    • unique id for each molecule
  • NCBI - Nucleotide - RNA sequence
  • Joint Genome Institute (DOE)
  • Ensembl
    • ENST
    • waning in popularity these days
  • UCSC genome browser
  • Galaxy
    • short read - Illumina read
    • tools like bowtie (compilation required)
    • 200GB - good for one project
  • DAVID
  • Extremobase
  • model organisms like Drosophila and C. elegans

rna sequence

  • microRNAs
    • miRBase
  • regulatory information
    • transfac
  • trace/SRA
    • RNAseq
    • best to load up into GALAXY
  • ncRNA - non-coding RNA
  • CoreNucleotide (genBank)
    • key fields: organism, accession, author, title, sequence length, properties, gene
    • VOCAB - seed sequence

RNA expression (microarray)

  • NCBI- GEO
    • GDS entries (curated DataSets) are better than GSE entries (submitter-supplied Series)
  • Stanford Microarray database
  • UCSC precomputed normalization
  • tissue-specific microarray data
    • bioGPS

SNPs

  • 1000 Genomes Project - 10 GB
  • dbSNP

comparative genomics

  • taxonomic and chromosomal information
  • NCBI
    • taxonomy
    • genome
    • cancer chromosome

proteins

  • NCBI - Protein - largest db in world
    • key field restrictions: organism, title, author, mol wt, seq length, gene, [ecno] enzyme commission number
    • ec number categorize
    • example: cps1 AND homo sapiens
    • go to gene page
    • HOT: alternate reading frames, nested genes
  • VOCAB - protein domain - chunk of protein that is conserved between species
    • CDD, SMART, PFAM
    • compare orthology
  • structural info
    • RCSB protein data bank (pdb)
    • use to grab protein structures
  • binding information
    • STRING - problem: not so accurate - uses text mining
    • go to protein - what bind to? cog? networks?
    • text-mining - watson
    • ex:
      • two proteins - LAMP1 GANKYRIN
  • network and functional info
    • KEGG -- HIGH RECOMMENDATION BY BEN BUSBY!
    • kyoto encyclopedia of genes and genomes
    • shows pathways
    • KEGG pathway database
      • enter keywords: arginine biosynthesis
      • making pics for presentations

phosphorylation

  • proteins
  • kinase - catalyzes addition of a phosphate group
  • phosphatase - takes the phosphate off
  • COOL - titin

intron structure

  • NCBI - nucleotide
    • search for the word "join" - find intronic coordinates - gives offsets
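
As a rough illustration of the "join" search above: in a GenBank record the join(...) location lists the exon spans, and the introns are the gaps between consecutive spans. The sketch below handles only the simplest form, join(a..b,c..d,...); it does not handle complement(...), partial markers such as < or >, or remote accessions. The function names are hypothetical.

```python
import re


def parse_join(location):
    """Return exon (start, end) pairs from a simple join(...) location string."""
    match = re.fullmatch(r"join\((.*)\)", location.strip())
    if match is None:
        raise ValueError("expected a location of the form join(a..b,c..d)")
    exons = []
    for span in match.group(1).split(","):
        start, end = span.strip().split("..")
        exons.append((int(start), int(end)))
    return exons


def introns_from_exons(exons):
    """Introns are the gaps between consecutive exon spans (1-based, inclusive)."""
    introns = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        introns.append((prev_end + 1, next_start - 1))
    return introns


# Made-up coordinates, for illustration only.
exons = parse_join("join(1..100,201..300,401..500)")
print(introns_from_exons(exons))
```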

homology vs similarity

  • cannot say "somewhat homologous"
    • homology is a qualitative statement: two sequences either share a common ancestor or they do not
  • can say "somewhat similar"
  • identity - when aligned, x% of the positions are the same
  • 70% identical on amino acid level = 97% chance of having same structure
    • for function purposes probably doing more or less the same thing
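
Identity as defined above can be computed directly from two already-aligned sequences. A minimal sketch, assuming gaps are written as "-" and that gapped columns are left out of the denominator (other conventions exist, for example dividing by the full alignment length); the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of compared (ungapped) alignment columns with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue                 # skip columns containing a gap
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: 8 ungapped columns, 7 of them identical.
print(percent_identity("ACGT-ACGT", "ACGTTACCT"))
```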

evolution in 15 minutes

  • COOL: pseudomonas natriegens
  • microevolution - changes in gene frequency in pop from one generation to the next
  • macroevolution - descent of different species from a common ancestor over many generations - accumulation of variation
  • "Nothing in biology makes sense except in the light of evolution" - Dobzhansky, 1973
  • requires descent through genetic inheritance
  • the first phylogenetic tree - from Darwin's notebook in 1837 ("I think")
  • origin of species 1859 - only one picture!
  • model of speciation - separated populations
  • Fitch WM (2000), Homology
  • homologous features vs analogous features
  • Molecules as documents - beginning of
  • homology in molecular data
  • gene duplication, orthologues and paralogues
    • paralogues - process of duplication
    • orthologues - speciation
  • horizontal gene transfer
    • transformation, transduction, bacterial conjugation - Doolittle 1996
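
The definition of microevolution above (change in gene frequency from one generation to the next) can be made concrete with a toy simulation. The sketch below uses the Wright-Fisher drift model, which was not covered in the lecture and is brought in here only as an illustration; it models random sampling alone, with no selection or mutation. The function name and parameter values are assumptions.

```python
import random


def wright_fisher(freq, pop_size, generations, seed=0):
    """Track the frequency of one allele under random drift alone."""
    rng = random.Random(seed)
    copies = 2 * pop_size            # diploid population: 2N gene copies
    history = [freq]
    for _ in range(generations):
        # Each gene copy in the next generation is drawn at random
        # from the current pool, so it carries the allele with probability freq.
        count = sum(rng.random() < freq for _ in range(copies))
        freq = count / copies
        history.append(freq)
    return history


trajectory = wright_fisher(freq=0.5, pop_size=50, generations=20)
print([round(f, 2) for f in trajectory])
```

Smaller values of pop_size give larger swings per generation, which is the usual intuition for why drift matters most in small, separated populations.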

6 steps of phylogenomics dancing

  1. sequence data
  2. align sequence
  3. phylogenetic signal? patterns-> evolutionary processes?
  4. character-based methods vs distance methods
  5. calculate best fit tree
  6. bootstrap - test phylogenetic reliability
  • gene trees don't necessarily give us the species tree
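
A toy sketch of the distance-method and bootstrap steps listed above, assuming the sequences are already aligned. It uses the simple p-distance (proportion of differing sites) and, instead of building a full tree, asks how often the closest pair in the original alignment is still the closest pair after resampling alignment columns with replacement. The sequences and function names are made up for illustration; real analyses use dedicated phylogenetics software.

```python
import random
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gaps skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Pairwise p-distances, keyed by name pairs in sorted order."""
    names = sorted(alignment)
    return {(m, n): p_distance(alignment[m], alignment[n])
            for m, n in combinations(names, 2)}


def closest_pair(alignment):
    """The pair of sequences with the smallest distance."""
    distances = distance_matrix(alignment)
    return min(distances, key=distances.get)


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def bootstrap_support(alignment, replicates=100, seed=0):
    """Fraction of replicates in which the original closest pair is recovered."""
    rng = random.Random(seed)
    reference = closest_pair(alignment)
    hits = sum(closest_pair(bootstrap_alignment(alignment, rng)) == reference
               for _ in range(replicates))
    return reference, hits / replicates


# Made-up toy alignment.
toy = {"human": "ACGTACGTAC", "chimp": "ACGTACGTAT", "mouse": "ACTTAGGTTT"}
print(distance_matrix(toy))
print(bootstrap_support(toy))
```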

more ben

  • entrez - manually curated db
  • GenBank and RefSeq - GI (GenInfo Identifier)
  • e-utilities
    • EFetch
    • put in a wait command: batches of 50 IDs, with a three-second pause between requests
  • NCBI ftp site
    • Aspera
    • GenBank sequence data files: ncbi-asn1
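
The batching advice under e-utilities (batches of 50, a three-second wait) might look roughly like the sketch below. The EFetch endpoint and the db, id, rettype and retmode parameters are standard E-utilities usage; the helper names, and the defaults of the nucleotide database with FASTA output, are assumptions chosen for illustration. The fetch and sleep functions are injectable so the batching logic can be tried without touching the network.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batches(ids, size=50):
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def efetch_url(ids, db="nucleotide", rettype="fasta", retmode="text"):
    """Build one EFetch request URL for a batch of IDs."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_all(ids, fetch=None, pause=3.0, sleep=time.sleep):
    """Fetch records in batches of 50, pausing between consecutive requests."""
    if fetch is None:
        # Default: a real HTTP request (needs network access).
        fetch = lambda url: urlopen(url).read().decode()
    results = []
    for index, batch in enumerate(batches(list(ids))):
        if index:
            sleep(pause)             # the "wait command" between batches
        results.append(fetch(efetch_url(batch)))
    return results


# Usage (makes real requests; IDs are whatever accessions you need):
# records = fetch_all(my_accession_list)
```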