Lecture 1
general
- "Pera Poogibo"
- theoretical: inside the bioinformatics black box... overview of methods
- applied bioinformatics
- NCBI - evolutionary genomics research group
- Pere, David, Ben, Yoo-Ah Sijung
- 4 projects - deadline is day of journal club
- ben.busby@gmail.com - use this
- applied bioinformatics class in library
- two scripting courses - perl python
- medical genomics course
- theoretical genomics
- database design
bioinformatics
- wet fishing expeditions are costly
- pub med literature searches
- boolean operators AND OR
- "search phrase"
- wildcard oxida*
- fields [bracket delimited]
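The search syntax listed above (boolean operators, quoted phrases, wildcards, bracketed fields) can be combined into a single query string. The following is a minimal sketch; the helper names are hypothetical, the search terms are made up, and the `title` field tag is only an assumption about what a given database accepts.

```python
def field(term, tag):
    """Restrict a term to one search field, e.g. CPS1[gene]."""
    return f"{term}[{tag}]"


def phrase(text):
    """Quote a multi-word phrase so it is searched as a unit."""
    return f'"{text}"'


def wildcard(stem):
    """Match any word that starts with the given stem, e.g. oxida*."""
    return f"{stem}*"


def all_of(*terms):
    """Join terms with AND (every term must match)."""
    return "(" + " AND ".join(terms) + ")"


def any_of(*terms):
    """Join terms with OR (at least one term must match)."""
    return "(" + " OR ".join(terms) + ")"


# Illustrative query only: the terms and the field tag are assumptions.
query = all_of(
    any_of(wildcard("oxida"), phrase("oxidative stress")),
    field("mitochondria", "title"),
)
print(query)  # ((oxida* OR "oxidative stress") AND mitochondria[title])
```

The resulting string can be pasted into the search box as is. Field tag names differ between databases, so check the advanced search page of the one in use.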
dna sequencing information
- NCBI - Gene db with information about molecule
- key field restrictions: organism, gene (official symbol of gene locus), chromosome, title, accession
- unique id for each molecule
- NCBI - Nucleotide - RNA sequence
- Joint genome institute DOE
- ensembl
- ENST
- waning in popularity these days
- UCSC genome browser
- Galaxy
- short reads - Illumina reads
- tools like bowtie (compilation required)
- 200GB - good for one project
- DAVID
- Extremobase
- model organisms like Drosophila and C. elegans
rna sequence
- microRNAs
- miRBase
- regulatory information
- transfac
- trace/SRA
- RNAseq
- best to load up into GALAXY
- ncRNA - non-coding RNA
- CoreNucleotide (genBank)
- key fields: organism, accession, author, title, sequence length, properties, gene
- VOCAB - seed sequence
RNA expression (microarray)
- NCBI - GEO
- GDS (curated DataSets) entries better than GSE (Series) entries
- Stanford Microarray database
- UCSC precomputed normalization
- tissue-specific microarray data
- bioGPS
SNPs
- 1000 Genomes Project - 10 GB
- dbSNP
comparative genomics
- taxonomic and chromosomal information
- NCBI
- taxonomy
- genome
- cancer chromosome
proteins
- NCBI - Protein - largest db in world
- key field restrictions: organism, title, author, mol wt, seq length, gene, [ecno] enzyme commission number
- ec number categorize
- example: cps1 AND homo sapiens
- go to gene page
- HOT: alternate reading frames, nested genes
- VOCAB - protein domain - chunk of protein that is conserved between species
- CDD, SMART, PFAM
- compare orthology
- structural info
- RCSB protein data bank (pdb)
- use to grab protein structures
- binding information
- STRING - problem: not so accurate - uses text mining
- go to protein - what does it bind to? COG? networks?
- text-mining - watson
- ex:
- two proteins - LAMP1 GANKYRIN
- network and functional info
- KEGG -- HIGH RECOMMENDATION BY BEN BUSBY!
- kyoto encyclopedia of genes and genomes
- shows pathways
- KEGG pathway database
- enter keywords: arginine biosynthesis
- making pics for presentations
phosphorylation
- proteins
- kinase - catalyzes addition of a phosphate group
- phosphatase - takes the phosphate group off
- COOL - titin
intron structure
- NCBI - nucleotide
- search for the word "join" - find intronic coordinates - gives offsets
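The "join" lookup above can be sketched in code: the exon ranges inside a `join(...)` location imply the intron coordinates as the gaps between them. This is a minimal sketch with a made-up location string; it assumes a simple forward-strand location and ignores `complement(...)`, partial-end markers, and references to other records.

```python
import re


def exon_spans(location):
    """Return (start, end) pairs, 1-based inclusive, from a simple join(...) string."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


def intron_spans(location):
    """Introns are the gaps between consecutive exon spans."""
    exons = exon_spans(location)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]


# Hypothetical location string in the style of a CDS feature.
location = "join(101..250,401..520,900..1010)"
print(exon_spans(location))    # [(101, 250), (401, 520), (900, 1010)]
print(intron_spans(location))  # [(251, 400), (521, 899)]
```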
homology vs similarity
- homology is qualitative (shared ancestry or not) - cannot say "somewhat homologous"
- similarity is a quantitative statement describing how alike two sequences are
- can say "somewhat similar"
- identity - when aligned, x# of nucleotides are the same
- 70% identical on amino acid level = 97% chance of having same structure
- for function purposes probably doing more or less the same thing
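Percent identity, as defined above, is a simple count over aligned positions. The sketch below assumes the two sequences are already aligned to the same length and that positions with a gap in either sequence are left out of the denominator; that denominator choice is an assumption, and tools differ on it.

```python
def percent_identity(a, b, gap="-"):
    """Percent of aligned, ungapped positions where both sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where neither sequence has a gap character.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Made-up sequences: one mismatch in ten aligned positions.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
```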
evolution in 15 minutes
- COOL: pseudomonas natriegens
- microevolution - changes in gene frequency in pop from one generation to the next
- macroevolution - descent of different species from a common ancestor over many generations - accumulation of variation
- nothing in biology makes sense except in light of evolution - dobzhansky, 1973
- requires descent through genetic inheritance
- the first phylogenetic tree - from darwin notebook in 1837 ("I think")
- origin of species 1859 - only one picture!
- model of speciation - separated populations
- fitch WM 2000 Homology
- homologous features vs analogous features
- Molecules as documents - beginning of
- homology in molecular data
- gene duplication, orthologues and paralogues
- paralogues - process of duplication
- orthologues - speciation
- horizontal gene transfer
- transformation, transduction, bacterial conjugation - Doolittle 1996
5 steps of phylogenomics dancing
- sequence data
- align sequence
- phylogenetic signal? patterns-> evolutionary processes?
- character-based methods vs distance methods
- calculate best fit tree
- bootstrap - test phylogenetic reliability
- gene trees don't necessarily give us the species tree
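Two of the steps above, a distance method and the bootstrap, can be illustrated on a toy alignment. This is a sketch, not a tree-building program: it only finds the closest pair of sequences by p-distance (the first grouping a distance method would make) and then resamples alignment columns with replacement to see how often that pair is recovered. The alignment, the function names, and the replicate count are all made up for illustration.

```python
import random
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_pair(alignment):
    """Names of the two sequences with the smallest p-distance (ties: first found)."""
    return min(
        combinations(sorted(alignment), 2),
        key=lambda pair: p_distance(alignment[pair[0]], alignment[pair[1]]),
    )


def bootstrap_support(alignment, pair, replicates=200, seed=0):
    """Fraction of column-resampled alignments in which `pair` is still closest."""
    rng = random.Random(seed)
    names = sorted(alignment)
    length = len(alignment[names[0]])
    hits = 0
    for _ in range(replicates):
        # Sample alignment columns with replacement, keeping rows in step.
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {n: "".join(alignment[n][c] for c in cols) for n in names}
        if closest_pair(resampled) == pair:
            hits += 1
    return hits / replicates


# Toy alignment (hypothetical data, already aligned).
alignment = {
    "A": "ACGTACGTACGTACGTACGT",
    "B": "ACGTACGTACGTACGTACGA",
    "C": "ACGTTCGTACGAACGTTCGA",
    "D": "TCGTTCGAACGAACGTTCGA",
}
pair = closest_pair(alignment)
print(pair)  # ('A', 'B')
print(bootstrap_support(alignment, pair))
```

A real analysis would build a full tree for every replicate and report support per branch; the closest-pair shortcut only keeps the example short.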
more ben
- entrez - manually curated db
- GenBank and RefSeq - GI (GenInfo Identifier)
- e-utilities
- EFetch
- put in a wait command: batches of 50 IDs, with a three-second pause between batches
- NCBI ftp site
- Aspera
- genbank sequence data files ncbi-asn1
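The EFetch advice above (batches of 50, wait three seconds) can be sketched as a small batching loop. The function names and default parameters are hypothetical. The `fetch` argument is a caller-supplied function that downloads one URL, so the sketch itself makes no network calls, and the `db`, `rettype`, and `retmode` values are assumptions to adjust for the database in use.

```python
import time
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batches(ids, size=50):
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def efetch_url(ids, db="nucleotide", rettype="fasta", retmode="text"):
    """Build one EFetch request URL for a batch of IDs."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_all(ids, fetch, size=50, pause=3.0, sleep=time.sleep):
    """Fetch IDs batch by batch, pausing between requests (not before the first)."""
    results = []
    for i, batch in enumerate(batches(ids, size)):
        if i:
            sleep(pause)
        results.append(fetch(efetch_url(batch)))
    return results
```

In real use `fetch` could be something like `lambda url: urllib.request.urlopen(url).read().decode()`; passing it in keeps the batching logic easy to test without touching the network.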