BIOF 309 Curriculum
Contents
- 1 Changes for 2015
- 2 Course Units
- 3 Read files, string manipulation and Design Patterns(class 4)
- 4 Functions, Generators, and Debugging (screencast 2)
- 5 Slice, dice, Combine and Sort (class 5)
- 6 Regular Expressions (class 7)
- 7 Classes and Object-Oriented Programming (class 8)
- 7.1 vocab
- 7.2 Philosophical underpinnings
- 7.3 Defining classes
- 7.4 The simplest class
- 7.5 python magic attributes
- 7.6 python object hooks
- 7.7 special attributes
- 7.8 class attribute naming convention
- 7.9 Using classes in your code
- 7.10 Subclassing
- 7.11 Extend Python built-in types via inheritance
- 7.12 Abstract Base Class (interface)
- 7.13 @classmethod and @staticmethod
- 7.14 Python descriptor
- 7.15 Metaclasses
- 8 Algorithms
- 9 Data Viz (Class 9)
- 10 Statistics techniques (class 10)
- 11 Machine Learning (Class 11)
- 12 Possible optional classes
- 13 Software development techniques (optional class)
- 14 Bash Shell (optional class)
- 14.1 Introduction
- 14.2 Motivation
- 14.3 Installation
- 14.4 General Concepts
- 14.5 unix filesystem
- 14.6 privilege system
- 14.7 More Concepts
- 14.8 Moving around in the shell for the first time
- 14.9 Commands
- 14.10 Viewing the contents of files
- 14.11 Edit a file
- 14.12 grep
- 14.13 Advanced Concepts
- 14.14 tips
- 15 Vocabulary Terms
Changes for 2015
- Use Python3
- Magic number becomes student number which students can refer to
- Google API
- Get statistics for who uses class email list
- Download homeworks, grade them, and output an html saying what their grade was
- Autograder for girlfriend guess hw:
- ReturnWordsInWalden: Check type of return, number of entries, lowercase-ize
- Have another function that checks intersection list?
- Change objective to focus on finding gf's name more?
- NO TABS ANYWHERE!!!
- Critical to have notebooks so people don't have to type along
- Have an algorithms section that talks about sorted()
- Explicitly cover if __name__ == '__main__':
- The __main__ namespace, __builtin__ namespace, __file__, __doc__
- Implement some sort of quiz to happen often, perhaps as a class exercise, via Moodle installation
- More coverage on abstract methods/abstract properties vs. concrete methods/properties on ABCs
- Proper introduction of the use of git/github as a tool
- Autograder hw4 check return types!!!!!! Plus better doc strings as to what should be returned.
- Use the slider to vary a parameter in IPython, and other tricks
- buffer = a temporary storage unit/zone where you put things. size and contents may change rapidly. ("flush the buffer")
- docstring notation ->returns, function signature, etc.
- yield in the middle of a function, the finer points of yield
- clarify what it means to pass if you're an auditor
IPython
four most helpful commands
- ? - intro and overview
- %quickref
- help
- object?
Features
- tab completion
- explore objects
- magic functions
- %run any python script and load results into interactive namespace
- importable as a package
IPython
- from command line
ipython --help | less
- history
- execute shell commands by prefacing it with !
- capture the output in a python variable with double !!
- reload a module - useful for making changes
- system for caching input/output
Magic command system
- %lsmagic - see all magic commands
- %somemagicfunction? - see the help for the magic function
- single % = magic function
- double %% = cell magic
- enable automagic - don't have to type the single %
- define your own magics
- %magic
- %cd
- %timeit time; this statement;
- %%timeit setup code; more setup code
subsequent lines are what is timed.
- %pdoc - print docstring
- %pdef - print call signature
- %env - show environment variables
- %load file - load a python script as if it were typed into the ipython terminal
- %pastebin -d "description" line numbers: Upload code to Github's Gist paste bin, returning the URL.
- %prun - python code profiler
- %recall - more like recall as in re-call than as in remember
tips
- semicolon suppresses print statement output
Course Units
- BIOF 309 Project Ideas
- Class 1: Introduction and Installation
- Screencast 1: Numbers - former: Class 2: Python Primitives
- Class 2: Iterables
- Class 3: Control Flow
Read files, string manipulation and Design Patterns(class 4)
Design Patterns
- Why study design patterns? It's a cookbook for programmers!
- loop forever
- search loop
- dictionary iteration
- file iteration
- numbering iterations
- repeat
- do
- collect
- Combine
- count
- Collection Combine
- Search
- Filtered Do
- Filtered Collect
- Filtered Collect Groups of lines
- Filtered Combine
- Filtered Count
- Nested Iteration
- Recursive Tree Iteration
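A few of the patterns named above can be sketched in Python 3 as follows. The sample data and variable names are made up for illustration; this is one plausible reading of each pattern, not the course's reference version.

```python
# Hypothetical sample data, made up for this sketch.
lines = ["ATGC", "", "GGCA", "# comment", "TTAA"]

# Filtered Collect: gather the items that pass a test.
sequences = []
for line in lines:
    if line and not line.startswith("#"):
        sequences.append(line)

# Filtered Count: count the items that pass a test.
blank_count = 0
for line in lines:
    if not line:
        blank_count += 1

# Search: stop at the first item that matches.
first_gg = None
for seq in sequences:
    if seq.startswith("GG"):
        first_gg = seq
        break

# Numbering iterations: enumerate() pairs each item with its index.
numbered = [(i, seq) for i, seq in enumerate(sequences, start=1)]

print(sequences, blank_count, first_gg, numbered)
```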
Algorithms
- Big-O notation
- quicksort
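As a minimal sketch of quicksort: the version below builds new lists instead of partitioning in place, which is easier to read but uses more memory than the classic in-place algorithm. The function name is an assumption for illustration.

```python
def quicksort(items):
    """Return a new sorted list; average O(n log n), worst case O(n**2)."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    # Recursively sort the two sides and glue the pieces together.
    return quicksort(smaller) + equal + quicksort(larger)

print(quicksort([3, 6, 1, 8, 1, 9, 2]))
```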
Homework
- read in this file and output the same file sorted by state then city.
Functions, Generators, and Debugging (screencast 2)
- Your programs are getting longer, now's a good time to mention The Python Style Guide, and the Google Python Style Guide
functions
- eliminate repetitiveness
- global vs local variables - name bound to a value
- namespace - a list of all the names for all the variables that are currently being used
- A scope is a textual region of a Python program where a namespace is directly accessible. “Directly accessible” here means that an unqualified reference to a name attempts to find the name in the namespace.
- you get a new scope after every def, but not ifs, for loops, etc
- from an inner scope you can "read" variables of the enclosing scopes, you just can't rebind that name to a new value or object (unless you declare it global or nonlocal)
- but! you can add additional items to a dict via inner scope.
- values generated are not retained between calls!
- what are the side effects of running the function, e.g., print statements.
- return statement
- allowed to have multiple returns in a function, but once it's hit, function exits immediately!
- Can use this to break out of loop forever patterns in a way similar to break.
- Can only return a single object, but that single object can be a container that contains multiple things
- arguments
- default arguments
- keyword args come only after positional args
- recursive
- the return statement
- the "backtrace" or call stack
- pass
- do nothing
- how you tell python you want an empty block
- usually a placeholder when you haven't decided what to put there
- comments and docstrings
- comments disappear when python interprets code, but docstrings remain, accessible via help()
- docstring conventions, longer than one line, make one line summary, then blank. end with blank line
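The scope and return rules above can be illustrated with a short Python 3 sketch. All names here are hypothetical.

```python
counts = {"calls": 0}   # module-level (global) names
total = 0

def read_and_mutate():
    # Reading an outer name and mutating the object it refers to both work.
    counts["calls"] += 1
    return total + 1

def rebind_locally():
    # Assignment creates a new local name; the global "total" is untouched.
    total = 100
    return total

def rebind_globally():
    # "global" is required to rebind the module-level name.
    global total
    total = 5

def min_and_max(values):
    # A single returned object (a tuple) can carry several values.
    return min(values), max(values)

read_and_mutate()
rebind_locally()
print(counts, total)
rebind_globally()
low, high = min_and_max([4, 2, 9])
print(total, low, high)
```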
Assertions
- Best Practice For Python Assert
- Asserts should be used to test conditions that should never happen. The purpose is to crash early in the case of a corrupt program state.
- Exceptions should be used for errors that can conceivably happen, and you should almost always create your own Exception classes.
- For example, if you're writing a function to read from a configuration file into a dict, improper formatting in the file should raise a ConfigurationSyntaxError, while you can assert that you're not about to return None.
- In your example, if x is a value set via a user interface or from an external source, an exception is best.
- If x is only set by your own code in the same program, go with an assertion.
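A minimal sketch of the configuration-file example above, assuming a made-up "key=value" file format. ConfigurationSyntaxError is defined here for illustration; it is not a built-in exception.

```python
class ConfigurationSyntaxError(Exception):
    """Raised when a configuration line is not of the form key=value."""

def parse_config(text):
    """Parse 'key=value' lines into a dict (a made-up format for this sketch)."""
    settings = {}
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if "=" not in line:
            # Bad input can conceivably happen, so raise an exception.
            raise ConfigurationSyntaxError("line {0}: {1!r}".format(number, line))
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    # This can only fail if the code above is wrong, so assert it.
    assert settings is not None
    return settings

print(parse_config("name = walden\nmode = w"))
try:
    parse_config("this line has no equals sign")
except ConfigurationSyntaxError as err:
    print("bad config:", err)
```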
Function decorators
def output_railroad_switch( method_that_prints_output ):
    """This is a decorator that optionally lets the user specify a file to which
    to redirect STDOUT. To use, you must use the keyword argument "output_filepath"
    and optionally the keyword argument "mode" """

    def print_method_wrapper( *args, **kwargs ):
        retval = None
        if "output_filepath" in kwargs:
            output_filepath = kwargs[ "output_filepath" ]
            del kwargs[ "output_filepath" ]
            if "mode" in kwargs:
                mode = kwargs[ "mode" ]
                del kwargs[ "mode" ]
            else:
                mode = 'w'
            print( 'Saving output of function "{0}()" to file "{1}", mode "{2}"'.format(
                method_that_prints_output.__name__, output_filepath, mode ) )
            import sys
            backup = sys.stdout
            sys.stdout = open( output_filepath, mode )
            try:
                retval = method_that_prints_output( *args, **kwargs )
            finally:
                # always restore STDOUT, even if the wrapped function raises
                sys.stdout.close()
                sys.stdout = backup
        else:
            retval = method_that_prints_output( *args, **kwargs )
        return retval

    return print_method_wrapper
- example: define a decorator which calls the decorated function 10 times
- need example of decorator with an argument, see here at 35:27
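One possible version of the two examples asked for above: a decorator that calls the decorated function 10 times, and a decorator that takes an argument. The names call_ten_times and repeat are made up for this sketch.

```python
import functools

def call_ten_times(func):
    """Decorator: call the decorated function 10 times, return the last result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = None
        for _ in range(10):
            result = func(*args, **kwargs)
        return result
    return wrapper

def repeat(times):
    """Decorator factory: the argument is captured by the inner closures."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(times):
                result = func(*args, **kwargs)
            return result
        return wrapper
    return decorator

calls = []

@call_ten_times
def record():
    calls.append("ten")
    return len(calls)

@repeat(3)
def record_three():
    calls.append("three")
    return len(calls)

record()
record_three()
print(len(calls))
```

A decorator with an argument needs one extra layer of nesting: repeat(3) runs first and returns the real decorator, which is then applied to the function.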
Anonymous functions
- lambda a, b: blah blah
- more about these when talking about sorting
Iterators
- not rewindable, reversible, copyable
- value of it changes every time you use it
- e.g., it = iter(range(10)); zip(it, it) yields [(0, 1), (2, 3), ... (8, 9)]
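The zip(it, it) trick above, written out for Python 3, where zip() itself returns an iterator and has to be wrapped in list() to see the pairs:

```python
it = iter(range(10))
pairs = list(zip(it, it))   # zip pulls from the same iterator twice per tuple
print(pairs)                # [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]

# The iterator is now exhausted and cannot be rewound.
print(list(it))             # []
```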
Generators
- generator an easy way to create an iterator
- generates each item on the fly
- "A generator can take the place of a list when the list is so long, and/or its values are so big that creating the entire list before processing its elements would use enormous amounts of memory"
- yield statement
- "A function that uses a yield statement in place of return generates a new generator object each time it's called. The generator object encapsulates the bindings and code each time it's called, keeping them together for as long as the generator is in use"
- next( generator[, default])
- values inside are RETAINED between calls
- no way to access individual element, can only call next
- can call set(), list() or tuple() on a generator to get one, but don't do it on an infinite generator!
- reverse operation, calling iter( ...) on a list
- itertools.cycle( list ) - repeating sequence over and over.
- Coroutines - "allow multiple entry points for suspending and resuming execution at certain locations."
- yield becomes an expression that returns stuff from the outside, passed in when the caller uses generator.send( stuff )
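The generator points above can be sketched in Python 3 as follows. The function names countdown and running_total are hypothetical.

```python
import itertools

def countdown(start):
    """Generator: yields start, start-1, ..., 1, one value per next() call."""
    current = start
    while current > 0:
        yield current          # execution pauses here; locals are retained
        current -= 1

gen = countdown(3)
first = next(gen)
rest = list(gen)
after_end = next(gen, "done")  # the default avoids StopIteration

def running_total():
    """Coroutine-style generator: values arrive through send()."""
    total = 0
    while True:
        received = yield total  # yield is an expression here
        total += received

acc = running_total()
next(acc)                       # advance to the first yield ("priming")
acc.send(5)
latest = acc.send(10)

# islice() takes a finite piece of an infinite generator.
bases = list(itertools.islice(itertools.cycle("ACGT"), 6))
print(first, rest, after_end, latest, bases)
```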
Python Debugger
- comment things out
- use print statements
- insert a breakpoint using import pdb; pdb.set_trace()
- commands
- backtrace
- post mortem debugging:
python -m pdb your_script.py
Slice, dice, Combine and Sort (class 5)
Goal
- Do a screen cast for this
- this is the lecture you talk about list comprehensions, zipping and unzipping tuples, and sorting stuff
String Format
- Build a format string ahead of time and call .format on it while iterating
- .format() minilanguage
- sprintf operations, significant digits - give the place holders names, format after the colon
- * operator turns tuple into indiv values for format arguments.
- joins and split, and strip
- unpack a split into multiple variables
- print( "{num:>12.3f} and without pad {num:0.3f}".format( num=num ) )
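A short Python 3 sketch of the string-formatting points above. The sample values are made up.

```python
num = 3.14159
row_format = "{name:<8}{value:>12.3f}"   # built once, reused in the loop

rows = [("pi", num), ("e", 2.71828)]
table = [row_format.format(name=name, value=value) for name, value in rows]

# * unpacks a tuple into positional arguments for format().
pair = "{0} -> {1}".format(*("ATG", "Met"))

# split() output can be unpacked into several variables; strip() trims whitespace.
city, state = [part.strip() for part in "Bethesda , MD\n".split(",")]
joined = "\t".join([city, state])

print(table, pair, joined)
```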
file IO
- open(), close()
- with keyword
- reading modes "read write append"
- ways of reading files
- line by line iteration
- slurp entire text into one variable
- chomp
- blocking operations
- Accessing memory mapped files is faster than using direct read and write operations for two reasons. Firstly, a system call is orders of magnitude slower than a simple change to a program's local memory. Secondly, in most operating systems the memory region mapped actually is the kernel's page cache (file cache), meaning that no copies need to be created in user space - from memory-mapped file
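The basic file IO notes above (open modes, the with keyword, line-by-line iteration, slurping) might look like this in Python 3. The temporary file path is created only for this sketch.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.txt")

# "w" creates or truncates, "a" appends, "r" (the default) reads.
with open(path, "w") as handle:
    handle.write("first line\n")
with open(path, "a") as handle:
    handle.write("second line\n")

# Line-by-line iteration; rstrip("\n") plays the role of Perl's chomp.
with open(path) as handle:
    lines = [line.rstrip("\n") for line in handle]

# Slurp the whole file into one string.
with open(path) as handle:
    text = handle.read()

print(lines, len(text))
```

The with block closes the file automatically, even if an exception is raised inside it.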
Comprehension
- can have for list, set and dict
Conditional Comprehensions
- [expression for element in collection if test ]
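For illustration, the three kinds of comprehension and the conditional form, using made-up codon data:

```python
codons = ["ATG", "TAA", "GGC", "TAG"]
stops = {"TAA", "TAG", "TGA"}

coding = [c for c in codons if c not in stops]   # list comprehension with a test
first_letters = {c[0] for c in codons}           # set comprehension
lengths = {c: len(c) for c in codons}            # dict comprehension
print(coding, first_letters, lengths)
```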
zip and unzip
- zip() to zip, and zip(*...) to unzip
- itertools.islice
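A small Python 3 sketch of zipping, unzipping with the * operator, and itertools.islice. The lists are made up.

```python
import itertools

names = ["a", "b", "c"]
values = [1, 2, 3]

pairs = list(zip(names, values))                # zip two lists into tuples
unzipped_names, unzipped_values = zip(*pairs)   # "unzip" back into two tuples

# islice() slices any iterable, including ones that do not support [start:stop].
middle = list(itertools.islice(iter(range(100)), 10, 13))
print(pairs, unzipped_names, unzipped_values, middle)
```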
Anonymous functions
- lambda a, b: blah blah
- more about these when talking about sorting
Sort stuff
- can provide key sorted( ..., key = lambda x: ... )
- full discussion of cmp(a,b), returns -1, 0, 1
- can define your own lambda
- perhaps use
- sort by column a, then column b:
- sort_func = lambda A, B: cmp( A[0], B[0] ) if A[0] != B[0] else cmp( A[1], B[1] )
- reverse = True
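The built-in cmp() and the cmp= argument of sorted() exist only in Python 2. A sketch of the same two-column sort in Python 3 follows; the cmp() defined here is a stand-in written for this example, and the rows are made up.

```python
from functools import cmp_to_key

rows = [("MD", "Rockville"), ("CA", "Fresno"), ("MD", "Bethesda")]

# A tuple key sorts by column 0, then column 1.
by_key = sorted(rows, key=lambda row: (row[0], row[1]))

def cmp(a, b):
    """Stand-in for Python 2's built-in cmp(): returns -1, 0 or 1."""
    return (a > b) - (a < b)

sort_func = lambda A, B: cmp(A[0], B[0]) if A[0] != B[0] else cmp(A[1], B[1])
by_cmp = sorted(rows, key=cmp_to_key(sort_func))   # adapt a cmp-style function

descending = sorted(rows, key=lambda row: row[1], reverse=True)
print(by_key, by_cmp, descending)
```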
Homework
- make dict of DNA/RNA codon table
Regular Expressions (class 7)
- Pattern Matching
- greedy vs. non-greedy
- grouping
- substitution
- beginning of line, end of line
- backreference
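Each regular expression topic above can be tried with Python's re module. The sample strings below are made up.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.*>", text)       # matches as much as possible
non_greedy = re.findall(r"<.*?>", text)  # matches as little as possible

# Grouping: parentheses capture parts of the match.
match = re.search(r"(\w+)@(\w+)\.org", "contact: student@example.org")
user, host = match.groups()

# Substitution, with backreferences used to swap two captured groups.
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Thoreau, Henry")

# ^ and $ anchor to line boundaries when re.MULTILINE is set.
headers = re.findall(r"^>(\w+)$", ">seq1\nACGT\n>seq2\nTTGA", flags=re.MULTILINE)

# A backreference inside the pattern: the same word twice in a row.
doubled = re.search(r"\b(\w+) \1\b", "it was the the best").group(1)
print(greedy, non_greedy, user, host, swapped, headers, doubled)
```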
Classes and Object-Oriented Programming (class 8)
vocab
- CS terms
- instance/instantiate
- object
- data field
- process/function/subroutine
- class
- encapsulation
- inheritance
- implement
- python names for cs concepts
- object
- member
- method
- attributes
- class
- instance attribute
- class attribute
Philosophical underpinnings
- Simple collections are good, but nested collections can be a pain
- Comparison of programming paradigms
- Key concept: members and attributes used by the public, vs member and attributes that are meant for internal use only
- by convention members starting with underscore not supposed to be used by the public
- python convention: don't use variables/functions that start with _, i.e., those functions aren't part of the interface
- <quote>In Python, class is a synonym for type and object is a synonym for value. The period in str.count tells python to look for the named method in class str... The statement "every value has a type" can be rephrased as "every object is an instance of a class." Classes and instances form the basis of object-oriented programming.</quote>
- data model
- interface - information hiding - "You don't need to know that!"
- abstraction - hide away implementation details
- public vs. private
- distinction between "In python everything is an object" in the conceptual sense, vs. "all objects inherit from object"
- stencil metaphor - a template.
Defining classes
- the anatomy of a class definition
- inherit from object
- in python 2.x you have to explicitly inherit from object, in python 3.x all user-defined classes implicitly inherit from object
- this is part of the cruft that was removed in python 3
- "attributes"
- functions become methods
- what's the difference between functions and methods? self is passed as the first argument
The simplest class
- class MyNewClass( object ): pass
- docstring
python magic attributes
- what's all the special __stuff__ in an object when you call dir( obj )?
- duck typing - Special Method Names
from functools import reduce  # a built-in in Python 2, must be imported in Python 3

class thing( object ):
    def __init__( self, value ):
        self.val = value
    def AddMeWithAnotherThing( self, other ):
        return self.__class__( self.val + other.val )
    def __add__( self, other ):
        return self.AddMeWithAnotherThing( other )
    def __repr__( self ):
        return "thing(" + str(self.val) + ")"

things = map( thing, range(20) )
print( reduce( lambda x, y: x+y, things ) )
- Inheritance (OOP) vs. emulation (implement the functions that make an object behave like another)
python object hooks
- (slide stolen from a googletechtalk)
- defines how your object behaves
- how to make your classes act like built-in python classes such as int, str, etc.
- The methods that get called by python automatically in certain situations
- By implementing these methods, it signals to python that they need to be called for that situation
- Don't ever call these under-under methods directly
- accessing attributes
- __getattr__
- constructor, initializer, finalizer
- __new__, __init__, __del__
- new vs. init allows for immutable types
- the usual operators
- +, -, *, /, **, >, <, ==, +=, etc
- supporting sorting __lt__
- indexing
- calling like a function
- __call__
- with context
- __enter__ and __exit__
- iteration, truthvalue tests, containment tests
- for item in obj:, if obj:, if item in obj:
- __iter__
- __next__ (named next in python 2)
- conversion
- str( obj ), int( obj )
- __repr__, __float__(self)
- __contains__ for keyword in
def __getitem__(self, index):  # Seq API requirement
    """Returns a subsequence of single letter, use my_seq[index]."""
    # Note since Python 2.0, __getslice__ is deprecated
    # and __getitem__ is used instead.
    # See http://docs.python.org/ref/sequence-methods.html
    if isinstance(index, int):
        # Return a single letter as a string
        return self._data[index]
    else:
        # Return the (sub)sequence as another Seq object
        return Seq(self._data[index], self.alphabet)
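To see several of these hooks working together, here is a toy container in Python 3. The class GeneSet and its data are hypothetical, written only for this sketch.

```python
class GeneSet(object):
    """Toy container (hypothetical) that behaves like a built-in collection."""
    def __init__(self, genes):
        self._genes = list(genes)
    def __len__(self):                 # len(obj), also used for "if obj:"
        return len(self._genes)
    def __iter__(self):                # for item in obj:
        return iter(self._genes)
    def __contains__(self, gene):      # if item in obj:
        return gene in self._genes
    def __getitem__(self, index):      # obj[index]
        return self._genes[index]
    def __lt__(self, other):           # used by sorted()
        return len(self) < len(other)
    def __call__(self, prefix):        # obj(...)
        return [g for g in self._genes if g.startswith(prefix)]
    def __enter__(self):               # with obj as name:
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        return False                   # do not suppress exceptions
    def __repr__(self):
        return "GeneSet({0!r})".format(self._genes)

small = GeneSet(["TP53"])
big = GeneSet(["BRCA1", "BRCA2", "TP53"])
with big as genes:
    found = "TP53" in genes
print(len(big), found, big[0], big("BRCA"), sorted([big, small]), bool(GeneSet([])))
```

None of the under-under methods is called directly; Python calls them when len(), in, indexing, sorted(), a call, or a with block is used on the object.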
special attributes
- Consult datamodel documentation for reference
- instance attributes (per-object data) vs. class attribute
- __dict__ - accessible via vars() command, get dict containing instance attribute names, values, can set them via dict notation instead of dot notation.
- __class__
- class attributes are fetchable through the instances
- what that means is if you
- e.g., keeping track of all instances
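A sketch of instance attributes versus class attributes, including the "keep track of all instances" idea. The class Sample is made up for illustration.

```python
class Sample(object):
    """Each instance registers itself in a class attribute."""
    instances = []              # class attribute, shared by every instance

    def __init__(self, name):
        self.name = name        # instance attribute, per-object data
        Sample.instances.append(self)

a = Sample("a")
b = Sample("b")

print(vars(a))                  # same dict as a.__dict__
vars(a)["name"] = "renamed"     # dict notation instead of dot notation
print(a.name, a.__class__.__name__)

# The class attribute is fetchable through any instance.
print(a.instances is Sample.instances, len(b.instances))
```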
class attribute naming convention
- CamelCaseNamingStyle vs pot_hole_naming_style
- _single_leading_underscore: weak "internal use" indicator.
- single_trailing_underscore_: used by convention to avoid conflicts with Python keyword (e.g., class_)
- __double_leading_underscore: when naming a class attribute, invokes name mangling (inside class FooBar, __boo becomes _FooBar__boo; )
- __double_leading_and_trailing_underscore__: "magic" objects or attributes that live in user-controlled namespaces. E.g. __init__, __import__ or __file__. Never invent such names; only use them as documented.
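Name mangling can be observed directly. This sketch reuses the FooBar and __boo names from the note above; _internal is an extra made-up attribute.

```python
class FooBar(object):
    def __init__(self):
        self._internal = 1      # weak "internal use" hint, still accessible
        self.__boo = 2          # stored as _FooBar__boo

fb = FooBar()
print(sorted(vars(fb)))         # ['_FooBar__boo', '_internal']
print(hasattr(fb, "__boo"))     # False
```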
Using classes in your code
- access methods and attributes using the dot
Subclassing
- one way to define how your class behaves is through implementing magic methods, the other is to inherit from another class
- Inheritance
- reasons: factor out common code
- subclass
- you can subclass built in types, e.g., subclass list type to return evens() and odds()
- isa() vs hasa()
- isinstance() vs. type()
- overloaded
- multiple inheritance
- many frown upon this
- using super()
- help() will show what methods are defined where
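One way the evens() and odds() idea could look in Python 3, along with super() and isinstance() versus type(). The class name NumberList is an assumption for this sketch.

```python
class NumberList(list):
    """A list subclass (hypothetical name) that adds evens() and odds()."""
    def evens(self):
        return NumberList(x for x in self if x % 2 == 0)
    def odds(self):
        return NumberList(x for x in self if x % 2 == 1)
    def append(self, item):
        # Overriding: add a check, then delegate to list.append via super().
        if not isinstance(item, int):
            raise TypeError("NumberList only holds ints")
        super().append(item)

numbers = NumberList(range(6))
numbers.append(6)
print(numbers.evens(), numbers.odds())

# isinstance() respects inheritance; comparing type() does not.
print(isinstance(numbers, list), type(numbers) is list)
```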
Extend Python built-in types via inheritance
Abstract Base Class (interface)
- concrete vs abstract
- design by contract - interface
- area function for square, circle
- base class animal, subclass dog cat, virtual method talk.
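The animal example can be sketched with the standard abc module (Python 3 syntax). The method introduce() is an addition made up here to show a concrete method next to the abstract one.

```python
from abc import ABC, abstractmethod

class Animal(ABC):
    @abstractmethod
    def talk(self):
        """Every concrete subclass must implement talk()."""

    def introduce(self):            # concrete method shared by subclasses
        return "{0} says {1}".format(type(self).__name__, self.talk())

class Dog(Animal):
    def talk(self):
        return "Woof"

class Cat(Animal):
    def talk(self):
        return "Meow"

print([pet.introduce() for pet in (Dog(), Cat())])
try:
    Animal()                        # abstract classes cannot be instantiated
except TypeError as err:
    print("cannot instantiate:", err)
```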
@classmethod and @staticmethod
- @classmethod passes the class type of the object it was called on as the first argument instead of the instance it was called on
- NewFrom...
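A sketch of the alternate-constructor use of @classmethod, plus a @staticmethod for contrast. The names Sequence, DNA, NewFromFasta and is_valid are hypothetical.

```python
class Sequence(object):
    def __init__(self, letters):
        self.letters = letters

    @classmethod
    def NewFromFasta(cls, text):
        """Alternate constructor: cls is the class the method was called on."""
        lines = text.strip().splitlines()
        return cls("".join(lines[1:]))   # skip the ">" header line

    @staticmethod
    def is_valid(letters):
        """No self or cls is passed; a plain function kept inside the class."""
        return set(letters) <= set("ACGT")

class DNA(Sequence):
    pass

dna = DNA.NewFromFasta(">seq1\nACGT\nTTGA")
print(type(dna).__name__, dna.letters, Sequence.is_valid("ACGU"))
```

Because cls is DNA when called as DNA.NewFromFasta, the subclass gets an instance of itself without overriding the constructor.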
Python descriptor
- __get__(), __set__(), and __delete__()
- property( get, set, del )
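A minimal sketch showing a hand-written descriptor next to property(), assuming Python 3.6 or later for __set_name__. The classes Positive and Circle are made up.

```python
class Positive(object):
    """Descriptor (hypothetical) that rejects non-positive values."""
    def __set_name__(self, owner, name):     # called at class creation, 3.6+
        self.name = "_" + name
    def __get__(self, instance, owner):
        if instance is None:
            return self                      # accessed on the class itself
        return getattr(instance, self.name)
    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError("must be positive")
        setattr(instance, self.name, value)

class Circle(object):
    radius = Positive()

    def __init__(self, radius):
        self.radius = radius                 # goes through Positive.__set__

    def _get_diameter(self):
        return 2 * self.radius
    def _set_diameter(self, value):
        self.radius = value / 2
    diameter = property(_get_diameter, _set_diameter)

c = Circle(2)
c.diameter = 10
print(c.radius, c.diameter)
```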
Metaclasses
- the class of a class
- the type that a class is
- it's what implements a class
- in python the base metaclass is type
- the class that creates the class
- the type of object is type
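These statements can be checked interactively. The sketch below uses Python 3 metaclass syntax; Registry, Tracked and Dynamic are made-up names.

```python
class Plain(object):
    pass

print(type(Plain) is type, type(object) is type, type(type) is type)

# type(name, bases, namespace) builds a class at runtime.
Dynamic = type("Dynamic", (object,), {"greet": lambda self: "hi"})

class Registry(type):
    """Metaclass (hypothetical) that records every class it creates."""
    created = []
    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.created.append(name)
        return cls

class Tracked(metaclass=Registry):
    pass

print(Dynamic().greet(), Registry.created, type(Tracked) is Registry)
```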
Algorithms
Data Viz (Class 9)
Key Concepts
- raster vs SVG
- HTML
- CSS
Static plots
- Publishable? maybe
- matplotlib
- Proof-of-concept D3 viewer for matplotlib by JakeVDP
Web-based highly interactive graphics
- Publishable? definitely
- HTML5 Canvas element - Kinda like logo. or Paint. For a web browser.
- D3.js - Represent data using native Web browser technology - bind data in the HTML document to the representation in the canvas element.
- Vega - A visualization grammar
- Vincent - A Python to Vega translator
- Pandas
- Vega explanation
- Bokeh documentation
Statistics techniques (class 10)
- statistics module (standard library), new since python 3.4
- randomizing and taking statistics
- doing a thousand simulations
- normal distribution
- t-test - rPy
- p value
- pandas
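As one possible illustration of "doing a thousand simulations" with the standard library only: draw 1000 random samples from a normal distribution and look at the spread of their means. The sample size, seed and function name are arbitrary choices for this sketch.

```python
import random
import statistics

random.seed(309)   # fixed seed so the run is repeatable

def simulate_mean(sample_size):
    """Mean of one random sample drawn from a standard normal distribution."""
    return statistics.mean(random.gauss(0, 1) for _ in range(sample_size))

means = [simulate_mean(25) for _ in range(1000)]

# The means cluster around 0 with a spread near 1 / sqrt(25) = 0.2.
print(statistics.mean(means), statistics.stdev(means))
```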
Machine Learning (Class 11)
- kaggle - "Making data science into a sport."
- scikit-learn
- mahotas
- orange
- Orange3 - in development, requires python 3.2
- Orange 2.7 - stable as of 30 Apr 2014
Possible optional classes
Packaging
help('modules')
- Modules - section on layout of a package
__init__.py
- import selected classes, functions into package level: __all__ = ['submodule1', 'submodule2']
- corresponds with client usage: from package import *
- site-packages - destination for third-party packages
- easy_install - bundled with setuptools that lets you automatically download, build, install, and manage Python packages.
- setuptools packager
- Getting started with setuptools and setup.py
- PyPI
- pip
- egg - Same concept as a .jar file in Java, it is a .zip file with some metadata files renamed .egg, for distributing code as bundles.
$ file /Users/chris/scratch_python27_dir/lib/python2.7/site-packages/ColettaModule-0.1-py2.7-macosx-10.9-x86_64.egg
/Users/chris/scratch_python27_dir/lib/python2.7/site-packages/ColettaModule-0.1-py2.7-macosx-10.9-x86_64.egg: Zip archive data, at least v2.0 to extract
$ unzip /Users/chris/scratch_python27_dir/lib/python2.7/site-packages/ColettaModule-0.1-py2.7-macosx-10.9-x86_64.egg
Archive: /Users/chris/scratch_python27_dir/lib/python2.7/site-packages/ColettaModule-0.1-py2.7-macosx-10.9-x86_64.egg
inflating: _ColettaModule.py
inflating: _ColettaModule.pyc
inflating: _ColettaModule.so
inflating: EGG-INFO/dependency_links.txt
inflating: EGG-INFO/native_libs.txt
inflating: EGG-INFO/PKG-INFO
inflating: EGG-INFO/requires.txt
inflating: EGG-INFO/SOURCES.txt
inflating: EGG-INFO/top_level.txt
inflating: EGG-INFO/zip-safe
Utilities
- goal: use python as glue for your other programs
- shutil
- make directories
- parsing command line arguments
- assertion
- advanced datetime
- pickle
- you take an object as it exists in RAM, like an instance of a class, and you dump that object bit for bit to a file
- The pickle is binary, not text, not human readable, that's the point! Meant to be quick, concise, small.
- Resurrect the object back into memory by unpickling
- I use all the time to pickle results generate on servers to be analyzed locally
- PicklingTools library - look for "Everything you wanted to know about pickling but were afraid to ask"
- STDIN/STDOUT/STDERR
- take result/output from another executable and using it in program
- maybe NCBI C++ toolkit integration??
- programmatically do internet queries via urllib or other
- glob module
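The pickle round trip described above, sketched in Python 3. An in-memory io.BytesIO buffer stands in for a file opened in binary mode, and the results dict is made up.

```python
import io
import pickle

results = {"sample": "A1", "scores": [0.5, 0.75], "passed": True}

buffer = io.BytesIO()                 # stands in for open(path, "wb")
pickle.dump(results, buffer)          # dump the object as binary data

buffer.seek(0)
restored = pickle.load(buffer)        # resurrect an equal copy of the object
print(restored == results, restored is results)
```

Only unpickle data from a source you trust, since loading a pickle can execute arbitrary code.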
Bioinformatics with python
Simple Web Applications
- cgi-scripts
GUI programming
- python tkinter
- PyQt4
Software development techniques (optional class)
UnitTests
- test-driven development
- writing unit tests
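A minimal unit test using the standard unittest module. The function under test, reverse_complement, is made up for this sketch.

```python
import unittest

def reverse_complement(seq):
    """Function under test (made up for this sketch)."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))

class ReverseComplementTest(unittest.TestCase):
    def test_known_value(self):
        self.assertEqual(reverse_complement("ATGC"), "GCAT")

    def test_bad_letter_raises(self):
        with self.assertRaises(KeyError):
            reverse_complement("ATGX")

# Run the tests programmatically so the example works when pasted anywhere.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ReverseComplementTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

In test-driven development the two test methods would be written first, before reverse_complement exists, and the function is then written to make them pass.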
Using TimeIt
- timeit - documentation
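For illustration, timing a statement from within a script with the timeit module. The statement and setup strings are arbitrary examples.

```python
import timeit

# The setup code runs once; the statement is what gets timed, 1000 times.
setup = "data = list(range(1000))"
seconds = timeit.timeit("sorted(data)", setup=setup, number=1000)
print("{0:.4f} seconds for 1000 runs".format(seconds))
```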
Profiling
- The Python Profilers - documentation
import cProfile, pstats

# this fragment lives inside a method, hence self.pr
self.pr = cProfile.Profile()
self.pr.enable()
print( "\n<<<---" )
# Do stuff
self.pr.disable()
p = pstats.Stats( self.pr )
p.strip_dirs()
p.sort_stats( 'cumtime' )
p.print_stats()
print( "\n--->>>" )
Debugger
- I think debugger should actually go in this lecture
In-code Documentation Extraction
- idea: don't separate code and documentation
pydoc -p 1234
- doxygen - need doxypy to reformat docstrings so that doxygen's special commands are supported.
- sphinx
- doxygen to sphinx
Other
- Pylint - Analyze source code looking for bugs
- Pyreverse - autogenerate UML diagrams from source code
- revision control = GIT
- what is a race condition?
- virtualenv to debug installation? activate by running source bin/activate. Can use --system-site-packages.
- pip/setuptools/distribute, pip install from github
Bash Shell (optional class)
Introduction
- shot of scotty talking into the mouse
- you talking directly to the computer!
- the computer talking directly back to you
- clip of "It's a Unix System" scene in Jurassic Park
- longer clip
Motivation
- Example: Find in files in Microsoft Word - even programmers do it this way! better tool is grep.
- grep is the poster child for why you would use Unix
- iterate over files quickly
- Log in to remote servers and execute commands - connect to and use a server the way that you'd use a computer that was physically sitting in front of you.
- Access to more powerful computers - nowadays transferring big data over the internet takes a long time, and putting data onto hard drives and FedExing them is a PITA unless you do it often. Rather than transfer the data to you, you go to the data.
- anything you can do with a windowing system, I can do better in a terminal
- browsing google with lynx
Installation
- Macs and Linux: command line built in: Terminal app
- Windows - download Cygwin, or virtual machine "OS inside another OS" usually small performance penalty.
General Concepts
- Spaces, quotes, escape characters
- no spaces in file names please
- there's no undelete!
- file systems can be mounted in ram
- backticks!
- pipes | > < << >>
- environment variables - change your environment!
- $PATH - like an onion
- which asks if an executable is in the path
- install two or more versions of a program side by side and the one that wins is the one that appears first in the $PATH
unix filesystem
- one unified system, contrast with windows
- all devices hang off one tree
- use the mount command to see what devices are mounted where
- go through the usual directories
- home, /usr/bin, installed goes /usr/local/bin
- root dir
- root privileges
- do a symbolic link to build your own local directory tree
- ls thing.{pdf,txt}
privilege system
- sudo
- 755
More Concepts
- one command at a time
- The idea of . and ..
- read and write permissions
- ls
- dotfiles = hidden. need ls -A
- ls -lhS = sort in order of size
- ls -ltr = sort such that the most recently modified files show up at the bottom
- PATH
- intricately related with .bashrc and .bash_profile
- OS looks for executable files in order that they appear in the PATH list
- separated by :
- use the which command to look for an executable in the path/identify which one will in fact run if there are multiple
- running a script (e.g., using "./")
- Positional arguments vs keyword arguments
- using the manual
- pipes
- redirecting output to a file
- redirect output to another program
- the bit bucket
- redirect yes pump to /dev/null
- redirect stderr to stdout
- pipe yes pump to stdin when asked to delete read-only files
- home directory - blending of bash and UNIX
- .____rc files
- process IDs and kill
- user - who, superuser and sudo
- Unix File Permissions
Moving around in the shell for the first time
- You've woken up and suddenly there's a terminal in front of you. It's very strange. Let me be your guide.
- Prompt
- Scrolling
- tab complete
- up and down arrows
- history
- rerun a command using the bang number operator
- the meta characters to go all the way forward back, by one word, etc
Commands
- whoami
- where am i? - pwd
- what did i just do? - history
- what's in here? - ls
- what files are in there?
- sort by time of creation
- human readable file sizes
- cd change directory
- make a directory mkdir
- delete a file - be careful - no undos!!
- delete a directory
- how big is this folder? du
- how much space do I have on my hard disk?
- links, symbolic and hard
- making and extracting tarballs and using wget to download them
Viewing the contents of files
- cat - tac
- more/less
- universal navigation for less, vim, man
- G, gg, ctrl-f, ctrl-b, q, search with /, n, N
Edit a file
- nano
- pico
- vim
grep
- the double grep
grep similarity_matrix *.cpp | grep new
Advanced Concepts
- Iterating over files
- for
- find .. -exec .. {} \;
- grep | xargs ..
- using basename/dirname to slice and dice filenames
- using regular expression to extract parts of filename
- variables
- $( evaluate in here )
- arithmetic expressions
- saving output of backticks into variables
- package managers - requires developer tools on mac
- need a compiler (gcc)
- MacOS: MacPorts > homebrew
- Ubuntu (Debian): Advanced Packaging Tool (apt-get)
- CentOS (Red Hat): Yellowdog Updater, Modified (yum)
tips
- don't use spaces in filenames!
- tab completion
- change your .bashrc to make aliases and change prompt
Vocabulary Terms
- expression - one statement/command/line
- shell/command line/interactive mode
- variable/instance attribute
- class
- methods/functions
- arguments
- variadic
- keyword argument
- default
- pseudocode
- recursion
- syntax error
- exceptions are "raised" (sometimes I say "thrown")
- object oriented/encapsulation
- scope: global vs. local vs. nested
- no way for outer to assign to inner
- operator
- escape character
- concatenation
- API
- iterable
- talk about vocabulary overlap
- syntactic sugar = a feature of the language syntax that makes the code more readable (e.g. list comprehension )