BIOF 309 Curriculum

From Colettapedia

Changes for 2015

  1. Use Python3
  2. Magic number becomes student number which students can refer to
  3. Google API
    1. Get statistics for who uses class email list
    2. Download homeworks, grade them, and output an html saying what their grade was
  4. Autograder for girlfriend guess hw:
    1. ReturnWordsInWalden: Check type of return, number of entries, lowercase-ize
    2. Have another function that checks intersection list?
    3. Change objective to focus on finding gf's name more?
  5. NO TABS ANYWHERE!!!
  6. Critical to have notebooks so people don't have to type along
  7. Have an algorithms section that talks about sorted()
  8. Explicitly cover if __name__ == '__main__': - The __main__ namespace, builtins namespace (__builtin__ in Python 2)
  9. __file__, __doc__
  10. Implement some sort of quiz to happen often, perhaps as a class exercise, via Moodle installation
  11. More coverage on abstract methods/abstract properties vs. concrete methods/properties on ABCs
  12. Proper introduction of the use of git/github as a tool
  13. Autograder hw4 check return types!!!!!! Plus better doc strings as to what should be returned.
  14. Use the slider to vary a parameter in IPython, and other tricks
  15. buffer = a temporary storage unit/zone where you put things. size and contents may change rapidly. ("flush the buffer")
  16. docstring notation ->returns, function signature, etc.
  17. yield in the middle of a function, the finer points of yield
  18. clarify what it means to pass if you're an auditor

IPython

four most helpful commands

  •  ? - intro and overview
  • %quickref
  • help
  • object?

Features

  • tab completion
  • explore objects
  • magic functions
  • %run any python script and load results into interactive namespace
  • importable as a package IPython
  • from command line ipython --help | less
  • history
  • execute shell commands by prefacing it with !
    • capture the output in a python variable with double !!
  • reload a module - useful for making changes
  • system for caching input/output

Magic command system

  • %lsmagic - see all magic commands
  • %somemagicfunction? - see the help for the magic function
  • single % = magic function
  • double %% = cell magic
  • enable automagic - don't have to type the single %
  • define your own magics
  1. %magic
  2. %cd
  3. %timeit statement - time a single statement
    1. %%timeit setup code - cell magic: the first line is setup code; the subsequent lines are what is timed
  4. %pdoc - print docstring
  5. %pdef - print call signature
  6. %env - show environment variables
  7. %load file - load a Python script as if it were typed into the IPython terminal
  8. %pastebin -d "description" line numbers: Upload code to Github's Gist paste bin, returning the URL.
  9. %prun - python code profiler
  10. %recall - more like recall as in re-call than as in remember
tips

  • a trailing semicolon suppresses the display of an expression's result (the Out[] line); it does not silence print

Course Units

Read files, string manipulation and Design Patterns(class 4)

Design Patterns

  • Why study design patterns? It's a cookbook for programmers!
  • loop forever
  • search loop
  • dictionary iteration
  • file iteration
  • numbering iterations
  • repeat
  • do
  • collect
  • Combine
  • count
  • Collection Combine
  • Search
  • Filtered Do
  • Filtered Collect
  • Filtered Collect Groups of lines
  • Filtered Combine
  • Filtered Count
  • Nested Iteration
  • Recursive Tree Iteration

Algorithms

  • Big-O notation
  • quicksort

Homework

  • read in this file and output the same file sorted by state then city.

Functions, Generators, and Debugging (screencast 2)

functions

  • eliminate repetitiveness
  • global vs local variables - name bound to a value
  • namespace - a mapping from all the names that are currently being used to the objects they refer to
    • A scope is a textual region of a Python program where a namespace is directly accessible. “Directly accessible” here means that an unqualified reference to a name attempts to find the name in the namespace.
    • you get a new scope after every def, but not ifs, for loops, etc
    • from inside an inner scope you can "read" variables of the enclosing scopes, you just can't rebind that name to a new value or object (unless you declare it global or nonlocal)
      • but! you can add additional items to a dict via inner scope.
  • values generated are not retained between calls!
  • what are the side effects of running the function, e.g., print statements.
  • return statement
    • allowed to have multiple returns in a function, but once it's hit, function exits immediately!
      • Can use this to break out of loop forever patterns in a way similar to break.
    • Can only return a single object, but that single object can be a container that contains multiple things
  • arguments
  • default arguments
  • keyword args come only after positional args
  • recursive
  • the return statement
  • the "backtrace" or call stack
  • pass
    • do nothing
    • how you tell python you want an empty block
    • usually a placeholder when you haven't decided what to put there
  • comments and docstrings
  • comments disappear when python interprets code, but docstrings remain, accessible via help()
    • docstring conventions: if longer than one line, start with a one-line summary, then a blank line, then the details. end with a blank line
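
The scope rules above can be sketched as follows. This is a minimal illustration, not course material; demo_scopes and the names inside it are hypothetical, and the example uses Python 3's nonlocal keyword.

```python
def demo_scopes():
    """Show reading, mutating, and rebinding names from an inner scope."""
    total = 0
    seen = {}

    def read_outer():
        # Reading a name from the enclosing scope needs no declaration.
        return total + 1

    def mutate_outer():
        # No rebinding here: we change the dict that the name refers to.
        seen["touched"] = True

    def rebind_without_declaration():
        # The assignment makes `total` local to this function, so the read
        # on the right-hand side happens before the local name has a value.
        total = total + 1
        return total

    def rebind_with_nonlocal():
        nonlocal total          # explicitly ask to rebind the enclosing name
        total = total + 1
        return total

    results = {"read": read_outer()}
    mutate_outer()
    results["seen"] = dict(seen)
    try:
        rebind_without_declaration()
        results["rebind_error"] = None
    except UnboundLocalError:
        results["rebind_error"] = "UnboundLocalError"
    results["nonlocal"] = rebind_with_nonlocal()
    return results


print(demo_scopes())
```

Each def creates a new scope; the if and try blocks inside demo_scopes do not.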

Assertions

  • Best Practice For Python Assert
  • Asserts should be used to test conditions that should never happen. The purpose is to crash early in the case of a corrupt program state.
  • Exceptions should be used for errors that can conceivably happen, and you should almost always create your own Exception classes.
    • For example, if you're writing a function to read from a configuration file into a dict, improper formatting in the file should raise a ConfigurationSyntaxError, while you can assert that you're not about to return None.
  • Rule of thumb: if a value x is set via a user interface or comes from an external source, an exception is best.
  • If x is only set by your own code in the same program, go with an assertion.
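
A minimal sketch of the configuration-file example above. ConfigurationSyntaxError is the name used in the notes; parse_config and the 'key = value' file format are assumptions made up for illustration.

```python
class ConfigurationSyntaxError(Exception):
    """Raised when a line of the configuration text is malformed."""


def parse_config(text):
    """Parse 'key = value' lines into a dict, skipping blanks and # comments."""
    settings = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "=" not in line:
            # Bad input from outside can conceivably happen: raise an exception.
            raise ConfigurationSyntaxError(
                "line {0}: expected 'key = value', got {1!r}".format(line_number, line))
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    # This can only fail if the code above is wrong: assert it.
    assert settings is not None
    return settings


print(parse_config("name = BIOF 309\n# a comment\nroom=B1"))
```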

Function decorators

import functools
import sys

def output_railroad_switch( method_that_prints_output ):
    """This is a decorator that optionally lets the user specify a file to which to redirect
    STDOUT. To use, you must use the keyword argument "output_filepath" and optionally
    the keyword argument "mode" """

    @functools.wraps( method_that_prints_output )
    def print_method_wrapper( *args, **kwargs ):
        # Without "output_filepath", call the function unchanged.
        if "output_filepath" not in kwargs:
            return method_that_prints_output( *args, **kwargs )
        # Remove the decorator-only keywords before calling the wrapped function.
        output_filepath = kwargs.pop( "output_filepath" )
        mode = kwargs.pop( "mode", 'w' )
        print( 'Saving output of function "{0}()" to file "{1}", mode "{2}"'.format(
               method_that_prints_output.__name__, output_filepath, mode ) )
        backup = sys.stdout
        sys.stdout = open( output_filepath, mode )
        try:
            return method_that_prints_output( *args, **kwargs )
        finally:
            # Always restore STDOUT, even if the wrapped function raises.
            sys.stdout.close()
            sys.stdout = backup

    return print_method_wrapper
  • example: define a decorator which calls the decorated function 10 times
  • need example of decorator with an argument, see here at 35:27
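
One possible shape for both examples requested above: a decorator that takes an argument and calls the decorated function that many times. The names repeat and greet are hypothetical.

```python
import functools


def repeat(times):
    """Decorator factory: returns a decorator that calls the function `times` times."""
    def decorator(func):
        @functools.wraps(func)            # keep func's __name__ and docstring
        def wrapper(*args, **kwargs):
            # Collect the return value of every call in a list.
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorator


@repeat(10)
def greet(name):
    return "hello " + name


print(len(greet("class")))
```

The extra level of nesting is the point: repeat(10) runs first and returns the actual decorator, which is then applied to greet.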

Anonymous functions

  • lambda a, b: blah blah
    • more about these when talking about sorting

Iterators

  • not rewindable, reversible, copyable
  • value of it changes every time you use it
    • e.g., it = iter(range(10)); list(zip(it, it)) yields [(0, 1), (2, 3), ... (8, 9)]

Generators

  • a generator is an easy way to create an iterator
  • generates each item on the fly
  • "A generator can take the place of a list when the list is so long, and/or its values are so big that creating the entire list before processing its elements would use enormous amounts of memory"
  • yield statement
    • "A function that uses a yield statement in place of return generates a new generator object each time it's called. The generator object encapsulates the bindings and code each time it's called, keeping them together for as long as the generator is in use"
  • next( generator[, default])
  • values inside are RETAINED between calls
  • no way to access individual element, can only call next
  • can call set(), list() or tuple() on a generator to get one, but don't do it on an infinite generator!
    • reverse operation, calling iter( ...) on a list
  • itertools.cycle( list ) - repeating sequence over and over.
  • Coroutines - "allow multiple entry points for suspending and resuming execution at certain locations."
    • yield becomes an expression that returns stuff from the outside, passed in when caller uses generator.send( stuff )
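
The generator points above can be sketched as follows. The functions countdown and running_total are hypothetical examples, not part of any library.

```python
import itertools


def countdown(start):
    """Yield start, start-1, ..., 1; local state is retained between next() calls."""
    while start > 0:
        yield start
        start -= 1


def running_total():
    """Coroutine-style generator: every value sent in is added to the total."""
    total = 0
    while True:
        received = yield total        # yield is an expression here
        if received is not None:
            total += received


gen = countdown(3)
first = next(gen)                     # runs the body up to the first yield
rest = list(gen)                      # consumes whatever is left
after_end = next(gen, "done")         # default is returned instead of StopIteration

acc = running_total()
next(acc)                             # advance to the first yield before sending
acc.send(5)
total = acc.send(10)

# Never call list() on an infinite generator; take a finite slice instead.
colors = list(itertools.islice(itertools.cycle(["red", "green"]), 5))

print(first, rest, after_end, total, colors)
```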

Python Debugger

  • comment things out
  • use print statements
  • insert a breakpoint using import pdb; pdb.set_trace()
  • commands
  • backtrace
  • post mortem debugging: python -m pdb your_script.py

Slice, dice, Combine and Sort (class 5)

Goal

  • Do a screen cast for this
  • this is the lecture you talk about list comprehensions, zipping and unzipping tuples, and sorting stuff

String Format

  • Build a format string ahead of time and call .format on it while iterating
  • .format() minilanguage
  • sprintf operations, significant digits - give the place holders names, format after the colon
  • * operator turns tuple into indiv values for format arguments.
  • joins and split, and strip
  • unpack a split into multiple variables
  • print( "{num:>12.3f} and without pad {num:0.3f}".format( num=num ) )
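
A short sketch of the formatting ideas above: a format string built ahead of time, named placeholders with a format spec after the colon, * unpacking, and split/strip/join. The records are made-up data.

```python
row_format = "{name:<8}|{score:>8.2f}|"      # build the format string once

records = ["alice 93.456", "bob 7.5"]        # made-up input lines
lines = []
for record in records:
    name, score = record.split()             # unpack a split into two variables
    lines.append(row_format.format(name=name, score=float(score)))

point = (3, 4)
label = "x={0}, y={1}".format(*point)        # * turns the tuple into two arguments

joined = ", ".join(s.strip() for s in ["  a ", "b  "])

for line in lines:
    print(line)
print(label)
print(joined)
```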

file IO

  • open(), close()
  • with keyword
  • file modes: read ("r"), write ("w"), append ("a")
  • ways of reading files
    • line by line iteration
    • slurp entire text into one variable
    • chomp (strip the trailing newline, e.g. line.rstrip("\n"))
  • blocking operations
  • Accessing memory mapped files is faster than using direct read and write operations for two reasons. Firstly, a system call is orders of magnitude slower than a simple change to a program's local memory. Secondly, in most operating systems the memory region mapped actually is the kernel's page cache (file cache), meaning that no copies need to be created in user space - from memory-mapped file
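
A minimal sketch of the file-handling patterns listed above, using a scratch file in a temporary directory. The file name and its contents are made up for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "cities.txt")    # hypothetical scratch file

with open(path, "w") as handle:          # "w" creates or truncates the file
    handle.write("Bethesda\nRockville\n")

with open(path, "a") as handle:          # "a" appends to the end
    handle.write("Baltimore\n")

with open(path) as handle:               # default mode is "r"
    # Line-by-line iteration, stripping the trailing newline from each line.
    cities = [line.rstrip("\n") for line in handle]

with open(path) as handle:
    whole_text = handle.read()           # slurp the entire file into one string

print(cities)
```

The with keyword closes the file when the block ends, even if an exception is raised inside it.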

Comprehension

  • available for lists, sets and dicts

Conditional Comprehensions

  • [expression for element in collection if test ]
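
The three kinds of comprehension, plus the conditional form, on a made-up word list:

```python
words = ["Walden", "pond", "Pond", "cabin", "woods"]

lengths = [len(w) for w in words]                 # list comprehension
unique_lower = {w.lower() for w in words}         # set comprehension
length_by_word = {w: len(w) for w in words}       # dict comprehension
long_words = [w for w in words if len(w) > 4]     # conditional comprehension

print(lengths)
print(sorted(unique_lower))
print(long_words)
```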

zip and unzip

  • zip() and zip(*...) to unzip
  • itertools.islice
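
A small sketch of zipping, unzipping, and slicing an iterator, assuming Python 3 (where zip returns an iterator rather than a list). The data is made up.

```python
import itertools

cities = ["Bethesda", "Austin", "Boston"]
states = ["MD", "TX", "MA"]

pairs = list(zip(cities, states))                  # zip two sequences into tuples
unzipped_cities, unzipped_states = zip(*pairs)     # zip(*...) reverses the operation

# islice takes a slice of any iterator without building the whole list first.
first_two = list(itertools.islice(zip(cities, states), 2))

print(pairs)
print(unzipped_cities, unzipped_states)
print(first_two)
```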

Anonymous functions

  • lambda a, b: blah blah
    • more about these when talking about sorting

Sort stuff

  • can provide key sorted( ..., key = lambda x: ... )
  • full discussion of cmp(a,b), which returns -1, 0, or 1 (Python 2 only; in Python 3 use functools.cmp_to_key)
  • can define your own lambda
  • perhaps use
  • sort by column a, then column b:
    • sort_func = lambda A, B: cmp( A[0], B[0] ) if A[0] != B[0] else cmp( A[1], B[1] )
  • reverse = True
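
A sketch of sorting by column a, then column b, assuming Python 3, where the built-in cmp and the cmp= argument no longer exist. The cmp function below is a hand-written stand-in and the rows are made-up data.

```python
from functools import cmp_to_key

rows = [("MD", "Rockville"), ("CA", "Fresno"), ("MD", "Bethesda")]

# A key function returning a tuple sorts by the first item, then the second.
by_state_then_city = sorted(rows, key=lambda row: (row[0], row[1]))

# reverse=True flips the order.
by_city_descending = sorted(rows, key=lambda row: row[1], reverse=True)


def cmp(a, b):
    """Stand-in for the Python 2 built-in: returns -1, 0, or 1."""
    return (a > b) - (a < b)


# The comparison-style function from the notes, adapted with cmp_to_key.
sort_func = lambda A, B: cmp(A[0], B[0]) if A[0] != B[0] else cmp(A[1], B[1])
same_result = sorted(rows, key=cmp_to_key(sort_func))

print(by_state_then_city)
```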

Homework

  • make dict of DNA/RNA codon table

Regular Expressions (class 7)

  • Pattern Matching
  • greedy vs. non-greedy
  • grouping
  • substitution
  • beginning of line, end of line
  • backreference
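
A few one-liners with the standard re module covering the topics above. The sample strings are made up.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

greedy = re.search(r"<.+>", text).group()        # .+ grabs as much as it can
non_greedy = re.search(r"<.+?>", text).group()   # .+? stops at the first ">"

# Grouping, anchored to the beginning (^) and end ($) of the string.
match = re.search(r"^(\w+)-(\d+)$", "chr-21")
name, number = match.groups()

# Substitution; \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Thoreau, Henry")

# Backreference inside the pattern: find doubled words.
doubled = re.findall(r"\b(\w+) \1\b", "it was the the best of of times")

print(greedy)
print(non_greedy)
print(name, number, swapped, doubled)
```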

Classes and Object-Oriented Programming (class 8)

vocab

  1. CS terms
    1. instance/instantiate
    2. object
    3. data field
    4. process/function/subroutine
    5. class
    6. encapsulation
    7. inheritance
    8. implement
  2. python names for cs concepts
    1. object
    2. member
    3. method
    4. attributes
    5. class
    6. instance attribute
    7. class attribute

Philosophical underpinnings

  • Simple collections are good, but nested collections can be a pain
  • Comparison of programming paradigms
  • Key concept: members and attributes used by the public, vs member and attributes that are meant for internal use only
    • by convention members starting with underscore not supposed to be used by the public
  • python convention: don't use variables/functions that start with _, i.e., those functions aren't part of the interface
  • <quote>In Python, class is a synonym for type and object is a synonym for value. The period in str.count tells python to look for the named method in class str... The statement "every value has a type" can be rephrased as "every object is an instance of a class." Classes and instances form the basis of object-oriented programming.</quote>
  • data model
  • interface - information hiding - "You don't need to know that!"
    • abstraction - hide away implementation details
    • public vs. private
  • distinction between "In python everything is an object" in the conceptual sense, vs. "all objects inherit from object"
  • stencil metaphor - a template.

Defining classes

  • the anatomy of a class definition
  • inherit from object
    • in python 2.x you have to explicitly inherit from object, in python 3.x all user-defined classes implicitly inherit from object
    • this is part of the cruft that was removed in python 3
  • "attributes"
  • functions become methods
    • what's the difference between functions and methods? self is passed as the first argument

The simplest class

  • class MyNewClass( object ): pass
  • docstring

python magic attributes

  • what's all the special __stuff__ in an object when you call dir( obj )?
  • duck typing - Special Method Names
from functools import reduce   # reduce is no longer a built-in in Python 3

class thing( object ):

  def __init__( self, value ):
    self.val = value

  def AddMeWithAnotherThing( self, other ):
    return self.__class__( self.val + other.val )

  def __add__( self, other ):
    # Called by Python for: thing + thing
    return self.AddMeWithAnotherThing( other )

  def __repr__(self):
    return "thing(" + str(self.val) + ")"

things = map( thing, range(20) )

print( reduce( lambda x, y: x+y, things ) )
  • Inheritance (OOP) vs. emulation (implement the functions that make an object behave like another)

python object hooks

  • (slide stolen from a googletechtalk)
  • defines how your object behaves
  • how to make your classes act like built-in python classes such as int, str, etc.
  • The methods that get called by python automatically in certain situations
    • By implementing these methods, it signals to python that they need to be called for that situation
  • Don't ever call these under-under methods directly
  • accessing attributes
    • __getattr__
  • constructor, initializer, finalizer
    • __new__, __init__, __del__
      • new vs. init allows for immutable types
  • the usual operators
    • +, -, *, /, **, >, <, ==, +=, etc
      • supporting sorting __lt__
  • indexing
  • calling like a function
    • __call__
  • with context
    • __enter__ and __exit__
  • iteration, truthvalue tests, containment tests
  • for item in obj:, if obj:, if item in obj:
    • __iter__
    • __next__
  • conversion
    • str( obj ), int( obj )
      • __str__, __repr__, __int__, __float__(self)
  • __contains__ for keyword in
    def __getitem__(self, index):                 # Seq API requirement
        """Returns a subsequence of single letter, use my_seq[index]."""
        #Note since Python 2.0, __getslice__ is deprecated
        #and __getitem__ is used instead.
        #See http://docs.python.org/ref/sequence-methods.html
        if isinstance(index, int):
            #Return a single letter as a string
            return self._data[index]
        else:
            #Return the (sub)sequence as another Seq object
            return Seq(self._data[index], self.alphabet)
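
A sketch of several hooks on one hypothetical container class, with a comment on each method naming the situation in which Python calls it. Playlist is made up for illustration.

```python
class Playlist(object):
    """Hypothetical container showing which hook Python calls in each situation."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):                  # len(obj); also the truth test if no __bool__
        return len(self._songs)

    def __iter__(self):                 # for item in obj:
        return iter(self._songs)

    def __contains__(self, song):       # if item in obj:
        return song in self._songs

    def __getitem__(self, index):       # obj[index]
        return self._songs[index]

    def __call__(self, prefix):         # obj(...)
        return [s for s in self._songs if s.startswith(prefix)]

    def __enter__(self):                # with obj as name:
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._songs = []                # clean up when the with block ends
        return False                    # do not swallow exceptions


with Playlist(["alpha", "beta", "apple"]) as playlist:
    print(len(playlist), "beta" in playlist, playlist[0], playlist("a"))
print(bool(playlist))
```

Note that the code never calls an under-under method directly; it uses len(), in, indexing, and with, and Python dispatches to the hooks.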

special attributes

  • Consult datamodel documentation for reference
  • instance attributes (per-object data) vs. class attribute
  • __dict__ - accessible via vars() command, get dict containing instance attribute names, values, can set them via dict notation instead of dot notation.
  • __class__
  • class attributes are fetchable through the instances
    • what that means is if you
    • e.g., keeping track of all instances

class attribute naming convention

  • CamelCaseNamingStyle vs pot_hole_naming_style
  • _single_leading_underscore: weak "internal use" indicator.
  • single_trailing_underscore_: used by convention to avoid conflicts with Python keyword (e.g., class_)
  • __double_leading_underscore: when naming a class attribute, invokes name mangling (inside class FooBar, __boo becomes _FooBar__boo; )
  • __double_leading_and_trailing_underscore__: "magic" objects or attributes that live in user-controlled namespaces. E.g. __init__, __import__ or __file__. Never invent such names; only use them as documented.

Using classes in your code

  • access methods and attributes using the dot

Subclassing

    • one way to define how your class behaves is through implementing magic methods, the other is to inherit from another class
  • Inheritance
    • reasons: factor out common code
  • subclass
    • you can subclass built in types, e.g., subclass list type to return evens() and odds()
  • isa() vs hasa()
    • isinstance() vs. type()
  • overloaded
  • multiple inheritance
    • many frown upon this
  • using super()
  • help() will show what methods are defined where
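
A minimal sketch of subclassing a built-in type, following the evens() and odds() idea above. NumberList is a hypothetical name, and the type check in append is only there to show overriding with super().

```python
class NumberList(list):
    """A list that knows how to pick out its even and odd members."""

    def evens(self):
        return [n for n in self if n % 2 == 0]

    def odds(self):
        return [n for n in self if n % 2 != 0]

    def append(self, item):
        # Override a built-in method, then delegate to list.append via super().
        if not isinstance(item, int):
            raise TypeError("NumberList only holds ints")
        super().append(item)


numbers = NumberList(range(6))
numbers.append(6)

print(numbers.evens(), numbers.odds())
print(isinstance(numbers, list), type(numbers) is list)
```

isinstance() answers the is-a question and accepts subclasses; comparing type() directly does not.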

Extend Python built-in types via inheritance

Abstract Base Class (interface)

  • concrete vs abstract
  • design by contract - interface
    • area function for square, circle
    • base class animal, subclass dog cat, virtual method talk.
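
The area example above, sketched with the standard abc module (Python 3.4 or later for abc.ABC). The class names are illustrative.

```python
import abc
import math


class Shape(abc.ABC):
    """Abstract base class: the contract says every shape has an area."""

    @abc.abstractmethod
    def area(self):
        """Return the area of the shape."""

    def describe(self):
        # Concrete method: written once, relies on the abstract one.
        return "{0} with area {1:.2f}".format(type(self).__name__, self.area())


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


for shape in [Square(2), Circle(1)]:
    print(shape.describe())
```

Trying to instantiate Shape itself, or a subclass that does not implement area, raises TypeError.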

@classmethod and @staticmethod

  • @classmethod passes the class type of the object it was called on as the first argument instead of the instance it was called on
    • NewFrom...
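
A sketch of the alternate-constructor pattern hinted at above. All names here (Sequence, new_from_fasta_text, is_valid_letter) are hypothetical, and the FASTA handling is deliberately simplified.

```python
class Sequence(object):
    def __init__(self, letters):
        self.letters = letters

    @classmethod
    def new_from_fasta_text(cls, text):
        """Alternate constructor: cls is the class the method was called on."""
        lines = text.strip().splitlines()
        return cls("".join(lines[1:]))        # skip the ">" header line

    @staticmethod
    def is_valid_letter(letter):
        """No self and no cls: just a function that lives in the class."""
        return letter in "ACGT"


class DnaSequence(Sequence):
    pass


dna = DnaSequence.new_from_fasta_text(">example\nAC\nGT")
print(type(dna).__name__, dna.letters)
```

Because the class is passed in as cls, calling the constructor on a subclass builds an instance of that subclass without any extra code.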

Python descriptor

  • __get__(), __set__(), and __delete__()
  • property( fget, fset, fdel )
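
A minimal sketch of the built-in property(), which is itself implemented with the descriptor protocol. The Temperature class and its validation rule are made up for illustration.

```python
class Temperature(object):
    def __init__(self, celsius):
        self.celsius = celsius              # goes through the setter below

    def _get_celsius(self):
        return self._celsius

    def _set_celsius(self, value):
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = value

    def _del_celsius(self):
        del self._celsius

    # property() bundles the three functions into one managed attribute.
    celsius = property(_get_celsius, _set_celsius, _del_celsius)


t = Temperature(20)
t.celsius = 25                              # calls _set_celsius
print(t.celsius)                            # calls _get_celsius
```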

Metaclasses

  • the class of a class
  • the type that a class is
  • it's what implements a class
  • in python the base metaclass is type
  • the class that creates the class
  • the type of object is type

Algorithms

Data Viz (Class 9)

Key Concepts

  • raster vs SVG
  • HTML
  • CSS

Static plots

Web-based highly interactive graphics

  • Publishable? definitely
  1. HTML5 Canvas element - Kinda like logo. or Paint. For a web browser.
  2. D3.js - Represent data using native Web browser technology - bind data in the HTML document to the representation in the canvas element.
  3. Vega - A visualization grammar
  4. Vincent - A Python to Vega translator
  5. Pandas

Statistics techniques (class 10)

  • statistics module in the standard library, new since python 3.4
  • randomizing and taking statistics
  • doing a thousand simulations
  • normal distribution
  • t-test - rPy
  • p value
  • pandas
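
A sketch of "doing a thousand simulations" with only the standard library. The function name, the sample size of 30, and the seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(309)    # arbitrary fixed seed so the run is repeatable


def simulate_sample_means(n_simulations=1000, sample_size=30, mu=0.0, sigma=1.0):
    """Draw many samples from a normal distribution and return each sample's mean."""
    means = []
    for _ in range(n_simulations):
        sample = [random.gauss(mu, sigma) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return means


means = simulate_sample_means()

# The spread of the sample means should be close to sigma / sqrt(sample_size).
spread = statistics.stdev(means)
print(len(means), round(statistics.mean(means), 3), round(spread, 3))
```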

Machine Learning (Class 11)

Possible optional classes

Packaging

  • help('modules')
  • Modules - section on layout of a package
  • __init__.py - import selected Classes, functions into package level
  • __all__ = ['submodule1', 'submodule2'] - corresponds with client usage from package import *
  • site-packages - destination for third-party packages
  • easy_install - a tool bundled with setuptools that lets you automatically download, build, install, and manage Python packages.
  • setuptools packager
  • Getting started with setuptools and setup.py
  • PyPI
  • pip
  • egg - Same concept as a .jar file in Java, it is a .zip file with some metadata files renamed .egg, for distributing code as bundles.

$ file /Users/chris/scratch_python27_dir/lib/python2.7/site-packages/ColettaModule-0.1-py2.7-macosx-10.9-x86_64.egg
/Users/chris/scratch_python27_dir/lib/python2.7/site-packages/ColettaModule-0.1-py2.7-macosx-10.9-x86_64.egg: Zip archive data, at least v2.0 to extract

$ unzip /Users/chris/scratch_python27_dir/lib/python2.7/site-packages/ColettaModule-0.1-py2.7-macosx-10.9-x86_64.egg
Archive:  /Users/chris/scratch_python27_dir/lib/python2.7/site-packages/ColettaModule-0.1-py2.7-macosx-10.9-x86_64.egg
  inflating: _ColettaModule.py       
  inflating: _ColettaModule.pyc      
  inflating: _ColettaModule.so       
  inflating: EGG-INFO/dependency_links.txt  
  inflating: EGG-INFO/native_libs.txt  
  inflating: EGG-INFO/PKG-INFO       
  inflating: EGG-INFO/requires.txt   
  inflating: EGG-INFO/SOURCES.txt    
  inflating: EGG-INFO/top_level.txt  
  inflating: EGG-INFO/zip-safe

Utilities

  • goal: use python as glue for your other programs
  • shutil
  • make directories
  • parsing command line arguments
  • assertion
  • advanced datetime
  • pickle
    • you take an object as it exists in RAM, like an instance of a class, and you dump that object bit for bit to a file
    • The pickle is binary, not text, not human readable, that's the point! Meant to be quick, concise, small.
    • Resurrect the object back into memory by unpickling
    • I use it all the time to pickle results generated on servers to be analyzed locally
    • PicklingTools library - look for "Everything you wanted to know about pickling but were afraid to ask"
  • STDIN/STDOUT/STDERR
  • take result/output from another executable and using it in program
    • maybe NCBI C++ toolkit integration??
  • programmatically do internet queries via urllib or other
  • glob module
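
A minimal pickle round trip, as described in the list above. The results dict and the file name are made up; the file is written to a temporary directory.

```python
import os
import pickle
import tempfile

results = {"sample": "A1", "scores": [0.91, 0.87]}        # made-up results

path = os.path.join(tempfile.mkdtemp(), "results.pickle")

with open(path, "wb") as handle:        # binary mode: a pickle is not text
    pickle.dump(results, handle)

with open(path, "rb") as handle:
    restored = pickle.load(handle)      # resurrect the object back into memory

print(restored == results, restored is results)
```

Only unpickle files you trust: loading a pickle can execute arbitrary code.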

Bioinformatics with python

Simple Web Applications

  • cgi-scripts

GUI programming

Software development techniques (optional class)

UnitTests

  • test-driven development
    • writing unit tests

Using TimeIt

Profiling

    import cProfile, pstats
    pr = cProfile.Profile()
    pr.enable()
    print( "\n<<<---" )
    # Do stuff
    pr.disable()                  # stop collecting before reporting
    p = pstats.Stats( pr )
    p.strip_dirs()
    p.sort_stats( 'cumtime' )     # sort by cumulative time
    p.print_stats()
    print( "\n--->>>" )

Debugger

  • I think debugger should actually go in this lecture

In-code Documentation Extraction

  • idea: don't separate code and documentation
  • pydoc -p 1234
  • doxygen - need doxypy to reformat docstrings so that doxygen's special commands are supported.
  • sphinx
  • doxygen to sphinx

Other

  • Pylint - Analyze source code looking for bugs
    • Pyreverse - autogenerate UML diagrams from source code
  • revision control = GIT
  • what is a race condition?
  • virtualenv to debug installation? activate by running source bin/activate. Can use --system-site-packages.
  • pip/setuptools/distribute, pip install from github

Bash Shell (optional class)

Introduction

Motivation

  • Example: Find in files in Microsoft Word - even programmers do it this way! better tool is grep.
    • grep is the poster child for why you would use Unix
  • iterate over files quickly
  • Log in to remote servers and execute commands: connect to and use a server the way you'd use a computer physically sitting in front of you. This gives access to more powerful computers. Nowadays transferring big data over the internet takes a long time, and putting data onto hard drives and FedExing them is a PITA unless you do it often. Rather than transfer the data to you, you go to the data
  • anything you can do with a windowing system, I can do better in a terminal
    • browsing google with lynx

Installation

  • Macs and Linux: command line built in: Terminal app
  • Windows - download Cygwin, or virtual machine "OS inside another OS" usually small performance penalty.

General Concepts

  • Spaces, quotes, escape characters
    • no spaces in file names please
  • there's no undelete!
  • file systems can be mounted in ram
  • backticks!
  • pipes | > < << >>
  • environment variables - change your environment!
    • $PATH - like an onion
    • which asks if an executable is in the path
    • install two or more versions of a program side by side and the one that wins is the one that appears first in the $PATH

unix filesystem

  • one unified system, contrast with windows
    • all devices hang off one tree
    • use command mount to see what devices are mounted where
  • go through the usual directories
    • home, /usr/bin, installed goes /usr/local/bin
  • root dir
  • root privileges
  • do a symbolic link to build your own local directory tree
  • ls thing.{pdf,txt}

privilege system

  • sudo
  • 755

More Concepts

  • one command at a time
  • The idea of . and ..
  • read and write permissions
  • ls
    • dotfiles = hidden. need ls -A
    • ls -lhS = sort in order of size
    • ls -ltr = sort such that the most recently modified files show up at the bottom
  • PATH
    • intricately related with .bashrc and .bash_profile
    • OS looks for executable files in order that they appear in the PATH list
    • separated by :
    • use command which to look for executable in path/identify which one will in fact run if there are multiple
  • running a script (e.g., using "./")
  • Positional arguments vs keyword arguments
  • using the manual
  • pipes
    • redirecting output to a file
    • redirect output to another program
  • the bit bucket
    • redirect yes pump to /dev/null
    • redirect stderr to stdout
    • pipe yes pump to stdin when asked to delete read-only files
  • home directory - blending of bash and UNIX
  • .____rc files
  • process IDs and kill
  • user - who, superuser and sudo
  • Unix File Permissions

Moving around in the shell for the first time

  • You've woken up and suddenly there's a terminal in front of you. It's very strange. Let me be your guide.
  • Prompt
  • Scrolling
  • tab complete
  • up and down arrows
  • history
    • rerun a command using the bang number operator
  • the meta characters to go all the way forward back, by one word, etc

Commands

  • whoami
  • where am i? - pwd
  • what did i just do? - history
  • what's in here? - ls
    • what files are in there?
    • sort by time of creation
    • human readable file sizes
  • cd change directory
  • make a directory mkdir
  • delete a file - be careful - no undos!!
  • delete a directory
  • how big is this folder? du
  • how much space do I have on my hard disk?
  • links, symbolic and hard
  • making and extracting tarballs and using wget to download them

Viewing the contents of files

  • cat - tac
  • more/less
  • universal navigation for less, vim, man
  • G, gg, ctrl-f, ctrl-b, q, search with /, n, N

Edit a file

  • nano
  • pico
  • vim

grep

  • the double grep: grep similarity_matrix *.cpp | grep new

Advanced Concepts

  • Iterating over files
    • for
    • find .. -exec .. {} \;
    • grep | xargs ..
  • using basename/dirname to slice and dice filenames
  • using regular expression to extract parts of filename
  • variables
  • saving output of backticks into variables
  • package managers - requires developer tools on mac
    • need a compiler (gcc)
    • MacOS: MacPorts > homebrew
    • Ubuntu (Debian): Advanced Packaging Tool (apt-get)
    • CentOS (Red Hat): Yellowdog Updater, Modified (yum)

tips

  • don't use spaces in filenames!
  • tab completion
  • change your .bashrc to make aliases and change prompt

Vocabulary Terms

  • expression - one statement/command/line
  • shell/command line/interactive mode
  • variable/instance attribute
  • class
  • methods/functions
    • arguments
    • variadic
    • keyword argument
    • default
  • pseudocode
  • recursion
  • syntax error
  • exceptions are "raised" (sometimes I say "thrown")
  • object oriented/encapsulation
  • scope: global vs. local vs. nested
    • no way for outer to assign to inner
  • operator
  • escape character
  • concatenation
  • API
  • iterable
  • talk about vocabulary overlap
  • syntactic sugar = a feature of the language syntax that makes the code more readable (e.g. list comprehension )