Intro to Python and Beyond

From Colettapedia


Feedback from first class

  • Too much material covered
  • Focus on more practical things, e.g., data processing
  • Focus on things that Microsoft Excel can't do, e.g., regular expressions
  • Homework, in-class problems
  • Separate the class for beginners and programmers

Class 1

Intro

  • My first experience with Python, coding in the back of Ledbetter's
  • Course Format
    • 6 total hours of instruction, bootcamp style
    • No homework - Try it at home? Anaconda.
  • Goal we're working towards
    • Drive to the basket: spreadsheet manipulation
    • Read in an Excel file
    • import pandas as pd; pd.read_excel()
    • Do some transformations on the data
    • Visualize the data
  • What we won't cover
    • There is an ecosystem of Python functionality: some of it is core Python, some of it comes from third-party add-ins, and some of it belongs to the environment in which you interact with Python (Jupyter Notebook). This class is going to gloss over the boundaries between the three.
    • Regular expressions - a concise way of representing patterns that appear in text and modifying based on those patterns.
    • Object oriented programming
      • We'll learn the core data types in Python, OOP is how you make your own data types
  • Roadmap
    • Day 1: The IDE + graphing calculator math stuff
    • The operators
    • The control flow
    • The data manipulation
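
The one-line description of regular expressions above (representing patterns in text and modifying text based on them) can be made concrete with a small sketch. It uses only the standard `re` module; the sample sentence and the participant ID format are made up for illustration.

```python
import re

# Hypothetical sample text; the IDs and their format are invented for this example.
text = "Participants p-007, p-012 and p-133 completed the survey."

# Pattern: the literal "p-" followed by exactly three digits.
# The parentheses capture just the digits.
pattern = r"p-(\d{3})"

# Find every match; with one capture group, findall returns only the captured digits.
ids = re.findall(pattern, text)
print(ids)

# Modify the text based on the pattern: rewrite "p-007" as "P007".
# \1 refers back to the captured digits.
cleaned = re.sub(pattern, r"P\1", text)
print(cleaned)
```

The same two calls, `re.findall` to extract and `re.sub` to rewrite, cover a large share of everyday text cleanup.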

IDE Concepts

  • Compare and contrast with Excel (MATLAB)
    • Excel: Costs $$
    • Python: Free
    • Excel: little cubby holes that you can shove data into
    • Python: give it a command
    • Excel: All the operations that you can do with the data, you can look through all the menus
    • Python: The amount of functionality far exceeds that of Excel, so there can't be a button for every single thing. You have to memorize some syntax rather than pushing a button
  • The User Interface
    • "The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more."
  • The concept of a notebook workflow: a distributable live document. Reproducible science, like a cooking recipe or driving directions.
    • Trace a journey from start to finish
    • Example notebooks
  • What's a kernel?
    • The notebook communicates with a kernel, the process that runs your code; you type in a web browser, so it works locally or in the cloud.
  • Do the user interface tour
  • Different cell types
    • editing vs. scrolling
    • Run and stay on the same cell (Ctrl+Enter) vs. run and advance to the next cell (Shift+Enter)
    • Insert cell below
    • Delete cell
    • Split a cell
  • getting help
    • GOOGLE!!!!!
  • tab-completion
  • Markdown
    • Headings preceded by #
    • unordered lists
    • ordered lists
    • Math equations go in between two $
  • file system commands
    • pwd
    • ls
    • cd
  • Import a package
  • Save to PDF
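
The file system commands listed above (pwd, ls, cd) have plain-Python counterparts in the standard `os` module, which also doubles as a first example of importing a package. This is a sketch of the equivalents, not necessarily how the class runs them; in a notebook the same commands are typically typed directly as IPython magics such as `%pwd` and `%cd`.

```python
import os  # importing a package makes its functions available as os.<name>

# pwd: show the current working directory
start_dir = os.getcwd()
print(start_dir)

# ls: list the entries in the current directory
entries = sorted(os.listdir("."))
print(entries)

# cd: move up to the parent directory, then return to where we started
os.chdir("..")
print(os.getcwd())
os.chdir(start_dir)
```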

First steps with syntax

  • comment
  • one statement per line
  • assignment - a name on the left side of an equal sign
  • see what's inside a named value
    • use print to put things onto the same row
  • type()
  • numeric data types (scalars)
    • int
    • float
    • PEMDAS operators
    • convert back and forth
  • boolean data type
    • operators: and, or
  • string data type
    • operators: concat, in, not in
    • slicing
    • string functions

Day 2-ish

  • Iterable data types
    • tuple, list, dict, set
    • len()
    • count unique values
  • indentation
  • if else
  • for
    • nested
  • function
    • arguments
    • scope
  • Iterating over lists
  • sorted
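
As a rough sketch of how these pieces fit together, the example below counts unique values in a list using a function, a for loop, and if/else, then sorts the results. The survey responses and all names are hypothetical.

```python
# Hypothetical survey responses stored in a list
responses = ["yes", "no", "yes", "maybe", "yes", "no"]
point = (3, 4)                      # tuple: fixed-length, cannot be changed
ages = {"ann": 31, "bob": 27}       # dict: maps keys to values
unique = set(responses)             # set: keeps one copy of each value

print(len(responses))               # number of items in the list
print(len(unique))                  # number of distinct values


def count_values(items):
    """Return a dict mapping each distinct item to how often it appears."""
    tally = {}                      # 'tally' exists only inside the function (scope)
    for item in items:              # indentation marks the body of the loop
        if item in tally:
            tally[item] += 1
        else:
            tally[item] = 1
    return tally


counts = count_values(responses)
print(counts)

# Nested for loops: every (row, col) combination
pairs = []
for row in range(2):
    for col in range(3):
        pairs.append((row, col))

# sorted returns a new list; 'key' controls what the items are sorted by
ordered = sorted(responses)
by_count = sorted(counts, key=counts.get, reverse=True)
print(ordered, by_count)
```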

Day 3-ish: Data Manipulation

  • read and write files in raw python
  • pandas syntax
  • rows = observations, columns = variables
  • import from excel
  • head, tail, sample
  • set numrows, num_columns
  • index and columns
    • Index must be unique
    • Initially just the row number: it doesn't have to be, could also be the participant id
  • subselect
  • sort_values()
  • add/drop rows or columns
  • value_counts
  • count empties
  • groupby
  • Further reading: pivot, merge
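
A minimal sketch of several of the pandas operations listed above. Because `pd.read_excel()` needs a real file, the table here is built in memory instead; the column names and values are made up, and the participant id is used as the index.

```python
import pandas as pd

# Hypothetical data: rows are observations, columns are variables.
df = pd.DataFrame(
    {
        "participant": ["p1", "p2", "p3", "p4", "p5"],
        "group": ["control", "treatment", "control", "treatment", "control"],
        "score": [10.0, 14.0, None, 18.0, 12.0],   # None becomes a missing value
    }
).set_index("participant")          # use the participant id as the (unique) index

print(df.head(2))                   # first two rows
print(df.shape)                     # (number of rows, number of columns)

# Subselect rows with a condition; rows with a missing score are excluded
high = df[df["score"] > 11]

# Sort by a column, largest first (missing values go last by default)
ranked = df.sort_values("score", ascending=False)

# Add a column, then drop it again
df["score_doubled"] = df["score"] * 2
trimmed = df.drop(columns=["score_doubled"])

# Count how often each value appears, and count the empties
group_sizes = df["group"].value_counts()
n_missing = df["score"].isna().sum()

# groupby: mean score per group (missing values are skipped)
group_means = df.groupby("group")["score"].mean()
print(group_sizes, n_missing, group_means, sep="\n")
```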

Day 4-ish: analysis and visualization

  • correlation
  • t-test
  • z-score
  • matplotlib & seaborn
  • Histogram
  • scatter plot
  • boxplot
  • multiple series
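
To show what the z-score, correlation, and t-test items compute, here is a sketch written with only the standard library. It is an assumption-laden illustration rather than the approach the class uses, which presumably relies on pandas or a statistics package. The two groups of numbers are made up, and the t-test part stops at the Welch t statistic without computing a p-value.

```python
import math
import statistics

# Hypothetical measurements for two groups
group_a = [10.0, 12.0, 14.0, 16.0, 18.0]
group_b = [11.0, 15.0, 14.0, 19.0, 21.0]


def z_scores(values):
    """Express each value as a number of sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (divides by n - 1)
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def welch_t(x, y):
    """Welch's t statistic for two independent samples (no p-value)."""
    var_x, var_y = statistics.variance(x), statistics.variance(y)
    standard_error = math.sqrt(var_x / len(x) + var_y / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / standard_error


print(z_scores(group_a))
print(pearson_r(group_a, group_b))
print(welch_t(group_a, group_b))
```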