Python & Stuff
![Page 1: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/1.jpg)
Python & Stuff
All the things I like about Python, plus a bit more.
Friday, November 4, 11
![Page 2: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/2.jpg)
Jacob Perkins
Python Text Processing with NLTK 2.0 Cookbook
Co-Founder & CTO @weotta
Blog: http://streamhacker.com
NLTK Demos: http://text-processing.com
@japerk
Python user for > 6 years
![Page 3: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/3.jpg)
What I use Python for
web development with Django
web crawling with Scrapy
NLP with NLTK
argparse based scripts
processing data in Redis & MongoDB
![Page 4: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/4.jpg)
Topics
functional programming
I/O
Object Oriented programming
scripting
testing
remoting
parsing
package management
data storage
performance
![Page 5: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/5.jpg)
Functional Programming
list comprehensions
slicing
iterators
generators
higher order functions
decorators
default & optional arguments
switch/case emulation
![Page 6: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/6.jpg)
List Comprehensions
>>> [i for i in range(10) if i % 2]
[1, 3, 5, 7, 9]
>>> dict([(i, i*2) for i in range(5)])
{0: 0, 1: 2, 2: 4, 3: 6, 4: 8}
>>> s = set(range(5))
>>> [i for i in range(10) if i in s]
[0, 1, 2, 3, 4]
![Page 7: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/7.jpg)
Slicing
>>> range(10)[:5]
[0, 1, 2, 3, 4]
>>> range(10)[3:5]
[3, 4]
>>> range(10)[1:5]
[1, 2, 3, 4]
>>> range(10)[::2]
[0, 2, 4, 6, 8]
>>> range(10)[-5:-1]
[5, 6, 7, 8]
![Page 8: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/8.jpg)
Iterators
>>> i = iter([1, 2, 3])
>>> i.next()
1
>>> i.next()
2
>>> i.next()
3
>>> i.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
![Page 9: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/9.jpg)
Generators
>>> def gen_ints(n):
...     for i in range(n):
...         yield i
...
>>> g = gen_ints(2)
>>> g.next()
0
>>> g.next()
1
>>> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
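The iterator and generator sessions above use the Python 2 `.next()` method. Below is a minimal sketch of the same idea for Python 3, where the built-in `next()` replaces it. `gen_ints` comes from the slide; `first_n` is a hypothetical helper added only for illustration.

```python
def gen_ints(n):
    """Yield the integers 0..n-1 lazily, one per request."""
    for i in range(n):
        yield i


def first_n(iterable, n):
    """Hypothetical helper: collect at most n items using next()."""
    it = iter(iterable)
    items = []
    for _ in range(n):
        try:
            items.append(next(it))  # Python 3 spelling of it.next()
        except StopIteration:
            break  # the iterator ran out early
    return items


g = gen_ints(2)
print(next(g))          # first value
print(next(g))          # second value
print(next(g, 'done'))  # a default avoids raising StopIteration
print(first_n(gen_ints(5), 3))
```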
![Page 10: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/10.jpg)
Higher Order Functions
>>> def hof(n):
...     def addn(i):
...         return i + n
...     return addn
...
>>> f = hof(5)
>>> f(3)
8
![Page 11: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/11.jpg)
Decorators
>>> def print_args(f):
...     def g(*args, **kwargs):
...         print args, kwargs
...         return f(*args, **kwargs)
...     return g
...
>>> @print_args
... def add2(n):
...     return n+2
...
>>> add2(5)
(5,) {}
7
>>> add2(3)
(3,) {}
5
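One side effect of a decorator written this way is that the wrapped function takes on the wrapper's name and loses its docstring. Here is a sketch of the same `print_args` decorator in Python 3 syntax, using the standard library's `functools.wraps` to preserve them. The docstrings and the use of `functools.wraps` are additions, not part of the slide.

```python
import functools


def print_args(f):
    """Print positional and keyword arguments before calling f."""
    @functools.wraps(f)  # copy f's __name__ and __doc__ onto the wrapper
    def g(*args, **kwargs):
        print(args, kwargs)
        return f(*args, **kwargs)
    return g


@print_args
def add2(n):
    """Return n plus two."""
    return n + 2


result = add2(5)      # prints: (5,) {}
print(result)
print(add2.__name__)  # the original name survives the wrapping
```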
![Page 12: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/12.jpg)
Default & Optional Args
>>> def special_arg(special=None, *args, **kwargs):
...     print 'special:', special
...     print args
...     print kwargs
...
>>> special_arg(special='hi')
special: hi
()
{}
>>> special_arg('hi')
special: hi
()
{}
![Page 13: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/13.jpg)
switch/case emulation
OPTS = {
    "a": all,
    "b": any
}

def all_or_any(lst, opt):
    return OPTS[opt](lst)
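A short usage sketch of the dispatch-table idea, in Python 3 syntax. The `all_or_any_default` variant, which falls back through `dict.get`, is an assumption added for illustration and is not on the slide.

```python
OPTS = {
    "a": all,
    "b": any,
}


def all_or_any(lst, opt):
    """Look up the function for opt and apply it, like a switch statement."""
    return OPTS[opt](lst)


def all_or_any_default(lst, opt, default=any):
    """Assumed variant: fall back to a default instead of raising KeyError."""
    return OPTS.get(opt, default)(lst)


print(all_or_any([True, False], "a"))
print(all_or_any([True, False], "b"))
print(all_or_any_default([True, False], "z"))
```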
![Page 14: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/14.jpg)
Object Oriented
classes
multiple inheritance
special methods
collections
defaultdict
![Page 15: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/15.jpg)
Classes
>>> class A(object):
...     def __init__(self):
...         self.value = 'a'
...
>>> class B(A):
...     def __init__(self):
...         super(B, self).__init__()
...         self.value = 'b'
...
>>> a = A()
>>> a.value
'a'
>>> b = B()
>>> b.value
'b'
![Page 16: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/16.jpg)
Multiple Inheritance
>>> class B(object):
...     def __init__(self):
...         self.value = 'b'
...
>>> class C(A, B): pass
...
>>> C().value
'a'
>>> class C(B, A): pass
...
>>> C().value
'b'
![Page 17: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/17.jpg)
Special Methods
__init__
__len__
__iter__
__contains__
__getitem__
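A minimal sketch of a class that implements each of the special methods listed above, written for Python 3. The `Playlist` class and its data are hypothetical and exist only to show which syntax triggers which method.

```python
class Playlist:
    """Hypothetical container used only to illustrate the special methods."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):
        return len(self._songs)

    def __iter__(self):
        return iter(self._songs)

    def __contains__(self, song):
        return song in self._songs

    def __getitem__(self, index):
        return self._songs[index]


p = Playlist(['intro', 'verse', 'outro'])
print(len(p))                  # calls __len__
print('verse' in p)            # calls __contains__
print(p[0], p[-1])             # calls __getitem__
print([s.upper() for s in p])  # calls __iter__
```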
![Page 18: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/18.jpg)
collections
high performance containers
Abstract Base Classes
Iterable, Sized, Sequence, Set, Mapping
multi-inherit from ABC to mix & match
implement only a few special methods, get rest for free
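The "get rest for free" point can be sketched as follows for Python 3, where the abstract base classes live in `collections.abc`. The `Countdown` class is hypothetical: it defines only `__len__` and `__getitem__`, and `Sequence` supplies iteration, membership tests, `reversed()`, `index()` and `count()`.

```python
from collections.abc import Sequence


class Countdown(Sequence):
    """Hypothetical read-only sequence n, n-1, ..., 1."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        if isinstance(index, slice):
            raise TypeError('slicing is not supported in this sketch')
        if index < 0:
            index += self._n
        if not 0 <= index < self._n:
            raise IndexError(index)
        return self._n - index


c = Countdown(3)
print(list(c))                 # __iter__ comes from Sequence
print(2 in c)                  # so does __contains__
print(list(reversed(c)))       # and __reversed__
print(c.index(1), c.count(3))  # plus index() and count()
```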
![Page 19: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/19.jpg)
defaultdict
>>> d = {}
>>> d['a'] += 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'a'
>>> import collections
>>> d = collections.defaultdict(int)
>>> d['a'] += 2
>>> d['a']
2
>>> l = collections.defaultdict(list)
>>> l['a'].append(1)
>>> l['a']
[1]
![Page 20: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/20.jpg)
I/O
context managers
file iteration
gevent / eventlet
![Page 21: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/21.jpg)
Context Managers
>>> with open('myfile', 'w') as f:
...     f.write('hello\nworld')
...
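The Context Managers slide shows only the built-in file object. As a sketch of writing your own context manager in Python 3, the standard library's `contextlib.contextmanager` turns a generator into one. The `temp_text_file` helper is hypothetical and not from the talk.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temp_text_file(text):
    """Hypothetical helper: yield the path of a temp file, delete it on exit."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    try:
        with os.fdopen(fd, 'w') as f:  # the file is closed when the block ends
            f.write(text)
        yield path
    finally:
        os.remove(path)  # runs even if the with-body raises


with temp_text_file('hello\nworld') as path:
    with open(path) as f:
        lines = [line.strip() for line in f]

print(lines)
print(os.path.exists(path))  # the cleanup has already run
```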
![Page 22: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/22.jpg)
File Iteration
>>> with open('myfile') as f:
...     for line in f:
...         print line.strip()
...
hello
world
![Page 23: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/23.jpg)
gevent / eventlet
coroutine networking libraries
greenlets: “micro-threads”
fast event loop
monkey-patch standard library
http://www.gevent.org/
http://www.eventlet.net/
![Page 24: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/24.jpg)
Scripting
argparse
__main__
atexit
![Page 25: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/25.jpg)
argparse
import argparse

parser = argparse.ArgumentParser(description='Train a NLTK Classifier')
parser.add_argument('corpus', help='corpus name/path')
parser.add_argument('--no-pickle', action='store_true', default=False, help="don't pickle")
parser.add_argument('--trace', default=1, type=int, help='How much trace output you want')

args = parser.parse_args()

if args.trace:
    print 'have args'
![Page 26: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/26.jpg)
__main__
if __name__ == '__main__':
    do_main_function()
![Page 27: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/27.jpg)
atexit
def goodbye(name, adjective):
    print 'Goodbye, %s, it was %s to meet you.' % (name, adjective)

import atexit
atexit.register(goodbye, 'Donny', 'nice')
![Page 28: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/28.jpg)
Testing
doctest
unittest
nose
fudge
py.test
![Page 29: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/29.jpg)
doctest
def fib(n):
    '''Return the nth fibonacci number.
    >>> fib(0)
    0
    >>> fib(1)
    1
    >>> fib(2)
    1
    >>> fib(3)
    2
    >>> fib(4)
    3
    '''
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)
![Page 30: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/30.jpg)
doctesting modules
if __name__ == '__main__':
    import doctest
    doctest.testmod()
![Page 31: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/31.jpg)
unittest
anything more complicated than function I/O
clean state for each test
test interactions between components
can use mock objects
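A minimal sketch of the "clean state for each test" point in Python 3: `setUp` runs before every test method, so no test sees another test's changes. The `Counter` class under test is hypothetical.

```python
import unittest


class Counter:
    """Hypothetical class under test."""

    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count


class CounterTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so each test gets a fresh Counter.
        self.counter = Counter()

    def test_starts_at_zero(self):
        self.assertEqual(self.counter.count, 0)

    def test_increment(self):
        self.assertEqual(self.counter.increment(), 1)
        self.assertEqual(self.counter.increment(), 2)


# Build and run the suite explicitly so the sketch also works when pasted
# into an interactive session (unittest.main() would call sys.exit()).
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CounterTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

In a real project the last three lines would normally be replaced by `unittest.main()` under an `if __name__ == '__main__':` guard, or by a test runner.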
![Page 32: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/32.jpg)
nose
http://readthedocs.org/docs/nose/en/latest/
test runner
auto-discovery of tests
easy plugin system
plugins can generate XML for CI (Jenkins)
![Page 33: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/33.jpg)
fudge
http://farmdev.com/projects/fudge/
make fake objects
mock through monkey-patching
![Page 34: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/34.jpg)
py.test
http://pytest.org/latest/
similar to nose
distributed multi-platform testing
![Page 35: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/35.jpg)
Remoting Libraries
Fabric
execnet
![Page 36: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/36.jpg)
Fabric
http://fabfile.org
run commands over ssh
great for “push” deployment
not parallel yet
![Page 37: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/37.jpg)
fabfile.py
from fabric.api import run

def host_type():
    run('uname -s')

fab command
$ fab -H localhost,linuxbox host_type
[localhost] run: uname -s
[localhost] out: Darwin
[linuxbox] run: uname -s
[linuxbox] out: Linux
![Page 38: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/38.jpg)
execnet
http://codespeak.net/execnet/
open python interpreters over ssh
spawn local python interpreters
shared-nothing model
send code & data over channels
interact with CPython, Jython, PyPy
py.test distributed testing
![Page 39: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/39.jpg)
execnet example
>>> import execnet, os
>>> gw = execnet.makegateway("ssh=codespeak.net")
>>> channel = gw.remote_exec("""
...     import sys, os
...     channel.send((sys.platform, sys.version_info, os.getpid()))
... """)
>>> platform, version_info, remote_pid = channel.receive()
>>> platform
'linux2'
>>> version_info
(2, 4, 2, 'final', 0)
![Page 40: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/40.jpg)
Parsing
regular expressions
NLTK
SimpleParse
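Regular expressions are the only item in this list without a slide of their own. Here is a minimal sketch with the standard library's `re` module, written for Python 3. The log line and its format are invented for illustration.

```python
import re

# Assumed log format for illustration: "<LEVEL> <key>=<value> ..."
LINE = 'ERROR user=alice code=42'

# A named group pulls the level out; findall collects every key=value pair.
level_re = re.compile(r'^(?P<level>[A-Z]+)\b')
pair_re = re.compile(r'(\w+)=(\w+)')

match = level_re.match(LINE)
level = match.group('level')
fields = dict(pair_re.findall(LINE))

print(level)
print(fields)
```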
![Page 41: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/41.jpg)
NLTK Tokenization
>>> from nltk import tokenize
>>> tokenize.word_tokenize("Jacob's presentation")
['Jacob', "'s", 'presentation']
>>> tokenize.wordpunct_tokenize("Jacob's presentation")
['Jacob', "'", 's', 'presentation']
![Page 42: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/42.jpg)
nltk.grammar
CFGs
Chapter 9 of NLTK Book: http://nltk.googlecode.com/svn/trunk/doc/book/ch09.html
![Page 43: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/43.jpg)
more NLTK
stemming
part-of-speech tagging
chunking
classification
![Page 44: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/44.jpg)
SimpleParse
http://simpleparse.sourceforge.net/
Parser generator
EBNF grammars
Based on mxTextTools: http://www.egenix.com/products/python/mxBase/mxTextTools/ (C extensions)
![Page 45: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/45.jpg)
Package Management
import
pip
virtualenv
mercurial
![Page 46: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/46.jpg)
import
import module
from module import function, ClassName
from module import function as f
always make sure package directories have __init__.py
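A sketch of what the `__init__.py` advice looks like on disk, written for Python 3. It builds a throwaway package in a temporary directory and imports from it; `demo_pkg`, `mathutil` and `double` are hypothetical names. Python 3.3 and later can also import a directory without `__init__.py` as a namespace package, but an explicit file remains the conventional way to mark a regular package.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk; all names here are hypothetical.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, 'demo_pkg')
os.mkdir(pkg_dir)

# The (possibly empty) __init__.py marks the directory as a package.
with open(os.path.join(pkg_dir, '__init__.py'), 'w') as f:
    f.write('')
with open(os.path.join(pkg_dir, 'mathutil.py'), 'w') as f:
    f.write('def double(n):\n    return n * 2\n')

sys.path.insert(0, root)       # make the parent directory importable
importlib.invalidate_caches()  # pick up files created after startup

from demo_pkg.mathutil import double as dbl

print(dbl(21))
```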
![Page 47: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/47.jpg)
pip
http://www.pip-installer.org/en/latest/
easy_install replacement
install from requirements files
$ pip install simplejson
[... progress report ...]
Successfully installed simplejson
![Page 48: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/48.jpg)
virtualenv
http://www.virtualenv.org/en/latest/
create self-contained python installations
dependency silos
works great with pip (same author)
![Page 49: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/49.jpg)
mercurial
http://mercurial.selenic.com/
Python based DVCS
simple & fast
easy cloning
works with Bitbucket, GitHub, Google Code
![Page 50: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/50.jpg)
Flexible Data Storage
Redis
MongoDB
![Page 51: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/51.jpg)
Redis
in-memory key-value storage server
most operations O(1)
lists
sets
sorted sets
hash objects
![Page 52: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/52.jpg)
MongoDB
memory mapped document storage
arbitrary document fields
nested documents
index on multiple fields
easier (for programmers) than SQL
capped collections (good for logging)
![Page 53: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/53.jpg)
Python Performance
CPU
RAM
![Page 54: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/54.jpg)
CPU
probably fast enough if I/O or DB bound
try PyPy: http://pypy.org/
use CPython optimized libraries like numpy
write a CPython extension
![Page 55: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/55.jpg)
RAM
don’t keep references longer than needed
iterate over data
aggregate to an optimized DB
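The first two RAM tips can be sketched in Python 3 by comparing a function that materializes a full list with one that streams through a generator expression. Both function names are hypothetical, and `io.StringIO` stands in for a large file.

```python
import io


def total_bytes_eager(lines):
    """Build the whole list first: memory grows with the input size."""
    sizes = [len(line) for line in lines]
    return sum(sizes)


def total_bytes_lazy(lines):
    """Feed sum() from a generator: only one line is held at a time."""
    return sum(len(line) for line in lines)


# io.StringIO stands in for a large file opened with open().
data = 'hello\nworld\n' * 1000

print(total_bytes_eager(io.StringIO(data)))
print(total_bytes_lazy(io.StringIO(data)))
```

Both calls return the same total. The lazy version never holds more than one line and a running sum, which is the point of iterating over data rather than loading it.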
![Page 56: Python & Stuff](https://reader035.fdocuments.in/reader035/viewer/2022070303/54b7a4114a79591c048b465d/html5/thumbnails/56.jpg)
import this
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!