Python, it just fits your brain...

Introduction
This section serves as a general get-to-know-Python section in that it touches on the most profound theoretical and practical subjects. The first thing to remember about Python is
... in Python everything is an object!
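To make this claim tangible, here is a minimal interactive sketch (the id() value shown is illustrative and will differ on every run) demonstrating that numbers, strings, functions and even classes/types are all objects:

>>> isinstance(3, object), isinstance("hello", object)
(True, True)
>>> def foo():
...     pass
...
>>> isinstance(foo, object)            # functions are objects too
True
>>> isinstance(int, object)            # even classes/types are objects
True
>>> type(3), type("hello"), type(foo), type(int)
(<class 'int'>, <class 'str'>, <class 'function'>, <class 'type'>)
>>> id(foo)                            # every object has an identity
139915303598992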
Second most important thing with regards to Python is that
Iterators are everywhere, underlying everything. Both iterators and objects are explained in detail further down...

Main Usage Areas
So what is it that most people use Python for? Well, there are two main usage areas. There are many others too, like for example scientific computing or robotics, but those are areas with a considerably smaller userbase than the two major areas mentioned above.

Why Python?
Python where we can, C++ where we must... As with many things in life, simplicity is key. Even more so if, by gaining simplicity, we do not have to cut back on features but maybe even gain on both ends. Wow! Guess what, that thing exists and it goes by the name Python:
Python where we can, C++ where we must — they used (a subset of) C++ If you are asking yourself Who are they? in this context, the answer is: The founders of Google. So, why might someone make the decision for this technology stack? Easy:
If I can write 10 lines of code in language X to accomplish what took you considerably more in language Y, or in other words:

>>> if succinctness == power:
...     print("You are using Python.")
...
...
You are using Python.
>>>

Again you see, simplicity is good, simplicity scales, simplicity shortens product cycles, simplicity helps reduce time to market and, last but not least, simplicity is more fun — heck, I would rather spend a few hours writing some useful piece of software than debug some arcane memory bug in an even more arcane programming language. Been there, done that...
Everything should be made as simple as possible, but not simpler.

With the following subsections we are going to look at what that often-mentioned simplicity actually consists of. I know you want facts, and rightfully so!

Philosophy
It is important for anyone involved with Python to at least understand a few basic/core ideas about the language itself:
Random Stuff
There are quite a few random things that might be of interest...

History of Python
1991 - Dutch programmer Guido van Rossum travels to Argentina... Those who are looking for a serious answer, go use some search engine ;-]

Zen of Python
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one -- and preferably only one -- obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
>>>

Everything written in Python should adhere to those principles. Things like frameworks which are written in Python might have additional principles/conventions on top of the ones outlined above. In addition, there are coding styles which we should adhere to.

Cheat Sheet or RefCard
Yes, the Internet has plenty of resources on the matter. Here is one of them.

Quickstart
This subsection is a summary of the semantics and the syntax found in Python. It can be read and followed along on the command line in less than an hour. It is intended as a glance into Python for those who have not had contact with Python yet, or for those who want a quick refresher on the cornerstones that make up most of Python's look and feel. Also, unless noted otherwise, this page is about Python 3 where applicable and only refers to Python 2 where still necessary at the time of writing. One of the things I like most about Python is that it is not as wordy as Java/C++ and not as cryptic as Perl but just a programming language with a pragmatic approach to software development — something I would also love to see for JavaScript, which somehow has become my second most used programming language right after Python and before Java/C++. Enough said, let us now go and ride the snake a little...

Preparations
It is strongly recommended to follow along using Python's built-in interactive shell. Personally I prefer to use bpython but the standard built-in shell is just fine. One needs to install Python if not installed already e.g. using APT
(Advanced Packaging Tool) by issuing e.g. apt-get install python3:

sa@wks:~$ which python
/usr/bin/python
sa@wks:~$ python3
Python 3.2 (r32:88445, Feb 20 2011, 19:50:20)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

As of now (November 2011), Python 3.2 is what is installed and what the examples below use.
Ok, grab your helmets, fasten seat belts... turn ignition... BasicsAssignment, values, names and the >>> foo = 3 # bind (through assignment) name foo to value 3 >>> print(foo) # use print function on object foo 3 >>> bar = "hello world" # a string is a value too >>> print(bar) hello world >>> fiz = foo # bind name fiz to the same value foo is bound to already >>> print(fiz) 3 >>> Expressions and statements: >>> 3 + 2 # an expression is something 5 >>> print(3 + 2) # a statement does something 5 >>> len(bar) # another statement 11 >>> Code blocks are set apart using whitespace rather than braces. Conditionals, clauses and loops work as one would expect: >>> for character in "abc": ... print(character) # 4 spaces per indentation level ... ... a b c >>> for character in "abc": ... print(character) ... if character in "a": ... print("found character 'a'") # 8 spaces on level 2 ... ... ... a found character 'a' b c >>> if 4 == 2 + 2: ... print("boolean context evaluated to true") ... ... else: ... print("boolean context evaluated to false") ... ... boolean context evaluated to true >>> Mostly the semantics of Following the link about counters also shows us the use of the Data StructuresWith only a few built-in Python data structures we can probably cover 80% of use cases. To get to 100% we can then use third party add-ons (so-called packages and/or modules) written by others or simply build our custom data structures and maybe also some custom algorithms which we purposely designed to go with our custom data structures. Relations amongst Data StructuresWe have already seen some data structures — numbers and strings. Numbers are so-called literals whereas strings are sequences. Sequences itself are a subset of containers, which in turn is not just the superset to sequences but also to mappings and sets. What all data structures have in common is that each of them is either itself an object or some sort of grouping thereof. Another thing that is true for any data structure is that it is either mutable (can be modified in place) or immutable (cannot be modified in place) — place being location(s) in memory. Below is a sketch picturing what we just said about how data structures in Python relate (formatting does not carry any information but was chosen to make things fit): o b j e c t / \ / \ literals c o n t a i n e r s / \ / | \ / \ / | \ immutable mutable s e q u e n c e s m a p p i n g s s e t s / \ / \ / \ / \ / \ / \ / \ / \ n u m b e r s etc. immutable mutable mutable immutable immutable mutable / | \ / | \ / | / / \ / | \ / | \ / | / / \ integral complex real/float strings tuples etc. lists etc. dictionaries frozenset set / \ / \ / \ / \ / \ / \ integer boolean decimal binary OrderedDict etc. The sketch, even if it is not complete but only shows the root and a few branches, is a good enough approximation and should help with understanding how data structures in Python relate — they are basically arranged in a tree structure, based on semantics they carry. 
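The relations from the sketch can also be verified programmatically. Below is a small interactive sketch (assuming Python 3.3 or later, where the container abstract base classes live in collections.abc) that confirms a few of the branches shown above:

>>> import numbers
>>> isinstance(3, numbers.Integral)              # integers sit on the number branch
True
>>> isinstance(True, numbers.Integral)           # booleans are a kind of integer
True
>>> isinstance(1.5, numbers.Real), isinstance(2j, numbers.Complex)
(True, True)
>>> from collections import abc
>>> isinstance("abc", abc.Sequence)              # strings are (immutable) sequences
True
>>> isinstance([1, 2], abc.MutableSequence)      # lists are mutable sequences
True
>>> isinstance((1, 2), abc.MutableSequence)      # tuples are not mutable
False
>>> isinstance({'k': 1}, abc.Mapping)            # dictionaries are mappings
True
>>> isinstance({1, 2}, abc.Set), isinstance(frozenset({1, 2}), abc.Set)
(True, True)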
Numbers, Lists, Tuples, DictionariesNumbers/Integers >>> 2 2 >>> type(2) # check for class/type <class 'int'> # yes, 2 is indeed an integer Numbers/Floats: >>> 1.1 + 2.2 3.3000000000000003 >>> type(1.1) <class 'float'> >>> foo = [2, 4, "hello world"] # create a lists with three items >>> type(foo) <class 'list'> >>> foo[0] # get item at index position 0 2 >>> foo[2] 'hello world' >>> foo[2] = "hello big world" # assign to index position 2 >>> foo [2, 4, 'hello big world'] # assignment worked because lists are mutable >>> foo[1:] # get a slice [4, 'hello big world'] >>> foo[:-1] # slice but with negative upper boundary [2, 4] >>> foo[-1:] # negative lower boundary ['hello big world'] >>> bar = (2, 4, "hello world") >>> type(bar) <class 'tuple'> >>> bar[0] 2 >>> bar[2] 'hello world' >>> bar[2] = "hello big world" Traceback (most recent call last): # because tuples are immutable File "<input>", line 1, in <module> TypeError: 'tuple' object does not support item assignment >>> bar (2, 4, 'hello world') >>> bar[1:] # same as for lists (4, 'hello world')
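Sets, which also appear in the relations sketch above, deserve a quick look too. The following is a small sketch (the display order of set elements may vary) showing the mutable set and its immutable counterpart frozenset:

>>> baz = {2, 4, 4}                    # duplicates collapse
>>> baz
{2, 4}
>>> type(baz)
<class 'set'>
>>> baz.add(8)                         # works because sets are mutable
>>> baz
{8, 2, 4}
>>> frozen = frozenset(baz)            # the immutable counterpart
>>> frozen.add(16)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'add'
>>>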
>>> fiz = {'foo': 3, 'baz': "hello world", 'foobar': bar} # using bar from above >>> type(fiz) <class 'dict'> >>> fiz {'foobar': (2, 4, 'hello world'), 'baz': 'hello world', 'foo': 3} >>> fiz['foo'] 3 >>> fiz['foobar'] (2, 4, 'hello world') >>> fiz['baz'] = "hello big world" >>> fiz['baz'] 'hello big world' >>> fiz {'foobar': (2, 4, 'hello world'), 'baz': 'hello big world', 'foo': 3} >>> fiz['foobar'][1] # nested 4 >>> Built-in GoodiesSo far we have only scratched the subject of built-in data structures in Python. We already know that each data structure is in fact an object. This object holds some sort of data e.g. a list has items. It is not far of to think that it would be nice if those objects also had ways to operate on the data they store e.g. reverse the items of a list etc. Guess what, that is exactly the case — every data structure comes with ready-made functionality to operate on the data it stores! Let us just have a quick look at a few: >>> foo [2, 4, 'hello big world'] >>> foo.reverse() # works because lists are mutable >>> foo ['hello big world', 4, 2] >>> bar.count(4) # number occurrences of value 4 in tuple bar 1 >>> bar.index(2) # index position of value 2 0 >>> fiz.items() dict_items([('foobar', (2, 4, 'hello world')), ('foo', 3), ('baz', 'hello big world')]) >>> fiz.keys() dict_keys(['foobar', 'foo', 'baz']) >>> fiz.values() dict_values([(2, 4, 'hello world'), 3, 'hello big world']) >>> By the way, those built-in goodies are actually method objects linked to from each data structure instance but let us not get ahead of ourselves for now... FunctionsThe next step is to combine all we have seen so far and group behavior (code) in order to more efficiently manage state (data). Functions are basically just code blocks (set apart by indentation) which we can refer to by name and which body contains statements and expression which semantically belong together. However, grouping is just one thing that is nice about using functions. Being able to refer to code blocks by name is nice too. However, where the real benefit comes into play is when we start reusing those code blocks in order to avoid code duplication. >>> def amplify(foo): # function signature; foo is its only parameter ... print(foo * 3) # function body ... ... >>> amplify('hi') # works for strings hihihi >>> amplify(2) # and numbers... 6 >>> type(amplify) <class 'function'> >>> bar = amplify # now bar and amplify, both are bound to >>> type(bar) <class 'function'> >>> bar is amplify # the same function object True >>> bar <function amplify at 0x24f3490> # 0x24f3490 address of function object in memory >>> amplify <function amplify at 0x24f3490> >>> bar(2) 6 >>> The most important thing to understand about functions is that they
are objects too, just like strings etc. What that means is that we can
assign them to arbitrary names — like we just did when we bound the
name bar to the function object that amplify was already bound to. It is also important to understand that functions can take arguments,
something quite important when we want to reuse code where the
processing (read code/logic) stays the same but of course input data
varies (e.g. can be At this point there is no need to concern ourselves with the concepts of scope and namespaces, as we will see more on them later. The last thing which is certainly necessary to know with regards to functions is that they always return something. Custom Data StructuresNext to many built-in data structures we can create our custom ones — the terms class/type, superclass/supertype, subclass/subtype, instance and OOP (Object-Oriented Programming), all hint that we are dealing with custom data structures. I advice people to maybe go visit each of those links, maybe read the first one or two paragraphs and then come back for a light introduction into custom data structures in Python: 1 >>> class Foo: # creating a class/type 2 ... pass 3 ... 4 ... 5 >>> type(Foo) 6 <class 'type'> 7 >>> class Bar(Foo): # subclassing 8 ... def __init__(self, firstname=None, surname=None): # a (special) method 9 ... self.firstname = firstname or None 10 ... self.surname = surname or None 11 ... 12 ... def print_name(self): # another method 13 ... print("I am {} {}.".format(self.firstname, self.surname)) 14 ... 15 ... 16 ... 17 >>> type(Bar) 18 <class 'type'> 19 >>> Bar.__bases__ 20 (<class '__main__.Foo'>,) 21 >>> aperson = Bar(firstname="Niki", surname="Miller") # instantiating 22 >>> isinstance(aperson, Bar) 23 True 24 >>> isinstance(aperson, Foo) 25 True 26 >>> aperson.print_name() # method call 27 I am Niki Miller. 28 >>> In only 28 lines we have shown about 90% there is to know about custom
data structures in Python. In line 1 we use the class keyword to create a class/type (pass in line 2 gives it an empty body), and in line 7 we create a subclass of it. Since a class/type is basically a blueprint used to create instances
from, every instance (individual person in our case) will have a
different name, of course. We want to store a person's name
automatically right after creating the instance i.e. when it gets
initialized — that is what the special method __init__ (line 8) is for. We can put whatever we want into the body of __init__ and of the other methods, such as print_name. That is what self is used for... referencing the instance in question i.e. either accessing already stored information (line 13) or storing information (lines 9 and 10) on an instance of some class/type. Also, note the use of or which in our current case means that only if we provide e.g. a firstname will it actually be stored; otherwise None is used. We already know the built-in type function; lines 22 and 23 show how we can use the built-in isinstance function to check whether or not an object is an instance of some class/type.
Last but not least, a word on inheritance... The nifty thing from
line 19 and 20 is using a so-called class/type attribute to see if Standard LibraryIt is a must for any Pythoneer to know about the Python standard library, what it contains, as well as how and when to use it. It has lots of useful code, highly optimized, for many problems seen by most people over and over again for all kinds of problem domains across many industries. Using bits and pieces form the standard library always starts with importing code which can then be used right away. Let us look at some examples: >>> import math >>> math.pi 3.141592653589793 >>> math.cos((math.pi)/3) 0.5000000000000001 >>> from datetime import date >>> today = date.today() >>> dateofbirth = date(1901, 11, 11) >>> age = today - dateofbirth >>> age.days 39993 >>> from datetime import datetime >>> datetime.now() datetime.datetime(2011, 5, 11, 22, 52, 20, 42708) >>> foo = datetime.now() >>> foo.isoformat() '2011-05-11T22:52:26.306873' >>> foo.hour 22 >>> foo.isocalendar() (2011, 19, 3) >>> import zlib >>> foo = b"loooooooooooooong string..." >>> len(foo) 28 >>> compressedfoo = zlib.compress(foo) >>> len(compressedfoo) 23 >>> zlib.decompress(compressedfoo) b'loooooooooooooong string...' >>> import random >>> foo = [3, 6, 7] >>> random.shuffle(foo) >>> foo [3, 7, 6] >>> random.choice(['gym', 'no gym']) 'gym' >>> random.randrange(10) 3 >>> random.randrange(10) 7 >>> import glob >>> glob.glob('*txt') ['file.txt', 'myfile.txt'] >>> import os >>> os.getcwd() '/tmp' >>> os.environ['HOME'] '/home/sa' >>> os.environ['PATH'].split(":") ['/usr/local/bin', '/usr/bin', '/bin', '/usr/local/games', '/usr/games', '/home/sa/0/bash', '/home/sa/0/bash/photo_utilities'] >>> os.uname() ('Linux', 'wks', '2.6.38-2-amd64', '#1 SMP Thu Apr 7 04:28:07 UTC 2011', 'x86_64') >>> import platform >>> platform.architecture() ('64bit', 'ELF') >>> platform.python_compiler() 'GCC 4.4.5' >>> platform.python_implementation() 'CPython' >>> from urllib.request import urlopen >>> bar = urlopen('') >>> bar.getheaders() [('Connection', 'close'), ('Date', 'Wed, 11 May 2011 22:08:15 GMT'), ('Server', 'Cherokee/1.0.8 (Debian GNU/Linux)'), ('ETag', '4dc82a11=6364'), ('Last-Modified', 'Mon, 09 May 2011 21:53:21 GMT'), ('Content-Type', 'text/html'), ('Content-Length', '25444')] >>> import timeit >>> foobar = timeit.Timer("math.sqrt(999)", "import math") >>> foobar.timeit() 0.18407893180847168 >>> foobar.repeat(3, 100) [2.7894973754882812e-05, 2.3126602172851562e-05, 2.288818359375e-05] >>> import sys >>> sys.path ['', '/usr/local/bin', '/usr/lib/python3.2', '/usr/lib/python3.2/plat-linux2', '/usr/lib/python3.2/lib-dynload', '/usr/local/lib/python3.2/dist-packages', '/usr/lib/python3/dist-packages'] >>> import keyword >>> keyword.iskeyword("as") True >>> keyword.iskeyword("def") True >>> keyword.iskeyword("class") True >>> keyword.iskeyword("foo") False >>> import json >>> print(json.dumps({'foo': {'name': "MongoDB", 'type': "document store"}, ... 'bar': {'name': "neo4j", 'type': "graph store"}}, ... sort_keys=True, indent=4)) { "bar": { "name": "neo4j", "type": "graph store" }, "foo": { "name": "MongoDB", "type": "document store" } } >>> ... and that was not even 0.1% of what is available from the Python standard library! ScriptsMost people, before they write applications composed of several files/libraries (i.e. modules and/or packages), probably start out writing themselves simple scripts in order to automate things such as system administration tasks. 
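A tiny script of that sort might look like the following sketch (the file name and the task are made up): it has a pound bang line, a docstring, uses only the standard library and guards its entry point:

#!/usr/bin/env python

"""Report files below /tmp that are bigger than one mebibyte."""

import os

def big_files(directory, limit=1024 * 1024):
    """Yield (path, size) tuples for files larger than limit bytes."""
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getsize(path) > limit:
            yield path, os.path.getsize(path)

if __name__ == '__main__':
    for path, size in big_files('/tmp'):
        print("{}: {} bytes".format(path, size))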
That is actually the perfect way into Python after working through some quickstart section such as this one because, in order to create and execute scripts, one needs to know about the pound bang line, import, docstrings, how to use Python's standard library, as well as what it means to write pythonic code. On-disk LocationPython can live anywhere on the filesystem and there are quite a few ways to influence and determine where things go... PYTHONPATH VariableWell, actually we are not talking about Finding Code on the FilesystemIf we have code (Python package or modules) somewhere on the
filesystem that we want Python to know about, we need to import that
code using the import statement. So how do we tell Python about the places where it should look for code? The variable in question here is sys.path.
Before we start, let us take a look at sa@wks:~$ python >>> dir() ['__builtins__', '__doc__', '__name__', '__package__'] >>> import pprint >>> import sys >>> pprint.pprint(sys.path) ['', '/usr/lib/python3.2', '/usr/lib/python3.2/plat-linux2', '/usr/lib/python3.2/lib-dynload', '/usr/lib/python3.2/dist-packages', '/usr/local/lib/python3.2/dist-packages'] If we decided to add our own or some third party code without adding a
new directory to sys.path, we would have to put it into one of the directories already listed there. If we want/have to add another directory to sys.path, there are two ways to do so:
Manually adding to sys.path
This one is straightforward as we only need to append to sys.path:

>>> import os
>>> sys.path.append('/tmp')
>>> sys.path.append(os.path.expanduser('~/0/django'))
>>> pprint.pprint(sys.path)
['',
 '/usr/lib/python3.2',
 '/usr/lib/python3.2/plat-linux2',
 '/usr/lib/python3.2/lib-dynload',
 '/usr/lib/python3.2/dist-packages',
 '/usr/local/lib/python3.2/dist-packages',
 '/tmp',
 '/home/sa/0/django']
>>>

Adding directories manually is quick and certainly nice while doing
development/testing but it is not what we want for some permanent
setup like for example a long-term development project or a production
site. For those, we want to add directories to sys.path automatically.

Automatically adding to sys.path
When a module named site is imported (something CPython does on its own during interpreter startup), it appends site-specific directories to sys.path and processes any .pth files found there. Compare this to how the shell locates executables via the PATH variable:

sa@wks:~$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/games:/home/sa/0/bash
sa@wks:~$

Most Linux distributions include Python as a standard part of the
system, so the interpreter usually lives under the /usr prefix, which is exactly what sys.prefix reports:

sa@wks:~$ python
>>> import sys
>>> sys.prefix
'/usr'
>>>
sa@wks:~$

So now we know how finding code on the filesystem works. This however
does not help us much since we do not want to use any of the default paths/directories listed in sys.path. Although the standard method so far is to add directories to sys.path by hand, that is not really suitable for anything permanent either. So what do we do? Piece of cake, we use the site module together with so-called path configuration (.pth) files. All we need to do is to put our directory names into a .pth file and register the directory containing it with site.addsitedir():

 1 sa@wks:/tmp$ mkdir test; cd test; echo -e "foo\nbar" > our_path_file.pth
 2 sa@wks:/tmp/test$ mkdir foo bar
 3 sa@wks:/tmp/test$ echo 'print("inside foo.py")' > foo/foo.py
 4 sa@wks:/tmp/test$ echo 'print("inside bar.py")' > bar/bar.py
 5 sa@wks:/tmp/test$ type ta
 6 ta is aliased to `tree -a -I \.git*\|*\.\~*\|*\.pyc'
 7 sa@wks:/tmp/test$ ta ../test/
 8 ../test/
 9 |-- bar
10 |   `-- bar.py
11 |-- foo
12 |   `-- foo.py
13 `-- our_path_file.pth
14
15 2 directories, 3 files
16 sa@wks:/tmp/test$ cat our_path_file.pth
17 foo
18 bar
19 sa@wks:/tmp/test$ python3
20 Python 3.1.1+ (r311:74480, Oct 12 2009, 05:40:55)
21 [GCC 4.3.4] on linux2
22 Type "help", "copyright", "credits" or "license" for more information.
23 >>> import pprint, sys, site
24 >>> pprint.pprint(sys.path)
25 ['',
26  '/usr/lib/python3.1',
27  '/usr/lib/python3.1/plat-linux2',
28  '/usr/lib/python3.1/lib-dynload',
29  '/usr/lib/python3.1/dist-packages',
30  '/usr/local/lib/python3.1/dist-packages']
31 >>> site.addsitedir('/tmp/test')
32 >>> pprint.pprint(sys.path)
33 ['',
34  '/usr/lib/python3.1',
35  '/usr/lib/python3.1/plat-linux2',
36  '/usr/lib/python3.1/lib-dynload',
37  '/usr/lib/python3.1/dist-packages',
38  '/usr/local/lib/python3.1/dist-packages',
39  '/tmp/test',
40  '/tmp/test/foo',
41  '/tmp/test/bar']
42 >>> import foo
43 inside foo.py
44 >>> import foo
45 >>> import bar
46 inside bar.py

Python now finds our modules foo and bar.
Certainly, no one really cares to use What I often do is to add to 47 >>> site.USER_SITE 48 '/home/sa/.local/lib/python3.1/site-packages' 49 >>> import os 50 >>> dir() 51 ['__builtins__', '__doc__', '__name__', '__package__', '__warningregistry__', 'bar', 'foo', 'os', 'pprint', 'site', 'sys'] 52 >>> mypth = os.path.join(site.USER_SITE, 'mypath.pth') 53 >>> print(mypth) 54 /home/sa/.local/lib/python3.1/site-packages/mypath.pth 55 >>> module_paths_to_add_to_sys_path = ["/home/sa/0/django", "/home/sa/0/bash"] 56 >>> if not os.path.isdir(site.USER_SITE): 57 ... os.makedirs(site.USER_SITE) 58 ... 59 >>> with open(mypth, "a") as f: 60 ... f.write("\n".join(module_paths_to_add_to_sys_path)) 61 ... f.write("\n") 62 ... 63 33 64 1 65 >>> pprint.pprint(sys.path) 66 ['', 67 '/usr/lib/python3.1', 68 '/usr/lib/python3.1/plat-linux2', 69 '/usr/lib/python3.1/lib-dynload', 70 '/usr/lib/python3.1/dist-packages', 71 '/usr/local/lib/python3.1/dist-packages', 72 '/tmp/test', 73 '/tmp/test/foo', 74 '/tmp/test/bar'] 75 >>> site.addsitedir(site.USER_SITE) 76 >>> pprint.pprint(sys.path) 77 ['', 78 '/usr/lib/python3.1', 79 '/usr/lib/python3.1/plat-linux2', 80 '/usr/lib/python3.1/lib-dynload', 81 '/usr/lib/python3.1/dist-packages', 82 '/usr/local/lib/python3.1/dist-packages', 83 '/tmp/test', 84 '/tmp/test/foo', 85 '/tmp/test/bar', 86 '/home/sa/.local/lib/python3.1/site-packages', 87 '/home/sa/0/django', 88 '/home/sa/0/bash'] 89 >>> 90 sa@wks:/tmp/test$ Virtual EnvironmentA standard system has what is called a main Python installation also
known as global Python context/space i.e. a Python interpreter living
at Another way to have modules/packages installed would be to use virtualenv. It can be used to create isolated Python contexts/spaces i.e. those virtual environments can have their own Python interpreter as well as their own set of modules/packages installed and therefore have no connection with the global Python context/space whatsoever. Note that we can not just clone the global Python context/space or create an entirely separated Python context/space to work with, but we can also link any directories into any virtual environment. This means ultimate flexibility without risking to damage the existing main Python installation also known as global Python context/space. Configuration InformationThe sysconfig module provides access to Python's configuration information like the list of installation paths and the configuration variables relevant for the current platform. Since Python 3.2 we can issue sa@wks:~$ python -m sysconfig Platform: "linux-x86_64" Python version: "3.2" Current installation scheme: "posix_prefix" Paths: data = "/usr" include = "/usr/include/python3.2mu" platinclude = "/usr/include/python3.2mu" platlib = "/usr/lib/python3.2/site-packages" platstdlib = "/usr/lib/python3.2" purelib = "/usr/lib/python3.2/site-packages" scripts = "/usr/bin" stdlib = "/usr/lib/python3.2" Variables: ABIFLAGS = "mu" AC_APPLE_UNIVERSAL_BUILD = "0" [skipping a lot of lines...] py_version = "3.2" py_version_nodot = "32" py_version_short = "3.2" srcdir = "/home" userbase = "/home/sa/.local" sa@wks:~$ InterpretedPython is an interpreted language, as opposed to a compiled one, though the distinction can be blurry because of the presence bytecode. This means that Python source code files ( In the end we always need to decide on a per case basis — there is no one fits all possible use cases programming language out there... InterpretersThere are several implementations... WRITEME CPythonPyPy
Unladen SwallowStackless Python
Bytecode
Python source code (.py files) is compiled to bytecode, an intermediate representation used by the interpreter. Bytecode is also cached in .pyc files so that a module does not have to be recompiled on every run. This intermediate language is said to run on a virtual machine that executes the machine code corresponding to each bytecode e.g. CPython, Jython, etc. That said, bytecodes are not expected to work on different Python virtual machines, nor can we expect them to work across Python releases on the same virtual machine.

Garbage Collection
Not the thing your neighbours are talking about when referring to your car but rather the automatic memory management of Python, which is based on its dynamic type system and a combination of reference counting and garbage collection.
In a nutshell: once the last reference to an object is removed, the object is deallocated... that is, left floating around in memory until deleted/overwritten. The memory it occupied is said to be freed and possibly immediately reused by another (new) object. Python's memory management is smart enough to detect and break cyclic references between objects that might otherwise occupy memory indefinitely, which in its worst case might cause memory shortage. Reference CountThe number of references to an object. When the reference count of an object drops to zero, it is deallocated. Reference counting is generally not visible to Python code, but it is a key element of the CPython implementation. The Pieces of the PuzzleIn a way it is like doing a puzzle... small bits and pieces are used to assemble bigger ones, which are used to assemble even bigger ones which in turn make for a nice whole... Let us have a look at various kinds of blocks and how they fit together: Working SetA collection of distributions available for importing. These are the
distributions that are on the Working sets include all distributions available for importing, not
just the sub-set of distributions which have actually been imported
using the Standard LibraryPython's standard library is very extensive, offering a wide range of facilities. The library contains built-in modules (written in C) that provide access to system functionality such as file I/O that would otherwise be inaccessible to Python programmers, as well as modules written in Python that provide standardized solutions for many problems that occur in everyday programming. Some of these modules are explicitly designed to encourage and enhance the portability of Python programs by abstracting away platform-specifics into platform-neutral APIs. ProThe argument for having a standard library, aside from the afore mentioned is that it also helps with what is known as the selection problem. This is the problem of picking a third party module (sometimes even finding it, although PyPI helps with that) and figuring out if it is any good. Simply figuring out the quality of a module is a lot of work, and the amount of work multiplies drastically if there are several third party modules that seem to cater to the same problem domain at hand. Often, the only way we can really tell if a package/module is going to work well is to actually try using it. Generally this has to be done in a real program under real use case circumstances which means that if we picked poorly we may have wasted time and effort. Even if we can rule out a piece of code relatively early, we had to spend time to read documentation and skim over code. And frankly speaking, it is frustrating to run into near misses i.e. packages/modules that almost do what we need and almost work but in the end have some edge cases unsolved. Faced with this, it often at least feels easier to write something from scratch ourselves if what we want is not too much work anyway. When a module has made it into the standard library, we do not have to go through all of this (mostly true) as we can just use the package/module/class/function/etc., secure in the confidence that this is a good implementation of whatever problem we need to solve. Someone else has already gone through all of the quality assurance process, and if there were multiple implementations, somebody has probably either picked the best one or at least determined that they are more or less equivalent and so we are not missing anything very important by not looking at the other options. ContraHowever, there are more and more voices saying that the standard library has become to big and should be cut down or set aside from Python core (the interpreter) release cycles altogether (releasing more often than core). The argument is that once code is included into the standard library, it stifles innovation on that particular area (because it is tied to release cycles of Python core and must maintain full backwards compatibility) and discourages other developers from innovating in that same area. Module, PackageWe can think of modules as extensions/add-on/plugins that can be imported into Python to extend its capabilities beyond the core i.e. the interpreter itself. A module is usually just a file on the filesystem, containing source
code (statements, functions, classes, etc.) for a particular use case
e.g. sa@sub:/tmp$ mkdir graphics; touch graphics/{draw,colorize}.py; ta graphics graphics |-- colorize.py `-- draw.py 0 directories, 2 files sa@sub:/tmp$ type ta ta is aliased to `tree --charset ascii -a -I \.git*\|*\.\~*\|*\.pyc' sa@sub:/tmp$ There are two ways how modules and/or packages are distributed:
When importing, it has become good practice to import in the following order, one import per line: first built-in/standard library modules, then third-party modules and finally our own modules.
One good example for a Python package can be found in Django where every project and the applications it contains is/are in fact Python packages. Get a List of available ModulesUse >>> sys.modules.keys()[:4] ['pygments.styles', 'code', 'opcode', 'distutils'] >>> Modules create NamespacesModules play an important role in Python since they create namespaces when being imported. Organize ModulesIt is recommended to organize modules in a particular way. __main__When we run a Python script then the interpreter treats it like any
other module i.e. it gets its own global namespace. There is one
difference however: the interpreter assigns the string '__main__' to the module-global name __name__.

>>> if __name__ == '__main__':
...     print("We are either using the interpreter interactively or we just executed a script.")
...
...
We are either using the interpreter interactively or we just executed a script.
>>> __name__
'__main__'
>>>

What it does is change semantics based on whether we run a file directly (as a script or interactively) or import it as a module.
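To see both cases side by side, here is a small sketch (the file name greet.py is made up): when the file is run directly the guarded block executes, when it is imported it does not.

# greet.py
def greet(whom):
    print("Hello {}!".format(whom))

if __name__ == '__main__':      # True only when run as a script, not on import
    greet("world")

Running python greet.py prints Hello world!, whereas import greet from another module or from the interactive shell merely defines the function and prints nothing.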
Extension ModuleThis is software written in the same low-level language the particular Python implementation is written in e.g. C/C++ for CPython or Java for Jython. The extension module is typically contained in a single dynamically
loadable and pre-compiled file, e.g. a shared library (a .so file on Linux, a DLL on Windows).

Built-in Modules
>>> import sys >>> from pprint import pprint as pp >>> pp(sys.builtin_module_names) ('__main__', '_ast', '_bisect', '_codecs', '_collections', '_ctypes', '_elementtree', '_functools', '_hashlib', '_heapq', '_io', '_locale', '_pickle', '_random', '_socket', '_sre', '_ssl', '_struct', '_symtable', '_thread', '_warnings', '_weakref', 'array', 'atexit', 'binascii', 'builtins', # contains built-in functions, exceptions, and other objects 'cmath', [skipping a lot of lines...] 'zipimport', 'zlib') >>> Note that the builtins module is one of the built-in modules with the Python interpreter. __builtin__, builtinsWe have built-in functions like >>> import __builtin__ Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: No module named __builtin__ >>> import sys; sys.version[:3] '3.3' >>> import builtins, pprint >>> builtins.__doc__.splitlines()[0] 'Built-in functions, exceptions, and other objects.' >>> pprint.pprint(list(builtins.__dict__.items())[::15]) [('bytearray', <class 'bytearray'>), ('oct', <built-in function oct>), ('bytes', <class 'bytes'>), ('ImportWarning', <class 'ImportWarning'>), ('filter', <class 'filter'>), ('open', <built-in function open>), ('hasattr', <built-in function hasattr>), ('id', <built-in function id>), ('ZeroDivisionError', <class 'ZeroDivisionError'>)] >>> As can be seen, the __builtins__As an implementation detail, most modules have the name __future__Basically what the name promises, it is a module that brings future
features which are not enabled with the current version of Python core
(the interpreter) by default. Simply importing the feature in question from __future__ (e.g. from __future__ import division) enables it for the current module.

Finder, Loader, Importer
To use functionality which is not built-in with Python core i.e. the
interpreter itself, we need to get it from somewhere else e.g. the
Python Standard Library or some module/package. This is called
importing — basically everything that involves the import statement. This process of importing is, as many other things, specified by a so-called protocol. The Importer protocol involves two objects: a finder and a loader.
In many cases the finder and loader are one and the same object i.e. such an object is then simply called an importer.
How to Import
The way we import modules affects the way we use namespaces quite a bit. Go here for more information on the matter.

Distutils, Setuptools, Distribute
Although those tools have nothing to do with writing source code itself, they are needed to work with the whole Python ecosystem. Go here and here for more information.

PyPI
The PyPI (Python Package Index) is the default packaging index for the Python community (the same as CPAN is for Perl). Package managers such as EasyInstall, zc.buildout, PIP and PyPM use PyPI as the default source for packages and their dependencies. PyPI is open to all Python developers to consume and distribute their distributions.

Distribution
Not to be confused with Linux distributions e.g. Debian, Suse, Ubuntu,
etc. A Python distribution is a versioned and compressed archive file
(e.g. a .tar.gz or .zip archive). A distribution is often mistakenly called a package — this is the
term commonly used in other fields of computing. For example Debian
calls these files package files (.deb files).

Project
A library, framework, script, plugin, application, or collection of data or other resources, or any combination thereof. Python projects must have unique CamelCase names, which are registered on PyPI. Each project will then contain one or more releases, and each release may comprise one or more distributions. There is a strong convention to name a project after the name of the
package which is imported to run that project, e.g. a project FlyingDingo would ship a package named flyingdingo (see the example below).

Example
A Python project consists at least of two files living side by side in
the same directory — a setup.py file which describes the metadata of
the project, and a Python module containing Python source code to
implement the functionality of the project. However, usually the
minimal layout of a project contains a little more than just those two files. It is wise to create a full Python package i.e. a directory with an __init__.py file, named after the project but all lowercase (flyingdingo in the example below). Next to the Python package a project should also have a README.txt, a LICENSE.txt and an AUTHORS.txt file.
with the CamelCase project directory sa@wks:~$ type ta; ta FlyingDingo/ ta is aliased to `tree --charset ascii -a -I \.git*\|*\.\~*\|*\.pyc' FlyingDingo/ |-- flyingdingo | `-- __init__.py |-- AUTHORS.txt |-- LICENSE.txt |-- README.txt `-- setup.py 1 directory, 4 files sa@wks:~$ Note that ReleaseA snapshot of a project at a particular point in time, denoted by a version identifier. Making a release may entail the publishing of multiple distributions as we might release for several platforms. For example, if version 1.0 of a project was released, it could be available in both a source distribution format and a Windows installer distribution format. FilesThere are lots of files used for various things: setup.pyUpdate: note that
Some packages are pure Python and are only byte-compiled; other packages may also contain native C code which will require a native compiler like gcc or cl and some Python interfacing module like SWIG or Pyrex.

setup.cfg
It is a configuration file local to some package which is used to
record configuration data for a particular package. At first it looks at the system-wide configuration file (e.g. distutils.cfg inside the distutils directory of the Python installation), then at a personal configuration file (~/.pydistutils.cfg) and finally at the package-local setup.cfg.
Any of those levels overrides the former one e.g. personal overrides system-wide, package-local overrides personal and of course, package-local also overrides system-wide. Note that with the introduction of packaging into the standard library
in Python 3.3, this is subject to change.

__init__.py
Files named __init__.py mark the directory containing them as a Python package. Every time we use import on such a package, the code inside its __init__.py gets executed. Next to signifying that a directory is a Python package, __init__.py can therefore be used to run package initialization code. An example in which we may want initialization is when we want to read
in a bunch of data once at package-load time (e.g. from files, a
database, the web...), in which case it is much nicer to put that
reading in a private function in the package's By using Last but not least, if we want to specify the public API for a package
then we put models.pyAll we just said about site.py
KeywordsAs every other programming language out there, Python has keywords too: >>> import keyword >>> from pprint import pprint as pp >>> pp(keyword.kwlist) ['False', 'None', 'True', 'and', 'as', 'assert', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal', 'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield'] >>> Coding StylePython coding style and guidelines have PEP 8 and the Zen of Python at their core. In addition, there are docstrings which are also part of adhering to good coding style in Python. However, there is more than just PEP 8 and PEP 257. Why have a Coding Style?As project size increases, the importance of consistency increases too. Most projects start with isolated tasks, but will quickly integrate the pieces into shared libraries as they mature. Testing and a consistent coding style are critical to having trusted code to integrate and can be considered main pillars of quality assurance. Also, guesses about naming and interfaces will be correct more often than not which can greatly enhance developer experience and productivity. Good code is useful to have around for several reasons: Code written to these standards should be useful for teaching purposes, and also to show potential employers during interviews. Most people are reluctant to show code samples — but then having good code that we have written and tested will put us well ahead of the crowd. Also, reusable components make it much easier to change requirements, refactor code and perform analyses and benchmarks. With good coding standards in the end everybody wins: Developers because there will be less bugs and guessing which means there will be more time to innovate and do bleeding-edge stuff which is a lot more fun compared to hunting down and fixing bugs all the time. Marketing will be happy because TtM (Time to Market) will be reduced, and new features delivered faster. Management will be happy because RoI (Return on Investment) will go up and at the same time administrative costs will go down. Last but not least, users will appreciate the fact that there will be less bugs and more new features more often. EAFPEAFP (Easier to ask for Forgiveness than Permission) is a
programming principle for how to approach problems when programming. This clean and fast style is characterized by the presence of many try and except statements. It is nothing Python specific but can actually be found with many programming languages. With Python however, because of its nature, adhering to this principle works quite well. In Python EAFP is generally preferred over LBYL (Look before you Leap), which is the contrary principle and, for example, the predominant coding style with C.
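As a small illustration (a sketch; the config dictionary and the port key are made up), the EAFP version simply tries the operation and handles the failure, whereas the LBYL version checks first:

>>> config = {'host': 'localhost'}
>>> try:                                   # EAFP: just try it...
...     port = config['port']
... except KeyError:                       # ...and ask for forgiveness
...     port = 8080
...
>>> port
8080
>>> if 'port' in config:                   # LBYL: look before you leap
...     port = config['port']
... else:
...     port = 8080
...
>>> port
8080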
Pound Bang Line, Shebang Line
Executable scripts on Unix-like systems may have something like #!/usr/bin/env python as their first line, the so-called pound bang (or shebang) line, which tells the system which interpreter to use to run the file.

File Permissions
File permissions are set depending on umask which is why we usually
end up with file permissions of sa@wks:~$ umask 0022 sa@wks:~$ touch foo.py sa@wks:~$ ls -l foo.py -rw-r--r-- 1 sa sa 0 Apr 20 18:56 foo.py sa@wks:~$ Now, in order to execute sa@wks:~$ echo 'print("Hello World")' > foo.py; cat foo.py print("Hello World") sa@wks:~$ python foo.py # using the interpreter directly Hello World sa@wks:~$ but it needs to be executable plus have its pound bang line in case we would want it to run like this sa@wks:~$ echo -e '#!/usr/bin/env python\nprint("Hello World")' > foo.py; cat foo.py #!/usr/bin/env python print("Hello World") sa@wks:~$ ls -l foo.py; ./foo.py -rw-r--r-- 1 sa sa 43 Apr 20 18:59 foo.py bash: ./foo.py: Permission denied sa@wks:~$ chmod 755 foo.py sa@wks:~$ ls -l foo.py -rwxr-xr-x 1 sa sa 43 Apr 20 18:59 foo.py sa@wks:~$ ./foo.py # no ./ needed if current dir is in PATH Hello World sa@wks:~$ Underscore, GettextA single underscore ( for loopSometimes people use loop variables such as >>> for each in range(2): ... print(each) ... ... 0 1 >>> or this >>> for i in range(2): ... print(i) ... ... 0 1 >>> but not this >>> for _ in range(2): ... print(_) ... ... 0 1 >>> Using Single Quotes vs Double QuotesThere are 4 ways we can quote strings in Python:
Semantically there is no difference in Python i.e. we can use either.
The triple string delimiters >>> print('This is a string using a single quote!') This is a string using a single quote! >>> print("This is a string using a double quote!") This is a string using a double quote! >>> print("""Using tiple quotes ... we can do ... multiline strings.""") Using tiple quotes we can do multiline strings. >>> This example, shows that single quotes ( >>> print("She said, "Don't do it") File "<stdin>", line 1 print("She said, "Don't do it") ^ SyntaxError: invalid syntax >>> What happened? We thought double and single quotes are interchangeable. Well, truth is, they are for the most part but not always. When we try to mix them, it can often end up in a syntax error, meaning that our code has been entered incorrectly, and Python does not know what we are trying to accomplish. What really happens is that Python sees our first double quote and
interprets that as the beginning of our string. When it encounters the
double quote before the word Don't, it thinks the string has already ended there, and the rest of the line no longer parses, hence the syntax error. One way around this is to escape the inner double quotes with a backslash:

>>> print("She said, \"Don't do it\"")
She said, "Don't do it"
>>>

Finally, let us take a moment to discuss the triple quote. We briefly saw its usage earlier. In that example, we saw that the triple quote allows us to write some text on multiple lines, without being processed until we close it with another triple quote. This technique is useful if we have a large amount of data that we do not wish to print on one line, or if we want to create line breaks within our code as shown below:

>>> print("""I said
... foo, he said
... bar and baz is
... what happened.""")
I said
foo, he said
bar and baz is
what happened.
>>>

There is another way to print text on multiple lines using the newline
(\n) escape sequence:

>>> print("I said\nfoo, he said\nbar and baz is\nwhat happened.")
I said
foo, he said
bar and baz is
what happened.
>>>

Note that we did not have to use triple quotes in this case! Last but
not least, look what a simple r prefix (a raw string literal) does:

>>> print(r'I said\nfoo, he said\nbar and baz is\nwhat happened.')
I said\nfoo, he said\nbar and baz is\nwhat happened.
>>>

In this case the \n sequences are not interpreted as newlines but printed literally.

Recommendation
1 >>> anumber = 2 2 >>> "there are {} cats on the roof".format(anumber) 3 'there are 2 cats on the roof' 4 >>> CONSTANTS = {'keyfoo': "some string", 'keybar': "another string"} 5 >>> print(CONSTANTS['keyfoo']) 6 some string 7 >>> CONSTANTS[keyfoo] 8 Traceback (most recent call last): 9 File "<stdin>", line 1, in <module> 10 NameError: name 'keyfoo' is not defined 11 >>> CONSTANTS = {'keyfoo's number': "some string", 'keybar': "another string"} 12 File "<stdin>", line 1 13 CONSTANTS = {'keyfoo's number': "some string", 'keybar': "another string"} 14 ^ 15 SyntaxError: invalid syntax 16 >>> Docstrings and raw string literals ( def some_function(foo, bar): """Return a foo-appropriate string reporting the bar count.""" return somecontainer['bar'] re.search(r'(?i)(arr|avast|yohoho)!', message) is not None DocstringA string literal which appears as the first expression in a class,
method, function or module. While ignored when the suite is executed,
it is recognized by the compiler and put into the Since it is available via introspection, it is the canonical place for documentation of the object... in Python everything is an object, remember? After this reminder, what is left to say is that PEP 257 has all there is to know with regards to docstrings and their conventions. There is also a more concise version available from the official Python documentation. ExamplesSeveral examples of docstrings can be found here. Naming VariablesWe should choose a variable name that people will most likely guess,
something semantically related to the task the variable is involved
in. A variable name should be descriptive, but not too long e.g.
Sometimes good variable names are hard to find. We should not be
afraid to change variable names except when they are part of a public
API which means other people are likely to use them, and rely on the
fact that the API does not change. It may take some time in working
with the source code in order to come up with reasonable variable
names for everything. However, if we have unit tests, it is easy to
change them, especially with global search and replace in editors like
GNU Emacs or a simple Use singular variable names for individual things, plural variable
names for collections. For example, we would expect
Python is a polymorphic programming language. We should therefore not
make the data type part of the variable name because we might
want/need to change the implementation later e.g. we should use
We should make the variable name as precise as possible. For example,
if the variable name is the name of the input file, it should be
called It is recommended to use One-letter variable names should only occur in mathematical functions
or as loop iterators with limited scope. Limited scope covers things
like In general, we should limit our use of abbreviations. A few well-known
abbreviations are fine, but we do not want to come back to our code in
6 months and have to figure out what some cryptic abbreviation was supposed to mean. The following abbreviations are considered well-known:

Full Name            Abbreviation
alignment            aln
auxiliary            aux
citation             cite
current              curr
database             db
dictionary           dict
directory            dir
end of file          eof
frequency            freq
expected             exp
index                idx
input                in
maximum              max
minimum              min
number               num
observed             obs
original             orig
output               out
previous             prev
record               rec
reference            ref
sequence             seq
standard deviation   stdev
statistics           stats
string               str
structure            struct
temporary            temp
taxonomic            tax
variance             var

Naming Conventions
It is important to follow naming conventions because they make it much easier to guess what a name refers to. In particular, it should be easy to guess what scope a name is defined in, what it refers to, whether it is fine to change its value, and whether its referent is callable or not. The following rules provide these distinctions:

Names to Avoid
Names that should be avoided in general are the characters l (lowercase letter el), O (uppercase letter oh) and I (uppercase letter eye) as single character variable names, since in some fonts they are indistinguishable from the numerals one and zero.
or iterators e.g. as used in for loops. Available Naming Styles
There is also the style of using a short unique prefix to group
related names together, although not used much in Python. For example, the os.stat() function returns a tuple whose items traditionally have names like st_mode, st_size and st_mtime. The X11 library uses a leading X for all its public functions.

Special Forms
The following special forms using leading or trailing underscores are recognized. These can be combined with any case convention mentioned:
Usage
Examples

Type                  Convention                                                     Example
package               lowercase                                                      foo
                      lowercase-with-hyphens                                         foo-bar
module                lowercase                                                      baz
                      lowercase_with_underscores                                     baz_foo
non-public module     _lowercasewithleadingunderscore                                _baz
                      _lowercase_with_underscores_and_leading_underscore             _baz_foo
constant              UPPERCASE                                                      TOTALS
                      UPPER_CASE_WITH_UNDERSCORES                                    ALLOWED_OFFSET
non-public constant   _UPPERCASEWITHLEADINGUNDERSCORE                                _TOTALS
                      _UPPER_CASE_WITH_UNDERSCORES_AND_LEADING_UNDERSCORE            _ALLOWED_OFFSET
variable              lowercasenoun                                                  car
                      lowercase_noun_with_underscores                                gas_station
global variable       gCapitalizedWordNounWithLeadingG                               gCar
                      gCapitalizedWordsNounWithLeadingG                              gGasStation
private variable      __lowercase_with_two_leading_underscores                       __delegator_obj_ref
function              lowercaseaction()                                              disperse()
                      lowercase_action_with_underscores()                            find_all()
non-public function   _lowercaseactionwithleadingunderscore()                        _disperse()
                      _lowercase_action_with_underscores_and_leading_underscore()    _find_all()
method                lowercaseaction()                                              randomize()
                      lowercase_action_with_underscores()                            cache_and_delete()
non-public method     _lowercaseactionwithleadingunderscore()                        _randomize()
                      _lowercase_action_with_underscores_and_leading_underscore()    _cache_and_delete()
private method        __lowercase_with_two_leading_underscores()                     __delegator_obj_ref()
class                 CapitalizedWordsNoun                                           SampleSequence
non-public class      _CapitalizedWordsNounWithLeadingUnderscore                     _TestSequence
exception             CapitalizedWordsNounError                                      DiskCountingError

Organize Modules
The first line of each file/module should be the pound bang line (#!/usr/bin/env python). Next should be the docstring with a description. If the description is long, the first line should be a short summary that makes sense on its own, separated from the rest by a newline. All code, including import statements, should follow the docstring.
Otherwise, the docstring will not be recognized by the interpreter,
and we will not have access to it in an interactive session (i.e.
through help() or the __doc__ attribute). We should import built-in modules first, followed by third-party modules, followed by any changes to installation paths and our own modules. Especially, additions/removals to the installation path and names of our own modules are likely to change rapidly — keeping them in one place makes them easier to find. Assuming we are not distributing our source code as Python package and
therefore do not provide a setup.py, we should put what usually goes
into __author__ = "John Doe" __author_email__ = "[email protected]" __copyright__ = "Copyright (C) 2012 Free Software Foundation, Inc." __development_status__ = "Production/Stable" __license__ = "Simplified BSD License" __url__ = "http://example.com/somemodule.py" __version__ = "1.4.1" __development_status__ should typically be one of Examplesa@wks:~$ cat somemodule.py #!/usr/bin/env python """Provides Foo class for baz. Lorem ipsum dolor sit. Hendrerit volutpat praesent ad mattis posuere nonummy congue. Gravida cum eu nullam. Accumsan lacus malesuada inceptos ligula mollis mus eros cum donec dis arcu posuere ante, nisl. Viverra consequat quam quisque hymenaeos mi vulputate neque, curae quam. """ __author__ = "John Doe" __author_email__ = "[email protected]" __copyright__ = "Copyright (C) 2012 Free Software Foundation, Inc." __development_status__ = "Production/Stable" __license__ = "Simplified BSD License" __url__ = "http://example.com" __version__ = "2.4.1" import sys import os from random import choice, random import zmq import ownmodule class Foo: """Computes Gauss variations. Lorem ipsum dolor sit. Sodales urna ut. Eros sociis, aptent metus curae odio nibh semper, platea fusce. Metus netus. Tristique a. Nostra etiam feugiat, vitae justo. Aliquam proin urna dapibus ut, quis porta, nonummy non, ut. Etiam donec per ultricies, magnis et sed imperdiet morbi. """ def __init__(self, barfoo): """Initialize instances of Foo.""" pass def show_path(self, bazfoo): """Prints baz on Foo. Lorem ipsum dolor sit amet, consecteteur adipiscing elit. Interdum metus. Cras adipiscing sit fusce non vel est sollicitudin ve, justo. """ pass def compute_path(self, foobar): """Computes path to blabla.""" pass def main(): """Executed when run as script.""" pass if __name__ == '__main__': sys.exit(main()) #_ emacs local variables # Local Variables: # mode: python # allout-layout: (0 : 0) # End: sa@wks:~$ pep8 somemodule.py # all good for pep8 (no output) sa@wks:~$ pylint --disable=F0401,W0611 somemodule.py # all good for pylint as well sa@wks:~$ echo; pylint --help-msg=F0401,W0611 # closer look of what we ignored :F0401: *Unable to import %r* Used when pylint has been unable to import a module. This message belongs to the imports checker. :W0611: *Unused import %s* Used when an imported module or variable is not used. This message belongs to the variables checker. sa@wks:~$ pychecker -p somemodule.py Processing module somemodule (somemodule.py)... Warnings... None sa@wks:~$ Using sys.exit() or the atexit module we can make sure our script acts
appropriately at all times.

Inline Comments
Inline comments start with a # and should be separated from the statement they refer to by at least two spaces. Inline comments are different to docstrings — they should be used for on-the-same-line comments regarding some particular piece of source code. As with docstrings, they should always be updated when source code changes. Incorrect inline comments are far worse than no comments at all (since they are actively misleading). Also,
Brevity is the soul of wit. meaning that inline comments should be as short as possible, explaining what needs to be explained with as little words as possible in the most precise way possible. Let us look at an example: 1 win_size -= 20 # decrement win_size by 20 2 win_size -= 20 # leave space for the scrollbar 3 4 self._scrollbar_size = 20 5 win_size -= self._scrollbar_size Inline comments should say more than the code itself (line 2 for example immediately tells us that this is a GUI application that probably does some dynamic window resizing) rather than just stating the obvious (line 1). We should examine our comments carefully as they may indicate that we might be better off refactoring our source code e.g. by renaming variables and getting rid of inline comments — if in doubt, we should not use inline comments at all. As an example, comment in line 1 should be removed because it is stating the obvious. Furthermore, we should not scatter magic numbers and other constants that have to be explained through our code. It is far better to use variable names whose names are self-explanatory, especially if we use the same constant more than once. Finally, we should consider turning constants into class or instance data (lines 4 and 5) because it is all too common that constants need to change over time or they are simply used in several places. PythonicSo what does it mean if somebody says foo looks pythonic? What does it mean if we write something in Python and then somebody comes along and calls our creation unpythonic? Let us take little detour first... A common neologism in the Python community is pythonic, which can have a wide range of meanings but is almost always related to coding style. Therefore to say that a piece of source code is pythonic is to say that it uses Python idioms well, that it is natural and shows fluency in the language by whoever wrote it. Likewise, to say of an interface or language feature that it is pythonic is to say that it works well with Python idioms, that its use meshes well with the rest of the language and the entire Python ecosystem. In contrast, a mark of unpythonic source code is that it attempts to write some other programming language (e.g. C++, Lisp, Perl, or Java) source code in Python i.e. that is, provides a rough transcription rather than an idiomatic translation of forms from another programming language. The concept of Pythonicity is tightly bound to Python's minimalist philosophy of readability and avoiding the "there's more than one way to do it" approach. Unreadable code or incomprehensible idioms are unpythonic. When going from one programming language to another, some things have to be unlearned first. What we know from other programming languages may not be useful in Python at all — maybe they are, maybe not, maybe just portions of it... __init__.py for Initialization__init__.py is our friend when we need to carry out actions once when a package is imported and before any of the code contained inside the package is executed. 
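Below is a minimal sketch of that idea; the package name mypackage and the file defaults.json are made up. The data is read exactly once, at import time, and every module inside the package can then simply use it:

# mypackage/__init__.py
import json
import os

_DATA_FILE = os.path.join(os.path.dirname(__file__), "defaults.json")

def _load_defaults():
    """Read the package-wide defaults once, at package-load time."""
    with open(_DATA_FILE) as handle:
        return json.load(handle)

DEFAULTS = _load_defaults()

# any module can now do: from mypackage import DEFAULTS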
Use the Standard LibraryThe standard library is our friend, let us use it: >>> foo = "/home/sa" >>> baz = "somefile" >>> foo + "/" + baz # unpythonic '/home/sa/somefile' >>> import os.path >>> os.path.join(foo, baz) # pythonic '/home/sa/somefile' >>> Other useful functions in >>> somefoo = list(range(9)) >>> somefoo [0, 1, 2, 3, 4, 5, 6, 7, 8] >>> import random >>> random.shuffle(somefoo) # pythonic >>> somefoo [8, 4, 5, 0, 7, 2, 6, 3, 1] >>> max(somefoo) # pythonic 8 >>> min(somefoo) # pythonic 0 >>> A more advanced example using >>> class Product: ... def __init__(self, price): ... self.price = price ... ... ... >>> products = [Product(price) for price in (9.99, 4.99, 10)] >>> products [<__main__.Product object at 0x1969350>, <__main__.Product object at 0x1969a50>, <__main__.Product object at 0x1969a90>] >>> products[0].price 9.99 >>> products[1].price 4.99 >>> products[2].price 10 >>> from operator import attrgetter >>> for item in sorted(products, key=attrgetter('price')): ... print("Price: {:>5.2F}".format(item.price)) ... ... Price: 4.99 Price: 9.99 Price: 10.00 >>> There are also many useful built-in functions people sometimes seem
not to be aware of for some reason e.g. Create [], {}, ()>>> bar = list() # unpythonic >>> type(bar) <class 'list'> >>> del bar >>> bar = [] # pythonic >>> type(bar) <class 'list'> >>> foo = {} >>> type(foo) <class 'dict'> >>> baz = set() # {} is a dictionary so we need to use set() >>> type(baz) <class 'set'> >>> Copy [], {}, ()At this point it is assumed people know the difference between what is a so-called shallow copy and a deep copy. The difference between shallow and deep copying in Python is only relevant for compound objects i.e. objects that contain other objects, like lists or class instances. We already know some key-facts about objects like for example that every object has a unique ID. With that in mind we can now take look at how to best copy built-in types such as lists. Python's standard library has a module called copy we can use whenever we need to make a deep copy of a compound object such as for example a nested list: >>> import copy >>> nested_list = [[1, 2], None] # nested list >>> id(nested_list[0]) 140575082935216 >>> id(nested_list[:][0]) # shallow copy using slice notation 140575082935216 >>> id(copy.copy(nested_list)[0]) # shallow copy using copy.copy() 140575082935216 >>> nested_list[0] is copy.copy(nested_list)[0] # shallow copying: same object True >>> id(copy.deepcopy(nested_list)[0]) # deep copy using copy.deepcopy() 140575082942976 >>> nested_list[0] is copy.deepcopy(nested_list)[0] # deep copying: different object False >>> Obviously this subtle difference between compound and non-compound
objects is a non-issue when dealing with non-compound objects such as
flat lists because normal (read shallow) copy operations already
create a different object (i.e. no need for >>> flat_list = [1, 2] >>> id(flat_list) 140575082949808 >>> id(flat_list[:]) 140575082949088 >>> flat_list is flat_list[:] # slice notation False >>> flat_list is copy.copy(flat_list) # using copy.copy() False >>> Multi-Line StatementsSince Python treats a newline as a statement terminator, and since statements are often more than is comfortable to put in one line, many people do: if foo.bar()['first'][0] == baz.ham(1, 2)[5:9] and \ # unpythonic verify(34, 20) != skip(500, 360): pass Using a value = foo.bar()['first'][0] * baz.ham(1, 2)[5:9] \ # unpythonic + verify(34, 20) * skip(500, 360) then it would just be subtly wrong. It is usually much better to use the implicit continuation inside parenthesis. This version is bulletproof: value = (foo.bar()['first'][0] * baz.ham(1, 2)[5:9] + # pythonic verify(34, 20) * skip(500, 360)) Also, note that the preferred place to break around a binary operator
(e.g. Multi-Line Strings/ExpressionsSometimes we still see things like DESCRIPTION = "Lorem ipsum dolor sit amet, maecenas consectetur adipiscing " +\ "elit. Ac sapien at dui pellentesque ornare vitae vel dui. Donec " +\ "ac justo eget ligula vehicula adipiscing nec vel orci." when DESCRIPTION = ("Lorem ipsum dolor sit amet, maecenas consectetur adipiscing " "elit. Ac sapien at dui pellentesque ornare vitae vel dui. " "Donec ac justo eget ligula vehicula adipiscing nec vel orci.") would be more pythonic. ImportDo not use from foo import *. Go here and here for more information. Do not use a plain exceptPython has the try: foo = opne("somefile") # misspelled "open" except: sys.exit("could not open file!") The second line triggers a try: foo = opne("somefile") except IOError: sys.exit("could not open file") When this is run, Python will produce a traceback showing the
Because We need Counters rarely>>> counter = 0 # unpythonic >>> while counter < 10: ... # do some stuff ... counter += 1 ... ... >>> counter 10 >>> for counter in range(10): # pythonic ... # do some stuff ... pass ... ... >>> or, another example, the usual index thingy: >>> food = ['donkey', 'orange', 'fish'] >>> for i in range(len(food)): # unpythonic ... print(food[i]) ... ... donkey orange fish >>> for item in food: # pythonic ... print(item) ... ... donkey orange fish >>> and yet another example: >>> i = 0 >>> for item in range(10, 14): # unpythonic ... print(i, item) ... i += 1 ... ... 0 10 1 11 2 12 3 13 >>> for i, item in enumerate(range(10, 14)): # pythonic ... print(i, item) ... ... 0 10 1 11 2 12 3 13 >>> Explicit Iterators only occasionallyInternally Python speaks a lot of iteratorisch all the time... for loops are no exception, an iterator is created implicitly. The following example indexes a list. >>> counter = 0 # unpythonic >>> while counter < len(somecontainer): ... callable_consuming_container_items(somecontainer[counter]) ... counter += 1 ... ... >>> for item in somecontainer: # pythonic ... callable_consuming_container_items(item) ... ... >>> We can go as far as to say that, for simple things, we do not need to create iterators explicitly at all. There are certain cases however when explicit iterators are pretty handy, like for example when we start processing an iterable, stop, do something else, come back and continue processing the iterable (possible because the iterator remembers where it stopped). Let us do some run-up first: >>> somecontainer = list(range(7)) >>> type(somecontainer) <class 'list'> >>> somecontainer [0, 1, 2, 3, 4, 5, 6] >>> somecontaineriterator = iter(somecontainer) >>> type(somecontaineriterator) <class 'list_iterator'> Now, we are ready to start using the iterator: >>> for item in somecontaineriterator: # start consuming the iterable somecontainer ... if item < 4: ... print(item) ... ... else: ... break # breaks out of the nearest enclosing for/while loop ... ... ... 0 1 2 3 Do not be fooled, the iterator stopped at >>> print("Something unrelated to somecontaineriterator.") Something unrelated to somecontaineriterator. >>> next(somecontaineriterator) # continues where previous for/while loop left off 5 >>> next(somecontaineriterator) 6 >>> next(somecontaineriterator) Traceback (most recent call last): # we have exhausted the iterator File "<input>", line 1, in <module> StopIteration >>> Even if this example might look a bit confusing at first glance, it
really is not. All there is to it is an iterator
( Test for MembershipIf we want to know whether or not some container contains a certain
item (called member in case of sets) then we should turn to Using >>> sys.version '2.7.2+ (default, Oct 5 2011, 10:41:47) \n[GCC 4.6.1]' >>> somedict = {} >>> if somedict.has_key(foo): # slow, unpythonic ... pass ... ... >>>
>>> sys.version '3.3.0a0 (default:0b50008bb953, Nov 8 2011, 15:06:08) \n[GCC 4.6.2]' >>> somedict = {} >>> if foo in somedict: # fast, pythonic, mandatory in Python 3 ... pass ... ... >>> Although we used a dictionary in this example, thus the membership
test looked for a key equal to the name The proof with regards to speed... While the following observation is
not always true, it is fair to say that usually, in Python, the
faster solution is the more elegant/pythonic one — that is why sa@wks:~$ python -c 'import sys; print(sys.version)' 2.7.2+ (default, Oct 5 2011, 10:41:47) [GCC 4.6.1] sa@wks:~$ python -m timeit -s 'd=dict.fromkeys(range(99))' '12 in d' 10000000 loops, best of 3: 0.0636 usec per loop sa@wks:~$ python -m timeit -s 'd=dict.fromkeys(range(99))' 'd.has_key(12)' 10000000 loops, best of 3: 0.0964 usec per loop sa@wks:~$ AssignmentsThere is a pythonic way to do assignments too. Use built-in Data StructuresMany things we would use a for/while loop for in other languages do not require a loop in Python at all. Python provides many higher level facilities to operate on all kinds
of objects. For sequences for example there are The point is that if we keep our data in Python's common data structures such as tuples, dictionaries, lists, sets, etc., we get tons of built-in Python goodies for free. Even if we need some custom data structure, we are almost certainly able to build those from Python's common data structures, thus being able to use all the out of the box goodies as well. So, what are all these goodies? Let us do an example: How do we get some people's names from an on-disk file into a Python data structure, make sure duplicates are removed, leading and trailing whitespace is stripped, and, because we do not want to risk bringing down our server due to file handles staying open, make sure the file gets closed no matter what (power outage etc.)? Well, no, that is not a 1200-line program... more like two lines actually: sa@wks:/tmp$ cat people.txt Dora John Dora Mike Dora Alex Alex sa@wks:/tmp$ python >>> with open('people.txt', encoding='utf-8') as a_file: # context manager ... {line.strip() for line in a_file} # set comprehension ... ... {'Alex', 'Mike', 'John', 'Dora'} >>> And no, no Range Selection, Negative IndexesWhile this one requires 6 lines of code >>> def print_name(*args, **kwargs): ... if len(args) == 3: ... print("firstname: {} surname: {}".format(args[1], args[2])) ... ... elif len(args) == 2: ... print("firstname: {} surname: {}".format(args[0], args[1])) ... ... ... >>> print_name("Mr", "Steve", "Willis") firstname: Steve surname: Willis >>> print_name("Steve", "Willis") firstname: Steve surname: Willis the next one is more pythonic because it only requires 3 lines of code although doing exactly the same thing: >>> def print_name(*args, **kwargs): ... if 1 < len(args) < 4: ... print("firstname: {} surname: {}".format(args[-2], args[-1])) ... ... ... >>> print_name("Mr", "Steve", "Willis") firstname: Steve surname: Willis >>> print_name("Steve", "Willis") firstname: Steve surname: Willis >>> Whenever we can write the same functionality with less code, then the shorter version is considered more pythonic (assuming that readability/maintainability stays as good or even gets better with the shorter version). Tuples are not just read-only ListsTuples are not just read-only lists... this is a common misconception! And no, we do not want to get rid of either one because they are redundant — lists and tuples are not redundant, they are not the same! We are talking apples and bananas here, not apples and apples... misconception, as I said, sometimes even amongst experienced Pythoneers. Lists are intended to be used as homogeneous sequences, while tuples are heterogeneous data structures. In other words
The whole is more than the sum of its parts. And that is exactly what the experienced Pythoneer thinks when he thinks/talks about tuples. Depending on what stuff and how it is assembled and put into a tuple, meaning can differ dramatically: >>> person = ("Steve", 23, "male", "London") >>> print("{} is {}, {} and lives in {}.".format(person[0], person[1], person[2], person[3])) Steve is 23, male and lives in London. >>> person = ("male", "Steve", 23, "London") #different tuple, same code >>> print("{} is {}, {} and lives in {}.".format(person[0], person[1], person[2], person[3])) male is Steve, 23 and lives in London. >>> The index in a tuple has an implied semantics. The point of a tuple is that the i-th slot means something specific. In other words, a tuple is an index-based rather than name based data structure.
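If we want name-based access on top of those positional semantics, the standard library offers collections.namedtuple — a small sketch (the field names are chosen purely for illustration):

>>> from collections import namedtuple
>>> Person = namedtuple('Person', ['name', 'age', 'gender', 'city'])
>>> person = Person("Steve", 23, "male", "London")
>>> print("{} is {}, {} and lives in {}.".format(*person))
Steve is 23, male and lives in London.
>>> person.city               # name-based access ...
'London'
>>> person[0]                 # ... while index-based access keeps working
'Steve'
>>>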
Let us start over, reset ourselves... now in a more generic way: Python has two seemingly similar sequence types, tuples and lists. The difference between the two that people notice right away, besides literal syntax (parentheses vs square brackets), is that tuples are immutable and lists are mutable. Because this distinction is strictly enforced by Python, some other more interesting differences in application tend to get overshadowed. One common summary of these more interesting differences is that tuples are heterogeneous and lists are homogeneous. In other words:
What are these kinds of stuff things we are talking about? Data types? Sometimes, yes. But data types may not tell the whole story. Let us consider the following two data structures: >>> foo = 2011, 11, 3, 15, 23, 59 >>> foo (2011, 11, 3, 15, 23, 59) # tuple >>> list(range(9)) [0, 1, 2, 3, 4, 5, 6, 7, 8] # list >>>
It is easy to imagine adding and/or removing items from the list without breaking code that uses it or creating some undefined state. If we were to do the same for the tuple... bang A great example of the complementary use of both types is the Python DB API's fetchmany() method, which returns the result of a query as a list of tuples i.e. the result set as a whole is a list, because rows are functionally equivalent (homogeneous). The individual rows are tuples, because rows are coherent, record-like groupings of (heterogeneous) column data e.g. a person, a datetime, etc. There is considerable overlap in the ways tuples and lists can be
used, but the built-in capabilities of the two structures highlight
some of the distinctions. For example, tuples have no Now, how can we be pythonic when using tuples? There is an answer to that as well: Tuple unpacking is a useful technique to extract values from a tuple. Classes are not for grouping Utility FunctionsC# and Java can have code only within classes, and end up with many
utility classes containing only static methods. A common example is a
mathematical function such as sa@wks:/tmp$ echo -e 'def sin():\n pass' > foo.py; cat foo.py def sin(): pass sa@wks:/tmp$ python >>> import foo >>> foo.sin() >>> Say no to getter and setter MethodsThe way to do encapsulation in Python is by using a property rather than getter and setter methods on an object. Using properties we can alter attributes on an object and completely change the implementation mechanism, with no change to any calling code whatsoever (read stable API). Functions are ObjectsIn fact... yes, I know, I said it before and I say it again: in Python everything is an object. Functions are objects. A function is an object that happens to be callable. The example below does an in-place sort of a list of dictionaries
based on the value of the >>> somefoo = [{'price': 9.99}, {'price': 4.99}, {'price': 10}] >>> somefoo [{'price': 9.99}, {'price': 4.99}, {'price': 10}] >>> def lookup_price(someobject): ... return someobject['price'] ... ... >>> somefoo.sort(key=lookup_price) # pass function object lookup_price >>> somefoo [{'price': 4.99}, {'price': 9.99}, {'price': 10}] # in-place sort of somefoo took place >>> type(somefoo) <class 'list'> >>> type(somefoo[0]) <class 'dict'> >>> There is a big difference between Finally, note that if we did not want the in-place sort then we could
have created a new list using CallableThere is also a pythonic way of checking if some object is callable. Delegating CallsIf we have to delegate calls to a superclass/supertype, using super() is strongly recommended. Ternary OperatorThe ternary operator should not be done with True/False EvaluationsEven though core Python principles say things should be done explicitly rather than implicitly, that is not true for simple true/false evaluations which should be done implicitly rather than explicitly. Do this >>> foo = [1, 6] # non-empty sequence evaluates to true >>> if foo: # unproblematic thus recommended ... print("foo evaluates to true") ... ... else: ... print("foo evaluates to false") ... ... foo evaluates to true rather than this >>> if foo != []: # too many things can go wrong here ... print("foo != [] evaluates to true") ... ... else: ... print("foo != [] evaluates to false") ... ... foo != [] evaluates to true >>> Python evaluates certain values to false when in a boolean context —
rule of thumb is that all empty values are considered false e.g. 0, '', [], {}, () and None. In addition to that basic information there are a few other things we should be aware of:
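One of those things: our own classes can hook into this boolean protocol by defining __bool__() (or __len__()) — a minimal sketch with a made-up Basket class:

>>> class Basket:                 # real code would have docstrings
...     def __init__(self):
...         self.items = []
...
...     def __bool__(self):       # Python 2 would use __nonzero__ instead
...         return bool(self.items)
...
...
>>> basket = Basket()
>>> if basket:                    # empty basket evaluates to false
...     print("basket evaluates to true")
...
... else:
...     print("basket evaluates to false")
...
...
basket evaluates to false
>>>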
Objects... in Python everything is an object! That is not just true in case we use Python for OOP (Object-Oriented Programming) but also for how Python works internally — there are objects created, mangled, shifted around, deleted, send, retrieved... it really is all objects, cover to cover... Before we start it is important to note that everything on this page is about the modern concept of types/classes in Python, the one called new-style classes — basically, everything from Python 2.2 onwards... Key FactsEvery object has
Object values can be changed, identity and type not
Objects may have
NamesNames are not properties of the object itself, and the object itself does not know what it is called. An object can have any number of names, or no name at all. Names live in namespaces (such as a module namespace, an instance namespace, a function's local namespace). Namespaces are collections of (name, object reference) pairs (implemented using dictionaries). When we call a function or a method, its namespace is initialized with the arguments we call it with (the names are taken from the function's argument list, the objects are those we pass in). In Python, a name or identifier is like a nametag attached to an
object (in Python everything is an object). For example, if we assign
the value Here, an integer object ( Now the name If we assign one name to another name e.g. Now the name >>> a = 1 >>> b = a >>> b 1 >>> a 1 >>> b is a True >>> id(a) 9376544 >>> id(b) 9376544 >>> Indeed, Name vs VariableWe commonly refer to names as variables, even in Python. This is because it is common terminology. What we really mean if we say the word variable in Python is name or identifier. In Python, variables are nametags for values, not labeled boxes containing values! For example, let us do the same example from above but now we assume
C++ instead of Python which defaults to call-by-value evaluation.
Assigning to a variable (e.g. Box Now box
Nametag vs Box SemanticsWRITEME Value, Variable, AssignmentA variable is a name that represents or refers to a value (an object
with some content) — variables names should be chosen appropriately,
matching whatever is semantically correct for the situation at hand.
The process of pointing a variable to a value is called assignment.
For example, the statement Instead of saying pointing to, it is common to say that we are binding
the name Assignments always go into the innermost scope. Also, they do not copy data, rather, an assignment binds a name to an object. AssignmentWe have just seen one above: foo = 10 foo = 20 means that we are first adding the name foo = [] foo.append(1) we are first adding the name Things like Chained AssignmentWe can do it the unpythonic way first: >>> bar = 5 # bind name bar to value 5 >>> foo = 5 >>> bar 5 >>> foo 5 >>> foo == bar # equality check True >>> foo is bar # identity check True >>> del bar # remove binding >>> del foo >>> bar Traceback (most recent call last): File "<input>", line 1, in <module> NameError: name 'bar' is not defined Now, the pythonic way: >>> foo = bar = 5 # chained assignment >>> foo 5 >>> bar 5 >>> foo == bar True >>> foo is bar True Now, let us change the current value of >>> foo = foo + 1 >>> foo 6 >>> foo == bar False >>> foo is bar False >>> Augmented AssignmentWe can do >>> somename = 10 >>> somename = somename + 1 >>> somename 11 >>> or we can use an augmented assignment >>> somename = 10 # new assignment rebinds the name somename to value 10 again >>> somename += 1 >>> somename 11 >>> Both are semantically equivalent and yield the same result. The latter however is more pythonic and concise. Note that augmented assignments are nothing specific to Python but rather many programming languages have them. Sequence Packing/UnpackingThis works for arbitrary iterables e.g. lists, strings, tuples, etc. >>> mytuple = 4, 'foo', ['bar', 3, 'nose'] # sequence packing >>> mytuple (4, 'foo', ['bar', 3, 'nose']) >>> x, y, z = mytuple # sequence unpacking >>> x 4 >>> y 'foo' >>> z ['bar', 3, 'nose'] >>> mytuple[2][1] == z[1] # evaluates to 3 == 3 True >>> While the above is probably known to most people, extended iterable unpacking (introduced with PEP 3132) might not be: >>> list(range(6)) [0, 1, 2, 3, 4, 5] >>> x, *y, z = range(6) # *foo can be at any position e.g. middle >>> x 0 >>> z 5 >>> y [1, 2, 3, 4] >>> type(y) <class 'list'> # not a tuple but a list >>> *foo, baz = range(3) >>> foo [0, 1] >>> baz 2 >>> *foo, baz, *bar = range(8) # there can only be one ;-] File "<input>", line 1 SyntaxError: two starred expressions in assignment >>> foo = {} >>> foo[0] = (1, 2) >>> foo[1] = (3, 4, 5, 9) >>> for a, (b, *c) in foo.items(): ... print(a, b, c) ... ... 0 1 [2] 1 3 [4, 5, 9] >>> Some might expect to receive a tuple when assigning to Sequence unpacking is also useful when returning multiple values e.g. if we wanted to choose a pythonic way of returning values from a function, this is what we can do: >>> import os >>> filename, extension = os.path.splitext('picture.png') >>> filename 'picture' >>> extension '.png' >>> Tuple UnpackingA tuple is a sequence. Unpacking it in a pythonic manner also adheres to the notion that tuples are not just read-only lists: >>> connections = [] >>> connections.append(('1.1.1.1', 223)) >>> connections.append(('2.2.2.2', 12112)) >>> connections.append(('123.212.1.2', 42344)) >>> connections [('1.1.1.1', 223), ('2.2.2.2', 12112), ('123.212.1.2', 42344)] >>> type(connections) <class 'list'> >>> type(connections[0]) <class 'tuple'> >>> for (ip_address, port) in connections: ... if port > 1023: ... print("Connection on IP {:>17} using port {:>5}.".format(ip_address, port)) ... ... ... Connection on IP 2.2.2.2 using port 12112. Connection on IP 123.212.1.2 using port 42344. >>> Reading this code tells us that Mutable vs ImmutableImmutable objects cannot change their value/content and keep their ID
as mutable ones can — they cannot be modified in place the way their mutable counterparts can be. In other words, if we want to alter an immutable object, we need to create a new/different one — the new one will have a different ID. As a CPython implementation detail: Immutable objects play an important role in places where a constant hash value is needed, for example as a key in a dictionary or member of a set. To name a few immutable objects: numbers, strings, tuples, frozensets, bytes, None, etc. What all those have in common is that they are Python built-in data structures. It is fair to say that immutable datatypes can be thought of as the basic building blocks used to assemble more complex datatypes e.g. if we use a class to create ourselves a particular datatype for our individual web application, this class, or rather instances thereof, would probably be mutable but some of its attributes might not be. In a NutshellThe majority of Python's built-in datatypes are immutable i.e. they cannot change their value/content without changing their ID — they cannot be modified in place the way their mutable counterparts can be. Most custom datatypes (e.g. a user-defined class) on the other hand are mutable but might have attributes made of immutable built-in datatypes. Equality vs Identity
>>> foo = bar = list(range(4)) # two names binding to the same list object >>> baz = list(range(4)) # a third name binding to a second lists object >>> foo [0, 1, 2, 3] >>> bar [0, 1, 2, 3] >>> baz [0, 1, 2, 3] >>> id(foo) 39291432 >>> id(bar) 39291432 >>> id(baz) 40455344 >>> foo == bar == baz # equality check True >>> foo is bar is baz # identity check False >>> foo is bar True >>> foo is baz False >>> foo is bar is not baz True >>> As can be see, A prominent example of when it actually makes quite a difference whether we check for equality or identity is with None. CallableA Callable is an abstract form of a function respectively an arbitrary Python object that mimics the behavior of a function in that it can be called. In other words, any callable object (e.g. user-defined functions,
built-in functions, methods of built-in objects, class objects,
methods of class instances, and all objects having a Many kinds of objects in Python are callable, and they can serve many different purposes:
ExamplesThe >>> def foo(): ... pass ... ... >>> callable(foo) True >>> callable(39) False >>> callable(max) True >>> max(3, 5, 88) 88 >>> callable(None) False >>> Making something callablehashable
WRITEME First-class ObjectWRITEME Object-Oriented RelationshipsThis subsection explains the type-instance and supertype-subtype relationships, and can be safely skipped if the reader is already familiar with these OOP concepts. Skimming over the rules below might be useful though. Meet SquasherThis is Squasher. Squasher is a super-smart Python, so smart in fact that he is going to help me explain object-oriented relationships... Types of RelationshipsWhile there are many different objects, there are basically only two kinds of relationships:
Beware of AmbiguityNote the ambiguity in plain English: The term is a is used for both of the above relationships i.e. people tend to say Squasher is a snake and snake is a reptile. That is wrong because it is ambiguous, it leads to confusion and thus mistakes. In order to avoid ambiguity and therefore be able to properly distinguish both cases, the terms outlined above should be used. Properties of RelationshipsIt is useful at this point to note the following (independent) properties of relationships:
In other words, the head end of a dashed arrow can move up a solid arrow, and the tail end can move down. These properties can be directly derived from the definition of the is a kind of (superclass-subclass) relationship. Using the Dashed Arrow Up Rule on our reptile/snake/Squasher example from above we can now conclude that 1) Squasher is an instance of snake (the type of Squasher is snake) and 2) Squasher is an instance of reptile (the type of Squasher is reptile).... Hmm... What?... Squasher has two types? Well, no... Earlier we said that every object has exactly one type. So how come Squasher seems to have two? Note that although both statements are correct, one is more correct (and in fact subsumes the other). In other words:
>>> class Reptile(): # real code would have docstrings ... pass ... ... >>> class Snake(Reptile): # subclassing Reptile ... pass ... ... >>> squasher = Snake() # instantiating a snake; moment of birth for Squasher >>> squasher.__class__ <class '__main__.Snake'> # Squasher's type is Snake >>> isinstance(squasher, Snake) True >>> isinstance(squasher, Reptile) True # Dashed Arrow Up Rule >>> issubclass(Snake, Reptile) True >>> Reptile.__bases__ (<class 'object'>,) # huh? more on that later... >>> Snake.__bases__ (<class '__main__.Reptile'>,) # Snake is a kind of Reptile >>> A similar rule exists for the is a kind of (superclass-subclass) relationship:
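Whatever the exact wording of that rule, we can at least verify that subclass relationships compose transitively (reusing Reptile and Snake from above):

>>> issubclass(Snake, Reptile)      # direct relationship
True
>>> issubclass(Reptile, object)     # direct relationship
True
>>> issubclass(Snake, object)       # implied relationship
True
>>>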
Now assume we had subclassed
Object SystemThis subsection will help us understand how objects in Python are created, when this happens and why. Basic ConceptsNow, after a little detour to object-oriented relationships and after cementing the notion of key-facts about objects into our brains, we are ready to take a detailed look at objects in Python, what they are, why they are useful and why they behave the way they do. So what exactly is a Python object? An object is an axiom in our system i.e. it is the notion of some entity, the most basic building block used to build everything else. We define an object by saying it has:
ExampleEven a simple object such as the number 2 has a lot more to it than meets the eye: 1 >>> foo = 2 2 >>> type(foo) 3 <class 'int'> 4 >>> type(type(foo)) 5 <class 'type'> 6 >>> type(foo).__bases__ 7 (<class 'object'>,) 8 >>> dir(foo) 9 ['__abs__', 10 '__add__', 11 '__and__', 12 '__bool__', 13 14 15 [skipping a lot of lines...] 16 17 18 'conjugate', 19 'denominator', 20 'from_bytes', 21 'imag', 22 'numerator', 23 'real', 24 'to_bytes'] 25 >>> In line 1 we give an integer the name The Of course, the built-in Any class we define is an object, and of course, instances of those classes are objects as well. Even the functions and methods we define are objects. Yet, as we will see, not all objects are made equal. Clean SlateWe are now going to build the Python object system from scratch. Let us begin at the beginning... with a clean slate:
One might be wondering why a clean slate has two grey lines running vertically through it. All will be revealed later. For now this will help distinguish a slate from other figures. On this clean slate, we will gradually put different objects, and draw various relationships, until it is left looking quite full. At this point, it helps if any preconceived object oriented notions of classes and objects are set aside, and everything is perceived in terms of objects and relationships. RelationshipsAs we introduce many different objects, we use two kinds of relationships to connect them. These are the is a kind of (subclass-superclass) relationship and the is an instance of (type-instance) relationship. Add the ObjectsWe are now going to start looking at the object system, bottom-up, i.e. we start with the two most basic objects: type and objectWe examine two objects: 1 >>> object 2 <class 'object'> 3 >>> type 4 <class 'type'> 5 >>> type(object) 6 <class 'type'> 7 >>> type(type) 8 <class 'type'> 9 >>> object.__class__ 10 <class 'type'> 11 >>> object.__bases__ 12 () 13 >>> type.__class__ 14 <class 'type'> 15 >>> type.__bases__ 16 (<class 'object'>,) 17 >>> Lines 1 to 8 show the names respectively representations of the two
most basic objects in Python, In line 5 we start exploring By exploring
These two objects, Let us continue with our explorations: 1 >>> isinstance(object, object) 2 True 3 >>> isinstance(type, object) 4 True 5 >>> In lines 1 and 2 we can see the Dashed Arrow Up Rule in action again.
Since Lines 3 and 4 show applying both, the Dashed Arrow Up Rule and the Dashed Arrow Down Rule which effectively reverses the direction of the dashed arrow. Type ObjectNow for a new concept... type objects. Both of the objects we
introduced so far (
Since the introduction of new-style classes, types and classes are
really the same in Python. Thus it is no wonder that the Before new-style classes were introduced types and classes had their
differences. The term class was traditionally used to refer to a class
created by the Type/Non-Type TypesTypes and, for lack of a better word, non-types are both objects but
only types can have subclasses. Non-types are concrete values so it
does not make sense for another object to be a subclass of a non-type.
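A quick sketch of that rule using the integer 5 (a non-type) and its type int:

>>> isinstance(int, type)           # int is a type ...
True
>>> isinstance(5, type)             # ... the concrete value 5 is not
False
>>> int.__bases__                   # types know their superclasses
(<class 'object'>,)
>>> (5).__bases__                   # non-types have no such thing
Traceback (most recent call last):
  File "<input>", line 1, in <module>
AttributeError: 'int' object has no attribute '__bases__'
>>>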
Three good examples of objects that are non-types are the integer
We can verify that this rule is true for all objects we have come
across so far, including
Note that we are drawing arrows on our slate for only the direct
relationships, not the implied ones i.e. only if one object is
another's Built-in TypesWe already scratched the surface of built-in types earlier. Now we are
going to give it a more detailed look. Python does not ship with only
two objects, oh no, the two basic types (
This diagram shows a few built-in types. Let us have a closer look at them: 1 >>> list 2 <class 'list'> 3 >>> list.__class__ 4 <class 'type'> 5 >>> list.__bases__ 6 (<class 'object'>,) 7 >>> tuple.__class__ 8 <class 'type'> 9 >>> tuple.__bases__ 10 (<class 'object'>,) 11 >>> dict.__class__ 12 <class 'type'> 13 >>> dict.__bases__ 14 (<class 'object'>,) # object is the supertype/superclass of all objects 15 >>> mylist = [1, 2, 3] # encoding an objects type into its name is unpythonic 16 >>> mylist.__class__ 17 <class 'list'> 18 >>> Line 2 shows the representation of the type of the built-in In line 6 we can see that the built-in All things just said about Of course, when we create a tuple or a dictionary, they are instances of their respective built-in types as well. Last but not least, how can we create an instance of New Objects by SubclassingThe built-in types are, well, built into Python. They are there when we start Python, and remain their after we finish. However, how can we create new types? New types cannot pop out of thin air, rather they have to be built using existing ones. >>> class C: # implicitly subclassed from object ... pass ... ... >>> class D: ... pass ... ... >>> class E(C, D): ... pass ... ... >>> C.__class__ <class 'type'> # Dashed Arrow Down Rule >>> D.__class__ <class 'type'> >>> E.__class__ <class 'type'> >>> C.__bases__ (<class 'object'>,) >>> D.__bases__ (<class 'object'>,) >>> E.__bases__ (<class '__main__.C'>, <class '__main__.D'>) >>> At first we create two new types ( Subclassing built-in TypesSubclassing built-in types is straightforward, actually we have been
doing it all along whenever we subclassed 1 >>> class Foo(list): 2 ... def append(self, item): # overrides append from the built-in list type 3 ... list.append(self, int(item)) 4 ... 5 ... 6 ... 7 >>> bar = Foo() 8 >>> type(bar) 9 <class '__main__.Foo'> 10 >>> Foo.__bases__ 11 (<class 'list'>,) 12 >>> bar 13 [] 14 >>> bar.append(3) 15 >>> bar 16 [3] 17 >>> bar.append(2.432) 18 >>> bar 19 [3, 2] 20 >>> len(bar) 21 2 22 >>> bar[1] = 2.432 23 >>> bar 24 [3, 2.432] 25 >>> bar.color = "blue" 26 >>> bar 27 [3, 2.432] 28 >>> bar.color 29 'blue' 30 >>> In lines 2 to 3 we override the The other interesting bit is with lines 22 to 24. As we can see,
assignments to a particular index position of our list instance In order to have the same casting in place for assignments as well we
would have to define the special method Because the Customizing the instantiation and creation process... Another way of creating a list subclass/subtype is by customizing its
instantiation process. Instantiating a list subclass/subtype works
just like instantiating any other type works which is by calling
The way we customize the instantiation/creation process of a list
subclass/subtype is by having the special method Tuples are immutable and different from lists such that once a tuple instance is created, it cannot be changed (modified in place) anymore. In general, every time a new instance of some class/type is created,
two special methods are called —
first The 1 >>> class Foo(list): # real code would have docstrings 2 ... def __init__(self, itr): 3 ... list.__init__(self, [int(item) for item in itr]) 4 ... 5 ... 6 ... 7 >>> class Bar(tuple): 8 ... def __new__(cls, itr): 9 ... seq = [int(item) for item in itr] 10 ... return tuple.__new__(cls, seq) 11 ... 12 ... 13 ... 14 >>> bazbar = Foo() # we need to supply an iterable 15 Traceback (most recent call last): 16 File "<input>", line 1, in <module> 17 TypeError: __init__() takes exactly 2 arguments (1 given) 18 >>> bazbar = Foo([1, 32.243, 111.2]) 19 >>> bazbar 20 [1, 32, 111] 21 >>> type(bazbar) 22 <class '__main__.Foo'> 23 >>> bazbar.__class__ 24 <class '__main__.Foo'> 25 >>> Foo.__bases__ 26 (<class 'list'>,) # Foo is a list subclass/subtype 27 >>> foobaz = Bar() 28 Traceback (most recent call last): 29 File "<input>", line 1, in <module> 30 TypeError: __new__() takes exactly 2 arguments (1 given) 31 >>> foobaz = Bar([2.3, 3.42433, 4]) 32 >>> foobaz 33 (2, 3, 4) 34 >>> Bar.__bases__ 35 (<class 'tuple'>,) # Bar is a tuple subclass/subtype 36 >>> type(foobaz) 37 <class '__main__.Bar'> 38 >>> foobaz[1] = 3.23 # tuples are immutable 39 Traceback (most recent call last): 40 File "<input>", line 1, in <module> 41 TypeError: 'Bar' object does not support item assignment 42 >>> foobaz[1] 43 3 44 >>> bazbar[1] 45 32 46 >>> bazbar[1] = 4.32 # lists are mutable 47 >>> bazbar 48 [1, 4.32, 111] 49 >>> The difference of customizing the instantiation/creation process
depending on whether or not we subclass immutable or mutable types can
be seen from lines 1 to 13. For immutable types we need to override
In both cases we take an iterable as second argument and cast its
items to Note that the New Objects by InstantiationSubclassing is only half the story of new types... >>> obj = object() >>> type(obj) <class 'object'> >>> cobj = C() >>> type(cobj) <class '__main__.C'> >>> class FooBar(list): ... pass ... ... >>> FooBar.__bases__ (<class 'list'>,) >>> foo = FooBar() >>> type(foo) <class '__main__.FooBar'> >>> isinstance(foo, list) # Dashed Arrow Up Rule True >>> The call operator ( Of course, we can subclass
Note that by implicitly subclassing Notes on InstantiationHow does Python really create a new object?
When using instantiation, we specify the type, but how does Python know which type to use when we use subclassing?
Can we instead specify a particular type to use?
Can we use any type for an object's
What if we have multiple supertypes/superclasses, and do not specify a
Wrap UpWe ended up with a comprehensive map of Python's object system. Here we also unravel the mystery of the vertical grey lines. They just segregate objects into three spaces based on what the common man calls them — metaclasses, classes, and instances.
It is also worth noting that SummaryThere are two kinds of objects in Python:
AttributeNext to understanding iterators, descriptors, decorators and how the object system in Python works, understanding what attributes are, how they are accessed and what intrinsic semantics go along with doing so, is probably the most important thing to know for any Pythoneer out there. A value associated with an object which is referenced by name using
dotted expressions. For example, if object When we apply the power of the almighty dotted expression
( Attribute AccessWhich object does an attribute access return though? Where does the object set as an attribute end up? What are the ties between attribute access and inheritance? And most importantly, what exactly does attribute access mean? Let us have a look... >>> class Ding: # real code would have docstrings ... pass ... ... >>> dong = Ding() >>> dong.foo = 2 # setting an attribute by assignment >>> dong.foo # attribute reference 2 >>> del dong.foo # attribute deletion >>> An attribute can be referenced, assigned to or deleted. Any of these or any combination thereof is what we call attribute access. What exactly happens during attribute access is explained next. Examples of Attribute AccessLet us start with the simplest of attribute access types, referencing
an attribute. An attribute reference is an expression of the form
We can already see that there are quite a few combinations with regards to attribute access depending on
While we now know that the algorithm used to figure out what to do when an attribute access happens, let it be known that we can of course entirely customize attribute access. However, let us not get ahead of ourselves and start with the basics: >>> class Foo: # real code would have docstrings ... baz = 23 ... bar = 45 ... def faz(self): ... print("Method faz in class Foo.") ... ... def foz(self): ... print("Method foz in class Foo.") ... ... ... >>> class Rab(Foo): # subclassing Foo ... bar = 89 # overriding bar ... fuz = 32 ... noz = 314 ... def foz(self): # overriding foz ... print("Method foz in class Rab.") ... ... def sna(self): ... print("Method sna in class Rab.") ... ... ... >>> fux = Rab() >>> fux.noz = 22 # setting attributes on the instance will >>> fux.niz = 42 >>> Rab.__name__ 'Rab' >>> del Rab.__name__ Traceback (most recent call last): File "<input>", line 1, in <module> TypeError: can't delete Rab.__name__ >>> Rab.__bases__ (<class '__main__.Foo'>,) >>> fux.__class__ <class '__main__.Rab'> >>> type(fux) <class '__main__.Rab'> >>> Rab.__dict__ dict_proxy({'__module__': '__main__', 'bar': 89, 'noz': 314, 'sna': <function sna at 0x1adb1e8>, '__getattribute__': <slot wrapper '__getattribute__' of 'object' objects>, 'fuz': 32, '__doc__': None, 'foz': <function foz at 0x1adb7c0>}) >>> fux.__dict__ {'niz': 42, 'noz': 22} # put them into __dict__ on the instance >>> dir(fux) ['__class__', # attributes automatically set by Python '__delattr__', '__dict__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'bar', # attributes set by us 'baz', 'faz', 'foz', # a method is an attribute too... 'fuz', 'niz', 'noz', 'sna'] >>> Rab.foz <function foz at 0x1aec160> >>> Rab.__dict__['foz'] <function foz at 0x1aec160> >>> fux.foz #... a so-called bound method <bound method Rab.foz of <__main__.Rab object at 0x1b7fe90>> >>> fux.foz() Method foz in class Rab. >>> We create two classes/types, It is worth nothing that every class/type has a few special attributes
such as One special attribute that is of special interest to us is Using the Then, finally, a look at a method and as can be seen, a method really is nothing special but just yet another attribute and depending on where/how it is accessed it is bound/unbound etc. It is now time to have a look at how exactly an attribute lookup works and how it differs on whether or not we start out on a class/type itself or an instance thereof. Getting an Attribute from a ClassUsing
Getting an Attribute from an InstanceWhen we do
Examples of Attribute access on Instances - non-callable Attributes: >>> fux.__dict__ {'niz': 42, 'noz': 22} # niz and noz are keys in fux's __dict__ >>> fux.noz 22 >>> fux.niz 42 >>> The two attributes >>> fux.fuz 32 >>> fux.__class__.__dict__['fuz'] # semantically equivalent to ... 32 >>> Rab.__dict__['fuz'] #... this one as can be seen below (same object) 32 >>> fux.__class__.__dict__['fuz'] is Rab.__dict__['fuz'] True >>> If we lookup the We can also see that there is more than one way to reference the >>> fux.baz 23 >>> Foo.__dict__['baz'] # semantically equivalent to ... 23 >>> fux.__class__.__bases__[0].__dict__['baz'] #... this one as can be seen below (same object) 23 >>> Foo.__dict__['baz'] is fux.__class__.__bases__[0].__dict__['baz'] True >>>
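To wrap up the non-callable case, a short sketch (reusing fux and Rab from above) of how an attribute set on the instance shadows a class attribute of the same name:

>>> 'bar' in fux.__dict__           # not set on the instance ...
False
>>> fux.bar                         # ... so the class Rab supplies it
89
>>> fux.__dict__['noz']             # set on the instance ...
22
>>> Rab.__dict__['noz']             # ... and therefore shadows the class attribute
314
>>> fux.noz
22
>>>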
Examples of Attribute access on Instances - callable Attributes: >>> fux.sna <bound method Rab.sna of <__main__.Rab object at 0x1b7fc50>> >>> Rab.__dict__['sna'].__get__(fux, Rab) <bound method Rab.sna of <__main__.Rab object at 0x1b7fc50>> >>> Rab.sna <function sna at 0x1c58490> >>> Rab.sna is Rab.__dict__['sna'] True >>> fux.sna is Rab.__dict__['sna'].__get__(fux, Rab) False >>> While we have seen above that there are different ways to get to an object behind an attribute name and that in case this object is a non-callable object we can always positively check for identity, the question now remains whether or not the same is true for callable objects as well. If we take a look at the What Python did to create the bound method on instance Looking up a callable Attribute, starting on the Instance: Now, let us have a look at what happens if With this example we start the lookup process on the instance rather
than its class/type. Note that we will not explicitly look into how
things differ in case we start the lookup from the class/type i.e.
When doing the At this point we know two important facts which lead to the final
step/action of the attribute lookup process: because we started out on
instance Thus, when a descriptor is involved whatever is the result of this descriptor call gets returned/assigned/deleted, depending on which descriptor method gets called. This is how method calls work — a method object is just a function wrapper attached to another object which calls the function object and thereby provides information about the instance and class it was called on. Attribute is not FoundWhen no attribute is found a >>> fux.duck Traceback (most recent call last): File "<input>", line 1, in <module> AttributeError: 'Rab' object has no attribute 'duck' >>> However, if Setting an AttributeIt is important to note that the order in which lookups are performed only happens when we refer to an attribute but not when it is set e.g. by assigning. When we set a new attribute on some class/type or instance then we
only set the If the check for an overriding descriptor such as Customizing Attribute AccessIn case we want to deviate from the default attribute access machinery, we can do so and influence/customize the default order/way attribute access is done. We do so by means of some special methods namely
Notice that all of those methods are bound methods i.e. they are given an instance as implicit first argument which they then use to carry out whatever task we implemented with them.
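Referencing is not the only thing we can customize; setting can be intercepted as well. A minimal sketch using __setattr__() with a made-up validation rule (the class name Thing and the price check are purely illustrative):

>>> class Thing:                    # real code would have docstrings
...     def __setattr__(self, key, value):
...         if key == 'price' and value < 0:
...             raise ValueError("price must not be negative")
...
...         object.__setattr__(self, key, value)   # delegate to the default machinery
...
...
>>> foo = Thing()
>>> foo.price = 10
>>> foo.price
10
>>> foo.price = -3
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "<input>", line 4, in __setattr__
ValueError: price must not be negative
>>>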
ExamplesWe are now going to look at what different semantics we get based on
whether we use __getattr__() is conditional: >>> class Bar: # real code would have docstrings ... def __getattr__(self, key): ... if key == 'drink': ... return "whiskey" ... ... else: ... raise AttributeError ... ... ... ... >>> foo = Bar() >>> foo.__dict__ {} # no foo.__dict__['drink'] key so __getattr__() is called >>> foo.drink 'whiskey' >>> foo.drink = "milk" # setting foo.__dict__['drink'] >>> foo.drink 'milk' >>> foo.__dict__ {'drink': 'milk'} # now __getattr__() is not called anymore >>> The attribute name is passed into If the attribute name is unknown, When we fail to return anything and After instantiating After explicitly setting __getattribute__() is unconditional: >>> class Foo: # real code would have docstrings ... def __getattribute__(self, key): ... if key == 'drink': ... return "whiskey" ... ... else: ... raise AttributeError ... ... ... ... >>> baz = Foo() >>> baz.__dict__ # we always go through __getattribute__() Traceback (most recent call last): File "<input>", line 1, in <module> File "<input>", line 7, in __getattribute__ AttributeError >>> baz.drink 'whiskey' >>> baz.drink = "milk" # (trying to) set foo.__dict__['drink'] >>> baz.drink 'whiskey' # huh?... __getattribute__() is unconditional >>> baz.__dict__ # but at least we do not return None Traceback (most recent call last): File "<input>", line 1, in <module> File "<input>", line 7, in __getattribute__ AttributeError >>> Even after explicitly setting So where did If our class defines a We need to be careful with >>> class FooBar: ... def __getattribute__(self, somekey): ... raise AttributeError # every attribute reference will raise AttributeError ... ... def count_items(self): # therefore this method will never be called ... pass ... ... ... >>> itemlist = FooBar() >>> itemlist.count_items() Traceback (most recent call last): File "<input>", line 1, in <module> File "<input>", line 3, in __getattribute__ AttributeError >>> The class/type TypeWhat is the difference between a class and a type? There is none, they are the same thing. Polymorphism, Encapsulation, InheritancePolymorphism, encapsulation, inheritance... these terms mean that we can use the same operations on objects of different types, and they will work as if by magic (polymorphism) — we care about interfaces rather than object types. We hide unimportant details of how objects work from the outside world (encapsulation), and we can create specialized objects from general ones (inheritance). ExpressionStatementBefore we start looking at statements, let us first clarify on the difference between statements and expressions: Statement vs ExpressionAn expression is something e.g. exec() vs eval()Quite often we see questions like: How do I execute Python code from a string? Let us start with saying that this is generally not a good idea and its use should be kept to a minimum if not avoided at all. The reason is that executing strings is considered insecure (especially in the context of web applications). For statements we can use >>> type(exec) <class 'builtin_function_or_method'> >>> mycode = 'print("hello world")' >>> exec(mycode) hello world >>> As a marginal note, When we need the value of an expression, >>> type(eval) <class 'builtin_function_or_method'> >>> myvar = eval('2 + 1') >>> myvar 3 >>> However, as mentioned, the first step should be to ask ourselves if we really need to execute code from a string. Executing code from strings should generally be the position of last resort — it is slow, ugly and dangerous if it can contain user-entered code. 
We should always look at alternatives first (e.g. literal_eval()), such as higher order functions, to see if these can better meet our needs. Now, a closer look at some of the more interesting statements... clauseEverybody knows the Anyhow, back on topic, what is a clause? First of all, why is it
called clause rather than statement? Well, it is not a statement on
its own. Rather, a clause is part of another statement
e.g. a >>> number = int(input('Enter a number: ')) Enter a number: 3 >>> type(number) <class 'int'> >>> number 3 >>> if number < 10: ... print("number is smaller than 10") ... ... elif 10 <= number < 100: ... print("number is between 10 and 99") ... ... else: ... print("number is bigger than 99") ... number is smaller than 10 >>> The only thing worth noting here is that because break, continue, elseWe can use the Well, they either execute a block of code until their condition
becomes false (
How do we do that? The answer is we use the breakThe 1 >>> from math import sqrt 2 >>> int(2.61343) 3 2 4 >>> int(2.1) 5 2 6 >>> int(-2.834) 7 -2 8 >>> int(-2.1) 9 -2 10 >>> sqrt(9) 11 3.0 12 >>> sqrt(3) 13 1.7320508075688772 14 >>> int(sqrt(3)) 15 1 16 >>> for number in range(99, 0, -1): 17 ... root = sqrt(number) 18 ... 19 ... if root == int(root): 20 ... print(number) 21 ... break 22 ... 23 81 24 >>> This example makes use of the Once we get to the iteration where The Ok, nice, but what is the point we are trying to make? Well, say we
left out line 21 i.e. we would not use >>> for number in range(99, 0, -1): ... root = sqrt(number) ... ... if root == int(root): ... print(number) ... 81 64 49 36 25 16 9 4 1 >>> Ah, we are still able to find the biggest square below 100 but then why iterate down to zero if 81 is all we are after? This may not make much of a difference with this simple example but what would execution time look like if had to deal with 100,000,000 iterations rather than 100? In addition, to make things even more realistic, let us assume the items of our sequence would not be numbers and what we do with each item of the sequence is not just computing its square and testing for equality but, what if we had multi-page text documents which we are scanning for a particular sequence of characters? You see where this is going... continueThe >>> for item in "iamastring": ... if item == "i": ... continue ... ... if item == "a": ... continue ... ... print(item) ... m s t r n g Nothing much to say here really. A string is a sequence so works
perfectly fine with a However, some folks will tell you that for them >>> for item in "iamastring": ... if item not in "ia": ... print(item) ... m s t r n g >>> for item in "iamastring": ... if not (item == "i" or item == "a"): ... print(item) ... m s t r n g >>> And in fact that is true. There is no need to ever use elseThe
Let us expand on the squares example from above where we want to find
the largest square below 100, namely 81. However, we are going to
alter the example so that in fact we will not find the biggest square
below 100 (because we do not iterate down to zero but rather stop at
82). What 1 >>> for number in range(99, 81, -1): 2 ... print(number, end=' ') 3 ... 4 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 >>> 5 >>> for number in range(99, 81, -1): 6 ... root = sqrt(number) 7 ... 8 ... if root == int(root): 9 ... print(number) 10 ... break 11 ... 12 ... else: 13 ... print("Hm, I did not find a square...") 14 ... 15 Hm, I did not find a square... 16 >>> Lines 1 to 4 are just to show that we really never get to 81 as the right index is exclusive whereas the left one is inclusive — we end up with what is shown in line 4. Since we never get to 81, that means we never enter the code block in
lines 9 and 10 and thus never break out of the Below is a slightly more complex example which makes use of the 1 >>> for number in range(2, 16): 2 ... for divisor in range(2, number): 3 ... if number % divisor == 0: 4 ... break 5 ... 6 ... else: 7 ... print(number, "is a prime") 8 ... 9 ... 10 2 is a prime 11 3 is a prime 12 5 is a prime 13 7 is a prime 14 11 is a prime 15 13 is a prime 16 >>>
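For completeness, the else clause works exactly the same way on while loops — a quick sketch re-doing the primality test for a single number:

>>> number = 23
>>> divisor = 2
>>> while divisor < number:
...     if number % divisor == 0:
...         print(number, "is not a prime")
...         break
...
...     divisor += 1
...
... else:
...     print(number, "is a prime")
...
...
23 is a prime
>>>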
passThe >>> while True: ... pass # FIXME: infinite loop ... ... Traceback (most recent call last): File "<input>", line 2, in <module> KeyboardInterrupt >>> This example waits for keyboard interrupt (Ctrl+C) to terminate and will otherwise loop forever. >>> class BazFoo: # real code would have docstrings ... pass # TODO: implement me ... ... >>> Here we have a minimal class. Sometimes we just put the stub in place and finish it later e.g. when we write tests before the actual code, as usual, because we are responsible software engineers. >>> def ensure_human_shape(*args, **kwargs): ... pass # TODO: implement me ... ... >>> Another place Do not stub, document!What we can do as well, because it is semantically equivalent, is this: >>> def ensure_human_shape(*args, **kwargs): ... """Make sure the alien body looks human.""" ... ... # TODO: implement me >>> Rather than using a and, or, notThe operators
... some common examples before we take a detailed look at >>> myfoo = "" # empty string evaluates to false in a boolean context >>> type(myfoo) <type 'str'> >>> myfoo or "we don't like empty strings" # x is false so y is returned "we don't like empty strings" >>> myfoo = "we really don't..." >>> myfoo or "we don't like empty strings" # x is true so it is returned without evaluating y "we really don't..." >>> As can be seen, the fact that >>> def baz(): ... print("DDoS... battle stations everybody!") ... ... >>> callable(baz) # being a function, baz is callable True >>> myfoo = "" >>> myfoo or baz() DDoS... battle stations everybody! >>> myfoo = "So peaceful today..." >>> myfoo or baz() 'So peaceful today...' >>> Comparing both, the andSo let us have a closer look at >>> "x" and "y" 'y' >>> False and "y" False >>> "x" and "y" and (2, 4) (2, 4) >>> "x" and "" and (2, 4) '' >>> {} and [] and (2, 4) {} >>> {} and [] and () {} >>> As we can see, evaluation starts on the left and only continues
further right if the current value under evaluation does not evaluate
to orand now >>> "x" or "y" 'x' >>> False or "y" 'y' >>> None or "y" # None is false in a boolean context 'y' >>> "x" or "y" or (2, 4) 'x' >>> "x" or "" or (2, 4) 'x' >>> {} or [] or (2, 4) (2, 4) >>> {} or [] or () () >>> And again, evaluation always starts on the left and only continues
further right if the current value under evaluation does not evaluate
to notUsed to negate logical state i.e. flip the logical state of its
operand. For example, if a boolean context evaluates to >>> if not "": # empty string evaluates to false in a boolean context ... print("not flipped False to True") ... ... else: ... print("not flipped True to False") ... ... not flipped False to True >>> other Use Cases
Ternary OperatorThere are two possible syntax choices here. At first the old and-or trickery and then the new and recommended if-else variant: and-or TrickThe ternary operator for Python... >>> abooleancontext = "" >>> abooleancontext and "x" or "y" 'y' >>> abooleancontext = "foo" >>> abooleancontext and "x" or "y" 'x' >>> abooleancontext and "" or "y" # x is false so it always returns y 'y' >>> abooleancontext = "" >>> abooleancontext and "" or "y" # x is false so it always returns y 'y' >>> As can be seen, the and-or trick is what most of us know from C/C++ as
the Therefore, combining x if abooleancontext else y>>> abooleancontext '' # empty string evaluates to false in a boolean context >>> "x" if abooleancontext else "y" 'y' >>> abooleancontext = "bar" >>> "x" if abooleancontext else "y" 'x' >>> "" if abooleancontext else "y" '' >>> "" if abooleancontext else "" '' >>> abooleancontext = "" >>> "" if abooleancontext else "y" 'y' >>> Nothing much to say here except for that this variant is what should be used to have the ternary operator in Python because it is unproblematic and easier to read and thus considered more pythonic compared to doing the ternary operator thingy using the old and quirky and-or trickery. ExceptionsExceptions are used to handle program state that is sub-optimal but can be handled by a program without leading to a crash. This is different to the concept/idea of assert which is used to test for state that must not happen. Sometimes exceptions are also used for program flow (the codepath through a program). This however is considered bad practice as it is misuse of the general concept/idea of exceptions and often leads to complex and ugly code. Exceptions are a means of altering the codepath by breaking out of the normal flow of control of a code block in order to handle errors or exceptional conditions/state. An exception is raised at the point where the error/condition/state is detected i.e. it may be handled by the surrounding code block or by any code block that directly or indirectly called the code block where the error/condition/state occurred (somewhere further up the call stack). For example, Python raises an exception when it detects a runtime
error such as division by zero. However, we can explicitly raise an
exception with the raise statement. Python uses the termination model of exception handling i.e. an exception handler can find out what happened and continue execution in a stack frame further up the call stack, but it cannot repair the cause of the exception and retry the failing operation (except by re-entering the offending piece of code from the top again). When an exception is not handled at all, Python either terminates
execution of the program or returns to its interactive main loop. In
either case, it prints a call stack backtrace also known as traceback
(except when the exception is Exceptions are identified by class/type instances. The CatchWe have already seen that using a bare except clause is a bad idea. Another example of where exceptions are used is with context managers. PEP 3110 brought a change when it landed in Python 2.6. Since then except clauses are written using an as clause: >>> try: ... prnt("typo in print") # typo will raise NameError exception ... ... except NameError as e: # as clause ... print('A "NameError" exception ocurred: ', e) ... ... A "NameError" exception ocurred: name 'prnt' is not defined >>> We can also catch two or more different types of exceptions with a single except clause: >>> try: ... prnt("typo in print") # raises exception ... 2 + "foo" ... ... except (NameError, TypeError) as e: ... print('exception ocurred: ', e) ... ... exception ocurred: name 'prnt' is not defined >>> try: ... 2 + "foo" # raises exception ... prnt("typo in print") ... ... except (NameError, TypeError) as e: ... print('exception ocurred: ', e) ... ... exception ocurred: unsupported operand type(s) for +: 'int' and 'str' >>> raiseWe can also raise exceptions in our own code: >>> try: ... raise Exception("foo", "bar") ... ... except Exception as e: # bind exception object to name e ... for each in e.args: # e[i] does not work anymore in Python 3 ... print(each) ... ... ... foo bar >>> Exception ObjectAs shown above, by using an as clause we can get access to the exception object in the current scope. An exception object itself has attributes such as: >>> e = Exception("foo", "bar") >>> e.args ('foo', 'bar') >>> e.__class__ <class 'Exception'> >>> e.__reduce_ex__() (<class 'Exception'>, ('foo', 'bar'), {}) >>> Creating our own ExceptionsWe can create our own exception objects by subclassing Context ManagerA context manager is an object which controls the context seen by code contained inside a with compound statement. The concept of context managers seems to confuse people a lot, not as much as decorators or let alone attribute access but still, one might get the idea that context managers to many is black magic. The concept of context manager is explained best by first elaborating on the terminology used, next the problem domains they are applied to (read use cases), followed by examples in code and finally a somewhat detailed look at their innards and the processes involved when they are being used. As everything else in Python, a context manager is an object. A
context manager is created either from a class/type that implements the context management protocol or with the helpers from the contextlib module — both ways are shown further down. Now that we have a basic idea about what we are dealing with, next thing to do is boost our understanding of context managers by looking at some use cases. It will become pretty clear pretty quickly what the typical problem domain is where context managers are the solution. Once this is understood, we can look at how they are used, and after that, peek under the hood and figure out how context managers work and how we can build our custom ones. Use CasesSome typical use cases for context managers are:
with
The Python with compound statement is what we use to enter and exit a temporary context. The Python standard library has many resources that obey the context
management protocol already and so can be used with the with statement out of the box. However, as mentioned, it quickly became obvious that context managers are the solution to a whole range of other problems too, not just use cases involving the acquisition and release of resources. That is why context managers are used all over the place for all kinds of things involving changes/alterations to the current run time (context) in some way. Standard SyntaxThis is how we make use of the with statement:
with expression [as variable]:
    with-block
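To make the abstract syntax concrete, here is a minimal sketch using the built-in open() context manager (the file path /tmp/demo.txt is just an assumed example):
>>> with open('/tmp/demo.txt', mode='w', encoding='utf-8') as foo:  # expression ... as variable
...     foo.write("inside the with-block")                          # the with-block
...
...
21
>>> foo.closed  # the name foo still exists after the block, but the file got closed for us
True
>>>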
Nested SyntaxContext managers can be nested: with expression-1 [as variable], expression-2 [as variable]: with-block is equivalent to with expression-1 [as variable]: with expression-2 [as variable]: with-block try/finally vs withBefore the arrival of the try: # this block gets executed no matter what except [expression]: # this block handles the exception finally: # clean up e.g. release acquired resources The Using an Below is an example of acquiring a resource (e.g. a file) and making sure it is released again after being used: >>> foo = open('somefile.txt', mode='w', encoding='utf-8') >>> try: ... foo.write("hello world") ... ... finally: ... foo.close() ... ... 11 # somefile.txt contains 11 bytes >>> sa@wks:/tmp$ cat somefile.txt hello worldsa@wks:/tmp$ And now the same using the >>> with open('anotherfile.txt', mode='w', encoding='utf-8') as bar: ... bar.write("Hey there too!") ... ... 14 >>> sa@wks:/tmp$ cat anotherfile.txt Hey there too!sa@wks:/tmp$ As can be seen, the ExamplesNow, before we look at the context management protocol and finally how to build custom context managers, let us have a look at the most common use cases for context managers where Python's ready-made context managers are used: Files: We have already looked at one case and here is another one. One of my favorites however is whenever I can use Counter, to for example count the occurrences of words in a text file and sort them in descending order: >>> from collections import Counter >>> Counter.__bases__ (<class 'dict'>,) >>> with open('/tmp/gpl-3.0.txt') as foo: ... words = re.findall('\w+', foo.read().lower()) ... Counter(words).most_common(10) ... ... [('the', 345), ('of', 221), ('to', 192), ('a', 184), ('or', 151), ('you', 128), ('license', 102), ('and', 98), ('work', 97), ('that', 91)] >>> In case we want to temporarily alter arithmetic precision: >>> from decimal import Context >>> from decimal import Decimal >>> from decimal import localcontext >>> foo = Decimal('43') >>> foo.sqrt() Decimal('6.557438524302000652344109998') >>> with localcontext(Context(prec=4)): ... foo.sqrt() ... ... Decimal('6.557') # temporarily switched to lower precision >>> foo.sqrt() Decimal('6.557438524302000652344109998') # back to original precision >>> Locking/Unlocking: Whenever we execute code in parallel using threads, there are use cases where we need locking/unlocking of resources to for example protect them from parallel access: >>> from threading import Lock >>> hasattr(Lock(), '__enter__') True # it really is a... >>> hasattr(Lock(), '__exit__') True #... context manager >>> with Lock(): ... pass # critical code here ... ... >>> Context Management ProtocolBy now we already know about the use cases, syntax and semantics of
context managers. We have learned that the with statement is what drives them. What is left is to turn an object of our own into a context manager. As usual, this step is well defined by one of several so-called
protocols in Python. In order to turn a random object into a context
manager it needs to implement the context management protocol which
means it needs to have two special methods defined, __enter__() and __exit__(): >>> class Foo: ... def __init__(self): ... pass ... ... def __enter__(self): ... print("hello") ... ... def __exit__(self, extype, exvalue, traceback): ... print("world") ... ... ... >>> with Foo(): ... print("big") ... ... hello # we enter the temporary context big world # we leave the temporary context >>> Creating a custom context manager is straightforward as can be seen.
We just create a class/type as usual and implement the two special methods.
__enter__()
Called when we enter a temporary context. The with compound statement will bind the return value to the target(s) specified in the as clause: >>> class Bar: ... def __init__(self): ... pass ... ... def __enter__(self): ... return "neo4j and MongoDB rock!" ... ... def __exit__(self, extype, exvalue, traceback): ... pass ... ... ... >>> with Bar() as foobar: # binds name foobar to return value of __enter__() ... foobar ... ... 'neo4j and MongoDB rock!' >>>
__exit__()
Called when we exit the temporary context. It takes four
formal parameters, one being self: If an exception is supplied, and >>> class Fiz: ... def __init__(self): ... pass ... ... def __enter__(self): ... pass ... ... def __exit__(self, extype, exvalue, traceback): ... print("type: {}, value: {}, traceback: {}".format(extype, exvalue, traceback)) ... print("clean up nontheless...") ... # not returning a true value i.e. exceptions propagate >>> with Fiz(): ... print("do some stuff... oops, it raises an exception") ... raise RuntimeError("Something bad happened") ... ... do some stuff... oops, it raises an exception type: <class 'RuntimeError'>, value: Something bad happened, traceback: <traceback object at 0x35e9bd8> clean up nontheless... # we have a chance to clean up nonetheless... Traceback (most recent call last): File "<input>", line 3, in <module> RuntimeError: Something bad happened >>> The important bit to understand here is that even though our
with-block raised an exception, we still can clean up e.g. close
opened resources etc. Also, we did not swallow the exception but let
it propagate further up the call stack; had __exit__() returned a true value instead, the exception would have been swallowed: [skipping a lot of lines...] ... def __exit__(self, extype, exvalue, traceback): ... print("type: {}, value: {}, traceback: {}".format(extype, exvalue, traceback)) ... print("clean up nontheless...") ... return True # swallow exception [skipping a lot of lines...] Creating Context ManagersThere are two ways to do it:
contextlib
The contextlib module from the standard library provides ready-made helpers for writing context managers: >>> import contextlib >>> contextlib.__all__ ['contextmanager', 'closing', 'ContextDecorator'] >>>
Now, let us take the hello big world example from before and rewrite it using a generator-based context manager: >>> from contextlib import contextmanager >>> @contextmanager ... def foobar(): ... print("hello") ... yield ... print("world") ... ... >>> with foobar(): ... print("big") ... ... hello big world >>> It might not be obvious at first glance but using generator-based context managers can really save some typing although one might argue that in fact class/type-based context managers are probably easier for most people to understand and grasp when confronted with the notion of context managers for the first time. as clause
>>> from contextlib import contextmanager >>> @contextmanager ... def barbaz(): ... pass ... yield "neo4j and MongoDB rock!" # this value (e.g. string) is bound to name foo below ... pass ... ... >>> with barbaz() as foo: ... print(foo) ... ... neo4j and MongoDB rock! >>> Context Managers and ExceptionsSince we also do not have an explicit __exit__() special method which
>>> from contextlib import contextmanager >>> @contextmanager ... def fiz(): ... try: ... yield ... ... except RuntimeError as inst: # will catch RuntimeError exceptions only ... print("RuntimeError: {}".format(inst.args[0])) ... ... finally: ... print("clean up...") ... ... ... >>> with fiz(): ... print("with-block...") ... ... with-block... # no exception raised clean up... >>> with fiz(): ... print("with-block...") ... raise RuntimeError("Something bad happened") ... ... with-block... RuntimeError: Something bad happened # handled exception clean up... >>> with fiz(): ... print("with-block...") ... raise Exception ... ... with-block... clean up... Traceback (most recent call last): # unhandled exception propagates up the call stack File "<input>", line 3, in <module> Exception >>> As can be seen, the generator-based context manager initializes the
context, yields exactly one time, then cleans up the context. The
with-block is executed at the point where the generator yields and the
generator is resumed after the with-block is exited. The value
yielded, if any, is bound to the target in the as clause of the with statement. Exceptions from within the with-block are re-raised inside the
generator i.e. they can be caught and handled there with a standard try/except/finally construct, just as we did in the fiz() example above.
closing
We already know that file objects are one example of Python's
built-in context managers as they ensure that when used with the >>> foo = open('/tmp/file.txt', mode='w', encoding='utf-8') >>> foo.write("some stuff...") 14 >>> foo.close() # manually closing the file >>> foo.write("some more stuff...") Traceback (most recent call last): File "<input>", line 1, in <module> ValueError: I/O operation on closed file. >>> but rather, when using >>> with open('/tmp/file.txt', mode='w', encoding='utf-8') as bar: ... bar.write("more stuff...") ... ... 14 >>> bar.write("and even more...") Traceback (most recent call last): File "<input>", line 1, in <module> ValueError: I/O operation on closed file. >>> As can be seen, there is no need for us to explicitly call Now, what if we do not have file objects to deal with but rather something akin like for example some object that allows us to do I/O as well, thus it provides some sort of handle too, which should get closed eventually as well? Basically what we want is something like shown below but maybe have a shortcut to it: >>> from contextlib import contextmanager >>> from urllib.request import urlopen # not a file object but provides a handle too >>> @contextmanager ... def open_url(url): ... try: ... foo = urlopen(url) ... print("page is closed: {}".format(foo.isclosed())) ... yield foo ... ... except RuntimeError: ... pass ... ... finally: ... foo.close() # we have to explicitly close the handle ... print("page is closed: {}".format(foo.isclosed())) ... ... ... >>> with open_url('') as page: ... numberlines = 0 ... for line in page: ... numberlines += 1 ... ... print("{} has {} lines".format(page.geturl(), numberlines)) ... ... page is closed: False # in open_url's try has 600 lines page is closed: True # in open_url's finally >>> Now, let us do the same thing but let us use the >>> from contextlib import closing >>> from urllib.request import urlopen >>> with closing(urlopen('')) as page: ... print("page is closed: {}".format(page.isclosed())) ... numberlines = 0 ... for line in page: ... numberlines += 1 ... ... print("{} has {} lines".format(page.geturl(), numberlines)) ... ... page is closed: False has 600 lines >>> print("page is closed: {}".format(page.isclosed())) page is closed: True # closing did its job >>> As can be seen, we did not have to do an explicit call to ContextDecoratorAs mentioned, the 1 >>> from contextlib import ContextDecorator 2 >>> class FooBar(ContextDecorator): # real code would have docstrings 3 ... def __enter__(self): 4 ... print("hello") 5 ... 6 ... def __exit__(self, extype, exvalue, traceback): 7 ... print("world") 8 ... 9 ... 10 ... 11 >>> with FooBar(): # used as class/type-based context manager 12 ... print("big") 13 ... 14 ... 15 hello 16 big 17 world 18 >>> @FooBar() # still a class/type-based context manager but used as decorator 19 ... def foo(): 20 ... print("big") 21 ... 22 ... 23 >>> foo() 24 hello 25 big 26 world 27 >>> def foo(): # shown just for demonstration purposes 28 ... with FooBar(): 29 ... print("big") 30 ... 31 ... 32 ... 33 >>> foo() 34 hello 35 big 36 world 37 >>> Note how the version from lines 18 to 20 is just syntactic sugar for what is shown in lines 27 to 29. now with exception handling... And of course, all the exception handling works just like before but now we can have our class/type-based context manager used as decorator and also have exception handling: >>> class FooBar(ContextDecorator): ... def __enter__(self): ... print("hello") ... ... def __exit__(self, extype, exvalue, traceback): ... 
print("type: {}, value: {}, traceback: {}".format(extype, exvalue, traceback)) ... print("clean up nontheless...") ... ... ... >>> with FooBar(): ... print("big") ... raise RuntimeError("Something bad happened") ... ... hello big type: <class 'RuntimeError'>, value: Something bad happened, traceback: <traceback object at 0x273eef0> clean up nontheless... Traceback (most recent call last): File "<input>", line 3, in <module> RuntimeError: Something bad happened >>> @FooBar() ... def foo(): ... print("big") ... raise RuntimeError("Something bad happened") ... ... >>> foo() hello big type: <class 'RuntimeError'>, value: Something bad happened, traceback: <traceback object at 0x24efd88> clean up nontheless... Traceback (most recent call last): File "<input>", line 1, in <module> File "/usr/lib/python3.2/contextlib.py", line 16, in inner return func(*args, **kwds) File "<input>", line 4, in foo RuntimeError: Something bad happened >>> Boolean ContextWRITEME Argument, ParameterFormal parameters are those we declare within the function/method signature, the values we supply to a function/method call are called actual parameters or arguments. The argument, also known as actual parameter, is the value passed to a function/method, assigned to a named local variable within the function/method body. Positional/Keyword ArgumentsIn its definition a function/method may have both, positional
arguments and keyword arguments. Positional and keyword arguments may
be of variable length i.e. The convention within the function/method signature is to use >>> def foo(*args, **kwargs): ... print(args) ... print(kwargs) ... ... >>> foo() () # positional arguments are stored in a tuple {} # keyword arguments are stored in a dictionary >>> foo(2, "hello", offset=19) (2, 'hello') {'offset': 19} >>> Argument ListEverything in between Any expression may be used within the argument list, and the evaluated value is passed to the named local variable within the function/method body. In general, an argument list must have any positional arguments followed by any keyword arguments, where the keywords must be chosen from the formal parameter names. It is not important whether a formal parameter has a default parameter value or not. No argument may receive a value more than once i.e. formal parameter names corresponding to positional arguments cannot be used as keywords in the same function/method call. Default Parameter ValueKeyword arguments are often used to provide default parameter values i.e. values that get passed into the functions/method body if not explicitly specified when we make the function/method call: >>> def greet_all(greeting="Hello"): # strings are immutable ... print(greeting) ... ... >>> greet_all() Hello # default parameter value >>> greet_all(greeting="Hi there") Hi there # was explicitly specified >>> The default parameter value for a function/method argument is only evaluated once, when the function/method is defined — which for example happens when the module it is contained in is loaded because it is imported. Python then assigns the default parameter value to the variable. As we will see, this may cause problems if the default parameter value is a mutable object such as a list or a dictionary. If the function/method modifies the object (e.g. by appending an item to a list), the default parameter value is modified. Mutable Types as Default Parameter ValuesNow, the one thing that trips up any Python greenhorn... Mutable types used as default parameter values in function/method definitions ... We should not use a mutable type (a value that can be modified in place) as value for default parameter values... big NoNo! Here is why: 1 >>> def foo(bar, baz=[]): # lists are mutable 2 ... baz.append(bar) 3 ... print(baz) 4 ... 5 ... 6 >>> foo.__defaults__ 7 ([],) 8 >>> id(foo(3)) 9 [3] # yes, that is what we expect but... 10 8794272 11 >>> foo.__defaults__ 12 ([3],) #... we have now changed the default value for baz... 13 >>> id(foo(4)) 14 [3, 4] #... which is bad, as can be seen 15 8794272 16 >>> foo.__defaults__ 17 ([3, 4],) 18 >>> id(foo(5, baz=[2, 1])) 19 [2, 1, 5] # works as expected because baz was explicitly specified 20 8794272 21 >>> foo.__defaults__ 22 ([3, 4],) 23 >>> After evaluating the function/method, Python does not check if a value
(and therefore, with CPython, its location in memory) has changed
after being defined. If we look at the ID then we can see that
function When we initially appended a value to the list represented by None as Default Parameter ValueNone is used a lot in combination with default parameter values e.g. when we specify formal parameters and assign them default values: >>> def foo(bar, baz=None): # None is immutable ... if baz is None: ... baz = [] ... ... baz.append(bar) ... print(baz) ... ... >>> foo.__defaults__ (None,) >>> id(foo(3)) [3] 8794272 >>> foo.__defaults__ (None,) # not [3] as above >>> id(foo(4)) [4] # not [3, 4] as above 8794272 >>> foo.__defaults__ (None,) >>> id(foo(5, baz=[2, 1])) [2, 1, 5] 8794272 >>> foo.__defaults__ (None,) >>>
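The evaluate-once rule is not limited to mutable containers; any expression used as a default parameter value is evaluated a single time, at definition time. A small sketch (the function timestamped() is made up for illustration):
>>> import time
>>> def timestamped(msg, when=time.time()):  # time.time() runs once, when def is executed
...     return (when, msg)
...
...
>>> timestamped("first")[0] == timestamped("second")[0]
True  # both calls reuse the very same default value
>>>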
Default Parameter Value Assignment inside Function/MethodSomething that works well and keeps function/method signatures short
is using *args and **kwargs and assigning the default values inside the function/method body: >>> def foo(*args, **kwargs): ... bar = args[0] if args else [] ... baz = kwargs.get('baz', []) ... fiz = kwargs.get('fiz') ... print(bar, baz, fiz) ... ... >>> foo.__defaults__ # this time we set defaults inside the function/method >>> id(foo()) [] [] None 8121376 >>> id(foo(3)) 3 [] None 8121376 >>> id(foo(4, baz="some string")) 4 some string None 8121376 >>> id(foo(foobar=43)) [] [] None 8121376 >>> What we did here in order to have default parameter values is using
the ternary operator for the Namespace, ScopeBecause namespaces are one honking great idea, everybody should know about them. Purpose and UseA namespace is a mapping from names to objects like for example, it is the place where a variable is stored which points to some object. Namespaces are implemented as dictionaries. There is the local, global and built-in namespaces as well as nested namespaces in objects (e.g. with methods). Namespaces support modularity by preventing naming conflicts — this is because we can structure our source code into context related bits and pieces. For instance, the functions Namespaces go hand in hand with scope. A scope is a textual region of source code where a namespace is directly accessible. Directly accessible here means that an unqualified reference to a name attempts to find the name in the current local namespace. Import mattersFor once there is the way semantics are different based on what
All these ways to do imports apply to all names i.e. classes/types, functions/methods... every object... Imports can be confusing for the effect they have on the namespace, but exercising a little care can make things much cleaner. relative vs absolute import
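As a minimal sketch of the difference (the package mypkg and its modules helpers and core are made-up names), the two spellings look like this:
# absolute import -- spells out the full dotted path from the package root
from mypkg.helpers import flatten

# relative imports -- leading dots count upwards from the current module's package
from .helpers import flatten     # sibling module in the same package
from ..core import Engine        # one package level further up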
WRITEME How they workA namespace is a mapping from names to objects. Most namespaces are implemented as Python dictionaries (keys are names, values are memory addresses where objects can be found), but that is normally not noticeable in any way (except for performance), and it may change in the future. Examples of namespaces are:
The important thing to know about namespaces is that there is no
connection between names in different namespaces. For instance, two
different modules may both define a function with the same name without any confusion because users of the modules have to prefix the name with the module name. Namespaces are searched for names inside out i.e. if there is a certain name declared in the module's global namespace, we can reuse the name inside a function while being certain that any other function will get the global name. Of course, we can force the function to use the global name by declaring it with the global keyword. But if we need to use this, then we might be better off using classes and objects anyway... Strictly speaking, references to names in modules are attribute
references i.e. in modname.funcname, modname is a module object and funcname is one of its attributes. Attributes may be read-only or writable. In the latter case,
assignment to attributes is possible i.e. we can write assignments
such as an assignment to a module attribute.
self
Classes and namespaces have special interactions. The only way for a class's method (not to be confused with class method) to access its own variables or functions (as names) is to use a reference to itself. This means that the first argument of a method must be a reference to the instance, which by convention is named self. We can define multiple classes in the same module (and hence the same namespace) and have them share some global data. This is different from other object-oriented programming languages but then one usually gets used to it pretty quickly... LifetimeNamespaces are created at different moments and have different lifetimes. The namespace containing the built-in names is created when the Python interpreter starts up, and is never deleted. The global namespace for a module is created when the module definition is read i.e. when the module is imported. Usually, module namespaces also last until the interpreter quits. The statements executed by the top-level invocation of the
interpreter, either read from a script file or interactively, are
considered part of a module called __main__, so they have their own
global namespace. The built-in names actually also live in a module, the builtins module.
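A quick look at this in the interactive shell (a small sketch; the builtins module is part of the standard library):
>>> import builtins
>>> builtins.len is len  # unqualified built-in names resolve to this module
True
>>> len([1, 2, 3])
3
>>>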
The local namespace for a function is created when the function is called, and deleted when the function returns or raises an exception that is not handled within the function. Actually, forgetting would be a better way to describe what actually happens. Of course, recursive invocations each have their own local namespace. globals(), locals()
The contents of this dictionary should not be modified as changes may not affect the values of local and free variables used by the interpreter.
Free Variable
def foo():
    bar = 42
    baz = [bar + each for each in range(10)]
When a name such as bar is used inside the list comprehension without being bound there, it is a free variable of that code block. If a name cannot be resolved locally, it is looked up in the nearest enclosing scope, here foo's local namespace. ScopeA scope is a textual region where a namespace is directly accessible.
Directly accessible here means that an unqualified reference to a name
(i.e. one not qualified with a dot) attempts to find the name in that namespace. Although scopes are determined statically, they are used dynamically. At any time during execution, there are at least three nested scopes whose namespaces are directly accessible:
If a name is declared global (using the Usually, the local scope references the local names of the (textually) current function. Outside functions, the local scope references the same namespace as the global scope i.e. the module's namespace. Class definitions place yet another namespace in the local scope. When a class/type definition is entered, a new namespace is created, and used as the local scope. All assignments to local variables now go into this new namespace. When a class/type definition is left, a class/type object is created. This is basically a wrapper around the contents of the namespace created by the class/type definition. The original local scope (the one in effect just before the class/type definition was entered) is reinstated, and the class/type object is bound here to the class/type name given in the class/type definition. It is important to realize that scopes are determined textually i.e. the global scope of a function defined in a module is that module's namespace, no matter from where or by what alias the function is called. On the other hand, the actual search for names is done dynamically i.e. at run time. However, the language definition is evolving towards static name resolution (at compile time) therefore we should not rely on dynamic name resolution! In fact, local variables are already determined statically. A special quirk of Python is that assignments always go into the innermost scope. Assignments do not copy data, rather they bind names to objects. The same is true for deletions: the statement Nested ScopeA nested scope is the ability to refer to a variable in an enclosing
definition. For instance, a function defined inside another function
can refer to variables in the outer function (using Local variables, both, read and write from/to the innermost scope. Likewise, global variables read and write from/to the global namespace. nonlocal, globalAs mentioned before, we have two statements at hand that allow us to change the dynamics of scoping and namespaces. The So what is the difference between Below is an example demonstrating how to reference the different
scopes/namespaces, and how def scope_test(): # real code would have docstrings foo = "test foo" def do_local(): foo = "local foo" def do_nonlocal(): nonlocal foo foo = "nonlocal foo" def do_global(): global foo foo = "global foo" do_local() print("After local assignment:", foo) do_nonlocal() print("After nonlocal assignment:", foo) do_global() print("After global assignment:", foo) scope_test() print("In global scope:", foo) The output of the example code is: After local assignment: test foo After nonlocal assignment: nonlocal foo After global assignment: nonlocal foo In global scope: global foo Note how the local assignment (which is default) did not change
Another example of where FunctionA function is a block of statements which returns some value (or None) to a caller. It can be passed zero or more arguments which may be used in the execution of the function body: >>> def some_action(*args, **kwargs): # functions perform "actions/tasks" ... pass ... ... >>> type(some_action) <type 'function'> >>> some_action() # using the call operator on the function object >>> type(some_action.__get__) # functions are descriptors <class 'method-wrapper'> >>> Above is a function with a single statement (pass) in its function
body. If we call it then it does nothing, nothing at all... except
for being an object of type Calling a function works by using the call operator ( Relationship with Descriptors/MethodsA function is to a method what a pip is to an apple... A method is in fact a function. In other words, when we use a method, a function object gets wrapped by a method object i.e. there could not be methods without functions. To support method calls, functions have the returnOne thing any function does is return something, always — a function will never be without a return value/object based on which we can act. If we do not explicitly use the >>> def foo(): ... pass # we do not return anything explicitly ... ... >>> type(foo()) <class 'NoneType'> >>> if foo() is None: ... print("foo returned None") ... ... foo returned None >>> def foo(): ... return # explicit use of return but without an argument ... ... >>> type(foo()) <class 'NoneType'> >>> if foo() is None: ... print("foo returned None") ... else: ... print("foo didn't return None") ... ... foo returned None >>> def foo(): ... return "Hello" # explicit use of return but with an argument ... ... >>> foo() 'Hello' # it really returns something... >>> if foo() is None: ... print("foo returned None") ... ... else: ... print("foo didn't return None") ... ... foo didn't return None >>> type(foo()) <class 'str'> #... a string this time >>> lambdaAn anonymous inline function consisting of a single expression which
is evaluated when the function is called. The syntax to create a
lambda function is lambda parameters: expression. In many cases making use of closures seems a wiser choice, if only for the fact that they can be named and thus reused. Another fact to consider is that with closures we can use statements, something lambda functions do not allow us to do. CofunctionCofunctions are based on subgenerators. WRITEME Function Annotations
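A minimal sketch of what the annotation syntax looks like (the function scale() is made up for illustration; Python stores the annotations but does not enforce them):
>>> def scale(value: float, factor: float = 2.0) -> float:
...     return value * factor
...
...
>>> scale.__annotations__
{'value': <class 'float'>, 'factor': <class 'float'>, 'return': <class 'float'>}
>>> scale(3)
6.0
>>>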
WRITEME Function PassingThere are two reasons why we want to do that:
Callback FunctionA callback is a function provided by the consumer of an API (Application Programming Interface) that the API can then turn around and invoke (calling us back). For example, if we setup a Dr.'s appointment, we can give them our phone number, so they can call us the day before to confirm the appointment. A callback is like that, except instead of just being a phone number, it can be arbitrary instructions like send us an email at this address, and also call our secretary and have her put it in our calendar. Callbacks are often used in situations where an action is asynchronous. If we need to call a function, and immediately continue working, we can not sit there wait for its return value to let us know what happened, so we provide a callback. When the function has finished its asynchronous work it will invoke our callback code with some predetermined arguments (usually some we supply, and some about the status and result of the asynchronous action we requested). If the Dr. is out of the office, or they are still working on the schedule, rather than having us wait on hold until he gets back, which could be several hours, we hang up, and once the appointment has been scheduled, they call us. Python will invoke our callback code with any arguments we supply and the result of its asynchronous computation, once this asynchronous computation has finished executing. Let us look at some example: sa@wks:~$ python >>> def callback(nums): ... """The callback function.""" ... return sum(nums) * 2 ... >>> def another_callback(nums): ... """Yet another callback function.""" ... return sum(nums) * 3 ... >>> def strange_sum(nums, cb): ... """Asynchronous computation. ... ... Returns the sum, if less than 10 else returns the result ... of calling the callback function cb(), which must accepts ... one list argument. ... ... """ ... if sum(nums) > 10: ... print("no callback function used") ... ... else: ... return cb(nums) ... ... >>> print(strange_sum([1, 10], callback)) no callback function used None >>> print(strange_sum([3, 2], another_callback)) 15 >>> print(strange_sum([6, 4, 3], another_callback)) no callback function used None >>> sa@wks:~$ So basically, a callback is a function that we pass as an argument (to
another function, that is); functions themselves are only values in Python
i.e. calling Function HandlerA handler is a asynchronous callback subroutine that can be told to do some work for us and call back when it is done (see Dr.'s appointment example). ClosureWhile usually we use an object to represent state (data) and attach behavior (code) to it, a closure does the opposite: A closure is a function (code) with objects (state) attached to it. PastIn the past a closed function was a function where the binding for its variables was known in advance. Some early languages did not have closed functions so the binding for variables was unknown up until run time (late binding). Programming languages that had both, open functions and closed functions, needed a way to distinguish between the two, so people started referring to the latter as closures. PresentIn Python, as well as in most other modern programming languages, all functions are closed functions in the above sense i.e. there are no variables which we do not know their binding before run time. Because of this the term closure has morphed from a function for which all variables have a known binding to a function that can refer to environments which are no longer active e.g. the namespace of an outer function, even after that function has finished executing. Guess what, every function in Python has this intrinsic capability... In Python, all functions come enabled closures i.e. local names (variables) can bind to names in outer scopes. It is up to us whether or not we exploit the closure capability of a function, thus creating a closure: >>> def foo(): ... counter = 0 ... def bar(): ... nonlocal counter # exploit closure capability ... counter += 1 ... return counter ... ... return bar ... ... >>> c1 = foo() >>> c2 = foo() >>> c1() 1 >>> c1() 2 # preserve state across function calls >>> c1() 3 >>> c2() 1 >>> c2() 2 >>> c1() 4 >>> We use nonlocal to exploit the fact that a function is a closure and that we can bind values to variables of an outer scope. This way our values will exist across function calls and can be used to e.g. built a counter into a function. While the above is useful, so far we are actually not exploiting the
full potential of closures because we do not provide input to either
function, neither the outer (foo()) nor the inner (bar()) one. Below we at least feed an offset into the outer function: >>> def foo(offset=None): # real code would have docstrings ... counter = 0 ... if offset is not None: ... counter += offset ... ... def bar(): ... nonlocal counter ... counter += 1 ... return counter ... ... return bar ... ... >>> c1 = foo() >>> c2 = foo(3) # now with offset >>> c1() 1 >>> c1() 2 >>> c2() 4 >>> c2() 5 >>> c1() 3 >>> The next level of enhancement would be to enable the inner function
(bar()) to accept arguments as well.
__closure__
The variables a closure has closed over live in so-called cell objects, which we can inspect via the function's __closure__ attribute:
>>> c1.__closure__ (<cell at 0x1ab4e88: int object at 0x927120>,) >>> c1.__closure__[0] <cell at 0x1ab4e88: int object at 0x927120> # our counter variable >>> GeneratorA generator is a function that returns an iterator. It looks like a normal function except that it contains one or more yield expressions for producing a series of values. Each yield temporarily suspends processing and remembers the execution state, so that the generator can resume where it left off the next time a value is requested. SubgeneratorWith the introduction of PEP 342 generators got enhanced so that it became possible to do basic coroutines with them. However, PEP 380 will probably bring something even better than that... Subgenerators! And that is not where it ends... PEP 3152 will enable us to build cofunctions based on subgenerators... WRITEME Generator ExpressionAn expression that returns an iterator for lazy evaluation. It looks
like a normal expression followed by a for clause defining a loop variable and a range, plus an optional if clause: >>> sum(number * number for number in range(10)) 285 >>> sum(number * number for number in range(10) if number % 2) 165 >>> type(number * number for number in range(10)) <class 'generator'> >>> However, this did in fact not show the actual nature of generator
expressions i.e. the fact that we actually deal with an iterator. What
happened is that sum() consumed the iterator for us right away. Below we keep a reference to the generator object and advance it manually with next(): >>> mygenerator = (number * number for number in range(10)) >>> mygenerator.__next__ <method-wrapper '__next__' of generator object at 0x20fda00> >>> next(mygenerator) 0 >>> next(mygenerator) 1 >>> next(mygenerator) 4 >>> next(mygenerator) 9 >>> next(mygenerator) 16 >>> next(mygenerator) 25 >>> next(mygenerator) 36 >>> next(mygenerator) 49 >>> next(mygenerator) 64 >>> next(mygenerator) 81 >>> next(mygenerator) Traceback (most recent call last): File "<input>", line 1, in <module> StopIteration >>> Now, as can be seen, the generator hands out one value per next() call and raises StopIteration once it is exhausted. What we have just learned is that generator expressions are the lazy evaluation equivalent of list comprehensions. The upshot from this is that using a generator expression instead of a list comprehension can save both, CPU and RAM. Therefore, if we need to build a list that we do not actually need
(e.g. because we are passing it to Also, note that if we want to create a tuple, list or set from a generator expression, then we can save ourselves a pair of parenthesis: >>> tuple((number * number for number in range(10))) (0, 1, 4, 9, 16, 25, 36, 49, 64, 81) >>> tuple(number * number for number in range(10)) (0, 1, 4, 9, 16, 25, 36, 49, 64, 81) >>> Finally, the only thing left to say is probably that when combined with other Python built-ins such as for example all() and any(), generator expressions really are a very powerful tool because they pack a lot of behavior (source code) in just a single line of code. DecoratorA decorator is a function that wraps another object, usually another function/method or a class/type. It controls input and output to/from the wrapped object. Decorators are merely syntactic sugar and should not be confused with the language-agnostic decorator design pattern in a strict sense as they provide a lot more functionality. Many people even say that the name decorator in Python is a misleading one... A decorator dynamically adds/removes responsibilities and/or capabilities to/from an object. It does so without changing the object's interface which makes decorators most useful when we want to alter responsibilities and/or capabilities of an object without superclassing/subtyping it or without using composition. Another way to put it would be to say that subclassing/subtyping adds behavior at compile time, thus affecting all instances of a class/type. Decorating however adds, and instantly makes available, new behavior at run time, possibly in a way so that it only affects a single instance. This fact makes decorators incredibly versatile and therefore the number of possible use cases is almost infinite. To name a few examples of canonical uses of decorators: they are used for creating class methods and static methods, managing attributes, tracing, setting pre- and postconditions, synchronization, descriptors, logging, tail recursion elimination, memoization, and even improving the writing of decorators themselves. However, every Pythoneer will probably come across a dozen more use cases during his career which, one way or the other, involve the use of decorators. We now know that decorators wrap other objects e.g. other functions.
We also mentioned that the decorator syntax (@) is merely syntactic sugar. This explicit version: orig_function = my_decorator(orig_function) is equivalent to this version which does use the @ decorator syntax: @my_decorator def orig_function(): print("inside orig_function") What happens behind the curtains is that when the compiler (the Python
interpreter) passes over this code, it rebinds the name orig_function to whatever my_decorator(orig_function) returns — the wrapper function. Use Cases, Idiom ClarificationBefore we go into details about what exactly decorators are, how we construct and use them, let us look at some miscellaneous information like for example when and why we might be choosing to use a decorator. Why have Decorators?Python decorators are an interesting example of why syntactic sugar matters. In principle, their introduction in Python 2.4 changed nothing, since they did not provide any new functionality that was not already present in the language. In practice however, their introduction has significantly changed the way we structure source code today:
Decorator vs Decorator PatternDespite the name, Python decorators are not an implementation of the decorator design pattern but instead can be used to implement it if needed. The decorator design pattern is a design pattern used in statically typed object-oriented programming languages to allow responsibilities and/or capabilities to be added to objects at compile time so they can be used later at run time. The name decorator in Python was initially (and rightfully so if I may add) used with some trepidation because there was concern that it would be confused with, or used synonymously with the decorator design pattern. Other names were considered for it, but unluckily the name decorator was chosen. Quick recap: Python decorators can dynamically add/remove responsibilities and/or capabilities to/from an object at run time i.e. they are a higher-level construct compared to the decorator design pattern used in statically typed languages. Of course, we can use a Python decorator to implement the decorator design pattern but as mentioned before, that is an very limited use of it. Most people will agree that a Python decorator, is probably best equated to a macro. Decorator vs AdapterThe decorator design pattern differs from the adapter design pattern in that decorators wrap functions/methods whereas adapters wrap classes/types or instances thereof. The alerted reader might note that since the introduction of class decorators with PEP 3129 this line has been blurred and indeed, class decorators are capable of the same things as adapters are. Target/Decorator/Wrapper ObjectsBefore we start let us highlight something important which we will explain in detail later on. Decorating an object usually involves three entities:
Function/Method DecoratorFunction/method as well as class-based decorators were introduced a long time before class decorators. The main difference is simply that function/method decorators decorate function/method objects whereas class/type decorators are used to decorate class/type objects. Decorator BasicsDecorator as well as wrapper objects are ordinary Python functions. It is the way we use them and how they work together with our target objects that makes them decorators. We already know that functions in Python are objects, just like everything else. That is what we are going to look at first because to understand decorators, we must first understand that functions/methods are objects in Python as this has important implications for understanding how decorators work. Let us have a look: 1 >>> def shout(*args): 2 ... print(args[0].capitalize()) 3 ... 4 ... 5 >>> shout("yes") 6 Yes 7 >>> type(shout) 8 <class 'function'> 9 >>> id(shout) 10 140324592422704 11 >>> scream = shout 12 >>> type(scream) 13 <class 'function'> 14 >>> id(scream) 15 140324592422704 16 >>> shout is scream 17 True 18 >>> scream("yes") 19 Yes 20 >>> del shout 21 >>> shout("yes") 22 Traceback (most recent call last): 23 File "<input>", line 1, in <module> 24 NameError: name 'shout' is not defined 25 >>> scream("yes") 26 Yes 27 >>> As can be seen from line 11, we can bind as many names as we want to
the same object (a function object in this case). After line 11, both
names, shout and scream, reference the very same function object (lines 16 and 17 prove it). Only if all bindings get removed does the object become unreachable and garbage collection kick in, but that is another story altogether ... The point to note here is that functions/methods are objects and behave and can be treated as such. We will see how this is a crucial fact to make decorators work. We can define Functions/Methods inside other Functions/Methods... The other important thing to realize is that we can define functions/methods inside other functions/methods, another thing needed to make decorators work: >>> def talk(): ... def whisper(): ... return "Yes" ... ... print(whisper()) ... ... >>> talk() Yes >>> whisper() Traceback (most recent call last): File "<input>", line 1, in <module> NameError: name 'whisper' is not defined >>> Lesson to be learned here is that functions/methods can be nested but,
if nested, then scoping becomes relevant i.e. we can see that whisper() is only visible inside talk(); calling it from the outside raises a NameError. References to Function/Method Objects are passed around... We have seen that functions/methods are objects and therefore
So, what does that mean? Why is this important with regards to decorators? Well, that means that a function/method can return another function/method (actually any callable works but more on that later). Let us have a look: >>> def baz(ban="doupper"): ... ... def upper(arg="yEs"): ... return arg.upper() ... ... def lower(arg="yEs"): ... return arg.lower() ... ... if ban == "doupper": ... return upper # note the lack of the call operator ... ... else: ... return lower ... ... >>> baz <function baz at 0x2267050> >>> baz() <function upper at 0x7faee8b36f30> >>> baz("something else than doupper") <function lower at 0x2267f30> >>> foo = baz() # bind name foo to function object upper >>> print(foo) <function upper at 0x7faee8b36f30> >>> print(foo()) # call upper and print result YES >>> print(foo("make tall")) # call upper with non-default keyword argument MAKE TALL >>> print(baz("gimme lower")()) # call lower and print result yes >>> print(baz("gimme lower")("Lower THIS")) # call lower with non-default keyword argument lower this >>> The point here is that we can define functions which we then pass
around without directly calling the function object (we are not
making use of the call operator Functions/Methods can be passed as Arguments... Last but not least, the final piece still missing from our puzzle before we finally have all bits needed to create decorators... we are passing function/method objects as arguments (also known as actual parameters) to another function/method: >>> def pre(): ... print("called before") ... ... >>> def post(): ... print("called after") ... ... >>> def eating_functions(before, after): ... before() ... print("do some stuff here...") ... after() ... ... >>> eating_functions(pre, post) called before do some stuff here... called after >>> Handcrafted DecoratorsNow that we know all the ingredients needed to create decorators, let
us just go ahead and do it! We start doing it by hand explicitly using
all the aforementioned target, decorator and wrapper objects.
Ultimately we will switch to the well-known and streamlined syntactic
sugar notation of using the 1 >>> def decorator_function(target_function): 2 ... 3 ... def wrapper_function(): 4 ... print("do stuff before calling target function") 5 ... target_function() # call operator is present 6 ... print("do stuff after calling target function") 7 ... 8 ... return wrapper_function # note the lack of the call operator 9 ... 10 ... 11 >>> def foo(): 12 ... print("I say foo") 13 ... 14 ... 15 >>> def bar(): 16 ... print("I say bar") 17 ... 18 ... 19 >>> foo() 20 I say foo 21 >>> bar() 22 I say bar Nothing unusual up to line 22... we create our decorator function
which contains the wrapper function, just as explained during the
introduction. The functions foo and bar are two plain target functions we are about to decorate. 23 >>> decorated_foo = decorator_function(foo) 24 >>> decorated_foo() 25 do stuff before calling target function 26 I say foo 27 do stuff after calling target function 28 >>> decorated_bar = decorator_function(bar) 29 >>> decorated_bar() 30 do stuff before calling target function 31 I say bar 32 do stuff after calling target function First time we use our decorator is in line 23 — we have
seen this before remember? We will later replace it with the @ decorator notation. 33 >>> foo() 34 I say foo 35 >>> foo = decorator_function(foo) 36 >>> foo() 37 do stuff before calling target function 38 I say foo 39 do stuff after calling target function So while what we did in line 24 is nice, that is actually not
transparent to whoever uses foo() because they would have to know about and call decorated_foo() instead. The fix is trivial though, we just rebind the name foo to the decorated version, which is exactly what line 35 above does. 40 >>> @decorator_function 41 ... def foo(): 42 ... print("I say foo") 43 ... 44 ... 45 >>> foo() 46 do stuff before calling target function 47 I say foo 48 do stuff after calling target function 49 >>> Finally, we made the home run and arrived at line 40 where we use the
well-known decorator notation and declare that we want to use the
decorator from lines 1 to 8 on our That is it, we just covered decorators, hooray! :-] The reader who understood things so far has at this point understood (the mystery of) decorators in Python! Everything now following in this subsection is just about some additional bells and whistles but nothing as substantial compared to what we have learned to far about decorators in Python. Chaining DecoratorsThe next thing we are going to look at is something most people will want to do after they have been using decorators for some time and thus gained enough self-confidence. So, what is that thing? It is using two or more decorators on a single target function/method, commonly known as chaining decorators together. First and most important thing to note about chaining decorators is that order matters i.e. as there is one decorator per line, the outcome will be different whether we write @fuz @fiz def baz(): pass or @fiz @fuz def baz(): pass As usual, all this is best explained with an example, a rather tasty one if I may say so... sandwich anyone?! Let us make a sandwich: >>> def bread(func): # a decorator ... def wrapper(): ... print("</''''''\>") ... func() ... print("<\______/>") ... ... return wrapper ... ... >>> def ingredients(func): # another decorator ... def wrapper(): ... print("/tomatoes/") ... func() ... print(" ~salad~") ... ... return wrapper ... ... >>> def sandwich(filling=" -cheese-"): # target function ... print(filling) ... ... >>> sandwich() -cheese- >>> sandwich = bread(ingredients(sandwich)) # using non-@ notation >>> sandwich() </''''''\> /tomatoes/ -cheese- ~salad~ <\______/> >>> @bread # using @ notation ... @ingredients ... def sandwich(filling=" -cheese-"): ... print(filling) ... ... >>> sandwich() </''''''\> /tomatoes/ -cheese- ~salad~ <\______/> >>> @ingredients # changing order leads to... ... @bread ... def sandwich(filling=" -cheese-"): ... print(filling) ... ... >>> sandwich() /tomatoes/ # this anti-sandwich :-] </''''''\> -cheese- <\______/> ~salad~ >>> Passing Arguments to the Target FunctionThere is one notable thing to all examples we discussed so far — none of them passed arguments to the target function while it was decorated. Let us take our initial example and alter it so that the target function receives an argument while being decorated: 1 >>> def decorator_function(target_function): # real code would have docstrings 2 ... 3 ... def wrapper_function(*args): 4 ... print("do stuff before calling target function") 5 ... target_function(*args) 6 ... print("do stuff after calling target function") 7 ... 8 ... return wrapper_function 9 ... 10 ... 11 >>> @decorator_function 12 ... def foo(*args): 13 ... print("I say foo plus I have a {} eating {}".format(args[0], args[1])) 14 ... 15 ... 16 >>> foo("fish", "tomcat") 17 do stuff before calling target function 18 I say foo plus I have a fish eating tomcat 19 do stuff after calling target function 20 >>> The only thing that needed change on the decorator side can be seen from lines 3 and 5 respectively, where we need to ensure that the wrapper function/method passes along all the arguments to the target function/method. On the side that gets decorated the change is even more obvious...
The target function/method now has an argument list accepting any
number of positional arguments which will be turned into a tuple which
means we can reference individual arguments by index e.g. args[0] and args[1] as done in line 13. Decorating Methods/Alter ArgumentsWe already know that methods are in fact just functions, made pretty
using some self lipstick. That means that with regards to decorating
methods, the only additional thing we need to do is to make sure the wrapper's argument list includes self and passes it on:
1 >>> def mydecorator(target): 2 ... def wrapper(self, offset=0): # same argument list as target function/method 3 ... offset -= 5 4 ... return target(self, offset) 5 ... 6 ... return wrapper 7 ... 8 ... This time we consider Remember what we said at the beginning of this section :... controls
input and output to/from the wrapped object... that of course
includes its argument list which we fiddle with in our One recommendation before we move on, something that trips up lots of people is when they, unintentionally or not, change the argument list which by itself does not necessarily break anything but makes it very likely that at some point someone will get confused and write stuff that is going to blow up. Therefore, let us do everybody a favor and retain argument lists such cases as shown above in lines 2 to 4 and further down, whenever used, such as with line 29. 9 >>> class Person: 10 ... def __init__(self): 11 ... self.bodyweight = None 12 ... # no use of mydecorator 13 ... def print_bodyweight(self, offset=0): 14 ... return self.bodyweight + offset 15 ... 16 ... 17 ... 18 >>> steve = Person() 19 >>> steve.bodyweight = 80 20 >>> steve.print_bodyweight() 21 80 22 >>> steve.print_bodyweight(-5) # Steve lies a bit 23 75 Steve does not get the additional liar's boost of our 24 >>> class DecoratedPerson: 25 ... def __init__(self): 26 ... self.bodyweight = None 27 ... 28 ... @mydecorator # now mydecorator is used 29 ... def print_bodyweight(self, offset=0): # same argument list as decorator 30 ... return self.bodyweight + offset 31 ... 32 ... 33 ... 34 >>> paul = DecoratedPerson() 35 >>> paul.bodyweight = 80 36 >>> paul.bodyweight 37 80 38 >>> paul.print_bodyweight() 39 75 40 >>> paul.print_bodyweight(-5) # Paul beats Steve being a liar 41 70 42 >>> Paul totally does. Signature-Preserving vs Signature-ChangingGo here and here for more information. Function/Method Decorator - Generalized FormWhile the all we have seen so far is fine, it makes sense to have a generalized version of a decorator that we can use for any function/method. We are now going to create a generalized version of a function/method decorator which will include the use of *args and **kwargs: >>> def mydecorator(target): ... def wrapper(*args, **kwargs): # the use of *args and **kwargs is key for a generalized decorator ... print(args) # lookout for the output of this and ... print(kwargs) # this in all examples below ... ... target(*args, **kwargs) ... ... return wrapper ... ... The above decorator is used by all variations of the below which are functions and a method on a class/type. >>> @mydecorator ... def foo(): ... print("Function without arguments.") ... ... >>> foo() () {} Function without arguments. >>> @mydecorator ... def bar(a, b, c): ... print("Function with positional arguments.") ... ... >>> bar(1, "duck", 2) (1, 'duck', 2) {} Function with positional arguments. >>> @mydecorator ... def baz(a, b, c, akey=None): ... print("Function with positional and keyword arguments.") ... ... >>> baz(1, 2, 3, akey="some value") (1, 2, 3) {'akey': 'some value'} Function with positional and keyword arguments. >>> class FooBar: ... def __init__(self): ... self.bodyweight = None ... ... @mydecorator ... def print_bodyweight(self, *args, **kwargs): ... print("Method with positional and keyword arguments.") ... print(self.bodyweight + args[0]) ... ... ... >>> tom = FooBar() >>> tom.bodyweight >>> tom.bodyweight = 90 >>> tom.bodyweight 90 >>> tom.print_bodyweight(10, 3, somekey="some value") (<__main__.FooBar object at 0x26fb7d0>, 10, 3) {'somekey': 'some value'} Method with positional and keyword arguments. 100 >>> tom.bodyweight 90 >>> The above is self-explanatory. The important thing to note is that the same decorator is used for a function with positional and/or keyword arguments as well as a method on a class/type. 
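One refinement worth adding to such a generalized decorator (a sketch using the standard functools module, not something shown in the examples above): wrapping the wrapper with functools.wraps so the decorated function/method keeps its original __name__ and docstring instead of reporting the wrapper's.
>>> import functools
>>> def mydecorator(target):
...     @functools.wraps(target)          # copies __name__, __doc__ etc. onto wrapper
...     def wrapper(*args, **kwargs):
...         return target(*args, **kwargs)
...
...     return wrapper
...
...
>>> @mydecorator
... def greet():
...     """Say hello."""
...     print("hello")
...
...
>>> greet.__name__
'greet'
>>> greet.__doc__
'Say hello.'
>>>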
Passing Arguments to the DecoratorWe have already seen how to pass arguments to the target object, now it is time to go beyond ordinary use of decorators and look at advanced subjects such as how to pass arguments to the decorator itself. Doing so requires some thought because a decorator function usually takes another object as its argument and therefore we cannot pass the decorated function arguments directly to the decorator. ReminderBefore rushing to the solution, let us write a little reminder: 1 >>> def mygenerator(target): 2 ... print("Despite my name, I am just a function.") 3 ... 4 ... def wrapper(): 5 ... print("I am the wrapper function inside mygenerator.") 6 ... target() 7 ... 8 ... return wrapper 9 ... 10 ... 11 >>> def lazy(): 12 ... print("lazy evaluation") 13 ... 14 ... 15 >>> lazy_decorated = mygenerator(lazy) 16 Despite my name, I am just a function. 17 >>> lazy_decorated() 18 I am the wrapper function inside mygenerator. 19 lazy evaluation 20 >>> @mygenerator # @ and () are both call operators 21 ... def lazy(): 22 ... print("lazy evaluation") 23 ... 24 ... 25 Despite my name, I am just a function. 26 >>> lazy() 27 I am the wrapper function inside mygenerator. 28 lazy evaluation 29 >>> The things to remember from this reminder snippet are with lines 15
and 20 as well as 16 and 25. Huh?... Yes, @ is a Call Operator tooLet us now take a closer look at how we can use all the knowledge we have gathered so far and find a way for passing arguments to decorators themselves: 1 >>> def metadecorator(): 2 ... print("metadecorator") 3 ... 4 ... def mydecorator(target): 5 ... print("mydecorator") 6 ... 7 ... def wrapper(): 8 ... print("wrapper") 9 ... return target() 10 ... 11 ... return wrapper 12 ... 13 ... return mydecorator 14 ... 15 ... 16 >>> brandnewdecorator = metadecorator() 17 metadecorator 18 >>> def target_function(): 19 ... print("decorated function") 20 ... 21 ... 22 >>> decorated_target_function = brandnewdecorator(target_function) 23 mydecorator 24 >>> decorated_target_function() 25 wrapper 26 decorated function This example is basically to show what is called when and how often.
First, the metadecorator is called exactly once, namely in line 16. Next, the decorator itself (mydecorator, the object returned by the metadecorator) is also called exactly once, namely in line 22 when we hand it the target function. Last but not least, how often is the wrapper function/method called? It is called every time our decorated object (decorated_target_function) is called, as in line 24. 27 >>> decorated_target_function = metadecorator()(target_function) 28 metadecorator 29 mydecorator 30 >>> decorated_target_function() 31 wrapper 32 decorated function Oh mei, what have we done?! Exactly the same as with lines 16 to 26, just shorter/smarter. However, we can be even smarter, just like Shakespeare used to say, back in the good old days:
Brevity is the soul of wit. 33 >>> @metadecorator() 34 ... def target_function(): 35 ... print("decorated function") 36 ... 37 ... 38 metadecorator 39 mydecorator 40 >>> target_function() 41 wrapper 42 decorated function So now, after two iterations, we ended up with what can be seen in
lines 33 to 42, all thanks to the fact that decorators are in fact just functions and that line 33 actually does two calls... remember, both @ and () are call operators.
Passing Arguments to the Decorator and the Target FunctionLet us now extend on the above example and do what we came here to do, passing arguments to a decorator as well as the target function: 43 >>> def metadecorator(*args, **kwargs): 44 ... print("metadecorator") 45 ... print(args) 46 ... print(kwargs) 47 ... 48 ... def mydecorator(target): 49 ... print("mydecorator") 50 ... print(args) 51 ... print(kwargs) 52 ... 53 ... def wrapper(*args, **kwargs): 54 ... print("wrapper") 55 ... print(args) 56 ... print(kwargs) 57 ... return target(*args, **kwargs) 58 ... 59 ... return wrapper 60 ... 61 ... return mydecorator 62 ... 63 ... 64 >>> @metadecorator("decfoo", "decbar", decoratorkey="decorator value") 65 ... def target_function(funcfoo, funcbar, functionkey="function value"): 66 ... print(funcfoo, funcbar, functionkey) 67 ... 68 ... 69 metadecorator 70 ('decfoo', 'decbar') 71 {'decoratorkey': 'decorator value'} 72 mydecorator 73 ('decfoo', 'decbar') 74 {'decoratorkey': 'decorator value'} 75 >>> target_function("hello", "world", functionkey="what a nice day today!") 76 wrapper 77 ('hello', 'world') 78 {'functionkey': 'what a nice day today!'} 79 hello world what a nice day today! Last but not least, let us make things a bit more dynamic i.e. use input arguments which we either dynamically compute or which we simply grab from some name that is set in some scope we can access: 80 >>> foofiz = "I am the first positional argument for the decorator" 81 >>> justbar = "and I am a cat therfore I say mew mew mew" 82 >>> fizfoo = "London" 83 >>> @metadecorator(foofiz, justbar, decoratorkey=[1, 2, 8]) 84 ... def target_function(funcfoo, funcbar, functionkey={'howuseful': "Molto utile"}): 85 ... print(funcfoo, funcbar, functionkey) 86 ... 87 ... 88 metadecorator 89 ('I am the first positional argument for the decorator', 'and I am a cat therfore I say mew mew mew') 90 {'decoratorkey': [1, 2, 8]} 91 mydecorator 92 ('I am the first positional argument for the decorator', 'and I am a cat therfore I say mew mew mew') 93 {'decoratorkey': [1, 2, 8]} 94 >>> target_function("hello", fizfoo, functionkey="what a nice day today!") 95 wrapper 96 ('hello', 'London') 97 {'functionkey': 'what a nice day today!'} 98 hello London what a nice day today! 99 >>> Nothing much to say here either as the code speaks for itself — we defined a few names in lines 80 to 82 and then used those as basic expressions in lines 83 and 94 instead of what we did before (lines 64 and 75). Finally, again, let us be reminded that decorators are called only
once, namely when Python executes the decorated function's definition (e.g. when the module gets imported). We cannot dynamically set the decorator's arguments afterwards i.e. once we start calling the decorated function. Best Practices
1 >>> from functools import wraps 2 >>> print(functools.wraps.__doc__) 3 Decorator factory to apply update_wrapper() to a wrapper function 4 5 Returns a decorator that invokes update_wrapper() with the decorated 6 function as the wrapper argument and the arguments to wraps() as the 7 remaining arguments. Default arguments are as for update_wrapper(). 8 This is a convenience function to simplify applying partial() to 9 update_wrapper(). 10 >>> def target(): 11 ... print("target") 12 ... 13 ... 14 >>> print(target.__name__) 15 target 16 >>> def mydecorator(target): 17 ... 18 ... def wrapper(): 19 ... print("wrapper") 20 ... return target() 21 ... 22 ... return wrapper 23 ... 24 ... 25 >>> @mydecorator 26 ... def target(): 27 ... print("target") 28 ... 29 ... 30 >>> print(target.__name__) 31 wrapper # Hm, that is not what we want 32 >>> def mydecorator(target): 33 ... 34 ... @wraps(target) 35 ... def wrapper(): 36 ... print("wrapper") 37 ... return target() 38 ... 39 ... return wrapper 40 ... 41 ... 42 >>> @mydecorator 43 ... def target(): 44 ... print("target") 45 ... 46 ... 47 >>> print(target.__name__) 48 target # much better 49 >>> Decorator Use CasesAs mentioned at the beginning of this subsection, the use cases for decorators are almost infinite. By now we are all probably longing to use them already, I am for sure. Now the big question, what can we use decorators for? It all seem so cool and powerful, but a practical example would be great of course. Well, there are <a_ton++> possibilities. Classic use cases include adding/removing responsibilities and/or capabilities to/from an object e.g. extending a function from an external library which we can not modify directly, or, fiddle things for a debug purpose because we do not want to modify things because after all, debugging is temporary. One huge use case is with regards to the DRY (Don't repeat yourself) principle as decorators allow us to extend several functions/methods with a single piece of code (the decorator) without rewriting every single one of those functions/methods: 1 >>> def benchmark(target): 2 ... import time 3 ... 4 ... def wrapper(*args, **kwargs): 5 ... starttime = time.clock() 6 ... result = target(*args, **kwargs) 7 ... print(target.__name__, time.clock() - starttime) 8 ... return result 9 ... 10 ... return wrapper 11 ... 12 ... 13 >>> def counter(target): 14 ... counter.count = 0 15 ... 16 ... def wrapper(*args, **kwargs): 17 ... counter.count += 1 18 ... result = target(*args, **kwargs) 19 ... print("{} has been used {} times".format(target.__name__, counter.count)) 20 ... return result 21 ... 22 ... return wrapper 23 ... 24 ... 25 >>> def logger(target): 26 ... 27 ... def wrapper(*args, **kwargs): 28 ... result = target(*args, **kwargs) 29 ... print("logging function {} with {} and {}".format(target.__name__, args, kwargs)) 30 ... return result 31 ... 32 ... return wrapper 33 ... 34 ... 35 >>> def reverse_string(*args, **kwargs): 36 ... print(args[0][::-1]) 37 ... 38 ... 39 >>> reverse_string("Hello World") 40 dlroW olleH 41 >>> @counter 42 ... @benchmark 43 ... @logger 44 ... def reverse_string(*args, **kwargs): 45 ... print(args[0][::-1]) 46 ... 47 ... 
48 >>> reverse_string("I am on fire.") 49 .erif no ma I 50 logging function reverse_string with ('I am on fire.',) and {} 51 wrapper 0.0 52 wrapper has been used 1 times 53 >>> reverse_string("London town, place to be!") 54 !eb ot ecalp ,nwot nodnoL 55 logging function reverse_string with ('London town, place to be!',) and {} 56 wrapper 0.0 57 wrapper has been used 2 times 58 >>> reverse_string("But then nothing can beat Carinthia!!") 59 !!aihtniraC taeb nac gnihton neht tuB 60 logging function reverse_string with ('But then nothing can beat Carinthia!!',) and {} 61 wrapper 0.0 62 wrapper has been used 3 times 63 >>> reverse_string("foo", akey="a value", anotherkey=(3, 1)) 64 oof 65 logging function reverse_string with ('foo',) and {'akey': 'a value', 'anotherkey': (3, 1)} 66 wrapper 0.0 67 wrapper has been used 4 times As the code is self-explanatory, the only things worth mentioning are that at first we create three decorators, then we define and use a function, at first undecorated (lines 35 to 40) and then we use the very same function but we apply our decorators to it (lines 41 to 67). Now, we did not exactly demonstrate how to adhere the DRY principle so far... let us change that... DRY means reusing... this time our decorators... 68 >>> @counter 69 ... @benchmark 70 ... @logger 71 ... def grab_url(*args, **kwargs): 72 ... try: 73 ... import httplib2 74 ... http = httplib2.Http('/tmp/.mycache') 75 ... response, content = http.request(args[0]) 76 ... 77 ... if response.status == 200: 78 ... print("{} has a size of {} bytes".format(args[0], len(content))) 79 ... 80 ... 81 ... except ImportError: 82 ... print("failed to import httplib2") 83 ... 84 ... 85 ... 86 >>> grab_url("http://google.com") 87 http://google.com has a size of 9358 bytes 88 logging function grab_url with ('http://google.com',) and {} 89 wrapper 90 0.01 91 wrapper has been used 1 times 92 >>> grab_url("/ws/python.html") 93 /ws/python.html has a size of 647697 bytes 94 logging function grab_url with ('/ws/python.html',) and {} 95 wrapper 96 0.01 97 wrapper has been used 2 times 98 >>> grab_url("/ws/python.html") 99 /ws/python.html has a size of 647697 bytes 100 logging function grab_url with ('/ws/python.html',) and {} 101 wrapper 102 0.0 103 wrapper has been used 3 times 104 >>> Note how we can reuse our decorators for another piece of code
(function grab_url). Class-based DecoratorFirst of all, this one is not to be confused with a class decorator: a class decorator is named after what it decorates (a class/type), whereas class-based merely says what the decorator itself is built from — a class decorator may well be function/method-based or class-based, it does not matter. Class-based decorators are quite the same as function/method decorators, the only difference being that this time we construct our decorator using a class/type rather than a function/method. Decorators are just callables, and hence can be a class/type which has
a >>> class Logger: ... def __init__(self, target): # __init__ receives target which is then used by ... self.target = target ... print("logging {}".format(self.target.__name__)) ... ... def __call__(self, *args, **kwargs): # __call__ further down ... print(args) ... print(kwargs) ... return self.target(*args, **kwargs) ... ... ... >>> def compute_squares(a, b): ... return a**b ... ... >>> compute_squares(2, 3) 8 >>> @Logger ... def compute_squares(a, b): ... return a**b ... ... logging compute_squares >>> compute_squares(2, 3) # when we call to Logger.__call__ (2, 3) {} 8 >>> Class DecoratorIntroduced with PEP 3129, this one is not to be confused with a class-based decorator because a class-based decorator is semantically the same as a function/method decorator i.e. decorator objects used to decorate a target object. A class decorator is a decorator object used to decorate a class/type-like target object i.e. its name says that it is a decorator object used to decorate a class/type-like object rather than for example a function/method object etc. A class decorator takes a class/type as its input and returns a class/type. Last note on class decorators is about the fact that we might use them instead of adapters. The main usage for class decorators is either in managing a class/type itself and/or, to intercept and thus control instance creation and instance management. Let us now look at an example where we use a class decorator to create a class registry that contains all classes/types we have: >>> registry = {} >>> def mydecorator(cls): # a function/method decorator will do ... registry[cls._clsid] = cls ... return cls # receives and returns the class/type ... ... >>> class Foo: ... _clsid = "Foo" ... ... >>> registry {} # empty; we did not use the class decorator so far >>> @mydecorator # decorating a class/type ... class Foo: ... _clsid = "Foo" ... ... >>> registry {'Foo': <class '__main__.Foo'>} >>> @mydecorator ... class FooBar: ... _clsid = "FooBar" ... ... >>> registry {'Foo': <class '__main__.Foo'>, 'FooBar': <class '__main__.FooBar'>} >>> Well, nothing much to say as the code is self-explanatory except maybe
pointing out that we used a function/method-based decorator (mydecorator) to decorate a class/type — any callable that receives and returns a class/type will do. Object Oriented ProgrammingThere are four features/characteristics commonly present in OO (and some non-OO) languages: abstraction, encapsulation, inheritance, and polymorphism. ClassA class/type is a datatype, same as a list, tuple, dictionary etc. are
datatypes. Objects derived from one particular class/type are said to
be instances of that class/type. Using sa@wks:~$ python >>> foo = 3 >>> bar = range(3) >>> type(foo) <type 'int'> >>> type(bar) <class 'range'> >>> sa@wks:~$ object, type
class type(object): pass Let us have a closer look: >>> object.__subclasses__()[0] <type 'type'> >>> As we can see, class Foo(object): # real code would have docstrings pass class Foo(): pass class Foo: pass That is, in versions prior to Python 3 we had to use class Foo(object): pass in order to force the creation of new-style classes. Built-in TypesGo here for more information. CoercionIs the implicit conversion of an instance of one type to another
during an operation which involves two arguments of the same type. For example, in 3 + 4.5 each argument is of a different type (one int, one float), and both must be converted to the same type (here float) before the addition can take place. Without coercion, all arguments of even compatible types would have to be normalized to the same type by the programmer e.g. float(3) + 4.5 rather than just 3 + 4.5. Coercion is defined for numeric types as well as booleans. However, there is no implicit conversion between e.g. numbers and strings — a string is an invalid argument to a mathematical function expecting a number: >>> import math >>> math.floor(3.44) 3.0 >>> math.floor(1.22 + 1) 2.0 >>> math.floor("I am a string") Traceback (most recent call last): File "<input>", line 1, in <module> TypeError: a float is required >>> str() vs repr()What is the difference between str() and repr()? BasicsBelow we can see that whenever we evaluate a string at the interactive prompt it is echoed back surrounded with quotes. Other examples are numbers, which are echoed back in their internal representation. That is because Python prints values as they would be written in a source code file or in an interactive interpreter session, not as the user would want to see them. >>> "hello, world" 'hello, world' >>> 10000L 10000L >>> mynumber = 10000L >>> type(mynumber) <type 'long'> >>> mynumber 10000L
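As a side note, long literals such as 10000L are Python 2 syntax — Python 3 only has int, so the snippet above is one of the places where this page still shows Python 2. A rough Python 3 equivalent would look like this (a sketch; for plain integers the echoed and the printed form happen to coincide, the string example still shows the difference):

>>> "hello, world"
'hello, world'
>>> 10000
10000
>>> mynumber = 10000
>>> type(mynumber)
<class 'int'>
>>> mynumber
10000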
If we wanted to have a result that is more pleasing to the human eye,
>>> print("hello, world") hello, world >>> print(10000L) 10000 >>> print(mynumber) 10000 >>> There is a simple explanation for the observed behavior: as a user we
are not interested in whether or not the value 10000 is stored as a plain integer or a long internally — all we care about is the number itself. However, if we were another program, we would be interested in what data type is used to store our value. Therefore, internally we (need to) know about the data type/structure used to store our value, something which our users do not care about.
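A classic illustration of this split is the datetime module — str() produces the human-friendly view, repr() the program-friendly one (output shortened/illustrative, the exact timestamp obviously depends on when it is run):

>>> import datetime
>>> now = datetime.datetime.now()
>>> str(now)                 # what we would show a user
'2012-03-07 16:21:35.917430'
>>> repr(now)                # unambiguous, could be fed back to eval()
'datetime.datetime(2012, 3, 7, 16, 21, 35, 917430)'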
The bottom line is that whenever we show information to the user, we want the readable form, whereas whenever information is meant for another program (or for ourselves while debugging), we want the unambiguous form. What is actually going on here is that two different mechanisms are used to convert values to strings — this is the moment when str() and repr() enter the stage. There is a subtle difference however. The result of repr() is meant to be an unambiguous representation that, where possible, can be passed to eval() in order to recreate the original value, something str() does not guarantee: >>> foo = "2 + 2" >>> foo == eval(repr(foo)) True >>> foo == eval(str(foo)) False >>> __str__() and __repr__()
1 >>> class Topping: # real code would have docstrings 2 ... def __init__(self, name): 3 ... self.name = name 4 ... 5 ... def __repr__(self): 6 ... return '<Topping %r>' % self.name 7 ... 8 ... def __str__(self): 9 ... return self.name 10 ... 11 ... 12 ... 13 >>> my_topping_instance = Topping('cheese') 14 >>> repr(my_topping_instance) 15 "<Topping 'cheese'>" 16 >>> str(my_topping_instance) 17 'cheese' 18 >>> print(my_topping_instance) 19 cheese As can be seen from lines 16 to 19, Also, note how we get different strings depending on whether or not we
use 20 >>> class Topping: 21 ... def __init__(self, name): 22 ... self.name = name 23 ... 24 ... def __repr__(self): 25 ... return '<Topping %r>' % self.name 26 ... 27 ... 28 ... 29 >>> my_topping_instance = Topping('ham') 30 >>> repr(my_topping_instance) 31 "<Topping 'ham'>" 32 >>> str(my_topping_instance) 33 "<Topping 'ham'>" 34 >>> print(my_topping_instance) 35 <Topping 'ham'> 36 >>> __str__() on a ContainerThe For example, what would it mean, say, if Class/Type NamespaceA class/type definition creates a new namespace. Object-Oriented RelationshipsIt is important to know about the basic types of object-oriented relationships. Subclass, SuperclassWhen the objects belonging to class/type A form a subset of the objects belonging to class/type B, class A is called a superclass/supertype of class/type B. Class/Type B is then called a subclass/subtype of class/type A. This is called single inheritance. Its inheritance chain therefore looks like this on paper: A | B # B is subclassing A and like this in code: >>> class A: # real code would have docstrings ... pass ... ... >>> class B(A): ... pass ... ... >>> B.__bases__ (<class '__main__.A'>,) >>> issubclass(B, A) True >>> A more practical example... cat can be a superclass to a subclass tiger. Lion can also be a subclass/subtype of the superclass/supertype cat — both, tigers and lions are cats, sorta, bulky though. Anyhow, let us not get off-topic: shark is not a subclass/subtype of cat since obviously a shark ain't no cat... shark can have a fish class/type as its superclass/supertype. Yeah, yeah, yeah... you smarty pants, the answer is yes! You can have your tigershark as well ;-] New-Style ClassThis is the old name for the flavor of classes/types now used for all
class/type objects. In earlier Python versions, only new-style
classes/types could use Python's newer, versatile features like
__slots__, descriptors, properties, In versions prior to Python 3 we had to use class Foo(object): pass i.e. subclass What is also new and exclusive to new-style classes is that each
new-style class keeps a list of weak references to its immediate
subclasses. The >>> import numbers >>> numbers.Complex.__subclasses__() [<class 'numbers.Real'>] >>> numbers.Real.__subclasses__() [<class 'numbers.Rational'>] >>> numbers.Rational.__subclasses__() [<class 'numbers.Integral'>] >>> numbers.Integral.__subclasses__() [] This nicely shows the numbers hierarchy i.e. >>> int.__base__ <type 'object'> >>> int.__subclasses__() [<type 'bool'>] >>> bool.__subclasses__() [] >>> bool.__base__ <type 'int'> >>> object.__subclasses__()[:3] [<type 'type'>, <type 'weakref'>, <type 'weakcallableproxy'>] >>> InstanceThis is nothing special to Python but rather a general term with OOP (Object-Oriented Programming). An instance is an occurrence or a copy of an object, whether currently executing or not. Instances of a class share the same set of attributes, yet will typically differ in what those attributes contain. For example, a class The description of the class/type would itemize such attributes and define the operations or actions relevant for the class, such as increase salary or change telephone number. __slots__By default, instances of classes have a dictionary for attribute storage. This wastes space for objects having very few instance variables. The space consumption can become acute when creating large numbers of instances. The default can be overridden by defining Though popular, the technique is somewhat tricky to get right and is best reserved for rare cases where there are large numbers of instances in a memory-critical application. ExampleLet us have a quick look at how it works. First without using
>>> class Baz(list): ... pass ... ... >>> foo = Baz() >>> foo.__dict__ {} >>> foo.color = "red" >>> foo.weight = 78 >>> foo.__dict__ {'color': 'red', 'weight': 78} >>> As can be seen, we subclass the built-in list type so we automatically
get a Now let us extend on that example and provide a >>> class Baz(list): ... __slots__ = ['color', 'species'] # we only allow those attributes ... pass ... ... >>> foo = Baz() >>> foo.color = "purple" >>> foo.species = "frog" >>> foo.weight = 4 # throws exception because it is not allowed Traceback (most recent call last): File "<input>", line 1, in <module> AttributeError: 'Baz' object has no attribute 'weight' >>> foo.__dict__ # no __dict__ gets created Traceback (most recent call last): File "<input>", line 1, in <module> AttributeError: 'Baz' object has no attribute '__dict__' >>> foo.__slots__ # can be a string, iterable, or sequence of strings ['color', 'species'] >>> foo.__slots__[0] # we choose a list so indexes work 'color' >>> foo.color 'purple' >>> foo.species 'frog' >>> As mentioned, the purpose and recommended use of Class Variable vs Instance VariableFirst of all, let us clarify one thing: using the term static variable and class variable interchangeably in Python is wrong because there are no static variables in Python — at least not in the sense of C/C++/Java static variables. The closest thing to a static variable as known from C/C++/Java is a class variable — Python has class variables. Python has static methods but those are a different can of worms altogether... A code example says more than a thousand words so let us dive right in: >>> class FooBar: # real code would have docstrings ... myvar = "foo" ... ... def __init__(self, bar): ... self.myinstancevariable = bar ... ... >>> instance0 = FooBar() >>> instance1 = FooBar() >>> instance0.myvar 'foo' >>> instance1.myvar 'foo' >>> FooBar.myvar 'foo' >>> instance0.__class__.myvar 'foo' >>> At first we define a class containing a class variable Next we create two instances and check the contents of their instance
variables. We also check, via the instance, what the contents for the
class variable is when we go back from >>> instance0.myvar = "shadow class variable" >>> instance0.myvar 'shadow class variable' >>> FooBar.myvar 'foo' >>> instance0.__class__.myvar 'foo' >>> instance1.myvar 'foo' >>> What happens here is that we shadow the class variable >>> FooBar.myvar = "override class variable" >>> FooBar.myvar 'override class variable' >>> instance0.__class__.myvar 'override class variable' >>> instance1.__class__.myvar 'override class variable' >>> instance1.myvar 'override class variable' >>> instance0.myvar 'shadow class variable' >>> And now we override the class variable The important thing to remember with regards to class variables is that class variables are shared across all instances of the class and unless shadowed on the instance, are the same across all instances. MetaclassMetaclasses are classes whose instances are classes... we can think of a metaclass as blueprint used to build other/enhanced objects. Usually the metaclass defines procedures to create instances of itself and things like for example static methods. A metaclass can implement a design pattern or describe a shorthand for particular kinds of classes. Metaclasses are often used to describe frameworks. In languages such as Python, Ruby, Java, and Smalltalk, a class is also an object (in Python everything is an object remember?) thus each class is an instance of the unique metaclass, the one built-in with the language — for example, in Python each object and class is an instance of object. Abstract Class, Abstract SuperclassAn abstract class, abstract superclass, or more commonly referred to as ABC (Abstract Base Class) is a class that cannot be instantiated i.e. such a class is only meaningful if the programming language in question supports inheritance (which is the case with Python). An abstract (base) class is also always automatically a superclass from which subclasses are derived. Abstract superclasses are often used to represent abstract concepts or entities e.g. a database interface. The incomplete features of the abstract superclass are then shared by a group of subclasses which add different variations of missing pieces. For example, different database backends have the same basic set of features inherited from the same superclass (the abstract base class) but each one adds individual features based on one particular database backend. Also, note that an ABC is not automatically a metaclass itself since we usually use class MyABC(metaclass=ABCMeta): pass to create ourselves an ABC which we then subclass i.e. we do not
subclass MethodMethods are in fact just functions. Functions themselves are descriptors, so-called non-data descriptors to be precise. Methods come in various sorts such as bound and unbound methods as well as instance and class methods and a few other kinds of methods which are even more arcane in nature. Rather than functions, methods are defined inside a class body i.e. they become attributes of a class which means they live inside the class's namespace. A simple example of a class containing two methods is shown below: class Foo: """Computes Gauss variations. Nostra etiam feugiat, vitae justo. Aliquam proin urna dapibus ut, sed imperdiet morbi. """ def __init__(self, barfoo): """Initialize instances of Foo.""" pass def show_path(self, bazfoo): """Does baz on Foo.""" pass A method is a function that takes a class's instance (self) as its first parameter. This is how method calls work — a method object is just a function wrapper attached to another object which calls the function object and thereby provides information about the instance and class it was called on. Methods/Functions are DescriptorsWe now know that methods are in fact just functions. Another fact
worth noting is that functions/methods are descriptors (non-data descriptors to be more precise; they have a __get__() but no __set__() method). The descriptor protocol specifies that during attribute lookup/reference, if an attribute name (also known as identifier) resolves to a class attribute and this attribute has a __get__() method, then that method gets called rather than the attribute simply being handed back. Method Calls - Class vs InstanceThe return value of the __get__() call becomes the result of the attribute lookup. This mechanism is what provides support for dynamically computed attributes — this works both ways i.e. for lookups and for setting attributes. Also, as indicated already but now in more detail, the function type implements the descriptor protocol which means that
With the instance object (if any) and class object available, it is easy to create a method object that wraps the function object. This object is itself a callable object — calling it mostly implicitly injects the instance as the first parameter in the argument list and returns the result of calling the wrapped function object. We just said mostly. That is because there are unbound methods, static methods and class methods etc. which do not implicitly inject the instance as first argument into the argument list. selfWhen a method is called, it receives the instance object on which it
is called (called self by convention) as its first argument. This
first parameter is also what binds a method to a particular instance
of a class/type. The Bound/Unbound MethodIn Python, all functions (and as such methods) are objects which can be passed around just like any other object. The key difference between bound and unbound methods is that a bound method is associated with a particular instance of a class/type while an unbound method is not. In other words: when we do an attribute access on an instance, we get a bound method whereas when we do an attribute access on a class, an unbound method is what we get. Bound MethodEvery time a bound method is called, the instance is passed to it as
its first parameter ( 1 >>> class Cat: 2 ... noise = "miau" 3 ... # method: function defined inside class body 4 ... def make_noise(self): # first parameter named self 5 ... print(self.noise) 6 ... 7 ... 8 ... 9 >>> somecat = Cat() 10 >>> somecat.make_noise() # bound method i.e. no need to pass self 11 miau 12 >>> somecat.make_noise # attribute access on instance somecat 13 <bound method Cat.make_noise of <__main__.Cat object at 0x1a24dd0>> 14 >>> quacking_duck = somecat.make_noise # note the lack of the call operator 15 >>> quacking_duck() 16 miau The key point here is with line 10. A bound method of instance The body of the method (line 5) can access the instance attributes as
attributes of Lines 12 to 16 really just show what we already know,
functions/methods are objects i.e. we can bind to them like we can
bind to any other object. Thus, even though the method call in line 15
looks exactly like a function call, the variable/name Unbound MethodThe concept of unbound methods has been removed from the language in Python 3 meaning that when referencing a method as a class attribute, we now get a function in return: >>> import platform >>> platform.python_version() '3.2.1rc1' >>> class Foo: ... pass ... ... >>> def bar(self): ... pass ... ... >>> bar <function bar at 0x2bf41e8> >>> Foo.bar = bar >>> Foo.bar <function bar at 0x2bf41e8> # function >>> Foo.bar is bar True >>> id(Foo.bar) 46088680 >>> id(bar) 46088680 >>> As said, this was different with Python versions prior to Python 3: >>> import platform >>> platform.python_version() '2.6.7' >>> class Foo: ... pass ... ... >>> def bar(self): ... pass ... ... >>> bar <function bar at 0x1a7a5f0> >>> Foo.bar = bar >>> Foo.bar <unbound method Foo.bar> # unbound method >>> Foo.bar is bar False >>> id(Foo.bar) 25218176 >>> id(bar) 27764208 >>> Even though there are no more unbound methods in Python 3, we should
probably know about them: An unbound method is not associated with a
particular instance which means that there is no implicit It is fair to say that unbound methods were used far less frequently than bound methods — most of the time bound methods were just what we needed in Python 2, plus their behavior seemed more intuitive to most people. However, unbound methods certainly were useful in some cases e.g. when we needed to access overridden methods higher up in the inheritance chain. Extending on the example from above: We have a 17 >>> Cat.make_noise() # lacking explicit argument (instance) 18 Traceback (most recent call last): 19 File "<input>", line 1, in <module> 20 TypeError: make_noise() takes exactly 1 argument (0 given) 21 >>> Cat.make_noise(somecat) 22 miau 23 >>> Cat.make_noise 24 <function make_noise at 0x1abfa68> # Python 3, a function 25 >>> Cat.__dict__['make_noise'] 26 <function make_noise at 0x1abfa68> 27 >>> somename = Cat.make_noise 28 >>> somename(somecat) 29 miau 30 >>> As can be seen in line 21, as opposed to line 10 from above, we now
call the method on the class/type rather than on the instance i.e. the
method is not associated with a particular instance. Earlier we said
that with bound methods the instance the method is called on is passed
implicitly using What is really happening in case we reference a method via some class/type object can be seen in lines 23 to 26 — Python 2 returns an unbound method which in fact is just a wrapper to the function object, or, in Python 3, the function object is returned right away as shown above. In Python 2, this wrapper, in addition to the function it wraps, takes
additional read-only attributes: Lines 27 to 29 show that we can call a callable class attribute, be it
an unbound method or just a method, the same way we call its Static MethodA static method does not receive an implicit first argument (such as
for example This means the special behavior/constraints (such as present with ordinary, bound and unbound methods) with regards to the first parameter does not affect us. We can call static methods on a class or any instance thereof, and no implicit special behavior/constraint will be involved in doing so. A static method may have any signature — from no formal parameter at all up to many, and the first parameter, if any, plays no special role. Basically, a static method is like an ordinary function except that it is bound to a class object. Static methods are a way of putting behavior (i.e. code e.g. a function) into a class (e.g. because it logically belongs there), while indicating that it does not require access to the class. For example, a static method may be used for processing class attributes that span instances. So what is all this fuzz about? What are the use cases that have real
practical value? Well, for example, sometimes we just do not want our
class to automatically make a function a method, even if we put the
function inside the class body. That is where the >>> class Foo: ... @staticmethod ... def bar(): # no first parameter ... pass ... ... ... >>> baz = Foo() >>> Foo.__dict__['bar'].__get__(baz, Foo) <function bar at 0x1ab8050> # a function rather than a bound method >>> Last but not least, let us note that a static method of course has nothing to do with what is wrongly labeled static variables. I just mention this totally unrelated fact here because some people think there is a connection between the two and the concept of static variables exists... neither one is the case plus there is no such thing as static variables (see link above). Class MethodA class method is a method that is called on a class or any instance
thereof. With class methods Python implicitly binds the first
parameter (named In other words: rather than receiving the instance as its first argument, a class method receives the class as its implicit first argument. Class methods are most useful for when we need to have methods that are not specific to any particular instance of a class, but still involve the class in some way. >>> class Foo: # real code would have docstrings ... @classmethod ... def bar(cls): # first parameter named cls ... pass ... ... ... >>> baz = Foo() >>> Foo.__dict__['bar'].__get__(baz, Foo) <bound method type.bar of <class '__main__.Foo'>> # on the class >>> baz.bar <bound method type.bar of <class '__main__.Foo'>> # on an instance of the class >>> In both attribute reference cases we can see that Python now implicitly references the class rather than a) nothing (as seen with the static method example) or b) the instance of the class (as seen with the bound method example). Class Method vs Static MethodThe difference between a class method and a static method is the
presence/lack of the implicit first argument — a static method has
none whereas a class method receives the class as its first argument
(named cls). Class Method vs Bound MethodThe difference between a class method and a bound method is with the implicit first argument — a bound method receives the instance as its first argument (named self) whereas a class method receives the class/type as its implicit first argument (named cls). Abstract Method
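A minimal sketch of the idea (the class names Shape and Circle are made up for illustration): an abstract method is declared with @abstractmethod on an ABC, and a subclass must override it before it can be instantiated:

>>> from abc import ABCMeta, abstractmethod
>>> class Shape(metaclass=ABCMeta):
...     @abstractmethod
...     def area(self):
...         pass
...
...
>>> Shape()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: Can't instantiate abstract class Shape with abstract methods area
>>> class Circle(Shape):
...     def __init__(self, radius):
...         self.radius = radius
...     def area(self):          # overriding the abstract method makes Circle concrete
...         return 3.14159 * self.radius ** 2
...
...
>>> Circle(2).area()
12.56636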
WRITEME Special MethodsThose are methods which are implicitly called by Python itself in
order to execute a certain operation on an object of particular type,
such as addition or initialization for example. Special methods have
names starting and ending with double underscores e.g. __init__(), __add__(), __repr__() or __len__(). Those are just a few of the many special methods with which we can determine the semantics of objects in Python. Typical use cases for special methods involve: initializing instances, overloading operators, providing string representations, customizing attribute access, emulating containers and iteration, and supporting context managers (the with statement).
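To make this a bit more tangible, here is a minimal sketch (the class Basket and its attributes are made up for illustration) showing how a few special methods hook our own type into built-in syntax and functions:

>>> class Basket:
...     def __init__(self, items=None):          # called on instantiation
...         self.items = list(items or [])
...     def __repr__(self):                      # unambiguous, for programmers
...         return "Basket({!r})".format(self.items)
...     def __len__(self):                       # len(some_basket) ends up here
...         return len(self.items)
...     def __add__(self, other):                # some_basket + another_basket ends up here
...         return Basket(self.items + other.items)
...
...
>>> a = Basket(["egg", "spam"])
>>> b = Basket(["ham"])
>>> len(a + b)
3
>>> a
Basket(['egg', 'spam'])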
Method StubPlease go here for further information. AbstractionWRITEME PolymorphismPolymorphism (also known as duck-typing) enables one common interface for many implementations, and for objects to act differently under different circumstances. WRITEME EncapsulationWRITEME InheritanceIf we do Foo.name i.e. access the attribute However, although MRO makes it sound as if this is only happening for methods, it is important to understand that the name MRO exists for historical reasons only and that in fact every attribute lookup (not just methods) works in MRO order. Maybe nowadays we would choose the name ARO (Attribute Resolution Order) instead of MRO, as ARO would express correctly what is really going on these days. Inheritance ChainTwo possibilities exist with regards to inheritance — single and multiple inheritance. The main difference is that with single inheritance, the most complex inheritance chain we can end up with is a tree whereas by using multiple inheritance we can end up with a so-called DAG (Directed Acyclic Graph). Single InheritanceSingle inheritance is the trivial case of >>> class Foo: ... pass ... ... >>> class Bar(Foo): ... pass ... ... >>> class Fiz(Bar): ... pass ... ... >>> Fiz.__bases__ (<class '__main__.Bar'>,) >>> Bar.__bases__ (<class '__main__.Foo'>,) >>> Foo.__bases__ (<class 'object'>,) >>> Every class/type only ever has a single superclass/supertype. At the
beginning is object, next our first class/type ( Multiple InheritanceThe only thing in addition to single inheritance that multiple inheritance allows for is one class/type to have more than one superclass/supertype. If we extend on the example of single inheritance from above: >>> class Foo: ... pass ... ... >>> class Bar(Foo): ... pass ... ... >>> class Buz(Foo): ... pass ... ... >>> class Fiz(Bar, Buz): ... pass ... ... >>> Fiz.__bases__ (<class '__main__.Bar'>, <class '__main__.Buz'>) >>> Buz.__bases__ (<class '__main__.Foo'>,) >>> Bar.__bases__ (<class '__main__.Foo'>,) >>> Foo.__bases__ (<class 'object'>,) >>> As can be seen, we added object | Foo / \ Bar Buz \ / Fiz As we will see, this is the reason why Python and many other programming languages introduced the so-called C3 MRO — it can cope with diamond-shaped relationships which can form within a DAG (Directed Acyclic Graph). Method/Attribute Resolution OrderNowadays every object is a subclass/subtype of object, something which caused problems with attribute lookup the way it was done before C3 MRO was implemented. ProblemIn the classic object model, attribute access among direct and indirect superclasses/supertypes proceeds left-first then depth-first. While very simple, this rule caused undesired results when multiple superclasses/supertypes inherited from the same superclass/supertype (diamond-shaped inheritance chain) and shadow/override different subsets of the common superclass/supertype attribute(s). In this case, the shadowed/overridden attributes of the rightmost superclass/supertype would be hidden in the lookup process. For example, if A subclasses B and C in that order, and B and C each subclass D, the classic lookup process proceeds in the conceptual order A, B, D, C, D. Since Python looked up D before C, any attribute defined in class/type D, even if class/type C overrides it, is therefore found only in the superclass/supertype D version. SolutionIn the new-style object model all classes/types directly or indirectly subclass object. Therefore, any multiple inheritance chain might give us diamond-shaped inheritance graphs and thus the classic MRO would often produce problems as well. That is why Python's new-style object model changed the MRO to left-to-right then depth-first order and it would also leave out any but the rightmost occurrence of any given class/type — using super() is just one example where this becomes an important fact as we will see later on. Extending on the example from the previous paragraph, D is now assumed
to be a new-style class i.e. it is subclassing
___mro___Here is the code to prove that what we just discussed is true: >>> class D: ... pass ... ... >>> class B(D): ... pass ... ... >>> class C(D): ... pass ... ... >>> class A(B, C): ... pass ... ... >>> D.__bases__ (<class 'object'>,) # D is a new-style class/type >>> A.__bases__ (<class '__main__.B'>, <class '__main__.C'>) >>> A.__mro__ (<class '__main__.A'>, # left-to-right then depth-first <class '__main__.B'>, <class '__main__.C'>, <class '__main__.D'>, <class 'object'>) >>> Now what is Shadowing/Overriding AttributesIn Python we can shadow/override any type of attribute, whether it is a callable or just a simple literal. MRO is important when it comes to shadowing/overriding attributes during inheritance. Examples and explanations are gives here and here. Delegating Calls to Superclass/SupertypeQuite often we want to delegate calls to a callable (e.g. a method) from a subclass/subtype to its superclass/supertype because the very same callable might have been shadowed/overridden in the subclass/subtype. Doing so is made easy using an unbound method: >>> class Foo: ... def greet(self, *args, **kwargs): ... print("Hello {}!".format(args[0])) ... ... ... >>> class Bar(Foo): ... def greet(self, *args, **kwargs): ... print("inside Bar.greet") ... Foo.greet(self, *args, **kwargs) # using an unbound method with explicit self ... ... ... >>> baz = Bar() >>> baz.greet("World") inside Bar.greet Hello World! # we delegated the call from Bar.greet to Foo.greet >>> Bar.__bases__ (<class '__main__.Foo'>,) >>> Bar.__mro__ (<class '__main__.Bar'>, <class '__main__.Foo'>, <class 'object'>) >>> With this example we used an unbound method in For example, delegating from a subclass/subtype's >>> class Foo: ... def __init__(self, *args, **kwargs): ... self.name = kwargs['name'] ... >>> class Bar(Foo): ... def __init__(self, *args, **kwargs): ... self.type = kwargs['type'] ... ... ... >>> Bar.__mro__ (<class '__main__.Bar'>, <class '__main__.Foo'>, <class 'object'>) >>> fiz = Bar(type="cat", name="mister") >>> fiz.__dict__ {'type': 'cat'} # poor cat, it does not even have a name >>> What happened? What is the problem? Well, the problem is that although
Ideally, in nine out of ten cases, what we want is that when we
instantiate What we could do is to simply use our unbound method trick again and
rewrite >>> class Bar(Foo): ... def __init__(self, *args, **kwargs): ... Foo.__init__(self, *args, **kwargs) ... self.type = kwargs['type'] ... ... ... >>> fiz = Bar(type="cat", name="mister") >>> fiz.__dict__ {'name': 'mister', 'type': 'cat'} # mister! :-] >>> superNow, using unbound methods like this works but is not quite versatile
plus hardcoding class/type names certainly is unpythonic. >>> class Bar(Foo): ... def __init__(self, *args, **kwargs): ... super().__init__(*args, **kwargs) # no more hardcoding class/type names ... self.type = kwargs['type'] ... ... ... >>> foz = Bar(type="cat", name="mister") >>> foz.__dict__ {'name': 'mister', 'type': 'cat'} >>> What happened? Well, super() handed us a proxy object that delegated the __init__() call to the next class/type in Bar's MRO (here Foo) — no class/type name needed. There are two typical use cases for super(): in a single inheritance chain it lets us refer to the superclass/supertype without naming it explicitly, and with multiple inheritance it enables cooperative method calls that follow the MRO so that shared superclasses/supertypes are only visited once.
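The second use case deserves a quick sketch of its own (class names made up): with super() every class/type in a diamond-shaped hierarchy has its __init__() run exactly once, in MRO order — something that is easy to get wrong with hardcoded class/type names:

>>> class Base:
...     def __init__(self, **kwargs):
...         print("Base")
...
>>> class Left(Base):
...     def __init__(self, **kwargs):
...         print("Left")
...         super().__init__(**kwargs)       # next in the MRO, not necessarily Base
...
>>> class Right(Base):
...     def __init__(self, **kwargs):
...         print("Right")
...         super().__init__(**kwargs)
...
>>> class Child(Left, Right):
...     def __init__(self, **kwargs):
...         print("Child")
...         super().__init__(**kwargs)
...
>>> somechild = Child()
Child
Left
Right
Base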
Using super() is recommended In general it is considered good practice of always doing calls to
superclasses/supertypes using Using CompositionThis is basically about combining basic data types into more complex ones — assembling features/functionality, very much like what mixins can be used for. Composition creates objects often referred to as having a has a relationship e.g. car has a gearbox — this is different to what inheritance does. Inheritance is the process of adding details to a basic data type in order to create a more specific one i.e. inheritance creates is a kind of relationships e.g. car is a kind of vehicle. Whether to chose inheritance or composition depends on particular use case at hand. Inheritance is a more commonly-understood idea. Asking a typical developer about composition will most likely result in some mumbling and deflection, whereas the same question about inheritance will probably reveal a whole host of opinions and experience. That is not to say that composition is some sort of dark art, but simply that it is less commonly talked about and even less often used. As more of a sidenote than anything else, inheritance can be speedier in some compiled languages due to some compile-time optimizations vs. the dynamic lookup that composition requires. Of course, in Java we cannot escape the dynamic method lookup, and in Python it is all a moot point. In the end, both, inheritance and composition, cater to the same problem domain (building complex data types from/with simpler ones) but achieve such through different ways. Composition creates objects often referred to as having a has a relationship (car has a gearbox) whereas inheritance is the process of adding detail to a general data type to create a more specific data type i.e. create a is a kind of relationship e.g. car is a kind of vehicle. The basic messages is that, as with many things in life, there is not better as both ways depend on the use case in hand and more often than not, personal preference of the individual programmer or the team writing some piece of code. ExampleBefore talking about the consequences of inheritance vs composition, some simple examples of both are needed. Here is a simplistic example of object composition: class UserDetails: """A class that compiles blabla.""" email = "[email protected]" homepage = "" class User: """A class used to store blabla.""" first_name = "Markus" last_name = "Gattol" details = UserDetails() Obviously these are not very useful classes, but the essential point
is that we have created a namespace for each An example of the same objects, modified to use inheritance might look as follows: class User: """A class in charge of blabla.""" first_name = "Markus" last_name = "Gattol" class UserDetails(User): """A class blabla.""" email = "[email protected]" homepage = "" Now we have a flat namespace, which contains all of the attributes
from both of the objects. In the case of any collisions, Python will
take the attribute from ConsequencesFrom a pure programming language complexity standpoint, object composition is the simpler of the two methods. In fact, the word object may not even apply here, as it is possible to achieve this type of composition using C structures, which are clearly not objects in the sense that we think of them today. Another immediate thing to notice is that with composition, there is no possibility of namespace clashes. There is no need to determine which attribute should win, between the object and the composed object, as each attribute remains readily available. The composed object has no knowledge about its containing class, so it
can completely encapsulate its particular functionality. This also
means that it cannot make any assumptions about its containing class,
and the entire scheme can be considered less brittle. Change an
attribute or method on That being said, object inheritance is arguably more straightforward.
After all, an e-mail address is not a logical property of some
real-world object called ConclusionMost people using both find object composition to be desirable. The reasons seems to be that many projects get incredibly (and unnecessarily) confusing due to complicated inheritance hierarchies. However, there are some cases where inheritance simply makes more sense logically and programmatically. These are typically the cases where an object has been broken into so many subcomponents that it does not make sense any more as an object itself. The Django web-framework for example has an interesting way of dealing with model inheritance. It uses composition behind the scenes, and then flattens the namespace according to typical inheritance rules. However, this is done in a way so that composition still exists under the covers which still allows composition to be used if needed/desired. The answer is not going to be composition always or inheritance always or even any combination of the two, always, or even something similar but not quite the same such as traits. Each has its drawbacks and advantages and those should be considered before choosing any approach. More research needs to be done on the hybrid approaches, as well, because things like what Django is doing will provide more answers to more people than traditional approaches. ComprehensionRoughly speaking, comprehension denotes mathematical notation used to represent infinite mathematical constructs. Comprehensions, with regards to programming languages, are most closely associated with Haskell, but are available in other languages such as Python, Scheme and Common Lisp as well. One type of comprehension found in Python is list comprehension. List comprehension is greedy evaluation i.e. it computes the entire result all at once, as a list. Generator expressions on the other hand do lazy evaluation i.e. they computes one value at a time, when needed, as individual values. This is especially useful for long/big sequences where the computed list is just an intermediate step and not the final result. Below are some examples of the types of comprehensions found in Python: 1 >>> [n * n for n in range(5)] # list comprehension, greedy evaluation 2 [0, 1, 4, 9, 16] 3 >>> {n * n for n in range(5)} # set comprehension 4 {0, 1, 4, 16, 9} 5 >>> {n: n * n for n in range(5)} # dictionary comprehension 6 {0: 0, 1: 1, 2: 4, 3: 9, 4: 16} 7 >>> mygenerator = (n * n for n in range(5)) # generator expression, lazy evaluation 8 >>> mygenerator.__next__() 9 0 10 >>> mygenerator.__next__() 11 1 12 >>> mygenerator.__next__() 13 4 14 >>> mygenerator.__next__() 15 9 16 >>> mygenerator.__next__() 17 16 18 >>> mygenerator.__next__() 19 Traceback (most recent call last): 20 File "<stdin>", line 1, in <module> 21 StopIteration 22 >>> As we can see, the generator expression returns an iterator for lazy
evaluation (lines 8 to 17). Once the iterator is exhausted because there is no more data available from the stream, any further call to __next__() raises a StopIteration exception (lines 18 to 21). List ComprehensionList comprehension is a way of creating lists from sequences e.g. other lists. In general, list comprehensions work quite similarly to for loops. Common applications are to make lists where each item in the list is the result of some operations applied to each item of a sequence, or, to create a subsequence of those items that satisfy a certain condition. >>> somecontainer = [number * number for number in range(10)] >>> print(somecontainer) [0, 1, 4, 9, 16, 25, 36, 49, 64, 81] >>> sum(somecontainer) 285 >>> type(somecontainer) <class 'list'> >>> somecontainer = [number * number for number in range(10) if number % 2] >>> print(somecontainer) [1, 9, 25, 49, 81] >>> In case we are only interested in the sum i.e. we do not need the intermediate list of squares, it is smarter to use a generator expression as it lazily produces values, one at a time.
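For example, the very same computation without ever building the intermediate list (note that a generator expression passed as the sole argument to a function does not need its own set of parentheses):

>>> sum(number * number for number in range(10))
285
>>> sum(number * number for number in range(10) if number % 2)
165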
Set ComprehensionPython 3 introduces set comprehensions. Similar in form to list comprehensions, set comprehensions generate Python sets instead of lists: >>> {char for char in "ABCDABCD"} {'A', 'C', 'B', 'D'} >>> myset = {char for char in "ABCDABCD" if char not in "AD"} >>> print(myset) {'C', 'B'} >>> type(myset) <class 'set'> >>> Dictionary ComprehensionAlso with Python 3, we got another nifty type of comprehension, namely dictionary comprehension. Dict comprehensions can be used to create dictionaries from arbitrary key/value expressions: >>> {key: value for key, value in enumerate("ABCD")} {0: 'A', 1: 'B', 2: 'C', 3: 'D'} >>> somecontainer = {key: value for key, value in enumerate("ABCD") if value not in "CB"} >>> print(somecontainer) {0: 'A', 3: 'D'} >>> type(somecontainer) <class 'dict'> >>> {key: pow(2, key) for key in (1, 2, 4, 6)} {1: 2, 2: 4, 4: 16, 6: 64} >>> Here is a trick with dictionary comprehensions that might be useful someday — swapping the keys and values of a dictionary: >>> somecontainer = {'a': 1, 'b': 3, 'c': "foo"} >>> somecontainer.keys() dict_keys(['a', 'c', 'b']) >>> somecontainer.values() dict_values([1, 'foo', 3]) >>> somecontainer = {value: key for key, value in somecontainer.items()} >>> somecontainer.keys() dict_keys([3, 1, 'foo']) >>> somecontainer.values() dict_values(['b', 'a', 'c']) >>> Of course, this only works if the values of the dictionary are immutable e.g. like strings, numbers or tuples for example. If we try this with a dictionary which contains lists (which we know are mutable sequences), then this will fail because a dictionary can not have mutable types as its keys: >>> somecontainer = {'a': 1, 'b': 3, 'c': ["a", "foo"]} >>> somecontainer.keys() dict_keys(['a', 'c', 'b']) >>> somecontainer.values() dict_values([1, ['a', 'foo'], 3]) >>> somecontainer = {value: key for key, value in somecontainer.items()} Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 1, in <dictcomp> TypeError: unhashable type: 'list' >>> IteratorIn Python iterators are everywhere, underlying everything, always just out of sight. For example, comprehensions and generators are one way to create
iterators. Another example are An iterator is an object representing a stream of data made from
iterable objects e.g. characters from a string (in Python 3 a string
is a sequence of unicode characters). Repeated calls to the iterator's
>>> myiterator = iter("big dog") >>> type(myiterator) <class 'str_iterator'> >>> next(myiterator) 'b' >>> next(myiterator) 'i' >>> next(myiterator) 'g' >>> next(myiterator) ' ' >>> next(myiterator) 'd' >>> next(myiterator) 'o' >>> next(myiterator) 'g' >>> next(myiterator) Traceback (most recent call last): File "<input>", line 1, in <module> StopIteration >>> When there is no more data available from the stream, a StopIteration
exception is raised. At this point, the iterator object is exhausted
and any further calls to its Iterators are required to have an One notable exception is code which attempts multiple iteration
passes. A container object (such as a list) produces a fresh new
iterator each time we pass it to the Now that we know about __prev__(), __current__()Many people would like to have additional functionality with regards
to iterators e.g. __prev__() and __current__() as hinted at by this subsection's title. itertoolsHow to flatten nested lists
IterableA container object capable of returning its items one at a time rather than all at once. Examples of iterable objects include sequence types such as Iterables in for loopsIterables can be used in a for loop and in many other places where a
sequence is needed ( When using iterable objects, it is usually not necessary to call
all()Is the iterable empty? If not, are all items of the iterable true? How do we test for an empty iterable i.e. one that does not contain any items? We could do this >>> myiterable = list(range(1, 4)) >>> myiterable [1, 2, 3] >>> def all(iterable): ... for item in iterable: ... if not item: ... return False ... return True ... ... >>> all(myiterable) # all items are true... True >>> all([]) #... or iterable is empty True >>> or... we could be smarter and just use the built-in function >>> all <function all at 0x7ff2e1a09af0> # our own all() >>> del all >>> all <built-in function all> # now we have the built-in again >>> all(myiterable) True >>> all([]) True >>> Note that we need to use As can be seen, >>> myiterable = list(range(4)) >>> myiterable [0, 1, 2, 3] # 0 evaluates to false in a boolean context >>> all(myiterable) False >>> When we start combining things like for example generator expressions
and >>> myiterable = range(1, 4) >>> list(myiterable) [1, 2, 3] >>> all(number != 0 for number in myiterable) # true if all items of myiterable are non-zero True >>> myiterable = range(4) >>> all(number != 0 for number in myiterable) False >>> In this case we check against the integer any()Is the iterable empty? If not, does it have at least one item that
is true? While all() is useful, Remember that If In short: >>> def any(iterable): ... for item in iterable: ... if item: ... return True ... return False ... ... >>> myiterable = list(range(1)) >>> myiterable [0] # 0 evaluates to false in a boolean context >>> any(myiterable) False >>> any([]) False >>> myiterable = list(range(2)) >>> myiterable [0, 1] >>> any(myiterable) True >>> Now, let us unshadow (remove the binding of the name >>> any <function any at 0x7ff2e1a176b0> # our own any() >>> del any >>> any <built-in function any> # now we have the built-in again >>> any(myiterable) True >>> myiterable = "" # an empty string is an empty iterable too >>> any(myiterable) False >>> When we start combining things like for example generator expressions
and >>> myiterable = range(5) >>> list(myiterable) [0, 1, 2, 3, 4] >>> any(number > 3 for number in myiterable) # true if any item of myiterable is > 3 True >>> myiterable = range(4) >>> any(number > 3 for number in myiterable) False >>> DescriptorAs for everything else in Python, a descriptor is an object too. A Python object is said to be a descriptor if it implements the so-called descriptor protocol. In other words: An object which defines any of the
To make a read-only data descriptor we can define both special
methods, Because descriptors are a powerful, general purpose protocol, understanding the descriptor protocol is key to understanding Python's innards because so many things in Python are based on it e.g. functions, methods, properties, class methods, static methods, and references to superclasses/supertypes, etc. Descriptors are used throughout Python itself to implement new style classes introduced in version 2.2, they also simplify the underlying C-code and offer a flexible set of new tools for everyday Python programs. Descriptor ProtocolAny object with a Such an object qualifies as a descriptor and can be placed inside a
class/type's or instance's __dict__.
Creating a Descriptor
Let us now have a look at how to create a descriptor: >>> class BazBar: # real code would have docstrings ... def __get__(self, instance, owner=None): ... print("calling __get__()") ... ... def __set__(self, instance, value): ... print("calling __set__()") ... ... def __delete__(self, instance): ... print("calling __delete__()") ... ... ... >>> The methods making up the descriptor protocol only apply when an
instance of the class/type containing such method appears in an
owner's class/type (see the following subsections).
Using a DescriptorWhat we defined with 1 >>> class FooBaz: # the owner class/type 2 ... bar = BazBar() # attribute bar is now a descriptor 3 ... 4 ... 5 >>> niznoz = FooBaz() 6 >>> dir(FooBaz) 7 ['__class__', 8 '__delattr__', 9 '__dict__', 10 11 12 [skipping a lot of lines...] 13 14 15 '__str__', 16 '__subclasshook__', 17 '__weakref__', 18 'bar'] 19 >>> FooBaz.__dict__['bar'] 20 <__main__.BazBar object at 0x32a0d50> 21 >>> FooBaz.bar 22 calling __get__() 23 >>> dir(niznoz) 24 ['__class__', 25 '__delattr__', 26 '__dict__', 27 28 29 [skipping a lot of lines...] 30 31 32 '__weakref__', 33 'bar'] # bar is an attribute on instance niznoz and not 34 >>> niznoz.__dict__ 35 {} # a key in niznoz's __dict__ 36 >>> niznoz.bar 37 calling __get__() 38 >>> niznoz.bar = "setting a value by assignment" 39 calling __set__() 40 >>> niznoz.__dict__ 41 {} 42 >>> niznoz.__dict__['bar'] = "force-set a value" # a different bar; not our descriptor 43 >>> niznoz.__dict__ 44 {'bar': 'force-set a value'} 45 >>> niznoz.bar # accessing descriptor bar via instance 46 calling __get__() 47 >>> del niznoz.bar 48 calling __delete__() 49 >>> FooBaz.bar # accessing descriptor bar via class/type 50 calling __get__() 51 >>> FooBaz.bar = "set/replace value for key bar" 52 >>> FooBaz.bar 53 'set/replace value for key bar' # descriptor replaced on owner's class/type but 54 >>> FooBaz.__dict__['bar'] 55 'set/replace value for key bar' 56 >>> niznoz.bar # because bar is an instance variable it is 57 calling __get__() # not replaced on its instance 58 >>> fuzfiz = FooBaz() 59 >>> fuzfiz.bar # bar on fuzfiz still is the class variable 60 'set/replace value for key bar' 61 >>> First thing to note is with line 2 where we instantiate our descriptor
class/type BazBar and bind the resulting descriptor object to the class attribute bar. The circle is closed in line 5 where we instantiate the owner class/type, thereby creating an object niznoz that has an attribute bar which is our descriptor. Lines 6 to 37: if we now take a closer look at our owner class/type and its instance, we can see that bar lives in FooBaz.__dict__ (lines 19 and 20) and not in niznoz.__dict__ (lines 34 and 35), yet accessing it from either one triggers __get__() (lines 21/22 and 36/37). As we can see from line 38, setting/updating a value for bar via the instance triggers __set__(). For the sake of completeness, lines 47 and 48 show how to delete attribute bar, which triggers __delete__(). If we compare line 49 to e.g. line 45 then we can see how attribute access can either happen through the class or through the instance. Note however that when accessed from the owner class/type itself, only __get__() is invoked: assigning to FooBaz.bar (line 51) simply replaces the descriptor in the owner's __dict__ rather than calling __set__(). One last important thing to note is that descriptors only work when attached to classes/types (e.g. our owner class/type FooBaz) and not when stored on instances.
Invoking Descriptors
A descriptor is an object attribute with binding behavior i.e. any
attribute on an object whose attribute access has been overridden by
methods in the descriptor protocol (__get__(), __set__() and/or __delete__()). When an object's attribute is a descriptor, its special binding
behavior is triggered upon attribute access. For example, with an
instance However, if during attribute access it turns out that Anyhow, the important point to remember is that the starting point for
descriptor invocation is a binding such as
For instance bindings, the precedence of descriptor invocation depends
on which descriptor methods are defined. A descriptor can define any combination of __get__(), __set__() and __delete__(). If it does not define __get__(), then accessing the attribute returns the descriptor object itself unless there is a value in the instance's __dict__. Normally, data descriptors define both __get__() and __set__(), while non-data descriptors only define __get__(). Python methods (including static methods and class methods) are implemented as non-data descriptors. Accordingly, instances can redefine and shadow/override methods. This allows individual instances to acquire behaviors that differ from other instances of the same class. Properties on the other hand are data descriptors. Accordingly, instances cannot override the behavior of a property.
Fate of Descriptors lies with __getattribute__()
We already know that __getattribute__() is used to
customize attribute access. The main thing to know about it is that it
is called unconditionally. In addition to what is known about
Extending on the example from above... The details of invocation
depend on whether the attribute access happens through an instance or through a class/type.
Summary
We now know that the mechanism for descriptor calls is embedded in the __getattribute__() method: it is what decides whether and how a descriptor's __get__(), __set__() or __delete__() gets invoked.
Shadowing/Overriding
Overall attribute lookup/reference semantics not only depend on inheritance and thus the MRO (Method Resolution Order) but are also determined by which type of descriptor is involved i.e. data descriptors and non-data descriptors differ in how shadowing/overriding them works with respect to entries in an instance's dictionary and also in how the overall fate of a descriptor is determined.
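To make the distinction tangible, here is a minimal sketch (class and attribute names are made up for illustration) showing that an entry in the instance's __dict__ shadows a method (a non-data descriptor) but not a property (a data descriptor):

>>> class Widget:
...     def ping(self):            # plain method -> non-data descriptor
...         return "method"
...     @property
...     def name(self):            # property -> data descriptor
...         return "property"
...
>>> w = Widget()
>>> w.__dict__['ping'] = lambda: "instance"   # shadows the method
>>> w.ping()
'instance'
>>> w.__dict__['name'] = "instance"           # cannot shadow the property
>>> w.name
'property'
>>>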
Python methods (including static methods and class methods) are implemented as non-data descriptors. Accordingly, instances can rebind and thereby shadow/override methods. This allows individual instances to acquire behaviors that differ from other instances of the same class/type. The property() built-in function is implemented as a data descriptor i.e. instances cannot shadow/override the behavior of a property. Data DescriptorAny object that has both, a PropertyA property is a data descriptor, created using the BenefitsReadability is increased by eliminating explicit In terms of performance, its possible to bypass properties using trivial accessor methods which directly access attributes. This also allows accessor methods to be added in the future without breaking the interface. However, one should keep in mind that such practices of bypassing a property are generally frowned upon because they harm readability and simplicity of our source code. DownsidesLooking at source code, we will see that a property function is textually specified after its getter and setter methods, requiring one to notice they are used for a property further down. That of course is not true when using the @property decorator which goes before the getter method (named after the attribute itself) and is used to implement read-only properties. If we are not careful/diligent we might manage to hide side-effects using properties, much like mistakes made with regards to operator overloading. For example, inheritance with properties can be non-obvious if the property itself is not shadowed/overridden. In this case we must make sure that getter methods are called indirectly to ensure methods shadowed/overridden in subclasses/subtypes are called by the property. ExampleIt is recommended to use properties to get or set attributes where we would normally have used getter and setter methods. Read-only properties should be created using the @property decorator. 1 >>> class Foo: # real code would have docstrings 2 ... def get_bar(self): 3 ... return self.__bar 4 ... 5 ... def set_bar(self, value): 6 ... self.__bar = value 7 ... 8 ... def del_bar(self): 9 ... del self.__bar 10 ... 11 ... bar = property(get_bar, set_bar, del_bar, "docstring... lorem ipsum") 12 ... 13 ... 14 ... 15 >>> Foo.bar 16 <property object at 0x1e2f260> 17 >>> Foo.__dict__ 18 <dict_proxy object at 0x1e21130> 19 >>> Foo.__dict__['bar'] 20 <property object at 0x1e2f260> 21 >>> type(Foo.__dict__['bar']) 22 <class 'property'> 23 >>> fiz = Foo() 24 >>> fiz.bar 25 Traceback (most recent call last): 26 File "<input>", line 1, in <module> 27 File "<input>", line 3, in get_bar 28 AttributeError: 'Foo' object has no attribute '_Foo__bar' 29 >>> fiz.bar = 3 30 >>> fiz.bar 31 3 32 >>> del fiz.bar 33 >>> fiz.bar 34 Traceback (most recent call last): 35 File "<input>", line 1, in <module> 36 File "<input>", line 3, in get_bar 37 AttributeError: 'Foo' object has no attribute '_Foo__bar' 38 >>> fiz.bar = range(4) 39 >>> fiz.bar 40 range(0, 4) 41 >>> type(fiz.bar) 42 <class 'range'> 43 >>> list(fiz.bar) 44 [0, 1, 2, 3] 45 >>> Foo.bar.__doc__ 46 'docstring... lorem ipsum' 47 >>> A property provides an easy way to call functions whenever an attribute is accessed (referenced, set or deleted) on the instance. When the attribute is referenced from the class/type, the getter method is not called but the property object itself (lines 15 to 22) is returned. A docstring can also be provided as can be seen in lines lines 11 as well as 45 and 46 respectively. 
One might have noticed that we used a particular notation (lines 3, 6 and 9) for naming our attribute: __bar, i.e. two leading underscores, which triggers Python's name mangling and stores the value as _Foo__bar (visible in the tracebacks in lines 28 and 37). A final word about inheritance with regards to properties: subclassing
a class/type containing a property (e.g. Read-only/Non-deletable PropertiesWe already know that by default a property is always a data-descriptor. However, as outlined in the beginning, in order to make a read-only data descriptor the only thing we need to do is to not defining all its special methods: >>> class Baz: ... def get_it(self): ... pass ... ... def set_it(self, value): ... pass ... ... def del_it(self): ... pass ... ... full = property(get_it, set_it, del_it, "I can be referenced, set, and deleted.") ... nodelete = property(get_it, set_it, "I can be referenced, set, but not deleted.") ... readonly = property(get_it, "I can be referenced but not set or deleted.") ... ... >>> foofiz = Baz() >>> del foofiz.full >>> del foofiz.nodelete Traceback (most recent call last): File "<input>", line 1, in <module> TypeError: 'str' object is not callable >>> foofiz.readonly = 4 Traceback (most recent call last): File "<input>", line 1, in <module> TypeError: 'str' object is not callable >>> foofiz.full = 4 >>> Nothing to say here as the example is self-explanatory... @propertyNow that we know how to create a read-only property why not go further
and use the pythonic way to do so. Doing read-only properties the
pythonic way means using a decorator. Meet >>> class Maus: # real code would have docstrings ... def __init__(self): ... self._enemy = "Katze" ... ... @property ... def enemy(self): ... """Return Maus's worst enemy.""" # automatically becomes docstring ... return self._enemy ... ... ... >>> baz = Maus() >>> baz.enemy 'Katze' >>> Maus.enemy.__doc__ "Return Maus's worst enemy." >>> This turns the More DecoratorsA property object has getter, setter, and deleter methods usable as decorators that create a copy of the property with the corresponding accessor function set to the decorated function. This is best explained with an example: 1 >>> class Bar: 2 ... def __init__(self): 3 ... self.__foo = None 4 ... 5 ... @property 6 ... def foo(self): 7 ... """I am the 'foo' attribute.""" 8 ... return self.__foo 9 ... 10 ... @foo.setter 11 ... def foo(self, value): 12 ... self.__foo = value 13 ... 14 ... @foo.deleter 15 ... def foo(self): 16 ... del self.__foo 17 ... 18 ... 19 ... 20 >>> Bar.foo.__doc__ 21 "I am the 'foo' attribute." 22 >>> Bar.foo.fget 23 <function foo at 0x1ea2ea8> 24 >>> Bar.foo.fset 25 <function foo at 0x1ea2d98> 26 >>> Bar.foo.fdel 27 <function foo at 0x1ead050> 28 >>> Bar.foo.getter 29 <built-in method getter of property object at 0x1eaa158> 30 >>> Bar.foo.setter 31 <built-in method setter of property object at 0x1eaa158> 32 >>> Bar.foo.deleter 33 <built-in method deleter of property object at 0x1eaa158> 34 >>> baz = Bar() 35 >>> baz.__dict__ 36 {'_Bar__foo': None} 37 >>> baz.foo is None 38 True 39 >>> baz.foo = 5 40 >>> baz.__dict__ 41 {'_Bar__foo': 5} 42 >>> del baz.foo 43 >>> baz.__dict__ 44 {} 45 >>> baz.foo 46 Traceback (most recent call last): 47 File "<input>", line 1, in <module> 48 File "<input>", line 8, in foo 49 AttributeError: 'Bar' object has no attribute '_Bar__foo' 50 >>> Bar.foo.fget(baz) 51 >>> Bar.foo.fset(baz, 8) 52 >>> Bar.foo.fget(baz) 53 8 54 >>> baz.__dict__ 55 {'_Bar__foo': 8} 56 >>> The returned property object has the attributes The main thing to remember from this example is that we can see that
each property object has its built-in getter, setter and deleter
methods (lines 28 to 33) which we can use as decorators when prefixed
with the attribute name. This is the most concise way to work with
properties because we do not have to use an explicit We also decided again to prefix the attribute name ( Non-Data DescriptorAn object that only defines a As for data descriptors, non-data descriptors have distinct semantics with regards to shadowing/overriding and at which point in the precedence chain they are called. Data StructuresIn Python we have two groups of data structures — literals and containers, each of which have several subsets identified by certain constraints and/or capabilities. WRITEME Data Structures - LiteralsWRITEME None
Equality and Subclassing
Generally speaking, we need to be careful when using the equality operator == to compare things against None: >>> None == False False >>> None == True False >>> None == 0 False >>> None == [] False >>> None == None # only time the equality check returns True True >>>
>>> type(None) <class 'NoneType'> >>> x = y = None >>> x == None True >>> y == None True >>> x == y True >>> class MyNoneType(None): ... pass ... ... Traceback (most recent call last): File "<input>", line 1, in <module> TypeError: cannot create 'NoneType' instances >>> None is a Singleton
>>> None is None # identity check True >>> id(None) == id(None) # equality check of None's id True >>> id(None) 8794272 >>>
is None vs == None
Which one is better/correct? The short answer is that we should always use is None (and is not None) rather than == None. The long answer is that because there is the fact that
None is a singleton and that there is the general notion of
equality vs identity, checking some value against The >>> import dis >>> def foo(bar): ... return bar is not None # this version is preferred ... ... >>> dis.dis(foo) 2 0 LOAD_FAST 0 (bar) 3 LOAD_CONST 0 (None) 6 COMPARE_OP 9 (is not) 9 RETURN_VALUE >>> def foo(bar): ... return not bar is None ... ... >>> dis.dis(foo) 2 0 LOAD_FAST 0 (bar) 3 LOAD_CONST 0 (None) 6 COMPARE_OP 9 (is not) 9 RETURN_VALUE >>> FunctionsAn important thing to know is that None as Default Parameter ValueGo here for more information. MiscellaneousAssignments to EllipsisThe >>>... Ellipsis >>> Ellipsis Ellipsis >>> type(...) <class 'ellipsis'> >>> type(Ellipsis) <class 'ellipsis'> >>> As we can see, >>> Ellipsis is... True >>> ban =... >>> bis =... >>> boo = Ellipsis >>> ban is boo True >>> ban is boo is bis True >>>
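Beyond being a curiosity, the fact that Ellipsis is a unique, always-available singleton makes it usable as a placeholder. A small sketch (the fetch() function and its cache are made up for illustration) using ... as a "no value supplied" sentinel that, unlike None, cannot clash with a legitimate value:

>>> def fetch(key, default=...):
...     cache = {"foo": None}          # None is a legitimate stored value here
...     if key in cache:
...         return cache[key]
...     if default is ...:             # i.e. the caller did not supply a default
...         raise KeyError(key)
...     return default
...
>>> fetch("foo") is None
True
>>> fetch("bar", 42)
42
>>>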
Practical ImplicationsSo what are the practical implications with regards to the
>>> from numpy import arange >>> arange(12).reshape(4, 3) array([[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8], [ 9, 10, 11]]) >>> arange(12).reshape(4, 3)[:2, :2] array([[0, 1], [3, 4]]) >>> type(arange(12).reshape(4, 3)[:2, :2]) <type 'numpy.ndarray'> # multidimensional array type >>> The >>> arange(12).reshape(4, 3)[..., :2] # placeholder specifying notation array([[ 0, 1], [ 3, 4], [ 6, 7], [ 9, 10]]) >>> arange(12).reshape(4, 3)[:4, :2] array([[ 0, 1], [ 3, 4], [ 6, 7], [ 9, 10]]) >>> That is pretty much it with regards to the NotImplementedNumbers
Integral
Integer
Boolean
Operations and built-in functions that have a Boolean result always return 0 or False for false and 1 or True for true, unless otherwise stated. (Important exception: the Boolean operations or and and always return one of their operands.)
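A quick illustration of that exception (a sketch, not from the original text): comparisons and bool() return True/False, whereas and and or hand back one of their operands:

>>> 0 or "default"       # or returns the first true operand, else the last one
'default'
>>> "" and "whatever"    # and returns the first false operand, else the last one
''
>>> bool(0), bool("x")
(False, True)
>>>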
Real/Float
Binary Floating Point
Decimal Floating Point
Because I wanted a Money data type...
Decimal vs Binary Floating Point
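While this section is still to be written in full, a minimal sketch of the difference should help: binary floating point cannot represent 0.1 exactly, decimal floating point can, which is exactly what we want for money-like quantities:

>>> from decimal import Decimal
>>> 0.1 + 0.2 == 0.3                      # binary floating point
False
>>> Decimal("0.10") + Decimal("0.20")     # decimal floating point
Decimal('0.30')
>>> Decimal("0.10") + Decimal("0.20") == Decimal("0.30")
True
>>>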
WRITEME
Complex
Data Structures - Containers
Some objects contain references to other objects. These are called containers. The most prominent examples of containers are tuples, lists and dictionaries. This section will look at those aforementioned types as well as other, less prominent container types. WRITEME
Sequences
Immutable Sequences
Strings
String
>>> import string >>> string.ascii_letters 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' >>> string.punctuation '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~' >>> string.whitespace '\t\n\x0b\x0c\r ' >>>
UserString
String Formatting
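The section itself is still to be filled in, but a minimal sketch of Python 3's str.format() covers the basics (the values used are arbitrary):

>>> "{0} is {1} years old".format("Maus", 3)
'Maus is 3 years old'
>>> "{name} is {age} years old".format(name="Maus", age=3)
'Maus is 3 years old'
>>> "{:>10}|{:.2f}|{:04d}".format("right", 3.14159, 42)
'     right|3.14|0042'
>>>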
Tuple
Named Tuple
WRITEME
Named Tuple vs Foo
Before we start let us look at some questions often raised, namely what the difference is between named tuples and a bunch of other data structures in Python:
Examples
>>> import datetime >>> foo = datetime.datetime.utcnow() >>> bar = foo.timetuple() >>> bar time.struct_time(tm_year=2011, tm_mon=2, tm_mday=21, tm_hour=16, tm_min=47, tm_sec=45, tm_wday=0, tm_yday=52, tm_isdst=-1) >>> type(bar) <class 'time.struct_time'> >>> print(bar[0]) 2011 >>> print(bar.tm_year) 2011 >>>
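Since the Named Tuple section above is still marked WRITEME, here is a minimal sketch of collections.namedtuple, the stdlib factory that produces exactly such record-like tuples (Point and its fields are made up):

>>> from collections import namedtuple
>>> Point = namedtuple("Point", ["x", "y"])
>>> p = Point(11, y=22)
>>> p[0], p.x              # index access and attribute access both work
(11, 11)
>>> p._replace(x=33)       # namedtuples are immutable; _replace returns a copy
Point(x=33, y=22)
>>> p._asdict()
OrderedDict([('x', 11), ('y', 22)])
>>>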
Byte
Range
Mutable Sequences
Lists
UserList
Deque
Byte Array
Miscellaneous
Sequence Comparison
Sequences can be compared as long as the objects being compared, item by item, are of comparable types: >>> [1, 3] > [1, 4] False >>> [1, 3] > [1, 2] True >>> [1, 3] > [1, 2, 3] True >>> [1.11, 3] > [1, 2, 3] True >>> [1.11, 3] > [1, 2.447, 3] True >>> [1, 3] > [1, 2.447, "astring"] # decision can be made on [1] True >>> [1, 3] > [1, "astring"] # [1] has different types thus fails Traceback (most recent call last): File "<input>", line 1, in <module> TypeError: unorderable types: int() > str() >>> The comparison uses lexicographical ordering i.e. first the first two
items (item [0] of each sequence) are compared, and if they differ this already determines the outcome; if they are equal, the next two items are compared, and so on. If all items of two sequences compare equal, the sequences are considered equal. If one sequence is an initial sub-sequence of the other, the shorter sequence is the smaller (lesser) one. Lexicographical ordering for strings in Python uses the ASCII ordering for individual characters. >>> ord("a") 97 >>> chr(97) # the reverse 'a' >>> ord("b") 98 >>> "a" > "b" False >>> "a" < "b" True >>> "ab" < "a" False >>> import pymongo >>> pymongo.version '1.9+' >>> ord(".") 46 >>> ord("+") 43 >>> pymongo.version > "1.8" True >>> pymongo.version[2] > "1.8"[2] True >>> pymongo.version[2] '9' >>> "1.8"[2] '8' >>> pymongo.version[:2] == "1.8"[:2] # up to [2] they are equal True >>> pymongo.version[:2] '1.' >>> "1.8"[:2] '1.' >>> Comparing objects of different types with < or > is only allowed where it makes sense, e.g. mixed numeric types are compared according to their numeric value: >>> 0 < 0.0 False >>> 0 == 0.0 True >>> 0.8 >= 0.8 True >>> 0.83 > 0.8 True >>> 0.8 > 0.8 False >>> Otherwise, rather than providing an arbitrary ordering, the
interpreter will raise a TypeError: >>> "some string" > [1, 5] Traceback (most recent call last): File "<input>", line 1, in <module> TypeError: unorderable types: str() > list() >>>
Sets
WRITEME
Set
Frozenset
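Both subsections are still to be written; here is a minimal sketch of the two built-in set types (values are arbitrary): set is the mutable variant, frozenset the immutable, hashable one:

>>> s = {3, 1, 2, 3}              # duplicates are silently dropped
>>> s.add(4)
>>> s & {2, 3, 4, 5}              # intersection
{2, 3, 4}
>>> f = frozenset([1, 2])
>>> {f: "frozensets can be dict keys"}[frozenset([1, 2])]
'frozensets can be dict keys'
>>>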
Mappings
Dictionaries
Dictionary
UserDict
DefaultDict
OrderedDict
ChainMap
Counter
securedictViewThe objects returned from 1 >>> somecontainer = {'one': 1, 'foo': 5} 2 >>> keys = somecontainer.keys() 3 >>> keys 4 dict_keys(['foo', 'one']) 5 >>> somecontainer['bar'] = 2 6 >>> keys 7 dict_keys(['foo', 'bar', 'one']) 8 >>> list(keys) 9 ['foo', 'bar', 'one'] Python 2 vs Python 3In Python 2 views have to be explicitly created (e.g. by using
This is how the syntax changes from Python 2 to Python 3 for doing
semantically the same thing. As mentioned further down, the point is
that we should now use the ordinary keys(), values() and items() methods, which in Python 3 return view objects by default.
View vs Iterator
Note that views are semantically somewhat hovering in between lists
and iterators — view objects returned by for key in dict.keys(): pass Python will create an iterator for us. In Python 3, for key in dict.keys(): pass and for key in iter(dict.keys()): pass is one of implicit vs explicit creation of the iterator. Whilst both, views and iterators are lazy evaluation, we need to remember that if we create an explicit iterator (line 14) then it can only be used once (lines 17 to 20) whereas a view can be reused as often as required: 10 >>> "foo" in keys 11 True 12 >>> "bar" in keys 13 True 14 >>> myiterator = iter(somecontainer.keys()) 15 >>> "foo" in myiterator 16 True 17 >>> "bar" in myiterator 18 True 19 >>> "bar" in myiterator 20 False 21 >>> "bar" in keys 22 True 23 >>> "bar" in keys 24 True Also, notice that if we create an explicit iterator (line 25) and then modify the dictionary after creating the iterator (line 26), then the iterator is invalidated 25 >>> anotheriterator = iter(somecontainer.keys()) 26 >>> somecontainer['cheese'] = 7 27 >>> for key in anotheriterator: 28 ... print(key) 29 ... 30 Traceback (most recent call last): 31 File "<stdin>", line 1, in <module> 32 RuntimeError: dictionary changed size during iteration something that is not true if we use a view 33 >>> for key in keys: 34 ... print(key) 35 ... 36 cheese 37 foo 38 bar 39 one 40 >>> type(keys) 41 <class 'dict_keys'> 42 >>> type(somecontainer) 43 <class 'dict'> 44 >>> type(anotheriterator) 45 <class 'dict_keyiterator'> 46 >>> keys 47 dict_keys(['cheese', 'foo', 'bar', 'one']) 48 >>> As we can see, we are still using the same view ( Quality AssuranceHigh-quality products that are simple to use, with short time to market cycles and top-notch customer support... Get this right and consider yourself a winner, do not and you go out of business quickly. Quality assurance for a software product encompasses the whole set of requirements definition, software design, coding conventions, software configuration management, peer review, issue tracking, change management, testing, debugging, release management, and product integration — all this without over-engineering things, with building and keeping momentum as a team and with keeping the fun-factor and excitement for developers alive...
All that might sound awfully complicated but in reality it is not that hard to accomplish because there is often a lot of overlap among those areas and not everything is needed for every project.
This section assumes that we managed to accomplished #1 and #2 already. In order to satisfy #3 we need to have checks, tests and a system that can run those checks and tests automatically every time new code is checked into the repository. At this point (December 2011) it seems that #4 is only done by a minority of software projects. The notion of building metrics for business values into software as we write it is quite new and not in widespread use yet. This is going to change because people already start realizing the importance of having accurate information on business values of their software when it is running in production. The remainder of this section will focus on #3 and #4 but will initially also have a quick look at parts of #2 e.g. peer review and TDD. Our goal is to end up with a set of processes and systems which allows us to do quality assurance for the systems we build. Design DocumentsDesign documents help us in managing complexity and synchronizing people from different backgrounds. Projects above a certain size and scale are not manageable without having design documents. That being said, even for small projects it is worth doing them because they help us catch flaws in mental models early in the process, thus saving a lot of time and hassle down the road. Depending on the design document, some of them are living documents as they are constantly being amended to reflect a change of state and/or goals set. As for most things with quality assurance, the key to design documents is to not create to much of a burden — only if things are simple and show benefits rather quickly will people do them... The danger with design documents is that the whole process can easily become to theoretical/complex which will then lead to two different worlds existing in parallel — the one on paper and what happens on the ground. To avoid this we should keep things simple — simplicity is good, simplicity scales, simplicity shortens product cycles, simplicity helps reduce time to market and last but not least, simplicity makes for better quality... Another thing that is very important but forgotten a lot is that we should design for testability. Again, that is easy if we keep things simple but unfeasible if we over-complicate things by over-engineering them and/or add features just for the sake of features even if nobody needs them. Usually when we design systems then we have to think about hardware and software, about users, about the business model, and the business values which are going to matter. Our design has to account for scalability as we usually start rather small. One thing that is usually true is that rather than having a small number of big and complex systems, it is preferable to have a big number of small and simple systems — reasons why we would want this are maintainability, usability, flexibility and redundancy. Cost (per user/task, TCO...) and reduced vendor lock-in are two more reasons. Among others, those reasons are key in creating a high-quality product. Quality assurance and success really starts with design and I can therefore not stress its importance often enough. User Requirements DocumentA URD is a document written from the point of view of a (non-technical) user. It should be short (three pages or less). It does not contain tech-jargon but words and descriptions non-technical people would use. 
This is because often users are not able to communicate the entirety of their needs and wants, and the information they provide may also be incomplete, inaccurate and self-conflicting. The responsibility of completely understanding what users want then falls to us, the providers of the product. Once the required information is completely gathered it is documented in the URD, which is meant to spell out exactly what the system must do and becomes part of the contractual agreement — or just the internal URD in case we are planning for a system that users use online. A user cannot demand features not in the URD without renegotiating and a developer cannot claim the product is ready if it does not meet an item of the URD. The URD can be used as a guide to planning cost, timetables, milestones, testing, etc. The explicit nature of the URD allows users to show it to various stakeholders to make sure all necessary features are described. Formulating a URD requires negotiation to determine what is technically and economically feasible. Writing a URD is part science and part art as it requires both, technical skills and interpersonal skills. People able to write such documents are scarce — non-technicians tend to produce a lot of blablabla, thereby failing to express and pin-down the essence that matters mid to long-term. Most technicians on the other hand would deliver a technical document that lacks creativity and does not reflect the needs and ideas of the majority of users. Market Requirements DocumentThis one is the seconds non-technical document after the URD that people tend to write. However, whether or not we would do a MRD depends on whether or not there is a commercial angle at play (military and science projects often do not have a commercial angle to them). A MRD is a document that expresses the users wants and needs for a product or service. It should explain what (new) product is being discussed (referencing the URD is usually fine), the targeted markets, products in competition with the proposed one, why markets are likely to want this product (e.g. unique selling proposition) etc. Again, three pages or less, that is what I consider practical (even for big projects).
Brevity is the soul of wit. Architecture Requirements DocumentThis one is usually the first of three technical document we write and thus tightly coupled to the HNHCRD (Hardware/Networking/Housing/Connectivity Document) and the SRD (Software Requirements Document). It is written from the point of view of a system administrator and systems architect/integrator with the help of one or more software engineers who are later going to write the SRD (which also builds on the ARD). The ARD describes the major software blocks that make up a service/system and how they interact with each other so that the goals outlined in the URD and MRD can be meet. Questions asked and answered by and from the ARD are of the form: Do we need to store data? If so, what characteristics do we need from the data tier? Should we switch the I/O scheduler to deadline rather than keeping the default CFQ scheduler? What about other system control parameters? How does the logic tier connect to the data tier i.e. how does the logic tier do I/O to/from the data tier? Is the logic tier a distributed system? Is the logic tier a modular system or is it monolithic? A mixture? Regarding the entire system/stack, do we have a globally distributed system that needs to scale to millions of users? Are we doing scientific computations on thousands of shared-nothing nodes that need be connected through a low-latency network, thus be at the same geographical location? Are we talking standard IT system or are we talking autonomous deep-sea robot with requirements for hard-realtime? Is it a heart monitor with strict MTTF (Mean Time To Failure) requirements, or maybe we are talking mobile phone applications with unpredictable on/offline patterns on an ARM platform with strict low-power requirements? Do we have a UI (User Interface) and if so, how does it interact with the logic tier? Each of those systems/services will be build using software and some means of communication/interaction with humans and/or other systems/services. However, software used and the means of communication amongst software/system blocks might be vastly different for each of those systems. The ARD should look at the parts/components involved, tell us why a certain technology is used over another and finally tell us how all the moving software parts are connected in order to create the service/system described by the URD. Hardware/Networking/Housing/Connectivity Requirements DocumentThe second technical document — usually written by a system integrator/administrator, an IP engineer and a purchase manager — looks at what is needed in terms of hardware, the network, where hardware is kept, as well as how the system/service is connected to the Internet. Apart from describing what we need to buy and/or loan in order to build the things outlined by the URD and ARD, what the HNHCRD also does is look at how those things are being provisioned and kept functioning over their entire life-cycle. The questions that the HNHCRD builds upon are from the URD and ARD. For example: What kind of hardware do we need? What types of devices do we support, stationary (e.g. office workstation) or mobile (e.g. mobile phone)? Both? Are we going to use ARM based CPUs, a purpose-build SOC (System-on-a-chip), or standard x86-64 hardware? Is power consumption a major concern, if so, should be pick the low-voltage CPUs and SSDs? Do we have the usual 2 x 16A power per 42U rack available? Maybe we need 2 x 32A because we have a 49U rack with high-density equipment? How many U do we need? 
Do we house our hardware in a cage, a private room, a single rack or is shared rack-space enough? How is physical access managed? How do we do inventory management? Is all the hardware in the same datacenter and if so can we have direct connects with separate VLANs? Does this network need to be a low-latency InfiniBand network? Do switches need to be BGP and IPv6 capable? How many units can we stack together to create a single big virtual single-IP managed switch? Do the stacking links between switches need to be high bandwidth and low-latency, more than ordinary switch ports for server nodes? How many publicly available IP addresses do we need? IPv4 or IPv6? Do we need to become a RIPE/ARIN/APNIC/etc. member? Is our connection to the backbone multi-homed? Do we need VRRP capable switches? What domain names do we need to register? Do we need a SSL certificate and if so do we maybe need a SAN-ready wildcard certificate? How do we handle provisioning and purchase? How is hardware maintenance and change requests handled? What SLAs do we provide for our service/system? That in turn, what SLAs do we need from our contractors and suppliers in order to provide said SLAs to our users/customers? What response times to hardware failure and/or change requests do we need? Do we need 24/7 staff on-site in the datacenter? Last but not least, of all the possible variants on the table, which one is the most cost effective one with the best ratio of initial investment to TCO and lowest cost per user/task? Which one will help us deliver the best quality, have the shortest time to market, give the best ROI and be the best solution for our users/customers? Software Requirements DocumentThis is the last technical document we write and builds upon the ARD and to some extend the HNHCRD. It is usually the most technical one as it describes all the nitty-gritty details about our software stack:
Testing
The only good is testing and the only evil is not to... Software testing is any activity aimed at evaluating an attribute or capability of a program or system and determining that it meets its required results. Typically, more than 57 percent of the development time is spent in testing...
Software is not unlike other physical systems where inputs are received and outputs are produced. Where software differs is in the manner in which it fails. Most physical systems fail in a fixed (and reasonably small) set of ways. By contrast, software can fail in many bizarre ways. Detecting all of the different failure modes for software is generally unfeasible. Unlike most physical systems, most of the defects in software (also known as bugs) are design defects, not manufacturing defects. Software does not suffer from corrosion, wear-and-tear — generally it will not change until upgrades, or until obsolescence. Once software is shipped, the design defects will be buried in and remain latent until activation. Software defects will almost always exist in any software module with moderate size: not because programmers are careless or irresponsible, but because the complexity of software is generally intractable and humans have only limited ability to manage complexity. It is also true that for any complex systems, design defects can never be completely ruled out. Discovering the design defects in software, is equally difficult, for the same reason of complexity. Because software and any digital systems are not continuous, testing boundary values are not sufficient to guarantee correctness. All the possible values need to be tested and verified, but complete testing is unfeasible. Exhaustively testing a simple program to add only two integer inputs of 32-bits (yielding 2^64 distinct test cases) would take hundreds of years, even if tests were performed at a rate of thousands per second. Obviously, for a realistic software module, the complexity can be far beyond the example mentioned here. If inputs from the real world are involved, the problem will get worse, because timing and unpredictable environmental effects and human interactions are all possible input parameters under consideration. A further complication has to do with the dynamic nature of programs. If a failure occurs during preliminary testing and the code is changed, the software may now work for a test case that it did not work for previously. But its behavior on pre-error test cases that it passed before can no longer be guaranteed (also known as regression). To account for this possibility, testing should be restarted. The expense of doing this is often prohibitive. Pesticide ParadoxAn interesting analogy parallels the difficulty in software testing with the pesticide, known as the pesticide paradox: Every method we use to prevent or find software defects (bugs) leaves a residue of subtler software defects against which those methods are ineffectual. But this alone will not guarantee to make the software better, because the complexity barrier principle states: Software complexity (and therefore that of software defects) grows to the limits of our ability to manage that complexity. By eliminating the (previous) easy software defects we allowed another escalation of features and complexity, but this time we have subtler software defects to face, just to retain the reliability we had before. Society seems to be unwilling to limit complexity because we all want that extra bell, whistle, and feature interaction. Thus, our users always push us to the complexity barrier and how close we can approach that barrier is largely determined by the strength of the techniques we can wield against ever more complex and subtle software defects. 
Rationale for TestingRegardless of the limitations, testing is and must be an integral part in software development. It is broadly deployed in every phase in the software development cycle. Typically, more than 57% percent of the development time is spent in testing. Testing is usually performed for the following purposes: QualityAs computers and software are used in critical applications, the outcome of a software defect can be severe — software defects in critical systems have caused airplane crashes, heart monitors to malfunction, space shuttle missions to go awry, halted trading on the stock market. Less severe but happening more often, software defects are causing companies go bankrupt, employees loosing their jobs, and investors loosing all their money just because a software defect caused data loss or a security hole through which trade secrets leaked out. Even less dramatic but now happening almost every day, a software defect causing a public relation disaster because it enabled a privacy violation of thousands of customers... Software defects can kill. Software defects can cause disasters. In our computerized embedded world, the quality and reliability of software can be a matter of life and death or at least it makes the difference between a profitable business and one that goes bankrupt. Quality is the conformance to the specified design requirements. Being correct, the minimum requirement of quality, means performing as required under specified circumstances. Debugging, a narrow view of software testing, is performed heavily to find out design defects by the programmer. The imperfection of human nature makes it almost impossible to make a moderately complex program correct the first time. Finding the problems and getting them fixed, is the purpose of debugging during the programming phase. Verification & ValidationTesting can serve as metric. It is heavily used as a tool in the verification and validation process. Testers can make claims based on interpretations of the testing results, which either the product works under certain situations, or it does not. We can also compare the quality among different products under the same specification, based on results from the same test. We can not test quality directly, but we can test related factors to make quality visible to the human eye. Quality has three sets of factors — functionality, engineering, and adaptability. These three sets of factors can be thought of as dimensions in the software quality space. Each dimension may be broken down into its component factors and considerations made at successively lower detail levels. Some of the most frequently cited quality considerations are:
Good testing provides measures for all relevant factors. The importance of any particular factor varies from application to application. Any system where human lives are at stake must place extreme emphasis on reliability and integrity. In the typical business system usability and maintainability are the key factors, while for a one-time scientific program neither may be significant. Our testing, to be fully effective, must be geared towards measuring each relevant factor and thus forcing quality to become tangible and visible. Tests with the purpose of validating the product works are named positive tests. The drawbacks are that it can only validate that the software works for the specified test cases. A finite number of tests can not validate that the software works for all situations. On the contrary, only one failed test is sufficient enough to show that the software does not work. Negative tests, refers to the tests aiming at breaking the software, or showing that it does not work. A piece of software must have sufficient exception handling capabilities to survive a significant level of negative tests. A testable design is a design that can be easily validated, verified/falsified and maintained (e.g. Unit Tests). Because testing is a rigorous effort and requires significant time and cost, design for testability is also an important design rule for software development. Reliability EstimationSoftware reliability has important relations with many aspects of software, including the structure, and the amount of testing it has been subjected to. Based on an operational profile (an estimate of the relative frequency of use of various inputs to the program), testing can serve as a statistical sampling method to gain failure data for reliability estimation. Software testing is not mature, it exists in an area between science and art because we are still unable to make pure science. Today we are still using the same testing techniques invented 20-30 years ago, some of which are crafted methods or heuristics rather than good engineering methods. Software testing can be costly, but not testing software is even more expensive, especially when human lives are at stake. We can never be sure that a piece of software is correct. We can never be sure that the specifications are correct. No verification system can verify every correct program. We can never be certain that a verification system is correct either. Conclusions
Software Metrics
We cannot control what we cannot measure... A software metric is a measure of some property of a piece of software or its specifications.
Since quantitative measurements are essential in all sciences, there is a continuous effort by computer science practitioners and theoreticians to bring similar approaches to software development. The goal is obtaining objective, reproducible and quantifiable measurements, which may have numerous valuable applications in schedule and budget planning, estimating the business value of a feature, cost estimation, quality assurance, testing, performance optimization, assignment of man-power... Metrics allow us to make statements about whether or not something works. Metrics generated from automated testing focus developers on developing functional, quality code, and help develop momentum in a team. Metrics are essential in making the right decisions at the right time. Metrics are knowledge.
Knowledge is power. We need to know what our software does when it runs in production. Only then can we know about business values that matter to our users, when, and why. Knowing about business values allows us to make the right decisions faster, do the right things at the right time and, mostly even more important, avoiding doing the wrong thing at the wrong time — no more guessing... Businesses that manage to get this right make for happy users. Happy users eventually turn into paying customers... While there are many ready-made tools available for monitoring basics things like network, diskspace, and load, in order to get information about the business values of our software, we need to build metrics right into it. Let us have a look at a quick example: def handle_request(self, request): with self.latency.time(): # important business value # code to handle the request Here we want to know about the latency of requests (how long it takes our website to answer a users's request). Why? Because latency is an important business value — people like fast over slow, they stay, they come back, eventually they become customers and start paying us money... Obviously, latency is just one example, there are many more important business values, each one more or less important. The point is that when we are able to generate, record, and display metrics like this one then we have a much better inside into what our software does at any given point in time — we get a better understanding what is an important business value and what is not. This in turn helps us moving in the right direction and avoid costly mistakes — we end up building a more valuable product that is more attractive to users. Mid to long-term we will also manage to have a product which TCO (Total Cost of Ownership) will be lower and which RoI (Return on Investment) will happen faster. Metrics are also a very valuable instrument in spotting problems in existing products of which we already know have business value. Metrics can help us spot problems that would have otherwise gone unnoticed. For example, a spike in latency would indicate a problem but for some reason load, diskspace and network graphs look fine... Maybe the new feature we rolled-out an hour ago is the culprit? Maybe... Ah, wait! No more guessing remember? If we do not have a clear answer based on facts (our metrics) then we simply need to add more metrics for this business value and the ones related to it — only hard evidence based on facts is allowed! Metrics in PracticeThe problem with software metrics and software testing in general is that there is a lot of theory and only little that works well in practice and does not lead to unnecessary complexity that hurts us more than it helps us:
WRITEME
Benchmarking, Profiling
Benchmarking and profiling in the software sense is about speed (execution time) — how long does it take a system to answer a question or finish a task it has been commanded to do. Time always matters. Speed is a business value, therefore software metrics related to execution time are of interest to us. Truth is that more often than not we have to make compromises between speed and quality, e.g. a trading system's numbers are more valuable the higher their precision is, but even the highest precision is worthless if it arrives just one second too late — the world has moved on by then... Doing the wrong thing or giving the wrong answer at the right time is equally bad of course. The interesting thing about execution time is that it is one of just a few software metrics that can be measured during testing and later when our software runs in production — as opposed to for example code coverage, a software metric that matters only during testing.
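As the section is still marked WRITEME, here is a minimal sketch of what measuring execution time can look like with the standard library (busy() is a made-up workload): timeit for micro-benchmarks, cProfile for a per-function breakdown:

import cProfile
import timeit

def busy():
    # a made-up CPU-bound workload
    return sum(i * i for i in range(10000))

# micro-benchmark: total seconds for 1000 calls (machine dependent)
print(timeit.timeit(busy, number=1000))

# profiling: prints how much time was spent in which function
cProfile.run("busy()")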
WRITEME Code CoverageCode coverage is a quantitative measure of finding out how much of our code has been executed when we run our tests. It is important to understand that we use coverage analysis to assure quality of our tests, not the quality of the actual product. How can we increase code coverage for our production code? There are two things we can do to increase test coverage of our production code: we write tests for already existing production code or we remove duplication and/or orphaned/unused production code for which no tests exist yet. The alerted reader might sense the contradiction here. In theory none of the two should be necessary because we adhere to TDD which means we write our tests before we write our production code — doing so means we will always have 100% code coverage. While true in theory, it depends on the property/method of code coverage we pursue. While code coverage is a software metric in the strict sense, it is used in testing rather than later, for monitoring a business value, when software runs in production. In particular, code coverage can help us with
A code coverage analyzer automates this process and is either invoked manually by us as part of TDD and/or automatically by our continuous integration/deployment system after we made a commit to the source code repository. Code Coverage Property/MethodThere are different ways to measure code coverage, each of which has its benefits and drawbacks and none of which is the best for any use case. The different properties/methods used in code coverage are:
Statement CoverageStatement coverage measures the number of statements that were executed by our tests. Most tools such as coverage.py do statement coverage by default but can often be told to run in branch coverage mode as well. Branch CoverageBranch coverage measures the number of branches executed by our tests, where 100% branch coverage means that every branch of our code has been executed at least once by one of our tests. If we compare branch coverage to statement coverage, it is harder to achieve because it requires more tests to be written. Doing so requires more time and knowledge too but in the end branch coverage provides better overall coverage — software for which branch coverage is low is generally not considered to be thoroughly tested. As mentioned, achieving high coverage with branch coverage often involves writing additional tests where our software is supposed to fail (rather than succeed) in some way e.g. run into an assert or throw an exception. In order to achieve the same amount of coverage with statement coverage it is usually enough to have tests that test our software for its intended usage i.e. cases where it would succeed. As with testing in general, there is a limit to the coverage that can be achieved with branch coverage as some branches in our code may only be used for handling of errors that are beyond the control of our tests. In some cases so-called stress testing can achieve higher branch coverage by producing the conditions under which certain error handling branches are followed. Another way to increase coverage is by using fault injection (knowingly and on purpose providing wrong/faulty input). coverage.pycoverage.py does statement coverage by default, it tells us which
statements were executed when we run our tests. In case we want to measure branch coverage instead, we can tell coverage.py to do so via its --branch option.
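A minimal sketch of how this typically looks on the command line (the file name is made up; consult coverage.py's documentation for the authoritative options):

$ coverage run --branch test_sample.py
$ coverage report -m

The report lists, per file, how many statements (and, with --branch, branches) were executed and which ones were missed.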
When measuring branches, WRITEME Test-driven DevelopmentThe reason why TDD has become so popular is because it provides the best ratio of money/time invested to quality produced. It is also very important to emphasize that TDD is actually more of a design methodology rather than a mere testing methodology. TDD (Test-driven Development) scales well, both, in software (project size) as well as on the human side (team size). People generally adapt quickly to the concept of writing tests before writing the actual software. TDD requires us to think about interfaces before we start writing code, it encourages simple designs, it inspires confidence, code coverage becomes measurable, and last but not least, TDD does away with the phenomenon where people write code for functionality that is not required simply because they stop writing code once all tests pass:
Unit Testing
WRITEME
Integration Testing
System Testing
Source Code Checker
We use source code checkers for static source code checks i.e. to test whether or not code is in compliance with a predefined set of rules and best practices such as coding style. This might happen several times before code makes it to production — the developer would manually run a source code checker on the code he just wrote before checking it into the source code repository; in addition, our continuous integration/deployment system would run those checks again automatically for any new commit that passes through on the way to production. The use of analytical methods to check source code in order to detect software defects and improve quality in general is nothing new — it is the most basic thing to do for quality assurance and should be a given for any software project.
pep8
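The usage is as simple as it gets: point the tool at a file or directory and it reports violations as file:line:column plus the rule that was broken (myscript.py is a placeholder and the output shown is illustrative):

$ pep8 myscript.py
myscript.py:7:1: E302 expected 2 blank lines, got 1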
PyChecker
In addition to the bug-checking that PyChecker performs, Pylint offers some additional features such as checking line length, whether variable names are well-formed according to our coding standard, whether declared interfaces are fully implemented...
Pylint
This one is a very good complement to the tools mentioned above. I ended up using them over others.
py.test
py.test can be used for unit testing, integration testing and system testing — one tool to test our software end-to-end.
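Before diving into the details below, a minimal sketch of what a py.test test looks like (file and function names are made up; py.test discovers files and functions prefixed with test automatically):

# test_sample.py
def inc(x):
    return x + 1

def test_inc():
    assert inc(3) == 4

Running py.test against this file (or simply py.test in the directory) collects test_inc() and reports a pass or a failure for it.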
WRITEME
Assertions
py.test does assertion introspection i.e. by default we do not get Python's default assert semantics — py.test rewrites the assert statements in our test modules so that failures report the values involved. py.test only rewrites test code directly discovered by its test
collection process. This means that Assertion Introspection Methodspy.test has three assertion introspection methods: Funcargs
built-in Funcargs- http://pytest.org/latest/tmpdir.html Mysetup Pattern
Parametrized Testing
pytest-cov
pytest-pep8
Miscellaneous
Debugging
Debugging is the process of trying to find out why a certain technical system does not show a certain expected behavior and/or fails to produce any reasonable result at all. The inverse is true as well... debugging is sometimes used to verify expected outcome.
While the main purpose of debugging certainly is about tracking down errors and/or unexpected behavior, debugging is also very good in terms of educating ourselves because it allows us to look behind the curtain of what Python is doing at any moment — what the names are it is currently dealing with, what codepath that lead to a certain subroutine... good stuff! Our tool of choice is pdb, the module that defines an interactive source code debugger for Python programs and which is shipped as part of Python's standard library. pdb supports setting breakpoints, stepping through source code line by line, inspecting stack frames, source code listing, evaluation of arbitrary Python code in the context of any stack frame, post-mortem debugging and then some more... It can be called from a running program or dropped into on error when running tests. pdb is also extensible as it defines the class/type >>> import sys >>> 'pdb' in sys.builtin_module_names False # pdb is not a built-in module but >>> import pdb >>> pdb.__file__ '/home/sa/0/cpython3.3/lib/python3.3/pdb.py' # shipped as part of the standard library >>> import os >>> os.path.dirname(pdb.__file__) in sys.path # pdb lives on sys.path per default so import is easy True >>> pdb.__all__ # pdb's public API ['run', 'pm', 'Pdb', # class Pdb which can be used to extend pdb 'runeval', 'runctx', 'runcall', 'set_trace', 'post_mortem', 'help'] >>> Starting pdbAs mentioned, there are a few ways to start the debugger... Command LineThe first one is to simply start it from the command line and feed it some piece of Python source code. Let us draft something quickly which we can feed to pdb: #!/usr/bin/env python class Foo: def __init__(self, num_loops): self.count = num_loops def go(self): for i in range(self.count): print(i) return if __name__ == '__main__': Foo(5).go() and then feed it to pdb on the command line (py33) sa@wks:~/0/py33$ python -m pdb pdb0.py > /home/sa/0/py33/pdb0.py(4)<module>() -> class Foo: (Pdb) list 1 #!/usr/bin/env python 2 3 4 -> class Foo: # pdb pauses on encounter of the first statement/expression 5 6 def __init__(self, num_loops): 7 self.count = num_loops 8 9 def go(self): 10 for i in range(self.count): 11 print(i) (Pdb) Running pdb from the command line causes it to load our source file
Interactive Interpreter SessionWe can also start pdb from an interactive interpreter session: >>> import pdb >>> import pdb0 >>> pdb.run('pdb0.Foo(5).go()') > <string>(1)<module>()->None (Pdb) step --Call-- > /home/sa/0/py33/pdb0.py(6)__init__() -> def __init__(self, num_loops): (Pdb) list 1 #!/usr/bin/env python 2 3 4 class Foo: 5 6 -> def __init__(self, num_loops): 7 self.count = num_loops 8 9 def go(self): 10 for i in range(self.count): 11 print(i) (Pdb) quit >>> Many Pythoneers use this workflow with the interactive interpreter
while developing early versions of their software because it lets them
experiment more interactively without the save/run/repeat cycle
otherwise needed. As can be seen, to run pdb from within an interactive interpreter session we use pdb.run(). The argument to run() is a string containing the expression/statement that is then executed under debugger control.
From a running Program
Both of the previous examples assume we want to start the debugger at
the beginning of our program. For a long-running process where the
problem appears much later during program execution, it is more
convenient to start the debugger from inside our running program using pdb.set_trace():
(py33) sa@wks:~/0/py33$ cat pdb0.py #!/usr/bin/env python import pdb class Foo: def __init__(self, num_loops): self.count = num_loops def go(self): for i in range(self.count): pdb.set_trace() print(i) return if __name__ == '__main__': Foo(5).go() (py33) sa@wks:~/0/py33$ chmod 755 pdb0.py (py33) sa@wks:~/0/py33$ ./pdb0.py > /home/sa/0/py33/pdb0.py(14)go() -> print(i) (Pdb) list 9 self.count = num_loops 10 11 def go(self): 12 for i in range(self.count): 13 pdb.set_trace() 14 -> print(i) 15 return 16 17 18 if __name__ == '__main__': 19 Foo(5).go() (Pdb) quit (py33) sa@wks:~/0/py33$ Line 13 of the sample script triggers pdb at that point in execution.
After a Failure
Debugging a failure after a program terminates is called post-mortem
debugging. pdb supports post-mortem debugging through the (py33) sa@wks:~/0/py33$ cat pdb0.py #!/usr/bin/env python class Foo: def __init__(self, num_loops): self.count = num_loops def go(self): for i in range(foobarbaz): print(i) return if __name__ == '__main__': Foo(5).go() (py33) sa@wks:~/0/py33$ python >>> import pdb0 >>> pdb0.Foo(3).go() Traceback (most recent call last): File "<input>", line 1, in <module> File "pdb0.py", line 10, in go for i in range(foobarbaz): NameError: global name 'foobarbaz' is not defined >>> import pdb >>> pdb.pm() > /home/sa/0/py33/pdb0.py(10)go() -> for i in range(foobarbaz): (Pdb) print foobarbaz NameError: name 'foobarbaz' is not defined (Pdb) list 5 6 def __init__(self, num_loops): 7 self.count = num_loops 8 9 def go(self): 10 -> for i in range(foobarbaz): 11 print(i) 12 return 13 14 15 if __name__ == '__main__': (Pdb) Here the incorrect Dropping into pdb when running TestsWe are going to use py.test because it is the most sophisticated tool out there: (py33) sa@wks:~/0/py33$ pip freeze | grep test pytest==2.2.0 pytest-pep8==0.7 (py33) sa@wks:~/0/py33$ (py33) sa@wks:~/0/py33$ py.test --pdb pdb0.py ====================== test session starts ====================== platform linux -- Python 3.3.0 -- pytest-2.2.0 collected 1 items pdb0.py >>>>>>>>>>>>>>>>>>>>>>>>>>> traceback >>>>>>>>>>>>>>>>>>>>>>>>>>> def test_go(): > Foo(5).go() pdb0.py:16: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = <pdb0.Foo object at 0x7f91400e3290> def go(self): > for i in range(foobarbaz): E NameError: global name 'foobarbaz' is not defined pdb0.py:10: NameError >>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>> > /home/sa/0/py33/pdb0.py(10)go() -> for i in range(foobarbaz): (Pdb) list 3, 20 3 4 class Foo: 5 6 def __init__(self, num_loops): 7 self.count = num_loops 8 9 def go(self): 10 -> for i in range(foobarbaz): 11 print(i) 12 return 13 14 15 def test_go(): 16 Foo(5).go() 17 18 19 if __name__ == '__main__': 20 Foo(5).go() (Pdb) quit F =================== 1 failed in 5.82 seconds ==================== (py33) sa@wks:~/0/py33$ py.test looks for functions/methods prefixed with py.test runs The same semantics that apply to using (py33) sa@wks:~/0/py33$ cat pdb0.py #!/usr/bin/env python import pytest # no need for an additional import pdb class Foo: def __init__(self, num_loops): self.count = num_loops def go(self): for i in range(self.count): pytest.set_trace() print(i) return def test_go(): Foo(5).go() if __name__ == '__main__': Foo(5).go() (py33) sa@wks:~/0/py33$ py.test --pep8 pdb0.py =========================== test session starts ============================ platform linux -- Python 3.3.0 -- pytest-2.2.0 pep8 ignore opts: (performing all available checks) collected 2 items pdb0.py . >>>>>>>>>>>>>>>>> PDB set_trace (IO-capturing turned off) >>>>>>>>>>>>>>>>>> > /home/sa/0/py33/pdb0.py(14)go() -> print(i) (Pdb) list 9 self.count = num_loops 10 11 def go(self): 12 for i in range(self.count): 13 pytest.set_trace() 14 -> print(i) 15 return 16 17 18 def test_go(): 19 Foo(5).go() (Pdb) print i, self.count (0, 5) (Pdb) quit (py33) sa@wks:~/0/py33$ Interactive Commandspdb is designed to be easy to use interactively. We can interact with
it using a small command language ( >>> import readline >>> readline.__file__ '/home/sa/0/py33/lib/python3.3/lib-dynload/readline.cpython-33m.so' >>> Getting HelpOnce at the pdb prompt we can then use (py33) sa@wks:~/0/py33$ python pdb0.py > /home/sa/0/py33/pdb0.py(14)go() -> print(i) (Pdb) help Documented commands (type help <topic>): ======================================== EOF cl disable interact next return u where a clear display j p retval unalias alias commands down jump pp run undisplay args condition enable l print rv unt b cont exit list q s until break continue h ll quit source up bt d help longlist r step w c debug ignore n restart tbreak whatis Miscellaneous help topics: ========================== pdb exec (Pdb) We can also get to certain help topics by using (Pdb) help until unt(il) [lineno] Without argument, continue execution until the line with a number greater than the current one is reached. With a line number, continue execution until a line with a number greater or equal to that is reached. In both cases, also stop when the current frame returns. (Pdb) help exec (!) statement Execute the (one-line) statement in the context of the current stack frame. The exclamation point can be omitted unless the first word of the statement resembles a debugger command. To assign to a global variable you must always prefix the command with a 'global' command, e.g.: (Pdb) global list_options; list_options = ['-l'] (Pdb) (Pdb) Navigating the Execution StackUnless the command was (Pdb) w /home/sa/0/py33/pdb0.py(23)<module>() -> Foo(5).go() > /home/sa/0/py33/pdb0.py(14)go() -> print(i) (Pdb) # runs w/where again /home/sa/0/py33/pdb0.py(23)<module>() -> Foo(5).go() > /home/sa/0/py33/pdb0.py(14)go() -> print(i) (Pdb) list 9 self.count = num_loops 10 11 def go(self): 12 for i in range(self.count): 13 pdb.set_trace() 14 -> print(i) 15 return 16 17 18 if __name__ == '__main__': 19 Foo(5).go() (Pdb) # does not run l/list again [EOF] (Pdb) As can be seen, at any point while pdb is running we can use
Moving up/down the Call StackIf up/down the call stack sounds confusing then we can just think about it as back and forth in time between the point in time where Python starts executing our program and the point in time where we actually drop into pdb. Two very handy commands in addition to (Pdb) where # let us have a look at the call stack /home/sa/0/py33/pdb0.py(19)<module>() -> Foo(5).go() # first (topmost) stack frame on the call stack > /home/sa/0/py33/pdb0.py(14)go() -> print(i) # second stack frame (Pdb) l 9 self.count = num_loops 10 11 def go(self): 12 for i in range(self.count): 13 pdb.set_trace() 14 -> print(i) # execution pauses after the pdb.set_trace() line 15 return 16 17 18 if __name__ == '__main__': 19 Foo(5).go() (Pdb) up # move to older stack frame (upwards the call stack) > /home/sa/0/py33/pdb0.py(19)<module>() -> Foo(5).go() (Pdb) l 14 print(i) 15 return 16 17 18 if __name__ == '__main__': 19 -> Foo(5).go() # first stack frame [EOF] (Pdb) down # move to newer stack frame again (downwards the call stack) > /home/sa/0/py33/pdb0.py(14)go() -> print(i) (Pdb) l 9 self.count = num_loops 10 11 def go(self): 12 for i in range(self.count): 13 pdb.set_trace() 14 -> print(i) # back on second stack frame 15 return 16 17 18 if __name__ == '__main__': 19 Foo(5).go() (Pdb) Examining Variables on the Call StackThe (py33) sa@wks:~/0/py33$ cat pdb1.py #!/usr/bin/env python import pdb def foo(n=5, output="to be printed"): if n > 0: foo(n - 1) else: pdb.set_trace() print(output) return if __name__ == '__main__': foo() (py33) sa@wks:~/0/py33$ python pdb1.py > /home/sa/0/py33/pdb1.py(11)foo() -> print(output) (Pdb) where /home/sa/0/py33/pdb1.py(17)<module>() -> foo() /home/sa/0/py33/pdb1.py(8)foo() -> foo(n - 1) /home/sa/0/py33/pdb1.py(8)foo() -> foo(n - 1) /home/sa/0/py33/pdb1.py(8)foo() -> foo(n - 1) /home/sa/0/py33/pdb1.py(8)foo() -> foo(n - 1) /home/sa/0/py33/pdb1.py(8)foo() -> foo(n - 1) > /home/sa/0/py33/pdb1.py(11)foo() -> print(output) As can be seen, execution pauses after the (Pdb) args n = 0 output = 'to be printed' (Pdb) up # going back in time, up the call stack > /home/sa/0/py33/pdb1.py(8)foo() -> foo(n - 1) (Pdb) args n = 1 output = 'to be printed' (Pdb) print n # print evaluates an expression using the current frame variables/state 1 (Pdb) print n + 4 5 (Pdb) down # going forward in time, down the call stack > /home/sa/0/py33/pdb1.py(11)foo() -> print(output) # back at the frame where execution initially paused The We could also use Python's (pdb) print(sys.version[:3]) # pass through to the interpreter '3.3' (Pdb) !output # pass through to the interpreter and evaluate 'to be printed' (Pdb) !output = "but I want THIS to be printed..." (Pdb) l 6 def foo(n=5, output="to be printed"): 7 if n > 0: 8 foo(n - 1) 9 else: 10 pdb.set_trace() 11 -> print(output) 12 13 return 14 15 16 if __name__ == '__main__': (Pdb) step # step through line 11 (execute it) but I want THIS to be printed... 
> /home/sa/0/py33/pdb1.py(13)foo() -> return (Pdb) Stepping through a ProgramIn addition to navigating up and down the call stack using (py33) sa@wks:~/0/py33$ cat pdb2.py #!/usr/bin/env python import pdb def faz(n): for i in range(n): j = i * n print(i, j) return if __name__ == '__main__': pdb.set_trace() faz(5) (py33) sa@wks:~/0/py33$ python pdb2.py > /home/sa/0/py33/pdb2.py(15)<module>() -> faz(5) (Pdb) l 10 return 11 12 13 if __name__ == '__main__': 14 pdb.set_trace() 15 -> faz(5) [EOF] (Pdb) s --Call-- > /home/sa/0/py33/pdb2.py(6)faz() -> def faz(n): (Pdb) l 1 #!/usr/bin/env python 2 3 import pdb 4 5 6 -> def faz(n): 7 for i in range(n): 8 j = i * n 9 print(i, j) 10 return 11 (Pdb) s > /home/sa/0/py33/pdb2.py(7)faz() -> for i in range(n): # third execution point after dropping into pdb (Pdb) # hitting enter repeats the last command (step) > /home/sa/0/py33/pdb2.py(8)faz() -> j = i * n (Pdb) p i, n (0, 5) As we already know, the call to
(Pdb) u > /home/sa/0/py33/pdb2.py(15)<module>() -> faz(5) (Pdb) next 0 0 1 5 2 10 3 15 4 20 --Return-- > /home/sa/0/py33/pdb2.py(15)<module>()->None -> faz(5) (Pdb) s (py33) sa@wks:~/0/py33$ As can be seen, we first went up the call stack again using Something in between (py33) sa@wks:~/0/py33$ python pdb2.py > /home/sa/0/py33/pdb2.py(15)<module>() -> faz(5) (Pdb) s # we make a step --Call-- > /home/sa/0/py33/pdb2.py(6)faz() -> def faz(n): (Pdb) s # another step > /home/sa/0/py33/pdb2.py(7)faz() -> for i in range(n): (Pdb) p n # look at values in the current frame 5 # maybe found what we were looking for (Pdb) return # and thus skip forward 0 0 1 5 2 10 3 15 4 20 --Return-- > /home/sa/0/py33/pdb2.py(10)faz()->None -> return # explicit return i.e. function returns here (Pdb) w /home/sa/0/py33/pdb2.py(15)<module>() -> faz(5) > /home/sa/0/py33/pdb2.py(10)faz()->None -> return (Pdb) (py33) sa@wks:~/0/py33$
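Before moving on to the next example, here is a minimal sketch (function names are illustrative, not taken from the examples above) summarising how the stepping commands behave once execution is paused:

import pdb


def helper(x):
    return x * 2                # 'step' issued at the call site stops in here


def main():
    pdb.set_trace()             # pause; the next line to run is 'total = 0'
    total = 0
    for i in range(3):
        total += helper(i)      # 'next' runs this line without entering helper();
                                # 'until' issued here runs the remaining loop
                                # iterations and stops at the first line past the loop
    return total                # 'return' continues until main() is about to return


if __name__ == '__main__':
    main()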
(py33) sa@wks:~/0/py33$ cat pdb3.py #!/usr/bin/env python import pdb def bar(i, n): return i * n def baz(n): for i in range(n): j = bar(i, n) print(i, j) return if __name__ == '__main__': pdb.set_trace() baz(5) (py33) sa@wks:~/0/py33$ python pdb3.py > /home/sa/0/py33/pdb3.py(19)<module>() -> baz(5) (Pdb) s --Call-- > /home/sa/0/py33/pdb3.py(10)baz() -> def baz(n): (Pdb) l 5 6 def bar(i, n): 7 return i * n 8 9 10 -> def baz(n): 11 for i in range(n): 12 j = bar(i, n) 13 print(i, j) 14 return 15 (Pdb) until 13 # execute until line 13 > /home/sa/0/py33/pdb3.py(13)baz() -> print(i, j) (Pdb) p i, j, n (0, 0, 5) (Pdb) s 0 0 > /home/sa/0/py33/pdb3.py(11)baz() -> for i in range(n): (Pdb) s > /home/sa/0/py33/pdb3.py(12)baz() -> j = bar(i, n) (Pdb) s --Call-- > /home/sa/0/py33/pdb3.py(6)bar() -> def bar(i, n): (Pdb) l 1 #!/usr/bin/env python 2 3 import pdb 4 5 6 -> def bar(i, n): 7 return i * n 8 9 10 def baz(n): 11 for i in range(n): (Pdb) r --Return-- > /home/sa/0/py33/pdb3.py(7)bar()->5 # return value is 5 -> return i * n (Pdb) s > /home/sa/0/py33/pdb3.py(13)baz() -> print(i, j) (Pdb) p i, j, n (1, 5, 5) (Pdb) w /home/sa/0/py33/pdb3.py(19)<module>() -> baz(5) > /home/sa/0/py33/pdb3.py(13)baz() -> print(i, j) (Pdb) l 8 9 10 def baz(n): 11 for i in range(n): 12 j = bar(i, n) 13 -> print(i, j) 14 return 15 16 17 if __name__ == '__main__': 18 pdb.set_trace() (Pdb) until # continue execution until the line with a number greater than the current one is reached 1 5 2 10 3 15 4 20 > /home/sa/0/py33/pdb3.py(14)baz() -> return (Pdb) s --Return-- > /home/sa/0/py33/pdb3.py(14)baz()->None -> return # next line after the for loop is exhausted (Pdb) s --Return-- > /home/sa/0/py33/pdb3.py(19)<module>()->None -> baz(5) (Pdb) l 14 return 15 16 17 if __name__ == '__main__': 18 pdb.set_trace() 19 -> baz(5) [EOF] (Pdb) s (py33) sa@wks:~/0/py33$ BreakpointsThe most powerful way of stepping trough our programs is done using
so-called breakpoints: As programs grow longer, even using A better solution is to let the program execute until it reaches a point where we want execution to pause and hand control over to pdb. We could use set_trace() to drop us into pdb, but that only works if there is only one point where we want to pause execution... It is more convenient to run our program through pdb and tell it where to pause execution using so-called breakpoints. pdb would then pause execution at the line before a breakpoint and drop us into the pdb prompt: (py33) sa@wks:~/0/py33$ cat pdb4.py #!/usr/bin/env python def calc(i, n): bar = i * n print('bar =', bar) if bar > 0: print('bar is positive') return bar def foo(n): for i in range(n): print('i =', i) bar = calc(i, n) return if __name__ == '__main__': foo(5) (py33) sa@wks:~/0/py33$ python -m pdb pdb4.py > /home/sa/0/py33/pdb4.py(4)<module>() -> def calc(i, n): (Pdb) l 1 #!/usr/bin/env python 2 3 4 -> def calc(i, n): # pdb pauses on encounter of the first statement/expression 5 bar = i * n 6 print('bar =', bar) 7 if bar > 0: 8 print('bar is positive') 9 10 return bar 11 (Pdb) w /home/sa/0/cpython3.3/lib/python3.3/bdb.py(392)run() -> exec(cmd, globals, locals) <string>(1)<module>() > /home/sa/0/py33/pdb4.py(4)<module>() -> def calc(i, n): (Pdb) break 8 # set breakpoint using a line number Breakpoint 1 at /home/sa/0/py33/pdb4.py:8 (Pdb) continue # keep executing until the next breakpoint i = 0 bar = 0 i = 1 bar = 5 > /home/sa/0/py33/pdb4.py(8)calc() -> print('bar is positive') (Pdb) There are several options to (py33) sa@wks:~/0/py33$ python -m pdb pdb4.py > /home/sa/0/py33/pdb4.py(4)<module>() -> def calc(i, n): (Pdb) break calc # using a function name to set the breakpoint Breakpoint 1 at /home/sa/0/py33/pdb4.py:4 # breakpoint with ID 1 (Pdb) l 1 #!/usr/bin/env python 2 3 4 B-> def calc(i, n): # set breakpoint indicated by capital B 5 bar = i * n 6 print('bar =', bar) 7 if bar > 0: 8 print('bar is positive') 9 10 return bar 11 (Pdb) continue i = 0 > /home/sa/0/py33/pdb4.py(5)calc() -> bar = i * n (Pdb) w /home/sa/0/cpython3.3/lib/python3.3/bdb.py(392)run() -> exec(cmd, globals, locals) <string>(1)<module>() /home/sa/0/py33/pdb4.py(21)<module>() -> foo(5) /home/sa/0/py33/pdb4.py(16)foo() -> bar = calc(i, n) > /home/sa/0/py33/pdb4.py(5)calc() -> bar = i * n (Pdb)
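For reference, the break command accepts a bare line number, a function name, or a file:lineno pair; here is a small sketch with illustrative names (the file name pdb_breaks.py is hypothetical and only exists for the sake of the comments):

#!/usr/bin/env python
# hypothetical file name referenced in the comments: pdb_breaks.py


def calc(i, n):
    bar = i * n                 # (Pdb) break calc                    -> pause whenever calc() is entered
    if bar > 0:                 # (Pdb) break pdb_breaks.py:<lineno>  -> pause at an explicit file:lineno location
        print('bar is positive')
    return bar


def foo(n):
    for i in range(n):
        calc(i, n)              # (Pdb) break <lineno>                -> pause at a line in the current file


if __name__ == '__main__':
    foo(5)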
Managing BreakpointsAs each new breakpoint is added, it is assigned a numerical identifier. These IDs can then be used to enable, disable, and clear the breakpoints interactively. The debugging session below sets two breakpoints, then disables one.
The program is run until the remaining breakpoint is encountered, and
then the other breakpoint is turned back on with the (py33) sa@wks:~/0/py33$ python -m pdb pdb4.py > /home/sa/0/py33/pdb4.py(4)<module>() -> def calc(i, n): # pdb pauses on encounter of the first statement/expression (Pdb) l 1 #!/usr/bin/env python 2 3 4 -> def calc(i, n): 5 bar = i * n 6 print('bar =', bar) 7 if bar > 0: 8 print('bar is positive') 9 10 return bar 11 (Pdb) break calc Breakpoint 1 at /home/sa/0/py33/pdb4.py:4 (Pdb) break 8 Breakpoint 2 at /home/sa/0/py33/pdb4.py:8 (Pdb) break # list breakpoints Num Type Disp Enb Where 1 breakpoint keep yes at /home/sa/0/py33/pdb4.py:4 2 breakpoint keep yes at /home/sa/0/py33/pdb4.py:8 (Pdb) l 1, 10 1 #!/usr/bin/env python 2 3 4 B-> def calc(i, n): # execution currently paused at breakpoint with ID 1 5 bar = i * n 6 print('bar =', bar) 7 if bar > 0: 8 B print('bar is positive') 9 10 return bar (Pdb) disable 1 # disable a breakpoint Disabled breakpoint 1 at /home/sa/0/py33/pdb4.py:4 (Pdb) break Num Type Disp Enb Where 1 breakpoint keep no at /home/sa/0/py33/pdb4.py:4 2 breakpoint keep yes at /home/sa/0/py33/pdb4.py:8 (Pdb) continue i = 0 bar = 0 i = 1 bar = 5 > /home/sa/0/py33/pdb4.py(8)calc() -> print('bar is positive') (Pdb) break Num Type Disp Enb Where 1 breakpoint keep no at /home/sa/0/py33/pdb4.py:4 2 breakpoint keep yes at /home/sa/0/py33/pdb4.py:8 breakpoint already hit 1 time # we hit breakpoint with ID 2 once already (Pdb) enable 1 # enable a disables breakpoint Enabled breakpoint 1 at /home/sa/0/py33/pdb4.py:4 (Pdb) break Num Type Disp Enb Where 1 breakpoint keep yes at /home/sa/0/py33/pdb4.py:4 # marked non-temporary (keep) 2 breakpoint keep yes at /home/sa/0/py33/pdb4.py:8 breakpoint already hit 1 time (Pdb) w /home/sa/0/cpython3.3/lib/python3.3/bdb.py(392)run() -> exec(cmd, globals, locals) <string>(1)<module>() /home/sa/0/py33/pdb4.py(21)<module>() -> foo(5) /home/sa/0/py33/pdb4.py(16)foo() -> bar = calc(i, n) > /home/sa/0/py33/pdb4.py(8)calc() -> print('bar is positive') (Pdb) l 3 4 B def calc(i, n): 5 bar = i * n 6 print('bar =', bar) 7 if bar > 0: 8 B-> print('bar is positive') 9 10 return bar 11 12 13 def foo(n): (Pdb) clear 1 # delete/clear breakpoint with ID 1 Deleted breakpoint 1 at /home/sa/0/py33/pdb4.py:4 (Pdb) break Num Type Disp Enb Where 2 breakpoint keep yes at /home/sa/0/py33/pdb4.py:8 breakpoint already hit 1 time (Pdb) l 1, 10 1 #!/usr/bin/env python 2 3 4 def calc(i, n): 5 bar = i * n 6 print('bar =', bar) 7 if bar > 0: 8 B-> print('bar is positive') 9 10 return bar (Pdb) Temporary BreakpointsA temporary breakpoint is automatically cleared the first time program execution hits it. 
Using a temporary breakpoint lets us reach a particular spot in the program flow quickly, just as with a regular breakpoint, but since it is cleared immediately it does not interfere with subsequent progress if that part of the program is run repeatedly: (py33) sa@wks:~/0/py33$ python -m pdb pdb4.py > /home/sa/0/py33/pdb4.py(4)<module>() -> def calc(i, n): (Pdb) tbreak 8 # set temporary breakpoint at line 8 Breakpoint 1 at /home/sa/0/py33/pdb4.py:8 (Pdb) break Num Type Disp Enb Where 1 breakpoint del yes at /home/sa/0/py33/pdb4.py:8 # marked temporary (del) (Pdb) l 1 #!/usr/bin/env python 2 3 4 -> def calc(i, n): 5 bar = i * n 6 print('bar =', bar) 7 if bar > 0: 8 B print('bar is positive') 9 10 return bar 11 (Pdb) c i = 0 bar = 0 i = 1 bar = 5 Deleted breakpoint 1 at /home/sa/0/py33/pdb4.py:8 # pdb deletes temporary breakpoint automatically > /home/sa/0/py33/pdb4.py(8)calc() -> print('bar is positive') (Pdb) break # temporary breakpoint is gone (Pdb) q (py33) sa@wks:~/0/py33$ Conditional BreakpointsRather than enabling/disabling breakpoints manually we can use conditional breakpoints which then gives us finer control over when pdb pauses our program. Conditions can be applied to breakpoints so that execution only pauses when those conditions are met. Conditional breakpoints can be set in two ways:
(py33) sa@wks:~/0/py33$ python -m pdb pdb4.py > /home/sa/0/py33/pdb4.py(4)<module>() -> def calc(i, n): (Pdb) break 5, i > 0 # set condition at the same time the breakpoint is set Breakpoint 1 at /home/sa/0/py33/pdb4.py:5 (Pdb) break Num Type Disp Enb Where 1 breakpoint keep yes at /home/sa/0/py33/pdb4.py:5 stop only if i > 0 # condition (Pdb) l 1, 10 1 #!/usr/bin/env python 2 3 4 -> def calc(i, n): 5 B bar = i * n 6 print('bar =', bar) 7 if bar > 0: 8 print('bar is positive') 9 10 return bar (Pdb) continue i = 0 bar = 0 i = 1 > /home/sa/0/py33/pdb4.py(5)calc() -> bar = i * n (Pdb) l 1 #!/usr/bin/env python 2 3 4 def calc(i, n): 5 B-> bar = i * n 6 print('bar =', bar) 7 if bar > 0: 8 print('bar is positive') 9 10 return bar 11 (Pdb) p i, n (1, 5) (Pdb) condition 1 i > 4 # add condition to existing breakpoint with ID 1 New condition set for breakpoint 1. (Pdb) break Num Type Disp Enb Where 1 breakpoint keep yes at /home/sa/0/py33/pdb4.py:5 stop only if i > 4 # same breakpoint, new condition breakpoint already hit 2 times (Pdb) continue bar = 5 bar is positive i = 2 bar = 10 bar is positive i = 3 bar = 15 bar is positive i = 4 bar = 20 bar is positive The program finished and will be restarted > /home/sa/0/py33/pdb4.py(4)<module>() -> def calc(i, n): (Pdb) break Num Type Disp Enb Where 1 breakpoint keep yes at /home/sa/0/py33/pdb4.py:5 stop only if i > 4 breakpoint already hit 5 times (Pdb) q (py33) sa@wks:~/0/py33$ Ignoring BreakpointsPrograms with a lot of looping or recursive calls to the same function
are often easier to debug by skipping ahead in the execution, instead
of watching every call or breakpoint. The (py33) sa@wks:~/0/py33$ python -m pdb pdb4.py > /home/sa/0/py33/pdb4.py(4)<module>() -> def calc(i, n): (Pdb) l 1 #!/usr/bin/env python 2 3 4 -> def calc(i, n): 5 bar = i * n 6 print('bar =', bar) 7 if bar > 0: 8 print('bar is positive') 9 10 return bar 11 (Pdb) break 8 Breakpoint 1 at /home/sa/0/py33/pdb4.py:8 (Pdb) ignore 1 2 # do not pause on the next two hits of breakpoint with ID 1 Will ignore next 2 crossings of breakpoint 1. (Pdb) break Num Type Disp Enb Where 1 breakpoint keep yes at /home/sa/0/py33/pdb4.py:8 ignore next 2 hits (Pdb) continue i = 0 bar = 0 i = 1 bar = 5 bar is positive i = 2 bar = 10 bar is positive i = 3 bar = 15 > /home/sa/0/py33/pdb4.py(8)calc() -> print('bar is positive') (Pdb) break Num Type Disp Enb Where 1 breakpoint keep yes at /home/sa/0/py33/pdb4.py:8 breakpoint already hit 3 times (Pdb) ignore 1 1 Will ignore next 1 crossing of breakpoint 1. (Pdb) break Num Type Disp Enb Where 1 breakpoint keep yes at /home/sa/0/py33/pdb4.py:8 ignore next 1 hits breakpoint already hit 3 times (Pdb) ignore 1 0 # explicitly resetting the ignore count to zero re-activates the breakpoint Will stop next time breakpoint 1 is reached. (Pdb) break Num Type Disp Enb Where 1 breakpoint keep yes at /home/sa/0/py33/pdb4.py:8 breakpoint already hit 3 times (Pdb) quit (py33) sa@wks:~/0/py33$ Triggering Actions on a BreakpointIn addition to the purely interactive mode, pdb supports basic
scripting. Using After issuing the (py33) sa@wks:~/0/py33$ python -m pdb pdb4.py > /home/sa/0/py33/pdb4.py(4)<module>() -> def calc(i, n): (Pdb) l 1 #!/usr/bin/env python 2 3 4 -> def calc(i, n): 5 bar = i * n 6 print('bar =', bar) 7 if bar > 0: 8 print('bar is positive') 9 10 return bar 11 (Pdb) break 5 Breakpoint 1 at /home/sa/0/py33/pdb4.py:5 (Pdb) commands 1 # start command session for breakpoint with ID 1 (com) print("i is {}".format(i)) # add Python statement (com) end # end command session (Pdb) continue i = 0 'i is 0' > /home/sa/0/py33/pdb4.py(5)calc() -> bar = i * n (Pdb) continue bar = 0 i = 1 'i is 1' > /home/sa/0/py33/pdb4.py(5)calc() -> bar = i * n (Pdb) break Num Type Disp Enb Where 1 breakpoint keep yes at /home/sa/0/py33/pdb4.py:5 breakpoint already hit 2 times (Pdb) condition 1 i > 2 # all other things work unchanged e.g. adding a condition New condition set for breakpoint 1. (Pdb) break Num Type Disp Enb Where 1 breakpoint keep yes at /home/sa/0/py33/pdb4.py:5 stop only if i > 2 breakpoint already hit 2 times (Pdb) continue bar = 5 bar is positive i = 2 bar = 10 bar is positive i = 3 'i is 3' > /home/sa/0/py33/pdb4.py(5)calc() -> bar = i * n (Pdb) quit (py33) sa@wks:~/0/py33$ Restarting a ProgramWhen pdb reaches the end of our program, it automatically restarts it. We can also restart it explicitly without leaving pdb and thereby losing breakpoints and other settings. Running the below program to completion within the debugger prints the name of the file, since no other arguments were given on the command line: (py33) sa@wks:~/0/py33$ cat pdb5.py #!/usr/bin/env python import sys def foo(): print("Command line arguments: {}".format(sys.argv)) return if __name__ == '__main__': foo() (py33) sa@wks:~/0/py33$ python -m pdb pdb5.py > /home/sa/0/py33/pdb5.py(4)<module>() # first run through pdb5.py -> import sys # pdb pauses on encounter of the first statement/expression (Pdb) continue Command line arguments: ['pdb5.py'] The program finished and will be restarted # pdb restarts a program automatically > /home/sa/0/py33/pdb5.py(4)<module>() # second run through pdb5.py -> import sys The program can be restarted using (Pdb) run foo 2 "long argument..." Restarting pdb5.py with arguments: pdb5.py > /home/sa/0/py33/pdb5.py(4)<module>() -> import sys (Pdb) continue Command line arguments: ['pdb5.py', 'foo', '2', 'long argument...'] The program finished and will be restarted > /home/sa/0/py33/pdb5.py(4)<module>() -> import sys (Pdb)
(py33) sa@wks:~/0/py33$ python -m pdb pdb5.py > /home/sa/0/py33/pdb5.py(4)<module>() -> import sys (Pdb) list 1 #!/usr/bin/env python 2 3 4 -> import sys 5 6 7 def foo(): 8 print("Command line arguments: {}".format(sys.argv)) 9 return 10 11 (Pdb) break 8 # set breakpoint Breakpoint 1 at /home/sa/0/py33/pdb5.py:8 (Pdb) continue > /home/sa/0/py33/pdb5.py(8)foo() -> print("Command line arguments: {}".format(sys.argv)) (Pdb) run 1 3 cat # restart with new arguments Restarting pdb5.py with arguments: pdb5.py > /home/sa/0/py33/pdb5.py(4)<module>() -> import sys (Pdb) break Num Type Disp Enb Where 1 breakpoint keep yes at /home/sa/0/py33/pdb5.py:8 # we did not loose the breakpoint breakpoint already hit 1 time (Pdb) continue > /home/sa/0/py33/pdb5.py(8)foo() -> print("Command line arguments: {}".format(sys.argv)) (Pdb) s Command line arguments: ['pdb5.py', '1', '3', 'cat'] > /home/sa/0/py33/pdb5.py(9)foo() -> return (Pdb) list 4 import sys 5 6 7 def foo(): 8 B print("Command line arguments: {}".format(sys.argv)) 9 -> return 10 11 12 if __name__ == '__main__': 13 foo() [EOF] (Pdb) quit (py33) sa@wks:~/0/py33$ Saving Configuration SettingsDebugging a program involves a lot of repetition — running the code,
observing the output, adjusting the code and/or inputs, and running it
again. Luckily pdb reads a .pdbrc file (from our home directory and/or the current directory) when it starts up. Any command that can be typed at the pdb prompt can also be placed in that file and will then be executed automatically at the beginning of every debugging session. MiscellaneousThere are more things we can do than what was covered here. Another thing, which I personally do not use very often though, is assert.
One thing that is often forgotten is that assert statements (and blocks guarded by __debug__) are removed entirely when Python is run with the -O switch. So when should we use assert? >>> __debug__ # a constant, True if Python was not started with -O, therefore True >>> if __debug__: # no need to wrap this block into a try/except/finally compound statement ... if not 5 > 6: ... raise AssertionError("foo", "bar") ... ... ... Traceback (most recent call last): File "<input>", line 3, in <module> AssertionError: ('foo', 'bar') >>> Continuous Integration/Deployment
Peer ReviewPeer review is what needs to happen every time we add/amend code, right after running tests and source code checkers and before we check the added/amended code into the main repository's development branch. Peer review should be made as simple as possible because otherwise it is not going to happen. In case we are using Github, pull requests are the most common way of doing peer review — people can fork our repository, add/remove/amend, and finally issue a pull request. For members of a Github organization, it makes sense to maybe have a branch per member or have feature branches where commits can be reviewed by others before they get merged into the repository's default development branch. We should have made sure our code is in compliance with basic Python guidelines and with those of the particular project in question before we ask peers to review our code — for example, someone asking for peer review should have used source code checkers, run all existing tests, and added new ones for all the code they might have added/amended. Release Management
Jenkins
py.test
pep8
PylintSoftware MetricsMiscellaneous
Package, Distribute, InstallThis section is all about packaging, distributing and installing Python software. History
This text is a literal copy taken from Martijn Faassen's blog where he describes the current (October 2010) state of packaging/distributing Python software. The reason I (Markus Gattol) include it here in full length again is that I find it utterly important for anyone to understand the big picture about why things are the way they are today and what happened during the last ten or so years so that we finally ended up with a pretty amazing toolchain and infrastructure in order to develop, package, distribute and share Python software. IntroductionEarlier this year I (read Martijn Faassen) was at PyCon in the US. I had an interesting experience there: people were talking about the problem of packaging and distributing Python libraries. People had the impression that this was an urgent problem that had not been solved yet. I detected a vibe asking for the Python core developers to please come and solve our packaging problems for us. I felt like I had stepped into a parallel universe. I have been using powerful tools to assemble applications from Python packages automatically for years now. Last summer at EuroPython, when this discussion came up again, I maintained that packaging and distributing Python libraries is a solved problem. I put the point strongly, to make people think. I fully agree that the current solutions are imperfect and that they can be improved in many ways. But I also maintain that the current solutions are indeed solutions. There is now a lot of packaging infrastructure in the Python community, a lot of technology, and a lot of experience. I think that for a lot of Python developers the historical background behind all this is missing. I will try to provide one here. It is important to realize that progress has been made, step by step, for more than a decade now, and we have a fine infrastructure today. I have named some important contributors to the Python packaging story, but undoubtedly I have also not mentioned a lot of other important names. My apologies in advance to those I missed. The dawn of Python packagingThe Python world has been talking about solutions for packaging and distributing Python libraries for a very long time. I remember when I was new in the Python world about a decade ago, in the late 90s, it was considered important and urgent that the Python community implement something like Perl's CPAN. I am sure too that this debate had started long before I started paying attention. I have never used CPAN, but over the years I have seen it held up by many as something that seriously contributes to the power of the Perl language. With CPAN, I understand, you can search and browse Perl packages and you can install them from the net. So, lots of people were talking about a Python equivalent to CPAN with some urgency. At the same time, the Python world did not seem to move very quickly on this front... DistutilsThe Distutils SIG (special interest group) was started in late 1998.
Greg Ward in the context of this discussion group started to create
Distutils about this time. Distutils allows you to structure your
Python project so that it has a setup.py. Through this MetadataWe now had a way to distribute and install Python packages, if we did the distribution ourselves. We did not have a centralized index (or catalog) of packages yet, however. To work on this, the Catalog SIG was started in the year 2000. The first step was to standardize the metadata that could be cataloged by any index of Python packages. Andrew Kuchling drove the effort on this, culminating in PEP 241 in 2001, later updated by PEP 314. Distutils was modified so it could work with this standardized metadata. PyPIIn late 2002, Richard Jones started work on the PyPI (Python Package Index) — PyPI was initially known as the Cheeseshop. The first work on an implementation started, and PEP 301 that describes PyPI was also created then. Distutils was extended so the metadata and packages themselves could be uploaded to this package index. By 2003, the Python package index was up and running. The Python world now had a way to upload packages and metadata
(actually known as distributions) to a central index. If we then
manually downloaded a package we could install it using SetuptoolsPhillip Eby started work on Setuptools in 2004. Setuptools is a whole range of extensions to Distutils such as from a binary installation format (eggs), an automatic package installation tool, and the definition and declaration of scripts for installation. Work continued throughout 2005 and 2006, and feature after feature was added to support a whole range of advanced usage scenarios. By 2005, you could install packages automatically into your Python interpreter using EasyInstall. Dependencies would be automatically pulled in. If packages contained C code it would pull in the binary egg, or if not available, it would compile one automatically. The sheer amount of features that Setuptools brings to the table must be stressed: namespace packages, optional dependencies, automatic manifest building by inspecting version control systems, web scraping to find packages in unusual places, recognition of complex version numbering schemes, and so on, and so on. Some of these features perhaps seem esoteric to many, but complex projects use many of them. The Problems of Shared PackagesThe problem remained that all these packages were installed into your Python interpreter. This is icky. People's site-packages directories became a mess of packages. You also need root access to EasyInstall a package into your system Python. Sharing all packages in a direcory in general, even locally, is not always a good idea: one version of a library needed by one application might break another one. Solutions for this emerged in 2006. VirtualenvIan Bicking drove one line of solutions: virtual-python, which evolved into workingenv, which evolved into virtualenv in 2007. The concept behind this approach is to allow the developer to create as many fully working Python environments as they like from a central system installation of Python. When the developer activates the virtualenv, EasyInstall respectively its successor PIP will install all packages into its the virtualenv's site-packages directory. This allows you to create a virtualenv per project and thus isolate each project from each other. BuildoutIn 2006 as well, Jim Fulton created Buildout, building on Setuptools
and EasyInstall. Buildout can create an isolated project environment
like virtualenv does, but is more ambitious: the goal is to create a
system for repeatable installations of potentially very complex
projects. Instead of writing an The brilliance of Buildout is that it is easily extensible with new installation recipes. These recipes themselves are also installed automatically from PyPI. This has spawned a whole ecosystem of Buildout recipes that can do a whole range of things, from generating documentation to installing MySQL. Since Buildout came out of the Zope world, Buildout for a long time was seen as something only Zope developers would use, but the technology is not Zope-specific at all, and more and more developers are picking up on it. In 2008, Ian Bicking created an alternative for EasyInstall called PIP, also building on Setuptools. Less ambitious than buildout, it aimed to fix some of the shortcomings of EasyInstall. I have not used it myself yet, so I will leave it to others to go into details. Setuptools and the Standard LibraryThe many improvements that Setuptools brought to the Python packaging story had not made it into the Python Standard Library, where Distutils was stagnating. Attempts had been made to bring Setuptools into the standard library at some point during its development, but for one reason or another these efforts had foundered. Setuptools probably got where it is so quickly because it worked around often very slow process of adopting something into the standard library, but that approach also helped confuse the situation for Python developers. Last year Tarek Ziade started looking into the topic of bringing improvements into Distutils. There was a discussion just before PyCon 2009 about this topic between various Python developers as well, which probably explains why the topic was in the air. I understood that some decisions were made:
DistributeBy 2008, Setuptools had become a vital part of the Python development infrastructure. Unfortunately the Setuptools development process has some flaws. It is very centered around Phillip Eby. While he had been extremely active before, by that time he was spending a lot less energy on it. Because of the importance of the technology to the wider community, various developers had started contributing improvements and fixes, but these were piling up. This year, after some period of trying to open up the Setuptools project itself, some of these developers led by Tarek Ziade decided to fork Setuptools. The fork is named Distribute. The aim is to develop the technology with a larger community of developers. One of the first big improvements of the Distribute project is Python 3 support. Quite understandably this fork led to some friction between Tarek, Phillip and others. I trust that this friction will resolve itself and that the developers involved will continue to work with each other, as all have something valuable contribute. PackagingFrom Setuptools to Distribute to distutils2 to packaging. Starting with Python 3.3 packaging will replace distutils and become the standard for packaging/distributing/installing. Operating System PackagingOne point that always comes up in discussions about Python packaging tools is operating system packaging. In particular Linux distributions have developed extremely powerful ways to distribute and install complex libraries and application, manage versions and dependencies and so on. Naturally when the topic of Python packaging comes up, people think about operating system packaging solutions like this. Let me start off that I fully agree that Python packaging solutions can learn a lot from operating system packaging solutions. Why don't we just use a solution like that directly, though? Why is a Python specific packaging solution necessary at all? There are a number of answers to this. One is that operating packaging solutions are not universal: if we decided to use Debian's system, what would we do on Windows? The most important answer however is that there are two related but also very different use cases for packaging:
The Python packaging systems described primarily tries to solve the development use case: I am a Python developer, and I am developing multiple projects at the same time, perhaps in multiple versions, that have different dependencies. I need to reuse packages created by other developers, so I need an easy way to depend on such packages. These packages are sometimes in a rather early state of development, or perhaps I am even creating a new one. If I want to improve such a package I depend on, I need an easy way to start hacking on it. Operating system packaging solutions as I have seen them used are ill suited for the development use case. They are aimed at creating a single consistent installation that is easy to upgrade with an eye on security. Backwards compatibility is important. Packages tend to be relatively mature. For all I know it might indeed be possible to use an operating system packaging tool as a good development packaging tool. But I have heard very little about such practices. Please enlighten me if you have. It is also important to note that the Python world is not as good as
it should be at supporting operating system packaging solutions. The
freeing up of package metadata from the confines of the ConclusionsWe are now in a time of consolidation and opening up. Many of the solutions pioneered by Setuptools are going to be polished to go into the Python Standard Library. At the same time, the community surrounding these technologies is opening up. By making metadata used by Distutils and Setuptools more easily available to other systems, new tools can also more easily be created. The Python packaging story had many contributors over the years. We now have a powerful infrastructure. Do we have an equivalent to CPAN? I do not know enough about CPAN to be sure. But what we have is certainly useful and valuable. In my parallel universe, I use advanced Python packaging tools every day, and I recommend all Python programmers to look into this technology if they have not already. Join me in my parallel universe! Update: I just found out there was a huge thread on python-dev about this in the last few days which focused around the question whether we have the equivalent of CPAN now. One of them funny coincidences... History ContinuesAt PyCon 2010 the decision was made to basically exchange distutils with distutils2 where distutils2 is a fork of Distribute. Setuptools, distutils and Distribute are going to die (read phased out). pip will stay and once distutils is replaced by distutils2, it will work with it as it does now (October 2010) with Distribute. Take a look at this picture:
Glossary
WRITEME ExamplesJust provide the below links and then show a bunch of examples for each step i.e. package, distribute, install; thereby using the tools (pip, distribute, etc.) listed below
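Until those examples are written, here is a minimal sketch of the package/distribute/install round trip for a hypothetical project called demoproject, using a Distribute/Setuptools style setup.py (every name in it is illustrative):

#!/usr/bin/env python
"""Minimal setup.py for a hypothetical package called demoproject."""

from setuptools import setup, find_packages  # Distribute installs itself under the setuptools name

setup(
    name="demoproject",
    version="0.1.0",
    description="A tiny example package",
    packages=find_packages(),                # collect all packages below this directory
    install_requires=["requests"],           # dependencies pip can resolve from PyPI
)

# package:    python setup.py sdist         -> creates a source distribution under dist/
# distribute: python setup.py sdist upload  -> uploads the distribution to PyPI (account required)
# install:    pip install demoproject       -> installs the released package from PyPI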
WRITEME Tools, UtilitiesThis section provides information on what tools I use on a daily basis when it comes to developing/deploying/administer/test/etc. Python software. pythonrcWhen using the interactive interpreter it is frequently handy to have some code executed every time the interpreter is started e.g. to load some module or to set some environment variable etc. We can do this by setting the environment variable sa@wks:~$ cat .pythonrc #!/usr/bin/env python """Initialization file for the Python interpreter. This is the initialization file for the Python interpreter used in interactive sessions. Some general pointers: - /ws/python.html#pythonrc - /ws/python.html#bpython """ __author__ = "Markus Gattol" __author_email__ = "[email protected]" __copyright__ = "Copyright (C) 2012 Free Software Foundation, Inc." __development_status__ = "Production/Stable" __license__ = "Simplified BSD License" __url__ = "/ws/python.html" __version__ = "1.0" #_ main #_. imports import sys import os import pprint #_. import saved bpython sessions if available try: from startup import * # do not do that in real code print("Successful import from startup.py.") except ImportError: print("No startup file available.") #_. colored prompt and autocompletion if os.getenv('TERM') in ('xterm', 'vt100', 'rxvt', 'Eterm', 'putty'): try: import readline except ImportError: print("Module readline not available.") sys.ps1 = '\033[01;33m>>> \033[0m' sys.ps2 = '\033[01;33m... \033[0m' else: import rlcompleter readline.parse_and_bind("tab: complete") sys.ps1 = '\001\033[01;33m\002>>> \001\033[0m\002' sys.ps2 = '\001\033[01;33m\002... \001\033[0m\002' #_. fast way to show what is on sys.path try: def show_sys_path(): """Prints a pretty version of everything on sys.path. We do this because having one output per line is easier to read compared to having several outputs on a single line. """ pprint.pprint(sys.path) # show automatically when starting the interactive interpreter print("sys.path currently holds:") show_sys_path() except ImportError: print("Module pprint not available.") #_. shadow sys.display hook with pprint variant def my_displayhook(value): """Overriding the built-in version with our own. We do this because having one output per line is easier to read compared to having several outputs on a single line. """ if value is not None: try: import __builtin__ __builtin__._ = value except ImportError: # not Python 2 but Python 3 import builtins builtins._ = value pprint.pprint(value) sys.displayhook = my_displayhook #_. 
do for bpython what shell_plus from django-extensions does for ipython try: from django.core.management import setup_environ from django.conf import settings try: import settings setup_environ(settings) print("Sucessfully imported Django settings.") except ImportError: # non-standard place for config import config.settings setup_environ(config.settings) print("Sucessfully imported Django settings.") try: print("Attempting to import Django models:") from django.db.models.loading import get_models, get_apps for app in get_apps(): app_models = get_models(app) if not app_models: continue model_labels = ", ".join([model.__name__ for model in app_models]) try: exec("from %s import *" % app.__name__) print(" From '%s' load: %s" % (app.__name__.split('.')[-2], model_labels)) except Exception: print(" Not imported for '%s'" % app.__name__.split('.')[-2]) except ImportError: pass except ImportError: pass #_ emacs local variables # Local Variables: # mode: python # allout-layout: (0 : 0) # End: All right, now let us check if we wrote a good module or not: sa@wks:~$ pep8 .pythonrc # all good for pep8 (no output) sa@wks:~$ pylint --disable=F0401,W0703,W0122,W0611,C0301 .pythonrc # all good for pylint as well sa@wks:~$ echo; pylint --help-msg=F0401,W0703,W0122,W0611,C0301 # closer look of what we ignored :F0401: *Unable to import %r* Used when pylint has been unable to import a module. This message belongs to the imports checker. :W0703: *Catch "Exception"* Used when an except catches Exception instances. This message belongs to the exceptions checker. :W0122: *Use of the exec statement* Used when you use the "exec" statement, to discourage its usage. That doesn't mean you can not use it ! This message belongs to the basic checker. :W0611: *Unused import %s* Used when an imported module or variable is not used. This message belongs to the variables checker. :C0301: *Line too long (%s/%s)* Used when a line is longer than a given number of characters. This message belongs to the format checker. sa@wks:~$ pep8 tells us nothing (no output), meaning everything is fine. BPythonIs there a better Python Shell/Interpreter? Yes, yes, there is! There is iPython and then there is bpython which I have come to love. It is packaged with Debian sa@wks:~$ dpkg -l bpython | grep ii ii bpython 0.9.7.1-1 fancy interface to the Python interpreter - Curses frontend sa@wks:~$ There is also http://bpaste.net, a pastebin site. This for itself is no big deal. The fact that bpython can ship off its contents (what we typed) at the press of a button, right into bpaste.net, however is — I often use it to sketch things in a live interpreter session and then quickly show it to folks while we talk on IRC, maybe during debugging some code and stuff like that. There are a lot more goodies at our disposal like for example
Django support. Most of it can be configured in sa@wks:~$ cat .bpython/config | grep -v \# | grep . [general] auto_display_list = True syntax = True arg_spec = True hist_file = ~/.pythonhist hist_len = 5000 tab_length = 4 color_scheme = suno [keyboard] pastebin = F8 save = C-s And then there is of course a custom theme we might use sa@wks:~$ cat .bpython/suno.theme | grep -v \# | grep . [syntax] keyword = y name = W comment = w string = M error = r number = G operator = Y punctuation = y token = C [interface] background = d output = w main = w prompt = w prompt_more = w sa@wks:~$ The coolest thing about bpython is probably autocompletion, inline syntax highlighting, the fact that is shows us the expected parameter list as we type and last but not least, the possibility to rewind what we typed not just graphically but also internally i.e. the results of each such expression we typed. Below is a screenshot showing a few of the just mentioned things: Multiple Python VersionsCurrently (March 2011) we can run bpython with those Python versions: 2.4, 2.5, 2.6, 2.7 and 3. The default version is the one our Debian system links to: sa@wks:~$ type ll; ll $(which python) ll is aliased to `ls -lh -I "*\.pyc"' lrwxrwxrwx 1 root root 9 Jan 17 07:53 /usr/bin/python -> python2.6 sa@wks:~$ However, we can use others as well simply by creating an alias in our
sa@wks:~$ grep ', bpython' -A9 .bashrc ###_ , bpython # use whatever python is default plus consider environment alias bp='/usr/bin/env bpython' # try to force python2.6; does not consider environment alias bp2='$(which python2.6) -m bpython.cli' # try to force python2.6; does not consider environment alias bp3='$(which python3) -m bpython.cli' sa@wks:~$ The added benefit of bpython and Python 3Currently (March 2011) there is no bpython Debian package for Python 3, only for Python 2. That is no problem however, we can install from source:
bpython and VirtualenvIf we want to use bpython from within a virtual environment, then we need to do three things: install bpython into virtual environmentOne, we install bpython into each virtual environment using modify .pythonrcIf a ~/.pythonrc is used then we add a pound bang line at the top sa@wks:~$ head -n1 .pythonrc #!/usr/bin/env python sa@wks:~$ What happens now is that bpython Environment AwarenessLast but not least, we start bpython so that the current environment is taken into account. We can use a simple shell alias for that. bpython and DjangoUsually, being at the root of a Django project, which we created using
sa@wks:~/0/django/myproject$ python manage.py help shell | grep Runs Runs a Python interactive interpreter. Tries to use IPython, if it's available. sa@wks:~/0/django/myproject$ If however we want The rationale behind Note that whatever file Furthermore, we can also change the prompts Now that we have [skipping a lot of lines...] #_. do for bpython what shell_plus from django-extensions does for ipython try: from django.core.management import setup_environ from django.conf import settings try: import settings setup_environ(settings) print("Sucessfully imported Django settings.") except ImportError: import config.settings setup_environ(config.settings) print("Sucessfully imported Django settings.") try: print("Attempting to import Django models:") from django.db.models.loading import get_models, get_apps for app in get_apps(): app_models = get_models(app) if not app_models: continue model_labels = ", ".join([model.__name__ for model in app_models]) try: exec("from %s import *" % app.__name__) print(" From '%s' load: %s" % (app.__name__.split('.')[-2], model_labels)) except Exception: print(" Not imported for '%s'" % app.__name__.split('.')[-2]) except ImportError: pass except ImportError: pass [skipping a lot of lines...] With this in place bpython (or even just the ordinary Python interpreter) imports the Django environment for us. Let us now have a look at what it looks/feels like when we issue our bp alias from within an activated virtual environment which already has a Django project installed: (aa) sa@wks:~/0/python/projects/aa$ bp No startup file available. sys.path currently holds: ['', '/home/sa/0/1/aa/bin', '/home/sa/0/1/aa/lib/python2.6/site-packages/distribute-0.6.10-py2.6.egg', '/home/sa/0/1/aa/lib/python2.6/site-packages/pip-0.7.2-py2.6.egg', '/home/sa/.pip/source/django-authorizenet', '/home/sa/.pip/source/django-dbindexer', [skipping a lot of lines...] '/home/sa/0/python/projects/aa', '/home/sa/0/1/aa', '/home/sa/0/python/projects/aa/apps'] Sucessfully imported Django settings. Attempting to import Django models: From 'auth' load: Permission, Group, User, Message From 'contenttypes' load: ContentType From 'sessions' load: Session From 'sites' load: Site From 'polls' load: Poll, Choice From 'admin' load: LogEntry From 'socialregistration' load: FacebookProfile, TwitterProfile, OpenIDProfile, OpenIDStore, OpenIDNonce From 'featureflipper' load: Feature From 'reversion' load: Revision, Version From 'djcelery' load: TaskMeta, TaskSetMeta, IntervalSchedule, CrontabSchedule, PeriodicTasks, PeriodicTask, WorkerState, TaskState >>> We started bpython from within an activated virtual environment and it
gave us all kinds of goodies right out of the box — that is because
>>> settings.INSTALLED_APPS ['django_mongodb_engine', 'django.contrib.auth', 'django.contrib.contenttypes', 'django.contrib.sessions', 'django.contrib.sites', 'django.contrib.messages', 'polls', 'django.contrib.admin', [skipping a lot of lines...] 'dbindexer'] >>> sys.displayhook <function my_displayhook at 0x1c10380> >>> len(dir()) 104 >>> exit()
(aa) sa@wks:~/0/python/projects/aa$ cat apps/polls/models.py import datetime from django.db import models class Poll(models.Model): # real code would have docstrings question = models.CharField(max_length=200) pub_date = models.DateTimeField('date published') def __unicode__(self): return self.question def was_published_today(self): return self.pub_date.date() == datetime.date.today() was_published_today.short_description = 'Published today?' class Choice(models.Model): poll = models.ForeignKey(Poll) choice = models.CharField(max_length=200) votes = models.IntegerField() def __unicode__(self): return self.choice (aa) sa@wks:~/0/python/projects/aa$ Within our Django project we have created a Django application called
(aa) sa@wks:~/0/python/projects/aa$ bp No startup file available. sys.path currently holds: ['', '/home/sa/0/1/aa/bin', '/home/sa/0/1/aa/lib/python2.6/site-packages/distribute-0.6.10-py2.6.egg', [skipping a lot of lines...] From 'reversion' load: Revision, Version From 'djcelery' load: TaskMeta, TaskSetMeta, IntervalSchedule, CrontabSchedule, PeriodicTasks, PeriodicTask, WorkerState, TaskState >>> Poll.objects.all() [] >>> p = Poll(question="What's up?", pub_date=datetime.now()) >>> p.save() >>> p.id u'4d63beba4ed6db0e36000000' >>> type(p.id) <type 'unicode'> >>> p.question "What's up?" >>> p.pub_date datetime.datetime(2011, 2, 22, 7, 46, 47, 917063) >>> p.was_published_today() True >>> Nothing unusual there except for Startup FileLast but not least, we can use bpython's ability to save the current session to a file. This file is then used to load our former session into bpython again, effectively allowing us to resume our work where we left off before. The way this is accomplished is also by using ~/.pythonrc: [skipping a lot of lines...] #_. import saved bpython sessions if available try: from startup import * # do not do that in real code print("Successful import from startup.py.") except ImportError: print("No startup file available.") [skipping a lot of lines...] We can then use the >>> print("funky donkey at work") funky donkey at work >>> foo = range(4) >>> foo [0, 1, 2, 3] >>> [ here I used C+s to save the current session to startup.py... ] >>> exit() (aa) sa@wks:~/0/python/projects/aa$ cat startup.py print("funky donkey at work") # OUT: funky donkey at work foo = range(4) foo # OUT: [0, 1, 2, 3] (aa) sa@wks:~/0/python/projects/aa$ bp funky donkey at work Successful import from startup.py. sys.path currently holds: ['', '/home/sa/0/1/aa/bin', '/home/sa/0/1/aa/lib/python2.6/site-packages/distribute-0.6.10-py2.6.egg', [skipping a lot of lines...] From 'reversion' load: Revision, Version From 'djcelery' load: TaskMeta, TaskSetMeta, IntervalSchedule, CrontabSchedule, PeriodicTasks, PeriodicTask, WorkerState, TaskState >>> foo [0, 1, 2, 3] >>> Note how it now printed Virtualenv, VirtualenvwrapperThis one is all about gaining freedom — the kind of freedom that allows us to be creative, have fun and get things done quickly and in a straight forward and simple manner. So, what is it that virtualenv does in a nutshell? By using Virtual environments are isolated by default but we can also have
symlinks leaving it and going into our global Python context/space —
the installation-dependent default path and the global Python
interpreter. Note that this behavior changed in October 2011. Before that, virtual
environments were not isolated by default but were integrated with the
global Python context/space, and one had to use virtualenv's --no-site-packages switch to get an isolated environment.
As said, nowadays the default behavior is that those sandboxes are isolated from the rest of the system (no outgoing symlinks), meaning that by using a virtual environment, we can try out software, alter software, add/remove things, etc. — all without any danger of accidentally doing something stupid to our global Python context/space. This makes virtual environments the perfect tool for testbeds, staging
environments, versioned deployments... Installing and setting up VirtualenvInstalling virtualenv is easy. Debian provides a package for it sa@wks:~$ type dpl; dpl *virtualenv | grep ii dpl is aliased to `dpkg -l' ii python-virtualenv 1.6-4 Python virtual environment creator sa@wks:~$ virtualenv --version 1.6.4 sa@wks:~$ virtualenv --help Usage: virtualenv [OPTIONS] DEST_DIR Options: --version show program's version number and exit -h, --help show this help message and exit -v, --verbose Increase verbosity -q, --quiet Decrease verbosity -p PYTHON_EXE, --python=PYTHON_EXE The Python interpreter to use, e.g., --python=python2.5 will use the python2.5 interpreter to create the new environment. The default is the interpreter that virtualenv was installed with (/usr/bin/python) --clear Clear out the non-root install and start from scratch --system-site-packages Give access to the global site-packages dir to the virtual environment --unzip-setuptools Unzip Setuptools or Distribute when installing it --relocatable Make an EXISTING virtualenv environment relocatable. This fixes up scripts and makes all .pth files relative --distribute Use Distribute instead of Setuptools. Set environ variable VIRTUALENV_USE_DISTRIBUTE to make it the default --extra-search-dir=SEARCH_DIRS Directory to look for setuptools/distribute/pip distributions in. You can add any number of additional --extra-search-dir paths. --never-download Never download anything from the network. Instead, virtualenv will fail if local distributions of setuptools/distribute/pip are not present. --prompt==PROMPT Provides an alternative prompt prefix for this environment sa@wks:~$ Of course, one could also use Using VirtualenvBasically, what we need to know is how to create a new virtual environment (line 1), enter and activate it (lines 31 and 32), carry out some commands (e.g. line 33, looking what Python interpreter is currently active) and last but not least, switch back from the virtual environment into the global Python context/space (line 35) and yet again, look up the currently active Python interpreter (lines 36 and 37): 1 sa@wks:~/0/1$ virtualenv my_test_virt_env 2 New python executable in my_test_virt_env/bin/python 3 Installing distribute..................done. 4 Installing pip.....................done. 5 sa@wks:~/0/1$ type td; td my_test_virt_env/ 6 td is aliased to `tree --charset ascii -d -I \.git*\|*\.\~*\|*\.pyc' 7 my_test_virt_env/ 8 |-- bin 9 |-- include 10 | `-- python2.7 -> /usr/include/python2.7 11 |-- lib 12 | `-- python2.7 13 | |-- config -> /usr/lib/python2.7/config 14 | |-- distutils 15 | |-- encodings -> /usr/lib/python2.7/encodings 16 | |-- lib-dynload -> /usr/lib/python2.7/lib-dynload 17 | `-- site-packages 18 | |-- distribute-0.6.19-py2.7.egg 19 | | |-- EGG-INFO 20 | | `-- setuptools 21 | | |-- command 22 | | `-- tests 23 | `-- pip-1.0.2-py2.7.egg 24 | |-- EGG-INFO 25 | `-- pip 26 | |-- commands 27 | `-- vcs 28 `-- local -> /home/sa/0/1/my_test_virt_env 29 30 21 directories 31 sa@wks:~/0/1$ cd my_test_virt_env/ 32 sa@wks:~/0/1/my_test_virt_env$ source bin/activate 33 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env$ which python 34 /home/sa/0/1/my_test_virt_env/bin/python 35 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env$ deactivate 36 sa@wks:~/0/1/my_test_virt_env$ which python 37 /usr/bin/python 38 sa@wks:~/0/1/my_test_virt_env$ cd The whole point of using VirtualenvwrapperVirtualenvwrapper is a set of extensions to virtualenv. 
The extensions include wrappers for creating and deleting virtual environments and otherwise managing our development workflow, making it easier to work on more than one project at a time without introducing conflicts in their dependencies. Installing and activating virtualenvwrapper is easy — one might
either use Next we are going to address our 37 sa@wks:~$ grep -A4 ', virtualenvwrapper' .bashrc 38 ###_ , virtualenvwrapper 39 export WORKON_HOME=$HOME/0/1 40 alias cdveroots='cd $WORKON_HOME' 41 42 43 sa@wks:~$ source .bashrc; echo $WORKON_HOME 44 /home/sa/0/1 The important part here is with line 39 where we tell virtualenvwrapper where our virtual environments are going to live on the filesystem. With line 40 we also add an alias which is going to save us a lot of
time down the road since it always beams us back into Excellent! We are done installing and setting up virtualenv and virtualenvwrapper. More information can be found here, here and here. Usage Examples - CommandsWe have a bunch of commands that come with the sa@wks:~$ egrep '^[[:alpha:]]+.*\(\) {' /etc/bash_completion.d/virtualenvwrapper | grep -v _ | cut -f1 -d ' ' mkvirtualenv rmvirtualenv lsvirtualenv showvirtualenv workon add2virtualenv cdsitepackages cdvirtualenv lssitepackages toggleglobalsitepackages cpvirtualenv sa@wks:~$ Next I am going to provide a few examples about how to use some of the commands so folks can see how things work right away: 45 sa@wks:~$ workon 46 my_test_virt_env 47 sa@wks:~$ workon my_test_virt_env 48 (my_test_virt_env)sa@wks:~$ cdvirtualenv 49 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env$ cdveroot 50 (my_test_virt_env)sa@wks:~/0/1$ ll 51 total 4.0K 52 drwxr-xr-x 5 sa sa 4.0K Sep 1 13:46 my_test_virt_env 53 (my_test_virt_env)sa@wks:~/0/1$ cd /tmp 54 (my_test_virt_env)sa@wks:/tmp$ cdvirtualenv 55 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env$ pwd 56 /home/sa/0/1/my_test_virt_env 57 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env$ echo $VIRTUALENVWRAPPER_HOOK_DIR 58 /home/sa/.virtualenvs 59 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env$ ll $VIRTUALENVWRAPPER_HOOK_DIR 60 total 44K 61 -rwxrwxr-x 1 sa sa 106 May 11 10:18 get_env_details 62 -rw-r--r-- 1 sa sa 3.7K Sep 1 14:15 hook.log 63 -rwxrwxr-x 1 sa sa 92 May 11 10:18 initialize 64 -rwxr-xr-x 1 sa sa 1.3K Sep 1 14:06 postactivate 65 -rwxrwxr-x 1 sa sa 71 May 11 10:18 postdeactivate 66 -rwxr-xr-x 1 sa sa 122 Sep 1 13:37 postmkvirtualenv 67 -rwxrwxr-x 1 sa sa 63 May 11 10:18 postrmvirtualenv 68 -rwxrwxr-x 1 sa sa 70 May 11 10:18 preactivate 69 -rwxrwxr-x 1 sa sa 72 May 11 10:18 predeactivate 70 -rwxrwxr-x 1 sa sa 94 May 11 10:18 premkvirtualenv 71 -rwxrwxr-x 1 sa sa 64 May 11 10:18 prermvirtualenv 72 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env$ The command reference list all commands available. My favorite is
probably 73 (my_test_virt_env)sa@wks:~/0/1$ deactivate 74 sa@wks:~/0/1$ mkvirtualenv test 75 New python executable in test/bin/python 76 Installing distribute....................................................................................................................................................................................done. 77 virtualenvwrapper.user_scripts creating /home/sa/0/1/test/bin/predeactivate 78 virtualenvwrapper.user_scripts creating /home/sa/0/1/test/bin/postdeactivate 79 virtualenvwrapper.user_scripts creating /home/sa/0/1/test/bin/preactivate 80 virtualenvwrapper.user_scripts creating /home/sa/0/1/test/bin/postactivate 81 virtualenvwrapper.user_scripts creating /home/sa/0/1/test/bin/get_env_details 82 (test)sa@wks:~/0/1$ workon 83 my_test_virt_env 84 test Line 74 shows how easy it is to create a new virtual environment using mkvirtualenv.
We now have two virtual environments (lines 83 and 84) which can be
listed using workon (line 82). 85 (test)sa@wks:~/0/1$ workon my_test_virt_env 86 (my_test_virt_env)sa@wks:~/0/1$ cdvirtualenv 87 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env$ rmvirtualenv my_test_virt_env 88 ERROR: You cannot remove the active environment ('my_test_virt_env'). 89 Either switch to another environment, or run 'deactivate'. 90 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env$ rmvirtualenv test 91 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env$ workon 92 my_test_virt_env Lines 85 to 92 show a few things about deleting a virtual environment.
As we can see from lines 87 to 89, deleting/removing the currently
active virtual environment does not work — this is a safety switch
provided by virtualenvwrapper. As lines 90 to 92 show, our previously
created virtual environment 93 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env$ cd 94 (my_test_virt_env)sa@wks:~$ cdvirtualenv bin 95 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env/bin$ pwd 96 /home/sa/0/1/my_test_virt_env/bin 97 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env/bin$ 98 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env/bin$ cdsitepackages 99 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env/lib/python2.6/site-packages$ pwd 100 /home/sa/0/1/my_test_virt_env/lib/python2.6/site-packages Since I am such a fan of
101 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env/lib/python2.6/site-packages$ cd 102 (my_test_virt_env)sa@wks:~$ lssitepackages 103 distribute-0.6.15-py2.6.egg easy-install.pth pip-1.0-py2.6.egg setuptools.pth 104 (my_test_virt_env)sa@wks:~$ lssitepackages -l 105 total 16 106 drwxr-xr-x 4 sa sa 4096 Sep 1 14:07 distribute-0.6.15-py2.6.egg 107 -rw-r--r-- 1 sa sa 235 Sep 1 13:46 easy-install.pth 108 drwxr-xr-x 4 sa sa 4096 Sep 1 14:07 pip-1.0-py2.6.egg 109 -rw-r--r-- 1 sa sa 30 Sep 1 14:07 setuptools.pth 110 (my_test_virt_env)sa@wks:~$ cd /tmp However, what if we just wanted to know its contents without visiting
The last command we are going to take a look at is 111 (my_test_virt_env)sa@wks:/tmp$ git clone git://github.com/pinax/pinax.git 112 Cloning into pinax... 113 remote: Counting objects: 40935, done. 114 remote: Compressing objects: 100% (13975/13975), done. 115 remote: Total 40935 (delta 23793), reused 39744 (delta 22788) 116 Receiving objects: 100% (40935/40935), 15.18 MiB | 969 KiB/s, done. 117 Resolving deltas: 100% (23793/23793), done. 118 (my_test_virt_env)sa@wks:/tmp$ cdvirtualenv 119 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env$ python 120 Python 2.6.7 (r267:88850, Aug 3 2011, 11:33:52) 121 [GCC 4.6.1] on linux2 122 Type "help", "copyright", "credits" or "license" for more information. 123 No startup.py file available. 124 sys.path currently holds: 125 ['', 126 '/home/sa/0/1/my_test_virt_env/lib/python2.6/site-packages/distribute-0.6.15-py2.6.egg', 127 '/home/sa/0/1/my_test_virt_env/lib/python2.6/site-packages/pip-1.0-py2.6.egg', 128 '/home/sa/0/1/my_test_virt_env/lib/python2.6', 129 '/home/sa/0/1/my_test_virt_env/lib/python2.6/plat-linux2', 130 '/home/sa/0/1/my_test_virt_env/lib/python2.6/lib-tk', 131 '/home/sa/0/1/my_test_virt_env/lib/python2.6/lib-old', 132 '/home/sa/0/1/my_test_virt_env/lib/python2.6/lib-dynload', 133 '/usr/lib/python2.6', 134 '/usr/lib/python2.6/plat-linux2', 135 '/usr/lib/python2.6/lib-tk', 136 '/home/sa/0/1/my_test_virt_env/lib/python2.6/site-packages'] 137 >>> 138 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env$ cdsitepackages 139 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env/lib/python2.6/site-packages$ type pi; pi pth 140 pi is aliased to `ls -la | grep' 141 -rw-r--r-- 1 sa sa 235 Sep 1 13:46 easy-install.pth 142 -rw-r--r-- 1 sa sa 30 Sep 1 14:07 setuptools.pth 143 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env/lib/python2.6/site-packages$ add2virtualenv /tmp/pinax/ 144 Warning: Converting "/tmp/pinax/" to "/tmp/pinax" 145 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env/lib/python2.6/site-packages$ pi pth 146 -rw-r--r-- 1 sa sa 235 Sep 1 13:46 easy-install.pth 147 -rw-r--r-- 1 sa sa 30 Sep 1 14:07 setuptools.pth 148 -rw-r--r-- 1 sa sa 11 Sep 1 14:41 virtualenv_path_extensions.pth 149 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env/lib/python2.6/site-packages$ cat virtualenv_path_extensions.pth 150 /tmp/pinax 151 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env/lib/python2.6/site-packages$ python 152 Python 2.6.7 (r267:88850, Aug 3 2011, 11:33:52) 153 [GCC 4.6.1] on linux2 154 Type "help", "copyright", "credits" or "license" for more information. 155 No startup.py file available. 156 sys.path currently holds: 157 ['', 158 '/home/sa/0/1/my_test_virt_env/lib/python2.6/site-packages/distribute-0.6.15-py2.6.egg', 159 '/home/sa/0/1/my_test_virt_env/lib/python2.6/site-packages/pip-1.0-py2.6.egg', 160 '/home/sa/0/1/my_test_virt_env/lib/python2.6', 161 '/home/sa/0/1/my_test_virt_env/lib/python2.6/plat-linux2', 162 '/home/sa/0/1/my_test_virt_env/lib/python2.6/lib-tk', 163 '/home/sa/0/1/my_test_virt_env/lib/python2.6/lib-old', 164 '/home/sa/0/1/my_test_virt_env/lib/python2.6/lib-dynload', 165 '/usr/lib/python2.6', 166 '/usr/lib/python2.6/plat-linux2', 167 '/usr/lib/python2.6/lib-tk', 168 '/home/sa/0/1/my_test_virt_env/lib/python2.6/site-packages', 169 '/tmp/pinax'] 170 >>> 171 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env/lib/python2.6/site-packages$ cdvirtualenv 172 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env$ add2virtualenv 173 Usage: add2virtualenv dir [dir ...] 
174 175 Existing paths: 176 /tmp/pinax 177 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env$ python -c 'import pinax; print(pinax.VERSION)' 178 (0, 9, 0, 'a', 2) 179 (my_test_virt_env)sa@wks:~/0/1/my_test_virt_env$ deactivate; cd $VIRTUALENVWRAPPER_HOOK_DIR With this example we first clone (read download) Pinax source code
into /tmp. The important thing is line 143, which adds the cloned /tmp/pinax directory to the virtual environment's sys.path by recording it in virtualenv_path_extensions.pth (lines 145 to 150) — that is why the import on line 177 works.
Usage Examples - HooksVirtualenvwrapper provides hooks that can be used to carry out actions at certain times depending on the work we do with regards to our virtual environments. There are two types of hooks. Global hooks (lines 61 to 71) which live
in $VIRTUALENVWRAPPER_HOOK_DIR (lines 57 and 58 show that this is /home/sa/.virtualenvs here). Secondly, there are per virtual environment hooks which live in each environment's bin directory (lines 77 to 81 show them being created for the test environment).
While we have a bunch of global hooks, currently (September 2011)
there are only five per virtual environment hooks, namely preactivate, postactivate, predeactivate, postdeactivate and get_env_details (lines 77 to 81). Hooks are either sourced (allowing them to modify our shell
environment e.g. change the color of our shell prompt) or run as an
external program (e.g. As an example, we are going to add a little color in order to make it easier for us to distinguish whether we are using a virtual environment or whether we are acting within the global Python context/space of our operating system. 180 sa@wks:~/.virtualenvs$ ll 181 total 48K 182 -rwxrwxr-x 1 sa sa 106 May 11 10:18 get_env_details 183 -rw-r--r-- 1 sa sa 5.1K Sep 1 15:26 hook.log 184 -rwxrwxr-x 1 sa sa 92 May 11 10:18 initialize 185 -rwxr-xr-x 1 sa sa 1.4K Sep 1 14:22 postactivate 186 -rwxrwxr-x 1 sa sa 71 May 11 10:18 postdeactivate 187 -rwxr-xr-x 1 sa sa 123 Sep 1 14:22 postmkvirtualenv 188 -rwxrwxr-x 1 sa sa 63 May 11 10:18 postrmvirtualenv 189 -rwxrwxr-x 1 sa sa 70 May 11 10:18 preactivate 190 -rwxrwxr-x 1 sa sa 72 May 11 10:18 predeactivate 191 -rwxrwxr-x 1 sa sa 94 May 11 10:18 premkvirtualenv 192 -rwxrwxr-x 1 sa sa 64 May 11 10:18 prermvirtualenv 193 sa@wks:~/.virtualenvs$ cat postactivate 194 #!/bin/bash 195 # This hook is run after every virtualenv is activated. 196 sa@wks:~/.virtualenvs$ workon my_test_virt_env 197 (my_test_virt_env)sa@wks:~/.virtualenvs$ deactivate 198 199 200 [ here we edit postactivate... ] 201 202 203 sa@wks:~/.virtualenvs$ cat postactivate 204 #!/bin/bash 205 # This hook is run after every virtualenv is activated. 206 207 PS1="\[\033[01;33m\]($(basename $VIRTUAL_ENV))\[\033[00m\] $_OLD_VIRTUAL_PS1" 208 cd $VIRTUAL_ENV 209 sa@wks:~/.virtualenvs$ workon my_test_virt_env 210 (my_test_virt_env) sa@wks:~/0/1/my_test_virt_env$
The currently active virtual environment is now yellow plus we got a blank in between the yellow colored virtual environment and our default prompt. Another very handy thing I would recommend putting into Since PIP is used by pretty much anybody these days, here is how we make virtualenv/virtualenvwrapper and PIP complement each other nicely, thus providing for enhanced user experience: sa@wks:~$ grep -A8 '\. pip' .bashrc ###_ . pip export PIP_DOWNLOAD_CACHE=$HOME/.pip/cache export PIP_SOURCE_DIR=$HOME/.pip/source export PIP_BUILD_DIR=$HOME/.pip/build export PIP_VIRTUALENV_BASE=$WORKON_HOME export PIP_REQUIRE_VIRTUALENV=true sa@wks:~$ This makes PIP use a local cache directory, thus save us time and bandwidth and also allows us to work offline to some degree e.g. while on a plane without Internet connectivity. We also have a directory to store source code downloaded with PIP (which is what we use when we decide to use so-called editable packages e.g. those are taken directly from some GIT repository which updates we want to follow). Of course, in some cases we need to compile/build source code so we also have a build directory. Next it detects an active virtual environment and install to it,
without having to pass the target environment explicitly. Django EnvironmentThis one I love! So far we have seen how to create virtual environments. We even use virtualenvwrapper to make it a joy to work with those virtual environments. We create all kinds of Python projects atop/inside those virtual
environments e.g. we can test our code with different Python versions
like for example 2.7 and 3.2 by simply switching back and forth with workon.
However, while creating virtual environments is now easy and fast
( Virtualenv-CommandsSo far I am not using it but from what I have seen it is pretty cool too. Please go here and here for more information. Detect a VirtualenvEven though the idea behind a virtual environment is to be transparent, there might be cases when we want to detect whether or not we are running inside one. Here is how: import sys and then check for the attribute virtualenv adds to its interpreters — if hasattr(sys, 'real_prefix'): print("We are inside a virtualenv."). This works because virtualenv's site.py stores the prefix of the real (global) interpreter in sys.real_prefix, an attribute a plain interpreter does not have. Virtualenv + GIT + Bash PromptWe already know how to display information related to the current virtual environment inside our Bash prompt. If, in addition, we want to have GIT related information as well, we can do so easily. Here is what it looks like:
PIP... the Python installation tool PIP is an acronym for pip installs packages, while technically the tool installs distributions, the name package is used as its meaning is more widely understood. Even the site where distributions are distributed at is called the Python Package Index rather than Python Distribution Index. WRITEME Bash CompletionIn order to have Bash completion with PIP, here is what we do:
sa@wks:~$ grep -A7 "^_pip_completion()" .bashrc _pip_completion() { COMPREPLY=( $( COMP_WORDS="${COMP_WORDS[*]}" \ COMP_CWORD=$COMP_CWORD \ PIP_AUTO_COMPLETE=1 $1 ) ) } complete -o default -F _pip_completion pip sa@wks:~$ A quick Editable PackagesThose are packages/applications we install using Distutils, Setuptools, DistributeOne should have read about the history of packaging before continuing here.
DistributeThis is the one we recommend to use. It is the successor to Setuptools, which itself has always been considered the better Distutils.
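To make this a bit more tangible, here is a minimal sketch of a setup.py as one would typically write it against Distribute/Setuptools — the project name, version and dependency below are made-up placeholders, not something taken from this page:
# setup.py -- minimal packaging sketch; all metadata below is hypothetical
from setuptools import setup, find_packages   # Distribute installs itself under the name "setuptools"

setup(
    name="myproject",                  # made-up project name
    version="0.1.0",
    description="Short description of what myproject does.",
    packages=find_packages(),          # pick up every package below this directory
    install_requires=["requests"],     # made-up runtime dependency
)
Once such a file exists, python setup.py sdist builds a source distribution and pip install -e . gives us an editable install inside the currently active virtual environment.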
GNU EmacsSince GNU Emacs is my weapon of choice for pretty much any battle these days, I would like to honor my good fellow by explicitly telling a bit how I made the out of the box setup which Emacs provides for Python programming even more cosy ;-] TheoryBecause sometimes knowing how to fly a plane is not enough but rather, we need to know the physics involved and maybe even how to design the engine. Call by...value? reference? Neither one is what Python does! Let us first have a look at what those two evaluation schemes are and then how Python differs. Call-by-ValueCall-by-value evaluation (also known as pass-by-value) is the most common evaluation strategy, used in languages as different as C/C++ and Scheme. C++ defaults to call-by-value evaluation but offers to use call-by-reference where/when needed/desired. In call-by-value, the argument expression is evaluated, and the resulting value is bound to the corresponding variable in the function (frequently by copying the value into a new memory region) — a pictures example of this can be seen here. If the function or method is able to assign values to its parameters, only its local copy is assigned i.e. everything passed into a function call is unchanged in the caller's scope when the function returns. Call-by-ReferenceIn call-by-reference evaluation (also known as pass-by-reference), a function receives an implicit reference to the argument, rather than a copy of its value. This means that a function or method can modify the argument and thus the value in the caller's scope. Call-by-reference therefore has the advantage of greater time- and space-efficiency (values do not need to be copied in memory), as well as the potential for greater communication between a function/method and its caller (the function/method can return information using its reference arguments), but the disadvantage that a function must often take special steps to protect values it wishes to pass to other functions. Perl for example defaults to call-by-reference whereas others such as C++ default to call-by-value but offer means to use call-by-reference. Call-by-SharingWhat Python does is called call-by-sharing also known as call-by-object-reference. So how is this different from call-by-value and call-by-reference? Well, it is both, somewhat...
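Before we look at why this confuses people, here is a minimal sketch of call-by-sharing in action — mutating the object a parameter refers to is visible to the caller, while rebinding the parameter name is not (function and variable names are made up for illustration):
def append_item(a_list):
    a_list.append(4)      # mutates the shared object; the caller sees this

def rebind(a_list):
    a_list = [0]          # rebinds the local name only; the caller is unaffected

items = [1, 2, 3]
append_item(items)
print(items)              # [1, 2, 3, 4] -> the shared object was mutated
rebind(items)
print(items)              # [1, 2, 3, 4] -> rebinding inside the function had no effect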
The reason why people often confuse what Python does (call-by-sharing) with call-by-value and/or call-by-reference may be due to the fact that, in Python, the value of a name is a reference to an object i.e. we always pass the value (no implicit copying), and that value is always a reference — call-by-object-reference... Since Python is a dynamically typed language, Python values (actually objects — in Python everything is an object, remember?), not variables, carry type. This has implications for many aspects of the way the language behaves e.g. the way default parameter values behave. All variables in Python hold references to objects, and these references are passed to functions/methods. A function/method cannot change the value a variable references in its calling function. Type SystemBefore we start, there are some terms we need to know:
Python has a dynamic type system. However, despite being dynamically typed, Python is also strongly typed: operations that are not well-defined are forbidden — "1" + 1, for example, raises a TypeError instead of silently coercing one of the operands. Being a dynamically typed language means name resolution happens through dynamic binding, also known as late binding, i.e. name resolution happens at run time. In other words, Python binds method and variable names during program execution rather than at compile time. Functional ProgrammingWRITEME Functions in Python are so-called first-class objects. Inversion of Control
IntrospectionIntrospection is source code looking at source code e.g. other modules and functions in memory as objects, getting information about them, and manipulating them. In general, in computing, type introspection is a capability of some object-oriented programming languages to determine the type of an object at run time. WRITEME Aspect Oriented ProgrammingWRITEME Metaprogramming
MetaclassPlease go here. Abstract Class, Abstract SuperclassPlease go here for more information. ReflectionLazy vs Greedy EvaluationWRITEME ProtocolParallelismWRITEME
Global Interpreter LockA GIL (Global Interpreter Lock) is a mutual exclusion lock held by an interpreter thread; its purpose is to protect interpreter internals that are not thread-safe from being used by several threads at once. There is always exactly one GIL per interpreter process. The problem is that, while this gives better performance on single-core machines, it fails to scale on multiprocessor machines. Even though we are stuck with the GIL in CPython, there are projects exploring ways around it — Unladen Swallow and Stackless Python are often mentioned in this context. Last but not least, even with CPython the GIL might not be as much of a problem as many think, since there are several ways to achieve parallelism with CPython nonetheless (the multiprocessing module, for example). Design Patterns
A design pattern is a general reusable solution to a commonly occurring problem in software design. This is a huge subject, ready to fill bookshelves on its own. This subsection will look at design patterns at code-level with regards to Python i.e. not all known design patterns exist in Python or make sense when programming in Python (this is true for any other language as well). Design patterns can be divided into several categories: Creational Patterns, Structural Patterns, Behavioral Patterns and Concurrency Patterns. They are described using the concepts of delegation, aggregation, and consultation. There also exists another classification that has the notion of architectural design patterns which may be applied at the architecture level of the software such as the MVC (Model-View-Controller) pattern. This high-level view on design patterns is not covered here. WRITEME Creational PatternCreational design patterns are design patterns that deal with object creation mechanisms, trying to create objects in a manner suitable to the situation. The basic form of object creation could result in design problems or added complexity to the design. Creational design patterns solve this problem by somehow controlling this object creation. Singleton / BorgAny type of which only one instance exists at all times is called a singleton. One example of a singleton is None.
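A minimal sketch of both flavors as they are commonly written in Python — a classic singleton via __new__ and the Borg variant, where instances are distinct but share their state (class names are made up for illustration):
class Singleton:
    _instance = None
    def __new__(cls, *args, **kwargs):
        if cls._instance is None:            # lazily create the one and only instance
            cls._instance = super().__new__(cls)
        return cls._instance

class Borg:
    _shared_state = {}
    def __init__(self):
        self.__dict__ = self._shared_state   # every instance shares one attribute dict

a, b = Singleton(), Singleton()
print(a is b)             # True -> same object
c, d = Borg(), Borg()
c.answer = 42
print(d.answer, c is d)   # 42 False -> distinct objects, shared state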
FactoryMixinGenerally, in OOP (Object-Oriented Programming) languages, a mixin is a class/type that provides a certain functionality to be inherited or just reused by a subclass/subtype, while not meant for instantiation. Inheriting from a mixin is not a form of specialization but is rather a means of assembling features/functionality, something that can also be achieved through composition (which in fact is mostly the better way to assemble features/functionality). A class/type may inherit most or all of its features/functionality from one or more mixins through multiple inheritance. Of course, only programming languages that support multiple inheritance allow us to do mixins but that is implicit to the cause. Let me just make a short note on mixins in Ruby:
So what is the difference between a mixin and multiple inheritance? Is it just a matter of semantics and use? Yes, the difference between a mixin and multiple inheritance is in fact just a matter of semantics i.e. if we create a class/type using multiple inheritance then we might as well utilize mixins by subclassing from them, thus assembling features/functionality in our class/type. We ultimately end up with the assembly of features/functionality contained in all mixins and everything we put into this particular class/type itself. Mixins encourage the DRY (Don't repeat yourself) principle (code reuse) because they can be used in two ways:
We are now going to use the socketserver module and its TCPServer base class/type. Additionally, there are two mixin classes/types, ThreadingMixIn and ForkingMixIn. Clearly, the functionality to create a new thread or fork a process is
not terribly useful as a stand-alone class/type but very much so when
used as a mixin. Also, just to mention/demonstrate it, there is a ready-made combination of the two, socketserver.ThreadingTCPServer:
>>> import socketserver >>> socketserver.__all__ ['TCPServer', # the base class/type we are going to use 'UDPServer', 'ForkingUDPServer', 'ForkingTCPServer', 'ThreadingUDPServer', 'ThreadingTCPServer', # ready-made class/type 'BaseRequestHandler', 'StreamRequestHandler', 'DatagramRequestHandler', 'ThreadingMixIn', # the threading mixin, the mixin we chose 'ForkingMixIn', 'UnixStreamServer', 'UnixDatagramServer', 'ThreadingUnixStreamServer', 'ThreadingUnixDatagramServer'] Let us have a look at the ready-made class/type first — its superclass/supertype as well as its MRO (Method Resolution Order): >>> socketserver.ThreadingTCPServer.__bases__ (<class 'socketserver.ThreadingMixIn'>, <class 'socketserver.TCPServer'>) >>> socketserver.ThreadingTCPServer.mro() [<class 'socketserver.ThreadingTCPServer'>, <class 'socketserver.ThreadingMixIn'>, <class 'socketserver.TCPServer'>, <class 'socketserver.BaseServer'>, <class 'object'>] >>> class Bar(socketserver.ThreadingTCPServer): ... pass ... ... >>> Bar.__bases__ (<class 'socketserver.ThreadingTCPServer'>,) >>> Bar.mro() [<class '__main__.Bar'>, <class 'socketserver.ThreadingTCPServer'>, <class 'socketserver.ThreadingMixIn'>, <class 'socketserver.TCPServer'>, <class 'socketserver.BaseServer'>, <class 'object'>] Now with the extra manual step of building a class/type that is
semantically the same as >>> class Foo(socketserver.ThreadingMixIn, socketserver.TCPServer): ... pass ... ... >>> Foo.__bases__ (<class 'socketserver.ThreadingMixIn'>, <class 'socketserver.TCPServer'>) >>> Foo.mro() [<class '__main__.Foo'>, <class 'socketserver.ThreadingMixIn'>, <class 'socketserver.TCPServer'>, <class 'socketserver.BaseServer'>, <class 'object'>] >>> Marvelous! TraitTraits are a simple composition mechanism for structuring object-oriented programs. A trait is essentially a parameterized set of methods which serves as a behavioral building block for classes/types and is the most basic unit of code reuse. With traits, classes/types are still organized in a single inheritance hierarchy, but they can make use of traits to specify the incremental difference in behavior with respect to their superclasses/supertypes. Unlike mixins and multiple inheritance, traits do not employ inheritance as the composition operator. Instead, trait composition is based on a set of composition operators that are complementary to single inheritance and result in better composition properties. In short: A trait is a bunch of methods and attributes with the following characteristics:
Characteristics from 4 to 7 are the distinguishing characteristics of traits with respect to multiple inheritance and mixins. In particular, because of 4 and 5, all the complications with the MRO (Method Resolution Order) disappear and the overriding is never implicit. Property 6 is rather unusual — typically in Python the superclass/supertype has precedence over mixin classes. Property 7 should be understood in the sense that a trait implementation must provide introspection facilities to make the transition between classes viewed as atomic entities and classes viewed as composed entities seamless. Resource Acquisition Is Initialization
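Python has no deterministic destructors in the C++ sense, so the closest idiomatic relative of RAII is the context management protocol: acquire the resource in __enter__, release it in __exit__, and let the with statement guarantee the release. A minimal sketch (class and file name are made up):
class ManagedFile:
    def __init__(self, path):
        self.path = path
    def __enter__(self):
        self.handle = open(self.path, encoding='utf-8')   # acquire the resource
        return self.handle
    def __exit__(self, exc_type, exc_value, traceback):
        self.handle.close()                               # release it, even if an exception was raised
        return False                                      # do not suppress exceptions

with ManagedFile('/tmp/myfile.txt') as f:
    print(f.read())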
Structural PatternStructural design patterns are design patterns that ease the design by identifying a simple way to realize relationships between entities. FacadeProxyAdapterWhat an adapter does is wrap a class/type so that it exposes the interface client code expects while leaving the wrapped class/type untouched. DecoratorSee decorator and decorator vs adapter. Behavioral PatternBehavioral design patterns are design patterns that identify common communication patterns between objects and realize these patterns. By doing so, these patterns increase flexibility in carrying out this communication. State Pattern
Delegation
Strategy
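Because functions are first-class objects in Python, the Strategy pattern often needs no class hierarchy at all — the behavior to vary is simply passed in as a callable. A minimal sketch with made-up names:
def ascending(items):
    return sorted(items)

def descending(items):
    return sorted(items, reverse=True)

def show(items, strategy=ascending):   # the strategy is just a callable we can swap out
    for item in strategy(items):
        print(item)

show([3, 1, 2])              # 1 2 3
show([3, 1, 2], descending)  # 3 2 1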
Chain of Responsibility
ObserverVisitorTemplateConcurrency PatternConcurrency patterns are those types of design patterns that deal with multi-threaded programming paradigm. Actor Model
MiscellaneousThis section provides miscellaneous information within regards to Python. Manual InstallHere is how we manually install Python from trunk/HEAD/tip (or
whatever one may call it; read most-current or up-to-date) on Debian
at a filesystem location of our choosing (it usually installs to /usr/local, which is why we pass --prefix to ./configure below):
sa@wks:~$ su Password: wks:/home/sa# lsb_release -a No LSB modules are available. Distributor ID: Debian Description: Debian GNU/Linux unstable (sid) Release: unstable Codename: sid wks:/home/sa# cd /tmp; aptitude install git mercurial build-essential zlib1g-dev libreadline-dev libncursesw5-dev libncurses5-dev libsqlite3-dev mime-support libbz2-dev [skipping a lot of lines...] wks:/tmp# hg clone http://hg.python.org/cpython # here we clone the Python Mercurial repository destination directory: cpython # once cloned "hg pull -u" would pull in new changes requesting all changes adding changesets adding manifests adding file changes added 73394 changesets with 164534 changes to 9371 files (+1 heads) updating to branch default 3716 files updated, 0 files merged, 0 files removed, 0 files unresolved wks:/tmp# date -u; cd cpython Sun Nov 6 01:33:06 UTC 2011 wks:/tmp/cpython# ./configure --prefix=/tmp/python-$(date +%s) # let us put a timestamp here checking for hg... found checking for --enable-universalsdk... no checking for --with-universal-archs... 32-bit [skipping a lot of lines...] creating Modules/Setup.local creating Makefile wks:/tmp/cpython# make && make install gcc -pthread -c -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o Modules/python.o ./Modules/python.c gcc -pthread -c -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I. -I./Include -DPy_BUILD_CORE -o Parser/acceler.o Parser/acceler.c [skipping a lot of lines...] (cd /tmp/python-1320543270/bin; ln -s 2to3-3.3 2to3) rm -f /tmp/python-1320543270/bin/pysetup3 (cd /tmp/python-1320543270/bin; ln -s pysetup3.3 pysetup3) wks:/tmp/cpython# cd .. wks:/tmp# cd python-1320543270/ wks:/tmp/python-1320543270# date -d @1320543270 Sun Nov 6 01:34:30 GMT 2011 wks:/tmp/python-1320543270# date -d @$(date +%s) Sun Nov 6 01:40:02 GMT 2011 # compiling took roughly six minutes wks:/tmp/python-1320543270# ./bin/python3 -c 'import sys; print(sys.version)' 3.3.0a0 (default:992ba03d60a8, Nov 6 2011, 01:37:15) [GCC 4.6.2] wks:/tmp/python-1320543270# type la; la la is aliased to `ls -la' total 24 drwxr-xr-x 6 root root 4096 Nov 6 01:38 . drwxrwxrwt 41 root root 4096 Nov 6 01:45 .. drwxr-xr-x 2 root root 4096 Nov 6 01:38 bin drwxr-xr-x 3 root root 4096 Nov 6 01:38 include drwxr-xr-x 4 root root 4096 Nov 6 01:38 lib drwxr-xr-x 3 root root 4096 Nov 6 01:38 share wks:/tmp/python-1320543270# echo 'that is it... we just build ourselves a bleeding-edge up-to-date Python :)' that is it... we just build ourselves a bleeding-edge up-to-date Python :) wks:/tmp/python-1320543270# DebianThis subsection is intended to cover Debian specifics with regards to Python. ../dist-packagesBefore we actually answer that, let us have a look at the big picture of having public and private installations of Python modules and packages. Let us also have a glance at the difference about the main Python installations (also known as global Python context/space) and virtual environments: By default Python modules/packages are searched in the current working
directory first, next in the directories listed in the PYTHONPATH
environment variable and finally all directories listed in the
The full truth is that Python initializes That said, there are generally three ways to install Python modules/packages — there are public ones and private ones with regards to the systems main Python installation (also known as global Python context/space) and then there are virtual environments which are either clones of the global Python context/space or which are entirely separated Python contexts/spaces on their own:
Right now we are only looking at the global Python context/space and leave aside virtual environments. We are also just looking at the public modules/packages subset and not how to handle private modules/packages within the global Python context/space. Finally, why Debian has ../dist-packages directories:The installation location for Python code packaged by Debian is the
system Python modules directory, /usr/lib/pythonX.Y/dist-packages. Tools used for packaging Python source code for Debian like
In case we are on Python 2.6 or later and do not use APT but some
other means (e.g. EasyInstall, PIP, etc.) to install public Python
code, it ends up in /usr/local/lib/pythonX.Y/dist-packages instead, safely apart from the files managed by APT. When binary packages ship identical source code for multiple Python
versions, for instance SummaryThe below is true for Python 2.6 and later
Note the difference between Debian's dist-packages directories and the site-packages directory which a self-compiled Python (e.g. one installed under /usr/local) keeps using. Monkey PatchingA monkey patch is a way to extend and/or modify the run-time code of dynamic languages such as Smalltalk, JavaScript, Objective-C, Ruby, Perl, Python, Groovy, etc. without altering its on-disk source code. In Python, the term monkey patch refers to dynamic modification of a class/type at run time, usually with the intent to patch existing methods in an external class as a workaround for a bug or for behavior which does not act as we desire. Examples of using monkey patching are
In general however it is fair to say that one should refrain from monkey patching since it mostly introduces more problems than it solves.
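Just to make the mechanism (and its reach) visible, here is a minimal sketch with made-up names — at run time we replace a method on an existing class without ever touching its on-disk source:
class Greeter:
    def greet(self):
        return "hello"

def shout(self):            # our replacement behavior
    return "HELLO!"

Greeter.greet = shout       # the monkey patch: rebind the attribute at run time

print(Greeter().greet())    # HELLO! -> every existing and future instance is affected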
Even if monkey patching is not used, many people see a problem with the mere availability of the feature, since the ability to use monkey patching in a programming language is incompatible with enforcing strong encapsulation between objects, as required by the object-capability model. Bottom line is, one should not use monkey patching for the aforementioned reasons and many more... Magic NumberThe magic number concept comes from Unix-like operating systems where the first few
bytes of a file hold a marker indicating the file type. It tells the
operating system whether or not the file is a binary executable, and
if so, which of several types thereof. The 1 sa@wks:/tmp$ echo "some text" > mytextfile 2 sa@wks:/tmp$ file my* 3 myimage.png: PNG image data, 2560 x 1600, 8-bit/color RGB, non-interlaced 4 mytextfile: ASCII text 5 sa@wks:/tmp$ mv myimage.png myblabla.txt 6 sa@wks:/tmp$ file myblabla.txt 7 myblabla.txt: PNG image data, 2560 x 1600, 8-bit/color RGB, non-interlaced 8 sa@wks:/tmp$ File extensions and name do not matter, only the magic number matters as can be seen from lines 3 and 7, where we are actually looking at the same file which only happens to have different names and file extensions. So, what does all this have to do with Python one might ask? Well,
Python puts a similar marker into its bytecode (.pyc) files. The Python interpreter then makes sure this number is correct when loading them. Anything that damages this magic number will cause a problem like for example Traceback (most recent call last): File "amman.001", line 3, in <module> ImportError: Bad magic number in..........amman.pyc Things that could cause such damage include things like editing the
If they are our own .pyc files we can simply delete them and let Python recreate them from the corresponding .py sources. However, if they are not ours, we will have to either get the matching source code or obtain .pyc files compiled for our interpreter version. One thing worth noting here is that, as it often happens, we might see
such a problem occur only under certain circumstances, e.g. with lazy
imports Magic Numbers in PythonAs mentioned, different versions of the Python interpreter have
different magic numbers. The list of all magic numbers can be found in Python/import.c in the CPython source tree:
sa@wks:~/0/python/py3/Python$ grep -A51 "Known values" import.c Known values: Python 1.5: 20121 Python 1.5.1: 20121 Python 1.5.2: 20121 [skipping a lot of lines...] Python 3.2a0: 3160 (add SETUP_WITH) tag: cpython-32 Python 3.2a1: 3170 (add DUP_TOP_TWO, remove DUP_TOPX and ROT_FOUR) tag: cpython-32 Python 3.2a2 3180 (add DELETE_DEREF) sa@wks:~/0/python/py3/Python$ date -u Wed Feb 2 10:44:13 UTC 2011 sa@wks:~/0/python/py3/Python$ Sorting and Searching
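A minimal sketch of the two tools I reach for most often in this area — the built-in sorted with a key function, and the bisect module for binary search on an already sorted sequence (the data is made up):
import bisect

people = [("mary", 30), ("bob", 25), ("anna", 27)]
print(sorted(people, key=lambda person: person[1]))   # sort by age -> bob, anna, mary

haystack = [1, 3, 4, 7, 9]                            # must already be sorted for bisect
position = bisect.bisect_left(haystack, 7)            # index at which 7 lives (or would be inserted)
print(position, haystack[position] == 7)              # 3 True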
TimeTime is an important subject, in general, and of course also for
programmers and technicians like us. Python's standard library ships a
few modules (time, datetime and calendar) that help us with all kinds
of time-related tasks in Python... set it, read it, store it,
manipulate it... People have also written many third-party modules
such as python-dateutil which provide us with additional features such
as enhanced mathematical operations on naive, awareThere are two kinds of date and time objects in Python: naive and aware. This distinction refers to whether the object has any notion of timezone, DST (Daylight Saving Time), or other kind of algorithmic or political time adjustment. Whether a naive date/time object represents UTC (Universal Time Coordinated), local time, or time in some other timezone is purely up to the program, just like it is up to the program whether a particular number represents metres, miles, or mass. Naive date/time objects are easy to understand and to work with, at the cost of ignoring some aspects of reality. Daylight Saving Time, UTC, GMTDST is this bizarre thing of moving clocks back and forth twice a
year. This happens roughly at the same time in most countries but
there are of course many exceptions which makes the whole notion of
DST even more silly. Python's Roughly speaking, UTC is the better/newer GMT (Greenwich Mean Time),
that is all we need to know when it comes to coding really. Quite
often however the terms are used interchangeably e.g. the We should always determine and store time in UTC — storing time as local time, its offset to UTC and whether or not DST is in effect should be avoided as it is a recipe for confusion and errors. If we need to record where time was taken then we store the offset to UTC, the timezone name, and whether or not DST is in effect separately and apply this information whenever date/time is exposed externally e.g. to a user. In other words: we always deal with UTC-based date/time values/objects internally and only map to local representation and/or DST when date and/or time exposed externally e.g. shown to a user. Do:
Do Not:
ISO 8601
>>> from time import strftime >>> strftime("%Y-%m-%d %H:%M:%S") # time module '2011-10-29 21:06:40' >>> from datetime import datetime >>> str(datetime.now()).split('.')[0] # datetime module '2011-10-29 21:06:46' >>> "{:%Y-%m-%d %H:%M:%S}".format(datetime.now()) # using .format() string method '2011-10-29 21:06:51' >>> time vs datetimeThe That being said, timeThe time module uses two different representations for a point in time and provides numerous functions to help us convert back and forth between the two:
One of the core functions of the Wall Clock Time vs Processor Time >>> import time >>> time.time() 1320009370.628919 # seconds since Unix epoch (1970-01-01 00:00:00) >>> type(time.time()) <class 'float'> # type float >>> int(time.time()) # seconds resolution as integer is easier to work with >>> 1320009378 >>> time.ctime() 'Sun Oct 30 21:22:43 2011' # non-ISO 8601 string representation Although the value is always a float, actual precision is
platform-dependent. The float representation is useful when storing or
comparing times internally, but not as useful for showing it
externally e.g. to a user. In that case it makes more sense to use time.ctime() or time.strftime().
While >>> import time >>> def worker(): ... for i in range(1000000): ... i += i/2 ... ... ... >>> def using_time(): ... start = time.time() ... worker() ... print("elapsed time: {}".format(time.time() - start)) ... ... >>> def using_clock(): ... start = time.clock() ... worker() ... print("elapsed time: {}".format(time.clock() - start)) ... ... >>> using_time() elapsed time: 1.7183270454406738 >>> using_clock() elapsed time: 1.6999999999999886 >>> Although both, Timezones >>> time.tzname ('GMT', 'BST') # I am currently in London therefore >>> time.timezone 0 # local time and UTC is the same in my case >>> time.daylight 1 # this timezone uses DST >>> time.altzone -3600 # current DST offset in seconds >>> time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime()) '2011-10-31 22:11:17' # ISO 8601 string representation >>> time.strftime('%Y-%m-%d %H:%M:%S') '2011-10-31 22:11:21' # ISO 8601 string representation >>> time.ctime() 'Mon Oct 31 11:16:10 2011' # non-ISO 8601 string representation >>> If we wanted to return UTC then we could use struct_time Storing times as elapsed seconds is useful in some situations, but there are times when we need to have access to the individual fields of a point in time (year, month, etc.). As mentioned, the time module has two different representations for a
point in time, the floating point based one which we have just seen
using time.time(), and the struct_time sequence which gives us named access to the individual fields (year, month, day, and so on).
There are several functions that work with >>> time.gmtime() time.struct_time(tm_year=2011, tm_mon=10, tm_mday=31, tm_hour=9, tm_min=39, tm_sec=55, tm_wday=0, tm_yday=304, tm_isdst=0) >>> time.localtime() time.struct_time(tm_year=2011, tm_mon=10, tm_mday=31, tm_hour=9, tm_min=40, tm_sec=4, tm_wday=0, tm_yday=304, tm_isdst=0) >>> time.mktime(time.gmtime()) 1320054020.0 >>> time.time() 1320054024.777523 >>> time.mktime(time.localtime()) # convert a struct_time to float 1320054095.0 >>> time.time() 1320054096.891176 >>> now = time.gmtime() >>> now.tm_yday 304 >>> now.tm_year 2011
Parsing and Formatting Times The two functions >>> now = time.ctime() >>> now 'Mon Oct 31 10:49:41 2011' # non ISO 8601 string representation >>> time.strptime(now) time.struct_time(tm_year=2011, tm_mon=10, tm_mday=31, tm_hour=10, tm_min=49, tm_sec=41, tm_wday=0, tm_yday=304, tm_isdst=-1) >>> time.strftime('%Y-%m-%d %H:%M:%S', time.strptime(now)) '2011-10-31 10:49:41' # ISO 8601 string representation >>> time.strftime('%Y-%m-%d %H:%M:%S') '2011-10-31 10:50:26' # ISO 8601 string representation >>> Those are just examples on how to use the Miscellaneous Last but not least, the >>> def sleep_example(): ... print(time.time()) ... time.sleep(2) # will delay processing for 2 seconds ... print(time.time()) ... ... >>> sleep_example() 1320055818.41326 1320055820.415575 >>> datetimeThe
As of now (November 2011) there are six classes/types in the datetime module that help us handle dates and times in a uniform and correct manner:
Notes about those types:
Class/Type relationships: object timedelta # no notion of naive/aware tzinfo # an ABC; used by time and datetime objects timezone # subclass/subtype of tzinfo time # naive or aware date # naive-only datetime # naive or aware We will now take a closer look at the >>> import datetime >>> datetime.MINYEAR # constants exported by the datetime module 1 >>> datetime.MAXYEAR 9999 >>> now = datetime.datetime.now() >>> now datetime.datetime(2011, 10, 30, 1, 30, 51, 906080) >>> repr(now) 'datetime.datetime(2011, 10, 30, 1, 30, 51, 906080)' # ISO 8601 representation >>> print(now) 2011-10-30 01:30:51.906080 # print() returns __str__() if available, __repr__() otherwise >>> str(now) '2011-10-30 01:30:51.906080' >>> str(now).split('.')[0] '2011-10-30 01:30:51' >>> now.second # instance attribute 51 >>> now.microsecond 906080 >>> now.year 2011 >>> now.year = 2021 # fails because instance attributes are read-only Traceback (most recent call last): File "<input>", line 1, in <module> AttributeError: attribute 'year' of 'datetime.date' objects is not writable >>> now.min # class attributes datetime.datetime(1, 1, 1, 0, 0) >>> now.max datetime.datetime(9999, 12, 31, 23, 59, 59, 999999) >>> now.resolution datetime.timedelta(0, 0, 1) Note that the default string representation of >>> datetime.datetime.timetuple(now) time.struct_time(tm_year=2011, tm_mon=10, tm_mday=30, tm_hour=1, tm_min=30, tm_sec=51, tm_wday=6, tm_yday=303, tm_isdst=-1) >>> type(datetime.datetime.timetuple(now)) <class 'time.struct_time'> # a named tuples give us >> bar = datetime.datetime.timetuple(now) >>> bar[0] # index-based as well as 2011 >>> bar.tm_year # name-based access 2011 >>> bar.tm_sec 51 >>> time.gmtime() time.struct_time(tm_year=2011, tm_mon=10, tm_mday=30, tm_hour=19, tm_min=9, tm_sec=8, tm_wday=6, tm_yday=303, tm_isdst=0) >>> type(time.gmtime()) <class 'time.struct_time'> >>> A There is much more to be found in the Summary
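As a short recap in code — internally we create aware, UTC-based datetime objects and only format them when exposing them externally. A minimal sketch, assuming Python 3.2 or later where datetime.timezone.utc is available:
from datetime import datetime, timezone

now_utc = datetime.now(timezone.utc)              # aware object, carries its UTC offset
print(now_utc.tzinfo)                             # UTC
print(now_utc.strftime('%Y-%m-%d %H:%M:%S %z'))   # ISO 8601-style, e.g. 2011-11-06 01:40:02 +0000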
CodecsWRITEME Tips and TricksThis section is used to collect bits and pieces of mostly unrelated bits and pieces. What they all have in common however is that they are considered to be pythonic when it comes to pure coding and/or, they all relate to Python in a certain way: Find Code on the FilesystemUsing introspection we can use a module's >>> import email >>> email.__file__ '/home/sa/0/1/python3.3/lib/python3.3/email/__init__.py' >>> import sys >>> sys.__file__ Traceback (most recent call last): File "<input>", line 1, in <module> AttributeError: 'module' object has no attribute '__file__' >>> sys.version '3.3.0a0 (default:c33aa14f4edb, Nov 5 2011, 21:41:34) \n[GCC 4.6.2]' >>> The Python VersionHere is how we find out about which version of Python we are running: sa@wks:~$ python >>> import sys >>> sys.version[:3] '3.3' >>> import platform >>> platform.python_version() '3.3.0a0' >>> import sysconfig >>> sysconfig.get_python_version() # available since Python 3.2 '3.3' >>> sa@wks:~$ python --version Python 3.3.0a0 sa@wks:~$ python -c "import sys; print(sys.version)" 3.3.0a0 (default:c33aa14f4edb, Nov 5 2011, 21:41:34) [GCC 4.6.2] sa@wks:~$ Underscore / Interactive InterpreterThe name or identifier sa@wks:~$ python >>> 2 + 2 4 >>> _ 4 >>> _ + 2 6 >>> _ + 3 9 >>> print(_) 9 >>> sa@wks:~$ Pretty Print JSONSay we want to transform this JSON (JavaScript Object Notation)
document {"foo": "lorem", "bar": "ipsum"} into a nicely indented, human readable form. How do we do this? Before we continue however... yes, this is a simplified example with just two fields i.e. even the single line version is quite readable. Try the same with 100 fields of different data types and several levels of nesting... dramatic pause... yes, it makes total sense to know how to bring a JSON document into a more human readable form! Using the standard operating system CLI (Command Line Interface) like for example Bash, we can do: sa@wks:~$ echo '{"foo": "lorem", "bar": "ipsum"}' | python -m json.tool { "bar": "ipsum", "foo": "lorem" } sa@wks:~$ From Python itself there are many ways such as using a built-in module
(json): >>> import json >>> print(json.dumps({'foo': "lorem", 'bar': "ipsum"}, indent=4)) { "foo": "lorem", "bar": "ipsum" } >>> or, we can use a more powerful third party library such as jsonlib: >>> import jsonlib >>> print(jsonlib.write({'foo': "lorem", 'bar': "ipsum"}, indent=' ')) { "foo": "lorem", "bar": "ipsum" } >>> One example where all this might come in handy is when using MongoDB, as MongoDB uses JSON extensively... (actually it is BSON (Binary JSON) but...). Reverse a String>>> x = "hello world" >>> x[::-1] 'dlrow olleh' >>> As a matter of fact, this works on any sequence type. Additionally,
any type that implements a Extract a SubstringSay we have the string >>> import re >>> mystring = "foo34bar" >>> substring = re.search('\d+', mystring).group() >>> substring '34' >>> ''.join(i for i in mystring if i.isdigit()) '34' >>> ''.join(i for i in mystring if i.isalpha()) 'foobar' >>> Split the extension from a pathname>>> import os.path >>> os.path.splitext("file-1.4.tar.gz")[0] 'file-1.4.tar' >>> os.path.splitext("file-1.4.tar.bz2")[0] 'file-1.4.tar' >>> os.path.splitext("foo.jpg")[0] 'foo' >>> os.path.splitext("mongodb.cpp")[0] 'mongodb' >>> Blank LinesSometimes we have strings or files containing blank lines which we want to get rid of: sa@wks:/tmp$ echo -e "hello\n\nworld" > myfile.txt sa@wks:/tmp$ cat myfile.txt hello world sa@wks:/tmp$ python >>> with open('/tmp/myfile.txt', encoding='utf-8') as foo: ... [line for line in foo if line.strip()] ... ... ['hello\n', 'world\n'] >>> Of course, we could also write back to In this example we used a list comprehension while in practice a generator expression might be a better choice. Also, note that we have used the with compound statement here because file objects implement the context management protocol. EnumerateThe built-in enumerate function allows us to enumerate a sequence using numbers: >>> seasons = ["Spring", "Summer", "Fall", "Winter"] >>> list(enumerate(seasons)) [(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')] If we wanted to use characters instead of numbers then the built-in
>>> import string >>> for identifier, season in zip(string.ascii_uppercase, seasons): ... print(identifier, season) ... ... A Spring B Summer C Fall D Winter >>> [(identifier, season) for identifier, season in zip(string.ascii_uppercase, seasons )] [('A', 'Spring'), ('B', 'Summer'), ('C', 'Fall'), ('D', 'Winter')] >>> Caching
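One battery worth knowing about here is functools.lru_cache, available since Python 3.2: it memoizes a function's return values keyed by its arguments. A minimal sketch:
from functools import lru_cache

@lru_cache(maxsize=1024)     # remember up to 1024 distinct argument combinations
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(80))               # returns instantly thanks to the cache
print(fib.cache_info())      # hit/miss statistics, e.g. CacheInfo(hits=78, misses=81, ...)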
Securityyaml.safe_loadUse YAML's Protect CodeFirst of all it is not about protecting code but about protecting ideas (e.g. algorithms) and sensible information (e.g. passwords), our assets. Source code really just is the collective minds of human beings translated into a language corpora that can be understood by much less-capable entities (computers) — computers are fast with repetitive simple tasks, but they are not (yet) capable of higher thinking/reasoning. Both, ideas and sensible information, are of substantial social and monetary value, hard to quantify (read measure) and once they become general knowledge, they are lost assets. In a nutshell: the only real chance we have to protect said ideas and sensible information is by not revealing it through source code. Forget about all the funky stories about byte-compiled and obfuscated source code — those are tales and lies based on misinterpretation of facts and lack of knowledge. So how do we not reveal our assets but sill provide enough functionality to our users? The answer is with SaaS (Software as a Service) or, without dipping into marketing/hype parlance, we split our application into two parts:
How does this work? Well, we put 2 (our assets) on a server only we control and let 1 (our non-assets) access this server for the information it needs to function properly. This way we never give away ideas or sensible information because we never give away source code containing our assets! Only distributing bytecode and/or obfuscated source code does only pose a hurdle but does not protect us from assets being revealed — if somebody tells us a different story then he is either lying or simply no expert. Those who still think byte-compiling and obfuscating source code is the way to protect assets should be prepared to get asked those question:
WRITEME Android
WRITEME |