Reshaping in Pandas – Pivot, Pivot-Table, Stack and Unstack explained with Pictures

Introduction

from: http://nikgrozev.com/2015/07/01/reshaping-in-pandas-pivot-pivot-table-stack-and-unstack-explained-with-pictures/

Pandas is a popular python library for data analysis. It provides a façade on top of libraries like numpy and matplotlib, which makes it easier to read and transform data. It provides the abstractions of DataFrames and Series, similar to those in R.

In Pandas data reshaping means the transformation of the structure of a table or vector (i.e. DataFrame or Series) to make it suitable for further analysis. Some of Pandas reshaping capabilities do not readily exist in other environments (e.g. SQL or bare bone R) and can be tricky for a beginner.

In this post, I’ll exemplify some of the most common Pandas reshaping functions and will depict their work with diagrams.

Pivot

The pivot function is used to create a new derived table out of a given one. Pivot takes 3 arguements with the following names: index, columns, and values. As a value for each of these parameters you need to specify a column name in the original table. Then the pivot function will create a new table, whose row and column indices are the unique values of the respective parameters. The cell values of the new table are taken from column given as the values parameter.

A bit foggy? Let’s give an example. Assume that we are given the following small table:

Although the semantics doesn’t matter in this example, you can think of it as a table of items we want to sell. The Itemcolumn contains the item names, USD is the price in US dollars and EU is the price in euros. Each client can be classified as Gold, Silver or Bronze customer and this is specified in the CType column.

The following code snippet creates the depicted DataFrame. Note that we will assume these imports are present in all code snippets throughout this article.

from collections import OrderedDict
from pandas import DataFrame
import pandas as pd
import numpy as np

table = OrderedDict((
    ("Item", ['Item0', 'Item0', 'Item1', 'Item1']),
    ('CType',['Gold', 'Bronze', 'Gold', 'Silver']),
    ('USD',  ['1$', '2$', '3$', '4$']),
    ('EU',   ['1€', '2€', '3€', '4€'])
))
d = DataFrame(table)

In such a table, it is not easy to see how the USD price varies over different customer types. We may like to reshape/pivot the table so that all USD prices for an item are on the row to compare more easily. With Pandas, we can do so with a single line:

p = d.pivot(index='Item', columns='CType', values='USD')

This invocation creates a new table/DataFrame whose columns are the unique values in d.CType and whose rows are indexed with the unique values of d.Item. So far we have defined the indices of the columns and rows, but what about the cells’ values? This is defined by the last parameter of the invocation – values=’USD’. Each cell in the newly created DataFrame will have as a value the entry of the USD column in the original table corresponding to the same Item and CType. The following diagram illustrates this. Column and row indices are marked in red.

In other words, the value of USD for every row in the original table has been transferred to the new table, where its row and column match the Item and CType of its original row. Cells in the new table which do not have a matching entry in the original one are set with NaN.

As an example the following lines perform equivalent queries on the original and pivoted tables:

# Original DataFrame: Access the USD cost of Item0 for Gold customers
print (d[(d.Item=='Item0') & (d.CType=='Gold')].USD.values)

# Pivoted DataFrame: Access the USD cost of Item0 for Gold customers
print (p[p.index=='Item0'].Gold.values)

Note that in this example the pivoted table does not contain any information about the EU column! Indeed, we can’t see those euro symbols anywhere! Thus, the pivoted table is a simplified version of the original data and only contains information about the columns we specified as parameters to the pivot method.

Pivoting By Multiple Columns

Now what if we want to extend the previous example to have the EU cost for each item on its row as well? This is actually easy – we just have to omit the values parameter as follows:

p = d.pivot(index='Item', columns='CType')

In this case, Pandas will create a hierarchical column index (MultiIndex) for the new table. You can think of a hierarchical index as a set of trees of indices. Each indexed column/row is identified by a unique sequence of values defining the “path” from the topmost index to the bottom index. The first level of the column index defines all columns that we have not specified in the pivot invocation – in this case USD and EU. The second level of the index defines the unique value of the corresponding column. This is depicted in the following diagram:

We can use this hierarchical column index to filter the values of a single column from the original table. For example p.USD returns a pivoted DataFrame with the USD values only and it is equivalent to the pivoted DataFrame from the previous section.

To exemplify hierarchical indices, the expression p.USD.Bronze selects the first column in the pivoted table.

As a further example the following queries on the original and pivoted tables are equivalent:

# Original DataFrame: Access the USD cost of Item0 for Gold customers
print(d[(d.Item=='Item0') & (d.CType=='Gold')].USD.values)

# Pivoted DataFrame: p.USD gives a "sub-DataFrame" with the USD values only
print(p.USD[p.USD.index=='Item0'].Gold.values)

Common Mistake in Pivoting

As we saw the pivot method takes at least 2 column names as parameters – the index and the columns named parameters. What will happen if we have multiple rows with the same values for these columns? How will the pivot method determine the value of the corresponding cell in the pivoted table? The following diagram depicts the problem:

In this example we have two rows with the same values (“Item0” and “Gold”) for the Item and CType columns. The pivot method can not know what should be the value of the corresponding value in the pivoted table. Thus, it throws an exception with the following message:

ValueError: Index contains duplicate entries, cannot reshape

The following code reproduces the issue:

table = OrderedDict((
    ("Item", ['Item0', 'Item0', 'Item0', 'Item1']),
    ('CType',['Gold', 'Bronze', 'Gold', 'Silver']),
    ('USD',  ['1',  '2',  '3',  '4']),
    ('EU',   ['1€', '2€', '3€', '4€'])
))
d = DataFrame(table)
p = d.pivot(index='Item', columns='CType', values='USD')

Hence, before calling pivot we need to ensure that our data does not have rows with duplicate values for the specified columns. If we can’t ensure this we may have to use the pivot_table method instead.

Pivot Table

The pivot_table method comes to solve this problem. It works like pivot, but it aggregates the values from rows with duplicate entries for the specified columns. In other words, in the previous example we could have used the mean, the median or another aggregation function to compute a single value from the conflicting entries. This is depicted in the example below.

Note that in this example we removed the $ and € symbols to simplify things. There are two rows in the original table, whose values for Item and CType are duplicate. The corresponding value in the pivot table is defined as the mean of these two original values. The pivot_table method takes a parameter called aggfunc, which is the aggregation function used to combine the multitude of values. In this example we used the mean function from numpy. The following snippet lists the code to reproduce the example:

table = OrderedDict((
    ("Item", ['Item0', 'Item0', 'Item0', 'Item1']),
    ('CType',['Gold', 'Bronze', 'Gold', 'Silver']),
    ('USD',  [1, 2, 3, 4]),
    ('EU',   [1.1, 2.2, 3.3, 4.4])
))
d = DataFrame(table)
p = d.pivot_table(index='Item', columns='CType', values='USD', aggfunc=np.min)

In essence pivot_table is a generalisation of pivot, which allows you to aggregate multiple values with the same destination in the pivoted table.

Stack/Unstack

In fact pivoting a table is a special case of stacking a DataFrame. Let us assume we have a DataFrame with MultiIndices on the rows and columns. Stacking a DataFrame means moving (also rotating or pivoting) the innermost column index to become the innermost row index. The inverse operation is called unstacking. It means moving the innermost row index to become the innermost column index. The following diagram depicts the operations:

In this example, we look at a DataFrame with 2-level hierarchical indices on both axes. Stacking takes the most-inner column index (i.e. c00, c01, c10), makes it the most inner row index and reshuffles the cell values accordingly. Inversely, unstacking moves the inner row indices (i.e. r00, r01) to the columns.

Typically, stacking makes the DataFrame taller, as it is “stacking” data in fewer columns and more rows. Similarly, unstacking usually makes it shorter and wider or broader. The following reproduces the example:

# Row Multi-Index
row_idx_arr = list(zip(['r0', 'r0'], ['r-00', 'r-01']))
row_idx = pd.MultiIndex.from_tuples(row_idx_arr)

# Column Multi-Index
col_idx_arr = list(zip(['c0', 'c0', 'c1'], ['c-00', 'c-01', 'c-10']))
col_idx = pd.MultiIndex.from_tuples(col_idx_arr)

# Create the DataFrame
d = DataFrame(np.arange(6).reshape(2,3), index=row_idx, columns=col_idx)
d = d.applymap(lambda x: (x // 3, x % 3))

# Stack/Unstack
s = d.stack()
u = d.unstack()

In fact Pandas allows us to stack/unstack on any level of the index so our previous explanation was a bit simplified :). Thus, in the previous example we could have stacked on the outermost index level as well! However, the default (and most typical case) is to stack/unstack on the innermost index level.

Stacking and unstacking can also be applied to data with flat (i.e. non-hierchical) indices. In this case, one of the indices is de facto removed (the columns index if stacking, and the rows if unstacking) and its values are nested in the other index, which is now a MultiIndex. Therefore, the result is always a Series with a hierarchical index. The following example demonstrates this:

In this example we take a DataFrame similar to the one from the beginning. Instead of pivoting, this time we stack it, and we get a Series with a MultiIndex composed of the initial index as first level, and the table columns as a second. Unstacking can help us get back to our original data structure.

Markdown Cheat Sheet in Jupyter Notebook

Markdown is a wonderfully simple approach to creating web pages, written by John Gruber of Daring Fireball. You get on with the business of writing (without any fancy code) and Markdown takes care of producing clean, web standards compliant HTML.

The Daring Fireball site provides full documentation for Markdown, but the following examples should get you started.

Section Headings

You can define headings of different levels when creating a web page. The most important heading (which typically only occurs once on each page — at the top) is heading 1. A level 1 heading can be created with Markdown by typing a single ‘#’ character at the start of a line. The heading at the top of this page was defined like this:

# Markdown Cheat Sheet

To create a secondary heading (such as the one for this section) you just use two ‘#’ characters, like so:

## Section Headings

You can use up to six ‘#’ characters to create a level 6 heading, but you will probably find that you don’t need to nest your headings quite so deeply!

Paragraphs

Paragraphs are very easy; separate them with a blank line. You can write your paragraph on one long line, or you can wrap the lines yourself if you prefer.

This section was marked up like so:

Paragraphs are very easy; separate them with a blank line. You can write your paragraph on one long line,
or you can
wrap the lines yourself
if you prefer.

This section was marked up like so:

Bold and Italics

It’s very easy to add emphasis with bold and italics:

It's **very** easy to do **bold** and *italics*:

You can also use underscores if you prefer:

It's __very__ easy to do __bold__ and _italics_:

Links

Create simple links by wrapping square brackets around the link text and round brackets around the URL:

[Nesta CMS](http://effectif.com/nesta)

If you want to give your readers an extra about the link that they’re about to follow you can set a link title:

[Nesta CMS](http://effectif.com/nesta "Nesta is a superb CMS")

Titles usually appear as a tooltip when you hover over the link, and help search engines work out what a page is about.

Bulleted Lists

Start each line with hyphen or an asterisk, followed by a space. List items can be nested. This text:

* Bullet 1
* Bullet 2
  * Bullet 2a
  * Bullet 2b
* Bullet 3

…produces this list:

Bullet 1
Bullet 2
- Bullet 2a
- Bullet 2b
Bullet 3

Numbered Lists

Start each line with number and a period, then a space. This text…

1. Baked potato
2. Baked beans
3. Pepper

…produces this list:

Baked potato
Baked beans
Pepper

Quotes

If you need to cite a paragraph of somebody else’s work you really ought to attribute it to them properly by using HTML’s <blockquote/> tag. You can produce it with Markdown by adding a single ‘>’ character at the beginning of the line.

This text:

> One thing was certain, that the white kitten had had nothing
> to do with it -- it was the black kitten's fault entirely. For
> the white kitten had been having its face washed by the old cat,
> for the last quarter of an hour (and bearing it pretty well,
> considering) so you see that it couldn't have had any hand in
> the mischief. <cite>Lewis Carroll, Through the Looking
> Glass</cite>

…produces:

One thing was certain, that the white kitten had had nothing to do with it — it was the black kitten’s fault entirely. For the white kitten had been having its face washed by the old cat, for the last quarter of an hour (and bearing it pretty well, considering) so you see that it couldn’t have had any hand in the mischief. — Lewis Carroll, Through the Looking Glass

Build-in Functions in python

The Python interpreter has a number of functions built into it that are always available. They are listed here in alphabetical order.

		Built-in Functions
`abs()`	`divmod()`	`input()`	`open()`	`staticmethod()`
`all()`	`enumerate()`	`int()`	`ord()`	`str()`
`any()`	`eval()`	`isinstance()`	`pow()`	`sum()`
`basestring()`	`execfile()`	`issubclass()`	`print()`	`super()`
`bin()`	`file()`	`iter()`	`property()`	`tuple()`
`bool()`	`filter()`	`len()`	`range()`	`type()`
`bytearray()`	`float()`	`list()`	`raw_input()`	`unichr()`
`callable()`	`format()`	`locals()`	`reduce()`	`unicode()`
`chr()`	`frozenset()`	`long()`	`reload()`	`vars()`
`classmethod()`	`getattr()`	`map()`	`repr()`	`xrange()`
`cmp()`	`globals()`	`max()`	`reversed()`	`zip()`
`compile()`	`hasattr()`	`memoryview()`	`round()`	`__import__()`
`complex()`	`hash()`	`min()`	`set()`
`delattr()`	`help()`	`next()`	`setattr()`
`dict()`	`hex()`	`object()`	`slice()`
`dir()`	`id()`	`oct()`	`sorted()`

In addition, there are other four built-in functions that are no longer considered essential: apply(), buffer(), coerce(), and intern(). They are documented in the Non-essential Built-in Functions section.

abs(x): Return the absolute value of a number. The argument may be a plain or long integer or a floating point number. If the argument is a complex number, its magnitude is returned.

all(iterable)

Return True if all elements of the iterable are true (or if the iterable is empty). Equivalent to:

def all(iterable):
    for element in iterable:
        if not element:
            return False
    return True

New in version 2.5.

any(iterable)

Return True if any element of the iterable is true. If the iterable is empty, return False. Equivalent to:

def any(iterable):
    for element in iterable:
        if element:
            return True
    return False

New in version 2.5.

basestring(): This abstract type is the superclass for str and unicode. It cannot be called or instantiated, but it can be used to test whether an object is an instance of str or unicode. isinstance(obj, basestring) is equivalent to isinstance(obj, (str, unicode)).

New in version 2.3.

bin(x): Convert an integer number to a binary string. The result is a valid Python expression. If x is not a Python int object, it has to define an __index__()method that returns an integer.

New in version 2.6.

class bool([x]): Return a Boolean value, i.e. one of True or False. x is converted using the standard truth testing procedure. If x is false or omitted, this returns False; otherwise it returns True. bool is also a class, which is a subclass of int. Class bool cannot be subclassed further. Its only instances are False andTrue.

New in version 2.2.1.

Changed in version 2.3: If no argument is given, this function returns False.

class bytearray([source[, encoding[, errors]]])

Return a new array of bytes. The bytearray class is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the str type has, see String Methods.

The optional source parameter can be used to initialize the array in a few different ways:

If it is unicode, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the unicode to bytes using unicode.encode().
If it is an integer, the array will have that size and will be initialized with null bytes.
If it is an object conforming to the buffer interface, a read-only buffer of the object will be used to initialize the bytes array.
If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.

Without an argument, an array of size 0 is created.

New in version 2.6.

callable(object): Return True if the object argument appears callable, False if not. If this returns true, it is still possible that a call fails, but if it is false, calling object will never succeed. Note that classes are callable (calling a class returns a new instance); class instances are callable if they have a __call__() method.

chr(i): Return a string of one character whose ASCII code is the integer i. For example, chr(97) returns the string 'a'. This is the inverse of ord(). The argument must be in the range [0..255], inclusive; ValueError will be raised if i is outside that range. See also unichr().

classmethod(function)

Return a class method for function.

A class method receives the class as implicit first argument, just like an instance method receives the instance. To declare a class method, use this idiom:

class C(object):
    @classmethod
    def f(cls, arg1, arg2, ...):
        ...

The @classmethod form is a function decorator – see the description of function definitions in Function definitions for details.

It can be called either on the class (such as C.f()) or on an instance (such as C().f()). The instance is ignored except for its class. If a class method is called for a derived class, the derived class object is passed as the implied first argument.

Class methods are different than C++ or Java static methods. If you want those, see staticmethod() in this section.

For more information on class methods, consult the documentation on the standard type hierarchy in The standard type hierarchy.

New in version 2.2.

Changed in version 2.4: Function decorator syntax added.

cmp(x, y): Compare the two objects x and y and return an integer according to the outcome. The return value is negative if x < y, zero if x == y and strictly positive if x > y.

compile(source, filename, mode[, flags[, dont_inherit]])

Compile the source into a code or AST object. Code objects can be executed by an exec statement or evaluated by a call to eval(). source can either be a Unicode string, a Latin-1 encoded string or an AST object. Refer to the ast module documentation for information on how to work with AST objects.

The filename argument should give the file from which the code was read; pass some recognizable value if it wasn’t read from a file ('<string>' is commonly used).

The mode argument specifies what kind of code must be compiled; it can be 'exec' if source consists of a sequence of statements, 'eval' if it consists of a single expression, or 'single' if it consists of a single interactive statement (in the latter case, expression statements that evaluate to something other than None will be printed).

The optional arguments flags and dont_inherit control which future statements (see PEP 236) affect the compilation of source. If neither is present (or both are zero) the code is compiled with those future statements that are in effect in the code that is calling compile(). If the flags argument is given and dont_inherit is not (or is zero) then the future statements specified by the flags argument are used in addition to those that would be used anyway. If dont_inherit is a non-zero integer then the flags argument is it – the future statements in effect around the call to compile are ignored.

Future statements are specified by bits which can be bitwise ORed together to specify multiple statements. The bitfield required to specify a given feature can be found as the compiler_flag attribute on the _Feature instance in the __future__ module.

This function raises SyntaxError if the compiled source is invalid, and TypeError if the source contains null bytes.

If you want to parse Python code into its AST representation, see ast.parse().

Note

When compiling a string with multi-line code in 'single' or 'eval' mode, input must be terminated by at least one newline character. This is to facilitate detection of incomplete and complete statements in the code module.

Changed in version 2.3: The flags and dont_inherit arguments were added.

Changed in version 2.6: Support for compiling AST objects.

Changed in version 2.7: Allowed use of Windows and Mac newlines. Also input in 'exec' mode does not have to end in a newline anymore.

class complex([real[, imag]])

Return a complex number with the value real + imag*1j or convert a string or number to a complex number. If the first parameter is a string, it will be interpreted as a complex number and the function must be called without a second parameter. The second parameter can never be a string. Each argument may be any numeric type (including complex). If imag is omitted, it defaults to zero and the function serves as a numeric conversion function like int(), long() and float(). If both arguments are omitted, returns 0j.

Note

When converting from a string, the string must not contain whitespace around the central + or - operator. For example, complex('1+2j') is fine, but complex('1 + 2j') raises ValueError.

The complex type is described in Numeric Types — int, float, long, complex.

delattr(object, name): This is a relative of setattr(). The arguments are an object and a string. The string must be the name of one of the object’s attributes. The function deletes the named attribute, provided the object allows it. For example, delattr(x, 'foobar') is equivalent to del x.foobar.

class dict(**kwarg)

class dict(mapping, **kwarg)

class dict(iterable, **kwarg)

Create a new dictionary. The dict object is the dictionary class. See dict and Mapping Types — dict for documentation about this class.

For other containers see the built-in list, set, and tuple classes, as well as the collections module.

dir([object])

Without arguments, return the list of names in the current local scope. With an argument, attempt to return a list of valid attributes for that object.

If the object has a method named __dir__(), this method will be called and must return the list of attributes. This allows objects that implement a custom __getattr__() or __getattribute__() function to customize the way dir() reports their attributes.

If the object does not provide __dir__(), the function tries its best to gather information from the object’s __dict__ attribute, if defined, and from its type object. The resulting list is not necessarily complete, and may be inaccurate when the object has a custom __getattr__().

The default dir() mechanism behaves differently with different types of objects, as it attempts to produce the most relevant, rather than complete, information:

If the object is a module object, the list contains the names of the module’s attributes.
If the object is a type or class object, the list contains the names of its attributes, and recursively of the attributes of its bases.
Otherwise, the list contains the object’s attributes’ names, the names of its class’s attributes, and recursively of the attributes of its class’s base classes.

The resulting list is sorted alphabetically. For example:

>>> import struct
>>> dir()   # show the names in the module namespace
['__builtins__', '__doc__', '__name__', 'struct']
>>> dir(struct)   # show the names in the struct module
['Struct', '__builtins__', '__doc__', '__file__', '__name__',
 '__package__', '_clearcache', 'calcsize', 'error', 'pack', 'pack_into',
 'unpack', 'unpack_from']
>>> class Shape(object):
        def __dir__(self):
            return ['area', 'perimeter', 'location']
>>> s = Shape()
>>> dir(s)
['area', 'perimeter', 'location']

Note

Because dir() is supplied primarily as a convenience for use at an interactive prompt, it tries to supply an interesting set of names more than it tries to supply a rigorously or consistently defined set of names, and its detailed behavior may change across releases. For example, metaclass attributes are not in the result list when the argument is a class.

divmod(a, b): Take two (non complex) numbers as arguments and return a pair of numbers consisting of their quotient and remainder when using long division. With mixed operand types, the rules for binary arithmetic operators apply. For plain and long integers, the result is the same as (a // b, a % b). For floating point numbers the result is (q, a % b), where q is usually math.floor(a / b) but may be 1 less than that. In any case q * b + a % b is very close to a, if a % b is non-zero it has the same sign as b, and 0 <= abs(a % b) < abs(b).

Changed in version 2.3: Using divmod() with complex numbers is deprecated.

enumerate(sequence, start=0)

Return an enumerate object. sequence must be a sequence, an iterator, or some other object which supports iteration. The next() method of the iterator returned by enumerate() returns a tuple containing a count (from start which defaults to 0) and the values obtained from iterating over sequence:

>>> seasons = ['Spring', 'Summer', 'Fall', 'Winter']
>>> list(enumerate(seasons))
[(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')]
>>> list(enumerate(seasons, start=1))
[(1, 'Spring'), (2, 'Summer'), (3, 'Fall'), (4, 'Winter')]

Equivalent to:

def enumerate(sequence, start=0):
    n = start
    for elem in sequence:
        yield n, elem
        n += 1

New in version 2.3.

Changed in version 2.6: The start parameter was added.

eval(expression[, globals[, locals]])

The arguments are a Unicode or Latin-1 encoded string and optional globals and locals. If provided, globals must be a dictionary. If provided, localscan be any mapping object.

Changed in version 2.4: formerly locals was required to be a dictionary.

The expression argument is parsed and evaluated as a Python expression (technically speaking, a condition list) using the globals and localsdictionaries as global and local namespace. If the globals dictionary is present and lacks ‘__builtins__’, the current globals are copied into globalsbefore expression is parsed. This means that expression normally has full access to the standard __builtin__ module and restricted environments are propagated. If the locals dictionary is omitted it defaults to the globals dictionary. If both dictionaries are omitted, the expression is executed in the environment where eval() is called. The return value is the result of the evaluated expression. Syntax errors are reported as exceptions. Example:

>>> x = 1
>>> print eval('x+1')
2

This function can also be used to execute arbitrary code objects (such as those created by compile()). In this case pass a code object instead of a string. If the code object has been compiled with 'exec' as the mode argument, eval()’s return value will be None.

Hints: dynamic execution of statements is supported by the exec statement. Execution of statements from a file is supported by the execfile()function. The globals() and locals() functions returns the current global and local dictionary, respectively, which may be useful to pass around for use by eval() or execfile().

See ast.literal_eval() for a function that can safely evaluate strings with expressions containing only literals.

execfile(filename[, globals[, locals]])

This function is similar to the exec statement, but parses a file instead of a string. It is different from the import statement in that it does not use the module administration — it reads the file unconditionally and does not create a new module. [1]

The arguments are a file name and two optional dictionaries. The file is parsed and evaluated as a sequence of Python statements (similarly to a module) using the globals and locals dictionaries as global and local namespace. If provided, locals can be any mapping object. Remember that at module level, globals and locals are the same dictionary. If two separate objects are passed as globals and locals, the code will be executed as if it were embedded in a class definition.

Changed in version 2.4: formerly locals was required to be a dictionary.

If the locals dictionary is omitted it defaults to the globals dictionary. If both dictionaries are omitted, the expression is executed in the environment where execfile() is called. The return value is None.

Note

The default locals act as described for function locals() below: modifications to the default locals dictionary should not be attempted. Pass an explicit locals dictionary if you need to see effects of the code on locals after function execfile() returns. execfile() cannot be used reliably to modify a function’s locals.

file(name[, mode[, buffering]])

Constructor function for the file type, described further in section File Objects. The constructor’s arguments are the same as those of the open()built-in function described below.

When opening a file, it’s preferable to use open() instead of invoking this constructor directly. file is more suited to type testing (for example, writing isinstance(f, file)).

New in version 2.2.

filter(function, iterable)

Construct a list from those elements of iterable for which function returns true. iterable may be either a sequence, a container which supports iteration, or an iterator. If iterable is a string or a tuple, the result also has that type; otherwise it is always a list. If function is None, the identity function is assumed, that is, all elements of iterable that are false are removed.

Note that filter(function, iterable) is equivalent to [item for item in iterable if function(item)] if function is not None and [item for item in iterableif item] if function is None.

See itertools.ifilter() and itertools.ifilterfalse() for iterator versions of this function, including a variation that filters for elements where the function returns false.

class float([x])

Return a floating point number constructed from a number or string x.

If the argument is a string, it must contain a possibly signed decimal or floating point number, possibly embedded in whitespace. The argument may also be [+|-]nan or [+|-]inf. Otherwise, the argument may be a plain or long integer or a floating point number, and a floating point number with the same value (within Python’s floating point precision) is returned. If no argument is given, returns 0.0.

Note

When passing in a string, values for NaN and Infinity may be returned, depending on the underlying C library. Float accepts the strings nan, inf and -inf for NaN and positive or negative infinity. The case and a leading + are ignored as well as a leading – is ignored for NaN. Float always represents NaN and infinity as nan, inf or -inf.

The float type is described in Numeric Types — int, float, long, complex.

format(value[, format_spec]): Convert a value to a “formatted” representation, as controlled by format_spec. The interpretation of format_spec will depend on the type of the valueargument, however there is a standard formatting syntax that is used by most built-in types: Format Specification Mini-Language.

Note

format(value, format_spec) merely calls value.__format__(format_spec).

New in version 2.6.

class frozenset([iterable])

Return a new frozenset object, optionally with elements taken from iterable. frozenset is a built-in class. See frozenset and Set Types — set, frozenset for documentation about this class.

For other containers see the built-in set, list, tuple, and dict classes, as well as the collections module.

New in version 2.4.

getattr(object, name[, default]): Return the value of the named attribute of object. name must be a string. If the string is the name of one of the object’s attributes, the result is the value of that attribute. For example, getattr(x, 'foobar') is equivalent to x.foobar. If the named attribute does not exist, default is returned if provided, otherwise AttributeError is raised.

globals(): Return a dictionary representing the current global symbol table. This is always the dictionary of the current module (inside a function or method, this is the module where it is defined, not the module from which it is called).

hasattr(object, name): The arguments are an object and a string. The result is True if the string is the name of one of the object’s attributes, False if not. (This is implemented by calling getattr(object, name) and seeing whether it raises an exception or not.)

hash(object): Return the hash value of the object (if it has one). Hash values are integers. They are used to quickly compare dictionary keys during a dictionary lookup. Numeric values that compare equal have the same hash value (even if they are of different types, as is the case for 1 and 1.0).

help([object])

Invoke the built-in help system. (This function is intended for interactive use.) If no argument is given, the interactive help system starts on the interpreter console. If the argument is a string, then the string is looked up as the name of a module, function, class, method, keyword, or documentation topic, and a help page is printed on the console. If the argument is any other kind of object, a help page on the object is generated.

This function is added to the built-in namespace by the site module.

New in version 2.2.

hex(x)

Convert an integer number (of any size) to a lowercase hexadecimal string prefixed with “0x”, for example:

>>> hex(255)
'0xff'
>>> hex(-42)
'-0x2a'
>>> hex(1L)
'0x1L'

If x is not a Python int or long object, it has to define a __hex__() method that returns a string.

See also int() for converting a hexadecimal string to an integer using a base of 16.

Note

To obtain a hexadecimal string representation for a float, use the float.hex() method.

Changed in version 2.4: Formerly only returned an unsigned literal.

id(object): Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

CPython implementation detail: This is the address of the object in memory.

input([prompt])

Equivalent to eval(raw_input(prompt)).

This function does not catch user errors. If the input is not syntactically valid, a SyntaxError will be raised. Other exceptions may be raised if there is an error during evaluation.

If the readline module was loaded, then input() will use it to provide elaborate line editing and history features.

Consider using the raw_input() function for general input from users.

class int(x=0)

class int(x, base=10)

Return an integer object constructed from a number or string x, or return 0 if no arguments are given. If x is a number, it can be a plain integer, a long integer, or a floating point number. If x is floating point, the conversion truncates towards zero. If the argument is outside the integer range, the function returns a long object instead.

If x is not a number or if base is given, then x must be a string or Unicode object representing an integer literal in radix base. Optionally, the literal can be preceded by + or - (with no space in between) and surrounded by whitespace. A base-n literal consists of the digits 0 to n-1, with a to z (or Ato Z) having values 10 to 35. The default base is 10. The allowed values are 0 and 2–36. Base-2, -8, and -16 literals can be optionally prefixed with 0b/0B, 0o/0O/0, or 0x/0X, as with integer literals in code. Base 0 means to interpret the string exactly as an integer literal, so that the actual base is 2, 8, 10, or 16.

The integer type is described in Numeric Types — int, float, long, complex.

isinstance(object, classinfo): Return true if the object argument is an instance of the classinfo argument, or of a (direct, indirect or virtual) subclass thereof. Also return true if classinfo is a type object (new-style class) and object is an object of that type or of a (direct, indirect or virtual) subclass thereof. If object is not a class instance or an object of the given type, the function always returns false. If classinfo is a tuple of class or type objects (or recursively, other such tuples), return true if object is an instance of any of the classes or types. If classinfo is not a class, type, or tuple of classes, types, and such tuples, a TypeError exception is raised.

Changed in version 2.2: Support for a tuple of type information was added.

issubclass(class, classinfo): Return true if class is a subclass (direct, indirect or virtual) of classinfo. A class is considered a subclass of itself. classinfo may be a tuple of class objects, in which case every entry in classinfo will be checked. In any other case, a TypeError exception is raised.

Changed in version 2.3: Support for a tuple of type information was added.

iter(o[, sentinel])

Return an iterator object. The first argument is interpreted very differently depending on the presence of the second argument. Without a second argument, o must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the__getitem__() method with integer arguments starting at 0). If it does not support either of those protocols, TypeError is raised. If the second argument, sentinel, is given, then o must be a callable object. The iterator created in this case will call o with no arguments for each call to its next() method; if the value returned is equal to sentinel, StopIteration will be raised, otherwise the value will be returned.

One useful application of the second form of iter() is to read lines of a file until a certain line is reached. The following example reads a file until the readline() method returns an empty string:

with open('mydata.txt') as fp:
    for line in iter(fp.readline, ''):
        process_line(line)

New in version 2.2.

len(s): Return the length (the number of items) of an object. The argument may be a sequence (such as a string, bytes, tuple, list, or range) or a collection (such as a dictionary, set, or frozen set).

class list([iterable])

Return a list whose items are the same and in the same order as iterable’s items. iterable may be either a sequence, a container that supports iteration, or an iterator object. If iterable is already a list, a copy is made and returned, similar to iterable[:]. For instance, list('abc') returns ['a','b', 'c'] and list( (1, 2, 3) ) returns [1, 2, 3]. If no argument is given, returns a new empty list, [].

list is a mutable sequence type, as documented in Sequence Types — str, unicode, list, tuple, bytearray, buffer, xrange. For other containers see the built in dict, set, and tuple classes, and the collections module.

locals(): Update and return a dictionary representing the current local symbol table. Free variables are returned by locals() when it is called in function blocks, but not in class blocks.

Note

The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter.

class long(x=0)

class long(x, base=10)

Return a long integer object constructed from a string or number x. If the argument is a string, it must contain a possibly signed number of arbitrary size, possibly embedded in whitespace. The base argument is interpreted in the same way as for int(), and may only be given when x is a string. Otherwise, the argument may be a plain or long integer or a floating point number, and a long integer with the same value is returned. Conversion of floating point numbers to integers truncates (towards zero). If no arguments are given, returns 0L.

The long type is described in Numeric Types — int, float, long, complex.

map(function, iterable, …): Apply function to every item of iterable and return a list of the results. If additional iterable arguments are passed, function must take that many arguments and is applied to the items from all iterables in parallel. If one iterable is shorter than another it is assumed to be extended with Noneitems. If function is None, the identity function is assumed; if there are multiple arguments, map() returns a list consisting of tuples containing the corresponding items from all iterables (a kind of transpose operation). The iterable arguments may be a sequence or any iterable object; the result is always a list.

max(iterable[, key])

max(arg1, arg2, *args[, key])

Return the largest item in an iterable or the largest of two or more arguments.

If one positional argument is provided, iterable must be a non-empty iterable (such as a non-empty string, tuple or list). The largest item in the iterable is returned. If two or more positional arguments are provided, the largest of the positional arguments is returned.

The optional key argument specifies a one-argument ordering function like that used for list.sort(). The key argument, if supplied, must be in keyword form (for example, max(a,b,c,key=func)).

Changed in version 2.5: Added support for the optional key argument.

memoryview(obj): Return a “memory view” object created from the given argument. See memoryview type for more information.

min(iterable[, key])

min(arg1, arg2, *args[, key])

Return the smallest item in an iterable or the smallest of two or more arguments.

If one positional argument is provided, iterable must be a non-empty iterable (such as a non-empty string, tuple or list). The smallest item in the iterable is returned. If two or more positional arguments are provided, the smallest of the positional arguments is returned.

The optional key argument specifies a one-argument ordering function like that used for list.sort(). The key argument, if supplied, must be in keyword form (for example, min(a,b,c,key=func)).

Changed in version 2.5: Added support for the optional key argument.

next(iterator[, default]): Retrieve the next item from the iterator by calling its next() method. If default is given, it is returned if the iterator is exhausted, otherwise StopIteration is raised.

New in version 2.6.

class object: Return a new featureless object. object is a base for all new style classes. It has the methods that are common to all instances of new style classes.

New in version 2.2.

Changed in version 2.3: This function does not accept any arguments. Formerly, it accepted arguments but ignored them.

oct(x): Convert an integer number (of any size) to an octal string. The result is a valid Python expression.

Changed in version 2.4: Formerly only returned an unsigned literal.

open(name[, mode[, buffering]])

Open a file, returning an object of the file type described in section File Objects. If the file cannot be opened, IOError is raised. When opening a file, it’s preferable to use open() instead of invoking the file constructor directly.

The first two arguments are the same as for stdio’s fopen(): name is the file name to be opened, and mode is a string indicating how the file is to be opened.

The most commonly-used values of mode are 'r' for reading, 'w' for writing (truncating the file if it already exists), and 'a' for appending (which on some Unix systems means that all writes append to the end of the file regardless of the current seek position). If mode is omitted, it defaults to 'r'. The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading. Thus, when opening a binary file, you should append 'b' to the mode value to open the file in binary mode, which will improve portability. (Appending 'b' is useful even on systems that don’t treat binary and text files differently, where it serves as documentation.) See below for more possible values of mode.

The optional buffering argument specifies the file’s desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used. [2]

Modes 'r+', 'w+' and 'a+' open the file for updating (reading and writing); note that 'w+' truncates the file. Append 'b' to the mode to open the file in binary mode, on systems that differentiate between binary and text files; on systems that don’t have this distinction, adding the 'b' has no effect.

In addition to the standard fopen() values mode may be 'U' or 'rU'. Python is usually built with universal newlines support; supplying 'U' opens the file as a text file, but lines may be terminated by any of the following: the Unix end-of-line convention '\n', the Macintosh convention '\r', or the Windows convention '\r\n'. All of these external representations are seen as '\n' by the Python program. If Python is built without universal newlines support a mode with 'U' is the same as normal text mode. Note that file objects so opened also have an attribute called newlines which has a value of None (if no newlines have yet been seen), '\n', '\r', '\r\n', or a tuple containing all the newline types seen.

Python enforces that the mode, after stripping 'U', begins with 'r', 'w' or 'a'.

Python provides many file handling modules including fileinput, os, os.path, tempfile, and shutil.

Changed in version 2.5: Restriction on first letter of mode string introduced.

ord(c): Given a string of length one, return an integer representing the Unicode code point of the character when the argument is a unicode object, or the value of the byte when the argument is an 8-bit string. For example, ord('a') returns the integer 97, ord(u'\u2020') returns 8224. This is the inverse ofchr() for 8-bit strings and of unichr() for unicode objects. If a unicode argument is given and Python was built with UCS2 Unicode, then the character’s code point must be in the range [0..65535] inclusive; otherwise the string length is two, and a TypeError will be raised.

pow(x, y[, z])

Return x to the power y; if z is present, return x to the power y, modulo z (computed more efficiently than pow(x, y) % z). The two-argument form pow(x, y) is equivalent to using the power operator: x**y.

The arguments must have numeric types. With mixed operand types, the coercion rules for binary arithmetic operators apply. For int and long int operands, the result has the same type as the operands (after coercion) unless the second argument is negative; in that case, all arguments are converted to float and a float result is delivered. For example, 10**2 returns 100, but 10**-2 returns 0.01. (This last feature was added in Python 2.2. In Python 2.1 and before, if both arguments were of integer types and the second argument was negative, an exception was raised.) If the second argument is negative, the third argument must be omitted. If z is present, x and y must be of integer types, and y must be non-negative. (This restriction was added in Python 2.2. In Python 2.1 and before, floating 3-argument pow() returned platform-dependent results depending on floating-point rounding accidents.)

print(*objects, sep=’ ‘, end=’\n’, file=sys.stdout)

Print objects to the stream file, separated by sep and followed by end. sep, end and file, if present, must be given as keyword arguments.

All non-keyword arguments are converted to strings like str() does and written to the stream, separated by sep and followed by end. Both sep and end must be strings; they can also be None, which means to use the default values. If no objects are given, print() will just write end.

The file argument must be an object with a write(string) method; if it is not present or None, sys.stdout will be used. Output buffering is determined by file. Use file.flush() to ensure, for instance, immediate appearance on a screen.

Note

This function is not normally available as a built-in since the name print is recognized as the print statement. To disable the statement and use the print() function, use this future statement at the top of your module:

from __future__ import print_function

New in version 2.6.

class property([fget[, fset[, fdel[, doc]]]])

Return a property attribute for new-style classes (classes that derive from object).

fget is a function for getting an attribute value. fset is a function for setting an attribute value. fdel is a function for deleting an attribute value. And doccreates a docstring for the attribute.

A typical use is to define a managed attribute x:

class C(object):
    def __init__(self):
        self._x = None

    def getx(self):
        return self._x

    def setx(self, value):
        self._x = value

    def delx(self):
        del self._x

    x = property(getx, setx, delx, "I'm the 'x' property.")

If c is an instance of C, c.x will invoke the getter, c.x = value will invoke the setter and del c.x the deleter.

If given, doc will be the docstring of the property attribute. Otherwise, the property will copy fget’s docstring (if it exists). This makes it possible to create read-only properties easily using property() as a decorator:

class Parrot(object):
    def __init__(self):
        self._voltage = 100000

    @property
    def voltage(self):
        """Get the current voltage."""
        return self._voltage

The @property decorator turns the voltage() method into a “getter” for a read-only attribute with the same name, and it sets the docstring for voltageto “Get the current voltage.”

A property object has getter, setter, and deleter methods usable as decorators that create a copy of the property with the corresponding accessor function set to the decorated function. This is best explained with an example:

class C(object):
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """I'm the 'x' property."""
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

This code is exactly equivalent to the first example. Be sure to give the additional functions the same name as the original property (x in this case.)

The returned property object also has the attributes fget, fset, and fdel corresponding to the constructor arguments.

New in version 2.2.

Changed in version 2.5: Use fget’s docstring if no doc given.

Changed in version 2.6: The getter, setter, and deleter attributes were added.

range(stop)

range(start, stop[, step])

This is a versatile function to create lists containing arithmetic progressions. It is most often used in for loops. The arguments must be plain integers. If the step argument is omitted, it defaults to 1. If the start argument is omitted, it defaults to 0. The full form returns a list of plain integers [start,start + step, start + 2 * step, ...]. If step is positive, the last element is the largest start + i * step less than stop; if step is negative, the last element is the smallest start + i * step greater than stop. step must not be zero (or else ValueError is raised). Example:

>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> range(1, 11)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> range(0, 30, 5)
[0, 5, 10, 15, 20, 25]
>>> range(0, 10, 3)
[0, 3, 6, 9]
>>> range(0, -10, -1)
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]
>>> range(0)
[]
>>> range(1, 0)
[]

raw_input([prompt])

If the prompt argument is present, it is written to standard output without a trailing newline. The function then reads a line from input, converts it to a string (stripping a trailing newline), and returns that. When EOF is read, EOFError is raised. Example:

>>> s = raw_input('--> ')
--> Monty Python's Flying Circus
>>> s
"Monty Python's Flying Circus"

If the readline module was loaded, then raw_input() will use it to provide elaborate line editing and history features.

reduce(function, iterable[, initializer])

Apply function of two arguments cumulatively to the items of iterable, from left to right, so as to reduce the iterable to a single value. For example,reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). The left argument, x, is the accumulated value and the right argument, y, is the update value from the iterable. If the optional initializer is present, it is placed before the items of the iterable in the calculation, and serves as a default when the iterable is empty. If initializer is not given and iterable contains only one item, the first item is returned. Roughly equivalent to:

def reduce(function, iterable, initializer=None):
    it = iter(iterable)
    if initializer is None:
        try:
            initializer = next(it)
        except StopIteration:
            raise TypeError('reduce() of empty sequence with no initial value')
    accum_value = initializer
    for x in it:
        accum_value = function(accum_value, x)
    return accum_value

reload(module)

Reload a previously imported module. The argument must be a module object, so it must have been successfully imported before. This is useful if you have edited the module source file using an external editor and want to try out the new version without leaving the Python interpreter. The return value is the module object (the same as the module argument).

When reload(module) is executed:

Python modules’ code is recompiled and the module-level code reexecuted, defining a new set of objects which are bound to names in the module’s dictionary. The init function of extension modules is not called a second time.
As with all other objects in Python the old objects are only reclaimed after their reference counts drop to zero.
The names in the module namespace are updated to point to any new or changed objects.
Other references to the old objects (such as names external to the module) are not rebound to refer to the new objects and must be updated in each namespace where they occur if that is desired.

There are a number of other caveats:

When a module is reloaded, its dictionary (containing the module’s global variables) is retained. Redefinitions of names will override the old definitions, so this is generally not a problem. If the new version of a module does not define a name that was defined by the old version, the old definition remains. This feature can be used to the module’s advantage if it maintains a global table or cache of objects — with a try statement it can test for the table’s presence and skip its initialization if desired:

try:
    cache
except NameError:
    cache = {}

It is generally not very useful to reload built-in or dynamically loaded modules. Reloading sys, __main__, builtins and other key modules is not recommended. In many cases extension modules are not designed to be initialized more than once, and may fail in arbitrary ways when reloaded.

If a module imports objects from another module using from … import …, calling reload() for the other module does not redefine the objects imported from it — one way around this is to re-execute the from statement, another is to use import and qualified names (module.*name*) instead.

If a module instantiates instances of a class, reloading the module that defines the class does not affect the method definitions of the instances — they continue to use the old class definition. The same is true for derived classes.

repr(object): Return a string containing a printable representation of an object. This is the same value yielded by conversions (reverse quotes). It is sometimes useful to be able to access this operation as an ordinary function. For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval(), otherwise the representation is a string enclosed in angle brackets that contains the name of the type of the object together with additional information often including the name and address of the object. A class can control what this function returns for its instances by defining a __repr__() method.

reversed(seq): Return a reverse iterator. seq must be an object which has a __reversed__() method or supports the sequence protocol (the __len__() method and the __getitem__() method with integer arguments starting at 0).

New in version 2.4.

Changed in version 2.6: Added the possibility to write a custom __reversed__() method.

round(number[, ndigits]): Return the floating point value number rounded to ndigits digits after the decimal point. If ndigits is omitted, it defaults to zero. The result is a floating point number. Values are rounded to the closest multiple of 10 to the power minus ndigits; if two multiples are equally close, rounding is done away from 0 (so, for example, round(0.5) is 1.0 and round(-0.5) is -1.0).

Note

The behavior of round() for floats can be surprising: for example, round(2.675, 2) gives 2.67 instead of the expected 2.68. This is not a bug: it’s a result of the fact that most decimal fractions can’t be represented exactly as a float. See Floating Point Arithmetic: Issues and Limitations for more information.

class set([iterable])

Return a new set object, optionally with elements taken from iterable. set is a built-in class. See set and Set Types — set, frozenset for documentation about this class.

For other containers see the built-in frozenset, list, tuple, and dict classes, as well as the collections module.

New in version 2.4.

setattr(object, name, value): This is the counterpart of getattr(). The arguments are an object, a string and an arbitrary value. The string may name an existing attribute or a new attribute. The function assigns the value to the attribute, provided the object allows it. For example, setattr(x, 'foobar', 123) is equivalent to x.foobar= 123.

class slice(stop)
class slice(start, stop[, step]): Return a slice object representing the set of indices specified by range(start, stop, step). The start and step arguments default to None. Slice objects have read-only data attributes start, stop and step which merely return the argument values (or their default). They have no other explicit functionality; however they are used by Numerical Python and other third party extensions. Slice objects are also generated when extended indexing syntax is used. For example: a[start:stop:step] or a[start:stop, i]. See itertools.islice() for an alternate version that returns an iterator.

sorted(iterable[, cmp[, key[, reverse]]])

Return a new sorted list from the items in iterable.

The optional arguments cmp, key, and reverse have the same meaning as those for the list.sort() method (described in section Mutable Sequence Types).

cmp specifies a custom comparison function of two arguments (iterable elements) which should return a negative, zero or positive number depending on whether the first argument is considered smaller than, equal to, or larger than the second argument: cmp=lambda x,y: cmp(x.lower(),y.lower()). The default value is None.

key specifies a function of one argument that is used to extract a comparison key from each list element: key=str.lower. The default value is None(compare the elements directly).

reverse is a boolean value. If set to True, then the list elements are sorted as if each comparison were reversed.

In general, the key and reverse conversion processes are much faster than specifying an equivalent cmp function. This is because cmp is called multiple times for each list element while key and reverse touch each element only once. Use functools.cmp_to_key() to convert an old-style cmpfunction to a key function.

The built-in sorted() function is guaranteed to be stable. A sort is stable if it guarantees not to change the relative order of elements that compare equal — this is helpful for sorting in multiple passes (for example, sort by department, then by salary grade).

For sorting examples and a brief sorting tutorial, see Sorting HOW TO.

New in version 2.4.

staticmethod(function)

Return a static method for function.

A static method does not receive an implicit first argument. To declare a static method, use this idiom:

class C(object):
    @staticmethod
    def f(arg1, arg2, ...):
        ...

The @staticmethod form is a function decorator – see the description of function definitions in Function definitions for details.

It can be called either on the class (such as C.f()) or on an instance (such as C().f()). The instance is ignored except for its class.

Static methods in Python are similar to those found in Java or C++. Also see classmethod() for a variant that is useful for creating alternate class constructors.

For more information on static methods, consult the documentation on the standard type hierarchy in The standard type hierarchy.

New in version 2.2.

Changed in version 2.4: Function decorator syntax added.

class str(object=”)

Return a string containing a nicely printable representation of an object. For strings, this returns the string itself. The difference with repr(object) is that str(object) does not always attempt to return a string that is acceptable to eval(); its goal is to return a printable string. If no argument is given, returns the empty string, ''.

For more information on strings see Sequence Types — str, unicode, list, tuple, bytearray, buffer, xrange which describes sequence functionality (strings are sequences), and also the string-specific methods described in the String Methods section. To output formatted strings use template strings or the % operator described in the String Formatting Operations section. In addition see the String Services section. See also unicode().

sum(iterable[, start])

Sums start and the items of an iterable from left to right and returns the total. start defaults to 0. The iterable’s items are normally numbers, and the start value is not allowed to be a string.

For some use cases, there are good alternatives to sum(). The preferred, fast way to concatenate a sequence of strings is by calling''.join(sequence). To add floating point values with extended precision, see math.fsum(). To concatenate a series of iterables, consider usingitertools.chain().

New in version 2.3.

super(type[, object-or-type])

Return a proxy object that delegates method calls to a parent or sibling class of type. This is useful for accessing inherited methods that have been overridden in a class. The search order is same as that used by getattr() except that the type itself is skipped.

The __mro__ attribute of the type lists the method resolution search order used by both getattr() and super(). The attribute is dynamic and can change whenever the inheritance hierarchy is updated.

If the second argument is omitted, the super object returned is unbound. If the second argument is an object, isinstance(obj, type) must be true. If the second argument is a type, issubclass(type2, type) must be true (this is useful for classmethods).

Note

super() only works for new-style classes.

There are two typical use cases for super. In a class hierarchy with single inheritance, super can be used to refer to parent classes without naming them explicitly, thus making the code more maintainable. This use closely parallels the use of super in other programming languages.

The second use case is to support cooperative multiple inheritance in a dynamic execution environment. This use case is unique to Python and is not found in statically compiled languages or languages that only support single inheritance. This makes it possible to implement “diamond diagrams” where multiple base classes implement the same method. Good design dictates that this method have the same calling signature in every case (because the order of calls is determined at runtime, because that order adapts to changes in the class hierarchy, and because that order can include sibling classes that are unknown prior to runtime).

For both use cases, a typical superclass call looks like this:

class C(B):
    def method(self, arg):
        super(C, self).method(arg)

Note that super() is implemented as part of the binding process for explicit dotted attribute lookups such as super().__getitem__(name). It does so by implementing its own __getattribute__() method for searching classes in a predictable order that supports cooperative multiple inheritance. Accordingly, super() is undefined for implicit lookups using statements or operators such as super()[name].

Also note that super() is not limited to use inside methods. The two argument form specifies the arguments exactly and makes the appropriate references.

For practical suggestions on how to design cooperative classes using super(), see guide to using super().

New in version 2.2.

tuple([iterable])

Return a tuple whose items are the same and in the same order as iterable’s items. iterable may be a sequence, a container that supports iteration, or an iterator object. If iterable is already a tuple, it is returned unchanged. For instance, tuple('abc') returns ('a', 'b', 'c') and tuple([1, 2, 3])returns (1, 2, 3). If no argument is given, returns a new empty tuple, ().

tuple is an immutable sequence type, as documented in Sequence Types — str, unicode, list, tuple, bytearray, buffer, xrange. For other containers see the built in dict, list, and set classes, and the collections module.

class type(object)

class type(name, bases, dict)

With one argument, return the type of an object. The return value is a type object. The isinstance() built-in function is recommended for testing the type of an object.

With three arguments, return a new type object. This is essentially a dynamic form of the class statement. The name string is the class name and becomes the __name__ attribute; the bases tuple itemizes the base classes and becomes the __bases__ attribute; and the dict dictionary is the namespace containing definitions for class body and becomes the __dict__ attribute. For example, the following two statements create identical typeobjects:

>>> class X(object):
...     a = 1
...
>>> X = type('X', (object,), dict(a=1))

New in version 2.2.

unichr(i): Return the Unicode string of one character whose Unicode code is the integer i. For example, unichr(97) returns the string u'a'. This is the inverse of ord() for Unicode strings. The valid range for the argument depends how Python was configured – it may be either UCS2 [0..0xFFFF] or UCS4 [0..0x10FFFF]. ValueError is raised otherwise. For ASCII and 8-bit strings see chr().

New in version 2.0.

unicode(object=”)

unicode(object[, encoding[, errors]])

Return the Unicode string version of object using one of the following modes:

If encoding and/or errors are given, unicode() will decode the object which can either be an 8-bit string or a character buffer using the codec forencoding. The encoding parameter is a string giving the name of an encoding; if the encoding is not known, LookupError is raised. Error handling is done according to errors; this specifies the treatment of characters which are invalid in the input encoding. If errors is 'strict' (the default), aValueError is raised on errors, while a value of 'ignore' causes errors to be silently ignored, and a value of 'replace' causes the official Unicode replacement character, U+FFFD, to be used to replace input characters which cannot be decoded. See also the codecs module.

If no optional parameters are given, unicode() will mimic the behaviour of str() except that it returns Unicode strings instead of 8-bit strings. More precisely, if object is a Unicode string or subclass it will return that Unicode string without any additional decoding applied.

For objects which provide a __unicode__() method, it will call this method without arguments to create a Unicode string. For all other objects, the 8-bit string version or representation is requested and then converted to a Unicode string using the codec for the default encoding in 'strict' mode.

For more information on Unicode strings see Sequence Types — str, unicode, list, tuple, bytearray, buffer, xrange which describes sequence functionality (Unicode strings are sequences), and also the string-specific methods described in the String Methods section. To output formatted strings use template strings or the % operator described in the String Formatting Operations section. In addition see the String Services section. See also str().

New in version 2.0.

Changed in version 2.2: Support for __unicode__() added.

vars([object])

Return the __dict__ attribute for a module, class, instance, or any other object with a __dict__ attribute.

Objects such as modules and instances have an updateable __dict__ attribute; however, other objects may have write restrictions on their __dict__attributes (for example, new-style classes use a dictproxy to prevent direct dictionary updates).

Without an argument, vars() acts like locals(). Note, the locals dictionary is only useful for reads since updates to the locals dictionary are ignored.

xrange(stop)
xrange(start, stop[, step]): This function is very similar to range(), but returns an xrange object instead of a list. This is an opaque sequence type which yields the same values as the corresponding list, without actually storing them all simultaneously. The advantage of xrange() over range() is minimal (since xrange() still has to create the values when asked for them) except when a very large range is used on a memory-starved machine or when all of the range’s elements are never used (such as when the loop is usually terminated with break). For more information on xrange objects, see XRange Type and Sequence Types — str, unicode, list, tuple, bytearray, buffer, xrange.

CPython implementation detail: xrange() is intended to be simple and fast. Implementations may impose restrictions to achieve this. The C implementation of Python restricts all arguments to native C longs (“short” Python integers), and also requires that the number of elements fit in a native C long. If a larger range is needed, an alternate version can be crafted using the itertools module: islice(count(start, step), (stop-start+step-1+2*(step<0))//step).

zip([iterable, …])

This function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The returned list is truncated in length to the length of the shortest argument sequence. When there are multiple arguments which are all of the same length, zip()is similar to map() with an initial argument of None. With a single sequence argument, it returns a list of 1-tuples. With no arguments, it returns an empty list.

The left-to-right evaluation order of the iterables is guaranteed. This makes possible an idiom for clustering a data series into n-length groups using zip(*[iter(s)]*n).

zip() in conjunction with the * operator can be used to unzip a list:

>>> x = [1, 2, 3]
>>> y = [4, 5, 6]
>>> zipped = zip(x, y)
>>> zipped
[(1, 4), (2, 5), (3, 6)]
>>> x2, y2 = zip(*zipped)
>>> x == list(x2) and y == list(y2)
True

New in version 2.0.

Changed in version 2.4: Formerly, zip() required at least one argument and zip() raised a TypeError instead of returning an empty list.

__import__(name[, globals[, locals[, fromlist[, level]]]])

Note

This is an advanced function that is not needed in everyday Python programming, unlike importlib.import_module().

This function is invoked by the import statement. It can be replaced (by importing the __builtin__ module and assigning to __builtin__.__import__) in order to change semantics of the import statement, but nowadays it is usually simpler to use import hooks (see PEP 302). Direct use of __import__()is rare, except in cases where you want to import a module whose name is only known at runtime.

The function imports the module name, potentially using the given globals and locals to determine how to interpret the name in a package context. The fromlist gives the names of objects or submodules that should be imported from the module given by name. The standard implementation does not use its locals argument at all, and uses its globals only to determine the package context of the import statement.

level specifies whether to use absolute or relative imports. The default is -1 which indicates both absolute and relative imports will be attempted. 0means only perform absolute imports. Positive values for level indicate the number of parent directories to search relative to the directory of the module calling __import__().

When the name variable is of the form package.module, normally, the top-level package (the name up till the first dot) is returned, not the module named by name. However, when a non-empty fromlist argument is given, the module named by name is returned.

For example, the statement import spam results in bytecode resembling the following code:

spam = __import__('spam', globals(), locals(), [], -1)

The statement import spam.ham results in this call:

spam = __import__('spam.ham', globals(), locals(), [], -1)

Note how __import__() returns the toplevel module here because this is the object that is bound to a name by the import statement.

On the other hand, the statement from spam.ham import eggs, sausage as saus results in

_temp = __import__('spam.ham', globals(), locals(), ['eggs', 'sausage'], -1)
eggs = _temp.eggs
saus = _temp.sausage

Here, the spam.ham module is returned from __import__(). From this object, the names to import are retrieved and assigned to their respective names.

If you simply want to import a module (potentially within a package) by name, use importlib.import_module().

Changed in version 2.5: The level parameter was added.

Changed in version 2.5: Keyword support for parameters was added.

3. Non-essential Built-in Functions

There are several built-in functions that are no longer essential to learn, know or use in modern Python programming. They have been kept here to maintain backwards compatibility with programs written for older versions of Python.

Python programmers, trainers, students and book writers should feel free to bypass these functions without concerns about missing something important.

apply(function, args[, keywords]): The function argument must be a callable object (a user-defined or built-in function or method, or a class object) and the args argument must be a sequence. The function is called with args as the argument list; the number of arguments is the length of the tuple. If the optional keywordsargument is present, it must be a dictionary whose keys are strings. It specifies keyword arguments to be added to the end of the argument list. Calling apply() is different from just calling function(args), since in that case there is always exactly one argument. The use of apply() is equivalent tofunction(*args, **keywords).

Deprecated since version 2.3: Use function(*args, **keywords) instead of apply(function, args, keywords) (see Unpacking Argument Lists).

buffer(object[, offset[, size]]): The object argument must be an object that supports the buffer call interface (such as strings, arrays, and buffers). A new buffer object will be created which references the object argument. The buffer object will be a slice from the beginning of object (or from the specified offset). The slice will extend to the end of object (or will have a length given by the size argument).

coerce(x, y): Return a tuple consisting of the two numeric arguments converted to a common type, using the same rules as used by arithmetic operations. If coercion is not possible, raise TypeError.

intern(string): Enter string in the table of “interned” strings and return the interned string – which is string itself or a copy. Interning strings is useful to gain a little performance on dictionary lookup – if the keys in a dictionary are interned, and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare. Normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys.

Jupyter Notebook tips, tricks, and shortcuts

Jupyter Notebook

Jupyter notebook, formerly known as the IPython notebook, is a flexible tool that helps you create readable analyses, as you can keep code, images, comments, formulae and plots together.

Jupyter is quite extensible, supports many programming languages and is easily hosted on your computer or on almost any server — you only need to have ssh or http access. Best of all, it’s completely free.

The Jupyter interface.

Project Jupyter was born out of the IPython project as the project evolved to become a notebook that could support multiple languages – hence its historical name as the IPython notebook. The name Jupyter is an indirect acronyum of the three core languages it was designed for: Julia, Python, and R and is inspired by the planet Jupiter.

When working with Python in Jupyter, the IPython kernel is used, which gives us some handy access to IPython features from within our Jupyter notebooks (more on that later!)

We’re going to show you 28 tips and tricks to make your life working with Jupyter easier.

1. Keyboard Shortcuts

As any power user knows, keyboard shortcuts will save you lots of time. Jupyter stores a list of keybord shortcuts under the menu at the top: Help > Keyboard Shortcuts, or by pressing H in command mode (more on that later). It’s worth checking this each time you update Jupyter, as more shortcuts are added all the time.

Another way to access keyboard shortcuts, and a handy way to learn them is to use the command palette: Cmd + Shift + P (or Ctrl + Shift + P on Linux and Windows). This dialog box helps you run any command by name – useful if you don’t know the keyboard shortcut for an action or if what you want to do does not have a keyboard shortcut. The functionality is similar to Spotlight search on a Mac, and once you start using it you’ll wonder how you lived without it!

The command palette.

Some of my favorites:

Esc will take you into command mode where you can navigate around your notebook with arrow keys.
While in command mode:
- A to insert a new cell above the current cell, B to insert a new cell below.
- M to change the current cell to Markdown, Y to change it back to code
- D + D (press the key twice) to delete the current cell
Enter will take you from command mode back into edit mode for the given cell.
Shift + Tab will show you the Docstring (documentation) for the the object you have just typed in a code cell – you can keep pressing this short cut to cycle through a few modes of documentation.
Ctrl + Shift + - will split the current cell into two from where your cursor is.
Esc + F Find and replace on your code but not the outputs.
Esc + O Toggle cell output (fold and unfold).
Select Multiple Cells:
- Shift + J or Shift + Down selects the next sell in a downwards direction. You can also select sells in an upwards direction by using Shift + K or Shift + Up.
- Once cells are selected, you can then delete / copy / cut / paste / run them as a batch. This is helpful when you need to move parts of a notebook.
- You can also use Shift + M to merge multiple cells.

Merging multiple cells.

2. Pretty Display of Variables

The first part of this is pretty widely known. By finishing a Jupyter cell with the name of a variable or unassigned output of a statement, Jupyter will display that variable without the need for a print statement. This is especially useful when dealing with Pandas DataFrames, as the output is neatly formatted into a table.

What is known less, is that you can alter a modify the ast_note_interactivity kernel option to make jupyter do this for any variable or statement on it’s own line, so you can see the value of multiple statements at once.

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

from pydataset import data
quakes = data('quakes')
quakes.head()
quakes.tail()

	lat	long	depth	mag	stations
1	-20.42	181.62	562	4.8	41
2	-20.62	181.03	650	4.2	15
3	-26.00	184.10	42	5.4	43
4	-17.97	181.66	626	4.1	19
5	-20.42	181.96	649	4.0	11

	lat	long	depth	mag	stations
996	-25.93	179.54	470	4.4	22
997	-12.28	167.06	248	4.7	35
998	-20.13	184.20	244	4.5	34
999	-17.40	187.80	40	4.5	14
1000	-21.59	170.56	165	6.0	119

If you want to set this behaviour for all instances of Jupyter (Notebook and Console), simply create a file ~/.ipython/profile_default/ipython_config.py with the lines below.

c = get_config()

# Run all nodes interactively
c.InteractiveShell.ast_node_interactivity = "all"

3. Easy links to documentation

Inside the Help menu you’ll find handy links to the online documentation for common libraries including NumPy, Pandas, SciPy and Matplotlib.

Don’t forget also that by prepending a library, method or variable with ?, you can access the Docstring for quick reference on syntax.

?str.replace()

Docstring:
S.replace(old, new[, count]) -> str

Return a copy of S with all occurrences of substring
old replaced by new.  If the optional argument count is
given, only the first count occurrences are replaced.
Type:      method_descriptor

4. Plotting in notebooks

There are many options for generating plots in your notebooks.

matplotlib (the de-facto standard), activated with %matplotlib inline – Here’s a Dataquest Matplotlib Tutorial.
%matplotlib notebook provides interactivity but can be a little slow, since rendering is done server-side.
Seaborn is built over Matplotlib and makes building more attractive plots easier. Just by importing Seaborn, your matplotlib plots are made ‘prettier’ without any code modification.
mpld3 provides alternative renderer (using d3) for matplotlib code. Quite nice, though incomplete.
bokeh is a better option for building interactive plots.
plot.ly can generate nice plots – this used to be a paid service only but was recently open sourced.
Altair is a relatively new declarative visualization library for Python. It’s easy to use and makes great looking plots, however the ability to customize those plots is not nearly as powerful as in Matplotlib.

The Jupyter interface.

5. IPython Magic Commands

The %matplotlib inline you saw above was an example of a IPython Magic command. Being based on the IPython kernel, Jupyter has access to all the Magics from the IPython kernel, and they can make your life a lot easier!

# This will list all magic commands
%lsmagic

Available line magics:
%alias  %alias_magic  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %popd  %pprint  %precision  %profile  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3  %%ruby  %%script  %%sh  %%svg  %%sx  %%system  %%time  %%timeit  %%writefile

Automagic is ON, % prefix IS NOT needed for line magics.

I recommend browsing the documentation for all IPython Magic commands as you’ll no doubt find some that work for you. A few of my favorites are below:

6. IPython Magic – %env: Set Environment Variables

You can manage environment variables of your notebook without restarting the jupyter server process. Some libraries (like theano) use environment variables to control behavior, %env is the most convenient way.

# Running %env without any arguments
# lists all environment variables

# The line below sets the environment
# variable OMP_NUM_THREADS
%env OMP_NUM_THREADS=4

env: OMP_NUM_THREADS=4

7. IPython Magic – %run: Execute python code

%run can execute python code from .py files – this is well-documented behavior. Lesser known is the fact that it can also execute other jupyter notebooks, which can quite useful.

Note that using %run is not the same as importing a python module.

# this will execute and show the output from
# all code cells of the specified notebook
%run ./two-histograms.ipynb

8. IPython Magic – %load: Insert the code from an external script

This will replace the contents of the cell with an external script. You can either use a file on your computer as a source, or alternatively a URL.

# Before Running
%load ./hello_world.py

# After Running
# %load ./hello_world.py
if __name__ == "__main__":
	print("Hello World!")

Hello World!

9. IPython Magic – %store: Pass variables between notebooks.

The %store command lets you pass variables between two different notebooks.

data = 'this is the string I want to pass to different notebook'
%store data
del data # This has deleted the variable

Stored 'data' (str)

Now, in a new notebook…

%store -r data
print(data)

this is the string I want to pass to different notebook

10. IPython Magic – %who: List all variables of global scope.

The %who command without any arguments will list all variables that existing in the global scope. Passing a parameter like str will list only variables of that type.

one = "for the money"
two = "for the show"
three = "to get ready now go cat go" 
%who str

one	 three	 two

11. IPython Magic – Timing

There are two IPython Magic commands that are useful for timing – %%time and %timeit. These are especially handy when you have some slow code and you’re trying to indentify where the issue is.

%%time will give you information about a single run of the code in your cell.

%%time
import time
for _ in range(1000):
    time.sleep(0.01)# sleep for 0.01 seconds

CPU times: user 21.5 ms, sys: 14.8 ms, total: 36.3 ms
Wall time: 11.6 s

%%timeit uses the Python timeit module which runs a statement 100,000 times (by default) and then provides the mean of the fastest three times.

import numpy
%timeit numpy.random.normal(size=100)

The slowest run took 7.29 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.5 µs per loop

12. IPython Magic – %%writefile and %pycat: Export the contents of a cell/Show the contents of an external script

Using the %%writefile magic saves the contents of that cell to an external file. %pycatdoes the opposite, and shows you (in a popup) the syntax highlighted contents of an external file.

%%writefile pythoncode.py

import numpy
def append_if_not_exists(arr, x):
    if x not in arr:
        arr.append(x)

def some_useless_slow_function():
    arr = list()
    for i in range(10000):
        x = numpy.random.randint(0, 10000)
        append_if_not_exists(arr, x)

Writing pythoncode.py

%pycat pythoncode.py

python
import numpy
def append_if_not_exists(arr, x):
    if x not in arr:
        arr.append(x)

def some_useless_slow_function():
    arr = list()
    for i in range(10000):
        x = numpy.random.randint(0, 10000)
        append_if_not_exists(arr, x)

            



### 13. IPython Magic - %prun: Show how much time your program spent in each function.

Using %prun statement_name will give you an ordered table showing you the number of times each internal function was called within the statement, the time each call took as well as the cumulative time of all runs of the function.


python
%prun some_useless_slow_function()

26324 function calls in 0.556 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 10000    0.527    0.000    0.528    0.000 <ipython-input-46-b52343f1a2d5>:2(append_if_not_exists)
 10000    0.022    0.000    0.022    0.000 {method 'randint' of 'mtrand.RandomState' objects}
     1    0.006    0.006    0.556    0.556 <ipython-input-46-b52343f1a2d5>:6(some_useless_slow_function)
  6320    0.001    0.000    0.001    0.000 {method 'append' of 'list' objects}
     1    0.000    0.000    0.556    0.556 <string>:1(<module>)
     1    0.000    0.000    0.556    0.556 {built-in method exec}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

14. IPython Magic – Debugging with %pdb

Jupyter has own interface for The Python Debugger (pdb). This makes it possible to go inside the function and investigate what happens there.

You can view a list of accepted commands for pdb here.

%pdb

def pick_and_take():
    picked = numpy.random.randint(0, 1000)
    raise NotImplementedError()

pick_and_take()

Automatic pdb calling has been turned ON

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-24-0f6b26649b2e> in <module>()
      5     raise NotImplementedError()
      6 
----> 7 pick_and_take()

<ipython-input-24-0f6b26649b2e> in pick_and_take()
      3 def pick_and_take():
      4     picked = numpy.random.randint(0, 1000)
----> 5     raise NotImplementedError()
      6 
      7 pick_and_take()

NotImplementedError:

> <ipython-input-24-0f6b26649b2e>(5)pick_and_take()
      3 def pick_and_take():
      4     picked = numpy.random.randint(0, 1000)
----> 5     raise NotImplementedError()
      6 
      7 pick_and_take()

ipdb>

15. IPython Magic – High-resolution plot outputs for Retina notebooks

One line of IPython magic will give you double resolution plot output for Retina screens, such as the more recent Macbooks. Note: the example below won’t render on non-retina screens

x = range(1000)
y = [i ** 2 for i in x]
plt.plot(x,y)
plt.show();

%config InlineBackend.figure_format = 'retina'
plt.plot(x,y)
plt.show();

16. Suppress the output of a final function.

Sometimes it’s handy to suppress the output of the function on a final line, for instance when plotting. To do this, you just add a semicolon at the end.

%matplotlib inline
from matplotlib import pyplot as plt
import numpy
x = numpy.linspace(0, 1, 1000)**1.5

# Here you get the output of the function
plt.hist(x)

(array([ 216.,  126.,  106.,   95.,   87.,   81.,   77.,   73.,   71.,   68.]),
 array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9,  1. ]),
 <a list of 10 Patch objects>)

# By adding a semicolon at the end, the output is suppressed.
plt.hist(x);

17. Executing Shell Commands

It’s easy to execute a shell command from inside your notebook. You can use this to check what datasets are in available in your working folder:

!ls *.csv

nba_2016.csv             titanic.csv
pixar_movies.csv         whitehouse_employees.csv

Or to check and manage packages.

!pip install numpy
!pip list | grep pandas

Requirement already satisfied (use --upgrade to upgrade): numpy in /Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages
pandas (0.18.1)

18. Using LaTeX for forumlas

When you write LaTeX in a Markdown cell, it will be rendered as a formula using MathJax.

This:

\\( P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)} \\)

Becomes this:

$P (A ∣ B) = \frac{P (B ∣ A), P (A)}{P (B)}$

Markdown is an important part of notebooks, so don’t forget to use its expressiveness!

19. Run code from a different kernel in a notebook

If you want to, you can combine code from multiple kernels into one notebook.

Just use IPython Magics with the name of your kernel at the start of each cell that you want to use that Kernel for:

%%bash
%%HTML
%%python2
%%python3
%%ruby
%%perl

%%bash
for i in {1..5}
do
   echo "i is $i"
done

i is 1
i is 2
i is 3
i is 4
i is 5

20. Install other kernels for Jupyter

One of the nice features about Jupyter is ability to run kernels for different languages. As an example, here is how to get and R kernel running.

Easy Option: Installing the R Kernel Using Anaconda

If you used Anaconda to set up your environment, getting R working is extremely easy. Just run the below in your terminal:

conda install -c r r-essentials

Less Easy Option: Installing the R Kernel Manually

If you are not using Anaconda, the process is a little more complex. Firstly, you’ll need to install R from CRAN if you haven’t already.

Once that’s done, fire up an R console and run the following:

install.packages(c('repr', 'IRdisplay', 'crayon', 'pbdZMQ', 'devtools'))
devtools::install_github('IRkernel/IRkernel')
IRkernel::installspec()  # to register the kernel in the current R installation

21. Running R and Python in the same notebook.

The best solution to this is to install rpy2 (requires a working version of R as well), which can be easily done with pip:

pip install rpy2

You can then use the two languages together, and even pass variables inbetween:

%load_ext rpy2.ipython

%R require(ggplot2)

array([1], dtype=int32)

import pandas as pd
df = pd.DataFrame({
        'Letter': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
        'X': [4, 3, 5, 2, 1, 7, 7, 5, 9],
        'Y': [0, 4, 3, 6, 7, 10, 11, 9, 13],
        'Z': [1, 2, 3, 1, 2, 3, 1, 2, 3]
    })

%%R -i df
ggplot(data = df) + geom_point(aes(x = X, y= Y, color = Letter, size = Z))

Example courtesy Revolutions Blog

22. Writing functions in other languages

Sometimes the speed of numpy is not enough and I need to write some fast code.
In principle, you can compile function in the dynamic library and write python wrappers…

But it is much better when this boring part is done for you, right?

You can write functions in cython or fortran and use those directly from python code.

First you’ll need to install:

!pip install cython fortran-magic

%load_ext Cython

%%cython
def myltiply_by_2(float x):
    return 2.0 * x

myltiply_by_2(23.)

Personally I prefer to use fortran, which I found very convenient for writing number-crunching functions. More details of usage can be found here.

%load_ext fortranmagic

%%fortran
subroutine compute_fortran(x, y, z)
    real, intent(in) :: x(:), y(:)
    real, intent(out) :: z(size(x, 1))

    z = sin(x + y)

end subroutine compute_fortran

compute_fortran([1, 2, 3], [4, 5, 6])

There are also different jitter systems which can speed up your python code. More examples can be found here.

23. Multicursor support

Jupyter supports mutiple cursors, similar to Sublime Text. Simply click and drag your mouse while holding down Alt.

Multicursor support.

24. Jupyter-contrib extensions

Jupyter-contrib extensions is a family of extensions which give Jupyter a lot more functionality, including e.g. jupyter spell-checker and code-formatter.

The following commands will install the extensions, as well as a menu based configurator that will help you browse and enable the extensions from the main Jupyter notebook screen.

!pip install https://github.com/ipython-contrib/jupyter_contrib_nbextensions/tarball/master
!pip install jupyter_nbextensions_configurator
!jupyter contrib nbextension install --user
!jupyter nbextensions_configurator enable --user

The nbextension configurator.

25. Create a presentation from a Jupyter notebook.

Damian Avila’s RISE allows you to create a powerpoint style presentation from an existing notebook.

You can install RISE using conda:

conda install -c damianavila82 rise

Or alternatively pip:

pip install RISE

And then run the following code to install and enable the extension:

jupyter-nbextension install rise --py --sys-prefix
jupyter-nbextension enable rise --py --sys-prefix

26. The Jupyter output system

Notebooks are displayed as HTML and the cell output can be HTML, so you can return virtually anything: video/audio/images.

In this example I scan the folder with images in my repository and show thumbnails of the first 5:

import os
from IPython.display import display, Image
names = [f for f in os.listdir('../images/ml_demonstrations/') if f.endswith('.png')]
for name in names[:5]:
    display(Image('../images/ml_demonstrations/' + name, width=100))

We can create the same list with a bash command, because magics and bash calls return python variables:

names = !ls ../images/ml_demonstrations/*.png
names[:5]

['../images/ml_demonstrations/colah_embeddings.png',
 '../images/ml_demonstrations/convnetjs.png',
 '../images/ml_demonstrations/decision_tree.png',
 '../images/ml_demonstrations/decision_tree_in_course.png',
 '../images/ml_demonstrations/dream_mnist.png']

27. ‘Big data’ analysis

A number of solutions are available for querying/processing large data samples:

ipyparallel (formerly ipython cluster) is a good option for simple map-reduce operations in python. We use it in rep to train many machine learning models in parallel
pyspark
spark-sql magic %%sql

28. Sharing notebooks

The easiest way to share your notebook is simply using the notebook file (.ipynb), but for those who don’t use Jupyter, you have a few options:

Convert notebooks to html file using the File > Download as > HTML Menu option.
Share your notebook file with gists or on github, both of which render the notebooks. See this example.
- If you upload your notebook to a github repository, you can use the handy mybinder service to allow someone half an hour of interactive Jupyter access to your repository.
Setup your own system with jupyterhub, this is very handy when you organize mini-course or workshop and don’t have time to care about students machines.
Store your notebook e.g. in dropbox and put the link to nbviewer. nbviewer will render the notebook from whichever source you host it.
Use the File > Download as > PDF menu to save your notebook as a PDF. If you’re going this route, I highly recommend reading Julius Schulz’s excellent article Making publication ready Python notebooks.
Create a blog using Pelican from your Jupyter notebooks.

Original Website:

https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/

In C++, “&&” can substitute “and” ;

“||” an substitute “or”;

But in Python, there’s no “&&” and “||”.

Jupyter Quick Access

The following are the keyboard shortcuts for an IPython Notebook. Learning to use these will help speed up your interactive shell development.

Command Mode (press `Esc` to enable)

Enter : enter edit mode
Shift-Enter : run cell, select below
Ctrl-Enter : run cell
Alt-Enter : run cell, insert below
Y : to code
M : to markdown
R : to raw
1 : to heading 1
2 : to heading 2
3 : to heading 3
4 : to heading 4
5 : to heading 5
6 : to heading 6
Up : select cell above
K : select cell above
Down : select cell below
J : select cell below
A : insert cell above
B : insert cell below
X : cut selected cell
C : copy selected cell
Shift-V : paste cell above
V : paste cell below
Z : undo last cell deletion
D,D : delete selected cell
Shift-M : merge cell below
S : Save and Checkpoint
Ctrl-S : Save and Checkpoint
L : toggle line numbers
O : toggle output
Shift-O : toggle output scrolling
Esc : close pager
Q : close pager
H : show keyboard shortcut help dialog
I,I : interrupt kernel
0,0 : restart kernel
Space : scroll down
Shift-Space : scroll up
Shift : ignore

Edit Mode (press `Enter` to enable)

Tab : code completion or indent
Shift-Tab : tooltip
Ctrl-] : indent
Ctrl-[ : dedent
Ctrl-A : select all
Ctrl-Z : undo
Ctrl-Shift-Z : redo
Ctrl-Y : redo
Ctrl-Home : go to cell start
Ctrl-Up : go to cell start
Ctrl-End : go to cell end
Ctrl-Down : go to cell end
Ctrl-Left : go one word left
Ctrl-Right : go one word right
Ctrl-Backspace : delete word before
Ctrl-Delete : delete word after
Esc : command mode
Ctrl-M : command mode
Shift-Enter : run cell, select below
Ctrl-Enter : run cell
Alt-Enter : run cell, insert below
Ctrl-Shift-Subtract : split cell
Ctrl-Shift-- : split cell
Ctrl-S : Save and Checkpoint
Up : move cursor up or previous cell
Down : move cursor down or next cell
Shift : ignore

WHY CODING?

Common C++ usage

cout << “”

Output
cin >> “”

Input
cout<<“”<<endl;//

Refresh and wrap
cout<<“”<<flush;//

Refresh without wrap, continue to output in the same line

Common Problems/Warnings/Mistakes for Beginners

<iostream.h> & <iostream>

Why the following code can’t be built in VC 2017?

#include <iostream.h>

int main()
{
cout<<“Hello World.”<<endl;

return 0;
}

fatal error C1083: No such file or directory

造成这个错误的原因在于历史原因，在过去C++98标准尚未订立的时候，C++的标准输入输出流确实是定义在这个文件里面的，这是C风格的定义方法，随着C++98标准的确定，iostream.h已经被取消，至少在VC2017下面是这样的，取而代之的是我们要用头文件来代替，你甚至可以认为是这样定义的：

namespace std

{

#include “iostream.h”

}

因此我们可以简单的修改我们的Hello World.

#include <iostream>
using namespace std;

int main()
{
cout<<“Hello World.”<<endl;

return 0;
}

iostream.h是属于C++的头文件，而非C的，因此标准订立的时候被改成了。而C的头文件stdio.h等依然可以继续使用，这是为了兼容C代码。但是它们依然有对应的C++版本，如等。记住，在VC2017上面采用C++风格的头文件而不是C风格的头文件，除非你是在用C。

27/01/2018

Terminal Applications
Windows Applications
Web Applications (Database – Server – Browser)
Mobile Applications (Small Screen iOS/Android – Touch Input)
Augmented Reality / Virtual Reality / Wearable

Low Level Language
Assembly
C++ Language
Cross-Platform
Efficiency
Game Development

Higher Level Languages
.NET (C#, VB.NET)
Java
Python

Terminal Screen Applications

1- assignments
2- computation (math computations)
3- decisions
4- repetitions

Computers are good at:
carrying out repetitive tasks concistently without mistake.

Computers are bad at:
ethics/moral concepts
random numbers (pseudo random numbers)

Programming computers:
very specific
break down to simple tasks

1- Data: information stored or needed for the functionality to carry out a task. a piece of information (name, age, year, color)

2- Statements: Operation that are performed on the data.
math operations — 00000001: add
00000010: multiplication
00000011: subtraction.
00000100: division
i/o operation — input/output operation.

8 bit – 1 byte
1024 byte – 1 KB
1024 KB – 1 MB
1024 MB – 1 GB
1024 GB – 1 TB

Text Data uses a character mapping table to map characters to numbers. (ASCII: 1 byte per character, Unicode: 2 bytes per character)

Images: Every pixel has a numeric value.
(RGB)
(255,0,0) bright red
(20,0,0) bright red
(255,255,0) bright yellow

INPUT -> PROCESSING -> OUTPUT

INPUT, OUTPUT are Data
Processing: a set of statements

a set of statements — your program – C++ language

int x = 15 + 2;

a compiler — takes the program in the language in which it was written and translates to machine language (ones and zeros)

an IDE — integrated development environment.

Visual Studio 2017 Community Edition
IMPORTANT: C++ does not get installed automatically, need to go to Custom install and check C++.
Only on Windows — need to first run windows within your MAC if you have MAC.

INPUT, OUTPUT: data — variables in our C++ program.

a variable name must start with a letter or underscore character followed by a letter, underscore char, or a number.

Introduction

Pivot

Pivoting By Multiple Columns

Common Mistake in Pivoting

Pivot Table

Stack/Unstack

Section Headings

Paragraphs

Bold and Italics

Links

Bulleted Lists

Numbered Lists

Quotes

3. Non-essential Built-in Functions

Jupyter Notebook

1. Keyboard Shortcuts

2. Pretty Display of Variables

3. Easy links to documentation

4. Plotting in notebooks

5. IPython Magic Commands

6. IPython Magic – %env: Set Environment Variables

7. IPython Magic – %run: Execute python code

8. IPython Magic – %load: Insert the code from an external script

9. IPython Magic – %store: Pass variables between notebooks.

10. IPython Magic – %who: List all variables of global scope.

11. IPython Magic – Timing

12. IPython Magic – %%writefile and %pycat: Export the contents of a cell/Show the contents of an external script

14. IPython Magic – Debugging with %pdb

15. IPython Magic – High-resolution plot outputs for Retina notebooks

16. Suppress the output of a final function.

17. Executing Shell Commands

18. Using LaTeX for forumlas

19. Run code from a different kernel in a notebook

20. Install other kernels for Jupyter

Easy Option: Installing the R Kernel Using Anaconda

Less Easy Option: Installing the R Kernel Manually

21. Running R and Python in the same notebook.

22. Writing functions in other languages

23. Multicursor support

24. Jupyter-contrib extensions

25. Create a presentation from a Jupyter notebook.

26. The Jupyter output system

27. ‘Big data’ analysis

28. Sharing notebooks

Command Mode (press Esc to enable)

Edit Mode (press Enter to enable)

Command Mode (press `Esc` to enable)

Edit Mode (press `Enter` to enable)