Less Commonly Known Basic Features

Since this is an advanced course, you should be aware of these features.

int

Integers in Python are an arbitrary-precision integral data type, so results like the following are computed exactly (compare this to the fixed-width integers of C++ or Java).

# exponentiation
2 ** 1000
10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376
# left shift
1 << 1000
10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376
# right shift
a = 20 >> 2
print(f"{20=:#07b} \n {a=:#07b}\n {a=}")
20=0b10100 
 a=0b00101
 a=5
# the same operation expressed as a division
20 // (2 ** 2)
5
# bitwise operations
# 5 = 0b101
# 3 = 0b011

a,b,c = 5 | 3, 5 & 3, 5 ^ 3
f"{a=:#05b}, {b=:#05b}, {c=:#05b}"
'a=0b111, b=0b001, c=0b110'
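# example of a limited-size data type (32-bit NumPy integer)
# note: the result of large + 1 below exceeds the int32 range, so NumPy promoted it to a wider type in this run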
import numpy as np
large = np.int32((1 << 31) - 1)
large
2147483647
large + 1
2147483648
large2 = (1 << 31) - 1
large2
2147483647
large2 + 1
2147483648
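
To make the contrast with a fixed-width integer visible, here is a minimal sketch that uses the standard-library ctypes module. The choice of ctypes is an assumption made for illustration and is not part of the original notebook. ctypes integer types perform no overflow checking, so a value outside the 32-bit range wraps around.

```python
import ctypes

INT32_MAX = (1 << 31) - 1  # largest value of a signed 32-bit integer

# Python's int is arbitrary-precision: the result is simply the next integer.
python_result = INT32_MAX + 1

# ctypes.c_int32 does no overflow checking, so the stored value wraps around.
wrapped_result = ctypes.c_int32(INT32_MAX + 1).value

print(python_result)   # 2147483648
print(wrapped_result)  # -2147483648
```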

float

Double-precision floating-point data type (IEEE 754)

“double precision” does not mean “arbitrary precision”

# float literal with exponent
-1.2e-20
-1.2e-20
# this is a result of limited precision
0.1 + 0.2
0.30000000000000004
# exact result
2 ** 1000
10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376
# inexact result because of limited precision
2.0 ** 1000
1.0715086071862673e+301
# exact result
2 ** 10000
19950631168807583848837421626835850838234968318861924548520089498529438830221946631919961684036194597899331129423209124271556491349413781117593785932096323957855730046793794526765246551266059895520550086918193311542508608460618104685509074866089624888090489894838009253941633257850621568309473902556912388065225096643874441046759871626985453222868538161694315775629640762836880760732228535091641476183956381458969463899410840960536267821064621427333394036525565649530603142680234969400335934316651459297773279665775606172582031407994198179607378245683762280037302885487251900834464581454650557929601414833921615734588139257095379769119277800826957735674444123062018757836325502728323789270710373802866393031428133241401624195671690574061419654342324638801248856147305207431992259611796250130992860241708340807605932320161268492288496255841312844061536738951487114256315111089745514203313820202931640957596464756010405845841566072044962867016515061920631004186422275908670900574606417856951911456055068251250406007519842261898059237118054444788072906395242548339221982707404473162376760846613033778706039803413197133493654622700563169937455508241780972810983291314403571877524768509857276937926433221599399876886660808368837838027643282775172273657572744784112294389733810861607423253291974813120197604178281965697475898164531258434135959862784130128185406283476649088690521047580882615823961985770122407044330583075869039319604603404973156583208672105913300903752823415539745394397715257455290510212310947321610753474825740775273986348298498340756937955646638621874569499279016572103701364433135817214311791398222983845847334440270964182851005072927748364550578634501100852987812389473928699540834346158807043959118985815145779177143619698728131459483783202081474982171858011389071228250905826817436220577475921417653715687725614904582904992461028630081535583308130101987675856234343538955409175623400844887526162643568648833519463720377293240094456246923254350400678027273837755376406726898636241037
491410966718557050759098100246789880178271925953381282421954028302759408448955014676668389697996886241636313376393903373455801407636741877711055384225739499110186468219696581651485130494222369947714763069155468217682876200362777257723781365331611196811280792669481887201298643660768551639860534602297871557517947385246369446923087894265948217008051120322365496288169035739121368338393591756418733850510970271613915439590991598154654417336311656936031122249937969999226781732358023111862644575299135758175008199839236284615249881088960232244362173771618086357015468484058622329792853875623486556440536962622018963571028812361567512543338303270029097668650568557157505516727518899194129711337690149916181315171544007728650573189557450920330185304847113818315407324053319038462084036421763703911550639789000742853672196280903477974533320468368795868580237952218629120080742819551317948157624448298518461509704888027274721574688131594750409732115080498190455803416826949787141316063210686391511681774304792596709376
# this does not even work due to overflow
2.0 ** 10000
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
/tmp/ipykernel_153/1838217021.py in <module>
      1 # this does not even work due to overflow
----> 2 2.0 ** 10000

OverflowError: (34, 'Numerical result out of range')
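
Because of the limited precision shown above, comparing floats with == is fragile. The following sketch shows two common ways to deal with this using only the standard library: comparing with a tolerance via math.isclose, and switching to exact rational arithmetic via fractions.Fraction. Which one fits depends on the application.

```python
import math
from fractions import Fraction

total = 0.1 + 0.2

# Direct comparison fails because of the rounding error.
print(total == 0.3)              # False

# Compare with a tolerance instead.
print(math.isclose(total, 0.3))  # True

# Rational arithmetic is exact, at the cost of speed.
exact = Fraction(1, 10) + Fraction(2, 10)
print(exact == Fraction(3, 10))  # True
```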

NB: int division vs float division

# / always produces a float
type(5 / 5)
float
# // on ints always produces an int, rounding down (towards negative infinity) if necessary
type(5 // 5)
int
# this is how you divide ints without a detour through float
2 ** 5000 // 10
14124670321394260368352096670161473336688961751845411168136880858571181698427075125580891263167115263733560320843136608276420383806997933833597118572663992343105177785186539901187799964513170706937349821263132375255311121537284403595090053595486073341845340557556673680156558740546469964049905084969947235790090561757137661822821643421318152099155667712649865178220417406183093923917686134138329401824022583869272559614700514424328107527562949533909381319896673563360632969102384245412583588865687313398128724098000883807366822180426443291089403078902021944057819848826733976823887227990215742030724757051042384586887259673589180581872779643575301851808664135601285130254672682300925021832801825190734024544986318326563798786219851104636298546194958728111913990722800438594288095395881655456762529608691688577482893444994136241658867532694033256110366455698262220683447421981108187240492950348199137674037982599879141187980271758388549857511529947174346924111707023039810337861523279371029099265644484289551183035573315202080415792009004181195188045670551546834944618273174232768598927760762070952587831876648836834896501547499786411976544143335692801234411176573533639355787921493700434756820866595871776405929359288751429284355704708916487648311661569188620381299755569017189216973375522446903247507879783090132157994012733721069437728343992228027406079823478674043489345812019834110103381250672004660989116070028400210098045296403978870433530261933759786205219228037148113216414718651416909091719190937
# otherwise, this can happen
int(2 ** 50000 / 10)
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
/tmp/ipykernel_153/1749841635.py in <module>
      1 # otherwise, this can happen
----> 2 int(2 ** 50000 / 10)

OverflowError: integer division result too large for a float
# mind your division order
5 * 10 / 10
5.0
5 / 10 * 10
5.0
# always put divisions last
5 * 10 // 10
5
# otherwise, this happens
5 // 10 * 10
0
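
The remark that // rounds towards negative infinity matters mostly for negative operands. A short sketch of the difference between floor division and truncation:

```python
# Floor division rounds towards negative infinity ...
print(7 // 2)       # 3
print(-7 // 2)      # -4

# ... while converting a float quotient with int() truncates towards zero.
print(int(-7 / 2))  # -3

# divmod returns quotient and remainder such that a == quotient * b + remainder.
quotient, remainder = divmod(-7, 2)
print(quotient, remainder)  # -4 1
```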

bool

Data type for logical values

# logical and
True and False
False
# logical or
True or False
True
# logical xor
True ^ False
True
# inversion
not True
False
# bool is a subclass of int
isinstance(False, int)
True
# as such, you can use them as if they were ints
False * 4
0
True + 7
8
True * True + False + True
2
# count True values
sum([True, False, True])
2
# if statements evaluate the truth value of the condition, so a bool can be used directly
print_something = True
# unnecessarily verbose
if print_something == True:
    print("something")
something
# why even stop at one?
if ((print_something == True) == True) == True:
    print("something")
something
# most readable way
if print_something:
    print("something")
something
# implicit conversion to bool
if 3:
    print("3 is True")
# explicit conversion to bool
if bool(3):
    print("3 is True")
3 is True
3 is True
# more conversions
bool([]), bool([1, 2, 3]), bool([[]]), bool(None), bool(...)
(False, True, True, False, True)
# this if statement is unnecessary
def is_even(parameter):
    if parameter % 2 == 0:
        return True
    else:
        return False
# this does exactly the same thing
def is_even(parameter):
    return parameter % 2 == 0
is_even(2)
True

complex

Built-in data type for complex numbers (two instances of float).

(3 + 5j) * (7 + 2j)
(11+41j)

list

Ordered, mutable collection of objects (usually of the same type). Provides fast random access to elements.

# note that all entries are of the same type
my_list = [1, 2, 3]
# appending at the end is a fast operation
my_list.append(4)
# indexing works with lists (random access)
my_list[2]
3
# you can extract sublists by slicing
my_list[1:3]
[2, 3]
# this is what a slice looks like internally
my_list[slice(1, 3, None)]
[2, 3]
# you can even reverse by slicing with a negative stride
my_list[3:1:-1]
[4, 3]
# you can also assign to slices
my_list[1:2] = [100, 101, 102, 103]
my_list
[1, 100, 101, 102, 103, 3, 4]

tuple

Ordered, immutable collection of fixed size. Elements usually have different types and meanings. Provides fast random access.

# note that the entries have different types
my_tuple = ("hi", 42, 3.1415)
# tuples are immutable
my_tuple[2] = 3
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_153/82475634.py in <module>
      1 # tuples are immutable
----> 2 my_tuple[2] = 3

TypeError: 'tuple' object does not support item assignment
# you can slice tuples to obtain another tuple
my_tuple[0:2]
('hi', 42)
# but you cannot assign to slices
my_tuple[0:2] = [42]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_153/3755722383.py in <module>
      1 # but you cannot assign to slices
----> 2 my_tuple[0:2] = [42]

TypeError: 'tuple' object does not support item assignment
# parentheses can be dropped in some situations to improve readability
my_other_tuple = 42, 1
# these are 1-tuples
my_favorite_tuple = ("something",)
my_favorite_tuple = "something",
# the meaning of tuple entries depends on the context!

# point in 3d space with integer coordinates
my_point_in_space = (3, 6, 10)
# scores for an exam with 3 assignments
my_assignment_scores = (3, 6, 10)
# what does the 5 mean? Age? School year? Number of relatives?
person = ("Potter", "Harry", 5)
# you can add minimal documentation when unpacking
lastname, firstname, school_year = person
# use _ to ignore values when unpacking
_, firstname, _ = person

My personal recommendation:

  • Always use indexing on lists, unpacking rarely makes sense

  • Always unpack tuples, indexing tuples makes code difficult to understand

  • If unpacking a tuple becomes too cluttered, consider using something like namedtuple or implementing an appropriate class instead.
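
As a minimal sketch of the last recommendation, the person tuple from above could be expressed with collections.namedtuple. The type name Person and its field names are chosen here for illustration only.

```python
from collections import namedtuple

# Person is a hypothetical record type; the field names document the meaning
# of each position.
Person = namedtuple("Person", ["lastname", "firstname", "school_year"])

person = Person("Potter", "Harry", 5)

# Access by name instead of by position.
print(person.school_year)  # 5

# A namedtuple is still a tuple: unpacking and comparison keep working.
lastname, firstname, school_year = person
print(person == ("Potter", "Harry", 5))  # True
```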

Assignment BAS1

  1. Base conversion: Write a function that takes a number n and a number b and returns a list with the digits of n in base b. convert_to_base(b=3, n=19) -> [2, 0, 1]

Solution

def convert_to_base(target_base, number):
    digits = []
    while number:
        # divmod is like // and % but more efficient
        number, rem = divmod(number, target_base)
        # use append since prepending is more expensive
        digits.append(rem)
    # reverse once at the end (you can also use .reverse() instead)
    return digits[::-1]
convert_to_base(3, 32)
[1, 0, 1, 2]
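
One way to check such a solution is to convert the digits back into a number. The helper below is a hypothetical addition (the name digits_to_number is not part of the assignment) and uses Horner's scheme. Note that the solution above returns an empty list for 0, which this helper maps back to 0.

```python
def digits_to_number(digits, base):
    """Rebuild a number from its digits (most significant digit first)."""
    number = 0
    for digit in digits:
        # Horner's scheme: shift the previous digits one place, then add the next one
        number = number * base + digit
    return number


print(digits_to_number([1, 0, 1, 2], 3))  # 32
print(digits_to_number([2, 0, 1], 3))     # 19
```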

set

Unordered, duplicate-free collection of objects. Allows fast membership tests and deletions.

set uses hashing to map an object to a number. This number is then used for internal lookup. Thus, every set accepts hashable types only.
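
For user-defined types, being hashable means that equal objects must produce equal hashes. As a sketch, a frozen dataclass gets matching __eq__ and __hash__ methods generated automatically. The class Point is a hypothetical example, not part of the original notebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Hypothetical immutable value type; frozen=True makes instances hashable."""
    x: int
    y: int


a = Point(1, 2)
b = Point(1, 2)

print(a == b)                    # True
print(hash(a) == hash(b))        # True

# Equal points collapse into a single set element.
print(len({a, b, Point(3, 4)}))  # 2
```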

hash(5)
5
# hashes for numbers need to be consistent across types
hash(5.0)
5
# a small change in value can result in a large change of the hash value
hash(5.1)
230584300921368581
# the same applies to strings
hash("hello")
-3299196042700386240
hash("hello ")
6582747481294351924
# hash multiple objects at once by hashing a tuple of them
my_collection = ("here", "come", "dat", 1, "boi")
hash(my_collection)
7159632160132692602
# you cannot hash mutable built-in types like lists
hash([1, 2, 3])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_153/4256862437.py in <module>
      1 # you cannot hash mutable built-in types like lists
----> 2 hash([1, 2, 3])

TypeError: unhashable type: 'list'
# {} is not the empty set, but the empty dictionary
empty_set = set()
# set literals work like list literals
my_set = {2, 3, 4}
my_other_set = {2, 4, 7, 9}
# set intersection
my_set & my_other_set
{2, 4}
# set union
my_set | my_other_set
{2, 3, 4, 7, 9}
# symmetric set difference
my_set ^ my_other_set
{3, 7, 9}
# set difference
my_set - my_other_set
{3}
my_big_set = set(range(1, 5_000_000, 4)) | set(range(0, 10_000_000, 7)) | set(range(6, 10_000_000, 23))
len(my_big_set)
2826088
my_big_list = list(my_big_set)
len(my_big_list)
2826088
my_lookups = [353424, 2486578, 234234, 856785, 13424242, 75798657, 24234, 64576786, 142353, 9823402, 25642, 994574]
%%time
# lookups in a list are slow
# never do this unless the list is very short
for number in my_lookups:
    if number in my_big_list:
        print(number)
353424
234234
856785
24234
142353
994574
CPU times: user 211 ms, sys: 3.98 ms, total: 215 ms
Wall time: 213 ms
%%time
# lookups in a set are fast
for number in my_lookups:
    if number in my_big_set:
        print(number)
353424
234234
856785
24234
142353
994574
CPU times: user 55.2 ms, sys: 0 ns, total: 55.2 ms
Wall time: 54.7 ms

(see https://wiki.python.org/moin/TimeComplexity)
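
Outside of a notebook, where the %%time cell magic is not available, the same comparison can be sketched with the standard-library timeit module. The collection size and the number of repetitions below are arbitrary assumptions, and absolute timings depend on the machine.

```python
import timeit

numbers_list = list(range(200_000))
numbers_set = set(numbers_list)
missing = -1  # not contained, so the list has to be scanned completely

# Time 20 membership tests each; the list needs linear time per test,
# the set needs constant time on average.
list_seconds = timeit.timeit(lambda: missing in numbers_list, number=20)
set_seconds = timeit.timeit(lambda: missing in numbers_set, number=20)

print(f"list: {list_seconds:.6f} s, set: {set_seconds:.6f} s")
```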

str

Immutable, ordered collection of Unicode characters. Supports additional string manipulation methods.

# double quotes are preferred by some style guides and formatters (PEP 8 accepts both, as long as you are consistent)
my_string = "hello world"
my_other_string = 'hello world'
# escape quotes when they are part of the string
string_in_string = "\"hello world\""
# or use different enclosing quotes
string_in_string = '"hello world"'
# slicing works with strings
my_string[:4]
'hell'
# strings are immutable like tuples
my_string[7] = "x"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_153/2849831266.py in <module>
      1 # strings are immutable like tuples
----> 2 my_string[7] = "x"

TypeError: 'str' object does not support item assignment
# you can replace a character by constructing a new string
my_string[:7] + "x" + my_string[7+1:]
'hello wxrld'
# implicit concatenation
my_concat = "hello " "world"
# implicit concatenation on multiple lines
my_concat = (
    "hello "
    "world"
)
my_concat
'hello world'
# implicit concatenation only works with literals
this_does_not_work = my_concat "!"
  File "/tmp/ipykernel_153/3494366410.py", line 2
    this_does_not_work = my_concat "!"
                                   ^
SyntaxError: invalid syntax
# split strings at any amount of whitespace using split
"28     29 01            5".split()
['28', '29', '01', '5']
# you can pass a different separator
"28,29,01,5".split(",")
['28', '29', '01', '5']
columns = ["2", "65", "1", "456", "324", "8"]
# concatenate batches of strings using join
# this is more efficient than using + multiple times
",".join(columns)
'2,65,1,456,324,8'
test = "test"
# you can replicate strings using multiplication
(test + " ") * 5
'test test test test test '
# spot the difference
" ".join([test] * 5)
'test test test test test'

str vs repr

# use str for a nice, human-readable representation
print(str("hello"))
hello
# use repr for a programmer-friendly representation
print(repr("hello"))
'hello'
# str(list) calls repr for each element
str(['a', 'b', 'c'])
"['a', 'b', 'c']"
# this is the repr convention: where possible, eval(repr(x)) == x
eval(repr(['a', 'b', 'c']))
['a', 'b', 'c']
# functions do not have a repr that can be evaluated
def some_function():
    # use pass for empty blocks
    pass

# the repr contract does not hold in this case
print(some_function)
<function some_function at 0x7ffa3fa33310>
from uuid import uuid1

# note the difference between str and repr
my_id = uuid1()
print(repr(my_id))
print(str(my_id))
print(my_id)
UUID('a180dcf4-03e8-11ec-be00-0242ac110002')
a180dcf4-03e8-11ec-be00-0242ac110002
a180dcf4-03e8-11ec-be00-0242ac110002
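
Your own classes can provide both representations by defining __repr__ and __str__. The class below is a hypothetical example that follows the convention of making repr look like the constructor call.

```python
class Temperature:
    """Hypothetical example class with separate str and repr output."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __repr__(self):
        # programmer-friendly: looks like the expression that creates the object
        return f"Temperature({self.celsius!r})"

    def __str__(self):
        # human-readable
        return f"{self.celsius} degrees Celsius"


t = Temperature(21.5)
print(repr(t))  # Temperature(21.5)
print(str(t))   # 21.5 degrees Celsius
print([t])      # [Temperature(21.5)], containers use repr for their elements
```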

dict

Insertion-ordered (guaranteed since Python 3.7), mutable mapping from keys to values. Allows fast lookups and deletions using hashable key types.

word_scores = {"great": 4, "bad": -20, "nice": 2}
# this is like indexing a list with strings
word_scores["cool"] = 5
word_scores
{'great': 4, 'bad': -20, 'nice': 2, 'cool': 5}
word_scores["great"]
4
# you can specify a default value for failed lookups
word_scores.get("great", 0)
4
word_scores.get("nonexistent", 0)
0
associations = [
    ("dude", 0),
    ("whatever", -1),
    ("great", 2)
]
# construct a dict from a list of key-value pairs
other_word_scores = dict(associations)
other_word_scores
{'dude': 0, 'whatever': -1, 'great': 2}
# note what happens to "great"
# word_scores.update(other_word_scores)
word_scores |= other_word_scores
word_scores
{'great': 2, 'bad': -20, 'nice': 2, 'cool': 5, 'dude': 0, 'whatever': -1}
# iterating over a dict with for yields the keys only
for key in word_scores:
    # note the lookup here
    print(key, word_scores[key])
great 2
bad -20
nice 2
cool 5
dude 0
whatever -1
# iterate over key-value pairs instead to avoid extra lookups
for key, value in word_scores.items():
    print(key, value)
great 2
bad -20
nice 2
cool 5
dude 0
whatever -1
# you can use dicts as sparse lists by using ints as keys
not_sparse = [None, None, None, None, "something", None, None, None, "whatever"]
sparse = {4: "something", 8: "whatever"}

print(not_sparse[4], sparse[4])
something something
# you can merge multiple dicts with |

rock = {"Pearl Jam": 5, "Metallica": 4.8, "Bob Dylan": 4}
pop = {"Stevie Wonder": 5, "Simon and Garfunkel": 4.5}
music = rock | pop

music
{'Pearl Jam': 5,
 'Metallica': 4.8,
 'Bob Dylan': 4,
 'Stevie Wonder': 5,
 'Simon and Garfunkel': 4.5}
# If a key is in both dicts, the value of the last dict is kept.

rock = {"Pearl Jam": 5, "Metallica": 4.8, "Bob Dylan": 4, "The Beatles": 4.5}
pop = {"Stevie Wonder": 5, "Simon and Garfunkel": 4.5, "The Beatles": 5}
music = rock | pop

music
{'Pearl Jam': 5,
 'Metallica': 4.8,
 'Bob Dylan': 4,
 'The Beatles': 5,
 'Stevie Wonder': 5,
 'Simon and Garfunkel': 4.5}
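
The default-value lookup with get shown earlier in this section is handy when most keys are expected to be missing. As a sketch, a sentence can be scored by summing the scores of its words. The function name score_sentence is hypothetical.

```python
word_scores = {"great": 4, "bad": -20, "nice": 2}


def score_sentence(sentence, scores):
    """Sum the scores of all words; unknown words count as 0."""
    return sum(scores.get(word, 0) for word in sentence.lower().split())


print(score_sentence("What a great and nice day", word_scores))  # 6
print(score_sentence("Nothing to see here", word_scores))        # 0
```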

frozenset / frozendict

Immutable versions of set and dict. Immutability allows these types to be hashed. frozendict is not part of the standard library. You can use this module to get a frozendict: https://pypi.org/project/frozendict/

# sets cannot be part of sets
some_set = {1, 2, 3}
set_int_set = {some_set}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_153/3085792906.py in <module>
      1 # sets cannot be part of sets
      2 some_set = {1, 2, 3}
----> 3 set_int_set = {some_set}

TypeError: unhashable type: 'set'
# frozensets can be part of sets
some_set = {1, 2, 3}
set_int_set = {frozenset(some_set)}
special_deals = {
    frozenset(("drink", "fries", "burger")): 550,
    frozenset(("bacon", "eggs")): 200
}
my_items = ["fries", "drink", "burger"]
# use frozenset for order-insensitive lookup
# as an alternative to sorting keys
special_deals[frozenset(my_items)]
550

bytes

Immutable data type used to represent raw binary data. Use bytearray for mutable data.

my_string = "sweet"
my_string.encode("UTF8")
b'sweet'
my_umlaut_string = "süß"
my_umlaut_string.encode("UTF8")
b's\xc3\xbc\xc3\x9f'
my_literal_bytes = b"a\x20\x20b"
my_literal_bytes.decode("ascii")
'a  b'
# bytearrays are like lists of bounded integers
b = bytearray()
b.extend(b"hi")
b[1] = ord("o")
b
bytearray(b'ho')
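
The encoding examples above imply that the length of a str (in characters) and the length of its encoded bytes can differ, and that decoding only works with a suitable codec. A short sketch:

```python
text = "süß"
data = text.encode("utf-8")

print(len(text))  # 3 characters
print(len(data))  # 5 bytes, since ü and ß take two bytes each in UTF-8

# Decoding with the same codec restores the original string.
print(data.decode("utf-8") == text)  # True

# Decoding with a codec that cannot represent the bytes raises an error.
try:
    data.decode("ascii")
except UnicodeDecodeError as error:
    print("cannot decode as ASCII:", error.reason)
```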

Enumerating collections with for

See the Iterators chapter for more details

print_me = [1, 2, 3]
# you can iterate by enumerating the indices of the list
for i in range(len(print_me)):
    print(print_me[i])
1
2
3
# or you can iterate the list directly in a more concise way
for element in print_me:
    print(element)
1
2
3
print_me_set = set(print_me)
# you cannot use indices in sets
for i in range(len(print_me_set)):
    print(print_me_set[i])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_153/3237289424.py in <module>
      1 # you cannot use indices in sets
      2 for i in range(len(print_me_set)):
----> 3     print(print_me_set[i])

TypeError: 'set' object is not subscriptable
# direct iteration still works
for element in print_me_set:
    print(element)
1
2
3
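
When the index is needed in addition to the element, or when several collections should be traversed in parallel, the built-ins enumerate and zip avoid manual indexing. A minimal sketch with made-up data:

```python
names = ["x", "y", "z"]
values = [1, 2, 3]

# enumerate yields (index, element) pairs
for index, name in enumerate(names):
    print(index, name)

# zip yields tuples with one element from each collection
for name, value in zip(names, values):
    print(name, value)

pairs = list(zip(names, values))
print(pairs)  # [('x', 1), ('y', 2), ('z', 3)]
```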

Assignment BAS2

  1. Longest string: Write a function that takes a list of strings and returns the longest string. longest_string("42", "ayy", "lmao") -> "lmao"

def longest_string(*strings):
    return max(strings, key=len)
longest_string("42", "ayy", "lmao")
'lmao'
  2. Range filter: Write a function that takes a list of numbers as well as a lower and an upper limit and returns a list of all numbers that lie between the limits. find_numbers_in_interval([1, 2, 5, 5, 8, 6, 1], 2, 6) -> [2, 5, 5, 6]

def find_numbers_in_interval(numbers, lower_limit, upper_limit):
    return [x for x in numbers if lower_limit <= x <= upper_limit]
find_numbers_in_interval([1, 2, 5, 5, 8, 6, 1], 2, 6)
[2, 5, 5, 6]
  3. Clipping: Write a function that takes a list of numbers and a number as an upper limit. Return a list with the same numbers, but if a number is greater than the upper limit, the upper limit should appear in its place. clip([2, 10, 5], 5) -> [2, 5, 5]

def clip(numbers, upper_limit):
    return [min(x, upper_limit) for x in numbers]
clip([2, 10, 5], 5)
[2, 5, 5]
  4. Batching🧑‍🏫: Write a function that takes a list and an integer k and splits the list into chunks of size k. If the list does not divide evenly, the last chunk may contain fewer than k elements. batches([1, 2, 3, 4, 5], 2) -> [[1, 2], [3, 4], [5]]

def batches(elements, batch_size):
    return [elements[ofst:ofst+batch_size] for ofst in range(0, len(elements), batch_size)]
batches([1, 2, 3, 4, 5], 2)
[[1, 2], [3, 4], [5]]
  5. Removing units (AOS)🧑‍🏫: Write a function that takes a list of 2-tuples, each consisting of a number and a string. The number denotes a length, and the string is either "m", "cm", or "mm" and gives the unit of the number. From this, produce a list that contains all lengths in millimeters. remove_units([(5, "mm"), (9, "m")]) -> [5, 9000]

# using list comprehensions does not make it readable
def remove_units(values):
    return [value * 1000 if unit == "m" else value * 10 if unit == "cm" else value for value, unit in values]
remove_units([(5, "mm"), (9, "m")])
[5, 9000]
  6. Removing units (SOA)🧑‍🏫: Write the function from exercise 5 again, but instead of receiving a list of 2-tuples, your function should take two separate lists for the values and the units. remove_units_soa([5, 9], ["mm", "m"]) -> [5, 9000]

def remove_units_soa(values, units):
    out = []
    # this is difficult to read
    for i in range(len(values)):
        if units[i] == "m":
            out.append(values[i] * 1000)
        elif units[i] == "cm":
            out.append(values[i] * 10)
        else:
            out.append(values[i])
    return out
remove_units_soa([5, 9], ["mm", "m"])
[5, 9000]
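
A possible alternative to the index-based loop is to pair up the two lists with zip and to replace the if/elif chain with a lookup table. This is a sketch, not the reference solution. The names MM_PER_UNIT and remove_units_zip are hypothetical, and unlike the solution above, an unknown unit raises a KeyError instead of being treated as millimeters.

```python
# millimeters per unit; the lookup table replaces the if/elif chain
MM_PER_UNIT = {"m": 1000, "cm": 10, "mm": 1}


def remove_units_zip(values, units):
    """Convert parallel lists of values and units into millimeters."""
    return [value * MM_PER_UNIT[unit] for value, unit in zip(values, units)]


print(remove_units_zip([5, 9], ["mm", "m"]))  # [5, 9000]
```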

None

Singleton value of type NoneType. Used to represent absence of information, but only if there is no better way to do so.

x = None
y = None
x == y
True
x is y
True
# built-in functions tend not to return None unless they have to
"abcd".find("e")
-1

Referential Identity (is) vs Semantic Equality (==)

# equality can be unreliable
# this class claims to be equal to everything
class Troll:
    def __eq__(self, other):
        return True
troll = Troll()
# troll is obviously not None despite claiming to be
troll == None
True
# when comparing to None, always use "is"
troll is None
False
# the same applies to inequality
troll != None
False
not (troll is None)
True
# more readable way
troll is not None
True
my_tuple = 3, 5
my_other_tuple = 3, 5
# semantic equality
my_tuple == my_other_tuple
True
# referential identity
my_tuple is my_other_tuple
False
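
The distinction becomes important with mutable objects, because two names that refer to the same object see each other's changes. A short sketch with lists:

```python
a = [1, 2, 3]
b = a        # second name for the same object
c = list(a)  # new list with equal contents

print(a is b, a == b)  # True True
print(a is c, a == c)  # False True

# Mutating through one name is visible through the other name,
# but not in the independent copy.
b.append(4)
print(a)  # [1, 2, 3, 4]
print(c)  # [1, 2, 3]
```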

Structural Pattern Matching

Based on PEP 636.

As an example to motivate this tutorial, you will be writing a text adventure. That is a form of interactive fiction where the user enters text commands to interact with a fictional world and receives text descriptions of what happens. Commands will be simplified forms of natural language like get sword, attack dragon, go north, enter shop or buy cheese.

Matching sequences

Your main loop will need to get input from the user and split it into words, let’s say a list of strings like this:

command = input("What are you doing next? ")
# analyze the result of command.split()

The next step is to interpret the words. Most of our commands will have two words: an action and an object. So you may be tempted to do the following:

[action, obj] = command.split()
... # interpret action, obj

The problem with that line of code is that it’s missing something: what if the user types more or fewer than 2 words? To prevent this problem you can either check the length of the list of words, or catch the ValueError that the statement above would raise.
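
For comparison, the ValueError approach could look like the following sketch. The helper name parse_two_words and the choice to return None for invalid input are assumptions made for illustration.

```python
def parse_two_words(command):
    """Return (action, obj) for two-word commands and None otherwise."""
    try:
        # unpacking raises ValueError unless there are exactly two words
        action, obj = command.split()
    except ValueError:
        return None
    return action, obj


print(parse_two_words("get sword"))      # ('get', 'sword')
print(parse_two_words("look"))           # None
print(parse_two_words("pick up sword"))  # None
```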

You can use a match statement instead (available since Python 3.10):

# analyze the result of command.split()

match command.split():
    case [action, obj]:
        ... # interpret action, obj

The match statement evaluates the “subject” (the value after the match keyword), and checks it against the pattern (the code next to case). A pattern is able to do two different things:

  • Verify that the subject has a certain structure. In your case, the [action, obj] pattern matches any sequence of exactly two elements. This is called matching.

  • Bind some names in the pattern to component elements of your subject. In this case, if the list has two elements, it will bind action = subject[0] and obj = subject[1].

If there’s a match, the statements inside the case block will be executed with the bound variables. If there’s no match, nothing happens and the statement after match is executed next.

Note that, in a similar way to unpacking assignments, you can use either parentheses, brackets, or just comma separation as synonyms. So you could write case action, obj or case (action, obj) with the same meaning. All forms will match any sequence (for example lists or tuples).

Matching specific values

Your code still needs to look at the specific actions and conditionally execute different logic depending on the specific action (e.g., quit, attack, or buy). You could do that using a chain of if/elif/elif/..., or using a dictionary of functions, but here we’ll leverage pattern matching to solve that task. Instead of a variable, you can use literal values in patterns (like "quit", 42, or None). This allows you to write:

quit_game = lambda: print("Bye")

class CurrentRoom:
    def describe(self):
        print("The room is empty.")
    def neighbor(self, direction):
        return self
    exits = ["north", "south"]
    inventory = ["key", "coin", "potion"]

class Character:
    def get(self, obj, room):
        print(f"You added {obj} to your inventory.")
    def drop(self, obj, room):
        print(f"You dropped {obj}.")

current_room = CurrentRoom()
character = Character()
command = input("What are you doing next? ")

match command.split():
    case ["quit"]:
        print("Goodbye!")
        quit_game()
    case ["look"]:
        current_room.describe()
    case ["get", obj]:
        character.get(obj, current_room)
    case ["go", direction]:
        current_room = current_room.neighbor(direction)
    case ["drop", *objects]: # extended unpacking
        for obj in objects:
            character.drop(obj, current_room)
    case _: # Wildcard
        print(f"Sorry, I couldn't understand {command!r}")

    # The rest of your commands go here

A pattern like ["get", obj] will match only 2-element sequences that have a first element equal to "get". It will also bind obj = subject[1].

As you can see in the "go" case, we also can use different variable names in different patterns.

Literal values are compared with the == operator except for the constants True, False and None which are compared with the is operator.

If you don’t know beforehand how many words the command will contain, you can use extended unpacking in patterns in the same way it is allowed in assignments.

Wildcard

The special pattern _ (called the wildcard) always matches, but it doesn’t bind any variables.

Note that this will match any object, not just sequences. As such, it only makes sense to have it by itself as the last pattern (to prevent errors, Python will stop you from using it before other cases).

More complicated cases

command = input("What are you doing next? ")


match command.split():
    case ["quit"]:
        print("Goodbye!")
        quit_game()
    case ["look"]:
        current_room.describe()
    case ["get", obj] | ["pick", "up", obj] | ["pick", obj, "up"] if obj in current_room.inventory:  # or pattern with a guard
        character.get(obj, current_room)
    case ["get", obj] | ["pick", "up", obj] | ["pick", obj, "up"]:
        print(f"Sorry, there is no {obj} here.")
    case ["go", ("north" | "south" | "east" | "west") as direction]:  # sub-pattern with an as pattern
        current_room = current_room.neighbor(direction)
    case ["go", _]:
        print("Sorry, you can't go that way")
    case ["drop", *objects]:
        for obj in objects:
            character.drop(obj, current_room)
    case _: # Wildcard
        print(f"Sorry, I couldn't understand {command!r}")

The third case combines two features. First, it is an or pattern: the alternatives separated by | are tried from left to right, which is relevant for what is bound if more than one alternative matches. An important restriction when writing or patterns is that all alternatives must bind the same variables. So a pattern [1, x] | [2, y] is not allowed because it would be unclear which variable is bound after a successful match. [1, x] | [2, x] is perfectly fine and will always bind x if successful.

Second, the third case uses a guard. Guards consist of the if keyword followed by any expression. The guard is not part of the pattern, it’s part of the case. It’s only checked if the pattern matches, and after all the pattern variables have been bound (that’s why the condition can use the obj variable in the example above). If the pattern matches and the condition is truthy, the body of the case executes normally. If the pattern matches but the condition is falsy, the match statement proceeds to check the next case as if the pattern hadn’t matched (with the possible side effect of having already bound some variables).

The fifth case uses a sub-pattern to match only a valid direction (this would be nicer with a guard) and an as pattern to bind the value matched by the sub-pattern to a name.

More (and more complex) examples can be found in PEP 636, on which this section is based.

PEPs

Python Enhancement Proposals (PEPs): https://www.python.org/dev/peps/

PEP8 contains coding style conventions: https://www.python.org/dev/peps/pep-0008/. Pay attention to

  • Naming conventions

  • Spacing conventions

  • Documentation strings

Shebang for Unix-like Operating Systems

A shebang line at the top of a script can be used to run it on Unix-like systems without explicitly invoking the Python interpreter. https://en.wikipedia.org/wiki/Shebang_(Unix)
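
A minimal sketch of such a script follows. The file name hello.py is hypothetical, and the use of /usr/bin/env to locate python3 is a common convention rather than a requirement.

```python
#!/usr/bin/env python3
"""Hypothetical script hello.py demonstrating a shebang line."""


def main():
    return "Hello from a script"


if __name__ == "__main__":
    # only runs when the file is executed, not when it is imported
    print(main())
```

After marking the file as executable with chmod +x hello.py, it can be started as ./hello.py instead of python3 hello.py.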