Standard Library II¶
functools
¶
https://docs.python.org/3/library/functools.html
Some useful features of functools
include the ability to automatically cache value as well as to automatically generate rich comparison methods from a single definition (see MyFraction
). wraps
can be used when defining function wrappers in decorators to copy the original function’s name and docstring.
# https://en.wikipedia.org/wiki/Levenshtein_distance
def naive_distance(text1, text2):
def compute(ofst1, ofst2):
if min(ofst1, ofst2) == 0:
return max(ofst1, ofst2)
return min(
compute(ofst1 - 1, ofst2) + 1,
compute(ofst1, ofst2 - 1) + 1,
compute(ofst1 - 1, ofst2 - 1) + (text1[ofst1 - 1] != text2[ofst2 - 1]),
)
return compute(len(text1), len(text2))
%%timeit
# huge number of redundant calls
naive_distance("shellfish", "selfish")
179 ms ± 11.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
from functools import lru_cache
def distance(text1, text2):
# automatically cache results
@lru_cache(None)
def compute(ofst1, ofst2):
if min(ofst1, ofst2) == 0:
return max(ofst1, ofst2)
return min(
compute(ofst1 - 1, ofst2) + 1,
compute(ofst1, ofst2 - 1) + 1,
compute(ofst1 - 1, ofst2 - 1) + (text1[ofst1 - 1] != text2[ofst2 - 1]),
)
return compute(len(text1), len(text2))
%%timeit
distance("shellfish", "fellshish")
99.2 µs ± 1.48 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
contextlib
¶
https://docs.python.org/3/library/contextlib.html
contextlib
provides basic context managers that you would have to write yourself otherwise. @contextmanager
can be used to define a context manager using generator syntax. Take a look at nullcontext
, closing
, suppress
and the redirect_*
functions.
Assignment SL7¶
Unbounded Cache
lru_cache(None)
speichert beliebig viele Elemente anstatt sich nur die zuletzt angefragten Elemente zu merken. Schreiben Sie einen Decoratorunbounded_cache
, der genau dasselbe macht wielru_cache(None)
.unbounded_cache
soll eindict
nutzen, um bereits berechnete Funktionswerte zu speichern und zurückzugeben, ohne dass die gewrappte Funktion aufgerufen wird. Es ist okay, wenn Ihre Version keine Keyword-Argumente unterstützt.
def unbounded_cache(f):
cache = {}
@functools.wraps(f)
def wrapper(*args):
try:
return cache[args]
except KeyError:
result = f(*args)
cache[args] = result
return result
return wrapper
@contextmanager
Im Kurs haben wir den Kontextmanagerstring_style
als Klasse implementiert. Implementieren Siestring_style
erneut unter Verwendung von@contextmanager
auscontextlib
.
from contextlib import contextmanager
@contextmanager
def string_style(new_style):
global _STRING_REPRESENTATION
old_style = _STRING_REPRESENTATION
_STRING_REPRESENTATION = new_style
try:
yield
finally:
_STRING_REPRESENTATION = old_style
pathlib
¶
https://docs.python.org/3/library/pathlib.html
pathlib
provides path objects that represent file system paths. This makes directory structures more intuitive to navigate in Python.
Take a close look at glob
and rglob
since these can recursively enumerate files in a folder hierarchy.
An an example, we can use pathlib
to merge multiple folders containing files with overlapping names. This is especially useful for photos.
argparse
¶
https://docs.python.org/3/library/argparse.html
Library for argument parsing. See pathlib
example code.
re
¶
https://docs.python.org/3/library/re.html
Also see https://regexr.com/
In this example, we want to parse a list of ranges from a string.
A range including index 1 to 5 is written as 1-5
, but we also want to allow arbitrary spacing, e.g. 1 - 5
.
import re
# match one or more digits
re.match("[0-9]+", "1")
<re.Match object; span=(0, 1), match='1'>
# this one does not match
re.match("[0-9]+", "")
# match one or more digits, a hyphen, and one or more digits again
re.match("[0-9]+-[0-9]+", "1-5-")
<re.Match object; span=(0, 3), match='1-5'>
# force the match to take up the entire string
re.match("^[0-9]+-[0-9]+$", "1-5-")
# same as above, but more readable
re.fullmatch("[0-9]+-[0-9]+", "1-5-")
# we do not account for spaces yet
re.fullmatch("[0-9]+-[0-9]+", "1 - 5")
# \s* matches zero or more whitespace characters
re.fullmatch("[0-9]+\s*-\s*[0-9]+", "1 - 5")
<re.Match object; span=(0, 5), match='1 - 5'>
# we still do not account for spaces at the beginning and end of the string
re.fullmatch("[0-9]+\s*-\s*[0-9]+", " 1 - 5 ")
# this is easily fixable
re.fullmatch("\s*[0-9]+\s*-\s*[0-9]+\s*", " 1 - 5 ")
<re.Match object; span=(0, 7), match=' 1 - 5 '>
# parentheses denote capture groups that can be read later
re.fullmatch("\s*([0-9]+)\s*-\s*([0-9]+)\s*", " 1 - 5 ").groups()
('1', '5')
# capture groups can be named for better readability
re.fullmatch("\s*(?P<from>[0-9]+)\s*-\s*(?P<to>[0-9]+)\s*", " 1 - 5 ").groupdict()
{'from': '1', 'to': '5'}
# "?" means the previous token is optional
re.fullmatch("Aufgaben?", "Aufgabe")
<re.Match object; span=(0, 7), match='Aufgabe'>
re.fullmatch("Aufgaben?", "Aufgaben")
<re.Match object; span=(0, 8), match='Aufgaben'>
# regex is case-sensitive by default
re.fullmatch("Aufgaben?", "aufgaben")
# use the "I" flag for case-insensitive matching
re.fullmatch("Aufgaben?", "aufgaben", re.I)
<re.Match object; span=(0, 8), match='aufgaben'>
# make upper limit optional so we can write unbounded ranges like "5-"
# the addition of parentheses introduced another group, but we only want 2 groups
re.fullmatch("\s*(?P<from>[0-9]+)\s*-(\s*(?P<to>[0-9]+))?\s*", "1-5").groups()
('1', '5', '5')
# use "?:" to denote a non-capturing group
re.fullmatch("\s*(?P<from>[0-9]+)\s*-(?:\s*(?P<to>[0-9]+))?\s*", "1-5").groups()
('1', '5')
# also allows single numbers
re.fullmatch("\s*(?P<from>[0-9]+)\s*(?:-(?:\s*(?P<to>[0-9]+))?)?\s*", "1").groups()
('1', None)
# ranges can be separated by "," or ";"
indices = "1,2-4;6,10-"
shitty_indices = " 1 ;2- 4 , 6,10 -"
invalid_indices = "4;;7"
# no capture groups this time, but match multiple ranges using "*"
range_format = r"\s*[0-9]+\s*(?:-(?:\s*[0-9]+)?\s*)?"
multiple_ranges = f"{range_format}(?:[,;]{range_format})*"
# matches
re.fullmatch(multiple_ranges, indices)
<re.Match object; span=(0, 11), match='1,2-4;6,10-'>
# still matches
re.fullmatch(multiple_ranges, shitty_indices)
<re.Match object; span=(0, 34), match=' 1 ;2- 4 , 6,10 -'>
# this does not match
re.fullmatch(multiple_ranges, invalid_indices)
The next step is parsing the actual ranges. We could use split
and multiple iterations of strip
, but that makes detecting illegal strings very tedious. Instead, fullmatch
is used to check the format beforehand and finditer
enumerates all matches for us and automatically extracts the relevant parts.
from collections import namedtuple
IndexRange = namedtuple('IndexRange', ['start', 'end'])
import re
RANGE_FORMAT = r"\s*[0-9]+\s*(?:-(?:\s*[0-9]+)?\s*)?"
MULTIPLE_RANGES_FORMAT = f"{RANGE_FORMAT}([,;]{RANGE_FORMAT})*"
# make sure the range is terminated by ";" or "," or the end of the string
TERMINATED_RANGE_WITH_LABELS = r"\s*(?:(?P<index>[0-9]+)|(?P<start>[0-9]+)\s*-\s*(?P<end>[0-9]+)?)\s*?(?:[,;]|$)"
def parse_index_ranges(ranges):
if re.fullmatch(MULTIPLE_RANGES_FORMAT, ranges) is None:
raise ValueError("Illegal format")
out = []
for match in re.finditer(TERMINATED_RANGE_WITH_LABELS, ranges):
groups = match.groupdict()
if groups["index"] is not None:
index = int(groups["index"])
out.append(IndexRange(index, index + 1))
else:
start = int(groups["start"])
if groups["end"] is not None:
end = int(groups["end"]) + 1
else:
end = None
out.append(IndexRange(start, end))
return out
parse_index_ranges(" 1 ;2- 4 , 6,10 -")
[IndexRange(start=1, end=2),
IndexRange(start=2, end=5),
IndexRange(start=6, end=7),
IndexRange(start=10, end=None)]
# use ranges to slice a sequence
def index_range_to_slice(index_range):
return slice(index_range.start, index_range.end)
even_numbers = list(range(0, 50, 3))
[n for r in parse_index_ranges(indices) for n in even_numbers[index_range_to_slice(r)]]
[3, 6, 9, 12, 18, 30, 33, 36, 39, 42, 45, 48]
Note: Sometimes, re
ist not the right tool for the job, especially when dealing with non-regular grammars. Consider using urllib
for URLs, a JSON library for JSON data, and some SGML/XML library for HTML/XHTML. A famous quote goes
Some people, when confronted with a problem, think
“I know, I'll use regular expressions.”
Now they have two problems.
Assignment SL8 🧑🏫¶
Ranges als String Schreiben Sie eine Funktion, die die mit Regex geparste Liste von Ranges wieder in einen String umwandelt.
def _range_as_string(r):
if r.end is None:
return f"{r.start}-"
if r.start == r.end - 1:
return f"{r.start}"
return f"{r.start}-{r.end - 1}"
def ranges_as_string(ranges):
return ",".join(_range_as_string(r) for r in ranges)
ranges = parse_index_ranges(" 1 ;2- 4 , 6,10 -")
ranges_as_string(ranges)
'1,2-4,6,10-'
Punktzahl finden Sie erhalten eine Textdatei im unten dargestellen Format. Schreiben Sie eine Funktion, die die Gesamtpunktzahl aus der Datei extrahiert und als
int
zurückgibt.
Bewertung:
Aufgabe 1:
schön, 2 Punkte
Aufgabe 2:
nichts abgegeben
0 Punkte
Aufgabe 3:
literally bester Code, 8 Punkte
Gesamtpunktzahl: 10 Punkte
Musterlösungen sind im Reader!
import re
def find_total_points(file):
# file is expected to be small, read everything at once
content = file.read()
match = re.search(r"Gesamtpunktzahl:\s*(?P<points>[+-]?\s*[0-9]+)\s*Punkte?", content)
if match is None:
raise ValueError("Total points not found")
points = match.group("points").replace(" ", "")
return int(points)
Gebrochene Punktzahl finden Wie vorher, aber die Gesamtpunktzahl kann jetzt eine Kommazahl sein, also
2.3
oder2,3
. Passen Sie Ihre Funktion so an, dass sie damit zurechtkommt.
import re
def find_total_points_float(file):
content = file.read()
match = re.search(r"Gesamtpunktzahl:\s*(?P<points>[+-]?\s*[0-9]+(?:[,.][0-9]+)?)\s*Punkte?", content)
if match is None:
raise ValueError("Total points not found")
points = match.group("points").replace(" ", "").replace(",", ".")
return float(points)
import io
find_total_points_float(io.StringIO("""\
Bewertung:
Aufgabe 1:
schön, 2 Punkte
Aufgabe 2:
nichts abgegeben
0 Punkte
Aufgabe 3:
literally bester Code, 8 Punkte
Gesamtpunktzahl: - 10.3 Punkte
Musterlösungen sind im Reader!
"""))
-10.3
subprocess
/ctypes
¶
In some cases, you might not be able to find an appropriate library for your needs.
Python still allows you to reuse existing code by providing interfaces to the shell and to C code.
Using subprocess
or ctypes
might still be easier than writing code yourself.
Assignment LF6 🧑🏫¶
Wörter zählen mit Sonderzeichen Schreiben Sie eine Variante von
count_words
aus SL1, die mittels regulärer Ausdrücke Sonderzeichen aus dem String entfernt, bevor die Wörter gezählt werden.