An Apprentice Experiment in Python Programming, Part 4

by gilch, konstell | 11 min read | 31st Aug 2021 | 2 comments

[Note to readers: The Jupyter notebook version of this post is here]

Previously: https://www.lesswrong.com/posts/fKTqwbGAwPNm6fyEH/an-apprentice-experiment-in-python-programming-part-3

Python Objects in Memory (from comments)

In the previous post, purge commented:

Due to Python's style of reference passing, most of these print statements will show matching id values even if you use any kind of object, not just True/False. Try to predict the output here, then run it to check:

def compare(x, y):
    print(x == y, id(x) == id(y), x is y)
a = {"0": "1"}
b = {"0": "1"}
print(a == b, id(a) == id(b), a is b)
compare(a, b)
c = a
d = a
print(c == d, id(c) == id(d), c is d)
compare(c, d)

When I was coming up with an answer to this question, I got stuck on what the operator is did. I only had a vague sense of how to use it—I knew comparison with None was done via is but didn't know why—so I had to look up what is actually did.

Identity comparisons

The operators "is" and "is not" test for an object’s identity: "x is y" is true if and only if x and y are the same object. An Object’s identity is determined using the "id()" function. "x is not y" yields the inverse truth value.

Here's the doc for id():

id(obj, /) Return the identity of an object. This is guaranteed to be unique among simultaneously existing objects. (CPython uses the object's memory address.)

Then I understood that is would literally check if two objects are the same object. So in the above example we'd get True False False from print(a == b, id(a) == id(b), a is b) and True True True from print(c == d, id(c) == id(d), c is d).
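A minimal sketch of the same distinction, with one extra step that is not in the original exercise: mutating the object through one name to show that both names really refer to a single object. The variable names here are illustrative.

```python
a = {"0": "1"}
b = {"0": "1"}   # equal contents, but a separate dict object
c = a            # a second name for the same object as a

equal_but_distinct = (a == b, id(a) == id(b), a is b)
same_object = (a == c, id(a) == id(c), a is c)

print(equal_but_distinct)  # (True, False, False)
print(same_object)         # (True, True, True)

# A change made through c is visible through a, because both names
# refer to one object. b is a different object and is unaffected.
c["2"] = "3"
print(a)  # {'0': '1', '2': '3'}
print(b)  # {'0': '1'}
```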

Object Storage in Memory

Speaking of checking whether two objects are the same object stored in the same location in memory, gilch made more comments about object storage models (paraphrased):

Compared to C/C++, Python has a more consistent object storage model: everything is an object, and only references to objects are stored on the stack, pointing to the actual objects stored on the heap. This means that Python objects are scattered all over the place. One important aspect of CPU optimization is caching contiguous blocks of memory in CPU caches, but Python's model causes a high cache-miss rate, since two objects adjacent to each other in memory are likely unrelated. This performance degradation is the price for Python's simple memory model.

For computing tasks with high performance requirements, NumPy is optimized for making use of blocks of contiguous memory.

== and is

Some remarks gilch made about == and is:

The == operator calls the __eq__ method of an object. The default __eq__, inherited from object, falls back to the same identity check that is performs, so two objects compare equal only if they are the same object. (Source?) We can have two instances of a number, but not two instances of True or False.
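A small sketch of that default, using two hypothetical classes (Plain and Valued are not from the session): one leaves __eq__ alone and one overrides it.

```python
class Plain:
    """Defines no __eq__, so == falls back to comparing identity."""
    def __init__(self, value):
        self.value = value


class Valued:
    """Overrides __eq__, so == compares the stored values instead."""
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Valued):
            return NotImplemented
        return self.value == other.value

    # Defining __eq__ sets __hash__ to None unless we define it too.
    def __hash__(self):
        return hash(self.value)


p1, p2 = Plain(1), Plain(1)
v1, v2 = Valued(1), Valued(1)

print(p1 == p2, p1 is p2)  # False False
print(p1 == p1)            # True
print(v1 == v2, v1 is v2)  # True False

# There is only ever one True object, however it is produced.
print(bool(1) is True)     # True
```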

{} and Set Constructor

We went off on a tangent where gilch checked my understanding of sets. We encountered some corner cases, like Python treating True as equal to 1 and False as equal to 0:

>>> {1, True}
{1}
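A short sketch of why the set collapses: bool is a subclass of int, and True and 1 both compare equal and hash the same, so a set or dict treats them as the same element. The dictionary case is an extra illustration, not something from the session.

```python
# bool is a subclass of int, and True/False compare and hash like 1/0.
print(issubclass(bool, int))             # True
print(True == 1, hash(True) == hash(1))  # True True

# A set keeps whichever of the equal elements it saw first.
first_int = {1, True}
first_bool = {True, 1}
print(first_int)    # {1}
print(first_bool)   # {True}
print({0, False, 1, True, 1.0})  # {0, 1}

# Dictionary keys follow the same rule: the first key object stays,
# and the last value assigned to it wins.
d = {1: "int", True: "bool"}
print(d)  # {1: 'bool'}
```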

{} is used to represent both sets and dictionaries, but {} itself would be interpreted as an empty dictionary instead of an empty set:

>>> type({})
<class 'dict'>

To make an empty set, we'd use the set() constructor:

>>> set()
set()

Gilch gave me a puzzle: make an empty set without using the set() constructor.

I came up with the answer {1} - {1} pretty quickly, but gilch had another solution in mind that did not involve using any numbers or letters. Hint: passing in iterables to a constructor results in different values than passing in the same iterables in expressions:

>>> list("hello")
['h', 'e', 'l', 'l', 'o']
>>> ["hello"]
['hello']
>>> set("hello")
{'o', 'l', 'e', 'h'}
>>> {"hello"}
{'hello'}

Using splat, the other way to make an empty set without using the set() constructor is

>>> {*[]}
set()
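As a sketch, the same trick works with any empty iterable, since the splat unpacks zero elements into a set display. The variants below are my own additions rather than answers from the session.

```python
from_list = {*[]}
from_tuple = {*()}
from_dict = {*{}}    # unpacking a dict iterates over its (zero) keys

print(from_list, from_tuple, from_dict)  # set() set() set()

# The bare braces are still a dict; only the splat makes it a set display.
print(type({}).__name__, type({*{}}).__name__)  # dict set

# Unpacking inside braces matches what the constructor does with an iterable.
print({*"hello"} == set("hello"))  # True
```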

Magic Methods for Attributes (Continued from last time)

When I was working on the solution that involved modifying the __dict__ last time, I was getting pretty confused about the difference between dir(), vars() and __dict__.

Gilch started by asking me to construct a simple class and make an instance:

class SimpleClass:
    def __init__(self, x):
        self.x = x

sc = SimpleClass(42)

Then we listed out the attributes of sc in different ways:

>>> dir(sc)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'x']
>>> set(dir(sc)) - set(dir(object))
{'__weakref__', '__module__', 'x', '__dict__'}
>>> set(dir(sc)) - set(dir(type(sc))) - set(dir(object))
{'x'}
>>> sc.x
42
>>> vars(sc)
{'x': 42}
>>> sc.__dict__
{'x': 42}
>>> type(sc).__dict__
mappingproxy({'__module__': '__main__', '__init__': <function SimpleClass.__init__ at 0x7feba0967dc0>, '__dict__': <attribute '__dict__' of 'SimpleClass' objects>, '__weakref__': <attribute '__weakref__' of 'SimpleClass' objects>, '__doc__': None})

The difference between dir and vars is that dir returns the names of all attributes of an object, including the attributes of its class and attributes inherited from its superclasses; vars, on the other hand, only returns the attributes stored in the object's own __dict__, which excludes attributes defined on its class or inherited from superclasses. This StackOverflow question goes into more detail.

__mro__

__mro__ stands for "method resolution order," which provides the inheritance path from the current class all the way up to object. It is honestly the most handy tool I've learned from this session.

Note that __mro__ is a class attribute, not an instance attribute:

>>> sc.__mro__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'SimpleClass' object has no attribute '__mro__'
>>> type(sc).__mro__
(<class '__main__.SimpleClass'>, <class 'object'>)

Magic Methods for Attributes

Now we can verify that dir(sc) returns the union of the keys of vars(sc), vars(SimpleClass) and vars(object):

>>> vars(sc)
{'x': 42}
>>> vars(type(sc))
mappingproxy({'__module__': '__main__', '__init__': <function SimpleClass.__init__ at 0x7f2ce3b79dc0>, '__dict__': <attribute '__dict__' of 'SimpleClass' objects>, '__weakref__': <attribute '__weakref__' of 'SimpleClass' objects>, '__doc__': None})
>>> vars(object)
mappingproxy({'__repr__': <slot wrapper '__repr__' of 'object' objects>, '__hash__': <slot wrapper '__hash__' of 'object' objects>, '__str__': <slot wrapper '__str__' of 'object' objects>, '__getattribute__': <slot wrapper '__getattribute__' of 'object' objects>, '__setattr__': <slot wrapper '__setattr__' of 'object' objects>, '__delattr__': <slot wrapper '__delattr__' of 'object' objects>, '__lt__': <slot wrapper '__lt__' of 'object' objects>, '__le__': <slot wrapper '__le__' of 'object' objects>, '__eq__': <slot wrapper '__eq__' of 'object' objects>, '__ne__': <slot wrapper '__ne__' of 'object' objects>, '__gt__': <slot wrapper '__gt__' of 'object' objects>, '__ge__': <slot wrapper '__ge__' of 'object' objects>, '__init__': <slot wrapper '__init__' of 'object' objects>, '__new__': <built-in method __new__ of type object at 0x955f60>, '__reduce_ex__': <method '__reduce_ex__' of 'object' objects>, '__reduce__': <method '__reduce__' of 'object' objects>, '__subclasshook__': <method '__subclasshook__' of 'object' objects>, '__init_subclass__': <method '__init_subclass__' of 'object' objects>, '__format__': <method '__format__' of 'object' objects>, '__sizeof__': <method '__sizeof__' of 'object' objects>, '__dir__': <method '__dir__' of 'object' objects>, '__class__': <attribute '__class__' of 'object' objects>, '__doc__': 'The base class of the class hierarchy.\n\nWhen called, it accepts no arguments and returns a new featureless\ninstance that has no instance attributes and cannot be given any.\n'})
>>> type(sc)
<class '__main__.SimpleClass'>
>>> list(vars(sc).keys()) + list(vars(SimpleClass).keys()) + list(vars(object).keys())
['x', '__module__', '__init__', '__dict__', '__weakref__', '__doc__', '__repr__', '__hash__', '__str__', '__getattribute__', '__setattr__', '__delattr__', '__lt__', '__le__', '__eq__', '__ne__', '__gt__', '__ge__', '__init__', '__new__', '__reduce_ex__', '__reduce__', '__subclasshook__', '__init_subclass__', '__format__', '__sizeof__', '__dir__', '__class__', '__doc__']
>>> set(_) == set(dir(sc))
True

Why did we need to convert the two lists to sets when comparing them at the end?

>>> sorted(list(vars(sc).keys()) + list(vars(SimpleClass).keys()) + list(vars(object).keys()))
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'x']

Two of the attributes, __init__ and __doc__, appear twice in the concatenated list because SimpleClass overrides the versions defined on object.

>>> SimpleClass.__init__
<function SimpleClass.__init__ at 0x7f2ce3b79dc0>
>>> object.__init__
<slot wrapper '__init__' of 'object' objects>
>>> SimpleClass.__doc__
>>> object.__doc__
'The base class of the class hierarchy.\n\nWhen called, it accepts no arguments and returns a new featureless\ninstance that has no instance attributes and cannot be given any.\n'
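The same check can be generalized by walking __mro__ instead of listing the classes by hand. This is a sketch for plain classes like the ones here; the helper names_via_vars and the classes Base and Child are hypothetical, and objects that customize __dir__ or lack a __dict__ will not satisfy it.

```python
class Base:
    greeting = "hi"


class Child(Base):
    def __init__(self, x):
        self.x = x


def names_via_vars(obj):
    """Union of the instance's own names and the names defined on
    each class in its method resolution order."""
    names = set(vars(obj))
    for cls in type(obj).__mro__:
        names |= set(vars(cls))
    return names


child = Child(42)
collected = names_via_vars(child)

# Using sets sidesteps the duplicates caused by overridden names.
print(collected == set(dir(child)))  # True for this simple case
print("greeting" in collected, "x" in collected)  # True True
```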

Inheritance and __mro__

Noticing that I didn't understand inheritance completely, gilch gave another example.

class SimpleClass:
    def __init__(self, x):
        self.x = x
    x = 42

class SimpleClass2:
    x = 24
    
class SimpleClass3(SimpleClass, SimpleClass2):
    pass

Here, SimpleClass3 inherits from SimpleClass and SimpleClass2. Both SimpleClass and SimpleClass2 define a class attribute x, so which one would SimpleClass3 have?

>>> SimpleClass3.x
42
>>> SimpleClass3.__mro__
(<class '__main__.SimpleClass3'>, <class '__main__.SimpleClass'>, <class '__main__.SimpleClass2'>, <class 'object'>)

However, this changes when we switch the order of inheritance:

class SimpleClass3(SimpleClass2, SimpleClass): # SimpleClass2 now comes first
    pass
>>> SimpleClass3.x
24
>>> SimpleClass3.__mro__
(<class '__main__.SimpleClass3'>, <class '__main__.SimpleClass2'>, <class '__main__.SimpleClass'>, <class 'object'>)

So the inheritance order decides which superclass takes precedence. The Python documentation on method resolution order as well as this talk give more detailed explanations of the algorithm.
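A rough sketch of how a class attribute lookup follows __mro__: take the first class in the order that defines the name. The helper find_in_mro and the classes A, B and C are hypothetical, and the sketch deliberately ignores descriptors, metaclasses and instance attributes, which real attribute lookup also handles.

```python
class A:
    x = 42


class B:
    x = 24
    y = "only in B"


class C(A, B):
    pass


def find_in_mro(cls, name):
    """Return (defining_class, value) for the first class in the MRO
    whose own namespace contains name."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return klass, vars(klass)[name]
    raise AttributeError(name)


print([k.__name__ for k in C.__mro__])  # ['C', 'A', 'B', 'object']

owner, value = find_in_mro(C, "x")
print(owner.__name__, value)            # A 42

owner_y, value_y = find_in_mro(C, "y")
print(owner_y.__name__, value_y)        # B only in B
```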

__slots__

Defining __slots__ on a class saves memory by replacing the per-instance __dict__ with a fixed set of attribute slots.

class SimpleClass4:
    __slots__ = ()
    
sc4 = SimpleClass4()
>>> sc4.__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'SimpleClass4' object has no attribute '__dict__'
>>> SimpleClass4.x = 42
>>> sc4.x = 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'SimpleClass4' object attribute 'x' is read-only

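What happened here is that by defining __slots__ we have prevented instances of SimpleClass4 from getting a __dict__ attribute. Instances can only have the attributes named in __slots__ (none, in this case), and not allocating a per-instance __dict__ means less memory used. The class attribute x is still readable through sc4, but it cannot be assigned on the instance.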

>>> dir(sc4)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', 'x']
>>> vars(sc4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: vars() argument must have __dict__ attribute
>>> vars(type(sc4))
mappingproxy({'__module__': '__main__', '__slots__': (), '__doc__': None, 'x': 42})

As we can see, sc4 does not have a __dict__ attribute, so vars(sc4) has become invalid too.
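The session used an empty __slots__. As a sketch of the more common case, here is a hypothetical PointSlots class (not from the session) that declares two slots: the named attributes can be assigned, anything else is rejected.

```python
class PointDict:
    """Ordinary class: every instance carries its own __dict__."""
    def __init__(self, x, y):
        self.x, self.y = x, y


class PointSlots:
    """Only the names listed in __slots__ may be set on an instance."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x, self.y = x, y


p = PointSlots(1, 2)
p.x = 10                      # fine: x is a declared slot

rejected = False
try:
    p.z = 3                   # not declared, and no __dict__ to hold it
except AttributeError:
    rejected = True

print(p.x, p.y)                               # 10 2
print(rejected)                               # True
print(hasattr(p, "__dict__"))                 # False
print(hasattr(PointDict(1, 2), "__dict__"))   # True
```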

Accessing Attributes of a Superclass

Next, gilch provided an example of using the built-in super. First, we create a class NewTuple that inherits from tuple:

class NewTuple(tuple):
    def __init__(self, x):
        self.x = x

Then we can access the constructor of the superclass by calling super().__new__ and passing in the tuple class as the first argument:

class NewTuple(tuple):
    def __init__(self, x):
        print(x)

    def __new__(cls, y):
        return super().__new__(tuple, [y])
>>> NewTuple(2)
(2,)
>>> type(_)
<class 'tuple'>

We get a tuple object when we call NewTuple(). However, this only works when the class passed as the first argument is tuple or a subtype of tuple. If we pass in list, which is not a subclass of tuple, we get an error:

class NewTuple(tuple):
    def __init__(self, x):
        print(x)

    def __new__(cls, y):
        return super().__new__(list, [y]) # passing in list instead of tuple
>>> NewTuple(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "code.py", line 8, in __new__
    return super().__new__(list, [y])
TypeError: tuple.__new__(list): list is not a subtype of tuple

Of course, we can always pass in the current class to make the constructor return an instance of the current class:

class NewTuple(tuple):
    def __init__(self, x):
        print(x)

    def __new__(cls, y):
        return super().__new__(cls, [y]) # passing in cls
>>> type(NewTuple(2))
2
<class '__main__.NewTuple'>
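One detail in the transcripts above is that the version returning a plain tuple never printed 2, while the version returning an instance of cls did. Python only calls __init__ when __new__ returns an instance of the class being constructed. A sketch that records the calls, using hypothetical class names and a calls list that are not from the session:

```python
calls = []


class ReturnsSelf(tuple):
    def __new__(cls, y):
        calls.append("__new__")
        return super().__new__(cls, [y])     # an instance of cls

    def __init__(self, y):
        calls.append("__init__")


class ReturnsTuple(tuple):
    def __new__(cls, y):
        calls.append("__new__")
        return super().__new__(tuple, [y])   # a plain tuple, not a cls instance

    def __init__(self, y):
        calls.append("__init__")             # skipped for this class


own = ReturnsSelf(2)
own_calls = list(calls)
calls.clear()

plain = ReturnsTuple(2)
plain_calls = list(calls)

print(type(own).__name__, own_calls)      # ReturnsSelf ['__new__', '__init__']
print(type(plain).__name__, plain_calls)  # tuple ['__new__']
```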

Trace

Next puzzle from gilch: make a @trace decorator that prints inputs and return values.

I came up with a first pass solution:

def trace(f):
    return lambda *args: print(*args, f(*args))

@trace
def addition(x, y):
    return x + y
>>> addition(2, 5)
2 5 7

Then gilch added a condition: the decorated function still needs to return the same value as the undecorated version.

I was pretty stumped on this one. It seemed that I'd need two different statements in the lambda function returned by the decorator for this to work, one to do the printing and the other one to return the value. So gilch gave me a hint: Think about what the expression print('hi') or 1 + 2 evaluates to. Then it occurred to me that, since print returns None, I could use or to combine statements as long as only one of them evaluates to something with boolean value True. After an attempt, I also realized that the statement that produces the True value would need to come last to prevent the expression evaluation being short-circuited.

def trace(f):
    r = []
    return lambda *args, **kwargs: r.append(f(*args, **kwargs)) or print(*r, args, kwargs) or r.pop()


@trace
def addition(x, y):
    return x + y
>>> addition(2, 3)
5 (2, 3) {}
5
>>> addition(2, y=3)
5 (2,) {'y': 3}
5
>>> addition(x=2, y=3)
5 () {'x': 2, 'y': 3}
5
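The or chain in this version depends on short-circuit evaluation: or returns the first truthy operand, or the last operand if none is truthy, and stops evaluating as soon as it finds a truthy one. A small sketch with a hypothetical show helper that records which operands were actually evaluated:

```python
evaluated = []


def show(label, value):
    """Record that this operand was evaluated, then return value unchanged."""
    evaluated.append(label)
    return value


# print returns None, so evaluation continues to the right-hand side.
first = print("hi") or 1 + 2
print(first)  # 3

# All operands before the last are falsy: every one runs, and the last
# operand is returned even though it is falsy itself.
all_run = show("a", None) or show("b", None) or show("c", 0)
all_run_order = list(evaluated)
print(all_run, all_run_order)      # 0 ['a', 'b', 'c']

# A truthy operand in the middle stops evaluation early.
evaluated.clear()
cut_short = show("a", None) or show("b", 7) or show("c", 0)
cut_short_order = list(evaluated)
print(cut_short, cut_short_order)  # 7 ['a', 'b']
```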

Progn

Gilch asked me to write a function named progn that takes any number of parameters and only returns the last one. Using progn, we can get rid of the or's:

def progn(*args):
    return args[-1]

def trace(f):
    r = []
    return lambda *args, **kwargs: progn(r.append(f(*args, **kwargs)), print(*r, args, kwargs), r.pop())

@trace
def addition(x, y):
    return x + y
>>> addition(2, 3)
5 (2, 3) {}
5
>>> addition(2, y=3)
5 (2,) {'y': 3}
5

Assignment Expression

Gilch introduced the assignment expression, and we rewrote the solution to use it:

def progn(*args):
    return args[-1]

def trace(f):
    return lambda *args, **kwargs: progn(
        r := f(*args, **kwargs),  # moved r inside of the lambda
        print(args, kwargs, r),
        r)

@trace
def addition(x, y):
    return x + y
>>> addition(2, y=3)
(2,) {'y': 3} 5
5

Earlier I was stumped because I wanted to put two statements inside a lambda function but couldn't. With progn and :=, it's possible to combine multiple statements into one, so effectively create a lambda with multiple statements.

2 comments

Nice! I always enjoy reading these logs :-)

Python objects are scattered all over the place [on the heap] ... performance degradation is the price for Python's simple memory model. ... NumPy is optimized for making use of blocks of contiguous memory.

Numpy also has the enormous advantage of implementing all the numeric operators in C (or Fortran, or occasionally assembly). (If you want hardware accelerators, interop is a promising work in progress)

You can substantially reduce memory fragmentation and GC pressure with only the standard library array module and memoryview builtin type, if your data suits that pattern. This is particularly useful to implement zero-copy algorithms for IO processing; as soon as the buffer is in memory anywhere you just take pointers to slices rather than creating new objects.
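A minimal sketch of that pattern, assuming a small array of doubles as the buffer (the data and variable names are illustrative): slicing the memoryview gives a window onto the same memory rather than a copy.

```python
from array import array

data = array("d", [1.0, 2.0, 3.0, 4.0])  # doubles stored in one contiguous block
view = memoryview(data)

tail = view[2:]      # a slice of the view: no elements are copied
tail[0] = 30.0       # writes through to the underlying array

print(data)           # array('d', [1.0, 2.0, 30.0, 4.0])
print(tail.tolist())  # [30.0, 4.0]

# The view covers exactly the array's buffer.
print(view.nbytes == data.itemsize * len(data))  # True
```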

JIT implementations of Python (PyPy, Pyjion, etc) are also usually pretty good at reducing the perf impact of Python's memory model, at least if your program is reasonably sensible about what and when it allocates.

With progn and :=, it's possible to combine multiple statements into one, so effectively create a lambda with multiple statements.

Sounds like you're partway to updating onelinerizer.com for Python 3!

Sounds like you're partway to updating onelinerizer.com for Python 3!

For the avoidance of doubt, the "obvious way" to do this (for an acculturated Python programmer) is with a nested def, which makes the progn thing non-obvious and therefore unpythonic. I strongly hinted at the obvious approach here, but konstell latched onto using a lambda instead (probably because she didn't realize that named functions could also be closures). I saw a teaching opportunity in this, so I rolled with it. I got to dispel the myth that lambdas can only have one line and also introduced assignment expressions. I was going to get around to the obvious way, but we ran out of time.
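A sketch of what the nested def approach might look like. This is an assumed reconstruction, not the code from the session; the use of functools.wraps is an addition to keep the decorated function's name.

```python
import functools


def trace(f):
    @functools.wraps(f)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        result = f(*args, **kwargs)
        print(args, kwargs, result)
        return result    # same value the undecorated function returns
    return wrapper       # wrapper is a closure over f


@trace
def addition(x, y):
    return x + y


value = addition(2, y=3)  # prints: (2,) {'y': 3} 5
print(value)              # 5
```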

With progn and :=, it's possible to combine multiple statements into one, so effectively create a lambda with multiple statements.

konstell is using the terminology a little imprecisely here. In Python, an "expression" evaluates to an object, while a "statement" is an instruction that does not evaluate to an object (not even None). Most statement types can contain expressions; however, expressions cannot contain statements (exec() doesn't count).

One of the simplest types of statements in Python is the "expression statement", which contains a single expression and discards its result. A progn() expression can discard the results of subexpressions in a similar way, making them act like expression statements, but they are not technically Python statements. We also found an expression substitute for an assignment statement. It's ultimately possible to use expressions for everything you'd normally use statements for, but this is not the "obvious way" to do it.

See my Drython and Hissp projects for more on "onelinerizing" Python.