
David Peter 820a31af5d [red-knot] Attribute access and the descriptor protocol (#16416)

## Summary

* Attributes/methods are now properly looked up on metaclasses when
accessed on class objects
* We properly distinguish between data descriptors and non-data
descriptors (but we do not yet support them in store-context, i.e.
`obj.data_descr = …`)
* The descriptor protocol is now implemented in a single unified place
for instances, classes and dunder-calls. Unions and possibly-unbound
symbols are supported in all possible stages of the process by creating
union types as results.
* In general, the handling of "possibly-unbound" symbols has been
improved in a lot of places: metaclass attributes, attributes,
descriptors with possibly-unbound `__get__` methods, instance
attributes, …
* We keep track of type qualifiers in a lot more places. I anticipate
that this will be useful if we import e.g. `Final` symbols from other
modules (see relevant change to typing spec:
https://github.com/python/typing/pull/1937).
* Detection and special-casing of the `typing.Protocol` special form, in
order to avoid lots of changes in the test suite due to new `@Todo`
types when looking up attributes on builtin types which have `Protocol`
in their MRO. We previously looked up attributes incorrectly, which is
why this didn't come up before.

closes #16367
closes #15966

## Context

The way attribute lookup in `Type::member` worked before was simply
wrong (mostly my own fault). The whole instance-attribute lookup should
probably never have been integrated into `Type::member`. And the
`Type::static_member` function that I introduced in my last descriptor
PR was the wrong abstraction. It's kind of fascinating how far this
approach took us, but I am pretty confident that the new approach
proposed here is what we need to model this correctly.

There are three key pieces that are required to implement attribute
lookups:

- **`Type::class_member`**/**`Type::find_in_mro`**: The
`Type::find_in_mro` method looks up attributes on class bodies
(and the corresponding bases). This is a partial function on types, as it
cannot be called on instance types like `Type::Instance(…)` or
`Type::IntLiteral(…)`. For this reason, we usually call it through
`Type::class_member`, which is essentially just
`type.to_meta_type().find_in_mro(…)` plus union/intersection handling.
- **`Type::instance_member`**: This new function is basically the
type-level equivalent to `obj.__dict__[name]` when called on
`Type::Instance(…)`. We use this to discover instance attributes such as
those that we see as declarations on class bodies or as (annotated)
assignments to `self.attr` in methods of a class.
- The implementation of the descriptor protocol. It works slightly
  differently for instances and for class objects, but it can be described
  by a general framework:
  - Call `type.class_member("attribute")` to look up "attribute" in the
    MRO of the meta type of `type`. Call the resulting `Symbol` `meta_attr`
    (even if it's unbound).
  - Use `meta_attr.class_member("__get__")` to look up `__get__` on the
    *meta type* of `meta_attr`. Call it with
    `__get__(meta_attr, self, self.to_meta_type())`. If this fails (either
    the lookup or the call), just proceed with `meta_attr`. Otherwise,
    replace `meta_attr` in the following with the return type of `__get__`.
    In this step, we also probe whether a `__set__` or `__delete__` method
    exists and store the result in `meta_attr_kind` (either "data
    descriptor" or "normal attribute or non-data descriptor").
  - Compute a `fallback` type.
    - For instances, we use `self.instance_member("attribute")`.
    - For class objects, we use
      `class_attr = self.find_in_mro("attribute")`, and then try to invoke
      the descriptor protocol on `class_attr`, i.e. we look up `__get__` on
      the meta type of `class_attr` and call it with
      `__get__(class_attr, None, self)`. This additional invocation of the
      descriptor protocol on the fallback type is one major asymmetry in
      the otherwise universal descriptor protocol implementation.
  - Finally, we look at `meta_attr`, `meta_attr_kind` and `fallback`, and
    handle the various cases of (possible) unboundness of these symbols:
    - If `meta_attr` is bound and a data descriptor, just return `meta_attr`.
    - If `meta_attr` is not a data descriptor and `fallback` is bound, just
      return `fallback`.
    - If `meta_attr` is not a data descriptor and `fallback` is unbound,
      return `meta_attr`.
    - Return unions of these three possibilities for partially-bound
      symbols.
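The precedence rules in the last step mirror what CPython's `object.__getattribute__` does for instances at runtime. As an illustration only, here is a minimal pure-Python sketch of that runtime behavior (not of red-knot's type-level Rust implementation). The helpers `find_in_mro`, `is_data_descriptor` and `instance_getattr` are hypothetical names chosen to echo the ones above, and the sketch deliberately ignores `__getattr__`, `__slots__` and other edge cases:

```python
_MISSING = object()  # sentinel standing in for "unbound"


def find_in_mro(cls: type, name: str) -> object:
    """Look up `name` in the class bodies along `cls.__mro__` without invoking descriptors."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING


def is_data_descriptor(attr: object) -> bool:
    """An attribute is a data descriptor if its type defines `__set__` or `__delete__`."""
    meta = type(attr)
    return (
        find_in_mro(meta, "__set__") is not _MISSING
        or find_in_mro(meta, "__delete__") is not _MISSING
    )


def instance_getattr(obj: object, name: str) -> object:
    owner = type(obj)
    # Look up the attribute on the meta type of `obj` (its class), then look up
    # `__get__` on the meta type of that attribute.
    meta_attr = find_in_mro(owner, name)
    meta_get = find_in_mro(type(meta_attr), "__get__")
    # A data descriptor with `__get__` wins over the instance dictionary.
    if meta_get is not _MISSING and is_data_descriptor(meta_attr):
        return meta_get(meta_attr, obj, owner)
    # Fallback: the instance's own `__dict__` (the runtime analogue of `instance_member`).
    instance_dict = getattr(obj, "__dict__", {})
    if name in instance_dict:
        return instance_dict[name]
    # Otherwise, bind a non-data descriptor or return the plain class attribute.
    if meta_get is not _MISSING:
        return meta_get(meta_attr, obj, owner)
    if meta_attr is not _MISSING:
        return meta_attr
    raise AttributeError(name)


class Demo:
    x = "class attribute"

    @property
    def p(self) -> str:
        return "from property"

    def m(self) -> str:
        return "from method"


d = Demo()
d.__dict__["p"] = "ignored"  # cannot shadow the data descriptor `p`
d.__dict__["m"] = "from instance dict"  # shadows the non-data descriptor `m`
```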

This allows us to handle class objects and instances within the same
framework. There is a minor additional detail where for instances, we do
not allow the fallback type (the instance attribute) to completely
shadow the non-data descriptor. We do this because we (currently) don't
want to pretend that we can statically infer that an instance attribute
is always set.
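For class objects, CPython's `type.__getattribute__` follows the same pattern at runtime: the metaclass plays the role of the meta type, and the fallback is the class's own MRO with the extra `__get__(class_attr, None, cls)` call described above. A minimal sketch under the same assumptions (hypothetical helper names, no `__getattr__` handling, illustrative classes `Meta` and `C`):

```python
_MISSING = object()  # sentinel standing in for "unbound"


def find_in_mro(cls: type, name: str) -> object:
    for klass in cls.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING


def is_data_descriptor(attr: object) -> bool:
    meta = type(attr)
    return (
        find_in_mro(meta, "__set__") is not _MISSING
        or find_in_mro(meta, "__delete__") is not _MISSING
    )


def class_getattr(cls: type, name: str) -> object:
    meta = type(cls)
    # Look up the attribute on the metaclass first.
    meta_attr = find_in_mro(meta, name)
    meta_get = find_in_mro(type(meta_attr), "__get__")
    if meta_get is not _MISSING and is_data_descriptor(meta_attr):
        return meta_get(meta_attr, cls, meta)
    # Fallback: the class's own MRO, with the descriptor protocol invoked
    # again as __get__(class_attr, None, cls).
    class_attr = find_in_mro(cls, name)
    if class_attr is not _MISSING:
        class_get = find_in_mro(type(class_attr), "__get__")
        if class_get is not _MISSING:
            return class_get(class_attr, None, cls)
        return class_attr
    # Otherwise, bind a non-data descriptor found on the metaclass.
    if meta_get is not _MISSING:
        return meta_get(meta_attr, cls, meta)
    if meta_attr is not _MISSING:
        return meta_attr
    raise AttributeError(name)


class Meta(type):
    def greet(cls) -> str:
        return "from metaclass"


class C(metaclass=Meta):
    def method(self) -> str:
        return "method"

    @classmethod
    def factory(cls) -> type:
        return cls
```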

Dunder method calls can also be embedded into this framework. The only
thing that changes is that *there is no fallback type*. If a dunder
method is called on an instance, we do not fall back to instance
variables. If a dunder method is called on a class object, we only look
it up on the metaclass, never on the class itself.
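The same rule is observable at runtime: implicit dunder invocations such as `len(x)` look the method up on `type(x)` and skip the instance dictionary, and for a class object they consult only the metaclass. A small demonstration (the class names are illustrative):

```python
class Sized:
    def __len__(self) -> int:
        return 3


s = Sized()
# An instance attribute named __len__ is visible to explicit attribute access...
s.__len__ = lambda: 99
explicit = s.__len__()  # uses the instance attribute
implicit = len(s)  # ...but len() looks up type(s).__len__ and ignores it


class Meta(type):
    def __len__(cls) -> int:
        return 7


class WithMeta(metaclass=Meta):
    def __len__(self) -> int:
        return 1


# For a class object, len() consults the metaclass, never the class body.
class_len = len(WithMeta)
instance_len = len(WithMeta())
```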

## Test Plan

New Markdown tests.

2025-03-07 22:03:28 +01:00


Methods

Background: Functions as descriptors

Note: See also this related section in the descriptor guide: Functions and methods.

Say we have a simple class C with a function definition f inside its body:

class C:
    def f(self, x: int) -> str:
        return "a"

Whenever we access the f attribute through the class object itself (C.f) or through an instance (C().f), this access happens via the descriptor protocol. Functions are (non-data) descriptors because they implement a __get__ method. This is crucial in making sure that method calls work as expected. In general, the signature of the __get__ method in the descriptor protocol is __get__(self, instance, owner). The self argument is the descriptor object itself (f). The passed value for the instance argument depends on whether the attribute is accessed from the class object (in which case it is None), or from an instance (in which case it is the instance of type C). The owner argument is the class itself (C of type Literal[C]). To summarize:

C.f is equivalent to getattr_static(C, "f").__get__(None, C)
C().f is equivalent to getattr_static(C, "f").__get__(C(), C)

Here, inspect.getattr_static is used to bypass the descriptor protocol and directly access the function attribute. The special __get__ method on functions works as follows: in the former case, where the instance argument is None, __get__ simply returns the function itself; in the latter case, it returns a bound method object:

from inspect import getattr_static

reveal_type(getattr_static(C, "f"))  # revealed: Literal[f]

reveal_type(getattr_static(C, "f").__get__)  # revealed: <method-wrapper `__get__` of `f`>

reveal_type(getattr_static(C, "f").__get__(None, C))  # revealed: Literal[f]
reveal_type(getattr_static(C, "f").__get__(C(), C))  # revealed: <bound method `f` of `C`>

In conclusion, this is why we see the following two types when accessing the f attribute on the class object C and on an instance C():

reveal_type(C.f)  # revealed: Literal[f]
reveal_type(C().f)  # revealed: <bound method `f` of `C`>
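To make the binding step concrete, the behavior of function.__get__ can be emulated in pure Python, along the lines of the pure-Python equivalent in the descriptor guide. This is a runtime sketch only: Function is a hypothetical wrapper class, and types.MethodType stands in for the bound method object:

```python
from types import MethodType


class Function:
    """A hypothetical non-data descriptor that mimics how functions bind."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class object: return the descriptor itself.
            return self
        # Accessed on an instance: bind the instance as the first argument.
        return MethodType(self, instance)


class C:
    @Function
    def f(self, x: int) -> str:
        return "a"


unbound = C.f  # the Function object itself
bound = C().f  # a bound method whose __func__ is the Function object
```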

A bound method is a callable object that holds a reference to the instance it is bound to (which can be inspected via __self__) and to the function object it wraps (which can be inspected via __func__):

bound_method = C().f

reveal_type(bound_method.__self__)  # revealed: C
reveal_type(bound_method.__func__)  # revealed: Literal[f]

When we call the bound method, the instance is implicitly passed as the first argument (self):

reveal_type(C().f(1))  # revealed: str
reveal_type(bound_method(1))  # revealed: str

When we call the function object itself, we need to pass the instance explicitly:

C.f(1)  # error: [missing-argument]

reveal_type(C.f(C(), 1))  # revealed: str

When we access methods from derived classes, they will be bound to instances of the derived class:

class D(C):
    pass

reveal_type(D().f)  # revealed: <bound method `f` of `D`>
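At runtime, the result is a types.MethodType object whose __self__ is the D instance, while __func__ is still the function defined in C. The same binding can be reproduced by hand with the MethodType constructor. A small sketch (manual is an illustrative name):

```python
from types import MethodType


class C:
    def f(self, x: int) -> str:
        return "a"


class D(C):
    pass


d = D()
bound = d.f  # binding happens via C.__dict__["f"].__get__(d, D)
manual = MethodType(C.f, d)  # the same binding, done by hand
```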

If we access an attribute on a bound method object itself, it will defer to types.MethodType:

reveal_type(bound_method.__hash__)  # revealed: <bound method `__hash__` of `MethodType`>

If an attribute is not available on the bound method object, it will be looked up on the underlying function object. We model this explicitly, which means that we can access __kwdefaults__ on bound methods, even though it is not available on types.MethodType:

reveal_type(bound_method.__kwdefaults__)  # revealed: @Todo(generics) | None
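At runtime, this forwarding is performed by the bound method object itself: attributes it does not define are looked up on its __func__. A quick check with a hypothetical method that has a keyword-only default:

```python
class C:
    def g(self, *, flag: bool = False) -> None:
        pass


bound = C().g
# The lookup of __kwdefaults__ is forwarded to the underlying function.
kwdefaults = bound.__kwdefaults__
```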

Basic method calls on class objects and instances

class Base:
    def method_on_base(self, x: int | None) -> str:
        return "a"

class Derived(Base):
    def method_on_derived(self, x: bytes) -> tuple[int, str]:
        return (1, "a")

reveal_type(Base().method_on_base(1))  # revealed: str
reveal_type(Base.method_on_base(Base(), 1))  # revealed: str

Base().method_on_base("incorrect")  # error: [invalid-argument-type]
Base().method_on_base()  # error: [missing-argument]
Base().method_on_base(1, 2)  # error: [too-many-positional-arguments]

reveal_type(Derived().method_on_base(1))  # revealed: str
reveal_type(Derived().method_on_derived(b"abc"))  # revealed: tuple[int, str]
reveal_type(Derived.method_on_base(Derived(), 1))  # revealed: str
reveal_type(Derived.method_on_derived(Derived(), b"abc"))  # revealed: tuple[int, str]

Method calls on literals

Boolean literals

reveal_type(True.bit_length())  # revealed: int
reveal_type(True.as_integer_ratio())  # revealed: tuple[int, Literal[1]]

Integer literals

reveal_type((42).bit_length())  # revealed: int

String literals

reveal_type("abcde".find("abc"))  # revealed: int
reveal_type("foo".encode(encoding="utf-8"))  # revealed: bytes

"abcde".find(123)  # error: [invalid-argument-type]

Bytes literals

reveal_type(b"abcde".startswith(b"abc"))  # revealed: bool

Method calls on `LiteralString`

from typing_extensions import LiteralString

def f(s: LiteralString) -> None:
    reveal_type(s.find("a"))  # revealed: int

Method calls on `tuple`

def f(t: tuple[int, str]) -> None:
    reveal_type(t.index("a"))  # revealed: int

Method calls on unions

from typing import Any

class A:
    def f(self) -> int:
        return 1

class B:
    def f(self) -> str:
        return "a"

def f(a_or_b: A | B, any_or_a: Any | A):
    reveal_type(a_or_b.f)  # revealed: <bound method `f` of `A`> | <bound method `f` of `B`>
    reveal_type(a_or_b.f())  # revealed: int | str

    reveal_type(any_or_a.f)  # revealed: Any | <bound method `f` of `A`>
    reveal_type(any_or_a.f())  # revealed: Any | int

Method calls on `KnownInstance` types

[environment]
python-version = "3.12"

type IntOrStr = int | str

reveal_type(IntOrStr.__or__)  # revealed: <bound method `__or__` of `typing.TypeAliasType`>

Error cases: Calling `__get__` for methods

The __get__ method on types.FunctionType has the following overloaded signature in typeshed:

from types import FunctionType, MethodType
from typing import overload

@overload
def __get__(self, instance: None, owner: type, /) -> FunctionType: ...
@overload
def __get__(self, instance: object, owner: type | None = None, /) -> MethodType: ...
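At runtime, the two overloads correspond to the two cases described earlier: with instance=None the function itself is returned, and otherwise a MethodType is returned, with owner being optional. CPython rejects passing None for both arguments with a TypeError. A quick sketch:

```python
from types import FunctionType, MethodType


class C:
    def f(self, x: int) -> str:
        return "a"


func = C.__dict__["f"]
from_class = func.__get__(None, C)  # first overload: the function itself
from_instance = func.__get__(C())  # second overload, owner omitted

try:
    func.__get__(None)  # neither an instance nor an owner
    rejected = False
except TypeError:
    rejected = True
```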

Here, we test that this signature is enforced correctly:

from inspect import getattr_static

class C:
    def f(self, x: int) -> str:
        return "a"

method_wrapper = getattr_static(C, "f").__get__

reveal_type(method_wrapper)  # revealed: <method-wrapper `__get__` of `f`>

# All of these are fine:
method_wrapper(C(), C)
method_wrapper(C())
method_wrapper(C(), None)
method_wrapper(None, C)

# Passing `None` without an `owner` argument is an
# error: [missing-argument] "No argument provided for required parameter `owner`"
method_wrapper(None)

# Passing something that is not assignable to `type` as the `owner` argument is an
# error: [invalid-argument-type] "Object of type `Literal[1]` cannot be assigned to parameter 2 (`owner`) of method wrapper `__get__` of function `f`; expected type `type`"
method_wrapper(None, 1)

# Passing `None` as the `owner` argument when `instance` is `None` is an
# error: [invalid-argument-type] "Object of type `None` cannot be assigned to parameter 2 (`owner`) of method wrapper `__get__` of function `f`; expected type `type`"
method_wrapper(None, None)

# Calling `__get__` without any arguments is an
# error: [missing-argument] "No argument provided for required parameter `instance`"
method_wrapper()

# Calling `__get__` with too many positional arguments is an
# error: [too-many-positional-arguments] "Too many positional arguments to method wrapper `__get__` of function `f`: expected 2, got 3"
method_wrapper(C(), C, "one too many")

Fallback to metaclass

When a method is accessed on a class object, it is looked up on the metaclass if it is not found on the class itself. This also creates a bound method that is bound to the class object itself:

from __future__ import annotations

class Meta(type):
    def f(cls, arg: int) -> str:
        return "a"

class C(metaclass=Meta):
    pass

reveal_type(C.f)  # revealed: <bound method `f` of `Literal[C]`>
reveal_type(C.f(1))  # revealed: str

The method f cannot be accessed from an instance of the class:

# error: [unresolved-attribute] "Type `C` has no attribute `f`"
C().f

A metaclass function can be shadowed by a method on the class:

from typing import Any, Literal

class D(metaclass=Meta):
    def f(arg: int) -> Literal["a"]:
        return "a"

reveal_type(D.f(1))  # revealed: Literal["a"]
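These rules match CPython's runtime behavior, which can be checked directly. A sketch with the classes from above redefined so the snippet is self-contained (D.f returns a different string here only to make the shadowing visible):

```python
class Meta(type):
    def f(cls, arg: int) -> str:
        return "a"


class C(metaclass=Meta):
    pass


class D(metaclass=Meta):
    def f(arg: int) -> str:
        return "from D"


# Found on the metaclass and bound to the class object.
bound_to_class = C.f
# Instance lookup searches the instance dict and C.__mro__, not the metaclass.
visible_on_instance = hasattr(C(), "f")
# A plain function in the class body shadows the (non-data) metaclass method.
shadowed = D.f(1)
```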

If the method defined on the class is possibly unbound, we union the return types:

def flag() -> bool:
    return True

class E(metaclass=Meta):
    if flag():
        def f(arg: int) -> Any:
            return "a"

reveal_type(E.f(1))  # revealed: str | Any

`@classmethod`

Basic

When a @classmethod attribute is accessed, it returns a bound method object, even when accessed on the class object itself:

from __future__ import annotations

class C:
    @classmethod
    def f(cls: type[C], x: int) -> str:
        return "a"

reveal_type(C.f)  # revealed: <bound method `f` of `Literal[C]`>
reveal_type(C().f)  # revealed: <bound method `f` of `type[C]`>

The class object is then implicitly passed as the first argument (cls) when calling the method:

reveal_type(C.f(1))  # revealed: str
reveal_type(C().f(1))  # revealed: str

When the class method is called incorrectly, we detect it:

C.f("incorrect")  # error: [invalid-argument-type]
C.f()  # error: [missing-argument]
C.f(1, 2)  # error: [too-many-positional-arguments]

If the cls parameter is wrongly annotated, we emit an error at the call site:

class D:
    @classmethod
    def f(cls: D):
        # This function is wrongly annotated, it should be `type[D]` instead of `D`
        pass

# error: [invalid-argument-type] "Object of type `Literal[D]` cannot be assigned to parameter 1 (`cls`) of bound method `f`; expected type `D`"
D.f()

When a class method is accessed on a derived class, it is bound to that derived class:

class Derived(C):
    pass

reveal_type(Derived.f)  # revealed: <bound method `f` of `Literal[Derived]`>
reveal_type(Derived().f)  # revealed: <bound method `f` of `type[Derived]`>

reveal_type(Derived.f(1))  # revealed: str
reveal_type(Derived().f(1))  # revealed: str
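At runtime, classmethod binds to the class in both cases, whether accessed on the class object or on an instance, which is what the Literal[...] and type[...] results above model. A quick check (f returns cls.__name__ here only to make the binding visible):

```python
class C:
    @classmethod
    def f(cls, x: int) -> str:
        return cls.__name__


class Derived(C):
    pass
```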

Accessing the classmethod as a static member

Accessing a @classmethod-decorated function without invoking the descriptor protocol (e.g. via getattr_static) returns a classmethod object at runtime. We currently don't model this explicitly:

from inspect import getattr_static

class C:
    @classmethod
    def f(cls): ...

reveal_type(getattr_static(C, "f"))  # revealed: Literal[f]
reveal_type(getattr_static(C, "f").__get__)  # revealed: <method-wrapper `__get__` of `f`>

But we correctly model how the classmethod descriptor works:

reveal_type(getattr_static(C, "f").__get__(None, C))  # revealed: <bound method `f` of `Literal[C]`>
reveal_type(getattr_static(C, "f").__get__(C(), C))  # revealed: <bound method `f` of `Literal[C]`>
reveal_type(getattr_static(C, "f").__get__(C()))  # revealed: <bound method `f` of `type[C]`>

The owner argument takes precedence over the instance argument:

reveal_type(getattr_static(C, "f").__get__("dummy", C))  # revealed: <bound method `f` of `Literal[C]`>
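For reference, this is what happens at runtime: getattr_static returns the classmethod object, and its __get__ binds to owner when one is given and falls back to type(instance) otherwise. A small sketch:

```python
from inspect import getattr_static


class C:
    @classmethod
    def f(cls):
        return cls


raw = getattr_static(C, "f")
via_owner = raw.__get__(None, C)
via_instance = raw.__get__(C())  # owner omitted: falls back to type(instance)
owner_wins = raw.__get__("dummy", C)
```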

Classmethods mixed with other decorators

When a @classmethod is additionally decorated with another decorator, it is still treated as a class method:

from __future__ import annotations

def does_nothing[T](f: T) -> T:
    return f

class C:
    @classmethod
    @does_nothing
    def f1(cls: type[C], x: int) -> str:
        return "a"

    @does_nothing
    @classmethod
    def f2(cls: type[C], x: int) -> str:
        return "a"

# TODO: We do not support decorators yet (only limited special cases). Eventually,
# these should all return `str`:

reveal_type(C.f1(1))  # revealed: @Todo(return type of decorated function)
reveal_type(C().f1(1))  # revealed: @Todo(return type of decorated function)

reveal_type(C.f2(1))  # revealed: @Todo(return type of decorated function)
reveal_type(C().f2(1))  # revealed: @Todo(return type of decorated function)
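At runtime, both decorator orders produce working class methods in this example, because the pass-through decorator returns its argument unchanged, whether that is a plain function or a classmethod object. A sketch using a TypeVar-based version of does_nothing so that it also runs on Python versions before 3.12:

```python
from typing import TypeVar

T = TypeVar("T")


def does_nothing(f: T) -> T:
    return f


class C:
    @classmethod
    @does_nothing
    def f1(cls, x: int) -> str:
        return "a"

    @does_nothing
    @classmethod
    def f2(cls, x: int) -> str:
        return "a"
```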
