ruff/crates/red_knot_python_semantic/resources/mdtest/generics/scoping.md
David Peter 820a31af5d [red-knot] Attribute access and the descriptor protocol (#16416)
## Summary

* Attributes/methods are now properly looked up on metaclasses when
called on class objects
* We properly distinguish between data descriptors and non-data
descriptors (but we do not yet support them in store-context, i.e.
`obj.data_descr = …`)
* The descriptor protocol is now implemented in a single unified place
for instances, classes and dunder-calls. Unions and possibly-unbound
symbols are supported in all possible stages of the process by creating
union types as results.
* In general, the handling of "possibly-unbound" symbols has been
improved in a lot of places: metaclass attributes, attributes,
descriptors with possibly-unbound `__get__` methods, instance
attributes, …
* We keep track of type qualifiers in a lot more places. I anticipate
that this will be useful if we import e.g. `Final` symbols from other
modules (see relevant change to typing spec:
https://github.com/python/typing/pull/1937).
* Detection and special-casing of the `typing.Protocol` special form in
order to avoid lots of changes in the test suite due to new `@Todo`
types when looking up attributes on builtin types which have `Protocol`
in their MRO. We previously looked up attributes in the wrong way,
which is why this didn't come up before.

closes #16367
closes #15966

## Context

The way attribute lookup in `Type::member` worked before was simply
wrong (mostly my own fault). The whole instance-attribute lookup should
probably never have been integrated into `Type::member`. And the
`Type::static_member` function that I introduced in my last descriptor
PR was the wrong abstraction. It's kind of fascinating how far this
approach took us, but I am pretty confident that the new approach
proposed here is what we need to model this correctly.

There are three key pieces that are required to implement attribute
lookups:

- **`Type::class_member`**/**`Type::find_in_mro`**: The
`Type::find_in_mro` method looks up attributes on class bodies (and the
corresponding bases). This is a partial function on types, as it
cannot be called on instance types like `Type::Instance(…)` or
`Type::IntLiteral(…)`. For this reason, we usually call it through
`Type::class_member`, which is essentially just
`type.to_meta_type().find_in_mro(…)` plus union/intersection handling.
- **`Type::instance_member`**: This new function is basically the
type-level equivalent of `obj.__dict__[name]` when called on
`Type::Instance(…)`. We use this to discover instance attributes such as
those that we see as declarations on class bodies or as (annotated)
assignments to `self.attr` in methods of a class.
- The implementation of the descriptor protocol. It works slightly
  differently for instances and for class objects, but it can be
  described by the general framework:
  - Call `type.class_member("attribute")` to look up "attribute" in the
    MRO of the meta type of `type`. Call the resulting `Symbol` `meta_attr`
    (even if it's unbound).
  - Use `meta_attr.class_member("__get__")` to look up `__get__` on the
    *meta type* of `meta_attr`. Call it with
    `__get__(meta_attr, self, self.to_meta_type())`. If this fails (either
    the lookup or the call), just proceed with `meta_attr`. Otherwise,
    replace `meta_attr` in the following with the return type of
    `__get__`. In this step, we also probe whether a `__set__` or
    `__delete__` method exists and store the result in `meta_attr_kind`
    (either "data descriptor" or "normal attribute or non-data
    descriptor").
  - Compute a `fallback` type.
    - For instances, we use `self.instance_member("attribute")`.
    - For class objects, we use `class_attr = self.find_in_mro("attribute")`,
      and then try to invoke the descriptor protocol on `class_attr`, i.e.
      we look up `__get__` on the meta type of `class_attr` and call it
      with `__get__(class_attr, None, self)`. This additional invocation
      of the descriptor protocol on the fallback type is one major
      asymmetry in the otherwise universal descriptor protocol
      implementation.
  - Finally, we look at `meta_attr`, `meta_attr_kind` and `fallback`, and
    handle the various cases of (possible) unboundness of these symbols:
    - If `meta_attr` is bound and a data descriptor, just return `meta_attr`.
    - If `meta_attr` is not a data descriptor, and `fallback` is bound,
      just return `fallback`.
    - If `meta_attr` is not a data descriptor, and `fallback` is unbound,
      return `meta_attr`.
    - Return unions of these three possibilities for partially-bound
      symbols.
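The precedence rules above mirror what CPython does at runtime for `obj.attr`. As a rough illustration (a runtime model in plain Python, not red_knot's type-level Rust code), the instance case can be sketched as follows. `find_in_mro` and `lookup_instance_attribute` are hypothetical helpers named after the methods described above, and the sketch ignores `__getattr__`, `__slots__` and other corner cases.

```python
# Runtime model (CPython semantics) of `obj.name` for an ordinary instance.
# `find_in_mro` and `lookup_instance_attribute` are hypothetical helpers named
# after the red_knot methods; __getattr__, __slots__ etc. are ignored.
_MISSING = object()  # sentinel standing in for an unbound symbol


def find_in_mro(cls, name):
    """Return the first `name` entry found in the class dicts along cls.__mro__."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING


def lookup_instance_attribute(obj, name):
    meta_type = type(obj)
    meta_attr = find_in_mro(meta_type, name)  # analogous to type.class_member(name)
    getter = _MISSING
    if meta_attr is not _MISSING:
        descr_type = type(meta_attr)  # __get__/__set__ are looked up on the meta type
        getter = find_in_mro(descr_type, "__get__")
        is_data_descriptor = (
            find_in_mro(descr_type, "__set__") is not _MISSING
            or find_in_mro(descr_type, "__delete__") is not _MISSING
        )
        # 1. A data descriptor on the class wins over the instance dict.
        if is_data_descriptor and getter is not _MISSING:
            return getter(meta_attr, obj, meta_type)

    # 2. Fallback: the instance's own __dict__ (runtime analogue of instance_member).
    instance_dict = getattr(obj, "__dict__", {})
    if name in instance_dict:
        return instance_dict[name]

    # 3. Otherwise: bind a non-data descriptor via __get__, or return the plain class attribute.
    if meta_attr is not _MISSING:
        return getter(meta_attr, obj, meta_type) if getter is not _MISSING else meta_attr
    raise AttributeError(name)


class Demo:
    plain = "class-level"

    @property
    def prop(self):  # data descriptor: property defines __get__ and __set__
        return "from property"

    def method(self):  # non-data descriptor: functions only define __get__
        return "from method"


d = Demo()
d.__dict__["prop"] = "ignored"              # cannot shadow the data descriptor
d.__dict__["method"] = "instance override"  # does shadow the non-data descriptor
```

Note that at runtime the instance entry fully shadows `method`. As the next paragraph explains, red_knot deliberately models this more conservatively, because it cannot prove that an instance attribute is always set.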

This allows us to handle class objects and instances within the same
framework. There is a minor additional detail where for instances, we do
not allow the fallback type (the instance attribute) to completely
shadow the non-data descriptor. We do this because we (currently) don't
want to pretend that we can statically infer that an instance attribute
is always set.
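For class objects, the same framework applies one level up: the metaclass plays the role of the meta type, and the class's own MRO supplies the fallback, including the extra `__get__(class_attr, None, self)` call described above. The sketch below is again a runtime model of CPython's behavior for `C.attr`, not red_knot code. `find_in_mro`, `is_data_descriptor` and `lookup_class_attribute` are hypothetical helper names.

```python
# Runtime model (CPython semantics) of `cls.name` on a class object.
# All helper names here are hypothetical; metaclass __getattr__ is ignored.
_MISSING = object()  # sentinel standing in for an unbound symbol


def find_in_mro(cls, name):
    for klass in cls.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING


def is_data_descriptor(attr):
    t = type(attr)
    return find_in_mro(t, "__set__") is not _MISSING or find_in_mro(t, "__delete__") is not _MISSING


def lookup_class_attribute(cls, name):
    metaclass = type(cls)
    meta_attr = find_in_mro(metaclass, name)  # lookup on the metaclass MRO
    meta_get = _MISSING if meta_attr is _MISSING else find_in_mro(type(meta_attr), "__get__")
    # 1. A data descriptor on the metaclass takes precedence.
    if meta_get is not _MISSING and is_data_descriptor(meta_attr):
        return meta_get(meta_attr, cls, metaclass)
    # 2. Fallback: the class's own MRO, with __get__(class_attr, None, cls) applied.
    class_attr = find_in_mro(cls, name)
    if class_attr is not _MISSING:
        local_get = find_in_mro(type(class_attr), "__get__")
        return local_get(class_attr, None, cls) if local_get is not _MISSING else class_attr
    # 3. Otherwise use the (non-data) metaclass attribute.
    if meta_get is not _MISSING:
        return meta_get(meta_attr, cls, metaclass)
    if meta_attr is not _MISSING:
        return meta_attr
    raise AttributeError(name)


class Meta(type):
    @property
    def label(cls):  # data descriptor on the metaclass
        return f"meta label for {cls.__name__}"

    def greet(cls):  # non-data descriptor on the metaclass
        return f"hello from {cls.__name__}"


class C(metaclass=Meta):
    label = "class label"  # loses against Meta.label (a data descriptor)

    def greet(self):  # wins against Meta.greet for `C.greet`
        return "instance greeting"

    @classmethod
    def make(cls):
        return cls.__name__


class D(metaclass=Meta):
    pass  # no `greet` of its own, so Meta.greet is bound to D
```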

Dunder method calls can also be embedded into this framework. The only
thing that changes is that *there is no fallback type*. If a dunder
method is called on an instance, we do not fall back to instance
variables. If a dunder method is called on a class object, we only look
it up on the metaclass, never on the class itself.
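This matches CPython, where implicit dunder calls such as `len(obj)` consult only `type(obj)`. The sketch below shows that behavior directly; `call_dunder` is a hypothetical helper that performs the same "no fallback" lookup by hand, not an API from any library.

```python
# Runtime illustration: implicit dunder calls consult type(obj) only.
# `call_dunder` is a hypothetical helper, not part of any library.
_MISSING = object()


def find_in_mro(cls, name):
    for klass in cls.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING


def call_dunder(obj, name, *args):
    """Resolve `name` on type(obj) (never on obj itself) and call it."""
    meta_type = type(obj)
    attr = find_in_mro(meta_type, name)
    if attr is _MISSING:
        raise TypeError(f"{meta_type.__name__!r} object does not support {name}")
    getter = find_in_mro(type(attr), "__get__")
    bound = getter(attr, obj, meta_type) if getter is not _MISSING else attr
    return bound(*args)


class Sized:
    def __len__(self):
        return 3


class Meta(type):
    def __len__(cls):  # used for len(Box), i.e. on the class object
        return 42


class Box(metaclass=Meta):
    def __len__(self):  # used for len(Box()), i.e. on instances
        return 7


s = Sized()
s.__len__ = lambda: 99  # instance attribute: no fallback, so len() ignores it
```

Explicit attribute access (`s.__len__()`) still finds the instance attribute; only the implicit dunder call skips it.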

## Test Plan

New Markdown tests.
2025-03-07 22:03:28 +01:00


# Scoping rules for type variables

Most of these tests come from the Scoping rules for type variables section of the typing spec.

## Typevar used outside of generic function or class

Typevars may only be used in generic function or class definitions.

```py
from typing import TypeVar

T = TypeVar("T")

# TODO: error
x: T

class C:
    # TODO: error
    x: T

def f() -> None:
    # TODO: error
    x: T
```

## Legacy typevar used multiple times

A type variable used in a generic function could be inferred to represent different types in the same code block.

This only applies to typevars defined using the legacy syntax, since the PEP 695 syntax creates a new distinct typevar for each occurrence.

```py
from typing import TypeVar

T = TypeVar("T")

def f1(x: T) -> T: ...
def f2(x: T) -> T: ...

f1(1)
f2("a")
```
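With the legacy syntax, `T` is an ordinary runtime object created once by `TypeVar("T")`, so both signatures literally share it; the different solutions for the two calls come from solving `T` separately at each call site. A quick check using standard `__annotations__` introspection, assuming annotations are not stringified (no `from __future__ import annotations`):

```python
from typing import TypeVar

T = TypeVar("T")


def f1(x: T) -> T: ...
def f2(x: T) -> T: ...


# Both signatures refer to the very same module-level TypeVar object, so a
# checker must solve T per call rather than rely on there being two typevars.
shared = f1.__annotations__["x"] is f2.__annotations__["x"] is T
```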

## Typevar inferred multiple times

A type variable used in a generic function could be inferred to represent different types in the same code block.

This also applies to a single generic function being used multiple times, instantiating the typevar to a different type each time.

```py
def f[T](x: T) -> T: ...

# TODO: no error
# TODO: revealed: int or Literal[1]
# error: [invalid-argument-type]
reveal_type(f(1))  # revealed: T
# TODO: no error
# TODO: revealed: str or Literal["a"]
# error: [invalid-argument-type]
reveal_type(f("a"))  # revealed: T
```

## Methods can mention class typevars

A type variable used in a method of a generic class that coincides with one of the variables that parameterize this class is always bound to that variable.

```py
class C[T]:
    def m1(self, x: T) -> T: ...
    def m2(self, x: T) -> T: ...

c: C[int] = C()
# TODO: no error
# error: [invalid-argument-type]
c.m1(1)
# TODO: no error
# error: [invalid-argument-type]
c.m2(1)
# TODO: expected type `int`
# error: [invalid-argument-type] "Object of type `Literal["string"]` cannot be assigned to parameter 2 (`x`) of bound method `m2`; expected type `T`"
c.m2("string")
```

## Methods can mention other typevars

A type variable used in a method that does not match any of the variables that parameterize the class makes this method a generic function in that variable.

```py
from typing import TypeVar, Generic

T = TypeVar("T")
S = TypeVar("S")

# TODO: no error
# error: [invalid-base]
class Legacy(Generic[T]):
    def m(self, x: T, y: S) -> S: ...

legacy: Legacy[int] = Legacy()
# TODO: revealed: str
reveal_type(legacy.m(1, "string"))  # revealed: @Todo(Invalid or unsupported `Instance` in `Type::to_type_expression`)
```
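At runtime, the legacy `Generic[T]` base records only `T` as a class parameter, which is consistent with reading `S` as belonging to the method alone. A small introspection sketch using the standard `__parameters__` attribute that `typing.Generic` subclasses carry:

```python
from typing import Generic, TypeVar

T = TypeVar("T")
S = TypeVar("S")


class Legacy(Generic[T]):
    def m(self, x: T, y: S) -> S: ...


# Only T parameterizes the class; S appears solely in the signature of `m`,
# which is what makes `m` a generic function in S.
class_params = Legacy.__parameters__
```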

With PEP 695 syntax, it is clearer that the method uses a separate typevar:

```py
class C[T]:
    def m[S](self, x: T, y: S) -> S: ...

c: C[int] = C()
# TODO: no errors
# TODO: revealed: str
# error: [invalid-argument-type]
# error: [invalid-argument-type]
reveal_type(c.m(1, "string"))  # revealed: S
```

## Unbound typevars

Unbound type variables should not appear in the bodies of generic functions, or in the class bodies apart from method definitions.

This is true with the legacy syntax:

```py
from typing import TypeVar, Generic

T = TypeVar("T")
S = TypeVar("S")

def f(x: T) -> None:
    x: list[T] = []
    # TODO: error
    y: list[S] = []

# TODO: no error
# error: [invalid-base]
class C(Generic[T]):
    # TODO: error
    x: list[S] = []

    # This is not an error, as shown in the previous test
    def m(self, x: S) -> S: ...
```
This is also true with PEP 695 syntax, though we must use the legacy syntax to define the unbound typevars:

`pep695.py`:

```py
from typing import TypeVar

S = TypeVar("S")

def f[T](x: T) -> None:
    x: list[T] = []
    # TODO: error
    y: list[S] = []

class C[T]:
    # TODO: error
    x: list[S] = []

    def m1(self, x: S) -> S: ...
    def m2[S](self, x: S) -> S: ...
```

## Nested formal typevars must be distinct

Generic functions and classes can be nested in each other, but it is an error for the same typevar to be used in nested generic definitions.

Note that the typing spec only mentions two specific versions of this rule:

> A generic class definition that appears inside a generic function should not use type variables that parameterize the generic function.

and

> A generic class nested in another generic class cannot use the same type variables.

We assume that the more general form holds.

### Generic function within generic function

```py
def f[T](x: T, y: T) -> None:
    def ok[S](a: S, b: S) -> None: ...

    # TODO: error
    def bad[T](a: T, b: T) -> None: ...
```

### Generic method within generic class

```py
class C[T]:
    def ok[S](self, a: S, b: S) -> None: ...

    # TODO: error
    def bad[T](self, a: T, b: T) -> None: ...
```

### Generic class within generic function

```py
from typing import Iterable

def f[T](x: T, y: T) -> None:
    class Ok[S]: ...
    # TODO: error for reuse of typevar
    class Bad1[T]: ...
    # TODO: no non-subscriptable error, error for reuse of typevar
    # error: [non-subscriptable]
    class Bad2(Iterable[T]): ...
```

### Generic class within generic class

```py
from typing import Iterable

class C[T]:
    class Ok1[S]: ...
    # TODO: error for reuse of typevar
    class Bad1[T]: ...
    # TODO: no non-subscriptable error, error for reuse of typevar
    # error: [non-subscriptable]
    class Bad2(Iterable[T]): ...
```

## Class scopes do not cover inner scopes

Just like regular symbols, the typevars of a generic class are only available in that class's scope, and are not available in nested scopes.

```py
class C[T]:
    ok1: list[T] = []

    class Bad:
        # TODO: error
        bad: list[T] = []

    class Inner[S]: ...
    ok2: Inner[T]
```