[red-knot] Method calls and the descriptor protocol (#16121)

## Summary

This PR achieves the following:

* Add support for checking method calls, and inferring return types from
method calls. For example:
  ```py
  reveal_type("abcde".find("abc"))  # revealed: int
  reveal_type("foo".encode(encoding="utf-8"))  # revealed: bytes
  
  "abcde".find(123)  # error: [invalid-argument-type]
  
  class C:
      def f(self) -> int:
          pass
  
  reveal_type(C.f)  # revealed: <function `f`>
  reveal_type(C().f)  # revealed: <bound method: `f` of `C`>
  
  C.f()  # error: [missing-argument]
  reveal_type(C().f())  # revealed: int
  ```
* Implement the descriptor protocol, i.e. properly call the `__get__`
method when a descriptor object is accessed through a class object or an
instance of a class. For example:
  ```py
  from typing import Literal
  
  class Ten:
def __get__(self, instance: object, owner: type | None = None) ->
Literal[10]:
          return 10
  
  class C:
      ten: Ten = Ten()
  
  reveal_type(C.ten)  # revealed: Literal[10]
  reveal_type(C().ten)  # revealed: Literal[10]
  ```
* Add support for member lookup on intersection types.
* Support type inference for `inspect.getattr_static(obj, attr)` calls.
This was mostly used as a debugging tool during development, but seems
more generally useful. It can be used to bypass the descriptor protocol.
For the example above:
  ```py
  from inspect import getattr_static
  
  reveal_type(getattr_static(C, "ten"))  # revealed: Ten
  ```
* Add a new `Type::Callable(…)` variant with the following sub-variants:
* `Type::Callable(CallableType::BoundMethod(…))` — represents bound
method objects, e.g. `C().f` above
* `Type::Callable(CallableType::MethodWrapperDunderGet(…))` — represents
`f.__get__` where `f` is a function
* `Type::Callable(WrapperDescriptorDunderGet)` — represents
`FunctionType.__get__`
* Add new known classes:
  * `types.MethodType`
  * `types.MethodWrapperType`
  * `types.WrapperDescriptorType`
  * `builtins.range`

## Performance analysis

On this branch, we do more work. We need to do more call checking, since
we now check all method calls. We also need to do ~twice as many member
lookups, because we need to check if a `__get__` attribute exists on
accessed members.

A brief analysis on `tomllib` shows that we now call `Type::call` 1780
times, compared to 612 calls before.

## Limitations

* Data descriptors are not yet supported, i.e. we do not infer correct
types for descriptor attribute accesses in `Store` context and do not
check writes to descriptor attributes. I felt like this was something
that could be split out as a follow-up without risking a major
architectural change.
* We currently distinguish between `Type::member` (with descriptor
protocol) and `Type::static_member` (without descriptor protocol). The
former corresponds to `obj.attr`, the latter corresponds to
`getattr_static(obj, "attr")`. However, to model some details correctly,
we would also need to distinguish between a static member lookup *with*
and *without* instance variables. The lookup without instance variables
corresponds to `find_name_in_mro`
[here](https://docs.python.org/3/howto/descriptor.html#invocation-from-an-instance).
We currently approximate both using `member_static`, which leads to two
open TODOs. Changing this would be a larger refactoring of
`Type::own_instance_member`, so I chose to leave it out of this PR.

## Test Plan

* New `call/methods.md` test suite for method calls
* New tests in `descriptor_protocol.md`
* New `call/getattr_static.md` test suite for `inspect.getattr_static`
* Various updated tests
This commit is contained in:
David Peter
2025-02-20 23:22:26 +01:00
committed by GitHub
parent f62e5406f2
commit d2e034adcd
21 changed files with 1577 additions and 174 deletions

View File

@@ -0,0 +1,133 @@
# `inspect.getattr_static`
## Basic usage
`inspect.getattr_static` is a function that returns attributes of an object without invoking the
descriptor protocol (for caveats, see the [official documentation]).
Consider the following example:
```py
import inspect
class Descriptor:
def __get__(self, instance, owner) -> str:
return 1
class C:
normal: int = 1
descriptor: Descriptor = Descriptor()
```
If we access attributes on an instance of `C` as usual, the descriptor protocol is invoked, and we
get a type of `str` for the `descriptor` attribute:
```py
c = C()
reveal_type(c.normal) # revealed: int
reveal_type(c.descriptor) # revealed: str
```
However, if we use `inspect.getattr_static`, we can see the underlying `Descriptor` type:
```py
reveal_type(inspect.getattr_static(c, "normal")) # revealed: int
reveal_type(inspect.getattr_static(c, "descriptor")) # revealed: Descriptor
```
For non-existent attributes, a default value can be provided:
```py
reveal_type(inspect.getattr_static(C, "normal", "default-arg")) # revealed: int
reveal_type(inspect.getattr_static(C, "non_existent", "default-arg")) # revealed: Literal["default-arg"]
```
When a non-existent attribute is accessed without a default value, the runtime raises an
`AttributeError`. We could emit a diagnostic for this case, but that is currently not supported:
```py
# TODO: we could emit a diagnostic here
reveal_type(inspect.getattr_static(C, "non_existent")) # revealed: Never
```
We can access attributes on objects of all kinds:
```py
import sys
reveal_type(inspect.getattr_static(sys, "platform")) # revealed: LiteralString
reveal_type(inspect.getattr_static(inspect, "getattr_static")) # revealed: Literal[getattr_static]
reveal_type(inspect.getattr_static(1, "real")) # revealed: Literal[1]
```
(Implicit) instance attributes can also be accessed through `inspect.getattr_static`:
```py
class D:
def __init__(self) -> None:
self.instance_attr: int = 1
reveal_type(inspect.getattr_static(D(), "instance_attr")) # revealed: int
```
## Error cases
We can only infer precise types if the attribute is a literal string. In all other cases, we fall
back to `Any`:
```py
import inspect
class C:
x: int = 1
def _(attr_name: str):
reveal_type(inspect.getattr_static(C(), attr_name)) # revealed: Any
reveal_type(inspect.getattr_static(C(), attr_name, 1)) # revealed: Any
```
But we still detect errors in the number or type of arguments:
```py
# error: [missing-argument] "No arguments provided for required parameters `obj`, `attr` of function `getattr_static`"
inspect.getattr_static()
# error: [missing-argument] "No argument provided for required parameter `attr`"
inspect.getattr_static(C())
# error: [invalid-argument-type] "Object of type `Literal[1]` cannot be assigned to parameter 2 (`attr`) of function `getattr_static`; expected type `str`"
inspect.getattr_static(C(), 1)
# error: [too-many-positional-arguments] "Too many positional arguments to function `getattr_static`: expected 3, got 4"
inspect.getattr_static(C(), "x", "default-arg", "one too many")
```
## Possibly unbound attributes
```py
import inspect
def _(flag: bool):
class C:
if flag:
x: int = 1
reveal_type(inspect.getattr_static(C, "x", "default")) # revealed: int | Literal["default"]
```
## Gradual types
```py
import inspect
from typing import Any
def _(a: Any, tuple_of_any: tuple[Any]):
reveal_type(inspect.getattr_static(a, "x", "default")) # revealed: Any | Literal["default"]
# TODO: Ideally, this would just be `Literal[index]`
reveal_type(inspect.getattr_static(tuple_of_any, "index", "default")) # revealed: Literal[index] | Literal["default"]
```
[official documentation]: https://docs.python.org/3/library/inspect.html#inspect.getattr_static

View File

@@ -0,0 +1,258 @@
# Methods
## Background: Functions as descriptors
> Note: See also this related section in the descriptor guide: [Functions and methods].
Say we have a simple class `C` with a function definition `f` inside its body:
```py
class C:
def f(self, x: int) -> str:
return "a"
```
Whenever we access the `f` attribute through the class object itself (`C.f`) or through an instance
(`C().f`), this access happens via the descriptor protocol. Functions are (non-data) descriptors
because they implement a `__get__` method. This is crucial in making sure that method calls work as
expected. In general, the signature of the `__get__` method in the descriptor protocol is
`__get__(self, instance, owner)`. The `self` argument is the descriptor object itself (`f`). The
passed value for the `instance` argument depends on whether the attribute is accessed from the class
object (in which case it is `None`), or from an instance (in which case it is the instance of type
`C`). The `owner` argument is the class itself (`C` of type `Literal[C]`). To summarize:
- `C.f` is equivalent to `getattr_static(C, "f").__get__(None, C)`
- `C().f` is equivalent to `getattr_static(C, "f").__get__(C(), C)`
Here, `inspect.getattr_static` is used to bypass the descriptor protocol and directly access the
function attribute. The way the special `__get__` method *on functions* works is as follows. In the
former case, if the `instance` argument is `None`, `__get__` simply returns the function itself. In
the latter case, it returns a *bound method* object:
```py
from inspect import getattr_static
reveal_type(getattr_static(C, "f")) # revealed: Literal[f]
reveal_type(getattr_static(C, "f").__get__) # revealed: <method-wrapper `__get__` of `f`>
reveal_type(getattr_static(C, "f").__get__(None, C)) # revealed: Literal[f]
reveal_type(getattr_static(C, "f").__get__(C(), C)) # revealed: <bound method `f` of `C`>
```
In conclusion, this is why we see the following two types when accessing the `f` attribute on the
class object `C` and on an instance `C()`:
```py
reveal_type(C.f) # revealed: Literal[f]
reveal_type(C().f) # revealed: <bound method `f` of `C`>
```
A bound method is a callable object that contains a reference to the `instance` that it was called
on (can be inspected via `__self__`), and the function object that it refers to (can be inspected
via `__func__`):
```py
bound_method = C().f
reveal_type(bound_method.__self__) # revealed: C
reveal_type(bound_method.__func__) # revealed: Literal[f]
```
When we call the bound method, the `instance` is implicitly passed as the first argument (`self`):
```py
reveal_type(C().f(1)) # revealed: str
reveal_type(bound_method(1)) # revealed: str
```
When we call the function object itself, we need to pass the `instance` explicitly:
```py
C.f(1) # error: [missing-argument]
reveal_type(C.f(C(), 1)) # revealed: str
```
When we access methods from derived classes, they will be bound to instances of the derived class:
```py
class D(C):
pass
reveal_type(D().f) # revealed: <bound method `f` of `D`>
```
If we access an attribute on a bound method object itself, it will defer to `types.MethodType`:
```py
reveal_type(bound_method.__hash__) # revealed: <bound method `__hash__` of `MethodType`>
```
If an attribute is not available on the bound method object, it will be looked up on the underlying
function object. We model this explicitly, which means that we can access `__kwdefaults__` on bound
methods, even though it is not available on `types.MethodType`:
```py
reveal_type(bound_method.__kwdefaults__) # revealed: @Todo(generics) | None
```
## Basic method calls on class objects and instances
```py
class Base:
def method_on_base(self, x: int | None) -> str:
return "a"
class Derived(Base):
def method_on_derived(self, x: bytes) -> tuple[int, str]:
return (1, "a")
reveal_type(Base().method_on_base(1)) # revealed: str
reveal_type(Base.method_on_base(Base(), 1)) # revealed: str
Base().method_on_base("incorrect") # error: [invalid-argument-type]
Base().method_on_base() # error: [missing-argument]
Base().method_on_base(1, 2) # error: [too-many-positional-arguments]
reveal_type(Derived().method_on_base(1)) # revealed: str
reveal_type(Derived().method_on_derived(b"abc")) # revealed: tuple[int, str]
reveal_type(Derived.method_on_base(Derived(), 1)) # revealed: str
reveal_type(Derived.method_on_derived(Derived(), b"abc")) # revealed: tuple[int, str]
```
## Method calls on literals
### Boolean literals
```py
reveal_type(True.bit_length()) # revealed: int
reveal_type(True.as_integer_ratio()) # revealed: tuple[int, Literal[1]]
```
### Integer literals
```py
reveal_type((42).bit_length()) # revealed: int
```
### String literals
```py
reveal_type("abcde".find("abc")) # revealed: int
reveal_type("foo".encode(encoding="utf-8")) # revealed: bytes
"abcde".find(123) # error: [invalid-argument-type]
```
### Bytes literals
```py
reveal_type(b"abcde".startswith(b"abc")) # revealed: bool
```
## Method calls on `LiteralString`
```py
from typing_extensions import LiteralString
def f(s: LiteralString) -> None:
reveal_type(s.find("a")) # revealed: int
```
## Method calls on `tuple`
```py
def f(t: tuple[int, str]) -> None:
reveal_type(t.index("a")) # revealed: int
```
## Method calls on unions
```py
from typing import Any
class A:
def f(self) -> int:
return 1
class B:
def f(self) -> str:
return "a"
def f(a_or_b: A | B, any_or_a: Any | A):
reveal_type(a_or_b.f) # revealed: <bound method `f` of `A`> | <bound method `f` of `B`>
reveal_type(a_or_b.f()) # revealed: int | str
reveal_type(any_or_a.f) # revealed: Any | <bound method `f` of `A`>
reveal_type(any_or_a.f()) # revealed: Any | int
```
## Method calls on `KnownInstance` types
```toml
[environment]
python-version = "3.12"
```
```py
type IntOrStr = int | str
reveal_type(IntOrStr.__or__) # revealed: <bound method `__or__` of `typing.TypeAliasType`>
```
## Error cases: Calling `__get__` for methods
The `__get__` method on `types.FunctionType` has the following overloaded signature in typeshed:
```py
from types import FunctionType, MethodType
from typing import overload
@overload
def __get__(self, instance: None, owner: type, /) -> FunctionType: ...
@overload
def __get__(self, instance: object, owner: type | None = None, /) -> MethodType: ...
```
Here, we test that this signature is enforced correctly:
```py
from inspect import getattr_static
class C:
def f(self, x: int) -> str:
return "a"
method_wrapper = getattr_static(C, "f").__get__
reveal_type(method_wrapper) # revealed: <method-wrapper `__get__` of `f`>
# All of these are fine:
method_wrapper(C(), C)
method_wrapper(C())
method_wrapper(C(), None)
method_wrapper(None, C)
# Passing `None` without an `owner` argument is an
# error: [missing-argument] "No argument provided for required parameter `owner`"
method_wrapper(None)
# Passing something that is not assignable to `type` as the `owner` argument is an
# error: [invalid-argument-type] "Object of type `Literal[1]` cannot be assigned to parameter 2 (`owner`); expected type `type`"
method_wrapper(None, 1)
# Passing `None` as the `owner` argument when `instance` is `None` is an
# error: [invalid-argument-type] "Object of type `None` cannot be assigned to parameter 2 (`owner`); expected type `type`"
method_wrapper(None, None)
# Calling `__get__` without any arguments is an
# error: [missing-argument] "No argument provided for required parameter `instance`"
method_wrapper()
# Calling `__get__` with too many positional arguments is an
# error: [too-many-positional-arguments] "Too many positional arguments: expected 2, got 3"
method_wrapper(C(), C, "one too many")
```
[functions and methods]: https://docs.python.org/3/howto/descriptor.html#functions-and-methods