
cake-monotone 6a4d207db7 [red-knot] Refactoring the inference logic of lexicographic comparisons (#14422)

## Summary

closes #14279

### Limitations of the Current Implementation
#### Incorrect Error Propagation

In the current implementation of lexicographic comparisons, if a
pairwise `Eq` result is `Ambiguous`, the comparison stops immediately
and returns a `bool` instance. While this may still yield a correct
type, it fails to report `unsupported-operator` errors that might occur
in subsequent pairwise comparisons.
```py
def int_instance() -> int:
    return 42

class A: ...

(int_instance(), A()) < (int_instance(), A())  # should error
```

#### Weak Inference in Specific Cases

> Example: `(int_instance(), "foo") == (int_instance(), "bar")`
> Current result: `bool`
> Expected result: `Literal[False]`

`Eq` and `NotEq` behave differently from the other operators in
lexicographic comparisons. Specifically:
- For `Eq`, if any pair within the tuples being compared is certainly
unequal, we can immediately conclude that the tuples are not equal.
- For `NotEq`, if any pair is certainly unequal, we can likewise
conclude that the tuples are unequal.

```py
def str_instance() -> str:
    return "hello"

def int_instance() -> int:
    return 42

a = (str_instance(), int_instance(), "foo")

reveal_type(a == a)  # revealed: bool
reveal_type(a != a)  # revealed: bool

b = (str_instance(), int_instance(), "bar")

reveal_type(a == b)  # revealed: bool  # should be Literal[False]
reveal_type(a != b)  # revealed: bool  # should be Literal[True]
```
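A plain-CPython check (independent of red-knot) illustrates why `Literal[False]` and `Literal[True]` are sound here: whatever values the `str` and `int` elements take on either side, the last pair `"foo"` vs `"bar"` alone guarantees that `==` is `False` and `!=` is `True`. The sample values and the `shortcut_holds` helper are arbitrary, illustrative stand-ins for `str_instance()` and `int_instance()`.

```python
from itertools import product

# Arbitrary sample values standing in for `str_instance()` and `int_instance()`.
SAMPLE_STRS = ["", "hello", "foo"]
SAMPLE_INTS = [-1, 0, 42]


def shortcut_holds() -> bool:
    """Check that a certainly-unequal last pair alone decides `==` and `!=`."""
    for s1, i1, s2, i2 in product(SAMPLE_STRS, SAMPLE_INTS, SAMPLE_STRS, SAMPLE_INTS):
        a = (s1, i1, "foo")
        b = (s2, i2, "bar")
        # Whatever the first two pairs do, "foo" and "bar" never compare equal.
        if (a == b) is not False or (a != b) is not True:
            return False
    return True


print(shortcut_holds())  # True
```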
#### Incorrect Support for Non-Boolean Rich Comparisons

In CPython, for operators other than `==` and `!=`, a tuple comparison
returns the result of comparing the first unequal pair of elements
as-is; tuples do not convert that value into `bool`.

Note: If all pairwise `==` comparisons between elements in the tuples
return truthy values, the comparison then considers the tuples' lengths.
So regardless of the return type of the dunder methods, the final
result can still be a boolean.

```py
from __future__ import annotations

class A:
    def __eq__(self, o: object) -> str:
        return "hello"

    def __ne__(self, o: object) -> bytes:
        return b"world"

    def __lt__(self, o: A) -> float:
        return 3.14

a = (A(), A())

reveal_type(a == a)  # revealed: bool
reveal_type(a != a)  # revealed: bool
reveal_type(a < a)  # revealed: bool # should be: `float | Literal[False]`

```
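The pass-through behavior can be observed directly at runtime. The sketch below uses a hypothetical `Key` class (not from the PR) whose `__lt__` returns a `str`, so it is easy to see whether the tuple converted the result to `bool`:

```python
from __future__ import annotations


class Key:
    """Hypothetical element type whose `__lt__` returns a str instead of a bool."""

    def __init__(self, value: int) -> None:
        self.value = value

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Key) and self.value == other.value

    def __lt__(self, other: Key) -> str:
        # A str result makes it visible whether the tuple converted it to bool.
        return f"{self.value} < {other.value}"


# The first pair differs, so the tuple returns `Key.__lt__`'s result unchanged.
print((Key(1), Key(5)) < (Key(2), Key(5)))  # 1 < 2
# Every pair is equal and the lengths match, so the lengths decide: a plain bool.
print((Key(1),) < (Key(1),))  # False
# Tuple `==` always produces a bool.
print((Key(1),) == (Key(2),))  # False
```

This matches the expected `float | Literal[False]` above: the element result type when some pair differs, and a boolean literal when the lengths decide.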

### Key Changes
One of the major changes is that comparisons no longer end with a `bool`
result when a pairwise `Eq` result is `Ambiguous`. Instead, the function
attempts to infer all possible cases and unions the results. This
improvement allows for more robust type inference and better error
detection.

Additionally, since the function is now specific to tuple comparisons,
it has been renamed from the more general
`infer_lexicographic_comparison` to `infer_tuple_rich_comparison`.
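The strategy of exploring every possible outcome and unioning the results can be illustrated with a toy model. This is a hypothetical Python sketch, not red-knot's actual implementation, and it models only `<`: each element pair is described by the possible truthiness of its `==` result and by a set of type names for its `<` result (`None` meaning the operator is unsupported). `PairInfo`, `infer_lt`, `simplify`, and `UnsupportedOperator` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Possible truthiness of a pairwise `==` result.
TRUE = frozenset({True})
FALSE = frozenset({False})
EITHER = frozenset({True, False})


class UnsupportedOperator(Exception):
    """Raised when a pair that may decide the comparison does not support `<`."""


@dataclass(frozen=True)
class PairInfo:
    eq: frozenset[bool]  # possible truthiness of `left_elem == right_elem`
    lt: Optional[frozenset[str]]  # result types of `left_elem < right_elem`; None if unsupported


def simplify(types: set[str]) -> set[str]:
    """Fold `Literal[True] | Literal[False]` into `bool`, and drop literals `bool` covers."""
    literals = {"Literal[True]", "Literal[False]"}
    if literals <= types:
        types = (types - literals) | {"bool"}
    if "bool" in types:
        types = types - literals
    return types


def infer_lt(pairs: list[PairInfo], left_len: int, right_len: int) -> set[str]:
    """Union of possible result types of `left < right`.

    `pairs` describes the first min(left_len, right_len) element pairs.
    """
    results: set[str] = set()
    for pair in pairs:
        if False in pair.eq:
            # The pair may differ, so its `<` result may be returned as-is.
            if pair.lt is None:
                raise UnsupportedOperator("`<` is not supported for a reachable pair")
            results |= pair.lt
        if True not in pair.eq:
            # The pair certainly differs: later pairs are never reached.
            return simplify(results)
    # Every pair may be equal: the lengths decide.
    results.add(f"Literal[{left_len < right_len}]")
    return simplify(results)


# `(A(), A()) < (A(), A())` with `A.__eq__ -> str` and `A.__lt__ -> float`.
float_pair = PairInfo(eq=EITHER, lt=frozenset({"float"}))
print(sorted(infer_lt([float_pair, float_pair], 2, 2)))  # ['Literal[False]', 'float']

# `(0, int_instance(), A())` vs `(99999, int_instance(), A())`: the first pair decides.
first_differs = PairInfo(eq=FALSE, lt=frozenset({"Literal[True]"}))
int_pair = PairInfo(eq=EITHER, lt=frozenset({"bool"}))
unorderable = PairInfo(eq=EITHER, lt=None)
print(sorted(infer_lt([first_differs, int_pair, unorderable], 3, 3)))  # ['Literal[True]']

# `(0, int_instance(), A())` vs itself: the unorderable pair is reachable, so it errors.
zero_pair = PairInfo(eq=TRUE, lt=frozenset({"Literal[False]"}))
try:
    infer_lt([zero_pair, int_pair, unorderable], 3, 3)
except UnsupportedOperator as exc:
    print(f"error: {exc}")
```

Because an `Ambiguous` pair no longer ends the loop, later pairs are still visited, which is what lets the unsupported `A` pair surface as an error.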

## Test Plan

mdtest included

2024-11-19 17:32:43 -08:00


# Comparison: Tuples

## Heterogeneous

For tuples like `tuple[int, str, Literal[1]]`

### Value Comparisons

"Value Comparisons" refers to the operators: `==`, `!=`, `<`, `<=`, `>`, `>=`

#### Results without Ambiguity

Cases where the result can be definitively inferred as a `BooleanLiteral`.

```py
a = (1, "test", (3, 13), True)
b = (1, "test", (3, 14), False)

reveal_type(a == a)  # revealed: Literal[True]
reveal_type(a != a)  # revealed: Literal[False]
reveal_type(a < a)  # revealed: Literal[False]
reveal_type(a <= a)  # revealed: Literal[True]
reveal_type(a > a)  # revealed: Literal[False]
reveal_type(a >= a)  # revealed: Literal[True]

reveal_type(a == b)  # revealed: Literal[False]
reveal_type(a != b)  # revealed: Literal[True]
reveal_type(a < b)  # revealed: Literal[True]
reveal_type(a <= b)  # revealed: Literal[True]
reveal_type(a > b)  # revealed: Literal[False]
reveal_type(a >= b)  # revealed: Literal[False]
```

Tuples of different lengths are handled as well: when one tuple is a prefix of the other, the shorter tuple compares as smaller.

```py
a = (1, 2, 3)
b = (1, 2, 3, 4)

reveal_type(a == b)  # revealed: Literal[False]
reveal_type(a != b)  # revealed: Literal[True]
reveal_type(a < b)  # revealed: Literal[True]
reveal_type(a <= b)  # revealed: Literal[True]
reveal_type(a > b)  # revealed: Literal[False]
reveal_type(a >= b)  # revealed: Literal[False]

c = ("a", "b", "c", "d")
d = ("a", "b", "c")

reveal_type(c == d)  # revealed: Literal[False]
reveal_type(c != d)  # revealed: Literal[True]
reveal_type(c < d)  # revealed: Literal[False]
reveal_type(c <= d)  # revealed: Literal[False]
reveal_type(c > d)  # revealed: Literal[True]
reveal_type(c >= d)  # revealed: Literal[True]
```
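The length rule follows from how lexicographic ordering works: the first unequal pair decides, and the lengths matter only when one tuple is a prefix of the other. The `lexicographic_lt` helper below is an illustrative re-implementation (not part of red-knot), checked against Python's built-in `<` on every small tuple over a tiny alphabet:

```python
from itertools import product


def lexicographic_lt(left: tuple, right: tuple) -> bool:
    """Illustrative re-implementation of `left < right` for tuples of plain values."""
    for x, y in zip(left, right):
        # Like CPython, look for the first pair that is not equal under `==`.
        if not (x == y):
            # The first differing pair decides the result.
            return x < y
    # One tuple is a prefix of the other: the shorter one is smaller.
    return len(left) < len(right)


def agrees_with_builtin(alphabet: tuple = (1, 2, 3), max_len: int = 3) -> bool:
    """Compare against the built-in `<` on every tuple of up to `max_len` items."""
    tuples = [t for n in range(max_len + 1) for t in product(alphabet, repeat=n)]
    return all(lexicographic_lt(a, b) == (a < b) for a in tuples for b in tuples)


print(lexicographic_lt((1, 2, 3), (1, 2, 3, 4)))  # True
print(lexicographic_lt(("a", "b", "c", "d"), ("a", "b", "c")))  # False
print(agrees_with_builtin())  # True
```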

#### Results with Ambiguity

```py
def bool_instance() -> bool:
    return True

def int_instance() -> int:
    return 42

a = (bool_instance(),)
b = (int_instance(),)

reveal_type(a == a)  # revealed: bool
reveal_type(a != a)  # revealed: bool
reveal_type(a < a)  # revealed: bool
reveal_type(a <= a)  # revealed: bool
reveal_type(a > a)  # revealed: bool
reveal_type(a >= a)  # revealed: bool

reveal_type(a == b)  # revealed: bool
reveal_type(a != b)  # revealed: bool
reveal_type(a < b)  # revealed: bool
reveal_type(a <= b)  # revealed: bool
reveal_type(a > b)  # revealed: bool
reveal_type(a >= b)  # revealed: bool
```

#### Comparison Unsupported

If two tuples contain types that do not support comparison, the result may be `Unknown`. However, `==` and `!=` are exceptions and can still provide definite results.

```py
a = (1, 2)
b = (1, "hello")

# TODO: should be Literal[False], once we implement (in)equality for mismatched literals
reveal_type(a == b)  # revealed: bool

# TODO: should be Literal[True], once we implement (in)equality for mismatched literals
reveal_type(a != b)  # revealed: bool

# TODO: should be Unknown and add more informative diagnostics
reveal_type(a < b)  # revealed: bool
reveal_type(a <= b)  # revealed: bool
reveal_type(a > b)  # revealed: bool
reveal_type(a >= b)  # revealed: bool
```

However, if the lexicographic comparison finishes before reaching the point where `str` and `int` are compared, Python still produces a result based on the preceding elements.

```py
a = (1, 2)
b = (999999, "hello")

reveal_type(a == b)  # revealed: Literal[False]
reveal_type(a != b)  # revealed: Literal[True]
reveal_type(a < b)  # revealed: Literal[True]
reveal_type(a <= b)  # revealed: Literal[True]
reveal_type(a > b)  # revealed: Literal[False]
reveal_type(a >= b)  # revealed: Literal[False]
```
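Both cases can be reproduced at runtime in plain CPython. The `less_than` helper is an illustrative name; it returns the `repr` of the result or the `TypeError` message, so the two outcomes can be compared side by side:

```python
def less_than(left: tuple, right: tuple) -> str:
    """Return `repr(left < right)`, or the TypeError message if ordering fails."""
    try:
        return repr(left < right)
    except TypeError as exc:
        return f"TypeError: {exc}"


# 1 == 1, so the second pair (2 vs "hello") must be ordered, which fails.
print(less_than((1, 2), (1, "hello")))
# 1 != 999999 already decides the result; 2 and "hello" are never ordered.
print(less_than((1, 2), (999999, "hello")))  # True
# Equality never needs an ordering, so `==` succeeds either way.
print((1, 2) == (1, "hello"))  # False
```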

#### Matryoshka Tuples

```py
a = (1, True, "Hello")
b = (a, a, a)
c = (b, b, b)

reveal_type(c == c)  # revealed: Literal[True]
reveal_type(c != c)  # revealed: Literal[False]
reveal_type(c < c)  # revealed: Literal[False]
reveal_type(c <= c)  # revealed: Literal[True]
reveal_type(c > c)  # revealed: Literal[False]
reveal_type(c >= c)  # revealed: Literal[True]
```

#### Non-Boolean Rich Comparisons

Rich comparison methods defined in a class affect tuple comparisons as well. Proper type inference should be possible even when these methods return non-boolean types.

Note: Tuples use lexicographic comparisons. If the `==` result for every pair of elements is truthy, the comparison then considers the tuples' lengths. So regardless of the return type of the dunder methods, the final result can still be a boolean value.

(+cpython: For tuples, `==` and `!=` always produce boolean results, regardless of the return type of the dunder methods.)

```py
from __future__ import annotations

class A:
    def __eq__(self, o: object) -> str:
        return "hello"

    def __ne__(self, o: object) -> bytes:
        return b"world"

    def __lt__(self, o: A) -> float:
        return 3.14

    def __le__(self, o: A) -> complex:
        return complex(0.5, -0.5)

    def __gt__(self, o: A) -> tuple:
        return (1, 2, 3)

    def __ge__(self, o: A) -> list:
        return [1, 2, 3]

a = (A(), A())

reveal_type(a == a)  # revealed: bool
reveal_type(a != a)  # revealed: bool
reveal_type(a < a)  # revealed: float | Literal[False]
reveal_type(a <= a)  # revealed: complex | Literal[True]
reveal_type(a > a)  # revealed: tuple | Literal[False]
reveal_type(a >= a)  # revealed: list | Literal[True]

# If lexicographic comparison is finished before comparing A()
b = ("1_foo", A())
c = ("2_bar", A())

reveal_type(b == c)  # revealed: Literal[False]
reveal_type(b != c)  # revealed: Literal[True]
reveal_type(b < c)  # revealed: Literal[True]
reveal_type(b <= c)  # revealed: Literal[True]
reveal_type(b > c)  # revealed: Literal[False]
reveal_type(b >= c)  # revealed: Literal[False]

class B:
    def __lt__(self, o: B) -> set:
        return set()

reveal_type((A(), B()) < (A(), B()))  # revealed: float | set | Literal[False]
```
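The CPython note can be checked at runtime. In the sketch below, `Weird` is a hypothetical class whose `__eq__` and `__ne__` return non-bool values, and `ne_calls` is an illustrative counter. Tuple `==` and `!=` only look at the truthiness of each element's `==` result, so both return plain booleans and the element `__ne__` is never consulted:

```python
class Weird:
    """Hypothetical element type whose `==` and `!=` return non-bool values."""

    ne_calls = 0  # counts how often `__ne__` runs

    def __eq__(self, other: object) -> str:
        return "hello"  # a truthy str

    def __ne__(self, other: object) -> bytes:
        Weird.ne_calls += 1
        return b"world"


# Element-level results come back unchanged...
print(Weird() == Weird())  # hello
print(Weird() != Weird())  # b'world'

x = (Weird(), Weird())
y = (Weird(), Weird())
calls_before = Weird.ne_calls
# ...but tuple `==` only uses the truthiness of each element's `==` result.
print(x == y)  # True
# Tuple `!=` is also decided by element `==`; `Weird.__ne__` is not called.
print(x != y)  # False
print(Weird.ne_calls == calls_before)  # True
```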

#### Special Handling of Eq and NotEq in Lexicographic Comparisons

> Example: `(int_instance(), "foo") == (int_instance(), "bar")`

`Eq` and `NotEq` behave differently from other operators in lexicographic comparisons. Specifically, for `Eq`, if any pair within the tuples being compared is certainly unequal, we can immediately conclude that the tuples are not equal. Similarly, for `NotEq`, a single certainly-unequal pair is enough to conclude that the tuples are unequal.

In contrast, with operators like `<` and `>`, the comparison must consider each pair of elements sequentially, and the final outcome might remain ambiguous until all pairs are compared.

```py
def str_instance() -> str:
    return "hello"

def int_instance() -> int:
    return 42

reveal_type("foo" == "bar")  # revealed: Literal[False]
reveal_type(("foo",) == ("bar",))  # revealed: Literal[False]
reveal_type((4, "foo") == (4, "bar"))  # revealed: Literal[False]
reveal_type((int_instance(), "foo") == (int_instance(), "bar"))  # revealed: Literal[False]

a = (str_instance(), int_instance(), "foo")

reveal_type(a == a)  # revealed: bool
reveal_type(a != a)  # revealed: bool
reveal_type(a < a)  # revealed: bool
reveal_type(a <= a)  # revealed: bool
reveal_type(a > a)  # revealed: bool
reveal_type(a >= a)  # revealed: bool

b = (str_instance(), int_instance(), "bar")

reveal_type(a == b)  # revealed: Literal[False]
reveal_type(a != b)  # revealed: Literal[True]
reveal_type(a < b)  # revealed: bool
reveal_type(a <= b)  # revealed: bool
reveal_type(a > b)  # revealed: bool
reveal_type(a >= b)  # revealed: bool

c = (str_instance(), int_instance(), "foo", "different_length")
reveal_type(a == c)  # revealed: Literal[False]
reveal_type(a != c)  # revealed: Literal[True]
reveal_type(a < c)  # revealed: bool
reveal_type(a <= c)  # revealed: bool
reveal_type(a > c)  # revealed: bool
reveal_type(a >= c)  # revealed: bool
```
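The `Eq`/`NotEq` rule can be summarized as a small decision procedure. This is a hypothetical toy model in Python, not red-knot's implementation: each pair's `==` outcome is encoded as the set of truthiness values it may have, and `infer_tuple_eq` / `infer_tuple_ne` are illustrative names.

```python
from __future__ import annotations

# Possible truthiness of one pairwise `==` result.
TRUE = frozenset({True})
FALSE = frozenset({False})
EITHER = frozenset({True, False})


def infer_tuple_eq(pair_eqs: list[frozenset[bool]], left_len: int, right_len: int) -> frozenset[bool]:
    """Possible values of `left == right`, given each element pair's `==` outcomes."""
    if left_len != right_len:
        # Tuples of different lengths are never equal.
        return FALSE
    if any(eq == FALSE for eq in pair_eqs):
        # A single certainly-unequal pair settles the result, wherever it sits.
        return FALSE
    if all(eq == TRUE for eq in pair_eqs):
        return TRUE
    return EITHER


def infer_tuple_ne(pair_eqs: list[frozenset[bool]], left_len: int, right_len: int) -> frozenset[bool]:
    """Tuple `!=` is decided by the same element `==` checks, so it negates `==`."""
    return frozenset(not value for value in infer_tuple_eq(pair_eqs, left_len, right_len))


# `a` vs `b`: (str, int, "foo") vs (str, int, "bar")
print(sorted(infer_tuple_eq([EITHER, EITHER, FALSE], 3, 3)))  # [False]
print(sorted(infer_tuple_ne([EITHER, EITHER, FALSE], 3, 3)))  # [True]
# `a` vs `a`: the str and int pairs may or may not be equal
print(sorted(infer_tuple_eq([EITHER, EITHER, TRUE], 3, 3)))  # [False, True]
# `a` vs `c`: different lengths
print(sorted(infer_tuple_eq([EITHER, EITHER, TRUE], 3, 4)))  # [False]
```

An ordering operator such as `<` gets no such shortcut, since a certainly-unequal pair only decides the result if every earlier pair turns out equal.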

#### Error Propagation

Errors occurring within a tuple comparison should propagate outward. However, if the tuple comparison reaches a definite result before encountering an error, the error should not be raised.

```py
def int_instance() -> int:
    return 42

def str_instance() -> str:
    return "hello"

class A: ...

# error: [unsupported-operator] "Operator `<` is not supported for types `A` and `A`"
A() < A()
# error: [unsupported-operator] "Operator `<=` is not supported for types `A` and `A`"
A() <= A()
# error: [unsupported-operator] "Operator `>` is not supported for types `A` and `A`"
A() > A()
# error: [unsupported-operator] "Operator `>=` is not supported for types `A` and `A`"
A() >= A()

a = (0, int_instance(), A())

# error: [unsupported-operator] "Operator `<` is not supported for types `A` and `A`, in comparing `tuple[Literal[0], int, A]` with `tuple[Literal[0], int, A]`"
reveal_type(a < a)  # revealed: Unknown
# error: [unsupported-operator] "Operator `<=` is not supported for types `A` and `A`, in comparing `tuple[Literal[0], int, A]` with `tuple[Literal[0], int, A]`"
reveal_type(a <= a)  # revealed: Unknown
# error: [unsupported-operator] "Operator `>` is not supported for types `A` and `A`, in comparing `tuple[Literal[0], int, A]` with `tuple[Literal[0], int, A]`"
reveal_type(a > a)  # revealed: Unknown
# error: [unsupported-operator] "Operator `>=` is not supported for types `A` and `A`, in comparing `tuple[Literal[0], int, A]` with `tuple[Literal[0], int, A]`"
reveal_type(a >= a)  # revealed: Unknown

# Comparison between `a` and `b` should only involve the first elements, `Literal[0]` and `Literal[99999]`,
# and should terminate immediately.
b = (99999, int_instance(), A())

reveal_type(a < b)  # revealed: Literal[True]
reveal_type(a <= b)  # revealed: Literal[True]
reveal_type(a > b)  # revealed: Literal[False]
reveal_type(a >= b)  # revealed: Literal[False]
```
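The same propagation can be observed at runtime with an `A` that defines no ordering methods. The sketch builds separate tuples through an illustrative `make` helper, because comparing one tuple object with itself at runtime finds every element pair identical and never reaches `<`:

```python
class A: ...  # defines no ordering methods, like `A` in the test above


def make(first: int) -> tuple:
    # A fresh `(first, 42, A())` on every call, so the `A` elements are distinct objects.
    return (first, 42, A())


def less_than(left: tuple, right: tuple) -> str:
    try:
        return repr(left < right)
    except TypeError as exc:
        return f"TypeError: {exc}"


# Equal prefixes force the two `A()` elements to be ordered, which fails.
print(less_than(make(0), make(0)))
# The first elements already differ, so the `A()` elements are never ordered.
print(less_than(make(0), make(99999)))  # True
# `==` on plain objects falls back to identity, so it never raises.
print(make(0) == make(0))  # False
```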

### Membership Test Comparisons

"Membership Test Comparisons" refers to the operators `in` and `not in`.

```py
def int_instance() -> int:
    return 42

a = (1, 2)
b = ((3, 4), (1, 2))
c = ((1, 2, 3), (4, 5, 6))
d = ((int_instance(), int_instance()), (int_instance(), int_instance()))

reveal_type(a in b)  # revealed: Literal[True]
reveal_type(a not in b)  # revealed: Literal[False]

reveal_type(a in c)  # revealed: Literal[False]
reveal_type(a not in c)  # revealed: Literal[True]

reveal_type(a in d)  # revealed: bool
reveal_type(a not in d)  # revealed: bool
```
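Membership in a tuple reduces to element equality: the Python language reference states that for container types such as `tuple`, `x in y` is equivalent to `any(x is e or x == e for e in y)`. That is why `a in d` can only be `bool`: each candidate is checked with tuple `==`, whose outcome for `int` elements is unknown. The `contains` helper below is an illustrative name that spells out that equivalence:

```python
def contains(container: tuple, item: object) -> bool:
    # The equivalence given in the language reference for `in` on tuples.
    return any(item is element or item == element for element in container)


a = (1, 2)
b = ((3, 4), (1, 2))
c = ((1, 2, 3), (4, 5, 6))

print(contains(b, a), a in b)  # True True
print(contains(c, a), a in c)  # False False
```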

### Identity Comparisons

"Identity Comparisons" refers to `is` and `is not`.

```py
a = (1, 2)
b = ("a", "b")
c = (1, 2, 3)

reveal_type(a is (1, 2))  # revealed: bool
reveal_type(a is not (1, 2))  # revealed: bool

# TODO should be Literal[False] once we implement comparison of mismatched literal types
reveal_type(a is b)  # revealed: bool
# TODO should be Literal[True] once we implement comparison of mismatched literal types
reveal_type(a is not b)  # revealed: bool

reveal_type(a is c)  # revealed: Literal[False]
reveal_type(a is not c)  # revealed: Literal[True]
```
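Identity concerns objects rather than values, which explains the split above. Tuples of different lengths are necessarily different objects, so `a is c` is definitely false, while tuples with equal contents may or may not be the same object, so `a is (1, 2)` stays `bool`. In CPython, for instance, a tuple constructed at runtime is a fresh object even when its contents match an existing one:

```python
a = (1, 2)
c = (1, 2, 3)
built = tuple([1, 2])  # constructed at runtime, not taken from a constant

print(a == built, a is built)  # True False
print(a is c)  # False
print(a is a)  # True
```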

## Homogeneous

For tuples like `tuple[int, ...]`, `tuple[Any, ...]`

// TODO
