Compare commits

...

80 Commits

Author SHA1 Message Date
Aria Desires
a9526fe0a5 serde_json is no longer optional 2025-12-08 15:17:28 -05:00
Aria Desires
e733a87bd7 Teach ty check to ask uv to sync the venv of a PEP-723 script 2025-12-08 15:00:37 -05:00
Micha Reiser
0ab8521171 [ty] Remove legacy concise_message fallback behavior (#21847) 2025-12-08 16:19:01 +00:00
Alex Waygood
0ccd84136a [ty] Make Python-version subdiagnostics less verbose (#21849) 2025-12-08 15:58:23 +00:00
Aria Desires
3981a23ee9 [ty] Suppress inlay hints when assigning a trivial initializer call (#21848)
## Summary

By taking a purely syntactic approach to the problem of trivial
initializer calls we can suppress `x: T = T()`, `x: T = x.y.T()` and `x:
MyNewType = MyNewType(0)` but still display `x: T[U] = T()`.

The one place where we drop the ball is that this does not compose with our
analysis for suppressing `x = (0, "hello")`: `x = (0, T())` and `x =
(T(), T())` will still get inlay hints (I don't think this is a huge
deal).
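
A hedged illustration of the distinction (the class names are made up, not taken from the PR's snapshots):

```py
class Plain: ...

class Box[T]:
    def __init__(self, value: T) -> None:
        self.value = value

# Trivial initializer call: a `: Plain` inlay hint would be redundant, so it is suppressed.
a = Plain()

# The inferred type `Box[int]` carries more information than the call expression
# alone, so an inlay hint is still shown here.
b = Box(1)
```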

* fixes https://github.com/astral-sh/ty/issues/1516

## Test Plan

Existing snapshots cover this well.
2025-12-08 10:54:30 -05:00
Charlie Marsh
385dd2770b [ty] Avoid double-inference on non-tuple argument to Annotated (#21837)
## Summary

If you pass a non-tuple to `Annotated`, we end up running inference on
it twice. I _think_ the only case here is `Annotated[]`, where we insert
a (fake) empty `Name` node in the slice.

Closes https://github.com/astral-sh/ty/issues/1801.
2025-12-08 10:24:05 -05:00
Alex Waygood
7519f6c27b Print Python version and Python platform in the fuzzer output when fuzzing fails (#21844) 2025-12-08 14:35:36 +00:00
David Peter
4686111681 [ty] More SQLAlchemy test updates (#21846)
Minor updates to the SQLAlchemy test suite. I verified all expected
results using pyright.
2025-12-08 15:22:55 +01:00
Micha Reiser
4364ffbdd3 [ty] Don't create a related diagnostic for the primary annotation of sub-diagnostics (#21845) 2025-12-08 14:22:11 +00:00
Charlie Marsh
b845e81c4a Use memchr for computing line indexes (#21838)
## Summary

Some benchmarks with Claude's help:

| File                | Size  | Baseline             | Optimized            | Speedup |
|---------------------|-------|----------------------|----------------------|---------|
| numpy/globals.py    | 3 KB  | 1.48 µs (1.95 GiB/s) | 740 ns (3.89 GiB/s)  | 2.0x    |
| unicode/pypinyin.py | 4 KB  | 2.04 µs (2.01 GiB/s) | 1.18 µs (3.49 GiB/s) | 1.7x    |
| pydantic/types.py   | 26 KB | 13.1 µs (1.90 GiB/s) | 5.88 µs (4.23 GiB/s) | 2.2x    |
| numpy/ctypeslib.py  | 17 KB | 8.45 µs (1.92 GiB/s) | 3.94 µs (4.13 GiB/s) | 2.1x    |
| large/dataset.py    | 41 KB | 21.6 µs (1.84 GiB/s) | 11.2 µs (3.55 GiB/s) | 1.9x    |

I think that I originally thought we _had_ to iterate
character-by-character here because we needed to do the ASCII check, but
the ASCII check can be vectorized by LLVM (and the "search for newlines"
can be done with `memchr`).
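
A rough, language-agnostic sketch of the idea (written in Python for illustration; the function name is made up and this is not the actual Rust implementation):

```py
def line_starts(source: bytes) -> list[int]:
    """Offsets at which each line begins.

    A bulk newline search (the role `memchr` plays in the Rust code) replaces
    the per-character loop; the ASCII check (`source.isascii()`) can likewise
    be done in one pass over the whole buffer.
    """
    starts = [0]
    pos = source.find(b"\n")
    while pos != -1:
        starts.append(pos + 1)
        pos = source.find(b"\n", pos + 1)
    return starts

print(line_starts(b"def f(): ...\npass\n"))  # [0, 13, 18]
```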
2025-12-08 08:50:51 -05:00
David Peter
c99e10eedc [ty] Increase SQLAlchemy test coverage (#21843)
## Summary

Increase our SQLAlchemy test coverage to make sure we understand
`Session.scalar`, `Session.scalars`, `Session.execute` (and their async
equivalents), as well as `Result.tuples`, `Result.one_or_none`,
`Row._tuple`.
2025-12-08 14:36:13 +01:00
Dhruv Manilawala
a364195335 [ty] Avoid diagnostic when typing_extensions.ParamSpec uses default parameter (#21839)
## Summary

fixes: https://github.com/astral-sh/ty/issues/1798
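
A hedged example of the pattern that previously produced a spurious diagnostic (PEP 696 defaults via `typing_extensions`):

```py
from typing_extensions import ParamSpec

P = ParamSpec("P", default=[int, str])
```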

## Test Plan

Add mdtest.
2025-12-08 12:34:30 +00:00
David Peter
dfd6ed0524 [ty] mdtests with external dependencies (#20904)
## Summary

This PR makes it possible to write mdtests that specify external
dependencies in a `project` section of their TOML blocks. For example, here is
a test that makes sure that we understand Pydantic's dataclass-transform
setup:

````markdown
```toml
[environment]
python-version = "3.12"
python-platform = "linux"

[project]
dependencies = ["pydantic==2.12.2"]
```

```py
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str

user = User(id=1, name="Alice")
reveal_type(user.id)  # revealed: int
reveal_type(user.name)  # revealed: str

# error: [missing-argument] "No argument provided for required parameter `name`"
invalid_user = User(id=2)
```
````

## How?

Using the `python-version` and the `dependencies` fields from the
Markdown section, we generate a `pyproject.toml` file, write it to a
temporary directory, and use `uv sync` to install the dependencies into
a virtual environment. We then copy the Python source files from that
venv's `site-packages` folder to a corresponding directory structure in
the in-memory filesystem. Finally, we configure the search paths
accordingly, and run the mdtest as usual.
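
A rough sketch of that pipeline, assuming `uv` is on `PATH` (the helper name and project layout are hypothetical, not the actual test-runner code):

```py
import subprocess
import tempfile
from pathlib import Path

def prepare_external_deps(python_version: str, dependencies: list[str]) -> Path:
    """Generate a throwaway project, let `uv sync` build its venv, and return the
    site-packages directory whose .py files get copied into the in-memory filesystem."""
    project = Path(tempfile.mkdtemp())
    deps = ", ".join(f'"{dep}"' for dep in dependencies)
    (project / "pyproject.toml").write_text(
        "[project]\n"
        'name = "mdtest-external"\n'
        'version = "0.0.0"\n'
        f'requires-python = "=={python_version}.*"\n'
        f"dependencies = [{deps}]\n"
    )
    subprocess.run(["uv", "sync", "--no-install-project"], cwd=project, check=True)
    return next((project / ".venv" / "lib").glob("python*/site-packages"))
```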

I fully understand that there are valid concerns here:
* Doesn't this require network access? (yes, it does)
* Is this fast enough? (`uv` caching makes this almost unnoticeable,
actually)
* Is this deterministic? ~~(probably not, package resolution can depend
on the platform you're on)~~ (yes, hopefully)

For this reason, this first version is opt-in, locally. ~~We don't even
run these tests in CI (even though they worked fine in a previous
iteration of this PR).~~ You need to set `MDTEST_EXTERNAL=1`, or use the
new `-e/--enable-external` command line option of the `mdtest.py`
runner. For example:
```bash
# Skip mdtests with external dependencies (default):
uv run crates/ty_python_semantic/mdtest.py

# Run all mdtests, including those with external dependencies:
uv run crates/ty_python_semantic/mdtest.py -e

# Only run the `pydantic` tests. Use `-e` to make sure it is not skipped:
uv run crates/ty_python_semantic/mdtest.py -e pydantic
```

## Why?

I believe that this can be a useful addition to our testing strategy,
which lies somewhere between ecosystem tests and normal mdtests.
Ecosystem tests cover much more code, but they have the disadvantage
that we only see second- or third-order effects via diagnostic diffs. If
we unexpectedly gain or lose type coverage somewhere, we might not even
notice (assuming the gradual guarantee holds, and ecosystem code is
mostly correct). Another disadvantage of ecosystem checks is that they
only test checked-in code that is usually correct. However, we also want
to test what happens on wrong code, like the code that is momentarily
written in an editor, before fixing it. On the other end of the spectrum
we have normal mdtests, which have the disadvantage that they do not
reflect the reality of complex real-world code. We experience this
whenever we're surprised by an ecosystem report on a PR.

That said, these tests should not be seen as a replacement for either of
these things. For example, we should still strive to write detailed
self-contained mdtests for user-reported issues. But we might use this
new layer for regression tests, or simply as a debugging tool. It can
also serve as a tool to document our support for popular third-party
libraries.

## Test Plan

* I've been locally using this for a couple of weeks now.
* `uv run crates/ty_python_semantic/mdtest.py -e`
2025-12-08 11:44:20 +01:00
Dhruv Manilawala
ac882f7e63 [ty] Handle various invalid explicit specializations for ParamSpec (#21821)
## Summary

fixes: https://github.com/astral-sh/ty/issues/1788

## Test Plan

Add new mdtests.

---------

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2025-12-08 05:20:41 +00:00
Alex Waygood
857fd4f683 [ty] Add test case for fixed panic (#21832) 2025-12-07 15:58:11 +00:00
Charlie Marsh
285d6410d3 [ty] Avoid double-analyzing tuple in Final subscript (#21828)
## Summary

As-is, a single-element tuple gets destructured via:

```rust
let arguments = if let ast::Expr::Tuple(tuple) = slice {
    &*tuple.elts
} else {
    std::slice::from_ref(slice)
};
```

But then, because it's a single element, we call
`infer_annotation_expression_impl`, passing in the tuple, rather than
the first element.

Closes https://github.com/astral-sh/ty/issues/1793.
Closes https://github.com/astral-sh/ty/issues/1768.

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 14:27:14 +00:00
Prakhar Pratyush
cbff09b9af [flake8-bandit] Fix false positive when using non-standard CSafeLoader path (S506). (#21830) 2025-12-07 11:40:46 +01:00
Louis Maddox
6e0e49eda8 Add minimal-size build profile (#21826)
This PR adds the same `minimal-size` profile as the `uv` repo workspace has

```toml
# Profile to build a minimally sized binary for uv-build
[profile.minimal-size]
inherits = "release"
opt-level = "z"
# This will still show a panic message, we only skip the unwind
panic = "abort"
codegen-units = 1
```
but removes its `panic = "abort"` setting

- As discussed in #21825

Compared to the ones pre-built via `uv tool install`, this builds 35%
smaller ruff and 24% smaller ty binaries
(as measured
[here](https://github.com/lmmx/just-pre-commit/blob/master/refresh_binaries.sh))
2025-12-06 13:19:04 -05:00
Charlie Marsh
ef45c97dab [ty] Allow tuple[Any, ...] to assign to tuple[int, *tuple[int, ...]] (#21803)
## Summary

Closes https://github.com/astral-sh/ty/issues/1750.
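
A minimal illustration of the assignability relation named in the title (not taken from the linked issue):

```py
from typing import Any

def widen(xs: tuple[Any, ...]) -> tuple[int, *tuple[int, ...]]:
    # `tuple[Any, ...]` is now accepted where `tuple[int, *tuple[int, ...]]`
    # (a tuple with at least one `int` element) is expected.
    return xs
```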
2025-12-05 19:04:23 +00:00
Micha Reiser
9714c589e1 [ty] Support renaming import aliases (#21792) 2025-12-05 19:12:13 +01:00
Micha Reiser
b2fb421ddd [ty] Add redeclaration LSP tests (#21812) 2025-12-05 18:02:34 +00:00
Shunsuke Shibayama
2f05ffa2c8 [ty] more detailed description of "Size limit on unions of literals" in mdtest (#21804) 2025-12-05 17:34:39 +00:00
Dhruv Manilawala
b623189560 [ty] Complete support for ParamSpec (#21445)
## Summary

Closes: https://github.com/astral-sh/ty/issues/157

This PR adds support for the following capabilities involving a
`ParamSpec` type variable:
- Representing `P.args` and `P.kwargs` in the type system
- Matching against a callable containing `P` to create a type mapping
- Specializing `P` against the stored parameters
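
For example, a decorator that forwards its parameters through `P` now round-trips the wrapped signature (an illustrative snippet, not taken from the PR's tests):

```py
from typing import Callable, ParamSpec

P = ParamSpec("P")

def logged(func: Callable[P, int]) -> Callable[P, int]:
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> int:
        return func(*args, **kwargs)
    return wrapper

@logged
def add(a: int, b: int) -> int:
    return a + b

reveal_type(add)  # revealed: (a: int, b: int) -> int
```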

The value of a `ParamSpec` type variable is represented using
`CallableType` with a `CallableTypeKind::ParamSpecValue` variant. This
`CallableTypeKind` is expanded from the existing `is_function_like`
boolean flag. An `enum` is used as these variants are mutually
exclusive.

For context, an initial iteration made an attempt to expand the
`Specialization` to use a `TypeOrParameters` enum that represents that a
type variable can specialize into either a `Type` or `Parameters`, but
that increased the complexity of the code, as all downstream usages would
need to handle both variants appropriately. Additionally, we'd also have
needed to establish an invariant that a regular type variable always
maps to a `Type` while a ParamSpec type variable always maps to
`Parameters`.

I've intentionally left out checking and raising diagnostics when the
`ParamSpec` type variable and its components are not used correctly, to
avoid increasing the scope; it can easily be done as a follow-up. This
would also include the scoping rules, which I don't think a regular type
variable implements either.

## Test Plan

Add new mdtest cases and update existing test cases.

Ran this branch on pyx, no new diagnostics.

### Ecosystem analysis

There's a case involving an annotated assignment like:
```py
from typing import Callable

type CustomType[**P] = Callable[P, int]

def value[**P](*args: P.args, **kwargs: P.kwargs) -> int: ...

def another[**P](*args: P.args, **kwargs: P.kwargs) -> None:
    target: CustomType[P] = value
```
The type of `value` is a callable with a ParamSpec that's bound to
`value`; `CustomType` is a type alias that's a callable, and the `P`
used in its specialization is bound to `another`. Now, ty infers the
type of `target` to be the same as `value` and does not use the declared
type `CustomType[P]`. [This is the
assignment](0980b9d9ab/src/async_utils/gen_transform.py (L108))
that I'm referring to, which then leads to errors in downstream usage.
Pyright and mypy do seem to use the declared type.

There are multiple diagnostics in `dd-trace-py` that require support
for `cls`.

I'm seeing a `Divergent` type for an example like the one below, which ~~I'm
not sure why, I'll look into it tomorrow~~ is because of a cycle as
mentioned in
https://github.com/astral-sh/ty/issues/1729#issuecomment-3612279974:
```py
from typing import Callable

def decorator[**P](c: Callable[P, int]) -> Callable[P, str]: ...

@decorator
def func(a: int) -> int: ...

# ((a: int) -> str) | ((a: Divergent) -> str)
reveal_type(func)
```

I ~~need to look into why the parameters are not being specialized
through multiple decorators in the following code~~ think this is also
because of the cycle mentioned in
https://github.com/astral-sh/ty/issues/1729#issuecomment-3612279974 and
the fact that we don't support `staticmethod` properly:
```py
from contextlib import contextmanager

class Foo:
    @staticmethod
    @contextmanager
    def method(x: int):
        yield

foo = Foo()
# ty: Revealed type: `() -> _GeneratorContextManager[Unknown, None, None]` [revealed-type]
reveal_type(foo.method)
```

There's some issue related to `Protocol`s that are generic over a
`ParamSpec` in `starlette`, which might be related to
https://github.com/astral-sh/ty/issues/1635, but I'm not sure. Here's a
minimal example to reproduce:

<details><summary>Code snippet:</summary>
<p>

```py
from collections.abc import Awaitable, Callable, MutableMapping
from typing import Any, Callable, ParamSpec, Protocol

P = ParamSpec("P")

Scope = MutableMapping[str, Any]
Message = MutableMapping[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]

ASGIApp = Callable[[Scope, Receive, Send], Awaitable[None]]

_Scope = Any
_Receive = Callable[[], Awaitable[Any]]
_Send = Callable[[Any], Awaitable[None]]

# Since `starlette.types.ASGIApp` type differs from `ASGIApplication` from `asgiref`
# we need to define a more permissive version of ASGIApp that doesn't cause type errors.
_ASGIApp = Callable[[_Scope, _Receive, _Send], Awaitable[None]]


class _MiddlewareFactory(Protocol[P]):
    def __call__(
        self, app: _ASGIApp, *args: P.args, **kwargs: P.kwargs
    ) -> _ASGIApp: ...


class Middleware:
    def __init__(
        self, factory: _MiddlewareFactory[P], *args: P.args, **kwargs: P.kwargs
    ) -> None:
        self.factory = factory
        self.args = args
        self.kwargs = kwargs


class ServerErrorMiddleware:
    def __init__(
        self,
        app: ASGIApp,
        value: int | None = None,
        flag: bool = False,
    ) -> None:
        self.app = app
        self.value = value
        self.flag = flag

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None: ...


# ty: Argument to bound method `__init__` is incorrect: Expected `_MiddlewareFactory[(...)]`, found `<class 'ServerErrorMiddleware'>` [invalid-argument-type]
Middleware(ServerErrorMiddleware, value=500, flag=True)
```

</p>
</details> 

### Conformance analysis

> ```diff
> -constructors_callable.py:36:13: info[revealed-type] Revealed type: `(...) -> Unknown`
> +constructors_callable.py:36:13: info[revealed-type] Revealed type: `(x: int) -> Unknown`
> ```

Requires return type inference i.e.,
https://github.com/astral-sh/ruff/pull/21551

> ```diff
> +constructors_callable.py:194:16: error[invalid-argument-type] Argument is incorrect: Expected `list[T@__init__]`, found `list[Unknown | str]`
> +constructors_callable.py:194:22: error[invalid-argument-type] Argument is incorrect: Expected `list[T@__init__]`, found `list[Unknown | str]`
> +constructors_callable.py:195:4: error[invalid-argument-type] Argument is incorrect: Expected `list[T@__init__]`, found `list[Unknown | int]`
> +constructors_callable.py:195:9: error[invalid-argument-type] Argument is incorrect: Expected `list[T@__init__]`, found `list[Unknown | str]`
> ```

I might need to look into why this is happening...

> ```diff
> +generics_defaults.py:79:1: error[type-assertion-failure] Type `type[Class_ParamSpec[(str, int, /)]]` does not match asserted type `<class 'Class_ParamSpec'>`
> ```

which occurs on the following code:
```py
DefaultP = ParamSpec("DefaultP", default=[str, int])

class Class_ParamSpec(Generic[DefaultP]): ...

assert_type(Class_ParamSpec, type[Class_ParamSpec[str, int]])
```

It's occurring because there's no equivalence relationship defined
between `ClassLiteral` and `KnownInstanceType::TypeGenericAlias` which
is what these types are.

Everything else looks good to me!
2025-12-05 22:00:06 +05:30
Micha Reiser
f29436ca9e [ty] Update benchmark dependencies (#21815) 2025-12-05 17:23:18 +01:00
Douglas Creager
e42cdf8495 [ty] Carry generic context through when converting class into Callable (#21798)
When converting a class (whether specialized or not) into a `Callable`
type, we should carry through any generic context that the constructor
has. This includes both the generic context of the class itself (if it's
generic) and of the constructor methods (if they are separately
generic).

To help test this, this also updates the `generic_context` extension
function to work on `Callable` types and unions; and adds a new
`into_callable` extension function that works just like
`CallableTypeOf`, but on value forms instead of type forms.

Pulled this out of #21551 for separate review.
2025-12-05 08:57:21 -05:00
Alex Waygood
71a7a03ad4 [ty] Add more tests for renamings (#21810) 2025-12-05 12:41:31 +00:00
Alex Waygood
48f7f42784 [ty] Minor improvements to assert_type diagnostics (#21811) 2025-12-05 12:33:30 +00:00
Micha Reiser
3deb7e1b90 [ty] Add some attribute/method renaming test cases (#21809) 2025-12-05 11:56:28 +01:00
mahiro
5df8a959f5 Update mkdocs-material to 9.7.0 (Insiders now free) (#21797) 2025-12-05 08:53:08 +01:00
Dhruv Manilawala
6f03afe318 Remove unused whitespaces in test cases (#21806)
These aren't used in the tests themselves. There are more instances of
them in other files, but those require code changes, so I've left them as
they are.
2025-12-05 12:51:40 +05:30
Shunsuke Shibayama
1951f1bbb8 [ty] fix panic when instantiating a type variable with invalid constraints (#21663) 2025-12-04 18:48:38 -08:00
Shunsuke Shibayama
10de342991 [ty] fix build failure caused by conflicts between #21683 and #21800 (#21802) 2025-12-04 18:20:24 -08:00
Shunsuke Shibayama
3511b7a06b [ty] do nothing with store_expression_type if inner_expression_inference_state is Get (#21718)
## Summary

Fixes https://github.com/astral-sh/ty/issues/1688

## Test Plan

N/A
2025-12-04 18:05:41 -08:00
Shunsuke Shibayama
f3e5713d90 [ty] increase the limit on the number of elements in a non-recursively defined literal union (#21683)
## Summary

Closes https://github.com/astral-sh/ty/issues/957

As explained in https://github.com/astral-sh/ty/issues/957, literal
union types for recursively defined values can be widened early to
speed up the convergence of fixed-point iterations.
This PR achieves this by embedding a marker in `UnionType` that
distinguishes whether a value is recursively defined.

This also allows us to identify values that are not recursively
defined, so I've increased the limit on the number of elements in a
literal union type for such values.
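
A hedged sketch of the two situations (the exact widening behaviour and inferred types are illustrative):

```py
def flag() -> bool: ...

# Not recursively defined: a plain enumeration of literals, which can now grow
# beyond the previous element limit before being widened.
x = 1 if flag() else 2 if flag() else 3  # Literal[1, 2, 3]

# Recursively defined: the literal union is widened early so that the
# fixed-point iteration converges quickly.
y = 0
while flag():
    y = y + 1  # inferred as `int` rather than an ever-growing literal union
```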

Edit: while this PR doesn't provide the significant performance
improvement initially hoped for, it does have the benefit of allowing
the number of elements in a literal union to be raised above the salsa
limit, and indeed mypy_primer results revealed that a literal union of
220 elements was actually being used.

## Test Plan

`call/union.md` has been updated
2025-12-04 18:01:48 -08:00
Carl Meyer
a9de6b5c3e [ty] normalize typevar bounds/constraints in cycles (#21800)
Fixes https://github.com/astral-sh/ty/issues/1587

## Summary

Perform cycle normalization on typevar bounds and constraints (similar
to how it was already done for typevar defaults) in order to ensure
convergence in cyclic cases.

There might be another fix here that could avoid the cycle in many more
cases, where we don't eagerly evaluate typevar bounds/constraints on
explicit specialization, but just accept the given specialization and
later evaluate to see whether we need to emit a diagnostic on it. But
the current fix here is sufficient to solve the problem and matches the
patterns we use to ensure cycle convergence elsewhere, so it seems good
for now; left a TODO for the other idea.

This fix is sufficient to make us not panic, but not sufficient to get
the semantics fully correct; see the TODOs in the tests. I have ideas
for fixing that as well, but it seems worth at least getting this in to
fix the panic.

## Test Plan

Test that previously panicked now does not.

---------

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2025-12-04 15:17:57 -08:00
Andrew Gallant
06415b1877 [ty] Update completion eval to include modules
Our parsing and confirming of symbol names is highly suspect, but
I think it's fine for now.
2025-12-04 17:37:37 -05:00
Andrew Gallant
518d11b33f [ty] Add modules to auto-import
This makes auto-import include modules in suggestions.

In this initial implementation, we permit this to include submodules as
well. This is in contrast to what we do in `import ...` completions.
It's easy to change this behavior, but I think it'd be interesting to
run with this for now to see how well it works.
2025-12-04 17:37:37 -05:00
Andrew Gallant
da94b99248 [ty] Add support for module-only import requests
The existing importer functionality always required
an import request with a module and a member in that
module. But we want to be able to insert import statements
for a module itself and not any members in the module.

This is basically changing `member: &str` to an
`Option<&str>` and fixing the fallout in a way that
makes sense for module-only imports.
2025-12-04 17:37:37 -05:00
Andrew Gallant
3c2cf49f60 [ty] Refactor auto-import symbol info
This just encapsulates the representation so that
we can make changes to it more easily.
2025-12-04 17:37:37 -05:00
Andrew Gallant
fdcb5a7e73 [ty] Clarify the use of SymbolKind in auto-import 2025-12-04 13:21:26 -05:00
Andrew Gallant
6a025d1925 [ty] Redact ranking of completions from e2e LSP tests
I think changes to this value are generally noise. It's hard to tell
what it means and it isn't especially actionable. We already have an
eval running in CI for completion ranking, so I don't think it's
terribly important to care about ranking here in e2e tests _generally_.
2025-12-04 13:21:26 -05:00
Andrew Gallant
f054e7edf8 [ty] Tweaks tests to use clearer language
A completion lacking a module reference doesn't necessarily mean that
the symbol is defined within the current module. I believe the intent
here is that it means that no import is required to use it.
2025-12-04 13:21:26 -05:00
Andrew Gallant
e154efa229 [ty] Update evaluation results
These are all improvements here with one slight regression on
`reveal_type` ranking. The previous completions offered were:

```
$ cargo r -q -p ty_completion_eval show-one ty-extensions-lower-stdlib
ENOTRECOVERABLE (module: errno)
REG_WHOLE_HIVE_VOLATILE (module: winreg)
SQLITE_NOTICE_RECOVER_WAL (module: _sqlite3)
SupportsGetItemViewable (module: _typeshed)
removeHandler (module: unittest.signals)
reveal_mro (module: ty_extensions)
reveal_protocol_interface (module: ty_extensions)
reveal_type (module: typing) (*, 8/10)
_remove_original_values (module: _osx_support)
_remove_universal_flags (module: _osx_support)
-----
found 10 completions
```

And now they are:

```
$ cargo r -q -p ty_completion_eval show-one ty-extensions-lower-stdlib
ENOTRECOVERABLE (module: errno)
REG_WHOLE_HIVE_VOLATILE (module: winreg)
SQLITE_NOTICE_RECOVER_WAL (module: sqlite3)
SQLITE_NOTICE_RECOVER_WAL (module: sqlite3.dbapi2)
removeHandler (module: unittest)
removeHandler (module: unittest.signals)
reveal_mro (module: ty_extensions)
reveal_protocol_interface (module: ty_extensions)
reveal_type (module: typing) (*, 9/9)
-----
found 9 completions
```

Some completions were removed (because they are now considered
unexported) and some were added (likely due to better re-export support).

This particular case probably warrants more special attention anyway.
So I think this is fine. (It's only a one-ranking regression.)
2025-12-04 13:21:26 -05:00
Andrew Gallant
32f400a457 [ty] Make auto-import ignore symbols in modules starting with a _
This applies recursively. So if *any* component of a module name starts
with a `_`, then symbols from that module are excluded from auto-import.

The exception is when it's a module within first party code. Then we
want to include it in auto-import.
2025-12-04 13:21:26 -05:00
Andrew Gallant
2a38395bc8 [ty] Add some tests for re-exports and __all__ to completions
Note that the `Deprecated` symbols from `importlib.metadata` are no
longer offered because 1) `importlib.metadata` defined `__all__` and 2)
the `Deprecated` symbols aren't in it. These seem to not be a part of
its public API according to the docs, so this seems right to me.
2025-12-04 13:21:26 -05:00
Andrew Gallant
8c72b296c9 [ty] Add support for re-exports and __all__ to auto-import
This commit (mostly) re-implements the support for `__all__` in
ty-proper, but inside the auto-import AST scanner.

When `__all__` isn't present in a module, we fall back to conventions to
determine whether a symbol is exported or not:
https://docs.python.org/3/library/index.html

However, in keeping with current practice for non-auto-import
completions, we continue to provide sunder and dunder names as
re-exports.

When `__all__` is present, we respect it strictly. That is, a symbol is
exported *if and only if* it's in `__all__`. This is somewhat stricter
than pylance seemingly is. I felt like it was a good idea to start here,
and we can relax it based on user demand (perhaps through a setting).
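
To illustrate the export rules above with a made-up module (hypothetical, not one of the test fixtures):

```py
# util.py
__all__ = ["public_helper"]

def public_helper(): ...   # listed in __all__: offered by auto-import
def other_helper(): ...    # not in __all__: treated as unexported
def _internal(): ...       # underscore convention: unexported when __all__ is absent
```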
2025-12-04 13:21:26 -05:00
Andrew Gallant
086f1e0b89 [ty] Skip over expressions in auto-import AST scanning 2025-12-04 13:21:26 -05:00
Andrew Gallant
5da45f8ec7 [ty] Simplify auto-import AST visitor slightly and add tests
This simplifies the existing visitor by DRYing it up slightly.
We also add tests for the existing functionality. In particular,
we want to add support for re-export conventions, and that
warrants more careful testing.
2025-12-04 13:21:26 -05:00
Andrew Gallant
62f20b1e86 [ty] Re-arrange imports in symbol extraction
I like using a qualified `ast::` prefix for things from
`ruff_python_ast`, so switch over to that convention.
2025-12-04 13:21:26 -05:00
Aria Desires
cccb0bbaa4 [ty] Add tests for implicit submodule references (#21793)
## Summary

I realized we don't really test `DefinitionKind::ImportFromSubmodule` in
the IDE at all, so here's a bunch of them, just recording our current
behaviour.

## Test Plan

*stares at the camera*
2025-12-04 15:46:23 +00:00
Brent Westbrook
9d4f1c6ae2 Bump 0.14.8 (#21791) 2025-12-04 09:45:53 -05:00
Micha Reiser
326025d45f [ty] Always register rename provider if client doesn't support dynamic registration (#21789) 2025-12-04 14:40:16 +01:00
Micha Reiser
3aefe85b32 [ty] Ensure rename CursorTest calls can_rename before renaming (#21790) 2025-12-04 14:19:48 +01:00
Dhruv Manilawala
b8ecc83a54 Fix clippy errors on main (#21788)
https://github.com/astral-sh/ruff/actions/runs/19922070773/job/57112827024#step:5:62
2025-12-04 16:20:37 +05:30
Aria Desires
6491932757 [ty] Fix crash when hovering an unknown string annotation (#21782)
## Summary

I have no idea what I'm doing with the fix (all the interesting stuff is
in the second commit).

The basic problem is that the compiler emits the diagnostic:

```
x: "foobar"
    ^^^^^^
```

The suppression code action then hands the end of that range to
`Tokens::after`, which panics if handed an offset that falls in the
middle of a token.

Fixes https://github.com/astral-sh/ty/issues/1748

## Test Plan

Many tests added (only the e2e test matters).
2025-12-04 09:11:40 +01:00
Micha Reiser
a9f2bb41bd [ty] Don't send publish diagnostics for clients supporting pull diagnostics (#21772) 2025-12-04 08:12:04 +01:00
Aria Desires
e2b72fbf99 [ty] cleanup test path (#21781)
Fixes
https://github.com/astral-sh/ruff/pull/21745#discussion_r2586552295
2025-12-03 21:54:50 +00:00
Alex Waygood
14fce0d440 [ty] Improve the display of various special-form types (#21775) 2025-12-03 21:19:59 +00:00
Alex Waygood
8ebecb2a88 [ty] Add subdiagnostic hint if the user wrote X = Any rather than X: Any (#21777) 2025-12-03 20:42:21 +00:00
Aria Desires
45ac30a4d7 [ty] Teach ty the meaning of desperation (try ancestor pyproject.tomls as search-paths if module resolution fails) (#21745)
## Summary

This makes the importing file a required argument to module resolution;
if the fast-path cached query fails to resolve the module, we take the
slow-path, uncached (it could be cached if we want)
`desperately_resolve_module`, which walks up from the importing file
until it finds a `pyproject.toml` (an arbitrary decision, we could try
every ancestor directory), at which point it takes one last desperate
attempt to use that directory as a search path. We do not continue
walking up once we've found a `pyproject.toml` (also an arbitrary
decision, we could keep going up).
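
A minimal sketch of that walk-up strategy (illustrated in Python; the function name is hypothetical and this is not the actual resolver code):

```py
from pathlib import Path

def desperate_search_path(importing_file: Path) -> Path | None:
    """Walk up from the importing file and stop at the first directory that
    contains a `pyproject.toml`; that directory becomes a last-resort search path."""
    for ancestor in importing_file.parents:
        if (ancestor / "pyproject.toml").is_file():
            return ancestor
    return None
```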

Running locally, this fixes every broken-for-workspace-reasons import in
pyx's workspace!

* Fixes https://github.com/astral-sh/ty/issues/1539
* Improves https://github.com/astral-sh/ty/issues/839

## Test Plan

The workspace tests see a huge improvement on most absolute imports.
2025-12-03 15:04:36 -05:00
Alex Waygood
0280949000 [ty] fix panic when attempting to infer the variance of a PEP-695 class that depends on a recursive type alias and also somehow protocols (#21778)
Fixes https://github.com/astral-sh/ty/issues/1716.

## Test plan

I added a corpus snippet that causes us to panic on `main` (I tested by
running `cargo run -p ty_python_semantic --test=corpus` without the fix
applied).
2025-12-03 19:01:42 +00:00
Bhuminjay Soni
c722f498fe [flake8-bugbear] Catch yield expressions within other statements (B901) (#21200)
## Summary

This PR re-implements [return-in-generator
(B901)](https://docs.astral.sh/ruff/rules/return-in-generator/#return-in-generator-b901)
for async generators as a semantic syntax error. This is not a syntax
error for sync generators, so we'll need to preserve both the lint rule
and the syntax error in this case.

It also updates B901 and the new implementation to catch cases where the
generator's `yield` or `yield from` expression is part of another
statement, as in:

```py
def foo():
    return (yield)
```

These were previously not caught because we only looked for
`Stmt::Expr(Expr::Yield)` in `visit_stmt` instead of visiting `yield`
expressions directly. I think this modification is within the spirit of
the rule and safe to try out since the rule is in preview.

## Test Plan

I have written tests as directed in #17412

---------

Signed-off-by: 11happy <soni5happy@gmail.com>
Signed-off-by: 11happy <bhuminjaysoni@gmail.com>
Co-authored-by: Brent Westbrook <brentrwestbrook@gmail.com>
Co-authored-by: Brent Westbrook <36778786+ntBre@users.noreply.github.com>
2025-12-03 12:05:15 -05:00
David Peter
1f4f8d9950 [ty] Fix flow of associated member states during star imports (#21776)
## Summary

Star-imports don't just affect the state of the symbols that they pull in;
they can also affect the state of members that are associated with those
symbols. For example, if `obj.attr` was previously narrowed from `int |
None` to `int`, and a star-import now overwrites `obj`, then the
narrowing on `obj.attr` should be "reset".
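
A hedged illustration of that scenario (`other_module` is hypothetical and may rebind `obj`):

```py
class C:
    attr: int | None = None

obj = C()

if obj.attr is not None:
    reveal_type(obj.attr)  # revealed: int

    from other_module import *  # may rebind `obj` (and therefore `obj.attr`)

    reveal_type(obj.attr)  # narrowing is reset: int | None
```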

This PR keeps track of the state of associated members during star
imports and properly models the flow of their corresponding state
through the control flow structure that we artificially create for
star-imports.

See [this
comment](https://github.com/astral-sh/ty/issues/1355#issuecomment-3607125005)
for an explanation why this caused ty to see certain `asyncio` symbols
as not being accessible on Python 3.14.

closes https://github.com/astral-sh/ty/issues/1355

## Ecosystem impact

```diff
async-utils (https://github.com/mikeshardmind/async-utils)
- src/async_utils/bg_loop.py:115:31: error[invalid-argument-type] Argument to bound method `set_task_factory` is incorrect: Expected `_TaskFactory | None`, found `def eager_task_factory[_T_co](loop: AbstractEventLoop | None, coro: Coroutine[Any, Any, _T_co@eager_task_factory], *, name: str | None = None, context: Context | None = None) -> Task[_T_co@eager_task_factory]`
- Found 30 diagnostics
+ Found 29 diagnostics

mitmproxy (https://github.com/mitmproxy/mitmproxy)
+ mitmproxy/utils/asyncio_utils.py:96:60: warning[unused-ignore-comment] Unused blanket `type: ignore` directive
- test/conftest.py:37:31: error[invalid-argument-type] Argument to bound method `set_task_factory` is incorrect: Expected `_TaskFactory | None`, found `def eager_task_factory[_T_co](loop: AbstractEventLoop | None, coro: Coroutine[Any, Any, _T_co@eager_task_factory], *, name: str | None = None, context: Context | None = None) -> Task[_T_co@eager_task_factory]`
```

All of these seem to be correct: they give us a different type for
`asyncio` symbols that are now imported from different
`sys.version_info` branches (where we previously failed to recognize
some of these as statically true/false).

```diff
dd-trace-py (https://github.com/DataDog/dd-trace-py)
- ddtrace/contrib/internal/asyncio/patch.py:39:12: error[invalid-argument-type] Argument to function `unwrap` is incorrect: Expected `WrappedFunction`, found `def create_task[_T](self, coro: Coroutine[Any, Any, _T@create_task] | Generator[Any, None, _T@create_task], *, name: object = None) -> Task[_T@create_task]`
+ ddtrace/contrib/internal/asyncio/patch.py:39:12: error[invalid-argument-type] Argument to function `unwrap` is incorrect: Expected `WrappedFunction`, found `def create_task[_T](self, coro: Generator[Any, None, _T@create_task] | Coroutine[Any, Any, _T@create_task], *, name: object = None) -> Task[_T@create_task]`
```

Similar, but only results in a diagnostic change.

## Test Plan

Added a regression test
2025-12-03 17:52:31 +01:00
William Woodruff
4488e9d47d Revert "Enable PEP 740 attestations when publishing to PyPI" (#21768) 2025-12-03 11:07:29 -05:00
github-actions[bot]
b08f0b2caa [ty] Sync vendored typeshed stubs (#21715)
Co-authored-by: typeshedbot <>
Co-authored-by: David Peter <mail@david-peter.de>
2025-12-03 15:49:51 +00:00
David Peter
d6e472f297 [ty] Reachability constraints: minor documentation fixes (#21774) 2025-12-03 16:40:11 +01:00
Douglas Creager
45842cc034 [ty] Fix non-determinism in ConstraintSet.specialize_constrained (#21744)
This fixes a non-determinism that we were seeing in the constraint set
tests in https://github.com/astral-sh/ruff/pull/21715.

In this test, we create the following constraint set, and then try to
create a specialization from it:

```
(T@constrained_by_gradual_list = list[Base])
  ∨
(Bottom[list[Any]] ≤ T@constrained_by_gradual_list ≤ Top[list[Any]])
```

That is, `T` is either specifically `list[Base]`, or it's any `list`.
Our current heuristics say that, absent other restrictions, we should
specialize `T` to the more specific type (`list[Base]`).

In the correct test output, we end up creating a BDD that looks like
this:

```
(T@constrained_by_gradual_list = list[Base])
┡━₁ always
└─₀ (Bottom[list[Any]] ≤ T@constrained_by_gradual_list ≤ Top[list[Any]])
    ┡━₁ always
    └─₀ never
```

In the incorrect output, the BDD looks like this:

```
(Bottom[list[Any]] ≤ T@constrained_by_gradual_list ≤ Top[list[Any]])
┡━₁ always
└─₀ never
```

The difference is the ordering of the two individual constraints. Both
constraints appear in the first BDD, but the second BDD only contains `T
is any list`. If we were to force the second BDD to contain both
constraints, it would look like this:

```
(Bottom[list[Any]] ≤ T@constrained_by_gradual_list ≤ Top[list[Any]])
┡━₁ always
└─₀ (T@constrained_by_gradual_list = list[Base])
    ┡━₁ always
    └─₀ never
```

This is the standard shape for an OR of two constraints. However! Those
two constraints are not independent of each other! If `T` is
specifically `list[Base]`, then it's definitely also "any `list`". From
that, we can infer the contrapositive: that if `T` is not any list, then
it cannot be `list[Base]` specifically. When we encounter impossible
situations like that, we prune that path in the BDD, and treat it as
`false`. That rewrites the second BDD to the following:

```
(Bottom[list[Any]] ≤ T@constrained_by_gradual_list ≤ Top[list[Any]])
┡━₁ always
└─₀ (T@constrained_by_gradual_list = list[Base])
    ┡━₁ never   <-- IMPOSSIBLE, rewritten to never
    └─₀ never
```

We then would see that that BDD node is redundant, since both of its
outgoing edges point at the `never` node. Our BDDs are _reduced_, which
means we have to remove that redundant node, resulting in the BDD we saw
above:

```
(Bottom[list[Any]] ≤ T@constrained_by_gradual_list ≤ Top[list[Any]])
┡━₁ always
└─₀ never       <-- redundant node removed
```

The end result is that we were "forgetting" about the `T = list[Base]`
constraint, but only for some BDD variable orderings.

To fix this, I'm leaning into the fact that our BDDs really do need to
"remember" all of the constraints that they were created with. Some
combinations might not be possible, but we now have the sequent map,
which is quite good at detecting and pruning those.

So now our BDDs are _quasi-reduced_, which just means that redundant
nodes are allowed. (At first I was worried that allowing redundant nodes
would be an unsound "fix the glitch". But it turns out they're real!
[This](https://ieeexplore.ieee.org/abstract/document/130209) is the
paper that introduces them, though it's very difficult to read. Knuth
mentions them in §7.1.4 of
[TAOCP](https://course.khoury.northeastern.edu/csu690/ssl/bdd-knuth.pdf),
and [this paper](https://par.nsf.gov/servlets/purl/10128966) has a nice
short summary of them in §2.)

While we're here, I've added a bunch of `debug` and `trace` level log
messages to the constraint set implementation. I was getting tired of
having to add these by hand over and over. To enable them, just set
`TY_LOG` in your environment, e.g.

```sh
env TY_LOG=ty_python_semantic::types::constraints::SequentMap=trace ty check ...
```

[Note, this has an `internal` label because we are still not using
`specialize_constrained` in anything user-facing yet.]
2025-12-03 10:19:39 -05:00
Alex Waygood
cd079bd92e [ty] Improve @override, @final and Liskov checks in cases where there are multiple reachable definitions (#21767) 2025-12-03 12:51:36 +00:00
Alex Waygood
5756b3809c [ty] Extend invalid-explicit-override to also cover properties decorated with @override that do not override anything (#21756) 2025-12-03 11:27:47 +00:00
Micha Reiser
92c5f62ec0 [ty] Enable LRU collection for parsed module (#21749) 2025-12-03 12:16:18 +01:00
David Peter
21e5a57296 [ty] Support typevar-specialized dynamic types in generic type aliases (#21730)
## Summary

For a type alias like the one below, where `UnknownClass` is something
with a dynamic type, we previously lost track of the fact that this
dynamic type was explicitly specialized *with a type variable*. If that
alias is then later explicitly specialized itself (`MyAlias[int]`), we
would miscount the number of legacy type variables and emit a
`invalid-type-arguments` diagnostic
([playground](https://play.ty.dev/886ae6cc-86c3-4304-a365-510d29211f85)).
```py
T = TypeVar("T")

MyAlias: TypeAlias = UnknownClass[T] | None
```
The solution implemented here is not pretty, but we can hopefully get
rid of it via https://github.com/astral-sh/ty/issues/1711. Also, once we
properly support `ParamSpec` and `Concatenate`, we should be able to
remove some of this code.

This addresses many of the `invalid-type-arguments` false-positives in
https://github.com/astral-sh/ty/issues/1685. With this change, there are
still some diagnostics of this type left. Instead of implementing even
more (rather sophisticated) workarounds for these cases as well, it
might be much easier to wait for full `ParamSpec`/`Concatenate` support
and then try again.

A disadvantage of this implementation is that we lose track of some
`@Todo` types and replace them with `Unknown`. We could spend more
effort and try to preserve them, but I'm unsure if this is the best use
of our time right now.

## Test Plan

New Markdown tests.
2025-12-03 10:00:02 +01:00
Denys Zhak
f4e4229683 Add token based parenthesized_ranges implementation (#21738)
Co-authored-by: Micha Reiser <micha@reiser.io>
2025-12-03 08:15:17 +00:00
David Peter
e6ddeed386 [ty] Default-specialization of generic type aliases (#21765)
## Summary

Implement default-specialization of generic type aliases (implicit or
PEP-613) if they are used in a type expression without an explicit
specialization.
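
A hedged example of the behavior (the alias and revealed type are illustrative, not taken from the PR's tests):

```py
from typing_extensions import TypeAlias, TypeVar

T = TypeVar("T", default=int)
Pair: TypeAlias = tuple[T, T]

def f(p: Pair) -> None:  # used without an explicit specialization
    reveal_type(p)  # revealed: tuple[int, int] (default-specialized)
```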

closes https://github.com/astral-sh/ty/issues/1690

## Typing conformance

```diff
-generics_defaults_specialization.py:26:5: error[type-assertion-failure] Type `SomethingWithNoDefaults[int, str]` does not match asserted type `SomethingWithNoDefaults[int, DefaultStrT]`
```

That's exactly what we want ✔️ 

All other tests in this file pass as well, with the exception of this
assertion, which is just wrong (at least according to our
interpretation, `type[Bar] != <class 'Bar'>`). I checked that we do
correctly default-specialize the type parameter which is not displayed
in the diagnostic that we raise.
```py
class Bar(SubclassMe[int, DefaultStrT]): ...

assert_type(Bar, type[Bar[str]])  # ty: Type `type[Bar[str]]` does not match asserted type `<class 'Bar'>`
```

## Ecosystem impact

Looks like I should have included this last week 😎 

## Test Plan

Updated pre-existing tests and added a few new ones.
2025-12-03 09:10:45 +01:00
Alex Waygood
c5b8d551df [ty] Suppress false positives when dataclasses.dataclass(...)(cls) is called imperatively (#21729)
Fixes https://github.com/astral-sh/ty/issues/1705
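
A hedged reconstruction of the pattern in question (not taken from the linked issue):

```py
import dataclasses

class Point:
    x: int
    y: int

# Applying the decorator imperatively rather than via `@dataclasses.dataclass`;
# this previously produced false positives.
FrozenPoint = dataclasses.dataclass(frozen=True)(Point)
```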
2025-12-03 08:05:25 +00:00
Bhuminjay Soni
f68080b55e [syntax-error] Default type parameter followed by non-default type parameter (#21657)
## Summary

This PR implements a syntax error for the case where a default type
parameter is followed by a non-default type parameter.
https://github.com/astral-sh/ruff/issues/17412#issuecomment-3584088217
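
A hedged illustration of code that the new syntax error flags (PEP 696 type-parameter defaults; the class name is made up):

```py
# Syntax error: `U` has no default but follows `T`, which does.
class Container[T = int, U]: ...
```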


## Test Plan

I have written inline tests as directed in #17412

---------

Signed-off-by: 11happy <bhuminjaysoni@gmail.com>
Signed-off-by: 11happy <soni5happy@gmail.com>
2025-12-03 12:01:31 +05:30
Amethyst Reese
abaa49f552 new module for parsing ranged suppressions (#21441)
This adds a new `suppression` module to the `ruff_linter` crate, similar
to the suppression
module for ty, to parse comments for ruff suppression directives, such
as `# ruff: disable[CODE]`.
2025-12-02 15:39:59 -08:00
Ibraheem Ahmed
7b0aab1696 [ty] type[T] is assignable to an inferable typevar (#21766)
## Summary

Resolves https://github.com/astral-sh/ty/issues/1712.
2025-12-02 18:25:09 -05:00
Brent Westbrook
2250fa6f98 Fix syntax error false positives for await outside functions (#21763)
## Summary

Fixes #21750 and a related bug in `PLE1142`. We were not properly
considering generators to be valid `await` contexts, which caused the
`F704` issue. One of the tests I added for this also uncovered an issue
in `PLE1142` for comprehensions nested within async generators because
we were only checking the current scope rather than traversing the
nested context.
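
A hedged reconstruction of the F704 false-positive pattern (function names are made up):

```py
async def fetch(x: int) -> int:
    return x

async def gather_all(items: list[int]) -> list[int]:
    # `await` inside a comprehension/generator scope within an async function
    # is a valid context; this previously triggered an F704 false positive.
    return [await fetch(i) for i in items]
```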

## Test Plan

Both of these rules are implemented as semantic syntax errors, so I
added tests (and fixes) in both Ruff and ty.
2025-12-02 21:02:02 +00:00
Alex Waygood
392a8e4e50 [ty] Improve diagnostics for unsupported comparison operations (#21737) 2025-12-02 19:58:45 +00:00
Micha Reiser
515de2d062 Move Token, TokenKind and Tokens to ruff-python-ast (#21760) 2025-12-02 20:10:46 +01:00
350 changed files with 25655 additions and 10909 deletions

View File

@@ -75,14 +75,6 @@
matchManagers: ["cargo"],
enabled: false,
},
{
// `mkdocs-material` requires a manual update to keep the version in sync
// with `mkdocs-material-insider`.
// See: https://squidfunk.github.io/mkdocs-material/insiders/upgrade/
matchManagers: ["pip_requirements"],
matchPackageNames: ["mkdocs-material"],
enabled: false,
},
{
groupName: "pre-commit dependencies",
matchManagers: ["pre-commit"],

View File

@@ -24,6 +24,8 @@ env:
PACKAGE_NAME: ruff
PYTHON_VERSION: "3.14"
NEXTEST_PROFILE: ci
# Enable mdtests that require external dependencies
MDTEST_EXTERNAL: "1"
jobs:
determine_changes:
@@ -779,8 +781,6 @@ jobs:
name: "mkdocs"
runs-on: ubuntu-latest
timeout-minutes: 10
env:
MKDOCS_INSIDERS_SSH_KEY_EXISTS: ${{ secrets.MKDOCS_INSIDERS_SSH_KEY != '' }}
steps:
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
@@ -788,11 +788,6 @@ jobs:
- uses: Swatinem/rust-cache@779680da715d629ac1d338a641029a2f4372abb5 # v2.8.2
with:
save-if: ${{ github.ref == 'refs/heads/main' }}
- name: "Add SSH key"
if: ${{ env.MKDOCS_INSIDERS_SSH_KEY_EXISTS == 'true' }}
uses: webfactory/ssh-agent@a6f90b1f127823b31d4d4a8d96047790581349bd # v0.9.1
with:
ssh-private-key: ${{ secrets.MKDOCS_INSIDERS_SSH_KEY }}
- name: "Install Rust toolchain"
run: rustup show
- name: Install uv
@@ -800,11 +795,7 @@ jobs:
with:
python-version: 3.13
activate-environment: true
- name: "Install Insiders dependencies"
if: ${{ env.MKDOCS_INSIDERS_SSH_KEY_EXISTS == 'true' }}
run: uv pip install -r docs/requirements-insiders.txt
- name: "Install dependencies"
if: ${{ env.MKDOCS_INSIDERS_SSH_KEY_EXISTS != 'true' }}
run: uv pip install -r docs/requirements.txt
- name: "Update README File"
run: python scripts/transform_readme.py --target mkdocs
@@ -812,12 +803,8 @@ jobs:
run: python scripts/generate_mkdocs.py
- name: "Check docs formatting"
run: python scripts/check_docs_formatted.py
- name: "Build Insiders docs"
if: ${{ env.MKDOCS_INSIDERS_SSH_KEY_EXISTS == 'true' }}
run: mkdocs build --strict -f mkdocs.insiders.yml
- name: "Build docs"
if: ${{ env.MKDOCS_INSIDERS_SSH_KEY_EXISTS != 'true' }}
run: mkdocs build --strict -f mkdocs.public.yml
run: mkdocs build --strict -f mkdocs.yml
check-formatter-instability-and-black-similarity:
name: "formatter instabilities and black similarity"

View File

@@ -20,8 +20,6 @@ on:
jobs:
mkdocs:
runs-on: ubuntu-latest
env:
MKDOCS_INSIDERS_SSH_KEY_EXISTS: ${{ secrets.MKDOCS_INSIDERS_SSH_KEY != '' }}
steps:
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
@@ -59,23 +57,12 @@ jobs:
echo "branch_name=update-docs-$branch_display_name-$timestamp" >> "$GITHUB_ENV"
echo "timestamp=$timestamp" >> "$GITHUB_ENV"
- name: "Add SSH key"
if: ${{ env.MKDOCS_INSIDERS_SSH_KEY_EXISTS == 'true' }}
uses: webfactory/ssh-agent@a6f90b1f127823b31d4d4a8d96047790581349bd # v0.9.1
with:
ssh-private-key: ${{ secrets.MKDOCS_INSIDERS_SSH_KEY }}
- name: "Install Rust toolchain"
run: rustup show
- uses: Swatinem/rust-cache@779680da715d629ac1d338a641029a2f4372abb5 # v2.8.2
- name: "Install Insiders dependencies"
if: ${{ env.MKDOCS_INSIDERS_SSH_KEY_EXISTS == 'true' }}
run: pip install -r docs/requirements-insiders.txt
- name: "Install dependencies"
if: ${{ env.MKDOCS_INSIDERS_SSH_KEY_EXISTS != 'true' }}
run: pip install -r docs/requirements.txt
- name: "Copy README File"
@@ -83,13 +70,8 @@ jobs:
python scripts/transform_readme.py --target mkdocs
python scripts/generate_mkdocs.py
- name: "Build Insiders docs"
if: ${{ env.MKDOCS_INSIDERS_SSH_KEY_EXISTS == 'true' }}
run: mkdocs build --strict -f mkdocs.insiders.yml
- name: "Build docs"
if: ${{ env.MKDOCS_INSIDERS_SSH_KEY_EXISTS != 'true' }}
run: mkdocs build --strict -f mkdocs.public.yml
run: mkdocs build --strict -f mkdocs.yml
- name: "Clone docs repo"
run: git clone https://${{ secrets.ASTRAL_DOCS_PAT }}@github.com/astral-sh/docs.git astral-docs

View File

@@ -18,7 +18,8 @@ jobs:
environment:
name: release
permissions:
id-token: write # For PyPI's trusted publishing + PEP 740 attestations
# For PyPI's trusted publishing.
id-token: write
steps:
- name: "Install uv"
uses: astral-sh/setup-uv@1e862dfacbd1d6d858c55d9b792c756523627244 # v7.1.4
@@ -27,8 +28,5 @@ jobs:
pattern: wheels-*
path: wheels
merge-multiple: true
- uses: astral-sh/attest-action@2c727738cea36d6c97dd85eb133ea0e0e8fe754b # v0.0.4
with:
paths: wheels/*
- name: Publish to PyPi
run: uv publish -v wheels/*

View File

@@ -1,5 +1,34 @@
# Changelog
## 0.14.8
Released on 2025-12-04.
### Preview features
- \[`flake8-bugbear`\] Catch `yield` expressions within other statements (`B901`) ([#21200](https://github.com/astral-sh/ruff/pull/21200))
- \[`flake8-use-pathlib`\] Mark fixes unsafe for return type changes (`PTH104`, `PTH105`, `PTH109`, `PTH115`) ([#21440](https://github.com/astral-sh/ruff/pull/21440))
### Bug fixes
- Fix syntax error false positives for `await` outside functions ([#21763](https://github.com/astral-sh/ruff/pull/21763))
- \[`flake8-simplify`\] Fix truthiness assumption for non-iterable arguments in tuple/list/set calls (`SIM222`, `SIM223`) ([#21479](https://github.com/astral-sh/ruff/pull/21479))
### Documentation
- Suggest using `--output-file` option in GitLab integration ([#21706](https://github.com/astral-sh/ruff/pull/21706))
### Other changes
- [syntax-error] Default type parameter followed by non-default type parameter ([#21657](https://github.com/astral-sh/ruff/pull/21657))
### Contributors
- [@kieran-ryan](https://github.com/kieran-ryan)
- [@11happy](https://github.com/11happy)
- [@danparizher](https://github.com/danparizher)
- [@ntBre](https://github.com/ntBre)
## 0.14.7
Released on 2025-11-28.

View File

@@ -331,13 +331,6 @@ you addressed them.
## MkDocs
> [!NOTE]
>
> The documentation uses Material for MkDocs Insiders, which is closed-source software.
> This means only members of the Astral organization can preview the documentation exactly as it
> will appear in production.
> Outside contributors can still preview the documentation, but there will be some differences. Consult [the Material for MkDocs documentation](https://squidfunk.github.io/mkdocs-material/insiders/benefits/#features) for which features are exclusively available in the insiders version.
To preview any changes to the documentation locally:
1. Install the [Rust toolchain](https://www.rust-lang.org/tools/install).
@@ -351,11 +344,7 @@ To preview any changes to the documentation locally:
1. Run the development server with:
```shell
# For contributors.
uvx --with-requirements docs/requirements.txt -- mkdocs serve -f mkdocs.public.yml
# For members of the Astral org, which has access to MkDocs Insiders via sponsorship.
uvx --with-requirements docs/requirements-insiders.txt -- mkdocs serve -f mkdocs.insiders.yml
uvx --with-requirements docs/requirements.txt -- mkdocs serve -f mkdocs.yml
```
The documentation should then be available locally at

Cargo.lock generated
View File

@@ -2859,7 +2859,7 @@ dependencies = [
[[package]]
name = "ruff"
version = "0.14.7"
version = "0.14.8"
dependencies = [
"anyhow",
"argfile",
@@ -3117,13 +3117,14 @@ dependencies = [
[[package]]
name = "ruff_linter"
version = "0.14.7"
version = "0.14.8"
dependencies = [
"aho-corasick",
"anyhow",
"bitflags 2.10.0",
"clap",
"colored 3.0.0",
"compact_str",
"fern",
"glob",
"globset",
@@ -3472,7 +3473,7 @@ dependencies = [
[[package]]
name = "ruff_wasm"
version = "0.14.7"
version = "0.14.8"
dependencies = [
"console_error_panic_hook",
"console_log",
@@ -4556,6 +4557,7 @@ dependencies = [
"anyhow",
"camino",
"colored 3.0.0",
"dunce",
"insta",
"memchr",
"path-slash",

View File

@@ -272,6 +272,12 @@ large_stack_arrays = "allow"
lto = "fat"
codegen-units = 16
# Profile to build a minimally sized binary for ruff/ty
[profile.minimal-size]
inherits = "release"
opt-level = "z"
codegen-units = 1
# Some crates don't change as much but benefit more from
# more expensive optimization passes, so we selectively
# decrease codegen-units in some cases.

View File

@@ -147,8 +147,8 @@ curl -LsSf https://astral.sh/ruff/install.sh | sh
powershell -c "irm https://astral.sh/ruff/install.ps1 | iex"
# For a specific version.
curl -LsSf https://astral.sh/ruff/0.14.7/install.sh | sh
powershell -c "irm https://astral.sh/ruff/0.14.7/install.ps1 | iex"
curl -LsSf https://astral.sh/ruff/0.14.8/install.sh | sh
powershell -c "irm https://astral.sh/ruff/0.14.8/install.ps1 | iex"
```
You can also install Ruff via [Homebrew](https://formulae.brew.sh/formula/ruff), [Conda](https://anaconda.org/conda-forge/ruff),
@@ -181,7 +181,7 @@ Ruff can also be used as a [pre-commit](https://pre-commit.com/) hook via [`ruff
```yaml
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.14.7
rev: v0.14.8
hooks:
# Run the linter.
- id: ruff-check

View File

@@ -1,6 +1,6 @@
[package]
name = "ruff"
version = "0.14.7"
version = "0.14.8"
publish = true
authors = { workspace = true }
edition = { workspace = true }

View File

@@ -6,7 +6,8 @@ use criterion::{
use ruff_benchmark::{
LARGE_DATASET, NUMPY_CTYPESLIB, NUMPY_GLOBALS, PYDANTIC_TYPES, TestCase, UNICODE_PYPINYIN,
};
use ruff_python_parser::{Mode, TokenKind, lexer};
use ruff_python_ast::token::TokenKind;
use ruff_python_parser::{Mode, lexer};
#[cfg(target_os = "windows")]
#[global_allocator]

View File

@@ -166,28 +166,8 @@ impl Diagnostic {
/// Returns the primary message for this diagnostic.
///
/// A diagnostic always has a message, but it may be empty.
///
/// NOTE: At present, this routine will return the first primary
/// annotation's message as the primary message when the main diagnostic
/// message is empty. This is meant to facilitate an incremental migration
/// in ty over to the new diagnostic data model. (The old data model
/// didn't distinguish between messages on the entire diagnostic and
/// messages attached to a particular span.)
pub fn primary_message(&self) -> &str {
if !self.inner.message.as_str().is_empty() {
return self.inner.message.as_str();
}
// FIXME: As a special case, while we're migrating ty
// to the new diagnostic data model, we'll look for a primary
// message from the primary annotation. This is because most
// ty diagnostics are created with an empty diagnostic
// message and instead attach the message to the annotation.
// Fixing this will require touching basically every diagnostic
// in ty, so we do it this way for now to match the old
// semantics. ---AG
self.primary_annotation()
.and_then(|ann| ann.get_message())
.unwrap_or_default()
self.inner.message.as_str()
}
/// Introspects this diagnostic and returns what kind of "primary" message
@@ -199,18 +179,6 @@ impl Diagnostic {
/// contains *essential* information or context for understanding the
/// diagnostic.
///
/// The reason why we don't just always return both the main diagnostic
/// message and the primary annotation message is because this was written
/// in the midst of an incremental migration of ty over to the new
/// diagnostic data model. At time of writing, diagnostics were still
/// constructed in the old model where the main diagnostic message and the
/// primary annotation message were not distinguished from each other. So
/// for now, we carefully return what kind of messages this diagnostic
/// contains. In effect, if this diagnostic has a non-empty main message
/// *and* a non-empty primary annotation message, then the diagnostic is
/// 100% using the new diagnostic data model and we can format things
/// appropriately.
///
/// The type returned implements the `std::fmt::Display` trait. In most
/// cases, just converting it to a string (or printing it) will do what
/// you want.
@@ -224,11 +192,10 @@ impl Diagnostic {
.primary_annotation()
.and_then(|ann| ann.get_message())
.unwrap_or_default();
match (main.is_empty(), annotation.is_empty()) {
(false, true) => ConciseMessage::MainDiagnostic(main),
(true, false) => ConciseMessage::PrimaryAnnotation(annotation),
(false, false) => ConciseMessage::Both { main, annotation },
(true, true) => ConciseMessage::Empty,
if annotation.is_empty() {
ConciseMessage::MainDiagnostic(main)
} else {
ConciseMessage::Both { main, annotation }
}
}
@@ -693,18 +660,6 @@ impl SubDiagnostic {
/// contains *essential* information or context for understanding the
/// diagnostic.
///
/// The reason why we don't just always return both the main diagnostic
/// message and the primary annotation message is because this was written
/// in the midst of an incremental migration of ty over to the new
/// diagnostic data model. At time of writing, diagnostics were still
/// constructed in the old model where the main diagnostic message and the
/// primary annotation message were not distinguished from each other. So
/// for now, we carefully return what kind of messages this diagnostic
/// contains. In effect, if this diagnostic has a non-empty main message
/// *and* a non-empty primary annotation message, then the diagnostic is
/// 100% using the new diagnostic data model and we can format things
/// appropriately.
///
/// The type returned implements the `std::fmt::Display` trait. In most
/// cases, just converting it to a string (or printing it) will do what
/// you want.
@@ -714,11 +669,10 @@ impl SubDiagnostic {
.primary_annotation()
.and_then(|ann| ann.get_message())
.unwrap_or_default();
match (main.is_empty(), annotation.is_empty()) {
(false, true) => ConciseMessage::MainDiagnostic(main),
(true, false) => ConciseMessage::PrimaryAnnotation(annotation),
(false, false) => ConciseMessage::Both { main, annotation },
(true, true) => ConciseMessage::Empty,
if annotation.is_empty() {
ConciseMessage::MainDiagnostic(main)
} else {
ConciseMessage::Both { main, annotation }
}
}
}
@@ -888,6 +842,10 @@ impl Annotation {
pub fn hide_snippet(&mut self, yes: bool) {
self.hide_snippet = yes;
}
pub fn is_primary(&self) -> bool {
self.is_primary
}
}
/// Tags that can be associated with an annotation.
@@ -1508,28 +1466,10 @@ pub enum DiagnosticFormat {
pub enum ConciseMessage<'a> {
/// A diagnostic contains a non-empty main message and an empty
/// primary annotation message.
///
/// This strongly suggests that the diagnostic is using the
/// "new" data model.
MainDiagnostic(&'a str),
/// A diagnostic contains an empty main message and a non-empty
/// primary annotation message.
///
/// This strongly suggests that the diagnostic is using the
/// "old" data model.
PrimaryAnnotation(&'a str),
/// A diagnostic contains a non-empty main message and a non-empty
/// primary annotation message.
///
/// This strongly suggests that the diagnostic is using the
/// "new" data model.
Both { main: &'a str, annotation: &'a str },
/// A diagnostic contains an empty main message and an empty
/// primary annotation message.
///
/// This indicates that the diagnostic is probably using the old
/// model.
Empty,
/// A custom concise message has been provided.
Custom(&'a str),
}
@@ -1540,13 +1480,9 @@ impl std::fmt::Display for ConciseMessage<'_> {
ConciseMessage::MainDiagnostic(main) => {
write!(f, "{main}")
}
ConciseMessage::PrimaryAnnotation(annotation) => {
write!(f, "{annotation}")
}
ConciseMessage::Both { main, annotation } => {
write!(f, "{main}: {annotation}")
}
ConciseMessage::Empty => Ok(()),
ConciseMessage::Custom(message) => {
write!(f, "{message}")
}
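With `PrimaryAnnotation` and `Empty` removed, the `Display` impl above only has to handle `MainDiagnostic`, `Both`, and `Custom`. A minimal stand-alone sketch (not part of this diff, with invented example messages) of how the surviving variants render:

use std::fmt;

// Stand-alone copy of the surviving variants, for illustration only.
#[allow(dead_code)]
enum ConciseMessage<'a> {
    MainDiagnostic(&'a str),
    Both { main: &'a str, annotation: &'a str },
    Custom(&'a str),
}

impl fmt::Display for ConciseMessage<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Same formatting rules as the impl above.
            ConciseMessage::MainDiagnostic(main) => write!(f, "{main}"),
            ConciseMessage::Both { main, annotation } => write!(f, "{main}: {annotation}"),
            ConciseMessage::Custom(message) => write!(f, "{message}"),
        }
    }
}

fn main() {
    // Hypothetical messages, purely for illustration.
    let msg = ConciseMessage::Both {
        main: "invalid assignment",
        annotation: "expected `int`, found `str`",
    };
    assert_eq!(msg.to_string(), "invalid assignment: expected `int`, found `str`");
}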

View File

@@ -21,7 +21,11 @@ use crate::source::source_text;
/// reflected in the changed AST offsets.
/// The other reason is that Ruff's AST doesn't implement `Eq` which Salsa requires
/// for determining if a query result is unchanged.
#[salsa::tracked(returns(ref), no_eq, heap_size=ruff_memory_usage::heap_size)]
///
/// The LRU capacity of 200 was picked without any empirical evidence that it's optimal;
/// it's a rough guess based on the assumption that incremental changes rarely involve
/// more than 200 modules. Parsed ASTs within the same revision are never evicted by Salsa.
#[salsa::tracked(returns(ref), no_eq, heap_size=ruff_memory_usage::heap_size, lru=200)]
pub fn parsed_module(db: &dyn Db, file: File) -> ParsedModule {
let _span = tracing::trace_span!("parsed_module", ?file).entered();
@@ -92,14 +96,9 @@ impl ParsedModule {
self.inner.store(None);
}
/// Returns the pointer address of this [`ParsedModule`].
///
/// The pointer uniquely identifies the module within the current Salsa revision,
/// regardless of whether particular [`ParsedModuleRef`] instances are garbage collected.
pub fn addr(&self) -> usize {
// Note that the outer `Arc` in `inner` is stable across garbage collection, while the inner
// `Arc` within the `ArcSwap` may change.
Arc::as_ptr(&self.inner).addr()
/// Returns the file to which this module belongs.
pub fn file(&self) -> File {
self.file
}
}

View File

@@ -667,6 +667,13 @@ impl Deref for SystemPathBuf {
}
}
impl AsRef<Path> for SystemPathBuf {
#[inline]
fn as_ref(&self) -> &Path {
self.0.as_std_path()
}
}
impl<P: AsRef<SystemPath>> FromIterator<P> for SystemPathBuf {
fn from_iter<I: IntoIterator<Item = P>>(iter: I) -> Self {
let mut buf = SystemPathBuf::new();

View File

@@ -49,7 +49,7 @@ impl ModuleImports {
// Resolve the imports.
let mut resolved_imports = ModuleImports::default();
for import in imports {
for resolved in Resolver::new(db).resolve(import) {
for resolved in Resolver::new(db, path).resolve(import) {
if let Some(path) = resolved.as_system_path() {
resolved_imports.insert(path.to_path_buf());
}

View File

@@ -1,5 +1,9 @@
use ruff_db::files::FilePath;
use ty_python_semantic::{ModuleName, resolve_module, resolve_real_module};
use ruff_db::files::{File, FilePath, system_path_to_file};
use ruff_db::system::SystemPath;
use ty_python_semantic::{
ModuleName, resolve_module, resolve_module_confident, resolve_real_module,
resolve_real_module_confident,
};
use crate::ModuleDb;
use crate::collector::CollectedImport;
@@ -7,12 +11,15 @@ use crate::collector::CollectedImport;
/// Collect all imports for a given Python file.
pub(crate) struct Resolver<'a> {
db: &'a ModuleDb,
file: Option<File>,
}
impl<'a> Resolver<'a> {
/// Initialize a [`Resolver`] with a given [`ModuleDb`].
pub(crate) fn new(db: &'a ModuleDb) -> Self {
Self { db }
pub(crate) fn new(db: &'a ModuleDb, path: &SystemPath) -> Self {
// If we know the importing file we can potentially resolve more imports
let file = system_path_to_file(db, path).ok();
Self { db, file }
}
/// Resolve the [`CollectedImport`] into a [`FilePath`].
@@ -70,13 +77,21 @@ impl<'a> Resolver<'a> {
/// Resolves a module name to a module.
pub(crate) fn resolve_module(&self, module_name: &ModuleName) -> Option<&'a FilePath> {
let module = resolve_module(self.db, module_name)?;
let module = if let Some(file) = self.file {
resolve_module(self.db, file, module_name)?
} else {
resolve_module_confident(self.db, module_name)?
};
Some(module.file(self.db)?.path(self.db))
}
/// Resolves a module name to a module (stubs not allowed).
fn resolve_real_module(&self, module_name: &ModuleName) -> Option<&'a FilePath> {
let module = resolve_real_module(self.db, module_name)?;
let module = if let Some(file) = self.file {
resolve_real_module(self.db, file, module_name)?
} else {
resolve_real_module_confident(self.db, module_name)?
};
Some(module.file(self.db)?.path(self.db))
}
}
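The comment above captures the motivation: when the importing file is known, resolution can take that file's own location into account, and the `*_confident` variants are only used as a fallback. A small stand-alone sketch of that fallback pattern, using hypothetical stand-in types rather than ty's real `ModuleDb`/`File`/`resolve_module` API:

use std::collections::HashMap;

// Hypothetical stand-ins for the database and the importing file handle.
struct Db {
    search_path: HashMap<&'static str, &'static str>, // module name -> resolved path
}

#[derive(Clone, Copy)]
struct File {
    dir: &'static str,
}

impl Db {
    // File-aware resolution: knowing the importer lets us answer more queries,
    // e.g. modules that must be looked up relative to the importing file.
    fn resolve_from(&self, importer: File, name: &str) -> Option<String> {
        self.search_path
            .get(name)
            .map(|path| path.to_string())
            .or_else(|| (name == "sibling").then(|| format!("{}/sibling.py", importer.dir)))
    }

    // "Confident" resolution: no importing file available, global search paths only.
    fn resolve_confident(&self, name: &str) -> Option<String> {
        self.search_path.get(name).map(|path| path.to_string())
    }
}

fn resolve(db: &Db, importer: Option<File>, name: &str) -> Option<String> {
    match importer {
        Some(file) => db.resolve_from(file, name),
        None => db.resolve_confident(name),
    }
}

fn main() {
    let db = Db {
        search_path: HashMap::from([("os", "/stdlib/os.py")]),
    };
    let importer = File { dir: "/project/pkg" };
    assert_eq!(resolve(&db, None, "os").as_deref(), Some("/stdlib/os.py"));
    assert_eq!(resolve(&db, None, "sibling"), None);
    assert_eq!(
        resolve(&db, Some(importer), "sibling").as_deref(),
        Some("/project/pkg/sibling.py")
    );
}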

View File

@@ -1,6 +1,6 @@
[package]
name = "ruff_linter"
version = "0.14.7"
version = "0.14.8"
publish = false
authors = { workspace = true }
edition = { workspace = true }
@@ -35,6 +35,7 @@ anyhow = { workspace = true }
bitflags = { workspace = true }
clap = { workspace = true, features = ["derive", "string"], optional = true }
colored = { workspace = true }
compact_str = { workspace = true }
fern = { workspace = true }
glob = { workspace = true }
globset = { workspace = true }

View File

@@ -28,9 +28,11 @@ yaml.load("{}", SafeLoader)
yaml.load("{}", yaml.SafeLoader)
yaml.load("{}", CSafeLoader)
yaml.load("{}", yaml.CSafeLoader)
yaml.load("{}", yaml.cyaml.CSafeLoader)
yaml.load("{}", NewSafeLoader)
yaml.load("{}", Loader=SafeLoader)
yaml.load("{}", Loader=yaml.SafeLoader)
yaml.load("{}", Loader=CSafeLoader)
yaml.load("{}", Loader=yaml.CSafeLoader)
yaml.load("{}", Loader=yaml.cyaml.CSafeLoader)
yaml.load("{}", Loader=NewSafeLoader)

View File

@@ -52,16 +52,16 @@ def not_broken5():
yield inner()
def not_broken6():
def broken3():
return (yield from [])
def not_broken7():
def broken4():
x = yield from []
return x
def not_broken8():
def broken5():
x = None
def inner(ex):
@@ -76,3 +76,13 @@ class NotBroken9(object):
def __await__(self):
yield from function()
return 42
async def broken6():
yield 1
return foo()
async def broken7():
yield 1
return [1, 2, 3]

View File

@@ -17,3 +17,24 @@ def _():
# Valid yield scope
yield 3
# await is valid in any generator, sync or async
(await cor async for cor in f()) # ok
(await cor for cor in f()) # ok
# but not in comprehensions
[await cor async for cor in f()] # F704
{await cor async for cor in f()} # F704
{await cor: 1 async for cor in f()} # F704
[await cor for cor in f()] # F704
{await cor for cor in f()} # F704
{await cor: 1 for cor in f()} # F704
# or in the iterator of an async generator, which is evaluated in the parent
# scope
(cor async for cor in await f()) # F704
(await cor async for cor in [await c for c in f()]) # F704
# this is also okay because the comprehension is within the generator scope
([await c for c in cor] async for cor in f()) # ok

View File

@@ -3,3 +3,5 @@ def func():
# Top-level await
await 1
([await c for c in cor] async for cor in func()) # ok

View File

@@ -0,0 +1,24 @@
async def gen():
yield 1
return 42
def gen(): # B901 but not a syntax error - not an async generator
yield 1
return 42
async def gen(): # ok - no value in return
yield 1
return
async def gen():
yield 1
return foo()
async def gen():
yield 1
return [1, 2, 3]
async def gen():
if True:
yield 1
return 10

View File

@@ -35,6 +35,7 @@ use ruff_python_ast::helpers::{collect_import_from_member, is_docstring_stmt, to
use ruff_python_ast::identifier::Identifier;
use ruff_python_ast::name::QualifiedName;
use ruff_python_ast::str::Quote;
use ruff_python_ast::token::Tokens;
use ruff_python_ast::visitor::{Visitor, walk_except_handler, walk_pattern};
use ruff_python_ast::{
self as ast, AnyParameterRef, ArgOrKeyword, Comprehension, ElifElseClause, ExceptHandler, Expr,
@@ -48,7 +49,7 @@ use ruff_python_parser::semantic_errors::{
SemanticSyntaxChecker, SemanticSyntaxContext, SemanticSyntaxError, SemanticSyntaxErrorKind,
};
use ruff_python_parser::typing::{AnnotationKind, ParsedAnnotation, parse_type_annotation};
use ruff_python_parser::{ParseError, Parsed, Tokens};
use ruff_python_parser::{ParseError, Parsed};
use ruff_python_semantic::all::{DunderAllDefinition, DunderAllFlags};
use ruff_python_semantic::analyze::{imports, typing};
use ruff_python_semantic::{
@@ -68,6 +69,7 @@ use crate::noqa::NoqaMapping;
use crate::package::PackageRoot;
use crate::preview::is_undefined_export_in_dunder_init_enabled;
use crate::registry::Rule;
use crate::rules::flake8_bugbear::rules::ReturnInGenerator;
use crate::rules::pyflakes::rules::{
LateFutureImport, MultipleStarredExpressions, ReturnOutsideFunction,
UndefinedLocalWithNestedImportStarUsage, YieldOutsideFunction,
@@ -728,6 +730,12 @@ impl SemanticSyntaxContext for Checker<'_> {
self.report_diagnostic(NonlocalWithoutBinding { name }, error.range);
}
}
SemanticSyntaxErrorKind::ReturnInGenerator => {
// B901
if self.is_rule_enabled(Rule::ReturnInGenerator) {
self.report_diagnostic(ReturnInGenerator, error.range);
}
}
SemanticSyntaxErrorKind::ReboundComprehensionVariable
| SemanticSyntaxErrorKind::DuplicateTypeParameter
| SemanticSyntaxErrorKind::MultipleCaseAssignment(_)
@@ -746,6 +754,7 @@ impl SemanticSyntaxContext for Checker<'_> {
| SemanticSyntaxErrorKind::LoadBeforeNonlocalDeclaration { .. }
| SemanticSyntaxErrorKind::NonlocalAndGlobal(_)
| SemanticSyntaxErrorKind::AnnotatedGlobal(_)
| SemanticSyntaxErrorKind::TypeParameterDefaultOrder(_)
| SemanticSyntaxErrorKind::AnnotatedNonlocal(_) => {
self.semantic_errors.borrow_mut().push(error);
}
@@ -779,6 +788,10 @@ impl SemanticSyntaxContext for Checker<'_> {
match scope.kind {
ScopeKind::Class(_) => return false,
ScopeKind::Function(_) | ScopeKind::Lambda(_) => return true,
ScopeKind::Generator {
kind: GeneratorKind::Generator,
..
} => return true,
ScopeKind::Generator { .. }
| ScopeKind::Module
| ScopeKind::Type
@@ -828,14 +841,19 @@ impl SemanticSyntaxContext for Checker<'_> {
self.source_type.is_ipynb()
}
fn in_generator_scope(&self) -> bool {
matches!(
&self.semantic.current_scope().kind,
ScopeKind::Generator {
kind: GeneratorKind::Generator,
..
fn in_generator_context(&self) -> bool {
for scope in self.semantic.current_scopes() {
if matches!(
scope.kind,
ScopeKind::Generator {
kind: GeneratorKind::Generator,
..
}
) {
return true;
}
)
}
false
}
fn in_loop_context(&self) -> bool {

View File

@@ -1,6 +1,6 @@
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_codegen::Stylist;
use ruff_python_index::Indexer;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_source_file::LineRanges;
use ruff_text_size::{Ranged, TextRange};

View File

@@ -4,9 +4,9 @@ use std::path::Path;
use ruff_notebook::CellOffsets;
use ruff_python_ast::PySourceType;
use ruff_python_ast::token::Tokens;
use ruff_python_codegen::Stylist;
use ruff_python_index::Indexer;
use ruff_python_parser::Tokens;
use crate::Locator;
use crate::directives::TodoComment;

View File

@@ -5,8 +5,8 @@ use std::str::FromStr;
use bitflags::bitflags;
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_index::Indexer;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_trivia::CommentRanges;
use ruff_source_file::LineRanges;
use ruff_text_size::{Ranged, TextLen, TextRange, TextSize};

View File

@@ -5,8 +5,8 @@ use std::iter::FusedIterator;
use std::slice::Iter;
use ruff_python_ast::statement_visitor::{StatementVisitor, walk_stmt};
use ruff_python_ast::token::{Token, TokenKind, Tokens};
use ruff_python_ast::{self as ast, Stmt, Suite};
use ruff_python_parser::{Token, TokenKind, Tokens};
use ruff_source_file::UniversalNewlineIterator;
use ruff_text_size::{Ranged, TextSize};

View File

@@ -9,10 +9,11 @@ use anyhow::Result;
use libcst_native as cst;
use ruff_diagnostics::Edit;
use ruff_python_ast::token::Tokens;
use ruff_python_ast::{self as ast, Expr, ModModule, Stmt};
use ruff_python_codegen::Stylist;
use ruff_python_importer::Insertion;
use ruff_python_parser::{Parsed, Tokens};
use ruff_python_parser::Parsed;
use ruff_python_semantic::{
ImportedName, MemberNameImport, ModuleNameImport, NameImport, SemanticModel,
};

View File

@@ -46,6 +46,7 @@ pub mod rule_selector;
pub mod rules;
pub mod settings;
pub mod source_kind;
pub mod suppression;
mod text_helpers;
pub mod upstream_categories;
mod violation;

View File

@@ -1043,6 +1043,7 @@ mod tests {
Rule::YieldFromInAsyncFunction,
Path::new("yield_from_in_async_function.py")
)]
#[test_case(Rule::ReturnInGenerator, Path::new("return_in_generator.py"))]
fn test_syntax_errors(rule: Rule, path: &Path) -> Result<()> {
let snapshot = path.to_string_lossy().to_string();
let path = Path::new("resources/test/fixtures/syntax_errors").join(path);

View File

@@ -75,6 +75,7 @@ pub(crate) fn unsafe_yaml_load(checker: &Checker, call: &ast::ExprCall) {
qualified_name.segments(),
["yaml", "SafeLoader" | "CSafeLoader"]
| ["yaml", "loader", "SafeLoader" | "CSafeLoader"]
| ["yaml", "cyaml", "CSafeLoader"]
)
})
{

View File

@@ -1,6 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::statement_visitor;
use ruff_python_ast::statement_visitor::StatementVisitor;
use ruff_python_ast::visitor::{Visitor, walk_expr, walk_stmt};
use ruff_python_ast::{self as ast, Expr, Stmt, StmtFunctionDef};
use ruff_text_size::TextRange;
@@ -96,6 +95,11 @@ pub(crate) fn return_in_generator(checker: &Checker, function_def: &StmtFunction
return;
}
// Async functions are flagged by the `ReturnInGenerator` semantic syntax error.
if function_def.is_async {
return;
}
let mut visitor = ReturnInGeneratorVisitor::default();
visitor.visit_body(&function_def.body);
@@ -112,15 +116,9 @@ struct ReturnInGeneratorVisitor {
has_yield: bool,
}
impl StatementVisitor<'_> for ReturnInGeneratorVisitor {
impl Visitor<'_> for ReturnInGeneratorVisitor {
fn visit_stmt(&mut self, stmt: &Stmt) {
match stmt {
Stmt::Expr(ast::StmtExpr { value, .. }) => match **value {
Expr::Yield(_) | Expr::YieldFrom(_) => {
self.has_yield = true;
}
_ => {}
},
Stmt::FunctionDef(_) => {
// Do not recurse into nested functions; they're evaluated separately.
}
@@ -130,8 +128,19 @@ impl StatementVisitor<'_> for ReturnInGeneratorVisitor {
node_index: _,
}) => {
self.return_ = Some(*range);
walk_stmt(self, stmt);
}
_ => statement_visitor::walk_stmt(self, stmt),
_ => walk_stmt(self, stmt),
}
}
fn visit_expr(&mut self, expr: &Expr) {
match expr {
Expr::Lambda(_) => {}
Expr::Yield(_) | Expr::YieldFrom(_) => {
self.has_yield = true;
}
_ => walk_expr(self, expr),
}
}
}

View File

@@ -21,3 +21,46 @@ B901 Using `yield` and `return {value}` in a generator function can lead to conf
37 |
38 | yield from not_broken()
|
B901 Using `yield` and `return {value}` in a generator function can lead to confusing behavior
--> B901.py:56:5
|
55 | def broken3():
56 | return (yield from [])
| ^^^^^^^^^^^^^^^^^^^^^^
|
B901 Using `yield` and `return {value}` in a generator function can lead to confusing behavior
--> B901.py:61:5
|
59 | def broken4():
60 | x = yield from []
61 | return x
| ^^^^^^^^
|
B901 Using `yield` and `return {value}` in a generator function can lead to confusing behavior
--> B901.py:72:5
|
71 | inner((yield from []))
72 | return x
| ^^^^^^^^
|
B901 Using `yield` and `return {value}` in a generator function can lead to confusing behavior
--> B901.py:83:5
|
81 | async def broken6():
82 | yield 1
83 | return foo()
| ^^^^^^^^^^^^
|
B901 Using `yield` and `return {value}` in a generator function can lead to confusing behavior
--> B901.py:88:5
|
86 | async def broken7():
87 | yield 1
88 | return [1, 2, 3]
| ^^^^^^^^^^^^^^^^
|

View File

@@ -1,6 +1,6 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_index::Indexer;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_text_size::{Ranged, TextRange};
use crate::Locator;

View File

@@ -3,7 +3,7 @@ use ruff_python_ast as ast;
use ruff_python_ast::ExprGenerator;
use ruff_python_ast::comparable::ComparableExpr;
use ruff_python_ast::parenthesize::parenthesized_range;
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::{Ranged, TextRange, TextSize};
use crate::checkers::ast::Checker;

View File

@@ -3,7 +3,7 @@ use ruff_python_ast as ast;
use ruff_python_ast::ExprGenerator;
use ruff_python_ast::comparable::ComparableExpr;
use ruff_python_ast::parenthesize::parenthesized_range;
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::{Ranged, TextRange, TextSize};
use crate::checkers::ast::Checker;

View File

@@ -1,7 +1,7 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast as ast;
use ruff_python_ast::parenthesize::parenthesized_range;
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::{Ranged, TextRange, TextSize};
use crate::checkers::ast::Checker;

View File

@@ -3,8 +3,8 @@ use std::borrow::Cow;
use itertools::Itertools;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::StringFlags;
use ruff_python_ast::token::{Token, TokenKind, Tokens};
use ruff_python_index::Indexer;
use ruff_python_parser::{Token, TokenKind, Tokens};
use ruff_source_file::LineRanges;
use ruff_text_size::{Ranged, TextLen, TextRange};

View File

@@ -1,6 +1,6 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_ast::{self as ast, Expr};
use ruff_python_parser::{TokenKind, Tokens};
use ruff_text_size::{Ranged, TextLen, TextSize};
use crate::checkers::ast::Checker;

View File

@@ -4,10 +4,10 @@ use ruff_diagnostics::Applicability;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::helpers::{is_const_false, is_const_true};
use ruff_python_ast::stmt_if::elif_else_range;
use ruff_python_ast::token::TokenKind;
use ruff_python_ast::visitor::Visitor;
use ruff_python_ast::whitespace::indentation;
use ruff_python_ast::{self as ast, Decorator, ElifElseClause, Expr, Stmt};
use ruff_python_parser::TokenKind;
use ruff_python_semantic::SemanticModel;
use ruff_python_semantic::analyze::visibility::is_property;
use ruff_python_trivia::{SimpleTokenKind, SimpleTokenizer, is_python_whitespace};

View File

@@ -1,5 +1,5 @@
use ruff_python_ast::token::Tokens;
use ruff_python_ast::{self as ast, Stmt};
use ruff_python_parser::Tokens;
use ruff_source_file::LineRanges;
use ruff_text_size::{Ranged, TextRange};

View File

@@ -1,5 +1,5 @@
use ruff_python_ast::Stmt;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_trivia::PythonWhitespace;
use ruff_source_file::UniversalNewlines;
use ruff_text_size::Ranged;

View File

@@ -11,8 +11,8 @@ use comments::Comment;
use normalize::normalize_imports;
use order::order_imports;
use ruff_python_ast::PySourceType;
use ruff_python_ast::token::Tokens;
use ruff_python_codegen::Stylist;
use ruff_python_parser::Tokens;
use settings::Settings;
use types::EitherImport::{Import, ImportFrom};
use types::{AliasData, ImportBlock, TrailingComma};

View File

@@ -1,11 +1,11 @@
use itertools::{EitherOrBoth, Itertools};
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::token::Tokens;
use ruff_python_ast::whitespace::trailing_lines_end;
use ruff_python_ast::{PySourceType, PythonVersion, Stmt};
use ruff_python_codegen::Stylist;
use ruff_python_index::Indexer;
use ruff_python_parser::Tokens;
use ruff_python_trivia::{PythonWhitespace, leading_indentation, textwrap::indent};
use ruff_source_file::{LineRanges, UniversalNewlines};
use ruff_text_size::{Ranged, TextRange};

View File

@@ -1,4 +1,4 @@
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
/// Returns `true` if the name should be considered "ambiguous".
pub(super) fn is_ambiguous_name(name: &str) -> bool {

View File

@@ -8,10 +8,10 @@ use itertools::Itertools;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_notebook::CellOffsets;
use ruff_python_ast::PySourceType;
use ruff_python_ast::token::TokenIterWithContext;
use ruff_python_ast::token::TokenKind;
use ruff_python_ast::token::Tokens;
use ruff_python_codegen::Stylist;
use ruff_python_parser::TokenIterWithContext;
use ruff_python_parser::TokenKind;
use ruff_python_parser::Tokens;
use ruff_python_trivia::PythonWhitespace;
use ruff_source_file::{LineRanges, UniversalNewlines};
use ruff_text_size::TextRange;

View File

@@ -1,8 +1,8 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_notebook::CellOffsets;
use ruff_python_ast::PySourceType;
use ruff_python_ast::token::{TokenIterWithContext, TokenKind, Tokens};
use ruff_python_index::Indexer;
use ruff_python_parser::{TokenIterWithContext, TokenKind, Tokens};
use ruff_text_size::{Ranged, TextSize};
use crate::Locator;

View File

@@ -1,5 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::{Ranged, TextRange};
use crate::AlwaysFixableViolation;

View File

@@ -1,5 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::TextRange;
use crate::Violation;

View File

@@ -1,5 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::Ranged;
use crate::Edit;

View File

@@ -1,5 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::Ranged;
use crate::checkers::ast::LintContext;

View File

@@ -1,5 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::{Ranged, TextRange};
use crate::checkers::ast::LintContext;

View File

@@ -9,7 +9,7 @@ pub(crate) use missing_whitespace::*;
pub(crate) use missing_whitespace_after_keyword::*;
pub(crate) use missing_whitespace_around_operator::*;
pub(crate) use redundant_backslash::*;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_trivia::is_python_whitespace;
use ruff_text_size::{Ranged, TextLen, TextRange, TextSize};
pub(crate) use space_around_operator::*;

View File

@@ -1,6 +1,6 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::token::TokenKind;
use ruff_python_index::Indexer;
use ruff_python_parser::TokenKind;
use ruff_source_file::LineRanges;
use ruff_text_size::{Ranged, TextRange, TextSize};

View File

@@ -1,5 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::{Ranged, TextRange};
use crate::checkers::ast::LintContext;

View File

@@ -1,5 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::{Ranged, TextRange, TextSize};
use crate::checkers::ast::LintContext;

View File

@@ -1,5 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_python_trivia::PythonWhitespace;
use ruff_source_file::LineRanges;
use ruff_text_size::{Ranged, TextLen, TextRange, TextSize};

View File

@@ -1,5 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::{Ranged, TextRange, TextSize};
use crate::checkers::ast::LintContext;

View File

@@ -3,7 +3,7 @@ use std::iter::Peekable;
use itertools::Itertools;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_notebook::CellOffsets;
use ruff_python_parser::{Token, TokenKind, Tokens};
use ruff_python_ast::token::{Token, TokenKind, Tokens};
use ruff_text_size::{Ranged, TextRange, TextSize};
use crate::{AlwaysFixableViolation, Edit, Fix, checkers::ast::LintContext};

View File

@@ -2,8 +2,8 @@ use anyhow::{Error, bail};
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::helpers;
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_ast::{CmpOp, Expr};
use ruff_python_parser::{TokenKind, Tokens};
use ruff_text_size::{Ranged, TextRange};
use crate::checkers::ast::Checker;

View File

@@ -3,8 +3,8 @@ use itertools::Itertools;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::helpers::contains_effect;
use ruff_python_ast::parenthesize::parenthesized_range;
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_ast::{self as ast, Stmt};
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_semantic::Binding;
use ruff_text_size::{Ranged, TextRange, TextSize};

View File

@@ -37,3 +37,88 @@ F704 `await` statement outside of a function
12 |
13 | def _():
|
F704 `await` statement outside of a function
--> F704.py:27:2
|
26 | # but not in comprehensions
27 | [await cor async for cor in f()] # F704
| ^^^^^^^^^
28 | {await cor async for cor in f()} # F704
29 | {await cor: 1 async for cor in f()} # F704
|
F704 `await` statement outside of a function
--> F704.py:28:2
|
26 | # but not in comprehensions
27 | [await cor async for cor in f()] # F704
28 | {await cor async for cor in f()} # F704
| ^^^^^^^^^
29 | {await cor: 1 async for cor in f()} # F704
30 | [await cor for cor in f()] # F704
|
F704 `await` statement outside of a function
--> F704.py:29:2
|
27 | [await cor async for cor in f()] # F704
28 | {await cor async for cor in f()} # F704
29 | {await cor: 1 async for cor in f()} # F704
| ^^^^^^^^^
30 | [await cor for cor in f()] # F704
31 | {await cor for cor in f()} # F704
|
F704 `await` statement outside of a function
--> F704.py:30:2
|
28 | {await cor async for cor in f()} # F704
29 | {await cor: 1 async for cor in f()} # F704
30 | [await cor for cor in f()] # F704
| ^^^^^^^^^
31 | {await cor for cor in f()} # F704
32 | {await cor: 1 for cor in f()} # F704
|
F704 `await` statement outside of a function
--> F704.py:31:2
|
29 | {await cor: 1 async for cor in f()} # F704
30 | [await cor for cor in f()] # F704
31 | {await cor for cor in f()} # F704
| ^^^^^^^^^
32 | {await cor: 1 for cor in f()} # F704
|
F704 `await` statement outside of a function
--> F704.py:32:2
|
30 | [await cor for cor in f()] # F704
31 | {await cor for cor in f()} # F704
32 | {await cor: 1 for cor in f()} # F704
| ^^^^^^^^^
33 |
34 | # or in the iterator of an async generator, which is evaluated in the parent
|
F704 `await` statement outside of a function
--> F704.py:36:23
|
34 | # or in the iterator of an async generator, which is evaluated in the parent
35 | # scope
36 | (cor async for cor in await f()) # F704
| ^^^^^^^^^
37 | (await cor async for cor in [await c for c in f()]) # F704
|
F704 `await` statement outside of a function
--> F704.py:37:30
|
35 | # scope
36 | (cor async for cor in await f()) # F704
37 | (await cor async for cor in [await c for c in f()]) # F704
| ^^^^^^^
38 |
39 | # this is also okay because the comprehension is within the generator scope
|

View File

@@ -1,5 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::{Token, TokenKind};
use ruff_python_ast::token::{Token, TokenKind};
use ruff_text_size::{Ranged, TextLen, TextRange, TextSize};
use crate::Locator;

View File

@@ -1,5 +1,5 @@
use ruff_python_ast::StmtImportFrom;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_text_size::{Ranged, TextRange};
use crate::Locator;

View File

@@ -1,10 +1,10 @@
use itertools::Itertools;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::token::Tokens;
use ruff_python_ast::whitespace::indentation;
use ruff_python_ast::{Alias, StmtImportFrom, StmtRef};
use ruff_python_codegen::Stylist;
use ruff_python_parser::Tokens;
use ruff_text_size::Ranged;
use crate::Locator;

View File

@@ -1,7 +1,7 @@
use std::slice::Iter;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::{Token, TokenKind, Tokens};
use ruff_python_ast::token::{Token, TokenKind, Tokens};
use ruff_text_size::{Ranged, TextRange};
use crate::Locator;

View File

@@ -6,11 +6,11 @@ use rustc_hash::{FxHashMap, FxHashSet};
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::helpers::any_over_expr;
use ruff_python_ast::str::{leading_quote, trailing_quote};
use ruff_python_ast::token::TokenKind;
use ruff_python_ast::{self as ast, Expr, Keyword, StringFlags};
use ruff_python_literal::format::{
FieldName, FieldNamePart, FieldType, FormatPart, FormatString, FromTemplate,
};
use ruff_python_parser::TokenKind;
use ruff_source_file::LineRanges;
use ruff_text_size::{Ranged, TextRange};

View File

@@ -3,12 +3,12 @@ use std::fmt::Write;
use std::str::FromStr;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::token::TokenKind;
use ruff_python_ast::{self as ast, AnyStringFlags, Expr, StringFlags, whitespace::indentation};
use ruff_python_codegen::Stylist;
use ruff_python_literal::cformat::{
CConversionFlags, CFormatPart, CFormatPrecision, CFormatQuantity, CFormatString,
};
use ruff_python_parser::TokenKind;
use ruff_python_stdlib::identifiers::is_identifier;
use ruff_source_file::LineRanges;
use ruff_text_size::{Ranged, TextRange};

View File

@@ -1,6 +1,6 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::Stmt;
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_python_semantic::SemanticModel;
use ruff_source_file::LineRanges;
use ruff_text_size::{TextLen, TextRange, TextSize};

View File

@@ -1,7 +1,7 @@
use anyhow::Result;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_ast::{self as ast, Expr};
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_stdlib::open_mode::OpenMode;
use ruff_text_size::{Ranged, TextSize};

View File

@@ -1,8 +1,8 @@
use std::fmt::Write as _;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_ast::{self as ast, Arguments, Expr, Keyword};
use ruff_python_parser::{TokenKind, Tokens};
use ruff_text_size::{Ranged, TextRange};
use crate::Locator;

View File

@@ -4,8 +4,8 @@ use anyhow::Result;
use libcst_native::{LeftParen, ParenthesizedNode, RightParen};
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::token::TokenKind;
use ruff_python_ast::{self as ast, Expr, OperatorPrecedence};
use ruff_python_parser::TokenKind;
use ruff_text_size::Ranged;
use crate::checkers::ast::Checker;

View File

@@ -2,9 +2,9 @@ use std::cmp::Ordering;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::helpers::comment_indentation_after;
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_ast::whitespace::indentation;
use ruff_python_ast::{Stmt, StmtExpr, StmtFor, StmtIf, StmtTry, StmtWhile};
use ruff_python_parser::{TokenKind, Tokens};
use ruff_source_file::LineRanges;
use ruff_text_size::{Ranged, TextLen, TextRange, TextSize};

View File

@@ -9,8 +9,8 @@ use std::cmp::Ordering;
use itertools::Itertools;
use ruff_python_ast as ast;
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_codegen::Stylist;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_stdlib::str::is_cased_uppercase;
use ruff_python_trivia::{SimpleTokenKind, first_non_trivia_token, leading_indentation};
use ruff_source_file::LineRanges;

View File

@@ -1,7 +1,7 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::PythonVersion;
use ruff_python_ast::token::TokenKind;
use ruff_python_ast::{Expr, ExprCall, parenthesize::parenthesized_range};
use ruff_python_parser::TokenKind;
use ruff_text_size::{Ranged, TextRange};
use crate::checkers::ast::Checker;

View File

@@ -17,4 +17,6 @@ PLE1142 `await` should be used within an async function
4 | # Top-level await
5 | await 1
| ^^^^^^^
6 |
7 | ([await c for c in cor] async for cor in func()) # ok
|

View File

@@ -0,0 +1,55 @@
---
source: crates/ruff_linter/src/linter.rs
---
B901 Using `yield` and `return {value}` in a generator function can lead to confusing behavior
--> resources/test/fixtures/syntax_errors/return_in_generator.py:3:5
|
1 | async def gen():
2 | yield 1
3 | return 42
| ^^^^^^^^^
4 |
5 | def gen(): # B901 but not a syntax error - not an async generator
|
B901 Using `yield` and `return {value}` in a generator function can lead to confusing behavior
--> resources/test/fixtures/syntax_errors/return_in_generator.py:7:5
|
5 | def gen(): # B901 but not a syntax error - not an async generator
6 | yield 1
7 | return 42
| ^^^^^^^^^
8 |
9 | async def gen(): # ok - no value in return
|
B901 Using `yield` and `return {value}` in a generator function can lead to confusing behavior
--> resources/test/fixtures/syntax_errors/return_in_generator.py:15:5
|
13 | async def gen():
14 | yield 1
15 | return foo()
| ^^^^^^^^^^^^
16 |
17 | async def gen():
|
B901 Using `yield` and `return {value}` in a generator function can lead to confusing behavior
--> resources/test/fixtures/syntax_errors/return_in_generator.py:19:5
|
17 | async def gen():
18 | yield 1
19 | return [1, 2, 3]
| ^^^^^^^^^^^^^^^^
20 |
21 | async def gen():
|
B901 Using `yield` and `return {value}` in a generator function can lead to confusing behavior
--> resources/test/fixtures/syntax_errors/return_in_generator.py:24:5
|
22 | if True:
23 | yield 1
24 | return 10
| ^^^^^^^^^
|

File diff suppressed because it is too large

View File

@@ -1,17 +1,46 @@
use std::sync::{LazyLock, Mutex};
use std::cell::RefCell;
use get_size2::{GetSize, StandardTracker};
use ordermap::{OrderMap, OrderSet};
thread_local! {
pub static TRACKER: RefCell<Option<StandardTracker>> = const { RefCell::new(None) };
}
struct TrackerGuard(Option<StandardTracker>);
impl Drop for TrackerGuard {
fn drop(&mut self) {
TRACKER.set(self.0.take());
}
}
pub fn attach_tracker<R>(tracker: StandardTracker, f: impl FnOnce() -> R) -> R {
let prev = TRACKER.replace(Some(tracker));
let _guard = TrackerGuard(prev);
f()
}
fn with_tracker<F, R>(f: F) -> R
where
F: FnOnce(Option<&mut StandardTracker>) -> R,
{
TRACKER.with(|tracker| {
let mut tracker = tracker.borrow_mut();
f(tracker.as_mut())
})
}
/// Returns the heap memory usage of the provided object, using the thread-local tracker
/// (when one is attached) to avoid double-counting shared objects.
pub fn heap_size<T: GetSize>(value: &T) -> usize {
static TRACKER: LazyLock<Mutex<StandardTracker>> =
LazyLock::new(|| Mutex::new(StandardTracker::new()));
value
.get_heap_size_with_tracker(&mut *TRACKER.lock().unwrap())
.0
with_tracker(|tracker| {
if let Some(tracker) = tracker {
value.get_heap_size_with_tracker(tracker).0
} else {
value.get_heap_size()
}
})
}
/// An implementation of [`GetSize::get_heap_size`] for [`OrderSet`].

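The rewrite above replaces the global, always-locked tracker with an opt-in, thread-local one: `attach_tracker` installs a `StandardTracker` for the duration of a closure (restoring the previous tracker on exit via the guard), and `heap_size` uses it when present, falling back to an untracked measurement otherwise. A hypothetical usage sketch, assuming `attach_tracker` and `heap_size` are exported from the `ruff_memory_usage` crate root as shown, and that `get_size2` implements `GetSize` for `Arc<String>`:

use std::sync::Arc;

use get_size2::StandardTracker;
use ruff_memory_usage::{attach_tracker, heap_size};

fn main() {
    let shared = Arc::new(String::from("a reasonably long shared buffer"));
    let values = vec![Arc::clone(&shared), Arc::clone(&shared)];

    // Without a tracker, each clone's heap contents are measured independently.
    let untracked: usize = values.iter().map(|value| heap_size(value)).sum();

    // With a tracker attached, the shared allocation is only counted once.
    let tracked: usize = attach_tracker(StandardTracker::new(), || {
        values.iter().map(|value| heap_size(value)).sum()
    });

    assert!(tracked <= untracked);
    println!("untracked = {untracked}, tracked = {tracked}");
}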
View File

@@ -29,6 +29,7 @@ pub mod statement_visitor;
pub mod stmt_if;
pub mod str;
pub mod str_prefix;
pub mod token;
pub mod traversal;
pub mod types;
pub mod visitor;

View File

@@ -11,6 +11,8 @@ use crate::ExprRef;
/// Note that without a parent the range can be inaccurate, e.g. for `f(a)` we falsely return a set of
/// parentheses around `a` even if the parentheses actually belong to `f`. That is why you should
/// generally prefer [`parenthesized_range`].
///
/// Prefer [`crate::token::parentheses_iterator`] if you have access to [`crate::token::Tokens`].
pub fn parentheses_iterator<'a>(
expr: ExprRef<'a>,
parent: Option<AnyNodeRef>,
@@ -57,6 +59,8 @@ pub fn parentheses_iterator<'a>(
/// Returns the [`TextRange`] of a given expression including parentheses, if the expression is
/// parenthesized; or `None`, if the expression is not parenthesized.
///
/// Prefer [`crate::token::parenthesized_range`] if you have access to [`crate::token::Tokens`].
pub fn parenthesized_range(
expr: ExprRef,
parent: AnyNodeRef,

View File

@@ -12,11 +12,11 @@ static FINDER: LazyLock<Finder> = LazyLock::new(|| Finder::new(b"# /// script"))
#[derive(Debug, Clone, Eq, PartialEq)]
pub struct ScriptTag {
/// The content of the script before the metadata block.
prelude: String,
pub prelude: String,
/// The metadata block.
metadata: String,
pub metadata: String,
/// The content of the script after the metadata block.
postlude: String,
pub postlude: String,
}
impl ScriptTag {

View File

@@ -0,0 +1,853 @@
//! Token kinds for Python source code created by the lexer and consumed by the `ruff_python_parser`.
//!
//! This module defines the tokens that the lexer recognizes. The tokens are
//! loosely based on the token definitions found in the [CPython source].
//!
//! [CPython source]: https://github.com/python/cpython/blob/dfc2e065a2e71011017077e549cd2f9bf4944c54/Grammar/Tokens
use std::fmt;
use bitflags::bitflags;
use crate::str::{Quote, TripleQuotes};
use crate::str_prefix::{
AnyStringPrefix, ByteStringPrefix, FStringPrefix, StringLiteralPrefix, TStringPrefix,
};
use crate::{AnyStringFlags, BoolOp, Operator, StringFlags, UnaryOp};
use ruff_text_size::{Ranged, TextRange};
mod parentheses;
mod tokens;
pub use parentheses::{parentheses_iterator, parenthesized_range};
pub use tokens::{TokenAt, TokenIterWithContext, Tokens};
#[derive(Clone, Copy, PartialEq, Eq)]
#[cfg_attr(feature = "get-size", derive(get_size2::GetSize))]
pub struct Token {
/// The kind of the token.
kind: TokenKind,
/// The range of the token.
range: TextRange,
/// The set of flags describing this token.
flags: TokenFlags,
}
impl Token {
pub fn new(kind: TokenKind, range: TextRange, flags: TokenFlags) -> Token {
Self { kind, range, flags }
}
/// Returns the token kind.
#[inline]
pub const fn kind(&self) -> TokenKind {
self.kind
}
/// Returns the token as a tuple of (kind, range).
#[inline]
pub const fn as_tuple(&self) -> (TokenKind, TextRange) {
(self.kind, self.range)
}
/// Returns `true` if the current token is a triple-quoted string of any kind.
///
/// # Panics
///
/// If the token isn't a string or any kind of f/t-string token.
pub fn is_triple_quoted_string(self) -> bool {
self.unwrap_string_flags().is_triple_quoted()
}
/// Returns the [`Quote`] style for the current string token of any kind.
///
/// # Panics
///
/// If the token isn't a string or any kind of f/t-string token.
pub fn string_quote_style(self) -> Quote {
self.unwrap_string_flags().quote_style()
}
/// Returns the [`AnyStringFlags`] style for the current string token of any kind.
///
/// # Panics
///
/// If the token isn't a string or any kind of f/t-string token.
pub fn unwrap_string_flags(self) -> AnyStringFlags {
self.string_flags()
.unwrap_or_else(|| panic!("token to be a string"))
}
/// Returns the [`AnyStringFlags`] for the current token if it is a string token of any kind, or `None` otherwise.
pub fn string_flags(self) -> Option<AnyStringFlags> {
if self.is_any_string() {
Some(self.flags.as_any_string_flags())
} else {
None
}
}
/// Returns `true` if this is any kind of string token - including
/// tokens in t-strings (which do not have type `str`).
const fn is_any_string(self) -> bool {
matches!(
self.kind,
TokenKind::String
| TokenKind::FStringStart
| TokenKind::FStringMiddle
| TokenKind::FStringEnd
| TokenKind::TStringStart
| TokenKind::TStringMiddle
| TokenKind::TStringEnd
)
}
}
impl Ranged for Token {
fn range(&self) -> TextRange {
self.range
}
}
impl fmt::Debug for Token {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "{:?} {:?}", self.kind, self.range)?;
if !self.flags.is_empty() {
f.write_str(" (flags = ")?;
let mut first = true;
for (name, _) in self.flags.iter_names() {
if first {
first = false;
} else {
f.write_str(" | ")?;
}
f.write_str(name)?;
}
f.write_str(")")?;
}
Ok(())
}
}
/// A kind of a token.
#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug, PartialOrd, Ord)]
#[cfg_attr(feature = "get-size", derive(get_size2::GetSize))]
pub enum TokenKind {
/// Token kind for a name, commonly known as an identifier.
Name,
/// Token kind for an integer.
Int,
/// Token kind for a floating point number.
Float,
/// Token kind for a complex number.
Complex,
/// Token kind for a string.
String,
/// Token kind for the start of an f-string. This includes the `f`/`F`/`fr` prefix
/// and the opening quote(s).
FStringStart,
/// Token kind that includes the portion of text inside the f-string that's not
/// part of the expression part and isn't an opening or closing brace.
FStringMiddle,
/// Token kind for the end of an f-string. This includes the closing quote.
FStringEnd,
/// Token kind for the start of a t-string. This includes the `t`/`T`/`tr` prefix
/// and the opening quote(s).
TStringStart,
/// Token kind that includes the portion of text inside the t-string that's not
/// part of the interpolation part and isn't an opening or closing brace.
TStringMiddle,
/// Token kind for the end of a t-string. This includes the closing quote.
TStringEnd,
/// Token kind for an IPython escape command.
IpyEscapeCommand,
/// Token kind for a comment. These are filtered out of the token stream prior to parsing.
Comment,
/// Token kind for a newline.
Newline,
/// Token kind for a newline that is not a logical line break. These are filtered out of
/// the token stream prior to parsing.
NonLogicalNewline,
/// Token kind for an indent.
Indent,
/// Token kind for a dedent.
Dedent,
EndOfFile,
/// Token kind for a question mark `?`.
Question,
/// Token kind for an exclamation mark `!`.
Exclamation,
/// Token kind for a left parenthesis `(`.
Lpar,
/// Token kind for a right parenthesis `)`.
Rpar,
/// Token kind for a left square bracket `[`.
Lsqb,
/// Token kind for a right square bracket `]`.
Rsqb,
/// Token kind for a colon `:`.
Colon,
/// Token kind for a comma `,`.
Comma,
/// Token kind for a semicolon `;`.
Semi,
/// Token kind for plus `+`.
Plus,
/// Token kind for minus `-`.
Minus,
/// Token kind for star `*`.
Star,
/// Token kind for slash `/`.
Slash,
/// Token kind for vertical bar `|`.
Vbar,
/// Token kind for ampersand `&`.
Amper,
/// Token kind for less than `<`.
Less,
/// Token kind for greater than `>`.
Greater,
/// Token kind for equal `=`.
Equal,
/// Token kind for dot `.`.
Dot,
/// Token kind for percent `%`.
Percent,
/// Token kind for left brace `{`.
Lbrace,
/// Token kind for right brace `}`.
Rbrace,
/// Token kind for double equal `==`.
EqEqual,
/// Token kind for not equal `!=`.
NotEqual,
/// Token kind for less than or equal `<=`.
LessEqual,
/// Token kind for greater than or equal `>=`.
GreaterEqual,
/// Token kind for tilde `~`.
Tilde,
/// Token kind for caret `^`.
CircumFlex,
/// Token kind for left shift `<<`.
LeftShift,
/// Token kind for right shift `>>`.
RightShift,
/// Token kind for double star `**`.
DoubleStar,
/// Token kind for double star equal `**=`.
DoubleStarEqual,
/// Token kind for plus equal `+=`.
PlusEqual,
/// Token kind for minus equal `-=`.
MinusEqual,
/// Token kind for star equal `*=`.
StarEqual,
/// Token kind for slash equal `/=`.
SlashEqual,
/// Token kind for percent equal `%=`.
PercentEqual,
/// Token kind for ampersand equal `&=`.
AmperEqual,
/// Token kind for vertical bar equal `|=`.
VbarEqual,
/// Token kind for caret equal `^=`.
CircumflexEqual,
/// Token kind for left shift equal `<<=`.
LeftShiftEqual,
/// Token kind for right shift equal `>>=`.
RightShiftEqual,
/// Token kind for double slash `//`.
DoubleSlash,
/// Token kind for double slash equal `//=`.
DoubleSlashEqual,
/// Token kind for colon equal `:=`.
ColonEqual,
/// Token kind for at `@`.
At,
/// Token kind for at equal `@=`.
AtEqual,
/// Token kind for arrow `->`.
Rarrow,
/// Token kind for ellipsis `...`.
Ellipsis,
// The keywords should be sorted in alphabetical order. If the boundary tokens for the
// "Keywords" and "Soft keywords" group change, update the related methods on `TokenKind`.
// Keywords
And,
As,
Assert,
Async,
Await,
Break,
Class,
Continue,
Def,
Del,
Elif,
Else,
Except,
False,
Finally,
For,
From,
Global,
If,
Import,
In,
Is,
Lambda,
None,
Nonlocal,
Not,
Or,
Pass,
Raise,
Return,
True,
Try,
While,
With,
Yield,
// Soft keywords
Case,
Match,
Type,
Unknown,
}
impl TokenKind {
/// Returns `true` if this is an end of file token.
#[inline]
pub const fn is_eof(self) -> bool {
matches!(self, TokenKind::EndOfFile)
}
/// Returns `true` if this is either a newline or non-logical newline token.
#[inline]
pub const fn is_any_newline(self) -> bool {
matches!(self, TokenKind::Newline | TokenKind::NonLogicalNewline)
}
/// Returns `true` if the token is a keyword (including soft keywords).
///
/// See also [`is_soft_keyword`], [`is_non_soft_keyword`].
///
/// [`is_soft_keyword`]: TokenKind::is_soft_keyword
/// [`is_non_soft_keyword`]: TokenKind::is_non_soft_keyword
#[inline]
pub fn is_keyword(self) -> bool {
TokenKind::And <= self && self <= TokenKind::Type
}
/// Returns `true` if the token is strictly a soft keyword.
///
/// See also [`is_keyword`], [`is_non_soft_keyword`].
///
/// [`is_keyword`]: TokenKind::is_keyword
/// [`is_non_soft_keyword`]: TokenKind::is_non_soft_keyword
#[inline]
pub fn is_soft_keyword(self) -> bool {
TokenKind::Case <= self && self <= TokenKind::Type
}
/// Returns `true` if the token is strictly a non-soft keyword.
///
/// See also [`is_keyword`], [`is_soft_keyword`].
///
/// [`is_keyword`]: TokenKind::is_keyword
/// [`is_soft_keyword`]: TokenKind::is_soft_keyword
#[inline]
pub fn is_non_soft_keyword(self) -> bool {
TokenKind::And <= self && self <= TokenKind::Yield
}
#[inline]
pub const fn is_operator(self) -> bool {
matches!(
self,
TokenKind::Lpar
| TokenKind::Rpar
| TokenKind::Lsqb
| TokenKind::Rsqb
| TokenKind::Comma
| TokenKind::Semi
| TokenKind::Plus
| TokenKind::Minus
| TokenKind::Star
| TokenKind::Slash
| TokenKind::Vbar
| TokenKind::Amper
| TokenKind::Less
| TokenKind::Greater
| TokenKind::Equal
| TokenKind::Dot
| TokenKind::Percent
| TokenKind::Lbrace
| TokenKind::Rbrace
| TokenKind::EqEqual
| TokenKind::NotEqual
| TokenKind::LessEqual
| TokenKind::GreaterEqual
| TokenKind::Tilde
| TokenKind::CircumFlex
| TokenKind::LeftShift
| TokenKind::RightShift
| TokenKind::DoubleStar
| TokenKind::PlusEqual
| TokenKind::MinusEqual
| TokenKind::StarEqual
| TokenKind::SlashEqual
| TokenKind::PercentEqual
| TokenKind::AmperEqual
| TokenKind::VbarEqual
| TokenKind::CircumflexEqual
| TokenKind::LeftShiftEqual
| TokenKind::RightShiftEqual
| TokenKind::DoubleStarEqual
| TokenKind::DoubleSlash
| TokenKind::DoubleSlashEqual
| TokenKind::At
| TokenKind::AtEqual
| TokenKind::Rarrow
| TokenKind::Ellipsis
| TokenKind::ColonEqual
| TokenKind::Colon
| TokenKind::And
| TokenKind::Or
| TokenKind::Not
| TokenKind::In
| TokenKind::Is
)
}
/// Returns `true` if this is a singleton token i.e., `True`, `False`, or `None`.
#[inline]
pub const fn is_singleton(self) -> bool {
matches!(self, TokenKind::False | TokenKind::True | TokenKind::None)
}
/// Returns `true` if this is a trivia token i.e., a comment or a non-logical newline.
#[inline]
pub const fn is_trivia(&self) -> bool {
matches!(self, TokenKind::Comment | TokenKind::NonLogicalNewline)
}
/// Returns `true` if this is a comment token.
#[inline]
pub const fn is_comment(&self) -> bool {
matches!(self, TokenKind::Comment)
}
#[inline]
pub const fn is_arithmetic(self) -> bool {
matches!(
self,
TokenKind::DoubleStar
| TokenKind::Star
| TokenKind::Plus
| TokenKind::Minus
| TokenKind::Slash
| TokenKind::DoubleSlash
| TokenKind::At
)
}
#[inline]
pub const fn is_bitwise_or_shift(self) -> bool {
matches!(
self,
TokenKind::LeftShift
| TokenKind::LeftShiftEqual
| TokenKind::RightShift
| TokenKind::RightShiftEqual
| TokenKind::Amper
| TokenKind::AmperEqual
| TokenKind::Vbar
| TokenKind::VbarEqual
| TokenKind::CircumFlex
| TokenKind::CircumflexEqual
| TokenKind::Tilde
)
}
/// Returns `true` if the current token is a unary arithmetic operator.
#[inline]
pub const fn is_unary_arithmetic_operator(self) -> bool {
matches!(self, TokenKind::Plus | TokenKind::Minus)
}
#[inline]
pub const fn is_interpolated_string_end(self) -> bool {
matches!(self, TokenKind::FStringEnd | TokenKind::TStringEnd)
}
/// Returns the [`UnaryOp`] that corresponds to this token kind, if it is a unary arithmetic
/// operator, otherwise return [None].
///
/// Use [`as_unary_operator`] to match against any unary operator.
///
/// [`as_unary_operator`]: TokenKind::as_unary_operator
#[inline]
pub const fn as_unary_arithmetic_operator(self) -> Option<UnaryOp> {
Some(match self {
TokenKind::Plus => UnaryOp::UAdd,
TokenKind::Minus => UnaryOp::USub,
_ => return None,
})
}
/// Returns the [`UnaryOp`] that corresponds to this token kind, if it is a unary operator,
/// otherwise return [None].
///
/// Use [`as_unary_arithmetic_operator`] to match against only an arithmetic unary operator.
///
/// [`as_unary_arithmetic_operator`]: TokenKind::as_unary_arithmetic_operator
#[inline]
pub const fn as_unary_operator(self) -> Option<UnaryOp> {
Some(match self {
TokenKind::Plus => UnaryOp::UAdd,
TokenKind::Minus => UnaryOp::USub,
TokenKind::Tilde => UnaryOp::Invert,
TokenKind::Not => UnaryOp::Not,
_ => return None,
})
}
/// Returns the [`BoolOp`] that corresponds to this token kind, if it is a boolean operator,
/// otherwise return [None].
#[inline]
pub const fn as_bool_operator(self) -> Option<BoolOp> {
Some(match self {
TokenKind::And => BoolOp::And,
TokenKind::Or => BoolOp::Or,
_ => return None,
})
}
/// Returns the binary [`Operator`] that corresponds to the current token, if it's a binary
/// operator, otherwise return [None].
///
/// Use [`as_augmented_assign_operator`] to match against an augmented assignment token.
///
/// [`as_augmented_assign_operator`]: TokenKind::as_augmented_assign_operator
pub const fn as_binary_operator(self) -> Option<Operator> {
Some(match self {
TokenKind::Plus => Operator::Add,
TokenKind::Minus => Operator::Sub,
TokenKind::Star => Operator::Mult,
TokenKind::At => Operator::MatMult,
TokenKind::DoubleStar => Operator::Pow,
TokenKind::Slash => Operator::Div,
TokenKind::DoubleSlash => Operator::FloorDiv,
TokenKind::Percent => Operator::Mod,
TokenKind::Amper => Operator::BitAnd,
TokenKind::Vbar => Operator::BitOr,
TokenKind::CircumFlex => Operator::BitXor,
TokenKind::LeftShift => Operator::LShift,
TokenKind::RightShift => Operator::RShift,
_ => return None,
})
}
/// Returns the [`Operator`] that corresponds to this token kind, if it is
/// an augmented assignment operator, or [`None`] otherwise.
#[inline]
pub const fn as_augmented_assign_operator(self) -> Option<Operator> {
Some(match self {
TokenKind::PlusEqual => Operator::Add,
TokenKind::MinusEqual => Operator::Sub,
TokenKind::StarEqual => Operator::Mult,
TokenKind::AtEqual => Operator::MatMult,
TokenKind::DoubleStarEqual => Operator::Pow,
TokenKind::SlashEqual => Operator::Div,
TokenKind::DoubleSlashEqual => Operator::FloorDiv,
TokenKind::PercentEqual => Operator::Mod,
TokenKind::AmperEqual => Operator::BitAnd,
TokenKind::VbarEqual => Operator::BitOr,
TokenKind::CircumflexEqual => Operator::BitXor,
TokenKind::LeftShiftEqual => Operator::LShift,
TokenKind::RightShiftEqual => Operator::RShift,
_ => return None,
})
}
}
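// Illustrative note (not part of the original file): the keyword predicates above
// rely on the enum's declaration order through the derived `PartialOrd`/`Ord`,
// with `And..=Yield` covering the hard keywords and `Case..=Type` the soft keywords.
// For example:
//
//     assert!(TokenKind::Match.is_keyword());
//     assert!(TokenKind::Match.is_soft_keyword());
//     assert!(!TokenKind::Match.is_non_soft_keyword());
//     assert!(TokenKind::Def.is_non_soft_keyword());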
impl From<BoolOp> for TokenKind {
#[inline]
fn from(op: BoolOp) -> Self {
match op {
BoolOp::And => TokenKind::And,
BoolOp::Or => TokenKind::Or,
}
}
}
impl From<UnaryOp> for TokenKind {
#[inline]
fn from(op: UnaryOp) -> Self {
match op {
UnaryOp::Invert => TokenKind::Tilde,
UnaryOp::Not => TokenKind::Not,
UnaryOp::UAdd => TokenKind::Plus,
UnaryOp::USub => TokenKind::Minus,
}
}
}
impl From<Operator> for TokenKind {
#[inline]
fn from(op: Operator) -> Self {
match op {
Operator::Add => TokenKind::Plus,
Operator::Sub => TokenKind::Minus,
Operator::Mult => TokenKind::Star,
Operator::MatMult => TokenKind::At,
Operator::Div => TokenKind::Slash,
Operator::Mod => TokenKind::Percent,
Operator::Pow => TokenKind::DoubleStar,
Operator::LShift => TokenKind::LeftShift,
Operator::RShift => TokenKind::RightShift,
Operator::BitOr => TokenKind::Vbar,
Operator::BitXor => TokenKind::CircumFlex,
Operator::BitAnd => TokenKind::Amper,
Operator::FloorDiv => TokenKind::DoubleSlash,
}
}
}
impl fmt::Display for TokenKind {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
let value = match self {
TokenKind::Unknown => "Unknown",
TokenKind::Newline => "newline",
TokenKind::NonLogicalNewline => "NonLogicalNewline",
TokenKind::Indent => "indent",
TokenKind::Dedent => "dedent",
TokenKind::EndOfFile => "end of file",
TokenKind::Name => "name",
TokenKind::Int => "int",
TokenKind::Float => "float",
TokenKind::Complex => "complex",
TokenKind::String => "string",
TokenKind::FStringStart => "FStringStart",
TokenKind::FStringMiddle => "FStringMiddle",
TokenKind::FStringEnd => "FStringEnd",
TokenKind::TStringStart => "TStringStart",
TokenKind::TStringMiddle => "TStringMiddle",
TokenKind::TStringEnd => "TStringEnd",
TokenKind::IpyEscapeCommand => "IPython escape command",
TokenKind::Comment => "comment",
TokenKind::Question => "`?`",
TokenKind::Exclamation => "`!`",
TokenKind::Lpar => "`(`",
TokenKind::Rpar => "`)`",
TokenKind::Lsqb => "`[`",
TokenKind::Rsqb => "`]`",
TokenKind::Lbrace => "`{`",
TokenKind::Rbrace => "`}`",
TokenKind::Equal => "`=`",
TokenKind::ColonEqual => "`:=`",
TokenKind::Dot => "`.`",
TokenKind::Colon => "`:`",
TokenKind::Semi => "`;`",
TokenKind::Comma => "`,`",
TokenKind::Rarrow => "`->`",
TokenKind::Plus => "`+`",
TokenKind::Minus => "`-`",
TokenKind::Star => "`*`",
TokenKind::DoubleStar => "`**`",
TokenKind::Slash => "`/`",
TokenKind::DoubleSlash => "`//`",
TokenKind::Percent => "`%`",
TokenKind::Vbar => "`|`",
TokenKind::Amper => "`&`",
TokenKind::CircumFlex => "`^`",
TokenKind::LeftShift => "`<<`",
TokenKind::RightShift => "`>>`",
TokenKind::Tilde => "`~`",
TokenKind::At => "`@`",
TokenKind::Less => "`<`",
TokenKind::Greater => "`>`",
TokenKind::EqEqual => "`==`",
TokenKind::NotEqual => "`!=`",
TokenKind::LessEqual => "`<=`",
TokenKind::GreaterEqual => "`>=`",
TokenKind::PlusEqual => "`+=`",
TokenKind::MinusEqual => "`-=`",
TokenKind::StarEqual => "`*=`",
TokenKind::DoubleStarEqual => "`**=`",
TokenKind::SlashEqual => "`/=`",
TokenKind::DoubleSlashEqual => "`//=`",
TokenKind::PercentEqual => "`%=`",
TokenKind::VbarEqual => "`|=`",
TokenKind::AmperEqual => "`&=`",
TokenKind::CircumflexEqual => "`^=`",
TokenKind::LeftShiftEqual => "`<<=`",
TokenKind::RightShiftEqual => "`>>=`",
TokenKind::AtEqual => "`@=`",
TokenKind::Ellipsis => "`...`",
TokenKind::False => "`False`",
TokenKind::None => "`None`",
TokenKind::True => "`True`",
TokenKind::And => "`and`",
TokenKind::As => "`as`",
TokenKind::Assert => "`assert`",
TokenKind::Async => "`async`",
TokenKind::Await => "`await`",
TokenKind::Break => "`break`",
TokenKind::Class => "`class`",
TokenKind::Continue => "`continue`",
TokenKind::Def => "`def`",
TokenKind::Del => "`del`",
TokenKind::Elif => "`elif`",
TokenKind::Else => "`else`",
TokenKind::Except => "`except`",
TokenKind::Finally => "`finally`",
TokenKind::For => "`for`",
TokenKind::From => "`from`",
TokenKind::Global => "`global`",
TokenKind::If => "`if`",
TokenKind::Import => "`import`",
TokenKind::In => "`in`",
TokenKind::Is => "`is`",
TokenKind::Lambda => "`lambda`",
TokenKind::Nonlocal => "`nonlocal`",
TokenKind::Not => "`not`",
TokenKind::Or => "`or`",
TokenKind::Pass => "`pass`",
TokenKind::Raise => "`raise`",
TokenKind::Return => "`return`",
TokenKind::Try => "`try`",
TokenKind::While => "`while`",
TokenKind::Match => "`match`",
TokenKind::Type => "`type`",
TokenKind::Case => "`case`",
TokenKind::With => "`with`",
TokenKind::Yield => "`yield`",
};
f.write_str(value)
}
}
bitflags! {
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TokenFlags: u16 {
/// The token is a string with double quotes (`"`).
const DOUBLE_QUOTES = 1 << 0;
/// The token is a triple-quoted string i.e., it starts and ends with three consecutive
/// quote characters (`"""` or `'''`).
const TRIPLE_QUOTED_STRING = 1 << 1;
/// The token is a unicode string i.e., prefixed with `u` or `U`
const UNICODE_STRING = 1 << 2;
/// The token is a byte string i.e., prefixed with `b` or `B`
const BYTE_STRING = 1 << 3;
/// The token is an f-string i.e., prefixed with `f` or `F`
const F_STRING = 1 << 4;
/// The token is a t-string i.e., prefixed with `t` or `T`
const T_STRING = 1 << 5;
/// The token is a raw string and the prefix character is in lowercase.
const RAW_STRING_LOWERCASE = 1 << 6;
/// The token is a raw string and the prefix character is in uppercase.
const RAW_STRING_UPPERCASE = 1 << 7;
/// String without matching closing quote(s)
const UNCLOSED_STRING = 1 << 8;
/// The token is a raw string i.e., prefixed with `r` or `R`
const RAW_STRING = Self::RAW_STRING_LOWERCASE.bits() | Self::RAW_STRING_UPPERCASE.bits();
}
}
#[cfg(feature = "get-size")]
impl get_size2::GetSize for TokenFlags {}
impl StringFlags for TokenFlags {
fn quote_style(self) -> Quote {
if self.intersects(TokenFlags::DOUBLE_QUOTES) {
Quote::Double
} else {
Quote::Single
}
}
fn triple_quotes(self) -> TripleQuotes {
if self.intersects(TokenFlags::TRIPLE_QUOTED_STRING) {
TripleQuotes::Yes
} else {
TripleQuotes::No
}
}
fn prefix(self) -> AnyStringPrefix {
if self.intersects(TokenFlags::F_STRING) {
if self.intersects(TokenFlags::RAW_STRING_LOWERCASE) {
AnyStringPrefix::Format(FStringPrefix::Raw { uppercase_r: false })
} else if self.intersects(TokenFlags::RAW_STRING_UPPERCASE) {
AnyStringPrefix::Format(FStringPrefix::Raw { uppercase_r: true })
} else {
AnyStringPrefix::Format(FStringPrefix::Regular)
}
} else if self.intersects(TokenFlags::T_STRING) {
if self.intersects(TokenFlags::RAW_STRING_LOWERCASE) {
AnyStringPrefix::Template(TStringPrefix::Raw { uppercase_r: false })
} else if self.intersects(TokenFlags::RAW_STRING_UPPERCASE) {
AnyStringPrefix::Template(TStringPrefix::Raw { uppercase_r: true })
} else {
AnyStringPrefix::Template(TStringPrefix::Regular)
}
} else if self.intersects(TokenFlags::BYTE_STRING) {
if self.intersects(TokenFlags::RAW_STRING_LOWERCASE) {
AnyStringPrefix::Bytes(ByteStringPrefix::Raw { uppercase_r: false })
} else if self.intersects(TokenFlags::RAW_STRING_UPPERCASE) {
AnyStringPrefix::Bytes(ByteStringPrefix::Raw { uppercase_r: true })
} else {
AnyStringPrefix::Bytes(ByteStringPrefix::Regular)
}
} else if self.intersects(TokenFlags::RAW_STRING_LOWERCASE) {
AnyStringPrefix::Regular(StringLiteralPrefix::Raw { uppercase: false })
} else if self.intersects(TokenFlags::RAW_STRING_UPPERCASE) {
AnyStringPrefix::Regular(StringLiteralPrefix::Raw { uppercase: true })
} else if self.intersects(TokenFlags::UNICODE_STRING) {
AnyStringPrefix::Regular(StringLiteralPrefix::Unicode)
} else {
AnyStringPrefix::Regular(StringLiteralPrefix::Empty)
}
}
fn is_unclosed(self) -> bool {
self.intersects(TokenFlags::UNCLOSED_STRING)
}
}
impl TokenFlags {
/// Returns `true` if the token is an f-string.
pub const fn is_f_string(self) -> bool {
self.intersects(TokenFlags::F_STRING)
}
/// Returns `true` if the token is a t-string.
pub const fn is_t_string(self) -> bool {
self.intersects(TokenFlags::T_STRING)
}
/// Returns `true` if the token is an f-string or a t-string.
pub const fn is_interpolated_string(self) -> bool {
self.intersects(TokenFlags::T_STRING.union(TokenFlags::F_STRING))
}
/// Returns `true` if the token is a triple-quoted f-string or t-string.
pub fn is_triple_quoted_interpolated_string(self) -> bool {
self.intersects(TokenFlags::TRIPLE_QUOTED_STRING) && self.is_interpolated_string()
}
/// Returns `true` if the token is a raw string.
pub const fn is_raw_string(self) -> bool {
self.intersects(TokenFlags::RAW_STRING)
}
}
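// Illustrative sketch (editorial addition, not part of this diff): flags compose with
// bitwise-or, and the accessors above decode them back into higher-level properties.
// A token lexed from `rf"..."` would carry roughly these flags:
//
//     let flags = TokenFlags::F_STRING | TokenFlags::RAW_STRING_LOWERCASE | TokenFlags::DOUBLE_QUOTES;
//     assert!(flags.is_f_string() && flags.is_raw_string());
//     assert!(matches!(flags.prefix(), AnyStringPrefix::Format(FStringPrefix::Raw { uppercase_r: false })));
//     assert!(matches!(flags.quote_style(), Quote::Double));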

View File

@@ -0,0 +1,58 @@
use ruff_text_size::{Ranged, TextLen, TextRange};
use super::{TokenKind, Tokens};
use crate::{AnyNodeRef, ExprRef};
/// Returns an iterator over the ranges of the optional parentheses surrounding an expression.
///
/// E.g. for `((f()))` with `f()` as expression, the iterator returns the ranges (1, 6) and (0, 7).
///
/// Note that without a parent the range can be inaccurate: e.g. for `f(a)` we would falsely report a
/// set of parentheses around `a` even though those parentheses actually belong to the call `f`. That is
/// why you should generally prefer [`parenthesized_range`].
pub fn parentheses_iterator<'a>(
expr: ExprRef<'a>,
parent: Option<AnyNodeRef>,
tokens: &'a Tokens,
) -> impl Iterator<Item = TextRange> + 'a {
let after_tokens = if let Some(parent) = parent {
// If the parent is a node that brings its own parentheses, exclude the closing parenthesis
// from our search range. Otherwise, we risk matching on calls, like `func(x)`, for which
// the open and close parentheses are part of the `Arguments` node.
let exclusive_parent_end = if parent.is_arguments() {
parent.end() - ")".text_len()
} else {
parent.end()
};
tokens.in_range(TextRange::new(expr.end(), exclusive_parent_end))
} else {
tokens.after(expr.end())
};
let right_parens = after_tokens
.iter()
.filter(|token| !token.kind().is_trivia())
.take_while(move |token| token.kind() == TokenKind::Rpar);
let left_parens = tokens
.before(expr.start())
.iter()
.rev()
.filter(|token| !token.kind().is_trivia())
.take_while(|token| token.kind() == TokenKind::Lpar);
right_parens
.zip(left_parens)
.map(|(right, left)| TextRange::new(left.start(), right.end()))
}
/// Returns the [`TextRange`] of a given expression including parentheses, if the expression is
/// parenthesized; or `None`, if the expression is not parenthesized.
pub fn parenthesized_range(
expr: ExprRef,
parent: AnyNodeRef,
tokens: &Tokens,
) -> Option<TextRange> {
parentheses_iterator(expr, Some(parent), tokens).last()
}
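// Illustrative note (editorial, not part of this diff): for `x = ((2 + 2))`, calling
// `parenthesized_range` on the assignment's value with the `Assign` statement as parent
// returns the range covering `((2 + 2))`: the iterator yields ranges from innermost to
// outermost, so `last()` picks the widest matching pair.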

View File

@@ -0,0 +1,520 @@
use std::{iter::FusedIterator, ops::Deref};
use super::{Token, TokenKind};
use ruff_python_trivia::CommentRanges;
use ruff_text_size::{Ranged as _, TextRange, TextSize};
/// Tokens represents a vector of lexed [`Token`].
#[derive(Debug, Clone, PartialEq, Eq)]
#[cfg_attr(feature = "get-size", derive(get_size2::GetSize))]
pub struct Tokens {
raw: Vec<Token>,
}
impl Tokens {
pub fn new(tokens: Vec<Token>) -> Tokens {
Tokens { raw: tokens }
}
/// Returns an iterator over all the tokens that provides context.
pub fn iter_with_context(&self) -> TokenIterWithContext<'_> {
TokenIterWithContext::new(&self.raw)
}
/// Performs a binary search to find the index of the **first** token that starts at the given `offset`.
///
/// Unlike `binary_search_by_key`, this method ensures that if multiple tokens start at the same offset,
/// it returns the index of the first one. Multiple tokens can start at the same offset in cases where
/// zero-length tokens are involved (like `Dedent` or `Newline` at the end of the file).
pub fn binary_search_by_start(&self, offset: TextSize) -> Result<usize, usize> {
let partition_point = self.partition_point(|token| token.start() < offset);
let after = &self[partition_point..];
if after.first().is_some_and(|first| first.start() == offset) {
Ok(partition_point)
} else {
Err(partition_point)
}
}
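// Implementation note (editorial): `partition_point` returns the index of the first token
// whose start is >= `offset`, so when several zero-length tokens share the same start offset
// the earliest one is reported, which is exactly the guarantee described in the doc comment above.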
/// Returns a slice of [`Token`] that are within the given `range`.
///
/// The start and end offset of the given range should be either:
/// 1. Token boundary
/// 2. Gap between the tokens
///
/// For example, considering the following tokens and their corresponding range:
///
/// | Token | Range |
/// |---------------------|-----------|
/// | `Def` | `0..3` |
/// | `Name` | `4..7` |
/// | `Lpar` | `7..8` |
/// | `Rpar` | `8..9` |
/// | `Colon` | `9..10` |
/// | `Newline` | `10..11` |
/// | `Comment` | `15..24` |
/// | `NonLogicalNewline` | `24..25` |
/// | `Indent` | `25..29` |
/// | `Pass` | `29..33` |
///
/// Here, for (1) a token boundary is considered either the start or end offset of any of the
/// above tokens. For (2), the gap would be any offset between the `Newline` and `Comment`
/// token which are 12, 13, and 14.
///
/// Examples:
/// 1) `4..10` would give `Name`, `Lpar`, `Rpar`, `Colon`
/// 2) `11..25` would give `Comment`, `NonLogicalNewline`
/// 3) `12..25` would give the same as (2); offset 12 is in the "gap"
/// 4) `9..12` would give `Colon`, `Newline`; offset 12 is in the "gap"
/// 5) `18..27` would panic because both the start and end offsets are within a token
///
/// ## Note
///
/// The returned slice can contain the [`TokenKind::Unknown`] token if there was a lexical
/// error encountered within the given range.
///
/// # Panics
///
/// If either the start or end offset of the given range is within a token range.
pub fn in_range(&self, range: TextRange) -> &[Token] {
let tokens_after_start = self.after(range.start());
Self::before_impl(tokens_after_start, range.end())
}
/// Searches the token(s) at `offset`.
///
/// Returns [`TokenAt::Between`] if `offset` points directly in between two tokens
/// (the left token ends at `offset` and the right token starts at `offset`).
pub fn at_offset(&self, offset: TextSize) -> TokenAt {
match self.binary_search_by_start(offset) {
// The token at `index` starts exactly at `offset`.
// ```python
// object.attribute
// ^ OFFSET
// ```
Ok(index) => {
let token = self[index];
// `token` starts exactly at `offset`. Test if the offset is right between
// `token` and the previous token (if there's any)
if let Some(previous) = index.checked_sub(1).map(|idx| self[idx]) {
if previous.end() == offset {
return TokenAt::Between(previous, token);
}
}
TokenAt::Single(token)
}
// No token found that starts exactly at the given offset. But it's possible that
// the token starting before `offset` fully encloses `offset` (its range ends at or after `offset`).
// ```python
// object.attribute
// ^ OFFSET
// # or
// if True:
// print("test")
// ^ OFFSET
// ```
Err(index) => {
if let Some(previous) = index.checked_sub(1).map(|idx| self[idx]) {
if previous.range().contains_inclusive(offset) {
return TokenAt::Single(previous);
}
}
TokenAt::None
}
}
}
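// Usage sketch (editorial, not part of this diff): callers typically match on the returned
// `TokenAt` to distinguish the three cases:
//
//     match tokens.at_offset(offset) {
//         TokenAt::Single(token) => { /* `offset` falls inside or at the start of `token` */ }
//         TokenAt::Between(left, right) => { /* `left` ends and `right` starts exactly at `offset` */ }
//         TokenAt::None => { /* no token encloses `offset` */ }
//     }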
/// Returns a slice of tokens before the given [`TextSize`] offset.
///
/// If the given offset is between two tokens, the returned slice will end just before the
/// following token. In other words, if the offset is between the end of previous token and
/// start of next token, the returned slice will end just before the next token.
///
/// # Panics
///
/// If the given offset is inside a token range at any point
/// other than the start of the range.
pub fn before(&self, offset: TextSize) -> &[Token] {
Self::before_impl(&self.raw, offset)
}
fn before_impl(tokens: &[Token], offset: TextSize) -> &[Token] {
let partition_point = tokens.partition_point(|token| token.start() < offset);
let before = &tokens[..partition_point];
if let Some(last) = before.last() {
// If it's equal to the end offset, then it's at a token boundary which is
// valid. If it's greater than the end offset, then it's in the gap between
// the tokens which is valid as well.
assert!(
offset >= last.end(),
"Offset {:?} is inside a token range {:?}",
offset,
last.range()
);
}
before
}
/// Returns a slice of tokens after the given [`TextSize`] offset.
///
/// If the given offset is between two tokens, the returned slice will start from the following
/// token. In other words, if the offset is between the end of previous token and start of next
/// token, the returned slice will start from the next token.
///
/// # Panics
///
/// If the given offset is inside a token range at any point
/// other than the start of the range.
pub fn after(&self, offset: TextSize) -> &[Token] {
let partition_point = self.partition_point(|token| token.end() <= offset);
let after = &self[partition_point..];
if let Some(first) = after.first() {
// If it's equal to the start offset, then it's at a token boundary which is
// valid. If it's less than the start offset, then it's in the gap between
// the tokens which is valid as well.
assert!(
offset <= first.start(),
"Offset {:?} is inside a token range {:?}",
offset,
first.range()
);
}
after
}
}
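// Usage sketch (editorial, not part of this diff), based on the token table documented on
// `in_range` above:
//
//     // `Name`, `Lpar`, `Rpar`, `Colon` for the range 4..10
//     let header = tokens.in_range(TextRange::new(4.into(), 10.into()));
//     // everything up to and including the `Newline` token
//     let leading = tokens.before(TextSize::new(11));
//     // everything from the `Comment` token onwards
//     let trailing = tokens.after(TextSize::new(11));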
impl<'a> IntoIterator for &'a Tokens {
type Item = &'a Token;
type IntoIter = std::slice::Iter<'a, Token>;
fn into_iter(self) -> Self::IntoIter {
self.iter()
}
}
impl Deref for Tokens {
type Target = [Token];
fn deref(&self) -> &Self::Target {
&self.raw
}
}
/// A token that encloses a given offset or ends exactly at it.
#[derive(Debug, Clone)]
pub enum TokenAt {
/// There's no token at the given offset
None,
/// There's a single token at the given offset.
Single(Token),
/// The offset falls exactly between two tokens. E.g. `CURSOR` in `call<CURSOR>(arguments)` is
/// positioned exactly between the `call` and `(` tokens.
Between(Token, Token),
}
impl Iterator for TokenAt {
type Item = Token;
fn next(&mut self) -> Option<Self::Item> {
match *self {
TokenAt::None => None,
TokenAt::Single(token) => {
*self = TokenAt::None;
Some(token)
}
TokenAt::Between(first, second) => {
*self = TokenAt::Single(second);
Some(first)
}
}
}
}
impl FusedIterator for TokenAt {}
impl From<&Tokens> for CommentRanges {
fn from(tokens: &Tokens) -> Self {
let mut ranges = vec![];
for token in tokens {
if token.kind() == TokenKind::Comment {
ranges.push(token.range());
}
}
CommentRanges::new(ranges)
}
}
/// An iterator over the [`Token`]s with context.
///
/// This struct is created by the [`iter_with_context`] method on [`Tokens`]. Refer to its
/// documentation for more details.
///
/// [`iter_with_context`]: Tokens::iter_with_context
#[derive(Debug, Clone)]
pub struct TokenIterWithContext<'a> {
inner: std::slice::Iter<'a, Token>,
nesting: u32,
}
impl<'a> TokenIterWithContext<'a> {
fn new(tokens: &'a [Token]) -> TokenIterWithContext<'a> {
TokenIterWithContext {
inner: tokens.iter(),
nesting: 0,
}
}
/// Return the nesting level the iterator is currently in.
pub const fn nesting(&self) -> u32 {
self.nesting
}
/// Returns `true` if the iterator is within a parenthesized context.
pub const fn in_parenthesized_context(&self) -> bool {
self.nesting > 0
}
/// Returns the next [`Token`] in the iterator without consuming it.
pub fn peek(&self) -> Option<&'a Token> {
self.clone().next()
}
}
impl<'a> Iterator for TokenIterWithContext<'a> {
type Item = &'a Token;
fn next(&mut self) -> Option<Self::Item> {
let token = self.inner.next()?;
match token.kind() {
TokenKind::Lpar | TokenKind::Lbrace | TokenKind::Lsqb => self.nesting += 1,
TokenKind::Rpar | TokenKind::Rbrace | TokenKind::Rsqb => {
self.nesting = self.nesting.saturating_sub(1);
}
// This mimics the behavior of re-lexing which reduces the nesting level on the lexer.
// We don't need to reduce it by 1 because unlike the lexer we see the final token
// after recovering from every unclosed parenthesis.
TokenKind::Newline if self.nesting > 0 => {
self.nesting = 0;
}
_ => {}
}
Some(token)
}
}
impl FusedIterator for TokenIterWithContext<'_> {}
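// Usage sketch (editorial, not part of this diff): the nesting level tracked by
// `TokenIterWithContext` makes questions like "is this comment inside parentheses?" cheap:
//
//     let mut iter = tokens.iter_with_context();
//     while let Some(token) = iter.next() {
//         if token.kind() == TokenKind::Comment && iter.in_parenthesized_context() {
//             // a comment within `(...)`, `[...]`, or `{...}`
//         }
//     }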
#[cfg(test)]
mod tests {
use std::ops::Range;
use ruff_text_size::TextSize;
use crate::token::{Token, TokenFlags, TokenKind};
use super::*;
/// Test case containing a "gap" between two tokens.
///
/// Code: <https://play.ruff.rs/a3658340-6df8-42c5-be80-178744bf1193>
const TEST_CASE_WITH_GAP: [(TokenKind, Range<u32>); 10] = [
(TokenKind::Def, 0..3),
(TokenKind::Name, 4..7),
(TokenKind::Lpar, 7..8),
(TokenKind::Rpar, 8..9),
(TokenKind::Colon, 9..10),
(TokenKind::Newline, 10..11),
// Gap ||..||
(TokenKind::Comment, 15..24),
(TokenKind::NonLogicalNewline, 24..25),
(TokenKind::Indent, 25..29),
(TokenKind::Pass, 29..33),
// No newline at the end to keep the token set full of unique tokens
];
/// Helper function to create [`Tokens`] from an iterator of (kind, range).
fn new_tokens(tokens: impl Iterator<Item = (TokenKind, Range<u32>)>) -> Tokens {
Tokens::new(
tokens
.map(|(kind, range)| {
Token::new(
kind,
TextRange::new(TextSize::new(range.start), TextSize::new(range.end)),
TokenFlags::empty(),
)
})
.collect(),
)
}
#[test]
fn tokens_after_offset_at_token_start() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let after = tokens.after(TextSize::new(8));
assert_eq!(after.len(), 7);
assert_eq!(after.first().unwrap().kind(), TokenKind::Rpar);
}
#[test]
fn tokens_after_offset_at_token_end() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let after = tokens.after(TextSize::new(11));
assert_eq!(after.len(), 4);
assert_eq!(after.first().unwrap().kind(), TokenKind::Comment);
}
#[test]
fn tokens_after_offset_between_tokens() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let after = tokens.after(TextSize::new(13));
assert_eq!(after.len(), 4);
assert_eq!(after.first().unwrap().kind(), TokenKind::Comment);
}
#[test]
fn tokens_after_offset_at_last_token_end() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let after = tokens.after(TextSize::new(33));
assert_eq!(after.len(), 0);
}
#[test]
#[should_panic(expected = "Offset 5 is inside a token range 4..7")]
fn tokens_after_offset_inside_token() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
tokens.after(TextSize::new(5));
}
#[test]
fn tokens_before_offset_at_first_token_start() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(0));
assert_eq!(before.len(), 0);
}
#[test]
fn tokens_before_offset_after_first_token_gap() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(3));
assert_eq!(before.len(), 1);
assert_eq!(before.last().unwrap().kind(), TokenKind::Def);
}
#[test]
fn tokens_before_offset_at_second_token_start() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(4));
assert_eq!(before.len(), 1);
assert_eq!(before.last().unwrap().kind(), TokenKind::Def);
}
#[test]
fn tokens_before_offset_at_token_start() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(8));
assert_eq!(before.len(), 3);
assert_eq!(before.last().unwrap().kind(), TokenKind::Lpar);
}
#[test]
fn tokens_before_offset_at_token_end() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(11));
assert_eq!(before.len(), 6);
assert_eq!(before.last().unwrap().kind(), TokenKind::Newline);
}
#[test]
fn tokens_before_offset_between_tokens() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(13));
assert_eq!(before.len(), 6);
assert_eq!(before.last().unwrap().kind(), TokenKind::Newline);
}
#[test]
fn tokens_before_offset_at_last_token_end() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(33));
assert_eq!(before.len(), 10);
assert_eq!(before.last().unwrap().kind(), TokenKind::Pass);
}
#[test]
#[should_panic(expected = "Offset 5 is inside a token range 4..7")]
fn tokens_before_offset_inside_token() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
tokens.before(TextSize::new(5));
}
#[test]
fn tokens_in_range_at_token_offset() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let in_range = tokens.in_range(TextRange::new(4.into(), 10.into()));
assert_eq!(in_range.len(), 4);
assert_eq!(in_range.first().unwrap().kind(), TokenKind::Name);
assert_eq!(in_range.last().unwrap().kind(), TokenKind::Colon);
}
#[test]
fn tokens_in_range_start_offset_at_token_end() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let in_range = tokens.in_range(TextRange::new(11.into(), 29.into()));
assert_eq!(in_range.len(), 3);
assert_eq!(in_range.first().unwrap().kind(), TokenKind::Comment);
assert_eq!(in_range.last().unwrap().kind(), TokenKind::Indent);
}
#[test]
fn tokens_in_range_end_offset_at_token_start() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let in_range = tokens.in_range(TextRange::new(8.into(), 15.into()));
assert_eq!(in_range.len(), 3);
assert_eq!(in_range.first().unwrap().kind(), TokenKind::Rpar);
assert_eq!(in_range.last().unwrap().kind(), TokenKind::Newline);
}
#[test]
fn tokens_in_range_start_offset_between_tokens() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let in_range = tokens.in_range(TextRange::new(13.into(), 29.into()));
assert_eq!(in_range.len(), 3);
assert_eq!(in_range.first().unwrap().kind(), TokenKind::Comment);
assert_eq!(in_range.last().unwrap().kind(), TokenKind::Indent);
}
#[test]
fn tokens_in_range_end_offset_between_tokens() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let in_range = tokens.in_range(TextRange::new(9.into(), 13.into()));
assert_eq!(in_range.len(), 2);
assert_eq!(in_range.first().unwrap().kind(), TokenKind::Colon);
assert_eq!(in_range.last().unwrap().kind(), TokenKind::Newline);
}
#[test]
#[should_panic(expected = "Offset 5 is inside a token range 4..7")]
fn tokens_in_range_start_offset_inside_token() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
tokens.in_range(TextRange::new(5.into(), 10.into()));
}
#[test]
#[should_panic(expected = "Offset 6 is inside a token range 4..7")]
fn tokens_in_range_end_offset_inside_token() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
tokens.in_range(TextRange::new(0.into(), 6.into()));
}
}

View File

@@ -0,0 +1,199 @@
//! Tests for [`ruff_python_ast::token::parentheses_iterator`] and
//! [`ruff_python_ast::token::parenthesized_range`].
use ruff_python_ast::{
self as ast, Expr,
token::{parentheses_iterator, parenthesized_range},
};
use ruff_python_parser::parse_module;
#[test]
fn test_no_parentheses() {
let source = "x = 2 + 2";
let parsed = parse_module(source).expect("should parse valid python");
let tokens = parsed.tokens();
let module = parsed.syntax();
let stmt = module.body.first().expect("module should have a statement");
let ast::Stmt::Assign(assign) = stmt else {
panic!("expected `Assign` statement, got {stmt:?}");
};
let result = parenthesized_range(assign.value.as_ref().into(), stmt.into(), tokens);
assert_eq!(result, None);
}
#[test]
fn test_single_parentheses() {
let source = "x = (2 + 2)";
let parsed = parse_module(source).expect("should parse valid python");
let tokens = parsed.tokens();
let module = parsed.syntax();
let stmt = module.body.first().expect("module should have a statement");
let ast::Stmt::Assign(assign) = stmt else {
panic!("expected `Assign` statement, got {stmt:?}");
};
let result = parenthesized_range(assign.value.as_ref().into(), stmt.into(), tokens);
let range = result.expect("should find parentheses");
assert_eq!(&source[range], "(2 + 2)");
}
#[test]
fn test_double_parentheses() {
let source = "x = ((2 + 2))";
let parsed = parse_module(source).expect("should parse valid python");
let tokens = parsed.tokens();
let module = parsed.syntax();
let stmt = module.body.first().expect("module should have a statement");
let ast::Stmt::Assign(assign) = stmt else {
panic!("expected `Assign` statement, got {stmt:?}");
};
let result = parenthesized_range(assign.value.as_ref().into(), stmt.into(), tokens);
let range = result.expect("should find parentheses");
assert_eq!(&source[range], "((2 + 2))");
}
#[test]
fn test_parentheses_with_whitespace() {
let source = "x = ( 2 + 2 )";
let parsed = parse_module(source).expect("should parse valid python");
let tokens = parsed.tokens();
let module = parsed.syntax();
let stmt = module.body.first().expect("module should have a statement");
let ast::Stmt::Assign(assign) = stmt else {
panic!("expected `Assign` statement, got {stmt:?}");
};
let result = parenthesized_range(assign.value.as_ref().into(), stmt.into(), tokens);
let range = result.expect("should find parentheses");
assert_eq!(&source[range], "( 2 + 2 )");
}
#[test]
fn test_parentheses_with_comments() {
let source = "x = ( # comment\n 2 + 2\n)";
let parsed = parse_module(source).expect("should parse valid python");
let tokens = parsed.tokens();
let module = parsed.syntax();
let stmt = module.body.first().expect("module should have a statement");
let ast::Stmt::Assign(assign) = stmt else {
panic!("expected `Assign` statement, got {stmt:?}");
};
let result = parenthesized_range(assign.value.as_ref().into(), stmt.into(), tokens);
let range = result.expect("should find parentheses");
assert_eq!(&source[range], "( # comment\n 2 + 2\n)");
}
#[test]
fn test_parenthesized_range_multiple() {
let source = "x = (((2 + 2)))";
let parsed = parse_module(source).expect("should parse valid python");
let tokens = parsed.tokens();
let module = parsed.syntax();
let stmt = module.body.first().expect("module should have a statement");
let ast::Stmt::Assign(assign) = stmt else {
panic!("expected `Assign` statement, got {stmt:?}");
};
let result = parenthesized_range(assign.value.as_ref().into(), stmt.into(), tokens);
let range = result.expect("should find parentheses");
assert_eq!(&source[range], "(((2 + 2)))");
}
#[test]
fn test_parentheses_iterator_multiple() {
let source = "x = (((2 + 2)))";
let parsed = parse_module(source).expect("should parse valid python");
let tokens = parsed.tokens();
let module = parsed.syntax();
let stmt = module.body.first().expect("module should have a statement");
let ast::Stmt::Assign(assign) = stmt else {
panic!("expected `Assign` statement, got {stmt:?}");
};
let ranges: Vec<_> =
parentheses_iterator(assign.value.as_ref().into(), Some(stmt.into()), tokens).collect();
assert_eq!(ranges.len(), 3);
assert_eq!(&source[ranges[0]], "(2 + 2)");
assert_eq!(&source[ranges[1]], "((2 + 2))");
assert_eq!(&source[ranges[2]], "(((2 + 2)))");
}
#[test]
fn test_call_arguments_not_counted() {
let source = "f(x)";
let parsed = parse_module(source).expect("should parse valid python");
let tokens = parsed.tokens();
let module = parsed.syntax();
let stmt = module.body.first().expect("module should have a statement");
let ast::Stmt::Expr(expr_stmt) = stmt else {
panic!("expected `Expr` statement, got {stmt:?}");
};
let Expr::Call(call) = expr_stmt.value.as_ref() else {
panic!("expected Call expression, got {:?}", expr_stmt.value);
};
let arg = call
.arguments
.args
.first()
.expect("call should have an argument");
let result = parenthesized_range(arg.into(), (&call.arguments).into(), tokens);
// The parentheses belong to the call, not the argument
assert_eq!(result, None);
}
#[test]
fn test_call_with_parenthesized_argument() {
let source = "f((x))";
let parsed = parse_module(source).expect("should parse valid python");
let tokens = parsed.tokens();
let module = parsed.syntax();
let stmt = module.body.first().expect("module should have a statement");
let ast::Stmt::Expr(expr_stmt) = stmt else {
panic!("expected Expr statement, got {stmt:?}");
};
let Expr::Call(call) = expr_stmt.value.as_ref() else {
panic!("expected `Call` expression, got {:?}", expr_stmt.value);
};
let arg = call
.arguments
.args
.first()
.expect("call should have an argument");
let result = parenthesized_range(arg.into(), (&call.arguments).into(), tokens);
let range = result.expect("should find parentheses around argument");
assert_eq!(&source[range], "(x)");
}
#[test]
fn test_multiline_with_parentheses() {
let source = "x = (\n 2 + 2 + 2\n)";
let parsed = parse_module(source).expect("should parse valid python");
let tokens = parsed.tokens();
let module = parsed.syntax();
let stmt = module.body.first().expect("module should have a statement");
let ast::Stmt::Assign(assign) = stmt else {
panic!("expected `Assign` statement, got {stmt:?}");
};
let result = parenthesized_range(assign.value.as_ref().into(), stmt.into(), tokens);
let range = result.expect("should find parentheses");
assert_eq!(&source[range], "(\n 2 + 2 + 2\n)");
}

View File

@@ -5,7 +5,7 @@ use std::cell::OnceCell;
use std::ops::Deref;
use ruff_python_ast::str::Quote;
use ruff_python_parser::{Token, TokenKind, Tokens};
use ruff_python_ast::token::{Token, TokenKind, Tokens};
use ruff_source_file::{LineEnding, LineRanges, find_newline};
use ruff_text_size::Ranged;

View File

@@ -3,7 +3,7 @@ use std::ops::{Deref, DerefMut};
use ruff_formatter::{Buffer, FormatContext, GroupId, IndentWidth, SourceCode};
use ruff_python_ast::str::Quote;
use ruff_python_parser::Tokens;
use ruff_python_ast::token::Tokens;
use crate::PyFormatOptions;
use crate::comments::Comments;

View File

@@ -5,7 +5,7 @@ use std::slice::Iter;
use ruff_formatter::{FormatError, write};
use ruff_python_ast::AnyNodeRef;
use ruff_python_ast::Stmt;
use ruff_python_parser::{self as parser, TokenKind};
use ruff_python_ast::token::{Token as AstToken, TokenKind};
use ruff_python_trivia::lines_before;
use ruff_source_file::LineRanges;
use ruff_text_size::{Ranged, TextRange, TextSize};
@@ -770,7 +770,7 @@ impl Format<PyFormatContext<'_>> for FormatVerbatimStatementRange {
}
struct LogicalLinesIter<'a> {
tokens: Iter<'a, parser::Token>,
tokens: Iter<'a, AstToken>,
// The end of the last logical line
last_line_end: TextSize,
// The position where the content to lex ends.
@@ -778,7 +778,7 @@ struct LogicalLinesIter<'a> {
}
impl<'a> LogicalLinesIter<'a> {
fn new(tokens: Iter<'a, parser::Token>, verbatim_range: TextRange) -> Self {
fn new(tokens: Iter<'a, AstToken>, verbatim_range: TextRange) -> Self {
Self {
tokens,
last_line_end: verbatim_range.start(),

View File

@@ -14,7 +14,6 @@ license = { workspace = true }
ruff_diagnostics = { workspace = true }
ruff_python_ast = { workspace = true }
ruff_python_codegen = { workspace = true }
ruff_python_parser = { workspace = true }
ruff_python_trivia = { workspace = true }
ruff_source_file = { workspace = true, features = ["serde"] }
ruff_text_size = { workspace = true }
@@ -22,6 +21,8 @@ ruff_text_size = { workspace = true }
anyhow = { workspace = true }
[dev-dependencies]
ruff_python_parser = { workspace = true }
insta = { workspace = true }
[features]

View File

@@ -5,8 +5,8 @@ use std::ops::Add;
use ruff_diagnostics::Edit;
use ruff_python_ast::Stmt;
use ruff_python_ast::helpers::is_docstring_stmt;
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_codegen::Stylist;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_trivia::is_python_whitespace;
use ruff_python_trivia::{PythonWhitespace, textwrap::indent};
use ruff_source_file::{LineRanges, UniversalNewlineIterator};
@@ -194,7 +194,7 @@ impl<'a> Insertion<'a> {
tokens
.before(at)
.last()
.map(ruff_python_parser::Token::kind),
.map(ruff_python_ast::token::Token::kind),
Some(TokenKind::Import)
) {
return None;

View File

@@ -15,12 +15,12 @@ doctest = false
[dependencies]
ruff_python_ast = { workspace = true }
ruff_python_parser = { workspace = true }
ruff_python_trivia = { workspace = true }
ruff_source_file = { workspace = true }
ruff_text_size = { workspace = true }
[dev-dependencies]
ruff_python_parser = { workspace = true }
[lints]
workspace = true

View File

@@ -2,7 +2,7 @@
//! are omitted from the AST (e.g., commented lines).
use ruff_python_ast::Stmt;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_trivia::{
CommentRanges, has_leading_content, has_trailing_content, is_python_whitespace,
};

View File

@@ -1,6 +1,6 @@
use std::collections::BTreeMap;
use ruff_python_parser::{Token, TokenKind};
use ruff_python_ast::token::{Token, TokenKind};
use ruff_text_size::{Ranged, TextRange, TextSize};
/// Stores the ranges of all interpolated strings in a file sorted by [`TextRange::start`].

View File

@@ -1,4 +1,4 @@
use ruff_python_parser::{Token, TokenKind};
use ruff_python_ast::token::{Token, TokenKind};
use ruff_text_size::{Ranged, TextRange};
/// Stores the range of all multiline strings in a file sorted by

View File

@@ -0,0 +1,3 @@
class C[T = int, U]: ...
class C[T1, T2 = int, T3, T4]: ...
type Alias[T = int, U] = ...

View File

@@ -1,9 +1,10 @@
use std::fmt::{self, Display};
use ruff_python_ast::PythonVersion;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::{Ranged, TextRange};
use crate::{TokenKind, string::InterpolatedStringKind};
use crate::string::InterpolatedStringKind;
/// Represents errors that occur during parsing and are
/// returned by the `parse_*` functions.

View File

@@ -14,6 +14,7 @@ use unicode_normalization::UnicodeNormalization;
use ruff_python_ast::name::Name;
use ruff_python_ast::str_prefix::{AnyStringPrefix, StringLiteralPrefix};
use ruff_python_ast::token::{TokenFlags, TokenKind};
use ruff_python_ast::{Int, IpyEscapeKind, StringFlags};
use ruff_python_trivia::is_python_whitespace;
use ruff_text_size::{TextLen, TextRange, TextSize};
@@ -26,7 +27,7 @@ use crate::lexer::interpolated_string::{
InterpolatedStringContext, InterpolatedStrings, InterpolatedStringsCheckpoint,
};
use crate::string::InterpolatedStringKind;
use crate::token::{TokenFlags, TokenKind, TokenValue};
use crate::token::TokenValue;
mod cursor;
mod indentation;

View File

@@ -63,23 +63,20 @@
//! [lexical analysis]: https://en.wikipedia.org/wiki/Lexical_analysis
//! [parsing]: https://en.wikipedia.org/wiki/Parsing
//! [lexer]: crate::lexer
use std::iter::FusedIterator;
use std::ops::Deref;
pub use crate::error::{
InterpolatedStringErrorType, LexicalErrorType, ParseError, ParseErrorType,
UnsupportedSyntaxError, UnsupportedSyntaxErrorKind,
};
pub use crate::parser::ParseOptions;
pub use crate::token::{Token, TokenKind};
use crate::parser::Parser;
use ruff_python_ast::token::Tokens;
use ruff_python_ast::{
Expr, Mod, ModExpression, ModModule, PySourceType, StringFlags, StringLiteral, Suite,
};
use ruff_python_trivia::CommentRanges;
use ruff_text_size::{Ranged, TextRange, TextSize};
use ruff_text_size::{Ranged, TextRange};
mod error;
pub mod lexer;
@@ -473,351 +470,6 @@ impl Parsed<ModExpression> {
}
}
/// Tokens represents a vector of lexed [`Token`].
#[derive(Debug, Clone, PartialEq, Eq, get_size2::GetSize)]
pub struct Tokens {
raw: Vec<Token>,
}
impl Tokens {
pub(crate) fn new(tokens: Vec<Token>) -> Tokens {
Tokens { raw: tokens }
}
/// Returns an iterator over all the tokens that provides context.
pub fn iter_with_context(&self) -> TokenIterWithContext<'_> {
TokenIterWithContext::new(&self.raw)
}
/// Performs a binary search to find the index of the **first** token that starts at the given `offset`.
///
/// Unlike `binary_search_by_key`, this method ensures that if multiple tokens start at the same offset,
/// it returns the index of the first one. Multiple tokens can start at the same offset in cases where
/// zero-length tokens are involved (like `Dedent` or `Newline` at the end of the file).
pub fn binary_search_by_start(&self, offset: TextSize) -> Result<usize, usize> {
let partition_point = self.partition_point(|token| token.start() < offset);
let after = &self[partition_point..];
if after.first().is_some_and(|first| first.start() == offset) {
Ok(partition_point)
} else {
Err(partition_point)
}
}
/// Returns a slice of [`Token`] that are within the given `range`.
///
/// The start and end offset of the given range should be either:
/// 1. Token boundary
/// 2. Gap between the tokens
///
/// For example, considering the following tokens and their corresponding range:
///
/// | Token | Range |
/// |---------------------|-----------|
/// | `Def` | `0..3` |
/// | `Name` | `4..7` |
/// | `Lpar` | `7..8` |
/// | `Rpar` | `8..9` |
/// | `Colon` | `9..10` |
/// | `Newline` | `10..11` |
/// | `Comment` | `15..24` |
/// | `NonLogicalNewline` | `24..25` |
/// | `Indent` | `25..29` |
/// | `Pass` | `29..33` |
///
/// Here, for (1) a token boundary is considered either the start or end offset of any of the
/// above tokens. For (2), the gap would be any offset between the `Newline` and `Comment`
/// token which are 12, 13, and 14.
///
/// Examples:
/// 1) `4..10` would give `Name`, `Lpar`, `Rpar`, `Colon`
/// 2) `11..25` would give `Comment`, `NonLogicalNewline`
/// 3) `12..25` would give same as (2) and offset 12 is in the "gap"
/// 4) `9..12` would give `Colon`, `Newline` and offset 12 is in the "gap"
/// 5) `18..27` would panic because both the start and end offset is within a token
///
/// ## Note
///
/// The returned slice can contain the [`TokenKind::Unknown`] token if there was a lexical
/// error encountered within the given range.
///
/// # Panics
///
/// If either the start or end offset of the given range is within a token range.
pub fn in_range(&self, range: TextRange) -> &[Token] {
let tokens_after_start = self.after(range.start());
Self::before_impl(tokens_after_start, range.end())
}
/// Searches the token(s) at `offset`.
///
/// Returns [`TokenAt::Between`] if `offset` points directly in between two tokens
/// (the left token ends at `offset` and the right token starts at `offset`).
///
///
/// ## Examples
///
/// [Playground](https://play.ruff.rs/f3ad0a55-5931-4a13-96c7-b2b8bfdc9a2e?secondary=Tokens)
///
/// ```
/// # use ruff_python_ast::PySourceType;
/// # use ruff_python_parser::{Token, TokenAt, TokenKind};
/// # use ruff_text_size::{Ranged, TextSize};
///
/// let source = r#"
/// def test(arg):
/// arg.call()
/// if True:
/// pass
/// print("true")
/// "#.trim();
///
/// let parsed = ruff_python_parser::parse_unchecked_source(source, PySourceType::Python);
/// let tokens = parsed.tokens();
///
/// let collect_tokens = |offset: TextSize| {
/// tokens.at_offset(offset).into_iter().map(|t| (t.kind(), &source[t.range()])).collect::<Vec<_>>()
/// };
///
/// assert_eq!(collect_tokens(TextSize::new(4)), vec! [(TokenKind::Name, "test")]);
/// assert_eq!(collect_tokens(TextSize::new(6)), vec! [(TokenKind::Name, "test")]);
/// // between `arg` and `.`
/// assert_eq!(collect_tokens(TextSize::new(22)), vec! [(TokenKind::Name, "arg"), (TokenKind::Dot, ".")]);
/// assert_eq!(collect_tokens(TextSize::new(36)), vec! [(TokenKind::If, "if")]);
/// // Before the dedent token
/// assert_eq!(collect_tokens(TextSize::new(57)), vec! []);
/// ```
pub fn at_offset(&self, offset: TextSize) -> TokenAt {
match self.binary_search_by_start(offset) {
// The token at `index` starts exactly at `offset`.
// ```python
// object.attribute
// ^ OFFSET
// ```
Ok(index) => {
let token = self[index];
// `token` starts exactly at `offset`. Test if the offset is right between
// `token` and the previous token (if there's any)
if let Some(previous) = index.checked_sub(1).map(|idx| self[idx]) {
if previous.end() == offset {
return TokenAt::Between(previous, token);
}
}
TokenAt::Single(token)
}
// No token found that starts exactly at the given offset. But it's possible that
// the token starting before `offset` fully encloses `offset` (its range ends at or after `offset`).
// ```python
// object.attribute
// ^ OFFSET
// # or
// if True:
// print("test")
// ^ OFFSET
// ```
Err(index) => {
if let Some(previous) = index.checked_sub(1).map(|idx| self[idx]) {
if previous.range().contains_inclusive(offset) {
return TokenAt::Single(previous);
}
}
TokenAt::None
}
}
}
/// Returns a slice of tokens before the given [`TextSize`] offset.
///
/// If the given offset is between two tokens, the returned slice will end just before the
/// following token. In other words, if the offset is between the end of previous token and
/// start of next token, the returned slice will end just before the next token.
///
/// # Panics
///
/// If the given offset is inside a token range at any point
/// other than the start of the range.
pub fn before(&self, offset: TextSize) -> &[Token] {
Self::before_impl(&self.raw, offset)
}
fn before_impl(tokens: &[Token], offset: TextSize) -> &[Token] {
let partition_point = tokens.partition_point(|token| token.start() < offset);
let before = &tokens[..partition_point];
if let Some(last) = before.last() {
// If it's equal to the end offset, then it's at a token boundary which is
// valid. If it's greater than the end offset, then it's in the gap between
// the tokens which is valid as well.
assert!(
offset >= last.end(),
"Offset {:?} is inside a token range {:?}",
offset,
last.range()
);
}
before
}
/// Returns a slice of tokens after the given [`TextSize`] offset.
///
/// If the given offset is between two tokens, the returned slice will start from the following
/// token. In other words, if the offset is between the end of previous token and start of next
/// token, the returned slice will start from the next token.
///
/// # Panics
///
/// If the given offset is inside a token range at any point
/// other than the start of the range.
pub fn after(&self, offset: TextSize) -> &[Token] {
let partition_point = self.partition_point(|token| token.end() <= offset);
let after = &self[partition_point..];
if let Some(first) = after.first() {
// If it's equal to the start offset, then it's at a token boundary which is
// valid. If it's less than the start offset, then it's in the gap between
// the tokens which is valid as well.
assert!(
offset <= first.start(),
"Offset {:?} is inside a token range {:?}",
offset,
first.range()
);
}
after
}
}
impl<'a> IntoIterator for &'a Tokens {
type Item = &'a Token;
type IntoIter = std::slice::Iter<'a, Token>;
fn into_iter(self) -> Self::IntoIter {
self.iter()
}
}
impl Deref for Tokens {
type Target = [Token];
fn deref(&self) -> &Self::Target {
&self.raw
}
}
/// A token that encloses a given offset or ends exactly at it.
#[derive(Debug, Clone)]
pub enum TokenAt {
/// There's no token at the given offset
None,
/// There's a single token at the given offset.
Single(Token),
/// The offset falls exactly between two tokens. E.g. `CURSOR` in `call<CURSOR>(arguments)` is
/// positioned exactly between the `call` and `(` tokens.
Between(Token, Token),
}
impl Iterator for TokenAt {
type Item = Token;
fn next(&mut self) -> Option<Self::Item> {
match *self {
TokenAt::None => None,
TokenAt::Single(token) => {
*self = TokenAt::None;
Some(token)
}
TokenAt::Between(first, second) => {
*self = TokenAt::Single(second);
Some(first)
}
}
}
}
impl FusedIterator for TokenAt {}
impl From<&Tokens> for CommentRanges {
fn from(tokens: &Tokens) -> Self {
let mut ranges = vec![];
for token in tokens {
if token.kind() == TokenKind::Comment {
ranges.push(token.range());
}
}
CommentRanges::new(ranges)
}
}
/// An iterator over the [`Token`]s with context.
///
/// This struct is created by the [`iter_with_context`] method on [`Tokens`]. Refer to its
/// documentation for more details.
///
/// [`iter_with_context`]: Tokens::iter_with_context
#[derive(Debug, Clone)]
pub struct TokenIterWithContext<'a> {
inner: std::slice::Iter<'a, Token>,
nesting: u32,
}
impl<'a> TokenIterWithContext<'a> {
fn new(tokens: &'a [Token]) -> TokenIterWithContext<'a> {
TokenIterWithContext {
inner: tokens.iter(),
nesting: 0,
}
}
/// Return the nesting level the iterator is currently in.
pub const fn nesting(&self) -> u32 {
self.nesting
}
/// Returns `true` if the iterator is within a parenthesized context.
pub const fn in_parenthesized_context(&self) -> bool {
self.nesting > 0
}
/// Returns the next [`Token`] in the iterator without consuming it.
pub fn peek(&self) -> Option<&'a Token> {
self.clone().next()
}
}
impl<'a> Iterator for TokenIterWithContext<'a> {
type Item = &'a Token;
fn next(&mut self) -> Option<Self::Item> {
let token = self.inner.next()?;
match token.kind() {
TokenKind::Lpar | TokenKind::Lbrace | TokenKind::Lsqb => self.nesting += 1,
TokenKind::Rpar | TokenKind::Rbrace | TokenKind::Rsqb => {
self.nesting = self.nesting.saturating_sub(1);
}
// This mimics the behavior of re-lexing which reduces the nesting level on the lexer.
// We don't need to reduce it by 1 because unlike the lexer we see the final token
// after recovering from every unclosed parenthesis.
TokenKind::Newline if self.nesting > 0 => {
self.nesting = 0;
}
_ => {}
}
Some(token)
}
}
impl FusedIterator for TokenIterWithContext<'_> {}
/// Control in the different modes by which a source file can be parsed.
///
/// The mode argument specifies in what way code must be parsed.
@@ -888,204 +540,3 @@ impl std::fmt::Display for ModeParseError {
write!(f, r#"mode must be "exec", "eval", "ipython", or "single""#)
}
}
#[cfg(test)]
mod tests {
use std::ops::Range;
use crate::token::TokenFlags;
use super::*;
/// Test case containing a "gap" between two tokens.
///
/// Code: <https://play.ruff.rs/a3658340-6df8-42c5-be80-178744bf1193>
const TEST_CASE_WITH_GAP: [(TokenKind, Range<u32>); 10] = [
(TokenKind::Def, 0..3),
(TokenKind::Name, 4..7),
(TokenKind::Lpar, 7..8),
(TokenKind::Rpar, 8..9),
(TokenKind::Colon, 9..10),
(TokenKind::Newline, 10..11),
// Gap ||..||
(TokenKind::Comment, 15..24),
(TokenKind::NonLogicalNewline, 24..25),
(TokenKind::Indent, 25..29),
(TokenKind::Pass, 29..33),
// No newline at the end to keep the token set full of unique tokens
];
/// Helper function to create [`Tokens`] from an iterator of (kind, range).
fn new_tokens(tokens: impl Iterator<Item = (TokenKind, Range<u32>)>) -> Tokens {
Tokens::new(
tokens
.map(|(kind, range)| {
Token::new(
kind,
TextRange::new(TextSize::new(range.start), TextSize::new(range.end)),
TokenFlags::empty(),
)
})
.collect(),
)
}
#[test]
fn tokens_after_offset_at_token_start() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let after = tokens.after(TextSize::new(8));
assert_eq!(after.len(), 7);
assert_eq!(after.first().unwrap().kind(), TokenKind::Rpar);
}
#[test]
fn tokens_after_offset_at_token_end() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let after = tokens.after(TextSize::new(11));
assert_eq!(after.len(), 4);
assert_eq!(after.first().unwrap().kind(), TokenKind::Comment);
}
#[test]
fn tokens_after_offset_between_tokens() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let after = tokens.after(TextSize::new(13));
assert_eq!(after.len(), 4);
assert_eq!(after.first().unwrap().kind(), TokenKind::Comment);
}
#[test]
fn tokens_after_offset_at_last_token_end() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let after = tokens.after(TextSize::new(33));
assert_eq!(after.len(), 0);
}
#[test]
#[should_panic(expected = "Offset 5 is inside a token range 4..7")]
fn tokens_after_offset_inside_token() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
tokens.after(TextSize::new(5));
}
#[test]
fn tokens_before_offset_at_first_token_start() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(0));
assert_eq!(before.len(), 0);
}
#[test]
fn tokens_before_offset_after_first_token_gap() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(3));
assert_eq!(before.len(), 1);
assert_eq!(before.last().unwrap().kind(), TokenKind::Def);
}
#[test]
fn tokens_before_offset_at_second_token_start() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(4));
assert_eq!(before.len(), 1);
assert_eq!(before.last().unwrap().kind(), TokenKind::Def);
}
#[test]
fn tokens_before_offset_at_token_start() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(8));
assert_eq!(before.len(), 3);
assert_eq!(before.last().unwrap().kind(), TokenKind::Lpar);
}
#[test]
fn tokens_before_offset_at_token_end() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(11));
assert_eq!(before.len(), 6);
assert_eq!(before.last().unwrap().kind(), TokenKind::Newline);
}
#[test]
fn tokens_before_offset_between_tokens() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(13));
assert_eq!(before.len(), 6);
assert_eq!(before.last().unwrap().kind(), TokenKind::Newline);
}
#[test]
fn tokens_before_offset_at_last_token_end() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(33));
assert_eq!(before.len(), 10);
assert_eq!(before.last().unwrap().kind(), TokenKind::Pass);
}
#[test]
#[should_panic(expected = "Offset 5 is inside a token range 4..7")]
fn tokens_before_offset_inside_token() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
tokens.before(TextSize::new(5));
}
#[test]
fn tokens_in_range_at_token_offset() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let in_range = tokens.in_range(TextRange::new(4.into(), 10.into()));
assert_eq!(in_range.len(), 4);
assert_eq!(in_range.first().unwrap().kind(), TokenKind::Name);
assert_eq!(in_range.last().unwrap().kind(), TokenKind::Colon);
}
#[test]
fn tokens_in_range_start_offset_at_token_end() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let in_range = tokens.in_range(TextRange::new(11.into(), 29.into()));
assert_eq!(in_range.len(), 3);
assert_eq!(in_range.first().unwrap().kind(), TokenKind::Comment);
assert_eq!(in_range.last().unwrap().kind(), TokenKind::Indent);
}
#[test]
fn tokens_in_range_end_offset_at_token_start() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let in_range = tokens.in_range(TextRange::new(8.into(), 15.into()));
assert_eq!(in_range.len(), 3);
assert_eq!(in_range.first().unwrap().kind(), TokenKind::Rpar);
assert_eq!(in_range.last().unwrap().kind(), TokenKind::Newline);
}
#[test]
fn tokens_in_range_start_offset_between_tokens() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let in_range = tokens.in_range(TextRange::new(13.into(), 29.into()));
assert_eq!(in_range.len(), 3);
assert_eq!(in_range.first().unwrap().kind(), TokenKind::Comment);
assert_eq!(in_range.last().unwrap().kind(), TokenKind::Indent);
}
#[test]
fn tokens_in_range_end_offset_between_tokens() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let in_range = tokens.in_range(TextRange::new(9.into(), 13.into()));
assert_eq!(in_range.len(), 2);
assert_eq!(in_range.first().unwrap().kind(), TokenKind::Colon);
assert_eq!(in_range.last().unwrap().kind(), TokenKind::Newline);
}
#[test]
#[should_panic(expected = "Offset 5 is inside a token range 4..7")]
fn tokens_in_range_start_offset_inside_token() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
tokens.in_range(TextRange::new(5.into(), 10.into()));
}
#[test]
#[should_panic(expected = "Offset 6 is inside a token range 4..7")]
fn tokens_in_range_end_offset_inside_token() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
tokens.in_range(TextRange::new(0.into(), 6.into()));
}
}

View File

@@ -4,6 +4,7 @@ use bitflags::bitflags;
use rustc_hash::{FxBuildHasher, FxHashSet};
use ruff_python_ast::name::Name;
use ruff_python_ast::token::TokenKind;
use ruff_python_ast::{
self as ast, AnyStringFlags, AtomicNodeIndex, BoolOp, CmpOp, ConversionFlag, Expr, ExprContext,
FString, InterpolatedStringElement, InterpolatedStringElements, IpyEscapeKind, Number,
@@ -18,7 +19,7 @@ use crate::string::{
InterpolatedStringKind, StringType, parse_interpolated_string_literal_element,
parse_string_literal,
};
use crate::token::{TokenKind, TokenValue};
use crate::token::TokenValue;
use crate::token_set::TokenSet;
use crate::{
InterpolatedStringErrorType, Mode, ParseErrorType, UnsupportedSyntaxError,

Some files were not shown because too many files have changed in this diff.