Compare commits


32 Commits

Author SHA1 Message Date
Brent Westbrook
9d4f1c6ae2 Bump 0.14.8 (#21791) 2025-12-04 09:45:53 -05:00
Micha Reiser
326025d45f [ty] Always register rename provider if client doesn't support dynamic registration (#21789) 2025-12-04 14:40:16 +01:00
Micha Reiser
3aefe85b32 [ty] Ensure rename CursorTest calls can_rename before renaming (#21790) 2025-12-04 14:19:48 +01:00
Dhruv Manilawala
b8ecc83a54 Fix clippy errors on main (#21788)
https://github.com/astral-sh/ruff/actions/runs/19922070773/job/57112827024#step:5:62
2025-12-04 16:20:37 +05:30
Aria Desires
6491932757 [ty] Fix crash when hovering an unknown string annotation (#21782)
## Summary

I have no idea what I'm doing with the fix (all the interesting stuff is
in the second commit).

The basic problem is the compiler emits the diagnostic:

```
x: "foobar"
    ^^^^^^
```

The suppression code-action hands the end of this range to
`Tokens::after`, which panics if handed an offset that falls in the
middle of a token.
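
For reference, a minimal reproduction (this is the annotation from the
diagnostic above; hovering it crashed the server):

```py
x: "foobar"  # the diagnostic's end offset lands inside this string token
```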

Fixes https://github.com/astral-sh/ty/issues/1748

## Test Plan

Many tests added (only the e2e test matters).
2025-12-04 09:11:40 +01:00
Micha Reiser
a9f2bb41bd [ty] Don't send publish diagnostics for clients supporting pull diagnostics (#21772) 2025-12-04 08:12:04 +01:00
Aria Desires
e2b72fbf99 [ty] cleanup test path (#21781)
Fixes
https://github.com/astral-sh/ruff/pull/21745#discussion_r2586552295
2025-12-03 21:54:50 +00:00
Alex Waygood
14fce0d440 [ty] Improve the display of various special-form types (#21775) 2025-12-03 21:19:59 +00:00
Alex Waygood
8ebecb2a88 [ty] Add subdiagnostic hint if the user wrote X = Any rather than X: Any (#21777) 2025-12-03 20:42:21 +00:00
Aria Desires
45ac30a4d7 [ty] Teach ty the meaning of desperation (try ancestor pyproject.tomls as search-paths if module resolution fails) (#21745)
## Summary

This makes the importing file a required argument to module resolution.
If the fast-path cached query fails to resolve the module, we take the
slow-path, uncached (could be cached if we want)
`desperately_resolve_module`, which walks up from the importing file
until it finds a `pyproject.toml` (an arbitrary decision; we could try
every ancestor directory), at which point it makes one last desperate
attempt to use that directory as a search path. We do not continue
walking up once we've found a `pyproject.toml` (also arbitrary; we
could keep going up).
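
A sketch of the walk-up in Python (ty's actual implementation is in
Rust; the names here are illustrative):

```py
from pathlib import Path

def desperate_search_path(importing_file: Path) -> Path | None:
    """Walk up from the importing file to the first ancestor that
    contains a pyproject.toml, and offer that directory as a
    last-resort search path."""
    for ancestor in importing_file.parents:
        if (ancestor / "pyproject.toml").is_file():
            return ancestor  # stop at the first hit; don't keep walking up
    return None
```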

Running locally, this fixes every broken-for-workspace-reasons import in
pyx's workspace!

* Fixes https://github.com/astral-sh/ty/issues/1539
* Improves https://github.com/astral-sh/ty/issues/839

## Test Plan

The workspace tests see a huge improvement on most absolute imports.
2025-12-03 15:04:36 -05:00
Alex Waygood
0280949000 [ty] fix panic when attempting to infer the variance of a PEP-695 class that depends on recursive type aliases and also somehow protocols (#21778)
Fixes https://github.com/astral-sh/ty/issues/1716.

## Test plan

I added a corpus snippet that causes us to panic on `main` (I tested by
running `cargo run -p ty_python_semantic --test=corpus` without the fix
applied).
2025-12-03 19:01:42 +00:00
Bhuminjay Soni
c722f498fe [flake8-bugbear] Catch yield expressions within other statements (B901) (#21200)
## Summary

This PR re-implements [return-in-generator
(B901)](https://docs.astral.sh/ruff/rules/return-in-generator/#return-in-generator-b901)
for async generators as a semantic syntax error. This is not a syntax
error for sync generators, so we'll need to preserve both the lint rule
and the syntax error in this case.

It also updates B901 and the new implementation to catch cases where the
generator's `yield` or `yield from` expression is part of another
statement, as in:

```py
def foo():
    return (yield)
```

These were previously not caught because we only looked for
`Stmt::Expr(Expr::Yield)` in `visit_stmt` instead of visiting `yield`
expressions directly. I think this modification is within the spirit of
the rule and safe to try out since the rule is in preview.
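
Another newly caught shape, mirroring the updated fixtures:

```py
def broken():
    x = yield from []
    return x  # B901: the yield is inside an assignment, not a bare expression statement
```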

## Test Plan

I have written tests as directed in #17412

---------

Signed-off-by: 11happy <soni5happy@gmail.com>
Signed-off-by: 11happy <bhuminjaysoni@gmail.com>
Co-authored-by: Brent Westbrook <brentrwestbrook@gmail.com>
Co-authored-by: Brent Westbrook <36778786+ntBre@users.noreply.github.com>
2025-12-03 12:05:15 -05:00
David Peter
1f4f8d9950 [ty] Fix flow of associated member states during star imports (#21776)
## Summary

Star-imports don't just affect the state of the symbols that they pull
in; they can also affect the state of members that are associated with
those symbols. For example, if `obj.attr` was previously narrowed from
`int | None` to `int`, and a star-import now overwrites `obj`, then the
narrowing on `obj.attr` should be "reset".

This PR keeps track of the state of associated members during star
imports and properly models the flow of their corresponding state
through the control flow structure that we artificially create for
star-imports.
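
A sketch of the scenario (with a hypothetical `other_module`):

```py
class C:
    attr: int | None = None

obj = C()
assert obj.attr is not None
reveal_type(obj.attr)  # revealed: int (narrowed)

from other_module import *  # hypothetical module; may rebind `obj`...

reveal_type(obj.attr)  # ...so the narrowing must be reset: int | None
```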

See [this
comment](https://github.com/astral-sh/ty/issues/1355#issuecomment-3607125005)
for an explanation of why this caused ty to see certain `asyncio` symbols
as not being accessible on Python 3.14.

closes https://github.com/astral-sh/ty/issues/1355

## Ecosystem impact

```diff
async-utils (https://github.com/mikeshardmind/async-utils)
- src/async_utils/bg_loop.py:115:31: error[invalid-argument-type] Argument to bound method `set_task_factory` is incorrect: Expected `_TaskFactory | None`, found `def eager_task_factory[_T_co](loop: AbstractEventLoop | None, coro: Coroutine[Any, Any, _T_co@eager_task_factory], *, name: str | None = None, context: Context | None = None) -> Task[_T_co@eager_task_factory]`
- Found 30 diagnostics
+ Found 29 diagnostics

mitmproxy (https://github.com/mitmproxy/mitmproxy)
+ mitmproxy/utils/asyncio_utils.py:96:60: warning[unused-ignore-comment] Unused blanket `type: ignore` directive
- test/conftest.py:37:31: error[invalid-argument-type] Argument to bound method `set_task_factory` is incorrect: Expected `_TaskFactory | None`, found `def eager_task_factory[_T_co](loop: AbstractEventLoop | None, coro: Coroutine[Any, Any, _T_co@eager_task_factory], *, name: str | None = None, context: Context | None = None) -> Task[_T_co@eager_task_factory]`
```

All of these seem to be correct; they give us a different type for
`asyncio` symbols that are now imported from different
`sys.version_info` branches (where we previously failed to recognize
some of these branches as statically true/false).

```diff
dd-trace-py (https://github.com/DataDog/dd-trace-py)
- ddtrace/contrib/internal/asyncio/patch.py:39:12: error[invalid-argument-type] Argument to function `unwrap` is incorrect: Expected `WrappedFunction`, found `def create_task[_T](self, coro: Coroutine[Any, Any, _T@create_task] | Generator[Any, None, _T@create_task], *, name: object = None) -> Task[_T@create_task]`
+ ddtrace/contrib/internal/asyncio/patch.py:39:12: error[invalid-argument-type] Argument to function `unwrap` is incorrect: Expected `WrappedFunction`, found `def create_task[_T](self, coro: Generator[Any, None, _T@create_task] | Coroutine[Any, Any, _T@create_task], *, name: object = None) -> Task[_T@create_task]`
```

Similar, but only results in a diagnostic change.

## Test Plan

Added a regression test
2025-12-03 17:52:31 +01:00
William Woodruff
4488e9d47d Revert "Enable PEP 740 attestations when publishing to PyPI" (#21768) 2025-12-03 11:07:29 -05:00
github-actions[bot]
b08f0b2caa [ty] Sync vendored typeshed stubs (#21715)
Co-authored-by: typeshedbot <>
Co-authored-by: David Peter <mail@david-peter.de>
2025-12-03 15:49:51 +00:00
David Peter
d6e472f297 [ty] Reachability constraints: minor documentation fixes (#21774) 2025-12-03 16:40:11 +01:00
Douglas Creager
45842cc034 [ty] Fix non-determinism in ConstraintSet.specialize_constrained (#21744)
This fixes a non-determinism that we were seeing in the constraint set
tests in https://github.com/astral-sh/ruff/pull/21715.

In this test, we create the following constraint set, and then try to
create a specialization from it:

```
(T@constrained_by_gradual_list = list[Base])
  ∨
(Bottom[list[Any]] ≤ T@constrained_by_gradual_list ≤ Top[list[Any]])
```

That is, `T` is either specifically `list[Base]`, or it's any `list`.
Our current heuristics say that, absent other restrictions, we should
specialize `T` to the more specific type (`list[Base]`).

In the correct test output, we end up creating a BDD that looks like
this:

```
(T@constrained_by_gradual_list = list[Base])
┡━₁ always
└─₀ (Bottom[list[Any]] ≤ T@constrained_by_gradual_list ≤ Top[list[Any]])
    ┡━₁ always
    └─₀ never
```

In the incorrect output, the BDD looks like this:

```
(Bottom[list[Any]] ≤ T@constrained_by_gradual_list ≤ Top[list[Any]])
┡━₁ always
└─₀ never
```

The difference is the ordering of the two individual constraints. Both
constraints appear in the first BDD, but the second BDD only contains `T
is any list`. If we were to force the second BDD to contain both
constraints, it would look like this:

```
(Bottom[list[Any]] ≤ T@constrained_by_gradual_list ≤ Top[list[Any]])
┡━₁ always
└─₀ (T@constrained_by_gradual_list = list[Base])
    ┡━₁ always
    └─₀ never
```

This is the standard shape for an OR of two constraints. However! Those
two constraints are not independent of each other! If `T` is
specifically `list[Base]`, then it's definitely also "any `list`". From
that, we can infer the contrapositive: that if `T` is not any list, then
it cannot be `list[Base]` specifically. When we encounter impossible
situations like that, we prune that path in the BDD, and treat it as
`false`. That rewrites the second BDD to the following:

```
(Bottom[list[Any]] ≤ T@constrained_by_gradual_list ≤ Top[list[Any]])
┡━₁ always
└─₀ (T@constrained_by_gradual_list = list[Base])
    ┡━₁ never   <-- IMPOSSIBLE, rewritten to never
    └─₀ never
```

We would then see that that BDD node is redundant, since both of its
outgoing edges point at the `never` node. Our BDDs are _reduced_, which
means we have to remove that redundant node, resulting in the BDD we saw
above:

```
(Bottom[list[Any]] ≤ T@constrained_by_gradual_list ≤ Top[list[Any]])
┡━₁ always
└─₀ never       <-- redundant node removed
```

The end result is that we were "forgetting" about the `T = list[Base]`
constraint, but only for some BDD variable orderings.
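
For intuition, a toy sketch (in Python, not ty's actual implementation)
of the fully-reduced rule that deletes the node:

```py
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    var: str                 # the constraint this node tests
    if_true: "Node | bool"   # edge taken when the constraint holds
    if_false: "Node | bool"  # edge taken when it doesn't

def reduce(n: "Node | bool") -> "Node | bool":
    if isinstance(n, bool):
        return n
    t, f = reduce(n.if_true), reduce(n.if_false)
    if t == f:
        # Both outgoing edges agree, so a fully-reduced BDD deletes the
        # node -- and with it, any memory of the constraint it tested.
        # A quasi-reduced BDD keeps the node instead.
        return t
    return Node(n.var, t, f)
```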

To fix this, I'm leaning into the fact that our BDDs really do need to
"remember" all of the constraints that they were created with. Some
combinations might not be possible, but we now have the sequent map,
which is quite good at detecting and pruning those.

So now our BDDs are _quasi-reduced_, which just means that redundant
nodes are allowed. (At first I was worried that allowing redundant nodes
would be an unsound "fix the glitch". But it turns out they're real!
[This](https://ieeexplore.ieee.org/abstract/document/130209) is the
paper that introduces them, though it's very difficult to read. Knuth
mentions them in §7.1.4 of
[TAOCP](https://course.khoury.northeastern.edu/csu690/ssl/bdd-knuth.pdf),
and [this paper](https://par.nsf.gov/servlets/purl/10128966) has a nice
short summary of them in §2.)

While we're here, I've added a bunch of `debug` and `trace` level log
messages to the constraint set implementation. I was getting tired of
having to add these by hand over and over. To enable them, just set
`TY_LOG` in your environment, e.g.

```sh
env TY_LOG=ty_python_semantic::types::constraints::SequentMap=trace ty check ...
```

[Note, this has an `internal` label because we are still not using
`specialize_constrained` in anything user-facing yet.]
2025-12-03 10:19:39 -05:00
Alex Waygood
cd079bd92e [ty] Improve @override, @final and Liskov checks in cases where there are multiple reachable definitions (#21767) 2025-12-03 12:51:36 +00:00
Alex Waygood
5756b3809c [ty] Extend invalid-explicit-override to also cover properties decorated with @override that do not override anything (#21756) 2025-12-03 11:27:47 +00:00
Micha Reiser
92c5f62ec0 [ty] Enable LRU collection for parsed module (#21749) 2025-12-03 12:16:18 +01:00
David Peter
21e5a57296 [ty] Support typevar-specialized dynamic types in generic type aliases (#21730)
## Summary

For a type alias like the one below, where `UnknownClass` is something
with a dynamic type, we previously lost track of the fact that this
dynamic type was explicitly specialized *with a type variable*. If that
alias is then later explicitly specialized itself (`MyAlias[int]`), we
would miscount the number of legacy type variables and emit an
`invalid-type-arguments` diagnostic
([playground](https://play.ty.dev/886ae6cc-86c3-4304-a365-510d29211f85)).
```py
T = TypeVar("T")

MyAlias: TypeAlias = UnknownClass[T] | None
```
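Explicitly specializing the alias afterwards is what triggered the
spurious diagnostic:
```py
x: MyAlias[int]  # previously: false-positive `invalid-type-arguments`
```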
The solution implemented here is not pretty, but we can hopefully get
rid of it via https://github.com/astral-sh/ty/issues/1711. Also, once we
properly support `ParamSpec` and `Concatenate`, we should be able to
remove some of this code.

This addresses many of the `invalid-type-arguments` false-positives in
https://github.com/astral-sh/ty/issues/1685. With this change, there are
still some diagnostics of this type left. Instead of implementing even
more (rather sophisticated) workarounds for these cases as well, it
might be much easier to wait for full `ParamSpec`/`Concatenate` support
and then try again.

A disadvantage of this implementation is that we lose track of some
`@Todo` types and replace them with `Unknown`. We could spend more
effort and try to preserve them, but I'm unsure if this is the best use
of our time right now.

## Test Plan

New Markdown tests.
2025-12-03 10:00:02 +01:00
Denys Zhak
f4e4229683 Add token based parenthesized_ranges implementation (#21738)
Co-authored-by: Micha Reiser <micha@reiser.io>
2025-12-03 08:15:17 +00:00
David Peter
e6ddeed386 [ty] Default-specialization of generic type aliases (#21765)
## Summary

Implement default-specialization of generic type aliases (implicit or
PEP-613) if they are used in a type expression without an explicit
specialization.

closes https://github.com/astral-sh/ty/issues/1690
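
For example (a sketch using a PEP 696 default via `typing_extensions`;
the names are illustrative):

```py
from typing_extensions import TypeAlias, TypeVar

T = TypeVar("T", default=int)
Row: TypeAlias = list[T]

x: Row       # no explicit specialization: default-specialized to `list[int]`
y: Row[str]  # explicit specialization: `list[str]`
```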

## Typing conformance

```diff
-generics_defaults_specialization.py:26:5: error[type-assertion-failure] Type `SomethingWithNoDefaults[int, str]` does not match asserted type `SomethingWithNoDefaults[int, DefaultStrT]`
```

That's exactly what we want ✔️ 

All other tests in this file pass as well, with the exception of this
assertion, which is just wrong (at least according to our
interpretation, `type[Bar] != <class 'Bar'>`). I checked that we do
correctly default-specialize the type parameter which is not displayed
in the diagnostic that we raise.
```py
class Bar(SubclassMe[int, DefaultStrT]): ...

assert_type(Bar, type[Bar[str]])  # ty: Type `type[Bar[str]]` does not match asserted type `<class 'Bar'>`
```

## Ecosystem impact

Looks like I should have included this last week 😎 

## Test Plan

Updated pre-existing tests and added a few new ones.
2025-12-03 09:10:45 +01:00
Alex Waygood
c5b8d551df [ty] Suppress false positives when dataclasses.dataclass(...)(cls) is called imperatively (#21729)
Fixes https://github.com/astral-sh/ty/issues/1705
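A sketch of the pattern in question:
```py
import dataclasses

class Point:
    x: int
    y: int

# The imperative equivalent of decorating with `@dataclasses.dataclass(frozen=True)`;
# ty previously emitted false positives for this pattern.
Point = dataclasses.dataclass(frozen=True)(Point)
```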
2025-12-03 08:05:25 +00:00
Bhuminjay Soni
f68080b55e [syntax-error] Default type parameter followed by non-default type parameter (#21657)
## Summary

This PR implements the syntax error reported when a default type
parameter is followed by a non-default type parameter.
https://github.com/astral-sh/ruff/issues/17412#issuecomment-3584088217
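
For example:

```py
# error: a type parameter without a default cannot follow one with a default
class Foo[T = int, U]: ...
```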


## Test Plan

I have written inline tests as directed in #17412

---------

Signed-off-by: 11happy <bhuminjaysoni@gmail.com>
Signed-off-by: 11happy <soni5happy@gmail.com>
2025-12-03 12:01:31 +05:30
Amethyst Reese
abaa49f552 new module for parsing ranged suppressions (#21441)
This adds a new `suppression` module to the `ruff_linter` crate,
similar to the suppression module for ty, to parse comments for ruff
suppression directives, such as `# ruff: disable[CODE]`.
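A sketch of what such a directive might look like in source (the
`enable` counterpart is my assumption about how a range would be
closed):
```py
# ruff: disable[E741]
l = 1  # E741 (ambiguous variable name) suppressed within the range
# ruff: enable[E741]  (assumed closing directive)
```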
2025-12-02 15:39:59 -08:00
Ibraheem Ahmed
7b0aab1696 [ty] type[T] is assignable to an inferable typevar (#21766)
## Summary

Resolves https://github.com/astral-sh/ty/issues/1712.
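
A sketch of one shape this enables (my reading of the issue title; the
names are illustrative):

```py
def identity[U](x: U) -> U:
    return x

def f[T](cls: type[T]) -> type[T]:
    # `type[T]` must be assignable to the inferable typevar `U` of `identity`
    return identity(cls)
```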
2025-12-02 18:25:09 -05:00
Brent Westbrook
2250fa6f98 Fix syntax error false positives for await outside functions (#21763)
## Summary

Fixes #21750 and a related bug in `PLE1142`. We were not properly
considering generators to be valid `await` contexts, which caused the
`F704` issue. One of the tests I added for this also uncovered an issue
in `PLE1142` for comprehensions nested within async generators because
we were only checking the current scope rather than traversing the
enclosing scopes.
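
The distinction, mirrored from the new fixtures (all at module top
level):

```py
async def f(): ...

(await c async for c in f())  # ok: generators are valid `await` contexts
[await c async for c in f()]  # F704: comprehensions are not
```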

## Test Plan

Both of these rules are implemented as semantic syntax errors, so I
added tests (and fixes) in both Ruff and ty.
2025-12-02 21:02:02 +00:00
Alex Waygood
392a8e4e50 [ty] Improve diagnostics for unsupported comparison operations (#21737) 2025-12-02 19:58:45 +00:00
Micha Reiser
515de2d062 Move Token, TokenKind and Tokens to ruff-python-ast (#21760) 2025-12-02 20:10:46 +01:00
Douglas Creager
508c0a0861 [ty] Don't confuse multiple occurrences of typing.Self when binding bound methods (#21754)
In the following example, there are two occurrences of `typing.Self`,
one for `Foo.foo` and one for `Bar.bar`:

```py
from typing import Self, reveal_type

class Foo[T]:
    def foo(self: Self) -> T:
        raise NotImplementedError

class Bar:
    def bar(self: Self, x: Foo[Self]):
        # SHOULD BE: bound method Foo[Self@bar].foo() -> Self@bar
        # revealed: bound method Foo[Self@bar].foo() -> Foo[Self@bar]
        reveal_type(x.foo)

def f[U: Bar](x: Foo[U]):
    # revealed: bound method Foo[U@f].foo() -> U@f
    reveal_type(x.foo)
```

When accessing a bound method, we replace any occurrences of `Self` with
the bound `self` type.

We were doing this correctly for the second reveal. We would first apply
the specialization, getting `(self: Self@foo) -> U@f` as the signature
of `x.foo`. We would then bind the `self` parameter, substituting
`Self@foo` with `Foo[U@f]` as part of that. The return type was already
specialized to `U@f`, so that substitution had no further effect on the
type that we revealed.

In the first reveal, we would follow the same process, but we confused
the two occurrences of `Self`. We would first apply the specialization,
getting `(self: Self@foo) -> Self@bar` as the method signature. We would
then try to bind the `self` parameter, substituting `Self@foo` with
`Foo[Self@bar]`. However, because we didn't distinguish the two separate
`Self`s, we applied the substitution to the return type as well as to
the `self` parameter.

The fix is to track which particular `Self` we're trying to substitute
when applying the type mapping.

Fixes https://github.com/astral-sh/ty/issues/1713
2025-12-02 13:15:09 -05:00
William Woodruff
0d2792517d Use our org-wide Renovate preset (#21759) 2025-12-02 13:05:26 -05:00
254 changed files with 8630 additions and 3180 deletions

View File

@@ -2,12 +2,11 @@
$schema: "https://docs.renovatebot.com/renovate-schema.json",
dependencyDashboard: true,
suppressNotifications: ["prEditedNotification"],
extends: ["config:recommended"],
extends: ["github>astral-sh/renovate-config"],
labels: ["internal"],
schedule: ["before 4am on Monday"],
semanticCommits: "disabled",
separateMajorMinor: false,
prHourlyLimit: 10,
enabledManagers: ["github-actions", "pre-commit", "cargo", "pep621", "pip_requirements", "npm"],
cargo: {
// See https://docs.renovatebot.com/configuration-options/#rangestrategy
@@ -16,7 +15,7 @@
pep621: {
// The default for this package manager is to only search for `pyproject.toml` files
// found at the repository root: https://docs.renovatebot.com/modules/manager/pep621/#file-matching
fileMatch: ["^(python|scripts)/.*pyproject\\.toml$"],
managerFilePatterns: ["^(python|scripts)/.*pyproject\\.toml$"],
},
pip_requirements: {
// The default for this package manager is to run on all requirements.txt files:
@@ -34,7 +33,7 @@
npm: {
// The default for this package manager is to only search for `package.json` files
// found at the repository root: https://docs.renovatebot.com/modules/manager/npm/#file-matching
fileMatch: ["^playground/.*package\\.json$"],
managerFilePatterns: ["^playground/.*package\\.json$"],
},
"pre-commit": {
enabled: true,

View File

@@ -18,7 +18,8 @@ jobs:
environment:
name: release
permissions:
id-token: write # For PyPI's trusted publishing + PEP 740 attestations
# For PyPI's trusted publishing.
id-token: write
steps:
- name: "Install uv"
uses: astral-sh/setup-uv@1e862dfacbd1d6d858c55d9b792c756523627244 # v7.1.4
@@ -27,8 +28,5 @@ jobs:
pattern: wheels-*
path: wheels
merge-multiple: true
- uses: astral-sh/attest-action@2c727738cea36d6c97dd85eb133ea0e0e8fe754b # v0.0.4
with:
paths: wheels/*
- name: Publish to PyPi
run: uv publish -v wheels/*

View File

@@ -1,5 +1,34 @@
# Changelog
## 0.14.8
Released on 2025-12-04.
### Preview features
- \[`flake8-bugbear`\] Catch `yield` expressions within other statements (`B901`) ([#21200](https://github.com/astral-sh/ruff/pull/21200))
- \[`flake8-use-pathlib`\] Mark fixes unsafe for return type changes (`PTH104`, `PTH105`, `PTH109`, `PTH115`) ([#21440](https://github.com/astral-sh/ruff/pull/21440))
### Bug fixes
- Fix syntax error false positives for `await` outside functions ([#21763](https://github.com/astral-sh/ruff/pull/21763))
- \[`flake8-simplify`\] Fix truthiness assumption for non-iterable arguments in tuple/list/set calls (`SIM222`, `SIM223`) ([#21479](https://github.com/astral-sh/ruff/pull/21479))
### Documentation
- Suggest using `--output-file` option in GitLab integration ([#21706](https://github.com/astral-sh/ruff/pull/21706))
### Other changes
- [syntax-error] Default type parameter followed by non-default type parameter ([#21657](https://github.com/astral-sh/ruff/pull/21657))
### Contributors
- [@kieran-ryan](https://github.com/kieran-ryan)
- [@11happy](https://github.com/11happy)
- [@danparizher](https://github.com/danparizher)
- [@ntBre](https://github.com/ntBre)
## 0.14.7
Released on 2025-11-28.

Cargo.lock generated
View File

@@ -2859,7 +2859,7 @@ dependencies = [
[[package]]
name = "ruff"
version = "0.14.7"
version = "0.14.8"
dependencies = [
"anyhow",
"argfile",
@@ -3117,13 +3117,14 @@ dependencies = [
[[package]]
name = "ruff_linter"
version = "0.14.7"
version = "0.14.8"
dependencies = [
"aho-corasick",
"anyhow",
"bitflags 2.10.0",
"clap",
"colored 3.0.0",
"compact_str",
"fern",
"glob",
"globset",
@@ -3472,7 +3473,7 @@ dependencies = [
[[package]]
name = "ruff_wasm"
version = "0.14.7"
version = "0.14.8"
dependencies = [
"console_error_panic_hook",
"console_log",

View File

@@ -147,8 +147,8 @@ curl -LsSf https://astral.sh/ruff/install.sh | sh
powershell -c "irm https://astral.sh/ruff/install.ps1 | iex"
# For a specific version.
curl -LsSf https://astral.sh/ruff/0.14.7/install.sh | sh
powershell -c "irm https://astral.sh/ruff/0.14.7/install.ps1 | iex"
curl -LsSf https://astral.sh/ruff/0.14.8/install.sh | sh
powershell -c "irm https://astral.sh/ruff/0.14.8/install.ps1 | iex"
```
You can also install Ruff via [Homebrew](https://formulae.brew.sh/formula/ruff), [Conda](https://anaconda.org/conda-forge/ruff),
@@ -181,7 +181,7 @@ Ruff can also be used as a [pre-commit](https://pre-commit.com/) hook via [`ruff
```yaml
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.14.7
rev: v0.14.8
hooks:
# Run the linter.
- id: ruff-check

View File

@@ -1,6 +1,6 @@
[package]
name = "ruff"
version = "0.14.7"
version = "0.14.8"
publish = true
authors = { workspace = true }
edition = { workspace = true }

View File

@@ -6,7 +6,8 @@ use criterion::{
use ruff_benchmark::{
LARGE_DATASET, NUMPY_CTYPESLIB, NUMPY_GLOBALS, PYDANTIC_TYPES, TestCase, UNICODE_PYPINYIN,
};
use ruff_python_parser::{Mode, TokenKind, lexer};
use ruff_python_ast::token::TokenKind;
use ruff_python_parser::{Mode, lexer};
#[cfg(target_os = "windows")]
#[global_allocator]

View File

@@ -3,10 +3,8 @@ use std::sync::Arc;
use arc_swap::ArcSwapOption;
use get_size2::GetSize;
use ruff_python_ast::{AnyRootNodeRef, ModExpression, ModModule, NodeIndex, StringLiteral};
use ruff_python_parser::{
ParseError, ParseOptions, Parsed, parse_string_annotation, parse_unchecked,
};
use ruff_python_ast::{AnyRootNodeRef, ModModule, NodeIndex};
use ruff_python_parser::{ParseOptions, Parsed, parse_unchecked};
use crate::Db;
use crate::files::File;
@@ -23,7 +21,11 @@ use crate::source::source_text;
/// reflected in the changed AST offsets.
/// The other reason is that Ruff's AST doesn't implement `Eq` which Salsa requires
/// for determining if a query result is unchanged.
#[salsa::tracked(returns(ref), no_eq, heap_size=ruff_memory_usage::heap_size)]
///
/// The LRU capacity of 200 was picked without any empirical evidence that it's optimal,
/// instead it's a wild guess that it should be unlikely that incremental changes involve
/// more than 200 modules. Parsed ASTs within the same revision are never evicted by Salsa.
#[salsa::tracked(returns(ref), no_eq, heap_size=ruff_memory_usage::heap_size, lru=200)]
pub fn parsed_module(db: &dyn Db, file: File) -> ParsedModule {
let _span = tracing::trace_span!("parsed_module", ?file).entered();
@@ -43,18 +45,6 @@ pub fn parsed_module_impl(db: &dyn Db, file: File) -> Parsed<ModModule> {
.expect("PySourceType always parses into a module")
}
pub fn parsed_string_annotation(
source: &str,
string: &StringLiteral,
) -> Result<Parsed<ModExpression>, ParseError> {
let expr = parse_string_annotation(source, string)?;
// We need the sub-ast of the string annotation to be indexed
indexed::ensure_indexed(&expr);
Ok(expr)
}
/// A wrapper around a parsed module.
///
/// This type manages instances of the module AST. A particular instance of the AST
@@ -106,14 +96,9 @@ impl ParsedModule {
self.inner.store(None);
}
/// Returns the pointer address of this [`ParsedModule`].
///
/// The pointer uniquely identifies the module within the current Salsa revision,
/// regardless of whether particular [`ParsedModuleRef`] instances are garbage collected.
pub fn addr(&self) -> usize {
// Note that the outer `Arc` in `inner` is stable across garbage collection, while the inner
// `Arc` within the `ArcSwap` may change.
Arc::as_ptr(&self.inner).addr()
/// Returns the file to which this module belongs.
pub fn file(&self) -> File {
self.file
}
}
@@ -184,21 +169,12 @@ mod indexed {
pub parsed: Parsed<ModModule>,
}
pub fn ensure_indexed(parsed: &Parsed<ModExpression>) {
let mut visitor = Visitor {
nodes: Some(Vec::new()),
index: 0,
};
AnyNodeRef::from(parsed.syntax()).visit_source_order(&mut visitor);
}
impl IndexedModule {
/// Create a new [`IndexedModule`] from the given AST.
#[allow(clippy::unnecessary_cast)]
pub fn new(parsed: Parsed<ModModule>) -> Arc<Self> {
let mut visitor = Visitor {
nodes: Some(Vec::new()),
nodes: Vec::new(),
index: 0,
};
@@ -209,7 +185,7 @@ mod indexed {
AnyNodeRef::from(inner.parsed.syntax()).visit_source_order(&mut visitor);
let index: Box<[AnyRootNodeRef<'_>]> = visitor.nodes.unwrap().into_boxed_slice();
let index: Box<[AnyRootNodeRef<'_>]> = visitor.nodes.into_boxed_slice();
// SAFETY: We cast from `Box<[AnyRootNodeRef<'_>]>` to `Box<[AnyRootNodeRef<'static>]>`,
// faking the 'static lifetime to create the self-referential struct. The node references
@@ -238,7 +214,7 @@ mod indexed {
/// A visitor that collects nodes in source order.
pub struct Visitor<'a> {
pub index: u32,
pub nodes: Option<Vec<AnyRootNodeRef<'a>>>,
pub nodes: Vec<AnyRootNodeRef<'a>>,
}
impl<'a> Visitor<'a> {
@@ -248,9 +224,7 @@ mod indexed {
AnyRootNodeRef<'a>: From<&'a T>,
{
node.node_index().set(NodeIndex::from(self.index));
if let Some(nodes) = &mut self.nodes {
nodes.push(AnyRootNodeRef::from(node));
}
self.nodes.push(AnyRootNodeRef::from(node));
self.index += 1;
}
}

View File

@@ -667,6 +667,13 @@ impl Deref for SystemPathBuf {
}
}
impl AsRef<Path> for SystemPathBuf {
#[inline]
fn as_ref(&self) -> &Path {
self.0.as_std_path()
}
}
impl<P: AsRef<SystemPath>> FromIterator<P> for SystemPathBuf {
fn from_iter<I: IntoIterator<Item = P>>(iter: I) -> Self {
let mut buf = SystemPathBuf::new();

View File

@@ -49,7 +49,7 @@ impl ModuleImports {
// Resolve the imports.
let mut resolved_imports = ModuleImports::default();
for import in imports {
for resolved in Resolver::new(db).resolve(import) {
for resolved in Resolver::new(db, path).resolve(import) {
if let Some(path) = resolved.as_system_path() {
resolved_imports.insert(path.to_path_buf());
}

View File

@@ -1,5 +1,9 @@
use ruff_db::files::FilePath;
use ty_python_semantic::{ModuleName, resolve_module, resolve_real_module};
use ruff_db::files::{File, FilePath, system_path_to_file};
use ruff_db::system::SystemPath;
use ty_python_semantic::{
ModuleName, resolve_module, resolve_module_confident, resolve_real_module,
resolve_real_module_confident,
};
use crate::ModuleDb;
use crate::collector::CollectedImport;
@@ -7,12 +11,15 @@ use crate::collector::CollectedImport;
/// Collect all imports for a given Python file.
pub(crate) struct Resolver<'a> {
db: &'a ModuleDb,
file: Option<File>,
}
impl<'a> Resolver<'a> {
/// Initialize a [`Resolver`] with a given [`ModuleDb`].
pub(crate) fn new(db: &'a ModuleDb) -> Self {
Self { db }
pub(crate) fn new(db: &'a ModuleDb, path: &SystemPath) -> Self {
// If we know the importing file we can potentially resolve more imports
let file = system_path_to_file(db, path).ok();
Self { db, file }
}
/// Resolve the [`CollectedImport`] into a [`FilePath`].
@@ -70,13 +77,21 @@ impl<'a> Resolver<'a> {
/// Resolves a module name to a module.
pub(crate) fn resolve_module(&self, module_name: &ModuleName) -> Option<&'a FilePath> {
let module = resolve_module(self.db, module_name)?;
let module = if let Some(file) = self.file {
resolve_module(self.db, file, module_name)?
} else {
resolve_module_confident(self.db, module_name)?
};
Some(module.file(self.db)?.path(self.db))
}
/// Resolves a module name to a module (stubs not allowed).
fn resolve_real_module(&self, module_name: &ModuleName) -> Option<&'a FilePath> {
let module = resolve_real_module(self.db, module_name)?;
let module = if let Some(file) = self.file {
resolve_real_module(self.db, file, module_name)?
} else {
resolve_real_module_confident(self.db, module_name)?
};
Some(module.file(self.db)?.path(self.db))
}
}

View File

@@ -1,6 +1,6 @@
[package]
name = "ruff_linter"
version = "0.14.7"
version = "0.14.8"
publish = false
authors = { workspace = true }
edition = { workspace = true }
@@ -35,6 +35,7 @@ anyhow = { workspace = true }
bitflags = { workspace = true }
clap = { workspace = true, features = ["derive", "string"], optional = true }
colored = { workspace = true }
compact_str = { workspace = true }
fern = { workspace = true }
glob = { workspace = true }
globset = { workspace = true }

View File

@@ -52,16 +52,16 @@ def not_broken5():
yield inner()
def not_broken6():
def broken3():
return (yield from [])
def not_broken7():
def broken4():
x = yield from []
return x
def not_broken8():
def broken5():
x = None
def inner(ex):
@@ -76,3 +76,13 @@ class NotBroken9(object):
def __await__(self):
yield from function()
return 42
async def broken6():
yield 1
return foo()
async def broken7():
yield 1
return [1, 2, 3]

View File

@@ -17,3 +17,24 @@ def _():
# Valid yield scope
yield 3
# await is valid in any generator, sync or async
(await cor async for cor in f()) # ok
(await cor for cor in f()) # ok
# but not in comprehensions
[await cor async for cor in f()] # F704
{await cor async for cor in f()} # F704
{await cor: 1 async for cor in f()} # F704
[await cor for cor in f()] # F704
{await cor for cor in f()} # F704
{await cor: 1 for cor in f()} # F704
# or in the iterator of an async generator, which is evaluated in the parent
# scope
(cor async for cor in await f()) # F704
(await cor async for cor in [await c for c in f()]) # F704
# this is also okay because the comprehension is within the generator scope
([await c for c in cor] async for cor in f()) # ok

View File

@@ -3,3 +3,5 @@ def func():
# Top-level await
await 1
([await c for c in cor] async for cor in func()) # ok

View File

@@ -0,0 +1,24 @@
async def gen():
yield 1
return 42
def gen(): # B901 but not a syntax error - not an async generator
yield 1
return 42
async def gen(): # ok - no value in return
yield 1
return
async def gen():
yield 1
return foo()
async def gen():
yield 1
return [1, 2, 3]
async def gen():
if True:
yield 1
return 10

View File

@@ -35,6 +35,7 @@ use ruff_python_ast::helpers::{collect_import_from_member, is_docstring_stmt, to
use ruff_python_ast::identifier::Identifier;
use ruff_python_ast::name::QualifiedName;
use ruff_python_ast::str::Quote;
use ruff_python_ast::token::Tokens;
use ruff_python_ast::visitor::{Visitor, walk_except_handler, walk_pattern};
use ruff_python_ast::{
self as ast, AnyParameterRef, ArgOrKeyword, Comprehension, ElifElseClause, ExceptHandler, Expr,
@@ -48,7 +49,7 @@ use ruff_python_parser::semantic_errors::{
SemanticSyntaxChecker, SemanticSyntaxContext, SemanticSyntaxError, SemanticSyntaxErrorKind,
};
use ruff_python_parser::typing::{AnnotationKind, ParsedAnnotation, parse_type_annotation};
use ruff_python_parser::{ParseError, Parsed, Tokens};
use ruff_python_parser::{ParseError, Parsed};
use ruff_python_semantic::all::{DunderAllDefinition, DunderAllFlags};
use ruff_python_semantic::analyze::{imports, typing};
use ruff_python_semantic::{
@@ -68,6 +69,7 @@ use crate::noqa::NoqaMapping;
use crate::package::PackageRoot;
use crate::preview::is_undefined_export_in_dunder_init_enabled;
use crate::registry::Rule;
use crate::rules::flake8_bugbear::rules::ReturnInGenerator;
use crate::rules::pyflakes::rules::{
LateFutureImport, MultipleStarredExpressions, ReturnOutsideFunction,
UndefinedLocalWithNestedImportStarUsage, YieldOutsideFunction,
@@ -728,6 +730,12 @@ impl SemanticSyntaxContext for Checker<'_> {
self.report_diagnostic(NonlocalWithoutBinding { name }, error.range);
}
}
SemanticSyntaxErrorKind::ReturnInGenerator => {
// B901
if self.is_rule_enabled(Rule::ReturnInGenerator) {
self.report_diagnostic(ReturnInGenerator, error.range);
}
}
SemanticSyntaxErrorKind::ReboundComprehensionVariable
| SemanticSyntaxErrorKind::DuplicateTypeParameter
| SemanticSyntaxErrorKind::MultipleCaseAssignment(_)
@@ -746,6 +754,7 @@ impl SemanticSyntaxContext for Checker<'_> {
| SemanticSyntaxErrorKind::LoadBeforeNonlocalDeclaration { .. }
| SemanticSyntaxErrorKind::NonlocalAndGlobal(_)
| SemanticSyntaxErrorKind::AnnotatedGlobal(_)
| SemanticSyntaxErrorKind::TypeParameterDefaultOrder(_)
| SemanticSyntaxErrorKind::AnnotatedNonlocal(_) => {
self.semantic_errors.borrow_mut().push(error);
}
@@ -779,6 +788,10 @@ impl SemanticSyntaxContext for Checker<'_> {
match scope.kind {
ScopeKind::Class(_) => return false,
ScopeKind::Function(_) | ScopeKind::Lambda(_) => return true,
ScopeKind::Generator {
kind: GeneratorKind::Generator,
..
} => return true,
ScopeKind::Generator { .. }
| ScopeKind::Module
| ScopeKind::Type
@@ -828,14 +841,19 @@ impl SemanticSyntaxContext for Checker<'_> {
self.source_type.is_ipynb()
}
fn in_generator_scope(&self) -> bool {
matches!(
&self.semantic.current_scope().kind,
ScopeKind::Generator {
kind: GeneratorKind::Generator,
..
fn in_generator_context(&self) -> bool {
for scope in self.semantic.current_scopes() {
if matches!(
scope.kind,
ScopeKind::Generator {
kind: GeneratorKind::Generator,
..
}
) {
return true;
}
)
}
false
}
fn in_loop_context(&self) -> bool {

View File

@@ -1,6 +1,6 @@
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_codegen::Stylist;
use ruff_python_index::Indexer;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_source_file::LineRanges;
use ruff_text_size::{Ranged, TextRange};

View File

@@ -4,9 +4,9 @@ use std::path::Path;
use ruff_notebook::CellOffsets;
use ruff_python_ast::PySourceType;
use ruff_python_ast::token::Tokens;
use ruff_python_codegen::Stylist;
use ruff_python_index::Indexer;
use ruff_python_parser::Tokens;
use crate::Locator;
use crate::directives::TodoComment;

View File

@@ -5,8 +5,8 @@ use std::str::FromStr;
use bitflags::bitflags;
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_index::Indexer;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_trivia::CommentRanges;
use ruff_source_file::LineRanges;
use ruff_text_size::{Ranged, TextLen, TextRange, TextSize};

View File

@@ -5,8 +5,8 @@ use std::iter::FusedIterator;
use std::slice::Iter;
use ruff_python_ast::statement_visitor::{StatementVisitor, walk_stmt};
use ruff_python_ast::token::{Token, TokenKind, Tokens};
use ruff_python_ast::{self as ast, Stmt, Suite};
use ruff_python_parser::{Token, TokenKind, Tokens};
use ruff_source_file::UniversalNewlineIterator;
use ruff_text_size::{Ranged, TextSize};

View File

@@ -9,10 +9,11 @@ use anyhow::Result;
use libcst_native as cst;
use ruff_diagnostics::Edit;
use ruff_python_ast::token::Tokens;
use ruff_python_ast::{self as ast, Expr, ModModule, Stmt};
use ruff_python_codegen::Stylist;
use ruff_python_importer::Insertion;
use ruff_python_parser::{Parsed, Tokens};
use ruff_python_parser::Parsed;
use ruff_python_semantic::{
ImportedName, MemberNameImport, ModuleNameImport, NameImport, SemanticModel,
};

View File

@@ -46,6 +46,7 @@ pub mod rule_selector;
pub mod rules;
pub mod settings;
pub mod source_kind;
pub mod suppression;
mod text_helpers;
pub mod upstream_categories;
mod violation;

View File

@@ -1043,6 +1043,7 @@ mod tests {
Rule::YieldFromInAsyncFunction,
Path::new("yield_from_in_async_function.py")
)]
#[test_case(Rule::ReturnInGenerator, Path::new("return_in_generator.py"))]
fn test_syntax_errors(rule: Rule, path: &Path) -> Result<()> {
let snapshot = path.to_string_lossy().to_string();
let path = Path::new("resources/test/fixtures/syntax_errors").join(path);

View File

@@ -1,6 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::statement_visitor;
use ruff_python_ast::statement_visitor::StatementVisitor;
use ruff_python_ast::visitor::{Visitor, walk_expr, walk_stmt};
use ruff_python_ast::{self as ast, Expr, Stmt, StmtFunctionDef};
use ruff_text_size::TextRange;
@@ -96,6 +95,11 @@ pub(crate) fn return_in_generator(checker: &Checker, function_def: &StmtFunction
return;
}
// Async functions are flagged by the `ReturnInGenerator` semantic syntax error.
if function_def.is_async {
return;
}
let mut visitor = ReturnInGeneratorVisitor::default();
visitor.visit_body(&function_def.body);
@@ -112,15 +116,9 @@ struct ReturnInGeneratorVisitor {
has_yield: bool,
}
impl StatementVisitor<'_> for ReturnInGeneratorVisitor {
impl Visitor<'_> for ReturnInGeneratorVisitor {
fn visit_stmt(&mut self, stmt: &Stmt) {
match stmt {
Stmt::Expr(ast::StmtExpr { value, .. }) => match **value {
Expr::Yield(_) | Expr::YieldFrom(_) => {
self.has_yield = true;
}
_ => {}
},
Stmt::FunctionDef(_) => {
// Do not recurse into nested functions; they're evaluated separately.
}
@@ -130,8 +128,19 @@ impl StatementVisitor<'_> for ReturnInGeneratorVisitor {
node_index: _,
}) => {
self.return_ = Some(*range);
walk_stmt(self, stmt);
}
_ => statement_visitor::walk_stmt(self, stmt),
_ => walk_stmt(self, stmt),
}
}
fn visit_expr(&mut self, expr: &Expr) {
match expr {
Expr::Lambda(_) => {}
Expr::Yield(_) | Expr::YieldFrom(_) => {
self.has_yield = true;
}
_ => walk_expr(self, expr),
}
}
}

View File

@@ -21,3 +21,46 @@ B901 Using `yield` and `return {value}` in a generator function can lead to conf
37 |
38 | yield from not_broken()
|
B901 Using `yield` and `return {value}` in a generator function can lead to confusing behavior
--> B901.py:56:5
|
55 | def broken3():
56 | return (yield from [])
| ^^^^^^^^^^^^^^^^^^^^^^
|
B901 Using `yield` and `return {value}` in a generator function can lead to confusing behavior
--> B901.py:61:5
|
59 | def broken4():
60 | x = yield from []
61 | return x
| ^^^^^^^^
|
B901 Using `yield` and `return {value}` in a generator function can lead to confusing behavior
--> B901.py:72:5
|
71 | inner((yield from []))
72 | return x
| ^^^^^^^^
|
B901 Using `yield` and `return {value}` in a generator function can lead to confusing behavior
--> B901.py:83:5
|
81 | async def broken6():
82 | yield 1
83 | return foo()
| ^^^^^^^^^^^^
|
B901 Using `yield` and `return {value}` in a generator function can lead to confusing behavior
--> B901.py:88:5
|
86 | async def broken7():
87 | yield 1
88 | return [1, 2, 3]
| ^^^^^^^^^^^^^^^^
|

View File

@@ -1,6 +1,6 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_index::Indexer;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_text_size::{Ranged, TextRange};
use crate::Locator;

View File

@@ -3,7 +3,7 @@ use ruff_python_ast as ast;
use ruff_python_ast::ExprGenerator;
use ruff_python_ast::comparable::ComparableExpr;
use ruff_python_ast::parenthesize::parenthesized_range;
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::{Ranged, TextRange, TextSize};
use crate::checkers::ast::Checker;

View File

@@ -3,7 +3,7 @@ use ruff_python_ast as ast;
use ruff_python_ast::ExprGenerator;
use ruff_python_ast::comparable::ComparableExpr;
use ruff_python_ast::parenthesize::parenthesized_range;
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::{Ranged, TextRange, TextSize};
use crate::checkers::ast::Checker;

View File

@@ -1,7 +1,7 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast as ast;
use ruff_python_ast::parenthesize::parenthesized_range;
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::{Ranged, TextRange, TextSize};
use crate::checkers::ast::Checker;

View File

@@ -3,8 +3,8 @@ use std::borrow::Cow;
use itertools::Itertools;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::StringFlags;
use ruff_python_ast::token::{Token, TokenKind, Tokens};
use ruff_python_index::Indexer;
use ruff_python_parser::{Token, TokenKind, Tokens};
use ruff_source_file::LineRanges;
use ruff_text_size::{Ranged, TextLen, TextRange};

View File

@@ -1,6 +1,6 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_ast::{self as ast, Expr};
use ruff_python_parser::{TokenKind, Tokens};
use ruff_text_size::{Ranged, TextLen, TextSize};
use crate::checkers::ast::Checker;

View File

@@ -4,10 +4,10 @@ use ruff_diagnostics::Applicability;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::helpers::{is_const_false, is_const_true};
use ruff_python_ast::stmt_if::elif_else_range;
use ruff_python_ast::token::TokenKind;
use ruff_python_ast::visitor::Visitor;
use ruff_python_ast::whitespace::indentation;
use ruff_python_ast::{self as ast, Decorator, ElifElseClause, Expr, Stmt};
use ruff_python_parser::TokenKind;
use ruff_python_semantic::SemanticModel;
use ruff_python_semantic::analyze::visibility::is_property;
use ruff_python_trivia::{SimpleTokenKind, SimpleTokenizer, is_python_whitespace};

View File

@@ -1,5 +1,5 @@
use ruff_python_ast::token::Tokens;
use ruff_python_ast::{self as ast, Stmt};
use ruff_python_parser::Tokens;
use ruff_source_file::LineRanges;
use ruff_text_size::{Ranged, TextRange};

View File

@@ -1,5 +1,5 @@
use ruff_python_ast::Stmt;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_trivia::PythonWhitespace;
use ruff_source_file::UniversalNewlines;
use ruff_text_size::Ranged;

View File

@@ -11,8 +11,8 @@ use comments::Comment;
use normalize::normalize_imports;
use order::order_imports;
use ruff_python_ast::PySourceType;
use ruff_python_ast::token::Tokens;
use ruff_python_codegen::Stylist;
use ruff_python_parser::Tokens;
use settings::Settings;
use types::EitherImport::{Import, ImportFrom};
use types::{AliasData, ImportBlock, TrailingComma};

View File

@@ -1,11 +1,11 @@
use itertools::{EitherOrBoth, Itertools};
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::token::Tokens;
use ruff_python_ast::whitespace::trailing_lines_end;
use ruff_python_ast::{PySourceType, PythonVersion, Stmt};
use ruff_python_codegen::Stylist;
use ruff_python_index::Indexer;
use ruff_python_parser::Tokens;
use ruff_python_trivia::{PythonWhitespace, leading_indentation, textwrap::indent};
use ruff_source_file::{LineRanges, UniversalNewlines};
use ruff_text_size::{Ranged, TextRange};

View File

@@ -1,4 +1,4 @@
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
/// Returns `true` if the name should be considered "ambiguous".
pub(super) fn is_ambiguous_name(name: &str) -> bool {

View File

@@ -8,10 +8,10 @@ use itertools::Itertools;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_notebook::CellOffsets;
use ruff_python_ast::PySourceType;
use ruff_python_ast::token::TokenIterWithContext;
use ruff_python_ast::token::TokenKind;
use ruff_python_ast::token::Tokens;
use ruff_python_codegen::Stylist;
use ruff_python_parser::TokenIterWithContext;
use ruff_python_parser::TokenKind;
use ruff_python_parser::Tokens;
use ruff_python_trivia::PythonWhitespace;
use ruff_source_file::{LineRanges, UniversalNewlines};
use ruff_text_size::TextRange;

View File

@@ -1,8 +1,8 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_notebook::CellOffsets;
use ruff_python_ast::PySourceType;
use ruff_python_ast::token::{TokenIterWithContext, TokenKind, Tokens};
use ruff_python_index::Indexer;
use ruff_python_parser::{TokenIterWithContext, TokenKind, Tokens};
use ruff_text_size::{Ranged, TextSize};
use crate::Locator;

View File

@@ -1,5 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::{Ranged, TextRange};
use crate::AlwaysFixableViolation;

View File

@@ -1,5 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::TextRange;
use crate::Violation;

View File

@@ -1,5 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::Ranged;
use crate::Edit;

View File

@@ -1,5 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::Ranged;
use crate::checkers::ast::LintContext;

View File

@@ -1,5 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::{Ranged, TextRange};
use crate::checkers::ast::LintContext;

View File

@@ -9,7 +9,7 @@ pub(crate) use missing_whitespace::*;
pub(crate) use missing_whitespace_after_keyword::*;
pub(crate) use missing_whitespace_around_operator::*;
pub(crate) use redundant_backslash::*;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_trivia::is_python_whitespace;
use ruff_text_size::{Ranged, TextLen, TextRange, TextSize};
pub(crate) use space_around_operator::*;

View File

@@ -1,6 +1,6 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::token::TokenKind;
use ruff_python_index::Indexer;
use ruff_python_parser::TokenKind;
use ruff_source_file::LineRanges;
use ruff_text_size::{Ranged, TextRange, TextSize};

View File

@@ -1,5 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::{Ranged, TextRange};
use crate::checkers::ast::LintContext;

View File

@@ -1,5 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::{Ranged, TextRange, TextSize};
use crate::checkers::ast::LintContext;

View File

@@ -1,5 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_python_trivia::PythonWhitespace;
use ruff_source_file::LineRanges;
use ruff_text_size::{Ranged, TextLen, TextRange, TextSize};

View File

@@ -1,5 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::{Ranged, TextRange, TextSize};
use crate::checkers::ast::LintContext;

View File

@@ -3,7 +3,7 @@ use std::iter::Peekable;
use itertools::Itertools;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_notebook::CellOffsets;
use ruff_python_parser::{Token, TokenKind, Tokens};
use ruff_python_ast::token::{Token, TokenKind, Tokens};
use ruff_text_size::{Ranged, TextRange, TextSize};
use crate::{AlwaysFixableViolation, Edit, Fix, checkers::ast::LintContext};

View File

@@ -2,8 +2,8 @@ use anyhow::{Error, bail};
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::helpers;
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_ast::{CmpOp, Expr};
use ruff_python_parser::{TokenKind, Tokens};
use ruff_text_size::{Ranged, TextRange};
use crate::checkers::ast::Checker;

View File

@@ -3,8 +3,8 @@ use itertools::Itertools;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::helpers::contains_effect;
use ruff_python_ast::parenthesize::parenthesized_range;
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_ast::{self as ast, Stmt};
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_semantic::Binding;
use ruff_text_size::{Ranged, TextRange, TextSize};

View File

@@ -37,3 +37,88 @@ F704 `await` statement outside of a function
12 |
13 | def _():
|
F704 `await` statement outside of a function
--> F704.py:27:2
|
26 | # but not in comprehensions
27 | [await cor async for cor in f()] # F704
| ^^^^^^^^^
28 | {await cor async for cor in f()} # F704
29 | {await cor: 1 async for cor in f()} # F704
|
F704 `await` statement outside of a function
--> F704.py:28:2
|
26 | # but not in comprehensions
27 | [await cor async for cor in f()] # F704
28 | {await cor async for cor in f()} # F704
| ^^^^^^^^^
29 | {await cor: 1 async for cor in f()} # F704
30 | [await cor for cor in f()] # F704
|
F704 `await` statement outside of a function
--> F704.py:29:2
|
27 | [await cor async for cor in f()] # F704
28 | {await cor async for cor in f()} # F704
29 | {await cor: 1 async for cor in f()} # F704
| ^^^^^^^^^
30 | [await cor for cor in f()] # F704
31 | {await cor for cor in f()} # F704
|
F704 `await` statement outside of a function
--> F704.py:30:2
|
28 | {await cor async for cor in f()} # F704
29 | {await cor: 1 async for cor in f()} # F704
30 | [await cor for cor in f()] # F704
| ^^^^^^^^^
31 | {await cor for cor in f()} # F704
32 | {await cor: 1 for cor in f()} # F704
|
F704 `await` statement outside of a function
--> F704.py:31:2
|
29 | {await cor: 1 async for cor in f()} # F704
30 | [await cor for cor in f()] # F704
31 | {await cor for cor in f()} # F704
| ^^^^^^^^^
32 | {await cor: 1 for cor in f()} # F704
|
F704 `await` statement outside of a function
--> F704.py:32:2
|
30 | [await cor for cor in f()] # F704
31 | {await cor for cor in f()} # F704
32 | {await cor: 1 for cor in f()} # F704
| ^^^^^^^^^
33 |
34 | # or in the iterator of an async generator, which is evaluated in the parent
|
F704 `await` statement outside of a function
--> F704.py:36:23
|
34 | # or in the iterator of an async generator, which is evaluated in the parent
35 | # scope
36 | (cor async for cor in await f()) # F704
| ^^^^^^^^^
37 | (await cor async for cor in [await c for c in f()]) # F704
|
F704 `await` statement outside of a function
--> F704.py:37:30
|
35 | # scope
36 | (cor async for cor in await f()) # F704
37 | (await cor async for cor in [await c for c in f()]) # F704
| ^^^^^^^
38 |
39 | # this is also okay because the comprehension is within the generator scope
|

View File

@@ -1,5 +1,5 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::{Token, TokenKind};
use ruff_python_ast::token::{Token, TokenKind};
use ruff_text_size::{Ranged, TextLen, TextRange, TextSize};
use crate::Locator;

View File

@@ -1,5 +1,5 @@
use ruff_python_ast::StmtImportFrom;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_text_size::{Ranged, TextRange};
use crate::Locator;

View File

@@ -1,10 +1,10 @@
use itertools::Itertools;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::token::Tokens;
use ruff_python_ast::whitespace::indentation;
use ruff_python_ast::{Alias, StmtImportFrom, StmtRef};
use ruff_python_codegen::Stylist;
use ruff_python_parser::Tokens;
use ruff_text_size::Ranged;
use crate::Locator;

View File

@@ -1,7 +1,7 @@
use std::slice::Iter;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_parser::{Token, TokenKind, Tokens};
use ruff_python_ast::token::{Token, TokenKind, Tokens};
use ruff_text_size::{Ranged, TextRange};
use crate::Locator;

View File

@@ -6,11 +6,11 @@ use rustc_hash::{FxHashMap, FxHashSet};
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::helpers::any_over_expr;
use ruff_python_ast::str::{leading_quote, trailing_quote};
use ruff_python_ast::token::TokenKind;
use ruff_python_ast::{self as ast, Expr, Keyword, StringFlags};
use ruff_python_literal::format::{
FieldName, FieldNamePart, FieldType, FormatPart, FormatString, FromTemplate,
};
use ruff_python_parser::TokenKind;
use ruff_source_file::LineRanges;
use ruff_text_size::{Ranged, TextRange};

View File

@@ -3,12 +3,12 @@ use std::fmt::Write;
use std::str::FromStr;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::token::TokenKind;
use ruff_python_ast::{self as ast, AnyStringFlags, Expr, StringFlags, whitespace::indentation};
use ruff_python_codegen::Stylist;
use ruff_python_literal::cformat::{
CConversionFlags, CFormatPart, CFormatPrecision, CFormatQuantity, CFormatString,
};
use ruff_python_parser::TokenKind;
use ruff_python_stdlib::identifiers::is_identifier;
use ruff_source_file::LineRanges;
use ruff_text_size::{Ranged, TextRange};

View File

@@ -1,6 +1,6 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::Stmt;
use ruff_python_parser::TokenKind;
use ruff_python_ast::token::TokenKind;
use ruff_python_semantic::SemanticModel;
use ruff_source_file::LineRanges;
use ruff_text_size::{TextLen, TextRange, TextSize};

View File

@@ -1,7 +1,7 @@
use anyhow::Result;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_ast::{self as ast, Expr};
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_stdlib::open_mode::OpenMode;
use ruff_text_size::{Ranged, TextSize};

View File

@@ -1,8 +1,8 @@
use std::fmt::Write as _;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_ast::{self as ast, Arguments, Expr, Keyword};
use ruff_python_parser::{TokenKind, Tokens};
use ruff_text_size::{Ranged, TextRange};
use crate::Locator;

View File

@@ -4,8 +4,8 @@ use anyhow::Result;
use libcst_native::{LeftParen, ParenthesizedNode, RightParen};
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::token::TokenKind;
use ruff_python_ast::{self as ast, Expr, OperatorPrecedence};
use ruff_python_parser::TokenKind;
use ruff_text_size::Ranged;
use crate::checkers::ast::Checker;

View File

@@ -2,9 +2,9 @@ use std::cmp::Ordering;
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::helpers::comment_indentation_after;
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_ast::whitespace::indentation;
use ruff_python_ast::{Stmt, StmtExpr, StmtFor, StmtIf, StmtTry, StmtWhile};
use ruff_python_parser::{TokenKind, Tokens};
use ruff_source_file::LineRanges;
use ruff_text_size::{Ranged, TextLen, TextRange, TextSize};

View File

@@ -9,8 +9,8 @@ use std::cmp::Ordering;
use itertools::Itertools;
use ruff_python_ast as ast;
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_codegen::Stylist;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_stdlib::str::is_cased_uppercase;
use ruff_python_trivia::{SimpleTokenKind, first_non_trivia_token, leading_indentation};
use ruff_source_file::LineRanges;

View File

@@ -1,7 +1,7 @@
use ruff_macros::{ViolationMetadata, derive_message_formats};
use ruff_python_ast::PythonVersion;
use ruff_python_ast::token::TokenKind;
use ruff_python_ast::{Expr, ExprCall, parenthesize::parenthesized_range};
use ruff_python_parser::TokenKind;
use ruff_text_size::{Ranged, TextRange};
use crate::checkers::ast::Checker;

View File

@@ -17,4 +17,6 @@ PLE1142 `await` should be used within an async function
4 | # Top-level await
5 | await 1
| ^^^^^^^
6 |
7 | ([await c for c in cor] async for cor in func()) # ok
|

View File

@@ -0,0 +1,55 @@
---
source: crates/ruff_linter/src/linter.rs
---
B901 Using `yield` and `return {value}` in a generator function can lead to confusing behavior
--> resources/test/fixtures/syntax_errors/return_in_generator.py:3:5
|
1 | async def gen():
2 | yield 1
3 | return 42
| ^^^^^^^^^
4 |
5 | def gen(): # B901 but not a syntax error - not an async generator
|
B901 Using `yield` and `return {value}` in a generator function can lead to confusing behavior
--> resources/test/fixtures/syntax_errors/return_in_generator.py:7:5
|
5 | def gen(): # B901 but not a syntax error - not an async generator
6 | yield 1
7 | return 42
| ^^^^^^^^^
8 |
9 | async def gen(): # ok - no value in return
|
B901 Using `yield` and `return {value}` in a generator function can lead to confusing behavior
--> resources/test/fixtures/syntax_errors/return_in_generator.py:15:5
|
13 | async def gen():
14 | yield 1
15 | return foo()
| ^^^^^^^^^^^^
16 |
17 | async def gen():
|
B901 Using `yield` and `return {value}` in a generator function can lead to confusing behavior
--> resources/test/fixtures/syntax_errors/return_in_generator.py:19:5
|
17 | async def gen():
18 | yield 1
19 | return [1, 2, 3]
| ^^^^^^^^^^^^^^^^
20 |
21 | async def gen():
|
B901 Using `yield` and `return {value}` in a generator function can lead to confusing behavior
--> resources/test/fixtures/syntax_errors/return_in_generator.py:24:5
|
22 | if True:
23 | yield 1
24 | return 10
| ^^^^^^^^^
|

File diff suppressed because it is too large

View File

@@ -1,17 +1,46 @@
use std::sync::{LazyLock, Mutex};
use std::cell::RefCell;
use get_size2::{GetSize, StandardTracker};
use ordermap::{OrderMap, OrderSet};
thread_local! {
pub static TRACKER: RefCell<Option<StandardTracker>> = const { RefCell::new(None) };
}
struct TrackerGuard(Option<StandardTracker>);
impl Drop for TrackerGuard {
fn drop(&mut self) {
TRACKER.set(self.0.take());
}
}
pub fn attach_tracker<R>(tracker: StandardTracker, f: impl FnOnce() -> R) -> R {
let prev = TRACKER.replace(Some(tracker));
let _guard = TrackerGuard(prev);
f()
}
fn with_tracker<F, R>(f: F) -> R
where
F: FnOnce(Option<&mut StandardTracker>) -> R,
{
TRACKER.with(|tracker| {
let mut tracker = tracker.borrow_mut();
f(tracker.as_mut())
})
}
/// Returns the memory usage of the provided object, using a global tracker to avoid
/// double-counting shared objects.
pub fn heap_size<T: GetSize>(value: &T) -> usize {
static TRACKER: LazyLock<Mutex<StandardTracker>> =
LazyLock::new(|| Mutex::new(StandardTracker::new()));
value
.get_heap_size_with_tracker(&mut *TRACKER.lock().unwrap())
.0
with_tracker(|tracker| {
if let Some(tracker) = tracker {
value.get_heap_size_with_tracker(tracker).0
} else {
value.get_heap_size()
}
})
}
/// An implementation of [`GetSize::get_heap_size`] for [`OrderSet`].
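
The guard-based thread-local replaces the previous process-global `LazyLock<Mutex<StandardTracker>>`, making deduplication opt-in per call tree rather than shared (and lock-contended) across all callers. A minimal usage sketch, assuming this module's `attach_tracker` and `heap_size` are in scope and that `get_size2`'s derive and its tracker-aware `Arc` accounting behave as described (the `Shared` type is hypothetical):

use std::sync::Arc;

use get_size2::{GetSize, StandardTracker};

#[derive(GetSize)]
struct Shared {
    data: Arc<Vec<u8>>,
}

fn main() {
    let data = Arc::new(vec![0u8; 1024]);
    let a = Shared { data: data.clone() };
    let b = Shared { data };

    // Without an attached tracker, `heap_size` falls back to `get_heap_size`,
    // so the shared 1 KiB buffer is counted once per owner.
    let untracked = heap_size(&a) + heap_size(&b);

    // With a tracker attached, the shared allocation is counted only once.
    let tracked = attach_tracker(StandardTracker::new(), || heap_size(&a) + heap_size(&b));

    assert!(tracked < untracked);
}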

View File

@@ -29,6 +29,7 @@ pub mod statement_visitor;
pub mod stmt_if;
pub mod str;
pub mod str_prefix;
pub mod token;
pub mod traversal;
pub mod types;
pub mod visitor;

View File

@@ -11,6 +11,8 @@ use crate::ExprRef;
/// Note that without a parent the range can be inaccurate, e.g. for `f(a)` we falsely return a set of
/// parentheses around `a` even if the parentheses actually belong to `f`. That is why you should
/// generally prefer [`parenthesized_range`].
///
/// Prefer [`crate::token::parentheses_iterator`] if you have access to [`crate::token::Tokens`].
pub fn parentheses_iterator<'a>(
expr: ExprRef<'a>,
parent: Option<AnyNodeRef>,
@@ -57,6 +59,8 @@ pub fn parentheses_iterator<'a>(
/// Returns the [`TextRange`] of a given expression including parentheses, if the expression is
/// parenthesized; or `None`, if the expression is not parenthesized.
///
/// Prefer [`crate::token::parenthesized_range`] if you have access to [`crate::token::Tokens`].
pub fn parenthesized_range(
expr: ExprRef,
parent: AnyNodeRef,

View File

@@ -0,0 +1,853 @@
//! Token kinds for Python source code created by the lexer and consumed by the `ruff_python_parser` crate.
//!
//! This module defines the tokens that the lexer recognizes. The tokens are
//! loosely based on the token definitions found in the [CPython source].
//!
//! [CPython source]: https://github.com/python/cpython/blob/dfc2e065a2e71011017077e549cd2f9bf4944c54/Grammar/Tokens
use std::fmt;
use bitflags::bitflags;
use crate::str::{Quote, TripleQuotes};
use crate::str_prefix::{
AnyStringPrefix, ByteStringPrefix, FStringPrefix, StringLiteralPrefix, TStringPrefix,
};
use crate::{AnyStringFlags, BoolOp, Operator, StringFlags, UnaryOp};
use ruff_text_size::{Ranged, TextRange};
mod parentheses;
mod tokens;
pub use parentheses::{parentheses_iterator, parenthesized_range};
pub use tokens::{TokenAt, TokenIterWithContext, Tokens};
#[derive(Clone, Copy, PartialEq, Eq)]
#[cfg_attr(feature = "get-size", derive(get_size2::GetSize))]
pub struct Token {
/// The kind of the token.
kind: TokenKind,
/// The range of the token.
range: TextRange,
/// The set of flags describing this token.
flags: TokenFlags,
}
impl Token {
pub fn new(kind: TokenKind, range: TextRange, flags: TokenFlags) -> Token {
Self { kind, range, flags }
}
/// Returns the token kind.
#[inline]
pub const fn kind(&self) -> TokenKind {
self.kind
}
/// Returns the token as a tuple of (kind, range).
#[inline]
pub const fn as_tuple(&self) -> (TokenKind, TextRange) {
(self.kind, self.range)
}
/// Returns `true` if the current token is a triple-quoted string of any kind.
///
/// # Panics
///
/// If the token isn't a string or f/t-string token.
pub fn is_triple_quoted_string(self) -> bool {
self.unwrap_string_flags().is_triple_quoted()
}
/// Returns the [`Quote`] style for the current string token of any kind.
///
/// # Panics
///
/// If the token isn't a string or f/t-string token.
pub fn string_quote_style(self) -> Quote {
self.unwrap_string_flags().quote_style()
}
/// Returns the [`AnyStringFlags`] style for the current string token of any kind.
///
/// # Panics
///
/// If the token isn't a string or f/t-string token.
pub fn unwrap_string_flags(self) -> AnyStringFlags {
self.string_flags()
.unwrap_or_else(|| panic!("expected token to be a string"))
}
/// Returns the [`AnyStringFlags`] for the current token if it is a string token of any kind, or `None` otherwise.
pub fn string_flags(self) -> Option<AnyStringFlags> {
if self.is_any_string() {
Some(self.flags.as_any_string_flags())
} else {
None
}
}
/// Returns `true` if this is any kind of string token - including
/// tokens in t-strings (which do not have type `str`).
const fn is_any_string(self) -> bool {
matches!(
self.kind,
TokenKind::String
| TokenKind::FStringStart
| TokenKind::FStringMiddle
| TokenKind::FStringEnd
| TokenKind::TStringStart
| TokenKind::TStringMiddle
| TokenKind::TStringEnd
)
}
}
impl Ranged for Token {
fn range(&self) -> TextRange {
self.range
}
}
impl fmt::Debug for Token {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "{:?} {:?}", self.kind, self.range)?;
if !self.flags.is_empty() {
f.write_str(" (flags = ")?;
let mut first = true;
for (name, _) in self.flags.iter_names() {
if first {
first = false;
} else {
f.write_str(" | ")?;
}
f.write_str(name)?;
}
f.write_str(")")?;
}
Ok(())
}
}
/// A kind of a token.
#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug, PartialOrd, Ord)]
#[cfg_attr(feature = "get-size", derive(get_size2::GetSize))]
pub enum TokenKind {
/// Token kind for a name, commonly known as an identifier.
Name,
/// Token kind for an integer.
Int,
/// Token kind for a floating point number.
Float,
/// Token kind for a complex number.
Complex,
/// Token kind for a string.
String,
/// Token kind for the start of an f-string. This includes the `f`/`F`/`fr` prefix
/// and the opening quote(s).
FStringStart,
/// Token kind that includes the portion of text inside the f-string that's not
/// part of the expression part and isn't an opening or closing brace.
FStringMiddle,
/// Token kind for the end of an f-string. This includes the closing quote.
FStringEnd,
/// Token kind for the start of a t-string. This includes the `t`/`T`/`tr` prefix
/// and the opening quote(s).
TStringStart,
/// Token kind that includes the portion of text inside the t-string that's not
/// part of the interpolation part and isn't an opening or closing brace.
TStringMiddle,
/// Token kind for the end of a t-string. This includes the closing quote.
TStringEnd,
/// Token kind for an IPython escape command.
IpyEscapeCommand,
/// Token kind for a comment. These are filtered out of the token stream prior to parsing.
Comment,
/// Token kind for a newline.
Newline,
/// Token kind for a newline that is not a logical line break. These are filtered out of
/// the token stream prior to parsing.
NonLogicalNewline,
/// Token kind for an indent.
Indent,
/// Token kind for a dedent.
Dedent,
EndOfFile,
/// Token kind for a question mark `?`.
Question,
/// Token kind for an exclamation mark `!`.
Exclamation,
/// Token kind for a left parenthesis `(`.
Lpar,
/// Token kind for a right parenthesis `)`.
Rpar,
/// Token kind for a left square bracket `[`.
Lsqb,
/// Token kind for a right square bracket `]`.
Rsqb,
/// Token kind for a colon `:`.
Colon,
/// Token kind for a comma `,`.
Comma,
/// Token kind for a semicolon `;`.
Semi,
/// Token kind for plus `+`.
Plus,
/// Token kind for minus `-`.
Minus,
/// Token kind for star `*`.
Star,
/// Token kind for slash `/`.
Slash,
/// Token kind for vertical bar `|`.
Vbar,
/// Token kind for ampersand `&`.
Amper,
/// Token kind for less than `<`.
Less,
/// Token kind for greater than `>`.
Greater,
/// Token kind for equal `=`.
Equal,
/// Token kind for dot `.`.
Dot,
/// Token kind for percent `%`.
Percent,
/// Token kind for left brace `{`.
Lbrace,
/// Token kind for right brace `}`.
Rbrace,
/// Token kind for double equal `==`.
EqEqual,
/// Token kind for not equal `!=`.
NotEqual,
/// Token kind for less than or equal `<=`.
LessEqual,
/// Token kind for greater than or equal `>=`.
GreaterEqual,
/// Token kind for tilde `~`.
Tilde,
/// Token kind for caret `^`.
CircumFlex,
/// Token kind for left shift `<<`.
LeftShift,
/// Token kind for right shift `>>`.
RightShift,
/// Token kind for double star `**`.
DoubleStar,
/// Token kind for double star equal `**=`.
DoubleStarEqual,
/// Token kind for plus equal `+=`.
PlusEqual,
/// Token kind for minus equal `-=`.
MinusEqual,
/// Token kind for star equal `*=`.
StarEqual,
/// Token kind for slash equal `/=`.
SlashEqual,
/// Token kind for percent equal `%=`.
PercentEqual,
/// Token kind for ampersand equal `&=`.
AmperEqual,
/// Token kind for vertical bar equal `|=`.
VbarEqual,
/// Token kind for caret equal `^=`.
CircumflexEqual,
/// Token kind for left shift equal `<<=`.
LeftShiftEqual,
/// Token kind for right shift equal `>>=`.
RightShiftEqual,
/// Token kind for double slash `//`.
DoubleSlash,
/// Token kind for double slash equal `//=`.
DoubleSlashEqual,
/// Token kind for colon equal `:=`.
ColonEqual,
/// Token kind for at `@`.
At,
/// Token kind for at equal `@=`.
AtEqual,
/// Token kind for arrow `->`.
Rarrow,
/// Token kind for ellipsis `...`.
Ellipsis,
// The keywords should be sorted in alphabetical order. If the boundary tokens for the
// "Keywords" and "Soft keywords" group change, update the related methods on `TokenKind`.
// Keywords
And,
As,
Assert,
Async,
Await,
Break,
Class,
Continue,
Def,
Del,
Elif,
Else,
Except,
False,
Finally,
For,
From,
Global,
If,
Import,
In,
Is,
Lambda,
None,
Nonlocal,
Not,
Or,
Pass,
Raise,
Return,
True,
Try,
While,
With,
Yield,
// Soft keywords
Case,
Match,
Type,
Unknown,
}
impl TokenKind {
/// Returns `true` if this is an end of file token.
#[inline]
pub const fn is_eof(self) -> bool {
matches!(self, TokenKind::EndOfFile)
}
/// Returns `true` if this is either a newline or non-logical newline token.
#[inline]
pub const fn is_any_newline(self) -> bool {
matches!(self, TokenKind::Newline | TokenKind::NonLogicalNewline)
}
/// Returns `true` if the token is a keyword (including soft keywords).
///
/// See also [`is_soft_keyword`], [`is_non_soft_keyword`].
///
/// [`is_soft_keyword`]: TokenKind::is_soft_keyword
/// [`is_non_soft_keyword`]: TokenKind::is_non_soft_keyword
#[inline]
pub fn is_keyword(self) -> bool {
TokenKind::And <= self && self <= TokenKind::Type
}
/// Returns `true` if the token is strictly a soft keyword.
///
/// See also [`is_keyword`], [`is_non_soft_keyword`].
///
/// [`is_keyword`]: TokenKind::is_keyword
/// [`is_non_soft_keyword`]: TokenKind::is_non_soft_keyword
#[inline]
pub fn is_soft_keyword(self) -> bool {
TokenKind::Case <= self && self <= TokenKind::Type
}
/// Returns `true` if the token is strictly a non-soft keyword.
///
/// See also [`is_keyword`], [`is_soft_keyword`].
///
/// [`is_keyword`]: TokenKind::is_keyword
/// [`is_soft_keyword`]: TokenKind::is_soft_keyword
#[inline]
pub fn is_non_soft_keyword(self) -> bool {
TokenKind::And <= self && self <= TokenKind::Yield
}
#[inline]
pub const fn is_operator(self) -> bool {
matches!(
self,
TokenKind::Lpar
| TokenKind::Rpar
| TokenKind::Lsqb
| TokenKind::Rsqb
| TokenKind::Comma
| TokenKind::Semi
| TokenKind::Plus
| TokenKind::Minus
| TokenKind::Star
| TokenKind::Slash
| TokenKind::Vbar
| TokenKind::Amper
| TokenKind::Less
| TokenKind::Greater
| TokenKind::Equal
| TokenKind::Dot
| TokenKind::Percent
| TokenKind::Lbrace
| TokenKind::Rbrace
| TokenKind::EqEqual
| TokenKind::NotEqual
| TokenKind::LessEqual
| TokenKind::GreaterEqual
| TokenKind::Tilde
| TokenKind::CircumFlex
| TokenKind::LeftShift
| TokenKind::RightShift
| TokenKind::DoubleStar
| TokenKind::PlusEqual
| TokenKind::MinusEqual
| TokenKind::StarEqual
| TokenKind::SlashEqual
| TokenKind::PercentEqual
| TokenKind::AmperEqual
| TokenKind::VbarEqual
| TokenKind::CircumflexEqual
| TokenKind::LeftShiftEqual
| TokenKind::RightShiftEqual
| TokenKind::DoubleStarEqual
| TokenKind::DoubleSlash
| TokenKind::DoubleSlashEqual
| TokenKind::At
| TokenKind::AtEqual
| TokenKind::Rarrow
| TokenKind::Ellipsis
| TokenKind::ColonEqual
| TokenKind::Colon
| TokenKind::And
| TokenKind::Or
| TokenKind::Not
| TokenKind::In
| TokenKind::Is
)
}
/// Returns `true` if this is a singleton token i.e., `True`, `False`, or `None`.
#[inline]
pub const fn is_singleton(self) -> bool {
matches!(self, TokenKind::False | TokenKind::True | TokenKind::None)
}
/// Returns `true` if this is a trivia token i.e., a comment or a non-logical newline.
#[inline]
pub const fn is_trivia(&self) -> bool {
matches!(self, TokenKind::Comment | TokenKind::NonLogicalNewline)
}
/// Returns `true` if this is a comment token.
#[inline]
pub const fn is_comment(&self) -> bool {
matches!(self, TokenKind::Comment)
}
#[inline]
pub const fn is_arithmetic(self) -> bool {
matches!(
self,
TokenKind::DoubleStar
| TokenKind::Star
| TokenKind::Plus
| TokenKind::Minus
| TokenKind::Slash
| TokenKind::DoubleSlash
| TokenKind::At
)
}
#[inline]
pub const fn is_bitwise_or_shift(self) -> bool {
matches!(
self,
TokenKind::LeftShift
| TokenKind::LeftShiftEqual
| TokenKind::RightShift
| TokenKind::RightShiftEqual
| TokenKind::Amper
| TokenKind::AmperEqual
| TokenKind::Vbar
| TokenKind::VbarEqual
| TokenKind::CircumFlex
| TokenKind::CircumflexEqual
| TokenKind::Tilde
)
}
/// Returns `true` if the current token is a unary arithmetic operator.
#[inline]
pub const fn is_unary_arithmetic_operator(self) -> bool {
matches!(self, TokenKind::Plus | TokenKind::Minus)
}
#[inline]
pub const fn is_interpolated_string_end(self) -> bool {
matches!(self, TokenKind::FStringEnd | TokenKind::TStringEnd)
}
/// Returns the [`UnaryOp`] that corresponds to this token kind, if it is a unary arithmetic
/// operator, otherwise return [None].
///
/// Use [`as_unary_operator`] to match against any unary operator.
///
/// [`as_unary_operator`]: TokenKind::as_unary_operator
#[inline]
pub const fn as_unary_arithmetic_operator(self) -> Option<UnaryOp> {
Some(match self {
TokenKind::Plus => UnaryOp::UAdd,
TokenKind::Minus => UnaryOp::USub,
_ => return None,
})
}
/// Returns the [`UnaryOp`] that corresponds to this token kind, if it is a unary operator,
/// otherwise return [None].
///
/// Use [`as_unary_arithmetic_operator`] to match against only an arithmetic unary operator.
///
/// [`as_unary_arithmetic_operator`]: TokenKind::as_unary_arithmetic_operator
#[inline]
pub const fn as_unary_operator(self) -> Option<UnaryOp> {
Some(match self {
TokenKind::Plus => UnaryOp::UAdd,
TokenKind::Minus => UnaryOp::USub,
TokenKind::Tilde => UnaryOp::Invert,
TokenKind::Not => UnaryOp::Not,
_ => return None,
})
}
/// Returns the [`BoolOp`] that corresponds to this token kind, if it is a boolean operator,
/// otherwise return [None].
#[inline]
pub const fn as_bool_operator(self) -> Option<BoolOp> {
Some(match self {
TokenKind::And => BoolOp::And,
TokenKind::Or => BoolOp::Or,
_ => return None,
})
}
/// Returns the binary [`Operator`] that corresponds to the current token, if it's a binary
/// operator, otherwise return [None].
///
/// Use [`as_augmented_assign_operator`] to match against an augmented assignment token.
///
/// [`as_augmented_assign_operator`]: TokenKind::as_augmented_assign_operator
pub const fn as_binary_operator(self) -> Option<Operator> {
Some(match self {
TokenKind::Plus => Operator::Add,
TokenKind::Minus => Operator::Sub,
TokenKind::Star => Operator::Mult,
TokenKind::At => Operator::MatMult,
TokenKind::DoubleStar => Operator::Pow,
TokenKind::Slash => Operator::Div,
TokenKind::DoubleSlash => Operator::FloorDiv,
TokenKind::Percent => Operator::Mod,
TokenKind::Amper => Operator::BitAnd,
TokenKind::Vbar => Operator::BitOr,
TokenKind::CircumFlex => Operator::BitXor,
TokenKind::LeftShift => Operator::LShift,
TokenKind::RightShift => Operator::RShift,
_ => return None,
})
}
/// Returns the [`Operator`] that corresponds to this token kind, if it is
/// an augmented assignment operator, or [`None`] otherwise.
#[inline]
pub const fn as_augmented_assign_operator(self) -> Option<Operator> {
Some(match self {
TokenKind::PlusEqual => Operator::Add,
TokenKind::MinusEqual => Operator::Sub,
TokenKind::StarEqual => Operator::Mult,
TokenKind::AtEqual => Operator::MatMult,
TokenKind::DoubleStarEqual => Operator::Pow,
TokenKind::SlashEqual => Operator::Div,
TokenKind::DoubleSlashEqual => Operator::FloorDiv,
TokenKind::PercentEqual => Operator::Mod,
TokenKind::AmperEqual => Operator::BitAnd,
TokenKind::VbarEqual => Operator::BitOr,
TokenKind::CircumflexEqual => Operator::BitXor,
TokenKind::LeftShiftEqual => Operator::LShift,
TokenKind::RightShiftEqual => Operator::RShift,
_ => return None,
})
}
}
impl From<BoolOp> for TokenKind {
#[inline]
fn from(op: BoolOp) -> Self {
match op {
BoolOp::And => TokenKind::And,
BoolOp::Or => TokenKind::Or,
}
}
}
impl From<UnaryOp> for TokenKind {
#[inline]
fn from(op: UnaryOp) -> Self {
match op {
UnaryOp::Invert => TokenKind::Tilde,
UnaryOp::Not => TokenKind::Not,
UnaryOp::UAdd => TokenKind::Plus,
UnaryOp::USub => TokenKind::Minus,
}
}
}
impl From<Operator> for TokenKind {
#[inline]
fn from(op: Operator) -> Self {
match op {
Operator::Add => TokenKind::Plus,
Operator::Sub => TokenKind::Minus,
Operator::Mult => TokenKind::Star,
Operator::MatMult => TokenKind::At,
Operator::Div => TokenKind::Slash,
Operator::Mod => TokenKind::Percent,
Operator::Pow => TokenKind::DoubleStar,
Operator::LShift => TokenKind::LeftShift,
Operator::RShift => TokenKind::RightShift,
Operator::BitOr => TokenKind::Vbar,
Operator::BitXor => TokenKind::CircumFlex,
Operator::BitAnd => TokenKind::Amper,
Operator::FloorDiv => TokenKind::DoubleSlash,
}
}
}
impl fmt::Display for TokenKind {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
let value = match self {
TokenKind::Unknown => "Unknown",
TokenKind::Newline => "newline",
TokenKind::NonLogicalNewline => "NonLogicalNewline",
TokenKind::Indent => "indent",
TokenKind::Dedent => "dedent",
TokenKind::EndOfFile => "end of file",
TokenKind::Name => "name",
TokenKind::Int => "int",
TokenKind::Float => "float",
TokenKind::Complex => "complex",
TokenKind::String => "string",
TokenKind::FStringStart => "FStringStart",
TokenKind::FStringMiddle => "FStringMiddle",
TokenKind::FStringEnd => "FStringEnd",
TokenKind::TStringStart => "TStringStart",
TokenKind::TStringMiddle => "TStringMiddle",
TokenKind::TStringEnd => "TStringEnd",
TokenKind::IpyEscapeCommand => "IPython escape command",
TokenKind::Comment => "comment",
TokenKind::Question => "`?`",
TokenKind::Exclamation => "`!`",
TokenKind::Lpar => "`(`",
TokenKind::Rpar => "`)`",
TokenKind::Lsqb => "`[`",
TokenKind::Rsqb => "`]`",
TokenKind::Lbrace => "`{`",
TokenKind::Rbrace => "`}`",
TokenKind::Equal => "`=`",
TokenKind::ColonEqual => "`:=`",
TokenKind::Dot => "`.`",
TokenKind::Colon => "`:`",
TokenKind::Semi => "`;`",
TokenKind::Comma => "`,`",
TokenKind::Rarrow => "`->`",
TokenKind::Plus => "`+`",
TokenKind::Minus => "`-`",
TokenKind::Star => "`*`",
TokenKind::DoubleStar => "`**`",
TokenKind::Slash => "`/`",
TokenKind::DoubleSlash => "`//`",
TokenKind::Percent => "`%`",
TokenKind::Vbar => "`|`",
TokenKind::Amper => "`&`",
TokenKind::CircumFlex => "`^`",
TokenKind::LeftShift => "`<<`",
TokenKind::RightShift => "`>>`",
TokenKind::Tilde => "`~`",
TokenKind::At => "`@`",
TokenKind::Less => "`<`",
TokenKind::Greater => "`>`",
TokenKind::EqEqual => "`==`",
TokenKind::NotEqual => "`!=`",
TokenKind::LessEqual => "`<=`",
TokenKind::GreaterEqual => "`>=`",
TokenKind::PlusEqual => "`+=`",
TokenKind::MinusEqual => "`-=`",
TokenKind::StarEqual => "`*=`",
TokenKind::DoubleStarEqual => "`**=`",
TokenKind::SlashEqual => "`/=`",
TokenKind::DoubleSlashEqual => "`//=`",
TokenKind::PercentEqual => "`%=`",
TokenKind::VbarEqual => "`|=`",
TokenKind::AmperEqual => "`&=`",
TokenKind::CircumflexEqual => "`^=`",
TokenKind::LeftShiftEqual => "`<<=`",
TokenKind::RightShiftEqual => "`>>=`",
TokenKind::AtEqual => "`@=`",
TokenKind::Ellipsis => "`...`",
TokenKind::False => "`False`",
TokenKind::None => "`None`",
TokenKind::True => "`True`",
TokenKind::And => "`and`",
TokenKind::As => "`as`",
TokenKind::Assert => "`assert`",
TokenKind::Async => "`async`",
TokenKind::Await => "`await`",
TokenKind::Break => "`break`",
TokenKind::Class => "`class`",
TokenKind::Continue => "`continue`",
TokenKind::Def => "`def`",
TokenKind::Del => "`del`",
TokenKind::Elif => "`elif`",
TokenKind::Else => "`else`",
TokenKind::Except => "`except`",
TokenKind::Finally => "`finally`",
TokenKind::For => "`for`",
TokenKind::From => "`from`",
TokenKind::Global => "`global`",
TokenKind::If => "`if`",
TokenKind::Import => "`import`",
TokenKind::In => "`in`",
TokenKind::Is => "`is`",
TokenKind::Lambda => "`lambda`",
TokenKind::Nonlocal => "`nonlocal`",
TokenKind::Not => "`not`",
TokenKind::Or => "`or`",
TokenKind::Pass => "`pass`",
TokenKind::Raise => "`raise`",
TokenKind::Return => "`return`",
TokenKind::Try => "`try`",
TokenKind::While => "`while`",
TokenKind::Match => "`match`",
TokenKind::Type => "`type`",
TokenKind::Case => "`case`",
TokenKind::With => "`with`",
TokenKind::Yield => "`yield`",
};
f.write_str(value)
}
}
bitflags! {
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TokenFlags: u16 {
/// The token is a string with double quotes (`"`).
const DOUBLE_QUOTES = 1 << 0;
/// The token is a triple-quoted string i.e., it starts and ends with three consecutive
/// quote characters (`"""` or `'''`).
const TRIPLE_QUOTED_STRING = 1 << 1;
/// The token is a unicode string i.e., prefixed with `u` or `U`
const UNICODE_STRING = 1 << 2;
/// The token is a byte string i.e., prefixed with `b` or `B`
const BYTE_STRING = 1 << 3;
/// The token is an f-string i.e., prefixed with `f` or `F`
const F_STRING = 1 << 4;
/// The token is a t-string i.e., prefixed with `t` or `T`
const T_STRING = 1 << 5;
/// The token is a raw string and the prefix character is in lowercase.
const RAW_STRING_LOWERCASE = 1 << 6;
/// The token is a raw string and the prefix character is in uppercase.
const RAW_STRING_UPPERCASE = 1 << 7;
/// String without matching closing quote(s)
const UNCLOSED_STRING = 1 << 8;
/// The token is a raw string i.e., prefixed with `r` or `R`
const RAW_STRING = Self::RAW_STRING_LOWERCASE.bits() | Self::RAW_STRING_UPPERCASE.bits();
}
}
#[cfg(feature = "get-size")]
impl get_size2::GetSize for TokenFlags {}
impl StringFlags for TokenFlags {
fn quote_style(self) -> Quote {
if self.intersects(TokenFlags::DOUBLE_QUOTES) {
Quote::Double
} else {
Quote::Single
}
}
fn triple_quotes(self) -> TripleQuotes {
if self.intersects(TokenFlags::TRIPLE_QUOTED_STRING) {
TripleQuotes::Yes
} else {
TripleQuotes::No
}
}
fn prefix(self) -> AnyStringPrefix {
if self.intersects(TokenFlags::F_STRING) {
if self.intersects(TokenFlags::RAW_STRING_LOWERCASE) {
AnyStringPrefix::Format(FStringPrefix::Raw { uppercase_r: false })
} else if self.intersects(TokenFlags::RAW_STRING_UPPERCASE) {
AnyStringPrefix::Format(FStringPrefix::Raw { uppercase_r: true })
} else {
AnyStringPrefix::Format(FStringPrefix::Regular)
}
} else if self.intersects(TokenFlags::T_STRING) {
if self.intersects(TokenFlags::RAW_STRING_LOWERCASE) {
AnyStringPrefix::Template(TStringPrefix::Raw { uppercase_r: false })
} else if self.intersects(TokenFlags::RAW_STRING_UPPERCASE) {
AnyStringPrefix::Template(TStringPrefix::Raw { uppercase_r: true })
} else {
AnyStringPrefix::Template(TStringPrefix::Regular)
}
} else if self.intersects(TokenFlags::BYTE_STRING) {
if self.intersects(TokenFlags::RAW_STRING_LOWERCASE) {
AnyStringPrefix::Bytes(ByteStringPrefix::Raw { uppercase_r: false })
} else if self.intersects(TokenFlags::RAW_STRING_UPPERCASE) {
AnyStringPrefix::Bytes(ByteStringPrefix::Raw { uppercase_r: true })
} else {
AnyStringPrefix::Bytes(ByteStringPrefix::Regular)
}
} else if self.intersects(TokenFlags::RAW_STRING_LOWERCASE) {
AnyStringPrefix::Regular(StringLiteralPrefix::Raw { uppercase: false })
} else if self.intersects(TokenFlags::RAW_STRING_UPPERCASE) {
AnyStringPrefix::Regular(StringLiteralPrefix::Raw { uppercase: true })
} else if self.intersects(TokenFlags::UNICODE_STRING) {
AnyStringPrefix::Regular(StringLiteralPrefix::Unicode)
} else {
AnyStringPrefix::Regular(StringLiteralPrefix::Empty)
}
}
fn is_unclosed(self) -> bool {
self.intersects(TokenFlags::UNCLOSED_STRING)
}
}
impl TokenFlags {
/// Returns `true` if the token is an f-string.
pub const fn is_f_string(self) -> bool {
self.intersects(TokenFlags::F_STRING)
}
/// Returns `true` if the token is a t-string.
pub const fn is_t_string(self) -> bool {
self.intersects(TokenFlags::T_STRING)
}
/// Returns `true` if the token is an f-string or t-string.
pub const fn is_interpolated_string(self) -> bool {
self.intersects(TokenFlags::T_STRING.union(TokenFlags::F_STRING))
}
/// Returns `true` if the token is a triple-quoted f-string or t-string.
pub fn is_triple_quoted_interpolated_string(self) -> bool {
self.intersects(TokenFlags::TRIPLE_QUOTED_STRING) && self.is_interpolated_string()
}
/// Returns `true` if the token is a raw string.
pub const fn is_raw_string(self) -> bool {
self.intersects(TokenFlags::RAW_STRING)
}
}
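
A short sketch of how the classification and conversion helpers above compose (a hypothetical doc-test; `Operator` and `UnaryOp` are the AST enums imported at the top of this module):

use ruff_python_ast::token::TokenKind;
use ruff_python_ast::{Operator, UnaryOp};

fn main() {
    // Keyword classification relies on the ordering invariants documented on the enum.
    assert!(TokenKind::While.is_keyword());
    assert!(TokenKind::Match.is_soft_keyword());
    assert!(!TokenKind::Match.is_non_soft_keyword());

    // Token kinds round-trip through the AST operator enums.
    assert_eq!(TokenKind::Plus.as_binary_operator(), Some(Operator::Add));
    assert_eq!(TokenKind::from(Operator::Add), TokenKind::Plus);
    assert_eq!(TokenKind::Tilde.as_unary_operator(), Some(UnaryOp::Invert));
    assert_eq!(TokenKind::PlusEqual.as_augmented_assign_operator(), Some(Operator::Add));
}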

View File

@@ -0,0 +1,58 @@
use ruff_text_size::{Ranged, TextLen, TextRange};
use super::{TokenKind, Tokens};
use crate::{AnyNodeRef, ExprRef};
/// Returns an iterator over the ranges of the optional parentheses surrounding an expression.
///
/// E.g. for `((f()))` with `f()` as expression, the iterator returns the ranges (1, 6) and (0, 7).
///
/// Note that without a parent the range can be inaccurate, e.g. for `f(a)` we falsely return a set of
/// parentheses around `a` even if the parentheses actually belong to `f`. That is why you should
/// generally prefer [`parenthesized_range`].
pub fn parentheses_iterator<'a>(
expr: ExprRef<'a>,
parent: Option<AnyNodeRef>,
tokens: &'a Tokens,
) -> impl Iterator<Item = TextRange> + 'a {
let after_tokens = if let Some(parent) = parent {
// If the parent is a node that brings its own parentheses, exclude the closing parenthesis
// from our search range. Otherwise, we risk matching on calls, like `func(x)`, for which
// the open and close parentheses are part of the `Arguments` node.
let exclusive_parent_end = if parent.is_arguments() {
parent.end() - ")".text_len()
} else {
parent.end()
};
tokens.in_range(TextRange::new(expr.end(), exclusive_parent_end))
} else {
tokens.after(expr.end())
};
let right_parens = after_tokens
.iter()
.filter(|token| !token.kind().is_trivia())
.take_while(move |token| token.kind() == TokenKind::Rpar);
let left_parens = tokens
.before(expr.start())
.iter()
.rev()
.filter(|token| !token.kind().is_trivia())
.take_while(|token| token.kind() == TokenKind::Lpar);
right_parens
.zip(left_parens)
.map(|(right, left)| TextRange::new(left.start(), right.end()))
}
/// Returns the [`TextRange`] of a given expression including parentheses, if the expression is
/// parenthesized; or `None`, if the expression is not parenthesized.
pub fn parenthesized_range(
expr: ExprRef,
parent: AnyNodeRef,
tokens: &Tokens,
) -> Option<TextRange> {
parentheses_iterator(expr, Some(parent), tokens).last()
}

View File

@@ -0,0 +1,520 @@
use std::{iter::FusedIterator, ops::Deref};
use super::{Token, TokenKind};
use ruff_python_trivia::CommentRanges;
use ruff_text_size::{Ranged as _, TextRange, TextSize};
/// Tokens represents a vector of lexed [`Token`].
#[derive(Debug, Clone, PartialEq, Eq)]
#[cfg_attr(feature = "get-size", derive(get_size2::GetSize))]
pub struct Tokens {
raw: Vec<Token>,
}
impl Tokens {
pub fn new(tokens: Vec<Token>) -> Tokens {
Tokens { raw: tokens }
}
/// Returns an iterator over all the tokens that provides context.
pub fn iter_with_context(&self) -> TokenIterWithContext<'_> {
TokenIterWithContext::new(&self.raw)
}
/// Performs a binary search to find the index of the **first** token that starts at the given `offset`.
///
/// Unlike `binary_search_by_key`, this method ensures that if multiple tokens start at the same offset,
/// it returns the index of the first one. Multiple tokens can start at the same offset in cases where
/// zero-length tokens are involved (like `Dedent` or `Newline` at the end of the file).
pub fn binary_search_by_start(&self, offset: TextSize) -> Result<usize, usize> {
let partition_point = self.partition_point(|token| token.start() < offset);
let after = &self[partition_point..];
if after.first().is_some_and(|first| first.start() == offset) {
Ok(partition_point)
} else {
Err(partition_point)
}
}
/// Returns a slice of [`Token`] that are within the given `range`.
///
/// The start and end offset of the given range should be either:
/// 1. Token boundary
/// 2. Gap between the tokens
///
/// For example, considering the following tokens and their corresponding range:
///
/// | Token | Range |
/// |---------------------|-----------|
/// | `Def` | `0..3` |
/// | `Name` | `4..7` |
/// | `Lpar` | `7..8` |
/// | `Rpar` | `8..9` |
/// | `Colon` | `9..10` |
/// | `Newline` | `10..11` |
/// | `Comment` | `15..24` |
/// | `NonLogicalNewline` | `24..25` |
/// | `Indent` | `25..29` |
/// | `Pass` | `29..33` |
///
/// Here, for (1) a token boundary is considered either the start or end offset of any of the
/// above tokens. For (2), the gap would be any offset between the `Newline` and `Comment`
/// tokens, which are offsets 12, 13, and 14.
///
/// Examples:
/// 1) `4..10` would give `Name`, `Lpar`, `Rpar`, `Colon`
/// 2) `11..25` would give `Comment`, `NonLogicalNewline`
/// 3) `12..25` would give the same as (2); offset 12 is in the "gap"
/// 4) `9..12` would give `Colon`, `Newline` and offset 12 is in the "gap"
/// 5) `18..27` would panic because both the start and end offsets are within a token
///
/// ## Note
///
/// The returned slice can contain the [`TokenKind::Unknown`] token if there was a lexical
/// error encountered within the given range.
///
/// # Panics
///
/// If either the start or end offset of the given range is within a token range.
pub fn in_range(&self, range: TextRange) -> &[Token] {
let tokens_after_start = self.after(range.start());
Self::before_impl(tokens_after_start, range.end())
}
/// Searches the token(s) at `offset`.
///
/// Returns [`TokenAt::Between`] if `offset` points directly in between two tokens
/// (the left token ends at `offset` and the right token starts at `offset`).
pub fn at_offset(&self, offset: TextSize) -> TokenAt {
match self.binary_search_by_start(offset) {
// The token at `index` starts exactly at `offset`.
// ```python
// object.attribute
// ^ OFFSET
// ```
Ok(index) => {
let token = self[index];
// `token` starts exactly at `offset`. Test if the offset is right between
// `token` and the previous token (if there's any)
if let Some(previous) = index.checked_sub(1).map(|idx| self[idx]) {
if previous.end() == offset {
return TokenAt::Between(previous, token);
}
}
TokenAt::Single(token)
}
// No token found that starts exactly at the given offset. But it's possible that
// the token starting before `offset` fully encloses `offset` (its range ends after `offset`).
// ```python
// object.attribute
// ^ OFFSET
// # or
// if True:
// print("test")
// ^ OFFSET
// ```
Err(index) => {
if let Some(previous) = index.checked_sub(1).map(|idx| self[idx]) {
if previous.range().contains_inclusive(offset) {
return TokenAt::Single(previous);
}
}
TokenAt::None
}
}
}
/// Returns a slice of tokens before the given [`TextSize`] offset.
///
/// If the given offset falls between two tokens (i.e., after the end of the previous token
/// and before the start of the next token), the returned slice ends just before the next
/// token.
///
/// # Panics
///
/// If the given offset is inside a token range at any point
/// other than the start of the range.
pub fn before(&self, offset: TextSize) -> &[Token] {
Self::before_impl(&self.raw, offset)
}
fn before_impl(tokens: &[Token], offset: TextSize) -> &[Token] {
let partition_point = tokens.partition_point(|token| token.start() < offset);
let before = &tokens[..partition_point];
if let Some(last) = before.last() {
// If it's equal to the end offset, then it's at a token boundary which is
// valid. If it's greater than the end offset, then it's in the gap between
// the tokens which is valid as well.
assert!(
offset >= last.end(),
"Offset {:?} is inside a token range {:?}",
offset,
last.range()
);
}
before
}
/// Returns a slice of tokens after the given [`TextSize`] offset.
///
/// If the given offset falls between two tokens (i.e., after the end of the previous token
/// and before the start of the next token), the returned slice starts from the next
/// token.
///
/// # Panics
///
/// If the given offset is inside a token range at any point
/// other than the start of the range.
pub fn after(&self, offset: TextSize) -> &[Token] {
let partition_point = self.partition_point(|token| token.end() <= offset);
let after = &self[partition_point..];
if let Some(first) = after.first() {
// If it's equal to the start offset, then it's at a token boundary which is
// valid. If it's less than the start offset, then it's in the gap between
// the tokens which is valid as well.
assert!(
offset <= first.start(),
"Offset {:?} is inside a token range {:?}",
offset,
first.range()
);
}
after
}
}
impl<'a> IntoIterator for &'a Tokens {
type Item = &'a Token;
type IntoIter = std::slice::Iter<'a, Token>;
fn into_iter(self) -> Self::IntoIter {
self.iter()
}
}
impl Deref for Tokens {
type Target = [Token];
fn deref(&self) -> &Self::Target {
&self.raw
}
}
/// A token that encloses a given offset or ends exactly at it.
#[derive(Debug, Clone)]
pub enum TokenAt {
/// There's no token at the given offset
None,
/// There's a single token at the given offset.
Single(Token),
/// The offset falls exactly between two tokens. E.g. `CURSOR` in `call<CURSOR>(arguments)` is
/// positioned exactly between the `call` and `(` tokens.
Between(Token, Token),
}
impl Iterator for TokenAt {
type Item = Token;
fn next(&mut self) -> Option<Self::Item> {
match *self {
TokenAt::None => None,
TokenAt::Single(token) => {
*self = TokenAt::None;
Some(token)
}
TokenAt::Between(first, second) => {
*self = TokenAt::Single(second);
Some(first)
}
}
}
}
impl FusedIterator for TokenAt {}
impl From<&Tokens> for CommentRanges {
fn from(tokens: &Tokens) -> Self {
let mut ranges = vec![];
for token in tokens {
if token.kind() == TokenKind::Comment {
ranges.push(token.range());
}
}
CommentRanges::new(ranges)
}
}
/// An iterator over the [`Token`]s with context.
///
/// This struct is created by the [`iter_with_context`] method on [`Tokens`]. Refer to its
/// documentation for more details.
///
/// [`iter_with_context`]: Tokens::iter_with_context
#[derive(Debug, Clone)]
pub struct TokenIterWithContext<'a> {
inner: std::slice::Iter<'a, Token>,
nesting: u32,
}
impl<'a> TokenIterWithContext<'a> {
fn new(tokens: &'a [Token]) -> TokenIterWithContext<'a> {
TokenIterWithContext {
inner: tokens.iter(),
nesting: 0,
}
}
/// Return the nesting level the iterator is currently in.
pub const fn nesting(&self) -> u32 {
self.nesting
}
/// Returns `true` if the iterator is within a parenthesized context.
pub const fn in_parenthesized_context(&self) -> bool {
self.nesting > 0
}
/// Returns the next [`Token`] in the iterator without consuming it.
pub fn peek(&self) -> Option<&'a Token> {
self.clone().next()
}
}
impl<'a> Iterator for TokenIterWithContext<'a> {
type Item = &'a Token;
fn next(&mut self) -> Option<Self::Item> {
let token = self.inner.next()?;
match token.kind() {
TokenKind::Lpar | TokenKind::Lbrace | TokenKind::Lsqb => self.nesting += 1,
TokenKind::Rpar | TokenKind::Rbrace | TokenKind::Rsqb => {
self.nesting = self.nesting.saturating_sub(1);
}
// This mimics the behavior of re-lexing which reduces the nesting level on the lexer.
// We don't need to reduce it by 1 because unlike the lexer we see the final token
// after recovering from every unclosed parenthesis.
TokenKind::Newline if self.nesting > 0 => {
self.nesting = 0;
}
_ => {}
}
Some(token)
}
}
impl FusedIterator for TokenIterWithContext<'_> {}
#[cfg(test)]
mod tests {
use std::ops::Range;
use ruff_text_size::TextSize;
use crate::token::{Token, TokenFlags, TokenKind};
use super::*;
/// Test case containing a "gap" between two tokens.
///
/// Code: <https://play.ruff.rs/a3658340-6df8-42c5-be80-178744bf1193>
const TEST_CASE_WITH_GAP: [(TokenKind, Range<u32>); 10] = [
(TokenKind::Def, 0..3),
(TokenKind::Name, 4..7),
(TokenKind::Lpar, 7..8),
(TokenKind::Rpar, 8..9),
(TokenKind::Colon, 9..10),
(TokenKind::Newline, 10..11),
// Gap ||..||
(TokenKind::Comment, 15..24),
(TokenKind::NonLogicalNewline, 24..25),
(TokenKind::Indent, 25..29),
(TokenKind::Pass, 29..33),
// No newline at the end to keep the token set full of unique tokens
];
/// Helper function to create [`Tokens`] from an iterator of (kind, range).
fn new_tokens(tokens: impl Iterator<Item = (TokenKind, Range<u32>)>) -> Tokens {
Tokens::new(
tokens
.map(|(kind, range)| {
Token::new(
kind,
TextRange::new(TextSize::new(range.start), TextSize::new(range.end)),
TokenFlags::empty(),
)
})
.collect(),
)
}
#[test]
fn tokens_after_offset_at_token_start() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let after = tokens.after(TextSize::new(8));
assert_eq!(after.len(), 7);
assert_eq!(after.first().unwrap().kind(), TokenKind::Rpar);
}
#[test]
fn tokens_after_offset_at_token_end() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let after = tokens.after(TextSize::new(11));
assert_eq!(after.len(), 4);
assert_eq!(after.first().unwrap().kind(), TokenKind::Comment);
}
#[test]
fn tokens_after_offset_between_tokens() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let after = tokens.after(TextSize::new(13));
assert_eq!(after.len(), 4);
assert_eq!(after.first().unwrap().kind(), TokenKind::Comment);
}
#[test]
fn tokens_after_offset_at_last_token_end() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let after = tokens.after(TextSize::new(33));
assert_eq!(after.len(), 0);
}
#[test]
#[should_panic(expected = "Offset 5 is inside a token range 4..7")]
fn tokens_after_offset_inside_token() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
tokens.after(TextSize::new(5));
}
#[test]
fn tokens_before_offset_at_first_token_start() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(0));
assert_eq!(before.len(), 0);
}
#[test]
fn tokens_before_offset_after_first_token_gap() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(3));
assert_eq!(before.len(), 1);
assert_eq!(before.last().unwrap().kind(), TokenKind::Def);
}
#[test]
fn tokens_before_offset_at_second_token_start() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(4));
assert_eq!(before.len(), 1);
assert_eq!(before.last().unwrap().kind(), TokenKind::Def);
}
#[test]
fn tokens_before_offset_at_token_start() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(8));
assert_eq!(before.len(), 3);
assert_eq!(before.last().unwrap().kind(), TokenKind::Lpar);
}
#[test]
fn tokens_before_offset_at_token_end() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(11));
assert_eq!(before.len(), 6);
assert_eq!(before.last().unwrap().kind(), TokenKind::Newline);
}
#[test]
fn tokens_before_offset_between_tokens() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(13));
assert_eq!(before.len(), 6);
assert_eq!(before.last().unwrap().kind(), TokenKind::Newline);
}
#[test]
fn tokens_before_offset_at_last_token_end() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(33));
assert_eq!(before.len(), 10);
assert_eq!(before.last().unwrap().kind(), TokenKind::Pass);
}
#[test]
#[should_panic(expected = "Offset 5 is inside a token range 4..7")]
fn tokens_before_offset_inside_token() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
tokens.before(TextSize::new(5));
}
#[test]
fn tokens_in_range_at_token_offset() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let in_range = tokens.in_range(TextRange::new(4.into(), 10.into()));
assert_eq!(in_range.len(), 4);
assert_eq!(in_range.first().unwrap().kind(), TokenKind::Name);
assert_eq!(in_range.last().unwrap().kind(), TokenKind::Colon);
}
#[test]
fn tokens_in_range_start_offset_at_token_end() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let in_range = tokens.in_range(TextRange::new(11.into(), 29.into()));
assert_eq!(in_range.len(), 3);
assert_eq!(in_range.first().unwrap().kind(), TokenKind::Comment);
assert_eq!(in_range.last().unwrap().kind(), TokenKind::Indent);
}
#[test]
fn tokens_in_range_end_offset_at_token_start() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let in_range = tokens.in_range(TextRange::new(8.into(), 15.into()));
assert_eq!(in_range.len(), 3);
assert_eq!(in_range.first().unwrap().kind(), TokenKind::Rpar);
assert_eq!(in_range.last().unwrap().kind(), TokenKind::Newline);
}
#[test]
fn tokens_in_range_start_offset_between_tokens() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let in_range = tokens.in_range(TextRange::new(13.into(), 29.into()));
assert_eq!(in_range.len(), 3);
assert_eq!(in_range.first().unwrap().kind(), TokenKind::Comment);
assert_eq!(in_range.last().unwrap().kind(), TokenKind::Indent);
}
#[test]
fn tokens_in_range_end_offset_between_tokens() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let in_range = tokens.in_range(TextRange::new(9.into(), 13.into()));
assert_eq!(in_range.len(), 2);
assert_eq!(in_range.first().unwrap().kind(), TokenKind::Colon);
assert_eq!(in_range.last().unwrap().kind(), TokenKind::Newline);
}
#[test]
#[should_panic(expected = "Offset 5 is inside a token range 4..7")]
fn tokens_in_range_start_offset_inside_token() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
tokens.in_range(TextRange::new(5.into(), 10.into()));
}
#[test]
#[should_panic(expected = "Offset 6 is inside a token range 4..7")]
fn tokens_in_range_end_offset_inside_token() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
tokens.in_range(TextRange::new(0.into(), 6.into()));
}
}
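
Beyond these unit tests, the same slicing API works on a real token stream. A minimal sketch, assuming `ruff_python_parser::parse_module` continues to expose the relocated `Tokens` via `Parsed::tokens()` (as the integration tests further down rely on):

use ruff_python_parser::parse_module;
use ruff_text_size::{TextRange, TextSize};

fn main() {
    let source = "x = 1\ny = 2\n";
    let parsed = parse_module(source).expect("valid source");
    let tokens = parsed.tokens();

    // Offset 6 is the start of `y` and offset 12 is the end of the trailing
    // newline, so both offsets sit on token boundaries as `in_range` requires.
    let second_line = tokens.in_range(TextRange::new(TextSize::new(6), TextSize::new(12)));
    for token in second_line {
        println!("{:?}", token.kind()); // Name, Equal, Int, Newline
    }
}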

View File

@@ -0,0 +1,199 @@
//! Tests for [`ruff_python_ast::token::parentheses_iterator`] and
//! [`ruff_python_ast::token::parenthesized_range`].
use ruff_python_ast::{
self as ast, Expr,
token::{parentheses_iterator, parenthesized_range},
};
use ruff_python_parser::parse_module;
#[test]
fn test_no_parentheses() {
let source = "x = 2 + 2";
let parsed = parse_module(source).expect("should parse valid python");
let tokens = parsed.tokens();
let module = parsed.syntax();
let stmt = module.body.first().expect("module should have a statement");
let ast::Stmt::Assign(assign) = stmt else {
panic!("expected `Assign` statement, got {stmt:?}");
};
let result = parenthesized_range(assign.value.as_ref().into(), stmt.into(), tokens);
assert_eq!(result, None);
}
#[test]
fn test_single_parentheses() {
let source = "x = (2 + 2)";
let parsed = parse_module(source).expect("should parse valid python");
let tokens = parsed.tokens();
let module = parsed.syntax();
let stmt = module.body.first().expect("module should have a statement");
let ast::Stmt::Assign(assign) = stmt else {
panic!("expected `Assign` statement, got {stmt:?}");
};
let result = parenthesized_range(assign.value.as_ref().into(), stmt.into(), tokens);
let range = result.expect("should find parentheses");
assert_eq!(&source[range], "(2 + 2)");
}
#[test]
fn test_double_parentheses() {
let source = "x = ((2 + 2))";
let parsed = parse_module(source).expect("should parse valid python");
let tokens = parsed.tokens();
let module = parsed.syntax();
let stmt = module.body.first().expect("module should have a statement");
let ast::Stmt::Assign(assign) = stmt else {
panic!("expected `Assign` statement, got {stmt:?}");
};
let result = parenthesized_range(assign.value.as_ref().into(), stmt.into(), tokens);
let range = result.expect("should find parentheses");
assert_eq!(&source[range], "((2 + 2))");
}
#[test]
fn test_parentheses_with_whitespace() {
let source = "x = ( 2 + 2 )";
let parsed = parse_module(source).expect("should parse valid python");
let tokens = parsed.tokens();
let module = parsed.syntax();
let stmt = module.body.first().expect("module should have a statement");
let ast::Stmt::Assign(assign) = stmt else {
panic!("expected `Assign` statement, got {stmt:?}");
};
let result = parenthesized_range(assign.value.as_ref().into(), stmt.into(), tokens);
let range = result.expect("should find parentheses");
assert_eq!(&source[range], "( 2 + 2 )");
}
#[test]
fn test_parentheses_with_comments() {
let source = "x = ( # comment\n 2 + 2\n)";
let parsed = parse_module(source).expect("should parse valid python");
let tokens = parsed.tokens();
let module = parsed.syntax();
let stmt = module.body.first().expect("module should have a statement");
let ast::Stmt::Assign(assign) = stmt else {
panic!("expected `Assign` statement, got {stmt:?}");
};
let result = parenthesized_range(assign.value.as_ref().into(), stmt.into(), tokens);
let range = result.expect("should find parentheses");
assert_eq!(&source[range], "( # comment\n 2 + 2\n)");
}
#[test]
fn test_parenthesized_range_multiple() {
let source = "x = (((2 + 2)))";
let parsed = parse_module(source).expect("should parse valid python");
let tokens = parsed.tokens();
let module = parsed.syntax();
let stmt = module.body.first().expect("module should have a statement");
let ast::Stmt::Assign(assign) = stmt else {
panic!("expected `Assign` statement, got {stmt:?}");
};
let result = parenthesized_range(assign.value.as_ref().into(), stmt.into(), tokens);
let range = result.expect("should find parentheses");
assert_eq!(&source[range], "(((2 + 2)))");
}
#[test]
fn test_parentheses_iterator_multiple() {
let source = "x = (((2 + 2)))";
let parsed = parse_module(source).expect("should parse valid python");
let tokens = parsed.tokens();
let module = parsed.syntax();
let stmt = module.body.first().expect("module should have a statement");
let ast::Stmt::Assign(assign) = stmt else {
panic!("expected `Assign` statement, got {stmt:?}");
};
let ranges: Vec<_> =
parentheses_iterator(assign.value.as_ref().into(), Some(stmt.into()), tokens).collect();
assert_eq!(ranges.len(), 3);
assert_eq!(&source[ranges[0]], "(2 + 2)");
assert_eq!(&source[ranges[1]], "((2 + 2))");
assert_eq!(&source[ranges[2]], "(((2 + 2)))");
}
#[test]
fn test_call_arguments_not_counted() {
let source = "f(x)";
let parsed = parse_module(source).expect("should parse valid python");
let tokens = parsed.tokens();
let module = parsed.syntax();
let stmt = module.body.first().expect("module should have a statement");
let ast::Stmt::Expr(expr_stmt) = stmt else {
panic!("expected `Expr` statement, got {stmt:?}");
};
let Expr::Call(call) = expr_stmt.value.as_ref() else {
panic!("expected Call expression, got {:?}", expr_stmt.value);
};
let arg = call
.arguments
.args
.first()
.expect("call should have an argument");
let result = parenthesized_range(arg.into(), (&call.arguments).into(), tokens);
// The parentheses belong to the call, not the argument
assert_eq!(result, None);
}
#[test]
fn test_call_with_parenthesized_argument() {
let source = "f((x))";
let parsed = parse_module(source).expect("should parse valid python");
let tokens = parsed.tokens();
let module = parsed.syntax();
let stmt = module.body.first().expect("module should have a statement");
let ast::Stmt::Expr(expr_stmt) = stmt else {
panic!("expected Expr statement, got {stmt:?}");
};
let Expr::Call(call) = expr_stmt.value.as_ref() else {
panic!("expected `Call` expression, got {:?}", expr_stmt.value);
};
let arg = call
.arguments
.args
.first()
.expect("call should have an argument");
let result = parenthesized_range(arg.into(), (&call.arguments).into(), tokens);
let range = result.expect("should find parentheses around argument");
assert_eq!(&source[range], "(x)");
}
#[test]
fn test_multiline_with_parentheses() {
let source = "x = (\n 2 + 2 + 2\n)";
let parsed = parse_module(source).expect("should parse valid python");
let tokens = parsed.tokens();
let module = parsed.syntax();
let stmt = module.body.first().expect("module should have a statement");
let ast::Stmt::Assign(assign) = stmt else {
panic!("expected `Assign` statement, got {stmt:?}");
};
let result = parenthesized_range(assign.value.as_ref().into(), stmt.into(), tokens);
let range = result.expect("should find parentheses");
assert_eq!(&source[range], "(\n 2 + 2 + 2\n)");
}

View File

@@ -5,7 +5,7 @@ use std::cell::OnceCell;
use std::ops::Deref;
use ruff_python_ast::str::Quote;
use ruff_python_parser::{Token, TokenKind, Tokens};
use ruff_python_ast::token::{Token, TokenKind, Tokens};
use ruff_source_file::{LineEnding, LineRanges, find_newline};
use ruff_text_size::Ranged;

View File

@@ -3,7 +3,7 @@ use std::ops::{Deref, DerefMut};
use ruff_formatter::{Buffer, FormatContext, GroupId, IndentWidth, SourceCode};
use ruff_python_ast::str::Quote;
use ruff_python_parser::Tokens;
use ruff_python_ast::token::Tokens;
use crate::PyFormatOptions;
use crate::comments::Comments;

View File

@@ -5,7 +5,7 @@ use std::slice::Iter;
use ruff_formatter::{FormatError, write};
use ruff_python_ast::AnyNodeRef;
use ruff_python_ast::Stmt;
use ruff_python_parser::{self as parser, TokenKind};
use ruff_python_ast::token::{Token as AstToken, TokenKind};
use ruff_python_trivia::lines_before;
use ruff_source_file::LineRanges;
use ruff_text_size::{Ranged, TextRange, TextSize};
@@ -770,7 +770,7 @@ impl Format<PyFormatContext<'_>> for FormatVerbatimStatementRange {
}
struct LogicalLinesIter<'a> {
tokens: Iter<'a, parser::Token>,
tokens: Iter<'a, AstToken>,
// The end of the last logical line
last_line_end: TextSize,
// The position where the content to lex ends.
@@ -778,7 +778,7 @@ struct LogicalLinesIter<'a> {
}
impl<'a> LogicalLinesIter<'a> {
fn new(tokens: Iter<'a, parser::Token>, verbatim_range: TextRange) -> Self {
fn new(tokens: Iter<'a, AstToken>, verbatim_range: TextRange) -> Self {
Self {
tokens,
last_line_end: verbatim_range.start(),

View File

@@ -14,7 +14,6 @@ license = { workspace = true }
ruff_diagnostics = { workspace = true }
ruff_python_ast = { workspace = true }
ruff_python_codegen = { workspace = true }
ruff_python_parser = { workspace = true }
ruff_python_trivia = { workspace = true }
ruff_source_file = { workspace = true, features = ["serde"] }
ruff_text_size = { workspace = true }
@@ -22,6 +21,8 @@ ruff_text_size = { workspace = true }
anyhow = { workspace = true }
[dev-dependencies]
ruff_python_parser = { workspace = true }
insta = { workspace = true }
[features]

View File

@@ -5,8 +5,8 @@ use std::ops::Add;
use ruff_diagnostics::Edit;
use ruff_python_ast::Stmt;
use ruff_python_ast::helpers::is_docstring_stmt;
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_codegen::Stylist;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_trivia::is_python_whitespace;
use ruff_python_trivia::{PythonWhitespace, textwrap::indent};
use ruff_source_file::{LineRanges, UniversalNewlineIterator};
@@ -194,7 +194,7 @@ impl<'a> Insertion<'a> {
tokens
.before(at)
.last()
.map(ruff_python_parser::Token::kind),
.map(ruff_python_ast::token::Token::kind),
Some(TokenKind::Import)
) {
return None;

View File

@@ -15,12 +15,12 @@ doctest = false
[dependencies]
ruff_python_ast = { workspace = true }
ruff_python_parser = { workspace = true }
ruff_python_trivia = { workspace = true }
ruff_source_file = { workspace = true }
ruff_text_size = { workspace = true }
[dev-dependencies]
ruff_python_parser = { workspace = true }
[lints]
workspace = true

View File

@@ -2,7 +2,7 @@
//! are omitted from the AST (e.g., commented lines).
use ruff_python_ast::Stmt;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_ast::token::{TokenKind, Tokens};
use ruff_python_trivia::{
CommentRanges, has_leading_content, has_trailing_content, is_python_whitespace,
};

View File

@@ -1,6 +1,6 @@
use std::collections::BTreeMap;
use ruff_python_parser::{Token, TokenKind};
use ruff_python_ast::token::{Token, TokenKind};
use ruff_text_size::{Ranged, TextRange, TextSize};
/// Stores the ranges of all interpolated strings in a file sorted by [`TextRange::start`].

View File

@@ -1,4 +1,4 @@
use ruff_python_parser::{Token, TokenKind};
use ruff_python_ast::token::{Token, TokenKind};
use ruff_text_size::{Ranged, TextRange};
/// Stores the range of all multiline strings in a file sorted by

View File

@@ -0,0 +1,3 @@
class C[T = int, U]: ...
class C[T1, T2 = int, T3, T4]: ...
type Alias[T = int, U] = ...

View File

@@ -1,9 +1,10 @@
use std::fmt::{self, Display};
use ruff_python_ast::PythonVersion;
use ruff_python_ast::token::TokenKind;
use ruff_text_size::{Ranged, TextRange};
use crate::{TokenKind, string::InterpolatedStringKind};
use crate::string::InterpolatedStringKind;
/// Represents errors that occur during parsing and are
/// returned by the `parse_*` functions.

View File

@@ -14,6 +14,7 @@ use unicode_normalization::UnicodeNormalization;
use ruff_python_ast::name::Name;
use ruff_python_ast::str_prefix::{AnyStringPrefix, StringLiteralPrefix};
use ruff_python_ast::token::{TokenFlags, TokenKind};
use ruff_python_ast::{Int, IpyEscapeKind, StringFlags};
use ruff_python_trivia::is_python_whitespace;
use ruff_text_size::{TextLen, TextRange, TextSize};
@@ -26,7 +27,7 @@ use crate::lexer::interpolated_string::{
InterpolatedStringContext, InterpolatedStrings, InterpolatedStringsCheckpoint,
};
use crate::string::InterpolatedStringKind;
use crate::token::{TokenFlags, TokenKind, TokenValue};
use crate::token::TokenValue;
mod cursor;
mod indentation;

View File

@@ -63,23 +63,20 @@
//! [lexical analysis]: https://en.wikipedia.org/wiki/Lexical_analysis
//! [parsing]: https://en.wikipedia.org/wiki/Parsing
//! [lexer]: crate::lexer
use std::iter::FusedIterator;
use std::ops::Deref;
pub use crate::error::{
InterpolatedStringErrorType, LexicalErrorType, ParseError, ParseErrorType,
UnsupportedSyntaxError, UnsupportedSyntaxErrorKind,
};
pub use crate::parser::ParseOptions;
pub use crate::token::{Token, TokenKind};
use crate::parser::Parser;
use ruff_python_ast::token::Tokens;
use ruff_python_ast::{
Expr, Mod, ModExpression, ModModule, PySourceType, StringFlags, StringLiteral, Suite,
};
use ruff_python_trivia::CommentRanges;
use ruff_text_size::{Ranged, TextRange, TextSize};
use ruff_text_size::{Ranged, TextRange};
mod error;
pub mod lexer;
@@ -473,351 +470,6 @@ impl Parsed<ModExpression> {
}
}
/// Tokens represents a vector of lexed [`Token`].
#[derive(Debug, Clone, PartialEq, Eq, get_size2::GetSize)]
pub struct Tokens {
raw: Vec<Token>,
}
impl Tokens {
pub(crate) fn new(tokens: Vec<Token>) -> Tokens {
Tokens { raw: tokens }
}
/// Returns an iterator over all the tokens that provides context.
pub fn iter_with_context(&self) -> TokenIterWithContext<'_> {
TokenIterWithContext::new(&self.raw)
}
/// Performs a binary search to find the index of the **first** token that starts at the given `offset`.
///
/// Unlike `binary_search_by_key`, this method ensures that if multiple tokens start at the same offset,
/// it returns the index of the first one. Multiple tokens can start at the same offset in cases where
/// zero-length tokens are involved (like `Dedent` or `Newline` at the end of the file).
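///
/// A hedged doctest sketch (editor's addition, not part of this diff): several
/// zero-length tokens (`Newline`, `Dedent`, `EndOfFile`) start at the very end
/// of the snippet below, and the search lands on the first of them.
///
/// ```
/// # use ruff_python_parser::parse_module;
/// # use ruff_text_size::{Ranged, TextSize};
/// let parsed = parse_module("if x:\n    pass").unwrap();
/// let tokens = parsed.tokens();
/// let index = tokens.binary_search_by_start(TextSize::new(14)).unwrap();
/// assert_eq!(tokens[index].start(), TextSize::new(14));
/// // The preceding token is `pass`, which starts earlier.
/// assert!(tokens[index - 1].start() < TextSize::new(14));
/// ```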
pub fn binary_search_by_start(&self, offset: TextSize) -> Result<usize, usize> {
let partition_point = self.partition_point(|token| token.start() < offset);
let after = &self[partition_point..];
if after.first().is_some_and(|first| first.start() == offset) {
Ok(partition_point)
} else {
Err(partition_point)
}
}
/// Returns a slice of [`Token`] that are within the given `range`.
///
/// The start and end offset of the given range should be either:
/// 1. Token boundary
/// 2. Gap between the tokens
///
/// For example, considering the following tokens and their corresponding ranges:
///
/// | Token | Range |
/// |---------------------|-----------|
/// | `Def` | `0..3` |
/// | `Name` | `4..7` |
/// | `Lpar` | `7..8` |
/// | `Rpar` | `8..9` |
/// | `Colon` | `9..10` |
/// | `Newline` | `10..11` |
/// | `Comment` | `15..24` |
/// | `NonLogicalNewline` | `24..25` |
/// | `Indent` | `25..29` |
/// | `Pass` | `29..33` |
///
/// Here, for (1) a token boundary is considered either the start or end offset of any of the
/// above tokens. For (2), the gap would be any offset between the `Newline` and `Comment`
/// tokens, i.e., offsets 12, 13, and 14.
///
/// Examples:
/// 1) `4..10` would give `Name`, `Lpar`, `Rpar`, `Colon`
/// 2) `11..25` would give `Comment`, `NonLogicalNewline`
/// 3) `12..25` would give the same as (2); offset 12 is in the "gap"
/// 4) `9..12` would give `Colon`, `Newline`; offset 12 is in the "gap"
/// 5) `18..27` would panic because both the start and end offsets are within a token
///
/// ## Note
///
/// The returned slice can contain the [`TokenKind::Unknown`] token if there was a lexical
/// error encountered within the given range.
///
/// # Panics
///
/// If either the start or end offset of the given range is within a token range.
pub fn in_range(&self, range: TextRange) -> &[Token] {
let tokens_after_start = self.after(range.start());
Self::before_impl(tokens_after_start, range.end())
}
/// Searches the token(s) at `offset`.
///
/// Returns [`TokenAt::Between`] if `offset` points directly in between two tokens
/// (the left token ends at `offset` and the right token starts at `offset`).
///
/// ## Examples
///
/// [Playground](https://play.ruff.rs/f3ad0a55-5931-4a13-96c7-b2b8bfdc9a2e?secondary=Tokens)
///
/// ```
/// # use ruff_python_ast::PySourceType;
/// # use ruff_python_parser::{Token, TokenAt, TokenKind};
/// # use ruff_text_size::{Ranged, TextSize};
///
/// let source = r#"
/// def test(arg):
/// arg.call()
/// if True:
/// pass
/// print("true")
/// "#.trim();
///
/// let parsed = ruff_python_parser::parse_unchecked_source(source, PySourceType::Python);
/// let tokens = parsed.tokens();
///
/// let collect_tokens = |offset: TextSize| {
/// tokens.at_offset(offset).into_iter().map(|t| (t.kind(), &source[t.range()])).collect::<Vec<_>>()
/// };
///
/// assert_eq!(collect_tokens(TextSize::new(4)), vec![(TokenKind::Name, "test")]);
/// assert_eq!(collect_tokens(TextSize::new(6)), vec![(TokenKind::Name, "test")]);
/// // between `arg` and `.`
/// assert_eq!(collect_tokens(TextSize::new(22)), vec![(TokenKind::Name, "arg"), (TokenKind::Dot, ".")]);
/// assert_eq!(collect_tokens(TextSize::new(36)), vec![(TokenKind::If, "if")]);
/// // Before the dedent token
/// assert_eq!(collect_tokens(TextSize::new(57)), vec![]);
/// ```
pub fn at_offset(&self, offset: TextSize) -> TokenAt {
match self.binary_search_by_start(offset) {
// The token at `index` starts exactly at `offset`.
// ```python
// object.attribute
// ^ OFFSET
// ```
Ok(index) => {
let token = self[index];
// `token` starts exactly at `offset`. Test if the offset is right between
// `token` and the previous token (if there's any)
if let Some(previous) = index.checked_sub(1).map(|idx| self[idx]) {
if previous.end() == offset {
return TokenAt::Between(previous, token);
}
}
TokenAt::Single(token)
}
// No token found that starts exactly at the given offset. But it's possible that
// the token starting before `offset` fully encloses `offset` (its end offset is after `offset`).
// ```python
// object.attribute
// ^ OFFSET
// # or
// if True:
// print("test")
// ^ OFFSET
// ```
Err(index) => {
if let Some(previous) = index.checked_sub(1).map(|idx| self[idx]) {
if previous.range().contains_inclusive(offset) {
return TokenAt::Single(previous);
}
}
TokenAt::None
}
}
}
/// Returns a slice of tokens before the given [`TextSize`] offset.
///
/// If the given offset is between two tokens, the returned slice will end just before the
/// following token. In other words, if the offset is between the end of the previous token
/// and the start of the next token, the returned slice will end just before the next token.
///
/// # Panics
///
/// If the given offset is inside a token range at any point
/// other than the start of the range.
pub fn before(&self, offset: TextSize) -> &[Token] {
Self::before_impl(&self.raw, offset)
}
fn before_impl(tokens: &[Token], offset: TextSize) -> &[Token] {
let partition_point = tokens.partition_point(|token| token.start() < offset);
let before = &tokens[..partition_point];
if let Some(last) = before.last() {
// If it's equal to the end offset, then it's at a token boundary which is
// valid. If it's greater than the end offset, then it's in the gap between
// the tokens which is valid as well.
assert!(
offset >= last.end(),
"Offset {:?} is inside a token range {:?}",
offset,
last.range()
);
}
before
}
/// Returns a slice of tokens after the given [`TextSize`] offset.
///
/// If the given offset is between two tokens, the returned slice will start from the following
/// token. In other words, if the offset is between the end of the previous token and the
/// start of the next token, the returned slice will start from the next token.
///
/// # Panics
///
/// If the given offset is inside a token range at any point
/// other than the start of the range.
pub fn after(&self, offset: TextSize) -> &[Token] {
let partition_point = self.partition_point(|token| token.end() <= offset);
let after = &self[partition_point..];
if let Some(first) = after.first() {
// If it's equal to the start offset, then it's at a token boundary which is
// valid. If it's less than the start offset, then it's in the gap between
// the tokens which is valid as well.
assert!(
offset <= first.start(),
"Offset {:?} is inside a token range {:?}",
offset,
first.range()
);
}
after
}
}
impl<'a> IntoIterator for &'a Tokens {
type Item = &'a Token;
type IntoIter = std::slice::Iter<'a, Token>;
fn into_iter(self) -> Self::IntoIter {
self.iter()
}
}
impl Deref for Tokens {
type Target = [Token];
fn deref(&self) -> &Self::Target {
&self.raw
}
}
/// A token that encloses a given offset or ends exactly at it.
#[derive(Debug, Clone)]
pub enum TokenAt {
/// There's no token at the given offset
None,
/// There's a single token at the given offset.
Single(Token),
/// The offset falls exactly between two tokens. E.g. `CURSOR` in `call<CURSOR>(arguments)` is
/// positioned exactly between the `call` and `(` tokens.
Between(Token, Token),
}
impl Iterator for TokenAt {
type Item = Token;
fn next(&mut self) -> Option<Self::Item> {
match *self {
TokenAt::None => None,
TokenAt::Single(token) => {
*self = TokenAt::None;
Some(token)
}
TokenAt::Between(first, second) => {
*self = TokenAt::Single(second);
Some(first)
}
}
}
}
impl FusedIterator for TokenAt {}
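// Editor's sketch (not part of this diff): the `Iterator` impl above yields the
// left token first, then the right one, so `Between` behaves like a two-element
// iterator and `Single` like a one-element one.
#[cfg(test)]
mod token_at_sketch {
    use super::*;
    use crate::token::TokenFlags;

    #[test]
    fn between_yields_left_then_right() {
        let left = Token::new(
            TokenKind::Name,
            TextRange::new(TextSize::new(0), TextSize::new(4)),
            TokenFlags::empty(),
        );
        let right = Token::new(
            TokenKind::Lpar,
            TextRange::new(TextSize::new(4), TextSize::new(5)),
            TokenFlags::empty(),
        );
        let mut at = TokenAt::Between(left, right);
        assert_eq!(at.next(), Some(left));
        assert_eq!(at.next(), Some(right));
        assert_eq!(at.next(), None);
    }
}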
impl From<&Tokens> for CommentRanges {
fn from(tokens: &Tokens) -> Self {
let mut ranges = vec![];
for token in tokens {
if token.kind() == TokenKind::Comment {
ranges.push(token.range());
}
}
CommentRanges::new(ranges)
}
}
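// Editor's sketch (not part of this diff): the conversion above makes collecting
// a file's comment ranges a one-liner on top of the token stream.
#[allow(dead_code)]
fn comment_ranges_of(source: &str) -> CommentRanges {
    let parsed = parse_module(source).expect("editor's sketch assumes parseable source");
    CommentRanges::from(parsed.tokens())
}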
/// An iterator over the [`Token`]s with context.
///
/// This struct is created by the [`iter_with_context`] method on [`Tokens`]. Refer to its
/// documentation for more details.
///
/// [`iter_with_context`]: Tokens::iter_with_context
#[derive(Debug, Clone)]
pub struct TokenIterWithContext<'a> {
inner: std::slice::Iter<'a, Token>,
nesting: u32,
}
impl<'a> TokenIterWithContext<'a> {
fn new(tokens: &'a [Token]) -> TokenIterWithContext<'a> {
TokenIterWithContext {
inner: tokens.iter(),
nesting: 0,
}
}
/// Return the nesting level the iterator is currently in.
pub const fn nesting(&self) -> u32 {
self.nesting
}
/// Returns `true` if the iterator is within a parenthesized context.
pub const fn in_parenthesized_context(&self) -> bool {
self.nesting > 0
}
/// Returns the next [`Token`] in the iterator without consuming it.
pub fn peek(&self) -> Option<&'a Token> {
self.clone().next()
}
}
impl<'a> Iterator for TokenIterWithContext<'a> {
type Item = &'a Token;
fn next(&mut self) -> Option<Self::Item> {
let token = self.inner.next()?;
match token.kind() {
TokenKind::Lpar | TokenKind::Lbrace | TokenKind::Lsqb => self.nesting += 1,
TokenKind::Rpar | TokenKind::Rbrace | TokenKind::Rsqb => {
self.nesting = self.nesting.saturating_sub(1);
}
// This mimics the behavior of re-lexing, which reduces the nesting level in the lexer.
// We don't need to reduce it by 1 because unlike the lexer we see the final token
// after recovering from every unclosed parenthesis.
TokenKind::Newline if self.nesting > 0 => {
self.nesting = 0;
}
_ => {}
}
Some(token)
}
}
impl FusedIterator for TokenIterWithContext<'_> {}
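// Editor's sketch (not part of this diff): `iter_with_context` saves callers
// from tracking bracket nesting by hand. Here it finds the first comma that
// sits at the top level of a statement rather than inside any brackets.
#[allow(dead_code)]
fn first_top_level_comma(source: &str) -> Option<TextRange> {
    let parsed = parse_module(source).ok()?;
    let mut iter = parsed.tokens().iter_with_context();
    // A `for` loop would move the iterator, so use `while let` to keep access
    // to `in_parenthesized_context` during iteration.
    while let Some(token) = iter.next() {
        if token.kind() == TokenKind::Comma && !iter.in_parenthesized_context() {
            return Some(token.range());
        }
    }
    None
}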
/// Controls the different modes by which a source file can be parsed.
///
/// The mode argument specifies in what way code must be parsed.
@@ -888,204 +540,3 @@ impl std::fmt::Display for ModeParseError {
write!(f, r#"mode must be "exec", "eval", "ipython", or "single""#)
}
}
#[cfg(test)]
mod tests {
use std::ops::Range;
use crate::token::TokenFlags;
use super::*;
/// Test case containing a "gap" between two tokens.
///
/// Code: <https://play.ruff.rs/a3658340-6df8-42c5-be80-178744bf1193>
const TEST_CASE_WITH_GAP: [(TokenKind, Range<u32>); 10] = [
(TokenKind::Def, 0..3),
(TokenKind::Name, 4..7),
(TokenKind::Lpar, 7..8),
(TokenKind::Rpar, 8..9),
(TokenKind::Colon, 9..10),
(TokenKind::Newline, 10..11),
// Gap ||..||
(TokenKind::Comment, 15..24),
(TokenKind::NonLogicalNewline, 24..25),
(TokenKind::Indent, 25..29),
(TokenKind::Pass, 29..33),
// No newline at the end to keep the token set full of unique tokens
];
/// Helper function to create [`Tokens`] from an iterator of (kind, range).
fn new_tokens(tokens: impl Iterator<Item = (TokenKind, Range<u32>)>) -> Tokens {
Tokens::new(
tokens
.map(|(kind, range)| {
Token::new(
kind,
TextRange::new(TextSize::new(range.start), TextSize::new(range.end)),
TokenFlags::empty(),
)
})
.collect(),
)
}
#[test]
fn tokens_after_offset_at_token_start() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let after = tokens.after(TextSize::new(8));
assert_eq!(after.len(), 7);
assert_eq!(after.first().unwrap().kind(), TokenKind::Rpar);
}
#[test]
fn tokens_after_offset_at_token_end() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let after = tokens.after(TextSize::new(11));
assert_eq!(after.len(), 4);
assert_eq!(after.first().unwrap().kind(), TokenKind::Comment);
}
#[test]
fn tokens_after_offset_between_tokens() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let after = tokens.after(TextSize::new(13));
assert_eq!(after.len(), 4);
assert_eq!(after.first().unwrap().kind(), TokenKind::Comment);
}
#[test]
fn tokens_after_offset_at_last_token_end() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let after = tokens.after(TextSize::new(33));
assert_eq!(after.len(), 0);
}
#[test]
#[should_panic(expected = "Offset 5 is inside a token range 4..7")]
fn tokens_after_offset_inside_token() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
tokens.after(TextSize::new(5));
}
#[test]
fn tokens_before_offset_at_first_token_start() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(0));
assert_eq!(before.len(), 0);
}
#[test]
fn tokens_before_offset_after_first_token_gap() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(3));
assert_eq!(before.len(), 1);
assert_eq!(before.last().unwrap().kind(), TokenKind::Def);
}
#[test]
fn tokens_before_offset_at_second_token_start() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(4));
assert_eq!(before.len(), 1);
assert_eq!(before.last().unwrap().kind(), TokenKind::Def);
}
#[test]
fn tokens_before_offset_at_token_start() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(8));
assert_eq!(before.len(), 3);
assert_eq!(before.last().unwrap().kind(), TokenKind::Lpar);
}
#[test]
fn tokens_before_offset_at_token_end() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(11));
assert_eq!(before.len(), 6);
assert_eq!(before.last().unwrap().kind(), TokenKind::Newline);
}
#[test]
fn tokens_before_offset_between_tokens() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(13));
assert_eq!(before.len(), 6);
assert_eq!(before.last().unwrap().kind(), TokenKind::Newline);
}
#[test]
fn tokens_before_offset_at_last_token_end() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let before = tokens.before(TextSize::new(33));
assert_eq!(before.len(), 10);
assert_eq!(before.last().unwrap().kind(), TokenKind::Pass);
}
#[test]
#[should_panic(expected = "Offset 5 is inside a token range 4..7")]
fn tokens_before_offset_inside_token() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
tokens.before(TextSize::new(5));
}
#[test]
fn tokens_in_range_at_token_offset() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let in_range = tokens.in_range(TextRange::new(4.into(), 10.into()));
assert_eq!(in_range.len(), 4);
assert_eq!(in_range.first().unwrap().kind(), TokenKind::Name);
assert_eq!(in_range.last().unwrap().kind(), TokenKind::Colon);
}
#[test]
fn tokens_in_range_start_offset_at_token_end() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let in_range = tokens.in_range(TextRange::new(11.into(), 29.into()));
assert_eq!(in_range.len(), 3);
assert_eq!(in_range.first().unwrap().kind(), TokenKind::Comment);
assert_eq!(in_range.last().unwrap().kind(), TokenKind::Indent);
}
#[test]
fn tokens_in_range_end_offset_at_token_start() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let in_range = tokens.in_range(TextRange::new(8.into(), 15.into()));
assert_eq!(in_range.len(), 3);
assert_eq!(in_range.first().unwrap().kind(), TokenKind::Rpar);
assert_eq!(in_range.last().unwrap().kind(), TokenKind::Newline);
}
#[test]
fn tokens_in_range_start_offset_between_tokens() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let in_range = tokens.in_range(TextRange::new(13.into(), 29.into()));
assert_eq!(in_range.len(), 3);
assert_eq!(in_range.first().unwrap().kind(), TokenKind::Comment);
assert_eq!(in_range.last().unwrap().kind(), TokenKind::Indent);
}
#[test]
fn tokens_in_range_end_offset_between_tokens() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
let in_range = tokens.in_range(TextRange::new(9.into(), 13.into()));
assert_eq!(in_range.len(), 2);
assert_eq!(in_range.first().unwrap().kind(), TokenKind::Colon);
assert_eq!(in_range.last().unwrap().kind(), TokenKind::Newline);
}
#[test]
#[should_panic(expected = "Offset 5 is inside a token range 4..7")]
fn tokens_in_range_start_offset_inside_token() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
tokens.in_range(TextRange::new(5.into(), 10.into()));
}
#[test]
#[should_panic(expected = "Offset 6 is inside a token range 4..7")]
fn tokens_in_range_end_offset_inside_token() {
let tokens = new_tokens(TEST_CASE_WITH_GAP.into_iter());
tokens.in_range(TextRange::new(0.into(), 6.into()));
}
}

View File

@@ -4,6 +4,7 @@ use bitflags::bitflags;
use rustc_hash::{FxBuildHasher, FxHashSet};
use ruff_python_ast::name::Name;
use ruff_python_ast::token::TokenKind;
use ruff_python_ast::{
self as ast, AnyStringFlags, AtomicNodeIndex, BoolOp, CmpOp, ConversionFlag, Expr, ExprContext,
FString, InterpolatedStringElement, InterpolatedStringElements, IpyEscapeKind, Number,
@@ -18,7 +19,7 @@ use crate::string::{
InterpolatedStringKind, StringType, parse_interpolated_string_literal_element,
parse_string_literal,
};
use crate::token::{TokenKind, TokenValue};
use crate::token::TokenValue;
use crate::token_set::TokenSet;
use crate::{
InterpolatedStringErrorType, Mode, ParseErrorType, UnsupportedSyntaxError,

View File

@@ -1,7 +1,8 @@
use ruff_python_ast::token::TokenKind;
use ruff_python_ast::{self as ast, CmpOp, Expr, ExprContext, Number};
use ruff_text_size::{Ranged, TextRange};
use crate::{TokenKind, error::RelaxedDecoratorError};
use crate::error::RelaxedDecoratorError;
/// Set the `ctx` for `Expr::Id`, `Expr::Attribute`, `Expr::Subscript`, `Expr::Starred`,
/// `Expr::Tuple` and `Expr::List`. If `expr` is either `Expr::Tuple` or `Expr::List`,

View File

@@ -2,6 +2,7 @@ use std::cmp::Ordering;
use bitflags::bitflags;
use ruff_python_ast::token::TokenKind;
use ruff_python_ast::{AtomicNodeIndex, Mod, ModExpression, ModModule};
use ruff_text_size::{Ranged, TextRange, TextSize};
@@ -12,7 +13,7 @@ use crate::string::InterpolatedStringKind;
use crate::token::TokenValue;
use crate::token_set::TokenSet;
use crate::token_source::{TokenSource, TokenSourceCheckpoint};
use crate::{Mode, ParseError, ParseErrorType, TokenKind, UnsupportedSyntaxErrorKind};
use crate::{Mode, ParseError, ParseErrorType, UnsupportedSyntaxErrorKind};
use crate::{Parsed, Tokens};
pub use crate::parser::options::ParseOptions;

View File

@@ -1,4 +1,5 @@
use ruff_python_ast::name::Name;
use ruff_python_ast::token::TokenKind;
use ruff_python_ast::{
self as ast, AtomicNodeIndex, Expr, ExprContext, Number, Operator, Pattern, Singleton,
};
@@ -7,7 +8,7 @@ use ruff_text_size::{Ranged, TextSize};
use crate::ParseErrorType;
use crate::parser::progress::ParserProgress;
use crate::parser::{Parser, RecoveryContextKind, SequenceMatchPatternParentheses, recovery};
use crate::token::{TokenKind, TokenValue};
use crate::token::TokenValue;
use crate::token_set::TokenSet;
use super::expression::ExpressionContext;

View File

@@ -2,6 +2,7 @@ use compact_str::CompactString;
use std::fmt::{Display, Write};
use ruff_python_ast::name::Name;
use ruff_python_ast::token::TokenKind;
use ruff_python_ast::{
self as ast, AtomicNodeIndex, ExceptHandler, Expr, ExprContext, IpyEscapeKind, Operator,
PythonVersion, Stmt, WithItem,
@@ -14,7 +15,7 @@ use crate::parser::progress::ParserProgress;
use crate::parser::{
FunctionKind, Parser, RecoveryContext, RecoveryContextKind, WithItemKind, helpers,
};
use crate::token::{TokenKind, TokenValue};
use crate::token::TokenValue;
use crate::token_set::TokenSet;
use crate::{Mode, ParseErrorType, UnsupportedSyntaxErrorKind};

View File

@@ -3,12 +3,13 @@
//! This checker is not responsible for traversing the AST itself. Instead, its
//! [`SemanticSyntaxChecker::visit_stmt`] and [`SemanticSyntaxChecker::visit_expr`] methods should
//! be called in a parent `Visitor`'s `visit_stmt` and `visit_expr` methods, respectively.
use ruff_python_ast::{
self as ast, Expr, ExprContext, IrrefutablePatternKind, Pattern, PythonVersion, Stmt, StmtExpr,
StmtImportFrom,
StmtFunctionDef, StmtImportFrom,
comparable::ComparableExpr,
helpers,
visitor::{Visitor, walk_expr},
visitor::{Visitor, walk_expr, walk_stmt},
};
use ruff_text_size::{Ranged, TextRange, TextSize};
use rustc_hash::{FxBuildHasher, FxHashSet};
@@ -144,11 +145,16 @@ impl SemanticSyntaxChecker {
}
}
}
Stmt::ClassDef(ast::StmtClassDef { type_params, .. })
| Stmt::TypeAlias(ast::StmtTypeAlias { type_params, .. }) => {
if let Some(type_params) = type_params {
Self::duplicate_type_parameter_name(type_params, ctx);
}
Stmt::ClassDef(ast::StmtClassDef {
type_params: Some(type_params),
..
})
| Stmt::TypeAlias(ast::StmtTypeAlias {
type_params: Some(type_params),
..
}) => {
Self::duplicate_type_parameter_name(type_params, ctx);
Self::type_parameter_default_order(type_params, ctx);
}
Stmt::Assign(ast::StmtAssign { targets, value, .. }) => {
if let [Expr::Starred(ast::ExprStarred { range, .. })] = targets.as_slice() {
@@ -611,6 +617,39 @@ impl SemanticSyntaxChecker {
}
}
fn type_parameter_default_order<Ctx: SemanticSyntaxContext>(
type_params: &ast::TypeParams,
ctx: &Ctx,
) {
let mut seen_default = false;
for type_param in type_params.iter() {
let has_default = match type_param {
ast::TypeParam::TypeVar(ast::TypeParamTypeVar { default, .. })
| ast::TypeParam::TypeVarTuple(ast::TypeParamTypeVarTuple { default, .. })
| ast::TypeParam::ParamSpec(ast::TypeParamParamSpec { default, .. }) => {
default.is_some()
}
};
if seen_default && !has_default {
// test_err type_parameter_default_order
// class C[T = int, U]: ...
// class C[T1, T2 = int, T3, T4]: ...
// type Alias[T = int, U] = ...
Self::add_error(
ctx,
SemanticSyntaxErrorKind::TypeParameterDefaultOrder(
type_param.name().id.to_string(),
),
type_param.range(),
);
}
if has_default {
seen_default = true;
}
}
}
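    // Editor's sketch (not part of this diff): the rule enforced above, reduced
    // to its essence: once one parameter has a default, every later parameter
    // must have one too, i.e. the defaulted parameters form a suffix of the list.
    #[allow(dead_code)]
    fn defaults_form_a_suffix(has_default: &[bool]) -> bool {
        let mut seen_default = false;
        for &param_has_default in has_default {
            if seen_default && !param_has_default {
                return false;
            }
            seen_default |= param_has_default;
        }
        true
    }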
fn duplicate_parameter_name<Ctx: SemanticSyntaxContext>(
parameters: &ast::Parameters,
ctx: &Ctx,
@@ -701,7 +740,21 @@ impl SemanticSyntaxChecker {
self.seen_futures_boundary = true;
}
}
Stmt::FunctionDef(_) => {
Stmt::FunctionDef(StmtFunctionDef { is_async, body, .. }) => {
if *is_async {
let mut visitor = ReturnVisitor::default();
visitor.visit_body(body);
if visitor.has_yield {
if let Some(return_range) = visitor.return_range {
Self::add_error(
ctx,
SemanticSyntaxErrorKind::ReturnInGenerator,
return_range,
);
}
}
}
self.seen_futures_boundary = true;
}
_ => {
@@ -896,7 +949,7 @@ impl SemanticSyntaxChecker {
// This check is required in addition to avoiding calling this function in `visit_expr`
// because the generator scope applies to nested parts of the `Expr::Generator` that are
// visited separately.
if ctx.in_generator_scope() {
if ctx.in_generator_context() {
return;
}
Self::add_error(
@@ -1066,6 +1119,12 @@ impl Display for SemanticSyntaxError {
SemanticSyntaxErrorKind::DuplicateTypeParameter => {
f.write_str("duplicate type parameter")
}
SemanticSyntaxErrorKind::TypeParameterDefaultOrder(name) => {
write!(
f,
"non default type parameter `{name}` follows default type parameter"
)
}
SemanticSyntaxErrorKind::MultipleCaseAssignment(name) => {
write!(f, "multiple assignments to name `{name}` in pattern")
}
@@ -1169,6 +1228,9 @@ impl Display for SemanticSyntaxError {
SemanticSyntaxErrorKind::NonlocalWithoutBinding(name) => {
write!(f, "no binding for nonlocal `{name}` found")
}
SemanticSyntaxErrorKind::ReturnInGenerator => {
write!(f, "`return` with value in async generator")
}
}
}
}
@@ -1572,6 +1634,12 @@ pub enum SemanticSyntaxErrorKind {
/// Represents a nonlocal statement for a name that has no binding in an enclosing scope.
NonlocalWithoutBinding(String),
/// Represents a default type parameter followed by a non-default type parameter.
TypeParameterDefaultOrder(String),
/// Represents a `return` statement with a value in an asynchronous generator.
ReturnInGenerator,
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, get_size2::GetSize)]
@@ -1688,6 +1756,40 @@ impl Visitor<'_> for ReboundComprehensionVisitor<'_> {
}
}
#[derive(Default)]
struct ReturnVisitor {
return_range: Option<TextRange>,
has_yield: bool,
}
impl Visitor<'_> for ReturnVisitor {
fn visit_stmt(&mut self, stmt: &Stmt) {
match stmt {
// Do not recurse into nested functions; they're evaluated separately.
Stmt::FunctionDef(_) | Stmt::ClassDef(_) => {}
Stmt::Return(ast::StmtReturn {
value: Some(_),
range,
..
}) => {
self.return_range = Some(*range);
walk_stmt(self, stmt);
}
_ => walk_stmt(self, stmt),
}
}
fn visit_expr(&mut self, expr: &Expr) {
match expr {
Expr::Lambda(_) => {}
Expr::Yield(_) | Expr::YieldFrom(_) => {
self.has_yield = true;
}
_ => walk_expr(self, expr),
}
}
}
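// Editor's sketch (not part of this diff): driving `ReturnVisitor` directly
// over the body of an async generator; assumes this module lives next to
// `crate::parse_module` in `ruff_python_parser`.
#[cfg(test)]
mod return_visitor_sketch {
    use super::*;

    #[test]
    fn flags_return_with_value_in_async_generator() {
        let parsed = crate::parse_module("async def g():\n    yield 1\n    return 2\n").unwrap();
        let Some(Stmt::FunctionDef(func)) = parsed.syntax().body.first() else {
            panic!("expected a function definition");
        };
        let mut visitor = ReturnVisitor::default();
        visitor.visit_body(&func.body);
        assert!(visitor.has_yield);
        assert!(visitor.return_range.is_some());
    }
}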
struct MatchPatternVisitor<'a, Ctx> {
names: FxHashSet<&'a ast::name::Name>,
ctx: &'a Ctx,
@@ -2096,11 +2198,11 @@ pub trait SemanticSyntaxContext {
/// Returns `true` if the visitor is in a function scope.
fn in_function_scope(&self) -> bool;
/// Returns `true` if the visitor is in a generator scope.
/// Returns `true` if the visitor is within a generator scope.
///
/// Note that this refers to an `Expr::Generator` precisely, not to comprehensions more
/// generally.
fn in_generator_scope(&self) -> bool;
fn in_generator_context(&self) -> bool;
/// Returns `true` if the source file is a Jupyter notebook.
fn in_notebook(&self) -> bool;

View File

@@ -3,13 +3,11 @@
use bstr::ByteSlice;
use std::fmt;
use ruff_python_ast::token::TokenKind;
use ruff_python_ast::{self as ast, AnyStringFlags, AtomicNodeIndex, Expr, StringFlags};
use ruff_text_size::{Ranged, TextRange, TextSize};
use crate::{
TokenKind,
error::{LexicalError, LexicalErrorType},
};
use crate::error::{LexicalError, LexicalErrorType};
#[derive(Debug)]
pub(crate) enum StringType {

View File

@@ -1,848 +1,4 @@
//! Token kinds for Python source code, created by the lexer and consumed by `ruff_python_parser`.
//!
//! This module defines the tokens that the lexer recognizes. The tokens are
//! loosely based on the token definitions found in the [CPython source].
//!
//! [CPython source]: https://github.com/python/cpython/blob/dfc2e065a2e71011017077e549cd2f9bf4944c54/Grammar/Tokens
use std::fmt;
use bitflags::bitflags;
use ruff_python_ast::name::Name;
use ruff_python_ast::str::{Quote, TripleQuotes};
use ruff_python_ast::str_prefix::{
AnyStringPrefix, ByteStringPrefix, FStringPrefix, StringLiteralPrefix, TStringPrefix,
};
use ruff_python_ast::{AnyStringFlags, BoolOp, Int, IpyEscapeKind, Operator, StringFlags, UnaryOp};
use ruff_text_size::{Ranged, TextRange};
#[derive(Clone, Copy, PartialEq, Eq, get_size2::GetSize)]
pub struct Token {
/// The kind of the token.
kind: TokenKind,
/// The range of the token.
range: TextRange,
/// The set of flags describing this token.
flags: TokenFlags,
}
impl Token {
pub(crate) fn new(kind: TokenKind, range: TextRange, flags: TokenFlags) -> Token {
Self { kind, range, flags }
}
/// Returns the token kind.
#[inline]
pub const fn kind(&self) -> TokenKind {
self.kind
}
/// Returns the token as a tuple of (kind, range).
#[inline]
pub const fn as_tuple(&self) -> (TokenKind, TextRange) {
(self.kind, self.range)
}
/// Returns `true` if the current token is a triple-quoted string of any kind.
///
/// # Panics
///
/// If it isn't a string or an f/t-string token.
pub fn is_triple_quoted_string(self) -> bool {
self.unwrap_string_flags().is_triple_quoted()
}
/// Returns the [`Quote`] style for the current string token of any kind.
///
/// # Panics
///
/// If it isn't a string or an f/t-string token.
pub fn string_quote_style(self) -> Quote {
self.unwrap_string_flags().quote_style()
}
/// Returns the [`AnyStringFlags`] style for the current string token of any kind.
///
/// # Panics
///
/// If it isn't a string or an f/t-string token.
pub fn unwrap_string_flags(self) -> AnyStringFlags {
self.string_flags()
.unwrap_or_else(|| panic!("expected token to be a string"))
}
/// Returns the [`AnyStringFlags`] for the current token if it is a string of any kind, otherwise [`None`].
pub fn string_flags(self) -> Option<AnyStringFlags> {
if self.is_any_string() {
Some(self.flags.as_any_string_flags())
} else {
None
}
}
/// Returns `true` if this is any kind of string token - including
/// tokens in t-strings (which do not have type `str`).
const fn is_any_string(self) -> bool {
matches!(
self.kind,
TokenKind::String
| TokenKind::FStringStart
| TokenKind::FStringMiddle
| TokenKind::FStringEnd
| TokenKind::TStringStart
| TokenKind::TStringMiddle
| TokenKind::TStringEnd
)
}
}
impl Ranged for Token {
fn range(&self) -> TextRange {
self.range
}
}
impl fmt::Debug for Token {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "{:?} {:?}", self.kind, self.range)?;
if !self.flags.is_empty() {
f.write_str(" (flags = ")?;
let mut first = true;
for (name, _) in self.flags.iter_names() {
if first {
first = false;
} else {
f.write_str(" | ")?;
}
f.write_str(name)?;
}
f.write_str(")")?;
}
Ok(())
}
}
/// A kind of a token.
#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug, PartialOrd, Ord, get_size2::GetSize)]
pub enum TokenKind {
/// Token kind for a name, commonly known as an identifier.
Name,
/// Token kind for an integer.
Int,
/// Token kind for a floating point number.
Float,
/// Token kind for a complex number.
Complex,
/// Token kind for a string.
String,
/// Token kind for the start of an f-string. This includes the `f`/`F`/`fr` prefix
/// and the opening quote(s).
FStringStart,
/// Token kind that includes the portion of text inside the f-string that's not
/// part of the expression part and isn't an opening or closing brace.
FStringMiddle,
/// Token kind for the end of an f-string. This includes the closing quote.
FStringEnd,
/// Token kind for the start of a t-string. This includes the `t`/`T`/`tr` prefix
/// and the opening quote(s).
TStringStart,
/// Token kind that includes the portion of text inside the t-string that's not
/// part of the interpolation part and isn't an opening or closing brace.
TStringMiddle,
/// Token kind for the end of a t-string. This includes the closing quote.
TStringEnd,
/// Token kind for an IPython escape command.
IpyEscapeCommand,
/// Token kind for a comment. These are filtered out of the token stream prior to parsing.
Comment,
/// Token kind for a newline.
Newline,
/// Token kind for a newline that is not a logical line break. These are filtered out of
/// the token stream prior to parsing.
NonLogicalNewline,
/// Token kind for an indent.
Indent,
/// Token kind for a dedent.
Dedent,
EndOfFile,
/// Token kind for a question mark `?`.
Question,
/// Token kind for an exclamation mark `!`.
Exclamation,
/// Token kind for a left parenthesis `(`.
Lpar,
/// Token kind for a right parenthesis `)`.
Rpar,
/// Token kind for a left square bracket `[`.
Lsqb,
/// Token kind for a right square bracket `]`.
Rsqb,
/// Token kind for a colon `:`.
Colon,
/// Token kind for a comma `,`.
Comma,
/// Token kind for a semicolon `;`.
Semi,
/// Token kind for plus `+`.
Plus,
/// Token kind for minus `-`.
Minus,
/// Token kind for star `*`.
Star,
/// Token kind for slash `/`.
Slash,
/// Token kind for vertical bar `|`.
Vbar,
/// Token kind for ampersand `&`.
Amper,
/// Token kind for less than `<`.
Less,
/// Token kind for greater than `>`.
Greater,
/// Token kind for equal `=`.
Equal,
/// Token kind for dot `.`.
Dot,
/// Token kind for percent `%`.
Percent,
/// Token kind for a left brace `{`.
Lbrace,
/// Token kind for a right brace `}`.
Rbrace,
/// Token kind for double equal `==`.
EqEqual,
/// Token kind for not equal `!=`.
NotEqual,
/// Token kind for less than or equal `<=`.
LessEqual,
/// Token kind for greater than or equal `>=`.
GreaterEqual,
/// Token kind for tilde `~`.
Tilde,
/// Token kind for caret `^`.
CircumFlex,
/// Token kind for left shift `<<`.
LeftShift,
/// Token kind for right shift `>>`.
RightShift,
/// Token kind for double star `**`.
DoubleStar,
/// Token kind for double star equal `**=`.
DoubleStarEqual,
/// Token kind for plus equal `+=`.
PlusEqual,
/// Token kind for minus equal `-=`.
MinusEqual,
/// Token kind for star equal `*=`.
StarEqual,
/// Token kind for slash equal `/=`.
SlashEqual,
/// Token kind for percent equal `%=`.
PercentEqual,
/// Token kind for ampersand equal `&=`.
AmperEqual,
/// Token kind for vertical bar equal `|=`.
VbarEqual,
/// Token kind for caret equal `^=`.
CircumflexEqual,
/// Token kind for left shift equal `<<=`.
LeftShiftEqual,
/// Token kind for right shift equal `>>=`.
RightShiftEqual,
/// Token kind for double slash `//`.
DoubleSlash,
/// Token kind for double slash equal `//=`.
DoubleSlashEqual,
/// Token kind for colon equal `:=`.
ColonEqual,
/// Token kind for at `@`.
At,
/// Token kind for at equal `@=`.
AtEqual,
/// Token kind for arrow `->`.
Rarrow,
/// Token kind for ellipsis `...`.
Ellipsis,
// The keywords should be sorted in alphabetical order. If the boundary tokens for the
// "Keywords" and "Soft keywords" group change, update the related methods on `TokenKind`.
// Keywords
And,
As,
Assert,
Async,
Await,
Break,
Class,
Continue,
Def,
Del,
Elif,
Else,
Except,
False,
Finally,
For,
From,
Global,
If,
Import,
In,
Is,
Lambda,
None,
Nonlocal,
Not,
Or,
Pass,
Raise,
Return,
True,
Try,
While,
With,
Yield,
// Soft keywords
Case,
Match,
Type,
Unknown,
}
impl TokenKind {
/// Returns `true` if this is an end of file token.
#[inline]
pub const fn is_eof(self) -> bool {
matches!(self, TokenKind::EndOfFile)
}
/// Returns `true` if this is either a newline or non-logical newline token.
#[inline]
pub const fn is_any_newline(self) -> bool {
matches!(self, TokenKind::Newline | TokenKind::NonLogicalNewline)
}
/// Returns `true` if the token is a keyword (including soft keywords).
///
/// See also [`is_soft_keyword`], [`is_non_soft_keyword`].
///
/// [`is_soft_keyword`]: TokenKind::is_soft_keyword
/// [`is_non_soft_keyword`]: TokenKind::is_non_soft_keyword
#[inline]
pub fn is_keyword(self) -> bool {
TokenKind::And <= self && self <= TokenKind::Type
}
/// Returns `true` if the token is strictly a soft keyword.
///
/// See also [`is_keyword`], [`is_non_soft_keyword`].
///
/// [`is_keyword`]: TokenKind::is_keyword
/// [`is_non_soft_keyword`]: TokenKind::is_non_soft_keyword
#[inline]
pub fn is_soft_keyword(self) -> bool {
TokenKind::Case <= self && self <= TokenKind::Type
}
/// Returns `true` if the token is strictly a non-soft keyword.
///
/// See also [`is_keyword`], [`is_soft_keyword`].
///
/// [`is_keyword`]: TokenKind::is_keyword
/// [`is_soft_keyword`]: TokenKind::is_soft_keyword
#[inline]
pub fn is_non_soft_keyword(self) -> bool {
TokenKind::And <= self && self <= TokenKind::Yield
}
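    /// A hedged property sketch (editor's addition, not part of this diff):
    /// the three predicates above lean on the declaration order of the enum
    /// (`And..=Yield` are hard keywords, `Case..=Type` soft keywords), so
    /// every keyword is exactly one of the two.
    #[allow(dead_code)]
    fn keyword_ranges_agree(self) -> bool {
        self.is_keyword() == (self.is_soft_keyword() || self.is_non_soft_keyword())
    }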
#[inline]
pub const fn is_operator(self) -> bool {
matches!(
self,
TokenKind::Lpar
| TokenKind::Rpar
| TokenKind::Lsqb
| TokenKind::Rsqb
| TokenKind::Comma
| TokenKind::Semi
| TokenKind::Plus
| TokenKind::Minus
| TokenKind::Star
| TokenKind::Slash
| TokenKind::Vbar
| TokenKind::Amper
| TokenKind::Less
| TokenKind::Greater
| TokenKind::Equal
| TokenKind::Dot
| TokenKind::Percent
| TokenKind::Lbrace
| TokenKind::Rbrace
| TokenKind::EqEqual
| TokenKind::NotEqual
| TokenKind::LessEqual
| TokenKind::GreaterEqual
| TokenKind::Tilde
| TokenKind::CircumFlex
| TokenKind::LeftShift
| TokenKind::RightShift
| TokenKind::DoubleStar
| TokenKind::PlusEqual
| TokenKind::MinusEqual
| TokenKind::StarEqual
| TokenKind::SlashEqual
| TokenKind::PercentEqual
| TokenKind::AmperEqual
| TokenKind::VbarEqual
| TokenKind::CircumflexEqual
| TokenKind::LeftShiftEqual
| TokenKind::RightShiftEqual
| TokenKind::DoubleStarEqual
| TokenKind::DoubleSlash
| TokenKind::DoubleSlashEqual
| TokenKind::At
| TokenKind::AtEqual
| TokenKind::Rarrow
| TokenKind::Ellipsis
| TokenKind::ColonEqual
| TokenKind::Colon
| TokenKind::And
| TokenKind::Or
| TokenKind::Not
| TokenKind::In
| TokenKind::Is
)
}
/// Returns `true` if this is a singleton token i.e., `True`, `False`, or `None`.
#[inline]
pub const fn is_singleton(self) -> bool {
matches!(self, TokenKind::False | TokenKind::True | TokenKind::None)
}
/// Returns `true` if this is a trivia token i.e., a comment or a non-logical newline.
#[inline]
pub const fn is_trivia(&self) -> bool {
matches!(self, TokenKind::Comment | TokenKind::NonLogicalNewline)
}
/// Returns `true` if this is a comment token.
#[inline]
pub const fn is_comment(&self) -> bool {
matches!(self, TokenKind::Comment)
}
#[inline]
pub const fn is_arithmetic(self) -> bool {
matches!(
self,
TokenKind::DoubleStar
| TokenKind::Star
| TokenKind::Plus
| TokenKind::Minus
| TokenKind::Slash
| TokenKind::DoubleSlash
| TokenKind::At
)
}
#[inline]
pub const fn is_bitwise_or_shift(self) -> bool {
matches!(
self,
TokenKind::LeftShift
| TokenKind::LeftShiftEqual
| TokenKind::RightShift
| TokenKind::RightShiftEqual
| TokenKind::Amper
| TokenKind::AmperEqual
| TokenKind::Vbar
| TokenKind::VbarEqual
| TokenKind::CircumFlex
| TokenKind::CircumflexEqual
| TokenKind::Tilde
)
}
/// Returns `true` if the current token is a unary arithmetic operator.
#[inline]
pub const fn is_unary_arithmetic_operator(self) -> bool {
matches!(self, TokenKind::Plus | TokenKind::Minus)
}
#[inline]
pub const fn is_interpolated_string_end(self) -> bool {
matches!(self, TokenKind::FStringEnd | TokenKind::TStringEnd)
}
/// Returns the [`UnaryOp`] that corresponds to this token kind, if it is a unary arithmetic
/// operator, otherwise return [None].
///
/// Use [`as_unary_operator`] to match against any unary operator.
///
/// [`as_unary_operator`]: TokenKind::as_unary_operator
#[inline]
pub const fn as_unary_arithmetic_operator(self) -> Option<UnaryOp> {
Some(match self {
TokenKind::Plus => UnaryOp::UAdd,
TokenKind::Minus => UnaryOp::USub,
_ => return None,
})
}
/// Returns the [`UnaryOp`] that corresponds to this token kind, if it is a unary operator,
/// otherwise return [None].
///
/// Use [`as_unary_arithmetic_operator`] to match against only an arithmetic unary operator.
///
/// [`as_unary_arithmetic_operator`]: TokenKind::as_unary_arithmetic_operator
#[inline]
pub const fn as_unary_operator(self) -> Option<UnaryOp> {
Some(match self {
TokenKind::Plus => UnaryOp::UAdd,
TokenKind::Minus => UnaryOp::USub,
TokenKind::Tilde => UnaryOp::Invert,
TokenKind::Not => UnaryOp::Not,
_ => return None,
})
}
/// Returns the [`BoolOp`] that corresponds to this token kind, if it is a boolean operator,
/// otherwise return [None].
#[inline]
pub const fn as_bool_operator(self) -> Option<BoolOp> {
Some(match self {
TokenKind::And => BoolOp::And,
TokenKind::Or => BoolOp::Or,
_ => return None,
})
}
/// Returns the binary [`Operator`] that corresponds to the current token, if it's a binary
/// operator, otherwise return [None].
///
/// Use [`as_augmented_assign_operator`] to match against an augmented assignment token.
///
/// [`as_augmented_assign_operator`]: TokenKind::as_augmented_assign_operator
pub const fn as_binary_operator(self) -> Option<Operator> {
Some(match self {
TokenKind::Plus => Operator::Add,
TokenKind::Minus => Operator::Sub,
TokenKind::Star => Operator::Mult,
TokenKind::At => Operator::MatMult,
TokenKind::DoubleStar => Operator::Pow,
TokenKind::Slash => Operator::Div,
TokenKind::DoubleSlash => Operator::FloorDiv,
TokenKind::Percent => Operator::Mod,
TokenKind::Amper => Operator::BitAnd,
TokenKind::Vbar => Operator::BitOr,
TokenKind::CircumFlex => Operator::BitXor,
TokenKind::LeftShift => Operator::LShift,
TokenKind::RightShift => Operator::RShift,
_ => return None,
})
}
/// Returns the [`Operator`] that corresponds to this token kind, if it is
/// an augmented assignment operator, or [`None`] otherwise.
#[inline]
pub const fn as_augmented_assign_operator(self) -> Option<Operator> {
Some(match self {
TokenKind::PlusEqual => Operator::Add,
TokenKind::MinusEqual => Operator::Sub,
TokenKind::StarEqual => Operator::Mult,
TokenKind::AtEqual => Operator::MatMult,
TokenKind::DoubleStarEqual => Operator::Pow,
TokenKind::SlashEqual => Operator::Div,
TokenKind::DoubleSlashEqual => Operator::FloorDiv,
TokenKind::PercentEqual => Operator::Mod,
TokenKind::AmperEqual => Operator::BitAnd,
TokenKind::VbarEqual => Operator::BitOr,
TokenKind::CircumflexEqual => Operator::BitXor,
TokenKind::LeftShiftEqual => Operator::LShift,
TokenKind::RightShiftEqual => Operator::RShift,
_ => return None,
})
}
}
impl From<BoolOp> for TokenKind {
#[inline]
fn from(op: BoolOp) -> Self {
match op {
BoolOp::And => TokenKind::And,
BoolOp::Or => TokenKind::Or,
}
}
}
impl From<UnaryOp> for TokenKind {
#[inline]
fn from(op: UnaryOp) -> Self {
match op {
UnaryOp::Invert => TokenKind::Tilde,
UnaryOp::Not => TokenKind::Not,
UnaryOp::UAdd => TokenKind::Plus,
UnaryOp::USub => TokenKind::Minus,
}
}
}
impl From<Operator> for TokenKind {
#[inline]
fn from(op: Operator) -> Self {
match op {
Operator::Add => TokenKind::Plus,
Operator::Sub => TokenKind::Minus,
Operator::Mult => TokenKind::Star,
Operator::MatMult => TokenKind::At,
Operator::Div => TokenKind::Slash,
Operator::Mod => TokenKind::Percent,
Operator::Pow => TokenKind::DoubleStar,
Operator::LShift => TokenKind::LeftShift,
Operator::RShift => TokenKind::RightShift,
Operator::BitOr => TokenKind::Vbar,
Operator::BitXor => TokenKind::CircumFlex,
Operator::BitAnd => TokenKind::Amper,
Operator::FloorDiv => TokenKind::DoubleSlash,
}
}
}
impl fmt::Display for TokenKind {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
let value = match self {
TokenKind::Unknown => "Unknown",
TokenKind::Newline => "newline",
TokenKind::NonLogicalNewline => "NonLogicalNewline",
TokenKind::Indent => "indent",
TokenKind::Dedent => "dedent",
TokenKind::EndOfFile => "end of file",
TokenKind::Name => "name",
TokenKind::Int => "int",
TokenKind::Float => "float",
TokenKind::Complex => "complex",
TokenKind::String => "string",
TokenKind::FStringStart => "FStringStart",
TokenKind::FStringMiddle => "FStringMiddle",
TokenKind::FStringEnd => "FStringEnd",
TokenKind::TStringStart => "TStringStart",
TokenKind::TStringMiddle => "TStringMiddle",
TokenKind::TStringEnd => "TStringEnd",
TokenKind::IpyEscapeCommand => "IPython escape command",
TokenKind::Comment => "comment",
TokenKind::Question => "`?`",
TokenKind::Exclamation => "`!`",
TokenKind::Lpar => "`(`",
TokenKind::Rpar => "`)`",
TokenKind::Lsqb => "`[`",
TokenKind::Rsqb => "`]`",
TokenKind::Lbrace => "`{`",
TokenKind::Rbrace => "`}`",
TokenKind::Equal => "`=`",
TokenKind::ColonEqual => "`:=`",
TokenKind::Dot => "`.`",
TokenKind::Colon => "`:`",
TokenKind::Semi => "`;`",
TokenKind::Comma => "`,`",
TokenKind::Rarrow => "`->`",
TokenKind::Plus => "`+`",
TokenKind::Minus => "`-`",
TokenKind::Star => "`*`",
TokenKind::DoubleStar => "`**`",
TokenKind::Slash => "`/`",
TokenKind::DoubleSlash => "`//`",
TokenKind::Percent => "`%`",
TokenKind::Vbar => "`|`",
TokenKind::Amper => "`&`",
TokenKind::CircumFlex => "`^`",
TokenKind::LeftShift => "`<<`",
TokenKind::RightShift => "`>>`",
TokenKind::Tilde => "`~`",
TokenKind::At => "`@`",
TokenKind::Less => "`<`",
TokenKind::Greater => "`>`",
TokenKind::EqEqual => "`==`",
TokenKind::NotEqual => "`!=`",
TokenKind::LessEqual => "`<=`",
TokenKind::GreaterEqual => "`>=`",
TokenKind::PlusEqual => "`+=`",
TokenKind::MinusEqual => "`-=`",
TokenKind::StarEqual => "`*=`",
TokenKind::DoubleStarEqual => "`**=`",
TokenKind::SlashEqual => "`/=`",
TokenKind::DoubleSlashEqual => "`//=`",
TokenKind::PercentEqual => "`%=`",
TokenKind::VbarEqual => "`|=`",
TokenKind::AmperEqual => "`&=`",
TokenKind::CircumflexEqual => "`^=`",
TokenKind::LeftShiftEqual => "`<<=`",
TokenKind::RightShiftEqual => "`>>=`",
TokenKind::AtEqual => "`@=`",
TokenKind::Ellipsis => "`...`",
TokenKind::False => "`False`",
TokenKind::None => "`None`",
TokenKind::True => "`True`",
TokenKind::And => "`and`",
TokenKind::As => "`as`",
TokenKind::Assert => "`assert`",
TokenKind::Async => "`async`",
TokenKind::Await => "`await`",
TokenKind::Break => "`break`",
TokenKind::Class => "`class`",
TokenKind::Continue => "`continue`",
TokenKind::Def => "`def`",
TokenKind::Del => "`del`",
TokenKind::Elif => "`elif`",
TokenKind::Else => "`else`",
TokenKind::Except => "`except`",
TokenKind::Finally => "`finally`",
TokenKind::For => "`for`",
TokenKind::From => "`from`",
TokenKind::Global => "`global`",
TokenKind::If => "`if`",
TokenKind::Import => "`import`",
TokenKind::In => "`in`",
TokenKind::Is => "`is`",
TokenKind::Lambda => "`lambda`",
TokenKind::Nonlocal => "`nonlocal`",
TokenKind::Not => "`not`",
TokenKind::Or => "`or`",
TokenKind::Pass => "`pass`",
TokenKind::Raise => "`raise`",
TokenKind::Return => "`return`",
TokenKind::Try => "`try`",
TokenKind::While => "`while`",
TokenKind::Match => "`match`",
TokenKind::Type => "`type`",
TokenKind::Case => "`case`",
TokenKind::With => "`with`",
TokenKind::Yield => "`yield`",
};
f.write_str(value)
}
}
bitflags! {
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub(crate) struct TokenFlags: u16 {
/// The token is a string with double quotes (`"`).
const DOUBLE_QUOTES = 1 << 0;
/// The token is a triple-quoted string i.e., it starts and ends with three consecutive
/// quote characters (`"""` or `'''`).
const TRIPLE_QUOTED_STRING = 1 << 1;
/// The token is a unicode string i.e., prefixed with `u` or `U`
const UNICODE_STRING = 1 << 2;
/// The token is a byte string i.e., prefixed with `b` or `B`
const BYTE_STRING = 1 << 3;
/// The token is an f-string i.e., prefixed with `f` or `F`
const F_STRING = 1 << 4;
/// The token is a t-string i.e., prefixed with `t` or `T`
const T_STRING = 1 << 5;
/// The token is a raw string and the prefix character is in lowercase.
const RAW_STRING_LOWERCASE = 1 << 6;
/// The token is a raw string and the prefix character is in uppercase.
const RAW_STRING_UPPERCASE = 1 << 7;
/// String without matching closing quote(s)
const UNCLOSED_STRING = 1 << 8;
/// The token is a raw string i.e., prefixed with `r` or `R`
const RAW_STRING = Self::RAW_STRING_LOWERCASE.bits() | Self::RAW_STRING_UPPERCASE.bits();
}
}
impl get_size2::GetSize for TokenFlags {}
impl StringFlags for TokenFlags {
fn quote_style(self) -> Quote {
if self.intersects(TokenFlags::DOUBLE_QUOTES) {
Quote::Double
} else {
Quote::Single
}
}
fn triple_quotes(self) -> TripleQuotes {
if self.intersects(TokenFlags::TRIPLE_QUOTED_STRING) {
TripleQuotes::Yes
} else {
TripleQuotes::No
}
}
fn prefix(self) -> AnyStringPrefix {
if self.intersects(TokenFlags::F_STRING) {
if self.intersects(TokenFlags::RAW_STRING_LOWERCASE) {
AnyStringPrefix::Format(FStringPrefix::Raw { uppercase_r: false })
} else if self.intersects(TokenFlags::RAW_STRING_UPPERCASE) {
AnyStringPrefix::Format(FStringPrefix::Raw { uppercase_r: true })
} else {
AnyStringPrefix::Format(FStringPrefix::Regular)
}
} else if self.intersects(TokenFlags::T_STRING) {
if self.intersects(TokenFlags::RAW_STRING_LOWERCASE) {
AnyStringPrefix::Template(TStringPrefix::Raw { uppercase_r: false })
} else if self.intersects(TokenFlags::RAW_STRING_UPPERCASE) {
AnyStringPrefix::Template(TStringPrefix::Raw { uppercase_r: true })
} else {
AnyStringPrefix::Template(TStringPrefix::Regular)
}
} else if self.intersects(TokenFlags::BYTE_STRING) {
if self.intersects(TokenFlags::RAW_STRING_LOWERCASE) {
AnyStringPrefix::Bytes(ByteStringPrefix::Raw { uppercase_r: false })
} else if self.intersects(TokenFlags::RAW_STRING_UPPERCASE) {
AnyStringPrefix::Bytes(ByteStringPrefix::Raw { uppercase_r: true })
} else {
AnyStringPrefix::Bytes(ByteStringPrefix::Regular)
}
} else if self.intersects(TokenFlags::RAW_STRING_LOWERCASE) {
AnyStringPrefix::Regular(StringLiteralPrefix::Raw { uppercase: false })
} else if self.intersects(TokenFlags::RAW_STRING_UPPERCASE) {
AnyStringPrefix::Regular(StringLiteralPrefix::Raw { uppercase: true })
} else if self.intersects(TokenFlags::UNICODE_STRING) {
AnyStringPrefix::Regular(StringLiteralPrefix::Unicode)
} else {
AnyStringPrefix::Regular(StringLiteralPrefix::Empty)
}
}
fn is_unclosed(self) -> bool {
self.intersects(TokenFlags::UNCLOSED_STRING)
}
}
impl TokenFlags {
/// Returns `true` if the token is an f-string.
pub(crate) const fn is_f_string(self) -> bool {
self.intersects(TokenFlags::F_STRING)
}
/// Returns `true` if the token is a t-string.
pub(crate) const fn is_t_string(self) -> bool {
self.intersects(TokenFlags::T_STRING)
}
/// Returns `true` if the token is an f-string or a t-string.
pub(crate) const fn is_interpolated_string(self) -> bool {
self.intersects(TokenFlags::T_STRING.union(TokenFlags::F_STRING))
}
/// Returns `true` if the token is a triple-quoted f-string or t-string.
pub(crate) fn is_triple_quoted_interpolated_string(self) -> bool {
self.intersects(TokenFlags::TRIPLE_QUOTED_STRING) && self.is_interpolated_string()
}
/// Returns `true` if the token is a raw string.
pub(crate) const fn is_raw_string(self) -> bool {
self.intersects(TokenFlags::RAW_STRING)
}
}
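// Editor's sketch (not part of this diff): how the flag bits compose. An
// `rb"..."` literal sets both `BYTE_STRING` and `RAW_STRING_LOWERCASE`, and
// `prefix()` decodes that combination back into a structured prefix.
#[cfg(test)]
mod token_flags_sketch {
    use super::*;

    #[test]
    fn raw_byte_string_flags_round_trip() {
        let flags = TokenFlags::BYTE_STRING | TokenFlags::RAW_STRING_LOWERCASE;
        assert!(flags.is_raw_string());
        assert_eq!(
            flags.prefix(),
            AnyStringPrefix::Bytes(ByteStringPrefix::Raw { uppercase_r: false })
        );
    }
}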
use ruff_python_ast::{Int, IpyEscapeKind, name::Name};
#[derive(Clone, Debug, Default)]
pub(crate) enum TokenValue {

View File

@@ -1,4 +1,4 @@
use crate::TokenKind;
use ruff_python_ast::token::TokenKind;
/// A bit-set of `TokenKind`s
#[derive(Clone, Copy)]
@@ -42,7 +42,7 @@ impl<const N: usize> From<[TokenKind; N]> for TokenSet {
#[test]
fn token_set_works_for_tokens() {
use crate::TokenKind::*;
use ruff_python_ast::token::TokenKind::*;
let mut ts = TokenSet::new([EndOfFile, Name]);
assert!(ts.contains(EndOfFile));
assert!(ts.contains(Name));

Some files were not shown because too many files have changed in this diff.