Commit Graph

121 Commits

Author SHA1 Message Date
Dhruv Manilawala
756e7b6c82 Store current TokenKind on Parser 2024-03-18 20:17:50 +05:30
Dhruv Manilawala
4f5604fc83 Fix clippy warnings 2024-03-14 13:31:04 +05:30
Dhruv Manilawala
4381629e13 Use recovery context to decide on trailing comma (#10405)
## Summary

This PR removes the `allow_trailing_comma` parameter from list parsing
because it can be inferred using the recovery context kind.
2024-03-14 13:31:04 +05:30
Dhruv Manilawala
b09e5f40df Remove Expr::Invalid (#10386)
## Summary

This PR removes the `Expr::Invalid` variant from the AST. Instead, we'll
try to retain as much valid information as possible and use an empty
`Expr::Name` with `ExprContext::Invalid` as a replacement.

## Test Plan

- [x] All tests pass
- [x] No performance regression
2024-03-14 13:31:04 +05:30
Dhruv Manilawala
1ec7259116 Use string flags to mark it as invalid (#10385)
## Summary

This PR updates the string flags to include an `Invalid` variant for any
invalid string nodes as deemed by the parser. This is to avoid dropping
the nodes and instead just mark it as invalid. The nodes will be empty
for now but we can discuss on whether to keep the raw source text or
not. It's not really required because the range can be used to get the
same.

It also adds a new `handle_implicitly_concatenated_strings` method which
is similar to existing `concatenated_strings` function. The reason to
have a separate method is to avoid dropping all strings if there's an
error. The error being that it's concatenating bytes and non-bytes
literal. Now, we need to decide which strings to retain. Currently, I've
kept it simple to retain bytes literal _only_ if all of them are bytes
otherwise we'll have string / f-string with invalid nodes instead of
bytes literal.

This removes the need for having a `StringType::Invalid` variant.

## Test Plan

- [x] Existing test cases pass
- [x] No performance regression
2024-03-14 13:31:04 +05:30
Victor Hugo Gomes
d41ecfe351 Remove f-string UnclosedLbrace error checking from the lexer (#10372)
This error check is already handled by the new parser.

Co-authored-by: Dhruv Manilawala <dhruvmanila@gmail.com>
2024-03-14 13:31:04 +05:30
Dhruv Manilawala
559832ba4d Merge string parsing to single entrypoint (#10339)
This PR merges the different string parsing functions into a single
entry point function.

Previously there were two entry points, one for string or byte literal
and the other for f-strings. The reason for this separation is that our
old parser raised a hard syntax error if an f-string was used as a
pattern literal. But, it's actually a soft syntax error as evident
through the CPython parser which raises it during the runtime.

This function basically implements the following grammar:
```
strings: (string|fstring)+
```

And it delegates it to the list parsing for better error recovery.

- [x] All tests pass
- [x] No performance regression
2024-03-14 13:31:04 +05:30
Dhruv Manilawala
98f6dcbb91 Introduce LexicalError::InvalidByteLiteral (#10328)
## Summary

This PR introduces a new `InvalidByteLiteral` lexical error type to
avoid repeating the message in multiple locations.
2024-03-14 13:31:04 +05:30
Dhruv Manilawala
94cc5f2e13 Remove FStringElement::Invalid, improve parsing logic (#10327)
This PR does the following around f-string parsing:
1. Removes the `FStringElement::Invalid` variant
2. Move the parsing of f-string elements to use list parsing logic
3. Add error recovery for f-string elements

- [x] All tests pass
- [x] No performance regression
2024-03-14 13:31:04 +05:30
Dhruv Manilawala
cdcbb04686 Fix tests and error recovery token sets (#10338)
This PR fixes the failing CI tests from the original PR. The changes
have been highlighted using PR comments to have the proper context.

`cargo test`
2024-03-14 13:31:04 +05:30
Dhruv Manilawala
70aa19e9af Move parser quick test to pretty print code 2024-03-14 13:31:04 +05:30
Dhruv Manilawala
2b00e81c22 Remove skip_until parser method (#10293)
## Summary

This PR removes the `skip_until` parser method. The main use case for it
was for error recovery which we want to isolate only in list parsing.

There are two references which are removed:
1. Parsing a list of match arguments in a class pattern. Take the
following code snippet as an example:

	```python
	match foo:
		case Foo(bar.z=1, baz):
			pass
	```
This is a syntax error as the keyword argument pattern can only have an
identifier but here it's an attribute node. Now, to move on to the next
argument (`baz`), the parser would skip until the end of the argument to
recover. What we will do now is to parse the value as a pattern (per
spec) thus moving the parser ahead and add the node with an empty
identifier.

	The above code will produce the following AST:

	<details><summary><b>AST</b></summary>
	<p>
	
	```rs
	Module(
	    ModModule {
	        range: 0..52,
	        body: [
	            Match(
	                StmtMatch {
	                    range: 0..51,
	                    subject: Name(
	                        ExprName {
	                            range: 6..9,
	                            id: "foo",
	                            ctx: Load,
	                        },
	                    ),
	                    cases: [
	                        MatchCase {
	                            range: 15..51,
	                            pattern: MatchClass(
	                                PatternMatchClass {
	                                    range: 20..37,
	                                    cls: Name(
	                                        ExprName {
	                                            range: 20..23,
	                                            id: "Foo",
	                                            ctx: Load,
	                                        },
	                                    ),
	                                    arguments: PatternArguments {
	                                        range: 24..37,
	                                        patterns: [
	                                            MatchAs(
	                                                PatternMatchAs {
	                                                    range: 33..36,
	                                                    pattern: None,
	                                                    name: Some(
	                                                        Identifier {
	                                                            id: "baz",
range: 33..36,
	                                                        },
	                                                    ),
	                                                },
	                                            ),
	                                        ],
	                                        keywords: [
	                                            PatternKeyword {
	                                                range: 24..31,
	                                                attr: Identifier {
	                                                    id: "",
	                                                    range: 31..31,
	                                                },
	                                                pattern: MatchValue(
	                                                    PatternMatchValue {
	                                                        range: 30..31,
value: NumberLiteral(
ExprNumberLiteral {
range: 30..31,
value: Int(
	                                                                    1,
	                                                                ),
	                                                            },
	                                                        ),
	                                                    },
	                                                ),
	                                            },
	                                        ],
	                                    },
	                                },
	                            ),
	                            guard: None,
	                            body: [
	                                Pass(
	                                    StmtPass {
	                                        range: 47..51,
	                                    },
	                                ),
	                            ],
	                        },
	                    ],
	                },
	            ),
	        ],
	    },
	)
	```
	
	</p>
	</details> 

2. Parsing a list of parameters. Here, our list parsing method makes
sure to only call the parse element function when it's a valid list
element. A parameter can start either with a `Star`, `DoubleStar`, or
`Name` token which corresponds to the 3 `if` conditions. Thus, the
`else` block is not required as the list parsing will recover without
it.
2024-03-14 13:31:04 +05:30
Dhruv Manilawala
b65e3fb335 Improve various assignment target error (#10288)
## Summary

This PR improves error related things around assignment nodes, mainly
the following:
1. Rename parse error variant:
	a. `AssignmentError` -> `InvalidAssignmentTarget`
	b. `NamedAssignmentError` -> `InvalidNamedAssignmentTarget`
	c. `AugAssignmentError` -> `InvalidAugmnetedAssignmentTarget`
2. Add `InvalidDeleteTarget` for invalid `del` targets
a. Add helper function to check if it's a valid delete target similar to
other target check functions.
4. Fix: named assignment target can only be a `Name` node

## Test Plan

Various test cases locally. As mentioned in my previous PR, I want to
keep the testing part separate.
2024-03-14 13:31:04 +05:30
Dhruv Manilawala
814438777c Remove deprecated parsing list functions (#10271)
## Summary

This PR removes the deprecated parsing list functions and updates the
references to use the new functions.

There are now 4 functions to accommodate this pattern. They are divided
into 2 groups: one to parse a sequence of elements and the other to
parse a sequence of elements _separated_ by a comma. In each of the
groups, there are 2 functions: one collects and returns all the parsed
elements as a vector and the other delegates the collection part to the
user. This separation is achieved by using `Fn` and `FnMut` to allow
mutation in the later case.

The error recovery context has been updated to accommodate the new
sequence kind. Currently, the terminator token kinds only contain the
necessary token to end the list and not necessarily the ones which might
help in error recovery. This will be updated as I go through the testing
phase. This phase is basically coming up with a bunch of invalid
programs to check how the parser is acting and how can we help in the
recovery phase.


## Test Plan

Currently, my plan is to keep the testing part separate than the actual
update. This doesn't mean I'm not testing locally, but it's not
thorough. The main reason is to keep the diffs to a minimal and writing
test cases will require some effort which I want to decouple with the
actual change. This is ok here as it's not getting merged into `main`
but the parser PR.
2024-03-14 13:31:04 +05:30
Dhruv Manilawala
f6467216dc Rename to include "token" in method name (#10287)
Small quality of life improvement to rename the following method:
1. `current_kind` -> `current_token_kind`
2. `current_range` -> `current_token_range`

It's a PR for visibility.
2024-03-14 13:31:04 +05:30
Dhruv Manilawala
bf67f129dd Encapsulate Program fields (#10270)
## Summary

This PR updates the fields in `Program` struct to be private and exposes
methods to get the values. The motivation behind this is to encapsulate
the internal representation of the parsed program which we could alter
in the future.
2024-03-14 13:31:04 +05:30
Dhruv Manilawala
3ee670440c Assert the parser is at augmented assign token (#10269)
## Summary

This PR updates fixes one of the `FIXME` comment to assert that the
parser is at one of the possible augmented assignment token when parsing
an augmented assignment statement.

## Test Plan

1. Add valid test cases for all the possible augmented assignment tokens
2. Add invalid test cases similar to assignment statement
2024-03-14 13:31:04 +05:30
Dhruv Manilawala
0de3f2f92d Fix tests and clippy warnings 2024-03-14 13:31:04 +05:30
Victor Hugo Gomes
f4a8ab8756 Replace LALRPOP parser with hand-written parser
Co-authored-by: Micha Reiser <micha@reiser.io>
2024-03-14 13:31:04 +05:30
Alex Waygood
c2e15f38ee Unify enums used for internal representation of quoting style (#10383) 2024-03-13 17:19:17 +00:00
Auguste Lalande
3ed707f245 Spellcheck & grammar (#10375)
## Summary

I used `codespell` and `gramma` to identify mispellings and grammar
errors throughout the codebase and fixed them. I tried not to make any
controversial changes, but feel free to revert as you see fit.
2024-03-13 02:34:23 +00:00
Gautier Moin
a067d87ccc Fix incorrect Parameter range for *args and **kwargs (#10283)
## Summary

Fix #10282 

This PR updates the Python grammar to include the `*` character in
`*args` `**kwargs` in the range of the `Parameter`
```
def f(*args, **kwargs): pass
#      ~~~~    ~~~~~~    <-- range before the PR
#     ^^^^^  ^^^^^^^^    <-- range after
```

The invalid syntax `def f(*, **kwargs): ...` is also now correctly
reported.

## Test Plan

Test cases were added to `function.rs`.
2024-03-08 18:57:49 -05:00
Alex Waygood
1d97f27335 Start tracking quoting style in the AST (#10298)
This PR modifies our AST so that nodes for string literals, bytes literals and f-strings all retain the following information:
- The quoting style used (double or single quotes)
- Whether the string is triple-quoted or not
- Whether the string is raw or not

This PR is a followup to #10256. Like with that PR, this PR does not, in itself, fix any bugs. However, it means that we will have the necessary information to preserve quoting style and rawness of strings in the `ExprGenerator` in a followup PR, which will allow us to provide a fix for https://github.com/astral-sh/ruff/issues/7799.

The information is recorded on the AST nodes using a bitflag field on each node, similarly to how we recorded the information on `Tok::String`, `Tok::FStringStart` and `Tok::FStringMiddle` tokens in #10298. Rather than reusing the bitflag I used for the tokens, however, I decided to create a custom bitflag for each AST node.

Using different bitflags for each node allows us to make invalid states unrepresentable: it is valid to set a `u` prefix on a string literal, but not on a bytes literal or an f-string. It also allows us to have better debug representations for each AST node modified in this PR.
2024-03-08 19:11:47 +00:00
Alex Waygood
c504d7ab11 Track quoting style in the tokenizer (#10256) 2024-03-08 08:40:06 +00:00
Micha Reiser
184241f99a Remove Expr postfix from ExprNamed, ExprIf, and ExprGenerator (#10229)
The expression types in our AST are called `ExprYield`, `ExprAwait`,
`ExprStringLiteral` etc, except `ExprNamedExpr`, `ExprIfExpr` and
`ExprGenratorExpr`. This seems to align with [Python AST's
naming](https://docs.python.org/3/library/ast.html) but feels
inconsistent and excessive.

This PR removes the `Expr` postfix from `ExprNamedExpr`, `ExprIfExpr`,
and `ExprGeneratorExpr`.
2024-03-04 12:55:01 +01:00
Micha Reiser
77c5561646 Add parenthesized flag to ExprTuple and ExprGenerator (#9614) 2024-02-26 15:35:20 +00:00
Dhruv Manilawala
33ac2867b7 Use non-parenthesized range for DebugText (#9953)
## Summary

This PR fixes the `DebugText` implementation to use the expression range
instead of the parenthesized range.

Taking the following code snippet as an example:
```python
x = 1
print(f"{  ( x  ) = }")
```

The output of running it would be:
```
  ( x  ) = 1
```

Notice that the whitespace between the parentheses and the expression is
preserved as is.

Currently, we don't preserve this information in the AST which defeats
the purpose of `DebugText` as the main purpose of the struct is to
preserve whitespaces _around_ the expression.

This is also problematic when generating the code from the AST node as
then the generator has no information about the parentheses the
whitespaces between them and the expression which would lead to the
removal of the parentheses in the generated code.

I noticed this while working on the f-string formatting where the debug
text would be used to preserve the text surrounding the expression in
the presence of debug expression. The parentheses were being dropped
then which made me realize that the problem is instead in the parser.

## Test Plan

1. Add a test case for the parser
2. Add a test case for the generator
2024-02-12 23:00:02 +05:30
Charlie Marsh
6f0e4ad332 Remove unnecessary string cloning from the parser (#9884)
Closes https://github.com/astral-sh/ruff/issues/9869.
2024-02-09 16:03:27 -05:00
Charlie Marsh
49fe1b85f2 Reduce size of Expr from 80 to 64 bytes (#9900)
## Summary

This PR reduces the size of `Expr` from 80 to 64 bytes, by reducing the
sizes of...

- `ExprCall` from 72 to 56 bytes, by using boxed slices for `Arguments`.
- `ExprCompare` from 64 to 48 bytes, by using boxed slices for its
various vectors.

In testing, the parser gets a bit faster, and the linter benchmarks
improve quite a bit.
2024-02-09 02:53:13 +00:00
Micha Reiser
fe7d965334 Reduce Result<Tok, LexicalError> size by using Box<str> instead of String (#9885) 2024-02-08 20:36:22 +00:00
Micha Reiser
688177ff6a Use Rust 1.76 (#9897) 2024-02-08 18:20:08 +00:00
Charlie Marsh
6fffde72e7 Use memchr for string lexing (#9888)
## Summary

On `main`, string lexing consists of walking through the string
character-by-character to search for the closing quote (with some
nuance: we also need to skip escaped characters, and error if we see
newlines in non-triple-quoted strings). This PR rewrites `lex_string` to
instead use `memchr` to search for the closing quote, which is
significantly faster. On my machine, at least, the `globals.py`
benchmark (which contains a lot of docstrings) gets 40% faster...

```text
lexer/numpy/globals.py  time:   [3.6410 µs 3.6496 µs 3.6585 µs]
                        thrpt:  [806.53 MiB/s 808.49 MiB/s 810.41 MiB/s]
                 change:
                        time:   [-40.413% -40.185% -39.984%] (p = 0.00 < 0.05)
                        thrpt:  [+66.623% +67.181% +67.822%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
lexer/unicode/pypinyin.py
                        time:   [12.422 µs 12.445 µs 12.467 µs]
                        thrpt:  [337.03 MiB/s 337.65 MiB/s 338.27 MiB/s]
                 change:
                        time:   [-9.4213% -9.1930% -8.9586%] (p = 0.00 < 0.05)
                        thrpt:  [+9.8401% +10.124% +10.401%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
lexer/pydantic/types.py time:   [107.45 µs 107.50 µs 107.56 µs]
                        thrpt:  [237.11 MiB/s 237.24 MiB/s 237.35 MiB/s]
                 change:
                        time:   [-4.0108% -3.7005% -3.3787%] (p = 0.00 < 0.05)
                        thrpt:  [+3.4968% +3.8427% +4.1784%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
lexer/numpy/ctypeslib.py
                        time:   [46.123 µs 46.165 µs 46.208 µs]
                        thrpt:  [360.36 MiB/s 360.69 MiB/s 361.01 MiB/s]
                 change:
                        time:   [-19.313% -18.996% -18.710%] (p = 0.00 < 0.05)
                        thrpt:  [+23.016% +23.451% +23.935%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe
lexer/large/dataset.py  time:   [231.07 µs 231.19 µs 231.33 µs]
                        thrpt:  [175.87 MiB/s 175.97 MiB/s 176.06 MiB/s]
                 change:
                        time:   [-2.0437% -1.7663% -1.4922%] (p = 0.00 < 0.05)
                        thrpt:  [+1.5148% +1.7981% +2.0864%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe
```
2024-02-08 17:23:06 +00:00
Seo Sanghyeon
df7fb95cbc Index multiline f-strings (#9837)
Fix #9777.
2024-02-05 21:25:33 -05:00
Micha Reiser
47ad7b4500 Approximate tokens len (#9546) 2024-01-19 17:39:37 +01:00
Micha Reiser
21f2d0c90b Add an explicit fast path for whitespace to is_identifier_continuation (#9532) 2024-01-16 08:23:43 +00:00
Micha Reiser
f192c72596 Remove type parameter from parse_* methods (#9466) 2024-01-11 19:41:19 +01:00
Micha Reiser
94968fedd5 Use Rust 1.75 toolchain (#9437) 2024-01-08 18:03:16 +01:00
Micha Reiser
b1a5df8694 Move locate_cmp_ops to invalid_literal_comparisons (#9438) 2024-01-08 13:15:36 +01:00
Charlie Marsh
1666c7a5cb Add size hints to string parser (#9413) 2024-01-06 15:59:34 -05:00
Charlie Marsh
f0d43dafcf Ignore trailing quotes for unclosed l-brace errors (#9388)
## Summary

Given:

```python
F"{"ڤ
```

We try to locate the "unclosed left brace" error by subtracting the
quote size from the lexer offset -- so we subtract 1 from the end of the
source, which puts us in the middle of a Unicode character. I don't
think we should try to adjust the offset in this way, since there can be
content _after_ the quote. For example, with the advent of PEP 701, this
string could reasonably be fixed as:

```python
F"{"ڤ"}"
````

Closes https://github.com/astral-sh/ruff/issues/9379.
2024-01-04 05:00:55 +00:00
Charlie Marsh
9073220887 Make all dependencies workspace dependencies (#9333)
## Summary

This PR modifies our `Cargo.toml` files to use workspace dependencies
for _all_ dependencies, rather than the status quo of sporadically
trying to use workspace dependencies for those dependencies that are
used across multiple crates. I find the current situation more confusing
and harder to manage, since we have a mix of workspace and crate-local
dependencies, whereas this setup consistently uses the same approach for
all dependencies.
2024-01-02 13:41:59 +00:00
Charlie Marsh
48e04cc2c8 Add row and column numbers to formatted parse errors (#9321)
## Summary

We now render parse errors in the formatter identically to those in the
linter, e.g.:

```
❯ cargo run -p ruff_cli -- format foo.py
error: Failed to parse foo.py:1:17: Unexpected token '='
```

Closes https://github.com/astral-sh/ruff/issues/8338.

Closes https://github.com/astral-sh/ruff/issues/9311.
2023-12-31 07:10:45 -05:00
Charlie Marsh
e80260a3c5 Remove source path from parser errors (#9322)
## Summary

I always found it odd that we had to pass this in, since it's really
higher-level context for the error. The awkwardness is further evidenced
by the fact that we pass in fake values everywhere (even outside of
tests). The source path isn't actually used to display the error; it's
only accessed elsewhere to _re-display_ the error in certain cases. This
PR modifies to instead pass the path directly in those cases.
2023-12-30 20:33:05 +00:00
Charlie Marsh
97e9d3c54f Use Display for formatter parse errors (#9316)
## Summary

This helps a bit with (but does not close) the issues described in
https://github.com/astral-sh/ruff/issues/9311. E.g., now, we at least
see: `error: Failed to format main.py: source contains syntax errors:
invalid syntax. Got unexpected token '=' at byte offset 20`.
2023-12-29 22:26:57 +00:00
Charlie Marsh
9d6444138b Remove lexing and parsing from the linter benchmark (#9264)
## Summary

This PR adds some helper structs to the linter paths to enable passing
in the pre-computed tokens and parsed source code during benchmarking,
to remove lexing and parsing from the overall linter benchmark
measurement. We already remove parsing for the formatter, and we have
separate benchmarks for the lexer and the parser, so this should make it
much easier to measure linter performance changes.
2023-12-23 16:43:11 -05:00
Andrew Gallant
3ce145c476 release: switch to Cargo's default (#9031)
This sets `lto = "thin"` instead of using "fat" LTO, and sets
`codegen-units = 16`. These are the defaults for Cargo's `release`
profile, and I think it may give us faster iteration times, especially
when benchmarking. The point of this PR is to see what kind of impact
this has on benchmarks. It is expected that benchmarks may regress to
some extent.

I did some quick ad hoc experiments to quantify this change in compile
times. Namely, I ran:

    cargo build --profile release -p ruff_cli

Then I ran

touch crates/ruff_python_formatter/src/expression/string/docstring.rs

(because that's where i've been working lately) and re-ran

    cargo build --profile release -p ruff_cli

This last command is what I timed, since it reflects how much time one
has to wait between making a change and getting a compiled artifact.

Here are my results:

* With status quo `release` profile, build takes 77s
* with `release` but `lto = "thin"`, build takes 41s
* with `release`, but `lto = false`, build takes 19s
* with `release`, but `lto = false` **and** `codegen-units = 16`, build
takes 7s
* with `release`, but `lto = "thin"` **and** `codegen-units = 16`, build
takes 16s (i believe this is the default `release` configuration)

This PR represents the last option. It's not the fastest to compile, but
it's nearly a whole minute faster! The idea is that with `codegen-units
= 16`, we still make use of parallelism, but keep _some_ level of LTO on
to try and re-gain what we lose by increasing the number of codegen
units.
2023-12-15 08:19:35 -05:00
Dhruv Manilawala
cdac90ef68 New AST nodes for f-string elements (#8835)
Rebase of #6365 authored by @davidszotten.

## Summary

This PR updates the AST structure for an f-string elements.

The main **motivation** behind this change is to have a dedicated node
for the string part of an f-string. Previously, the existing
`ExprStringLiteral` node was used for this purpose which isn't exactly
correct. The `ExprStringLiteral` node should include the quotes as well
in the range but the f-string literal element doesn't include the quote
as it's a specific part within an f-string. For example,

```python
f"foo {x}"
# ^^^^
# This is the literal part of an f-string
```

The introduction of `FStringElement` enum is helpful which represent
either the literal part or the expression part of an f-string.

### Rule Updates

This means that there'll be two nodes representing a string depending on
the context. One for a normal string literal while the other is a string
literal within an f-string. The AST checker is updated to accommodate
this change. The rules which work on string literal are updated to check
on the literal part of f-string as well.

#### Notes

1. The `Expr::is_literal_expr` method would check for
`ExprStringLiteral` and return true if so. But now that we don't
represent the literal part of an f-string using that node, this improves
the method's behavior and confines to the actual expression. We do have
the `FStringElement::is_literal` method.
2. We avoid checking if we're in a f-string context before adding to
`string_type_definitions` because the f-string literal is now a
dedicated node and not part of `Expr`.
3. Annotations cannot use f-string so we avoid changing any rules which
work on annotation and checks for `ExprStringLiteral`.

## Test Plan

- All references of `Expr::StringLiteral` were checked to see if any of
the rules require updating to account for the f-string literal element
node.
- New test cases are added for rules which check against the literal
part of an f-string.
- Check the ecosystem results and ensure it remains unchanged.

## Performance

There's a performance penalty in the parser. The reason for this remains
unknown as it seems that the generated assembly code is now different
for the `__reduce154` function. The reduce function body is just popping
the `ParenthesizedExpr` on top of the stack and pushing it with the new
location.

- The size of `FStringElement` enum is the same as `Expr` which is what
it replaces in `FString::format_spec`
- The size of `FStringExpressionElement` is the same as
`ExprFormattedValue` which is what it replaces

I tried reducing the `Expr` enum from 80 bytes to 72 bytes but it hardly
resulted in any performance gain. The difference can be seen here:
- Original profile: https://share.firefox.dev/3Taa7ES
- Profile after boxing some node fields:
https://share.firefox.dev/3GsNXpD

### Backtracking

I tried backtracking the changes to see if any of the isolated change
produced this regression. The problem here is that the overall change is
so small that there's only a single checkpoint where I can backtrack and
that checkpoint results in the same regression. This checkpoint is to
revert using `Expr` to the `FString::format_spec` field. After this
point, the change would revert back to the original implementation.

## Review process

The review process is similar to #7927. The first set of commits update
the node structure, parser, and related AST files. Then, further commits
update the linter and formatter part to account for the AST change.

---------

Co-authored-by: David Szotten <davidszotten@gmail.com>
2023-12-07 10:28:05 -06:00
Micha Reiser
7e390d3772 Move ParenthesizedExpr to ruff_python_parser (#8987) 2023-12-04 05:36:28 +00:00
Charlie Marsh
20782ab02c Support type alias statements in simple statement positions (#8916)
<!--
Thank you for contributing to Ruff! To help us out with reviewing,
please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

Our `SoftKeywordTokenizer` only respected soft keywords in compound
statement positions -- for example, at the start of a logical line:

```python
type X = int
```

However, type aliases can also appear in simple statement positions,
like:

```python
class Class: type X = int
```

(Note that `match` and `case` are _not_ valid keywords in such
positions.)

This PR upgrades the tokenizer to track both kinds of valid positions.

Closes https://github.com/astral-sh/ruff/issues/8900.
Closes https://github.com/astral-sh/ruff/issues/8899.

## Test Plan

`cargo test`
2023-11-30 19:15:19 +00:00
Charlie Marsh
774c77adae Avoid off-by-one error in with-item named expressions (#8915)
## Summary

Given `with (a := b): pass`, we truncate the `WithItem` range by one on
both sides such that the parentheses are part of the statement, rather
than the item. However, for `with (a := b) as x: pass`, we want to avoid
this trick.

Closes https://github.com/astral-sh/ruff/issues/8913.
2023-11-30 00:11:04 +00:00