Commit Graph

19 Commits

Author SHA1 Message Date
Micha Reiser
c52aa8f065 Basic string formatting
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

This PR implements formatting for non-f-string Strings that do not use implicit concatenation. 

Docstring formatting is out of the scope of this PR.

<!-- What's the purpose of the change? What does it do, and why? -->

## Test Plan

I added a few tests for simple string literals. 

## Performance

Ouch. This is hitting performance somewhat hard. This is probably because we now iterate each string a couple of times:

1. To detect if it is an implicit string continuation
2. To detect if the string contains any new lines
3. To detect the preferred quote
4. To normalize the string

Edit: I integrated the detection of newlines into the preferred quote detection so that we only iterate the string three time.
We can probably do better by merging the implicit string continuation with the quote detection and new line detection by iterating till the end of the string part and returning the offset. We then use our simple tokenizer to skip over any comments or whitespace until we find the first non trivia token. From there we keep continue doing this in a loop until we reach the end o the string. I'll leave this improvement for later.
2023-06-23 09:46:05 +02:00
Charlie Marsh
68b6d30c46 Use consistent Cargo.toml metadata in all crates (#5015) 2023-06-12 00:02:40 +00:00
Charlie Marsh
1d756dc3a7 Move Python whitespace utilities into new ruff_python_whitespace crate (#4993)
## Summary

`ruff_newlines` becomes `ruff_python_whitespace`, and includes the
existing "universal newline" handlers alongside the Python
whitespace-specific utilities.
2023-06-10 00:59:57 +00:00
Charlie Marsh
9d0ffd33ca Move universal newline handling into its own crate (#4729) 2023-05-31 12:00:47 -04:00
Micha Reiser
0cd453bdf0 Generic "comment to node" association logic (#4642) 2023-05-30 09:28:01 +00:00
Micha Reiser
6146b75dd0 Add MultiMap implementation for storing comments (#4639) 2023-05-30 09:51:25 +02:00
Jeong, YunWon
be6e00ef6e Re-integrate RustPython parser repository (#4359)
Co-authored-by: Micha Reiser <micha@reiser.io>
2023-05-11 07:47:17 +00:00
Micha Reiser
cab65b25da Replace row/column based Location with byte-offsets. (#3931) 2023-04-26 18:11:02 +00:00
Charlie Marsh
d919adc13c Introduce a ruff_python_semantic crate (#3865) 2023-04-04 16:50:47 +00:00
Charlie Marsh
ff2c0dd491 Use shared leading_quote implementation in ruff_python_formatter (#3396) 2023-03-08 18:21:59 +00:00
Charlie Marsh
d1c48016eb Rename ruff_python crate to ruff_python_stdlib (#3354)
In hindsight, `ruff_python` is too general. A good giveaway is that it's actually a prefix of some other crates. The intent of this crate is to reimplement pieces of the Python standard library and CPython itself, so `ruff_python_stdlib` feels appropriate.
2023-03-06 13:43:22 +00:00
Jonathan Plasse
8828e12283 Bump dependencies and move more shared dependencies into workspace (#3340) 2023-03-04 12:36:26 -05:00
Charlie Marsh
061495a9eb Make BoolOp its own located token (#3265) 2023-02-28 03:43:28 +00:00
Jeong YunWon
84e96cdcd9 More enum work (#3212) 2023-02-25 11:40:16 -05:00
Charlie Marsh
f967f344fc Add support for basic Constant::Str formatting (#3173)
This PR enables us to apply the proper quotation marks, including support for escapes. There are some significant TODOs, especially around implicit concatenations like:

```py
(
  "abc"
  "def"
)
```

Which are represented as a single AST node, which requires us to tokenize _within_ the formatter to identify all the individual string parts.
2023-02-23 16:23:10 +00:00
Charlie Marsh
095f005bf4 Move RustPython vendored and helper code into its own crate (#3171) 2023-02-23 14:14:16 +00:00
Micha Reiser
ed33b75bad test(ruff_python_formatter): Run all Black tests (#2993)
This PR changes the testing infrastructure to run all black tests and:

* Pass if Ruff and Black generate the same formatting
* Fail and write a markdown snapshot that shows the input code, the differences between Black and Ruff, Ruffs output, and Blacks output

This is achieved by introducing a new `fixture` macro (open to better name suggestions) that "duplicates" the attributed test for every file that matches the specified glob pattern. Creating a new test for each file over having a test that iterates over all files has the advantage that you can run a single test, and that test failures indicate which case is failing. 

The `fixture` macro also makes it straightforward to e.g. setup our own spec tests that test very specific formatting by creating a new folder and use insta to assert the formatted output.
2023-02-22 09:25:06 -05:00
Jonathan Plasse
b75663be6d Add missing rust-version in crates (#3009) 2023-02-19 15:07:17 +00:00
Charlie Marsh
ca49b00e55 Add initial formatter implementation (#2883)
# Summary

This PR contains the code for the autoformatter proof-of-concept.

## Crate structure

The primary formatting hook is the `fmt` function in `crates/ruff_python_formatter/src/lib.rs`.

The current formatter approach is outlined in `crates/ruff_python_formatter/src/lib.rs`, and is structured as follows:

- Tokenize the code using the RustPython lexer.
- In `crates/ruff_python_formatter/src/trivia.rs`, extract a variety of trivia tokens from the token stream. These include comments, trailing commas, and empty lines.
- Generate the AST via the RustPython parser.
- In `crates/ruff_python_formatter/src/cst.rs`, convert the AST to a CST structure. As of now, the CST is nearly identical to the AST, except that every node gets a `trivia` vector. But we might want to modify it further.
- In `crates/ruff_python_formatter/src/attachment.rs`, attach each trivia token to the corresponding CST node. The logic for this is mostly in `decorate_trivia` and is ported almost directly from Prettier (given each token, find its preceding, following, and enclosing nodes, then attach the token to the appropriate node in a second pass).
- In `crates/ruff_python_formatter/src/newlines.rs`, normalize newlines to match Black’s preferences. This involves traversing the CST and inserting or removing `TriviaToken` values as we go.
- Call `format!` on the CST, which delegates to type-specific formatter implementations (e.g., `crates/ruff_python_formatter/src/format/stmt.rs` for `Stmt` nodes, and similar for `Expr` nodes; the others are trivial). Those type-specific implementations delegate to kind-specific functions (e.g., `format_func_def`).

## Testing and iteration

The formatter is being developed against the Black test suite, which was copied over in-full to `crates/ruff_python_formatter/resources/test/fixtures/black`.

The Black fixtures had to be modified to create `[insta](https://github.com/mitsuhiko/insta)`-compatible snapshots, which now exist in the repo.

My approach thus far has been to try and improve coverage by tackling fixtures one-by-one.

## What works, and what doesn’t

- *Most* nodes are supported at a basic level (though there are a few stragglers at time of writing, like `StmtKind::Try`).
- Newlines are properly preserved in most cases.
- Magic trailing commas are properly preserved in some (but not all) cases.
- Trivial leading and trailing standalone comments mostly work (although maybe not at the end of a file).
- Inline comments, and comments within expressions, often don’t work -- they work in a few cases, but it’s one-off right now. (We’re probably associating them with the “right” nodes more often than we are actually rendering them in the right place.)
- We don’t properly normalize string quotes. (At present, we just repeat any constants verbatim.)
- We’re mishandling a bunch of wrapping cases (if we treat Black as the reference implementation). Here are a few examples (demonstrating Black's stable behavior):

```py
# In some cases, if the end expression is "self-closing" (functions,
# lists, dictionaries, sets, subscript accesses, and any length-two
# boolean operations that end in these elments), Black
# will wrap like this...
if some_expression and f(
    b,
    c,
    d,
):
    pass

# ...whereas we do this:
if (
    some_expression
    and f(
        b,
        c,
        d,
    )
):
    pass

# If function arguments can fit on a single line, then Black will
# format them like this, rather than exploding them vertically.
if f(
    a, b, c, d, e, f, g, ...
):
    pass
```

- We don’t properly preserve parentheses in all cases. Black preserves parentheses in some but not all cases.
2023-02-15 04:06:35 +00:00