This fixes a bug where the current indent level was not calculated
correctly for doctests. Namely, it didn't account for the extra indent
level (in terms of ASCII spaces) used by by the PS1 (`>>> `) and PS2
(`... `) prompts. As a result, lines could extend up to 4 spaces beyond
the configured line length limit.
We fix that by passing the `CodeExampleKind` to the `format` routine
instead of just the code itself. In this way, `format` can query whether
there will be any extra indent added _after_ formatting the code and
take that into account for its line length setting.
We add a few regression tests, taken directly from @stinodego's
examples.
Fixes#9126
This PR adds a `as_slice` method to all the string nodes which returns
all the parts of the nodes as a slice. This will be useful in the next
PR to split the string formatting to use this method to extract the
_single node_ or _implicitly concanated nodes_.
## Summary
This PR changes the internal `docstring-code-line-width` setting to
additionally accept a string value `dynamic`. When `dynamic` is set, the
line width is dynamically adjusted when reformatting code snippets in
docstrings based on the indent level of the docstring. The result is
that the reformatted lines from the code snippet should not exceed the
"global" line width configuration for the surrounding source.
This PR does not change the default behavior, although I suspect the
default should probably be `dynamic`.
## Test Plan
I added a new configuration to the existing docstring code tests and
also added a new set of tests dedicated to the new `dynamic` mode.
## Summary
This does the light plumbing necessary to add a new internal option that
permits setting the line width of code examples in docstrings. The plan
is to add the corresponding user facing knob in #8854.
Note that this effectively removes the `same-as-global` configuration
style discussed [in this
comment](https://github.com/astral-sh/ruff/issues/8855#issuecomment-1847230440).
It replaces it with the `{integer}` configuration style only.
There are a lot of commits here, but they are each tiny to make review
easier because of the changes to snapshots.
## Test Plan
I added a new docstring test configuration that sets
`docstring-code-line-width = 60` and examined the differences.
(This is not possible to actually use until
https://github.com/astral-sh/ruff/pull/8854 is merged.)
This commit slots in support for formatting Markdown fenced code
blocks[1]. With the refactoring done for reStructuredText previously,
this ended up being pretty easy to add. Markdown code blocks are also
quite a bit easier to parse and recognize correctly.
One point of contention in #8860 is whether to assume that unlabeled
Markdown code fences are Python or not by default. In this PR, we make
such an assumption. This follows what `rustdoc` does. The mitigation
here is that if an unlabeled code block isn't Python, then it probably
won't parse as Python. And we'll end up skipping it. So in the vast
majority of cases, the worst thing that can happen is a little bit of
wasted work.
Closes#8860
[1]: https://spec.commonmark.org/0.30/#fenced-code-blocks
Rebase of #6365 authored by @davidszotten.
## Summary
This PR updates the AST structure for an f-string elements.
The main **motivation** behind this change is to have a dedicated node
for the string part of an f-string. Previously, the existing
`ExprStringLiteral` node was used for this purpose which isn't exactly
correct. The `ExprStringLiteral` node should include the quotes as well
in the range but the f-string literal element doesn't include the quote
as it's a specific part within an f-string. For example,
```python
f"foo {x}"
# ^^^^
# This is the literal part of an f-string
```
The introduction of `FStringElement` enum is helpful which represent
either the literal part or the expression part of an f-string.
### Rule Updates
This means that there'll be two nodes representing a string depending on
the context. One for a normal string literal while the other is a string
literal within an f-string. The AST checker is updated to accommodate
this change. The rules which work on string literal are updated to check
on the literal part of f-string as well.
#### Notes
1. The `Expr::is_literal_expr` method would check for
`ExprStringLiteral` and return true if so. But now that we don't
represent the literal part of an f-string using that node, this improves
the method's behavior and confines to the actual expression. We do have
the `FStringElement::is_literal` method.
2. We avoid checking if we're in a f-string context before adding to
`string_type_definitions` because the f-string literal is now a
dedicated node and not part of `Expr`.
3. Annotations cannot use f-string so we avoid changing any rules which
work on annotation and checks for `ExprStringLiteral`.
## Test Plan
- All references of `Expr::StringLiteral` were checked to see if any of
the rules require updating to account for the f-string literal element
node.
- New test cases are added for rules which check against the literal
part of an f-string.
- Check the ecosystem results and ensure it remains unchanged.
## Performance
There's a performance penalty in the parser. The reason for this remains
unknown as it seems that the generated assembly code is now different
for the `__reduce154` function. The reduce function body is just popping
the `ParenthesizedExpr` on top of the stack and pushing it with the new
location.
- The size of `FStringElement` enum is the same as `Expr` which is what
it replaces in `FString::format_spec`
- The size of `FStringExpressionElement` is the same as
`ExprFormattedValue` which is what it replaces
I tried reducing the `Expr` enum from 80 bytes to 72 bytes but it hardly
resulted in any performance gain. The difference can be seen here:
- Original profile: https://share.firefox.dev/3Taa7ES
- Profile after boxing some node fields:
https://share.firefox.dev/3GsNXpD
### Backtracking
I tried backtracking the changes to see if any of the isolated change
produced this regression. The problem here is that the overall change is
so small that there's only a single checkpoint where I can backtrack and
that checkpoint results in the same regression. This checkpoint is to
revert using `Expr` to the `FString::format_spec` field. After this
point, the change would revert back to the original implementation.
## Review process
The review process is similar to #7927. The first set of commits update
the node structure, parser, and related AST files. Then, further commits
update the linter and formatter part to account for the AST change.
---------
Co-authored-by: David Szotten <davidszotten@gmail.com>
(This is not possible to actually use until
https://github.com/astral-sh/ruff/pull/8854 is merged.)
ruff_python_formatter: add reStructuredText docstring formatting support
This commit makes use of the refactoring done in prior commits to slot
in reStructuredText support. Essentially, we add a new type of code
example and look for *both* literal blocks and code block directives.
Literal blocks are treated as Python by default because it seems to be a
[common
practice](https://github.com/adamchainz/blacken-docs/issues/195).
That is, literal blocks like this:
```
def example():
"""
Here's an example::
foo( 1 )
All done.
"""
pass
```
Will get reformatted. And code blocks (via reStructuredText directives)
will also get reformatted:
```
def example():
"""
Here's an example:
.. code-block:: python
foo( 1 )
All done.
"""
pass
```
When looking for a code block, it is possible for it to become invalid.
In which case, we back out of looking for a code example and print the
lines out as they are. As with doctest formatting, if reformatting the
code would result in invalid Python or if the code collected from the
block is invalid, then formatting is also skipped.
A number of tests have been added to check both the formatting and
resetting behavior. Mixed indentation is also tested a fair bit, since
one of my initial attempts at dealing with mixed indentation ended up
not working.
I recommend working through this PR commit-by-commit. There is in
particular a somewhat gnarly refactoring before reST support is added.
Closes#8859
In the source of working on #8859, I made a number of smallish refactors
to how code snippet formatting works. Most or all of these were
motivated by writing in support for reStructuredText blocks. They have
some fundamentally different requirements than doctests, and there are a
lot more ways for reStructuredText blocks to become invalid.
(Commit-by-commit review is recommended as the commit messages provide
further context on each change. I split this off from ongoing work to
make review more manageable.)
This turns `string` into a parent module with a `docstring` sub-module.
I arranged things this way because there are parts of the `string`
module that the `docstring` module wants to know about (such as a
`NormalizedString`). The alternative I think would be to make
`docstring` a sibling module and expose more of `string`'s internals.
I think I overall like this change because it gives docstring handling a
bit more room to breath. It has grown quite a bit with the addition of
code snippet formatting.
[This was suggested by
@charliermarsh.](https://github.com/astral-sh/ruff/pull/8811#discussion_r1401169531)