[pyupgrade] Fix parsing named Unicode escape sequences (UP032) (#21901)

## Summary

Fixes https://github.com/astral-sh/ruff/issues/19771

Fixes incorrect parsing of Unicode named escape sequences like `Hey
\N{snowman}` in `FormatString`, which were being incorrectly split into
separate literal and field parts instead of being treated as a single
literal unit.

## Problem

The `FormatString` parser incorrectly handles Unicode named escape
sequences:
- **Current**: `Hey \N{snowman}` is parsed into 2 parts `Literal("Hey
\N")` & `Field("snowman")`
- **Expected**: `Hey \N{snowman}` should be parsed into 1 part
`Literal("Hey \N{snowman}")`

This affects f-string conversion rules when fixing `UP032` that rely on
proper format string parsing.

## Solution

I modified `parse_literal` to detect and handle Unicode named escape
sequences before parsing single characters:
- Introduced a flag to track when a backslash is "available" to escape
something.
- When the flag is `true`, and the text starts with `N{`, try to parse
the complete Unicode escape sequence as one unit, and set the flag to
`false` after parsing successfully.
- Set the flag to `false` when the backslash is already consumed.

## Manual Verification

`"\N{angle}AOB = {angle}°".format(angle=180)` 

**Result**

```bash
 def foo():
-    "\N{angle}AOB = {angle}°".format(angle=180)
+    f"\N{angle}AOB = {180}°"

Would fix 1 error.
```

`"\N{snowman} {snowman}".format(snowman=1)`

**Result**
```bash
 def foo():
-    "\N{snowman} {snowman}".format(snowman=1)
+    f"\N{snowman} {1}"

Would fix 1 error.
```

`"\\N{snowman} {snowman}".format(snowman=1)`

**Result**
```bash
 def foo():
-    "\\N{snowman} {snowman}".format(snowman=1)
+    f"\\N{1} {1}"

Would fix 1 error.
```

## Test Plan

- Added test cases (happy case, invalid case, edge case) for
`FormatString` when parsing Unicode escape sequence.
- Updated snapshots.
This commit is contained in:
Phong Do
2025-12-16 22:33:39 +01:00
committed by GitHub
parent ad3de4e488
commit b0bc990cbf
3 changed files with 389 additions and 288 deletions

View File

@@ -592,11 +592,23 @@ impl FormatString {
fn parse_literal(text: &str) -> Result<(FormatPart, &str), FormatParseError> {
let mut cur_text = text;
let mut result_string = String::new();
let mut pending_escape = false;
while !cur_text.is_empty() {
if pending_escape
&& let Some((unicode_string, remaining)) =
FormatString::parse_escaped_unicode_string(cur_text)
{
result_string.push_str(unicode_string);
cur_text = remaining;
pending_escape = false;
continue;
}
match FormatString::parse_literal_single(cur_text) {
Ok((next_char, remaining)) => {
result_string.push(next_char);
cur_text = remaining;
pending_escape = next_char == '\\' && !pending_escape;
}
Err(err) => {
return if result_string.is_empty() {
@@ -678,6 +690,13 @@ impl FormatString {
}
Err(FormatParseError::UnmatchedBracket)
}
fn parse_escaped_unicode_string(text: &str) -> Option<(&str, &str)> {
text.strip_prefix("N{")?.find('}').map(|idx| {
let end_idx = idx + 3; // 3 for "N{"
(&text[..end_idx], &text[end_idx..])
})
}
}
pub trait FromTemplate<'a>: Sized {
@@ -1020,4 +1039,48 @@ mod tests {
Err(FormatParseError::InvalidCharacterAfterRightBracket)
);
}
#[test]
fn test_format_unicode_escape() {
let expected = Ok(FormatString {
format_parts: vec![FormatPart::Literal("I am a \\N{snowman}".to_owned())],
});
assert_eq!(FormatString::from_str("I am a \\N{snowman}"), expected);
}
#[test]
fn test_format_unicode_escape_with_field() {
let expected = Ok(FormatString {
format_parts: vec![
FormatPart::Literal("I am a \\N{snowman}".to_owned()),
FormatPart::Field {
field_name: "snowman".to_owned(),
conversion_spec: None,
format_spec: String::new(),
},
],
});
assert_eq!(
FormatString::from_str("I am a \\N{snowman}{snowman}"),
expected
);
}
#[test]
fn test_format_multiple_escape_with_field() {
let expected = Ok(FormatString {
format_parts: vec![
FormatPart::Literal("I am a \\\\N".to_owned()),
FormatPart::Field {
field_name: "snowman".to_owned(),
conversion_spec: None,
format_spec: String::new(),
},
],
});
assert_eq!(FormatString::from_str("I am a \\\\N{snowman}"), expected);
}
}