[pyupgrade] Fix parsing named Unicode escape sequences (UP032) (#21901)
## Summary

Fixes https://github.com/astral-sh/ruff/issues/19771

Fixes incorrect parsing of Unicode named escape sequences like `Hey \N{snowman}` in `FormatString`, which were being incorrectly split into separate literal and field parts instead of being treated as a single literal unit.

## Problem

The `FormatString` parser incorrectly handles Unicode named escape sequences:

- **Current**: `Hey \N{snowman}` is parsed into 2 parts `Literal("Hey \N")` & `Field("snowman")`
- **Expected**: `Hey \N{snowman}` should be parsed into 1 part `Literal("Hey \N{snowman}")`

This affects f-string conversion rules when fixing `UP032` that rely on proper format string parsing.

## Solution

I modified `parse_literal` to detect and handle Unicode named escape sequences before parsing single characters:

- Introduced a flag to track when a backslash is "available" to escape something.
- When the flag is `true`, and the text starts with `N{`, try to parse the complete Unicode escape sequence as one unit, and set the flag to `false` after parsing successfully.
- Set the flag to `false` when the backslash is already consumed.

## Manual Verification

`"\N{angle}AOB = {angle}°".format(angle=180)`

**Result**

```bash
 def foo():
-    "\N{angle}AOB = {angle}°".format(angle=180)
+    f"\N{angle}AOB = {180}°"

Would fix 1 error.
```

`"\N{snowman} {snowman}".format(snowman=1)`

**Result**

```bash
 def foo():
-    "\N{snowman} {snowman}".format(snowman=1)
+    f"\N{snowman} {1}"

Would fix 1 error.
```

`"\\N{snowman} {snowman}".format(snowman=1)`

**Result**

```bash
 def foo():
-    "\\N{snowman} {snowman}".format(snowman=1)
+    f"\\N{1} {1}"

Would fix 1 error.
```

## Test Plan

- Added test cases (happy case, invalid case, edge case) for `FormatString` when parsing Unicode escape sequence.
- Updated snapshots.
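The flag-based approach described under "Solution" can be sketched as a small standalone scanner. This is a simplified illustration, not the code in this commit: `Part`, `named_escape`, and `split_parts` are hypothetical names, and the sketch assumes no `{{`/`}}` escapes, conversion specs, or format specs. It also uses nested `if let` instead of a let-chain so it compiles on older Rust editions.

```rust
/// One piece of a parsed format string (a simplified stand-in for `FormatPart`).
#[derive(Debug, PartialEq)]
enum Part {
    Literal(String),
    Field(String),
}

/// If `text` starts with `N{...}`, return that span and whatever follows it.
fn named_escape(text: &str) -> Option<(&str, &str)> {
    let body = text.strip_prefix("N{")?;
    let close = body.find('}')?;
    // 2 bytes for "N{", `close` bytes of name, 1 byte for '}'.
    Some(text.split_at(2 + close + 1))
}

/// Split `text` into literal and field parts, keeping `\N{...}` inside literals.
fn split_parts(text: &str) -> Vec<Part> {
    let mut parts = Vec::new();
    let mut literal = String::new();
    // True only when the previous character was a backslash that has not
    // itself been escaped by another backslash.
    let mut pending_escape = false;
    let mut rest = text;

    while let Some(ch) = rest.chars().next() {
        if pending_escape {
            if let Some((escape, remaining)) = named_escape(rest) {
                // `\N{name}` is consumed as one unit of the literal.
                literal.push_str(escape);
                rest = remaining;
                pending_escape = false;
                continue;
            }
        }
        if ch == '{' {
            if let Some(close) = rest.find('}') {
                // A brace that is not part of a named escape starts a field.
                if !literal.is_empty() {
                    parts.push(Part::Literal(std::mem::take(&mut literal)));
                }
                parts.push(Part::Field(rest[1..close].to_owned()));
                rest = &rest[close + 1..];
                pending_escape = false;
                continue;
            }
        }
        literal.push(ch);
        rest = &rest[ch.len_utf8()..];
        // A second backslash in a row consumes the first one.
        pending_escape = ch == '\\' && !pending_escape;
    }
    if !literal.is_empty() {
        parts.push(Part::Literal(literal));
    }
    parts
}
```

With this sketch, `split_parts("Hey \\N{snowman}")` yields a single `Literal`, while `split_parts("\\\\N{snowman}")` (an escaped backslash followed by `N{snowman}`) yields a `Literal` followed by a `Field`, mirroring the third manual verification case.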
@@ -592,11 +592,23 @@ impl FormatString {
     fn parse_literal(text: &str) -> Result<(FormatPart, &str), FormatParseError> {
         let mut cur_text = text;
         let mut result_string = String::new();
+        let mut pending_escape = false;
         while !cur_text.is_empty() {
+            if pending_escape
+                && let Some((unicode_string, remaining)) =
+                    FormatString::parse_escaped_unicode_string(cur_text)
+            {
+                result_string.push_str(unicode_string);
+                cur_text = remaining;
+                pending_escape = false;
+                continue;
+            }
+
             match FormatString::parse_literal_single(cur_text) {
                 Ok((next_char, remaining)) => {
                     result_string.push(next_char);
                     cur_text = remaining;
+                    pending_escape = next_char == '\\' && !pending_escape;
                 }
                 Err(err) => {
                     return if result_string.is_empty() {
@@ -678,6 +690,13 @@ impl FormatString {
         }
         Err(FormatParseError::UnmatchedBracket)
     }
+
+    fn parse_escaped_unicode_string(text: &str) -> Option<(&str, &str)> {
+        text.strip_prefix("N{")?.find('}').map(|idx| {
+            let end_idx = idx + 3; // 2 for "N{" + 1 for the closing '}'
+            (&text[..end_idx], &text[end_idx..])
+        })
+    }
 }
 
 pub trait FromTemplate<'a>: Sized {
@@ -1020,4 +1039,48 @@ mod tests {
             Err(FormatParseError::InvalidCharacterAfterRightBracket)
         );
     }
+
+    #[test]
+    fn test_format_unicode_escape() {
+        let expected = Ok(FormatString {
+            format_parts: vec![FormatPart::Literal("I am a \\N{snowman}".to_owned())],
+        });
+
+        assert_eq!(FormatString::from_str("I am a \\N{snowman}"), expected);
+    }
+
+    #[test]
+    fn test_format_unicode_escape_with_field() {
+        let expected = Ok(FormatString {
+            format_parts: vec![
+                FormatPart::Literal("I am a \\N{snowman}".to_owned()),
+                FormatPart::Field {
+                    field_name: "snowman".to_owned(),
+                    conversion_spec: None,
+                    format_spec: String::new(),
+                },
+            ],
+        });
+
+        assert_eq!(
+            FormatString::from_str("I am a \\N{snowman}{snowman}"),
+            expected
+        );
+    }
+
+    #[test]
+    fn test_format_multiple_escape_with_field() {
+        let expected = Ok(FormatString {
+            format_parts: vec![
+                FormatPart::Literal("I am a \\\\N".to_owned()),
+                FormatPart::Field {
+                    field_name: "snowman".to_owned(),
+                    conversion_spec: None,
+                    format_spec: String::new(),
+                },
+            ],
+        });
+
+        assert_eq!(FormatString::from_str("I am a \\\\N{snowman}"), expected);
+    }
 }
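
The offset arithmetic in `parse_escaped_unicode_string` is easy to misread, because `find('}')` runs on the text after the `N{` prefix has been stripped. The following is a standalone sketch of the same logic as a free function (an assumption for illustration; in the diff it is an associated function on `FormatString`), with the offsets spelled out in comments.

```rust
/// Split a leading `N{name}` off `text`, returning the escape and the remainder.
/// Returns `None` if `text` does not start with `N{` or has no closing brace.
fn parse_escaped_unicode_string(text: &str) -> Option<(&str, &str)> {
    // `idx` is the byte offset of '}' within the text *after* the "N{" prefix.
    let idx = text.strip_prefix("N{")?.find('}')?;
    // Shift by 2 bytes for "N{" and add 1 to step past '}', giving the
    // exclusive end of the escape sequence in the original `text`.
    let end_idx = idx + 3;
    Some((&text[..end_idx], &text[end_idx..]))
}
```

For the input `N{snowman} melts`, the stripped text is `snowman} melts`, so `idx` is 7 and `end_idx` is 10, which splits the input into `N{snowman}` and ` melts`. An unterminated input such as `N{snowman` returns `None`, so the caller falls back to character-by-character parsing.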