## Summary

We use `.trim()` and friends in a bunch of places to strip whitespace from source code. However, not all Unicode whitespace characters are considered "whitespace" in Python, which only recognizes the standard space, tab, and form-feed characters as whitespace between tokens.

This PR audits our usages of `.trim()`, `.trim_start()`, `.trim_end()`, and `char::is_whitespace`, and replaces them as appropriate with new `.trim_whitespace()` analogues, powered by a `PythonWhitespace` trait.

In general, the only place that should continue to use `.trim()` is content within docstrings, which don't need to adhere to Python's semantic definition of whitespace.

Closes #4991.
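To make the mismatch concrete, here is a minimal sketch of how `str::trim()` and a Python-aware trim can disagree. The helper `trim_python` is a hypothetical stand-in written only for this illustration; it is not the API added by this PR.

```rust
/// Hypothetical helper for illustration: trims only the characters Python
/// treats as whitespace between tokens (space, tab, form feed).
fn trim_python(s: &str) -> &str {
    s.trim_matches(|c: char| matches!(c, ' ' | '\t' | '\x0C'))
}

fn main() {
    // U+00A0 (no-break space) is Unicode whitespace, so `trim()` removes it,
    // but it is not whitespace as far as Python's lexer is concerned.
    let source = "\u{a0}x = 1 \t";
    assert_eq!(source.trim(), "x = 1");
    assert_eq!(trim_python(source), "\u{a0}x = 1");

    // `trim()` also strips the trailing newline; the Python-aware trim
    // leaves it in place, since a newline is not whitespace between tokens.
    let line = "  pass\n";
    assert_eq!(line.trim(), "pass");
    assert_eq!(trim_python(line), "pass\n");
}
```

In both cases `.trim()` discards a character that is not Python whitespace, which is the kind of discrepancy the audit is meant to catch.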
Rust, 44 lines, 1.6 KiB
/// Returns `true` for [whitespace](https://docs.python.org/3/reference/lexical_analysis.html#whitespace-between-tokens)
/// characters.
pub const fn is_python_whitespace(c: char) -> bool {
    matches!(
        c,
        // Space, tab, or form-feed
        ' ' | '\t' | '\x0C'
    )
}

/// Extract the leading indentation from a line.
pub fn leading_indentation(line: &str) -> &str {
    line.find(|char: char| !is_python_whitespace(char))
        .map_or(line, |index| &line[..index])
}

pub trait PythonWhitespace {
    /// Like `str::trim()`, but only removes whitespace characters that Python considers
    /// to be [whitespace](https://docs.python.org/3/reference/lexical_analysis.html#whitespace-between-tokens).
    fn trim_whitespace(&self) -> &Self;

    /// Like `str::trim_start()`, but only removes whitespace characters that Python considers
    /// to be [whitespace](https://docs.python.org/3/reference/lexical_analysis.html#whitespace-between-tokens).
    fn trim_whitespace_start(&self) -> &Self;

    /// Like `str::trim_end()`, but only removes whitespace characters that Python considers
    /// to be [whitespace](https://docs.python.org/3/reference/lexical_analysis.html#whitespace-between-tokens).
    fn trim_whitespace_end(&self) -> &Self;
}

impl PythonWhitespace for str {
    fn trim_whitespace(&self) -> &Self {
        self.trim_matches(is_python_whitespace)
    }

    fn trim_whitespace_start(&self) -> &Self {
        self.trim_start_matches(is_python_whitespace)
    }

    fn trim_whitespace_end(&self) -> &Self {
        self.trim_end_matches(is_python_whitespace)
    }
}
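
As a usage sketch, the snippet below exercises the helpers on a few sample lines. It repeats a condensed copy of the definitions from the file above (doc comments and `pub` dropped) purely so that it compiles on its own; the sample strings are made up for illustration.

```rust
// Condensed copy of the definitions above, so the snippet is self-contained.
const fn is_python_whitespace(c: char) -> bool {
    matches!(c, ' ' | '\t' | '\x0C')
}

fn leading_indentation(line: &str) -> &str {
    line.find(|c: char| !is_python_whitespace(c))
        .map_or(line, |index| &line[..index])
}

trait PythonWhitespace {
    fn trim_whitespace(&self) -> &Self;
    fn trim_whitespace_start(&self) -> &Self;
    fn trim_whitespace_end(&self) -> &Self;
}

impl PythonWhitespace for str {
    fn trim_whitespace(&self) -> &Self {
        self.trim_matches(is_python_whitespace)
    }

    fn trim_whitespace_start(&self) -> &Self {
        self.trim_start_matches(is_python_whitespace)
    }

    fn trim_whitespace_end(&self) -> &Self {
        self.trim_end_matches(is_python_whitespace)
    }
}

fn main() {
    // Made-up sample line: tab + two spaces of indentation, then a
    // trailing space, form feed, and newline.
    let line = "\t  return x \x0C\n";

    // Indentation is the leading run of space/tab/form-feed characters.
    assert_eq!(leading_indentation(line), "\t  ");
    assert_eq!(line.trim_whitespace_start(), "return x \x0C\n");

    // The trailing newline is not Python whitespace, so trimming the end
    // stops immediately and the line comes back unchanged.
    assert_eq!(line.trim_whitespace_end(), line);

    // A line made up entirely of whitespace is all indentation.
    assert_eq!(leading_indentation("   "), "   ");
    assert_eq!("  \t".trim_whitespace(), "");
}
```

Note that callers who want to drop a trailing newline need to strip it separately (for example with `str::trim_end_matches`), since `trim_whitespace_end()` deliberately leaves it alone.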