Files
ruff/crates/red_knot_python_semantic/src/lint.rs
Micha Reiser 881375a8d9 [red-knot] Lint registry and rule selection (#14874)
## Summary

This is the third and last PR in this stack that adds support for
toggling lints at a per-rule level.

This PR introduces a new `LintRegistry`, a central index of known lints.
The registry is required because we want to support lint rules from many
different crates but need a way to look them up by name, e.g., when
resolving a lint from a name in the configuration or analyzing a
suppression comment.

Adding a lint now requires two steps:

1. Declare the lint with `declare_lint`
2. Register the lint in the registry inside the `register_lints`
function.

I considered some more involved macros to avoid changes in two places.
Still, I ultimately decided against it because a) it's just two places
and b) I'd expect that registering a type checker lint will differ from
registering a lint that runs as a rule in the linter. I worry that any
more opinionated design could limit our options when working on the
linter, so I kept it simple.

The second part of this PR is the `RuleSelection`. It stores which lints
are enabled and what severity they should use for created diagnostics.
For now, the `RuleSelection` always gets initialized with all known
lints and it uses their default level.

## Linter crates

Each crate that defines lints should export a `register_lints` function
that accepts a `&mut LintRegistryBuilder` to register all its known
lints in the registry. This should make registering all known lints in a
top-level crate easy: Just call `register_lints` of every crate that
defines lint rules.

I considered defining a `LintCollection` trait and even some fancy
macros to accomplish the same but decided to go for this very simplistic
approach for now. We can add more abstraction once needed.

## Lint rules

This is a bit hand-wavy. I don't have a good sense for how our linter
infrastructure will look like, but I expect we'll need a way to register
the rules that should run as part of the red knot linter. One way is to
keep doing what Ruff does by having one massive `checker` and each lint
rule adds a call to itself in the relevant AST visitor methods. An
alternative is that we have a `LintRule` trait that provides common
hooks and implementations will be called at the "right time". Such a
design would need a way to register all known lint implementations,
possibly with the lint. This is where we'd probably want a dedicated
`register_rule` method. A third option is that lint rules are handled
separately from the `LintRegistry` and are specific to the linter crate.

The current design should be flexible enough to support the three
options.


## Documentation generation

The documentation for all known lints can be generated by creating a
factory, registering all lints by calling the `register_lints` methods,
and then querying the registry for the metadata.

## Deserialization and Schema generation

I haven't fully decided what the best approach is when it comes to
deserializing lint rule names:

* Reject invalid names in the deserializer. This gives us error messages
with line and column numbers (by serde)
* Don't validate lint rule names during deserialization; defer the
validation until the configuration is resolved. This gives us more
control over handling the error, e.g. emit a warning diagnostic instead
of aborting when a rule isn't known.

One technical challenge for both deserialization and schema generation
is that the `Deserialize` and `JSONSchema` traits do not allow passing
the `LintRegistry`, which is required to look up the lints by name. I
suggest that we either rely on the salsa db being set for the current
thread (`salsa::Attach`) or build our own thread-local storage for the
`LintRegistry`. It's the caller's responsibility to make the lint
registry available before calling `Deserialize` or `JSONSchema`.


## CLI support

I prefer deferring adding support for enabling and disabling lints from
the CLI for now because I think it will be easier
to add once I've figured out how to handle configurations. 

## Bitset optimization

Ruff tracks the enabled rules using a cheap copyable `Bitset` instead of
a hash map. This helped improve performance by a few percent (see
https://github.com/astral-sh/ruff/pull/3606). However, this approach is
no longer possible because lints have no "cheap" way to compute their
index inside the registry (other than using a hash map).

We could consider doing something similar to Salsa where each
`LintMetadata` stores a `LazyLintIndex`.

```
pub struct LazyLintIndex {
	cached: OnceLock<(Nonce, LintIndex)>
}

impl LazyLintIndex {
	pub fn get(registry: &LintRegistry, lint: &'static LintMetadata) {
	
	let (nonce, index) = self.cached.get_or_init(|| registry.lint_index(lint));

	if registry.nonce() == nonce {
		index
	} else {
		registry.lint_index(lint)
	}
}
```

Each registry keeps a map from `LintId` to `LintIndex` where `LintIndex`
is in the range of `0...registry.len()`. The `LazyLintIndex` is based on
the assumption that every program has exactly **one** registry. This
assumption allows to cache the `LintIndex` directly on the
`LintMetadata`. The implementation falls back to the "slow" path if
there is more than one registry at runtime.

I was very close to implementing this optimization because it's kind of
fun to implement. I ultimately decided against it because it adds
complexity and I don't think it's worth doing in Red Knot today:

* Red Knot only queries the rule selection when deciding whether or not
to emit a diagnostic. It is rarely used to detect if a certain code
block should run. This is different from Ruff where the rule selection
is queried many times for every single AST node to determine which rules
*should* run.
* I'm not sure if a 2-3% performance improvement is worth the complexity

I suggest revisiting this decision when working on the linter where a
fast path for deciding if a rule is enabled might be more important (but
that depends on how lint rules are implemented)


## Test Plan

I removed a lint from the default rule registry, and the MD tests
started failing because the diagnostics were no longer emitted.
2024-12-11 13:25:19 +01:00

466 lines
13 KiB
Rust

use itertools::Itertools;
use ruff_db::diagnostic::{LintName, Severity};
use rustc_hash::FxHashMap;
use std::hash::Hasher;
use thiserror::Error;
#[derive(Debug, Clone)]
pub struct LintMetadata {
/// The unique identifier for the lint.
pub name: LintName,
/// A one-sentence summary of what the lint catches.
pub summary: &'static str,
/// An in depth explanation of the lint in markdown. Covers what the lint does, why it's bad and possible fixes.
///
/// The documentation may require post-processing to be rendered correctly. For example, lines
/// might have leading or trailing whitespace that should be removed.
pub raw_documentation: &'static str,
/// The default level of the lint if the user doesn't specify one.
pub default_level: Level,
pub status: LintStatus,
/// The source file in which the lint is declared.
pub file: &'static str,
/// The 1-based line number in the source `file` where the lint is declared.
pub line: u32,
}
#[derive(Copy, Clone, Debug, Eq, PartialEq, Ord, PartialOrd)]
pub enum Level {
/// The lint is disabled and should not run.
Ignore,
/// The lint is enabled and diagnostic should have a warning severity.
Warn,
/// The lint is enabled and diagnostics have an error severity.
Error,
}
impl Level {
pub const fn is_error(self) -> bool {
matches!(self, Level::Error)
}
pub const fn is_warn(self) -> bool {
matches!(self, Level::Warn)
}
pub const fn is_ignore(self) -> bool {
matches!(self, Level::Ignore)
}
}
impl TryFrom<Level> for Severity {
type Error = ();
fn try_from(level: Level) -> Result<Self, ()> {
match level {
Level::Ignore => Err(()),
Level::Warn => Ok(Severity::Warning),
Level::Error => Ok(Severity::Error),
}
}
}
impl LintMetadata {
pub fn name(&self) -> LintName {
self.name
}
pub fn summary(&self) -> &str {
self.summary
}
/// Returns the documentation line by line with one leading space and all trailing whitespace removed.
pub fn documentation_lines(&self) -> impl Iterator<Item = &str> {
self.raw_documentation
.lines()
.map(|line| line.strip_prefix(' ').unwrap_or(line).trim_end())
}
/// Returns the documentation as a single string.
pub fn documentation(&self) -> String {
self.documentation_lines().join("\n")
}
pub fn default_level(&self) -> Level {
self.default_level
}
pub fn status(&self) -> &LintStatus {
&self.status
}
pub fn file(&self) -> &str {
self.file
}
pub fn line(&self) -> u32 {
self.line
}
}
#[doc(hidden)]
pub const fn lint_metadata_defaults() -> LintMetadata {
LintMetadata {
name: LintName::of(""),
summary: "",
raw_documentation: "",
default_level: Level::Error,
status: LintStatus::preview("0.0.0"),
file: "",
line: 1,
}
}
#[derive(Copy, Clone, Debug)]
pub enum LintStatus {
/// The lint has been added to the linter, but is not yet stable.
Preview {
/// The version in which the lint was added.
since: &'static str,
},
/// The lint is stable.
Stable {
/// The version in which the lint was stabilized.
since: &'static str,
},
/// The lint is deprecated and no longer recommended for use.
Deprecated {
/// The version in which the lint was deprecated.
since: &'static str,
/// The reason why the lint has been deprecated.
///
/// This should explain why the lint has been deprecated and if there's a replacement lint that users
/// can use instead.
reason: &'static str,
},
/// The lint has been removed and can no longer be used.
Removed {
/// The version in which the lint was removed.
since: &'static str,
/// The reason why the lint has been removed.
reason: &'static str,
},
}
impl LintStatus {
pub const fn preview(since: &'static str) -> Self {
LintStatus::Preview { since }
}
pub const fn stable(since: &'static str) -> Self {
LintStatus::Stable { since }
}
pub const fn deprecated(since: &'static str, reason: &'static str) -> Self {
LintStatus::Deprecated { since, reason }
}
pub const fn removed(since: &'static str, reason: &'static str) -> Self {
LintStatus::Removed { since, reason }
}
pub const fn is_removed(&self) -> bool {
matches!(self, LintStatus::Removed { .. })
}
}
/// Declares a lint rule with the given metadata.
///
/// ```rust
/// use red_knot_python_semantic::declare_lint;
/// use red_knot_python_semantic::lint::{LintStatus, Level};
///
/// declare_lint! {
/// /// ## What it does
/// /// Checks for references to names that are not defined.
/// ///
/// /// ## Why is this bad?
/// /// Using an undefined variable will raise a `NameError` at runtime.
/// ///
/// /// ## Example
/// ///
/// /// ```python
/// /// print(x) # NameError: name 'x' is not defined
/// /// ```
/// pub(crate) static UNRESOLVED_REFERENCE = {
/// summary: "detects references to names that are not defined",
/// status: LintStatus::preview("1.0.0"),
/// default_level: Level::Warn,
/// }
/// }
/// ```
#[macro_export]
macro_rules! declare_lint {
(
$(#[doc = $doc:literal])+
$vis: vis static $name: ident = {
summary: $summary: literal,
status: $status: expr,
// Optional properties
$( $key:ident: $value:expr, )*
}
) => {
$( #[doc = $doc] )+
#[allow(clippy::needless_update)]
$vis static $name: $crate::lint::LintMetadata = $crate::lint::LintMetadata {
name: ruff_db::diagnostic::LintName::of(ruff_macros::kebab_case!($name)),
summary: $summary,
raw_documentation: concat!($($doc,)+ "\n"),
status: $status,
file: file!(),
line: line!(),
$( $key: $value, )*
..$crate::lint::lint_metadata_defaults()
};
};
}
/// A unique identifier for a lint rule.
///
/// Implements `PartialEq`, `Eq`, and `Hash` based on the `LintMetadata` pointer
/// for fast comparison and lookup.
#[derive(Debug, Clone, Copy)]
pub struct LintId {
definition: &'static LintMetadata,
}
impl LintId {
pub const fn of(definition: &'static LintMetadata) -> Self {
LintId { definition }
}
}
impl PartialEq for LintId {
fn eq(&self, other: &Self) -> bool {
std::ptr::eq(self.definition, other.definition)
}
}
impl Eq for LintId {}
impl std::hash::Hash for LintId {
fn hash<H: Hasher>(&self, state: &mut H) {
std::ptr::hash(self.definition, state);
}
}
impl std::ops::Deref for LintId {
type Target = LintMetadata;
fn deref(&self) -> &Self::Target {
self.definition
}
}
#[derive(Default, Debug)]
pub struct LintRegistryBuilder {
/// Registered lints that haven't been removed.
lints: Vec<LintId>,
/// Lints indexed by name, including aliases and removed rules.
by_name: FxHashMap<&'static str, LintEntry>,
}
impl LintRegistryBuilder {
#[track_caller]
pub fn register_lint(&mut self, lint: &'static LintMetadata) {
assert_eq!(
self.by_name.insert(&*lint.name, lint.into()),
None,
"duplicate lint registration for '{name}'",
name = lint.name
);
if !lint.status.is_removed() {
self.lints.push(LintId::of(lint));
}
}
#[track_caller]
pub fn register_alias(&mut self, from: LintName, to: &'static LintMetadata) {
let target = match self.by_name.get(to.name.as_str()) {
Some(LintEntry::Lint(target) | LintEntry::Removed(target)) => target,
Some(LintEntry::Alias(target)) => {
panic!(
"lint alias {from} -> {to:?} points to another alias {target:?}",
target = target.name()
)
}
None => panic!(
"lint alias {from} -> {to} points to non-registered lint",
to = to.name
),
};
assert_eq!(
self.by_name
.insert(from.as_str(), LintEntry::Alias(*target)),
None,
"duplicate lint registration for '{from}'",
);
}
pub fn build(self) -> LintRegistry {
LintRegistry {
lints: self.lints,
by_name: self.by_name,
}
}
}
#[derive(Default, Debug)]
pub struct LintRegistry {
lints: Vec<LintId>,
by_name: FxHashMap<&'static str, LintEntry>,
}
impl LintRegistry {
/// Looks up a lint by its name.
pub fn get(&self, code: &str) -> Result<LintId, GetLintError> {
match self.by_name.get(code) {
Some(LintEntry::Lint(metadata)) => Ok(*metadata),
Some(LintEntry::Alias(lint)) => {
if lint.status.is_removed() {
Err(GetLintError::Removed(lint.name()))
} else {
Ok(*lint)
}
}
Some(LintEntry::Removed(lint)) => Err(GetLintError::Removed(lint.name())),
None => Err(GetLintError::Unknown(code.to_string())),
}
}
/// Returns all registered, non-removed lints.
pub fn lints(&self) -> &[LintId] {
&self.lints
}
/// Returns an iterator over all known aliases and to their target lints.
///
/// This iterator includes aliases that point to removed lints.
pub fn aliases(&self) -> impl Iterator<Item = (LintName, LintId)> + '_ {
self.by_name.iter().filter_map(|(key, value)| {
if let LintEntry::Alias(alias) = value {
Some((LintName::of(key), *alias))
} else {
None
}
})
}
/// Iterates over all removed lints.
pub fn removed(&self) -> impl Iterator<Item = LintId> + '_ {
self.by_name.iter().filter_map(|(_, value)| {
if let LintEntry::Removed(metadata) = value {
Some(*metadata)
} else {
None
}
})
}
}
#[derive(Error, Debug, Clone)]
pub enum GetLintError {
/// The name maps to this removed lint.
#[error("lint {0} has been removed")]
Removed(LintName),
/// No lint with the given name is known.
#[error("unknown lint {0}")]
Unknown(String),
}
#[derive(Debug, PartialEq, Eq)]
pub enum LintEntry {
/// An existing lint rule. Can be in preview, stable or deprecated.
Lint(LintId),
/// A lint rule that has been removed.
Removed(LintId),
Alias(LintId),
}
impl From<&'static LintMetadata> for LintEntry {
fn from(metadata: &'static LintMetadata) -> Self {
if metadata.status.is_removed() {
LintEntry::Removed(LintId::of(metadata))
} else {
LintEntry::Lint(LintId::of(metadata))
}
}
}
#[derive(Debug, Clone, Default)]
pub struct RuleSelection {
/// Map with the severity for each enabled lint rule.
///
/// If a rule isn't present in this map, then it should be considered disabled.
lints: FxHashMap<LintId, Severity>,
}
impl RuleSelection {
/// Creates a new rule selection from all known lints in the registry that are enabled
/// according to their default severity.
pub fn from_registry(registry: &LintRegistry) -> Self {
let lints = registry
.lints()
.iter()
.filter_map(|lint| {
Severity::try_from(lint.default_level())
.ok()
.map(|severity| (*lint, severity))
})
.collect();
RuleSelection { lints }
}
/// Returns an iterator over all enabled lints.
pub fn enabled(&self) -> impl Iterator<Item = LintId> + '_ {
self.lints.keys().copied()
}
/// Returns an iterator over all enabled lints and their severity.
pub fn iter(&self) -> impl ExactSizeIterator<Item = (LintId, Severity)> + '_ {
self.lints.iter().map(|(&lint, &severity)| (lint, severity))
}
/// Returns the configured severity for the lint with the given id or `None` if the lint is disabled.
pub fn severity(&self, lint: LintId) -> Option<Severity> {
self.lints.get(&lint).copied()
}
/// Enables `lint` and configures with the given `severity`.
///
/// Overrides any previous configuration for the lint.
pub fn enable(&mut self, lint: LintId, severity: Severity) {
self.lints.insert(lint, severity);
}
/// Disables `lint` if it was previously enabled.
pub fn disable(&mut self, lint: LintId) {
self.lints.remove(&lint);
}
/// Merges the enabled lints from `other` into this selection.
///
/// Lints from `other` will override any existing configuration.
pub fn merge(&mut self, other: &RuleSelection) {
self.lints.extend(other.iter());
}
}