[red-knot] Combine terminal statement support with statically known branches (#15817)

This example from @sharkdp shows how terminal statements can appear in
statically known branches:
https://github.com/astral-sh/ruff/pull/15676#issuecomment-2618809716

```py
def _(cond: bool):
    x = "a"
    if cond:
        x = "b"
        if True:
            return

    reveal_type(x)  # revealed: "a", "b"; should be "a"
```

We now use visibility constraints to track reachability, which allows us
to model this correctly. There are two related changes as a result:

- New bindings are not assumed to be visible; they inherit the current
"scope start" visibility, which effectively means that new bindings are
visible if/when the current flow is reachable

- When simplifying visibility constraints after branching control flow,
we only simplify if none of the intervening branches included a terminal
statement. That is, earlier unaffected bindings are only _actually_
unaffected if all branches make it to the merge point.
This commit is contained in:
Douglas Creager
2025-02-05 17:47:49 -05:00
committed by GitHub
parent d296f602e7
commit 0906554357
6 changed files with 148 additions and 68 deletions

View File

@@ -124,42 +124,49 @@ def _(e: Exception | type[Exception] | None):
## Exception cause is not an exception
```py
try:
raise EOFError() from GeneratorExit # fine
except:
...
def _():
try:
raise EOFError() from GeneratorExit # fine
except:
...
try:
raise StopIteration from MemoryError() # fine
except:
...
def _():
try:
raise StopIteration from MemoryError() # fine
except:
...
try:
raise BufferError() from None # fine
except:
...
def _():
try:
raise BufferError() from None # fine
except:
...
try:
raise ZeroDivisionError from False # error: [invalid-raise]
except:
...
def _():
try:
raise ZeroDivisionError from False # error: [invalid-raise]
except:
...
try:
raise SystemExit from bool() # error: [invalid-raise]
except:
...
def _():
try:
raise SystemExit from bool() # error: [invalid-raise]
except:
...
try:
raise
except KeyboardInterrupt as e: # fine
reveal_type(e) # revealed: KeyboardInterrupt
raise LookupError from e # fine
def _():
try:
raise
except KeyboardInterrupt as e: # fine
reveal_type(e) # revealed: KeyboardInterrupt
raise LookupError from e # fine
try:
raise
except int as e: # error: [invalid-exception-caught]
reveal_type(e) # revealed: Unknown
raise KeyError from e
def _():
try:
raise
except int as e: # error: [invalid-exception-caught]
reveal_type(e) # revealed: Unknown
raise KeyError from e
def _(e: Exception | type[Exception]):
raise ModuleNotFoundError from e # fine

View File

@@ -452,6 +452,9 @@ def raise_in_both_branches(cond: bool):
# Exceptions can occur anywhere, so "before" and "raise" are valid possibilities
reveal_type(x) # revealed: Literal["before", "raise1", "raise2"]
else:
# This branch is unreachable, since all control flows in the `try` clause raise exceptions.
# As a result, this binding should never be reachable, since new bindings are visible only
# when they are reachable.
x = "unreachable"
finally:
# Exceptions can occur anywhere, so "before" and "raise" are valid possibilities
@@ -623,9 +626,9 @@ def return_from_nested_if(cond1: bool, cond2: bool):
## Statically known terminal statements
Terminal statements do not yet interact correctly with statically known bounds. In this example, we
should see that the `return` statement is always executed, and therefore that the `"b"` assignment
is not visible to the `reveal_type`.
We model reachability using the same visibility constraints that we use to model statically known
bounds. In this example, we see that the `return` statement is always executed, and therefore that
the `"b"` assignment is not visible to the `reveal_type`.
```py
def _(cond: bool):
@@ -635,6 +638,26 @@ def _(cond: bool):
if True:
return
# TODO: Literal["a"]
reveal_type(x) # revealed: Literal["a", "b"]
reveal_type(x) # revealed: Literal["a"]
```
## Bindings after a terminal statement are unreachable
Any bindings introduced after a terminal statement are unreachable, and are currently considered not
visible. We [anticipate](https://github.com/astral-sh/ruff/issues/15797) that we want to provide a
more useful analysis for code after terminal statements.
```py
def f(cond: bool) -> str:
x = "before"
if cond:
reveal_type(x) # revealed: Literal["before"]
return
x = "after-return"
# TODO: no unresolved-reference error
# error: [unresolved-reference]
reveal_type(x) # revealed: Unknown
else:
x = "else"
reveal_type(x) # revealed: Literal["else"]
```

View File

@@ -793,9 +793,30 @@ where
&mut builder.current_first_parameter_name,
&mut first_parameter_name,
);
builder.visit_body(body);
builder.current_first_parameter_name = first_parameter_name;
// TODO: Fix how we determine the public types of symbols in a
// function-like scope: https://github.com/astral-sh/ruff/issues/15777
//
// In the meantime, visit the function body, but treat the last statement
// specially if it is a return. If it is, this would cause all definitions
// in the function to be marked as non-visible with our current treatment
// of terminal statements. Since we currently model the externally visible
// definitions in a function scope as the set of bindings that are visible
// at the end of the body, we then consider this function to have no
// externally visible definitions. To get around this, we take a flow
// snapshot just before processing the return statement, and use _that_ as
// the "end-of-body" state that we resolve external references against.
if let Some((last_stmt, first_stmts)) = body.split_last() {
builder.visit_body(first_stmts);
let pre_return_state = matches!(last_stmt, ast::Stmt::Return(_))
.then(|| builder.flow_snapshot());
builder.visit_stmt(last_stmt);
if let Some(pre_return_state) = pre_return_state {
builder.flow_restore(pre_return_state);
}
}
builder.current_first_parameter_name = first_parameter_name;
builder.pop_scope()
},
);

View File

@@ -478,7 +478,6 @@ impl std::iter::FusedIterator for DeclarationsIterator<'_, '_> {}
pub(super) struct FlowSnapshot {
symbol_states: IndexVec<ScopedSymbolId, SymbolState>,
scope_start_visibility: ScopedVisibilityConstraintId,
reachable: bool,
}
#[derive(Debug)]
@@ -506,8 +505,6 @@ pub(super) struct UseDefMapBuilder<'db> {
/// Currently live bindings and declarations for each symbol.
symbol_states: IndexVec<ScopedSymbolId, SymbolState>,
reachable: bool,
}
impl Default for UseDefMapBuilder<'_> {
@@ -520,14 +517,13 @@ impl Default for UseDefMapBuilder<'_> {
bindings_by_use: IndexVec::new(),
definitions_by_definition: FxHashMap::default(),
symbol_states: IndexVec::new(),
reachable: true,
}
}
}
impl<'db> UseDefMapBuilder<'db> {
pub(super) fn mark_unreachable(&mut self) {
self.reachable = false;
self.record_visibility_constraint(ScopedVisibilityConstraintId::ALWAYS_FALSE);
}
pub(super) fn add_symbol(&mut self, symbol: ScopedSymbolId) {
@@ -544,7 +540,7 @@ impl<'db> UseDefMapBuilder<'db> {
binding,
SymbolDefinitions::Declarations(symbol_state.declarations().clone()),
);
symbol_state.record_binding(def_id);
symbol_state.record_binding(def_id, self.scope_start_visibility);
}
pub(super) fn add_constraint(&mut self, constraint: Constraint<'db>) -> ScopedConstraintId {
@@ -596,7 +592,11 @@ impl<'db> UseDefMapBuilder<'db> {
pub(super) fn simplify_visibility_constraints(&mut self, snapshot: FlowSnapshot) {
debug_assert!(self.symbol_states.len() >= snapshot.symbol_states.len());
self.scope_start_visibility = snapshot.scope_start_visibility;
// If there are any control flow paths that have become unreachable between `snapshot` and
// now, then it's not valid to simplify any visibility constraints to `snapshot`.
if self.scope_start_visibility != snapshot.scope_start_visibility {
return;
}
// Note that this loop terminates when we reach a symbol not present in the snapshot.
// This means we keep visibility constraints for all new symbols, which is intended,
@@ -632,7 +632,7 @@ impl<'db> UseDefMapBuilder<'db> {
let def_id = self.all_definitions.push(Some(definition));
let symbol_state = &mut self.symbol_states[symbol];
symbol_state.record_declaration(def_id);
symbol_state.record_binding(def_id);
symbol_state.record_binding(def_id, self.scope_start_visibility);
}
pub(super) fn record_use(&mut self, symbol: ScopedSymbolId, use_id: ScopedUseId) {
@@ -649,7 +649,6 @@ impl<'db> UseDefMapBuilder<'db> {
FlowSnapshot {
symbol_states: self.symbol_states.clone(),
scope_start_visibility: self.scope_start_visibility,
reachable: self.reachable,
}
}
@@ -672,21 +671,23 @@ impl<'db> UseDefMapBuilder<'db> {
num_symbols,
SymbolState::undefined(self.scope_start_visibility),
);
self.reachable = snapshot.reachable;
}
/// Merge the given snapshot into the current state, reflecting that we might have taken either
/// path to get here. The new state for each symbol should include definitions from both the
/// prior state and the snapshot.
pub(super) fn merge(&mut self, snapshot: FlowSnapshot) {
// Unreachable snapshots should not be merged: If the current snapshot is unreachable, it
// should be completely overwritten by the snapshot we're merging in. If the other snapshot
// is unreachable, we should return without merging.
if !snapshot.reachable {
// As an optimization, if we know statically that either of the snapshots is always
// unreachable, we can leave it out of the merged result entirely. Note that we cannot
// perform any type inference at this point, so this is largely limited to unreachability
// via terminal statements. If a flow's reachability depends on an expression in the code,
// we will include the flow in the merged result; the visibility constraints of its
// bindings will include this reachability condition, so that later during type inference,
// we can determine whether any particular binding is non-visible due to unreachability.
if snapshot.scope_start_visibility == ScopedVisibilityConstraintId::ALWAYS_FALSE {
return;
}
if !self.reachable {
if self.scope_start_visibility == ScopedVisibilityConstraintId::ALWAYS_FALSE {
self.restore(snapshot);
return;
}
@@ -712,9 +713,6 @@ impl<'db> UseDefMapBuilder<'db> {
self.scope_start_visibility = self
.visibility_constraints
.add_or_constraint(self.scope_start_visibility, snapshot.scope_start_visibility);
// Both of the snapshots are reachable, so the merged result is too.
self.reachable = true;
}
pub(super) fn finish(mut self) -> UseDefMap<'db> {

View File

@@ -237,7 +237,11 @@ impl SymbolBindings {
}
/// Record a newly-encountered binding for this symbol.
pub(super) fn record_binding(&mut self, binding_id: ScopedDefinitionId) {
pub(super) fn record_binding(
&mut self,
binding_id: ScopedDefinitionId,
visibility_constraint: ScopedVisibilityConstraintId,
) {
// The new binding replaces all previous live bindings in this path, and has no
// constraints.
self.live_bindings = Bindings::with(binding_id.into());
@@ -245,8 +249,7 @@ impl SymbolBindings {
self.constraints.push(Constraints::default());
self.visibility_constraints = VisibilityConstraintPerBinding::with_capacity(1);
self.visibility_constraints
.push(ScopedVisibilityConstraintId::ALWAYS_TRUE);
self.visibility_constraints.push(visibility_constraint);
}
/// Add given constraint to all live bindings.
@@ -349,9 +352,14 @@ impl SymbolState {
}
/// Record a newly-encountered binding for this symbol.
pub(super) fn record_binding(&mut self, binding_id: ScopedDefinitionId) {
pub(super) fn record_binding(
&mut self,
binding_id: ScopedDefinitionId,
visibility_constraint: ScopedVisibilityConstraintId,
) {
debug_assert_ne!(binding_id, ScopedDefinitionId::UNBOUND);
self.bindings.record_binding(binding_id);
self.bindings
.record_binding(binding_id, visibility_constraint);
}
/// Add given constraint to all live bindings.
@@ -557,7 +565,10 @@ mod tests {
#[test]
fn with() {
let mut sym = SymbolState::undefined(ScopedVisibilityConstraintId::ALWAYS_TRUE);
sym.record_binding(ScopedDefinitionId::from_u32(1));
sym.record_binding(
ScopedDefinitionId::from_u32(1),
ScopedVisibilityConstraintId::ALWAYS_TRUE,
);
assert_bindings(&sym, &["1<>"]);
}
@@ -565,7 +576,10 @@ mod tests {
#[test]
fn record_constraint() {
let mut sym = SymbolState::undefined(ScopedVisibilityConstraintId::ALWAYS_TRUE);
sym.record_binding(ScopedDefinitionId::from_u32(1));
sym.record_binding(
ScopedDefinitionId::from_u32(1),
ScopedVisibilityConstraintId::ALWAYS_TRUE,
);
sym.record_constraint(ScopedConstraintId::from_u32(0));
assert_bindings(&sym, &["1<0>"]);
@@ -577,11 +591,17 @@ mod tests {
// merging the same definition with the same constraint keeps the constraint
let mut sym1a = SymbolState::undefined(ScopedVisibilityConstraintId::ALWAYS_TRUE);
sym1a.record_binding(ScopedDefinitionId::from_u32(1));
sym1a.record_binding(
ScopedDefinitionId::from_u32(1),
ScopedVisibilityConstraintId::ALWAYS_TRUE,
);
sym1a.record_constraint(ScopedConstraintId::from_u32(0));
let mut sym1b = SymbolState::undefined(ScopedVisibilityConstraintId::ALWAYS_TRUE);
sym1b.record_binding(ScopedDefinitionId::from_u32(1));
sym1b.record_binding(
ScopedDefinitionId::from_u32(1),
ScopedVisibilityConstraintId::ALWAYS_TRUE,
);
sym1b.record_constraint(ScopedConstraintId::from_u32(0));
sym1a.merge(sym1b, &mut visibility_constraints);
@@ -590,11 +610,17 @@ mod tests {
// merging the same definition with differing constraints drops all constraints
let mut sym2a = SymbolState::undefined(ScopedVisibilityConstraintId::ALWAYS_TRUE);
sym2a.record_binding(ScopedDefinitionId::from_u32(2));
sym2a.record_binding(
ScopedDefinitionId::from_u32(2),
ScopedVisibilityConstraintId::ALWAYS_TRUE,
);
sym2a.record_constraint(ScopedConstraintId::from_u32(1));
let mut sym1b = SymbolState::undefined(ScopedVisibilityConstraintId::ALWAYS_TRUE);
sym1b.record_binding(ScopedDefinitionId::from_u32(2));
sym1b.record_binding(
ScopedDefinitionId::from_u32(2),
ScopedVisibilityConstraintId::ALWAYS_TRUE,
);
sym1b.record_constraint(ScopedConstraintId::from_u32(2));
sym2a.merge(sym1b, &mut visibility_constraints);
@@ -603,7 +629,10 @@ mod tests {
// merging a constrained definition with unbound keeps both
let mut sym3a = SymbolState::undefined(ScopedVisibilityConstraintId::ALWAYS_TRUE);
sym3a.record_binding(ScopedDefinitionId::from_u32(3));
sym3a.record_binding(
ScopedDefinitionId::from_u32(3),
ScopedVisibilityConstraintId::ALWAYS_TRUE,
);
sym3a.record_constraint(ScopedConstraintId::from_u32(3));
let sym2b = SymbolState::undefined(ScopedVisibilityConstraintId::ALWAYS_TRUE);