
PLAN: code-health + tech-debt audit for OSS release

Read-only inventory of the sqlink codebase before public release. Scope: core/ cli/ host/ sqlink-httpd/ sqlite-lib/ sqlite-cas-cache/ sqlite-pcache-tvm/ sqlite-mem-tvm/ sqlite-vfs-tvm/ sqlite-embed/ plus top-level extensions/ Rust. Excludes target/, generated WIT bindings, and extensions/_shared-target/.

Audit date: 2026-06-22, branch main at cc6924f (PLAN reorg). All findings are file:line citations — no code changes proposed inline.


Headline finding (the one thing)

host/src/lib.rs is 10,189 lines. It owns ~32% of all Rust LOC in the project, holds 91 of the 252 .unwrap()/.expect()/panic!() sites (36%), 125 of 271 .clone()s (46%), and contains 5 of the 10 longest functions in the codebase. Every other quality finding in this report is dominated by this one file. Splitting it into ~8 sub-modules (the WIT-world impl blocks already form natural seams — see §5) is the single highest-leverage change in this audit. Size L (3-5 days). Impact: makes every subsequent refactor in this file ~3x cheaper; opens the door to extracting vtab.rs-style siblings for sessions / spi / dispatch / blob-cache.


One-page summary — top refactors by leverage

| # | Refactor | Size | Impact |
|---|---|---|---|
| 1 | Split host/src/lib.rs (10k LOC) into per-world impl files | L | Opens the path to every other host refactor; internal but unblocks reviewers |
| 2 | Add CI jobs for cli, sqlink-httpd, sqlite-lib, and the wasm32-wasip2 build | M | Every user; currently CI only tests host + cache + compose |
| 3 | Replace 32× expect("ensured open") with a with_spi_conn(\|c\| ...) helper (like the L2a with_user_conn shape we just shipped) | S | Eliminates an entire class of panic-on-invariant in host's hot path; internal cleanup |
| 4 | Sweep top 10 longest functions in host/src/lib.rs + cli/src/lib.rs; they range from roughly 90 to 630 lines and most have clean internal seams | M | Internal; review-quality boost |
| 5 | Audit + consolidate the 303 .map_err(\|e\|...) boilerplate sites with a From<E> impl or thiserror crate-wide error types | M | Internal; cleanup |
| 6 | extensions/ (229 crates × ~3 file Cargo.toml): feature audit + Cargo.lock cleanup pass | M | OSS users see crate count; relevant for crates.io publish story |
| 7 | cli's do_load (378 lines) + do_cache (248) + do_compose (104) need extraction; each is a state machine masquerading as a fn | M | Internal review-quality |
| 8 | sqlink-httpd, sqlite-lib, sqlite-embed, cli, core: zero CI test coverage today (see §7) | M | Every user |
| 9 | Workspace dep deduplication: serde, serde_json, tokio declared 4-5× with subtly different version strings ("1" vs "1.0") | S | Build hygiene |
| 10 | Top-level host/Cargo.toml is 121 lines with 89 embed-* features; extract feature wiring into a file generated from the survey DB | S | Build hygiene |

§1 — TODO/FIXME/XXX/HACK census

Real action items (the only one):

  • host/src/lib.rs:4128: // TODO: gate by policy.fs once a filesystem capability lands. Relevant for OSS. Should be dispositioned: ship the filesystem capability OR drop the path that this comment guards. Needs a one-paragraph plan; track as a follow-up.

False positives (XXX appearing as a literal in extension data strings, not markers):

  • extensions/ssn/src/lib.rs:89 + src/embed.rs:69: XXX-XX-{last4} is the literal SSN-mask format string.
  • extensions/bic/src/embed.rs:30,87,92, extensions/bic/src/lib.rs:53,169,174: "XXX" is the BIC branch code for "primary office" per ISO 9362.
  • extensions/core-dotcmd/src/lib.rs:156: "Run Time: real X.XXX" is a .timer output format string.
  • cli/src/settings.rs:61: same Run Time: real X.XXX string in .timer docs.

Verdict: the codebase is remarkably clean on this front, with 1 real TODO and 10 false-positive grep hits.


§2 — Panic audit

Distribution of .unwrap() / .expect(...) / panic!() (excluding test files, excluding target/):

| Crate | Count | Note |
|---|---|---|
| host | 160 | by far the worst; 91 in lib.rs alone |
| core | 64 | mostly test-helper paths inside db.rs; 4 real panic!() |
| sqlite-vfs-tvm | 12 | almost entirely test-only |
| sqlite-pcache-tvm | 9 | mostly test-only |
| sqlite-lib | 4 | needs inspection |
| sqlink-httpd | 3 | low |
| cli | 0 | clean (already returns Strings from every dot-cmd) |
| sqlite-cas-cache, sqlite-mem-tvm, sqlite-embed | 0 | clean |

Worst offenders — top 20 in host/src/lib.rs:

| Line | Pattern | Verdict |
|---|---|---|
| 2907, 2937, 2961, 2977, 2988, 3026, 3049, 3060, 3080, 3096, 3111, 3277 | r.as_ref().expect("ensured open") | (a) invariant: shared_spi_ensure_open(self.host)? is called immediately before. Safe but verbose. Worth a closure helper (see summary item 3). |
| 4474 | self.entries.remove(pos).unwrap() | (a) invariant: pos came from a same-thread iter().position() 4 lines up. Safe. |
| 4814 | g.as_ref().map(\|(_, c)\| c).expect("just-opened") | (a) invariant: the L2a with_user_conn just opened the connection. Safe. |
| 5811, 5921, 5937, 5946, 5958, 5968 | guard.as_mut().unwrap() | (a) invariant: stateful_locked returns Result on miss; the Option-unwrap inside is post-?. Worth .as_mut().ok_or_else(...) for clarity. |
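
A minimal sketch of the closure-helper shape suggested for the expect("ensured open") sites. Everything here is an assumption for illustration: SpiConn, Host, HostError and ensure_open are hypothetical stand-ins (the real shared_spi_ensure_open signature and the slot's actual lock type are not shown in this audit). The point is only that the Option is unwrapped in one place and surfaces as an error instead of a panic.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for the real SPI connection type.
#[derive(Debug)]
pub struct SpiConn {
    pub queries_run: u32,
}

/// Hypothetical error type; the real code appears to use anyhow.
#[derive(Debug, PartialEq)]
pub enum HostError {
    NotOpen,
}

/// Hypothetical stand-in for the host state that owns the shared slot.
pub struct Host {
    spi: RefCell<Option<SpiConn>>,
}

impl Host {
    pub fn new() -> Self {
        Host { spi: RefCell::new(None) }
    }

    /// Plays the role of shared_spi_ensure_open: opens the connection lazily.
    fn ensure_open(&self) -> Result<(), HostError> {
        let mut slot = self.spi.borrow_mut();
        if slot.is_none() {
            *slot = Some(SpiConn { queries_run: 0 });
        }
        Ok(())
    }

    /// Ensures the connection is open, then hands it to `f`.
    /// Call sites never touch the Option, so no per-site expect() is needed.
    pub fn with_spi_conn<T>(
        &self,
        f: impl FnOnce(&mut SpiConn) -> Result<T, HostError>,
    ) -> Result<T, HostError> {
        self.ensure_open()?;
        let mut slot = self.spi.borrow_mut();
        // A broken invariant becomes an Err, not a panic.
        let conn = slot.as_mut().ok_or(HostError::NotOpen)?;
        f(conn)
    }
}

/// Example call site: two operations, zero expect() calls.
pub fn run_two_queries(host: &Host) -> Result<u32, HostError> {
    host.with_spi_conn(|c| {
        c.queries_run += 1;
        Ok(())
    })?;
    host.with_spi_conn(|c| {
        c.queries_run += 1;
        Ok(c.queries_run)
    })
}
```

The same ok_or pattern covers the guard.as_mut().unwrap() sites: the invariant still holds, but a violation is reported through the existing Result path.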

panic!() sites: 10 total in non-test files (7 of them sit inside #[test] or #[cfg(test)] code):

| File:Line | Comment | Verdict |
|---|---|---|
| core/src/db.rs:1903 | _ => panic!("expected row") | inside #[test]; safe |
| core/src/db.rs:2075,2093,2129 | let Value::Integer(n) = ... else { panic!() } | inside #[test]; safe |
| host/src/compose_provider.rs:487,511,528 | _ => panic!() inside convert_* value variants | (b) potential bug: these should be Err(InvalidVariant) rather than panic. Reachable via malformed WIT input. Needs deeper look. |
| host/src/cache.rs:338 | StepResult::Done => panic!("no row") | inside #[cfg(test)]; safe |
| sqlite-pcache-tvm/src/cache.rs:621,625 | panic!("page 1 should be in shadow ...") | inside #[test]; safe |

Bottom line: ~95% of panic-sites in this codebase are invariant-driven and not real bugs. The 3 compose_provider.rs sites are the only real flags; the rest of the cleanup is ergonomic (shorter, intention-revealing helpers vs raw expect).
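
The panic-to-Result change flagged for compose_provider.rs could look roughly like the sketch below. WireValue, HostValue, ConvertError and convert_value are hypothetical names; the actual convert_* functions and WIT-generated types are not reproduced in this audit, so this only illustrates the shape of the fix.

```rust
use std::fmt;

/// Hypothetical wire-level value, standing in for a WIT-generated variant.
#[derive(Debug, Clone, PartialEq)]
pub enum WireValue {
    Integer(i64),
    Text(String),
    Blob(Vec<u8>),
}

/// Hypothetical host-side value that supports only a subset of variants.
#[derive(Debug, PartialEq)]
pub enum HostValue {
    Integer(i64),
    Text(String),
}

/// Hypothetical error carrying the name of the rejected variant.
#[derive(Debug, PartialEq)]
pub enum ConvertError {
    InvalidVariant(&'static str),
}

impl fmt::Display for ConvertError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConvertError::InvalidVariant(name) => {
                write!(f, "unsupported value variant: {name}")
            }
        }
    }
}

impl std::error::Error for ConvertError {}

pub fn convert_value(v: WireValue) -> Result<HostValue, ConvertError> {
    match v {
        WireValue::Integer(n) => Ok(HostValue::Integer(n)),
        WireValue::Text(s) => Ok(HostValue::Text(s)),
        // Where a `_ => panic!()` arm would sit: unsupported input
        // now surfaces as an error the caller can report.
        WireValue::Blob(_) => Err(ConvertError::InvalidVariant("blob")),
    }
}
```

Naming the rejected variant explicitly instead of using a `_` arm has a side benefit: if a new variant is added later, the match stops compiling until someone decides how to handle it.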


§3 — Dead code

Native build (cargo build --release): 0 dead-code warnings.

Wasm build (cargo build -p sqlite-cli --target wasm32-wasip2 --release):

| Symbol | Location | Verdict |
|---|---|---|
| use bindings::sqlite::extension::types::SqlValue | cli/src/lib.rs:783 | delete: local import, never referenced. Likely a leftover from the Stage 5 cleanup. |
| fn log_event | cli/src/lib.rs:910 | delete: function defined, never called. Likely a leftover from .log migration. |
| Err(String) variant | cli/src/dot.rs:39 (FetchResult) | needs deeper look: possibly used only via Display formatting; check before deleting. |
| priority: i64 | cli/src/sqlink_registry.rs:41 (ResolverRow) | needs deeper look: could be a column we read from SQL but never bind in code. If so, swap for _priority or drop the field. |

#[allow(dead_code)] annotations — 3 sites:

| Location | Verdict |
|---|---|
| host/src/lib.rs:5248 | needs inspection (single line, no surrounding fn shown in summary) |
| sqlite-lib/src/lib.rs:790 (pub fn _touch) | intentional: leading underscore marks it as a type-checker pin |
| sqlite-pcache-tvm/src/cache.rs:143 | needs inspection |

§4 — Code-smell patterns

Functions > 100 lines

host/src/lib.rs top 10:

| Lines | Start | Function | Notes |
|---|---|---|---|
| 630 | 2228 | unsafe fn register_host_embedded_extensions | auto-generated dispatch over 89 embed-* features; leave alone (mechanical, code-genny shape) |
| 326 | 747 | fn manifest_for_ext | 4-deep nested transform of WIT manifests into bindings. Split candidate. |
| 258 | 4173 | fn refresh_call_budget | needs deeper look |
| 191 | 5580 | async fn register_component | split into validate / instantiate / store phases |
| 132 | 5865 | pub async fn dispatch_scalar | already has 3-engine Store routing; manageable |
| 132 | 4640 | pub fn new (Host::new) | long constructor, split candidate |
| 97 | 1232 | async fn resolve | fine |
| 94 | 5771 | pub async fn dispatch_dot_command | fine |
| 90 | 1124 | async fn handle (http) | fine |
| 88 | 1558 | unsafe fn register_host_dot_command_function | fine (FFI shape) |

cli/src/lib.rs longest functions:

| Lines | Function | Notes |
|---|---|---|
| 378 | do_load | state machine: pre-flight / trust gate / TOFU / register-N-things / format. Split into stages. |
| 248 | do_cache | multi-subcommand dispatch (stats / gc / verify); each subcommand is a function in disguise |
| 212 | embed_core_dotcmd | startup auto-embed; mostly include_bytes! glue, fine as-is |
| 172 | eval_input | main dispatch loop; touching this is risky. Leave for last. |
| 146 | build_cli_state_snapshot | splits cleanly by namespace (general / params / conn) |
| 135 | run | the main loop; split candidate |
| 104 | do_compose | manageable |
| 90 | is_statement_complete | sqlite3_complete replacement; intentional shape |
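
As an illustration of the "each subcommand is a function in disguise" point, here is one possible extraction shape for do_cache. It is a sketch under assumptions: the CacheCmd enum, the parse step, the `.cache` command name and the placeholder bodies are all hypothetical. Only the stats / gc / verify subcommand names and the String return convention come from this audit.

```rust
/// Hypothetical parsed form of the cache dot-command's arguments.
#[derive(Debug, PartialEq)]
pub enum CacheCmd {
    Stats,
    Gc,
    Verify,
}

/// Parsing is separated from execution so it can be unit-tested alone.
pub fn parse_cache_cmd(args: &[&str]) -> Result<CacheCmd, String> {
    match args {
        ["stats"] => Ok(CacheCmd::Stats),
        ["gc"] => Ok(CacheCmd::Gc),
        ["verify"] => Ok(CacheCmd::Verify),
        [] => Err("usage: .cache stats|gc|verify".to_string()),
        other => Err(format!("unknown cache subcommand: {}", other.join(" "))),
    }
}

// One small function per subcommand. The bodies are placeholders;
// the real logic would move here unchanged from do_cache.
fn cache_stats() -> Result<String, String> {
    Ok("cache stats (placeholder)".to_string())
}

fn cache_gc() -> Result<String, String> {
    Ok("cache gc (placeholder)".to_string())
}

fn cache_verify() -> Result<String, String> {
    Ok("cache verify (placeholder)".to_string())
}

/// After extraction, do_cache is only parse + dispatch.
pub fn do_cache(args: &[&str]) -> Result<String, String> {
    match parse_cache_cmd(args)? {
        CacheCmd::Stats => cache_stats(),
        CacheCmd::Gc => cache_gc(),
        CacheCmd::Verify => cache_verify(),
    }
}
```

The same parse-then-dispatch split would apply to do_load's stages, with each stage taking the previous stage's output as input.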

Files > 1500 lines

| LOC | File | Verdict |
|---|---|---|
| 10,189 | host/src/lib.rs | Split: the headline finding |
| 2,833 | cli/src/lib.rs | Extract dot-cmd fn do_* helpers into cli/src/dotcmds/*.rs |
| 2,201 | core/src/db.rs | Could split into connection.rs / stmt.rs / vfs.rs / aggregate.rs |
| 1,544 | host/src/vtab.rs | Already a sub-module of host; could split sqlite3_module trampolines from registry |
| 1,445 | sqlite-embed/src/lib.rs | Tight, mostly typed wrappers; leave |

Boxed closures

Only 2 sites total — core/src/db.rs:287,551 for the set_stmt_trace trampoline. Not hot paths. Leave alone.

String::from(format!())

Zero occurrences in scope. Clean.

.map_err(|e| ...) boilerplate

303 occurrences across the codebase. Top files:

| Count | File |
|---|---|
| 132 | host/src/lib.rs (anyhow conversion mostly) |
| 22 | host/src/main.rs |
| 19 | host/src/component_blob_cache.rs |
| 6 | sqlink-httpd/src/tls.rs |
| 3 | sqlink-httpd/src/main.rs |
| 2 | sqlink-httpd/src/wasm.rs |
| 1 | host/src/cache.rs |

Most are .map_err(|e| anyhow!("...: {e}")). A crate-level thiserror-style error type with #[from] impls would eliminate ~80% of these mechanically. Size M (1-3 days).
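
A minimal sketch of what the From-based consolidation buys, written against std only so it stands alone. HostError and its variants are hypothetical. The thiserror crate's #[from] attribute would generate the From impls (and its #[error] attribute the Display impl) that are written by hand here.

```rust
use std::fmt;
use std::fs;
use std::io;
use std::num::ParseIntError;

/// Hypothetical crate-level error; variant names are illustrative.
#[derive(Debug)]
pub enum HostError {
    Io(io::Error),
    BadNumber(ParseIntError),
}

impl fmt::Display for HostError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            HostError::Io(e) => write!(f, "io error: {e}"),
            HostError::BadNumber(e) => write!(f, "bad number: {e}"),
        }
    }
}

impl std::error::Error for HostError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            HostError::Io(e) => Some(e),
            HostError::BadNumber(e) => Some(e),
        }
    }
}

impl From<io::Error> for HostError {
    fn from(e: io::Error) -> Self {
        HostError::Io(e)
    }
}

impl From<ParseIntError> for HostError {
    fn from(e: ParseIntError) -> Self {
        HostError::BadNumber(e)
    }
}

/// Parses a page count. The bare `?` converts ParseIntError through
/// the From impl above; no .map_err closure is needed.
pub fn parse_page_count(raw: &str) -> Result<u32, HostError> {
    Ok(raw.trim().parse::<u32>()?)
}

/// Reads a file holding a single integer; io::Error converts the same way.
pub fn read_page_count(path: &str) -> Result<u32, HostError> {
    let raw = fs::read_to_string(path)?;
    parse_page_count(&raw)
}
```

One caveat: many of the existing sites attach a context string ("...: {e}"), which a plain From conversion drops. Sites whose message matters would need their own variant or an explicit context wrapper, which is probably where the remaining ~20% comes from.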

.clone() heatmap

| Count | File |
|---|---|
| 125 | host/src/lib.rs |
| 18 | cli/src/lib.rs |
| 12 | sqlite-lib/src/lib.rs |
| 12 | sqlink-httpd/src/main.rs |
| 11 | host/tests/load.rs (test code) |
| 10 | host/src/main.rs |
| 7 | sqlink-httpd/src/router.rs |
| 7 | host/src/compose_provider.rs |
| 6 | cli/src/orchestration.rs |
| 5 | sqlink-httpd/src/wasm.rs |

The 125 in host/src/lib.rs are mostly name.clone(), ext_name.clone(), path.clone() — strings cloned to pass into tokio::spawn or to use after a borrow. Could be reduced ~30% with Arc<str> for the most-cloned identifiers (extension name, db_path). Size M; perf gain is small (these aren't hot paths) but the readability gain is real.
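
To make the Arc<str> suggestion concrete, a small sketch follows. It uses std::thread::spawn instead of tokio::spawn so that it runs without external crates; the ownership pattern is the same. spawn_workers and the "uuid" name are hypothetical.

```rust
use std::sync::Arc;
use std::thread;

/// Hands the same extension name to `n` spawned tasks.
/// With a String, each task would need its own heap copy; with Arc<str>,
/// Arc::clone only bumps a reference count.
pub fn spawn_workers(ext_name: Arc<str>, n: usize) -> Vec<String> {
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let name = Arc::clone(&ext_name); // no string data is copied
            thread::spawn(move || format!("{name}#{i}"))
        })
        .collect();

    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```

A caller would build the identifier once, for example `let name: Arc<str> = Arc::from("uuid");`, and pass `Arc::clone(&name)` wherever a `name.clone()` on a String sits today. The call sites look almost identical, which is why the gain is mostly in signalling intent rather than in performance.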


§5 — Module bloat — where host/src/lib.rs could split

The file already has natural seams via impl blocks. Suggested split:

| Proposed file | Contains | LOC estimate |
|---|---|---|
| host/src/policy.rs (exists, expand) | LoadedState::http, dns impls | ~300 |
| host/src/spi_impl.rs | loaded::sqlite::extension::spi::Host, bindings::...::spi::Host, with_user_conn + helpers, the 32 expect("ensured open") sites | ~1500 |
| host/src/aggregate.rs | HostLoadedAggregate + dispatch_aggregate_* impls | ~600 |
| host/src/dot_dispatch.rs | dispatch_dot_command + sync_dispatch_dot_command + the dot_command() SQL fn registration | ~400 |
| host/src/session_impl.rs | session::Host impl | ~250 |
| host/src/component_load.rs | register_component, load_extension_from_*, resolve_uri_to_bytes | ~1200 |
| host/src/manifest.rs | manifest_for_ext + WIT manifest conversions | ~500 |
| host/src/dispatch.rs | dispatch_scalar / aggregate_* / collation / authorize / on_update / on_commit / on_rollback / vtab_* | ~2500 |
| host/src/lib.rs (residue) | Host struct, Host::new, bindings! macro, world declarations, top-level entry points | ~1500 |

That's a deliberate ~5-day refactor that the existing test suite should cover, but it touches every line in the file. Best done as one PR with reviewer pre-approval; don't sprinkle across many small commits.
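
The mechanics that make this split cheap in Rust are sketched below: an impl block for Host can live in a child module, and a pub use in lib.rs keeps existing import paths stable. Inline modules stand in for the separate files so the sketch fits in one block; the function bodies and the `name` field are placeholders, not the real implementations.

```rust
// In the real tree each `mod` below would be its own file
// (host/src/manifest.rs, host/src/dispatch.rs, ...).

mod manifest {
    /// Placeholder body standing in for the real manifest_for_ext.
    pub fn manifest_for_ext(name: &str) -> String {
        format!("manifest:{name}")
    }
}

mod dispatch {
    use super::Host;

    // A second impl block for Host, outside the module that defines it.
    // Child modules can still read Host's private fields.
    impl Host {
        pub fn dispatch_scalar(&self, func: &str) -> String {
            format!("{}::{func}", self.name)
        }
    }
}

/// The residue that stays in lib.rs: the struct and its constructor.
pub struct Host {
    name: String,
}

impl Host {
    pub fn new(name: &str) -> Self {
        Host { name: name.to_string() }
    }
}

// Re-export so callers that imported this from the crate root
// do not have to change their paths.
pub use manifest::manifest_for_ext;
```

Because private fields stay visible to child modules, moving an impl block into a new file should not require widening any visibility, which keeps the diff close to a pure move.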


§6 — Cargo.toml audit

Workspace declared deps with subtle drift

| Dep | Variants seen |
|---|---|
| tokio | tokio = { version = "1", features = [...] } (2 different feature lists, host vs httpd) |
| serde | version = "1.0" (host) vs version = "1" (httpd, handlers) |
| serde_json | "1.0" (host) vs "1" (4 other places) |

Fix: declare the shared deps once in a [workspace.dependencies] table in the root Cargo.toml and have every member reference them with { workspace = true }. Size S.

host/Cargo.toml is 121 lines, 89 embed-* features

All 89 are wired into register_host_embedded_extensions (verified by grep — 0 unused). The feature list could be auto-generated from the survey DB (provenance/extensions.db) instead of hand-maintained, but that's a chore not a blocker. Size S, low priority.

Workspace member list

Cargo.toml lists 20 workspace members. 9 extensions are workspace members; the other 220 are standalone with their own [workspace] block. Mixing both modes works but means cargo build --workspace from the root only builds the listed ones. Probably right (the 220 standalone are released individually) but worth documenting in CONTRIBUTING.

Per-crate dep audit (would each crate still compile with its unused deps removed?)

Not done in this pass; it needs cargo-machete or cargo-udeps. Note as a follow-up; estimate ~1-2 unused deps per crate × 10 crates = 10-20 trims, size S total.


§7 — Bonus findings

Test coverage gap

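| Crate | Integration tests | Inline #[test] files | Tested in CI? |
|---|---|---|---|
| host | 12 | 3 | |
| sqlite-cas-cache | 4 | 0 | ✓ (cache-tests job) |
| sqlite-pcache-tvm | 1 | 1 | |
| sqlite-mem-tvm | 1 | 1 | |
| sqlite-vfs-tvm | 1 | 2 | |
| core | 0 | 1 | |
| cli | 0 | 0 | |
| sqlink-httpd | 0 | 0 | |
| sqlite-lib | 0 | 0 | |
| sqlite-embed | 0 | 0 | |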

CI runs only 3 jobs (host-checks, cache-tests, compose-tests). No CI job builds the wasm32-wasip2 cli or any extension. Adding:

  • A wasm32-wasip2-build job that does cargo build -p sqlite-cli --target wasm32-wasip2 --release and verifies the component-encoding step (wasm-tools component new)
  • A cli-smokes job that runs the existing examples/sqlite-utils-tour.sql through the built cli
  • A sqlite-lib-tests job (lib smoke + compose test)

…would cover the major surface gaps. Size M (1-2 days).

Recently surfaced issues from this session worth re-checking

  • host/src/lib.rs line 4128 TODO about filesystem capability gating.
  • 3 panic!() sites in host/src/compose_provider.rs (lines 487, 511, 528) — see §2.
  • Err(String) variant in cli/src/dot.rs:39 and priority: i64 in cli/src/sqlink_registry.rs:41 — see §3.

Not investigated this pass (would each take >30min)

  • Cargo [features] cross-graph audit (which features force which deps, and are any cycles introduced)
  • Async borrow patterns (Arc<RwLock> vs Arc<Mutex> vs parking_lot choices) in host — there are at least 3 styles co-existing
  • core::db::Connection lifetime / Sync+Send story — the ReentrantMutex<RefCell<Option<Connection>>> is novel; deserves a design-note callout in host/SPI.md
  • License/attribution audit for embedded SQLite code under deps/
  • Auditing the 229 extensions for anything sensitive (URLs to private services, etc.) — sister forks have touched this for tegmentum but a fresh pass is worth it

Sequencing recommendation

A reasonable order to ship these (each step independently verifiable, each lands as its own PR):

  1. §3: dead-code cleanup (a few minutes; just delete the 2-4 unused things, gives a clean baseline)
  2. §6: workspace dep deduplication (size S)
  3. §7: CI expansion, adding the 3 missing jobs (size M, foundational; lets every subsequent change get tested)
  4. §1: disposition the line 4128 TODO (decide ship-vs-drop)
  5. §2: the 3 compose_provider.rs panics → Result (size S, real bug fix)
  6. §2: with_spi_conn helper, summary item 3 (size S, eliminates 32 expects)
  7. §5: host/src/lib.rs split (size L, the headline finding)
  8. §4: top 5 long functions in cli + host (size M)
  9. §4: .map_err consolidation via thiserror (size M, ergonomic)
  10. §6: host embed-* feature generator (size S, optional)

Items 1-3 are the launch blockers; everything else is post-OSS-launch polish.