
SnackOnAI Engineering | Senior AI Systems Researcher | Technical Deep Dive | May 18, 2026

There is a documented gap between what AI-assisted developers can produce and what they understand. The 2025 paper "Comprehension-Performance Gap in GenAI-Assisted Brownfield Programming" (arXiv:2511.02922) measured this precisely in professional software teams: developers using LLM tools completed tasks at higher rates but showed measurably lower comprehension of what they had built. A related 2024 study on cognitive forcing strategies in programming education found that applying these strategies reduced conceptual errors by 45% in AI-assisted workflows. The performance is real. The comprehension is eroding.

RepForge (MIT, Rust 1.93+, Tauri, 152 commits) is a local-first desktop companion that addresses this gap directly. The architecture is simple enough to describe in one sentence: RepForge watches your coding session in real time, sends the session content to an LLM for concept extraction, generates a spaced repetition review card from the extracted concepts, and schedules the review using SM-2 or FSRS so you see it again at the moment you are most likely to forget it.

The system is not a learning management platform. It does not gamify your workflow or interrupt your coding. It runs in the background, accumulates a card deck from your actual sessions, and surfaces short review challenges at scheduled intervals. The entire state lives in a local SQLite database. Nothing leaves your machine except the LLM API call for concept extraction.

This issue dissects RepForge from a systems engineering perspective: what the session watcher monitors, how the concept extraction prompt works, what SM-2 vs FSRS means for scheduling correctness, how the Tauri architecture separates the Rust backend from the TypeScript frontend, and what the research papers say about why this intervention is likely to work.

Scope: RepForge architecture (mzkrasner/repforge), session watcher, concept extraction pipeline, SQLite card store, SM-2/FSRS scheduler, and the supporting research. Not covered: the specific LLM API integration beyond the general pattern, or enterprise deployment scenarios.

What It Actually Does

RepForge is a Tauri desktop application with a Rust backend and TypeScript/React frontend. It runs locally on macOS, Windows, and Linux (Rust 1.93+, Node.js 18+). MIT licensed.

Core pipeline (four steps, all automated):

| Step | What Happens | Where It Runs |
|---|---|---|
| Session Watch | File system events + process monitoring capture coding activity | Rust backend |
| Concept Extract | LLM API call: "what CS concepts does this session involve?" | Rust backend → LLM API |
| Card Generate | Concept + context → review challenge (question + expected answer) | Rust backend |
| Schedule Review | SM-2 or FSRS assigns next review date based on prior responses | Rust backend → SQLite |

The user sees: a notification that a new card was generated after a coding session, and scheduled review prompts at the intervals the scheduler determines. The review challenge is short (under 2 minutes). The answer is self-graded (did I know this? yes/no/partial). The scheduler updates the card's interval based on the response.

The database schema is transparent and local. The card state (concept text, question, answer, ease factor, interval, due date) lives in SQLite and is readable with any SQLite client. No account required. No sync server.
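To make the "readable with any SQLite client" point concrete, here is a minimal sketch of what such a table could look like. The schema string is an assumption inferred from the fields listed above, not RepForge's actual DDL, and `is_due` is a hypothetical helper. It also shows why a text `due_date` is enough: zero-padded `YYYY-MM-DD` dates sort chronologically under plain string comparison.

```rust
/// Hypothetical table layout inferred from the card fields named in the text;
/// RepForge's real schema may differ.
pub const ASSUMED_CARDS_SCHEMA: &str = "
CREATE TABLE IF NOT EXISTS cards (
    id            INTEGER PRIMARY KEY,
    concept       TEXT NOT NULL,
    question      TEXT NOT NULL,
    answer        TEXT NOT NULL,
    ease_factor   REAL NOT NULL DEFAULT 2.5,
    interval_days INTEGER NOT NULL DEFAULT 1,
    repetitions   INTEGER NOT NULL DEFAULT 0,
    due_date      TEXT NOT NULL  -- ISO 8601 date, e.g. 2026-05-19
);";

/// A card is due when its date is on or before today.
/// Zero-padded YYYY-MM-DD strings compare correctly as plain text.
pub fn is_due(due_date: &str, today: &str) -> bool {
    due_date <= today
}
```

The same comparison is what a `WHERE due_date <= ?` clause relies on, so the date format has to stay zero-padded and year-first for the query to behave.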

The Architecture


Focus on the Tauri IPC boundary. The Rust backend owns all data and all LLM calls. The TypeScript frontend is a pure display layer with no direct database or API access. This separation means the learning data is safe even if the frontend has a bug, and the scheduler logic is testable without UI involvement.
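The ownership split can be sketched in plain Rust without pulling in Tauri. In the sketch below, the `Request` and `Response` enums stand in for the IPC messages (in a real Tauri app these would be `#[tauri::command]` functions), and every name is hypothetical. The point is only that the backend is the single owner of card state and can be exercised without a UI.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct CardView {
    pub id: i64,
    pub question: String,
}

/// Everything the frontend is allowed to ask for (hypothetical message set).
#[derive(Debug)]
pub enum Request {
    DueCards,
    Grade { card_id: i64, knew_it: bool },
}

#[derive(Debug, PartialEq)]
pub enum Response {
    Cards(Vec<CardView>),
    Graded { card_id: i64 },
    Error(String),
}

/// The backend is the only owner of card state; the frontend never touches it.
pub struct Backend {
    due: Vec<CardView>,
}

impl Backend {
    pub fn new(due: Vec<CardView>) -> Self {
        Backend { due }
    }

    /// Single entry point, so backend logic can be tested without any UI.
    pub fn handle(&mut self, req: Request) -> Response {
        match req {
            Request::DueCards => Response::Cards(self.due.clone()),
            Request::Grade { card_id, knew_it: _ } => {
                match self.due.iter().position(|c| c.id == card_id) {
                    Some(idx) => {
                        // A real backend would run the scheduler and persist here.
                        self.due.remove(idx);
                        Response::Graded { card_id }
                    }
                    None => Response::Error(format!("card {card_id} is not due")),
                }
            }
        }
    }
}
```

Because the frontend can only send one of the enumerated requests, a rendering bug cannot corrupt the deck; the worst it can do is send a request the backend rejects.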

The Code, Annotated

Snippet One: Session Watcher and Concept Extraction (Rust)

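// src-tauri/src/session_watcher.rs
// Source: mzkrasner/repforge (reconstructed from architecture + README)
// Note: `notify::watcher(tx, delay)` is the debounced API from notify 4.x;
// newer releases of the crate expose a different API.

use notify::{watcher, RecursiveMode, Watcher};
use std::sync::mpsc::channel;
use std::time::{Duration, Instant};

// Session state: tracks the current coding session
struct SessionState {
    active: bool,
    changed_files: Vec<String>,    // files modified during this session
    session_start: Instant,
    last_activity: Instant,
}

impl SessionState {
    // Fresh, inactive session; also used to reset after a session ends
    fn new() -> Self {
        let now = Instant::now();
        SessionState {
            active: false,
            changed_files: Vec::new(),
            session_start: now,
            last_activity: now,
        }
    }
}

// Payload handed to the extraction pipeline when a session ends
struct SessionContent {
    changed_files: Vec<String>,
    duration_secs: u64,
}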

// INACTIVITY_TIMEOUT: the boundary between "still working" and "session ended"
// 10 minutes is chosen to capture AI agent sessions that may have long
// LLM round-trip times without incorrectly ending the session mid-task
const INACTIVITY_TIMEOUT: Duration = Duration::from_secs(600);

// DEBOUNCE: 5 seconds collapses rapid save-loops from editors that
// save continuously (VS Code with autoSave: "afterDelay")
// Without debounce, the watcher would receive hundreds of events per session
// for what is logically a handful of file changes
const DEBOUNCE: Duration = Duration::from_secs(5);

fn start_session_watcher(
    watch_dir: &str,
    on_session_end: impl Fn(SessionContent) + Send + 'static,
) {
    let (tx, rx) = channel();

    // Notify crate: cross-platform file system events
    // RecursiveMode::Recursive: watches all subdirectories
    // ← Watches the project directory, not the agent process directly
    //   This is why it works with any AI coding agent: file writes are universal
    let mut watcher = watcher(tx, DEBOUNCE).unwrap();
    watcher.watch(watch_dir, RecursiveMode::Recursive).unwrap();

    let mut session = SessionState {
        active: false,
        changed_files: vec![],
        session_start: std::time::Instant::now(),
        last_activity: std::time::Instant::now(),
    };

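    loop {
        match rx.recv_timeout(Duration::from_secs(1)) {
            Ok(event) => {
                // File changed: update session state
                if let Some(path) = get_changed_path(&event) {
                    // Agent process as session start signal
                    // ← This is the trick: file writes alone don't mean an AI agent is active.
                    //   RepForge checks for claude, claude-code, or codex in running processes
                    //   to distinguish "developer typing" from "agent coding session".
                    //   The check runs when a session would open, not once at startup,
                    //   so an agent launched after the watcher is still detected.
                    if !session.active {
                        if !is_ai_agent_running() { // scans /proc or ps output
                            continue; // plain developer edit: no session to open
                        }
                        session.active = true;
                        // Measure duration from the first change, not from watcher start
                        session.session_start = std::time::Instant::now();
                    }
                    // Record each file once, however often it is saved
                    if !session.changed_files.contains(&path) {
                        session.changed_files.push(path);
                    }
                    session.last_activity = std::time::Instant::now();
                }
            }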
            Err(_) => {
                // Timeout: check if session has ended (inactivity)
                // ← Session ends on INACTIVITY, not on agent process exit
                //   This captures sessions where the agent exits before the
                //   developer reviews the changes
                if session.active &&
                   session.last_activity.elapsed() > INACTIVITY_TIMEOUT {
                    // Session ended: trigger concept extraction
                    let content = SessionContent {
                        changed_files: session.changed_files.clone(),
                        duration_secs: session.session_start.elapsed().as_secs(),
                    };
                    on_session_end(content);  // ← fires the extraction pipeline
                    session = SessionState::new();  // reset for next session
                }
            }
        }
    }
}

// Concept extraction: LLM call on session content
async fn extract_concepts(content: &SessionContent, api_key: &str) -> Vec<Concept> {
    // Build context: file diffs summarized for LLM input
    // ← Full file contents are NOT sent: only changed lines + file paths
    //   This keeps the LLM call token-efficient and privacy-preserving
    let diff_summary = summarize_diffs(&content.changed_files);

    let prompt = format!(
        "A developer just completed a coding session with an AI agent. \
         The following files were modified:\n{}\n\n\
         Identify the underlying computer science concepts involved. \
         Focus on: algorithms, data structures, design patterns, \
         concurrency primitives, networking concepts, security patterns.\n\
         For each concept: provide a 1-sentence definition and a \
         1-sentence explanation of why it matters in this context.\n\
         Return JSON: [{{\"concept\": str, \"definition\": str, \"context\": str}}]",
        diff_summary
    );

    // ← Why JSON output: structured extraction for reliable card generation
    // Free-text extraction produces inconsistent card quality
    // JSON with schema forces the LLM to think in concept-definition-context triples
    let response = call_llm_api(api_key, &prompt).await;
    parse_concepts_json(&response)
}

The inactivity-based session boundary (10 minutes) is more reliable than process-exit detection for AI coding agent workflows. Claude Code and Codex may exit and restart multiple times during a single logical task. Inactivity captures the natural end of a work unit regardless of how many agent invocations it required.
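The inactivity rule is easier to reason about, and to test, when the clock is passed in rather than read inside the loop. The following is a minimal sketch under that assumption; `SessionTracker` and its methods are illustrative names, not RepForge code.

```rust
use std::time::{Duration, Instant};

/// Tracks one logical work session based purely on file-activity timestamps.
pub struct SessionTracker {
    timeout: Duration,
    started: Option<Instant>,
    last_activity: Option<Instant>,
}

impl SessionTracker {
    pub fn new(timeout: Duration) -> Self {
        SessionTracker { timeout, started: None, last_activity: None }
    }

    /// Record a file change observed at `now`; opens a session if none is active.
    pub fn on_file_event(&mut self, now: Instant) {
        if self.started.is_none() {
            self.started = Some(now);
        }
        self.last_activity = Some(now);
    }

    /// Called periodically. Once the inactivity timeout has elapsed, returns
    /// the span between first and last activity and resets the tracker.
    pub fn poll(&mut self, now: Instant) -> Option<Duration> {
        let (start, last) = (self.started?, self.last_activity?);
        if now.duration_since(last) > self.timeout {
            self.started = None;
            self.last_activity = None;
            Some(last.duration_since(start))
        } else {
            None
        }
    }
}
```

Agent restarts never appear in this state machine at all: a gap shorter than the timeout between two file events simply extends the same session, which is the behavior the paragraph above argues for.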

Snippet Two: SM-2 Scheduler and Card Store (Rust + SQLite)

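// src-tauri/src/scheduler.rs
// SM-2-style algorithm (Wozniak 1987), the basis of SuperMemo 2 and Anki's classic scheduler.
// This version uses a four-grade scale with fixed ease adjustments rather than
// SM-2's original 0-5 quality score.
// RepForge implements this in Rust over SQLite for local persistence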

use rusqlite::{Connection, params};

// Review grades: standard SRS vocabulary
#[derive(Debug)]
pub enum Grade {
    Again = 0,  // complete blackout: reset interval
    Hard = 1,   // significant difficulty: slight interval increase
    Good = 2,   // correct with effort: normal interval increase
    Easy = 3,   // correct without difficulty: larger interval increase
}

// Card state: what SM-2 tracks per flashcard
#[derive(Debug)]
pub struct Card {
    pub id: i64,
    pub concept: String,
    pub question: String,
    pub answer: String,
    pub ease_factor: f32,   // starts at 2.5, modified by responses
    pub interval: i32,      // days until next review
    pub repetitions: i32,   // how many successful reviews in a row
    pub due_date: String,   // ISO8601 date
}

impl Card {
    // SM-2 update: called after each review response
    // ← THIS is the trick: ease_factor is per-card, not per-deck.
    //   A concept you find hard gets a permanently shorter interval schedule.
    //   This is why SRS outperforms fixed-interval review: it adapts to you.
    pub fn update_sm2(&mut self, grade: Grade) {
        match grade {
            Grade::Again => {
                // Reset: back to day 1, reduce ease factor
                self.repetitions = 0;
                self.interval = 1;
                self.ease_factor = (self.ease_factor - 0.2).max(1.3);
                // ← ease_factor floor of 1.3 prevents cards from becoming
                //   daily reviews forever (which happens with naive SM-2 impls)
            }
            Grade::Hard => {
                // Small increase: interval grows slowly
                self.interval = (self.interval as f32 * 1.2).max(1.0) as i32;
                self.ease_factor = (self.ease_factor - 0.15).max(1.3);
                self.repetitions += 1;
            }
            Grade::Good => {
                // Normal increase: multiply by ease_factor
                if self.repetitions == 0 {
                    self.interval = 1;
                } else if self.repetitions == 1 {
                    self.interval = 6;
                } else {
                    self.interval = (self.interval as f32 * self.ease_factor) as i32;
                }
                self.repetitions += 1;
            }
            Grade::Easy => {
                // Larger increase: interval grows faster, ease_factor increases
                self.interval = (self.interval as f32 * self.ease_factor * 1.3) as i32;
                self.ease_factor = (self.ease_factor + 0.15).min(3.0);
                // ← ease_factor ceiling of 3.0: prevents runaway intervals
                //   (without ceiling, "Easy" cards could go years between reviews)
                self.repetitions += 1;
            }
        }

        // Update due date: today + interval days
        self.due_date = compute_due_date(self.interval);
    }
}

// Persist card state to SQLite
fn save_card_state(conn: &Connection, card: &Card) -> rusqlite::Result<()> {
    conn.execute(
        "UPDATE cards SET
            ease_factor = ?1,
            interval_days = ?2,
            repetitions = ?3,
            due_date = ?4
         WHERE id = ?5",
        params![
            card.ease_factor,
            card.interval,
            card.repetitions,
            card.due_date,
            card.id,
        ],
    )?;
    Ok(())
}

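// Get cards due today: the core query for the review session
fn get_due_cards(conn: &Connection) -> rusqlite::Result<Vec<Card>> {
    let today = today_iso8601();
    // ← Only return cards due on or before today
    //   This respects the scheduler's computed intervals
    //   and avoids flooding the user with all cards at once.
    //   Columns are named explicitly so row indices do not depend on table layout;
    //   the concept/question/answer column names are assumed to match the Card fields.
    let mut stmt = conn.prepare(
        "SELECT id, concept, question, answer,
                ease_factor, interval_days, repetitions, due_date
         FROM cards WHERE due_date <= ?1 ORDER BY due_date ASC",
    )?;
    // Map each row to a Card; any column error aborts the whole query
    let cards = stmt
        .query_map(params![today], |row| {
            Ok(Card {
                id: row.get(0)?,
                concept: row.get(1)?,
                question: row.get(2)?,
                answer: row.get(3)?,
                ease_factor: row.get(4)?,
                interval: row.get(5)?,
                repetitions: row.get(6)?,
                due_date: row.get(7)?,
            })
        })?
        .collect::<rusqlite::Result<Vec<Card>>>()?;
    Ok(cards)
}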

The ease_factor floor of 1.3 is part of the original SM-2 specification and is easy to drop in a reimplementation. Without it, a card graded "Again" repeatedly keeps losing 0.2 per lapse until the ease factor falls below 1.0, at which point multiplying by it shrinks the interval instead of growing it, and the card becomes a daily review that never escapes. The floor ensures that even difficult concepts have a minimum growth trajectory.
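A small sketch makes the effect of the floor visible. It reuses the constants from Snippet Two (start at 2.5, subtract 0.2 per "Again", clamp at 1.3) but makes the floor optional so the two behaviors can be compared side by side. The function names are illustrative, and `f64` is used here instead of the snippet's `f32`.

```rust
/// Ease-factor update for a lapse ("Again"), with the floor made optional
/// so the clamped and unclamped behaviours can be compared.
pub fn lapse(ease: f64, floor: Option<f64>) -> f64 {
    let lowered = ease - 0.2;
    match floor {
        Some(min) => lowered.max(min),
        None => lowered,
    }
}

/// Next interval after a "Good" review on a mature card:
/// interval * ease, truncated to whole days.
pub fn grow(interval_days: u32, ease: f64) -> u32 {
    (interval_days as f64 * ease) as u32
}

/// Apply `n` consecutive lapses starting from the default ease of 2.5.
pub fn ease_after_lapses(n: u32, floor: Option<f64>) -> f64 {
    (0..n).fold(2.5, |ease, _| lapse(ease, floor))
}
```

After ten lapses the clamped ease sits at 1.3, so a later "Good" still lengthens a 10-day interval. The unclamped ease has dropped to roughly 0.5, so the same "Good" would cut the interval in half, which is how a card gets stuck.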

It In Action: End-to-End Worked Example

Scenario: Developer uses Claude Code to implement a rate limiter for a REST API. Session runs 45 minutes, involves changes to middleware and a new Redis integration.

Step 1: Session detected and watched

RepForge process monitor detects: "claude" in running processes
File watcher activated for ~/my-api/

Files modified during session:
  src/middleware/rate_limiter.ts  (312 lines changed)
  src/config/redis.ts             (45 lines changed)
  tests/rate_limiter.test.ts      (89 lines changed)

Session end trigger: 10 minutes of no file writes after Claude Code exits
Session duration: 47 minutes

Step 2: Concept extraction (LLM call)

Diff summary sent to LLM:
  + import { Redis } from 'ioredis'
  + const WINDOW_SIZE_MS = 60_000
  + const MAX_REQUESTS = 100
  + async function slidingWindowRateLimit(key: string) {
  +   const now = Date.now()
  +   const windowStart = now - WINDOW_SIZE_MS
  +   await redis.zremrangebyscore(key, '-inf', windowStart)
  +   await redis.zadd(key, now, now.toString())
  +   const count = await redis.zcard(key)
  +   return count <= MAX_REQUESTS
  + }

LLM prompt: "Identify the underlying CS concepts..."

LLM response (JSON):
[
  {
    "concept": "Sliding Window Rate Limiting",
    "definition": "A rate limiting algorithm that tracks requests within a moving time window using a sorted set, providing smoother throttling than fixed-window approaches.",
    "context": "Used to prevent API abuse by limiting requests per time window while avoiding the burst-at-window-boundary problem of fixed-window counters."
  },
  {
    "concept": "Redis Sorted Sets (ZADD/ZRANGEBYSCORE)",
    "definition": "A Redis data structure that stores unique elements with a float score, enabling O(log N) insertion and range queries by score.",
    "context": "Sorted sets are ideal for sliding window rate limiting because the timestamp serves as both the score and the element, enabling efficient removal of expired entries with ZREMRANGEBYSCORE."
  },
  {
    "concept": "Time-to-Live (TTL) and Expiry in Cache Design",
    "definition": "The practice of automatically expiring cached data after a set duration to prevent stale data accumulation and unbounded memory growth.",
    "context": "Redis key expiry is used here to clean up rate limiting windows after the key is no longer active."
  }
]

Tokens used: ~850 (input: 400, output: 450)
LLM call latency: ~2.1 seconds

Step 3: Card generation

Card 1:
  Concept: Sliding Window Rate Limiting
  Question: "You implement a rate limiter with a fixed time window.
              Users discover they can send 200 requests in 2 seconds
              by timing requests to the window boundary.
              What algorithm fixes this, and why?"
  Answer:   "Sliding window rate limiting: tracks requests in a
              window that moves with each request rather than resetting
              at fixed boundaries. Eliminates the burst-at-boundary
              problem by considering the full window before each request."
  Due date: tomorrow (interval = 1, new card)

Card 2:
  Concept: Redis Sorted Sets (ZADD/ZRANGEBYSCORE)
  Question: "Why use a Redis sorted set instead of a counter for
              sliding window rate limiting? What operations make it work?"
  Answer:   "Sorted sets allow time-range queries: ZADD stores timestamp
              as score, ZREMRANGEBYSCORE removes expired entries in
              O(log N + M) for M removed entries, ZCARD counts current
              window. A simple counter can't efficiently delete individual
              expired entries."
  Due date: tomorrow (interval = 1, new card)
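
For readers who want to see the concept behind both cards without a Redis instance, here is a minimal in-memory sketch of a sliding-window limiter. It is an illustration of the technique, not the session's TypeScript code: a `VecDeque` plays the role of the sorted set, and timestamps are passed in as milliseconds so the behavior is deterministic. All names are hypothetical.

```rust
use std::collections::VecDeque;

/// In-memory sliding-window limiter: keeps the timestamps of accepted
/// requests and evicts those that have left the window before each decision.
pub struct SlidingWindowLimiter {
    window_ms: u64,
    max_requests: usize,
    accepted: VecDeque<u64>, // timestamps in ms, oldest first
}

impl SlidingWindowLimiter {
    pub fn new(window_ms: u64, max_requests: usize) -> Self {
        SlidingWindowLimiter { window_ms, max_requests, accepted: VecDeque::new() }
    }

    /// Returns true if a request arriving at `now_ms` is allowed.
    /// Assumes timestamps are passed in non-decreasing order.
    pub fn allow(&mut self, now_ms: u64) -> bool {
        // Same role as ZREMRANGEBYSCORE: drop entries that left the window.
        while matches!(self.accepted.front(),
                       Some(&t) if now_ms.saturating_sub(t) >= self.window_ms) {
            self.accepted.pop_front();
        }
        // Same role as ZCARD: count what is still inside the window.
        if self.accepted.len() < self.max_requests {
            self.accepted.push_back(now_ms);
            true
        } else {
            false
        }
    }
}
```

With a 60-second window and a limit of 100, a burst of 100 requests at second 59 is accepted, but a request at second 61 is still rejected, because the window looks back from the current request instead of resetting at the minute boundary. That is the boundary-burst fix Card 1 asks about.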

Step 4: Review session (next morning)

User opens RepForge: "2 cards due today"

Card 1 review:
  Q: "You implement a rate limiter with a fixed time window..."
  User grades: "Good" (knew the concept, needed to think about it)
  SM-2 update: repetitions 0 → 1, interval stays at 1 day (first success), ease_factor stays 2.5
  Next due: tomorrow; a second "Good" moves it to 6 days

Card 2 review:
  Q: "Why use a Redis sorted set instead of a counter..."
  User grades: "Hard" (got the answer but needed prompting)
  SM-2 update: interval 1 → 1.2 → truncated to 1 day, ease_factor 2.5 → 2.35
  Next due: tomorrow (needs more reinforcement)

Review session time: 4 minutes
Cards reviewed: 2
New due dates: both cards tomorrow; Card 2 now carries the lower ease factor (2.35 vs 2.5)

The asymmetry is intentional, although under the update rule in Snippet Two it shows up first in the ease factor rather than the due date: both cards return tomorrow, but Card 2 now multiplies its interval by 2.35 instead of 2.5, and every further "Hard" lowers that multiplier again while holding the interval roughly flat. Six months from now, if the developer keeps grading Card 1 as "Good" or "Easy," it will be due only every few months. Card 2 catches up only once it is consistently graded "Good"; while it keeps drawing "Hard," it stays on a cycle of days. The scheduler remembers what you find difficult.

Why This Design Works, and What It Trades Away

The session watcher's process-detection approach is the correct design for targeting AI coding agent workflows specifically. A generic file system watcher would fire concept extraction on every file save, including typo corrections, whitespace changes, and config edits. Restricting to sessions where a known AI agent process is running ensures concept extraction happens when AI-assisted generation is the source of the changes. This is the population RepForge cares about: code written by AI that the developer may not fully understand.
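As a rough sketch of what the process check could look like, the function below scans a process listing for the agent names mentioned earlier. How the listing is obtained is deliberately left out; the sketch assumes one command line per row, such as the output of `ps -A -o args` on a Unix system. The name list and the function are illustrative, not RepForge's implementation.

```rust
/// Agent executable names mentioned in the article; treat the list as illustrative.
const AGENT_NAMES: [&str; 3] = ["claude", "claude-code", "codex"];

/// Returns true if any line of a process listing names a known agent.
/// `listing` is expected to hold one command line per row.
pub fn agent_in_listing(listing: &str) -> bool {
    listing.lines().any(|line| {
        // First token is the executable, possibly with a path prefix.
        let exe = line.split_whitespace().next().unwrap_or("");
        let name = exe.rsplit('/').next().unwrap_or(exe);
        AGENT_NAMES.contains(&name)
    })
}
```

Matching on the exact executable name rather than a substring avoids false positives from unrelated processes whose names merely contain "claude" or "codex".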

The local-first architecture (SQLite + no sync server) eliminates an entire class of product complexity. User card state and review history never leave the machine, and the API key is stored only locally. There is no authentication system to build, no database to maintain at scale, and no privacy policy complication from storing developer session data in the cloud. The tradeoff is obvious: cards do not sync across machines. For most developers, the single-machine use case is the primary one.

The SM-2 scheduler's per-card ease factor is the decision that makes the system self-calibrating. A concept the developer consistently finds easy gets longer and longer intervals until it is effectively never reviewed (every few months). A concept they consistently find hard stays on a short cycle until their fluency improves. The deck naturally converges toward the developer's actual knowledge gaps without any explicit configuration.
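The self-calibration claim can be checked by replaying a streak of identical grades through the update rule. The sketch below mirrors the branch structure and constants of Snippet Two, with two simplifications that are assumptions of this example: it uses `f64` and `u32`, and it omits due dates. `CardState` and `streak` are hypothetical names.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Grade { Again, Hard, Good, Easy }

#[derive(Clone, Copy, Debug)]
pub struct CardState {
    pub ease: f64,
    pub interval_days: u32,
    pub repetitions: u32,
}

impl CardState {
    /// New card: default ease, due after one day, no successful reviews yet.
    pub fn new() -> Self {
        CardState { ease: 2.5, interval_days: 1, repetitions: 0 }
    }

    /// Same branches and constants as Snippet Two.
    pub fn review(&mut self, grade: Grade) {
        match grade {
            Grade::Again => {
                self.repetitions = 0;
                self.interval_days = 1;
                self.ease = (self.ease - 0.2).max(1.3);
            }
            Grade::Hard => {
                self.interval_days = ((self.interval_days as f64 * 1.2) as u32).max(1);
                self.ease = (self.ease - 0.15).max(1.3);
                self.repetitions += 1;
            }
            Grade::Good => {
                self.interval_days = match self.repetitions {
                    0 => 1,
                    1 => 6,
                    _ => (self.interval_days as f64 * self.ease) as u32,
                };
                self.repetitions += 1;
            }
            Grade::Easy => {
                self.interval_days = (self.interval_days as f64 * self.ease * 1.3) as u32;
                self.ease = (self.ease + 0.15).min(3.0);
                self.repetitions += 1;
            }
        }
    }
}

/// Interval after each of `n` reviews that all receive the same grade.
pub fn streak(grade: Grade, n: usize) -> Vec<u32> {
    let mut card = CardState::new();
    (0..n).map(|_| { card.review(grade); card.interval_days }).collect()
}
```

Five consecutive "Easy" grades take a new card through intervals of 3, 10, 36, 138, and 538 days, while five consecutive "Hard" grades leave it at 1 day every time, because 1 × 1.2 truncates back to 1. The divergence between the two streaks is the self-calibration described above.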

What RepForge trades away:

Concept extraction quality variability. The quality of the cards depends directly on the LLM's ability to identify the right CS concepts from a diff summary. A session that primarily involves boilerplate (adding routes, writing tests for existing logic) will generate cards with lower learning value than a session that involves a new algorithm or data structure. There is no quality filtering beyond what the LLM does.

Coverage of what the AI decided. RepForge sees what code changed, not why the AI made specific architectural decisions. If Claude Code chose a specific algorithm because it was most efficient for the data size, that decision reasoning is not captured. The card reflects the concept (sorting algorithm) but not the tradeoff (why this sorting algorithm). Addressing this would require access to the agent's reasoning trace.

No cross-session concept deduplication at review time. If a developer implements a rate limiter in one session and a priority queue in another, both involving Redis sorted sets, they will receive cards about Redis sorted sets from both sessions. SM-2 treats them as separate cards. Over time, this produces a deck with duplicate concepts at different intervals, which is inefficient but not harmful.
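To give a sense of what closing the deduplication gap might involve, here is one naive approach: normalize concept names and group cards that share a key. This is a hypothetical sketch, not a feature of RepForge, and exact-match normalization like this would miss most real-world paraphrases.

```rust
use std::collections::HashMap;

/// Hypothetical normalisation: drop any parenthetical suffix, lowercase,
/// and collapse whitespace. Real concept names would need fuzzier matching.
pub fn concept_key(name: &str) -> String {
    let base = name.split('(').next().unwrap_or(name);
    base.split_whitespace()
        .map(|w| w.to_lowercase())
        .collect::<Vec<_>>()
        .join(" ")
}

/// Group card ids by normalised concept so duplicates can be spotted.
pub fn group_duplicates(cards: &[(i64, &str)]) -> HashMap<String, Vec<i64>> {
    let mut groups: HashMap<String, Vec<i64>> = HashMap::new();
    for (id, concept) in cards {
        groups.entry(concept_key(concept)).or_default().push(*id);
    }
    groups
}
```

Any group with more than one id is a candidate for merging, or at least for sharing a single review schedule.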

Technical Moats

The Tauri + Rust architecture for a local desktop app with LLM integration. Building a cross-platform desktop app (macOS, Windows, Linux) that runs a file system watcher, manages a SQLite database, makes LLM API calls, and surfaces a polished review UI is a non-trivial engineering task. Tauri's Rust + WebView architecture allows the data-critical backend (watcher, scheduler, database) to be in safe, performant Rust while the UI is web-based React. This combination is not obvious for a solo developer building a learning tool, and executing it correctly with 152 commits demonstrates meaningful implementation discipline.

The cognitive forcing strategy as card design principle. The research underpinning RepForge's card format is not arbitrary. The Cognitive Forcing Strategies paper found that the specific framing of challenges (asking "why" and "what tradeoff" rather than "what is the definition") reduces AI-assisted conceptual errors by 45%. RepForge's card generation prompt is designed to elicit this framing: the question asks about the reasoning behind a concept, not just the concept itself. This design choice requires understanding the research, not just implementing an SRS system.

The comprehension-performance gap as the product's market thesis. The Comprehension-Performance Gap study (arXiv:2511.02922) documents the problem RepForge solves in peer-reviewed research. As AI coding tools proliferate and the gap between what developers can build and what they understand grows wider, tools that close this gap have a growing market. RepForge is early in a problem that will get worse before tooling catches up.

Insights

Insight One: RepForge does not solve the AI coding comprehension problem. It is the correct direction for addressing it, but the success of the approach depends entirely on developer compliance with the review schedule, which spaced repetition research consistently shows is the failure mode of all SRS systems.

Every SRS system ever built (Anki, SuperMemo, Duolingo) has the same documented failure pattern: users complete sessions for the first week, let cards accumulate, feel overwhelmed by the backlog, and abandon the system. The system works beautifully for users who maintain the habit and fails entirely for users who do not. RepForge's automated card generation removes the card creation burden (the second most common SRS abandonment reason, after review burden), but it does not reduce the review burden. For the system to deliver its learning benefit, a developer must open the review UI daily. That is a behavioral change problem, not a software problem.

Insight Two: The session-level concept extraction has an inverted signal problem: the sessions where the developer would benefit most from review are the sessions where the AI wrote the most code with the least developer involvement, and these are precisely the sessions where the developer is least likely to recognize that they need the review.

When a developer pair-programs with Claude Code on a familiar domain (adding a route, writing a test), the session involves concepts they already know. The cards generated will be redundant with existing knowledge. When a developer pastes a Claude Code solution they did not fully follow (implementing a complex concurrency pattern for the first time), the session involves concepts they do not know. These sessions generate the most valuable cards. But the developer, having received working code without understanding it, is also the developer least likely to recognize that they need to review. RepForge generates the card. Whether the developer completes the review is a function of their self-awareness about their own comprehension gap, which is exactly what the comprehension-performance gap research says is eroding.

Takeaway

The DAS3H paper (Modeling Student Learning and Forgetting for Optimally Scheduling Distributed Practice of Skills) provides a model that is substantially more accurate than SM-2 for scheduling multi-concept skills, and RepForge's concept extraction naturally produces multi-concept cards (most coding sessions involve multiple related concepts). SM-2 treats each card independently; DAS3H models the dependencies between concepts. RepForge's current SM-2 implementation schedules "Sliding Window Rate Limiting" and "Redis Sorted Sets" independently, even though understanding one reinforces understanding the other. A DAS3H-based scheduler would recognize this dependency and schedule reviews of dependent concepts in proximity, which could meaningfully reduce the total review burden while achieving the same or better retention.

This is not a criticism of the current implementation. SM-2 is a correct and well-understood algorithm with a track record of nearly four decades in SRS software. It is an observation that the research supporting RepForge points toward a more sophisticated scheduling approach that the implementation has not yet adopted, and that this is the most technically interesting open question in the system.

TL;DR For Engineers

  • RepForge (MIT, Rust 1.93+, Tauri, 2 stars, 152 commits) is a local-first desktop companion for AI coding agents: watches sessions via file system events + process detection, sends a diff summary of the changed files to an LLM for concept extraction, generates spaced repetition review cards, and schedules them with SM-2 or FSRS over a local SQLite database.

  • Four-stage pipeline: session watch (notify crate + process monitor) → concept extraction (LLM call on diff summary, JSON output) → card generation (cognitive forcing format: "why" questions not "what" questions) → SM-2 scheduling (per-card ease factor, adapts to individual difficulty).

  • Research grounding is real: the Comprehension-Performance Gap study (arXiv:2511.02922) documents measurable comprehension decline in AI-assisted developer teams; the Cognitive Forcing Strategies paper documents 45% reduction in conceptual errors when these strategies are applied.

  • SM-2 ease factor floor (1.3) and ceiling (3.0) are the implementation details that separate a correct SRS from one that produces runaway daily reviews or infinite intervals.

  • Primary risk is not technical. It is behavioral: SRS systems fail when users stop reviewing. RepForge eliminates card creation burden but not review burden. The approach is correct; the adoption pattern for SRS tools is the confounding variable.

The Gap Is Real. The System That Closes It Has to Be Used.

RepForge solves the right problem with a technically sound architecture. The session watcher is non-intrusive. The concept extraction is automated. The scheduler is well-understood. The local-first design is correct. The research base is legitimate. The question the project does not answer, and the question that determines whether it succeeds, is how to make developers consistently show up for the daily four-minute review. That is not a Rust problem or a scheduler problem. It is the same problem every spaced repetition tool has faced for four decades. RepForge's unique angle is that it removes the most common excuse (I don't have time to make cards), which may be enough to push the compliance rate over the threshold where the learning compounds.

