The reason this is architecturally interesting is not privacy marketing. It is that the engineering choices required to make full-stack AI run offline on consumer hardware produce a genuinely different system from the cloud-first alternatives, and those differences are worth understanding precisely.
SnackOnAI Engineering | Senior AI Systems Researcher | Technical Deep Dive | July 04, 2026
Meeting AI tools have a compliance problem nobody discusses cleanly. The IBM 2025 Cost of a Data Breach report puts the global average breach cost at $4.44 million. European regulators have issued €5.88 billion in GDPR fines through 2025. California has seen 400+ unlawful recording cases in a single year. Meeting recordings are high-density sensitive data: they contain names, financial figures, personnel discussions, legal strategy, and anything else said in a meeting. Routing them through a cloud transcription API creates a data residency and consent problem in every jurisdiction with data protection laws.
The standard response from cloud-first tools is contractual: sign a data processing agreement, claim SOC 2 compliance, point to Terms of Service. Meetily's response is architectural: the data never leaves the device. You cannot breach data that was never transmitted.
Scope: Meetily's five-component Rust/Tauri architecture, the Whisper and Parakeet transcription engines and why Parakeet is 4x faster, the Ollama summarization pipeline and its prompt structure, the SQLite storage design, and the privacy architecture. References to the Whisper paper (arXiv:2212.04356) for the transcription foundation, and the Microsoft meeting recap research (arXiv:2307.15793) for the summarization design context. Not covered: Meetily PRO features (speaker diarization, advanced export) or the cloud LLM integrations beyond brief mention.
What It Actually Does
Meetily captures your microphone and system audio simultaneously, transcribes in real time, stores everything in a local SQLite database, and generates summaries using a local LLM via Ollama or a remote endpoint of your choice. The UI runs in Next.js; the heavy lifting runs in Rust. No subscription required for the core features. No data leaves your machine unless you point the summary engine at a cloud endpoint.
Supported platforms and GPU acceleration:
| Platform | Transcription acceleration |
|---|---|
| macOS (Apple Silicon) | Metal + CoreML |
| macOS (Intel) | CoreML, CPU fallback |
| Windows / Linux (NVIDIA) | CUDA |
| Windows / Linux (AMD / Intel) | Vulkan |
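The table is effectively a lookup. Here is a hedged sketch of how the choice could be made once at startup from the OS, CPU architecture and GPU vendor, with CPU as the universal fallback. The `Accel` and `GpuVendor` enums and `pick_acceleration` are hypothetical names, not Meetily's code, and GPU probing itself is out of scope:

```rust
/// Transcription acceleration backends from the platform table (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Accel {
    MetalCoreMl,
    CoreMlCpuFallback,
    Cuda,
    Vulkan,
    Cpu,
}

/// GPU vendor as reported by some platform probe (not implemented here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GpuVendor {
    Apple,
    Nvidia,
    Amd,
    Intel,
    Other,
}

/// Map (OS, CPU architecture, GPU vendor) to a backend, mirroring the table.
/// Anything the table does not list falls back to CPU.
pub fn pick_acceleration(os: &str, arch: &str, gpu: GpuVendor) -> Accel {
    match (os, arch, gpu) {
        ("macos", "aarch64", _) => Accel::MetalCoreMl,
        ("macos", "x86_64", _) => Accel::CoreMlCpuFallback,
        ("windows" | "linux", _, GpuVendor::Nvidia) => Accel::Cuda,
        ("windows" | "linux", _, GpuVendor::Amd | GpuVendor::Intel) => Accel::Vulkan,
        _ => Accel::Cpu,
    }
}

fn main() {
    // std::env::consts reports the compile-target OS and CPU architecture.
    let accel = pick_acceleration(std::env::consts::OS, std::env::consts::ARCH, GpuVendor::Other);
    println!("selected backend: {accel:?}");
}
```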
Supported summary backends:
Ollama (local, recommended)
Claude API
Groq API
OpenRouter
Any OpenAI-compatible endpoint
The Architecture, Unpacked

The privacy boundary is the architectural invariant. Audio capture, transcription, and storage all stay local. Only the text transcript optionally reaches a remote endpoint for summarization, and only when the user explicitly configures a remote backend.
The Code, Annotated
Snippet One: Tauri Command for Real-Time Transcription
// Meetily: Tauri command bridging frontend → Rust transcription engine
// (simplified illustration; module paths and method names are placeholders)
// Design intent: Tauri's IPC is the only way the frontend touches the audio/ASR pipeline.
// Nothing in the Next.js layer has direct access to raw audio or model weights.
use std::sync::Mutex;
use tauri::State;
use crate::transcription::{TranscriptionEngine, TranscriptionResult};
use crate::audio::AudioBuffer;
// ← THIS is the architectural boundary: a Tauri command is the only
// surface the frontend can call into the Rust core.
// The frontend calls `invoke("start_transcription", { model: "parakeet", audioDevice, systemAudio })`
// (Tauri maps camelCase JS keys onto the snake_case Rust parameters).
// It never sees the raw audio buffer, model weights, or GPU context.
#[tauri::command]
pub async fn start_transcription(
    model: String,        // "whisper" or "parakeet"
    audio_device: String, // microphone device ID
    system_audio: bool,   // capture speaker output as well
    engine: State<'_, TranscriptionEngine>,
    buffer: State<'_, Mutex<AudioBuffer>>, // State<T> only hands out &T, so mutation needs a Mutex
) -> Result<String, String> {
    // Configure the audio capture (mic + system audio if requested) and copy out
    // the current window; the lock is released before any `.await`.
    let samples: Vec<f32> = {
        let mut buf = buffer.lock().map_err(|e| e.to_string())?;
        // ← Audio ducking prevents feedback loops when both streams are active:
        // reduce system audio volume while the mic is active.
        let duck_on_mic = true;
        buf.configure(audio_device, system_audio, duck_on_mic)
            .map_err(|e| format!("Audio config error: {}", e))?;
        buf.get_current()
    };
    // Select the transcription backend based on user choice
    // ← Parakeet is the default for performance; Whisper for multilingual/accuracy
    let result = match model.as_str() {
        "parakeet" => engine.run_parakeet(&samples).await,
        "whisper" => engine.run_whisper(&samples).await,
        other => return Err(format!("Unknown model: {}", other)),
    };
    // Return transcript as JSON string to Next.js frontend
    // ← The frontend only receives text. Audio never crosses the IPC boundary.
    let transcript: TranscriptionResult = result.map_err(|e| e.to_string())?;
    serde_json::to_string(&transcript).map_err(|e| e.to_string())
}
// In crate::audio: the 30-second ring buffer, where audio is captured, chunked, and fed to the model
pub struct AudioBuffer {
    ring: Vec<f32>,    // PCM samples at 16kHz
    capacity: usize,   // 30s × 16000Hz = 480,000 samples
    write_head: usize,
    // ← 30 seconds is Whisper's native chunk size.
    // Parakeet can handle variable-length chunks but uses the same buffer
    // for consistency. Ring buffer prevents memory growth on long meetings.
}
The Tauri command pattern is the privacy architecture in code. The frontend (TypeScript/Next.js) cannot directly call audio APIs, load models, or access the filesystem outside the paths Tauri grants. Every sensitive operation goes through a typed command that the Rust backend controls. This is why Meetily can claim "no data leaves" with a straight face: the architecture enforces it, not just policy.
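The `AudioBuffer` struct above declares its fields but not its behavior. Below is a minimal sketch of how such a fixed-capacity ring buffer could work. The `filled` field and the `push` and `snapshot` methods are hypothetical additions, not Meetily's API. Writes wrap around at `capacity`, so memory stays constant on long meetings, and a snapshot returns samples oldest-first, which is the order the model expects:

```rust
/// Fixed-capacity ring buffer of mono PCM samples (hypothetical sketch).
pub struct AudioBuffer {
    ring: Vec<f32>,    // PCM samples, preallocated once
    capacity: usize,   // e.g. 30 s × 16,000 Hz = 480,000 samples
    write_head: usize, // index where the next sample will be written
    filled: usize,     // number of valid samples (never exceeds capacity)
}

impl AudioBuffer {
    pub fn new(seconds: usize, sample_rate: usize) -> Self {
        let capacity = seconds * sample_rate;
        assert!(capacity > 0, "buffer capacity must be non-zero");
        Self { ring: vec![0.0; capacity], capacity, write_head: 0, filled: 0 }
    }

    /// Append samples, overwriting the oldest ones once the buffer is full.
    pub fn push(&mut self, samples: &[f32]) {
        for &s in samples {
            self.ring[self.write_head] = s;
            self.write_head = (self.write_head + 1) % self.capacity;
            self.filled = (self.filled + 1).min(self.capacity);
        }
    }

    /// Copy out the valid samples in chronological (oldest-first) order.
    pub fn snapshot(&self) -> Vec<f32> {
        if self.filled < self.capacity {
            // Not wrapped yet: valid data is ring[0..filled].
            self.ring[..self.filled].to_vec()
        } else {
            // Wrapped: the oldest sample sits at write_head.
            let mut out = Vec::with_capacity(self.capacity);
            out.extend_from_slice(&self.ring[self.write_head..]);
            out.extend_from_slice(&self.ring[..self.write_head]);
            out
        }
    }
}

fn main() {
    let mut buf = AudioBuffer::new(30, 16_000);
    buf.push(&vec![0.1; 40 * 16_000]); // 40 s of audio into a 30 s buffer
    println!("{} samples retained", buf.snapshot().len()); // prints "480000 samples retained"
}
```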
Snippet Two: Parakeet vs Whisper Inference, and Ollama Summarization
// Meetily: Parakeet (TDT) vs Whisper transcription, and Ollama summarization pipeline
// Source: /backend/ and /llama-helper/ in Zackriya-Solutions/meetily (simplified illustration;
// get_model_path, extract_mel_features, and tdt_greedy_decode are placeholder helpers)
use ort::{Environment, SessionBuilder, Value}; // ONNX Runtime for Parakeet (ort 1.x-style API)
use serde::Deserialize;
use serde_json::json;
use whisper_rs::{FullParams, SamplingStrategy, WhisperContext, WhisperContextParameters}; // whisper.cpp bindings
use crate::transcription::TranscriptionResult;
type Error = Box<dyn std::error::Error + Send + Sync>;
// ─── PARAKEET: Token-and-Duration Transducer (4x faster than Whisper) ─────────
pub async fn run_parakeet(audio: &[f32]) -> Result<TranscriptionResult, Error> {
    // Load ONNX model from local path (downloaded once, stored in app data dir)
    // ← ONNX Runtime: one runtime with optional GPU execution providers
    //   (CUDA, CoreML, ...); the CPU provider is always available as fallback.
    //   In practice, build the session once and reuse it across chunks. The community
    //   ONNX export ships encoder and decoder/joint as separate graphs; one session here for brevity.
    let env = Environment::builder().build()?.into_arc();
    let session = SessionBuilder::new(&env)?
        .with_model_from_file(get_model_path("parakeet-tdt-0.6b-v3.onnx"))?;
    // Preprocess: log-Mel features (128 bins, 16kHz, 10ms hop) → CowArray<f32, IxDyn>
    let mel = extract_mel_features(audio, 128, 16_000);
    // ← THIS is why Parakeet is faster:
    // The FastConformer encoder processes the whole chunk in one parallel
    // forward pass and subsamples it 8x (3000 mel frames → 375 encoder frames).
    // The TDT decoder is still a sequential transducer loop, but each step is a
    // small prediction + joint network, and the predicted duration lets it jump
    // ahead several frames instead of visiting every one.
    // Whisper's decoder is a full Transformer run once per output token
    // (autoregressive): 200-400 heavyweight sequential steps per 30s chunk.
    let input = Value::from_array(session.allocator(), &mel)?;
    let outputs = session.run(vec![input])?;
    // Greedy TDT decode over the encoder output: token IDs → text,
    // accumulated durations → word-level timestamps
    let (transcript, timestamps) = tdt_greedy_decode(&outputs)?;
    Ok(TranscriptionResult { text: transcript, timestamps })
}
// ─── WHISPER: Encoder-Decoder (better multilingual accuracy) ───────────────────
pub async fn run_whisper(audio: &[f32]) -> Result<TranscriptionResult, Error> {
    // whisper-rs 0.1x-style API; load the context once and reuse it in practice
    let ctx = WhisperContext::new_with_params(
        &get_model_path("whisper-medium.bin"),
        WhisperContextParameters::default(),
    )?;
    let mut state = ctx.create_state()?;
    let mut params = FullParams::new(SamplingStrategy::Greedy { best_of: 1 });
    // ← Whisper was trained on 680,000 hours of multilingual weak supervision
    // (arXiv:2212.04356). For non-English or accented speech, Whisper's
    // broader training distribution typically beats Parakeet (English-primary;
    // the v3 checkpoint covers 25 European languages versus Whisper's ~99).
    params.set_language(Some("auto")); // auto-detect language
    params.set_translate(false);       // transcribe in source language
    params.set_token_timestamps(true); // token-level timestamps (experimental in whisper.cpp)
    state.full(params, audio)?;
    // Collect segment text and (start, end) times; whisper.cpp reports times in 10ms units
    let mut transcript = String::new();
    let mut timestamps = Vec::new();
    for i in 0..state.full_n_segments()? {
        transcript.push_str(&state.full_get_segment_text(i)?);
        let (t0, t1) = (state.full_get_segment_t0(i)?, state.full_get_segment_t1(i)?);
        timestamps.push((t0 as f32 / 100.0, t1 as f32 / 100.0));
    }
    Ok(TranscriptionResult { text: transcript, timestamps })
}
// ─── OLLAMA SUMMARIZATION: Local LLM via HTTP ───────────────────────────────
// Based on the Microsoft meeting recap research (arXiv:2307.15793, CSCW 2024):
// two recap types are most valuable: highlights (quick scan) + hierarchical minutes
// Meetily's prompt structure implements both.
#[derive(Deserialize)]
struct OllamaResponse {
    response: String, // generated text from a non-streaming /api/generate call
}
pub async fn generate_summary(transcript: &str, model: &str) -> Result<String, Error> {
    // ← Ollama runs as a local server (localhost:11434 by default)
    // The transcript text is the only thing leaving the Rust binary.
    // Audio never leaves. The transcript goes to Ollama locally, or optionally
    // to a remote endpoint the user configures.
    let prompt = format!(r#"
You are an expert meeting assistant. Analyze the following meeting transcript and provide:
## Summary
A 2-3 sentence overview of what was discussed.
## Key Highlights
- [3-5 important points or decisions made]
## Action Items
- [Task]: [Owner if mentioned] - [Deadline if mentioned]
## Meeting Notes
[Structured hierarchical notes organized by topic]
Transcript:
{}
Provide structured output in clean markdown.
"#, transcript);
    // ← This is Ollama's native /api/generate endpoint. Ollama also serves an
    //   OpenAI-compatible API under /v1, the same shape the cloud backends use.
    let response = reqwest::Client::new()
        .post("http://localhost:11434/api/generate")
        .json(&json!({
            "model": model, // e.g., "llama3.2", "mistral", "phi3"
            "prompt": prompt,
            "stream": false,
            "options": {
                // ← Illustrative values. Ollama's default temperature (0.8) is not
                //   deterministic; a low value suits structured extraction.
                "temperature": 0.2,
                // ← Ollama's default context window is only a few thousand tokens;
                //   a 45-minute transcript would be truncated without raising it.
                "num_ctx": 16384
            }
        }))
        .send()
        .await?
        .error_for_status()?;
    let result = response.json::<OllamaResponse>().await?;
    Ok(result.response)
}
The Parakeet vs Whisper choice is the latency-accuracy tradeoff that matters in practice. For English business meetings, use Parakeet: 4x faster, word-level timestamps, ONNX so no CUDA dependency. For multilingual meetings or heavy accents, use Whisper: 680,000 hours of training data produces better robustness. Neither choice is wrong. They serve different primary use cases.
It In Action: End-to-End Meeting Capture
Scenario: A 45-minute product planning meeting with two participants, discussing a React migration from Vue.js.
Step 1: Recording starts
Audio config: MacBook Pro M3, Parakeet model, system audio enabled
Audio capture: CoreAudio → dual-stream (mic + system at 16kHz, mono)
Ring buffer: 30s rolling PCM window, ~480,000 samples per chunk
GPU acceleration: Apple Metal + CoreML (auto-detected at build time)
Step 2: Live transcription (Parakeet, first 30s chunk)
Input: 30s PCM audio, ~1.9MB uncompressed (480,000 × 4-byte float samples)
Preprocessing: 480,000 samples → log-Mel spectrogram [1, 128, 3000] (128 mel bins, 10ms hop)
Parakeet TDT forward pass (M3 GPU via CoreML):
Encoder: audio features → contextualized representations
Decoder: greedy TDT loop → each step emits a token plus a duration that skips ahead over encoder frames
Output (first 30s):
"So the main concern with the migration is the component
library. We have about 200 custom Vue components and most of them
don't have TypeScript types at all."
Word timestamps:
So [0.2s], the [0.4s], main [0.5s]...
Latency for 30s audio: ~7.5s (real-time factor 0.25, i.e., 4x faster than real time)
Memory used: ~1.2GB of unified memory (Parakeet 0.6B ONNX + mel features)
Step 3: Meeting continues for 45 minutes
Chunks processed: 90 (45min × 2 chunks/min)
Total transcript length: ~8,500 words (~42,000 characters)
Stored in SQLite: meeting_id, transcript_text, word_timestamps
All stored locally in app data directory
No network calls made (Ollama is local)
Step 4: Summary generation (Ollama, llama3.2:3b)
Input: 42,000-character transcript
Model: llama3.2:3b (running locally on M3 via Ollama)
Prompt: structured summary prompt (highlights + action items + minutes)
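Context window: ~42,000 characters is roughly 10,500 tokens; llama3.2:3b supports up to 128k tokens, but Ollama's num_ctx must be raised above its small default or the transcript is truncated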
Ollama generation:
Time: ~45 seconds on M3 GPU (llama3.2:3b, 4-bit quantized)
Tokens generated: ~800 (the summary)
Output:
## Summary
The team discussed migrating 200 Vue.js components to React,
with TypeScript adoption as a prerequisite. Consensus on a 10-week
phased timeline with Alice leading the component audit.
## Key Highlights
- 200 custom Vue components need TypeScript types before migration
- Phased approach: 20 components/week over 10 weeks
- Bob to research shadcn/ui compatibility with existing design system
## Action Items
- Component audit: Alice - by Friday June 26
- TypeScript migration plan: Bob - by Monday June 29
- Design system review: Bob + design team - next week
## Meeting Notes
### Migration Strategy
- Current state: 200 components, ~0 TypeScript coverage
- Proposed approach: prioritize shared components first...
Step 5: Storage and retrieval
SQLite record:
meeting_id: uuid-1234
created_at: 2026-06-22T14:00:00Z
duration_seconds: 2700
transcript_word_count: 8500
summary_generated: true
transcript_file: /Users/alice/Library/Application Support/meetily/transcripts/uuid-1234.txt
summary_file: /Users/alice/Library/Application Support/meetily/summaries/uuid-1234.md
Total local storage: ~85KB (transcript text + summary markdown)
Audio file (if saved): ~173MB (2,700s × 16kHz × 4 bytes per float PCM sample)
Data transmitted to any remote server: 0 bytes
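The scenario's sizing numbers follow from a few constants. This small sketch recomputes them. It assumes mono 32-bit float PCM at 16 kHz, 30-second chunks and a rough heuristic of 4 characters per token for the LLM input. That heuristic is an assumption, and real tokenizers vary:

```rust
/// Back-of-envelope sizing for the 45-minute scenario (all inputs are assumptions).
const SAMPLE_RATE_HZ: u64 = 16_000;
const BYTES_PER_SAMPLE: u64 = 4; // f32 PCM
const CHUNK_SECONDS: u64 = 30;

fn chunk_samples() -> u64 {
    CHUNK_SECONDS * SAMPLE_RATE_HZ
}

fn chunk_bytes() -> u64 {
    chunk_samples() * BYTES_PER_SAMPLE
}

/// Round up so a trailing partial chunk is still transcribed.
fn chunk_count(meeting_seconds: u64) -> u64 {
    (meeting_seconds + CHUNK_SECONDS - 1) / CHUNK_SECONDS
}

fn audio_bytes(meeting_seconds: u64) -> u64 {
    meeting_seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE
}

/// Real-time factor: processing time / audio duration (below 1.0 is faster than real time).
fn real_time_factor(processing_s: f64, audio_s: f64) -> f64 {
    processing_s / audio_s
}

/// Rough English heuristic, not a real tokenizer.
fn approx_tokens(chars: u64) -> u64 {
    chars / 4
}

fn main() {
    let meeting = 45 * 60; // 2,700 s
    println!("samples per chunk: {}", chunk_samples()); // 480000
    println!("bytes per chunk:   {}", chunk_bytes()); // 1920000 (~1.9 MB)
    println!("chunks:            {}", chunk_count(meeting)); // 90
    println!("raw audio bytes:   {}", audio_bytes(meeting)); // 172800000 (~173 MB)
    println!("RTF:               {}", real_time_factor(7.5, 30.0)); // 0.25
    println!("approx tokens:     {}", approx_tokens(42_000)); // 10500
}
```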
Why This Design Works, and What It Trades Away
The Tauri framework choice is the correct architectural decision for privacy-first desktop AI. Electron, the most common alternative for cross-platform TypeScript apps, bundles a full Chromium browser with a Node.js main process that has unrestricted filesystem and network access. How much of that reaches the UI depends on the preload scripts and IPC handlers each app writes, not on a framework-enforced boundary. Tauri's allowlist model (capabilities and permissions in Tauri v2) requires explicitly granting every filesystem path and system API the frontend can access. The Rust backend is the enforcer of what the UI can and cannot touch. This is not harder to build. It is a different security model that also happens to produce a smaller binary and a lower RAM footprint, because Tauri uses the operating system's webview instead of shipping Chromium.
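The Parakeet TDT architecture deserves a dedicated explanation. NVIDIA's Parakeet pairs a FastConformer encoder with a Token-and-Duration Transducer (TDT) decoder. The encoder processes the whole chunk in one parallel pass and subsamples it 8x, so the 3,000 feature frames in 30 seconds of audio (at a 10 ms hop) become 375 encoder frames. The TDT decoder then walks those frames with a small prediction and joint network. At each step it predicts a token and how many frames that token lasts, and the duration lets it skip ahead instead of visiting every frame. Whisper, by contrast, uses a standard encoder-decoder. Its encoder turns the mel spectrogram into contextual embeddings, and a full Transformer decoder generates tokens one at a time, each attending to all previous tokens and to the encoder output. For a 30-second chunk producing 200 words, Whisper runs at least 200 sequential passes through its large decoder. Parakeet also decodes sequentially, but with far fewer and far cheaper steps. The 4x speedup is structural, not a hyperparameter.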
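The frame-skipping idea is easier to see in code. The sketch below is a toy greedy TDT decoding loop, not NVIDIA's or Meetily's implementation. The `joint` closure is a hypothetical stand-in for the prediction and joint networks. The frame count assumes a 10 ms feature hop and FastConformer's 8x subsampling. The loop counts how many sequential decoder steps are needed when every step also predicts a duration:

```rust
/// Toy greedy decoding loop for a Token-and-Duration Transducer (TDT).
/// `joint(frame, last_token)` stands in for the prediction + joint networks and
/// returns the argmax token (`None` = blank) and the argmax duration in frames.
/// Returns (emitted (token, frame) pairs, number of sequential decoder steps).
fn tdt_greedy_decode<F>(
    num_frames: usize,
    max_symbols_per_frame: usize,
    mut joint: F,
) -> (Vec<(u32, usize)>, usize)
where
    F: FnMut(usize, Option<u32>) -> (Option<u32>, usize),
{
    let mut emitted = Vec::new(); // frame index doubles as a coarse timestamp
    let mut last: Option<u32> = None;
    let (mut t, mut steps, mut symbols_here) = (0usize, 0usize, 0usize);
    while t < num_frames {
        steps += 1;
        let (token, mut duration) = joint(t, last);
        if let Some(tok) = token {
            emitted.push((tok, t));
            last = Some(tok); // the prediction network conditions on this next step
            symbols_here += 1;
        }
        // A blank must advance time, and one frame may emit only so many symbols.
        if duration == 0 && (token.is_none() || symbols_here >= max_symbols_per_frame) {
            duration = 1;
        }
        if duration > 0 {
            symbols_here = 0;
        }
        t += duration;
    }
    (emitted, steps)
}

fn main() {
    // 30 s at a 10 ms hop = 3000 feature frames; 8x subsampling = 375 encoder frames.
    let encoder_frames = 3000 / 8;
    // Scripted joint network: always emits a token and jumps 3 frames ahead.
    let (tokens, steps) = tdt_greedy_decode(encoder_frames, 4, |t, _last| (Some(t as u32), 3));
    // prints "375 encoder frames, 125 decoder steps, 125 tokens"
    println!("{} encoder frames, {} decoder steps, {} tokens", encoder_frames, steps, tokens.len());
}
```

A transducer without durations would take at least one step per encoder frame, which is 375 here. With durations, the step count tracks the number of tokens rather than the number of frames.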
The Microsoft meeting recap research (arXiv:2307.15793, CSCW 2024) validates Meetily's summarization design. The paper found that fixed-length summaries fail to meet diverse recap needs: some users need a quick overview, others need detailed notes to reconstruct decisions. The dual-format output (highlights + hierarchical minutes) maps directly to this finding. The paper's additional insight, that user editing patterns (adding, deleting, modifying AI-generated text) should inform future recap improvements, points toward a feature direction Meetily has not yet implemented: learning from user edits to personalize future summaries.
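Because the prompt asks for fixed `##` headings, the model's markdown can be split back into the quick-scan part and the detailed part that the recap research distinguishes. The sketch below assumes the model follows the heading names from the prompt. LLMs do not always comply, so callers should tolerate missing or renamed sections. `split_sections` is a hypothetical helper, not a Meetily function:

```rust
use std::collections::HashMap;

/// Split markdown into level-2 sections keyed by heading text.
/// Lines before the first `## ` heading are ignored; `###` subheadings stay in their section.
fn split_sections(markdown: &str) -> HashMap<String, String> {
    let mut sections: HashMap<String, String> = HashMap::new();
    let mut current: Option<String> = None;
    for line in markdown.lines() {
        if let Some(title) = line.strip_prefix("## ") {
            let key = title.trim().to_string();
            sections.entry(key.clone()).or_default();
            current = Some(key);
        } else if let Some(key) = &current {
            let body = sections.get_mut(key).expect("section was inserted above");
            body.push_str(line);
            body.push('\n');
        }
    }
    sections
}

fn main() {
    let summary = "## Summary\nThe team discussed a Vue-to-React migration.\n\
        ## Key Highlights\n- 200 components need types\n\
        ## Meeting Notes\n### Migration Strategy\n- Shared components first\n";
    let s = split_sections(summary);
    // Quick scan: Summary + Key Highlights; detailed recap: Meeting Notes.
    print!("{}", s.get("Key Highlights").map(String::as_str).unwrap_or("(missing)\n"));
}
```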
What Meetily trades away:
Speaker diarization is not available in the Community Edition. The transcript output does not identify who said what. The README acknowledges this as a planned PRO feature. For the primary use case of individual meeting capture and personal note-taking, this is acceptable. For team meeting analysis or attribution of decisions to specific speakers, it is a significant gap.
Whisper's 30-second chunk architecture creates a continuity problem at boundaries. Sentences spoken across two chunks may be split mid-transcription. The whisper.cpp integration mitigates this with a small overlap buffer, but word-level accuracy at chunk boundaries is lower than word-level accuracy mid-chunk. Parakeet handles variable-length inputs more gracefully.
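One common mitigation for boundary splits is to overlap consecutive windows. A word cut at the end of one chunk then appears whole at the start of the next, and the transcripts are merged afterwards, for example by dropping words whose timestamps fall in an overlap that has already been covered. This section does not state the overlap size whisper.cpp or Meetily actually uses. The 1-second value and the `chunk_ranges` helper below are illustrative assumptions:

```rust
/// Compute [start, end) sample ranges for fixed-size windows with overlap.
/// The final window is truncated at the end of the audio.
fn chunk_ranges(total: usize, window: usize, overlap: usize) -> Vec<(usize, usize)> {
    assert!(overlap < window, "overlap must be smaller than the window");
    let step = window - overlap;
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < total {
        let end = (start + window).min(total);
        ranges.push((start, end));
        if end == total {
            break;
        }
        start += step;
    }
    ranges
}

fn main() {
    const SR: usize = 16_000;
    // 65 s of audio, 30 s windows, 1 s overlap (illustrative values).
    for (s, e) in chunk_ranges(65 * SR, 30 * SR, SR) {
        println!("{:.1}s -> {:.1}s", s as f64 / SR as f64, e as f64 / SR as f64);
    }
}
```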
Local LLM quality is bounded by what your hardware can run. A MacBook Pro M3 with 32GB unified memory can run llama3.2:3b comfortably. Running a 70B model for higher-quality summaries is possible with sufficiently large VRAM but becomes slow enough to test patience (several minutes for a 45-minute meeting).
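A rough way to see the hardware bound is to look at weight memory alone: parameters × bits per weight / 8. This sketch ignores the KV cache, activations and runtime overhead, all of which add more. The 4-bit figure assumes a quantized build like the one in the scenario:

```rust
/// Approximate weight memory in gigabytes (10^9 bytes).
/// Excludes KV cache, activations, and runtime overhead.
fn weight_gb(params_billions: f64, bits_per_weight: f64) -> f64 {
    params_billions * bits_per_weight / 8.0
}

fn main() {
    for (name, params) in [("llama3.2:3b", 3.0), ("70B model", 70.0)] {
        println!(
            "{name}: ~{:.1} GB at 4-bit, ~{:.1} GB at 16-bit",
            weight_gb(params, 4.0),
            weight_gb(params, 16.0)
        );
    }
    // 70B at 4-bit needs ~35 GB for weights alone, more than a 32 GB machine has.
}
```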
Technical Moats
Rust + ONNX Runtime for cross-platform inference without a hard CUDA dependency. Meetily's Parakeet integration uses ONNX Runtime, which runs on CPU on any platform. It can also use execution providers such as CUDA (NVIDIA GPUs), CoreML (Apple Silicon) and DirectML (AMD/Intel GPUs on Windows). This means the same model file works everywhere. Whisper.cpp provides comparable cross-platform coverage for Whisper through its Metal, CUDA and Vulkan backends. Building a Python-based system would require managing separate CUDA, MPS and CPU code paths, or accepting performance penalties on non-NVIDIA hardware. The Rust + ONNX Runtime approach handles this with a single deployment target.
Tauri's security model as a compliance artifact. Enterprise deployments often require demonstrating that sensitive data cannot be exfiltrated by a rogue application or by a supply chain compromise in a JavaScript dependency. Tauri's allowlist model means that even if a malicious npm package were included, it could not read arbitrary filesystem paths or call system APIs outside the declared allowlist, and it would never see raw audio. Outbound requests made directly from the webview are governed by the app's Content Security Policy rather than by the allowlist, so a strict CSP belongs to the same argument. This is a compliance argument, not just a performance argument, and it is one that cloud-first tools with Electron frontends have a much harder time making.
The ONNX Parakeet conversion by the community. The NVIDIA Parakeet model itself is public on Hugging Face. The ONNX-converted version used by Meetily (istupakov/parakeet-tdt-0.6b-v3-onnx) is a community contribution that enables deployment without NeMo or CUDA. Converting a transducer model to ONNX and maintaining numerical equivalence across platforms is non-trivial. Teams trying to integrate Parakeet into their own applications benefit directly from this conversion work without needing to reproduce it.
Insights
Insight One: Meetily's privacy claim is strong for transcription and weak for summarization, and this distinction matters for compliance purposes. The transcription engine is fully local: Whisper and Parakeet run on-device, audio never leaves, and the transcript stays in SQLite on the local filesystem. The summarization engine defaults to Ollama (local) but supports Claude, Groq, and OpenRouter. A user who configures Claude as their summary backend is sending the full meeting transcript to Anthropic's servers. This is not a Meetily failure; it is the user's configuration choice. But compliance teams evaluating Meetily for regulated environments need to understand that "privacy-first" applies fully to transcription and applies only by default (not by necessity) to summarization. The architecture is honest about this. The marketing is not always equally clear.
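For a compliance review, the practical question is whether the configured summary endpoint points at this machine. Below is a std-only sketch. `transcript_stays_local` is hypothetical. It extracts the host from an `http(s)://host:port/...` URL naively rather than with a full URL parser, and it treats only `localhost` and loopback addresses as local. Under that policy, an Ollama server elsewhere on the LAN counts as leaving the device:

```rust
use std::net::IpAddr;

/// Extract the host portion of an http(s) URL (naive: no userinfo handling).
fn host_of(url: &str) -> Option<&str> {
    let rest = url.strip_prefix("http://").or_else(|| url.strip_prefix("https://"))?;
    let authority = rest.split('/').next()?;
    if let Some(v6) = authority.strip_prefix('[') {
        // Bracketed IPv6 literal, e.g. [::1]:11434
        return v6.split(']').next();
    }
    authority.split(':').next()
}

/// True if a summary endpoint points at the local machine (loopback only).
fn transcript_stays_local(endpoint: &str) -> bool {
    match host_of(endpoint) {
        Some(h) if h.eq_ignore_ascii_case("localhost") => true,
        Some(h) => h.parse::<IpAddr>().map(|ip| ip.is_loopback()).unwrap_or(false),
        None => false,
    }
}

fn main() {
    for url in [
        "http://localhost:11434/api/generate",
        "http://127.0.0.1:11434",
        "https://api.anthropic.com/v1/messages",
        "https://api.groq.com/openai/v1",
    ] {
        println!("{url} -> local: {}", transcript_stays_local(url));
    }
}
```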
Insight Two: The 4x speed claim for Parakeet over Whisper is accurate, but the comparison baseline matters. Parakeet TDT vs Whisper medium on the same hardware for English audio: yes, approximately 4x. Parakeet vs Whisper tiny on the same hardware: the gap narrows significantly. Whisper tiny still takes one autoregressive decoder step per token, but each step is far cheaper (4 decoder layers versus 24 in Whisper medium). Parakeet vs a CUDA-optimized Faster-Whisper implementation with GPU acceleration: the gap varies by hardware and batch size. Teams benchmarking Meetily should test their specific hardware and language distribution rather than taking the 4x figure as universal. For English-primary meetings on Apple Silicon, Parakeet is the correct choice on all dimensions: speed, accuracy, and timestamp quality. For multilingual meetings, Whisper's 680,000-hour training distribution produces better robustness to accents and code-switching.
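Testing on your own hardware mostly means measuring the real-time factor (RTF, processing time divided by audio duration) for each engine on the same clips. Below is a minimal std-only harness. `transcribe` is any closure that wraps an engine. The names are hypothetical, and the toy workload in `main` only stands in for a real call to Parakeet or Whisper:

```rust
use std::time::{Duration, Instant};

/// Real-time factor: wall-clock processing time / audio duration.
/// RTF below 1.0 means faster than real time.
fn rtf(processing: Duration, audio_seconds: f64) -> f64 {
    processing.as_secs_f64() / audio_seconds
}

/// Time `transcribe` over each clip (samples at `sample_rate`) and return the mean RTF.
fn mean_rtf<F>(clips: &[Vec<f32>], sample_rate: f64, mut transcribe: F) -> f64
where
    F: FnMut(&[f32]) -> String,
{
    assert!(!clips.is_empty(), "need at least one clip");
    let total: f64 = clips
        .iter()
        .map(|clip| {
            let start = Instant::now();
            let _text = transcribe(clip.as_slice());
            rtf(start.elapsed(), clip.len() as f64 / sample_rate)
        })
        .sum();
    total / clips.len() as f64
}

fn main() {
    let clips = vec![vec![0.0_f32; 30 * 16_000]; 3]; // three 30 s silent clips
    // Toy "engine" that just sums samples; swap in real engine calls when benchmarking.
    let toy = |audio: &[f32]| format!("{}", audio.iter().sum::<f32>());
    let r = mean_rtf(&clips, 16_000.0, toy);
    println!("mean RTF: {r:.6}");
}
```

Run it with the same clips for both engines. The speedup of engine B over engine A is then RTF_A / RTF_B, so the scenario's 7.5 s per 30 s chunk corresponds to an RTF of 0.25.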
Surprising Takeaway
Meetily includes a CLAUDE.md file in its root directory, which indicates that the developers use Claude Code to build Meetily. The file that teaches Claude Code to understand and contribute to the codebase lives in the same repository that processes meeting recordings locally to avoid sending sensitive data to AI systems. The developers are simultaneously trusting Claude with their development workflow and building privacy infrastructure that prevents Claude (and all other cloud AI systems) from accessing their users' meeting data. This is not a contradiction. It is the correct deployment model: use cloud AI for work that benefits from it, where you control the context, and use local AI for user data, where you do not control what sensitive information might appear. Meetily's own development process is a demonstration of the tiered trust model the product is built to enable.
TL;DR For Engineers
Meetily (Zackriya-Solutions/meetily, MIT, 15.4k stars, v0.4.0) is a Tauri desktop app (Rust backend + Next.js frontend) for local meeting transcription and summarization. Audio capture, transcription, and storage are entirely on-device. Summarization defaults to Ollama (local) with optional cloud endpoints (Claude, Groq, OpenRouter, OpenAI-compatible).
Two transcription backends: Whisper (arXiv:2212.04356, 680k-hour multilingual, encoder-decoder, autoregressive) via whisper.cpp, and Parakeet (NVIDIA TDT, 0.6B params, ONNX, FastConformer encoder with a lightweight transducer decoder that predicts durations and skips frames). Parakeet is 4x faster on English; Whisper is more robust for multilingual and accented speech. GPU acceleration: Metal/CoreML (Apple), CUDA (NVIDIA), Vulkan (AMD/Intel).
Privacy boundary is the Tauri IPC layer. Frontend cannot access audio APIs, model weights, or arbitrary filesystem paths directly. All sensitive operations are typed Rust commands. Even malicious npm dependencies cannot exfiltrate audio data because the TypeScript layer has no access to it.
Summarization prompt implements the highlights + hierarchical minutes structure validated in the Microsoft meeting recap research (arXiv:2307.15793, CSCW 2024): fixed-length summaries fail diverse needs; dual-format output addresses both quick overview and detailed notes use cases.
Current gaps: no speaker diarization in Community Edition (PRO roadmap), chunk-boundary transcription accuracy, local LLM quality bounded by hardware. Float16 throughout; no mixed precision for Parakeet inference.
Privacy Is Not a Feature. It Is Where the Data Goes.
Cloud meeting tools argue that their privacy controls, access restrictions, and data processing agreements make them safe. Meetily's argument is simpler: if the audio never leaves the device, there is nothing to protect against. The architectural choice to run everything locally is not a compromise on capability. For the use cases Meetily targets, the local pipeline produces results that are competitive with cloud alternatives and produces them without the compliance surface area.
The more interesting observation is what the architecture reveals about local AI in 2026. A 0.6B Parakeet model running on ONNX Runtime transcribes about 4x faster than real time on consumer hardware. A 3B Llama model running via Ollama produces meeting summaries that are useful and structurally sound in under a minute. The performance gap between local and cloud AI, for bounded NLP tasks, is closing. Meeting transcription and summarization are within the task complexity range where local models are already production-viable.
References
Meetily GitHub Repository, Zackriya-Solutions, MIT
Robust Speech Recognition via Large-Scale Weak Supervision, arXiv:2212.04356, Radford, Kim, Xu, Brockman, McLeavey, Sutskever, OpenAI, 2022
Summaries, Highlights, and Action Items: LLM-Powered Meeting Recap, arXiv:2307.15793, Asthana, Hilleli, He, Halfaker, Microsoft, CSCW 2024
Parakeet TDT ONNX model on HuggingFace, community conversion from NVIDIA Parakeet
Meetily (Zackriya-Solutions/meetily, MIT, 15.4k stars, v0.4.0) is a Tauri desktop application (Rust 46% + TypeScript 30%) that captures, transcribes, and summarizes meetings entirely on-device, with audio and transcription never leaving the local machine. Its transcription engine supports both Whisper (arXiv:2212.04356, 680k-hour multilingual encoder-decoder, best for accented and multilingual audio) and NVIDIA Parakeet TDT (0.6B ONNX, FastConformer encoder with a lightweight duration-predicting transducer decoder that skips frames, 4x faster than Whisper for English), with GPU acceleration across Apple Metal, NVIDIA CUDA, and AMD/Intel Vulkan. Summarization defaults to Ollama (local) using a dual-format prompt (highlights + hierarchical minutes) aligned with the Microsoft meeting recap research (arXiv:2307.15793, CSCW 2024), with optional cloud backends (Claude, Groq, OpenRouter). The Tauri IPC layer is the enforced privacy boundary: the TypeScript frontend cannot access audio, model weights, or arbitrary filesystem paths directly.