Why This Matters Now
Every time you call DALL-E or Replicate, you're paying for three things you don't need: a round-trip across the public internet, a queue behind thousands of other requests, and a margin that funds someone else's GPU cluster.
The better path has existed for two years. Most people just haven't taken it seriously yet.
Mr. Pixel Smith is a Python CLI that generates images locally using Ollama and watermarks them automatically with Pillow.
Built by SnackOnAI and available for free, this CLI is now a core part of SnackOnAI’s workflow for generating high-quality newsletter images.

Generated by Mr. Pixel Smith with the prompt:
“A character named Mr. Pixel Smith floating among clouds and stars, holding a glowing hammer made of pixels, cinematic lighting, ultra realistic, dramatic sky, high detail, 4k”
It's small by design. But the patterns inside it (preflight service checks, subprocess error handling, graceful degradation) are the same ones that will break your production AI pipeline if you get them wrong.
The Problem Most Builders Ignore
Wrapping an inference service isn't just calling it. It's handling every state the service can be in.
Ollama, like any long-running daemon, can be missing, installed-but-not-running, running-but-hung, or running-with-the-wrong-model-loaded. Most wrappers handle none of these. They call the service, get a cryptic error or empty output, and surface a useless stack trace to the user.
The six failure modes that matter:
| Failure | Root Cause | Right Response |
|---|---|---|
| Binary not found | Not installed | Exit with install link |
| Daemon not running | Not started | Exit with a "run ollama serve" hint |
| Model not pulled | Empty JSON response | Exit with a "run ollama pull" hint |
| Bad input | Empty prompt, invalid dims | Validate before subprocess call |
| Corrupt output | Bad base64 from model | Surface error with context |
| Watermark crash | Font missing, wrong image mode | Fall back to original, log warning |
Fix all six or you don't have a tool, you have a demo.
How It Actually Works
The Inference Pipeline
When you send a prompt to Ollama's image model, here's what happens:
```
TEXT PROMPT
    |
    v
[Text Encoder] --> 768-dim embedding vector
    |
    v
[Latent Diffusion] --> Iterative denoising (20-50 steps)
    |
    v
[VAE Decoder] --> Pixel-space image
    |
    v
Base64 PNG --> Returned in JSON response
```
The width and height parameters don't just control output resolution; they control the size of the latent tensor being denoised. A 1200×628 image has nearly 3× the pixels of a 512×512 (753,600 vs 262,144), and compute cost grows at least that fast at equivalent step count. Resolution choices have real latency consequences.
The Watermark Pipeline
Watermarking in Pillow is three operations composed:
1. RGBA conversion — forces 4-channel mode regardless of source format
2. Tiled diagonal repeat — text at opacity 60/255, covering the full image; removable, but not trivially
3. Corner stamp — high contrast at 220/255 over a dark background rect, for legibility at any image brightness
The graceful degradation path — save the original if watermarking fails — is the right call. Failing to watermark is always less bad than losing the image.
The Code That Actually Matters
Preflight Check
```python
import shutil
import subprocess
import sys

def check_ollama() -> None:
    if not shutil.which("ollama"):
        print("Error: 'ollama' not found. Install: https://ollama.com")
        sys.exit(1)
    try:
        result = subprocess.run(
            ["ollama", "list"],
            capture_output=True, text=True, timeout=5,
        )
        if result.returncode != 0:
            print("Ollama installed but not running. Run: ollama serve")
            sys.exit(1)
    except subprocess.TimeoutExpired:
        print("Ollama not responding. Is 'ollama serve' running?")
        sys.exit(1)
```
The 5-second timeout is deliberate. ollama list returns in under 100ms if the daemon is healthy. If it times out, image generation will too — fail fast here rather than hanging for 2 minutes.
Validated Input Loop
```python
def get_int_input(prompt_text, default, min_val=64, max_val=4096):
    while True:
        raw = input(f"{prompt_text} [default: {default}]: ").strip()
        if raw == "":
            return default
        try:
            value = int(raw)
            if not (min_val <= value <= max_val):
                print(f"  Enter a value between {min_val} and {max_val}.")
                continue
            return value
        except ValueError:
            print("  Invalid — enter a whole number.")
```
The bounds [64, 4096] aren't arbitrary guardrails. Below 64px, diffusion models produce incoherent noise. Above 4096px on consumer hardware, you're likely to OOM before you get an output.
Font Fallback (The Version Most People Write Wrong)
```python
import logging
from PIL import ImageFont

# Wrong: crashes on systems without DejaVu
# font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", size)

# Right: try a prioritized list, log the fallback
FONT_CANDIDATES = [
    "/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf",  # Linux
    "/System/Library/Fonts/Helvetica.ttc",                   # macOS
    "C:/Windows/Fonts/arial.ttf",                            # Windows
]

def find_font(size: int):
    for path in FONT_CANDIDATES:
        try:
            return ImageFont.truetype(path, size)
        except (IOError, OSError):
            continue
    logging.warning("No system font found. Using default bitmap font.")
    return ImageFont.load_default()
```
ImageFont.load_default() returns a bitmap font designed for thumbnails. At 1200px it looks terrible. Never use it as a silent default.
Architecture: What the System Actually Looks Like
```
USER (Terminal)
    |
    | input(): prompt, width, height, output_path
    v
┌─────────────────────────────────────┐
│ MAIN ORCHESTRATOR                   │
│ 1. check_ollama()   — preflight     │
│ 2. get_int_input()  — validated I/O │
│ 3. generate_image() — subprocess    │
│ 4. add_watermark()  — Pillow comp.  │
│ 5. write_bytes()    — disk write    │
└─────────────────────────────────────┘
        |                  |
        v                  v
  [Ollama Daemon]    [Pillow / PIL]
  localhost:11434    In-process CPU
  GPU/CPU inference  compositing
```
The data flow is a straight pipeline: str → subprocess → JSON → base64 → bytes → PIL Image → watermarked bytes → file. Each transformation has exactly one failure mode. Handle them all or handle none.
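The middle of that pipeline can be sketched as a single decode step with one exception per transformation. This is a minimal sketch, assuming the response JSON carries the image under a "response" key; the function name is illustrative, not from the repo.

```python
import base64
import json

def decode_response(stdout: str) -> bytes:
    payload = json.loads(stdout)          # str -> dict: JSONDecodeError if malformed
    b64 = payload["response"]             # dict -> str: KeyError if the schema changed
    # str -> bytes: binascii.Error if the model emitted corrupt base64
    return base64.b64decode(b64, validate=True)
```

Letting each stage raise its own distinct exception is what makes "handle them all" tractable: the caller can map every failure to a specific user-facing message.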
⚡ Insight #1: Subprocess Is the Wrong Abstraction
Everyone reaches for subprocess.run() because it's familiar. It's also fragile.
The current implementation calls the Ollama CLI and parses its stdout as JSON. That works until Ollama adds a progress indicator, a warning message, or changes its output schema, and then it breaks silently, returning empty or malformed output with no error.
The REST API at http://localhost:11434/api/generate has an actual versioned contract. It supports streaming. It gives you proper HTTP error codes. It doesn't require spawning a new process on every generation.
```python
# Better: call the REST API directly
import httpx

response = httpx.post(
    "http://localhost:11434/api/generate",
    # stream=False returns one JSON object instead of NDJSON chunks
    json={"model": "x/z-image-turbo", "prompt": prompt, "stream": False},
    timeout=300,
)
response.raise_for_status()
image_b64 = response.json()["response"]
```
Two lines cleaner, dramatically more robust. Use subprocess for quick prototypes. Use REST for anything that runs in production.
⚡ Insight #2: Local-First Is a Competitive Moat, Not Just a Cost Play
The common framing is: "use local inference to save money." That's true but undersells it.
The real advantage is architectural independence. When your product's image generation runs locally, you're immune to API rate limits, outages, pricing changes, and terms-of-service updates. You can run in air-gapped environments. You can guarantee data doesn't leave the machine. You can tune inference parameters the cloud API doesn't expose.
Founders building cost-sensitive products should be asking: "what is the total cost of API dependency?", not just the per-image price. The answer almost always favors local-first for internal tools and batch workloads, and cloud for user-facing real-time generation where quality SLAs matter.
💡 Surprising Takeaway: The Watermark Is a Product Decision, Not a Technical One
Most engineers treat watermarking as an implementation detail. It's not.
A watermark communicates provenance. It's how your brand travels when generated images get shared, reused, or embedded somewhere you didn't intend. For a content tool, the watermark is distribution. For an enterprise tool, it's audit trail. The opacity, placement, and content of the watermark are product requirements that should come from a spec, not from whatever value happened to get hardcoded during prototyping.
Adaptive opacity based on regional image luminance is the right production implementation. A watermark at 60/255 opacity is invisible on a bright sky and visible on a dark forest. Compute the mean luminance of the stamp region and adjust accordingly.
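A minimal sketch of that idea, assuming a linear mapping from mean luminance to opacity; the function name and the 60–160 range are illustrative choices, not values from the repo.

```python
from PIL import Image, ImageStat

def adaptive_opacity(img: Image.Image, box: tuple, lo: int = 60, hi: int = 160) -> int:
    """Scale stamp opacity with the mean luminance of the stamp region `box`."""
    region = img.convert("L").crop(box)         # grayscale crop of the stamp area
    luminance = ImageStat.Stat(region).mean[0]  # 0 (black) .. 255 (white)
    # Brighter regions wash out a light watermark, so push opacity toward `hi`
    return round(lo + (hi - lo) * luminance / 255)
```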
What Breaks at Scale
Mr. Pixel Smith is a single-user CLI. These assumptions fail the moment you serve multiple users:
- Ollama is single-threaded by default — concurrent requests queue, they don't parallelize
- subprocess.run() is blocking — a web server calling this blocks its entire thread
- output.png is hardcoded — concurrent users stomp each other's files
- No request correlation — impossible to trace which generation belongs to which user
The production path requires a task queue, async REST calls to Ollama, per-request unique output paths, and object storage (S3) instead of local disk.
```
HTTP Request
    |
    v
[FastAPI]
    |
    v
[Celery + Redis] <-- decouple generation from request
    |
    v
[Worker] --> POST localhost:11434/api/generate
    |
    v
[S3 Upload] <-- watermarked PNG
    |
    v
Signed URL returned to client
```
None of this is complicated. It's just not in scope for a CLI tool.
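The simplest piece to fix first is the hardcoded output path. A sketch of per-request unique paths, assuming local disk as an interim step before S3 (the directory name and function name are illustrative):

```python
import uuid
from pathlib import Path

def unique_output_path(base_dir: str = "generated") -> Path:
    # One UUID4 per request, so concurrent users can never stomp each other's files.
    # The UUID doubles as a request correlation ID for logs.
    out = Path(base_dir) / f"{uuid.uuid4().hex}.png"
    out.parent.mkdir(parents=True, exist_ok=True)
    return out
```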
Tool Comparison
| Tool | Best For | The Catch |
|---|---|---|
| Ollama + Mr. Pixel Smith | Local, private, zero API cost | Setup required, GPU helpful |
| Stable Diffusion WebUI | Mature ecosystem, 1000s of models | Browser-based, not CLI-native |
| ComfyUI | Complex multi-step workflows | Steep learning curve |
| Replicate API | Zero setup, managed GPU | ~$0.003/image, data leaves machine |
| DALL-E 3 | Best prompt adherence | Most expensive, requires internet |
| HuggingFace Diffusers | Maximum pipeline flexibility | Write inference code yourself |
Use local when privacy, cost at scale, or offline operation matter. Use cloud when quality SLAs and latency requirements genuinely demand it.
Five Things People Get Wrong
1. Trusting stdout format. json.loads(result.stdout) breaks silently when Ollama changes its output. Use the REST API.
2. Ignoring stderr. Most subprocess wrappers only check returncode. Ollama writes diagnostics to stderr. Log it even on success.
3. Not accounting for cold start. First generation after model load is 3-5× slower. Build this into your timeout config and user messaging.
4. Hardcoding watermark opacity. A fixed opacity looks wrong on half your images. Adapt to regional luminance.
5. The font fallback trap. load_default() is a bitmap font. At 1200px it looks like 1995. Always try a system font list first.
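Point 2 is the cheapest to fix. A sketch of a subprocess wrapper that keeps stderr instead of discarding it; the wrapper name is illustrative, not code from the repo:

```python
import logging
import subprocess

def run_logged(cmd: list, timeout: float = 300) -> str:
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if result.stderr:
        # Diagnostics often land on stderr even when returncode is 0;
        # don't throw them away
        logging.warning("stderr from %s: %s", cmd[0], result.stderr.strip())
    result.check_returncode()  # raises CalledProcessError on nonzero exit
    return result.stdout
```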
How to Think About This as a Builder
Mr. Pixel Smith is a specific instance of a pattern you'll encounter constantly: wrapping an inference service with validation, post-processing, and failure handling.
The same pattern applies to LLM APIs, speech-to-text, OCR, embedding models, any service with its own lifecycle that can be in an unknown state when your code calls it.
The quality of the wrapper often determines the user experience more than the quality of the underlying model. A great model with a brittle wrapper feels worse than a decent model with a solid one.
TL;DR for Founders
Local AI inference is production-viable right now. Ollama is a single CLI install. Pillow watermarking is CPU-only, fast, and free. The real work is the wrapper, validation, preflight checks, graceful degradation.
Build on this in this order:
1. Swap subprocess for REST API calls to localhost:11434
2. Make watermark opacity adaptive to image luminance
3. Add a cross-platform font candidate list
4. Wrap in FastAPI + Celery for multi-user serving
5. Move output to S3 with per-request UUIDs
The economics of local inference only improve from here. Hardware gets cheaper, models get smaller, Ollama adds more features. The builders who understand this stack now will have a durable advantage over those who keep treating cloud APIs as the default.
Give it a try yourself; I’d love to see what images you come up with.
Mr. Pixel Smith source code: github.com/mohnishbasha/snackonai/tree/master/mr-pixel-smith
More at snackonai.com


