Building Spark: My Personal AI Assistant
Over a year ago I started working with N8N, a self-hosted automation platform similar to Zapier that connects to basically any service you can imagine. It had a built-in AI agent builder that let me wire up my own bot using the OpenRouter API. This was before tools were a thing, before ChatGPT could search the web, before any of the major AI products had live internet access. My little bot could search the web. I remember showing that off to friends and feeling like I'd built something genuinely ahead of the curve.
The N8N Bot That Didn't Last
The problem was that it never really delivered on the promise. Searching the web sounds impressive until you realize the results are inconsistent and the reasoning on top of them is unreliable. It made more mistakes than useful responses, the API costs were getting expensive even for short one-off requests, and the biggest issue was that it had no memory at all. Every conversation started from zero. So I ended up not using it much and went back to my regular stack: subscriptions to ChatGPT and Claude, cycling between them depending on the task.
OpenClaw Changes the Game
A month or so ago, I came across OpenClaw, a bot that was taking the AI community by storm. The core idea was clever: it runs locally on your computer, uses a coding agent to manage a bunch of files as persistent memory, and you access it through a gateway on your phone. So it was basically Claude Code running continuously, but with a phone interface and a single giant repository it could manage and control over time. You could even hand it access to your entire computer. It felt like the first time persistent memory actually worked the way it should.
I gave it a try and genuinely loved it. The conversations felt coherent in a way nothing else had. But I could see the problem almost immediately: it was consuming my Claude subscription credits at an alarming rate. OpenClaw defaulted to Opus, Anthropic's most capable and most expensive model, for everything. Every request, no matter how simple, would trigger a reasoning loop that burned through tokens like it had something to prove. There's a well-known example in the community of someone saying "hi" to an OpenClaw instance and watching it spin up a context-heavy multi-step process that cost nearly $20. For a single greeting.
The math was ugly. Running it the way I wanted would have cost over a thousand dollars a month, which is five times the cost of my most expensive Claude subscription. Anthropic eventually banned OpenClaw from their platform entirely. I believe it was the last straw that led them to restrict subscription usage to Claude Code projects only, since OpenClaw was essentially jailbreaking the subscription model. Smart move from them, and honestly an overdue one.
Building My Own
I'd already named my OpenClaw instance Spark, and when OpenClaw got shut down, I decided to keep the name and build the real thing myself.
The two things I wanted to get right were cost efficiency and memory. OpenClaw's single-model-for-everything approach was the root of both problems: wasteful because it used Opus for tasks that didn't need it, and brittle because the memory was just a pile of files without any real structure. I'd also spent enough time building with MCPs (Model Context Protocol tools) to know they're unreliable. They use a lot of credits, frequently return bad results, and break in hard-to-debug ways. I didn't want to build on top of that.
So I built my own action system for Spark instead, and a memory system designed around how conversations actually work. Information within a single conversation stays cohesive and contextually linked. Information from past conversations gets pulled in automatically through semantic search, so if you mention something that's relevant to what Spark already knows about you, it surfaces that context without you having to ask for it. It just shows up in the conversation, the way it would with a person who actually remembers your history.
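The automatic-recall idea can be sketched in a few lines. This is a toy illustration, not Spark's actual implementation: the bag-of-words "embedding" is a stand-in for a real embedding model, and the store, threshold, and example memories are all assumptions for the sake of the demo.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding". A real system would call an
    # embedding model here; this stand-in keeps the sketch runnable.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MemoryStore:
    def __init__(self):
        self.entries = []  # (text, vector) pairs from past conversations

    def remember(self, text: str):
        self.entries.append((text, embed(text)))

    def recall(self, message: str, top_k: int = 2, threshold: float = 0.2):
        """Surface past notes relevant to the incoming message."""
        q = embed(message)
        scored = sorted(
            ((cosine(q, vec), text) for text, vec in self.entries),
            reverse=True,
        )
        return [text for score, text in scored[:top_k] if score >= threshold]

store = MemoryStore()
store.remember("User is planning a hiking trip to Patagonia in March")
store.remember("User prefers concise answers without bullet points")

# A new message about Patagonia pulls the relevant memory forward
# without being asked to, and leaves the unrelated note alone.
print(store.recall("any tips for my Patagonia hiking trip?"))
```

The point of the sketch is the shape of the flow: every incoming message is matched against stored memories, and anything above a relevance threshold is injected into the context unprompted.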
Memory That Feels Human
The design goal for Spark's memory was that it should work the way human memory works. You don't consciously retrieve memories by querying a database. You're in a conversation, something triggers a connection, and the relevant context floats to the surface. That's what I wanted Spark to do.
When you're talking to Spark and you mention something it's encountered before, it brings that context forward automatically. You don't have to say "remember when we talked about X." It just knows, and it acts like it knows. The first time this happened naturally in a real conversation it felt like talking to an old friend rather than a tool.
The system is also built to run for years without falling apart. The memory doesn't degrade. It doesn't run into context limits. It's not tied to a single conversation thread that gets dropped when you close the app. It's persistent in the way that actually matters.
One Model for Everything Is a Bad Idea
The other core design principle was to never use a single model for everything. This was OpenClaw's biggest architectural mistake. Opus is brilliant for hard reasoning problems, but it's grotesquely expensive for simple tasks, and most tasks are simple. Routing everything through your most powerful model because you haven't thought about the problem is a sign that you're building carelessly.
Spark routes by task type. For general conversation, I've landed on Grok as the best fit. Its style is concise and direct in a way I find genuinely useful, probably a product of being trained heavily on Twitter where every word has to earn its place. For coding, Spark hands off to a completely separate environment running ChatGPT Codex. That environment shares context with the main conversation so it knows what we've been discussing, but the code work happens in its own space and doesn't bleed back into the main chat. Keeping them separate means the main conversation stays clean and readable regardless of how deep a coding session gets.
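The routing logic amounts to classifying each message and mapping the class to a model. Here's a minimal sketch of that idea; the model identifiers, keyword heuristics, and route names are illustrative assumptions, not Spark's actual classifier (which could just as well be a cheap model call).

```python
# Map task types to models so the expensive model only runs
# when the task actually needs it. Names are illustrative.
ROUTES = {
    "code": "openai/codex",         # hand off to the coding environment
    "reasoning": "anthropic/opus",  # reserve the costly model for hard problems
    "chat": "x-ai/grok",            # default for general conversation
}

CODE_HINTS = ("traceback", "refactor", "compile", "stack trace", "def ")
REASONING_HINTS = ("prove", "step by step", "trade-offs")

def classify(message: str) -> str:
    text = message.lower()
    if any(h in text for h in CODE_HINTS):
        return "code"
    if any(h in text for h in REASONING_HINTS):
        return "reasoning"
    return "chat"

def route(message: str) -> str:
    return ROUTES[classify(message)]

print(route("hi, how's it going?"))            # routes to the chat model
print(route("refactor this function for me"))  # routes to the coding model
```

The heuristic can be as crude as this because the failure mode is cheap: a misrouted simple message costs a few extra cents, versus OpenClaw's failure mode of sending every greeting to Opus.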
Context trimming was the other piece I had to get right. Long conversations accumulate a lot of noise: half-formed ideas, resolved questions, tangents that went nowhere. If you don't actively prune that, the model eventually loses the thread on what actually matters. Spark continuously trims its active context down to the most important information, so the conversation stays fluid even over long sessions without losing the substance of what we've covered.
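A minimal version of that trimming looks like the sketch below. The scoring heuristic (keep pinned items plus the most recent turns that fit a token budget) and the four-characters-per-token estimate are illustrative assumptions; a real system might score importance with a model instead.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: about four characters per token.
    return max(1, len(text) // 4)

def trim_context(messages, budget: int):
    """Keep pinned messages, plus the most recent ones that fit the budget."""
    kept, used = [], 0
    # Walk newest-first so recent turns survive; pinned ones always do.
    for msg in reversed(messages):
        cost = estimate_tokens(msg["text"])
        if msg.get("pinned") or used + cost <= budget:
            kept.append(msg)
            used += cost
    kept.reverse()  # restore chronological order
    return kept

history = [
    {"text": "User's long-term goal: ship Spark this year", "pinned": True},
    {"text": "A long resolved tangent about cron syntax " * 5},
    {"text": "Current topic: designing the memory schema"},
]

# The stale tangent is dropped; the pinned goal and current topic survive.
for msg in trim_context(history, budget=20):
    print(msg["text"][:45])
```

Running this continuously as the conversation grows is what keeps long sessions fluid: the noise gets pruned before it can crowd out the substance.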
Running Constantly
Once I had the conversation side working the way I wanted, I needed Spark to actually run as infrastructure, not just a script I fire up manually. I built it on a Uvicorn server so it runs persistently on my computer. This matters more than it sounds. Cron jobs are brittle for anything complex: they run at fixed intervals, they don't communicate with each other, and when something goes wrong you usually find out hours later. Spark's scheduler can handle real complexity, running other projects on consistent schedules, managing dependencies between jobs, and sending me outputs from any task that finishes. It's closer to having a process manager than a timer.
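The dependency-handling part of a scheduler like that can be sketched with nothing but asyncio: each job waits on an event per dependency before running. The job names and bodies here are made up for illustration, and the real scheduler would add recurring intervals, error handling, and notifications on top.

```python
import asyncio

async def run_jobs(jobs: dict, deps: dict):
    """Run each job concurrently, but only after its dependencies finish."""
    done = {name: asyncio.Event() for name in jobs}

    async def run(name):
        for dep in deps.get(name, []):
            await done[dep].wait()  # block until every dependency completes
        result = await jobs[name]()
        done[name].set()           # unblock anything waiting on this job
        return name, result

    return dict(await asyncio.gather(*(run(n) for n in jobs)))

# Hypothetical jobs standing in for real scheduled work.
async def fetch_data():
    await asyncio.sleep(0.01)
    return "raw data"

async def build_report():
    return "report"

async def notify_me():
    return "sent"

results = asyncio.run(run_jobs(
    {"fetch": fetch_data, "report": build_report, "notify": notify_me},
    deps={"report": ["fetch"], "notify": ["report"]},
))
print(results)
```

Unlike a cron entry, "notify" here starts the moment "report" finishes rather than at some guessed fixed time, and a failure surfaces immediately in the awaiting task instead of hours later.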
Where It Stands Now
Spark is the assistant I actually wanted when I first built that N8N bot over a year ago. It knows me, it remembers what we've talked about, it doesn't cost a fortune to run, and it handles real work across conversation, code, and background scheduling.
It's not ready for anyone else to use yet. It's deeply customized to how I work, and there's a gap between "works for me" and "works for someone else" that I haven't bridged. But if this reaches anyone who's interested in what I've built here, either to learn more about the architecture or to help figure out how to make something like this available more broadly, I'd genuinely like to hear from you. The goal was always to build an AI partner that could actually help manage my life for years. I think Spark is becoming that. Here's to building AI tools that actually work.
