Practical Notes about Vibe Coding
Just some notes about spec-driven development, honest tradeoffs, and knowing when a spreadsheet is enough
Small organizations (advocacy groups, mutual aid networks, local campaign chapters) hit a wall as they grow. Someone needs to know which city council staffers have been responsive, which donors lapsed, which coalition partners actually show up. This is relationship data, often politically sensitive, and it has no business living on someone else's servers under someone else's terms. But building custom software is expensive, and hiring a developer is an open-ended commitment.
Some people get it right. Most don’t.
By now, the pattern is unmistakable. “Vibe coding,” the practice of describing what you want to an AI and letting it generate software, was named Collins Dictionary’s Word of the Year for 2025, and for good reason: it has produced both spectacular successes and spectacular failures, often within the same week.
On the success side: a quarter of startups in Y Combinator’s Winter 2025 batch had codebases that were 95% AI-generated. A vibe-coded game reportedly generated $1 million in revenue within 17 days. Non-technical founders have built functional products in hours that would previously have required months of developer time.
On the failure side: a non-technical founder built his entire SaaS product with an AI coding tool, proudly announced he had written “zero hand written code,” and within days was posting desperately online that users were bypassing his subscription system, attackers were flooding his API, and his database was filling with garbage. All because the AI (well, not actually AI, but a large language model) had produced code with no authentication, no rate limiting, and no input validation. A prominent startup founder had an LLM assistant delete his entire database despite explicit instructions not to make changes. A security audit of one major vibe-coding platform found that roughly one in ten applications it generated had vulnerabilities that would expose users’ personal information to anyone who looked.
The problem is the conversation, not the LLM
The failure mode is always the same. You open an LLM chatbot, describe what you want, and start going back and forth. The LLM generates something promising. You ask for changes. It breaks something. You ask it to fix that. It forgets something else. Errors compound. The fundamental issue is structural: a chat-based AI can only hold a limited amount of text in its working memory at once. As the project grows, the AI loses track of its own decisions. It’s less like working with a contractor who remembers your project and more like briefing a new one every morning who has only a vague sense of what the last one built.
This is why the back-and-forth approach reliably produces fragile software. It is also why the people who succeed are doing something different.
Write what you need (the specs)
The approach that works is called spec-driven development. Instead of chatting back and forth with AI, you write a structured description of what you need in a plain document: what data you track, what your workflow looks like, who should see what. You write that down once, clearly, then hand it to an AI agent that reads the document, implements it task by task, tests its own work, and maintains memory between sessions. If something needs to change, you update the document, not the code. Because the AI is working from a single authoritative source rather than a wandering conversation, the context problem largely disappears.
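To make that concrete, here is what such a document can look like. This is a hypothetical example for a small advocacy group's contact tracker; every field and rule is invented for illustration, and your own spec would reflect your own workflow:

```markdown
# Spec: Contact & Relationship Tracker

## Data we track
- Contact: name, role, organization, email, phone, last interaction date
- Interaction: date, contact, type (call / meeting / email), notes, outcome
- Organization: name, kind (coalition partner, funder, city office)

## Workflow
- Organizers log each interaction within a day of it happening.
- Weekly, the coordinator reviews contacts with no interaction in 60+ days.

## Who sees what
- Organizers: can read all contacts, can edit only their own interactions.
- Coordinator: can read and edit everything, manages user accounts.
- No public access. No third-party analytics.
```

Notice there is nothing technical in it. It describes the organization's reality, which is exactly the part the AI cannot guess.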
This is what separates the success stories from the disasters. Not better prompting, but better descriptions of what’s needed before the AI starts building. Thoughtworks flagged spec-driven development as an emerging best practice in their November 2025 Technology Radar, and open-source toolkits for it are now available across every major AI coding platform.
It is not foolproof. Specs drift from implementation. The AI can still make quietly wrong decisions that look correct on the surface, and if you lack the technical literacy to catch a subtle permissions or validation error, you won’t know until something breaks. Better than the chatbot approach by a wide margin, but not a guarantee.
Tools for this workflow have moved fast. Claude Code maintains a project memory file across sessions and can farm out tasks to multiple sub-agents simultaneously. Google’s Antigravity, currently in free public preview, coordinates agents across an editor, terminal, and browser. You can use both on the same project.
What to point the spec at
With a specification in hand, you need to know what’s actually out there before deciding what the AI should build. The landscape is wider than most non-developers realize, and picking the right starting point saves more time than any amount of clever prompting.
For most internal tools, configure an existing platform. Free, open-source systems like Payload, Strapi, and Keystone already solve the hard problems: managing a database, generating an admin interface, handling user accounts and permissions. An AI agent working from your specification can produce the configuration files these platforms require, defining your member database field by field, your permissions rule by rule. The admin interface they generate isn’t a rough developer screen. For internal use, it’s the finished product: a clean interface where your team logs in, browses records, manages access, and gets work done.
For something more custom, the same spec drives a fuller build. Some organizations eventually need a public signup form, a dashboard with maps and charts, a scraper that monitors government postings, or a small agent that handles repetitive tasks. Frameworks like Streamlit and Observable can turn a spec into a working dashboard in minutes. Crawlee can power a scraper that watches a city council’s agenda page. Mastra can wire up a lightweight agent that checks for new grant postings and alerts your team. These require more code than configuring an admin panel, but they also eliminate whole categories of problems you’d otherwise have to solve from scratch, and they shrink the amount of context the AI is working with. Same spec, same methodology, the agent just builds more.
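To give a feel for what "a scraper that watches a city council's agenda page" actually involves, here is a sketch of its core logic in Python. The fetching and alerting pieces are left out, and every name here is invented for illustration, but the "what's new since last time?" comparison is the heart of any such monitor:

```python
def new_postings(seen_titles, current_titles):
    """Return agenda items in current_titles that weren't seen before.

    Comparison ignores case and surrounding whitespace, since scraped
    text tends to vary in both.
    """
    seen = {title.strip().lower() for title in seen_titles}
    return [t for t in current_titles if t.strip().lower() not in seen]


# A monitor built around this would, on a schedule: scrape the agenda
# page for current titles, call new_postings against what it stored
# last run, alert the team about anything returned, and save the new
# list for next time.
```

A framework like Crawlee supplies the scraping half of that loop; the spec is what tells the agent which page to watch and who to alert.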
The specification is the common thread. Whether the AI is writing a configuration file for Payload or scaffolding a dashboard in Streamlit, it’s working from the same structured description of your organization’s needs. A little awareness of what tools exist goes a long way.
Data ownership and cost
Both approaches keep your data off someone else's servers. For organizations tracking which officials' offices have been helpful or hostile, or which stakeholders have done what, that's not a nice-to-have. A commercial platform means that data lives under their terms, is recoverable only as long as you keep paying, and is one acquisition or policy change away from being handled in ways you never agreed to.
Open-source tools and self-hosted applications run on infrastructure you control. You can host them on a basic cloud server for $4–6 per month (or even for free). Your data sits in a database you own. The software itself is free: no per-user fees, no plan limits, no lock-in.
For an organization running on donations, that math is not complicated.
What this actually requires
How much maintenance you’re signing up for depends on what you build. Admin frameworks like Payload and Strapi handle their own updates and have documentation written for non-experts; keeping a configured admin panel running is closer to maintaining a WordPress site than maintaining custom software. The more custom you go, the more you own. A Streamlit dashboard has a large community, but when something breaks, you’re debugging Python. The question is always: who keeps this running six months from now, and do they know it’s their job?
Before you build anything, consider whether you need to. If your team already lives in spreadsheets, tools like Claude in Excel or Google Sheets with Gemini can add structure and automation to what you already have. Sometimes the right answer is a smarter spreadsheet, not a new app.
And if you do build, build one thing. A single dashboard that answers the question your team actually asks every week. Get that working, get people using it, then build the next thing. The temptation to do everything at once is a reliable project-killer, even for experienced developers. The hardest part was never the software. It was saying, in plain terms, what your organization actually needs.


My first technical-to-non-techie post in years
@Shawn K