I built the same thing nine times over in completely autonomous agent mode so that you can have one empirical data point on token efficiency, quality and effectiveness of MCPs and skills.
What I found was that:
There is no established best practice for AI-assisted development. You have to experiment constantly to understand how new tooling can be adopted. My small contribution here is to burn a billion tokens building the same project over and over, so that we can better understand which tools work best with AI and what you should consider on your next greenfield project.
The hypothesis
Functional languages would use fewer tokens — a strong type system would also push towards leaner overall solutions.
In this experiment, that did not hold. The functional languages (OCaml and Clojure) were the least token-efficient of all, at roughly 122M to 181M tokens per build, even though their solutions were comparatively small by line count. 2026 may not be the year for my beloved esoteric functional languages after all!
For this test project I built a feature-voting SaaS where product teams collect and prioritize user feedback through a public board.
The app covers the full product loop end-to-end:
How it was built
I started with a detailed PRD that broke the work into 26 user stories across 9 epics, phased into three releases. I used the beads issue tracker to manage the plan in Claude Code. Each build needed between 2 and 5 nudges to keep the agent working until the project was finished.
| Stack | Total Tokens | Tests? |
|---|---|---|
| Go | ~14M | |
| Next.js | ~26M | |
| Rails | ~36M | |
| C | ~52M | yes |
| Java Spring | ~88M | |
| Laravel | ~91M | yes |
| OCaml | ~122M | yes |
| Clojure v1 | ~133M | yes |
| Clojure v2 | ~181M | yes |

Lines of code by stack, sorted by total:

| Stack | Primary Lang | Primary LOC | Total Code LOC |
|---|---|---|---|
| OCaml | OCaml | 3,918 (8 files, 1,000 in test) | 3,928 |
| Rails | Ruby | 1,732 (88 files) | 4,950 |
| C | C | 2,315 (1 file) | 5,150 |
| Clojure v2 (no REPL) | Clojure | 3,677 (29 files, 1,241 in test) | 5,346 |
| Clojure v1 (with REPL) | Clojure | 2,847 (15 files, 599 in test) | 5,799 |
| Go | Go | 3,095 (25 files) | 7,013 |
| Next.js | TypeScript | 4,743 (72 files) | 10,230 |
| Laravel | PHP + TypeScript | 6,011 PHP + 7,530 TS (2,923 in tests) | 17,423 |
Clojure, Rails and OCaml are incredibly terse — especially when you consider the significant testing component of the OCaml/Clojure codebases.
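One way to read the two tables above together is tokens spent per line of code delivered. The snippet below is a rough sketch of that arithmetic, not part of the original experiment. The dictionary and function names are my own, the figures are copied from the tables, and Java Spring is left out because it has no row in the lines-of-code table.

```python
# Approximate total tokens per build, in millions (from the token table).
TOKENS_M = {
    "Go": 14, "Next.js": 26, "Rails": 36, "C": 52,
    "Laravel": 91, "OCaml": 122, "Clojure v1": 133, "Clojure v2": 181,
}

# Total code LOC per build (from the lines-of-code table).
TOTAL_LOC = {
    "Go": 7013, "Next.js": 10230, "Rails": 4950, "C": 5150,
    "Laravel": 17423, "OCaml": 3928, "Clojure v1": 5799, "Clojure v2": 5346,
}


def tokens_per_line(stack):
    """Tokens consumed per line of code in the finished repository."""
    return TOKENS_M[stack] * 1_000_000 / TOTAL_LOC[stack]


# Cheapest builds per delivered line come first.
ranking = sorted(TOKENS_M, key=tokens_per_line)

for stack in ranking:
    print(f"{stack:<12}{tokens_per_line(stack):>10,.0f} tokens per line")
```

On these figures Go comes out at about 2,000 tokens per line and OCaml at about 31,000. Treat this as a crude ratio only: the token totals are rounded, and a terse codebase will always look expensive per line.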
Next.js and Laravel have the most coherent recommended practice for AI-based development. Both ship skill files, MCPs and recommended claude.md structures. Within the Next.js stack, there is an ORM-specific skill (Prisma) and a design-system-specific skill (shadcn). Laravel includes an MCP that can unify backend and frontend logs, plus a significant suite of skill files covering 'best practice' and 'documentation links'. Next.js was surprisingly token-efficient despite a high line count, though the shadcn components account for a lot of those lines. The Laravel implementation wrote a full test suite and took many turns to iterate and validate through testing. As a result its token cost was higher, but that feels worthwhile for the maintainability of the end solution.
There is surprisingly little out there in the way of official framework-specific skill files, MCPs or other AI harness tooling. There is real opportunity here for framework developers, as many people will be approaching new projects with an LLM-first approach.
The most interesting tooling story, though, was Clojure. My experience watching Claude edit Clojure was ... painful. The agent would get in such a twist balancing parentheses when editing code that it would eventually start writing Python scripts to do the editing for it.
After a small amount of research I came across clojure-mcp-light — a recommended MCP that encourages REPL-driven development and provides a small command-line utility that helps the agent balance parentheses during edits. What made it unique was a deterministic hook that runs after every edit.
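To make the idea of a deterministic post-edit check concrete, here is a minimal sketch of a delimiter balance checker. This is not how clojure-mcp-light is implemented. The `check_balance` function is hypothetical, written only to show the kind of check a hook could run after every edit, and it handles just strings, `;` comments and character literals.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def check_balance(source):
    """Return None if delimiters balance, otherwise a short error message."""
    stack = []  # (opening delimiter, line it was opened on)
    line = 1
    i = 0
    n = len(source)
    while i < n:
        ch = source[i]
        if ch == "\n":
            line += 1
        elif ch == ";":
            # Line comment: skip ahead to the newline, which is handled next.
            while i < n and source[i] != "\n":
                i += 1
            continue
        elif ch == "\\":
            # Character literal such as \( : skip the character after it.
            i += 1
        elif ch == '"':
            # String literal: skip to the closing quote, honouring escapes.
            i += 1
            while i < n and source[i] != '"':
                if source[i] == "\\":
                    i += 1
                if i < n and source[i] == "\n":
                    line += 1
                i += 1
            if i >= n:
                return "unterminated string"
        elif ch in OPENERS:
            stack.append((ch, line))
        elif ch in PAIRS:
            if not stack or stack[-1][0] != PAIRS[ch]:
                return f"line {line}: unexpected '{ch}'"
            stack.pop()
        i += 1
    if stack:
        opener, opened_at = stack[-1]
        return f"line {opened_at}: unclosed '{opener}'"
    return None
```

A hook built on something like this could reject an edit whenever the result is not `None`, so the agent gets the same feedback every time instead of reasoning about parentheses itself.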
The result surprised me: no marked token efficiency gain, but the agent took a fundamentally different approach to the whole project. It produced 2x the tests and fewer overall lines of code than the first attempt. The right tooling didn't just make Clojure cheaper — it changed how the agent thought about the problem.
Although AI is capable of making a solution in any language, you should consider longer-term maintainability. That comes from simplicity in the overall solution, reduced churn, and consistent use of patterns — code that a human can actually follow. Right now the bottleneck is humans understanding the system, so you really want the agent writing in a structure optimised for human consumption.
It is hard to move away from mature frameworks when you want to keep a system going for 10 years and have multiple people maintaining and extending it. Lines of code should not be the only consideration, and neither should token efficiency.
Right now the tooling is so new that there is no clear winner in out-of-the-box skills and MCPs. The addition of framework-specific AI tooling is a nice-to-have, but not yet a game changer.
I would encourage you to consider the following before committing to a framework in 2026: