sqlite-utils 4.0rc2: Claude Fable Discovers Critical Bugs

Simon Willison asked Claude Fable to review the code before releasing sqlite-utils 4.0rc2. The AI found multiple critical issues, including a data-loss bug in delete_where(), and the fixes cost about $149.

5 min read Reviewed & edited by the SINGULISM Editorial Team

Simon Willison, developer of the Python SQLite wrapper library “sqlite-utils,” announced the second release candidate (4.0rc2) of the library’s major version 4.0 on July 5. Notably, most of the code review and bug fixing for this release candidate was handled by “Claude Fable,” an AI model provided by Anthropic. The changes, spanning 37 prompts, 34 commits, 1,321 added lines, and 190 deleted lines, cost about $149.25.

Willison had recently released 4.0rc1 and decided to run a final check before the stable release. With the Claude Fable access on his Max subscription nearly used up and about to become unavailable, he requested a code review from Claude Code (web version) on his iPhone with the following prompt:

“Final review before shipping stable 4.0. It’s extremely important to catch any issues at the last moment that would become breaking changes if fixed later.”

The AI immediately generated a comprehensive report. It listed five critical bugs that Willison himself had overlooked, each classified as a release blocker.

Data-Loss Bug in delete_where()

The most severe issue was a transaction handling problem in the Table.delete_where() method.

delete_where() calls self.db.execute() directly, without the atomic() wrapper that the delete() method uses. As a result, the connection is left with in_transaction=True after the call, so every subsequent atomic() call takes its savepoint branch and never commits.

The reproduction code published by Willison is as follows:

import sqlite_utils

db = sqlite_utils.Database("dw.db")
db["t"].insert_all([{"id": i} for i in range(3)], pk="id")
db["t"].delete_where("id = ?", [0])
# db.conn.in_transaction is now True: the DELETE was never committed
db["t"].insert({"id": 50})
db["u"].insert({"a": 1})
db.close()
# Upon reopening, rows are [0, 1, 2]: the delete, row 50, and table u are all lost

This bug discarded not only the deletion itself but also the subsequent inserts and even the creation of another table, a serious defect with a real risk of data loss.
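The mechanism can be reproduced without sqlite-utils. The following is a minimal sketch using only Python's standard sqlite3 module in its default (legacy) transaction mode. The explicit SAVEPOINT/RELEASE pair stands in for the savepoint branch of atomic() described above; that is an assumption about the shape of the code, not a copy of the library's implementation, and the function name demo is purely illustrative.

```python
import os
import sqlite3
import tempfile


def demo(path):
    """Illustrative only: shows how an uncommitted DELETE swallows later writes."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(3)])
    conn.commit()  # setup rows 0, 1, 2 are safely on disk

    # A raw DELETE makes sqlite3 open an implicit transaction; nothing commits it.
    conn.execute("DELETE FROM t WHERE id = ?", (0,))
    in_tx_after_delete = conn.in_transaction

    # Stand-in for a later "atomic" write that sees an open transaction
    # and therefore uses a savepoint instead of BEGIN/COMMIT.
    conn.execute("SAVEPOINT sp")
    conn.execute("INSERT INTO t (id) VALUES (?)", (50,))
    conn.execute("RELEASE sp")  # releases the savepoint, does not commit
    in_tx_after_insert = conn.in_transaction

    conn.close()  # the still-open outer transaction is rolled back

    reopened = sqlite3.connect(path)
    rows = [r[0] for r in reopened.execute("SELECT id FROM t ORDER BY id")]
    reopened.close()
    return in_tx_after_delete, in_tx_after_insert, rows


with tempfile.TemporaryDirectory() as tmp:
    in_tx_after_delete, in_tx_after_insert, rows = demo(os.path.join(tmp, "dw.db"))

print(in_tx_after_delete, in_tx_after_insert, rows)
# prints: True True [0, 1, 2]
```

Releasing a savepoint inside an open transaction only folds its changes into that outer transaction, so when nothing ever commits the outer transaction, everything after the raw DELETE disappears when the connection closes.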

Willison remarked, “This is a really nasty bug. Glad we caught it before shipping.” However, he noted that even if it had shipped in 4.0.0, it would have been a fixable bug, not a design flaw requiring 5.0 or a breaking change.

Collaboration Process with the AI Agent

After discovering the bugs, Willison iteratively worked with the AI to fix them. The entire process involved 37 prompts, 34 commits, and modifications to 30 files.

Notably, Willison did most of this work while on the go. While the AI agent spent 10–15 minutes on each new task, he watched the Half Moon Bay Independence Day Parade and sent the next prompt from his iPhone as needed.

“The weird thing about code agents is that the harder the task, the more room you have to do other things simultaneously. While the agent works away silently, you can engage in other activities.”

For the final review, he switched to a laptop and used GitHub’s pull request interface.

Transaction Handling: The Biggest Change

The most significant change in 4.0rc2 concerns transaction handling, a flagship feature introduced in 4.0rc1 that was overhauled in response to the AI’s findings.

The new release candidate includes detailed documentation of the transaction model, clearly defining transaction behavior across all methods of the library. Willison stated, “I place great importance on semantic versioning and want to minimize incompatible major versions,” and this strict versioning policy is why he commissioned the AI review.
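The article does not show the revised code. As a rough sketch of the general pattern, assume a helper that commits when it opened the transaction itself and falls back to a savepoint when one is already open; a delete routed through such a helper leaves no transaction behind. The atomic() and delete_where() functions below are hypothetical stand-ins written against the standard sqlite3 module, not sqlite-utils' actual implementation.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def atomic(conn):
    """Hypothetical helper: commit if we opened the transaction, else use a savepoint."""
    if conn.in_transaction:
        # Nested call: the outermost caller is responsible for committing.
        conn.execute("SAVEPOINT atomic_sp")
        try:
            yield
        except BaseException:
            conn.execute("ROLLBACK TO atomic_sp")
            conn.execute("RELEASE atomic_sp")
            raise
        else:
            conn.execute("RELEASE atomic_sp")
    else:
        # Outermost call: open the transaction and make sure it is closed again.
        conn.execute("BEGIN")
        try:
            yield
        except BaseException:
            conn.rollback()
            raise
        else:
            conn.commit()


def delete_where(conn, table, where, params=()):
    """Hypothetical wrapper: the DELETE runs inside atomic(), so it gets committed."""
    # The table name is interpolated for brevity; never do this with untrusted input.
    with atomic(conn):
        conn.execute(f"DELETE FROM {table} WHERE {where}", params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
with atomic(conn):
    conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(3)])

delete_where(conn, "t", "id = ?", (0,))
in_tx_after_delete = conn.in_transaction
remaining = [r[0] for r in conn.execute("SELECT id FROM t ORDER BY id")]
print(in_tx_after_delete, remaining)
# prints: False [1, 2]
```

The point of the design is that no code path writes to the connection outside the helper, so the connection is never left with a transaction that nobody owns.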

Practical Cost of AI Code Review

The cost of this case is noteworthy. For approximately $149.25 (roughly ¥20,000), multiple issues, including a critical data-loss bug, were discovered and fixed. A human developer performing a review of equivalent quality would likely need several hours to half a day, so measured against hourly labor costs the figure is low.

However, Willison explicitly noted, “The AI’s work was limited to code review and fixes; I handled design decisions and architectural choices.” The AI functioned as a powerful reviewer and implementation partner, not a substitute for human judgment.

Editorial Opinion

AI agent-driven code review and automated fixes have the potential to fundamentally transform quality assurance in software development. In this case, it is significant that the AI discovered a data-loss bug that the developer himself had overlooked. In the coming months, we may see a surge in projects integrating AI code review into their CI/CD pipelines. For resource-limited efforts such as open-source projects in particular, low-cost quality assurance of this kind could be a great boon.

This fits the broader trend toward AI-automated development processes, as in the previously reported case of using n8n and Claude to investigate Jenkins failure logs automatically and send Slack notifications. In the medium to long term, systems in which AI agents autonomously carry code review through to fixes may become established. Just as Willison directed the AI from his iPhone during a parade, developers will be able to delegate repetitive tasks to AI and focus on more creative decisions.

Frequently Asked Questions

What is sqlite-utils?
It is a utility library for working with SQLite databases in Python. It provides functions for creating tables, inserting, updating, and deleting data, and exporting to JSON. It is an open-source project developed and maintained by Simon Willison.

What kind of AI model is Claude Fable?
It is one of the “Claude” large language models provided by Anthropic. It is said to be particularly capable at code generation, review, and debugging. Available with a Max subscription, it is described as offering a long context window and strong reasoning performance.

How much does AI code review cost?
In this case, about $149.25 (roughly ¥20,000). This covers 37 prompts and the associated response generation. A human senior engineer would need several hours to half a day to deliver a review of the same quality, making the AI significantly more cost-effective.

Source: Simon Willison's Weblog
