Self-Learning Loop

Most AI tools forget every mistake.

You ask Claude to do something, it tries something that doesn't work on this particular WordPress host, you tell it the workaround, you get the task done. Tomorrow, on a different site, the same thing happens. The tool forgot. It has no memory of what it learned yesterday, because it has no memory at all.

This page is about the part of Paper Route that's designed to fix that.

The pipeline

Paper Route's self-learning loop has three stages.

Stage 1: Failure capture. When an AI tool calls one of Paper Route's MCP tools and the call fails in a way that looks like a real obstacle (a WordPress plugin returns an error, a file write hits a permission issue, a WP-CLI command isn't available on the host), the agent can log a LearningEvent. The event records what was attempted, on what kind of site, what failed, and what error came back. The mechanism is a report_issue MCP tool that the AI tool calls deliberately when it encounters something interesting. We made capture explicit rather than automatic so the agent has to decide that a failure is worth flagging, which keeps the noise floor low.
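To make the "explicit, not automatic" point concrete, here is a minimal sketch of what a report_issue handler could do with its input. The field names (siteId, attempted, status, and so on), the `le_` id format, and the `lookupContext` callback are hypothetical illustrations, not Paper Route's actual schema, and the MCP server wiring is left out on purpose.

```typescript
// Hypothetical input an agent passes when it calls report_issue.
interface ReportIssueInput {
  siteId: string;
  toolName: string;  // the Paper Route MCP tool whose call failed
  attempted: string; // short description of what the agent tried
  error: string;     // error text that came back from WordPress or the host
}

// Hypothetical detected-stack context for a site.
interface SiteContext {
  hostingPlatform: string | null;
  builder: string | null;
  seoPlugin: string | null;
}

interface LearningEvent extends ReportIssueInput {
  id: string;
  context: SiteContext;
  status: "pending_review" | "promoted" | "dismissed";
  createdAt: Date;
}

let nextId = 1;

// Builds a LearningEvent from an explicit report_issue call. Nothing in this
// sketch runs on failure by itself: the agent has to decide to call the tool.
function handleReportIssue(
  input: ReportIssueInput,
  lookupContext: (siteId: string) => SiteContext,
): LearningEvent {
  if (!input.toolName || !input.error) {
    throw new Error("report_issue needs at least a tool name and an error");
  }
  return {
    ...input,
    id: `le_${nextId++}`,
    context: lookupContext(input.siteId),
    status: "pending_review", // every event starts in the review queue
    createdAt: new Date(),
  };
}
```

Rejecting reports without a tool name or error is one cheap way to keep the queue from filling with empty flags; the real validation rules may differ.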

Each LearningEvent is tied to the site's detected hosting platform, page builder, and SEO plugin, so we know not just "this failed" but "this failed on Cloudways with Elementor and Yoast." Context matters. The same failure on different stacks can be different problems.
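One way that stack context could pay off during review is by grouping events so that identical errors on different stacks stay separate. This is a sketch under assumptions: the grouping key (tool, error, host, builder, SEO plugin) and the lowercase normalization are illustrative choices, not a documented Paper Route behavior.

```typescript
interface SiteContext {
  hostingPlatform: string | null;
  builder: string | null;
  seoPlugin: string | null;
}

interface EventSummary {
  toolName: string;
  error: string;
  context: SiteContext;
}

// Normalizes a context field so "Cloudways" and "cloudways " group together.
function norm(value: string | null): string {
  return value ? value.trim().toLowerCase() : "unknown";
}

// Groups events by tool, error, and detected stack, so the same error on two
// different stacks lands in two different buckets.
function groupByStack(events: EventSummary[]): Map<string, EventSummary[]> {
  const groups = new Map<string, EventSummary[]>();
  for (const e of events) {
    const key = [
      e.toolName,
      e.error,
      norm(e.context.hostingPlatform),
      norm(e.context.builder),
      norm(e.context.seoPlugin),
    ].join("|");
    const bucket = groups.get(key) ?? [];
    bucket.push(e);
    groups.set(key, bucket);
  }
  return groups;
}
```

Including the stack in the key is the point: an EACCES on Cloudways and an EACCES on WP Engine would be reviewed as two candidate patterns rather than one.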

Stage 2: Review. LearningEvents don't become knowledge entries automatically. They land in a review queue. A human (currently the Paper Route team) looks at the queue periodically, decides whether the failure represents a real pattern that future agents should know about, and either promotes the event to a curated knowledge entry or dismisses it.
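The review step is essentially a small state machine: an event is pending until a human either promotes or dismisses it, and it is decided exactly once. A minimal sketch, with hypothetical status names and fields:

```typescript
type ReviewStatus = "pending_review" | "promoted" | "dismissed";

interface QueuedEvent {
  id: string;
  status: ReviewStatus;
  reviewer?: string;
  dismissReason?: string;
}

type Decision =
  | { action: "promote"; reviewer: string }
  | { action: "dismiss"; reviewer: string; reason: string };

// The review queue is whatever hasn't been decided yet.
function pendingQueue(events: QueuedEvent[]): QueuedEvent[] {
  return events.filter((e) => e.status === "pending_review");
}

// Applies a human decision. An event can only be decided once.
function review(event: QueuedEvent, decision: Decision): QueuedEvent {
  if (event.status !== "pending_review") {
    throw new Error(`event ${event.id} was already ${event.status}`);
  }
  if (decision.action === "promote") {
    return { ...event, status: "promoted", reviewer: decision.reviewer };
  }
  return {
    ...event,
    status: "dismissed",
    reviewer: decision.reviewer,
    dismissReason: decision.reason,
  };
}
```

Recording a reason on dismissal is an assumption, but it would make it easier to see later whether dismissals cluster around "agent asked the wrong question."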

We chose human review over automatic promotion for two reasons. First, AI-generated failures aren't always meaningful. An agent can fail because it asked the wrong question, not because the host has a quirk. Auto-promoting every failure would fill the library with noise. Second, the curated tier is the trusted tier, and trust is harder to build than it is to break. We'd rather have 30 entries we stand behind than 300 we don't.

Stage 3: Promotion. Approved events become new KnowledgeEntries with sourceType: paper_route_curated, the same trust tier as the original entries we shipped with. They get tagged and indexed for full-text search (via a PostgreSQL tsvector), and from that point forward they're part of the library that AI tools see in their context blocks and search results. The next agent that hits the same situation gets the answer.
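A sketch of what promotion might look like, assuming the reviewer writes the entry's title and body and the tags are derived from the failing tool and the detected stack. Only sourceType: paper_route_curated comes from this page; the table name, column names, and tagging rule are hypothetical. `to_tsvector` is PostgreSQL's built-in function for producing a tsvector.

```typescript
interface StackContext {
  hostingPlatform: string | null;
  builder: string | null;
  seoPlugin: string | null;
}

interface ApprovedEvent {
  id: string;
  toolName: string;
  context: StackContext;
}

interface KnowledgeEntry {
  title: string;
  body: string;
  tags: string[];
  sourceType: "paper_route_curated"; // same trust tier as the shipped entries
  promotedFromEventId: string;
}

// Turns an approved event plus reviewer-written text into a curated entry.
// Tags come from the failing tool and the detected stack, lowercased.
function promote(event: ApprovedEvent, title: string, body: string): KnowledgeEntry {
  const stackTags = [event.context.hostingPlatform, event.context.builder, event.context.seoPlugin]
    .filter((t): t is string => t !== null)
    .map((t) => t.toLowerCase());
  return {
    title,
    body,
    tags: [event.toolName, ...stackTags],
    sourceType: "paper_route_curated",
    promotedFromEventId: event.id,
  };
}

// Hypothetical table and columns. Storing to_tsvector(...) of the title and
// body lets the entry match full-text search queries.
const INSERT_ENTRY_SQL = `
  INSERT INTO knowledge_entries (title, body, tags, source_type, search_vector)
  VALUES ($1, $2, $3, $4, to_tsvector('english', $1::text || ' ' || $2::text))`;
```

Keeping a pointer back to the originating event (promotedFromEventId here) would also make the re-occurrence measurement described later easier to compute.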

That's the loop. Failure to learning event to reviewed entry to curated knowledge to context for the next agent. Closed loop.

The honest version

We built this loop. We don't yet know how well it works in production.

The pieces are wired up. The data model exists. The MCP tool that captures failures works. The review queue is queryable. We've manually promoted entries through the pipeline to verify the shape of the data. What we haven't done is run it for long enough at sufficient volume to know whether the failures we capture are actually the failures worth learning from.

There are a few specific things that could go wrong, and we're watching for them.

Failures might cluster around the wrong things. Most agent failures might turn out to be the agent asking malformed questions, not the host having a quirk. If 90% of LearningEvents are noise, the review process becomes a chore that doesn't pay off, and we'll either stop reviewing or start lowering the bar to clear the backlog. Either failure mode kills the loop.

The captured context might be insufficient. A LearningEvent records what we know at the moment of failure: the tool name, the arguments, the error, the site context. But the most useful thing about a failure is usually the fix, and the fix isn't in the data model. It's in the conversation between the user and the agent, which Paper Route doesn't see. We may need to add a way for the user (or the agent) to attach a "here's what worked" note to a learning event. That isn't built yet.
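Since the "here's what worked" note doesn't exist yet, this is purely a hypothetical sketch of one shape it could take: an optional resolution attached once to an event, plus a queue ordering that puts events carrying a fix in front of reviewers first. None of these names exist in Paper Route today.

```typescript
// Hypothetical: not part of the current LearningEvent data model.
interface ResolutionNote {
  author: "user" | "agent";
  text: string;
  addedAt: Date;
}

interface EventWithFix {
  id: string;
  error: string;
  resolution?: ResolutionNote;
}

// Attaches a single "here's what worked" note to an event.
function attachResolution(event: EventWithFix, author: "user" | "agent", text: string): EventWithFix {
  const trimmed = text.trim();
  if (trimmed.length === 0) throw new Error("resolution note is empty");
  if (event.resolution) throw new Error(`event ${event.id} already has a resolution note`);
  return { ...event, resolution: { author, text: trimmed, addedAt: new Date() } };
}

// Puts events that carry a fix at the front of the review queue, keeping the
// original order otherwise (Array.prototype.sort is stable in modern JS).
function withFixesFirst(events: EventWithFix[]): EventWithFix[] {
  const hasFix = (e: EventWithFix) => (e.resolution ? 1 : 0);
  return [...events].sort((a, b) => hasFix(b) - hasFix(a));
}
```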

The volume might be too low to matter. If Paper Route is running on 10 sites and each one fails interestingly once a week, that's about 40 events a month, or roughly 500 a year. If one in ten of those is worth keeping, that's enough to author about 50 new entries. That's meaningful, but it's not a flywheel. The loop only starts to feel like a real flywheel when the volume is high enough that the system is genuinely getting better faster than a human team could write entries from scratch. We're not at that scale yet.
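The same back-of-the-envelope estimate, written out. The promotion rate p is left as a variable; 0.1 and 0.3 are the bounds of the "reasonable" range given under What we'll measure to know, and 0.5 is the case where half of all events are worth keeping. Counting 52 weeks gives slightly more than 12 months of 40 events.

```latex
\begin{aligned}
N_{\text{events}} &= \underbrace{10}_{\text{sites}} \times \underbrace{1}_{\text{events/site/week}} \times \underbrace{52}_{\text{weeks}} = 520 \text{ per year} \\
N_{\text{entries}} &= p \, N_{\text{events}}: \quad p = 0.1 \Rightarrow 52, \quad p = 0.3 \Rightarrow 156, \quad p = 0.5 \Rightarrow 260
\end{aligned}
```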

What we'll measure to know

A few signals we're going to track over the next few months to figure out whether the loop is working.

Rate of LearningEvents per active site per week. If it's too low, the system isn't capturing much, and the loop is dormant. If it's too high, we're capturing noise.
Promotion rate. What fraction of LearningEvents get promoted to KnowledgeEntries during review? A reasonable rate is somewhere between 10% and 30%. If it's much lower, our capture criteria are too loose. If it's much higher, we're being too generous in review.
Re-occurrence rate after promotion. When we promote an entry, the same kind of failure should stop happening on similar sites. If we promote an entry about WP Engine PHP file writes and we keep seeing the same failure show up next week, the entry isn't reaching the agent at the right time. That's a delivery problem, not a knowledge problem, and it tells us to look at how the context block is built or how the agent is searching.
Cross-site generalization. When an entry is promoted from a failure on Site A, does it actually help on Site B? This is the hardest signal to measure but the most important. The whole point of curating knowledge once and using it many times is generalization.
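The four signals above reduce to a few simple computations over event records. This sketch assumes a hypothetical record shape with a week index and a failureKey (for example, tool plus error plus stack). Two choices are ours rather than the page's: promotion rate is computed over reviewed events only, so an unreviewed backlog doesn't drag it down, and cross-site generalization is approximated by "other sites that still hit the failure after promotion," which is a weak proxy at best.

```typescript
type Status = "pending_review" | "promoted" | "dismissed";

interface EventRecord {
  siteId: string;
  week: number;       // week index since tracking started
  status: Status;
  failureKey: string; // e.g. tool + error + stack
}

// Signal 1: LearningEvents per active site per week.
function eventsPerSitePerWeek(events: EventRecord[], activeSites: number, weeks: number): number {
  if (activeSites <= 0 || weeks <= 0) return 0;
  return events.length / (activeSites * weeks);
}

// Signal 2: share of reviewed events that were promoted (pending ones excluded).
function promotionRate(events: EventRecord[]): number {
  const reviewed = events.filter((e) => e.status !== "pending_review");
  if (reviewed.length === 0) return 0;
  return reviewed.filter((e) => e.status === "promoted").length / reviewed.length;
}

// Signal 3: events with the same failure key after the entry went live.
function recurrencesAfterPromotion(events: EventRecord[], failureKey: string, promotedWeek: number): number {
  return events.filter((e) => e.failureKey === failureKey && e.week > promotedWeek).length;
}

// Signal 4 (rough proxy): other sites that still hit the failure after
// promotion. An empty result is only weak evidence the entry generalized;
// those sites may simply never have hit that code path.
function otherSitesStillFailing(
  events: EventRecord[],
  failureKey: string,
  promotedWeek: number,
  originSiteId: string,
): string[] {
  const sites = new Set<string>();
  for (const e of events) {
    if (e.failureKey === failureKey && e.week > promotedWeek && e.siteId !== originSiteId) {
      sites.add(e.siteId);
    }
  }
  return Array.from(sites).sort();
}
```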

We don't have these dashboards built yet. They are on the list.

What gets captured and what doesn't

A note on data, because it matters.

LearningEvents capture the structured outcome of MCP tool calls. The tool name, the arguments at a structural level (which site, which file path, which kind of operation), the error response from WordPress, the timestamp, and the site context. They do not capture the user's prompt to the AI tool, the AI tool's reasoning, or the conversation around the failure. Paper Route is the proxy between the agent and WordPress. We see what passes through us. The agent's internal state stays with the agent.
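Here is a sketch of what "arguments at a structural level" could mean in practice: keep the keys that say which site, which path, and which kind of operation, and drop everything else, such as file contents or post bodies. The allowlist of key names is an assumption for illustration; the real set depends on each tool's arguments.

```typescript
// Hypothetical raw tool call as it passes through the proxy.
interface RawToolCall {
  toolName: string;
  args: Record<string, unknown>;
}

// Assumed allowlist of structural keys. Anything not listed is dropped.
const STRUCTURAL_KEYS = new Set(["siteId", "path", "operation", "postType"]);

// Returns only the structural, string-valued arguments of a tool call.
function structuralArgs(call: RawToolCall): Record<string, string> {
  const kept: Record<string, string> = {};
  for (const [key, value] of Object.entries(call.args)) {
    if (STRUCTURAL_KEYS.has(key) && typeof value === "string") {
      kept[key] = value;
    }
  }
  return kept;
}
```

An allowlist rather than a denylist means a new argument is excluded by default until someone decides it is structural, which fits the privacy reasoning that follows.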

This is partly a privacy choice and partly an architectural one. We don't want to be in the position of holding conversational data on behalf of AI tools we don't control. The structured outcome data is enough to learn from in most cases, and the conversational data is already the responsibility of the agent's host provider (Anthropic, OpenAI, or whoever), not us.

If you connect Paper Route to a site and the AI tool's actions trigger learning events, the events live in Paper Route's database, scoped to your account. We don't aggregate them across accounts for now, and if we ever do, that will be opt-in and clearly disclosed.

Why this matters

The strategic argument for building the loop, stated plainly:

The tool layer is going to get commoditized. WordPress Core is standardizing on the Abilities API as how plugins declare capabilities, the official MCP adapter translates abilities into MCP tools automatically, and within the next year or so any plugin author who wants to be MCP-accessible will be. Building MCP tools for WordPress is going to be table stakes, not a differentiator.

What doesn't get commoditized is operational context. Knowing which version of Cloudways changed the file ownership defaults, which managed host silently rejects ed25519 SSH keys, which page builder stores its content in a way that a naive AI write will mangle. That kind of knowledge has to be accumulated by someone who's been watching real sites fail for a long time. It can't be derived from a plugin's manifest.

The self-learning loop is the bet that this kind of knowledge can be captured systematically as it's encountered, not just authored upfront. If the bet works, Paper Route's library grows faster than any team could grow it manually, and the value of the library compounds with every connected site. If the bet doesn't work, the loop quietly stops being useful and we're back to manual curation, which is also fine, just slower.

Either way, we'd rather find out.