Knowledge System

Paper Route ships with a library of WordPress operational context: hosting restrictions on the major managed hosts, page builder quirks, deployment workarounds, and SEO plugin field references. It is the kind of thing that an AI tool would otherwise have to learn the hard way on every new site.

This page explains what's in the library, how it gets delivered to the AI tool you're connecting from, and the design choices we made along the way.

What's in the library

As of this writing, the library has 56 entries. They fall into a few buckets:

Hosting restrictions. What WP Engine blocks at the OS level, what Cloudways does with file ownership in PHP-FPM, the SSH key requirements for Flywheel and Kinsta, the WP-CLI Gateway on managed hosts, and the workarounds that actually work on each one.
Page builder content patterns. How WPBakery serializes its layout in shortcodes and where the editable text actually lives. How Elementor stores content in postmeta. How Gutenberg blocks parse and which ones round-trip cleanly through edits.
Deployment patterns. Things like the SSH stdin pipe trick for writing PHP files to hosts that block direct file creation. The perl-with-hash-delimiters pattern for remote file editing through SSH. How to discover the WordPress table prefix instead of assuming wp_.
SEO plugin fields. Yoast meta field names. RankMath equivalents. Where the canonical URL lives. Which fields are stored as postmeta and which are stored in custom tables.

Every entry is tagged by domain (hosting, page_builder, seo, deployment, performance, and so on) and where applicable by platform (wpengine, cloudways, kinsta, flywheel) and builder (wpbakery, elementor, gutenberg). The tags drive both search filtering and automatic context delivery.
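To make the tagging concrete, here is a minimal sketch of how a tagged entry could be matched against a site's stack. The tag values (hosting, wpengine, wpbakery) come from this page. The type names, field names, and the rule that an untagged dimension matches every site are assumptions for illustration, not the actual schema.

```typescript
// Hypothetical entry shape; field names are illustrative, not Paper Route's schema.
interface KnowledgeEntry {
  id: string;
  title: string;
  domain: string;    // e.g. "hosting", "page_builder", "seo", "deployment"
  platform?: string; // e.g. "wpengine"; omitted when the entry is host-agnostic
  builder?: string;  // e.g. "wpbakery"; omitted when the entry is builder-agnostic
}

// Hypothetical description of what a connected site runs.
interface SiteStack {
  platform?: string;
  builder?: string;
}

// Assumption: an entry applies when every tag it carries matches the site.
// A missing tag is treated as "applies everywhere".
function appliesToSite(entry: KnowledgeEntry, site: SiteStack): boolean {
  const platformOk = entry.platform === undefined || entry.platform === site.platform;
  const builderOk = entry.builder === undefined || entry.builder === site.builder;
  return platformOk && builderOk;
}

// Optional domain filter on top of the site match, as a search filter might use.
function filterEntries(entries: KnowledgeEntry[], site: SiteStack, domain?: string): KnowledgeEntry[] {
  return entries.filter(
    (entry) => appliesToSite(entry, site) && (domain === undefined || entry.domain === domain),
  );
}
```

The same predicate can serve both uses named above: search filtering passes an explicit domain, while automatic context delivery only needs the site match.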

Two ways the AI tool gets knowledge

The system has two delivery mechanisms, and they serve different purposes.

Context block on session start. When an AI tool connects to a Paper Route site through MCP, the initialize handshake includes a small block of knowledge relevant to that specific site. Paper Route already knows what host the site is on, what page builder it uses, and what SEO plugin is installed, because the WordPress plugin reports that context back the first time it connects. The initialize block contains the highest-priority entries for that stack, capped at around 2000 characters so it doesn't crowd out the rest of the conversation. The AI tool sees it before it touches anything.

This is the part most users never notice. They just observe that their AI tool seems to know what to do on each site, and they assume it's clever. It is clever, but the cleverness is borrowed from the context block.
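A rough sketch of how a size-capped context block could be assembled. The 2000 character cap comes from this page, and the 15 entry limit is the per-site slot count mentioned later on it. The entry shape, the numeric priority field, and the choice to skip an entry that does not fit rather than stop are assumptions, not a description of the shipped code.

```typescript
// Hypothetical shape for an entry that has already been matched to the site's stack.
interface RankedEntry {
  title: string;
  summary: string;
  priority: number; // assumption: a higher number means more important
}

const MAX_CHARS = 2000;  // "around 2000 characters"; the exact limit is assumed
const MAX_ENTRIES = 15;  // per-site slot count

function buildContextBlock(
  entries: RankedEntry[],
  maxChars: number = MAX_CHARS,
  maxEntries: number = MAX_ENTRIES,
): string {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...entries].sort((a, b) => b.priority - a.priority);
  const lines: string[] = [];
  let used = 0;

  for (const entry of sorted) {
    if (lines.length >= maxEntries) break;
    const line = `${entry.title}: ${entry.summary}`;
    // Account for the newline that joins this line to the previous one.
    const cost = line.length + (lines.length > 0 ? 1 : 0);
    // Assumption: skip an entry that does not fit and keep trying smaller ones.
    if (used + cost > maxChars) continue;
    lines.push(line);
    used += cost;
  }
  return lines.join("\n");
}
```

Skipping oversized entries instead of stopping at the first one that does not fit keeps the block as full as possible, at the cost of strict priority order.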

Search on demand. When an AI tool needs to know more than what the context block provides, it can search the library directly. Paper Route exposes a search_knowledge MCP tool that takes a query and optional filters and returns ranked entry summaries. A second tool, get_knowledge, fetches the full entry by ID after the search. This is the standard search and retrieve pattern, and we built it because the context block is necessarily small and the agent will sometimes need to dig deeper.

A third tool, list_knowledge_topics, gives the AI tool a high-level view of what categories of knowledge exist for the connected site. Most agents don't need this, but it's there.
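For illustration, the search-then-retrieve sequence could look like the two requests sketched below. The tools/call method and the name/arguments envelope follow the MCP specification, and the tool names are the ones described above. The argument names (query, platform, id) and the placeholder entry ID are assumptions, since this page does not document the tools' input schemas.

```typescript
// JSON-RPC envelope for an MCP tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Build a tools/call request; each request gets a fresh JSON-RPC id.
function toolCall(toolName: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Step 1: search with a query and an optional filter (argument names are assumed).
const searchRequest = toolCall("search_knowledge", {
  query: "write PHP file over SSH",
  platform: "wpengine",
});

// Step 2: fetch the full entry using an ID taken from the search results.
// "entry-id-from-search" is a placeholder, not a real entry ID.
const fetchRequest = toolCall("get_knowledge", { id: "entry-id-from-search" });
```

Keeping the search results to summaries and fetching the full body only on request means the agent pays for long entries only when it decides one is relevant.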

Two sources, ranked by trust

The library has two kinds of entries.

Curated entries are written and verified by the Paper Route team. They cover the operational gotchas we've learned the hard way running an agency on managed WordPress hosting for the last decade. They are also the entries that fill the gap nobody else has filled. There is currently no other structured, agent-consumable source of "WP Engine blocks PHP file writes through PHP-FPM but not through SSH" anywhere in the public AI tooling ecosystem. We checked.

Community entries are mirrored from the official WordPress/agent-skills repository, maintained by the WordPress community. We picked three of the dev-focused skills (wp-wpcli-and-ops, wp-performance, wp-rest-api) and imported them as standalone entries alongside the curated content. These cover general WordPress development patterns that the community already does well, and there was no point rewriting them ourselves.

The two sources are weighted differently in search results. Curated entries get a 1.2x boost on the relevance score, so when a curated and a community entry match a query about equally well, the curated one surfaces first. The reasoning is that curated entries are typically more host-specific and operationally critical, and community entries are typically more general background. Both still appear; the order just reflects which kind of answer is more likely to matter.
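The weighting described above can be sketched in a few lines. The 1.2 multiplier is the figure from this page. The Hit shape and the assumption that the boost is a plain multiplication applied before sorting are illustrative.

```typescript
type Source = "curated" | "community";

// Hypothetical search hit; relevance stands in for the raw full-text score.
interface Hit {
  id: string;
  source: Source;
  relevance: number;
}

const CURATED_BOOST = 1.2;

// Multiply curated scores by the boost; community scores pass through unchanged.
function boostedScore(hit: Hit): number {
  return hit.relevance * (hit.source === "curated" ? CURATED_BOOST : 1);
}

// Return a new array ordered by boosted score, highest first.
function rankHits(hits: Hit[]): Hit[] {
  return [...hits].sort((a, b) => boostedScore(b) - boostedScore(a));
}
```

With these numbers a curated hit at 0.50 (boosted to 0.60) outranks a community hit at 0.55, while a community hit at 0.70 still comes first. The boost breaks near-ties; it does not bury a clearly better community match.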

When the upstream WordPress repository updates a skill, Paper Route's import script picks up the change on the next sync, recomputes the content checksum, and updates the entry in place. We mirror, we don't fork. The community owns the content; we own the delivery.
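A minimal sketch of the checksum comparison behind that sync step. The page only says the checksum is recomputed and the entry updated in place, so the use of SHA-256, the entry shape, and the function names are all assumptions.

```typescript
import { createHash } from "node:crypto";

// Hypothetical mirrored entry; only the fields needed for the comparison.
interface MirroredEntry {
  id: string;
  content: string;
  checksum: string;
}

// Assumption: SHA-256 over the UTF-8 content. The real algorithm is not documented here.
function contentChecksum(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Compare the upstream content against the stored checksum.
// When it differs, return an updated copy that keeps the same id ("in place").
function syncEntry(
  existing: MirroredEntry,
  upstreamContent: string,
): { entry: MirroredEntry; changed: boolean } {
  const checksum = contentChecksum(upstreamContent);
  if (checksum === existing.checksum) {
    return { entry: existing, changed: false };
  }
  return { entry: { ...existing, content: upstreamContent, checksum }, changed: true };
}
```

Comparing checksums rather than raw content lets a sync skip unchanged skills cheaply, and keeping the id stable means existing references to the entry survive an update.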

What we chose not to build

A few things we deliberately did not build.

Vector search. The obvious modern choice for a knowledge system is to embed each entry as a vector and use semantic similarity for retrieval. We're using Postgres tsvector full-text search instead. The reason is that at 56 entries, tsvector is fine. We tested it against 8 sample queries during the import and 6 of the 8 returned the right results on the first try. The two that missed were vocabulary mismatches that pgvector would solve, but those misses don't justify the operational cost of a separate vector index, embedding API calls on every search, and the added complexity. When the library gets to a few hundred entries, or when search recall actually starts hurting, we'll add pgvector as a hybrid layer alongside tsvector. Until then, the simple thing is the right thing.
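For readers unfamiliar with tsvector search, the query could be built roughly as follows. websearch_to_tsquery and ts_rank are standard Postgres functions, and the text/values object is the parameterized query shape that node-postgres accepts. The table name knowledge_entries, the column names, and the english text search configuration are assumptions, not the actual schema.

```typescript
// Parameterized query in the { text, values } shape used by node-postgres.
interface SqlQuery {
  text: string;
  values: unknown[];
}

// Hypothetical filters mirroring the domain and platform tags.
interface SearchFilters {
  domain?: string;
  platform?: string;
}

function buildSearchQuery(query: string, filters: SearchFilters = {}, limit: number = 10): SqlQuery {
  const values: unknown[] = [query];
  // search_vector is an assumed tsvector column on the entries table.
  const where: string[] = ["search_vector @@ websearch_to_tsquery('english', $1)"];

  if (filters.domain !== undefined) {
    values.push(filters.domain);
    where.push(`domain = $${values.length}`);
  }
  if (filters.platform !== undefined) {
    values.push(filters.platform);
    where.push(`platform = $${values.length}`);
  }
  values.push(limit);

  const text = [
    "SELECT id, title, ts_rank(search_vector, websearch_to_tsquery('english', $1)) AS rank",
    "FROM knowledge_entries",
    `WHERE ${where.join(" AND ")}`,
    "ORDER BY rank DESC",
    `LIMIT $${values.length}`,
  ].join("\n");

  return { text, values };
}
```

Everything here runs inside the existing Postgres instance, which is the operational argument made above: no separate index and no embedding API call on each search.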

Hierarchical entries. Early in the design we considered a parent and child relationship for entries, so a "skill" could be one parent entry with multiple child reference entries. We walked away from it. The flat model is easier to reason about, the search ranking already differentiates between general and specific entries through ts_rank scoring, and a parent and child structure would have created subtle bugs in the context block delivery: one skill tree could eat all 15 slots in the per-site context block. At 56 entries, hierarchy is a solution to a problem we don't have.

A general-purpose RAG framework. We're not using LangChain, LlamaIndex, or any other RAG framework. The whole knowledge system is a few hundred lines of TypeScript, a Postgres table, and four MCP tools. Frameworks make sense when you have a complex retrieval pipeline and a team that needs to standardize on patterns. We have a small team, a focused use case, and code we can fit in our heads.

What the system gets right and what's still open

Things we think the system gets right:

The two-tier delivery mechanism. Context block for the things the AI tool will always need, search for the things it might. Most users never trigger a search, because the context block already had what they needed.
Site-scoped context. The AI tool only sees knowledge relevant to the site it's connected to. A WP Engine site doesn't get Cloudways quirks dumped into its context.
Source provenance. Every entry has a source field. Curated entries are clearly distinguished from community entries in search results and in the data model.

Things still open:

Whether agents actually call search_knowledge proactively when they should. Our early test on a real site showed Claude resolving a task without ever searching, because Cloudways is well-known enough that Claude already had the answer from its training data. The knowledge system added zero value on that specific request. Whether the system is consistently useful depends on whether agents reach for it on the harder tasks where they don't know the answer, and we don't have enough real usage data yet to know.
Vocabulary mismatch between user queries and entry content. Full-text search on a tsvector matches on word stems, so it doesn't bridge "REST API 401 nonce error" to "cookie nonce authentication" the way an embedding model would. We're tracking this as a signal for when to add pgvector.

Where the system goes next

The library is the foundation. The interesting layer on top of it is the self-learning loop, which is how new entries get added when the system encounters something it doesn't know. That's a separate page.

The self-learning loop