WordPress 7.0 introduces a centralized way to connect AI providers, which is a useful step. I can plug in services like OpenAI, Claude, and Gemini from one place and use them across AI features in the admin.
But what about local AI?
I’ve been experimenting with models that run on my own hardware, and I wanted that same capability inside WordPress. If I can run inference on my machine, I should be able to connect that to my site just like any hosted provider. So I built a plugin to do exactly that.
Why I care about local AI in WordPress
Hosted AI APIs are convenient, but they come with tradeoffs.
Policies change. Access changes. Terms change. Sometimes models or use cases get restricted in ways that are outside my control. Even when a provider is generous today, there’s no guarantee that remains true tomorrow.
If we care about democratizing publishing, I think we also need to care about democratizing compute. That means being able to choose where inference runs, what models I use, and how much I depend on frontier labs for routine publishing workflows.
What WordPress 7.0 gives us
The new AI connector system in WordPress 7.0 creates a central hub for model connections. That’s what I’m building on.
Out of the box, it’s aimed at hosted frontier providers. That makes sense, and it will be enough for a lot of people. But WordPress has always worked well when core provides a foundation and plugins extend it.
So instead of waiting for local inference support to land in Core, I made my own connector.
The plugin: connecting local inference to WordPress
I built a plugin called MW Local AI Connector. Once installed, it adds a Local AI connector alongside the standard provider connections.
The basic idea is simple:
- run a local inference server such as Ollama or LM Studio
- expose it through a stable endpoint
- connect that endpoint to WordPress through the new connector UI
That proxy can sit in front of:
- Ollama
- LM Studio
- MLX Studio
- Apple’s on-device model APIs (try this script)
- other compatible endpoints
That way I’m not rewiring WordPress every time I switch local runtimes.
Mixing local and hosted models
One thing I wanted to preserve is flexibility.
This is not really an argument for “local only.” Different tasks have different requirements. Some jobs are small and private and fit well on local hardware. Others may need longer context windows, more reliability, or just a model that performs better for a specific task.
In the WordPress AI UI, I can choose models per feature. In the demo, I configured one task to use a local Gemma model for excerpt generation, and another to use a hosted model through a proxy for summarization.
For example:
- local models can be a good fit when I want more control or fewer refusal constraints
- hosted models can still be useful when I want stronger general performance or larger context
- a proxy layer can make both look consistent from WordPress’s point of view
I don’t think this has to be either/or.
What the demo actually shows
In the walkthrough, I install the plugin, configure the Local AI connector, and point it at my machine.
Then I test it inside WordPress’s AI features by generating an excerpt and a summary. One request runs against a model on my own computer, and I verify that by showing the GPU load spike while the request is being processed. Another request goes through a different routed model, which shows how this setup can act as a clearinghouse across local and remote inference.
There was also a small debugging moment during the demo: I had an older plugin version active from an earlier run-through. Once I fixed that, the flow worked as expected. I left that in because this kind of setup is still real software, not magic. Version mismatches and endpoint issues are part of the territory.
My goal here wasn’t to replace hosted AI outright. It was to make WordPress 7.0’s new AI foundation more open to local-first workflows.
Closing
If I have a capable Mac or GPU available, I want to be able to use it. And if WordPress gives me a connector architecture, I want local inference to be a first-class option within that architecture.
Democratize publishing. Democratize compute.