Context Window Fetching & Token Counting Fix for Kilo Code and LM Studio
Tools like Roo-Code and Kilo Code have become increasingly interesting for developers who want to leverage locally hosted AI models. Such extensions generally depend on an OpenAI-compatible API that provides essential model metadata, such as the contextWindow property. LM Studio, a popular choice for hosting local models, provides a compatible API, but currently omits the contextWindow property, leading to compatibility issues with these extensions.
The Challenge
Extensions like Roo-Code expect specific information from the /v1/models endpoint, particularly the contextWindow attribute, which defines the maximum token length supported by a model. LM Studio’s current OpenAI-compatible API doesn’t return this field, causing issues such as inaccurate token counting or outright errors in these tools.
This problem has been actively discussed by the community, for instance in:
- A pull request on Roo-Code addressing contextWindow handling: RooCodeInc/Roo-Code PR #3372
- An alternative fork exploring related compatibility fixes: Jbbrack03’s Roo-Code fork
A Solution: A Simple Python Proxy
To address this issue temporarily until Roo-Code and Kilo Code integrate permanent fixes, I’ve developed a lightweight Python script using Flask, available here:
https://github.com/vtietz/lmstudio_proxy
The proxy is straightforward:
- It forwards all incoming requests transparently to LM Studio.
- It automatically injects a sensible default contextWindow value (the model’s max_tokens, or 8192 tokens if that is unavailable) into responses from the /v1/models and /v1/models/{id} endpoints.
- It provides detailed logging to simplify debugging.
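The core of such a proxy can be sketched in a few lines of Flask. This is a simplified sketch of the approach, not the actual code from the repository; the port numbers, helper names, and the 8192 fallback mirror the description above and are otherwise assumptions:

```python
from flask import Flask, Response, request
import requests

LMSTUDIO_URL = "http://localhost:1234"  # assumed LM Studio server address
DEFAULT_CONTEXT = 8192                  # fallback when max_tokens is absent

app = Flask(__name__)

def inject_context_window(model: dict) -> dict:
    # Prefer the model's own max_tokens; otherwise use the default.
    # Leaves an existing contextWindow untouched.
    model.setdefault("contextWindow", model.get("max_tokens", DEFAULT_CONTEXT))
    return model

@app.route("/v1/models")
def list_models():
    # Fetch the model list from LM Studio and patch each entry.
    upstream = requests.get(f"{LMSTUDIO_URL}/v1/models").json()
    upstream["data"] = [inject_context_window(m) for m in upstream.get("data", [])]
    return upstream

@app.route("/<path:path>", methods=["GET", "POST"])
def passthrough(path):
    # Forward everything else (e.g. /v1/chat/completions) unchanged.
    resp = requests.request(
        method=request.method,
        url=f"{LMSTUDIO_URL}/{path}",
        headers={k: v for k, v in request.headers if k.lower() != "host"},
        data=request.get_data(),
        stream=True,
    )
    return Response(resp.iter_content(chunk_size=8192),
                    status=resp.status_code,
                    content_type=resp.headers.get("Content-Type"))

# Run with e.g.: flask --app proxy run --port 5000
```

Pointing the extension at the proxy’s address instead of LM Studio’s then yields model metadata that includes contextWindow.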
The proxy works by intercepting API calls from tools like Roo-Code, forwarding them directly to LM Studio’s API, and adding the missing information on the fly to maintain compatibility. It doesn’t handle streaming progress indicators, since LM Studio’s API currently does not provide this data.
The solution provided here is designed as a temporary compatibility layer until the respective extensions implement dedicated fixes.