How to run free LLM agent on VSCode?


Unleash the power of LLM agent on Visual Studio Code for personal use

As LLM agents have advanced, they have become an essential tool for developers: they help debug and document code, and assist in designing, building, and reviewing data solutions.

Visual Studio Code (VSCode) is a code editor that has gained popularity among developers for its simplicity and ease of use. It integrates smoothly with GitHub Copilot, which is widely used by enterprise developers. However, the free tier caps the number of requests you can make to the LLM agents, and individual developers may not want to commit to a paid subscription, or may simply wish to consider alternatives.

In this article, we will explore how to run free LLM agents in VSCode.

What are the available cloud LLM gateways for VSCode?

Multiple cloud LLM gateway platforms are available; OpenRouter is one example used later in this article.

They all provide access to multiple LLM models, with some free tier options. VSCode developers can integrate any of them into their environment using the Continue extension, published by continue.dev.

What about self-hosting an open-source LLM locally?

Self-hosting an LLM locally is a great option for developers who want to run their own LLM agents or who are concerned about data privacy, since no data is sent to third parties. However, the most powerful models require significant computational resources and may not be suitable for everyone. In addition to a powerful GPU, available memory is also a limiting factor.
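To get a feel for the memory constraint, a rough rule of thumb (an approximation, not a vendor figure) is that a model's weights alone take about "billions of parameters × bytes per weight" gigabytes, before counting the KV cache and runtime overhead:

```python
# Rough memory estimate for model weights (an approximation only; actual
# usage also depends on quantization format, KV cache, and runtime overhead).
def estimated_memory_gb(billions_of_params: float, bytes_per_weight: float = 2.0) -> float:
    """Estimate weight memory in GB (1e9 bytes): params * bytes per weight."""
    return billions_of_params * bytes_per_weight

# A 1.5b model quantized to ~4 bits (0.5 bytes/weight) fits in under 1 GB:
print(round(estimated_memory_gb(1.5, 0.5), 2))   # 0.75
# A 480b model at 16-bit precision would need ~960 GB, far beyond a desktop GPU:
print(round(estimated_memory_gb(480, 2.0), 1))   # 960.0
```

This is why the small coder variants used later in this article are a realistic starting point for most personal machines.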

A popular tool for self-hosting LLMs locally is Ollama, which can be installed directly on your computer or run through a Docker image.

Some models offer a variant specialized for coding, which may be better suited to development work.

How to install and configure VSCode Continue extension?

  1. Download and install VSCode https://code.visualstudio.com/Download

    • Consider turning off telemetry settings.
  2. Search and install the extension Continue published by continue.dev

  3. To set up a cloud LLM gateway:

    1. Create your account on the cloud platform and get your API key.

    2. Configure the LLM models in the file ~/.continue/config.json, which replaces config.yml (you can safely delete the YAML file):

      {
        "models": [
          {
            "title": "your model name",
            "provider": "your provider name",
            "model": "provider-model-name",
            "apiBase": "provider-api-base-url",
            "apiKey": "your-provider-api-key",
            "toolChoice": "auto"
          }
        ]
      }
      

      The reason for using the JSON configuration file instead of YAML is that the JSON config exposes more of the settings needed here.

      A more concrete example, using the Qwen3 specialized Coder model on OpenRouter:

      {
        "models": [
          {
            "title": "Qwen3-Coder-480B (OpenRouter - Free)",
            "provider": "openai",
            "model": "qwen/qwen3-coder:free",
            "apiBase": "https://openrouter.ai/api/v1",
            "apiKey": "your-openrouter-api-key",
            "contextLength": 16384,
            "toolChoice": "auto",
            "completionOptions": {
              "temperature": 0.15,
              "maxTokens": 4096
            }
          }
        ]
      }
      

      Note that settings such as "contextLength", "temperature" and "maxTokens" vary between models and providers, and can be adjusted.

    3. Reload VSCode by pressing Ctrl+Shift+P and selecting "Developer: Reload Window", then open Continue to display the chat. If you have configured multiple models, you can select the one you want from the dropdown menu.
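      A trailing comma or missing bracket in config.json will stop your models from loading, so it can be worth validating the file after editing. A minimal sketch (the inline sample below is illustrative; parse your real ~/.continue/config.json the same way):

```python
import json

# Illustrative sample; in practice, read the contents of ~/.continue/config.json.
sample = '''
{
  "models": [
    {
      "title": "Qwen3-Coder-480B (OpenRouter - Free)",
      "provider": "openai",
      "model": "qwen/qwen3-coder:free",
      "apiBase": "https://openrouter.ai/api/v1",
      "apiKey": "your-openrouter-api-key"
    }
  ]
}
'''
config = json.loads(sample)  # raises json.JSONDecodeError on invalid syntax
print(len(config["models"]), "model(s) configured")
```

      You can run the same check on the real file with python3 -m json.tool ~/.continue/config.json.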

  4. If you are self-hosting an LLM model:

    1. Download and install Docker https://www.docker.com/products/docker-desktop/

    2. Start Docker Desktop and open a terminal.

    3. Download and install Ollama. For computers without a GPU, run the following command:

      docker pull ollama/ollama
      

      If your computer has a GPU, the installation procedure differs depending on your GPU manufacturer. Refer to these instructions: https://docs.ollama.com/docker

    4. Start Ollama in Docker (you can give your container a different name):

      docker run -d --name ollama-container -p 11434:11434 -v ollama:/root/.ollama ollama/ollama
      
    5. Download and install an LLM model in your Ollama container. For a list of available models, see https://ollama.com/library?sort=popular

      For example, to download the Qwen 2.5 specialized Coder model in its 1.5b variant:

      docker exec -it ollama-container ollama pull qwen2.5-coder:1.5b
      

      You must specify the model name exactly as listed on the page. Larger variants (the "b" suffix is the parameter count in billions, e.g. 7b or 14b) are more powerful but require more memory. Start with a smaller variant and upgrade later if your computer has enough memory.

    6. Verify the model is installed in your Docker container:

      docker exec -it ollama-container ollama list
      
    7. Test the model by prompting it directly to ensure it works (just say "hello"):

      docker exec -it ollama-container ollama run qwen2.5-coder:1.5b
      
    8. Optionally, you can install the Docker extension for VSCode to manage your Docker containers.

    9. Add the local LLM model to the file ~/.continue/config.json:

      {
        "models": [
          {
            "title": "Qwen2.5-Coder-1.5b (Chat)",
            "provider": "ollama",
            "model": "qwen2.5-coder:1.5b",
            "toolChoice": "auto",
            "contextLength": 8192,
            "maxTokens": 4096
          }
        ]
      }
      

      Specifying a lower context length reduces resource usage and improves response time. You can progressively increase it to improve accuracy if your computer's performance allows.

      For example, if you configured both cloud and local Qwen models, your configuration file should look like this:

      {
        "models": [
          {
            "title": "Qwen3-Coder-480B (OpenRouter - Free)",
            "provider": "openai",
            "model": "qwen/qwen3-coder:free",
            "apiBase": "https://openrouter.ai/api/v1",
            "apiKey": "your-openrouter-api-key",
            "contextLength": 16384,
            "toolChoice": "auto",
            "completionOptions": {
              "temperature": 0.15,
              "maxTokens": 4096
            }
          },
          {
            "title": "Qwen2.5-Coder-1.5b (Chat)",
            "provider": "ollama",
            "model": "qwen2.5-coder:1.5b",
            "toolChoice": "auto",
            "contextLength": 8192,
            "maxTokens": 4096
          }
        ]
      }
      
    10. Open the Continue chat in VSCode and start interacting with the model. When using the local LLM, always start your Docker container first.
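      If the local model does not respond in the chat, a quick way to tell whether the problem is the container or the extension is to probe the Ollama server directly. A small sketch (it assumes the container from step 4 maps port 11434, the Ollama default):

```python
import urllib.request
import urllib.error

def ollama_reachable(url: str = "http://localhost:11434/api/version",
                     timeout: float = 2.0) -> bool:
    """Return True if the local Ollama HTTP server answers on its default port."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if ollama_reachable():
    print("Ollama server reachable")
else:
    print("Ollama server not reachable - start the Docker container first")
```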

Tip: by default, the Continue extension opens on the left side of the editor. You can drag the Continue icon to the Chat panel to move it to the right side, and switch between the GitHub Copilot chat and the Continue chat.

You are all set now, enjoy the power of free LLM models in Visual Studio Code.

Are there other alternatives to GitHub Copilot and Continue?

Yes, some LLM models include their own VSCode extension. Among the free options, Qwen is available through the Lingma extension. Configuration is easier, as long as you have an account on their platform.
