22 Oct 2025 ~ 4 min read

Setting up llama.cpp and LM Studio with Windows and WSL


This is the story of my quest to get llama.cpp working on Windows and WSL, and to get LM Studio to share llama.cpp-based models between Windows and WSL, for consistency and to conserve drive space.

This one was a doozy, so be forewarned:

  • YMMV
  • My notes on building llama.cpp in WSL are suspect and may be missing steps; the whole thing was a royal pain.
  • My notes in general were spotty and had to be reconstructed through forensic investigation of my workstation.

Step 1. Installing llama.cpp on Windows

This part was fairly easy:

winget install llama.cpp

Step 2. Installing LM Studio

Now that I had llama.cpp installed in Windows, I next moved on to downloading and installing LM Studio. This gives me a local API server for using models, with easy loading and unloading.

LM Studio, by default, stores models that it downloads from HuggingFace in:

~/.lmstudio/models/lmstudio-community

in Windows. In my case, I downloaded a whopper to test it out: Qwen3-coder-30B-A3B-Instruct-GGUF (Q3_K_XL), directly from HuggingFace, then made a directory for it in the expected location and copied it there. You won't have to do that if the exact model you want is available from within LM Studio; I didn't see the quant I wanted in the interface, so I had to go the manual route.

Step 3. Unifying Model Paths Across OS Boundaries

In your Windows environment variables, create an environment variable named LLAMA_MODELS and set it to %USERPROFILE%\.lmstudio\models.

Next, point WSLENV at it so the path crosses the OS boundary. If WSLENV already exists, append LLAMA_MODELS/p to it using a colon (:) as the delimiter; otherwise, create WSLENV and set it to LLAMA_MODELS/p. The /p flag ensures the Windows path is translated to a WSL path.

  • If you just created it: LLAMA_MODELS/p
  • If you appended it to an existing one: VARS_YOU_ALREADY_HAD:LLAMA_MODELS/p
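The two cases above can be sketched from a Windows Command Prompt (cmd, not PowerShell, since %VAR% expansion behaves differently there); setx stores the expanded value for future sessions, so double-check the result with echo afterwards:

```shell
:: Create LLAMA_MODELS (takes effect in new terminals, not the current one)
setx LLAMA_MODELS "%USERPROFILE%\.lmstudio\models"

:: Case 1: WSLENV did not exist yet
setx WSLENV "LLAMA_MODELS/p"

:: Case 2: WSLENV already existed; %WSLENV% expands to the old value,
:: so this appends LLAMA_MODELS/p with a colon delimiter
setx WSLENV "%WSLENV%:LLAMA_MODELS/p"
```

Note that setx does not update the current terminal session; open a new one before testing.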

Now, open a new terminal window for WSL and create your symbolic link to the folder in Windows:

ln -s "$LLAMA_MODELS" ~/llama_models

This creates a symbolic link called llama_models in your home directory that is an alias for the LM Studio models folder in Windows.

Now both Windows and WSL should be able to share models.

Step 4. Building llama.cpp from Source in WSL (with CUDA)

Once again, read the disclaimer at the top before attempting this section. One additional note: you must have a compatible NVIDIA card for this section or it will not work, and you're on your own to figure out how to get it going on your particular GPU. From here on out, we're working from my possibly suspect notes:

Install the build dependencies:

sudo apt update
sudo apt install -y \
  build-essential \
  cmake \
  libopenblas-dev \
  libcurl4-openssl-dev \
  nvidia-cuda-toolkit \
  cuda-toolkit

Clone and build the source (this took a lot of trial and error and searching):

cd _path_to_wherever_you_want_to_put_llama_cpp_source_
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
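Before wiring anything into your PATH, a quick smoke test from the build tree doesn't hurt. This is a sketch and assumes the build dropped its binaries under build/bin, as the cmake invocation above should:

```shell
# Is the GPU visible from WSL? (the driver lives on the Windows side)
nvidia-smi

# Did the build produce binaries? Print version and build info.
./build/bin/llama-cli --version
```

If nvidia-smi can't find your GPU here, sort that out first; no amount of llama.cpp flags will fix it.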

Step 5. Adding the llama.cpp executables to your PATH

In your ~/.bashrc:

export PATH="_wherever_you_cloned_it_to_in_step_4_/llama.cpp/build/bin:$PATH"

Now when you restart or open a new terminal, you have the llama.cpp commands available wherever you need them in the terminal.

Step 6. Testing llama.cpp with your model in WSL

Note: this is what I did; your command will differ based on the model you chose. Swap in your own model, and swap the prompt for something that makes sense with it.

llama-cli \
  --model "$LLAMA_MODELS/lmstudio-community/Qwen3-coder-30B-A3B-Instruct-GGUF/Qwen3-Coder-30B-A3B-Instruct-UD-Q3_K_XL.gguf" \
  --prompt "Explain the concept of MCP servers in 3 paragraphs." \
  --ctx-size 4096 \
  --temp 0.7 \
  --top-p 0.95 \
  --repeat-penalty 1.1 \
  --n-gpu-layers 100 \
  --threads 12

And you should get a response from your model…eventually.
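If you'd rather serve the model over HTTP than run one-shot prompts, the same build also produced llama-server, which exposes an OpenAI-compatible API. A sketch reusing the model and GPU settings from above (swap in your own model path, and the port is just an example):

```shell
llama-server \
  --model "$LLAMA_MODELS/lmstudio-community/Qwen3-coder-30B-A3B-Instruct-GGUF/Qwen3-Coder-30B-A3B-Instruct-UD-Q3_K_XL.gguf" \
  --ctx-size 4096 \
  --n-gpu-layers 100 \
  --port 8080
```

Once it's up, curl http://localhost:8080/health should respond, and you can point OpenAI-style clients at port 8080.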

Step 7. Accessing LM Studio’s API from WSL

YMMV again, as it depends on how you have your WSL networking set up. I have a blog post on how I did it (my WSL/W11 networking quest), but once you sort it out the way you like it, replace the IP address in the following line to test that you can now access LM Studio in Windows from WSL:

curl http://192.168.1.100:1234/v1/models

and LM Studio should give you a list of all the models you currently have installed. Something like this:

{
  "data": [
    {
      "id": "qwen3-coder-30b-a3b-instruct",
      "object": "model",
      "owned_by": "organization_owner"
    },
    {
      "id": "openai/gpt-oss-20b",
      "object": "model",
      "owned_by": "organization_owner"
    },
    {
      "id": "text-embedding-nomic-embed-text-v1.5",
      "object": "model",
      "owned_by": "organization_owner"
    }
  ],
  "object": "list"
}
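If the model list comes back, you can go one step further and request a chat completion through the same OpenAI-compatible API. The IP, model id, and prompt here are just examples; use one of the ids from your own /v1/models output:

```shell
curl http://192.168.1.100:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3-coder-30b-a3b-instruct",
        "messages": [
          {"role": "user", "content": "Say hello in one short sentence."}
        ],
        "max_tokens": 64
      }'
```

The response JSON carries the model's reply under choices[0].message.content, same shape as the OpenAI API.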

Have fun, and I hope this post helped!


Hi, I'm Jon. I'm a software engineer based in Kissimmee, FL.
You can find me here and on LinkedIn.