{"id":227,"date":"2026-02-01T14:11:33","date_gmt":"2026-02-01T12:11:33","guid":{"rendered":"https:\/\/boxc.net\/blog\/?p=227"},"modified":"2026-02-04T12:39:49","modified_gmt":"2026-02-04T10:39:49","slug":"claude-code-connecting-to-local-models-when-your-quota-runs-out","status":"publish","type":"post","link":"https:\/\/boxc.net\/blog\/2026\/claude-code-connecting-to-local-models-when-your-quota-runs-out\/","title":{"rendered":"Claude Code: connect to a local model when your quota runs out"},"content":{"rendered":"\n<p>If you&#8217;re on one of the cheaper Anthropic plans like me, it&#8217;s pretty common to hit a daily or weekly quota limit just when you&#8217;re deep into coding an idea with Claude. If you want to keep going, you can connect to a local open source model instead of Anthropic. To monitor your current quota, type: <code>\/usage<\/code><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/boxc.net\/blog\/wp-content\/uploads\/2026\/02\/claude_code_usage.png\"><img loading=\"lazy\" decoding=\"async\" width=\"902\" height=\"632\" src=\"https:\/\/boxc.net\/blog\/wp-content\/uploads\/2026\/02\/claude_code_usage.png\" alt=\"\" class=\"wp-image-231\" srcset=\"https:\/\/boxc.net\/blog\/wp-content\/uploads\/2026\/02\/claude_code_usage.png 902w, https:\/\/boxc.net\/blog\/wp-content\/uploads\/2026\/02\/claude_code_usage-300x210.png 300w, https:\/\/boxc.net\/blog\/wp-content\/uploads\/2026\/02\/claude_code_usage-768x538.png 768w\" sizes=\"auto, (max-width: 902px) 100vw, 902px\" \/><\/a><figcaption class=\"wp-element-caption\">Type <code>\/usage<\/code> to monitor how much quota you have left and how quickly you burn through it.<\/figcaption><\/figure>\n\n\n\n<p>The best open source model changes pretty frequently, but at the time of writing, I recommend\u00a0<a href=\"https:\/\/docs.z.ai\/guides\/llm\/glm-4.7#glm-4-7-flash\" target=\"_blank\" rel=\"noopener\"><strong>GLM-4.7-Flash<\/strong> from Z.AI<\/a> or <strong><a 
href=\"https:\/\/qwen.ai\/blog?id=qwen3-coder-next\" target=\"_blank\" rel=\"noopener\">Qwen3-Coder-Next<\/a><\/strong>. If you want or need to save some disk space and GPU memory, try a smaller quantized version, which will load and run quicker at some cost in quality. I&#8217;ll save how to find the best open source model for your task and machine constraints for another, more detailed post.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Method 1: LM Studio<\/h3>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"619\" src=\"https:\/\/boxc.net\/blog\/wp-content\/uploads\/2026\/02\/claude_code_lmstudio_model_search-1024x619.png\" alt=\"\" class=\"wp-image-232\" srcset=\"https:\/\/boxc.net\/blog\/wp-content\/uploads\/2026\/02\/claude_code_lmstudio_model_search-1024x619.png 1024w, https:\/\/boxc.net\/blog\/wp-content\/uploads\/2026\/02\/claude_code_lmstudio_model_search-300x181.png 300w, https:\/\/boxc.net\/blog\/wp-content\/uploads\/2026\/02\/claude_code_lmstudio_model_search-768x465.png 768w, https:\/\/boxc.net\/blog\/wp-content\/uploads\/2026\/02\/claude_code_lmstudio_model_search-1536x929.png 1536w, https:\/\/boxc.net\/blog\/wp-content\/uploads\/2026\/02\/claude_code_lmstudio_model_search.png 1954w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Accessing open source models in LM Studio<\/figcaption><\/figure>\n\n\n\n<p>If you haven&#8217;t used LM Studio before, it&#8217;s an accessible way to find and run open source LLMs and vision models locally on your machine. In version 0.4.1, they introduced support for connecting to Claude Code (CC). 
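<\/p>\n\n\n\n<p>Condensed, the whole setup is just a local server plus two environment variables. The port, auth token, and model name below are the examples from LM Studio&#8217;s instructions, not requirements; swap in whatever model you downloaded:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Start the LM Studio server on port 1234\nlms server start --port 1234\n\n# Point Claude Code at the local server instead of Anthropic\nexport ANTHROPIC_BASE_URL=http:\/\/localhost:1234\nexport ANTHROPIC_AUTH_TOKEN=lmstudio\n\n# Launch Claude Code against your local model\nclaude --model openai\/gpt-oss-20b<\/code><\/pre>\n\n\n\n<p>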
See here: <a href=\"https:\/\/lmstudio.ai\/blog\/claudecode\" target=\"_blank\" rel=\"noopener\">https:\/\/lmstudio.ai\/blog\/claudecode<\/a> or follow the instructions below:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><a href=\"https:\/\/lmstudio.ai\/download\" target=\"_blank\" rel=\"noopener\">Install and run LM Studio<\/a><\/li>\n\n\n\n<li>Find the model search button to install a model (see image above). LM Studio recommends running the model with a context of &gt; 25K.<\/li>\n\n\n\n<li>Open a new terminal session to:<br>a. start the server: <code>lms server start --port 1234<\/code><br>b. configure environment variables to point CC at LM Studio: <br><code>export ANTHROPIC_BASE_URL=http:\/\/localhost:1234<\/code><br><code>export ANTHROPIC_AUTH_TOKEN=lmstudio<\/code><br>c. start CC pointing at your server: <code>claude --model openai\/gpt-oss-20b<\/code><\/li>\n\n\n\n<li>Reduce your expectations about speed and performance! <\/li>\n\n\n\n<li>To confirm which model you are using or when you want to switch back, type <code>\/model<\/code><\/li>\n<\/ol>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/boxc.net\/blog\/wp-content\/uploads\/2026\/02\/claude_code_running_oss_model.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"318\" src=\"https:\/\/boxc.net\/blog\/wp-content\/uploads\/2026\/02\/claude_code_running_oss_model-1024x318.png\" alt=\"\" class=\"wp-image-233\" srcset=\"https:\/\/boxc.net\/blog\/wp-content\/uploads\/2026\/02\/claude_code_running_oss_model-1024x318.png 1024w, https:\/\/boxc.net\/blog\/wp-content\/uploads\/2026\/02\/claude_code_running_oss_model-300x93.png 300w, https:\/\/boxc.net\/blog\/wp-content\/uploads\/2026\/02\/claude_code_running_oss_model-768x239.png 768w, https:\/\/boxc.net\/blog\/wp-content\/uploads\/2026\/02\/claude_code_running_oss_model-1536x478.png 1536w, https:\/\/boxc.net\/blog\/wp-content\/uploads\/2026\/02\/claude_code_running_oss_model.png 2032w\" sizes=\"auto, (max-width: 
1024px) 100vw, 1024px\" \/><\/a><figcaption class=\"wp-element-caption\">Enter \/model to confirm which model you are using or to switch back<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Method 2: Connecting directly to llama.cpp<\/h3>\n\n\n\n<p>LM Studio is built on top of the open source project <a href=\"https:\/\/github.com\/ggml-org\/llama.cpp\" target=\"_blank\" rel=\"noopener\">llama.cpp<\/a>.<br>If you prefer not to use LM Studio, <a href=\"https:\/\/unsloth.ai\/docs\/basics\/claude-codex\" target=\"_blank\" rel=\"noopener\">you can install and run the project directly and connect Claude Code to it<\/a>, but honestly, unless you&#8217;re fine-tuning a model or have really specific needs, LM Studio is probably going to be the quicker setup.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Conclusion<\/h3>\n\n\n\n<p>For the moment, this is a backup solution. Unless you have a monster of a machine, you&#8217;re going to notice how long things take and a drop in code quality. But it works(!), and it&#8217;s easy enough to switch between your local OSS model and Claude once your quota limit resets, so it&#8217;s a good way to keep coding when you&#8217;re stuck or just want to save some quota. If you try it, let me know how you go and which model works for you.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you&#8217;re on one of the cheaper Anthropic plans like me, it&#8217;s pretty common to hit a daily or weekly quota limit just when you&#8217;re deep into coding an idea with Claude. If you want to keep going, you can connect to a local open source model instead of Anthropic. 
To monitor your current quota, &hellip; <a href=\"https:\/\/boxc.net\/blog\/2026\/claude-code-connecting-to-local-models-when-your-quota-runs-out\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Claude Code: connect to a local model when your quota runs out<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":240,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-227","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/boxc.net\/blog\/wp-json\/wp\/v2\/posts\/227","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/boxc.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/boxc.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/boxc.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/boxc.net\/blog\/wp-json\/wp\/v2\/comments?post=227"}],"version-history":[{"count":12,"href":"https:\/\/boxc.net\/blog\/wp-json\/wp\/v2\/posts\/227\/revisions"}],"predecessor-version":[{"id":251,"href":"https:\/\/boxc.net\/blog\/wp-json\/wp\/v2\/posts\/227\/revisions\/251"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/boxc.net\/blog\/wp-json\/wp\/v2\/media\/240"}],"wp:attachment":[{"href":"https:\/\/boxc.net\/blog\/wp-json\/wp\/v2\/media?parent=227"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/boxc.net\/blog\/wp-json\/wp\/v2\/categories?post=227"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/boxc.net\/blog\/wp-json\/wp\/v2\/tags?post=227"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}