local llm with claude code and opencode

어떤 로컬 llmfit(https://github.com/AlexsJones/llmfit) 을 사용하면 내 하드웨어에 맞는 최고의 모델이 무엇인지알 수 있다.

Inst 컬럼은 설치 가능 여부 표시다.

O ollama 로 다운로드 가능

L llama.cpp 로 다운로드 가능

S LM Studio 로 다운로드 가능

- 다운로드 가능한 프로바이더 찾지 못함

✓ 로컬에 설치되어 있음

llmfit 에서도 perfect 라고 나와도 실제 내가 사용중인 프로그램/환경등으로 인해 리소스가 부족해 안뜰 수 있다.

lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit 은 ollama 에서 찾을 수 없어 qwen3-coder 로 사용해보자.

ollama pull qwen3-coder:latest

ollama ls

ollama run qwen3-coder

ollama ps

참고로 ollama는 OpenAI 호환 API (/v1/chat/completions 등)를 제공한다.
opencode.json > provider > ollama 에 다음과 같이 설정한다.

{

"provider": {

"ollama": {

"npm": "@ai-sdk/openai-compatible",

"name": "Ollama (local)",

"options": {

"baseURL": "http://localhost:11434/v1"

"models": {

"qwen3-coder:latest": {

"name": "Qwen3 Coder 30.5B"

"llama3.2:latest": {

"name": "Llama 3.2 3B"

}

opencode 에서 로컬 llm 사용 결과

claude code 에서는 openAI api 호환이 되지 않아 중간에 litellm(proxy) 서버를 둬야 한다.

참고로 litellm[proxy] 로 기본 + 프록시 실행에 필요한 추가 패키지들(backoff, uvicorn, fastapi 등)을 함께 설치해야 한다.

uv pip install 'litellm[proxy]' --system

litellm 프록시 서버 구동

litellm --model qwen-coder3 --api_base http://localhost:11434/v1 --port 8001

이제 다음 환경변수를 .local_llm_env_for_claude_code 등의 파일로 저장하자.

# 로컬 llm URL

export ANTHROPIC_BASE_URL="http://localhost:8001"

# 더미값으로 설정하면 된다.(sk:SecretKey)

export ANTHROPIC_API_KEY="sk-dummy-key"

# Claude Code의 외부 통신을 최소화하기 위해 다음 환경변수도 설정하자.

# 텔레메트리(사용 통계 수집)를 비활성화

export CLAUDE_CODE_ENABLE_TELEMETRY=0

# 핵심 API 호출 외의 불필요한 네트워크 트래픽 차단

export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1

# API 요청 시 Claude Code가 보내는 attribution 헤더(어떤 클라이언트에서 요청했는지 식별하는 정보)를 비활성화

export CLAUDE_CODE_ATTRIBUTION_HEADER=0

이제 다음과 같이 실행한다.

source .local_llm_env_for_claude_code; claude --model qwen3-coder

실행 후 api key 사용을 선택

왼쪽: 기본 opus 모델을 사용한 경우

오른쪽: 로컬 llm(qwen3-coder) 사용한 경우

사용해보니 ollama M1 맥북에서 느리다.

Apple Silicon 전용 머신러닝 프레임워크 MLX(https://github.com/ml-explore/mlx)를 사용하자.

위 llmfit 나온것도 MLX 모델이다.

# 설치

pip install mlx-lm

# 또는

uv tool install mlx-lm

# 실행, 모델은 huggingface 에서 다운받는다.

mlx_lm.server \

--model lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit \

--port 8080

# 모델명을 확인한다.

http://localhost:8080/v1/models

# ~/.config/opencode/opencode.json 에 다음 모델 설정을 추가한다.

{

"$schema": "https://opencode.ai/config.json",

"provider": {

"mlx-local": {

"npm": "@ai-sdk/openai-compatible",

"name": "MLX Local",

"options": {

"baseURL": "http://localhost:1234/v1"

"models": {

"lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit": {

"name": "lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit"

}

이제 opencode 를 열고 모델을 선택해 사용하면 된다.

# 참고로 모델은 ~/.cache/huggingface/hub 에 모델이 있다.

# huggingface_hub(hf)툴 설치

uv tool install huggingface_hub

# 캐시(모델) 목록/크기 조회

hf cache list --format agent

# 모델 삭제시

hf cache rm <모델ID>

local llm with claude code and opencode

comments:

댓글 쓰기