Yep! I'm doing something similar but with a slightly different split.
My setup:
- Local (Continue.dev) - handles all autocomplete and quick inline suggestions. Runs Codestral locally via Ollama. Super fast, near-zero latency, fully private.
- Cloud (Claude API) - for complex refactors, architecture questions, debugging. Better reasoning but obviously requires internet + costs tokens.
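For reference, the local half is roughly this in my Continue.dev config (this is `~/.continue/config.json` on my install; newer Continue versions use `config.yaml` instead, so adapt the shape accordingly):

```json
{
  "tabAutocompleteModel": {
    "title": "Codestral",
    "provider": "ollama",
    "model": "codestral"
  }
}
```

That's the whole autocomplete side - Ollama has to be running with the model pulled (`ollama pull codestral`), and Continue picks it up from there.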
The key thing I've found is having a clear mental model of when to use which. Like:
- Writing boilerplate, variable names, simple functions → local
- "How should I structure this feature?" or "Why is this async code deadlocking?" → cloud
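If it helps, the mental model above basically reduces to this. Purely illustrative pseudologic, not a real Continue.dev or Claude API - the task labels and function name are made up:

```python
# Quick/cheap tasks stay local; anything needing real reasoning goes to cloud.
# Task labels are invented for illustration.
QUICK_TASKS = {"autocomplete", "boilerplate", "naming", "simple function"}

def pick_backend(task: str) -> str:
    """Return 'local' for fast/private work, 'cloud' for heavy reasoning."""
    return "local" if task in QUICK_TASKS else "cloud"
```

So `pick_backend("boilerplate")` gives `"local"` and `pick_backend("debug deadlock")` gives `"cloud"`. The point is just that the decision is binary and cheap to make, so it becomes automatic after a week or two.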
One issue I hit early on: the local model kept suggesting patterns that conflicted with what Claude recommended for bigger stuff. Had to tweak the local system prompt (prompt adjustments, not actual model fine-tuning) to match Claude's style more. Now it feels way more coherent.
Do you find the Qwen-coder variant accurate enough for autocomplete? I tried it briefly but found it sometimes suggested outdated API patterns. Might need to give it another shot though.