Yep! I'm doing something similar but with a slightly different split.
My setup:
- Local (Continue.dev) - handles all autocomplete and quick inline suggestions. Runs Codestral locally via Ollama. Super fast, near-zero latency, fully private.
- Cloud (Claude API) - for complex refactors, architecture questions, debugging. Better reasoning but obviously requires internet + costs tokens.
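For reference, the local half is roughly this in my Continue.dev config (this is `~/.continue/config.json` on my install; newer Continue versions use `config.yaml` instead, so adapt the shape accordingly):

```json
{
  "tabAutocompleteModel": {
    "title": "Codestral",
    "provider": "ollama",
    "model": "codestral"
  }
}
```

That's the whole autocomplete side - Ollama has to be running with the model pulled (`ollama pull codestral`), and Continue picks it up from there.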
The key thing I've found is having a clear mental model of when to use which. Like:
- Writing boilerplate, variable names, simple functions → local
- "How should I structure this feature?" or "Why is this async code deadlocking?" → cloud
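If it helps, the mental model above basically reduces to this. Purely illustrative pseudologic, not a real Continue.dev or Claude API - the task labels and function name are made up:

```python
# Quick/cheap tasks stay local; anything needing real reasoning goes to cloud.
# Task labels are invented for illustration.
QUICK_TASKS = {"autocomplete", "boilerplate", "naming", "simple function"}

def pick_backend(task: str) -> str:
    """Return 'local' for fast/private work, 'cloud' for heavy reasoning."""
    return "local" if task in QUICK_TASKS else "cloud"
```

So `pick_backend("boilerplate")` gives `"local"` and `pick_backend("debug deadlock")` gives `"cloud"`. The point is just that the decision is binary and cheap to make, so it becomes automatic after a week or two.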
One issue I hit early on: the local model kept suggesting patterns that conflicted with what Claude recommended for bigger stuff. Had to tweak the local system prompt (prompt adjustments, not actual model fine-tuning) to match Claude's style more. Now it feels way more coherent.
Do you find the Qwen-coder variant accurate enough for autocomplete? I tried it briefly but found it sometimes suggested outdated API patterns. Might need to give it another shot though.