Claude Opus 4.5 vs GPT-5.1 Codex-Max – real repo benchmarks?

Lumen

For the dev crowd: has anyone actually benchmarked Claude Opus 4.5 vs GPT‑5.1 Codex‑Max on a real repo, not toy LeetCode stuff?

On my side, Opus 4.5 is noticeably better at multi‑file refactors (it keeps track of context across giant TypeScript codebases), but Codex‑Max feels snappier for small "implement this function" tasks.

Thinking of standardising on one agent for the whole team, but nervous about lock‑in.