I've been experimenting with Baidu's ERNIE 5.0 for a client project and... wow. You can record a UI walkthrough (a ~30-second screen capture), feed it to their video-reasoning agent, and it'll spit out a patch + a suggested PR description. Not just "what did you click" – full intent analysis.
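If you're wondering what the round trip would look like in code, here's a minimal sketch. Heavy caveat: the endpoint URL, request fields, and response keys below are placeholders I made up (the API is gated and I'm not pasting client config), so treat this as the shape of the workflow, not documentation.

```ts
// Sketch of a video-to-patch round trip. Everything marked "assumed" is a
// placeholder, NOT the real ERNIE API surface.
import { readFile, writeFile } from "node:fs/promises";

interface VideoToPatchResponse {
  patch: string;          // unified diff across files (assumed key)
  pr_description: string; // suggested PR body (assumed key)
}

async function videoToPatch(videoPath: string, apiKey: string): Promise<VideoToPatchResponse> {
  const form = new FormData();
  form.append("video", new Blob([await readFile(videoPath)]), "walkthrough.mp4");

  // Placeholder URL; the real endpoint is behind gated access.
  const res = await fetch("https://example.invalid/ernie/v5/video-to-patch", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) throw new Error(`video-to-patch request failed: ${res.status}`);
  return (await res.json()) as VideoToPatchResponse;
}

// Dump the diff to a file so it can be reviewed with
// `git apply --check walkthrough.patch` before anything touches the tree.
const { patch, pr_description } = await videoToPatch("walkthrough.mp4", process.env.ERNIE_API_KEY ?? "");
await writeFile("walkthrough.patch", patch);
console.log(pr_description);
```

The step I'd insist on is the last one: never apply the generated diff directly, dump it and review it like any other incoming patch.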
Tried it on a refactor where I moved components around in a React app. Recorded my clicks, pauses, even me scrolling back through files. ERNIE generated a multi-file diff that covered maybe 90% of what I intended, and it flagged a potential prop collision I hadn't noticed yet. Felt more like pair programming than automation.
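For the curious, the collision was roughly this shape (component and prop names invented, the real code is the client's): after the move, two sources could both supply the same handler, and the later spread silently wins.

```tsx
// Hypothetical reconstruction of the pattern, not the actual client code.
import React from "react";

type CardProps = { title: string; onClick?: () => void };

function Card({ title, onClick }: CardProps) {
  return (
    <button type="button" onClick={onClick}>
      {title}
    </button>
  );
}

// After the refactor, Panel both wires its own handler AND forwards arbitrary
// extras. If `extra` ever carries an onClick, it overrides onSelect below,
// and nothing in the diff screams about it.
function Panel({ onSelect, ...extra }: { onSelect: () => void } & Partial<CardProps>) {
  return <Card title="Item" onClick={onSelect} {...extra} />;
}
```

Spotting that class of bug from a screen recording of me shuffling files around is the part that impressed me.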
Access is still gated and overseas pricing is TBD, but if this gets commoditized we're heading toward a pretty different dev environment. Has anyone else gotten API access or used similar video-to-patch workflows?