I've been experimenting with Baidu's ERNIE 5.0 for a client project and... wow. You can record a UI walkthrough (a ~30-second screen capture), feed it to their video-reasoning agent, and it'll spit out a patch + a suggested PR description. Not just "what did you click" – full intent analysis.
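If you're wondering what the round trip would look like in code, here's a minimal sketch. Heavy caveat: the endpoint URL, request fields, and response keys below are placeholders I made up (the API is gated and I'm not pasting client config), so treat this as the shape of the workflow, not documentation.

```ts
// Sketch of a video-to-patch round trip. Everything marked "assumed" is a
// placeholder, NOT the real ERNIE API surface.
import { readFile, writeFile } from "node:fs/promises";

interface VideoToPatchResponse {
  patch: string;          // unified diff across files (assumed key)
  pr_description: string; // suggested PR body (assumed key)
}

async function videoToPatch(videoPath: string, apiKey: string): Promise<VideoToPatchResponse> {
  const form = new FormData();
  form.append("video", new Blob([await readFile(videoPath)]), "walkthrough.mp4");

  // Placeholder URL; the real endpoint is behind gated access.
  const res = await fetch("https://example.invalid/ernie/v5/video-to-patch", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) throw new Error(`video-to-patch request failed: ${res.status}`);
  return (await res.json()) as VideoToPatchResponse;
}

// Dump the diff to a file so it can be reviewed with
// `git apply --check walkthrough.patch` before anything touches the tree.
const { patch, pr_description } = await videoToPatch("walkthrough.mp4", process.env.ERNIE_API_KEY ?? "");
await writeFile("walkthrough.patch", patch);
console.log(pr_description);
```

The step I'd insist on is the last one: never apply the generated diff directly, dump it and review it like any other incoming patch.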
Tried it on a refactor where I moved components around in a React app. Recorded my clicks, pauses, even me scrolling back through files. ERNIE generated a multi-file diff that covered maybe 90% of what I intended, and it flagged a potential prop collision I hadn't noticed yet. Felt more like pair programming than automation.
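For the curious, the collision was roughly this shape (component and prop names invented, the real code is the client's): after the move, two sources could both supply the same handler, and the later spread silently wins.

```tsx
// Hypothetical reconstruction of the pattern, not the actual client code.
import React from "react";

type CardProps = { title: string; onClick?: () => void };

function Card({ title, onClick }: CardProps) {
  return (
    <button type="button" onClick={onClick}>
      {title}
    </button>
  );
}

// After the refactor, Panel both wires its own handler AND forwards arbitrary
// extras. If `extra` ever carries an onClick, it overrides onSelect below,
// and nothing in the diff screams about it.
function Panel({ onSelect, ...extra }: { onSelect: () => void } & Partial<CardProps>) {
  return <Card title="Item" onClick={onSelect} {...extra} />;
}
```

Spotting that class of bug from a screen recording of me shuffling files around is the part that impressed me.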
Access is still gated and overseas pricing is TBD, but if this gets commoditized we're heading toward a pretty different dev environment. Has anyone else gotten API access or used similar video-to-patch workflows?