AI Daily — April 15, 2026
deepmind.google
Gemini Robotics-ER 1.6: Powering real-world robotics tasks through enhanced embodied reasoning
DeepMind released Gemini Robotics-ER 1.6, an update to its embodied reasoning model with improved spatial reasoning and multi-view scene understanding for autonomous robot control. The model targets real-world manipulation tasks where understanding 3D geometry across camera perspectives is a key bottleneck. The release continues DeepMind's push to make foundation models directly actionable in physical environments without task-specific retraining.
openai.com
Trusted access for the next era of cyber defense
OpenAI is expanding its Trusted Access for Cyber program and introducing GPT-5.4-Cyber, a specialized model variant available only to vetted cybersecurity defenders. Access is gated through a vetting process, reflecting OpenAI's attempt to balance capability diffusion against dual-use risk in the cybersecurity domain. This is notable as one of the first public acknowledgments of a GPT-5-series specialized variant being deployed in a restricted operational context.
arxiv.org
Self-Distillation Zero: Self-Revision Turns Binary Rewards into Dense Supervision
SD-Zero proposes a post-training method that bridges reinforcement learning from verifiable rewards (RLVR) and distillation by having a single model act as both generator and self-reviser, converting sparse binary rewards into denser token-level supervision without an external teacher. The authors report substantially better sample efficiency than standard RLVR while avoiding the cost of curating high-quality demonstrations. This is relevant to anyone working on scalable alignment or reasoning improvements where human-labeled data is scarce.
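The core idea can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: `verify`, `self_revise`, and the string-token format below are hypothetical stand-ins for a verifiable-reward checker, the model's own revision pass, and real tokenization.

```python
def verify(answer: str, target: str) -> int:
    """Binary verifiable reward: 1 if the answer checks out, else 0."""
    return int(answer.strip() == target.strip())

def self_revise(draft: str, target_hint: str) -> str:
    """Stand-in for the model revising its own failed draft.
    In the actual method the same model rewrites its attempt with no
    external teacher; here we simply return a correct string."""
    return target_hint

def collect_dense_supervision(draft: str, target: str):
    """Turn a sparse binary reward into token-level targets.

    If the draft verifies, its own tokens become positive training
    targets; if not, the self-revised attempt supplies them. Either
    way the result is one cross-entropy target per token instead of
    a single scalar reward for the whole sequence.
    """
    reward = verify(draft, target)
    source = draft if reward == 1 else self_revise(draft, target)
    targets = [(i, tok) for i, tok in enumerate(source.split())]
    return targets, reward

targets, reward = collect_dense_supervision("2 + 2 = 5", "2 + 2 = 4")
# reward is 0 (the draft fails verification), yet every token of the
# revision is available as a supervised target.
```

The payoff is that a failed rollout, which contributes nothing under plain RLVR, still yields a full sequence of token-level targets for distillation.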
arxiv.org
The Long-Horizon Task Mirage? Diagnosing Where and Why Agentic Systems Break
Researchers introduce HORIZON, a cross-domain diagnostic benchmark designed to systematically expose failure modes in LLM-based agents on long-horizon tasks requiring extended, interdependent action sequences. Evaluation covers GPT-5 variants and Claude models across 3,100+ trajectories in four agentic domains, revealing structured patterns in how and where agent performance degrades with task horizon. The benchmark addresses a notable gap: current evals often capture short-to-mid horizon performance but lack rigorous characterization of long-horizon breakdown.
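The kind of diagnostic this enables can be sketched as follows. The trajectory format and the 10-step bucketing below are assumptions for illustration, not HORIZON's actual schema; the point is simply how success rate is characterized as a function of task horizon rather than as one aggregate number.

```python
from collections import defaultdict

def success_by_horizon(trajectories):
    """Characterize agent performance as a function of task horizon.

    trajectories: iterable of (num_steps, succeeded) pairs, one per run.
    Returns {horizon_bucket: success_rate}, bucketing horizons into
    bins of 10 steps so degradation patterns become visible.
    """
    totals = defaultdict(int)
    wins = defaultdict(int)
    for steps, ok in trajectories:
        bucket = (steps // 10) * 10
        totals[bucket] += 1
        wins[bucket] += int(ok)
    return {b: wins[b] / totals[b] for b in sorted(totals)}

# Hypothetical logs: short tasks succeed, longer ones increasingly fail.
logs = [(5, True), (8, True), (25, True), (27, False), (60, False)]
rates = success_by_horizon(logs)
```

Aggregate benchmark scores hide exactly this curve; binning by horizon is what exposes where in the sequence length performance collapses.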