AI Daily — April 17, 2026

openai.com

Introducing GPT-Rosalind for life sciences research

OpenAI has released GPT-Rosalind, a frontier reasoning model purpose-built for life sciences workloads including drug discovery, genomics analysis, and protein reasoning. The model represents a domain-specialized branch of OpenAI's reasoning model line, targeting scientific research workflows rather than general use. This follows a broader trend of frontier labs releasing vertically specialized models alongside general-purpose ones.

openai.com

Codex for (almost) everything

OpenAI has significantly expanded the Codex desktop app for macOS and Windows, adding computer use, in-app browsing, image generation, persistent memory, and a plugin system. The update positions Codex as a general-purpose developer agent rather than a code-completion tool, competing directly with tools like Cursor and GitHub Copilot Workspace. Computer-use integration in particular enables Codex to interact with local applications and browser environments autonomously.

huggingface.co

Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers

Hugging Face has published a technical guide covering training and fine-tuning multimodal embedding and reranker models using the Sentence Transformers library. The post details how to handle image-text pairs for retrieval tasks, covering loss functions, data formats, and evaluation strategies for multimodal scenarios. This extends Sentence Transformers' previously text-only fine-tuning workflows to multimodal retrieval pipelines.
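The guide's pipelines rely on Sentence Transformers' trainer and loss classes; as a rough, library-free illustration of the kind of objective involved, the in-batch contrastive loss commonly used for embedding fine-tuning (in the style of MultipleNegativesRankingLoss) can be sketched in NumPy. The function name and scale value below are illustrative, not taken from the post:

```python
import numpy as np

def in_batch_contrastive_loss(query_emb, doc_emb, scale=20.0):
    """InfoNCE-style in-batch contrastive loss for (query, positive)
    embedding pairs: row i of doc_emb is the positive for query i, and
    every other row in the batch serves as a negative."""
    # Cosine similarities: L2-normalize, then take all pairwise dot products.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    scores = scale * (q @ d.T)                        # (batch, batch)
    # Stable log-softmax over each row; the diagonal is the target class.
    scores -= scores.max(axis=1, keepdims=True)
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

In the multimodal setting described in the post, one side of each pair would be text embeddings and the other image embeddings; the loss itself is unchanged.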

arxiv.org

MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining

MixAtlas proposes a two-dimensional decomposition of multimodal training corpora along image concept clusters (derived via CLIP embeddings) and task supervision types (captioning, OCR, grounding, detection, VQA), then uses a Gaussian-process surrogate with GP-UCB acquisition to optimize data mixture ratios. Small proxy models (Qwen2-0.5B) are used to cheaply estimate performance, making the method tractable for large-scale midtraining. The approach produces benchmark-targeted, interpretable data recipes that can be transferred to new corpora without full retraining.
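The paper's full surrogate loop is not reproduced here, but its core GP-UCB step — fit a Gaussian process to (mixture ratio, proxy-model score) observations, then pick the candidate mixture maximizing posterior mean plus a scaled standard deviation — can be sketched as follows. The kernel choice, length scale, and noise level are assumptions for illustration:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.3):
    """Squared-exponential kernel between the row vectors of A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * length_scale ** 2))

def gp_ucb_select(X_obs, y_obs, X_cand, beta=2.0, noise=1e-4):
    """Return the index of the candidate mixture maximizing the GP-UCB
    acquisition mu + beta * sigma, given observed (mixture, score) pairs."""
    K = rbf_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    K_cand = rbf_kernel(X_cand, X_obs)
    alpha = np.linalg.solve(K, y_obs)               # K^{-1} y
    mu = K_cand @ alpha                             # posterior mean
    V = np.linalg.solve(K, K_cand.T)                # K^{-1} k(*) columns
    var = 1.0 - np.einsum('ij,ji->i', K_cand, V)    # k(x, x) = 1 for RBF
    return int(np.argmax(mu + beta * np.sqrt(np.clip(var, 0.0, None))))
```

In MixAtlas's setting, each observation would come from training a Qwen2-0.5B proxy on the candidate mixture and scoring it on the target benchmarks; the chosen mixture is then evaluated and fed back into the observation set.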

arxiv.org

Differentially Private Conformal Prediction

This paper presents DPCP, a framework that combines differential privacy with conformal prediction to enable statistically efficient uncertainty quantification under strict privacy constraints. A key contribution is 'differential CP,' a non-splitting conformal procedure that avoids the sample efficiency loss typically caused by data splitting in private settings. The work provides formal validity guarantees by exploiting stability properties of DP mechanisms, bridging oracle CP and private conformal inference.
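For context, the split-conformal baseline whose data-splitting cost the paper's non-splitting procedure avoids amounts to a single quantile computed on a held-out calibration set. A minimal sketch of that baseline (with no privacy mechanism) for regression residuals:

```python
import numpy as np

def split_conformal_halfwidth(cal_residuals, alpha=0.1):
    """Split conformal prediction: from calibration residuals |y - yhat|,
    return the half-width q so that [yhat - q, yhat + q] attains marginal
    coverage >= 1 - alpha for exchangeable data."""
    n = len(cal_residuals)
    # Finite-sample correction: take the ceil((n + 1)(1 - alpha))-th
    # smallest residual rather than the plain (1 - alpha) quantile.
    rank = int(np.ceil((n + 1) * (1 - alpha)))
    return float(np.sort(cal_residuals)[min(rank, n) - 1])
```

The calibration set here is what a private deployment would need to carve out of its data; DPCP's 'differential CP' procedure is aimed precisely at avoiding that split while keeping the released statistic differentially private.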