Scaling Laws for Mixture Pretraining Under Data Constraints Paper • 2605.12715 • Published 3 days ago • 4 • 1
Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 8 days ago • 182
Granite 4.1 Language Models Collection Efficient language models for multilingual generation, coding, RAG, and AI assistant workflows. • 6 items • Updated 16 days ago • 50