Papers that exist
Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification
Paper
• 2509.15591
• Published • 45
A Survey on Latent Reasoning
Paper
• 2507.06203
• Published • 94
Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost
Paper
• 2602.03120
• Published • 1
TADA! Tuning Audio Diffusion Models through Activation Steering
Paper
• 2602.11910
• Published • 2
CoPE-VideoLM: Codec Primitives For Efficient Video Language Models
Paper
• 2602.13191
• Published • 30
GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics
Paper
• 2602.12617
• Published • 20
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
Paper
• 2602.08683
• Published • 52
Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions
Paper
• 2602.13013
• Published • 54
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Paper
• 2411.04282
• Published • 37
Paper
• 2505.14513
• Published • 29
LoRA-Drop: Temporal LoRA Decoding for Efficient LLM Inference
Paper
• 2601.02569
• Published
LLMs + Persona-Plug = Personalized LLMs
Paper
• 2409.11901
• Published • 35
Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model
Paper
• 2411.04496
• Published • 22
FoNE: Precise Single-Token Number Embeddings via Fourier Features
Paper
• 2502.09741
• Published • 15
FLEXITOKENS: Flexible Tokenization for Evolving Language Models
Paper
• 2507.12720
• Published • 10
Distilling Token-Trained Models into Byte-Level Models
Paper
• 2602.01007
• Published
Multiscale Byte Language Models -- A Hierarchical Architecture for Causal Million-Length Sequence Modeling
Paper
• 2502.14553
• Published • 1
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper
• 2412.09871
• Published • 108
Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction
Paper
• 2505.11254
• Published • 48
Less is More: Recursive Reasoning with Tiny Networks
Paper
• 2510.04871
• Published • 511
OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering
Paper
• 2409.08250
• Published • 1
LightMem: Lightweight and Efficient Memory-Augmented Generation
Paper
• 2510.18866
• Published • 115
The End of Manual Decoding: Towards Truly End-to-End Language Models
Paper
• 2510.26697
• Published • 119
Kimi Linear: An Expressive, Efficient Attention Architecture
Paper
• 2510.26692
• Published • 133
Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models
Paper
• 2602.10224
• Published • 19
ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation
Paper
• 2601.21912
• Published • 1
xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token
Paper
• 2405.13792
• Published • 1
ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations
Paper
• 2505.02819
• Published • 26
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Paper
• 2502.16894
• Published • 32
DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing
Paper
• 2602.12205
• Published • 80
MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling
Paper
• 2602.11761
• Published • 7
CoMeT: Collaborative Memory Transformer for Efficient Long Context Modeling
Paper
• 2602.01766
• Published
Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers
Paper
• 2601.17367
• Published • 34
MemFly: On-the-Fly Memory Optimization via Information Bottleneck
Paper
• 2602.07885
• Published • 7
Paper
• 2602.11298
• Published • 19
UMEM: Unified Memory Extraction and Management Framework for Generalizable Memory
Paper
• 2602.10652
• Published • 3
Weight Decay Improves Language Model Plasticity
Paper
• 2602.11137
• Published • 2
Latent Thoughts Tuning: Bridging Context and Reasoning with Fused Information in Latent Tokens
Paper
• 2602.10229
• Published • 5
Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models
Paper
• 2602.09713
• Published • 8
Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models
Paper
• 2602.07106
• Published • 11
TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions
Paper
• 2602.08711
• Published • 28
How Do Decoder-Only LLMs Perceive Users? Rethinking Attention Masking for User Representation Learning
Paper
• 2602.10622
• Published • 27
Paper
• 2410.05258
• Published • 182
Möbius Transform for Mitigating Perspective Distortions in Representation Learning
Paper
• 2405.02296
• Published • 4
AToken: A Unified Tokenizer for Vision
Paper
• 2509.14476
• Published • 37
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams
Paper
• 2406.08085
• Published • 17
Badllama 3: removing safety finetuning from Llama 3 in minutes
Paper
• 2407.01376
• Published
RoseLoRA: Row and Column-wise Sparse Low-rank Adaptation of Pre-trained Language Model for Knowledge Editing and Fine-tuning
Paper
• 2406.10777
• Published • 2
OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models
Paper
• 2406.01775
• Published • 3
RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions
Paper
• 2501.00353
• Published
Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously
Paper
• 2603.12262
• Published • 30
Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models
Paper
• 2603.11896
• Published • 8
WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing
Paper
• 2603.11593
• Published • 25
One Model, Many Budgets: Elastic Latent Interfaces for Diffusion Transformers
Paper
• 2603.12245
• Published • 18
EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation
Paper
• 2603.12267
• Published • 13
OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams
Paper
• 2603.12265
• Published • 12
Training Language Models via Neural Cellular Automata
Paper
• 2603.10055
• Published • 7
Accent Vector: Controllable Accent Manipulation for Multilingual TTS Without Accented Data
Paper
• 2603.07534
• Published • 5
FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System
Paper
• 2603.10420
• Published • 6
NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks
Paper
• 2603.06922
• Published • 2
Flash-KMeans: Fast and Memory-Efficient Exact K-Means
Paper
• 2603.09229
• Published • 79
EvoTok: A Unified Image Tokenizer via Residual Latent Evolution for Visual Understanding and Generation
Paper
• 2603.12108
• Published • 8
Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models
Paper
• 2509.25162
• Published • 3
RecTok: Reconstruction Distillation along Rectified Flow
Paper
• 2512.13421
• Published • 5
ε-VAE: Denoising as Visual Decoding
Paper
• 2410.04081
• Published • 7
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
Paper
• 2412.03069
• Published • 34
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
Paper
• 2502.14837
• Published • 3
Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity
Paper
• 2603.05168
• Published • 4
Truncated Step-Level Sampling with Process Rewards for Retrieval-Augmented Reasoning
Paper
• 2602.23440
• Published • 3
BBQ-to-Image: Numeric Bounding Box and Qolor Control in Large-Scale Text-to-Image Models
Paper
• 2602.20672
• Published • 9
LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model
Paper
• 2603.01068
• Published • 22
NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing
Paper
• 2603.02802
• Published • 7
InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions
Paper
• 2603.03646
• Published • 8
Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory
Paper
• 2603.04257
• Published • 19
MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning
Paper
• 2603.03379
• Published • 31
Proact-VL: A Proactive VideoLLM for Real-Time AI Companions
Paper
• 2603.03447
• Published • 36
Helios: Real Real-Time Long Video Generation Model
Paper
• 2603.04379
• Published • 173
Beyond the Grid: Layout-Informed Multi-Vector Retrieval with Parsed Visual Document Representations
Paper
• 2603.01666
• Published • 1
WildActor: Unconstrained Identity-Preserving Video Generation
Paper
• 2603.00586
• Published • 37
ByteFlow: Language Modeling through Adaptive Byte Compression without a Tokenizer
Paper
• 2603.03583
• Published • 2
HY-WU (Part I): An Extensible Functional Neural Memory Framework and An Instantiation in Text-Guided Image Editing
Paper
• 2603.07236
• Published • 3
CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation
Paper
• 2603.08652
• Published • 39
Lost in Stories: Consistency Bugs in Long Story Generation by LLMs
Paper
• 2603.05890
• Published • 91
BrandFusion: A Multi-Agent Framework for Seamless Brand Integration in Text-to-Video Generation
Paper
• 2603.02816
• Published • 2
Towards a Neural Debugger for Python
Paper
• 2603.09951
• Published • 5
Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs
Paper
• 2603.09095
• Published • 28
Fish Audio S2 Technical Report
Paper
• 2603.08823
• Published • 34
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing
Paper
• 2603.09877
• Published • 47
UniCom: Unified Multimodal Modeling via Compressed Continuous Semantic Representations
Paper
• 2603.10702
• Published • 4
According to Me: Long-Term Personalized Referential Memory QA
Paper
• 2603.01990
• Published • 5
ID-LoRA: Identity-Driven Audio-Video Personalization with In-Context LoRA
Paper
• 2603.10256
• Published • 19
HybridStitch: Pixel and Timestep Level Model Stitching for Diffusion Acceleration
Paper
• 2603.07815
• Published • 10
OmniForcing: Unleashing Real-time Joint Audio-Visual Generation
Paper
• 2603.11647
• Published • 31
Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation
Paper
• 2603.12793
• Published • 37
MoKus: Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization
Paper
• 2603.12743
• Published • 3
Paper
• 2603.15031
• Published • 140
Test-Time Strategies for More Efficient and Accurate Agentic RAG
Paper
• 2603.12396
• Published • 1
SuperLocalMemory V3: Information-Geometric Foundations for Zero-LLM Enterprise Agent Memory
Paper
• 2603.14588
• Published • 2
MEMO: Memory-Augmented Model Context Optimization for Robust Multi-Turn Multi-Agent LLM Games
Paper
• 2603.09022
• Published • 24
GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent
Paper
• 2603.13875
• Published • 32
Rethinking UMM Visual Generation: Masked Modeling for Efficient Image-Only Pre-training
Paper
• 2603.16139
• Published • 31