Trending Papers - Hugging Face
by AK and the research community
Submitted by taesiri
SkillOpt: Executive Strategy for Self-Evolving Agent Skills
SkillOpt introduces a systematic text-space optimizer for agent skills that trains skills as external agent state with stable updates and zero deployment inference overhead, achieving superior performance across multiple benchmarks and execution environments.
Kronos: A Foundation Model for the Language of Financial Markets
Kronos, a specialized pre-training framework for financial K-line data, outperforms existing models in forecasting and synthetic data generation through a unique tokenizer and autoregressive pre-training on a large dataset.
- 7 authors
· Published on Aug 2, 2025
Submitted by akhaliq
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
Mem0, a memory-centric architecture with graph-based memory, enhances long-term conversational coherence in LLMs by efficiently extracting, consolidating, and retrieving information, outperforming existing memory systems in terms of accuracy and computational efficiency.
· Published on Apr 28, 2025
Submitted by rajkumarrawal
Recursive Language Models
We study allowing large language models (LLMs) to process arbitrarily long prompts through the lens of inference-time scaling. We propose Recursive Language Models (RLMs), a general inference strategy that treats long prompts as part of an external environment and allows the LLM to programmatically examine, decompose, and recursively call itself over snippets of the prompt. We find that RLMs successfully handle inputs up to two orders of magnitude beyond model context windows and, even for shorter prompts, dramatically outperform the quality of base LLMs and common long-context scaffolds across four diverse long-context tasks, while having comparable (or cheaper) cost per query.
Submitted by VigneshHexo
SIA: Self Improving AI with Harness & Weight Updates
A self-improving AI framework simultaneously updates both model weights and task-specific agent architecture through a language-model feedback agent across legal classification, GPU optimization, and biological data denoising tasks.
· Published on May 26, 2026
Submitted by zhengli1013
InterleaveThinker: Reinforcing Agentic Interleaved Generation
InterleaveThinker enables interleaved generation capabilities for image generators through a multi-agent pipeline with planner and critic agents, achieving performance comparable to state-of-the-art models while enhancing reasoning benchmarks.

- 7 authors
· Published on Jun 11, 2026
Submitted by hongsunghwan
Geometric Action Model for Robot Policy Learning
A geometric action model leverages pretrained geometric foundation models to enable language-conditioned manipulation policies with improved accuracy, robustness, and efficiency in 3D physical environments.
Submitted by taesiri
Cosmos 3: Omnimodal World Models for Physical AI
Cosmos 3 is an omnimodal world model that processes and generates multiple data types through a unified mixture-of-transformers architecture, achieving state-of-the-art performance in various understanding and generation tasks.
· Published on Jun 1, 2026
Submitted by Karl28
Orchestra-o1: Omnimodal Agent Orchestration
An omnimodal agent orchestration framework is presented that enables efficient collaboration across multiple modalities through unified task decomposition and specialized sub-agent execution, achieving superior performance on complex multimodal benchmarks.
Submitted by taesiri
dots.tts Technical Report
A 2B-parameter continuous autoregressive text-to-speech model trained on a multilingual corpus achieves state-of-the-art performance on multiple benchmarks while enabling efficient low-latency speech generation through specialized distillation techniques.
- 9 authors
· Published on Jun 5, 2026
Submitted by XinyangDavidHan
Agents' Last Exam
Agents' Last Exam (ALE) is a benchmark for evaluating AI agents on long-term, economically valuable real-world tasks across 13 industry clusters with 1K+ tasks, revealing significant gaps between benchmark performance and practical deployment.
Submitted by akhaliq
Very Large-Scale Multi-Agent Simulation in AgentScope
Enhancements to the AgentScope platform improve scalability, efficiency, and ease of use for large-scale multi-agent simulations through distributed mechanisms, flexible environments, and user-friendly tools.
· Published on Jul 25, 2024
Submitted by unilm
VibeVoice Technical Report
VibeVoice synthesizes long-form multi-speaker speech using next-token diffusion and a highly efficient continuous speech tokenizer, achieving superior performance and fidelity.
LightRAG: Simple and Fast Retrieval-Augmented Generation
LightRAG improves Retrieval-Augmented Generation by integrating graph structures for enhanced contextual awareness and efficient information retrieval, achieving better accuracy and response times.
- 5 authors
· Published on Oct 8, 2024
Submitted by ryanlee-dev
MiniMax Sparse Attention
MiniMax Sparse Attention enables efficient processing of ultra-long contexts in large language models through blockwise sparsity and optimized GPU execution, achieving significant speedups while maintaining performance.
· Published on Jun 11, 2026
Submitted by Rbin
RAG-Anything: All-in-One RAG Framework
RAG-Anything is a unified framework that enhances multimodal knowledge retrieval by integrating cross-modal relationships and semantic matching, outperforming existing methods on complex benchmarks.
Submitted by imone
HRM-Text: Efficient Pretraining Beyond Scaling
A Hierarchical Recurrent Model architecture with specialized training on instruction-response pairs achieves competitive language modeling performance with significantly reduced computational requirements compared to traditional Transformer-based approaches.