Trending Papers - Hugging Face

by AK and the research community

Submitted by

taesiri

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

SkillOpt introduces a systematic text-space optimizer for agent skills that trains skills as external agent state with stable updates and zero deployment inference overhead, achieving superior performance across multiple benchmarks and execution environments.

Kronos: A Foundation Model for the Language of Financial Markets

Kronos, a specialized pre-training framework for financial K-line data, outperforms existing models in forecasting and synthetic data generation through a unique tokenizer and autoregressive pre-training on a large dataset.

· Published on Aug 2, 2025

Submitted by

akhaliq

Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Mem0, a memory-centric architecture with graph-based memory, enhances long-term conversational coherence in LLMs by efficiently extracting, consolidating, and retrieving information, outperforming existing memory systems in terms of accuracy and computational efficiency.

· Published on Apr 28, 2025

Submitted by

rajkumarrawal

Recursive Language Models

We study allowing large language models (LLMs) to process arbitrarily long prompts through the lens of inference-time scaling. We propose Recursive Language Models (RLMs), a general inference strategy that treats long prompts as part of an external environment and allows the LLM to programmatically examine, decompose, and recursively call itself over snippets of the prompt. We find that RLMs successfully handle inputs up to two orders of magnitude beyond model context windows and, even for shorter prompts, dramatically outperform the quality of base LLMs and common long-context scaffolds across four diverse long-context tasks, while having comparable (or cheaper) cost per query.

Submitted by

VigneshHexo

SIA: Self Improving AI with Harness & Weight Updates

A self-improving AI framework simultaneously updates both model weights and task-specific agent architecture through a language-model feedback agent across legal classification, GPU optimization, and biological data denoising tasks.

hexoaiorg Hexo AI

· Published on May 26, 2026

Submitted by

zhengli1013

InterleaveThinker: Reinforcing Agentic Interleaved Generation

InterleaveThinker enables interleaved generation capabilities for image generators through a multi-agent pipeline with planner and critic agents, achieving performance comparable to state-of-the-art models while improving results on reasoning benchmarks.

· Published on Jun 11, 2026

Submitted by

hongsunghwan

Geometric Action Model for Robot Policy Learning

A geometric action model leverages pretrained geometric foundation models to enable language-conditioned manipulation policies with improved accuracy, robustness, and efficiency in 3D physical environments.

Submitted by

taesiri

Cosmos 3: Omnimodal World Models for Physical AI

Cosmos 3 is an omnimodal world model that processes and generates multiple data types through a unified mixture-of-transformers architecture, achieving state-of-the-art performance in various understanding and generation tasks.

nvidia NVIDIA

· Published on Jun 1, 2026

Submitted by

Karl28

Orchestra-o1: Omnimodal Agent Orchestration

An omnimodal agent orchestration framework is presented that enables efficient collaboration across multiple modalities through unified task decomposition and specialized sub-agent execution, achieving superior performance on complex multimodal benchmarks.

Submitted by

taesiri

dots.tts Technical Report

A 2B-parameter continuous autoregressive text-to-speech model trained on a multilingual corpus achieves state-of-the-art performance on multiple benchmarks while enabling efficient low-latency speech generation through specialized distillation techniques.

· Published on Jun 5, 2026

Submitted by

XinyangDavidHan

Agents' Last Exam

Agents' Last Exam (ALE) is a benchmark for evaluating AI agents on long-term, economically valuable real-world tasks across 13 industry clusters with 1K+ tasks, revealing significant gaps between benchmark performance and practical deployment.

Submitted by

akhaliq

Very Large-Scale Multi-Agent Simulation in AgentScope

Enhancements to the AgentScope platform improve scalability, efficiency, and ease of use for large-scale multi-agent simulations through distributed mechanisms, flexible environments, and user-friendly tools.

· Published on Jul 25, 2024

Submitted by

unilm

VibeVoice Technical Report

VibeVoice synthesizes long-form multi-speaker speech using next-token diffusion and a highly efficient continuous speech tokenizer, achieving superior performance and fidelity.

LightRAG: Simple and Fast Retrieval-Augmented Generation

LightRAG improves Retrieval-Augmented Generation by integrating graph structures for enhanced contextual awareness and efficient information retrieval, achieving better accuracy and response times.

· Published on Oct 8, 2024

Submitted by

ryanlee-dev

MiniMax Sparse Attention

MiniMax Sparse Attention enables efficient processing of ultra-long contexts in large language models through blockwise sparsity and optimized GPU execution, achieving significant speedups while maintaining performance.

MiniMaxAI MiniMax

· Published on Jun 11, 2026

Submitted by

Rbin

RAG-Anything: All-in-One RAG Framework

RAG-Anything is a unified framework that enhances multimodal knowledge retrieval by integrating cross-modal relationships and semantic matching, outperforming existing methods on complex benchmarks.

Submitted by

imone

HRM-Text: Efficient Pretraining Beyond Scaling

A Hierarchical Recurrent Model architecture with specialized training on instruction-response pairs achieves competitive language modeling performance with significantly reduced computational requirements compared to traditional Transformer-based approaches.
