Trending Papers - Hugging Face
by AK and the research community
Submitted by unilm
VibeVoice Technical Report
VibeVoice synthesizes long-form multi-speaker speech using next-token diffusion and a highly efficient continuous speech tokenizer, achieving superior performance and fidelity.
Submitted by BradyFU
Submitted by taesiri
Submitted by chengtim
VOID: Video Object and Interaction Deletion
VOID is a video object removal framework that combines vision-language models with video diffusion models to generate physically plausible scenes, leveraging causal and counterfactual reasoning.
Submitted by AaronHuangWei
Kronos: A Foundation Model for the Language of Financial Markets
Kronos, a specialized pre-training framework for financial K-line data, outperforms existing models in forecasting and synthetic data generation through a unique tokenizer and autoregressive pre-training on a large dataset.
- 7 authors
· Published on Aug 2, 2025
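The summary does not say how the Kronos tokenizer works. As a rough illustration only, the sketch below shows one generic way to turn K-line (candlestick) bars into discrete tokens for autoregressive pre-training: express each price as a return against the previous close, quantize it into uniform bins, and pack the four bin indices into one token id. The `Bar` type, `tokenize_bars`, the 16-bin count, and the 5% clipping range are hypothetical choices, not the Kronos design.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bar:
    """One K-line (candlestick) bar: open, high, low, close."""
    open: float
    high: float
    low: float
    close: float


def quantize(x: float, lo: float, hi: float, n_bins: int) -> int:
    """Map x to one of n_bins uniform bins over [lo, hi], clipping outliers."""
    x = min(max(x, lo), hi)
    idx = int((x - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # x == hi would otherwise land in bin n_bins


def tokenize_bars(bars: List[Bar], n_bins: int = 16, max_ret: float = 0.05) -> List[int]:
    """Hypothetical tokenizer: every bar after the first becomes one integer token.

    Each of the four prices is expressed as a return relative to the previous
    close, clipped to [-max_ret, max_ret], quantized into n_bins, and the four
    bin indices are packed into a single token id in [0, n_bins**4).
    """
    tokens = []
    for prev, bar in zip(bars, bars[1:]):
        token = 0
        for price in (bar.open, bar.high, bar.low, bar.close):
            ret = price / prev.close - 1.0
            token = token * n_bins + quantize(ret, -max_ret, max_ret, n_bins)
        tokens.append(token)
    return tokens
```

With 16 bins per price, the packed vocabulary has 16**4 = 65,536 ids. A production tokenizer might also encode volume or learn its bins from data instead of fixing them.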
Submitted by WENGSYX
LightRAG: Simple and Fast Retrieval-Augmented Generation
LightRAG improves Retrieval-Augmented Generation by integrating graph structures for enhanced contextual awareness and efficient information retrieval, achieving better accuracy and response times.
- 5 authors
· Published on Oct 8, 2024
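The summary credits graph structures for the gains of LightRAG. The sketch below is a generic illustration of graph-augmented retrieval, not the actual LightRAG retrieval procedure: it ranks text chunks by a toy keyword score (a stand-in for embedding similarity), then pulls in further chunks that share an entity with the top hits. The chunk and entity data, `keyword_score`, and `graph_augmented_retrieve` are hypothetical.

```python
from typing import Dict, List, Set


def keyword_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words that appear in the text.
    A real system would use embedding similarity instead."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def graph_augmented_retrieve(
    query: str,
    chunks: Dict[str, str],
    chunk_entities: Dict[str, Set[str]],
    top_k: int = 1,
) -> List[str]:
    """Hypothetical graph-augmented retrieval.

    1. Rank chunks by direct relevance and keep the top_k as seeds.
    2. Collect the entities mentioned in the seed chunks.
    3. Add any other chunk that shares at least one of those entities,
       so related context is retrieved even without lexical overlap.
    """
    ranked = sorted(chunks, key=lambda cid: keyword_score(query, chunks[cid]), reverse=True)
    seeds = ranked[:top_k]
    seed_entities: Set[str] = set().union(*(chunk_entities[c] for c in seeds))
    neighbors = [
        cid for cid in ranked
        if cid not in seeds and chunk_entities[cid] & seed_entities
    ]
    return seeds + neighbors
```

For the query "who won the nobel prize", a chunk about radium has no keyword overlap, yet it is still returned if it mentions the same entity (say, "Marie Curie") as the top-ranked chunk.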
Submitted by akhaliq
Submitted by Tyrannosaurus
Submitted by akhaliq
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
Mem0, a memory-centric architecture with graph-based memory, enhances long-term conversational coherence in LLMs by efficiently extracting, consolidating, and retrieving information, outperforming existing memory systems in terms of accuracy and computational efficiency.
· Published on Apr 28, 2025
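Mem0 is described as extracting, consolidating, and retrieving conversational facts. A minimal sketch of the consolidation and retrieval steps, assuming facts arrive already extracted as (subject, attribute, value) triples and that a simple keyed rule decides whether each one is added, updated, or ignored. The actual Mem0 extraction and consolidation logic, including its graph-based memory, is not modeled; `consolidate` and `retrieve` are hypothetical names.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (subject, attribute), e.g. ("user", "city")


def consolidate(memory: Dict[Key, str], fact: Tuple[str, str, str]) -> str:
    """Merge one extracted fact (subject, attribute, value) into memory in place.

    Returns the operation applied: "ADD" for a new key, "UPDATE" when the
    stored value changes, "NOOP" when the fact is already known.
    """
    subject, attribute, value = fact
    key = (subject, attribute)
    if key not in memory:
        memory[key] = value
        return "ADD"
    if memory[key] != value:
        memory[key] = value  # newer fact overwrites the stale one
        return "UPDATE"
    return "NOOP"


def retrieve(memory: Dict[Key, str], subject: str) -> Dict[str, str]:
    """Return every stored attribute for a subject."""
    return {attr: val for (subj, attr), val in memory.items() if subj == subject}
```

Keying on (subject, attribute) keeps memory compact: a repeated fact is a no-op and a changed fact replaces the old value instead of accumulating contradictory entries.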
Submitted by taesiri
GLM-5: from Vibe Coding to Agentic Engineering
GLM-5 advances foundation models with DSA for cost reduction, asynchronous reinforcement learning for improved alignment, and enhanced coding capabilities for real-world software engineering.
· Published on Feb 17, 2026
Submitted by yyamada
AutoDev: Automated AI-Driven Development
AutoDev is an AI-driven software development framework that automates complex engineering tasks within a secure Docker environment, achieving high performance in code and test generation.
- 5 authors
· Published on Mar 13, 2024
Submitted by taesiri
Submitted by jarridrb
Submitted by rubenohana
Submitted by taesiri
Submitted by taesiri
Submitted by akhaliq
Very Large-Scale Multi-Agent Simulation in AgentScope
Enhancements to the AgentScope platform improve scalability, efficiency, and ease of use for large-scale multi-agent simulations through distributed mechanisms, flexible environments, and user-friendly tools.
· Published on Jul 25, 2024
Submitted by taesiri
Submitted by wangzx1994
Generative World Renderer
A large-scale dynamic dataset derived from AAA games is introduced to improve generative inverse and forward rendering, featuring high-resolution synchronized RGB and G-buffer data alongside a novel VLM-based evaluation method that correlates well with human judgment.
Submitted by Virgilllll
Submitted by akhaliq
Submitted by Rbin
RAG-Anything: All-in-One RAG Framework
RAG-Anything is a unified framework that enhances multimodal knowledge retrieval by integrating cross-modal relationships and semantic matching, outperforming existing methods on complex benchmarks.
Submitted by andito
Submitted by jianchen0311
DFlash: Block Diffusion for Flash Speculative Decoding
DFlash is a speculative decoding framework that uses a lightweight block diffusion model for parallel token drafting, achieving significant speedup over existing autoregressive methods while maintaining high-quality outputs.
· Published on Feb 5, 2026
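Speculative decoding, which DFlash builds on, lets a cheap drafter propose several tokens that the large target model then verifies. The sketch below shows the standard greedy draft-then-verify loop with toy integer "models"; it does not implement the DFlash block-diffusion drafter, which per the summary drafts the whole block in parallel. `draft_block`, `speculative_step`, and the lambda models in the usage are hypothetical.

```python
from typing import Callable, List

Token = int
# A "model" here is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[Token]], Token]


def draft_block(draft: NextToken, prefix: List[Token], k: int) -> List[Token]:
    """Propose k tokens. DFlash drafts a block in parallel with block diffusion;
    this toy stand-in simply calls a cheap model k times."""
    out: List[Token] = []
    for _ in range(k):
        out.append(draft(prefix + out))
    return out


def speculative_step(target: NextToken, draft: NextToken, prefix: List[Token], k: int) -> List[Token]:
    """One greedy draft-then-verify step.

    The target checks each drafted token; the longest agreeing prefix is
    accepted, and the target's own token replaces the first mismatch (or is
    appended as a bonus token if all k drafts are accepted). A real system
    scores all k positions in one batched target forward pass; the loop here
    calls the target sequentially for clarity.
    """
    proposal = draft_block(draft, prefix, k)
    accepted: List[Token] = []
    for tok in proposal:
        expected = target(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # correct the first mismatch and stop
            return accepted
        accepted.append(tok)
    accepted.append(target(prefix + accepted))  # bonus token after full acceptance
    return accepted
```

Each step emits between 1 and k + 1 tokens, so the speedup depends on how often the drafter agrees with the target. Because every emitted token is the target's own greedy choice, the output matches greedy decoding with the target alone.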
Submitted by vinthony
Submitted by akhaliq
Submitted by ethanchern
Submitted by taesiri
Submitted by quao627
Submitted by Jiabin99
Submitted by taesiri
Memory Intelligence Agent
The Memory Intelligence Agent framework integrates non-parametric and parametric memory systems with reinforcement learning to enable efficient reasoning and autonomous evolution in open-world environments.
- 9 authors
· Published on Apr 6, 2026
Self-Supervised Prompt Optimization
A self-supervised framework optimizes prompts for both closed-ended and open-ended tasks by evaluating LLM outputs without external references, reducing both cost and the amount of data required.
· Published on Feb 7, 2025
Submitted by Jeff-Wang
GigaWorld-Policy: An Efficient Action-Centered World-Action Model
GigaWorld-Policy introduces an action-centered World-Action Model that improves robotic policy learning by decoupling visual and motion representations, enabling faster inference and better task performance through dual supervision from action prediction and video generation.
· Published on Mar 18, 2026
Submitted by hao-li
Agent READMEs: An Empirical Study of Context Files for Agentic Coding
Agentic coding tools receive goals written in natural language as input, break them down into specific tasks, and write or execute the actual code with minimal human intervention. Central to this process are agent context files ("READMEs for agents") that provide persistent, project-level instructions. In this paper, we conduct the first large-scale empirical study of 2,303 agent context files from 1,925 repositories to characterize their structure, maintenance, and content. We find that these files are not static documentation but complex, difficult-to-read artifacts that evolve like configuration code, maintained through frequent, small additions. Our content analysis of 16 instruction types shows that developers prioritize functional context, such as build and run commands (62.3%), implementation details (69.9%), and architecture (67.7%). We also identify a significant gap: non-functional requirements like security (14.5%) and performance (14.5%) are rarely specified. These findings indicate that while developers use context files to make agents functional, they provide few guardrails to ensure that agent-written code is secure or performant, highlighting the need for improved tooling and practices.

- 11 authors
· Published on Nov 17, 2025
Submitted by groundhogLLM
Efficient Universal Perception Encoder
Efficient Universal Perception Encoder (EUPE) improves edge device performance by distilling knowledge from multiple vision encoders through a two-stage scaling approach, achieving superior representation quality compared to previous methods.
- 11 authors
· Published on Mar 23, 2026
Submitted by yxl66666
Submitted by taesiri
Qwen3-TTS Technical Report
The Qwen3-TTS series presents advanced multilingual text-to-speech models with voice cloning and controllable speech generation capabilities, using a dual-track LM architecture and specialized speech tokenizers for efficient streaming synthesis.
· Published on Jan 22, 2026
Submitted by taesiri
In-Place Test-Time Training
In-Place Test-Time Training enables large language models to adapt parameters during inference by modifying the final projection matrix in MLP blocks with a task-aligned objective and efficient update mechanism.
- 7 authors
· Published on Apr 7, 2026
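The summary says In-Place Test-Time Training adapts only the final projection matrix of MLP blocks during inference. A minimal pure-Python sketch of that idea, assuming a two-layer MLP block and a simple squared-error objective as a stand-in for the task-aligned objective, which the summary does not specify: one gradient step updates `w_out` in place while `w_in` stays frozen. All names and the loss are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product with plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def mlp_forward(w_in: Matrix, w_out: Matrix, x: Vector) -> Vector:
    """Two-layer MLP block: w_out @ relu(w_in @ x)."""
    return matvec(w_out, relu(matvec(w_in, x)))


def in_place_update(w_in: Matrix, w_out: Matrix, x: Vector, target: Vector, lr: float) -> float:
    """One test-time gradient step that modifies only w_out, in place.

    Assumed objective (hypothetical stand-in for the paper's loss):
        L = 0.5 * ||w_out @ a - target||^2,  with a = relu(w_in @ x).
    Its gradient with respect to w_out is the outer product (w_out @ a - target) a^T.
    Returns the loss before the update.
    """
    a = relu(matvec(w_in, x))
    err = [y_hat - y for y_hat, y in zip(matvec(w_out, a), target)]
    for i, e in enumerate(err):
        for j, a_j in enumerate(a):
            w_out[i][j] -= lr * e * a_j  # w_in is left frozen
    return 0.5 * sum(e * e for e in err)
```

Restricting updates to the output projection keeps the per-step cost to a single outer product and leaves the rest of the network untouched, which is one plausible reason such a method can run during inference.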