Weining Yang - TikTok | LinkedIn
- Yizhe Zhang Apple • 3K followers We (w/ Shansan Gong, Ruixiang Zhang, Huangjie Zheng, Jiatao Gu, Navdeep Jaitly, Lingpeng Kong) released DiffuCoder, a family of 7B diffusion language models that specializes in code generation, with a focus on understanding and improving masked diffusion models. A core piece of the DiffuCoder analysis is the autoregressiveness (AR-ness) score, a novel metric that quantifies the causal patterns in decoding, revealing how diffusion models break from strict left-to-right generation for more flexible, non-linear code planning. Autoregressive (AR) models currently dominate code generation, but diffusion-based LLMs (dLLMs) like DiffuCoder offer a promising alternative, especially for complex programming tasks. DiffuCoder explores how these models decode differently, showing less global AR-ness on code tasks than on math, and how temperature affects both token selection and generation order, unlike in traditional AR models. We also introduce coupled-GRPO, a post-training RL method with a coupled-sampling scheme that reduces the performance drop during accelerated decoding, boosting parallelism and efficiency. We use a self-improvement pipeline that leverages AR-ness analysis, coupled-GRPO optimization, and evaluation on benchmarks like AceCode-89k to refine decoding strategies. This approach enables DiffuCoder to navigate diverse code generation pathways and enhance performance with modest computational overhead. Looking ahead, we aim to further leverage reinforcement learning to steer code generation through these decoding patterns, with the discrete nature of AR-ness scores providing a foundation for search-based strategies, ideal for the sparse rewards of optimizing complex code structures. Check out our full paper and code for a deeper dive! Paper: https://lnkd.in/gVWU3BDJ Code: https://lnkd.in/gmXTZ_6n Models: https://lnkd.in/gTcKCDr9 #MachineLearning #AI #CodeGeneration #DiffusionModels #NLP
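To make the intuition behind an AR-ness score concrete, here is a simplified, illustrative sketch (not the paper's exact formula): score a decoding trace by the fraction of steps that unmask the position immediately to the right of the previously unmasked one, so a strict left-to-right decoder scores 1.0 and a scattered, diffusion-style order scores near 0.0.

```python
def local_ar_ness(decode_order):
    """Fraction of decoding steps that fill the slot right after
    the previously filled one.

    `decode_order` lists token positions in the order they were
    unmasked. A strict left-to-right decoder yields 1.0; fully
    scattered orders approach 0.0. Simplified local variant for
    illustration only, not the metric as defined in the paper.
    """
    if len(decode_order) < 2:
        return 1.0
    hits = sum(
        1 for prev, cur in zip(decode_order, decode_order[1:])
        if cur == prev + 1
    )
    return hits / (len(decode_order) - 1)

print(local_ar_ness([0, 1, 2, 3]))  # strictly left-to-right -> 1.0
print(local_ar_ness([2, 0, 3, 1]))  # scattered, diffusion-style -> 0.0
```

A score like this is discrete and cheap to compute per sample, which is what makes it usable as a signal for search-based decoding strategies.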
- Antonio Mallia Seltz • 4K followers ⚡ Exciting to see our Block-Max Pruning (BMP) technique in Infinity, an open-source AI-native database designed for LLM applications! In their latest VLDB paper, “Balancing the Blend: An Experimental Analysis of Trade-offs in Hybrid Search”, Hai Jin, Yingfeng Zhang, and co-authors present a rigorous evaluation of hybrid search architectures — combining full-text, sparse, dense, and tensor retrieval. To support efficient sparse vector search at scale, they’ve integrated BMP into Infinity’s SVS engine — a nice validation of our work on fast, top-k lexical retrieval. 🔗 BMP paper: https://lnkd.in/dsc33hGc 🔗 BMP code: https://lnkd.in/dxBxv225 🔗 Infinity: https://lnkd.in/ddRK5mbr 🔗 Hybrid Search paper: https://lnkd.in/dfBuDXmt Great to see ideas from traditional IR continuing to shape the next generation of retrieval infrastructure!
- Milvus, created by Zilliz 13K followers 𝗧𝗵𝗲 𝗯𝗲𝘀𝘁 𝗔𝗜 𝗰𝗼𝗱𝗶𝗻𝗴 𝗹𝗲𝘀𝘀𝗼𝗻 𝗰𝗼𝘀𝘁 𝗼𝘂𝗿 𝗲𝗻𝗴𝗶𝗻𝗲𝗲𝗿 $𝟲𝟬𝟬 𝗮𝗻𝗱 𝗮 𝗺𝗮𝗿𝗿𝗶𝗮𝗴𝗲 𝗮𝗿𝗴𝘂𝗺𝗲𝗻𝘁. Our VP of Engineering, Xiaofan (James) Luan, was supposed to buy his wife a Dior bag for their anniversary. Instead, he bought three Claude Code subscriptions and spent the holiday trying to cross-compile 2 million lines of C++. Every fix on one platform broke two others. $600 later, the only output was "git reset --hard" and a very cold dinner table. 😂 "Make it compile on Windows" is a trap. The real goal was "compile everywhere without hacks", and no AI is going to figure that out for you at 2 am. What worked: constraints before code, reviewing tests rather than code, and working bottom-up, one layer at a time. Same task, two days. Then he ran six parallel Claude sessions across three machines with git worktree. The bottleneck stopped being intelligence and started being how fast one person can alt-tab. AI solves exactly the problem you give it. Engineering is knowing which problem to give it. His wife is still waiting for that bag. Full story: https://lnkd.in/gtsW_Wvk ——— Follow Milvus, created by Zilliz, for everything related to unstructured data
- Pavan Kumar Mercedes-Benz Research and… • 2K followers ✍ The "RAM Killer" in LLMs: Why PagedAttention is the Industry Standard 😊 In a system design interview, if you are asked "How do you serve a Llama-3-70B model to 1,000 concurrent users/min?" and you answer "Just buy more GPUs," you fail. The bottleneck in LLM inference usually isn't compute; it's memory bandwidth and capacity, specifically the KV cache. Standard attention wastes 60-80% of GPU memory due to fragmentation. PagedAttention (used in vLLM) solves this by borrowing a 30-year-old concept from operating systems: virtual memory paging. Here is the math of the KV cache and the architecture of PagedAttention. 👇 1. The Math: Why the #KVCache Explodes When an LLM generates token #100, it needs to attend to tokens #1-#99. We don't want to recalculate the Key (K) and Value (V) matrices for those 99 tokens every single time, so we cache them. The memory formula: for a single request, the VRAM consumed by the KV cache is Size = 2 × L × N_layers × D_model × P_precision, where 2 covers one copy for Key and one for Value, L is the sequence length (context), N_layers is the number of layers, D_model is the hidden dimension, and P_precision is the bytes per value (e.g., 2 for FP16). The problem (internal fragmentation): in standard frameworks (like the HuggingFace default), you must pre-allocate a contiguous block of memory for the maximum context length (e.g., 4096 tokens). If the user only types 10 words, you wasted 4086 slots of VRAM. This prevents you from batching other users. 2. The Solution: #PagedAttention PagedAttention breaks the KV cache into small, fixed-size blocks (e.g., 16 tokens per block). These blocks do not need to be contiguous in physical memory. Logical KV blocks: what the model sees (contiguous). Block table: the map (just like an OS page table). Physical KV blocks: where the data actually sits (scattered/non-contiguous). #LLMOps #SystemDesign #GPUOptimization #vLLM #MachineLearning #DeepLearning #Engineering #AIInfrastructure #StatPavan
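Plugging the formula into a few lines of Python makes the fragmentation problem concrete. The 80-layer / 8192-dim / FP16 configuration below is an assumed 70B-class setup for illustration, and the formula assumes full multi-head attention (one K/V pair per hidden dimension); real models using grouped-query attention cache considerably less.

```python
def kv_cache_bytes(seq_len, n_layers, d_model, bytes_per_val):
    # Size = 2 (K and V) x L x N_layers x D_model x P_precision
    return 2 * seq_len * n_layers * d_model * bytes_per_val

# Hypothetical 70B-class config: 80 layers, d_model = 8192, FP16.
full = kv_cache_bytes(4096, 80, 8192, 2)  # pre-allocated for max context
used = kv_cache_bytes(10, 80, 8192, 2)    # user actually typed 10 tokens

print(f"pre-allocated: {full / 2**30:.1f} GiB")       # 10.0 GiB
print(f"actually used: {used / 2**20:.1f} MiB")       # 25.0 MiB
print(f"wasted by fragmentation: {100 * (1 - used / full):.2f}%")
```

That is ~10 GiB of VRAM reserved per request, of which over 99% sits idle for a short prompt; paging the cache into small non-contiguous blocks is what lets vLLM hand that memory to other requests in the batch.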
- Ramin Mehran Google DeepMind • 4K followers In this episode, we discuss ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models by Mingjie Liu, Shizhe Diao, Ximing Lu, Jian Hu, Xin Dong, Yejin Choi, Jan Kautz, Yi Dong. This paper introduces ProRL, a new reinforcement learning training method that uncovers novel reasoning strategies beyond those found in base language models. Empirical results show that models trained with ProRL consistently outperform base models on challenging reasoning tasks, including cases where base models fail even with extensive attempts. The study demonstrates that prolonged RL can meaningfully expand reasoning capabilities by exploring new solution spaces over time, advancing understanding of how RL enhances language model reasoning.
- Ilyes.T. M. GM CAPITAL HOLDING • 10K followers Google Antigravity exposes the critical flaw in autonomous AI agent architectures: trust-based governance at scale. Antigravity represents a major shift in how developers work. Instead of writing code line by line, you delegate entire tasks to autonomous agents that can modify files, run tests, browse the web, and execute changes across your codebase in parallel. The problem? These agents operate on trust, not proof. One developer reported that on day 3, an agent confidently refactored a utility function and silently deleted a critical edge-case check. This is not a bug. It is the inevitable result of autonomous agents operating without cryptographic authority validation. When you have three agents working asynchronously across different files, two critical questions emerge: how do you enforce what each agent is authorized to do, and when multiple agents coordinate on shared resources, how do you maintain isolation between their operations? Policy-based guardrails do not work at this scale. I have solved both problems through complementary cryptographic architectures. For individual agent authorization, my 13-layer cryptographic governance system validates AI agent authority mathematically before execution. For multi-agent coordination, my YIN COLLAB architecture implements Agent-Specific Compliance Tokens with per-agent privacy isolation, maintaining cryptographic boundaries that prevent cross-contamination. Every action carries immutable proof of authorization. Every decision boundary is pre-validated cryptographically. 26 USPTO patents. 2,330 claims. Validated with 640x timing resistance and 500+ concurrent-agent support with sub-15 ms latency. Mathematical proof, not policy promises. For developers using Antigravity, Cursor, or any autonomous agent platform: the question is whether you can prove mathematically that unauthorized operations are impossible and that multi-agent coordination maintains isolation.
Because as agent orchestration becomes the dominant development paradigm, the liability surface expands exponentially. One misconfigured agent with access to production systems is an organizational failure. One agent leaking sensitive data to another agent through shared context is an architectural vulnerability. Making it mathematically impossible for agents to violate boundaries is the only governance model that scales. Autonomous agents are the future of software development. Cryptographic governance is the only way to make that future safe. https://lnkd.in/ejPREk9D #GoogleAntigravity #AIGovernance #AutonomousAgents #Cybersecurity #AIAgents #DeveloperTools #CryptographicSecurity #ZeroTrust #AICompliance #SoftwareDevelopment #TechInnovation #AIEthics
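For a flavor of what "proof of authorization, not trust" means in code, here is a minimal generic sketch using an HMAC-signed capability token. This is a standard-library illustration of the general idea, not the author's patented architecture; the agent/action/resource naming is a hypothetical scheme for the example.

```python
import hmac
import hashlib

SECRET = b"demo-key"  # per-agent signing key; hard-coded for illustration only

def mint_token(agent_id: str, action: str, resource: str) -> str:
    """Sign an (agent, action, resource) capability with HMAC-SHA256."""
    msg = f"{agent_id}|{action}|{resource}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def authorized(agent_id: str, action: str, resource: str, token: str) -> bool:
    """Constant-time check that the token matches the requested operation.

    An agent presenting a token minted for one action cannot reuse it
    for a different action or resource: the signature will not match.
    """
    expected = mint_token(agent_id, action, resource)
    return hmac.compare_digest(expected, token)

token = mint_token("agent-1", "write", "src/utils.py")
print(authorized("agent-1", "write", "src/utils.py", token))   # True
print(authorized("agent-1", "delete", "src/utils.py", token))  # False
```

The point of the sketch is the failure mode it removes: the refactoring agent in the anecdote above would need a valid token for exactly the file and operation it attempts, checked before execution rather than audited after.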
- Ivan Djordjevic System / Instruction… • 4K followers From self-driving cars to self-driving agents: the same rigor now comes to LLM evaluation. In self-driving, you can't ship a car without simulation and evaluation. Scorecard brings that same discipline to AI agents. Built by the same Waymo engineer who built self-driving sim and eval, Scorecard introduces reproducible, automated scoring for agent workflows, testing multi-step reasoning, tool usage, and task completion under consistent conditions. You can run LLM-as-judge evaluations directly in CI/CD or a playground, then debug issues with OpenTelemetry traces that reveal which tool failed, why an agent looped, or where its reasoning went off track. More than an eval suite, Scorecard is a collaboration hub for shared datasets, simulated agents, and custom metrics, designed to make agent evaluation transparent, auditable, and standardized across teams. In a world where models change weekly, Scorecard gives AI builders a stable foundation for measuring progress. 🚦 Run your first eval: https://app.scorecard.io 📘 Docs: https://docs.scorecard.io
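As a generic illustration of wiring an LLM-as-judge score into a CI gate (this is not Scorecard's API; the `Score: k/n` reply format and the threshold are assumptions for the sketch), the judge's free-text verdict gets parsed into a number and compared against a pass bar:

```python
import re

def parse_judge_score(judge_reply: str) -> float:
    """Extract a 'Score: k/n' line from a judge model's reply,
    normalized to [0, 1]. The reply format is an assumed convention."""
    m = re.search(r"Score:\s*(\d+)\s*/\s*(\d+)", judge_reply)
    if not m:
        raise ValueError("judge reply missing a 'Score: k/n' line")
    return int(m.group(1)) / int(m.group(2))

def ci_gate(score: float, threshold: float = 0.8) -> bool:
    """CI gate: pass only if judged quality meets the threshold."""
    return score >= threshold

reply = "The agent completed the task but looped once on tool calls. Score: 4/5"
score = parse_judge_score(reply)
print(score, ci_gate(score))  # 0.8 True
```

Running a check like this on every commit is what turns "the model changed this week" from a vague worry into a red or green build.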