Machine Learning Street Talk (MLST)

We interview Professor Christopher Summerfield from Oxford University about his new book "These Strange New Minds: How AI Learned to Talk and What It Means". AI learned to understand the world just by reading text - something scientists thought was impossible. You don't need to see a cat to know what one is; you can learn everything from words alone. This is "the most astonishing scientific discovery of the 21st century."

People are split: some refuse to call what AI does "thinking" even when it outperforms humans, while others believe that if it acts intelligent, it is intelligent. Summerfield takes the middle ground - AI does something genuinely like human reasoning, but that doesn't make it human.

SPONSOR MESSAGES:
Google Gemini: Google Gemini features Veo 3, a state-of-the-art AI video generation model, in the Gemini app. Sign up at https://gemini.google.com
Tufa AI Labs are hiring ML Engineers and a Chief Scientist in Zurich/SF. They are top of the ARCv2 leaderboard! https://tufalabs.ai/

Prof. Christopher Summerfield
https://www.psy.ox.ac.uk/people/christopher-summerfield

These Strange New Minds: How AI Learned to Talk and What It Means
https://amzn.to/4e26BVa

Table of Contents:
Introduction & Setup
00:00:00 Superman 3 Metaphor - Humans Absorbed by Machines
00:02:01 Book Introduction & AI Debate Context
00:03:45 Sponsor Segments (Google Gemini, Tufa Labs)
Philosophical Foundations
00:04:48 The Fractured AI Discourse
00:08:21 Ancient Roots: Aristotle vs Plato (Empiricism vs Rationalism)
00:10:14 Historical AI: Symbolic Logic and Its Limits
The Language Revolution
00:12:11 ChatGPT as the Rubicon Moment
00:14:00 The Astonishing Discovery: Learning Reality from Words Alone
00:15:47 Equivalentists vs Exceptionalists Debate
Cognitive Science Perspectives
00:19:12 Functionalism and the Duck Test
00:21:48 Brain-AI Similarities and Computational Principles
00:24:53 Reconciling Chomsky: Evolution vs Learning
00:28:15 Lamarckian AI vs Darwinian Human Learning
The Reality of AI Capabilities
00:30:29 Anthropomorphism and the Clever Hans Effect
00:32:56 The Intentional Stance and Nature of Thinking
00:37:56 Three Major AI Worries: Agency, Personalization, Dynamics
Societal Risks and Complex Systems
00:37:56 AI Agents and Flash Crash Scenarios
00:42:50 Removing Frictions: The Lawfare Example
00:46:15 Gradual Disempowerment Theory
00:49:18 The Faustian Pact of Technology
Human Agency and Control
00:51:18 The Crisis of Authenticity
00:56:22 Psychology of Control vs Reward
01:00:21 Dopamine Hacking and Variable Reinforcement
Future Directions
01:02:27 Evolution as Goal-less Optimization
01:03:31 Open-Endedness and Creative Evolution
01:06:46 Writing, Creativity, and AI-Generated Content
01:08:18 Closing Remarks

REFS (abbreviated):
Essential Books
"These Strange New Minds" - C. Summerfield [00:02:01] - Main discussion topic
"The Mind is Flat" - N. Chater [00:33:45] - Summerfield's favorite on cognitive illusions
"AI: A Guide for Thinking Humans" - M. Mitchell [00:04:58] - Host's previous favorite
"Principia Mathematica" - Russell & Whitehead [00:11:00] - Logic Theorist reference
"Syntactic Structures" - N. Chomsky (1957) [00:13:30] - Generative grammar foundation
"Why Greatness Cannot Be Planned" - Stanley & Lehman [01:04:00] - Open-ended evolution
Key Papers & Studies
"Gradual Disempowerment" - D. Duvenaud [00:46:45] - AI threat model
"Counterfeit People" - D. Dennett (The Atlantic) [00:52:45] - AI societal risks
"Open-Endedness is Essential..." - DeepMind/Rocktäschel/Hughes [01:03:42]
Heider & Simmel (1944) [00:30:45] - Agency attribution to shapes
Whitehall Studies - M. Marmot [00:59:32] - Control and health outcomes
"Clever Hans" - O. Pfungst (1911) [00:31:47] - Animal intelligence illusion
Historical References


"Blurring Reality" - Chai's Social AI Platform (SPONSORED)

This episode of MLST explores the work of Chai, a social AI platform that quietly built one of the world's largest AI companion ecosystems before ChatGPT's mainstream adoption. With over 10 million active users and just 13 engineers serving 2 trillion tokens per day, Chai discovered the massive appetite for AI companionship through serendipity while searching for product-market fit.

Chai sponsored this show because they want to hire amazing engineers.

CAREER OPPORTUNITIES AT CHAI:
Chai is actively hiring in Palo Alto with competitive compensation ($300K-$800K + equity) for roles including AI Infrastructure Engineers, Software Engineers, Applied AI Researchers, and more. Fast-track qualification is available for candidates with significant product launches, open-source contributions, or entrepreneurial success.
https://www.chai-research.com/jobs/

The conversation with founder William Beauchamp and engineers Tom Lu and Nischay Dhankhar covers Chai's technical approaches, including reinforcement learning from human feedback (RLHF), model-blending techniques that combine smaller models to outperform larger ones, and the infrastructure challenges of running exaflop-class compute.

SPONSOR MESSAGES:
***
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers in Zurich and SF.
Go to https://tufalabs.ai/
***

Key themes explored include:
- The ethics of AI engagement optimization and attention hacking
- Content moderation at scale with a lean engineering team
- The shift from AI as utility tool to AI as social companion
- How users form deep emotional bonds with artificial intelligence
- The broader implications of AI becoming a social medium

We also examine OpenAI's recent pivot toward companion AI with the April update to GPT-4o, suggesting a fundamental shift in how we interact with artificial intelligence - from utility-focused tools to companion-like experiences that blur the lines between human and artificial intimacy.

The episode also covers Chai's unconventional approach of hiring only top-tier engineers, its bootstrap funding strategy focused on user revenue over VC funding, and its rapid experimentation culture in which one in five experiments succeeds.

TOC:
00:00:00 - Introduction: Steve Jobs' AI Vision & Chai's Scale
00:04:02 - Chapter 1: Simulators - The Birth of Social AI
00:13:34 - Chapter 2: Engineering at Chai - RLHF & Model Blending
00:21:49 - Chapter 3: Social Impact of GenAI - Ethics & Safety
00:33:55 - Chapter 4: The Lean Machine - 13 Engineers, Millions of Users
00:42:38 - Chapter 5: GPT-4o Becoming a Companion - OpenAI's Pivot
00:50:10 - Chapter 6: What Comes Next - The Future of AI Intimacy

TRANSCRIPT:
https://www.dropbox.com/scl/fi/yz2ewkzmwz9rbbturfbap/CHAI.pdf?rlkey=uuyk2nfhjzezucwdgntg5ubqb&dl=0
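As a rough illustration of what model blending can mean in a companion-chat setting (a hypothetical sketch, not Chai's implementation: one published variant of the idea simply routes each conversation turn to a randomly chosen small model, so the conversation as a whole draws on all of them while only paying for one per reply; the `terse`/`chatty` model functions below are stand-ins):

```python
import random

def make_blended_chat(models, rng=None):
    """Return a chat function that routes each user turn to a randomly
    chosen base model. The conversation as a whole blends the styles
    of several small models at the serving cost of a single one."""
    rng = rng or random.Random()
    history = []

    def chat(user_message):
        history.append(("user", user_message))
        model = rng.choice(models)   # per-turn random routing
        reply = model(history)       # every model sees the full history
        history.append(("assistant", reply))
        return reply

    return chat

# Stand-in "models": plain functions instead of real LLM endpoints.
terse = lambda history: "ok"
chatty = lambda history: "Tell me more about: " + history[-1][1]

chat = make_blended_chat([terse, chatty], rng=random.Random(0))
reply = chat("I like hiking")
```

Because routing happens per turn rather than per conversation, users experience a mixture of the underlying models, which is one way a pool of small models can feel richer than any single member.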

Today Google DeepMind released AlphaEvolve: a Gemini coding agent for algorithm discovery. It beat the famous Strassen algorithm for matrix multiplication, a record set 56 years ago. Google has been killing it recently. We had early access to the paper and interviewed the researchers behind the work.

AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms
https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/

Authors: Alexander Novikov*, Ngân Vũ*, Marvin Eisenberger*, Emilien Dupont*, Po-Sen Huang*, Adam Zsolt Wagner*, Sergey Shirobokov*, Borislav Kozlovskii*, Francisco J. R. Ruiz, Abbas Mehrabian, M. Pawan Kumar, Abigail See, Swarat Chaudhuri, George Holland, Alex Davies, Sebastian Nowozin, Pushmeet Kohli, Matej Balog*
(* equal contribution)

SPONSOR MESSAGES:
***
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.
Go to https://tufalabs.ai/
***

AlphaEvolve works like a very smart, tireless programmer. It uses powerful AI language models (like Gemini) to generate ideas for computer code, then applies an "evolutionary" process - survival of the fittest for programs. It tries out many different program ideas, automatically tests how well they solve a problem, and uses the best ones to inspire new, even better programs.

Beyond this mathematical breakthrough, AlphaEvolve has already been used to improve real-world systems at Google, such as making its massive data centers run more efficiently and even speeding up the training of the AI models that power AlphaEvolve itself. The discussion also covers how humans work with AlphaEvolve, the challenges of making AI discover things, and the exciting future of AI helping scientists make new discoveries.

In short, AlphaEvolve is a powerful new AI tool that can invent new algorithms and solve complex problems, showing how AI can be a creative partner in science and engineering.

Guests:
Matej Balog: https://x.com/matejbalog
Alexander Novikov: https://x.com/SashaVNovikov

REFS:
MAP-Elites [Jean-Baptiste Mouret, Jeff Clune]
https://arxiv.org/abs/1504.04909
FunSearch [Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M. Pawan Kumar, Emilien Dupont, Francisco J. R. Ruiz, Jordan S. Ellenberg, Pengming Wang, Omar Fawzi, Pushmeet Kohli & Alhussein Fawzi]
https://www.nature.com/articles/s41586-023-06924-6

TOC:
[00:00:00] Introduction: AlphaEvolve's Breakthroughs, DeepMind's Lineage, and Real-World Impact
[00:12:06] Introducing AlphaEvolve: Concept, Evolutionary Algorithms, and Architecture
[00:16:56] Search Challenges: The Halting Problem and Enabling Creative Leaps
[00:23:20] Knowledge Augmentation: Self-Generated Data, Meta-Prompting, and Library Learning
[00:29:08] Matrix Multiplication Breakthrough: From Strassen to AlphaEvolve's 48 Multiplications
[00:39:11] Problem Representation: Direct Solutions, Constructors, and Search Algorithms
[00:46:06] Developer Reflections: Surprising Outcomes and Superiority over Simple LLM Sampling
[00:51:42] Algorithmic Improvement: Hill Climbing, Program Synthesis, and Intelligibility
[01:00:24] Real-World Application: Complex Evaluations and Robotics
[01:05:39] Role of LLMs & Future: Advanced Models, Recursive Self-Improvement, and Human-AI Collaboration
[01:11:22] Resource Considerations: Compute Costs of AlphaEvolve

This is a trial of posting videos on Spotify - thoughts? Email me or chat in our Discord.
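The generate-test-select loop described above ("survival of the fittest for programs") can be sketched in miniature. This is a toy numeric stand-in, not DeepMind's system: where AlphaEvolve asks Gemini to propose code edits and runs real program evaluations, the sketch mutates a vector and scores it with a hand-written fitness function:

```python
import random

random.seed(42)  # reproducible toy run

def evolve(fitness, seed, mutate, generations=200, population=20, keep=5):
    """Generic evolutionary loop: propose variants of the current best
    candidates, score each one automatically, and seed the next round
    from the top performers (the survivors are kept, so elitism holds)."""
    pool = [seed]
    for _ in range(generations):
        # Propose new candidates by mutating survivors. AlphaEvolve would
        # ask an LLM for code edits here; we nudge numbers instead.
        candidates = pool + [mutate(random.choice(pool)) for _ in range(population)]
        # Automatic evaluation, then survival of the fittest (lower is better).
        pool = sorted(candidates, key=fitness)[:keep]
    return pool[0]

# Toy problem: evolve a vector toward a hidden target.
target = [3.0, 1.0, 4.0]
fitness = lambda v: sum((a - b) ** 2 for a, b in zip(v, target))
mutate = lambda v: [x + random.gauss(0, 0.5) for x in v]

best = evolve(fitness, seed=[0.0, 0.0, 0.0], mutate=mutate)
```

Keeping the current pool inside `candidates` means the best solution found so far can never regress between generations, the same elitism that evolutionary program search relies on.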

Randall Balestriero joins the show to discuss some counterintuitive findings in AI. He shares research showing that huge language models, even when started from scratch (randomly initialized) without massive pre-training, can learn specific tasks like sentiment analysis surprisingly well, train stably, and avoid severe overfitting, sometimes matching the performance of costly pre-trained models. This raises questions about when giant pre-training efforts are truly worth it.

He also talks about how self-supervised learning (where models learn from the structure of the data itself) and traditional supervised learning (using labeled data) are fundamentally similar, allowing researchers to apply decades of supervised learning theory to improve newer self-supervised methods.

Finally, Randall touches on fairness in AI models used for Earth data (like climate prediction), revealing that these models can be biased, performing poorly in specific locations like islands or coastlines even if they seem accurate overall - which has important implications for policy decisions based on this data.

SPONSOR MESSAGES:
***
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.
Go to https://tufalabs.ai/
***

TRANSCRIPT + SHOWNOTES:
https://www.dropbox.com/scl/fi/n7yev71nsjso71jyjz1fy/RANDALLNEURIPS.pdf?rlkey=0dn4injp1sc4ts8njwf3wfmxv&dl=0

TOC:
1. Model Training Efficiency and Scale
[00:00:00] 1.1 Training Stability of Large Models on Small Datasets
[00:04:09] 1.2 Pre-training vs Random Initialization Performance Comparison
[00:07:58] 1.3 Task-Specific Models vs General LLMs Efficiency
2. Learning Paradigms and Data Distribution
[00:10:35] 2.1 Fair Language Model Paradox and Token Frequency Issues
[00:12:02] 2.2 Pre-training vs Single-task Learning Spectrum
[00:16:04] 2.3 Theoretical Equivalence of Supervised and Self-supervised Learning
[00:19:40] 2.4 Self-Supervised Learning and Supervised Learning Relationships
[00:21:25] 2.5 SSL Objectives and Heavy-tailed Data Distribution Challenges
3. Geographic Representation in ML Systems
[00:25:20] 3.1 Geographic Bias in Earth Data Models and Neural Representations
[00:28:10] 3.2 Mathematical Limitations and Model Improvements
[00:30:24] 3.3 Data Quality and Geographic Bias in ML Datasets

REFS:
[00:01:40] Research on training large language models from scratch on small datasets, Randall Balestriero et al.
https://openreview.net/forum?id=wYGBWOjq1Q
[00:10:35] The Fair Language Model Paradox (2024), Andrea Pinto, Tomer Galanti, Randall Balestriero
https://arxiv.org/abs/2410.11985
[00:12:20] Muppet: Massive Multi-task Representations with Pre-Finetuning (2021), Armen Aghajanyan et al.
https://arxiv.org/abs/2101.11038
[00:14:30] Dissociating language and thought in large language models (2023), Kyle Mahowald et al.
https://arxiv.org/abs/2301.06627
[00:16:05] The Birth of Self-Supervised Learning: A Supervised Theory, Randall Balestriero et al.
https://openreview.net/forum?id=NhYAjAAdQT
[00:21:25] VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning, Adrien Bardes, Jean Ponce, Yann LeCun
https://arxiv.org/abs/2105.04906
[00:25:20] No Location Left Behind: Measuring and Improving the Fairness of Implicit Representations for Earth Data (2025), Daniel Cai, Randall Balestriero, et al.
https://arxiv.org/abs/2502.06831
[00:33:45] Mark Ibrahim et al.'s work on geographic bias in computer vision datasets, Mark Ibrahim
https://arxiv.org/pdf/2304.12210

Prof. Kevin Ellis and Dr. Zenna Tavares talk about making AI smarter, like humans. They want AI to learn from just a little bit of information by actively trying things out, not just by looking at tons of data.

They discuss two main ways AI can "think": one way is like following specific rules or steps (like a computer program), and the other is more intuitive, like guessing based on patterns (like modern AI often does). They found combining both methods works well for solving complex puzzles like ARC.

A key idea is "compositionality" - building big ideas from small ones, like LEGOs. This is powerful but can also be overwhelming. Another important idea is "abstraction" - understanding things simply, without getting lost in details, and knowing there are different levels of understanding.

Ultimately, they believe the best AI will need to explore, experiment, and build models of the world, much like humans do when learning something new.

SPONSOR MESSAGES:
***
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.
Go to https://tufalabs.ai/
***

TRANSCRIPT:
https://www.dropbox.com/scl/fi/3ngggvhb3tnemw879er5y/BASIS.pdf?rlkey=lr2zbj3317mex1q5l0c2rsk0h&dl=0

Zenna Tavares: http://www.zenna.org/
Kevin Ellis: https://www.cs.cornell.edu/~ellisk/

TOC:
1. Compositionality and Learning Foundations
[00:00:00] 1.1 Compositional Search and Learning Challenges
[00:03:55] 1.2 Bayesian Learning and World Models
[00:12:05] 1.3 Programming Languages and Compositionality Trade-offs
[00:15:35] 1.4 Inductive vs Transductive Approaches in AI Systems
2. Neural-Symbolic Program Synthesis
[00:27:20] 2.1 Integration of LLMs with Traditional Programming and Meta-Programming
[00:30:43] 2.2 Wake-Sleep Learning and DreamCoder Architecture
[00:38:26] 2.3 Program Synthesis from Interactions and Hidden State Inference
[00:41:36] 2.4 Abstraction Mechanisms and Resource Rationality
[00:48:38] 2.5 Inductive Biases and Causal Abstraction in AI Systems
3. Abstract Reasoning Systems
[00:52:10] 3.1 Abstract Concepts and Grid-Based Transformations in ARC
[00:56:08] 3.2 Induction vs Transduction Approaches in Abstract Reasoning
[00:59:12] 3.3 ARC Limitations and Interactive Learning Extensions
[01:06:30] 3.4 Wake-Sleep Program Learning and Hybrid Approaches
[01:11:37] 3.5 Project MARA and Future Research Directions

REFS:
[00:00:25] DreamCoder, Kevin Ellis et al.
https://arxiv.org/abs/2006.08381
[00:01:10] Mind Your Step, Ryan Liu et al.
https://arxiv.org/abs/2410.21333
[00:06:05] Bayesian inference, Griffiths, T. L., Kemp, C., & Tenenbaum, J. B.
https://psycnet.apa.org/record/2008-06911-003
[00:13:00] Induction and Transduction, Wen-Ding Li, Zenna Tavares, Yewen Pu, Kevin Ellis
https://arxiv.org/abs/2411.02272
[00:23:15] Neurosymbolic AI, Garcez, Artur d'Avila et al.
https://arxiv.org/abs/2012.05876
[00:33:50] Induction and Transduction (II), Wen-Ding Li, Kevin Ellis et al.
https://arxiv.org/abs/2411.02272
[00:38:35] ARC, François Chollet
https://arxiv.org/abs/1911.01547
[00:39:20] Causal Reactive Programs, Ria Das, Joshua B. Tenenbaum, Armando Solar-Lezama, Zenna Tavares
http://www.zenna.org/publications/autumn2022.pdf
[00:42:50] MuZero, Julian Schrittwieser et al.
http://arxiv.org/pdf/1911.08265
[00:43:20] VisualPredicator, Yichao Liang
https://arxiv.org/abs/2410.23156
[00:48:55] Bayesian models of cognition, Joshua B. Tenenbaum
https://mitpress.mit.edu/9780262049412/bayesian-models-of-cognition/
[00:49:30] The Bitter Lesson, Rich Sutton
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
[01:06:35] Program induction, Kevin Ellis, Wen-Ding Li
https://arxiv.org/pdf/2411.02272
[01:06:50] DreamCoder (II), Kevin Ellis et al.
https://arxiv.org/abs/2006.08381
[01:11:55] Project MARA, Zenna Tavares, Kevin Ellis
https://www.basis.ai/blog/mara/
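The "LEGO" metaphor for compositionality can be made concrete: tiny primitives compose into richer behaviors. The grid primitives below are hypothetical, written in the spirit of ARC-style domain-specific languages rather than taken from the guests' actual systems:

```python
def compose(*fs):
    """Build a big function out of small ones, LEGO-style:
    apply each function in order to the running result."""
    def composed(x):
        for f in fs:
            x = f(x)
        return x
    return composed

# Tiny hypothetical grid "primitives" (grids are lists of rows):
flip_h = lambda grid: [row[::-1] for row in grid]           # mirror left-right
transpose = lambda grid: [list(r) for r in zip(*grid)]      # swap rows/columns

# A new capability emerges from composing two parts:
rotate_cw = compose(transpose, flip_h)  # 90-degree clockwise rotation

result = rotate_cw([[1, 2], [3, 4]])
```

The flip side is the "overwhelming" part mentioned above: with n primitives, the number of depth-k pipelines grows like n^k, which is why compositional search needs abstraction and guidance rather than brute force.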

Eiso Kant, CTO of poolside AI, discusses the company's approach to building frontier AI foundation models, with a particular focus on software development. Their distinctive strategy is reinforcement learning from code execution feedback, which they see as an important axis for scaling AI capabilities beyond just increasing model size or data volume. Kant predicts human-level AI in knowledge work could be achieved within 18-36 months, and outlines poolside's vision to dramatically increase software development productivity and accessibility.

SPONSOR MESSAGES:
***
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.
Go to https://tufalabs.ai/
***

Eiso Kant:
https://x.com/eisokant
https://poolside.ai/

TRANSCRIPT:
https://www.dropbox.com/scl/fi/szepl6taqziyqie9wgmk9/poolside.pdf?rlkey=iqar7dcwshyrpeoz0xa76k422&dl=0

TOC:
1. Foundation Models and AI Strategy
[00:00:00] 1.1 Foundation Models and Timeline Predictions for AI Development
[00:02:55] 1.2 Poolside AI's Corporate History and Strategic Vision
[00:06:48] 1.3 Foundation Models vs Enterprise Customization Trade-offs
2. Reinforcement Learning and Model Economics
[00:15:42] 2.1 Reinforcement Learning and Code Execution Feedback Approaches
[00:22:06] 2.2 Model Economics and Experimental Optimization
3. Enterprise AI Implementation
[00:25:20] 3.1 Poolside's Enterprise Deployment Strategy and Infrastructure
[00:26:00] 3.2 Enterprise-First Business Model and Market Focus
[00:27:05] 3.3 Foundation Models and AGI Development Approach
[00:29:24] 3.4 DeepSeek Case Study and Infrastructure Requirements
4. LLM Architecture and Performance
[00:30:15] 4.1 Distributed Training and Hardware Architecture Optimization
[00:33:01] 4.2 Model Scaling Strategies and Chinchilla Optimality Trade-offs
[00:36:04] 4.3 Emergent Reasoning and Model Architecture Comparisons
[00:43:26] 4.4 Balancing Creativity and Determinism in AI Models
[00:50:01] 4.5 AI-Assisted Software Development Evolution
5. AI Systems Engineering and Scalability
[00:58:31] 5.1 Enterprise AI Productivity and Implementation Challenges
[00:58:40] 5.2 Low-Code Solutions and Enterprise Hiring Trends
[01:01:25] 5.3 Distributed Systems and Engineering Complexity
[01:01:50] 5.4 GenAI Architecture and Scalability Patterns
[01:01:55] 5.5 Scaling Limitations and Architectural Patterns in AI Code Generation
6. AI Safety and Future Capabilities
[01:06:23] 6.1 Semantic Understanding and Language Model Reasoning Approaches
[01:12:42] 6.2 Model Interpretability and Safety Considerations in AI Systems
[01:16:27] 6.3 AI vs Human Capabilities in Software Development
[01:33:45] 6.4 Enterprise Deployment and Security Architecture

CORE REFS (see shownotes for URLs/more refs):
[00:15:45] Research demonstrating how training on model-generated content leads to distribution collapse in AI models, Ilia Shumailov et al. (key finding on synthetic data risk)
[00:20:05] Foundational paper introducing Word2Vec for computing word vector representations, Tomas Mikolov et al. (seminal NLP technique)
[00:22:15] OpenAI o3 model's breakthrough performance on the ARC Prize Challenge, OpenAI (significant AI reasoning benchmark achievement)
[00:22:40] Seminal paper proposing a formal definition of intelligence as skill-acquisition efficiency, François Chollet (influential AI definition/philosophy)
[00:30:30] Technical documentation of DeepSeek's V3 model architecture and capabilities, DeepSeek AI (details on a major new model)
[00:34:30] Foundational paper establishing optimal scaling laws for LLM training, Jordan Hoffmann et al. (key paper on LLM scaling)
[00:45:45] Seminal essay arguing that scaling computation consistently trumps human-engineered solutions in AI, Richard S. Sutton (influential "Bitter Lesson" perspective)
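Reinforcement learning from code execution feedback can be illustrated by the reward half of the loop: run the candidate program, score it by how many tests pass. This is a hedged sketch, not poolside's implementation - the `solve` entry-point name and the pass-rate reward are assumptions for illustration, and a real system would sandbox execution:

```python
def execution_reward(program_src, tests):
    """Score a candidate program by executing it against unit tests and
    returning the fraction that pass. This scalar is the kind of signal
    an RL-from-execution-feedback loop optimizes."""
    scope = {}
    try:
        exec(program_src, scope)   # define the candidate function
    except Exception:
        return 0.0                 # code that doesn't even run earns nothing
    passed = 0
    for args, expected in tests:
        try:
            if scope["solve"](*args) == expected:
                passed += 1
        except Exception:
            pass                   # a crashing test case simply scores zero
    return passed / len(tests)

# Hypothetical task: addition, specified only by test cases.
tests = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
good = "def solve(a, b):\n    return a + b\n"
buggy = "def solve(a, b):\n    return a - b\n"
```

The appeal of this signal is that it is cheap, automatic, and objective, which is what makes code execution a scalable axis for training beyond more parameters or more data.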

Connor Leahy and Gabriel Alfour, AI researchers from Conjecture and authors of "The Compendium," join us for a critical discussion centered on Artificial Superintelligence (ASI) safety and governance. Drawing from their comprehensive analysis in "The Compendium," they articulate a stark warning about the existential risks inherent in uncontrolled AI development, framing it through the lens of "intelligence domination" - where a sufficiently advanced AI could subordinate humanity, much like humans dominate less intelligent species.

SPONSOR MESSAGES:
***
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.
Go to https://tufalabs.ai/
***

TRANSCRIPT + REFS + NOTES:
https://www.dropbox.com/scl/fi/p86l75y4o2ii40df5t7no/Compendium.pdf?rlkey=tukczgf3flw133sr9rgss0pnj&dl=0
https://www.thecompendium.ai/
https://en.wikipedia.org/wiki/Connor_Leahy
https://www.conjecture.dev/about
https://substack.com/@gabecc

TOC:
1. AI Intelligence and Safety Fundamentals
[00:00:00] 1.1 Understanding Intelligence and AI Capabilities
[00:06:20] 1.2 Emergence of Intelligence and Regulatory Challenges
[00:10:18] 1.3 Human vs Animal Intelligence Debate
[00:18:00] 1.4 AI Regulation and Risk Assessment Approaches
[00:26:14] 1.5 Competing AI Development Ideologies
2. Economic and Social Impact
[00:29:10] 2.1 Labor Market Disruption and Post-Scarcity Scenarios
[00:32:40] 2.2 Institutional Frameworks and Tech Power Dynamics
[00:37:40] 2.3 Ethical Frameworks and AI Governance Debates
[00:40:52] 2.4 AI Alignment Evolution and Technical Challenges
3. Technical Governance Framework
[00:55:07] 3.1 Three Levels of AI Safety: Alignment, Corrigibility, and Boundedness
[00:55:30] 3.2 Challenges of AI System Corrigibility and Constitutional Models
[00:57:35] 3.3 Limitations of Current Boundedness Approaches
[00:59:11] 3.4 Abstract Governance Concepts and Policy Solutions
4. Democratic Implementation and Coordination
[00:59:20] 4.1 Governance Design and Measurement Challenges
[01:00:10] 4.2 Democratic Institutions and Experimental Governance
[01:14:10] 4.3 Political Engagement and AI Safety Advocacy
[01:25:30] 4.4 Practical AI Safety Measures and International Coordination

CORE REFS:
[00:01:45] The Compendium (2023), Leahy et al.
https://pdf.thecompendium.ai/the_compendium.pdf
[00:06:50] Geoffrey Hinton Leaves Google, BBC News
https://www.bbc.com/news/world-us-canada-65452940
[00:10:00] ARC-AGI, Chollet
https://arcprize.org/arc-agi
[00:13:25] A Brief History of Intelligence, Bennett
https://www.amazon.com/Brief-History-Intelligence-Humans-Breakthroughs/dp/0063286343
[00:25:35] Statement on AI Risk, Center for AI Safety
https://www.safe.ai/work/statement-on-ai-risk
[00:26:15] Machines of Loving Grace, Amodei
https://darioamodei.com/machines-of-loving-grace
[00:26:35] The Techno-Optimist Manifesto, Andreessen
https://a16z.com/the-techno-optimist-manifesto/
[00:31:55] Technofeudalism, Varoufakis
https://www.amazon.co.uk/Technofeudalism-Killed-Capitalism-Yanis-Varoufakis/dp/1847927270
[00:42:40] Introducing Superalignment, OpenAI
https://openai.com/index/introducing-superalignment/
[00:47:20] Three Laws of Robotics, Asimov
https://www.britannica.com/topic/Three-Laws-of-Robotics
[00:50:00] Symbolic AI (GOFAI), Haugeland
https://en.wikipedia.org/wiki/Symbolic_artificial_intelligence
[00:52:30] Intent Alignment, Christiano
https://www.alignmentforum.org/posts/HEZgGBZTpT4Bov7nH/mapping-the-conceptual-territory-in-ai-existential-safety
[00:55:10] Large Language Model Alignment: A Survey, Jiang et al.
http://arxiv.org/pdf/2309.15025
[00:55:40] Constitutional Checks and Balances, Bok
https://plato.stanford.edu/entries/montesquieu/

We are joined by François Chollet and Mike Knoop to launch the new version of the ARC Prize! In version 2, the challenges have been calibrated with humans such that at least two humans could solve each task in a reasonable amount of time, and they have also been adversarially selected so that frontier reasoning models can't solve them. The best LLMs today get negligible performance on this challenge.
https://arcprize.org/

SPONSOR MESSAGES:
***
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.
Go to https://tufalabs.ai/
***

TRANSCRIPT:
https://www.dropbox.com/scl/fi/0v9o8xcpppdwnkntj59oi/ARCv2.pdf?rlkey=luqb6f141976vra6zdtptv5uj&dl=0

TOC:
1. ARC v2 Core Design & Objectives
[00:00:00] 1.1 ARC v2 Launch and Benchmark Architecture
[00:03:16] 1.2 Test-Time Optimization and AGI Assessment
[00:06:24] 1.3 Human-AI Capability Analysis
[00:13:02] 1.4 OpenAI o3 Initial Performance Results
2. ARC Technical Evolution
[00:17:20] 2.1 ARC-v1 to ARC-v2 Design Improvements
[00:21:12] 2.2 Human Validation Methodology
[00:26:05] 2.3 Task Design and Gaming Prevention
[00:29:11] 2.4 Intelligence Measurement Framework
3. O3 Performance & Future Challenges
[00:38:50] 3.1 O3 Comprehensive Performance Analysis
[00:43:40] 3.2 System Limitations and Failure Modes
[00:49:30] 3.3 Program Synthesis Applications
[00:53:00] 3.4 Future Development Roadmap

REFS:
[00:00:15] On the Measure of Intelligence, François Chollet
https://arxiv.org/abs/1911.01547
[00:06:45] ARC Prize Foundation, François Chollet, Mike Knoop
https://arcprize.org/
[00:12:50] OpenAI o3 model performance on ARC v1, ARC Prize Team
https://arcprize.org/blog/oai-o3-pub-breakthrough
[00:18:30] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Jason Wei et al.
https://arxiv.org/abs/2201.11903
[00:21:45] ARC-v2 benchmark tasks, Mike Knoop
https://arcprize.org/blog/introducing-arc-agi-public-leaderboard
[00:26:05] ARC Prize 2024: Technical Report, François Chollet et al.
https://arxiv.org/html/2412.04604v2
[00:32:45] ARC Prize 2024 Technical Report, François Chollet, Mike Knoop, Gregory Kamradt
https://arxiv.org/abs/2412.04604
[00:48:55] The Bitter Lesson, Rich Sutton
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
[00:53:30] Decoding strategies in neural text generation, Sina Zarrieß
https://www.mdpi.com/2078-2489/12/9/355/pdf

Mohamed Osman joins to discuss MindsAI's highest-scoring entry to the ARC Challenge 2024 and the paradigm of test-time fine-tuning. They explore how the team, now part of Tufa Labs in Zurich, achieved state-of-the-art results using a combination of pre-training techniques, a unique meta-learning strategy, and an ensemble voting mechanism. Mohamed emphasizes the importance of raw data input and the flexibility of the network.

SPONSOR MESSAGES:
***
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.
Go to https://tufalabs.ai/
***

TRANSCRIPT + REFS:
https://www.dropbox.com/scl/fi/jeavyqidsjzjgjgd7ns7h/MoFInal.pdf?rlkey=cjjmo7rgtenxrr3b46nk6yq2e&dl=0

Mohamed Osman (Tufa Labs): https://x.com/MohamedOsmanML
Jack Cole (Tufa Labs): https://x.com/MindsAI_Jack

How and why deep learning for ARC paper:
https://github.com/MohamedOsman1998/deep-learning-for-arc/blob/main/deep_learning_for_arc.pdf

TOC:
1. Abstract Reasoning Foundations
[00:00:00] 1.1 Test-Time Fine-Tuning and ARC Challenge Overview
[00:10:20] 1.2 Neural Networks vs Programmatic Approaches to Reasoning
[00:13:23] 1.3 Code-Based Learning and Meta-Model Architecture
[00:20:26] 1.4 Technical Implementation with Long T5 Model
2. ARC Solution Architectures
[00:24:10] 2.1 Test-Time Tuning and Voting Methods for ARC Solutions
[00:27:54] 2.2 Model Generalization and Function Generation Challenges
[00:32:53] 2.3 Input Representation and VLM Limitations
[00:36:21] 2.4 Architecture Innovation and Cross-Modal Integration
[00:40:05] 2.5 Future of ARC Challenge and Program Synthesis Approaches
3. Advanced Systems Integration
[00:43:00] 3.1 DreamCoder Evolution and LLM Integration
[00:50:07] 3.2 MindsAI Team Progress and Acquisition by Tufa Labs
[00:54:15] 3.3 ARC v2 Development and Performance Scaling
[00:58:22] 3.4 Intelligence Benchmarks and Transformer Limitations
[01:01:50] 3.5 Neural Architecture Optimization and Processing Distribution

REFS:
[00:01:32] Original ARC challenge paper, François Chollet
https://arxiv.org/abs/1911.01547
[00:06:55] DreamCoder, Kevin Ellis et al.
https://arxiv.org/abs/2006.08381
[00:12:50] Deep Learning with Python, François Chollet
https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438
[00:13:35] Influence of pretraining data for reasoning, Laura Ruis
https://arxiv.org/abs/2411.12580
[00:17:50] Latent Program Networks, Clement Bonnet
https://arxiv.org/html/2411.08706v1
[00:20:50] T5, Colin Raffel et al.
https://arxiv.org/abs/1910.10683
[00:30:30] Combining Induction and Transduction for Abstract Reasoning, Wen-Ding Li, Kevin Ellis et al.
https://arxiv.org/abs/2411.02272
[00:34:15] Six finger problem, Chen et al.
https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_SpatialVLM_Endowing_Vision-Language_Models_with_Spatial_Reasoning_Capabilities_CVPR_2024_paper.pdf
[00:38:15] DeepSeek-R1-Distill-Llama, DeepSeek AI
https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B
[00:40:10] ARC Prize 2024 Technical Report, François Chollet et al.
https://arxiv.org/html/2412.04604v2
[00:45:20] LLM-Guided Compositional Program Synthesis, Wen-Ding Li and Kevin Ellis
https://arxiv.org/html/2503.15540
[00:54:25] Abstraction and Reasoning Corpus, François Chollet
https://github.com/fchollet/ARC-AGI
[00:57:10] O3 breakthrough on ARC-AGI, OpenAI
https://arcprize.org/
[00:59:35] ConceptARC Benchmark, Arseny Moskvichev, Melanie Mitchell
https://arxiv.org/abs/2305.07141
[01:02:05] Mixtape: Breaking the Softmax Bottleneck Efficiently, Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, William W. Cohen
http://papers.neurips.cc/paper/9723-mixtape-breaking-the-softmax-bottleneck-efficiently.pdf
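The ensemble voting mechanism mentioned above can be sketched as majority voting over candidate output grids. This is a minimal illustration, not the MindsAI pipeline: in practice the candidates would come from a test-time-fine-tuned model run on several augmented views of each ARC task, each output mapped back to the original orientation before the vote:

```python
from collections import Counter

def vote(predictions):
    """Pick the output grid that the most candidate predictions agree on.
    Agreement across independently perturbed runs is a useful proxy for
    correctness when no ground truth is available at test time."""
    # Grids are lists of lists; convert to tuples so they can be counted.
    canonical = [tuple(map(tuple, p)) for p in predictions]
    winner, _count = Counter(canonical).most_common(1)[0]
    return [list(row) for row in winner]

# Three stand-in candidate output grids for one hypothetical task:
candidates = [
    [[1, 0], [0, 1]],
    [[1, 0], [0, 1]],
    [[1, 1], [0, 1]],  # one augmentation disagrees and is outvoted
]
chosen = vote(candidates)
```

Voting is what turns a noisy set of per-augmentation guesses into a single submission, and it pairs naturally with test-time fine-tuning, which supplies the diverse candidates.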

Iman Mirzadeh from Apple, who recently published the GSM-Symbolic paper, discusses the crucial distinction between intelligence and achievement in AI systems. He critiques current AI research methodologies, highlighting the limitations of Large Language Models (LLMs) in reasoning and knowledge representation.

SPONSOR MESSAGES:
***
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
***

TRANSCRIPT + RESEARCH:
https://www.dropbox.com/scl/fi/mlcjl9cd5p1kem4l0vqd3/IMAN.pdf?rlkey=dqfqb74zr81a5gqr8r6c8isg3&dl=0

TOC:
1. Intelligence vs Achievement in AI Systems
[00:00:00] 1.1 Intelligence vs Achievement Metrics in AI Systems
[00:03:27] 1.2 AlphaZero and Abstract Understanding in Chess
[00:10:10] 1.3 Language Models and Distribution Learning Limitations
[00:14:47] 1.4 Research Methodology and Theoretical Frameworks
2. Intelligence Measurement and Learning
[00:24:24] 2.1 LLM Capabilities: Interpolation vs True Reasoning
[00:29:00] 2.2 Intelligence Definition and Measurement Approaches
[00:34:35] 2.3 Learning Capabilities and Agency in AI Systems
[00:39:26] 2.4 Abstract Reasoning and Symbol Understanding
3. LLM Performance and Evaluation
[00:47:15] 3.1 Scaling Laws and Fundamental Limitations
[00:54:33] 3.2 Connectionism vs Symbolism Debate in Neural Networks
[00:58:09] 3.3 GSM-Symbolic: Testing Mathematical Reasoning in LLMs
[01:08:38] 3.4 Benchmark Evaluation and Model Performance Assessment

REFS:
[00:01:00] AlphaZero chess AI system, Silver et al.
https://arxiv.org/abs/1712.01815
[00:07:10] Game Changer: AlphaZero's Groundbreaking Chess Strategies, Sadler & Regan
https://www.amazon.com/Game-Changer-AlphaZeros-Groundbreaking-Strategies/dp/9056918184
[00:11:35] Cross-entropy loss in language modeling, Voita
http://lena-voita.github.io/nlp_course/language_modeling.html
[00:17:20] GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in LLMs, Mirzadeh et al.
https://arxiv.org/abs/2410.05229
[00:21:25] Connectionism and Cognitive Architecture: A Critical Analysis, Fodor & Pylyshyn
https://www.sciencedirect.com/science/article/pii/001002779090014B
[00:28:55] Brain-to-body mass ratio scaling laws, Sutskever
https://www.theverge.com/2024/12/13/24320811/what-ilya-sutskever-sees-openai-model-data-training
[00:29:40] On the Measure of Intelligence, Chollet
https://arxiv.org/abs/1911.01547
[00:33:30] On definition of intelligence, Gignac et al.
https://www.sciencedirect.com/science/article/pii/S0160289624000266
[00:35:30] Defining intelligence, Wang
https://cis.temple.edu/~wangp/papers.html
[00:37:40] How We Learn: Why Brains Learn Better Than Any Machine... for Now, Dehaene
https://www.amazon.com/How-We-Learn-Brains-Machine/dp/0525559884
[00:39:35] Surfaces and Essences: Analogy as the Fuel and Fire of Thinking, Hofstadter and Sander
https://www.amazon.com/Surfaces-Essences-Analogy-Fuel-Thinking/dp/0465018475
[00:43:15] Chain-of-thought prompting, Wei et al.
https://arxiv.org/abs/2201.11903
[00:47:20] Test-time scaling laws in machine learning, Brown
https://podcasts.apple.com/mv/podcast/openais-noam-brown-ilge-akkaya-and-hunter-lightman-on/id1750736528?i=1000671532058
[00:47:50] Scaling Laws for Neural Language Models, Kaplan et al.
https://arxiv.org/abs/2001.08361
[00:55:15] Tensor product variable binding, Smolensky
https://www.sciencedirect.com/science/article/abs/pii/000437029090007M
[01:08:45] GSM-8K dataset, OpenAI
https://huggingface.co/datasets/openai/gsm8k
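The core move in GSM-Symbolic (arXiv:2410.05229) is to turn a fixed GSM8K-style word problem into a symbolic template, then sample many surface variants (names, numbers) whose ground-truth answer is computed programmatically; an LLM that truly reasons should score the same across variants. A minimal illustrative sketch of that idea, with a template and names invented here rather than taken from the paper:

```python
import random

# Hypothetical GSM-Symbolic-style template: only names and numbers vary,
# the underlying arithmetic structure stays fixed.
TEMPLATE = ("{name} picks {a} apples on Monday and {b} apples on Tuesday. "
            "{name} gives away {c} apples. How many apples does {name} have left?")

def sample_variant(rng: random.Random) -> tuple[str, int]:
    """Instantiate the template; return (question, programmatic ground truth)."""
    name = rng.choice(["Sophie", "Liam", "Noah", "Ava"])
    a, b = rng.randint(5, 50), rng.randint(5, 50)
    c = rng.randint(1, a + b)  # keep the answer non-negative
    return TEMPLATE.format(name=name, a=a, b=b, c=c), a + b - c

rng = random.Random(0)
variants = [sample_variant(rng) for _ in range(3)]
for question, answer in variants:
    print(question, "->", answer)
```

The paper's reported finding is that accuracy drops on such variants even though nothing about the required reasoning has changed, which is the basis of the interpolation-vs-reasoning critique in the episode.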

Dr. Max Bartolo from Cohere discusses machine learning model development, evaluation, and robustness. Key topics include model reasoning, the DynaBench platform for dynamic benchmarking, data-centric AI development, model training challenges, and the limitations of human feedback mechanisms. The conversation also covers technical aspects like influence functions, model quantization, and the PRISM project.

Max Bartolo (Cohere):
https://www.maxbartolo.com/
https://cohere.com/command

TRANSCRIPT:
https://www.dropbox.com/scl/fi/vujxscaffw37pqgb6hpie/MAXB.pdf?rlkey=0oqjxs5u49eqa2m7uaol64lbw&dl=0

TOC:
1. Model Reasoning and Verification
[00:00:00] 1.1 Model Consistency and Reasoning Verification
[00:03:25] 1.2 Influence Functions and Distributed Knowledge Analysis
[00:10:28] 1.3 AI Application Development and Model Deployment
[00:14:24] 1.4 AI Alignment and Human Feedback Limitations
2. Evaluation and Bias Assessment
[00:20:15] 2.1 Human Evaluation Challenges and Factuality Assessment
[00:27:15] 2.2 Cultural and Demographic Influences on Model Behavior
[00:32:43] 2.3 Adversarial Examples and Model Robustness
3. Benchmarking Systems and Methods
[00:41:54] 3.1 DynaBench and Dynamic Benchmarking Approaches
[00:50:02] 3.2 Benchmarking Challenges and Alternative Metrics
[00:50:33] 3.3 Evolution of Model Benchmarking Methods
[00:51:15] 3.4 Hierarchical Capability Testing Framework
[00:52:35] 3.5 Benchmark Platforms and Tools
4. Model Architecture and Performance
[00:55:15] 4.1 Cohere's Model Development Process
[01:00:26] 4.2 Model Quantization and Performance Evaluation
[01:05:18] 4.3 Reasoning Capabilities and Benchmark Standards
[01:08:27] 4.4 Training Progression and Technical Challenges
5. Future Directions and Challenges
[01:13:48] 5.1 Context Window Evolution and Trade-offs
[01:22:47] 5.2 Enterprise Applications and Future Challenges

REFS:
[00:03:10] Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models, Laura Ruis, Max Bartolo et al.
https://cohere.com/research/papers/procedural-knowledge-in-pretraining-drives-reasoning-in-large-language-models-2024-11-20
[00:04:15] Influence functions in machine learning, Koh & Liang
https://arxiv.org/abs/1703.04730
[00:08:05] Studying Large Language Model Generalization with Influence Functions, Roger Grosse et al.
https://storage.prod.researchhub.com/uploads/papers/2023/08/08/2308.03296.pdf
[00:11:10] The LLM ARChitect: Solving ARC-AGI Is A Matter of Perspective, Daniel Franzen, Jan Disselhoff, and David Hartmann
https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf
[00:12:10] Hugging Face model repo for C4AI Command A, Cohere and Cohere For AI
https://huggingface.co/CohereForAI/c4ai-command-a-03-2025
[00:13:30] OpenInterpreter
https://github.com/KillianLucas/open-interpreter
[00:16:15] Human Feedback is not Gold Standard, Tom Hosking, Max Bartolo, Phil Blunsom
https://arxiv.org/abs/2309.16349
[00:27:15] The PRISM Alignment Dataset, Hannah Kirk et al.
https://arxiv.org/abs/2404.16019
[00:32:50] How adversarial examples arise, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, Aleksander Madry
https://arxiv.org/abs/1905.02175
[00:43:00] DynaBench platform paper, Douwe Kiela et al.
https://aclanthology.org/2021.naacl-main.324.pdf
[00:50:15] Sara Hooker's work on compute limitations, Sara Hooker
https://arxiv.org/html/2407.05694v1
[00:53:25] DataPerf: Community-led benchmark suite, Mazumder et al.
https://arxiv.org/abs/2207.10062
[01:04:35] DROP, Dheeru Dua et al.
https://arxiv.org/abs/1903.00161
[01:07:05] GSM8k, Cobbe et al.
https://paperswithcode.com/sota/arithmetic-reasoning-on-gsm8k
[01:09:30] ARC, François Chollet
https://github.com/fchollet/ARC-AGI
[01:15:50] Command A, Cohere
https://cohere.com/blog/command-a
[01:22:55] Enterprise search using LLMs, Cohere
https://cohere.com/blog/commonly-asked-questions-about-search-from-coheres-enterprise-customers
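The quantization trade-off discussed in section 4.2 can be illustrated with the simplest scheme: per-tensor symmetric int8 quantization, where weights are mapped to 8-bit integers via a single scale factor and dequantized with bounded error. This is a generic sketch, not Cohere's actual quantization pipeline:

```python
# Per-tensor symmetric int8 quantization: one scale factor maps floats
# into [-127, 127]; dequantization error is at most scale / 2 per weight
# (Python's round() uses round-half-to-even, which stays within that bound).

def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to int8 range [-127, 127] using a single scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q: list[int], scale: float) -> list[float]:
    return [x * scale for x in q]

weights = [0.42, -1.73, 0.05, 2.54, -0.91]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(w, ) if False else abs(w - r) for w, r in zip(weights, restored))
print("quantized:", q)
print("max reconstruction error:", max_err)
```

Real deployments quantize per-channel or per-group and benchmark the quantized model end to end, since small per-weight errors can still shift downstream accuracy, which is the evaluation question the episode focuses on.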

This sponsored episode features mathematician Ohad Asor discussing logical approaches to AI, focusing on the limitations of machine learning and introducing the Tau language for software development and blockchain tech. Asor argues that machine learning cannot guarantee correctness. Tau allows logical specification of software requirements, automatically creating provably correct implementations with the potential to revolutionize distributed systems. The discussion highlights program synthesis, software updates, and applications in finance and governance.

SPONSOR MESSAGES:
***
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
***

TRANSCRIPT + RESEARCH:
https://www.dropbox.com/scl/fi/t849j6v1juk3gc15g4rsy/TAU.pdf?rlkey=hh11h2mhog3ncdbeapbzpzctc&dl=0

Tau:
https://tau.net/
Tau Language:
https://tau.ai/tau-language/
Research:
https://tau.net/Theories-and-Applications-of-Boolean-Algebras-0.29.pdf

TOC:
1. Machine Learning Foundations and Limitations
[00:00:00] 1.1 Fundamental Limitations of Machine Learning and PAC Learning Theory
[00:04:50] 1.2 Transductive Learning and the Three Curses of Machine Learning
[00:08:57] 1.3 Language, Reality, and AI System Design
[00:12:58] 1.4 Program Synthesis and Formal Verification Approaches
2. Logical Programming Architecture
[00:31:55] 2.1 Safe AI Development Requirements
[00:32:05] 2.2 Self-Referential Language Architecture
[00:32:50] 2.3 Boolean Algebra and Logical Foundations
[00:37:52] 2.4 SAT Solvers and Complexity Challenges
[00:44:30] 2.5 Program Synthesis and Specification
[00:47:39] 2.6 Overcoming Tarski's Undefinability with Boolean Algebra
[00:56:05] 2.7 Tau Language Implementation and User Control
3. Blockchain-Based Software Governance
[01:09:10] 3.1 User Control and Software Governance Mechanisms
[01:18:27] 3.2 Tau's Blockchain Architecture and Meta-Programming Capabilities
[01:21:43] 3.3 Development Status and Token Implementation
[01:24:52] 3.4 Consensus Building and Opinion Mapping System
[01:35:29] 3.5 Automation and Financial Applications

CORE REFS (more in pinned comment):
[00:03:45] PAC (Probably Approximately Correct) Learning framework, Leslie Valiant
https://en.wikipedia.org/wiki/Probably_approximately_correct_learning
[00:06:10] Boolean Satisfiability Problem (SAT), Various
https://en.wikipedia.org/wiki/Boolean_satisfiability_problem
[00:13:55] Knowledge as Justified True Belief (JTB), Matthias Steup
https://plato.stanford.edu/entries/epistemology/
[00:17:50] Wittgenstein's concept of the limits of language, Ludwig Wittgenstein
https://plato.stanford.edu/entries/wittgenstein/
[00:21:25] Boolean algebras, Ohad Asor
https://tau.net/tau-language-research/
[00:26:10] The Halting Problem
https://plato.stanford.edu/entries/turing-machine/#HaltProb
[00:30:25] Alfred Tarski (1901-1983), Mario Gómez-Torrente
https://plato.stanford.edu/entries/tarski/
[00:41:50] DPLL
https://www.cs.princeton.edu/~zkincaid/courses/fall18/readings/SATHandbook-CDCL.pdf
[00:49:50] Tarski's undefinability theorem (1936), Alfred Tarski
https://plato.stanford.edu/entries/tarski-truth/
[00:51:45] Boolean Algebra mathematical foundations, J. Donald Monk
https://plato.stanford.edu/entries/boolalg-math/
[01:02:35] Belief Revision Theory and AGM Postulates, Sven Ove Hansson
https://plato.stanford.edu/entries/logic-belief-revision/
[01:05:35] Quantifier elimination in atomless boolean algebra, H. Jerome Keisler
https://people.math.wisc.edu/~hkeisler/random.pdf
[01:08:35] Quantifier elimination in Tau language specification, Ohad Asor
https://tau.ai/Theories-and-Applications-of-Boolean-Algebras-0.29.pdf
[01:11:50] Tau Net blockchain platform
https://tau.net/
[01:19:20] Tau blockchain's innovative approach treating blockchain code itself as a contract
https://tau.net/Whitepaper.pdf
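The Boolean satisfiability problem at the heart of sections 2.3 and 2.4 can be made concrete with a toy checker. Real solvers use DPLL/CDCL with unit propagation and clause learning; this sketch (not from the episode or the Tau implementation) simply enumerates all assignments, which is exponential and only workable for tiny formulas:

```python
from itertools import product

def solve_sat(num_vars: int, clauses: list[list[int]]):
    """Brute-force SAT. Clauses use DIMACS-style ints: 3 means x3, -3 means NOT x3.
    Returns a satisfying assignment {var: bool}, or None if unsatisfiable."""
    for bits in product([False, True], repeat=num_vars):
        assign = {i + 1: bits[i] for i in range(num_vars)}
        # A CNF formula holds iff every clause has at least one true literal.
        if all(any(assign[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assign
    return None

# (x1 OR x2) AND (NOT x1 OR x3) AND (NOT x2 OR NOT x3)
model = solve_sat(3, [[1, 2], [-1, 3], [-2, -3]])
print("satisfying assignment:", model)

# x1 AND NOT x1 has no model.
print("unsat formula:", solve_sat(1, [[1], [-1]]))
```

The gap between this enumeration and practical DPLL/CDCL solvers is exactly the complexity challenge Asor discusses when grounding a specification language in Boolean algebra.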