Jiayi Geng (@JiayiiGeng) on X (original) (raw)

Pinned

As long-horizon software engineering tasks grow in complexity, a single agent can no longer finish the tasks alone — effective multi-agent collaboration becomes necessary. This leads to a natural question: how can multiple agents be coordinated to asynchronously collaborate over
Using LLMs to build AI scientists is all the rage now (e.g., Google’s AI co-scientist [1] and Sakana’s Fully Automated Scientist [2]), but how much do we understand about their core scientific abilities? We know how LLMs can be vastly useful (solving complex math problems) yet
I'm thrilled to share that I've moved to Pittsburgh and joined NeuLab at CMU as a research intern this summer, advised by
@gneubig
! I'll also start my PhD
@LTIatCMU
this fall. Feel free to reach out if you're interested in chatting about multi-agent systems, LLMs for scientific
We use LLMs for everyday tasks—research, writing, coding, decision-making. They remember our conversations, adapt to our needs and preferences. Naturally, we trust them more with repeated use. But this growing trust might be masking a hidden risk: what if their beliefs are
Excited to share our new preprint: Mind Your Step (By Step)📢📢 Making LLMs/VLMs 'think aloud' with chain-of-thought prompting isn't always helpful for reasoning tasks.🤔💭😮🤖 Check out our paper for more details: arxiv.org/abs/2410.21333 🤖
📢 We're thrilled to announce the CMU AI for Science Workshop on Sept 12 at CUC-MPW! Featuring an amazing lineup of speakers: - Akari Asai (AI2/CMU) - Gabe Gomes (CMU) - Chenglei Si (Stanford) - Keyon Vafa (Harvard) Join us on campus, submit your poster & register here:
In "Mind Your Step (by Step): Chain‑of‑Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse", we connect human "overthinking" insights to LLM reasoning, offering a new lens on when thinking‑out‑loud backfires. 📄 Read the full paper:
Replying to @JiayiiGeng
To study this question, we use a three-stage protocol: record baseline belief → accumulate context → record final belief. We measure not only the stated beliefs (what they say) and also behaviors (what they do). Intuitively, belief shift depends on how relevant and targeted
Replying to @JiayiiGeng
Do LM assistants change their beliefs as context accumulates? YES! Substantially. Surprisingly, we find belief shifts happen even through ordinary activities like reading and research with no explicit persuasion needed. We also observe that stated beliefs and behaviors don't
Replying to @JiayiiGeng
In summary, we find belief shifts in LM assistants can emerge gradually through ordinary use. Even without explicit persuasion, extended reading and interaction can subtly reshape their views and behaviors over time. These changes can happen quietly, accumulating with context
Replying to @JiayiiGeng
🤔How well can LLMs infer the internal mechanisms of black box systems from passive observations? We looked at three kinds of black box systems inspired by cognitive studies: 1) list-mapping programs 2) rules of formal languages 3) parameters of math equations Our findings
Replying to @JiayiiGeng
Doing science requires several skills: 1) Performing inductive reasoning based on passively observed data; 2) Actively interacting with a system to collect informative data and reduce uncertainty about its internal mechanisms; 3) Communicating the results. Based on these
🧐Check out our poster 11 am today @ West-320!

Chain of thought can hurt LLM performance 🤖 Verbal (over)thinking can hurt human performance 😵‍💫 Are when/why they happen similar? Come find out at our poster at West-320 ⏰11am tomorrow
Check out this cool video (made by
@theryanliu
) for our #icml25 paper, "Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse"🤗

A short 📹 explainer video on how LLMs can overthink in humanlike ways 😲! had a blast presenting this at #icml2025 🥳