Katerina Fragkiadaki
email: katef 'at' cs.cmu.edu

CV | Bio | Google Scholar | Twitter

I am a JPMorgan Chase Associate Professor of Computer Science in the Machine Learning Department at Carnegie Mellon University. I work in Artificial Intelligence at the intersection of Computer Vision, Machine Learning, Language Understanding and Robotics. Prior to joining MLD's faculty, I spent three wonderful years as a postdoctoral researcher, first at UC Berkeley working with Jitendra Malik and then at Google Research in Mountain View working with the video group. I completed my Ph.D. at GRASP, UPenn with Jianbo Shi. I did my undergraduate studies at the National Technical University of Athens, and before that I was in Crete.

Prospective students: If you want to join CMU as a PhD student, just mention my name in your application. Otherwise, if you would like to join our group in any other capacity, please fill out this form and then send me a short email note without any documents.

News

Teaching

Research Group

Our group studies Artificial Intelligence, specifically Machine Learning models at the intersection of Computer Vision, Language Understanding and Robotics. Our ultimate goal is to build machines that autonomously, and through interaction with humans and the environment, acquire and continuously improve world models that let them reason through the consequences of their own and others' decisions, and ultimately surpass humans in both dexterity and creativity. Topics we currently focus on include representation learning, video understanding, unified 2D/3D vision-language models, generative modeling, learning simulators from data, real2sim and sim2real robot learning, reinforcement learning, and continual learning.
PhD Students: Wen-hsuan Chu, Gabriel Sarch (with Mike Tarr), Brian Yang (with Jeff Schneider), Nikos Gkanatsios, Mihir Prabhudesai (with Deepak Pathak), Ayush Jain, Kashu Yamazaki, Matthew Bronars
Postdoc: Lei Ke
MS Students: Yash Jangir, Alexander Swerdlow
Former Students:
Tsung-Wei Ke (postdoc, now professor at NTU CSIE)
Xian Zhou (PhD student)
Fish Tung (PhD student; postdoc at MIT, Tesla, Google DeepMind)
Adam Harley (PhD student; postdoc at Stanford, Meta)
Theo Gervet (PhD student, Mistral)
Pushkal Katara (MS student, ScaledFoundations)
Ricson Cheng (undergrad, CRA research award)
Zhaoyuan Fang (MS student, Google)
Mayank Singh (MS student, Apple)
Yunchu Zhang (MS student, UW PhD)
Ziyan Wang (MSR student, RI CMU PhD)
Shamit Lal (MS student, Amazon AGI)
Yiming Zuo (MSR student, Princeton PhD)
Max Sieb (MSR student, Covariant AI, Google DeepMind)
Arpit Agarwal (MSR student, RI CMU PhD)
Henry Huang (MS student, Bloomberg)
Chris Ying (MS student, Google Brain)
Darshan Patil (undergraduate, MILA PhD)
Nilay Pande (MS, Tesla, Waymo)
Ishita Mediratta (undergraduate collaborator, Meta)

Selected Publications

Unified Multimodal Discrete Diffusion
Alexander Swerdlow*, Mihir Prabhudesai*, Siddharth Gandhi, Deepak Pathak, Katerina Fragkiadaki
arXiv | project page

Video Depth without Video Models
Bingxin Ke, Dominik Narnhofer, Shengyu Huang, Lei Ke, Torben Peters, Katerina Fragkiadaki, Anton Obukhov, Konrad Schindler
CVPR 2025 | project page

Unifying 2D and 3D Vision-Language Understanding
Ayush Jain*, Alexander Swerdlow*, Yuzhou Wang, Alexander Sax, Franziska Meier, Katerina Fragkiadaki
arXiv | project page

Video Diffusion Alignment via Reward Gradients
Mihir Prabhudesai, Russell Mendonca, Zheyang Qin, Katerina Fragkiadaki, Deepak Pathak
VADER aligns video diffusion models using end-to-end reward gradient backpropagation from off-the-shelf differentiable reward functions.
arXiv | webpage

VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought
Gabriel Sarch, Lawrence Jang, Michael Tarr, William Cohen, Kenneth Marino, Katerina Fragkiadaki
A technique that enables VLM agents to take initially suboptimal demonstrations and iteratively improve them, ultimately generating high-quality trajectory data that includes both optimized actions and detailed reasoning annotations suitable for more effective in-context learning and fine-tuning.
NeurIPS 2024 (spotlight) | webpage

DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos
Wen-Hsuan Chu*, Lei Ke*, Katerina Fragkiadaki
DreamScene4D generates 3D dynamic scenes of multiple objects from monocular videos training-free, using object-centric diffusion priors and pixel and motion reprojection error.
NeurIPS 2024 | webpage

ODIN: A Single Model for 2D and 3D Perception
Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki
ODIN processes both RGB images and sequences of posed RGB-D images by alternating between 2D and 3D fusion layers using projection and unprojection from camera info. New SOTA in ScanNet200.
CVPR 2024 (spotlight) | webpage

3D Diffuser Actor: Policy Diffusion with 3D Scene Representations
Tsung-Wei Ke*, Nikolaos Gkanatsios*, Katerina Fragkiadaki
Combining 3D relative attention transformers with action trajectory diffusion gives SOTA imitation learning robot policies in CALVIN and RLBench.
CoRL 2024 | webpage

Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous Driving and Zero-Shot Instruction Following
Brian Yang, Huangyuan Su, Nikolaos Gkanatsios, Tsung-Wei Ke, Ayush Jain, Jeff Schneider, Katerina Fragkiadaki
Diffusion-ES combines trajectory diffusion models with evolutionary search and achieves SOTA performance in nuPlan. We prompt LLMs to map language instructions to shaped reward functions, and optimize them with Diffusion-ES, and solve the hardest driving scenarios.
CVPR 2024 | webpage

Test-time Adaptation of Discriminative Models via Diffusion Generative Feedback
Mihir Prabhudesai*, Tsung-Wei Ke*, Alexander C. Li, Deepak Pathak, Katerina Fragkiadaki
NeurIPS 2023 | webpage

Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models
Gabriel Sarch, Yue Wu, Michael J. Tarr, Katerina Fragkiadaki
EMNLP Findings 2023 | webpage

Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation
Theophile Gervet*, Zhou Xian*, Nikolaos Gkanatsios, Katerina Fragkiadaki
CoRL 2023 | webpage

Gen2Sim: Scaling up Robot Learning in Simulation with Generative Models
Pushkal Katara*, Xian Zhou*, Katerina Fragkiadaki
ICRA 2024 | webpage

RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation
Yufei Wang, Zhou Xian, Feng Chen, Tsun-Hsuan Wang, Yian Wang, Katerina Fragkiadaki, Zackory Erickson, David Held, Chuang Gan
ICML 2024 | webpage

ChainedDiffuser: Unifying Trajectory Diffusion and Keypose Prediction for Robotic Manipulation
Zhou Xian*, Nikolaos Gkanatsios*, Theophile Gervet*, Tsung-Wei Ke, Katerina Fragkiadaki
CoRL 2023 | webpage

Test-time Adaptation with Slot-Centric Models
Mihir Prabhudesai, Anirudh Goyal, Sujoy Paul, Sjoerd van Steenkiste, Mehdi S. M. Sajjadi, Gaurav Aggarwal, Thomas Kipf, Deepak Pathak, Katerina Fragkiadaki
ICML 2023 | webpage

Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement
Nikolaos Gkanatsios, Ayush Jain, Zhou Xian, Yunchu Zhang, Christopher Atkeson, Katerina Fragkiadaki
RSS 2023 | webpage

Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?
Adam W. Harley, Zhaoyuan Fang, Jie Li, Rares Ambrus, Katerina Fragkiadaki
ICRA 2023 | webpage

FluidLab: A Differentiable Environment for Benchmarking Complex Fluid Manipulation
Zhou Xian, Bo Zhu, Zhenjia Xu, Hsiao-Yu Tung, Antonio Torralba, Katerina Fragkiadaki, Chuang Gan
ICLR 2023 (spotlight) | webpage

Analogy-Forming Transformers for Few-Shot 3D Parsing
Nikolaos Gkanatsios, Mayank Singh, Zhaoyuan Fang, Shubham Tulsiani, Katerina Fragkiadaki
ICLR 2023 | webpage

Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds
Ayush Jain, Nikolaos Gkanatsios, Ishita Mediratta, Katerina Fragkiadaki
ECCV 2022 | webpage

TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Commonsense Priors
Gabriel Sarch, Zhaoyuan Fang, Adam W. Harley, Paul Schydlo, Michael J. Tarr, Saurabh Gupta, Katerina Fragkiadaki
ECCV 2022 | webpage

Particle Videos Revisited: Tracking Through Occlusions Using Point Trajectories
Adam W. Harley, Zhaoyuan Fang, Katerina Fragkiadaki
ECCV 2022 (oral) | webpage

Visually-Grounded Library of Behaviors for Manipulating Diverse Objects across Diverse Configurations and Views
Jingyun Yang*, Hsiao-Yu Fish Tung*, Yunchu Zhang*, Gaurav Pathak, Ashwini Pokle, Christopher G Atkeson, Katerina Fragkiadaki
CoRL 2021 | webpage

Disentangling 3D Prototypical Networks for Few-Shot Concept Learning
Mihir Prabhudesai*, Shamit Lal*, Darshan Patil*, Hsiao-Yu Tung, Adam Harley, Katerina Fragkiadaki
ICLR 2021 | webpage

Track, Check, Repeat: An EM Approach to Unsupervised Tracking
Adam W. Harley, Yiming Zuo, Jing Wen, Ayush Mangal, Shubhankar Potdar, Ritwick Chaudhry, Katerina Fragkiadaki
CVPR 2021 | webpage

Move to See Better: Self-Improving Embodied Object Detection
Zhaoyuan Fang, Ayush Jain, Gabriel Sarch, Adam W. Harley, Katerina Fragkiadaki
BMVC 2021 | webpage

HyperDynamics: Generating Expert Dynamics Models by Observation
Zhou Xian, Shamit Lal, Hsiao-Yu Tung, Emmanouil Antonios Platanios, Katerina Fragkiadaki
ICLR 2021 | webpage

CoCoNets: Continuous Contrastive 3D Scene Representations
Shamit Lal, Mihir Prabhudesai, Ishita Mediratta, Adam W. Harley, Katerina Fragkiadaki
CVPR 2021 | webpage

Tracking Emerges by Looking Around Static Scenes, with Neural 3D Mapping
Adam W. Harley, Shrinidhi K. Lakshmikanth, Paul Schydlo, Katerina Fragkiadaki
ECCV 2020

Embodied Language Grounding with Implicit 3D Visual Feature Representations
Mihir Prabhudesai*, Hsiao-Yu Fish Tung*, Syed Ashar Javed*, Maximilian Sieb, Adam W. Harley, Katerina Fragkiadaki
CVPR 2020 | webpage

Epipolar Transformers
Yihui He, Rui Yan, Katerina Fragkiadaki, Shoou-I Yu
CVPR 2020 | webpage

Graph-structured Visual Imitation
Xian Zhou*, Max Sieb*, Audrey Huang, Oliver Kroemer, Katerina Fragkiadaki
CoRL 2019 (spotlight) | webpage

Learning from Unlabelled Videos Using Contrastive Predictive Neural 3D Mapping
Adam W. Harley, Fangyu Li, Shrinidhi K. Lakshmikanth, Xian Zhou, Hsiao-Yu Fish Tung, Katerina Fragkiadaki
ICLR 2020 | webpage

Learning Spatial Common Sense with Geometry-Aware Recurrent Networks
Hsiao-Yu Fish Tung, Ricson Cheng, Katerina Fragkiadaki
CVPR 2019 (oral) | webpage

Model Learning for Look-ahead Exploration in Continuous Control
Arpit Agarwal, Katharina Muelling, Katerina Fragkiadaki
AAAI 2019 (oral) | webpage

Reinforcement Learning of Active Vision for Manipulating Objects under Occlusions
Ricson Cheng, Arpit Agarwal, Katerina Fragkiadaki
CoRL 2018 | slides | code

Geometry-Aware Recurrent Neural Networks for Active Visual Recognition
Ricson Cheng, Ziyan Wang, Katerina Fragkiadaki
NIPS 2018

Reward Learning from Narrated Demonstrations
Fish Tung, Adam Harley, Liang-Kang Huang, Katerina Fragkiadaki
CVPR 2018 | bibtex

Depth-adaptive Computational Policies for Efficient Visual Tracking
Chris Ying, Katerina Fragkiadaki
EMMCVPR 2017 | bibtex

Self-supervised Learning of Motion Capture
Hsiao-Yu Fish Tung, Wei Tung, Ersin Yumer, Katerina Fragkiadaki
NIPS 2017 (spotlight) | bibtex | code

Adversarial Inverse Graphics Networks: Learning 2D-to-3D Lifting and Image-to-Image Translation from Unpaired Supervision
Hsiao-Yu Fish Tung, Adam Harley, William Seto, Katerina Fragkiadaki
ICCV 2017 | bibtex | code

SfM-Net: Learning of Structure and Motion from Video
Sudheendra Vijayanarasimhan, Susanna Ricco, Cordelia Schmid, Rahul Sukthankar, Katerina Fragkiadaki
arXiv

Learning Predictive Visual Models of Physics for Playing Billiards
Katerina Fragkiadaki*, Pulkit Agrawal*, Sergey Levine, Jitendra Malik
ICLR 2016 | webpage

Recurrent Network Models for Human Dynamics
Katerina Fragkiadaki, Sergey Levine, Panna Felsen, Jitendra Malik
ICCV 2015 | webpage

Human Pose Estimation with Iterative Error Feedback
Joao Carreira, Pulkit Agrawal, Katerina Fragkiadaki, Jitendra Malik
arXiv | webpage

Learning to Segment Moving Objects in Videos
Katerina Fragkiadaki, Pablo Arbelaez, Panna Felsen, Jitendra Malik
CVPR 2015 | poster | bibtex | webpage

Grouping-Based Low-Rank Video Completion and 3D Reconstruction
Katerina Fragkiadaki, Marta Salas, Pablo Arbelaez, Jitendra Malik
NIPS 2014 | poster | bibtex | webpage

Two Granularity Tracking: Mediating Trajectory and Detection Graphs for Tracking under Occlusions
Katerina Fragkiadaki, Weiyu Zhang, Geng Zhang, Jianbo Shi
ECCV 2012 | poster | bibtex | webpage