Machine Understanding of Narrated Guided Tours
Albert Huang
Seth Teller
Motivation
Suppose Alice is a new student or employee. A typical way to introduce Alice to her new environment is for someone to walk around with her, describing the surroundings with cues such as "Here's your desk, this is the bathroom, this is the kitchen," and so on. Being able to introduce a robot or other computational device to a physical environment in the same manner would provide significant benefits. In addition to lowering the initial cost and effort of integrating such a device into the environment, people could interact with it intuitively (e.g., "Bring this to Bob's desk" instead of "Bring this to [32.533, 19.89, 43.278]"). Updating the device's internal representation of the space would also be simplified: instead of uploading new firmware or specially formatted maps, one could simply walk around with the device and give it verbal commands.
Approach
To explore this possibility, we have constructed a sensory platform in the form of a wearable backpack with a number of different sensors:
- Ladybug2 spherical camera
- microphone
- XSens MTi inertial measurement units
- Disto Memo laser range finder

The device is called the Ladypack, named after its most prominent sensor and its overall form factor. Data is collected by walking around while wearing the Ladypack, speaking into the microphone, and indicating items or areas of interest with the laser range finder.
For a better description of the Ladypack, take a couple of minutes to watch the first video linked below.
Videos
Note: these videos are encoded in two formats, XViD and MS MPEG4v2; both versions are linked after each video title. MS MPEG4v2 videos should play without issue on a default Windows 2000/XP installation (but not on other operating systems), while XViD videos will play on any machine with the XViD codecs installed. (win32 XViD codecs available here)
- General overview of the sensor platform and project goals. (XViD) | (MS MPEG4v2)
- Point feature detector run on omnidirectional video stream. (XViD) | (MS MPEG4v2)
- Corner detector coupled with a KLT feature tracker, run on the omnidirectional video stream (see the tracking sketch below). (XViD) | (MS MPEG4v2)
- Estimating user pose (left) and trajectory (top right) from raw IMU data, with raw IMU measurements shown at the bottom (see the dead-reckoning sketch below). (XViD) | (MS MPEG4v2)
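The corner detection and KLT tracking shown in the third video can be approximated with off-the-shelf tools. The sketch below is a minimal illustration using OpenCV's Shi-Tomasi corner detector and pyramidal Lucas-Kanade tracker; it is not the Ladypack's actual pipeline, and the file name and parameter values are placeholders.

```python
# Minimal sketch: Shi-Tomasi corners + pyramidal Lucas-Kanade (KLT) tracking.
# This is an illustration with OpenCV, not the project's actual pipeline;
# "tour.avi" and all parameter values are placeholders.
import cv2

cap = cv2.VideoCapture("tour.avi")          # placeholder video file
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Detect corners to track in the first frame.
prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                   qualityLevel=0.01, minDistance=10)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Re-detect if all tracks have been lost.
    if prev_pts is None or len(prev_pts) == 0:
        prev_pts = cv2.goodFeaturesToTrack(gray, maxCorners=200,
                                           qualityLevel=0.01, minDistance=10)
        prev_gray = gray
        continue

    # Track the corners from the previous frame with pyramidal Lucas-Kanade.
    next_pts, status, err = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, prev_pts, None, winSize=(21, 21), maxLevel=3)

    # Keep only points that were tracked successfully and draw them.
    good = status.ravel() == 1
    for p in next_pts[good]:
        x, y = p.ravel()
        cv2.circle(frame, (int(x), int(y)), 3, (0, 255, 0), -1)

    cv2.imshow("KLT tracks", frame)
    if cv2.waitKey(1) == 27:                # Esc to quit
        break

    prev_gray = gray
    prev_pts = next_pts[good].reshape(-1, 1, 2)

cap.release()
cv2.destroyAllWindows()
```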
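The pose and trajectory estimates in the fourth video come from the IMU. As a rough illustration of the underlying idea (not the project's actual processing), the sketch below dead-reckons a trajectory by rotating body-frame accelerations into the world frame using orientation estimates like those the MTi reports, subtracting gravity, and integrating twice; without corrections from other sensors, such an estimate drifts quickly. The variable names, sample rate, and synthetic data are assumptions made for the example.

```python
# Minimal dead-reckoning sketch: double-integrate IMU accelerations.
# Assumes the IMU (e.g. an Xsens MTi) already reports orientation as
# rotation matrices R_wb[t] from body to world frame. All variable
# names, the 100 Hz rate, and the synthetic data are placeholders.
import numpy as np

dt = 0.01                                   # 100 Hz sample period (assumed)
g = np.array([0.0, 0.0, 9.81])              # gravity magnitude along world z

def dead_reckon(R_wb, acc_body):
    """Integrate body-frame accelerometer readings into a world-frame trajectory.

    R_wb     : (T, 3, 3) rotation matrices, body -> world
    acc_body : (T, 3) specific force measured by the accelerometer
    Returns  : (T, 3) estimated positions (drifts without corrections)
    """
    vel = np.zeros(3)
    pos = np.zeros(3)
    trajectory = []
    for R, a_b in zip(R_wb, acc_body):
        a_world = R @ a_b - g               # remove gravity in the world frame
        vel = vel + a_world * dt            # acceleration -> velocity
        pos = pos + vel * dt                # velocity -> position
        trajectory.append(pos.copy())
    return np.array(trajectory)

# Tiny synthetic example: a stationary IMU should stay near the origin,
# but accelerometer noise makes the estimate wander (illustrating drift).
T = 500
R_wb = np.tile(np.eye(3), (T, 1, 1))
acc_body = np.tile(g, (T, 1)) + 0.02 * np.random.randn(T, 3)
print(dead_reckon(R_wb, acc_body)[-1])      # final position estimate
```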
Links
- wiki - somewhat unorganized, but contains a lot more information.
- media - random video clips and media related to the project
- my personal page
- MIT CSAIL