Tao Zhao - Academia.edu

Papers by Tao Zhao

Articulated Surgical Tool Detection Using Virtually-Rendered Templates

Purpose: We propose a system capable of detecting articulated surgical instruments without the use of assistive markers or manual initialization. Methods: The algorithm can provide 3D pose using a combination of online and offline learning techniques along with prior geometric knowledge of the tool. It uses live kinematic data from the robotic system to render nearby poses on-the-fly as virtual images and creates gradient orientation templates for fast matching into the real image. Prior appearance models of different material classes and projective invariance are used to reject false positives. Results: Results are verified using in-vivo data recorded from the da Vinci® robotic surgical system. The method detects successfully at a high correctness rate, and a pyramid search method is proposed which reduces the run time of a brute-force search from 23 secs/frame down to 3 secs/frame. Conclusion: We have shown a top-down approach to detect surgical tools within in-vivo video sequences and is capable of...

Bayesian human segmentation in crowded situations

2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings.

The problem of segmenting individual humans in crowded situations from stationary video camera sequences is exacerbated by object inter-occlusion. We pose this problem as a "model-based segmentation" problem in which human shape models are used to interpret the foreground in a Bayesian framework. The solution is obtained by using an efficient Markov chain Monte Carlo (MCMC) method which uses domain knowledge as proposal probabilities. Knowledge of various aspects, including human shape, human height, camera model, and image cues such as human head candidates and foreground/background separation, is integrated in one theoretically sound framework. We show promising results and evaluations on some challenging data.

Tracking multiple humans in crowded environment

Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.

Tracking of humans in dynamic scenes has been an important topic of research. Most techniques, however, are limited to situations where humans appear isolated and occlusion is small. Typical methods rely on appearance models that must be acquired when the humans enter the scene and are not occluded. We present a method that can track humans in crowded environments, with significant and persistent occlusion, by making use of human shape models in addition to camera models, the assumption that humans walk on a plane, and acquired appearance models. Experimental results and a quantitative evaluation are included.

Feature Classification for Tracking Articulated Surgical Tools

Medical Image Computing and Computer-Assisted Intervention – MICCAI 2012, 2012

Tool tracking is an accepted capability for computer-aided surgical intervention which has numerous applications, both in robotic and manual minimally-invasive procedures. In this paper, we describe a tracking system which learns visual feature descriptors as class-specific landmarks on an articulated tool. The features are localized in 3D using stereo vision and are fused with the robot kinematics to track all of the joints of the dexterous manipulator. Experiments are performed using previously-collected porcine data from a surgical robot.

Learning features on robotic surgical tools

2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2012

Computer-aided surgical interventions in both manual and robotic procedures have been shown to improve patient outcomes and enhance the skills of the human physician. Tool tracking is one such example that has various applications. In this paper, we show how to learn fine-scaled features on surgical tools for the purpose of pose estimation. Our experiments analyze different state-of-the-art feature descriptors coupled with various learning algorithms on in-vivo data from a surgical robot. We propose that it is important to be able to detect naturally-occurring features robustly in order to achieve long-term, marker-less tool tracking. We also contribute a new improvement on feature classification based on Randomized Trees.

Learning a highly structured motion model for 3D human tracking

This paper presents our work on learning high-level structure from human motion sequences, and its applications in human figure tracking. We use a structured representation ("primitives" and their transitions) of complex motion and propose a two-step unsupervised learning approach to recover the natural "primitives" from unsegmented 3D motion-captured sequences of complex human motion. The structure recovery is done under the MDL (minimum description length) paradigm. The learnt dynamic model of human motion is then used in the CONDENSATION framework to successfully track human motion in a video sequence. Experimental results on ballet dancing sequences demonstrate that our approach works well. The learnt structure is also used to synthesize new video sequences.

Marker-less articulated surgical tool detection

Purpose: We propose a system capable of detecting articulated surgical instruments without the use of assistive markers or manual initialization. Methods: The algorithm can provide 3D pose using a combination of online and offline learning techniques along with prior geometric knowledge of the tool. It uses live kinematic data from the robotic system to render nearby poses on-the-fly as virtual images and creates gradient orientation templates for fast matching into the real image. Prior appearance models of different material classes and projective invariance are used to reject false positives. Results: Results are verified using in-vivo data recorded from the da Vinci® robotic surgical system. The method detects successfully at a high correctness rate, and a pyramid search method is proposed which reduces the run time of a brute-force search from 23 secs/frame down to 3 secs/frame. Conclusion: We have shown a top-down approach that detects surgical tools within in-vivo video sequences and is capable of determining the pose and articulation by learning on-the-fly from virtual renderings driven by real kinematic data.

Segmentation and Tracking of Multiple Humans in Crowded Environments

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Segmentation and tracking of multiple humans in crowded situations is made difficult by interobject occlusion. We propose a model-based approach to interpret the image observations by multiple partially occluded human hypotheses in a Bayesian framework. We define a joint image likelihood for multiple humans based on the appearance of the humans, the visibility of the body obtained by occlusion reasoning, and foreground/background separation. The optimal solution is obtained by using an efficient sampling method, data-driven Markov chain Monte Carlo (DDMCMC), which uses image observations for proposal probabilities. Knowledge of various aspects, including human shape, camera model, and image cues, is integrated in one theoretically sound framework. We present experimental results and quantitative evaluation, demonstrating that the resulting approach is effective for very challenging data.

Car detection in low resolution aerial images

Image and Vision Computing, 2003

Car detection in low resolution aerial image

We present a system to detect passenger cars in aerial images, where cars appear as small objects. We pose this as a 3D object recognition problem to account for the variation in viewpoint and the shadow. We started from psychological tests to find important features for human detection of cars. Based on these observations, we selected the boundary of the car body, the boundary of the front windshield, and the shadow as the features. Some of these features are affected by the intensity of the car and whether or not there is a shadow along it. This information is represented in the structure of the Bayesian network that we use to integrate all features. Experiments show very promising results even on some very challenging images.

The aerial images we used are grayscale images taken mostly from a vertical or slightly oblique viewpoint. The length of a typical car is around 26 pixels in the image. The camera calibration is known, as well as the sunlight direction. Detection from aerial images is easier than detection from an arbitrary viewpoint in that the viewpoint is constrained. However, it is still not as easy as it may seem to be. Example images are shown in Fig. 1.

The main difficulties lie in the following:
• Although the viewpoint is constrained, there are still variations that make the cars look different.
• The image resolution is low, so not many details are visible.
• Cars can be of any intensity in the image, from the darkest to the lightest, and some cars' intensity is very close to that of the road.
• The image quality varies. The brightness, contrast and sharpness of the images change due to factors including illumination, focusing and atmospheric turbulence.
• The expected features of a car differ with its intensity and the existence of shadow. As a simple example, whether or not the boundary of a gray car can be detected depends heavily on its shadow. (See 4.1 for more detail.)

We need to account for all these difficulties to get a reasonably good system. [...] require integration of multiple cues.
