Shih-Chung Hsu - Academia.edu
Papers by Shih-Chung Hsu
Real-Time Hand Finger Motion Capturing Using Regression Forest
2016 International Computer Symposium (ICS), 2016
This paper proposes a real-time hand finger motion capturing method using Kinect. It consists of three modules: hand region segmentation, feature point extraction, and joint angle estimation. The first module extracts the hand region from the depth image. The second module applies a pixel classifier to segment the hand region into eight characteristic sub-regions and a residual sub-region. The centroid of each characteristic sub-region is extracted as a feature point. The third module converts these feature points into a feature vector for finger joint angle estimation using a regression forest. The estimation process is both fast and precise, and it can also handle the finger motion parameters of novel hand gestures. The experimental results show that our method can capture the finger motion parameters under global in-plane hand rotation with sufficient estimation accuracy.
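As a rough illustration of the third module only, the sketch below maps sub-region centroids to joint angles with scikit-learn's random forest regressor. The number of joint angles, the feature layout, and the synthetic data are assumptions for demonstration, not the paper's actual settings.

```python
# Hedged sketch: centroid features of eight hand sub-regions -> regression forest -> joint angles.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

NUM_SUBREGIONS = 8  # characteristic sub-regions; the residual sub-region is ignored

def centroid_feature_vector(pixel_xy, subregion_labels):
    """Stack the (x, y) centroid of each characteristic sub-region into one feature vector."""
    feats = []
    for r in range(NUM_SUBREGIONS):
        pts = pixel_xy[subregion_labels == r]
        # fall back to zeros if a sub-region is empty (e.g. occluded) in this frame
        feats.append(pts.mean(axis=0) if len(pts) else np.zeros(2))
    return np.concatenate(feats)

rng = np.random.default_rng(0)
# synthetic "frame": 500 hand pixels with (x, y) coordinates and a sub-region label each
pixel_xy = rng.uniform(0, 100, size=(500, 2))
labels = rng.integers(0, NUM_SUBREGIONS, size=500)
frame_feature = centroid_feature_vector(pixel_xy, labels)

# synthetic training set: one centroid feature vector and 14 joint angles (degrees) per frame
X = rng.uniform(0, 100, size=(300, NUM_SUBREGIONS * 2))
y = rng.uniform(0, 90, size=(300, 14))
forest = RandomForestRegressor(n_estimators=50, max_depth=12, n_jobs=-1).fit(X, y)
print(forest.predict(frame_feature.reshape(1, -1)))  # per-frame joint-angle estimate
```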
Journal of Information Science and Engineering, 2016
This paper introduces a new video-based facial expression recognition system. Facial expression analysis encounters two major problems: non-rigid shape deformation and person-specific facial expression appearance. Our method analyzes the video sequence to recognize the facial expression and locate its temporal apex by using a modified Hough forest, while minimizing the influence of person-specific facial expression appearance. Our contributions are (1) randomly sampling the 3-D accumulated spatio-temporal motion map to generate video patches, (2) proposing correlation filtering for more effective Hough voting, and (3) recognizing and locating the apex of the facial expression. The experimental results show that our method outperforms the other facial expression recognition methods.
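The following sketch illustrates only contribution (1), building an accumulated spatio-temporal motion map from a face sequence and randomly sampling 3-D patches from it. The patch size, frame count, and use of simple frame differencing are assumptions, not the paper's exact construction.

```python
# Hedged sketch: accumulated motion map + random 3-D patch sampling (Hough forest input).
import numpy as np

def accumulated_motion_map(frames):
    """frames: (T, H, W) grayscale face sequence -> (T-1, H, W) accumulated |frame difference|."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
    return np.cumsum(diffs, axis=0)

def sample_patches(motion_map, num_patches=100, size=(4, 16, 16), seed=0):
    rng = np.random.default_rng(seed)
    T, H, W = motion_map.shape
    dt, dh, dw = size
    patches = []
    for _ in range(num_patches):
        t = rng.integers(0, T - dt + 1)
        y = rng.integers(0, H - dh + 1)
        x = rng.integers(0, W - dw + 1)
        patches.append(motion_map[t:t + dt, y:y + dh, x:x + dw])
    return np.stack(patches)

frames = np.random.rand(30, 64, 64)                   # stand-in for an aligned face sequence
patches = sample_patches(accumulated_motion_map(frames))
print(patches.shape)                                  # (100, 4, 16, 16) patches to train the forest
```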
Journal of Information Science and Engineering, Nov 1, 2014
Real-time foreground object extraction is an important subject for computer vision applications. Model-based background subtraction methods have been used to extract the foreground objects. Different from previous methods, this paper introduces a hybrid codebook-based background subtraction method that combines the mixture of Gaussians (MOG) with the codebook (CB) method. We propose an ellipsoid CB model for modeling dynamic background with highlight and shadow, and develop a modified shadow/highlight removal method to overcome the influence of illumination change. Our method avoids extracting false foreground pixels (e.g., dark background) and missing real foreground pixels (e.g., bright foreground). Finally, we conduct two experiments to compare the performance of our method with others, based on [18] and on the change detection benchmark dataset provided at CVPR 2011, respectively.
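The hybrid ellipsoid codebook model itself is not spelled out in the abstract, so the sketch below only shows the MOG half it builds on: OpenCV's Gaussian-mixture background subtractor with shadow marking, which the paper combines with a codebook model. The input path is a placeholder.

```python
# Hedged sketch: MOG background subtraction with shadow suppression (the CB part is omitted).
import cv2

cap = cv2.VideoCapture("traffic.mp4")          # illustrative input path, not the paper's data
mog = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = mog.apply(frame)                    # 255 = foreground, 127 = shadow, 0 = background
    foreground = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)[1]   # drop shadow pixels
    cv2.imshow("foreground", foreground)
    if cv2.waitKey(1) == 27:                   # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```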
A video-based abnormal human behavior detection for psychiatric patient monitoring
2018 International Workshop on Advanced Image Technology (IWAIT), 2018
This paper proposes an abnormal human behavior detection system for monitoring psychiatric patients. Normal behavior can be characterized by the spatial and temporal features of human activities. The difficulty of abnormal behavior detection is that human behavior is unpredictable and complicated; it varies in both motion and appearance. The human behavior video stream is interspersed with transitions between normal and abnormal events. Here, we propose an unsupervised learning scheme that uses the N-cut algorithm along with an SVM to label the video segments, and then apply a conditional random field (CRF) with an adaptive threshold to distinguish the normal and abnormal events.
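As a rough, hedged illustration of the labelling stage, the sketch below uses scikit-learn's spectral clustering (an N-cut relative) to pseudo-label video segments, trains an SVM on those labels, and replaces the CRF step with a simple adaptive threshold over the SVM scores. The segment features are synthetic; none of these choices are the paper's exact components.

```python
# Hedged sketch: unsupervised segment labelling (spectral clustering + SVM) with an
# adaptive threshold standing in for the CRF decision stage.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.svm import SVC

segment_features = np.random.rand(200, 32)     # one motion/appearance descriptor per video segment

pseudo_labels = SpectralClustering(n_clusters=2, affinity="rbf", random_state=0).fit_predict(segment_features)
svm = SVC(kernel="rbf", probability=True).fit(segment_features, pseudo_labels)

scores = svm.predict_proba(segment_features)[:, 1]
adaptive_threshold = scores.mean() + scores.std()          # simplistic stand-in for the CRF step
abnormal = scores > adaptive_threshold
print(f"{abnormal.sum()} of {len(abnormal)} segments flagged as abnormal")
```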
Pattern Recognition, Sep 1, 2018
Highlights: The proposed system can be applied to vehicle verification under non-overlapped views, in which the shapes and illumination of the vehicles differ. We propose a novel sparse dictionary learning approach, Boost K-SVD, for vehicle verification. The generated dictionary provides a good RIP and a sparser representation of the samples; the better the dictionary, the better the pair verification that can be achieved. An adaptive dictionary size estimation is proposed to estimate the optimal size for different datasets.
Springer eBooks, 2011
This paper introduces a tracking algorithm to track multiple objects across multiple non-overlapped views. First, we track every single object in each view and record its activity as object-based video fragments (OVFs). By linking related OVFs across different cameras, we may connect two OVFs across two non-overlapped views. Because of scene illumination changes, lingering in blind regions, and similar object appearances, the problems of path misconnection and fragmentation may occur. This paper develops the Error Path Detection Function (EPDF) and uses the augmented feature (AF) to solve these two problems.
We present an image classification method which consists of salient region (SR) detection, local feature extraction, and a pairwise local observation based Naive Bayes classifier (NBPLO). Different from previous image classification algorithms, we propose a scale, translation, and rotation invariant image classification algorithm. Based on the discriminative pairwise local observations, we develop a structured-object-model-based Naive Bayes classifier for image classification. We conduct experiments on the Scene-15 and Caltech-101 databases and compare the results with the bag-of-features (BoF) and SPM algorithms.
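The sketch below shows one hedged reading of a pairwise-observation Naive Bayes classifier: local descriptors are quantized into visual words, each image is represented by the word pairs it contains, and classification sums class-conditional log-probabilities of those pairs. The salient-region detector, descriptor choice, and toy data are assumptions, not the paper's model.

```python
# Hedged sketch: Naive Bayes over pairwise visual-word observations.
import numpy as np
from itertools import combinations
from sklearn.cluster import KMeans

def pair_probs(word_lists, labels, vocab_size, num_classes):
    counts = np.ones((num_classes, vocab_size, vocab_size))      # Laplace smoothing
    for words, c in zip(word_lists, labels):
        for a, b in combinations(sorted(set(words)), 2):
            counts[c, a, b] += 1
    return counts / counts.sum(axis=(1, 2), keepdims=True)

def classify(words, log_probs):
    pairs = list(combinations(sorted(set(words)), 2))
    scores = [sum(log_probs[c, a, b] for a, b in pairs) for c in range(log_probs.shape[0])]
    return int(np.argmax(scores))

# toy data: 20 images, 50 local descriptors each, 2 classes
rng = np.random.default_rng(0)
descs = [rng.normal(size=(50, 16)) + label for label in (0, 1) for _ in range(10)]
labels = [0] * 10 + [1] * 10
kmeans = KMeans(n_clusters=32, n_init=4, random_state=0).fit(np.vstack(descs))
word_lists = [kmeans.predict(d) for d in descs]
log_probs = np.log(pair_probs(word_lists, labels, 32, 2))
print(classify(word_lists[0], log_probs))                         # expected to recover class 0
```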
Virtual touchpad: Hand gesture recognition for smartphone with depth camera
2015 IEEE International Conference on Consumer Electronics - Taiwan, 2015
This paper presents a virtual touchpad for smartphones with a depth camera, such as the HTC One (M8). Hand gesture recognition is used as a new human-machine interface for hand-held devices. The system consists of three modules: (1) open hand extraction, (2) active fingertip detection and tracking, and (3) hand gesture recognition. In the experiments, we show that our method can detect the active fingertips and track their trajectories to recognize hand gestures effectively.
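As a hedged illustration of the fingertip-detection idea only, the sketch below thresholds a depth image to isolate the hand and uses convex-hull/convexity-defect analysis on the hand contour to propose fingertip candidates. The depth range, synthetic frame, and defect threshold are assumptions; the actual HTC One (M8) pipeline and gesture classifier are not reproduced.

```python
# Hedged sketch: depth-threshold hand mask + convexity-defect fingertip candidates.
import cv2
import numpy as np

def detect_fingertips(depth_mm):
    """depth_mm: uint16 depth image; returns fingertip candidate points as (x, y) tuples."""
    hand_mask = cv2.inRange(depth_mm, 200, 600)                    # hand assumed 20-60 cm away
    contours, _ = cv2.findContours(hand_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return []
    hand = max(contours, key=cv2.contourArea)
    hull = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull)
    tips = []
    if defects is not None:
        for start, end, _far, depth in defects[:, 0]:
            if depth > 10000:                                      # deep defect => valley between fingers
                tips.append(tuple(hand[start][0]))
                tips.append(tuple(hand[end][0]))
    return tips

depth = np.full((240, 320), 1000, dtype=np.uint16)                 # synthetic background at 1 m
cv2.circle(depth, (160, 120), 60, 400, -1)                         # fake "hand" blob at 40 cm
print(detect_fingertips(depth))
```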
Facial Expression Recognition for Human-Robot Interaction
Facial expression recognition (FER) has been applied to human-robot interaction (HRI). An assistant robot that interacts closely with human beings should be able to recognize human facial expressions. FER is a non-trivial problem because each individual has his or her own way of revealing emotion, and the facial expressions of two different persons may not be identical. A facial expression can be divided into the phases neutral, onset, apex, and offset, before returning to neutral. In this paper, we propose a hybrid method to recognize the facial expression in the apex phase. In the first stage, we use Gabor filters to obtain the facial features and apply a Support Vector Machine (SVM) to identify Action Units (AUs). In the second stage, based on the identified AUs, we apply random forest classifiers to recognize the facial expressions. Finally, we show the experimental results and compare our method with the other methods.
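The sketch below outlines the two-stage structure on synthetic data: Gabor-filter responses feed per-AU SVMs, and the resulting AU activation vector feeds a random forest expression classifier. The filter-bank parameters, pooling, AU set, and expression labels are all assumptions for illustration.

```python
# Hedged sketch: Gabor features -> per-AU SVMs -> random forest expression classifier.
import cv2
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

def gabor_features(face, ksize=21, sigmas=(4,), thetas=np.linspace(0, np.pi, 4, endpoint=False)):
    feats = []
    for sigma in sigmas:
        for theta in thetas:
            kernel = cv2.getGaborKernel((ksize, ksize), sigma, theta, 10.0, 0.5)
            response = cv2.filter2D(face, cv2.CV_32F, kernel)
            feats.extend([response.mean(), response.std()])        # simple pooling per filter
    return np.array(feats)

rng = np.random.default_rng(0)
faces = rng.random((100, 64, 64)).astype(np.float32)               # stand-in apex-frame faces
au_labels = rng.integers(0, 2, size=(100, 6))                      # 6 illustrative AUs, multi-label
expr_labels = rng.integers(0, 4, size=100)                         # 4 illustrative expressions

X = np.stack([gabor_features(f) for f in faces])
au_stage = OneVsRestClassifier(LinearSVC()).fit(X, au_labels)      # stage 1: AUs from Gabor features
au_pred = au_stage.predict(X)
expr_stage = RandomForestClassifier(n_estimators=100).fit(au_pred, expr_labels)  # stage 2
print(expr_stage.predict(au_pred[:5]))
```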
Vehicle verification in two nonoverlapped views using sparse representation
Vehicle verification across two different views can be applied in Intelligent Transportation Systems. However, matching object appearance across two different views is difficult. The vehicle images captured in the two views are represented as a feature pair, which can be classified as a same/different pair. Sparse representation (SR) has been applied to reconstruction, recognition, and verification; however, the SR dictionary may not guarantee feature sparsity and effective representation. In this paper, we propose the Boost-KSVD method, which generates the SR dictionary without using initial random atoms and can be applied to object verification with very good accuracy. Then, we develop a discriminative criterion to decide the SR dictionary size. Finally, the experiments show that our method achieves better verification accuracy than the other methods.
Object verification in two different views using sparse representation
This paper proposes an object verification method across two different views using sparse representation. The proposed method contains three major modules. First, we train the sparse matrix using K-Singular Value Decomposition (K-SVD) and maximum correlation training sample selection. Second, we project the training samples onto the sparse matrix to obtain the sparse vector training set. Third, we combine the two training sets of same/different objects from the two views to generate positive/negative hybrid sparse vector sets for SVM classifier training. Our contributions are (1) proposing a better dictionary representation learning than the original K-SVD learning, and (2) developing an optimal sparse representation for object verification with very good accuracy. In the experiments, we show that our method achieves better accuracy than the other methods.
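The sketch below walks through the three modules in hedged form, with scikit-learn's dictionary learner standing in for the K-SVD/Boost-KSVD step described in this and the neighbouring sparse-verification abstracts. Sample counts, dimensions, sparsity level, and the way negative pairs are formed are illustrative assumptions.

```python
# Hedged sketch: dictionary learning -> sparse codes per view -> pair vectors -> SVM verification.
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.svm import SVC

rng = np.random.default_rng(0)
view_a = rng.normal(size=(60, 100))                        # one descriptor per object in camera A
view_b = view_a + 0.1 * rng.normal(size=(60, 100))         # same objects observed in camera B

dico = DictionaryLearning(n_components=40, transform_algorithm="omp",
                          transform_n_nonzero_coefs=5, random_state=0).fit(np.vstack([view_a, view_b]))
codes_a, codes_b = dico.transform(view_a), dico.transform(view_b)

# positive pairs: same object in both views; negative pairs: mismatched objects
pos = np.hstack([codes_a, codes_b])
neg = np.hstack([codes_a, np.roll(codes_b, 1, axis=0)])
X = np.vstack([pos, neg])
y = np.array([1] * len(pos) + [0] * len(neg))

clf = SVC(kernel="linear").fit(X, y)
print("training accuracy:", clf.score(X, y))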
First-person-vision-based driver assistance system
This paper presents a driver assistance system that monitors driver behavior by applying the so-called “First-Person Vision” (FPV) technology. It consists of two modules: scene classification and driver viewing angle estimation. First, we use a “bag of words” image classification approach based on FAST and BRIEF feature descriptors. Second, we establish a “vocabulary dictionary” to encode an input image as a feature vector. Third, we apply an SVM classifier to detect whether the driver's view is the inside or outside scene of the vehicle. Finally, we estimate the driver's viewing angle based on FPV and the windshield-mounted camera. In the experiments, we illustrate the effectiveness of our system.
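A hedged sketch of the inside/outside scene classifier follows: FAST keypoints with BRIEF descriptors, a k-means “vocabulary dictionary”, histogram encoding, and a linear SVM. The vocabulary size, toy images, and labels are placeholders, and BRIEF requires opencv-contrib-python.

```python
# Hedged sketch: FAST + BRIEF bag-of-words encoding with an SVM inside/outside classifier.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

fast = cv2.FastFeatureDetector_create()
brief = cv2.xfeatures2d.BriefDescriptorExtractor_create()          # needs opencv-contrib-python

def brief_descriptors(gray):
    kps = fast.detect(gray, None)
    _, desc = brief.compute(gray, kps)
    return desc if desc is not None else np.empty((0, 32), np.uint8)

def encode(gray, vocab):
    desc = brief_descriptors(gray)
    if len(desc) == 0:
        return np.zeros(vocab.n_clusters)
    words = vocab.predict(desc.astype(np.float32))
    hist = np.bincount(words, minlength=vocab.n_clusters)
    return hist / hist.sum()

# toy frames; labels: 1 = inside-the-vehicle scene, 0 = outside scene
train_images = [cv2.cvtColor((np.random.rand(120, 160, 3) * 255).astype(np.uint8),
                             cv2.COLOR_BGR2GRAY) for _ in range(20)]
labels = np.random.randint(0, 2, size=20)

all_desc = np.vstack([brief_descriptors(img) for img in train_images]).astype(np.float32)
vocab = KMeans(n_clusters=50, n_init=4, random_state=0).fit(all_desc)   # vocabulary dictionary
X = np.stack([encode(img, vocab) for img in train_images])
svm = LinearSVC().fit(X, labels)
```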
Object verification in two views using sparse representation
This paper proposes an object verification method using sparse representation (SR), which has been applied to object representation and recognition. However, the SR dictionary does not show sufficient compactness. Our method comprises three major modules. First, we train the sparse matrix using boosted K-Singular Value Decomposition (boost K-SVD) to obtain a sparse vector set. Second, we combine the two training sparse vector sets of the same and different objects from the two views to generate positive/negative combined sparse vector sets. Finally, a Support Vector Machine (SVM) classifier is applied for verification. Our contributions are (1) obtaining a sparser vector set using K-SVD, (2) demonstrating that the SR matrix has a better Restricted Isometry Property (RIP), and (3) applying the SR matrix to the object verification process with high accuracy. The experimental results show that our method achieves higher accuracy than the other methods.
A vision-based walking motion parameters capturing system
Markerless vision-based human motion parameter capturing has been widely applied to human-machine interfaces. However, it faces two problems: high-dimensional parameter estimation and self-occlusion. Here, we propose a 3-D human model with structural, kinematic, and temporal constraints to track a walking human object in any viewing direction. Our method modifies the Annealed Particle Filter (APF) by applying a pre-trained spatial correlation map and a temporal constraint to estimate the motion parameters of a walking human object. In the experiments, we demonstrate that the proposed method requires less computation time and generates more accurate results.
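For readers unfamiliar with the baseline, the sketch below is a generic annealed particle filter: several annealing layers progressively sharpen the particle weighting around a cost minimum. The cost function, pose dimensionality, and annealing schedule are placeholders, and the paper's spatial correlation map and temporal constraint are not reproduced.

```python
# Hedged sketch: generic annealed particle filter over a pose vector with a placeholder cost.
import numpy as np

def annealed_particle_filter(cost, prev_pose, num_particles=200, layers=5, sigma=0.2, seed=0):
    rng = np.random.default_rng(seed)
    particles = prev_pose + sigma * rng.normal(size=(num_particles, prev_pose.size))
    for m in range(layers):
        beta = 2.0 ** m                                              # sharper weighting each layer
        weights = np.exp(-beta * np.array([cost(p) for p in particles]))
        weights /= weights.sum()
        idx = rng.choice(num_particles, size=num_particles, p=weights)   # resample
        noise = (sigma / (m + 1)) * rng.normal(size=particles.shape)     # shrink diffusion per layer
        particles = particles[idx] + noise
    return particles.mean(axis=0)                                    # pose estimate for this frame

true_pose = np.array([0.3, -0.5, 1.2])                               # toy 3-D "pose" to recover
estimate = annealed_particle_filter(lambda p: np.sum((p - true_pose) ** 2), prev_pose=np.zeros(3))
print(estimate)
```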
Optimal Training Set Selection for Video Annotation
Most learning-based video semantic analysis methods require a large training set to achieve good performance. However, annotating a large video is labor-intensive. This paper introduces how to construct the training set while reducing user involvement. Four selection schemes are proposed: clustering-based, spatial dispersiveness, temporal dispersiveness, and sample-based, which can be used to construct a small yet effective training set.
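A minimal hedged sketch of the clustering-based scheme only: cluster the unlabeled frames and ask the annotator to label one representative per cluster (the frame nearest each centroid). The cluster count and feature source are assumptions; the other three schemes are not shown.

```python
# Hedged sketch: clustering-based training-set selection for annotation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin

frame_features = np.random.rand(1000, 64)       # one descriptor per candidate frame
kmeans = KMeans(n_clusters=30, n_init=4, random_state=0).fit(frame_features)

# index of the frame closest to each cluster centre -> compact training set to annotate
selected = pairwise_distances_argmin(kmeans.cluster_centers_, frame_features)
print(sorted(selected.tolist()))
```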
People counting using ellipse detection and forward/backward tracing
There are two different people counting methods: (1) counting people crossing a detection line during a certain time period and (2) estimating the total number of people in a region at a certain time instant. This paper presents a new approach to count the number of people crossing a line of interest (LOI). First, the foreground object silhouettes are extracted and described as …
This paper presents a single sample per person (SSPP) based face recognition method. Based on Discriminative Multi-manifold Analysis (DMMA), we propose an accelerated face recognition method which consists of three modules. First, with only one training image per person, we use a modified K-means method to cluster the people into two groups. Second, we divide the face images into non-overlapping local patches and apply DMMA. Third, we repeat the previous two steps to obtain the binary-tree projection matrix of fast DMMA. In the experiments, we test on the AR and FERET databases to verify the effectiveness of the SSPP-based fast DMMA face recognition process in both accuracy and speed.
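The sketch below shows only the binary-tree construction implied by repeating the two-group clustering: the SSPP gallery is recursively split with 2-means until each leaf holds a small group, within which the finer manifold analysis would then be applied. DMMA itself is not reproduced; the leaf size and feature vectors are assumptions.

```python
# Hedged sketch: recursive 2-means binary tree over the single-sample gallery (fast-DMMA-style).
import numpy as np
from sklearn.cluster import KMeans

def build_tree(indices, features, max_leaf=8, depth=0):
    if len(indices) <= max_leaf:
        return {"leaf": True, "people": indices}
    split = KMeans(n_clusters=2, n_init=4, random_state=depth).fit_predict(features[indices])
    left, right = indices[split == 0], indices[split == 1]
    if len(left) == 0 or len(right) == 0:          # degenerate split -> stop here
        return {"leaf": True, "people": indices}
    return {"leaf": False,
            "children": [build_tree(left, features, max_leaf, depth + 1),
                         build_tree(right, features, max_leaf, depth + 1)]}

gallery = np.random.rand(100, 256)                 # one feature vector per enrolled person (SSPP)
tree = build_tree(np.arange(100), gallery)
```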
A hybrid codebook background model for background subtraction
Real-time segmentation of a scene into foreground and background is an important issue for many applications. Different from previous codebook (CB) methods, this paper introduces a hybrid CB model by combining the mixture of Gaussians (MOG) method and the CB method. It can be used to solve the problems of a moving background and shadow/highlight on the background. Our method …