Chi-Yi Tsai - Academia.edu
Papers by Chi-Yi Tsai
IEEE Access
Bin pick-and-place is an important topic in factory automation and warehouse automation. In this paper, a bin pick-and-place system based on the Robot Operating System (ROS) is implemented to enable a six-degree-of-freedom (6-DOF) robot manipulator to complete multiple pick-and-place tasks. The proposed system uses ROS to integrate an object perception module and a pick-and-place module, where the former uses an RGB-D camera to capture images inside the bin, and the latter controls a 6-DOF robot manipulator and two self-made vacuum tools. To estimate the pose of the target object, a YOLOv4 object detector is implemented, and an object sorting method is proposed to find the target object in the image. Then, a pose estimation method based on computer-aided design (CAD) is proposed to estimate the pose of the target object. To perform the object pick-and-place operations, a coordinate transformation node is designed to transform the pose of the target object into the workspace. Then, a link distance-based bin collision avoidance method is proposed to avoid collisions. Finally, the angle of the 1-DOF vacuum tool and the picking and placement poses of the robot manipulator are obtained from the result of the bin collision avoidance and the pose of the target object. In this study, a total of ten ROS nodes are designed, and solutions that make each function easier to implement and reproduce are proposed. We set up four experiments with two task types and two object types to verify the effectiveness of the implemented bin pick-and-place system.
Multimedia Tools and Applications, 2022
Sensors, 2022
Programming is a skill that requires high levels of logical thinking and problem-solving abilities. According to the Curriculum Guidelines for the 12-Year Basic Education currently implemented in Taiwan, programming has been included in the mandatory courses of middle and high schools. Nevertheless, the guidelines simply recommend that elementary schools provide fundamental instruction in related fields during alternative learning periods. This may result in a rough transition in programming learning for middle school freshmen. To alleviate this problem, this study proposes an augmented reality (AR) logic programming teaching system that combines AR technologies and game-based teaching material designs on the basis of the fundamental concepts for seventh-grade structured programming. This system can serve as an articulation curriculum for logic programming in primary education. Thus, students are able to develop basic programming logic concepts through AR technologie...
2017 International Conference on Applied System Innovation (ICASI), 2017
This paper proposes an indoor scene three-dimensional (3D) reconstruction system using a pan-tilt platform and an RGB-D camera. The proposed system can automatically reconstruct 3D indoor scenes from a fixed position. An efficient point cloud registration algorithm is proposed to align point clouds based on the extrinsic parameters of the RGB-D camera at each preset pan-tilt control point. Then, a local registration method is performed to refine the alignment result. Experimental results verify the quality and efficiency of the proposed point cloud alignment algorithm by comparison with a state-of-the-art method.
IEEE Access, 2021
This paper addresses the problems related to the mapless navigation control of wheeled mobile robots based on deep learning technology. The traditional navigation control framework is based on a global map of the environment, and its navigation performance depends on the quality of the global map. In this paper, we propose a mapless Light Detection and Ranging (LiDAR) navigation control method for wheeled mobile robots based on deep imitation learning. The proposed method is a data-driven control method that directly uses LiDAR sensors and the relative target position for mobile robot navigation control. A deep convolutional neural network (CNN) model is proposed to predict motion control commands of the mobile robot without requiring a global map, achieving navigation control of the mobile robot in unknown environments. While collecting the training dataset, we manipulated the mobile robot to avoid obstacles through manual control and recorded the raw data of the LiDAR sens...
Generic object tracking (GOT) has been one of the main topics in computer vision for many years. The goal of GOT is to recognize and locate a specific object in the form of a bounding box throughout a sequence of images. Moreover, GOT also requires algorithms to locate objects down to the instance level. These requirements produce some unique challenges, especially for deep learning based GOT algorithms, which may easily overfit when given a very small training dataset of the object during the online tracking process. To deal with this issue, we propose a novel Reptile meta-tracking algorithm, which adopts a first-order meta-learning technique so that during initialization, the visual tracker requires only a few training examples and a few steps of optimization to perform well. The proposed Reptile meta-tracker is evaluated on the OTB2015 and VOT2018 tracking benchmark datasets, and outperforms several state-of-the-art trackers using one-pass evaluation.
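As a rough illustration of the first-order meta-learning idea named in this abstract, the sketch below applies the generic Reptile update (adapt on one task with a few SGD steps, then move the initial weights a fraction of the way toward the adapted weights) to a toy scalar regression problem. It is not the tracker described in the paper: the task family, the function names `inner_sgd` and `reptile`, and all hyperparameters are assumptions chosen only for illustration.

```python
import random


def inner_sgd(w, task_slope, steps=5, lr=0.1, rng=None):
    """Adapt w to one toy task (fit y = task_slope * x with model y = w * x)."""
    rng = rng or random.Random(0)
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        # Gradient of the squared error (w*x - task_slope*x)^2 with respect to w.
        grad = 2.0 * (w * x - task_slope * x) * x
        w -= lr * grad
    return w


def reptile(task_slopes, meta_steps=200, meta_lr=0.5, seed=0):
    """Learn an initial weight that adapts quickly to any task in task_slopes."""
    rng = random.Random(seed)
    w = 0.0  # meta-initialization (hypothetical starting point)
    for _ in range(meta_steps):
        slope = rng.choice(task_slopes)          # sample one task
        w_adapted = inner_sgd(w, slope, rng=rng)  # a few steps of task-specific SGD
        w += meta_lr * (w_adapted - w)            # first-order Reptile update
    return w


if __name__ == "__main__":
    print(reptile([1.0, 3.0]))
```

The outer update uses only the difference between adapted and initial weights, so no second-order gradients are needed; that is what makes the technique first-order.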
IET Image Processing, 2015
This study presents a cost-efficient and high-performance field programmable gate array (FPGA)-based hardware implementation of a contrast-preserving image dynamic range compression algorithm, which is an important function used in modern digital video cameras and displays to improve the visual quality of standard dynamic range colour images (8 bits/channel). To achieve this purpose, a hardware-friendly approximation to an existing fast dynamic range compression with local contrast preservation (FDRCLCP) algorithm is proposed. The computation of the proposed approximated FDRCLCP algorithm requires only fixed-point unsigned binary addition, multiplication, and bit-shifting. Moreover, the proposed hardware implementation uses a line buffer instead of a frame buffer to process whole image data. These advantages significantly improve throughput performance and reduce the memory requirement of the system. The FPGA implementation of the proposed algorithm requires only about 98 Kbits of on-chip memory and achieves an operating frequency of about 170.24 MHz on an Altera Cyclone II device. This is a large improvement compared with the existing results, as it is quick enough to process full high-definition videos (1920 × 1080 pixels) at a minimum of 80 frames per second using a low-cost FPGA device.
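The frame-rate figure in this abstract can be sanity-checked with simple arithmetic. The sketch below assumes the pipeline consumes one pixel per clock cycle and ignores blanking intervals; neither assumption is stated in the abstract, and the helper name `max_frames_per_second` is hypothetical.

```python
def max_frames_per_second(clock_hz, width, height, pixels_per_clock=1.0):
    """Upper bound on frame rate for a streaming pixel pipeline.

    Assumes a fixed number of pixels processed per clock cycle and no
    blanking overhead (both are illustrative assumptions).
    """
    return clock_hz * pixels_per_clock / (width * height)


if __name__ == "__main__":
    fps = max_frames_per_second(170.24e6, 1920, 1080)
    print(round(fps, 1))  # 82.1
```

Under those assumptions a 170.24 MHz clock yields roughly 82 frames per second at 1920 × 1080, which is consistent with the stated minimum of 80.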
Mathematics
Music is considered a powerful brain stimulus, as listening to it can activate several brain networks. Music of different kinds and genres may have a different effect on the human brain. The goal of this study is to investigate the change in the brain’s functional connectivity (FC) when music is used as a stimulus. Secondly, the effect of listening to the subject’s favorite music is compared with listening to specifically formulated relaxing music with alpha binaural beats. Finally, the effect of the duration of music listening is studied. Subjects’ electroencephalographic (EEG) signals were captured as they listened to favorite and relaxing music. After preprocessing and artifact removal, the EEG recordings were decomposed into the delta, theta, alpha, and beta frequency bands, and the grand-averaged connectivity matrices were generated using Inter-Site Phase Clustering (ISPC) for each frequency band and each type of music. Furthermore, each lobe of the brain was analyzed separatel...
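For readers unfamiliar with Inter-Site Phase Clustering, a minimal sketch of the standard definition follows: ISPC is the length of the average unit vector of the phase differences between two channels, so it is 1 for a constant phase lag and near 0 when the differences are spread evenly around the circle. The sketch assumes the instantaneous phases have already been extracted in radians (the abstract does not say how), and the function name `ispc` is hypothetical.

```python
import cmath
import math


def ispc(phases_a, phases_b):
    """Inter-site phase clustering between two equal-length phase series (radians)."""
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("phase series must be non-empty and of equal length")
    # Average the unit vectors of the phase differences, then take the magnitude.
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total / len(phases_a))


if __name__ == "__main__":
    a = [0.1 * k for k in range(100)]
    b = [p - 0.7 for p in a]  # constant lag of 0.7 rad
    print(ispc(a, b))
    spread = [2.0 * math.pi * k / 8 for k in range(8)]
    print(ispc(spread, [0.0] * 8))
```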
2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)
The usage of automatic driving assistance systems (ADAS) has become more and more popular in recent years. In the design of ADAS, traffic sign detection (TSD) and traffic sign recognition (TSR) are two important functions and have been widely studied in the literature. This paper addresses the design of a vision-based TSD and recognition (TSDR) system, which is computationally efficient and can run on a common smartphone with real-time performance. To achieve this, a novel TSD algorithm is proposed based on the Maximally Stable Extremal Regions (MSER) algorithm to accurately detect all traffic sign candidates in real time. Then, the feature vector of each candidate region is extracted via the Histogram of Oriented Gradients (HOG) algorithm. To recognize the traffic sign, the proposed TSR algorithm combines a Linear Support Vector Machine (LSVM) classifier with a voting process to improve the traffic sign recognition rate in real-world environments. The proposed TSDR system has been implemented on an iOS embedded platform and can operate at an average speed of 30 fps when processing 640 × 480 video streams. Moreover, experimental results show that the proposed system achieves a recognition precision of about 96% on the German Traffic Sign Recognition Benchmark (GTSRB). Therefore, the proposed TSDR algorithm has the potential to be used in real-world products.
"ICIC Express Letters, Part B: Applications An International Journal of Research and Surveys", 2016
Dynamic range compression is an important function used in modern digital video cameras and displays to improve the visual quality of standard dynamic range color images. This chapter presents a real-time implementation of an adaptive contrast-enhancing image dynamic range compression algorithm on a graphics processing unit (GPU) for color image enhancement. To achieve this purpose, an image-dependent nonlinear intensity transfer function is first presented to produce a satisfactory dynamic-range compression result with fewer color artifacts. The proposed algorithm is then derived by combining the proposed nonlinear intensity transfer function with an existing simultaneous dynamic range compression and local-contrast enhancement (SDRCLCE) algorithm, which is a parallelizable method to compress image dynamic range while enhancing the local contrast of output images. Finally, the proposed algorithm is implemented on the GPU by using NVIDIA Compute Unified Device Architecture (CUDA), achieving ...
Planar tracking is an essential function in computer vision that addresses the problem of tracking a planar target affected by illumination, deformation and blurring with real-time performance. To achieve this goal, the tracking algorithm has to train a model of the planar target and use this model to estimate the change of target pose. However, when the target is changed, the algorithm needs to retrain a model for the new target. To address this problem, this paper presents a novel deep learning based planar tracking algorithm, which can efficiently estimate the pose change of the target without retraining a new model when a new target is selected. The proposed convolutional neural network (CNN) model requires only two input images: a reference image and a query image. Then, the pose change of the planar target between the two images can be directly estimated from the output of the proposed CNN model. Experimental results show that the proposed algorithm per...
Computer Vision – ECCV 2020 Workshops
The Visual Object Tracking challenge VOT2020 is the eighth annual tracker benchmarking activity organized by the VOT initiative. Results of 58 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The VOT2020 challenge was composed of five sub-challenges focusing on different tracking domains: (i) VOT-ST2020 challenge focused on short-term tracking in RGB, (ii) VOT-RT2020 challenge focused on "real-time" short-term tracking in RGB, (iii) VOT-LT2020 focused on long-term tracking, namely coping with target disappearance and reappearance, (iv) VOT-RGBT2020 challenge focused on short-term tracking in RGB and thermal imagery and (v) VOT-RGBD2020 challenge focused on long-term tracking in RGB and depth imagery. Only the VOT-ST2020 datasets were refreshed. A significant novelty is the introduction of a new VOT short-term tracking evaluation methodology, and the introduction of segmentation ground truth in the VOT-ST2020 challenge; bounding boxes will no longer be used in the VOT-ST challenges. A new VOT Python toolkit that implements all these novelties was introduced. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website.
2018 International Automatic Control Conference (CACS), 2018
This paper presents a novel convolutional neural network (CNN) based high-level control architecture that uses a deep learning technique to realize autonomous picking control of a six-degree-of-freedom (6-DoF) manipulator using visual information only. The proposed manipulator control system uses a stereo camera as a measurement device to capture a stereo image of the scene in front of the robot. Then, the proposed CNN-based picking controller uses the captured stereo image as an input to directly predict the optimal picking control command of the manipulator. To collect the training dataset, we manually controlled the manipulator to pick up the object-of-interest (OOI) and recorded the stereo images and the corresponding control commands. In the CNN training phase, a supervised end-to-end learning technique is used to learn the mapping between the stereo image observation and the picking control commands of the 6-DoF manipulator. Experimental results show that the pro...
Sensors
In this paper, a manipulation planning method for object re-orientation based on semantic segmentation keypoint detection is proposed for a robot manipulator, enabling it to detect randomly placed objects and re-orientate them to a specified position and pose. There are two main parts: (1) a 3D keypoint detection system; and (2) a manipulation planning system for object re-orientation. In the 3D keypoint detection system, an RGB-D camera is used to obtain information about the environment and to generate 3D keypoints of the target object as inputs that represent its position and pose. This process simplifies the 3D model representation so that the manipulation planning for object re-orientation can be executed in a category-level manner by adding various training data of the object in the training phase. In addition, 3D suction points in both the object’s current and expected poses are also generated as the inputs of the next operation stage. During the next stage, Mask Regi...
Applied Sciences
In the natural science curriculum, chemistry is a very important domain. However, when conducting chemistry experiments, safety issues need to be taken seriously, and excessive material waste may be caused during the experiment. Based on the science curriculum for 11-year-old students, this paper proposed a virtual chemistry laboratory, which was designed by combining a virtual experiment application with physical teaching materials. The virtual experiment application was a virtual experiment laboratory environment created by using selected experimental equipment cards in combination with augmented reality (AR) technology. The physical teaching materials included all virtual equipment required for the experiment units. Each piece of equipment had corresponding cards for learners to choose from and utilize in specific experimental operations. It was hoped that students would be able to achieve the desired learning effectiveness of experimental teaching while reducing the waste of experimental mat...
Proceedings of The 3rd International Conference on Intelligent Systems and Image Processing 2015, 2015
IET Computer Vision
Traffic sign recognition is a very important function in automatic driving assistance systems (ADAS). This study addresses the design and implementation of a vision-based ADAS based on an image-based speed-limit sign (SLS) recognition algorithm, which can automatically detect and recognise SLS on the road in real time. To improve the recognition rate of SLS having different orientations and scales in the image, this study also presents a new sign content description algorithm, which describes the detected road sign using centroid-to-contour (CtC) distances of the extracted sign content. The proposed CtC descriptor is robust to translation, rotation and scale changes of the SLS in the image. This advantage improves the recognition accuracy of a support vector machine classifier trained using a large database of traffic signs. The proposed SLS recognition method has been implemented on two different embedded platforms, each equipped with an ARM-based quad-core CPU running the Android 4.4 operating system. Experimental results validate that the proposed method not only provides a high recognition rate, but also achieves real-time performance of up to 30 frames per second when processing 1280 × 720 video streams on a commercial ARM-based smartphone.
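To make the centroid-to-contour idea concrete, here is a minimal sketch of one plausible form of such a descriptor: distances from the centroid to each contour point, divided by the largest distance. Measuring from the centroid removes the effect of translation, and the normalization removes the effect of uniform scaling. This is an assumption-laden illustration, not the paper's algorithm: the centroid is taken as the mean of the contour points, the normalization is a guess, rotation handling is not shown, and the name `ctc_descriptor` is hypothetical.

```python
import math


def ctc_descriptor(contour):
    """Normalized centroid-to-contour distances for a list of (x, y) points."""
    if not contour:
        raise ValueError("contour must contain at least one point")
    # Centroid taken as the mean of the contour points (illustrative choice).
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)
    distances = [math.hypot(x - cx, y - cy) for x, y in contour]
    largest = max(distances)
    if largest == 0.0:
        return [0.0] * len(distances)  # degenerate contour: all points coincide
    return [d / largest for d in distances]


if __name__ == "__main__":
    triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
    # Same shape, scaled by 2 and shifted by (5, -7).
    moved = [(2.0 * x + 5.0, 2.0 * y - 7.0) for x, y in triangle]
    print(ctc_descriptor(triangle))
    print(ctc_descriptor(moved))
```

Both calls print the same values up to floating-point rounding, which is the translation and scale behaviour the abstract attributes to the descriptor.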
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
The Visual Object Tracking challenge VOT2021 is the ninth annual tracker benchmarking activity organized by the VOT initiative. Results of 71 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The VOT2021 challenge was composed of four sub-challenges focusing on different tracking domains: (i) VOT-ST2021 challenge focused on short-term tracking in RGB, (ii) VOT-RT2021 challenge focused on "real-time" short-term tracking in RGB, (iii) VOT-LT2021 focused on long-term tracking, namely coping with target disappearance and reappearance and (iv) VOT-RGBD2021 challenge focused on long-term tracking in RGB and depth imagery. The VOT-ST2021 dataset was refreshed, while VOT-RGBD2021 introduces a training dataset and a sequestered dataset for winner identification. The source code for most of the trackers, the datasets, the evaluation kit and the results are publicly available at the challenge website.
IEEE Access
Bin pick-and-place is an important topic in factory automation and warehouse automation. In this ... more Bin pick-and-place is an important topic in factory automation and warehouse automation. In this paper, a bin pick-and-place system based on robot operating system (ROS) is implemented to make a six-degree-of-freedom (6-DOF) robot manipulator to complete multiple pick-and-place tasks. The proposed system uses ROS to integrate an object perception module and a pick-and-place module, where the former uses an RGB-D camera to capture images inside the bin, and the latter controls a 6-DOF robot manipulator and two self-made vacuum tools. To estimate the pose of the target object, a YOLOv4 object detector is implemented, and an object sorting method is proposed to find the target object in the image. Then, a pose estimation method based on computer aided design (CAD) is proposed to estimate the pose of target object. To perform the object pick-and-place operations, a coordinate transformation node is designed to transfer the pose of the target object into the workspace. Then, a link distance-based bin collision avoidance method is proposed to avoid collisions. Finally, the angle of the 1-DOF vacuum tool and the picking and placement poses of the robot manipulator are obtained from the result of the bin collision avoidance and the pose of the target object. In this study, a total of ten ROS nodes are designed, and the solutions that make each function easier to implement and reproduce are proposed. In the experiments, we set up four experiments with two task types and two object types to verify the effectiveness of the implemented bin pick-and-place system.
Multimedia Tools and Applications, 2022
Sensors, 2022
Programming is a skill that requires high levels of logical thinking and problem-solving abilitie... more Programming is a skill that requires high levels of logical thinking and problem-solving abilities. According to the Curriculum Guidelines for the 12-Year Basic Education currently implemented in Taiwan, programming has been included in the mandatory courses of middle and high schools. Nevertheless, the guidelines simply recommend that elementary schools conduct fundamental instructions in related fields during alternative learning periods. This may result in the problem of a rough transition in programming learning for middle school freshmen. To alleviate this problem, this study proposes an augmented reality (AR) logic programming teaching system that combines AR technologies and game-based teaching material designs on the basis of the fundamental concepts for seventh-grade structured programming. This system can serve as an articulation curriculum for logic programming in primary education. Thus, students are able to develop basic programming logic concepts through AR technologie...
2017 International Conference on Applied System Innovation (ICASI), 2017
This paper proposes an indoor scene three-dimensional (3D) reconstruction system using pan-tilt p... more This paper proposes an indoor scene three-dimensional (3D) reconstruction system using pan-tilt platform and RGB-D camera. The proposed system can automatically reconstruct 3D indoor scenes on a fixed position. An efficient point cloud registration algorithm is proposed to align point clouds based on extrinsic parameters of the RGB-D camera from every presetted pan-tilt control points. Then, a local registration method is performed to refine the alignment result. Experimental results verify the quality and efficiency of the proposed point cloud alignment algorithm by comparing with a state-of-the-art method.
IEEE Access, 2021
This paper addresses the problems related to the mapless navigation control of wheeled mobile rob... more This paper addresses the problems related to the mapless navigation control of wheeled mobile robots based on deep learning technology. The traditional navigation control framework is based on a global map of the environment, and its navigation performance depends on the quality of the global map. In this paper, we proposes a mapless Light Detection and Ranging (LiDAR) navigation control method for wheeled mobile robots based on deep imitation learning. The proposed method is a data-driven control method that directly uses LiDAR sensors and relative target position for mobile robot navigation control. A deep convolutional neural network (CNN) model is proposed to predict motion control commands of the mobile robot without the requirement of the global map to achieve navigation control of the mobile robot in unknown environments. While collecting the training dataset, we manipulated the mobile robot to avoid obstacles through manual control and recorded the raw data of the LiDAR sens...
Generic object tracking (GOT) is one of the main topics in computer vision for many years. The go... more Generic object tracking (GOT) is one of the main topics in computer vision for many years. The goal of GOT is to recognize and locate a specific object in the form of bounding box throughout a sequence of images. Moreover, GOT also requires algorithms to locate objects down to instances level. These requirements produce some unique challenges especially for deep learning based GOT algorithms that may easily become over-fitting if given a really small training dataset of the object during the online tracking process. To deal with this issue, we propose a novel Reptile meta-tracking algorithm, which adopts a first-order meta-learning technique so that during initialization, the visual tracker only requires few training examples and few steps of optimization to perform well. The proposed Reptile meta-tracker is evaluated on OTB2015 and VOT2018 tracking benchmark datasets, and outperforms several state-of-the-art trackers using one-pass evaluation.
IET Image Processing, 2015
This study presents a cost-efficient and high-performance field programmable gate array (FPGA)-ba... more This study presents a cost-efficient and high-performance field programmable gate array (FPGA)-based hardware implementation of a contrast-preserving image dynamic range compression algorithm, which is an important function used in modern digital video cameras and displays to improve visual quality of standard dynamic range colour images (8 bits/channel). To achieve this purpose, a hardware-friendly approximation to an existing fast dynamic range compression with local contrast preservation (FDRCLCP) algorithm is proposed. The computation of the proposed approximated FDRCLCP algorithm requires only fixed-point unsigned binary addition, multiplication, and bitshifting. Moreover, the proposed hardware implementation uses a line buffer instead of a frame buffer to process whole image data. These advantages significantly improve throughput performance and reduce memory requirement of the system. The FPGA implementation of the proposed algorithm requires only about 98 K bits on-chip memory and achieves about 170.24 MHz operating frequency by using an Altera Cyclone II device. This is a large improvement compared with the existing results as it is quick enough to process full high-definition videos (1920 × 1080 pixels) at least 80 frames per second using a low-cost FPGA device.
Mathematics
Music is considered a powerful brain stimulus, as listening to it can activate several brain netw... more Music is considered a powerful brain stimulus, as listening to it can activate several brain networks. Music of different kinds and genres may have a different effect on the human brain. The goal of this study is to investigate the change in the brain’s functional connectivity (FC) when music is used as a stimulus. Secondly, the effect of listening to the subject’s favorite music is compared with listening to specifically formulated relaxing music with alpha binaural beats. Finally, the effect of the duration of music listening is studied. Subjects’ electroencephalographic (EEG) signals were captured as they listened to favorite and relaxing music. After preprocessing and artifact removal, the EEG recordings were decomposed into the delta, theta, alpha, and beta frequency bands, and the grand-averaged connectivity matrices were generated using Inter-Site Phase Clustering (ISPC) for each frequency band and each type of music. Furthermore, each lobe of the brain was analyzed separatel...
2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)
The usage of automatic driving assistance systems (ADAS) has become more and more popular in rece... more The usage of automatic driving assistance systems (ADAS) has become more and more popular in recent years. In the design of ADAS, traffic sign detection (TSD) and traffic sign recognition (TSR) are two important functions and have been widely studied in the literature. This paper addresses the design of a vision-based TSD and recognition (TSDR) system, which is computationally efficient and can run on a common smartphone with real time performance. To achieve this, a novel TSD algorithm is proposed based on the Maximally Stable Extremal Regions (MSER) algorithm to accurately detect all traffic sign candidates in real time. Then, the feature vector of each candidate region is extracted via the Histogram of Oriented Gradient (HOG) algorithm. To recognize the traffic sign, the proposed TSR algorithm is designed by combining a Linear Support Vector Machine (LSVM) classifier with a voting process to improve the traffic sign recognition rate in real-world environments. The proposed TSDR system had been implemented on iOS embedded platform and can operate at an average speed of 30 fps for processing 640 × 480 video streams. Moreover, experimental results show that the proposed system achieves about 96% precision rate for recognition on the German Traffic Sign Recognition Benchmark (GTSRB). Therefore, the proposed TSDR algorithm has a potential to be used in realistic products.
"ICIC Express Letters, Part B: Applications An International Journal of Research and Surveys", 2016
Dynamic range compression is an important function used in modern digital video cameras and displ... more Dynamic range compression is an important function used in modern digital video cameras and displays to improve visual quality of standard dynamic range color images. This chapter presents a real-time implementation of an adaptive contrast-enhancing image dynamic range compression algorithm on a graphics processing unit (GPU) for color image enhancement. To achieve this purpose, an image-dependent nonlinear intensity transfer function is first presented to produce a satisfactory dynamic-range compression result with less color artifacts. The proposed algorithm is then derived by combining the proposed nonlinear intensity transfer function with an existing simultaneous dynamic range compression and local-contrast enhancement (SDRCLCE) algorithm, which is a parallelizable method to compress image dynamic range while enhancing local contrast of output images. Finally, the proposed algorithm is implemented on the GPU by using NVIDIA Compute Unified Device Architecture (CUDA), achieving ...
Planar tracking is an essential function in computer vision to address the problem of tracking a ... more Planar tracking is an essential function in computer vision to address the problem of tracking a planar target affected by illumination, deformation and blurring with real-time performance. To achieve this goal, the tracking algorithm has to train a model of the planar target and uses this model to estimate the change of target pose. However, when the target is changed, the algorithm needs to retrain a model for the new target. To address this problem, this paper presents a novel deep learning based planar tracking algorithm, which can efficiently estimate the pose change of the target without retraining a new model when selecting a new target. The proposed convolutional neural network (CNN) model only requires two input images, one is the reference image, and the other one is the query image. Then, the pose change of the planar target between the two images can be directly estimated from the output of the proposed CNN model. Experimental results show that the proposed algorithm per...
Computer Vision – ECCV 2020 Workshops
The Visual Object Tracking challenge VOT2020 is the eighth annual tracker benchmarking activity organized by the VOT initiative. Results of 58 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The VOT2020 challenge was composed of five sub-challenges focusing on different tracking domains: (i) VOT-ST2020 challenge focused on short-term tracking in RGB, (ii) VOT-RT2020 challenge focused on "real-time" short-term tracking in RGB, (iii) VOT-LT2020 focused on long-term tracking, namely coping with target disappearance and reappearance, (iv) VOT-RGBT2020 challenge focused on short-term tracking in RGB and thermal imagery and (v) VOT-RGBD2020 challenge focused on long-term tracking in RGB and depth imagery. Only the VOT-ST2020 datasets were refreshed. A significant novelty is the introduction of a new VOT short-term tracking evaluation methodology, and the introduction of segmentation ground truth in the VOT-ST2020 challenge; bounding boxes will no longer be used in the VOT-ST challenges. A new VOT Python toolkit that implements all these novelties was introduced. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website.
2018 International Automatic Control Conference (CACS), 2018
This paper presents a novel convolutional neural network (CNN) based high-level control architecture that uses deep learning to realize autonomous picking control of a six-degree-of-freedom (6-DoF) manipulator using visual information only. The proposed manipulator control system uses a stereo camera as a measurement device to capture a stereo image of the scene in front of the robot. Then, the proposed CNN-based picking controller uses the captured stereo image as an input to predict the optimal picking control command of the manipulator directly. To collect the training dataset, we manually controlled the manipulator to pick up the object-of-interest (OOI) and recorded the stereo images and the corresponding control commands. In the CNN training phase, supervised end-to-end learning is used to learn the mapping between the stereo image observation and the picking control commands of the 6-DoF manipulator. Experimental results show that the pro...
Sensors
In this paper, a manipulation planning method for object re-orientation based on semantic segmentation keypoint detection is proposed for a robot manipulator, enabling it to detect randomly placed objects and re-orient them to a specified position and pose. The method has two main parts: (1) a 3D keypoint detection system; and (2) a manipulation planning system for object re-orientation. In the 3D keypoint detection system, an RGB-D camera captures information about the environment, and 3D keypoints of the target object are generated as inputs that represent its position and pose. This process simplifies the 3D model representation so that the manipulation planning for object re-orientation can be executed in a category-level manner by adding various training data of the object in the training phase. In addition, 3D suction points in both the object’s current and expected poses are also generated as the inputs of the next operation stage. During the next stage, Mask Regi...
Applied Sciences
In the natural science curriculum, chemistry is a very important domain. However, when conducting chemistry experiments, safety issues need to be taken seriously, and excessive material waste may be caused during the experiment. Based on the science curriculum for 11-year-old students, this paper proposes a virtual chemistry laboratory, designed by combining a virtual experiment application with physical teaching materials. The virtual experiment application is a virtual laboratory environment created by using selected experimental equipment cards in combination with augmented reality (AR) technology. The physical teaching materials include all virtual equipment required for the experiment units. Each piece of equipment has corresponding cards for learners to choose from and use in specific experimental operations. The aim is for students to achieve the desired learning effectiveness of experimental teaching while reducing the waste of experimental mat...
Proceedings of The 3rd International Conference on Intelligent Systems and Image Processing 2015, 2015
IET Computer Vision
Traffic sign recognition is a very important function in automatic driving assistance systems (ADAS). This study addresses the design and implementation of a vision-based ADAS based on an image-based speed-limit sign (SLS) recognition algorithm, which can automatically detect and recognise SLS on the road in real-time. To improve the recognition rate of SLS having different orientations and scales in the image, this study also presents a new sign content description algorithm, which describes the detected road sign using centroid-to-contour (CtC) distances of the extracted sign content. The proposed CtC descriptor is robust to translation, rotation and scale changes of the SLS in the image. This advantage improves the recognition accuracy of a support vector machine classifier trained using a large database of traffic signs. The proposed SLS recognition method has been implemented on two different embedded platforms, each equipped with an ARM-based Quad-Core CPU running the Android 4.4 operating system. Experimental results validate that the proposed method not only provides a high recognition rate, but also achieves real-time performance up to 30 frames per second for processing 1280 × 720 video streams running on a commercial ARM-based smartphone.
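The idea of describing a shape by centroid-to-contour distances can be sketched as follows. This is an illustrative reading of the abstract, not the paper's algorithm: the angular binning, the choice of the maximum distance per bin, and the normalisation by the largest distance are all assumptions, and the name `ctc_descriptor` and the parameter `n_bins` are hypothetical. The sketch shows translation and scale invariance only; the rotation handling described in the abstract is not reproduced here.

```python
import math


def ctc_descriptor(contour, n_bins=36):
    """Centroid-to-contour distance profile of a closed contour.

    contour: list of (x, y) points on the shape boundary.
    Returns n_bins values in [0, 1]: for each angular sector around the
    centroid, the largest centroid-to-contour distance, divided by the
    overall largest distance. Sectors containing no contour point stay 0.
    """
    n = len(contour)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n

    desc = [0.0] * n_bins
    for x, y in contour:
        dx, dy = x - cx, y - cy  # subtracting the centroid removes translation
        angle = math.atan2(dy, dx) % (2.0 * math.pi)
        sector = min(int(angle / (2.0 * math.pi) * n_bins), n_bins - 1)
        desc[sector] = max(desc[sector], math.hypot(dx, dy))

    peak = max(desc)
    # Dividing by the largest distance removes the dependence on scale.
    return [d / peak for d in desc] if peak > 0 else desc


if __name__ == "__main__":
    # Toy contour: an ellipse sampled at 720 points.
    ts = [2.0 * math.pi * k / 720 + 0.001 for k in range(720)]
    ellipse = [(3.0 * math.cos(t), 2.0 * math.sin(t)) for t in ts]
    # The same shape, enlarged and shifted elsewhere in the image.
    moved = [(2.5 * x + 40.0, 2.5 * y - 7.0) for x, y in ellipse]
    a, b = ctc_descriptor(ellipse), ctc_descriptor(moved)
    print(max(abs(p - q) for p, q in zip(a, b)))
```

A fixed-length vector of this kind is the sort of input a support vector machine classifier, as mentioned in the abstract, can be trained on.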
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
The Visual Object Tracking challenge VOT2021 is the ninth annual tracker benchmarking activity organized by the VOT initiative. Results of 71 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The VOT2021 challenge was composed of four sub-challenges focusing on different tracking domains: (i) VOT-ST2021 challenge focused on short-term tracking in RGB, (ii) VOT-RT2021 challenge focused on "real-time" short-term tracking in RGB, (iii) VOT-LT2021 focused on long-term tracking, namely coping with target disappearance and reappearance and (iv) VOT-RGBD2021 challenge focused on long-term tracking in RGB and depth imagery. The VOT-ST2021 dataset was refreshed, while VOT-RGBD2021 introduces a training dataset and sequestered dataset for winner identification. The source code for most of the trackers, the datasets, the evaluation kit and the results are publicly available at the challenge website.