Ching Te Chiu - Academia.edu

Papers by Ching Te Chiu

Object detection technology has received increasing research attention with recent developments in automation technology. Most studies in this field, however, use RGB images as input to deep-learning classifiers and rarely use depth information. In this paper, we therefore use images with both RGB and depth information as input to an object detection network. We base our network on the Faster R-CNN proposed by Shih et al., and we develop a fast and accurate object detection architecture. In addition to adding depth as input, we also adjust the type of anchor boxes to improve performance on some objects. We also discuss the impact of pooling training data with multiple region proposal networks (RPN) and regions of interest (ROI). Adding depth information improved the mAP by 8.15 percentage points, from 36.86% to 45.01%, when using the SUN RGB-D dataset with 10 classes. Optimizing the anchor boxes improved the mAP from 45.01% to 45.88%. After testing various architectures with different reduced RPNs, we find that the model of 1RRPN-2ROIP performs best. The running time is 0.123 s, which is 1.8 times faster than the 3D-SSD model.

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Convolutional neural networks can effectively learn features and predict the depth by considering different scene types. However, previous studies have not accurately predicted the depth in cases where the objects or scenes were small and the background was complex. These studies either enlarged the feature maps with bilinear up-sampling during training or did not transfer multiscale information to the end of the network, which resulted in blurred regions in the depth maps and contour loss. This paper proposes a multi-path-multi-rate feature extractor, which can effectively extract multi-scale information to make accurate depth predictions. We used the U-Net [1] architecture to obtain depth maps with high resolution, and also used the proposed multi-path-multi-rate feature extractor to transfer useful features from the encoder to the decoder. Dilated convolutions with different rates can provide different types of field-of-view information, which increases the precision of depth estimation and maintains the object contours. Finally, we conducted experiments using an indoor scene dataset (NYUv2 [2]). The results show that the proposed framework achieved an improvement of 12.9% in RMSE, 9.9% in REL, and 9.3% in log10, and it requires approximately 0.048 seconds to predict a depth map from a single image.
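
The point about dilated convolutions with different rates can be illustrated with a minimal sketch. This is not the paper's multi-path-multi-rate extractor; it is a plain 1D dilated convolution in pure Python, and the function names `dilated_conv1d` and `receptive_field`, the toy signal, and the chosen rates are all assumptions made for illustration. It only shows how the same 3-tap kernel covers a wider field of view as the rate grows.

```python
def receptive_field(kernel_size, rate):
    """Number of input samples spanned by one output of a dilated kernel."""
    return (kernel_size - 1) * rate + 1


def dilated_conv1d(signal, kernel, rate):
    """Valid-mode 1D dilated convolution (cross-correlation form).

    Kernel taps are applied to inputs that are `rate` samples apart.
    """
    span = receptive_field(len(kernel), rate)
    return [
        sum(k * signal[i + j * rate] for j, k in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]


if __name__ == "__main__":
    signal = list(range(10))   # hypothetical toy input
    kernel = [1, 1, 1]         # hypothetical 3-tap kernel
    for rate in (1, 2, 4):     # hypothetical rates, one per "path"
        print(rate, receptive_field(len(kernel), rate),
              dilated_conv1d(signal, kernel, rate))
```

In a multi-rate design, the outputs of such paths would typically be combined so that fine and coarse context are both available; how the paper combines them is not stated in the abstract.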

2021 International Conference on Computational Science and Computational Intelligence (CSCI)

[Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1992

2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021

Most face recognition algorithms achieve great performance on small poses, but they are unable to extract intact features for large-pose faces. To improve large-pose face recognition, we propose a 3D landmark-based face recognition system. We first extract RGB features from a face recognition model. Then we predict 3D landmarks and facial pose degrees via a projected 3DMM vector in the 3D landmark model. To compare test images, we compute the distance between the two RGB features and rotate the two sets of 68-point 3D landmarks to the frontal view. If both distances are smaller than a threshold, we conclude that the two images are from the same person. Compared to traditional face recognition CNN methods, the proposed method considers not only RGB features but also estimated 3D landmarks. With this information, we can achieve higher performance. We conduct experiments on large-pose face datasets: CPLFW, CFP-FP and IJB-B. The results outperform the state-of-the-art methods. We achieve recognition rate of 9...

Journal of Signal Processing Systems, 2021

With the latest development of automation technology, object detection technology has received more and more research attention. Automated object detection technology can reduce labor costs, avoid the problem of visual fatigue, and is more consistent than human observers. Therefore, as deep learning has become commonplace, it has been used for object detection with great results. However, most studies in this field use RGB images as input for deep-learning classifiers instead of RGB-D input. Depth information captures both the appearance and shape of objects and can be captured in any lighting conditions. Depth information could improve the accuracy of object identification, which could improve safety in a wide range of applications. RGB color information has also been shown to be key to successful object detection. So in this paper, we improve our object detection network by using RGB-D images as input. We use the Pruning Faster R-CNN proposed by Shih et al. as a base and design an accurate and fast architecture for object detection. In addition to adding depth as input, we add other new types of anchor boxes to improve performance on some objects. We also discuss the impact of pooling training data with multiple region proposal networks (RPN) and regions of interest (ROI). We performed experiments on the SUN RGB-D and NYUv2 datasets. The results show that after adding depth to the inputs, the mean average precision (mAP) of our architecture is 9.017% higher than the mAP of the original Faster R-CNN architecture using only RGB information as input on the SUN RGB-D dataset. Working with either dataset, the network takes only 0.123 seconds to test an RGB-D image with GPU acceleration. Adjusting the arrangement of anchor boxes improved object-detection accuracy by 1.58% when using the SUN RGB-D dataset.

2017 IEEE International Symposium on Circuits and Systems (ISCAS), 2017

In this paper, we propose a hybrid patch search process, which combines the gradient and low frequency (LF)-based patch search to further enhance the effects of the above-mentioned methods. We use the assumption of local self-similarity to limit the search area to a small window, while obtaining similar results in most cases. In the proposed framework, two different patch search methods are applied. For edge regions, we use the gradient-based patch search, whereas in smooth regions, the LF-based patch search is adopted. When the difference between two patches of the hybrid patch search is small, we further compare the gradient direction for verification. In the experimental results, compared with the SR method that uses only LF-based patch search and the SR method that uses only gradient-based patch search, our proposed method achieves higher average PSNR and SSIM values. Also, the computation for high frequency (HF) reconstruction is reduced by about half compared with the gradient-based SR method.

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017

In recent years, face recognition has become a popular topic in academia and industry. Current local methods such as the local binary pattern (LBP) and scale-invariant feature transform (SIFT) perform better than holistic methods, but their high complexity levels limit their application. In addition, SIFT-based schemes are sensitive to illumination variation. We propose an LBP edge-mapped descriptor that uses maxima of gradient magnitude (MGM) points. It can completely illustrate facial contours and has low computational complexity. Under variable lighting, experimental results show that our proposed method has a 16.5% higher recognition rate and requires 9.06 times less execution time than SIFT in the FERET database subset fc. In addition, when applied to the Extended Yale Face Database B, our method outperformed SIFT-based approaches while saving about 70.9% in execution time. Furthermore, in uncontrolled conditions, our method has a 0.82% higher recognition rate than local derivative pattern histogram sequences (LDPHS) in the Unconstrained Facial Images (UFI) database.
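
For readers unfamiliar with the local binary pattern that the descriptor builds on, the basic operator can be sketched as follows. This is only the textbook 3x3 LBP, not the paper's edge-mapped descriptor or its MGM point selection; the function names, the neighbor ordering, and the sample patch are assumptions made for illustration.

```python
# Neighbor offsets (row, col) visited clockwise from the top-left corner.
# The ordering is a convention chosen for this sketch.
NEIGHBORS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def lbp_code(patch):
    """8-bit LBP code for the center pixel of a 3x3 patch (list of 3 rows).

    A neighbor contributes a 1 bit when it is at least as bright as the center.
    """
    center = patch[1][1]
    code = 0
    for bit, (r, c) in enumerate(NEIGHBORS):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_image(image):
    """LBP codes for all interior pixels of a 2D grayscale image."""
    rows, cols = len(image), len(image[0])
    return [
        [lbp_code([row[c - 1:c + 2] for row in image[r - 1:r + 2]])
         for c in range(1, cols - 1)]
        for r in range(1, rows - 1)
    ]


if __name__ == "__main__":
    patch = [[6, 5, 2],
             [7, 6, 1],
             [9, 8, 7]]      # hypothetical gray levels
    print(lbp_code(patch))
```

Because each code depends only on comparisons with the center pixel, it is unchanged by monotonic brightness changes, which is one reason LBP-style features are attractive under variable lighting.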

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019

Object detection is an important research area in the field of computer vision. Its purpose is to find all objects in an image and recognize the class of each object. Since the development of deep learning, an increasing number of studies have applied deep learning in object detection and have achieved successful results. For object detection, there are two types of network architectures: one-stage and two-stage. This study is based on the widely used two-stage architecture called Faster R-CNN, and our goal is to improve the inference time to achieve real-time speed without losing accuracy. First, we use pruning to reduce the number of parameters and the amount of computation, which is expected to reduce accuracy as a result. Therefore, we propose a multi-feature assisted region proposal network composed of assisted multi-feature concatenation and a reduced region proposal network to improve accuracy. Assisted multi-feature concatenation combines feature maps from different convolutional layers as inputs for a reduced region proposal network. With our proposed method, the network can find regions of interest (ROIs) more accurately. Thus, it compensates for the loss of accuracy due to pruning. Finally, we use ZF-Net and VGG16 as backbones, and test the network on the PASCAL VOC 2007 dataset.

2017 IEEE International Workshop on Signal Processing Systems (SiPS), 2017

Dry and wet fingers lead to poor fingerprint quality, which degrades fingerprint recognition and matching. Recognition methods that are based on ridge, valley, minutiae or pore features are affected by skin conditions. In this paper, we propose a novel dry fingerprint detection method for images with different resolutions using ridge features. Dry fingerprints have vague pores and discontinuous, fragmented ridges. Therefore, the features that we adopt for detection are ridge continuity, ridge fragmentation and ridge/valley ratio. These features can be observed clearly under different image resolutions, so our proposed method works from 500 to 1200 dpi. We propose several ridge features and use a support vector machine (SVM) to classify fingerprints into two groups, dry and normal. The NASIC database (1200 dpi) and FVC2002 DB1 (500 dpi) are used in our experiments; the SVM classification accuracies are 99.00% and 99.09%, respectively.

2020 IEEE Workshop on Signal Processing Systems (SiPS), 2020

Semantic segmentation has been one of the most important research areas in computer vision. Both context information and spatial information are important to the segmentation performance of a convolutional neural network (CNN) model. This study focuses on methods to optimize the context information. We propose a structure consisting of multiple upsampling blocks and concatenated Dense Block (DB), which enhances the context information in the decoder by fusing multiple high-level features. Furthermore, we use Blur Pooling [1], a better downsampling method proposed by Richard Zhang, to make the CNN more shift-invariant to an input image. We also identify a drawback of using the cross-entropy loss on semantic segmentation tasks, which can result in poor small-object-detection performance, and we boost the performance of our model by applying the soft-IoU loss function. Finally, we test our model's performance using the CamVid dataset. The experimental results show that our proposed approach is better able to integrate the context features in the upsampling path. The mIoU of our proposed model was 70.534% on the CamVid dataset, outperforming the 65.8% mIoU of FC-DenseNet67, our baseline model, obtained from the study in [2].
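
The abstract does not give the exact soft-IoU formulation, so the following is a sketch of one commonly used form, written for a single class over flattened pixels. The function name `soft_iou_loss`, the smoothing constant `eps`, and the toy inputs are assumptions. Unlike per-pixel cross entropy, the ratio is normalized by the size of the object region, so a small object is not outweighed by a large background.

```python
def soft_iou_loss(probs, targets, eps=1e-7):
    """Soft IoU loss for one class.

    probs   -- predicted foreground probabilities in [0, 1], one per pixel
    targets -- ground-truth labels (0 or 1), one per pixel
    The hard intersection and union are replaced by differentiable
    counterparts: p*t and p + t - p*t.
    """
    intersection = sum(p * t for p, t in zip(probs, targets))
    union = sum(p + t - p * t for p, t in zip(probs, targets))
    return 1.0 - (intersection + eps) / (union + eps)


if __name__ == "__main__":
    # Hypothetical 4-pixel example: one foreground pixel, three background.
    targets = [1, 0, 0, 0]
    print(soft_iou_loss([0.9, 0.1, 0.1, 0.1], targets))  # mostly right
    print(soft_iou_loss([0.1, 0.1, 0.1, 0.1], targets))  # misses the object
```

In a multi-class setting this quantity would usually be computed per class and averaged; whether the paper does so is not stated in the abstract.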

2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2020

Knowledge distillation is one approach in deep neural networks (DNN) to compress the huge number of parameters and high level of computation associated with a teacher model into a smaller student model. The smaller model can then be deployed in embedded systems. Most knowledge distillation methods transfer information at the last stage of the DNN model. We propose an efficient compression method that can be split into three parts. First, we propose a cross-layer Gramian matrix to extract more features from the teacher's model. Second, we adopt Kullback-Leibler (KL) divergence in an offline deep mutual learning (DML) environment to make the student model find a wider robust minimum. Finally, we propose the use of offline ensemble pre-trained teachers to teach a student model. With ResNet-32 as the teacher's model and ResNet-8 as the student's model, experimental results showed that Top-1 accuracy increased by 4.38% with a 6.11x compression rate and 5.27x computation rate.
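
The KL-divergence term mentioned above can be sketched in a few lines. This is a generic distillation-style computation, not the paper's offline DML procedure or its cross-layer Gramian loss. The temperature parameter follows the usual knowledge-distillation convention and is an assumption here, as are the function names and the example logits.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    teacher_logits = [4.0, 1.0, 0.5]   # hypothetical teacher outputs
    student_logits = [2.5, 1.5, 0.2]   # hypothetical student outputs
    t = 4.0                            # hypothetical temperature
    p_teacher = softmax(teacher_logits, t)
    p_student = softmax(student_logits, t)
    # Loss that pulls the student's distribution toward the teacher's.
    print(kl_divergence(p_teacher, p_student))
```

The divergence is zero only when the two distributions match, so minimizing it with respect to the student's parameters transfers the teacher's relative class preferences, not just its top prediction.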

2021 IEEE International Symposium on Circuits and Systems (ISCAS), 2021

Deep convolutional neural networks (DCNNs) have been widely used in computer vision tasks. However, for edge devices, even inference involves too much computational complexity and data access. Because of these shortcomings, the inference latency of state-of-the-art models is still impractical for real-world applications. In this paper, we propose a high-utilization, energy-aware, real-time inference deep convolutional neural network accelerator, which outperforms current accelerators. First, we use 1x1 convolution kernels as the smallest unit of the computing unit, and we design a suitable computing unit for each model based on its requirements. Second, we use a Reuse Feature SRAM to store the output of the current layer on the chip and use it as the input of the next layer. Moreover, we introduce an Output Reuse Strategy and a Ring Stream Data flow, not only to increase the reuse rate of data on the chip but also to reduce the amount of data exchanged between the chip and DRAM. Finally, we present an On-fly Pooling Module so that the calculation of the pooling layer is completed directly on the chip. With the aid of the proposed method, the implemented CNN acceleration chip has an extremely high hardware utilization rate. We substantially reduce the amount of data transfer on the specific module, ECNN [1]. Compared to methods without a reuse strategy, we reduce the amount of data access by a factor of 533. At the same time, we have enough computing power to perform real-time execution of existing image classification models, VGG16 [2] and MobileNet [3]. Compared with the design in [4], we achieve a 7.52x speedup and 1.92x energy efficiency.

Journal of Systems Architecture, 2020

Journal of Systems Architecture, 2019

Deep neural networks are powerful, but using these networks is both memory and time consuming due to their numerous parameters and large amounts of computation. Many studies have been conducted on compressing the models at the parameter level as well as at the bit level. Here, we propose an efficient strategy to compress the layers that are computation or memory consuming. We compress the model by introducing global average pooling, performing iterative pruning on the filters with the proposed order-deciding scheme in order to prune more efficiently, applying truncated SVD to the fully-connected layer, and performing quantization. Experiments on the VGG16 model show that our approach achieves a 60.9× compression ratio in off-line storage with about 0.848% and 0.1378% loss of accuracy in the top-1 and top-5 classification results, respectively, on the validation dataset of ILSVRC2012. Our approach also shows good compression results on AlexNet and Faster R-CNN.
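
To see why truncated SVD shrinks a fully-connected layer, a short back-of-the-envelope calculation helps. Factoring an m x n weight matrix into rank-k factors replaces m*n stored values with k*(m + n), assuming the singular values are folded into one factor. The layer size and rank below are hypothetical and are not the settings used in the paper.

```python
def dense_params(m, n):
    """Weights in an m x n fully-connected layer (bias ignored)."""
    return m * n


def truncated_svd_params(m, n, k):
    """Weights after a rank-k factorization W ~ A @ B, with A (m x k) and B (k x n)."""
    return k * (m + n)


def compression_ratio(m, n, k):
    """How many times fewer weights the factorized layer stores."""
    return dense_params(m, n) / truncated_svd_params(m, n, k)


if __name__ == "__main__":
    m, n, k = 4096, 4096, 256   # hypothetical layer size and rank
    print(dense_params(m, n), truncated_svd_params(m, n, k),
          compression_ratio(m, n, k))
```

The factorization only pays off when k is below m*n/(m + n); the accuracy cost of a given rank has to be measured empirically and cannot be read off from this count.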

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015

Scale-invariant feature transform (SIFT) is a feature-point-based method using the orientation descriptor for pattern recognition. It is robust to changes in scale and rotation, but the computation cost increases with the number of feature points. Local binary pattern (LBP) is a pixel-based texture extraction method that achieves a high face recognition rate with low computation time. We propose a new descriptor that combines the LBP texture and SIFT orientation information to improve the recognition rate using a limited number of interest points. By adding the LBP texture information, we can reduce the SIFT orientation number in the descriptor by half. Therefore, we can reduce the computation time while keeping the recognition rate. In addition, we propose a matching method to retain the effective matching pairs and calculate the similarity between two images. By combining these two methods, we can extract different face details effectively and further reduce computational cost. We also propose an approach using the region of interest (ROI) to remove useless interest points, saving computation time while maintaining the recognition rate. Experimental results demonstrate that our proposed LBP orientation descriptor reduces computation time by around 30% compared with the original SIFT descriptor while maintaining the recognition rate on the FERET database. Adding the ROI to our proposed LBP orientation descriptor reduces computation time by around 58% compared with the original SIFT descriptor on the FERET database. For the Extended YaleB database, our method has a 1.2% higher recognition rate than the original SIFT method and reduces computation time by 28.6%. Adding the ROI reduces computation time by 61.9% on the YaleB database.

Multimedia and Expo, 2007 IEEE International Conference on, 2007

To improve memory access efficiency and to reduce power consumption in HDTV video decoders, we propose a novel memory address mapping method and an efficient memory accessing architecture. The memory address mapping enables computation-free memory address generation from the logical address of the data word in a video frame. The simple address generation is achieved by combining neighboring macroblocks into groups and storing each group of macroblocks in the same row of the external memory. By grouping suitable macroblocks, depending on interlaced or progressive scanning, we significantly reduce cross-row memory accesses in the external memory, which are both time consuming and power consuming. In the memory accessing architecture, we rearrange the access order of luminance and chrominance data in motion compensation to further reduce the number of row changes in the external memory. Our analysis shows that the number of row changes is reduced by 87.67% and the throughput of our memory-accessing scheme is improved by 30.91% compared to conventional approaches.