Linlin Shen | Shenzhen University
Papers by Linlin Shen
Mathematics
Precise vertebrae segmentation is essential for the image-related analysis of spine pathologies such as vertebral compression fractures and other abnormalities, as well as for clinical diagnostic treatment and surgical planning. An automatic and objective system for vertebra segmentation is required, but its development is likely to run into difficulties such as low segmentation accuracy and the requirement of prior knowledge or human intervention. Recently, vertebral segmentation methods have focused on deep learning-based techniques. To mitigate the challenges involved, we propose deep learning primitives and stacked Sparse autoencoder-based patch classification modeling for Vertebrae segmentation (SVseg) from Computed Tomography (CT) images. After data preprocessing, we extract overlapping patches from CT images as input to train the model. The stacked sparse autoencoder learns high-level features from unlabeled image patches in an unsupervised way. Furthermore, we employ supervi...
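The patch-based pipeline above begins by sliding an overlapping window over each CT slice. A minimal sketch of that extraction step follows; the patch size and stride are illustrative assumptions, not the paper's actual settings.

```python
# Sketch of overlapping patch extraction from a 2D CT slice.
# Patch size and stride below are hypothetical, chosen for illustration.

def extract_patches(image, patch=4, stride=2):
    """Slide a patch x patch window over a 2D slice with the given stride."""
    h, w = len(image), len(image[0])
    patches = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            patches.append([row[x:x + patch] for row in image[y:y + patch]])
    return patches

# A toy 8x8 "slice": 3 window positions per axis -> 9 overlapping patches.
slice_8x8 = [[y * 8 + x for x in range(8)] for y in range(8)]
patches = extract_patches(slice_8x8)
print(len(patches))  # 9
```

Each patch would then be flattened and fed to the stacked sparse autoencoder for unsupervised feature learning.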
Procedings of the British Machine Vision Conference 2005, 2005
A discriminative and robust feature, the Kernel enhanced informative Gabor feature, is proposed in this paper for face recognition. Mutual information is applied to select a set of informative and non-redundant Gabor features, which are then further enhanced by kernel methods for recognition. When compared with an approach using the downsampled Gabor features, our method introduces advantages in computation, memory cost and accuracy. The proposed method has also been fully tested on the FERET database according to the evaluation protocol, and significant improvements on the test set are observed. Compared with the classical Gabor feature extraction approach using a complex convolution process, our method requires less than 4 ms to retrieve a few hundred features. Due to the substantially reduced feature dimension, only 4 seconds are required to recognize 200 face images.
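The selection step above can be sketched as ranking features by mutual information with the labels and then discarding candidates redundant with already-kept features. The toy discrete features (standing in for quantized Gabor responses) and the redundancy threshold below are assumptions for illustration, not the paper's exact criterion.

```python
# Hedged sketch of mutual-information-based selection of informative,
# non-redundant features. Toy data and the 0.5 nat threshold are made up.
import math
from collections import Counter

def mutual_info(xs, ys):
    """MI (in nats) between two discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

labels   = [0, 0, 0, 1, 1, 1]
features = {"f_useful":    [0, 0, 0, 1, 1, 1],   # tracks the label
            "f_noise":     [0, 1, 0, 1, 0, 1],   # nearly independent of it
            "f_redundant": [0, 0, 0, 1, 1, 1]}   # duplicates f_useful

# Rank by informativeness, then drop features redundant with kept ones.
ranked = sorted(features, key=lambda f: mutual_info(features[f], labels),
                reverse=True)
kept = []
for f in ranked:
    if all(mutual_info(features[f], features[g]) < 0.5 for g in kept):
        kept.append(f)
print(kept)  # ['f_useful', 'f_noise'] -- the redundant copy is dropped
```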
IEEE Transactions on Image Processing, 2021
Though widely used in image classification, convolutional neural networks (CNNs) are prone to noise interruptions, i.e., the CNN output can be drastically changed by small image noise. To improve noise robustness, we integrate CNNs with wavelets by replacing the common down-sampling operations (max-pooling, strided convolution, and average pooling) with the discrete wavelet transform (DWT). We first propose general DWT and inverse DWT (IDWT) layers applicable to various orthogonal and biorthogonal discrete wavelets such as Haar, Daubechies, and Cohen wavelets, and then design wavelet integrated CNNs (WaveCNets) by integrating DWT into commonly used CNNs (VGG, ResNets, and DenseNet). During down-sampling, WaveCNets apply DWT to decompose the feature maps into low-frequency and high-frequency components. Containing the main information, including the basic object structures, the low-frequency component is transmitted into the following layers to generate robust high-level features. The high-frequency components are dropped to remove most of the data noise. The experimental results show that WaveCNets achieve higher accuracy on ImageNet than various vanilla CNNs. We have also tested the performance of WaveCNets on the noisy version of ImageNet, ImageNet-C, and six adversarial attacks; the results suggest that the proposed DWT/IDWT layers could provide better noise robustness and adversarial robustness. When applying WaveCNets as backbones, the performance of object detectors (i.e., Faster R-CNN and RetinaNet) on the COCO detection dataset is consistently improved. We believe that suppression of the aliasing effect, i.e., separation of low-frequency and high-frequency information, is the main advantage of our approach. The code of our DWT/IDWT layers and the different WaveCNets is available at https://github.com/CVI-SZU/WaveCNet.
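The DWT-based down-sampling can be illustrated with the simplest case, a single-level 2D Haar transform: the low-frequency (LL) sub-band is, up to scale, the average of each 2x2 block, so keeping only LL halves each spatial dimension while preserving coarse structure. A pure-Python sketch of that one case, not the paper's general multi-wavelet layer:

```python
# Single-level 2D Haar DWT, low-frequency (LL) sub-band only.
# For the Haar wavelet, LL[y][x] = (sum of the 2x2 block) / 2.

def haar_lowpass(fmap):
    """Return the LL sub-band of a 2D Haar DWT of an even-sized map."""
    h, w = len(fmap), len(fmap[0])
    return [[(fmap[y][x] + fmap[y][x + 1]
              + fmap[y + 1][x] + fmap[y + 1][x + 1]) / 2.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

fmap = [[1, 1, 5, 5],
        [1, 1, 5, 5],
        [2, 2, 8, 8],
        [2, 2, 8, 8]]
ll = haar_lowpass(fmap)
print(ll)  # [[2.0, 10.0], [4.0, 16.0]] -- half resolution, structure kept
```

The discarded HL, LH, and HH sub-bands would hold the high-frequency detail that carries most of the noise.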
International journal of biomedical soft computing and human sciences, 2009
A Support Vector Machine (SVM) face identification method using optimized Gabor features is presented in this paper. 200 Gabor features are first selected by a boosting algorithm, which are then combined with SVM to build a two-class based face recognition system. While the computation and memory cost of the Gabor feature extraction process has been significantly reduced, our method has achieved the same accuracy as a Gabor feature and LDA based multi-class system.
International journal of biomedical soft computing and human sciences, 2009
(The paper was received on Jan. 2, 2008.) Abstract: A new fingerprint recognition approach based on features extracted from the wavelet domain is presented. The 64-subband structure proposed by the FBI WSQ standard is used to decompose the frequency of the image. The efficiency of the method is achieved by using the k-nearest neighbor (k-NN) classifier. The result is compared with other image-based methods. For compressed fingerprint images, the proposed method can achieve much lower computational effort.
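The classification stage described above can be sketched with a plain k-nearest-neighbor vote over per-subband energy vectors. The feature values, vector length, and k below are made up for illustration and are not the paper's settings.

```python
# Hypothetical k-NN classification over toy subband-energy feature vectors.

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label); majority vote among k nearest."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train, key=lambda t: dist(t[0], query))[:k]
    labels = [lbl for _, lbl in nearest]
    return max(set(labels), key=labels.count)

# Toy 3-subband energy vectors for two enrolled identities.
train = [([0.90, 0.10, 0.20], "id_A"), ([0.80, 0.20, 0.10], "id_A"),
         ([0.10, 0.90, 0.70], "id_B"), ([0.20, 0.80, 0.90], "id_B"),
         ([0.85, 0.15, 0.20], "id_A")]
print(knn_predict(train, [0.15, 0.85, 0.80]))  # id_B
```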
PRICAI 2019: Trends in Artificial Intelligence, 2019
A human face image contains abundant information, including expression, age and gender, etc. Therefore, extracting a discriminative feature for a certain attribute while expelling the others is critical for single facial attribute analysis. In this paper, we propose an adversarial facial expression recognition system, named expression distilling and dispelling learning (ED²L), to extract discriminative expression features from a given face image. The proposed ED²L framework is composed of two branches, i.e., the expression distilling branch ED²L-t and the expression dispelling branch ED²L-p. The ED²L-t branch aims to extract the expression-related feature, while the ED²L-p branch extracts the non-related feature. The disentangled features jointly serve as a complete representation of the face. Extensive experiments on several benchmark databases, i.e., CK+, MMI, BU-3DFE and Oulu-CASIA, demonstrate the effectiveness of the proposed ED²L framework.
Biometric Recognition, 2017
Hyperspectral palmprint contains various information in the joint spatial-spectral domain. One crucial task in hyperspectral palmprint recognition is how to extract spatial-spectral features. Since a hyperspectral palmprint is three dimensional, most of the existing 2D based algorithms, such as the collaborative representation (CR) based framework [15], may not fully explore the information in the spectral domain. Although the 3D Gabor filter [18] can be utilized to encode the information in the joint spatial-spectral domain, the texture direction information, such as the surface map, may not be explored sufficiently. In this work, a novel response-competition (ResCom) feature is proposed to represent the spectral information of hyperspectral palmprint based on 3D Gabor filters. Incorporated with the 2D surface map, the ResCom feature can encode not only the 2D texture but also the 3D response variation. Therefore, features of hyperspectral palmprint can be extracted efficiently in the joint spatial-spectral domain. By fusing Block-wise and ResCom features, the proposed approach achieves so far the highest recognition rate of 99.43% on the public hyperspectral palmprint database.
ArXiv, 2021
3D neuron segmentation is a key step for neuron digital reconstruction, which is essential for exploring brain circuits and understanding brain functions. However, the fine line-shaped nerve fibers of a neuron can spread over a large region, which brings great computational cost to segmentation in 3D neuronal images. Meanwhile, the strong noise and disconnected nerve fibers in the image bring great challenges to the task. In this paper, we propose a 3D wavelet and deep learning based 3D neuron segmentation method. The neuronal image is first partitioned into neuronal cubes to simplify the segmentation task. Then, we design 3D WaveUNet, the first 3D wavelet integrated encoder-decoder network, to segment the nerve fibers in the cubes; the wavelets could assist the deep networks in suppressing data noise and connecting the broken fibers. We also produce a Neuronal Cube Dataset (NeuCuDa) using the biggest available annotated neuronal image dataset, BigNeuron, to train 3D WaveUNet...
In the context of supervised tensor learning, preserving the structural information and exploiting the discriminative nonlinear relationships of tensor data are crucial for improving the performance of learning tasks. Based on tensor factorization theory and kernel methods, we propose a novel Kernelized Support Tensor Machine (KSTM) which integrates kernelized tensor factorization with a maximum-margin criterion. Specifically, the kernelized factorization technique is introduced to approximate the tensor data in kernel space such that the complex nonlinear relationships within tensor data can be explored. Further, dual structural preserving kernels are devised to learn the nonlinear boundary between tensor data. As a result of joint optimization, the kernels obtained in KSTM exhibit better generalization power for discriminative analysis. The experimental results on real-world neuroimaging datasets show the superiority of KSTM over the state-of-the-art techniques.
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
Owing to its prominence as a diagnostic tool for probing the neural correlates of cognition, neuroimaging tensor data has been the focus of intense investigation. Although many supervised tensor learning approaches have been proposed, they either cannot capture the nonlinear relationships of tensor data or cannot preserve the complex multi-way structural information. In this paper, we propose a Multi-way Multi-level Kernel (MMK) model that can extract discriminative, nonlinear and structure-preserving representations of tensor data. Specifically, we introduce a kernelized CP tensor factorization technique, which is equivalent to performing the low-rank tensor factorization in a possibly much higher dimensional space that is implicitly defined by the kernel function. We further employ a multi-way nonlinear feature mapping to derive the dual structural preserving kernels, which are used in conjunction with kernel machines (e.g., SVM). Extensive experiments on real-world neuroimages demonstrate that the proposed MMK method can effectively boost the classification performance on diverse brain disorders (i.e., Alzheimer's disease, ADHD, and HIV).
2019 IEEE International Conference on Image Processing (ICIP), 2019
A facial expression image can be considered as the addition of an expressive component to a neutral expression face. With this in mind, in this paper, we propose a novel end-to-end adversarial disentangled feature learning (ADFL) framework for facial expression recognition. The ADFL framework is mainly composed of three branches: the expression disentangling branch ADFL-d, the neutral expression branch ADFL-n and the residual expression branch ADFL-r. The ADFL-d and ADFL-n branches aim to extract the expressive component and the neutral component, respectively. The ADFL-r branch extracts the residual expression by calculating the difference between the feature maps of ADFL-d and ADFL-n, and uses the residual expression feature for expression classification. Experimental results on several benchmark databases (CK+, MMI and Oulu-CASIA) show that the proposed method has remarkable performance compared to state-of-the-art methods.
IEEE Transactions on Multimedia, 2020
As 2D and 3D data present different views of the same face, the features extracted from them can be both complementary and redundant. In this paper, we present a novel and efficient orthogonalization-guided feature fusion network, namely OGF²Net, to fuse the features extracted from 2D and 3D faces for facial expression recognition. While 2D texture maps are fed into a 2D feature extraction pipeline (FE2DNet), the attribute maps generated from 3D data are concatenated as input of the 3D feature extraction pipeline (FE3DNet). The two networks are separately trained in the first stage and frozen in the second stage for late feature fusion, which can well address the unavailability of a large number of 3D+2D face pairs. To reduce the redundancies among features extracted from the 2D and 3D streams, we design an orthogonal loss-guided feature fusion network to orthogonalize the features before fusing them. Experimental results show that the proposed method significantly outperforms the state-of-the-art algorithms on both the BU-3DFE and Bosphorus databases. While accuracies as high as 89.05% (P1 protocol) and 89.07% (P2 protocol) are achieved on the BU-3DFE database, an accuracy of 89.28% is achieved on the Bosphorus database. The complexity analysis also suggests that our approach achieves a higher processing speed while simultaneously requiring lower memory costs.
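The orthogonalization idea can be sketched as a penalty on the squared cosine similarity between a 2D feature vector and a 3D feature vector, driving the two streams toward carrying non-redundant information. This is an assumed form for illustration; the paper's exact loss may differ.

```python
# Hedged sketch of an orthogonality penalty between 2D and 3D features:
# 0 when the vectors are orthogonal (fully complementary),
# 1 when they are collinear (fully redundant).
import math

def orthogonal_loss(f2d, f3d):
    dot = sum(a * b for a, b in zip(f2d, f3d))
    n2d = math.sqrt(sum(a * a for a in f2d))
    n3d = math.sqrt(sum(b * b for b in f3d))
    return (dot / (n2d * n3d)) ** 2  # squared cosine similarity

print(orthogonal_loss([1.0, 0.0], [0.0, 1.0]))  # 0.0 -> complementary
print(orthogonal_loss([1.0, 0.0], [2.0, 0.0]))  # 1.0 -> redundant
```

Minimizing such a term alongside the classification loss would push the fused representation toward complementarity before late fusion.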
IEEE Transactions on Cybernetics, 2019
Due to the importance of facial expressions in human-machine interaction, a number of handcrafted features and deep neural networks have been developed for facial expression recognition. While a few studies have shown the similarity between handcrafted features and the features learned by deep networks, a new feature loss is proposed here that uses a feature bias constraint between handcrafted and deep features to guide deep feature learning during the early training of the network. The feature maps learned with and without the proposed feature loss for a toy network suggest that our approach can fully explore the complementarity between handcrafted features and deep features. Based on the feature loss, a general framework for embedding the traditional feature information into deep network training was developed and tested using the FER2013, CK+, Oulu-CASIA, and MMI datasets. Moreover, adaptive loss weighting strategies are proposed to balance the influence of different losses for different expression databases. The experimental results show that the proposed feature loss with adaptive weighting achieves much better accuracy than the original handcrafted feature and the network trained without our feature loss. Meanwhile, the feature loss with adaptive weighting can provide complementary information to compensate for the deficiency of a single feature.
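One way to picture the feature-bias idea: an auxiliary loss pulls the early deep features toward a handcrafted target, with a weight that decays over training so the constraint mainly guides the early epochs. The function name, the MSE form, and the decay schedule below are assumptions for illustration, not the paper's actual formulation.

```python
# Illustrative sketch of a decaying feature-bias loss term.
# All names and the decay schedule are hypothetical.

def feature_loss(deep_feat, hand_feat, epoch, decay=0.9):
    weight = decay ** epoch  # adaptive weighting: strong early, fades later
    mse = sum((d - h) ** 2 for d, h in zip(deep_feat, hand_feat)) / len(deep_feat)
    return weight * mse

# The constraint weakens as training progresses:
print(feature_loss([2.0, 0.0], [0.0, 0.0], epoch=0))   # 2.0
print(feature_loss([2.0, 0.0], [0.0, 0.0], epoch=10))  # ~0.697
```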
Journal of Instrumentation, 2018
In the search for neutrinoless double-beta decay, the high-pressure gaseous Time Projection Chamber has a distinct advantage, because the ionization charge tracks produced by particle interactions are extended and the detector captures the full three-dimensional charge distribution with appropriate charge readout systems. Such track information provides a crucial extra handle for discriminating signal events against backgrounds. In this paper, we constructed a toy model to demonstrate where the discrimination power comes from and how much of it the neural network models have already harnessed. Then we adapted 3-dimensional convolutional and residual neural networks to the simulated double-beta and background charge tracks and tested their capabilities in classifying these two types of events. We show that both the 3D structure and the overall depth of the neural networks significantly improve the accuracy of the classifier and lead to results better than previous works. We also studied their performance under various spatial granularities as well as different diffusion and noise conditions. The results indicate that the methods are stable and generalize well despite varying experimental conditions. Keywords: Analysis and statistical methods; Pattern recognition, cluster finding, calibration and fitting methods; Double-beta decay detectors; Time projection chambers. ArXiv ePrint: 1803.01482
IEEE Transactions on Circuits and Systems for Video Technology, 2019
IEEE Access, 2019
Deep neural networks (DNNs) have been widely applied to the automatic analysis of medical images for disease diagnosis, helping human experts by efficiently processing immense amounts of images. While handcrafted features have been used for eye disease detection or classification since the 1990s, DNNs were recently adopted in this area and showed very promising performance. Since handcrafted and deep features can extract complementary information, we propose, in this paper, three different integration frameworks to combine handcrafted and deep features for optical coherence tomography image-based eye disease classification. In addition to integrating the handcrafted features at the input and fully connected layers using existing networks, such as VGG, DenseNet, and Xception, a novel ribcage network (RC Net) is also proposed for feature integration at the middle layers. For RC Net, two "rib" channels are designed to independently process deep and handcrafted features, and another so-called "spine" channel is designed for the integration. While dense blocks are the main components of the three channels, a sum operation is proposed for the feature map integration. Our experimental results showed that the deep networks achieved better classification accuracy after the integration of the handcrafted features, e.g., scale-invariant feature transform and Gabor features. RC Net showed the best performance among all the proposed feature integration methods. Index terms: Artificial intelligence, deep learning, optical coherence tomography, feature integration.
Sensors, 2018
Skin lesions are a severe global health problem. Early detection of melanoma in dermoscopy images significantly increases the survival rate. However, the accurate recognition of melanoma is extremely challenging due to factors such as the low contrast between lesions and skin and the visual similarity between melanoma and non-melanoma lesions. Hence, reliable automatic detection of skin tumors is very useful to increase the accuracy and efficiency of pathologists. In this paper, we propose two deep learning methods to address three main tasks emerging in the area of skin lesion image processing, i.e., lesion segmentation (task 1), lesion dermoscopic feature extraction (task 2) and lesion classification (task 3). A deep learning framework consisting of two fully convolutional residual networks (FCRN) is proposed to simultaneously produce the segmentation result and the coarse classification result. A lesion index calculation unit (LICU) is developed to refine the coarse classification results by calculating a distance heat-map. A straightforward CNN is proposed for the dermoscopic feature extraction task. The proposed deep learning frameworks were evaluated on the ISIC 2017 dataset. Experimental results show the promising accuracies of our frameworks: 0.753 for task 1, 0.848 for task 2 and 0.912 for task 3.
IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2017
Kernelized Correlation Filter (KCF) is one of the state-of-the-art object trackers. However, it does not reasonably model the distribution of the correlation response during the tracking process, which might cause the drifting problem, especially when targets undergo significant appearance changes due to occlusion, camera shaking, and/or deformation. In this paper, we propose an Output Constraint Transfer (OCT) method that, by modeling the distribution of the correlation response in a Bayesian optimization framework, is able to mitigate the drifting problem. OCT builds upon the reasonable assumption that the correlation response to the target image follows a Gaussian distribution, which we exploit to select training samples and reduce model uncertainty. OCT is rooted in a new theory which transfers the data distribution to a constraint on the optimized variable, leading to an efficient framework to calculate correlation filters. Extensive experiments on a commonly used tracking benchmark show that the proposed method significantly improves KCF and achieves better performance than other state-of-the-art trackers. To encourage further developments, the source code is made available at https://github.com/bczhangbczhang/OCT-KCF
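The Gaussian-response assumption can be pictured as a simple reliability test: fit a Gaussian to recent peak responses and treat a new response that falls far from the mean as unreliable (e.g., skip the model update). The threshold and numbers below are illustrative, not the OCT method's actual procedure.

```python
# Hedged sketch of Gaussian-based training-sample selection.
# The 2-sigma threshold and the peak values are made up for illustration.
import math

def is_reliable(history, response, k=2.0):
    """True if response lies within k standard deviations of the history."""
    mu = sum(history) / len(history)
    sigma = math.sqrt(sum((r - mu) ** 2 for r in history) / len(history))
    return abs(response - mu) <= k * sigma

peaks = [0.82, 0.85, 0.80, 0.83, 0.84, 0.81]  # stable tracking so far
print(is_reliable(peaks, 0.82))  # True  -> accept the sample, update filter
print(is_reliable(peaks, 0.40))  # False -> likely occlusion, skip update
```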
Mathematics
Precise vertebrae segmentation is essential for the image-related analysis of spine pathologies s... more Precise vertebrae segmentation is essential for the image-related analysis of spine pathologies such as vertebral compression fractures and other abnormalities, as well as for clinical diagnostic treatment and surgical planning. An automatic and objective system for vertebra segmentation is required, but its development is likely to run into difficulties such as low segmentation accuracy and the requirement of prior knowledge or human intervention. Recently, vertebral segmentation methods have focused on deep learning-based techniques. To mitigate the challenges involved, we propose deep learning primitives and stacked Sparse autoencoder-based patch classification modeling for Vertebrae segmentation (SVseg) from Computed Tomography (CT) images. After data preprocessing, we extract overlapping patches from CT images as input to train the model. The stacked sparse autoencoder learns high-level features from unlabeled image patches in an unsupervised way. Furthermore, we employ supervi...
Procedings of the British Machine Vision Conference 2005, 2005
A discriminative and robust feature-Kernel enhanced informative Gabor feature is proposed in this... more A discriminative and robust feature-Kernel enhanced informative Gabor feature is proposed in this paper for face recognition. Mutual information is applied to select a set of informative and non-redundant Gabor features, which are then further enhanced by Kernel methods for recognition. When compared with an approach using the downsampled Gabor features, our methods introduce advantages on computation, memory cost and accuracy. The proposed method has also been fully tested on the FERET database according to the evaluation protocol, significant improvements on the test set is observed. Compared with the classical Gabor feature extraction approach using complex convolution process, our method requires less than 4ms to retrieve a few hundreds of features. Due to the substantially reduced feature dimension, only 4 seconds are required to recognize 200 face images.
IEEE Transactions on Image Processing, 2021
Though widely used in image classification, convolutional neural networks (CNNs) are prone to noi... more Though widely used in image classification, convolutional neural networks (CNNs) are prone to noise interruptions, i.e. the CNN output can be drastically changed by small image noise. To improve the noise robustness, we try to integrate CNNs with wavelet by replacing the common down-sampling (maxpooling, strided-convolution, and average pooling) with discrete wavelet transform (DWT). We firstly propose general DWT and inverse DWT (IDWT) layers applicable to various orthogonal and biorthogonal discrete wavelets like Haar, Daubechies, and Cohen, etc., and then design wavelet integrated CNNs (WaveCNets) by integrating DWT into the commonly used CNNs (VGG, ResNets, and DenseNet). During the down-sampling, WaveCNets apply DWT to decompose the feature maps into the low-frequency and high-frequency components. Containing the main information including the basic object structures, the low-frequency component is transmitted into the following layers to generate robust high-level features. The high-frequency components are dropped to remove most of the data noises. The experimental results show that WaveCNets achieve higher accuracy on ImageNet than various vanilla CNNs. We have also tested the performance of WaveCNets on the noisy version of ImageNet, ImageNet-C and six adversarial attacks, the results suggest that the proposed DWT/IDWT layers could provide better noise-robustness and adversarial robustness. When applying WaveCNets as backbones, the performance of object detectors (i.e., faster R-CNN and RetinaNet) on COCO detection dataset are consistently improved. We believe that suppression of aliasing effect, i.e. separation of low frequency and high frequency information, is the main advantages of our approach. The code of our DWT/IDWT layer and different WaveCNets are available at https://github.com/CVI-SZU/WaveCNet.
International journal of biomedical soft computing and human sciences, 2009
A Shrlrport Vbctor MZichine (SVAny foce identij7cation neethod using optirnized Gabor.features is... more A Shrlrport Vbctor MZichine (SVAny foce identij7cation neethod using optirnized Gabor.features is presented in this papen 200 Gabor .12iatures are ,first selected Lly a boosting aigorithm, which are then combined with SVM to build a two-class basedfoce recognition optstem. Vvaiile computation and men;oiy cost ofthe Gaborfaature extraction process has been signij7cantly redueed our method has achieved the same accuracy as a Gaborfaature andLDA based multi-ctass system.
International journal of biomedical soft computing and human sciences, 2009
The paper was received on Jan. 2, 20e8.) Abstraet: A new .fingei:print recognition opproach based... more The paper was received on Jan. 2, 20e8.) Abstraet: A new .fingei:print recognition opproach based on faatztres extracted.from the wcn,eiet domain is presented fVie 64-subband structure proposed by the EBI wse stan`lard is used to decompose the.fi'eguency ofthe imcrge. T7ie E177ciency ofthe method is achieved by using the k;・nearest neighbor (kLArlNl} classij7er. 7?ie resutt is compared with ether image-based methods, Fbr compressed.fingenprint images, this proposed rnethodcan achieve much lower eomputationat q07orts.
PRICAI 2019: Trends in Artificial Intelligence, 2019
Human face image contains abundant information including expression, age and gender, etc. Therefo... more Human face image contains abundant information including expression, age and gender, etc. Therefore, extracting discriminative feature for certain attribute while expelling others is critical for single facial attribute analysis. In this paper, we propose an adversarial facial expression recognition system, named expression distilling and dispelling learning (ED 2 L), to extract discriminative expression feature from a given face image. The proposed ED 2 L framework composed of two branches, i.e. expression distilling branch ED 2 L-t and expression dispelling branch ED 2 L-p. The ED 2 L-t branch aims to extract the expression-related feature, while the ED 2 L-p branch extracts the non-related feature. The disentangled features jointly serve as a complete representation of the face. Extensive experiments on several benchmark databases, i.e. the CK+, MMI, BU-3DFE and Oulu-CASIA, demonstrate the effectiveness of the proposed ED 2 L framework.
Biometric Recognition, 2017
Hyperspectral palmprint contains various information in the joint spatial-spectral domain. One cr... more Hyperspectral palmprint contains various information in the joint spatial-spectral domain. One crucial task in hyperspectral palmprint recognition is how to extract spatial-spectral features. Since hyperspectral palmprint is three dimensional, most of the existing 2D based algorithms, such as collaborative representation (CR) based framework [15], may not fully explore the information on the spectral domain. Although 3D Gabor filter [18] can be utilized to encode the information on the joint spatial-spectral domain, the texture direction information such as the surface map may not be explored sufficiently. In this work, a novel response-competition (ResCom) feature is proposed to present the spectral information of hyperspectral palmprint based on 3D Gabor filters. Incorporated with the 2D surface map, the ResCom feature can encode not only the 2D texture but also the 3D response variation. Therefore, features of hyperspectral palmprint will be extracted efficiently on the joint spatial-spectral domain. By fusing Block-wise and ResCom features, the proposed approach achieves so far the highest recognition rate of 99.43% on the public hyperspectral palmprint database.
A Shrlrport Vbctor MZichine (SVAny foce identij7cation eethod using optirnized Gabor.features is ... more A Shrlrport Vbctor MZichine (SVAny foce identij7cation eethod using optirnized Gabor.features is presented in this papen 200 Gabor .12iatures are ,first selected Lly a boosting aigorithm, which are then combined with SVM to build a two-class basedfoce recognition optstem. Vvaiile computation and men;oiy cost ofthe Gaborfaature extraction process has been signij7cantly redueed our method has achieved the same accuracy as a Gaborfaature andLDA based multi-ctass system. KleyivoJzis Gahor.features, Support Vbctor MZichine, hace identijication,
ArXiv, 2021
3D neuron segmentation is a key step for the neuron digital reconstruction, which is essential fo... more 3D neuron segmentation is a key step for the neuron digital reconstruction, which is essential for exploring brain circuits and understanding brain functions. However, the fine line-shaped nerve fibers of neuron could spread in a large region, which brings great computational cost to the segmentation in 3D neuronal images. Meanwhile, the strong noises and disconnected nerve fibers in the image bring great challenges to the task. In this paper, we propose a 3D wavelet and deep learning based 3D neuron segmentation method. The neuronal image is first partitioned into neuronal cubes to simplify the segmentation task. Then, we design 3D WaveUNet, the first 3D wavelet integrated encoder-decoder network, to segment the nerve fibers in the cubes; the wavelets could assist the deep networks in suppressing data noise and connecting the broken fibers. We also produce a Neuronal Cube Dataset (NeuCuDa) using the biggest available annotated neuronal image dataset, BigNeuron, to train 3D WaveUNet...
In the context of supervised tensor learning, preserving the structural information and exploiting the discriminative nonlinear relationships of tensor data are crucial for improving the performance of learning tasks. Based on tensor factorization theory and kernel methods, we propose a novel Kernelized Support Tensor Machine (KSTM) which integrates kernelized tensor factorization with a maximum-margin criterion. Specifically, the kernelized factorization technique is introduced to approximate the tensor data in kernel space such that the complex nonlinear relationships within tensor data can be explored. Further, dual structural preserving kernels are devised to learn the nonlinear boundary between tensor data. As a result of joint optimization, the kernels obtained in KSTM exhibit better generalization power in discriminative analysis. The experimental results on real-world neuroimaging datasets show the superiority of KSTM over the state-of-the-art techniques.
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
Owing to its prominence as a diagnostic tool for probing the neural correlates of cognition, neuroimaging tensor data has been the focus of intense investigation. Although many supervised tensor learning approaches have been proposed, they either cannot capture the nonlinear relationships of tensor data or cannot preserve the complex multi-way structural information. In this paper, we propose a Multi-way Multi-level Kernel (MMK) model that can extract discriminative, nonlinear and structure-preserving representations of tensor data. Specifically, we introduce a kernelized CP tensor factorization technique, which is equivalent to performing the low-rank tensor factorization in a possibly much higher dimensional space that is implicitly defined by the kernel function. We further employ a multi-way nonlinear feature mapping to derive the dual structural preserving kernels, which are used in conjunction with kernel machines (e.g., SVM). Extensive experiments on real-world neuroimages demonstrate that the proposed MMK method can effectively boost the classification performance on diverse brain disorders (i.e., Alzheimer's disease, ADHD, and HIV).
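One plausible shape for a structure-preserving tensor kernel is sketched below. This is an assumed form for illustration, not the paper's exact MMK: each tensor is represented by its CP factor matrices, and similarity is the product over modes of RBF kernels between corresponding factors, so each mode's structure contributes its own similarity term instead of being flattened away.

```python
import numpy as np

def structural_kernel(factors_x, factors_y, gamma=0.1):
    """Sketch of a mode-wise tensor kernel (an assumed form): the product
    over modes of RBF kernels between corresponding CP factor matrices.
    factors_x / factors_y are lists of (I_n x R) factor matrices."""
    k = 1.0
    for ax, ay in zip(factors_x, factors_y):
        k *= np.exp(-gamma * np.sum((ax - ay) ** 2))
    return k
```

The resulting Gram matrix can then be handed to any standard kernel machine such as an SVM, which is how the dual structural preserving kernels are used in the paper.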
2019 IEEE International Conference on Image Processing (ICIP), 2019
A facial expression image can be considered as the addition of an expressive component to a neutral face. With this in mind, in this paper, we propose a novel end-to-end adversarial disentangled feature learning (ADFL) framework for facial expression recognition. The ADFL framework is mainly composed of three branches: the expression disentangling branch ADFL-d, the neutral expression branch ADFL-n and the residual expression branch ADFL-r. The ADFL-d and ADFL-n aim to extract the expressive component and neutral component, respectively. The ADFL-r extracts the residual expression by calculating the difference between the feature maps of ADFL-d and ADFL-n, and uses the residual expression feature for expression classification. Experimental results on several benchmark databases (CK+, MMI and Oulu-CASIA) show that the proposed method has remarkable performance compared to state-of-the-art methods.
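The core data flow is a simple subtraction at the feature level. The branch functions below are toy stand-ins (the real ADFL-d and ADFL-n are learned CNN branches); only the residual computation mirrors the paper's idea.

```python
import numpy as np

# Toy stand-ins for the learned branches: ADFL-d responds to the whole
# expressive face, ADFL-n to its neutral component only.
def branch_d(face):
    return face.mean(axis=(0, 1))          # (channels,) feature vector

def branch_n(face):
    return np.full(face.shape[-1], 0.5)    # constant "neutral" feature

def residual_feature(face):
    """ADFL-r: the difference between the disentangled and neutral feature
    maps carries the expressive component used for classification."""
    return branch_d(face) - branch_n(face)
```

The classifier then sees only the residual, i.e. what the expression adds on top of the subject's neutral face.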
IEEE Transactions on Multimedia, 2020
As 2D and 3D data present different views of the same face, the features extracted from them can be both complementary and redundant. In this paper, we present a novel and efficient orthogonalization-guided feature fusion network, namely OGF2Net, to fuse the features extracted from 2D and 3D faces for facial expression recognition. While 2D texture maps are fed into a 2D feature extraction pipeline (FE2DNet), the attribute maps generated from 3D data are concatenated as input of the 3D feature extraction pipeline (FE3DNet). The two networks are separately trained in the first stage and frozen in the second stage for late feature fusion, which can well address the unavailability of a large number of 3D+2D face pairs. To reduce the redundancies among features extracted from the 2D and 3D streams, we design an orthogonal loss-guided feature fusion network to orthogonalize the features before fusing them. Experimental results show that the proposed method significantly outperforms state-of-the-art algorithms on both the BU-3DFE and Bosphorus databases. While accuracies as high as 89.05% (P1 protocol) and 89.07% (P2 protocol) are achieved on the BU-3DFE database, an accuracy of 89.28% is achieved on the Bosphorus database. The complexity analysis also suggests that our approach achieves a higher processing speed while requiring lower memory costs.
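A common way to express such an orthogonality constraint is to penalize the cross-stream Gram matrix. This is an assumed form for illustration; the paper's exact loss may differ.

```python
import numpy as np

def orthogonal_loss(f2d, f3d):
    """Orthogonality penalty (an assumed form): the squared Frobenius norm
    of the cross-stream Gram matrix, normalized by batch size. It is zero
    exactly when every 2D feature vector in the batch is orthogonal to
    every 3D feature vector, i.e. when the streams carry no redundancy
    in the linear sense."""
    gram = f2d @ f3d.T    # (batch, batch) cross-correlations
    return float(np.sum(gram ** 2) / f2d.shape[0])
```

Minimizing this term alongside the recognition loss pushes the two streams toward complementary, non-redundant features before they are fused.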
IEEE Transactions on Cybernetics, 2019
Due to the importance of facial expressions in human-machine interaction, a number of handcrafted features and deep neural networks have been developed for facial expression recognition. While a few studies have shown the similarity between handcrafted features and the features learned by deep networks, we propose a new feature loss that uses a feature bias constraint between handcrafted and deep features to guide deep feature learning during the early training of the network. The feature maps learned with and without the proposed feature loss for a toy network suggest that our approach can fully explore the complementarity between handcrafted features and deep features. Based on the feature loss, a general framework for embedding traditional feature information into deep network training was developed and tested using the FER2013, CK+, Oulu-CASIA, and MMI datasets. Moreover, adaptive loss weighting strategies are proposed to balance the influence of different losses for different expression databases. The experimental results show that the proposed feature loss with adaptive weighting achieves much better accuracy than the original handcrafted feature and the network trained without our feature loss. Meanwhile, the feature loss with adaptive weighting can provide complementary information to compensate for the deficiency of a single feature.
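The training objective can be sketched as a classification loss plus a weighted feature-bias term. The quadratic loss and the linear-decay schedule below are assumptions for illustration (the paper proposes adaptive, per-database weighting); they only show how handcrafted guidance can dominate early training and fade out later.

```python
import numpy as np

def feature_loss(deep_feat, hand_feat):
    """Feature-bias constraint: penalize the gap between the deep features
    and the handcrafted target (a minimal sketch of the idea)."""
    return float(np.mean((deep_feat - hand_feat) ** 2))

def adaptive_weight(epoch, warmup=10, w0=1.0):
    """One simple schedule (an assumption, not the paper's exact strategy):
    full weight early on, decayed linearly to zero so the handcrafted
    guidance only shapes the start of training."""
    return w0 * max(0.0, 1.0 - epoch / warmup)

def total_loss(cls_loss, deep_feat, hand_feat, epoch):
    """Classification loss plus the weighted feature-bias term."""
    return cls_loss + adaptive_weight(epoch) * feature_loss(deep_feat, hand_feat)
```

After the warm-up period the constraint vanishes and the network is free to move beyond the handcrafted target, which is why the constraint is applied only during early training.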
Journal of Instrumentation, 2018
In the search for neutrinoless double-beta decay, the high-pressure gaseous Time Projection Chamber has a distinct advantage, because the ionization charge tracks produced by particle interactions are extended and the detector captures the full three-dimensional charge distribution with appropriate charge readout systems. Such track information provides a crucial extra handle for discriminating signal events against backgrounds. In this paper, we constructed a toy model to demonstrate where the discrimination power comes from and how much of it the neural network models have already harnessed. We then adapted 3-dimensional convolutional and residual neural networks to the simulated double-beta and background charge tracks and tested their capabilities in classifying these two types of events. We show that both the 3D structure and the overall depth of the neural networks significantly improve the accuracy of the classifier and lead to results better than previous works. We also studied their performance under various spatial granularities as well as different diffusion and noise conditions. The results indicate that the methods are stable and generalize well despite varying experimental conditions. Keywords: Analysis and statistical methods; Pattern recognition, cluster finding, calibration and fitting methods; Double-beta decay detectors; Time projection chambers. ArXiv ePrint: 1803.01482
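The benefit of the 3D structure is that the kernel sees the full spatial extent of a charge track instead of a 2D projection. A naive valid 3D convolution makes the operation concrete (this is an illustration of the basic building block, not the paper's network):

```python
import numpy as np

def conv3d_valid(vol, kern):
    """Naive 'valid' 3D convolution (cross-correlation): slide the kernel
    through the volume and take the windowed inner product at each offset."""
    kd, kh, kw = kern.shape
    D, H, W = vol.shape
    out = np.zeros((D - kd + 1, H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(vol[i:i + kd, j:j + kh, k:k + kw] * kern)
    return out
```

Stacking such layers (with learned kernels and residual connections) yields the 3D residual networks evaluated in the paper; flattening the volume to 2D before convolving would discard exactly the track topology the classifier relies on.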
IEEE Transactions on Circuits and Systems for Video Technology, 2019
IEEE Access, 2019
Deep neural networks (DNNs) have been widely applied to the automatic analysis of medical images for disease diagnosis and to help human experts by efficiently processing immense amounts of images. While handcrafted features have been used for eye disease detection or classification since the 1990s, DNNs were recently adopted in this area and showed very promising performance. Since handcrafted and deep features can extract complementary information, we propose, in this paper, three different integration frameworks to combine handcrafted and deep features for optical coherence tomography image-based eye disease classification. In addition to integrating the handcrafted features at the input and fully connected layers of existing networks, such as VGG, DenseNet, and Xception, a novel ribcage network (RC Net) is also proposed for feature integration at the middle layers. For RC Net, two "rib" channels are designed to independently process deep and handcrafted features, and another so-called "spine" channel is designed for the integration. While dense blocks are the main components of the three channels, a sum operation is proposed for the feature map integration. Our experimental results showed that the deep networks achieved better classification accuracy after the integration of the handcrafted features, e.g., scale-invariant feature transform and Gabor. RC Net showed the best performance among all the proposed feature integration methods. INDEX TERMS Artificial intelligence, deep learning, optical coherence tomography, feature integration.
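The rib/spine data flow can be sketched with simple stand-ins. The real rib channels are stacks of dense blocks; the projections below are placeholders that only illustrate how the two streams are processed independently and then merged by an element-wise sum.

```python
import numpy as np

def rib(x, w):
    """Stand-in for a dense-block 'rib' channel: a linear projection with
    ReLU. The projection brings both streams to a common feature width so
    the spine can sum them element-wise."""
    return np.maximum(x @ w, 0.0)

def spine_merge(deep_feat, hand_feat, w_deep, w_hand):
    """'Spine' channel: integrate the two rib outputs by element-wise sum,
    the merge operation proposed for RC Net's feature maps."""
    return rib(deep_feat, w_deep) + rib(hand_feat, w_hand)
```

Because the sum requires matching shapes, each rib must project its stream (deep or handcrafted) to the same width; after the merge, the spine continues as a single pipeline toward the classifier.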
Sensors, 2018
Skin lesions are a severe disease globally. Early detection of melanoma in dermoscopy images significantly increases the survival rate. However, the accurate recognition of melanoma is extremely challenging due to the following reasons: low contrast between lesions and skin, visual similarity between melanoma and non-melanoma lesions, etc. Hence, reliable automatic detection of skin tumors is very useful for increasing the accuracy and efficiency of pathologists. In this paper, we propose two deep learning methods to address three main tasks emerging in the area of skin lesion image processing, i.e., lesion segmentation (task 1), lesion dermoscopic feature extraction (task 2) and lesion classification (task 3). A deep learning framework consisting of two fully convolutional residual networks (FCRN) is proposed to simultaneously produce the segmentation result and the coarse classification result. A lesion index calculation unit (LICU) is developed to refine the coarse classification results by calculating a distance heat-map. A straightforward CNN is proposed for the dermoscopic feature extraction task. The proposed deep learning frameworks were evaluated on the ISIC 2017 dataset. Experimental results show the promising performance of our frameworks: accuracies of 0.753 for task 1, 0.848 for task 2 and 0.912 for task 3 were achieved.
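One way a distance heat-map can refine a coarse classification is sketched below. This is an illustration under our own assumptions, not the exact LICU: the heat-map weights each pixel by its proximity to the segmented lesion, so pixels near the lesion dominate the pooled class score.

```python
import numpy as np

def distance_heatmap(mask):
    """Per-pixel distance to the nearest lesion pixel, mapped to a weight
    that decays away from the lesion (brute force -- fine for a toy grid)."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([ys, xs], axis=1)
    h, w = mask.shape
    heat = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            d2 = np.min(np.sum((pts - np.array([i, j])) ** 2, axis=1))
            heat[i, j] = np.exp(-np.sqrt(d2))
    return heat

def refine_score(coarse_prob, mask):
    """Heat-map-weighted pooling of the coarse per-pixel class probability:
    evidence close to the lesion counts more than distant background."""
    heat = distance_heatmap(mask)
    return float(np.sum(coarse_prob * heat) / np.sum(heat))
```

Compared with a plain average over the image, the weighted score is pulled toward the probabilities observed on and around the lesion itself.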
IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2017
Kernelized Correlation Filter (KCF) is one of the state-of-the-art object trackers. However, it does not reasonably model the distribution of the correlation response during tracking, which can cause drifting, especially when targets undergo significant appearance changes due to occlusion, camera shaking, and/or deformation. In this paper, we propose an Output Constraint Transfer (OCT) method that mitigates the drifting problem by modeling the distribution of the correlation response in a Bayesian optimization framework. OCT builds upon the reasonable assumption that the correlation response to the target image follows a Gaussian distribution, which we exploit to select training samples and reduce model uncertainty. OCT is rooted in a new theory which transfers the data distribution to a constraint on the optimized variable, leading to an efficient framework for calculating correlation filters. Extensive experiments on a commonly used tracking benchmark show that the proposed method significantly improves KCF and achieves better performance than other state-of-the-art trackers. To encourage further developments, the source code is made available at https://github.com/bczhangbczhang/OCT-KCF
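The Gaussian assumption suggests a simple sample-selection rule, sketched here as an illustration of the idea rather than the paper's derivation: keep a history of peak correlation responses, and skip the filter update whenever the current peak is a statistical outlier, which is typically when occlusion or deformation would otherwise corrupt the model.

```python
import numpy as np

def should_update(peak, history, k=2.0):
    """Gaussian-constraint sample selection (a sketch of the idea behind
    OCT): assume peak correlation responses to the true target follow a
    Gaussian; reject the model update when the current peak falls more
    than k standard deviations from the running mean."""
    mu = np.mean(history)
    sigma = np.std(history) + 1e-8   # guard against a degenerate history
    return bool(abs(peak - mu) <= k * sigma)
```

In a tracker loop, a rejected frame simply reuses the previous filter, so occluded or deformed frames do not pull the model off the target.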