Xueping Peng - Academia.edu
Papers by Xueping Peng
Computers in Biology and Medicine
Cornell University - arXiv, Sep 6, 2022
Next basket recommender systems (NBRs) aim to recommend a user's next (shopping) basket of items by modeling the user's preferences towards items based on the user's purchase history, usually a sequence of historical baskets. Due to its wide applicability in the real-world e-commerce industry, the study of NBR has attracted increasing attention in recent years. NBRs have been widely studied, and much progress has been achieved with a variety of proposed NBR approaches. However, a systematic and unified evaluation of the various NBR approaches is still lacking. Different studies often evaluate NBR approaches on different datasets and under different experimental settings, making it hard to compare their performance fairly and effectively. To bridge this gap, we conduct a systematic empirical study of the NBR area. Specifically, we review the representative work in NBR and analyze its pros and cons. Then, we run the selected NBR algorithms on the same datasets, under the same experimental settings, and evaluate their performance using the same measurements. This provides a unified framework to fairly compare different NBR approaches. We hope this study can provide a valuable reference for future research in this vibrant area.
Lecture Notes in Computer Science, 2022
2022 International Joint Conference on Neural Networks (IJCNN)
2022 International Joint Conference on Neural Networks (IJCNN)
Cornell University - arXiv, Jul 20, 2021
Healthcare representation learning on Electronic Health Records (EHRs) is crucial for downstream medical prediction tasks in health informatics. Many natural language processing techniques, such as word2vec, RNN and self-attention, have been adapted to learn medical representations from hierarchical and time-stamped EHR data, but fail when they lack either general or task-specific data.
Frontiers in Neuroinformatics
Magnetoencephalography (MEG) is a noninvasive neuromagnetic technology for recording epileptic activity for the pre-operative localization of epileptogenic zones, and it has received increasing attention in the diagnosis and surgery of epilepsy. As reported by recent studies, pathological high frequency oscillations (HFOs), when utilized as a biomarker to localize the epileptogenic zones, result in a significant reduction in seizure frequency, even seizure elimination in around 80% of cases. Thus, objective, rapid, and automatic detection and recommendation of HFOs are highly desirable for clinicians to alleviate the burden of reviewing a large amount of MEG data from a given patient. Despite this advantage, the performance of existing HFO detectors rarely satisfies the clinical requirement. Consequently, no HFO detectors have been successfully applied to real clinical applications so far. In this work, we propose a multi-head self-attention-based detector for recommendation, termed MSADR, to detect and recomm...
With the rapid development of the Internet and the WWW, it is increasingly important for people to access quality web information. Enabling users to find information quickly and accurately has therefore become an urgent issue. As one of the basic ways to solve this problem, personalized information services focus on fulfilling the personalized information requirements of different users based on their actual demands, preference characteristics, behaviour patterns, etc. This thesis focuses on enhancing web log based recommendation by personalized retrieval, and its main contributions and innovations include: • For personalized retrieval, the thesis proposes two models to improve user experience and optimize search performance. The first is a query suggestion model based on query semantics and click-through data. This model calculates the subject relevance between queries, and then combines the semantic information and the relevance of the query-click matrix model as this ...
The increasing amount of Tibetan information has made Tibetan text processing popular and highly significant. In this study, Tibetan hot topic extraction and public opinion classification were investigated to accelerate the development of Tibetan information processing. First, Tibetan word segmentation in Tibetan hot topic extraction was presented. Second, feature selection based on term frequency and on document frequency was adopted to decrease feature dimensions. Third, a vector space model was used for text representation. Finally, a statistics-based method was utilized to extract hot topics. For public opinion classification, a keyword table of public opinion needed to be established. According to field, 18 classes were selected and used for public opinion classification. A keyword table of public opinion was constructed by domain experts. The approach to public opinion classification was introduced on ...
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Lecture Notes in Computer Science, 2022
Fuel Processing Technology, 2022
Frontiers in Molecular Biosciences, 2022
High-frequency oscillations (HFOs), observed within 80–500 Hz of magnetoencephalography (MEG) data, are putative biomarkers to localize epileptogenic zones that are critical for the success of surgical epilepsy treatment. It is crucial to accurately detect HFOs to improve the surgical outcome of patients with epilepsy. However, in clinical practice, detecting HFOs in MEG signals mainly depends on visual inspection by clinicians, which is very time-consuming, labor-intensive, subjective, and error-prone. To accurately and automatically detect HFOs, machine learning approaches have been developed and have demonstrated promising results in automated HFO detection. More recently, transformer-based models have attracted wide attention and achieved state-of-the-art performance on many machine learning tasks. In this paper, we investigate the suitability of transformer-based models for the detection of HFOs. Specifically, we propose a transformer-based HFO detection framewor...
Journal of Biomedical Informatics, 2022
The goal of the mortality prediction task is to predict the future death risk of patients according to their previous Electronic Healthcare Records (EHR). The main challenge of mortality prediction is how to design an accurate and robust predictive model with sequential, multivariate, sparse and irregular EHR data. In addition, the performance of the model may be affected by a lack of sufficient information in EHRs for some patients with rare diseases. To address these challenges, we propose a model, SeMO, that fuses Sequential visits and Medical Ontology to predict patients' death risk. SeMO not only learns reasonable embeddings for medical concepts from sequential and irregular visits, but also exploits medical ontology to improve the prediction performance. With the integration of multivariate features, SeMO learns robust representations of medical codes, which mitigate data insufficiency, as well as insightful sequential dependencies among a patient's visits. Experimental results on real-world datasets show that the proposed SeMO improves the prediction performance compared with the baseline approaches. Our model achieves a precision of up to 0.975. Compared with RNN, the precision has been improved by up to 2.204%.
Session-based recommendations (SBRs) recommend the next item for an anonymous user by modeling the dependencies between items in a session. Benefiting from the superiority of graph neural networks (GNNs) in learning complex dependencies, GNN-based SBRs have become the mainstream of SBRs in recent years. Most GNN-based SBRs are based on a strong assumption of adjacent dependency, which means any two adjacent items in a session are necessarily dependent. However, based on our observation, adjacency does not necessarily indicate dependency due to the uncertainty and complexity of user behaviours. Therefore, the aforementioned assumption does not always hold in real-world cases and thus easily leads to two deficiencies: (1) the introduction of false dependencies between items which are adjacent in a session but are not really dependent, and (2) the missing of true dependencies between items which are not adjacent but are actually dependent. Such deficiencies significantly d...
present here the research work on data mining technologies for complicated attribute relationships in digital library collections. First, our work and approach are introduced as the research background of this paper. Digital library evaluation is an important topic in the information systems domain. We introduce data mining technologies into it to obtain intelligent decision support, but traditional data prediction algorithms did not work well. This is the problem addressed in this paper. Second, related preliminary research is introduced. We studied the attributes of digital library collections, proposed a parallel discretization algorithm based on z-score theory, and with this discretization algorithm discovered a complicated condition attribute relation among attributes, which is the reason why traditional data prediction algorithms did not work well. Finally, a stratified decision tree algorithm for value prediction about digital collection is put forward as the ultimate...
Predicting students' academic performance is very important for students' future development. Every year, a large number of students cannot graduate from college on time for various reasons. Nowadays, a large volume of student academic data is generated in the process of promoting education informatization. It becomes critical to predict student performance and help students graduate on time by making the best use of these data. Machine learning models that predict student performance are widely available. However, some existing machine learning models still suffer from low accuracy in predicting student performance. To solve this problem, we propose a SMNaive Bayes (SMNB) model, which integrates Sequential Minimal Optimization (SMO) and Naive Bayes to make the prediction result more accurate. The basic idea is that the model predicts the performance of students' professional courses via their basic course performance in the...
ArXiv, 2021
News recommender systems are essential for helping users efficiently and effectively find interesting news among a large amount of news. Most existing news recommender systems learn topic-level representations of users and news for recommendation, and neglect to learn more informative aspect-level features of users and news for more accurate recommendation. As a result, they achieve limited recommendation performance. To address this deficiency, we propose a novel Aspect-driven News Recommender System (ANRS) built on aspect-level user preference and news representation learning. Here, a news aspect is fine-grained semantic information expressed by a set of related words, which indicates specific aspects described by the news. In ANRS, a news aspect-level encoder and a user aspect-level encoder are devised to learn the fine-grained aspect-level representations of the user’s preferences and news characteristics respectively, which are fed into click predictor to...
ArXiv, 2021
Sequential diagnosis prediction on the Electronic Health Record (EHR) has been proven crucial for predictive analytics in the medical domain. EHR data, sequential records of a patient’s interactions with healthcare systems, has numerous inherent characteristics of temporality, irregularity and data insufficiency. Some recent works train healthcare predictive models by making use of sequential information in EHR data, but they are vulnerable to irregular, temporal EHR data with the states of admission/discharge from hospital, and to insufficient data. To mitigate this, we propose an end-to-end robust transformer-based model called SETOR, which exploits neural ordinary differential equations to handle both irregular intervals between a patient’s visits with admitted timestamps and length of stay in each visit, to alleviate the limitation of insufficient data by integrating medical ontology, and to capture the dependencies between the patient’s visits by employing multi-layer transformer b...