Jialun Wu - Academia.edu
Papers by Jialun Wu
Bioinformatics
Motivation: Automated clinical decision-making for patients with multi-morbidity has long been considered a thorny problem due to the complexity of the disease. Drug recommendation can assist doctors by automatically providing effective and safe drug combinations that are conducive to treatment and reduce adverse reactions. However, existing drug recommendation work ignores two critical pieces of information. (1) Different types of medical information and their interrelationships in the patient's visit history can be used to construct a comprehensive patient representation. (2) Patients with similar disease characteristics and their corresponding medication information can be used as a reference for predicting drug combinations. Results: To address these limitations, we propose DAPSNet, which encodes multi-type medical codes into patient representations through code-level and visit-level attention mechanisms, while integrating drug information corresponding to similar patient states to i...
2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Proceedings of the 31st ACM International Conference on Information & Knowledge Management
Predicting drug combinations according to patients' electronic health records is an essential task in intelligent healthcare systems, which can assist clinicians in ordering safe and effective prescriptions. However, existing work either misses or underutilizes the important information in the drug molecule structure during drug encoding, or has insufficient control over Drug-Drug Interaction (DDI) rates within the predictions. To address these limitations, we propose CSEDrug, which enhances drug encoding and DDI control by leveraging multi-faceted drug knowledge, including molecule structures of drugs, Synergistic DDIs (SDDIs), and Antagonistic DDIs (ADDIs). We integrate these types of knowledge into CSEDrug through a graph-based drug encoder and multiple loss functions, including a novel triplet learning loss and a comprehensive DDI controllable loss. We evaluate the performance of CSEDrug in terms of accuracy, effectiveness, and safety on the public MIMIC-III dataset. The experimental results demonstrate that CSEDrug outperforms several state-of-the-art methods and achieves a 2.93% and a 2.77% increase in the Jaccard similarity scores and F1 scores,
IEEE Transactions on Medical Imaging
2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Healthcare representation learning has been a key element in achieving state-of-the-art performance on healthcare prediction. Recent advances based on Electronic Healthcare Records (EHRs) are mostly devoted to extracting temporal progression patterns with temporal models and their variants. Although these works have shown excellent performance in healthcare prediction, a unified temporal pattern may not be suitable for individuals in all healthcare conditions. Moreover, some studies introduce complex deep neural network models and medical prior knowledge to obtain a compact representation, causing a great computational burden. In this paper, we propose a general healthcare representation model named AEFNet. We leverage only three simple convolution operations and a set of up- and down-sampling operations to balance performance against model complexity, which lets the model adaptively extract distinct individual key features in a lightweight manner. AEFNet can shrink and refine highly suitable scale information adaptively and completely. Breaking with the traditional fixed convolution scale or multi-scale design, AEFNet adapts its scale to extract the most significant information and context relationships. Finally, we validate our method on the public dataset MIMIC-III, and the evaluation results indicate that our method can significantly outperform other remarkable baseline models.
2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Constructing large-scale medical knowledge graphs (MKGs) can significantly boost healthcare applications for medical surveillance and has attracted much attention in recent research. An essential step in constructing a large-scale MKG is extracting information from medical reports. Recently, information extraction techniques have been proposed and show promising performance in biomedical information extraction. However, these methods only consider limited types of entities and relations because biomedical text data is noisy and has complex entity correlations. Thus, they fail to provide enough information for constructing MKGs and restrict the downstream applications. To address this issue, we propose Biomedical Information Extraction (BioIE), a hybrid neural network to extract relations from biomedical text and unstructured medical reports. Our model utilizes a multi-head attention enhanced graph convolutional network (GCN) to capture the complex relations and context information while resisting the noise in the data. We evaluate our model on two major biomedical relationship extraction tasks, chemical-disease relation (CDR) and chemical-protein interaction (CPI), and a cross-hospital pan-cancer pathology report corpus. The results show that our method outperforms the baselines. Furthermore, we evaluate the applicability of our method under a transfer learning setting and show that BioIE achieves promising performance in processing medical text with different formats and writing styles.
2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2021
Digital pathology plays a crucial role in the development of artificial intelligence in the medical field. A digital pathology platform can make pathological resources digital and networked, and provides permanent storage of visual data and synchronous browsing and processing without limits of time and space. Such platforms have been widely used in various fields of pathology. However, there is still a lack of an open and universal digital pathology platform to assist doctors in the management and analysis of digital pathological sections, as well as the management and structured description of relevant patient information. Most platforms cannot integrate image viewing, annotation and analysis, and text information management. To solve these problems, we propose a comprehensive and extensible platform, PIMIP (Pathology Information Management & Integration Platform). PIMIP provides image annotation functions based on the visualization of digital pathological sections. The annotation functions support multi-user collaborative annotation and multi-device annotation, and automate some annotation tasks. For the annotation task, we invited a professional pathologist for guidance. We introduce a machine learning module for image analysis. The data we collected included public data from local hospitals and clinical examples, which makes the platform better suited to clinical use. In addition to image data, we also structured the management and display of text information, which makes the platform comprehensive. The platform framework is built in a modular way so that users can add machine learning modules independently, which makes the platform extensible.
2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2020
Electronic medical data contains biochemical, imaging, and pathological information gathered during diagnosis and treatment. The pathology report is a kind of highly free-form, unstructured textual data, which is the basis and gold standard of cancer diagnosis and is very important for the prognosis and treatment of patients. Applying information extraction technology to pathological reports can produce structured data that can be understood and analyzed by computers, helping pathologists make appropriate decisions. In this work, we proposed an attention-based graph convolutional network (GCN) for converting unstructured pathological reports into a structured form suitable for computer analysis, in order to improve the current pathologist’s workflow, collect medical data from different platforms, and provide more accurate assistance for diagnosis and treatment. We used pathology report data from the TCGA (The Cancer Genome Atlas) database, with fine-grained annotations on 3632 pathology reports covering four types of cancer. Our method achieves a higher F1 score on our pathology report dataset than traditional methods and deep learning methods. The results indicate that our method is robust and thus may work with pathology reports for other types of cancer.
2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2021
2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2021
Diagnostic pathology, which is the basis and gold standard of cancer diagnosis, provides essential information on the prognosis of the disease and vital evidence for clinical treatment. Tumor region detection and subtype and grade classification are the fundamental diagnostic indicators for renal cell carcinoma (RCC) in whole-slide images (WSIs). However, pathological diagnosis is subjective, and differences in observation and diagnosis between pathologists are common in hospitals with inadequate diagnostic capacity. The main challenge in developing a deep learning based RCC diagnostic system is the lack of large-scale datasets with precise annotations. In this work, we proposed a deep learning-based framework for analyzing histopathological images of patients with renal cell carcinoma, which has the potential to achieve pathologist-level accuracy in diagnosis. A deep convolutional neural network (InceptionV3) was trained on the high-quality annotated dataset of The Cancer Genome Atlas (TCGA) whole-slide histopathological images for accurate tumor area detection, classification of RCC subtypes, and ISUP grade classification of clear cell carcinoma subtypes. These results suggest that our framework can help pathologists in the detection of cancer regions and the classification of subtypes and grades, and that it could be applied to any cancer type, providing auxiliary diagnosis and promoting clinical consensus.
Medical Image Computing and Computer Assisted Intervention – MICCAI 2021, 2021
The histological subtype of papillary (p) renal cell carcinoma (RCC), type 1 vs. type 2, is an essential prognostic factor. The two subtypes of pRCC share a similar pattern, i.e., the papillary architecture, yet show some subtle differences, including cellular and cell-layer level patterns. However, these cellular and cell-layer level patterns can hardly be captured by existing CNN-based models in large-size histopathological images, which hinders applying these models directly to such a fine-grained classification task. This paper proposes a novel instance-based Vision Transformer (i-ViT) to learn robust representations of histopathological images for the pRCC subtyping task by extracting finer features from instance patches (by cropping around segmented nuclei and assigning predicted grades). The proposed i-ViT takes top-K instances as input and aggregates them to capture both the cellular and cell-layer level patterns through a position-embedding layer, a grade-embedding layer, and a multi-head multi-layer self-attention module. To evaluate the performance of the proposed framework, experienced pathologists were invited to select 1162 regions of interest from 171 whole slide images of type 1 and type 2 pRCC. Experimental results show that the proposed method outperforms existing CNN-based models by a significant margin.
Medical Image Computing and Computer Assisted Intervention – MICCAI 2021, 2021
The grade of clear cell renal cell carcinoma (ccRCC) is a critical prognostic factor, making ccRCC nuclei grading a crucial task in RCC pathology analysis. Computer-aided nuclei grading aims to improve pathologists' work efficiency while reducing their misdiagnosis rate by automatically identifying the grades of tumor nuclei within histopathological images. Such a task requires precisely segmenting and accurately classifying the nuclei. However, most existing nuclei segmentation and classification methods cannot handle the inter-class similarity property of nuclei grading and thus cannot be directly applied to the ccRCC grading task. In this paper, we propose a Composite High-Resolution Network for ccRCC nuclei grading. Specifically, we propose a segmentation network called W-Net that can separate clustered nuclei. Then, we recast the fine-grained classification of nuclei as two cross-category classification tasks, and propose two high-resolution feature extractors (HRFEs) to learn these two tasks. The two HRFEs share the same backbone encoder with W-Net through a composite connection so that meaningful features for the segmentation task can be inherited for the classification task. Last, a head-fusion block is applied to generate the predicted label of each nucleus. Furthermore, we introduce a dataset for ccRCC nuclei grading, containing 1000 image patches with 70945 annotated nuclei. We demonstrate that our proposed method achieves state-of-the-art performance compared to existing methods on this large ccRCC grading dataset.
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task, 2019
Claims databases and electronic health record databases do not usually capture kinship or family relationship information, which is imperative for genetic research. We identify online obituaries as a new data source and propose a special named entity recognition and relation extraction solution to extract names and kinships from online obituaries. Built on 1,809 annotated obituaries and a novel tagging scheme, our joint neural model achieved macro-averaged precision, recall and F measure of 72.69%, 78.54% and 74.93%, and micro-averaged precision, recall and F measure of 95.74%, 98.25% and 96.98% using 57 kinships with 10 or more examples in a 10-fold cross-validation experiment. The model performance improved dramatically when trained with 34 kinships with 50 or more examples. Leveraging additional information such as age, death date, birth date and residence mentioned by obituaries, we foresee a promising future of supplementing EHR databases with comprehensive and accurate kinship information for genetic research.
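The gap between the macro-averaged and micro-averaged scores reported above is typical when a few frequent labels dominate the data. The following is a minimal sketch of the two averaging schemes, not the authors' evaluation code: the function names `prf` and `macro_micro` and the per-kinship counts are hypothetical, and the macro F measure is assumed here to be the mean of per-label F scores.

```python
def prf(tp, fp, fn):
    """Precision, recall and F measure from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def macro_micro(counts):
    """counts maps each label to a (tp, fp, fn) tuple.

    Macro averaging scores every label separately and then averages the scores,
    so rare labels weigh as much as frequent ones. Micro averaging pools the
    counts first, so frequent labels dominate.
    """
    per_label = [prf(*c) for c in counts.values()]
    n = len(per_label)
    macro = tuple(sum(scores[i] for scores in per_label) / n for i in range(3))
    totals = [sum(c[i] for c in counts.values()) for i in range(3)]
    micro = prf(*totals)
    return macro, micro


# Hypothetical counts: one frequent, well-predicted kinship and one rare, poorly predicted one.
example_counts = {"son": (90, 5, 2), "niece": (1, 1, 3)}
macro, micro = macro_micro(example_counts)
print("macro P/R/F:", macro)
print("micro P/R/F:", micro)
```

With these made-up counts the rare label pulls the macro scores well below the micro scores, the same direction as the difference in the reported figures.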
2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2021
Personalized diagnoses have not been possible due to the sheer amount of data pathologists have to bear during the day-to-day routine. This has led to the current generalized standards, which are continuously updated as new findings are reported. It is noticeable that these effective standards are developed based on multi-source heterogeneous data, including whole-slide images and pathology and clinical reports. In this study, we propose a framework that combines pathological images and medical reports to generate a personalized diagnosis result for an individual patient. We use nuclei-level image feature similarity and a content-based deep learning method to search for a personalized group of patients with similar pathological characteristics, extract structured prognostic information from descriptive pathology reports of the similar patient population, and assign importance to different prognostic factors to generate a personalized pathological diagnosis result. We use multi-source heterogeneous data from the TCGA (The Cancer Genome Atlas) database. The results demonstrate that our framework matches the performance of pathologists in the diagnosis of renal cell carcinoma. This framework is designed to be generic and thus could be applied to other types of cancer. The weights could provide insights into the known prognostic factors and further guide more precise clinical treatment protocols. Index Terms: multi-source heterogeneous data, personalized diagnostic result, prognostic factor weight.