Hyunwhan Joe | Seoul National University
Papers by Hyunwhan Joe
Question answering systems for Linked Open Data represented in RDF have received attention lately. These systems allow users to access datasets without any prior knowledge of the data model, schema, or query language. Template generation is one method such systems use to transform natural language questions into SPARQL queries. TBSL is representative of systems that use template-based approaches. TBSL first transforms questions into an intermediate semantic representation, which is then transformed into SPARQL templates. Several candidate templates can be generated. In this paper we propose a possible scoring method on the intermediate semantic representations that can later be used for ranking the templates.
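The ranking step described above can be sketched as follows. The slot names, queries, and coverage-based scoring function are hypothetical illustrations of template ranking in general, not the actual scoring method of TBSL or of this paper.

```python
# Hypothetical sketch: rank candidate SPARQL templates by how many of
# their open slots were matched against the intermediate semantic
# representation. Slot names and the scoring rule are illustrative only.

def score_template(template: dict, matched_slots: set) -> float:
    """Fraction of the template's slots that found a match."""
    slots = template["slots"]
    if not slots:
        return 0.0
    return len(matched_slots & set(slots)) / len(slots)

def rank_templates(templates: list, matched_slots: set) -> list:
    """Return templates sorted by score, best first."""
    return sorted(templates,
                  key=lambda t: score_template(t, matched_slots),
                  reverse=True)

candidates = [
    {"query": "SELECT ?x WHERE { ?x ?p ?y }", "slots": ["p", "y"]},
    {"query": "SELECT ?x WHERE { ?x ?p ?y . ?y ?q ?z }",
     "slots": ["p", "y", "q", "z"]},
]
# Only slots p and y were filled by the semantic representation,
# so the simpler template covers all of its slots and ranks first.
ranked = rank_templates(candidates, matched_slots={"p", "y"})
```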
Entity Linking (EL) is a technique that links named entities in a given context to relevant entities in a given knowledge base. Generally, EL consists of two important tasks, but these tasks have limitations. To overcome them, our first approach addresses EL through the interdependencies among not only named entities but also common nouns and verbs appearing in the context. Our second approach assumes that words appearing in a context are more closely related when the distance between them is smaller. In this paper, we propose these approaches to overcome the limitations.
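The distance assumption above can be illustrated with a minimal sketch. The 1/(1+distance) weighting scheme is a hypothetical illustration of proximity-based relatedness, not the actual model of this paper.

```python
# Hypothetical sketch of the distance assumption: mentions that appear
# closer together in the context contribute more to each other's
# disambiguation. The 1/(1+distance) weight is illustrative only.

def proximity_weight(pos_a: int, pos_b: int) -> float:
    """Relatedness weight that decays with token distance."""
    return 1.0 / (1 + abs(pos_a - pos_b))

def context_score(target_pos: int, related_positions: list) -> float:
    """Sum of proximity weights from all other mentions (named
    entities, common nouns, and verbs alike) to the target mention."""
    return sum(proximity_weight(target_pos, p) for p in related_positions)

# A mention at token position 5, supported by words at positions 4, 6,
# and 20: the two adjacent words dominate the distant one.
score = context_score(5, [4, 6, 20])
```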
Semantic technology research has tried to add a new layer of meaning to data to make it more accessible. As part of that effort, question answering systems have facilitated interaction between humans and computers by enabling communication in natural language. The primary focus of question answering research has been the analysis of the question, in the sense that the answer can be predicted from the syntax and semantics of the question. The more fundamental problem, however, lies in the pragmatic use of the question-answer pair beyond the question itself. In this paper, we elucidate how pragmatics can be integrated into a question answering system by introducing Extended Speech Act Theory and presenting real-world examples from our system. With this approach, it is anticipated that errors in the system can be rectified and deeper inferences drawn beyond the surface meaning of the speech.
Many studies have aimed to construct an automated gene expression analysis platform for researchers. However, they lack an integrated data model for analyzing heterogeneous data. To address this issue, we created a biological data integration platform for transcriptome analysis (BiDIP) for managing various kinds of databases. As part of this platform, we developed a biological interaction data model (BIM). We also provide a Web application and OpenAPIs that allow users to search for connections among multiple databases conforming to the proposed data model. In this paper, we present the current status of the platform as well as its future research venues.
The lemon model has been developed to link lexical knowledge to ontology classes and properties. However, it is only possible to describe an ontology property with lexical senses. In our previous work, we found that there are relations between ontology properties and natural language predicates. We suggest a model for explaining ontology properties with not only lexical senses, such as verb senses, but also arguments. In this paper, we propose a method to build ontology property explanations that help link ontology properties to predicates in natural languages.
Journal of Computational Biology
Semantic Technology
In vivo experiments have had a great impact on the development of biomedicine, and as a result, a variety of biomedical data are produced and provided to researchers. Standardization and ontology design have been carried out for the systematic management and effective sharing of these data. As a result of these efforts, useful ontologies such as the Experimental Factor Ontology (EFO), Disease Ontology (DO), Gene Ontology (GO), and Chemical Entities of Biological Interest (ChEBI) were developed. However, these ontologies are not enough to provide knowledge about the experiments to researchers conducting in vivo studies. Specifically, in the experimental design process, inducing cancer demands considerable time and research costs. Researchers conducting animal experiments need animals with signs of carcinogenesis that fit their research interests. Therefore, our study is intended to provide experimental data about inducing cancer in animals. To provide these data, we collect experimental data about chemical substances that cause cancer, design an ontology based on these data, and link it with the Disease Ontology. Our research focuses largely on two aspects: first, creating a knowledge graph that interlinks with other biomedical linked data; second, providing practical knowledge to researchers conducting in vivo experiments. In conclusion, our research is provided in the form of a web service, which makes the SPARQL endpoint and search service easy to use.
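The interlinking idea can be sketched as a small set of RDF-style triples connecting carcinogenic chemicals to Disease Ontology terms. The prefixes and predicate names below are hypothetical placeholders, not the project's actual vocabulary.

```python
# Hypothetical sketch: carcinogen-induction facts stored as triples and
# linked to Disease Ontology (DOID) identifiers. The ex:/chem: prefixes
# and the ex:induces predicate are placeholders for illustration.

triples = [
    # (subject, predicate, object)
    ("chem:Azoxymethane", "ex:induces", "DOID:9256"),
    ("chem:DMBA",         "ex:induces", "DOID:1612"),
    ("DOID:9256", "rdfs:label", "colorectal cancer"),
    ("DOID:1612", "rdfs:label", "breast cancer"),
]

def chemicals_inducing(doid: str) -> list:
    """All chemicals linked to a Disease Ontology term via ex:induces."""
    return [s for s, p, o in triples if p == "ex:induces" and o == doid]
```

A SPARQL endpoint over such a graph would answer the same question with a one-pattern query; the triple scan above mirrors that pattern match in plain Python.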
Computer Speech & Language
Recently there has been a trend in bioinformatics to produce and manage large quantities of data to better explain complex life phenomena through relationships and interactions among biomedical entities. This increase in data leads to a need for more efficient management and searching capabilities. As a result, Semantic Web technologies have been applied to biomedical data. To use these technologies, users have to learn a query language such as SPARQL in order to ask complex questions such as ‘What are the drugs associated with the disease breast carcinoma and Osteoporosis but not the gene ESR1’. BEE was developed to overcome the limitations and difficulties of learning such query languages. Our proposed system provides an intuitive and effective query interface based on natural language. It is a heterogeneous biomedical entity query system based on pathway, drug, microRNA, disease, and gene datasets from DGIdb, Tarbase, Human Phenotype Ontology and Reactome, Gene Ontology, KEGG...
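The example question above reduces to set operations over entity-drug associations, which is what the generated query ultimately computes. The drug names and associations below are fabricated placeholders for illustration, not data from the actual datasets.

```python
# Toy illustration of the example query: drugs associated with both
# breast carcinoma and Osteoporosis but NOT with the gene ESR1.
# All associations below are fabricated placeholders, not real data.

associations = {
    "breast carcinoma": {"drugA", "drugB", "drugC"},
    "Osteoporosis":     {"drugB", "drugC"},
    "ESR1":             {"drugC"},
}

# Intersection of the two disease sets, minus the gene's set:
result = (associations["breast carcinoma"]
          & associations["Osteoporosis"]) - associations["ESR1"]
```

In SPARQL this corresponds to two joined triple patterns plus a FILTER NOT EXISTS clause; a natural-language interface like BEE spares the user from writing that by hand.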
Journal of Cheminformatics
Analysis of compound-protein interactions (CPIs) has become a crucial prerequisite for drug discovery and drug repositioning. In vitro experiments are commonly used to identify CPIs, but it is not feasible to explore the molecular and proteomic space through experimental approaches alone. Advances in machine learning for predicting CPIs have made significant contributions to drug discovery. Deep neural networks (DNNs), which have recently been applied to predict CPIs, perform better than other shallow classifiers. However, such techniques commonly require a considerable volume of dense data for each training target. Although the amount of publicly available CPI data has grown rapidly, public data are still sparse and contain a large number of measurement errors. In this paper, we propose a novel method, Multi-channel PINN, to fully utilize sparse data in terms of representation learning. With representation learning, Multi-channel PINN can utilize three approaches of DNNs: a classifier, a feature extractor, and an end-to-end learner. Multi-channel PINN can be fed with both low and high levels of representations and incorporates each of them by utilizing all approaches within a single model. To fully utilize sparse public data, we additionally explore the potential of transferring representations from training tasks to test tasks. As a proof of concept, Multi-channel PINN was evaluated on fifteen combinations of feature pairs to investigate how they affect performance in terms of highest performance, initial performance, and convergence speed. The experimental results indicate that multi-channel models using protein features performed better than single-channel models or multi-channel models using compound features. Therefore, Multi-channel PINN can be advantageous when used with appropriate representations. Additionally, we pretrained models on a training task and then fine-tuned them on a test task to determine whether Multi-channel PINN can capture general representations for compounds and proteins. We found significant differences in performance between pretrained and non-pretrained models.
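The multi-channel idea can be sketched as a toy forward pass in which each representation gets its own channel and the channel outputs are combined before prediction. The layer sizes, weights, and combination rule are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

# Toy sketch of a two-channel network: one channel embeds a compound
# representation, another embeds a protein representation, and the two
# embeddings are concatenated before a final prediction layer.
# All dimensions and weights are illustrative assumptions.

rng = np.random.default_rng(0)

def channel(x, w):
    """One channel: a linear map followed by ReLU."""
    return np.maximum(0.0, x @ w)

compound = rng.normal(size=(1, 16))   # e.g. a compound fingerprint
protein  = rng.normal(size=(1, 32))   # e.g. a protein descriptor

w_comp = rng.normal(size=(16, 8))     # compound-channel weights
w_prot = rng.normal(size=(32, 8))     # protein-channel weights
w_out  = rng.normal(size=(16, 1))     # prediction-layer weights

joint = np.concatenate([channel(compound, w_comp),
                        channel(protein, w_prot)], axis=1)
# Sigmoid output interpreted as an interaction probability:
prediction = 1.0 / (1.0 + np.exp(-(joint @ w_out)))
```

Transfer learning in this setting would mean reusing the channel weights learned on a training task and fine-tuning only part of the model on the test task.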
Knowledge-Based Systems, 2014