Behrouz Bokharaeian | Universidad Complutense de Madrid
Papers by Behrouz Bokharaeian
Extracting biomedical relations such as drug-drug interactions (DDI) from text is an important task in biomedical NLP. Because biomedical literature contains many complex sentences, researchers have applied sentence simplification techniques to improve the performance of relation extraction methods. However, owing to the difficulty of the task, the research literature reports no noteworthy improvement. This paper explores clause dependency features alongside linguistic-based negation scope and cues to overcome the complexity of the sentences. The experiments indicate that the ratio of negation cues, which are another source of inaccuracy, is higher in complex sentences than in simple ones. Additionally, the results show that combining the proposed features with a bag-of-words kernel improves the performance of the kernel methods used. Moreover, the experiments show that the enhanced local context kernel outperforms the other methods. The proposed method can be used as an alternative to sentence simplification techniques in the biomedical domain, which are error-prone.
Abstract: Extracting biomedical relations from text is an important task in biomedical NLP. Several systems have been developed for this purpose, but only a few address drug-drug interactions. In this paper we show the effectiveness of negation features for this task. We first describe how we extended the DrugDDI corpus by annotating it with the scope of negation, and then report a set of experiments showing that negation features benefit the detection of drug-drug interactions when combined with some simple relation extraction methods. Keywords: drug-drug interactions, negation, kernel functions, support vector machines.
Detecting drug-drug interactions (DDI) is an important research field in pharmacology and medicine, and every year several publications report the negative effects of combining drugs and chemical treatments. The DrugDDI corpus is a collection of documents derived from the DrugBank database and contains manual annotations of interactions between drugs. We have investigated the negated statements in this corpus and found that they make up approximately 21% of its sentences. Previous work has shown that features related to negation can improve results on the DDI task. The main goal of this paper is to describe the process of annotating the DDI-DrugBank corpus with negation cues and scopes, to show the correlations between these and the DDI annotations, and to demonstrate that negations can be used as features in a DDI detection system. Basic experiments have been carried out to show the benefits of considering negation in the DDI task. We believe that the extended corpus can represent significant progress for training and testing DDI extraction algorithms.
Unstructured text documents are the major source of knowledge in biomedical fields. The sheer volume of this information makes extraction and classification very difficult, so there is a need for knowledge discovery and text mining tools in this field. Much work has been done on relation extraction in the biomedical field. However, each study has applied only one of three major types of technique in isolation: co-occurrence, kernel-based, or rule-based methods. Many variants of these algorithms have been developed, but their combination has not yet been evaluated. In this paper we compare these three methods and propose a new combined method for extracting relations between medical and biological entities from biomedical documents. Furthermore, much research has addressed binary biomedical relations such as protein-protein and gene-protein relations, while little has addressed complex relations such as metabolic pathways. In this work we give an overview of a combination of the three methods, called hybrid rule-based, for extracting both complex and simple relations.
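The co-occurrence family of methods mentioned above can be illustrated with a minimal sketch. This is not the hybrid method from the paper; it only shows the simplest baseline, in which two entities are treated as related whenever they appear in the same sentence. The function name `cooccurrence_pairs`, the naive sentence splitter, and the placeholder entity names are all assumptions made for illustration.

```python
import re
from itertools import combinations

def cooccurrence_pairs(text, entities):
    """Count how often each pair of entities appears in the same sentence."""
    counts = {}
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        lowered = sentence.lower()
        # Case-insensitive substring match; sorted so pair order is stable.
        present = sorted(e for e in entities if e.lower() in lowered)
        for pair in combinations(present, 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts

# Placeholder text and entity names, invented for this example.
text = ("ProteinA binds ProteinB in the nucleus. "
        "GeneC is expressed in liver. "
        "ProteinB recruits ProteinA.")
pairs = cooccurrence_pairs(text, ["ProteinA", "ProteinB", "GeneC"])
print(pairs)
```

A baseline like this tends to have high recall and low precision, which is one reason it is usually combined with kernel-based or rule-based filtering.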
Background: Single Nucleotide Polymorphisms (SNPs) are among the most important types of genetic variation influencing common diseases and phenotypes. Recently, some corpora and methods have been developed for extracting mutations and diseases from text. However, no corpus is available for extracting associations from text that is annotated with linguistic-based negation, modality markers, neutral candidates, and the confidence level of associations. Method: This research presents the steps taken to produce the SNPPhenA corpus. They include automatic Named Entity Recognition (NER) followed by manual annotation of SNP and phenotype names, annotation of the SNP-phenotype associations and their level of confidence, and annotation of modality markers. Moreover, the corpus was annotated with negation scopes and cues as well as neutral candidates, which play a crucial role in handling negation and modality in relation extraction tasks. Result: Agreement between annotators was measured with Cohen's Kappa coefficient, and the resulting scores indicate the reliability of the corpus. The Kappa score was 0.79 for annotating the associations and 0.80 for the confidence degree of associations. The paper also presents basic statistics of the annotated features of the corpus, together with the results of our first experiments on extracting ranked SNP-phenotype associations. The prepared guideline documents make the corpus easier to use. The corpus, guidelines and inter-annotator agreement analysis are available on the website of the corpus: http://nil.fdi.ucm.es/?q=node/639.
Conclusion: Specifying the confidence degree of SNP-phenotype associations from articles helps identify the strength of associations that could in turn assist genomics scientists in determining phenotypic plasticity and the importance of environmental factors. What is more, our first experiments with the corpus show that linguistic-based confidence alongside other non-linguistic features can be utilized in order to estimate the strength of the observed SNP-phenotype associations. Trial Registration: Not Applicable
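For readers unfamiliar with the agreement measure reported above, the following is a minimal sketch of how Cohen's Kappa is computed for two annotators. The label lists are invented toy data, not drawn from SNPPhenA, and the helper name `cohens_kappa` is an assumption for this example.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)

# Toy annotations, invented for illustration only.
annotator_1 = ["pos", "pos", "neg", "neutral", "pos",
               "neg", "neg", "pos", "neutral", "pos"]
annotator_2 = ["pos", "pos", "neg", "neutral", "neg",
               "neg", "neg", "pos", "pos", "pos"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Here the annotators agree on 8 of 10 items (observed agreement 0.80) while chance agreement is 0.39, so kappa is (0.80 - 0.39) / (1 - 0.39), roughly 0.67.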
Genome-wide association (GWA) studies constitute a prominent portion of the research conducted on personalized medicine and pharmacogenomics. Only a few methods have recently been developed for extracting mutation-disease associations, and no available method for extracting SNP-phenotype associations from text considers the degree of confidence in the associations. In this study, a relation extraction method relying on linguistic-based negation detection and neutral candidates is first proposed. The experiments show that negation cues and scope, together with the detection of neutral candidates, can be used to implement a superior relation extraction method that outperforms its kernel-based counterparts, owing to the uniform innate polarity of the sentences and the small number of complex sentences in the corpus. Moreover, a modality-based approach is proposed to estimate the confidence level of an extracted association, which can be used to assess the reliability of the reported association.
Extracting biomedical relations such as drug-drug interactions (DDI) from text is an important task in biomedical natural language processing. Because biomedical literature contains many complex sentences, researchers have applied sentence simplification techniques to improve the performance of relation extraction methods. However, no significant improvement has been reported in the literature, since the task is difficult. This paper explores clause dependency features alongside linguistic-based negation scope and cues to overcome the complexity of the sentences. The results show that combining the proposed features with a bag-of-words kernel improves the performance of the kernel methods used. Moreover, the experiments show that the enhanced local context kernel outperforms the other methods. The proposed method can be used as an alternative to sentence simplification techniques in the biomedical domain, which are error-prone.
SNPs are known to be among the most important types of genetic variation that can influence common diseases and phenotypes. The increasing number of SNP-phenotype publications demonstrates the need for automatic extraction of these associations from biomedical articles. Although a few corpora have been developed for extracting mutations and diseases from text, no available corpus is annotated with the level of confidence in associations, which reflects phenotypic plasticity. This paper explains the steps taken to produce the SNPPhenA corpus. They include gathering the abstracts, automatic and manual tagging of SNP and phenotype names, and annotating their associations. Additionally, the corpus includes negation scope and cues as well as neutral candidates, which play an important role in relation extraction tasks. The inter-annotator agreement score for the confidence level, annotated by two annotators, was between 0.70 and 0.85, which indicates the reliability of the corpus. Additionally, an initial experiment was carried out on the corpus.
Genome-wide association (GWA) studies form an important category of research in personalized medicine and examine associations between single-nucleotide polymorphisms (SNPs) and phenotypic traits. Considering the fast-growing rate of GWA studies, automatic extraction of SNP-trait associations from text is a highly demanding task. In this research, an SNP-trait association corpus is first produced, and then a non-supervised relation extraction method grounded in linguistic-based negation detection is proposed. The experiments show that negation cues and scope can serve as the basis of a superior relation extraction method, owing to the uniform polarity of the sentences and the small number of neutral examples and concessive clauses in the corpus. The proposed method works at the sentence level and needs no labelled training data. Moreover, the experiments indicate that the proposed method outperforms the studied sequence kernel method.
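The idea of classifying candidates from negation cues and scope, without training data, can be sketched roughly as follows. This is an assumption-laden toy, not the method from the paper: the cue list, the clause-boundary rule used as a stand-in for scope detection, the function name `classify_candidate`, and the placeholder SNP identifier `rs0000` are all hypothetical.

```python
# Hypothetical, deliberately small cue list; a real system would use an
# annotated cue lexicon and a proper scope detector.
NEGATION_CUES = {"no", "not", "neither", "nor", "without", "lack", "failed"}
CLAUSE_BOUNDARIES = {",", ";", ".", "but", "whereas"}

def classify_candidate(tokens, snp_index, trait_index):
    """Label a candidate SNP-trait pair 'negative' if either mention falls
    inside a naive negation scope, otherwise 'positive'."""
    for i, token in enumerate(tokens):
        if token.lower() not in NEGATION_CUES:
            continue
        # Naive scope: from the cue up to the next clause boundary.
        end = len(tokens)
        for j in range(i + 1, len(tokens)):
            if tokens[j].lower() in CLAUSE_BOUNDARIES:
                end = j
                break
        if i < snp_index < end or i < trait_index < end:
            return "negative"
    return "positive"

# Invented example sentences; "rs0000" is a placeholder, not a real SNP.
s1 = "rs0000 was not associated with asthma .".split()
s2 = "rs0000 was associated with asthma but not with eczema .".split()
print(classify_candidate(s1, 0, 5))  # pair: rs0000 / asthma
print(classify_candidate(s2, 0, 4))  # pair: rs0000 / asthma
print(classify_candidate(s2, 0, 8))  # pair: rs0000 / eczema
```

Defaulting to "positive" only makes sense under the condition the abstract states, namely that sentences in the corpus have a largely uniform polarity and few neutral examples.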
2015 4th Iranian Joint Congress on Fuzzy and Intelligent Systems (CFIS), 2015
2015 Signal Processing and Intelligent Systems Conference (SPIS), 2015
International Journal of Information and Education Technology, 2011
International Journal of Computer and Electrical Engineering, 2013
The expectation maximization (EM) algorithm has been used extensively in a variety of medical image processing applications, especially for detecting human brain disease. This paper presents an efficient and improved semi-automated Fuzzy EM (FEM) based technique for 3-D MR segmentation of human brain images. FEM, with histogram-based K-means in the initialization step, is used to label the individual pixels/voxels of a 3-D anatomical MR image (MRI) with the main tissue classes in the brain: gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF). FEM's membership functions were estimated through a histogram-based method. The results show that the proposed FEM-KMeans has better performance and convergence speed than histogram-based EM.
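The role of K-means as an initializer can be illustrated with a small sketch that clusters scalar intensities into three classes. It is plain (hard) K-means on a toy list of values, not the Fuzzy EM algorithm or the histogram-based variant described in the paper; the function name `kmeans_1d`, the evenly spaced starting centres, and the intensity values are assumptions for illustration.

```python
def kmeans_1d(values, k=3, iterations=20):
    """Cluster scalar intensities into k classes; returns (centres, labels)."""
    lo, hi = min(values), max(values)
    # Spread the initial centres evenly across the intensity range.
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each intensity goes to its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c]))
                  for v in values]
        # Update step: move each centre to the mean of its members
        # (an empty cluster keeps its previous centre).
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return centres, labels

# Toy voxel intensities, invented for this example.
intensities = [12, 15, 14, 80, 85, 82, 150, 155, 149]
centres, labels = kmeans_1d(intensities)
print(centres, labels)
```

In an EM-style segmentation, the resulting centres and labels would typically seed the initial class means and memberships, which is often what improves convergence speed over a random start.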
J. for International Business and Entrepreneurship Development, 2011
... (SABA), Mauritius, and Financial Advisor, Rapid Online Sdn Bhd, Kuala Lumpur, Malaysia. ... He received his Master of BioMedical Informatics and Master of Computer Science from Amir Kabir University of Technology, Tehran, Iran. ...
We designed a working memory (WM) training programme in a game framework for students with mild intellectual disability. Twenty-four students participated, divided into test and control groups. Auditory and visual-spatial WM were assessed by a primary test, which included computerised Wechsler numerical forward and backward sub-tests, and by secondary tests, which comprised three parts: a dual visual-spatial test, an auditory test, and a one-syllable word recall test. The results showed a significant difference in WM capacity between the intellectually disabled children and typically developing children (p-value < 0.00001). Visual-spatial WM, auditory WM and speaking improved in the trained group. Four tests showed significant differences between pre-test and post-test. The trained group showed greater improvement in forward tasks. The trained participants' processing speed increased with training.
Summary: Extracting relations between entities is a very important task in biomedical text processing. Many algorithms have been developed for this purpose, although only a few have addressed drug-drug interactions. This work studies the effect of negation on this task. First, we describe how the DrugDDI corpus was extended with negation annotations; second, we present a series of experiments showing that taking negation into account can improve the detection of drug-drug interactions when combined with other relation extraction methods. Keywords: drug-drug interactions, negation, kernel functions, support vector machines.