Gurinder Gosal - Academia.edu
Papers by Gurinder Gosal
2011 IEEE International Conference on Bioinformatics and Biomedicine, 2011
npj Science of Food
The construction of high-capacity data-sharing networks to support increasing government and commercial data exchange has highlighted a key roadblock: the content of existing Internet-connected information remains siloed due to a multiplicity of local languages and data dictionaries. This lack of a digital lingua franca is obvious in the domain of human food as materials travel from their wild or farm origin, through processing and distribution chains, to consumers. A well-defined, hierarchical vocabulary connected with logical relationships (in other words, an ontology) is urgently needed to help tackle data harmonization problems that span the domains of food security, safety, quality, production, distribution, and consumer health and convenience. FoodOn (http://foodon.org) is a consortium-driven project to build a comprehensive and easily accessible global farm-to-fork ontology about food that accurately and consistently describes foods commonly known in cultures from around the world. FoodOn addresses food product terminology gaps and supports food traceability. Focusing on human and domesticated animal food description, FoodOn contains animal and plant food sources, food categories and products, and other facets such as preservation processes, contact surfaces, and packaging. Much of FoodOn's vocabulary comes from transforming LanguaL, a mature and popular food-indexing thesaurus, into a World Wide Web Consortium (W3C) OWL Web Ontology Language-formatted vocabulary that provides system interoperability, quality control, and software-driven intelligence. FoodOn complements other technologies facilitating food traceability, which is becoming critical in this age of increasing globalization of food networks.
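The facet-based OWL encoding described above can be pictured with a small, purely illustrative Turtle fragment. The identifiers and class names below are hypothetical stand-ins, not actual FoodOn IRIs:

```turtle
@prefix :     <http://example.org/foodon-sketch#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Hypothetical product class with facet restrictions, in the spirit of
# FoodOn's LanguaL-derived facets (food source, preservation process, etc.).
:SmokedSalmonFillet a owl:Class ;
    rdfs:subClassOf :FishProduct ;
    rdfs:subClassOf [ a owl:Restriction ;
        owl:onProperty :hasFoodSource ;
        owl:someValuesFrom :AtlanticSalmon ] ;
    rdfs:subClassOf [ a owl:Restriction ;
        owl:onProperty :hasPreservationProcess ;
        owl:someValuesFrom :SmokingProcess ] .
```

Expressing each facet as a logical restriction, rather than a flat index code as in LanguaL, is what lets OWL reasoners classify and validate food descriptions automatically.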
There has been growing interest in the task of Named Entity Recognition (NER), and a great deal of research has been done in this direction in the last two decades. In particular, much progress has been made in the biomedical domain, with an emphasis on identifying domain-specific entities; this task is often known as Biological Named Entity Recognition (BER). BER has proved to be a challenging task for several reasons identified by many researchers. The recognition of biological entities in text and the extraction of relationships between them have paved the way for more complex text-mining tasks and further applications. This paper looks at the challenges perceived by researchers in the BER task and surveys the work done in the domain of BER using the multiple approaches available for the task.
There is an increasing awareness within private and public organizations that ontologies (globally accessible and uniquely identified terms that have both natural-language definitions and logical relations which can be queried and reasoned over by computers) are useful in solving interoperability quagmires between data silos and the ad-hoc data dictionaries that describe them. However, the complexity of implementing evolving ontologies in content management and federated data-querying applications is formidable. The Genomic Epidemiology Entity Mart (GEEM) web platform is a proof-of-concept web portal designed to provide non-ontologist users with an ontology-driven interface for examining data standards related to genomic sequence repository records. GEEM provides web forms that show labels and allowed values for easy review. It also provides software developers with downloadable specifications in JSON and other data formats that can be used without the need for ontology expertise. New systems can adopt ontology-driven standards specifications from the start, and the same specifications can be used to facilitate and validate the conversion of legacy data.
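The idea of a downloadable specification that developers can use without ontology expertise can be sketched as follows. The field name, ontology ID, and allowed values below are invented for illustration; they are not taken from GEEM's actual output format:

```python
import json

# Hypothetical, minimal stand-in for a GEEM-style JSON specification:
# each field carries an ontology ID, a human-readable label, and the
# allowed values an ontology-driven standard would permit.
SPEC_JSON = """
{
  "fields": {
    "isolation_source": {
      "ontology_id": "EXAMPLE:0001234",
      "label": "isolation source",
      "allowed_values": ["blood", "feces", "food", "water"]
    }
  }
}
"""

def validate_record(record, spec):
    """Return a list of (field, problem) pairs for values outside the spec."""
    problems = []
    for field, rules in spec["fields"].items():
        value = record.get(field)
        if value not in rules["allowed_values"]:
            problems.append((field, f"{value!r} not in allowed values"))
    return problems

spec = json.loads(SPEC_JSON)
print(validate_record({"isolation_source": "water"}, spec))  # conforming record
print(validate_record({"isolation_source": "soil"}, spec))   # flagged record
```

The same specification object serves both uses mentioned above: rendering a web form (labels plus allowed values) and validating legacy data during conversion.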
In the last two decades a large amount of research has been undertaken in the area of mining biological text. This has led to the development of multiple applications and tools by researchers and developers, benefitting a large number of users who must deal with the extraction and integration of knowledge from unstructured data. One of the important tasks under Biological Natural Language Processing (Bio-NLP) is recognizing domain-specific biological entities. This task, usually referred to as Biological Named Entity Recognition (BER), has been perceived by many researchers as challenging for several reasons. The present paper looks at the multiple approaches used for recognizing biological entities, examines the challenges faced in the task, and reviews the important work done by researchers using these approaches.
Word sense disambiguation (WSD) is the task of automatically selecting the correct sense of a word given a context, and it helps in solving many ambiguity problems inherent in all natural languages. Statistical Natural Language Processing (NLP), which is based on probabilistic, stochastic and statistical methods, has been used to solve many NLP problems. The Naive Bayes algorithm, one of the supervised learning techniques, has worked well in many classification problems. In the present work, the WSD task of disambiguating the senses of different words from the standard corpora available in the 1998 SENSEVAL Word Sense Disambiguation shared task is performed by applying the Naive Bayes machine learning technique. It is observed that the senses of ambiguous words having fewer parts of speech are disambiguated more accurately. Another key observation is that the fewer the senses to be disambiguated, the greater the chance that a word is assigned its correct sense.

I. INTRODUCTION

Ambiguity in word senses exists inherently in all natural languages used by humans. Every language has many words that carry more than one meaning. For example, the word "chair" has one sense meaning a piece of furniture and another meaning a person chairing, say, a session. Some context is therefore needed to select the correct sense in a given situation; automatically selecting the correct sense given a context is at the core of solving many ambiguity problems. WSD is the task of automatically determining which of the senses of an ambiguous (target) word is intended in a specific use of the word, taking into consideration the context of the word's use [1,2].
Accurate and reliable word sense disambiguation has long been a goal of the natural language processing community. The motivation for performing WSD is that many tasks under the umbrella of NLP benefit greatly from properly disambiguated word senses. Statistical NLP, an approach based on probabilistic, stochastic and statistical methods, uses machine learning algorithms to solve many NLP problems. As a branch of artificial intelligence, machine learning involves computationally learning patterns from given data and applying the learned patterns to new or unseen data. Machine learning is defined by Tom M. Mitchell as: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E" [3]. Learning algorithms can generally be classified into three types: supervised, semi-supervised and unsupervised learning. Supervised learning is based on studying the features of positive and negative examples over a large annotated corpus. Semi-supervised learning uses both labeled and unlabeled data in order to reduce the dependence on training data. In unsupervised learning, decisions are made on the basis of unlabeled data alone; unsupervised methods are mostly built upon clustering techniques, similarity-based functions and distributional statistics. For automatic WSD, supervised learning is one of the most successful approaches.
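The supervised Naive Bayes approach described above can be sketched on a toy, hand-built corpus. The sense-annotated contexts below are invented for illustration and are not SENSEVAL data:

```python
import math
from collections import Counter, defaultdict

# Toy sense-annotated contexts for the ambiguous word "chair".
TRAIN = [
    ("furniture", "sat on the wooden chair near the table"),
    ("furniture", "bought a comfortable chair for the office desk"),
    ("person",    "the chair opened the session and welcomed members"),
    ("person",    "questions were addressed to the chair of the committee"),
]

def train(examples):
    """Collect sense priors, per-sense word counts, and the vocabulary."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, context in examples:
        sense_counts[sense] += 1
        for w in context.split():
            word_counts[sense][w] += 1
            vocab.add(w)
    return sense_counts, word_counts, vocab

def classify(context, sense_counts, word_counts, vocab):
    """Pick the sense maximizing log P(sense) + sum log P(word|sense),
    with add-one (Laplace) smoothing over the vocabulary."""
    total = sum(sense_counts.values())
    best, best_lp = None, -math.inf
    for sense in sense_counts:
        lp = math.log(sense_counts[sense] / total)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context.split():
            lp += math.log((word_counts[sense][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = sense, lp
    return best

model = train(TRAIN)
print(classify("the chair called the meeting to order for members", *model))
```

Context words such as "members" and the frequent "the ... to" pattern of the committee examples pull this test sentence toward the "person" sense, illustrating how the classifier exploits co-occurring context features.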
Manual semantic annotation is an expensive process and often does not consider the multiple perspectives of a data source. Automating the annotation process is essential to provide the scalability needed to annotate existing documents and to reduce the burden of annotating new ones, given the large collections of data that must be dealt with. Automatic annotations bring with them the benefits of improved information retrieval and enhanced interoperability. In this paper, issues related to the automatic representation and uses of semantic annotation are examined, and some important semantic annotation works and platforms are investigated.
Natural Language Processing (NLP) is the area of research that focuses on the different tasks of understanding, extraction and retrieval from unstructured text. It makes use of multiple tools, resources and methodologies for performing these tasks, and the NLP applications developed depend heavily on resources in addition to tools and methodologies. Like many other Indian languages, Punjabi inherits a rich literary history, but technologically it is relatively under-resourced, and a lot of work remains to be done in the field of Punjabi language processing. Many researchers, groups and organizations are working on different aspects of Punjabi language processing, but the language does not have many NLP resources of its own, such as annotated corpora, rich dictionaries, sentiment lexicons, and conceptualized domains. The present work is an attempt to develop a controlled vocabulary of concepts or topics (domains) for Punjabi words and to present it in the form of a domains ontology, PUDO (Punjabi Domains Ontology). Ontologies capture and describe the current state of knowledge about a domain of interest and represent it in terms of concepts and relationships in ways that computers can process efficiently and humans can understand easily. This paper presents our work on identifying the concepts, termed domains, for Punjabi words and organizing them hierarchically by a relation of specificity. We developed the domains ontology by first assigning concepts as top-level domains and then conceptualizing more granular lower-level domains under them. The developed ontology is further populated with the words, as instances, that evoke these domains.
This resource can further be used in different semantics-based NLP tasks for the Punjabi language.
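The specificity hierarchy of domains, with words attached as instances, can be sketched as a small tree structure. The domain names and member words below are illustrative English stand-ins, not actual PUDO content:

```python
# A minimal sketch of a specificity-based domains hierarchy in the spirit
# of PUDO: top-level domains contain more granular sub-domains, and words
# are attached as instances of the domain they evoke. All names are
# illustrative placeholders, not the actual Punjabi ontology content.
DOMAINS = {
    "sport": {
        "words": [],
        "sub": {
            "cricket": {"words": ["wicket", "bowler"], "sub": {}},
            "hockey":  {"words": ["stick", "goalkeeper"], "sub": {}},
        },
    },
    "medicine": {"words": ["fever", "vaccine"], "sub": {}},
}

def domains_of(word, tree=DOMAINS, path=()):
    """Return every domain path (top level -> most specific) containing `word`."""
    hits = []
    for name, node in tree.items():
        here = path + (name,)
        if word in node["words"]:
            hits.append(" > ".join(here))
        hits.extend(domains_of(word, node["sub"], here))
    return hits

print(domains_of("wicket"))   # a word attached under sport > cricket
```

A lookup like this is the basic operation a domains ontology enables: given a word, recover the concept hierarchy it evokes, which downstream tasks such as WSD or topic labeling can then use.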
Background: Protein kinases are a large and diverse family of enzymes that are genomically altered in many human cancers. Targeted cancer genome sequencing efforts have unveiled the mutational profiles of protein kinase genes from many different cancer types. While mutational data on protein kinases are currently catalogued in various databases, integration of mutation data with other forms of data on protein kinases, such as sequence, structure, function and pathway, is necessary to identify and characterize key cancer-causing mutations. Integrative analysis of protein kinase data, however, is a challenge because of the disparate nature of protein kinase data sources and data formats.
Protein kinases play a prominent role in cell regulation and disease, which has given rise to an abundance of information about the structure, function, disease associations, pathways, interactions and evolution of these proteins. This information, however, is currently spread across several heterogeneous resources, an obstacle to the kind of integrative approaches needed to utilize existing knowledge in disease-related research. We have designed and developed an ontology for protein kinases, ProKinO, that serves as a useful and efficient representation of integrated knowledge about these complex proteins, which are intimately involved in the genesis and behavior of cancer cells. Concepts and relationships in ProKinO capture important knowledge about kinases, while ProKinO instances represent a wealth of data acquired from disparate resources, including KinBase, COSMIC, UniProt, and Reactome. We have created a customized ontology browser for ProKinO, and we have used ProKinO to perform a variety of integrative analyses using SPARQL queries.
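The kind of integrative query that SPARQL runs over an ontology like ProKinO can be sketched, under heavy simplification, with plain subject-predicate-object triples in Python. The kinase, mutation and pathway values below are invented for illustration and are not ProKinO data:

```python
# Each tuple is a (subject, predicate, object) triple, mimicking the kind
# of integrated facts an ontology such as ProKinO holds. All values here
# are invented placeholders.
TRIPLES = [
    ("KinaseA", "hasMutation",    "M1"),
    ("M1",      "foundInCancer",  "cancerX"),
    ("KinaseA", "participatesIn", "pathwayY"),
    ("KinaseB", "participatesIn", "pathwayY"),
]

def query(pattern, triples=TRIPLES):
    """Match one (s, p, o) pattern; None plays the role of a SPARQL variable."""
    return [t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))]

# "Which kinases participate in pathwayY?" -- analogous to the SPARQL
# query: SELECT ?k WHERE { ?k :participatesIn :pathwayY }
kinases = [s for s, _, _ in query((None, "participatesIn", "pathwayY"))]
print(kinases)
```

Real SPARQL additionally joins several such patterns in one query (for example, kinases that both carry a mutation found in a given cancer and participate in a given pathway), which is precisely the cross-resource integration the ontology enables.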
Ontologies are now being developed and used in many disciplines, and they have become a key tool for data integration and knowledge representation in different domains of interest. The ontology-building process identifies the stages through which an ontology should pass during its development. A certain set of activities must be performed in each stage, and researchers have proposed different methodologies for formalizing these stages. The present paper investigates the most representative methodologies used in ontology development, examines the activities performed during the process, and attempts to provide an integrative view of these methodologies and activities.
Statistical NLP is an approach to natural language processing that addresses problems usually encountered with traditional NLP. The task of finding interesting combinations of words in large text corpora, known as collocation extraction, is one of many tasks in statistical NLP. Collocations have multiple applications, and the methods of collocation extraction are influenced by their intended use. In the present work, the effectiveness of selecting bigram collocations from text is analysed and evaluated by applying different statistical NLP approaches, ranging from a simple raw frequency count to more sophisticated statistical association measures. It is observed that collocations extracted by filtering bigrams with a part-of-speech (POS) tagger seem to give the best results. It is also interesting to observe that some bad collocations are selected and verified by all approaches.
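One of the association measures in this family, pointwise mutual information (PMI), can be sketched on a toy corpus. The corpus and the minimum-frequency cutoff below are invented for illustration:

```python
import math
from collections import Counter

# Toy corpus; the paper applies the same ideas to large corpora.
tokens = ("new york is a big city and new york has new ideas "
          "the city is big and the ideas are new").split()

unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
n = len(tokens)

def pmi(w1, w2):
    """Pointwise mutual information: log of observed bigram probability
    over the probability expected if the words were independent."""
    p_joint = bigrams[(w1, w2)] / (n - 1)
    p_indep = (unigrams[w1] / n) * (unigrams[w2] / n)
    return math.log2(p_joint / p_indep)

# Raw PMI inflates rare bigrams (one source of the "bad collocations"
# noted above), so require a minimum frequency before ranking by PMI.
top = max(bigrams, key=lambda b: (bigrams[b] >= 2, pmi(*b)))
print(top)
```

The frequency cutoff plays the same role as the filtering step in the paper: association measures alone reward one-off co-occurrences, so candidates are first restricted (by frequency or POS pattern) before ranking.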
Pharmacovigilance is the science that focuses on the identification and characterization of adverse effects of medications in populations once the medications are released to market. The focus of this paper is to study the prospects of exploiting drug-related online reviews contributed by social media groups for finding the adverse effects of drugs using opinion mining and sentiment analysis. The experiences and opinions related to adverse drug reactions shared by patients and other contributors in these forums can be mined and analyzed to facilitate pharmacovigilance. This review paper highlights the usability of opinion mining and sentiment analysis as one approach to pharmacovigilance.
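The simplest form of the mining step described above is lexicon matching over review text. The adverse-effect lexicon and the reviews below are invented for illustration; real pharmacovigilance systems use far richer NLP and curated medical vocabularies:

```python
# A minimal lexicon-based sketch of mining drug reviews for adverse-effect
# signals. Lexicon and reviews are invented placeholders.
ADVERSE_TERMS = {"nausea", "dizziness", "headache", "rash", "insomnia"}

reviews = [
    "this drug gave me constant nausea and headache",
    "worked well, no side effects at all",
    "mild dizziness in the first week",
]

def adverse_mentions(texts, lexicon=ADVERSE_TERMS):
    """Count adverse-effect term mentions across a collection of reviews."""
    counts = {}
    for text in texts:
        for word in text.lower().split():
            if word in lexicon:
                counts[word] = counts.get(word, 0) + 1
    return counts

print(adverse_mentions(reviews))
```

Aggregating such mention counts over many reviews, and weighting them with sentiment polarity, is the kind of signal the paper proposes as a complement to formal adverse-event reporting channels.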
2011 IEEE International Conference on Bioinformatics and Biomedicine, 2011
... REFERENCES [1] G. Manning, DB Whyte, R. Martinez, T. Hunter, and S. Sudarsanam, &... more ... REFERENCES [1] G. Manning, DB Whyte, R. Martinez, T. Hunter, and S. Sudarsanam, "The protein kinase complement of the human genome," Science, vol. 298, 2002, pp. 1912-34. ... 26, Sep. 2006, pp. 569-94. [15] A. Krupa, KR Abhinandan, and N. Srinivasan, "KinG: a ...
npj Science of Food
The construction of high capacity data sharing networks to support increasing government and comm... more The construction of high capacity data sharing networks to support increasing government and commercial data exchange has highlighted a key roadblock: the content of existing Internet-connected information remains siloed due to a multiplicity of local languages and data dictionaries. This lack of a digital lingua franca is obvious in the domain of human food as materials travel from their wild or farm origin, through processing and distribution chains, to consumers. Well defined, hierarchical vocabulary, connected with logical relationships-in other words, an ontology-is urgently needed to help tackle data harmonization problems that span the domains of food security, safety, quality, production, distribution, and consumer health and convenience. FoodOn (http://foodon.org) is a consortium-driven project to build a comprehensive and easily accessible global farm-to-fork ontology about food, that accurately and consistently describes foods commonly known in cultures from around the world. FoodOn addresses food product terminology gaps and supports food traceability. Focusing on human and domesticated animal food description, FoodOn contains animal and plant food sources, food categories and products, and other facets like preservation processes, contact surfaces, and packaging. Much of FoodOn's vocabulary comes from transforming LanguaL, a mature and popular food indexing thesaurus, into a World Wide Web Consortium (W3C) OWL Web Ontology Language-formatted vocabulary that provides system interoperability, quality control, and software-driven intelligence. FoodOn compliments other technologies facilitating food traceability, which is becoming critical in this age of increasing globalization of food networks.
npj Science of Food
The construction of high capacity data sharing networks to support increasing government and comm... more The construction of high capacity data sharing networks to support increasing government and commercial data exchange has highlighted a key roadblock: the content of existing Internet-connected information remains siloed due to a multiplicity of local languages and data dictionaries. This lack of a digital lingua franca is obvious in the domain of human food as materials travel from their wild or farm origin, through processing and distribution chains, to consumers. Well defined, hierarchical vocabulary, connected with logical relationships-in other words, an ontology-is urgently needed to help tackle data harmonization problems that span the domains of food security, safety, quality, production, distribution, and consumer health and convenience. FoodOn (http://foodon.org) is a consortium-driven project to build a comprehensive and easily accessible global farm-to-fork ontology about food, that accurately and consistently describes foods commonly known in cultures from around the world. FoodOn addresses food product terminology gaps and supports food traceability. Focusing on human and domesticated animal food description, FoodOn contains animal and plant food sources, food categories and products, and other facets like preservation processes, contact surfaces, and packaging. Much of FoodOn's vocabulary comes from transforming LanguaL, a mature and popular food indexing thesaurus, into a World Wide Web Consortium (W3C) OWL Web Ontology Language-formatted vocabulary that provides system interoperability, quality control, and software-driven intelligence. FoodOn compliments other technologies facilitating food traceability, which is becoming critical in this age of increasing globalization of food networks.
—There has been growing interest in the task of Named Entity Recognition (NER) and a lot of resea... more —There has been growing interest in the task of Named Entity Recognition (NER) and a lot of research has been done in this direction in last two decades. Particularly, a lot of progress has been made in the biomedical domain with emphasis on identifying domain-specific entities and often the task being known as Biological Named Entity Recognition (BER). The task of biological entity recognition (BER) has been proved to be a challenging task due to several reasons as identified by many researchers. The recognition of biological entities in text and the extraction of relationships between them have paved the way for doing more complex text-mining tasks and building further applications. This paper looks at the challenges perceived by the researchers in BER task and investigates the works done in the domain of BER by using the multiple approaches available for the task.
There is an increasing awareness within private and public organizations that ontologies (globall... more There is an increasing awareness within private and public organizations that ontologies (globally accessible and uniquely identified terms that have both natural language definitions and logic relations which can be queried and reasoned over by computers) are useful in solving interoperability quagmires between data silos and the add-hoc data dictionaries that describe them. However, the complexity of implementing evolving ontologies in content management and federated data querying applications is formidable. The Genomic Epidemiology Entity Mart (GEEM) web platform is a proof-of-concept web portal designed to provide non-ontologist users with an ontology-driven interface for examining data standards related to genomic sequence repository records. GEEM provides web forms that show labels and allowed-values for easy review. It also provides software developers with downloadable specifications in JSON and other data formats that can be used without the need for ontology expertise. New systems can adopt ontology-driven standards specifications from the start, and the same specifications can be used to facilitate and validate the conversion of legacy data.
— In the last two decades a large amount of research has been undertaken in the area of mining th... more — In the last two decades a large amount of research has been undertaken in the area of mining the biological text. This has led to the development of multiple applications and tools by the researchers and developers thus benefitting a large number of users to effectively deal with extraction and integration of knowledge from unstructured data. One of the important tasks under Biological Natural language Processing (Bio-NLP) is of recognizing domain-specific biological entities. This task, usually referred to as Biological Named Entity Recognition (BER), has been perceived by many researchers as a challenging task citing several reasons. The presented paper looks at the multiple approaches used for the task of recognizing biological entities, examines the challenges faced in the task and reviews the important works done by the researchers for performing the task while using these approaches.
The word sense disambiguation (WSD) is the task ofautomatically selecting the correct sense given... more The word sense disambiguation (WSD) is the task ofautomatically selecting the correct sense given a context and it helps in solving many ambiguity problems inherently existing in all natural languages.Statistical Natural Language Processing (NLP),which is based on probabilistic, stochastic and statistical methods, has been used to solve many NLP problems.The Naive Bayes algorithm which is one of the supervised learning techniques has worked well in many classification problems. In the present work, WSD task to disambiguate the senses of different words from the standard corpora available in the " 1998 SENSEVAL Word Sense Disambiguation (WSD) shared task " is performed by applying Naïve Bayes machine learning technique. It is observed that senses of ambiguous word having lesser number of part-of-speeches are disambiguated more correctly. Other key observation is that with lesser number of senses to be disambiguated, the chances of words being disambiguated with correct senses are more. I. INTRODUCTION The ambiguity in the senses of the words of different languages does exist inherently in all natural languages used by humans. There are many words in every language which carry more than one meaning for the same word. For example, the word ―chair‖ has one sense which means a piece of furniture and other sense of it means a person chairing say some session. So obviously we need some context to select the correct sense given a situation. Automatically selecting the correct sense given a context is in the core of solving many ambiguity problems. The word sense disambiguation (WSD) is the task to automatically determine which of the senses of an ambiguous (target) word is chosen in the specific use of the word by taking into consideration the context of word's use [1,2]. 
Having an accurate and reliable word sense disambiguation has been the target of natural language community since long. The motivation and belief behind performing word sense disambiguation is that many tasks which are performed under the umbrella of NLP are highly benefitted with properly disambiguated word senses.Statistical NLP, a special approach of NLP based onthe probabilistic, stochastic and statistical methods, uses machine learning algorithms to solve many NLP problems. AS a branch ofartificial intelligence, machine learning involves computationallylearning patterns from given data, and applying to new or unseen data the pattern which were learned earlier. Machine learning is defined by Tom M.Mitchell as ―A computer program is said to learn from experience E with respect to some class of tasksT and performance measure P, if its performance at tasks in T,as measured by P, improves withexperience E [3].‖ Learning algorithms can be generally classified into three types: supervised learning, semi-supervised learning and unsupervised learning. Supervised learning technique is based on the idea of studying the features of positive and negative examples over a large collection of annotated corpus. Semi-supervised learning uses both labeled data and unlabeled data for the learning process to reduce the dependence on training data. In the unsupervised learning, decisions are made on the basis of unlabeled data. The methods of unsupervised learning are mostly built upon clustering techniques, similarity based functions and distribution statistics. For automatic WSD,supervised learningis one ofthe most successfulapproaches.
The semantic annotation with manual means is an expensive process and often does not consider the... more The semantic annotation with manual means is an expensive process and often does not consider the multiple perspectives of a data source. The automation of the annotation process is essential to provide the scalability needed to annotate existing documents and reduce the burden of annotating new documents considering we have to deal with large collections of data. The automatic annotations bring with them the benefits of improved information retrieval and enhanced interoperability. In this paper the issues related to automatic representation and uses of the semantic annotation have been looked at and some important semantic works and platforms are investigated.
Natural Language Processing (NLP) is the area of research that focuses on the tasks of understanding, extracting, and retrieving information from unstructured text, making use of multiple tools, resources, and methodologies. NLP applications depend heavily on resources in addition to tools and methodologies. Like many other Indian languages, Punjabi inherits a rich literary history, but on the technological side it is relatively under-resourced, and much work remains to be done in Punjabi language processing. Many researchers, groups, and organizations are working on different aspects of Punjabi language processing, yet the language has few NLP resources of its own, such as annotated corpora, rich dictionaries, sentiment lexicons, and conceptualized domains. The present work is an attempt to develop a controlled vocabulary of concepts or topics (domains) for Punjabi words and to present it in the form of a 'domains ontology', PUDO (Punjabi Domains Ontology). Ontologies capture and describe the current state of knowledge about a domain of interest and represent it in terms of concepts and relationships that computers can process efficiently and humans can understand easily. This paper presents our work on identifying the concepts, termed domains, for Punjabi words and organizing them hierarchically; the hierarchy is based on a relation of specificity for Punjabi. We developed the domains ontology by first assigning concepts as top-level domains and then conceptualizing lower-level domains, with more granular conceptualization, under the higher-level ones. The developed ontology is further populated with the words, as instances, that evoke these domains.
This developed resource can further be used in different semantics-based NLP tasks in the Punjabi language.
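As a minimal sketch of the kind of specificity hierarchy the abstract describes, the following models domains with a link to a more general parent domain and a set of word instances. The domain names and words here are hypothetical illustrations, not actual PUDO content.

```python
# Minimal sketch of a domains hierarchy with word instances.
# Domain names and words below are hypothetical, not taken from PUDO.

class Domain:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # more general domain (specificity relation)
        self.words = set()        # word instances that evoke this domain

    def ancestors(self):
        """All more-general domains, walking up the specificity hierarchy."""
        node, chain = self.parent, []
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain

    def is_a(self, other):
        """True if this domain is `other` or a specialization of it."""
        return self is other or other in self.ancestors()

# Hypothetical top-level and lower-level domains
sport = Domain("Sport")
cricket = Domain("Cricket", parent=sport)   # more granular than Sport
cricket.words.update({"ਗੇਂਦ", "ਵਿਕਟ"})       # example word instances
```

A lookup such as `cricket.is_a(sport)` then follows the specificity relation upward, which is how a domains ontology lets an application generalize from a specific domain to its parents.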
Background: Protein kinases are a large and diverse family of enzymes that are genomically altered in many human cancers. Targeted cancer genome sequencing efforts have unveiled the mutational profiles of protein kinase genes from many different cancer types. While mutational data on protein kinases are currently catalogued in various databases, integrating mutation data with other forms of data on protein kinases, such as sequence, structure, function, and pathway, is necessary to identify and characterize key cancer-causing mutations. Integrative analysis of protein kinase data, however, is a challenge because of the disparate nature of protein kinase data sources and data formats.
Protein kinases play a prominent role in cell regulation and disease, which has given rise to an abundance of information about the structure, function, disease associations, pathways, interactions, and evolution of these proteins. This information, however, is currently spread across several heterogeneous resources, an obstacle to the kind of integrative approaches needed to utilize existing knowledge in disease-related research. We have designed and developed an ontology for protein kinases, ProKinO, that serves as a useful and efficient representation of the integrated knowledge about these complex proteins, which are intimately involved in the genesis and behavior of cancer cells. Concepts and relationships in ProKinO capture important knowledge about kinases, while ProKinO instances represent a wealth of data acquired from disparate resources, including KinBase, COSMIC, UniProt, and Reactome. We have created a customized ontology browser for ProKinO and have used the ontology to perform a variety of integrative analyses with SPARQL queries.
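To illustrate the kind of integrative question such SPARQL queries answer, the following toy triple store joins kinase, mutation, and cancer facts with a simple pattern matcher. All class, property, and instance names here are hypothetical placeholders, not ProKinO's actual schema.

```python
# Toy triple store illustrating an integrative kinase query.
# Identifiers below are illustrative, not ProKinO's actual vocabulary.

triples = [
    ("EGFR",  "isMemberOf",  "TK_group"),
    ("EGFR",  "hasMutation", "L858R"),
    ("BRAF",  "isMemberOf",  "TKL_group"),
    ("BRAF",  "hasMutation", "V600E"),
    ("L858R", "observedIn",  "lung_carcinoma"),
    ("V600E", "observedIn",  "melanoma"),
]

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts like a SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def mutations_with_cancers(kinase):
    """Which mutations of a kinase were observed, and in which cancers?
    A two-step join, as a SPARQL query would express it."""
    result = []
    for _, _, mut in match(s=kinase, p="hasMutation"):
        for _, _, cancer in match(s=mut, p="observedIn"):
            result.append((mut, cancer))
    return result

print(mutations_with_cancers("EGFR"))  # [('L858R', 'lung_carcinoma')]
```

The value of the ontology is that these facts come from different sources (e.g., mutations from COSMIC, pathways from Reactome) yet can be traversed in one query once they share a common vocabulary.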
Ontologies are now being developed and used in many disciplines, and they have become a key tool for data integration and knowledge representation in different domains of interest. The ontology building process identifies the stages through which an ontology should pass during its development; each stage involves a certain set of activities, and researchers have proposed different methodologies for formalizing these stages. The present paper investigates the most representative methodologies used in ontology development, examining the activities performed at each stage, and further attempts to provide an integrative view of the set of activities that can be performed during the ontology development process.
Statistical NLP is an approach to natural language processing that aims to resolve problems usually encountered with traditional NLP. The task of finding interesting combinations of words in large text corpora, known as collocation extraction, is one of many tasks in statistical NLP. Collocations have multiple applications, and the methods of collocation extraction are influenced by their intended use. In the present task, the effectiveness of selecting bigram collocations from text is analysed and evaluated by applying different statistical NLP approaches, ranging from raw frequency counts to more sophisticated statistical association measures. It is observed that the collocations extracted by filtering bigrams with a POS tagger give the best results. It is also interesting to observe that some bad collocations are selected and verified by all approaches.
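One of the association measures commonly compared in such work is pointwise mutual information (PMI). The sketch below scores adjacent word pairs by PMI over a toy corpus; the frequency filter is included because raw PMI inflates the scores of rare pairs, which relates to the bad collocations the abstract mentions.

```python
# Minimal sketch of bigram collocation scoring with pointwise mutual
# information (PMI), one of several statistical association measures.
import math
from collections import Counter

def pmi_bigrams(tokens, min_count=1):
    """Score adjacent word pairs by PMI = log2( P(w1,w2) / (P(w1) P(w2)) )."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:       # frequency filter: raw PMI inflates
            continue                # scores of rare (hapax) pairs
        p_pair = count / (n - 1)
        p1, p2 = unigrams[w1] / n, unigrams[w2] / n
        scores[(w1, w2)] = math.log2(p_pair / (p1 * p2))
    return scores

tokens = "new york is a big city and new york never sleeps".split()
scores = pmi_bigrams(tokens, min_count=2)
best = max(scores, key=scores.get)  # ('new', 'york')
```

With `min_count=1`, one-off pairs like `('is', 'a')` would outscore the genuine collocation, which is why frequency or POS filtering is applied before ranking.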
Pharmacovigilance is the science that focuses on the identification and characterization of adverse effects of medications in populations once those medications are released to market. The focus of this paper is to study the prospects of exploiting drug-related online reviews contributed by social media groups to find adverse effects of drugs using opinion mining and sentiment analysis. The experiences and opinions related to adverse drug reactions shared by patients and other contributors in these forums can be mined and analyzed to facilitate pharmacovigilance. This review paper highlights the usability of opinion mining and sentiment analysis as one approach to pharmacovigilance.