Goran Nenadic - Academia.edu

Papers by Goran Nenadic

MedMine: Examining Pre-trained Language Models on Medication Mining

arXiv (Cornell University), Aug 7, 2023

Topic Modelling of Swedish Newspaper Articles about Coronavirus: a Case Study using Latent Dirichlet Allocation Method

arXiv (Cornell University), Jan 8, 2023

An Analysis of PubMed Abstracts From 1946 to 2021 to Identify Organizational Affiliations in Epidemiological Criminology: Descriptive Study

Interactive Journal of Medical Research

Background: Epidemiological criminology refers to health issues affecting incarcerated and nonincarcerated offender populations, a group recognized as being challenging to conduct research with. Notwithstanding this, an urgent need exists for new knowledge and interventions to improve health, justice, and social outcomes for this marginalized population. Objective: To better understand research outputs in the field of epidemiological criminology, we examined the lead author’s affiliation by analyzing peer-reviewed published outputs to determine countries and organizations (eg, universities, governmental and nongovernmental organizations) responsible for peer-reviewed publications. Methods: We used a semiautomated approach to examine the first-author affiliations of 23,904 PubMed epidemiological studies related to incarcerated and offender populations published in English between 1946 and 2021. We also mapped research outputs to the World Justice Project Rule of Law Index to better unde...

EDU-level Extractive Summarization with Varying Summary Lengths

arXiv (Cornell University), Oct 8, 2022

A Text Mining Model for Answering Checklist Questions Automatically from Parasitology Literature

2020 International Conference on Computing and Information Technology (ICCIT-1441), 2020

Complete reporting of Experimental Meta-data (EM) is necessary for reproducing and understanding biomedical experiments and results. Experimental Metadata Reporting Checklist Questions (EMR-CLQs) have been designed and used by journals as guidelines to capture EM and evaluate the quality of the reporting. Automatically answering EMR-CLQs is necessary to check completeness and clarity of EM, which can be useful for the peer-review process. Moreover, automatically extracting the EMR-CLQs answers can be used to search the relevant literature for the meta-data analysis process in an efficient way. This paper shows the possibility of answering different types of EMR-CLQs automatically by understanding the structure of both EMR-CLQs and the biomedical article. A text mining model (rule-based approach) based on information extraction techniques and the structure of the biomedical articles and the EMR-CLQs is proposed as a first model in the biomedical reproducibility domain to answer EMR-CLQs automatically. The model was used to answer five EMR-CLQs of two different types automatically: Main and Attribute questions. We evaluated the feasibility of the model against gold-standard data of 58 full-text articles annotated by domain experts. The results show that the EMR-CLQs can be answered automatically, with a mean F-measure of 75% and 73% on the development and testing datasets, respectively.

MASK: A Success Story for An International Collaboration

International Journal of Population Data Science, 2020

Introduction: A significant amount of valuable information in Electronic Health Records (EHR) such as laboratory test results or echocardiogram interpretations is embedded in lengthy free-text fields. Often patients’ personal information is also included in these narratives. Privacy legislation in different jurisdictions requires de-identification of this information prior to making it available for research. This process can be challenging and time-consuming. In particular, rule-based algorithms may lead to over-masking of essential medical terms, conditions, or devices that are named after individuals. Objectives and Approach: We aimed to enhance ICES’ existing rule-based application to make it contextually-driven by applying Artificial Intelligence (AI). The ICES team collaborated with computer scientists at the University of Manchester who had already published work in this area and Evenset, a Toronto-based software company. Based on the Manchester University de-identification frame...

A curation pipeline for bio-derived chemical feedstocks

A Framework for Evaluation of Machine Reading Comprehension Gold Standards

ArXiv, 2020

Machine Reading Comprehension (MRC) is the task of answering a question over a paragraph of text. While neural MRC systems gain popularity and achieve noticeable performance, issues are being raised with the methodology used to establish their performance, particularly concerning the data design of gold standards that are used to evaluate them. There is but a limited understanding of the challenges present in this data, which makes it hard to draw comparisons and formulate reliable hypotheses. As a first step towards alleviating the problem, this paper proposes a unifying framework to systematically investigate the present linguistic features, required reasoning and background knowledge and factual correctness on one hand, and the presence of lexical cues as a lower bound for the requirement of understanding on the other hand. We propose a qualitative annotation schema for the first and a set of approximative metrics for the latter. In a first application of the framework, we analys...

Early Phase Validation of a Decision Support System within the Exemplar of Aneurysmal Subarachnoid Haemorrhage

Extracting useful software development information from mobile application reviews: A survey of intelligent mining techniques and tools

Expert Systems with Applications, 2018

Mobile application (app) websites such as Google Play and AppStore allow users to review their downloaded apps. Such reviews can be useful for app users, as they may help users make an informed decision; they can also be useful for app developers if they contain valuable information concerning user needs and requirements. However, in order to unleash the value of app reviews for mobile app development, intelligent mining tools that can help discern relevant reviews from irrelevant ones must be provided. This paper surveys the state of the art in the development of such tools and the techniques behind them. To gain insight into the maturity of the current supporting mining tools, the paper also examines what app development information these tools have discovered and what challenges they are facing. The results of this survey can inform the development of more effective and intelligent app review mining techniques and tools.

Automatic Extraction of Mental Health Disorders From Domestic Violence Police Narratives: Text Mining Study

Journal of Medical Internet Research, Jan 13, 2018

Vast numbers of domestic violence (DV) incidents are attended by the New South Wales Police Force each year in New South Wales and recorded as both structured quantitative data and unstructured free text in the WebCOPS (Web-based interface for the Computerised Operational Policing System) database regarding the details of the incident, the victim, and person of interest (POI). Although the structured data are used for reporting purposes, the free text remains untapped for DV reporting and surveillance purposes. In this paper, we explore whether text mining can automatically identify mental health disorders from this unstructured text. We used a training set of 200 DV recorded events to design a knowledge-driven approach based on lexical patterns in text suggesting mental health disorders for POIs and victims. The precision returned from an evaluation set of 100 DV events was 97.5% and 87.1% for mental health disorders related to POIs and victims, respectively. After applying our app...

Identification of Occupation Mentions in Clinical Narratives

Lecture Notes in Computer Science, 2016

A patient’s occupation is an important variable used for disease surveillance and modeling, but such information is often only available in free-text clinical narratives. We have developed a large occupation dictionary that is used as part of both knowledge- (dictionary and rules) and data-driven (machine-learning) methods for the identification of occupation mentions. We have evaluated the approaches on both public and non-public clinical datasets. A machine-learning method using linear chain conditional random fields trained on a minimalistic set of features achieved up to 88% \(F_1\)-measure (token-level), with the occupation feature derived from the knowledge-driven method showing a notable positive impact across the datasets (up to an additional 32% \(F_1\)-measure).

NTCIR-11

Building a web application to visualise and explore epidemiological literature

Improving Project Management through Ontology-Driven Text Mining

Using local grammars for agreement modeling in highly inflective languages

Digital methods to enhance the usefulness of patient experience data in services for long-term conditions: the DEPEND mixed-methods study

Health Services and Delivery Research, 2020

Background: Collecting NHS patient experience data is critical to ensure the delivery of high-quality services. Data are obtained from multiple sources, including service-specific surveys and widely used generic surveys. There are concerns about the timeliness of feedback, that some groups of patients and carers do not give feedback and that free-text feedback may be useful but is difficult to analyse. Objective: To understand how to improve the collection and usefulness of patient experience data in services for people with long-term conditions using digital data capture and improved analysis of comments. Design: The DEPEND study is a mixed-methods study with four parts: qualitative research to explore the perspectives of patients, carers and staff; use of computer science text-analytics methods to analyse comments; co-design of new tools to improve data collection and usefulness; and implementation and process evaluation to assess use of the tools and any impacts. Setting: Services fo...

MC-DRE: Multi-Aspect Cross Integration for Drug Event/Entity Extraction

Proceedings of the 32nd ACM International Conference on Information and Knowledge Management

Investigating Massive Multilingual Pre-Trained Machine Translation Models for Clinical Domain via Transfer Learning

Predicting Perfect Quality Segments in MT Output with Fine-Tuned OpenAI LLM: Is it possible to capture editing distance patterns from historical data?

arXiv (Cornell University), Jul 31, 2023
