Dr Mahmoud El-Haj | Lancaster University
Papers by Dr Mahmoud El-Haj
Proceedings of the Workshop MultiLing 2019: Summarization Across Languages, Genres and Sources associated with RANLP 2019, Dec 15, 2019
The Financial Narrative Summarisation task at MultiLing 2019 aims to demonstrate the value and challenges of applying automatic text summarisation to financial text written in English, usually referred to as financial narrative disclosures. The task dataset has been extracted from UK annual reports published in PDF file format. The participants were asked to provide structured summaries, based on real-world, publicly available financial annual reports of UK firms by extracting information from different key sections. Participants were asked to generate summaries that reflect the analysis and assessment of the financial trend of the business over the past year, as provided by annual reports. The evaluation of the summaries was performed using AutoSummENG and Rouge automatic metrics. This paper focuses mainly on the data creation process.
Multilingual Text Analysis, Feb 1, 2019
This chapter describes and evaluates the use of Information Extraction and Natural Language Processing methods for extraction and analysis of financial annual reports in three languages: English, Spanish and Portuguese. The work described retains information on document structure which is needed to enable a clear distinction between narrative and financial statement components of annual reports and between individual sections within the narratives component. Extraction accuracy varies between languages with English exceeding 95%. We apply the extraction methods on a comprehensive sample of annual reports published by UK, Spanish and Portuguese non-financial firms between 2003 and 2014.
Accounting and Business Research
Young (2019): Retrieving, classifying and analysing narrative commentary in unstructured (glossy) annual reports published as PDF files, Accounting and Business Research,
SSRN Electronic Journal
We measure annual report commentary articulating entities’ business model and strategy, and then examine the capital market effects of enhancing such disclosure. Our empirical disclosure proxy is based on n-grams drawn from popular strategy textbooks and the academic strategy literature. Validation tests confirm that our score: (a) correlates with manual classifications of the quality of strategy-focused disclosures produced by domain experts; (b) covaries predictably with firm-level drivers of strategy-focused disclosures identified by prior research; and (c) captures the structural break in reporting associated with the regulatory mandate for a subset of London Stock Exchange firms to explain their strategy and business model. Tests using this exogenous and measurable increase in strategy-focused disclosure show that enhanced commentary on strategy and business model is associated with lower investor uncertainty. We also find support for an increase in the speed at which information is incorporated into stock price following the annual report release.
Journal of Business Finance & Accounting
We critically assess mainstream accounting and finance (AF) research applying methods from computational linguistics (CL) to study financial discourse. We also review common themes and innovations in the literature and assess the incremental contributions of work applying CL methods over manual content analysis. Key conclusions emerging from our analysis are: (a) accounting and finance research is behind the curve in terms of CL methods generally and word sense disambiguation in particular; (b) implementation issues mean the proposed benefits of CL are often less pronounced than proponents suggest; (c) structural issues limit practical relevance; and (d) CL methods and high-quality manual analysis represent complementary approaches to analyzing financial discourse. We describe four CL tools that have yet to gain traction in mainstream AF research but which we believe offer promising ways to enhance the study of meaning in financial discourse. The four tools are named entity recognition (NER), summarization, semantics and corpus linguistics.
Journal of Surgical Orthopaedic Advances, 2018
This study aimed to evaluate patient education materials that are focused on total hip arthroplasty (THA) and total knee arthroplasty (TKA) using health literacy best practices and plain language principles as frameworks. Readability assessments were conducted on a sample of nine patient education documents that are commonly given to THA and TKA surgery patients. Mean readability scores were compared across the sample. The mean readability grade level for the nine arthroplasty educational documents analyzed in this study was 11th grade (10.5). The mean readability ranged from 9th to 12th grade. The documents in this study were written at levels that exceed recommendations by health literacy experts. Health literacy best practices and plain language principles were suggested to reduce the demands on patients so that the documents are easier to understand. Incorporating health literacy best practices into patient education materials for THA and TKA can contribute to improved communica...
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications
Creating high-quality, wide-coverage multilingual semantic lexicons to support knowledge-based approaches is a challenging, time-consuming manual task. This has traditionally been performed by linguistic experts: a slow and expensive process. We present an experiment in which we adapt and evaluate crowdsourcing methods employing native speakers to generate a list of coarse-grained senses under a common multilingual semantic taxonomy for sets of words in six languages. 451 non-experts (including 427 Mechanical Turk workers) and 15 expert participants semantically annotated 250 words manually for Arabic, Chinese, English, Italian, Portuguese and Urdu lexicons. In order to avoid erroneous (spam) crowdsourced results, we used a novel task-specific two-phase filtering process where users were asked to identify synonyms in the target language, and remove erroneous senses.
Accounting and Business Research
Doubts have been raised about the rigour and objectivity of sell-side analysts' research due to institutional structures that promote pro-management behaviour. However, research in psychology stresses the importance of controlling for biases in individuals' inherent cognitive processing behaviour when drawing conclusions about their propensity to undertake careful scientific analysis. Using social cognition theory, we predict that the rigour and objectivity evident in analyst research is more pronounced following unexpected news in general and unexpected bad news in particular. We evaluate this prediction against the null hypothesis that analyst research consistently lacks rigour and objectivity to maintain good relations with management. Using U.S. firm earnings surprises as our conditioning event, we examine the content of analysts' conference call questions and research notes to assess the properties of their research. We find that analysts' notes and conference call questions display material levels of rigour and objectivity when earnings news is unexpectedly positive, and that these characteristics are more pronounced in response to unexpectedly poor earnings news. Results are consistent with analysts' innate cognitive processing response counteracting institutional considerations when attributional search incentives are strong. Exploratory analysis suggests that studying verbal and written outputs provides a more complete picture of analysts' work.
The Text Analysis Conference MultiLing Pilot of 2011 posed a multi-lingual summarization task to the summarization community, aiming to quantify and measure the performance of multi-lingual, multi-document summarization systems. The task was to create a 240-250 word summary from 10 news texts, describing a given topic. The texts of each topic were provided in seven languages (Arabic, Czech, English, French, Greek, Hebrew, Hindi) and each participant generated summaries for at least 2 languages. The evaluation of the summaries was performed using automatic (AutoSummENG, Rouge) and manual processes (Overall Responsiveness score). Eight systems participated, some of which provided summaries across all languages. This paper provides a brief description of the collection of the data, the evaluation methodology, the problems and challenges faced, and an overview of participation and corresponding results.
This document overviews the strategy, effort and aftermath of the MultiLing 2013 multilingual summarization data collection. We describe how the Data Contributors of MultiLing collected and generated a multilingual multi-document summarization corpus in 10 different languages: Arabic, Chinese, Czech, English, French, Greek, Hebrew, Hindi, Romanian and Spanish. We discuss the rationale behind the main decisions of the collection, the methodology used to generate the multilingual corpus, as well as challenges and problems faced per language. This paper overviews the work on the Arabic, Chinese, English, Greek and Romanian languages. A second part, covering the remaining languages, is available as a distinct paper in the MultiLing 2013 proceedings.
In this paper we use a novel approach towards Arabic dialect identification using language bivalency and written code-switching. Bivalency between languages or dialects is where a word or element is treated by language users as having a fundamentally similar semantic content in more than one language or dialect. Arabic dialect identification in writing is a difficult task even for humans due to the fact that words are used interchangeably between dialects. The task of automatically identifying dialect is harder and classifiers trained using only n-grams will perform poorly when tested on unseen data. Such approaches require significant amounts of annotated training data which is costly and time consuming to produce. Currently available Arabic dialect datasets do not exceed a few hundred thousand sentences, thus we need to extract features other than word and character n-grams. In our work we present experimental results from automatically identifying dialects from the four main Arabic dialect regions (Egypt, North Africa, Gulf and Levant) in addition to Standard Arabic. We extend previous work by incorporating additional grammatical and stylistic features and define a subtractive bivalency profiling approach to address issues of bivalent words across the examined Arabic dialects. The results show that our new method's classification accuracy can reach more than 76% and score well (66%) when tested on completely unseen data.
We present the results of our Arabic and English runs at the TAC 2011 Multilingual summarisation (MultiLing) task. We participated with centroid-based clustering for multidocument summarisation. The automatically generated Arabic and English summaries were evaluated by human participants and by two automatic evaluation metrics, ROUGE and AutoSummENG. The results are compared with the other systems that participated in the same track on both Arabic and English languages. Our Arabic summariser performed ...
Proceedings of LREC, 2012
The emergence of crowdsourcing as a commonly used approach to collect vast quantities of human assessments on a variety of tasks represents nothing less than a paradigm shift. This is particularly true in academic research where it has suddenly become possible to collect (high-quality) annotations rapidly without the need of an expert. In this paper we investigate factors which can influence the quality of the results obtained through Amazon's Mechanical Turk crowdsourcing platform. We investigated the impact of different ...
Language resources are important for those working on computational methods to analyse and study languages. These resources are needed to help advance research in fields such as natural language processing, machine learning, information retrieval and text analysis in general. We describe the creation of useful resources for languages that currently lack them, taking resources for Arabic summarisation as a case study. We illustrate three different paradigms for creating language resources, namely: (1) using crowdsourcing to produce a small resource rapidly and relatively cheaply; (2) translating an existing gold-standard dataset, which is relatively easy but potentially of lower quality; and (3) using manual effort with appropriately skilled human participants to create a resource that is more expensive but of high quality. The last of these was used as a test collection for TAC-2011. An evaluation of the resources is also presented.
We present quantitative and qualitative results of automatic and manual comparisons of translations of the originally French novel “The Stranger” (French: L’Étranger). We provide a novel approach to evaluating translation performance across languages without the need for reference translations or comparable corpora. Our approach examines the consistency of the translation at various document levels including chapters, parts and sentences. In our experiments we analyse four expert translations of the French novel. We also used Google’s machine translation output as a baseline. We analyse the translations by using readability metrics, rank correlation comparisons and Word Error Rate (WER).
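For readers unfamiliar with Word Error Rate, the following is a minimal sketch of the standard textbook definition: the word-level edit distance (substitutions, insertions and deletions) between two texts, divided by the number of words in the text treated as the reference. It is not taken from the paper. The function name `word_error_rate`, the whitespace tokenisation and the choice of which translation plays the reference role are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: tokenisation is plain whitespace splitting,
    which is an assumption and not the paper's preprocessing.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because the measure is normalised by the reference length, swapping the two arguments can change the score when the texts differ in length, so any pairwise comparison of translations needs to fix which side is treated as the reference.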