Corpus Analysis Research Papers - Academia.edu

The increasing use of computers to enable or replace face-to-face tutorial discussion groups in higher education is creating a new form of academic writing. This small-scale study of 43 students and three tutors identifies ways in which students present their opinions in a forum that allows greater time for reflection but also creates a permanent record. The notion of collaborative learning in computer conferencing militates against taking a strong, possibly controversial, stance. Opinions are therefore hedged, or located in the peer discourse community rather than in the individual. Through a corpus analysis of the pronouns I, we and it, we identify ways in which student writers represent themselves and their views both in computer conferences and in single-authored essays. A powerful authorial voice was often associated not with the individual I but with the collective we. In their single-authored essays, students drew on the consensual voice developed in the conference discussion to support their personal points of view. In both genres, students used impersonal it-clauses but frequently preceded them with personal frames such as I think, thereby resisting the impersonal but powerful voice of much academic discourse. This paper contributes to our developing understanding of evolving student writing practices in disciplinary settings.
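
Conference posts and essays differ in length, so pronoun counts are usually compared as normalized frequencies rather than raw totals. The abstract does not say how the counts were made. Below is a minimal sketch that assumes plain-text subcorpora and a deliberately naive regex tokenizer; the function names are hypothetical and this is not the study's tool:

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Naive lowercase tokenizer: runs of letters, optionally with one internal apostrophe.
    # "I" becomes "i"; contractions such as "it's" stay single tokens and are not counted as "it".
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def per_thousand(text: str, targets: tuple[str, ...]) -> dict[str, float]:
    """Frequency of each target word per 1,000 tokens of `text`."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    return {t: counts[t] * 1000 / total if total else 0.0 for t in targets}

conference = "I think we agree. We should revise it."
print(per_thousand(conference, ("i", "we", "it")))
```

Running the same function over a conference subcorpus and an essay subcorpus gives directly comparable rates. Real learner data would need a more careful tokenizer, for example one that handles curly apostrophes.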

Over the past couple of decades, research has established that infants are sensitive to the predominant stress pattern of their native language. However, the degree to which the stress pattern shapes infants' language development has yet to be fully determined. Whether stress is merely a cue that helps organize the patterns of speech or an important part of the representation of speech sound sequences has still to be explored. Building on research in infant speech perception and segmentation, we asked how several months of exposure to the target language shapes infants' speech processing biases with respect to lexical stress. We hypothesized that infants represent stressed and unstressed syllables differently, and we used analyses of child-directed speech to show how this change to the representational landscape results in better distribution-based word segmentation as well as an advantage for stress-initial syllable sequences. A series of experiments then tested 9- and 7-month-old infants on their ability to use lexical stress, with no other cues present, to parse sequences from an artificial language. We found that infants adopted a stress-initial syllable strategy and that they appear to encode stress information as part of their proto-lexical representations. Together, the results of these studies suggest that stress information in the ambient language not only shapes how statistics are calculated over the speech input but is also encoded in the representations of parsed speech sequences.
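
"Distribution-based word segmentation" commonly refers to statistics such as forward transitional probabilities (TPs) between adjacent syllables. Within-word TPs tend to be high and TPs across word boundaries tend to be low. The sketch below is a generic TP segmenter. It is not the authors' model. It ignores stress entirely, and its 0.75 threshold and toy syllable stream are hypothetical:

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Forward TP(A -> B) = count(A B) / count(A as the first member of a pair)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(syllables, threshold=0.75):
    """Posit a word boundary wherever the forward TP falls below `threshold`."""
    if not syllables:
        return []
    tps = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for prev, nxt in zip(syllables, syllables[1:]):
        if tps[(prev, nxt)] < threshold:
            words.append("".join(current))
            current = []
        current.append(nxt)
    words.append("".join(current))
    return words

# Hypothetical artificial-language stream built from three trisyllabic "words".
lexicon = {"A": ["tu", "pi", "ro"], "B": ["go", "la", "bu"], "C": ["bi", "da", "ku"]}
stream = [syl for w in "ABCACBAB" for syl in lexicon[w]]
print(segment(stream))
```

In this toy stream every within-word TP is 1.0 and every cross-word TP is 1/2 or lower, so the three words are recovered. The study's argument is that stress also shapes how such statistics are computed, and this sketch does not model that.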

Corpus-based studies are widely recognized as valuable for qualitative and quantitative linguistic analysis, whose use is becoming increasingly prevalent in language learning and teaching, especially in approaches that make grammar subservient to lexis. Concordancing, a productive technique in corpus-based studies, reveals previously ignored features of language learning that have major implications for language teaching. This study focuses on this technique and examines in depth its many applications in ELT settings, along with their prerequisites and conditions. It does so by providing concrete, procedure-based strategies and by analyzing the pitfalls of concordance applications that are thought to facilitate language learning and to help teachers achieve their intended objectives.
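
The core output of a concordancer is a key-word-in-context (KWIC) listing: every occurrence of a node word, shown with a fixed window of left and right context. Here is a minimal sketch, assuming plain text and a simple regex tokenizer. The function `kwic` and its `width` parameter are illustrative names, not part of any particular concordancing tool:

```python
import re

def kwic(text, node, width=4):
    """Return (left context, node, right context) for every occurrence of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

sample = "She made a decision. They made progress quickly."
for left, node, right in kwic(sample, "made", width=2):
    # Right-align the left context so the node words line up in a column.
    print(f"{left:>20}  {node}  {right}")
```

Aligning the node words vertically is what lets learners scan for recurring collocates (here, "made a decision" vs "made progress"). Sorting the lines by the first right-hand word is a common next step.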

Multi-layer analysis of translation corpora: methodological issues and practical implications. Silvia ... Abstract: The present paper discusses an application of multilingual, multi-layer corpus analysis from translation studies. The ...

This corpus study compares the reading comprehension passages of six Iranian general English university textbooks with the passages in the reading comprehension section of the MA exams from 2010 to 2014. The comparison was based on three reading-related factors: vocabulary coverage, syntactic complexity and discourse features. To this end, three types of measures were used: vocabulary coverage measured with the vocabprofiler software, readability measured with readability formulas, and text easability measured with the Coh-Metrix software. The analyses showed a large gap between what the textbooks offered in terms of vocabulary, structures and discourse and what the MA examinations demanded of readers in terms of reading comprehension processes. The findings are presented along with their pedagogical implications and some suggestions for future research.
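
Two of these measures reduce to simple arithmetic. Vocabulary coverage is the percentage of running tokens that belong to a reference word list, such as a frequency band. Flesch Reading Ease is one widely used readability formula: 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words). The abstract does not name the formulas used, so Flesch is only an example here. The word list below is hypothetical, and syllable counting (which real tools do heuristically) is left to the caller:

```python
import re

def vocabulary_coverage(text, word_list):
    """Percentage of running tokens in `text` that appear in `word_list`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    covered = sum(1 for t in tokens if t in word_list)
    return covered * 100 / len(tokens)

def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease from pre-computed counts (higher = easier to read)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

print(vocabulary_coverage("The cat sat on the mat.", {"the", "on", "sat"}))
print(flesch_reading_ease(words=100, sentences=5, syllables=150))
```

Comparing these numbers between textbook passages and exam passages is one way to quantify the "gap" the study reports. Coh-Metrix easability scores come from a much richer model and are not reproduced here.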

I have approached the eighteenth-century villancico from various angles. Each chapter can be read as a self-contained essay, although the reader will find that each aims to reveal different aspects of the same phenomenon, in order to provide as comprehensive an overview as possible. The first chapter provides the historical framework of the period under investigation, with special attention to the social and political events that influenced the production of music, the role played by the Church in musical developments in Spain, and the place of Salamanca in the peninsular context. The second chapter studies the institutional framework of the music chapel of Salamanca Cathedral from the broader perspective of sacred music production in Spain, and provides a detailed description of the structure and transformation of the music chapel from its origins, with special emphasis on the early decades of the eighteenth century. The third chapter reflects on the concept of the villancico and its relationship with parallel genres in other countries, as well as with contemporary secular genres. The fourth chapter studies the function and ceremonial context of the sacred villancico in Salamanca Cathedral, with a detailed analysis of how these factors influenced the music written for each occasion. The fifth chapter focuses mainly on the formal transformation of the villancico in the early decades of the century, particularly the introduction of Italian operatic forms into the genre. This chapter is not focused exclusively on Salamanca; it also provides a systematic scrutiny of the villancico repertory from the Real Capilla in Madrid and some other important institutions. The sixth chapter is concerned with the stylistic changes in the villancico genre in the same period. Here the study is based on a systematic examination of the repertory preserved in Salamanca Cathedral to provide a clear and concise view of the process.

First-person pronouns are a rhetorical strategy that allows researchers to perform different discourse functions in the text, through which they construct a convincing argument that persuades readers of the validity and novelty of their claims and of their own competence. In this paper I explore how Spanish EFL Engineering students use first-person plural pronouns in multi-authored report writing. The paper examines the discourse functions of the pronoun we in a corpus of 55 reports written by Spanish students. The analysis shows that these students fail to understand how expert writers use these pronouns to construct their authorial identities as knowledgeable members of the community. Students are unaware of the conventionalised use of phraseological patterns involving we to perform specific functions in academic genres. The results clearly suggest the need for an approach to academic writing in higher education that combines genre analysis, expert corpora and learner corpora.
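
A practical first step toward studying phraseological patterns with we is to retrieve and rank the sequences that begin with the pronoun. The analyst can then classify those sequences by discourse function by hand. The abstract does not describe the retrieval procedure, so this is a hypothetical sketch. The name `we_patterns` and its `right` window parameter are illustrative only:

```python
import re
from collections import Counter

def we_patterns(text, right=1):
    """Count word sequences made of 'we' plus the next `right` tokens, most frequent first."""
    tokens = re.findall(r"[a-z']+", text.lower())
    patterns = Counter()
    for i, tok in enumerate(tokens):
        if tok == "we" and i + right < len(tokens):
            patterns[" ".join(tokens[i:i + right + 1])] += 1
    return patterns.most_common()

report = "We propose a model. We propose that we test it."
print(we_patterns(report))
```

Running the same function on an expert corpus and a learner corpus and comparing the top patterns mirrors the expert/learner contrast the paper recommends. Assigning discourse functions (e.g. stating a purpose vs. describing a procedure) still requires human judgment.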

This study explores the use of phrasal verbs in English language documents of the European Union (EU) as part of a larger-scale project examining the use of English in EU texts from various aspects, including lexical, lexico-grammatical and textual features. Phrasal verbs, known to be one of the most difficult aspects of learning English, are highly productive and widely used by native speakers. The purpose of this study is to identify the most frequent phrasal verb combinations in EU documents. To this end, an EU English Corpus of approximately 200,000 running words was built from texts representative of the EU's fields of activity. The analysis revealed that the top 25 phrasal verbs account for more than 60% of all phrasal verb constructions in the corpus. The results also show that, in terms of phrasal verb frequency, EU documents show some similarity to written academic English. The paper also illustrates some instructional activities and the pedagogical relevance of the findings.
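
A figure like "the top 25 account for more than 60% of all constructions" is a cumulative coverage measure. It is the share of all tokens contributed by the N most frequent types. Here is a minimal sketch that assumes the phrasal-verb tokens have already been identified and lemmatized. That identification step is the hard part and is not shown. The token list is hypothetical:

```python
from collections import Counter

def top_n_share(occurrences, n):
    """Percentage of all tokens accounted for by the n most frequent types."""
    counts = Counter(occurrences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return top * 100 / total

# Hypothetical list of phrasal-verb tokens already extracted from a corpus.
tokens = ["carry out"] * 5 + ["set up"] * 3 + ["point out", "lay down"]
print(top_n_share(tokens, 2))
```

With real data, the call would be `top_n_share(tokens, 25)`. The result depends on how ties at rank N are broken, which matters only when several types share the cutoff frequency.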

3rd International Conference on NLP & Big Data (NLPD 2022) will provide an excellent international forum for sharing knowledge and results in the theory, methodology and applications of Natural Language Computing and Big Data. Authors are invited to contribute to the conference by submitting articles that illustrate research results, projects, survey works and industrial experiences describing significant advances in the following areas, among others.

International Conference on Machine Learning, NLP and Data Mining (MLDA 2022) will provide an excellent international forum for sharing knowledge and results in the theory, methodology and applications of Machine Learning, Natural Language Computing and Data Mining. Authors are invited to contribute to the conference by submitting articles that illustrate research results, projects, survey works and industrial experiences describing significant advances in the following areas, among others.

A great deal of research has been conducted to improve techniques for teaching English grammar so that students can learn English more easily and enjoyably. The corpus approach has proved very beneficial for teaching grammar. This study explores how literary poems can be used to teach English to students with the Simple Concordance Program (SCP) 4.09. For this purpose, we selected four English poems. This exploratory study investigates how applying a corpus analysis tool to poems can assist English language teaching, making it easier for students in ELT classrooms to comprehend grammatical structures. The results illustrate that SCP 4.09 helps not only in extracting high-frequency words but also in teaching imperative and conditional sentences and the present (indefinite, continuous and perfect) and simple past tenses, since it provides concordances of the words. Future researchers could identify word collocations with the same tool. Moreover, the poems selected in this study could be analyzed with other available corpus tools. This study could also be extended to a stylistic analysis of the poems, in which the literary devices used by both poets are studied, compared and taught to students.
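
SCP 4.09 is a standalone GUI program, and its internals are not described here. The word-frequency list it produces can be approximated in a few lines. The sketch below is not SCP's algorithm. It assumes a naive regex tokenizer and an optional, hypothetical stopword set for filtering out function words:

```python
import re
from collections import Counter

def frequency_list(text, top=10, stopwords=frozenset()):
    """Most frequent word forms in `text`, optionally ignoring function words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(top)

poem = "The rose is red. The rose is sweet."
print(frequency_list(poem, top=2, stopwords={"the", "is"}))
```

Whether to drop function words depends on the teaching goal. For a lesson on tense or conditionals, auxiliaries like "is", "had" or "if" are exactly the items worth keeping.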

This paper gives an overview of three decades of studies on phrasal verbs, presenting their theoretical and methodological issues as well as their findings. The review also traces the developments and paradigm shifts that have occurred in this area. Previous studies have shown that research findings have not been incorporated into classroom activities and English Language Teaching (ELT) materials. The paper argues that research on the use of phrasal verbs in ESL textbooks is limited and, therefore, that further research is needed to examine how textbooks treat phrasal verbs, so that ELT materials developers can present these items more effectively based on research findings.

According to the literature, the misuse of English conjunctions, which contributes to incoherent writing, stems from learners' first-language interference, inappropriate mechanical exercises, and misleading lists of connectors in textbooks that present them as mutually interchangeable regardless of contextual constraints. Form-focused instruction that makes the semantic, stylistic and syntactic properties of connectors explicit can help learners acquire them. Additionally, computer learner corpus analysis which identifies systematic interlanguage patterns ...

Although the field of natural language processing has made considerable strides in the automated processing of standard language, figurative (i.e., non-literal) language still causes great difficulty. Normally, when we understand human language, we combine the meanings of individual words into larger units in a compositional manner. However, understanding figurative language often involves an interpretive adjustment to individual words. A complete model of language processing needs to account for the way normal word meanings can be profoundly altered by their combination. Although figurative language is common in naturally occurring language, we know of no previous quantitative analyses of this phenomenon. Furthermore, while certain types and tokens are used more frequently than others, it is unknown whether frequency of use interacts with processing load. This paper outlines our current research program exploring the functional and neural bases of figurative language through a combin...

This corpus-based study investigates how translators of the Qur'an have dealt with the verb ata, which appears to be a case of polysemy. It examines all tokens of the verb ata in the Glorious Qur'an across nine English translations with the aim of determining which of the two policies described by Nida and Taber, "contextual consistency" or "verbal consistency", the translators have adopted. It investigates whether the translators have recognized when ata in the Qur'an is used in a primary or a secondary sense, or whether they have fallen into the trap of translating it literally. It also attempts to determine whether the syntactic and semantic behavior of the verb ata has implications for the translation process. The study is mainly descriptive: it discusses the translation product descriptively rather than prescriptively. A statistical analysis is conducted to identify the English equivalents of ata in the corpus. The selection includes the translations of ... Corpus analysis reveals that the polysemous verb ata is not translated by a single word in the English translations under study; on the contrary, it has a large number of translation equivalents. This variety of translation equivalents may indicate a difference in the semantics of the original verb: ata in the Qur'an expresses various meanings, and in translating it most of the translators have adopted the policy of contextual consistency.

2nd International Conference on NLP & Data Mining (NLDM 2022) will provide an excellent international forum for sharing knowledge and results in the theory, methodology and applications of Natural Language Computing and Data Mining. Authors are invited to contribute to the conference by submitting articles that illustrate research results, projects, survey works and industrial experiences describing significant advances in the following areas, among others.

This paper offers the first large-scale study of a multimodal corpus of 210 advertisements. First, the reader is presented with a description of the corpus in terms of the distribution of conceptual operations (for the purposes of this work, metaphor and metonymy) and the use of modal cues. Subsequently, the weight of mode and marketing strategy in triggering greater or lesser conceptual complexity is analysed. This corpus-based survey is complemented with a qualitative analysis of three novel metaphor-metonymy interactions that emerge from the data and have not yet been surveyed in multimodal use. The results show that metaphtonymy (a metaphor-metonymy compound) is the most frequent conceptual operation in the corpus; that the use of modes has a significant effect on the amount of conceptual complexity activated; and that neither the type of advertised product nor the marketing strategy has a significant effect on the number and complexity of conceptual mappings in the advertisement.
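
The abstract reports a "significant effect" of mode without naming the test. For categorical corpus data of this kind, a Pearson chi-square test of independence on a contingency table is a common choice. The sketch below computes the statistic and degrees of freedom in pure Python. The table (mode by low/high complexity) is entirely hypothetical, and no continuity correction is applied:

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows = two modes, columns = low vs high conceptual complexity.
stat, dof = chi_square([[10, 20], [20, 10]])
print(round(stat, 3), dof)
```

Here the statistic (about 6.667 with 1 degree of freedom) exceeds 3.841, the critical value at α = 0.05. For exact p-values, `scipy.stats.chi2_contingency` is a standard option. Note that it applies Yates' continuity correction to 2 × 2 tables by default, so its statistic will differ from this uncorrected one.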

Min Woo Lee (2017), An Analysis of Meaning Usage of Synonymous Words. Studies in Linguistics 44, 135-154. The purpose of this study is to examine the multiple segmentation patterns of synonyms and to identify how they are used. To do this, we quantitatively investigate the usage ratios of 62 synonyms across their multiple meanings. This allowed us to confirm the degree of synonymy, its directionality, and the characteristics of use through objective values. The findings are as follows. First, semantic correspondence is one-directional. Second, semantic correspondence shows a relatively higher degree of similarity on the lower-frequency side. The asymmetry of the synonymic relation is not simply an asymmetry in quantitative use but an asymmetry in the direction of semantic correspondence. In addition, the asymmetry of usage increases as directional identity increases. This study presents the similarity of synonyms as an objective measure, providing a basis for objectively explaining the degree of similarity and the characteristics of semantic relationships that were previously grasped only intuitively.
Keywords: lexical meaning, synonymy, synonym, analysis of meaning usage, degree of synonymic rate, asymmetry of synonym
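
The abstract reports usage ratios and directionality but not the formula behind them. The sketch below is therefore only a hypothetical illustration of how a usage-weighted, directional ratio can come out asymmetric. It is not Lee's measure. It assumes sense-annotated counts for each word and a known mapping of which senses correspond across the pair; all names and numbers are invented:

```python
def directional_rate(sense_counts, shared_senses):
    """Share of a word's uses that fall in senses corresponding to its synonym."""
    total = sum(sense_counts.values())
    if total == 0:
        return 0.0
    shared = sum(n for sense, n in sense_counts.items() if sense in shared_senses)
    return shared / total

# Hypothetical sense-annotated counts for a synonym pair A/B.
word_a = {"a1": 80, "a2": 20}   # sense a1 corresponds to B
word_b = {"b1": 30, "b2": 70}   # sense b1 corresponds to A
a_to_b = directional_rate(word_a, {"a1"})
b_to_a = directional_rate(word_b, {"b1"})
print(a_to_b, b_to_a)
```

In this invented case, most uses of A can be matched by B, but most uses of B cannot be matched by A. That is the kind of one-sided directionality the study describes.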

This paper presents an approach for assessing syntactic structures in a learner corpus. As a descriptive, corpus-based study, it explores the output of L2 learners in a business context and provides examples of their syntactic structures. The proposed areas of investigation are frequency analyses, sentence-level syntactic analyses, the distribution of sentence-level structural patterns, and subject-verb agreement analyses, which reflect how well the learners apply their grammatical knowledge in their written output. The data are drawn from a sample corpus collected from 24 learners enrolled in business and management courses at two higher learning institutions in Malaysia. The methodology is fundamental in that it investigates the linguistic constitution of the learner corpus of business undergraduates. Computer-based syntactic studies remain limited because they require hard work and long hours to key in the data, followed by a complex analytic method for describing the findings. In contrast, this article demonstrates an uncomplicated method of analysis and encourages the use of existing part-of-speech (POS) tagging software. The potential and limitations of the approach are also given due consideration, with a view to sustaining its use in future research in similar areas.
Keywords: learner corpus, syntactical analysis, frequency analysis, computer-based learner corpus analysis, part-of-speech (POS) tagging.
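
Once a tagger has run, frequency and agreement analyses become simple passes over (word, tag) pairs. The paper does not name its tagger or tagset. The sketch below assumes text that has already been tagged in a `word_TAG` format with Penn Treebank tags (VBP = non-third-person-singular present, VBZ = third-person-singular present). The agreement rule is a deliberately crude, hypothetical heuristic:

```python
from collections import Counter

THIRD_PERSON_SINGULAR = {"he", "she", "it"}

def parse_tagged(tagged_text):
    """Split 'word_TAG' tokens into (lowercased word, tag) pairs; assumes every token is tagged."""
    pairs = []
    for item in tagged_text.split():
        word, _, tag = item.rpartition("_")
        pairs.append((word.lower(), tag))
    return pairs

def tag_frequencies(pairs):
    """Frequency of each POS tag in the tagged text."""
    return Counter(tag for _, tag in pairs)

def agreement_candidates(pairs):
    """Flag he/she/it immediately followed by a non-3rd-person present verb (VBP)."""
    return [f"{w1} {w2}" for (w1, _), (w2, t2) in zip(pairs, pairs[1:])
            if w1 in THIRD_PERSON_SINGULAR and t2 == "VBP"]

tagged = "She_PRP manage_VBP the_DT team_NN ._. He_PRP manages_VBZ sales_NNS ._."
pairs = parse_tagged(tagged)
print(tag_frequencies(pairs).most_common(3))
print(agreement_candidates(pairs))
```

The heuristic misses errors with an intervening adverb ("she often manage") or a noun-phrase subject. A tagger may also mis-tag learner errors (e.g. as VB), so flagged items are candidates for manual checking rather than final counts.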

In recent years, it has become an issue of growing concern that, despite undiminished enthusiasm in the research community, the application of corpus tools and resources in the classroom remains limited. In this paper, I will argue that focusing on the role of the teacher in using corpora in the classroom is an essential step towards popularizing this approach. It is vital that future language teachers discover corpora and concordances as part of their initial training, from the perspectives of both learner and teacher. To this end, I will present and discuss a case study in which student teachers were introduced to corpus analysis and trained to teach with corpora. Data on the student teachers' reflections and opinions will highlight the significance and potential of such a course.

This paper presents an approach for assessing syntax in a learner corpus. As a descriptive, corpus-based study, it explores the output of L2 learners in a business context and provides examples of their syntactic structures. The data are drawn from a sample corpus collected from learners enrolled in business and management courses at higher learning institutions in Malaysia. The methodology is fundamental in that it investigates the linguistic constitution of the learner corpus of business undergraduates.
Computer-based syntactic studies remain limited because they require hard work and long hours to key in the data, followed by a complex analytic method for describing the findings. In contrast, this article demonstrates an uncomplicated method of analysis and encourages the use of existing part-of-speech (POS) tagging software. The potential and limitations of the approach are also given due consideration, with a view to sustaining its use in future research in similar areas.
Keywords: learner corpus, linguistic structures, syntax, computer-based learner corpus analysis, part-of-speech (POS) tagging software

Among the most impressive human behaviors are closely synchronized actions involving large groups of people, as seen in military displays, synchronized swimming, and many forms of dance. Since coordinated movement promotes cooperation among participants and increases the extent to which observers attribute rapport and cohesion to a group, there are good a priori reasons to aspire to such behaviors at both the individual and the group level. Interestingly, synchronized displays are often accompanied by music. This raises the question of which musical features might facilitate precise movement coordination. Musicians recognize that beat subdivision (either imagined or actual) facilitates synchronization. Compared with isochronous rhythms exhibiting the same event density, dotted rhythms require beat subdivision for accurate performance. Accordingly, one might predict that dotted rhythms improve group synchronization. A corpus analysis is reported that tested the conjecture that dotted rhythms appear more often in music associated with group synchronization (specifically marches) than in other types of music. Two hundred marches were randomly sampled, along with a matched sample of 200 control pieces written by the same composers, in the same instrumental genre, and in the same metric class. The four pairs of notes preceding the first four sounded downbeats were examined. Surprisingly, the results indicate that dotted rhythms are not significantly more common in marches. Double-dotted rhythms were also not more common at slower than at faster tempos. If dotted rhythms do contribute to the impressiveness of march displays by facilitating synchronization, Western march music does not appear to capitalize significantly on this phenomenon.
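
Coding each examined note pair reduces to comparing two durations. The abstract does not give the study's coding scheme. The sketch below assumes durations expressed in quarter-note units and treats a long-short ratio of 3:1 as dotted and 7:1 as double-dotted. The function names and example pairs are hypothetical:

```python
from fractions import Fraction

def classify_pair(first, second):
    """Classify two consecutive note durations by their long:short ratio."""
    ratio = Fraction(first) / Fraction(second)
    if ratio == 1:
        return "isochronous"
    if ratio == 3:
        return "dotted"
    if ratio == 7:
        return "double-dotted"
    return "other"

def dotted_share(pairs):
    """Proportion of pairs that are dotted or double-dotted."""
    if not pairs:
        return 0.0
    dotted = sum(classify_pair(a, b) in ("dotted", "double-dotted") for a, b in pairs)
    return dotted / len(pairs)

# Hypothetical durations in quarter-note units, given as strings for exact arithmetic.
march_pairs = [("3/4", "1/4"), ("1/2", "1/2"), ("7/8", "1/8"), ("1/4", "3/4")]
print([classify_pair(a, b) for a, b in march_pairs])
print(dotted_share(march_pairs))
```

Using `Fraction` avoids floating-point comparisons of durations. A reversed short-long figure ("1/4", "3/4") is deliberately left as "other", since the abstract speaks only of dotted and double-dotted rhythms. The per-sample shares for marches and controls would then be compared statistically.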

The vast amount of electronic text on the web makes it possible to create megacorpora in a very short time. However, as corpus size increases, the complexity of managing the corpus is becoming a serious problem that limits its functionality. Furthermore, a great deal of corpus linguistics research is based on the quantitative comparison of small corpus samples, which are drawn from

Abstract: This paper presents the results of a parametric and frequency analysis of discourse structuring devices in written texts. We present a typology of organisational metadiscourse markers and examine one specific category of these markers – sequencers - in more detail ( ...

This paper reports on an observational case study investigating the possibilities available to language teachers who are non-experts or novices in corpus analysis for integrating corpus analysis technology into the design of language learning activities. Despite the availability of corpus analysis technology and a large number of corpus-analytic studies of texts, corpus-based language teaching and learning, and the development of teaching-learning materials that incorporate corpus analysis technology and techniques by teachers who are non-experts in the field, have remained the exception in classroom teaching. This paper records the researcher's personal experience as a language practitioner using corpus analysis technology to design teaching-learning materials that meet the objectives of a language course for undergraduates of low English proficiency (LEP) studying in an English-medium instruction (EMI) context. Although this exercise posed challenges for a language teacher who is a 'non-expert' in the field, the researcher documents its positive and promising outcomes as evidence in support of more extensive 'non-expert' teacher-driven, student-participatory, corpus-based English language learning methodologies.