Bridging Informal Massive Open Online Courses and Formal English for Academic Purposes Programmes with Language Corpora

A New Paradigm for Open Data-Driven Language Learning Systems Design in Higher Education

PhD Thesis, Concordia University, Montreal, Canada, 2018

A basic premise underpinning the new research paradigm presented in this thesis and demonstrated by the FLAX project (Flexible Language Acquisition, flax.nzdl.org) is that open data-driven language learning systems design as an approach is learner-centric and operates with the interface to the learner. Whether the learner is operating fully online in non-formal or informal learning mode or in a blended modality that is based both within and beyond the formal language classroom, this approach requires that the tools and interfaces, and indeed the corpora, be openly accessible and remixable for development or adaptation to meet this specific learner requirement. This method is different from existing data-driven learning (DDL) approaches which assume specialised knowledge or experience with DDL tools, interfaces and strategies, operating on mostly inaccessible corpora in terms of cost or design, or alternatively assuming training to, hopefully, compensate for this lack of knowledge and experience. From a research and development (R&D) standpoint, the paradigm presented here also operates with the interface to knowledge organisations (universities, libraries, archives) and researchers who are engaging with open educational practices to push at the parameters of open policy for the non-commercial reuse and remix of authentic research and pedagogic content that is increasingly abundant in digital open access format for text and data-mining (TDM) purposes. This open access content is highly relevant to learning features of specialist varieties of English from across the academy but is otherwise off limits for development into proprietary learning materials by the commercial education publishing industry.
Indeed, the open corpus development work presented in this thesis would not have been possible had it not been for the campaigners for copyright reform, the Internet activists, the open policy makers, the open-source software developers, and the advocates for open access, open data and open education who have made these resources available for reuse and remix. This paradigm leads down several paths, including research into understanding how users actually perceive, appropriate and use the approach based on the open tools and resources provided. This inquiry informs their design and development, in an R&D process that is presented here through the methodological lens of design-based research and design ethnography. This approach is fundamentally different from one that assumes the user is a DDL or linguistics expert, or that such an expert will act as the learner's interface to the system by preparing output for the learner to experience and learn from. It is also necessarily different from one that assumes the user is always a formally registered student at a university with access to English for Academic Purposes (EAP) support that may or may not offer DDL or linguistics expertise for learning the language features of specific discourse communities from across the academy. The assumption behind this new paradigm that the right tools and resources can allow the end-learner to drive the processes autonomously is fundamentally revolutionary. This premise is central to the original contribution to knowledge of this thesis, but also challenges and directs researchers and practitioners in the field to consider and take up this new direction with open data-driven language learning systems design for applications that can be scaled in higher education to meet the increasing numbers of learners who are coming online.
The focus on domain-specific terminology learning support via data-driven approaches is of course also decidedly different from the current EAP paradigm which in mainstream practice has been steadily evolving away from its roots in English for Specific Purposes (ESP), domain specificity and DDL processes towards the generic skills and knowledge programs currently in vogue that are arguably being steered by generic EAP course book publications from the commercial education publishing industry. Thus, this is also a new paradigm based on DDL approaches, driving domain-specific terminology learning support for EAP across formal, non-formal and informal learning modalities in higher education. It will transform, potentially, the focus of DDL systems design developments in language support and learning in general toward the non-specialist end-learner, but also hopefully help re-establish the centrality of language specificity to the field of EAP. The new paradigm is necessarily rooted in greater inter- or multi-disciplinarity. Given the goal of facilitating, in particular, the increasing number of learners who are coming online, and users of large-scale MOOC platforms who are trying to function in domain-specific subject areas that are invariably offered in the English language, the approach requires collaboration and cooperation among platform providers, subject academics and instructors, educational technologists, software developers, educational researchers, EAP practitioners, linguists with expertise in corpus-based and DDL approaches, and policy makers in knowledge organisations (libraries, universities, archives). This doctoral thesis presents three studies in collaboration with the open source FLAX project. 
This research makes an original contribution to the fields of language education and educational technology by mobilising knowledge from computer science, corpus linguistics and open education, and proposes a new paradigm for open data-driven language learning systems design in higher education. Furthermore, the research presented in this thesis uncovers and engages with an infrastructure of open educational practices (OEP) that push at the parameters of policy for the reuse of open access research and pedagogic content in the design, development, distribution, adoption and evaluation of data-driven language learning systems.

Flexible Open Language Education for a Multilingual World

2014

This research and technology paper will present open language tools and collections that have been developed for supporting domain-specific academic language with the FLAX multilingual open source software. OpenCourseWare (OCW), Massive Open Online Courses (MOOC) and Open Educational Resources (OER) are becoming popular educational vehicles through which well-resourced universities and organisations can reach out to non-traditional audiences, including those from other countries and cultures. For example, the OCW Consortium website states that, "Open Education seeks to scale educational opportunities by taking advantage of the power of the internet, allowing rapid and essentially free dissemination, and enabling people around the world to access knowledge, connect and collaborate" ("About the OCWC," n.d.).

Specificity in Academic Language
Open education provides a compelling opportunity for domain-specific academic language learning. Online courses supply a large corpus of interesting linguistic material relevant to a particular area, including supplementary images (slides), audio and video. We contend that this corpus can be automatically analysed, enriched, and transformed into a resource that learners can browse and query in order to extend their ability to understand the language used, and help them express themselves more fluently and eloquently in that domain. To illustrate this idea, an existing online corpus-based language learning tool (FLAX) is applied to an English-medium Coursera MOOC offered by Columbia University, entitled Virology 1: How Viruses Work. MOOC participants register for educational courses; they do not sign up as language learners. However, many online learners will encounter a language barrier during their study with many of the open educational offerings being delivered in the world's presiding lingua francas, namely English, Arabic, French, Chinese, Russian, Spanish, and Portuguese. Beyond the simple translation of lecture transcripts and course readings, learners will be strongly motivated to improve their knowledge of key terms and concepts as they are used in the subject domain, exemplified here with the Virology MOOC collections in FLAX for support with Academic Language. (They are also helpful for native speakers of the target language.)

OER Research Hypotheses for Open Language Support
Research into the development and uses of text analysis tools from corpus linguistics has been primarily carried out in relation to traditional classroom-based university teaching only. This is despite the growing number of higher education offerings in open and distance learning, including the recent surge in OER, OCW and MOOCs in collaboration with universities and educational organisations. Drawing on linguistic data sources from MOOCs, along with survey data from MOOC learners and interview and survey data from course developers and English language education professionals, this paper will present findings from participants based on their perceptions of the effectiveness of the open language tools and collections in FLAX under investigation for Academic Language support. Specific OER research hypotheses have been investigated through this research in collaboration with the OER Research Hub. The following OER hypotheses have been examined through the different data collection instruments in relation to the different research participant groups:
Hypothesis A: Use of OER leads to improvement in student performance and satisfaction.
Hypothesis E: Use of OER leads to critical reflection by educators, with improvement in their practice.
Hypothesis H: Informal learners adopt a variety of techniques to compensate for the lack of formal support.
Hypothesis I: Open education acts as a bridge to formal education, and is complementary, not competitive, with it.

Scaling Flexible Open Language Learning
For the purpose of innovating, building and creating multilingual learning support collections for large-scale learning, both online and offline, the flexible tools and resources in FLAX can be applied to content

Second Language Learning in the Context of MOOCs

Proceedings of the 6th International Conference on Computer Supported Education, ISBN 978-989-758-020-8, 2014

Massive Open Online Courses are becoming popular educational vehicles through which universities reach out to non-traditional audiences. Many enrolees hail from other countries and cultures, and struggle to cope with the English language in which these courses are invariably offered. Moreover, most such learners have a strong desire and motivation to extend their knowledge of academic English, particularly in the specific area addressed by the course. Online courses provide a compelling opportunity for domain-specific language learning. They supply a large corpus of interesting linguistic material relevant to a particular area, including supplementary images (slides), audio and video. We contend that this corpus can be automatically analysed, enriched, and transformed into a resource that learners can browse and query in order to extend their ability to understand the language used, and help them express themselves more fluently and eloquently in that domain. To illustrate this idea, an existing online corpus-based language learning tool (FLAX) is applied to a Coursera MOOC entitled Virology 1: How Viruses Work, offered by Columbia University.

FLAX: Flexible and open corpus-based language collections development

This case study has been written as an introductory guide for teachers who are interested in learning what Open Educational Resource (OER) options and practices (OEP) are available to them for developing and distributing online domain-specific language materials for use in academic and professional settings. We present innovative work in open corpus-based language collections building with the open-source multilingual FLAX language project as a running example of open materials development practices for language teaching and learning. We present language-learning contexts from across formal and informal language learning, including Massive Open Online Courses (MOOCs), English for Academic Purposes (EAP) and language translation studies. We are particularly concerned with closing the gap in language teacher training where competencies in materials development are still dominated by print-based proprietary course book publications, which in many cases do not reflect salient findings from the research into domain-specific language. We are also concerned with the growing gap in language teaching practitioner competencies for understanding important issues of copyright and licensing that are changing rapidly in the context of digital and web literacy developments. These key issues are being largely ignored in informal language teaching practitioner discussions, by both experienced and new language tutors, and in the formal research into teaching and materials development practices.

Openness in English for Academic Purposes

2013

Commissioned by the Higher Education Academy (HEA) in the United Kingdom, this case study has been written as an introductory guide for teachers and researchers working with international and home students for whom it would be beneficial to develop competencies with academic English as it is used across the different disciplines. In particular, as per the HEA directives handed down in commissioning this case study, Open Educational Resources (OER) and teaching quality are the theme to be addressed in this report with respect to the open web-based tools, resources and practices in English for Academic Purposes (EAP) that will be introduced here. A range of open corpus-based tools, resources and techniques will be demonstrated and discussed in section four, looking at different language projects from around the world that provide free access to valuable English language resources which are relevant for use both within and beyond traditional higher education. Because these tools and resources are openly available they can be used and shared by learners and teachers across a variety of contexts, for example in language schools and university language support centres, in open and distance education, and in independent or informal learning. They range from tools and resources that can provide diagnostic help for improving vocabulary, reading and writing to resources that can assist with identifying, retrieving, storing and managing useful words and phrases as they occur across a variety of authentic academic and general English language contexts. Findings and resources will also be shared in section three, based on an OER cascade study that was carried out with EAP teachers and students at Durham University English Language Centre (DUELC).
As part of the TOETOE project (ˈtɔɪtɔɪ: Technology for Open English – Toying with Open E-resources), three corpus-based projects - FLAX, the Lextutor, and AntConc - were trialed for their efficacy in mainstream EAP teaching and learning practice. A fourth corpus-based project, WordandPhrase, was introduced at one of the project dissemination events and will also be introduced here in this case study. None of the participants in the study had received any prior training with corpus-based resources for Data-Driven Learning (DDL) in language education. Initial findings from this study at Durham University on the design and usability of corpus-based resources have informed on-going research and development work with the TOETOE project in collaboration with the open-source FLAX project at the University of Waikato in New Zealand.

Making use of and adapting MOOC text resources for language learning

Proceedings of the International Conference of Artificial Intelligence and Technology-Enhanced Language Learning (AiTELL) together with the Post-Graduate Academic Forum, Shanghai, 2019

Massive Open Online Courses are becoming popular educational vehicles through which universities reach out to non-traditional audiences. Many enrollees hail from other countries and cultures, and struggle to cope with the English language in which these courses are invariably offered. Moreover, most such learners have a strong desire and motivation to extend their knowledge of academic English, particularly in the specific area addressed by the course. Online courses provide a compelling opportunity for domain-specific language learning, a growing trend in language teaching and learning. Typical MOOCs supply a large corpus of interesting linguistic material relevant to a particular area, including supplementary images (slides), audio and video. Such a corpus provides an excellent context in which to study the domain-specific lexico-grammatical features of any word or phrase, a challenging aspect of productive English use even for quite advanced learners. We contend that this corpus can be automatically analysed, enriched, and transformed into a resource that learners can browse and query in order to extend their ability to understand the language used, and help them express themselves more fluently and eloquently in that domain. To illustrate this idea, an existing online corpus-based language learning tool (FLAX) is applied to a Coursera MOOC entitled English Common Law offered by the University of London. We will illustrate how this resource has been augmented for language learning, and then review how learners can use it to explore language usage. This article uses a single running example, a Coursera MOOC, but the approach is fully automated and can be applied to any collection of English writing.
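
As a rough illustration of what browsing and querying a course corpus can mean in practice, the sketch below builds a keyword-in-context (KWIC) concordance and counts the words that immediately follow a keyword. It is a minimal sketch under stated assumptions and not the FLAX implementation: the function names (tokenize, kwic, right_collocates) are hypothetical, and the three sample sentences are invented for the example rather than taken from the English Common Law course.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every occurrence."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


def right_collocates(tokens, keyword):
    """Count the words that immediately follow the keyword."""
    return Counter(
        tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == keyword
    )


# Invented sample text standing in for lecture transcripts (an assumption).
sample = (
    "The court held that the contract was void. "
    "The court held the defendant liable. "
    "A higher court may overturn the decision."
)
tokens = tokenize(sample)

# Print each hit with the keyword aligned in a central column.
for left, kw, right in kwic(tokens, "court"):
    print(f"{left:>20} [{kw}] {right}")

# Show the two most frequent words following the keyword.
print(right_collocates(tokens, "court").most_common(2))
```

A learner-facing tool would add part-of-speech information and much larger reference data, but even this small version shows how a transcript can be turned into something a learner can query for typical word partners such as "court held".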

Investigating an Open Methodology for Designing Domain-specific Language Collections

In S. Jager, L. Bradley, E. J. Meima, & S. Thouësny (Eds), CALL Design: Principles and Practice. Proceedings of the 2014 EUROCALL Conference, Groningen, The Netherlands, 2014

With this research and design paper, we propose that Open Educational Resources (OERs) and Open Access (OA) publications give increasing access to high quality online educational and research content for the development of powerful domain-specific language collections that can be further enhanced linguistically with the Flexible Language Acquisition System (FLAX, http://flax.nzdl.org). FLAX uses the Greenstone digital library system, widely used open-source software that enables end users to build collections of documents and metadata directly onto the Web (Witten, Bainbridge, & Nichols, 2010). FLAX offers a powerful suite of interactive text-mining tools, using Natural Language Processing and Artificial Intelligence designs, to enable novice collections builders to link selected language content to large pre-processed linguistic databases. An open methodology trialed at Queen Mary University of London in collaboration with the OER Research Hub at the UK Open University demonstrates how applying open corpus-based designs and technologies can enhance open educational practices among language teachers and subject academics for the preparation and delivery of courses in English for Specific Academic Purposes (ESAP).

A reflective e-learning journey from the dawn of CALL to web 2.0 intercultural communicative competence (ICC).

I believe that language learning and teaching should reflect current research findings in the field and that both cognitive theories and interactional/sociolinguistic/sociosemantic ones should be taken into account when trying to understand how languages are learnt. Evidence has been emerging that seems to substantiate the claim that linguistic proficiency and Intercultural Communicative Competence (ICC), pragmatic competence in the target language in particular, can be enhanced by the use of Computer Mediated Communication (CMC), as is well summarised by O’Dowd (2013) reporting on the findings of research on telecollaboration