Ian Witten | University of Waikato

Papers by Ian Witten

Biblioteca digital Greenstone. Guía de instalación (Greenstone Digital Library: Installation Guide)

Weka: Experiences with a Java Open Source Project

Computer in(security): infiltrating open systems

Predictive interfaces: what will they think of next?

The development and usage of the Greenstone digital library software

Bulletin of the American Society for Information Science and Technology, 2009

Compression-based template matching

Proceedings of IEEE Data Compression Conference (DCC'94)

Practical Machine Learning Tools and Techniques with Java Implementations

Data mining algorithms - part 1 of 2

Data Mining Fundamentals and Latest Developments, Jun 17, 2008

Storing and Retrieving Keys in a Table by Cross-Indexing

Making Better Use of Global Discretization

Proceedings of the Sixteenth International Conference on Machine Learning, Jun 27, 1999

A predictive calculator

Watch What I Do, Aug 30, 1993

Inside the Myths of Search Engine Technology

Learning Structure from Sequences, with Applications in a Digital Library

Lecture Notes in Computer Science, 2002

Chapter 3 - Presentation: User interfaces

How to Build a Digital Library (Second Edition), 2010

Publisher summary: This chapter focuses on how data in digital libraries is presented to readers, discussing what users around the world experience when they interact with digital libraries, which they invariably do through a Web browser. The role of metadata is considerably expanded in a digital library. Videos combine time-based information with a spatial image component. As with audio, time-based documents can be made more conveniently browsable by segmenting them, and videos can be automatically converted into sequences of thumbnails that correspond to scene changes. Digital collections of music have the potential to capture the popular imagination in ways that scholarly libraries never will. Making different representations of the same music available, and linking to external resources that locate additional relevant information, helps create a resource that is interesting and entertaining to search and browse. When users initiate a search or browse in a digital library, they are often presented with lists or displays that summarize the digital objects themselves. These summaries are known as document surrogates: concise displays that represent the actual object, typically using some of its metadata.
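One common, generic way to find the scene changes from which such thumbnail sequences are built is to compare the grey-level histograms of consecutive frames and declare a cut wherever they differ sharply. The Python sketch below illustrates that technique under simplifying assumptions: frames are flat lists of pixel values from 0 to 255, and the bin count and threshold are hand-picked. The function names and parameters are hypothetical, and the chapter does not say which detection method it uses.

```python
def histogram(frame: list[int], bins: int = 16) -> list[float]:
    """Normalised grey-level histogram of a non-empty frame of 0-255 pixel values."""
    counts = [0] * bins
    for v in frame:
        counts[v * bins // 256] += 1  # map the pixel value to its bin
    return [c / len(frame) for c in counts]


def scene_changes(frames: list[list[int]], threshold: float = 0.5) -> list[int]:
    """Indices of frames that start a new scene.

    Frame 0 always starts a scene; any later frame starts one when half the L1
    distance between its histogram and its predecessor's (a score in [0, 1])
    exceeds `threshold`. Hypothetical helper, for illustration only.
    """
    cuts = [0] if frames else []
    prev = None
    for i, frame in enumerate(frames):
        h = histogram(frame)
        if prev is not None:
            diff = 0.5 * sum(abs(a - b) for a, b in zip(h, prev))
            if diff > threshold:
                cuts.append(i)
        prev = h
    return cuts
```

The frames at the returned indices would then serve as the thumbnails for each scene. Histogram comparison ignores where pixels are in the frame, which makes it tolerant of camera motion but blind to cuts between shots with similar overall brightness.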

Inferring lexical and grammatical structure from sequences

Text Mining

Chapman & Hall/CRC Computer & Information Science Series, 2004

People in digital libraries

How to Build a Digital Library, 2010

This chapter focuses on people in digital libraries and on help and user-support services, describing how to use information from these libraries. The emphasis on people is a fundamental principle of contemporary librarianship, and stands in contrast to medieval librarianship, whose job it was to protect, revere, and even chain up the books. The first step in building a successful digital library, therefore, is to understand the people involved. Libraries are social organizations that connect readers and authors through the content of their collections. Although reader and author are the most prominent roles, numerous people work behind the scenes to enable the simple act of reading a library book. Libraries establish services specifically to help connect users with resources that match their information needs. The figure is taken from a round-the-clock reference service that offers real-time, one-on-one reference assistance from professional librarians, using Web-based chat, co-browsing, and cooperative reference tools. The copy-and-paste metaphor is familiar to anyone who has used a word processor or image editor. The same principle applies to audio and video, although the programs usually offer more controls.

Constraint-Solving in Interactive Graphics: A User-Friendly Approach

New Advances in Computer Graphics, 1989

A new framework for building digital library collections

Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries - JCDL '05, 2005

Lossless Compression for Text and Images

International Journal of High Speed Electronics and Systems, 1997

Most data that is inherently discrete needs to be compressed in such a way that it can be recovered exactly, without any loss. Examples include text of all kinds, experimental results, and statistical databases. Other forms of data may also need to be stored exactly, such as images: particularly bilevel ones, ones arising in medical and remote-sensing applications, or ones that may be required to be certified true for legal reasons. Moreover, during lossy compression, many occasions arise for lossless compression of coefficients or other information. This paper surveys techniques for lossless compression. The process of compression can be broken down into modeling and coding. We provide an extensive discussion of coding techniques, and then introduce methods of modeling that are appropriate for text and images. Standard methods used in popular utilities (for text) and in international standards (for images) are described.
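To make the split between modeling and coding concrete, here is a minimal Python sketch. It assumes the simplest possible model, an adaptive order-0 model with a Laplace estimator (every symbol starts with a count of 1), and stands in for the coder with the ideal code length of -log2 p bits per symbol, which arithmetic coding approaches closely. The function name is hypothetical and the sketch is an illustration of the general idea, not a method taken from the paper.

```python
import math
from collections import Counter


def adaptive_code_length(text: str, alphabet: set[str]) -> float:
    """Total ideal code length, in bits, of `text` under an adaptive order-0 model.

    Modeling: before each symbol, predict its probability from the counts of
    symbols seen so far; every alphabet symbol starts at count 1 (Laplace
    estimator), so no symbol ever gets probability zero.
    Coding: an ideal entropy coder spends -log2(p) bits on a symbol of
    probability p, so we simply sum those costs.
    """
    counts = Counter({s: 1 for s in alphabet})
    total = len(alphabet)
    bits = 0.0
    for ch in text:
        if ch not in counts:
            raise ValueError(f"symbol {ch!r} is not in the alphabet")
        p = counts[ch] / total   # the model's prediction for this symbol
        bits += -math.log2(p)    # cost charged by the (ideal) coder
        counts[ch] += 1          # update the model; a decoder can mirror this step
        total += 1
    return bits
```

For example, `adaptive_code_length("aaaa", {"a", "b"})` comes to log2 5, about 2.32 bits, less than the 4 bits a fixed one-bit-per-symbol code would use, because the model quickly learns that "a" is likely. Swapping in a better model (higher-order contexts, for instance) changes only the probability estimates; the coding step stays the same, which is exactly why the two can be studied separately.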

Evaluating the efficacy of the digital commons for scaling data-driven learning

In M. Carrier, R. Damerow, K. Bailey (Eds.), Digital Language Learning and Teaching: Research, Theory and Practice. Global Research on Teaching and Learning English Series. Routledge, Taylor & Francis. ISBN: 978-1138696815.

Bridging Informal Massive Open Online Courses and Formal English for Academic Purposes Programmes with Language Corpora

Massive Open Online Courses (MOOCs) provide a compelling opportunity for domain-specific language learning. They supply a large corpus of interesting linguistic material relevant to a particular subject area, including text, supplementary images (slides), audio and video. It follows that these domain-specific corpora can also be used in formal English for Academic Purposes (EAP) programmes. Such corpora can be automatically analysed, enriched, and transformed into a resource that learners can browse and query in order to extend their ability to understand the language used, and help them express themselves more fluently and eloquently in that domain.
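One familiar way for learners to query a corpus like this is a keyword-in-context (KWIC) concordance, which lines up every occurrence of a word with its surrounding text. The Python sketch below is a generic illustration of that idea, assuming the corpus is simply a list of plain-text documents; the function name `kwic` and its `width` parameter are hypothetical, and this is not FLAX's implementation.

```python
import re


def kwic(corpus: list[str], keyword: str, width: int = 30) -> list[str]:
    """Keyword-in-context lines for whole-word, case-insensitive matches of `keyword`.

    Each line shows up to `width` characters of left context (right-aligned so
    the keywords line up in one column), the bracketed match, then up to
    `width` characters of right context. Hypothetical helper, for illustration.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for text in corpus:
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines
```

Calling `kwic(transcripts, "cells", width=10)` on, say, a list of lecture transcripts would return one aligned line per occurrence of "cells", letting a learner scan how the term is used in the domain. Whole-word matching means "cell" and "cells" are queried separately; a real tool would typically add lemmatisation so one query covers both.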

FLAX: Flexible Language Acquisition with Linked and Open Data-Driven Learning

LinkedUp: Linking Web Data for Education Project – Open Challenge in Web-scale Data Integration Coordination and Support Action (CSA) Grant Agreement No: 317620 http://linkedup-project.eu/

FLAX, an open-source multilingual online corpus-based language-learning tool, was awarded 1st prize in the LinkedUp Vici challenge for mature open data-driven applications for education. FLAX is applied to Open Educational Resources (OERs), including openly licensed Coursera and edX Massive Open Online Course (MOOC) content, Open Data, and Open Access research content for the development of domain-specific language collections.

FLAX uses the Greenstone suite of open-source multilingual software for building and distributing digital library collections, which can be published on the Internet or on CD-ROM. Issued under the terms of the GNU General Public License, Greenstone is produced by the New Zealand Digital Library Project at the University of Waikato, and developed and distributed in cooperation with UNESCO and the Human Info NGO. The Computer Science Department at Waikato University is also home to the popular Weka (Waikato Environment for Knowledge Analysis) suite of machine learning software written in Java.

Flexible Open Language Education for a Multilingual World

This research and technology paper will present open language tools and collections that have been developed for supporting domain-specific academic language with the FLAX multilingual open-source software. OpenCourseWare (OCW), Massive Open Online Courses (MOOCs) and Open Educational Resources (OER) are becoming popular educational vehicles through which well-resourced universities and organisations can reach out to non-traditional audiences, including those from other countries and cultures. For example, the OCW Consortium website states that "Open Education seeks to scale educational opportunities by taking advantage of the power of the internet, allowing rapid and essentially free dissemination, and enabling people around the world to access knowledge, connect and collaborate" ("About the OCWC," n.d.).

Specificity in Academic Language

Open education provides a compelling opportunity for domain-specific academic language learning. Online courses supply a large corpus of interesting linguistic material relevant to a particular area, including supplementary images (slides), audio and video. We contend that this corpus can be automatically analysed, enriched, and transformed into a resource that learners can browse and query in order to extend their ability to understand the language used, and help them express themselves more fluently and eloquently in that domain. To illustrate this idea, an existing online corpus-based language learning tool (FLAX) is applied to an English-medium Coursera MOOC offered by Columbia University, entitled Virology 1: How Viruses Work. MOOC participants register for educational courses; they do not sign up as language learners. However, many online learners will encounter a language barrier during their study, because many open educational offerings are delivered in the world's dominant lingua francas, namely English, Arabic, French, Chinese, Russian, Spanish, and Portuguese. Beyond the simple translation of lecture transcripts and course readings, learners will be strongly motivated to improve their knowledge of key terms and concepts as they are used in the subject domain, exemplified here by the Virology MOOC collections in FLAX for Academic Language support. (These collections are also helpful for native speakers of the target language.)

OER Research Hypotheses for Open Language Support

Research into the development and use of text analysis tools from corpus linguistics has been carried out primarily in relation to traditional classroom-based university teaching. This is despite the growing number of higher education offerings in open and distance learning, including the recent surge in OER, OCW and MOOCs developed in collaboration with universities and educational organisations. Drawing on linguistic data sources from MOOCs, along with survey data from MOOC learners and interview and survey data from course developers and English language education professionals, this paper will present findings based on participants' perceptions of the effectiveness of the open language tools and collections in FLAX under investigation for Academic Language support. Specific OER research hypotheses have been investigated in collaboration with the OER Research Hub. The following hypotheses have been examined through the different data collection instruments in relation to the different research participant groups:

Hypothesis A: Use of OER leads to improvement in student performance and satisfaction.

Hypothesis E: Use of OER leads to critical reflection by educators, with improvement in their practice.

Hypothesis H: Informal learners adopt a variety of techniques to compensate for the lack of formal support.

Hypothesis I: Open education acts as a bridge to formal education, and is complementary, not competitive, with it.

Scaling Flexible Open Language Learning

For the purpose of innovating, building and creating multilingual learning support collections for large-scale learning, both online and offline, the flexible tools and resources in FLAX can be applied to content

Wow! The FLAX Language System - So Much Open Data!

Presentation at the LinkedUp Vici Challenge Event for designing and developing advanced tools for educational purposes that are driven by linked and open data (awarded first prize), held at the 13th International Semantic Web Conference, Riva del Garda, Trentino, Italy.

The PhD Abstracts Collections in FLAX: Academic English with the Open Access Electronic Theses Online Service (EThOS) at the British Library

The project presents an educational research study into the development and evaluation of domain-specific language corpora derived from PhD abstracts in the Electronic Theses Online Service (EThOS) at the British Library. The collections, which are openly available from this study, were built using the interactive FLAX (Flexible Language Acquisition, flax.nzdl.org) open-source software for uptake in English for Specific Academic Purposes (ESAP) programmes at Queen Mary University of London. The project involved harvesting metadata, including the abstracts of 400,000 doctoral theses from UK universities, from the EThOS Toolkit at the British Library. These digital PhD abstract text collections were then automatically analysed, enriched, and transformed into a resource that second-language and novice research writers can browse and query in order to extend their ability to understand the language used in specific domains, and to help them develop their abstract writing. It is anticipated that the FLAX tools and the EThOS PhD Abstract collections will make a practical contribution by helping second-language and novice research writers understand the language used to achieve the persuasive and promotional aspects of the written research abstract genre. It is also anticipated that users of the collections will be able to develop their arguments more fluently and precisely through the practice of research abstract writing, projecting a persuasive voice as used in specific research disciplines.