Kjell Lemström - Academia.edu
Papers by Kjell Lemström
arXiv (Cornell University), Oct 21, 2019
The importance of repetitions in music is well known. In this paper, we study music repetitions in the context of effective and efficient automatic genre classification in large-scale music databases. We aim at enhancing the access to and organization of pieces of music in digital libraries by allowing automatic categorization of entire collections based only on their musical content. We hand over to the public a set of genre-specific patterns to support research in musicology. The patterns can be used, for instance, to explore and analyze the relations between musical genres. There are many existing algorithms that could be used to identify and extract repeating patterns in symbolically encoded music. In our case, the extracted patterns are used as representations of the pieces of music in the underlying corpus and, subsequently, to train and evaluate a classifier that automatically identifies genres. In this paper, we apply two very fast algorithms, enabling us to experiment on large and diverse corpora. Thus, we are able to find patterns with strong discriminative power that can be used in various applications. We carried out experiments on a corpus containing over 40,000 MIDI files, each annotated with at least one genre. The experiments suggest that our approach is scalable and capable of dealing with real-world-size music collections. CCS Concepts: Information systems → Digital libraries and archives; Information extraction; Clustering and classification; Music retrieval. Applied computing → Sound and music computing.
Informatics in education, Apr 17, 2023
Massive Open Online Courses (MOOCs) have become hugely popular recently. MOOCs can offer high-quality education for anyone interested and equalize the whole education field. Still, there are different methodologies for running MOOCs. Coming up with the most suitable methodology benefits both students and teachers. In this study, we have limited the methodological focus to observing scheduled and unscheduled instances of similar MOOCs. While unscheduled MOOCs can provide flexibility, they also require self-regulated learning strategies for students to succeed. To observe this, we compare the effectiveness of scheduled and unscheduled programming MOOCs to find the most effective methodology. For this, we compare the pass rates and grade averages of five instances (two unscheduled and three scheduled) of Python and Java programming MOOCs. The results show that while the attendance numbers are higher in the unscheduled versions, the pass rate is significantly better in the scheduled instances, and students' progression is much swifter. It also seems that a higher proportion of university students enrolled in a MOOC positively affects the retention rate. Moreover, the students in the recent unscheduled Python version seem to score significantly higher grades than those in its scheduled counterpart. Based on our experiments, the scheduled and unscheduled versions complement each other. Hence, we suggest that, whenever feasible, the maximal benefits would be gained if both types of MOOCs are run simultaneously.
Fuzzy Systems and Knowledge Discovery, Nov 1, 2002
MATHEMATICS AND COMPUTATION IN MUSIC (MCM 2022): Proc. 8th International MCM Conference (Lecture Notes in Computer Science), 2022
SIA is a fundamental algorithm in symbolic musical pattern discovery, which reports all maximal translatable patterns in a point set. The original SIA algorithm requires $O(kn^2 \log n)$ time and $O(kn^2)$ space, where $n$ is the number of points in the data set and $k$ is the number of coordinates in each point. In this paper, we present a sweepline algorithm that shares the running time of SIA but requires only $O(kn)$ space, making it possible to process larger data sets without running out of memory. Since SIA is the first step in many pattern discovery tasks, our new algorithm can have a broad impact. For example, we discuss the problem of finding all occurrences of maximal translatable patterns with specific properties. We also compare the algorithms in practice and show that reduced memory usage can benefit real data sets.
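The notion of a maximal translatable pattern mentioned in this abstract can be illustrated with a minimal sketch. It follows the quadratic-space idea attributed to the original SIA (collect every positive difference vector between two points, then group the origin points by vector); it is not the sweepline algorithm of the paper. The function name `sia` is hypothetical, and hash-based grouping is used here in place of sorting the vector table.

```python
from collections import defaultdict


def sia(points):
    """Map each positive translation vector v to its maximal translatable
    pattern: the list of points p such that p + v is also in the point set.

    Assumption: points are equal-length tuples of numbers.
    This keeps all O(n^2) vectors in memory at once.
    """
    pts = sorted(set(points))  # lexicographic order, duplicates removed
    patterns = defaultdict(list)
    for i, p in enumerate(pts):
        for q in pts[i + 1:]:  # q > p, so only "forward" vectors are kept
            v = tuple(b - a for a, b in zip(p, q))
            patterns[v].append(p)
    return dict(patterns)


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (0, 1), (1, 1)]
    for vector, pattern in sia(square).items():
        print(vector, pattern)
```

For the four corners of a unit square, the vector (1, 0) maps to the two points on the left edge, since both can be translated by (1, 0) onto another point of the set.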
Journal of Mathematics and Music, 2021
We study the problem of identifying repetitions under transposition and time-warp invariances in polyphonic symbolic music. Using a novel onset-time-pair representation, we reduce the repeating pattern discovery problem to instances of the classical problem of finding the longest increasing subsequences. The resulting algorithm works in $O(n^2 \log n)$ time, where $n$ is the number of notes in a musical work. We also study windowed variants of the problem where onset-time differences between notes are restricted, and show that they can also be solved in $O(n^2 \log n)$ time using the algorithm.
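The classical subproblem named in this abstract, finding a longest increasing subsequence, can be solved in $O(n \log n)$ time with binary search. The sketch below shows only that building block; the onset-time-pair reduction itself is not reproduced here, and the function name `lis_length` is hypothetical.

```python
from bisect import bisect_left


def lis_length(seq):
    """Length of the longest strictly increasing subsequence of seq."""
    # tails[i] is the smallest possible last element of an increasing
    # subsequence of length i + 1 seen so far; tails is always sorted.
    tails = []
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)  # x extends the longest subsequence found so far
        else:
            tails[i] = x     # x gives a smaller tail for length i + 1
    return len(tails)


if __name__ == "__main__":
    print(lis_length([3, 1, 4, 1, 5, 9, 2, 6]))
```

Using `bisect_left` makes the subsequence strictly increasing; `bisect_right` would allow repeated values instead.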
2. I'll talk for about half an hour on the work that I've done on pattern discovery, focusing on SIA and SIATEC, which are two new efficient pattern-discovery algorithms that I've developed over the past year or so in collaboration with Geraint Wiggins and Kjell Lemström.
Computing, May 22, 2001
We present below SIA and SIATEC, two new algorithms for efficient and effective pattern-discovery in multidimensional datasets. (A multidimensional dataset is simply any set of points in an N-dimensional space.) These algorithms can be used as the basis of new applications for compression and indexing of databases, and data mining or structural analysis of data. The new algorithms are particularly appropriate for use with databases in which each item in the database is represented as a multidimensional dataset, as is the ...
This invention provides methods for pattern discovery, pattern matching and data compression in multidimensional numerical datasets. The invention can usefully be applied in any domain in which information represented in the form of multidimensional datasets needs to be retrieved, compared, analysed or compressed. Such domains include 2D images, audio and video data, biomolecular data, seismic, meteorological and financial data. There already exist methods for pattern discovery, pattern matching and data ...
Proceedings of 2001 Conference on Systemics, Cybernetics and Informatics, 2001
4. We're both going to concentrate on the musical applications of these algorithms but you should be aware that these algorithms are, in fact, quite general and could be applied to any data that can appropriately be represented in the form of a multidimensional dataset (that is, a set of points in a Cartesian space).
PCT patent application number PCT/GB02/02430, UK patent application, May 23, 2002
Figure 1: (a) shows a simple 2-dimensional dataset. (b)–(j) show the maximal repeated patterns found by SIA in the dataset in (a). ... Figure 2: The sets of patterns discovered by SIATEC in the dataset in Figure 1(a). ... Figure 3: When SIAME searches for occurrences of the query pattern (a) in the dataset (b), it finds the exact matches shown in (c). It also finds the closest incomplete matches shown in (d). ... Figure 4: (b) shows the compressed representation generated by COSIATEC for the dataset (a). The dataset in (a) can be generated by translating the three-point ...
In this paper we address the problem of designing an algorithm that, when given a representation of a passage of music as input, discovers all and only instances of perceptually significant repetition in the passage. The algorithm must be able to discover perceptually significant instances of repetition where the occurrences of the pattern are identical (exact repetition) and instances where the occurrences differ but are nonetheless perceived to be 'versions' of the same thing (modified repetition). Such an algorithm would ...
Invariances are central concepts in content-based music retrieval. Musical representations and similarity measures are designed to capture musically relevant invariances, such as transposition invariance. Though regularly used, their explicit definition is usually omitted because of the heavy formalism required. The lack of explicit definition, however, can result in misuse or misunderstanding of the terms. We discuss the musical relevance of various musical invariances and develop a set-theoretic formalism for defining and classifying them. Using it, we define the most common invariances and give a taxonomy which they inhabit. The taxonomy serves as a useful tool for identifying where work is needed to address real-world problems in content-based music retrieval.
We present an efficient prototype for music information retrieval. The prototype uses bit-parallel algorithms for locating transposition-invariant matches of monophonic query melodies within monophonic or polyphonic music stored in a database. When dealing with monophonic music, we employ a fast approximate bit-parallel algorithm with special edit distance metrics. The fast scanning phase is followed by a verification phase in which a separate metric is used for ranking matches. We also offer the possibility to search for exact occurrences of a 'distributed' melody within polyphonic databases via a bit-parallel filtering technique. In our experiments with a database of 2 million musical elements (notes in a monophonic and chords in a polyphonic database), the responses were obtained within one second in both cases. Furthermore, our prototype is capable of using various interval classes in matching, producing more approximation when it is needed.
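As a rough illustration of how bit-parallelism and transposition invariance can be combined, the sketch below runs the standard exact Shift-And algorithm over successive pitch intervals rather than absolute pitches. This is a simpler stand-in and not the prototype's method: the abstract describes approximate matching with edit distances and a filtering technique for polyphonic data, neither of which is shown. The function names and the use of MIDI pitch numbers are assumptions.

```python
def intervals(pitches):
    """Successive pitch differences; identical for all transpositions."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def shift_and_transposed(query, melody):
    """Start indices in melody where query occurs exactly, up to transposition.

    Assumption: both arguments are monophonic sequences of integer pitches
    (for example MIDI note numbers) and the query has at least two notes.
    """
    pattern = intervals(query)
    m = len(pattern)
    if m == 0:
        raise ValueError("query needs at least two notes")
    # masks[iv] has bit j set when pattern[j] == iv.
    masks = {}
    for j, iv in enumerate(pattern):
        masks[iv] = masks.get(iv, 0) | (1 << j)
    accept = 1 << (m - 1)
    state = 0
    hits = []
    for pos, iv in enumerate(intervals(melody)):
        # Bit j of state is set when pattern[0..j] matches the text ending at pos.
        state = ((state << 1) | 1) & masks.get(iv, 0)
        if state & accept:
            hits.append(pos - m + 1)  # index of the first matching note
    return hits


if __name__ == "__main__":
    print(shift_and_transposed([60, 62, 64], [65, 67, 69, 60, 62, 64, 66]))
```

Each text position costs one shift, one OR and one AND on an integer, which is what makes this family of algorithms fast when the pattern fits in a machine word.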
This paper studies the problem of transposition and timescale invariant (ttsi) polyphonic music retrieval in symbolically encoded music. In this setting, music is represented by sets of points in the plane. We give two new algorithms. Applying a search window of size $w$ and given a query point set of size $m$ to be searched for in a database point set of size $n$, our algorithm for exact ttsi occurrences runs in $O(mwn \log n)$ time; for partial occurrences we have an $O(mnw^2 \log n)$ algorithm. The framework used is flexible, allowing development towards even more robust geometric retrieval.
Lecture Notes in Computer Science, 2010
This paper considers how to adapt geometric algorithms, developed for content-based music retrieval of symbolically encoded music, to be robust against time deformations required by real-world applications. In this setting, music is represented by sets of points in the plane. A matching, pertinent to the application, involves two such sets of points and invariances under translations and time scalings. We give an algorithm for finding exact occurrences, under such a setting, of a given query point set of size $m$ within a database point set of size $n$, with running time $O(mn^2 \log n)$; partial occurrences are found in $O(m^2 n^2 \log n)$ time. The algorithms resemble the sweepline algorithm introduced in [1].
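To make the matching problem in this abstract concrete, here is a brute-force sketch of exact matching under translation and time scaling. It is not the paper's algorithm: it fixes the first two query points with distinct onsets, tries every database pair they could map to, and verifies the remaining points with a hash set. The function name, the `(onset, pitch)` tuple layout and the restriction to integer coordinates are assumptions made for this illustration.

```python
from fractions import Fraction


def scaled_occurrences(query, database):
    """Return (scale, anchor) pairs for exact occurrences of query in database.

    Assumptions: points are (onset, pitch) tuples of integers, and the query
    contains at least two distinct onsets. Pitch is only translated, while
    onsets are translated and scaled by a positive factor.
    """
    q = sorted(set(query))
    q0 = q[0]
    q1 = next((p for p in q if p[0] > q0[0]), None)
    if q1 is None:
        raise ValueError("query needs at least two distinct onsets")
    points = sorted(set(database))
    lookup = set(points)
    hits = []
    for d0 in points:
        for d1 in points:
            # d0 and d1 are candidate images of q0 and q1.
            if d1[0] <= d0[0] or d1[1] - d0[1] != q1[1] - q0[1]:
                continue
            scale = Fraction(d1[0] - d0[0], q1[0] - q0[0])  # exact arithmetic
            if all((d0[0] + scale * (x - q0[0]), d0[1] + (y - q0[1])) in lookup
                   for x, y in q):
                hits.append((scale, d0))
    return hits


if __name__ == "__main__":
    query = [(0, 60), (1, 62), (2, 64)]
    database = [(10, 65), (12, 67), (14, 69), (15, 50)]
    print(scaled_occurrences(query, database))
```

`Fraction` keeps the scaled onsets exact, so set membership tests do not suffer from floating-point rounding; the price is that onsets must be integers, such as MIDI ticks.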