Michael Mbouopda | Université Clermont Auvergne (original) (raw)
Uploads
Papers by Michael Mbouopda
Zenodo (CERN European Organization for Nuclear Research), Oct 13, 2022
Springer eBooks, 2023
Groundwater level prediction is an applied time series forecasting task with important social imp... more Groundwater level prediction is an applied time series forecasting task with important social impacts to optimize water management as well as preventing some natural disasters: for instance, floods or severe droughts. Machine learning methods have been reported in the literature to achieve this task, but they are only focused on the forecast of the groundwater level at a single location. A global forecasting method aims at exploiting the groundwater level time series from a wide range of locations to produce predictions at a single place or at several places at a time. Given the recent success of global forecasting methods in prestigious competitions, it is meaningful to assess them on groundwater level prediction and see how they are compared to local methods. In this work, we created a dataset of 1026 groundwater level time series. Each time series is made of daily measurements of groundwater levels and two exogenous variables, rainfall and evapotranspiration. This dataset is made available to the communities for reproducibility and further evaluation. To identify the best configuration to effectively predict groundwater level for the complete set of time series, we compared different predictors including local and global time series forecasting methods. We assessed the impact of exogenous variables. Our result analysis shows that the best predictions are obtained by training a global method on past groundwater levels and rainfall data.
2022 IEEE International Conference on Data Mining Workshops (ICDMW)
Le Centre pour la Communication Scientifique Directe - HAL - Université de Nantes, Jun 29, 2020
Time series classification is a task that aims at classifying chronological data. It is used in a... more Time series classification is a task that aims at classifying chronological data. It is used in a diverse range of domains such as meteorology, medicine and physics. In the last decade, many algorithms have been built to perform this task with very appreciable accuracy. However, applications where time series have uncertainty has been under-explored. Using uncertainty propagation techniques, we propose a new uncertain dissimilarity measure based on Euclidean distance. We then propose the uncertain shapelet transform algorithm for the classification of uncertain time series. The large experiments we conducted on state of the art datasets show the effectiveness of our contribution. The source code of our contribution and the datasets we used are all available on a public repository.
Cornell University - arXiv, Sep 28, 2022
Groundwater level prediction is an applied time series forecasting task with important social imp... more Groundwater level prediction is an applied time series forecasting task with important social impacts to optimize water management as well as preventing some natural disasters: for instance, floods or severe droughts. Machine learning methods have been reported in the literature to achieve this task, but they are only focused on the forecast of the groundwater level at a single location. A global forecasting method aims at exploiting the groundwater level time series from a wide range of locations to produce predictions at a single place or at several places at a time. Given the recent success of global forecasting methods in prestigious competitions, it is meaningful to assess them on groundwater level prediction and see how they are compared to local methods. In this work, we created a dataset of 1026 groundwater level time series. Each time series is made of daily measurements of groundwater levels and two exogenous variables, rainfall and evapotranspiration. This dataset is made available to the communities for reproducibility and further evaluation. To identify the best configuration to effectively predict groundwater level for the complete set of time series, we compared different predictors including local and global time series forecasting methods. We assessed the impact of exogenous variables. Our result analysis shows that the best predictions are obtained by training a global method on past groundwater levels and rainfall data.
2022 International Joint Conference on Neural Networks (IJCNN)
Exploring the expansion history of the universe, understanding its evolutionary stages, and predi... more Exploring the expansion history of the universe, understanding its evolutionary stages, and predicting its future evolution are important goals in astrophysics. Today, machine learning tools are used to help achieving these goals by analyzing transient sources, which are modeled as uncertain time series. Although black-box methods achieve appreciable performance, existing interpretable time series methods failed to obtain acceptable performance for this type of data. Furthermore, data uncertainty is rarely taken into account in these methods. In this work, we propose an uncertaintyaware subsequence based model which achieves a classification comparable to that of state-of-the-art methods. Unlike conformal learning which estimates model uncertainty on predictions, our method takes data uncertainty as additional input. Moreover, our approach is explainable-by-design, giving domain experts the ability to inspect the model and explain its predictions. The explainability of the proposed method has also the potential to inspire new developments in theoretical astrophysics modeling by suggesting important subsequences which depict details of light curve shapes. The dataset, the source code of our experiment, and the results are made available on a public repository.
Time serie classification is used in a diverse range of domain such as meteorology, medicine and ... more Time serie classification is used in a diverse range of domain such as meteorology, medicine and physics. It aims to classify chronological data. Many accurate approaches have been built during the last decade and shapelet transformation is one of them. However, none of these approaches does take data uncertainty into account. Using uncertainty propagation techiniques, we propose a new dissimilarity measure based on euclidean distance. We also show how to use this new measure to adapt shapelet transformation to uncertain time series classification. An experimental assessment of our contribution is done on some state of the art datasets.
Conférence Nationale en Intelligence Artificielle, Dec 10, 2019
Time series classification is a task that aims at classifying chronological data. It is used in a... more Time series classification is a task that aims at classifying chronological data. It is used in a diverse range of domains such as meteorology, medicine and physics. In the last decade, many algorithms have been built to perform this task with very appreciable accuracy. However, the uncertainty in data is not explicitly taken into account by these methods. Using uncertainty propagation techniques, we propose a new uncertain dissimilarity measure based on euclidean distance. We also show how to classify uncertain time series using the proposed dissimilarity measure and shapelet transform, one of the best time series classification methods. An experimental assessment of our contribution is done on the well known UCR dataset.
ARIMA J., 2020
Named Entity Recognition (NER) is a fundamental task in many NLP applications that seek to identi... more Named Entity Recognition (NER) is a fundamental task in many NLP applications that seek to identify and classify expressions such as people, location, and organization names. Many NER systems have been developed, but the annotated data needed for good performances are not available for low-resource languages, such as Cameroonian languages. In this paper we exploit the low frequency of named entities in text to define a new suitable cross-lingual distributional representation for named entity recognition. We build the first Ewondo (a Bantu low-resource language of Cameroon) named entities recognizer by projecting named entity tags from English using our word representation. In terms of Recall, Precision and F-score, the obtained results show the effectiveness of the proposed distributional representation of words
2020 International Conference on Data Mining Workshops (ICDMW), 2020
Time series classification is a task that aims at classifying chronological data. It is used in a... more Time series classification is a task that aims at classifying chronological data. It is used in a diverse range of domains such as meteorology, medicine and physics. In the last decade, many algorithms have been built to perform this task with very appreciable accuracy. However, applications where time series have uncertainty has been under-explored. Using uncertainty propagation techniques, we propose a new uncertain dissimilarity measure based on Euclidean distance. We then propose the uncertain shapelet transform algorithm for the classification of uncertain time series. The large experiments we conducted on state of the art datasets show the effectiveness of our contribution. The source code of our contribution and the datasets we used are all available on a public repository.
La classification des series temporelles est une tâche utilisee dans divers domaines tels que la ... more La classification des series temporelles est une tâche utilisee dans divers domaines tels que la meteorologie, la medecine et la physique. Elle a pour but de classer des objets modelises sous forme de series temporelles. Des al-gorithmes toujours plus efficaces ont vu le jour durant cette derniere decennie pour accomplir cette tâche. Parmi ces algorithmes, il y a ceux bases sur la transformation en shapelet. Cependant aucun de ces derniers n'est applicable dans le cadre des series temporelles incertaines. En utilisant les techniques de propagation de l'incertitude, nous proposons dans cet article une nouvelle mesure de dissimilarite incertaine que nous utilisons ensuite pour adapter la transformation en shapelet aux series temporelles incertaines. Une validation experimentale sur des donnees de la litterature montre l'interet de notre approche.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence
Time series analysis has gained a lot of interest during the last decade with diverse application... more Time series analysis has gained a lot of interest during the last decade with diverse applications in a large range of domains such as medicine, physic, and industry. The field of time series classification has been particularly active recently with the development of more and more efficient methods. However, the existing methods assume that the input time series is free of uncertainty. However, there are applications in which uncertainty is so important that it can not be neglected. This project aims to build efficient, robust, and interpretable classification methods for uncertain time series.
ArXiv, 2020
Named entity recognition is an important task in natural language processing. It is very well stu... more Named entity recognition is an important task in natural language processing. It is very well studied for rich language, but still under explored for low-resource languages. The main reason is that the existing techniques required a lot of annotated data to reach good performance. Recently, a new distributional representation of words has been proposed to project named entities from a rich language to a low-resource one. This representation has been coupled to a neural network in order to project named entities from English to Ewondo, a Bantu language spoken in Cameroon. Although the proposed method reached appreciable results, the size of the used neural network was too large compared to the size of the dataset. Furthermore the impact of the model parameters has not been studied. In this paper, we show experimentally that the same results can be obtained using a smaller neural network. We also emphasize the parameters that are highly correlated to the network performance. This work...
2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS)
Zenodo (CERN European Organization for Nuclear Research), Oct 13, 2022
Springer eBooks, 2023
Groundwater level prediction is an applied time series forecasting task with important social imp... more Groundwater level prediction is an applied time series forecasting task with important social impacts to optimize water management as well as preventing some natural disasters: for instance, floods or severe droughts. Machine learning methods have been reported in the literature to achieve this task, but they are only focused on the forecast of the groundwater level at a single location. A global forecasting method aims at exploiting the groundwater level time series from a wide range of locations to produce predictions at a single place or at several places at a time. Given the recent success of global forecasting methods in prestigious competitions, it is meaningful to assess them on groundwater level prediction and see how they are compared to local methods. In this work, we created a dataset of 1026 groundwater level time series. Each time series is made of daily measurements of groundwater levels and two exogenous variables, rainfall and evapotranspiration. This dataset is made available to the communities for reproducibility and further evaluation. To identify the best configuration to effectively predict groundwater level for the complete set of time series, we compared different predictors including local and global time series forecasting methods. We assessed the impact of exogenous variables. Our result analysis shows that the best predictions are obtained by training a global method on past groundwater levels and rainfall data.
2022 IEEE International Conference on Data Mining Workshops (ICDMW)
Le Centre pour la Communication Scientifique Directe - HAL - Université de Nantes, Jun 29, 2020
Time series classification is a task that aims at classifying chronological data. It is used in a... more Time series classification is a task that aims at classifying chronological data. It is used in a diverse range of domains such as meteorology, medicine and physics. In the last decade, many algorithms have been built to perform this task with very appreciable accuracy. However, applications where time series have uncertainty has been under-explored. Using uncertainty propagation techniques, we propose a new uncertain dissimilarity measure based on Euclidean distance. We then propose the uncertain shapelet transform algorithm for the classification of uncertain time series. The large experiments we conducted on state of the art datasets show the effectiveness of our contribution. The source code of our contribution and the datasets we used are all available on a public repository.
Cornell University - arXiv, Sep 28, 2022
Groundwater level prediction is an applied time series forecasting task with important social imp... more Groundwater level prediction is an applied time series forecasting task with important social impacts to optimize water management as well as preventing some natural disasters: for instance, floods or severe droughts. Machine learning methods have been reported in the literature to achieve this task, but they are only focused on the forecast of the groundwater level at a single location. A global forecasting method aims at exploiting the groundwater level time series from a wide range of locations to produce predictions at a single place or at several places at a time. Given the recent success of global forecasting methods in prestigious competitions, it is meaningful to assess them on groundwater level prediction and see how they are compared to local methods. In this work, we created a dataset of 1026 groundwater level time series. Each time series is made of daily measurements of groundwater levels and two exogenous variables, rainfall and evapotranspiration. This dataset is made available to the communities for reproducibility and further evaluation. To identify the best configuration to effectively predict groundwater level for the complete set of time series, we compared different predictors including local and global time series forecasting methods. We assessed the impact of exogenous variables. Our result analysis shows that the best predictions are obtained by training a global method on past groundwater levels and rainfall data.
2022 International Joint Conference on Neural Networks (IJCNN)
Exploring the expansion history of the universe, understanding its evolutionary stages, and predi... more Exploring the expansion history of the universe, understanding its evolutionary stages, and predicting its future evolution are important goals in astrophysics. Today, machine learning tools are used to help achieving these goals by analyzing transient sources, which are modeled as uncertain time series. Although black-box methods achieve appreciable performance, existing interpretable time series methods failed to obtain acceptable performance for this type of data. Furthermore, data uncertainty is rarely taken into account in these methods. In this work, we propose an uncertaintyaware subsequence based model which achieves a classification comparable to that of state-of-the-art methods. Unlike conformal learning which estimates model uncertainty on predictions, our method takes data uncertainty as additional input. Moreover, our approach is explainable-by-design, giving domain experts the ability to inspect the model and explain its predictions. The explainability of the proposed method has also the potential to inspire new developments in theoretical astrophysics modeling by suggesting important subsequences which depict details of light curve shapes. The dataset, the source code of our experiment, and the results are made available on a public repository.
Time serie classification is used in a diverse range of domain such as meteorology, medicine and ... more Time serie classification is used in a diverse range of domain such as meteorology, medicine and physics. It aims to classify chronological data. Many accurate approaches have been built during the last decade and shapelet transformation is one of them. However, none of these approaches does take data uncertainty into account. Using uncertainty propagation techiniques, we propose a new dissimilarity measure based on euclidean distance. We also show how to use this new measure to adapt shapelet transformation to uncertain time series classification. An experimental assessment of our contribution is done on some state of the art datasets.
Conférence Nationale en Intelligence Artificielle, Dec 10, 2019
Time series classification is a task that aims at classifying chronological data. It is used in a... more Time series classification is a task that aims at classifying chronological data. It is used in a diverse range of domains such as meteorology, medicine and physics. In the last decade, many algorithms have been built to perform this task with very appreciable accuracy. However, the uncertainty in data is not explicitly taken into account by these methods. Using uncertainty propagation techniques, we propose a new uncertain dissimilarity measure based on euclidean distance. We also show how to classify uncertain time series using the proposed dissimilarity measure and shapelet transform, one of the best time series classification methods. An experimental assessment of our contribution is done on the well known UCR dataset.
ARIMA J., 2020
Named Entity Recognition (NER) is a fundamental task in many NLP applications that seek to identi... more Named Entity Recognition (NER) is a fundamental task in many NLP applications that seek to identify and classify expressions such as people, location, and organization names. Many NER systems have been developed, but the annotated data needed for good performances are not available for low-resource languages, such as Cameroonian languages. In this paper we exploit the low frequency of named entities in text to define a new suitable cross-lingual distributional representation for named entity recognition. We build the first Ewondo (a Bantu low-resource language of Cameroon) named entities recognizer by projecting named entity tags from English using our word representation. In terms of Recall, Precision and F-score, the obtained results show the effectiveness of the proposed distributional representation of words
2020 International Conference on Data Mining Workshops (ICDMW), 2020
Time series classification is a task that aims at classifying chronological data. It is used in a... more Time series classification is a task that aims at classifying chronological data. It is used in a diverse range of domains such as meteorology, medicine and physics. In the last decade, many algorithms have been built to perform this task with very appreciable accuracy. However, applications where time series have uncertainty has been under-explored. Using uncertainty propagation techniques, we propose a new uncertain dissimilarity measure based on Euclidean distance. We then propose the uncertain shapelet transform algorithm for the classification of uncertain time series. The large experiments we conducted on state of the art datasets show the effectiveness of our contribution. The source code of our contribution and the datasets we used are all available on a public repository.
La classification des series temporelles est une tâche utilisee dans divers domaines tels que la ... more La classification des series temporelles est une tâche utilisee dans divers domaines tels que la meteorologie, la medecine et la physique. Elle a pour but de classer des objets modelises sous forme de series temporelles. Des al-gorithmes toujours plus efficaces ont vu le jour durant cette derniere decennie pour accomplir cette tâche. Parmi ces algorithmes, il y a ceux bases sur la transformation en shapelet. Cependant aucun de ces derniers n'est applicable dans le cadre des series temporelles incertaines. En utilisant les techniques de propagation de l'incertitude, nous proposons dans cet article une nouvelle mesure de dissimilarite incertaine que nous utilisons ensuite pour adapter la transformation en shapelet aux series temporelles incertaines. Une validation experimentale sur des donnees de la litterature montre l'interet de notre approche.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence
Time series analysis has gained a lot of interest during the last decade with diverse application... more Time series analysis has gained a lot of interest during the last decade with diverse applications in a large range of domains such as medicine, physic, and industry. The field of time series classification has been particularly active recently with the development of more and more efficient methods. However, the existing methods assume that the input time series is free of uncertainty. However, there are applications in which uncertainty is so important that it can not be neglected. This project aims to build efficient, robust, and interpretable classification methods for uncertain time series.
ArXiv, 2020
Named entity recognition is an important task in natural language processing. It is very well stu... more Named entity recognition is an important task in natural language processing. It is very well studied for rich language, but still under explored for low-resource languages. The main reason is that the existing techniques required a lot of annotated data to reach good performance. Recently, a new distributional representation of words has been proposed to project named entities from a rich language to a low-resource one. This representation has been coupled to a neural network in order to project named entities from English to Ewondo, a Bantu language spoken in Cameroon. Although the proposed method reached appreciable results, the size of the used neural network was too large compared to the size of the dataset. Furthermore the impact of the model parameters has not been studied. In this paper, we show experimentally that the same results can be obtained using a smaller neural network. We also emphasize the parameters that are highly correlated to the network performance. This work...
2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS)