Franck Lebourgeois | INSA Lyon
Papers by Franck Lebourgeois
Gazette du livre médiéval, 2011
This thesis aims to develop analysis methodologies for describing and comparing ancient handwritten scripts: global analysis methodologies that require no segmentation. It proposes new robust descriptors based on second-order statistics. The main contribution rests on the notion of generalized co-occurrence, which measures the joint probability distribution of information extracted from images. It extends grey-level co-occurrence, used until now to characterize textures, and it allowed us to develop various co-occurrences: spatial co-occurrences related to the orientations and local curvatures of shapes, and parametric co-occurrences that measure how an image evolves under successive transformations. Since the number of descriptors obtained is very (too) high, we propose methods for reducing it, designed from the most recent multidimensional statistical analysis methods. This work led us to introduce the notion of eigen co-occurrence matrices, which contain the essential information needed to describe images finely with a reduced number of descriptors. In the applied part, we propose unsupervised classification methods for medieval scripts. The number of groups and their contents depend on the parameters used and the methods applied. We also developed a search engine for similar scripts. Within the ANR-MCD Graphem project, we developed methods for analysing and tracking the evolution of scripts in the Middle Ages.
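Generalized co-occurrence extends the grey-level co-occurrence matrix. As a minimal sketch (plain Python, assuming the image is already quantized into integer labels; the function name `cooccurrence` is hypothetical and this is not the thesis's code), the joint probability for one displacement can be computed as below. Passing a map of quantized orientations or curvatures instead of grey levels gives a generalized variant; the thesis's specific features and its eigen co-occurrence reduction are not reproduced here.

```python
from collections import Counter


def cooccurrence(label_map, dx, dy, levels):
    """Joint probability P(a, b) that a pixel labelled a has, at offset
    (dx, dy), a neighbour labelled b.

    label_map is a list of rows of integer labels in [0, levels). With grey
    levels this is the classic grey-level co-occurrence matrix; with another
    quantized per-pixel measurement it is a generalized co-occurrence.
    """
    height, width = len(label_map), len(label_map[0])
    counts = Counter()
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:  # skip pairs leaving the image
                counts[(label_map[y][x], label_map[ny][nx])] += 1
    total = sum(counts.values())
    matrix = [[0.0] * levels for _ in range(levels)]
    for (a, b), count in counts.items():
        matrix[a][b] = count / total  # normalize counts to a joint distribution
    return matrix


# Toy 3x3 image with two grey levels, horizontal offset of one pixel.
image = [[0, 0, 1],
         [0, 1, 1],
         [1, 1, 1]]
m = cooccurrence(image, dx=1, dy=0, levels=2)
```

Keeping the function label-agnostic is the design choice that makes the grey-level and generalized cases one piece of code.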
HAL (Le Centre pour la Communication Scientifique Directe), Nov 1, 2006
HAL (Le Centre pour la Communication Scientifique Directe), Jan 21, 2004
European Project Space on Computer Vision, Graphics, Optics and Photonics, 2015
In this article, we present a new colour segmentation system suited to document images that filters out digitization noise. A simple and fast text extraction method based on the results of the colour separation is then used to improve the performance of a particularly popular and effective OCR engine.
This paper presents new techniques for discriminating medieval manuscript text in order to help paleographers understand ancient manuscripts. One purpose of paleography is to cluster medieval writings into families, to find relations between them, and to determine their historical period and/or location. This work aims to confirm paleographers' classification of medieval writings. It also takes the opportunity to show that medieval writings can viably be discriminated using image analysis. In this paper, we define e-paleography as computer-vision assistance to the science of paleography. Our foremost idea is to select writing features that require neither image segmentation nor layout analysis. Our method is based on Spatial Grey-Level Dependence (SGLD), which measures the joint probability between the grey-level values of pixels for each spatial relation. We propose a statistical measure that generalizes SGLD, and we also propose the Spatial Curvature Dependence (SCD),...
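SGLD can be generalized by replacing grey levels with another per-pixel measurement. The sketch below is one hypothetical instance, assuming quantized gradient orientation (central differences, direction modulo 180°) as that measurement; the function names are illustrative, and the truncated abstract does not specify the paper's exact generalized measure or its SCD definition.

```python
import math
from collections import Counter


def orientation_labels(image, bins):
    """Quantize the gradient direction (modulo 180 degrees) of each interior
    pixel into `bins` classes. Border pixels and pixels with zero gradient
    get None and are ignored later."""
    h, w = len(image), len(image[0])
    labels = [[None] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # central differences
            gy = image[y + 1][x] - image[y - 1][x]
            if gx == 0 and gy == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi
            labels[y][x] = int(angle / math.pi * bins) % bins
    return labels


def orientation_cooccurrence(image, dx, dy, bins):
    """Joint probability of orientation classes for pixel pairs at (dx, dy)."""
    labels = orientation_labels(image, bins)
    h, w = len(labels), len(labels[0])
    counts = Counter()
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                a, b = labels[y][x], labels[ny][nx]
                if a is not None and b is not None:
                    counts[(a, b)] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


# Toy image with a vertical edge: all gradients point horizontally (bin 0).
edge = [[0, 0, 9, 9, 9] for _ in range(4)]
print(orientation_cooccurrence(edge, dx=1, dy=0, bins=4))  # {(0, 0): 1.0}
```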
ArXiv, 2018
The recognition of text in camera-captured images has become an important research issue over the past few decades. This gave rise to Scene Character Recognition (SCR), an important step in the scene text recognition pipeline. In this paper, we extend the Bag of Features (BoF)-based model with deep learning to represent features for accurate SCR in different languages. In the feature coding step, a deep Sparse Auto-encoder (SAE)-based strategy is applied to enhance the representative and discriminative abilities of image features. This deep learning architecture provides a more efficient feature representation and therefore better recognition accuracy. Our system was evaluated extensively on all the scene character datasets of five different languages. The experimental results demonstrate the efficiency of our system for multilingual SCR.
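For context, the baseline BoF coding step that the paper extends can be sketched as hard assignment of local descriptors to a codebook, followed by a normalized histogram. This is a minimal illustration under assumptions: the codebook is given (in practice it is usually learned, for example by k-means), the function names are hypothetical, and the paper's deep SAE-based coding, which replaces this step, is not shown.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to `descriptor` (squared Euclidean)."""
    return min(range(len(codebook)),
               key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])))


def bof_histogram(descriptors, codebook):
    """Hard-assignment Bag-of-Features coding: a normalized histogram of
    codeword occurrences over all local descriptors of one image."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_codeword(d, codebook)] += 1.0
    n = len(descriptors)
    return [h / n for h in hist] if n else hist


codebook = [[0.0, 0.0], [10.0, 10.0]]            # toy 2-word visual vocabulary
descriptors = [[1, 1], [9, 9], [8, 10], [0, 1]]  # toy local descriptors
print(bof_histogram(descriptors, codebook))      # [0.5, 0.5]
```

The resulting fixed-length histogram is what a classifier consumes, regardless of how many local descriptors each character image produced.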
2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2016
International Journal of Document Analysis and Recognition (IJDAR), 2007
IET Image Processing, 2016
Super-resolution (SR) has become an important research area due to the rapidly growing demand for high-quality images in various computer vision and pattern recognition applications. This has led to the emergence of various SR approaches, which fall into two kinds according to the number of input images: single-input and multi-input approaches. Processing multiple inputs can certainly yield a useful output, but this is mostly not the case for textual image processing. This study focuses on single-image approaches. Most existing methods have been applied successfully to natural images; however, applied directly to textual images they are not efficient enough, because of the specificities that distinguish these images from natural ones. SR approaches especially suited to textual images have therefore been proposed in the literature. Previous overviews of SR methods have concentrated on natural images, with no real coverage of textual ones. This study addresses that gap by surveying methods designed mainly for enhancing low-resolution textual images. The authors further critique these methods and discuss areas that promise improvements in this task. To the best of the authors' knowledge, this survey is the first of its kind in the literature.
PPPS-2001 Pulsed Power Plasma Science 2001. 28th IEEE International Conference on Plasma Science and 13th IEEE International Pulsed Power Conference. Digest of Papers (Cat. No.01CH37251)
Lecture Notes in Computer Science, 2006
Lecture Notes in Computer Science, 2004
Eighth International Conference on Document Analysis and Recognition (ICDAR'05), 2005
This article introduces a new word spotting method designed for ancient manuscripts. We take advantage of the robustness of the gradient feature and propose a new segmentation-free matching algorithm that tolerates spatial variations. We test our algorithm on ancient Latin manuscripts and on George Washington's manuscripts.
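As a rough illustration only, a segmentation-free matcher can compare gradient-magnitude maps at every page position, allowing each query pixel to match within a small radius so that small spatial variations are tolerated. The neighbourhood-tolerance scheme, the absolute-difference cost, and all names below are assumptions for illustration; they are not the paper's matching algorithm.

```python
def gradient_magnitude(image):
    """Central-difference gradient magnitude; border pixels are set to 0."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


def tolerant_distance(query, page, oy, ox, radius):
    """Sum over query pixels of the smallest absolute feature difference to
    any page pixel within `radius` of the aligned position, so that small
    local shifts between query and candidate are not penalized."""
    ph, pw = len(page), len(page[0])
    total = 0.0
    for y, row in enumerate(query):
        for x, value in enumerate(row):
            best = float("inf")
            for ry in range(-radius, radius + 1):
                for rx in range(-radius, radius + 1):
                    py, px = oy + y + ry, ox + x + rx
                    if 0 <= py < ph and 0 <= px < pw:
                        best = min(best, abs(value - page[py][px]))
            total += best
    return total


def spot(query_image, page_image, radius=1):
    """Slide the query over every page position (no segmentation) and return
    (distance, row, col) candidates sorted from best to worst."""
    q, p = gradient_magnitude(query_image), gradient_magnitude(page_image)
    qh, qw = len(q), len(q[0])
    return sorted(
        (tolerant_distance(q, p, oy, ox, radius), oy, ox)
        for oy in range(len(p) - qh + 1)
        for ox in range(len(p[0]) - qw + 1)
    )
```

Raising `radius` trades precision for robustness to writing variations; the exhaustive search is quadratic in image size and is meant only to make the idea concrete.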
Proceedings of Sixth International Conference on Document Analysis and Recognition
This paper describes a statistical model for a document understanding system that uses both text attributes and document layouts. Probabilistic relaxation is used as a recognition scheme to find the hierarchical structure of the logical layout. This approach, commonly used for pixel classification in image analysis, can be applied to classify text blocks into logical classes according to local compatibility.
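Probabilistic relaxation can be sketched with the classic Rosenfeld-Hummel-Zucker update, in which each block's label probabilities are reinforced by compatible labels on neighbouring blocks. This is a minimal sketch under assumptions: neighbours are weighted equally, and the labels, neighbourhood, and compatibility values are hypothetical; the paper's statistical model and hierarchical layout structure are not reproduced.

```python
def relaxation_step(probs, neighbours, compat):
    """One probabilistic relaxation update (Rosenfeld-Hummel-Zucker form).

    probs:      {block: {label: probability}}
    neighbours: {block: [neighbouring blocks]}
    compat:     {(label_i, label_j): compatibility in [-1, 1]}
    Neighbours are weighted equally, so the support stays in [-1, 1].
    """
    updated = {}
    for block, p in probs.items():
        nbrs = neighbours.get(block, [])
        unnormalized = {}
        for label, prob in p.items():
            support = 0.0
            for other in nbrs:
                support += sum(compat.get((label, lbl), 0.0) * q
                               for lbl, q in probs[other].items())
            if nbrs:
                support /= len(nbrs)
            unnormalized[label] = prob * (1.0 + support)
        z = sum(unnormalized.values())
        updated[block] = ({k: v / z for k, v in unnormalized.items()}
                          if z > 0 else dict(p))
    return updated


def relax(probs, neighbours, compat, iterations=10):
    """Apply synchronous relaxation updates a fixed number of times."""
    for _ in range(iterations):
        probs = relaxation_step(probs, neighbours, compat)
    return probs


# Hypothetical example: a title block tends to be adjacent to body text.
initial = {"A": {"title": 0.9, "body": 0.1}, "B": {"title": 0.5, "body": 0.5}}
neighbours = {"A": ["B"], "B": ["A"]}
compat = {("title", "body"): 1.0, ("body", "title"): 1.0,
          ("title", "title"): -1.0, ("body", "body"): 0.0}
result = relax(initial, neighbours, compat)
```

Here the confident "title" label on block A pushes the ambiguous block B toward "body", which in turn reinforces A; this local-consistency effect is what lets relaxation resolve block labels.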
Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Vol 2, 2007
Lecture Notes in Computer Science, 2004
2007 IEEE International Conference on Image Processing, 2007
Research on OCR systems now focuses on corrupted and damaged characters in printed and handwritten documents. Much research has been done on touching characters, but little on broken characters. This paper presents a new method to reconstruct printed characters that have been extracted as several connected components. Our approach is based on the pattern similarity between broken characters and intact ones from the same printed document. In the first step, we use a multi-segmentation algorithm to extract all possible connected components from a document image digitized in grayscale, and then we sort them by size. Correctly segmented characters are assumed to be larger than the fragments of misrecognized ones. We compute a similarity measure between all connected components, in decreasing order of size. We then localize the broken characters using the bounding box of the correct pattern that gives the best match.
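The ordering-and-matching idea can be illustrated with a toy sketch. Assumptions (not from the paper): components are given as sets of (row, col) pixels, only the single largest component serves as the reference pattern, Jaccard overlap stands in for the paper's similarity measure, and the multi-segmentation step is omitted. Fragments whose bounding boxes fall inside the best placement of the reference are grouped as one reconstructed character; all function names are hypothetical.

```python
def bbox(pixels):
    """Bounding box (top, left, bottom, right) of a set of (row, col) pixels."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return min(rows), min(cols), max(rows), max(cols)


def contained(inner, outer):
    """True if box `inner` lies entirely within box `outer`."""
    return (inner[0] >= outer[0] and inner[1] >= outer[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reconstruct(components):
    """Group fragments under the best-matching placement of the largest
    component, assumed to be a correctly segmented character.

    Returns a list of (score, bounding_box, fragment_indices)."""
    order = sorted(range(len(components)),
                   key=lambda i: len(components[i]), reverse=True)
    top0, left0, bottom0, right0 = bbox(components[order[0]])
    template = {(r - top0, c - left0) for r, c in components[order[0]]}
    t_h, t_w = bottom0 - top0 + 1, right0 - left0 + 1
    used, results = set(), []
    for i in order[1:]:  # remaining components, largest first
        if i in used:
            continue
        top, left, bottom, right = bbox(components[i])
        best = None
        # Try every template placement whose box fully contains fragment i.
        for oy in range(bottom - t_h + 1, top + 1):
            for ox in range(right - t_w + 1, left + 1):
                box = (oy, ox, oy + t_h - 1, ox + t_w - 1)
                inside = [j for j in order[1:]
                          if j not in used and contained(bbox(components[j]), box)]
                pixels = set().union(*(components[j] for j in inside))
                placed = {(r + oy, c + ox) for r, c in template}
                score = jaccard(placed, pixels)
                if best is None or score > best[0]:
                    best = (score, box, inside)
        if best is not None:
            used.update(best[2])
            results.append(best)
    return results


# An intact "L" and the same letter broken into a stem and a foot.
perfect_l = {(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)}
frag_stem = {(10, 10), (11, 10), (12, 10)}
frag_foot = {(13, 11), (13, 12)}
res = reconstruct([frag_stem, perfect_l, frag_foot])
```

In this toy case the best placement of the intact "L" covers both fragments at box (10, 10, 13, 12), so they are regrouped into one character despite the missing pixel between them.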