Bertrand Luvison - Academia.edu
Papers by Bertrand Luvison
2023 IEEE International Conference on Image Processing (ICIP)
Artificial Intelligence in Agriculture
2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), Dec 15, 2021
Detecting human interactions is crucial for human behavior analysis. Many methods have been proposed to deal with Human-to-Object Interaction (HOI) detection, i.e., detecting in an image which person and object interact together and classifying the type of interaction. However, Human-to-Human Interactions, such as social and violent interactions, are generally not considered in available HOI training datasets. As we believe these types of interactions cannot be ignored and decorrelated from HOI when analyzing human behavior, we propose a new interaction dataset that deals with both types of human interactions: Human-to-Human-or-Object (H2O). In addition, we introduce a novel taxonomy of verbs, intended to be closer to a description of human body attitude in relation to the surrounding targets of interaction, and more independent of the environment. Unlike some existing datasets, we strive to avoid defining synonymous verbs whose use highly depends on the target type or requires a high level of semantic interpretation. As the H2O dataset includes V-COCO images annotated with this new taxonomy, images naturally contain more interactions. This can be an issue for HOI detection methods whose complexity depends on the number of people, targets or interactions. Thus, we propose DIABOLO (Detecting InterActions By Only Looking Once), an efficient subject-centric single-shot method that detects all interactions in one forward pass, with constant inference time independent of image content. In addition, this multi-task network simultaneously detects all people and objects. We show that sharing a network for these tasks not only saves computation resources but also improves performance.
Finally, DIABOLO is a strong baseline for the newly proposed challenge of H2O interaction detection, as it outperforms all state-of-the-art methods when trained and evaluated on the HOI dataset V-COCO. We hope that this new dataset and baseline will foster future research. H2O is available at https://kalisteo.cea.fr/.
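The subject-centric single-shot idea above can be illustrated with a minimal decoding sketch. This is not the authors' implementation; the function name, array layouts and thresholds are all assumptions. The point it shows is that once a network has produced dense per-cell outputs, every interaction in the image can be read out in a single pass over the grid, with cost bounded by the grid size rather than by the number of people or interactions:

```python
import numpy as np

def decode_interactions(person_conf, verb_scores, target_offsets,
                        person_thr=0.5, verb_thr=0.5):
    """Decode all interactions from dense per-cell predictions in one pass.

    person_conf:    (H, W)    confidence that a subject is centred on this cell
    verb_scores:    (H, W, V) per-cell scores for V interaction verbs
    target_offsets: (H, W, 2) predicted (dy, dx) from the subject cell to its target
    Returns a list of (subject_cell, verb_index, target_cell) triples.
    """
    interactions = []
    subj_cells = np.argwhere(person_conf > person_thr)  # all subjects at once
    for (y, x) in subj_cells:
        verbs = np.flatnonzero(verb_scores[y, x] > verb_thr)
        ty = int(round(y + target_offsets[y, x, 0]))
        tx = int(round(x + target_offsets[y, x, 1]))
        for v in verbs:
            interactions.append(((int(y), int(x)), int(v), (ty, tx)))
    return interactions

# Tiny synthetic grid: one subject at (1, 1) whose target lies at (1, 3).
H, W, V = 4, 4, 3
person_conf = np.zeros((H, W)); person_conf[1, 1] = 0.9
verb_scores = np.zeros((H, W, V)); verb_scores[1, 1, 2] = 0.8
target_offsets = np.zeros((H, W, 2)); target_offsets[1, 1] = (0.0, 2.0)

print(decode_interactions(person_conf, verb_scores, target_offsets))
# one interaction: subject (1, 1), verb 2, target (1, 3)
```

Because the loop visits only the fixed grid, inference-time decoding does not grow with the number of interactions in the image, which is the property the abstract emphasises.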
In this paper, we introduce a novel human interaction detection approach, based on CALIPSO (Classifying ALl Interacting Pairs in a Single shOt), a classifier of human-object interactions. This new single-shot interaction classifier estimates interactions simultaneously for all human-object pairs, regardless of their number and class. State-of-the-art approaches adopt a multi-shot strategy based on a pairwise estimate of interactions for a set of human-object candidate pairs, which leads to a complexity depending, at least, on the number of interactions or, at most, on the number of candidate pairs. In contrast, the proposed method estimates the interactions on the whole image. Indeed, it simultaneously estimates all interactions between all human subjects and object targets by performing a single forward pass over the image. Consequently, it leads to a constant complexity and computation time independent of the number of subjects, objects or interactions in the image. In detail, interaction classification is achieved on a dense grid of anchors thanks to a joint multi-task network that learns three complementary tasks simultaneously: (i) prediction of the types of interaction, (ii) estimation of the presence of a target and (iii) learning of an embedding which maps interacting subject and target to the same representation, using a metric learning strategy. In addition, we introduce an object-centric passive-voice verb estimation which significantly improves results. Evaluations on the two well-known Human-Object Interaction image datasets, V-COCO and HICO-DET, demonstrate the competitiveness of the proposed method (2nd place) compared to the state of the art while having constant computation time regardless of the number of objects and interactions in the image.
This work proposes a new interaction detection approach, named CALIPSO (Classifying ALl Interacting Pairs in a Single shOt), whose complexity is independent of the number of interactions. The proposed model simultaneously estimates all interactions between all objects with a…
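Task (iii) above, the metric-learning embedding, can be sketched in a few lines. The function name and the toy embeddings are assumptions for illustration only; the idea is that training pulls the embeddings of an interacting subject and its target toward the same point, so at inference a subject can be paired with its target by a simple nearest-neighbour search in embedding space:

```python
import numpy as np

def match_pairs(subject_emb, target_emb):
    """Pair each subject with the target whose embedding lies closest.

    subject_emb: (S, D) one embedding per detected human subject
    target_emb:  (T, D) one embedding per detected object target
    Returns an array of length S with the matched target index per subject.
    """
    # Pairwise squared Euclidean distances, shape (S, T), via broadcasting.
    d = ((subject_emb[:, None, :] - target_emb[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

# Two interacting pairs: embeddings cluster near (0, 1) and near (5, 5).
subjects = np.array([[0.0, 1.0], [5.0, 5.0]])
targets  = np.array([[5.1, 4.9], [0.1, 0.9]])
print(match_pairs(subjects, targets))   # subject 0 -> target 1, subject 1 -> target 0
```

The distance matrix is computed in one vectorised step, so the pairing itself adds negligible cost on top of the single forward pass.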
arXiv (Cornell University), Jan 7, 2021
Figure 1: Qualitative results of PandaNet on the JTA dataset [8], which consists of images with many people (up to 60), a large proportion of people at low resolution, and many occlusion situations. Most previous 3D human pose estimation studies focus on the single-person case or estimate the 3D pose of a few people at high resolution. In this paper, we propose an anchor-based, single-shot multi-person 3D pose estimation framework that allows the pose estimation of a large number of people at low resolution. Ground-truth translations and scales are used for visualisation.
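The anchor-based decoding that such a framework relies on can be illustrated with a minimal sketch. This is not PandaNet's actual code; the function name, tensor shapes and thresholds are assumptions. It shows the general pattern: each anchor regresses per-joint 2D offsets and depths, and every confident anchor yields a full 3D pose in one vectorised step, whatever the number of people:

```python
import numpy as np

def decode_poses(anchor_centers, joint_offsets, depths, scores, thr=0.5):
    """Turn per-anchor regressions into 3D poses for every confident anchor.

    anchor_centers: (A, 2)     (x, y) image position of each anchor
    joint_offsets:  (A, J, 2)  per-joint 2D offsets from the anchor centre
    depths:         (A, J)     per-joint depth estimates
    scores:         (A,)       confidence that a person is tied to this anchor
    Returns a (K, J, 3) array of (x, y, z) joints for the K kept anchors.
    """
    keep = scores > thr
    xy = anchor_centers[keep][:, None, :] + joint_offsets[keep]  # (K, J, 2)
    z = depths[keep][..., None]                                  # (K, J, 1)
    return np.concatenate([xy, z], axis=-1)

# Two anchors, two joints; only the first anchor is confident.
centers = np.array([[10.0, 20.0], [50.0, 60.0]])
offsets = np.array([[[1.0, 0.0], [0.0, 2.0]],
                    [[0.0, 0.0], [0.0, 0.0]]])
depths  = np.array([[3.0, 4.0], [0.0, 0.0]])
scores  = np.array([0.9, 0.1])
poses = decode_poses(centers, offsets, depths, scores)
print(poses.shape)   # (1, 2, 3): one person, two joints, (x, y, z) each
```

Since each person occupies one anchor, crowded images with dozens of people decode at the same cost as images with one.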
2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
This thesis is a collaboration between the LAboratoire des Sciences et Matériaux pour l'Électronique et d'Automatique (LASMEA) in Clermont-Ferrand and the Laboratoire Vision et Ingénierie des Contenus (LVIC) of CEA LIST in Saclay. The first half of the thesis was carried out within the ComSee team of LASMEA and the second at LVIC. The goal of this work is to design a real-time video-assistance system for detecting events in potentially dense scenes. Intelligent video surveillance of dense scenes such as crowds is particularly difficult, mainly because of their complexity and the large amount of data to be processed simultaneously. The aim of this thesis is to develop a method for detecting rare events in such scenes, observed from a fixed camera. The method relies on automatic motion analysis and requires no a priori information. Nominal motions are determined…
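The principle of learning nominal motions and flagging departures from them can be sketched very simply. The functions below are purely illustrative assumptions, not the thesis method: nominal motion is summarised as a direction histogram learned from observation, and a motion whose direction falls in a rarely seen bin is flagged as a candidate rare event:

```python
import numpy as np

def learn_nominal(directions, n_bins=8):
    """Build a normalised histogram of observed motion directions (radians)."""
    bins = ((directions + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins, minlength=n_bins).astype(float)
    return hist / hist.sum()

def is_rare(direction, nominal, thr=0.05, n_bins=8):
    """Flag a motion direction whose nominal frequency falls below thr."""
    b = int((direction + np.pi) / (2 * np.pi) * n_bins) % n_bins
    return nominal[b] < thr

# Nominal flow mostly points right (angles near 0 radians).
rng = np.random.default_rng(0)
train = rng.normal(0.0, 0.2, size=1000)
nominal = learn_nominal(train)
print(is_rare(0.0, nominal), is_rare(np.pi, nominal))  # False True
```

No labels or prior scene knowledge are needed: the model is built entirely from the motions the fixed camera observes, which matches the "no a priori information" constraint stated above.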
arXiv (Cornell University), Nov 11, 2019
In this paper, we propose a new single-shot method for multi-person 3D human pose estimation in complex images. The model jointly learns to locate the human joints in the image, to estimate their 3D coordinates and to group these predictions into full human skeletons. The proposed method deals with a variable number of people and does not need bounding boxes to estimate the 3D poses. It leverages and extends the Stacked Hourglass Network and its multi-scale feature learning to manage multi-person situations. Thus, we exploit a robust 3D human pose formulation to fully describe several 3D human poses even in the case of strong occlusions or crops. Then, joint grouping and human pose estimation for an arbitrary number of people are performed using the associative embedding method. Our approach significantly outperforms the state of the art on the challenging CMU Panoptic dataset. Furthermore, it leads to good results on the complex and synthetic images from the newly proposed JTA dataset.
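The associative embedding grouping step mentioned above can be sketched as a greedy clustering of scalar tags. This is an illustrative simplification, not the paper's implementation (names and the threshold are assumptions): the network learns to emit, for every detected joint, a tag that is similar for joints of the same person, so skeletons can be assembled by attaching each joint to the cluster with the nearest tag:

```python
import numpy as np

def group_joints(joint_tags, tag_thr=0.5):
    """Group detected joints into skeletons by embedding-tag proximity.

    joint_tags: list over joint types; each entry is a 1-D array of scalar
                tags, one per detection of that joint type.
    Returns a list of skeletons, each a dict {joint_type: tag}.
    """
    skeletons = []   # one dict of joints per person found so far
    means = []       # reference tag of each skeleton
    for jt, tags in enumerate(joint_tags):
        for tag in tags:
            # Attach to the closest existing skeleton, or start a new one.
            if means:
                k = int(np.argmin([abs(tag - m) for m in means]))
                if abs(tag - means[k]) < tag_thr:
                    skeletons[k][jt] = float(tag)
                    continue
            skeletons.append({jt: float(tag)})
            means.append(float(tag))
    return skeletons

# Two people: tags cluster around 1.0 and around 3.0 across joint types.
tags = [np.array([1.0, 3.0]),      # joint type 0 (e.g. head)
        np.array([3.1, 0.9])]      # joint type 1 (e.g. neck)
print(len(group_joints(tags)))     # 2 skeletons
```

Because grouping is a post-processing step over per-joint tags, the detector itself never needs per-person bounding boxes, which is the property the abstract highlights.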
2020 IEEE Winter Conference on Applications of Computer Vision (WACV), 2020
Method for locating objects of interest in a scene (10) observed by an image acquisition system (11), the objects of interest being located using an initial presence map (12) that models positions i in the scene and comprises, for each position i, a value representing the probability that an object of interest is at that position, each value being obtained from location criteria defined in an image space of the imaging system. The method is characterized in that it comprises iterating the following successive steps while at least one value of the presence map considered at the current iteration k is greater than a predetermined threshold: determining (222) the position nk of the presence map for which the value is maximum, an object of interest being considered present at said position nk; then, using the atoms Aj defined by default for each position j of the presence map…
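The iterative procedure described in this patent abstract resembles a greedy, matching-pursuit-style peak extraction. The sketch below is an assumed simplification in one dimension (function name, atom and threshold are all illustrative): while the map's maximum exceeds the threshold, record that position as a detection and subtract the atom, i.e. the expected footprint of one object, centred on it, so nearby responses do not trigger duplicate detections:

```python
import numpy as np

def locate_objects(presence, atom, thr=0.2, max_iter=10):
    """Greedy extraction of object positions from a 1-D presence map."""
    presence = presence.copy()
    half = len(atom) // 2
    found = []
    for _ in range(max_iter):
        n = int(np.argmax(presence))           # step (222): current maximum
        if presence[n] <= thr:                 # stop once below the threshold
            break
        found.append(n)
        # Subtract the atom centred on n, clipped to the map's bounds.
        lo, hi = max(0, n - half), min(len(presence), n + half + 1)
        presence[lo:hi] -= atom[(lo - n + half):(hi - n + half)]
    return found

presence = np.array([0.0, 0.1, 0.9, 0.1, 0.0, 0.8, 0.1, 0.0])
atom = np.array([0.1, 0.9, 0.1])   # footprint of a single object
print(locate_objects(presence, atom))   # [2, 5]
```

Each iteration removes one object's contribution from the map, so the loop naturally terminates once only sub-threshold residue remains.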
2019 IEEE International Conference on Image Processing (ICIP), 2019
Dufour/Intelligent Video Surveillance Systems, 2013
2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
IEEE Transactions on Circuits and Systems for Video Technology, 2016