Farkhod Makhmudkhujaev | Tashkent University of Information Technologies
Papers by Farkhod Makhmudkhujaev
IEEE Access, Dec 31, 2022
Transforming the apparent age of human faces in videos has not been adequately addressed, owing to the difficulty of preserving spatial and temporal consistency. The task is further complicated by the scarcity of video datasets featuring the same individuals across various age groups. To address these issues, we introduce Re-Aging GAN++ (RAGAN++), a unified framework that performs facial age transformation in videos using a novel GAN-based model trained on still-image data. First, the modulation process acquires multi-scale personalized age features that depict the attributes of the target age group. The encoder then applies Gaussian smoothing at each scale, ensuring seamless frame-to-frame transitions that account for inter-frame variations such as facial motion within the camera's field of view. Remarkably, the proposed model performs facial age transformation in videos despite being trained exclusively on image data. Our method exhibits strong spatio-temporal consistency in facial identity, expression, and pose while maintaining natural variation across age groups. INDEX TERMS Video generation, age manipulation, GAN, spatio-temporal consistency.
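The per-scale temporal smoothing idea can be sketched as follows: a stack of per-frame feature maps is filtered along the time axis with a Gaussian kernel so that small frame-to-frame jitter is damped. This is a minimal illustration assuming feature stacks of shape (T, C, H, W); the kernel size and sigma are placeholder values, not the paper's.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """1-D Gaussian kernel, normalized to sum to 1."""
    x = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def temporally_smooth(features, size=5, sigma=1.0):
    """Smooth a (T, C, H, W) stack of per-frame feature maps along time.

    Edge frames are handled by reflecting the sequence at its borders.
    """
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    # Reflect-pad the time axis so every frame has a full window.
    padded = np.concatenate(
        [features[pad:0:-1], features, features[-2:-pad - 2:-1]], axis=0
    )
    out = np.zeros_like(features, dtype=float)
    for t in range(features.shape[0]):
        window = padded[t:t + size]               # (size, C, H, W)
        out[t] = np.tensordot(k, window, axes=(0, 0))
    return out
```

A constant sequence passes through unchanged, while an isolated spike in one frame is spread over its neighbors, which is exactly the behavior needed to suppress flicker between frames.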
2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021
To God, who guides me; to my parents, Anderson and Gina, who make me better every day; to Lucas, who supports me; to my grandmother, Maria, and my late grandfathers, Abdias and Benedito, for believing in me; and to all my family and friends. ACKNOWLEDGMENTS I thank, first of all, God for the gift of life, for my health, and for guiding and helping me in every moment of my life. "The Lord is my shepherd; I shall not want" (Psalm 23). I thank my parents, Anderson and Gina, my faithful protectors, for my life, opportunities, teachings, and education, for the most beautiful and complete love, for believing in my achievements, and for always standing by my side. For you I am capable of anything. I thank Lucas, my boyfriend, for supporting everything I do, for the friendly embrace in times of pain, and for the smile in moments of happiness. Your courage and determination inspire me to grow. I thank my relatives for giving me a blessed family, in particular my grandmother Maria and my late grandfathers, Abdias and Benedito, for always believing in me and forming such a beautiful family. I thank my cousins for being part of my life and always being my faithful friends. I thank Professor Rildo e Silva, my advisor, for all his teachings and his help in completing this work. I thank my friends for the moments of joy.
IEEE Transactions on Affective Computing, 2020
IEEE Transactions on Affective Computing, 2018
Currently available local feature descriptors used in facial expression recognition at times suffer from unstable feature descriptions, especially in the presence of weak and distorted edges caused by noise, which limits their performance. We propose a novel local descriptor, the Neighborhood-aware Edge Directional Pattern (NEDP), to overcome such limitations. Instead of relying solely on the local neighborhood to describe the feature around a pixel, as existing local descriptors do, NEDP examines the gradients at the target (center) pixel as well as at its neighboring pixels, exploring a wider neighborhood so that the feature remains consistent despite subtle distortion and noise in the local region. We introduce template orientations for the neighboring pixels, which emphasize gradients along consistent edge directions, prioritizing the neighbors that fall in the direction of the local edge so that the shape of the local texture is represented unambiguously. Moreover, owing to its effective handling of featureless regions, no such region is erroneously encoded as a feature by NEDP. Experiments on person-independent recognition on benchmark expression datasets show that NEDP outperforms existing descriptors and thereby improves the overall performance of facial expression recognition.
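The core idea, examining edge responses at the neighbors rather than trusting the center pixel alone, can be sketched with a toy encoder: a neighbor's bit is set only when that neighbor's own gradient magnitude clears a threshold, so flat regions encode as 0 instead of noise. This is a simplified illustration under assumed parameters, not the exact published NEDP encoding.

```python
import numpy as np

# Offsets of the 8 neighbours, clockwise from the top-left.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

def edge_response(img, y, x):
    """Central-difference gradient magnitude at (y, x)."""
    gy = float(img[y + 1, x]) - float(img[y - 1, x])
    gx = float(img[y, x + 1]) - float(img[y, x - 1])
    return np.hypot(gx, gy)

def nedp_like_code(img, y, x, thresh=8.0):
    """Toy neighbourhood-aware edge pattern for the pixel at (y, x).

    Each neighbour contributes a bit only if its own edge response
    exceeds the threshold, so featureless regions map to code 0.
    """
    bits = 0
    for i, (dy, dx) in enumerate(OFFSETS):
        if edge_response(img, y + dy, x + dx) > thresh:
            bits |= 1 << i
    return bits
```

On a uniform patch the code is 0, while near a step edge the bits of the neighbors lying on the edge are set, which is the stability property the abstract describes.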
TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES, 2019
Signal Processing: Image Communication, 2019
Local edge-based descriptors have gained much attention as feature extraction methods for facial expression recognition. However, such descriptors suffer from unstable shape representations of different local structures because of their sensitivity to local distortions such as noise and positional variations. We propose a novel edge-based descriptor, the Local Prominent Directional Pattern (LPDP), which considers statistical information of a pixel neighborhood to encode more meaningful and reliable information than existing descriptors. More specifically, LPDP examines the local neighborhood of a pixel to retrieve significant edges corresponding to the local shape, thereby encoding edge information despite some positional variation while avoiding noisy edges. LPDP can thus represent important textured regions much more effectively for facial expression recognition. Extensive experiments on well-known facial expression datasets demonstrate that LPDP is more robust than existing descriptors in extracting the various local structures produced by facial expression changes.
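The statistical flavor of the idea can be sketched as a voting scheme: instead of trusting the center pixel's gradient direction alone, the quantized directions over the 3x3 neighborhood vote, and the most frequent ("prominent") direction becomes the code, which damps noisy or slightly shifted edges. This is a simplified illustration of the principle, not the published LPDP encoding.

```python
import numpy as np

def dominant_direction(img, y, x):
    """Quantize the gradient direction at (y, x) into 8 bins."""
    gy = float(img[y + 1, x]) - float(img[y - 1, x])
    gx = float(img[y, x + 1]) - float(img[y, x - 1])
    angle = np.arctan2(gy, gx) % (2 * np.pi)
    return int(angle // (np.pi / 4)) % 8

def lpdp_like_code(img, y, x):
    """Toy 'prominent direction' code for the pixel at (y, x).

    The 3x3 neighbourhood votes on the quantized gradient direction;
    the most frequent direction wins, so a single noisy pixel cannot
    flip the code.
    """
    votes = np.zeros(8, dtype=int)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            votes[dominant_direction(img, y + dy, x + dx)] += 1
    return int(np.argmax(votes))
```

On a clean vertical step edge the rightward gradient (bin 0) wins; on a horizontal edge the downward gradient (bin 2) wins, so the code tracks the true local edge direction.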
2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2017
Recent background subtraction methods fuse multiple features to achieve consistent performance in non-stationary environments. During segmentation, most of these methods apply logical operators (AND or OR) to the results of each observed feature. Using logical operators has two critical problems: i) the AND operator may reduce true positives, and ii) the OR operator may increase false positives. In this paper, we address these issues by proposing a new way to fuse multiple features in which each feature is treated adaptively according to its influence on decision making. We conduct quantitative experiments on the CDnet 2012 dataset and observe that the proposed feature fusion strategy outperforms the existing logical-operator-based fusion approach. We use edge and color features in the fusion, where the Local Directional Number Pattern (LDN) serves as the edge feature on each RGB color channel. Our comparative analysis shows that the proposed fusion strategy with color (RGB) and edge (LDN) features outperforms existing state-of-the-art methods.
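The contrast with AND/OR fusion can be sketched as a weighted vote: each feature produces a per-pixel foreground probability map, features vote in proportion to a reliability weight, and the weighted average is thresholded. The weights here are assumed to be supplied by the caller (e.g., from each feature's recent agreement with the final segmentation); this is a generic sketch of adaptive fusion, not the paper's exact rule.

```python
import numpy as np

def fuse_masks(prob_maps, weights, thresh=0.5):
    """Adaptively fuse per-feature foreground probability maps.

    Unlike a hard AND (which can drop true positives) or OR (which can
    add false positives), each feature contributes in proportion to its
    reliability weight, and the weighted average is thresholded.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                # normalize the vote
    stacked = np.stack(prob_maps, axis=0)          # (F, H, W)
    fused = np.tensordot(w, stacked, axes=(0, 0))  # (H, W)
    return fused >= thresh
```

With a confident color feature (p = 0.9) and an uncertain edge feature (p = 0.2), AND would reject the pixel and OR would accept it regardless of reliability; the weighted vote instead follows whichever feature currently deserves more trust.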
IEEE Access
Existing work in image synthesis has shown the effectiveness of attention mechanisms for generating natural-looking images. Despite their informativeness, current works apply such mechanisms only at a single scale of the generative and discriminative networks. Intuitively, using more attention should yield better performance; however, due to memory constraints, even moving a single attention mechanism to a higher scale of the network is infeasible. Motivated by the importance of attention in image generation, we tackle this limitation by proposing a generative adversarial network-based framework that readily incorporates attention mechanisms at every scale of its networks. The attention mechanism's straightforward structure allows it to be plugged in scale-wise and trained jointly with the adversarial networks. As a result, the networks are forced to focus on relevant regions of the feature maps learned at every scale, improving their image representation power. In addition, we exploit multiscale attention features as a complementary feature set in discriminator training. We demonstrate qualitatively and quantitatively that introducing scale-wise attention mechanisms benefits competitive networks, improving performance over current works. INDEX TERMS Image synthesis, generative adversarial networks, attention, multiscale.
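A lightweight attention block of the kind that can be repeated at every scale might look like the following: an attention map is built from cheap channel statistics (mean and max over channels), squashed with a sigmoid, and used to re-weight every channel. Because the map costs O(H*W) memory regardless of channel count, the same block fits at all scales. This is a generic sketch of such a plug-in, not the paper's exact module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(feat):
    """Lightweight spatial attention for one scale's feature map.

    feat: (C, H, W).  Channel mean and max pooling give a cheap (H, W)
    saliency summary; a sigmoid turns it into an attention map in
    (0, 1) that re-weights every channel of the input.
    """
    avg = feat.mean(axis=0)            # (H, W) channel-mean pooling
    mx = feat.max(axis=0)              # (H, W) channel-max pooling
    attn = sigmoid(avg + mx)           # (H, W), values in (0, 1)
    return feat * attn[None, :, :], attn
```

In a real model the pooled statistics would pass through a small learned layer before the sigmoid; the point of the sketch is the memory profile, which is independent of the channel dimension and therefore tolerable at every scale.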
Segmentation of foreground objects using background subtraction is widely used in application areas such as surveillance, tracking, and human pose estimation. Many background subtraction methods construct a background model pixel-wise using color information, which is sensitive to illumination variations. Recently, a number of local feature descriptors have been successfully applied to overcome such issues. However, these descriptors still suffer from over-sensitivity and are sometimes unable to differentiate local structures. To tackle these problems of existing descriptors, we propose a novel edge-based descriptor, the Local Top Directional Pattern (LTDP), which represents local structures in pattern form with the aid of compass masks that provide information about the top local directional variations. Moreover, to strengthen the robustness of the pixel-wise background model and let the features benefit from each other, we combine both...
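The compass-mask, top-directions idea can be sketched as follows: all eight Kirsch compass responses are computed on a 3x3 patch, and the bits of the k strongest directions are set, giving a pattern driven by the top local directional variations. This is a simplified illustration of the LTDP idea (k is an assumed parameter), not the published encoding.

```python
import numpy as np

# Kirsch compass masks for 8 edge directions (E, NE, N, NW, W, SW, S, SE).
KIRSCH = [np.array(m) for m in (
    [[-3, -3, 5], [-3, 0, 5], [-3, -3, 5]],
    [[-3, 5, 5], [-3, 0, 5], [-3, -3, -3]],
    [[5, 5, 5], [-3, 0, -3], [-3, -3, -3]],
    [[5, 5, -3], [5, 0, -3], [-3, -3, -3]],
    [[5, -3, -3], [5, 0, -3], [5, -3, -3]],
    [[-3, -3, -3], [5, 0, -3], [5, 5, -3]],
    [[-3, -3, -3], [-3, 0, -3], [5, 5, 5]],
    [[-3, -3, -3], [-3, 0, 5], [-3, 5, 5]],
)]

def ltdp_like_code(patch, k=3):
    """Toy top-directional pattern for a 3x3 patch.

    The k strongest compass responses set their direction bits, so the
    code reflects the dominant local directional variations.
    """
    responses = np.array([(m * patch).sum() for m in KIRSCH])
    top = np.argsort(responses)[-k:]    # indices of the k largest
    code = 0
    for d in top:
        code |= 1 << int(d)
    return code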
IEEE Access, 2021
In this paper, we tackle the well-known problem of dataset construction from the standpoint of generating the data with generative adversarial networks (GANs). Because the semantic information of a dataset must be properly aligned with its images, controlling the image generation process of the GAN becomes the primary concern. We therefore focus on conditioning the generative process using conditional information alone to achieve reliable control over image generation. Unlike existing works, which consider the input (noise or an image) in conjunction with conditions, our work transforms the input directly into the conditional space using only the given conditions. By doing so, we reveal the relations between conditions and determine their distinct and reliable feature space, free of the influence of the input. To fully leverage the conditional information, we propose a novel architectural framework (conditional transformation) that learns features solely from a set of conditions to guide a generative model by transforming the generator's input. This approach makes it possible to control the generator by setting its inputs according to the conditions required for semantically correct image generation. Because the framework operates at the initial stage of generation, it can be plugged into any existing generative model and trained end-to-end together with the generator. Extensive experiments on tasks such as novel image synthesis and image-to-image translation demonstrate that conditional transformation of the inputs provides solid control over the image generation process, showing its applicability to dataset construction.
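The idea of deriving the generator's input purely from conditions can be sketched as a small transform that maps a condition vector into the latent space, with the result fed to the generator in place of a condition-concatenated noise vector. The weights below are random stand-ins for learned ones, and the layer sizes are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

class ConditionalTransform:
    """Map a condition vector directly into the generator's input space.

    The conditions alone pass through a small transform whose output
    becomes the generator's input, so generation is steered purely by
    the conditions rather than by noise concatenated with them.
    """

    def __init__(self, cond_dim, latent_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0, 0.1, (cond_dim, 64))
        self.w2 = rng.normal(0, 0.1, (64, latent_dim))

    def __call__(self, cond):
        h = np.maximum(0.0, cond @ self.w1)   # ReLU hidden layer
        return h @ self.w2                    # generator input vector

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v
```

Usage would be `z = ConditionalTransform(10, 128)(one_hot(3, 10))`, with `z` passed to the generator; because the transform sits before the generator, it can be prepended to an existing model and trained jointly with it.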
2019 IEEE International Conference on Consumer Electronics (ICCE)