Shaozu Yuan - Academia.edu
Papers by Shaozu Yuan
2022 IEEE International Conference on Multimedia and Expo (ICME)
Previous works on font generation mainly focus on standard print fonts, where the characters' shapes are stable and the strokes are clearly separated. Research on brush handwriting font generation, which involves holistic structural changes and complex stroke transfer, remains rare. To address this issue, we propose a novel GAN-based image translation model that integrates skeleton information. We first extract the skeleton from the training images, then design an image encoder and a skeleton encoder to extract the corresponding features. A self-attentive refined attention module is devised to guide the model to learn distinctive features across domains. A skeleton discriminator first synthesizes the skeleton image from the generated image with a pre-trained generator, and then judges its realness against the target skeleton. We also contribute a large-scale brush handwriting font image dataset with six styles and 15,000 high-resolution images. Both quantitative and qualitative experimental results demonstrate the competitiveness of our proposed model.
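The abstract describes the skeleton-guided pipeline only at a high level. As a rough illustration of the assumed preprocessing step, a one-pixel-wide stroke skeleton can be extracted from each training glyph with standard morphological thinning; the file name below is hypothetical and this is not the authors' code.

```python
# Minimal sketch (not the authors' code): extracting a stroke skeleton from a
# glyph image, as assumed preprocessing for the skeleton encoder and
# skeleton discriminator described above.
import numpy as np
from skimage import io
from skimage.filters import threshold_otsu
from skimage.morphology import skeletonize

def extract_skeleton(path: str) -> np.ndarray:
    """Return a binary skeleton image (ink = True) for one glyph image."""
    gray = io.imread(path, as_gray=True)   # H x W floats in [0, 1]
    ink = gray < threshold_otsu(gray)      # assume dark strokes on light paper
    return skeletonize(ink)                # 1-pixel-wide stroke skeleton

skeleton = extract_skeleton("glyph_0001.png")  # hypothetical sample image
```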
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
In this paper, we present a novel system (denoted as Polaca) to generate poetic Chinese landscape paintings with calligraphy. Unlike previous single image-to-image painting generation, Polaca takes classic poetry as input and outputs an artistic landscape painting image with the corresponding calligraphy. It is equipped with three modules to complete the whole piece of landscape painting artwork: a text-to-image module that generates the landscape painting image, an image-to-image module that generates the stylistic calligraphy image, and an image fusion module that fuses the two images into a single aesthetic artwork.
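As a hedged sketch of how the three modules could be chained, the interfaces and method names below are assumptions for illustration, not the Polaca API.

```python
# Hypothetical wiring of a Polaca-style three-stage pipeline.
from typing import Protocol
from PIL import Image

class TextToImage(Protocol):
    def generate(self, poem: str) -> Image.Image: ...

class ImageToImage(Protocol):
    def stylize(self, poem: str) -> Image.Image: ...

class Fuser(Protocol):
    def fuse(self, painting: Image.Image, calligraphy: Image.Image) -> Image.Image: ...

def compose_artwork(poem: str, painter: TextToImage,
                    calligrapher: ImageToImage, fuser: Fuser) -> Image.Image:
    painting = painter.generate(poem)          # module 1: poem -> landscape painting
    calligraphy = calligrapher.stylize(poem)   # module 2: poem -> stylistic calligraphy image
    return fuser.fuse(painting, calligraphy)   # module 3: fuse into the final artwork
```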
Human conversations are complicated, and building a human-like dialogue agent is an extremely challenging task. With the rapid development of deep learning techniques, data-driven models have become increasingly prevalent, and they require large amounts of real conversation data. In this paper, we construct a large-scale, real-scenario Chinese e-commerce conversation corpus, JDDC, with more than 1 million multi-turn dialogues, 20 million utterances, and 150 million words. The dataset reflects several characteristics of human-human conversations, e.g., being goal-driven and having long-term dependencies across the context. It also covers various dialogue types, including task-oriented dialogue, chitchat, and question answering. Extra intent information and three well-annotated challenge sets are also provided. We then evaluate several retrieval-based and generative models to provide baseline benchmark performance on the JDDC corpus, and we hope JDDC can serve as an effective testbed and benefit the development of fundamenta...
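For context, a minimal retrieval-based baseline of the kind benchmarked on JDDC might index (context, response) pairs with TF-IDF and reply with the response of the most similar stored context; the toy data below is hypothetical and not drawn from the corpus, and this is not the authors' implementation.

```python
# Minimal sketch of a TF-IDF retrieval baseline over (context, response) pairs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy pairs standing in for JDDC training data (JDDC itself is Chinese).
pairs = [
    ("my package has not arrived yet", "let me check the logistics status for you"),
    ("can I get an invoice for this order", "sure, I will reissue the invoice"),
]
contexts, responses = zip(*pairs)

vectorizer = TfidfVectorizer()               # character n-grams could be used for Chinese text
index = vectorizer.fit_transform(contexts)   # sparse TF-IDF matrix over training contexts

def retrieve_response(query_context: str) -> str:
    scores = cosine_similarity(vectorizer.transform([query_context]), index)
    return responses[scores.argmax()]        # answer with the nearest context's response

print(retrieve_response("where is my package"))
```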
Proceedings of the 28th ACM International Conference on Multimedia, 2020
We present a novel Chinese calligraphy artwork composition system (MaLiang) which can generate aesthetic, stylistic, and diverse calligraphy images based on the emotional status of the input text. Different from previous research, it is the first work to endow calligraphy synthesis with the ability to express changing emotions and to compose a whole piece of discourse-level calligraphy artwork instead of single character images. The system consists of three modules: emotion detection, character image generation, and layout prediction. As a creative form of interactive art, MaLiang has been exhibited at several well-known international art festivals.
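The emotion detection module is not specified in detail here; a minimal stand-in could be a mean-pooled embedding classifier over the input text, as sketched below (sizes and architecture are assumptions, not the MaLiang design).

```python
# Hypothetical emotion classifier for the emotion-detection stage.
import torch
import torch.nn as nn

class EmotionClassifier(nn.Module):
    def __init__(self, vocab_size: int, num_emotions: int, dim: int = 128):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)   # mean-pooled token embeddings
        self.head = nn.Linear(dim, num_emotions)        # one logit per emotion label

    def forward(self, token_ids: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
        return self.head(self.embed(token_ids, offsets))

model = EmotionClassifier(vocab_size=8000, num_emotions=6)        # hypothetical sizes
logits = model(torch.tensor([3, 17, 52, 9]), torch.tensor([0]))   # one toy "sentence"
emotion = logits.argmax(dim=-1)
```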
Proceedings of the 29th ACM International Conference on Multimedia, 2021
Emotion plays a critical role in calligraphy composition; it is what makes a calligraphy artwork impressive and gives it a soul. However, previous research on calligraphy generation neglected emotion as a major contributor to the artistry of calligraphy. This defect prevents those methods from generating aesthetic, stylistic, and diverse calligraphy artworks; they yield only static handwriting font libraries instead. To address this problem, we propose a novel cross-modal approach that automatically generates stylistic and diverse Chinese calligraphy artwork driven by different emotions. We first detect the emotions in the text with a classifier, then generate the emotional Chinese character images via a novel modified Generative Adversarial Network (GAN) structure, and finally predict the layout for all character images with a recurrent neural network. We also collect a large-scale stylistic Chinese calligraphy image dataset with rich emotions. Experimental results demonstrate that our model significantly outperforms all baseline image translation models for different emotional styles in terms of content accuracy and style discrepancy. Moreover, our layout algorithm can learn the patterns and habits of calligraphers, making the generated calligraphy more artistic. To the best of our knowledge, we are the first to work on emotion-driven discourse-level Chinese calligraphy artwork composition. CCS CONCEPTS • Applied computing → Fine arts; • Computing methodologies → Image representations.
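The layout prediction step could, for illustration, be a recurrent network that maps per-character features to a position and scale for each glyph; the parameterization below is an assumption, not the paper's exact design.

```python
# Hypothetical recurrent layout predictor for discourse-level composition.
import torch
import torch.nn as nn

class LayoutRNN(nn.Module):
    def __init__(self, feat_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 3)     # (x, y, scale) for each character

    def forward(self, char_feats: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(char_feats)         # char_feats: (batch, seq_len, feat_dim)
        return self.out(h)                  # layout parameters: (batch, seq_len, 3)

layout = LayoutRNN()(torch.randn(1, 12, 64))   # toy layout for a 12-character column
```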
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2021
Sketching plays a critical role in the human art creation process. As one of its functions, text-to-sketch generation may help artists capture fleeting inspirations efficiently. Different from traditional text2image tasks, sketches consist of only a set of sparse lines and depend on strict edge information, which requires the model to understand the text descriptions accurately and to control shape and texture at a fine granularity. However, there has been very little previous research on the challenging text2sketch task. In this paper, we first construct a text2sketch image dataset by modifying the widely used CUB dataset. Then we propose a novel Generative Adversarial Network (GAN) based model that leverages a Conditional Layer-Instance Normalization (CLIN) module, which can effectively fuse the image features and the sentence vector and guide the sketch generation process. Extensive experiments show the superiority of our proposed model over previous baselines, and an in-depth analysis illustrates the contribution of each module and the limitations of our work.
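The abstract names the CLIN module without giving its formulation; a plausible sketch, modeled on AdaLIN-style normalization, mixes instance-norm and layer-norm statistics with a learnable ratio and conditions the affine parameters on the sentence vector. This is an assumption about the module's internals, not the authors' code.

```python
# Hypothetical Conditional Layer-Instance Normalization (CLIN) block.
import torch
import torch.nn as nn

class CLIN(nn.Module):
    def __init__(self, channels: int, sent_dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.rho = nn.Parameter(torch.full((1, channels, 1, 1), 0.5))  # IN/LN mixing ratio
        self.gamma = nn.Linear(sent_dim, channels)   # scale conditioned on the sentence vector
        self.beta = nn.Linear(sent_dim, channels)    # shift conditioned on the sentence vector

    def forward(self, x: torch.Tensor, sent: torch.Tensor) -> torch.Tensor:
        in_mean = x.mean(dim=(2, 3), keepdim=True)                 # per-sample, per-channel
        in_var = x.var(dim=(2, 3), keepdim=True, unbiased=False)
        ln_mean = x.mean(dim=(1, 2, 3), keepdim=True)              # per-sample, all channels
        ln_var = x.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        x_in = (x - in_mean) / torch.sqrt(in_var + self.eps)
        x_ln = (x - ln_mean) / torch.sqrt(ln_var + self.eps)
        rho = self.rho.clamp(0, 1)
        x_hat = rho * x_in + (1 - rho) * x_ln                      # mix the two normalizations
        gamma = self.gamma(sent).unsqueeze(-1).unsqueeze(-1)       # (B, C, 1, 1)
        beta = self.beta(sent).unsqueeze(-1).unsqueeze(-1)
        return x_hat * gamma + beta

out = CLIN(channels=64, sent_dim=256)(torch.randn(2, 64, 32, 32), torch.randn(2, 256))
```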