Tomasz Dryjanski - Academia.edu
Papers by Tomasz Dryjanski
2018 IEEE International Conference on Big Data (Big Data), 2018
We propose a highly precise, production-ready neural model for affective natural language generation. It is designed to add a predefined sentiment to neutral utterances without significantly changing their meaning. It works by inferring phrases and their insertion points. We also propose strict correctness criteria and apply them to our inference results, achieving human-level precision. The model is not specific to any particular domain, such as IoT or restaurant reviews. We use six selected emotion categories, but we speculate that the model could be applied to other affective categories, such as informal style or politeness, without a design change.
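To make the phrase-insertion scheme concrete: the model's output can be thought of as a (phrase, insertion point) pair that is spliced into the neutral utterance. The abstract gives no implementation details, so the sketch below is only a hypothetical illustration of that output format; the phrase table, function names, and example utterance are all invented, and the paper's actual model is neural rather than a lookup.

```python
# Hypothetical sketch of the phrase-insertion idea described above:
# a model predicts an affective phrase and the token index at which to
# insert it into a neutral utterance. The lookup table is illustrative
# only; the paper's model infers both phrase and position neurally.

AFFECTIVE_PHRASES = {
    "joy": "which is wonderful,",
    "anger": "annoyingly,",
}

def insert_phrase(utterance: str, phrase: str, position: int) -> str:
    """Splice an affective phrase into an utterance at a token index."""
    tokens = utterance.split()
    position = max(0, min(position, len(tokens)))  # clamp to a valid index
    return " ".join(tokens[:position] + [phrase] + tokens[position:])

# Example: pretend the model predicted emotion="joy" at token index 4.
print(insert_phrase("The lights are on in the kitchen",
                    AFFECTIVE_PHRASES["joy"], 4))
# -> "The lights are on which is wonderful, in the kitchen"
```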
viXra, 2017
This paper proposes an alternative to the Paragraph Vector algorithm, generating fixed-length vectors of human-readable features for natural language corpora. It extends word2vec while retaining its advantages, such as speed and accuracy; hence its proposed name, doc2feat. Extracted features are presented as lists of words with their proximity to a particular feature, allowing interpretation and manual annotation. Through parameter tuning, the focus can be placed on grammatical aspects of the corpus language, making the algorithm useful for linguistic applications. It can run on variable-length pieces of text and provides insight into which features are relevant for text classification or sentiment analysis. The corpus does not have to be, and in specific cases should not be, preprocessed with stemming or stop-word removal.
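The distinctive output format here is a feature presented as a ranked list of words with proximity scores. The abstract does not spell out the doc2feat training procedure, so the sketch below is not that algorithm; it merely reproduces the interpretability format by treating each dimension of an ordinary word2vec space as a "feature" and listing the words that load most strongly on it. It assumes gensim 4.x and a toy corpus.

```python
# NOT the doc2feat algorithm (its training procedure is not given in the
# abstract); this sketch only demonstrates the "feature = ranked word
# list" presentation, using raw word2vec dimensions as stand-in features.

import numpy as np
from gensim.models import Word2Vec  # assumes gensim 4.x

sentences = [["the", "cat", "sat"], ["dogs", "bark", "loudly"]]  # toy corpus
model = Word2Vec(sentences, vector_size=10, min_count=1, seed=0)

def words_for_feature(model, dim: int, topn: int = 5):
    """Rank vocabulary words by how strongly they load on one dimension."""
    vocab = model.wv.index_to_key
    scores = model.wv.vectors[:, dim]          # loading of each word on dim
    order = np.argsort(scores)[::-1][:topn]    # highest loadings first
    return [(vocab[i], float(scores[i])) for i in order]

for dim in range(3):
    print(f"feature {dim}:", words_for_feature(model, dim))
```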
Proceedings of the 12th International Conference on Natural Language Generation, 2019
This paper describes our submission to the TL;DR challenge. Neural abstractive summarization models have been successful in generating fluent and consistent summaries, aided by advancements like the copy (pointer-generator) and coverage mechanisms. However, these models suffer from their extractive nature, as they learn to copy words from the source text. In this paper, we propose a novel abstractive model based on a Variational Autoencoder (VAE) to address this issue. We also propose a Unified Summarization Framework for the generation of summaries. Our model eliminates non-critical information at the sentence level with an extractive summarization module and generates the summary word by word using an abstractive summarization module. To implement our framework, we combine submodules built on state-of-the-art techniques, including the Pointer-Generator Network (PGN) and BERT, while also using our new VAE-PGN abstractive model. We evaluate our model on the benchmark Reddit corpus as part of the TL;DR challenge and show that it outperforms the baseline in ROUGE score while generating diverse summaries.
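The framework's structure, as the abstract describes it, is a two-stage pipeline: an extractive pass drops non-critical sentences, then an abstractive model generates the summary from what remains. The skeleton below shows only that control flow; the two callables stand in for the paper's BERT-based extractor and VAE-PGN generator, and the threshold and stand-in scorers are placeholders, not the authors' implementations.

```python
# Skeleton of the two-stage summarization framework described above.
# score_sentence() and abstractive_summarize() are placeholders for the
# paper's learned modules (BERT-based extractor, VAE-PGN generator).

from typing import Callable, List

def two_stage_summarize(
    sentences: List[str],
    score_sentence: Callable[[str], float],       # extractive salience model
    abstractive_summarize: Callable[[str], str],  # word-by-word generator
    keep_threshold: float = 0.5,
) -> str:
    # Stage 1 (extractive): keep only sentences scored as critical.
    kept = [s for s in sentences if score_sentence(s) >= keep_threshold]
    # Stage 2 (abstractive): generate the summary from the reduced input.
    return abstractive_summarize(" ".join(kept))

# Toy usage with trivial stand-ins for the two learned modules.
summary = two_stage_summarize(
    ["The model is fast.", "Unrelated aside.", "It beats the baseline."],
    score_sentence=lambda s: 0.0 if "aside" in s else 1.0,
    abstractive_summarize=lambda text: text,  # identity stand-in
)
print(summary)
```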