Natural Language Generation from Knowledge-Base Triples (original) (raw)

The main goal of this master thesis is to create a machine-learning-based tool that is able to verbalize given data, i.e., from given RDF triples; it should be able to create a corresponding text in a natural language (English) such that the text must be grammatically correct, fluent, must contain all information from the input data and cannot have any additional information. The thesis begins with examining the publicly available datasets; then, it focuses on the architectures of statistical machine learning models and their possible usage for natural language generation. The work is also focused on possible numerical text representation, text generation by machine learning models, and optimization algorithms for training the models. The next part of the thesis proposes two main solutions to the problem and examines each of them. Automatic metrics evaluate all systems, and the best performing models are then passed to a human (manual) evaluation. The last part of the thesis focuses...