Compositional Distributional Semantics with Long Short Term Memory

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

Semantic word spaces have been very useful but cannot express the meaning of longer phrases in a principled way. Further progress towards understanding compositionality in tasks such as sentiment detection requires richer supervised training and evaluation resources and more powerful models of composition. To remedy this, we introduce a Sentiment Treebank. It includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality. To address them, we introduce the Recursive Neural Tensor Network. When trained on the new treebank, this model outperforms all previous methods on several metrics. It pushes the state of the art in single-sentence positive/negative classification from 80% up to 85.4%. The accuracy of predicting fine-grained sentiment labels for all phrases reaches 80.7%, an improvement of 9.7% over bag-of-features baselines. Lastly, it is the only model that can accurately capture the effects of negation and its scope at various tree levels for both positive and negative phrases.
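
The core of the Recursive Neural Tensor Network is its composition function: a bilinear tensor product between the two child vectors plus a standard affine term, applied bottom-up at every node of the parse tree. A minimal numpy sketch of that composition is below; dimensions, initialization, and variable names are placeholders for illustration, not the authors' code.

```python
import numpy as np

d = 4                                                # toy embedding dimension
rng = np.random.default_rng(0)

V = rng.normal(scale=0.1, size=(d, 2 * d, 2 * d))    # tensor slices (one per output dim)
W = rng.normal(scale=0.1, size=(d, 2 * d))           # standard RvNN composition weights
b = np.zeros(d)

def rntn_compose(left, right):
    """Combine two child vectors into a parent vector, RNTN-style."""
    c = np.concatenate([left, right])                # [left; right], shape (2d,)
    bilinear = np.array([c @ V[k] @ c for k in range(d)])
    return np.tanh(bilinear + W @ c + b)

# Compose word vectors bottom-up along a tiny parse tree:
not_, good = rng.normal(size=d), rng.normal(size=d)
phrase = rntn_compose(not_, good)                    # vector for "not good"
print(phrase.shape)                                  # (4,)
```

In the full model, a softmax sentiment classifier is attached to every node vector produced this way, which is what yields the phrase-level labels the abstract reports.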

Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks

Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015

Because of their superior ability to preserve sequence information over time, Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have obtained strong results on a variety of sequence modeling tasks. The only underlying LSTM structure that has been explored so far is a linear chain. However, natural language exhibits syntactic properties that naturally combine words into phrases. We introduce the Tree-LSTM, a generalization of LSTMs to tree-structured network topologies. Tree-LSTMs outperform all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences (SemEval 2014, Task 1) and sentiment classification (Stanford Sentiment Treebank).
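
In the Child-Sum variant of the Tree-LSTM, a node sums the hidden states of its children and gates each child's memory cell with its own forget gate, so the cell works for any branching factor. A rough numpy sketch of one cell update follows, with placeholder dimensions and random parameters rather than the paper's trained weights.

```python
import numpy as np

d_in, d_mem = 5, 4
rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One (W, U, b) triple per gate: input (i), forget (f), output (o), update (u).
P = {g: (rng.normal(scale=0.1, size=(d_mem, d_in)),
         rng.normal(scale=0.1, size=(d_mem, d_mem)),
         np.zeros(d_mem)) for g in "ifou"}

def tree_lstm_node(x, child_h, child_c):
    """Child-Sum Tree-LSTM update for a node with input x and child states."""
    h_tilde = np.sum(child_h, axis=0) if len(child_h) else np.zeros(d_mem)
    def gate(g, h):
        W, U, b = P[g]
        return W @ x + U @ h + b
    i = sigmoid(gate("i", h_tilde))
    o = sigmoid(gate("o", h_tilde))
    u = np.tanh(gate("u", h_tilde))
    # A separate forget gate per child, applied to that child's memory cell.
    f = [sigmoid(gate("f", hk)) for hk in child_h]
    c = i * u + sum(fk * ck for fk, ck in zip(f, child_c))
    h = o * np.tanh(c)
    return h, c

# Leaves have no children; combine two leaves under one parent node:
x_leaf1, x_leaf2, x_parent = (rng.normal(size=d_in) for _ in range(3))
h1, c1 = tree_lstm_node(x_leaf1, [], [])
h2, c2 = tree_lstm_node(x_leaf2, [], [])
h_parent, c_parent = tree_lstm_node(x_parent, [h1, h2], [c1, c2])
```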

TreeNet: Learning Sentence Representations with Unconstrained Tree Structure

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Recursive neural networks (RvNNs) have proved to be an effective and promising tool for learning sentence representations by explicitly exploiting sentence structure. However, most existing work can only exploit simple tree structures, e.g., binary trees, or ignores the order of nodes, which yields suboptimal performance. In this paper, we propose a novel neural network, namely TreeNet, to capture sentences structurally over raw, unconstrained constituency trees, where the number of child nodes can be arbitrary. In TreeNet, each node learns from its left sibling and right child in a bottom-up, left-to-right order, thus enabling the net to learn over any tree. Furthermore, multiple soft gates and a memory cell are employed in implementing TreeNet to determine to what extent it should learn, remember, and output, which proves to be a simple and efficient mechanism for semantic synthesis. Moreover, TreeNet significantly surpasses convolutional neural networks (CNNs) and Long Short-Term Memory (LSTM) networks with fewer parameters. It improves classification accuracy by 2%-5% with 42% of the best CNN's parameters or 94% of the standard LSTM's. Extensive experiments demonstrate that TreeNet achieves state-of-the-art performance on all four typical text classification tasks.
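
The abstract's description, a node that reads its left sibling and its right child through LSTM-style soft gates over a memory cell, suggests an update roughly like the sketch below. This is a hypothetical reading of that description, not the published TreeNet equations; every name, shape, and gating choice here is invented for illustration.

```python
import numpy as np

d = 6
rng = np.random.default_rng(2)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# One (W_x, W_sib, W_child, b) tuple per gate: learn (i), remember (f), output (o).
W = {g: [rng.normal(scale=0.1, size=(d, d)) for _ in range(3)] + [np.zeros(d)]
     for g in "ifo"}
W_u = [rng.normal(scale=0.1, size=(d, d)) for _ in range(3)] + [np.zeros(d)]

def treenet_node(x, h_sib, h_child, c_sib, c_child):
    """Hypothetical TreeNet-style update: gate information from the left
    sibling and the right child into this node's memory cell."""
    def act(params, fn):
        Wx, Ws, Wc, b = params
        return fn(Wx @ x + Ws @ h_sib + Wc @ h_child + b)
    i = act(W["i"], sigmoid)        # how much new content to learn
    f = act(W["f"], sigmoid)        # how much upstream memory to remember
    o = act(W["o"], sigmoid)        # how much to expose to later nodes
    u = act(W_u, np.tanh)
    c = i * u + f * (c_sib + c_child)
    h = o * np.tanh(c)
    return h, c

# Nodes are visited bottom-up, left-to-right, so any branching factor works.
zero = np.zeros(d)
h_leaf, c_leaf = treenet_node(rng.normal(size=d), zero, zero, zero, zero)
h_next, c_next = treenet_node(rng.normal(size=d), h_leaf, zero, c_leaf, zero)
```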

Learning continuous phrase representations and syntactic parsing with recursive neural networks

2010

Natural language parsing has typically been done with small sets of discrete categories such as NP and VP, but this representation does not capture the full syntactic or semantic richness of linguistic phrases, and attempts to improve on this by lexicalizing phrases only partly address the problem at the cost of huge feature spaces and sparseness. To address this, we introduce a recursive neural network architecture for jointly parsing natural language and learning vector space representations for variable-sized inputs. At the core of our architecture are context-sensitive recursive neural networks (CRNN). These networks can induce distributed feature representations for unseen phrases and provide syntactic information to accurately predict phrase structure trees. Most excitingly, the representation of each phrase also captures semantic information: for instance, the phrases "decline to comment" and "would not disclose the terms" are close by in the induced embedding space. Our current system achieves an unlabeled bracketing F-measure of 92.1% on the Wall Street Journal dataset for sentences up to length 15.
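
The recursive composition this line of work builds on maps each pair of child vectors to a parent vector of the same dimension, so the same function can be applied bottom-up over any binary bracketing; a scoring layer on the parent vector is what drives the parsing decisions. Below is a minimal sketch of that composition with a greedy merge loop, using placeholder dimensions; the published CRNN additionally conditions on context and is trained with a structured objective, both of which are omitted here.

```python
import numpy as np

d = 4
rng = np.random.default_rng(3)
W = rng.normal(scale=0.1, size=(d, 2 * d))   # shared composition weights
b = np.zeros(d)
w_score = rng.normal(scale=0.1, size=d)      # scores how plausible a constituent is

def compose(left, right):
    """Parent representation for the constituent spanning [left, right]."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

def score(parent):
    return float(w_score @ parent)

# Greedy bottom-up illustration: repeatedly merge the best-scoring adjacent pair.
words = [rng.normal(size=d) for _ in range(4)]
while len(words) > 1:
    cands = [(score(compose(a, b_)), i)
             for i, (a, b_) in enumerate(zip(words, words[1:]))]
    _, i = max(cands)
    words[i:i + 2] = [compose(words[i], words[i + 1])]
sentence_vec = words[0]                      # vector for the full sentence
```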

Quantifying the Vanishing Gradient and Long Distance Dependency Problem in Recursive Neural Networks and Recursive LSTMs

Proceedings of the 1st Workshop on Representation Learning for NLP, 2016

Recursive neural networks (RNNs) and their recently proposed extension, recursive long short-term memory networks (RLSTMs), are models that compute representations for sentences by recursively combining word embeddings according to an externally provided parse tree. Both models thus, unlike recurrent networks, explicitly make use of the hierarchical structure of a sentence. In this paper, we demonstrate that RNNs nevertheless suffer from the vanishing gradient and long distance dependency problem, and that RLSTMs greatly improve over RNNs on these problems. We present an artificial learning task that allows us to quantify the severity of these problems for both models. We further show that a ratio of gradients (at the root node and a focal leaf node) is highly indicative of the success of backpropagation at optimizing the relevant weights low in the tree. This paper thus provides an explanation for existing, superior results of RLSTMs on tasks such as sentiment analysis, and suggests that the benefits of including hierarchical structure and of including LSTM-style gating are complementary.
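
A rough analogue of the gradient-ratio diagnostic can be reproduced on any tree-structured model: backpropagate a loss computed at the root and compare the gradient norm reaching a deep leaf embedding with the norm reaching a leaf directly under the root. The PyTorch sketch below uses a plain RvNN over a right-branching toy tree as a stand-in for the paper's setup; the model, tree, and loss are all placeholders.

```python
import torch

d = 8
torch.manual_seed(0)
W = torch.nn.Linear(2 * d, d)                 # shared RvNN composition

def compose(left, right):
    return torch.tanh(W(torch.cat([left, right])))

# Right-branching toy tree over n leaves: leaves[0] sits directly under the
# root, while leaves[-1] is n-1 compositions deep.
n = 10
leaves = [torch.randn(d, requires_grad=True) for _ in range(n)]
node = leaves[-1]
for leaf in reversed(leaves[:-1]):
    node = compose(leaf, node)

loss = node.sum()                             # stand-in for a real training loss
loss.backward()

shallow_grad = leaves[0].grad.norm()          # leaf adjacent to the root
deep_grad = leaves[-1].grad.norm()            # deepest leaf
print(float(deep_grad / shallow_grad))        # shrinks with depth for a plain RvNN
```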

Enhancing sentiment analysis through deep layer integration with long short-term memory networks

International Journal of Electrical and Computer Engineering (IJECE), 2025

This work studies one of the most important tasks in natural language processing (NLP): sentiment analysis, i.e., determining whether a sentence is neutral, positive, or negative. The paper presents an enhanced long short-term memory (LSTM) network for the sentiment analysis task, using an additional deep layer to capture sublevel patterns from the word input. Our approach follows a standard pipeline: cleaning the data, preprocessing it, building the model, training it, and finally testing it. The novelty lies in the additional layer in the LSTM architecture, added with the intention of improving accuracy and generalization. The experimental results are analyzed using recall, F1-score, and accuracy, and show that the deep-layered LSTM model gives better predictions: it outperformed the baseline on all three measures, and the deep layer's prediction accuracy increased dramatically once it was trained to capture intricate sequences. However, the improved model overfitted, necessitating additional regularization and hyperparameter tuning. We also discuss the advantages and disadvantages of using deep layers in LSTM networks and their application to building better-performing deep learning models for sentiment analysis.
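
An LSTM sentiment classifier with an extra deep layer stacked on top corresponds to the standard "stacked LSTM" pattern. A hedged PyTorch sketch of such an architecture follows; the class name, layer widths, vocabulary size, and class count are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DeepLSTMClassifier(nn.Module):
    """Stacked ("deep-layered") LSTM sentiment classifier; all sizes are placeholders."""
    def __init__(self, vocab_size=20_000, embed_dim=128, hidden=64, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # num_layers=2 stacks a second LSTM layer on top of the first; the
        # inter-layer dropout addresses the overfitting noted in the abstract.
        self.lstm = nn.LSTM(embed_dim, hidden, num_layers=2,
                            batch_first=True, dropout=0.5)
        self.out = nn.Linear(hidden, num_classes)   # negative / neutral / positive

    def forward(self, token_ids):                   # (batch, seq_len) int64 token ids
        h, _ = self.lstm(self.embed(token_ids))
        return self.out(h[:, -1])                   # classify from the last time step

model = DeepLSTMClassifier()
logits = model(torch.randint(0, 20_000, (8, 40)))   # toy batch of 8 sequences
print(logits.shape)                                  # torch.Size([8, 3])
```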

Recursive Neural Networks Can Learn Logical Semantics

Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, 2015

Tree-structured recursive neural networks (TreeRNNs) for sentence meaning have been successful for many applications, but it remains an open question whether the fixed-length representations that they learn can support tasks as demanding as logical deduction. We pursue this question by evaluating whether two such models, plain TreeRNNs and tree-structured neural tensor networks (TreeRNTNs), can correctly learn to identify logical relationships such as entailment and contradiction using these representations. In our first set of experiments, we generate artificial data from a logical grammar and use it to evaluate the models' ability to learn to handle basic relational reasoning, recursive structures, and quantification. We then evaluate the models on the more natural SICK challenge data. Both models perform competitively on the SICK data and generalize well in all three experiments on simulated data, suggesting that they can learn suitable representations for logical inference in natural language.
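
The general setup described here encodes each of the two sentences bottom-up with a shared tree composition, then feeds the pair of sentence vectors through a comparison layer and a softmax over logical relation labels. The numpy sketch below illustrates that pipeline with a plain concatenation-based comparison layer and made-up dimensions; the TreeRNTN variant adds bilinear tensor terms, and the published layer sizes, activations, and label set are not reproduced here.

```python
import numpy as np

d, d_cmp, n_rel = 4, 5, 7
rng = np.random.default_rng(4)

# Composition used to encode each sentence bottom-up over its parse tree.
W_tree = rng.normal(scale=0.1, size=(d, 2 * d))

def compose(a, b):
    return np.tanh(W_tree @ np.concatenate([a, b]))

# Comparison layer over the two sentence vectors, then a softmax over relations
# (entailment, contradiction, independence, ...).
W_pair = rng.normal(scale=0.1, size=(d_cmp, 2 * d))
W_soft = rng.normal(scale=0.1, size=(n_rel, d_cmp))

def predict_relation(premise_vec, hypothesis_vec):
    pair = np.tanh(W_pair @ np.concatenate([premise_vec, hypothesis_vec]))
    logits = W_soft @ pair
    e = np.exp(logits - logits.max())
    return e / e.sum()                        # distribution over relation labels

# Toy example: encode two two-word "sentences" and compare them.
w = {k: rng.normal(size=d) for k in ("all", "dogs", "some", "animals")}
premise = compose(w["all"], w["dogs"])
hypothesis = compose(w["some"], w["animals"])
print(predict_relation(premise, hypothesis))
```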

Learning sentence embeddings using Recursive Networks

Learning sentence vectors that generalise well is a challenging task. In this paper we compare three methods of learning phrase embeddings: 1) using LSTMs, 2) using recursive nets, and 3) a variant of method 2 that uses the POS information of the phrase. We train our models on dictionary definitions of words to obtain a reverse-dictionary application similar to Felix et al. [1]. To see whether our embeddings can be transferred to a new task, we also train and test on the Rotten Tomatoes dataset [2]. We train both with the sentence embeddings kept fixed and with fine-tuning.
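
In a reverse-dictionary setup of this kind, the encoder maps a definition into the word-embedding space and the answer is the word whose embedding lies nearest to that vector. The sketch below shows only that lookup step; the encoder is stubbed out with a random projection, and the tiny vocabulary and function names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 6

# Word embeddings the encoded definition is matched against (toy vocabulary).
word_vecs = {w: rng.normal(size=d) for w in ("cat", "telescope", "umbrella")}

def encode_definition(definition: str) -> np.ndarray:
    """Placeholder for a trained LSTM / recursive-net definition encoder."""
    return rng.normal(size=d)

def reverse_dictionary(definition: str, k: int = 2):
    """Return the k words whose embeddings are closest to the encoded definition."""
    q = encode_definition(definition)
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    ranked = sorted(word_vecs, key=lambda w: cosine(q, word_vecs[w]), reverse=True)
    return ranked[:k]

print(reverse_dictionary("a device that protects you from the rain"))
```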