INFERENCE AND INTERPRETABILITY: HOW NEURAL NLI MODELS PERFORM NATURAL LOGIC DEDUCTIONS

Example-based learning for Natural Language Understanding (NLU) tasks has been a long-standing goal of Artificial Intelligence (AI) and has seen major success as machine learning methods, architecture capacities and the scale of data processing capabilities have improved in recent decades. However, training large, opaque models on very high-level objectives such as Natural Language Inference (NLI) raises fundamental questions about whether appropriate reasoning strategies have been learnt by a given model. In fact, much work has highlighted the emergence of spurious heuristics which aid NLI model performance in unexpected ways, rather than following theoretically expected systematic reasoning routes using appropriate properties. Research on model interpretability has been providing rapidly maturing methodologies which may shed light on the linguistic features and abstract properties captured within the representations of trained models, as well as their comparative effects on model predictions. This thesis isolates a structured subtask of NLI based on natural logic as a framework for applying and developing interpretability methods with the end goal of better assessing the reasoning capabilities of NLI models. In particular, we model entailment examples which are single-step natural logic deductions relying on exactly two abstract semantic features: hierarchical concept relations and the monotonicity of a natural language context. Responding to behavioural observations of NLI model limitations, we turn to both observational and interventional interpretability methods to analyze competitive NLI models’ abilities to perform natural logic deductions and diagnose failure patterns at a finer granularity.

Date of Award: 31 Dec 2023
Original language: English
Awarding Institution: The University of Manchester
Supervisors: Andre Freitas & Uli Sattler
