Walid Saba | Northeastern University
Papers by Walid Saba
Draft, 2024
Large language models (LLMs) are essentially a massive bottom-up, data-driven experiment in reverse engineering of language at scale. With the massive amount of text ingested, the neural networks underlying these LLMs managed to learn the distribution of “ordinary spoken language” in such a way that they can subsequently draw on that distribution to generate grammatically correct and semantically coherent text in response to user prompts. As impressive as they seem, however, LLMs do not truly understand language, and their impressive ‘generative’ capabilities are not an indication of the language competency of LLMs. To accurately test the language understanding capabilities of these LLMs we should prompt LLMs with a snippet of text and embed a query that questions their understanding of the input text. Done properly, it becomes clear that these massive stochastic hashtables do not ‘understand’ language. However, as a massive associative memory of how humans sensibly talk about the world they live in, LLMs can be a powerful reverse engineering tool that can help us uncover the conceptual structure that seems to be implicitly assumed in our linguistic communication. The result of this process, which has previously been suggested by several luminaries in the philosophy of language and the philosophy of mind, is no less than the discovery of the ontology of natural language and the conceptual structure of the language of thought.
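The testing recipe the abstract describes (prompt the model with a snippet and embed a query about that snippet) can be sketched roughly as follows. This is only an illustration of the setup, not code from the paper: build_probe and ask_llm are hypothetical placeholder names, and the snippet/query pair is our own example of the kind of probe intended.

```python
import textwrap

def build_probe(snippet: str, query: str) -> str:
    """Combine a short text snippet with a query that tests understanding of it."""
    return textwrap.dedent(f"""\
        Read the following text, then answer the question using only the text.

        Text: {snippet}

        Question: {query}
        """)

def ask_llm(prompt: str) -> str:
    """Placeholder: call whatever LLM API you use and return its reply."""
    raise NotImplementedError

if __name__ == "__main__":
    snippet = "The trophy did not fit in the suitcase because it was too small."
    query = "What was too small: the trophy or the suitcase?"
    print(build_probe(snippet, query))
    # reply = ask_llm(build_probe(snippet, query))
```

The point of such a probe is that a correct answer requires resolving the reference using background knowledge about containers and sizes, not just reproducing the surface text.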
arXiv (Cornell University), Sep 10, 2023
In our opinion the exuberance surrounding the relative success of data-driven large language models (LLMs) is slightly misguided, and for several reasons: (i) LLMs cannot be relied upon for factual information since for LLMs all ingested text (factual or non-factual) was created equal; (ii) due to their subsymbolic nature, whatever 'knowledge' these models acquire about language will always be buried in billions of microfeatures (weights), none of which is meaningful on its own; and (iii) LLMs will often fail to make the correct inferences in several linguistic contexts (e.g., nominal compounds, copredication, quantifier scope ambiguities, intensional contexts). Since we believe the relative success of data-driven LLMs is not a reflection on the symbolic vs. subsymbolic debate but a reflection of applying the successful strategy of a bottom-up reverse engineering of language at scale, we suggest in this paper applying this effective bottom-up strategy in a symbolic setting, resulting in symbolic, explainable, and ontologically grounded language models.
arXiv (Cornell University), Aug 8, 2008
arXiv (Cornell University), Sep 30, 2018
We are in the process of building DALIA, an environment for distributed, artificial, and linguistically competent intelligent agents that communicate in natural language and perform commonsense reasoning in a highly dynamic and uncertain environment. There are several challenges in this effort that we do not touch on in this paper. Instead, we focus here on the design of a virtual marketplace where buying and selling agents that learn from experience negotiate autonomously on behalf of their clients. Buying and selling agents enter the ...
Lecture Notes in Computer Science, 2001
It is by now widely accepted that a number of tasks in natural language understanding (NLU) require the storage of and reasoning with a vast amount of background (commonsense) knowledge. While several efforts have been made to build such ontologies, a consensus on a scientific methodology for ontological design is yet to emerge. In this paper we suggest an approach to building a commonsense ontology for language understanding using language itself as a design guide. The idea is rooted in Frege's conception of compositional ...
The purpose of this paper is twofold: (i) we will argue that formal semantics might have faltered due to its failure in distinguishing between two fundamentally different types of concepts, namely ontological concepts, which should be types in a strongly-typed ontology, and logical concepts, which are predicates corresponding to properties of, and relations between, objects of various ontological types; and (ii) we show that accounting for these differences amounts to a new formal semantics; one that integrates lexical and compositional semantics in one coherent framework and one where formal semantics is embedded within a strongly-typed ontology; an ontology that reflects our commonsense knowledge of the world and the way we talk about it in ordinary language. We will show how in such a framework a number of challenges in the semantics of natural language are adequately and systematically treated.
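To make the distinction concrete, here is a minimal sketch in our own notation, not the paper's: ontological concepts are modeled as types, and logical concepts as predicates whose argument types are constrained. The type names Human and Artifact are illustrative placeholders, and a static checker such as mypy plays the role of rejecting type-incorrect combinations.

```python
from dataclasses import dataclass

@dataclass
class Entity:
    """Top-level ontological type."""
    name: str

@dataclass
class Human(Entity):
    """Ontological concept modeled as a type."""
    pass

@dataclass
class Artifact(Entity):
    """Another ontological concept modeled as a type."""
    pass

def articulate(x: Human) -> bool:
    """Logical concept: a property defined only for objects of type Human."""
    return True  # placeholder truth condition

def heavy(x: Entity) -> bool:
    """Logical concept: a property applicable to any Entity."""
    return True  # placeholder truth condition

# Under a static type checker, heavy(Artifact("table")) is well-typed,
# while articulate(Artifact("table")) is a type error rather than a false
# statement, mirroring the claim that such anomalies are type mismatches,
# not ordinary falsehoods.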
arXiv (Cornell University), Jul 20, 2023
Research on Computing Science, 2006
Lecture Notes in Computer Science, 2002
A mental state model for autonomous agent negotiation is described. In this model, agent negotiation is assumed to be a function of the agents' mental state (attitude) and their prior experiences. The mental state model we describe here subsumes both competitive and cooperative agent negotiations. The model is first instantiated by buying and selling agents (competitively) negotiating in a virtual marketplace. Subsequently, it is shown that agent negotiations tend to be more cooperative than competitive as agents tend to agree (more ...
arXiv (Cornell University), May 30, 2023
Künstliche Intell., 2009
arXiv (Cornell University), Oct 11, 2006
The purpose of this paper is twofold: (i) we argue that the structure of commonsense knowledge must be discovered, rather than invented; and (ii) we argue that natural language, which is the best known theory of our (shared) commonsense knowledge, should itself be used as a guide to discovering the structure of commonsense knowledge. In addition to suggesting a systematic method to the discovery of the structure of commonsense knowledge, the method we propose seems to also provide an explanation for a number of phenomena in natural language, such as metaphor, intensionality, and the semantics of nominal compounds. Admittedly, our ultimate goal is quite ambitious, and it is no less than the systematic 'discovery' of a well-typed ontology of commonsense knowledge, and the subsequent formulation of the long-awaited goal of a meaning algebra.
arXiv (Cornell University), Apr 14, 2019
arXiv (Cornell University), Sep 30, 2018
The Winograd Schema (WS) challenge has been proposed as an alternative to the Turing Test as a test for machine intelligence. In this short paper we "situate" the WS challenge in the data-information-knowledge continuum, suggesting in the process what a good WS is. Furthermore, we suggest that the WS is a special case of a more general phenomenon in language understanding, namely the phenomenon of the "missing text". In particular, we will argue that what we usually call thinking in the process of language understanding almost always involves discovering the missing text: text that is rarely explicitly stated but is implicitly assumed as shared background knowledge. We therefore suggest extending the WS challenge to include tests beyond those involving reference resolution, including examples that require discovering the missing text in situations that are usually treated in computational linguistics under different labels, such as metonymy, quantifier scope ambiguity, lexical disambiguation, and co-predication, to name a few.
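As a rough illustration of the proposed extension (our sketch, not the paper's test format), each item could pair what is explicitly stated with the query and the implicitly assumed "missing text". The field names and the two sample items below are hypothetical; the first uses the classic trophy/suitcase schema, the second a standard metonymy example.

```python
from dataclasses import dataclass

@dataclass
class MissingTextItem:
    text: str          # what is explicitly stated
    query: str         # what the test asks
    missing_text: str  # the background knowledge the reader must supply
    phenomenon: str    # reference resolution, metonymy, scope ambiguity, ...

items = [
    MissingTextItem(
        text="The trophy did not fit in the suitcase because it was too big.",
        query="What does 'it' refer to?",
        missing_text="A container must be larger than what is placed inside it.",
        phenomenon="reference resolution",
    ),
    MissingTextItem(
        text="The ham sandwich at table 7 wants the bill.",
        query="Who wants the bill?",
        missing_text="'The ham sandwich' stands for the customer who ordered it.",
        phenomenon="metonymy",
    ),
]

for item in items:
    print(f"[{item.phenomenon}] {item.text} -> missing: {item.missing_text}")
```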
arXiv (Cornell University), Aug 6, 2018
arXiv (Cornell University), Dec 1, 2007
Large language models (LLMs) have achieved a milestone that undeniably changed many long-held beliefs in artificial intelligence (AI). However, there remain many limitations of these LLMs when it comes to true language understanding, limitations that are a byproduct of the underlying architecture of deep neural networks. Moreover, and due to their subsymbolic nature, whatever knowledge these models acquire about how language works will always be buried in billions of microfeatures (weights), none of which is meaningful on its own, making such models hopelessly unexplainable. To address these limitations, we suggest combining the strength of symbolic representations with what we believe to be the key to the success of LLMs, namely a successful bottom-up reverse engineering of language at scale. As such we argue for a bottom-up reverse engineering of language in a symbolic setting. Hints on what this project amounts to have been suggested by several authors, and we discuss in some detail here how this project could be accomplished. We know any object only through predicates that we can say or think of it.
We argue that logical semantics might have faltered due to its failure in distinguishing between two fundamentally different types of concepts: ontological concepts, which should be types in a strongly-typed ontology, and logical concepts, which are predicates corresponding to properties of and relations between objects of various ontological types. We will then show that accounting for these differences amounts to the integration of lexical and compositional semantics in one coherent framework, and to an embedding in our logical semantics of a strongly-typed ontology that reflects our commonsense view of the world and the way we talk about it in ordinary language. We will show that in such a framework a number of challenges in natural language semantics can be adequately and systematically treated.