Neal Goldfarb - Academia.edu

Papers by Neal Goldfarb

Annual Review of Linguistics, 2021

Over the past decade, the idea of using corpus linguistics in legal interpretation has attracted interest on the part of judges, lawyers, and legal academics in the United States. This review provides an introduction to this nascent movement, which is generally referred to as Law and Corpus Linguistics (LCL). After briefly summarizing LCL’s origin and development, I situate LCL within legal interpretation by discussing the legal concept of ordinary meaning, which establishes the framework within which LCL operates. Next, I situate LCL within linguistics by identifying the subfields that are most relevant to LCL. I then offer a linguistic justification for an idea that is implicit in the case law and that provides important support for using corpus analysis in legal interpretation: that data about patterns of usage provide evidence of how words and other expressions are ordinarily understood. I go on to discuss linguistic issues that arise from the use of corpus linguistics in disputes that involve lexical ambiguity and categorization. Finally, I point out some challenges that the growth of LCL will present for both legal professionals and linguists.

Social Science Research Network, Jan 26, 2017

Social Science Research Network, 2019

This is an in-depth linguistic analysis of the key language in the Second Amendment ("the right of the people to keep and bear Arms") that is based primarily on evidence of actual 18th-century usage. That evidence comes from two corpora that have been developed and made available by the BYU Law School as resources for researching the original meaning of the language used in the Constitution: COFEA (the Corpus of Founding Era American English) and COEME (the Corpus of Early Modern English). The corpus data provides powerful evidence that contrary to what the Supreme Court held in District of Columbia v. Heller, "bear arms" was used in the Second Amendment in its idiomatic military sense, and in fact that it was most likely understood to mean serve in the militia. Thus, the right to bear arms was most likely understood as being the right to serve in the militia. The analysis proceeds roughly as follows:

"BEAR" and "ARMS": The Supreme Court’s interpretation of "bear" and "arms" in District of Columbia v. Heller was accurate as far as it went, but it is clear from evidence of historical usage that was unavailable at the time that the Court’s interpretation failed to reflect how "bear" and "arms" were actually used in the late 18th century. Although "bear" was sometimes used to mean ‘carry,’ the two words weren’t generally synonymous. The ways in which "bear" was used differed substantially from those for "carry." While "carry" was often used to denote the physical carrying of tangible objects (e.g., "carry baggage"), "bear" was seldom used that way. In fact, "carry" had by the end of the 1600s replaced "bear" as the verb generally used to convey the meaning ‘carry.’ In addition, although "arms" was often used to mean ‘weapons,’ it was also used roughly as often to convey a variety of figurative meanings relating to the military.

"BEAR ARMS": The corpus data for "bear arms" was overwhelmingly dominated by uses of the phrase in its idiomatic military sense. (This is unsurprising given the conclusions, above, regarding "bear" and "arms.") The Supreme Court in Heller was therefore mistaken in declaring that the “natural meaning” of "bear arms" was essentially ‘carry weapons in order to be prepared for confrontation.’ The phrase was ordinarily used to convey the meaning ‘serve in the military’ (specifically, ‘in the militia’) or ‘fight in a war.’

"THE RIGHT OF THE PEOPLE TO...BEAR ARMS": Consistently with how "bear arms" was ordinarily used, the right to bear arms was most likely understood as conveying its idiomatic military sense, and in particular as meaning ‘the right to serve in the militia.’ That conclusion is based to a large extent on the fact that there is reason to think that "bear arms" was understood to mean the same thing with respect to the right to bear arms as it meant with respect to the duty to bear arms, and the duty to bear arms was understood as a duty to serve in the militia. In addition, there is reason to believe, contrary to what the Court said in Heller, that as used in the Second Amendment, "the people" referred to those who were eligible for militia service. The interpretation described above is not ruled out by the fact that "bear arms" appears as part of the phrase "keep and bear arms." Although that interpretation requires that "arms" be understood as being simultaneously literal (as part of "keep arms") and figurative (as part of "bear arms"), there is reason to believe that that was in fact how "keep and bear arms" was understood at the time of the Second Amendment’s framing and ratification.

Social Science Research Network, Dec 24, 2012

This article takes a critical look through the lens of linguistics at the “always-speaking” principle in law, an influential principle that is recited in materials on legislative drafting as the justification for using the present tense, adopted in many common-law jurisdictions as a principle of interpretation, and accepted as a foundation for the linguistic analysis of the use of tense in statutes. The article concludes that the principle is an inadequate basis for interpreting or analysing statutes, for at least two reasons: the interpretive results that the principle is intended to support are explainable in terms of widely accepted principles in the analysis of tense, without any need to posit special principles that apply only to statutes; and the interpretations that would be required if the always-speaking principle were taken seriously would in many cases probably be regarded as unnatural by native speakers of English.

Social Science Research Network, 2023

Social Science Research Network, 2021

Social Science Research Network, 2017

Corpus linguistics has been promoted as a new tool for legal interpretation that provides an alternative to dictionaries. But that is not its only significance. In addition to providing new methodologies, corpus linguistics (and in particular corpus-based lexicography) provides important insights about the nature of word meaning, and about the interpretation of words in context. These insights (by linguists and lexicographers such as John Sinclair, Patrick Hanks, Sue Atkins, and Adam Kilgarriff) challenge the assumptions that underlie the lawyers’ and judges’ analyses of word meaning.

As one might expect given the centrality of dictionaries in disputes over word meaning, legal interpretation presupposes a view of word meaning that is essentially the same as the view that is fostered by dictionaries. Under this view, individual words are the basic units of meaning from which the meanings of sentences are built. Word meanings are seen as discrete entities with (in most cases) clear boundaries.

But corpus linguistics and corpus-based lexicography have shown that the reality is different. Clear boundaries between the meanings of different words, or between the different senses of the same word, often do not exist. Drawing lines between different word senses often has an unavoidable element of arbitrariness, as is shown by the fact that the lines are often drawn differently by different dictionaries. These differences raise questions about the validity of legal interpreters’ relying on dictionaries at all, and at a minimum suggest the need for changes in how dictionaries are used.

Corpus linguistics and corpus-based lexicography have also cast doubt on the view (which most people would regard as simple common sense) that words are the basic unit of meaning, and that the meaning of a sentence can be computed by applying the rules of grammar to the meaning of the individual words. It turns out that in many cases, it makes more sense to regard multiword expressions as the basic units of meaning. The meaning of the whole often differs from the sum of the meanings of the words, in part because a word’s meaning in context can be affected by the words it co-occurs with and the grammatical constructions it is part of.

As a result of these insights, corpus linguistics opens up new ways of thinking about word meaning, which translates into new modes of argumentation and analysis. To illustrate the possibilities, I will take a fresh look at Muscarello v. United States, 524 U.S. 125 (1998), which presented the question whether driving a car or truck with a firearm in the trunk or glove compartment amounted to “carrying” the firearm. Although Muscarello has already been the subject of a corpus-based analysis by Stephen Mouritsen, his analysis focused on which of two dictionary senses of the word carry was more common, and therefore assumed the conception of word meaning that is generally reflected in legal interpretation. My approach will differ from Mouritsen’s in two respects. First, rather than look only at which one of two senses is more common, I will ask a more open-ended question: when viewed without preconceptions, what does the corpus data tell us about how the word carry behaves? Second, I will look at the data through the lens of Corpus Pattern Analysis, a corpus-driven lexicographic approach that focuses on multiword patterns rather than on individual word meanings.

Social Science Research Network, 2009

This memorandum is submitted on behalf of seven professors of linguistics (Hana Filip, Ph.D.; Georgia M. Green, Ph.D.; Jeffrey P. Kaplan, Ph.D., J.D.; Jason Merchant, Ph.D.; Barbara Partee, Ph.D.; Roger Shuy, Ph.D.; and Thomas Wasow, Ph.D.) for the purpose of challenging the plaintiff’s argument that Secretary Clinton’s appointment violated the plain meaning of the Ineligibility Clause. More particularly, we will show that on the point at issue here, the meaning of the Ineligibility Clause is not plain. The plaintiff’s interpretation represents only one of several possible readings. The clause can also be reasonably read in a way that permits a Senator or Representative’s eligibility for appointment to be restored by reducing the salary of the position in question to what it was when the Senator or Representative’s term began. The ambiguity in the Ineligibility Clause relates to the phrase "shall have been encreased." Plaintiff’s argument assumes that a position’s salary has been increased during a Senator or Representative’s term if at any point in that term the salary went up, even if it later went back down by the same amount. This interpretation corresponds to what linguists would refer to as an "experiential" reading. But the language can also be understood to have a "resultative" reading. On the latter interpretation, the salary can be said to “have been increased” only if it went up and stayed up through the time of the appointment. This reading of "have been encreased" is comparable to the interpretation of "I have caught a cold" as meaning that the speaker had gotten sick and was still sick. The resultative reading of the Ineligibility Clause is the same in substance as the government’s “on net” interpretation; the difference is merely a matter of how the interpretation is described.
What we hope to do here—or at least one of the things—is to explain why this interpretation is a plausible one and therefore why the Ineligibility Clause is ambiguous.

Social Science Research Network, 2020

This paper critically examines Kevin Tobia’s forthcoming paper, “Testing Ordinary Meaning: An Experimental Assessment of What Dictionary Definitions and Linguistic Usage Data Tell Legal Interpreters.” Please note that this is a work in progress and that I plan on posting a revised version shortly (for some value of “shortly”). Although I believe that Tobia’s analysis is problematic in multiple respects, I will focus here on the criticism that I think is most important. That criticism challenges not only Tobia’s conclusions, but also his paper’s central premise. In particular, I dispute Tobia’s conclusion that the results from his Concept-Condition experiments establish that corpus linguistics is an inaccurate tool. (Those experiments were the ones in which test subjects were asked, e.g., whether a golf cart is a vehicle.) As I will explain, Tobia’s analysis is based on an unexpressed assumption, and if that assumption is invalid, Tobia’s conclusions are invalid, too. The assumption is that in the context of legal interpretation, “ordinary meaning” means only one thing. But that assumption is unfounded. “Ordinary meaning” is not a technical term in linguistics and to the extent that it has a technical meaning in philosophy of language, I’m unaware of that meaning having had influence on ordinary meaning as a legal concept (which dates back at least to Blackstone). So in its use in connection with the practice of legal interpretation, “ordinary meaning” is a purely legal term. Within that practice, therefore, the meaning of “ordinary meaning” is determined by what the courts say it means and by how they apply the concept in particular cases. And when we look at the caselaw, we can see that there are multiple conceptions of what constitutes ordinary meaning. I will focus here on two of those conceptions (one of which has two subcategories).
The differences between these conceptions of ordinary meaning have practical consequences. The outcome in a given case can depend in part on which conception is invoked (either explicitly or implicitly). Similarly—and crucially—the choice of an appropriate interpretive methodology will vary depending on which conception of ordinary meaning one is assuming. Tobia’s Concept-Condition experiments are suited for use with respect to one of the conceptions that I will discuss but not with respect to the other. And the suitability of corpus linguistics presents essentially the opposite situation: it is suitable with respect to the conception of ordinary meaning for which Tobia’s Concept-Condition methodology is unsuitable, and for the most part unsuitable for the conception for which his methodology is suitable. This means that to use Tobia’s experimental methodology to evaluate the accuracy of corpus linguistics is to fall prey to a category error. Tobia’s comparison of the corpus-condition results against those for the concept condition proves nothing about the accuracy of corpus linguistics.

Social Science Research Network, 2019

Corpus linguistics can be a powerful tool in legal interpretation, but like all tools, it is suited for some uses but not for others. At a minimum, that means that there are likely to be cases in which corpus data doesn’t yield any useful insights. More seriously, in some cases where the data seems useful, that appearance might prove on closer examination to be misleading. So it is important for people to be able to distinguish issues as to which corpus results are genuinely useful from those in which they are not. A big part of the motivation behind introducing corpus linguistics into legal interpretation is to increase the sophistication and quality of interpretive analysis. That purpose will be disserved if corpus data is cited in support of conclusions that the data doesn’t really support. This paper is an initial attempt to deal with the problem of distinguishing uses of corpus linguistics that can yield useful data from those that cannot. In particular, the paper addresses a criticism that has been made of the use of corpus linguistics in legal interpretation, namely, that the hypothesis underlying the legal-interpretive use of frequency data is flawed. That hypothesis, according to one of the critics, is that “where an ambiguous term retains two plausible meanings, the ordinary meaning of the term... is the more frequently used meaning[.]” (Although that description is not fully accurate, it will suffice for present purposes.)
The asserted flaw in this hypothesis is that differences in the frequencies of different senses of a word might be due to “reasons that have little to do with the ordinary meaning of that word.” Such differences, rather than reflecting the “sense of a word or phrase that is most likely implicated in a given linguistic context,” might instead reflect at least in part “the prevalence or newsworthiness of the underlying phenomenon that the term denotes.” That argument is referred to in this paper as the Purple-Car Argument, based on a skeptical comment about the use of corpus linguistics in legal interpretation: “If the word ‘car’ is ten times more likely to co-occur with the word ‘red’ than with the word ‘purple,’ it would be ludicrous to conclude from this data that a purple car is not a ‘car.’” This paper deals with the Purple-Car Argument in two ways. First, it attempts to clarify the argument’s scope by showing that there are ways of using corpus linguistics that do not involve frequency analysis and that are therefore not even arguably subject to the Purple-Car Argument. The paper offers several case studies illustrating such uses. Second, the paper acknowledges that when frequency analysis is in fact used, there will be cases that do implicate the flaw that the Purple-Car Argument identifies. The problem, therefore, is to figure out how to distinguish these Purple-Car cases from cases in which the Purple-Car Argument does not apply. The paper discusses some possible methodologies that might be helpful in making that determination. It then presents three case studies, focusing on cases that are well known to those familiar with the law-and-corpus-linguistics literature: Muscarello v. United States, State v. Rasabout, and People v. Harris. The paper concludes that the Purple-Car Argument does not apply to Muscarello, that it does apply to Rasabout, and that a variant of the argument applies to the dissenting opinion in Harris.

Regenerative Medicine, Nov 1, 2011

Following the reversal by the court of appeals of the injunction against federal funding of human embryonic stem cell research, Judge Lamberth has held such funding to be lawful and has dismissed the lawsuit. However, the litigation is likely to continue in the court of appeals and ultimately perhaps in the Supreme Court. The plaintiffs’ arguments against funding human embryonic stem cell research are unlikely to succeed, but unfortunately litigation is an unpredictable process in which outcomes cannot be guaranteed.

Social Science Research Network, 2023

This brief undertakes a critical examination of the corpus analysis set out in the amicus brief filed by Pro-Life Utah (“PL Utah”). https://www.linkedin.com/posts/lee-nielsen_pro-life-utah-amicus-brief-activity-7007453806825267200-9BBA/?utm_source=share&utm_medium=member_desktop

1. PL Utah’s brief involves the use of corpus-linguistic methodology in a way that differs strikingly from how corpus linguistics has generally been used in the context of legal interpretation. Rather than using corpus data as evidence as to the meaning of a word or phrase in a legal provision, PL Utah treats it as evidence of public attitudes toward abortion, primarily during the 1890s. That is to say, it tries to use corpus data as a proxy for a public-opinion survey targeting Utahns of the 1890s, a demographic group that no longer exists as such and that, Amicus assumes, has no surviving members.

a. In order for PL Utah’s data to be considered reliable evidence supporting the conclusion PL Utah wishes the Court to reach, it would have to be shown that the attitudes expressed in the texts in the corpus, whatever they might be, are representative of the relevant attitudes of the overall population of 1890s Utah. And that would require that the authors of those texts be shown to have comprised a representative sample of that population.

PL Utah has made no such showing; indeed, it has not tried to do so. And beyond that, it is undeniable that the authors of the newspaper articles do not constitute a representative sample of Utah’s population. To begin with, some of the articles originated out of state, and therefore were not the work of Utahns at all. Moreover, census data from 1890 and 1900 shows that Utah’s small population of journalists was predominantly male. So to the extent the articles were written by Utahns, women are likely to have been underrepresented in that group of authors.

The unrepresentative nature of the newspaper evidence becomes especially clear when considering the fact that during the 1890s, Utah newspapers published more than 2,000 advertisements for what were euphemistically called “female pills”: concoctions that were reputed to be effective in inducing miscarriages and that were used for that purpose. This is evidence that, contrary to what PL Utah contends, Utahns in the 1890s were not united in opposition to abortion.

b. Serious flaws are also found in PL Utah’s collocation data. First, the data as presented by the COHA collocation display (and as reported by PL Utah) consists of what seem to be 33 uses of abortion(s) or abortionist(s). But 22 of those apparent uses reflect multiple counting, in that they come from only five sources, and are therefore attributable to only five authors. When these two flaws are taken into account, the apparent number of relevant uses turns out to have been exaggerated by more than 300%: Rather than 33, there are only 10.

2. The brief concludes with a short discussion of several issues that relate generally to the use of corpus linguistics in legal interpretation, and that Amicus thinks it is important for this Court to be aware of.

Social Science Research Network, Jan 2, 2018

I want to acknowledge in particular Stephen Mouritsen's presentation on Muscarello and my conversations with Mark Davies and Stefan Th. Gries.

Social Science Research Network, 2023

SSRN Electronic Journal, 2023

Regenerative Medicine, 2011

SSRN Electronic Journal, 2021
