Zhiqiang Cai | University of Memphis

Papers by Zhiqiang Cai

Strengths, Limitations, and Extensions of LSA

The strength of Latent Semantic Analysis (LSA) (Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990; Landauer & Dumais, 1997) has been demonstrated in many applications, many of which are described in this book. This chapter briefly describes how LSA has been effectively integrated in some of the applications developed at the Institute for Intelligent Systems, the University of Memphis. The chapter subsequently identifies some weaknesses of the current use of LSA and proposes a few methods to overcome these ...

Question Understanding Aid (QUAID): A Web Facility that Tests Question Comprehensibility

Public Opinion Quarterly, 2006

When respondents do not understand the meaning of a survey question, they will not supply valid and reliable answers. Survey methodologists should therefore benefit from computer tools and other analytical schemes that help them identify problems with questions with respect to comprehension difficulty. We developed a Web facility called Question Understanding Aid (QUAID; www.psyc.memphis.edu/quaid.html) that assists survey methodologists in identifying problems with the wording, syntax, and semantics of questions on questionnaires. The survey methodologist enters the question into the Web facility, along with any context information and answer alternatives that accompany the question. QUAID quickly returns a list of potential problems with question comprehension, including unfamiliar technical terms, vague or imprecise relative terms, vague or ambiguous noun phrases, complex syntax, and working memory overload. This article describes QUAID and some empirical studies that have assessed the validity and utility of QUAID's critiques of questions. The output of QUAID was compared with the judgments of experts in language, discourse, and cognition during the development of the tool. In one evaluation, expert survey methodologists critiqued and revised problematic questions, whereas in a second evaluation survey methodologists evaluated the ...

NLS: A Non-Latent Similarity Algorithm

This paper introduces a new algorithm for calculating semantic similarity within and between texts. We refer to this algorithm as NLS, for Non-Latent Similarity. This algorithm makes use of a second-order similarity matrix (SOM) based on the cosine of the vectors from a first-order (non-latent) matrix. This first-order matrix (FOM) could be generated in any number of ways; here we used a method modified ...
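The relationship between the two matrices can be illustrated with a minimal sketch. It assumes a tiny hand-made first-order matrix with one row per word; the names `cosine`, `second_order_matrix`, and `toy_fom`, and the example words, are hypothetical and not taken from the paper. The sketch covers only the cosine step named in the abstract, not how the authors built their first-order matrix or how they compare whole texts.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def second_order_matrix(fom):
    """Build the SOM: entry [i][j] is the cosine between rows i and j of the FOM."""
    return [[cosine(row_i, row_j) for row_j in fom] for row_i in fom]


# Hypothetical first-order matrix: one row per word, one column per context feature.
toy_fom = [
    [1.0, 0.0, 2.0],  # "doctor"
    [2.0, 0.0, 4.0],  # "nurse" (same direction as "doctor")
    [0.0, 3.0, 0.0],  # "guitar" (orthogonal to both)
]
toy_som = second_order_matrix(toy_fom)
```

In this toy case the first two rows point in the same direction, so their second-order entry is close to 1, while the third row shares no features with them and scores 0.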

An Orthonormal Basis for Topic Segmentation in Tutorial Dialogue

This paper explores the segmentation of tutorial dialogue into cohesive topics. A latent semantic space was created using conversations from human-to-human tutoring transcripts, allowing cohesion between utterances to be measured using vector similarity.
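As a rough illustration of measuring cohesion between utterances with vector similarity, the sketch below marks a topic boundary wherever the cosine between adjacent utterance vectors falls below a cutoff. The function names, the toy two-dimensional vectors, and the 0.5 threshold are all assumptions made for illustration; this is not the orthonormal-basis method the paper proposes, and real utterance vectors would come from a latent semantic space.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def segment_boundaries(utterance_vectors, threshold=0.5):
    """Return each index i where utterance i is assumed to start a new topic.

    A boundary is placed when cohesion with the previous utterance,
    measured as cosine similarity, drops below the (hypothetical) threshold.
    """
    boundaries = []
    for i in range(1, len(utterance_vectors)):
        if cosine(utterance_vectors[i - 1], utterance_vectors[i]) < threshold:
            boundaries.append(i)
    return boundaries


# Hypothetical utterance vectors: the first two and the last two are cohesive pairs.
toy_vectors = [
    [1.0, 0.1],
    [0.9, 0.2],
    [0.1, 1.0],
    [0.2, 0.9],
]
toy_boundaries = segment_boundaries(toy_vectors)
```

With these toy vectors the only low-cohesion transition is between the second and third utterances, so a single boundary is reported at index 2.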

Using LSA to Automatically Identify Givenness and Newness of Noun Phrases in Written Discourse

Identifying given and new information within a text has long been addressed as a research issue. However, there has previously been no accurate computational method for assessing the degree to which constituents in a text contain given versus new information. This study develops a method for automatically categorizing noun phrases into one of three categories of givenness/newness, using the taxonomy of Prince (1981) as the gold standard. The central computational technique used is span, a derivative of latent semantic analysis (LSA). We analyzed noun phrases from two expository and two narrative texts. Predictors of newness included span as well as pronoun status, determiners, and word overlap with previous noun phrases. Logistic regression showed that span was superior to LSA in categorizing noun phrases, producing an increase in accuracy from 74% to 80%.

Coh-Metrix: Analysis of text on cohesion and language

Advances in computational linguistics and discourse processing have made it possible to automate many language- and text-processing mechanisms. We have developed a computer tool called Coh-Metrix, which analyzes texts on over 200 measures of cohesion, language, and readability. Its modules use lexicons, part-of-speech classifiers, syntactic parsers, templates, corpora, latent semantic analysis, and other components that are widely used in computational linguistics. After the user enters an English text, Coh-Metrix returns measures requested by the user. In addition, a facility allows the user to store the results of these analyses in data files (such as Text, Excel, and SPSS). Standard text readability formulas scale texts on difficulty by relying on word length and sentence length, whereas Coh-Metrix is sensitive to cohesion relations, world knowledge, and language and discourse characteristics.

A Revised Algorithm for Latent Semantic Analysis

The intelligent tutoring system AutoTutor uses latent semantic analysis to evaluate student answers to the tutor's questions. By comparing a student's answer to a set of expected answers, the system determines how much information is covered and how to continue the tutorial. Despite the success of LSA in tutoring conversations, the system sometimes has difficulties determining at an early stage whether or not an expectation is covered. A new LSA algorithm significantly improves the precision of AutoTutor's natural ...
