Kai R. Larsen | University of Colorado, Boulder
Conference Presentations by Kai R. Larsen
This hands-on workshop will cover pedagogical strategies related to teaching Automated Machine Learning (AutoML) for IS majors, non-majors, and MBA students. AutoML is simply machine learning where data cleaning, feature engineering, algorithm selection, hyperparameter tuning, as well as most other steps are done automatically, removing the need for years of training in machine learning and statistics. Forbes Technology Council as well as many others have suggested that 2018 will be the year of automated machine learning, and most major technology companies are feverishly developing AutoML technologies to stay competitive. In three hours, with no programming, we will do what will normally take a data scientist three months. Participants will go through the whole data science process from project objective definition, acquisition and exploration of data, modeling of the data, interpreting and communicating results, and implementing the solution. We will end with a discussion of approaches for teaching AutoML. The workshop is appropriate for faculty who have no previous experience with machine learning as well as experienced machine learning researchers who have not been exposed to AutoML. The instructor has won both college-wide and university-wide teaching awards and has taught ML for a decade and AutoML for two years at both the undergraduate and graduate level, and his book entitled Automated Machine Learning for Business is under contract with Oxford University Press. Participants will receive a copy of the book.
Research on sensemaking in organisations and on linguistic relativity suggests that speakers of the same language may use this language in different ways to construct social realities at work. We apply a semantic theory of survey response (STSR) to explore such differences in quantitative survey research. Using text analysis algorithms, we have studied how language from three media domains – the business press, PR Newswire and general newspapers – has differential explanatory value for analysing survey responses in organisational behaviour (OB). We projected well-known OB surveys measuring leadership, motivation and outcomes into large text samples from these three media domains and found significantly different impacts on survey responses. Business press language was best at explaining leadership-related items, PR language was best at explaining organizational results, and "ordinary" newspaper language seemed to explain the relationship among motivation items.
Communications of the Association for Information Systems, 2006
Equity across academic disciplines is taken for granted in contemporary business schools. The status of a discipline is crucial for such fairness. One might assume, therefore, that IS scholars are treated fairly during tenure and promotion processes when compared to scholars from other business school academic fields. In fact, this may not be the case. The playing field used by business academic disciplines may not be level. This study addresses three questions related to this issue. The first asks whether there is a level playing field for publication among the various business disciplines, and second, assuming an unlevel playing field, what are the relative productivity differences between dissemination of scientific results among these disciplines? The third question is how could the playing field be leveled, assuming it is not at the present time. To answer these questions, existing data sources were tapped, one of these containing well over 18,000 data points. Further, original data was gathered from U.S. business schools, and all the data was analyzed in relation to AACSB data on the relative sizes of business school disciplines. Given our finding that the playing field is not level, the differences between the IS discipline and four other disciplines – Accounting, Finance, Management, and Marketing – are examined, and the consequences of the disadvantage to the IS discipline are discussed. The article concludes with recommendations of actions to level the field, and these are presented as a challenge to leaders in the Information Systems discipline.
PLOS One, Sep 3, 2014
Some disciplines in the social sciences rely heavily on collecting survey responses to detect empirical relationships among variables. We explored whether these relationships were a priori predictable from the semantic properties of the survey items, using language processing algorithms which are now available as new research methods. Language processing algorithms were used to calculate the semantic similarity among all items in state-of-the-art surveys from Organisational Behaviour research. These surveys covered areas such as transformational leadership, work motivation and work outcomes. This information was used to explain and predict the response patterns from real subjects. Semantic algorithms explained 60–86% of the variance in the response patterns and allowed remarkably precise prediction of survey responses from humans, except in a personality test. Even the relationships between independent variables and their purported dependent variables were accurately predicted. This raises concern about the empirical nature of data collected through some surveys if results are already given a priori through the way subjects are being asked. Survey response patterns seem heavily determined by semantics. Language algorithms may suggest these prior to administering a survey. This study suggests that semantic algorithms are becoming new tools for the social sciences, opening perspectives on survey responses that prevalent psychometric theory cannot explain. (Corresponding author: arnulf@bi.no.) Due to copyright issues affecting parts of the surveys used (the MLQ), we can only share the semantic values and the survey responses, but not the survey item wordings, which are the property of Mindgarden Inc. However, the items are purchasable from Mindgarden.com, and we used the form "MLQ 360 Form 5X Short".
Any interested researchers should be able to use our data freely, but need to take proper responsibility in their use of the MLQ items. We will therefore avoid a publicly accessible repository, but promise to offer interested parties all information necessary to reproduce our findings or apply them to other materials. The prerequisite is a clarifying dialogue with the corresponding author.
Human Resource Development Quarterly, 2018
This is a methodological presentation of the relationship between
semantics and survey statistics in human resource development
(HRD) research. This study starts with an introduction to the
semantic theory of survey response (STSR) and proceeds by offering
a guided approach to conducting such analyses. The reader is
presented with two types of semantic algorithms and a brief overview
of how they are calculated and how they can be accessed by
interested researchers. Subsequently, we use semantic data to reanalyze
a previously published study on the relationships between
perceptions of a trainee program, intrinsic motivation, and work
outcomes. The semantic algorithms can explain between 31 and
55% of the variation in the observed correlations. This article
shows how the statistical models originally used to explore the survey
data can be replicated using semantics either alone or as an
identifiable source of variation in the data. All the steps are presented
in detail, and the datasets as well as the statistical syntax
necessary to perform the analyses are made available to the
readers. Implications for methodology and the improvement of predictive
validity in HRD research are discussed.
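The statement that semantic algorithms explain a share of the variation in observed correlations can be illustrated with a minimal sketch. It assumes a simple one-predictor linear fit (squared Pearson correlation) between semantic similarity scores and observed inter-item correlations; the function name and the five data points are hypothetical and are not taken from the study, whose actual models are not described here.

```python
from statistics import mean


def r_squared(semantic, observed):
    """Share of variance in `observed` explained by a simple linear fit on `semantic`.

    For a single predictor this equals the squared Pearson correlation.
    """
    mx, my = mean(semantic), mean(observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(semantic, observed))
    sxx = sum((x - mx) ** 2 for x in semantic)
    syy = sum((y - my) ** 2 for y in observed)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical values for five item pairs (illustration only, not study data).
semantic_similarity = [0.10, 0.35, 0.50, 0.70, 0.90]
observed_correlation = [0.05, 0.30, 0.20, 0.55, 0.60]

print(round(r_squared(semantic_similarity, observed_correlation), 2))
```

Each element of the two lists stands for one pair of survey items, so the lists must be aligned pair by pair.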
This panel will describe the diverse ways in which information technology (IT) firms organize for and manage disruptive technologies to enhance company market value. The panel will include two practitioners offering very different perspectives and strategies on how their respective IT firms create value using disruptive technologies. First, Aaron French (founder of Sociabile, a social networking site startup) and Ben Pace (Chief Financial Officer of C-Spire) describe their firms’ strategies for digital disruption. Next, Kai Larsen will offer his perspective on how he anticipates that a specific emerging disruptive technology (automated machine learning) may be exploited to create value for IT firms. Finally, Stacie Petter (Baylor University) will talk about the complexities academics face developing frameworks and other artifacts of common understanding in a sector with so much strategic heterogeneity.
In this paper, we use Latent Semantic Analysis to explore the design battles in smartphones. Using newspaper coverage from 1992-2012, we build a semantic model of the media coverage to identify article clusters. Cluster membership gives us visibility into trends in coverage over the course of the study. We find that five distinct periods can be identified. Some unique characteristics of this market lead us to develop new propositions about design battles.
Assessing the similarity of proposed theoretical constructs to each other and those previously known and studied is imperative in theoretical research. In this paper we turn to theories of similarity judgement from cognitive psychology for the understanding of the process of establishing similarity between one or more constructs. Then, guided by these theories, we develop an integrated method for automatic detection of similar constructs. We apply the method to constructs from leading IS journals, a major journal in psychology, and the interdisciplinary overlap between the IS and psychology constructs. Our paper contributes to methodology of research, design science research, behavioral IS research, text mining and information retrieval theory and practice, IS research on ontology alignment and schema matching as well as cognitive theories of similarity in psychology.
IS, among other social sciences, has moved from a relative paucity of theories about social phenomena to a state of multiple, overlapping, and overly narrow theories. We offer three modes of theory integration that will enable researchers to better integrate theories and processes into internally coherent models within theories, across theories and between fields. The bases for integration are semantic similarity, nomological congruence and physical/functional/causal overlap. We develop a framework that will justify propositions for theory integration that can subsequently be tested for correspondence to real-world phenomena.
This panel addresses the divergent expectations of the IS community on new directions in the genre of standalone literature reviews (SLRs), which synthesize and interpret a body of literature within a domain. The primary purpose of the panel is to spur a controversial discussion on a) what the IS field can learn from other fields and where it should be specific, b) how the IS field should move forward to foster the genre of SLRs, and c) what are the best approaches to train doctoral IS students in publishing SLRs. The panelists initiate a vital discussion on where the IS field can profit from considering approaches of other fields and where it should focus on IS specifics that are not shared by other fields, which SLR processes are of particular importance for the IS field, and whether and how doctoral IS students should be trained in writing SLRs.
Sage Open, 2018
The semantic theory of survey responses (STSR) proposes that the prime source of statistical covariance in survey data is the degree of semantic similarity (overlap of meaning) among the items of the survey. Because semantic structures can be estimated using digital text algorithms, it is possible to predict the response structures of Likert-type scales a priori. The present study applies STSR in an experimental way by computing real survey responses using such semantic information. A sample of 153 randomly chosen respondents to the Multifactor Leadership Questionnaire (MLQ) was used as the target. We developed an algorithm based on unfolding theory, where data from digital text analysis of the survey items served as input. After deleting progressively larger proportions (from 20% to 95%) of the real responses, we let the algorithm replace these with simulated ones, and then compared the simulated datasets with the real ones. The simulated scores displayed sum score levels, alphas, and factor structures highly resembling their real origins even if up to 86% were simulated. In contrast, this was not the case when the same algorithm was operating without access to semantic information. The procedure was briefly repeated on a different measurement instrument and a different sample. This not only yielded similar results but also pointed to the need for further theoretical and practical developments. Our study opens the way for experimental research on the effect of semantics on survey responses using computational procedures.
Behavior Research Methods, Jan 12, 2018
The traditional understanding of data from Likert scales is that the quantifications involved result from measures of attitude strength. Building on our recently proposed semantic theory of survey response (STSR), we claim that survey responses tap two different sources: a mixture of attitudes plus the semantic structure of the survey. Exploring the degree to which individual responses are influenced by semantics, we hypothesize that information about attitude strength is actually filtered out as noise in the commonly used correlation matrix. Applying a linguistic algorithm termed MI, we separated semantics from attitude strength in four samples of altogether 7781 respondents covering 8187 pairs of items. The surveys spanned commonly used organizational behavior surveys on leadership and motivation, as well as a short 5-factor personality inventory, the NEO-FFI. As hypothesized, the findings indicate that levels of attitude strength did not contribute uniquely to the correlation matrices except in the NEO. This contradicts the prevalent understanding of what survey data represent. This problem has been overlooked, possibly contributing to reduced predictive value from research relying on Likert scale data.
Theory: The "semantic theory of survey response" (STSR):
• Claims that commonly applied statistics on survey responses are determined by the semantics, NOT attitude strength.
• This is contrary to most assumptions since Likert (1932).
• This study demonstrates how information about attitude strength is filtered out as noise from individual responses in correlation matrices.
• Purely semantic values (with no knowledge of attitude strength) explain between 65% and 85% of the variation in the correlation matrices (Arnulf, Larsen, Martinsen, & Bong, 2014).
• The present study splits individual response patterns into two components: one component maximizes the attitude strength, the other is mainly constituted by the semantic relations common to all respondents.
• The component representing attitude strength is the "item-product matrix": the products of multiplying all scores of the individual respondent with each other.
• The component representing semantics is the "item-distance matrix", where all responses are subtracted from each other and the absolute values are retained.
Results: Attitude strength is strongly related to co-products, but the distances determine the observed correlation matrices.
Discussion: The purpose of this study was to show that individual responses to survey scales carry two different types of information: attitude strength (an emotional/motivational component) and semantic similarity between items (a cognitive component). As predicted, the inter-item distance matrices seemed to represent the semantic relationships, whereas the co-products carried more information about attitude strength. When used to predict the observed correlation matrix in hierarchical regression, the contribution of the co-products was reduced to almost nothing. Semantics as computed by algorithms and the related information in the item distance matrices were the most important components explaining the correlation matrices in all samples. Only in the case of the NEO were the semantic algorithms ineffective in predicting the patterns. The findings are disturbing because of the emphasis on correlation and covariance matrices in construct validation. If the information in these structures is not measuring attitude strength but merely semantic relationships, it implies that the object of survey methods, attitude strength, may actually be filtered out as noise. The object matter of statistical modelling is not attitude strength but linguistics. The observed statistics do not indicate latent constructs as defined by Borsboom (2008), and run counter to the original purpose of scale constructions as developed by Rensis Likert (1932). While the response patterns of individuals carry more information than semantics, only the semantic information seems to carry over into the correlation matrix for the whole sample. This is not inevitable, as can be seen in the case of the NEO-FFI. Our study shows that additional assumptions must be made to capture attitude strength as information in Likert scales analyzed as correlations or covariances.
Methods:
• We used four different datasets: (1) 1220 respondents to a measure of transformational leadership (MLQ: Avolio, Bass & Jung 1995), (2) 255 respondents to transformational leadership and various scales on motivation and work outcomes, (3) 981 respondents to various measures on transformational leadership, LMX leadership, 2-factor theories of leadership and diverse motivational scales, and (4) 5332 respondents to the NEO-FFI, a short version of a 5-factor personality scale.
• For all respondents in all datasets, individual co-product and distance matrices were computed. The individual values and the average values over all item pairs were compared to two criteria: similarity with the observed sample correlation matrices, and with the purely semantic matrix created by natural language analysis algorithms (latent semantic analysis and the MI algorithm).
Transformational leadership scores predicted by demographics, personality and the individual response matrices: Adjusted R2 = .78
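The two per-respondent components described in the poster, the matrix of co-products and the matrix of absolute distances, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the four Likert scores are hypothetical, and the study's own computation may differ in detail.

```python
def item_product_matrix(scores):
    """Co-products: every score of one respondent multiplied with every other score."""
    return [[a * b for b in scores] for a in scores]


def item_distance_matrix(scores):
    """Distances: absolute difference between every pair of scores of one respondent."""
    return [[abs(a - b) for b in scores] for a in scores]


# Hypothetical responses of one person to four Likert items (illustration only).
responses = [5, 4, 2, 4]
products = item_product_matrix(responses)
distances = item_distance_matrix(responses)

# A second hypothetical respondent with the same response pattern, one scale point lower.
weaker = [score - 1 for score in responses]

# The distance matrix ignores the overall response level; the product matrix does not.
print(item_distance_matrix(weaker) == distances)
print(item_product_matrix(weaker) == products)
```

Shifting every response by the same amount leaves the distance matrix unchanged while the co-products change, which is one way to see why the co-products can carry information about response level that the distances do not.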
Theory identity is a fundamental problem for researchers seeking to determine theory quality, create theory ontologies and taxonomies, or perform focused theory-specific reviews and meta-analyses. We demonstrate a novel machine-learning approach to theory identification based on citation data and article features. The multidisciplinary ecosystem of articles which cite a theory's originating paper is created and refined into the network of papers predicted to contribute to, and thus identify, a specific theory. We provide a 'proof-of-concept' for a highly-cited theory. Implications for cross-disciplinary theory integration and the identification of theories for a rapidly expanding scientific literature are discussed.
We propose an automatic construct-level citation extraction system (ACCE) to refine citations from the paper level to the construct level. This paper follows the design science paradigm (Hevner et al. 2004; March and Smith 1995; Nunamaker et al. 1991). The remaining sections are organized as follows. We first analyze the tasks involved in extracting construct-level citations, and identify three characteristics as our design guidelines, which present a need for IE techniques. In the related work section, we introduce the historic origin of IE techniques and major types of IE systems. Next, we review state-of-the-art relation extraction techniques, and, in particular, the support vector machine (SVM) algorithm and its kernel functions. After examining related work, we propose a framework for our automatic construct-level citation extraction system and describe in detail the three important steps of ACCE: citation extraction, construct extraction, and referring relation extraction. Using these steps, we illustrate the applicability of our IT artifact by applying it to a dataset consisting of 224 publications from two top journals in the Information Systems field. We then evaluate system performance in comparison to human extraction. Finally, we summarize the results and discuss future research directions in the conclusion section.
We applied internomological network (INN) analysis, a novel approach that classifies constructs based on their underlying meaning, to constructs from the National Cancer Institute (NCI)’s Grid-enabled measures (GEM) database. Seven expert raters sorted these constructs using Michie’s Theoretical Domains Framework (TDF). Our objectives were to evaluate the TDF domains, examine GEM’s domain coverage, and enhance the trustworthiness of research by creating a gold standard for natural language processing under the auspices of the federally funded INN project. Background: Identifying constructs that accurately describe the phenomenon for study remains a challenge as long as the nomenclature of even related theories remains unconnected. The INN method assists with the challenge of differently named constructs with overlapping measures. Results from the first use of INN demonstrated its utility to clarify the meaning of constructs in transdisciplinary scientific fields (Cook et al. 2012). As an extension, the current study applied INN to TDF. The TDF is an integrative framework with 14 domains classifying constructs in psychosocial research (Cane et al. 2012). We used the TDF to sort 238 constructs in NCI's GEM database, an interactive website enabling research harmonization by listing agreed-upon measures and constructs. Methods: Seven nursing faculty sorted all GEM constructs into domains using a 3-step process. First, a single expert categorized each construct. Second, all experts reviewed initial assignments and either agreed or proposed another domain. Third, we resolved discrepancies in discussion. Nineteen domains (14 original plus 5 proposed by the group) were used in the final construct sort. Results: Agreement about initial domain assignments (kappa) varied from .43 (goals) to .93 (emotion). Overall kappa was .72, which is acceptable but not ideal.
Domains with more constructs had higher agreement, which may indicate more clarity about their definitions or greater applicability to the cancer-related constructs in GEM. In the third step, we achieved 100% agreement on a domain assignment for each construct. TDF domains most useful in classifying GEM constructs were (1) environmental context and resources, (2) emotion, and (3) behavioral regulation. One domain (rewards) did not apply to any constructs in GEM. Thirteen GEM constructs were found to have meanings identical to other constructs in the database under different names. Implications: Despite limitations of the expert-consensus method, our success sorting 238 constructs into 19 domains with moderate reliability suggests commonalities that point to the meaning of constructs in cancer research. In future studies, it will be interesting to compare current results with those from natural language processing algorithms. Additional methods for identifying construct similarity and synonymy are likely to improve classification results and maximize the credibility of research findings.
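The kappa values reported above are chance-corrected agreement statistics. The abstract does not say which kappa variant was used for the seven raters, so the following is only a minimal sketch of Cohen's kappa for two raters; the function name, the domain labels assigned, and the six example constructs are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    # Proportion of items on which the two raters chose the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical domain assignments for six constructs (illustration only).
rater_a = ["emotion", "goals", "emotion", "skills", "goals", "emotion"]
rater_b = ["emotion", "goals", "emotion", "goals", "skills", "emotion"]

print(round(cohens_kappa(rater_a, rater_b), 2))
```

With more than two raters, a multi-rater statistic such as Fleiss' kappa would be a more natural choice; the two-rater form is used here only because it shows the chance correction most directly.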
This hands-on workshop will cover pedagogical strategies related to teaching Automated Machine Le... more This hands-on workshop will cover pedagogical strategies related to teaching Automated Machine Learning (AutoML) for IS majors, non-majors, and MBA students. AutoML is simply machine learning where data cleaning, feature engineering, algorithm selection, hyperparameter tuning, as well as most other steps are done automatically, removing the need for years of training in machine learning and statistics. Forbes Technology Council as well as many others have suggested that 2018 will be the year of automated machine learning, and most major technology companies are feverishly developing AutoML technologies to stay competitive. In three hours, with no programming, we will do what will normally take a data scientist three months. Participants will go through the whole data science process from project objective definition, acquisition and exploration of data, modeling of the data, interpreting and communicating results, and implementing the solution. We will end with a discussion of approaches for teaching AutoML. The workshop is appropriate for faculty who have no previous experience with machine learning as well as experienced machine learning researchers who have not been exposed to AutoML. The instructor has won both college-wide and university-wide teaching awards and has taught ML for a decade and AutoML for two years at both the undergraduate and graduate level, and his book entitled Automated Machine Learning for Business is under contract with Oxford University Press. Participants will receive a copy of the book.
Research on sensemaking in organisations and on linguistic relativity suggests that speakers of t... more Research on sensemaking in organisations and on linguistic relativity suggests that speakers of the same language may use this language in different ways to construct social realities at work. We apply a semantic theory of survey response (STSR) to explore such differences in quantitative survey research. Using text analysis algorithms, we have studied how language from three media domains – the business press, PR Newswire and general newspapers – has differential explanatory value for analysing survey responses in organisational behaviour (OB). We projected well-known OB surveys measuring leadership, motivation and outcomes into large text samples from these three media domains significantly different impacts on survey responses. Business press language was best in explaining leadership-related items, PR language best at explaining organizational results and " ordinary " newspaper language seemed to explain the relationship among motivation items.
Communications of the Association of Information Systems, 2006
Equity across academic disciplines is taken for granted in contemporary business schools. The sta... more Equity across academic disciplines is taken for granted in contemporary business schools. The status of a discipline is crucial for such fairness. One might assume, therefore, that IS scholars are treated fairly during tenure and promotion processes when compared to scholars from other business school academic fields. In fact, this may not be the case. The playing field used by business academic disciplines may not be level. This study addresses three questions related to this issue. The first asks whether there is a level playing field for publication among the various business disciplines, and second, assuming an unlevel playing field, what are the relative productivity differences between dissemination of scientific results among these disciplines? The third question is how could the playing field be leveled, assuming it is not at the present time. To answer these questions, existing data sources were tapped, one of these containing well over 18,000 data points. Further, original data was gathered from U.S. business schools, and all the data was analyzed in relation to AACSB data on the relative sizes of business school disciplines. Given our finding that the playing field is not level, the differences between the IS discipline and four other disciplines – Accounting, Finance, Management, and Marketing – are examined, and the consequences of the disadvantage to the IS discipline are discussed. The article concludes with recommendations of actions to level the field, and these are presented as a challenge to leaders in the Information Systems discipline.
PLOS One, Sep 3, 2014
Some disciplines in the social sciences rely heavily on collecting survey responses to detect emp... more Some disciplines in the social sciences rely heavily on collecting survey responses to detect empirical relationships among variables. We explored whether these relationships were a priori predictable from the semantic properties of the survey items, using language processing algorithms which are now available as new research methods. Language processing algorithms were used to calculate the semantic similarity among all items in state-of-the-art surveys from Organisational Behaviour research. These surveys covered areas such as transformational leadership, work motivation and work outcomes. This information was used to explain and predict the response patterns from real subjects. Semantic algorithms explained 60–86% of the variance in the response patterns and allowed remarkably precise prediction of survey responses from humans, except in a personality test. Even the relationships between independent and their purported dependent variables were accurately predicted. This raises concern about the empirical nature of data collected through some surveys if results are already given a priori through the way subjects are being asked. Survey response patterns seem heavily determined by semantics. Language algorithms may suggest these prior to administering a survey. This study suggests that semantic algorithms are becoming new tools for the social sciences, opening perspectives on survey responses that prevalent psychometric theory cannot explain.arnulf@bi.no. Due to copyright issues affecting parts of the surveys used (the MLQ), we can only share the semantic values and the survey responses, but not the survey item wordings, which are the property of Mindgarden Inc. However, the items are purchasable from Mindgarden.com, and we used the form ''MLQ 360 Form 5X Short''. 
Any interested researchers should be able to use our data freely, but need to take proper responsibility in their use of the MLQ items. We will therefore avoid a publicly accessible repository, but promise to offer interested parties all information necessary to reproduce our findings or apply them to other materials. The prerequisite is a clarifying dialogue with the corresponding author.
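The kind of analysis described in the abstract above can be sketched in a few lines. This is a minimal illustration only, not the authors' method: a bag-of-words cosine similarity stands in for the semantic algorithms used in the study, and the item wordings and observed correlations are hypothetical values invented for the example (they are not MLQ items or published results). The function names are also assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words vectors of two item wordings."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def pairwise_similarities(items):
    """Similarity for every unordered pair of items, in combinations() order."""
    return [
        cosine_similarity(items[i], items[j])
        for i, j in combinations(range(len(items)), 2)
    ]


def r_squared(x, y):
    """Share of variance in y explained by a simple linear regression on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return (sxy * sxy) / (sxx * syy) if sxx and syy else 0.0


# Hypothetical item wordings and hypothetical observed inter-item correlations,
# listed in the same pair order as pairwise_similarities(): (0,1), (0,2), (1,2).
items = [
    "My manager communicates a clear vision",
    "My manager explains a clear vision of the future",
    "I enjoy my daily work tasks",
]
observed_correlations = [0.70, 0.20, 0.25]

semantic = pairwise_similarities(items)
print("variance explained:", round(r_squared(semantic, observed_correlations), 2))
```

With real data, the semantic values would come from a proper language model and the observed correlations from respondents; the share of variance explained is then the quantity the abstract reports.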
Human Resource Development Quarterly, 2018
This is a methodological presentation of the relationship between semantics and survey statistics in human resource development (HRD) research. This study starts with an introduction to the semantic theory of survey response (STSR) and proceeds by offering a guided approach to conducting such analyses. The reader is presented with two types of semantic algorithms and a brief overview of how they are calculated and how they can be accessed by interested researchers. Subsequently, we use semantic data to reanalyze a previously published study on the relationships between perceptions of a trainee program, intrinsic motivation, and work outcomes. The semantic algorithms can explain between 31 and 55% of the variation in the observed correlations. This article shows how the statistical models originally used to explore the survey data can be replicated using semantics either alone or as an identifiable source of variation in the data. All the steps are presented in detail, and the datasets as well as the statistical syntax necessary to perform the analyses are made available to the readers. Implications for methodology and the improvement of predictive validity in HRD research are discussed.
This panel will describe the diverse ways in which information technology (IT) firms organize for and manage disruptive technologies to enhance company market value. The panel will include two practitioners offering very different perspectives and strategies on how their respective IT firms create value using disruptive technologies. First, Aaron French (founder of Sociabile, a social networking site startup) and Ben Pace (Chief Financial Officer of C-Spire) describe their firms' strategies for digital disruption. Next, Kai Larsen will offer his perspective on how he anticipates that a specific emerging disruptive technology (automated machine learning) may be exploited to create value for IT firms. Finally, Stacie Petter (Baylor University) will talk about the complexities academics face developing frameworks and other artifacts of common understanding in a sector with so much strategic heterogeneity.
In this paper, we use Latent Semantic Analysis to explore the design battles in smartphones. Using newspaper coverage from 1992-2012, we build a semantic model of the media coverage to identify article clusters. Cluster membership gives us visibility into trends in coverage over the course of the study. We find that five distinct periods can be identified. Some unique characteristics of this market lead us to develop new propositions about design battles.
Assessing the similarity of proposed theoretical constructs to each other and those previously known and studied is imperative in theoretical research. In this paper we turn to theories of similarity judgement from cognitive psychology for the understanding of the process of establishing similarity between one or more constructs. Then, guided by these theories, we develop an integrated method for automatic detection of similar constructs. We apply the method to constructs from leading IS journals, a major journal in psychology, and the interdisciplinary overlap between the IS and psychology constructs. Our paper contributes to methodology of research, design science research, behavioral IS research, text mining and information retrieval theory and practice, IS research on ontology alignment and schema matching as well as cognitive theories of similarity in psychology.
IS, among other social sciences, has moved from a relative paucity of theories about social phenomena to a state of multiple, overlapping, and overly narrow theories. We offer three modes for theory integration that will enable researchers to better integrate theories and processes into internally coherent models within theories, across theories and between fields. The bases for integration are semantic similarity, nomological congruence and physical/functional/causal overlap. We develop a framework that will justify propositions for theory integration that can subsequently be tested for correspondence to real-world phenomena.
This panel addresses the divergent expectations of the IS community on new directions in the genre of standalone literature reviews (SLRs), which synthesize and interpret a body of literature within a domain. The primary purpose of the panel is to spur a controversial discussion on a) what the IS field can learn from other fields and where it should be specific, b) how the IS field should move forward to foster the genre of SLRs, and c) what are the best approaches to train doctoral IS students in publishing SLRs. The panelists initiate a vital discussion on where the IS field can profit from considering approaches of other fields and where it should focus on IS specifics that are not shared by other fields, which SLR processes are of particular importance for the IS field, and whether and how doctoral IS students should be trained in writing SLRs.
Sage Open, 2018
The semantic theory of survey responses (STSR) proposes that the prime source of statistical covariance in survey data is the degree of semantic similarity (overlap of meaning) among the items of the survey. Because semantic structures are possible to estimate using digital text algorithms, it is possible to predict the response structures of Likert-type scales a priori. The present study applies STSR in an experimental way by computing real survey responses using such semantic information. A sample of 153 randomly chosen respondents to the Multifactor Leadership Questionnaire (MLQ) was used as target. We developed an algorithm based on unfolding theory, where data from digital text analysis of the survey items served as input. Upon deleting progressively larger shares (from 20% to 95%) of the real responses, we let the algorithm replace these with simulated ones, and then compared the simulated datasets with the real ones. The simulated scores displayed sum score levels, alphas, and factor structures highly resembling their real origins even if up to 86% were simulated. In contrast, this was not the case when the same algorithm was operating without access to semantic information. The procedure was briefly repeated on a different measurement instrument and a different sample. This not only yielded similar results but also pointed to the need for further theoretical and practical developments. Our study opens the way for experimental research on the effect of semantics on survey responses using computational procedures.
Behavior Research Methods, Jan 12, 2018
The traditional understanding of data from Likert scales is that the quantifications involved result from measures of attitude strength. Building on our recently proposed semantic theory of survey response (STSR), we claim that survey responses tap two different sources: attitudes and the semantic structure of the survey. Exploring the degree to which individual responses are influenced by semantics, we hypothesize that information about attitude strength is actually filtered out as noise in the commonly used correlation matrix. Applying a linguistic algorithm termed MI, we separated semantics from attitude strength in four samples of altogether 7781 respondents covering 8187 pairs of items. The surveys spanned commonly used organizational behavior surveys on leadership and motivation, as well as a short 5-factor personality inventory, the NEO-FFI. As hypothesized, the findings indicate that levels of attitude strength did not contribute uniquely to the correlation matrices except in the NEO. This contradicts the prevalent understanding of what survey data represent. This problem has been overlooked, possibly contributing to reduced predictive value from research relying on Likert scale data.
Theory: The "semantic theory of survey response" (STSR):
• Claims that statistics commonly applied to survey responses are determined by the semantics, not attitude strength.
• This is contrary to most assumptions since Likert (1932).
• This study demonstrates how information about attitude strength is filtered out as noise from individual responses in correlation matrices.
• Purely semantic values (with no knowledge of attitude strength) explain between 65% and 85% of the variation in the correlation matrices (Arnulf, Larsen, Martinsen, & Bong, 2014).
• The present study splits individual response patterns into two components: one component maximizes the attitude strength, the other is mainly constituted by the semantic relations common to all respondents.
• The component representing attitude strength is the "item-product matrix": the products of multiplying all scores of the individual respondent with each other.
• The component representing semantics is the "item-distance matrix", where all responses are subtracted from each other and the absolute values are retained.
Results: Attitude strength is strongly related to co-products, but the distances determine the observed correlation matrices.
Discussion: The purpose of this study was to show that individual responses to survey scales carry two different types of information: attitude strength (an emotional/motivational component) and semantic similarity between items (a cognitive component). As predicted, the inter-item distance matrices seemed to represent the semantic relationships, whereas the co-products carried more information about attitude strength. When used to predict the observed correlation matrix in hierarchical regression, the contribution of the co-products was reduced to almost nothing. Semantics as computed by algorithms and the related information in the item distance matrices were the most important components explaining the correlation matrices in all samples. Only in the case of the NEO were the semantic algorithms ineffective in predicting the patterns. The findings are disturbing because of the emphasis on correlation and covariance matrices in construct validation. If the information in these structures is not measuring attitude strength but merely semantic relationships, it implies that the object of survey methods, attitude strength, may actually be filtered out as noise. The subject matter of statistical modelling is not attitude strength but linguistics. The observed statistics do not indicate latent constructs as defined by Borsboom (2008), and run counter to the original purpose of scale constructions as developed by Rensis Likert (1932). While the response patterns of individuals carry more information than semantics, only the semantic information seems to carry over into the correlation matrix for the whole sample. This is not inevitable, as can be seen in the case of the NEO-FFI. Our study shows that additional assumptions must be made to capture attitude strength as information in Likert scales analyzed as correlations or covariances.
Methods:
• We used four different datasets: (1) 1220 respondents to a measure of transformational leadership (MLQ: Avolio, Bass & Jung 1995), (2) 255 respondents to transformational leadership and various scales on motivation and work outcomes, (3) 981 respondents to various measures on transformational leadership, LMX leadership, 2-factor theories of leadership and diverse motivational scales, and (4) 5332 respondents to the NEO-FFI, a short version of a 5-factor personality scale.
• For all respondents in all datasets, individual co-product and distance matrices were computed. The individual values and the average values over all item pairs were compared to two criteria: similarity with the observed sample correlation matrices, and with the purely semantic matrix created by natural language analysis algorithms (latent semantic analysis and the MI algorithm).
Transformational leadership scores predicted by demographics, personality and the individual response matrices: Adjusted R² = .78
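The two per-respondent components described in the poster text above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the example scores are invented, and the averaging step is only one plausible way to summarize matrices across respondents; the poster does not give implementation details.

```python
def co_product_matrix(scores):
    """Products of every pair of one respondent's item scores
    (the component the poster associates with attitude strength)."""
    return [[a * b for b in scores] for a in scores]


def distance_matrix(scores):
    """Absolute differences between every pair of one respondent's item scores
    (the component the poster associates with semantic relations)."""
    return [[abs(a - b) for b in scores] for a in scores]


def mean_matrix(matrices):
    """Element-wise mean of equally sized square matrices, one per respondent."""
    n = len(matrices)
    size = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / n for j in range(size)]
        for i in range(size)
    ]


# Hypothetical Likert responses (1-5) from two respondents to three items.
respondents = [[1, 5, 3], [2, 4, 4]]

average_distances = mean_matrix([distance_matrix(r) for r in respondents])
average_products = mean_matrix([co_product_matrix(r) for r in respondents])
print(average_distances)
print(average_products)
```

Note that two respondents who answer uniformly high or uniformly low produce very different co-products but identical (all-zero) distances, which is one way to see why the two matrices carry different information.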
Theory identity is a fundamental problem for researchers seeking to determine theory quality, create theory ontologies and taxonomies, or perform focused theory-specific reviews and meta-analyses. We demonstrate a novel machine-learning approach to theory identification based on citation data and article features. The multidisciplinary ecosystem of articles which cite a theory's originating paper is created and refined into the network of papers predicted to contribute to, and thus identify, a specific theory. We provide a 'proof-of-concept' for a highly-cited theory. Implications for cross-disciplinary theory integration and the identification of theories for a rapidly expanding scientific literature are discussed.
We propose an automatic construct-level citation extraction system (ACCE) to refine citations from the paper level to the construct level. This paper follows the design science paradigm (Hevner et al. 2004; March and Smith 1995; Nunamaker et al. 1991). The remaining sections are organized as follows. We first analyze the tasks involved in extracting construct-level citations, and identify three characteristics as our design guidelines, which present a need for IE techniques. In the related work section, we introduce the historic origin of IE techniques and major types of IE systems. Next, we review state-of-the-art relation extraction techniques, and, in particular, the support vector machine (SVM) algorithm and its kernel functions. After examining related work, we propose a framework for our automatic construct-level citation extraction system and describe in detail the three important steps of ACCE: citation extraction, construct extraction, and referring relation extraction. Using these steps, we illustrate the applicability of our IT artifact by applying it to a dataset consisting of 224 publications from two top journals in the Information Systems field. We then evaluate system performance in comparison to human extraction. Finally, we summarize the results and discuss future research directions in the conclusion section.
We applied internomological network (INN) analysis, a novel approach that classifies constructs based on their underlying meaning, to constructs from the National Cancer Institute (NCI)’s Grid-enabled measures (GEM) database. Seven expert raters sorted these constructs using Michie’s Theoretical Domains Framework (TDF). Our objectives were to evaluate the TDF domains, examine GEM’s domain coverage, and to enhance the trustworthiness of research by creating a gold standard for natural language processing under the auspices of the federally funded INN project. Background: Identifying constructs that accurately describe the phenomenon for study remains a challenge as long as the nomenclature of even related theories remains unconnected. The INN method assists with the challenge of differently named constructs with overlapping measures. Results from the first use of INN demonstrated its utility to clarify the meaning of constructs in transdisciplinary scientific fields (Cook et al. 2012). As an extension, the current study applied INN to TDF. The TDF is an integrative framework with 14 domains classifying constructs in psychosocial research (Cane et al. 2012). We used the TDF to sort 238 constructs in NCI's GEM database, an interactive website enabling research harmonization by listing agreed-upon measures and constructs. Methods: Seven nursing faculty sorted all GEM constructs into domains using a 3-step process. First, a single expert categorized each construct. Second, all experts reviewed initial assignments and either agreed or proposed another domain. Third, we resolved discrepancies in discussion. Nineteen domains (14 original plus 5 proposed by the group) were used in the final construct sort. Results: Agreement about initial domain assignments (kappa) varied from .43 (goals) to .93 (emotion). Overall kappa was .72, which is acceptable but not ideal.
Domains with more constructs had higher agreement, which may indicate more clarity about their definitions or greater applicability to the cancer-related constructs in GEM. In the third step, we achieved 100% agreement on a domain assignment for each construct. TDF domains most useful in classifying GEM constructs were (1) environmental context and resources, (2) emotion, and (3) behavioral regulation. One domain (rewards) did not apply to any constructs in GEM. Thirteen GEM constructs were found to have meanings identical to other constructs in the database under different names. Implications: Despite limitations of the expert-consensus method, our success sorting 238 constructs into 19 domains with moderate reliability suggests commonalities that point to the meaning of constructs in cancer research. In future studies, it will be interesting to compare current results with those from natural language processing algorithms. Additional methods for identifying construct similarity and synonymy are likely to improve classification results and maximize the credibility of research findings.
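For readers unfamiliar with the agreement statistic reported above, the sketch below computes Cohen's kappa for two raters. This is an assumption for illustration: the abstract does not say which kappa variant was used with its seven raters (a multi-rater statistic such as Fleiss' kappa is also plausible), and the domain labels and ratings here are invented.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: chance-corrected agreement between two raters
    who each assign one category to the same list of objects."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty list")
    n = len(ratings_a)
    # Proportion of objects on which the raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's category frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical domain assignments for six constructs by two raters.
rater_1 = ["emotion", "goals", "emotion", "goals", "emotion", "goals"]
rater_2 = ["emotion", "goals", "emotion", "emotion", "emotion", "goals"]
print(round(cohens_kappa(rater_1, rater_2), 2))
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance, which is the scale on which the reported values of .43, .72, and .93 should be read.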
Frontiers in Psychology, Jun 19, 2020
Frontiers in Psychology, 2020
International Journal of Information Management
Transformative artificially intelligent tools, such as ChatGPT, designed to generate sophisticated text indistinguishable from that produced by a human, are applicable across a wide range of contexts. The technology presents opportunities as well as challenges, often ethical and legal, and has the potential for both positive and negative impacts for organisations, society, and individuals. Offering multi-disciplinary insight into some of these, this article brings together 43 contributions from experts in fields such as computer science, marketing, information systems, education, policy, hospitality and tourism, management, publishing, and nursing. The contributors acknowledge ChatGPT’s capabilities to enhance productivity and suggest that it is likely to offer significant gains in the banking, hospitality and tourism, and information technology industries, and enhance business activities, such as management and marketing. Nevertheless, they also consider its limitations, disruptions to practices, threats to privacy and security, and consequences of biases, misuse, and misinformation. However, opinion is split on whether ChatGPT’s use should be restricted or legislated. Drawing on these contributions, the article identifies questions requiring further research across three thematic areas: knowledge, transparency, and ethics; digital transformation of organisations and societies; and teaching, learning, and scholarly research.
The avenues for further research include: identifying skills, resources, and capabilities needed to handle generative AI; examining biases of generative AI attributable to training datasets and processes; exploring business and societal contexts best suited for generative AI implementation; determining optimal combinations of human and generative AI for various tasks; identifying ways to assess accuracy of text produced by generative AI; and uncovering the ethical and legal issues in using generative AI across different contexts.
Automated Machine Learning for Business
After preparing your dataset, the business problem should be quite familiar, along with the subject matter and the content of the dataset. This section is about modeling data, using data to train algorithms to create models that can be used to predict future events or understand past events. The section shows where data modeling fits in the overall machine learning pipeline. Traditionally, we store real-world data in one or more databases or files. This data is extracted, and features and a target (T) are created and submitted to the “Model Data” stage (the topic of this section). Following the completion of this stage, the model produced is examined (Section V) and placed into production. With the model in the production system, present data generated from the real-world environment is inputted into the system. In the example case of a diabetes patient, we enter a new patient’s electronic health record information into the system, and a database lookup retrieves additional data for...
Social Science Research Network, Aug 16, 2018
This hands-on workshop will cover pedagogical strategies related to teaching Automated Machine Learning (AutoML) for IS majors, non-majors, and MBA students. AutoML is simply machine learning where data cleaning, feature engineering, algorithm selection, hyperparameter tuning, as well as most other steps are done automatically, removing the need for years of training in machine learning and statistics. Forbes Technology Council as well as many others have suggested that 2018 will be the year of automated machine learning, and most major technology companies are feverishly developing AutoML technologies to stay competitive. In three hours, with no programming, we will do what will normally take a data scientist three months. Participants will go through the whole data science process from project objective definition, acquisition and exploration of data, modeling of the data, interpreting and communicating results, and implementing the solution. We will end with a discussion of approaches for teaching AutoML. The workshop is appropriate for faculty who have no previous experience with machine learning as well as experienced machine learning researchers who have not been exposed to AutoML. The instructor has won both college-wide and university-wide teaching awards and has taught ML for a decade and AutoML for two years at both the undergraduate and graduate level, and his book entitled Automated Machine Learning for Business is under contract with Oxford University Press. Participants will receive a copy of the book.
SSRN Electronic Journal, 2020
The construct and instrument development process relies significantly on human judgment in the initial stages of the process, specifically in developing construct definition statements, and in developing instruments with high construct and content validity. Natural language processing techniques can be employed to support human judgment and improve the quality of constructs and instruments employed in research. The paper describes the use of such techniques and presents illustrative results from the use of those techniques. The illustrations support our premise that the use of those techniques can improve the rigor of the process and improve the quality of constructs and instruments employed in research.
Proceedings of the 34th Annual Hawaii International Conference on System Sciences
In Automated Machine Learning for Business, we teach the machine learning process using a new development in data science: automated machine learning. AutoML, when implemented properly, makes machine learning accessible to most people because it removes the need for years of experience in the most arcane aspects of data science, such as the math, statistics, and computer science skills required to become a top contender in traditional machine learning. Anyone trained in the use of AutoML can use it to test their ideas and support the quality of those ideas during presentations to management and stakeholder groups. Because the requisite investment is one semester-long undergraduate course rather than a year in a graduate program, these tools will likely become a core component of undergraduate programs, and over time, even the high school curriculum.
Journal of the Association for Information Science and Technology, 2019
Academy of Management Proceedings, 2019
Likert-scale surveys are frequently used in cross-cultural studies on leadership. Recent publications using digital text algorithms raise doubt about the source of variation in statistics from such...
Academy of Management Proceedings, 2015
Research on sensemaking in organisations and on linguistic relativity suggests that speakers of the same language may use this language in different ways to construct social realities at work. We apply a semantic theory of survey response (STSR) to explore such differences in quantitative survey research. Using text analysis algorithms, we have studied how language from three media domains – the business press, PR Newswire and general newspapers – has differential explanatory value for analysing survey responses in organisational behaviour (OB). We projected well-known OB surveys measuring leadership, motivation and outcomes into large text samples from these three media domains and found significantly different impacts on survey responses. Business press language was best in explaining leadership-related items, PR language best at explaining organizational results and "ordinary" newspaper language seemed to explain the relationship among motivation items.
Lecture Notes in Computer Science, 2016
The accumulated literature base in the behavioral sciences represents a great source of knowledge on human behaviors, and yet the same literature has grown beyond human comprehension. We address this information overload problem by proposing a novel IT artifact, TheoryOn. Based on the design science paradigm, we identify five design requirements. We first adapt the ontology learning layer cake framework to develop a four-step process (hypothesis extraction, construct extraction, construct relationship extraction and theory extraction) to automatically extract integral "parts" of behavioral theories. We then design four functionalities allowing researchers to quickly access synonymous constructs, construct relationships and theoretically related constructs (e.g., antecedents and consequents), as well as integrate related theories. To illustrate the applicability and usefulness, we use a dataset of all the relevant behavioral studies from three top journals in Information Systems and Psychology and conduct an A/B test between the prototype TheoryOn system and the EBSCOhost full-text search engine.
2016 49th Hawaii International Conference on System Sciences (HICSS), 2016
SSRN Electronic Journal, 2015
The accumulated literature base in the behavioral sciences represents the IS discipline’s greatest source of knowledge, and yet the same literature has grown beyond human comprehension. An experiment is conducted showing the inability of experts to retrieve relevant constructs using full-text search. To address this inability to access the body of theoretical behavioral science research we propose a novel IT artifact built on an information extraction approach to nomological network discovery. Based on the design science paradigm we develop a three-step process for extraction and assembly of nomological networks proceeding through article download, hypothesis extraction, variable extraction, and finally to variable integration. Rule-based vs. machine learning algorithms are evaluated and compared to determine the best approach for the extraction steps. A dataset of all the relevant behavioral studies from two top journals in Information Systems and Psychology is used to evaluate the approach in comparison to expert decisions, leading into a discussion of limitations and possible extensions.
Despite extensive research in Information Systems regarding the development, anatomy, evaluation, characteristics of, and nativeness of theory, the identity of theory remains problematic. Theories develop over time and as the constituent variables and associations are modified, the “true” representation of a theory becomes a salient question. In this research, theory domains for quantitative theories are located and identified in a large-scale nomological net. This metatheoretical approach is demonstrated to provide researchers a method for comparison of the degree of evidentiary support for theory domains, locating areas of theoretical saturation and sparsity, and identifying possible pathways for theory integration and extension.
IFIP International Federation for Information Processing
Encyclopedia of Business Ethics and Society
Nokobit-98, available at http://nokobit. bi. …, 1998
NOKOBIT-98, Session 2:1a. A Network Approach to Delivery of Interdisciplinary Information Science Education. Kai R. Larsen and Claire R. McInerney, University at Albany, State University of New York. Abstract: The ...