Natural Language Processing and Social Interaction, Fall 2021

More and more of life is now manifested online, and many of the digital traces that are left by human activity are increasingly recorded in natural-language format. This research-oriented course examines the opportunities for natural language processing to contribute to the analysis and facilitation of socially embedded processes. Possible topics include conversation modeling, analysis of group and sub-group language, language and social relations, persuasion and other causal effects of language.

Click on tabs just above to see information about enrollment/prerequisite policies, administrative info, overall course structure, resources, and so on.

Enrollment Limited to [[PhD and [CS MS] students] who meet the prerequisites]; PhD students not in CS/INFO will receive manual instructor permission to enroll (details to be arranged at lecture). Auditing (either officially or unofficially) is not permitted. These policies are to keep class meetings heavily discussion- and group-research-focused.

Prerequisites All of the following: (1) CS 2110 or equivalent programming experience; (2) a course in artificial intelligence or any relevant subfield (e.g., NLP, information retrieval, machine learning, Cornell CS courses numbered 47xx or 67xx); (3) proficiency with using machine learning tools (e.g., fluency at training an SVM or other classifier, comfort with assessing a classifier’s performance using cross-validation)

Related classes: see Cornell's NLP course list. Also GOVT 3294 Post-Truth Politics, COMM 6750 Research Methods for Social Networks and Social Media, COMM 6770 Attitudes and Social Judgment

All prior runnings of CS/INFO 6742: 2019 fall :: 2018 fall :: 2017 fall :: 2016 fall :: 2015 fall :: 2014 fall :: 2013 fall :: 2011 spring

Administrative info

CMS https://cmsx.cs.cornell.edu. Site for submitting assignments, unless otherwise noted. Login with NetID credentials and select CS 6742. You may find this graphically-oriented guide to common operations useful: see how to replace a prior submission; how to tell if CMS successfully received your files; how to form a group.

Course discussion site https://edstem.org/us/courses/8208/discussion (access restricted to enrolled students). Course announcements and Q&A/discussion site. Social interaction and all that, you know.

Office hours and contact info See Prof. Lee's homepage and scroll to the section on Contact and availability info.

Grading Of most interest is productive research-oriented discussion participation (in class and/or on the course discussion site), interesting research proposals and pilot studies, and a good-faith final research project.

Academic Integrity Academic and scientific integrity compels one to properly attribute to others any work, ideas, or phrasing that one did not create oneself. To do otherwise is fraud.

Certain points deserve emphasis here. In this class, talking to and helping others is strongly encouraged. You may also, with attribution, use the code from other sources. The easiest rule of thumb is, acknowledge the work and contributions and ideas and words and wordings of others. Do not copy or slightly reword portions of papers, Wikipedia articles, textbooks, other students' work, Stack Overflow answers, something you heard from a talk or a conversation or saw on the Internet, or anything else, really, without acknowledging your sources. See "Acknowledging the Work of Others" in The Essential Guide to Academic Integrity at Cornell and http://www.theuniversityfaculty.cornell.edu/AcadInteg/ for more information and useful examples.

This is not to say that you can receive course credit for work that is not your own — e.g., taking someone else's report and putting your name at the top, next to the other person(s)' names. However, violations of academic integrity (e.g., fraud) undergo the academic-integrity hearing process on top of any grade penalties imposed, whereas not following the rules of the assignment “only” risks grade penalties.

Overall course structure

Lecture	Agenda	Pedagogical purpose	Assignments
#1	Course overview		A1 released: pilot empirical study for a research idea based on the given readings.
#2 - #6	Lectures on topics related to the A1 readings	Case studies to explore some topics and research styles of interest. Get-to-know-you exercises to get everyone familiar and comfortable with each other.
Next block of meetings	Discussion of proposed projects based on the readings	Practice with fast research-idea generation. Feedback as to what proposals are most interesting, most feasible, etc.	Discussion of student project proposals, based on the readings for that class meeting. Each class meeting involves everyone reading at least one of the two assigned papers and posting a new research proposal based on the reading to the course discussion site. Thoughtfulness and creativity are most important, but take feasibility into account.
Next block of meetings	Lectures on, potentially, linguistic coordination, linguistic adaptation, influence, persuasion, diffusion, discourse structure, advanced language modeling.	Foundational material	Potentially some assignments based on the lectures.
Remainder of the course	Activities related to course projects	Development of a "full-blown" research project (although time restrictions may limit ambitions). For the purposes of this course, "interesting" and "well-thought-out" are more important than "successful".

Resources

Cornell's Passkey for your web browser: "When you’re off-campus, connect to databases, journals and e-books that would otherwise be restricted or hidden behind paywalls through Passkey."
Upcoming conference deadlines: ICWSM 2022: Sep 15 2021 or Jan 15 2022 :: The Web Conference (formerly WWW): Oct 14 2021 (abstract), Oct 21 2021 (full paper) :: ACL 2022: Nov 15 2021 :: NAACL 2022: Jan 15 to ARR :: CSCW 2022: Jan 15 2022 :: SIGDIAL 2022: not yet announced
Paper repositories: Papers With Code :: All ACL conferences, journals, workshops proceedings :: All WWW proceedings :: All CSCW proceedings :: All ICWSM proceedings
ACL wiki of resources: corpora, datasets, tools, software, lexicons, organized by language
ConvoKit: Cornell Conversational Analysis Toolkit. Includes both tools and conversational datasets.
Books, surveys, and tutorials: Dan Jurafsky and James Martin, 2009: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (3rd edition draft chapters and slides) :: Jacob Eisenstein, 2017: A Technical Introduction to Natural Language Processing (book and slides) :: Dirk Hovy, 2020: Text Analysis in Python for Social Scientists (Cornell access) :: Yoav Goldberg, 2017: Neural Network Methods for Natural Language Processing (access via Cornell, JAIR version) :: Cristian Danescu-Niculescu-Mizil and Lillian Lee, 2016. Natural Language Processing for Computational Social Science. Invited Tutorial at NeurIPS. :: Atefeh Farzindar and Diana Inkpen, 2015: NLP for Social Media (access via Cornell, review by Annie Louis) :: Dong Nguyen, A. Seza Doğruöz, Carolyn P. Rosé and Franciska de Jong, 2016: Computational Sociolinguistics: A Survey. Computational Linguistics 42(3):537--593. :: Dirk Hovy and Diyi Yang, 2021: The Importance of Modeling Social Factors of Language: Theory and Practice. NAACL 588--602.
Toolkits, alphabetically: CMU twitter tools (Java) :: ConvoKit (Python) :: CRAN NLP tools (R) :: GATE (Java) :: Gensim (Python) :: Illinois tools (Java?) :: Lingpipe (Java) :: Mallet (Java) :: OpenNLP (Java) :: NLTK (Python) :: SpaCy (Cython) :: Stanford tools (Java) :: VADER (Valence Aware Dictionary and sEntiment Reasoner) (Python)
Pretrained word/sentence embeddings: a list by Sepehr Sameni
NLP at Cornell

#1 Aug 26: Introduction

Assignment A1: Pilot empirical research study. Note the first deadline (of several) on Wed Sep 1, 11:59pm.

Class images, links and handouts

Handout; recording (only available to enrolled students)
Inspirational image: Raphael's The School of Athens
Wikipedia Article for Deletion discussion ("not a vote"); annotated version; Wikipedia essay on arguments to avoid in deletion discussions; notabilia.net visualization of vote dynamics on selected AfD discussions
Poster depicting expansionary ("guestbook") vs. focused ("repeated-engagement") conversation threads

References

Backstrom, Lars, Jon Kleinberg, Lillian Lee, and Cristian Danescu-Niculescu-Mizil. 2013. Characterizing and curating conversation threads: Expansion, focus, volume, re-entry. WSDM, pp. 13–22.
Bryan, Christopher J., Gregory M. Walton, Todd Rogers, and Carol S. Dweck. 2011. Motivating voter turnout by invoking the self. Proceedings of the National Academy of Sciences 108 (31): 12653-12656.
- Followup: failure to replicate: Gerber, Alan S., Gregory A. Huber, Daniel R. Biggers, and David J. Hendry, June 28, 2016. A field experiment shows that subtle linguistic cues might not affect voter behavior. Proceedings of the National Academy of Sciences 113(26): 7112-7117.
- Response to followup: "What is an authentic replication attempt and what is not? Gerber et al.’s paper ... gives us the opportunity to reflect on this issue of longstanding concern to us." Bryan, Christopher J., Gregory M. Walton, and Carol S. Dweck, Oct 18, 2016. Psychologically authentic versus inauthentic replication attempts. Proceedings of the National Academy of Sciences 113(43): E6548.
- Response: "Although we find Bryan et al.’s ... explanation unconvincing, this exchange is well-timed. The original findings have (to our knowledge) never been successfully replicated, and this November provides ample opportunity to test noun vs. verb in the political environment Bryan et al. ... suggest is ideal for producing 11–14 percentage-point effects." Gerber, Alan S., Gregory A. Huber, Daniel R. Biggers, and David J. Hendry, Oct 25, 2016. Reply to Bryan et al.: Variation in context unlikely explanation of nonrobustness of noun versus verb results. Proceedings of the National Academy of Sciences 113(43): E6549--E6550.
- Alan Gerber, Gregory Huber, Albert Fang, 2018. Do Subtle Linguistic Interventions Priming a Social Identity as a Voter Have Outsized Effects on Voter Turnout? Evidence From a New Replication Experiment. Political Psychology 39: 925--938.
- Bryan, Christopher J., David S. Yeager, and Joseph M. O’Brien, 2019. Replicator degrees of freedom allow publication of misleading failures to replicate. Proceedings of the National Academy of Sciences 116 (51) 25535--25545.
- Gerber, Alan S., Gregory A. Huber, Albert H. Fang, 2020. Voting behavior is unaffected by subtle linguistic cues: Evidence from a psychologically authentic replication. Behavioural Public Policy, 1--15.

#2 Aug 31: A1 inspiration: Overview of conversations

Assignment A1 finalized. Note the first deadline (of several) on Wed Sep 1, 11:59pm.

Class images, links and handouts

visualization of keep/delete comments in temporal order
Image source: notabilia.net

Recording (only available to enrolled students)
notabilia.net visualization of vote dynamics on selected Wikipedia Article for Deletion discussions

References

Sandra Gonzalez-Bailon, Andreas Kaltenbrunner, and Rafael E. Banchs. 2010. The structure of political discussion networks: A model for the analysis of online deliberation. Journal of Information Technology 25(2): 230–243. [author-posted version]
Jiajun Bao, Junjie Wu, Yiming Zhang, Eshwar Chandrasekharan, David Jurgens, 2021. Conversations Gone Alright: Quantifying and Predicting Prosocial Outcomes in Online Conversations. The Web Conference.
Justine Zhang, Jonathan Chang, Cristian Danescu-Niculescu-Mizil, Lucas Dixon, Yiqing Hua, Dario Taraborelli, Nithum Thain, 2018. Conversations Gone Awry: Detecting Early Signs of Conversational Failure. ACL: 1350--1361.

#3 Sep 2: Two A1 datasets, alike in dignity

Reminder: try to post a preliminary pilot-study idea/sketch/possibilities/questions on Monday.

Class images, links and handouts

Handout; recording (only available to enrolled students)
Slashdot and the subreddit ChangeMyView front pages
Related work figure (Figure 1) in "Something's Brewing" paper (2019)
Peer moderation in Slashdot according to Wikipedia
Colorized, pseudoline-numbered version of 09_05_25_212203.instancedata.txt from the A1 Slashdot data (produced via http://hilite.me/). Corresponding webpage
Sketch of a ChangeMyView comment tree
Chenhao Tan's curated hedge list, which merges several pre-existing data sources; see README. Appears in Tan, Chenhao and Lillian Lee. 2016. Talk it up or play it down? (Un)expected correlations between (de-)emphasis and recurrence of discussion points in consequential US economic policy meetings. arXiv preprint arXiv:1612.06391.
Harvard General Inquirer lexicon: homepage; documentation about the categories
The LIWC lexicon, 2015 version. A standard reference: Tausczik, Yla R. and James W. Pennebaker. 2010. The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology 29(1): 24-54.

References

Dutt, Ritam, Sayan Sinha, Rishabh Joshi, Surya Shekhar Chakraborty, Meredith Riggs, Xinru Yan, Haogang Bao, and Carolyn Rose. 2021. ResPer: Computationally Modelling Resisting Strategies in Persuasive Conversations. EACL, 78–90.
Fernbach, Philip M., Todd Rogers, Craig R. Fox, and Steven A. Sloman, 2013. Political extremism is supported by an illusion of understanding. Psychological Science 24(6): 939--946.
Hessel, Jack, and Lillian Lee. 2019. Something’s Brewing! Early Prediction of Controversy-Causing Posts from Discussion Features. NAACL, 1648--1659.
Kolbert, Elizabeth. 2017. Why Facts Don’t Change Our Minds: New discoveries about the human mind show the limitations of reason. The New Yorker, Books section. [publisher link] [highlighted link, viewable with Cornell NetID login]
Krohn, Rachel, and Tim Weninger. 2019. “Modelling Online Comment Threads from Their Start.” 2019 IEEE International Conference on Big Data (Big Data).
Nyhan, Brendan. 2021. Why the Backfire Effect Does Not Explain the Durability of Political Misperceptions. Proceedings of the National Academy of Sciences 118(15).
Tan, Chenhao, Vlad Niculae, Cristian Danescu-Niculescu-Mizil, and Lillian Lee, 2016, Winning arguments: Interaction dynamics and persuasion strategies in good-faith online discussions, WWW, pp. 613–624. [ACM link] [ paper "homepage" (paper, slides, data, etc.)]

#4 Sep 7: Language coordination: a "direct linguistic" interaction

Reminder: check Ed Discussions for announcements. And provide thoughts/encouragement to your classmates!
Use Passkey to get access to paywalled content via Cornell.
Toolkits possibly useful for A1: see the "Resources" tab at the top of this page. Note that Cornell's ConvoKit comes with the CMV data.

Class images, links and handouts

New Yorker cartoon showing most business people at a meeting in ridiculous outfits, but one person isn't. Caption: Damn it, Hopkins, didn't you get yesterday's memo?
Image source: Jack Ziegler, The New Yorker, 06/09/2015. License obtained through The Cartoon Bank

Recording (only available to enrolled students)
SIGDIAL 2014 talk slides
Homepage of Vinodkumar Prabhakaran: much work on linguistic correlates of power relationships
Sample oral argument transcript from the US Supreme Court. Quote: "I am attributing rationality to someone who was obviously not doing his job very well".
Sample Wikipedia Request for Adminship

References

Choi, Minje, Luca Maria Aiello, Krisztián Zsolt Varga, and Daniele Quercia. 2020. Ten Social Dimensions of Conversations and Relationships. The Web Conference 1514–25.
Danescu-Niculescu-Mizil, Cristian, Lillian Lee, Bo Pang, and Jon Kleinberg. 2012. Echoes of power: Language effects and power differences in social interaction. WWW, pp. 699--708. [ACM link] [ paper "homepage" (paper, slides, data, etc.)]
Danescu-Niculescu-Mizil, Cristian, Robert West, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. No country for old members: User lifecycle and linguistic change in online communities. WWW, pp. 307--318. Best paper award. [ACM link] [ paper "homepage" ]
Doyle, Gabriel, Amir Goldberg, Sameer B. Srivastava, and Michael C. Frank. 2017. Alignment at work: Accommodation and enculturation in corporate communication. ACL, 604--612.
Nguyen, Viet-An, Jordan Boyd-Graber, Philip Resnik, Deborah A. Cai, Jennifer E. Midberry, and Yuanxin Wang. 2014. Modeling Topic Control to Detect Influence in Conversations Using Nonparametric Topic Models. Machine Learning 95(3): 381–421.
Prabhakaran, Vinodkumar, Ashima Arora, and Owen Rambow. 2014. Staying on topic: An indicator of power in political debates. EMNLP (short Papers), 1481-1486.
Sharma, Eva, and Munmun De Choudhury. 2018. Mental Health Support and its Relationship to Linguistic Accommodation in Online Communities. CHI, 1–13.
Xu, Yang, Jeremy Cole, and David Reitter. 2019. Linguistic Alignment Is Affected More by Lexical Surprisal Rather than Social Power. The Society for Computation in Linguistics, vol 2.

#5 Sep 9: (lecture cancelled: out sick)

Reminder: A1 milestone: post pilot-study idea(s) by tonight, and if grouping, do so on CMS by tomorrow night.

#6 Sep 14: Quick look at settings mentioned last time; some nuts and bolts

Next assignment, "A1 Reflection", released
Reminder: A1 milestone: post project update by tomorrow night

Class images, links and handouts

Why is Mrs. Thatcher Interrupted So Often? Nature title

Recording (only available to enrolled students)
Samples from settings mentioned last time:
- Sample oral argument transcript from the US Supreme Court. Quote: "I am attributing rationality to someone who was obviously not doing his job very well".
- Sample Wikipedia Request for Adminship
Chris Potts's sentiment analysis tutorial: tokenization, negation

References

Beattie, Geoffrey W., Anne Cutler, and Mark Pearson. 1982. Why Is Mrs Thatcher Interrupted so Often? Nature 300 (December): 744--747. See also Bull and Mayer (1988).
Bull, Peter, and Kate Mayer. 1988. Interruptions in Political Interviews: A Study of Margaret Thatcher and Neil Kinnock. J. Lang. Soc. Psychol. 7 (1): 35–46.
Bunt, Harry, Volha Petukhova, David Traum, and Jan Alexandersson. 2017. Dialogue Act Annotation with the ISO 24617-2 Standard. In Multimodal Interaction with W3C Standards: Toward Natural User Interfaces to Everything, edited by Deborah A. Dahl, 109–35. Cham: Springer International Publishing.
Feldman, Adam, and Rebecca D. Gill. 2019. Power Dynamics in Supreme Court Oral Arguments: The Relationship between Gender and Justice-to-Justice Interruptions. Justice System Journal 40(3): 173–95.
Gibson, David R. 2005. Opportunistic Interruptions: Interactional Vulnerabilities Deriving From Linearization. Social Psychology Quarterly 68(4): 316–37.
Hawes, Timothy, Jimmy Lin, and Philip Resnik. 2009. Elements of a Computational Model for Multi-Party Discourse: The Turn-Taking Behavior of Supreme Court Justices. Journal of the American Society for Information Science and Technology 60(8): 1607–15.
Jacobi, Tonja, and Dylan Schweers. 2017. Justice, Interrupted: The Effect of Gender, Ideology, and Seniority at Supreme Court Oral Arguments. Virginia Law Review 103: 1379--1496.
Jacobi, Tonja, and Kyle Rozema. 2018. Judicial Conflicts and Voting Agreement: Evidence from Interruptions at Oral Argument. Boston College Law Review 59 (7): 2259–2318.
Johnson, Timothy R, Ryan C Black, and Justin Wedeking. 2009. Pardon the Interruption: An Empirical Analysis of Supreme Court Justices’ Behavior during Oral Arguments. Loyola Law Review 55: 331--351.
Lepp, Haley, and Gina-Anne Levow. 2020. Pardon the Interruption: An Analysis of Gender and Turn-Taking in U.S. Supreme Court Oral Arguments. INTERSPEECH, 1838--1842.
Patton, Dana, and Joseph L. Smith. 2020. Gender, Ideology, and Dominance in Supreme Court Oral Arguments. Journal of Women, Politics & Policy 41 (4): 393–415.
Sullivan, Barry, and Megan Canty. 2015. Interruptions in Search of a Purpose: Oral Argument in the Supreme Court, October Terms 1958-60 and 2010-12. Utah Law Review 2015 (5).

#7 Sep 16: A1 group/individual appointments

Reminder: A1 milestone: submit project report on CMS by Monday night, in-class presentations on Tuesday

#8 Sep 21: A1 class presentations

Reminder: A1R milestone: post slides to Ed Discussions (as a new post) by tonight; post self-reflection part by Thursday night.

Class images, links and handouts

Recording (only available to enrolled students)

#9 Sep 23: Exploring differences between two language samples: "Fightin' Words"

Reminder: A1R self-reflection (main task 1) due tonight; feedback to at least one other group (main task 2) due Monday night.

Class images, links and handouts

The annual death rate is one in six among people who know that the chances of getting killed by lightning are 1 in 7 million.

Image source: https://xkcd.com/795/.

Handout (no recording: audio failed, unfortunately)
Slides (pptx, pdf) adapted from the relevant section of Cristian Danescu-Niculescu-Mizil and Lillian Lee, 2016. Natural Language Processing for Computational Social Science. Invited Tutorial at NIPS.

References

Fredette, Marc and Jean-François Angers. 2002. A new approximation of the posterior distribution of the log-odds ratio. Statistica Neerlandica 56(3): 314-329. [Author's institution link] Attributes the Monroe et al. fact to Chapter 10 of O'Hagan's Kendall's Advanced Theory of Statistics, vol 2b. The little-o analysis of the error appears in section 2.1 of Newson, 2008, Asymptotic distributions of linear combinations of logs of multinomial parameter estimates. Wikipedia attributes the approximation of the confidence interval to a 1988 article that cites a 1978 Biometrika article.
Liberman, Mark. Jan 3, 2016. The case of the disappearing determiners. Language Log blog post.
Latest update is Decreasing definiteness in crime novels, 2018.
Liberman, Mark. Style shifting in student writing assignments (public arguments vs. literature reviews), 2018. The most Kasichoid, Cruzian, Trumpish, and Rubiositous words, 2016. The most Trumpish (and Bushish) words, 2015. Obama's favored (and disfavored) SOTU words, 2014. Draft words (descriptions of white vs black NFL prospects), 2014. Male and female word usage, 2014.
Monroe, Burt L., Michael P. Colaresi, and Kevin M. Quinn. 2008. Fightin' words: Lexical feature selection and evaluation for identifying the content of political conflict. Political Analysis 16(4): 372-403. [alt link]
Yan, Qiushi. 2020. Weighted log odds ratio. In Notes for "Text Mining for R: A Tidy Approach".

Implementations

ConvoKit implementation, based on Jack Hessel's prior implementation and Xanda Schofield's visualizer
Hessel, Jack (who took this class!). FightingWords. In Python.
Lim, Kenneth (who took this class!). fightin-words. Compliant with scikit-learn and distributed via PyPI; borrows (with acknowledgment) from Jack's version.
Marzagão, Thiago. mcq.py. "Because this script processes one file at a time, it can handle corpora that are too large to fit in memory".
Silge, Julia, Alex Hayes, Tyler Schnoebelen. tidylo: Weighted Tidy Log Odds Ratio. In R.
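
For orientation alongside the implementations listed above, here is a minimal standard-library sketch of the z-scored log-odds ratio with a Dirichlet prior, following the formulation in Monroe et al. (2008). It is not taken from any of the listed packages: the function name weighted_log_odds, the toy word lists, and the uniform prior of 0.5 per word are all assumptions made for illustration. In practice an informative prior estimated from a large background corpus would be the more usual choice.

```python
import math
from collections import Counter


def weighted_log_odds(counts_a, counts_b, prior_counts):
    """Return {word: z-score} comparing sample A against sample B.

    Positive scores mark words favored by A; negative scores, words favored by B.
    prior_counts maps each vocabulary word to a positive Dirichlet pseudo-count.
    (Hypothetical helper name; assumes every counted word has a prior entry.)
    """
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    alpha_0 = sum(prior_counts.values())
    z_scores = {}
    for word, alpha_w in prior_counts.items():
        # Smoothed counts: observed count plus prior pseudo-count.
        y_a = counts_a.get(word, 0) + alpha_w
        y_b = counts_b.get(word, 0) + alpha_w
        # Log-odds of the word within each sample.
        log_odds_a = math.log(y_a / (n_a + alpha_0 - y_a))
        log_odds_b = math.log(y_b / (n_b + alpha_0 - y_b))
        delta = log_odds_a - log_odds_b
        # Approximate variance of the log-odds difference.
        variance = 1.0 / y_a + 1.0 / y_b
        z_scores[word] = delta / math.sqrt(variance)
    return z_scores


# Toy data, invented purely for illustration.
counts_a = Counter("we will cut taxes and cut spending".split())
counts_b = Counter("we will protect health care and protect jobs".split())
prior = {word: 0.5 for word in set(counts_a) | set(counts_b)}  # assumed uniform prior

scores = weighted_log_odds(counts_a, counts_b, prior)
ranked = sorted(scores, key=scores.get, reverse=True)
print("most A-like:", ranked[0], "| most B-like:", ranked[-1])
```

Dividing by the estimated standard deviation is what keeps rare words from dominating the ranking: a word seen once in one sample and never in the other gets a large raw log-odds difference but also a large variance.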

#10 Sep 28: "Snippet" propagation and competition (which get at influence)

A2: Initial project proposals to be posted on the course discussion site by Wed Oct 20, 11:59pm.

Class images, links and handouts

Image source: David Malki ! Wondermark 1209: Talk and Awe

Recording (only available to enrolled students)

References

"Special Report With Brit Hume", September 10, 2008: panel-discussion transcript regarding Obama's "lipstick on a pig" utterance
Lerique, Sébastien, and Camille Roth. 2018. The Semantic Drift of Quotations in Blogspace: A Case Study in Short-Term Cultural Evolution. Cognitive Science 42(1): 188–219.
Leskovec, Jure, Lars Backstrom, and Jon Kleinberg. 2009. Meme-tracking and the dynamics of the news cycle. KDD, pp. 497--506. [project homepage]
Liben-Nowell, David, and Jon Kleinberg. 2008. Tracing information flow on a global scale using Internet chain-letter data. Proceedings of the National Academy of Sciences 105(12): 4633--4638.
Niculae, Vlad, Caroline Suen, Justine Zhang, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. 2015. QUOTUS: The Structure of Political Media Coverage as Revealed by Quoting Patterns. WWW. [paper "homepage", including talk slides] [data source]
Melumad, Shiri, Robert Meyer, and Yoon Duk Kim. 2021. The Dynamics of Distortion: How Successive Summarization Alters the Retelling of News. Journal of Marketing Research.
Prabhumoye, Shrimai, Samridhi Choudhary, Evangelia Spiliopoulou, Christopher Bogart, Carolyn Penstein Rosé, and Alan W. Black. 2017. Linguistic markers of influence in informal interactions. In the Workshop on Natural Language Processing and Computational Social Science, 53--62.
Rotabi, Rahmtin, Cristian Danescu-Niculescu-Mizil, and Jon Kleinberg. 2017. “Competition and Selection Among Conventions.” In WWW.
Simmons, Matthew P., Lada A. Adamic, and Eytan Adar. 2011. Memes online: Extracted, subtracted, injected, and recollected. ICWSM, pp. 353--360.
Tan, Chenhao, Dallas Card, and Noah A. Smith. 2017. Friendships, Rivalries, and Trysts: Characterizing Relations between Ideas in Texts. ACL (Volume 1: Long Papers), 773–83. [paper homepage, including visualization, blog post, etc.].
Tan, Chenhao, Adrien Friggeri, and Lada A. Adamic. 2016. Lost in propagation? Unfolding news cycles from the source. In ICWSM, 378-387. [paper homepage, including blog post]
Tan, Chenhao, Lillian Lee, and Bo Pang. 2014. The effect of wording on message propagation: Topic- and author-controlled natural experiments on Twitter. In ACL, pp. 175--185. [ paper "homepage"]
Tan, Chenhao, Hao Peng, and Noah A. Smith. 2018. ‘You Are No Jack Kennedy’: On Media Selection of Highlights from Presidential Debates. WWW, 945–54. [paper homepage]

#11 Sep 30: Language and communities (I)

A2 (proposals for final project) is due Wed Oct 20 11:59pm. Details forthcoming, but:
- What to submit and what is allowed will be similar to the instructions for Fall 2017. For example, a concrete feasibility test will be required.
- Posting preliminary ideas on Ed Discussions for earlier feedback is encouraged. This also facilitates grouping.
- Lecture 14 (Oct 14) will be (mandatory) group/individual appointments with me to discuss possibilities. Exact schedule TBD. OK if you haven't posted any preliminary ideas at that point, but better to have done so.

Class images, links and handouts

This is the Handmaid's Tale conversation. THAT'S the Westworld conversation
Image by Peter Sipress. Licensed from the Cartoon Bank

Recording (only available to enrolled students)
SIGDIAL 2014 slides on language adaptation, again

References

Chancellor, Stevie, Jessica Annette Pater, Trustin Clear, Eric Gilbert, and Munmun De Choudhury. 2016. #thyghgapp: Instagram Content Moderation and Lexical Variation in Pro-Eating Disorder Communities. CSCW 1201–13.
Danescu-Niculescu-Mizil, Cristian, Robert West, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. No country for old members: User lifecycle and linguistic change in online communities. WWW, pp. 307--318. Best paper award. [ACM link] [ paper "homepage" ]
Hamilton, William, Justine Zhang, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky, and Jure Leskovec. 2017. Loyalty in Online Communities. ICWSM: 540–43.
Tan, Chenhao, and Lillian Lee. 2015. All Who Wander: On the Prevalence and Characteristics of Multi-Community Engagement. WWW 1056–66. [paper homepage]
Tran, Trang, and Mari Ostendorf. 2016. Characterizing the Language of Online Communities and Its Relation to Community Reception. EMNLP, 1030–35.
Zhang, Justine, William Hamilton, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky, and Jure Leskovec. 2017. “Community Identity and User Engagement in a Multi-Community Landscape.” ICWSM: 377–86. [arxiv version has some changes]

#12 Oct 5: Language and Communities (II): "Norms"

Class images, links and handouts

example of Singlish with lots of code switching

Image credit: Renae Cheng, 10 Bizarre Things Singaporeans Do That The Rest Of The World Won't Understand, 2021.

Recording (only available to enrolled students)

References

Calvillo, Jesús, Le Fang, Jeremy Cole, and David Reitter. 2020. “Surprisal Predicts Code-Switching in Chinese-English Bilingual Text.” EMNLP, 4029–39.
Chancellor, Stevie, Andrea Hu, and Munmun De Choudhury. 2018. Norms Matter: Contrasting Social Support Around Behavior Change in Online Weight Loss Communities. CHI.
Chandrasekharan, Eshwar, Mattia Samory, Shagun Jhaver, Hunter Charvat, Amy Bruckman, Cliff Lampe, Jacob Eisenstein, and Eric Gilbert. 2018. The Internet’s Hidden Rules: An Empirical Study of Reddit Norm Violations at Micro, Meso, and Macro Scales. Proceedings of the ACM on Human-Computer Interaction 2 (CSCW): 32:1-32:25.
Chua, Huikai. 2021. Stylistic Approaches to Predicting Reddit Popularity in Diglossia. ACL-IJCNLP Student Research Workshop, 93–100.
KhudaBukhsh, Ashiqur R., Shriphani Palakodety, and Jaime G. Carbonell. 2020. Harnessing Code Switching to Transcend the Linguistic Barrier. IJCAI 4366–74.
King, Gary, Jennifer Pan, and Margaret E. Roberts. May 2013. How censorship in China allows government criticism but silences collective expression. American Political Science Review 107(02): 326-343. [paper homepage]
Nguyen, Dong, Dolf Trieschnigg, and Leonie Cornips. 2021. Audience and the Use of Minority Languages on Twitter. ICWSM 9(1), 666-669.
Shoemark, Philippa, Debnil Sur, Luke Shrimpton, Iain Murray, and Sharon Goldwater. 2017. Aye or naw, whit dae ye hink? Scottish independence and linguistic identity on social media. In EACL, 1239-1248.
Yoder, Michael, Shruti Rijhwani, Carolyn Rosé, and Lori Levin. 2017. Code-Switching as a social act: The case of Arabic Wikipedia talk pages. In the Second Workshop on NLP and Computational Social Science, 73-82.

#13 Oct 7: Conversation trajectories

Reminder: sign up for proposal discussion timeslot on the course discussion site today, if possible.
Reminder: initial proposals (A2) posting deadline on the course discussion site Wed Oct. 20 11:59pm.

Class images, links and handouts

Images: (left) photo of a description of "Some ways in which a conversation can go wrong" from Ben Schott, Schottenfreude: German Words for the Human Condition (2013). (right) photo of a page from Allie Brosh, Solutions and Other Problems (2020).

Recording (only available to enrolled students)

References

Studies of some of the issues we've seen applied specifically to software engineering (such as multiple communities, unhealthy interactions, information propagation): Prem Devanbu, Vladimir Filkov, Bogdan Vasilescu and colleagues' work, inter alia.
Bao, Jiajun, Junjie Wu, Yiming Zhang, Eshwar Chandrasekharan, and David Jurgens. 2021. Conversations Gone Alright: Quantifying and Predicting Prosocial Outcomes in Online Conversations. The Web Conference. [code]
Antoniak, Maria, David Mimno, and Karen Levy. 2019. Narrative Paths and Negotiation of Power in Birth Stories. Proceedings of the ACM on Human-Computer Interaction 3 (CSCW), Article 88. 27 pages.
Barzilay, Regina and Lillian Lee. 2004. Catching the drift: Probabilistic content models, with applications to generation and summarization. HLT-NAACL, pp. 113--120. Best paper award. [paper homepage] [author code, .tgz of Lisp] [author code for later versions] [code by Alexandre Passos, python]
Ritter, Alan, Colin Cherry, and Bill Dolan. 2010. Unsupervised Modeling of Twitter Conversations. NAACL 172–80.
Shi, Weiyan, Tiancheng Zhao, and Zhou Yu. 2019. Unsupervised Dialog Structure Learning. NAACL, 1797–1807.
Zeng, Jichuan, Jing Li, Yulan He, Cuiyun Gao, Michael Lyu, and Irwin King. 2020. What Changed Your Mind: The Roles of Dynamic Topics and Discourse in Argumentation Process. The Web Conference 2020, 1502–13. [code]
Zeng, Jichuan, Jing Li, Yulan He, Cuiyun Gao, Michael R. Lyu, and Irwin King. 2019. What You Say and How You Say It: Joint Modeling of Topics and Discourse in Microblog Conversations. Transactions of the Association for Computational Linguistics 7 (March): 267–81. [code]
Zhang, Justine, Jonathan Chang, Cristian Danescu-Niculescu-Mizil, Lucas Dixon, Yiqing Hua, Dario Taraborelli, and Nithum Thain. 2018. Conversations Gone Awry: Detecting Early Signs of Conversational Failure. ACL, 1350–61. [ACL anthology page]

Oct 12: No class — Fall Break

#14 Oct 14: Mandatory A2 (initial proposal) appointments

Official A2, A3, A4 instructions posted.

#15 Oct 19: Intention inference

Reminder: initial proposals (A2) posting deadline on the course discussion site Wed Oct. 20 11:59pm, following the A2, A3, A4 instructions.

Class images, links and handouts

Image source: Dinosaur Comics 168, by Ryan North.

Handout; recording (only available to enrolled students)
Am I the [jerk] subreddit

References

Davidov, Dmitry, Oren Tsur, and Ari Rappoport. 2010. Semi-Supervised Recognition of Sarcasm in Twitter and Amazon. Conference on Computational Natural Language Learning (CoNLL), 107–16.
Feng, Song, Ritwik Banerjee, and Yejin Choi. 2012. Syntactic Stylometry for Deception Detection. ACL (Volume 2: Short Papers), 171–75. [Semantic Scholar]
Forbes, Maxwell, Jena D. Hwang, Vered Shwartz, Maarten Sap, and Yejin Choi. 2020. Social Chemistry 101: Learning to Reason about Social and Moral Norms. EMNLP, 653–70. Online: [project homepage]
Fu, Liye, Jonathan P. Chang, and Cristian Danescu-Niculescu-Mizil. 2019. Asking the Right Question: Inferring Advice-Seeking Intentions from Personal Narratives. NAACL, 528–41.
Niculae, Vlad, Srijan Kumar, Jordan Boyd-Graber, and Cristian Danescu-Niculescu-Mizil. 2015. “Linguistic Harbingers of Betrayal: A Case Study on an Online Strategy Game.” In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 1650–59. Beijing, China: Association for Computational Linguistics. https://doi.org/10.3115/v1/P15-1159.
Zhou, Karen, Ana Smith, and Lillian Lee. 2021. Assessing Cognitive Linguistic Influences in the Assignment of Blame. Ninth International Workshop on Natural Language Processing for Social Media, 61–69.

#16 Oct 21: Joint project-proposal discussion: organization/grouping, recommended directions, etc.

recording (only available to enrolled students)

Class images, links and handouts

Liao, Jingxian, Guowei Yang, David Kavaler, Vladimir Filkov, and Prem Devanbu. 2019. Status, Identity, and Language: A Study of Issue Discussions in GitHub. PLOS ONE 14 (6): e0215059.

References

Oct 26: No class

Oct 28: No class

Reminder: submit a request to the A3 appointment-booking site (for next Tuesday the 2nd) by Friday the 29th 11:59pm, following the A2, A3, A4 instructions.
Reminder: Feasibility-check post updates (as replies) due Monday the 1st 11:59pm on the course discussion site, following the A2, A3, A4 instructions.

#17 Nov 2: Feasibility-check appointments

Reminder: post a commitment-for-the-week by Th Nov 4 11:59pm, following the A2, A3, A4 instructions.

#18 Nov 4: How different are two language models for different sources? (Part one: language models)

Reminder: post a commitment-for-the-week by 11:59pm today, following the A2, A3, A4 instructions.

#19 Nov 9: How different are two language models for different sources? (Part two: an example language-model derivation)

Reminder: results of your commitment-for-the-week due ~~Thu Nov 11~~ Mon Nov 15, following the A2, A3, A4 instructions.

Class images, links and handouts

Image source: Dorothy Gambrel, Silent Spring (Cat and Girl)

Handout; recording (only available to enrolled students); scan of what was displayed on the document camera

References

On interpolation and backoff (among other techniques): Chen, Stanley F., and Joshua Goodman. 1996. An Empirical Study of Smoothing Techniques for Language Modeling. ACL, 310–18.

#20 Nov 11: continued example of language-model development: latent information; start: functions for measuring the difference between language models

Reminder: post the results of your commitment-for-the-week by ~~Thu Nov 11~~ Mon Nov 15 11:59pm, following the A2, A3, A4 instructions.

Class images, links and handouts

Handout; recording (only available to enrolled students); scan of what was displayed on the document camera

#21 Nov 16: Distances between distributions (conclusion)

Class images, links and handouts

Handout; handout with lecture annotations; recording (available only to enrolled students)

References

Bing, Xin, Florentina Bunea, Seth Strimas-Mackey, and Marten Wegkamp. 2021. “Likelihood Estimation of Sparse Topic Distributions in Topic Models and Its Applications to Wasserstein Document Distance Calculations.” ArXiv:2107.05766 [Math, Stat], July.
Cichocki, Andrzej, and Shun-ichi Amari. 2010. “Families of Alpha- Beta- and Gamma- Divergences: Flexible and Robust Measures of Similarities.” Entropy 12 (6): 1532–68.
Goldfeld, Ziv. Fall 2019 website for ECE 6970 - Statistical Distances for Modern Machine Learning (later versions only accessible to enrolled students, I believe)
Kusner, Matt J., Yu Sun, Nicholas I. Kolkin, and Kilian Q. Weinberger. 2015. From Word Embeddings to Document Distances. ICML, 957–66.
Labeau, Matthieu, and Shay B. Cohen. 2019. Experimenting with Power Divergences for Language Modeling. EMNLP-IJCNLP, 4104–14.
Lee, Lillian. 1997. Chapter 2.3, Measures of distributional similarity, in Similarity-Based Approaches to Natural Language Processing. Ph.D. Thesis.
Lee, Lillian. 1999. Measures of Distributional Similarity. ACL, 25–32. [paper homepage]
Lin, Jianhua. 1991. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory 37(1): 145-151. [alt link]

#22 Nov 18: Introduction to discourse

See the recording for an explanation of, and due dates for, the (new) plan for the "class presentation".

Class images, links and handouts

Image source: my personal collection.

Handout; recording (only available to enrolled students)

References

Galantucci, Bruno and Gareth Roberts. 2014. Do we notice when communication goes awry? An investigation of people's sensitivity to coherence in spontaneous conversation. PLoS ONE 9(7).
Rogers, Todd and Michael I. Norton. 2011. The artful dodger: Answering the wrong question the right way. Journal of Experimental Psychology: Applied 17 (2). [alt link]

#23 Nov 23: Latent discourse structure

Reminder: progress-report/current-results "presentation" due on Ed Discussions Thursday noon. And schedule an appointment time for your group with me here: https://6742-2021-a5appts.youcanbook.me/. See lecture 22 recording for instructions.

Class images, links and handouts

One robot is looking at a flowerpot in its hands. The other robot says, 'can't we go five minutes without you checking your flower?'
Cartoon by Tom Chitty. Licensed from CartoonStock.

Handout; recording (only available to enrolled students)

References

Benotti, Luciana, and Patrick Blackburn. 2014. Conversational Implicatures. In Context in Computing: A Crossdisciplinary Approach for Modelling the Real World. Springer.
Grice, H. P. 1982. Logic and Conversation. In Speech Acts, edited by Peter Cole, 5. ed. Syntax and Semantics 3. New York: Academic Press.
Grosz, Barbara J., and Sidner, Candace L. 1986. Attention, intentions, and the structure of discourse. Computational Linguistics 12(3): 175-204.
Grosz, Barbara J., Weinstein, Scott, and Joshi, Aravind K. 1995. Centering: A framework for modeling the local coherence of discourse. Computational Linguistics 21 (June): 203-225. A theory said to account for the "wine on the table" example: structural preferences are subject > direct object > indirect object > other entities.
Walker, Marilyn A. 1996. Limited attention and discourse structure. Computational Linguistics 22(2): 255-264.

Nov 25: No class — Thanksgiving Break

#24 Nov 30: Intentions, attention, discourse structure

Reminder: progress-report/current-results "presentation" due on Ed Discussions Thursday noon. And schedule an appointment time for your group with me here: https://6742-2021-a5appts.youcanbook.me/. See lecture 22 recording for instructions.

Class images, links and handouts

Left: Garry Kasparov, Maurice Ashley, Yasser Seirawan and a bunch of soft drinks at the 1996 match against Deep Blue. Photo by Kenneth Thompson, provided at computerhistory.org
Right: Maurice Ashley and Yasser Seirawan commentating on the 1997 re-match. Photo by Monroe Newborn, provided at computerhistory.org

Handout; annotated handout; recording (only available to enrolled students)

References

Summary of and source for transcripts of live commentary on the first Kasparov/Deep Blue match
Speech Acts. 2020. Entry in the Stanford Encyclopedia of Philosophy.
Fox Tree, Jean E., and Herbert H. Clark. 1997. Pronouncing ‘the’ as ‘Thee’ to Signal Problems in Speaking. Cognition 62 (2): 151–67.
Grosz, Barbara J., and Sidner, Candace L. 1986. Attention, intentions, and the structure of discourse. Computational Linguistics 12(3): 175-204.
Intro to RST (Rhetorical Structure Theory). Webpage created by Bill Mann and maintained by Maite Taboada
Mann, William C. and Sandra A. Thompson. 1988. Rhetorical structure theory: Toward a functional theory of text organization. Text: Interdisciplinary Journal for the Study of Discourse 8(3): 243-281. [link at Mann/Taboada's site]
Moore, Johanna D., and Martha E. Pollack. 1992. A Problem for RST: The Need for Multi-Level Discourse Analysis. Computational Linguistics 18 (4): 537–44.

#25 Dec 2: (Mandatory) give-in-class-feedback-on-Ed-Discussions session

Reminder: progress-report "presentation" due on Ed Discussions today at noon. And schedule an appointment time for your group with me here: https://6742-2021-a5appts.youcanbook.me/. See lecture 22 recording for instructions.
Instructions posted for the final writeup, due on CMS Thu Dec. 16, 7pm (date determined by the registrar).
Course grade factors have now been set as shown on CMS: A1 = 30%; A1R = 4%; A2 = 30%; A3 = 5%; A4 = 5%; A5 = 5%; final writeup = 21%.

#26 Dec 7: (Mandatory) appointments with me (each group makes one)

Reminder: final project writeup due Thu Dec. 16, 7pm

Thu Dec. 16, 7pm: final project writeup due (date determined by the registrar)

Code for generating the calendar formatting adapted from Andrew Myers. Portions of the content of this website and course were created by collaboration between Cristian Danescu-Niculescu-Mizil and Lillian Lee over multiple runnings of this course.
