Natural Language Processing and Social Interaction, Fall 2015
This page last modified Tue November 24, 2015 9:55 AM.
Prerequisites, course selection, enrollment
Prerequisites: All of the following: CS 2110 or equivalent programming experience; a course in artificial intelligence or any relevant subfield (e.g., NLP, information retrieval, machine learning); proficiency with using machine learning tools (e.g., fluency in training an SVM, knowledge of how to assess a classifier’s performance using cross-validation).
Enrollment: CS/IS PhD students may enroll online. Other students interested in adding the course: (wel)come to the first day of class. Enrollment questions will be addressed then, when we have a better sense of the demand and of how many CS/IS PhD students are interested in taking the class.
Choosing among NLP courses: How do I know which one is right for me?
In 2015-2016, we are blessed with a plethora of NLP-related offerings!
At the graduate level:
- If you are interested in extracting information and meaning from text through machine learning techniques, then consider taking CS6740/IS6300, Advanced Language Technologies (offered Spring 2016; to get a feel for what the course will be like, see the Fall 2012 offering).
- If you are interested in studying formal representation of language meaning, and designing algorithms to learn to map sentences to such representations, then consider taking CS6741, Structured Prediction for Natural Language Processing (offered Fall 2015).
- If you are interested in exploring the social aspects of language and its role in online interactions, then consider taking CS6742, Natural Language Processing and Social Interaction (offered Fall 2015; to get a feel for what the course will be like, see the Fall 2014 offering).
- All three courses fulfill the same CS graduate course requirements. If you are truly passionate about NLP research, we would love to see you in all of these courses!
For undergraduate courses on offer, consult the Cornell NLP course list.
For more information before classes begin: The webpage of the previous offering (Fall 2014) of this course gives a general idea of what the course will be like.
Administrative info and overall course structure
Course homepage: http://www.cs.cornell.edu/courses/cs6742/2015fa. Main site for course info, assignments, readings, lecture references, etc.; updated frequently.
CMS page: http://cms.csuglab.cornell.edu. Site for submitting assignments, unless otherwise noted.
Piazza page: http://piazza.com/cornell/Fall2015/cs6742. Course announcements and Q&A/discussion site. Social interaction and all that, you know.
Contacting the instructors
- Email: please send a single email with both instructors' email addresses in the To: line. This ensures that the entire course staff knows what's going on and can contribute to answering student inquiries.
- Office hours and other contact info: see Prof. Cristian Danescu-Niculescu-Mizil's homepage or Prof. Lillian Lee's homepage
Overview of course schedule. Details subject to change. Full schedule is maintained on the main course webpage.
Lecture | Agenda | Pedagogical purpose | Assignments |
---|---|---|---|
#1 | Course overview | | A1 released: pilot empirical study for a research idea based on the given readings. |
#2 - #4 | Lecture topics related to the A1 readings: online reviews (individual expression, community dynamics); online asynchronous conversations. | Case studies to explore some topics and research styles we find interesting. Get-to-know-you exercises to help everyone become familiar and comfortable with each other. | |
Next 6 meetings, not counting presentations or discussions | Lectures on, potentially, linguistic coordination, linguistic adaptation, influence, persuasion, diffusion, discourse structure, advanced language modeling | Foundational material | Potentially some assignments based on the lectures. |
Next large block of meetings | Discussion of proposed projects based on the readings | Practice with fast research-idea generation. Feedback as to which proposals are most interesting, most feasible, etc. | Discussion of student project proposals, based on the readings for that class meeting. Each class meeting thus involves everyone reading at least one of the two assigned papers and posting a new research proposal based on the reading to Piazza. Thoughtfulness and creativity are most important to us, but take feasibility into account. |
Remainder of the course | Activities related to course projects | Development of a "full-blown" research project (although time restrictions may limit ambitions). For our purposes, "interesting" is more important than "thorough". | |
Some time in December (to be determined by the registrar) | | | Final project writeup due. |
Grading: Of most interest to us are productive research-oriented discussion participation (in class and on Piazza), interesting research proposals and pilot studies, and a good-faith final research project.
Academic Integrity: Academic and scientific integrity compels one to properly attribute to others any work, ideas, or phrasing that one did not create oneself. To do otherwise is fraud.
We emphasize certain points here. In this class, talking to and helping others is strongly encouraged. You may also, with attribution, use code from other sources. The easiest rule of thumb is: acknowledge the work and contributions and ideas and words and wordings of others. Do not copy or slightly reword portions of papers, Wikipedia articles, textbooks, other students' work, Stack Overflow answers, something you heard in a talk or a conversation or saw on the Internet, or anything else, really, without acknowledging your sources. See http://www.cs.cornell.edu/courses/cs6742/2011sp/handouts/ack-others.pdf and http://www.theuniversityfaculty.cornell.edu/AcadInteg/ for more information and useful examples.
This is not to say that you can receive course credit for work that is not your own (e.g., taking someone else's report and putting your name at the top, next to the other person(s)' names). However, violations of academic integrity (e.g., fraud) undergo the academic-integrity hearing process on top of any grade penalties imposed, whereas not following the rules of the assignment only risks grade penalties.
Resources
- Webpage of the Fall 2014 offering of this course
- ACL anthology of all conferences, journals and workshops published under the aegis of the Association for Computational Linguistics; ACM digital library proceedings publication archive for WWW; AAAI proceedings archive for ICWSM
- ACL wiki of resources - corpora, datasets, tools, software, lexicons (organized by language)
- Toolkits: CMU twitter tools (Java) :: GATE (Java) :: Illinois tools (Java?) :: Lingpipe (Java) :: Mallet (Java) :: OpenNLP (Java) :: NLTK (Python) :: Stanford tools (Java) :: tm (R)
- Resources for specific features: see the features roundup lecture.
- NLP at Cornell
#1 Aug 25: Course overview: scope, course goals, course design
Class images, links and handouts
- Handout
- Inspirational image: The_School_of_Athens
- An Honest Facebook Political Argument: hypothetical comment thread with re-entry
- An annotated Wikipedian "Article for Deletion" vote page (click the "1" speech balloons to see comments)
- notabilia.net visualization of vote dynamics on selected AfD discussions
- Politeness web app and a particular instance of it in action
Datasets
References
- Althoff, Tim, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky. 2014. How to ask for a favor: A case study on the success of altruistic requests. ICWSM, pp. 12–21.
- Backstrom, Lars, Jon Kleinberg, Lillian Lee, and Cristian Danescu-Niculescu-Mizil. 2013. Characterizing and curating conversation threads: Expansion, focus, volume, re-entry. WSDM, pp. 13–22.
- Brown, Penelope and Stephen C. Levinson. 1987. Politeness: Some Universals in Language Usage. Reissued with new introduction by Cambridge University Press
- Bryan, Christopher J., Gregory M. Walton, Todd Rogers, and Carol S. Dweck. 2011. Motivating voter turnout by invoking the self. Proceedings of the National Academy of Sciences 108 (31): 12653-12656.
- Chong, Dennis and James N. Druckman. 2007. Framing theory. Annual Review of Political Science 10:103–26.
- Danescu-Niculescu-Mizil, Cristian, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, Christopher Potts. 2013. A computational approach to politeness with application to social factors. ACL, pp. 250–259.
- Taraborelli, Dario and Giovanni Luca Ciampaglia. Beyond notability. Collective deliberation on content inclusion in Wikipedia. Second international workshop on quality in techno-social systems, pp. 122-125. [alt link]
#2 Aug 27: Reviewing: a social experience?
Class images, links and handouts
Image source: Dorothy Gambrel, Cat and Girl. Permission policy here.
- Reviews for Surviving your Stupid, Stupid Decision to Go to Grad School, annotated for intended audience.
- Example of a helpful review
Lecture references
- Danescu-Niculescu-Mizil, Cristian, Robert West, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. No country for old members: User lifecycle and linguistic change in online communities. WWW, pp. 307-318. Best paper award.
- Gilbert, Eric and Karrie Karahalios. 2010. Understanding deja reviewers. CSCW, pp. 225-228. [alt link]
- Michael, Loizos and Jahna Otterbacher. 2014. Write like I write: Herding in the language of online reviews. ICWSM.
- Pinch, Trevor and Filip Kesler. 2011. How Aunt Ammy gets her free lunch: A study of the top-thousand customer reviewers at Amazon.com.
#3 Sep 1: Review helpfulness
Class images, links and handouts
Image source: http://xkcd.com/810/
- List of potential review features from Otterbacher 2009 and Ghose and Ipeirotis 2011.
- Slides on Danescu-Niculescu-Mizil, Kossinets, Kleinberg and Lee, WWW '09
References on lecture topics
- Danescu-Niculescu-Mizil, Cristian, Gueorgi Kossinets, Jon Kleinberg, and Lillian Lee. 2009. How opinions are received by online communities: A case study on Amazon.com helpfulness votes. WWW, pp. 141-150.
- Ghose, Anindya and Panagiotis Ipeirotis. 2011. Estimating the helpfulness and economic impact of product reviews: Mining text and reviewer characteristics. IEEE Transactions on Knowledge and Data Engineering 23(10): 1498-1512.
- Liu, Jingjing, Yunbao Cao, Chin-Yew Lin, Yalou Huang, and Ming Zhou. 2007. Low-quality product review detection in opinion summarization. In Proceedings of EMNLP, pp. 334-342.
- McAuley, Julian and Jure Leskovec. 2013. From amateurs to connoisseurs: Modeling the evolution of user expertise through online reviews. WWW.
- Otterbacher, Jahna. 2009. 'Helpfulness' in online communities: a measure of message quality. CHI, 955-964.
- Sipos, Ruben, Arpita Ghosh, and Thorsten Joachims. 2014. Was this review helpful to you? It depends! Context and voting patterns in online content. WWW.
#4 Sep 3: From monologues to conversations
Class images, links and handouts
Image: photo of a page from Ben Schott, Schottenfreude: German Words for the Human Condition (2013)
- Some ways in which a conversation can go wrong: photo of part of a page in Schottenfreude.
- Dialectic vs eristic (Blount, Millard, Weal 2014, from the 14th workshop on Computational Models of Natural Argument)
- Sample conversations:
- Slack. Image from Fortune.com
- Slashdot (useful to look at in conjunction with Wikipedia explanation of Slashdot moderation), with a txt version from the UBC BC3 Blog corpus.
- Reddit. Online thread visualizer by Kawandeep Virdee.
Datasets
- UBC BC3 Blog Corpus: 7000 blog conversations with user-labeled comments from 6 popular websites (Slashdot, Macrumors, AndroidCentral, Dailykos, BusinessInsider, TSN). Slashdot includes "Funny" tags.
- CORPS: corpus of political speeches tagged with specific audience reactions, such as APPLAUSE or LAUGHTER.
- Supreme Court Dialog Corpus: a collection of conversations from the U.S. Supreme Court Oral Arguments (http://www.supremecourt.gov/oral_arguments/) with metadata. Includes "laughter".
- Yelp academic dataset. Includes "funny", "cool", "useful" tags, although not many of them.
- HCRC Map Task Corpus: 128 dialogues recorded, transcribed, and annotated for a wide range of behaviours. It references other related corpora:
The DCIEM Map Task Corpus uses very similar materials to the HCRC Map Task Corpus, but with a different structure designed to test the effects of sleep deprivation under a number of pharmaceutical conditions. The subjects were Canadian army reservists. The Map Task Corpus has been replicated in whole or in part in a number of languages including Dutch, Italian, Japanese, Swedish, Occitan, and Portuguese. It has also been replicated in part for other forms of English besides the original Glaswegian speakers, including American English, Australian English, and some urban British dialects. The Occitan site has a list of some other language replications. The Map Task has been used to test the effects of many conditions on human communication, including stuttering, computer mediation, textual communication, and the use of avatars.
References
- Brake, David Russell. 2012. Who do they think they're talking to? Framings of the audience by social media users. International Journal of Communication 6:1056-1076.
- Fay, Nicolas, Simon Garrod, and Jean Carletta. 2000. Group discussion as interactive dialogue or as serial monologue: The influence of group size. Psychological Science 11(6): 481-486. [alt link]
- Gonzalez-Bailon, Sandra, Andreas Kaltenbrunner, and Rafael E Banchs. 2010. The structure of political discussion networks: A model for the analysis of online deliberation. Journal of Information Technology 25(2): 230-243. [alt link]
- Lejeune, Philippe. 2009. On Diary. Ed. Jeremy D. Popkin and Julie Rak. University of Hawaii Press.
- Litt, Eden. September 2012. Knock, knock. Who's there? The imagined audience. Journal of Broadcasting & Electronic Media 56(3): 330–345.
- Marwick, Alice E. and danah boyd. 2010. I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience. New Media & Society 13(1): 114-133. [alt link]
- Pickering, Martin J. and Simon Garrod. 2004. Toward a mechanistic psychology of dialogue. Behavioral and Brain Sciences 27(02): 169-190. [alt link]
#5 Sep 8: Discourse "rules" and discourse structure
Class images, links and handouts
- Gricean maxim of quantity, “be exactly as informative as is required”: Google-plus post; xkcd comic (as pointed out by Mark Liberman on Language Log)
- Examples handout
References
- Galantucci, Bruno and Gareth Roberts. 2014. Do we notice when communication goes awry? An investigation of people's sensitivity to coherence in spontaneous conversation. PLoS ONE 9(7).
- Grice, H.P. 1975. Logic and Conversation. In Cole et al., Syntax and Semantics 3: Speech Acts. and 1978.
- Grosz, Barbara J., Scott Weinstein, and Aravind K. Joshi. 1995. Centering: a framework for modeling the local coherence of discourse. Computational Linguistics 21 (June): 203-225.
- We didn't talk about this in class, but centering theory is meant to account for phenomena like the focus of example 2 being the wine rather than the table.
- Section 24.1.5 of Jurafsky, Daniel and James H. Martin. 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd edition. [chapter link at UCSC]
- Rogers, Todd and Michael I. Norton. 2011. The artful dodger: Answering the wrong question the right way. Journal of Experimental Psychology: Applied 17 (2). [alt link]
#6 Sep 10: Intentions and the Grosz/Sidner discourse-structure theory
- A2 - annotation/discourse assignment
Class images, links and handouts
- Pinker, Steven and the Royal Society for the Encouragement of Arts, Manufactures and Commerce (RSA) Animate, posted to YouTube on Feb 10, 2011. Language as a Window into Human Nature
- handout
References
- Clark, Herbert H. 2004. Pragmatics of language performance. In L. R. Horn & G. Ward (Eds.), Handbook of pragmatics. Oxford: Blackwell, pp. 365-382. [alt link]
- Grosz, Barbara J., and Sidner, Candace L. 1986. Attention, intentions, and the structure of discourse. Computational Linguistics 12(3): 175-204. [alt link]
- Section 24.7.2 of Jurafsky, Daniel and James H. Martin. 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd edition. [chapter link at UCSC]
#7 Sep 15: A1 presentations, part one
#8 Sep 17: A1 presentations, part two
- A1.R - A1 Reflection assignment
#9 Sep 22: Grosz/Sidner annotation exercise discussion
- Garley12Beefmoves-Mitra14Kickstarter - proposal based on lecture-11 readings
References
- Grosz, Barbara J. and Peter C. Gordon. 1999. Conceptions of limited attention and discourse focus. Computational Linguistics 25(4). [alt link]
- Walker, Marilyn A. 1996. Limited attention and discourse structure. Computational Linguistics 22(2): 255-264. [alt link]
#10 Sep 24: Case study: Linguistic coordination
Class images, links and handouts
References
- Danescu-Niculescu-Mizil, Cristian, Michael Gamon, and Susan Dumais. 2011. Mark my words! Linguistic style accommodation in social media. In Proceedings of WWW.
- Danescu-Niculescu-Mizil, Cristian, Lillian Lee, Bo Pang, and Jon Kleinberg. 2012. Echoes of power: Language effects and power differences in social interaction. In Proceedings of WWW.
- Danescu-Niculescu-Mizil, Cristian and Lillian Lee. 2011. Chameleons in imagined conversations: A new approach to understanding coordination of linguistic style in dialogs. Proceedings of the ACL Workshop on Cognitive Modeling and Computational Linguistics.
- Levelt, Willem J M and Stephanie Kelter. 1982. Surface form and memory in question answering. Cognitive Psychology 14(1): 78-106.
- Giles, Howard, Justine Coupland, and Nikolas Coupland. 1991. Accommodation theory: Communication, context, and consequence. In Contexts of Accommodation: Developments in Applied Sociolinguistics. Cambridge University Press.
- Gonzales, Amy L., Jeffrey T. Hancock, and James W. Pennebaker. 2010. Language style matching as a predictor of social dynamics in small groups. Communication Research 37(1): 3-19.
- Feng, S, R Banerjee, and Y Choi. 2012. Characterizing stylistic elements in syntactic structure. Proceedings of EMNLP.
- Bramsen, Philip, Martha Escobar-Molana, Ami Patel, and Rafael Alonso. 2011. Extracting social power relationships from natural language. Proceedings of ACL HLT.
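The linguistic-coordination measure discussed in these references compares how often b uses a class of function words (a "marker") right after a has used it with how often b uses it at all. A simplified sketch, loosely following the definition in Danescu-Niculescu-Mizil, Lee, Pang, and Kleinberg (2012): coordination of b toward a on marker m = P(b's reply exhibits m | a's utterance exhibits m) - P(b's reply exhibits m). The whitespace tokenization, the marker class given as a plain word set, and the single pooled list of (a_utterance, b_reply) exchanges are all assumptions for illustration, not the paper's exact aggregation or estimation choices.

```python
from typing import Iterable, Set, Tuple


def exhibits(utterance: str, marker_words: Set[str]) -> bool:
    """True if any lowercased whitespace token of the utterance is in the marker class."""
    return any(tok in marker_words for tok in utterance.lower().split())


def coordination(exchanges: Iterable[Tuple[str, str]], marker_words: Set[str]) -> float:
    """Coordination of the replier b toward the initiator a on one marker class.

    Each exchange is (a_utterance, b_reply). Returns
    P(b's reply exhibits m | a's utterance exhibits m) - P(b's reply exhibits m).
    Simplified sketch: one pooled set of exchanges, no smoothing.
    """
    pairs = [(exhibits(a, marker_words), exhibits(b, marker_words)) for a, b in exchanges]
    if not pairs:
        raise ValueError("need at least one exchange")
    # Baseline: how often b's replies use the marker class at all.
    baseline = sum(1 for _, b_m in pairs if b_m) / len(pairs)
    # Conditional: restrict to replies whose preceding utterance used the marker class.
    triggered = [b_m for a_m, b_m in pairs if a_m]
    if not triggered:
        raise ValueError("no exchange in which a used the marker class")
    conditional = sum(triggered) / len(triggered)
    return conditional - baseline
```

For example, if b uses an article in 1 of the 2 replies that follow an article-using utterance but in only 1 of 4 replies overall, the measure is 0.5 - 0.25 = 0.25; positive values mean b echoes a's use of the marker more than b's own baseline would predict.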
#11 Sep 29: Garley12Beefmoves-Mitra14Kickstarter
- Nguyen10Bias-Vasilescu14R - proposal based on lecture-13 readings
Class images, links and handouts
- Facebook pages: Change.org (two variants for the Zadroga act: one, two), https://www.facebook.com/humansofnewyork, Orcs of New York
- StackOverflow, SegmentFault; Bitcointalk.org
The readings
- Garley, Matt and Julia Hockenmaier. 2012. Beefmoves: Dissemination, diversity, and dynamics of English borrowings in a German hip hop forum. ACL.
- Mitra, Tanushree and Eric Gilbert. 2014. The language that gets people to give: Phrases that predict success on kickstarter. CSCW. [alt link]
References
- Backstrom, Lars, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. 2006. Group formation in large social networks: Membership, growth, and evolution. In KDD, 44-54. [alt link]
- Danescu-Niculescu-Mizil, Cristian, Justin Cheng, Jon Kleinberg, and Lillian Lee. 2012. You had me at hello: How phrasing affects memorability. ACL, pp. 892--901.
- Danescu-Niculescu-Mizil, Cristian, Robert West, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. No country for old members: User lifecycle and linguistic change in online communities. WWW, pp. 307-318. [alt link] Best paper award.
- Garley, Matthew E. 2012. Crossing the lexicon: Anglicisms in the German hip hop community. Ph.D. thesis, UIUC.
- Ghose, Anindya and Panagiotis Ipeirotis. 2010. Estimating the helpfulness and economic impact of product reviews: Mining text and reviewer characteristics. IEEE Transactions on Knowledge and Data Engineering [alt link]
- Guerini, Marco, Gözde Ozbal, and Carlo Strapparava. 2015. Echoes of persuasion: The effect of euphony in persuasive communication. NAACL, pp. 1483--1493.
- Jaech, Aaron, Vicky Zayats, Hao Fang, Mari Ostendorf, and Hannaneh Hajishirzi. 2015. Talking to the crowd: What do people react to in online discussions? EMNLP, pp. 2026--2031.
- Lakkaraju, Himabindu, Julian McAuley, and Jure Leskovec. 2013. What's in a name? Understanding the interplay between titles, content, and communities in social media. ICWSM.
- Pierrehumbert, Janet B. 2012. The dynamic lexicon. Handbook of Laboratory Phonology. A. Cohen, M. Huffman, and C. Fougeron (eds). Oxford University Press.
- Tan, Chenhao, Lillian Lee, and Bo Pang. 2014. The effect of wording on message propagation: Topic- and author-controlled natural experiments on Twitter. ACL.
- Wang, William Yang and Wen, Miaomiao. 2015. I Can Has Cheezburger? A Nonparanormal Approach to Combining Textual and Visual Information for Predicting and Generating Popular Meme Descriptions. NAACL, pp. 355--365
#12 Oct 1: Memes and relations
Class images, links and handouts
- MemeTracker visualization, including Variants of the "Lipstick on a pig" quote
- QUOTUS visualization: how media outlets quote the President.
The readings
- Krishnan, Vinodh and Jacob Eisenstein. 2015. "You're Mr. Lebowski, I'm the Dude": Inducing address term formality in signed social networks. NAACL, pp. 1616--1626.
- Simmons, Matthew P., Lada A. Adamic, and Eytan Adar. 2011. Memes online: Extracted, subtracted, injected, and recollected. Proceedings of ICWSM, pp. 353--360.
References
- Chapter on Structural Balance from Easley, David and Jon Kleinberg. 2010. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press.
- West, Robert, Hristo S Paskov, Jure Leskovec, and Christopher Potts. Exploiting social network structure for person-to-person sentiment analysis. TACL.
- Elson, David K, Nicholas Dames, and Kathleen R McKeown. 2010. Extracting social networks from literary fiction. ACL, 138-147.
- He, Hua, Denilson Barbosa, and Grzegorz Kondrak. 2013. Identification of speakers in novels. ACL
- Coscia, Michele. September 2014. Average is boring: How similarity kills a meme's success. Scientific Reports.
- Leskovec, Jure, Lars Backstrom, and Jon Kleinberg. 2009. Meme-tracking and the dynamics of the news cycle. KDD, pp. 497--506.
- Niculae, Vlad, Caroline Suen, Justine Zhang, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. 2015. QUOTUS: The structure of political media coverage as revealed by quoting patterns. WWW, 798-808.
- Weng, Lilian, Filippo Menczer, and Yong-Yeol Ahn. 2014. Predicting successful memes using network and community structure. In ICWSM
- Schneider, Nathan, Rebecca Hwa, Philip Gianfortoni, Dipanjan Das, Michael Heilman, Alan W. Black, Frederick L. Crabbe, and Noah A. Smith. 2010. Visualizing Topical Quotations Over Time to Understand News Discourse. CMU-LTI-01-103, CMU.
- Shaparenko, Benyah and Thorsten Joachims. 2007. Information genealogy: Uncovering the flow of ideas in non-hyperlinked document databases. KDD, 619-628. [alt link]
#13 Oct 6: Nguyen10Bias-Vasilescu14R
Class images, links and handouts
The readings
- Nguyen, Dong, Elijah Mayfield, and Carolyn P Rosé. 2010. An analysis of perspectives in interactive settings. In Proceedings of the First Workshop on Social Media Analytics, 44-52. [alt link]
- Vasilescu, Bogdan, Alexander Serebrenik, Prem Devanbu, and Vladimir Filkov. 2014. How social Q&A sites are changing knowledge sharing in open source software communities. CSCW, pp. 342--354. [alt link]
References
- Card, Dallas, Amber E Boydstun, Justin H Gross, Philip Resnik, and Noah A Smith. 2015. The media frames corpus: Annotations of frames across issues. Proceedings of ACL
- Iyyer, Mohit, Peter Enns, Jordan Boyd-Graber, and Philip Resnik. Political ideology detection using recursive neural networks. Proceedings of ACL.
- Bakshy, Eytan, Solomon Messing, and Lada Adamic. Exposure to ideologically diverse news and opinion on facebook. Science
- Recasens, Marta, Cristian Danescu-Niculescu-Mizil, and Dan Jurafsky. 2013. Linguistic models for analyzing and detecting biased language. Proceedings of ACL
- Bakshy, Eytan, Itamar Rosenn, Cameron Marlow, and Lada Adamic. 2012. The role of social networks in information diffusion. Proceedings of WWW
- Danescu-Niculescu-Mizil, Cristian, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. A computational approach to politeness with application to social factors. Proceedings of ACL
- Tan, Chenhao and Lillian Lee. 2015. All Who Wander: On the Prevalence and Characteristics of Multi-community Engagement. WWW. [alt link]
- Anderson, Ashton, Daniel Huttenlocher, Jon Kleinberg, and Jure Leskovec. 2013. Steering user behavior with badges. Proceedings of WWW.
#14 Oct 8: Lecture title
The readings
- Sudhof, Moritz, Andrés Gómez Emilsson, Andrew L Maas, and Christopher Potts. Sentiment expression conditioned by affective transitions and social forces. In Proceedings of KDD
- Wang, Lu and Claire Cardie. A piece of my mind: A sentiment analysis approach for online dispute detection. In Proceedings of the ACL, pp. 693--699.
References
- Backstrom, Lars and Jon Kleinberg. 2014. Romantic partnerships and the dispersion of social ties: A network analysis of relationship status on facebook. In Proceedings of CSCW, 831-841.
- Niculae, Vlad, Srijan Kumar, Jordan Boyd-Graber, and Cristian Danescu-Niculescu-Mizil. 2015. Linguistic harbingers of betrayal: A case study on an online strategy game. In Proceedings of ACL
- Kittur, Aniket and Robert E Kraut. 2008. Harnessing the wisdom of crowds in wikipedia: Quality through coordination. In Proceedings of CSCW, pp.37--46.
- Lam, Shyong K, Jawed Karim, and John Riedl. 2010. The effects of group composition on decision quality in a social production community. In Proceedings of the 16th ACM International Conference on Supporting Group Work, 55-64.
Oct 13: Fall Break
#15 Oct 15: (Optional) Proposal consultation appointments
If you did not make an appointment for today, you do not need to come to class.
#16 Oct 20: (Mandatory) Feasibility presentations to the instructors
You should only come to class for your appointment slot, not for the whole class-meeting time.
#17 Oct 22: (Mandatory) Feasibility presentations to the instructors
You should only come to class for your appointment slot, not for the whole class-meeting time.
#18 Oct 27: Bayesian identification of features distinguishing two sub-languages
- CMS 2-minute “quiz” on LM background, due midnight on Tuesday the 27th: we want to get a feeling for what everyone's background in language modeling is. Your answers are purely for our lecture planning (as you will see when you look at the possible answers).
Class images, links and handouts
Image source: http://www.keepcalm-o-matic.co.uk/p/keep-calm-and-never-tell-me-the-odds-6/.
- Liang, Percy and Dan Klein. 2007. Structured Bayesian nonparametric models with variational inference.
- Monroe, Burt L., Michael P. Colaresi, and Kevin M. Quinn. 2008. Fightin' words: Lexical feature selection and evaluation for identifying the content of political conflict. Political Analysis 16(4): 372-403. [alt link]
- The argument regarding variance for counts and normalized counts, and the use of a (different) probabilistic model to account for it, also appears in Kleinberg, Jon. 2004. Temporal dynamics of on-line information streams. In Data Stream Management: Processing High-speed Data Streams, to appear.
- Implementations:
- Hessel, Jack. FightingWords.
- Marzagão, Thiago. mcq.py
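The central quantity in Monroe, Colaresi, and Quinn (2008) is a log-odds ratio smoothed with an informative Dirichlet prior and divided by its approximate standard deviation; the paper motivates this by noting that raw frequency differences are dominated by frequent words and unsmoothed log-odds by rare ones. Below is a minimal sketch written from the paper's formulas; it is not the API of either implementation listed above. The prior pseudo-counts (for example, counts from a background corpus) are an input the caller must supply, and every pseudo-count is assumed to be positive.

```python
import math
from collections import Counter
from typing import Dict


def log_odds_dirichlet(counts_i: Counter, counts_j: Counter, prior: Counter) -> Dict[str, float]:
    """z-scored log-odds ratio with an informative Dirichlet prior (after Monroe et al. 2008).

    counts_i, counts_j: word counts in the two sub-corpora being compared.
    prior: positive pseudo-counts alpha_w; words absent from the prior are skipped.
    Returns word -> z-score; positive means more associated with corpus i.
    """
    n_i = sum(counts_i.values())
    n_j = sum(counts_j.values())
    alpha_0 = sum(prior.values())
    z = {}
    for w, a_w in prior.items():
        y_i = counts_i.get(w, 0)
        y_j = counts_j.get(w, 0)
        # Smoothed log-odds of w within each corpus.
        lo_i = math.log((y_i + a_w) / (n_i + alpha_0 - y_i - a_w))
        lo_j = math.log((y_j + a_w) / (n_j + alpha_0 - y_j - a_w))
        delta = lo_i - lo_j
        # Approximate variance of delta: 1/(y_i + a_w) + 1/(y_j + a_w).
        var = 1.0 / (y_i + a_w) + 1.0 / (y_j + a_w)
        z[w] = delta / math.sqrt(var)
    return z
```

Sorting the returned dictionary by value lists the most distinctive words for each side: large positive scores for the first sub-corpus, large negative scores for the second.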
References
- Church, Kenneth W. 2000. Empirical estimates of adaptation: The chance of two Noriegas is closer to p/2 than p². COLING.
- Eisenstein, Jacob. 2013. What to do about bad language on the internet. NAACL-HLT, 359-369
- Fredette, Marc and Jean-François Angers. 2002. A new approximation of the posterior distribution of the log-odds ratio. Statistica Neerlandica 56(3): 314-329. [alt link, alt link]
- Liberman, Mark. The most Trumpish (and Bushish) words, 2015. Obama's favored (and disfavored) SOTU words, 2014. Draft words (descriptions of white vs black NFL prospects), 2014. Male and female word usage, 2014.
- FAQ: How do I interpret odds ratios in logistic regression? Introduction to SAS. UCLA: Statistical Consulting Group.
#19 Oct 29: N-Gram Language Models
Class images, links and handouts
- Quote memorability quiz
- MIT Language Modeling Toolkit
- SRILM - The SRI Language Modeling Toolkit
- N-gram language models in Python
- Mention of the bug in NLTK (the point being that language modeling can actually be quite subtle)
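As a reference point for the smoothing discussion in this lecture's references, here is a sketch of the simplest baseline: a bigram model with add-k (Lidstone) smoothing, where k = 1 gives add-one (Laplace). Gale and Church argue that add-one is a poor choice in practice, and Chen and Goodman compare much better methods. Perplexity is computed as exp of the negative average log-probability per predicted token. The sentence-boundary symbols <s> and </s> are assumptions of this sketch, and it has no <unk> handling, which a real model needs.

```python
import math
from collections import Counter
from typing import List, Sequence

BOS, EOS = "<s>", "</s>"


class AddKBigramLM:
    """Bigram language model with add-k smoothing; k=1 is add-one (Laplace)."""

    def __init__(self, sentences: Sequence[List[str]], k: float = 1.0):
        self.k = k
        self.history_counts: Counter = Counter()  # c(prev), counted only as a history
        self.bigram_counts: Counter = Counter()   # c(prev, cur)
        vocab = {EOS}  # possible next tokens; BOS is never predicted
        for sent in sentences:
            toks = [BOS] + sent + [EOS]
            vocab.update(sent)
            for prev, cur in zip(toks, toks[1:]):
                self.history_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1
        self.vocab_size = len(vocab)

    def prob(self, prev: str, cur: str) -> float:
        """P(cur | prev) = (c(prev, cur) + k) / (c(prev) + k * |V|)."""
        return (self.bigram_counts[(prev, cur)] + self.k) / (
            self.history_counts[prev] + self.k * self.vocab_size
        )

    def perplexity(self, sentences: Sequence[List[str]]) -> float:
        """exp(-(1/N) * sum log P), with N the number of predicted tokens (incl. </s>)."""
        log_prob, n = 0.0, 0
        for sent in sentences:
            toks = [BOS] + sent + [EOS]
            for prev, cur in zip(toks, toks[1:]):
                log_prob += math.log(self.prob(prev, cur))
                n += 1
        return math.exp(-log_prob / n)
```

Because every history token is followed by exactly one in-vocabulary token, the smoothed probabilities for a given history sum to 1 over the vocabulary, which is an easy sanity check to run on any smoothing implementation.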
Lecture references
- Chen, Stanley F. and Joshua Goodman. 1996. An empirical study of smoothing techniques for language modeling. In Proceedings of the 34th Annual Meeting on Association for Computational Linguistics, 310-318. More detailed technical report version (recommended)
- Danescu-Niculescu-Mizil, Cristian, Justin Cheng, Jon Kleinberg, and Lillian Lee. 2012. You had me at hello: How phrasing affects memorability. Proceedings of the ACL, pp.892--901.
- Jelinek, F., R.L. Mercer, and S. Roukos. 1991. Principles of lexical language modeling for speech recognition. In S. Furui and J. Sondhi (eds.), Advances in Speech Signal Processing, pp. 651-700. New York, NY: M. Dekker Publishers.
- Gale, William A. and Kenneth W. Church. 1994. What's wrong with adding one. Corpus-based Research Into Language: In Honour of Jan Aarts, pp. 189--200.
#20 Nov 3: Entropy and Divergence
Class images, links and handouts
Lecture references
- Tan, Chenhao, Lillian Lee, and Bo Pang. 2014. The effect of wording on message propagation: Topic-and author-controlled natural experiments on twitter. In ACL
- Stark, Anthony, Izhak Shafran, and Jeffrey Kaye. 2012. Hello, who is calling?: Can words reveal the social nature of conversations? Proceedings of NAACL
- Genzel, Dmitriy and Eugene Charniak. 2002. Entropy rate constancy in text. ACL, pp.199--206.
- Doyle, Gabriel and Michael C Frank. 2015. Audience size and contextual effects on information density in twitter conversations. Proceedings of the ACL Workshop on Cognitive Modeling and Computational Linguistics (CMCL), pp.19--28.
- Juola, Patrick and Harald R. Baayen. A controlled-corpus experiment in authorship identification by cross-entropy. Literary and Linguistic Computing 20 (Suppl 1): 59-67.
- Purohit, Hemant, Yiye Ruan, David Fuhry, Srinivasan Parthasarathy and Amit Sheth. 2014. On understanding the divergence of online social group discussion. Proceedings of ICWSM
- Danescu-Niculescu-Mizil, Cristian, Robert West, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. No country for old members: User lifecycle and linguistic change in online communities. WWW, pp.307--318. Best paper award.
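Several of these references compare a text against a language model via cross-entropy (Juola and Baayen, for instance, use it for authorship identification). A small sketch of the basic quantities, in bits, for smoothed unigram distributions over a fixed vocabulary: entropy H(P), per-token cross-entropy of a token sequence under a model Q, and KL divergence D(P || Q). The add-alpha smoothing and the shared fixed vocabulary are assumptions made so that every model probability is nonzero; the papers above use their own, typically richer, models.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def unigram_dist(tokens: Iterable[str], vocab: Iterable[str], alpha: float = 1.0) -> Dict[str, float]:
    """Add-alpha smoothed unigram distribution over a fixed vocabulary (out-of-vocab tokens ignored)."""
    counts = Counter(tokens)
    vocab = list(vocab)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def entropy(p: Dict[str, float]) -> float:
    """Shannon entropy H(P) in bits."""
    return -sum(pw * math.log2(pw) for pw in p.values() if pw > 0)


def cross_entropy(tokens: List[str], q: Dict[str, float]) -> float:
    """Per-token cross-entropy (bits) of a sequence under q; every token must be in q with q[t] > 0."""
    if not tokens:
        raise ValueError("need at least one token")
    return -sum(math.log2(q[t]) for t in tokens) / len(tokens)


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """D_KL(P || Q) in bits; assumes q[w] > 0 wherever p[w] > 0."""
    return sum(pw * math.log2(pw / q[w]) for w, pw in p.items() if pw > 0)
```

A useful identity for interpreting these numbers: the per-token cross-entropy of a sequence under Q equals the entropy of the sequence's own empirical distribution plus the KL divergence from that empirical distribution to Q, so a higher cross-entropy can reflect either a more varied text or a worse match to the model.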
#21 Nov 5: Advanced yet “off-the-shelf” features roundup
Assignments/announcements
- Note (new) upcoming check-up appointments, presentation and final-project due dates.
Class images, links and handouts
- Flesch, Rudolf. June 1948. A new readability yardstick. Journal of Applied Psychology 32(3): 221-33. [Alternative link: the paper is bundled in the collection The Classic Readability Studies, ed. William H. DuBay, published as Unlocking Language: The Classic Studies in Readability, BookSurge Publishing, 2007.]
- Louis, Annie and Ani Nenkova. 2013. What makes writing great? First experiments on article quality prediction in the science journalism domain. Transactions of the Association for Computational Linguistics 1:341-352.
- MRC Psycholinguistic database. Wilson, M.D. (1988) The MRC Psycholinguistic Database: Machine Readable Dictionary, Version 2. Behavioural Research Methods, Instruments and Computers, 20(1), 6-11.
- Sentiment/subjectivity lexicons: Connotation Lexicon, MPQA lexica (goodFor/badFor, +/-affect, arguing, subjectivity), opinion lexicon, SentiWordNet. Financial sentiment (2014 version). See also the ...
- Multi-category lexicons: Harvard General Inquirer. The LIWC lexicon (2015 version). The NRC lexicons.
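For reference, the readability score introduced in the Flesch (1948) paper listed above (the Reading Ease score) is commonly written as follows; higher values indicate easier text. How syllables are counted (dictionary lookup or a heuristic) is an implementation choice that this formula does not fix. With the hypothetical counts of 100 words, 5 sentences, and 150 syllables, the score is 206.835 - 20.3 - 126.9 = 59.635.

```latex
\text{Reading Ease} = 206.835
  - 1.015 \, \frac{\text{total words}}{\text{total sentences}}
  - 84.6 \, \frac{\text{total syllables}}{\text{total words}}
```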
References
- Concreteness ratings. Brysbaert, M., Warriner, A.B., & Kuperman, V. (2014). Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods 46:904-911.
- Valence, arousal, dominance ratings. Warriner, A.B., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods 45:1191-1207.
- Hedge-annotated data. Farkas, Richárd, Veronika Vincze, György Móra, János Csirik, and György Szarvas. 2010. The CoNLL-2010 shared task: Learning to detect hedges and their scope in natural language text. Fourteenth Conference on Computational Natural Language Learning---Shared Task, pp. 1-12.
- Stanford politeness (of requests) code. Danescu-Niculescu-Mizil, Cristian, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. A computational approach to politeness with application to social factors. Proceedings of ACL, pp. 250--259.
#22 Nov 10: Probabilistic models for discourse (and hence dialog) structure patterns: a grammar-based perspective
Class images, links and handouts
- Barzilay, Regina and Lillian Lee. 2004. Catching the drift: Probabilistic content models, with applications to generation and summarization. Proceedings of HLT-NAACL, pp. 113--120. Best paper award.
- Code: Original by Regina Barzilay (in Lisp); by Alexandre Passos (python); other code for later versions
- Louis, Annie and Shay B. Cohen. 2015. Conversation trees: A grammar model for topic structure in forums. EMNLP, pp. 1543--1553.
- Ritter, Alan, Colin Cherry, and Bill Dolan. 2010. Unsupervised modeling of Twitter conversations. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 172-180.
References
- LCFRSs (linear context-free rewriting systems) were introduced in Vijay-Shanker, K., David J. Weir, and Aravind K. Joshi. 1987. Characterizing structural descriptions produced by various grammatical formalisms. ACL, pp. 104--111.
- An example LCFRS is given in Burden, Håkan and Peter Ljunglöf. 2005. Parsing linear context-free rewriting systems. The Ninth International Workshop on Parsing Technology (IWPT), 11-17.
- Chen, Harr, S. R. K. Branavan, Regina Barzilay, and David R. Karger. 2009. Content modeling using latent permutations. Journal of Artificial Intelligence Research 36(1): 129-163.
- Haghighi, Aria and Lucy Vanderwende. 2009. Exploring content models for multi-document summarization. NAACL, pp. 362-370.
- Joshi, Aravind K. and Yves Schabes. Tree-adjoining grammars. In Grzegorz Rozenberg and Arto Salomaa, editors, Handbook of Formal Languages, volume 3 (Beyond words), pp. 69–123 (1997). [alt link]
- Joshi, Aravind, K. Vijay-Shanker, and David Weir. 1991. The convergence of mildly context-sensitive grammar formalisms. In Peter Sells, Stuart Shieber and Thomas Wasow, eds., Foundational Issues in Natural Language Processing.
- Vijay-Shanker, K. and Aravind K. Joshi. 1985. Some computational properties of tree adjoining grammars. Proceedings of the 23rd Annual Meeting on Association for Computational Linguistics, pp. 82--93.
#23 Nov 12: Controlling for confounding factors in observational studies
Assignments/announcements
- By midnight Sunday the 15th, choose a 30-minute progress-and-problems appointment slot at https://cs6742-checkups.youcanbook.me/. These are mandatory.
- By 3pm the afternoon before your progress-and-problems appointment day, post a Piazza followup to your proposal that summarizes your progress and what discussion points or problems you'd like to bring up with us. Ideally, this followup post will be the agenda for your team's appointment, and will make the meeting efficient and useful for you.
Class images, links and handouts
Lecture references
- Argamon, Shlomo, Moshe Koppel, Jonathan Fine, and Anat Rachel Shimoni. Gender, genre, and writing style in formal written texts. Text - Interdisciplinary Journal for the Study of Discourse 23 (3): 321-346.
- Herring, Susan C. and John C. Paolillo. Gender and genre variation in weblogs. Journal of Sociolinguistics 10 (4): 439-459.
- Louis, Annie and Ani Nenkova. 2013. What makes writing great? First experiments on article quality prediction in the science journalism domain. Transactions of the Association for Computational Linguistics 1:341-352.
- Mitra, Tanushree and Eric Gilbert. 2014. The language that gets people to give: Phrases that predict success on Kickstarter. In Proceedings of CSCW.
- Althoff, Tim, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky. 2014. How to ask for a favor: A case study on the success of altruistic requests. ICWSM, pp. 12–21.
- Danescu-Niculescu-Mizil, Cristian, Justin Cheng, Jon Kleinberg, and Lillian Lee. 2012. You had me at hello: How phrasing affects memorability. In Proceedings of ACL, pp. 892-901.
- Tan, Chenhao, Lillian Lee, and Bo Pang. 2014. The effect of wording on message propagation: Topic- and author-controlled natural experiments on Twitter. In Proceedings of ACL.
- Rosenbaum, Paul R. and Donald B. Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika 70 (1): 41-55.
- Borghol, Youmna, Sebastien Ardon, Niklas Carlsson, Derek Eager, and Anirban Mahanti. 2012. The untold story of the clones: Content-agnostic factors that impact YouTube video popularity. In Proceedings of KDD, pp. 1186-1194.
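As a reminder of the central quantity in the Rosenbaum and Rubin paper above: for a binary treatment indicator z (e.g., a hypothetical "uses a given phrasing" flag) and observed covariates x (e.g., topic or author), the propensity score is the probability of treatment given the covariates. Conditioning on it balances the observed covariates between treated and control units. Causal conclusions additionally require that treatment assignment be strongly ignorable given x, i.e., that there are no unmeasured confounders.

```latex
e(x) = \Pr(z = 1 \mid x), \qquad x \perp\!\!\!\perp z \mid e(x)
```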
#24 Nov 17: (Mandatory) Checkup appointments with the instructors
#25 Nov 19: (Mandatory) Checkup appointments with the instructors
#26 Nov 24: Alternate hypotheses. Language change.
Class images, links and handouts
Lecture references
- Danescu-Niculescu-Mizil, Cristian, Gueorgi Kossinets, Jon Kleinberg, and Lillian Lee. 2009. How opinions are received by online communities: A case study on Amazon.com helpfulness votes. WWW, pp. 141--150.
- Cheng, Justin, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. 2014. How community feedback shapes user behavior. In Proceedings of ICWSM
- Romero, Daniel M, Roderick I Swaab, Brian Uzzi, and Adam D Galinsky. 2015. Mimicry is presidential. Personality and Social Psychology Bulletin 41 (10)
- Juola, Patrick. 2003. The time course of language change. Computers and the Humanities 37 (1): 77-96.
- Akpinar, Ezgi and Jonah Berger. 2015. Drivers of cultural success: The case of sensory metaphors. J Pers Soc Psychol 109 (1): 20.
- Danescu-Niculescu-Mizil, Cristian, Robert West, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. No country for old members: User lifecycle and linguistic change in online communities. WWW, pp. 307--318.
- Labov, William. 1966. The Social Stratification of English in New York City. Washington, D.C.: Center for Applied Linguistics.
- Wagner, Suzanne Evans. 2012. Age grading in sociolinguistic theory. Language and Linguistics Compass 6 (6): 371-382.
- Eisenstein, Jacob. 2014. Identifying regional dialects in online social media.
- Eisenstein, Jacob, Brendan O'Connor, Noah A Smith, and Eric P Xing. 2014. Diffusion of lexical change in social media. PLoS One 9 (11): e113114.
Nov 26: Thanksgiving Break
#27 Dec 1: project presentations (mandatory attendance by all students for the whole session)
- Schedule has been emailed. We'll end slightly later than usual (at 11:30), but will make up for it by ending early next session.
#28 Dec 3: project presentations (mandatory attendance by all students for the whole session)
- Schedule has been emailed.
Final project description due: December 9, 4:30pm, on CMS (date determined by the registrar).
The main evaluation criteria will be the reasonableness (in approach and amount of effort), thoughtfulness, and creativity of what you tried, as documented in your writeup. Individual effort within team projects will be taken into account; see the requirement below on describing who did what.
- Use the ICWSM style files provided by AAAI (LaTeX style and bib files, Word template).
- We make this requirement to facilitate submission to ICWSM 2016. However, note that your final-project submission should include your names and acknowledgments, in the format described below; in contrast, you will want to strip any identifying information from ICWSM submissions.
- AAAI prefers non-numbered section headings. You may change the style files to include section numbers in your headings for the purposes of CS6742 submission.
- For the author heading, list only the names of your teammates who are enrolled in the class, even if you had external collaborators. (Reason: only students in the class are submitting the paper for a grade.) But see the authorship statement below.
- Include the following sections:
- "content" sections: abstract, introduction/motivation, data description (how you gathered, cleaned, and processed it), methods and experiments, related work, references, conclusions (what you learned), directions for future work.
- Make sure that your introduction section explicitly sets out your hypothesis or hypotheses.
- Throughout, highlight your most interesting findings (positive or negative).
- For the purposes of CS6742 submission, your related-work section does not need to be exhaustive; you may cover just a few most-related papers.
- An "acknowledgments" section: name everyone from whom you received significant help and state what each contributed. (This may include, for example, your advisor(s), one or both of your instructors, or fellow students in the class.)
- Authorship statement: if you intend to ask, or have already arranged for, people other than your CS6742-enrolled teammates to be co-authors, name each such person.
- Projects done collaboratively must also include a section describing who did what. External collaborators should be included in this enumeration.
- Use the number of pages you feel is appropriate.
Code for generating the calendar formatting adapted from the original versions created by Andrew Myers.