Natural Language Processing and Social Interaction, Fall 2016
Prerequisites, course selection, enrollment
Prerequisites All of the following: CS 2110 or equivalent programming experience; a course in artificial intelligence or any relevant subfield (e.g., NLP, information retrieval, machine learning); proficiency with using machine learning tools (e.g., fluency at training an SVM, knowledge of how to assess a classifier’s performance using cross-validation)
Enrollment CS/IS PhD students may enroll online. Other students interested in adding the course are (wel)come to attend the first day of class. Enrollment questions will be addressed then, once we have a better sense of the demand and of how many CS/IS PhD students are interested in taking the class.
Choosing among NLP courses: How do I know which one is right for me?
In 2016-2017, we are blessed with a plethora of NLP-related offerings!
At the graduate level:
- If you are interested in extracting information and meaning from text through machine learning techniques, then consider taking CS6740/IS6300, Advanced Language Technologies (offered Spring 2017; to get a feel for what the course will be like, see the Spring 2016 offering).
- If you are interested in studying formal representation of language meaning, and designing algorithms to learn to map sentences to such representations, then consider taking CS6741, Structured Prediction for Natural Language Processing (offered Fall 2016; to get a feel for what the course will be like, see the Fall 2015 offering).
- If you are interested in exploring the social aspects of language and its role in online interactions, then consider taking CS6742, Natural Language Processing and Social Interaction (offered Fall 2016; to get a feel for what the course will be like, see the Fall 2015 offering).
- All three courses fulfill the same CS graduate course requirements. If you are truly passionate about NLP research, we would love to see you in all of these courses!
For undergraduate courses on offer, consult the Cornell NLP course list.
For more information before classes begin The webpage of the previous offering of this course (Fall 2015) gives a general idea of what the course will be like.
Administrative info and overall course structure
Course homepage http://www.cs.cornell.edu/courses/cs6742/2016fa. Main site for course info, assignments, readings, lecture references, etc.; updated frequently.
CMS page http://cms.csuglab.cornell.edu. Site for submitting assignments, unless otherwise noted.
Piazza page http://piazza.com/cornell/Fall2016/cs6742 Course announcements and Q&A/discussion site. Social interaction and all that, you know.
Contacting the instructor
- Office hours and contact info: see Prof. Cristian Danescu-Niculescu-Mizil's homepage
Overview of course schedule. Details subject to change. Full schedule is maintained on the main course webpage.
Lecture | Agenda | Pedagogical purpose | Assignments |
---|---|---|---|
#1 | Course overview | Pilot empirical study for a research idea based on readings provided. | |
#2 - #4 | Lecture topics related to the A1 readings: Online reviews: individual expression, community dynamics; Online asynchronous conversations. | Case studies exploring some topics and research styles that the instructor finds interesting. Get-to-know-you exercises to get everyone familiar and comfortable with each other. | |
Next 6 meetings, not counting presentations or discussions | Lectures on, potentially, linguistic coordination, linguistic adaptation, influence, persuasion, diffusion, discourse structure, advanced language modeling | Foundational material | Potentially some assignments based on the lectures. |
Next large block of meetings | Discussion of proposed projects based on the readings | Practice with fast research-idea generation. Feedback as to which proposals are most interesting, most feasible, etc. | Discussion of student project proposals, based on the readings for that class meeting. Each class meeting thus involves everyone reading at least one of the two assigned papers and posting a new research proposal based on the reading to Piazza. Thoughtfulness and creativity matter most to the instructor, but take feasibility into account. |
Remainder of the course | Activities related to course projects | Development of a "full-blown" research project (although time restrictions may limit ambitions). For our purposes, "interesting" is more important than "thorough". | |
Some time in December (to be determined by the registrar) | Final project writeup due | | |
Grading Of most interest to the instructor are productive, research-oriented participation in discussions (in class and on Piazza), interesting research proposals and pilot studies, and a good-faith final research project.
Academic Integrity Academic and scientific integrity compels one to properly attribute to others any work, ideas, or phrasing that one did not create oneself. To do otherwise is fraud.
We emphasize certain points here. In this class, talking to and helping others is strongly encouraged. You may also, with attribution, use the code from other sources. The easiest rule of thumb is, acknowledge the work and contributions and ideas and words and wordings of others. Do not copy or slightly reword portions of papers, Wikipedia articles, textbooks, other students' work, Stack Overflow answers, something you heard from a talk or a conversation or saw on the Internet, or anything else, really, without acknowledging your sources. See http://www.cs.cornell.edu/courses/cs6742/2011sp/handouts/ack-others.pdf and http://www.theuniversityfaculty.cornell.edu/AcadInteg/ for more information and useful examples.
This is not to say that you can receive course credit for work that is not your own (e.g., taking someone else's report and putting your name at the top, next to the other person(s)' names). However, violations of academic integrity (e.g., fraud) go through the academic-integrity hearing process on top of any grade penalties imposed, whereas not following the rules of an assignment only risks grade penalties.
Resources
- Webpage of the Fall 2015 offering of this course
- ACL anthology of all conferences, journals and workshops published under the aegis of the Association for Computational Linguistics; ACM digital library proceedings publication archive for WWW; AAAI proceedings archive for ICWSM
- ACL wiki of resources - corpora, datasets, tools, software, lexicons (organized by language)
- Toolkits: CMU twitter tools (Java) :: GATE (Java) :: Illinois tools (Java?) :: Lingpipe (Java) :: Mallet (Java) :: OpenNLP (Java) :: NLTK (Python) :: Stanford tools (Java) :: tm (R)
- NLP at Cornell
#1 Aug 23: Course overview: scope, course goals, course design
Details will appear here before each lecture.
Assignment A1 released
Student-information assignment released: see handout
Class images, links and handouts
- Handout
- Inspirational image: The School of Athens
- An Honest Facebook Political Argument: hypothetical comment thread with re-entry
- An annotated Wikipedian "Article for Deletion" vote page (click the "1" speech balloons to see comments)
- notabilia.net visualization of vote dynamics on selected AfD discussions
Datasets
References
- Althoff, Tim, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky. 2014. How to ask for a favor: A case study on the success of altruistic requests. ICWSM, pp. 12–21.
- Backstrom, Lars, Jon Kleinberg, Lillian Lee, and Cristian Danescu-Niculescu-Mizil. 2013. Characterizing and curating conversation threads: Expansion, focus, volume, re-entry. WSDM, pp. 13–22.
- Brown, Penelope and Stephen C. Levinson. 1987. Politeness: Some Universals in Language Usage. Reissued with new introduction by Cambridge University Press
- Bryan, Christopher J., Gregory M. Walton, Todd Rogers, and Carol S. Dweck. 2011. Motivating voter turnout by invoking the self. Proceedings of the National Academy of Sciences 108 (31): 12653-12656.
- Chong, Dennis and James N. Druckman. 2007. Framing theory. Annual Review of Political Science 10:103–26.
- Danescu-Niculescu-Mizil, Cristian, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, Christopher Potts. 2013. A computational approach to politeness with application to social factors. ACL, pp. 250–259.
- Taraborelli, Dario and Giovanni Luca Ciampaglia. Beyond notability. Collective deliberation on content inclusion in Wikipedia. Second international workshop on quality in techno-social systems, pp. 122-125. [alt link]
#2 Aug 25: Reviewing: a social experience?
Class images, links and handouts
Image source: Dorothy Gambrel, Cat and Girl. Permission policy here.
- Reviews for Surviving your Stupid, Stupid Decision to Go to Grad School, annotated for intended audience.
Lecture references
- Wu, Fang and Bernardo A. Huberman. 2010. Opinion formation under costly expression. ACM Transactions on Intelligent Systems and Technology 1 (1): 1-13.
- Gilbert, Eric and Karrie Karahalios. 2010. Understanding deja reviewers [alt link]. CSCW, pp. 225–228.
#3 Aug 30: Brown Bag and A1 Brainstorming
Class images, links and handouts
References on lecture topics
- Danescu-Niculescu-Mizil, Cristian, Robert West, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. No country for old members: User lifecycle and linguistic change in online communities. WWW
- Danescu-Niculescu-Mizil, Cristian, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, Christopher Potts. 2013. A computational approach to politeness with application to social factors. ACL.
- Niculae, Vlad, Srijan Kumar, Jordan Boyd-Graber, and Cristian Danescu-Niculescu-Mizil. 2015. Linguistic harbingers of betrayal: A case study on an online strategy game. ACL
#4 Sep 1: Review helpfulness
Class images, links and handouts
Image source: http://xkcd.com/810/
- List of potential review features from Otterbacher 2009 and Ghose and Ipeirotis 2011.
Datasets
- Yelp academic dataset. Includes "funny", "cool", "useful" tags, although not many of them.
- Amazon Fine Foods reviews from the Stanford SNAP lab
References on lecture topics
- Danescu-Niculescu-Mizil, Cristian, Gueorgi Kossinets, Jon Kleinberg, and Lillian Lee. 2009. How opinions are received by online communities: A case study on Amazon.com helpfulness votes. WWW, pp. 141–150.
- Danescu-Niculescu-Mizil, Cristian, Robert West, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. No country for old members: User lifecycle and linguistic change in online communities. WWW, pp. 307–318. Best paper award.
- Ghose, Anindya and Panagiotis Ipeirotis. 2011. Estimating the helpfulness and economic impact of product reviews: Mining text and reviewer characteristics. IEEE Transactions on Knowledge and Data Engineering 23(10): 1498–1512.
- Liu, Jingjing, Yunbao Cao, Chin-Yew Lin, Yalou Huang, and Ming Zhou. 2007. Low-quality product review detection in opinion summarization. In Proceedings of EMNLP, pp. 334-342.
- McAuley, Julian and Jure Leskovec. 2013. From amateurs to connoisseurs: Modeling the evolution of user expertise through online reviews. WWW.
- Otterbacher, Jahna. 2009. 'Helpfulness' in online communities: a measure of message quality. CHI, 955-964.
- Michael, Loizos and Jahna Otterbacher. 2014. Write like I write: Herding in the language of online reviews. ICWSM.
- Pinch, Trevor and Filip Kesler. 2011. How Aunt Ammy gets her free lunch: A study of the top-thousand customer reviewers at Amazon.com.
- David, Shay and Trevor J. Pinch. 2006. Six degrees of reputation: The use and abuse of online review and recommendation systems. First Monday, no. Special Issue on Commercial Applications of the Internet.
- Sipos, Ruben, Arpita Ghosh, and Thorsten Joachims. 2014. Was this review helpful to you? It depends! Context and voting patterns in online content. WWW.
#5 Sep 6: From monologues to conversations
Class images, links and handouts
- Dialectic vs eristic (Blount, Millard, Weal 2014, from the 14th workshop on Computational Models of Natural Argument)
- Sample conversations:
- Slack. Image from Fortune.com
- Slashdot (useful to look at in conjunction with Wikipedia explanation of Slashdot moderation), with a txt version from the UBC BC3 Blog corpus.
- Reddit. Online thread visualizer by Kawandeep Virdee and different types of threads.
Datasets
- UBC BC3 Blog Corpus: 7000 blog conversations with user-labeled comments from 6 popular websites (Slashdot, Macrumors, AndroidCentral, Dailykos, BusinessInsider, TSN). Slashdot includes "Funny" tags.
- CORPS: corpus of political speeches tagged with specific audience reactions, such as APPLAUSE or LAUGHTER.
- Intelligence Squared Debate Dataset: a collection of public debates with metadata (audience voting results pre- and post-debate, and audience reaction markers).
- Supreme Court Dialog Corpus: a collection of conversations from the U.S. Supreme Court Oral Arguments (http://www.supremecourt.gov/oral_arguments/) with metadata. Includes "laughter".
- HCRC Map Task Corpus: 128 dialogues recorded, transcribed, and annotated for a wide range of behaviours. It references other related corpora:
The DCIEM Map Task Corpus uses very similar materials to the HCRC Map Task Corpus, but with a different structure designed to test the effects of sleep deprivation under a number of pharmaceutical conditions. The subjects were Canadian army reservists. The Map Task Corpus has been replicated in whole or in part in a number of languages including Dutch, Italian, Japanese, Swedish, Occitan, and Portuguese. It has also been replicated in part for other forms of English besides the original Glaswegian speakers, including American English, Australian English, and some urban British dialects. The Occitan site has a list of some other language replications. The Map Task has been used to test the effects of many conditions on human communication, including stuttering, computer mediation, textual communication, and the use of avatars.
References
- Brake, David Russell. 2012. Who do they think they're talking to? Framings of the audience by social media users. International Journal of Communication 6:1056-1076.
- Lejeune, Philippe. 2009. On Diary. Ed. Jeremy D. Popkin and Julie Rak. University of Hawaii Press.
- Litt, Eden. September 2012. Knock, knock. Who's there? The imagined audience. Journal of Broadcasting & Electronic Media 56(3): 330–345.
- Marwick, Alice E. and danah boyd. 2010. I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience. New Media & Society 13(1): 114-133. [alt link]
#6 Sep 8: Conversation structure + A1 updates
References
- Fay, Nicolas, Simon Garrod, and Jean Carletta. 2000. Group discussion as interactive dialogue or as serial monologue: The influence of group size. Psychological Science 11(6): 481-486. [alt link]
- Gonzalez-Bailon, Sandra, Andreas Kaltenbrunner, and Rafael E Banchs. 2010. The structure of political discussion networks: A model for the analysis of online deliberation. Journal of Information Technology 25(2): 230-243. [alt link]
#7 Sep 13: A1 presentations
- A1.R - A1 Reflection assignment
#8 Sep 15: (Breaking) conversation "rules"
Class images, links and handouts
- Gricean maxim of quantity, “be exactly as informative as is required”: Google-plus post; xkcd comic (as pointed out by Mark Liberman on Language Log)
References
- Pickering, Martin J. and Simon Garrod. 2004. Toward a mechanistic psychology of dialogue. Behavioral and Brain Sciences 27(02): 169-190. [alt link]
- Grice, H.P. 1975. Logic and Conversation. In Cole et al., Syntax and Semantics 3: Speech Acts. and 1978.
- Galantucci, Bruno and Gareth Roberts. 2014. Do we notice when communication goes awry? An investigation of people's sensitivity to coherence in spontaneous conversation. PLoS ONE 9(7).
- Langer, Ellen J, Arthur Blank, and Benzion Chanowitz. 1978. The mindlessness of ostensibly thoughtful action: The role of "placebic" information in interpersonal interaction. J Pers Soc Psychol 36 (6): 635.
- Rogers, Todd and Michael I. Norton. 2011. The artful dodger: Answering the wrong question the right way. Journal of Experimental Psychology: Applied 17 (2). [alt link]
#9 Sep 20: Intro to discourse-structure theory
- Garley12Beefmoves-Mitra14Kickstarter - proposal based on lecture-11 readings
Class images, links and handouts
- Pinker, Steven and the Royal Society for the Encouragement of Arts, Manufactures and Commerce (RSA) Animate, posted to YouTube on Feb 10, 2011. Language as a Window into Human Nature
- handout
References
- Lam, Shyong K, Jawed Karim, and John Riedl. 2010. The effects of group composition on decision quality in a social production community. In Proceedings of the 16th ACM International Conference on Supporting Group Work, 55-64
- Zhang, Justine, Ravi Kumar, Sujith Ravi, and Cristian Danescu-Niculescu-Mizil. 2016. Conversational flow in Oxford-style debates. In Proceedings of NAACL.
- Clark, Herbert H. 2004. Pragmatics of language performance. In L. R. Horn & G. Ward (Eds.), Handbook of pragmatics. Oxford: Blackwell, pp. 365-382. [alt link]
- Grosz, Barbara J., and Sidner, Candace L. 1986. Attention, intentions, and the structure of discourse. Computational Linguistics 12(3): 175-204. [alt link]
- Section 24.7.2 of Jurafsky, Daniel and James H. Martin. 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd edition. [chapter link at UCSC]
#10 Sep 22: Case study: from hypothesis to research
Class images, links and handouts
References
- Danescu-Niculescu-Mizil, Cristian, Michael Gamon, and Susan Dumais. 2011. Mark my words! Linguistic style accommodation in social media. In Proceedings of WWW.
- Danescu-Niculescu-Mizil, Cristian, Lillian Lee, Bo Pang, and Jon Kleinberg. 2012. Echoes of power: Language effects and power differences in social interaction. In Proceedings of WWW.
- Danescu-Niculescu-Mizil, Cristian and Lillian Lee. 2011. Chameleons in imagined conversations: A new approach to understanding coordination of linguistic style in dialogs. Proceedings of the ACL Workshop on Cognitive Modeling and Computational Linguistics.
- Levelt, Willem J M and Stephanie Kelter. 1982. Surface form and memory in question answering. Cogn Psychol 14 (1): 78 - 106.
- Giles, Howard, Justine Coupland, and Nikolas Coupland. 1991. Accommodation theory: Communication, context, and consequence. In Contexts of Accommodation: Developments in Applied Sociolinguistics. Cambridge Univ Pr.
- Gonzales, Amy L., Jeffrey T. Hancock, and James W. Pennebaker. 2010. Language style matching as a predictor of social dynamics in small groups. Communication Research 37 (1): 3-19.
- Feng, S, R Banerjee, and Y Choi. 2012. Characterizing stylistic elements in syntactic structure. Proceedings of EMNLP.
- Bramsen, Philip, Martha Escobar-Molana, Ami Patel, and Rafael Alonso. 2011. Extracting social power relationships from natural language. Proceedings of ACL HLT.
#11 Sep 27: Garley12Beefmoves-Mitra14Kickstarter
Class images, links and handouts
The readings
- Garley, Matt and Julia Hockenmaier. 2012. Beefmoves: Dissemination, diversity, and dynamics of English borrowings in a German hip hop forum. ACL.
- Mitra, Tanushree and Eric Gilbert. 2014. The language that gets people to give: Phrases that predict success on kickstarter. CSCW. [alt link]
References
- Backstrom, Lars, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. 2006. Group formation in large social networks: Membership, growth, and evolution. In KDD, 44-54. [alt link]
- Danescu-Niculescu-Mizil, Cristian, Robert West, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. No country for old members: User lifecycle and linguistic change in online communities. WWW, pp. 307-318. [alt link] Best paper award.
- Garley, Matthew E. 2012. Crossing the lexicon: Anglicisms in the German hip hop community. Ph.D. thesis, UIUC.
- Guerini, Marco, Gözde Ozbal, and Carlo Strapparava. 2015. Echoes of persuasion: The effect of euphony in persuasive communication. NAACL, pp. 1483-1493.
- Pierrehumbert, Janet B. 2012. The dynamic lexicon. Handbook of Laboratory Phonology. A. Cohen, M. Huffman, and C. Fougeron (eds). Oxford University Press.
- Tan, Chenhao, Lillian Lee, and Bo Pang. 2014. The effect of wording on message propagation: Topic- and author-controlled natural experiments on Twitter. ACL.
- Tsvetkov, Yulia, Waleed Ammar, and Chris Dyer. 2015. Constraint-based models of lexical borrowing. In NAACL, 598-608
- Bali, Kalika, Yogarshi Vyas, Jatin Sharma, and Monojit Choudhury. 2014. "I am borrowing ya mixing?" An analysis of English-Hindi code mixing in Facebook. In Proceedings of the EMNLP Workshop on Computational Approaches to Code Switching.
- Hamilton, William L, Jure Leskovec, and Dan Jurafsky. 2016. Diachronic word embeddings reveal statistical laws of semantic change. In ACL
- Eisenstein, Jacob, Brendan O'Connor, Noah A Smith, and Eric P Xing. 2014. Diffusion of lexical change in social media. PLoS One 9 (11): e113114.
#12 Sep 29: Memes and relations
- Nguyen10Bias-Vasilescu14R - proposal based on lecture-13 readings
Class images, links and handouts
- MemeTracker visualization, including Variants of the "Lipstick on a pig" quote
- QUOTUS visualization: how media outlets quote the President.
The readings
- Krishnan, Vinodh and Jacob Eisenstein. 2015. "You're Mr. Lebowski, I'm the Dude": Inducing address term formality in signed social networks. NAACL, pp. 1616--1626.
- Simmons, Matthew P., Lada A. Adamic, and Eytan Adar. 2011. Memes online: Extracted, subtracted, injected, and recollected. Proceedings of ICWSM, pp. 353--360.
References
- Danescu-Niculescu-Mizil, Cristian, Justin Cheng, Jon Kleinberg, and Lillian Lee. 2012. You had me at hello: How phrasing affects memorability. ACL, pp. 892-901.
- Wang, William Yang and Miaomiao Wen. 2015. I Can Has Cheezburger? A Nonparanormal Approach to Combining Textual and Visual Information for Predicting and Generating Popular Meme Descriptions. NAACL, pp. 355-365
- Lakkaraju, Himabindu, Julian McAuley, and Jure Leskovec. 2013. What's in a name? Understanding the interplay between titles, content, and communities in social media. ICWSM.
- Pavlick, Ellie and Ani Nenkova. 2015. Inducing lexical style properties for paraphrase and genre differentiation. In NAACL, 218-224
#13 Oct 4: Nguyen10Bias-Vasilescu14R
Class images, links and handouts
The readings
- Nguyen, Dong, Elijah Mayfield, and Carolyn P Rosé. 2010. An analysis of perspectives in interactive settings. In Proceedings of the First Workshop on Social Media Analytics, 44-52. [alt link]
- Vasilescu, Bogdan, Alexander Serebrenik, Prem Devanbu, and Vladimir Filkov. 2014. How social Q&A sites are changing knowledge sharing in open source software communities. CSCW, pp. 342--354. [alt link]
#14 Oct 6: Paper discussions
The readings
- Sudhof, Moritz, Andrés Goméz Emilsson, Andrew L Maas, and Christopher Potts. Sentiment expression conditioned by affective transitions and social forces. In Proceedings of KDD
- Wang, Lu and Claire Cardie. A piece of my mind: A sentiment analysis approach for online dispute detection. In Proceedings of the ACL, pp. 693--699.
References
- Backstrom, Lars and Jon Kleinberg. 2014. Romantic partnerships and the dispersion of social ties: A network analysis of relationship status on facebook. In Proceedings of CSCW, 831-841.
- Niculae, Vlad, Srijan Kumar, Jordan Boyd-Graber, and Cristian Danescu-Niculescu-Mizil. 2015. Linguistic harbingers of betrayal: A case study on an online strategy game. In Proceedings of ACL
- Kittur, Aniket and Robert E Kraut. 2008. Harnessing the wisdom of crowds in wikipedia: Quality through coordination. In Proceedings of CSCW, pp.37--46.
- Lam, Shyong K, Jawed Karim, and John Riedl. 2010. The effects of group composition on decision quality in a social production community. In Proceedings of the 16th ACM International Conference on Supporting Group Work, 55-64.
Oct 11: Fall Break
#15-16 Oct 16 and Oct 18: (Optional) Proposal consultation appointments
#17-18 Oct 18 and Oct 25: (Mandatory) Feasibility presentations to the instructors
#19 Oct 26: Advanced yet “off-the-shelf” features roundup
Assignments/announcements
Class images, links and handouts
- Louis, Annie and Ani Nenkova. 2013. What makes writing great? First experiments on article quality prediction in the science journalism domain. Transactions of the Association for Computational Linguistics 1:341-352.
- Flesch, Rudolf. June 1948. A new readability yardstick. Journal of Applied Psychology 32(3): 221-33. [Alternative link: the paper is bundled in the collection The Classic Readability Studies, ed. William H. DuBay. Published as Unlocking Language: The Classic Studies in Readability, BookSurge Publishing, 2007.]
- MRC Psycholinguistic database. Wilson, M.D. (1988) The MRC Psycholinguistic Database: Machine Readable Dictionary, Version 2. Behavioural Research Methods, Instruments and Computers, 20(1), 6-11.
- Sentiment/subjectivity lexicons: Connotation Lexicon, MPQA lexica (goodFor/badFor, +/-affect, arguing, subjectivity), opinion lexicon, SentiWordNet. Financial sentiment (2014 version). See also the ...
- Multi-category lexicons: Harvard General Inquirer. The LIWC lexicon (2015 version). The NRC lexicons.
References
- Concreteness ratings. Brysbaert, M., Warriner, A.B., & Kuperman, V. (2014). Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods 46:904-911.
- Valence, arousal, dominance ratings. Warriner, A.B., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods 45:1191-1207.
- Hedge-annotated data. Farkas, Richárd, Veronika Vincze, György Móra, János Csirik, and György Szarvas. 2010. The CONLL-2010 shared task: Learning to detect hedges and their scope in natural language text. Fourteenth Conference on Computational Natural Language Learning---Shared Task, pp. 1-12.
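Flesch's readability score is a classic example of a cheap, off-the-shelf feature. It is commonly stated as 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word), with higher scores indicating easier text. The Python sketch below computes it. Several parts are our own crude assumptions and are not part of Flesch's procedure: the vowel-group syllable counter, the silent-"e" rule, and the regex-based sentence and word splitting. Expect its scores to differ somewhat from those of dedicated readability tools.

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of consecutive vowels (a crude heuristic, not Flesch's)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Assumed rule: a trailing silent "e" (as in "make") usually adds no syllable,
    # but "-le" endings (as in "table") usually do.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text: str) -> float:
    """Flesch reading ease; higher scores indicate easier text."""
    # Naive splitting: sentences end at ., ! or ?; words are runs of letters/apostrophes.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

print(round(flesch_reading_ease("The cat sat on the mat. It was happy."), 1))  # prints 108.3
```

Short words and short sentences push the score above 100 here. The formula is only a rough proxy for quality; Louis and Nenkova's article-quality work, for example, goes well beyond it.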
#20-21 Nov 1 and Nov 3: No lectures (EMNLP)
#22 Nov 8: N-Gram Language Models
Class images, links and handouts
- Quote memorability quiz
- MIT Language Modeling Toolkit
- SRILM - The SRI Language Modeling Toolkit
- N-gram language models in Python
- Mention of the bug in NLTK (the point being that language modeling can actually be quite subtle)
Lecture references
- Chen, Stanley F. and Joshua Goodman. 1996. An empirical study of smoothing techniques for language modeling. In Proceedings of the 34th Annual Meeting on Association for Computational Linguistics, 310-318. More detailed technical report version (recommended)
- Danescu-Niculescu-Mizil, Cristian, Justin Cheng, Jon Kleinberg, and Lillian Lee. 2012. You had me at hello: How phrasing affects memorability. Proceedings of the ACL, pp.892--901.
- Jelinek, F., R.L. Mercer, and S. Roukos. 1991. Principles of lexical language modeling for speech recognition. In Advances in Speech Signal Processing, S. Furui and J. Sondhi (eds.), pp. 651-700. New York, NY: M. Dekker Publishers.
- Gale, William A. and Kenneth W. Church. 1994. What's wrong with adding one. Corpus-based Research Into Language: In Honour of Jan Aarts, pp. 189--200.
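To make the smoothing discussion concrete, here is a minimal Python sketch of a bigram model with add-one (Laplace) smoothing, the simple baseline that Gale and Church critique, together with a perplexity computation. Several choices are our own illustrative assumptions: the class name `AddOneBigramLM`, the `<s>`/`</s>` padding symbols, and the decision to ignore out-of-vocabulary words. This is not the implementation in NLTK, SRILM, or the MIT toolkit. In practice, the smoothing methods compared by Chen and Goodman (e.g., Kneser-Ney variants) are much better choices than add-one.

```python
import math
from collections import Counter

class AddOneBigramLM:
    """Hypothetical bigram LM with add-one smoothing:
    P(w | v) = (count(v, w) + 1) / (count(v) + V),
    where count(v) counts v as a context and V is the number of predictable
    word types (training words plus the end-of-sentence symbol).
    """

    BOS, EOS = "<s>", "</s>"

    def __init__(self, sentences):
        self.context_counts = Counter()  # how often each token v precedes something
        self.bigram_counts = Counter()   # how often each pair (v, w) occurs
        vocab = {self.EOS}
        for tokens in sentences:
            padded = [self.BOS] + tokens + [self.EOS]
            vocab.update(tokens)
            for v, w in zip(padded, padded[1:]):
                self.context_counts[v] += 1
                self.bigram_counts[(v, w)] += 1
        self.vocab_size = len(vocab)

    def prob(self, v, w):
        """Smoothed P(w | v); sums to 1 over the training vocabulary for each v."""
        return (self.bigram_counts[(v, w)] + 1) / (self.context_counts[v] + self.vocab_size)

    def perplexity(self, tokens):
        """2 ** (average negative log2-probability per predicted token, including </s>)."""
        padded = [self.BOS] + tokens + [self.EOS]
        log_prob = sum(math.log2(self.prob(v, w)) for v, w in zip(padded, padded[1:]))
        return 2 ** (-log_prob / (len(padded) - 1))

lm = AddOneBigramLM([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(round(lm.prob("the", "cat"), 4))       # 2/7, prints 0.2857
print(lm.perplexity(["the", "cat", "sat"]))
```

Add-one reserves the same pseudo-count for every unseen bigram regardless of how plausible it is. That is the weakness Gale and Church point out, and it is easy to observe by comparing `prob` values for seen and unseen pairs on a larger corpus.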
#23 Nov 10: Entropy and Divergence
Assignments/announcements
- Next week we'll have mandatory project progress-and-problems appointments. By 2pm the afternoon before your progress-and-problems appointment day, post a Piazza followup to your proposal that summarizes your progress and what discussion points or problems you'd like to bring up with me. Ideally, this followup post will be the agenda for your team's appointment, and will make the meeting efficient and useful for you.
Class images, links and handouts
Lecture references
- Fu, Liye, Cristian Danescu-Niculescu-Mizil, and Lillian Lee. 2016. Tie-breaker: Using language models to quantify gender bias in sports journalism. In Proceedings of the IJCAI Workshop on NLP Meets Journalism
- Tan, Chenhao, Lillian Lee, and Bo Pang. 2014. The effect of wording on message propagation: Topic-and author-controlled natural experiments on twitter. In ACL
- Stark, Anthony, Izhak Shafran, and Jeffrey Kaye. 2012. Hello, who is calling?: Can words reveal the social nature of conversations?. Proceedings of NAACL
- Juola, Patrick and Harald R. Baayen. A controlled-corpus experiment in authorship identification by cross-entropy. Literary and Linguistic Computing 20 (Suppl 1): 59-67.
- Purohit, Hemant, Yiye Ruan, David Fuhry, Srinivasan Parthasarathy and Amit Sheth. 2014. On understanding the divergence of online social group discussion. Proceedings of ICWSM
- Tran, Trang and Mari Ostendorf. 2016. Characterizing the language of online communities and its relation to community reception. In EMNLP
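Several of the papers above compare texts or communities through information-theoretic quantities. Juola and Baayen, for example, identify authors by cross-entropy. The following Python sketch computes two such quantities over unigram distributions: the cross-entropy (in bits per token) of a token sequence under a model, and the KL divergence between two models. The function names are hypothetical. Additive smoothing over a fixed shared vocabulary is our assumption, chosen so that every probability is positive and the KL divergence stays finite. The cited papers use their own, richer language models.

```python
import math
from collections import Counter

def unigram_dist(tokens, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over a fixed vocabulary.
    Tokens outside `vocab` are ignored (an assumption of this sketch)."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def cross_entropy(tokens, model):
    """Average bits per token needed to encode `tokens` under `model`.
    Raises KeyError for tokens missing from the model's vocabulary."""
    return -sum(math.log2(model[w]) for w in tokens) / len(tokens)

def kl_divergence(p, q):
    """KL(p || q) in bits; assumes q[w] > 0 wherever p[w] > 0."""
    return sum(pw * math.log2(pw / q[w]) for w, pw in p.items() if pw > 0)

vocab = {"a", "b"}
p = unigram_dist(["a", "a", "a"], vocab)  # p(a) = 0.8, p(b) = 0.2
q = unigram_dist(["b", "b", "b"], vocab)  # q(a) = 0.2, q(b) = 0.8
print(kl_divergence(p, q))                # about 1.2 bits
print(cross_entropy(["a", "b"], p))
```

Note that KL divergence is asymmetric in general, so which community plays the role of `p` matters when comparing groups.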
#24-25 Nov 15 and Nov 17: Mandatory project progress-and-problems appointments
#26 Nov 22: Entropy case study: language change. Review of practical research tips. Controls in observational studies.
- Genzel, Dmitriy and Eugene Charniak. 2002. Entropy rate constancy in text. ACL, pp.199--206.
- Doyle, Gabriel and Michael C Frank. 2015. Audience size and contextual effects on information density in twitter conversations. Proceedings of the ACL Workshop on Cognitive Modeling and Computational Linguistics (CMCL), pp.19--28.
- Danescu-Niculescu-Mizil, Cristian, Robert West, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. No country for old members: User lifecycle and linguistic change in online communities. WWW, pp.307--318. Best paper award.
- Danescu-Niculescu-Mizil, Cristian, Justin Cheng, Jon Kleinberg, and Lillian Lee. 2012. You had me at hello: How phrasing affects memorability. Proceedings of the ACL, pp.892--901.
Nov 24: Thanksgiving Break
#27 Nov 29: Project presentations (mandatory attendance by all students for the whole session)
Schedule on Piazza. Starting at 1:15pm
#28 Dec 1: Project presentations (mandatory attendance by all students for the whole session)
Schedule on Piazza. Starting at 1:15pm
Final project description due: 12/09/16 4:30 PM (date determined by the registrar)
The main evaluation criteria will be the reasonableness (in approach and amount of effort), thoughtfulness, and creativity of what you tried, as documented in your writeup. Individual effort within team projects will be taken into account; see item 3 below.
- Use the ICWSM style files provided by AAAI (LaTeX style and bib files, Word template).
- We make this requirement to facilitate submission to ICWSM 2017. However, note that your final-project submission should include your names and acknowledgments, in a particular format (see items 1c and 2b below); in contrast, you will want to strip any identifying information for ICWSM submissions.
- AAAI prefers non-numbered section headings. You may change the style files to include section numbers in your headings for the purposes of CS6742 submission.
- For the author heading, list only the names of your teammates that are enrolled in the class, even if you had external collaborators. (Reason: only students in the class are submitting the paper for a grade.) But see item 2b1 below.
- Include the following sections:
- "content" sections: abstract, introduction/motivation, data description (how you gathered, cleaned, and processed it), methods, experiments, related work, references, conclusions (what you learned), directions for future work.
- Make sure that your introduction section explicitly sets out your hypothesis or hypotheses.
- Throughout, highlight your most interesting findings (positive or negative).
- For the purposes of CS6742 submission, your related-work section does not need to be exhaustive; you may cover just a few most-related papers.
- An "acknowledgments" section: give the name and state the contribution of those from whom you received significant help. (This may or may not include your advisor(s), your instructor, or fellow students in the class.)
- Authorship statement: if you intend to ask, or have already arranged for, people other than your CS6742-enrolled teammates to be co-authors, also name each such person.
- Projects done collaboratively must also include a section describing who did what. External collaborators should be included in this enumeration.
- Use the number of pages you feel is appropriate.
Code for generating the calendar formatting adapted from the original versions created by Andrew Myers.