CSC401/2511 :: Natural Language Computing :: University of Toronto

Contact information

Instructor Gerald Penn
Office PT 283
Office hours M 4-6pm
Email gpenn@teach.cs.toronto.edu (please put CSC 401/2511 in the subject line)
Forum (Piazza) Piazza - (signup)
Quercus https://q.utoronto.ca/courses/352606
Email policy For non-confidential inquiries, consult the Piazza forum first. For confidential assignment-related inquiries, contact the TA associated with the particular assignment. Emails sent from a University of Toronto email address with the appropriate subject line (CSC 401/2511) are the least likely to be redirected to a junk folder.

Course overview

This course presents an introduction to natural language computing in applications such as information retrieval and extraction, intelligent web searching, speech recognition, and machine translation. These applications will involve various statistical and machine learning techniques. Assignments will be completed in Python. All code must run on the 'teaching servers'.

Prerequisites: CSC207/CSC209/APS105/APS106/ESC180/CSC180 and STA237/STA247/STA255/STA257/STAB52/ECE302/STA286/CHE223/CME263/MIE231/MIE236/MSE238/ECE286, and a CGPA of 3.0 or higher or a CSC subject POSt. MAT 223 or MAT 240, and CSC 311 (or equivalent), are strongly recommended.

See also the course information sheet.

Meeting times

Location BA (Bahen Centre for Information Technology)
Lectures MW 10-11h at BA 1180; 11-12h at BA 1190
Tutorials F 10-11h at BA 1180; 11-12h at BA 1190

Syllabus

The following is an estimate of the topics to be covered in the course and is subject to change.

  1. Introduction to corpus-based linguistics
  2. N-gram models, linguistic features, word embeddings
  3. Entropy and information theory
  4. Intro to deep neural networks and neural language models
  5. Machine translation (statistical and neural) (MT)
  6. Transformers, attention-based models and variants
  7. Large language models (LLMs)
  8. Acoustics and phonetics
  9. Speech features and speaker identification
  10. Dynamic programming for speech recognition
  11. Speech synthesis (TTS)
  12. Information Retrieval (IR)
  13. Text Summarization
  14. Ethics in NLP

Calendar

4 September First lecture
18 September Last day to enrol
24 September Part 1 of Assignment 1 due
8 October Assignment 1 due
28 October Last day to drop CSC 2511
28 October - 1 November Reading week -- no lectures or tutorial
4 November Last day to drop CSC 401
5 November Assignment 2 due
3 December Last lecture
3 December Assignment 3 due
6-21 December Final exam period

See Dates for undergraduate students.

See Dates for graduate students.

Readings for this course

Optional Foundations of Statistical Natural Language Processing, C. Manning and H. Schütze. Errata. Online edition (free if you're on a UofT computer or VPN)
Optional Speech and Language Processing, D. Jurafsky and J.H. Martin (2nd ed.). Errata. 3rd ed. N.B. all reading sections refer to the 2nd ed.
Optional Deep Learning, I. Goodfellow, Y. Bengio, and A. Courville

Supplementary reading

Please see additional lecture-specific supplementary resources under the Lecture Materials section.

Good-Turing Smoothing: "A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams" (Kenneth Church and William Gale)
ML History: "What is science for? The Lighthill report on artificial intelligence reinterpreted" (Jon Agar)
Smoothing: "An Empirical Study of Smoothing Techniques for Language Modeling" (Stanley F. Chen and Joshua Goodman)
Hidden Markov models: "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition" (Lawrence R. Rabiner)
Sentence alignment: "A Program for Aligning Sentences in Bilingual Corpora" (William A. Gale and Kenneth W. Church)
Transformation-based learning: "Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging" (Eric Brill)
Sentence boundaries: "Sentence boundaries" (J. Read, R. Dridan, S. Oepen, and L. J. Solberg)
Seq2Seq: "Sequence to Sequence Learning with Neural Networks" (Ilya Sutskever, Oriol Vinyals, and Quoc V. Le)
Transformer: "Attention Is All You Need" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin)
Attention-based NMT: "Effective Approaches to Attention-based Neural Machine Translation" (Minh-Thang Luong, Hieu Pham, and Christopher D. Manning)
NMT: "Neural machine translation by jointly learning to align and translate" (Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio)
NMT: "Massive exploration of neural machine translation architectures" (Denny Britz et al.)

Evaluation policies

General

You will be graded on three homework assignments, two ethics surveys, and a final exam. The relative proportions of these grades are as follows:

Assignment 1 20%
Assignment 2 20%
Assignment 3 20%
Ethics Surveys (2x) 1%
Final exam 39%

Lateness

A 10% (absolute) deduction is applied to late homework beginning one minute after the due time. An additional 10% deduction is applied for each further 24 hours of lateness, up to 72 hours late, at which point the homework receives a mark of zero. No exceptions will be made except in cases of documented emergencies.
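
For concreteness, here is a minimal, unofficial sketch of one plausible reading of the schedule above (the function name, the boundary handling at exactly 72 hours, and expressing the penalty as an absolute percentage deducted are illustrative assumptions, not part of the policy):

```python
# Illustrative only -- not an official grade calculator.
def late_penalty(hours_late: float) -> float:
    """Absolute percentage deducted from a homework submitted `hours_late` hours late."""
    if hours_late <= 0:
        return 0.0      # on time: no deduction
    if hours_late >= 72:
        return 100.0    # 72 hours or more: mark of zero
    # 10% immediately after the due time, plus a further 10% per full 24 hours late
    return 10.0 + 10.0 * int(hours_late // 24)

# Example: a submission 30 hours late loses 10% + 10% = 20% (absolute).
print(late_penalty(30))  # 20.0
```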

Final

The final exam will be a timed 3-hour test. A mark of at least 50 on the final exam is required to pass the course. In other words, if you receive a 49 or less on the final exam then you automatically fail the course, regardless of your performance in the rest of the course.

Collaboration and plagiarism

No collaboration on the homeworks is permitted. The work you submit must be your own. 'Collaboration' in this context includes but is not limited to sharing of source code, correction of another's source code, copying of written answers, and sharing of answers prior to or after submission of the work (including the final exam). Failure to observe this policy is an academic offense, carrying a penalty ranging from a zero on the homework to suspension from the university. The use of AI writing assistance (ChatGPT, Copilot, etc.) is allowed only for refining the English grammar and/or spelling of text that you have already written. Submitting any Python code generated or modified by any AI assistant is strictly prohibited. See Academic integrity at the University of Toronto.

Lecture materials

  1. Introduction
    • Date: 4 Sep.
    • Reading: Manning & Schütze: Sections 1.3-1.4.2, Sections 6.0-6.2.1
  2. Corpora and Smoothing
    • Dates: 9-16 Sep.
    • Reading: Manning & Schütze: Section 1.4.3, Sections 6.1-6.2.2, Section 6.2.5, Section 6.3
    • Reading: Jurafsky & Martin: 3.4-3.5
    • See also the supplementary reading for Good-Turing smoothing
  3. Features and Classification
    • Dates: 18-23 Sep.
    • Reading: Manning & Schütze: Section 1.4.3, Sections 6.1-6.2.2, Section 6.2.5, Section 6.3
    • Reading: Jurafsky & Martin: 3.4-3.5
  4. Entropy and information theory
    • Dates: 25-30 Sep.
    • Reading: Manning & Schütze: Sections 2.2, 5.3-5.5
  5. Intro. to NNs and Neural Language Models
    • Dates: 7, 9 Oct.
    • Reading: DL (Goodfellow et al.). Sections: 6.3, 6.6, 10.2, 10.5, 10.10
    • (Optional) Supplementary resources and readings:
      • Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space. (2013)" link
      • Xin Rong. "word2vec Parameter Learning Explained". link
      • Bolukbasi, Tolga, et al. "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings." NeurIPS (2016). link
      • Greff, Klaus, et al. "LSTM: A search space odyssey." IEEE (2016). link
      • Jozefowicz, Sutskever et al. "An empirical exploration of recurrent network architectures." ICML (2015). link
      • GRU: Cho, et al. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." (2014). link
      • ELMo: Peters, Matthew E., et al. "Deep contextualized word representations. (2018)." link
        Blogs:
      • The Unreasonable Effectiveness of Recurrent Neural Networks. link
      • Colah's Blog. "Understanding LSTM Networks". link.
  6. Machine Translation (MT)
    • Dates: 16,21,23 Oct.
    • Readings:
      • Manning & Schütze: Sections 13.0, 13.1.2, 13.1.3, 13.2, 13.3, 14.2.2
      • DL (Goodfellow et al.). Sections: 10.3, 10.4, 10.7
    • (Optional) Supplementary resources and readings:
      • Papineni, et al. "BLEU: a method for automatic evaluation of machine translation." ACL (2002). link
      • Sutskever, Ilya, Oriol Vinyals et al. "Sequence to sequence learning with neural networks."(2014). link
      • Bahdanau, Dzmitry, et al. "Neural machine translation by jointly learning to align and translate."(2014). link
      • Luong, Manning, et al. "Effective approaches to attention-based neural machine translation." arXiv (2015). link
      • Britz, Denny, et al. "Massive exploration of neural machine translation architectures."(2017). link
      • BPE: Sennrich, et al. "Neural machine translation of rare words with subword units." arXiv (2015). link
      • Wordpiece: Wu, Yonghui, et al. "Google's neural machine translation system: Bridging the gap between human and machine translation." arXiv (2016). link
        Blogs:
      • Distill: Olah & Carter "Attention and Augmented RNNs"(2016). link
  7. Transformers
    • Dates: 4,6 Nov.
    • Readings:
      • Vaswani et al. "Attention is all you need." (2017). link
    • (Optional) Supplementary resources and readings:
      • RoPE: Su, Jianlin, et al. "Roformer: Enhanced transformer with rotary position embedding." (2021). [arxiv]
      • Ba, Kiros, and Hinton. "Layer normalization." (2016). [link]
      • Xiong, Ruibin, et al. "On layer normalization in the transformer architecture." ICML PMLR (2020). [link]
      • Xie et al. "ResiDual: Transformer with Dual Residual Connections." (2023). [arxiv] [github]
        BERTology:
      • Devlin et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." (2019). link
      • Clark et al. "What does bert look at? an analysis of bert's attention." (2019). link
      • Rogers, Anna et al. "A primer in BERTology: What we know about how bert works." TACL(2020). link
      • Tenney et al. "BERT rediscovers the classical NLP pipeline." (2019). link
      • Niu et al. "Does BERT rediscover a classical NLP pipeline." (2022). link
      • Lewis et al. "BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension." (2019). link
      • T5: Raffel et al. "Exploring the limits of transfer learning with a unified text-to-text transformer." J. Mach. Learn. Res. 21.140 (2020). link
      • GPT3: Brown et al. "Language models are few-shot learners." (2020). link
        Attention-free models:
      • Fu, Daniel, et al. "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture." (2023). [arxiv]. [blog].
        Token-free models:
      • Clark et al. "CANINE: Pre-training an efficient tokenization-free encoder for language representation." (2021). link
      • Xue et al. "ByT5: Towards a token-free future with pre-trained byte-to-byte models." (2022). link
        Blogs:
      • Harvard NLP. "The Annotated Transformer". link.
      • Jay Allamar. "The Illustrated Transformer". link.
  8. Acoustics and Phonetics
  9. Speech Features and Speaker Identification
    • Dates: 13,18 Nov.
    • Readings:
      • Jurafsky & Martin SLP3 (3rd ed.): Chapter 16. link
  10. Dynamic Programming for Speech Recognition
  11. Information Retrieval (IR)
  12. Text Summarization
  13. Guest Lectures on Ethics: [Module 1], [Module 2]
  14. Summary and Review (last lecture)

Tutorial materials

Assignments

Here is the ID template that you must submit with your assignments.

Head TA: Ken Shi

Extension requests: All extension requests must be made to the head TA. All undergrads should follow the FAS student absences policy. Specifically, undergrads must file an ACORN absence declaration when it is allowed, and a VOI form for extensions due to illness when it is not allowed (because an ACORN declaration has already been filed this term). Grads should always use a VOI form for extensions due to illness.

Remark requests: Please follow the remarking policy.

General Tips & F.A.Q.:

Assignment 1: Financial Sentiment Analysis

Assignment 2: Neural Machine Translation with Transformers

Assignment 3: ASR, Speakers, and Lies

News and announcements
