CompSci 516 - Databases Systems (original) (raw)

Overview

This is the graduate database course at Duke CS. This course will cover principles and design of database management systems at an advanced level.

Topics will include:

SQL/Relational Algebra/Relational Calculus, Database Normalization, DBMS Architecture/Storage, Indexing/Hashing, Query Algorithms and Optimizations, Transactions and Recovery, Parallel DB/Map Reduce/Distributed query processing, NOSQL/Column store, Datalog, Advanced and Research Topics in Databases (TBD).

Prerequisites:

CompSci 316 or an undergraduate database course is **not** a hard prerequisite as we start from basics in databases and SQL in this course. However, having a background in CS (e.g., algorithms, data structures, sets, programming, basic systems knowledge etc.) and experience in some programming knowledge helps. If you have taken 316 before, there may be some overlap of topics. In 516, there will be a group project and individual assignments in old and new database systems (SQL, Spark, NOSQL) -- quite a bit of these topics you are expected to learn yourself with help from TAs/online tutorials (along with any other topic that is needed as a background knowledge). If you do not have a CS background, you are encouraged to contact the instructor to check if the course is suitable for you.

Time/Day

Lectures: Tuesdays and Thursdays, 10:15 am - 11:30 am Place: French Science 2231

Course Staff

Instructor

Sudeepa Roy Office hour: Mondays 4-5 pm (Zoom link on Ed) and after class TTh (in-class / lecture Zoom).

TAs

Grading

You have a considerable amount of control over the final grade you would receive in this class if you work hard, and should have a fair idea where you approximately stand at any point of time!

Weights of each component:

See details in Workload below.

Component Weight
Homeworks 30%
HW1: SQL (Traditional DBMS) 10%
HW2: Map-Reduce / Spark (Big Data) 10%
HW3: MongoDB / NOSQL (New DBMS) 10%
Exams 45%
Midterm 20%
Final exam 25%
Project 13%
Class participation 12%
In-class labs and quizzes (equal weight for each, lowest score dropped) 10%
Communication / Surveys (equal weight for each) 2%

Workload

Resources / Communication / Toolkits

**Book:**The textbooks are optional and may be found in the library. Lecture slides and regular class participation should be sufficient for this class.

We will use the following two books in this class:

  1. [RG] (Main) Database Management Systems (third edition); Raghu Ramakrishnan and Johannes Gehrke.
  2. [GUW] Database Systems: The Complete Book, by Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. 2nd Edition. Prentice Hall. 2008. Relevant chapters for reading are posted under Schedule. In a typical semester, textbooks for this course are available for 3-hour checkouts at the Duke Libraries. Search the Libraries' Top Textbooks program here: https://library.duke.edu/course-support/course-reserves/textbooks. Please consult the library for options during the COVID-19 situation. Gradescope: We will mostly use Gradescope for submission and grading of homeworks and project work. Communication: You should check your email regularly for important course-related announcements. All important announcements will be sent through Sakai. Old email messages can be found under "Email Archive" on Sakai.We will use the Sakai course management system for posting sample solutions (under "Resources") and for checking grades (under "Gradebook").Computing: You will need access to a computer (any major OS will do) on which you are allowed to install new software. We will also use cloud-based virtual machines - see Help for details.

Course Policy and Honor Code

Under the Duke Honor Code, the students are expected to submit their own work in this course in the homework and exams (note that the students will work on the project in groups). The students are allowed (and are encouraged) to discuss the course material with other students, but need to solve problems in the homeworks and exams on their own. Any assistance received must be clearly indicated in the solutions -- failure to do so will be considered a violation of the Honor Code. In any event, the students are responsible for understanding and being able to explain on their own all solutions that they submit. The course staff will pursue aggressively all suspected cases of Honor Code violations, and they will be handled through official University channels.If you are unsure of a policy, please ask the instructor and do not assume anything.

What is allowed/not allowed

Help

TBA

Schedule

(subject to change) "Notes" will be uploaded before the class and are intentionally left incomplete for interactive lectures. Completed "slides" will be uploaded after the lectures.

| | Day | Topic | Slides | Additional material / reading | Release/Due dates | | | ---------- | ------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------- | | 1 | 1/6 (Th) | Introduction and SQL | Lecture-1-notes Lecture-1 (up to slide 46) | Instructions to install Postgres and load MovieLens dataset: slides | | | 2 | 1/11 (T) | Data model | Lecture-2-notes Lecture-2 (up to slide 36) | | HW1-Part I (SQL) released | | 3 | 1/13 (Th) | SQL: intro, semantics, modifications aggregation, subqueries, NULL, outerjoin, constraints, triggers, views | Lecture-3-notes Lecture-3 (up to slide 38) | Check out some optional slides on SQL Programming at the end of the lecture slides | | | 4 | 1/18 (T) | SQL: aggregation, subqueries, NULL, outerjoin, constraints, triggers, views | (see Lec 3 notes, from p7, slide 38) Lecture-4 (up to slide 28) | | | | 5 | 1/20 (Th) | Relational Algebra and Calculus | (See Lecture-4 slides 27 to 42) Lecture-5-notes Lecture-5 (up to slide 24) | | | | 6 | 1/25 (T) | Contd. | Lecture-6-notes (finished Lecture-5) | | Team members and tentative topic of project due | | 7 | 1/27 (Th) | Normalization, FDs, BCNF | Lecture-7 (up to slide 38) | Quiz1-RA posted | | | 8 | 2/1 (T) | a. Map-Reduce and Sparkb. Storage | Lecture-8a Lecture-8b | HW1 (SQL) - Part I and II - due | | | 9 | 2/3 (Th) | Tree and Hash Index | Lecture-9-11 (up to slide 11) | Project proposal due (MOVED to Monday 2/7) | | | 10 | 2/8 (T) | Contd. | B+-Tree Index (up to slide 46) | Quiz1-RA due | | | 11 | 2/10 (Th) | Contd. | Hash Index (up to slide 74) | HW2 (AWS/Spark) released | | | 2/14 (M) | | Quiz-2 and 3 due on Gradiance - 12 NOON | | | | | 12 | 2/15 (T) | Midterm exam | | | | | 13 | 2/17 (Th) | Transactions | Lecture-13 (up to slide 37) | | | | 14 | 2/22 (T) | Transactions - concurrency control | Lecture-14 (up to slide 13) | | | | 15 | 2/24 (Th) | Contd. | | | | | 16 | 3/1 (T) | Transactions - recovery | Lecture-16 | | | | 17 | 3/3 (Th) | External Sorting and Index Selection | Lecture-17 | HW2 (AWS/Spark) due at noon | | | 3/8 (T) | Spring Break - no class | | | | | | 3/10 (Th) | Spring Break - no class | | | | | | 18 | 3/15 (T) | Query Evaluation and Join Algorithms | Lecture-18 | Midterm project report due at noon | | | 19 | 3/17 (Th) | Contd. | | HW3 (NOSQL/MongoDB) released (tentative) | | | 20 | 3/22 (T) | Distributed Databases, NOSQL, MongoDB | Lecture 20 | MongoDB & JSON Basics | | | 21 | 3/24 (Th) | Contd.(up to slide 35) | | Quiz-4-Transactions due on Gradiance | | | 22 | 3/29 (T) | Query Optimization | Lecture 22 (up to slide 22) | | | | 23 | 3/31 (Th) | Contd. | | | | | 24 | 4/5 (T) | Parallel Databases | Lecture 24 | HW3 (NOSQL/MongoDB) due | | | 25 | 4/7 (Th) | Recursive query evaluation and Datalog | Lecture 25 | | | | 26 | 4/12 (T) | Data mining and data cube | Lecture 26 | | | | 4/13 (W) | Project final report and video due (can be submitted by Friday 4/15 noon) | | | | | | 4/27 (Wed) | Final Exam, 9am-12 pm EST, in class | | | | |