CompSci 516 - Database Systems (original) (raw)

Overview

This is the graduate database course. This course will cover principles and design of database management systems at an advanced level.

Topics will include:
SQL/Relational Algebra/Relational Calculus, Database Normalization, DBMS Architecture/Storage, Indexing/Hashing, Query Algorithms and Optimizations, Transactions and Recovery, Parallel DB/Map Reduce/Distributed query processing, NOSQL/Column store, Datalog, Advanced and Research Topics in Databases (TBD).

Textbooks:

  1. [RG] (Main) Database Management Systems (third edition); Raghu Ramakrishnan and Johannes Gehrke.
  2. [GUW] (Additional) Database Systems: The Complete Book (second edition); Hector Garcia-Molina, Jeffrey Ullman, and Jennifer Widom

Prerequisites:
An introductory database course (CompSci 316 or equivalent) or consent of the instructor. Some background in Algorithms, Data Structure, and Discrete Maths will be assumed. Undergraduate students with the necessary background and interests are welcome.

Grading

Homework
There will be three homework assignments. They have to solved strictly individually by every student (see the honor code below). There are no late days.

Project
There will be a semester-long project on topics chosen by the students in groups of 3-4. Students are encouraged to choose a research project of their own research interests that is related to data management / processing / visualization / applications / theory. Some ideas of the projects will be posted.

The deliverables of the project will be (1) a project proposal (1-3 pages), (2) a midterm project report (3-5 pages), (3) the final project report (4-8 pages), and (4) a short (~10 minutes) class presentation at the end. The same document will be updated through the semester as the project progresses. A template of the project report will be posted on sakai.

Exams
Exams are closed book and closed notes, and in class. No electronic devices are allowed.

Honor Code:
Under the Duke Honor Code, the students are expected to submit their own work in this course in the homework and exams (note that the students will work on the project in groups). The students are allowed (and are encouraged) to discuss the course material with other students, but need to solve problems in the homeworks and exams on their own. Any assistance received must be clearly indicated in the solutions -- failure to do so will be considered a violation of the Honor Code. In any event, the students are responsible for understanding and being able to explain on their own all solutions that they submit. The course staff will pursue aggressively all suspected cases of Honor Code violations, and they will be handled through official University channels.

What is allowed/not allowed

Schedule

(subject to change) "Notes" will be uploaded before the class and are intentionally left incomplete for interactive lectures. Completed "slides" will be uploaded after the lectures.

| | Day | Topic | Slides | Reading | | | ----------- | --------------------------------------------- | ---------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | 1 | 8/28 (T) | Introduction and Data Models | Lec-1-notes Lecture-1 | [RG] 1.1, 1.3, 1.4, 1.5 | | 2 | 8/30 (Th) | SQL | Lec-2-notes Lecture-2 | SQL: [RG] 3, 5 (also see 4.2.4), [GUW] 6 XML (optional reading): [RG] 27.6, [GUW] 11.1, 11.2 | | 3 | 9/4 (T) | More SQL | Lec-3-notes Lecture-3 | | | 4 | 9/6 (Th) | Relational Algebra/Calculus | Lec-4-notes Lecture-4 | [RG] 4, [GUW] 2.4, 5.1, 5.2 | | 5 | 9/11 (T) | Design Theory and Normalization | Lec-5-notes Lecture-5 | [RG] 19.1-19.5, 19.6.1, 19.8 (overview only)[GUW] 3 | | 9/13 (Th) | Class canceled due to bad weather | | | | | 6 | 9/18 (T) | Storage | Lec-6-notes Lecture-6 | [RG] 8.1-8.5, 9.4-9.7, 10.1-10.7, 11[GUW] 8.3, 14.1-14.4 | | 7 | 9/20 (Th) | Index | Lec-7-8-notes Lecture-7-8-Index | | | 8 | 9/25 (T) | Index | | | | 9 | 9/27 (Th) | External Sorting | Lec-9-notes Lecture-9 | [RG] 13[GUW] 15.4.1 | | 10 | 10/2 (T) | Query Evaluation, Operator and Join Algorithms | Lec-10-notes Lecture-10 | [RG] 14 Optional reading: (1) "Architecture of a Database System" by Joseph M. Hellerstein, Michael Stonebraker, and James Hamilton [pdf], Chapters 1.1 and 4.1-4.5(2) "Query Evaluation Techniques for Large Databases" by Goetz Graefe [pdf] | | 11 | 10/4 (Th) | Map-Reduce and Spark | Lec-11-notes Lecture-11 | Spark_RDD Google_File_System Google_MapReduce | | 10/9 (T) | No class- Fall break | | | | | 10/11 (Th) | Midterm (in class) | | | | | 12 | 10/16 (T) | Query optimization | Lec-12-notes Lecture-12 | [RG] 15 [GUW] 14.2-14.7 Optional reading: (1) "Access Path Selection in a Relational Database Management System" by Selinger et al. [pdf](2) "An Overview of Query Optimization in Relational Systems" by Chaudhuri et al. [pdf] | | 13 | 10/18 (Th) | SparkGuest Lecture by Prajakta Kalmegh | Lecture-13-Spark | | | 14 | 10/23 (T) | Transactions | Lec-14-notes Lecture-14 | [RG] 16.1-16.3, 16.4.1, 17.1-17.4 | | 15 | 10/25 (Th) | Transactions: Concurrency control | Lec-15-notes Lecture-15 | [RG] Chapter 17.5.1, 17.5.3, 17.6 [GUW] Chapter 18.8, 18.9 | | 16 | 10/30 (T) | Transactions: Concurrency Control | | | | 17 | 11/1 (Th) | Transactions: Recovery | Lec-16-notes Lecture-16 | [GUW] 17.2-17.4 | | 18 | 11/6 (T) | Distributed Databases | Lec-18-notes Lecture-18 | | | 19 | 11/8 (Th) | Distributed Databases (contd.) | | | | 20 | 11/13 (T) | NOSQL and Parallel Databases | Lec-20-NOSQL-notes Lecture-20-NOSQL Lec-20-ParallelDB-notes Lecture-20-ParallelDB | Additional slides on MongoDB Lecture-20-Optional | | 21 | 11/15 (Th) | Recursive query evaluation and Datalog | Lec-21-notes Lecture-21 | | | 22 | 11/20 (T) | Datalog (contd.) | | | | 11/22 (Th) | No class - Thanksgiving Recess | | | | | 23 | 11/27 (T) | Data cube and data mining | Lec-23-notes Lecture-23 | | | 24 | 11/29 (Th) | Project Presentations | | | | 12/15 (Sat) | Final Exam (2:00 PM - 5:00 PM), LSRC D106 | | | |

Homework

Homework Topic Posted on Due on
HW1 SQL and Postgres
HW2 Spark and AWS
HW3 NOSQL

Project Milestones

Milestone Due on
Project Proposal (1-3 pages)
Midterm Report (3-5 pages)
Final Report (4-8 pages)