CompSci 316 - Introduction to Databases (original) (raw)

Office Hour Calendar: Office Hour Calendar

Not sure how your TAs look like? Check out first few slides of Lecture 1!

Overview

We intend this course to give you a solid background in database systems as well as managing and processing "big data" in general. Topics include data modeling, database design theory, data definition and manipulation languages (SQL and NoSQL), database application programming interfaces, storage and indexing, query processing and optimization, parallel and distributed data processing, transaction processing, as well as a sample of other topics such as data mining and web data. Programming projects are required.

Prerequisites:

CompSci 201 or equivalent, or consent of the instructor. You will need familiarity (or ability to quickly become familiar) with the Unix command line (such as "Terminal" in Mac OS).

Grading

Updates in grading due to unprecedented change of circumstances under COVID-19:

  1. Please see this website for remote learning and other resources in this challenging time: https://keeplearning.duke.edu/.
  2. As you should have heard from Duke, the default grading of this class has changed to S/U. You have to submit a form to the registrar if you decide to opt for the letter grade for the course. We expect everyone who work on all components of the course (note that some lowest scores would be dropped) to the best of their capacity to receive an S grad at the end.
  3. There might be unexpected special circumstances, and we would consider such scenarios after the spring break to decide both the S/U grades and letter grades for such students. If you are concerned about your situation regarding any of the assignments in the class or in general, please send the instructor an email copying your academic dean.
  4. The thresholds for the letter grades remain the same for now, but as earlier, the thresholds may go down based on performance of the entire class at the end.
  5. NEW: We will reweigh the total 40% weight of the exams in the following way: The higher score will have 70% weight (i.e. 28% of the course grade) weight and the lower score will have 30% weight (i.e. 12% of the course grade).
  6. NEW: The lowest homework grade will be dropped.
  7. NEW: There will be two days of extension without penalty for each remaining homework. But the next homework might be posted, so you are encouraged to work on the parts of the homeworks as soon as the corresponding concept is covered in class.
  8. NEW: The 5% + 5% = 10% total weight of in-class labs and quizzes will be updated as follows. We have had 2 quizzes and 2 labs so far. As announced earlier, we will drop the lower quiz and lower lab and include the higher ones toward this 10% total weight (100 points each, scaled up to 100 from 10 for the quiz). The rest will be coming from watching the videos posted for each class (each class will have 100 points). You will submit on gradescope whether you watched all the videos of the class. If there are multiple short videos of the same class, the 100 points will be equally divided among them. We will drop the lowest score (from one quiz, one lab, and all videos) at the end.
  9. The details of the online final exam will be announced later. We will take into account time-zone differences and potential problems with network connectivity while designing the exam format.
  10. There are concerns about collaborations to complete the group projects in the survey. The good thing is that, as you already have finished HW4 as a team, you should have a basic implementation of a webpage already. We will consider this disruption in collaboration and lost time while grading the projects. We suggest each team to find common times to meet weekly using when2meet and work on the project together using ZOOM at least for some time each week.
  11. We plan to use when2meet for homework collaboration where students can post their weekly availability to discuss homework problems. Links will be posted on sakai. Then you should form small groups and discuss homeworks on ZOOM. Note that the same collaboration policy as before still applies, i.e., you can only discuss with course staff and students in the class, you must acknowledge all help in your submission, you need to write your solution and should be able to explain your solution, you cannot search online, from previous semesters' solutions, orf discuss with anyone outside the class. Any violation of course policy will be aggressively pursued and actions will be taken.
  12. Hope you all stay safe and healthy wherever you are!

Grading is done on an absolute, but adjustable scale. In other words, there is no curve. Anyone earning

Information

Book: We use the following book: Database Systems: The Complete Book, by Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. 2nd Edition. Prentice Hall. 2008. See the publisher's book page and the Amazon book page. Relevant chapters for reading are posted under Schedule. Textbooks for this course are available for 3-hour checkouts at the Duke Libraries. Search the Libraries' Top Textbooks program here: https://library.duke.edu/course-support/course-reserves/textbooks.

Gradescope: We will use Gradescope (follow the link from the navigation bar) for submission and grading of (non-Gradiance) homeworks and project work, as well as grading of exams. Be sure to read Help/Submitting Non-Gradiance Work for instructions.

Communication: You should check your email regularly for important course-related announcements.

All questions that may be of general interest to the class should be directed to Piazza (follow the link from the navigation bar); do not use the mailing list. You will get your questions answered faster on Piazza than via personal emails to the course staff, because Piazza is monitored closely by everybody in the class, not just the course staff.

Sakai: We will use the Sakai course management system for posting sample solutions (under "Resources") and for checking grades (under "Gradebook").

Computing: You will need access to a computer (any major OS will do) on which you are allowed to install new software. We will also use cloud-based virtual machines - see Help for details.

Standards of Conduct: Under the Duke Community Standard, you are expected to submit your own work in this course, including homeworks, projects, and exams. On many occasions when working on homeworks and projects, it is useful to ask others (the instructor, the TA, or other students) for hints or debugging help, or to talk generally about the written problems or programming strategies. Such activity is both acceptable and encouraged, but you must indicate in your submission any assistance you received. Any assistance received that is not given proper citation will be considered a violation of the Standard. In any event, you are responsible for understanding and being able to explain on your own all written and programming solutions that you submit. It is strictly not allowed to seek help outside your TAs and classmates for solving the assignments, so you cannot search for answers on the Web, students from previous semester taking this course, or search for solutions from previous semesters. The course staff will pursue aggressively all suspected cases of violations, and they will be handled through official University channels.

Help

Schedule

(subject to change) "Notes" will be uploaded before the class and are intentionally left incomplete for interactive lectures. Completed "slides" will be uploaded after the lectures.

| | Day | Topic | Slides | Assignments / Remarks | Reading | | | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------ | | 1 | 1/9 (Th) | Introduction and SQL | Lecture-1-notes Lecture-1 | Try the queries from the lecture notes on pgweb! How to access online SQL interface (for simple tasks): pgweb (pgweb instruction) (we will use this in class). Also check out a similar pgAdmin4 (pgAdmin4 instruction) (you would need pgadmin for assignments) | 2.1, 2.2, 6.1, 6.2 | | 2 | 1/14 (T) | Relational model and algebra | Lecture-2-notes Lecture-2 (up to slide 25) | HW1 is posted. | 2.3, 2.4 | | 3 | 1/16 (Th) | contd. | (from slide 26 to end) | | | | 4 | 1/21 (T) | Database design in E/R model | Lecture-3-notes Lecture-3 | Lab-1 on RA in class LAB1 (in groups).HW1 is due. HW2 is posted. | 4.1-4.4 | | 5 | 1/23 (Th) | contd. | Quiz-1 in class (in groups). | | | | 6 | 1/28 (T) | Project mixer | Guest lecture by Jane Li and Danai Adkisson (slides) | Project details is posted on Saturday 1/25. | | | 7 | 1/30 (Th) | ER to relational translation | Lecture-4-notes Lecture-4 | HW2 is due. Lab-1 is due. HW3 is posted (Q1-Q5). | 4.5, 4.6 | | 8 | 2/4 (T) | SQL: aggregation, subqueries, NULL, outerjoin, modifications, constraints, triggers, views | Lecture-5-notes Lecture-5 (up to slide 26) | 2.3, 6.1.1-6.1.7, 6.2-6.5, 7.1-7.5, 8.1-8.3 | | | 9 | 2/6 (Th) | contd. | (up to slide 46 of Lecture-5) | | | | 10 | 2/11 (T) | contd. | (finished Lecture-5) | HW3 - Q1, Q2, Q3 are due. | | | 11 | 2/13 (Th) | Database design theory - FD, BCNF | Lecture-6-notes Lecture-6 (up to slide 13 -- up to here included in midterm) | 3.1-3.2 | | | 2/15 (Sat) | HW3 - Q4 and Q5 are due **AT 12 NOON**. | | | | | | 2/18 (T) | Midterm exam | | | | | | 12 | 2/20 (Th) | BCNF (contd.) | (end of Lecture 6) | | | | 2/24 (M) | Project MS1 is due. | | | | | | 13 | 2/25 (T) | Storage | Lecture-7-notes Lecture-7 (up to slide 20) | Quiz-2 in class (in groups). | | | 14 | 2/27 (Th) | Index | Lecture-8-notes Lecture-8 (up to slide 15 of Lecture-8, finished Lecture-7) | HW4 is posted (due on 03/04 Wed). | | | 15 | 3/3 (T) | contd. | | | | | 16 | 3/5 (Th) | Query Processing | Lecture-9-notes Lecture-9 (up to slide 14) | Lab-2 in class (index) | | | 3/10 (T) | No class- Spring break | | | | | | 3/12 (Th) | No class- Spring break | | | | | | 3/17 (T) | Class canceled - Extended spring break due to COVID-19 | | | | | | 3/19 (Th) | Class canceled - Extended spring break due to COVID-19 | | | | | | The rest of the class will be conducted online. We would be exploring what would work best for the class taking into account responses to our survey on remote teaching, so our teaching strategy might be updated as we go. The class is meeting on TTh usual time by ZOOM as regular lectures. The class meetings will be recorded for those who cannot join during the class times. The new video-watch assignments require that either you attend the class or watch the videos. The following schedule is tentative. Please check sakai for all announcements. | | | | | | | 17 | 3/24 (T) | Join Algorithms and external sorting | Lecture-10 | Watch video for 10a and 10b on sakai | | | 3/25 (W) | Lab2 is due. HW5 to be posted. | | | | | | 18 | 3/26 (Th) | contd. | | | | | 19 | 3/31 (T) | XML | Lecture-11 | | | | 20 | 4/2 (Th) | contd. | (up to XPATH, slide 27 in Lec-11, last two XPATH slides have examples/functions that can be useful in HW) | Project MS2 is due. | | | 21 | 4/7 (T) | contd. | (finished Lecture 11) | HW5 (Q1ab, Q2, Q3) due. | | | 22 | 4/9 (Th) | Transaction | Lecture-12 | | | | 4/13 (M) | HW5 - Q4 is due. | | | | | | 23 | 4/14 (T) | contd. + A note on SQL injection attack | SQL-injection-attack | HW5 - Q5 (Gradiance) is due. | | | 24 | 4/17 (Th) | Transaction Recovery | Lecture-13 | | | | 25 | 4/21 (T) | Map-Reduce, Parallel DBMS | Lecture-14 | HW6 - Q1 (Gradiance) is due. | | | 4/22 (W) | HW6 - Q2 (Gradiance) is due. Video-watch assignments due. | | | | | | 4/24 (F) | Final project report, and Project demo link, code, and video by each team due. | | | | | | 4/27 (Mon) | Online Final Exam (9 am to 12 noon), details TBA | | | | |