CIS 4400 Data Warehousing for Analytics Fall 2020

CIS 4400: Data Warehousing for Analytics

Instructor

Emily Mazo (Tech-in-Residence Corps member)

Email: emily.mazo@baruch.cuny.edu

Course website: https://blogs.baruch.cuny.edu/emilymazo

Office hours: Sunday 4pm-6pm, Wednesday 5pm-7pm

Materials in this syllabus are adapted from those provided by Professor Holowczak

Objectives

This advanced course will provide students with an introduction to the internals of data warehousing and analytics database systems, as well as an overview of design and operational best practices for data warehousing in industry. Students will develop a complete data warehouse system, including a design document, architecture review, ETL pipeline, data warehouse, and business intelligence dashboard. Covered topics include:

  • Data warehousing project planning
  • ETL pipelines
  • Batch- and streaming-data processing
  • Architecture design
  • Monitoring, observability, and scaling of data warehouses
  • Performance optimization of data warehouses for analytics
  • NoSQL databases
  • Cloud data storage and managed systems
  • Structured data
  • Database migration

Learning Goals

Students will learn not only how to build data warehouses for business applications, but also best practices for building and maintaining such systems in industry. This includes practicing project planning, stakeholder management, implementing monitoring and observability tools, considering ethics, security, and privacy at every step of system creation, undergoing architecture reviews, writing documentation, and collaborating in a team setting.

 

I believe that class is for theory, and homework is for practice. I will never ask you to regurgitate theory for me on homework assignments or exams, only to solve problems or build systems that demonstrate you have absorbed the theories we have discussed together and can use them.

Materials

Where available, I have found free tech blogs and articles online to provide supplemental readings for this course. “Supplemental” means supplemental to lectures, not optional. You are expected to do the readings listed in this syllabus for each week’s classes after that week’s lectures: lectures will serve as an introduction to the material in each topic, and the provided readings will serve as case studies of each topic in industry. Homework assignments and exams may ask questions referencing these readings, and lectures will begin with time to ask and answer questions about the previous week’s readings.

 

In addition to reading materials, this course involves several homework assignments, in-class activities, and a group project that require the use of a computer or computer cluster and the installation of open-source database software. If you do not have access to a computer or an on-campus computer lab, I will work with Baruch College to provide you with one, but you must let me know at least three weeks before the semester begins (by 8/1).

Grading 

 

Type            Percentage of final grade
Homework        30% (10% each assignment)
Final exam      15%
Midterm exam    10%
Group project   45% (15% proposal, 10% presentation, 20% final document)
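
As a rough illustration of how these weights combine, here is a minimal Python sketch that computes a weighted course grade. The weights come from the grading table above; the function name, the component names, and the sample scores are hypothetical, and the sketch assumes each component is scored from 0 to 100. It is not an official grade calculator.

```python
# Weights from the grading table, expressed as fractions of the final grade.
# The group project's 45% is split into its three listed parts.
WEIGHTS = {
    "homework": 0.30,
    "final_exam": 0.15,
    "midterm_exam": 0.10,
    "project_proposal": 0.15,
    "project_presentation": 0.10,
    "project_final_document": 0.20,
}


def final_grade(scores):
    """Return the weighted course grade (0-100) from per-component scores (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {
    "homework": 90,
    "final_exam": 80,
    "midterm_exam": 85,
    "project_proposal": 95,
    "project_presentation": 90,
    "project_final_document": 88,
}

print(round(final_grade(example_scores), 2))
```

Because the group project carries 45% of the grade, a weak proposal or final document moves the result more than either exam does.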

 

Prerequisites

 

CIS 3400 Database Management Systems, and membership in the ZICK or ZKTP student group.

Students must have a firm understanding of the topics covered in CIS 3400, including the relational model, E-R diagramming, normalization, and SQL. Students who received lower than a “B” in CIS 3400, or who took CIS 3400 more than one year ago, should consult with the instructor before continuing in CIS 4400.

 

Schedule (tentative)

 

Each entry lists the week (dates), topics, post-lecture reading, and homework (assigned, due).

Week 1 (8/26)
  Topics: Welcome, administrivia

Week 2 (8/31, 9/2)
  Topics: Data Warehousing project planning, E-R Model
  Reading:
    8/31:
      http://holowczak.com/data-warehousing-project-planning/
      https://eng.lyft.com/awesome-tech-specs-86eea8e45bb9
      https://medium.com/pinterest-engineering/powering-pinterest-ads-analytics-with-apache-druid-51aa6ffb97c1 (up to the section labeled “query construction”)
    9/2:
      https://www.lucidchart.com/pages/ER-diagram-symbols-and-meaning#:~:text=and%20published%20date.-,ER%20diagram%20notation,preferred%20ERD%20notation%20for%20Lucidchart.
  Homework: Homework 1 (assigned 8/31, due 9/9); group project milestone 1 due 9/14

Week 3 (9/9)
  Topics: Dimensional Model
  Reading:
    9/9: http://holowczak.com/data-warehouse-dimensional-modeling/

Week 4 (9/14, 9/16)
  Topics: ETL (parts 1 & 2)
  Reading:
    9/14: https://netflixtechblog.com/dblog-a-generic-change-data-capture-framework-69351fb9099b
    9/16: https://eng.lyft.com/running-apache-airflow-at-lyft-6e53bb8fccff
  Homework: Homework 2 (assigned 9/14, due 9/21)

Week 5 (9/21, 9/23)
  Topics: Ethics, Security
  Reading:
    9/21 (BEFORE CLASS): https://www.scu.edu/ethics/ethics-resources/ethical-decision-making/what-is-ethics/
    9/23:

Week 6 (9/29, 9/30)
  Topics: CAP theorem, Scaling
  Homework: Homework 3 (assigned 9/30, due 10/14)

Week 7 (10/5, 10/7)
  Topics: Software engineering/data engineering interview practice, technical documentation

Week 8 (10/14)
  Topics: Performance

Week 9 (10/19, 10/21)
  Topics: Midterm exam review, midterm held 10/21 in-class

Week 10 (10/26, 10/28)
  Topics: Cloud computing and managed services, Architecture design part 2 (architecture reviews)

Week 11 (11/2, 11/4)
  Topics: Maintenance (monitoring, observability, stakeholder management)
  Homework: Homework 4 (assigned 11/2, due 11/16)

Week 12 (11/9, 11/11)
  Topics: NoSQL databases

Week 13 (11/16, 11/18)
  Topics: Streaming data vs. batch data processing, guest panel

Week 14 (11/23)
  Topics: Structured and semi-structured data
  Homework: Homework 5 (assigned 11/23, due 12/2)

Week 15 (11/30, 12/2)
  Topics: Final Exam Review

Week 16 (12/7, 12/9)
  Topics: Project presentations

 

Academic Integrity Statement

I fully support Baruch College’s policy on Academic Honesty, which states, in part:

“Academic dishonesty is unacceptable and will not be tolerated. Cheating, forgery, plagiarism and collusion in dishonest acts undermine the college’s educational mission and the students’ personal and intellectual growth. Baruch students are expected to bear individual responsibility for their work, to learn the rules and definitions that underlie the practice of academic integrity, and to uphold its ideals. Ignorance of the rules is not an acceptable excuse for disobeying them. Any student who attempts to compromise or devalue the academic process will be sanctioned.”

Academic sanctions in this class will range from an F on the assignment to an F in this course. A report of suspected academic dishonesty will be sent to the Office of the Dean of Students. Additional information and definitions can be found at http://www.baruch.cuny.edu/academic/academic_honesty.html

Additional Notes

Cribbed from Professor Holowczak’s syllabus:

  • No makeups will be given for missed quizzes or exams.
  • The instructor retains all midterm and final exams.
  • The final exam must be taken by all students in the time slot posted in the college bulletin.
    Please make your business and travel plans to accommodate this schedule.
  • Grades will not be given out via e-mail under any circumstances.
  • If you miss class, it is your responsibility to find out about any announcements or assignments you may have missed.
  • Cell phones etc. should be turned off during class and especially during exams.
  • In general, the time to let me know about any problems or issues concerning missing class, long term illnesses, job related problems, academic probation, etc. is before you have missed a week or two of classes.
  • All homework assignments are to be done individually. Students handing in similar work will both receive a 0 and face disciplinary actions.
  • The instructor reserves the right to give unannounced quizzes if it appears students are not putting in the time to prepare for class.
  • Students are expected to spend time outside of the classroom learning to use Oracle and/or SQL Server. This means you will need to spend time in the computer labs at school or on your own computer systems.
  • Other helpful software tools to have include a decent word processor (e.g., MS Word) and a drawing tool. MS Powerpoint or Visio can be used for the latter.
  • Assignments may be turned in via e-mail. However, you are strongly advised to send me a sample attachment to ensure I can decode and view it properly. Not all e-mail programs work the same and many students have had problems sending attachments such as MS Word files.
  • Make backups of all of your work! This includes any assignment and project materials you produce. I reserve the right to ask you to resubmit any assignment at any time.
  • There will be programming assignments.
    If you have never taken a programming course before, you should (at the very least) read up on some notes Professor Holowczak has assembled here: holowczak.com/programming-concepts-tutorial-programmers/
  • In some portions of the course, we will work with web access to databases. It is important that you understand how a web browser and server interact and how to write some simple HTML including HTML forms. There are many tutorials on the web and you should know how to do this anyhow so dig in…
  • You may wish to participate in the Baruch College CIS group on Facebook.

 
