CIS 9760 Big Data Technologies Syllabus

Zicklin School of Business – Baruch College
City University of New York

CIS 9760 – Big Data Technologies

Spring 2025 – Section FMWA – Hybrid/Synchronous
Monday/Wednesday 4:10 – 5:25 pm

DRAFT Syllabus


Professor Dr. Richard Holowczak
Phone: 646-312-3371
Office Hours: Monday and Wednesday, 12:30pm – 1:30pm, or by appointment
E-Mail: [email protected] (Preferred)
Please put the following in the Subject line for any e-mail to me: CIS 9760 followed by the specific subject of your e-mail.
Instructional Modality Section FMWA will be Hybrid for Spring 2025.
Monday – In Person in Room NVC 7-150
Wednesday – On-Line – Lectures will be Synchronous.
Objectives This course will give students an overview of the big data technologies that help efficiently store, extract, and process very large datasets. Students will learn key data analysis and management techniques, including critical concepts such as Distributed File Systems (storage concepts) and MapReduce/Spark (processing concepts) that power modern big data technologies. In particular, the course will leverage cloud services to manage storage and efficiently process data. It will also show how big data technologies can be used to effectively analyze large volumes of data for practical applications.
Course Learning Goals Upon successful completion of this course, students will be able to:

  • Explain the challenges of big data.
  • Articulate key big data and cloud computing concepts.
  • Explain different processing technologies such as MapReduce and Spark.
  • Leverage cloud computing services to analyze large data sets.
Prerequisites Pre-requisite: CIS 9650 – Programming for Analytics
Suggested: CIS 9660 – Data Mining for Business Analytics
Textbooks / Materials / Resources
  • There are no required textbooks for this class. However, students should reserve approximately $100 to cover the costs of cloud services and certification tests.
  • Optional: Data Engineering with Google Cloud Platform by Adi Wijaya. March 2022. Packt Publishing. ISBN 9781800561328 (Available on-line in Newman Library)
  • Optional: Data Science on the Google Cloud Platform, 2nd Edition by Valliappa Lakshmanan. March 2022. O’Reilly Media, Inc. ISBN 9781098118952 (Available on-line in Newman Library)
  • Additional course materials will be provided on Brightspace.
Course Content There will be 7 homework assignments including:
    Google Cloud Platform exercises
    DataCamp: Big Data with PySpark Skill Track courses
There will be an in-person Midterm Exam and Final Exam (not cumulative).
An individual machine learning Project will be due at the end of the semester.
Students are expected to spend a significant amount of time outside the classroom learning to use Google Cloud Platform and programming with PySpark.
Grading
  • Midterm Exam: 25%
  • Final Exam: 25%
  • Homework: 20%
  • Semester Project: 30%
    This is a tentative grading schedule and is subject to change.
    Credit will not be given for assignments submitted after homework solutions are discussed.
    There will be no extra credit assignments.
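As a rough illustration of how the tentative weights above combine, the following Python sketch computes a weighted course score. The component names, the 0 to 100 scale for each component, and the sample scores are assumptions made for this example; the actual calculation may differ, and no curve is applied.

```python
# Tentative weights from the Grading section (subject to change).
WEIGHTS = {
    "midterm": 0.25,
    "final": 0.25,
    "homework": 0.20,
    "project": 0.30,
}


def weighted_score(scores: dict) -> float:
    """Combine component scores (each assumed to be on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student scores, for illustration only.
example = {"midterm": 80, "final": 90, "homework": 100, "project": 85}
print(round(weighted_score(example), 2))  # prints 88.0
```

In this hypothetical case the project contributes 25.5 of the 88.0 points, which shows why the 30% project weight matters as much as either exam.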
    Semester Project Students will complete an individual machine learning project during the semester. The project will have six milestones due throughout the semester:

    Milestone   Points   Work
    1           10       Project Proposal
    2           15       Data Acquisition
    3           20       Exploratory Data Analysis and Data Cleaning
    4           30       Feature Engineering and Modeling
    5           15       Model Evaluation and Data Visualization
    6           10       Final Report and Share Project on GitHub

    Topics / Schedule (Tentative)

    The following table gives a tentative lecture schedule for the course.

    Week | Topics | Google Cloud / DataCamp | Project Milestones
    1 | Course Introduction; Introduction to Big Data | DataCamp: Understanding Data Engineering (Due 2/3) |
    2 | Python Review; Introduction to Machine Learning | Google Cloud Platform Exercises (Due 2/17) | Milestone 1 Proposal (Due 2/21)
    3 | Introduction to Cloud Computing | Continued |
    4 | Cloud Computing – Compute Services; GCP Compute Engine and working with the command line | DataCamp: Introduction to PySpark (Due 2/28) |
    5 | Cloud Computing – Cloud Storage Services | Continued | Milestone 2 Data Acquisition (Due 3/3)
    6 | Hadoop – Architecture and HDFS | DataCamp: Big Data Fundamentals with PySpark (Due 3/14) |
    7 | Hadoop – MapReduce and YARN; GCP Dataproc | Continued |
    8 | Catch-up class; Review for Midterm Exam | DataCamp: Cleaning Data with PySpark (Due 3/28) |
    9 | Midterm Exam (3/24 Tentative); Spark Architecture and PySpark | Continued | Milestone 3 EDA / Data Cleaning (Due 3/28)
    10 | Spark: RDDs, DataFrames, Datasets, Spark SQL | DataCamp: Feature Engineering with PySpark (Due 4/11) |
    11 | Spark: Feature Engineering with PySpark | Continued |
    12 | Spark: Machine Learning with MLlib | DataCamp: Machine Learning with PySpark (Due 4/25) | Milestone 4 FE and Modeling (Due 4/18)
    13 | Spark: Pipelines and Cross Validation | (Optional) DataCamp: Building Rec. Engines with PySpark | Milestone 5 Data Visualization (Due 5/2)
    14 | Spark: Streaming | |
    15 | Scripting, Automation and MLOps; Final Exam Review | | Milestone 6 Final Report (Due 5/16)
    16 | TBD; Final Exam to be held during the 2-hour final exam period | |

    Please note that this schedule is subject to change. Students are expected to come to class prepared and ready to participate.

    Google Cloud Platform

    This course has been structured around Google Cloud Platform, which will be used extensively for demonstrations, homework and projects.
    Most of the topics can be adequately demonstrated using the “Free Tier” of services. Some services may require payment if they are left running or consuming storage space for a prolonged period of time.

    Students are responsible for monitoring their services and billing statements, and for setting up budgets and alerts to prevent extensive charges.

    Final Letter Grades Letter grades are calculated according to the Official Grading System of Baruch College. The instructor reserves the right to curve the scale when computing final grades, if deemed necessary.

    Grade   Grade Point Equivalent   Score
    A       4.0                      93.0 – 100
    A-      3.7                      90.0 – 92.9
    B+      3.3                      87.1 – 89.9
    B       3.0                      83.0 – 87.0
    B-      2.7                      80.0 – 82.9
    C+      2.3                      77.1 – 79.9
    C       2.0                      73.0 – 77.0
    C-      1.7                      70.0 – 72.9
    D+      1.3                      67.1 – 69.9
    D       1.0                      60.0 – 67.0
    F       0.0                      below 60
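The score ranges above can be read as a simple lookup. The Python sketch below is one possible reading, not the official calculation: the function name is hypothetical, no curve is applied, and because the published ranges leave small gaps (for example between 87.0 and 87.1), it assumes a score falling in a gap receives the lower grade.

```python
# Lower bound of each range, copied from the Score column above.
# Assumption: a score inside a gap (e.g. 87.05) gets the lower grade.
GRADE_FLOORS = [
    (93.0, "A"), (90.0, "A-"),
    (87.1, "B+"), (83.0, "B"), (80.0, "B-"),
    (77.1, "C+"), (73.0, "C"), (70.0, "C-"),
    (67.1, "D+"), (60.0, "D"),
]


def letter_grade(score: float) -> str:
    """Return the letter grade for an uncurved score on a 0-100 scale."""
    for floor, letter in GRADE_FLOORS:
        if score >= floor:
            return letter
    return "F"


print(letter_grade(88.0))  # prints B+
print(letter_grade(59.9))  # prints F
```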
    Grade Distribution The Paul H. Chook Department of Information Systems and Statistics expects to see a reasonable distribution of grades in each class. For graduate courses this distribution is:

    A and A- 40% or less
    B+, B 40% or less
    B- or lower 20% (approximate)

    Due to these guidelines, the professor reserves the right to curve final letter grades up or down.

    Academic Integrity Statement I fully support Baruch College’s policy on Academic Honesty, which states, in part:

    “Academic dishonesty is unacceptable and will not be tolerated. Cheating, forgery, plagiarism and collusion in dishonest acts undermine the college’s educational mission and the students’ personal and intellectual growth. Baruch students are expected to bear individual responsibility for their work, to learn the rules and definitions that underlie the practice of academic integrity, and to uphold its ideals. Ignorance of the rules is not an acceptable excuse for disobeying them. Any student who attempts to compromise or devalue the academic process will be sanctioned.”

    Academic sanctions in this class will range from an F on the assignment to an F in this course. A report of suspected academic dishonesty will be sent to the Office of the Dean of Students. Additional information and definitions can be found at https://provost.baruch.cuny.edu/teaching-learning-student-success/academic_honesty/

    The use of AI (ChatGPT and similar) for coursework and assignments is strictly prohibited. This includes, but is not limited to, the use of AI-generated text, speech, programming code, diagrams or images, as well as the use of AI tools or software to complete any portion of a project, assignment or exam. Any use of AI tools to complete your work or a portion of your work will result in a grade of 0.
    Statement on Lecture Recording Students who participate in this class with their camera on or use a profile image are agreeing to have their video or image recorded solely for the purpose of creating a record for students enrolled in the class to refer to, including those enrolled students who are unable to attend live. If you are unwilling to consent to have your profile or video image recorded, be sure to keep your camera off and do not use a profile image. Likewise, students who un-mute during class and participate orally are agreeing to have their voices recorded. If you are not willing to consent to have your voice recorded during class, you will need to keep your mute button activated and communicate exclusively using the “chat” feature, which allows students to type questions and comments live.
    Baruch College Counseling Center At Baruch, we acknowledge that as a student, you are balancing many demands. During the semester, if you start to experience personal difficulties or stressors that are interfering with your academic performance or day to day functioning, please consider seeking free and confidential support at the Baruch College Counseling Center. For more information or to make an appointment, please visit their website at studentaffairs.baruch.cuny.edu/counseling/ or call 646-312-2155.
    If it is outside of business hours (Monday-Friday, 9am-5pm) and you need immediate assistance, please call 1-888-NYC-WELL (888-692-9355). If you are concerned about one of your classmates, please share that concern by filling out a Campus Intervention Team form at studentaffairs.baruch.cuny.edu/campus-intervention-team.

    Students with Disabilities Students with disabilities may receive assistance and accommodation of various sorts to enable them to participate fully in courses at Baruch. To establish the accommodations appropriate for each student, please alert me to your needs and contact the Office of Services for Students with Disabilities, part of the Division of Student Development and Counseling. For more information, contact the Director of this office in NVC 2-271 or at (646) 312-4590.

    Additional Notes
    • No makeups will be given for missed quizzes or exams.
    • The instructor retains all exam papers.
    • The final exam must be taken by all students in the time slot posted in the college bulletin.
      Please make your business and travel plans to accommodate this schedule.
    • If you miss class, it is your responsibility to find out about any announcements or assignments you may have missed.
    • Cell phones etc. should be silenced during class and turned off during exams.
    • In general, the time to let me know about any problems or issues concerning missing class, long term illnesses, job related problems, academic probation, etc. is before you have missed a week or two of classes.
    • All homework assignments are to be done individually. Students handing in similar work will both receive a 0 and face disciplinary actions.
    • The instructor reserves the right to give unannounced quizzes if it appears students are not putting the time in to prepare for class.
    • Other helpful software tools to have include a decent word processor (e.g., MS Word) and a drawing tool. MS PowerPoint or Lucidchart can be used for the latter.
    • Make backups of all of your work! This includes any assignment and project materials you produce. I reserve the right to ask you to resubmit any assignment at any time.
    Important Dates Baruch Academic Calendar for Spring 2025

    - January 25     Saturday      Official start of the Spring Semester
    - January 27     Monday        First Class session for CIS 9760
    - January 29     Wednesday     No Class
    - February 12    Wednesday     No Class
    - February 17    Monday        No Class
    - February 18    Tuesday       Classes follow Monday schedule
    - March 6        Thursday      Classes follow Wednesday schedule
    - March 31       Monday        No Class
    - April 1        Tuesday       Last day to withdraw with "W" grade
    - April 12-20                  Spring Recess
    - April 14       Monday        No Class
    - April 16       Wednesday     No Class
    - May 14         Wednesday     Last class for CIS 9760
    - May 16-22                    Final exams
    - May 27         Tuesday       Final Grades Submitted
    
