CIS 4130 Big Data Technologies Syllabus

Zicklin School of Business – Baruch College
City University of New York

CIS 4130 Big Data Technologies

Fall 2024 – Section CMWA – Hybrid/Synchronous
Monday/Wednesday 10:45 am – 12:00 pm

Syllabus


Professor: Dr. Richard Holowczak
Phone: 646-312-3371
Office Hours: Monday and Wednesday, 9:30am – 10:30am, or by appointment
E-Mail: [email protected] (Preferred)
Please put the following in the Subject line for any e-mail to me: CIS 4130 followed by the specific subject of your e-mail.
Instructional Modality: Section CMWA will be hybrid for Fall 2024.
Monday – In Person in Room NVC 12-150
Wednesday – On-Line – Lectures will be Synchronous.
Objectives: This course will give students an overview of the big data technologies used to efficiently store, extract, and process very large datasets. Students will learn key data analysis and management techniques, including critical concepts such as Distributed File Systems (storage concepts) and MapReduce/Spark (processing concepts) that power modern big data technologies. In particular, the course will leverage cloud-based services to manage storage and efficiently process data. The course will also show how big data technologies can be used to effectively analyze large volumes of data for practical applications.
Course Learning Goals: Upon successful completion of this course, students will be able to:

  • Explain the challenges of big data.
  • Articulate key big data and cloud computing concepts.
  • Explain different processing technologies such as MapReduce and Spark.
  • Leverage cloud computing services to analyze large data sets.
BBA Learning Goals
  • Analytical Skills: Students will possess the analytical and critical thinking skills to evaluate issues faced in business and professional careers.
  • Technological Skills: Students will possess the necessary technological skills to analyze problems, develop solutions and convey information.
  • Written Communication Skills: Students will have the necessary written communication skills to convey ideas and information effectively and persuasively.
  • Civic awareness and ethical decision-making: Students will be aware of the general ethical considerations and responsibilities of managing large scale data repositories.
Prerequisites: CIS 3120 Programming for Analytics AND (ZICK or ZKTP student group)
Textbooks / Materials / Resources
  • There are no required textbooks for this class. However, students should reserve approximately $100 to cover the costs of cloud services and certification tests.
  • Optional: Data Engineering with Google Cloud Platform by Adi Wijaya. Packt Publishing, March 2022. ISBN: 9781800561328 (available online in Newman Library)
  • Optional: Data Science on the Google Cloud Platform, 2nd Edition by Valliappa Lakshmanan. O’Reilly Media, Inc., March 2022. ISBN: 9781098118952 (available online in Newman Library)
  • Additional course materials will be provided on Brightspace.
Course Content: There will be seven homework assignments, including:
    Google Cloud Big Data and Machine Learning Fundamentals
    DataCamp: Big Data with PySpark Skill Track courses
There will be a Midterm Exam and Final Exam (not cumulative).
An individual machine learning project will be due at the end of the semester.
Students are expected to spend a significant amount of time outside the classroom learning to use Google Cloud Platform and programming with PySpark.
Grading
  • Midterm Exam: 25%
  • Final Exam: 25%
  • Homework assignments: 20%
  • Semester Project: 30%
    This is a tentative grading schedule and is subject to change.
    Credit will not be given for assignments submitted after homework solutions are discussed.
    There will be no extra credit assignments.
    Semester Project: Students will complete an individual machine learning project during the semester. The project has six milestones due throughout the semester:

    Milestone   Points   Work
    1           10       Project Proposal
    2           15       Data Acquisition
    3           20       Exploratory Data Analysis and Data Cleaning
    4           30       Feature Engineering and Modeling
    5           15       Model Evaluation and Data Visualization
    6           10       Final Report and Share Project on GitHub

    Topics / Schedule (Tentative)

    The following table gives a tentative lecture schedule for the course.

    Week 1: Course Introduction; Introduction to Big Data
      Google Cloud / DataCamp: DataCamp – Understanding Data Engineering (Due 9/6)
    Week 2: Python Review; Introduction to Machine Learning
      Google Cloud / DataCamp: Google Cloud Platform Exercises (Due 9/20)
      Project Milestone: Milestone 1 Proposal (Due 9/13)
    Week 3: Introduction to Cloud Computing
      Google Cloud / DataCamp: Continued
    Week 4: Cloud Computing – Compute Services; GCP Compute Engine and working with the command line
      Google Cloud / DataCamp: DataCamp – Introduction to PySpark (Due 9/11)
    Week 5: Cloud Computing – Cloud Storage Services
      Google Cloud / DataCamp: Continued
      Project Milestone: Milestone 2 Data Acquisition (Due 10/4)
    Week 6: Hadoop – Architecture and HDFS
      Google Cloud / DataCamp: DataCamp – Big Data Fundamentals with PySpark (Due 10/4)
    Week 7: Hadoop – MapReduce and YARN; GCP Dataproc (Dataproc Lab)
      Google Cloud / DataCamp: Continued
    Week 8: Catch-up class; Review for Midterm Exam
      Google Cloud / DataCamp: DataCamp – Cleaning Data with PySpark (Due 10/18)
    Week 9: Midterm Exam (10/21, tentative); Spark Architecture and PySpark
      Google Cloud / DataCamp: Continued
      Project Milestone: Milestone 3 EDA / Data Cleaning (Due 10/25)
    Week 10: Spark: RDDs, DataFrames, Datasets, Spark SQL
      Google Cloud / DataCamp: DataCamp – Feature Engineering with PySpark (Due 11/1)
    Week 11: Spark: Feature Engineering with PySpark
      Google Cloud / DataCamp: Continued
    Week 12: Spark: Machine Learning with MLlib
      Google Cloud / DataCamp: DataCamp – Machine Learning with PySpark (Due 11/15)
      Project Milestone: Milestone 4 FE and Modeling (Due 11/8)
    Week 13: Spark: Pipelines and Cross Validation
      Google Cloud / DataCamp: (Optional) DataCamp – Building Rec. Engines with PySpark
      Project Milestone: Milestone 5 Data Visualization (Due 11/29)
    Week 14: Spark: Streaming
    Week 15: Scripting, Automation and MLOps; Final Exam Review
      Project Milestone: Milestone 6 Final Report (Due 12/13)
    Week 16: TBD
      The Final Exam will be held during the 2-hour final exam period.

    Please note that this schedule is subject to change. Students are expected to come to class prepared and ready to participate.

    Google Cloud Platform

    This course is structured around Google Cloud Platform (GCP), which will be used extensively for demonstrations, homework, and projects.
    Most of the topics can be adequately demonstrated using the “Free Tier” of services. Some services may require payment if they are left running or consuming storage space for a prolonged period of time.

    Students are responsible for monitoring their services and billing statements, and for setting up budgets and alerts to prevent unexpected charges.

    Final Letter Grades: Letter grades are calculated according to the Official Grading System of Baruch College. The instructor reserves the right to curve the scale when computing final grades, if deemed necessary.

    Grade   Grade Point   Equivalent Score
    A       4.0           93.0 – 100
    A-      3.7           90.0 – 92.9
    B+      3.3           87.1 – 89.9
    B       3.0           83.0 – 87.0
    B-      2.7           80.0 – 82.9
    C+      2.3           77.1 – 79.9
    C       2.0           73.0 – 77.0
    C-      1.7           70.0 – 72.9
    D+      1.3           67.1 – 69.9
    D       1.0           60.0 – 67.0
    F       0.0           below 60
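    As a worked illustration of the arithmetic, the Python sketch below combines the tentative weights from the Grading section (25% midterm, 25% final, 20% homework, 30% project) and maps the result onto the cutoffs in the table above. The component scores are invented and the names weighted_score and letter_grade are hypothetical. Treating each range's lower bound as a cutoff is an assumption, and the sketch ignores any curve the instructor may apply.

```python
# Hypothetical sketch: weights copied from the syllabus Grading section
# (tentative and subject to change); the example scores are made up.
WEIGHTS = {
    "midterm_exam": 25,
    "final_exam": 25,
    "homework": 20,
    "semester_project": 30,
}

# Lower cutoff of each letter grade from the table, highest first.
# Assumption: a score gets the highest grade whose lower cutoff it reaches,
# which also places values between listed ranges (e.g. 82.95 -> B-).
CUTOFFS = [
    (93.0, "A"), (90.0, "A-"), (87.1, "B+"), (83.0, "B"), (80.0, "B-"),
    (77.1, "C+"), (73.0, "C"), (70.0, "C-"), (67.1, "D+"), (60.0, "D"),
]


def weighted_score(scores: dict) -> float:
    """Combine 0-100 component scores using the percentage weights."""
    total_weight = sum(WEIGHTS.values())  # 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


def letter_grade(score: float) -> str:
    """Map a 0-100 score to a letter grade before any curve is applied."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    example = {"midterm_exam": 80, "final_exam": 90,
               "homework": 100, "semester_project": 85}
    score = weighted_score(example)
    print(score, letter_grade(score))  # 88.0 B+
```

    With these made-up scores, the weighted total is (25·80 + 25·90 + 20·100 + 30·85) / 100 = 88.0, which falls in the B+ range.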
    Grade Distribution: The Paul H. Chook Department of Information Systems and Statistics expects to see a reasonable distribution of grades in each class. For undergraduate courses this distribution is:

    A and A-: 30% or less
    B+, B & B-: 30% or less
    C+, C & C-: 30% or less
    D & F: any students who have earned these grades

    Due to these guidelines, the professor reserves the right to curve final letter grades up or down.

    Academic Integrity Statement: I fully support Baruch College’s policy on Academic Honesty, which states, in part:

    “Academic dishonesty is unacceptable and will not be tolerated. Cheating, forgery, plagiarism and collusion in dishonest acts undermine the college’s educational mission and the students’ personal and intellectual growth. Baruch students are expected to bear individual responsibility for their work, to learn the rules and definitions that underlie the practice of academic integrity, and to uphold its ideals. Ignorance of the rules is not an acceptable excuse for disobeying them. Any student who attempts to compromise or devalue the academic process will be sanctioned.”

    Academic sanctions in this class will range from an F on the assignment to an F in this course. A report of suspected academic dishonesty will be sent to the Office of the Dean of Students. Additional information and definitions can be found at http://www.baruch.cuny.edu/academic/academic_honesty.html

    The use of AI (ChatGPT and similar) for coursework and assignments is strictly prohibited. This includes, but is not limited to, the use of AI-generated text, speech, programming code, diagrams or images, as well as the use of AI tools or software to complete any portion of a project, assignment or exam. Any use of AI tools to complete your work or a portion of your work will result in a grade of 0.
    Statement on Lecture Recording: Students who participate in this class with their camera on or use a profile image are agreeing to have their video or image recorded solely for the purpose of creating a record for students enrolled in the class to refer to, including those enrolled students who are unable to attend live. If you are unwilling to consent to have your profile or video image recorded, be sure to keep your camera off and do not use a profile image. Likewise, students who un-mute during class and participate orally are agreeing to have their voices recorded. If you are not willing to consent to have your voice recorded during class, you will need to keep your mute button activated and communicate exclusively using the “chat” feature, which allows students to type questions and comments live.
    Baruch College Counseling Center: At Baruch, we acknowledge that as a student, you are balancing many demands. During the semester, if you start to experience personal difficulties or stressors that are interfering with your academic performance or day-to-day functioning, please consider seeking free and confidential support at the Baruch College Counseling Center. For more information or to make an appointment, please visit their website at studentaffairs.baruch.cuny.edu/counseling/ or call 646-312-2155.
    If it is outside business hours (Monday–Friday, 9 am–5 pm) and you need immediate assistance, please call 1-888-NYC-WELL (888-692-9355). If you are concerned about one of your classmates, please share that concern by filling out a Campus Intervention Team form at studentaffairs.baruch.cuny.edu/campus-intervention-team.

    Students with Disabilities: Students with disabilities may receive assistance and accommodation of various sorts to enable them to participate fully in courses at Baruch. To establish the accommodations appropriate for each student, please alert me to your needs and contact the Office of Services for Students with Disabilities, part of the Division of Student Development and Counseling. For more information, contact the Director of this office in NVC 2-271 or at (646) 312-4590.

    Additional Notes
    • No makeups will be given for missed quizzes or exams.
    • The instructor retains all exam papers.
    • The final exam must be taken by all students in the time slot posted in the college bulletin.
      Please make your business and travel plans to accommodate this schedule.
    • If you miss class, it is your responsibility to find out about any announcements or assignments you may have missed.
    • Cell phones etc. should be silenced during class and turned off during exams.
    • In general, the time to let me know about any problems or issues concerning missing class, long term illnesses, job related problems, academic probation, etc. is before you have missed a week or two of classes.
    • All homework assignments are to be done individually. Students who hand in similar work will each receive a 0 and face disciplinary action.
    • The instructor reserves the right to give unannounced quizzes if it appears students are not putting in the time to prepare for class.
    • Other helpful software tools to have include a decent word processor (e.g., MS Word) and a drawing tool. MS PowerPoint or Lucidchart can be used for the latter.
    • Make backups of all of your work! This includes any assignment and project materials you produce. I reserve the right to ask you to resubmit any assignment at any time.
    Important Dates: Baruch Academic Calendar for Fall 2024

    - August 28      Wednesday     First day of the Fall Semester
    - September 2    Monday        No Class
    - October 2      Wednesday     No Class
    - October 14     Monday        No Class
    - October 15     Tuesday       Make-up Class (On-line)
    - October 16     Wednesday     Normal class session (on-line)
    - November 27    Wednesday     No class (Classes follow a Friday schedule)
    - December 11    Wednesday     Last class session for CIS 4130
    - December 15-21               Final Exam Period
    

    BBA Program Learning Goals

    Goals / Significant Part of the Course / Moderate Part of Course / Minimal Part of Course / Not Part of Course
    Analytical Skills X
    Technological Skills X
    Communication Skills: Oral X
    Communication Skills: Written X
    Civic Awareness and Ethical Decision-Making X
    Global Awareness X

    Course mapping with learning goals

    Course Learning Goal: Explain the challenges of big data
      BBA learning goals: Analytical skills; Technological skills; Written Communication Skills; Civic awareness and ethical decision-making
      Assignments: Quizzes, Exams, Assignments
    Course Learning Goal: Articulate key big data concepts
      BBA learning goals: Analytical skills; Technological skills; Written Communication Skills
      Assignments: Quizzes, Exams, Assignments
    Course Learning Goal: Explain different processing technologies such as MapReduce and Spark
      BBA learning goals: Analytical skills; Technological skills; Written Communication Skills
      Assignments: Quizzes, Exams, Assignments
    Course Learning Goal: Leverage cloud computing services to analyze large data sets
      BBA learning goals: Analytical skills; Technological skills
      Assignments: Quizzes, Exams, Assignments
