Zicklin School of Business – Baruch College
City University of New York
CIS 4130 Big Data Technologies
Fall 2024 – Section CMWA – Hybrid/Synchronous
Monday/Wednesday 10:45 – 12:00 pm
Syllabus
Professor
Dr. Richard Holowczak
Phone: 646-312-3371
Office Hours: Monday and Wednesday, 9:30 am – 10:30 am, or by appointment
E-Mail: [email protected] (preferred)
Please begin the Subject line of any e-mail to me with "CIS 4130", followed by the specific subject of your e-mail.
Instructional Modality
Section CMWA will be hybrid for Fall 2024:
- Monday: in person in Room NVC 12-150
- Wednesday: online; lectures will be synchronous
Objectives
This course will give students an overview of the big data technologies used to efficiently store, extract, and process very large datasets. Students will learn key data analysis and management techniques, including the critical concepts that power modern big data technologies: Distributed File Systems (storage) and MapReduce/Spark (processing). In particular, the course will leverage cloud-based services to manage storage and process data efficiently. Finally, the course will show how big data technologies can be used to analyze large volumes of data effectively in practical applications.
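Course Learning Goals
Upon successful completion of this course, students will be able to:
- Explain the challenges of big data
- Articulate key big data concepts
- Explain different processing technologies such as MapReduce and Spark
- Leverage cloud computing services to analyze large data sets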
BBA Learning Goals
Prerequisites
CIS 3120 Programming for Analytics AND (ZICK or ZKTP student group)
Textbooks / Materials / Resources
Course Content
There will be 7 homework assignments, including:
- Google Cloud Big Data and Machine Learning Fundamentals
- DataCamp: Big Data with PySpark Skill Track courses
There will be a Midterm Exam and a Final Exam (not cumulative). An individual machine learning project will be due at the end of the semester. Students are expected to spend a significant amount of time outside the classroom learning to use Google Cloud Platform and to program with PySpark.
Grading
This grading schedule is tentative and subject to change. Credit will not be given for assignments submitted after the homework solutions have been discussed. There will be no extra-credit assignments.
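Semester Project
Students will complete an individual machine learning project during the semester. The project will have six milestones due throughout the semester:
- Milestone 1: Proposal (Due 9/13)
- Milestone 2: Data Acquisition (Due 10/4)
- Milestone 3: EDA / Data Cleaning (Due 10/25)
- Milestone 4: Feature Engineering and Modeling (Due 11/8)
- Milestone 5: Data Visualization (Due 11/29)
- Milestone 6: Final Report (Due 12/13)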
Topics / Schedule (Tentative)
The following table gives a tentative lecture schedule for the course.
Week | Topics | Google Cloud / DataCamp | Project Milestones |
---|---|---|---|
1 | Course Introduction; Introduction to Big Data | DataCamp: Understanding Data Engineering (Due 9/6) | |
2 | Python Review; Introduction to Machine Learning | Google Cloud Platform Exercises (Due 9/20) | Milestone 1: Proposal (Due 9/13) |
3 | Introduction to Cloud Computing | Continued | |
4 | Cloud Computing – Compute Services; GCP Compute Engine and working with the command line | DataCamp: Introduction to PySpark (Due 9/11) | |
5 | Cloud Computing – Cloud Storage Services | Continued | Milestone 2: Data Acquisition (Due 10/4) |
6 | Hadoop – Architecture and HDFS | DataCamp: Big Data Fundamentals with PySpark (Due 10/4) | |
7 | Hadoop – MapReduce and YARN; GCP Dataproc (Dataproc Lab) | Continued | |
8 | Catch-up Class; Review for Midterm Exam | DataCamp: Cleaning Data with PySpark (Due 10/18) | |
9 | Midterm Exam (10/21, tentative); Spark Architecture and PySpark | Continued | Milestone 3: EDA / Data Cleaning (Due 10/25) |
10 | Spark: RDDs, DataFrames, Datasets, Spark SQL | DataCamp: Feature Engineering with PySpark (Due 11/1) | |
11 | Spark: Feature Engineering with PySpark | Continued | |
12 | Spark: Machine Learning with MLlib | DataCamp: Machine Learning with PySpark (Due 11/15) | Milestone 4: Feature Engineering and Modeling (Due 11/8) |
13 | Spark: Pipelines and Cross-Validation | (Optional) DataCamp: Building Recommendation Engines with PySpark | Milestone 5: Data Visualization (Due 11/29) |
14 | Spark: Streaming | | |
15 | Scripting, Automation and MLOps; Final Exam Review | | Milestone 6: Final Report (Due 12/13) |
16 | TBD: Final Exam, held during the 2-hour final exam period | | |
Please note that this schedule is subject to change. Students are expected to come to class prepared and ready to participate.
Google Cloud Platform
This course is structured around Google Cloud Platform (GCP), which will be used extensively for demonstrations, homework, and projects. Students are responsible for monitoring their services and billing statements, and for setting up budgets and alerts to prevent excessive charges.
Final Letter Grades
Letter grades are calculated according to the Official Grading System of Baruch College. The instructor reserves the right to curve the scale when computing final grades, if deemed necessary.
Grade Distribution
The Paul H. Chook Department of Information Systems and Statistics expects to see a reasonable distribution of grades in each class. For undergraduate courses, this distribution is:
Because of these guidelines, the professor reserves the right to curve final letter grades up or down.
Academic Integrity Statement
I fully support Baruch College’s policy on Academic Honesty, which states, in part: “Academic dishonesty is unacceptable and will not be tolerated. Cheating, forgery, plagiarism and collusion in dishonest acts undermine the college’s educational mission and the students’ personal and intellectual growth. Baruch students are expected to bear individual responsibility for their work, to learn the rules and definitions that underlie the practice of academic integrity, and to uphold its ideals. Ignorance of the rules is not an acceptable excuse for disobeying them. Any student who attempts to compromise or devalue the academic process will be sanctioned.”
Academic sanctions in this class will range from an F on the assignment to an F in this course. A report of suspected academic dishonesty will be sent to the Office of the Dean of Students. Additional information and definitions can be found at http://www.baruch.cuny.edu/academic/academic_honesty.html
The use of AI (ChatGPT and similar) for coursework and assignments is strictly prohibited. This includes, but is not limited to, the use of AI-generated text, speech, programming code, diagrams or images, as well as the use of AI tools or software to complete any portion of a project, assignment or exam. Any use of AI tools to complete your work or a portion of your work will result in a grade of 0.
Statement on Lecture Recording
Students who participate in this class with their camera on or use a profile image are agreeing to have their video or image recorded solely for the purpose of creating a record for students enrolled in the class to refer to, including enrolled students who are unable to attend live. If you are unwilling to consent to have your profile or video image recorded, be sure to keep your camera off and do not use a profile image. Likewise, students who unmute during class and participate orally are agreeing to have their voices recorded. If you are not willing to consent to have your voice recorded during class, you will need to keep your mute button activated and communicate exclusively through the “chat” feature, which allows students to type questions and comments live.
Baruch College Counseling Center
At Baruch, we acknowledge that as a student, you are balancing many demands. During the semester, if you start to experience personal difficulties or stressors that are interfering with your academic performance or day-to-day functioning, please consider seeking free and confidential support at the Baruch College Counseling Center. For more information or to make an appointment, please visit their website at studentaffairs.baruch.cuny.edu/counseling/ or call 646-312-2155. If it is outside of business hours (Monday–Friday, 9 am – 5 pm) and you need immediate assistance, please call 1-888-NYC-WELL (888-692-9355). If you are concerned about one of your classmates, please share that concern by filling out a Campus Intervention Team form at studentaffairs.baruch.cuny.edu/campus-intervention-team.
Students with Disabilities
Students with disabilities may receive assistance and accommodations of various sorts to enable them to participate fully in courses at Baruch. To establish the accommodations appropriate for each student, please alert me to your needs and contact the Office of Services for Students with Disabilities, part of the Division of Student Development and Counseling. For more information, contact the Director of this office in NVC 2-271 or at (646) 312-4590.
Additional Notes
Important Dates
Baruch Academic Calendar for Fall 2024
- August 28 (Wednesday): First day of the Fall semester
- September 2 (Monday): No class
- October 2 (Wednesday): No class
- October 14 (Monday): No class
- October 15 (Tuesday): Make-up class (online)
- October 16 (Wednesday): Normal class session (online)
- November 27 (Wednesday): No class (classes follow a Friday schedule)
- December 11 (Wednesday): Last class session for CIS 4130
- December 15–21: Final Exam Period
BBA Program Learning Goals
Goals | Significant Part of the Course | Moderate Part of Course | Minimal Part of Course | Not Part of Course |
---|---|---|---|---|
Analytical Skills | X | | | |
Technological Skills | X | | | |
Communication Skills: Oral | X | | | |
Communication Skills: Written | X | | | |
Civic Awareness and Ethical Decision-Making | X | | | |
Global Awareness | X | | | |
Course mapping with learning goals
Course Learning Goals | BBA Learning Goals | Assignments |
---|---|---|
Explain the challenges of big data | Analytical skills; Technological skills; Written communication skills; Civic awareness and ethical decision-making | Quizzes, Exams, Assignments |
Articulate key big data concepts | Analytical skills; Technological skills; Written communication skills | Quizzes, Exams, Assignments |
Explain different processing technologies such as MapReduce and Spark | Analytical skills; Technological skills; Written communication skills | Quizzes, Exams, Assignments |
Leverage cloud computing services to analyze large data sets | Analytical skills; Technological skills | Quizzes, Exams, Assignments |