Labs
Installing and configuring all the software needed for this course on your machine can be tedious. We have therefore prepared a virtual machine (VM) with most of the required tools preinstalled, which you can download and use.
If you use VMware Workstation Player (recommended; free for personal use, available here), you can download the VM here (size: ~8.7 GB).
If you use Oracle VirtualBox (latest version 7.0.4; free, available here), you can download the VM here (size: ~9.3 GB).
Both the username and the password are csdeptucy.
PLEASE INSTALL THE VM BEFORE THE FIRST LAB.
If you want to resize your VM, please follow these instructions.
We kindly ask you to bring your own laptop (with the VM installed on it) to the lab.
Week | Description | Useful Links | Material | Exercises to deliver |
1 | Introduction to Apache Hadoop | | LAB01.pdf, Source Code, Dataset | |
2 | Programming with Apache Hadoop | | LAB02.pdf, WordCount.java, SalesJan2009.csv | 🔴 |
3 | Introduction to Python | | LAB03.pdf | |
4 | Data Manipulation | | LAB04.pdf, Lab04.ipynb, iris_data.csv, iris_data2.csv | |
5 | Data Visualization | | LAB05.pdf, Lab05.ipynb, iris.csv, haberman.csv | |
6 | Data Preparation I: Cleaning, Encoding, Scaling, Resampling Data | | LAB06, Lab06.ipynb, NFL Play by Play 2009-2016 (v3).zip, house_prices_train.csv, shampoo.csv | |
7 | Data Preparation II: Dimensionality Reduction: Feature Selection and Extraction | | LAB07, Lab07.ipynb | |
8 | Machine Learning: Regression | | LAB08, Lab8_LinearRegression.ipynb, Lab8_PolynomialRegression.ipynb, Advertising.csv, Boston.csv | |
9 | Machine Learning: Regression (cont'd) | | LAB08, Lab08.ipynb, Advertising.csv, Boston.csv | |
10 | Machine Learning: Classification and Clustering | | LAB09, Lab09-classification.ipynb, Lab09-clustering.ipynb, telco.csv, wine_data.csv, fleet_data.csv, WineAnalysis.ipynb | 🔴 |
11 | Introduction to Apache Spark | | LAB10, kmeans-rdd.py, kmeans-dataframe.py | |
12 | Programming with Apache Spark | | LAB11, kmeans-fleet.py | 🔴 |
13 | No lab (Project Presentation Week) | | | |