Machine Learning Zoomcamp Update: Tuesday, 14 September 2021
Date: 14 September 2021
Today, I completed Sessions 2.1, 2.2, 2.3, 2.4 and 2.5.
Session 2.1 - Car price prediction project
In this session, Alexey Grigorev introduced this week's lessons and the project we will be working on.
Key takeaways:
- We will learn regression by working on a project that uses the Car Features and MSRP dataset from Kaggle.
- The project plan is as follows:
- Prepare data and do EDA (Exploratory Data Analysis)
- Use linear regression for predicting price
- Understand the internals of linear regression
- Evaluate the model using RMSE
- Feature engineering
- Regularization
- Using the model
Session 2.2 - Data preparation
This session explains how to perform data cleaning and prepare data for training.
Key takeaways:
- Data should be consistent, for example in how column names and string values are written.
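To make the consistency point concrete for myself, here is a minimal sketch, assuming pandas and a made-up two-row DataFrame (the column names and values are hypothetical, not taken from the real dataset). It lowercases names and string values and replaces spaces with underscores, which is one possible convention and not necessarily the only one.

```python
import pandas as pd

# Hypothetical toy data; names and values are made up for illustration.
df = pd.DataFrame({
    "Make": ["BMW", "Audi"],
    "Transmission Type": ["MANUAL", "Automatic"],
    "MSRP": [46135, 40650],
})

# Normalize column names: lowercase, spaces replaced by underscores.
df.columns = df.columns.str.lower().str.replace(" ", "_", regex=False)

# Apply the same normalization to the values of every string column.
string_columns = [c for c in df.columns if pd.api.types.is_string_dtype(df[c])]
for col in string_columns:
    df[col] = df[col].str.lower().str.replace(" ", "_", regex=False)
```

After this, every column name and every string value follows the same naming pattern, so later filtering and grouping do not depend on remembering the original capitalization.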
Session 2.3 - Exploratory data analysis
This session explains exploratory data analysis (EDA).
Key takeaways:
- Before training, we should look for patterns in the data.
- If the distribution of the target has a long tail, applying np.log1p compresses it and makes the distribution less skewed.
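A small sketch of the log transformation, assuming NumPy and a handful of made-up prices (not values from the dataset). np.log1p computes log(1 + x), so a value of 0 does not cause a problem.

```python
import numpy as np

# Made-up prices with one very large value that creates a long tail.
prices = np.array([0, 1_000, 30_000, 2_000_000], dtype=float)

# log1p(x) = log(1 + x); the large value is pulled much closer to the rest.
log_prices = np.log1p(prices)

print(log_prices)
```

On the original scale the largest value is thousands of times bigger than the others, while on the log scale all values fall within a narrow range.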
Session 2.4 - Setting up the validation framework
This session explains train, validation and test split.
Key takeaways:
- The data should be shuffled before creating the splits.
- We should set the random seed to make the results reproducible.
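The shuffling and seeding can be sketched as follows, assuming NumPy, a hypothetical dataset of 10 rows, and a 60/20/20 split (the sizes and the seed value are my own choices for illustration).

```python
import numpy as np

n = 10  # hypothetical number of rows

# Assumed split sizes: 20% validation, 20% test, the rest for training.
n_val = int(n * 0.2)
n_test = int(n * 0.2)
n_train = n - n_val - n_test

# Shuffle the row indices with a fixed seed so the split is reproducible.
idx = np.arange(n)
np.random.seed(2)
np.random.shuffle(idx)

train_idx = idx[:n_train]
val_idx = idx[n_train:n_train + n_val]
test_idx = idx[n_train + n_val:]
```

With a DataFrame, these index arrays could then be passed to df.iloc to select the rows of each split.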
Session 2.5 - Linear regression
This session gave an introduction to linear regression.
Key takeaways:
- If the target was transformed with np.log1p, we can apply np.expm1 to the model output to convert it back to actual price predictions.
- Prediction in linear regression: $g(x_i) = w_0 + \sum_{j=1}^{n} w_j x_{ij}$, where $n$ is the number of features
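The prediction formula and the conversion back from the log scale can be sketched like this, assuming NumPy. The feature values and weights below are made up for illustration, not learned from data.

```python
import numpy as np

def g(xi, w0, w):
    """Linear regression prediction: w0 + sum over j of w[j] * xi[j]."""
    return w0 + np.dot(xi, w)

# Hypothetical feature vector and weights for a single car (n = 3 features).
xi = np.array([2.0, 3.0, 1.0])
w0 = 7.0
w = np.array([0.5, 0.1, 0.2])

# The model is assumed to predict log1p(price), so this is on the log scale.
log_pred = g(xi, w0, w)

# Undo the log1p transformation to get a prediction in the original units.
price_pred = np.expm1(log_pred)
```

The dot product is just a compact way of writing the sum in the formula, and np.expm1 is the exact inverse of np.log1p.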
Estimated Time Taken: 1 hour 10 minutes