Machine Learning Zoomcamp Update: Wednesday, 22 September 2021
Date: 22 September 2021
Today, I completed Sessions 3.4, 3.5, 3.6, 3.7 and 3.8.
Session 3.4 - EDA
This session performed exploratory data analysis (EDA) on the churn data.
Session 3.5 - Feature importance: Churn rate and risk ratio
In this session, we used churn rate and risk ratio to measure feature importance.
Key takeaways:
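- Churn rate for a group is the fraction of people with a given value of a feature who churned, i.e., the average of the binary churn target within that group.
- Risk ratio is the churn rate of that group divided by the global churn rate. A value above 1 means the group is more likely to churn than average; a value below 1 means it is less likely.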
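A minimal pandas sketch of both measures, assuming a hypothetical toy DataFrame with a categorical column contract and a binary churn column (1 = churned). The column names and values are made up for illustration and are not the course dataset.

```python
import pandas as pd

# Hypothetical toy data: "contract" is a categorical feature,
# "churn" is the binary target (1 = churned, 0 = stayed)
df = pd.DataFrame({
    "contract": ["month-to-month"] * 4 + ["one_year"] * 2 + ["two_year"] * 2,
    "churn": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Global churn rate: mean of the binary target over all customers
global_churn = df["churn"].mean()

# Churn rate per category: mean of the target within each group
stats = df.groupby("contract")["churn"].agg(["mean", "count"])
stats = stats.rename(columns={"mean": "churn_rate"})

# Risk ratio: group churn rate relative to the global churn rate
stats["risk"] = stats["churn_rate"] / global_churn

print(f"global churn rate: {global_churn}")
print(stats)
```

In this toy data the global churn rate is 0.5, so month-to-month (churn rate 0.75) gets a risk ratio of 1.5, while two_year (churn rate 0.0) gets 0.0.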
Session 3.6 - Feature importance: Mutual information
This session uses mutual information to find feature importance.
Key takeaways:
- Mutual information is a concept from information theory that measures the dependence between two variables, i.e., how much we learn about one variable by knowing the other.
- We can use mutual_info_score from sklearn.metrics to compute mutual information.
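A small sketch of scoring categorical features against the target with mutual_info_score. The DataFrame, its column names (contract, gender, churn) and the helper mutual_info_churn are hypothetical, chosen so that one feature is related to churn and the other is not.

```python
import pandas as pd
from sklearn.metrics import mutual_info_score

# Hypothetical toy data: two categorical features and a binary churn target
df = pd.DataFrame({
    "contract": ["month-to-month"] * 4 + ["one_year"] * 2 + ["two_year"] * 2,
    "gender": ["F", "F", "M", "F", "M", "F", "M", "M"],
    "churn": [1, 1, 1, 0, 1, 0, 0, 0],
})

def mutual_info_churn(series):
    # Hypothetical helper: mutual information between one column and the churn target
    return mutual_info_score(series, df["churn"])

# DataFrame.apply calls the function once per column (axis=0 by default)
mi = df[["contract", "gender"]].apply(mutual_info_churn)
print(mi.sort_values(ascending=False))
```

Here each gender has a churn rate of 0.5, so gender is independent of churn and its score is essentially zero, while contract scores higher.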
Session 3.7 - Feature importance: Correlation
This session uses the correlation between each numerical feature and the churn target to measure feature importance.
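A minimal sketch using pandas DataFrame.corrwith, which computes the Pearson correlation of each column with a given Series by default. The columns tenure and monthly_charges and their values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: two numerical features and a binary churn target
df = pd.DataFrame({
    "tenure": [1, 2, 3, 20, 5, 30, 40, 50],  # months as a customer
    "monthly_charges": [90, 85, 80, 40, 70, 30, 25, 20],
    "churn": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Pearson correlation of each numerical column with the churn target
corr = df[["tenure", "monthly_charges"]].corrwith(df["churn"])
print(corr)

# Rank by strength of the relationship, ignoring the sign
print(corr.abs().sort_values(ascending=False))
```

The sign gives the direction: in this toy data, longer tenure goes with less churn (negative correlation) and higher charges go with more churn (positive correlation). Taking the absolute value ranks features by strength regardless of direction.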
Session 3.8 - One-hot encoding
In this session, we performed one-hot encoding using DictVectorizer from scikit-learn.
Estimated Time Taken: 1 hour 10 minutes