This course is designed to cover machine learning concepts from A to Z. Anyone can take it; no prior understanding of machine learning is required.
Bonus introductions include Natural Language Processing and Deep Learning.
The following topics are covered:
Chapter – Introduction to Machine Learning
– What is Machine Learning?
– Types of Machine Learning
Chapter – Setup Environment
– Installing Anaconda, how to use Spyder and Jupyter Notebook
– Installing Libraries
Chapter – Creating Environment on cloud (AWS)
– Creating EC2, connecting to EC2
– Installing libraries, transferring files to EC2 instance, executing python scripts
Chapter – Data Preprocessing
– Null Values
– Correlated Feature check
– Data Molding
– Imputing
– Scaling
– Label Encoder
– One-Hot Encoder
Chapter – Supervised Learning: Regression
– Simple Linear Regression
– Minimizing Cost Function – Ordinary Least Squares (OLS), Gradient Descent
– Assumptions of Linear Regression, Dummy Variable
– Multiple Linear Regression
– Regression Model Performance – R-Square
– Polynomial Linear Regression
Chapter – Supervised Learning: Classification
– Logistic Regression
– K-Nearest Neighbours
– Naive Bayes
– Saving and Loading ML Models
– Classification Model Performance – Confusion Matrix
Chapter: Unsupervised Learning: Clustering
– Partitioning Algorithm: K-Means Algorithm, Random Initialization Trap, Elbow Method
– Hierarchical Clustering: Agglomerative, Dendrogram
– Density-Based Clustering: DBSCAN
– Measuring Unsupervised Clustering Performance – Silhouette Index
Chapter: Unsupervised Learning: Association Rule
– Apriori Algorithm
– Association Rule Mining
Chapter: Deploy Machine Learning Model using Flask
– Understanding the flow
– Server-side and client-side coding, setting up Flask on AWS, sending requests and getting responses back from the Flask server
Chapter: Non-Linear Supervised Algorithm: Decision Tree and Support Vector Machines
– Decision Tree Regression
– Decision Tree Classification
– Support Vector Machines(SVM) – Classification
– Kernel SVM, Soft Margin, Kernel Trick
Chapter – Natural Language Processing
The following text preprocessing techniques are covered with Python code:
– Tokenization, Stop Words Removal, N-Grams, Stemming, Word Sense Disambiguation
– Count Vectorizer, TF-IDF Vectorizer, Hashing Vectorizer
– Case Study – Spam Filter
Chapter – Deep Learning
– Artificial Neural Networks, Hidden Layer, Activation function
– Forward and Backward Propagation
– Implementing logic gates in Python using a perceptron
Chapter: Regularization, Lasso Regression, Ridge Regression
– Overfitting, Underfitting
– Bias, Variance
– Regularization
– L1 & L2 Loss Function
– Lasso and Ridge Regression
Chapter: Dimensionality Reduction
– Feature Selection – Forward and Backward
– Feature Extraction – PCA, LDA
Chapter: Ensemble Methods: Bagging and Boosting
– Bagging – Random Forest (Regression and Classification)
– Boosting – Gradient Boosting (Regression and Classification)
Optional: Setup Environment
1. What is Machine Learning?
2. Types of Machine Learning
•Supervised - labeled data is used to teach the machine to recognize characteristics and apply them to future data. E.g.: classifying pictures of cats and dogs.
•Unsupervised - we simply provide unlabeled data and let the machine discover the characteristics and group the data itself. E.g.: clustering (news articles)
•Reinforcement Learning - the model interacts with the environment by producing actions and then analyzes errors or rewards. E.g.: a chess game
3. Supervised Learning
Full course material can be downloaded from GitHub: https://github.com/bansalrishi/MachineLearningWithPython_UD
•Regression: This is a type of problem where we need to predict a continuous response value (e.g., a number that can vary from -infinity to +infinity)
E.g: House Price, Value of stock
•Classification: This is a type of problem where we predict the categorical response value where the data can be separated into specific “classes” (ex: we predict one of the values in a set of values).
E.g: Mail spam or not, Diabetes or not, etc
4. Quiz 1
Test your understanding about Regression and Classification Problems
Optional: Setup Environment on cloud (AWS)
Data Preprocessing
Supervised Learning: Regression
14. What is Data Preprocessing?
•Preprocessing refers to the transformations applied to data before feeding it to a machine learning algorithm
•The quality of the data is important for training the model
•Sources – government databases, professional or company data sources (e.g., Twitter), your own company, etc.
•Data will rarely be in the format you need – use a Pandas DataFrame for reformatting
•Columns to remove – columns with no values, duplicates (correlated columns, e.g., house size in feet and in metres)
•Learning algorithms understand only numbers, so converting text and images to numbers is required
•Unscaled or unstandardized data might yield unacceptable predictions
15. Checking for Null Values: Concept + Python Code
•Check for Null values
•Remove or Impute
•df.isnull().values.any()
•df = df.dropna(how='any',axis=0)
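The two calls above can be combined into a runnable sketch (the toy DataFrame is illustrative, not from the course files):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value
df = pd.DataFrame({"size": [70.0, np.nan, 120.0], "price": [100, 150, 200]})

has_nulls = df.isnull().values.any()   # True: at least one null present
df = df.dropna(how="any", axis=0)      # drop every row that contains a null
print(has_nulls, len(df))              # True 2
```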
16. Correlated Feature Check: Concept + Python Code
•Sometimes two features that are meant to measure different characteristics are influenced by a common mechanism and move together.
•How to handle correlation:
•Remove one of the features
•Apply Principal Component Analysis (PCA)
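A minimal sketch of a correlation check with pandas (toy data; the column names and the 0.95 cutoff are illustrative):

```python
import pandas as pd

# Toy data: size_ft and size_m measure the same thing in different units
df = pd.DataFrame({
    "size_ft": [700, 850, 1200, 1500],
    "size_m":  [65.0, 79.0, 111.5, 139.4],
    "age":     [10, 3, 25, 7],
})

corr = df.corr().abs()
threshold = 0.95  # illustrative cutoff for "highly correlated"
pairs = [(a, b) for a in corr.columns for b in corr.columns
         if a < b and corr.loc[a, b] > threshold]
print(pairs)  # only the size_ft / size_m pair exceeds the threshold
df = df.drop(columns=["size_m"])  # keep one of the two correlated features
```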
17. Data Molding (Encoding): Concept + Python Code
•Adjusting Data Types - Inspect data types to see if there are any issues. Data should be numeric.
•If required create new columns
18. Data Splitting
19. Data Splitting: Python Code
20. Impute Missing Values: Concept + Python Code
Missing Data - Ways to Handle
•Drop rows
•Replace values (Impute)
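A minimal imputation sketch using scikit-learn's SimpleImputer (the toy array is illustrative):

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0], [np.nan, 4.0], [7.0, 6.0]])

# Replace each NaN with its column mean; 'median' and 'most_frequent' also work
imputer = SimpleImputer(strategy="mean")
X_imputed = imputer.fit_transform(X)
print(X_imputed[1, 0])  # (1 + 7) / 2 = 4.0
```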
21. Scaling
•Feature Scaling is a technique to standardize the independent features present in the data in a fixed range.
•It is performed during the data pre-processing to handle highly varying magnitudes or values or units.
•Disadvantage of not scaling:
•Without feature scaling, a machine learning algorithm tends to weigh larger values higher and treat smaller values as lower, regardless of the units of the values.
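A minimal scaling sketch using scikit-learn's StandardScaler (toy data; the result has zero mean and unit variance):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[100.0], [200.0], [300.0]])  # one feature with large magnitudes

scaler = StandardScaler()           # standardizes to zero mean, unit variance
X_scaled = scaler.fit_transform(X)
print(X_scaled.ravel())             # roughly [-1.22, 0.0, 1.22]
```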
22. Scaling: Python Code
23. Label Encoder: Concept + Code
Convert text values to numbers. These can be used in the following situations:
•There are only two values for a column in your data. The values will then become 0/1 - effectively a binary representation
•The values have a relationship with each other where comparisons are meaningful (e.g., low < medium < high)
24. One-Hot Encoder: Concept + Python Code
•Use when there is no meaningful comparison between values in the column
•Creates a new column for each unique value for the specified feature in the data set
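Both encoders can be sketched in a few lines (illustrative data; an explicit mapping is used for the ordered column so that low < medium < high is preserved, and pandas `get_dummies` stands in for a one-hot encoder):

```python
import pandas as pd

df = pd.DataFrame({"level": ["low", "high", "medium"],
                   "city":  ["Paris", "Tokyo", "Paris"]})

# Ordered column: map to integers that preserve low < medium < high
df["level_code"] = df["level"].map({"low": 0, "medium": 1, "high": 2})

# Unordered column: one new 0/1 column per unique value
df = pd.get_dummies(df, columns=["city"])
print(list(df["level_code"]))                            # [0, 2, 1]
print([c for c in df.columns if c.startswith("city_")])  # ['city_Paris', 'city_Tokyo']
```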
25. Data Preprocessing
Supervised Learning: Classification
26. Simple Linear Regression: Concept
27. Minimizing Cost Function
•Error = (y_pred – y_act)^2
•Two Methods:
1.Least Squares Criterion (OLS)
2.Gradient Descent
28. Ordinary Least Squares (OLS)
•A non-iterative method that fits a model such that the sum of squares of the differences between observed and predicted values is minimized
•Error = (y_pred – y_act)^2
•Line => y = bo + b1x
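A sketch of the OLS closed form for the line y = b0 + b1x (toy data generated exactly from y = 1 + 2x):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])   # generated from y = 1 + 2x

# Closed-form OLS for y = b0 + b1*x
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
print(b0, b1)  # 1.0 2.0
```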
29. Gradient Descent
•Cost Function: J(m, c) = Σ(y_pred – y_act)^2 / (no. of data points)
•Hypothesis: y_pred = c + mx
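The cost function above can be minimized with a plain gradient descent loop (same toy data as before; the learning rate is illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])   # generated from y = 1 + 2x

m, c = 0.0, 0.0
lr = 0.05                            # illustrative learning rate
for _ in range(5000):
    y_pred = c + m * x
    dm = 2 * ((y_pred - y) * x).mean()   # dJ/dm
    dc = 2 * (y_pred - y).mean()         # dJ/dc
    m -= lr * dm
    c -= lr * dc
print(round(m, 3), round(c, 3))  # 2.0 1.0
```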
30. Measuring Regression Model Performance: R^2 (R-Square)
It tells how well the regression equation explains the data.
•A value of R^2 = 1 means the regression predictions perfectly fit the data.
Question: Can R^2 be negative?
•Ans: Yes, when the Sum of Squared Errors (SSE) is greater than the Total Sum of Squares (SST).
•This happens when our model performs worse than the mean line, which is a very rare case.
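A sketch of computing R^2 from SSE and SST (toy values):

```python
import numpy as np

y_act  = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.1, 7.2, 8.9])

sse = ((y_act - y_pred) ** 2).sum()        # sum of squared errors
sst = ((y_act - y_act.mean()) ** 2).sum()  # total sum of squares
r2 = 1 - sse / sst                         # negative whenever SSE > SST
print(round(r2, 4))  # 0.995
```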
31. Simple Linear Regression: Python Code - 1
Code files and datasets can be found in regression.zip.
32. Simple Linear Regression: Python Code - 2
33. Assumptions of Linear Regression
1.Linearity
•linear regression is sensitive to outlier effects
•needs the relationship between the independent and dependent variables to be linear
•linearity assumption can best be tested with scatter plots
2. Homoscedasticity
•Meaning the residuals are equal across the regression line
•Check with a heteroscedasticity test - the Goldfeld-Quandt Test
3. Multivariate Normality
•This assumption can best be checked with a histogram or a Q-Q plot
•Normality can be checked with a goodness-of-fit test (Kolmogorov-Smirnov)
4. No Autocorrelation in the Data
•Autocorrelation occurs when the residuals are not independent of each other
•In simple terms, when the value of y(x+1) is not independent of the value of y(x)
•Check with the Durbin-Watson test
5. Lack of Multicollinearity
•Multicollinearity: the model cannot differentiate between the effects of D1 and D2 because they are totally related
•Fixed using a correlation check during data preprocessing
34. Multiple Linear Regression: Concept
Assumption:
There is a linear relationship between both the dependent and independent variables.
It also assumes no major correlation between the independent variables.
•Multiple regressions can be linear and nonlinear.
35. Dummy Variable
y = b0 + b1x1 + b2x2 + b3x3 + b4x4 + b5x5
Here x4 and x5 are dummy variables
x5 = 1 – x4, so including both introduces multicollinearity (the dummy variable trap) – always drop one dummy column
For a category with 2 values -> a single 1/0 column
For a category with more than 2 values -> one column per value, minus the one dropped
36. Multiple Linear Regression: Python - 1
37. Multiple Linear Regression: Python - 2
38. Multiple Linear Regression: Python - 3
39. Polynomial Linear Regression: Concept
If data is not linear, we need polynomial terms to fit it better
40. Polynomial Linear Regression: Python - 1
41. Polynomial Linear Regression: Python - 2
42. Polynomial Linear Regression: Python - 3
43. Polynomial Linear Regression: Python - 4
44. Linear Regressions Comparisons
45. Simple Linear Regression: Quiz
46. Boston Housing Price Prediction
47. Assignment: Predicting Housing Prices (Boston Data Solution): Optional
Unsupervised Learning: Clustering
48. Logistic Regression
Issue with Linear Regression
•If we have an outlier, linear regression goes horribly wrong
•Because of a single outlier, the whole linear regression prediction goes wrong
Logistic Regression
Logistic regression can be understood through the standard logistic function. The logistic function is a sigmoid function, which maps any real value to a value between zero and one.
If we plot the sigmoid function, the graph is an S-curve. When there is an outlier, the sigmoid function takes care of it.
Linear regression assumes that the data follows a linear function.
Logistic regression models the data using the sigmoid function.
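A minimal sketch of the sigmoid function and how it saturates at the extremes, which is why a single outlier has limited effect (illustrative values):

```python
import numpy as np

def sigmoid(z):
    # Maps any real value into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))                 # 0.5, the decision boundary
print(sigmoid(10), sigmoid(-10))  # extremes saturate near 1 and 0
```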
49. Confusion Matrix: Measuring Performance of a Classification Model
Describes the performance of a classification model
•Accuracy: the fraction of correct predictions among all predictions made by the model
•Precision: the fraction of correct positive predictions among all positive predictions made by the model
•Recall: the fraction of actual positives that the model predicted correctly
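These three metrics can be sketched with scikit-learn (toy labels; in this example TP = TN = 3 and FP = FN = 1, so all three come out to 0.75):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

y_act  = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

cm = confusion_matrix(y_act, y_pred)   # rows = actual, columns = predicted
print(cm)                              # [[3 1]
                                       #  [1 3]]
print(accuracy_score(y_act, y_pred))   # 0.75 = correct / all
print(precision_score(y_act, y_pred))  # 0.75 = TP / (TP + FP)
print(recall_score(y_act, y_pred))     # 0.75 = TP / (TP + FN)
```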
50. Confusion Matrix: Case Study
Spam Filter (positive class: spam): Optimize for precision or specificity because false negatives (spam goes to inbox) are more acceptable than false positives (non-spam caught by spam filter).
Fraudulent transaction detector ( positive class: fraud): Optimize for sensitivity because false positives (normal transactions that are flagged as possible fraud) are more acceptable than false negatives (fraudulent transactions that are not detected)
51. Logistic Regression: Python 1
52. Logistic Regression: Python 2
53. Logistic Regression: Python 3
54. Logistic Regression: Python 4
55. K-Nearest Neighbours Algorithm
It assumes that similar things exist in close proximity.
Algorithm:
* Step 1: Choose the number K of neighbours
* Step 2: Take the K nearest neighbours of the new data point by Euclidean distance
* Step 3: Among the K neighbours, count the number of data points in each category
* Step 4: Assign the new data point to the category with the most neighbours
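The steps above can be sketched with scikit-learn's KNeighborsClassifier (toy two-cluster data):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two toy groups: class 0 near the origin, class 1 near (5, 5)
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)  # K = 3, Euclidean distance by default
knn.fit(X, y)
pred = knn.predict([[0.5, 0.5], [5.5, 5.5]])
print(pred)  # [0 1]
```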
56. K-Nearest Neighbours: Python 1
57. K-Nearest Neighbours: Python 2
58. Naive Bayes
It is called Naive (innocent) because it assumes that all the features are independent of each other, which is almost never true.
•Easy to understand.
•Assumes all features are independent.
•Assumes all features impact the result equally.
•Needs only a small amount of data to train the model.
•Fast – up to 100X faster.
•Highly scalable.
•Can make probabilistic predictions.
•Simple, and outperforms many sophisticated methods.
•Stable to data changes.
59. Naive Bayes: Python Code
60. Pickle File: Saving and Loading ML Models: Python
61. Wine Quality Prediction
62. Assignment 2: Predicting Wine Quality: Optional
63. Classify Iris Plants into Three Species
Unsupervised Learning: Association Rule
64. K-Means Algorithm
Algorithm:
1. Choose the number K of clusters.
2. Select K points at random as the initial centroids (not necessarily from the dataset).
3. Assign each data point to the nearest centroid; this step creates the clusters.
4. Compute and place the new centroid of each cluster.
5. Reassign each data point to the new closest centroid. If any point was reassigned, repeat from step 4; otherwise, finish.
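The algorithm above can be sketched with scikit-learn's KMeans (toy two-blob data):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5],
              [8.0, 8.0], [8.5, 8.0], [8.0, 8.5]])

km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
labels = km.fit_predict(X)
print(labels)  # first three points share one label, last three the other
```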
65. Random Initialization Trap
•Solution(Fix) -> K-Means++
•K-Means++ -> smarter initialization of centroids, rest is same
66. Elbow Method: Choosing the Optimal Number of Clusters
WCSS (Within-Cluster Sum of Squares):
•Take the squared Euclidean distance between a given point and the centroid to which it is assigned
•Repeat this for all the points in the cluster
•Sum all the values across all clusters
Total WCSS decreases as the number of clusters increases
Total WCSS reaches its minimum when the number of clusters equals the number of data points
Use the Elbow Method to find the optimal number of clusters
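The Elbow Method can be sketched by computing the total WCSS (scikit-learn exposes it as inertia_) for a range of cluster counts (toy data):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0, 0], [0, 1], [1, 0],
              [8, 8], [8, 9], [9, 8]], dtype=float)

# Total WCSS for k = 1..5; look for the point where the drop flattens out
wcss = []
for k in range(1, 6):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)
print([round(w, 2) for w in wcss])  # large drop from k=1 to k=2, then nearly flat: elbow at 2
```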
67. K-Means++: Python 1
68. K-Means++: Python 2
69. K-Means++: Python 3
70. Hierarchical - Agglomerative Algorithm
•These methods perform a hierarchical decomposition of the dataset.
•Agglomerative method (bottom-up): start with each data point as its own cluster and merge them to create bigger clusters
•Divisive method (top-down): start with one cluster and keep splitting
Algorithm:
•Start by assigning one cluster to each data point - N clusters
•Combine the two closest points into one cluster - (N - 1) clusters
•Combine the two closest clusters into one cluster - (N - 2) clusters
•Repeat the previous step until only one cluster is left
71. Agglomerative - Dendrogram
72. Agglomerative - Python 1
73. Agglomerative - Python 2
74. Density-Based Clustering - DBSCAN
All the techniques above are distance-based; such methods can find only spherical clusters and are not suited to clusters of other shapes. They are also severely impacted by noise and outliers in the data.
Use DBSCAN when:
•The data is of arbitrary shape
•The data contains noise
The algorithm has two parameters:
eps: the radius of the neighbourhood around a data point p. If the distance between two points is less than or equal to eps, they are neighbours. A small value will turn many data points into outliers, and a large value will push the majority of points into the same cluster.
minPts: the minimum number of data points we want in a neighbourhood to define a cluster. minPts >= D + 1, and it should be at least 3.
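A minimal DBSCAN sketch with scikit-learn (toy data; the eps and min_samples values are illustrative):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# A dense blob plus one far-away noise point
X = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0], [0.0, 0.2],
              [10.0, 10.0]])

db = DBSCAN(eps=0.5, min_samples=3).fit(X)  # min_samples plays the role of minPts
print(db.labels_)  # [ 0  0  0  0 -1]: noise points are labelled -1
```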
75. DBSCAN - Python 1
76. DBSCAN - Python 2
77. Measuring Unsupervised Clustering Performance
•Not as straightforward as with supervised algorithms
•The question of what makes a good clustering is relative
Some popular indices:
Davies-Bouldin
•Evaluates intra-cluster similarity and inter-cluster differences
•Not normalized, so difficult to compare across different datasets
Silhouette Index
•Calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample
•The Silhouette Coefficient for a sample is (b - a) / max(a, b), where b is the distance between a sample and the nearest cluster the sample is not a part of
•Normalized; a value close to 1 is always good
•Works well for spherical cluster structures
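A sketch of the Silhouette Index with scikit-learn (toy two-cluster data):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.array([[0, 0], [0, 1], [1, 0],
              [8, 8], [8, 9], [9, 8]], dtype=float)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
score = silhouette_score(X, labels)  # mean of (b - a) / max(a, b) over all samples
print(round(score, 3))               # close to 1 for well-separated clusters
```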
78. Silhouette Index - Python 1
79. Find the Optimal Number of Car Brands
Deploy Machine Learning Model on AWS Using Flask
80. Apriori Algorithm
Apriori Algorithm:
•Used to identify frequent item sets.
•Uses a bottom-up approach: first identify the individual items that meet a minimum occurrence threshold, then add one item at a time and check whether the resulting item set still meets the threshold.
•The algorithm stops when there are no more items left to add that meet the minimum occurrence threshold.
81. Association Rule Mining
•Once we have generated frequent itemsets using Apriori, we can derive association rules.
•Since our itemsets contain 2 items, our association rules will be of the form (A) -> (B)
Three measures:
1. Support
2. Confidence
3. Lift
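The three measures can be sketched in plain Python for a hypothetical rule (bread) -> (butter) over toy transactions:

```python
# Toy transactions; measures for the hypothetical rule (bread) -> (butter)
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]
n = len(transactions)

support_a  = sum("bread" in t for t in transactions) / n              # P(A)    = 0.75
support_b  = sum("butter" in t for t in transactions) / n             # P(B)    = 0.5
support_ab = sum({"bread", "butter"} <= t for t in transactions) / n  # P(A, B) = 0.5

confidence = support_ab / support_a  # P(B | A)
lift = confidence / support_b        # > 1: A and B co-occur more often than by chance
print(support_ab, round(confidence, 3), round(lift, 3))  # 0.5 0.667 1.333
```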
82. Apriori Association - Python 1
83. Apriori Association - Python 2
84. Apriori Association - Python 3