CRISP-DM Methodology With Python (Model Deployment Using Flask Included) | Classification Case Study Using KNN Model

Jada Ng Pooi Ling
13 min read · Mar 19, 2022


The Cross-Industry Standard Process for Data Mining (CRISP-DM) is usually described in six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. In this article, I merge data understanding and preparation into one phase (data preparation & exploratory data analysis, EDA) and add a final phase for maintenance and monitoring, which still gives six phases in total. For this case study, I use the E-Commerce Shipping Data from Kaggle.🚢

Source: Pexels@Martin Damboldt

Phase 1: Business Understanding 🐣

Before starting the analysis, we need to set SMART (specific, measurable, actionable, results-oriented, and time-bound) goals. A SMART goal defines the criteria for judging the success of the project from a business point of view.🎯

For this case study, the primary objective could be to achieve an accuracy of at least 80% in predicting customer ratings from specific order details within a year. This would help businesses make data-driven decisions about their products, marketing strategies, and customer service. Related business questions might be “Is the customer query being answered?”, “Was the product delivered on time?”, or “If the product importance is high, was the product delivered on time?”.

To turn the goal into reality, we need a project plan that describes the necessary steps, the constraints (e.g., the size of the dataset that can be used for modeling), the deadline, the risks or events that might delay or derail the project, and the corresponding contingency plans.

Phase 2: Data Preparation/ Exploratory Data Analysis (EDA)📈

  • Download the dataset from E-Commerce Shipping Data and save it on your local PC. Open your Jupyter Notebook and import the necessary libraries. To load the dataset, copy the path of the downloaded CSV file (in Windows File Explorer, right-click it and choose ‘Copy as path’) and paste it into the code. Remember to put r before the path to make it a raw string. Otherwise, Python reads the \U in C:\Users as the start of a Unicode escape and raises a SyntaxError (unicode error).
import numpy as np # linear algebra
import pandas as pd # data processing
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
ecomm = pd.read_csv(r"C:\Users\User\Desktop\Data Science\Kaggle Data\ecomm.csv")
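To see the problem that the r prefix avoids, the sketch below compiles one line containing a plain string literal with a Windows path. The paths are illustrative only. Doubling each backslash, or writing the path with forward slashes (C:/Users/...), which Windows also accepts, are alternatives to the raw string.

```python
# Compile a line containing a *normal* string literal with a Windows path.
# "\U" begins a \UXXXXXXXX escape and "sers" is not hex, so this is a SyntaxError.
source = 'path = "C:\\Users\\User\\ecomm.csv"'
try:
    compile(source, "<demo>", "exec")
    plain_ok = True
except SyntaxError:
    plain_ok = False

# The raw-string version keeps every backslash literally
raw_path = r"C:\Users\User\ecomm.csv"
escaped_path = "C:\\Users\\User\\ecomm.csv"  # doubling the backslashes also works

print(plain_ok)                  # False
print(raw_path == escaped_path)  # True
```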
  • Using DataFrame.info(), we learn that the dataset has 12 columns and 10,999 rows. Eight columns are integers, while the remaining four are strings (object dtype). There are no missing values: every column has 10,999 non-null entries.
ecomm.info()
  • We can tidy the column names by replacing each underscore with a space using str.replace(), then capitalizing the first letter of every word (and lowercasing the rest) using str.title(). The first column keeps the name 'ID', and the last column (the on-time delivery flag) is renamed to 'Arrival'.
# Tidy the middle column names; 'ID' and the last column are handled separately
new_cols = []
for col in ecomm.columns[1:-1]:
    col = col.replace("_", " ")
    col = col.title()
    new_cols.append(col)
new_cols = ['ID'] + new_cols
new_cols.append('Arrival')
ecomm.columns = new_cols
ecomm.columns.to_list()
  • To get a basic understanding of the data, we can use DataFrame.describe(). By default, it returns descriptive statistics for the numerical columns only, but passing include='object' summarizes the string columns instead. (Cool, right?😎)
ecomm.iloc[:,1:-1].describe()
ecomm.describe(include='object')
  • Before we create any visualizations, we need to understand the data type of each column. ‘Cost Of The Product’, ‘Weight In Gms’, and ‘Discount Offered’ are continuous numerical columns, so we can use histograms to explore their frequency distributions. For the remaining variables, which are categorical or take only a few discrete values, we can use countplots.
# Plot the three numeric columns in a 3x1 grid
# Explore the frequency distribution of each numeric column
fig, axes = plt.subplots(3, 1, figsize=(20,16), facecolor='#F2F4F4')
fig.subplots_adjust(wspace=0.2, hspace=0.5)
ls = ['Cost Of The Product', 'Weight In Gms', 'Discount Offered']
for ax, col in zip(axes, ls):
    h = sns.histplot(x=col, kde=True, data=ecomm, ax=ax)
    h.set_title(('frequency distribution of ' + col).title(), fontsize=13)

From the histograms above, we can infer that:

  1. The cost of most of the products falls within the range of $240.00–275.00.
  2. The weight of most of the products falls within the ranges of 1,000–2,000 gms and 4,000–6,000 gms.
  3. The discount most likely to be given is between 1% and 10%. There are many outliers on the high side, well above the third quartile (a long right tail).
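The outliers mentioned in point 3 can be made precise with Tukey’s rule: values above Q3 + 1.5 × IQR (or below Q1 − 1.5 × IQR) are flagged. Below is a minimal sketch using made-up discount values rather than the real column; the helper name iqr_outliers is hypothetical.

```python
import pandas as pd

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values of s that fall outside Tukey's fences (hypothetical helper)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)]

# Toy discount values (hypothetical, not from the dataset)
discounts = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 45, 60])
print(iqr_outliers(discounts).tolist())  # [45, 60]
```

On the real data, iqr_outliers(ecomm['Discount Offered']).size would count the flagged orders.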
# for categorical data
fig, axes = plt.subplots(4,2,figsize=(16,25), facecolor='#F2F4F4')

# countplot for 'Warehouse Block'
abs_whs=ecomm["Warehouse Block"].value_counts(ascending=False)
sns.countplot(x=ecomm["Warehouse Block"], order=abs_whs.index, ax=axes[0,0], palette='CMRmap_r')
axes[0,0].set_title('Orders Handled By Each Warehouse Block', fontsize=12)
rel_whs=ecomm["Warehouse Block"].value_counts(ascending=False, normalize=True).values*100
lbs_whs=[f"{w[0]} ({w[1]:.2f}%)" for w in zip(abs_whs,rel_whs)]
axes[0,0].bar_label(container=axes[0,0].containers[0], labels=lbs_whs)

# countplot for 'Mode Of Shipment'
abs_ship = ecomm["Mode Of Shipment"].value_counts(ascending=False)
sns.countplot(x=ecomm["Mode Of Shipment"], order=abs_ship.index, ax=axes[0,1], palette=['#DC143C','#556b2f','#008b8b'])
axes[0,1].set_title('Number of Orders By Shipment Mode', fontsize=12)
rel_ship = ecomm["Mode Of Shipment"].value_counts(ascending=False, normalize=True).values*100
lbs_ship = [f"{s[0]} ({s[1]:.2f}%)" for s in zip (abs_ship,rel_ship)]
axes[0,1].bar_label(container=axes[0,1].containers[0], labels=lbs_ship)

# countplot for 'Customer Care Calls'
abs_calls=ecomm["Customer Care Calls"].value_counts(ascending=False)
sns.countplot(x=ecomm["Customer Care Calls"], order=abs_calls.index,ax=axes[1,0],palette='cubehelix')
axes[1,0].set_title('Number of Customer Care Calls Made by Customers', fontsize=12)
rel_calls=ecomm["Customer Care Calls"].value_counts(ascending=False, normalize=True).values*100
lbs_calls=[f"{c[0]} ({c[1]:.2f}%)" for c in zip(abs_calls, rel_calls)]
axes[1,0].bar_label(container=axes[1,0].containers[0], labels=lbs_calls)

# countplot for 'Customer Rating'
abs_rating = ecomm["Customer Rating"].value_counts(ascending=False)
sns.countplot(x=ecomm["Customer Rating"], order=abs_rating.index,ax=axes[1,1],palette="rocket")
axes[1,1].set_title('Customer Rating Received', fontsize=12);
rel_rating = ecomm["Customer Rating"].value_counts(ascending=False, normalize=True).values*100
lbs_rating = [f"{r[0]} ({r[1]:.2f}%)" for r in zip(abs_rating, rel_rating)]
axes[1,1].bar_label(container=axes[1,1].containers[0], labels=lbs_rating)

# countplot for 'Prior Purchases'
abs_prior_pur = ecomm["Prior Purchases"].value_counts(ascending=False)
sns.countplot(x=ecomm["Prior Purchases"], order=abs_prior_pur.index,ax=axes[2,0],palette='viridis')
axes[2,0].set_title('Number of Prior Purchases Made by Customers', fontsize=12)
rel_prior_pur = ecomm["Prior Purchases"].value_counts(ascending=False, normalize=True).values*100
lbs_prior_pur = [f"{pur[0]} ({pur[1]:.0f}%)" for pur in zip(abs_prior_pur, rel_prior_pur)]
axes[2,0].bar_label(container=axes[2,0].containers[0], labels=lbs_prior_pur)

# countplot for 'Product Importance'
abs_priority = ecomm["Product Importance"].value_counts(ascending=False)
sns.countplot(x=ecomm["Product Importance"], order=abs_priority.index,ax=axes[2,1])
axes[2,1].set_title('Number of Orders Made by Product Importance', fontsize=12)
rel_priority = ecomm["Product Importance"].value_counts(ascending=False, normalize=True).values*100
lbs_priority = [f"{i[0]} ({i[1]:.2f}%)" for i in zip(abs_priority, rel_priority)]
axes[2,1].bar_label(container=axes[2,1].containers[0], labels=lbs_priority)

# countplot for 'Gender'
abs_gender = ecomm["Gender"].value_counts(ascending=False)
sns.countplot(x=ecomm["Gender"], order=abs_gender.index,ax=axes[3,0],palette=['#800000','#191970'])
axes[3,0].set_title("Number of Orders Made by Customers' Gender", fontsize=12)
rel_gender = ecomm["Gender"].value_counts(ascending=False, normalize=True).values*100
lbs_gender = [f"{g[0]} ({g[1]:.2f}%)" for g in zip(abs_gender, rel_gender)]
axes[3,0].bar_label(container=axes[3,0].containers[0], labels=lbs_gender)

# countplot for 'Arrival' (1 = late, 0 = on time; the column was renamed earlier)
abs_arrival = ecomm["Arrival"].value_counts(ascending=False)
sns.countplot(x=ecomm["Arrival"], order=abs_arrival.index, ax=axes[3,1], palette='tab20c_r')
axes[3,1].set_title('Number of Orders Based On Arrival Time', fontsize=12)
axes[3,1].set_xticklabels(['Late', 'On Time'])  # order is [1, 0] because late orders are the majority
rel_arrival = ecomm["Arrival"].value_counts(ascending=False, normalize=True).values*100
lbls_arrival = [f"{a[0]} ({a[1]:.2f}%)" for a in zip(abs_arrival, rel_arrival)]
axes[3,1].bar_label(container=axes[3,1].containers[0], labels=lbls_arrival);

From the subplots above, we can infer the following:

  1. 33.33% of the orders were handled by warehouse block F, while other warehouse blocks handled the rest of the orders equally.
  2. Most of the orders were shipped by ship (67.84%), followed by flight (16.16%), and finally by road (16%).
  3. 32.34% of the customers needed to make 4 calls to track their shipment(s). This is a warning sign for the company: customers’ concerns should be resolved quickly, without customers having to call several times.
  4. 1 is the rating with the second-highest count. This may be related to late deliveries and the number of calls customers had to make, although the countplots alone cannot confirm that.
  5. About 36% of the customers had made 3 prior purchases. There are 306 loyal customers who had made at least 8 prior purchases.
  6. Only 8.62% of the orders are of high importance. Most of the orders are of low importance (48.16%).
  7. 50.41% of the customers are female, while the rest are male.
  8. More than half of the total 10,999 orders (59.67%) were late.

👋 If you would like to check out the method to add data labels on your visualization, do check out my other article on “Building Pie Chart, Stacked Bar Chart & Column Bar Chart (With Data Labels) Using Matplotlib & Seaborn”. 🤩
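Each countplot above repeats the same value_counts, percentage, and bar_label steps. A small helper (the name count_labels is hypothetical) can build the category order and the "count (pct%)" labels once; this is a sketch, not the method from the linked article.

```python
import pandas as pd

def count_labels(s: pd.Series, decimals: int = 2):
    """Return the category order and 'count (pct%)' labels for a countplot."""
    counts = s.value_counts()  # sorted by count, descending
    pcts = counts / counts.sum() * 100
    labels = [f"{c} ({p:.{decimals}f}%)" for c, p in zip(counts, pcts)]
    return counts.index, labels

# Usage with the article's objects (commented out so the block runs on its own):
# order, labels = count_labels(ecomm["Gender"])
# sns.countplot(x=ecomm["Gender"], order=order, ax=axes[3, 0])
# axes[3, 0].bar_label(container=axes[3, 0].containers[0], labels=labels)

order, labels = count_labels(pd.Series(["a", "a", "b"]))
print(list(order), labels)  # ['a', 'b'] ['2 (66.67%)', '1 (33.33%)']
```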

To dig deeper, we can ask questions about the data, such as the business questions from Phase 1, and answer them with tables or charts.
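For example, the Phase 1 question “If the product importance is high, was the product delivered on time?” can be answered with pd.crosstab, normalized per importance level. The toy rows below are hypothetical; on the real data you would pass ecomm['Product Importance'] and ecomm['Arrival'] instead.

```python
import pandas as pd

# Toy rows standing in for the real dataset (hypothetical values)
orders = pd.DataFrame({
    "Product Importance": ["high", "high", "low", "medium", "low", "high"],
    "Arrival": [1, 0, 1, 1, 0, 1],  # 1 = late, 0 = on time
})

# Share of on-time vs late orders within each importance level (each row sums to 1)
late_share = pd.crosstab(orders["Product Importance"], orders["Arrival"], normalize="index")
late_share = late_share.rename(columns={0: "On Time", 1: "Late"})
print(late_share.round(2))
```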

Phase 3: Modeling 💁‍♀️

Before building a supervised machine learning model, it helps to understand the following terms.

  • Target (aka dependent variable, Y-variable, response, outcome): The variable we are trying to predict.
  • Features (aka independent variable, X-variable, predictor, attribute): The variable used to predict the target.
  • Record (aka row, case, instance, example): The vector of predictor and outcome values for a specific individual or case.
  • Regression: Predicting a continuous (numeric) target, for example by estimating how much it changes as one feature (simple linear regression) or several features (multiple linear regression) change.
  • Classification: To predict a class label, which is a choice from a predefined list of possibilities. It can be divided into binary classification (distinguishing between exactly two classes) and multiclass classification (distinguishing between more than two classes).

This case study is a binary classification problem: we map rating 1 to 1 and ratings 2 to 5 to 0 (a rating other than 1). Since the target is categorical, we need classification models such as logistic regression, k-nearest neighbors (KNN), or support vector machines.

To get the dataset ready for model building, we scale the numerical variables and use pandas.get_dummies() to create dummy variables for the categorical variables. After that, we split the data into training and test sets and import the ML algorithms needed to build the models. 👷‍♀️

# import libraries
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier as DT

# create dummy variables for categorical variables
ecomm['Gender'] = ecomm.Gender.map({'F':0, 'M':1})
ecomm['Customer Rating'] = ecomm['Customer Rating'].map({5:0, 4:0, 3:0, 2:0, 1:1})
dummy = pd.DataFrame(pd.get_dummies(ecomm[['Warehouse Block', 'Mode Of Shipment','Product Importance']]))

# Standardize the numeric columns (zero mean, unit variance); they were renamed earlier
from sklearn.preprocessing import scale
num_cols = ['Cost Of The Product', 'Discount Offered', 'Weight In Gms']
ecomm1 = pd.DataFrame(scale(ecomm[num_cols]), columns=num_cols)

# Create a new dataframe for modeling
ecomm_final = pd.concat([ecomm1, dummy, ecomm[['Customer Care Calls', 'Prior Purchases', 'Gender', 'Arrival', 'Customer Rating']]],
                        axis=1)

# Split data into output and input
X = ecomm_final.iloc[:,:-1] # inputs
Y = ecomm_final['Customer Rating'] # outputs

# Split data into training and test sets (fixed seed for reproducibility; stratify keeps the class ratio)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, shuffle=True, random_state=42, stratify=Y)

# append different classification models into classifiers array
classifiers=[]
KNN_model = KNeighborsClassifier(n_neighbors=11, metric='euclidean')
classifiers.append(KNN_model)
DT_model = DT(criterion = 'entropy',max_depth=4)
classifiers.append(DT_model)
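The code above standardizes the numeric columns on the full dataset before splitting, so the test rows also influence the scaling statistics. A common alternative is to put the preprocessing and the model into one scikit-learn Pipeline. The scaler is then fitted on the training split only, and the same fitted object can later be pickled for deployment. This is a minimal sketch on a small synthetic frame that reuses a few of the article's column names. The values are random and hypothetical, so the accuracy it prints means nothing.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the real data (random, hypothetical values)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Cost Of The Product": rng.integers(100, 300, n),
    "Discount Offered": rng.integers(1, 60, n),
    "Weight In Gms": rng.integers(1000, 7000, n),
    "Warehouse Block": rng.choice(list("ABCDF"), n),
    "Mode Of Shipment": rng.choice(["Flight", "Road", "Ship"], n),
    "Customer Rating": rng.integers(0, 2, n),
})

numeric = ["Cost Of The Product", "Discount Offered", "Weight In Gms"]
categorical = ["Warehouse Block", "Mode Of Shipment"]
X = df[numeric + categorical]
y = df["Customer Rating"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),                           # scaled with training stats
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),  # dummy columns
])
model = Pipeline([
    ("prep", preprocess),
    ("knn", KNeighborsClassifier(n_neighbors=11, metric="euclidean")),
])
model.fit(X_train, y_train)  # the scaler never sees the test rows
print(model.score(X_test, y_test))
```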

Phase 4: Evaluation 🧠

According to the results below, we could choose KNN as our final model because it has the higher testing accuracy.

from sklearn.metrics import accuracy_score

accuracy_train = []
accuracy_test = []
for clf in classifiers:
    clf.fit(X_train, Y_train)
    pred_train = clf.predict(X_train)
    pred_test = clf.predict(X_test)
    acc_train = accuracy_score(Y_train, pred_train)
    acc_test = accuracy_score(Y_test, pred_test)
    accuracy_train.append(acc_train)
    accuracy_test.append(acc_test)

accuracy_result = pd.DataFrame(data={'Model': ['KNN', 'Decision Tree'],
                                     'Training Accuracy': accuracy_train,
                                     'Testing Accuracy': accuracy_test})
accuracy_result.sort_values('Testing Accuracy', ascending=False)

👋 If you want to create a heatmap for confusion matrix, you could check out my other article called ‘Heatmap For Correlation Matrix & Confusion Matrix| Extra Tips On Machine Learning’.🍡
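Accuracy is worth comparing against a trivial baseline before judging it against the 80% goal from Phase 1. If roughly one in five ratings is a 1, a model that always predicts "not 1" already reaches about 80% accuracy while catching none of the unhappy customers. Check ecomm['Customer Rating'].value_counts(normalize=True) on the real data. This sketch uses hypothetical labels with about 20% positives and scikit-learn's DummyClassifier as the baseline.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical labels: about 20% positives (rating 1), 80% negatives
rng = np.random.default_rng(1)
y = (rng.random(1000) < 0.2).astype(int)
X = np.zeros((1000, 1))  # features are ignored by the dummy model

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = baseline.predict(X)
acc = accuracy_score(y, pred)
rec = recall_score(y, pred)  # share of true rating-1 cases the model catches
cm = confusion_matrix(y, pred)  # rows = true class, columns = predicted class

print(f"baseline accuracy={acc:.2f}, recall for class 1={rec:.2f}")
print(cm)
```

A useful model should beat this accuracy and, more importantly, achieve a non-zero recall for class 1.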

Phase 5: Deployment (Using Flask + HTML + CSS) 🏭

Why is deployment needed? Not every user is a coder, so it is unrealistic to expect end users to run a Jupyter Notebook to get an output. Hence, model deployment is arguably the most crucial part of an ML project.

How do we deploy a model? It depends on the programming language you use. If you are using R, you could use R Shiny. If you are using Python, you could use Flask, which also requires some basic knowledge of HTML and CSS. For this case study, I use Flask for demonstration. No worries! I am going to explain it in detail.😉

Now that we have built a model using a KNN classifier, we use pickle to serialize the trained model and save it to a file. 🥒 After you run the code below, you will find a pickle file called ‘finalized_knn.pkl’ in your working directory. We load it in main.py, the application file that makes a prediction from the user’s inputs and sends the result back to the user.

import pickle

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load dataset
ecomm = pd.read_csv("https://raw.githubusercontent.com/jadanpl/E-Commerce-Shipping/main/E-Commerce%20Shipping%20Data.csv")

# Rename columns to lowercase; keep 'ID' and call the last column (on-time flag) 'arrival'
cols = [col.lower() for col in ecomm.columns[1:-1]]
ecomm.columns = ['ID'] + cols + ['arrival']

# Data preprocessing
ecomm['gender'] = ecomm.gender.map({'F': 0, 'M': 1})
ecomm['customer_rating'] = ecomm['customer_rating'].map({5: 0, 4: 0, 3: 0, 2: 0, 1: 1})
dummy = pd.get_dummies(ecomm[['warehouse_block', 'mode_of_shipment', 'product_importance']])
num_cols = ['cost_of_the_product', 'weight_in_gms', 'discount_offered']
ecomm_final = pd.concat([ecomm[num_cols], dummy,
                         ecomm[['customer_care_calls', 'prior_purchases', 'gender', 'arrival', 'customer_rating']]],
                        axis=1)

# Split data into output and input
X = ecomm_final.iloc[:, :-1].to_numpy(dtype=float)  # inputs (18 columns)
Y = ecomm_final['customer_rating']  # outputs

# Model building: the first three (numeric) columns are scaled inside the pipeline,
# so main.py can pass raw form values and they get the same scaling as in training
model = Pipeline([
    ("scale", ColumnTransformer([("num", StandardScaler(), [0, 1, 2])], remainder="passthrough")),
    ("knn", KNeighborsClassifier(n_neighbors=11, metric='euclidean')),
])
model.fit(X, Y)

# Save the model
filename = 'finalized_knn.pkl'
with open(filename, 'wb') as f:
    pickle.dump(model, f)

Figure: Structure of Model Deployment Directory

As the figure above shows, we need a folder called ‘templates’ (the folder where Flask’s render_template looks for HTML files by default) to store home.html, a form that lets users enter their inputs, and result.html, the page that shows the result to the users.

The other folder, ‘static’, stores CSS files that improve the appearance of the webpage, as well as any images. You can link a CSS file to the HTML file with a <link> tag. A separate CSS file is optional: you can always apply internal CSS instead by placing the styling rules inside a <style> tag in the <head> of the HTML file (⚠️PS: internal CSS might increase the page size and loading time).

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Customer Rating Prediction</title>
<link rel="stylesheet" type="text/css" href="static/style.css" />
</head>
<body>
<h1>Customer Rating Prediction</h1>
<h3>Let's check whether the customer will give rating 1.</h3>
<center><img src="https://images.pexels.com/photos/799091/pexels-photo-799091.jpeg?auto=compress&cs=tinysrgb&dpr=2&h=650&w=940" alt="Photography of Ship" width="500px"></center>
<form action="/result" method="POST">
<fieldset>
<!--input 1-->
<p>Q1. Select the gender of the customer.</p>
<select required name="gender">
<option value="" disabled selected>-- select an option --</option>
<option value="0">Female</option>
<option value="1">Male</option>
</select>
<!--input 2-->
<p>Q2. Select the warehouse block in charge.</p>
<select required name="warehouse_block">
<option value="" disabled selected>-- select an option --</option>
<option value="A">Warehouse A</option>
<option value="B">Warehouse B</option>
<option value="C">Warehouse C</option>
<option value="D">Warehouse D</option>
<option value="F">Warehouse F</option>
</select>
<!--input 3-->
<p>Q3. Select the mode of shipment.</p>
<select required name="mode_of_shipment">
<option value="" disabled selected>-- select an option --</option>
<option value="Ship">Ship</option>
<option value="Flight">Flight</option>
<option value="Road">Road</option>
</select>
<!--input 4-->
<p>Q4. Select the product importance.</p>
<select required name="product_importance">
<option value="" disabled selected>-- select an option --</option>
<option value="low">Low</option>
<option value="medium">Medium</option>
<option value="high">High</option>
</select>
<!--input 5-->
<p>Q5. Select the number of calls that the customer has made.</p>
<select required name="customer_care_calls">
<option value="" disabled selected>-- select an option --</option>
<option value="2">2</option>
<option value="3">3</option>
<option value="4">4</option>
<option value="5">5</option>
<option value="6">6</option>
<option value="7">7</option>
</select>
<!--input 6-->
<p>Q6. Did the shipment arrive on time?</p>
<select required name="arrival">
<option value="" disabled selected>-- select an option --</option>
<option value="1">Late</option>
<option value="0">On Time</option>
</select>
<!--input 7-->
<p>Q7. Select the number of prior purchases the customer has made.</p>
<select required name="prior_purchases">
<option value="" disabled selected>-- select an option --</option>
<option value="2">2</option>
<option value="3">3</option>
<option value="4">4</option>
<option value="5">5</option>
<option value="6">6</option>
<option value="7">7</option>
<option value="8">8</option>
<option value="10">10</option>
</select>
<!--input 8-->
<p>Q8. Enter the cost of the product (in USD).</p>
<input type="number" placeholder="Enter the product cost" name="cost_of_the_product" required>
<!--input 9-->
<p>Q9. Enter the discount given to the customer.</p>
<input type="number" placeholder="Enter the discount" name="discount_offered" required>
<!--input 10-->
<p>Q10. Enter the weight of the product (in gms).</p>
<input type="number" placeholder="Enter the product's weight" name="weight_in_gms" required>
<div class="parent">
<button type="submit" class="child" value="Submit">Submit</button>
</div>
</fieldset>
</form>
</body>
</html>

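Above is the home.html file. A few things to keep in mind when you build the form:

  • The name attribute of each <select> and <input> is the key that main.py uses to read the value, e.g. request.form["warehouse_block"].
  • The option values (A to F, Ship/Flight/Road, low/medium/high, and 0/1 for gender and arrival) must match the values the model was trained on, because main.py turns them into the same dummy columns.
  • The form’s action="/result" and method="POST" point to the prediction route in main.py below.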

from flask import Flask, render_template, request
import numpy as np
import pickle

# Load the trained model once, when the app starts
with open('finalized_knn.pkl', 'rb') as f:
    model = pickle.load(f)

app = Flask(__name__)


@app.route("/")
def home():
    return render_template("home.html")


def one_hot(value, categories):
    # 1 for the selected category and 0 for the others, in the same
    # (alphabetical) column order that pd.get_dummies produced in training
    return [1 if value == category else 0 for category in categories]


@app.route("/result", methods=["POST"])
def submit():
    # HTML to .py: read the submitted form fields
    form = request.form
    warehouse_block = one_hot(form["warehouse_block"], ["A", "B", "C", "D", "F"])
    mode_of_shipment = one_hot(form["mode_of_shipment"], ["Flight", "Road", "Ship"])
    product_importance = one_hot(form["product_importance"], ["high", "low", "medium"])

    cost_of_the_product = float(form["cost_of_the_product"])
    weight_in_gms = float(form["weight_in_gms"])
    discount_offered = float(form["discount_offered"])
    customer_care_calls = int(form["customer_care_calls"])
    prior_purchases = int(form["prior_purchases"])
    gender = int(form["gender"])
    arrival = int(form["arrival"])

    # .py to HTML: build one row of features in the training column order
    x = np.array([cost_of_the_product, weight_in_gms, discount_offered,
                  *warehouse_block, *mode_of_shipment, *product_importance,
                  customer_care_calls, prior_purchases, gender, arrival]).reshape(1, -1)

    # Get the prediction result (model.predict returns an array with one label)
    prediction = model.predict(x)[0]
    if prediction == 1:
        return render_template('result.html', prediction="give rating 1.")
    return render_template('result.html', prediction="not give rating 1.")


if __name__ == "__main__":
    # debug=True is meant for local development only
    app.run(debug=True)

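Other than the Flask class, we also import the following from the flask package (plus NumPy and pickle):

  • render_template: renders an HTML template from the templates folder.
  • request: the object representing the incoming HTTP request; request.form holds the submitted form fields.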

First, we create the route for the home page with @app.route("/"), which renders home.html. Then we define a prediction function under @app.route("/result", methods=["POST"]). It reads the user’s inputs, predicts whether the customer will give a rating of 1, and renders the output in result.html. Note that the form element in home.html uses action="/result" and method="POST", which must match this route. I believe you can now see the connection between home.html and main.py.🤓
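Because the model only sees numbers, the option values in the form have to be encoded into exactly the dummy columns, in exactly the order, that pd.get_dummies produced during training. The sketch below checks this for one hypothetical form submission. The one_hot helper and the category lists are illustrative names, not part of Flask or pandas.

```python
import pandas as pd

# Category lists in the (sorted) order pd.get_dummies produces; names are illustrative
WAREHOUSES = ["A", "B", "C", "D", "F"]
SHIPMENTS = ["Flight", "Road", "Ship"]
IMPORTANCE = ["high", "low", "medium"]

def one_hot(value, categories):
    """Hypothetical helper: 1 for the selected category, 0 for the others."""
    return [1 if value == c else 0 for c in categories]

# A hypothetical form submission (keys match the <select> name attributes)
form = {"warehouse_block": "D", "mode_of_shipment": "Road", "product_importance": "medium"}
manual = (one_hot(form["warehouse_block"], WAREHOUSES)
          + one_hot(form["mode_of_shipment"], SHIPMENTS)
          + one_hot(form["product_importance"], IMPORTANCE))

# Reference: rows covering every category, plus the submitted row at the end
reference = pd.DataFrame({
    "warehouse_block": WAREHOUSES,
    "mode_of_shipment": ["Flight", "Road", "Ship", "Ship", "Ship"],
    "product_importance": ["high", "low", "medium", "low", "low"],
})
rows = pd.concat([reference, pd.DataFrame([form])], ignore_index=True)
encoded = pd.get_dummies(rows).astype(int)

print(list(encoded.columns))
print(encoded.iloc[-1].tolist() == manual)  # True when the orders line up
```

If a new category ever appears in the form (say, a new warehouse block), both the form and the encoding must be updated, and the model retrained.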

After you have completed the coding part, run main.py on your local PC (python main.py). Flask prints a local URL (by default http://127.0.0.1:5000); open it in your browser and you will see the home.html page.
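Before relying on the pickle file in the app, it can help to confirm that a pickled model round-trips and predicts the same labels after loading. The sketch below does this in memory with pickle.dumps and pickle.loads on a tiny hypothetical KNN model. Only unpickle files you trust, because loading a pickle can execute arbitrary code, and load the model with the same scikit-learn version that was used to train it.

```python
import pickle

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Tiny hypothetical training set: 2 features, 2 well-separated classes
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
model = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Serialize to bytes and back (pickle.dump / pickle.load do the same with a file)
restored = pickle.loads(pickle.dumps(model))

sample = np.array([[0.5, 0.5], [5.5, 5.5]])
print(restored.predict(sample))                                         # [0 1]
print(np.array_equal(model.predict(sample), restored.predict(sample)))  # True
```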

Phase 6: Maintenance & Monitoring🚴‍♂️

Political, economic, social, and technological (PEST) factors are always changing, and models rarely run in static environments. As these factors shift the data a model receives, they cause model drift (a decline in model performance): the model loses predictive power because the new data no longer resembles the data it learned from. 🍭

Even when the model is performing well, it is important to understand how often the data and the variables in your model change, so that you can retrain the model at regular intervals. ⏳
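One simple way to track how much a variable has changed is the Population Stability Index (PSI). It bins the training-time values by quantile and compares the share of new values that falls into each bin. This is a minimal NumPy sketch. The psi function and the simulated samples are hypothetical, and the thresholds in the usage note are a commonly quoted rule of thumb, not a standard.

```python
import numpy as np

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a new sample."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Bin edges from the reference (training-time) quantiles; drop duplicate edges
    edges = np.unique(np.quantile(expected, np.linspace(0, 1, bins + 1)))
    # Put new values outside the reference range into the outermost bins
    actual = np.clip(actual, edges[0], edges[-1])
    e_share = np.histogram(expected, bins=edges)[0] / len(expected)
    a_share = np.histogram(actual, bins=edges)[0] / len(actual)
    # eps avoids log(0) when a bin is empty
    e_share = np.clip(e_share, eps, None)
    a_share = np.clip(a_share, eps, None)
    return float(np.sum((a_share - e_share) * np.log(a_share / e_share)))

rng = np.random.default_rng(0)
train_discount = rng.normal(10, 3, 5000)  # hypothetical training-time values
same_period = rng.normal(10, 3, 5000)     # new data, same distribution
shifted_period = rng.normal(13, 3, 5000)  # new data, mean shifted by one std

print(round(psi(train_discount, same_period), 3))
print(round(psi(train_discount, shifted_period), 3))
```

Values below about 0.1 are often read as stable and values above about 0.25 as a significant shift, which could be one trigger for retraining.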

There is room for improvement in my code, so let me know if you have any ideas to make it better. I hope you enjoyed this article! If you want, you can also check out the full project in my GitHub repository called E-Commerce-Shipping. 🙌

⚠️NOTE: You may implement tuning and cross-validation techniques to guide your model selection process.
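For the tuning and cross-validation mentioned above, one option is GridSearchCV over n_neighbors with stratified folds. Scaling sits inside a Pipeline, so each fold is scaled using only its own training part. In this sketch, synthetic data from make_classification stands in for X_train and Y_train, and the grid values are arbitrary. F1 is used instead of accuracy because the positive class (rating 1) is the minority.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X_train / Y_train (hypothetical, about 20% positives)
X, y = make_classification(n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier(metric="euclidean")),
])
param_grid = {"knn__n_neighbors": [5, 7, 9, 11, 15, 21]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(pipe, param_grid, cv=cv, scoring="f1")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```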

Happy learning!😊
