Python – Clustering – Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
Data:
Questionnaire data from 200 mall visitors, containing sex, age, salary & shopping score (one row per visitor).
Mission:
How to group visitors into clusters by age & salary, then find which cluster a new visitor with a given age & salary falls into
Libraries used:
Pandas
Numpy
Seaborn
Matplotlib
Scikit-learn
Code:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
url = 'https://raw.githubusercontent.com/kokocamp/vlog101/master/vlog101.csv'
vlog131 = pd.read_csv(url)
vlog131.info()
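# Features: Usia (age) and Gaji (juta) (salary, in millions)
X = vlog131[['Usia','Gaji (juta)']]
# Skor Belanja (shopping score) is kept for reference only; DBSCAN is unsupervised and does not use it
y = vlog131['Skor Belanja (1-100)']
# Quick look at the raw data before clustering
sns.scatterplot(x="Usia",y="Gaji (juta)",data=vlog131,color="red",alpha=0.5)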
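# Standardize both features so that eps means the same distance for age and salary
X = np.array(X)
XX = StandardScaler().fit_transform(X)
# eps: neighborhood radius (in standardized units); min_samples: points needed to form a dense region
dbscan = DBSCAN(eps=0.3,min_samples=10)
dbscan.fit(XX)
# One label per row; -1 marks noise points that belong to no cluster
print(dbscan.labels_)
vlog131["kluster"] = dbscan.labels_
vlog131.head()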
# Plot the original (unscaled) points, colored by cluster label
fig, ax = plt.subplots()
sct = ax.scatter(X[:,0],X[:,1], c = vlog131.kluster, marker = "o", alpha = 0.5)
plt.title("DBSCAN Clustering Result")
plt.xlabel("Age")
plt.ylabel("Salary (millions)")
plt.show()
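To read the result beyond the plot, it can help to count how many points landed in each cluster and how many were marked as noise. The snippet below is a minimal sketch of that idea. It runs on synthetic points from `make_blobs` as a stand-in for the age and salary columns (an assumption, not the mall data), and the names `points`, `scaled`, `summary`, `n_clusters` and `noise_share` are made up for this illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the two feature columns (assumption: NOT the mall data)
points, _ = make_blobs(n_samples=200, centers=3, cluster_std=0.6, random_state=0)
scaled = StandardScaler().fit_transform(points)

# Same parameters as in the post
labels = DBSCAN(eps=0.3, min_samples=10).fit(scaled).labels_

# Count members per label; label -1 is how DBSCAN marks noise
unique_labels, counts = np.unique(labels, return_counts=True)
summary = dict(zip(unique_labels.tolist(), counts.tolist()))

# Number of real clusters (noise excluded) and the fraction of noise points
n_clusters = len(set(labels.tolist()) - {-1})
noise_share = float(np.mean(labels == -1))

print("Points per label:", summary)
print("Clusters found:", n_clusters)
print("Noise share:", round(noise_share, 2))
```

On the real data the same three lines of counting can be applied directly to `dbscan.labels_`. A large noise share usually hints that eps is too small or min_samples too large for the data.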
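usia = int(input("Age (years): "))
gaji = int(input("Salary (millions): "))
# DBSCAN has no predict(); fit_predict on a single point refits the model and, with min_samples=10, always returns -1.
# Instead, scale the new point like the training data and look up the nearest core sample.
titik = StandardScaler().fit(X).transform([[usia, gaji]])
jarak = np.linalg.norm(dbscan.components_ - titik, axis=1)
# Take that core sample's cluster if it lies within eps; otherwise treat the point as noise (-1)
if jarak.size > 0 and jarak.min() <= dbscan.eps:
    hasil = dbscan.labels_[dbscan.core_sample_indices_[jarak.argmin()]]
else:
    hasil = -1
print("Predicted cluster: ", hasil)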
# Redraw the clusters and mark the new visitor as a large red dot
fig, ax = plt.subplots()
sct = ax.scatter(X[:,0],X[:,1], c = vlog131.kluster, marker = "o", alpha = 0.5)
plt.title("DBSCAN Clustering Result")
plt.xlabel("Age")
plt.ylabel("Salary (millions)")
plt.scatter(usia,gaji, c = "red", s=100)
plt.show()
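The values eps=0.3 and min_samples=10 used above are fixed by hand. One common heuristic for checking eps is to look at the sorted distance from each point to its k-th nearest neighbor and pick a value near the bend of that curve. The snippet below is a rough sketch of that heuristic, not the method used in this post. It again uses synthetic `make_blobs` points as an assumed stand-in for the scaled age and salary data, and the name `k_distances` is made up for this illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the two feature columns (assumption: NOT the mall data)
points, _ = make_blobs(n_samples=200, centers=3, cluster_std=0.6, random_state=0)
scaled = StandardScaler().fit_transform(points)

min_samples = 10

# scikit-learn's DBSCAN counts the point itself toward min_samples, so we ask for
# min_samples neighbors while querying the training points themselves.
nn = NearestNeighbors(n_neighbors=min_samples).fit(scaled)
distances, _ = nn.kneighbors(scaled)

# Last column: distance to the farthest of those min_samples neighbors.
# A point is a core point exactly when this distance is <= eps.
k_distances = np.sort(distances[:, -1])

# Print a few percentiles as a text-only view of the curve
for q in (50, 75, 90, 95):
    print(f"{q}th percentile of k-distance: {np.percentile(k_distances, q):.3f}")
```

Plotting `k_distances` with `plt.plot(k_distances)` gives the usual curve. An eps below most of the curve leaves almost every point as noise, while an eps far above the bend tends to merge everything into one cluster.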
I walk through this scenario in the YouTube video below.
Click this link (http://paparadit.blogspot.com/2020/11/the-algorithms-of-machine-learning.html) if you want to check out other algorithms. Thank you for visiting this blog and subscribing to my channel.