Python – Clustering – Density-Based Spatial Clustering of Applications with Noise (DBSCAN)

Sunday, July 25, 2021

Data:

Questionnaire data from 200 mall visitors, with columns for sex, age, salary (in millions) and shopping score (1-100).

Mission:

How to group visitors into clusters by age and salary, and find which cluster a new visitor with a given age and salary falls into.

Libraries used:

Pandas
NumPy
Seaborn
Matplotlib
Scikit-learn

Code:

import pandas as pd

import numpy as np

import seaborn as sns

import matplotlib.pyplot as plt

from sklearn.cluster import DBSCAN

from sklearn.preprocessing import StandardScaler

url = 'https://raw.githubusercontent.com/kokocamp/vlog101/master/vlog101.csv'

vlog131 = pd.read_csv(url)

vlog131.info()

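# Features for clustering: Usia = age, Gaji (juta) = salary in millions
X = vlog131[['Usia','Gaji (juta)']]

# Skor Belanja (1-100) = shopping score; DBSCAN is unsupervised, so y is not used below
y = vlog131['Skor Belanja (1-100)']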

sns.scatterplot(x="Usia",y="Gaji (juta)",data=vlog131,color="red",alpha=0.5)

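# Convert to a NumPy array, then standardize both features (mean 0, std 1)
# so that age and salary contribute comparably to the distance
X = np.array(X)

XX = StandardScaler().fit_transform(X)

# eps is the neighborhood radius, measured in standardized units;
# min_samples is the number of points (the point itself included) needed for a core point
dbscan = DBSCAN(eps=0.3,min_samples=10)

dbscan.fit(XX)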
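
The values eps=0.3 and min_samples=10 are used here without a stated reason. One common heuristic for choosing eps is to sort every point's distance to its k-th nearest neighbor (with k equal to min_samples) and look for the "elbow" in that curve. The block below is only a sketch of that idea: it uses synthetic data from make_blobs as a stand-in for the scaled mall data, and the names demo_points, demo_scaled, k_distances and eps_candidate are illustrative, not part of the original script.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled (age, salary) matrix; this is NOT the mall data.
demo_points, _ = make_blobs(n_samples=200, centers=3, random_state=0)
demo_scaled = StandardScaler().fit_transform(demo_points)

k = 10  # same value as min_samples in the post

# When queried with the training data, kneighbors() returns each point itself
# as its first neighbor (distance 0). DBSCAN also counts the point itself,
# so the last column is the distance that decides whether a point is a core point.
neighbors = NearestNeighbors(n_neighbors=k).fit(demo_scaled)
distances, _ = neighbors.kneighbors(demo_scaled)
k_distances = np.sort(distances[:, -1])

# A few percentiles give a rough feel for the curve without plotting it.
print(np.percentile(k_distances, [50, 90, 99]))

# For a candidate eps, the points whose k-distance is <= eps are the core points.
eps_candidate = 0.3
n_core_expected = int(np.sum(k_distances <= eps_candidate))
demo_model = DBSCAN(eps=eps_candidate, min_samples=k).fit(demo_scaled)
print(n_core_expected, len(demo_model.core_sample_indices_))
```

To apply this to the post's data, replace demo_scaled with XX. Plotting k_distances with plt.plot(k_distances) makes the elbow easier to spot than the percentiles do.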

# One cluster label per row; -1 marks noise points that belong to no cluster
print(dbscan.labels_)

vlog131["kluster"] = dbscan.labels_

vlog131.head()
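
The raw label array is hard to read at a glance. DBSCAN marks noise with the label -1, so the number of clusters is the number of distinct labels other than -1. Here is a small sketch of that summary, using a hand-made array (demo_labels, a hypothetical stand-in for dbscan.labels_):

```python
import numpy as np

# Hypothetical labels standing in for dbscan.labels_; -1 marks noise.
demo_labels = np.array([0, 0, 1, -1, 1, 0, -1, 2, 2, 2])

unique_labels, counts = np.unique(demo_labels, return_counts=True)

# Size of each real cluster, leaving the noise label out.
cluster_sizes = {
    int(label): int(count)
    for label, count in zip(unique_labels, counts)
    if label != -1
}
n_clusters = len(cluster_sizes)
n_noise = int(np.sum(demo_labels == -1))

print("clusters:", n_clusters)            # clusters: 3
print("noise points:", n_noise)           # noise points: 2
print("cluster sizes:", cluster_sizes)    # cluster sizes: {0: 3, 1: 2, 2: 3}
```

With the post's data, pass dbscan.labels_ instead of demo_labels. If most rows turn out to be noise, eps is probably too small or min_samples too large for this dataset.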

fig, ax = plt.subplots()

sct = ax.scatter(X[:,0],X[:,1], c = vlog131.kluster, marker = "o", alpha = 0.5)

plt.title("DBSCAN Clustering Result")

plt.xlabel("Age")

plt.ylabel("Salary (million)")

plt.show()

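usia = input("Age (years): ")

usia = int(usia)

gaji = input("Salary (million): ")

gaji = int(gaji)

data = [usia,gaji]

# DBSCAN has no predict(), and fit_predict([data]) would re-cluster this single
# point alone and always label it -1. Instead, scale the point like the training
# data and take the label of the nearest core sample if it lies within eps.
scaler = StandardScaler().fit(X)

# jarak = distance from the new point to every core sample (in scaled units)
jarak = np.linalg.norm(dbscan.components_ - scaler.transform([data]), axis=1)

hasil = dbscan.labels_[dbscan.core_sample_indices_[jarak.argmin()]] if jarak.size and jarak.min() <= dbscan.eps else -1

print("Predicted cluster: ", hasil)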

fig, ax = plt.subplots()

sct = ax.scatter(X[:,0],X[:,1], c = vlog131.kluster, marker = "o", alpha = 0.5)

plt.title("DBSCAN Clustering Result")

plt.xlabel("Age")

plt.ylabel("Salary (million)")

# Mark the new visitor as a large red dot
plt.scatter(usia,gaji, c = "red", s=100)
plt.show()
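
A note on assigning new visitors: scikit-learn's DBSCAN has no predict method, and calling fit_predict on a single point re-clusters only that point. One workaround is to give a new point the label of its nearest core sample when that sample lies within eps, and to treat it as noise otherwise. The sketch below wraps that idea in a helper. The function name assign_cluster and the synthetic data are assumptions made for illustration; this is one reasonable convention, not an official scikit-learn API.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler


def assign_cluster(model, scaler, point):
    """Return the label of the nearest core sample within eps, else -1 (noise)."""
    if len(model.components_) == 0:
        return -1  # no core samples at all, so every point is noise
    # Scale the new point with the same scaler used for the training data.
    scaled = scaler.transform(np.asarray(point, dtype=float).reshape(1, -1))
    distances = np.linalg.norm(model.components_ - scaled, axis=1)
    nearest = int(np.argmin(distances))
    if distances[nearest] > model.eps:
        return -1
    return int(model.labels_[model.core_sample_indices_[nearest]])


# Synthetic stand-in for the (age, salary) data: two well-separated blobs.
demo_points, _ = make_blobs(n_samples=200, centers=[[25, 40], [55, 90]],
                            cluster_std=2.0, random_state=0)
demo_scaler = StandardScaler().fit(demo_points)
demo_model = DBSCAN(eps=0.3, min_samples=10).fit(demo_scaler.transform(demo_points))

inside = assign_cluster(demo_model, demo_scaler, [25, 40])    # at a blob center
outside = assign_cluster(demo_model, demo_scaler, [40, 200])  # far from both blobs

# Refitting on one point cannot reach min_samples=10, so it is labelled noise.
refit_single = DBSCAN(eps=0.3, min_samples=10).fit_predict([[25, 40]])

print(inside, outside, refit_single)
```

In the post's script the call would look like assign_cluster(dbscan, scaler, [usia, gaji]), where scaler is a StandardScaler fitted on X. Keeping the fitted scaler around matters because eps is measured in scaled units.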

I walk through this scenario in the YouTube video below.

Click this link (http://paparadit.blogspot.com/2020/11/the-algorithms-of-machine-learning.html) if you want to check out other algorithms. Thank you for visiting this blog and subscribing to my channel.

Labels: Programming, Python