Python - Dimension Reduction – Anomaly Detection (Outlier Detection)

Data:

Employee records from the time they submitted their job applications (40 rows)

 

Mission:

How to find and learn about data anomalies from the graphical output

 

Libraries used:

Pandas

Numpy

Matplotlib

Seaborn

Scikit-learn

PyOD

 

Code:

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

 

from sklearn.model_selection import train_test_split

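# Aliased so it is not silently shadowed by PyOD's PCA imported further down (it is not used in this post)
from sklearn.decomposition import PCA as SklearnPCA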

from sklearn.preprocessing import StandardScaler

 

# Notebook shell command (Jupyter/Colab); in a terminal, run "pip install pyod" instead
!pip install pyod

 

from pyod.utils.data import get_outliers_inliers

from pyod.models.pca import PCA

from pyod.utils.data import evaluate_print

from pyod.utils.example import visualize

 

url = 'https://raw.githubusercontent.com/kokocamp/vlog119/main/vlog119.csv'

vlog134 = pd.read_csv(url)

vlog134.describe()

 

X = vlog134[['gpa', 'gmat','work_experience']]

y = vlog134['admitted']

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.25,random_state=0)

 

X_train = pd.DataFrame(X_train)

X_train['y'] = y_train

X_test = pd.DataFrame(X_test)

X_test['y'] = y_test

X_train.head()

 

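# Standardize every column (the three features plus the appended label 'y')
sc = StandardScaler()

X_train = sc.fit_transform(X_train)

df_train = pd.DataFrame(X_train)

# Reuse the scaler fitted on the training set; refitting on the test set would leak test statistics
X_test = sc.transform(X_test)

df_test = pd.DataFrame(X_test)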
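To show why the test set should reuse the training statistics, here is a minimal NumPy-only sketch of the same standardization. The data is made up for illustration (the means and spreads only loosely mimic gpa, gmat and work_experience), and the variable names are my own, not part of the original notebook.

```python
import numpy as np

# Hypothetical stand-in data: 3 columns with very different scales
rng = np.random.default_rng(0)
train = rng.normal(loc=[3.0, 600.0, 4.0], scale=[0.5, 50.0, 2.0], size=(30, 3))
test = rng.normal(loc=[3.0, 600.0, 4.0], scale=[0.5, 50.0, 2.0], size=(10, 3))

# Learn the column means and standard deviations from the training set only
mean = train.mean(axis=0)
std = train.std(axis=0)

# Apply the SAME statistics to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0).round(6))  # close to 0 for every column
print(test_scaled.mean(axis=0).round(3))   # generally not exactly 0, and that is expected
```

The test columns will not be perfectly centered, because they are measured against the training distribution. That is the point: the test set is treated as unseen data.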

 

sns.scatterplot(x=df_train[0], y=df_train[1], hue=df_train[3], data=df_train)

plt.title('Ground Truth')

 

pca = PCA(n_components=3)

pca.fit(X_train)

 

y_train_pred = pca.labels_

y_train_scores = pca.decision_scores_

sns.scatterplot(x=df_train[0], y=df_train[1], hue=y_train_scores, data=df_train, palette='RdBu_r');

plt.title('PCA Anomaly Scores');
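To get a feeling for what a PCA-based anomaly score measures, here is a rough NumPy-only sketch. It is NOT PyOD's exact formula; it uses a common textbook variant that sums the squared projections onto each principal axis, divided by that axis's variance. The function name pca_anomaly_scores and the demo data are assumptions made for illustration.

```python
import numpy as np

def pca_anomaly_scores(X):
    """Variance-weighted squared distance along every principal axis.

    Illustrative variant only; PyOD's PCA detector computes its
    decision scores differently.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Eigen-decomposition of the covariance matrix gives the principal axes
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.maximum(eigvals, 1e-12)  # guard against division by zero
    projected = centered @ eigvecs
    # Deviations along low-variance axes are penalized more heavily
    return np.sum(projected ** 2 / eigvals, axis=1)

# Hypothetical demo: 39 ordinary points plus one obvious outlier in the last row
rng = np.random.default_rng(0)
inliers = rng.normal(size=(39, 3))
outlier = np.array([[8.0, -8.0, 8.0]])
X_demo = np.vstack([inliers, outlier])

scores = pca_anomaly_scores(X_demo)
top = int(np.argmax(scores))
print(top)  # index of the most anomalous row
```

The higher the score, the further the point sits from the bulk of the data once the correlations between columns are taken into account. On the scatter plot above, those points get the warmest colors.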

 

axes = df_train.plot(subplots=True, figsize=(16, 8), title='Simulated Anomaly Data for Training');

plt.show()

 

axes = df_test.plot(subplots=True, figsize=(16, 8), title='Simulated Anomaly Data for Test');

plt.show()
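The labels stored in y_train_pred are never plotted in this post. As far as I understand, PyOD derives them by cutting the decision scores at a percentile set by the contamination parameter (0.1 by default). The sketch below reproduces that idea with plain NumPy; labels_from_scores and the demo scores are hypothetical, so treat it as an approximation rather than the library's code.

```python
import numpy as np

def labels_from_scores(scores, contamination=0.1):
    """Flag scores above the (1 - contamination) percentile as outliers (1)."""
    scores = np.asarray(scores, dtype=float)
    threshold = np.percentile(scores, 100 * (1 - contamination))
    labels = (scores > threshold).astype(int)
    return labels, threshold

# Hypothetical scores: nine ordinary values and one extreme value at the end
demo_scores = np.array([0.5, 0.7, 0.6, 0.4, 0.9, 0.8, 0.3, 0.2, 0.1, 5.0])
demo_labels, demo_threshold = labels_from_scores(demo_scores)

print(demo_labels)     # 1 marks a flagged outlier, 0 an inlier
print(demo_threshold)  # scores above this value are flagged
```

With 40 rows and a 75/25 split, the training set holds 30 rows, so a contamination of 0.1 would flag roughly 3 of them regardless of whether they are true anomalies. It is worth tuning that parameter to match what you expect in the data.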

I walk through this scenario in the YouTube video below.


 

Click this link (http://paparadit.blogspot.com/2020/11/the-algorithms-of-machine-learning.html) if you want to check out other algorithms. Thank you for visiting this blog and subscribing to my channel.
