Implement simple K-Means Algorithm to demonstrate the clustering rule on the following datasets: (a) iris.arff (b) student.arff
To implement a simple K-Means algorithm for clustering the datasets iris.arff and student.arff, you'll need a few key tools. Here's a step-by-step guide:
Ensure you have Python installed along with the necessary libraries: numpy, pandas, and scikit-learn.
pip install numpy pandas scikit-learn
You will also need the liac-arff library to read .arff files.
pip install liac-arff
Here's a Python script that demonstrates a simple K-Means clustering on the datasets:
import numpy as np
import pandas as pd
import arff
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# Function to load ARFF file and convert to DataFrame
def load_arff_to_dataframe(file_path):
    with open(file_path, 'r') as file:
        data = arff.load(file)
    columns = [attr[0] for attr in data['attributes']]
    df = pd.DataFrame(data['data'], columns=columns)
    return df
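# Function to perform K-Means clustering
def perform_kmeans_clustering(data, n_clusters, features):
    # One-hot encode any nominal (string) columns; numeric columns pass through unchanged
    encoded_features = pd.get_dummies(data[features], dtype=float)
    # Standardize features so each one contributes equally to the distance
    scaler = StandardScaler()
    scaled_features = scaler.fit_transform(encoded_features)
    # Apply K-Means (n_init is set explicitly because its default differs across scikit-learn versions)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    data['Cluster'] = kmeans.fit_predict(scaled_features)
    return data, kmeans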
# Load datasets
iris_df = load_arff_to_dataframe("iris.arff")
student_df = load_arff_to_dataframe("student.arff")
# Define features for clustering
iris_features = iris_df.columns[:-1] # Assuming last column is the target or non-feature
student_features = student_df.columns # Adjust accordingly based on the dataset
# Perform K-Means clustering
iris_clusters, iris_kmeans = perform_kmeans_clustering(iris_df, n_clusters=3, features=iris_features)
student_clusters, student_kmeans = perform_kmeans_clustering(student_df, n_clusters=3, features=student_features)
# Print results
print("Iris Dataset Clusters:")
print(iris_clusters[['Cluster']].head())
print("\nStudent Dataset Clusters:")
print(student_clusters[['Cluster']].head())
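Since the question asks for a simple K-Means implementation, it may help to see the algorithm without scikit-learn. The following is a minimal sketch of the standard assign-and-update loop (Lloyd's algorithm) in NumPy. The function name simple_kmeans, its parameters, and the toy_points data are assumptions made for illustration; scikit-learn's KMeans uses a more careful initialization (k-means++) and multiple restarts.

```python
import numpy as np

def simple_kmeans(points, n_clusters, n_iterations=100, seed=0):
    """Cluster `points` (n_samples x n_features) by alternating two steps."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with distinct, randomly chosen data points.
    chosen = rng.choice(len(points), size=n_clusters, replace=False)
    centroids = points[chosen]
    for _ in range(n_iterations):
        # Assignment step: each point goes to its nearest centroid.
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(n_clusters)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Made-up 2-D data with two well-separated groups.
toy_points = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.9, 8.1],
])
toy_labels, toy_centroids = simple_kmeans(toy_points, n_clusters=2)
print(toy_labels)
print(toy_centroids)
```

Which group receives label 0 and which receives label 1 depends on the random initialization, so only the grouping itself is meaningful.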
Choose the number of clusters (n_clusters) based on your understanding of each dataset. Evaluate the clustering results by comparing them against the known class labels (for the iris dataset) or by visualizing the cluster centers and data distribution.
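One way to compare clusters against known class labels is a contingency table plus the adjusted Rand index. The sketch below uses short made-up label lists in place of the real iris class column and the Cluster column produced by the script, so the values are illustrative only.

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score

# Hypothetical stand-ins for the iris class column and the 'Cluster' column.
true_labels = pd.Series(
    ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
    name="class",
)
cluster_ids = pd.Series([1, 1, 0, 0, 2, 0], name="Cluster")

# Rows are true classes, columns are cluster ids, cells are counts.
contingency = pd.crosstab(true_labels, cluster_ids)
print(contingency)

# 1.0 means the two partitions agree exactly (cluster numbering is ignored);
# values near 0 mean agreement is no better than chance.
ari = adjusted_rand_score(true_labels, cluster_ids)
print(f"Adjusted Rand index: {ari:.3f}")
```

Cluster ids are arbitrary, so a cluster numbered 0 need not correspond to the first class; the table makes the actual correspondence visible.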
Execute the Python script to perform K-Means clustering on your ARFF datasets. Make sure your datasets (iris.arff and student.arff) are in the working directory, or provide their full paths.
This will give you a basic implementation of K-Means clustering on the given datasets. Adjust the feature selection to match each dataset's structure.