Debug School

rakesh kumar


Explain the different types of imputers in ML

In machine learning, imputers fill in missing values in a dataset so that estimators that cannot handle NaN can still be trained on it. There are several types of imputers, and each estimates the missing values in its own way from the data that is available. Here are some common types, with scikit-learn examples:

Mean Imputer:
The mean imputer replaces missing values with the mean of the available values for that feature.

import pandas as pd
from sklearn.impute import SimpleImputer

# Example DataFrame with missing values
data = pd.DataFrame({'A': [1, 2, None, 4, 5], 'B': [10, None, 30, 40, 50]})

# Create a mean imputer instance
mean_imputer = SimpleImputer(strategy='mean')

# Fit and transform the data
data_imputed = mean_imputer.fit_transform(data)

print(data_imputed)
# Output: [[1.  10. ]
#          [2.  32.5]   # mean of B = (10 + 30 + 40 + 50) / 4
#          [3.  30. ]   # mean of A = (1 + 2 + 4 + 5) / 4
#          [4.  40. ]
#          [5.  50. ]]
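The means are learned during fitting and then reused for any later data. The sketch below illustrates this with the same toy data; `new_rows` is a hypothetical set of unseen rows made up for the example, not part of the original post.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Same toy data as in the example above.
train = pd.DataFrame({'A': [1, 2, None, 4, 5], 'B': [10, None, 30, 40, 50]})
# Hypothetical unseen rows, used only for illustration.
new_rows = pd.DataFrame({'A': [None, 7], 'B': [20, None]})

mean_imputer = SimpleImputer(strategy='mean')
mean_imputer.fit(train)

# Per-column means learned from `train`, computed while ignoring NaN.
learned_means = mean_imputer.statistics_
print(learned_means)

# transform() fills gaps with the means learned from `train`;
# it does not recompute them on `new_rows`.
new_rows_imputed = mean_imputer.transform(new_rows)
print(new_rows_imputed)
```

Fitting on the training data only and calling `transform` on later data keeps information from the later data out of the fill values.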

Median Imputer:
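The median imputer replaces missing values with the median of the available values for that feature. Because the median is not pulled toward extreme values, it is usually a safer choice than the mean for skewed features or features with outliers.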

from sklearn.impute import SimpleImputer

# Create a median imputer instance
median_imputer = SimpleImputer(strategy='median')

# Fit and transform the data
data_imputed = median_imputer.fit_transform(data)

print(data_imputed)
# Output: [[1. 10.]
#          [2. 35.]
#          [3. 30.]
#          [4. 40.]
#          [5. 50.]]
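On the toy data above the mean and median of column A are both 3, so the two strategies only differ in column B. A minimal sketch with one extreme value shows the difference more clearly; the `income` column is hypothetical and exists only for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical column with one extreme value (100.0).
skewed = pd.DataFrame({'income': [1.0, 2.0, np.nan, 4.0, 100.0]})

# Value that each strategy writes into the missing cell (row 2).
mean_fill = SimpleImputer(strategy='mean').fit_transform(skewed)[2, 0]
median_fill = SimpleImputer(strategy='median').fit_transform(skewed)[2, 0]

print(mean_fill)    # pulled upward by the outlier
print(median_fill)  # stays near the bulk of the data
```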

Most Frequent Imputer:
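The most frequent imputer replaces missing values with the most frequent value (mode) of the available values for that feature. It works for both numeric and categorical features. When several values are tied for most frequent, scikit-learn uses the smallest of them.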

from sklearn.impute import SimpleImputer

# Create a most frequent imputer instance
mode_imputer = SimpleImputer(strategy='most_frequent')

# Fit and transform the data
data_imputed = mode_imputer.fit_transform(data)

print(data_imputed)
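# Every value in this data occurs exactly once, so the tie is resolved
# to the smallest value in each column (1 for A, 10 for B).
# Output: [[1. 10.]
#          [2. 10.]
#          [1. 30.]
#          [4. 40.]
#          [5. 50.]]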
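The mode is most useful for categorical features, where a mean or median is not defined. The following is a small sketch under that assumption; the `color` column is hypothetical, and the second part repeats column A from the toy data to show the tie-breaking rule.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical categorical column; np.nan marks the missing entry.
colors = pd.DataFrame({'color': ['red', 'blue', np.nan, 'red']}, dtype=object)

mode_imputer = SimpleImputer(strategy='most_frequent')
colors_imputed = mode_imputer.fit_transform(colors)
print(colors_imputed.ravel())  # the gap is filled with 'red'

# Column A from the toy data: every value occurs once, so all are tied.
ties = pd.DataFrame({'A': [1, 2, None, 4, 5]})
tie_imputer = SimpleImputer(strategy='most_frequent').fit(ties)
print(tie_imputer.statistics_)  # the smallest tied value is used
```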

KNN Imputer:
The KNN imputer replaces each missing value with the average of that feature over the k nearest neighbors of the sample. Nearness is measured with a Euclidean distance that only uses the features both rows have observed.

from sklearn.impute import KNNImputer

# Create a KNN imputer instance
knn_imputer = KNNImputer(n_neighbors=2)

# Fit and transform the data
data_imputed = knn_imputer.fit_transform(data)

print(data_imputed)
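# Output: [[1.  10.]
#          [2.  25.]    # B: mean of 10 and 40, the two rows nearest in A
#          [2.5 30.]    # A: 4 averaged with 1 or 5 (equally distant in B), so 2.5 or 4.5
#          [4.  40.]
#          [5.  50.]]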
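To make the neighbor search concrete, the sketch below recomputes the value for row 1 by hand using `nan_euclidean_distances` from scikit-learn and compares it with `KNNImputer`. It is a simplified illustration for this toy data, not a reimplementation of the library's internals; the variable names are chosen for the example.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.metrics.pairwise import nan_euclidean_distances

# Same toy data as above, as a float array (column 0 = A, column 1 = B).
X = np.array([[1, 10], [2, np.nan], [np.nan, 30], [4, 40], [5, 50]], dtype=float)

# Row 1 is missing B. Rows 0, 3 and 4 have both values, so they can be
# compared with row 1 on A. Row 2 shares no observed feature with row 1,
# so its distance is undefined and this sketch leaves it out.
donors = X[[0, 3, 4]]
distances = nan_euclidean_distances(X[[1]], donors)[0]

# Average B over the two closest donors.
nearest = np.argsort(distances)[:2]
manual_b = donors[nearest, 1].mean()

# The library result for the same cell.
sklearn_b = KNNImputer(n_neighbors=2).fit_transform(X)[1, 1]
print(manual_b, sklearn_b)
```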

Iterative Imputer:
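The Iterative Imputer uses a multivariate approach: it models each feature with missing values as a function of the other features, predicts the missing entries, and repeats this feature by feature until the estimates stabilize or the maximum number of iterations is reached.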

# IterativeImputer is still experimental, so this enabling import must come first
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer

# Create an Iterative imputer instance
iterative_imputer = IterativeImputer()

# Fit and transform the data
data_imputed = iterative_imputer.fit_transform(data)

print(data_imputed)
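# Output (approximate; exact digits depend on the scikit-learn version and convergence):
# [[1.   10.]
#  [2.  ~20.]
#  [~3.  30.]
#  [4.   40.]
#  [5.   50.]]
# The complete rows follow B = 10 * A, so the estimates settle near B = 20 and A = 3.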
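The regression model used for each feature can be swapped. As a hedged sketch, the example below passes a plain `LinearRegression` as the `estimator` and raises `max_iter`; both choices are assumptions made for this illustration, not settings from the original post. Because the complete rows satisfy B = 10 * A, the filled values should end up close to B = 20 and A = 3.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, must precede the next import
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

# Same toy data as above (column 0 = A, column 1 = B).
X = np.array([[1, 10], [2, np.nan], [np.nan, 30], [4, 40], [5, 50]], dtype=float)

# Assumed settings for this sketch: ordinary least squares as the
# per-feature model and up to 25 rounds of re-estimation.
iterative_imputer = IterativeImputer(
    estimator=LinearRegression(), max_iter=25, random_state=0
)
X_imputed = iterative_imputer.fit_transform(X)

# Rounded for readability; the exact digits are not guaranteed.
print(np.round(X_imputed, 2))
```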

These are some of the common types of imputers used in machine learning. The choice of imputer depends on the type of each feature (numeric or categorical), how many values are missing and whether the gaps are related to other features, and the specific characteristics of the problem you are trying to solve.
