Debug School

rakesh kumar
rakesh kumar

Posted on

How to handling space and null value for data cleaning

Create a new DataFrame (df1) that excludes rows where the TotalCharges column contains a space

df1 = df[df.TotalCharges!=' ']
df1.shape
Enter fullscreen mode Exit fullscreen mode
(7032, 20)
Enter fullscreen mode Exit fullscreen mode

no of Identifying Rows Without Spaces

df[df.TotalCharges!=' '].shape
(7032, 20)
Enter fullscreen mode Exit fullscreen mode

Create a new DataFrame (df1) that excludes rows where all column contains a space

# Assuming df is your original DataFrame
for col in df.columns:
    df = df[df[col] != ' ']
Enter fullscreen mode Exit fullscreen mode

Image description

Sample DataFrame

Image description

import pandas as pd
import numpy as np

data = {
    'A': [1, 2, ' ', 4, np.nan],
    'B': ['foo', 'bar', ' ', 'baz', 'qux'],
    'C': [10, np.nan, 30, 40, 50]
}
df = pd.DataFrame(data)

# Remove rows where any column has spaces or null values
df_cleaned = df[(df != ' ').all(axis=1) & (df.notnull().all(axis=1))]

print(df_cleaned)
Enter fullscreen mode Exit fullscreen mode

Image description

  1. Replacing Spaces and Null Values
# Replace spaces with NaN
df.replace(' ', np.nan, inplace=True)

# Fill NaN values with a specified value (e.g., 0)
df.fillna(0, inplace=True)

print(df)
Enter fullscreen mode Exit fullscreen mode

Image description

Using the dropna() Method

# Drop rows with any null values
df_dropped = df.dropna()

print(df_dropped)
Enter fullscreen mode Exit fullscreen mode

Image description

Filling Null Values

# Fill null values with the previous value
df_filled = df.fillna(method='ffill')

print(df_filled)
Enter fullscreen mode Exit fullscreen mode

Image description

Custom Function to Clean Data

def clean_data(df):
    # Remove rows with spaces
    df = df[(df != ' ').all(axis=1)]
    # Replace spaces with NaN and drop nulls
    df.replace(' ', np.nan, inplace=True)
    return df.dropna()

df_cleaned_custom = clean_data(df)

print(df_cleaned_custom)
Enter fullscreen mode Exit fullscreen mode

Image description
Using applymap() to Trim Spaces

# Trim leading and trailing spaces
df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)

print(df)
Enter fullscreen mode Exit fullscreen mode

Image description

Image description

fill specific col with null value

df['A'].replace(' ', np.nan, inplace=True)
Enter fullscreen mode Exit fullscreen mode

Image description

filter out rows where any column contains a space using applylambda

# Step 1: Create a boolean DataFrame to check for spaces
has_spaces = df.applymap(lambda x: x == ' ')

# Step 2: Filter out rows that have spaces in any column
df_cleaned = df[~has_spaces.any(axis=1)]

# Display the shape of the cleaned DataFrame
df_cleaned.shape
Enter fullscreen mode Exit fullscreen mode

Example Dataset

Objective
We want to identify and remove rows that contain a space (' ') in any of their columns.
Step-by-Step Process
Step 1: Identify Cells Containing Spaces
First, we will check each cell in the DataFrame to identify where the space values (' ') are present.

# Check for spaces in the DataFrame
has_spaces = df.applymap(lambda x: x == ' ')

print(has_spaces)
Enter fullscreen mode Exit fullscreen mode

Image description

Image description

Top comments (0)