Create a new DataFrame (df1) that excludes rows where the TotalCharges column contains a space
df1 = df[df.TotalCharges!=' ']
df1.shape
(7032, 20)
no of Identifying Rows Without Spaces
df[df.TotalCharges!=' '].shape
(7032, 20)
Create a new DataFrame (df1) that excludes rows where all column contains a space
# Assuming df is your original DataFrame
for col in df.columns:
df = df[df[col] != ' ']
Sample DataFrame
import pandas as pd
import numpy as np
data = {
'A': [1, 2, ' ', 4, np.nan],
'B': ['foo', 'bar', ' ', 'baz', 'qux'],
'C': [10, np.nan, 30, 40, 50]
}
df = pd.DataFrame(data)
# Remove rows where any column has spaces or null values
df_cleaned = df[(df != ' ').all(axis=1) & (df.notnull().all(axis=1))]
print(df_cleaned)
- Replacing Spaces and Null Values
# Replace spaces with NaN
df.replace(' ', np.nan, inplace=True)
# Fill NaN values with a specified value (e.g., 0)
df.fillna(0, inplace=True)
print(df)
Using the dropna() Method
# Drop rows with any null values
df_dropped = df.dropna()
print(df_dropped)
Filling Null Values
# Fill null values with the previous value
df_filled = df.fillna(method='ffill')
print(df_filled)
Custom Function to Clean Data
def clean_data(df):
# Remove rows with spaces
df = df[(df != ' ').all(axis=1)]
# Replace spaces with NaN and drop nulls
df.replace(' ', np.nan, inplace=True)
return df.dropna()
df_cleaned_custom = clean_data(df)
print(df_cleaned_custom)
Using applymap() to Trim Spaces
# Trim leading and trailing spaces
df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
print(df)
fill specific col with null value
df['A'].replace(' ', np.nan, inplace=True)
filter out rows where any column contains a space using applylambda
# Step 1: Create a boolean DataFrame to check for spaces
has_spaces = df.applymap(lambda x: x == ' ')
# Step 2: Filter out rows that have spaces in any column
df_cleaned = df[~has_spaces.any(axis=1)]
# Display the shape of the cleaned DataFrame
df_cleaned.shape
Example Dataset
Objective
We want to identify and remove rows that contain a space (' ') in any of their columns.
Step-by-Step Process
Step 1: Identify Cells Containing Spaces
First, we will check each cell in the DataFrame to identify where the space values (' ') are present.
# Check for spaces in the DataFrame
has_spaces = df.applymap(lambda x: x == ' ')
print(has_spaces)
Top comments (0)