How to convert string to numeric for data cleaning

df.TotalCharges.values

output

array(['29.85', '1889.5', '108.15', ..., '346.45', '306.6', '6844.5'],
      dtype=object)

df[pd.to_numeric(df.TotalCharges, errors='coerce').isnull()]

Step-by-Step Explanation
Step 1: Identify Data Type
The TotalCharges column holds strings, not numbers. The output shows quoted values such as '29.85', and the array's dtype is object rather than a numeric type.
Step 2: Convert Strings to Numeric
pd.to_numeric(df.TotalCharges, errors='coerce') attempts to convert every value in the TotalCharges column to a number.
errors='coerce': With this parameter, any value that cannot be parsed (e.g., a blank space or non-numeric text) becomes NaN (Not a Number) instead of raising an error.
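To make the effect of errors='coerce' concrete, here is a minimal sketch on a made-up Series. The values in raw are hypothetical and only stand in for the kind of data found in TotalCharges.

```python
import pandas as pd

# Hypothetical sample values; the blank string mimics a missing charge.
raw = pd.Series(["29.85", " ", "abc", "108.15"])

# errors='coerce' turns anything that cannot be parsed into NaN.
coerced = pd.to_numeric(raw, errors="coerce")
print(coerced.isna().tolist())  # [False, True, True, False]

# With the default errors='raise', the same input raises a ValueError.
try:
    pd.to_numeric(raw)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```

Coercing is convenient for cleaning because one bad value no longer stops the whole conversion.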
Step 3: Identify Invalid Entries
The .isnull() method is applied to the converted column and returns a boolean Series: True for rows that are NaN after the conversion (invalid or already missing data) and False for rows with valid numeric values.
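Step 4: Select Rows with Invalid Data
df[pd.to_numeric(df.TotalCharges, errors='coerce').isnull()] uses that boolean Series as a filter, so it returns only the rows where TotalCharges could not be converted to numeric. It does not remove anything from df; it shows which rows need cleaning. To keep only the rows with valid data, negate the mask with ~ before indexing.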
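Once the invalid rows are known, one possible way to drop them and store the column as numbers is sketched below. The customerID column and every value in it are assumptions made up for illustration, not data from the original dataset.

```python
import pandas as pd

# Hypothetical data; customerID and all values are invented.
df = pd.DataFrame({
    "customerID": ["A1", "B2", "C3"],
    "TotalCharges": ["29.85", " ", "1889.5"],
})

# True where the value could not be parsed as a number.
invalid = pd.to_numeric(df.TotalCharges, errors="coerce").isnull()

# ~ negates the mask, so only the convertible rows are kept.
clean = df[~invalid].copy()

# Every remaining value is parseable, so the conversion is safe.
clean["TotalCharges"] = pd.to_numeric(clean["TotalCharges"])

print(clean.TotalCharges.tolist())  # [29.85, 1889.5]
```

Taking a .copy() before assigning keeps the original df untouched and avoids writing into a slice of it.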

Simple Example
Consider the following small DataFrame:
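The frame below is a hypothetical stand-in, since the original example data is not available. The tenure column and all values are assumptions. It applies the same expression as above to show which rows get flagged.

```python
import pandas as pd

# Hypothetical three-row frame; column names and values are invented.
df = pd.DataFrame({
    "tenure": [1, 0, 34],
    "TotalCharges": ["29.85", " ", "1889.5"],
})

# The charges are stored as text, so they print with quotes.
print(df.TotalCharges.values)

# Same expression as in the walkthrough: rows that fail conversion.
bad_rows = df[pd.to_numeric(df.TotalCharges, errors="coerce").isnull()]
print(bad_rows)
print(len(bad_rows))  # 1
```

Only the row holding the blank string is returned, which is the row that would need to be fixed or dropped before doing any arithmetic on TotalCharges.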
