Debug School

rakesh kumar


How to convert string to numeric for data cleaning

df.TotalCharges.values

output

array(['29.85', '1889.5', '108.15', ..., '346.45', '306.6', '6844.5'],
      dtype=object)
To list the rows whose TotalCharges cannot be converted to a number:

df[pd.to_numeric(df.TotalCharges, errors='coerce').isnull()]

Step-by-Step Explanation
Step 1: Identify Data Type
The TotalCharges column holds strings, not numbers. The output shows this in two ways: the values are quoted (for example '29.85'), and the array has dtype=object.
Step 2: Convert Strings to Numeric
The function pd.to_numeric(df.TotalCharges, errors='coerce') attempts to convert every value in the TotalCharges column to a number.
errors='coerce': This parameter turns any value that cannot be converted (e.g., blank strings or non-numeric text) into NaN (Not a Number). Without it, the default errors='raise' stops with an error at the first value that cannot be parsed.
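A minimal sketch of how errors='coerce' behaves, using a few made-up values (the Series below is hypothetical, not taken from the real dataset):

```python
import pandas as pd

# Hypothetical sample values; " " mimics a blank TotalCharges entry.
raw = pd.Series(["29.85", "1889.5", " ", "abc"])

# Values that cannot be parsed become NaN instead of raising an error.
converted = pd.to_numeric(raw, errors="coerce")
print(converted)

# With the default errors="raise", the same input raises ValueError.
try:
    pd.to_numeric(raw)
except ValueError as exc:
    print("conversion failed:", exc)
```

The first two values convert to floats, while the blank string and 'abc' end up as NaN.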
Step 3: Identify Invalid Entries
The .isnull() method is applied to the converted column and returns a boolean Series: True for rows that became NaN (invalid or missing values) and False for rows that converted to a valid number.
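Step 4: Select Rows with Invalid Data
df[pd.to_numeric(df.TotalCharges, errors='coerce').isnull()] uses that boolean Series as a filter and returns the rows where TotalCharges could not be converted to numeric, so the invalid entries can be inspected. To keep only the rows with valid data, invert the filter, for example by using .notnull() in place of .isnull().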
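The steps above can be sketched end to end as follows. The small df here is a hypothetical stand-in for the real data, and the names numeric, invalid_rows and clean are illustrative only:

```python
import pandas as pd

# Hypothetical stand-in for the real DataFrame.
df = pd.DataFrame({
    "customerID": ["A1", "B2", "C3", "D4"],
    "TotalCharges": ["29.85", " ", "108.15", "6844.5"],
})

# Steps 2 and 3: convert, then flag the rows that failed.
numeric = pd.to_numeric(df["TotalCharges"], errors="coerce")

# Step 4: rows with invalid data, for inspection.
invalid_rows = df[numeric.isnull()]
print(invalid_rows)

# Keep only the valid rows and store the column as numbers.
clean = df[numeric.notnull()].copy()
clean["TotalCharges"] = pd.to_numeric(clean["TotalCharges"])
print(clean.dtypes)
```

Taking a copy before assigning avoids modifying a slice of the original df, and the final conversion can use the default error handling because only valid rows remain.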

Simple Example
Consider the following small DataFrame:
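A minimal sketch of such a DataFrame, with made-up names and values (an assumption for illustration, not the data from the original post):

```python
import pandas as pd

# Hypothetical data: one entry ("n/a") is not a valid number.
small = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen"],
    "TotalCharges": ["100.5", "n/a", "250"],
})

# The values are stored as text, so they print with quotes.
print(small["TotalCharges"].values)

# True marks the rows that cannot be converted to a number.
mask = pd.to_numeric(small["TotalCharges"], errors="coerce").isnull()
print(mask.tolist())

# Selecting with the mask shows only the invalid row.
print(small[mask])
```

Only the second row is flagged, because 'n/a' is the one value that cannot be read as a number.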

