Debug School

rakesh kumar

How to perform scaling or normalization using Django

Feature scaling is a common preprocessing step that aims to bring the values of different features or variables to a similar scale. It is particularly useful when working with machine learning algorithms that are sensitive to the scale of the input features. One popular scaling technique is Min-Max scaling, also known as normalization.

The MinMaxScaler class in scikit-learn provides an implementation of the Min-Max scaling technique. When you create an instance of MinMaxScaler using

scaler = MinMaxScaler()

it initializes the scaler object.

To perform the feature scaling or normalization, you can use the fit_transform method of the scaler object. In the code

scaled_data = scaler.fit_transform(dataset.values)

the fit_transform method takes the dataset as input and returns the scaled dataset.

Here's a breakdown of the steps involved:

fit_transform: This method fits the scaler to the dataset and then applies the scaling transformation in a single call. Fitting computes the minimum and maximum of each feature; transforming maps each value x to (x - min) / (max - min). The method expects a 2D array-like object of shape (n_samples, n_features). Here dataset is assumed to be a pandas DataFrame, which is why we use dataset.values to access its underlying NumPy array.

scaled_data: The fit_transform method returns the scaled dataset. It is stored in the variable scaled_data for further processing or display purposes.

The resulting scaled_data will contain the scaled values of the features in the dataset, where the values are transformed to a range between 0 and 1. This normalization allows the features to have similar scales, preventing any particular feature from dominating the others during analysis or modeling.
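
To make the arithmetic concrete, here is a minimal pure-Python sketch of the formula that Min-Max scaling applies to a single feature column. The helper name min_max_scale is hypothetical and written only for illustration; it is not part of scikit-learn, and mapping a constant column to 0.0 is an assumption made here to avoid division by zero.

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1] (illustrative helper, not a scikit-learn API)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Assumption: a constant column is mapped to 0.0 to avoid dividing by zero
        return [0.0 for _ in values]
    # Apply (x - min) / (max - min) to every value
    return [(v - lo) / span for v in values]


column = [10, 20, 30, 40, 50]
scaled_column = min_max_scale(column)
print(scaled_column)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

MinMaxScaler performs the same calculation for every column of a 2D array at once.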

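Note that MinMaxScaler does not fill in missing values. In current scikit-learn versions, NaN entries are ignored when the minimum and maximum are computed and are left as NaN in the output, so it is important to handle missing values or perform any necessary data cleaning before applying the scaling operation. Because the transformation depends only on each feature's minimum and maximum, it is also sensitive to outliers.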
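
One possible way to deal with missing values before scaling is to impute them first. The sketch below chains scikit-learn's SimpleImputer with MinMaxScaler using make_pipeline. Mean imputation and the sample values are assumptions chosen for illustration; another strategy may suit your data better.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Sample data with two missing entries (np.nan); the values are illustrative
raw = np.array([[10.0, 5.0],
                [20.0, np.nan],
                [30.0, 10.0],
                [np.nan, 12.0],
                [50.0, 15.0]])

# Fill each missing value with its column mean, then scale to [0, 1]
pipeline = make_pipeline(SimpleImputer(strategy="mean"), MinMaxScaler())
cleaned_scaled = pipeline.fit_transform(raw)

print(np.isnan(cleaned_scaled).any())  # False: no missing values remain
```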

You can access the scaled values of individual features in scaled_data using indexing, such as scaled_data[:, 0] to access the first feature column.
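
As a small illustration of that indexing, the sketch below uses the scaled values from the five-row example further down in this article, typed in by hand as a NumPy array. The variable names first_feature, second_feature and third_row are only illustrative.

```python
import numpy as np

# Scaled values of the five-row example, entered by hand
scaled_data = np.array([[0.0, 0.0],
                        [0.25, 0.2],
                        [0.5, 0.5],
                        [0.75, 0.7],
                        [1.0, 1.0]])

first_feature = scaled_data[:, 0]   # every row, first column
second_feature = scaled_data[:, 1]  # every row, second column
third_row = scaled_data[2]          # both features of the third sample

print(first_feature.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(third_row.tolist())      # [0.5, 0.5]
```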

Overall, using

scaler = MinMaxScaler() and scaled_data = scaler.fit_transform(dataset.values)

allows you to easily apply Min-Max scaling to your dataset and obtain the scaled data for further analysis or modeling.

MinMaxScaler

Step 1: Import the necessary libraries and modules


from sklearn.preprocessing import MinMaxScaler
import numpy as np

Step 2: Create a sample dataset

Let's assume we have a dataset consisting of two numerical features: feature1 and feature2.

dataset = np.array([[10, 5],
                    [20, 7],
                    [30, 10],
                    [40, 12],
                    [50, 15]])

Step 3: Create an instance of MinMaxScaler

scaler = MinMaxScaler()

Step 4: Fit and transform the dataset

scaled_data = scaler.fit_transform(dataset)

The fit_transform method of MinMaxScaler performs two operations in a single call:

Fit: This step calculates the minimum and maximum values for each feature in the dataset. These values are used to determine the scaling transformation.
Transform: This step applies the scaling transformation to the dataset based on the calculated minimum and maximum values.
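
The two steps can also be called separately, which is useful when you want to reuse the minimum and maximum learned from one dataset on data that arrives later. The sketch below assumes the same sample dataset; new_sample is a hypothetical extra row added only for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

dataset = np.array([[10, 5], [20, 7], [30, 10], [40, 12], [50, 15]])

scaler = MinMaxScaler()
scaler.fit(dataset)                      # learn the per-feature min and max only
scaled_data = scaler.transform(dataset)  # apply the learned scaling

# A hypothetical new sample, scaled with the min/max learned above.
# feature1: (60 - 10) / (50 - 10) = 1.25, which falls outside [0, 1]
new_sample = np.array([[60, 5]])
scaled_new = scaler.transform(new_sample)
print(scaled_new)
```

Values outside the fitted range are not clipped by default, so new data can land below 0 or above 1.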

Step 5: View the scaled dataset

Printing the result with print(scaled_data) gives:

[[0.   0.  ]
 [0.25 0.2 ]
 [0.5  0.5 ]
 [0.75 0.7 ]
 [1.   1.  ]]

The scaled dataset contains the scaled values of the features in the range [0, 1]. Each feature is transformed independently.

In this example, feature1 has been scaled from the original range [10, 50] to the scaled range [0.0, 1.0]. Similarly, feature2 has been scaled from the original range [5, 15] to the scaled range [0.0, 1.0].
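
To check where those ranges come from, you can inspect the statistics the scaler stored during fitting. The sketch below reads the data_min_ and data_max_ attributes of the fitted scaler and reproduces the transformation by hand; the variable name manual is only illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

dataset = np.array([[10, 5], [20, 7], [30, 10], [40, 12], [50, 15]])
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(dataset)

# Per-feature statistics learned during fit
print(scaler.data_min_)  # minimum of each column
print(scaler.data_max_)  # maximum of each column

# Reproduce the transformation by hand: (x - min) / (max - min)
manual = (dataset - scaler.data_min_) / (scaler.data_max_ - scaler.data_min_)
print(np.allclose(manual, scaled_data))  # True
```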

Step 6: Inverse transform (optional)

If you want to transform the scaled data back to its original scale, you can use the inverse_transform method of MinMaxScaler. For example:

original_data = scaler.inverse_transform(scaled_data)
print(original_data)

The output will be the original dataset:

[[10.  5.]
 [20.  7.]
 [30. 10.]
 [40. 12.]
 [50. 15.]]

The inverse_transform method reverses the scaling transformation and restores the original values.

In summary, the MinMaxScaler allows you to easily scale or normalize your dataset to a specified range, such as [0, 1]. This scaling can be useful in various scenarios, such as when working with machine learning algorithms that require normalized features or when comparing variables with different scales.
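
The target range is set through the feature_range parameter of MinMaxScaler, which defaults to (0, 1). As a brief sketch, the example below scales the same sample dataset to (-1, 1); that range is an arbitrary choice made for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

dataset = np.array([[10, 5], [20, 7], [30, 10], [40, 12], [50, 15]])

# feature_range sets the target interval; (-1, 1) is an illustrative choice
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled_data = scaler.fit_transform(dataset)

print(scaled_data[:, 0])  # feature1 now spans -1 to 1
```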

Scaling and Normalization using Django

from django.shortcuts import render
from .models import Dataset
from sklearn.preprocessing import MinMaxScaler

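def preprocess_data(request):
    # Retrieve the numeric columns from the database as rows of plain values.
    # Note: feature1 and feature2 are assumed field names on the Dataset model.
    rows = list(Dataset.objects.values_list('feature1', 'feature2'))

    # MinMaxScaler cannot be fitted on an empty array, so default to no rows
    scaled_data = []
    if rows:
        # Perform feature scaling/normalization
        scaler = MinMaxScaler()
        # tolist() converts the NumPy array into nested lists for the template
        scaled_data = scaler.fit_transform(rows).tolist()

    # Render the scaled dataset in a template
    return render(request, 'scaled_data.html', {'scaled_data': scaled_data})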

The scaled_data.html template renders each scaled row as a table row:

<!DOCTYPE html>
<html>
<head>
    <title>Scaled Data</title>
</head>
<body>
    <h1>Scaled Dataset</h1>
    <table>
        <tr>
            <th>Feature 1</th>
            <th>Feature 2</th>
        </tr>
        {% for row in scaled_data %}
            <tr>
                <td>{{ row.0 }}</td>
                <td>{{ row.1 }}</td>
            </tr>
        {% endfor %}
    </table>
</body>
</html>

Another Example

Define a Django model: Create a model in Django that represents your dataset. The model will define the fields and their types for storing the data. For example, let's consider a dataset for storing information about products with price and quantity fields. In your Django app's models.py file, define a model like this:

from django.db import models

class Product(models.Model):
    name = models.CharField(max_length=100)
    price = models.FloatField()
    quantity = models.IntegerField()
    # Add more fields as per your dataset requirements

Apply migrations: Run the following commands in the terminal. The first generates the migration files for your model, and the second applies them to create the necessary database table:

python manage.py makemigrations
python manage.py migrate

Create dataset instances: Now, you can create instances of your dataset by instantiating the model and setting the field values. Here's an example of creating a few instances of the Product dataset:

from your_app.models import Product

def create_dataset():
    # Create instances of the Product dataset
    product1 = Product(name='Product 1', price=100.0, quantity=5)
    product1.save()

    product2 = Product(name='Product 2', price=50.0, quantity=3)
    product2.save()

    product3 = Product(name='Product 3', price=200.0, quantity=8)
    product3.save()

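Perform data scaling and normalization: Once you have the dataset instances, you can perform data scaling and normalization operations. In this example, let's scale the price field using Min-Max scaling to a range between 0 and 1. The quantity field could be scaled the same way, but the scaled values are floats, so they could not be stored back in the IntegerField defined above.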

from sklearn.preprocessing import MinMaxScaler
from your_app.models import Product

def preprocess_dataset():
    # Fetch all instances of the Product dataset once, as a list
    products = list(Product.objects.all())
    if not products:
        return  # nothing to scale

    # Build a 2D array-like: one row per product, one column (the price)
    prices = [[product.price] for product in products]

    # Fit the scaler on all prices together so the minimum and maximum come
    # from the whole column. Fitting on one price at a time would make that
    # price both the minimum and the maximum, and every result would be 0.
    scaler = MinMaxScaler()

    # Create a list to store the scaled values
    scaled_prices = scaler.fit_transform(prices)[:, 0].tolist()

    # Update the instances with the scaled values
    for product, scaled_price in zip(products, scaled_prices):
        product.price = scaled_price

        # Save the updated instance
        product.save()

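In this example, the price field is scaled using the Min-Max scaler from the scikit-learn library. The scaled values are stored in a list and then written back to the respective dataset instances. Note that this overwrites the original prices, so keep a copy of the raw values if you need them later.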
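
If you would rather not overwrite the stored prices, one alternative is to compute the scaled values in memory and leave the database untouched. The sketch below is a hypothetical helper, scale_product_rows, that is not part of Django or scikit-learn. It assumes its input looks like the result of Product.objects.values_list('id', 'price', 'quantity') and uses plain tuples here so that it runs without a database.

```python
from sklearn.preprocessing import MinMaxScaler


def scale_product_rows(rows):
    """Scale (id, price, quantity) rows without touching the database.

    Hypothetical helper: `rows` is assumed to look like the result of
    Product.objects.values_list('id', 'price', 'quantity').
    """
    rows = list(rows)
    if not rows:
        return []  # MinMaxScaler cannot be fitted on an empty dataset

    # Keep the ids aside and scale only the numeric feature columns
    ids = [row[0] for row in rows]
    features = [[row[1], row[2]] for row in rows]

    # One fit over all rows, so min and max come from the whole column
    scaled = MinMaxScaler().fit_transform(features)

    return [
        {"id": pk, "scaled_price": float(p), "scaled_quantity": float(q)}
        for pk, (p, q) in zip(ids, scaled)
    ]


# Plain tuples standing in for database rows (same values as the products above)
sample_rows = [(1, 100.0, 5), (2, 50.0, 3), (3, 200.0, 8)]
result = scale_product_rows(sample_rows)

# Product 2 has the lowest price and quantity, so both of its values scale to 0
print(result[1])
```

The returned dictionaries can be passed to a template or serialized as needed, while the Product rows keep their original values.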
