The fill-mask task in NLP is the task of predicting a missing word or token in a sentence. It is commonly used to evaluate a model's language understanding.
In the fill-mask task, a sentence is provided with one or more masked tokens, represented by a special token such as [MASK]. The goal is to predict the most likely word or token to replace the masked token(s), based on the context of the sentence.
Here's an example to illustrate the fill-mask task:
Input: "I want to [MASK] a new car."
Output: "I want to buy a new car."
In this example, the word "buy" is the correct prediction for the masked token, based on the context of the sentence.
The fill-mask task can be performed using pre-trained masked language models, such as BERT or RoBERTa, which have been trained on large amounts of text data. (GPT-style models predict only the next token, so they are not suited to filling in a mask mid-sentence.) These models have learned to understand the context of words and can generate meaningful predictions for masked tokens.
from transformers import pipeline
unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)
[{'sequence': 'This course will teach you all about mathematical models.',
'score': 0.19619831442832947,
'token': 30412,
'token_str': ' mathematical'},
{'sequence': 'This course will teach you all about computational models.',
'score': 0.04052725434303284,
'token': 38163,
'token_str': ' computational'}]
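Each pipeline result is a plain Python dict, so post-processing needs nothing beyond the standard library. As a minimal sketch, the helper below (a hypothetical best_fill function, not part of transformers) picks the highest-scoring completion from a result list shaped like the output above; in a live run you would pass the list returned by unmasker(...) instead of the sample data:

```python
def best_fill(results):
    """Return the highest-scoring sequence and its score from fill-mask output."""
    top = max(results, key=lambda r: r["score"])
    return top["sequence"], top["score"]

# Sample data copied from the pipeline output shown above
sample_results = [
    {"sequence": "This course will teach you all about mathematical models.",
     "score": 0.19619831442832947, "token": 30412, "token_str": " mathematical"},
    {"sequence": "This course will teach you all about computational models.",
     "score": 0.04052725434303284, "token": 38163, "token_str": " computational"},
]

sentence, score = best_fill(sample_results)
print(sentence)  # the highest-scoring completion
```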
Implementing Fill-Mask in Django
Here's example code that performs fill-mask using the Hugging Face transformers library and tokenizer in a Django view, where the input text is provided dynamically via a POST request:
Step 1: Install the required dependencies
pip install transformers torch django requests
Step 2: Import the necessary libraries and modules in your Django views.py file
from django.http import JsonResponse
from transformers import pipeline, AutoTokenizer
Step 3: Create a Django view to handle the POST request
from django.views.decorators.csrf import csrf_exempt

# Load the tokenizer and pipeline once at module import time,
# not on every request (model loading is slow)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
fill_mask = pipeline("fill-mask", model="bert-base-uncased", tokenizer=tokenizer)

@csrf_exempt  # for demonstration only; handle CSRF properly in production
def fill_mask_view(request):
    if request.method != "POST":
        return JsonResponse({"error": "POST required"}, status=405)

    # Retrieve the input text from the POST request
    input_text = request.POST.get("text")
    if not input_text:
        return JsonResponse({"error": "missing 'text' field"}, status=400)

    # Perform fill-mask prediction
    results = fill_mask(input_text)

    # Keep only the predicted token and its score
    predictions = [
        {"token": result["token_str"], "score": result["score"]}
        for result in results
    ]

    # Return the predictions as a JSON response
    return JsonResponse({"predictions": predictions})
Step 4: Map the Django URL to the view
from django.urls import path
from .views import fill_mask_view

urlpatterns = [
    path('fill-mask/', fill_mask_view, name='fill-mask'),
]
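For the route above to be reachable, the app's urls.py also has to be included from the project-level urls.py. A minimal sketch, assuming the app is named myapp (a placeholder; use your actual app name):

```python
# project-level urls.py ('myapp' is a placeholder app name)
from django.urls import include, path

urlpatterns = [
    path('', include('myapp.urls')),
]
```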
Step 5: Send a POST request to the Django server with the input text
import requests

url = 'http://localhost:8000/fill-mask/'
input_text = 'The [MASK] is blue.'

response = requests.post(url, data={'text': input_text})
predictions = response.json()['predictions']

for prediction in predictions:
    print(f"Token: {prediction['token']}")
    print(f"Score: {prediction['score']}")
    print()
In this example, the Django view fill_mask_view handles the POST request with the input text, runs the fill-mask pipeline on it, and returns the predicted tokens and their scores as a JSON response.
When you send a POST request to http://localhost:8000/fill-mask/ with the input text "The [MASK] is blue.", the Django server will respond with a JSON object containing the predicted tokens and their scores. The example code above then prints the predictions to the console.
Note: Make sure you have a Django server running to handle the requests. Adjust the URL and port number in the code according to your Django server configuration.
Output:
Token: sky
Score: 0.36311510276794434
Token: color
Score: 0.12426304870891571
Token: water
Score: 0.07195656740617752
Token: sky's
Score: 0.057355597913980484
Token: ocean
Score: 0.051273450702667236