Debug School

rakesh kumar

Real-Time Applications of NLP Using the TextRank Pipeline

Real-Time Applications of TextRank
Extractive Text Summarization: Uses TextRank to extract the most relevant sentences from the input text for a summary.
Keyphrase Extraction: Extracts the top-ranked keyphrases based on TextRank scoring.
Automatic Title Generation: Generates a title using the top-ranked phrase.
Topic Modeling: Identifies key topics in the text based on the highest-ranked phrases.
Document Similarity: Compares two texts to measure their semantic similarity.
Question Generation: Generates simple questions from the top-ranked phrases.
Content Recommendation: Recommends similar content from a corpus by calculating similarity.
Text Highlighting: Highlights the top-ranked phrases, useful for summarization or analysis tools.
Summary Evaluation: Compares a summary with the original text for semantic similarity.
Educational Content Creation: Generates educational questions and key phrases from the input text.

Code for Real-Time NLP Applications
Prerequisites

Install SpaCy and PyTextRank:
pip install spacy pytextrank
python -m spacy download en_core_web_sm
The script below implements the ten applications listed above:

import spacy
import pytextrank  # Importing pytextrank registers the "textrank" pipeline component

# Load SpaCy model and add TextRank
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textrank", last=True)

# 1. Extractive Text Summarization
def extractive_summarization(text):
    doc = nlp(text)
    return "\n".join([sent.text for sent in doc._.textrank.summary(limit_phrases=10, limit_sentences=3)])

# 2. Keyphrase Extraction
def keyphrase_extraction(text):
    doc = nlp(text)
    return [phrase.text for phrase in doc._.phrases[:10]]

# 3. Automatic Title Generation
def generate_title(text):
    doc = nlp(text)
    if doc._.phrases:
        return doc._.phrases[0].text  # Top-ranked phrase as title
    return "No title generated."

# 4. Topic Modeling
def topic_modeling(text):
    doc = nlp(text)
    return [phrase.text for phrase in doc._.phrases[:5]]  # Top 5 phrases as topics

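# 5. Document Similarity
# Note: en_core_web_sm ships without word vectors, so similarity() falls back to
# context tensors and spaCy warns that the scores may be unreliable. A model with
# vectors, such as en_core_web_md, gives more meaningful results.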
def document_similarity(text1, text2):
    doc1 = nlp(text1)
    doc2 = nlp(text2)
    return doc1.similarity(doc2)

# 6. Question Generation
def question_generation(text):
    doc = nlp(text)
    questions = []
    for phrase in doc._.phrases[:5]:
        questions.append(f"What is {phrase.text}?")
    return questions

# 7. Content Recommendation
def content_recommendation(text, corpus):
    doc = nlp(text)
    recommendations = []
    for item in corpus:
        doc_item = nlp(item)
        similarity = doc.similarity(doc_item)
        recommendations.append((item, similarity))
    return sorted(recommendations, key=lambda x: x[1], reverse=True)[:3]  # Top 3 recommendations

# 8. Text Highlighting
def text_highlighting(text):
    doc = nlp(text)
    highlights = [{"phrase": phrase.text, "rank": phrase.rank} for phrase in doc._.phrases[:5]]
    return highlights

# 9. Summary Evaluation
def summary_evaluation(summary, original_text):
    summary_doc = nlp(summary)
    original_doc = nlp(original_text)
    return summary_doc.similarity(original_doc)

# 10. Educational Content Creation
def create_educational_content(text):
    doc = nlp(text)
    key_phrases = [phrase.text for phrase in doc._.phrases[:5]]
    questions = [f"What is {phrase}?" for phrase in key_phrases]
    return {"key_phrases": key_phrases, "questions": questions}

# Example Usage
if __name__ == "__main__":
    sample_text = (
        "Artificial intelligence is transforming the world. "
        "Applications such as chatbots, recommendation systems, and self-driving cars "
        "are redefining industries. The potential for AI is immense, ranging from healthcare "
        "to finance and beyond."
    )

    sample_corpus = [
        "Machine learning is a subset of AI that focuses on data-driven algorithms.",
        "Self-driving cars use AI to navigate and make decisions.",
        "Chatbots powered by natural language processing are revolutionizing customer service.",
    ]

    print("1. Extractive Text Summarization:\n", extractive_summarization(sample_text))
    print("\n2. Keyphrase Extraction:\n", keyphrase_extraction(sample_text))
    print("\n3. Automatic Title Generation:\n", generate_title(sample_text))
    print("\n4. Topic Modeling:\n", topic_modeling(sample_text))
    print("\n5. Document Similarity:\n", document_similarity(sample_text, sample_corpus[0]))
    print("\n6. Question Generation:\n", question_generation(sample_text))
    print("\n7. Content Recommendation:\n", content_recommendation(sample_text, sample_corpus))
    print("\n8. Text Highlighting:\n", text_highlighting(sample_text))
    print("\n9. Summary Evaluation:\n", summary_evaluation(extractive_summarization(sample_text), sample_text))
    print("\n10. Educational Content Creation:\n", create_educational_content(sample_text))
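The text_highlighting function in the script returns the top phrases with their ranks but does not mark them in the text itself. The marking step could be sketched as follows. This is only an illustration: highlight_phrases and the [[ ]] markers are hypothetical names, and the phrase list is assumed to come from doc._.phrases.

```python
import re


def highlight_phrases(text, phrases, open_mark="[[", close_mark="]]"):
    """Wrap whole-word occurrences of each phrase in marker strings."""
    # Drop empty strings and try longer phrases first, so "machine learning"
    # wins over a shorter phrase such as "learning".
    ordered = sorted({p for p in phrases if p}, key=len, reverse=True)
    if not ordered:
        return text
    alternatives = "|".join(re.escape(p) for p in ordered)
    # The lookarounds stop a short phrase like "AI" from matching inside "maintain".
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)
    return pattern.sub(lambda m: f"{open_mark}{m.group(0)}{close_mark}", text)


sample = "Self-driving cars use AI. Chatbots also use AI."
print(highlight_phrases(sample, ["Self-driving cars", "chatbots"]))
```

In a web tool the markers could be swapped for HTML tags by passing different open_mark and close_mark values.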

Using TextRank for Automatic Keyword Extraction
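TextRank is an unsupervised, graph-based algorithm for keyword and phrase extraction. It links candidate words that occur near each other in the text, then scores them with PageRank, so a word or phrase ranks higher when it is connected to other well-connected terms in the document.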
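To make the ranking idea concrete, here is a minimal sketch of TextRank for single words using only the standard library. It is a simplification and not how PyTextRank is implemented: PyTextRank works on lemmas, part-of-speech tags and noun chunks, while this sketch assumes a tiny hand-written stopword list, a fixed co-occurrence window, and a plain PageRank iteration. The name textrank_keywords and all parameters are hypothetical.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real pipelines filter by POS tag).
STOPWORDS = {"a", "an", "and", "are", "as", "for", "in", "is", "of", "on", "the", "to", "with"}


def textrank_keywords(text, window=3, damping=0.85, iterations=50, top_n=5):
    """Rank single words with a simplified TextRank; returns (word, score) pairs."""
    # Candidate words: lowercase alphabetic tokens that are not stopwords.
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

    # Undirected co-occurrence graph: link each word to the words that follow it
    # inside a sliding window of `window` tokens (the word itself included).
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank by repeated updates: each word receives an equal share of the
    # score of every neighbor, damped towards a constant baseline.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


sample = (
    "AI and machine learning are subsets of data science. "
    "Neural networks are widely used in AI applications."
)
for word, score in textrank_keywords(sample):
    print(f"{word}: {score:.3f}")
```

Words that co-occur with many different neighbors end up with the highest scores, which is the same intuition the library applies to phrases.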

Implementation Using SpaCy and PyTextRank
import spacy
import pytextrank  # Install using pip install pytextrank
from spacy.language import Language

# Load SpaCy model
nlp = spacy.load("en_core_web_sm")

# Add PyTextRank to the SpaCy pipeline
nlp.add_pipe("textrank", last=True)

# Define a custom keyword extractor component.
# spaCy 3 expects function components to be registered under a name.
@Language.component("keyword_extractor")
def keyword_extractor(doc):
    # Extract the top-ranked phrases
    keywords = [phrase.text for phrase in doc._.phrases[:10]]  # Top 10 phrases
    # Count case-insensitive occurrences of each keyword in the document text
    doc_text = doc.text.lower()
    keyword_count = {keyword: doc_text.count(keyword.lower()) for keyword in keywords}
    print("Extracted Keywords:", keywords)
    print("Keyword Counts:", keyword_count)
    return doc

# Add the custom extractor to the pipeline by its registered name
nlp.add_pipe("keyword_extractor", last=True)

# Test the pipeline
text = "AI and machine learning are subsets of data science. Neural networks are widely used in AI applications, including image recognition and natural language processing."
doc = nlp(text)
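The keyword counts above rely on plain substring matching, so a short keyword such as "AI" is also counted inside longer words. A whole-word count could be sketched like this; count_phrase is a hypothetical helper, not part of spaCy or PyTextRank.

```python
import re


def count_phrase(text, phrase):
    """Count whole-word, case-insensitive occurrences of `phrase` in `text`."""
    if not phrase:
        return 0
    # Lookarounds require a non-word character (or the text boundary) on both sides.
    pattern = re.compile(r"(?<!\w)" + re.escape(phrase) + r"(?!\w)", re.IGNORECASE)
    return len(pattern.findall(text))


text = "AI is used to maintain trains. AI applications keep growing."
print("Substring count:", text.lower().count("ai"))  # also matches inside "maintain" and "trains"
print("Whole-word count:", count_phrase(text, "AI"))
```

Inside the keyword_extractor component, the same helper could replace the call to count() if exact word matches are what you need.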

Summarization

import spacy
import pytextrank  # Install using pip install pytextrank

# Load SpaCy model
nlp = spacy.load("en_core_web_sm")

# Add PyTextRank to the pipeline for keyword extraction
nlp.add_pipe("textrank", last=True)

def summarize_with_keywords(text, num_sentences=3):
    """
    Summarize text by extracting keywords and selecting sentences containing them.

    Args:
        text (str): Input text to summarize.
        num_sentences (int): Number of sentences to include in the summary.

    Returns:
        str: A concise summary of the text.
    """
    # Process the text
    doc = nlp(text)

    # Extract top-ranked keywords
    keywords = [phrase.text for phrase in doc._.phrases[:10]]  # Top 10 phrases
    print("Extracted Keywords:", keywords)

    # Rank sentences based on the presence of keywords
    sentence_scores = []
    for sent in doc.sents:
        score = sum(1 for keyword in keywords if keyword.lower() in sent.text.lower())
        sentence_scores.append((score, sent.text))

    # Sort sentences by score and select the top ones
    ranked_sentences = sorted(sentence_scores, key=lambda x: x[0], reverse=True)
    summary_sentences = [sent for _, sent in ranked_sentences[:num_sentences]]

    # Join the top sentences to form the summary
    summary = " ".join(summary_sentences)
    return summary

# Example text
text = """
Artificial Intelligence (AI) and Machine Learning are transforming industries worldwide.
Data-driven approaches have enabled companies like Google and Microsoft to innovate in various fields.
Neural networks, a subset of machine learning, are crucial for applications such as image recognition and natural language processing.
The increasing importance of data has also spurred research in data privacy and ethical AI practices.
"""

# Generate the summary
summary = summarize_with_keywords(text, num_sentences=2)
print("Summary:")
print(summary)
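Because summarize_with_keywords sorts sentences by score, the summary can present them in a different order than the source text. A possible variation, sketched here on plain strings so it runs without spaCy, picks the top sentences and then restores their document order. The helper name select_sentences is hypothetical, and the keyword list is assumed to come from doc._.phrases.

```python
def select_sentences(sentences, keywords, num_sentences=2):
    """Pick the highest-scoring sentences, then return them in document order."""
    lowered = [k.lower() for k in keywords]
    # Score each sentence by how many keywords it contains, remembering its position.
    scored = [
        (sum(1 for k in lowered if k in sentence.lower()), index)
        for index, sentence in enumerate(sentences)
    ]
    # Highest score first; on ties the earlier sentence wins.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:num_sentences]
    # Restore the original order of the chosen sentences.
    chosen = sorted(index for _, index in top)
    return [sentences[i] for i in chosen]


sentences = [
    "Intro about cats.",
    "Dogs and data.",
    "AI and data privacy matter.",
]
print(" ".join(select_sentences(sentences, ["data", "AI", "privacy"])))
```

With spaCy, the sentences argument could be built as [sent.text for sent in doc.sents].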

Example Output
Input Text:

Artificial Intelligence (AI) and Machine Learning are transforming industries worldwide.
Data-driven approaches have enabled companies like Google and Microsoft to innovate in various fields.
Neural networks, a subset of machine learning, are crucial for applications such as image recognition and natural language processing.
The increasing importance of data has also spurred research in data privacy and ethical AI practices.

Extracted Keywords (illustrative; the exact phrases and their order depend on the model version):

['machine learning', 'Artificial Intelligence', 'neural networks', 'data', 'AI', 'data privacy', 'image recognition']

Summary:

Artificial Intelligence (AI) and Machine Learning are transforming industries worldwide. Neural networks, a subset of machine learning, are crucial for applications such as image recognition and natural language processing.
