Debug School

rakesh kumar
rakesh kumar

Posted on • Updated on

How to scrape results from multiple pages after click anchor tag element using Selenium python

How to select anchor tag text element and click using By.XPATH

How to scrape results from from a web page by repeatedly extracting elements with a specific class and clicking a "Load More" button

How to select elements By CLASS_NAME

How to select elements By CLASS_NAME using XPATH

Learning Points
click the element that loads more items using LINK_TEXT

How to apply while loop length of the quote list upto desirded count

Step 1: install selenium using python

pip install --upgrade selenium
Enter fullscreen mode Exit fullscreen mode

Step 2: Import Required Libraries

import pandas as pd
from selenium import webdriver
import warnings
warnings.filterwarnings('ignore')
from selenium.webdriver.common.by import By
import time
Enter fullscreen mode Exit fullscreen mode

Step 3: Start a WebDriver Instance

driver=webdriver.Chrome()
Enter fullscreen mode Exit fullscreen mode

Step 4: Navigate to a Webpage
Navigate to the webpage where you want to interact with the search input bar:

driver.get('https://www.azquotes.com/')
Enter fullscreen mode Exit fullscreen mode

Step 5: click anchor tag text Element

target_href = '/top_quotes.html'

# Locate the anchor tag by its href attribute
anchor_element = driver.find_element(By.XPATH, f'//a[@href="{target_href}"]')

# Click the anchor tag (or perform any other action)
anchor_element.click()
Enter fullscreen mode Exit fullscreen mode

Step 8: Scrape the all quote
How to scrape results from from a web page by repeatedly extracting elements with a specific class and clicking a "Load More" button
Step 1:declare variable desired number of data points to collect
Step 2:declare variable as desired count to desired number of data points to collect
Step 3:Sets up a while loop that will continue executing as long as the length of the quote list is less than or equal to the desired count(above declared variable)
step4: collect all title By class_name
step5:store all data in list using append
step6:click the element that loads more items using LINK_TEXT
step7:Use of list slicing to get records upto above declared variable desired_count

quote = []

# Define the number of data points you want to collect
desired_count = 1000
while len(quote) <= desired_count:
    # Find all elements with class "_2WkVRV" on the current page
    all_quote_elements = driver.find_elements(By.CLASS_NAME, "title")

    # Extract the text values and add them to the data list
    for element in all_quote_elements:
        quote.append(element.text)


    # Try to find and click the element that loads more items (e.g., a "Load More" button)
    try:
        load_more_button = driver.find_element(By.LINK_TEXT, 'Next →')
        load_more_button.click()
    except:
        # If there are no more items to load, exit the loop
        break 


quote = quote[:desired_count]
quote
Enter fullscreen mode Exit fullscreen mode

output

Image description

Step 9: Scrape the all author

driver=webdriver.Chrome()
driver.get('https://www.azquotes.com/')
target_href = '/top_quotes.html'

# Locate the anchor tag by its href attribute
anchor_element = driver.find_element(By.XPATH, f'//a[@href="{target_href}"]')
anchor_element.click()
# Click the anchor tag (or perform any other action)

data_list=[]

# Define the number of data points you want to collect
desired_count = 1000
while len(quote) <= desired_count:
    # Find all elements with class "_2WkVRV" on the current page
    all_quote_elements = driver.find_elements(By.CLASS_NAME, "title")


    current_data = driver.find_elements(By.XPATH, '//div[@class="author"]/a')

    # Extract the text from each data point and add it to the list
    for data_point in current_data:
        data_list.append(data_point.text)  

     # Extract the text from each data point and add it to the list

    # Try to find and click the element that loads more items (e.g., a "Load More" button)
    next_button = None
    try:
        next_button = driver.find_element(By.LINK_TEXT, 'Next →')
    except:
        pass

    if next_button:
        next_button.click()
    else:
        # If there's no "Next" button or it's not clickable, exit the loop
        break


data_list = data_list[:desired_count]
data_list
Enter fullscreen mode Exit fullscreen mode

output

Image description

Step 10: Scrape the all description

driver=webdriver.Chrome()
driver.get('https://www.azquotes.com/')
target_href = '/top_quotes.html'

# Locate the anchor tag by its href attribute
anchor_element = driver.find_element(By.XPATH, f'//a[@href="{target_href}"]')
anchor_element.click()
desc=[]
# Define the number of data points you want to collect
desired_count = 1000
while len(quote) <= desired_count:
    # Find all elements with class "_2WkVRV" on the current page

    tags_div = driver.find_elements(By.CLASS_NAME, 'tags')    
     # Extract the text from each data point and add it to the list
    for description in tags_div:
        anchor_tags = description.find_elements(By.TAG_NAME, 'a')


        tag_text = ', '.join(tag.text for tag in anchor_tags)
        desc.append(tag_text)
    # Try to find and click the element that loads more items (e.g., a "Load More" button)
    next_button = None
    try:
        next_button = driver.find_element(By.LINK_TEXT, 'Next →')
    except:
        pass

    if next_button:
        next_button.click()
    else:
        # If there's no "Next" button or it's not clickable, exit the loop
        break


desc = desc[:desired_count]
desc
Enter fullscreen mode Exit fullscreen mode

output

Image description

step11: Make a dataframe

Image description

Top comments (0)