How to scrape results after selecting a checkbox using Selenium (Python)

How to find the search input element by ID
How to send keys to the search input element
How to click the search (submit) button by XPath after sending keys
How to select a checkbox by its ID using XPath
How to scrape data using CLASS_NAME
How to scrape data from a span class using XPath
How to select a checkbox via its icon tag using XPath and full XPath
How to select a checkbox via its span tag using XPath
How to select the span element of an li class
How to read a span's aria-label value using get_attribute
How to take the first three characters of a string

Learning Point
How to apply list slicing in a for loop to limit the number of records processed (e.g., the first 10 of 100), as shown in the sketch below
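For example, slicing inside the loop header limits how many records are processed; a minimal sketch, with a plain Python list standing in for a list of scraped elements:

records = list(range(100))        # stand-in for a list of scraped elements
for record in records[0:10]:      # the slice keeps only the first 10 records
    print(record)
# records[10:100] would process items 11 through 100 the same way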

Task: Go to the webpage https://www.amazon.in/, enter "Laptop" in the search field, and click the search icon. Then set the CPU Type filter to "Intel Core i7":

Step 1: Install Selenium via pip

pip install --upgrade selenium

Step 2: Import Required Libraries

import time
import warnings

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

warnings.filterwarnings('ignore')

Step 3: Start a WebDriver Instance

driver = webdriver.Chrome()
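Optionally, Chrome can be started headless (no visible window) by passing ChromeOptions; a minimal sketch, not required for the steps below:

from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")   # run Chrome without opening a window
driver = webdriver.Chrome(options=options)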

Step 4: Navigate to a Webpage
Navigate to the webpage where you want to interact with the search input bar:

driver.get('https://www.amazon.in/')

Step 5: Find the Search Input Element

input_field = driver.find_element(By.ID, "twotabsearchtextbox")

Step 6: Send "Laptop" to the input element, then click the search button

input_field.send_keys("Laptop")
driver.find_element(By.XPATH, "//*[@id='nav-search-submit-button']").click()
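Alternatively, the search can be submitted directly from the input field with the RETURN key, without locating the button; a minimal sketch:

from selenium.webdriver.common.keys import Keys

input_field.send_keys("Laptop")
input_field.send_keys(Keys.RETURN)   # submits the search form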

Step 7: Select the CPU Type checkbox by its ID using XPath, then click

driver.find_element(By.XPATH, "//*[@id='p_n_feature_thirteen_browse-bin/12598163031']/span/a/span").click()

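Because the filter sidebar loads dynamically, the click can fail if it runs before the checkbox exists. A minimal sketch with an explicit wait, reusing the same XPath as above:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)   # wait up to 10 seconds for the element
checkbox = wait.until(EC.element_to_be_clickable(
    (By.XPATH, "//*[@id='p_n_feature_thirteen_browse-bin/12598163031']/span/a/span")))
checkbox.click()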

Step 8: Create empty lists for the results

title = []
price = []
rating = []

Step 9: Scrape all the titles

all_title = driver.find_elements(By.CLASS_NAME, "a-size-medium.a-color-base.a-text-normal")
all_title

for alltitles in all_title[0:10]:
    title.append(alltitles.text)
title
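Note that the dotted value above is really three class names. Selenium translates CLASS_NAME into a CSS selector internally, which is why the compound value works; the equivalent, more explicit query is:

all_title = driver.find_elements(By.CSS_SELECTOR, ".a-size-medium.a-color-base.a-text-normal")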


Step 10: Scrape all the prices

all_price = driver.find_elements(By.CLASS_NAME, "a-price-whole")
all_price

for allprice in all_price[0:10]:
    price.append(allprice.text)
price


Step 11: Scrape all the ratings

all_rating = driver.find_elements(By.XPATH, '//span[@class="a-size-base puis-bold-weight-text"]')
all_rating

for rating_element in all_rating[0:10]:
    rating.append(rating_element.text)
rating


=====================================
In this question you have to scrape data using the filters available on the webpage. You have to use the location and salary filters. You have to scrape data for the "Data Scientist" designation for the first 10 job results: job-title, job-location, company_name, and experience_required. The location filter to be used is "Delhi/NCR" and the salary filter is "3-6 Lakhs". The task is done in the steps below:
First, get the webpage https://www.naukri.com/
Enter "Data Scientist" in the "Skill, Designations, Companies" field.
Then click the search button.
Then apply the location filter and the salary filter by checking the respective boxes.
Then scrape the data for the first 10 job results you get.
Finally, create a dataframe of the scraped data.

# Activating the Chrome browser
# Selenium 4 no longer accepts a bare driver path; wrap it in a Service object
from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service(r"C:\Users\HP\Desktop\Fliprobo notes\chromedriver.exe"))
# Opening the homepage-
driver.get("https://www.naukri.com/")

How to select a checkbox via its icon tag using XPath and full XPath
How to select a checkbox via its span tag using XPath

# Entering the designation in the search box-
designation = driver.find_element(By.CLASS_NAME, "suggestor-input")
designation.send_keys('Data Scientist')

# Clicking on the search button-
search = driver.find_element(By.CLASS_NAME, "qsbSubmit")
search.click()

# Setting the location filter-
location_delhi = driver.find_element(By.XPATH, "/html/body/div[1]/div[3]/div[2]/section[1]/div[2]/div[5]/div[2]/div[3]/label/i")
location_delhi.click()
=== OR ===

location_delhi = driver.find_element(By.XPATH, "//*[@id='search-result-container']/div[1]/div[1]/div/div/div[2]/div[5]/div[2]/div[3]/label/i")
location_delhi.click()




# Setting the salary filter-
salary_filter = driver.find_element(By.XPATH, "/html/body/div[1]/div[3]/div[2]/section[1]/div[2]/div[6]/div[2]/div[2]/label/p/span[1]")
salary_filter.click()
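Full XPaths like the one above break whenever the page layout shifts. If the visible label text is stable, a text-based XPath is usually more robust; a sketch, assuming the checkbox label reads "3-6 Lakhs":

salary_filter = driver.find_element(By.XPATH, "//span[text()='3-6 Lakhs']")
salary_filter.click()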
# Creating empty lists for the different attributes-
job_title = []
job_location = []
company_name = []
exp_Reqd = []
# Scraping data for the different attributes (the range allows multiple pages;
# here only the first page is needed, so the loop runs once)-
start = 0
end = 1
for page in range(start, end):
    title = driver.find_elements(By.XPATH, "//a[@class='title fw500 ellipsis']")
    for i in title[0:10]:
        job_title.append(i.text)
    location = driver.find_elements(By.XPATH, "//li[@class='fleft grey-text br2 placeHolderLi location']//span")
    for i in location[0:10]:
        job_location.append(i.text)
    company = driver.find_elements(By.XPATH, "//a[@class='subTitle ellipsis fleft']")
    for i in company[0:10]:
        company_name.append(i.text)
    experience = driver.find_elements(By.XPATH, "//li[@class='fleft grey-text br2 placeHolderLi experience']")
    for i in experience[0:10]:
        exp_Reqd.append(i.text)
    # "Next" button, only needed when scraping more than one page
    next_button = driver.find_elements(By.XPATH, "//a[@class='_1LKTO3']")
# Checking the length of all attribute lists-
print(len(job_title), len(job_location), len(company_name), len(exp_Reqd))
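If the four lengths differ, the dataframe step below raises a ValueError, so it can help to trim every list to the shortest length first; a minimal sketch:

n = min(len(job_title), len(job_location), len(company_name), len(exp_Reqd))
job_title, job_location = job_title[:n], job_location[:n]
company_name, exp_Reqd = company_name[:n], exp_Reqd[:n]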
# Creating the dataframe-
df = pd.DataFrame({'Job_title': job_title, 'Job_location': job_location, 'Company_name': company_name, 'Experience': exp_Reqd})
df

========================================================
Q7: Go to the webpage https://www.amazon.in/, enter "Laptop" in the search field, and click the search icon. Then set the CPU Type filter to "Intel Core i7".
After setting the filter, scrape the data for the first 10 laptops. You have to scrape 3 attributes for each laptop:

  1. Title
  2. Ratings
  3. Price
# Activating the Chrome browser
driver = webdriver.Chrome()
# Opening the homepage-
driver.get("https://www.amazon.in/")

# Entering "Laptop" in the search box-
laptop = driver.find_element(By.XPATH, "//div[@class='nav-search-field ']//input")
laptop.send_keys('Laptop')

# Clicking on the search button-
search = driver.find_element(By.XPATH, "/html/body/div[1]/header/div/div[1]/div[2]/div/form/div[3]/div/span/input")
search.click()
# Code to filter for i7 CPUs-
cpu_filter1 = driver.find_element(By.XPATH, "/html/body/div[1]/div[2]/div[1]/div[2]/div/div[3]/span/div[1]/div/div/div[5]/ul[6]/li[12]/span/a/div/label/i")
cpu_filter1.click()
# Creating empty lists for the different attributes-
Title = []
price = []
Rating = []   # raw aria-label strings; parsed ratings are collected in `rating` below
start = 0
end = 1
for page in range(start, end):
    title = driver.find_elements(By.XPATH, "//h2[@class='a-size-mini a-spacing-none a-color-base s-line-clamp-2']")
    for i in title[0:10]:
        Title.append(i.text)
    prices = driver.find_elements(By.XPATH, '//span[@class="a-price-whole"]')
    for i in prices[0:10]:
        price.append(i.text)

How to read a span's aria-label value using get_attribute
How to take the first three characters of a string

titles_rat = driver.find_elements(By.XPATH, "//div[@class='a-row a-size-small']/span")
for i in titles_rat:
    Rating.append(i.get_attribute("aria-label"))

# Each product contributes two spans (the rating and the review count), so keep
# every second aria-label and take its first three characters, e.g. "4.5"
rating = []
for i in range(0, min(len(Rating), 20), 2):
    if Rating[i]:
        rating.append(Rating[i][0:3])
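To finish the task, the three lists can be combined into a dataframe the same way as in the Naukri example; a minimal sketch, trimming to the shortest list in case the counts differ:

n = min(len(Title), len(price), len(rating))
laptops = pd.DataFrame({'Title': Title[:n], 'Price': price[:n], 'Rating': rating[:n]})
laptops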
