Debug School

rakesh kumar

How to Calculate the Details of the Highest-Selling Novels from a Table While Scraping Data

To calculate the details of the highest-selling novels from a table while scraping using Selenium, you need to extract the data from the table, parse the sales numbers, and then determine which novels have the highest sales. Here's an example of how to do this using Python and Selenium:

Set Up Selenium and Open the Web Page:

Make sure you have Selenium and the appropriate WebDriver (e.g., ChromeDriver) installed on your system.

from selenium import webdriver

# Initialize the WebDriver (in this case, Chrome)
driver = webdriver.Chrome()

# Navigate to the web page with the table
driver.get("https://example.com/novels")

Locate the Table and Extract Data:

Identify the table on the web page and extract the data, including sales numbers, into a list of dictionaries.

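from selenium.webdriver.common.by import By

novel_details = []

# Selenium 4 removed the find_element_by_* helpers; use find_element(By.XPATH, ...) instead
table = driver.find_element(By.XPATH, "//table[@class='novel-table']")

for row in table.find_elements(By.XPATH, ".//tr"):
    # Header rows use <th> cells, so they yield no <td> cells and are skipped below
    cells = row.find_elements(By.XPATH, ".//td")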

    if len(cells) == 4:  # Assuming there are four columns in each row
        title = cells[0].text
        author = cells[1].text
        sales = int(cells[2].text.replace(",", ""))  # Remove commas and convert to an integer
        publication_date = cells[3].text

        novel_details.append({
            "Title": title,
            "Author": author,
            "Sales": sales,
            "Publication Date": publication_date
        })
Enter fullscreen mode Exit fullscreen mode
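Real tables often contain cells that are not clean numbers, and int() raises a ValueError on them. The sketch below shows one possible way to make the conversion more tolerant. The helper name parse_sales is hypothetical and not part of the original code, and the approach assumes sales are written as whole numbers with optional comma separators. A value such as "1.5 million" would not be handled correctly.

```python
import re
from typing import Optional


def parse_sales(text: str) -> Optional[int]:
    """Return the first whole number found in text, or None if there is none."""
    # Match a digit followed by any run of digits and commas, e.g. "1,234,567"
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    # Drop the thousands separators before converting to an integer
    return int(match.group().replace(",", ""))


print(parse_sales("1,234,567"))
print(parse_sales(" 85,000 copies"))
print(parse_sales("N/A"))
```

Inside the scraping loop you could call this helper on cells[2].text and skip the row when it returns None.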

Calculate the Highest Selling Novels:

You can calculate the novels with the highest sales by sorting the list of dictionaries based on the "Sales" value and selecting the top entries.

# Sort the novel_details list by sales in descending order
novel_details.sort(key=lambda x: x["Sales"], reverse=True)

# Choose how many top-selling novels you want
num_top_sellers = 5
top_sellers = novel_details[:num_top_sellers]

# Print or use the top-selling novels
for i, novel in enumerate(top_sellers):
    print(f"{i + 1}. Title: {novel['Title']}, Sales: {novel['Sales']}")
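If you only need a few top entries, you do not have to sort the whole list. As an alternative to the approach above, the following sketch uses heapq.nlargest from the standard library. The function name top_sellers and the sample rows are placeholders made up for illustration. Only the "Title" and "Sales" keys match the dictionaries built earlier.

```python
import heapq

# Placeholder rows in the same shape as the scraped dictionaries
sample_novels = [
    {"Title": "Novel A", "Sales": 120},
    {"Title": "Novel B", "Sales": 450},
    {"Title": "Novel C", "Sales": 300},
    {"Title": "Novel D", "Sales": 200},
]


def top_sellers(novels, n):
    """Return the n novels with the highest "Sales", highest first."""
    # nlargest leaves the input list unchanged and returns a new list
    return heapq.nlargest(n, novels, key=lambda novel: novel["Sales"])


for rank, novel in enumerate(top_sellers(sample_novels, 2), start=1):
    print(f"{rank}. Title: {novel['Title']}, Sales: {novel['Sales']}")
```

Unlike list.sort(), this leaves the original list in its scraped order, which can be useful if you still need that order later.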

Close the WebDriver:

Don't forget to close the WebDriver once you've finished scraping and calculating the data.

driver.quit()

Another Way

To calculate the details of the highest selling novels from a table while scraping with Selenium, you can follow these steps:

  1. Scrape the details of the novels from the table using Selenium. You can use the code in the previous example to do this.
  2. Sort the scraped data by novel sales in descending order. You can use the list's sort() method to do this.
  3. Calculate the total sales of all the novels. You can use the built-in sum() function to do this.
  4. Calculate the average sales of all the novels. You can divide the total by the number of novels, or use statistics.mean().
  5. Calculate the median sales of all the novels. You can use statistics.median() to do this.

Here is an example of how to calculate the details of the highest-selling novels from a table while scraping with Selenium in Python:

import statistics

def calculate_highest_selling_novels(highest_selling_novels):
  """Calculates the details of the highest selling novels from a list of novels.

  Args:
    highest_selling_novels: A list of novels, where each novel is a dictionary with the following keys:
      * novel_name: The name of the novel.
      * novel_author: The author of the novel.
      * novel_sales: The sales of the novel.

  Returns:
    A dictionary with the following keys:
      * highest_selling_novel: The highest selling novel.
      * total_sales: The total sales of all the novels.
      * average_sales: The average sales of all the novels.
      * median_sales: The median sales of all the novels.
  """

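  # An empty list would cause a division by zero and an IndexError below.
  if not highest_selling_novels:
    raise ValueError("highest_selling_novels must contain at least one novel")

  # Sort the scraped data by novel sales in descending order (this sorts the list in place).
  highest_selling_novels.sort(key=lambda novel: novel["novel_sales"], reverse=True)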

  # Calculate the total sales of all the novels.
  total_sales = sum([novel["novel_sales"] for novel in highest_selling_novels])

  # Calculate the average sales of all the novels.
  average_sales = total_sales / len(highest_selling_novels)

  # Calculate the median sales of all the novels.
  median_sales = statistics.median([novel["novel_sales"] for novel in highest_selling_novels])

  # Return the results.
  return {
    "highest_selling_novel": highest_selling_novels[0],
    "total_sales": total_sales,
    "average_sales": average_sales,
    "median_sales": median_sales
  }

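# Build the input from the novel_details list scraped in the previous example,
# renaming its keys to the ones this function expects.
highest_selling_novels = [
  {"novel_name": novel["Title"], "novel_author": novel["Author"], "novel_sales": novel["Sales"]}
  for novel in novel_details
]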

# Calculate the details of the highest selling novels.
highest_selling_novels_details = calculate_highest_selling_novels(highest_selling_novels)

# Print the results.
print(highest_selling_novels_details)

This code will calculate the details of the highest selling novels from the scraped data and print the results to the console.
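To see what the three statistics look like on concrete numbers, here is a small worked sketch. The sales figures are invented purely for illustration and do not come from any real table.

```python
import statistics

# Hypothetical sales figures, used only to illustrate the calculations
sales = [500, 300, 200, 100, 4000]

total_sales = sum(sales)                  # 500 + 300 + 200 + 100 + 4000
average_sales = statistics.mean(sales)    # total divided by the number of values
median_sales = statistics.median(sales)   # middle value once the list is sorted

print("Total:", total_sales)
print("Average:", average_sales)
print("Median:", median_sales)
```

Here the total is 5100, the average is 1020, and the median is 300. A single very large value pulls the average well above the median, which is why reporting both can be useful for sales data.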

You can use a similar approach to compute other statistics from any table you scrape with Selenium.
