Debug School

rakesh kumar


Get HTML table data and store it in a DataFrame using the BeautifulSoup library

How to get all nested span elements inside a td of a table and store them in a list
How to get nested td elements inside a td of a table and store them in a list

How to get the second td element of multiple tr elements
How to get the second/third span element of a td

Requirement
Scrape the top 10 ODI teams in men’s cricket, along with their records for matches, points and rating.


Solution
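Inspect the data: before writing any code, open the page in the browser's developer tools and check how the table is built (tbody, tr, td, and nested span elements).

Step 1: Install the libraries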

pip install bs4
pip install requests
pip install pandas

Step 2: Import the libraries

from bs4 import BeautifulSoup
import requests

Step 3: Send an HTTP GET request to the URL and confirm that it succeeded (status code 200)

page  = requests.get('https://www.icc-cricket.com/rankings/mens/team-rankings/odi')
page

Output

<Response [200]>
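A 200 response is easy to confirm by eye in a notebook, but in a script it helps to fail loudly on anything else. The following is a minimal sketch; the helper name `fetch_html` and the 10-second timeout are assumptions for illustration, not part of the original steps.

```python
import requests


def fetch_html(url, timeout=10.0):
    """Return the raw response body, raising on network errors or 4xx/5xx codes."""
    # The timeout stops the call from hanging forever on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx status codes
    response.raise_for_status()
    return response.content
```

The returned bytes could then be passed to BeautifulSoup in place of `page.content`.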

Step 4: Parse the response and check the page content

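# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(page.content, 'html.parser')
soup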

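The soup object can be searched with `find` (first match) and `find_all` (every match), which the remaining steps rely on. To try these calls without hitting the network, here is a small sketch on an inline snippet. The HTML is a made-up stand-in shaped like the ranking table (the team name and numbers are placeholders, not real rankings), and `html.parser` is Python's built-in parser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like the ranking table; all values are placeholders
sample_html = """
<table>
  <tbody>
    <tr>
      <td>1</td>
      <td><span>flag</span><span>Team A</span><span>TA</span></td>
      <td>10</td><td>1000</td><td>100</td>
    </tr>
  </tbody>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

first_row = sample_soup.find("tbody").find("tr")  # first <tr> inside <tbody>
cells = first_row.find_all("td")                  # every <td> under that row
spans = cells[1].find_all("span")                 # every <span> in the second cell

print(len(cells))     # number of <td> tags in the row
print(spans[1].text)  # text of the second <span>
```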

Step 5: Get all team names and store them in a list

second_span_list = []
tbody = soup.find('tbody')
if tbody:
    for tr in tbody.find_all('tr'):
        tds = tr.find_all('td')
        if len(tds) > 1:  # The row must actually have a second <td>
            td = tds[1]  # Get the second <td> element
            spans = td.find_all('span')
            if len(spans) == 3:
                second_span_content = spans[1].text  # Extract the text content of the second <span>
                second_span_list.append(second_span_content.strip())
second_span_list

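The same "second span of the second td" lookup can also be written as a CSS selector with `select`. This is an alternative sketch, not the approach used in this post; unlike the loop in step 5, it does not check that the cell holds exactly three spans. The sample markup and team names are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical rows shaped like the ranking table; all values are placeholders
sample_html = """
<table><tbody>
  <tr><td>1</td><td><span>flag</span><span>Team A</span><span>TA</span></td><td>10</td></tr>
  <tr><td>2</td><td><span>flag</span><span>Team B</span><span>TB</span></td><td>12</td></tr>
</tbody></table>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

# Second <td> of each row, then the second <span> inside that cell
selector = "tbody tr td:nth-of-type(2) span:nth-of-type(2)"
team_names = [span.get_text(strip=True) for span in sample_soup.select(selector)]
print(team_names)
```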

Step 6: Get the number of matches played by each team and store them in a list

second_td_list = []
tbody = soup.find('tbody')
if tbody:
    for tr in tbody.find_all('tr'):
        tds = tr.find_all('td')
        if len(tds) > 2:  # The row must have a <td> at index 2
            matches_content = tds[2].text  # Index 2 is the third <td> (matches)
            second_td_list.append(matches_content.strip())
second_td_list


Step 7: Get the points of each team and store them in a list

third_td_list = []
tbody = soup.find('tbody')
if tbody:
    for tr in tbody.find_all('tr'):
        tds = tr.find_all('td')
        if len(tds) > 3:  # The row must have a <td> at index 3
            points_content = tds[3].text  # Index 3 is the fourth <td> (points)
            third_td_list.append(points_content.strip())
third_td_list


Step 8: Get the rating of each team and store them in a list

four_td_list = []
tbody = soup.find('tbody')
if tbody:
    for tr in tbody.find_all('tr'):
        tds = tr.find_all('td')
        if len(tds) > 4:  # The row must have a <td> at index 4
            rating_content = tds[4].text  # Index 4 is the fifth <td> (rating)
            four_td_list.append(rating_content.strip())
four_td_list

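The loops for matches, points and rating differ only in the td index, so they can be folded into one function. This is a sketch under that assumption; `column_texts` is a hypothetical helper name, and the sample rows are placeholders rather than real rankings.

```python
from bs4 import BeautifulSoup


def column_texts(soup, index):
    """Return the stripped text of the <td> at position `index` in every <tbody> row."""
    texts = []
    tbody = soup.find("tbody")
    if tbody is None:
        return texts
    for tr in tbody.find_all("tr"):
        tds = tr.find_all("td")
        if len(tds) > index:  # Skip rows that are too short to have this cell
            texts.append(tds[index].get_text(strip=True))
    return texts


# Placeholder rows standing in for the real table
sample_soup = BeautifulSoup(
    "<table><tbody>"
    "<tr><td>1</td><td>Team A</td><td>10</td><td>1000</td><td>100</td></tr>"
    "<tr><td>2</td><td>Team B</td><td>12</td><td>1080</td><td>90</td></tr>"
    "</tbody></table>",
    "html.parser",
)
matches = column_texts(sample_soup, 2)
points = column_texts(sample_soup, 3)
ratings = column_texts(sample_soup, 4)
```

With the real page, passing `soup` instead of `sample_soup` would give the same lists as steps 6 to 8.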

Step 9: Store all the information in a DataFrame

import pandas as pd
# All four lists must have the same length (one entry per table row)
df = pd.DataFrame({'team': second_span_list, 'match': second_td_list,
                   'points': third_td_list, 'rating': four_td_list})
df

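Every scraped value is a string, so the numeric columns cannot be sorted or summed correctly until they are converted. The sketch below shows one way to do that with `pd.to_numeric`. The rows are placeholders, and the thousands separator in `points` is an assumption about how the site might format large numbers.

```python
import pandas as pd

# Placeholder values in the same shape as the scraped lists (not real rankings)
df = pd.DataFrame({
    "team": ["Team A", "Team B"],
    "match": ["10", "12"],
    "points": ["1,000", "1,080"],
    "rating": ["100", "90"],
})

for column in ["match", "points", "rating"]:
    # Drop thousands separators (if any), then convert the strings to numbers
    cleaned = df[column].str.replace(",", "", regex=False)
    df[column] = pd.to_numeric(cleaned)

# With numeric ratings, sorting orders by value rather than alphabetically
top = df.sort_values("rating", ascending=False).head(10)
print(top)
```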
