Debug School

rakesh kumar


Get data using BeautifulSoup library with a Specific Tag and Class

How to store the text of every h3 tag in a list after selecting all elements of a specific class

Using strip() to remove \n at leading and trailing positions

How to store, for every item, the text that comes before a separator character
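As a quick illustration of the last two points, here is a minimal sketch that uses only built-in string methods. The `raw` string is a made-up placeholder shaped like the text an element might return, not data from the real page.

```python
# Hypothetical raw text, shaped like what tag.text might return for one entry.
raw = "\n  Shri Example Name \nTerm of Office: 2000 to 2005\n"

cleaned = raw.strip()              # drops leading/trailing whitespace, including \n
parts = cleaned.split("\n")        # splits at the inner newline that is left
before_symbol = parts[0].strip()   # text before the separator
after_symbol = parts[1].strip()    # text after the separator

print(before_symbol)
print(after_symbol)
```

Note that strip() only touches the two ends of the string, so the inner \n survives and can be used as the separator for split().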

Task:
Write a Python program to display a list of the respected former Presidents of India (i.e. name and term of office)
from https://presidentofindia.nic.in/former-presidents.htm and make a DataFrame from it.

First method: using class and heading tags

Step 1: Install the libraries

pip install bs4
pip install requests

Step 2: Import the libraries

from bs4 import BeautifulSoup
import requests

Step 3: Send an HTTP GET request to the URL (status code 200 means the request succeeded)

page = requests.get('https://presidentofindia.nic.in/former-presidents')
page

output

<Response [200]>
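Instead of reading the status code by eye, the check can be made explicit. The sketch below is one possible way to do it: `fetch_page` is a hypothetical helper name, and the 10-second timeout is an arbitrary assumption.

```python
import requests

URL = "https://presidentofindia.nic.in/former-presidents"


def fetch_page(url, timeout=10):
    """Return the response for url, raising on HTTP or connection errors.

    fetch_page is a hypothetical helper; the timeout value is an assumption.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx codes
    return response


# Usage (needs network access):
# page = fetch_page(URL)
# print(page.status_code)
```

With this approach a failed request stops the script at once, rather than passing an error page on to the parsing steps.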

Step 4: Parse the page content and inspect it

soup = BeautifulSoup(page.content, 'html.parser')
soup

output

[Image: screenshot of the output]

Step 5: Get all elements of a specific class

data = soup.find_all('div', class_="desc-sec")
data

[Image: screenshot of the output]
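To see what find_all with class_ selects without hitting the website, the same call can be tried on a small inline snippet. The markup below is a hypothetical stand-in; the real page may nest or name its elements differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page may be structured differently.
sample_html = """
<div class="desc-sec"><h3> Name One </h3><h5> Term One </h5></div>
<div class="other"><h3>Not a match</h3></div>
<div class="desc-sec extra"><h3> Name Two </h3><h5> Term Two </h5></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# class_ matches any element that has "desc-sec" among its classes,
# so the third div is selected even though it carries a second class.
blocks = soup.find_all("div", class_="desc-sec")
print(len(blocks))
```

The keyword is spelled class_ with a trailing underscore because class is a reserved word in Python.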

Step 6: Take the text of the h3 tag inside each desc-sec element and store it in a list

name = []
for tag in data:
    president_name = tag.find('h3').text
    name.append(president_name.strip())  # strip() removes the surrounding \n
name

output

[Image: screenshot of the output]
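One thing to watch for in this step: find returns None when the tag is missing, and calling .text on None raises an AttributeError. A defensive variant could look like the sketch below, again run on made-up markup rather than the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the second block deliberately has no h3 tag.
sample_html = """
<div class="desc-sec"><h3> Name One </h3><h5> Term One </h5></div>
<div class="desc-sec"><p>No heading here</p></div>
"""
soup = BeautifulSoup(sample_html, "html.parser")

names = []
for block in soup.find_all("div", class_="desc-sec"):
    heading = block.find("h3")
    if heading is None:
        continue  # skip blocks without an h3 instead of crashing
    # get_text(strip=True) returns the text with surrounding whitespace removed
    names.append(heading.get_text(strip=True))

print(names)
```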

Step 7: Take the text of the h5 tag inside each desc-sec element and store it in a list

term = []
for tag in data:
    terms_of_office = tag.find('h5').text
    term.append(terms_of_office.strip())
term

output

[Image: screenshot of the output]

Step 8: Finally, make a DataFrame from the data above

import pandas as pd
df = pd.DataFrame({'president_name': name, 'terms_of_office': term})
df

output

[Image: screenshot of the output]
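Building the DataFrame from two separate lists works only while both lists have the same length; pandas raises a ValueError otherwise. A possible alternative, sketched here on hypothetical markup, is to collect each name together with its term as one row so the two columns cannot drift apart.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup; the third block has no h5, so it is skipped as a whole.
sample_html = """
<div class="desc-sec"><h3> Name One </h3><h5> Term One </h5></div>
<div class="desc-sec"><h3> Name Two </h3><h5> Term Two </h5></div>
<div class="desc-sec"><h3> Name Three </h3></div>
"""
soup = BeautifulSoup(sample_html, "html.parser")

rows = []
for block in soup.find_all("div", class_="desc-sec"):
    h3 = block.find("h3")
    h5 = block.find("h5")
    if h3 is None or h5 is None:
        continue  # keep only complete name/term pairs
    rows.append({
        "president_name": h3.get_text(strip=True),
        "terms_of_office": h5.get_text(strip=True),
    })

df = pd.DataFrame(rows, columns=["president_name", "terms_of_office"])
print(df)
```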

Second method: using class and split

Step 1: Install the libraries

pip install bs4
pip install requests

Step 2: Import the libraries

from bs4 import BeautifulSoup
import requests

Step 3: Send an HTTP GET request to the URL (status code 200 means the request succeeded)

page = requests.get('https://presidentofindia.nic.in/former-presidents')
page

output

<Response [200]>

Step 4: Parse the page content and inspect it

soup = BeautifulSoup(page.content, 'html.parser')
soup

output

[Image: screenshot of the output]

Step 5: Get all elements of a specific class

data = soup.find_all('div', class_="desc-sec")
data

[Image: screenshot of the output]

Step 6: Store the full text of each desc-sec element in a list

president = []
for tag in data:
    president.append(tag.text.strip())
president

Output

[Image: screenshot of the output]

Step 7: Split each item of the list above into separate lists

president_na = []

for item in president:
    parts = item.split('\n')
    if len(parts) > 1:
        president_na.append(parts[0])  # the first line holds the name
president_na

output
[Image: screenshot of the output]

term_of = []

for item in president:
    parts = item.split('\n')
    if len(parts) > 1:
        term_of.append(parts[1])  # the second line holds the term of office

term_of

output

[Image: screenshot of the output]
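The two loops above can also be merged into one pass. The sketch below additionally drops empty pieces, on the assumption that the text of an element may contain several consecutive newlines, in which case parts[1] would otherwise be an empty string. The `president` list here is placeholder data, not scraped output.

```python
# Placeholder data shaped like the list built in step 6.
president = ["Name One\nTerm One", "Name Two\n\n Term Two ", "Only a name"]

names, terms = [], []
for item in president:
    # split on \n, strip each piece, and discard pieces that are empty
    parts = [p.strip() for p in item.split("\n") if p.strip()]
    if len(parts) > 1:
        names.append(parts[0])   # text before the first separator
        terms.append(parts[1])   # first non-empty text after it

print(names)
print(terms)
```

Filling both lists inside the same if block also keeps them the same length, which the DataFrame step relies on.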

Step 8: Finally, make a DataFrame from the data above

import pandas as pd
datas = pd.DataFrame({'president_name': president_na, 'terms_of_office': term_of})
datas

output
[Image: screenshot of the output]
