Debug School

rakesh kumar

How to get data and URLs from anchor tags using the Beautiful Soup library

Write a Python program to scrape the following news details from https://www.cnbc.com/world/?region=world and build a DataFrame:
i) Headline
ii) Time
iii) News link

How to get the text of all anchor tags of a specific class
How to get the URLs of all anchor tags of a specific class
How to count the elements of a list

Step 1: Install the libraries

pip install bs4
pip install requests

Step 2: Import the libraries

from bs4 import BeautifulSoup
import requests

Step 3: Send an HTTP GET request to the URL and check that the status code is 200

page = requests.get('https://www.cnbc.com/world/?region=world')
page

Output

<Response [200]>
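
Displaying the response object only shows the status code. In a script it can be more convenient to fail loudly when the request does not succeed. The sketch below assumes a helper named fetch_html and a 10-second timeout, both of which are illustrative choices rather than part of the original steps; raise_for_status() is the requests method that raises an HTTPError for 4xx and 5xx responses.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the raw body of `url`, raising if the request fails.

    The name `fetch_html` and the 10-second timeout are illustrative choices.
    """
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx status codes; does nothing for 200.
    response.raise_for_status()
    return response.content
```

Calling fetch_html('https://www.cnbc.com/world/?region=world') would return the page bytes, which can be passed to BeautifulSoup in the same way as page.content.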

Step 4: Parse the page content

soup = BeautifulSoup(page.content, 'html.parser')
soup

Output

[Screenshot: the parsed HTML of the page]
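
When no parser is named, BeautifulSoup picks one automatically and emits a warning, so the result can differ between machines. The sketch below names the parser explicitly and runs offline on a made-up HTML string (not CNBC's actual markup) to show what the parsed object gives access to.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; the real page is far larger.
sample_html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"

# 'html.parser' ships with Python, so no extra install is needed.
soup = BeautifulSoup(sample_html, "html.parser")

page_title = soup.title.get_text()          # text inside <title>
first_paragraph = soup.find("p").get_text()  # text inside the first <p>
print(page_title, first_paragraph)
```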

Step 5: Get all time tags of a specific class

data = soup.find_all('time', class_="LatestNews-timestamp")
data

Output

[Screenshot: the list of matching time tags]

Step 6: Extract the text of each time tag and store it in a list

hour = []
for tag in data:
    hour.append(tag.text.strip())
hour

Output

[Screenshot: the list of timestamp strings]
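
The same loop can be written as a list comprehension. This is a minimal sketch on a made-up snippet that only borrows the class name from the step above; for tags holding a single piece of text, get_text(strip=True) gives the same result as .text.strip().

```python
from bs4 import BeautifulSoup

# Made-up snippet that mimics the class name used in this post.
sample_html = """
<time class="LatestNews-timestamp"> 2 Hours Ago </time>
<time class="LatestNews-timestamp">3 Hours Ago</time>
<time class="Other-timestamp">ignored</time>
"""
soup = BeautifulSoup(sample_html, "html.parser")
tags = soup.find_all("time", class_="LatestNews-timestamp")

# One stripped string per matching <time> tag.
hours = [tag.get_text(strip=True) for tag in tags]
print(hours)
```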

Step 7: Get all anchor tags of a specific class

heads = soup.find_all('a', class_="LatestNews-headline")
heads

Output

[Screenshot: the list of matching anchor tags]
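
Beautiful Soup also accepts CSS selectors through select(), which some readers may find easier to read than find_all() with class_. The sketch below compares the two on made-up markup; only the LatestNews-headline class name comes from this post, the other class names are invented for contrast.

```python
from bs4 import BeautifulSoup

# Made-up markup; 'Card-title' and 'extra' are invented class names.
sample_html = """
<a class="LatestNews-headline" href="/one">First</a>
<a class="Card-title" href="/two">Second</a>
<a class="LatestNews-headline extra" href="/three">Third</a>
"""
soup = BeautifulSoup(sample_html, "html.parser")

by_find_all = soup.find_all("a", class_="LatestNews-headline")
by_select = soup.select("a.LatestNews-headline")

# Both calls match on a single CSS class, so multi-class tags are included.
find_all_texts = [a.get_text() for a in by_find_all]
select_texts = [a.get_text() for a in by_select]
print(find_all_texts, select_texts)
```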

Step 8: Extract the text of each anchor tag and store it in a list
How to get the text of all anchor tags of a specific class

latest_news = []
for tag in heads:
    latest_news.append(tag.text.strip())
latest_news

Output

[Screenshot: the list of headline strings]

Step 9: Check the total number of elements in the list
How to count the elements of a list

count_hour = len(hour)
count_hour

Output
30
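
The count is worth checking for every list, because pandas raises a ValueError when the lists passed to DataFrame have different lengths. A minimal sketch of such a check follows; check_same_length is a hypothetical helper, and the sample lists are made-up stand-ins for the scraped data.

```python
def check_same_length(**columns):
    """Return the shared length of all lists, or raise ValueError if they differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    # All lists agree (or none were given); report the common length.
    return next(iter(lengths.values()), 0)


# Made-up sample data standing in for the scraped lists.
hour = ["2 Hours Ago", "3 Hours Ago"]
latest_news = ["First headline", "Second headline"]
url_list = ["https://example.com/1", "https://example.com/2"]

row_count = check_same_length(hour=hour, latest_news=latest_news, url_list=url_list)
print(row_count)
```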

Step 10: Get the URL of each anchor tag and store it in a list
How to get the URLs of all anchor tags of a specific class

url_list = []
# Loop through the anchor tags and extract the 'href' attribute
for anchor in heads:
    url = anchor.get('href')
    url_list.append(url)
url_list

Output

[Screenshot: the list of URLs]
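
Two details can matter here: anchor.get('href') returns None when the attribute is missing, and on some sites the href is relative rather than a full URL. The sketch below handles both on made-up markup. BASE_URL and the sample links are assumptions for illustration; whether CNBC's links are relative is not something this post establishes.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.cnbc.com/world/"  # used only to resolve relative links

# Made-up markup; only the class name is taken from this post.
sample_html = """
<a class="LatestNews-headline" href="https://www.cnbc.com/a.html">Story A</a>
<a class="LatestNews-headline" href="/b.html">Story B</a>
<a class="LatestNews-headline">No link</a>
"""
soup = BeautifulSoup(sample_html, "html.parser")

links = []
for anchor in soup.find_all("a", class_="LatestNews-headline"):
    href = anchor.get("href")  # None when the attribute is absent
    if href is None:
        continue  # skip anchors without a link
    # urljoin leaves absolute URLs unchanged and resolves relative ones.
    links.append((anchor.get_text(strip=True), urljoin(BASE_URL, href)))

print(links)
```

Collecting the headline and the URL in the same pass keeps the two values paired even when an anchor is skipped. A separately built list of times would then need the same filtering to stay aligned.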

Step 11: Convert all the data into a DataFrame

import pandas as pd
df = pd.DataFrame({'headline': latest_news, 'time': hour, 'url': url_list})
df

Output

[Screenshot: the resulting DataFrame with headline, time and url columns]
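
An alternative way to build the same table is row by row, which makes the pairing of headline, time and URL explicit. This is a sketch with made-up sample values in place of the scraped lists, not a replacement for the dictionary form.

```python
import pandas as pd

# Made-up sample data standing in for the scraped lists.
latest_news = ["First headline", "Second headline"]
hour = ["2 Hours Ago", "3 Hours Ago"]
url_list = ["https://example.com/1", "https://example.com/2"]

# zip pairs the i-th headline, time and URL into one row.
rows = list(zip(latest_news, hour, url_list))
df = pd.DataFrame(rows, columns=["headline", "time", "url"])
print(df.shape)
```

Note that zip stops at the shortest list without raising an error, so a length check beforehand is still useful.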
