Python web page scraping demo
2021-06-28 02:06
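# Extract text from HTML
from bs4 import BeautifulSoup

# Note: the tags in this sample were lost from the page and are reconstructed
# from the selectors used below (h1, a, #title, .link).
# The href values are placeholders.
html_sample = '''
<html>
<body>
<h1 id="title">Hello world</h1>
<a href="#link1" class="link">This is link1</a>
<a href="#link2" class="link">This is link2</a>
</body>
</html>
'''
soup = BeautifulSoup(html_sample, 'html.parser')
print(soup.text)                     # all text in the document
soup.select('h1')                    # returns a list of matching tags
print(soup.select('h1')[0].text)     # text of the first <h1>
print(soup.select('a')[0].text)      # text of the first link
print(soup.select('a')[1].text)      # text of the second link

# Print the text of every link
for alink in soup.select('a'):
    print(alink.text)

print(soup.select('#title')[0].text)  # select by id
print(soup.select('.link')[0].text)   # select by class

# Print the href attribute of every link
alinks = soup.select('a')
for link in alinks:
    print(link['href'])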
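The last loop prints each href exactly as it appears in the markup, and on real pages that value is often a relative path or a fragment. A minimal sketch of turning such values into absolute URLs with the standard library's `urllib.parse.urljoin`; the helper name `resolve_links`, the `base_url` parameter and the example addresses are illustrative assumptions, not part of the original demo:

```python
from urllib.parse import urljoin


def resolve_links(base_url, hrefs):
    """Return absolute URLs for hrefs found on the page at base_url.

    resolve_links and base_url are illustrative names, not part of the demo.
    """
    return [urljoin(base_url, href) for href in hrefs]


# Example values are made up for illustration.
links = resolve_links(
    'http://example.com/news/index.html',
    ['detail.html', '/about', 'http://other.example/x', '#top'],
)
for link in links:
    print(link)
```

In the demo, the `hrefs` argument could be built with `[link['href'] for link in alinks]`.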
demo2:
import pandas
import requests
from bs4 import BeautifulSoup

# Fetch the news page and parse it
res = requests.get('http://news.qq.com/')
soup = BeautifulSoup(res.text, 'html.parser')

# Collect the title and URL of each news item
newsary = []
for news in soup.select('.Q-tpWrap .text'):
    newsary.append({'title': news.select('a')[0].text,
                    'url': news.select('a')[0]['href']})

# Save the results to an Excel file
# (to_excel needs an Excel writer package such as openpyxl)
newsdf = pandas.DataFrame(newsary)
newsdf.to_excel('news.xlsx')
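`to_excel` relies on an Excel writer package such as openpyxl being installed. If it is not available, the same list of dictionaries can be saved as CSV using only the standard library. A minimal sketch, assuming records shaped like `newsary` above; the helper name `news_to_csv` and the sample rows are hypothetical:

```python
import csv
import io


def news_to_csv(records, fileobj):
    """Write dicts with 'title' and 'url' keys to fileobj as CSV rows."""
    writer = csv.DictWriter(fileobj, fieldnames=['title', 'url'])
    writer.writeheader()
    writer.writerows(records)


# Sample rows are invented; in the demo they would come from newsary.
sample = [
    {'title': 'First headline', 'url': 'http://example.com/1'},
    {'title': 'Second, with comma', 'url': 'http://example.com/2'},
]

buffer = io.StringIO()  # in-memory stand-in for a real file
news_to_csv(sample, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open('news.csv', 'w', newline='', encoding='utf-8-sig')`; the `utf-8-sig` encoding usually helps Excel display Chinese titles correctly.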
Recommended: use Jupyter Notebook for practice; it is very convenient.
Original article: https://www.cnblogs.com/hujianglang/p/9650329.html