Python web scraping demo

2021-06-28 02:06

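# Extract text from HTML
from bs4 import BeautifulSoup

# Sample markup rebuilt from the selectors used below (h1, a, #title, .link).
# The original href values were lost; "#link1" and "#link2" are placeholders.
html_sample = """
<html>
<body>
<h1 id="title">Hello world</h1>
<a href="#link1" class="link">This is link1</a>
<a href="#link2" class="link">This is link2</a>
</body>
</html>
"""

soup = BeautifulSoup(html_sample, "html.parser")
print(soup.text)                       # all text in the document
soup.select("h1")                      # list of matching tags (shown when run in a notebook cell)
print(soup.select("h1")[0].text)       # select by tag name
print(soup.select("a")[0].text)
print(soup.select("a")[1].text)

for alink in soup.select("a"):
    print(alink.text)

print(soup.select("#title")[0].text)   # select by id
print(soup.select(".link")[0].text)    # select by class

# Read an attribute of each link
alinks = soup.select("a")
for link in alinks:
    print(link["href"])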
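The lookups above index the result of `select` with `[0]` and read attributes with `link[href]`-style access, both of which raise an exception when the element or attribute is missing. As a minimal sketch of a more forgiving variant, the snippet below uses Beautiful Soup's `select_one` and `Tag.get`. The markup here is hypothetical and written only for this illustration; the second link deliberately has no `href`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the second link has no href on purpose.
html_sample = """
<html><body>
<h1 id="title">Hello world</h1>
<a href="#link1" class="link">This is link1</a>
<a class="link">This is link2</a>
</body></html>
"""

soup = BeautifulSoup(html_sample, "html.parser")

# select_one returns the first match, or None when nothing matches.
title = soup.select_one("#title")
title_text = title.text if title is not None else ""

# No element has this id, so the result is None rather than an IndexError.
missing = soup.select_one("#no-such-id")

# Tag.get returns None for a missing attribute instead of raising KeyError.
hrefs = [a.get("href") for a in soup.select("a.link")]

print(title_text)
print(missing)
print(hrefs)
```

Whether to skip missing values silently or fail loudly depends on the page being scraped; for a quick exercise, checking for `None` is usually enough.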

demo2:

import pandas
import requests
from bs4 import BeautifulSoup

# Download the page and parse it
res = requests.get("http://news.qq.com/")
soup = BeautifulSoup(res.text, "html.parser")

# Collect the title and URL of each news entry
newsary = []
for news in soup.select(".Q-tpWrap .text"):
    newsary.append({"title": news.select("a")[0].text, "url": news.select("a")[0]["href"]})

# Save the results to an Excel file (writing .xlsx requires the openpyxl package)
newsdf = pandas.DataFrame(newsary)
newsdf.to_excel("news.xlsx")
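Since the live page can change or be unreachable, it may help to keep the parsing step in a function that can be tried on a saved HTML string. The sketch below is one possible arrangement, not the original author's code: `extract_news` is a hypothetical helper, the sample markup only mimics the `.Q-tpWrap .text` class names from demo2, and `urljoin` from the standard library is used to turn relative links into absolute ones.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_news(html, base_url):
    """Return a list of {"title", "url"} dicts from a news listing page."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for block in soup.select(".Q-tpWrap .text"):
        link = block.select_one("a")
        # Skip blocks with no link or no href instead of raising an error.
        if link is None or not link.get("href"):
            continue
        items.append({
            "title": link.text.strip(),
            # Resolve relative hrefs against the page URL.
            "url": urljoin(base_url, link["href"]),
        })
    return items


# Hypothetical markup that mimics the class names used in demo2.
sample = """
<div class="Q-tpWrap"><div class="text"><a href="/a/1.htm"> First story </a></div></div>
<div class="Q-tpWrap"><div class="text"><span>No link here</span></div></div>
<div class="Q-tpWrap"><div class="text"><a href="http://example.com/b/2.htm">Second story</a></div></div>
"""

news = extract_news(sample, "http://news.qq.com/")
for item in news:
    print(item["title"], item["url"])
```

The returned list has the same shape as `newsary`, so it can be passed to `pandas.DataFrame` in the same way.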

Recommended: Jupyter Notebook is a convenient way to practice these examples.

Original article: https://www.cnblogs.com/hujianglang/p/9650329.html

