Basic usage of Python beautifulsoup4

2021-04-21 18:27




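from bs4 import BeautifulSoup
from bs4 import Comment

# The HTML tags of this sample were lost when the page was extracted. The markup
# below is restored on the assumption that it is the standard "three sisters"
# sample from the Beautiful Soup documentation.
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

# The "lxml" parser requires the lxml package to be installed
soup = BeautifulSoup(html_doc, "lxml")
print(soup.prettify())

print('*' * 100)
print(soup.title)              # the whole <title> tag
print(soup.title.string)       # the text inside <title>
print(soup.title.text)
print(soup.title.parent.name)  # name of the enclosing tag

print('*' * 100)
print(soup.a)                  # first <a> tag in the document
print(soup.a.attrs)            # its attributes as a dict
print(soup.a['href'])
print(soup.a.attrs['href'])
print(soup.find('p'))          # first <p> tag
print(soup.find(id="link1"))   # lookup by id attribute
print(soup.find_all('a'))      # every <a> tag

for link in soup.find_all('a'):
    print(link.get('href'))

print(soup.select('.sister'))  # CSS selector: class "sister"

print('*' * 100)
print(soup.get_text())         # all text with the tags stripped

print('*' * 100)
# Collect HTML comment nodes (the list is empty when the markup has no comments).
# find_all(string=...) is the current spelling of the older findAll(text=...).
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
print(comments)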
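The comment lookup at the end of the listing only returns something when the markup actually contains an HTML comment. The following is a minimal sketch of that case. The `sample` markup is hypothetical and not part of the original post, and it uses the built-in `html.parser` instead of `lxml` so that no extra package is needed.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical sample markup (an assumption, not from the original post):
# the first link holds an HTML comment, the second holds ordinary text.
sample = """
<p class="story">
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
</p>
"""

# html.parser ships with Python, so this runs without lxml installed.
soup = BeautifulSoup(sample, "html.parser")

# Comment is a subclass of NavigableString, so text nodes can be filtered by type.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
comment_texts = [str(c).strip() for c in comments]
print(comment_texts)

# select() returns every element matching a CSS selector.
hrefs = [a["href"] for a in soup.select("a.sister")]
print(hrefs)

# .get() returns None for a missing attribute, where a["title"] would raise KeyError.
missing = soup.select_one("#link2").get("title")
print(missing)
```

Filtering by type keeps the comment apart from ordinary text nodes such as "Lacie" and the whitespace between tags, which are plain `NavigableString` objects.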


Original article: https://www.cnblogs.com/xgege/p/13280510.html

