Python Crawler: Baidu Tieba

2021-02-02 13:13

Implementing a Baidu Tieba crawler

GET request

from urllib import request
import urllib.parse  # urlencode lives in the parse submodule, so import it explicitly

# https://tieba.baidu.com/f?kw=python&fr=ala0&tpl=5     # page 1
# https://tieba.baidu.com/f?kw=python&ie=utf-8&pn=50  # page 2: (2-1)*50
# https://tieba.baidu.com/f?kw=python&ie=utf-8&pn=100 # page 3: (3-1)*50
# https://tieba.baidu.com/f?kw=python&ie=utf-8&pn=150 # page 4: (4-1)*50
# So page n uses pn = (n-1)*50.
# By the same pattern, page 1 is presumably: https://tieba.baidu.com/f?kw=python&ie=utf-8&pn=0

headers={
"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36"
}

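# Send a request to the URL and return the raw bytes of the server's response
def loadPage(url,filename):
    print("Downloading "+filename)
    req=request.Request(url,headers=headers)
    # Close the connection as soon as the body has been read
    with request.urlopen(req) as resp:
        return resp.read()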

# Write the HTML content to a local file
def writePage(html,filename):
    print("Saving "+filename)
    with open(filename,"wb") as f:  # html is bytes, so open in binary mode
        f.write(html)
    print("---------------------------")


def tiebaSpider(url,begin,end):
    for page in range(begin,end+1):
        pn=(page-1)*50
        fullurl=url+"&pn="+str(pn) # URL for this request
        filename="D:/贴吧/第"+str(page)+"页.html" # file this page is saved to; the D:/贴吧 directory must already exist

        html=loadPage(fullurl,filename) # fetch the page
        writePage(html,filename) # write it to disk


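if __name__=='__main__':
    while True:  # keeps prompting for new jobs; press Ctrl+C to stop
        kw=input("Enter the keyword (tieba name): ")
        begin=int(input("Enter the first page: "))
        end=int(input("Enter the last page: "))

        url="http://tieba.baidu.com/f?"
        key=urllib.parse.urlencode({"kw":kw})  # percent-encodes non-ASCII keywords
        url=url+key
        tiebaSpider(url,begin,end)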
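
The page-offset rule worked out in the comments, pn = (n-1)*50, together with the urlencode step, can be checked without touching the network. The following is only a minimal sketch: the helper name build_page_url and the constants BASE_URL and POSTS_PER_PAGE are made up for illustration and are not part of the original script, and the ie=utf-8 parameter is simply copied from the sample URLs.

```python
from urllib import parse

BASE_URL = "https://tieba.baidu.com/f"  # endpoint taken from the sample URLs (assumed still valid)
POSTS_PER_PAGE = 50                     # offset step observed in the sample URLs


def build_page_url(kw, page):
    """Hypothetical helper: build the list-page URL for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # urlencode percent-encodes the keyword, so non-ASCII tieba names are safe in the URL
    query = parse.urlencode({
        "kw": kw,
        "ie": "utf-8",
        "pn": (page - 1) * POSTS_PER_PAGE,
    })
    return BASE_URL + "?" + query


if __name__ == "__main__":
    for page in range(1, 4):
        print(build_page_url("python", page))
```

Building the whole query string in one urlencode call avoids concatenating "&pn=" by hand, and keeping URL construction in a pure function makes it easy to verify the offsets before any request is sent.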

Original post: https://www.cnblogs.com/Just-a-calm-programmer/p/12809816.html
