Python爬虫-爬取音乐资源

2021-05-07 01:28

阅读：775

标签：gecko 出现 dal win64 mp3 pre 榜单 str imp

爬取音乐资源

实现

#python 的正则库
import re 
#python 的requests库
import requests
import time

#找到url的规律
#每一页的url
# http://www.htqyy.com/top/hot
# http://www.htqyy.com/top/musicList/hot?pageIndex=1&pageSize=20
# http://www.htqyy.com/top/musicList/hot?pageIndex=2&pageSize=20

#歌曲连接
# http://www.htqyy.com/play/33
# 33-每个歌曲的号码，页url可以找到
#资源所在url
# http://f2.htqyy.com/play8/33/mp3/6

#class="num">41琵琶语

songName=[]
songID=[]

headers={
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"
}

page=2
#page=int(input("请输入您要爬取的页数："))

for i in range(0,page):
    url="http://www.htqyy.com/top/musicList/hot?pageIndex="+str(i)+"&pageSize=20"

    #发送get请求，获取音乐榜单网页信息
    r=requests.get(url,headers=headers)
    #GBK网页采用的编码格式
    r.encoding=‘GBK‘
    html_text=r.text
    print(html_text)
    #正则找到对应歌的url
    part1=r‘title="(.*?)" sid=‘
    part2=r‘sid="(.*?)"‘

    #将匹配的字串组成列表形式返回
    titlelist=re.findall(part1,html_text)
    idlist=re.findall(part2,html_text)

    #在一个列表尾添加另一个列表
    songName.extend(titlelist)
    songID.extend(idlist)

for i in range(0,len(songID)):
    songurl="http://f2.htqyy.com/play8/"+str(songID[i])+"/mp3/6"
    songname=songName[i]

    #二进制文件
    data=requests.get(songurl).content
    
    print("正在下载...")
    with open("E:\\music\\{0}.mp3".format(songname),"wb") as f:
        f.write(data)
    
    time.sleep(5)

当无法访问试试下面代码

headers={
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"
}

songurl="http://f2.htqyy.com/play8/33/mp3/6"
songname="清风"

#二进制文件
data=requests.get(songurl,headers=headers).content

print("正在下载...")
with open("D:\\Python\\{0}.mp3".format(songname),"wb") as f:
    f.write(data)

总结

　　当得到的网页信息是乱码：

　　print requests.get(url).encoding　　打印获取到的网页信息采用什么编码

　　r = requests.get(url)

　　r.encoding = ‘GBK‘

　　print(r.text)　　　　　　　　　　　将编码格式采用‘GBK‘，网页编码，就不会出现乱码

　　字符串拼接：

　　+或者format()

Python爬虫-爬取音乐资源

标签：gecko 出现 dal win64 mp3 pre 榜单 str imp

原文地址：https://www.cnblogs.com/Just-a-calm-programmer/p/12958040.html

上一篇：用python操作PDF文件

下一篇：Zabbix 5.0切换中文语言小结

文章来自：搜素材网的编程语言模块，转载请注明文章出处。
文章标题：Python爬虫-爬取音乐资源
文章链接：http://soscw.com/essay/83463.html

亲，登录后才可以留言！

Python爬虫-爬取音乐资源

爬取音乐资源

实现

当无法访问试试下面代码

总结

评论

热门文章

推荐文章

最新文章

置顶文章