Python实例---获取酷狗音乐Top100

2021-07-04 08:06

阅读:405

标签:ref   gecko   temp   ongl   soup   查找   nbsp   image   sel   

获取酷狗TOP 100

http://www.kugou.com/yy/rank/home/1-8888.html

排名

技术分享图片

文件&&歌手

技术分享图片

时长

技术分享图片

效果:

技术分享图片

附源码:

import time
import json
from bs4 import BeautifulSoup
import requests


class Kugou(object):
    def __init__(self):
        self.header = {
            "User-Agent": ‘Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0‘
        }

    def getInfo(self, url):
        html = requests.get(url, headers=self.header)
        soup = BeautifulSoup(html.text, ‘html.parser‘)
        # print(soup.prettify())
        ranks = soup.select(‘.pc_temp_num‘)
        titles = soup.select(‘.pc_temp_songlist > ul > li > a‘)  # 层层标签查找
        times = soup.select(‘.pc_temp_time‘)
        for rank, title, songTime in zip(ranks, titles, times):
            data = {
                # rank 全打印就是带HTML标签的
                ‘rank‘: rank.get_text().strip(),
                ‘title‘: title.get_text().split(‘-‘)[1].strip(),
                ‘singer‘: title.get_text().split(‘-‘)[0].strip(),
                ‘songTime‘: songTime.get_text().strip()
            }
            s = str(data)
            print(‘rank:%2s\t‘ % data[‘rank‘], ‘title:%2s\t‘ % data[‘title‘], ‘singer:%2s\t‘ %data[‘singer‘], ‘songTime:%2s\t‘ % data[‘songTime‘])
            with open(‘hhh.txt‘, ‘a‘, encoding=‘utf8‘) as f:
               f.writelines(s + ‘\n‘)

if __name__ == ‘__main__‘:
    urls = [
        ‘http://www.kugou.com/yy/rank/home/{}-8888.html‘.format(str(i)) for i in range(30)
    ]

    kugou = Kugou()
    for url in urls:
        kugou.getInfo(url)
        time.sleep(1)

 

部分代码解析

--------------------------------------------------------------------
urls = [‘http://www.kugou.com/yy/rank/home/{}-8888.html‘.format(str(i)) for i in range(1, 5)]
for i in urls:
    print(i)

结果打印:
http://www.kugou.com/yy/rank/home/1-8888.html
http://www.kugou.com/yy/rank/home/2-8888.html
http://www.kugou.com/yy/rank/home/3-8888.html
http://www.kugou.com/yy/rank/home/4-8888.html
--------------------------------------------------------------------
for rank, title, songTime in zip(ranks, titles, times):
    data = {
        # rank 全打印就是带HTML标签的
        ‘rank‘: rank.get_text().strip(),
        ‘title‘: title.get_text().split(‘-‘)[0].strip(),
        ‘singer‘: title.get_text().split(‘-‘)[1].strip(),
        ‘songTime‘: songTime.get_text()
    }
    print(data[‘rank‘])
    print(data[‘title‘])
    print(data[‘singer‘])
    print(data[‘songTime‘])

结果打印:
    1
    飞驰于你
    许嵩
    4: 04
--------------------------------------------------------------------  
for rank, title, songTime in zip(ranks, titles, times):
data = {
# rank 全打印就是带HTML标签的
‘rank‘: rank,
‘title‘: title,
‘songTime‘: songTime
}
print(data[‘rank‘])
print(data[‘title‘])
print(data[‘songTime‘])
结果打印:

1

许嵩 - 飞驰于你
4:04

Python实例---获取酷狗音乐Top100

标签:ref   gecko   temp   ongl   soup   查找   nbsp   image   sel   

原文地址:https://www.cnblogs.com/ftl1012/p/9614146.html


评论


亲,登录后才可以留言!