How to download web resources

2021-04-21 13:26


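Contents

  • Purpose
  • Research
    • Test 1: Chrome extensions
    • Test 2: the Python script I ended up writing
  • Other
    • What framework is the following written in?
    • Where is bookChapter defined?
    • How do you get the links?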


Purpose

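China Machine Press (机工社) recently announced that it is opening its Engineering Science and Technology Digital Library to everyone online for free, to help people get through these hard times together!

Some of the books, though, are shown to readers as web pages, one page at a time, which makes them hard to download in one go.

Is there a way to download them all at once?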

For example, this book:

[screenshot]

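Research

Test 1: Chrome extensions

I found plenty of Chrome extensions online, but none of them could pick up the links inside the page. That is because the page does not actually contain any links.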
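To make "no links" concrete, here is a minimal sketch that uses only Python's standard html.parser. It collects the href of every <a> tag, which is roughly the information a link-grabbing extension works from. The sample markup is hypothetical. It assumes the chapter entries are <a> elements whose clicks are handled by JavaScript, so they carry no usable href.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, roughly what a link grabber looks at."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # <a> tags without an href, or with a javascript: pseudo-URL,
        # are driven by scripts and give the grabber nothing to download.
        if href and not href.startswith("javascript:"):
            self.links.append(href)


def collect_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup: chapter entries that only react to clicks.
sample = """
<ul id="directories">
  <li><a href="javascript:void(0)">3.1 协商原则</a></li>
  <li><a>Chapter 2</a></li>
</ul>
<a href="/web/about">About</a>
"""
print(collect_links(sample))  # ['/web/about']
```

On a page built like this, the result contains no chapter URLs at all, and an extension has nothing to offer for download.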

For example, take this chapter link on the page:

3.1 协商原则 (3.1 Principles of negotiation)

That link actually ends up as http://www.hzcourse.com/resource/readBook?path=/openresources/teach_ebook/uncompressed/13780/OEBPS/Text/chapter33.html

So no wonder the extensions couldn't recognize it.
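In other words, the real address is the fixed readBook endpoint with the chapter's file path passed as the path query parameter. A small helper that rebuilds it might look like the sketch below. read_book_url is a hypothetical name; the base URL and the example path are the ones above. quote(..., safe="/") leaves the slashes alone, so the result matches the URL above, and it only escapes characters such as spaces.

```python
from urllib.parse import quote

READ_BOOK = "http://www.hzcourse.com/resource/readBook"


def read_book_url(path):
    # "/" is legal inside a query string, so keep it readable;
    # anything unusual (spaces etc.) gets percent-encoded.
    return READ_BOOK + "?path=" + quote(path, safe="/")


path = "/openresources/teach_ebook/uncompressed/13780/OEBPS/Text/chapter33.html"
print(read_book_url(path))
# http://www.hzcourse.com/resource/readBook?path=/openresources/teach_ebook/uncompressed/13780/OEBPS/Text/chapter33.html
```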

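Looks like I'll have to write one myself.

The simplest way is Python.

Testing the link above with the wget module for Python (pip install wget):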

C:\Users\cutep>python -m wget http://www.hzcourse.com/resource/readBook?path=/openresources/teach_ebook/uncompressed/13780/OEBPS/Text/chapter33.html -o 33.html
100% [................................................................................] 4000 / 4000
Saved under 33.html

Success!
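The same download can also be done inside Python instead of launching python -m wget in a shell. That avoids having to quote characters such as & that a shell treats specially. Below is a minimal sketch using only urllib.request and shutil. download_to is a hypothetical helper name. It sends no extra headers, on the assumption that the server does not require any.

```python
import shutil
import urllib.request


def download_to(url, filename, timeout=30.0):
    """Save the body of the response for url to filename; return the byte count."""
    # Assumption: no cookies or special headers are needed for this URL.
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(filename, "wb") as out:
        shutil.copyfileobj(resp, out)
        return out.tell()
```

For example, calling download_to with the readBook URL above and "33.html" should save the same page that the wget command did.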

Test 2: the Python script I ended up writing

# Windows only: relies on the cmd built-ins del, rd, md, cd and xcopy,
# plus the wget module for Python (pip install wget) and requests.
import os

import requests

# NOTE: the three HTML strings that get_bookname() searches for were stripped
# when this post was scraped. Look them up in the book's main page (Chrome F12)
# and fill them in; until then get_bookname() returns an empty name.
NAME_MARKER_1 = ''  # lost in extraction
NAME_MARKER_2 = ''  # lost in extraction
NAME_END = ''       # lost in extraction

READ_BOOK_URL = 'http://www.hzcourse.com/resource/readBook?path=%s'
CHAPTER_LIST_URL = 'http://www.hzcourse.com/web/refbook/queryAllChapterList'


def my_system(cmd):
    # Echo a cmd command, then run it.
    print(cmd)
    os.system(cmd)


def download(url, file):
    # Quote both arguments so the shell does not split or reinterpret them.
    my_system('python -m wget "%s" -o "%s"' % (url, file))


def download_chapter(click_url, file):
    # A chapter link resolves to readBook?path=<chapter file path>.
    download(READ_BOOK_URL % click_url, file)


def get_bookname(cont):
    # Skip to the first marker, then to the second one after it;
    # the title runs from there up to NAME_END.
    p1 = cont.find(NAME_MARKER_1) + len(NAME_MARKER_1)
    p1 = cont.find(NAME_MARKER_2, p1) + len(NAME_MARKER_2)
    p2 = cont.find(NAME_END, p1)
    return cont[p1:p2]


def get_value_token(cont):
    # ebookId and token are the values of two form fields on the main page.
    s = '"ebookId" value="'
    p1 = cont.find(s) + len(s)
    p2 = cont.find('"/>', p1)
    ebookId = cont[p1:p2]
    s2 = 'name="token" value="'
    p3 = cont.find(s2, p2) + len(s2)
    p4 = cont.find('"/>', p3)
    token = cont[p3:p4]
    print('ebookId, token %s %s' % (ebookId, token))
    return [ebookId, token]


def download_book(main_link):
    # Remove earlier copies first; the wildcard also catches renamed copies such as
    # "main (1).html", which the wget module creates instead of overwriting.
    my_system('del main*.html')
    download(main_link, 'main.html')
    with open('main.html', 'r', encoding='utf-8') as f:
        main_cont = f.read()
    ebookId, token = get_value_token(main_cont)
    bookname = get_bookname(main_cont)
    print(bookname)
    if os.path.isdir(bookname):  # already downloaded
        return

    # Download every chapter into a scratch folder first.
    my_system('rd /s /q my_temp')
    my_system('md my_temp')
    os.chdir('my_temp')
    my_system('cd')  # print the current directory
    # Same POST the reader page sends to get its chapter list.
    response = requests.post(CHAPTER_LIST_URL, data={'ebookId': ebookId, 'token': token})
    resp_json = response.json()
    for i in resp_json['data']['data']:
        ref_link = i['ref']
        file = ref_link[ref_link.rfind('/') + 1:]
        print(ref_link, file)
        download_chapter(ref_link, file)
    os.chdir('..')
    my_system('cd')

    # Then copy the scratch folder into a folder named after the book.
    my_system('md "%s"' % bookname)
    my_system('xcopy /c/d/e/y my_temp "%s"' % bookname)


TOKEN = 'e87436c8bc7849c397a1db2f27c0ba5d'
for book_id in [6736, 6856, 7899, 7249, 7165, 7186, 7523,
                6965, 6826, 6166, 6188, 6853, 4599, 6759]:
    download_book('http://www.hzcourse.com/web/refbook/probationAll/%d/%s' % (book_id, TOKEN))
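The string slicing in get_value_token depends on the exact attribute order and on each tag ending in "/>". A somewhat sturdier alternative is to parse the page with html.parser and read the input fields by id or name. This is a sketch under the assumption that ebookId and token are <input> elements. That fits both the strings the script searches for and the page's JavaScript, quoted further down, which reads $("#ebookId").val() and $("#token").val(). The sample markup and the name parse_ebook_form are hypothetical.

```python
from html.parser import HTMLParser


class InputValues(HTMLParser):
    """Record the value of every <input> that has an id or a name."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)
        for key in (a.get("id"), a.get("name")):
            if key:
                self.values.setdefault(key, a.get("value") or "")


def parse_ebook_form(html):
    """Return (ebookId, token); None for a field that is not on the page."""
    parser = InputValues()
    parser.feed(html)
    parser.close()
    return parser.values.get("ebookId"), parser.values.get("token")


# Hypothetical markup, modelled on the strings the script searches for.
page = '''
<input type="hidden" id="ebookId" name="ebookId" value="15917"/>
<input type="hidden" id="token" name="token" value="e87436c8bc7849c397a1db2f27c0ba5d"/>
'''
print(parse_ebook_form(page))  # ('15917', 'e87436c8bc7849c397a1db2f27c0ba5d')
```

Attribute order, quoting style and self-closing slashes no longer matter, because the parser normalizes them.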

Test result (excerpt of the log):

Saved under chapter51.xhtml
/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter52.xhtml chapter52.xhtml
python -m wget http://www.hzcourse.com/resource/readBook?path=/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter52.xhtml -o chapter52.xhtml
100% [................................................................................] 1058 / 1058
Saved under chapter52.xhtml
/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter53.xhtml chapter53.xhtml
python -m wget http://www.hzcourse.com/resource/readBook?path=/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter53.xhtml -o chapter53.xhtml
100% [................................................................................] 4625 / 4625
Saved under chapter53.xhtml
/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter54.xhtml chapter54.xhtml
python -m wget http://www.hzcourse.com/resource/readBook?path=/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter54.xhtml -o chapter54.xhtml
100% [..................................................................................] 705 / 705
Saved under chapter54.xhtml
/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter55.xhtml chapter55.xhtml
python -m wget http://www.hzcourse.com/resource/readBook?path=/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter55.xhtml -o chapter55.xhtml
100% [................................................................................] 1814 / 1814
Saved under chapter55.xhtml
/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter56.xhtml chapter56.xhtml
python -m wget http://www.hzcourse.com/resource/readBook?path=/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter56.xhtml -o chapter56.xhtml
100% [..............................................................................] 10025 / 10025
Saved under chapter56.xhtml
/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter57.xhtml chapter57.xhtml
python -m wget http://www.hzcourse.com/resource/readBook?path=/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter57.xhtml -o chapter57.xhtml

[screenshot]
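The log shows each chapter saved under the last component of its path (chapter52.xhtml and so on). The script handles folders with Windows cmd built-ins (del, rd, md, xcopy). Below is a cross-platform sketch of the same step using pathlib, which writes straight into the book's folder instead of going through a scratch folder. safe_dirname, chapter_filename and save_chapter are hypothetical names. Replacing characters that Windows forbids in file names is an extra precaution the original script does not take: md "%s" fails if the book title contains characters such as : or ?.

```python
import re
from pathlib import Path, PurePosixPath


def safe_dirname(name):
    # Replace characters that Windows does not allow in file or folder names.
    cleaned = re.sub(r'[<>:"/\\|?*]', "_", name).strip()
    return cleaned or "untitled"


def chapter_filename(ref):
    # Last component of the chapter path, e.g. "/a/b/Text/chapter52.xhtml" -> "chapter52.xhtml".
    return PurePosixPath(ref).name


def save_chapter(root, bookname, ref, content):
    """Write one chapter's bytes into <root>/<bookname>/ and return the file path."""
    book_dir = Path(root) / safe_dirname(bookname)
    book_dir.mkdir(parents=True, exist_ok=True)  # like md, but fine if it already exists
    target = book_dir / chapter_filename(ref)
    target.write_bytes(content)
    return target
```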

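Other

What framework is the following written in?

{{bookChapter.title}}

A: avalonjs

Where is bookChapter defined?

The chapter list comes from the page's own script. queryEbookChapterList reads ebookId and token from the page and POSTs them, together with the search key, to web/refbook/queryAllChapterList. If obj.data.code is 1, it assigns obj.data.data to bookChaptertCtrl.bookChapters: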

var probation = {
    search:function(){
        var key = $.trim($("#condition").val());
        ebookRead.queryEbookChapterList(key);
    },
    queryEbookChapterList:function(key){
        // Both values are read from elements on the page (#ebookId, #token).
        var ebookId = $.trim($("#ebookId").val());
        var token = $.trim($("#token").val());
        debugger;
        jQuery.ajax({
            type : "post" , 
            url : "web/refbook/queryAllChapterList", 
            dataType : "json" , 
            data : {ebookId:ebookId,key:key,token:token},
            success : function(obj) {
                if(obj.data.code==1){
                    var bookChapters = obj.data.data;
                    if(bookChapters.length > 0){
                        // The chapter list is handed to the avalon view model here.
                        bookChaptertCtrl.bookChapters = bookChapters;
                        $("#chapterCont").load();
                        $("#directories").find("li").first().children("a").click();
                    }
                } else {
                    alert(obj.data.message);
                }
            }
        });
    },
    // ... (rest of the object not shown in the post)
};
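The success check above also shows the shape of the response: data.code is 1 on success, data.data is the chapter list, and data.message explains a failure. Below is a sketch that applies the same check in Python before using the list. The field names come from this code, from the script's i['ref'], and from the {{bookChapter.title}} template. chapter_refs and the sample response are hypothetical. JavaScript's == is loose, so both 1 and "1" are accepted.

```python
def chapter_refs(resp_json):
    """Return each chapter's 'ref' path, applying the same check as the page."""
    data = resp_json.get("data") or {}
    # JavaScript's == is loose, so the page accepts both 1 and "1".
    if data.get("code") not in (1, "1"):
        # The page shows data.message in an alert() in this case.
        raise RuntimeError(data.get("message") or "queryAllChapterList failed")
    return [chapter["ref"] for chapter in data.get("data") or []]


# Hypothetical response, shaped like what the page's success handler expects.
sample = {"data": {"code": 1, "data": [
    {"title": "3.1 协商原则",
     "ref": "/openresources/teach_ebook/uncompressed/13780/OEBPS/Text/chapter33.html"},
]}}
print(chapter_refs(sample))
```

Failing loudly here is better than the script's resp_json['data']['data'], which raises an unhelpful KeyError or TypeError when the token is rejected.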

[screenshot]

How do you get the links?

With the almighty Chrome F12 (DevTools), of course.

[screenshot]
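In the Network tab, the chapter list shows up as a POST to web/refbook/queryAllChapterList with the form fields ebookId, key and token seen in the code above. To replay such a request outside the browser, copy those fields into your own request. Below is a minimal sketch that only builds the request with urllib; nothing is sent. chapter_list_request is a hypothetical name. The Content-Type is jQuery's default for form posts, assuming the page does not override it.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://www.hzcourse.com/"


def chapter_list_request(ebook_id, token, key=""):
    """Build (but do not send) the POST the reader page makes for its chapter list."""
    body = urlencode({"ebookId": ebook_id, "key": key, "token": token}).encode("ascii")
    return Request(
        BASE + "web/refbook/queryAllChapterList",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"},
        method="POST",
    )


req = chapter_list_request("15917", "e87436c8bc7849c397a1db2f27c0ba5d")
print(req.get_method(), req.full_url)
print(req.data)
```

Passing req to urllib.request.urlopen and reading the body with json.load would then send it. The Python script above does the same thing with requests.post, minus the key field.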


Original post: https://www.cnblogs.com/cutepig/p/12250629.html

