urllib.request--urllib2

2021-05-20 01:29


1 Basic usage:

================  Using the urllib2 library (Python 2)  ================
In Python 3 the module is urllib.request; usage is otherwise the same.

import urllib.request

headers = {"User-Agent": "......"}   # "......" stands for a real browser UA string
request = urllib.request.Request("http://www.baidu.com", headers=headers)  # Request object
response = urllib.request.urlopen(request)   # response object
html = response.read()   # response body

response.getcode()   # HTTP status code
response.geturl()    # URL actually fetched
response.info()      # response header information
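A Request object can also be inspected before anything is sent over the network, which is a convenient way to see what urlopen() would do with it. A minimal sketch (the URL and User-Agent value are placeholders):

```python
import urllib.request

# Placeholder URL and UA string, just to illustrate the Request object.
headers = {"User-Agent": "Mozilla/5.0"}
request = urllib.request.Request("http://www.baidu.com", headers=headers)

print(request.full_url)       # the URL the request targets
print(request.get_method())   # "GET" when no data is attached
print(request.get_header("User-agent"))  # header set via the constructor
```
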

 

2 Adding a request header:

import urllib.request
import random

url = "http://www.baidu.com"

UA_list = ["Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
           "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
           "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
           "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
           "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)",
           "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
           "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
           "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11",
           "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11",
]

User_agent = random.choice(UA_list)

request = urllib.request.Request(url)

request.add_header("User-Agent", User_agent)  # add the User-Agent request header

print(request.get_header("User-agent"))       # read back the current User-Agent header
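One subtlety worth knowing: add_header() normalizes header names with str.capitalize(), which is why the lookup above spells it "User-agent" rather than "User-Agent". A quick check:

```python
import urllib.request

req = urllib.request.Request("http://www.baidu.com")
req.add_header("User-Agent", "TestUA/1.0")  # placeholder UA value

# add_header() stores the key as key.capitalize(), so only the
# "User-agent" spelling finds it again.
print(req.get_header("User-agent"))   # TestUA/1.0
print(req.get_header("User-Agent"))   # None
print(req.has_header("User-agent"))   # True
```
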

 

3 URL encoding:

import urllib.request
import urllib.parse

url = "https://www.baidu.com/s?"

keyword = {"kd": "哈哈哈"}   # will be encoded as 'kd=%E5%93%88...'

print(urllib.parse.urlencode(keyword))  # urllib.parse.quote(string) encodes a single string;
                                        # urllib.parse.urlencode(dic) encodes a whole dict
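The difference between the two helpers mentioned in the comment can be sketched directly: quote() percent-encodes one string, while urlencode() serializes a whole dict into key=value pairs joined with &.

```python
from urllib import parse

# quote() encodes a single string; urlencode() builds a query string from a dict.
print(parse.quote("哈哈哈"))              # %E5%93%88%E5%93%88%E5%93%88
print(parse.urlencode({"kd": "哈哈哈"}))  # kd=%E5%93%88%E5%93%88%E5%93%88

# Multiple entries are joined with & (non-string values are str()-ed first).
print(parse.urlencode({"kd": "哈哈哈", "pn": 10}))
```
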

 

4. POST request -- static:

The page URL does not change when the form is submitted.

import urllib.request
import urllib.parse

# keyword and post_url come from the target page / a packet-capture tool
# Complete request headers:
h = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
}
post_dic = {
    "i": keyword,
    "from": "AUTO",
    "to": "AUTO",
    "smartresult": "dict",
    "client": "fanyideskweb",
    "doctype": "json",
    "version": "2.1",
    "keyfrom": "fanyi.web",
    "action": "FY_BY_REALTIME",
    "typoResult": "true",
}
post_data = urllib.parse.urlencode(post_dic).encode("utf-8")
request = urllib.request.Request(post_url, data=post_data, headers=h)
response = urllib.request.urlopen(request)
print(response.read().decode("utf-8"))
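The encoding step above matters: urlencode() produces a str, but the data passed to Request must be bytes, hence the .encode("utf-8"). A minimal check (the form fields are illustrative, not the real interface's):

```python
import urllib.parse

# Illustrative form fields; a real capture would have more of them.
post_dic = {"i": "你好", "doctype": "json"}
post_data = urllib.parse.urlencode(post_dic).encode("utf-8")

print(type(post_data))  # <class 'bytes'>
print(post_data)        # b'i=%E4%BD%A0%E5%A5%BD&doctype=json'
```
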

 

5. POST request -- dynamic (AJAX loading):

#   crawler ---- data source

#   for a page loaded via AJAX, the data source is JSON

#   getting the JSON means getting the page's data

  POST data:
urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
The data parameter of urlopen() defaults to None; when data is not None, urlopen() submits the request as a POST.
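This GET-vs-POST switch can be verified on a Request object without sending anything (the URL below is a placeholder):

```python
import urllib.parse
import urllib.request

url = "http://example.com/translate"   # placeholder URL
body = urllib.parse.urlencode({"i": "hello"}).encode("utf-8")

get_req = urllib.request.Request(url)              # no data -> GET
post_req = urllib.request.Request(url, data=body)  # data attached -> POST

print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
```
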
from urllib import parse, request

url = r"https://movie.douban.com/j/chart/top_list?type=11&interval_id=100%3A90&action="
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"}  # ------ obtained with a packet-capture tool ----

post_data = {
    # "type": "11",
    # "interval_id": "100:90",
    # "action": "",
    "start": "0",
    "limit": "20",
}
post_data = parse.urlencode(post_data).encode("utf-8")  # POST data must be bytes
my_request = request.Request(url, data=post_data, headers=headers)
my_response = request.urlopen(my_request)
print(my_response.read().decode("utf-8"))  # decode the response

 


Original post: http://www.cnblogs.com/big-handsome-guy/p/7710406.html

