Python Basics: Using urlparse for Web Scraping, with a Simple Example (Part 1)
2021-01-29 05:15
Original article: https://www.cnblogs.com/guanbin-529/p/12833766.html

References:
https://docs.python.org/3/library/urllib.parse.html?highlight=urlparse#urllib.parse.urlparse
https://blog.csdn.net/fengxinlinux/article/details/77281253
https://www.runoob.com/python/python-func-open.html

>>> from urllib.parse import urlparse
>>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
>>> o
ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
            params='', query='', fragment='')
>>> o.scheme
'http'
>>> o.port
80
>>> o.geturl()
'http://www.cwi.nl:80/%7Eguido/Python.html'
>>> from urllib.parse import urlparse
>>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
            params='', query='', fragment='')
>>> urlparse('www.cwi.nl/%7Eguido/Python.html')
ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
            params='', query='', fragment='')
>>> urlparse('help/Python.html')
ParseResult(scheme='', netloc='', path='help/Python.html', params='',
            query='', fragment='')
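The examples above show that urlparse only recognizes a network location when it is introduced by `//`; otherwise the host name ends up in `path`. For a crawler that receives addresses written without a scheme, one possible workaround is sketched below. The helper name `parse_with_default_scheme` and the default of `http` are assumptions made for illustration; they are not part of urllib.

```python
from urllib.parse import urlparse


def parse_with_default_scheme(url, default_scheme="http"):
    """Parse url, prepending default_scheme when it has no scheme and no netloc.

    Hypothetical helper for illustration; it is not part of urllib.
    """
    parsed = urlparse(url)
    if not parsed.scheme and not parsed.netloc:
        # No scheme and no network location: treat the input as "host/path"
        parsed = urlparse(f"{default_scheme}://{url}")
    return parsed


result = parse_with_default_scheme("www.cwi.nl/%7Eguido/Python.html")
print(result.netloc)  # www.cwi.nl
print(result.path)    # /%7Eguido/Python.html
```

This only makes sense when every input is known to start with a host name. A genuinely relative path such as help/Python.html would be misread as a host, so the helper should not be applied to links extracted from a page.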
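urlparse returns a named tuple with the following attributes:

Attribute   Index   Value                                 Value if not present
scheme      0       URL scheme specifier                  scheme parameter
netloc      1       Network location part                 empty string
path        2       Hierarchical path                     empty string
params      3       Parameters for last path element      empty string
query       4       Query component                       empty string
fragment    5       Fragment identifier                   empty string
username    -       User name                             None
password    -       Password                              None
hostname    -       Host name (lower case)                None
port        -       Port number as integer, if present    None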
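As a rough illustration of the attribute list, the sketch below reads the same fields by index and by name, and shows the None values for parts that are missing. The URL with embedded credentials is a made-up example.

```python
from urllib.parse import urlparse

# Made-up URL that contains every component
parsed = urlparse("http://user:secret@Example.COM:8080/a/b;type=x?q=1#top")

# The first six fields can be read by index or by name
print(parsed[0], parsed.scheme)  # http http
print(parsed[3], parsed.params)  # type=x type=x

# These attributes are derived from netloc and have no index
print(parsed.username)  # user
print(parsed.hostname)  # example.com
print(parsed.port)      # 8080

# Without a network location the derived attributes are None
relative = urlparse("help/Python.html")
print(relative.hostname, relative.port)  # None None
```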
>>> from urllib.parse import urljoin
>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
'http://www.cwi.nl/%7Eguido/FAQ.html'
>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
...         '//www.python.org/%7Eguido')
'http://www.python.org/%7Eguido'
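In a crawler, urljoin is typically applied to every link found on a page. The following is a minimal sketch of that idea; the function name `absolute_links` and the sample href list are hypothetical, and filtering to http/https is one possible policy rather than a requirement.

```python
from urllib.parse import urljoin, urlparse


def absolute_links(base_url, hrefs):
    """Resolve each href against base_url and keep only http(s) results."""
    resolved = []
    for href in hrefs:
        full = urljoin(base_url, href)
        # Skip schemes a page fetcher cannot follow, such as mailto:
        if urlparse(full).scheme in ("http", "https"):
            resolved.append(full)
    return resolved


base = "http://www.cwi.nl/%7Eguido/Python.html"
hrefs = [
    "FAQ.html",                    # relative to the current directory
    "/index.html",                 # relative to the site root
    "//www.python.org/%7Eguido",   # inherits the scheme of the base URL
    "mailto:someone@example.com",  # not a web page, filtered out
]
links = absolute_links(base, hrefs)
for link in links:
    print(link)
```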
>>> from urllib.parse import quote, unquote
>>> quote('http://www.baidu.com')
'http%3A//www.baidu.com'
>>> unquote('http%3A//www.baidu.com')
'http://www.baidu.com'
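Percent-encoding matters most when a search keyword contains spaces or non-ASCII characters. The sketch below encodes such a keyword with quote and builds a query string with urlencode; the endpoint `https://example.com/search` and the parameter names `q` and `page` are placeholders, not a real API.

```python
from urllib.parse import quote, unquote, urlencode

keyword = "python 爬虫"

# quote() percent-encodes the UTF-8 bytes; a space becomes %20
encoded = quote(keyword)
print(encoded)  # python%20%E7%88%AC%E8%99%AB

# unquote() reverses the encoding
print(unquote(encoded) == keyword)  # True

# urlencode() builds a whole query string; a space becomes '+'
query = urlencode({"q": keyword, "page": 1})
search_url = "https://example.com/search?" + query  # placeholder endpoint
print(search_url)
```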
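A simple demo example:

import urllib.request

url = 'http://www.baidu.com'
# Send a browser-like User-Agent header with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}
request = urllib.request.Request(url, headers=headers)
# Fetch the page and read the raw response bytes
response = urllib.request.urlopen(request).read()
# Write the bytes to a local file; the with block closes it automatically
with open('./1.html', 'wb') as f:
    f.write(response)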
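The demo can be combined with urlparse so that the output file is named after the host instead of a fixed 1.html. This is only a sketch under assumptions: `filename_for` and `save_page` are hypothetical helper names, the User-Agent value is shortened, and the 10 second timeout is an arbitrary choice.

```python
import urllib.request
from urllib.parse import urlparse

USER_AGENT = "Mozilla/5.0 (X11; Fedora; Linux x86_64)"  # shortened, illustrative value


def filename_for(url):
    """Build a local file name such as 'www.baidu.com.html' from the URL's host."""
    host = urlparse(url).hostname or "page"
    return f"{host}.html"


def save_page(url, timeout=10):
    """Download url and write the response body to a file named after its host."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
    path = filename_for(url)
    with open(path, "wb") as f:
        f.write(body)
    return path


print(filename_for("http://www.baidu.com"))  # www.baidu.com.html
# save_page("http://www.baidu.com")  # uncomment to perform a real network request
```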
Article link: http://soscw.com/essay/48541.html