Pitfalls of sending POST requests with Scrapy in Python

2018-09-21 18:57


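  Sending POST requests with requests

  First, let's see how convenient it is to send a POST request with requests.

  Requests' simple API means that every HTTP request type is obvious. For example, this is how you send an HTTP POST request: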

   >>>r = requests.post(

  With data= you can pass a dictionary as the parameters, and you can also pass tuples:

   >>> payload = (('key1', 'value1'), ('key1', 'value2'))
   >>> r = requests.post(
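
  The tuple form matters when a form carries several values under the same key, which a dict cannot express. The sketch below is a standard-library illustration of that encoding, not a call into requests itself; the payload is the one from the snippet above.

```python
from urllib.parse import parse_qs, urlencode

# A dict cannot hold the same key twice, so repeated keys need tuples.
payload = (("key1", "value1"), ("key1", "value2"))

# Form-encode the pairs in the order they were given.
body = urlencode(payload)
print(body)  # key1=value1&key1=value2

# Decoding the body again shows both values grouped under one key.
decoded = parse_qs(body)
print(decoded)  # {'key1': ['value1', 'value2']}
```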

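  Passing JSON looks like this:

   >>> import json
   >>> url =

  New in version 2.4.2: the dict can also be passed directly through the json parameter, which encodes it for you: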

   >>>url =

  In other words, you do not need to transform the parameters yourself. You only need to decide whether to use data= or json=; requests takes care of the rest.

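  Sending POST requests with Scrapy

  The source code shows that Scrapy sends GET requests by default. When we need to send a request that carries parameters, or to log in, we need a POST request. Take the following as an example: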

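   from scrapy.spiders import CrawlSpider
   from scrapy.selector import Selector
   import scrapy
   import json


   class LaGou(CrawlSpider):
       name = 'myspider'

       def start_requests(self):
           yield scrapy.FormRequest(
               url=...,  # the URL is missing from the source
               formdata={
                   # The first entry is lost in the source; per its comment, the value
                   # cannot be the bool True here, although the requests module allows it.
                   'pn': '1',  # cannot be the int 1 here, although the requests module allows it
                   'kd': 'python',
               },  # formdata plays the role of data= in requests; keys and values must be plain string pairs
               callback=self.parse,
           )

       def parse(self, response):
           datas = json.loads(response.body.decode())['content']['positionResult']['result']
           for data in datas:
               print(data['companyFullName'] + str(data['positionId']))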
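
  Because formdata only accepts strings, a payload that works with requests may need converting first. Here is a minimal sketch of such a conversion. The helper name to_formdata and the key 'flag' are made up for this example, and rendering booleans as lowercase 'true'/'false' is an assumption about what the server expects.

```python
def to_formdata(payload):
    """Convert every key and value in `payload` to a string.

    Hypothetical helper, not part of Scrapy. Booleans become
    'true'/'false' (lowercase is an assumption, adjust as needed).
    """
    converted = {}
    for key, value in payload.items():
        if isinstance(value, bool):
            # Check bool first: bool is a subclass of int in Python.
            converted[str(key)] = "true" if value else "false"
        else:
            converted[str(key)] = str(value)
    return converted


formdata = to_formdata({"flag": True, "pn": 1, "kd": "python"})
print(formdata)  # {'flag': 'true', 'pn': '1', 'kd': 'python'}
```

  The resulting dict could then be passed as formdata= to scrapy.FormRequest.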

  The officially recommended approach: Using FormRequest to send data via HTTP POST

   return [FormRequest(url=

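  This uses FormRequest and passes the parameters through formdata, which, as you can see, is also a dictionary.

  But here comes the really nasty pitfall. I spent a whole afternoon on it today: when sending the request this way, something always went wrong no matter what I tried, and the data returned was never what I wanted.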

   return scrapy.FormRequest(url, formdata=(payload))

  After searching online for a long time, I finally found an approach that works: send the request with scrapy.Request, and the data comes back correctly.

   return scrapy.Request(url, body=json.dumps(payload), method='POST', headers={'Content-Type': 'application/json'})

  Reference: Send Post Request in Scrapy

   my_data = {'field1': 'value1', 'field2': 'value2'}
   request = scrapy.Request(
       url,
       method='POST',
       body=json.dumps(my_data),
       headers={'Content-Type': 'application/json'},
   )
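
  The same pattern can be wrapped in a small helper so that every JSON POST is built the same way. This is only a sketch: json_post_kwargs is a hypothetical name, the URL is a placeholder, and the function merely builds the keyword arguments so that it runs without Scrapy installed.

```python
import json


def json_post_kwargs(url, payload):
    """Build keyword arguments for a JSON POST request.

    Hypothetical helper: the returned dict is meant to be unpacked
    into scrapy.Request(**kwargs) inside a spider.
    """
    return {
        "url": url,
        "method": "POST",
        # With a JSON body, ints and bools can keep their native types.
        "body": json.dumps(payload),
        "headers": {"Content-Type": "application/json"},
    }


kwargs = json_post_kwargs("https://example.com/api", {"pn": 1, "kd": "python"})
print(kwargs["body"])

# In a spider (assuming Scrapy is installed):
#     yield scrapy.Request(callback=self.parse, **kwargs)
```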

  The difference between FormRequest and Request

  In the documentation, there is hardly any visible difference:

  The FormRequest class adds a new argument to the constructor. The remaining arguments are the same as for the Request class and are not documented here.
Parameters: formdata (dict or iterable of tuples) – is a dictionary (or iterable of (key, value) tuples) containing HTML Form data which will be url-encoded and assigned to the body of the request.

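  It says that FormRequest adds one new argument, formdata, which accepts a dictionary or an iterable of tuples containing the form data and turns it into the body of the request. FormRequest also inherits from Request.

  The {'key': 'value', 'k': 'v'} we pass in is eventually converted to key=value&k=v, and the default method is POST. Now let's look at Request: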
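
  A likely explanation for the failure described earlier is that the endpoint expected a JSON body while formdata produced a form-encoded one. That is an assumption, since the article does not show the server side. The sketch below uses only the standard library, and server_parses_json is a made-up stand-in for such a server.

```python
import json
from urllib.parse import urlencode

payload = {"key": "value", "k": "v"}

# What a form-encoded body looks like (the formdata path).
form_body = urlencode(payload)
print(form_body)  # key=value&k=v

# What a JSON body looks like (the json.dumps path).
json_body = json.dumps(payload)
print(json_body)  # {"key": "value", "k": "v"}


def server_parses_json(body):
    """Stand-in for a server that only accepts JSON; None means rejected."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None


print(server_parses_json(form_body))  # None
print(server_parses_json(json_body))  # {'key': 'value', 'k': 'v'}
```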

   class Request(object_ref):

       def __init__(self, url, callback=None, method='GET', headers=None, body=None,
                    cookies=None, meta=None, encoding='utf-8', priority=0,
                    dont_filter=False, errback=None, flags=None):

           self._encoding = encoding  # this one has to be set first
           self.method = str(method).upper()

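  The default method is GET, but that does not really matter: you can still send a POST request by passing method='POST'. This reminds me of the request function in requests, which is the base function on which all request types are defined.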
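
  To make the role of the method argument concrete, here is a deliberately simplified stand-in. MiniRequest is a made-up class, not Scrapy's real Request; it only copies the str(method).upper() normalisation from the excerpt.

```python
class MiniRequest:
    """Simplified stand-in for illustration; not Scrapy's real Request."""

    def __init__(self, url, method="GET", body=None, headers=None):
        self.url = url
        # Same normalisation as in the excerpt: any letter case is accepted.
        self.method = str(method).upper()
        self.body = body
        self.headers = headers or {}


default_request = MiniRequest("https://example.com")
post_request = MiniRequest("https://example.com", method="post", body="{}")
print(default_request.method)  # GET
print(post_request.method)     # POST
```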

   def request(method, url, **kwargs):
       """Constructs and sends a :class:`Request <Request>`.

       :param method: method for the new :class:`Request` object.
       :param url: URL for the new :class:`Request` object.
       :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
       :param data: (optional) Dictionary or list of tuples ``[(key, value)]`` (will be form-encoded), bytes, or file-like object to send in the body of the :class:`Request`.
       :param json: (optional) json data to send in the body of the :class:`Request`.
       :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
       :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
       :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
           ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
           or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
           defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
           to add for the file.
       :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
       :param timeout: (optional) How many seconds to wait for the server to send data
           before giving up, as a float, or a :ref:`(connect timeout, read
           timeout) <timeouts>` tuple.
       :type timeout: float or tuple
       :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
       :type allow_redirects: bool
       :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
       :param verify: (optional) Either a boolean, in which case it controls whether we verify
           the server's TLS certificate, or a string, in which case it must be a path
           to a CA bundle to use. Defaults to ``True``.
       :param stream: (optional) if ``False``, the response content will be immediately downloaded.
       :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
       :return: :class:`Response <Response>` object
       :rtype: requests.Response

       Usage::

         >>> import requests
         >>> req = requests.request('GET',

  That is all for this article. I hope it helps with your studies, and I hope you will keep supporting 脚本之家.

