C# Web Crawler

2021-03-04 06:25


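      I only started learning C# recently and wanted to try writing a crawler in it; until now I had been scraping with Python. (In practice, the main difference is just the syntax.)
      Below is a simple example.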

Scraping images

Create the project: dotnet new console --name crawler
Install the package: dotnet add package HtmlAgilityPack --version 1.11.23

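using System;
using System.IO;
using System.Net;
using System.Text;
using HtmlAgilityPack;

// The statements below are assumed to run inside Main or as top-level statements (hence the bare return).
string url = "https://www.iqiyi.com/dianying_new/i_list_paihangbang.html";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Timeout = 30 * 1000; // 30 seconds, in milliseconds
// Build the User-Agent by concatenation; a verbatim string split across lines would embed a line break in the header.
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) " +
                    "Chrome/81.0.4044.122 Safari/537.36";
request.ContentType = "text/html; charset=utf-8";
request.CookieContainer = new CookieContainer();
string html;
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    if (response.StatusCode != HttpStatusCode.OK)
    {
        return; // give up on any non-200 response
    }
    // Decode the body as UTF-8; the using block closes the reader even if reading fails.
    using (StreamReader sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
    {
        html = sr.ReadToEnd();
    }
}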
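
On newer .NET versions the same request can also be made with HttpClient. What follows is only a minimal sketch and not part of the original example: the PageFetcher class and its member names are made up for illustration, and it returns an empty string on a non-success status instead of returning early.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, named for illustration only.
public static class PageFetcher
{
    // One shared HttpClient; creating a new one per request can exhaust sockets.
    private static readonly HttpClient Client = CreateClient();

    // Builds a client with the same 30-second timeout and User-Agent as the request above.
    public static HttpClient CreateClient()
    {
        var client = new HttpClient { Timeout = TimeSpan.FromSeconds(30) };
        client.DefaultRequestHeaders.TryAddWithoutValidation(
            "User-Agent",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
            "(KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36");
        return client;
    }

    // Returns the page body, or an empty string when the status code signals failure.
    public static async Task<string> FetchHtmlAsync(string url)
    {
        using (HttpResponseMessage response = await Client.GetAsync(url))
        {
            if (!response.IsSuccessStatusCode)
            {
                return string.Empty;
            }
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Calling `string html = await PageFetcher.FetchHtmlAsync(url);` would then stand in for the HttpWebRequest block, with an empty result treated as a failed request.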

HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);
// XPath string literals need straight quotes; typographic quotes make the expression invalid.
string li_xpath = "//*[@id='widget-tab-0']/div[2]/div/div[1]/ul/li";
HtmlNodeCollection liNodeList = document.DocumentNode.SelectNodes(li_xpath);
// SelectNodes returns null (not an empty collection) when nothing matches.
if (liNodeList == null)
{
    return;
}
foreach (var liNode in liNodeList)
{
    // The leading "." keeps the search inside the current <li>, so no second HtmlDocument is needed.
    HtmlNode imgNode = liNode.SelectSingleNode(".//a/img");
    if (imgNode == null)
    {
        continue; // this entry has no image
    }
    if (imgNode.Attributes["src"] != null)
    {
        string imgUrl = imgNode.Attributes["src"].Value;
        Console.WriteLine(imgUrl);
    }
}
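
The loop above stops once it has the src value. To actually save the pictures, the value probably needs to be turned into an absolute URL first, since src attributes are often protocol-relative ("//host/...") or site-relative. Here is a rough sketch of that step, under my own assumptions: ImageDownloader, its method names, the "image.bin" fallback name and the target directory are all hypothetical and not from the original post.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, named for illustration only.
public static class ImageDownloader
{
    // Resolves a src value such as "//host/a.jpg" or "/img/a.jpg" against the page URL.
    public static string ToAbsoluteUrl(string pageUrl, string src)
    {
        return new Uri(new Uri(pageUrl), src).AbsoluteUri;
    }

    // Derives a local file name from the last segment of the URL path.
    public static string FileNameFor(string imageUrl)
    {
        string name = Path.GetFileName(new Uri(imageUrl).AbsolutePath);
        return string.IsNullOrEmpty(name) ? "image.bin" : name;
    }

    // Downloads one image and writes it into the given directory.
    public static async Task SaveAsync(HttpClient client, string imageUrl, string directory)
    {
        Directory.CreateDirectory(directory);
        byte[] bytes = await client.GetByteArrayAsync(imageUrl);
        File.WriteAllBytes(Path.Combine(directory, FileNameFor(imageUrl)), bytes);
    }
}
```

Inside the loop this could be used as `await ImageDownloader.SaveAsync(client, ImageDownloader.ToAbsoluteUrl(url, imgUrl), "images");`, where client is an HttpClient created elsewhere. Files that share a name would overwrite each other in this sketch.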


Original article: https://www.cnblogs.com/hwxing/p/12949020.html
