C# Web Crawler
2021-03-04 06:25
Original post: https://www.cnblogs.com/hwxing/p/12949020.html

I only started learning C# recently and wanted to try writing a crawler in it; before this I did my scraping in Python. (Really, only the syntax is different.)
Here is a simple example.
Scraping images
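Create a console project, then install HtmlAgilityPack (used to parse the HTML) from inside the project folder:
dotnet new console --name crawler
cd crawler
dotnet add package HtmlAgilityPack --version 1.11.23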
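// Program.cs, written as top-level statements (C# 9 / .NET 5 or later)
using System;
using System.IO;
using System.Net;
using System.Text;
using HtmlAgilityPack;

// iQIYI movie ranking page to scrape
string url = "https://www.iqiyi.com/dianying_new/i_list_paihangbang.html";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Timeout = 30 * 1000; // 30-second timeout
// Identify as desktop Chrome; a header value must not contain line breaks
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36";
request.Accept = "text/html"; // ask for HTML; Content-Type would describe a request body, which a GET has none of
request.CookieContainer = new CookieContainer();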
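string html;
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    // GetResponse() already throws a WebException for most error codes;
    // this check is an extra guard
    if (response.StatusCode != HttpStatusCode.OK)
    {
        return;
    }
    // Read the whole body as UTF-8; the reader is disposed automatically
    using (StreamReader sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
    {
        html = sr.ReadToEnd();
    }
}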
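// Parse the page and select every <li> in the ranking list
HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);
string li_xpath = "//*[@id='widget-tab-0']/div[2]/div/div[1]/ul/li";
HtmlNodeCollection liNodeList = document.DocumentNode.SelectNodes(li_xpath);
// SelectNodes returns null (not an empty collection) when nothing matches
if (liNodeList == null)
{
    return;
}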
foreach (HtmlNode liNode in liNodeList)
{
    // ".//" keeps the search inside this <li>; a bare "//" would search the whole page
    HtmlNode imgNode = liNode.SelectSingleNode(".//a/img");
    if (imgNode != null && imgNode.Attributes["src"] != null)
    {
        string imgUrl = imgNode.Attributes["src"].Value;
        Console.WriteLine(imgUrl); // print each image URL
    }
}
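WebRequest and HttpWebRequest are marked obsolete in .NET 6 and later, where Microsoft recommends HttpClient instead. The sketch below shows how the same fetch (30-second timeout, Chrome User-Agent, cookie container) might look with HttpClient. The class and method names (PageFetcher, CreateClient, FetchHtmlAsync) are hypothetical, and it has not been tested against the iQIYI page.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: the page fetch rewritten with HttpClient.
public static class PageFetcher
{
    public const string ChromeUserAgent =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36";

    // Build one client and reuse it for all requests rather than one per call.
    public static HttpClient CreateClient()
    {
        var handler = new HttpClientHandler
        {
            CookieContainer = new CookieContainer(), // keep cookies between requests
            UseCookies = true
        };
        var client = new HttpClient(handler)
        {
            Timeout = TimeSpan.FromSeconds(30) // same 30 s timeout as the original
        };
        client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", ChromeUserAgent);
        return client;
    }

    // Returns the page HTML, or null if the server answers with a non-2xx status.
    // The body is decoded using the charset the response declares.
    public static async Task<string> FetchHtmlAsync(HttpClient client, string url)
    {
        using HttpResponseMessage response = await client.GetAsync(url);
        if (!response.IsSuccessStatusCode)
        {
            return null;
        }
        return await response.Content.ReadAsStringAsync();
    }
}
```

Usage would be along the lines of `using var client = PageFetcher.CreateClient();` followed by `string html = await PageFetcher.FetchHtmlAsync(client, url);`, after which the HtmlAgilityPack parsing stays the same. Returning null instead of throwing mirrors the original's early `return` on a bad status.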
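A detail that is easy to trip over when selecting inside each `<li>`: in HtmlAgilityPack, calling SelectSingleNode on a node with an XPath that starts with `//` still searches from the document root, while `.//` searches only below that node. Re-parsing each `<li>` into its own HtmlDocument also avoids the problem, just at the cost of an extra parse per item. The small self-contained check below, on made-up HTML, illustrates the difference; XPathScopeDemo and its members are hypothetical names.

```csharp
using HtmlAgilityPack;

// Hypothetical demo: "//" vs ".//" when querying from a specific node.
public static class XPathScopeDemo
{
    // Made-up list with two items, each holding a linked image.
    public const string SampleHtml =
        "<ul>" +
        "<li><a href='/a'><img src='//img.example.com/1.jpg'></a></li>" +
        "<li><a href='/b'><img src='//img.example.com/2.jpg'></a></li>" +
        "</ul>";

    // src of the first <img> the XPath finds when evaluated from the given node.
    public static string FirstImgSrc(HtmlNode li, string xpath)
    {
        HtmlNode img = li.SelectSingleNode(xpath);
        return img?.Attributes["src"]?.Value;
    }

    // Runs both XPaths from the SECOND <li> and returns what each one finds.
    public static (string Absolute, string Relative) CompareOnSecondItem()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);
        HtmlNode secondLi = doc.DocumentNode.SelectNodes("//li")[1];
        // "//a/img" starts at the document root, so it finds the first item's image;
        // ".//a/img" starts at secondLi, so it finds the second item's image.
        return (FirstImgSrc(secondLi, "//a/img"), FirstImgSrc(secondLi, ".//a/img"));
    }
}
```

With the absolute path every `<li>` would report the first image on the page, which is exactly the bug that re-parsing or a leading `.` prevents.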
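The listing collects the image URLs but stops before saving any files. A minimal sketch of that last step follows, assuming the `src` values may be protocol-relative (`//host/...`) or root-relative (`/...`) and therefore need resolving against the page URL first; whether iQIYI uses those forms, or puts the real URL in a lazy-loading attribute instead of `src`, has not been checked. ImageDownloader and its methods are hypothetical names.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helpers: resolve each collected src and save the image to disk.
public static class ImageDownloader
{
    // Resolves src against the page URL, so "//host/x.jpg" (protocol-relative)
    // and "/x.jpg" (root-relative) both become absolute URLs.
    public static Uri ResolveImageUrl(string pageUrl, string src)
    {
        return new Uri(new Uri(pageUrl), src);
    }

    // Uses the last path segment as the file name, e.g. ".../poster.jpg" -> "poster.jpg".
    public static string FileNameFor(Uri imageUri)
    {
        return Path.GetFileName(imageUri.LocalPath);
    }

    // Downloads one image into the folder (created if missing) and returns the saved path.
    public static async Task<string> DownloadAsync(HttpClient client, Uri imageUri, string folder)
    {
        Directory.CreateDirectory(folder);
        string path = Path.Combine(folder, FileNameFor(imageUri));
        byte[] bytes = await client.GetByteArrayAsync(imageUri);
        await File.WriteAllBytesAsync(path, bytes);
        return path;
    }
}
```

Naming files after the last path segment is the simplest choice but will overwrite images that share a name across different folders; prefixing an index or a hash of the URL would avoid that.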