Crawling Chinese websites with Nutch 2.1
2020-12-13 05:41
Tags: local, Chinese web pages, mysql

This note adds Chinese-website crawling support to Nutch.

1. Crawling Chinese web pages

A. Adjust the MySQL settings so that Chinese text stored in MySQL is not garbled. Edit ${APACHE_NUTCH_HOME}/runtime/local/conf/gora.properties:

    ###############################
    # MySQL properties            #
    ###############################
    gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
    gora.sqlstore.jdbc.url=jdbc:mysql://10.10.11.252:3306/nutch?useUnicode=true&characterEncoding=utf8&autoReconnect=true&zeroDateTimeBehavior=convertToNull
    gora.sqlstore.jdbc.user=devuser
    gora.sqlstore.jdbc.password=devuser

The useUnicode=true and characterEncoding=utf8 parameters tell the JDBC driver to exchange text with the server as UTF-8. The URL must be written without spaces.

B. Edit the ${APACHE_NUTCH_HOME}/runtime/local/conf/nutch-site.xml file. The description of the property set there reads: "This allows selecting non-English language as default one to retrieve. It is a useful setting for search engines build for certain national group."

Original article: http://8917152.blog.51cto.com/8907152/1413052
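The description quoted in step B matches the one Nutch ships for the http.accept.language property in nutch-default.xml, which sets the Accept-Language request header. Assuming that is the property step B overrides, a nutch-site.xml entry might look like the sketch below. The value shown is a hypothetical preference list favouring Chinese, not the original author's setting, which is not preserved in this text.

```xml
<!-- Sketch only: goes inside the <configuration> element of nutch-site.xml. -->
<property>
  <name>http.accept.language</name>
  <!-- Assumed value: prefer Simplified Chinese, then any Chinese, then English. -->
  <value>zh-cn,zh;q=0.9,en;q=0.5,*;q=0.3</value>
  <description>Value of the "Accept-Language" request header field.</description>
</property>
```

Servers that negotiate content by language should then return their Chinese pages where available; sites that ignore the header are unaffected.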