2017.7.5
Keywords
- lxml
- 东野圭吾-白夜行
- requests
- multiprocessing.dumy
- pool
- file
Check Out
爬取东野圭吾-白夜行小说内容
获取网站代码:requests
import requests html_str = requests.get(url).content.decoding("utf-8")
解析内容:lxml
import lxml.html selector = lxml.html.fromstring(html_str) content_list = selector.xpath('//div[@class="contentbox"]/p/text()')
多线程爬虫:multiprocessing.dumy, Pool
from multiprocessing.dumy import Pool pool = Pool(7) pool.map(run_fun_name, resource_list)
写入文件
with open('white.txt', 'w', encoding="utf-8") as f: f.write("header") f.write("content")