2017.7.5

Keywords

  • lxml
  • Keigo Higashino, Journey Under the Midnight Sun (白夜行)
  • requests
  • multiprocessing.dummy
  • pool
  • file

Check Out

  • Scrape the text of Keigo Higashino's novel Journey Under the Midnight Sun (白夜行)

  • Fetch the page source: requests

    import requests
    # url is the address of the page to download; bytes have decode(), not decoding()
    html_str = requests.get(url).content.decode("utf-8")
    
  • Parse the content: lxml

    import lxml.html
    selector = lxml.html.fromstring(html_str)
    content_list = selector.xpath('//div[@class="contentbox"]/p/text()')
    
  • Multithreaded crawler: multiprocessing.dummy, Pool

    from multiprocessing.dummy import Pool
    pool = Pool(7)
    pool.map(run_fun_name, resource_list)
    
  • Write to a file

    with open('white.txt', 'w', encoding="utf-8") as f:
        f.write("header")
        f.write("content")
    
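The four steps above could be combined roughly as follows. This is only a sketch under assumptions: the function names (fetch_html, parse_paragraphs, scrape_chapter, crawl, save_book), the timeout value, and the demo inputs are hypothetical and not from the original notes, and the real chapter URLs are not given here. requests and lxml are imported inside the functions that need them, so the thread-pool and file-writing parts can be tried offline.

```python
import os
import tempfile
from multiprocessing.dummy import Pool  # thread-based Pool with the multiprocessing API


def fetch_html(url):
    """Download one page and decode it as UTF-8 (needs the requests package)."""
    import requests  # deferred import: only required for real network access
    return requests.get(url, timeout=10).content.decode("utf-8")  # timeout is an assumption


def parse_paragraphs(html_str):
    """Return the text of each <p> directly inside div.contentbox (needs lxml)."""
    import lxml.html
    selector = lxml.html.fromstring(html_str)
    return selector.xpath('//div[@class="contentbox"]/p/text()')


def scrape_chapter(url):
    """Fetch one chapter page and join its paragraphs into a single string."""
    return "\n".join(parse_paragraphs(fetch_html(url)))


def crawl(urls, worker=scrape_chapter, threads=7):
    """Run worker over urls in a thread pool; pool.map keeps the input order."""
    with Pool(threads) as pool:
        return pool.map(worker, urls)


def save_book(path, header, chapters):
    """Write the header, then every chapter, to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(header + "\n")
        for chapter in chapters:
            f.write(chapter + "\n")


if __name__ == "__main__":
    # Offline demo: a stand-in worker replaces the real network call.
    def fake_worker(url):
        return "text of " + url

    chapters = crawl(["ch1", "ch2", "ch3"], worker=fake_worker, threads=3)
    out_path = os.path.join(tempfile.gettempdir(), "white_demo.txt")
    save_book(out_path, "白夜行", chapters)
    print(chapters)
```

For a real run, one would pass the list of chapter URLs to crawl() and keep the default worker. Because pool.map returns results in the same order as its input, the chapters are written to the file in reading order even though they are downloaded concurrently.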
