《Computer Knowledge and Technology》 2019-20

Implementation of Scrapy-based Distributed Crawler Software

WENG Shao-fei; LIAO Xiang-yu; ZHU Guang-yi; FAN Ya-jing; GAN Yu-jian (Institute of Information and Statistics, Guangxi University of Finance and Economics)
In recent years, with the rapid development of the Internet, increasing attention has been paid to data mining and its applications. The goal is to have a program automatically browse the massive number of web pages on the Internet, collect the information users need, and convert it into a form that is easy to read and store, so that it is convenient for people to understand and use. To this end, distributed crawlers and the Scrapy framework are studied, and a distributed website collection system combining Scrapy and Redis is designed and implemented. The results show that the system is simple to operate, reduces the difficulty of writing crawler scripts, and that its distributed structure improves efficiency.
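The abstract does not show how the system shares work between nodes, so the following is only a minimal sketch of the general idea behind a Scrapy-plus-Redis crawler: several workers pull URLs from one shared queue and consult one shared "seen" set so that no page is fetched twice. Everything here is an assumption for illustration. `SharedFrontier`, `run_workers`, the node names, and the `LINK_GRAPH` of example.com pages are hypothetical, the queue and set are in-memory stand-ins for what would presumably be a Redis list and a Redis set, and fetching is simulated by a dictionary lookup instead of a Scrapy spider.

```python
import hashlib
from collections import deque


class SharedFrontier:
    """In-memory stand-in (assumption) for a Redis-backed URL queue and dedup set."""

    def __init__(self):
        self.queue = deque()  # plays the role of a shared Redis list
        self.seen = set()     # plays the role of a shared Redis set

    def push(self, url):
        """Enqueue url unless its fingerprint was already recorded."""
        fingerprint = hashlib.sha1(url.encode("utf-8")).hexdigest()
        if fingerprint in self.seen:
            return False
        self.seen.add(fingerprint)
        self.queue.append(url)
        return True

    def pop(self):
        """Return the next URL, or None when the queue is empty."""
        return self.queue.popleft() if self.queue else None


def run_workers(frontier, link_graph, worker_names):
    """Simulate several crawler nodes taking turns on one shared frontier."""
    crawled = {name: [] for name in worker_names}
    while frontier.queue:
        for name in worker_names:
            url = frontier.pop()
            if url is None:
                break
            crawled[name].append(url)
            # "Fetching" is a dictionary lookup here; a real node would
            # download the page and extract its links.
            for link in link_graph.get(url, []):
                frontier.push(link)
    return crawled


# Hypothetical four-page site used only to exercise the sketch.
LINK_GRAPH = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}

frontier = SharedFrontier()
frontier.push("http://example.com/")
result = run_workers(frontier, LINK_GRAPH, ["node-1", "node-2"])
all_pages = sorted(url for pages in result.values() for url in pages)
```

Because both simulated nodes read from the same queue and write to the same fingerprint set, the pages are split between them and each is visited once, even though the link graph contains cycles. Keeping that state in an external store such as Redis, rather than inside one process, is what would let additional machines join the crawl.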
【Category Index】: TP311.52