Implementation of Scrapy-based Distributed Crawler Software
WENG Shao-fei; LIAO Xiang-yu; ZHU Guang-yi; FAN Ya-jing; GAN Yu-jian; Institute of Information and Statistics, Guangxi University of Finance and Economics
In recent years, with the rapid development of the Internet, people have paid increasing attention to the mining and application of data. The aim is for a program to browse the massive number of web pages on the Internet automatically, collect the information users need, and convert it into a form that is easy to read and store, so that it is convenient for people to understand and use. To this end, distributed crawlers and the Scrapy framework are studied, and a distributed website collection system that combines Scrapy and Redis is designed and implemented. The results show that the system is simple to operate, that it reduces the difficulty of writing crawler scripts, and that its distributed structure improves efficiency.
【Category Index】： TP311.52
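
The abstract does not describe how Scrapy and Redis are combined, so the following is only an illustrative sketch of the general idea behind such a design: several crawler workers share one URL queue and one "seen" set, so that no page is scheduled twice. It assumes an in-memory stand-in for Redis (`SharedFrontier`) and a fake link graph (`FAKE_SITE`); both names, and `run_workers`, are hypothetical and not taken from the paper. In a real deployment the queue and set would presumably live in Redis and the fetching would be done by Scrapy spiders.

```python
from collections import deque


class SharedFrontier:
    """In-memory stand-in for a Redis-backed URL queue and seen-set (hypothetical)."""

    def __init__(self):
        self._queue = deque()  # plays the role of a Redis list
        self._seen = set()     # plays the role of a Redis set

    def push(self, url):
        """Enqueue url unless it was already scheduled; return True if added."""
        if url in self._seen:
            return False
        self._seen.add(url)
        self._queue.append(url)
        return True

    def pop(self):
        """Return the next URL, or None when the queue is empty."""
        return self._queue.popleft() if self._queue else None


def run_workers(frontier, links, worker_names):
    """Let the workers take turns pulling URLs until the frontier is drained.

    `links` maps a URL to the URLs found on that page (a fake web graph).
    Returns a dict of worker name -> list of URLs that worker processed.
    """
    processed = {name: [] for name in worker_names}
    turn = 0
    while True:
        url = frontier.pop()
        if url is None:
            break
        name = worker_names[turn % len(worker_names)]
        processed[name].append(url)
        # Newly discovered links go back into the shared frontier;
        # duplicates are rejected by the shared seen-set.
        for found in links.get(url, []):
            frontier.push(found)
        turn += 1
    return processed


# Assumed toy site: each key is a page, each value the links on that page.
FAKE_SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/"],
    "/c": [],
}

frontier = SharedFrontier()
frontier.push("/")
result = run_workers(frontier, FAKE_SITE, ["worker-1", "worker-2"])
print(result)
```

Because deduplication happens in the shared structure rather than inside each worker, adding workers does not cause pages to be fetched more than once, which is the property a distributed crawler of this kind depends on.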