《Computer Applications and Software》 2014-07

A SEPARATOR BAR-BASED WEBPAGE SEGMENTATION ALGORITHM

Sun Xuebo; Zhang Dawei (School of Computer Software, University of Science and Technology Liaoning)
With the arrival of the network information era, the volume of information on the internet has grown exponentially, so extracting useful information from the internet efficiently has become an important topic in network information retrieval. Based on two basic features of web pages, visibility and unification, this paper proposes a new algorithm that segments a web page into blocks by detecting separator bars. The concept of relative-position typesetting is used to express the relative positions of page blocks when some of their heights are unknown. The algorithm decides when to stop segmenting by computing the number of blocks and the length, width, and height of the content of the current node, which keeps the algorithm both efficient and effective. Experimental results indicate that the algorithm is effective.
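The segmentation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes every page element already has a known rectangle, so it does not cover the relative-position typesetting used for unknown heights. The names `Box`, `gaps`, and `segment` and the thresholds `min_blocks` and `min_size` are hypothetical. A separator bar is taken to be an empty horizontal or vertical strip that no element crosses, and splitting stops when a region has too few elements or is too small.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Box:
    """Rendered rectangle of one page element (hypothetical model)."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def gaps(intervals: List[Tuple[int, int]]) -> List[int]:
    """Return midpoints of empty strips between 1-D intervals."""
    spans = sorted(intervals)
    cuts = []
    reach = spans[0][1]  # furthest end covered so far
    for start, end in spans[1:]:
        if start > reach:  # nothing covers (reach, start): a separator bar
            cuts.append((reach + start) // 2)
        reach = max(reach, end)
    return cuts


def segment(boxes: List[Box], min_blocks: int = 2, min_size: int = 40) -> List[List[Box]]:
    """Recursively split boxes along separator bars.

    Termination (assumed thresholds): too few boxes, or the region
    is narrower or shorter than min_size, or no bar is found.
    """
    if len(boxes) < min_blocks:
        return [boxes]
    width = max(b.x + b.w for b in boxes) - min(b.x for b in boxes)
    height = max(b.y + b.h for b in boxes) - min(b.y for b in boxes)
    if width < min_size or height < min_size:
        return [boxes]

    # Try horizontal bars first (project on y), then vertical bars (project on x).
    axes: List[Tuple[Callable[[Box], int], Callable[[Box], int]]] = [
        (lambda b: b.y, lambda b: b.h),
        (lambda b: b.x, lambda b: b.w),
    ]
    for start_of, size_of in axes:
        cuts = gaps([(start_of(b), start_of(b) + size_of(b)) for b in boxes])
        if cuts:
            groups: List[List[Box]] = [[] for _ in range(len(cuts) + 1)]
            for b in boxes:
                # Index of the group = number of bars lying before this box.
                groups[sum(1 for c in cuts if start_of(b) >= c)].append(b)
            result: List[List[Box]] = []
            for group in groups:
                result.extend(segment(group, min_blocks, min_size))
            return result
    return [boxes]


# Example layout: header, left navigation, main content, footer.
header = Box(0, 0, 800, 60)
nav = Box(0, 80, 200, 400)
content = Box(220, 80, 580, 400)
footer = Box(0, 500, 800, 50)
blocks = segment([header, nav, content, footer])
```

In this example the page is first cut along the two horizontal bars, and the middle band is then cut along the vertical bar between the navigation column and the main content.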
【Category Index】: TP393.092; TP391.3