《Chinese Journal of Computers》 2001-01

A Chinese Web Page Classifier Based on Support Vector Machine and Unsupervised Clustering

LI Xiao Li, LIU Ji Min, SHI Zhong Zhi (Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080)
This paper presents a new algorithm that combines the Support Vector Machine (SVM) with unsupervised clustering. After analyzing the characteristics of web pages, it proposes a new vector representation of web pages and applies it to web page classification. Given a training set, the algorithm clusters the positive and negative examples separately with an unsupervised clustering algorithm (UC), producing a number of positive and negative cluster centers. It then selects only a subset of the examples as input to the SVM, according to the ISUC algorithm, and finally constructs a classifier through SVM learning. Any text can be classified either by comparing its distances to the cluster centers or by the SVM. If the text is near one cluster center of a category and far from all the cluster centers of the other categories, UC can classify it correctly with high probability; otherwise, the SVM is employed to decide which category it belongs to. The algorithm thus exploits the strengths of both SVM and unsupervised clustering. Experiments show that it not only improves training efficiency but also achieves good precision.
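The two-stage decision rule described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the Euclidean distance, the `near` and `far` thresholds, the function name `classify`, and the `svm_stub` stand-in for a trained SVM are all assumptions introduced here, since the abstract does not specify the distance measure, the thresholds, or the details of ISUC.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance (an assumed metric; the abstract names none)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(
    x: Vector,
    centers: Dict[str, List[Vector]],
    svm_predict: Callable[[Vector], str],
    near: float,
    far: float,
) -> Tuple[str, str]:
    """Return (label, stage), where stage is "UC" or "SVM".

    `near` and `far` are hypothetical thresholds for "near one cluster
    center" and "far from all other categories' centers".
    """
    # Distance from x to the closest cluster center of each category.
    nearest = {
        label: min(euclidean(x, c) for c in cs) for label, cs in centers.items()
    }
    best = min(nearest, key=nearest.get)
    others_far = all(d >= far for label, d in nearest.items() if label != best)

    # Confident case: close to one category and far from every other one.
    if nearest[best] <= near and others_far:
        return best, "UC"
    # Ambiguous case: defer to the SVM classifier.
    return svm_predict(x), "SVM"


# Toy cluster centers, as if produced by clustering each class separately.
centers = {"pos": [(0.0, 0.0), (1.0, 1.0)], "neg": [(5.0, 5.0)]}


def svm_stub(x: Vector) -> str:
    """Hypothetical stand-in for a trained SVM decision function."""
    return "pos" if x[0] + x[1] < 6 else "neg"


if __name__ == "__main__":
    for point in [(0.1, 0.2), (3.0, 3.0), (5.0, 5.2)]:
        print(point, classify(point, centers, svm_stub, near=1.0, far=3.0))
```

In this sketch, points that lie clearly inside one class's clusters never reach the SVM, which reflects the division of labor the abstract describes: the cheap center comparison handles the easy cases and the SVM handles the ambiguous ones.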
【Fund】: Supported by the National Natural Science Foundation of China (69803010) and the National "863" High-Tech Research and Development Program of China (863-511-946-010)
【Category Index】: TP393.09