《Journal of Chinese Information Processing》 2002-06

A New Statistical-based Method in Automatic Text Classification

LIU Bin 1, HUANG Tiejun 2, CHENG Jun 3, GAO Wen 1 (1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; 2. Graduate School of Chinese Academy of Sciences, Beijing 100080; 3. The Library of Chinese Academy of Sciences, Beijing 100080, China)
Automatic text classification is the task of assigning predefined category labels to documents. To improve classification performance, this article proposes a multi-level feature selection method and a kernel-based distance-weighted KNN algorithm. We extract statistical text features at three levels: Chinese characters, a common word list, and a professional word list. Together these capture more of the statistical characteristics of the document set than a single level does. The kernel-based weighted KNN algorithm addresses the multi-peak distribution problem and the overlapping-boundary problem of the sample set, as well as the problem of making the classifier's decisions precise. In practical use, the Internet and text databases provide many pre-classified training samples, but some of them are not suitable for training the classifier. We use sample weight analysis to address this problem. An experimental system shows the effectiveness of the method.
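The abstract does not give the kernel or the distance measure used, so the following is only a minimal sketch of the general idea behind a kernel-based distance-weighted KNN classifier, not the authors' algorithm. It assumes a Gaussian kernel, Euclidean distance, and dense feature vectors; the names `gaussian_kernel`, `kernel_weighted_knn`, and the `bandwidth` parameter are illustrative choices made here.

```python
import math
from collections import defaultdict


def gaussian_kernel(distance, bandwidth=1.0):
    # Assumed kernel: maps a distance to a weight in (0, 1]; closer means heavier.
    return math.exp(-(distance ** 2) / (2.0 * bandwidth ** 2))


def euclidean(a, b):
    # Assumed distance measure between two equal-length feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kernel_weighted_knn(train, query, k=3, bandwidth=1.0):
    """Classify `query` from `train`, a list of (feature_vector, label) pairs.

    Each of the k nearest neighbours votes for its label with a weight
    given by the kernel of its distance, instead of one vote per neighbour.
    """
    neighbours = sorted(
        ((euclidean(vec, query), label) for vec, label in train),
        key=lambda pair: pair[0],
    )[:k]

    votes = defaultdict(float)
    for dist, label in neighbours:
        votes[label] += gaussian_kernel(dist, bandwidth)

    # The label with the largest accumulated kernel weight wins.
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # One very close sample of class "A" and two distant samples of class "B".
    train = [([0.0, 0.0], "A"), ([2.0, 0.0], "B"), ([2.0, 0.1], "B")]
    print(kernel_weighted_knn(train, [0.1, 0.0], k=3))
```

With these toy points an unweighted majority vote over the three neighbours would pick "B", while the kernel weighting lets the single nearby "A" sample dominate. This is the kind of behaviour that distance weighting is meant to provide near overlapping class boundaries.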
【Fund】: Major Special Project of the National Science Digital Library (CSDL2002-18)
【Category Index】: TP391.1