《Journal of Chinese Information Processing》 2002-06

A New Statistical-based Method in Automatic Text Classification

LIU Bin 1, HUANG Tie-jun 2, CHENG Jun 3, GAO Wen 1 (1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China; 2. Graduate School of Chinese Academy of Sciences, Beijing 100080, China; 3. The Library of Chinese Academy of Sciences, Beijing 100080, China)
Automatic text classification is the task of assigning predefined category labels to documents. To improve classification performance, this article proposes a multi-level feature selection method and a kernel-based distance-weighted KNN algorithm. Statistical text features are extracted at three levels (Chinese characters, a common wordlist, and a professional wordlist), which captures more of the statistical character of the document set. The kernel-based weighted KNN algorithm addresses the multi-peak distribution problem and the overlapping-boundary problem of the sample set, as well as the classifier's precise decision problem. In practice, the Internet and text databases provide many pre-classified training samples, but some of them are not suitable for training the classifier. We use sample weightiness analysis to address this problem. Experiments with the system show the effectiveness of the method.
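The abstract does not give the kernel, the distance measure, or the way sample weights are computed, so the following is only a minimal sketch of the general idea of kernel-based distance-weighted KNN, not the authors' algorithm. It assumes a Gaussian kernel over Euclidean distance; the function names, the `bandwidth` parameter, and the optional per-sample `sample_weights` (a stand-in for whatever sample weightiness analysis produces) are all hypothetical.

```python
import math
from collections import defaultdict


def gaussian_kernel(distance, bandwidth=1.0):
    """Map a distance to a similarity weight in (0, 1] (assumed kernel)."""
    return math.exp(-(distance ** 2) / (2.0 * bandwidth ** 2))


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kernel_weighted_knn(query, samples, k=3, bandwidth=1.0, sample_weights=None):
    """Classify `query` by a kernel-weighted vote of its k nearest samples.

    samples: list of (feature_vector, label) pairs.
    sample_weights: optional per-sample reliability weights (hypothetical);
        a weight of 0 removes a sample's influence on the vote.
    Returns (predicted_label, votes_per_label).
    """
    if sample_weights is None:
        sample_weights = [1.0] * len(samples)

    # Pair every training sample with its distance to the query.
    scored = [
        (euclidean(query, vec), label, weight)
        for (vec, label), weight in zip(samples, sample_weights)
    ]
    # Keep the k nearest neighbours.
    nearest = sorted(scored, key=lambda item: item[0])[:k]

    # Each neighbour votes with strength kernel(distance) * sample weight,
    # so close, reliable samples count more than distant or doubtful ones.
    votes = defaultdict(float)
    for dist, label, weight in nearest:
        votes[label] += weight * gaussian_kernel(dist, bandwidth)

    return max(votes, key=votes.get), dict(votes)


if __name__ == "__main__":
    # Toy two-class data; the last sample sits inside the other class's region.
    training = [
        ([0.0, 0.0], "a"),
        ([0.0, 1.0], "a"),
        ([5.0, 5.0], "b"),
        ([5.0, 6.0], "b"),
        ([1.0, 0.0], "b"),
    ]
    label, votes = kernel_weighted_knn([0.2, 0.1], training, k=3)
    print(label, votes)
```

Because votes decay smoothly with distance, a stray neighbour from another class near a boundary contributes less than it would under plain majority voting, which is one plausible reading of how such a scheme helps with overlapping class boundaries.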
【Fund】: Major Special Project of the National Science Digital Library (CSDL2002-18)
【Category Index】: TP391.1