《Journal of Computer Research and Development》 2005-01

Using Maximum Entropy Model for Chinese Text Categorization

Li Ronglu, Wang Jianhui, Chen Xiaoyun, Tao Xiaopeng, and Hu Yunfa (Department of Computing and Information Technology, Fudan University, Shanghai 200433)  
With the rapid development of the World Wide Web, text classification has become a key technology for organizing and processing large amounts of document data. The maximum entropy model is a probability estimation technique widely used for a variety of natural language tasks. It offers a clean and flexible framework for combining diverse pieces of contextual information to estimate the probability of a given linguistic phenomenon. On many NLP tasks this approach performs at a near state-of-the-art level, or outperforms competing probabilistic methods when trained and tested under similar conditions. However, relatively little work has been done on applying the maximum entropy model to text categorization problems, and no previous work has focused on using it to classify Chinese documents. In this paper, the maximum entropy model is applied to Chinese text categorization. Its categorization performance is compared and analyzed under different approaches to text feature generation, different numbers of features, and the smoothing technique used. In experiments it is also compared with Bayes, KNN, and SVM; its performance is higher than that of Bayes and comparable with that of KNN and SVM. It is therefore a promising technique for text categorization.
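As a rough illustration of the kind of model the abstract describes, the sketch below trains a conditional maximum entropy classifier, p(c|d) proportional to exp(sum of the weights of the features active for class c), on a few made-up Chinese snippets. Everything in it is an assumption for illustration only: the character-bigram features, the toy documents and labels, the function names, and the training procedure (plain gradient ascent with a Gaussian-prior penalty standing in for smoothing). The abstract does not state which feature generators, training algorithm, or smoothing method the authors actually used.

```python
import math

def char_bigrams(text):
    """Overlapping character bigrams (an assumed, segmentation-free feature generator)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]

def predict_proba(weights, labels, feats):
    """Maximum entropy form: p(c|d) = exp(score(d, c)) / Z(d)."""
    scores = {c: sum(weights.get((f, c), 0.0) for f in feats) for c in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

def train(docs, epochs=200, lr=0.5, l2=0.01):
    """Gradient ascent on the penalized conditional log-likelihood.

    The gradient for each (feature, class) weight is the observed feature
    count minus the count expected under the current model. The l2 term is
    a Gaussian prior, used here as an assumed stand-in for smoothing.
    """
    labels = sorted({c for _, c in docs})
    data = [(char_bigrams(t), c) for t, c in docs]
    weights = {}
    for _ in range(epochs):
        grad = {}
        for feats, gold in data:
            probs = predict_proba(weights, labels, feats)
            for f in feats:
                for c in labels:
                    observed = 1.0 if c == gold else 0.0
                    grad[(f, c)] = grad.get((f, c), 0.0) + observed - probs[c]
        for key, g in grad.items():
            w = weights.get(key, 0.0)
            weights[key] = w + lr * (g / len(data) - l2 * w)
    return weights, labels

def classify(weights, labels, text):
    """Return the most probable label for a piece of text."""
    probs = predict_proba(weights, labels, char_bigrams(text))
    return max(probs, key=probs.get)

# Hypothetical toy training set; not data from the paper.
TOY_DOCS = [
    ("股票市场上涨", "finance"),
    ("银行利率下调", "finance"),
    ("足球比赛获胜", "sports"),
    ("篮球联赛开幕", "sports"),
]
```

Calling `train(TOY_DOCS)` returns the learned weights and the label set, which can then be passed to `classify` with a new string. Swapping `char_bigrams` for a word-segmentation step would be one way to compare feature generation approaches in the spirit of the abstract.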
【Fund】: National Natural Science Foundation of China (60173027)
【Category Index】: TP391.1