《Journal of Beijing Information Science & Technology University》 2019-01

Reflowable document comprehension method based on fusion features and grammar rules

HAO Haili; LI Ning; TIAN Ying'ai; GENG Si (Computer School, Beijing Information Science & Technology University)
To identify components accurately when recovering the structure of reflowable documents, a document understanding method based on fused features and grammar rules is proposed. The method describes each component with two vectors: a format vector that represents format features such as fonts, and a content vector that represents text features such as keywords. Each component to be identified is compared with the candidates by measuring the distance between the vectors, with the two vectors given different weights. Finally, the logical structure of the document is recognized from the candidate labels and the grammar rules by applying a combined top-down and bottom-up algorithm. Experimental results show that the method improves the accuracy of component identification and, in turn, the accuracy of structure recognition for the whole document.
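The weighted comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state which distance measure, weights, or feature encodings are used, so the cosine distance, the weights `w_format=0.6` and `w_content=0.4`, the three-element toy vectors, and the names `fused_distance` and `rank_labels` are all assumptions made for the example.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def fused_distance(component, candidate, w_format=0.6, w_content=0.4):
    """Weighted sum of the format-vector and content-vector distances.

    The weights are hypothetical; the abstract only says the two
    vectors are weighted differently.
    """
    d_format = cosine_distance(component["format"], candidate["format"])
    d_content = cosine_distance(component["content"], candidate["content"])
    return w_format * d_format + w_content * d_content


def rank_labels(component, prototypes, **weights):
    """Return (label, distance) pairs sorted from nearest to farthest."""
    scored = [
        (label, fused_distance(component, proto, **weights))
        for label, proto in prototypes.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Toy prototypes (assumed encoding): format = [font size, bold, indent],
# content = [has section number, ends with period, short line].
prototypes = {
    "title": {"format": [1.0, 1.0, 0.0], "content": [0.0, 0.0, 1.0]},
    "heading": {"format": [0.6, 1.0, 0.0], "content": [1.0, 0.0, 0.0]},
    "body": {"format": [0.3, 0.0, 1.0], "content": [0.0, 1.0, 0.0]},
}

paragraph = {"format": [0.6, 1.0, 0.0], "content": [1.0, 0.0, 0.0]}
ranking = rank_labels(paragraph, prototypes)
candidate_labels = [label for label, _ in ranking]
```

In this sketch the ranked `candidate_labels` would be the input to the grammar-rule stage, which the abstract describes only at a high level and which is not modeled here.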
【Fund】: National Key R&D Program of China (2018YFB1004100); National Natural Science Foundation of China (61672105)
【Category Index】: TP391.1