《Computer & Digital Engineering》 2011-03

Identification and Inspection of Approximately Duplicated Records Based on Data Warehouse

Peng Lu (Department of Information Engineering, Wuhan University of Science and Technology City College, Wuhan 430083)
To clean approximately duplicated records, one must first determine whether two records refer to the same entity. There are two main methods: one decides whether the records denote the same entity by comparing the similarity between the whole strings; the other matches on the individual fields of the records. A text-similarity measure is defined first, and two records are then judged against it: if the text similarity between the records is larger than a pre-specified threshold, they are judged to be duplicates; otherwise they are not.
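The two matching strategies and the threshold test described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract does not name a similarity measure, so Python's standard difflib.SequenceMatcher ratio is used as a stand-in, and the function names, field weights, and the 0.85 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1].

    Assumption: difflib's ratio stands in for the text-similarity
    measure, which the abstract leaves unspecified.
    """
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def whole_record_similarity(r1: dict, r2: dict, fields: list) -> float:
    """Method 1: join the fields and compare the records as whole strings."""
    s1 = " ".join(str(r1.get(f, "")) for f in fields)
    s2 = " ".join(str(r2.get(f, "")) for f in fields)
    return text_similarity(s1, s2)


def field_based_similarity(r1: dict, r2: dict, weights: dict) -> float:
    """Method 2: compare field by field and take a weighted average.

    The weights are hypothetical; they let more discriminating fields
    count for more in the final score.
    """
    total = sum(weights.values())
    score = sum(
        w * text_similarity(str(r1.get(f, "")), str(r2.get(f, "")))
        for f, w in weights.items()
    )
    return score / total


def is_duplicate(similarity: float, threshold: float = 0.85) -> bool:
    """Judge two records duplicates when similarity exceeds the threshold.

    The default threshold of 0.85 is an illustrative value only.
    """
    return similarity > threshold
```

A caller would compute one of the two similarity scores for a pair of records and pass it to is_duplicate. In practice the threshold has to be tuned on the data: too low a value merges distinct entities, too high a value misses near-duplicates.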
【Category Index】: TP311.13