Review on Structural Processing Techniques for Knowledge Graph Data

doi:10.19423/j.cnki.31-1561/u.2026.008

Ship & Boat, 2026, Vol. 37, Issue (03): 121-137.

XIN Dengyue, SHI Xuyang, CHEN Yuxing, WEI Fangsheng, WANG Chong

Marine Design & Research Institute of China, Shanghai 200011, China

Received: 2026-01-13; Revised: 2026-03-12; Online: 2026-06-25; Published: 2026-06-29

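About the authors: XIN Dengyue (b. 1997), male, master's degree, engineer. Research interest: knowledge graph construction. SHI Xuyang (b. 1995), male, master's degree, librarian. Research interests: knowledge graphs, data mining. CHEN Yuxing (b. 1999), male, master's degree, engineer. Research interests: management information system development, data mining. WEI Fangsheng (b. 1996), male, master's degree, engineer. Research interests: management information system development, data mining. WANG Chong (b. 1990), male, master's degree, senior engineer. Research interests: management information system development, artificial intelligence.
Funding:
Key Special Project of National Ministries and Commissions (CBZ01N23-02)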

Abstract

Data preprocessing is a core step in knowledge graph construction and consists of two main stages: data collection and information extraction. This paper systematically reviews the mainstream preprocessing methods, which are based on rules and lexicons, statistical machine learning, and deep learning, and analyzes their technical principles and limitations in entity recognition and relation extraction. Existing methods generally rely on manually crafted rules and generalize poorly at the semantic level, which makes cross-domain knowledge transfer difficult. To address these issues, the paper explores a new "semantic-driven and automated extraction" paradigm based on large language models: a pre-trained large language model generates deep semantic embeddings, and vector similarity computation over those embeddings enables unsupervised, context-aware information extraction, moving knowledge graph construction toward greater intelligence. This approach is still exploratory and faces challenges such as high computational cost and poor interpretability. Future research should focus on lightweight model design, multimodal semantic alignment, and domain knowledge integration to improve both the efficiency of knowledge graph construction and the interpretability of the models.

Key words: knowledge graph, data preprocessing, entity recognition, relation extraction, large language model
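
The embedding-plus-similarity idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the helper names (`cosine_similarity`, `label_mention`), the prototype labels, the threshold value, and the 3-dimensional toy vectors are all assumptions made for illustration. In practice the vectors would come from a pre-trained language model rather than being written by hand.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def label_mention(mention_vec, prototypes, threshold=0.8):
    """Assign a mention to the most similar prototype, or to None below the threshold.

    `prototypes` maps a hypothetical entity-type label to a reference embedding.
    No labelled training data is used; the decision rests on similarity alone.
    """
    best_label, best_score = None, -1.0
    for label, proto in prototypes.items():
        score = cosine_similarity(mention_vec, proto)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Toy vectors standing in for LLM embeddings (assumption, for illustration only).
PROTOTYPES = {
    "EQUIPMENT": [1.0, 0.0, 0.0],
    "STANDARD": [0.0, 1.0, 0.0],
}

label, score = label_mention([0.9, 0.1, 0.0], PROTOTYPES)
print(label, round(score, 3))
```

A mention whose vector lies close to one prototype receives that label, while a vector that is not close to any prototype is left unlabelled. The threshold trades coverage against precision and would need tuning for any real corpus.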

CLC Number: TP18; TP391

XIN Dengyue, SHI Xuyang, CHEN Yuxing, WEI Fangsheng, WANG Chong. Review on Structural Processing Techniques for Knowledge Graph Data[J]. Ship & Boat, 2026, 37(03): 121-137.
