Ship & Boat, 2026, Vol. 37, Issue 03: 121-137. DOI: 10.19423/j.cnki.31-1561/u.2026.008

Review on Structural Processing Techniques for Knowledge Graph Data

XIN Dengyue, SHI Xuyang, CHEN Yuxing, WEI Fangsheng, WANG Chong   

  1. Marine Design & Research Institute of China, Shanghai 200011, China
  • Received: 2026-01-13  Revised: 2026-03-12  Online: 2026-06-25  Published: 2026-06-29

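  • About the authors: XIN Dengyue (born 1997), male, master's degree, engineer. Research interest: knowledge graph construction. SHI Xuyang (born 1995), male, master's degree, librarian. Research interests: knowledge graphs, data mining. CHEN Yuxing (born 1999), male, master's degree, engineer. Research interests: management information system development, data mining. WEI Fangsheng (born 1996), male, master's degree, engineer. Research interests: management information system development, data mining. WANG Chong (born 1990), male, master's degree, senior engineer. Research interests: management information system development, artificial intelligence.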
  • Supported by:
    National Ministry-Level Key Special Project (CBZ01N23-02)

Abstract: Data preprocessing is a core step in knowledge graph construction and comprises two main stages: data collection and information extraction. This paper systematically reviews the mainstream preprocessing methods, namely those based on rules and lexicons, statistical machine learning, and deep learning, and analyzes their technical principles and limitations in entity recognition and relation extraction. Existing methods generally depend on manually crafted rules and generalize poorly at the semantic level, which makes cross-domain knowledge transfer difficult. To address these issues, the paper explores a new “semantic-driven + automated extraction” paradigm based on large language models: a pre-trained large language model generates deep semantic embeddings, and vector similarity computation over those embeddings enables unsupervised, context-aware information extraction, moving knowledge graph construction toward greater intelligence. This approach is still exploratory and faces challenges such as high computational cost and poor interpretability. Future research should focus on lightweight model design, multimodal semantic alignment, and domain knowledge integration to improve both the efficiency of knowledge graph construction and the interpretability of the models.

Key words: knowledge graph, data preprocessing, entity recognition, relation extraction, large language model
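
As a rough illustration of the vector-similarity step mentioned in the abstract, the sketch below assigns each candidate mention to the entity type whose prototype vector is closest by cosine similarity. It is not the authors' method: the three-dimensional vectors are hand-written placeholders standing in for embeddings that a pre-trained language model would produce, and the names `nearest_label`, `PROTOTYPES`, `MENTIONS`, the type labels, and the 0.5 threshold are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_label(mention_vec, prototypes, threshold=0.5):
    """Return (label, score) for the most similar prototype.

    The label is None when even the best score falls below the
    (hypothetical) threshold, i.e. the mention is left untyped.
    """
    best_label, best_score = None, -1.0
    for label, proto in prototypes.items():
        score = cosine_similarity(mention_vec, proto)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Placeholder vectors: in practice these would come from an embedding model.
PROTOTYPES = {
    "SHIP_COMPONENT": [0.9, 0.1, 0.0],
    "ORGANIZATION": [0.0, 0.2, 0.9],
}
MENTIONS = {
    "propeller shaft": [0.8, 0.3, 0.1],
    "design institute": [0.1, 0.1, 0.8],
}

# Type every mention without any labeled training data.
labels = {text: nearest_label(vec, PROTOTYPES)[0] for text, vec in MENTIONS.items()}
print(labels)
```

No labeled training data is used here, which is the sense in which such extraction is unsupervised. The quality of the result depends entirely on the embeddings, which this sketch does not model.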

