%0 Journal Article
%T Zero-Shot Multimodal Tongue Image Segmentation Based on SAM
%A 钟甫广
%A 邓森耀
%A 曾军英
%A 冯跃
%A 钟甫东
%A 贾旭东
%J Computer Science and Application
%P 29-38
%@ 2161-881X
%D 2025
%I Hans Publishing
%R 10.12677/csa.2025.153055
%X Tongue diagnosis assesses health status by observing tongue characteristics. Tongue segmentation, a key step in intelligent tongue diagnosis, must accurately separate the tongue body from the background to lay the foundation for subsequent feature extraction and health analysis. However, tongue segmentation currently faces two main challenges: data scarcity, and the dependence of existing large segmentation models (such as the Segment Anything Model, SAM) on manual prompts. To address these issues, this paper proposes a zero-shot multimodal segmentation method. The method combines SAM with multimodal prompting in a two-stage framework: 1) initial segmentation and similarity clustering, in which SAM generates initial segmentation results and a similarity clustering decoder selects the potentially valid segmentations; 2) refined segmentation, in which a multimodal large language model analyzes tongue characteristics to generate precise point prompts, which are fed back into SAM to achieve high-precision segmentation. The method enables intelligent segmentation with SAM in tongue diagnosis without task-specific training or annotated data. Experimental results show that, compared with the original SAM, the method improves mIoU on three tongue diagnosis datasets by 27.3%, 18.2%, and 29.7%, respectively.
%K Tongue Image Segmentation
%K Zero-Shot Learning
%K Multimodal Large Language Model
%K Similarity Clustering
%K Medical Image Processing
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=109121