User-customized intention understanding agent for planar scanned images


Authors: Feng Yike; Li Xuewei; Liu Pengwei; Guo Fengjun; Long Teng; Li Xi

Affiliations: [1] College of Software Technology, Zhejiang University, Ningbo 315048, China; [2] College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; [3] Shanghai Hehe Information Technology Co., Ltd., Shanghai 200072, China

Source: Journal of Image and Graphics (《中国图象图形学报》), 2025, No. 1, pp. 198-211 (14 pages)

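Funding: National Natural Science Foundation of China (62441602); National Natural Science Foundation of China (U20A20222); National Science Fund for Distinguished Young Scholars (62225605); Key Research and Development Program of Zhejiang Province (2023C03196); INTSIG cooperative research project.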

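Abstract:

Objective: Understanding user intentions for planar scanned images is a common practical need in mobile applications. Traditional methods mainly model large amounts of historical user behavior data to predict a user's intention for a new image, but this scenario poses challenges such as customization and few interactions, which limit their effectiveness. Agent methods that have emerged in recent years cope with these challenges better and offer a new approach to customized intention understanding. On this basis, a user-customized intention understanding agent for planar scanned images is proposed.

Method: The agent consists of task perception, task planning, task execution, and feedback correction modules. To address the technical challenges the method faces, namely the few-shot incremental problem, limited computing resources, and the lack of benchmark datasets, a "divide and conquer" domain generalization method is first proposed, which decouples inference for the base task from inference for customized tasks so that the two do not affect each other. Second, intention understanding is performed through template matching, so that new customized tasks can be handled without fine-tuning. Third, a self-improvement strategy reduces noise in the intention understanding results and improves the reliability of domain generalization. In addition, a benchmark dataset for customized intention understanding on planar scanned images is constructed.

Results: The agent is compared with seven other methods on the proposed benchmark dataset. It reaches a mean intersection over union (mIoU) of 90.47%, which is 15.60% higher than the second-best method, and its overall accuracy is 22.10% higher. Ablation experiments verify the effectiveness of each part of the agent. Finally, the agent is applied to the public receipt dataset CORD (consolidated receipt dataset) to verify its generalization ability.

Conclusion: The proposed agent outperforms state-of-the-art detection and segmentation models on customized intention understanding for planar scanned images while avoiding the fine-tuning of a model for each subtask. The method is both effective and efficient.

English abstract:

Objective: In the era of mobile internet, mobile applications are developing rapidly and have become an indispensable part of daily life. Users' demands and expectations for mobile applications are also constantly increasing. User intention understanding is an important research field in mobile application development; it aims to provide more personalized and intelligent services by analyzing users' behavior and needs. User intention understanding for image input is a common practical requirement. In image-related human-computer interaction, users often interact with mobile applications through touch clicks or gestures, and the application is expected to understand their intentions intelligently. Traditional intention understanding methods mainly use a large amount of historical user behavior data to model and predict possible user intentions for new images. However, this application scenario faces challenges such as the customization problem and few interactions, which limit the effectiveness of traditional methods. In recent years, with the development of autonomous artificial intelligence, agent technology has provided new perspectives for user-customized intention understanding tasks. Agents can imitate and learn the human thinking process, accurately understand users, reduce the burden of memory and operation, and help mobile applications better understand users' intentions and needs. Therefore, we propose a user-customized intention understanding agent for planar scanned images.

Method: The user-customized intention understanding agent consists of task perception, task planning, task execution, and feedback correction modules. The task perception module extracts information from an input image and combines it with user-customized template information obtained from stored intention libraries to understand the intention of the input image. The task plan…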
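The mIoU figure quoted in the abstract is a standard overlap metric. As a minimal sketch only, assuming binary masks and a simple per-image average (the abstract does not say how the paper aggregates the score, and the function names `iou` and `mean_iou` are hypothetical), it can be computed as follows:

```python
import numpy as np


def iou(pred, target):
    """Intersection over union of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)


def mean_iou(preds, targets):
    """Average IoU over paired predicted and ground-truth masks."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)
```

For example, a predicted mask that covers two pixels when only one of them belongs to the target region scores 0.5, since the intersection is one pixel and the union is two.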

Keywords: user intention understanding; agent; few-shot incremental learning; saliency detection; interactive segmentation

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
