Authors: LI Xinyuan (李鑫源); LI Bai (李柏); SUN Yueshuo (孙跃硕); ZHANG Tantan (张坦探); TIAN Yonglin (田永林); YIN Zhuyan (殷烛炎); WANG Fei-Yue (王飞跃) [4,5,6]
Affiliations: [1] College of Mechanical and Vehicle Engineering, Hunan University, Changsha 410082, China; [2] State Key Laboratory of Advanced Design and Manufacturing Technology for Vehicle, Hunan University, Changsha 410082, China; [3] State Key Laboratory for Multi-modal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; [4] State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; [5] School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; [6] Department of Engineering Science, Faculty of Innovation Engineering, Macao University of Science and Technology, Macao 999078, China
Source: Chinese Journal of Intelligent Science and Technology (《智能科学与技术学报》), 2024, No. 4, pp. 429-444 (16 pages)
Funding: National Natural Science Foundation of China (No. 62103139); Huxiang Young Talents Program of the Furong Plan of Hunan Province (No. 2023RC3115).
Abstract: To meet the demand for high-quality, precise cooking, a digital chef and intelligent cooking method based on a multimodal large language model is proposed. In the offline phase, visual, auditory, and thermal sensors record a professional chef's continuous cooking operations. The collected images are fused with multi-round Q&A texts to form a culinary expert knowledge base, and a low-rank adaptation method is applied to fine-tune a pretrained multimodal large language model so that it can understand cooking intentions. In the online phase, real-time sensory data are converted into image-text inputs for the fine-tuned model, which analyzes them and generates cooking instructions that guide the user through the corresponding cooking actions. A hardware-software intelligent cooking system was built and validated experimentally on a pan-frying steak task. The results show that the fine-tuned system effectively controls the steak's doneness and quality and, compared with the model before fine-tuning, significantly improves the rationality and specificity of the cooking instructions.
Keywords: multimodal large language model; digital chef; intelligent cooking; cooking robot; expert system; artificial intelligence
Classification number: TP391.4 [Automation and Computer Technology: Computer Application Technology]