Yunnan University Library Alliance Document Sharing Service Platform - MULTIMODAL

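MULTIMODAL: Works: 530; Citations: 677; H-index: 10; Related fields: Medicine and Health; Related authors: 陈伟, 王群, 丁彰雄, 顾曰国, 施鹏飞; Related institutions: Xi'an International Studies University, Central China Normal University, Anhui Normal University, South China University of Technology; Related funds: National Natural Science Foundation of China, China Postdoctoral Science Foundation, National Key Basic Research Program of China, Beijing Natural Science Foundation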

ChemDFM-X: towards large multimodal model for chemistry: Science China (Information Sciences), 2024, No. 12, pp. 95-96 (2 pages); Authors: Zihan ZHAO, Bo CHEN, Jingpiao LI, Lu CHEN, Liyang WEN, Pengyu WANG, Zichen ZHU, Danyang ZHANG, Yansi LI, Zhongyang DAI, Xin CHEN, Kai YU; supported by National Science and Technology Major Project (Grant No. 2023ZD0120703); National Natural Science Foundation of China (Grant Nos. U23B2057, 62106142, 62120106006); Shanghai Municipal Science and Technology Major Project (Grant No. 2021SHZDZX0102).; Chemistry, as a naturally multimodal discipline, plays a crucial role in various vital fields such as pharmaceutical research and material manufacturing. Therefore, research on artificial intelligence (AI) for chemistry has...; Keywords: MODAL SPITE artificial

How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites (Cited by: 1): Science China (Information Sciences), 2024, No. 12, pp. 1-18 (18 pages); Authors: Zhe CHEN, Weiyun WANG, Hao TIAN, Shenglong YE, Zhangwei GAO, Erfei CUI, Wenwen TONG, Kongzhi HU, Jiapeng LUO, Zheng MA, Ji MA, Jiaqi WANG, Xiaoyi DONG, Hang YAN, Hewei GUO, Conghui HE, Botian SHI, Zhenjiang JIN, Chao XU, Bin WANG, Xingjian WEI, Wei LI, Wenjian ZHANG, Bo ZHANG, Pinlong CAI, Licheng WEN, Xiangchao YAN, Min DOU, Lewei LU, Xizhou ZHU, Tong LU, Dahua LIN, Yu QIAO, Jifeng DAI, Wenhai WANG; supported by National Key R&D Program of China (Grant Nos. 2022ZD0160102, 2022ZD0161300); National Natural Science Foundation of China (Grant Nos. 62372223, U24A20330, 62376134); China Mobile Zijin Innovation Institute (Grant No. NR2310J7M); Youth Ph.D. Student Research Project under the National Natural Science Foundation (Grant No. 623B2050).; In this paper, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce t...; Keywords: multimodal model OPEN-SOURCE vision encoder dynamic resolution bilingual dataset

OCRBench: on the hidden mystery of OCR in large multimodal models: Science China (Information Sciences), 2024, No. 12, pp. 19-31 (13 pages); Authors: Yuliang LIU, Zhang LI, Mingxin HUANG, Biao YANG, Wenwen YU, Chunyuan LI, Xu-Cheng YIN, Cheng-Lin LIU, Lianwen JIN, Xiang BAI; supported by National Natural Science Foundation of China (Grant Nos. 62225603, 62226104).; Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual tasks remains relatively unexplored. In this p...; Keywords: large multimodal model OCR text recognition scene text-centric VQA document-oriented VQA key information extraction handwritten mathematical expression recognition

Woodpecker: hallucination correction for multimodal large language models: Science China (Information Sciences), 2024, No. 12, pp. 48-60 (13 pages); Authors: Shukang YIN, Chaoyou FU, Sirui ZHAO, Tong XU, Hao WANG, Dianbo SUI, Yunhang SHEN, Ke LI, Xing SUN, Enhong CHEN; supported in part by National Natural Science Foundation of China (Grant Nos. U23A20319, 62222213, U22B2059, 61727809, 62072423); Young Scientists Fund of the Natural Science Foundation (Grant No. 2023NSFSC1402).; Hallucinations is a big shadow hanging over the rapidly evolving multimodal large language models (MLLMs), referring to that the generated text is inconsistent with the image content. To mitigate hallucinations, existing ...; Keywords: multimodal learning multimodal large language models hallucination correction large language models vision and language

DocPedia: unleashing the power of large multimodal model in the frequency domain for versatile document understanding: Science China (Information Sciences), 2024, No. 12, pp. 61-74 (14 pages); Authors: Hao FENG, Qi LIU, Hao LIU, Jingqun TANG, Wengang ZHOU, Houqiang LI, Can HUANG; supported by National Natural Science Foundation of China (Grant No. 62021001); Youth Innovation Promotion Association CAS; supported by GPU cluster built by MCC Lab of Information Science and Technology Institution, University of Science and Technology of China; the Supercomputing Center of the University of Science and Technology of China.; In this work, we present DocPedia, a novel large multimodal model (LMM) for versatile OCR-free document understanding, capable of parsing images up to 2560×2560 resolution. Unlike existing studies that either struggle with ...; Keywords: document understanding large multimodal model OCR-free HIGH-RESOLUTION frequency

Modality-experts coordinated adaptation for large multimodal models: Science China (Information Sciences), 2024, No. 12, pp. 75-92 (18 pages); Authors: Yan ZHANG, Zhong JI, Yanwei PANG, Jungong HAN, Xuelong LI; supported by National Key Research and Development Program of China (Grant No. 2022ZD0160403); National Natural Science Foundation of China (Grant No. 62176178).; Driven by the expansion of foundation models and the increasing variety of downstream tasks, parameter-efficient fine-tuning (PEFT) methods have exhibited remarkable efficacy in the unimodal domain, effectively mitigatin...; Keywords: large multimodal model multimodal learning vision-language pretraining parameter-efficient fine-tuning ADAPTER modality expert

COMET: "cone of experience" enhanced large multimodal model for mathematical problem generation: Science China (Information Sciences), 2024, No. 12, pp. 93-94 (2 pages); Authors: Sannyuya LIU, Jintian FENG, Zongkai YANG, Yawei LUO, Qian WAN, Xiaoxuan SHEN, Jianwen SUN; supported by National Science and Technology Major Project (Grant No. 2022ZD0117103); National Natural Science Foundation of China (Grant Nos. 62437002, 62307015, 62293554); China Postdoctoral Science Foundation (Grant Nos. 2023M741304, 2023T160256); Hubei Provincial Natural Science Foundation of China (Grant Nos. 2023AFA020, 2023AFB295); Fundamental Research Funds for the Central Universities (Grant No. CCNU24AI016).; The impact of generative artificial intelligence on education is unprecedented [1]. Researchers have been exploring possibilities of combining the large multimodal model (LMM) with the teaching process. Specifically, Luo an...; Keywords: COMET cone of experience enhanced large multimodal model for mathematical problem generation

Large circuit models: opportunities and challenges: Science China (Information Sciences), 2024, No. 10, pp. 21-62 (42 pages); Authors: Lei CHEN, Yiqi CHEN, Zhufei CHU, Wenji FANG, Tsung-Yi HO, Ru HUANG, Yu HUANG, Sadaf KHAN, Min LI, Xingquan LI, Yu LI, Yun LIANG, Jinwei LIU, Yi LIU, Yibo LIN, Guojie LUO, Hongyang PAN, Zhengyuan SHI, Guangyu SUN, Dimitrios TSARAS, Runsheng WANG, Ziyi WANG, Xinming WEI, Zhiyao XIE, Qiang XU, Chenhao XUE, Junchi YAN, Jun YANG, Bei YU, Mingxuan YUAN, Evangeline F.Y. YOUNG, Xuan ZENG, Haoyi ZHANG, Zuodong ZHANG, Yuxiang ZHAO, Hui-Ling ZHEN, Ziyang ZHENG, Binwu ZHU, Keren ZHU, Sunan ZOU; supported in part by Hong Kong S.A.R. General Research Fund (Grant No. 14212422); Research Matching (Grant No. CSE-7-2022).; Within the electronic design automation (EDA) domain, artificial intelligence (AI)-driven solutions have emerged as formidable tools, yet they typically augment rather than redefine existing methodologies. These solutions o...; Keywords: AI-rooted EDA large circuit models (LCMs) multimodal circuit representation learning circuit optimization

Logit prototype learning with active multimodal representation for robust open-set recognition: Science China (Information Sciences), 2024, No. 6, pp. 293-308 (16 pages); Authors: Yimin FU, Zhunga LIU, Zicheng WANG; supported in part by National Natural Science Foundation of China (Grant No. U20B2067); Innovation Foundation for Doctor Dissertation of Northwestern Polytechnical University (Grant No. CX2023015); Cultivation Foundation for Excellent Doctoral Dissertation of the School of Automation of Northwestern Polytechnical University.; Robust open-set recognition (OSR) performance has become a prerequisite for pattern recognition systems in real-world applications. However, the existing OSR methods are primarily implemented on the basis of single-modal ...; Keywords: logit prototype learning multimodal perception open-set recognition uncertainty estimation

From single- to multi-modal remote sensing imagery interpretation: a survey and taxonomy (Cited by: 6): Science China (Information Sciences), 2023, No. 4, pp. 1-28 (28 pages); Authors: Xian SUN, Yu TIAN, Wanxuan LU, Peijin WANG, Ruigang NIU, Hongfeng YU, Kun FU; supported by National Key R&D Program of China (Grant No. 2021YFB3900504); National Natural Science Foundation of China (Grant Nos. 61725105, 62171436).; Modality is a source or form of information. Through various modal information, humans can perceive the world from multiple perspectives. Simultaneously, the observation of remote sensing (RS) is multimodal. We observe the w...; Keywords: MULTIMODAL remote sensing image interpretation feature fusion co-learning
