MULTIMODAL

Works: 530, Citations: 677, H-index: 10
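Related fields: Medicine and health
Related authors: 陈伟 (Chen Wei), 王群 (Wang Qun), 丁彰雄 (Ding Zhangxiong), 顾曰国 (Gu Yueguo), 施鹏飞 (Shi Pengfei)
Related institutions: Xi'an International Studies University, Central China Normal University, Anhui Normal University, South China University of Technology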
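Related funding: National Natural Science Foundation of China, China Postdoctoral Science Foundation, National Basic Research Program of China, Beijing Natural Science Foundation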

Selected filter:
  • Journal = Science China (Information Sciences)
Showing records 1-10
ChemDFM-X: towards large multimodal model for chemistry
Science China (Information Sciences), 2024, Issue 12, pp. 95-96 (2 pages). Zihan ZHAO, Bo CHEN, Jingpiao LI, Lu CHEN, Liyang WEN, Pengyu WANG, Zichen ZHU, Danyang ZHANG, Yansi LI, Zhongyang DAI, Xin CHEN, Kai YU
Supported by National Science and Technology Major Project (Grant No. 2023ZD0120703); National Natural Science Foundation of China (Grant Nos. U23B2057, 62106142, 62120106006); Shanghai Municipal Science and Technology Major Project (Grant No. 2021SHZDZX0102).
Chemistry, as a naturally multimodal discipline, plays a crucial role in various vital fields such as pharmaceutical research and material manufacturing. Therefore, research on artificial intelligence (AI) for chemistry has...
Keywords: modal; spite; artificial
How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites (citations: 1)
Science China (Information Sciences), 2024, Issue 12, pp. 1-18 (18 pages). Zhe CHEN, Weiyun WANG, Hao TIAN, Shenglong YE, Zhangwei GAO, Erfei CUI, Wenwen TONG, Kongzhi HU, Jiapeng LUO, Zheng MA, Ji MA, Jiaqi WANG, Xiaoyi DONG, Hang YAN, Hewei GUO, Conghui HE, Botian SHI, Zhenjiang JIN, Chao XU, Bin WANG, Xingjian WEI, Wei LI, Wenjian ZHANG, Bo ZHANG, Pinlong CAI, Licheng WEN, Xiangchao YAN, Min DOU, Lewei LU, Xizhou ZHU, Tong LU, Dahua LIN, Yu QIAO, Jifeng DAI, Wenhai WANG
Supported by National Key R&D Program of China (Grant Nos. 2022ZD0160102, 2022ZD0161300); National Natural Science Foundation of China (Grant Nos. 62372223, U24A20330, 62376134); China Mobile Zijin Innovation Institute (Grant No. NR2310J7M); Youth Ph.D. Student Research Project under the National Natural Science Foundation (Grant No. 623B2050).
In this paper, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce t...
Keywords: multimodal model; open-source; vision encoder; dynamic resolution; bilingual dataset
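
The keywords above list dynamic resolution. As a rough illustration only, one common way to feed a high-resolution image to a fixed-size vision encoder is to resize it onto a grid of square tiles whose aspect ratio best matches the image. This sketch is not taken from the InternVL 1.5 paper: the helper names `choose_tile_grid` and `tile_boxes`, the 448-pixel tile size, and the 12-tile cap are all assumptions.

```python
def choose_tile_grid(width, height, max_tiles=12):
    """Pick a (cols, rows) grid, with cols * rows <= max_tiles, whose aspect
    ratio is closest to the image's. Hypothetical helper for illustration."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    target = width / height
    best, best_diff = (1, 1), float("inf")
    for cols in range(1, max_tiles + 1):
        # rows is bounded so that the grid never exceeds max_tiles tiles
        for rows in range(1, max_tiles // cols + 1):
            diff = abs(cols / rows - target)
            # strict '<' keeps the first grid found when two grids tie
            if diff < best_diff:
                best, best_diff = (cols, rows), diff
    return best


def tile_boxes(width, height, tile=448, max_tiles=12):
    """Return the resized canvas size and the (left, top, right, bottom)
    crop box of every tile, row by row. The tile size is an assumption."""
    cols, rows = choose_tile_grid(width, height, max_tiles)
    boxes = [(c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
             for r in range(rows) for c in range(cols)]
    return (cols * tile, rows * tile), boxes


canvas, boxes = tile_boxes(1000, 500)
print(canvas, boxes)
```

For a 1000×500 image the sketch picks a 2×1 grid, i.e. two 448×448 tiles on an 896×448 canvas; each tile would then be encoded separately.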
OCRBench: on the hidden mystery of OCR in large multimodal models
Science China (Information Sciences), 2024, Issue 12, pp. 19-31 (13 pages). Yuliang LIU, Zhang LI, Mingxin HUANG, Biao YANG, Wenwen YU, Chunyuan LI, Xu-Cheng YIN, Cheng-Lin LIU, Lianwen JIN, Xiang BAI
Supported by National Natural Science Foundation of China (Grant Nos. 62225603, 62226104).
Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual tasks remains relatively unexplored. In this p...
Keywords: large multimodal model; OCR; text recognition; scene text-centric VQA; document-oriented VQA; key information extraction; handwritten mathematical expression recognition
Woodpecker: hallucination correction for multimodal large language models
Science China (Information Sciences), 2024, Issue 12, pp. 48-60 (13 pages). Shukang YIN, Chaoyou FU, Sirui ZHAO, Tong XU, Hao WANG, Dianbo SUI, Yunhang SHEN, Ke LI, Xing SUN, Enhong CHEN
Supported in part by National Natural Science Foundation of China (Grant Nos. U23A20319, 62222213, U22B2059, 61727809, 62072423); Young Scientists Fund of the Natural Science Foundation (Grant No. 2023NSFSC1402).
Hallucination is a big shadow hanging over the rapidly evolving multimodal large language models (MLLMs), referring to the phenomenon that the generated text is inconsistent with the image content. To mitigate hallucinations, existing ...
Keywords: multimodal learning; multimodal large language models; hallucination correction; large language models; vision and language
DocPedia: unleashing the power of large multimodal model in the frequency domain for versatile document understanding
Science China (Information Sciences), 2024, Issue 12, pp. 61-74 (14 pages). Hao FENG, Qi LIU, Hao LIU, Jingqun TANG, Wengang ZHOU, Houqiang LI, Can HUANG
Supported by National Natural Science Foundation of China (Grant No. 62021001); Youth Innovation Promotion Association CAS; the GPU cluster built by MCC Lab of Information Science and Technology Institution, University of Science and Technology of China; and the Supercomputing Center of the University of Science and Technology of China.
In this work, we present DocPedia, a novel large multimodal model (LMM) for versatile OCR-free document understanding, capable of parsing images up to 2560×2560 resolution. Unlike existing studies that either struggle with ...
Keywords: document understanding; large multimodal model; OCR-free; high-resolution; frequency
Modality-experts coordinated adaptation for large multimodal models
Science China (Information Sciences), 2024, Issue 12, pp. 75-92 (18 pages). Yan ZHANG, Zhong JI, Yanwei PANG, Jungong HAN, Xuelong LI
Supported by National Key Research and Development Program of China (Grant No. 2022ZD0160403); National Natural Science Foundation of China (Grant No. 62176178).
Driven by the expansion of foundation models and the increasing variety of downstream tasks, parameter-efficient fine-tuning (PEFT) methods have exhibited remarkable efficacy in the unimodal domain, effectively mitigatin...
Keywords: large multimodal model; multimodal learning; vision-language pretraining; parameter-efficient fine-tuning; adapter; modality expert
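
The abstract and keywords above refer to parameter-efficient fine-tuning with adapters and modality experts. As a generic illustration (a standard residual bottleneck adapter, not the method proposed in this paper), only two small projection matrices are trained while the frozen backbone's hidden states pass through unchanged at initialisation. The class `BottleneckAdapter`, the per-modality `adapters` dictionary, and the dimensions used are hypothetical.

```python
import random


class BottleneckAdapter:
    """Residual bottleneck adapter: h + W_up . relu(W_down . h).
    Only w_down and w_up would be trained; the backbone producing h stays frozen."""

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        # down-projection: bottleneck x dim, small random initial weights
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(bottleneck)]
        # zero-initialised up-projection, so the adapter starts as the identity map
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]

    def num_params(self):
        return sum(len(row) for row in self.w_down) + sum(len(row) for row in self.w_up)

    def __call__(self, h):
        z = [max(0.0, sum(w * x for w, x in zip(row, h))) for row in self.w_down]  # ReLU
        delta = [sum(w * x for w, x in zip(row, z)) for row in self.w_up]
        return [x + d for x, d in zip(h, delta)]  # residual connection


# Hypothetical "modality experts": one adapter per input modality.
adapters = {"image": BottleneckAdapter(8, 2, seed=1), "text": BottleneckAdapter(8, 2, seed=2)}


def adapt(tokens):
    """Route each (modality, hidden_vector) token through its modality's adapter."""
    return [adapters[modality](h) for modality, h in tokens]
```

With an assumed hidden size of 768 and a bottleneck of 16, one adapter has 2 × 768 × 16 = 24,576 trainable parameters, compared with 768 × 768 = 589,824 for a single full-rank linear layer of the same width.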
COMET: “cone of experience” enhanced large multimodal model for mathematical problem generation
Science China (Information Sciences), 2024, Issue 12, pp. 93-94 (2 pages). Sannyuya LIU, Jintian FENG, Zongkai YANG, Yawei LUO, Qian WAN, Xiaoxuan SHEN, Jianwen SUN
Supported by National Science and Technology Major Project (Grant No. 2022ZD0117103); National Natural Science Foundation of China (Grant Nos. 62437002, 62307015, 62293554); China Postdoctoral Science Foundation (Grant Nos. 2023M741304, 2023T160256); Hubei Provincial Natural Science Foundation of China (Grant Nos. 2023AFA020, 2023AFB295); Fundamental Research Funds for the Central Universities (Grant No. CCNU24AI016).
The impact of generative artificial intelligence on education is unprecedented [1]. Researchers have been exploring possibilities of combining the large multimodal model (LMM) with the teaching process. Specifically, Luo an...
Keywords: COMET; cone of experience enhanced large multimodal model for mathematical problem generation
Large circuit models: opportunities and challenges
Science China (Information Sciences), 2024, Issue 10, pp. 21-62 (42 pages). Lei CHEN, Yiqi CHEN, Zhufei CHU, Wenji FANG, Tsung-Yi HO, Ru HUANG, Yu HUANG, Sadaf KHAN, Min LI, Xingquan LI, Yu LI, Yun LIANG, Jinwei LIU, Yi LIU, Yibo LIN, Guojie LUO, Hongyang PAN, Zhengyuan SHI, Guangyu SUN, Dimitrios TSARAS, Runsheng WANG, Ziyi WANG, Xinming WEI, Zhiyao XIE, Qiang XU, Chenhao XUE, Junchi YAN, Jun YANG, Bei YU, Mingxuan YUAN, Evangeline F. Y. YOUNG, Xuan ZENG, Haoyi ZHANG, Zuodong ZHANG, Yuxiang ZHAO, Hui-Ling ZHEN, Ziyang ZHENG, Binwu ZHU, Keren ZHU, Sunan ZOU
Supported in part by Hong Kong S.A.R. General Research Fund (Grant No. 14212422); Research Matching (Grant No. CSE-7-2022).
Within the electronic design automation (EDA) domain, artificial intelligence (AI)-driven solutions have emerged as formidable tools, yet they typically augment rather than redefine existing methodologies. These solutions o...
Keywords: AI-rooted EDA; large circuit models (LCMs); multimodal circuit representation learning; circuit optimization
Logit prototype learning with active multimodal representation for robust open-set recognition
Science China (Information Sciences), 2024, Issue 6, pp. 293-308 (16 pages). Yimin FU, Zhunga LIU, Zicheng WANG
Supported in part by National Natural Science Foundation of China (Grant No. U20B2067); Innovation Foundation for Doctor Dissertation of Northwestern Polytechnical University (Grant No. CX2023015); Cultivation Foundation for Excellent Doctoral Dissertation of the School of Automation of Northwestern Polytechnical University.
Robust open-set recognition (OSR) performance has become a prerequisite for pattern recognition systems in real-world applications. However, the existing OSR methods are primarily implemented on the basis of single-modal ...
Keywords: logit prototype learning; multimodal perception; open-set recognition; uncertainty estimation
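
The entry above concerns open-set recognition, where a classifier must be able to reject inputs from classes it never saw in training. A minimal sketch of the general idea, assuming plain class-mean prototypes in feature space and a fixed Euclidean distance threshold: the paper itself learns prototypes in logit space with an active multimodal representation, which is not reproduced here. The function names and the threshold value are hypothetical.

```python
import math


def class_prototypes(features, labels):
    """Mean feature vector (prototype) of each known class."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_open_set(x, prototypes, threshold):
    """Return the nearest prototype's label, or "unknown" if even the nearest
    prototype is farther than the (assumed) rejection threshold."""
    label, dist = min(((y, math.dist(x, p)) for y, p in prototypes.items()),
                      key=lambda pair: pair[1])
    return label if dist <= threshold else "unknown"


protos = class_prototypes([[0, 0], [0, 2], [10, 10], [10, 12]], ["a", "a", "b", "b"])
print(predict_open_set([5.0, 5.0], protos, threshold=2.0))
```

Here the query [5, 5] lies about 6.4 units from the nearest prototype, so with a threshold of 2.0 it is rejected as "unknown" instead of being forced into class "a"; in practice the threshold would be tuned on validation data.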
From single- to multi-modal remote sensing imagery interpretation: a survey and taxonomy (citations: 6)
Science China (Information Sciences), 2023, Issue 4, pp. 1-28 (28 pages). Xian SUN, Yu TIAN, Wanxuan LU, Peijin WANG, Ruigang NIU, Hongfeng YU, Kun FU
Supported by National Key R&D Program of China (Grant No. 2021YFB3900504); National Natural Science Foundation of China (Grant Nos. 61725105, 62171436).
Modality is a source or form of information. Through various modal information, humans can perceive the world from multiple perspectives. Simultaneously, the observation of remote sensing (RS) is multimodal. We observe the w...
Keywords: multimodal; remote sensing; image interpretation; feature fusion; co-learning
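
The survey's keywords include feature fusion. As a generic illustration (not the survey's own taxonomy), feature-level ("early") fusion concatenates per-modality feature vectors before a shared classifier, while decision-level ("late") fusion combines per-modality class scores with a weighted average. The function names and the weighting scheme are hypothetical.

```python
def early_fusion(*modal_features):
    """Feature-level fusion: concatenate the per-modality feature vectors."""
    fused = []
    for f in modal_features:
        fused.extend(f)
    return fused


def late_fusion(modal_scores, weights):
    """Decision-level fusion: weighted average of per-modality class scores.
    Assumes every score vector has the same length and the weights sum to a
    positive number."""
    total = sum(weights)
    n_classes = len(modal_scores[0])
    return [sum(w * s[i] for w, s in zip(weights, modal_scores)) / total
            for i in range(n_classes)]


# e.g. two hypothetical sensor branches scoring the same two classes
print(late_fusion([[1.0, 0.0], [0.0, 1.0]], weights=[3, 1]))
```

Weighting the first branch three times as heavily yields fused scores of [0.75, 0.25]; early fusion instead leaves it to a downstream model to learn how the concatenated features interact.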