supported by National Science and Technology Major Project (Grant No. 2023ZD0120703); National Natural Science Foundation of China (Grant Nos. U23B2057, 62106142, 62120106006); Shanghai Municipal Science and Technology Major Project (Grant No. 2021SHZDZX0102).
Chemistry, as a naturally multimodal discipline, plays a crucial role in various vital fields such as pharmaceutical research and material manufacturing. Therefore, research on artificial intelligence (AI) for chemistry has...
supported by National Key R&D Program of China (Grant Nos. 2022ZD0160102, 2022ZD0161300); National Natural Science Foundation of China (Grant Nos. 62372223, U24A20330, 62376134); China Mobile Zijin Innovation Institute (Grant No. NR2310J7M); Youth Ph.D. Student Research Project under the National Natural Science Foundation (Grant No. 623B2050).
In this paper, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce t...
supported by National Natural Science Foundation of China (Grant Nos. 62225603, 62226104).
Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual tasks remains relatively unexplored. In this p...
supported in part by National Natural Science Foundation of China (Grant Nos. U23A20319, 62222213, U22B2059, 61727809, 62072423); Young Scientists Fund of the Natural Science Foundation (Grant No. 2023NSFSC1402).
Hallucination, in which the generated text is inconsistent with the image content, casts a large shadow over the rapidly evolving multimodal large language models (MLLMs). To mitigate hallucinations, existing ...
supported by National Natural Science Foundation of China (Grant No. 62021001); Youth Innovation Promotion Association CAS; GPU cluster built by MCC Lab of Information Science and Technology Institution, University of Science and Technology of China; the Supercomputing Center of the University of Science and Technology of China.
In this work, we present DocPedia, a novel large multimodal model (LMM) for versatile OCR-free document understanding, capable of parsing images up to 2560×2560 resolution. Unlike existing studies that either struggle with ...
supported by National Key Research and Development Program of China (Grant No. 2022ZD0160403); National Natural Science Foundation of China (Grant No. 62176178).
Driven by the expansion of foundation models and the increasing variety of downstream tasks, parameter-efficient fine-tuning (PEFT) methods have exhibited remarkable efficacy in the unimodal domain, effectively mitigatin...
supported by National Science and Technology Major Project (Grant No. 2022ZD0117103); National Natural Science Foundation of China (Grant Nos. 62437002, 62307015, 62293554); China Postdoctoral Science Foundation (Grant Nos. 2023M741304, 2023T160256); Hubei Provincial Natural Science Foundation of China (Grant Nos. 2023AFA020, 2023AFB295); Fundamental Research Funds for the Central Universities (Grant No. CCNU24AI016).
The impact of generative artificial intelligence on education is unprecedented [1]. Researchers have been exploring possibilities of combining the large multimodal model (LMM) with the teaching process. Specifically, Luo an...
supported in part by Hong Kong S.A.R. General Research Fund (Grant No. 14212422); Research Matching (Grant No. CSE-7-2022).
Within the electronic design automation (EDA) domain, artificial intelligence (AI)-driven solutions have emerged as formidable tools, yet they typically augment rather than redefine existing methodologies. These solutions o...
supported in part by National Natural Science Foundation of China (Grant No. U20B2067); Innovation Foundation for Doctor Dissertation of Northwestern Polytechnical University (Grant No. CX2023015); Cultivation Foundation for Excellent Doctoral Dissertation of the School of Automation of Northwestern Polytechnical University.
Robust open-set recognition (OSR) performance has become a prerequisite for pattern recognition systems in real-world applications. However, the existing OSR methods are primarily implemented on the basis of single-modal ...
supported by National Key R&D Program of China (Grant No. 2021YFB3900504); National Natural Science Foundation of China (Grant Nos. 61725105, 62171436).
Modality is a source or form of information. Through various modal information, humans can perceive the world from multiple perspectives. Simultaneously, the observation of remote sensing (RS) is multimodal. We observe the w...