Research Progress on New Organic Molecules Design via Machine Learning  (Cited by: 1)


Authors: Tan Pang; Liu Xuhong[1,3]; Chen Tongtong; Qin Zhihui; Yang Tao[1]; Liu Xiaotong; Liu Xiulei[1,2]

Affiliations: [1] Beijing Advanced Innovation Center for Materials Genome Engineering, Beijing Information Science and Technology University, Beijing 100101; [2] Laboratory of Data Science and Information Studies, Beijing Information Science and Technology University, Beijing 100101; [3] Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100192; [4] Beijing Institute of Tracking and Telecommunications Technology, Beijing 100094; [5] State Key Laboratory of Coal Conversion, Institute of Coal Chemistry, Chinese Academy of Sciences, Taiyuan 030001; [6] National Energy Center for Coal to Liquids, Synfuels China Co., Ltd, Beijing 101400; [7] University of Chinese Academy of Sciences, Beijing 100049

Source: Chinese Journal of Organic Chemistry (《有机化学》), 2021, No. 7, pp. 2666-2675 (10 pages)

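Funding: Supported by the "Qinxin Talent" Cultivation Program of Beijing Information Science and Technology University; the General Program of the Science and Technology Plan of the Beijing Municipal Education Commission (No. KM202111232003); the Beijing Information Science and Technology University program for promoting the intrinsic development of universities; and the Beijing Natural Science Foundation (No. 4204100).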

Abstract: Novel organic molecules have long been a research focus in organic chemistry and are important for developing high-performance materials. Traditional discovery of organic molecules is a trial-and-error process likened to "stir-frying" (cooking by feel), which is time- and energy-consuming and relatively inefficient. Common quantum-chemistry methods try to screen plausible molecular structures against desired property values so as to better guide experiments; however, because available computing resources fall far short of what the algorithms' complexity demands, precise experimental guidance is out of reach in most cases. The recent rise of machine learning has changed this: a trained model can predict molecular properties quickly. More excitingly, machine learning can perform inverse molecular design, broadening human imagination and delivering a "divine move" in molecular design. This review first introduces the molecular representations that inverse molecular design requires, then summarizes several common deep generative models and the current state of research on novel organic molecule design, and finally discusses the challenges the field faces and presents some of the authors' own explorations.

Low-cost, high-performance materials have become increasingly important over the past decades and reflect a country's technological level. Chemists have traditionally identified candidate materials through property regression and quantitative structure-activity relationships (QSAR). Traditional methods find new molecules from prior knowledge through trial-and-error experiments, which makes molecular screening time-consuming and inefficient. The advent of machine learning (ML) changes this situation in two ways. First, it accelerates property prediction so that less time is wasted on poor candidates. Second, it enables inverse molecular design, which expands human imagination. Many studies report promising results with different inverse-design methods, such as variational autoencoders (VAE), generative adversarial networks (GAN), reinforcement learning (RL), and recurrent neural networks (RNN). These methods introduce uncertainty at different levels to generate new candidate structures. Whatever the method, the molecular descriptor strongly affects the result. A descriptor converts a real-world 3D structure into a vector or a notation string that can be fed into various ML models. A large number of descriptors have been developed in cheminformatics, bioinformatics, quantum chemistry, and natural language processing (NLP). Classical descriptors include the Coulomb matrix (CM), smooth overlap of atomic positions (SOAP), weighted graphs (WG), and the simplified molecular-input line-entry system (SMILES). Each has different strengths and addresses the problem from a different angle. CM has a clear definition and performs well in energy regression. SOAP is good at capturing the local environment of an atom. However, both are easy to encode but hard to decode, which is one reason WG and SMILES are preferred in inverse structure design tasks. WG and SMILES express a structure as a graph (atoms as nodes, bonds as edges) or as a string, respectively, so that the many mature graph neural network (GNN) or NLP algorithms can be applied to them. Nowadays, most of the ML applications in chemistry and molecule scienc
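The abstract notes that the Coulomb matrix has a clear definition. As an illustration, here is a minimal sketch of the commonly used form (Rupp et al., 2012): diagonal entries 0.5·Z_i^2.4 and off-diagonal entries Z_i·Z_j/|R_i − R_j|. The function name `coulomb_matrix` and the H2 geometry are illustrative assumptions, not the authors' code; the review itself does not specify units, so coordinates here are in whatever length unit the caller supplies (atomic units are typical).

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def coulomb_matrix(charges: Sequence[int], coords: Sequence[Vec3]) -> List[List[float]]:
    """Coulomb matrix in its commonly used form (illustrative sketch).

    charges: nuclear charges Z_i
    coords:  Cartesian positions R_i, in the caller's length unit
    """
    if len(charges) != len(coords):
        raise ValueError("charges and coords must have the same length")
    n = len(charges)
    cm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Self-interaction term: 0.5 * Z_i ** 2.4
                cm[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Coulomb repulsion between nuclei i and j
                cm[i][j] = charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return cm


# Hypothetical H2 geometry with a bond length of 0.74 along the z axis.
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.74)])
# Diagonal entries are 0.5; off-diagonal entries are about 1/0.74 (roughly 1.351).
```

The matrix is easy to compute from a structure, but it depends on atom ordering, so it is usually made permutation-invariant (for example by sorting rows by norm or using its eigenvalues), and recovering a 3D geometry from it is not straightforward. This is the "easy to encode, hard to decode" property the abstract mentions.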
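By contrast, a graph representation with atoms as nodes and bonds as edges can be inverted directly. The sketch below encodes a molecule as a symmetric weighted adjacency matrix and decodes it back to a bond list. Using bond order as the edge weight, and the names `encode` and `decode`, are assumptions for illustration; the review's actual weighted-graph scheme may differ.

```python
from typing import List, Sequence, Tuple

Bond = Tuple[int, int, float]  # (atom index i, atom index j, bond order)


def encode(atoms: Sequence[str], bonds: Sequence[Bond]) -> List[List[float]]:
    """Encode a molecule as a symmetric weighted adjacency matrix.

    Nodes are atoms and edges are bonds; the edge weight is the bond order
    (an illustrative choice, not necessarily the paper's weighting).
    """
    n = len(atoms)
    adj = [[0.0] * n for _ in range(n)]
    for i, j, order in bonds:
        if i == j or not (0 <= i < n and 0 <= j < n):
            raise ValueError(f"invalid bond ({i}, {j})")
        adj[i][j] = adj[j][i] = float(order)
    return adj


def decode(adj: List[List[float]]) -> List[Bond]:
    """Recover the bond list from the upper triangle of the matrix.

    Together with the atom list, this reconstructs the molecular graph exactly.
    """
    n = len(adj)
    return [(i, j, adj[i][j]) for i in range(n) for j in range(i + 1, n) if adj[i][j] != 0.0]


# Formaldehyde, H2C=O: C(0) double-bonded to O(1), single-bonded to H(2) and H(3).
atoms = ["C", "O", "H", "H"]
bonds = [(0, 1, 2.0), (0, 2, 1.0), (0, 3, 1.0)]
adj = encode(atoms, bonds)
print(decode(adj))  # [(0, 1, 2.0), (0, 2, 1.0), (0, 3, 1.0)]
```

Because every entry of the matrix maps back to a specific bond, a generative model that outputs such a matrix (plus atom labels) yields a candidate structure directly, which is why graph and string representations suit inverse design.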

Keywords: machine learning; generative model; inverse molecular design; molecular representation; BASE64 encoding

CLC number: O622 [Natural Sciences: Organic Chemistry]
