PG-RAC: PostgreSQL-based Database with Shared Cache for Multi-write Transaction

Authors: YIN Yu-Jie, SHI Hao-Yang, FAN Zi-Hao, ZHOU Hua-Hui, LIU Sheng-Chi, HU Hui-Qi, WEI Xing[2], CHEN He-Dui[2], TU Yao-Feng[2], CAI Peng, ZHOU Xuan

Affiliations: [1] School of Data Science and Engineering, East China Normal University, Shanghai 200062, China; [2] ZTE Corporation, Nanjing 210012, China

Source: Journal of Software, 2025, Issue 3, pp. 1065-1083 (19 pages)

Funding: National Natural Science Foundation of China (92270202); Natural Science Foundation of Shanghai (23ZR1418300); ZTE Corporation Research Fund (HC-CN-20220721010).

Abstract: The mainstream design of cloud-native databases is a single-master, multi-slave architecture: slave nodes in the cluster take over read-only requests from the master node, while write requests are handled by the master node. To further meet the demands of large-scale transaction scaling, some cloud databases attempt to scale out multi-write transactions. One approach to multi-write scaling is to introduce a shared cache among compute nodes to support cross-node data access. In shared-cache database systems, the overhead of cross-node remote access is significantly higher than that of local access, so the design of the cache protocol is a crucial factor affecting system performance and scalability. This study proposes two innovative improvements to the cache protocol and implements PG-RAC, a PostgreSQL-based shared-cache database that supports multi-write transaction processing. First, PG-RAC proposes a new distributed chained routing strategy that disperses routing information among the compute nodes; compared with a routing strategy based on single-node directory management, it reduces average transaction latency by approximately 20%. Second, the study improves the replica page invalidation mechanism by separating invalidation operations from the transaction path, reducing latency on the critical path of transaction processing. Building on this, PG-RAC exploits the characteristics of multi-version concurrency control (MVCC) to further delay the point at which replica pages are invalidated, which effectively improves cache utilization. TPC-C experimental results show that, in a cluster with 4 compute nodes, the throughput is nearly 2 times that of PostgreSQL and 1.5 times that of the distributed database Citus.

Keywords: cloud-native database; shared-cache database; cache coherence protocol; transaction processing

Classification: TP311 [Automation and Computer Technology - Computer Software and Theory]

 
