PostgreSQL: working around insert into select not using parallel query

Posted by 鬼子进了村部 on 2021-10-26 13:32:56
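Everything in this post is based on PG 13.1.

Parallel query has been supported since PG 9.6. Starting with PG 11, CREATE TABLE … AS, SELECT INTO, and CREATE MATERIALIZED VIEW can also use parallel query.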
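
Whether a parallel plan is even considered depends on a few server settings, so it may be worth checking them before comparing plans. The sketch below only inspects and sets standard PostgreSQL parameters for the current session; the value 2 is an assumption chosen to match the two workers seen in the plans in this post, since the actual server configuration is not given.

```sql
-- Settings that commonly decide whether a parallel plan is chosen.
SHOW max_parallel_workers_per_gather;  -- 0 disables parallel query
SHOW max_parallel_workers;
SHOW max_worker_processes;
SHOW min_parallel_table_scan_size;     -- smaller tables are not scanned in parallel

-- Assumption: allow up to 2 workers per Gather node, for this session only.
SET max_parallel_workers_per_gather = 2;
```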

Conclusion first:

Use create table as, select into, or an export/import round trip instead.
First, look at the execution plan of the following query:

    select count(*) from test t1,test1 t2 where t1.id = t2.id ;

    postgres=# explain analyze select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                                                                       QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------------------------------------
     Finalize Aggregate  (cost=34244.16..34244.17 rows=1 width=8) (actual time=683.246..715.324 rows=1 loops=1)
       ->  Gather  (cost=34243.95..34244.16 rows=2 width=8) (actual time=681.474..715.311 rows=3 loops=1)
             Workers Planned: 2
             Workers Launched: 2
             ->  Partial Aggregate  (cost=33243.95..33243.96 rows=1 width=8) (actual time=674.689..675.285 rows=1 loops=3)
                   ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=0) (actual time=447.799..645.689 rows=333333 loops=3)
                         Hash Cond: (t1.id = t2.id)
                         ->  Parallel Seq Scan on test t1  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.025..74.010 rows=333333 loops=3)
                         ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=260.052..260.053 rows=333333 loops=3)
                               Buckets: 131072  Batches: 16  Memory Usage: 3520kB
                               ->  Parallel Seq Scan on test1 t2  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.032..104.804 rows=333333 loops=3)
     Planning Time: 0.420 ms
     Execution Time: 715.447 ms
    (13 rows)

The plan launched two workers.
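
The post does not show how the test tables were built. To reproduce a plan like the one above, a setup along these lines should be close. Treat it as a sketch under assumptions: the single integer `id` column and the 1,000,000 sequential rows are inferred from `width=4` and the row counts in the plans, and the column name `cnt` in `va` is purely hypothetical.

```sql
-- Assumed shape of the two source tables: one integer column, 1,000,000 rows each.
CREATE TABLE test  AS SELECT generate_series(1, 1000000) AS id;
CREATE TABLE test1 AS SELECT generate_series(1, 1000000) AS id;

-- Assumed target table for the insert into select test (hypothetical column name).
CREATE TABLE va (cnt integer);

-- Refresh planner statistics so the row estimates are realistic.
ANALYZE test;
ANALYZE test1;
```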

Now look at insert into select:

    postgres=# explain analyze insert into va select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                                                                    QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------------------------------
     Insert on va  (cost=73228.00..73228.02 rows=1 width=4) (actual time=3744.179..3744.187 rows=0 loops=1)
       ->  Subquery Scan on "*SELECT*"  (cost=73228.00..73228.02 rows=1 width=4) (actual time=3743.343..3743.352 rows=1 loops=1)
             ->  Aggregate  (cost=73228.00..73228.01 rows=1 width=8) (actual time=3743.247..3743.254 rows=1 loops=1)
                   ->  Hash Join  (cost=30832.00..70728.00 rows=1000000 width=0) (actual time=1092.295..3511.301 rows=1000000 loops=1)
                         Hash Cond: (t1.id = t2.id)
                         ->  Seq Scan on test t1  (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.030..421.537 rows=1000000 loops=1)
                         ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=4) (actual time=1090.078..1090.081 rows=1000000 loops=1)
                               Buckets: 131072  Batches: 16  Memory Usage: 3227kB
                               ->  Seq Scan on test1 t2  (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.021..422.768 rows=1000000 loops=1)
     Planning Time: 0.511 ms
     Execution Time: 3745.633 ms
    (11 rows)

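There is no Workers entry in this plan, so parallel query was not used.

Even with forced parallel mode switched on, the statement still does not run in parallel.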

    postgres=# set force_parallel_mode =on;
    SET
    postgres=# explain analyze insert into va select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                                                                    QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------------------------------
     Insert on va  (cost=73228.00..73228.02 rows=1 width=4) (actual time=3825.042..3825.049 rows=0 loops=1)
       ->  Subquery Scan on "*SELECT*"  (cost=73228.00..73228.02 rows=1 width=4) (actual time=3824.976..3824.984 rows=1 loops=1)
             ->  Aggregate  (cost=73228.00..73228.01 rows=1 width=8) (actual time=3824.972..3824.978 rows=1 loops=1)
                   ->  Hash Join  (cost=30832.00..70728.00 rows=1000000 width=0) (actual time=1073.587..3599.402 rows=1000000 loops=1)
                         Hash Cond: (t1.id = t2.id)
                         ->  Seq Scan on test t1  (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.034..414.965 rows=1000000 loops=1)
                         ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=4) (actual time=1072.441..1072.443 rows=1000000 loops=1)
                               Buckets: 131072  Batches: 16  Memory Usage: 3227kB
                               ->  Seq Scan on test1 t2  (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.022..400.624 rows=1000000 loops=1)
     Planning Time: 0.577 ms
     Execution Time: 3825.923 ms
    (11 rows)

The reason is given in the official documentation, which lists this among the cases where no parallel plan is generated:

    The query writes any data or locks any database rows. If a query contains a data-modifying operation either at the top level or within a CTE, no parallel plans for that query will be generated. As an exception, the commands CREATE TABLE … AS, SELECT INTO, and CREATE MATERIALIZED VIEW which create a new table and populate it can use a parallel plan.

There are three ways around this:


1. select into

    postgres=# explain analyze select count(*) into vaa from test t1,test1 t2 where t1.id = t2.id ;
                                                                       QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------------------------------------
     Finalize Aggregate  (cost=34244.16..34244.17 rows=1 width=8) (actual time=742.736..774.923 rows=1 loops=1)
       ->  Gather  (cost=34243.95..34244.16 rows=2 width=8) (actual time=740.223..774.907 rows=3 loops=1)
             Workers Planned: 2
             Workers Launched: 2
             ->  Partial Aggregate  (cost=33243.95..33243.96 rows=1 width=8) (actual time=731.408..731.413 rows=1 loops=3)
                   ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=0) (actual time=489.880..700.830 rows=333333 loops=3)
                         Hash Cond: (t1.id = t2.id)
                         ->  Parallel Seq Scan on test t1  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.033..87.479 rows=333333 loops=3)
                         ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=266.839..266.840 rows=333333 loops=3)
                               Buckets: 131072  Batches: 16  Memory Usage: 3520kB
                               ->  Parallel Seq Scan on test1 t2  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.058..106.874 rows=333333 loops=3)
     Planning Time: 0.319 ms
     Execution Time: 783.300 ms
    (13 rows)

2. create table as

    postgres=# explain analyze create table vb as select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                                                                       QUERY PLAN
    -------------------------------------------------------------------------------------------------------------------------------------------------------
     Finalize Aggregate  (cost=34244.16..34244.17 rows=1 width=8) (actual time=540.120..563.733 rows=1 loops=1)
       ->  Gather  (cost=34243.95..34244.16 rows=2 width=8) (actual time=537.982..563.720 rows=3 loops=1)
             Workers Planned: 2
             Workers Launched: 2
             ->  Partial Aggregate  (cost=33243.95..33243.96 rows=1 width=8) (actual time=526.602..527.136 rows=1 loops=3)
                   ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=0) (actual time=334.532..502.793 rows=333333 loops=3)
                         Hash Cond: (t1.id = t2.id)
                         ->  Parallel Seq Scan on test t1  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.018..57.819 rows=333333 loops=3)
                         ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=189.502..189.503 rows=333333 loops=3)
                               Buckets: 131072  Batches: 16  Memory Usage: 3520kB
                               ->  Parallel Seq Scan on test1 t2  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.023..77.786 rows=333333 loops=3)
     Planning Time: 0.189 ms
     Execution Time: 565.448 ms
    (13 rows)

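3. Or export the result and import it again, for example (the import uses psql's \copy, which reads result.csv on the client side where the first command wrote it; a server-side COPY would look for the file on the database server instead):

    psql -h localhost -d postgres -U postgres -c "select count(*) from test t1,test1 t2 where t1.id = t2.id " -o result.csv -A -t -F ","
    psql -h localhost -d postgres -U postgres -c "\copy va FROM 'result.csv' WITH (FORMAT CSV, DELIMITER ',', HEADER FALSE, ENCODING 'windows-1252')"
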
In some scenarios this round trip is still faster than the non-parallel insert into select.
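
Options 1 and 2 create a new table, whereas the original statement wrote into the existing table va. One way to bridge that gap is to stage the result with create table as and then copy it over with a cheap, non-parallel insert. This is only a sketch: `va_stage` and the column alias `cnt` are hypothetical names that do not appear in the post.

```sql
-- Step 1: the expensive join and aggregate can use a parallel plan here,
-- because CREATE TABLE ... AS is one of the documented exceptions.
CREATE TABLE va_stage AS
SELECT count(*) AS cnt
FROM test t1, test1 t2
WHERE t1.id = t2.id;

-- Step 2: copy the staged row(s) into the real target.
-- This insert is not parallel, but it only reads the small staging table.
INSERT INTO va SELECT cnt FROM va_stage;

-- Step 3: remove the staging table.
DROP TABLE va_stage;
```

The payoff depends on the staged result being much smaller than the input, as it is for this count(*) query. If the select returns millions of rows, the second step pays the serial insert cost again.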