对具有多个自联接的大型表执行缓慢的空间查询

2023-12-10

我正在 Postgres 9.3.9 中的一个大表上进行查询。它是一个空间数据集，并且具有空间索引。比如说，我需要找到 3 种类型的物体：A、B 和 C。标准是 B 和 C 都在 A 的一定距离内，比如 500 米。

我的查询是这样的：

select 
  school.osm_id as school_osm_id, 
  school.name as school_name, 
  school.way as school_way, 
  restaurant.osm_id as restaurant_osm_id, 
  restaurant.name as restaurant_name, 
  restaurant.way as restaurant_way, 
  bar.osm_id as bar_osm_id, 
  bar.name as bar_name, 
  bar.way as bar_way 
from (
    select osm_id, name, amenity, way, way_geo 
    from planet_osm_point 
    where amenity = 'school') as school, 
   (select osm_id, name, amenity, way, way_geo 
    from planet_osm_point 
    where amenity = 'restaurant') as restaurant, 
   (select osm_id, name, amenity, way, way_geo 
    from planet_osm_point 
    where amenity = 'bar') as bar 
where ST_DWithin(school.way_geo, restaurant.way_geo, 500, false) 
  and ST_DWithin(school.way_geo, bar.way_geo, 500, false);

这个查询给了我我想要的，但是它需要很长的时间，比如 13 秒来执行。我想知道是否有另一种方法来编写查询并使其更有效。

查询计划：

Nested Loop  (cost=74.43..28618.65 rows=1 width=177) (actual time=33.513..11235.212 rows=10591 loops=1)
   Buffers: shared hit=530967 read=8733
   ->  Nested Loop  (cost=46.52..28586.46 rows=1 width=174) (actual time=31.998..9595.212 rows=4235 loops=1)
         Buffers: shared hit=389863 read=8707
         ->  Bitmap Heap Scan on planet_osm_point  (cost=18.61..2897.83 rows=798 width=115) (actual time=7.862..150.607 rows=8811 loops=1)
               Recheck Cond: (amenity = 'school'::text)
               Buffers: shared hit=859 read=5204
               ->  Bitmap Index Scan on idx_planet_osm_point_amenity  (cost=0.00..18.41 rows=798 width=0) (actual time=5.416..5.416 rows=8811 loops=1)
                     Index Cond: (amenity = 'school'::text)
                     Buffers: shared hit=3 read=24
         ->  Bitmap Heap Scan on planet_osm_point planet_osm_point_1  (cost=27.91..32.18 rows=1 width=115) (actual time=1.064..1.069 rows=0 loops=8811)
               Recheck Cond: ((way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision)) AND (amenity = 'restaurant'::text))
               Filter: ((planet_osm_point.way_geo && _st_expand(way_geo, 500::double precision)) AND _st_dwithin(planet_osm_point.way_geo, way_geo, 500::double precision, false))
               Rows Removed by Filter: 0
               Buffers: shared hit=389004 read=3503
               ->  BitmapAnd  (cost=27.91..27.91 rows=1 width=0) (actual time=1.058..1.058 rows=0 loops=8811)
                     Buffers: shared hit=384528 read=2841
                     ->  Bitmap Index Scan on idx_planet_osm_point_waygeo  (cost=0.00..9.05 rows=137 width=0) (actual time=0.193..0.193 rows=64 loops=8811)
                           Index Cond: (way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision))
                           Buffers: shared hit=146631 read=2841
                     ->  Bitmap Index Scan on idx_planet_osm_point_amenity  (cost=0.00..18.41 rows=798 width=0) (actual time=0.843..0.843 rows=6291 loops=8811)
                           Index Cond: (amenity = 'restaurant'::text)
                           Buffers: shared hit=237897
   ->  Bitmap Heap Scan on planet_osm_point planet_osm_point_2  (cost=27.91..32.18 rows=1 width=115) (actual time=0.375..0.383 rows=3 loops=4235)
         Recheck Cond: ((way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision)) AND (amenity = 'bar'::text))
         Filter: ((planet_osm_point.way_geo && _st_expand(way_geo, 500::double precision)) AND _st_dwithin(planet_osm_point.way_geo, way_geo, 500::double precision, false))
         Rows Removed by Filter: 1
         Buffers: shared hit=141104 read=26
         ->  BitmapAnd  (cost=27.91..27.91 rows=1 width=0) (actual time=0.368..0.368 rows=0 loops=4235)
               Buffers: shared hit=127019
               ->  Bitmap Index Scan on idx_planet_osm_point_waygeo  (cost=0.00..9.05 rows=137 width=0) (actual time=0.252..0.252 rows=363 loops=4235)
                     Index Cond: (way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision))
                     Buffers: shared hit=101609
               ->  Bitmap Index Scan on idx_planet_osm_point_amenity  (cost=0.00..18.41 rows=798 width=0) (actual time=0.104..0.104 rows=779 loops=4235)
                     Index Cond: (amenity = 'bar'::text)
                     Buffers: shared hit=25410
 Total runtime: 11238.605 ms

我目前只使用一张桌子1,372,711 行。它有73 列:

       Column       |         Type         |       Modifiers
--------------------+----------------------+---------------------------
 osm_id             | bigint               | 
 access             | text                 | 
 addr:housename     | text                 | 
 addr:housenumber   | text                 | 
 addr:interpolation | text                 | 
 admin_level        | text                 | 
 aerialway          | text                 | 
 aeroway            | text                 | 
 amenity            | text                 | 
 area               | text                 | 
 barrier            | text                 | 
 bicycle            | text                 | 
 brand              | text                 | 
 bridge             | text                 | 
 boundary           | text                 | 
 building           | text                 | 
 capital            | text                 | 
 construction       | text                 | 
 covered            | text                 | 
 culvert            | text                 | 
 cutting            | text                 | 
 denomination       | text                 | 
 disused            | text                 | 
 ele                | text                 | 
 embankment         | text                 | 
 foot               | text                 | 
 generator:source   | text                 | 
 harbour            | text                 | 
 highway            | text                 | 
 historic           | text                 | 
 horse              | text                 | 
 intermittent       | text                 | 
 junction           | text                 | 
 landuse            | text                 | 
 layer              | text                 | 
 leisure            | text                 | 
 lock               | text                 | 
 man_made           | text                 | 
 military           | text                 | 
 motorcar           | text                 | 
 name               | text                 | 
 natural            | text                 | 
 office             | text                 | 
 oneway             | text                 | 
 operator           | text                 | 
 place              | text                 | 
 poi                | text                 | 
 population         | text                 | 
 power              | text                 | 
 power_source       | text                 | 
 public_transport   | text                 | 
 railway            | text                 | 
 ref                | text                 | 
 religion           | text                 | 
 route              | text                 | 
 service            | text                 | 
 shop               | text                 | 
 sport              | text                 | 
 surface            | text                 | 
 toll               | text                 | 
 tourism            | text                 | 
 tower:type         | text                 | 
 tunnel             | text                 | 
 water              | text                 | 
 waterway           | text                 | 
 wetland            | text                 | 
 width              | text                 | 
 wood               | text                 | 
 z_order            | integer              | 
 tags               | hstore               | 
 way                | geometry(Point,4326) | 
 way_geo            | geography            | 
 gid                | integer              | not null default nextval('...
Indexes:
    "planet_osm_point_pkey1" PRIMARY KEY, btree (gid)
    "idx_planet_osm_point_amenity" btree (amenity)
    "idx_planet_osm_point_waygeo" gist (way_geo)
    "planet_osm_point_index" gist (way)
    "planet_osm_point_pkey" btree (osm_id)

便利学校、餐厅和酒吧分别有8811、6291、779排。

这个查询应该有很长的路要走（是much快点）：

WITH school AS (
   SELECT s.osm_id AS school_id, text 'school' AS type, s.osm_id, s.name, s.way_geo
   FROM   planet_osm_point s
        , LATERAL (
      SELECT  1 FROM planet_osm_point
      WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
      AND     amenity = 'bar'
      LIMIT   1  -- bar exists -- most selective first if possible
      ) b
        , LATERAL (
      SELECT  1 FROM planet_osm_point
      WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
      AND     amenity = 'restaurant'
      LIMIT   1  -- restaurant exists
      ) r
   WHERE  s.amenity = 'school'
   )
SELECT * FROM (
   TABLE school  -- schools

   UNION ALL  -- bars
   SELECT s.school_id, 'bar', x.*
   FROM   school s
        , LATERAL (
      SELECT  osm_id, name, way_geo
      FROM    planet_osm_point
      WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
      AND     amenity = 'bar'
      ) x

   UNION ALL  -- restaurants
   SELECT s.school_id, 'rest.', x.*
   FROM   school s
        , LATERAL (
      SELECT  osm_id, name, way_geo
      FROM    planet_osm_point
      WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
      AND     amenity = 'restaurant'
      ) x
   ) sub
ORDER BY school_id, (type <> 'school'), type, osm_id;

This is not与您的原始查询相同，而是您真正想要的，根据评论中的讨论:

我想要一份拥有 500 家以内餐厅和酒吧的学校列表米，我需要每个学校的坐标及其对应的餐厅和酒吧。

因此，此查询返回这些学校的列表，然后是附近的酒吧和餐馆。每组行都由osm_id栏中的学校school_id.

现在使用LATERAL连接，以利用空间 GiST 索引。

TABLE school只是简写SELECT * FROM school:

psql 中有 SELECT * FROM 的快捷方式吗？

表达方式(type <> 'school')首先对每组中的学校进行排序，因为：

SQL select 按日和月查询顺序

子查询sub在决赛中SELECT只需要通过这个表达式来排序。 AUNION查询限制附加的ORDER BY仅列出列，无表达式。

我重点关注您为了回答这个问题而提出的问题 -ignoring过滤其他 70 个文本列中任何一个的扩展要求。这确实是一个设计缺陷。搜索条件应集中于few列。或者您必须对所有 70 列建立索引，而像我要建议的多列索引几乎不是一个选择。仍然possible尽管 ...

Index

除了现有的：

"idx_planet_osm_point_waygeo" gist (way_geo)

如果总是在同一列上进行过滤，您可以创建一个多列索引涵盖您感兴趣的几列，所以仅索引扫描成为可能：

CREATE INDEX planet_osm_point_bar_idx ON planet_osm_point (amenity, name, osm_id)

Postgres 9.5

即将推出的 Postgres9.5介绍主要改进这恰好解决了您的情况：

允许查询使用 GiST 索引对边界框索引对象（多边形、圆形）执行精确的距离过滤（亚历山大·科罗特科夫、海基·林纳坎加斯）

以前，需要一个公共表表达式来返回一个大的按边界框距离排序然后过滤的行数进一步进行更准确的非边界框距离计算。

允许 GiST 索引执行仅索引扫描 (Anastasia Lubennikova, Heikki Linnakangas, Andreas Karlsson)

您对此特别感兴趣。现在你可以拥有一个single多列（覆盖）GiST 索引：

CREATE INDEX reservations_range_idx ON reservations
USING gist(amenity, way_geo, name, osm_id)

And:

提高位图索引扫描性能（Teodor Sigaev、Tom Lane）

And:

添加GROUP BY分析功能GROUPING SETS, CUBE and ROLLUP（安德鲁·吉尔斯、阿特里·夏尔马）

为什么？因为ROLLUP会简化我建议的查询。相关回答：

PostgreSQL 中的 Grouping() 等效项？

第一个 alpha 版本已于 2015 年 7 月 2 日发布。预计发布时间表：

这是9.5版本的alpha版本，表明一些变化在发布之前功能仍然是可能的。 PostgreSQL 项目将于8月份发布9.5 beta 1，之后定期发布测试所需的额外测试版，直到最终版本发布 2015年底。

Basics

当然，一定不要忽视基础知识：

PostgreSQL Wiki 上的慢查询问题页面

本文内容由网友自发贡献，版权归原作者所有，本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容，请联系:hwhale#tublm.com(使用前将#替换为@)