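This is a case of relational division /questions/tagged/relational-division with the added special requirement that the same conversation must have no additional users.
I assume the PK of table "conversationUsers" is on ("userId", "conversationId"). It enforces uniqueness of combinations and NOT NULL, and it implicitly provides the index essential for performance. The columns of the multicolumn PK must be in this order. Ideally, you also have another index on ("conversationId", "userId"). See:
- Is a composite index also good for queries on the first field? https://dba.stackexchange.com/a/27493/3684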
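As a minimal sketch of the assumed setup, the table and both indexes could look like the following. The integer column types, the index name, and the sample rows are assumptions for illustration, not taken from the question.

```sql
-- Hypothetical table definition; column types are an assumption.
CREATE TABLE "conversationUsers" (
   "userId"         int NOT NULL
 , "conversationId" int NOT NULL
 , PRIMARY KEY ("userId", "conversationId")  -- column order matters for the implicit index
);

-- Optional second index with reversed column order (hypothetical name).
CREATE INDEX "conversationUsers_conv_user_idx"
ON "conversationUsers" ("conversationId", "userId");

-- Hypothetical sample rows:
-- conversation 1 has exactly users 1, 4, 6
-- conversation 2 has users 1, 4, 6 plus an extra user 7
-- conversation 3 has only users 1 and 4
INSERT INTO "conversationUsers" ("userId", "conversationId") VALUES
   (1, 1), (4, 1), (6, 1)
 , (1, 2), (4, 2), (6, 2), (7, 2)
 , (1, 3), (4, 3);
```

With rows like these, a query for the user set {1,4,6} should match only conversation 1.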
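For the basic query, there is the "brute force" approach: count the number of matching users for all conversations of all given users, then filter the conversations that match all given users. That is fine for small tables and/or short input arrays and/or few conversations per user, but it does not scale well: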
SELECT "conversationId"
FROM "conversationUsers" c
WHERE "userId" = ANY ('{1,4,6}'::int[])
GROUP BY 1
HAVING count(*) = array_length('{1,4,6}'::int[], 1)
AND NOT EXISTS (
SELECT FROM "conversationUsers"
WHERE "conversationId" = c."conversationId"
AND "userId" <> ALL('{1,4,6}'::int[])
);
Conversations with additional users are eliminated with a NOT EXISTS anti-semi-join. More:
- How do I (or can I) SELECT DISTINCT on multiple columns? https://stackoverflow.com/questions/54418/how-do-i-or-can-i-select-distinct-on-multiple-columns/12632129#12632129
Alternative techniques:
- Select rows which are not present in other table https://stackoverflow.com/questions/19363481/select-rows-which-are-not-present-in-other-table/19364694#19364694
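One such alternative, sketched here under the same assumptions as the brute-force query, replaces NOT EXISTS with LEFT JOIN / IS NULL. The alias x is only illustrative, and whether this form is faster depends on the data and the query plan.

```sql
-- Same result as the NOT EXISTS variant, written as LEFT JOIN / IS NULL.
SELECT c."conversationId"
FROM   "conversationUsers" c
LEFT   JOIN "conversationUsers" x               -- x: any user outside the given set
       ON  x."conversationId" = c."conversationId"
       AND x."userId" <> ALL ('{1,4,6}'::int[])
WHERE  c."userId" = ANY ('{1,4,6}'::int[])
AND    x."userId" IS NULL                        -- keep rows without an extra user
GROUP  BY 1
HAVING count(*) = array_length('{1,4,6}'::int[], 1);
```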
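There are various other (faster) relational division /questions/tagged/relational-division query techniques. But the fastest ones are not well suited for a dynamic number of user IDs.
- How to filter SQL results in a has-many-through relation https://stackoverflow.com/questions/7364969/how-to-filter-sql-results-in-a-has-many-through-relation/7774879#7774879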
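To illustrate why those techniques do not fit a dynamic number of IDs, here is a sketch of one static variant for exactly three users. It uses one self-join per user, so the query text itself changes whenever the number of users changes. The aliases c1, c2, c3 and x are illustrative.

```sql
-- Static query for exactly three given users: one join per user ID.
SELECT c1."conversationId"
FROM   "conversationUsers" c1
JOIN   "conversationUsers" c2 USING ("conversationId")
JOIN   "conversationUsers" c3 USING ("conversationId")
WHERE  c1."userId" = 1
AND    c2."userId" = 4
AND    c3."userId" = 6
AND    NOT EXISTS (                              -- no additional users allowed
   SELECT FROM "conversationUsers" x
   WHERE  x."conversationId" = c1."conversationId"
   AND    x."userId" NOT IN (1, 4, 6)
   );
```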
For a fast query that can also deal with a dynamic number of user IDs, consider a recursive CTE https://www.postgresql.org/docs/current/queries-with.html:
WITH RECURSIVE rcte AS (
SELECT "conversationId", 1 AS idx
FROM "conversationUsers"
WHERE "userId" = ('{1,4,6}'::int[])[1]
UNION ALL
SELECT c."conversationId", r.idx + 1
FROM rcte r
JOIN "conversationUsers" c USING ("conversationId")
WHERE c."userId" = ('{1,4,6}'::int[])[idx + 1]
)
SELECT "conversationId"
FROM rcte r
WHERE idx = array_length(('{1,4,6}'::int[]), 1)
AND NOT EXISTS (
SELECT FROM "conversationUsers"
WHERE "conversationId" = r."conversationId"
AND "userId" <> ALL('{1,4,6}'::int[])
);
For ease of use, wrap this in a function or a prepared statement https://www.postgresql.org/docs/current/sql-prepare.html. Like:
PREPARE conversations(int[]) AS
WITH RECURSIVE rcte AS (
SELECT "conversationId", 1 AS idx
FROM "conversationUsers"
WHERE "userId" = $1[1]
UNION ALL
SELECT c."conversationId", r.idx + 1
FROM rcte r
JOIN "conversationUsers" c USING ("conversationId")
WHERE c."userId" = $1[idx + 1]
)
SELECT "conversationId"
FROM rcte r
WHERE idx = array_length($1, 1)
AND NOT EXISTS (
SELECT FROM "conversationUsers"
WHERE "conversationId" = r."conversationId"
AND "userId" <> ALL($1)
);
Call:
EXECUTE conversations('{1,4,6}');
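The function variant mentioned above might look like this sketch. The function name f_conversations, the parameter name _user_ids, and the int return type are assumptions for illustration.

```sql
-- Hypothetical SQL function wrapping the recursive CTE.
CREATE OR REPLACE FUNCTION f_conversations(_user_ids int[])
  RETURNS SETOF int
  LANGUAGE sql STABLE AS
$func$
WITH RECURSIVE rcte AS (
   SELECT "conversationId", 1 AS idx
   FROM   "conversationUsers"
   WHERE  "userId" = _user_ids[1]                -- start with the first user
   UNION ALL
   SELECT c."conversationId", r.idx + 1
   FROM   rcte r
   JOIN   "conversationUsers" c USING ("conversationId")
   WHERE  c."userId" = _user_ids[r.idx + 1]      -- step to the next user
   )
SELECT "conversationId"
FROM   rcte r
WHERE  idx = array_length(_user_ids, 1)          -- all given users matched
AND    NOT EXISTS (                              -- and no additional users
   SELECT FROM "conversationUsers"
   WHERE  "conversationId" = r."conversationId"
   AND    "userId" <> ALL (_user_ids)
   );
$func$;

-- Call:
SELECT * FROM f_conversations('{1,4,6}');
```

Unlike a prepared statement, which lives only for the current session, a function is stored in the database and visible to all sessions.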
db<>fiddle (also demonstrating a function)
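There is still room for improvement: to get top performance, put the users with the fewest conversations first in your input array, so that as many rows as possible are eliminated early. For the best possible performance, you can dynamically generate a non-dynamic, non-recursive query (using one of the fast techniques linked above) and execute that in turn. You could even wrap it in a single plpgsql function with dynamic SQL ...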
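One way to get that ordering, sketched here as an assumption rather than a tuned solution, is to sort the input array by each user's conversation count before passing it on. The alias u and the output name user_ids_sorted are illustrative. The counting step has its own cost, so it only pays off when it saves more work in the main query.

```sql
-- Reorder the input array: users with the fewest conversations first.
SELECT ARRAY(
   SELECT u."userId"
   FROM   unnest('{1,4,6}'::int[]) AS u("userId")
   LEFT   JOIN "conversationUsers" c ON c."userId" = u."userId"
   GROUP  BY u."userId"
   ORDER  BY count(c."conversationId")           -- fewest conversations first
           , u."userId"                          -- deterministic tie-breaker
   ) AS user_ids_sorted;
```

A user without any conversations sorts first, in which case the recursive query finds no rows in its first step and returns nothing.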
More explanation:
- Using same column multiple times in WHERE clause https://stackoverflow.com/questions/47351766/using-same-column-multiple-times-in-where-clause/47512013#47512013
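Alternative: MV for sparsely written table
If the table "conversationUsers" is mostly read-only (old conversations are unlikely to change), you might use a MATERIALIZED VIEW https://www.postgresql.org/docs/current/sql-creatematerializedview.html with pre-aggregated users in sorted arrays and create a plain btree index on that array column.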
CREATE MATERIALIZED VIEW mv_conversation_users AS
SELECT "conversationId", array_agg("userId") AS users -- sorted array
FROM (
SELECT "conversationId", "userId"
FROM "conversationUsers"
ORDER BY 1, 2
) sub
GROUP BY 1
ORDER BY 1;
CREATE INDEX ON mv_conversation_users (users) INCLUDE ("conversationId");
The demonstrated covering index requires Postgres 11. See:
- https://dba.stackexchange.com/a/207938/3684
About sorting rows in a subquery:
- How to apply ORDER BY and LIMIT in combination with an aggregate function? https://dba.stackexchange.com/a/213724/3684
In older versions, use a plain multicolumn index on (users, "conversationId"). With very long arrays, a hash index might make sense in Postgres 10 or later.
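As a sketch, the two index variants just mentioned could be created like this. Pick one based on your Postgres version and array lengths; the index names are left to the default naming.

```sql
-- Older versions without INCLUDE: plain multicolumn btree index.
CREATE INDEX ON mv_conversation_users (users, "conversationId");

-- Possible alternative for very long arrays: hash index on the array column.
-- It supports only equality checks, which is all the lookup by sorted array needs.
CREATE INDEX ON mv_conversation_users USING hash (users);
```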
Then the much faster query would simply be:
SELECT "conversationId"
FROM mv_conversation_users c
WHERE users = '{1,4,6}'::int[]; -- sorted array!
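Since the comparison is plain array equality, the input array has to be sorted the same way as the stored arrays, and the MV has to be refreshed after changes to the underlying table. A minimal sketch of both steps, assuming the input may arrive unsorted:

```sql
-- Bring the materialized view up to date after writes to "conversationUsers".
REFRESH MATERIALIZED VIEW mv_conversation_users;

-- Sort an unsorted input array before comparing it to the stored arrays.
SELECT "conversationId"
FROM   mv_conversation_users
WHERE  users = ARRAY(SELECT unnest('{6,1,4}'::int[]) ORDER BY 1);
```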
db<>fiddle
You have to weigh the added costs for storage, writes and maintenance against the benefits to read performance.
Aside: consider legal identifiers without double quotes: conversation_id instead of "conversationId" etc.:
- Are PostgreSQL column names case-sensitive? https://stackoverflow.com/questions/20878932/are-postgresql-column-names-case-sensitive/20880247#20880247
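For illustration, the same table with legal, lower-case identifiers could look like this sketch. The names conversation_users, user_id and conversation_id are hypothetical replacements, and the column types are assumptions.

```sql
-- Hypothetical renamed table: no double quotes needed anywhere.
CREATE TABLE conversation_users (
   user_id         int NOT NULL
 , conversation_id int NOT NULL
 , PRIMARY KEY (user_id, conversation_id)
);

-- Unquoted identifiers are folded to lower case, so this works as written.
SELECT conversation_id
FROM   conversation_users
WHERE  user_id = 1;
```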