我的桌子lead
有一个索引:
\d lead
...
Indexes:
"lead_pkey" PRIMARY KEY, btree (id)
"lead_account__c" btree (account__c)
...
"lead_email" btree (email)
"lead_id_prefix" btree (id text_pattern_ops)
为什么 PG (9.1) 不使用索引来进行这种简单的相等选择?电子邮件几乎都是独一无二的......
db=> explain select * from lead where email = 'blah';
QUERY PLAN
------------------------------------------------------------
Seq Scan on lead (cost=0.00..319599.38 rows=1 width=5108)
Filter: (email = 'blah'::text)
(2 rows)
其他索引命中查询似乎没问题(尽管我不知道为什么这个查询不只使用 pkey 索引):
db=> explain select * from lead where id = '';
QUERY PLAN
------------------------------------------------------------------------------
Index Scan using lead_id_prefix on lead (cost=0.00..8.57 rows=1 width=5108)
Index Cond: (id = ''::text)
(2 rows)
db=> explain select * from lead where account__c = '';
QUERY PLAN
----------------------------------------------------------------------------------
Index Scan using lead_account__c on lead (cost=0.00..201.05 rows=49 width=5108)
Index Cond: (account__c = ''::text)
(2 rows)
起初我认为这可能是由于没有足够明确的价值观email
。例如,如果统计数据表明email
is blah
对于表的大部分内容,顺序扫描速度更快。但事实并非如此:
db=> select count(*), count(distinct email) from lead;
count | count
--------+--------
749148 | 733416
(1 row)
即使我强制关闭 seq 扫描,规划器也会表现得好像没有其他选择:
db=> set enable_seqscan = off;
SET
db=> show enable_seqscan;
enable_seqscan
----------------
off
(1 row)
db=> explain select * from lead where email = '[email protected] /cdn-cgi/l/email-protection';
QUERY PLAN
---------------------------------------------------------------------------
Seq Scan on lead (cost=10000000000.00..10000319599.38 rows=1 width=5108)
Filter: (email = '[email protected] /cdn-cgi/l/email-protection'::text)
(2 rows)
也尝试过EXPLAIN ANALYZE
:
db=> explain analyze select * from lead where email = '[email protected] /cdn-cgi/l/email-protection';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Seq Scan on lead (cost=10000000000.00..10000319732.76 rows=1 width=5102) (actual time=77845.244..77845.244 rows=0 loops=1)
Filter: (email = '[email protected] /cdn-cgi/l/email-protection'::text)
Total runtime: 77857.215 ms
(3 rows)
这里是\d
输出(抱歉,必须模糊列名,并进行裁剪以适应 SO 的限制;请参阅未裁剪的版本http://pastebin.com/ve3gzJpY http://pastebin.com/ve3gzJpY):
Table "lead"
Column | Type | Modifiers
--------------------------------------------+-----------------------------+-----------
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | real |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | boolean |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
email | text |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | boolean |
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
account__c | text |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
id | text | not null
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | real |
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | timestamp without time zone |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | real |
Indexes:
"lead_pkey" PRIMARY KEY, btree (id)
"lead_account__c" btree (account__c)
"lead_XXXXXXXXXXXXXXXXXXXXXX" btree (XXXXXXXXXXXXXXXXXXXXXX)
"lead_XXXXXXXXXXXXXXXXXXXXXX" btree (XXXXXXXXXXXXXXXXXXXXXX)
"lead_XXXXXXXXXXXXXXXXXXXXXX" btree (XXXXXXXXXXXXXXXXXXXXXX)
"lead_email" btree (email)
"lead_id_prefix" btree (id text_pattern_ops)
Here is pg_dump --schema-only -t lead
(再次看到未裁剪的http://pastebin.com/ve3gzJpY http://pastebin.com/ve3gzJpY,也具有唯一的列名称,以防有助于再现性):
--
-- PostgreSQL database dump
--
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET default_tablespace = '';
SET default_with_oids = false;
--
-- Name: lead; Type: TABLE; Schema: public; Owner: pod; Tablespace:
--
CREATE TABLE lead (
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX real,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX boolean,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX date,
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
account__c text,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
id text NOT NULL,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX real,
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX timestamp without time zone,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX real
);
ALTER TABLE lead OWNER TO pod;
--
-- Name: lead_pkey; Type: CONSTRAINT; Schema: public; Owner: pod; Tablespace:
--
ALTER TABLE ONLY lead
ADD CONSTRAINT lead_pkey PRIMARY KEY (id);
--
-- Name: lead_account__c; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_account__c ON lead USING btree (account__c);
--
-- Name: lead_XXXXXXXXXXXXXXXXXXXX; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_XXXXXXXXXXXXXXXXXXXX ON lead USING btree (XXXXXXXXXXXXXXXXXXXX);
--
-- Name: lead_XXXXXXXXXXXXXXXXXXXX; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_XXXXXXXXXXXXXXXXXXXX ON lead USING btree (XXXXXXXXXXXXXXXXXXXX);
--
-- Name: lead_XXXXXXXXXXXXXXXXXXXX; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_XXXXXXXXXXXXXXXXXXXX ON lead USING btree (XXXXXXXXXXXXXXXXXXXX);
--
-- Name: lead_email; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_email ON lead USING btree (email);
--
-- Name: lead_id_prefix; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_id_prefix ON lead USING btree (id text_pattern_ops);
--
-- PostgreSQL database dump complete
--
一些PG目录咒语:
db=> select * from pg_index where indexrelid = 'lead_email'::regclass;
indexrelid | indrelid | indnatts | indisunique | indisprimary | indisexclusion | indimmediate | indisclustered | indisvalid | indcheckxmin | indisready | indkey | indcollation | indclass | indoption | indexprs | indpred
------------+-----------+----------+-------------+--------------+----------------+--------------+----------------+------------+--------------+------------+--------+--------------+----------+-----------+----------+---------
215251995 | 101034456 | 1 | f | f | f | t | f | t | t | t | 101 | 100 | 10043 | 0 | ¤ | ¤
(1 row)
一些区域设置信息:
db=> show lc_collate;
lc_collate
-------------
en_US.UTF-8
(1 row)
db=> show lc_ctype;
lc_ctype
-------------
en_US.UTF-8
(1 row)
我搜索了很多过去的 SO 问题,但没有一个是关于像这样的简单相等查询的。