我有一个非常慢(通常接近 60 秒)的 MySQL 查询,它试图找到用户对一项民意调查的投票方式与他们对所有先前民意调查的投票方式之间的相关性。
基本上,我们收集在给定民意调查中投票给某一特定选项的每个人的用户 ID。
然后,我们查看该小组如何在之前的每次民意调查中投票,并将这些结果与每个人(不仅仅是该小组)对该民意调查的投票方式进行比较。子组结果与总结果之间的差异就是偏差,该查询按偏差排序以确定最强的相关性。
查询有点混乱:
(SELECT p_id as poll_id, o_id AS option_id, description, optCount AS option_count, subgroup_percent, total_percent, ABS(total_percent - subgroup_percent) AS deviation
FROM(
SELECT poll_id AS p_id,
option_id AS o_id,
(SELECT description FROM `option` WHERE id = o_id) AS description,
COUNT(*) AS optCount,
(SELECT COUNT(*) FROM response INNER JOIN user_ids_122 ON response.user_id = user_ids_122.user_id WHERE option_id = o_id ) /
(SELECT COUNT(*) FROM response INNER JOIN user_ids_122 ON response.user_id = user_ids_122.user_id WHERE poll_id = p_id) AS subgroup_percent,
(SELECT COUNT(*) FROM response WHERE option_id = o_id) /
(SELECT COUNT(*) FROM response WHERE poll_id = p_id) AS total_percent
FROM response
INNER JOIN user_ids_122 ON response.user_id = user_ids_122.user_id
WHERE poll_id < '61'
GROUP BY option_id DESC
) AS derived_table_122
)
ORDER BY deviation DESC, option_count DESC
请注意,user_ids_122 是之前创建的临时表,其中包含投票给选项 ID 122 的所有用户的 ID。
“响应”表大约有 65,000 行,“用户”表大约有 7,000 行,“选项”表大约有 130 行。
UPDATE:
这是解释表...
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 121 Using filesort
2 DERIVED user_ids_122 ALL NULL NULL NULL NULL 74 Using temporary; Using filesort
2 DERIVED response ref poll_id,user_id user_id 4 correlated.user_ids_122.user_id 780 Using where
7 DEPENDENT SUBQUERY response ref poll_id poll_id 4 func 7800 Using index
6 DEPENDENT SUBQUERY response ref option_id option_id 4 func 7800 Using index
5 DEPENDENT SUBQUERY user_ids_122 ALL NULL NULL NULL NULL 74
5 DEPENDENT SUBQUERY response ref poll_id,user_id poll_id 4 func 7800 Using where
4 DEPENDENT SUBQUERY user_ids_122 ALL NULL NULL NULL NULL 74
4 DEPENDENT SUBQUERY response ref user_id,option_id user_id 4 correlated.user_ids_122.user_id 780 Using where
3 DEPENDENT SUBQUERY option eq_ref PRIMARY PRIMARY 4 func 1
更新2:
“响应”表中的每一行如下所示:
id (INT) poll_id (INT) user_id (INT) option_id (INT) created (DATETIME)
7 7 1 14 2011-03-17 09:25:10
“选项”表中的每一行如下所示:
id (INT) poll_id (INT) text (TEXT) description (TEXT)
14 7 No people who dislike country music
“user”表中的每一行如下所示:
id (INT) email (TEXT) created (DATETIME)
1 [email protected] /cdn-cgi/l/email-protection 2011-02-15 11:16:03