【简介】
索引是常见的查询性能优化手段,但索引本身也需要占用一定的磁盘空间,同时也会降低数据入库的性能。通过学习和了解GaussDB(DWS)的btree索引和cbtree的机制,本文提供一种依靠索引元数据来识别表上索引冗余的手段
【方案】
注: 不支持表达式索引, 不区分partial index和普通的index
-- duplicate:重复索引
-- 解释:索引定义重复
-- 建议:删除重复索引
-- redundancy:冗余索引
-- 解释:如果索引A的索引列刚好是索引B的索引列的前面一部分,那么索引A就可以被认为是冗余索引
-- 建议:删除冗余索引
-- optimizable:可优化索引
-- 解释:如果索引A和B的索引列一致,只是索引列的顺序有差异,那么索引A或者B是可优化的
-- 建议:根据业务优化索引的列字段,通常会根据重复程度,删除索引的后N个索引列
WITH info AS
(
SELECT
n.nspname AS schemaname,
c.relname AS tablename,
x.indrelid AS indrelid,
x.indexrelid AS indexrelid,
indnatts,
indkey,
indexprs
FROM pg_index x
JOIN pg_class c ON c.oid = x.indrelid
JOIN pg_class i ON i.oid = x.indexrelid
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r' AND c.oid >= 16384 AND (c.reloptions IS NULL OR c.reloptions::text NOT LIKE '%internal_mask%')
AND i.relkind = 'i' AND i.oid >= 16384
AND x.indpred IS NULL
)
SELECT
i.schemaname,
i.tablename,
i.indexrelid::regclass::text AS baseidx,
substring(pg_get_indexdef(i.indexrelid) from 'USING .+\)') AS baseidxdef,
x.indexrelid::regclass::text AS optidx,
substring(pg_get_indexdef(x.indexrelid) from 'USING .+\)') AS optidxdef,
CASE WHEN i.indkey = x.indkey AND pg_get_expr(i.indexprs, i.indrelid) = pg_get_expr(x.indexprs, x.indrelid) THEN 'duplicate'::text
WHEN x.indexprs IS NULL AND strpos(i.indkey::text||' ', x.indkey::text) = 1 THEN 'redundancy'::text
WHEN x.indexprs IS NULL AND i.indkey @> x.indkey AND x.indkey @> i.indkey THEN 'optimizable'::text
ELSE NULL
END AS optpolicy
FROM info i
INNER JOIN pg_index x ON (i.indrelid = x.indrelid AND i.indexrelid > x.indexrelid)
WHERE x.indpred IS NULL AND optpolicy IS NOT NULL
ORDER BY 1, 2, 3
;