The optimization known as partition
pruning is based on a relatively simple concept which
can be described as “Do not scan partitions where there can
be no matching values”. Suppose a partitioned table
t1
is created by this statement:
CREATE TABLE t1 (
fname VARCHAR(50) NOT NULL,
lname VARCHAR(50) NOT NULL,
region_code TINYINT UNSIGNED NOT NULL,
dob DATE NOT NULL
)
PARTITION BY RANGE( region_code ) (
PARTITION p0 VALUES LESS THAN (64),
PARTITION p1 VALUES LESS THAN (128),
PARTITION p2 VALUES LESS THAN (192),
PARTITION p3 VALUES LESS THAN MAXVALUE
);
Suppose that you wish to obtain results from a
SELECT
statement such as this one:
SELECT fname, lname, region_code, dob
FROM t1
WHERE region_code > 125 AND region_code < 130;
It is easy to see that none of the rows which ought to be returned
are in either of the partitions p0
or
p3
; that is, we need search only in partitions
p1
and p2
to find matching
rows. By limiting the search, it is possible to expend much less
time and effort in finding matching rows than by scanning all
partitions in the table. This “cutting away” of
unneeded partitions is known as
pruning. When the optimizer
can make use of partition pruning in performing this query,
execution of the query can be an order of magnitude faster than
the same query against a nonpartitioned table containing the same
column definitions and data.
The optimizer can perform pruning whenever a
WHERE
condition can be reduced to either one of
the following two cases:
partition_column
=constant
partition_column
IN (constant1
,constant2
, ...,constantN
)
In the first case, the optimizer simply evaluates the partitioning
expression for the value given, determines which partition
contains that value, and scans only this partition. In many cases,
the equal sign can be replaced with another arithmetic comparison,
including <
, >
,
<=
, >=
, and
<>
. Some queries using
BETWEEN
in the WHERE
clause
can also take advantage of partition pruning. See the examples
later in this section.
In the second case, the optimizer evaluates the partitioning expression for each value in the list, creates a list of matching partitions, and then scans only the partitions in this partition list.
SELECT
,
DELETE
, and
UPDATE
statements support partition
pruning. An INSERT
statement also
accesses only one partition per inserted row; this is true even
for a table that is partitioned by HASH
or
KEY
although this is not currently shown in the
output of EXPLAIN
.
Pruning can also be applied to short ranges, which the optimizer
can convert into equivalent lists of values. For instance, in the
previous example, the WHERE
clause can be
converted to WHERE region_code IN (126, 127, 128,
129)
. Then the optimizer can determine that the first
two values in the list are found in partition
p1
, the remaining two values in partition
p2
, and that the other partitions contain no
relevant values and so do not need to be searched for matching
rows.
The optimizer can also perform pruning for
WHERE
conditions that involve comparisons of
the preceding types on multiple columns for tables that use
RANGE COLUMNS
or LIST
COLUMNS
partitioning.
This type of optimization can be applied whenever the partitioning
expression consists of an equality or a range which can be reduced
to a set of equalities, or when the partitioning expression
represents an increasing or decreasing relationship. Pruning can
also be applied for tables partitioned on a
DATE
or
DATETIME
column when the
partitioning expression uses the
YEAR()
or
TO_DAYS()
function. Pruning can
also be applied for such tables when the partitioning expression
uses the TO_SECONDS()
function.
Suppose that table t2
, partitioned on a
DATE
column, is created using the
statement shown here:
CREATE TABLE t2 (
fname VARCHAR(50) NOT NULL,
lname VARCHAR(50) NOT NULL,
region_code TINYINT UNSIGNED NOT NULL,
dob DATE NOT NULL
)
PARTITION BY RANGE( YEAR(dob) ) (
PARTITION d0 VALUES LESS THAN (1970),
PARTITION d1 VALUES LESS THAN (1975),
PARTITION d2 VALUES LESS THAN (1980),
PARTITION d3 VALUES LESS THAN (1985),
PARTITION d4 VALUES LESS THAN (1990),
PARTITION d5 VALUES LESS THAN (2000),
PARTITION d6 VALUES LESS THAN (2005),
PARTITION d7 VALUES LESS THAN MAXVALUE
);
The following statements using t2
can make of
use partition pruning:
SELECT * FROM t2 WHERE dob = '1982-06-23';
UPDATE t2 SET region_code = 8 WHERE dob BETWEEN '1991-02-15' AND '1997-04-25';
DELETE FROM t2 WHERE dob >= '1984-06-21' AND dob <= '1999-06-21'
In the case of the last statement, the optimizer can also act as follows:
Find the partition containing the low end of the range.
YEAR('1984-06-21')
yields the value1984
, which is found in partitiond3
.Find the partition containing the high end of the range.
YEAR('1999-06-21')
evaluates to1999
, which is found in partitiond5
.Scan only these two partitions and any partitions that may lie between them.
In this case, this means that only partitions
d3
,d4
, andd5
are scanned. The remaining partitions may be safely ignored (and are ignored).
Invalid DATE
and DATETIME
values referenced in the WHERE
condition of a
statement against a partitioned table are treated as
NULL
. This means that a query such as
SELECT * FROM
does not return any values (see Bug
#40972).
partitioned_table
WHERE
date_column
<
'2008-12-00'
So far, we have looked only at examples using
RANGE
partitioning, but pruning can be applied
with other partitioning types as well.
Consider a table that is partitioned by LIST
,
where the partitioning expression is increasing or decreasing,
such as the table t3
shown here. (In this
example, we assume for the sake of brevity that the
region_code
column is limited to values between
1 and 10 inclusive.)
CREATE TABLE t3 (
fname VARCHAR(50) NOT NULL,
lname VARCHAR(50) NOT NULL,
region_code TINYINT UNSIGNED NOT NULL,
dob DATE NOT NULL
)
PARTITION BY LIST(region_code) (
PARTITION r0 VALUES IN (1, 3),
PARTITION r1 VALUES IN (2, 5, 8),
PARTITION r2 VALUES IN (4, 9),
PARTITION r3 VALUES IN (6, 7, 10)
);
For a statement such as SELECT * FROM t3 WHERE
region_code BETWEEN 1 AND 3
, the optimizer determines in
which partitions the values 1, 2, and 3 are found
(r0
and r1
) and skips the
remaining ones (r2
and r3
).
For tables that are partitioned by HASH
or
[LINEAR] KEY
, partition pruning is also
possible in cases in which the WHERE
clause
uses a simple =
relation against a column used
in the partitioning expression. Consider a table created like
this:
CREATE TABLE t4 (
fname VARCHAR(50) NOT NULL,
lname VARCHAR(50) NOT NULL,
region_code TINYINT UNSIGNED NOT NULL,
dob DATE NOT NULL
)
PARTITION BY KEY(region_code)
PARTITIONS 8;
A statement that compares a column value with a constant can be pruned:
UPDATE t4 WHERE region_code = 7;
Pruning can also be employed for short ranges, because the
optimizer can turn such conditions into IN
relations. For example, using the same table t4
as defined previously, queries such as these can be pruned:
SELECT * FROM t4 WHERE region_code > 2 AND region_code < 6;
SELECT * FROM t4 WHERE region_code BETWEEN 3 AND 5;
In both these cases, the WHERE
clause is
transformed by the optimizer into WHERE region_code IN
(3, 4, 5)
.
This optimization is used only if the range size is smaller than the number of partitions. Consider this statement:
DELETE FROM t4 WHERE region_code BETWEEN 4 AND 12;
The range in the WHERE
clause covers 9 values
(4, 5, 6, 7, 8, 9, 10, 11, 12), but t4
has
only 8 partitions. This means that the DELETE
cannot be pruned.
When a table is partitioned by HASH
or
[LINEAR] KEY
, pruning can be used only on
integer columns. For example, this statement cannot use pruning
because dob
is a
DATE
column:
SELECT * FROM t4 WHERE dob >= '2001-04-14' AND dob <= '2005-10-15';
However, if the table stores year values in an
INT
column, then a query having
WHERE year_col >= 2001 AND year_col <=
2005
can be pruned.
Tables using a storage engine that provides automatic
partitioning, such as the NDB
storage engine
used by MySQL Cluster can be pruned if they are explicitly
partitioned.