Was ist der Leistungsunterschied bei Implementierungen der relationalen MySQL-Division (IN AND statt IN OR)?

Ich habe einige Verbesserungen im JOIN vorgenommen Ausführung; siehe unten.

Ich stimme für den JOIN-Ansatz für Geschwindigkeit. So habe ich es bestimmt:

HABEN, Version 1

mysql> FLUSH STATUS;
mysql> SELECT city
    ->     FROM us_vch200
    ->     WHERE state IN ('IL', 'MO', 'PA')
    ->     GROUP BY city
    ->     HAVING count(DISTINCT state) >= 3;
+-------------+
| city        |
+-------------+
| Springfield |
| Washington  |
+-------------+
mysql> SHOW SESSION STATUS LIKE 'Handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_external_lock      | 2     |
| Handler_read_first         | 1     |
| Handler_read_key           | 2     |
| Handler_read_last          | 1     |
| Handler_read_next          | 4175  | -- full index scan

(etc)

+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+--------------------------------------------------+
| id | select_type | table     | type  | possible_keys         | key        | key_len | ref  | rows | Extra                                            |
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+--------------------------------------------------+
|  1 | SIMPLE      | us_vch200 | range | state_city,city_state | city_state | 769     | NULL | 4176 | Using where; Using index for group-by (scanning) |
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+--------------------------------------------------+

Das 'Extra' weist darauf hin, dass es sich entschieden hat, GROUP BY in Angriff zu nehmen und verwenden Sie INDEX(city, state) obwohl INDEX(state, city) könnte sinnvoll sein.

HABEN, Version 2

Umschalten auf INDEX(state, city) ergibt:

mysql> FLUSH STATUS;
mysql> SELECT city
    ->     FROM us_vch200  IGNORE INDEX(city_state)
    ->     WHERE state IN ('IL', 'MO', 'PA')
    ->     GROUP BY city
    ->     HAVING count(DISTINCT state) >= 3;
+-------------+
| city        |
+-------------+
| Springfield |
| Washington  |
+-------------+
mysql> SHOW SESSION STATUS LIKE 'Handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_external_lock      | 2     |
| Handler_read_key           | 401   |
| Handler_read_next          | 398   |
| Handler_read_rnd           | 398   |
(etc)

+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+------------------------------------------+
| id | select_type | table     | type  | possible_keys         | key        | key_len | ref  | rows | Extra                                    |
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+------------------------------------------+
|  1 | SIMPLE      | us_vch200 | range | state_city,city_state | state_city | 2       | NULL |  397 | Using where; Using index; Using filesort |
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+------------------------------------------+

MITGLIED

mysql> SELECT x.city
    -> FROM us_vch200 x
    -> JOIN us_vch200 y ON y.city= x.city AND y.state = 'MO'
    -> JOIN us_vch200 z ON z.city= x.city AND z.state = 'PA'
    -> WHERE                                  x.state = 'IL';
+-------------+
| city        |
+-------------+
| Springfield |
| Washington  |
+-------------+
2 rows in set (0.00 sec)

mysql> SHOW SESSION STATUS LIKE 'Handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_external_lock      | 6     |
| Handler_read_key           | 86    |
| Handler_read_next          | 87    |
(etc)    
+----+-------------+-------+------+-----------------------+------------+---------+--------------------+------+--------------------------+
| id | select_type | table | type | possible_keys         | key        | key_len | ref                | rows | Extra                    |
+----+-------------+-------+------+-----------------------+------------+---------+--------------------+------+--------------------------+
|  1 | SIMPLE      | y     | ref  | state_city,city_state | state_city | 2       | const              |   81 | Using where; Using index |
|  1 | SIMPLE      | z     | ref  | state_city,city_state | state_city | 769     | const,world.y.city |    1 | Using where; Using index |
|  1 | SIMPLE      | x     | ref  | state_city,city_state | state_city | 769     | const,world.y.city |    1 | Using where; Using index |
+----+-------------+-------+------+-----------------------+------------+---------+--------------------+------+--------------------------+

Nur INDEX(state, city) wird gebraucht. Die Handler-Nummern sind für diese Formulierung am kleinsten, daher folgere ich daraus, dass sie am schnellsten ist.

Beachten Sie, wie der Optimierer selbst entschieden hat, mit welcher Tabelle er beginnen soll, wahrscheinlich aufgrund von

+-------+----------+
| state | COUNT(*) |
+-------+----------+
| IL    |      221 |
| MO    |       81 |  -- smallest
| PA    |       96 |
+-------+----------+

Schlussfolgerungen

JOIN (ohne das unnötige t Tabelle) ist wahrscheinlich die schnellste. Außerdem wird dieser zusammengesetzte Index benötigt:INDEX(state, city) .

So übersetzen Sie zurück zu Ihrem Anwendungsfall:

city --> documentid
state --> termid

Vorbehalt:YMMV, da die Verteilung der Werte für documentid und termid ganz anders sein könnte als in dem von mir verwendeten Testfall.