@GreenReaper opened this Issue on March 17th 2015

As noted in #7440, commit 0abac3b312ff9560fb43012192e37c2369bab3ed introduced a UNION into findVisitor() within the visit code (now in core/Tracker/Model.php), which results in the creation of a temporary table on each call ("UNION queries use temporary tables.").

Separately, the idvisitor query normally takes 1-2ms and the config_id query takes around 0.5ms. The combined query is normally around 5ms. However, the number of rows examined may reach the hundreds, and it then starts taking significantly longer - up to 200ms on a dedicated machine.

The impact of these queries vary depending on their size and may not be visible on test or low-visit production websites.

This is what happened when I switched it to checking just the idvisitor part (see last hour):
mysql_tmp_tables-day-idvisitor
[The other temporary tables which are disk tables appeared more recently]
mysql_select_types-day

When I switched the code around to just use just the select on config_id, the decrease in activity remained. This suggests to me that it is the UNION of the two selects that is causing objects and scans.

An example of a slow query from the log (there are several 100ms+ ones per minute):

# Query_time: 0.203814  Lock_time: 0.048132 Rows_sent: 1  Rows_examined: 116
SET timestamp=1426562315;
( SELECT visit_last_action_time, visit_first_action_time, idvisitor, idvisit, user_id, visit_exit_idaction_url, visit_exit_idaction_name, visitor_returning, visitor_days_since_first, visitor_days_since_order, visitor_count_visits, visit_goal_buyer, location_country, location_region, location_city, location_latitude, location_longitude, referer_name, referer_keyword, referer_type, location_ip, location_browser_lang, visit_total_events, visit_total_time, visit_total_actions, visit_total_searches   ,
                0 as priority
                FROM piwik_log_visit
                WHERE visit_last_action_time >= '2015-03-17 02:48:34' AND visit_last_action_time <= '2015-03-17 03:48:34' AND idsite = '1'  AND config_id = '^Z?^\? D?' AND user_id IS NULL
                ORDER BY visit_last_action_time DESC
                LIMIT 1
             )
                UNION
                ( SELECT visit_last_action_time, visit_first_action_time, idvisitor, idvisit, user_id, visit_exit_idaction_url, visit_exit_idaction_name, visitor_returning, visitor_days_since_first, visitor_days_since_order, visitor_count_visits, visit_goal_buyer, location_country, location_region, location_city, location_latitude, location_longitude, referer_name, referer_keyword, referer_type, location_ip, location_browser_lang, visit_total_events, visit_total_time, visit_total_actions, visit_total_searches   ,
                1 as priority
                FROM piwik_log_visit
                WHERE visit_last_action_time >= '2015-03-17 02:48:34' AND visit_last_action_time <= '2015-03-17 03:48:34' AND idsite = '1'  AND idvisitor = '^U\"?^^0^M?U'
                ORDER BY visit_last_action_time DESC
                LIMIT 1
             )
                ORDER BY priority DESC
                LIMIT 1;

An EXPLAIN of the query (the last entry involves the temporary table created by the UNION):

| id | select_type  | table           | type  | possible_keys                                                             | key                          | key_len | ref         | rows | Extra                       |
+----+--------------+-----------------+-------+---------------------------------------------------------------------------+------------------------------+---------+-------------+------+-----------------------------+
|  1 | PRIMARY      | piwik_log_visit | range | index_idsite_config_datetime,index_idsite_datetime,index_idsite_idvisitor | index_idsite_config_datetime | 20      | NULL        |    1 | Using where                 |
|  2 | UNION        | piwik_log_visit | ref   | index_idsite_config_datetime,index_idsite_datetime,index_idsite_idvisitor | index_idsite_idvisitor       | 12      | const,const |    1 | Using where; Using filesort |
| NULL | UNION RESULT | <union1,2>      | ALL   | NULL                                                                      | NULL                         | NULL    | NULL        | NULL | Using filesort              |

The UNION is intended to check visits against both idvisitor and config_id and return the first if it is present, other than the second. I propose that two queries be performed - check idvisitor first, and if that does not return a result, do the config_id query.

@GreenReaper commented on March 17th 2015

This seems to work, although it should be checked thoroughly:

        // We do each query in turn so that each sql query uses its own INDEX
        // We don't use UNION because it would create a temporary table
        else {

            // will use INDEX index_idsite_idvisitor (idsite, idvisitor)
            $where     = ' AND idvisitor = ?';
            $bindSql[] = $idVisitor;
            $sqlVisitorId = "$select
                $from
                WHERE $whereCommon $where
                ORDER BY visit_last_action_time DESC
                LIMIT 1
            ";
            $visitRow = $this->getDb()->fetch($sqlVisitorId, $bindSql);

            if (!$visitRow) {

                // will use INDEX index_idsite_config_datetime (idsite, config_id, visit_last_action_time)
                array_pop($bindSql);
                $where       = ' AND config_id = ? AND user_id IS NULL ';
                $bindSql[]   = $configId;
                $sqlConfigId = "$select
                    $from
                    WHERE $whereCommon $where
                    ORDER BY visit_last_action_time DESC
                    LIMIT 1
                ";
                $visitRow = $this->getDb()->fetch($sqlConfigId, $bindSql);
            }
        }

        return $visitRow;
@GreenReaper commented on March 17th 2015

Eliminating the temporary tables is nice but doesn't completely eliminated the database time used. Separating these two queries has, however, helped identify where rows originate from - the idvisitor subquery. Here's a 34ms outlier example (from a 5.7Gb database, with MySQL 5.5.41 - the database is dedicated to piwik but is under memory pressure from other systems on the same machine):

# User<a class='mention' href='https://github.com/Host'>@Host</a>: root[root] @ localhost [127.0.0.1]
# Query_time: 0.033887  Lock_time: 0.000255 Rows_sent: 0  Rows_examined: 442
SET timestamp=1426591134;
SELECT visit_last_action_time, visit_first_action_time, idvisitor, idvisit, user_id, visit_exit_idaction_url, visit_exit_idaction_name, visitor_returning, visitor_days_since_first, visitor_days_since_order, visitor_count_visits, visit_goal_buyer, location_country, location_region, location_city, location_latitude, location_longitude, referer_name, referer_keyword, referer_type, location_ip, location_browser_lang, visit_total_events, visit_total_time, visit_total_actions, visit_total_searches
                FROM piwik_log_visit
               WHERE visit_last_action_time >= '2015-03-17 10:48:54' AND visit_last_action_time <= '2015-03-17 11:48:54' AND idsite = '1'  AND idvisitor = '�Nʈ9lGx'
                ORDER BY visit_last_action_time DESC
                LIMIT 1;
# User<a class='mention' href='https://github.com/Host'>@Host</a>: root[root] @ localhost [127.0.0.1]
# Query_time: 0.001082  Lock_time: 0.000136 Rows_sent: 0  Rows_examined: 0
SET timestamp=1426591134;
SELECT visit_last_action_time, visit_first_action_time, idvisitor, idvisit, user_id, visit_exit_idaction_url, visit_exit_idaction_name, visitor_returning, visitor_days_since_first, visitor_days_since_order, visitor_count_visits, visit_goal_buyer, location_country, location_region, location_city, location_latitude, location_longitude, referer_name, referer_keyword, referer_type, location_ip, location_browser_lang, visit_total_events, visit_total_time, visit_total_actions, visit_total_searches
                  FROM piwik_log_visit
                  WHERE visit_last_action_time >= '2015-03-17 10:48:54' AND visit_last_action_time <= '2015-03-17 11:48:54' AND idsite = '1'  AND config_id = '1����Ko�' AND user_id IS NULL
                  ORDER BY visit_last_action_time DESC
                  LIMIT 1;

I should emphasize that most of the time we're still talking a 1-3ms query (which is maybe 0.5-1ms less without the config_id part). There are regular outliers (most likely because the database is not completely in memory), and when you're getting five to ten hits a second they add up.

@mattab commented on May 1st 2015 Owner

Hi @GreenReaper thanks for the report. We've opened a new issue as forgot there was this one: Tracker: Faster visitor recognition #7779

This Issue was closed on May 1st 2015
Powered by GitHub Issue Mirror