@mattab opened this issue on June 13th 2015

The goal of this issue is to solve a performance issue we experienced on a production Piwik instance measuring thousands of websites. This slow query is reproduced when using a segment that matches to no action eg. pageUrl@=this-wont-match-anthing or &segment=pageTitle==Větrnásssssss with an unknown (not found) url, then the sql query piwik generates is not optimal and mysql spends literally hours running them.

Query stats on the prod instance, via mysqldumpslow:

Count: 176  Time=2321.87s (408648s)  Lock=0.00s (0s)  Rows=1.0 (176), piwik[piwik]@localhost
  SELECT
  count(distinct log_inner.idvisitor) AS `N`, 
  count(*) AS `N`, 
  sum(log_inner.visit_total_actions) AS `N`, 
  max(log_inner.visit_total_actions) AS `N`, 
  sum(log_inner.visit_total_time) AS `N`, 
  sum(case log_inner.visit_total_actions when N then N when N then N else N end) AS `N`, 
  sum(case log_inner.visit_goal_converted when N then N else N end) AS `N`, 
  count(distinct log_inner.user_id) AS `N`
  FROM
  (
  SELECT
  log_visit.idvisitor, 
  log_visit.visit_total_actions, 
  log_visit.visit_total_time, 
  log_visit.visit_goal_converted, 
  log_visit.user_id
  FROM
  piwik_log_visit AS log_visit
  LEFT JOIN piwik_log_link_visit_action AS log_link_visit_action ON log_link_visit_action.idvisit = log_visit.idvisit
  WHERE
  ( log_visit.visit_last_action_time >= 'S'
  AND log_visit.visit_last_action_time <= 'S'
  AND log_visit.idsite IN ('S') )
  AND
  ( ( log_visit.visitor_count_visits IS NULL OR log_visit.visitor_count_visits = 'S'  ) AND log_link_visit_action.idaction_url = 'S' )
  GROUP BY
  log_visit.idvisit
  ORDER BY
  NULL
  ) AS log_inner

Another more extreme example of how the SQL query segment part can turn out:

                AND
                ( (log_visit.visitor_returning = '1' OR log_visit.visitor_returning = '2') AND (log_link_visit_action.idaction_url = '-100' OR 
log_link_visit_action.idaction_url = '-100' OR
log_link_visit_action.idaction_url = '-100' OR
log_link_visit_action.idaction_url = '-100' OR
log_link_visit_action.idaction_url = '-100' OR 
log_link_visit_action.idaction_url = '-100' OR
log_link_visit_action.idaction_url = '-100' OR 
log_link_visit_action.idaction_url = '-100' OR
log_link_visit_action.idaction_url = '-100' OR
log_link_visit_action.idaction_url = '-100' OR
log_link_visit_action.idaction_url = '-100' OR 
log_link_visit_action.idaction_url = '-100' OR
log_link_visit_action.idaction_url = '-100' 
[...]
 ))

The code responsible for this -100 is here: https://github.com/piwik/piwik/blob/2.13.1/core/Tracker/TableLogAction.php#L180-185

Solution: in the Segment SQL creation code, when idaction is -100 (not found), we don't add the SQL OR condition ie OR log_link_visit_action.idaction_url = '-100' since it won't match anything anyway. When there is a AND condition, and idaction was -100 (not found), the logical statement will be false. Didn't look how hard it will be to change...

@quba commented on June 15th 2015

:+1: to fix this asap

This issue was closed on June 18th 2015
Powered by GitHub Issue Mirror