@sdosg opened this Issue on March 16th 2015

After upgrading to 2.12.b5 i have still big performance issues.

2.9.2: 740 req/s
2.11.2: 370 req/s
2.12.b5: 400 req/s

I can still see high cpu consumption by the php fcgi processes (~ 20-25% cpu p. process under load, with 2.9.2 it was ~10-15%).

see Issue 7416

@mattab commented on March 16th 2015 Owner

Hi there,

The queries you pasted in this comment: https://github.com/piwik/piwik/issues/7416#issuecomment-81170546 - the SQL queries haven't changed recently (in months or years) so these should not be the cause of slowdown... unless of course it's possible that your mysql has reached some limits and maybe it starts swapping or other things. Do you have more information on what is slow? Can you maybe enable mysql slow queries?

@mattab commented on March 16th 2015 Owner

Update to my previous comment: actually one of the SQL query has changed (in https://github.com/piwik/piwik/issues/6436) - as @diosmosis suggested, @sdosg could you please try this query:

SELECT idaction as idaction, type, name FROM piwik_log_action WHERE ( hash = CRC32('Gallery | Inkbunny, the Furry Art Community') AND name = 'Gallery | Inkbunny, the Furry Art Community' AND type = '4' )  OR ( hash = CRC32('inkbunny.net/submissionsviewall.php?mode=search&orderby=&text=&stringtype=&keywords=no&title=no&description=no&artist=&favsby=&type=4&sale=&days=&keyword_id=&user_id=&random=&md5=') AND name = 'inkbunny.net/submissionsviewall.php?mode=search&orderby=&text=&stringtype=&keywords=no&title=no&description=no&artist=&favsby=&type=4&sale=&days=&keyword_id=&user_id=&random=&md5=' AND type = '1' )  

How many rows are returned? (could you send us the full return rows set of this query?)

@GreenReaper commented on March 17th 2015

Hi @mattab and @diosmosis - I think that query (from #7416) is for me, not sdosg, and the issue for me with it seems to be the GROUP BY. When I use EXPLAIN with the GROUP BY query it shows " Using where; Using temporary; Using filesort" as opposed to just "Using where".

With the query above, I get one row:

| idaction | type | name                
|        1 |    1 | inkbunny.net/submissionsviewall.php?mode=search&orderby=&text=&stringtype=&keywords=no&title=no&description=no&artist=&favsby=&type=4&sale=&days=&keyword_id=&user_id=&random=&md5=

With the original query involving the GROUP BY I get two rows:

| idaction | type | name 
|     3665 |    1 | inkbunny.net/submissionsviewall.php?mode=search&orderby=&text=&stringtype=&keywords=no&title=no&description=no&artist=&favsby=&type=4&sale=&days=&keyword_id=&user_id=&random=&md5= |
|        1 |    4 | Gallery | Inkbunny, the Furry Art Community
@GreenReaper commented on March 17th 2015

Per MySQL documentation:

"By default, MySQL sorts all GROUP BY col1, col2, ... queries as if you specified ORDER BY col1, col2, ... in the query as well. If you include an explicit ORDER BY clause that contains the same column list, MySQL optimizes it away without any speed penalty, although the sorting still occurs. If a query includes GROUP BY but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying ORDER BY NULL."

Perhaps that is what is needed here? I don't know if you need them in a specific order. Note that using GROUP BY order for ORDER BY is also deprecated (because it can be a performance issue), so if you do need it, it should be specified.

If I add ORDER BY NULL after the GROUP BY clause, I get "using temporary" but not "using filesort" in explain.

@GreenReaper commented on March 17th 2015

I tried adding ORDER BY NULL after the GROUP BY clause in:
public function getIdsAction($actionsNameAndType)
of core/Tracker/Model.php

For me, this does not have a significant impact the temporary table statistic (on disk or otherwise). I guess "filesort" doesn't actually mean "using files to do it". I suspect the tables come from other queries.

Using ORDER BY NULL to avoid the sort might still be a worthwhile optimization, of course - assuming you don't actually rely on the order anywhere - and it might be worth @sdosg trying it.

@diosmosis commented on March 17th 2015 Member

Did you remove the MIN(idaction) and replace w/ idaction in the first query you executed?

@mattab commented on March 17th 2015 Owner

(my bad, I now removed the MIN() in the query above - can you try again?)

@GreenReaper commented on March 17th 2015

I tried it and it made no difference for me, at least in the number of temporary tables; maybe it would improve things for the original poster, or be a general improvement, though?

To be clear: I'm not facing the specific performance issues which sdosg reports; I just noticed temp disk tables where before there were none and I'm trying to eliminate them. Our two issues may not be related. (I should probably not have hopped onto his bug last time, sorry!) Unfortunately I'm not on Solaris/OS X or able to use MySQL 5.6 easily on Debian, or it might be easier to trace these tables down.

@diosmosis commented on March 17th 2015 Member

I am looking for the number results so we can see if MySQL is traversing over a lot of rows, thus needing disk space. Can you run the query and post the number of rows returned? The second query was introduced due to a race condition that created duplicate actions. If there are many duplicate actions, the query would have to do more work. This seems unlikely, but it is something to check.

@GreenReaper commented on March 17th 2015

I decided just to log everything for a few seconds by setting long_query_time = 0 and see what took time. It does not appear to be the query above which selects idaction. These examined 2, 4 or 0 rows.

Instead, the queries that were taking taking 0.1 or 0.2 seconds looked similar to the following. Many were examining 300 or 400 rows. They typically not take more than 0.005 seconds until they started examining 100-150 rows, although there were one or two exceptions. There were a lot of them.

# Query_time: 0.203814  Lock_time: 0.048132 Rows_sent: 1  Rows_examined: 116
SET timestamp=1426562315;
( SELECT visit_last_action_time, visit_first_action_time, idvisitor, idvisit, user_id, visit_exit_idaction_url, visit_exit_idaction_name, visitor_returning, visitor_days_since_first, visitor_days_since_order, visitor_count_visits, visit_goal_buyer, location_country, location_region, location_city, location_latitude, location_longitude, referer_name, referer_keyword, referer_type, location_ip, location_browser_lang, visit_total_events, visit_total_time, visit_total_actions, visit_total_searches   ,
                0 as priority
                FROM piwik_log_visit
                WHERE visit_last_action_time >= '2015-03-17 02:48:34' AND visit_last_action_time <= '2015-03-17 03:48:34' AND idsite = '1'  AND config_id = '^Z?^\? D?' AND user_id IS NULL
                ORDER BY visit_last_action_time DESC
                LIMIT 1
             )
                UNION
                ( SELECT visit_last_action_time, visit_first_action_time, idvisitor, idvisit, user_id, visit_exit_idaction_url, visit_exit_idaction_name, visitor_returning, visitor_days_since_first, visitor_days_since_order, visitor_count_visits, visit_goal_buyer, location_country, location_region, location_city, location_latitude, location_longitude, referer_name, referer_keyword, referer_type, location_ip, location_browser_lang, visit_total_events, visit_total_time, visit_total_actions, visit_total_searches   ,
                1 as priority
                FROM piwik_log_visit
                WHERE visit_last_action_time >= '2015-03-17 02:48:34' AND visit_last_action_time <= '2015-03-17 03:48:34' AND idsite = '1'  AND idvisitor = '^U\"?^^0^M?U'
                ORDER BY visit_last_action_time DESC
                LIMIT 1
             )
                ORDER BY priority DESC
                LIMIT 1;

The explain was the following:

| id | select_type  | table           | type  | possible_keys                                                             | key                          | key_len | ref         | rows | Extra                       |
+----+--------------+-----------------+-------+---------------------------------------------------------------------------+------------------------------+---------+-------------+------+-----------------------------+
|  1 | PRIMARY      | piwik_log_visit | range | index_idsite_config_datetime,index_idsite_datetime,index_idsite_idvisitor | index_idsite_config_datetime | 20      | NULL        |    1 | Using where                 |
|  2 | UNION        | piwik_log_visit | ref   | index_idsite_config_datetime,index_idsite_datetime,index_idsite_idvisitor | index_idsite_idvisitor       | 12      | const,const |    1 | Using where; Using filesort |
| NULL | UNION RESULT | <union1,2>      | ALL   | NULL                                                                      | NULL                         | NULL    | NULL        | NULL | Using filesort              |

While lock_time was relatively large in the example above, in general it was more like 0.0002 for such queries.

@GreenReaper commented on March 17th 2015

The code in question (public function findVisitor() ) appears to be making two queries at once and then using a UNION on them because it prefers the result of one of them.

I think you might be trying to do too much in a single query. Maybe it would be better to break it out into two queries, and only execute the one involving config_id if the one involving idvisitor fails to return a result? This would avoid what might be an expensive UNION and resulting temporary table creation.

Generally available versions of MySQL appear to create a temporary table for all UNIONs. They only changed that in 5.7.3, and then only for UNION ALL (which could have been used here) and where there is no ORDER BY (there is here) and it is not the top-level query block of a SELECT (it is).

@GreenReaper commented on March 17th 2015

The code in question has moved around a bit, but was created in commit 0abac3b312ff9560fb43012192e37c2369bab3ed

It therefore isn't a recent change. But it does seem to be taking some time!

@mattab commented on March 17th 2015 Owner

It's not a recent change like you point out, and you mention you see a regression since 2.9.2. So in this particular issue we would like to focus on anything that may have changed since 2.9.2. otherwise we lose focus :-)

The problem is that I don't know what info we need from you to troubleshoot your issue... it's tricky.

@mattab commented on March 17th 2015 Owner

Btw can you try disabling the Provider plugin if that's not disabled yet? it should help performance a lot in case DNS server is slow (which often it is).

we need to move this plugin to the Marketplace for this reason (it's slow) - covered in #5341

@GreenReaper commented on March 17th 2015

I can certainly try that, and will do so. But I will open another issue for the UNION as it appears to have a very significant impact, and is perhaps easily fixed.

@mattab commented on March 17th 2015 Owner

before creating the issue could you enable slow query log and confirm it is actually slow? AFAIK it's fast and mysql won't create TMP table on disk since it's very small result set and fits in RAM. From explain statement I don't see that it will create tmp table. I see using filesort but that does not mean that it will create a tmp table - see this blog post for more info: http://www.percona.com/blog/2009/03/05/what-does-using-filesort-mean-in-mysql/

@GreenReaper commented on March 17th 2015

Yes, it is actually slow. :-) That is a 200ms query above, examining 100+ rows. The idvisitor query by itself is a 1ms-2ms query, while the config_id query is maybe 0.5ms; neither examines more than one row.

Changing it to just do one select (either idvisitor or config_id, it doesn't matter) eliminates half of the temporary tables (the ones which did not appear recently) and also removes all scan selects in my select types munin graph.

A big part of the slowdown may be in the idvisitor query, as documented later in the issue below. I think it's still worth getting rid of the UNION, not least because it usually means avoiding the config_id query.

@GreenReaper commented on March 17th 2015

Now, as for the idaction query . . . the issue is not the MIN(), but the GROUP BY clause. It seems that grouping by name causes it to split to a temporary table (probably because it is a text field). If the query ends "GROUP BY type, hash" [and optionally "ORDER BY NULL"] then no temporary table is created.

This appears to have been introduced in e43ed30926102a009985a8a6b889b20c89179ce0 to both getSelectQueryWhereNameContains() [core/Tracker/TableLogAction.php] and getIdsAction() [core/Tracker/Model.php], the latter called per-hit.

Commit 3e390e5e38a6cbe60831352462d4b87b8bc2a451 adds 'hash' to the GROUP BY but does not remove name. Perhaps it would be possible to use hash instead of name? This eliminates the temporary disk table per hit, as so:
mysql_tmp_tables-day-notext

@diosmosis commented on March 17th 2015 Member

No, we can't group by hash only, since different names can have the same hash. We could select everything (w/ a sensible limit) and do the group by in PHP though... Will have to think about it.

Thanks for doing the research/debugging! I think you've found the issues at fault here.

@mattab commented on June 18th 2015 Owner

Hi everyone,

We've made huge improvements to Performance in recent months. See all closed issues for an overview of the progress we've done: https://github.com/piwik/piwik/issues?q=label%3A%22c%3A+Performance%22+is%3Aclosed+sort%3Aupdated-desc

If you have further suggestions please open a new issue, we would love to hear. Performance is a top priority for us at Piwik :+1:

This Issue was closed on June 18th 2015
Powered by GitHub Issue Mirror