@gaumondp opened this Issue on February 6th 2017

I'm still wondering why this keeps happening. My January table, piwik_archive_blob_2017_01, went from 31.9 GB to 800 MB (12 M rows to 161k rows) after running ./console core:purge-old-archive-data all

I'm well aware of ticket #7181 (Report archives have tripled in size), but with a database that went from 63 GB to 28.1 GB in total, I'm looking for a long-term solution.

Piwik 2.17.1 (can't upgrade to 3.x for a few months)
PHP 5.5.x, Apache 2.4.x, MySQL 5.5.x
All tests green in Piwik System Check.
Active non-Core plugins: Logviewer, PlatformReport, RestrictLaguageSelection, SecurityInfo and SimpleSysMon.

Here are my last 3 months (since the last time I ran purge-old-archive-data):

Table                        Size before / after    Rows before / after
piwik_archive_blob_2017_02   1.9 GB / 417 MB        501,726 / 379,490
piwik_archive_blob_2017_01   31.9 GB / 800.5 MB     12,179,895 / 161,802
piwik_archive_blob_2016_12   171.8 MB / 85.6 MB     82,220 / 14,695

No errors in my PHP or Apache log nor in Piwik.
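
To put the table above in proportion, here is a small sketch that computes the relative reduction per table. The figures are copied from the table; the conversion of 1 GB = 1024 MB is an assumption, since the post does not say which unit convention the sizes use.

```python
# Sizes (in MB) and row counts copied from the table above, as (before, after).
MB_PER_GB = 1024  # assumption: binary units; the post does not specify

TABLES = {
    "piwik_archive_blob_2017_02": {"size_mb": (1.9 * MB_PER_GB, 417.0), "rows": (501_726, 379_490)},
    "piwik_archive_blob_2017_01": {"size_mb": (31.9 * MB_PER_GB, 800.5), "rows": (12_179_895, 161_802)},
    "piwik_archive_blob_2016_12": {"size_mb": (171.8, 85.6), "rows": (82_220, 14_695)},
}


def reduction_pct(before, after):
    """Return how much smaller `after` is than `before`, in percent."""
    return 100.0 * (before - after) / before


def summarize(tables):
    """Map each table name to (size reduction %, row reduction %)."""
    return {
        name: (reduction_pct(*data["size_mb"]), reduction_pct(*data["rows"]))
        for name, data in tables.items()
    }


for name, (size_pct, rows_pct) in summarize(TABLES).items():
    print(f"{name}: size -{size_pct:.1f}%, rows -{rows_pct:.1f}%")
```

The January table stands out: almost all of its size and rows were removed by the purge, whereas February lost most of its size but only about a quarter of its rows.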

@tsteur commented on February 6th 2017 Owner

The January table may be somewhat bigger because yearly archives are stored there (I think). Also, I don't know how much data PlatformReport archives, but that's quite a difference. I'm not familiar with #7181, but I'm wondering whether you have "browser archiving" enabled and/or how often your cronjob runs?

@gaumondp commented on February 6th 2017

Thanks for answering, highly appreciated.

  1. Platform Report. I checked and I'm running the latest version available. But why would such report data be "flushed" by purge-old-archive-data?

  2. Browser archiving has been OFF since 2013 and the cronjob runs every 15 minutes. We get around 150,000 actions in total every day from 10 different sites.

  3. NEW: Config file differences from the default settings:
    datatable_archiving_maximum_rows_actions = 2000 (500 default)
    datatable_archiving_maximum_rows_events = 2000 (500 default)
    datatable_archiving_maximum_rows_subtable_actions = 2000 (100 default)
    enable_processing_unique_visitors_year = 1

My DB is on a separate MySQL server (4 cores, 12 GB of RAM) and nothing else runs on that server.
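
As a way to keep track of such overrides, here is a minimal sketch that compares an INI fragment against the defaults quoted in item 3. The `[General]` section name and the `SAMPLE` text are assumptions made for illustration; the default of `enable_processing_unique_visitors_year` is not stated in this thread, so it is reported as unknown (`None`).

```python
import configparser

# Defaults as quoted in item 3 above (values kept as strings, like in the INI file).
DEFAULTS = {
    "datatable_archiving_maximum_rows_actions": "500",
    "datatable_archiving_maximum_rows_events": "500",
    "datatable_archiving_maximum_rows_subtable_actions": "100",
}

# Illustrative fragment; the [General] section name is an assumption.
SAMPLE = """
[General]
datatable_archiving_maximum_rows_actions = 2000
datatable_archiving_maximum_rows_events = 2000
datatable_archiving_maximum_rows_subtable_actions = 2000
enable_processing_unique_visitors_year = 1
"""


def overrides(ini_text, section="General"):
    """Return {key: (quoted default or None, configured value)} for non-default keys."""
    # strict=False tolerates repeated keys; interpolation=None leaves '%' untouched.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    result = {}
    for key, value in parser[section].items():
        default = DEFAULTS.get(key)  # None when the thread gives no default
        if default != value:
            result[key] = (default, value)
    return result


for key, (default, value) in overrides(SAMPLE).items():
    print(f"{key}: default={default}, configured={value}")
```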

@mattab commented on February 21st 2017 Owner

Looking at the code, it seems there is already a daily scheduled task which should have the same effect as calling core:purge-old-archive-data

The daily scheduled task is defined here: https://github.com/piwik/piwik/blob/3.0.1/plugins/CoreAdminHome/Tasks.php#L41-L43

@gaumondp when you check your core:archive output logs for 1 or 2 days, do you see this scheduled task purgeOutdatedArchives being executed?
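
One simple way to run this check is to search the cron log for the task name. The sketch below is only a plain substring search: it makes no assumption about how Piwik formats the log line for a scheduled task, and the helper names are made up for this example.

```python
def lines_mentioning(log_lines, needle="purgeOutdatedArchives"):
    """Return every log line that contains `needle` (plain substring match)."""
    return [line.rstrip("\n") for line in log_lines if needle in line]


def search_log_file(path, needle="purgeOutdatedArchives"):
    """Read a log file and return the matching lines; `path` is supplied by the caller."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return lines_mentioning(handle, needle)


# Demo on a line format that appears in core:archive output.
sample = ["INFO [2017-02-21 14:45:01] START\n"]
print(lines_mentioning(sample))  # no line mentions the task in this sample
```

An empty result over one or two days of logs would suggest that the task is not being executed, or at least not being logged, by the cron run.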

@gaumondp commented on February 21st 2017

Here is my current cronjob :

*/15 * * * * /usr/bin/php /piwik/console core:archive --url=http://stats.site.com >> /logs/piwik-console-cron217-1.log

Looking at the last 2 days, I'm seeing an error message I hadn't noticed before:

INFO [2017-02-21 14:45:01] Running Piwik 2.17.1 as Super User
INFO [2017-02-21 14:45:01] ---------------------------
INFO [2017-02-21 14:45:01] NOTES
INFO [2017-02-21 14:45:01] - Reports for today will be processed at most every 600 seconds. You can change this value in Piwik UI > Settings > General Settings.
INFO [2017-02-21 14:45:01] - Reports for the current week/month/year will be refreshed at most every 3600 seconds.
INFO [2017-02-21 14:45:01] - Archiving was last executed without error 72 days 3 hours ago
INFO [2017-02-21 14:45:01] - Will process 19 other websites because the last time they were archived was on a different day (in the website's timezone) , IDs: 2, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
INFO [2017-02-21 14:45:01] - Will process 1 other websites because some old data reports have been invalidated (eg. using the Log Import script) , IDs: 6
INFO [2017-02-21 14:45:01] ---------------------------
INFO [2017-02-21 14:45:01] START

In fact, I erased all my piwik_archive_blob_2016_* and piwik_archive_numeric_2016_* tables, but the only website I have a problem with is siteId 6, and only for annual reports. Other reports are OK and consistent. The cron log shows 0 visits for the annual reports (the UI says "There is no data for this report."):

INFO [2017-02-21 14:47:08] Archived website id = 6, period = year, 0 segments, 0 visits in last 7 years, 0 visits this year, Time elapsed: 51.443s
INFO [2017-02-21 14:47:08] Will pre-process for website id = 6, period = range, date = last7
INFO [2017-02-21 14:47:08] - pre-processing all visits
INFO [2017-02-21 14:47:09] Archived website id = 6, period = range, 0 segments, 56352 visits in last 7 ranges, 56352 visits this range, Time elapsed: 1.083s
INFO [2017-02-21 14:47:09] Archived website id = 6, 5 API requests, Time elapsed: 102.837s [4/19 done
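
To spot such empty archives across a long cron log, the "Archived website" lines can be parsed with a regular expression. This is a sketch based only on the line format visible in the excerpt above; other Piwik versions may log differently, and the function name is made up for this example.

```python
import re

# Matches per-period summary lines such as:
#   Archived website id = 6, period = year, 0 segments, 0 visits in last 7 years, 0 visits this year
ARCHIVED = re.compile(
    r"Archived website id = (\d+), period = (\w+), \d+ segments, "
    r"\d+ visits in last \d+ \w+, (\d+) visits this \w+"
)

# Excerpt copied from the log above.
SAMPLE = (
    "INFO [2017-02-21 14:47:08] Archived website id = 6, period = year, 0 segments, "
    "0 visits in last 7 years, 0 visits this year, Time elapsed: 51.443s\n"
    "INFO [2017-02-21 14:47:09] Archived website id = 6, period = range, 0 segments, "
    "56352 visits in last 7 ranges, 56352 visits this range, Time elapsed: 1.083s\n"
    "INFO [2017-02-21 14:47:09] Archived website id = 6, 5 API requests, "
    "Time elapsed: 102.837s\n"
)


def zero_visit_archives(log_text):
    """Return (site id, period) pairs whose current-period visit count is 0."""
    return [
        (int(site_id), period)
        for site_id, period, visits in ARCHIVED.findall(log_text)
        if int(visits) == 0
    ]


print(zero_visit_archives(SAMPLE))
```

On the excerpt, only the yearly archive of site 6 is flagged; the range archive reports 56352 visits and the final summary line has no period, so neither is included.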

And today my report tables are at 13 GB for February and 4 GB for January. So it looks like purgeOutdatedArchives never runs.

Am I supposed to see anything about purgeOutdatedArchives in my cron log? Remember, I'm still using 2.17.1.

Thanks!

@mattab commented on June 9th 2017 Owner

> Remember, I'm still using 2.17.1.

Hi @gaumondp
Have you upgraded to Piwik 3 by now, and if so, do you still experience this issue? We would like to get to the bottom of this one if it still occurs.
