@jackellenberger opened this Issue on July 1st 2015

Background: Our Piwik instance currently has about 13,000 sites, of which only 100-200 are active at any given time (receiving views and having their reports / analytics viewed). As a result, our core:archive output looks like this:

INFO CoreConsole[2015-06-30 14:25:31] - Will process 101 websites with new visits since 1 days 0 hours , IDs: [...]
INFO CoreConsole[2015-06-30 14:25:31] - Will process 91 other websites because some old data reports have been invalidated (eg. using the Log Import script) , IDs: [...]
INFO CoreConsole[2015-06-30 14:25:31] - Will process 13212 other websites because the last time they were archived was on a different day (in the website's timezone) , IDs:

It is pretty easy to efficiently process 100-200 sites, but 13,000 takes about 7 days (!).

Proposed Solution: Add a flag to the core:archive command, such as "--ignore-untouched-sites", that skips sites whose only reason for being archived is that they were last archived on a different day, provided they have had no views or actions since then. If such a site does get views later, archiving it may take longer at that point, but I would rather have the option to take the time savings up front and ignore sites that have not been interacted with in the last day (or in whatever time period is specified by --force-all-periods).
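
To make the proposed behaviour concrete, here is a rough sketch of the selection logic in Python. It is not Piwik's actual code: the `Site` record, its fields, and the `sites_to_archive` function are all hypothetical, and it compares plain dates instead of handling each website's timezone. It only illustrates which of the three groups in the log output above would be dropped when the suggested flag is set.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Site:
    # Hypothetical record for illustration; not Piwik's real schema.
    site_id: int
    last_archived: date
    visits_since_last_archive: int
    has_invalidated_reports: bool = False


def sites_to_archive(sites, today, ignore_untouched_sites=False):
    """Return the IDs of the sites an archiving run would process.

    Sites with new visits or invalidated reports are always kept.
    Sites that were merely last archived on a different day are kept
    only when ignore_untouched_sites is False.
    """
    selected = []
    for site in sites:
        if site.visits_since_last_archive > 0 or site.has_invalidated_reports:
            selected.append(site.site_id)
        elif site.last_archived != today and not ignore_untouched_sites:
            selected.append(site.site_id)
    return selected


if __name__ == "__main__":
    today = date(2015, 6, 30)
    yesterday = date(2015, 6, 29)
    sites = [
        Site(1, yesterday, visits_since_last_archive=12),
        Site(2, yesterday, 0, has_invalidated_reports=True),
        Site(3, yesterday, 0),  # untouched since the last archive
    ]
    print(sites_to_archive(sites, today))
    print(sites_to_archive(sites, today, ignore_untouched_sites=True))
```

With the flag set, site 3 in this example is skipped until it receives a visit or has reports invalidated, which is the trade-off described above.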

Let me know what you think!

Edit: it appears I can't label this, but if I could it'd be a feature request / enhancement!

@tsteur commented on July 1st 2015 Owner

I think this is a duplicate of #5922. Please comment/reopen if not. Cheers!

This Issue was closed on July 1st 2015