Previous behaviour:
- We either processed specifically specified sites
- We processed all sites if specified
- Or by default we checked which sites had visits since the last archiver run and which sites needed to be reprocessed because of invalidation etc. We got the list of all sites that had visits since the last archiving run by doing one big query
- We checked whether there were visits for a rather long timeframe
Difference to before:
- Instead of executing the slow query each time an archiver starts we only execute one small query for just one site before a site is actually about to be processed.
- We do not have to execute this query if we have to archive the site anyway because of other reasons: eg if
websiteDayHasFinishedSinceLastRun or if
- We check for visits in a shorter timeframe (since midnight in the website's timezone or since the last archiving run, whichever is shorter)
- The log output of the core archiver can be bigger since by default we now archive all websites and log each site that can be skipped (e.g. if it had no visits)
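
To illustrate the per-site check described in the list above, here is a rough sketch in Python. It is not the actual archiver code: the names `seconds_to_look_back`, `should_archive`, the `site` dict keys and the `had_visits_since` callback are all made up for illustration, and the site timezone is passed in as a plain `tzinfo` object.

```python
from datetime import datetime, timedelta, timezone


def seconds_to_look_back(now_utc, site_tz, last_archive_utc):
    """Pick the shorter of 'since midnight in the site's timezone'
    and 'since the last archiving run', in seconds."""
    now_local = now_utc.astimezone(site_tz)
    midnight_local = now_local.replace(hour=0, minute=0, second=0, microsecond=0)
    since_midnight = (now_local - midnight_local).total_seconds()
    since_last_archive = (now_utc - last_archive_utc).total_seconds()
    return int(min(since_midnight, since_last_archive))


def should_archive(site, now_utc, last_archive_utc, had_visits_since):
    """Decide for one site, right before it would be processed.

    `site` is a hypothetical dict with keys 'id', 'tz' and
    'day_finished_since_last_run'. `had_visits_since(id_site, seconds)`
    stands in for the small per-site query (an assumption, not a real API).
    """
    # If the site has to be archived anyway, skip the visits query entirely.
    if site["day_finished_since_last_run"]:
        return True
    lookback = seconds_to_look_back(now_utc, site["tz"], last_archive_utc)
    # One small query for just this site instead of one big query up front.
    return had_visits_since(site["id"], lookback)
```

For example, with `now_utc` at 10:00 UTC, a site in UTC+2 and a last archiving run one hour ago, the lookback is 3600 seconds (the last run) rather than the 43200 seconds since local midnight.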
We might try to run the archiver on a large Piwik system tomorrow. I'm not really sure how to test it and make sure we do not regress anything. Maybe we can deploy it on all our Piwik demos as well.
Follow up note:
- Maybe in Piwik 3.0 we can delete the API
- it's not used in core,
- it's not tested,
- it's slow when there are many websites.
Marked method as deprecated
FYI: We're running the archiver on a big instance and the big query is definitely gone, the archiver starts immediately
FYI: it breaks MetaSites because there's no raw data for them.
Added more log output. Regarding MetaSites, I need to have a look at how it was worked around there. In theory it should not change behaviour.
Think I found it already...
To make sure we're compatible with MetaSites plugin I added https://github.com/PiwikPRO/plugin-MetaSites/pull/17
Can someone please have a look again?
Let's test it tomorrow.
Rebased :) just FYI
Created follow up issue re: the archiving event here: https://github.com/piwik/piwik/issues/8631
@tsteur do you have anything else to say regarding the event name or shall I merge?
Nah not really. I wouldn't wanna use any event that contains
force in archiving since it's simply not really forcing and there would be multiple events needed that contain the word
getIdSitesDisplayingThirdPartyData is not really correct either, see e.g. the MultiSites plugin, which actually uses this event currently. It's not third-party data that is shown. The only difference to normal behaviour is that it doesn't use the tracker.
I think after refactoring the archiver we might not need an event anymore at all, hopefully. And if so, we can maybe find a better name or provide a method for it somehow.
Ok, will merge then.