This issue happens when the following conditions are met:
--force-idsites is provided to
Expected behaviour: archiving of a given website should be triggered by any new visit since the last archiving, regardless of when the visit took place. Actual behaviour: when a visit takes place during archiving, it won't trigger the website's preprocessing during the next archiving.
This is problematic for small websites with few visits per day, since reports end up missing for some days.
Archiving works as follows:
1. Websites to archive are fetched.
2. For each website, it is checked whether there are new visits since the last full archiving finished successfully.
3. If there are new visits, archiving of the website is performed.
Let's say archiving starts at 10:00. The website with id 1 is processed at the beginning of the archiving process, and it has no new visits. Archiving processes the rest of the 5000 websites in 5.5 hours, and the last successful archiving time is saved (15:30). Meanwhile, at 14:00, a new visit to website 1 took place. The next archiving starts at 16:00. It checks whether there were new visits since the last successful archiving (15:30) and skips website 1, since there are none. The visit at 14:00 is missed.
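To make the timeline concrete, here is a minimal Python sketch of the scenario. It is not Piwik code: the helper names `at` and `has_new_visits`, the calendar date, and the 09:00 end time of an earlier run are assumptions made only for illustration.

```python
from datetime import datetime


def at(hour, minute=0):
    # Arbitrary calendar date; only the time of day matters here.
    return datetime(2016, 12, 1, hour, minute)


def has_new_visits(visit_times, since):
    """Return True if any visit happened strictly after `since`."""
    return any(t > since for t in visit_times)


visits_site_1 = []

# Run 1 starts at 10:00 and checks website 1 right away: nothing to archive.
# The 09:00 end time of an earlier run is an assumption for this sketch.
checked_in_run_1 = has_new_visits(visits_site_1, since=at(9))

# A visit arrives at 14:00 while run 1 is still busy with other websites.
visits_site_1.append(at(14))

# Run 1 finishes at 15:30 and stores one global "last successful archiving" time.
last_successful_archiving = at(15, 30)

# Run 2 starts at 16:00 and only looks for visits after 15:30.
checked_in_run_2 = has_new_visits(visits_site_1, since=last_successful_archiving)

print(checked_in_run_1, checked_in_run_2)  # False False: the 14:00 visit is never picked up
```

The visit falls between the moment website 1 was checked (10:00) and the moment the global timestamp was written (15:30), so neither run sees it.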
To avoid this situation, archiving could check for new visits since the last successful archiving of the given website for the day period. There is still a chance that some visits will be missed, but a much smaller one.
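A rough sketch of what a per-website check could look like, again in plain Python rather than Piwik code. The class `PerSiteCheckLog` and its methods are hypothetical, and recording the time at which the website was checked (rather than the time its archiving finished) is one possible choice among several.

```python
from datetime import datetime


def at(hour, minute=0):
    # Arbitrary calendar date; only the time of day matters here.
    return datetime(2016, 12, 1, hour, minute)


class PerSiteCheckLog:
    """Hypothetical store of when each website was last checked for visits."""

    def __init__(self):
        self._last_checked = {}

    def should_archive(self, site_id, visit_times, fallback_since):
        # Use the website's own timestamp; fall back to a global one if unknown.
        since = self._last_checked.get(site_id, fallback_since)
        return any(t > since for t in visit_times)

    def mark_checked(self, site_id, checked_at):
        self._last_checked[site_id] = checked_at


log = PerSiteCheckLog()
visits_site_1 = []

# Run 1 checks website 1 at 10:00 and remembers that moment for this website.
run_1 = log.should_archive(1, visits_site_1, fallback_since=at(9))
log.mark_checked(1, at(10))

# The visit at 14:00 arrives while run 1 is processing other websites.
visits_site_1.append(at(14))

# Run 2 at 16:00 compares against 10:00 (per website), not 15:30 (global).
run_2 = log.should_archive(1, visits_site_1, fallback_since=at(9))
log.mark_checked(1, at(16))

print(run_1, run_2)  # False True
```

A visit that arrives while the website itself is being archived could still slip through, which matches the remaining (smaller) risk mentioned above.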
The quickest workaround for this is to use the
--force-all-websites option in
Thanks @adaqus for the detailed report.
"For each website it is checked whether there are new visits since last full archiving finished successfully."
Wondering if you managed to locate this in the code?
I just took a quick (5 min) look and couldn't find this logic implemented. What I saw was that it would skip websites if they didn't have a visit since midnight in the website's timezone, which seems correct. I also noticed the comment was incorrect, so I fixed it in https://github.com/piwik/piwik/pull/11079
Looking forward to hearing more details from your investigation :+1:
@mattab Yes, in
CronArchive::hadWebsiteTrafficSinceMidnightInTimezone it is checked whether there were new visits since midnight or since the last successful archiving (https://github.com/piwik/piwik/blob/3.x-dev/core/CronArchive.php#L1176).
Let's assume that we run
core:archive a few times a day (I should have mentioned that). In that case, the seconds since the last successful archiving will be taken into account, because that period is shorter than the time since midnight (https://github.com/piwik/piwik/blob/3.x-dev/core/CronArchive.php#L1177). So if a website has 1-2 visits a day and they happen during the archiving process, they won't trigger archiving for this website.
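The effect of taking the smaller of the two periods can be illustrated with a few lines of Python. This only paraphrases the behaviour described here and is not a port of CronArchive.php; the function names `lookback_seconds` and `visit_is_seen` are made up for the example, and the numbers reuse the 14:00 / 15:30 / 16:00 timeline from the report.

```python
def lookback_seconds(seconds_since_midnight, seconds_since_last_archiving):
    """Pick the shorter of the two periods, as described above."""
    return min(seconds_since_midnight, seconds_since_last_archiving)


def visit_is_seen(visit_age_seconds, window_seconds):
    """A visit is noticed only if it falls inside the lookback window."""
    return visit_age_seconds <= window_seconds


# Archiving starts at 16:00: midnight was 16 h ago, the last run ended 30 min ago.
window = lookback_seconds(16 * 3600, 30 * 60)

# The visit at 14:00 is 2 h old when the 16:00 run checks for traffic.
seen = visit_is_seen(2 * 3600, window)

print(window, seen)  # 1800 False
```

With a single run per day, the time since midnight would usually be the shorter period, and the same visit would fall inside the window.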
Also, I'm not sure whether the changes introduced in #11079 are good. They may be misleading, since the time since the last archiving may be taken into account.