@adaqus opened this Issue on December 16th 2016

This issue happens when the following conditions are met:

  1. A website has only 1-2 visits per day.
  2. The total number of websites to archive is large, so archiving takes a few hours.
  3. Neither --force-all-websites nor --force-idsites is passed to core:archive.

Expected behaviour: archiving of a website should be triggered by any new visit since its last archiving, regardless of when the visit took place.
Actual behaviour: a visit that takes place while archiving is running won't trigger the website's preprocessing during the next archiving run.

This is problematic for small websites with few visits per day, since reports end up missing for some days.

Detailed description

Archiving works as follows:

  1. Websites to archive are fetched.
  2. For each website, it is checked whether there have been new visits since the last full archiving finished successfully.
  3. If there are new visits, the website is archived.

But:

Let's say archiving starts at 10:00. The website with id 1 is processed at the beginning of the run, and it has no new visits, so it is skipped. Archiving then processes the remaining 5000 websites in 5.5 hours, and the last successful archiving time is saved (15:30). Meanwhile, at 14:00, a visit to website 1 took place. The next archiving starts at 16:00. It checks whether there were new visits since the last successful archiving (15:30) and skips website 1, since there are none. The visit at 14:00 is missed.
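
The timeline above can be reproduced with a small simulation. This is only a sketch of the described behaviour, not Piwik's actual code: the helper names, the date, and the 04:00 finish time of the earlier run are assumptions made up for illustration.

```python
from datetime import datetime

DAY = datetime(2016, 12, 16)  # arbitrary date, for illustration only


def at(hour, minute=0):
    """Return a datetime on the example day at the given time."""
    return DAY.replace(hour=hour, minute=minute)


def has_new_visits(visits, since, now):
    """True if any visit falls in the window (since, now]."""
    return any(since < v <= now for v in visits)


site1_visits = [at(14)]       # the single visit from the example
previous_success = at(4)      # hypothetical: an earlier run finished at 04:00

# Run 1 starts at 10:00 and checks website 1 right away: nothing to do yet.
run1_archives_site1 = has_new_visits(site1_visits, previous_success, at(10))

# Run 1 finishes at 15:30; that time is stored globally for all websites.
last_success = at(15, 30)

# Run 2 starts at 16:00 and only looks back to 15:30.
run2_archives_site1 = has_new_visits(site1_visits, last_success, at(16))

print(run1_archives_site1, run2_archives_site1)  # False False
```

Neither run archives website 1, although it received a visit at 14:00.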

To avoid this situation, archiving could check for new visits since the last successful archiving of the given website for the day period. There is still a chance that some visits will be missed, but it is much smaller.
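
A minimal sketch of that idea, assuming a per-website timestamp is stored instead of one global "last successful archiving" time. The class and method names are hypothetical and do not exist in Piwik.

```python
from datetime import datetime


class PerSiteArchiveTracker:
    """Hypothetical tracker that remembers the last archiving time per website."""

    def __init__(self):
        self.last_archived = {}  # site id -> datetime of last archiving

    def should_archive(self, site_id, visits, now):
        """Return True if the site has visits since its own last archiving."""
        since = self.last_archived.get(site_id)
        due = any((since is None or v > since) and v <= now for v in visits)
        if due:
            # Only move the marker when the site is actually archived,
            # so skipped sites keep their older look-back point.
            self.last_archived[site_id] = now
        return due


day = datetime(2016, 12, 16)  # arbitrary date, for illustration only
tracker = PerSiteArchiveTracker()
tracker.last_archived[1] = day.replace(hour=4)  # assumed earlier archiving

visits = [day.replace(hour=14)]
first_run = tracker.should_archive(1, visits, day.replace(hour=10))
second_run = tracker.should_archive(1, visits, day.replace(hour=16))
print(first_run, second_run)  # False True
```

Here the 16:00 run looks back to 04:00 for website 1, so the 14:00 visit triggers archiving. A visit arriving while that website itself is being archived could still be missed, which matches the remaining small risk mentioned above.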

The fastest workaround is to pass the --force-all-websites option to core:archive.

@mattab commented on December 25th 2016 Owner

Thanks @adaqus for the detailed report.

> For each website it is checked whether there are new visits since last full archiving finished successfully.

Wondering if you managed to locate this in the code?

I just took a (quick 5min) look and couldn't find this logic implemented. What I saw was that it would skip websites if they didn't have a visit since midnight in the website's timezone, which seems correct. I also noticed the comment was incorrect, so I fixed it in https://github.com/piwik/piwik/pull/11079

Looking forward to hearing more details from your investigation :+1:

@adaqus commented on January 2nd 2017

@mattab Yes, in CronArchive::hadWebsiteTrafficSinceMidnightInTimezone it is checked whether there were new visits since midnight or since last successful archiving (https://github.com/piwik/piwik/blob/3.x-dev/core/CronArchive.php#L1176).

Let's assume that we run core:archive a few times a day (I should have mentioned that). In that case the seconds since the last successful archiving will be taken into account, because that period is shorter than the time since midnight (https://github.com/piwik/piwik/blob/3.x-dev/core/CronArchive.php#L1177). So if a website has 1-2 visits a day and they happen during the archiving process, they won't trigger archiving for this website.
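
To make the window selection concrete, here is a rough sketch of the logic as described above: the look-back period is the shorter of "since midnight" and "since the last successful archiving". It paraphrases the description, not the actual PHP in CronArchive.php, and it ignores website timezones for simplicity; the function name is made up.

```python
from datetime import datetime


def lookback_seconds(now, last_successful_archive):
    """Seconds to look back for visits: the shorter of the two periods."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    since_midnight = (now - midnight).total_seconds()
    since_last_archive = (now - last_successful_archive).total_seconds()
    return min(since_midnight, since_last_archive)


now = datetime(2016, 12, 16, 16, 0)           # next run starts at 16:00
last_success = datetime(2016, 12, 16, 15, 30) # previous run finished at 15:30
visit = datetime(2016, 12, 16, 14, 0)         # visit during the previous run

window = lookback_seconds(now, last_success)
visit_age = (now - visit).total_seconds()

print(window)              # 1800.0
print(visit_age > window)  # True: the visit lies outside the window
```

With several runs per day the window shrinks to the gap between runs (30 minutes here), so a visit that happened two hours earlier, during the previous run, is never seen.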

@adaqus commented on January 2nd 2017

Also, I'm not sure whether the changes introduced in #11079 are good. They may be misleading, since the time since the last archiving may be taken into account instead of the time since midnight.
