We will sometimes have to push conversion data for days in the past.
Currently, such conversions are tracked, but the reports for those past days are not updated the next time they are requested.
When conversions are recorded for a past day, we should set a flag that forces that day's report to be refreshed the next time it is requested, or the next time archiving runs.
This function would also work for "updating" orders in previous days (which would invalidate past reports).
See also duplicate #2328
Why this ticket?
The AWStats/Urchin alternative script #703 will push server log data to Piwik for days in the past. Sometimes users will replay logs from the last 3 months at once, or in several batches, processing dates and websites in random order.
For example:
- Import logs for March 2012 for sites 1-1000.
- Run archive.php.
- Data shows only since the day of install, but at least some data is shown.
- User then tries to import logs from Aug 2011 to Feb 2012.
- Run archive.php. The calendar does not show dates before March.
- User finds out about updating ts_created in the piwik_site table.
- Calendar now allows selecting older dates, but the old data still does not show.
The goal is to accommodate this use case in a user-friendly manner: Piwik should transparently force reprocessing for the websites/days/weeks/months/years where new data was inserted.
- This feature will work ONLY if the archive.php cron is set up
- We implement it in archive.php for efficiency and decoupling. Running the logic on each Piwik_Archive::build or similar would be too slow for a feature that only power users will use
- Task implementation
- idvisit only ever increases in log_visit, so newly inserted visits can be found by comparing against the last processed idvisit
- We only invalidate days that have NEW visits. If the changes only "update" one or several visitors (or orders, pageviews, etc.) without adding new visits, they will not trigger re-processing. This should be fine since the goal is to deal with newly pushed data from logs, which will always create new visits.
- at the end of archive.php run (scheduled task?), keep track of the MAX(idvisit) last processed
- Next run: look for (idsite, date) pairs where visits newer than the last processed idvisit fall on a past date
SELECT count( idvisit ) , DATE( visit_last_action_time ) AS date, idsite FROM piwik_log_visit WHERE idvisit >4000000 GROUP BY idsite, date
- This query is potentially slow (for example, it takes 20s to run on 1.5M visits on the demo). Allow a config setting to disable this feature completely for users who don't use it and don't want the added performance hit
- Cache the result in the DB rather than in memory (needed when running concurrent archive.php processes)
- Delete archived reports...
- Wait until a website is being processed, to delete the old reports -- so that a user consulting stats can access them until the last second that the re-process starts
- Delete all matching reports from archive_XX tables for this site and all dates that are invalidated for this site.
- Delete matched days, and the weeks/months/years containing these deleted days
- The function to delete archives for a given site/date should be refactored into a private API so that it can be reused for Piwik lightweight mode: #53
- If using the feature "Delete logs older than N days", we should only delete reports for dates that are more recent than N days
- Write a WARNING in the archive.php output when data older than N days is recorded. Ask the user to increase the "Delete logs older than N days" value in Settings to proceed without losing their data.
- Force re-processing of these old reports...
- Update site.ts_created to earliest date now known
- Set the proper values to ensure these dates are triggered when archiving old data (i.e. pass last52 or more to the API calls)
This should work!
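The detection step of the proposal can be illustrated with a small sketch. This is not Piwik code: it uses Python with an in-memory SQLite database, a simplified log_visit table with only three columns, and hypothetical helper names (max_idvisit, new_past_days, demo). It only shows the idea of remembering MAX(idvisit) after a run and then grouping newer visits by (idsite, date), keeping past dates only.

```python
import sqlite3
from datetime import date


def max_idvisit(conn):
    """Watermark to store at the end of an archiving run (0 if the table is empty)."""
    row = conn.execute("SELECT COALESCE(MAX(idvisit), 0) FROM log_visit").fetchone()
    return row[0]


def new_past_days(conn, last_max_idvisit, today):
    """Return (idsite, day, visits) for visits added since the watermark
    whose date is strictly before `today`."""
    sql = """
        SELECT idsite, DATE(visit_last_action_time) AS day, COUNT(idvisit) AS visits
        FROM log_visit
        WHERE idvisit > ? AND DATE(visit_last_action_time) < ?
        GROUP BY idsite, DATE(visit_last_action_time)
        ORDER BY idsite, day
    """
    return conn.execute(sql, (last_max_idvisit, today.isoformat())).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for the real log_visit table (assumption).
    conn.execute(
        "CREATE TABLE log_visit ("
        "idvisit INTEGER PRIMARY KEY, idsite INTEGER, visit_last_action_time TEXT)"
    )
    insert = "INSERT INTO log_visit VALUES (?, ?, ?)"
    # Visits already present when the previous archiving run finished.
    conn.executemany(insert, [
        (1, 1, "2012-03-10 09:00:00"),
        (2, 1, "2012-03-10 10:00:00"),
    ])
    watermark = max_idvisit(conn)
    # A log import then pushes visits, some of them for past days.
    conn.executemany(insert, [
        (3, 1, "2011-08-01 12:00:00"),
        (4, 1, "2011-08-01 13:00:00"),
        (5, 2, "2012-02-15 08:00:00"),
        (6, 1, "2012-03-11 08:00:00"),  # dated "today", so not a past day
    ])
    return new_past_days(conn, watermark, date(2012, 3, 11))


if __name__ == "__main__":
    for idsite, day, visits in demo():
        print(idsite, day, visits)
```

In a real setup the watermark would be persisted in the DB between runs, as suggested in the list above, so that concurrent archive.php processes see the same value.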
(In ) Fixes #2584
- Implemented instead as a public API that can invalidate any report for any day or list of idsites
- The API will be called by the Log Import script
- The next archive.php run will process these dates in priority
- ts_created is updated for the websites to make sure the dates are selectable in the calendar
- Handles the "Delete logs older than N days" setting: only reports more recent than N days are invalidated (for which we are likely to still have logs)
- Added an integration test that:
  - first calls the reports: old data is not displayed
  - then calls the API invalidating reports for newer dates
  - then calls the API again, now with data!
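As a rough illustration of what invalidating a set of days implies, the sketch below expands each invalidated day into the day, week, month and year periods that contain it, and skips days older than the "Delete logs older than N days" cutoff. It is written in Python rather than Piwik's PHP, the function name periods_to_invalidate and its parameters are hypothetical, and it assumes weeks start on Monday.

```python
from datetime import date, timedelta


def periods_to_invalidate(days, today, delete_logs_older_than=None):
    """Return a sorted list of (period, start_date) tuples to re-process.

    days: iterable of datetime.date values that received new data.
    delete_logs_older_than: N from "Delete logs older than N days",
    or None when log deletion is disabled.
    """
    cutoff = None
    if delete_logs_older_than is not None:
        cutoff = today - timedelta(days=delete_logs_older_than)

    periods = set()
    for d in days:
        if cutoff is not None and d < cutoff:
            # Logs for this day have probably been purged already, so
            # invalidating its reports would lose data instead of refreshing it.
            continue
        periods.add(("day", d))
        # Assumption: a week starts on Monday.
        periods.add(("week", d - timedelta(days=d.weekday())))
        periods.add(("month", d.replace(day=1)))
        periods.add(("year", d.replace(month=1, day=1)))
    return sorted(periods)


if __name__ == "__main__":
    invalidated = periods_to_invalidate(
        [date(2012, 2, 15), date(2011, 8, 1)],
        today=date(2012, 3, 11),
        delete_logs_older_than=90,
    )
    for period, start in invalidated:
        print(period, start.isoformat())
```

Using a set means that many invalidated days in the same month or year produce a single entry for that month or year, so each containing period is re-processed once.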
(In ) remove debug Refs #2584