@PCSun1987 opened this Issue on March 17th 2016

Currently it's not possible to send tracking events while offline.

One idea would be to extend the Piwik tracking API, especially for JavaScript, with an optional event timestamp parameter. Inside JS you could then keep the events locally and, once connected to the internet, send them all along with the time each event happened.

Basic tracking could probably be handled in a similar way.

https://forum.piwik.org/t/does-piwik-work-even-your-offline/7295/9
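
The idea in this request can be sketched in a few lines of JavaScript. This is only an illustration under assumptions, not Piwik code: `createOfflineQueue`, `record` and `flush` are hypothetical names, and the `send` callback stands in for whatever would actually deliver an event to the server.

```javascript
// Hypothetical sketch: keep events locally together with the time they
// happened, then hand them over in order once a connection is available.
function createOfflineQueue(now = () => Date.now()) {
    const pending = [];
    return {
        // Remember the event with the time it occurred (ms since epoch).
        record(name, data) {
            pending.push({ name, data, happenedAt: now() });
        },
        // Try to deliver every stored event, oldest first; keep the ones that fail.
        flush(send) {
            const failed = [];
            for (const event of pending.splice(0)) {
                try {
                    send(event);
                } catch (err) {
                    failed.push(event);
                }
            }
            pending.push(...failed);
            return pending.length; // number of events still waiting
        },
        size() {
            return pending.length;
        },
    };
}
```

An app would call `record()` whenever something happens and `flush()` once it knows it is online again; events whose delivery throws stay in the queue for the next attempt.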

@hpvd commented on March 17th 2016

+1
would be a good step on the way to a universal tracker (Piwik 3.0).
It's not only interesting for apps which temporarily have no connection to the web,
but also a good way to make manual input of "real world" events possible after they have happened, and to give them the correct place in the timeline.

@tsteur commented on March 28th 2016 Owner

This would be a nice feature indeed

@PCSun1987 commented on March 29th 2016

So... when do you plan to have this functionality? E.g., when would 3.0 be released?

@tsteur commented on March 29th 2016 Owner

Piwik 3.0 would be in about a year, but this feature is not planned yet. Pull requests or suggestions on how to implement it are always welcome :+1:

@stehlo commented on November 3rd 2016

Just looking for such a solution. This is tremendously important for mobile apps.

In the JS tracking client, I have noticed a method called retryMissedPluginCalls() and the array missedPluginTrackerCalls. It could be interesting to hook it in some way in order to intercept the calls to the server in an offline state. Then, upon "online" event, we would call and retry missed calls.

Your thoughts, @tsteur and @mattab ?

@tsteur commented on November 3rd 2016 Owner

retryMissedPluginCalls() is actually a bit different. Plugins can extend the Piwik JS tracker, and either Piwik or the plugin may be loaded first. If Piwik is loaded first and tries to apply all _paq.push calls, it cannot call the plugin methods yet because the plugin is not loaded. Therefore, once the plugin is loaded, all missed plugin calls are retried.

Offline tracking is super important nowadays for mobile apps, progressive web apps, etc. If someone wanted to work on it, I'm happy to give some support. I think it needs to be worked out what the best place is to save requests that couldn't be sent because the user is offline (eg localStorage) and then we need to detect whether user is offline and when user goes online again. Some browsers have an API for that.
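
For the detection part, browsers expose `navigator.onLine` plus `online` and `offline` events on `window`. A minimal sketch of wrapping them follows; `watchConnectivity` is a hypothetical helper, and the event target is passed in so the same code can be exercised outside a browser. Note that `navigator.onLine` only reflects whether a network interface is connected, not whether the Piwik server is reachable.

```javascript
// Hypothetical helper: report connectivity changes to a callback.
// `target` is expected to behave like `window`
// (addEventListener / removeEventListener).
function watchConnectivity(target, initiallyOnline, onChange) {
    let online = Boolean(initiallyOnline);
    // Only notify on real transitions, not on repeated events.
    const goOnline = () => {
        if (!online) { online = true; onChange(true); }
    };
    const goOffline = () => {
        if (online) { online = false; onChange(false); }
    };
    target.addEventListener('online', goOnline);
    target.addEventListener('offline', goOffline);
    return {
        isOnline: () => online,
        stop() {
            target.removeEventListener('online', goOnline);
            target.removeEventListener('offline', goOffline);
        },
    };
}

// In a browser this would be wired up roughly as:
// const watcher = watchConnectivity(window, navigator.onLine, (online) => { /* ... */ });
```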

There might be one problem when tracking the requests later. I think by default Piwik lets you only track requests up to 4 hours in the past without needing an authentication token. The time is customizable AFAIK but it might be something to take into account that requests older than 4 hours may have to be invalidated.

@stehlo commented on November 4th 2016

Thank you for your prompt reply.

I need to find some solution for this rather promptly, whether using Piwik or something else. Ideally the former though, due to its ability to easily deal with hybrid applications.

retryMissedPluginCalls()

OK, I see. No problem.

I think it needs to be worked out what the best place is to save requests that couldn't be sent

Well, I don't think that this is so important. It should be left open to the developer to decide what to plug into the system. Everyone might have different preferences.

then we need to detect whether user is offline and when user goes online again

Ditto. Developer should supply this to make it easy for Piwik.
Piwik should only supply the mechanism for giving up the "send" and for having the ability to submit it later. This is what I need to find out now.

What I mean by this is that there should be methods to call in Piwik that let it know that "now it is necessary to stop sending tracking data and save it instead" and "now resume sending and send what has been stored".

By the way, the "stop sending" and "resume sending" functionality is already working now.

  • Upon getting offline, I stop calling trackPageView(), trackEvent(), etc.
  • And upon getting online, I start calling them.

So that part is simple.

Even calling the JS tracking client (or loading it locally) is easy to solve upon detection of online/offline events. I already have this part solved.

Thus the only remaining thing is "how to store the data, so that Piwik would know when those tracking events happened, so that it would be possible to reconstruct the past sequence correctly upon delayed sending".

I need your or Matthieu's input on this, as you understand the existing code base and its functionality. (Thank you in advance.)

I think by default Piwik lets you only track requests up to 4 hours in the past without needing an authentication token.

I don't understand this one. What kind of a token?
When you are submitting old data, you are still submitting it with the session that is currently active, aren't you?

The point is that the device might be offline for one week or a month. This means that we just need to keep storing the tracking data with original timestamps and simply submit it in a sequence, when the device comes online.

This means that we won't have such data available for analysis immediately, but only eventually – yet it is still important and better than not having it at all.

The question is to what extent the current system could support this scenario without major code changes.
Is it possible to hook some existing sub-systems?
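
One existing hook that may help here is the `cdt` parameter of the Piwik Tracking HTTP API, which overrides the datetime of a request and accepts either a UTC datetime such as `2011-04-05 00:11:42` or a UNIX timestamp. A minimal sketch of attaching the original time to a stored request, assuming the stored request is a plain query string; `toUtcDatetime` and `withOriginalTime` are hypothetical helpers, not part of the tracker:

```javascript
// Format a Date as "YYYY-MM-DD HH:MM:SS" in UTC.
function toUtcDatetime(date) {
    return date.toISOString().slice(0, 19).replace('T', ' ');
}

// Hypothetical helper: append the original event time to a stored tracking
// query string (e.g. "idsite=1&rec=1&action_name=Home") via the cdt parameter.
function withOriginalTime(query, happenedAtMs) {
    const cdt = encodeURIComponent(toUtcDatetime(new Date(happenedAtMs)));
    return query + (query ? '&' : '') + 'cdt=' + cdt;
}
```

Requests whose `cdt` is older than the window discussed in this thread would still need a token_auth on the server side.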

@stehlo commented on November 4th 2016

Updated the previous comment.

@stehlo commented on November 4th 2016

@tsteur
This is actually an interesting notion that you mentioned... about the plugins for JS tracking client.

Is it possible to write such an "offline tracking" solution as a plugin?

If yes, I could look into this ASAP, if I am given guidance on how such plugins are written. Thanks.

@tsteur commented on November 4th 2016 Owner

With token I mean an authentication token. It is actually hard coded that when you want to track a request that is older than 4 hours, you need to authenticate; see https://github.com/piwik/piwik/blob/2.17.0/core/Tracker/Request.php#L467-L474 . This is for security reasons: otherwise anyone could track data in the past into any Piwik instance. The token requirement can be disabled, though, here: https://github.com/piwik/piwik/blob/3.x-dev/config/global.ini.php#L696

I feel like we could maybe add an API to the tracker like setUserOffline() in which we overwrite the internal method sendRequest to add such requests to an array instead. A developer could eg pass some kind of storage class for us to add the request to, like {addRequest: function (request) {}}, and this way the developer could decide what to do with it.

When the user comes back online, the developer could call eg a method tracker.setUserOnline(storedRequests) and we (the Piwik tracker) would try to re-send these requests in bulk. However, there is still the 4 hour problem described above.

It could probably be written as a plugin, but that API is not yet official and is undocumented, and we would for sure need to add some methods to the tracker. Adding those methods to the tracker could be done quickly though. I'll show you a rough idea in a bit, without thinking too much about it.

@tsteur commented on November 4th 2016 Owner

This could be a rough idea:
https://github.com/piwik/piwik/compare/2.x-dev...offlinetracker?expand=1

Developer would do

tracker.setUserOffline({push: function (request) {
    // eg localstorage.addItem(request);
}});
tracker.setUserOnline(localstorage.getItems());

@tsteur commented on November 4th 2016 Owner

Eventually Piwik would ideally detect offline status itself and store it somewhere.

The biggest problem remains the Piwik backend re the 4 hours in past only
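
To make the proposed setUserOffline / setUserOnline behaviour concrete, here is a minimal sketch of how such a switch might work. It is not the code from the linked branch: `createTracker` and `transport` are hypothetical stand-ins for the real tracker and its internal sendRequest.

```javascript
// Hypothetical stand-in for the tracker; `transport` delivers one request string.
function createTracker(transport) {
    let offlineStorage = null; // null means "online": send immediately

    function sendRequest(request) {
        if (offlineStorage) {
            offlineStorage.push(request); // offline: hand over to the developer's storage
        } else {
            transport(request);
        }
    }

    return {
        trackPageView(title) {
            sendRequest('action_name=' + encodeURIComponent(title));
        },
        // Anything with a push(request) method works, including a plain array.
        setUserOffline(storage) {
            offlineStorage = storage || [];
        },
        // Resume sending and replay whatever the developer stored in the meantime.
        setUserOnline(storedRequests) {
            offlineStorage = null;
            (storedRequests || []).forEach((request) => transport(request));
        },
    };
}
```

The developer stays in charge of where requests live while offline; the tracker only decides whether to send or to hand over.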

@stehlo commented on November 4th 2016

Thank you.
The code looks reasonable to me.
I would just change the initialisation of the configOfflineStorage to an object instead of an array: configOfflineStorage = {}; on the following line:
https://github.com/piwik/piwik/compare/2.x-dev...offlinetracker?expand=1#diff-1279d666063b65e6d6777f902d11574fR3085

@tsteur commented on November 4th 2016 Owner

I made it an array because an array has a push method out of the box. This way it will be easy for us to add tests for it. For now, a developer would set a custom offline storage that is an object with a push method.

I renamed the user term to visitor as Piwik usually uses that term. Do you think you could work with something like this? Maybe some background would be good as well. Are you developing a mobile app eg via phonegap? mobile web app?

@mattab do you have any thoughts on this?
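
A custom offline storage of that shape could be backed by the browser's Web Storage API (`getItem`, `setItem`, `removeItem`). The sketch below is an assumption about how a developer might do it, not part of Piwik; `createWebStorageQueue` and the key name are hypothetical. The `localstorage.addItem` and `getItems` calls in the example above are placeholders and do not exist on the real `localStorage` object.

```javascript
// Hypothetical offline storage: an object with a push method that persists
// requests as a JSON array under one key of a Web Storage-like object.
function createWebStorageQueue(storage, key = 'offlineTrackingRequests') {
    const read = () => {
        try {
            const parsed = JSON.parse(storage.getItem(key));
            return Array.isArray(parsed) ? parsed : [];
        } catch (err) {
            return []; // unreadable or corrupted data: start empty
        }
    };
    return {
        push(request) {
            const items = read();
            items.push(request);
            storage.setItem(key, JSON.stringify(items));
        },
        getItems: read,
        clear() {
            storage.removeItem(key);
        },
    };
}

// In a browser: const offlineStorage = createWebStorageQueue(window.localStorage);
```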

@stehlo commented on November 4th 2016

array, push method

OK, I understand your point.

I renamed the user term to visitor as Piwik usually uses the term.

Makes sense.

Do you think you could work with something like this?

Absolutely. Looks very good to me. Simple and effective.

Maybe some background would be good as well. Are you developing a mobile app eg via phonegap? mobile web app?

Cordova/Tizen + plain JS + HTML + CSS.
Multiplatform (Android, Amazon, AmigoOS, Blackberry10, iOS, Tizen, Windows).

The biggest problem remains the Piwik backend re the 4 hours in past only

I have just cloned the Piwik repository and I am going to look into the reasoning for this limitation...

@stehlo commented on November 4th 2016

@tsteur

I have just found this:
https://github.com/piwik/piwik/blob/3.x-dev/core/Tracker.php#L256-L260

It looks like the bulk submission automatically bypasses the authentication.

Am I right in the assumption that it applies to our case?

If yes, the whole problem will have been solved tonight. ;-)

@stehlo commented on November 4th 2016

If not, what is the "bulk request" then?

@stehlo commented on November 4th 2016

Oops!
I have noticed only now that those lines are within the setTestEnvironment() function.

@stehlo commented on November 4th 2016

However, I have found this:

; Whether Bulk tracking requests to the Tracking API requires the token_auth to be set.
bulk_requests_require_authentication = 0

https://github.com/piwik/piwik/blob/3.x-dev/config/global.ini.php#L684-L685

Could this fit our needs?

@stehlo commented on November 4th 2016

@tsteur
Could it be that you have the wrong time on your computer? Or something like that?
Have a look here:
https://github.com/piwik/piwik/compare/2.x-dev...offlinetracker?expand=1
The commits you made today, during our conversation on Nov 4, 2016, appear to have been made on Nov 2, 2016, i.e. two days ago(!). That's odd.

@stehlo commented on November 4th 2016

On top of that, you are changing the version of JS Tracking Client that lacks some code related to configIdPageView, which is present in Piwik 2.17.0.

Thus you have effectively overwritten its declaration on line 3084.

@stehlo commented on November 4th 2016

Also, the semicolon at the end of line 3084 should be a comma, as the declaration of variables continues on the next line:
https://github.com/piwik/piwik/compare/2.x-dev...offlinetracker?expand=1#diff-1279d666063b65e6d6777f902d11574fR3084

@tsteur commented on November 4th 2016 Owner

Yeah, the time in my virtual machine is sometimes wrong :)

The config you mentioned only applies to bulk requests in general, not to recording requests in the past. For that, tracking_requests_require_authentication would need to be set to "0", see https://github.com/piwik/piwik/blob/3.x-dev/config/global.ini.php#L696

Regarding idpageview I think you are currently looking at Piwik 3 (not released yet), the change was made on Piwik 2 (branch 2.x-dev).
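
For context on the bulk requests discussed above: the Tracking HTTP API accepts a POST whose JSON body has a `requests` array of query strings and an optional `token_auth`. A minimal sketch of building such a body from stored requests, assuming each stored entry is a query string without the leading "?"; `buildBulkPayload` is a hypothetical helper:

```javascript
// Hypothetical helper: build the JSON body for one bulk tracking POST.
// Each entry in `queries` is a tracking query string without the leading "?".
function buildBulkPayload(queries, tokenAuth) {
    const payload = { requests: queries.map((q) => '?' + q) };
    if (tokenAuth) {
        payload.token_auth = tokenAuth; // only needed when the server requires it
    }
    return JSON.stringify(payload);
}
```

Sending stored requests this way saves round trips, but as noted in this thread it does not by itself lift the restriction on tracking in the past.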

@stehlo commented on November 4th 2016

Regarding idpageview I think you are currently looking at Piwik 3 (not released yet), the change was made on Piwik 2 (branch 2.x-dev).

Yes, I realised this. So it looks like 3.0-dev wasn't updated to the latest 2.17 or 2.x-dev.

The config you mentioned only applies to bulk requests in general, not to recording requests in the past.

Oh, that's a pity... :-\

@mattab commented on November 12th 2016 Owner

This could be rough idea: https://github.com/piwik/piwik/compare/2.x-dev...offlinetracker?expand=1

Developer would do

tracker.setUserOffline({push: function (request) {
    // eg localstorage.addItem(request);
}});

tracker.setUserOnline(localstorage.getItems());

This looks great @tsteur ! I'd vote for inclusion in Piwik 3 as a rather powerful new feature once tested & documented. Will help tons of people and make Piwik more resilient!

@stehlo commented on November 12th 2016

Hello Matthieu, @mattab

Thank you for your input into this topic.

What are your thoughts about the 4 hour limit of the backend authentication?

@tsteur commented on November 12th 2016 Owner

The problem is indeed that it is not really useful with the 4 hour limit, and I think even good documentation would not help much here. We would definitely need to mention it in the docs. On top of that, we could set a timestamp with each request and by default only replay tracking requests from eg the last 3 hours (because clocks are not always right, we should play it safe). Then people could maybe have an option to ignore that 3-4 hour limit.

The scary part is that when a date is set more than 4 hours back, Piwik simply uses the current date instead of not tracking that request at all, which is quite dangerous and can lead to wrong tracking data. So it is a big problem.
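
The "only replay recent requests by default" idea could look roughly like this on the client. It is a sketch under assumptions: `selectReplayable` is a hypothetical helper and each stored item is assumed to carry a `happenedAt` timestamp in milliseconds. Skipping the too-old items keeps them from being recorded under the current date.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical helper: split stored requests into those recent enough to
// replay and those that are too old. Each item is { query, happenedAt }.
function selectReplayable(items, nowMs, maxAgeMs = 3 * HOUR_MS) {
    const replay = [];
    const tooOld = [];
    for (const item of items) {
        const age = nowMs - item.happenedAt;
        // Passing maxAgeMs = Infinity disables the limit (the opt-out option).
        (age <= maxAgeMs ? replay : tooOld).push(item);
    }
    return { replay, tooOld };
}
```

The default of 3 hours leaves a safety margin below the server's 4 hour window for clocks that are slightly off.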

@stehlo commented on November 12th 2016

Well, in some way this needs to be resolved.

We can't talk or think just in hours, when discussing offline tracking. The device can be offline for a day, a week or a month, yet the app and Piwik JS Tracking Client need to be able to collect data before eventually submitting it over Internet.

Therefore the data to be processed can be rather old.

@tsteur commented on November 13th 2016 Owner

I'm not sure, but the next problem might be that Piwik won't re-archive / re-generate / update reports when visitors are recorded for eg 2 or 3 days in the past. So even when the visits are recorded in the past successfully, they might not be visible in the reports. So right now it might actually not make sense to merge it, as Piwik is just not ready yet for offline tracking. @mattab or would the reports be re-archived when we record visits in the past and they have been finished already?

@mattab commented on November 21st 2016 Owner

@mattab or would the reports be re-archived when we record visits in the past and they have been finished already?

Yes, it should be implemented already: when tracking in the past, the old reports should be marked as "needs to be invalidated" and then invalidated on the next archiving run. This is done in the Tracker API via rememberToInvalidateArchivedReportsLater in: https://github.com/piwik/piwik/blob/2.x-dev/core/Tracker/Visit.php#L587-L589

and it should even be tested here: https://github.com/piwik/piwik/blob/2.x-dev/tests/PHPUnit/Integration/Tracker/VisitTest.php#L370-L410

@tsteur commented on November 21st 2016 Owner

See comments above, tracking offline data would still not really be usable.

@mattab commented on November 21st 2016 Owner

fyi: when the "custom request datetime" feature was launched, we made it require token_auth for every datetime. Then in #6407 and #6110 it was changed to allow recent requests (up to 4 hours old) to be tracked without it.

-> Would you say it would be good enough to track data for the last 24 hours?

As you mention, I created an issue for "skip request with invalid token_auth": #10890

Edit: some code has been written!

Thomas wrote a bit of unfinished code here which may be useful https://github.com/piwik/piwik/commit/59de0ad1c7637c320c3d31050acbd16c16805892

@stehlo commented on November 21st 2016

From the technical point of view, the past should be of arbitrary length.
