@skynet opened this Issue on December 20th 2014

The Snowplow JavaScript Tracker (https://github.com/snowplow/snowplow-javascript-tracker) is based on piwik.js.

Snowplow's CloudFront collector is the most popular Snowplow collector. It is incredibly robust and scalable because it leverages Amazon's cloud infrastructure.

The Snowplow tracking pixel is served from CloudFront. The Snowplow tracker requests the pixel (using a GET) and appends the data to be logged in Snowplow to the query string of that GET request. Amazon provides CloudFront logging: each request (including the query string) is logged to S3 together with some additional data provided by CloudFront (e.g. the requester's IP address and the URL).
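To make the mechanism concrete, here is a minimal Python sketch of the request side. It assumes a hypothetical distribution hostname and a pixel path of `/i`. The parameter names (`e`, `url`, `page`, `p`) are meant to resemble the Snowplow tracker protocol but are illustrative only; the real tracker sends many more fields.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical CloudFront distribution; "/i" is the assumed pixel path.
COLLECTOR = "https://d1234example.cloudfront.net/i"


def build_pixel_url(event: dict) -> str:
    """Encode event fields into the query string of a GET for the pixel."""
    return COLLECTOR + "?" + urlencode(event)


# Illustrative field names, loosely modelled on the Snowplow tracker protocol.
event = {"e": "pv", "url": "https://example.com/pricing", "page": "Pricing", "p": "web"}
pixel = build_pixel_url(event)
print(pixel)

# Round trip: the logged query string is enough to recover the event.
recovered = {k: v[0] for k, v in parse_qs(urlsplit(pixel).query).items()}
assert recovered == event
```

Because the whole payload travels in the query string, CloudFront's access log already contains everything the tracker sent; nothing has to run on your own server at request time.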

Would it be possible to have an out-of-box alternative method for collecting data for analysis, using Snowplow JavaScript Tracker instead of piwik.js?

@alexanderdean commented on December 20th 2014

Hey there! Alex from the Snowplow project here.

Not trying to shoot this suggestion down, but just a note that the Snowplow JavaScript Tracker has diverged significantly from the Piwik tracker since we forked 2.5 years ago:

@skynet commented on December 20th 2014

Good, understood. I did not expect these trackers to be compatible at all; I just want to collect data with Snowplow and display it with Piwik. Is that an impossible task?

@mattab commented on December 21st 2014 Owner

Hi @alexanderdean, good to see your progress on Snowplow! :+1:

@skynet Not sure if it's a possible task, but if it is, it would certainly be very interesting to learn more!

@skynet commented on December 22nd 2014

From what I have read so far, it would be possible to import standard CloudFront logs into Piwik (PW). That means super-fast data collection.

Instead of sending the data collected on a page back to PW to be inserted into the database (which is probably a relatively expensive operation, unless you buffer what the browser sends), the data is collected in CloudFront logs, then parsed and inserted into PW on demand during a nightly process.

That means your server is not even touched while users browse your website. Of course, there will be no real-time statistics, but I am happy to trade that for a reliable and fast alternative. That is probably why most of us use Google Analytics: to leverage Google's cloud infrastructure. It's scary to know that every user action in the browser triggers an insert into your database.
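A rough Python sketch of the nightly step described above: read CloudFront access log lines, decode the tracker payload from the `cs-uri-query` field, and count page views per URL. It assumes the leading tab-separated field order of CloudFront web-distribution logs, the hypothetical `/i` pixel path and an `e=pv` page-view marker. CloudFront may percent-encode some fields again when writing the log, which a real importer would have to handle.

```python
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import parse_qs

# First 12 tab-separated fields of a CloudFront web-distribution access log
# line. Later log versions append more columns; they are ignored here.
FIELDS = ["date", "time", "x_edge_location", "sc_bytes", "c_ip", "cs_method",
          "cs_host", "cs_uri_stem", "sc_status", "cs_referer", "cs_user_agent",
          "cs_uri_query"]

PIXEL_PATH = "/i"  # assumed path of the tracking pixel on the distribution


def parse_line(line: str) -> Optional[dict]:
    """Parse one log line; return None for header lines (#Version, #Fields) or truncated lines."""
    if line.startswith("#"):
        return None
    parts = line.rstrip("\n").split("\t")
    if len(parts) < len(FIELDS):
        return None
    record = dict(zip(FIELDS, parts))
    # CloudFront logs "-" for an empty query string.
    query = "" if record["cs_uri_query"] == "-" else record["cs_uri_query"]
    # Decode the tracker payload, keeping the first value of each parameter.
    record["params"] = {k: v[0] for k, v in parse_qs(query).items()}
    return record


def count_page_views(lines: Iterable[str]) -> Counter:
    """Batch aggregation: number of page-view events ("e=pv") per page URL."""
    views: Counter = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec and rec["cs_uri_stem"] == PIXEL_PATH and rec["params"].get("e") == "pv":
            views[rec["params"].get("url", "-")] += 1
    return views


def _sample_line(ip: str, query: str) -> str:
    # Hypothetical log line; only the fields read above carry meaningful values.
    return "\t".join(["2014-12-20", "10:00:00", "LHR5", "480", ip, "GET",
                      "d1234example.cloudfront.net", "/i", "200", "-",
                      "Mozilla/5.0", query])


SAMPLE = [
    "#Version: 1.0",
    _sample_line("203.0.113.7", "e=pv&url=https%3A%2F%2Fexample.com%2Fpricing&page=Pricing"),
    _sample_line("203.0.113.8", "e=pv&url=https%3A%2F%2Fexample.com%2Fpricing&page=Pricing"),
    _sample_line("203.0.113.9", "e=pv&url=https%3A%2F%2Fexample.com%2F&page=Home"),
    _sample_line("203.0.113.9", "e=se"),  # a non-page-view event, skipped
]
print(count_page_views(SAMPLE))
```

Running this over a day's log files once per night keeps all parsing work off the web server, which is the trade-off argued for above.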

@mattab commented on December 28th 2014 Owner

> That means, your server is not even touched while users browse your website.

Sure, it's possible :+1: Check out our log analytics feature, which lets you import many log files into Piwik.
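Whether the stock importer can interpret a Snowplow payload carried in the query string is not settled in this thread; a custom bridge would have to replay each logged event against Piwik's Tracking HTTP API. Below is a hypothetical Python sketch of that mapping, assuming a record shaped like the output of a CloudFront log parser and a made-up Piwik host. `idsite`, `rec`, `url`, `action_name`, `cip`, `cdt` and `token_auth` are Tracking API parameters, but check their exact semantics against the Piwik documentation for your version.

```python
from urllib.parse import urlencode

# Hypothetical Piwik installation; piwik.php is the tracking endpoint.
PIWIK_TRACKER = "https://piwik.example.com/piwik.php"


def to_piwik_request(rec: dict, id_site: int, token_auth: str) -> str:
    """Turn one parsed CloudFront/Snowplow page view into a Piwik Tracking API URL.

    cip and cdt (visitor IP and hit time) are only honoured by Piwik when a
    valid token_auth is sent; CloudFront log timestamps are in UTC.
    """
    params = {
        "idsite": id_site,
        "rec": 1,
        "url": rec["params"].get("url", ""),
        "action_name": rec["params"].get("page", ""),
        "cip": rec["c_ip"],
        "cdt": f"{rec['date']} {rec['time']}",
        "token_auth": token_auth,
    }
    return PIWIK_TRACKER + "?" + urlencode(params)


# A record shaped like the output of a CloudFront log parser (hypothetical values).
record = {
    "date": "2014-12-20",
    "time": "10:00:00",
    "c_ip": "203.0.113.7",
    "params": {"e": "pv", "url": "https://example.com/pricing", "page": "Pricing"},
}
print(to_piwik_request(record, id_site=1, token_auth="YOUR_TOKEN"))
```

Sending each generated URL with an HTTP GET would record the hit; batching, retries and error handling are left out of this sketch.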

Make sure you use our latest RC release, which supports CloudFront logs: #5894 #6851 #6554

This Issue was closed on December 28th 2014
