@laszlovl opened this Issue on November 13th 2017

Currently, conversions are attributed (to a referer or campaign) by two means:

  1. By piwik.js storing the campaign name & keyword in a cookie and passing them as the _rcn and _rck parameters in subsequent visits
  2. By the server side extracting referer information from the current visit

This limits how accurately conversions can be attributed. If a user converts in a 2nd or later visit (where the URL referer/campaign information is no longer available) and the cookie is unavailable, the attribution for that conversion is lost. For example:

  1. Thanks to campaign X, a user initially visits the site on their cellphone
  2. By setting a user-id, the user is tracked across multiple devices
  3. One day later, the user visits on their tablet and converts

The conversion is now attributed to "direct entry" instead of campaign X.

The same thing happens:

  • When the user has cleared cookies between steps 1 and 3
  • When the conversion is tracked by an external system instead of through piwik.js

Other limitations are:

  • Dimensions other than campaign name & keyword (such as source & medium that are added by the campaign plugin) are currently not stored in the cookie, so they are always lost between visits
  • The architecture can only deal with a single source, so multi-channel attribution (#6064) is very hard to implement on top of it

I propose that cross-visit goal attribution be handled on the server side instead. When a conversion is created, instead of fetching dimensions from the current visit, we should simply look through all of the visitor's visits and use the last (or first) visit with non-empty attributes. setConversionAttributionFirstReferrer would move from piwik.js to a server-side configuration option.
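
To make the proposed lookup concrete, here is a minimal sketch in Python rather than Piwik's own code. The function names, the `use_first` flag and the sample visit dictionaries are assumptions made for illustration; only the campaign column names come from the query in this issue.

```python
from typing import Dict, List, Optional

# Campaign columns as named in the query in this issue.
CAMPAIGN_FIELDS = (
    "campaign_content", "campaign_id", "campaign_keyword",
    "campaign_medium", "campaign_name", "campaign_source",
)


def has_attribution(visit: Dict) -> bool:
    """True if any campaign dimension on the visit is non-empty."""
    return any(visit.get(field) for field in CAMPAIGN_FIELDS)


def pick_attribution_visit(visits: List[Dict], use_first: bool = False) -> Optional[Dict]:
    """Return the first or last visit (ordered by idvisit) that carries
    attribution data, or None if no visit has any.

    `use_first` is a hypothetical stand-in for a server-side equivalent of
    setConversionAttributionFirstReferrer.
    """
    candidates = sorted(
        (v for v in visits if has_attribution(v)),
        key=lambda v: v["idvisit"],
    )
    if not candidates:
        return None
    return candidates[0] if use_first else candidates[-1]


# The cross-device example from this issue: campaign X on the cellphone,
# conversion one day later on the tablet (same visitor via user-id).
visits = [
    {"idvisit": 1, "device": "cellphone", "campaign_name": "X"},
    {"idvisit": 2, "device": "tablet", "campaign_name": None},
]
chosen = pick_attribution_visit(visits)
print(chosen["campaign_name"] if chosen else "direct entry")
```

With this approach the tablet conversion resolves to campaign X because the lookup spans all of the visitor's stored visits rather than only the current one.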

As far as I can see, this would solve all of the problems above without significant performance issues:

explain select * from piwik_log_visit where idvisitor='abc' and (campaign_content is not null or campaign_id is not null or campaign_keyword is not null or campaign_medium is not null or campaign_name is not null or campaign_source is not null) order by idvisit desc limit 1;
+----+-------------+-----------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table           | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-----------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | piwik_log_visit | NULL       | index | NULL          | PRIMARY | 8       | NULL |    1 |    16.67 | Using where |
+----+-------------+-----------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

Perhaps the only caveat would be installations where the raw visitor logs are deleted after X days (180 by default), so attribution wouldn't be possible if the conversion happens more than X days after the initial visit. I don't think that's an issue, but if it is, we could keep the cookie attribution information as a fallback.
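
The fallback order could look roughly like the sketch below. This is an illustration only: `resolve_attribution`, its parameters and the dictionary shapes are hypothetical names, not existing Piwik APIs.

```python
from typing import Dict, Optional


def resolve_attribution(
    from_visits: Optional[Dict[str, str]],
    from_cookie: Optional[Dict[str, str]],
) -> Dict[str, str]:
    """Prefer attribution found in the stored visits; if none is found
    (for example because the raw logs were purged), fall back to the
    values the cookie supplied; otherwise treat it as a direct entry."""
    if from_visits:
        return from_visits
    if from_cookie:
        return from_cookie
    return {"referer_type": "direct entry"}


# Raw logs already deleted, but the cookie still carries the campaign.
print(resolve_attribution(None, {"campaign_name": "X", "campaign_keyword": "shoes"}))
```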

@laszlovl commented on November 14th 2017