@mattab opened this Issue on August 3rd 2014 Owner

Problem: when IP anonymisation is enabled, it is still easy to profile users across days, as they may be the only ones in their anonymised IP address range. So a Piwik admin could very easily find matching visitors in previous days despite the fact that the IP address is anonymised. This was suggested by Richard Stallman from the FSF.

Goal: Help users prevent future surveillance on themselves. Provide better privacy to users measured by Piwik. Preventing the long-term tracking of users is of the highest importance; if we hit an obstacle, we must not simply give up.

Proposed solution: When IP anonymisation is enabled, hash the IP address in a way that prevents fingerprinting. For example, hash the anonymised IP using a seed that changes each day.

Advantages:

  • the algorithm to detect unique visitors using IP address hash would still work for matching visitors to the current day's traffic
  • it would bring added privacy to users tracked in Piwik, as their IP address would be hashed and the hash would differ from day to day.
  • the fingerprinting hash should use the hashed visitor IP
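
A minimal sketch of the proposed solution, in Python for illustration only (Piwik itself is written in PHP). It assumes the anonymised IP is an IPv4 address with its last bytes zeroed, and that the daily seed is a random secret used as an HMAC key. The names `daily_secret`, `mask_ip` and `hashed_ip` are hypothetical and are not part of Piwik.

```python
import datetime
import hashlib
import hmac
import ipaddress
import secrets

# Hypothetical in-memory store of per-day secrets. A real deployment would
# have to persist today's secret and delete older ones.
_daily_secrets: dict[datetime.date, bytes] = {}


def daily_secret(day: datetime.date) -> bytes:
    """Return the random secret for `day`, creating it on first use."""
    if day not in _daily_secrets:
        _daily_secrets[day] = secrets.token_bytes(32)
    return _daily_secrets[day]


def mask_ip(ip: str, masked_bytes: int = 2) -> str:
    """Zero the last `masked_bytes` bytes of an IPv4 address."""
    packed = ipaddress.IPv4Address(ip).packed
    kept = packed[: 4 - masked_bytes] + b"\x00" * masked_bytes
    return str(ipaddress.IPv4Address(kept))


def hashed_ip(ip: str, day: datetime.date, masked_bytes: int = 2) -> str:
    """HMAC-SHA256 of the masked IP, keyed with that day's secret."""
    masked = mask_ip(ip, masked_bytes).encode("ascii")
    return hmac.new(daily_secret(day), masked, hashlib.sha256).hexdigest()


today = datetime.date(2014, 12, 10)
later = datetime.date(2014, 12, 13)
same_day_match = hashed_ip("123.45.67.89", today) == hashed_ip("123.45.67.89", today)
cross_day_match = hashed_ip("123.45.67.89", today) == hashed_ip("123.45.67.89", later)
```

Within one day the same address always maps to the same value, so unique-visitor matching for the current day keeps working. Across days the values differ because each day has its own secret, and once an old secret is deleted the old hashes can no longer be recomputed from an IP.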

What do you think?

refs #5052

@kylekatarnls commented on August 3rd 2014 Contributor

Agreed, but will it still be possible to know whether a visitor is new or returning with anonymisation if the hash changes each day?

@mattab commented on August 3rd 2014 Owner

@kylekatarnls if the visitor has First party cookies enabled, then yes Piwik will be able to detect returning visitors.

@mattab commented on September 9th 2014 Owner

This is a requirement to become a GNU Package: #5276
see also #6160

@mattab commented on October 20th 2014 Owner

From RMS:

http://www.theguardian.com/commentisfree/2014/oct/17/whisper-private-secret-sharing-app-anonymity
talks about the danger of identifying people from the approximate geolocations when they visit a site.

@mattab commented on December 10th 2014 Owner

Trying to get my head around this issue.... I think that:

  • we need to change what it means to anonymise IPs, and how we store anonymised IPs in the DB in log_visit.location_ip
    • To prevent surveillance, we must hash the IP address with a random seed that changes every day.
    • example: a user visiting from the same IP address today and then three days later visits again: in the Piwik database we will see a different value for each day in the field log_visit.location_ip
    • it will be impossible to display IPs in the Visitor Log or Visitor Profile. We would show "anonymised IP address" instead of 123.45.0.0.
  • Piwik has privacy built-in, we'd like to set this anonymisation as the new default.
    • users could disable via the config file rather than the UI; by default it will read e.g. enable_ip_anonymisation_really_anonymous=1
  • Maybe we can remove the section Select how many bytes of the visitors' IPs should be masked. from the Privacy settings page?

Other

  • Enabled by default. Can be disabled in config, e.g. enable_ip_anonymisation_really_anonymous=0
  • when the IP is anonymised, the segment visitIp should be hidden as it won't match the visit as expected
  • if the config setting window_look_back_for_visitor is non zero, then we should automatically set enable_ip_anonymisation_really_anonymous=0 as it is required for this feature to work (and BC)
  • need to tweak the FAQs on finding a visitor's IP and selecting the visitor IP from the database
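
The interaction between these settings could be sketched as follows, in Python for illustration only. The setting name enable_ip_anonymisation_really_anonymous is the placeholder used in this comment (a better name was still being discussed), and the helpers `effective_daily_hashing` and `visible_segments` are hypothetical.

```python
def effective_daily_hashing(config: dict) -> bool:
    """Decide whether daily-rotating IP hashing should be active."""
    # Enabled by default when the setting is absent.
    requested = bool(int(config.get("enable_ip_anonymisation_really_anonymous", 1)))
    look_back = int(config.get("window_look_back_for_visitor", 0))
    # A non-zero look-back window needs IP-based values that stay stable
    # across days, so it forces the new behaviour off.
    if look_back != 0:
        return False
    return requested


def visible_segments(segments: list[str], config: dict) -> list[str]:
    """Hide the visitIp segment when stored IPs are hashed."""
    if effective_daily_hashing(config):
        return [name for name in segments if name != "visitIp"]
    return list(segments)
```

With an empty config the feature is on and visitIp is hidden; setting window_look_back_for_visitor to any non-zero value turns the feature off and leaves the segment list unchanged.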

@mnapoli commented on December 10th 2014 Member

To prevent surveillance, we must hash the IP address with a random seed that changes every day.

:+1:

users could disable via the config file rather than the UI

Why make it harder than it is today? Why not leave it in the UI?

enable_ip_anonymisation_really_anonymous=1

really_anonymous is like mysql_real_escape_string(), we should try to find a better naming

What about Piwik installs that use IP anonymisation today? Will the new anonymisation method replace the old one?

@mattab commented on December 10th 2014 Owner

Why make it harder than it is today? Why not leave it in the UI?

Because users won't be able to view the "bytes that were not anonymised from the IP addresses" anyway, it may be confusing to give them control there when in the end it will not affect the "visibility" of the reports. It seems to me that it will become an implementation detail of the new, better anonymisation algorithm?

Will the new anonymisation method replace the old one?

I guess it's safer not to, in case some users depend on it somehow. Maybe we add an upgrade task to set the setting to 0 for those users.

+1 to find a better setting name! anonymised_ip_prevent_big_brother or anonymised_ip_prevent_surveillance or anonymised_ip_look_different_each_day ?

@tsteur commented on December 11th 2014 Owner

Sounds rather like a new plugin to me, in case someone wants to have another level of anonymization... Or, if you think most users actually want this behavior, replace the default behavior and move the old one into a plugin so users can still use the previous one.

@mnapoli commented on December 11th 2014 Member

+1

It's confusing:

  • no_anonymisation
  • anonymisation_but_not_really_anonymous
  • anonymisation_really_this_is_the_real_one

It needs to be clear and simple: either it's anonymized or it's not. And in the end: does Piwik change its definition of what "anonymisation" means? If not, then we put the new method in a plugin because that's not the Piwik-endorsed way. If yes, then we put the old method in a plugin (or we mark it "not-recommended" in the UI) because that's no longer the Piwik way. We can't go at it half-heartedly, it will just confuse people (but we should still keep BC for users though!).

And coming back again on this because I'm not sure I understood your answer:

users could disable via the config file rather than the UI

As a user, I want to be able to disable IP anonymisation. Going in the config file is a no-no for me, e.g. I'm using Piwik Cloud.

@mattab commented on December 11th 2014 Owner

Alright I think we do it this way:

  • we keep current setting Anonymize Visitors' IP addresses. when Yes is selected....
    • show a new setting below it, Keep IP secret. The inline help reads Select "Yes" if you want to respect your visitors' privacy by encrypting their IP addresses.
    • When yes is selected, hide the Select how many bytes of the visitors' IPs should be masked. (since this setting becomes almost irrelevant).
    • When no is selected, display Select how many bytes of the visitors' IPs should be masked. (current behavior)

I think this makes full sense and is nicely integrated (the config setting was a bad idea)

@tsteur commented on December 11th 2014 Owner

Can we put this into a plugin? That's why there is a Tracker.setVisitorIp event ;) So people could just activate it by installing (or disable it by deactivating) the plugin
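
The plugin approach could look roughly like the sketch below. It is written in Python purely for illustration and uses a toy event registry; it is not Piwik's PHP event API. Only the event name Tracker.setVisitorIp comes from this thread. The helpers `on`, `off`, `post_event`, `resolve_visitor_ip` and `secret_ip_plugin` are hypothetical, and the fixed key stands in for the daily-rotating seed discussed above.

```python
import hashlib
import hmac
from typing import Callable

# Toy event registry standing in for a plugin system (an assumption, not Piwik code).
_handlers: dict[str, list[Callable[[dict], None]]] = {}

# Placeholder key; a real plugin would rotate this every day.
_PLUGIN_KEY = b"replace-with-daily-random-secret"


def on(event: str, handler: Callable[[dict], None]) -> None:
    """Register a handler, as activating a plugin would."""
    _handlers.setdefault(event, []).append(handler)


def off(event: str, handler: Callable[[dict], None]) -> None:
    """Unregister a handler, as deactivating a plugin would."""
    if handler in _handlers.get(event, []):
        _handlers[event].remove(handler)


def post_event(event: str, payload: dict) -> None:
    """Call every handler registered for `event` with a mutable payload."""
    for handler in list(_handlers.get(event, [])):
        handler(payload)


def resolve_visitor_ip(raw_ip: str) -> str:
    """Let plugins rewrite the visitor IP before it is stored."""
    payload = {"ip": raw_ip}
    post_event("Tracker.setVisitorIp", payload)
    return payload["ip"]


def secret_ip_plugin(payload: dict) -> None:
    """Replace the IP with a keyed hash so the raw address is never stored."""
    digest = hmac.new(_PLUGIN_KEY, payload["ip"].encode("ascii"), hashlib.sha256)
    payload["ip"] = digest.hexdigest()
```

Calling `on("Tracker.setVisitorIp", secret_ip_plugin)` switches the behaviour on and `off(...)` switches it back, which mirrors the idea of enabling the feature by activating a plugin and disabling it by deactivating the plugin.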
