@hi2u opened this issue on July 22nd 2015

It's great that Piwik now keeps an up-to-date list of referral spammers, but this doesn't seem to affect any stats already recorded in the past?

It would be great if there were a feature to delete all historical data from referral spam bots. This could either be triggered automatically, or maybe with a button in the web interface or a command on the server.

I only recently upgraded my Piwik installation, so most of my stats are still clogged up with referral spam.

There's also the fact that even with the new auto-updating spammer list, there will always be a gap between the time spammers are discovered, the time they are added to the list, and the time the updated list reaches each Piwik installation. A feature that deleted historical spam would close that gap going forward too, giving everyone an "eventually pure" spam-free database.

Another bonus advantage is reducing the size of the database. 99.99% of us don't want this data at all, and for small websites it can easily account for 50% to 99% of your stats.

Maybe a very small number of people want referral spam kept, so for those who do, it could be made optional. Although I think auto-deleting junk data would be a sensible default.

If somebody is able to implement this and it's going to be a manually triggered thing, it would be great if it can be applied to all websites in the Piwik installation in one go.

@nekohayo commented on September 7th 2015

Aye, some people have been requesting this in http://forum.piwik.org/read.php?2,127138 for example.

In https://github.com/piwik/piwik/issues/3385#issuecomment-69985675 @mattab has mentioned a SQL command to delete some crap from (I think) the two main affected tables (besides archives), "log_visit" and "log_link_visit_action". I've been scratching my head tonight trying to understand that sample SQL command to see if I could adapt it to this case, to search for those blacklisted referrers in the "referrer_url" or "referrer_name" fields/columns… but I didn't figure it out. I think the solution (in terms of the command) is probably 99% there, but I'm not familiar with SQL (yet), so the logic in that query eludes me.

But a feature in Piwik that would take care not only of the logs but also the archives (because reprocessing years of archives for multiple sites is long!) would be absolutely wonderful.
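For the log side of this, here is a rough sketch of the kind of cleanup being discussed. It is not the command from the linked comment and not anything Piwik ships. It runs against a toy SQLite database so it can be tried safely. The table names come from the comment above; the column names (`idvisit`, `referer_name`, `referer_url`) and the helper names `build_demo_db` and `delete_spam_visits` are assumptions made for illustration, so check them against your own schema. It matches on the exact referrer name only and does not touch archives, which would still need reprocessing.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in for the two log tables (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE log_visit (
            idvisit INTEGER PRIMARY KEY,
            referer_name TEXT,
            referer_url TEXT
        );
        CREATE TABLE log_link_visit_action (
            idlink_va INTEGER PRIMARY KEY,
            idvisit INTEGER
        );
        INSERT INTO log_visit VALUES
            (1, 'spam.example', 'http://spam.example/'),
            (2, 'good.example', 'http://good.example/page'),
            (3, 'spam.example', 'http://spam.example/x');
        INSERT INTO log_link_visit_action VALUES (1, 1), (2, 2), (3, 3), (4, 3);
        """
    )
    return conn


def delete_spam_visits(conn, spam_domains):
    """Delete visits whose referrer name is blacklisted, plus their actions.

    Returns the number of rows removed from log_visit.
    """
    spam_domains = list(spam_domains)
    if not spam_domains:
        return 0
    # One "?" placeholder per domain keeps the query parameterized.
    placeholders = ",".join("?" for _ in spam_domains)
    cur = conn.cursor()
    # Delete the dependent action rows first, while the visits still exist.
    cur.execute(
        "DELETE FROM log_link_visit_action WHERE idvisit IN ("
        "SELECT idvisit FROM log_visit WHERE referer_name IN (%s))" % placeholders,
        spam_domains,
    )
    # Then delete the spam visits themselves.
    cur.execute(
        "DELETE FROM log_visit WHERE referer_name IN (%s)" % placeholders,
        spam_domains,
    )
    deleted = cur.rowcount
    conn.commit()
    return deleted


if __name__ == "__main__":
    conn = build_demo_db()
    removed = delete_spam_visits(conn, ["spam.example"])
    print("visits removed:", removed)
```

On a real installation you would want a backup first, and probably a SELECT with the same WHERE clause to see what would be removed before running any DELETE.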
