@gka opened this Issue on July 4th 2013 Member

Listing the referrer websites can be significantly improved by normalizing the domain names. Currently, subdomains such as "www7" are treated as separate websites. Here's an example of such a referrer list, in which you can see that lemonde.fr is listed several times:

[[Image(http://new.tinygrab.com/f3aa221edeba52ea05e91e20b51690a2c38c508b47.png)]]

Of course this is not trivial, as some sub-domains are pointing to separate websites while others are only mirrors or mobile variants of the same site.

To solve this issue, Mozilla maintains a list of "effective" TLD names. This list includes domains such as bl0gsp0t.com and dyndns.org, because X.dyndns.org should be treated as a separate website.

http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1

Using this list it is easy to normalize the domains, or in other words, to extract the "effective" websites. The list is not perfect (for instance tumbr.com is missing) but it should solve 95% of the problem.
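To make the normalization step concrete, here is a minimal sketch in Python (not Piwik code). It assumes a tiny hand-picked `SUFFIXES` set in place of the real Mozilla file and ignores the wildcard and exception rules that file also contains; `effective_domain` is a hypothetical name.

```python
# Tiny hand-picked excerpt standing in for the real list (assumption: the
# real file also has wildcard and exception rules, which are ignored here).
SUFFIXES = {"com", "fr", "org", "uk", "co.uk", "dyndns.org"}


def effective_domain(hostname: str) -> str:
    """Return the longest matching suffix plus one more label to its left."""
    labels = hostname.lower().strip(".").split(".")
    # Try the longest candidate suffix first, so "co.uk" wins over "uk".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in SUFFIXES:
            # Keep one label in front of the suffix, if there is one.
            return ".".join(labels[max(i - 1, 0):])
    # Unknown suffix: fall back to the last two labels.
    return ".".join(labels[-2:])


print(effective_domain("www7.lemonde.fr"))   # lemonde.fr
print(effective_domain("a.foo.dyndns.org"))  # foo.dyndns.org
```

Because dyndns.org is itself in the suffix set, each X.dyndns.org stays a separate website, while www7.lemonde.fr collapses to lemonde.fr.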

@mattab commented on July 14th 2013 Owner

Good idea to use a list to improve the referrer website.
For the lemonde example, though, I feel that having all the subdomains brings value, as it helps to see which sub-sites bring more traffic. lemonde is not in the list, so it makes sense.

We could also implement this as a plugin in the upcoming marketplace at: http://plugins.piwik.org/

@gka commented on July 16th 2013 Member

Another very smart solution would be to just group the visits by domain and subdomain. This seems easier, as we don't need to maintain the effective TLD list at all. The result could look like this:

||= Website =||= Visits =||
|| guardian.co.uk || 503108||
|| lemonde.fr || 303471||
|| - www.lemonde.fr || 177113||
|| - decodeurs.blog.lemonde.fr || 83375||
|| - emploi.blog.lemonde.fr || 30323||
|| - abonnes.lemonde.fr || 7412||
|| - mobile.lemonde.fr || 2652||
|| - alicedsl.lemonde.fr || 2596||
|| derstandard.at || 58850||

Ok, we might still need to maintain a shorter list of effective TLDs containing some country-specific ones, such as co.uk, but we don't need to cover company-specific ones such as blogsp0t.com, as users can easily unfold the domain to see which blogs are linking the most.
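A rough sketch of this grouping, in Python rather than Piwik's PHP, using the numbers from the table above. The `COUNTRY_SUFFIXES` set and the function names are assumptions made for illustration only.

```python
from collections import defaultdict

# Short list of multi-label country suffixes (assumption: illustrative only).
COUNTRY_SUFFIXES = {"co.uk", "com.br"}


def registered_domain(hostname):
    labels = hostname.lower().split(".")
    # Keep three labels when the last two form a known country suffix.
    keep = 3 if ".".join(labels[-2:]) in COUNTRY_SUFFIXES else 2
    return ".".join(labels[-keep:])


def group_by_domain(visits):
    """Map each domain to (total visits, {hostname: visits})."""
    children = defaultdict(dict)
    for hostname, count in visits.items():
        children[registered_domain(hostname)][hostname] = count
    return {d: (sum(subs.values()), subs) for d, subs in children.items()}


visits = {
    "guardian.co.uk": 503108,
    "www.lemonde.fr": 177113,
    "decodeurs.blog.lemonde.fr": 83375,
    "emploi.blog.lemonde.fr": 30323,
    "abonnes.lemonde.fr": 7412,
    "mobile.lemonde.fr": 2652,
    "alicedsl.lemonde.fr": 2596,
    "derstandard.at": 58850,
}
grouped = group_by_domain(visits)
print(grouped["lemonde.fr"][0])  # 303471
```

Each top-level row carries the summed total, and the per-hostname dictionary is what a user would see when unfolding the domain.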

(btw I hate this comment system which always blacklists my comments just because I include blogsp0t.com. silly!)

@mattab commented on July 16th 2013 Owner

Great idea to add a new "view" of the report with subtables showing subdomains.

Maybe we could show this new report as a footer link, a Related Report called "Websites by Domain", under the "Websites" report.

  • Maybe we could "save" as preference on click, as part of #1915
  • or maybe in general we could make "Related Report" link more visible (see for example under Page Titles report)

Or maybe as a "COG" dropdown option.

@gka commented on July 22nd 2013 Member

I would prefer making the hierarchical view the new default and then let the user "make it flat" as we are doing with the Pages report.

Anyone thinking that the flat view is better than grouping by domain?

@mattab commented on January 13th 2014 Owner

Nice idea for a plugin which could filter out the Referrers dataTable to make the grouping as explained here!

@gka commented on October 17th 2014 Member

As a first step toward this, I worked on a PHP implementation for extracting the "effective" domain name of a hostname.

Usage is very simple:

> include('EffectiveDomainName.php');

> print EffectiveDomainName::get('mobile.nytimes.com') . "\n";
nytimes.com

> print EffectiveDomainName::get('flightjs.github.io') . "\n";
flightjs.github.io

> print EffectiveDomainName::get('www.google.com.br') . "\n";
google.com.br

https://github.com/gka/effective-domain-name

@mattab commented on November 27th 2014 Owner

@gka Thanks for the tip.

Weird that this issue got closed, I don't think I closed it unless it was by mistake...

It would be relatively easy to create a plugin that either modifies the existing getWebsites report or adds a new related report in which we call a GroupBy filter that groups rows by "effective domain".

@mattab commented on November 27th 2014 Owner

Would you also group t.co under twitter.com ?

and maybe group m.facebook.com and lm.facebook.com under facebook.com ?

@gka commented on November 27th 2014 Member

Since facebook.com is not listed as effective TLD (aka "public suffix"), any subdomain *.facebook.com will indeed be "normalized" to facebook.com. However, t.co is not being "grouped" with twitter.com, as both are entirely different domains.

@mattab commented on December 1st 2014 Owner

Hi @gka alright

maybe we could use your list and then customise it with all known social networks domains for example.
I'm setting to Short term as it's quite easy to build this at least in a plugin on the Marketplace

We'd simply apply the normalisation function in a custom filter that would GroupBy the labels. Ideally, it would be possible to disable it in the Cog icon menu.
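For illustration, the group-by-normalised-label idea could look roughly like the Python sketch below. This is not Piwik's actual GroupBy filter: `normalise`, `group_by`, and the `ALIASES` table (standing in for the "known social network domains" customisation) are hypothetical, and the visit counts are made up.

```python
from collections import Counter

# Hypothetical alias table for short-link or alternate domains (assumption).
ALIASES = {"t.co": "twitter.com"}


def normalise(label):
    # Placeholder normalisation: last two labels, then the alias table.
    domain = ".".join(label.lower().split(".")[-2:])
    return ALIASES.get(domain, domain)


def group_by(rows, key_func):
    """Sum visit counts of rows whose labels map to the same key."""
    totals = Counter()
    for label, visits in rows:
        totals[key_func(label)] += visits
    return dict(totals)


# Made-up counts, for illustration only.
rows = [
    ("m.facebook.com", 10),
    ("lm.facebook.com", 5),
    ("t.co", 7),
    ("twitter.com", 3),
]
print(group_by(rows, normalise))  # {'facebook.com': 15, 'twitter.com': 10}
```

Disabling the filter would then simply mean grouping with the identity function instead of `normalise`.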
