@mattab opened this issue on April 13th 2009

Currently the Piwik tracking code has a noscript which could be used to record visits from people without Javascript enabled.

There is some work required to:

- filter out search engine bots
- filter out spam bots
- filter out all other types of bots

Of course this could also be used to log bots and show them in a specific Piwik report "Bot activity".

The initial design decision was to not record any visitor without Javascript as it is a lot of work to ensure that the data coming from Javascript-disabled devices is accurate and not bot initiated.

To record a visit without JS, you must call

piwik.php?idsite=$ID_SITE
          &rec=1
          &action_name=$ACTION_NAME
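
As a rough illustration of the request above, here is a minimal Python sketch that assembles such a URL from the three parameters listed (`idsite`, `rec`, `action_name`). The base URL `http://example.org/piwik/piwik.php` and the helper name `build_tracking_url` are assumptions made for the example, not part of Piwik.

```python
from urllib.parse import urlencode


def build_tracking_url(base_url: str, id_site: int, action_name: str) -> str:
    """Return a piwik.php request URL that records one action.

    base_url is a placeholder for wherever piwik.php is hosted (assumption).
    """
    params = {
        "idsite": id_site,           # ID of the website in Piwik
        "rec": 1,                    # rec=1, as in the request shown above
        "action_name": action_name,  # title of the page or action
    }
    # urlencode escapes spaces and special characters in the values.
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_tracking_url("http://example.org/piwik/piwik.php", 1, "Home page"))
```

Requesting the resulting URL from the server side, or from an image inside the noscript tag, is what lets a visit be recorded without any Javascript running in the browser.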

See also: PUSH API without Javascript #134

Keywords: bots noscript

@anonymous-piwik-user commented on August 31st 2009

To me this is a major issue, as non-javascript users still give us valuable information. I would love to see this implemented before 1.0

## Suggestion 1

Couldn't the http:BL by Project Honeypot be used to filter out any bots? They offer an API to identify Search Engines, Spammers and other bots by IP address.

Piwik could work like this:

- Javascript enabled:
  - count users as usual
- Javascript disabled:
  - Discard all users that are known search engine bots (by User-Agent)
  - Check IP of all remaining users against http:BL, discard if known bot, count otherwise.

This way traffic for the blacklist server would be kept low. I still think every Piwik installation would need their own API key, though.
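
The decision flow in Suggestion 1 could be sketched roughly as follows. This is only an illustration: the blacklist lookup is passed in as a plain function (`is_blacklisted`) rather than a real http:BL query, and the short User-Agent token list is an assumption chosen for the example.

```python
from typing import Callable

# Assumption: a tiny illustrative list; a real setup would use a maintained one.
KNOWN_SEARCH_BOT_TOKENS = ("googlebot", "msnbot", "slurp")


def should_count_visit(
    js_enabled: bool,
    user_agent: str,
    ip: str,
    is_blacklisted: Callable[[str], bool],
) -> bool:
    """Decide whether a request should be counted as a visit."""
    if js_enabled:
        # Javascript enabled: count users as usual.
        return True
    ua = user_agent.lower()
    if any(token in ua for token in KNOWN_SEARCH_BOT_TOKENS):
        # Known search engine bot, identified by User-Agent: discard.
        return False
    # Only the remaining requests reach the (hypothetical) blacklist lookup,
    # which keeps traffic to the blacklist server low.
    return not is_blacklisted(ip)


if __name__ == "__main__":
    fake_blacklist = {"203.0.113.7"}  # stand-in for an http:BL lookup
    lookup = lambda ip: ip in fake_blacklist
    print(should_count_visit(False, "Lynx/2.8", "192.0.2.1", lookup))
    print(should_count_visit(False, "Lynx/2.8", "203.0.113.7", lookup))
```

Checking the User-Agent first means that requests from well-known crawlers never trigger a remote lookup at all.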

## Suggestion 2

Piwik should include its own tiny honeypot. The <noscript> tag should include a link that is invisible to the user and that has rel=nofollow.

<a href="http://domain/piwik/honeypot.php" rel="nofollow">&nbsp;</a>

Only malicious crawlers will follow this link, so Piwik can exclude their IPs from tracking. Known, well-behaving search bots can still be identified by User-Agent. This way, most bots will probably get identified.
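
A minimal sketch of the bookkeeping behind Suggestion 2 might look like the following. The class name `HoneypotFilter` and its methods are hypothetical, and the sketch keeps trapped IPs in memory only, whereas a real plugin would presumably persist them.

```python
class HoneypotFilter:
    """Remember IPs that requested the honeypot link and exclude them."""

    def __init__(self) -> None:
        # In-memory set of trapped IP addresses (assumption: no persistence).
        self._trapped_ips = set()

    def record_honeypot_hit(self, ip: str) -> None:
        """Call this when honeypot.php is requested by the given IP."""
        self._trapped_ips.add(ip)

    def should_track(self, ip: str) -> bool:
        """Return False for IPs that have followed the honeypot link."""
        return ip not in self._trapped_ips


if __name__ == "__main__":
    honeypot = HoneypotFilter()
    honeypot.record_honeypot_hit("198.51.100.4")
    print(honeypot.should_track("198.51.100.4"))
    print(honeypot.should_track("192.0.2.9"))
```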

@anonymous-piwik-user commented on November 6th 2009

Replying to matt: I want this feature too. Users are not the only interesting visitors; I want to see which bots crawl my site.

@philmck commented on January 13th 2010

Can I add my vote for this as well? We're missing out on visits from many mobile phone users and disabled people using screen readers, for example, because they don't have javascript. And there are legitimate reasons for disabling javascript in a normal browser as well. I agree we need to separate out the bots somehow for the statistics, but really that's a separate issue. I'd like the option of counting all visitors, even if that includes bots.

@anonymous-piwik-user commented on February 4th 2010

The code

<a href="http://domain/piwik/honeypot.php" rel="nofollow">&nbsp;</a>

will be visible to blind persons using screen-reader software. It would be better to code this as

<a href="http://domain/piwik/honeypot.php" rel="nofollow" style="display:none;">&nbsp;</a>

which will also hide it from the screen readers.

Hope this helps, Charles Belov SFMTA Webmaster www.sfmta.com/webmaster

@robocoder commented on February 4th 2010

Charles: that's not our tracking code. Piwik's tracking code doesn't contain an anchor link (honeypot or otherwise).

@robocoder commented on February 4th 2010

re: comment:3 - The idea behind the noscript tag is to track Javascript-disabled visitors. We'll provide a hook here so third-party plugins can implement suggestion 2.

@mattab commented on March 18th 2010

In order to report search engine bot activity, we could reuse some of the GPL code from http://www.crawltrack.net/ which is a php bot tracker tool. The logic could sit in a Piwik plugin. There could be a new sub tab, that would report bot activity for each bot that was seen during the selected date range.

Bots would be identified by user agent and/or IP address; see, e.g., the list at crawltrack: http://www.crawltrack.net/crawlerlist.php

Additional features could include:

- give the ratio of bot vs. human activity on the website (what percentage of traffic comes from bots vs. humans)
- for a given bot on a given day, list all pages crawled
- list bot crawling frequency in a new column (next to Visits, Page views, etc.), e.g. Google can crawl one page every 10s, other bots would crawl one page every 1 min, etc.
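
To make the proposed metrics concrete, here is a small sketch of how they could be computed from a list of hits. The hit format (a tuple of timestamp, bot name or `None` for a human, and URL) and all function names are assumptions made for the example; this is not how Piwik stores its data.

```python
from collections import defaultdict
from datetime import date, datetime


def bot_share(hits):
    """Fraction of all hits that came from bots (0.0 if there are no hits)."""
    if not hits:
        return 0.0
    bot_hits = sum(1 for _, bot, _ in hits if bot is not None)
    return bot_hits / len(hits)


def pages_crawled(hits, bot_name, day):
    """URLs requested by one bot on one day, in the order they appear."""
    return [url for ts, bot, url in hits if bot == bot_name and ts.date() == day]


def mean_crawl_interval_seconds(hits):
    """Average number of seconds between consecutive hits, per bot."""
    stamps_by_bot = defaultdict(list)
    for ts, bot, _ in hits:
        if bot is not None:
            stamps_by_bot[bot].append(ts)
    intervals = {}
    for bot, stamps in stamps_by_bot.items():
        stamps.sort()
        if len(stamps) >= 2:  # an interval needs at least two hits
            span = (stamps[-1] - stamps[0]).total_seconds()
            intervals[bot] = span / (len(stamps) - 1)
    return intervals


if __name__ == "__main__":
    sample = [
        (datetime(2010, 3, 18, 12, 0, 0), "Googlebot", "/"),
        (datetime(2010, 3, 18, 12, 0, 5), None, "/about"),
        (datetime(2010, 3, 18, 12, 0, 10), "Googlebot", "/about"),
        (datetime(2010, 3, 18, 12, 0, 20), "Googlebot", "/contact"),
    ]
    print(bot_share(sample))
    print(pages_crawled(sample, "Googlebot", date(2010, 3, 18)))
    print(mean_crawl_interval_seconds(sample))
```

With the sample data, three of the four hits come from a bot, and Googlebot's three hits are spread over 20 seconds, which gives an average of one page every 10 seconds.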

@anonymous-piwik-user commented on March 28th 2010

I think it would also be interesting to track robots, e.g. for big sites. With this feature you could see how many bots are scraping your site. It also makes sense to see Googlebot, Msnbot and maybe Slurp (Yahoo).

This should be tracked in a separate table by a dedicated plugin, like Live Bots ;-)

In my tool http://www.spider-trap.de/en_index.html I ban a lot of bad bots. Maybe Piwik could notify the webmaster when a bot is crawling.

@mattab commented on July 29th 2010

The Tracking API has been released. It can help track visitors without Javascript, and even visits from mobile apps, desktop apps and more.

http://piwik.org/docs/tracking-api/

This issue was closed on July 29th 2010