@imoullet opened this Issue on February 17th 2014

Running
piwik 2.1 RC1
python 2.7.1

I have been discussing this problem in the forum for 4 weeks now (
http://forum.piwik.org/read.php?2,110277 ) without getting any solution, and I really think there is a bug in import_logs.py. All the details are in the discussion linked above.

Here is a summary with some test log files.

I run the following command
/usr/bin/python2.7 /var/www/html/piwik/misc/log-analytics/import_logs.py --url=https://w3stat.unil.ch/piwik/ /var/tmp/stats/app/xxxx --idsite=xxx --config=/var/www/html/piwik/config/config.ini.php --recorders=2 --log-hostname=www3.unil.ch --hostname=www3.unil.ch --enable-static --enable-bots --enable-http-errors --enable-http-redirects --enable-reverse-dns --strip-query-string

on the two log files I am attaching (22nd of January and 14th of February).

You can have a look at the results for this site on our Piwik site: https://w3stat.unil.ch/piwik using piwik/debug4piwik as user/pwd.

You will see that the Piwik results are wrong both for the visitor log (some IPs are ignored for the 22nd of January AND for the 14th of February) and for the Actions > Pages report.

For example, some IPs are missing from the visitor log report for the 22nd of January:

65.55.24.218 and 83.233.207.74 are not there, although they are present in the log files (see my previous message).

And the Actions > Pages report is empty, even though I have accesses such as

83.139.189.139 - - +0100 "GET /wpmu/alumnil/participez-a-la-construction-dun-nouvel-avenir-technologique-et-social/ HTTP/1.0" 200 34367 "http://www3.unil.ch/wpmu/alumnil/participez-a-la-construction-dun-nouvel-avenir-technologique-et-social/" "Mozilla/5.0 (Windows NT 5.2; rv:17.0) Gecko/20100101 Firefox/17.0"

in my logfile

I should also mention that for the same site I do see some accesses (in the Actions > Pages report), because I use the WP Piwik plugin for this individual site!

The Actions > Downloads report is the only one that seems to be correct.

So, in conclusion, I cannot compare the results for each individual WordPress site generated with the WP Piwik plugin against the results for all my WordPress sites generated with import_logs.py. Indeed, the results for all WP sites together (i.e. 250 sites!) are much lower than for one individual site. That is what alerted me that something was wrong with import_logs.py!

I am really confused about this.
I get the same results for all my parsed log files. They all come from an Apache web server using the combined (NCSA) format.

Please let me know if you need more information. The import output is correct in the sense that it reports the correct number of lines.
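To compare the log file against the visitor log independently of the importer, one option is to tally requests per client IP straight from the file. The following is only a minimal sketch, not part of import_logs.py: the function name `count_requests_per_ip` is hypothetical, and it assumes the client IP is the first whitespace-separated field of each line, as in the combined-format samples quoted in this report.

```python
import sys
from collections import Counter


def count_requests_per_ip(lines):
    """Tally requests per client IP.

    Assumption: the client IP is the first whitespace-separated field,
    as in the combined-format lines quoted in this issue.
    """
    counts = Counter()
    for line in lines:
        fields = line.split(None, 1)
        if fields:  # skip empty lines
            counts[fields[0]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_ips.py /path/to/access_log
    with open(sys.argv[1]) as handle:
        for ip, hits in count_requests_per_ip(handle).most_common():
            print("%s %d" % (ip, hits))
```

Any IP listed by this script but absent from the visitor log for the same day is a candidate for a closer look (for instance 65.55.24.218 and 83.233.207.74 above).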

Hope you can help me !!

Keywords: import_logs

@imoullet commented on February 17th 2014

Attachment: test logfile 22nd January
access_test0

@imoullet commented on February 17th 2014

Attachment: logfile 14th Feb
access_test1.zip

@mattab commented on February 20th 2014 Owner

65.55.24.218 is excluded because it is an MSN bot IP address, which we exclude by design (unless you add --enable-bots).

IP: 83.233.x.x is tracked for me.

After importing the logs you have to re-run the archive.php cron script.

Everything appears to work (I reused your access_test0 log). Please try again with RC5, as I think it will work!

@imoullet commented on February 20th 2014

But I did add the --enable-bots option!

So I retried today with RC6 on access_test0 (a 13-line log file) and I still have the same problem: IP 83.233.x.x is not tracked for me, and neither is 65.55.24.218, even with the --enable-bots option.

Why do you say I have to re-run the archive.php cron script? I don't understand.

And what do you think of the following one-line log file example, which I mentioned in the forum?


Last test with Piwik 2.1 RC6 and a one-line log file:

41.107.212.109 - - +0100 "GET /slav/ling/cours/a07-08/SEMI%20UNIL/041207Iva.html HTTP/1.1" 200 6690 "https://www.google.dz/" "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36"

command:
/usr/local/bin/python2.7 /var/www/html/piwik/misc/log-analytics/import_logs.py --url=https://w3stat.unil.ch/piwik/ --idsite=579 --config=/var/www/html/piwik/config/config.ini.php --recorders=2 --log-hostname=www2.unil.ch --hostname=www2.unil.ch --enable-static --enable-bots --enable-http-errors --enable-http-redirects --enable-reverse-dns --strip-query-string /var/tmp/stats/prod/access_slav2 2>&1

Output:
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
Parsing log /var/tmp/stats/prod/access_slav2...
Purging Piwik archives for dates: 2014-02-17
To re-process these reports with your new update data, execute the piwik/misc/cron/archive.php script, or see: [piwik.org] for more info.

Logs import summary

1 requests imported successfully
0 requests were downloads
0 requests ignored:
    0 invalid log lines
    0 requests done by bots, search engines, ...
    0 HTTP errors
    0 HTTP redirects
    0 requests to static resources (css, js, ...)
    0 requests did not match any known site
    0 requests did not match any requested hostname

Website import summary

1 requests imported to 1 sites
    1 sites already existed
    0 sites were created:

0 distinct hostnames did not match any existing site:

Performance summary

Total time: 0 seconds
Requests imported per second: 11.11 requests per second

And the Actions > Pages report is... empty! I just see the IP in the visitor log, but again with 0 actions.

Any idea?

Could it be that my database is corrupted somewhere? I do not understand, and I really have no confidence in the Piwik tracking results.

Thanks for your help

@mattab commented on February 21st 2014 Owner

Do you have "Browser trigger archiving" enabled?

See: http://piwik.org/docs/setup-auto-archiving/

Execute:

    php misc/cron/archive.php --url=http://piwik.example.org

after importing the stats.

Can you now see it in the Page URLs report for the 17th of February?

@imoullet commented on February 21st 2014

No, "Browser trigger archiving" is disabled now.
I run archive.php every 2 hours, which means it has been run many times since I imported the data,
and I still do not see the pages in the Actions > Pages report. I can see the IP, BUT with "0 actions"!

I am away from my office for one week now, so do not despair if I don't reply to your suggestions the next day.

Best regards,

@vspiliop commented on May 13th 2014

Hello to all!

I am using Piwik for a customer and just found the following very serious issue.

I am using the latest Piwik (2.2.0) and probably have the same issue as imoullet.

PROBLEM:

Lines with HTTP status 200 are ignored!! I.e. only the first entry is included, both in the Visits and in the Actions. This applies both before and after I run the archiving, so archiving is irrelevant.

I just import access.log (a file with just 2 lines):

66.249.76.11 - - +0100 "GET /id/resource/013541589 HTTP/1.1" 303 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.76.11 - - +0100 "GET /doc/resource/007667232 HTTP/1.1" 200 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

via command:

python import_logs.py --url=http://localhost:83/analytics/ access.log --idsite=1 --recorders=2 --enable-http-errors --enable-http-redirects --enable-static --enable-bots --add-sites-new-hosts

Result:

0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
Parsing log access_006_bl.services.tso.co.uk.2014.05.12.log...
Purging Piwik archives for dates: 2014-05-11
To re-process these reports with your new update data, execute the following command:
piwik/console core:archive --url=http://example/piwik/
Reference: http://piwik.org/docs/setup-auto-archiving/

Logs import summary

2 requests imported successfully
0 requests were downloads
0 requests ignored:
    0 invalid log lines
    0 requests done by bots, search engines, ...
    0 HTTP errors
    0 HTTP redirects
    0 requests to static resources (css, js, ...)
    0 requests did not match any known site
    0 requests did not match any requested hostname

Website import summary

2 requests imported to 1 sites
    1 sites already existed
    0 sites were created:

0 distinct hostnames did not match any existing site:

Performance summary

Total time: 0 seconds
Requests imported per second: 3.29 requests per second

Kind Regards,
Vassilis
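Since the import summary above reports "2 requests imported successfully" while only one entry shows up in the reports, it can help to confirm what the log file itself contains, broken down by HTTP status. The following is a rough sketch rather than anything taken from import_logs.py: the regular expression and the name `count_statuses` are assumptions made for illustration, and the pattern only targets lines shaped like the two samples quoted in this comment.

```python
import re
from collections import Counter

# Assumed pattern: a quoted request line followed by a three-digit status,
# matching the sample lines quoted in this issue.
STATUS_RE = re.compile(r'"[A-Z]+ \S+ HTTP/\d\.\d" (\d{3})\b')

# The two lines from the access.log quoted above.
SAMPLE = [
    '66.249.76.11 - - +0100 "GET /id/resource/013541589 HTTP/1.1" 303 - "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.76.11 - - +0100 "GET /doc/resource/007667232 HTTP/1.1" 200 - "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]


def count_statuses(lines):
    """Tally log lines by HTTP status; unmatched lines are counted as 'unparsed'."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        counts[match.group(1) if match else "unparsed"] += 1
    return counts


if __name__ == "__main__":
    for status, hits in sorted(count_statuses(SAMPLE).items()):
        print("%s %d" % (status, hits))
```

If the per-status counts from the file match the import summary but the Visits and Actions reports show fewer entries, the discrepancy lies after the import step rather than in the log parsing.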

@mattab commented on May 14th 2014 Owner

> I am using the latest piwik (2.2.0) and have probably the same issue with imoullet.

The latest Piwik is 2.2.2, and this bug should be fixed there.

Please try the latest beta: http://piwik.org/faq/how-to-update/faq_159/
and create a new ticket here if you still see the bug with that version. Thanks.

This Issue was closed on May 14th 2014