@anonymous-piwik-user opened this Issue on February 15th 2013

Description: Make Piping one single line from the access.log into import_logs.py works but using the same command directly from apache nothing gets logged

This is about the same issue as reported by ottodude125 (#3163) and elm (#3163). Piping one single line from the access.log into import_logs.py works but using the same command directly from apache nothing gets logged.

To reproduce, use the following apache config (within a VirtualHost container):

# Set up your log format as a normal extended format, with hostname at the start
LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" piwik_log_format
# Log to a file
CustomLog /var/log/apache2/access_piwik.log piwik_log_format
# Log to piwik
CustomLog "|/usr/bin/python /var/www/foo/piwik/misc/log-analytics/import_logs.py --idsite=1 --recorders=1 --enable-http-errors --enable-http-redirects --enable-static --enable-bots --url=http://foo.bar.org/piwik/ --log-format-name='common_vhost' --output=/tmp/piwik_import.log -dd -" piwik_log_format
# Test log piping
CustomLog "|tee -a /tmp/pipetest.log" piwik_log_format

Both /var/log/apache2/access_piwik.log and /tmp/pipetest.log gets populated, so there is no doubt that the import_logs.py script receives the standard input continuously. However, /tmp/piwik_import.log only gets any activity upon restart of apache (like for ottodude125)

Running:

/usr/bin/python /var/www/foo/piwik/misc/log-analytics/import_logs.py --idsite=1 --recorders=1 --enable-http-errors --enable-http-redirects --enable-static --enable-bots --url=http://foo.bar.org/piwik/ --log-format-name='common_vhost' --output=/tmp/piwik_import.log -dd /var/log/apache2/access_piwik.log

... (the same command as in the CustomLog directive but instead reading a file instead of stdin) works fine.

Piwik 1.10.1
Debian 6

@anonymous-piwik-user commented on February 15th 2013

Have also verified that the behavior is the same whether or not sh (dash) or bash is standard shell for www-data (the apache user).

Using python's unbuffered binary stdout and stderr (-u) like so:

CustomLog "|/usr/bin/python -u /var/www...

...makes no difference.

I added debug log output in import_logs.py already in main() which is also only printed out on apache restart. On any subsequent requests all is silent...

@anonymous-piwik-user commented on February 15th 2013

Believe I've found the main issue here.

According to apache docs (http://httpd.apache.org/docs/2.2/logs.html#piped): Apache will start the piped-log process when the server starts, and will restart it if it crashes while the server is running. (This last feature is why we can refer to this technique as "reliable piped logging".)

However, import_logs.py will not start actually parse any lines until an EOF has been encountered it seems.

So, this works:

cat access_piwik.log "|/usr/bin/python /var/www...etc

...but this does not work:

tail -f access_piwik.log "|/usr/bin/python /var/www...etc

The latter example will keep the python script running indefinitely until the tail process is killed, whereafter the read lines will be parsed.

@mattab commented on February 15th 2013 Owner

Thanks for the report & clear info!

If you can find the fix in the python script that'd be great...

@anonymous-piwik-user commented on February 15th 2013

First of all, the script hangs on something like sys.stdin.read() internally (it hangs just before the for loop in parse(), doesn't proceed past it it seems). This indicates that some buffer is being used, but I am not sure about the size of this buffer.

sys.stdin.readline() is however unbuffered. In the end, a fix to this issue is replacing

        for lineno, line in enumerate(file):

with

        lineno = 0
        while 1:
            line = sys.stdin.readline()
            if not line: break
            lineno = lineno + 1

Perform this in a copy of import_logs.py (call it pipe_import_logs.py or something), since this breaks normal usage :)

Of course, use pipe_import_logs.py in your CustomLog directive as well.

It is possible that stdin is buffered and that the script that seems to hang eventually gets some lines to work with, but relying on that seems risky since the buffer is thrown away on apache restart.

I'll leave it up to some actual python devs to do the proper refactoring :)

Relevant links:
http://stackoverflow.com/questions/7056306/python-wait-until-data-is-in-sys-stdin
http://stackoverflow.com/questions/6033781/how-to-avoid-python-fileinput-buffering

@anonymous-piwik-user commented on February 15th 2013

Btw, this is how the flow goes:

  1. Apache starts a shell with the piped python process on server start
  2. Apache pipes log contents continuously to that shell's stdin
  3. (with the above fix) The python process runs an eternal loop (while 1) and parses each received log line from stdin, turning them into "hits"
  4. For each config.options.recorder_max_payload_size (--recorder_max_payload_size=200 is the default), the hits gets POST:ed to the piwik server by one or many recorder processes that runs in the background. While in the eternal loop, the script currently does not output anything revealing what's going on...
  5. On apache shutdown the process is simply killed (ie no status reports will be written out anywhere).

Therefore, it is a good idea to also add some statements like: logging.debug('lineno %s', lineno) or similar to be able to verify that it works for you.

@anonymous-piwik-user commented on February 15th 2013

Btw, any hits that still haven't been recorded to Piwik (because them being less than recorder_max_payload_size, in queue or whatever) when Apache crashes or restarts will not be recorded. Ideally, the script should listen to the correct signals and record these before shutting down...

@mattab commented on April 4th 2013 Owner

There were bugs in the script. Can you try with the latest python script from GIT or from : http://builds.piwik.org/piwik-1.12-b5.zip ?

@anonymous-piwik-user commented on April 4th 2013

Still not working with latest git version of import_logs.py.

@anonymous-piwik-user commented on June 25th 2013

I have the same Problem / reloading Apache also gets the entries to piwik

@anonymous-piwik-user commented on June 25th 2013

when this bug will be closed. I change awstats by piwik

@mattab commented on January 13th 2014 Owner

To all users who have had this bug, do you still experience the issue with Piwik 2.0.3?

@mattab commented on March 16th 2014 Owner

Hi guys, this ticket is getting long. Maybe someone can make a pull request with a fix for this issue?

@anonymous-piwik-user commented on March 19th 2014

Hi all

I do have got Piwik 2.1.0 installed via FreeBSD ports. It doesn't work for me, even after applying the patch above. Is there an other solution available?

@anonymous-piwik-user commented on March 20th 2014

Replying to mirror:

Hi all

I do have got Piwik 2.1.0 installed via FreeBSD ports. It doesn't work for me, even after applying the patch above. Is there an other solution available?

It does work now, but there is a delay. I did do the following:

As motin mentioned 13 months ago it makes no difference if I use -u or not. But it is working, not as stated before.

@mattab commented on March 12th 2015 Owner

Issue was moved to the new repository for Piwik Log Analytics: https://github.com/piwik/piwik-log-analytics/issues

refs #7163

This Issue was closed on March 12th 2015
Powered by GitHub Issue Mirror