@ataraxie opened this issue on October 22nd 2014

Matt suggested me to create a bug report for this problem which was discussed in detail in this forum thread: http://forum.piwik.org/read.php?2,120225,120539 ### Scenario

As Piwik admin user, navigate to Settings => Websites. I always get "You currently have access to 0 website". This happens when I am superuser as well as when I have dedicated "admin" permission to websites. ### Problem source - The JSON response of the request is empty with Content-Length: 0 - Request: http://mypiwikhost/index.php?date=yesterday&fetchAliasUrls=true&format=JSON2&idSite=1&method=SitesManager.getSitesWithAdminAccess&module=API&period=day&token_auth=XXXXXXXXX - This is the case as soon as at least one Piwik site has an umlaut ("ä", "ü", "ö" ...) in its name - If format=XML is set in the request, it works fine ### Environment - PHP 5.5.10 - Internet Information Services 7.5 (IIS) - MySQL 5.5.27 - Windows Server 2008 R2

Since Piwik allows umlauts for site name, I consider this a Piwik bug. We have an enterprise customer installation of Piwik running with a database of ~ 40 GB. Our customer is very upset with this at the moment, since, with regard to the time I needed to figure out the problem source (see forum thread), this issue has kept us from updating from 1.12 to 2.x for several months. So I kindly ask you to consider this a major issue. Also, as the forum thread shows, I'm not the only one affected.

@mattab commented on October 22nd 2014

@ataraxie thanks for the report. Because it's likely that this is a bug in PHP, could you please try upgrade php to latest 5.5.18 ? if you still experience the problem with this version we would further investigate. cheers

@tsteur commented on October 24th 2014

This is not a bug in PHP I think. I had once access to another Piwik instance with the exact same issue. In the end the values were stored "broken" in the database or so. It was not UTF8. I couldn't find the actual problem but also didn't invest so much time.

@diosmosis commented on October 24th 2014

If the problem was w/ the DB, then format=XML would fail, no?

@tsteur commented on October 24th 2014

json_encode couldn't handle those values and failed

@ataraxie commented on October 24th 2014

That's right. I tried with PHP 5.5.18 and the problem persists. Also: when I add use umlauts in the Piwik admin interface for a website name, it's ok. My sites were added through the API. When I check in the database, then the encoding of the sites added through the admin interface is different.

So I add a Piwik site in the admin interface with the name "äää". I do a

SELECT name FROM piwik_site WHERE idsite = 1;

The result is:

äää

When I add the site with an SQL statement or through the Piwik API, the result of the above statement will be:

äää

The second case causes the error and I assume (as @tsteur stated above) json_encode fails (I'm wondering why, if so, there's nothing written in the PHP error log).

So, I guess this is clearly an database encoding issue. I created the Piwik database on my server with the default CREATE DATABASE statement as adviced in the Piwik docs. However, if I run this:

SELECT * FROM information_schema.SCHEMATA

The result is that my Piwik database has latin1 as DEFAULT_CHARACTER_SET_NAME and latin1_swedish_ci as DEFAULT_COLLATION_NAME.

Hmmmmmmm.....

@ataraxie commented on October 26th 2014

Ok, I finally got something in the log files:

ERROR API[2014-10-26 18:04:45] C:\inetpub\wwwroot\piwik\core\DataTable\Renderer\Json.php(74): Warning - json_encode(): Invalid UTF-8 sequence in argument
  #0  Piwik\Error::errorHandler(...) called at [:]
  #1  json_encode(...) called at [C:\inetpub\wwwroot\piwik\core\DataTable\Renderer\Json.php:74]
  #2  Piwik\DataTable\Renderer\Json->renderTable(...) called at [C:\inetpub\wwwroot\piwik\core\DataTable\Renderer\Json.php:29]
  #3  Piwik\DataTable\Renderer\Json->render(...) called at [C:\inetpub\wwwroot\piwik\plugins\API\Renderer\Json.php:55]
  #4  Piwik\Plugins\API\Renderer\Json->renderArray(...) called at [C:\inetpub\wwwroot\piwik\plugins\API\Renderer\Json2.php:21]
  #5  Piwik\Plugins\API\Renderer\Json2->renderArray(...) called at [C:\inetpub\wwwroot\piwik\core\API\ResponseBuilder.php:260]
  #6  Piwik\API\ResponseBuilder->handleArray(...) called at [C:\inetpub\wwwroot\piwik\core\API\ResponseBuilder.php:106]
  #7  Piwik\API\ResponseBuilder->getResponse(...) called at [C:\inetpub\wwwroot\piwik\core\API\Request.php:218]
  #8  Piwik\API\Request->process(...) called at [C:\inetpub\wwwroot\piwik\plugins\API\Controller.php:33]
  #9  Piwik\Plugins\API\Controller->index(...) called at [:]
  #10  call_user_func_array(...) called at [C:\inetpub\wwwroot\piwik\core\FrontController.php:587]
  #11  Piwik\FrontController->doDispatch(...) called at [C:\inetpub\wwwroot\piwik\core\FrontController.php:95]
  #12  Piwik\FrontController->dispatch(...) called at [C:\inetpub\wwwroot\piwik\core\dispatch.php:34]
  #13  require_once(...) called at [C:\inetpub\wwwroot\piwik\index.php:46]
@mattab commented on October 28th 2014

My sites were added through the API

Can you please paste here exact API URL used to created the website that cause the bug? I will try to re-create the issue locally. @ataraxie

@tsteur commented on October 28th 2014

I am pretty sure you won't be able to reproduce it. In the other instance where I experienced this bug it was also not possible to reproduce it on my instance. And it even failed to store the correct characters when adding a new site via SiteManager UI

@ataraxie commented on October 28th 2014

Thank your for further investigation. In my script, I'm calling this URL:

http://mypiwik/index.php?module=API&token_auth=[auth_token]&format=JSON&method=SitesManager.addSite&siteName=[siteName]&urls=[urls]

In addition to the administrator panel, the issue also occurs when I go to "All Websites" and then apply a date range.

I'm also going to do some further research in what data causes the json_encode call to fail. I'll let you know.

I don't think the issue is so hard to reproduce: - Create a database with DEFAULT_CHARACTER_SET_NAME latin1 (you can verify with SELECT * FROM information_schema.SCHEMATA) - Using MySQL workbench, change the name of a site to something with umlauts (e.g. UPDATE piwik_site SET name = 'testäää' WHERE idsite = 1;)

@ataraxie commented on October 28th 2014

Ok, some more information after experimenting in the source...

The interesting line where everything happens is piwik\core\DataTable\Renderer\Json.php line 74. Suppose I set the sitename of site 1 using MySQL workbench as explained above to "äää". If I do a few log/var_dump statements in the PHP file (suppose I have a putLog function), this is what happens: - putLog($array[1]['name'])) => "äää" - json_encode($array) => jsonbool(false) - json_encode($array[1]['name']) => jsonbool(false) - json_encode($array[1]['name']) => jsonbool(false)

So if the json_encode function returns jsonbool(false) on the array, there will be no data in the UI. The interesting thing is that if I do a utf8_encode on the problematic string, it will work fine (the result is a correct JSON string):

json_encode(utf8_encode($array[1]['name']))

I think the general question is: is my database encoding simply incorrect (latin1) and the requirement for Piwik to work is a UTF-8 encoded database? Note that MySQL distributions have usually default-character-set=latin1 in their my.ini. Otherwise, you'll probably need to take care of the encoding when sites are saved because you probably won't want to call a utf8_encode on all site names before calling json_encode. Another problem with that, however, will be that as soon as someone changes a site name using a tool like MySQL workbench, the problem will occur again. On the other hand, you're walking over the array already before calling json_encode: array_walk_recursive($array, $callback);. So in that case, you might want to consider calling utf8_encode on the site names in that callback function? Then in general, you'd probably want to think about what else besides the site names could be "broken".

I don't really know a good solution here at the moment. In our case, we'll probably need to patch the Json.php file temporarily with a utf8_encode on all sitenames in the callback function. Or something like that...

EDIT I temporarily applied the following patch that solves the problem in line 69 of Json.php: - before

$value =html_entity_decode($value, ENT_QUOTES, "UTF-8");
  • after
$value = utf8_encode(html_entity_decode($value, ENT_QUOTES, "UTF-8"));

I don't really know what side-effects this has, especially on performance. The request really really long. But I had this problem without the encoding issues as well. I guess this is the same as the problem here: http://forum.piwik.org/read.php?2,109904

@mattab commented on October 28th 2014

@ataraxie thanks for the details. I'd like to ask you to test something. In your [database] section in config.ini.php can you add the following:

charset = utf8

Then try again to add a new website with utf characters. Are the names recorded valid this time? is the bug still the same for data added to your piwik after changing the config file?

@ataraxie commented on November 3rd 2014

@mattab I cannot believe it - this is it! What I did is: - removed my patch with utf8_encode - reloaded page => no results shown - added your line in the config - reloaded page => results are shown!

I also added a new website with name "äää" like I did before using the API and it works too! So I really think this solves the issue. Thank you!!

Besides that, I now still have the problem that the load process for the websites takes really long (~ 1 min) and I get "long running script" errors in browsers. Maybe you could also give me some directions how to proceed here? My archive script is running every 40 minutes without errors.

@mattab commented on November 4th 2014

Besides that, I now still have the problem that the load process for the websites takes really long

please setup: http://piwik.org/docs/setup-auto-archiving/

regarding the initial problem, I'm wondering if we could detect that the Mysql does not run under proper unicode and automatically set the setting when it's the case?

@ghost commented on November 5th 2014

I found your solution here in GitHub for this kind of Problem: adding the line: charset = utf8 in the database section of config.ini.php.

I did this as well and the websites all reappeared in the settings.

But when I now look at my data the page titels are not displayed correct and look like this:

Broschüren --> should be: Broschüren Spaßbäder --> should be: Spaßbäder

The database is utf-8 and the pages are all utf-8 as well.

What can I do?

Best regards

Christoph

@ghost commented on November 6th 2014

No its gone. Maybe it was a caching thing.

@mattab commented on February 23rd 2015

Again today on the forums someone reported that adding charset=utf8 solved a problem for them. Maybe we should investigate this and really understand the problem. Or maybe we should simply add charset=utf8 to the config by default, to prevent such issue all together?

at http://forum.piwik.org/read.php?2,124264

@mattab commented on April 9th 2015

The goal of this issue is to estimate whether there is any performance overhead when we add charset = utf8 in the global config file by default. This will result in running SET NAMES query on each request - is there is no overhead doing this, let's add it for sure as it make Piwik work well across more DB configs!

(because we told users of the workaround charset=utf8 many times in the forums in the past, this little change would help much. added Performance tag and renamed issue)

@tsteur commented on April 21st 2015

It takes on my instance about 60 microsecs with Mysqli and 100 microsecs with PDO so should not cause any problems. Not sure how it behaves under load but don't think that it causes any problems.

Maybe the Tracker\DB could be improved as it does a separate set names query https://github.com/piwik/piwik/blob/2.13.0-b3/core/Tracker/Db/Pdo/Mysql.php#L92-L95 whereas in normal DB it is handled directly in DSN and driver option https://github.com/piwik/piwik/blob/2.13.0-b3/libs/Zend/Db/Adapter/Pdo/Mysql.php#L85-L107

I presume second version (dsn + init command) should be even faster

@mattab commented on April 21st 2015

Maybe the Tracker\DB could be improved as it does a separate set names query whereas in normal DB [...]

btw reading your comment i remembered we wanted to consider: Unify Tracker/Db and Db into one? #7727

@saleemkce commented on May 9th 2015

Yes, I have had bad experiences in the past in OSIpage app where users would post lots of URLs from different languages. We had some key tables where URLs were stored, were using default latin1 charset which seemed to garble URLs of foreign languages.

The fix had been quite simple. We need to make sure that, - when creating database, we should use database collation as either "utf8_general_ci" or "utf8_unicode_ci" -> utf8_general_ci is a little bit faster than utf8_unicode_ci -> utf8_unicode_ci is more accurate than utf8_general_ci according to myql documentation. -> the best bet may be using "utf8_unicode_ci" as we would always prefer accuracy over small performance improvement. - all tables should use charset as utf8 and collate as utf8_unicode_ci

In other words, we could use this mysql command to change a table's charset one at a time.

ALTER TABLE myTable CHARACTER SET utf8 COLLATE utf8_unicode_ci;
  • any website displaying content other than English, should have this in HTML head tag so that it handles other language characters properly & correctly on web page.
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">

After applying these changes in OSIpage app, we didn't ever come across URL garble issues again. I hope this solves the issue.

@mattab commented on July 14th 2015

It takes on my instance about 60 microsecs with Mysqli and 100 microsecs with PDO so should not cause any problems. Not sure how it behaves under load but don't think that it causes any problems.

@tsteur Awesome! moving into 2.15.0

This issue was closed on August 14th 2015
Powered by GitHub Issue Mirror