@mattab opened this Issue on December 19th 2013 Owner

Reported in the forum: http://forum.piwik.org/read.php?2,108645

A solution was proposed here:
http://forum.piwik.org/read.php?2,108645,page=1#msg-108949


by setting:

[database]
charset = utf8
@mattab commented on March 16th 2014 Owner

Since there is a work-around available, I'm decreasing the priority of this ticket. If you experience this bug, please comment here! We would like to hear from more users having the issue.

@mattab commented on April 27th 2014 Owner

Increasing priority, as a user sent us their database by email, so let's try to replicate!

Please report in this issue if you experience this bug; we need more reports to understand it.

@bla-kw commented on June 1st 2015

I had the same problem after our hosting provider Mittwald upgraded Piwik from 2.10 to 2.13.1.
Adding this charset setting seemed to fix the problem, but umlauts in old entries then looked broken, so I reverted it.
It's PHP 5.6.5 and MySQL 5.5. The database and tables are utf8_general_ci. php.ini has default_charset = "UTF-8".
We track several websites, but I had this problem with only one of them. The other sites that still work also contain umlauts, but there was no problem without the setting.

I think I tracked it down. The problem is umlauts in a goal name. I renamed the goal containing an umlaut in the DB, and afterwards the Piwik frontend worked again for this website, too.

@anEffingChamp commented on October 30th 2015

I hit this problem on a fresh installation. Piwik reports:

The string to escape is not a valid UTF-8 string in "@Installation/welcome.twig" at line 1.

I checked the template files for the plugin, and everything looks fine. The error message suggests that the problem may be in layout.twig, but that seems fine too. However, when I remove the offending line, Piwik progresses to the first installation page, but with a lot of HTML missing.

@garvinhicking commented on February 10th 2016

Hi!

After upgrading from 2.15.0 to 2.16.0 viewing a segment report comes up with:

[code]
The string to escape is not a valid UTF-8 string in "@CoreHome/getDefaultIndexView.twig" at line 7.
[/code]

There are several segment reports that use umlauts. They worked just the same in 2.15.0 (!!!), and in the normal backend they show up without a problem in dropdowns.

What's happening? I googled a bit and found the suggestion to insert 'charset = "utf8"' into config.ini.php, but that doesn't change anything.

I don't know Twig. How can I debug exactly which UTF-8 string is causing the trouble here?!

@VorobeY1326 commented on February 29th 2016

Exactly the same issue after updating to 2.16.0.
Trying to segment by userId -> fails in file getDefaultIndexView at line 7.
I'm not sure special UTF-8 characters are involved; the userId is just an MD5 string.

Edit: every segmented report fails, not only by userId.

@garvinhicking commented on February 29th 2016

To the devs: I'd love to debug this. Can you give me a pointer to where in the PHP code this line 7 gets filled, or how to see the actual "bad" content? I don't know where to start looking.

@VorobeY1326 commented on March 1st 2016

+1, ready to debug this issue, just give some hint

@schwindelbub commented on March 7th 2016

I had the same issue after migrating an existing Piwik instance to another server. We found that two things resolved this issue:

  1. Switch the user language to "English" instead of "Deutsch" in the user profile and clear the browser cache. But in this case no user can use the "Deutsch" setting. It's a bad workaround, I think.
  2. Set default_charset in php.ini. Maybe Piwik does not handle it correctly. If this value in php.ini is empty, the default fallback "UTF-8" should take effect. But if we set it explicitly, everything works fine. http://php.net/manual/de/ini.core.php#ini.default-charset OK, it should not be empty, but it can be.

Sorry, but unfortunately the second option doesn't work for me.
System: PHP 5.6.19, 5.5.44-MariaDB

@mattab commented on March 15th 2016 Owner

@schwindelbub @VorobeY1326 @garvinhicking @anEffingChamp @hdi-kw Please test this patch: https://github.com/piwik/piwik/pull/9926/files

Does it fix the issue for you?

@schwindelbub thank you for the tip re: default_charset - I never came across this setting before and this may actually be the solution :-)

@garvinhicking commented on March 15th 2016

Sadly no change for me, I also forced default_charset in the php.ini.

I'd really love to understand which string exactly Twig is trying to escape in getDefaultIndexView.twig at line 7. There must be some way to intercept the actual string so that I can understand where it comes from and which charset it is in?!
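Regarding how to see the actual "bad" string: one option is a small debugging helper that reports where a string stops being valid UTF-8 and dumps the offending bytes in hex, so a mangled "ä" can be told apart from text in another charset. This is a minimal sketch, not part of Piwik or Twig; the function names firstInvalidUtf8Offset() and describeUtf8Problem() are hypothetical, and it assumes the mbstring extension is installed.

```php
<?php
// Hypothetical debugging helper (not part of Piwik or Twig). Returns the byte
// offset of the first invalid UTF-8 sequence in $s, or null if $s is valid.
// Assumes the mbstring extension is available.
function firstInvalidUtf8Offset(string $s): ?int
{
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $charLen = 0;
        // A UTF-8 character is 1 to 4 bytes long; the shortest valid prefix
        // starting at $i is the character that begins there.
        for ($k = 1; $k <= 4 && $i + $k <= $len; $k++) {
            if (mb_check_encoding(substr($s, $i, $k), 'UTF-8')) {
                $charLen = $k;
                break;
            }
        }
        if ($charLen === 0) {
            return $i; // no valid character starts at this byte
        }
        $i += $charLen;
    }
    return null;
}

// Describes the problem and shows up to four bytes from the bad offset in hex.
function describeUtf8Problem(string $s): string
{
    $offset = firstInvalidUtf8Offset($s);
    if ($offset === null) {
        return 'valid UTF-8';
    }
    return sprintf('invalid UTF-8 at byte %d: %s', $offset, bin2hex(substr($s, $offset, 4)));
}
```

You could, for example, log describeUtf8Problem($value) with error_log() for each value handed to the template at that line and look for the first one that is not valid.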

@schwindelbub commented on March 15th 2016

For me there is also no change with this patch.

@schwindelbub commented on March 22nd 2016

I found another problem: if a visitor comes from a Google search with a "ß" in the search term, this term is broken in the report.

For example:
Searchterm: maße
In the report: maã_e

Every other umlaut works fine.

@schwindelbub commented on March 22nd 2016

As a hint: I migrated from
Debian with PHP 5.4.45 and mysql 5.5.47
to
CentOS with PHP 5.6.19 and 5.5.44-MariaDB

@schwindelbub commented on March 23rd 2016

I found a workaround that fixes the crash: just comment out the Template.nextToCalendar call in the twig template getDefaultIndexView.twig

{# {{ postEvent("Template.nextToCalendar") }} #}

I have no idea what this line does :-/ But it works without it :-)

@garvinhicking commented on March 23rd 2016

@schwindelbub

Cool, thanks for your effort, if the devs don't care :) I'll try it out. Probably some plugin event hooks are executed there... Maybe the problem is the system locale, which uses ISO instead of UTF-8 for date output. Maybe setlocale() instead of default_charset can help here. My install is running on Windows, so there's probably no UTF-8 locale available there...

@Cruiser13 commented on March 23rd 2016

Same issue for me: every segmented report fails with the error message: The string to escape is not a valid UTF-8 string in "@CoreHome/getDefaultIndexView.twig" at line 7. Any support would be welcome.

@Viswan-piwki commented on March 29th 2016

Hi there, please help in fixing this. Setting the charset does not work. Should we install the previous version, say 2.15?

@VorobeY1326 commented on March 29th 2016

@schwindelbub The workaround works because this line renders the element for segmented reports. It works for me too, but now I can't choose any segment: no segment, no crash :)

@VorobeY1326 commented on March 29th 2016

More interesting scenario:
1) select some segment -> site fails with exception "The string to escape ..."
2) comment line with "nextToCalendar" as @schwindelbub suggested
3) restart site
4) reload the failed page -> the page opens and segmenting works, but the commented-out element is missing, so I can't choose another segment

It seems this element (where I can choose a segment) fails to render if a segment is selected. And in my case the name of this segment was "IE 11", nothing special.

@tsteur commented on March 29th 2016 Owner

To debug, you could remove some parts of the twig file, reload the page, and check whether the error still occurs. Keep removing parts until the error no longer occurs. This way you may find the part of the template where it tries to escape the value.

However, it is likely not a problem with the template itself but with the value stored in the database. I would recommend making sure that:

  • Latest Piwik 2.16 is used
  • No custom [database] charset=... is set in config.ini.php
  • Segment name and definition are both stored correctly in the piwik_segment table (check via select * from piwik_segment;). There are likely some encoding issues here.

If someone can give me access to a server that has this problem, I would be happy to debug it and try to find the cause. I would need access to the files, the database and the actual Piwik UI. If someone can provide access, please email us at hello (at) piwik.org and afterwards leave a comment here in case it lands in the spam folder.

@Viswan-piwki commented on March 30th 2016

OK, now I get "Cannot connect to the database":

SQLSTATE[42000] [1115] Unknown character set: 'UTF-8'

@Viswan-piwki commented on March 30th 2016

If I replace UTF-8 with utf8, I get this error back: The string to escape is not a valid UTF-8 string in "@CoreUpdater/runUpdaterAndExit_welcome.twig" at line 1. But I don't see such a line in this twig file.

@garvinhicking commented on March 30th 2016

@tsteur Thanks, your hat-tip was gold. It is actually not a problem from the database point of view, but seems to stem from the i18n interface itself. See long description below.

I found the issue in SegmentSelectorControl.php when setting $this->segmentDescription. We have segments with simple "Path contains XXX" definitions. Inside the DB, only the XXX definition is stored, and when it is put into $this->segmentDescription, a Piwik translation mechanism adds the German translation for "Path contains XXX" to it. In German, this is "Seiten-URL enthält XXX".

This "ä" character shows up as the URL-encoded bytes %E3%A4, whereas the valid UTF-8 encoding of "ä" is %C3%A4. At some point that I cannot pin down yet, this gets handled improperly.

I patched my SegmentSelectorControl to read:

$this->segmentDescription = urlencode($formatter->getHumanReadable(Request::getRawSegmentFromRequest(), $this->idSite))

This puts URL-encoded special characters into the description, but now reporting is no longer broken \o/

For you devs: maybe you can reproduce it by choosing the German interface, so that the special character gets added there. I bet there's some double UTF-8 encoding going on at some point, which doesn't happen with the English interface.

I tried to quickly go through the language files, but couldn't find the file where the string for the Interface is actually defined.
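The urlencode() workaround avoids the crash because urlencode() turns every byte other than ASCII letters, digits, "-", "_" and "." into a %XX escape (and spaces into "+"), so its output is pure ASCII and therefore always valid UTF-8, even when the input bytes are broken. A minimal sketch of that effect, using the %E3%A4 bytes reported in this comment as example input (the sample string is an assumption, not taken from a real install):

```php
<?php
// Example input: the segment description with "ä" showing up as the bytes E3 A4.
$broken = "Seiten-URL ent\xE3\xA4lt XXX";

// urlencode() escapes every byte except A-Z, a-z, 0-9, "-", "_" and "." as %XX
// and turns spaces into "+", so the result contains only ASCII characters.
$encoded = urlencode($broken);

var_dump(mb_check_encoding($broken, 'UTF-8'));   // bool(false): invalid UTF-8
var_dump(mb_check_encoding($encoded, 'UTF-8'));  // bool(true): ASCII is valid UTF-8
echo $encoded, PHP_EOL;                          // Seiten-URL+ent%E3%A4lt+XXX
```

This hides the bad bytes from the escaper rather than repairing them, which fits the observation that the description then shows encoded special characters.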

@Viswan-piwki commented on March 30th 2016

Hi there,
could anyone please advise on the SQLSTATE error?

@VorobeY1326 commented on March 30th 2016

@garvinhicking Same for me: urlencoding in this line fixes the issue. And there are no problems in the DB as @tsteur suggested; the segment names contain only ASCII characters.

@schwindelbub commented on March 30th 2016

@garvinhicking Your fix also works for me! Thank you so very much!!

@schwindelbub commented on March 30th 2016

But one last question: why didn't this error occur on the old server? (@_@)

@garvinhicking commented on March 30th 2016

@schwindelbub Good to hear. I believe it could be a change related to one of the German translation files (you and @VorobeY1326 are using German as well, I figure?). Maybe that file is mistakenly no longer UTF-8, or got transferred badly. Let's see what the devs say. If we can see where the actual string gets pulled from (I searched for "enth.{1,8}lt" in all files, but did not find this word), we could debug further.

@Viswan-piwki I can't help with your issue; I don't think it's related to this one. You are having an upgrader problem, so maybe open another issue. "UTF-8" is not a valid MySQL charset name; you should either leave out the charset= option completely, or use "utf8".

@VorobeY1326 commented on March 30th 2016

@garvinhicking I'm using the Russian translation, so maybe the problem is in several translations.

@VorobeY1326 commented on March 30th 2016

Changing the user language to English also fixes the problem.

@schwindelbub commented on March 30th 2016

@garvinhicking Yes, I am using the German translation. The error occurred only if the profile's language was German. I think the "ä" is to blame :)

I searched all files in the docroot for the string. Here are the results:

find . -type f -exec grep -qi "enthält" {} \; -print0 | xargs -0 file

New Server
./lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Dashboard/lang/de.json: UTF-8 Unicode text
./plugins/DevicesDetection/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/UserCountry/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Installation/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Referrers/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Actions/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/CoreAdminHome/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/CustomVariables/lang/de.json: HTML document, UTF-8 Unicode text, with very long lines
./plugins/Live/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/SitesManager/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Goals/lang/de.json: UTF-8 Unicode text, with very long lines

Old Server
./lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Installation/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Referrers/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Actions/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/SitesManager/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/CustomVariables/lang/de.json: HTML document, UTF-8 Unicode text, with very long lines
./plugins/DevicesDetection/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Live/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Dashboard/lang/de.json: UTF-8 Unicode text
./plugins/CoreAdminHome/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Goals/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/UserCountry/lang/de.json: UTF-8 Unicode text, with very long lines

To me the files look identical. Maybe it helps.

@tsteur commented on March 31st 2016 Owner

I tried to reproduce this problem for a while but couldn't, even with German language. äöüß etc is displayed correctly.

"Changing user language to English fixes problem also."

That's interesting. When switching to the German (or Russian) language, two things are different:

  • It tries to load a different locale. In my case it tried to load de_DE.UTF-8 (the installed locale was de_DE.utf8, so I also tried to reproduce it using that locale).
  • It uses different language files. It probably works with English because those files don't contain umlauts.

Can everyone check whether the German locale is installed on their server? E.g. via locale -a in a Linux shell.

"But a last question: Why occurred this error not on the old server?"

That would be interesting to know. Can you maybe compare the installed locales, e.g. via locale -a in a Linux shell? Is the PHP version the same on both systems? Maybe you can run php -i on both systems and compare the output.

Which PHP version is everyone using?

You can find out, e.g., by running php --version.
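To compare servers from PHP's point of view rather than the shell's, a minimal sketch could ask setlocale() which of several candidate names it accepts for LC_CTYPE and check whether strtolower() leaves a UTF-8 string intact under that locale. The function name checkStrtolowerUnderLocale() is hypothetical, and the candidate names are examples (German_Germany.1252 is an assumed Windows-style name). Note that strtolower() became locale-insensitive in PHP 8.2, so this check can only show a difference on older versions such as the 5.x releases discussed here.

```php
<?php
// Hypothetical helper: activate the first accepted candidate for LC_CTYPE (the
// category strtolower() consults on PHP < 8.2), lowercase a UTF-8 sample, restore.
function checkStrtolowerUnderLocale(array $candidates): array
{
    $previous = setlocale(LC_CTYPE, '0');       // "0" only queries the current value
    $active = setlocale(LC_CTYPE, $candidates); // first accepted name, or false
    $lowered = strtolower("Enthält");           // sample: "ä" is the UTF-8 pair C3 A4
    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);         // put the original locale back
    }
    return [
        'locale'    => $active,
        'lowered'   => bin2hex($lowered),
        'utf8_safe' => $lowered === "enthält",
    ];
}

// Example candidates; adjust them to the names your system reports.
print_r(checkStrtolowerUnderLocale(['de_DE.UTF-8', 'de_DE.utf8', 'de_DE', 'German_Germany.1252']));
```

If utf8_safe is false on one server and true on the other, the active locale rather than the language file is a plausible difference.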

@garvinhicking commented on April 1st 2016

I'm using a Windows IIS server; how do I figure out the locales there? PHP didn't change with the upgrade. I only have limited access to the server, so I can't check the PHP version right now, but I believe it's 5.3. The IIS/PHP versions didn't change when I last upgraded Piwik, back when it still worked.

@schwindelbub commented on April 1st 2016

As I told you above: https://github.com/piwik/piwik/issues/4410#issuecomment-199812967
But we tested the problem with the same PHP version on CentOS.

And here are the locale results:

locale -a | grep -i de

New Server
de_AT
de_AT@euro
de_AT.iso88591
de_AT.iso885915@euro
de_AT.utf8
de_BE
de_BE@euro
de_BE.iso88591
de_BE.iso885915@euro
de_BE.utf8
de_CH
de_CH.iso88591
de_CH.utf8
de_DE
de_DE@euro
de_DE.iso88591
de_DE.iso885915@euro
de_DE.utf8
de_LU
de_LU@euro
de_LU.iso88591
de_LU.iso885915@euro
de_LU.utf8
deutsch
fy_DE
fy_DE.utf8
gez_ER@abegede
gez_ER.utf8@abegede
gez_ET@abegede
gez_ET.utf8@abegede
hsb_DE
hsb_DE.iso88592
hsb_DE.utf8
ks_IN@devanagari
ks_IN.utf8@devanagari
nds_DE
nds_DE.utf8
sd_IN@devanagari
sd_IN.utf8@devanagari

Old Server:
locale -a | grep -i de
de_DE.utf8

More locales are available on the new server.

@VorobeY1326 commented on April 1st 2016

@tsteur I'm using PHP 5.6.8 on Windows server / IIS. Current system locale is russian.

@Cruiser13 commented on April 1st 2016

Having the same issues here with IIS and PHP 5.5.33, German locale.

@tsteur commented on April 3rd 2016 Owner

Maybe a random idea, since it was mentioned that it works with the English language: I want to find out whether it's related to the language file or to, e.g., the locale that gets set.

Can someone replace the English language file with the German language file, switch language to English and see if it works?

Something like this

mv lang/en.json  lang/en.json.backup
cp lang/de.json lang/en.json

Then switch to English language and reload. You might also need to clear the cache directory in between:

rm -rf tmp/cache/*

After the test you can restore the correct English file by executing something like

mv lang/en.json.backup  lang/en.json

or

cp lang/en.json.backup  lang/en.json
rm lang/en.json.backup

@garvinhicking commented on April 3rd 2016

Nice idea. I'll test, but it'll take some time.

Just to be sure: the central lang/de.json file, not a lang file within a plugin directory, yes?

@tsteur commented on April 3rd 2016 Owner

Yes, the string that contains the previously mentioned enthält is in lang/de.json and not in a plugin directory.

@schwindelbub commented on April 4th 2016

@tsteur As you can see here, the string is not only in the lang/de.json
https://github.com/piwik/piwik/issues/4410#issuecomment-204275477

@tsteur commented on April 4th 2016 Owner

"As you can see here, the string is not only in the lang/de.json"

True. I had a look in the code though and for this particular widget it should use the string from that file.

@VorobeY1326 commented on April 6th 2016

I installed a fresh Piwik 2.16.0 on my local computer (Windows 7, IIS) and the issue reproduced easily. I just logged several actions, added one segment named "IE 11", clicked "save and use it" and voila: crash. The Piwik language is Russian.

@tsteur commented on April 6th 2016 Owner

Can you give https://github.com/piwik/piwik/issues/4410#issuecomment-205053449 a try or give us access to a server where we can debug the problem maybe? If so please let us know via email: hello at piwik.org.

@garvinhicking commented on April 11th 2016

@tsteur Sorry it took some time. I was able to test now. Copying the English language file over the German one fixes the problem as well.

So I guess that the high-byte non-ASCII character in "enthält" must cause some sort of trouble in the PHP/IIS environment. Strangely, many other strings from the German language file cause no problem at all, so I guess this ->getHumanReadable() method maybe double-encodes the UTF-8 input at some point?

@tsteur commented on April 11th 2016 Owner

Thanks for that @garvinhicking

In https://github.com/piwik/piwik/blob/2.16.1/plugins/SegmentEditor/templates/_segmentSelector.twig#L2 can you try to replace {{ 'SegmentEditor_CurrentlySelectedSegment'|translate(segmentDescription)|e('html_attr') }} with {{ 'SegmentEditor_CurrentlySelectedSegment'|translate(segmentDescription|raw)|e('html_attr') }}. Basically it adds the |raw before the escaping.

@schwindelbub commented on April 12th 2016

@tsteur This patch has no effect on my Piwik (now 2.16.1). I still get the error if the profile language is "Deutsch".

And sorry, but I can't give you access to my server.

@garvinhicking commented on April 12th 2016

@tsteur Sadly, same for me. Doesn't fix the issue, same error.

@Cruiser13 commented on April 12th 2016

+1, Piwik is still broken for me with that patch applied.

@tsteur commented on April 12th 2016 Owner

I can't see any double encoding anywhere. I also compared hex codes etc. and the character ä looks fine. Does it also occur when using different comparisons like equal, contains not (enthält nicht), etc.? For a test, maybe you could also try removing the |e('html_attr') from https://github.com/piwik/piwik/issues/4410#issuecomment-208555027 and see if it works?

It would be really great if someone could offer access to a server to debug this issue.

@garvinhicking commented on April 13th 2016

@tsteur Ok I solved it:

In SegmentFormatter.php, the getTranslationForcomparison() method uses the PHP function "strtolower" to transform "Enthält" into "enthält". strtolower works byte by byte according to the current locale and is by default* not UTF-8 safe. So it treats the UTF-8 bytes as single-byte characters and returns a mangled string that is no longer valid UTF-8. To do this properly, the mb_strtolower function should be used.

The easiest, proper patch would thus be to replace this line in plugins/SegmentEditor/SegmentFormatter.php:

return strtolower($translation);

with:

return mb_strtolower($translation, 'UTF-8');

Note that I hardcoded "UTF-8" there, because I believe Piwik internally always works with UTF-8. If not, the proper way would be to call mb_internal_encoding() with the charset in some bootstrap code. But grepping through Piwik's code, it seems that wherever mb_* functions are used, you have hardcoded the UTF-8 charset in the function call.

Also note that I see quite a lot of instances of "strtolower" in other files. You might want to check whether those could receive non-ASCII characters and use mb_* functions there, so that the same problem does not show up elsewhere.

HTH.

(*: Unless mbstring.func_overload is set in php.ini to overload strtolower with mb_strtolower, which one should not rely on. That is quite likely the reason why not every Piwik user is affected by this issue when using a language with non-ASCII characters.)
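To illustrate the mechanism, a minimal sketch: the hypothetical latin1ByteLower() below imitates what a byte-wise lowercase does under a single-byte ISO-8859-1/Windows-1252 locale, by shifting ASCII A-Z and the bytes 0xC0-0xDE (except 0xD7) by 0x20. It is a stand-in chosen so the result does not depend on which locales are installed; the real strtolower() on PHP < 8.2 follows whatever locale is active. The lead byte C3 of "ä" becomes E3, which matches the %E3%A4 bytes reported earlier in this thread, and "ß" (C3 9F) is hit the same way, which fits the broken "maße" search term. mb_strtolower() with an explicit 'UTF-8' charset, as in the proposed fix, keeps the string valid.

```php
<?php
// Hypothetical stand-in for strtolower() under an ISO-8859-1 / Windows-1252
// locale: ASCII A-Z and Latin-1 uppercase bytes 0xC0-0xDE (except 0xD7, the
// multiplication sign) are treated as capitals and shifted down by 0x20.
function latin1ByteLower(string $s): string
{
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $b = ord($s[$i]);
        if (($b >= 0x41 && $b <= 0x5A) || ($b >= 0xC0 && $b <= 0xDE && $b !== 0xD7)) {
            $b += 0x20;
        }
        $out .= chr($b);
    }
    return $out;
}

$word = "Enthält";                          // UTF-8 bytes: 45 6e 74 68 c3 a4 6c 74
$byteWise  = latin1ByteLower($word);        // lead byte c3 ("Ã") becomes e3 ("ã")
$multiByte = mb_strtolower($word, 'UTF-8'); // the fix proposed above

echo bin2hex($byteWise), PHP_EOL;                 // 656e7468e3a46c74
echo bin2hex($multiByte), PHP_EOL;                // 656e7468c3a46c74
var_dump(mb_check_encoding($byteWise, 'UTF-8'));  // bool(false)
var_dump(mb_check_encoding($multiByte, 'UTF-8')); // bool(true)
```

The byte-wise result is exactly the kind of string the Twig escaper then rejects as "not a valid UTF-8 string".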

@schwindelbub commented on April 13th 2016

@tsteur This solved the problem! Thanks! And apart from the hardcoded UTF-8, it looks like a good solution.

@tsteur commented on April 13th 2016 Owner

@garvinhicking Awesome! 👍 Thanks so much! I will issue a PR. Should have seen it while looking at the code but didn't notice.

@garvinhicking commented on April 13th 2016

Great that you approve. In hindsight it always seems obvious. Doesn't matter, now we can move forward. Thanks for listening and advising. :-)

This Issue was closed on April 19th 2016