@mattab opened this issue on August 14th 2015

Currently, when creating a Page URL segment, the URL is match case sensitive. This means that if a user inputs any capital letter in the URL, it will not match the URL and no data will be shown. This is confusing to non technical users who are not aware that URL are case sensitive in Piwik segments.

Suggestions: - Could we make the segment match case insensitive? - For example `WHERE log_action.name COLLATE UTF8_GENERAL_CI LIKE '%UrlHereWithRandomCase%' may do the trick (source) - (if case insensitive matching is not possible, we should otherwise display some kind of notification, in the segment editor, when we detect that URLs contain some capital letters)

@mattab commented on August 14th 2015

adding to 2.15.0 as hopefully this is a quick fix (and this missing feature causing much confusion for some users)

@halabuda commented on August 17th 2015

hello matt. interesting problem presented here. i spent some time this afternoon looking into it. i just submitted a pull request with a simple fix addressing this issue. actually turned out to be a hash issue and not a problem with the log_action.name field itself which was already UTF8_GENERAL_CI.

@tsteur commented on August 18th 2015

As kinda mentioned in the PR I wouldn't say it is a bug. I'd maybe rather recommend to write a plugin that transforms all URL's into lowercase during tracking if someone wants this behaviour.

@mattab commented on August 19th 2015

maybe we could assume that "segment contains" implies "case insensitive match" ? ie. when we have "Segment CONTAINS xyz", maybe it's acceptable to include all variations of the xYz xyZ etc.

"Contains" uses the LIKE '%xyz% which maybe could be changed to COLLATE UTF8_GENERAL_CI LIKE to have case insensitivity matching?

@tsteur commented on August 19th 2015

Yeah but wasn't the problem that there is no index on that column?

@mattab commented on August 19th 2015

segment CONTAINS uses LIKE and so does not use the index. therefore adding the case insensitivity match, would not change the fact that the "Segment contains" already do not use the index.

@tsteur commented on August 19th 2015

And what about other cases? Like "equals"? I'm not sure if it will be intuitive for users that we sometimes do case sensitive and sometimes not. Reckon docs need to be updated and maybe UI improved to explain it. Maybe there could be "Segment contains case insensitive". In some cases someone might still want to use case sensitive but not sure how often...

@mattab commented on August 20th 2015

And what about other cases? Like "equals"?

fyi: for "equals" we cannot implement a "case insensitive" because when the operator is "equals", then we use the "hash" match.

Reckon docs need to be updated and maybe UI improved to explain it.

:+1: doc is at: http://developer.piwik.org/api-reference/reporting-api-segmentation

Not sure how we could make it look nice in the UI?

Also, we should change "Does not contain" to be similarly case insensitive.

In some cases someone might still want to use case sensitive but not sure how often...

I suggest we change "Contains" to mean "Contains (case insensitive)" and if some users complain in the future, we could consider adding a new "Contains (case sensitive)" (we would need a new segment operator too...)

@tsteur commented on August 21st 2015

fyi: for "equals" we cannot implement a "case insensitive" because when the operator is "equals", then we use the "hash" match.

I know, that why I meant it will be confusing to apply it in some cases and others not.

I suggest we change "Contains" to mean "Contains (case insensitive)" and if some users complain in the future, we could consider adding a new "Contains (case sensitive)" (we would need a new segment operator too...)

For the UI sounds good to me, docs as mentioned need to be updated clearly as well. This is kinda a "breaking change". Need to mention it there as clearly as well which should be easily doable. And we also need to mention that before 2.15 it used to be not case insensitive.

Do I get this right that we will now make all contains case insensitive?

@mattab commented on August 21st 2015

Do I get this right that we will now make all contains case insensitive?

:+1:

This is kinda a "breaking change".

We can document this: - in the developer changelog, - in the developer guide, - In the user guide - and it will be listed in the changelog via this issue title.

@tsteur commented on August 21st 2015

FYI: At least in my local Piwik it looks like it always was case insensitive already as collation=UTF8_GENERAL_CI is default.n I hope it won't lead to any errors in case database uses different charset eg latin1 etc see https://lalitvc.wordpress.com/2014/10/17/error-collation-utf8_general_ci-is-not-valid-for-character-set-latin1/

Eg it results either in error:

COLLATION 'UTF8_GENERAL_CI' is not valid for CHARACTER SET 'latin1'

or even

COLLATION 'UTF8_GENERAL_CI' is not valid for CHARACTER SET 'binary'

when column is binary (eg config_id etc)

@tsteur commented on August 21st 2015

In the past we ran some alter table statements to set UTF8_GENERAL_CI on same columns specifically but reckon this is not an option here as it would take maybe too long...

@tsteur commented on August 21st 2015

FYI: When we're setting the charset to UTF8 via set names utf8 we could also set the collation like SET NAMES 'utf8' COLLATE 'utf8_general_ci' (in case charset==utf8)... This would mean it would behave similar for all databases and behaviour would be more predictable similar to https://github.com/piwik/piwik/pull/8589#issuecomment-133316244

@tsteur commented on August 21st 2015

I made a change (in https://github.com/piwik/piwik/compare/8579) but I think there's actually nothing to do for contains... I think it always worked case insensitive and we would rather get more errors by merging that change. I think the issue was only about equal comparison etc see https://github.com/piwik/piwik/pull/8584#issue-101315551

@tsteur commented on August 21st 2015

As discussed moving issue to 2.15.1 as we cannot reproduce the issue. It seems to be case insensitive already

@mattab commented on September 20th 2015

Closing for now as it seems to work

This issue was closed on September 20th 2015
Powered by GitHub Issue Mirror