@mattab opened this issue on March 9th 2009

Baidu is the biggest search engine in China, and Piwik currently fails to detect keywords from Baidu.

Example queries:

http://www.baidu.com/s?lm=0&si=&rn=10&ie=gb2312&ct=0&wd=%BF%DA%D3%EF+%CD%F2%C4%DC&pn=10&ver=0&cl=3&uim=0&usm=0


http://www.baidu.com/s?kw=&sc=web&cl=3&tn=sitehao123&ct=0&rn=&lm=&ie=gb2312&rs2=&myselectvalue=&f=&pv=&z=&from=&word=%B7%E8%BF%F1%CB%B5%D3%A2%D3%EF+%D4%DA%CF%DF%B9%DB%BF%B4
http://www.baidu.com/s?wd=%C1%F7%D0%D0%C3%C0%D3%EF%CF%C2%D4%D8
http://www.baidu.com/s?wd=%C1%F7%D0%D0%C3%C0%D3%EF+%CF%C2%D4%D8&lm=0&si=&rn=10&ie=gb2312&ct=0&cl=3&f=1&rsp=3&oq=VOA%C1%F7%D0%D0%C3%C0%D3%EF
http://web.gougou.com/search?search=%e6%b5%81%e8%a1%8c%e7%be%8e%e8%af%ad%20%e4%b8%8b%e8%bd%bd

Resolving this issue involves writing unit tests to cover these bits of code. We should also check whether the code path around line 715 in core/Tracker/Visit.php is useful; if it is not, we should fix it or delete it.

@robocoder commented on March 10th 2009

The problems with Baidu might be more complex than at first glance:

- the second URL uses the variable name "word" instead of "wd"
- gb2312 is an encoding; are the keywords not UTF-8?
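To make the two points above concrete, here is a minimal Python sketch (not Piwik's actual PHP code) of one way to handle both. It assumes the keyword lives in either `wd` or `word`, that the `ie` parameter names the charset when present, and that GB2312 is a reasonable fallback when `ie` is missing and the bytes are not valid UTF-8. The function name `extract_keyword` and the constants are hypothetical, chosen only for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# Assumption: parameter names taken from the example URLs in this issue.
KEYWORD_PARAMS = ("wd", "word")
# Assumption: legacy Baidu URLs without an "ie" parameter are GB2312-encoded.
FALLBACK_CHARSET = "gb2312"


def extract_keyword(url):
    """Return the search keyword as a Unicode string, or None if not found."""
    query = urlsplit(url).query
    # Decoding as latin-1 maps every percent-encoded byte to one character,
    # so the raw bytes can be recovered losslessly below. This assumes the
    # query string is fully percent-encoded.
    params = parse_qs(query, encoding="latin-1")

    raw = None
    for name in KEYWORD_PARAMS:
        values = params.get(name)
        if values:
            raw = values[0].encode("latin-1")
            break
    if raw is None:
        return None

    # Trust the declared charset if there is one; otherwise try UTF-8 first
    # and fall back to the legacy charset.
    declared = params.get("ie", [None])[0]
    candidates = [declared] if declared else ["utf-8", FALLBACK_CHARSET]
    for charset in candidates:
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue
    return None


if __name__ == "__main__":
    print(extract_keyword(
        "http://www.baidu.com/s?wd=%C1%F7%D0%D0%C3%C0%D3%EF%CF%C2%D4%D8"))
```

Trying UTF-8 before the fallback is a heuristic: GB2312 byte sequences are rarely valid UTF-8, but a short keyword could in principle decode under both, so a declared `ie` value should take precedence whenever it is available.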

@mattab commented on March 20th 2009

Also see #435, which is closely related.

@mattab commented on March 24th 2009

(In [1014])

- cleaning up the search engine parsing code, adding tests, recording UTF8 keywords in the DB rather than encoded (as tables are now utf8, refs #5730)
- adding tests in url.test.php and fixed double encoding in some edge cases
- fixed #589 Piwik fails to properly decode and store some Chinese keywords (eg. from baidu.com)
- fixed #435 Exotic encoded keywords should be stored as utf-8 in the DB
- refs #575 hopefully fixed, will give it a few days of tests on piwik.org

This issue was closed on March 24th 2009