Piwik now has internal search keywords tracking. Awesome!
Functionality:
- specify a URL + parameter for the internal search query
- give the general metrics: number of searches, visits with searches, unique searches, etc. (see comment below)
- display the list of internal searches in a new report under Actions
- can we auto-detect Google CSE? #426
- see hacks 64 & 65 in Peterson's book
Google is good at analysing search quality / search in general. We can get ideas from how google analytics does internal search tracking.
Site search http://www.google.com/support/analytics/bin/topic.py?topic=12626
Google Analytics uses the following formulas to calculate the metrics used in internal site search reports:
* Visits with Search = The number of visits that used your site's search function at least once.
* Percentage of visits that used internal search = Visits with Search / Total Visits
* Total Unique Searches = The total number of times your site search was used. This excludes multiple searches on the same keyword during the same visit.
* Results Pageviews / Search = Pageviews of search result pages / Total Unique Searches
* Search Exits = The number of searches a visitor made immediately before leaving the site.
* Percentage of Search Exits = Search Exits / Visits with Search
* Search Refinements = The number of times a visitor searched again immediately after performing a search.
* Percentage Search Refinements = The percentage of searches that resulted in a search refinement. Calculated as Search Refinements / Pageviews of search result pages.
* Time after Search = The average amount of time visitors spend on your site after performing a search. This is calculated as Sum of all "search_duration" across all searches / ("search_transitions" + 1)
* Search Depth = The average number of pages visitors viewed after performing a search. This is calculated as Sum of all "search_depth" across all searches / ("search_transitions" + 1)
Example Calculations

This section describes a visitor's experience with your website's search engine and explains how Google Analytics calculates the resulting data. The visitor progresses through three different pages when interacting with your website's search engine:
* Search Page - Page on site where the visitor enters terms for a web search
* Search Results Page - Results page that is returned on a search engine query
* Results Pageview - The page viewed after a click on a results page
Assuming your website received three visits from visitors that navigate as described...:
* Visit 1: (time between "camera" term search page and "black camera" term search page is 30 seconds; and "black camera" search to site exit is 60 seconds)
  o Search Term Page (term "camera") >
  o Results Page >
  o view Results Pageview >
  o view Results Pageview >
  o Search Term Page (term "black camera") >
  o Search Results Page >
  o view Results Pageview >
  o view Results Pageview >
  o view Results Pageview >
  o Site exit
* Visit 2: (time between "computer" term search page to site exit is 15 seconds)
  o Search Page (term "computer") >
  o Results Page >
  o Site exit
* Visit 3:
  o No Search
...The following metrics can now be calculated:
* % Visits used internal search = 2 visitors that used site search (Visit 1 & Visit 2) / 3 total visitors = 66.7%
* Visits with Search = 2 (Visit 1 & Visit 2)
* Total Unique Searches = 3 ("camera", "black camera", "computer")
* Results Pageviews / Search = (2 + 3) / 3 = 1.67
* Search Exits = 1 (Visit 2)
* % Search Exits = 1 (Visit 2) / 2 (Visit 1 & Visit 2) = 50%
* Refinement = 1 (Visit 1 - "black camera")
* % Refinement = 1 (Visit 1 - "black camera") / 3 = 33.3%
* Time after Search = (30 seconds + 60 seconds + 15 seconds) / (1 + 1) [Visit 1 & Visit 2] = 52.5 sec
* Search Depth = (2 ["camera"] + 3 ["black camera"] + 0 ["computer"]) / (1 [Visit 1] + 1 [Visit 2]) = 2.5
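The worked example above can be reproduced with a short script. This is only a sketch in Python, not Piwik or Google Analytics code. The data layout and the function name `site_search_metrics` are made up for illustration, and two interpretations are assumed so that the numbers match the example: a refinement is any search after the first one in a visit, and the divisor for "Time after Search" and "Search Depth" is the number of visits with a search.

```python
# Each visit is the ordered list of its searches. Per search we keep the term,
# the number of result pages viewed afterwards, and the seconds until the next
# search or the site exit (all numbers taken from the example above).
visits = [
    [  # Visit 1
        {"term": "camera", "results_viewed": 2, "seconds_after": 30},
        {"term": "black camera", "results_viewed": 3, "seconds_after": 60},
    ],
    [  # Visit 2
        {"term": "computer", "results_viewed": 0, "seconds_after": 15},
    ],
    [],  # Visit 3: no search
]


def site_search_metrics(visits):
    searching = [v for v in visits if v]
    all_searches = [s for v in searching for s in v]
    visits_with_search = len(searching)
    # The same keyword searched twice within one visit counts once.
    unique_searches = sum(len({s["term"] for s in v}) for v in searching)
    results_pageviews = sum(s["results_viewed"] for s in all_searches)
    # Assumption: a search exit is a visit whose last search had no result viewed.
    search_exits = sum(1 for v in searching if v[-1]["results_viewed"] == 0)
    # Assumption: every search after the first one in a visit is a refinement.
    refinements = sum(len(v) - 1 for v in searching)
    total_seconds = sum(s["seconds_after"] for s in all_searches)
    return {
        "visits_with_search": visits_with_search,
        "pct_visits_with_search": round(100 * visits_with_search / len(visits), 1),
        "total_unique_searches": unique_searches,
        "results_pageviews_per_search": round(results_pageviews / unique_searches, 2),
        "search_exits": search_exits,
        "pct_search_exits": round(100 * search_exits / visits_with_search, 1),
        "refinements": refinements,
        "pct_refinements": round(100 * refinements / len(all_searches), 1),
        "time_after_search": total_seconds / visits_with_search,
        "search_depth": results_pageviews / visits_with_search,
    }


metrics = site_search_metrics(visits)
for name, value in metrics.items():
    print(name, value)
```

Note that only the counts (searches, exits, refinements, seconds) are raw data; every percentage and average is derived from them at the end.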
Benjamin: afaik, no one is working on it, so feel free to implement and share.
I gave this plugin a try (it's my first Piwik plugin, so every suggestion is welcome).
The menu items are registered, and I extended the site table with columns for the URL and the search parameter name. The settings area is working.
Now, the logs have to be analyzed. As far as I can see, there are no existing API methods that would provide adequate functionality. This whole DataTable business seems to be pretty complex (but cool!), so I'd really appreciate it if somebody would help me get started with this (Documentation, Tips or Code).
What I have done so far can be found on github:
I know, Piwik uses SVN, but github can be accessed via SVN as well:
svn checkout https://svn.github.com/BeezyT/piwik-sitesearch
Thanks for your help,
I just pushed the first version of the keyword analysis to github (including screenshots).
- A list of the most popular keywords is displayed
- When the user clicks on one of the keywords, a second table shows the most popular following pages. These pages are most likely the ones the website user was looking for.
The plugin is making good progress...
Have a look at the github wiki for up to date information:
Looks quite good. However, I saw you're using mysql_real_escape_string and probably other mysql_* specific functions (e.g. http://github.com/BeezyT/piwik-sitesearch/blob/master/SiteSearch.php#L117). You should use the second argument of Piwik_Query to work with parameterised queries (Piwik_Query($sqlQuery, $parameters)) and thereby allow other database backends (in future).
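To illustrate the general idea of parameterised queries (not Piwik's own Piwik_Query, which is PHP), here is a minimal sketch using Python's built-in sqlite3 module. The table is a simplified stand-in for log_action, and the helper name `find_actions` is hypothetical. The point is that the SQL text only contains `?` placeholders and the values travel separately, so no backend-specific escaping function is needed.

```python
import sqlite3

# In-memory database with a simplified stand-in for the log_action table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_action (idaction INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO log_action (name) VALUES (?)",
    [("camera",), ("o'reilly books",)],
)


def find_actions(conn, keyword):
    # The value is bound by the driver; it is never concatenated into the SQL.
    sql = "SELECT idaction, name FROM log_action WHERE name = ?"
    return conn.execute(sql, (keyword,)).fetchall()


# A keyword containing a quote needs no manual escaping.
print(find_actions(conn, "o'reilly books"))
# A classic injection attempt is treated as a plain (non-matching) string.
print(find_actions(conn, "x' OR '1'='1"))
```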
Thanks for your feedback (also the other open issues on github)!
I see why I shouldn't use the mysql functions. I replaced the parameters in the query with ? and passed them as the second argument to Piwik_FetchAll. Now the query is not working anymore, and I can't find a way to debug it.
How can I find out, what the query looks like when it is executed?
It looks like a very interesting start!
Are you interested in having such a plugin included in Piwik core? If so, we would need to review the schema updates (to process metrics above, visits per search, total search, search exits, etc.).
Tracker:
- I think some new fields in log_visit would be necessary (visit_total_search, last_search_idaction: link to log_action.id).
- tracker code must be highly optimized; in the case of a search, this would add a new INSERT in log_action (for new searches) and one in log_link_visit_action, but no more (the UPDATE to log_visit to update counter of searches, etc. would be done at same time as core UPDATE statement). The current code doing a massive join on 3 tables will not work at mid scale :)
- the search URL and search term should be archived in the Tracker file (tmp/cache/tracker/) - check out the hook 'Common.fetchWebsiteAttributes' and how it can be used. The goal is to do fewer requests at Tracker time.
Archiving: the plugin doesn't do archiving currently, I understand you pointed out code was not reusable. Indeed because you are doing "new" metrics in the Piwik world :-) but technically your code should archive data using the same mechanism used, for example, in the Visits by Server Time. You can then lookup query enrich*() in ArchiveProcessing, to see what you would base your query on.
Your integration of Search Results using custom data is very cool!! First cool use case of this function. And your code looks really good. This would be amazing to have in Piwik core for sure :)
Thanks for the feedback, matt.
I know that performance is a huge issue at the moment. To be honest, I didn't care much about it yet since it's still more of a proof of concept. At the moment, I'm adding a search refinements feature, which is the last of the must-haves. This is probably the most performance-critical part, and I think we will have to extend the schema a little more to make it efficient.
Archiving would be great, I read a lot of code, but I still don't get how it's done :-(. Some sort of documentation about that would be great, but I guess, the target group isn't that big... So if you (or anybody else) have the time, feel free to fork the project and get the archiving process started. I'm very open to collaboration!
What does including the plugin in Piwik core mean? That it comes with Piwik by default? That would be great, but I'd like to keep working on it (at least as much as I have time for it). Can we find a solution for a common version control? I'd be happy to stick with github, but if you guys have a better suggestion, I'm open...
github is perfect for now until the code is maybe ready, and committed to SVN trunk. Then you could have SVN commit and be part of the team if it interests you :)
Before that, it would need to be in line with other plugins in terms of performance and vision. Yours is a great and promising start.
Regarding Archiving, the big idea is to query the logs GROUPED BY a given entity (eg. keyword), and then request common stats for all keywords (visits, pages, avg time on site, bounce count, etc.). The helpers in ArchiveProcessing/* are doing this. Check out enrich* methods in particular. You can of course write SQL directly in your archiving module, but you can then create datatables. The advantage is that you can just sum them automatically when archiving week and months (which are sums of days). So it makes the code smaller to reuse these classes.
Let us know how it goes. good luck!
I started implementing the archiving process, and I'm not sure what the best solution is. What I did:
- Register ArchiveProcessing_Day.compute
- Build a DataTable and archive it under SiteSearch_keywords
- Get the DataTable from the archive in the API
I only implemented this for the keyword overview and for the day archive. Before I go on, please have a look and tell me whether that's what you had in mind...
Quick question: is the code working in its current state?
The concept of archiving in Piwik is explained briefly in: http://dev.piwik.org/trac/wiki/DatabaseSchema#Archiveddata
The idea is:
- The daily archive queries the MySQL logs and generates a datatable with several rows, each row having a dimension (e.g. keyword, page URL) and metrics (visits, pages, time on site, etc.)
- The weekly/monthly/yearly archives just go over the daily archives in the set and sum them. This is why the daily archive needs to select only plain numbers that can be summed. All ratios, percentages, etc. must be processed by the display layer.
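The "sum the days, compute ratios at display time" principle can be sketched as follows. This is plain Python, not Piwik's DataTable or ArchiveProcessing classes; the metric names `searches` and `search_exits` and both function names are assumptions chosen for illustration.

```python
from collections import defaultdict


def sum_daily_archives(daily_tables):
    """Build a period table by adding up the plain counters of each day."""
    totals = defaultdict(lambda: defaultdict(int))
    for table in daily_tables:
        for keyword, metrics in table.items():
            for name, value in metrics.items():
                totals[keyword][name] += value
    return {keyword: dict(metrics) for keyword, metrics in totals.items()}


def with_exit_rate(table):
    """Display layer: derive the percentage from the summed counters."""
    return {
        keyword: {**m, "exit_rate": round(100 * m["search_exits"] / m["searches"], 1)}
        for keyword, m in table.items()
    }


# Daily archives: one row per keyword, only summable integers are stored.
monday = {
    "camera": {"searches": 4, "search_exits": 1},
    "computer": {"searches": 2, "search_exits": 2},
}
tuesday = {"camera": {"searches": 6, "search_exits": 1}}

week = sum_daily_archives([monday, tuesday])
print(with_exit_rate(week))
```

Had the daily tables stored the exit rate itself (25% on Monday, 16.7% on Tuesday for "camera"), summing or averaging those values would not give the correct weekly rate of 20%, which is why only raw counts are archived.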
Let me know if you need specific guidance.
Thanks for the comment. I had most of that figured out by now, but still it's good to know, that there is documentation ;-)
I have some specific questions:
When the user clicks a keyword, the plugin shows statistics for that keyword only (following pages, previous pages, evolution, search refinements).
1. Should I store the DataTable for each keyword in an individual archive record? (At the moment, the plugin is doing that. There can be quite a lot of keywords, but I don't see a more efficient way.)
2. Should I trigger archiving the DataTables related to only one keyword before the keyword is clicked, meaning when the general DataTables are archived? (At the moment the plugin is doing that, and it's not performing well. The alternative would be to trigger the archiving process only when the user clicks a keyword. But would we then have to handle the cronjob archiving separately, and include archiving the keyword-DataTables in it?)
I haven't worked on the plugin for a few days now, and I won't have much time for the next 4 weeks or so, but after that, I'm planning to finish the first beta version within a few weeks.
The metrics stored on a per keyword basis are:
- Following pages: The pages people visited after searching for the keyword (this should be the content the user was looking for. If the home page is popular, the user didn't find what he was looking for.)
- Previous pages: The pages people visited before searching for the keyword (the users might have gotten lost on these pages).
- Evolution: The number of searches for that keyword over time (that could be stored as float, not blob)
- Search refinements: The keywords users were searching for after searching for the current keyword. I might change this to "Searches by the same users". This metric is pretty expensive; I'll have to optimize the algorithm a lot.
At the moment, only the first two metrics are archived, and it still takes a long time to complete, when there are many keywords.
Have a look at http://github.com/BeezyT/piwik-sitesearch/blob/master/Archive.php (method archiveDay). The main performance issue is, that I have to analyze the actions (not only the visits) a lot - for every keyword.
Here are some more specific questions:
- Do you have any ideas for dayAnalyzeKeywords() without the huge join?
- Would you create separate DataTables for each metric? (Previous and following pages could easily be stored in a single table)
- Would you create separate DataTables for each keyword?
- Can I handle archiving differently depending on how it is triggered? If it is done via cronjob, archive completely (including keyword details); if it is done on the fly, archive the details only on demand.
Thanks for your help, matt! I really appreciate it.
EZdesign, sorry for the delay. Have you made further progress?
- Following pages / Previous pages / Search refinements: These data sets will be huge, so it is better to store each of them in a separate datatable and blob. You would then have a different API getter for each of these data sets.
- In your archivePeriod(), could you reuse the existing summing logic? i.e. $archiveProcessing->archiveDataTable($dataTableToSum);
- I wouldn't create a separate datatable for each keyword, but instead a separate datatable for each non-integer metric (e.g. following/previous pages, refined keywords per keyword, etc.). You can keep all integer metrics in the same datatable.
- Regarding archiving on demand, this could be done later as a core feature. Better to archive everything, as all other plugins do currently.
- I haven't looked in detail at the code regarding performance issues and large joins. To have a clear picture of what is needed, can you please prepare the list of all metrics that are processed?
Thanks for the feedback, it helped getting the additional archiving time for my test database down from 100 to 5 seconds ;-)
I'll let you know, when I have some more specific questions.
The plugin is making great progress, everything uses archiving and seems to work now. You could say, that we have reached the first beta version. If you have any bug reports, please create issues on github.
There is one problem I'm having with the evolution graph, that I can't figure out. Have a look at this screenshot: http://github.com/downloads/BeezyT/piwik-sitesearch/Percentage.png
The axis is not scaled properly...
The Controller method is called searchPercentage, the API method is getSearchPercentageEvolution.
Did anybody have this problem before? Is it a bug or am I doing something wrong??
Btw, if you had the plugin installed previously and want to update to the latest version, remove the schema changes from piwik_site and piwik_log_action by hand, run the install method again and then check "analyze urls now" in the settings.
Please test against trunk. In fixing #1562 (displaying goal conversion rates, i.e., percentages), we've made some changes to the visualization code.
The ticket is about exactly the same problem, but using trunk didn't help.
When you use ColumnCallbackAddColumnPercentage, the result is a localized number with a '%'. This locale-specific format works well when displayed in the table, but it's a string, not a number. When the Visualization code goes to find the max value, PHP's max() function does a string comparison, so "13.5%" is "bigger" than "100%".
We also run into an issue with locales. Consider 3/4. In "en_US.UTF-8", this would become "0.75%". In "de_DE.UTF-8", this becomes "0,75%". Casting to (float) isn't locale-aware.
Can you use ColumnCallbackReplace with Piwik::getPercentageSafe?
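Both pitfalls described above can be reproduced in a few lines. The sketch below uses Python rather than PHP, so it only illustrates the principle (string comparison and locale-unaware parsing); `format_percent` is a hypothetical helper, not a Piwik function.

```python
# Percentages that were already formatted for display are strings.
formatted = ["13.5%", "100%"]
# Strings are compared character by character: "13..." sorts after "10...".
print(max(formatted))

# Keeping the raw numbers gives the expected maximum.
raw = [13.5, 100.0]
print(max(raw))

# A value formatted with a decimal comma cannot be parsed back by a plain float cast.
try:
    float("13,5")
    comma_parses = True
except ValueError:
    comma_parses = False
print(comma_parses)


def format_percent(value, decimal_sep="."):
    """Hypothetical display helper: format only at the very last step."""
    return f"{value:.1f}".replace(".", decimal_sep) + "%"


# The chart scales its axis from the numbers; the table shows the formatted text.
print(format_percent(max(raw)), format_percent(13.5, decimal_sep=","))
```

The design choice this points to is the same as for archiving: keep numeric values in the data, and apply the '%' sign and locale formatting only in the display layer.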
re: search_percentage. core/ViewDataTable/GenerateGraphData/ChartEvolution.php will guess the unit from the column name. We can add _percentage to the list, or you can use _rate (e.g., search_rate), or you can explicitly set the Y-axis unit ('%').
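The three options mentioned here (rename the column to end in _rate, add _percentage to the list, or set the unit explicitly) could look roughly like this. It is a Python sketch of the idea only; the real logic lives in ChartEvolution.php, and the suffix table and the function name `guess_unit` below are assumptions, not Piwik's actual list.

```python
# Assumed suffix table: "_rate" is mentioned in this thread as already working;
# any other entries Piwik may have are unknown here.
UNIT_BY_SUFFIX = {"_rate": "%"}


def guess_unit(column_name, explicit_unit=None, suffixes=UNIT_BY_SUFFIX):
    """Return the Y-axis unit for a column, or None if nothing matches."""
    if explicit_unit is not None:
        # Option 3: the caller sets the unit explicitly.
        return explicit_unit
    for suffix, unit in suffixes.items():
        if column_name.endswith(suffix):
            return unit
    return None


# Option 1: rename the column so that it ends in "_rate".
print(guess_unit("search_rate"))
# Without a matching suffix, no unit is detected.
print(guess_unit("search_percentage"))
# Option 2: extend the suffix list with "_percentage".
extended = {**UNIT_BY_SUFFIX, "_percentage": "%"}
print(guess_unit("search_percentage", suffixes=extended))
```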
You hit the nail on the head with that response! Works fine now.
I just released v0.1.2: it includes the fix and some widgets I added yesterday. If you want to test / use the plugin, I recommend using only the commits tagged with a version number. They should be more or less stable. If you don't want to use git or svn to access github, there are tgz/zip archives of the releases in the downloads section on github.
Looking forward to your feedback...
Timo: your code is missing a license statement.
Hey guys, I was just checking out repopular (http://repopular.com/) and what do I see on the first page? My plugin!
Thanks for the publicity, are you using it?
@vipsoft: what license do you recommend? What do I need to pick, so you can add it to the core when the time is right?
Must have been my tweet.
The license is up to you. For inclusion with Piwik core, we require that it be GPL v3 compatible, e.g., GPL v3, BSD, MIT, or LGPL v3. Affero GPL v3 isn't strictly compatible, but is also allowed.
Sorry for the delay. I'm going to try and squeeze in a review this week.
Ok. That was a pleasant code read. Only a few issues to address/discuss with Timo and Matt:
- binding named parameters in SQL queries isn't supported by the mysqli extension
- should use phpdoc-style comments; if included in Piwik, should use the standard header, but @author tags won't be rejected
- no plugin-specific unit tests, and integration tests fail after the SiteSearch plugin is activated
- SiteSearch::log(): should we extend the tracker's printDebug to support file-based logging rather than having plugins implement their own?
- dev/ folder scripts: I don't think we need to commit these to the repository; the alternative is to have the build script remove this when packaging releases
re: logResults:
- the html_entity_decode() is now done by getRequestVar() in trunk. Probably should do an unsanitizeInputValue() before json_decode() though.
- Matt's comment:20
the search URL and search term should be archived in the Tracker file (tmp/cache/tracker/) - check out the hook 'Common.fetchWebsiteAttributes' and how it can be used. the goal is to do less requests at Tracker time.
Thanks for the review.
I had planned to remove the logging and the dev folder from the plugin and move them to a separate plugin, that I use for development. If you want to include this functionality in the core, that's fine with me as well.
I also have some questions regarding the tracker cache (Matt's comment:20):
- How am I supposed to cache the URL and search term during tracking? This information is set once for a site in the settings.
- If you meant the search term for an action, that is also just set once for an action and is not associated with a single pageview.
- If the above doesn't make sense, the real question is "what is the tracker cache" and "how does it work"? ;-)
Thanks for your help.
The tracker cache consists of files in tmp/cache/tracker that reduce the number of SQL queries made by the tracker.
In plugins/SitesManager/SitesManager.php, recordWebsiteDataInCache() hooks on "Common.fetchWebsiteAttributes" to cache site data. The site search url and search parameter could also be saved this way.
In API.php, any update of the site table is followed by a call to Piwik_Common::regenerateWebsiteCacheAttributes().
Last part: logResults would call Piwik_Common::getCacheWebsiteAttributes( $idSite ) to access the tracker cache (which may already be loaded at this point), thus avoiding a SELECT during tracking.
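The three steps above (fill the cache with site attributes, regenerate it whenever the site settings change, read from it at tracking time) follow a common file-cache pattern. A minimal sketch in Python, assuming a JSON file per site; the class `TrackerCache`, its methods and the attribute names are hypothetical and do not mirror Piwik's actual classes or file format.

```python
import json
import os
import tempfile


class TrackerCache:
    """File-based cache of per-site attributes, to spare a SELECT per request."""

    def __init__(self, directory, load_from_db):
        self.directory = directory
        self.load_from_db = load_from_db  # callback that queries the database

    def _path(self, id_site):
        return os.path.join(self.directory, f"{id_site}.json")

    def regenerate(self, id_site):
        # Called after every update of the site settings.
        attributes = self.load_from_db(id_site)
        with open(self._path(id_site), "w", encoding="utf-8") as fh:
            json.dump(attributes, fh)
        return attributes

    def get(self, id_site):
        # Called at tracking time: read the file, fall back to the database once.
        path = self._path(id_site)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                return json.load(fh)
        return self.regenerate(id_site)


db_calls = []


def fake_db_lookup(id_site):
    # Stand-in for the SELECT on the site table; attribute names are assumed.
    db_calls.append(id_site)
    return {"search_url": "/search", "search_parameter": "q"}


cache = TrackerCache(tempfile.mkdtemp(), fake_db_lookup)
first = cache.get(1)   # cache miss: one database lookup, result written to disk
second = cache.get(1)  # cache hit: served from the file
print(first, second, db_calls)
```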
Thanks Marco, that was spot on! I also added a check for the Piwik version, because the old query breaks the new version and the new query breaks the old version... (See Github)
I just released a new version with numerous improvements, including tracker cache. Please notice the release notes in the README, otherwise the plugin won't work anymore.
Thanks jekko for submitting the report. The latest commit at github will fix your problems.
In 1.3, the constructor signature of Piwik_DataTable_Filter_ReplaceColumnNames has been changed and that broke the search evolution chart. This has happened a couple of times now, that a new Piwik release changes vital things like the database schema or core signatures - without any chance for me to have a trial run before the release. After the new release is out, bug reports come in, and I have to take the blame for writing an incompatible plugin. Am I the only plugin developer or is there something I don't know about (like a developer release before the public release)? This has to happen to other people as well, so there has to be something, right?
Further, 1.3 introduced the custom date range. Is there any documentation on how that works? Previously, I was relying on Piwik_Controller::$date. Someone added the comment "null if the requested date is a range", but it doesn't say what to do when it's null. Where do I get the date? How does archiving date ranges work?
EZdesign, I hear your complaint. We did two weeks of beta testing and advertised it in a blog post and on Twitter and Facebook, but maybe you missed the announcement. Maybe we should have some kind of list for all beta testers (and plugin developers, etc.)?
It happens often to you with Search Tracking, because you are building one of the most advanced piwik plugins, so most likely that when we change core API it breaks. It is part of our goals to keep the API stable as much as possible, but sometimes there is no choice as we are still fast evolving.
On this note, we should integrate Search Tracking in core... it is a very useful plugin. However I think it should be improved performance wise, and maybe feature set. If you have time and interest for this, maybe we can work together? (also, a sponsor bounty would be possible for such work, if we include in core)
Date Range: if you use standard archivePeriod hooks, piwik will handle date range automatically (it sums daily periods for ranges, like it sums daily periods for weeks and months). If you have to do manual coding for period=range there is probably something wrong, or something that could be improved.
Thanks for the quick answer, matt.
Good to know, that there is a beta testing phase... Btw, it was not announced on the blog, otherwise I most likely would have read it. A mailing list for beta testers / developers would be great (or something else that creates some kind of push notification).
Including the plugin in the core sounds good to me. I'd be interested in working together on this. And if it's sponsored, making time for further development would be easier, of course ;-)
Can we maybe talk on skype about this?
Date Range: The overall date management of the plugin definitely can be improved, but I can't find a clean way to do so. This could be one of the first things we would improve together.
1.3-rc1 was announced on the blog: http://piwik.org/blog/2011/04/new-piwik-mobile-app-released-also-piwik-1-3rc1-available-for-early-adopters/
Sure we can talk on skype, my skype is my first name dot last name cheers
Oops, I missed that behind the Piwik Mobile headline. My bad.
An excellent article about how to use Site Search feature: http://www.cxfocus.com/index.php/google-analytics-tips/google-analytics-site-search-report/
Maybe we could somehow integrate "Analysis tips" in the UI and display in the UI the main questions raised in this article, to help users of the feature to find out interesting facts from the data.
Thanks for your interest in the plugin, jens.
There are plans to integrate the plugin in Piwik core. It's not certain yet, but they might be realized very soon as part of a sponsored project. If we integrate it in core, most of it will be overhauled (especially the backend), which means you would have to set it up again and the reports would start from scratch as well.
If you want to analyze your internal search now, go ahead and use the plugin from github. It works well for many users. For a more performant core version, keep an eye on this ticket.
Have you tried pressing refresh in your browser? You might need to reload a JS file.
I don't know what exactly you downloaded but it seems to be wrong.
Go to https://github.com/BeezyT/piwik-sitesearch/downloads and click Download as ZIP. Extract the ZIP and place the folder SiteSearch in the plugins folder of your Piwik installation.
@gildesign: See https://github.com/BeezyT/piwik-sitesearch/issues/38. This is a known issue that won't be fixed in the github version because SiteSearch will be integrated in Piwik Core soon. During that process, major parts will be overhauled and the problems will be fixed.
Any updates on this, any testing/help needed?
How's the progress on integrating this plugin into core? I cannot see any trace of it in the Subversion repository.
I'm working on it
(In ) Refs #2992 Site Search KABOOM, Refs #5469 Implementing Site Search tracking & reporting in Piwik core!
- New Admin UI to customize, for each site, whether site search is enabled. Also options to set default values to use.
- New Reports: Searches, Searches with no result, Search categories, Top Pages Following a Search
- to track "No result keyword", users will have to tag their site with a JS call, or add a new parameter to the search result page &search_count=X (X being zero for no result searches)
- Reports work with Row evolution, PDF/HTML reports, Piwik Mobile
- idaction_url is now NULLable because Site Search records a page with idaction_name == Keyword and idaction_url == NULL. This ensures that the Site Searches don't create "Page URL Not defined" records.
- updates to Tracker JS API, new function trackSiteSearch, also added in PHP tracker
- New fields in log_visit to track searches
- new segment, "searches", which can be used to select visitors who did a search, i.e. searches>0, or those who searched a lot, i.e. searches>10
TODO:
- commit integration test, TESTING, DOCS, FAQ, release, and a nice Prayer to the universe and the stars, hoping that I can code a major new feature without any bug...
- It would be awesome to have compatibility with Transitions so we can see, for a given site search, what are the starting pages and Destination pages
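As a rough illustration of URL-based detection with the &search_count=X parameter described above, here is a Python sketch. It is not the tracker's actual code: the function name `extract_site_search` is made up, and the keyword parameter name `q` is an assumption (in the feature it is configurable per site); only `search_count` comes from the description above.

```python
from urllib.parse import parse_qs, urlparse


def extract_site_search(url, keyword_param="q", count_param="search_count"):
    """Return keyword and result count if the URL is a search results page."""
    query = parse_qs(urlparse(url).query)
    if keyword_param not in query:
        return None  # not a site search (or an empty keyword)
    keyword = query[keyword_param][0].strip()
    if not keyword:
        return None
    raw_count = query.get(count_param, [None])[0]
    count = int(raw_count) if raw_count is not None and raw_count.isdigit() else None
    # A search is reported under "no result" only when the count is known to be zero.
    return {"keyword": keyword, "result_count": count, "no_result": count == 0}


print(extract_site_search("http://example.org/search?q=black+camera&search_count=0"))
print(extract_site_search("http://example.org/search?q=camera"))
print(extract_site_search("http://example.org/about"))
```

When the count parameter is absent, the sketch leaves the result count unknown rather than treating the search as a "no result" keyword.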
Thank you for your patience Timo, and thank in advance everyone for your help Testing this new feature!
(In ) Refs #2992 #49 Message fix
(In ) Refs #2992 #49 - Adding integration test - Note: it appears the "No result keyword" does not work, i'm on it
(In ) Refs #2992 #49 Fixing the No result keyword bug
(In ) Refs #2992 #49 Also updating schema
(In ) Refs #2992 #49 enable transitions for the pages following site search reports
see follow up ticket: #3461
In 9e051a171c2734c88a61d51217e146e95c3b2594: refs #5469 remove "Others" row from site search report "keywords with no results"