@mattab opened this Issue on December 9th 2008 Owner

In Piwik, all data is stored in a single monolithic database. This becomes a problem when Piwik has to monitor a large volume of traffic: the database server struggles and queries take too long to finish. One solution is to record data in several databases within the same Piwik instance. Piwik would automatically route the data to the right database using the "idsite".

For example:

  • sites 1-1000 on serverA
  • sites 1001-1100 on serverB

Every SQL query needs to carry the idsite (as a parameter, or in a comment) so that the query text can be parsed automatically and the query routed to the right server. The (idsite, server) pairs are stored in a configuration file.
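As a rough illustration of the routing idea, here is a minimal Python sketch (not the plugin's actual code, which is PHP and only available in the attachment). The comment format `/* idsite=N */`, the `SHARD_RANGES` table, and all function names are assumptions made up for this example; the ranges mirror the serverA/serverB example above.

```python
import re
from typing import Optional

# Hypothetical shard map: inclusive idsite ranges mapped to a server name.
# In the proposal these pairs would live in a configuration file.
SHARD_RANGES = [
    (1, 1000, "serverA"),
    (1001, 1100, "serverB"),
]

# Assumed comment format; the real plugin's format is not shown in this ticket.
IDSITE_COMMENT = re.compile(r"/\*\s*idsite\s*=\s*(\d+)\s*\*/")


def extract_idsite(sql: str) -> Optional[int]:
    """Return the idsite embedded in a SQL comment, or None if absent."""
    match = IDSITE_COMMENT.search(sql)
    return int(match.group(1)) if match else None


def server_for_idsite(idsite: int) -> str:
    """Look up the server whose range contains the given idsite."""
    for low, high, server in SHARD_RANGES:
        if low <= idsite <= high:
            return server
    raise LookupError(f"no server configured for idsite {idsite}")


def route_query(sql: str, default_server: str = "serverA") -> str:
    """Pick a server for a query; queries without an idsite go to the default."""
    idsite = extract_idsite(sql)
    if idsite is None:
        return default_server
    return server_for_idsite(idsite)
```

For example, `route_query("SELECT * FROM log_visit /* idsite=1050 */ WHERE idsite = 1050")` would select serverB under these assumed ranges. Falling back to a default server for queries without an idsite is a design choice of this sketch, not something specified in the ticket.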

THIS IS NOT FINISHED AND NEEDS MORE WORK.

Johan Mathe built the first dev version of the plugin, which is attached to this ticket.
We updated the core to add the idsite to some queries, but there is more to do.
This plugin was developed in August 2008 and Piwik has changed slightly since then (more queries were added, etc.).

This plugin would be incredibly useful to all the big users of Piwik; some people are using Piwik to monitor thousands of websites, millions of visits, etc.

Attached is the current development version of the plugin. This is DEV only (it won't work with current trunk). It is meant to give an idea of how it could work.

Please post a comment here if you are interested in this plugin development and would like to participate.

@mattab commented on December 9th 2008 Owner

Attachment: DEV version of sharding Piwik plugin
Sharding.zip

@anonymous-piwik-user commented on March 26th 2009

Skype provides its own engine for sharding databases with PostgreSQL. It is designed exactly for this purpose.
For those interested, have a look: [https://developer.skype.com/SkypeGarage/DbProjects/PlProxy]

It may also be better to distribute the idsite values in a non-linear way. The link above provides an example based on a hash.
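To make the hash-based suggestion concrete, here is a small Python sketch of dispatching an idsite to a server by hash instead of by contiguous range. It is only an illustration: the `SERVERS` list and the function name are hypothetical, and CRC32 is used here simply because it is stable across runs, not because PL/Proxy or Piwik uses it.

```python
import zlib

# Hypothetical list of shard servers, for illustration only.
SERVERS = ["serverA", "serverB"]


def server_for_idsite_hashed(idsite: int, servers=SERVERS) -> str:
    """Map an idsite to a server using a stable hash modulo the server count."""
    # zlib.crc32 gives the same value on every run, unlike Python's built-in
    # hash() for strings, which is randomized per process.
    bucket = zlib.crc32(str(idsite).encode("ascii")) % len(servers)
    return servers[bucket]
```

One trade-off of this simple modulo scheme is that changing the number of servers reassigns most idsites, so existing data would have to be moved; a fixed number of logical partitions mapped onto servers is one common way to soften that.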

@robocoder commented on October 8th 2009 Contributor

This plugin will need to be updated to reflect db abstraction changes.

In the absence of sharding, consider providing an option to remove the sharding comments to work around a query cache bug on older MySQL versions.

@mattab commented on October 12th 2009 Owner

We should not remove the comments. Even though the plugin is currently not in a working state, it is there as a proof of concept. Sharding in Piwik would be a must-have feature for high-traffic Piwik instances.

What is the issue with the MySQL cache? If it is fixed in stable MySQL releases, it is not a blocker for us.

@robocoder commented on May 11th 2010 Contributor

Thought: investigate using the Spider storage engine for MySQL as a more transparent method for partitioning/sharding.

@mattab commented on November 24th 2010 Owner

Sharding as such is not the way to go... we can open specific tickets for specific implementations (e.g. the MySQL Spider storage engine) if someone starts work on it.

This Issue was closed on November 24th 2010