@mattab opened this Issue on October 11th 2014 Owner

There have been several reports of this error: allocateNewArchiveId: Cannot get named lock allocateNewArchive

2014-10-10 12:47:17 ERROR [2014-10-10 12:47:17] [6c725] Got invalid response from API request: index.php?module=API&method=API.get&idSite=1681&period=day&date=last2&format=php&trigger=archivephp. Response was 'a:2:{s:6:"result";s:5:"error";s:7:"message";s:95:"allocateNewArchiveId: Cannot get named lock allocateNewArchiveId.piwik_archive_numeric_2014_10.";}'

The goal of this issue is to investigate and make sure this bug does not occur again.

@tsteur commented on October 15th 2014 Owner

I had a look, but I'm not sure how to fix it unless we completely get rid of the lock (using, for instance, other IDs for idarchive instead of our auto-increment logic), as it can somehow always happen. But maybe I'm wrong.

Does anyone know if it is possible to do GET_LOCK(0.1)? I haven't found anything about this. The docs say "seconds", but I'm not sure whether a fractional value like this would work. Update: I did a little test and it seems to be truncated to 0 seconds. I'm not sure if there is another way to wait for a lock for something like 100ms; I couldn't find anything. I was expecting the second parameter to be only a timeout and the lock to be handed over immediately, but according to a check I did this is not the case.

I had one GET_LOCK() and released it after 30 seconds. I had another GET_LOCK(,100), and although the first lock was released after 30 seconds, it only got the lock after 100 seconds with my PHP 5.3 / MySQL 5.5 setup.
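
For reference, the test described above boils down to two MySQL sessions (the lock name here is just illustrative):

    -- session 1: acquire the lock, then release it manually ~30 seconds later
    SELECT GET_LOCK('allocateNewArchiveId.test', 0);
    SELECT RELEASE_LOCK('allocateNewArchiveId.test');

    -- session 2, started while session 1 still holds the lock: the docs
    -- specify the timeout in seconds, and a fractional value such as 0.1
    -- appears to be truncated to 0 on MySQL 5.5
    SELECT GET_LOCK('allocateNewArchiveId.test', 100);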

@diosmosis commented on October 15th 2014 Member

Maybe we can use an atomic count to keep track of the max archive ID instead of using a lock? I.e., instead of getting a lock and doing a SELECT + INSERT, we could store the max idarchive in an archive row and do an update/select like this: http://stackoverflow.com/questions/15034136/select-updated-rows-in-mysql#answer-15034351 . This way we effectively reserve an ID and can safely insert afterwards. I think it might work.

@tsteur commented on October 16th 2014 Owner

Not sure how you would set the id here, or rather how you would get the max idarchive?

Wouldn't this be the same as this?

 $insertSql = "INSERT INTO $numericTable "
    . " SELECT <a class='mention' href='https://github.com/var'>@var</a> := IFNULL( MAX(idarchive), 0 ) + 1,
                        '" . $locked . "',
                        " . (int)$idSite . ",
                        '" . $date . "',
                        '" . $date . "',
                        0,
                        '" . $date . "',
                        0 "
    . " FROM $numericTable as tb1";
Db::get()->exec($insertSql);

$selectIdSql = "SELECT <a class='mention' href='https://github.com/var'>@var</a>";
$id = Db::get()->fetchOne($selectIdSql, $locked);

The problem is that if two or more requests select MAX(idarchive) at the same time, we would again need a lock, since with InnoDB the whole table is not locked.

@diosmosis commented on October 16th 2014 Member

My thought does not involve getting MAX(idarchive). The insertion happens in two steps:

1) reserve new idarchive

Db::query("UPDATE archive_numeric SET value = value + 1 WHERE name = 'specialIdArchiveCount' AND <a class='mention' href='https://github.com/reservedId'>@reservedId</a> := value");

$reservedId = Db::fetchOne("SELECT <a class='mention' href='https://github.com/reservedId'>@reservedId</a>");

2) use new idarchive in insert

Db::query("INSERT INTO archive_numeric (idarchive, ...) VALUES (?, ...)", array($reservedId));

Complications arise when the special archive doesn't exist, but I have not thought that far. An option could be used for each archive table as well.

@tsteur commented on October 16th 2014 Owner

Using another table for incrementing the id would work. We did this in some other projects to be able to shard if needed; that's why we didn't use MySQL's auto_increment feature. What we did there was basically the following:

A table consisting of the columns name and value.

UPDATE table SET value = LAST_INSERT_ID(value + :count) WHERE name = 'archive_numeric_2014_01';

followed by

SELECT LAST_INSERT_ID()

This works because the last insert id is managed per connection. :count would allow you to optionally reserve multiple ids in one step if needed.
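
A minimal sketch of this pattern in PHP, assuming a sequence table with name/value columns as described above (Db::query and Db::fetchOne stand in for Piwik's DB layer):

    // reserve $count ids atomically; LAST_INSERT_ID(expr) stores expr per
    // connection, so concurrent requests cannot see each other's value
    $count = 1;
    Db::query(
        "UPDATE sequence SET value = LAST_INSERT_ID(value + ?) WHERE name = ?",
        array($count, 'archive_numeric_2014_01')
    );

    // returns the new counter value, i.e. the highest id just reserved;
    // with $count > 1 the reserved range is ($id - $count + 1) .. $id
    $id = (int) Db::fetchOne("SELECT LAST_INSERT_ID()");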

@mattab any opinions?

@diosmosis commented on October 16th 2014 Member

Can you do it with one table or do you need one per archive numeric table?

@diosmosis commented on October 16th 2014 Member

Ah ok, saw the name part of the WHERE.

@diosmosis commented on October 16th 2014 Member

If a new table is used, I'd like to request that name refer not to the table per se, but to the name of the atomic count, so it can be used in other ways. This would require being able to reset a count; not sure whether that can be done with LAST_INSERT_ID(...) being used?

@tsteur commented on October 16th 2014 Owner

Not sure what you meant with your first sentence?

Should work with LAST_INSERT_ID. Not sure if there is any difference between using this one or @var. I only posted it because we used it this way and it worked perfectly even under load.
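
For example, resetting a count would just be a normal update of the counter row; it does not interact with LAST_INSERT_ID() being used to advance it (table and counter names here are illustrative):

    UPDATE sequence SET value = 0 WHERE name = 'someDistributedCount';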

@diosmosis commented on October 16th 2014 Member

I mean instead of just using it for the IDs of tables, use it as a table for MySQL-based semaphores (so name could be, for example, 'CronArchive.someDistributedCount').

@tsteur commented on October 16th 2014 Owner

Yeah, sure, I think this can be done out of the box. Basically this is what we would use it for with idarchive. Do you have a specific use case in mind? That might help us decide whether we want to create such a table or keep the lock.

@diosmosis commented on October 16th 2014 Member

I've already implemented something similar (uses Option table and value = value + 1) for #5363 locally, so using the table for this would just mean less duplication. The class I've created is called Semaphore and is located in core/Concurrency (in case someone decides to implement the table in this ticket).

@tsteur commented on October 30th 2014 Owner

@mattab what are your thoughts here? Is there actually a need to put both a critical and a major label here? I've started ignoring those labels, as they are set so often and, to me, sometimes "randomly".

@mattab commented on October 30th 2014 Owner

@tsteur I meant to put only one label, not two.

It's a Major bug, as we regularly see it on Cloud and Enterprise. If you have an idea how it could be solved, please give it a try; it would be fantastic to get rid of this issue!

@tsteur commented on October 31st 2014 Owner

As discussed, we could get rid of at least one of those two locks by having a "counter" in MySQL. I can work on this.

I'm not sure whether we can get rid of the second one (I haven't really thought about it yet). Maybe the error would already occur less often, or it might even be fixed entirely by removing the first lock (probably not).

@mattab commented on October 31st 2014 Owner

As discussed, we could get rid of at least one of those two locks by having a "counter" in MySQL. I can work on this.

+1 it's a good start to get rid of one of the two locks.

I'm not sure whether we can get rid of the second one (I haven't really thought about it yet).

Maybe you can think about it once you start removing the first lock. It would of course be nice to get rid of both locks if possible :+1:

@tsteur commented on November 2nd 2014 Owner

I am currently implementing the "sequence" table and ran into a problem I had already anticipated.

Imagine we have a new table to count the sequences. When updating Piwik, we'll have to "migrate" the data to this table, meaning we have to create several initial counter values such as archive_numeric_2014_10 => 4822 (name => value, i.e. the current max id).

Usually we would show the SQL for the migration during the update process, but the id might have already changed by the time someone actually executes the update (whether manually, via the console, or via the UI). To work around this, one could simply add +10000 to the next archive id, so that even if some more ids are created in the meantime, it would not fail.

The problem is that the code will already be updated and will want to use the new sequence table while the SQL update has not been executed yet. So no archives can be created during this window. It also means we cannot migrate those sequence ids in the background, as the table won't exist yet.

So we'll probably have to detect whether the sequence table is installed and, if so, migrate in the background (this should be very fast, as we only have to get the current max archive id and insert that value into an empty table); if the table is not installed yet, we fall back to the old logic. Alternatively, we could say something like "when creating the next archive table, use the sequence table; otherwise still use the lock", although I'm not sure how to implement that. I need to think a bit more about it and will check which is easier to implement.
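
A rough sketch of the first approach (the helper names are hypothetical, not actual Piwik APIs):

    // fall back to the old named-lock allocation until the sequence table
    // exists and the counter row for this archive table has been migrated
    if ($this->sequenceTableExists() && $this->sequenceIsInitialized($table)) {
        $idArchive = $this->getNextIdFromSequence($table);
    } else {
        $idArchive = $this->allocateIdWithNamedLock($table);
    }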

@tsteur commented on November 2nd 2014 Owner

Using the sequence table starting from the next archive table, and the lock for the current archive table, should work, but it might cause issues if someone updates around the beginning of a month. We could also never get rid of the code that uses the lock, and any DB failure while the lock is in use could lead to inconsistent ids even though the sequence should be used. ==> This is not an option. (I know this all sounds a bit confusing; simply ignore it if you do not understand.)

We could detect in the background whether the table exists and, if not, create it. We could probably also detect whether a sequence was initialized (meaning whether we have migrated the max archive id to the sequence table) and, if not, migrate it on the fly. I don't think this would add any overhead, but it is possible that two processes would migrate the max archive id at the same time, which could maybe lead to errors again, although I don't think so. I will probably go with this solution.
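
One way to keep that on-the-fly migration safe when two processes race, assuming name is the sequence table's primary key:

    -- INSERT IGNORE turns the second of two racing inserts into a no-op,
    -- so the counter is initialized exactly once
    INSERT IGNORE INTO sequence (name, value)
        SELECT 'archive_numeric_2014_10', IFNULL(MAX(idarchive), 0)
        FROM archive_numeric_2014_10;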

Otherwise, we could just create the sequence table and migrate the max archive id during the update, but archive creation would not work between the time the files are updated and the SQL update finishes. Is this a problem? @mattab

@mattab commented on November 2nd 2014 Owner

Otherwise, we could just create the sequence table and migrate the max archive id during the update, but archive creation would not work between the time the files are updated and the SQL update finishes. Is this a problem?

AFAIK Piwik will return an error message in the API output when the DB schema has pending upgrades, so it should be OK to do this! (The archiving would not have been triggered anyway.)

@tsteur commented on November 3rd 2014 Owner

See pull request.

I removed the lock used to create the new archive id. The second lock is still there, but this problem should not really occur anymore: the second lock no longer locks the whole table but only "table + archiveId", which should be fine. We could maybe remove the second lock completely ( https://github.com/piwik/piwik/pull/6583/files#diff-f0dc4d4d133f84005fcebc1e10f155c0R159 ) if InnoDB is used, as there are also indexes on both columns (so it should not lock the whole table), but I'm not sure whether we can remove it, as I still do not 100% understand what is happening in the archiver.
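
Scoping the lock per archive essentially means including the archive id in the lock name, e.g. (an illustrative sketch, not the exact code from the pull request):

    // only operations on the same table + archive id contend for this lock,
    // instead of everything that touches the table; the 5s timeout is illustrative
    $lockName = $tableName . '.' . $idArchive;
    $acquired = (bool) Db::fetchOne("SELECT GET_LOCK(?, 5)", array($lockName));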

It would be nice to run some tests with the sequence table to see whether any problems occur under load (they should not, but one never knows). Otherwise, the system tests say everything still works, so I reckon the change should be fine.

@mattab commented on November 4th 2014 Owner

The second lock no longer locks the whole table but only "table + archiveId", which should be fine.

Oh interesting, was this a bug before that we locked the whole table?

but I'm not sure whether we can remove it, as I still do not 100% understand what is happening in the archiver.

Can you check that the following use case still works:

  • When two widgets request the same date+period+idsite at the same time
  • Expected: only one archive is processed (and the other widgets "wait" for the archive to finish, then grab it)
  • Not expected: both widgets each process an archive (processing the same data twice)

@tsteur commented on November 4th 2014 Owner

It creates the same archive multiple times, but the same happens with the master branch. I'm not sure if you verified this in #4918 or if you only looked at the timing. I see the same behavior when adding the 20 seconds to the archiver (the dashboard loads after about 23 sec), as you described in that issue, but when verifying the archives one can see that the same archive is created multiple times.

@mattab commented on November 5th 2014 Owner

Likely I didn't check properly, but I intended to check (and thought it worked, though possibly I made a mistake). If it's not working in master already, then it's fine. Maybe we should create a ticket for this enhancement.

@mattab commented on November 7th 2014 Owner

Closing, as @tsteur has finished this nice solution with a sequence table and fixed the other "deletion" lock :+1:

This Issue was closed on November 7th 2014