
Archives for January 2013

Download repositories master source failure

Posted on Saturday, January 5, 2013, at 10:05 UTC

During the night, we lost the data RAID array on the server that acts as the master source for the download repositories.

So far, we have managed to recover the RAID array and the data up to the last known revision, but we do not want to put it back into production until we have replaced and resynced all the faulty drives.

Some files appear to be missing on the slave servers, but this is not a big deal: they will come back with the master-to-slaves resync once the array is repaired.

Edit 2013-01-05 17:29 UTC: Staff have arrived at their destination; work begins.

Edit 2013-01-05 23:51 UTC: We managed to recover almost everything from the dead RAID array, meaning everything is up to date except for a few files. It took some time, but this is, in our opinion, better than simply restoring from the backup. There is some bad news, however: the spare hard drives that were delivered are not the right ones, so we are going to rebuild the array with the disks that are available, and we will need to work on it again later to grow the array with the expected disks.

Edit 2013-01-06 00:51 UTC: So, with today's hard drives being about as reliable as "already broken in the sealed package", we of course had to deal with never-used and already-broken disks. Still, we managed to create a hopefully healthy array that will be able to store all the data. The previously recovered data is being copied back right now.

Edit 2013-01-06 04:07 UTC: Yeah, the downloads master source is back, with a fresh, clean filesystem and all the data available. The slave servers (those used to get files) are currently synchronizing from the master; everything should be back to normal in a few hours.

Edit 2013-01-06 12:00 UTC: The slaves have finished syncing; everything should be back.

This is a good time to point out that you are supposed to make your own backups; they may save you if someday we cannot recover the data. You should also follow our security recommendations, which aim to prevent common security issues.
