Mozilla has published an initial postmortem on the gaffe that saw extensions for users of its Firefox browser automatically disabled after a certificate was allowed to expire, promising new mechanisms for emergency updates and the deletion of telemetry gathered as a result of pressing the Studies system into service to distribute the initial fix.
Firefox users found themselves without browser extensions this weekend, when Mozilla allowed a key certificate used for their validation to expire - rendering every single extension, including Mozilla's own, invalid in the browser's eyes. An emergency fix was distributed in a novel manner, using the Studies system typically used for A/B testing to push a temporary fix to all users while a new version of the browser could be built and distributed. It got the majority of users up and running as quickly as possible, but came with a cost: those who had disabled telemetry, which sends anonymised browser data back to Mozilla for analysis, needed to switch it back on in order to receive the patch.
'We strive to make Firefox a great experience. Last weekend we failed, and we’re sorry,' Mozilla's Joe Hildebrand writes in the company's mea culpa. 'An error on our part prevented new add-ons from being installed, and stopped existing add-ons from working. In order to address this issue as quickly as possible, we used our "Studies" system to deploy the initial fix, which requires users to be opted in to Telemetry. Some users who had opted out of Telemetry opted back in, in order to get the initial fix as soon as possible.
'In order to respect our users' potential intentions as much as possible, based on our current set up, we will be deleting all of our source Telemetry and Studies data for our entire user population collected between 2019-05-04T11:00:00Z and 2019-05-11T11:00:00Z.'
Going forward, the company has pledged to find a way to distribute emergency patches in a manner that doesn't require the (ab)use of the Studies system. 'We need a mechanism to be able to quickly push updates to our users even when — especially when — everything else is down. It was great that we are able to use the Studies system, but it was also an imperfect tool that we pressed into service, and that had some undesirable side effects,' explains Mozilla chief technical officer Eric Rescorla.
'In particular, we know that many users have auto-updates enabled but would prefer not to participate in Studies and that’s a reasonable preference (true story: I had it off as well!) but at the same time we need to be able to push updates to our users; whatever the internal technical mechanisms, users should be able to opt-in to updates (including hot-fixes) but opt out of everything else. Additionally, the update channel should be more responsive than what we have today. Even on Monday, we still had some users who hadn’t picked up either the hotfix or the dot release, which clearly isn’t ideal. There’s been some work on this problem already, but this incident shows just how important it is.'
More details about Mozilla's approach to preventing issues like this in the future, and to fixing them more quickly, can be found in Rescorla's initial postmortem of the problem - which will, the company has promised, be followed by a more detailed analysis in the near future.
September 18 2020 | 18:30