January 4th incident update
On January 4th at 23:59 UTC, modules hosted on deno.land/x and deno.land/std
began failing to load, and continued to fail for about 40 minutes. The cause
was a buggy patch that served code as HTML rather than raw text. This post
details exactly what happened and what we are doing to prevent it in the
future.
All services are now operating normally again. No data was lost. We take outages like these seriously and sincerely apologize for the disruption.
Timeline of events
At 23:59 UTC a change was merged into the deno.land repo which improved completion suggestions for language server clients. It also included a refactor of the code that determines whether a client is served the raw code or the code wrapped in HTML for display in a browser. The refactored code made an incorrect assumption about the behavior of run-time clients, causing HTML to be served to them instead of the raw code.
At 00:20 UTC on Wednesday a reversion of this logic was attempted, but the deployment was unsuccessful.
At 00:39 UTC the code was amended again to load its dependencies directly from
an earlier deployment of deno.land, which allowed the code to be deployed and
restored service.
Root cause
deno.land/x and deno.land/std served code wrapped in an HTML user interface to
run-times, instead of the plain code. The code had been refactored to provide
more compliant content negotiation, but it did not account for the fact that
run-time clients like the Deno CLI and Deno Deploy send an Accept header
indicating that all content types are acceptable, including text/html. As a
result, the code served HTML to those clients.
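The distinction can be illustrated with a minimal sketch. It assumes the run-time clients send a wildcard such as `*/*` (the exact header value is an assumption here), and `wantsHtml` is a hypothetical helper, not the actual deno.land code. The idea is to serve the HTML view only when `text/html` is named explicitly, and to treat wildcards as a request for the raw module:

```typescript
// Hypothetical helper, not the actual deno.land implementation.
// Returns true only when the Accept header names text/html explicitly.
// Wildcards such as "*/*" or "text/*" do not count, so clients that accept
// every content type still receive the raw module source.
// Simplification: quality values (for example ";q=0") are ignored.
function wantsHtml(acceptHeader: string | null): boolean {
  if (acceptHeader === null) {
    return false;
  }
  return acceptHeader
    .split(",")
    // Drop parameters such as ";q=0.8" and normalize case and whitespace.
    .map((entry) => (entry.split(";")[0] ?? "").trim().toLowerCase())
    .some((mediaType) => mediaType === "text/html");
}

// A wildcard-only header, assumed here to resemble what a run-time sends.
console.log(wantsHtml("*/*"));
// A header shaped like those sent by browsers.
console.log(wantsHtml("text/html,application/xhtml+xml,*/*;q=0.8"));
```

Under this rule a client has to ask for HTML by name to get the browser view, which makes raw code the default for anything that is not clearly a browser.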
Deno Deploy lacked a “revert” capability, so reverting to the previous commit
was not possible, and rolling forward to a new commit, which had dependencies
on code hosted on deno.land/x and deno.land/std, was not straightforward.
The code that determines whether a client gets HTML or the plain code was tested using an incorrect assumption about which headers the Deno CLI and Deno Deploy send.
Impact
During the 40-minute outage, new deployments to Deno Deploy that depended on
deno.land/x or deno.land/std failed with an error indicating that the
dependency was not a valid module. Likewise, any remote dependencies on
deno.land/x or deno.land/std that were not cached locally by the Deno CLI
would have failed with the same error.
What’s next?
We are adding tests to the deno.land worker to verify the correct behavior of
this code. We are also working on adding a feature to Deno Deploy to allow a
“rollback” or “revert” to a previous deployment.
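As a rough idea of what such tests could look like, here is a table-driven sketch. Everything in it is hypothetical: the `Negotiate` signature, the function names, and the header values, which are illustrative assumptions rather than captured traffic. It checks a negotiation function against the kinds of clients involved in this incident and reports which ones would be served the wrong representation.

```typescript
type Served = "html" | "raw";
type Negotiate = (accept: string | null) => Served;

// Illustrative cases only; the header values are assumptions.
const cases: { client: string; accept: string | null; expected: Served }[] = [
  { client: "run-time sending a wildcard", accept: "*/*", expected: "raw" },
  { client: "client sending no Accept header", accept: null, expected: "raw" },
  {
    client: "browser",
    accept: "text/html,application/xhtml+xml,*/*;q=0.8",
    expected: "html",
  },
];

// Returns the names of the clients that would get the wrong representation.
function failingClients(negotiate: Negotiate): string[] {
  return cases
    .filter((c) => negotiate(c.accept) !== c.expected)
    .map((c) => c.client);
}

// Stand-in that serves HTML only when text/html is listed explicitly.
const explicitOnly: Negotiate = (accept) =>
  accept !== null && /(^|,)\s*text\/html\s*(;|,|$)/i.test(accept)
    ? "html"
    : "raw";

// Stand-in that treats a wildcard as a request for HTML, which is the kind
// of mistake described in this post.
const wildcardAsHtml: Negotiate = (accept) =>
  accept !== null && (accept.includes("text/html") || accept.includes("*/*"))
    ? "html"
    : "raw";

console.log(failingClients(explicitOnly));
console.log(failingClients(wildcardAsHtml));
```

Keeping the expected client headers in one table makes the assumptions about each client explicit, so an incorrect assumption shows up as a failing row rather than an outage.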