I have the same problem, so there is probably something wrong with the servers.
Since I am still on 1.2, I thought that could be the reason. Thanks for testing with 1.3.
Errors that get caught by automation are also posted automatically, but some errors aren’t caught that way. For those manual cases, the ops team is usually busy fixing the problem, and we often forget to update the status until after the issue is fixed.
Sorry about the login problems recently. There have been two causes:
A metrics routine that overwhelms the database. We’re in the process of creating a follower database which can get slammed without impacting production.
An error in how we were handling our web proxies for user login: as soon as N users in a row failed to log in via the website, everyone got blocked, because the locking logic only saw the proxy IP and blocked that. Blocking the proxy blocks everyone who tries to log in via the website. We’ve now fixed this so that we block the originating IP instead of the proxy. That could still be bad for conferences, etc., where many users share one IP, but it is the best way to keep out brute-force attacks on passwords.
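For anyone curious what the proxy fix looks like in principle, here is a minimal sketch, not our actual code. It assumes the proxy passes the originating address in an `X-Forwarded-For` header; the names `TRUSTED_PROXIES`, `client_ip`, and `LoginLimiter`, the proxy address, and the threshold are all hypothetical. The idea is to key the failure counter on the originating IP, and to trust the forwarded header only when the request really came from one of our own proxies.

```python
# Hypothetical values for illustration only.
TRUSTED_PROXIES = {"10.0.0.5"}  # addresses of our own web proxies (assumed)
MAX_FAILURES = 3                # the "N" failures in a row (assumed)


def client_ip(remote_addr, forwarded_for):
    """Return the originating IP for a request.

    If the direct peer is not one of our proxies, the header could be
    forged, so the peer address is used as-is. Otherwise walk the
    X-Forwarded-For chain from right to left and return the first hop
    that is not one of our proxies.
    """
    if remote_addr not in TRUSTED_PROXIES or not forwarded_for:
        return remote_addr
    hops = [h.strip() for h in forwarded_for.split(",") if h.strip()]
    for hop in reversed(hops):
        if hop not in TRUSTED_PROXIES:
            return hop
    return remote_addr


class LoginLimiter:
    """Counts consecutive failed logins per originating IP."""

    def __init__(self, max_failures=MAX_FAILURES):
        self.max_failures = max_failures
        self.failures = {}

    def is_blocked(self, ip):
        return self.failures.get(ip, 0) >= self.max_failures

    def record(self, ip, success):
        if success:
            # A successful login resets the run of failures for this IP.
            self.failures.pop(ip, None)
        else:
            self.failures[ip] = self.failures.get(ip, 0) + 1
```

With the counter keyed on `client_ip(...)` instead of the raw peer address, a run of failures from one address blocks only that address, and other users coming through the same proxy are unaffected.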
@rufus I think a big metric which would help tons is performance of the registry CDN.
Often we find that images are super slow to download in Australia and super fast in North America. It would be very interesting to keep track of this somehow.
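One rough way to track this, as a sketch only: collect timed downloads from probes in each region and compare the median throughput. The function name `throughput_by_region` and the sample format `(region, bytes_downloaded, seconds)` are my own assumptions, not anything the registry provides.

```python
from collections import defaultdict
from statistics import median


def throughput_by_region(samples):
    """Return {region: median throughput in MB/s}.

    `samples` is an iterable of (region, bytes_downloaded, seconds)
    tuples, e.g. gathered by timing an image pull from each region.
    Samples with a non-positive duration are skipped.
    """
    per_region = defaultdict(list)
    for region, nbytes, seconds in samples:
        if seconds > 0:
            per_region[region].append(nbytes / seconds / 1e6)
    return {region: median(rates) for region, rates in per_region.items()}
```

The median is used here instead of the mean so that a single stalled download does not dominate the number for a region.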