This is a follow-up in a separate ticket regarding the problem you had with AFF this weekend, and a place to discuss your options.
The issue itself related to the SSD drive taking over the boot loader, which surfaced after a power issue (although it would have shown up the next time the server was rebooted anyway). This shouldn't have taken more than 1-2 hours to identify and fix; the rest of the time was all logistical - trying to get us to troubleshoot (without the tools), the datacenter not responding or not being there, and a nearly 3 hour delay at one point on this end, which I explained.
Given the timing, the datacenter likely didn't have staff onsite, which caused the initial few-hour delay and the early lack of troubleshooting assistance - something that is evident if you read between the lines of the replies we received.
Going forward you have some options:
1) We can have an IPMI card (or similar) permanently installed in the server, which avoids the need to wait for the datacenter to hook up KVMs/etc. and cuts out a large portion of these delays - the first sketch after this list shows the sort of remote access it gives us. You shouldn't be fooled into believing it's the perfect solution, though, as these cards often break and some problems still require actually being in front of the server. This issue, for example, needed a drive removed to confirm the fault and the boot order put right. It will still significantly reduce the delay, because if we can see what's going on we can generally identify most issues quite rapidly and then guide the onsite staff on what to do, as in this case.
This is however the cheapest of the options, as it's basically a one-time cost for the card. Anything that gives us oversight of what is going on (assuming it actually works) will help us guide providers, take decisive action, and resolve problems faster.
2) You can get an additional spare server with your current provider and we can configure it as a failover server, so your downtime should be in the minute range, assuming you're not having a power issue at the datacenter (the second sketch after this list shows the general shape of the failover logic). These setups are not perfect where disk syncing is concerned, but we've built thousands of them and already maintain hundreds, so we can quite comfortably manage this and iron out the quirks before they even occur.
3) You can move your server to another provider that assists better with troubleshooting and has staff onsite 24/7. In reality, for issues like this, while the actual work time is only 30-60 minutes, they end up stretching into 3-6 hours with most providers by the time everything is relayed, unless you luck out and get a tech who is actually any good.
4) Same as #2, except with the additional server located elsewhere to cover the event of your datacenter going down. It would ideally need to be on the West coast (such as LA) due to latency, and again isn't perfect for the same reason, but it can be made near perfect with the potential of losing 1-2 minutes' worth of posts in the event of a failover. (You would need to remove the automatic fail-back, however, as that would corrupt the database.)
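
To make option 1 more concrete, here is a rough sketch of the kind of remote access an IPMI card gives us. It simply wraps the standard ipmitool command line from Python; the IP address and credentials are placeholders, not anything set up for your server.

    import subprocess

    # Base ipmitool invocation; the IP, user and password are placeholders.
    IPMI = ["ipmitool", "-I", "lanplus",
            "-H", "192.0.2.10",   # the IPMI card's own IP, separate from the server's
            "-U", "admin", "-P", "changeme"]

    def power_status():
        # Ask the card whether the box is powered on, even if the OS is dead.
        out = subprocess.run(IPMI + ["chassis", "power", "status"],
                             capture_output=True, text=True)
        return out.stdout.strip()

    def power_cycle():
        # Hard power-cycle the server remotely, with no datacenter ticket needed.
        subprocess.run(IPMI + ["chassis", "power", "cycle"], check=True)

    # "ipmitool ... sol activate" would additionally give a serial console,
    # which is how we could have watched the boot loader hang in this case.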
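
For options 2 and 4, this is the general shape of the failover logic, sketched in Python purely as an illustration. The address, timings and the promote_standby step are placeholders for whatever we'd actually configure, and in practice this is handled by dedicated replication/clustering tooling rather than a hand-rolled script.

    import subprocess, time

    PRIMARY = "203.0.113.10"   # placeholder address for the primary server
    CHECK_EVERY = 10           # seconds between health checks
    FAILURES_NEEDED = 6        # roughly a minute of silence before acting

    def primary_alive():
        # A single ping as the health check; in reality we'd also check the
        # web service and the database.
        result = subprocess.run(["ping", "-c", "1", "-W", "2", PRIMARY],
                                stdout=subprocess.DEVNULL)
        return result.returncode == 0

    def promote_standby():
        # Placeholder for whatever actually moves the service across:
        # taking over a floating IP, promoting the replica database, etc.
        print("Primary looks dead - promoting this standby to serve the site.")

    failures = 0
    while True:
        failures = 0 if primary_alive() else failures + 1
        if failures >= FAILURES_NEEDED:
            promote_standby()
            # Deliberately stop here: no automatic fail-back. The old primary
            # only gets the site back by hand, after its data is resynced,
            # otherwise the database can end up corrupted.
            break
        time.sleep(CHECK_EVERY)

The important design point is at the end: after a failover the old primary is never allowed to take the site back automatically - handing back is done by hand once the data has been resynced, which is the auto-fail-back caveat in option 4.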
You however need to be the judge of what costs you more: the potential of ~12 hours or so of downtime in the event of a real hardware failure (will your datacenter even replace the hardware or troubleshoot it?), versus $300-400/month for an additional server plus management costs (depending on the provider it could be more, though West coast/LA we can get you in that range), when a long outage like this might only happen once every 16-20 months.
I also wouldn't let this particular incident cloud everything, as it's a rare one: it related to a recent hardware upgrade and only showed up when the server was rebooted. The real issue was all the delays, not the technical problem itself.
Not sure if it strictly applies, but I have heard good things about Amazon's cloud hosting offering, which is apparently being deployed in AU at present.