Site is back up and running - just wanted to say thanks

Status
Not open for further replies.

munitalP

Suspended
Joined
Oct 10, 2006
Posts
3,802
ADMIN - thanks for the Facebook updates and info during this extended outage.

It must be so frustrating for you - as a group, I think we all feel for you when it comes to times such as this



G
 
Admin may have had a few stiff drinks after the outage !
 
As you can imagine it was a pretty frustrating 18 hours for me. Unfortunately it came at a really bad time - a Sunday and just the weekend when I moved house!! Once things settle (and I get my office working properly!) I will be investigating the entire AFF technical setup. An 18 hour downtime is completely unacceptable. If there are any AFFers with experience in hosting high-traffic vBulletin dedicated servers, please PM me. All help will be appreciated!
 
Also, I had to do a manual refresh of the site for it to re-load (Ctrl+F5), in case any other users still have trouble accessing it.
 
I slept through most of it. I'll blame jet lag (even though I didn't change time zones). Mostly a refusal to sleep in CX F. Wanted to enjoy the offerings. :-)

Glad it's all working now though. Thanks Admin and all that got it up and going.


Sent from my Telstra iPhone using the Australian Frequent Flyer application.
 
As posted on Facebook, I thought it was the Malaysian government blocking access to AFF:p...as I was never able to access AFF since checking into the Hilton KL. Glad to see the site back up again as I was starting to suffer from withdrawal effects. Thanks admin for getting it back up again, and the updates on Facebook!
 
I just assumed it went out in sympathy to QF? :rolleyes:

What was the issue (is it known)?
 
It's been a while since I have had to endure Sunday night television. Thinking of taking up renovating, 605K profit for 10 weeks' work seems a good deal :)
 
As I posted on FB, I don't usually go into details about our technical environment. But given the completely unacceptable 18+ hour outage yesterday, the inconvenience it caused everyone, and the generous offers of help and support from our members, I'm now going to publish the analysis from our Server Administrator. I know that many of you are technical enough to understand this and have an opinion. I'd rather this didn't become a public discussion, but if you have any opinions, insights or suggestions, please PM me.

Apologies again.


This is a follow up in a separate ticket regarding the problem you had with AFF this weekend and to discuss options.


The issue itself related to the SSD drive taking over the boot loader, which transpired after a power issue (although this would have shown up the next time the server was rebooted anyway). This shouldn't have taken more than 1-2 hours to identify and fix; the rest of the time was all logistical: trying to get us to troubleshoot (without the tools), or the datacenter not responding or not being there. There was also a nearly 3 hour delay at one point on this end, which I explained.


Given the timing, the datacenter likely didn't have staff onsite, which caused an initial delay of a few hours and the lack of troubleshooting assistance at the start. This is evident if you read between the lines of the replies we received.


Going forward, you have some options:


1) We can have an IPMI card (or similar) permanently installed in the server, which will avoid the need to wait for the datacenter to hook up KVMs etc. and cut out a large portion of these delays. You should not, however, be fooled into believing this will be the perfect solution, as these often break and some problems require someone actually being in front of the server. This issue, for example, needed a drive removed to confirm the issue and the order replaced. But it will significantly reduce the delay, because if we can see what is happening we can generally identify most issues quite rapidly and then guide the datacenter staff on what to do, as in this case.


This is, however, the cheapest of the options, as it's basically a one-time cost for the card. Anything that gives us oversight of what is going on (assuming it actually works) will help us guide providers, take decisive action and resolve problems faster.


2) You can get an additional spare server with your current provider and we can configure it as a failover server, so your downtime should be in the range of minutes, assuming you're not having a power issue at the datacenter. These setups are not perfect with disk syncing, but we've done thousands of them and maintain hundreds of them already, so we can quite comfortably manage this and wrap up quirks before they even occur.


3) You can move your server to another provider that will assist better in troubleshooting and has staff onsite 24/7. In reality, for issues such as these, while the actual work time is only 30-60 minutes, they end up turning into 3-6 hours with most providers by the time everything is relayed, unless you luck out and get a tech who is actually any good.


4) Same as #2, except with the additional server located elsewhere to cover the event of your datacenter going down. This would, however, ideally need to be located on the West Coast (such as LA) due to latency, and again isn't perfect due to latency, but it can be made near perfect, with the potential of losing 1-2 minutes' worth of posts in the event of a failover. (You would need to remove the auto-fail back, however, as this would corrupt the database.)
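To make the "no auto-fail back" point in option 4 concrete, here is a minimal sketch of the switching logic only. It is not the Server Administrator's actual setup: the class name, the threshold of three failed checks and the server labels are all assumptions made for illustration, and a real deployment would also need health checks, database replication and DNS or IP switching.

```python
from dataclasses import dataclass


@dataclass
class FailoverController:
    """Tracks which server is active. Names and thresholds are hypothetical."""

    failures_before_failover: int = 3  # assumed: consecutive failed checks tolerated
    active: str = "primary"
    consecutive_failures: int = 0

    def record_check(self, primary_healthy: bool) -> str:
        """Feed in one health-check result for the primary; return the active server."""
        if self.active == "standby":
            # No automatic fail-back: once on the standby, stay there until an
            # operator calls fail_back() after the database has been resynced.
            return self.active
        if primary_healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failures_before_failover:
                self.active = "standby"
        return self.active

    def fail_back(self) -> None:
        """Manual step, taken only once the primary has been resynced."""
        self.active = "primary"
        self.consecutive_failures = 0


controller = FailoverController()
results = [True, False, False, False, True, True]
history = [controller.record_check(ok) for ok in results]
print(history)
```

Note that the last two checks report the primary as healthy again, yet the controller stays on the standby. Switching back is a deliberate manual call to `fail_back()`, which is the behaviour the analysis asks for to avoid corrupting the database.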


You, however, need to be the judge of what costs you most: the potential of ~12 hours or so of downtime in the event of a real hardware failure (will your datacenter even replace the hardware or troubleshoot it?), versus $300-400/month for an additional server and management costs (depending on the provider it could be more; on the West Coast/LA we can get you in that range, however) to guard against a long outage, say, once every 16-20 months.
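As a rough way to weigh that trade-off, the figures quoted above can be turned into a cost per hour of downtime avoided. This is only back-of-the-envelope arithmetic: the function name is made up for illustration, and it assumes the standby server removes the whole ~12 hour outage, which the analysis itself does not promise.

```python
def cost_per_hour_avoided(monthly_cost: float,
                          months_between_outages: float,
                          outage_hours: float) -> float:
    """Money spent on a standby server between two long outages,
    divided by the hours of downtime it is assumed to prevent."""
    total_spent = monthly_cost * months_between_outages
    return total_spent / outage_hours


# Figures from the analysis: $300-400/month, one long outage every
# 16-20 months, roughly 12 hours of downtime each time.
low = cost_per_hour_avoided(300, 16, 12)
high = cost_per_hour_avoided(400, 20, 12)
print(f"${low:.0f} to ${high:.0f} per hour of downtime avoided")
```

On those assumptions the standby server works out at roughly $400 to $667 for each hour of outage it prevents, which is the number to compare against what an hour of downtime actually costs the site.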


I also wouldn't let this particular issue cloud everything, as it's a rare issue in that it relates to a recent hardware upgrade and only showed up when the server was rebooted. The real issue was all the delays, not the technical problem itself.
 
Interesting, thanks for sharing. I think most here, whilst wishing AFF was available instantly via every known electronic medium ( :rolleyes:), are realistic enough to know there's a cost, and the vast majority aren't paying anything.

We looked at this (downtime) issue for a professional body website, and decided the target downtime would be <24hrs as the cost to do otherwise became prohibitive (and not a good use of members' funds). In the end, in AFF's case, it's "information and leisure", not banking or government, so whilst losing access for up to 24hrs isn't ideal, in the assessment we made, the cost to "prevent" wasn't worth it (and as you see, "prevent" doesn't always guarantee "prevention").
 
Hey Admin.

Just some thoughts on the solutions provided by your sysadmin. I work as a freelance web and graphic designer in my spare time and have managed high traffic forums before.

Basically, the hardware failure was something that shouldn't have happened in the first place. SSDs should not really be used for servers, especially when they can cause issues with server bootloaders. They should be using standard hard drives, which are just as reliable. The 18 hour downtime is a bit silly; it is an issue that many service hosters would have had fixed within the hour. A lot of webhosts run redundancy servers at no or minimal extra cost that kick in, as your sysadmin said, after a certain period of downtime.

I would probably look at switching hosts if this kind of thing is happening often. I'm unsure if you're on a dedicated server or a shared server with your current host, but really I wouldn't think it would matter. Without seeing the traffic statistics of the site I'd say that a shared server would be just fine for one vBulletin install. I use a fantastic host in America who I've hardly ever had problems with, and who have great support and are pretty cheap too. PM me if you're interested in more details.
 
Hi Admin

I've just sent over a PM with some options for you.
As I mentioned in the PM, the response has a "backyard operator" feel about it; it also mainly reads as a list of excuses.
 
I totally agree that servers shouldn't be run with SSDs. SAS drives will do the trick :)
 
Not sure if it strictly applies, but I have heard good things about Amazon's cloud hosting offering, which apparently is being deployed in AU at present.
 