Horror Story # 10 - Patch Problem

Tuesday, March 2, 2010 by Steven Toole
An Excerpt from Don Jones' Definitive Guide to Backup 2.0 about Backup and Disaster Recovery

This story really illustrates why I dislike point-in-time Exchange backups so intensely. Sure, you can achieve a lot of business goals using Backup 1.0 techniques and tools, but you have to be so very careful in order to get exactly what you want. Who needs that extra mental overhead?

I work in one of my company’s larger data centers, and support about 100 servers. Most of these are file servers, but there are a couple of domain controllers, a few SQL Server machines, and three Exchange Server boxes. We are very good about patching our computers. We typically will not apply a set of patches until we have a full backup of the computer’s OS and application files, and we usually make a backup right after applying the patch, too. We tend to apply patches during maintenance windows when the servers aren’t otherwise available. On some of our larger servers (the Exchange machines come to mind), it gets difficult to grab one full backup and apply patches during our 6-hour maintenance windows (you try taking email away from people for longer), so sometimes we take a backup one night and apply patches the next, then take a second backup the following night.

The system works—but not always well.

I can recall a couple of instances where Exchange patches caused problems with some of our third-party software, and we needed to roll back to the pre-patch backup. Unfortunately, a whole day of work had passed since that backup was made, so we lost all that work. People become incredibly unhappy when email goes missing.

In at least one instance, we didn’t realize a general Windows hotfix was causing problems for about a week. At that point, the pre-hotfix backup was pretty aged. This was on a domain controller, so we decided to apply the old backup anyway, knowing that the domain would bring itself up to date through replication. Unfortunately, the backup also, we found out, had some deleted objects in the domain, which were near the end of their tombstone life. The practical effect was that about a dozen formerly-deleted objects suddenly reappeared in the domain. Our security auditors freaked out, people were yelled at, and it actually took us a while to work out what had happened, since that’s not a scenario you see every day. We’ve since decided to rely less on backups for undoing patches.

We’ve started spending more time testing patches, which is of course a good idea but it’s very boring and it takes a lot of time we didn’t really have to spare. It also means our patches only get rolled out about every other month, rather than every other week, and I worry about what happens when one of those patches fixes some major security hole—and we have to leave the hole open for 2 months just because of our processes.

Again, the Backup 1.0 mentality has deeper-reaching effects than just disaster recovery problems. In this instance, the company has actually decided to run out-of-date software for longer simply because of the way their backup processes work. Unbelievable. If ever there was a case of the “technology driving the business” rather than the other way around like it should be—this must be that case.

There are easy-to-recognize problems here, which should be familiar to you at this point:
  • Backup 1.0’s point-in-time snapshots don’t provide much granularity when it comes time to roll something back, nor do they function as much-needed continuous data protection.
  • Backup 1.0’s reliance on backup or maintenance windows took away some of the author’s flexibility with regard to his Exchange infrastructure.
  • Except in a few special situations, Backup 1.0 tends to be an all-or-nothing proposition: either you roll back the entire server or you live with what you’ve got. There aren’t many good ways to restore a single application; Backup 2.0, by contrast, can more easily pull out just the bits related to a specific application and restore it—with a single click.
To learn more download Chapter 2 of the Definitive Guide to Backup 2.0

Horror Story # 7 - Weight Loss Plan

Saturday, February 27, 2010 by Steven Toole
An Excerpt from Don Jones' Definitive Guide to Backup 2.0 about Backup and Disaster Recovery

This one is just funny—I had to share it. Taken from http://appassure.ning.com/profiles/blogs/backups-what-backups: I was lead tech for a big server upgrade for a company that had to keep records for 7 years. We were imaging the old server data up on the network, then re-imaging back down [to the new server].

Well, one day I thought I would multitask four machines at the same time and go to lunch. When I came back from lunch, I did a wipedisk of the old hard drives and noticed that one of the techs kept leaning over the work bench; his belly was so big that it would hit the space bar, which would cancel the data transfer.

Well, to say the least, I found that he had done this on the day I went to lunch. Well, no one had any backups, not even Exchange backups, so 7 years of data was gone.

I know it’s not funny, but…it kind of is. The lesson to learn here, though, is that point-in-time images are really no better than an old-style tape backup. Anything that is just making a copy of the data at a particular point in time is Backup 1.0 mentality, where we’re more concerned with getting a copy of the data—and, as we’ve seen in this story, a lot of strange things can go wrong to interrupt that copy and make it useless.

So how would Backup 2.0 solve this problem? By having a continuous backup of the old server in the first place. There’d be no need to take an image during a server upgrade, and in fact, the upgrade could be done faster and more smoothly by just relying on that block-by-block, Backup 2.0-style backup recovery program. An upgrade is really no different than a bare-metal disaster recovery—just perhaps less urgent. So a good backup solution should be able to assist with a server upgrade or migration—all the more reason to have a good backup recovery program in place.

To learn more download Chapter 2 of the Definitive Guide to Backup 2.0

Horror Story # 4 - Migrating the Cluster—Or Not

Wednesday, February 24, 2010 by Steven Toole
An Excerpt from Don Jones' Definitive Guide to Backup 2.0 about Backup and Disaster Recovery

“Disaster” doesn’t always mean a failed server. Sometimes a solid lack of backup and disaster recovery planning can provide all the disaster you need!

We received the new servers for the new cluster. The job: Swap out an old Windows cluster for a brand new one, configure it, and move all the files and data over to the new cluster. This has to be done within a late-night maintenance window, which means my wife will not be happy again, but I’ll buy flowers. Start time is around 11:00pm, and it must be done by 7:00am: 8 hours. Both clusters would eventually be running the same version of Windows, once I got everything installed. The cluster servers came with no OS installed at all, though, so installing Windows would be my first step. Fortunately, the server hardware was basically the same in both clusters.

After driving 300km — a horror driving on Polish roads — I set up the server hardware. The old cluster is humming along next to me, and I’m ready to get Windows installed. “Where are the drivers?”

And then the problem arose: Someone had lost the server manufacturer’s drivers disc. I know what you’re thinking—just download them, right? Well, suffice to say that this is a very secure organization—perhaps governmental—and nobody in the building at that hour could actually get to the Internet. So I had to pull out my mobile phone and, despite the high cost of data transfers, download the drivers for the server over the 3G cellular network. Then my phone’s battery died—and me without my charger.

I asked them about server backups, figuring we could perhaps just use those to restore to the new machine, but all they back up are the files - basically they are doing email and file archiving. They said they had never been able to do a bare-metal restore using their Exchange backups, so they just stopped backing up the operating system (OS).

Internet access was available again at 8:00am, and someone had to take me into the secure area where Internet access was available. I was tired, the entire night was wasted, and I had to do it all again the next night when I finally had the drivers in hand.

Ouch! We’ve all had a late night like that at some point, and it’s never fun. The first thing I have to ask is why they couldn’t simply take a backup of the old cluster machines and restore it on the new hardware. The answer: in the Backup 1.0 world, restoring to dissimilar hardware is often impossible or at least “not recommended.” As a result, many organizations just back up their files, but a backup is useless without someplace to restore it. A more modern Backup 2.0 mentality, however, would say that this cluster migration was really no different than a bare-metal restore after a complete cluster failure. In the Backup 2.0 world, we’d be taking block-level backups of the entire server, so we could just apply that backup to the new hardware (which we’re told was substantially similar to the original hardware) and call it a night.

Less time on the job, a lower cell phone bill, and a less frustrated wife.

To learn more download Chapter 2 of the Definitive Guide to Backup 2.0

Horror Story # 3 - Exchange Server Backup Software Failure

Tuesday, February 23, 2010 by Steven Toole
An Excerpt from Don Jones' Definitive Guide to Backup 2.0 about Backup and Disaster Recovery

Nobody likes it when the Exchange Server goes down. Honestly, I think some companies could go without a file server for longer than they could live without email—especially companies whose employees have Blackberries. So here’s a story to move your emotions:

When my company’s Microsoft Exchange Server Backup Software failed at the end of the quarter, it could not have happened at a worse time. It began with the VP of Sales yelling “Email is down, and customers can’t send us their orders!” Then my Blackberry started going off: calls, emails, IMs. It was relentless. When I logged on to the Exchange Server, I found that some of my most critical mail stores were no longer mounted. When I tried to remount them, I received the ambiguous yet ominous JET -1601 JET_errRecordNotFound error message. I immediately connected to the replication server that runs at one of the company’s remote sites, only to find that I couldn’t mount those mail stores either.

When I called Microsoft, technicians prescribed the standard procedure of running Eseutil. They warned me, however, that the error message probably indicated a corruption problem deep within the database and that running Eseutil might result in cleaning the stores of all user data. I took the leap, on the chance that it would be quicker than getting the restore process underway. Running Eseutil took hours, then failed with the even more ambiguous JET -1003 JET_errInvalidParameter. At that point, I knew I HAD to go to the backup.

My company runs full backups every Saturday night and incremental backups the rest of the week. I started by recovering the most recent full backup, then applying the incrementals until I had the backup from the night before the failure. As you can imagine, the calls, emails, etc. kept coming all the while I was copying the mail stores from my disk-to-disk backup (although they did taper off a bit after 11:00pm, when our West Coast office closed).
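
The restore order described here (latest full backup first, then each later incremental in sequence) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `restore_chain` function, the `(day, kind)` tuples, and the dates are hypothetical and do not come from any real backup product.

```python
from datetime import date

def restore_chain(backups, failure_day):
    """Pick the backups to restore, oldest first.

    backups: list of (day, kind) tuples, kind is "full" or "incremental".
    Returns the latest full taken before failure_day plus every
    incremental taken after that full and before failure_day.
    """
    usable = sorted(b for b in backups if b[0] < failure_day)
    full_positions = [i for i, (_, kind) in enumerate(usable) if kind == "full"]
    if not full_positions:
        raise ValueError("no full backup precedes the failure")
    return usable[full_positions[-1]:]

# Hypothetical schedule: fulls on two Saturdays, incrementals on other days.
backups = [(date(2010, 2, 6), "full")]
backups += [(date(2010, 2, d), "incremental") for d in range(7, 13)]
backups += [(date(2010, 2, 13), "full")]
backups += [(date(2010, 2, d), "incremental") for d in range(14, 18)]

chain = restore_chain(backups, failure_day=date(2010, 2, 18))
```

Every link in that chain has to be readable for the restore to succeed, which is exactly where this story goes wrong a few steps later.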

Once our data was back on the primary server, it was time to roll the logs and mount the database. However, when the logs were about 80 percent applied, they failed with the JET -501 JET_errLogFileCorrupt message. At that point, Microsoft support could only suggest running Eseutil through my entire log chain, noting the corrupted log, deleting anything except log files from the log directory, and deleting the corrupted log and all the logs created thereafter. Then I could finally restart the log roll operation from scratch. This procedure took more than 6 hours. In the end, my company lost 2 days of email messages, and recovery took more than 30 hours. The cause turned out to be a problem with the RAID controller driver that had taken months to manifest itself after a previous server upgrade.

Executive management figured it cost the company about $50K, so they definitely wanted to know what had happened and how it could have been prevented—and how it would be prevented from happening again. Let’s just say “wanted to know” means that if I didn’t have a good answer, my name was going on the top of the next layoff list. I was seriously committed to finding a better recovery solution.

So here’s what I learned on my worst day as a network admin: You can have multiple copies of your data—on replicated servers, on disk, and on tape—but if you can’t mount the copies, they aren’t any good.

The Backup 1.0 mentality cost this company $50,000. Why? Because the Backup 1.0 mentality focuses on making backups, not restoring data. With Backup 1.0, we tend to focus on backup windows, tape drives, and so on—we don’t tend to focus on what will happen in the event of a disaster. Even with their backups and 30 hours of effort, the company still lost 2 days of emails—this is a backup plan?

Exchange Server is certainly a complex and difficult product when it comes to Exchange backups. The back-and-forth between the Exchange Server product team, the Windows product team, and Microsoft’s own backup products (in the System Center family) means that Exchange and Windows alone don’t offer an effective backup plan. Third-party vendors, however, tend to focus on Exchange-specific agents that just make copies of the data as handed over by Exchange. As this horror story points out, Exchange might not always be handing you good data, meaning your Exchange backups are useless.

To learn more download Chapter 2 of the Definitive Guide to Backup 2.0

Understanding Application-Aware Backup Recovery Programs

Wednesday, February 17, 2010 by Steven Toole
An Excerpt from Don Jones' Definitive Guide to Backup 2.0 about Backup and Disaster Recovery

Some applications—primarily mission-critical, always-on applications such as Microsoft SQL Server, Exchange Server, and SharePoint—present their own challenges for backup and disaster recovery. These applications’ executables are always running, and they always have several data files open, making it difficult for file-level backup software to get a consistent snapshot.
 
To help address this, the applications’ designers take varying approaches. SQL Server, for example, has its own SQL Backup Recovery capability, which is tied to the product’s own unique architecture. Traditionally, the best way to get a SQL Server backup is to ask SQL Server to do it. You might, for example, use SQL Server’s own tools to produce a backup file, then grab that file with a traditional file-based backup solution. Or, you might create an agent that taps into SQL Server and gets the data that way—the approach used by most enterprise-level, Backup 1.0-style backup solutions. Figure 1.9 shows a common dialog box for a backup solution’s configuration, showing that a SQL Server-specific agent is loaded and able to stream data from SQL Server to the backup software. Exchange Backups functionality might work similarly.

Exchange Server’s developers took a slightly different approach, choosing to integrate with Windows’ Volume Shadow Copy service. Essentially, they provide a copy of the Exchange data files through Volume Shadow Copy; the Exchange Server backup software simply needs to access the Volume Shadow Copy Application Programming Interfaces (APIs) and request the “latest copy” of the database. Again, it’s Exchange Server that’s doing most of the work, but a dedicated agent of some kind is usually needed to get to the right APIs. Even Windows Server 2008’s built-in backup can be extended to “see” the Exchange Server databases for Exchange database backup.
 
The downside is that these application-specific approaches are still Backup 1.0 in nature, meaning they’re grabbing a snapshot. You’re still at risk for losing data and work that occurs between snapshots; particularly with these mission-critical applications, I think that’s just unacceptable.

Block-level backups can certainly solve the problem because they’re grabbing changes at the disk level and don’t particularly need to “understand” what those disk blocks are for. A disk block that’s part of a file looks the same as one that’s part of a SQL Server database, so the backup solution just grabs ‘em all. But from a recovery viewpoint, your backup solution does need some additional smarts. Here’s why: A simple file—say, a Word document—consists of several blocks of disk space. It’s easy to keep track of which blocks make up any given file, and no disk block will ever share data from two files. If you need to restore Salaries.xls, you figure out which blocks that file lives on, and restore just those. Easy.
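
As a rough sketch of the simple-file case, assume the backup keeps a map from each file to the ordered list of blocks it occupies. The names `block_map`, `backup_blocks`, and `restore_file` are hypothetical, and real block sizes are far larger than the few bytes used here.

```python
def restore_file(backup_blocks, block_map, path):
    """Rebuild one file by concatenating its blocks in recorded order."""
    return b"".join(backup_blocks[n] for n in block_map[path])

# Hypothetical backup: block number -> block contents.
backup_blocks = {10: b"Name,Sal", 11: b"ary\nAnn,", 12: b"50000\n", 20: b"memo"}

# Hypothetical file table: each block belongs to exactly one file.
block_map = {"Salaries.xls": [10, 11, 12], "Memo.doc": [20]}

restored = restore_file(backup_blocks, block_map, "Salaries.xls")
```

Only blocks 10 through 12 are read; nothing else in the backup needs to be touched to get the one file back.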
 
With complex data stores, such as SQL Server and Exchange Server, things aren’t so easy. A single mail message might occupy multiple blocks of disk space, but those same blocks might also contain data related to other messages. The database also has internal pointers and indexes that need to be restored in order for a given message to be accessible. So a block-based backup doesn’t need much in the way of extra smarts to make a backup, but it will need some cleverness in order to restore single items from that backup. Solution vendors tend to approach this by using plug-ins: It’s easy to think of these as being similar to the Backup 1.0-style agents, but they’re not. These plug-ins don’t necessarily assist with the backup process (although they may record special information to assist with recoveries), but they do contain the smarts necessary to peer “inside” complex data stores for single-message restore.
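
To make the contrast with the simple-file case concrete, here is a toy sketch of what such a plug-in has to do. It assumes a hypothetical index that records, for each message, a list of `(block, offset, length)` extents; this is not how Exchange or SQL Server actually lay out their databases, only an illustration of items sharing blocks.

```python
def extract_item(blocks, index, item_id):
    """Reassemble one item from the extents the store's index lists for it."""
    return b"".join(
        blocks[block_no][offset:offset + length]
        for block_no, offset, length in index[item_id]
    )

# Hypothetical database blocks: block 0 holds data from two messages.
blocks = {0: b"HELLO-WOR", 1: b"LD!bye..."}

# Hypothetical internal index: message -> extents inside the blocks.
index = {
    "msg1": [(0, 0, 5)],
    "msg2": [(0, 6, 3), (1, 0, 3)],
}

first = extract_item(blocks, index, "msg1")
second = extract_item(blocks, index, "msg2")
```

Restoring block 0 wholesale would drag along part of another message, so the restore logic needs the index as well as the raw blocks.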


To learn more download Chapter 1 of the Definitive Guide to Backup 2.0

Chapter 3 of The Definitive Guide to Windows Application and Server Backup 2.0

Friday, September 11, 2009 by Joshua Hoffman
Good news! Chapter 3 of Don Jones's latest book - The Definitive Guide to Windows Application and Server Backup 2.0 - is now available for download on our website! Chapter 3 helps rethink the concept of whole-server backup and disaster recovery - going beyond just the data, beyond just the applications, and capturing the entire server, settings and all, so it is completely protected.

If you're just learning about Don's new book now, there's plenty of time to catch up! Chapter 1 and Chapter 2 are also available. Future chapters (you can see a list here) will cover Exchange Backup and Recovery, SQL Backup and Recovery, Cloud Backup, and much more.

We're anxious to hear what you think. Share your comments here, or send us a tweet @appassure.

Have a great day,

Josh

Gmail Outage 'A Big Deal', According to Google

Thursday, September 3, 2009 by Joshua Hoffman
If you're one of the approximately 30 million people who use Gmail every day, you might have noticed that the (obviously popular) e-mail service was unavailable for almost two hours on Tuesday afternoon. Even if you're not a Gmail user, you probably heard about it from one (or every one) of the media outlets that write about technology (TechCrunch, Wired, the New York Times, etc.).

As discussed by Thomas Claburn in InformationWeek, even Google called the Gmail outage a big deal. And it is a big deal. The outage served to underscore (on a massive scale) how critical e-mail remains in our ability to communicate every day and why having an email recovery tool matters.

Of course, we're glad that Mr. Claburn chose to cite our whitepaper, "Preventing Your Next Microsoft Exchange Outage," to underscore the idea that it's critical to plan realistically, and in your planning, expect that you will face an outage at some point. Exchange backups are essential, but the real trick comes in minimizing the impact of an outage, a lesson Google learned the hard way this week.

Just some food for thought.

Cheers,

Josh

High Availability for the Remaining Server Roles for Exchange Backups

Monday, August 10, 2009 by Lautaro Cabrera
As replication is used for the Mailbox server role only, how do you ensure that the remaining roles are backed up properly in your Exchange backups? Let’s talk about how to achieve high availability in the remaining server roles. Providing high availability for the Client Access, Hub Transport, Edge Transport, and Unified Messaging servers is for the most part similar:

* Client Access Server—Deploy multiple client access servers and use Network Load Balancing (NLB) to provide high availability.

* Hub Transport Server—Resilient by default due to the fact that all Hub Transport servers are registered within AD. You can achieve high availability by deploying multiple hub transport servers.

* Edge Transport Server—Deploy multiple edge transport servers and use DNS MX records to achieve high availability.

* Unified Messaging Server—Deploy multiple unified messaging servers and place them into the same dial plan so that the VoIP gateways can retrieve a list of servers within the dial plan. Configure VoIP gateways to round-robin calls to ensure high availability if a unified messaging server is down. This is important for Email and file archiving.
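
The round-robin behavior described for the VoIP gateways can be sketched as follows. This is a minimal illustration, not gateway configuration: the server names, the `is_up` health check, and the `route_calls` function are all hypothetical.

```python
from itertools import cycle

def route_calls(servers, is_up, n_calls):
    """Assign each call to the next server in rotation, skipping down servers."""
    ring = cycle(servers)
    routed = []
    for _call in range(n_calls):
        for _attempt in range(len(servers)):
            candidate = next(ring)
            if is_up(candidate):
                routed.append(candidate)
                break
        else:
            # Every server in the dial plan was tried and none answered.
            raise RuntimeError("no unified messaging server available")
    return routed

servers = ["um1", "um2", "um3"]   # hypothetical servers in one dial plan
down = {"um2"}                    # pretend um2 has failed

routed = route_calls(servers, lambda s: s not in down, n_calls=4)
```

Calls keep flowing to the remaining servers while one is down, which is the availability property the bullet describes.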

Once you have a reliable Exchange backup plan that meets your RPO, RTO, and SLA requirements, and you have the ability to test that backup to ensure reliability, you need to move on to the last part of the fast recovery plan: finding out both how to recover Exchange from Exchange backups quickly and how to ensure you have what you need in your environment once Exchange is recovered. The next article will explore this topic and consider recommendations for recovery in Exchange Server.


… Excerpted from Backup Methods Available for Exchange by Ron Barrett (published by Realtime Publishers).
Download the full version here

Best Practices for Exchange Backups

Thursday, August 6, 2009 by Lautaro Cabrera
To be sure you have the ability to perform fast recoveries in Exchange Server, you need to be sure you have a good method of Exchange backups. The Exchange backup process must be fast, complete, and, of course, recoverable.

Shrinking the backup window in Exchange can be achieved with the use of multiple storage groups. Keeping databases to a manageable size will also shrink the time it takes to back up those databases. Storage compression and virtualized storage can help to shrink backup windows by reducing the need for full backups. Replication can be the primary fast recovery option with backup as the secondary option.

Microsoft has a great document titled “What Needs to Be Protected,” which is a good gauge for how to set up your Exchange backup strategy.

Another best practice is to create reliability for Exchange backups. This can be done several ways. One way I like to emphasize is the use of disk imaging backups for Exchange. After you have a disk backup of the Exchange Server, you can use tape backup or replication to create redundancy.


… Excerpted from Backup Methods Available for Exchange by Ron Barrett (published by Realtime Publishers).
Download the full version here


Backing Up Domain Controllers for Exchange Backups

Wednesday, August 5, 2009 by Lautaro Cabrera
When dealing with Exchange Server Backup Software, it is important to think about backing up domain controllers in an Exchange environment. Since Exchange Server 2003, the importance of Active Directory (AD) to Exchange makes it necessary to ensure you have at least one domain controller backed up for fast recovery of your Exchange backups. Performing a system state backup on a domain controller will back up the necessary AD files. Remember that domain controllers have circular logging enabled for AD; therefore, any data written to AD after a backup will be lost.

Depending on the frequency of changes to your domain controllers, you should back up at least one domain controller nightly. Doing so will ensure that you can meet RPO and RTO levels and that you will not run into trouble by having an “old” AD backup, which would be unrestorable.
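
One way to picture the “old backup” problem is a simple age check. The sketch below assumes that “old” means older than the forest's tombstone lifetime; the 180-day value used in the example is an assumption for illustration only, so check the value configured in your own forest. The function name is hypothetical.

```python
from datetime import date

def ad_backup_restorable(backup_day, today, tombstone_lifetime_days):
    """Treat an AD backup as usable only while it is younger than the tombstone lifetime."""
    age_days = (today - backup_day).days
    return age_days < tombstone_lifetime_days

today = date(2009, 8, 5)
nightly_ok = ad_backup_restorable(date(2009, 8, 4), today, tombstone_lifetime_days=180)
stale_ok = ad_backup_restorable(date(2009, 1, 1), today, tombstone_lifetime_days=180)
```

With a nightly backup the check never comes close to failing, which is one practical argument for the schedule recommended above.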


… Excerpted from Backup Methods Available for Exchange by Ron Barrett (published by Realtime Publishers).
Download the full version here

VSS vs. Streaming (Legacy) Backups

Tuesday, August 4, 2009 by Lautaro Cabrera
Exchange Server offers two options for backing up data. You don't have to buy Exchange Server backup software. Both options support the four backup methods (full, incremental, differential, and copy). The first option is the traditional streaming (legacy) backup, which utilizes the ESE API and has been the available option for backing up Exchange Server using NTBackup, Windows Server Backup, and many third-party Exchange backup solutions.

Highlights of streaming (legacy) Exchange backups:

* Exchange Backups are taken from the active copy of the database
* Can perform backup at the database level
* Only one backup running against a single storage group
* Separate storage groups can be backed up concurrently

Warning: Windows Server 2008 does not support streaming backups and is not Exchange-aware. Therefore, a third-party solution that utilizes VSS is required for Exchange backup in Windows Server 2008.

The second option is the Volume Shadow Copy Service (VSS), which provides a point-in-time “snapshot” of your data. In subsequent backups, it looks at the last snapshot and then backs up only the changes. VSS was introduced to Exchange 2003; although it provided the ability to take shadow copies, these copies were made at the file level and were not Exchange-aware. Snapshot backups in Exchange are fast and consistent due to the use of checksums on the database pages. Highlights of VSS:

* Exchange Backups can be taken from the active and passive copy of the database
* Can perform backup at the storage group level
* Separate storage groups can be backed up in parallel
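
The four backup methods mentioned in this post differ mainly in what they copy relative to earlier backups. The following sketch shows the usual general-purpose meaning of each method; it is not the ESE or VSS API, and the function name, timestamps, and item names are hypothetical.

```python
def select_for_backup(method, changed_at, last_full, last_backup):
    """Return the set of items a backup of the given method would copy.

    changed_at: dict mapping item name to the time it last changed.
    last_full: time of the last full backup.
    last_backup: time of the last full or incremental backup.
    """
    if method in ("full", "copy"):
        return set(changed_at)          # everything, regardless of history
    if method == "differential":
        since = last_full               # changes since the last full
    elif method == "incremental":
        since = last_backup             # changes since the last full or incremental
    else:
        raise ValueError(f"unknown method: {method}")
    return {item for item, t in changed_at.items() if t > since}

changed_at = {"a": 1, "b": 5, "c": 9}
differential = select_for_backup("differential", changed_at, last_full=3, last_backup=7)
incremental = select_for_backup("incremental", changed_at, last_full=3, last_backup=7)
```

A copy backup selects the same data as a full backup; the usual difference is that it leaves the backup history markers untouched, so it does not disturb the regular full and incremental cycle.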


… Excerpted from Backup Methods Available for Exchange by Ron Barrett (published by Realtime Publishers).
Download the full version here


Exchange Server 2007 won't run on next Windows Server

Monday, August 3, 2009 by Joshua Hoffman
Big news out of the Microsoft Exchange team - Exchange Server 2007 will not be supported on Windows Server 2008 R2. If you want your Exchange servers to reap the benefits of the next version of Windows, you'll have to upgrade to Exchange 2010.

Don't get us wrong - we're excited about the upcoming releases of both Windows and Exchange (we drool over new features). And this will likely be less of an issue for those with Software Assurance, where upgrades are covered under their licensing agreement. However, even if the software is already paid for, an e-mail platform upgrade/migration is no small task.

We'll certainly be increasing our focus on digging up new content to help guide you through the process of ensuring complete integrity of your Exchange backup and email and file archiving, in the hopes of making it as smooth and painless as possible. In the meantime, we'd love to hear your thoughts on this announcement. What is your Exchange backup process? What Exchange Server backup software are you using, and will this change with a migration? Is this expected news, or does it have you up in arms? Post your comments here, or send us a tweet at @appassure.

Tags: exchange 2007, exchange 2010, migration, support, upgrade, windows server 2008 r2