Editor's Note: John Savill contributes Frequently Asked Questions about Azure, PowerShell, and other Microsoft products and services three times each week at Windows IT Pro. He is a well-respected member of the Microsoft tech community and a frequent speaker at industry events.

His training classes through our e-learning portal will help you become more knowledgeable about these various technologies. This tip is just one example of what he will be teaching in an upcoming Master Class.


Why back up if I have replication?

This has come up a number of times recently, so I thought I'd put my perspective down. Maybe it will help. But first, why back up, and why replicate?

Typically we back up for a range of reasons. Below are a few:

  • Regulatory requirements to keep data for a certain amount of time. For example, many financial systems have to retain data for 7 years, and there are big penalties for not adhering to this. Keeping all 7 years on the source system may not be possible because of storage requirements and performance implications, and even if you could, a problem with that system could lose the data, which is not allowable. Backup solutions often let you tier retention, for example keeping a daily backup for 30 days, a weekly for 12 weeks, a monthly for 12 months, and then an annual for 7 years. This meets the retention requirements and gives enough restore granularity for any likely request, while optimizing the storage actually used (because even deltas, which only store the changes, still take up space).
  • Same as above, but the company simply wants the data available for its own purposes.
  • To provide protection from data corruption or accidental deletion/modification. Something bad happens, someone presses a key they shouldn't, and you need to get the backup from yesterday to retrieve the data.
  • Some kind of hardware failure, where the system has to be restored to a new piece of equipment, possibly even in another location.
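
The tiered retention described in the first bullet can be sketched as a simple rule that decides whether a given backup is still kept. This is only an illustration, not how any particular backup product works. The function name `is_retained`, the choice of Sunday for the weekly backup, the first of the month for the monthly, 1 January for the annual, and treating 12 months as 365 days are all assumptions made for the example.

```python
from datetime import date


def is_retained(backup_date, today):
    """Return True if a backup taken on backup_date is still kept today.

    Hypothetical policy: dailies for 30 days, weeklies (Sundays) for
    12 weeks, monthlies (1st of month) for 12 months, annuals (1 Jan)
    for 7 years.
    """
    age_days = (today - backup_date).days
    if age_days < 0:
        return False  # a backup from the future makes no sense

    daily = age_days < 30
    weekly = age_days < 12 * 7 and backup_date.weekday() == 6  # 6 = Sunday
    monthly = age_days < 365 and backup_date.day == 1
    annual = (
        age_days < 7 * 365
        and backup_date.month == 1
        and backup_date.day == 1
    )
    # A backup is kept if any tier still wants it.
    return daily or weekly or monthly or annual
```

With a rule like this, most daily backups age out after 30 days, and only the weekly, monthly, and annual copies remain to satisfy the longer retention periods.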

That last point has challenges. If it's a system failure, then setting up a new system or location and retrieving the backup data takes time (even if the backup is in the cloud, retrieving the data takes a finite amount of time). This impacts the Recovery Time Objective (RTO, how long the system can be down), which could easily be hours or days. The Recovery Point Objective (RPO, how much data can be lost) would also be high, depending on how often you back up. It is for this reason that we have replication of data.
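
To make the RTO and RPO point concrete, here is a rough back-of-the-envelope sketch. The function name `restore_estimate` and all the numbers are made up for illustration; it assumes the worst-case data loss equals the backup interval and that recovery time is the time to stand up new equipment plus the time to pull the data back at a steady rate (using 1 GB = 1000 MB).

```python
def restore_estimate(backup_interval_h, provision_h, data_gb, throughput_mb_s):
    """Rough restore-from-backup estimate (all inputs are assumptions).

    backup_interval_h: hours between backups
    provision_h:       hours to set up the replacement system
    data_gb:           amount of data to retrieve, in decimal GB
    throughput_mb_s:   sustained retrieval rate in MB per second
    """
    transfer_seconds = data_gb * 1000 / throughput_mb_s
    transfer_h = transfer_seconds / 3600
    return {
        # Worst case: the failure happens just before the next backup.
        "worst_case_rpo_h": backup_interval_h,
        # The system is down while we provision and then restore.
        "rto_h": provision_h + transfer_h,
    }


# Hypothetical example: nightly backups, 4 hours to provision,
# 3600 GB to restore at 100 MB/s.
estimate = restore_estimate(24, 4, 3600, 100)
```

In this made-up case the data transfer alone takes 10 hours, so the system is down for about 14 hours and up to a day of data could be lost.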

Replication of data can be synchronous (data is written to the primary data target and the replica at essentially the same time, before the writing application receives an acknowledgement that the data has been written) or asynchronous (the primary is written and the requesting application receives an acknowledgement, and then the replica is written as quickly as possible or on a schedule). Synchronous replication can impact performance if the replica introduces latency, so it is typically used within a facility, whereas asynchronous replication would be used between facilities (but introduces a certain amount of possible data loss in an unplanned outage).
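
The difference between the two modes can be sketched with a toy in-memory model. This is not any real replication product; the class names `Replica`, `SyncPrimary`, and `AsyncPrimary` and the `ship` method are hypothetical, and a real system would send data over a network rather than call a method directly.

```python
from collections import deque


class Replica:
    """Toy replica that just stores whatever it is sent."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class SyncPrimary:
    """Acknowledges a write only after the replica also has it."""

    def __init__(self, replica):
        self.data = {}
        self.replica = replica

    def write(self, key, value):
        self.data[key] = value
        self.replica.apply(key, value)  # caller waits for this step
        return "ack"


class AsyncPrimary:
    """Acknowledges immediately and ships changes to the replica later."""

    def __init__(self, replica):
        self.data = {}
        self.replica = replica
        self.pending = deque()  # changes not yet on the replica

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))
        return "ack"  # returned before the replica is updated

    def ship(self):
        """Send queued changes, e.g. continuously or on a schedule."""
        while self.pending:
            key, value = self.pending.popleft()
            self.replica.apply(key, value)
```

In the asynchronous case, anything still sitting in `pending` when the primary is lost is the data you lose, which is the trade-off for not making every write wait on the replica.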

As you've probably guessed, replication provides fast recovery and a small amount of data loss in instances where a system or facility is lost, which is commonly known as disaster recovery. Now, this is costly: I have to have complete systems available at the replica site to receive the replicated data and run the systems in the event of a failover, but it gives the best experience.

So if I have replication, why still back up? Certainly it may reduce the occasions when you need to restore a backup. In the event of a system failure, for example, you would no longer have to restore a backup; you would start up the replica system. Remember there are cost considerations, though: for backups I don't have to have the hardware running all the time, and maybe I use a third-party facility that I could restore to.

What about a logical corruption, such as a deletion of data, a modification of data, or some malware that encrypts all your data? That data change would have replicated to the replica, so replication is not helping you here. Well, that's not completely true. Replication systems can often keep point-in-time views of data, so in the event of a failover you can opt to fail over to the latest time or perhaps to how it was 4 hours ago. However, would that help if someone just deleted a folder? Do you want to perform a DR failover to get a folder back? Now, there are other technologies to help with file shares to avoid having to restore backups, but you get the idea. This is a key point. Forget about replication: many technologies (file servers, Active Directory, SharePoint, etc.) now have capabilities that let you roll back to previous point-in-time views to avoid having to use a backup, but not everything can do this. For some things you need to restore a backup to get data back from deletion or corruption.
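
A toy example shows why replication alone does not protect against a deletion. The file share is modeled as a plain dictionary, and the helper names `replicate`, `take_backup`, and `restore_item` are hypothetical, chosen only for this sketch.

```python
import copy


def replicate(primary, replica):
    """Make the replica mirror the primary, deletions included."""
    replica.clear()
    replica.update(copy.deepcopy(primary))


def take_backup(primary):
    """Capture an independent point-in-time copy of the primary."""
    return copy.deepcopy(primary)


def restore_item(backup, primary, name):
    """Bring a single item back from a backup."""
    primary[name] = copy.deepcopy(backup[name])


share = {"budget.xlsx": "v1", "plan.docx": "v1"}
replica = {}

replicate(share, replica)
last_night = take_backup(share)

del share["budget.xlsx"]   # someone presses a key they shouldn't
replicate(share, replica)  # the deletion faithfully replicates

# The replica no longer has the file, but the backup still does.
restore_item(last_night, share, "budget.xlsx")
```

The replica did exactly what it was asked to do and copied the mistake, while the independent point-in-time copy is what gets the file back.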

What does all this mean? You still need backups. For long-term, offsite protection, backups are critical. You need them less with replication and with technologies that have their own point-in-time view/recovery capability, but you can't get rid of them completely!

We'll cover things like this and more in the Master Class, so I hope to see you there!