Disaster recovery is a set of processes, techniques, and tools used to swiftly and smoothly recover vital IT infrastructure and data when an unforeseen event causes an outage. The statistics make the case for disaster recovery: 98 percent of organizations reported that a single hour of downtime costs over $100,000, while 81 percent indicated that an hour of downtime costs their business over $300,000.
The adoption of cloud computing into mainstream enterprise IT has altered the nature of the tools and platforms companies use for disaster recovery purposes. The likes of Azure Site Recovery and Amazon Web Services (AWS) provide the opportunity to use cloud services for disaster recovery. Some platforms, such as N2WS, recognize that increased reliance on the cloud for mission-critical enterprise workloads necessitates a platform that backs up AWS and other cloud environments, because those environments are not immune from disaster either.
Big Data has further altered the disaster recovery approach for many companies. Fast-moving and high-volume data inundates most businesses daily, and many businesses use powerful systems to run Big Data workloads with the intention of gleaning insight from this data. Such insight can increase revenue, mitigate risks, reveal new business opportunities, and more.
Big Data disaster recovery entails backing up systems that collect all this data, ensuring that important workloads can be restored promptly and data loss can be minimized in the event of an unplanned outage. In this article, you’ll find out exactly why Big Data disaster recovery (DR) is important and you’ll also get an overview of some best practices for Big Data DR.
Big Data has been regarded as mission-critical by IT executives for several years now. If an unplanned outage occurs, inadequate disaster recovery practices for Big Data systems can lead to serious consequences, including lost revenue and missing out on vital business insights. Therefore, it’s imperative to treat Big Data environments like any other mission-critical IT infrastructure and ensure they can be recovered properly if disaster strikes.
A unique problem posed by Big Data and disaster recovery is the continuous streaming of data often used by enterprises for network monitoring, fraud detection, and other use cases that demand real-time processing and analytics. With stream processing, data is fed in real time into a processing framework such as Apache Storm or Apache Spark for analysis. The high velocity of Big Data calls for a disaster recovery solution that returns functionality to Big Data systems as soon as possible.
You can use mirroring to replicate your Big Data datasets across multiple Hadoop clusters. Mirroring is particularly important for disaster recovery because it enables your Big Data workloads to fail over between sites for high availability. Apache Falcon is a feed processing and feed management system that facilitates data replication across multiple clusters.
If a site-wide disaster takes down all your on-premises IT infrastructure, mirroring to an off-premises Hadoop cluster deployed in a secondary data center (private cloud) or in the public cloud can minimize downtime for mission-critical Big Data workloads. Make sure your setup replicates stream metadata alongside the data itself.
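As a rough illustration of mirroring between clusters, the sketch below builds a command line for DistCp, the inter-cluster copy tool bundled with Hadoop. This swaps in DistCp for simplicity and is not a description of how Apache Falcon is configured. The function name, NameNode hostnames, and paths are hypothetical, and a real setup would also need scheduling, authentication, and monitoring.

```python
from typing import List


def build_distcp_command(source_uri: str, target_uri: str,
                         delete_missing: bool = False) -> List[str]:
    """Build a `hadoop distcp` invocation that mirrors source_uri to target_uri."""
    # -update copies only files that are missing or differ at the target.
    cmd = ["hadoop", "distcp", "-update"]
    if delete_missing:
        # -delete removes target files that no longer exist at the source.
        cmd.append("-delete")
    cmd += [source_uri, target_uri]
    return cmd


if __name__ == "__main__":
    # Hypothetical NameNode addresses and dataset path, for illustration only.
    command = build_distcp_command(
        "hdfs://primary-nn:8020/data/events",
        "hdfs://dr-nn:8020/data/events",
    )
    print(" ".join(command))
```

The returned list can be handed to a scheduler or to `subprocess.run` on a machine with a Hadoop client installed. Keeping `delete_missing` off by default is a cautious choice here, since propagating deletions to the recovery site also propagates accidental ones.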
Filter Extraneous Data
The truth is that Big Data often contains only small nuggets of insight among a raft of extraneous information. Social media data, for example, is often a collection of mostly meaningless information interspersed with some interesting insights. The same holds true for streamed data sources, such as sensors.
Filtering the useful information from the extraneous can help reduce storage costs for disaster recovery. Data visualization can help here, as can a strong data model. Data with a low density of information doesn’t need to be recovered rapidly, and it is best backed up in the lowest-cost storage solution available. Another possible approach is to segment data based on how frequently it’s accessed.
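Segmenting data by access frequency could look something like the following minimal sketch, which assigns a dataset to a backup tier based on how recently it was last read. The tier names and the 7-day and 90-day thresholds are assumptions chosen for illustration, not recommendations from any vendor.

```python
from datetime import datetime, timedelta
from typing import Optional


def choose_backup_tier(last_accessed: datetime,
                       now: Optional[datetime] = None,
                       hot_days: int = 7,
                       warm_days: int = 90) -> str:
    """Return a hypothetical backup tier for data last read at `last_accessed`."""
    now = now or datetime.now()
    age = now - last_accessed
    if age <= timedelta(days=hot_days):
        return "hot"    # recently used: keep in fast, quickly restorable storage
    if age <= timedelta(days=warm_days):
        return "warm"   # occasionally used: mid-priced storage
    return "cold"       # rarely used: lowest-cost archival storage


if __name__ == "__main__":
    reference = datetime(2024, 1, 31)
    for last_read in (datetime(2024, 1, 29), datetime(2023, 12, 1), datetime(2023, 1, 1)):
        print(last_read.date(), choose_backup_tier(last_read, now=reference))
```

In practice the thresholds would come from your own recovery objectives and access logs rather than fixed defaults.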
Reduce Backup Costs in the Cloud
Backup costs for Big Data can be quite high, particularly when you consider that the volumes of data gathered by enterprises grow all the time. Large volumes of Big Data typically accumulate in Hadoop clusters, which are groups of computers typically used on-premises for Big Data storage and analytics.
The cloud provides a cost-efficient option for data backups because you pay only for the storage you actually use with leading cloud vendors, such as Microsoft Azure and AWS. Additionally, an off-premises cloud backup location is insulated from the local disasters that can take out on-premises Big Data systems, although cloud environments themselves still need protection. You can further improve the efficiency of the cloud for disaster recovery using services like N2WS.
Test Your Disaster Recovery
There is no use investing significant time and money into creating a detailed disaster recovery plan for Big Data without actually testing that plan. The tests you run must verify whether the tools and procedures you use for restoring Big Data systems can do so within your specified recovery time objective (RTO) and recovery point objective (RPO).
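One way to make such a test concrete is to record three timestamps during a recovery drill and compare the results against your objectives. The sketch below assumes a simple model: recovery time is measured from the start of the outage to the restoration of service, and the data loss window is the gap between the last recoverable record and the start of the outage. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillResult:
    outage_start: datetime             # when the simulated outage began
    service_restored: datetime         # when the workload was usable again
    last_recoverable_record: datetime  # newest data present after the restore


def evaluate_drill(result: DrillResult, rto: timedelta, rpo: timedelta) -> dict:
    """Compare a drill's measured recovery time and data loss to RTO and RPO."""
    recovery_time = result.service_restored - result.outage_start
    data_loss_window = result.outage_start - result.last_recoverable_record
    return {
        "recovery_time": recovery_time,
        "data_loss_window": data_loss_window,
        "rto_met": recovery_time <= rto,
        "rpo_met": data_loss_window <= rpo,
    }


if __name__ == "__main__":
    drill = DrillResult(
        outage_start=datetime(2024, 3, 1, 12, 0),
        service_restored=datetime(2024, 3, 1, 13, 30),
        last_recoverable_record=datetime(2024, 3, 1, 11, 50),
    )
    print(evaluate_drill(drill, rto=timedelta(hours=2), rpo=timedelta(minutes=5)))
```

In this made-up drill the 90-minute recovery fits within a two-hour RTO, but ten minutes of lost data exceeds a five-minute RPO, which would point to more frequent replication rather than faster restores.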
How often you test is organization-specific, but a prudent approach is to test quarterly to semi-annually, particularly for new disaster recovery plans. After repeated regular tests in the first year, you can move to a less frequent schedule.
Big Data processing, analytics, and storage have gone from being used by relatively few businesses to being central to operations for many enterprises. In a data-driven world, Big Data analytics drives a competitive edge by revealing hidden insights in the vast swathes of information entering transactional systems, web analytics logs, and other systems used by enterprises each day.
Businesses must adapt their disaster recovery plans to reflect the mission-critical status of Big Data systems and workflows in the modern enterprise. Effective Big Data disaster recovery encompasses many aspects working together—people, policies, procedures, tools, and dedicated disaster recovery platforms. And no matter how good a plan looks, it’s vital to test it at regular intervals to ensure smooth recovery from a natural or human-induced disaster.