Disaster Recovery Plan

Overview

This document discusses Incorta High Availability and Disaster Recovery architectures at a high level. The intended audience is the infrastructure and operations teams who need to understand how Incorta is installed and maintained.

Architecture Goals

  • Keep the Primary system functioning in case of node failures
  • Switch to the DR system in case of site failures

Architecture Considerations

The following is a list of architecture principles used for the Incorta DR Architecture:

  • The DR site will be in a different data center than the Primary site.
  • There will be at least one DR site for the Incorta solution.
  • The Primary and DR sites can be in different time zones.
  • Data located on the Primary site is replicated to the DR site asynchronously.
  • Some manual steps are expected to switch to the DR servers.

Incorta HA Architecture

Architecture

Figure 1: Incorta High Level Architecture
A typical Incorta High Availability architecture consists of:

  • Incorta cluster with at least 2 nodes
  • Zookeeper Ensemble with 3 nodes
  • Database Cluster
  • Shared Storage
  • Spark Cluster (Optional)

The High Availability architecture deals with individual node failures and does not take care of disasters where a whole site fails. The following figure illustrates the various components in detail within a High Availability architecture.

Architecture

Figure 2: Incorta HA Architecture (Primary Site)
This sample HA architecture for the primary site consists of the following:

  • Incorta cluster with 2 nodes
    • Both nodes are kept in sync.
  • Zookeeper cluster with 3 nodes
    • Zookeeper is used to coordinate the Incorta and Spark nodes.
  • Shared Storage
    • Stores the extracted data.
  • DB Cluster
    • Stores key metadata.
  • Spark cluster with 2 nodes
    • Spark is optional and is used for complex transformations.

One half of the cluster consists of Incorta Node-1, Spark Node-1, and Zookeeper Node-1 and resides on ESX Server 1. The other half consists of Incorta Node-2, Spark Node-2, and Zookeeper Node-2 and resides on ESX Server 2. Since a Zookeeper Ensemble requires at least 3 nodes, the third Zookeeper node can be placed on any small VM. The metadata database should also be highly available; it can be a MySQL or Oracle cluster.
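A three-node Zookeeper Ensemble like the one above is defined in each node's zoo.cfg. The sketch below shows a minimal configuration; the hostnames, ports, and paths are illustrative, not Incorta-specific defaults:

```
# zoo.cfg on each ZooKeeper node (hostnames and paths are illustrative)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
# One line per ensemble member; the third node may be a small VM
server.1=zk-node-1:2888:3888
server.2=zk-node-2:2888:3888
server.3=zk-node-3:2888:3888
```

Each node also needs a myid file under dataDir whose content matches its server number, so the ensemble can identify its members.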

In case of individual node failures on any of the ESX servers, the backup nodes on the other server will still be available to keep Incorta functioning.

Disaster Recovery Solution
There are various solutions to enable Disaster Recovery. The following architecture duplicates the primary site's High Availability architecture at a Disaster Recovery site.

Architecture

Figure 3: High Level DR Architecture
The DR Architecture involves replication of two key components from the Primary Site to the DR Site:

  • Incorta tenant data stored in the Parquet and snapshot locations
  • Incorta metadata database (MySQL or Oracle)
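One common way to replicate the shared-storage contents asynchronously is a scheduled rsync from the primary site to the DR site. The sketch below builds such a command; the directory paths and DR hostname are assumptions for illustration, not Incorta defaults:

```python
# Sketch: build an rsync command that mirrors a tenant's Parquet and
# snapshot directories to the DR site. Paths and hostname are illustrative.

def build_rsync_command(src_dir: str, dr_host: str, dest_dir: str) -> list:
    """Return the rsync argv for an incremental, delete-aware mirror."""
    return [
        "rsync",
        "-az",        # archive mode (permissions, times) with compression
        "--delete",   # remove files on DR that no longer exist on primary
        "--partial",  # keep partially transferred files across retries
        f"{src_dir.rstrip('/')}/",           # trailing slash: copy contents
        f"{dr_host}:{dest_dir.rstrip('/')}/",
    ]

if __name__ == "__main__":
    cmd = build_rsync_command("/incorta/tenants/acme/parquet",
                              "dr-site-node1",
                              "/incorta/tenants/acme/parquet")
    print(" ".join(cmd))
```

Running this on a schedule (e.g. via cron) gives the asynchronous replication described above; storage-level replication offered by the SAN or cloud provider is an equally valid alternative.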

Architecture

Figure 4: Replication
The above diagram illustrates the replication of the metadata database and the contents of shared storage from the primary site to the disaster recovery site. The metadata database is lightweight and holds dictionary information related to Incorta. It can be MySQL or Oracle.
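If MySQL is used, its built-in asynchronous source-replica replication can carry the metadata to the DR site. A minimal sketch of the server settings follows; the server IDs, paths, and schema name are illustrative:

```
# my.cnf on the primary-site metadata database
[mysqld]
server-id    = 1
log_bin      = /var/log/mysql/mysql-bin.log
binlog_do_db = incorta_metadata   # illustrative schema name

# my.cnf on the DR-site replica
[mysqld]
server-id = 2
relay_log = /var/log/mysql/mysql-relay-bin.log
read_only = ON
```

Keeping the replica read-only prevents accidental writes at the DR site until an actual failover, when it is promoted to become the active metadata database.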

Shared storage holds the actual user data extracted from source systems. In case of a total primary site failure, Incorta on the Disaster Recovery site should be started. Since both the actual data and the metadata are replicated from the primary site to the DR site, Incorta will come up with the replicated state. If the replication process is near real time, there will be little or no loss of data.
Even if there is some loss of data, you can run a full refresh on the affected schemas/tables to bring them up to date.
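Before bringing the DR site online, it helps to check how stale the replicated data is, to decide whether a full refresh is needed after failover. A minimal sketch of that check, assuming you record the timestamp of the last completed replication cycle (the threshold value is a hypothetical example):

```python
import time

def needs_full_refresh(last_replicated_epoch: float,
                       max_lag_seconds: float = 3600,
                       now: float = None) -> bool:
    """Return True when the replicated data is older than the allowed lag,
    meaning a full schema/table refresh should be run after failover."""
    if now is None:
        now = time.time()
    return (now - last_replicated_epoch) > max_lag_seconds
```

Such a check could be part of the manual failover runbook: if it returns True, schedule full refreshes for the affected schemas/tables once Incorta is up on the DR site.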