High availability disaster recovery

Venafi Trust Protection Platform supports a High Availability architecture that covers both fault tolerance and disaster recovery scenarios.

Fault tolerance

Fault-tolerant clusters are achieved by running every required Venafi service redundantly, so that if any Trust Protection Platform server goes down, other Trust Protection Platform servers are ready to do the same work. This is why the recommended minimum number of Trust Protection Platform servers is two, even for small deployments.

To achieve fault tolerance on the database layer, we recommend configuring AlwaysOn high-availability groups on your Microsoft SQL Server so that standby nodes can take over if the active database node goes down. See AlwaysOn high-availability disaster recovery solution.

High availability disaster recovery

In order to plan for scenarios where an entire data center may go offline, you may need to replicate your infrastructure to a secondary data center designated for disaster recovery. The Trust Protection Platform servers in this data center may need to be configured in standby mode so that they don’t attempt work over a high-latency connection to the active database node, and only activate during a disaster scenario where services are transferred to the disaster recovery data center.

Trust Protection Platform standby mode

Trust Protection Platform supports configuring each server in standby mode. When in standby mode, the Trust Protection Platform server continues to run normally, except that it does not pick up scheduled work. Trust Protection Platform servers in automatic standby mode monitor the latency of the database connection. If the latency drops below a configured threshold, as it would in a disaster scenario where the active database node moves to the disaster recovery data center, the Trust Protection Platform server automatically activates and picks up work.
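The latency check described above can be sketched as a small decision function. This is only an illustration of the idea, not the product's implementation: the function name, the parameter names, and the use of a median over recent samples (to avoid reacting to a single fast measurement) are all assumptions made for this example.

```python
from statistics import median


def should_activate(latency_samples_ms: list[float], threshold_ms: float) -> bool:
    """Hypothetical check: activate a standby server once database latency is low.

    latency_samples_ms: recent round-trip times to the active database node.
    threshold_ms: the configured latency threshold (name assumed for this sketch).
    """
    if not latency_samples_ms:
        # No measurements yet: stay in standby rather than guess.
        return False
    # Low latency suggests the active database node is now nearby,
    # as it would be after a failover to the disaster recovery data center.
    return median(latency_samples_ms) < threshold_ms
```

With a threshold of 20 ms in this sketch, a standby server that measures around 90 ms to a remote database stays idle, and the same server activates once the measurements fall to a few milliseconds.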

Standby mode is an advanced cluster configuration that Venafi Professional Services can help you configure. It isn’t currently available for customers to configure independently.

When standby mode is configured on any server, the System Status Dashboard indicates whether the standby servers are idling or active. The dashboard also indicates the reason a standby server went active.

In addition to monitoring database latency, standby mode can be configured to activate based on the work queue size. If the primary Trust Protection Platform engines cannot keep up with the current work demands, the standby Trust Protection Platform servers can come out of idle and pick up work until the work queue drops below the configured threshold. This configuration is best suited to auto-scaling use cases rather than automatic disaster recovery, because it is only useful if the latency to the database is low.
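As a rough sketch of the queue-based behavior, the following models a standby server that becomes active while the work queue is at or above a threshold and returns to idle once the queue drops below it. The class, its field names, and the reason string are hypothetical. Treating a queue size exactly equal to the threshold as "active" is also an assumption of this example.

```python
from dataclasses import dataclass


@dataclass
class QueueStandbyPolicy:
    """Hypothetical model of queue-size-based standby activation."""

    queue_threshold: int  # assumed name for the configured work queue threshold
    active: bool = False
    reason: str = ""  # why the server is active, as a dashboard might report it

    def update(self, queue_size: int) -> bool:
        """Re-evaluate the standby state for the current work queue size."""
        if queue_size >= self.queue_threshold:
            # Primary engines are falling behind: leave idle and pick up work.
            self.active = True
            self.reason = "work queue size"
        else:
            # Queue has dropped below the threshold: return to idle.
            self.active = False
            self.reason = ""
        return self.active
```

For example, with `QueueStandbyPolicy(queue_threshold=100)`, calling `update(150)` makes the server active, and a later `update(99)` returns it to idle.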