This article describes a Near Zero Downtime (NZDT) solution designed to increase system availability.
The overall DC-DR design is Active-Passive across two physically separated data centers. The applications are both Internet and intranet facing. The network design includes 2 zones (Internet and Intranet) with 6 tiers (subnets) in each zone: Web, App, DB, Integration, Gateway Utility and Management.
The RPO and RTO requirements are near zero and 10 minutes, respectively.
Six Tier Technical Architecture, Replication Strategy and Traffic Flow
The objective of the NZDT solution is to minimize overall downtime during maintenance activities by switching the application from the primary DC to DR in an automated fashion, using Control-M, in less than 10 minutes. The same solution is used to perform DR switch-over and switch-back.
The maintenance activities for which the NZDT solution would predominantly be used are as follows:
DR Exercise
OS (Solaris & Windows) Patching
Oracle and SQL Server Database Patching
Fusion Middleware Patching
Deployments and configuration
Non-Oracle product patching (Informatica, Qliksense, Control-M, Crystal Reports)
SSL cert renewal
Performance tuning
Any other maintenance task that demands a system outage
Both Oracle and non-Oracle technology stacks are involved. The replication strategies for the different stacks and workloads are described below.
Global Traffic Manager (Internet)
A GTM solution will be used for Internet IP changes and traffic redirection during DR switch-over and switch-back activities. There will be 2 GTM devices (F5 BIG-IP) for Internet, one hosted in the Internet Gateway Utility Tier (GUT) of each DC, with replication/synchronization enabled between the GTM devices across DCs. These 2 GTM devices will be set up in Active-Active HA mode.
FQDNs related to User Traffic (Internet)
IP changes and traffic redirection will be automated in GTM for Internet user traffic whenever a switch-over or switch-back takes place across DCs.
Global Traffic Manager (Intranet)
A GTM solution will be used for intranet IP changes and traffic redirection during DR switch-over and switch-back activities. There will be 2 GTM devices (F5 BIG-IP) for intranet, one hosted in the Intranet Gateway Utility Tier (GUT) of each of DC1 and DC2, with replication/synchronization enabled between the GTM devices. These 2 GTM devices will be set up in Active-Active HA mode.
FQDNs related to User Traffic (Intranet)
IP changes and traffic redirection will be automated in GTM for intranet user traffic domains whenever a switch-over or switch-back takes place across DCs.
FQDNs related to Non-User Traffic (Intranet)
IP changes will be automated in GSLB for intranet non-user traffic domains whenever a switch-over or switch-back takes place across DCs.
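The automated IP change can be pictured as flipping the address handed out for each FQDN from one DC to the other. The sketch below is a minimal Python model of that behavior only; it is not F5 configuration and does not call any F5 API. The class name, function name, FQDNs and IP addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FqdnRecord:
    """One FQDN with a candidate IP per data center (all values hypothetical)."""
    fqdn: str
    ip_by_dc: dict[str, str]
    active_dc: str = "DC1"

    def resolve(self) -> str:
        """Return the IP that the traffic manager would currently hand out."""
        return self.ip_by_dc[self.active_dc]


def switch_dc(records: list[FqdnRecord], target_dc: str) -> None:
    """Point every FQDN at target_dc; used for both switch-over and switch-back."""
    # Validate first so the switch is all-or-nothing.
    for record in records:
        if target_dc not in record.ip_by_dc:
            raise ValueError(f"{record.fqdn} has no address in {target_dc}")
    for record in records:
        record.active_dc = target_dc


records = [
    FqdnRecord("portal.example.internal", {"DC1": "10.1.0.10", "DC2": "10.2.0.10"}),
    FqdnRecord("services.example.internal", {"DC1": "10.1.0.20", "DC2": "10.2.0.20"}),
]
switch_dc(records, "DC2")  # switch-over
print([r.resolve() for r in records])
switch_dc(records, "DC1")  # switch-back
print([r.resolve() for r in records])
```

Validating every record before changing any of them keeps the model from ending up with some FQDNs pointing at DC1 and others at DC2.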
Web Tier
Two physical F5 LTM devices are provisioned separately for Internet and intranet, and they will be configured in high-availability (Active-Standby) mode in each DC. No replication is required across DCs.
Two IIS servers will be configured in Active-Active mode in each DC. No replication is required across DCs, as IIS will act as a forward proxy. The F5 LTM will distribute the load evenly across the 2 IIS sites.
Gateway Utility Tier
One physical F5 GTM device will be hosted in the GUT of each DC, with synchronization enabled and HA mode set to Active-Active. In total, there will be 2 physical GTM devices each for Internet and intranet.
App Tier
WCP/ADF: No replication is required at the mid-tier or database level across DCs. These components will be up and running in both DC1 and DC2. Only the application data sources (DS) will be started during the DR swing.
SOA-BPM: Data will be replicated at the database level using Data Guard. In normal operation, the SOA-BPM domain in DR will be in a cold state. As part of the DR pre-requisites, the domain is started by connecting to the primary database, and its data sources are then suspended to close the connections to the primary database. The DR SOA-BPM data sources will be started during the DR swing, after the Oracle database switch-over.
OSB: No replication is required at the mid-tier or database level, as OSB calls are stateless.
UCM: Archiver replication will be set up and configured to replicate changes from the primary to the secondary DC and vice versa. Both file-system-level and database-level synchronization will be handled as part of UCM Archiver replication. UCM services will be up and running in both DC1 and DC2.
OID: Multi-master replication will be set up and configured to replicate authentication and authorization changes from the primary to the secondary DC and vice versa in real time; no database-level replication is needed. The OID service will be up and running in both DC1 and DC2 to ensure that replication is active at all times.
ORDS: No replication is required at the mid-tier. The repository metadata for ORDS will be replicated between DCs using Data Guard. ORDS will be up and running in both DC1 and DC2 to maintain the replication in real time.
BATCH JOBS: RSYNC will be used to sync the files and scripts across DCs.
RSYNC will also be used to sync any other binaries, patches and deployments across DCs periodically, on a need basis.
Informatica: The Informatica source (Oracle) and target (SQL Server) databases will be replicated using their respective database-level replication solutions (Data Guard and SQL Always On). Any code changes on one side will be copied to the other DC via xcopy/robocopy through a Control-M job. The Informatica software itself does not need any replication across DCs. The Informatica service will be started during the DR swing, and the data loading process will begin after the DR swing as per the job schedule.
Qliksense: The Qliksense source (SQL) databases will be replicated using their database replication solution (SQL Always On). Any code changes on one side will be copied to the other DC via xcopy/robocopy. The Qliksense software itself does not need any replication across DCs. The Qliksense service will be started during the DR swing, and the data loading process will begin after the DR swing.
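The SOA-BPM row describes an ordering constraint: the DR domain is started and its data sources suspended as pre-requisites, and the data sources are started again only after the Oracle database switch-over. The sketch below is a toy Python model of that ordering, written to make the constraint explicit. It does not call WebLogic or Data Guard, and the class and method names are hypothetical.

```python
class SoaBpmDrDomain:
    """Toy model of the DR SOA-BPM domain; names and states are hypothetical."""

    def __init__(self) -> None:
        self.domain_running = False      # cold state in normal operation
        self.datasources_active = False
        self.dr_db_is_primary = False    # role of the DR database

    def start_domain(self) -> None:
        """Pre-requisite: start the domain against the current primary database."""
        self.domain_running = True
        self.datasources_active = True

    def suspend_datasources(self) -> None:
        """Pre-requisite: close the connections to the primary database."""
        if not self.domain_running:
            raise RuntimeError("domain must be started before suspending data sources")
        self.datasources_active = False

    def database_switched_over(self) -> None:
        """Record that the database switch-over has made the DR database primary."""
        self.dr_db_is_primary = True

    def start_datasources(self) -> None:
        """DR swing: allowed only after the database switch-over has completed."""
        if not self.domain_running:
            raise RuntimeError("domain is not running")
        if not self.dr_db_is_primary:
            raise RuntimeError("database switch-over has not completed")
        self.datasources_active = True


dr = SoaBpmDrDomain()
dr.start_domain()            # DR pre-requisite
dr.suspend_datasources()     # DR pre-requisite
dr.database_switched_over()  # Oracle database switch-over
dr.start_datasources()       # DR swing
print(dr.datasources_active)
```

Encoding the guard in `start_datasources` means a mis-sequenced run fails loudly instead of reconnecting the DR domain to a database that is still a standby.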
DB Tier
Metadata DB (WCP, UCM, OID): No replication is required at the database level, as these databases do not hold any transactional data; UCM and OID replication is handled by the products' built-in mechanisms.
BPM-SOA/ORDS: Replication will be done using Data Guard. The databases will be active (read-write) only in the primary DC at any given point in time, while the databases in the secondary DC will be in standby mode (mount state).
McAfee DLP DB: The SQL Always-On feature will be implemented to replicate changes across DCs, and DLP services will be started during the DR swing.
BI MS SQL Data Warehouse: The SQL Always-On (AOAG) feature and replication solution will be implemented to replicate changes across DCs.
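The Data Guard rule above (exactly one read-write primary, with the other DC in mount state) lends itself to a simple pre- and post-switch sanity check. The following Python sketch assumes the role and open mode of each database have already been collected by some other means; the function name, the input format and the sample values are hypothetical, and nothing here queries a real database.

```python
def check_roles(status: dict[str, tuple[str, str]]) -> list[str]:
    """Return a list of problems; an empty list means the pair looks healthy.

    status maps a DC name to (database_role, open_mode), both as plain strings.
    """
    problems = []
    primaries = [dc for dc, (role, _) in status.items() if role == "PRIMARY"]
    if len(primaries) != 1:
        problems.append(f"expected exactly one primary, found {len(primaries)}")
    for dc, (role, open_mode) in status.items():
        if role == "PRIMARY" and open_mode != "READ WRITE":
            problems.append(f"{dc}: primary is not open read-write")
        if role != "PRIMARY" and open_mode != "MOUNTED":
            problems.append(f"{dc}: standby is not in mount state")
    return problems


# Hypothetical snapshot after a switch-over to DC2.
after_switch = {
    "DC1": ("PHYSICAL STANDBY", "MOUNTED"),
    "DC2": ("PRIMARY", "READ WRITE"),
}
print(check_roles(after_switch))
```

A check of this kind could gate the step that starts the DR data sources, so the application never connects while both sides claim the same role.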
Tenant Management Tier
AD & DNS Replication: AD sync and replication will be configured from the primary to the secondary DC and vice versa.
Since DNS is AD-integrated, all DNS entries will be replicated along with AD, and the domains/FQDNs will be the same in both DCs. However, the IPs differ between the DCs. The common domains/FQDNs will therefore be added to the /etc/hosts file in DR, pointing to the DR IPs, while DNS holds the PROD IPs. In DR, DNS queries for all unique (non-duplicate) entries will be resolved against the DR DNS.
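The effect of the hosts-file override can be illustrated with a small lookup model. The Python sketch below assumes the DR hosts consult the hosts file before DNS; it parses hosts-file text and falls back to a dictionary standing in for DNS. It performs no real name resolution, and the FQDNs and IP addresses are hypothetical.

```python
def parse_hosts(text: str) -> dict[str, str]:
    """Parse hosts-file text into {name: ip}, ignoring comments and blank lines."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)  # first entry for a name wins
    return table


def resolve(name: str, hosts: dict[str, str], dns: dict[str, str]) -> str:
    """Look up a name in the hosts table first, then in the DNS stand-in."""
    key = name.lower()
    if key in hosts:
        return hosts[key]
    return dns[key]


DR_HOSTS = """
# common FQDNs pinned to DR addresses
10.2.0.30  soa.example.internal
"""
# Stand-in for DNS as seen from DR: common FQDNs carry PROD IPs.
DNS = {
    "soa.example.internal": "10.1.0.30",
    "dr-only.example.internal": "10.2.0.99",
}

hosts = parse_hosts(DR_HOSTS)
print(resolve("soa.example.internal", hosts, DNS))
print(resolve("dr-only.example.internal", hosts, DNS))
```

The first lookup is answered from the hosts table with the DR address, while the second, which has no hosts entry, falls through to DNS.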
The replication solutions above ensure that, for each workload and stack, data stays fully in sync between the two data centers so that the required RPO is maintained.
To achieve an RTO of less than 10 minutes, the DR activation tasks, their sequencing, dependencies and job flow are created in the Control-M job scheduler, and these ad hoc jobs are loaded on a need basis to trigger the DR switch.
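Whether a dependency-ordered job flow fits inside the RTO can be estimated before it is ever run. The Python sketch below orders a set of jobs by their dependencies and computes the earliest finish time of the whole flow, assuming independent jobs run in parallel. It is not Control-M job definition syntax, and the job names, durations and dependencies are hypothetical placeholders rather than the actual DR activation tasks.

```python
from graphlib import TopologicalSorter

# job -> (duration in seconds, jobs it depends on); all values hypothetical
JOBS = {
    "suspend_primary_datasources": (30, []),
    "db_switchover": (180, ["suspend_primary_datasources"]),
    "gtm_redirect": (60, ["db_switchover"]),
    "start_dr_datasources": (45, ["db_switchover"]),
    "start_informatica": (90, ["db_switchover"]),
    "smoke_test": (60, ["gtm_redirect", "start_dr_datasources", "start_informatica"]),
}
RTO_SECONDS = 10 * 60


def plan(jobs):
    """Return (run order, earliest finish time of the whole flow in seconds)."""
    graph = {name: deps for name, (_, deps) in jobs.items()}
    # static_order() yields each job only after all of its dependencies.
    order = list(TopologicalSorter(graph).static_order())
    finish = {}
    for name in order:
        duration, deps = jobs[name]
        start = max((finish[d] for d in deps), default=0)
        finish[name] = start + duration
    return order, max(finish.values(), default=0)


order, total = plan(JOBS)
print(order)
print(f"estimated duration: {total}s, within RTO: {total <= RTO_SECONDS}")
```

With these placeholder durations the longest dependency chain takes 360 seconds, so the flow would fit within the 600-second target. Real timings would have to come from measured DR exercises.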