Tuesday, September 8, 2009

Oracle backup replication with Oracle Streams, Data Guard & RAC

As the world’s leading database, Oracle offers a wealth of options to replicate geographically distributed systems. Each method has its own features and it is the job of the Oracle professional to choose the one that makes sense for their business requirements:

  • Data GuardData Guard standby databases provide standby databases for failover in system that do not have a high downtime cost. In a data guard failover, the last redo log must be applied (up to 20 minutes to re-open the standby database), and if the last redo cannot be recovered from the crashed instance, the most recent updates may be lost.

  • Multi-Master Replication – See the excellent book for implementing multi-master replication. Uses advanced queuing to cross-pollinate many databases with updates. Has an advantage over Data Guard because the standby database can be used for updates, and a disadvantage that replication is not instantaneous and update collisions may occur.

  • Oracle Streams – See the comprehensive book on Oracle Streams Technology by Madhu Tumma. Ready for production use in Oracle 10g, Oracle Streams allows near real-time replication and support for master-to-master replication. Oracle Streams has no licensing costs (RAC costs extra) and it is less complex to configure than a RAC database. This disadvantage over RAC is that there is no continuous availability, update collisions may occur, replication is not instant, and application failover is usually done manually.

  • Real Application Clusters – The Cadillac of Oracle continuous availability tools, RAC allows for Transparent Application Failover (TAF) and RAC is de-rigueur for systems with a high cost of downtime and continuous availability on RAC.

Most shops will combine these technologies to provide replication that is easy to manage and reliable. Let’s explore such a scenario.

Why Replicate?

Oracle has sophisticated mechanisms to manage data concurrency and replication is normally used when we have geographically distributed systems with unreliable network connectivity between systems. In a perfect world, database would not be replicated, but the nature of worldwide networking mandates that global eCommerce system have replicated database to provide fast response time worldwide.

Because the Oracle Streams product is new, Oracle professionals are only now recognizing how Streams can be used to in a master-master replication architecture. Let’s take a closer look.

Oracle Streams for master-master replication

Oracle Streams is an ideal solution for systems that are geographically distributed and have a high-speed connection (e.g. a T1 line) between servers. As long as the server interconnect can keep-up with the changes you can implement systems that provide failover and disaster recovery, simply and reliably. However Oracle Streams replication is not for every database:

  • High update systems – Busy update database may spend excessive resources synchronizing the distributed databases.

  • Real-time replication required – If you require a two-phase commit (where the changes are committed on each database at exactly the same time), then RAC or on-commit replication are a better choice. For example, a banking system would require that all databases update each other simultaneously, and Oracle Streams might not be an ideal solution.

On the other had, Oracle Streams is perfect for geographically distributed applications where real-time synchronization is not required. Let’s explore master-master replication with a hypothetical example. Assume that we have a online database serving the US Federal government which is deployed using APEX making it accessible anywhere in the world with an internet-enabled web browser. Federal agencies will connect with either the California or the Texas server, based on their APEX shortcut URL. The two systems are cross-fed updates via Oracle Streams and the Texas server will use Data Guard to replicate to standby database in Kittrell North Carolina (Figure 1).










Figure 1 – Combining Oracle Streams and Oracle data Guard

Oracle Streams will provide near real-time replication of important information and in case of a server outage, updates are automatically stored in update queues and applied automatically when service is restored to the crashed Oracle server.

Providing reliable connectivity during network outages is very challenging and sophisticated, and Oracle has several tools that will aid TCI in their goals. In case of any type of outage we see alternative connection options:

  • Interconnect is down – All end-users will continue to operate independently and when the interconnect is restored the updates queues will synchronize each database.

  • California or Texas Server down – The end-user will have three icons on their desktop, and a California end-user simply clicks on the URL for the Texas database and continues working.

  • Major Disaster – In this case the Kittrell NC server will be synchronized using Data Guard (with a possible loss of the most recent redo log updates) and end-users with access to the internet can continue to work on the standby server. Upon restoration of service, the Kittrell server will flush all updates back to the main California and Texas databases.

Establishing the replication mechanisms is only part of the task. We also need detailed checks to alert the DBA staff when an outage has occurred. The end-user will have it easier, and know to simply log in to the surviving Oracle database. Let’s look at the replication alert mechanisms.

Streams replication failure alerts

This is a critical component of the replication so that Oracle staff can be aware when there has been a server or network failure. The alerts will include checks from the Kittrell, checks between Texas and California, and checks of the replication queues. Because the connectivity checks must be done outside from Oracle, the checks can be done using Linux cron entries.

  • Ping from standby server - The Kittrell standby server will check every 15 minutes for connectivity to all servers with a “ping” command. If the servers fail to respond a pager alert will be sent to support staff.

  • California and Texas connectivity checks – Each server will cross-check each other, first checking with tnsping, then ping, and sending appropriate alerts in cases of network failure.

Creating a self-synchronizing replication architecture

Once the Oracle Streams, Data Guard and alerts are in-place, the only challenge for the Oracle DBA is ensuring that the Streams replication queues have enough room to hold updates on the surviving Oracle instance.

For more information on using Oracle Streams and Multimaster Replication, we recommend these books and links:

Streams Replication Links:







No comments: