Create a Fast-Start Failover Observer Service on Linux
Oracle Data Guard protects critical database environments with exact physical copies, or standbys, of production databases. Business continuity and disaster recovery become a matter of switching over, or failing over, to the standby database when the primary system goes offline.
Fast-Start Failover (FSFO) automates these activities. FSFO monitors participants in Data Guard configurations and, when it detects that the primary is unavailable, performs the switch automatically. It’s more responsive and durable than manual intervention. The time it takes for a DBA to acknowledge a page, log in to the environment, assess the situation, and make the decision to switch to the standby represents lost revenue and productivity. As an automated solution, FSFO is more efficient for dealing with multiple databases, too.
The Observer is the critical component in a FSFO solution. It’s responsible for monitoring the environment, detecting events, and triggering a switch. The Observer is really just a Data Guard Broker client session. It connects to the Data Guard topology and reads the status. Ideally, the Observer (or, better still, multiple Observers) is on a dedicated machine, and not located on the primary or standby database host.
A typical Observer setup involves running a script that starts a Data Guard Broker client session, connects to the primary database, and starts the observer. The process runs in the background (and in older versions never returned control). But the Observer is monitoring mission-critical databases, and there should be some intelligence built around it. A lot could go wrong with the Observer. How do you know if it’s still running properly? If it stops, what’s the mechanism for restarting it?
So, we often see scripts with some diagnostic capabilities, called by a cron job, checking the health and activity of the Observer.
This functionality is already built into systemd on Linux systems. cron itself runs on top of systemd, whose service architecture has embedded restart features, so why not cut out the middleman and just run the Observer process as a service?
Configure an Observer Host
I’ll demonstrate this setup on Oracle Cloud Infrastructure, using an Always Free-eligible compute instance running Oracle Linux 8. The Observer will run from an Oracle 19c Database Client home but might just as easily use a full Oracle Database installation. And, while I’m using a 19c client, the examples here will also work with older versions (looking at you, 11g) built before the “fire-and-forget” start observer in background command was added!
The first step after provisioning the VM is preparing the environment and installing the software. I used the preinstall RPM for this, just as I would for a database installation:
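On Oracle Linux 8, the preinstall package is available from the standard repositories. A minimal sketch (the 19c package name is the one published for database preinstallation; I’m assuming it here for the client host as well):

```shell
# Install the preinstall RPM: creates the oracle user, oinstall/dba groups,
# and sets kernel parameters and resource limits
sudo dnf install -y oracle-database-preinstall-19c
```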
Next, I create directories for the Oracle software:
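Something like the following, assuming a conventional /u01 layout (the exact paths are my assumption):

```shell
# Create an Oracle base and client home, owned by the oracle user
sudo mkdir -p /u01/app/oracle/product/19.0.0/client_1
sudo chown -R oracle:oinstall /u01/app
sudo chmod -R 775 /u01/app
```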
Client Software Installation
After switching to the oracle user, I prepared the environment and expanded the installation archive to the ORACLE_HOME:
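For example (the archive name matches the published 19.3 client home zip; the staging path is an assumption):

```shell
# As the oracle user: unzip the client home image directly into ORACLE_HOME
export ORACLE_HOME=/u01/app/oracle/product/19.0.0/client_1
cd $ORACLE_HOME
unzip -q /tmp/LINUX.X64_193000_client_home.zip
```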
Installing 19c client software on Oracle Linux 8 (and particularly on OCI) usually produces some errors:
The following workaround—overriding the distribution ID and temporary directories—bypasses the problem:
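The commonly used workaround is the CV_ASSUME_DISTID override, paired with a writable temporary directory (the /u01/tmp location is my assumption):

```shell
# Tell the 19.3 installer's prereq checks to treat the OS as OL7.8,
# and point temp work areas at a directory the oracle user owns
export CV_ASSUME_DISTID=OEL7.8
export TMP=/u01/tmp
export TMPDIR=$TMP
mkdir -p $TMP
```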
I also created a response file so I could run the installer package in the background:
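A minimal client response file might look like this (values reflect the directories above; the Administrator install type is an assumption):

```text
# /tmp/client_install.rsp
oracle.install.responseFileVersion=/oracle/install/rspfmt_clientinstall_response_schema_v19.0.0
UNIX_GROUP_NAME=oinstall
INVENTORY_LOCATION=/u01/app/oraInventory
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/oracle/product/19.0.0/client_1
oracle.install.client.installType=Administrator
```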
Next, I ran the installer package, using the response file:
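Roughly:

```shell
# Run the silent install from the expanded home, using the response file
cd $ORACLE_HOME
./runInstaller -silent -responseFile /tmp/client_install.rsp
```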
Configure and Test Networking
I added environment values to the oracle user’s .bashrc file:
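Along these lines (paths match the install above):

```shell
# Append client environment to the oracle user's login script
cat >> ~/.bashrc <<'EOF'
export ORACLE_HOME=/u01/app/oracle/product/19.0.0/client_1
export TNS_ADMIN=$ORACLE_HOME/network/admin
export PATH=$ORACLE_HOME/bin:$PATH
EOF
```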
After starting a new session, I checked that I could reach the database hosts from the dgmgrl command line using EZConnect:
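For example (hostname and service name are placeholders, not from the original environment):

```shell
# EZConnect test: dgmgrl prompts for the SYS password
dgmgrl sys@primary-host.example.com:1521/orcl "show configuration"
```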
Success!
However, I don’t want to log into the database with a password, particularly not if the connection runs as a Linux service. I created a wallet directory and configured a wallet with the database credentials:
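Using mkstore from the client home, for instance (wallet path and TNS alias are assumptions):

```shell
# Create the wallet and store SYS credentials against a TNS alias;
# mkstore prompts for the wallet and database passwords
mkdir -p /u01/app/oracle/wallet
mkstore -wrl /u01/app/oracle/wallet -create
mkstore -wrl /u01/app/oracle/wallet -createCredential dg_primary sys
```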
I created an entry in my tnsnames.ora file, and added the wallet information to the sqlnet.ora file:
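The entries might look like this (host, service name, and wallet directory are assumptions carried over from the earlier steps):

```text
# $TNS_ADMIN/tnsnames.ora
DG_PRIMARY =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = primary-host.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = orcl))
  )

# $TNS_ADMIN/sqlnet.ora
WALLET_LOCATION =
  (SOURCE = (METHOD = FILE)
    (METHOD_DATA = (DIRECTORY = /u01/app/oracle/wallet)))
SQLNET.WALLET_OVERRIDE = TRUE
```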
With the networking components in place, I tested a connection through dgmgrl using the wallet and TNS alias:
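The /@alias syntax tells dgmgrl to pull credentials from the wallet:

```shell
# Passwordless connection via the wallet entry for DG_PRIMARY
dgmgrl /@dg_primary "show configuration"
```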
It works!
Script Observer Startup
There are plenty of ways to script Observer startup, depending on the database versions and Data Guard topology. Exit codes are important for a service-based process, too: unexpected exits from the Data Guard Broker should return a non-zero value so the service daemon handles notifications, logging, and restarts correctly. The service configuration itself will dictate some of this, too.
The following script is extremely basic. For one thing, it doesn’t query the participants to determine the primary database in the configuration. It’s also using the 11g-friendly “start observer” rather than the newer “start observer in background” command. You’ll want something better than this for “real” implementations, but it’s adequate to demonstrate the core objectives and isn’t cluttered with extras:
For testing, I suggest adding the -debug flag to see additional output that’s helpful for troubleshooting the process:
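For instance, the dgmgrl invocation might become (connection alias is an assumption):

```shell
dgmgrl -debug -silent /@dg_primary "start observer"
```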
The verbose messaging of -debug shows behind-the-scenes activity:
Notice that I’m running a version 19.3 client, but the database is version 21.8. Different client versions are perfectly acceptable, but commands and features are limited to what’s available in the lowest version. I can’t run “start observer in background” from recent client libraries against an 11g database. Likewise, an older client won’t include commands for newer databases.
Configure a Service
NOTE: The following steps for configuring and creating a service to run the Observer must run with sudo or root access.
Adding a service is a matter of creating a service file under /etc/systemd/system. The file name is the service name, and it must end with the .service suffix. There are plenty of ways to write service configurations—they can be fully self-contained or reference separate, modular components in other files. If you’re working in an environment with multiple Data Guard systems, you’ll probably set up a separate service for each, and it makes sense to use a common configuration file.
Remember, root is managing this process, invoking the oracle user to call the startup script, bypassing the user’s normal login process. The service needs to know all about the environment—especially the information normally set by oraenv or the oracle user’s shell login scripts. Configurations typically exist under /etc/sysconfig, and mine looks like this:
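A sketch of such an environment file, with paths matching the client install above (the file name is an assumption; note that systemd EnvironmentFile entries don’t expand variables, so each value is spelled out):

```text
# /etc/sysconfig/observer
ORACLE_HOME=/u01/app/oracle/product/19.0.0/client_1
TNS_ADMIN=/u01/app/oracle/product/19.0.0/client_1/network/admin
PATH=/u01/app/oracle/product/19.0.0/client_1/bin:/usr/local/bin:/usr/bin:/bin
```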
I could pass additional values here, like connection strings, and that would work if there’s only one Data Guard configuration to monitor. If there are more, limiting entries in the configuration file to common, reusable values is better.
Next, I created the service definition:
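A unit file implementing the options discussed below might look like this (the service name, script path, and group are assumptions):

```text
# /etc/systemd/system/observer.service
[Unit]
Description=Data Guard FSFO Observer
After=network.target

[Service]
Type=simple
User=oracle
Group=oinstall
EnvironmentFile=/etc/sysconfig/observer
RemainAfterExit=false
Restart=on-abort
ExecStart=/bin/bash /home/oracle/observer.sh

[Install]
WantedBy=multi-user.target
```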
Some things to notice:
- The After entry in the [Unit] section informs systemd that this service should start after the network service. Attempting Observer connections before the network starts will fail.
- The [Service] section configures the user and group ownership for the service, and references the environment file created earlier. It also defines failure behaviors.
- RemainAfterExit=False ends the service if it exits. Running the start observer command as a foreground process keeps the service “alive.” If the Observer’s dgmgrl process ends, we know it failed.
- Restart manages service restarts. The on-abort option only restarts the service if it sends an “unclean” signal. Alternatives, like on-failure or on-abnormal, handle different conditions. Consult with your systems administrators or read the documentation to determine what’s best for your situation.
- ExecStart runs the script. Because this runs as a non-root user, calling it as a shell command is required.
For this demonstration, I placed my script under the oracle user’s home directory and gave the oracle user rwx permissions. It’s also using a wallet to connect as the SYS user.
Don’t do that.
Remember, this script will be called by root, with elevated permissions, and anyone with access to the script could add nasty, destructive content. The same wallet that allows SYS connections to the Data Guard Broker will work with SQL*Plus, and, provided it doesn’t exit unexpectedly, the service will run whatever’s in the script.
Protect the script in production environments by removing write permissions and locating it in a secure directory!
Start and Test the Service
Now that the service definition and configuration are created, it’s time to enable and start the service. The command sequence may be familiar to anyone who’s worked with services in the past (with the exception of the custom service name):
Even though we added the service files, the daemon doesn’t know about them until it’s refreshed with the daemon-reload command. Only then can you enable and start the newly added service!
After starting the service, I viewed its status. In the output, I see the Observer started message, telling me it’s running:
The Observer is running as a daemon, and should it fail, systemd handles restarts automatically (depending on the configuration in the [Service] section). There’s no need for PID files to record the process ID, no check for an existing Observer before startup, and no dependency on cron to keep things going!
The final test for the configuration is whether it starts automatically when the system reboots:
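For example:

```shell
# Reboot the VM, then verify the service came back on its own
sudo systemctl reboot
# ...after the system is back up:
systemctl status observer.service
```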
During bootup, the service started and called the script to begin observing the Data Guard target!
Taking advantage of native Linux system controls to manage the Observer in Fast-Start Failover configurations is a more durable and reliable method—and, ultimately, simpler—than reproducing similar functionality in cron-monitored scripts!
Observe on!