Oracle Sean

Create a Fast-Start Failover Observer Service on Linux

Oracle Data Guard protects critical database environments with exact, physical standbys, or copies, of production databases. Business continuity and disaster recovery are a matter of switching or failing over to the standby database when the primary system goes offline.

Fast-Start Failover (FSFO) automates these activities. FSFO monitors participants in Data Guard configurations and, when it detects that the primary is unavailable, performs the switch automatically. It’s more responsive and durable than manual intervention. The time it takes for a DBA to acknowledge a page, log in to the environment, assess the situation, and make the decision to switch to the standby represents lost revenue and productivity. As an automated solution, FSFO is more efficient for dealing with multiple databases, too.

The Observer is the critical component in an FSFO solution. It’s responsible for monitoring the environment, detecting events, and triggering the failover. The Observer is really just a Data Guard Broker client session. It connects to the Data Guard topology and reads the status. Ideally, the Observer (or, better still, multiple Observers) is on a dedicated machine, and not located on the primary or standby database host.

A typical Observer setup involves running a script that starts a Data Guard Broker client session, connects to the primary database, and starts the observer. The process runs in the background (and in older versions never returned control). But the Observer is monitoring mission-critical databases, and there should be some intelligence built around it. A lot could go wrong with the Observer. How do you know if it’s still running properly? If it stops, what’s the mechanism for restarting it?

So, we often see scripts with some diagnostic capabilities, called by a cron job, checking the health and activity of the Observer.

This functionality is already built into systemd on Linux systems. The cron daemon itself runs as a systemd service, and systemd’s service architecture has built-in restart features, so why not cut out the middleman and just run the Observer process as a service?

Configure an Observer Host

I’ll demonstrate this setup on Oracle Cloud Infrastructure, using an Always-Free eligible compute instance running Oracle Enterprise Linux 8. The Observer will run from an Oracle 19c Database Client home, but it could just as easily use a full Oracle Database installation. And, while I’m using a 19c client, the examples here will also work with older versions (looking at you, 11g) that predate the “fire-and-forget” “start observer in background” command!

The first step after provisioning the VM is preparing the environment and installing the software. I used the preinstall RPM for this, just as I would for a database installation:

See this content in the original post

Next, I create directories for the Oracle software:

See this content in the original post
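As a rough sketch of this step, assuming an OFA-style tree under /u01 owned by oracle:oinstall (the user and group the preinstall RPM creates). The base path, the client_1 home name, and the helper name make_oracle_dirs are illustrative and not taken from the original listing.

```shell
# Create an OFA-style directory tree for the client software.
# Arguments: base directory, owner as "user:group" (optional).
make_oracle_dirs() {
    base=$1
    owner=${2:-}
    mkdir -p "$base/app/oracle/product/19.0.0/client_1" \
             "$base/app/oraInventory" || return 1
    # Changing ownership needs root, so it is skipped when no owner is given.
    if [ -n "$owner" ]; then
        chown -R "$owner" "$base/app" || return 1
    fi
    chmod -R 775 "$base/app"
}

# Typical use on the Observer host (assumed path, run as root):
#   make_oracle_dirs /u01 oracle:oinstall
```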

Client Software Installation

After switching to the oracle user, I prepared the environment and expanded the installation archive to the ORACLE_HOME:

See this content in the original post

Installing 19c client software on Oracle Enterprise Linux 8 (and particularly on OCI) usually produces some errors:

See this content in the original post

The following workaround—overriding the distribution ID and temporary directories—bypasses the problem:

See this content in the original post
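A minimal sketch of such an override, assuming the errors come from the installer's operating system check and from a default temporary directory it can't use. CV_ASSUME_DISTID is the variable the installer's prerequisite checks read; the value OEL7.8 and the $HOME/tmp location are assumptions to adjust for the environment.

```shell
# Tell the installer's prerequisite checks to treat the host as a release
# that the 19.3 installer recognizes (assumed value).
export CV_ASSUME_DISTID=OEL7.8

# Point temporary files at a directory the oracle user owns (assumed path).
mkdir -p "$HOME/tmp"
export TMP="$HOME/tmp"
export TMPDIR="$HOME/tmp"
```

These exports only affect the current session, so they need to be set in the same shell that launches the installer.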

I also created a response file so I could run the installer package in the background:

See this content in the original post
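A response file for a client installation can be quite short. The sketch below assumes the Administrator install type and an oinstall inventory group; the parameter names follow the client.rsp template shipped with the 19c client, so check them against the copy in your own software. The helper name write_client_rsp and all paths are illustrative.

```shell
# Write a minimal silent-install response file for the 19c client.
# Arguments: output file, ORACLE_BASE, ORACLE_HOME, inventory location.
write_client_rsp() {
    cat > "$1" <<EOF
oracle.install.responseFileVersion=/oracle/install/rspfmt_clientinstall_response_schema_v19.0.0
UNIX_GROUP_NAME=oinstall
INVENTORY_LOCATION=$4
ORACLE_BASE=$2
ORACLE_HOME=$3
oracle.install.client.installType=Administrator
EOF
}

# Example with assumed paths:
#   write_client_rsp "$HOME/client.rsp" /u01/app/oracle \
#       /u01/app/oracle/product/19.0.0/client_1 /u01/app/oraInventory
#   "$ORACLE_HOME/runInstaller" -silent -responseFile "$HOME/client.rsp"
```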

Next, I ran the installer package, using the response file:

See this content in the original post

Configure and Test Networking


I added environment values to the oracle user’s .bashrc file:

See this content in the original post
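The entries are the usual Oracle client settings. A sketch, assuming the client home lives under /u01/app/oracle and the network files stay in the default network/admin directory (both assumptions):

```shell
# Environment for the Oracle client (paths are assumptions).
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=$ORACLE_BASE/product/19.0.0/client_1
export TNS_ADMIN=$ORACLE_HOME/network/admin
export PATH=$ORACLE_HOME/bin:$PATH
# Append any existing library path instead of overwriting it.
export LD_LIBRARY_PATH=$ORACLE_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
```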

After starting a new session, I checked that I could reach the database hosts from the dgmgrl command line using EZConnect:

See this content in the original post

Success!

However, I don’t want to log into the database with a password, particularly not if the connection runs as a Linux service. I created a wallet directory and configured a wallet with the database credentials:

See this content in the original post
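The wallet is built with the mkstore utility from the client home. A sketch of the two calls involved, wrapped in a helper; the function name, wallet directory, and alias are hypothetical, and mkstore is assumed to be on the PATH.

```shell
# Create a password-protected wallet and store one credential in it.
# Arguments: wallet directory, TNS alias, database user.
# mkstore prompts for the wallet password and for the user's password.
create_observer_wallet() {
    wallet_dir=$1
    tns_alias=$2
    db_user=$3
    mkdir -p "$wallet_dir" || return 1
    chmod 700 "$wallet_dir" || return 1
    mkstore -wrl "$wallet_dir" -create || return 1
    mkstore -wrl "$wallet_dir" -createCredential "$tns_alias" "$db_user"
}

# Example with assumed names:
#   create_observer_wallet "$HOME/wallet" dg_primary sys
```

The alias stored in the wallet has to match the TNS alias used to connect, since that is how the client looks up the credential.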

I created an entry in my tnsnames.ora file, and added the wallet information to the sqlnet.ora file:

See this content in the original post
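For reference, a sketch of what those two files typically contain, written by a small helper. The alias, host, service name, port 1521, and wallet directory are placeholders, and write_net_config is a hypothetical name.

```shell
# Append a TNS alias and wallet settings to the client's network files.
# Arguments: TNS_ADMIN directory, alias, host, service name, wallet directory.
write_net_config() {
    cat >> "$1/tnsnames.ora" <<EOF
$2 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = $3)(PORT = 1521))
    (CONNECT_DATA = (SERVER = DEDICATED)(SERVICE_NAME = $4))
  )
EOF
    # WALLET_OVERRIDE lets the wallet credential take the place of a
    # username and password on the command line.
    cat >> "$1/sqlnet.ora" <<EOF
WALLET_LOCATION =
  (SOURCE = (METHOD = FILE)(METHOD_DATA = (DIRECTORY = $5)))
SQLNET.WALLET_OVERRIDE = TRUE
EOF
}

# Example with assumed names:
#   write_net_config "$TNS_ADMIN" dg_primary dbhost1 orcl_svc "$HOME/wallet"
```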

With the networking components in place, I tested a connection through dgmgrl using the wallet and TNS alias:

See this content in the original post

It works!

Script Observer Startup

There are plenty of ways to script out startup of the Observer, depending on the database versions and Data Guard topology. Exit codes are important for a service-based process, too. Unexpected exits from the Data Guard Broker client should return a non-zero value so that the service manager handles notifications, logging, and restarts correctly. The service configuration itself will dictate some of this, too.

The following script is extremely basic. For one thing, it doesn’t query the participants to determine the primary database in the configuration. It’s also using the 11g-friendly “start observer” rather than the newer “start observer in background” command. You’ll want something better than this for “real” implementations, but it’s adequate to demonstrate the core objectives and isn’t cluttered with extras:

See this content in the original post
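A script along these lines might look like the sketch below. It assumes a wallet-backed TNS alias and passes dgmgrl's exit status straight back to the caller; the DG_ALIAS and DGMGRL variables and the dg_primary default are hypothetical names, not the ones from the original listing.

```shell
#!/bin/sh
# Start the Observer in the foreground and hand dgmgrl's exit status back
# to the caller, so a service manager can tell a failure from a clean stop.
# DG_ALIAS and DGMGRL are assumed names; override them from the environment.
start_observer() {
    dg_alias=${DG_ALIAS:-dg_primary}
    dgmgrl_bin=${DGMGRL:-$ORACLE_HOME/bin/dgmgrl}
    # "/@alias" logs in with the wallet credential stored for that alias.
    "$dgmgrl_bin" -silent "/@$dg_alias" "start observer"
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "Observer for $dg_alias exited with status $rc" >&2
    fi
    return "$rc"
}

# In a standalone script, the last line would be:
#   start_observer; exit $?
```

Keeping dgmgrl in the foreground is deliberate here: the script only returns when the Observer session ends, which is what lets the caller notice that it stopped.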

For testing, I suggest adding the -debug flag to see additional output that’s helpful for troubleshooting the process:

See this content in the original post

The verbose messaging of -debug shows behind-the-scenes activity:

See this content in the original post

Notice that I’m running a version 19.3 client, but the database is version 21.8. Different client versions are perfectly acceptable, but commands and features are limited to what’s available in the lowest version. I can’t run “start observer in background” from recent client libraries against an 11g database. Likewise, an older client won’t include commands for newer databases.

Configure a Service


NOTE: The following steps for configuring and creating a service to run the Observer must run with sudo or root access.

Adding a service is a matter of creating a service file under /etc/systemd/system for the service. The file name is the service name, and it must end with the .service suffix. There are plenty of ways to write service configurations—they can be fully self-contained or reference separate, modular components in other files. If you’re working in an environment with multiple Data Guard systems, you’ll probably set up a separate service for each, and it makes sense to use a common configuration file.

Remember, root is managing this process, invoking the oracle user to call the startup script, bypassing the user’s normal login process. The service needs to know all about the environment—especially the information normally set by oraenv or the oracle user’s shell login scripts. Configurations typically exist under /etc/sysconfig, and mine looks like this:

See this content in the original post
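An environment file of this kind is a plain list of KEY=value lines. The sketch below writes one through a helper; the file name dg-observer, the helper name, and the paths are assumptions.

```shell
# Write an environment file for the service.
# systemd reads plain KEY=value lines and does not expand shell variables,
# so every path is written out in full.
# Arguments: output file, ORACLE_HOME.
write_observer_env() {
    cat > "$1" <<EOF
ORACLE_HOME=$2
TNS_ADMIN=$2/network/admin
LD_LIBRARY_PATH=$2/lib
PATH=$2/bin:/usr/local/bin:/usr/bin:/bin
EOF
}

# Example with assumed names (run as root):
#   write_observer_env /etc/sysconfig/dg-observer \
#       /u01/app/oracle/product/19.0.0/client_1
```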

I could pass additional values here, like connection strings, and that would work if there’s only one Data Guard configuration to monitor. If there are more, limiting entries in the configuration file to common, reusable values is better.

Next, I created the service definition:

See this content in the original post
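A service definition with the settings discussed in this section could be generated as in the sketch below. The unit description, the oinstall group, network.target, Type=simple, and the [Install] target are assumptions, as are the helper name and the file paths in the example.

```shell
# Write a unit file for the Observer service.
# Arguments: output file, environment file, start script.
write_observer_unit() {
    cat > "$1" <<EOF
[Unit]
Description=Data Guard Fast-Start Failover Observer
After=network.target

[Service]
Type=simple
User=oracle
Group=oinstall
EnvironmentFile=$2
RemainAfterExit=False
Restart=on-abort
ExecStart=/bin/bash -c '$3'

[Install]
WantedBy=multi-user.target
EOF
}

# Example with assumed names (run as root):
#   write_observer_unit /etc/systemd/system/dg-observer.service \
#       /etc/sysconfig/dg-observer /home/oracle/start_observer.sh
```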

Some things to notice:

  • The After entry in the [Unit] section tells systemd that this service should start after the network service. Attempting Observer connections before the network starts will fail.

  • The [Service] section configures the user and group ownership for the service, and references the environment file created earlier. It also defines failure behaviors.

  • RemainAfterExit=False means the service is considered inactive as soon as its process exits. Running the start observer command as a foreground process keeps the service “alive.” If the Observer’s dgmgrl process ends, we know it failed.

  • Restart manages service restarts. The on-abort option restarts the service only if its process is terminated by an uncaught (“unclean”) signal. Alternatives, like on-failure or on-abnormal, handle different conditions. Consult with your systems administrators or read the documentation to determine what’s best for your situation.

  • ExecStart runs the script. Because this runs as a non-root user, calling it as a shell command is required.

For this demonstration, I placed my script under the oracle user’s home directory and gave the oracle user rwx permissions. It’s also using a wallet to connect as the SYS user.

Don’t do that.

Remember that systemd, running as root, launches this script automatically, and anyone with write access to the script could add nasty, destructive content. The same wallet that allows SYS connections to the Data Guard Broker will work with SQL*Plus, and, provided it doesn’t exit unexpectedly, the service will run whatever’s in the script.

Protect the script in production environments by removing write permissions and locating it in a secure directory!

Start and Test the Service

Now that the service definition and configuration are created, it’s time to enable and start the service. The command sequence may be familiar to anyone who’s worked with services in the past (with the exception of the custom service name):

See this content in the original post
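The sequence is the standard systemctl one. The sketch below prints the commands by default and only executes them when DRY_RUN=0 is set, which makes it safe to try on a machine without the service installed. The service name dg-observer.service and the run helper are assumptions.

```shell
# Print the commands by default; set DRY_RUN=0 to execute them as root.
# The service name is an assumption.
SERVICE=${SERVICE:-dg-observer.service}
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run systemctl daemon-reload          # re-read unit files
run systemctl enable "$SERVICE"      # start at boot
run systemctl start "$SERVICE"       # start now
run systemctl status "$SERVICE" --no-pager
```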

Even though we added the service files, systemd doesn’t know about them until its configuration is refreshed with the daemon-reload command. Only then can you enable and start the newly added service!

After starting the service, I viewed its status. In the output, I see the Observer started message, telling me it’s running:

See this content in the original post

The Observer is running as a daemon, and should it fail, systemd handles restarts automatically (depending on the configuration in the [Service] section). There’s no need for PID files to record the process ID, no check for an existing Observer before startup, and no dependency on cron to keep things going!

The final test for the configuration is whether it starts automatically when the system reboots:

See this content in the original post

During bootup, the service started and called the script to begin observing the Data Guard target!

Taking advantage of native Linux system controls to manage the Observer in Fast-Start Failover configurations is a more durable and reliable method—and, ultimately, simpler—than reproducing similar functionality in cron-monitored scripts!

Observe on!