Sean Scott is an Oracle ACE with over 25 years of experience in Oracle technologies.

Check for Services Not Running on Preferred Nodes

In RAC, services can be “pinned” to specific nodes. This allows DBAs to control activity on individual nodes and even force different types of activity to run on a particular node. For example, in a three-node cluster I might configure a transactional service to hit nodes one and two and a reporting service to connect to node three, and even configure the instances differently to accommodate different types of queries. (It’s a technique that can also be used to reduce cache fusion waits and interconnect traffic by funneling users to nodes based on partition use, but that’s a topic for another day.)

What happens if I need to perform maintenance on node three? When I take node three down, CRS relocates my service to one of the surviving nodes, but when I bring node three back up the service isn’t relocated back. This can lead to load and performance issues, and administrators need to be alerted.

The following script checks services against their preferred instances and reports when there’s a mismatch.

#!/bin/bash

error()
{
  echo "$@" 1>&2
  exit 1
}

usage()
{
  version
  echo "Usage: "
  echo "$PROGRAM [-a -d -D -h -r -v]"
  echo " "
  echo "General Options: "
  echo "  -a                   Report all services. "
  echo "  -d                   Show databases that are down. "
  echo "  -D                   Show disabled services. "
  echo "  -h                   Help. "
  echo "  -r                   Report database status for misaligned services. "
  echo "  -v                   Show version. "
  echo " "
}

usage_and_exit()
{
  usage
  exit $1
}

version()
{
  echo " "
  echo "$PROGRAM version $VERSION"
  echo " "
}

version_and_exit()
{
  version
  exit $1
}

sort_instances()
{
  # Read a comma-delimited list into an array, sort the elements,
  # and recombine them onto a single line
  IFS=',' read -r -a array <<< "$1"
  for i in "${array[@]}"
  do
     echo "$i"
  done | sort | xargs -n"${#array[@]}"
}


# Set up the environment. Define default values here.
PROGRAM=`basename $0`
VERSION=1.0
basename=check_services
err=
report_all=
report_disabled=
report_down=
report_status=
environment=$(hostname -d | cut -d. -f2)

# Get command line arguments:
while getopts "adDhrv" opt
do
  case $opt in
    a                ) report_all=1        ;;
    d                ) report_down=1       ;;
    D                ) report_disabled=1   ;;
    h                ) usage_and_exit 0    ;;
    r                ) report_status=1     ;;
    v                ) version_and_exit 0  ;;
    '?'              ) usage_and_exit 1    ;;
  esac
done

# Find the oratab file:
if [ -f "/etc/oratab" ]
then ORATAB=/etc/oratab
elif [ -f "/var/opt/oracle/oratab" ]
then ORATAB=/var/opt/oracle/oratab
else error "No oratab file found"
fi

# Get the list of ORACLE_HOMEs on the host. Running srvctl from an incompatible home
# can throw an error.
for ORACLE_HOME in $(egrep -v '^$|^#|^\+' $ORATAB | cut -f2 -d: | sort | uniq)
 do

    export ORACLE_HOME=$ORACLE_HOME

    # Get the list of databases running on the host by running srvctl from each ORACLE_HOME
    for ORACLE_SID in $($ORACLE_HOME/bin/srvctl config database)
     do

        # Is there a running instance to match the SID?
        if [ $($ORACLE_HOME/bin/srvctl status database -db $ORACLE_SID | grep -i "is running" | wc -l) -gt 0 ]
        then

             # Get a list of services for each database and their preferred node
             while read service_name service_state node_list
             do

                # Only check for preferred, running instances when there's a configured service
                # and only if the service is enabled (or disabled services were requested with -D).
                if [ -n "$service_name" ] && { [ "$service_state" == "enabled" ] || [ -n "$report_disabled" ]; }
                then

                     # Get the sorted list of preferred instances
                     preferred_instance=$(sort_instances "$node_list")
                     service_status=$($ORACLE_HOME/bin/srvctl status service -db $ORACLE_SID -service $service_name)

                     # Create language to identify disabled services.
                     # service_state holds "enabled" or "disabled" from srvctl config service.
                     if [ "$service_state" == "enabled" ]
                     then disabled_text=""
                     else disabled_text=" disabled and"
                     fi

                     # Check to see if the service is running
                     if echo "$service_status" | grep -qi "is not running"
                     then echo "Service $service_name on database $ORACLE_SID is$disabled_text not running. It is configured to run on instance(s) $preferred_instance"
                          err=1

                     # Get the sorted list of running instances
                     else running_instance=$(sort_instances $(echo $service_status | awk '{print $NF}'))

                          # Compare the preferred and running instance lists
                          if [ ! "$preferred_instance" = "$running_instance" ]
                          then echo "Service $service_name on database $ORACLE_SID is running on instance(s) $running_instance. It is configured to run on instance(s) $preferred_instance"
                               err=1
                          else
                               if [ ! -z "$report_all" ]
                               then echo "Service $service_name on database $ORACLE_SID is running as configured on instance(s) $running_instance."
                               fi
                          fi
                     fi
                fi

             done <<< "$($ORACLE_HOME/bin/srvctl config service -db $ORACLE_SID | egrep -i "^service name|^preferred instances|^Service is" | xargs -n3 -d'\n' | awk '{print $3, $6, $NF}')"

        # The instance is not running:
        else
             if [ ! -z "$report_down" ]
             then echo "Database $ORACLE_SID is not running."
             fi
        fi

        # If there was a misaligned service and status reporting is enabled, show database status
        if [ ! -z "$err" -a ! -z "$report_status" ]
        then $ORACLE_HOME/bin/srvctl status database -db $ORACLE_SID
             echo " "
             err=
        fi

    done # End of ORACLE_SID loop

done # End of ORACLE_HOME loop

Breaking it down

sort_instances function

This takes a (potentially) comma-delimited list of instances, passes it to an array, sorts the elements and then recombines them onto a single line. This is needed because a service can have multiple preferred nodes and they may not be listed in the same order as the running nodes. Sorting both lists allows the script to compare them even when a service has multiple instances.
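To see why the sort matters, here is a minimal standalone sketch. The helper name sort_csv and the instance names are made up for illustration; it mirrors the idea rather than reproducing the script’s function exactly.

```shell
#!/bin/bash

# Hypothetical helper: sort a comma-delimited list and print it on one line.
sort_csv()
{
  local -a items
  IFS=',' read -r -a items <<< "$1"
  printf '%s\n' "${items[@]}" | sort | xargs
}

# Same two instances, listed in a different order (invented names).
preferred=$(sort_csv "orcl2,orcl1")
running=$(sort_csv "orcl1,orcl2")

if [ "$preferred" = "$running" ]
then echo "match: $running"
else echo "mismatch: preferred=$preferred running=$running"
fi
```

Without the sort, the two strings would differ and the service would be flagged even though it is running exactly where it should.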

Find the oratab file

This script will run without needing any environment settings. It gets the instance name(s) and ORACLE_HOME value(s) needed from the oratab file.
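As a rough illustration of the oratab parsing, the sketch below runs the same kind of filter against a throwaway file. The SIDs and paths are invented for the example; only the SID:ORACLE_HOME:flag layout is taken from the oratab format.

```shell
#!/bin/bash

# Build a temporary file in oratab format (SID:ORACLE_HOME:startup flag).
# Every SID and path below is made up for this example.
sample_oratab=$(mktemp)
cat > "$sample_oratab" <<'EOF'
# comment line
+ASM1:/u01/app/19.0.0/grid:N

orcl1:/u01/app/oracle/product/12.2.0/dbhome_1:N
test1:/u01/app/oracle/product/12.1.0/dbhome_1:N
dev1:/u01/app/oracle/product/12.2.0/dbhome_1:N
EOF

# Skip blank lines, comments and ASM entries, then keep each home once.
homes=$(grep -Ev '^$|^#|^\+' "$sample_oratab" | cut -f2 -d: | sort -u)
echo "$homes"

rm -f "$sample_oratab"
```

Three database entries share two homes here, so the loop over homes would run twice rather than three times.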

ORACLE_HOME loop

The output of srvctl may be inaccurate or produce an error if a host has more than one ORACLE_HOME, particularly if they’re different versions. This section identifies each unique home for non-ASM databases.

ORACLE_HOME needs to be set in the environment for srvctl to run properly.

ORACLE_SID loop

I opted to use srvctl config database to generate a database list because it produces the database name needed to run additional srvctl commands, without having to connect to the database.

Services Loop

The while loop here may not be intuitive. It uses a here string (<<<) to pass three values to the loop:

<<< "$($ORACLE_HOME/bin/srvctl config service -db $ORACLE_SID | egrep -i "^service name|^preferred instances|^Service is" | xargs -n3 -d'\n' | awk '{print $3, $6, $NF}')"

This gets the list of services, preferred instances, and service status and coalesces them into a single line, passing the service name, status, and preferred instance list to the loop.
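The pipeline is easier to follow against sample text. The sketch below feeds it an assumed, simplified stand-in for the srvctl output; real output has more lines and varies by version, and the service and instance names are invented. It also relies on the -d option of GNU xargs.

```shell
#!/bin/bash

# Assumed, simplified stand-in for "srvctl config service" output.
# Real output contains more lines and differs between versions.
sample_config()
{
  cat <<'EOF'
Service name: oltp
Service is enabled
Cardinality: 2
Preferred instances: orcl1,orcl2
Service name: reports
Service is disabled
Cardinality: 1
Preferred instances: orcl3
EOF
}

# Keep three lines per service, join each group of three into one line,
# then pick the service name, state and preferred instance list.
parsed=$(sample_config \
  | grep -Ei "^service name|^preferred instances|^Service is" \
  | xargs -n3 -d'\n' \
  | awk '{print $3, $6, $NF}')

while read -r service_name service_state node_list
do
  echo "name=$service_name state=$service_state preferred=$node_list"
done <<< "$parsed"
```

Each service ends up as one line of three fields, which is what lets read split it into three variables.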

If you’re wondering why I’m doing it this way, versus piping the output into the while loop: I want to capture a flag that identifies when a service isn’t running on a preferred instance, and potentially provide more information on the instances. I may have multiple misaligned services, but I don’t want to produce the database state for each one; instead the script prints the database status once, afterward. Piping the output into the while loop runs the loop in a subshell, so the value of the flag variable, err, is lost when the loop exits. Using a here string keeps the loop in the current shell and preserves the variable.
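The difference is easy to demonstrate in isolation. This is a small sketch with made-up variable names (flag_pipe, flag_here), assuming bash with its default settings (lastpipe not enabled):

```shell
#!/bin/bash

# Pipe version: the loop runs in a subshell, so the flag set inside is lost.
flag_pipe=
printf 'a\nb\n' | while read -r line
do
  flag_pipe=1
done

# <<< version: the loop runs in the current shell, so the flag survives.
flag_here=
while read -r line
do
  flag_here=1
done <<< "$(printf 'a\nb\n')"

echo "after pipe: flag='$flag_pipe'"
echo "after <<<:  flag='$flag_here'"
```

The first flag comes back empty and the second comes back set, which is the behavior the script depends on for err.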

The remainder of the script should be pretty straightforward. It gets and sorts the instances where a service is running and compares it to the sorted list of preferred instances.

Flags

Output is controllable via a handful of flags.

  • -a shows all services, whether they’re “right” or “wrong”. The default is to show only services that aren’t running where expected.

  • -d reports databases that are down. The default is to skip databases that are down.

  • -D includes disabled services in the checks. The default skips services marked as disabled, whether running or not.

  • -r displays the database status if any of its services are misconfigured. The default is to show only the service information.

I hope you’ll find this useful! I’ve tested this against 12.1 and 12.2 databases. Let me know in the comments if you have ideas for enhancements or encounter issues on different database versions.
