Database mirroring is a fantastic feature. It allows for a high safety and performance-capable copy of a database in another physical location. This feature provides a level of failure tolerance on the database which flows to the disk level. Of course, there is a specific failure point still remaining, which Availability Groups provides – server-level failure tolerance. With the combination of mirroring technology and Windows Server Failover Clustering, a level of availability is reached far beyond what mirroring alone provided. Further diving into availability levels, Failover Clustered Instances in conjunction with Windows Server Failover Clustering and Availability Groups, provides a level that is truly a five nines achievable metric.
Secondary Processes
With clustering, there has always been the hardship or complication of ensuring all critical tasks or underlying processing is configured to continue in the event of a failure on the primary node. Many times, great lengths are taken to ensure these tasks continue to run on the secondary instances. The SQL Server Agent and Jobs are a primary example. Since system databases are not available in available solutions as described earlier, the objects we retain in them, like the MSDB and Agent jobs data, has to take on a different process to ensure they continue on schedule when a primary node is lost.
The task of synchronizing the jobs is among the first steps that need to be taken. This can be automated with SMO and other scripting efforts fairly easily. If the task is found to be a manual effort, Jonathan Kehayias of SQLskills has gone to great lengths to provide an add-in to simplify the task. Once the jobs are synchronized between replicas, the problem still persists that, in the case of a failure on the primary node, the jobs on the secondary have to be enabled to start functioning. To further complicate the situation, some jobs might not function the same and should stay disabled.
Automate Enabling Jobs on Failure with Mirroring
In order to manage SQL Server Agent Jobs in the event of a failure, a customized solution can be implemented. This is only one of many techniques that can be implemented, but this technique has been proven in real failover situations and can be a highly recommended option when developing solutions.
As with any customized process, the process is only as good as the effort put into designing stages. Understanding and identifying the flow is a part of the design that should not be avoided. In the task implemented in this case, the flow is based on a monitoring job and has two fundamental checks; check if the mirror state has changed on the mirror, and check if the mirror state has changed from a principal back to a mirror. Both of these steps indicate if jobs should be enabled or disabled.
In order to maintain a result set check and an efficient way of validating the flow, a metadata table is created in a database designed to retain administrative objects. The table creation statement is shown below.
CREATE TABLE [dbo].[Mirroring_Jobs](
[JobName] [nvarchar](128) NOT NULL,
[RunStatus] [tinyint] NULL,
[Active] [bit] NULL,
[DatabaseName] [nvarchar](128) NULL,
[Role] [nvarchar](10) NULL,
CONSTRAINT [PK_Mirroring_Jobs] PRIMARY KEY CLUSTERED
(
[JobName] ASC
)
The above table is designed to manage each row as a job. Each job has the settings capable of dictating if the job is currently active on the primary instance (Active), if the current status of the job is enabled on the primary (RunStatus), if the job in this table and where the administrative database is located is on the mirror or principal (Role) and then the database the job executes within the context of.
A populated view of this table is shown below, containing three jobs – Full Backup, Differential Backup and Log Backup.
In the setup so far, we are creating a table that retains job information on the mirror only. This is protection in the event of a principal is in failure, high safety is set or a manual failover is initiated, and the mirror becomes the principal.
This is illustrated below with the left server acting as the principal and the right, mirror, retaining the table Mirroring_Jobs. Each server has an active SQL Server Agent, however, the three jobs in the Mirroring_Jobs table are enabled on the principal and disabled on the mirror. This is due to the jobs entering a failure state if they attempted to back up the mirror databases while they are in synchronized state or recovering.
In order to actively monitor the mirror state of this session, another job is created to run every 10 seconds. Note: base the interval to run the monitoring job based on the interval in which the heartbeat between the mirror and principal has been set at. The monitoring job should be created on the witness server or, at least, the mirror instance. The monitoring job should never exist on the principal as it will fail and potentially not initiate the task of enabling the jobs on the mirror.
In order to maintain an object that is easily updated and retained in the administrative database, a procedure is created that is executed from the monitoring job. The procedure has the same process as the flow chart earlier discussed. The primary task of the procedure is to first check if the state of the mirroring session has changed for the mirror databases. If it has changed and the roles are set for mirror, the jobs will be enabled by calling the sp_update_job procedure in the MSDB database. If the roles are set as mirror and the databases are found to be in a mirroring state and the jobs are enabled, the jobs will be set to disabled. This will protect against a mirror failover being reset back to the originating principal.
CREATE PROCEDURE [dbo].[dba_CheckMirrorForJobs]
AS
DECLARE @Loop INT = 1
DECLARE @CMD NVARCHAR(1500) = ''
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT 1)) ROWID,
DB_NAME(a.database_id) dbname, a.mirroring_role_desc, a.mirroring_state_desc,
b.jobName, b.RunStatus, b.Active, b.DatabaseName, b.[Role]
INTO #Review
FROM sys.database_mirroring a
JOIN Mirroring_Jobs b on DB_NAME(a.database_id) = b.DatabaseName
WHERE mirroring_guid IS NOT NULL
IF EXISTS(SELECT 1 FROM #Review WHERE mirroring_role_desc = 'PRINCIPAL' AND [Role] = 'MIRROR' AND RunStatus = 1 AND Active = 1)
BEGIN
WHILE (@Loop <= (SELECT COUNT(*) FROM #Review WHERE mirroring_role_desc = 'PRINCIPAL' AND [Role] = 'MIRROR' AND RunStatus = 1 AND Active = 1))
BEGIN
SET @CMD = 'EXEC msdb.dbo.sp_update_job @job_name = N''' + (SELECT jobName FROM #Review WHERE ROWID = @Loop) + ''',@enabled = 1'
EXEC (@CMD)
SET @Loop += 1
END
END
SET @Loop = 1
IF EXISTS(SELECT 1 FROM #Review WHERE mirroring_role_desc = 'MIRROR' AND [Role] = 'MIRROR' AND RunStatus = 1 AND Active = 1)
BEGIN
WHILE (@Loop <= (SELECT COUNT(*) FROM #Review WHERE mirroring_role_desc = 'MIRROR' AND [Role] = 'MIRROR' AND RunStatus = 1 AND Active = 1))
BEGIN
IF EXISTS (SELECT 1 FROM msdb..sysjobs WHERE name = (SELECT jobName FROM #Review WHERE ROWID = @Loop) AND enabled = 1)
BEGIN
SET @CMD = 'EXEC msdb.dbo.sp_update_job @job_name = N''' + (SELECT jobName FROM #Review WHERE ROWID = @Loop) + ''',@enabled = 0'
EXEC (@CMD)
END
SET @Loop += 1
END
END
DROP TABLE #Review
As we can see, the coding behind this procedure is easy to follow and has two primary logical conditions behind it – disable or enable based on the state.
Testing a failover
Following the design described up until now will implement a solution that will enable and disable jobs on a mirror database server based on the state of the mirroring session. The next critical step in this design is to test by initiating a failover.
Below is the current state of a database server acting as a mirror
As shown, the MIR_TEST database is actively the mirror and synchronized from another server. The Mirroring_Jobs table has been set so the three jobs shown are set as the mirror.
We can also review the jobs and the state of them being disabled by querying the MSDB..SYSJOBS table
To test this solution, initiating a failover from the principal to the mirror should be performed. To do this, we can use the mirroring properties page in the database properties page of the principal
Once the failover has been initiated, we can query the mirror server to review the state of the mirror as well as the state of the jobs.
As shown, the jobs were enabled and we’ve successfully retained the job’s functionality.
Summary
As we’ve shown, with a small amount of effort and utilization of the views and capability of monitoring the state of a mirroring session or replica state, we can actively make decisions on when to enable or disable jobs that are on other servers joined in a protective session such as mirroring or Availability Groups. This method allows for another level of availability that allows us to achieve goals that should be set on how protected databases are while retained on SQL Server.