Operating system and database maintenance

Maintaining the operating system and database infrastructure that supports Bravura Security Fabric requires planned outages and careful coordination, particularly in replicated environments. This section covers scheduling, preparation, and recovery procedures for both database maintenance and operating system patching.

Handling maintenance outages

Maintaining a Bravura Security Fabric instance involves processes that require service outages. In replicated environments, unplanned or poorly coordinated outages can lead to data loss or configuration desynchronization.

Outages may be needed when:

Applying operating system updates on an application node or database node.
Upgrading the backend database engine.
Running database maintenance such as index rebuilding or data defragmentation.
Performing manual database propagation.
Recovering from database or network failures.

Depending on the type of maintenance, you can perform either a single-node outage or a full instance outage.

Outage types

A single-node outage takes one application node offline while other nodes continue to serve user requests. Use this for routine maintenance such as OS patching or database index rebuilding on one node at a time.

A full instance outage takes all nodes offline simultaneously. Use this for operations that require all nodes to be in the same state, such as database version upgrades or manual data resynchronization.

Multi-node considerations

Bravura Security Fabric provides application-level data and configuration replication, allowing one node to be taken down while others continue to respond to user requests. When a node is offline, the remaining nodes queue changes in their respective db\replication directories.
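While a node is offline, it can be useful to watch how much data accumulates in the db\replication directory on the remaining nodes. The following is a minimal sketch, not a product feature: the function name Get-DirectorySizeBytes is hypothetical, and the full path to db\replication depends on where the instance is installed, so it is passed in as a parameter. The Database replication page remains the authoritative view of queue state.

```powershell
# Hypothetical helper: total size in bytes of all files under a directory.
# Pass the instance's db\replication directory as -Path (its location is
# installation-specific and is not assumed here).
function Get-DirectorySizeBytes {
    param(
        [Parameter(Mandatory)] [string] $Path
    )
    # Sum the Length of every file below $Path, including subdirectories.
    $sum = (Get-ChildItem -LiteralPath $Path -Recurse -File -ErrorAction Stop |
        Measure-Object -Property Length -Sum).Sum
    # Measure-Object yields no sum when the directory holds no files.
    if ($null -eq $sum) { return [long] 0 }
    return [long] $sum
}
```

Running the function periodically and comparing the results gives a rough indication of whether the queue is growing or shrinking.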

Warning

When free disk space on the partition hosting the replication queues drops to 10%, the product stops responding to user requests. Administrators with the "Configure replication" privilege can adjust the replication thresholds.
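Before a long maintenance window, it may be worth checking how close the partition is to this threshold. The sketch below is illustrative only: Test-ReplicationDiskLow is a hypothetical helper, and the 10% default simply mirrors the value in the warning above, which may have been changed in your environment.

```powershell
# Hypothetical helper: returns $true when free space is at or below the
# threshold percentage. The default of 10 mirrors the documented value;
# adjust it if the replication thresholds have been changed.
function Test-ReplicationDiskLow {
    param(
        [Parameter(Mandatory)] [double] $FreeBytes,
        [Parameter(Mandatory)] [double] $TotalBytes,
        [double] $ThresholdPercent = 10
    )
    if ($TotalBytes -le 0) { throw "TotalBytes must be greater than zero." }
    $freePercent = 100 * $FreeBytes / $TotalBytes
    return ($freePercent -le $ThresholdPercent)
}
```

On Windows, the inputs can be taken from `Get-CimInstance Win32_LogicalDisk`, whose `FreeSpace` and `Size` properties report bytes for each drive. Which drive hosts the replication queues depends on the installation.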

When you upgrade the MSSQL service, the Database service (iddb) warns about differing database versions on the Database replication page and in idmsuite.log until the databases on all replicated nodes have been upgraded.

After performing maintenance and bringing a node back online, navigate to Manage the system > Maintenance > Database replication and verify that the replication queue is empty between the restored node and all other nodes.

Perform a single-node outage

Use this procedure to take a single application node offline for maintenance. Other topics in this section reference these steps.

1. Disable scheduled jobs. Navigate to Manage the system > Maintenance > Scheduled jobs and disable any jobs scheduled to run during the maintenance window. Also disable Bravura Security tasks in the OS task scheduler.
   If any jobs have already started, wait for them to complete or stop them safely.
2. Remove the node from the load balancer. Use the proxy service or DNS resolver to forward requests to a static maintenance page on another web server. Otherwise, users who try to access the product web interface see a "service unavailable" error.
3. Wait for outgoing iddb replication queues to subside.
4. Stop the iddb service and all other Bravura Security Fabric services. To automate this in PowerShell:
```
# Stop IIS (w3svc) and every service whose name ends in _<instancename>.
Get-Service w3svc, *_<instancename> | ForEach-Object { C:\Windows\System32\sc.exe stop $_.Name }
```
5. Back up the server (snapshot for VMs, disk image for bare metal) before making any configuration changes. This provides a rollback path if something goes wrong.
   If upgrading MSSQL to a newer version, note that the target version may not be officially supported by Bravura Security. The database server may still work in compatibility mode. Test in a separate environment before applying in production. See Installing database and database client software for supported database versions.
6. Perform the required maintenance.
7. Restore services in reverse order:
   1. Start iddb, the remaining non-disabled Bravura Security Fabric services, and IIS, then re-enable the Bravura Security tasks in the OS task scheduler. To start the services in PowerShell:
```
# Start the instance services first, then IIS (w3svc).
Get-Service *_<instancename>, w3svc | ForEach-Object { C:\Windows\System32\sc.exe start $_.Name }
```
   2. Monitor idmsuite.log as the services (especially iddb) start up. Address any errors before continuing.
   3. Re-add the node to the load balancer.
   4. Re-enable any scheduled jobs that were disabled in step 1.
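The stop command in this procedure returns as soon as each stop request is issued, so services may still be shutting down when it finishes. Before backing up the server, it may help to confirm that every service has actually stopped. The following is a sketch under assumptions: Wait-InstanceServicesStopped is a hypothetical helper, the instance name must be supplied in place of <instancename>, and the 120-second timeout is arbitrary.

```powershell
# Hypothetical helper: wait until IIS (w3svc) and all services whose names end
# in _<InstanceName> report Stopped, then list their final status.
function Wait-InstanceServicesStopped {
    param(
        [Parameter(Mandatory)] [string] $InstanceName,  # replaces <instancename>
        [int] $TimeoutSeconds = 120                      # arbitrary; adjust as needed
    )
    $timeout = [TimeSpan]::FromSeconds($TimeoutSeconds)
    $services = Get-Service -Name 'w3svc', "*_$InstanceName" -ErrorAction SilentlyContinue
    foreach ($svc in $services) {
        # WaitForStatus throws a timeout exception if the service has not
        # reached the requested state within the given time span.
        $svc.WaitForStatus('Stopped', $timeout)
        [pscustomobject]@{ Name = $svc.Name; Status = $svc.Status }
    }
}
```

This helper only waits and reports. It does not stop anything itself, so run it after issuing the stop command.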

Perform a full instance outage

Use this procedure when all nodes must be offline simultaneously, such as for a database version upgrade.

1. On all nodes, disable scheduled jobs and remove the nodes from the load balancer (steps 1-2 of the single-node outage procedure).
2. Wait for outgoing iddb replication queues to subside on all nodes.
3. Stop iddb and all other Bravura Security Fabric services on all nodes. Back up each server.
4. Perform the required maintenance.
5. Restore services on all nodes in reverse order (step 7 of the single-node outage procedure).
6. Navigate to Manage the system > Maintenance > Database replication on each node and verify that the replication queue is empty between all nodes.
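During a full instance outage, a quick way to see service state across all nodes is to query them from one administrative workstation. The sketch below assumes that PowerShell remoting is enabled on every node and that the account running it has administrative rights there. Neither is stated in this documentation. Get-InstanceServiceStatus is a hypothetical helper, and the node names are supplied by the caller.

```powershell
# Hypothetical helper: report the status of IIS (w3svc) and the instance
# services on each node. Assumes PowerShell remoting is enabled on the nodes.
function Get-InstanceServiceStatus {
    param(
        [Parameter(Mandatory)] [string[]] $Nodes,        # host names of the nodes
        [Parameter(Mandatory)] [string]   $InstanceName  # replaces <instancename>
    )
    Invoke-Command -ComputerName $Nodes -ArgumentList $InstanceName -ScriptBlock {
        param($name)
        Get-Service -Name 'w3svc', "*_$name" -ErrorAction SilentlyContinue |
            Select-Object Name, Status
    } | Sort-Object PSComputerName, Name |
        Select-Object PSComputerName, Name, Status
}
```

Before maintenance, every row should show Stopped. After services are restored, the same call shows whether each expected service is running again.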

See also

Database maintenance
Operating system patch management
Replication and Recovery for detailed information on restoring nodes in a replicated environment.
