Operating system and database maintenance
Maintaining the operating system and database infrastructure that supports Bravura Security Fabric requires planned outages and careful coordination, particularly in replicated environments. This section covers scheduling, preparation, and recovery procedures for both database maintenance and operating system patching.
Handling maintenance outages
Maintaining a Bravura Security Fabric instance involves processes that require service outages. In replicated environments, unplanned or poorly coordinated outages can lead to data loss or configuration desynchronization.
Outages may be needed when:
Applying operating system updates on an application node or database node.
Upgrading the backend database engine.
Running database maintenance such as index rebuilding or data defragmentation.
Performing manual database propagation.
Recovering from database or network failures.
Depending on the type of maintenance, you can perform either a single-node outage or a full instance outage.
Outage types
A single-node outage takes one application node offline while other nodes continue to serve user requests. Use this for routine maintenance such as OS patching or database index rebuilding on one node at a time.
A full instance outage takes all nodes offline simultaneously. Use this for operations that require all nodes to be in the same state, such as database version upgrades or manual data resynchronization.
Multi-node considerations
Bravura Security Fabric provides application-level data and configuration replication, allowing one node to be taken down while others continue to respond to user requests. When a node is offline, the remaining nodes queue changes in their respective db\replication directories.
Warning
When free disk space on the partition hosting the replication queues drops to 10%, the product stops responding to user requests. Administrators with the "Configure replication" privilege can adjust the replication thresholds.
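Because queued changes accumulate while a node is offline, it can help to check free space on the queue partition before and during a maintenance window. The following is a minimal sketch, not a product feature: the function name Test-QueueDiskSpace and the 20% early-warning margin are hypothetical, and the path you pass in must be the actual location of the instance's db\replication directory. It reads capacity from the PowerShell drive that hosts the path, so it may misreport volumes attached as mount points or UNC shares.

```powershell
function Test-QueueDiskSpace {
    param(
        [Parameter(Mandatory)][string]$Path,
        # Hypothetical early-warning margin, deliberately above the 10% stop point.
        [double]$WarnPercent = 20
    )
    # Resolve the PowerShell drive that hosts the path (for example C: or D:).
    $drive = (Get-Item -LiteralPath $Path -ErrorAction Stop).PSDrive
    $total = $drive.Used + $drive.Free
    if ($total -le 0) {
        throw "Could not read capacity for drive '$($drive.Name)'."
    }
    $freePercent = [math]::Round(100 * $drive.Free / $total, 1)
    # Return a small report object rather than printing, so callers can act on it.
    [pscustomobject]@{
        Drive       = $drive.Name
        FreePercent = $freePercent
        BelowMargin = ($freePercent -le $WarnPercent)
    }
}
```

Call the function with the queue directory as -Path and treat BelowMargin as a prompt to shorten the outage or free up space; the thresholds that the product itself enforces are still the ones configured in the replication settings.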
If you are upgrading the MSSQL service, note that the Database service (iddb) warns about mismatched database versions, both on the page and in idmsuite.log, until the databases on all replicated nodes have been upgraded.
After performing maintenance and bringing a node back online, navigate to Manage the system > Maintenance > Database replication and verify that the replication queue is empty between the restored node and all other nodes.
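As a supplement to the Database replication page, the size of a node's db\replication directory can be sampled from PowerShell. This is only a sketch under an assumption: that queued changes appear as files under that directory, so that a shrinking file count and byte total suggest the queue is draining. The function name Get-QueueBacklog is hypothetical, and the Database replication page remains the authoritative check.

```powershell
function Get-QueueBacklog {
    param([Parameter(Mandatory)][string]$Path)
    # Collect every file under the queue directory; @() guarantees an array
    # even when zero or one file is found.
    $files = @(Get-ChildItem -LiteralPath $Path -Recurse -File -ErrorAction Stop)
    [long]$bytes = 0
    foreach ($file in $files) {
        $bytes += $file.Length
    }
    # Assumption: file count and total size approximate the pending backlog.
    [pscustomobject]@{
        FileCount  = $files.Count
        TotalBytes = $bytes
    }
}
```

Running the function every few minutes on each remaining node while another node is offline gives a rough trend, which can be read alongside the free-space warning above.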
Perform a single-node outage
Use this procedure to take a single application node offline for maintenance. Other topics in this section reference these steps.
1. Disable scheduled jobs. Navigate to Manage the system > Maintenance > Scheduled jobs and disable any jobs scheduled to run during the maintenance window. Also disable Bravura Security tasks in the OS task scheduler.
   If any jobs have already started, wait for them to complete or close them safely.
2. Remove the node from the load balancer. Work with the proxy service or DNS resolver to forward requests to a static maintenance page on another web server. Otherwise, users see a "service unavailable" error if they attempt to access the product web interface.
3. Wait for outgoing iddb replication queues to subside.
4. Stop the iddb service and all other Bravura Security Fabric services. To automate this in PowerShell:
   gsv w3svc,*_<instancename> | % {C:\Windows\System32\sc stop $_.Name}
5. Back up the server (snapshot for VMs, disk image for bare metal) before making any configuration changes. This provides a rollback path if something goes wrong.
   If upgrading MSSQL to a newer version, note that the target version may not be officially supported by Bravura Security. The database server may still work in compatibility mode. Test in a separate environment before applying in production. See Installing database and database client software for supported database versions.
6. Perform the required maintenance.
7. Restore services in reverse order:
   a. Start iddb and the remaining non-disabled Bravura Security Fabric services, IIS, and scheduled tasks:
      gsv *_<instancename>,w3svc | % {C:\Windows\System32\sc start $_.Name}
   b. Monitor idmsuite.log as the services (especially iddb) start up. Address any errors before continuing.
   c. Re-add the node to the load balancer.
   d. Re-enable any scheduled jobs that were disabled in step 1.
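The stop and start commands in the procedure above return as soon as each request is issued. Where a script needs to block until every service has actually changed state, something like the following sketch may be useful. It is not part of the product: the function names and the 120-second timeout are hypothetical. It keeps the same ordering as the one-liners (IIS first on stop, last on start), skips disabled services on start, and relies on the StartType property, which is available on Get-Service output in Windows PowerShell 5.x and later.

```powershell
function Get-ServicePatterns {
    param(
        [Parameter(Mandatory)][string]$InstanceName,
        [Parameter(Mandatory)][ValidateSet('Stop', 'Start')][string]$Action
    )
    $instancePattern = "*_$InstanceName"
    # Stop: IIS first, then instance services. Start: the reverse order.
    if ($Action -eq 'Stop') { @('w3svc', $instancePattern) }
    else { @($instancePattern, 'w3svc') }
}

function Invoke-InstanceServiceAction {
    param(
        [Parameter(Mandatory)][string]$InstanceName,
        [Parameter(Mandatory)][ValidateSet('Stop', 'Start')][string]$Action,
        [int]$TimeoutSeconds = 120   # hypothetical per-service wait limit
    )
    $timeout = [TimeSpan]::FromSeconds($TimeoutSeconds)
    $patterns = Get-ServicePatterns -InstanceName $InstanceName -Action $Action
    foreach ($pattern in $patterns) {
        $services = @(Get-Service -Name $pattern)
        foreach ($svc in $services) {
            if ($Action -eq 'Stop') {
                if ($svc.Status -eq 'Stopped') { continue }
                Stop-Service -InputObject $svc
                # Throws a timeout exception if the service does not stop in time.
                $svc.WaitForStatus('Stopped', $timeout)
            }
            else {
                # Leave disabled and already-running services alone.
                if ($svc.StartType -eq 'Disabled' -or $svc.Status -eq 'Running') { continue }
                Start-Service -InputObject $svc
                $svc.WaitForStatus('Running', $timeout)
            }
        }
    }
}
```

Run Invoke-InstanceServiceAction with -Action Stop before maintenance and -Action Start afterwards, from an elevated session, passing the same instance name used in the one-liners. A timeout exception is a signal to inspect idmsuite.log before continuing.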
Perform a full instance outage
Use this procedure when all nodes must be offline simultaneously, such as for a database version upgrade.
1. On all nodes, disable scheduled jobs and remove the nodes from the load balancer (steps 1-2 of the single-node outage procedure).
2. Wait for outgoing iddb replication queues to subside on all nodes.
3. Stop iddb and all other Bravura Security Fabric services on all nodes. Back up each server.
4. Perform the required maintenance.
5. Restore services on all nodes in reverse order (step 7 of the single-node outage procedure).
6. Navigate to Manage the system > Maintenance > Database replication on each node and verify that the replication queue is empty between all nodes.
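Before starting maintenance in a full instance outage, it can be worth confirming that services are stopped on every node, not just the one you are logged in to. The sketch below assumes that PowerShell remoting is enabled between the nodes, which the product does not require or configure. The function names are hypothetical, and any node names you pass in are your own.

```powershell
function Get-NodeServiceStatus {
    param(
        [Parameter(Mandatory)][string[]]$ComputerName,
        [Parameter(Mandatory)][string]$InstanceName
    )
    # Query IIS and the instance services on each node over PowerShell remoting.
    Invoke-Command -ComputerName $ComputerName -ArgumentList $InstanceName -ScriptBlock {
        param($name)
        Get-Service -Name 'w3svc', "*_$name" -ErrorAction SilentlyContinue |
            ForEach-Object {
                # Convert the status to a string so it compares reliably
                # after remoting serialization.
                [pscustomobject]@{ Name = $_.Name; Status = $_.Status.ToString() }
            }
    } | Select-Object PSComputerName, Name, Status
}

function Select-NotStopped {
    param([Parameter(ValueFromPipeline)]$ServiceInfo)
    process {
        # Pass through only the services that are not yet stopped.
        if ($ServiceInfo.Status -ne 'Stopped') { $ServiceInfo }
    }
}
```

Piping the output of Get-NodeServiceStatus into Select-NotStopped lists any service that is still running on any node; an empty result suggests the instance is fully offline and maintenance can begin.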
See also
Replication and Recovery for detailed information on restoring nodes in a replicated environment.