JetStream DR Alarm Triggers
Alarms are notifications that are activated in response to an event, a set of conditions, or the state of an inventory object. An alarm definition consists of the following elements in the vSphere Client:
Name and description – Provides an identifying label and description.
Targets – Defines the type of object that is monitored.
Alarm Rules – Defines the event, condition, or state that triggers the alarm and defines the notification severity. It also defines operations that occur in response to triggered alarms.
Last modified – The last modified date and time of the defined alarm.
JetStream DR alarms have the following severity levels:
Info/Normal (or unknown)
Warning
Error/Critical
Alarm definitions are associated with the object selected in the inventory. An alarm monitors the type of inventory objects specified in its definition.
List of JetStream DR Alarms
The following list describes conditions and methods that trigger various JetStream DR alarms, which can be helpful for troubleshooting and testing purposes.
“State” events (unmarked) and “email” events serve the same purpose but are reported differently by the system. State events are displayed as a banner announcement in vCenter, while email events appear under the Monitoring tab and trigger an email notification, if configured.
DRVA Restarted
Reboot the DRVA VM or restart the DRVA service.
An alarm will be triggered and can be viewed from DRVA VM > Monitor > Events.
DRVA High CPU Usage Duration Exceeded
In preparation, configure the DRVA with a minimum of 4 CPUs and 8 GB of memory.
Open the DRVA console and enable SSHD service.
Use the top command to monitor system processes and CPU usage.
Create multiple (duplicate) SSH sessions running the following command in each:
cat /dev/zero > /dev/null
This generates a continuous load on the system: the command reads an endless stream of zero bytes from /dev/zero and discards them to /dev/null, which keeps the CPU busy.
You may need to run this command in 15 to 20 SSH sessions. The alarm should trigger after about 15 minutes with CPU usage above 90%.
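As an alternative to opening many SSH sessions by hand, the same load can be started from a single session with background jobs. The following is a minimal sketch, not a JetStream DR tool: the function names start_load and stop_load and the worker count in the example are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helpers (not part of JetStream DR): run N copies of the
# load command as background jobs of one shell, then stop them again.

LOAD_PIDS=""

# start_load N: launch N background workers and remember their PIDs.
start_load() {
    n="$1"
    i=0
    while [ "$i" -lt "$n" ]; do
        cat /dev/zero > /dev/null &
        LOAD_PIDS="$LOAD_PIDS $!"
        i=$((i + 1))
    done
}

# stop_load: terminate and reap every worker started by start_load.
stop_load() {
    for pid in $LOAD_PIDS; do
        kill "$pid" 2>/dev/null || true
        wait "$pid" 2>/dev/null || true
    done
    LOAD_PIDS=""
}

# Example (commented out so that sourcing this file starts no load):
#   start_load 16    # assumed worker count; adjust while watching top
#   sleep 1200       # hold the load beyond the roughly 15-minute window
#   stop_load
```

Running stop_load when the test is finished avoids leaving stray workers behind on the DRVA.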
DRVA High Memory Usage Duration Exceeded
Method 1
The DRVA high memory usage alarm may be triggered by conditions related to DRVA high CPU usage (described above).
The following command can be used to write and discard 4G of memory.
dd if=/dev/zero of=/dev/null bs=4G count=1
Repeating this command can produce a spike in memory usage for testing alarm conditions.
Method 2
An alternate method is to consume the defined memory to the point the alarm is triggered.
Create a temporary mount point:
# sudo mkdir /mnt/tmpfs
# sudo mount -t tmpfs -o size=8G tmpfs /mnt/tmpfs
Use the dd command to allocate the required memory. The following command writes a 7.5 GB file filled with zero bytes into the tmpfs mount, thus consuming about 7.5 GB (7500 MB) of RAM.
# dd if=/dev/zero of=/mnt/tmpfs/testfile bs=1M count=7500
To clean up after testing:
Reboot the DRVA, or
Unmount the created tmpfs mount and remove the mount point:
# sudo umount /mnt/tmpfs
# sudo rmdir /mnt/tmpfs
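The block count passed to dd in Method 2 depends on how much memory the DRVA has. The following is a minimal sketch of that arithmetic, assuming a Linux /proc/meminfo is available; the function names mem_total_mb and fill_count_mb are hypothetical, and the 90 percent target in the example is an assumption rather than a documented alarm threshold.

```shell
#!/bin/sh
# Hypothetical helpers for sizing the dd write used in Method 2.

# mem_total_mb [FILE]: total RAM in MiB, taken from the MemTotal line
# (reported in kB) of /proc/meminfo, or of FILE when one is given.
mem_total_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# fill_count_mb TOTAL_MB PERCENT: number of 1 MiB blocks to write so that
# the file occupies PERCENT of TOTAL_MB (integer arithmetic, rounded down).
fill_count_mb() {
    echo $(( $1 * $2 / 100 ))
}

# Example (commented out; 90 is an assumed target percentage):
#   count=$(fill_count_mb "$(mem_total_mb)" 90)
#   dd if=/dev/zero of=/mnt/tmpfs/testfile bs=1M count="$count"
```

Memory already in use by other processes adds to the amount written, so a lower percentage may be enough to reach the alarm condition.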
DR Store IO Error
Remove the replication log from a DRVA that contains an actively replicating protected domain.
The DRVA should be configured to use the replication log that gets removed for the test.
This action should trigger the alarm.
DRVA Unreachable Duration Exceeded
Power off the DRVA.
Or, disconnect the DRVA network.
This action should trigger the alarm.
DR Store Unavailable
It is not possible to trigger this error specifically. Its behavior is similar to that of an IO error.
Bitmap Mode ‘On’ Duration Exceeded
Create a new protected domain.
Protect multiple VMs (two or three should be sufficient).
Wait for the VMs to enter the initial sync phase.
From the DRVA Edit settings screen, disconnect the replication volume disk.
This action should trigger the alarm.
Protected Domain Recovery Failure
Method 1
This alarm condition can be triggered while performing planned failover:
Initiate a planned failover from the recovery site.
While failover is in progress, shut down the primary MSA.
After a period of time, the alarm should be triggered from the primary site.
Method 2
This alarm condition can be triggered while performing continuous failover:
Start a continuous failover.
Terminate the task from the task log.
After a period of time, the alarm should be triggered from the primary site.
Failback Interrupted Due to Issue at Failover Site
Initiate a failover.
After the failover successfully completes, open an MSA SSH session on the recovery site (where the domain has failed over).
Start the failback process and concurrently stop the VME2 service on the recovery site by issuing the command:
# service vme2 stop
This action should trigger the alarm.
After the test, the VME2 service can be restarted by issuing the command:
# service vme2 start
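To reduce the chance of leaving the VME2 service stopped after the test, the stop and start commands can be wrapped in one function that is run as the failback begins. The following is a minimal sketch: the function name stop_service_for and the hold time in the example are assumptions, and the hold should be long enough for the alarm to be raised.

```shell
#!/bin/sh
# Hypothetical wrapper: stop a service, wait, then start it again.
# It relies only on the "service NAME stop|start" commands shown above.
stop_service_for() {
    name="$1"
    hold_seconds="$2"
    # Give up without waiting if the service could not be stopped.
    service "$name" stop || return 1
    sleep "$hold_seconds"
    service "$name" start
}

# Example (commented out; 300 seconds is an assumed hold time):
#   stop_service_for vme2 300
```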
Protected Domain Test Failover Failed
Initiate a test failover at the recovery site.
As test failover is being performed, power off the MSA at the primary site.
This action should trigger the alarm.
Application Write Backpressure On
If the incoming VM network speed is high compared to the outgoing replication traffic, this can cause "backpressure" leading to the alarm being triggered.
DR Virtual Appliance Network IP Not Available
Disconnect the DRVA network.
After a period of time, the alarm should be triggered.
Test Failover Site Ready
Conduct a test failover and perform the steps to the point where VMs can be tested at the recovery site.
An alarm message will appear in the UI of the recovery site where VMs can be tested.
Replication Log Reserved Space Running Low
Deploy a DRVA and add a replication log volume with a minimal configuration.
Create a protected domain configured with a large total estimated data size to be protected.
Set the metadata size to be greater than half the capacity of the replication log disk.
Once the protected domain is created, protect the VM.
Navigate to the replication log and change the reserved space alarm threshold to 10% (the default is 5%).
This action should trigger the alarm.
Protected Domain Recovery Runbook Execution Failed
Create a protected domain and protect a VM that doesn’t have VMware Tools installed.
If necessary, uninstall VMware Tools from the VM.
Configure a runbook for Re-IP of the primary site.
Re-IP allows IP addresses of protected VMs to be changed via runbooks during failover or failback.
Initiate a failover.
After a period of time, the alarm should be triggered; it can be viewed from Cluster > Monitor > Events.
VM Protection Cancelled
This condition could occur in earlier versions of JetStream DR (version 4.1.x and prior) when a protected VM undergoes a snapshot revert.
Current versions of JetStream DR software have addressed the underlying issue, and it is no longer possible to create this condition or trigger this alarm.
DR Store Degraded in Multi-Pathing Mode
This condition occurs if the replication log uses an iSCSI volume that relies on multi-pathing to storage and one of the paths becomes broken or degraded.
In such a case, the alarm will be triggered.
This issue is not applicable in AVS environments, which do not use iSCSI multi-pathing to storage.
Protected Domain Recovery Complete (Failover/Failback/Restore)
Initiate any failover, failback, or restore operation.
Upon successful completion of the task, the alarm will be triggered.