...
Info |
---|
Learn more about monitoring events, alarms and automated actions in vSphere. |
Note |
---|
Learn how to troubleshoot various issues signaled by JetStream DR alarms. |
List of JetStream DR Alarms
The following list describes conditions and methods that trigger various JetStream DR alarms, which can be helpful for troubleshooting and testing purposes.
“State” events (unmarked) and “email” events both serve the same purpose but are reported differently by the system. State events are displayed as a banner announcement in vCenter, while email events appear under the Monitoring tab and trigger an email notification, if configured.
DRVA Restarted
Reboot the DRVA VM or restart the DRVA service.
An alarm will be triggered and can be viewed from DRVA VM > Monitor > Events.
DRVA High CPU Usage Duration Exceeded
In preparation, configure the DRVA with a minimum of 4 CPUs and 8 GB of memory.
Open the DRVA console and enable SSHD service.
Use the top command to monitor the CPU usage of system processes.
Create multiple (duplicate) SSH sessions running the following command in each:
cat /dev/zero > /dev/null
Note |
---|
This generates a continuous load on the system: cat reads an endless stream of zero bytes from /dev/zero and discards them in /dev/null, which keeps the CPU busy. You may need to run the command in 15 to 20 SSH sessions. The alarm should trigger after about 15 minutes with CPU usage above 90%. |
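As an alternative to opening many SSH sessions by hand, the same load can be sketched as background jobs started from a single session. This is a minimal sketch, not part of the JetStream DR tooling: the function names `start_cpu_load` and `stop_cpu_load` and the variable `LOAD_PIDS` are hypothetical, and it assumes that background workers in one session load the CPU the same way as separate sessions do.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with JetStream DR).
# LOAD_PIDS collects the PIDs of the background workers.
LOAD_PIDS=""

# Start N workers; each runs the same command as the manual procedure.
start_cpu_load() {
    workers=$1
    i=0
    while [ "$i" -lt "$workers" ]; do
        cat /dev/zero > /dev/null &
        LOAD_PIDS="$LOAD_PIDS $!"
        i=$((i + 1))
    done
}

# Kill every recorded worker and reap it so nothing is left running.
stop_cpu_load() {
    for pid in $LOAD_PIDS; do
        kill "$pid" 2>/dev/null || true
        wait "$pid" 2>/dev/null || true
    done
    LOAD_PIDS=""
}

# Example (assumed values): 16 workers held for 20 minutes.
#   start_cpu_load 16; sleep 1200; stop_cpu_load
```

Recording the PIDs means the load can be stopped cleanly once the alarm has been observed, instead of closing each session individually.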
DRVA High Memory Usage Duration Exceeded
Method 1
The DRVA high memory usage alarm may be triggered by conditions related to DRVA high CPU usage (described above).
...
Create a temporary mount point:
# sudo mkdir /mnt/tmpfs
# sudo mount -t tmpfs -o size=8G tmpfs /mnt/tmpfs
Use the dd command to allocate the required memory. The following command writes a 7.5 GB file filled with zero bytes into the tmpfs mount, thus consuming about 7.5 GB (7500 MB) of RAM.
# dd if=/dev/zero of=/mnt/tmpfs/testfile bs=1M count=7500
To clean up after testing:
Reboot the DRVA, or
Unmount the created tmpfs mount and remove the mount point:
# sudo umount /mnt/tmpfs
# sudo rmdir /mnt/tmpfs
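If the DRVA has a different amount of memory than the 8 GB assumed above, the dd block count can be derived from the installed memory. The following is a rough sketch: the helper names `mem_total_kib` and `dd_count_for_percent` are hypothetical, the target percentage is the tester's choice (the alarm threshold is not stated here), and the calculation ignores memory already in use by the system.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with JetStream DR).

# Print total installed memory in KiB, read from /proc/meminfo (Linux only).
mem_total_kib() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Print the number of 1 MiB blocks that make up PERCENT of TOTAL_KIB.
# Integer arithmetic, so the result is rounded down.
dd_count_for_percent() {
    total_kib=$1
    percent=$2
    echo $(( total_kib * percent / 100 / 1024 ))
}

# Example (assumed target of 90%):
#   count=$(dd_count_for_percent "$(mem_total_kib)" 90)
#   dd if=/dev/zero of=/mnt/tmpfs/testfile bs=1M count="$count"
```

For a machine reporting 8388608 KiB (8 GiB), a 90% target yields a count of 7372. The tmpfs size passed to mount must be at least as large as the file being written.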
DR Store IO Error
Remove the replication log from a DRVA that contains an actively replicating protected domain.
The DRVA should be configured to use the replication log that gets removed for the test.
This action should trigger the alarm.
DRVA Unreachable Duration Exceeded
Power off the DRVA.
Or, disconnect the DRVA network.
This action should trigger the alarm.
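To confirm from another machine that the DRVA really is unreachable while waiting for the alarm, a simple probe loop can help. This is only a sketch: `probe_host` and `count_failed_probes` are hypothetical names, the probe assumes the Linux iputils `ping` flags (`-c` for count, `-W` for timeout in seconds), and it assumes ICMP is permitted between the two machines.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with JetStream DR).

# Return 0 if HOST answers a single ICMP echo within 2 seconds.
# Assumes Linux iputils ping; -W has a different meaning on some other systems.
probe_host() {
    ping -c 1 -W 2 "$1" > /dev/null 2>&1
}

# Probe HOST a fixed number of times, pausing INTERVAL seconds between
# attempts, and print how many of the probes failed.
count_failed_probes() {
    host=$1
    attempts=$2
    interval=$3
    failed=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if ! probe_host "$host"; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
        if [ "$i" -lt "$attempts" ]; then
            sleep "$interval"
        fi
    done
    echo "$failed"
}

# Example (placeholder address): 10 probes, 30 seconds apart.
#   count_failed_probes 192.0.2.10 10 30
```

If every probe fails, the DRVA has been unreachable for the whole observation window, which is the condition the alarm is meant to report.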
DR Store Unavailable
It is not possible to trigger this error specifically. It is similar in behavior to an IO error.
Bitmap Mode ‘On’ Duration Exceeded
Create a new protected domain.
Protect multiple VMs (two or three should be sufficient).
Wait for the VMs to enter the initial sync phase.
From the DRVA Edit settings screen, disconnect the replication volume disk.
This action should trigger the alarm.
Protected Domain Recovery Failure
Method 1
This alarm condition can be triggered while performing planned failover:
...
Start continuous failover.
Terminate the task from the task log.
After a period of time, the alarm should be triggered from the primary site.
Failback Interrupted Due to Issue at Failover Site
Initiate a failover.
After the failover successfully completes, open an MSA SSH session on the recovery site (where the domain has failed over).
Start the failback process and concurrently stop the VME2 service on the recovery site by issuing the command:
# service vme2 stop
This action should trigger the alarm.
After the test, the VME2 service can be restarted by issuing the command:
# service vme2 start
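The stop and start steps can be combined so that the VME2 service is not accidentally left stopped after the test. The following is a minimal sketch: the function name `hold_vme2_down` and the hold time are hypothetical, and it assumes it is run as root in the MSA SSH session on the recovery site. How long the service must stay down before the alarm appears is not specified here.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with JetStream DR).
# Stop VME2, keep it down for HOLD_SECONDS, then start it again.
hold_vme2_down() {
    hold_seconds=$1
    # Abort without waiting if the service could not be stopped.
    service vme2 stop || return 1
    sleep "$hold_seconds"
    service vme2 start
}

# Example (assumed hold time of 10 minutes), run while failback is in progress:
#   hold_vme2_down 600
```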
...
/wiki/spaces/JSKB1/pages/2996961284
Initiate a test failover at the recovery site.
As test failover is being performed, power off the MSA at the primary site.
This action should trigger the alarm.
Application Write Backpressure On
If the incoming VM network speed is high compared to the outgoing replication traffic, this can cause "backpressure" leading to the alarm being triggered.
DR Virtual Appliance Network IP Not Available
Disconnect the DRVA network.
After a period of time, the alarm should be triggered.
...
/wiki/spaces/JSKB1/pages/2997092356
Conduct a test failover and perform the steps to the point where VMs can be tested at the recovery site.
An alarm message will appear in the UI of the recovery site where VMs can be tested.
Replication Log Reserved Space Running Low
Deploy a DRVA and add a replication log volume with a minimal configuration.
Create a protected domain configured with a large total estimated data size to be protected.
Set the metadata size to be greater than half the capacity of the replication log disk.
Once the protected domain is created, protect the VM.
Navigate to the replication log and change the reserved space alarm threshold to 10% (the default is 5%).
This action should trigger the alarm.
Protected Domain Recovery Runbook Execution Failed
Create a protected domain and protect a VM that doesn’t have VMware tools installed.
If necessary, uninstall VMware tools from the VM.
Configure a runbook for Re-IP of the primary site.
Re-IP allows IP addresses of protected VMs to be changed via runbooks during failover or failback.
Initiate a failover.
After a period of time, the alarm should be triggered. It can be viewed from Cluster > Monitor > Events.
VM Protection Cancelled
This condition could occur in earlier versions of JetStream DR (version 4.1.x and prior) when a protected VM undergoes a snapshot revert.
Current versions of JetStream DR software have addressed the underlying issue and it is no longer possible to create this condition or trigger this alarm.
DR Store Degraded in Multi-Pathing Mode
This condition occurs if the replication log uses an iSCSI volume that relies on multi-pathing to storage and one of the paths becomes broken or degraded.
In such a case, the alarm will be triggered.
This issue is not applicable in AVS environments, which do not use iSCSI multi-pathing to storage.
...
/wiki/spaces/JSKB1/pages/2997059588
Initiate any failover, failback, or restore operation.
Upon successful completion of the task, the alarm will be triggered.