You observe all or several of these symptoms:
MDS WRN: CS#1025 have reported IO error on pushing chunk 1cee of 'data.0', please check disks
MDS ERR CS#1026 detected back storage I/O failure
MDS ERR CS#1026 detected journal I/O failure
MDS WRN: Integrity failed accessing 'data.0' by the client at 192.168.1.11:42356
MDS WRN: CS#1025 is failed permanently and will not be used for new chunks allocation
If any disk returns an I/O error, the Chunk Server (CS) located on that disk is switched to the 'failed' state. Acronis Storage does not automatically recover the CS from this state, even after a Storage Node reboot.
Right after an I/O error occurs, the file system is re-mounted in read-only mode, and Acronis Storage no longer allocates any data chunks on this CS. At the same time, if the drive is still available for reading, Acronis Storage tries to replicate all the chunks off it.
The following workflow is recommended to troubleshoot the issue:
How to find the affected node and drive with WebCP
In the left menu, go to Nodes and click the node marked as Failed. Note the name of this node.
Click Disks and find the disk marked as Failed. Note the device name of this disk (for example, sdc).
How to find the affected disk with SSH and CLI
Log in to any node of the Acronis Storage cluster with SSH.
Issue the following command:
vstorage -c <cluster_name> stat | grep failed
[root@ ~]# vstorage -c PCKGW1 stat | grep failed
connected to MDS#2
CS nodes: 6 of 6 (5 avail, 0 inactive, 0 offline, 1 out of space, 1 failed), storage version: 122
 1026 failed 98.2GB 0B 6 2 0% 0/0 0.0 172.29.38.210 7.5.111-1.as7
Note the CS ID displayed in the first column (1026 in the example above) and the IP address of the node where the CS is located (172.29.38.210 in the example above).
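As a side note, the CS ID and node IP can also be extracted from the stat output programmatically. The sketch below parses the sample line shown above; the field positions ($1 = CS ID, $2 = status, $10 = node IP) are assumptions based on that sample, and on a live node you would pipe the real `vstorage -c <cluster_name> stat` output instead of the echo.

```shell
# Sample failed-CS line copied from the 'vstorage stat' example above.
failed_line=' 1026 failed 98.2GB 0B 6 2 0% 0/0 0.0 172.29.38.210 7.5.111-1.as7'
# Field positions are assumptions based on that sample output.
echo "$failed_line" | awk '$2 == "failed" {print "CS ID:", $1, "node IP:", $10}'
```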
Log in to the affected node.
To determine the disk where the affected CS is located, use the following command:
vstorage -c <cluster_name> list-services
[root@PCKGW1 ~]# vstorage -c PCKGW1 list-services
TYPE ID   ENABLED STATUS DEVICE/VOLUME GROUP DEVICE INFO         PATH
CS   1025 enabled active /dev/sdd1           VMware Virtual disk /vstorage/df218335/cs
CS   1026 enabled active /dev/sdc1           VMware Virtual disk /vstorage/12bb6baf/cs
MDS  1    enabled active /dev/sdb1           VMware Virtual disk /vstorage/38b5fb92/mds
In the ID column, find the CS with the ID you noted in the previous step. Note the device/volume for this CS and its path (see the PATH column). The PATH column is useful when you need to review the log file for a given CS: the log files are located at PATH/logs (/vstorage/12bb6baf/cs/logs in the example above).
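To illustrate, here is a minimal sketch of reviewing the newest log under PATH/logs. The directory and log file are simulated with a temp dir so the snippet is self-contained; on the node, point CS_PATH at the real PATH value from list-services (for example /vstorage/12bb6baf/cs) and skip the setup lines.

```shell
# Setup (simulation only): create a stand-in for the real CS path.
CS_PATH=$(mktemp -d)
mkdir -p "$CS_PATH/logs"
echo 'ERR: write failed on /dev/sdc1' > "$CS_PATH/logs/cs.log"

# Review step: pick the most recently modified log and show its tail.
newest=$(ls -t "$CS_PATH"/logs/* | head -n 1)
tail -n 100 "$newest"
```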
The ultimate goal of this step is to collect the information required to decide whether the affected disk can continue to be used or whether it should be replaced.
The following information should be reviewed and analyzed for any data related to the issue:
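For instance, the kernel ring buffer is usually the first place where a disk error surfaces. The snippet below greps sample (illustrative, not captured from a real node) kernel messages for common I/O-error patterns; on the node you would run the commented dmesg pipeline instead of the echo.

```shell
# Illustrative kernel-log lines (sample data for demonstration).
sample_log='sd 2:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET
blk_update_request: I/O error, dev sdc, sector 123456
EXT4-fs error (device sdc1): ext4_find_entry: reading directory lblock 0'

# On a live node run, for example:  dmesg -T | grep -iE "i/o error|fs error"
echo "$sample_log" | grep -icE 'i/o error|fs error'
```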
3. Decide if the device needs replacement
Depending on the physical storage type (directly attached JBOD, iSCSI LUN, Fibre Channel, etc.) and the particular circumstances, the exact error messages and patterns vary greatly.
Here are some rules of thumb to facilitate the decision-making process:
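One common input to this decision is SMART data. The sketch below pulls out two attributes that often indicate a dying drive (nonzero reallocated or pending sector counts generally argue for replacement). The sample lines stand in for real output, which you would obtain with `smartctl -A /dev/sdc` from the smartmontools package; the attribute values shown are illustrative.

```shell
# Sample smartctl attribute lines (illustrative values, not a real drive).
sample='  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       24
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8'

# On the node, pipe real data instead:  smartctl -A /dev/sdc | awk '...'
echo "$sample" | awk '$2 ~ /Reallocated_Sector_Ct|Current_Pending_Sector/ {print $2, $NF}'
```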
If you decide to reuse the same CS on the same drive, follow the steps below:
vstorage -c <cluster_name> rm-cs -U <CSID>
vstorage -c <cluster_name> stat | grep <CSID>
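The second command is a verification step: after removal, the CS ID should no longer appear in the stat output. Below is a self-contained sketch of that check; the sample stat line is assumed for illustration, and on the node you would pipe the real `vstorage -c <cluster_name> stat` output instead of the echo.

```shell
CSID=1026
# Sample stat line after removal: only CS 1025 remains (illustrative data).
stat_output=' 1025 active 98.2GB 0B 6 2 0% 0/0 0.0 172.29.38.209 7.5.111-1.as7'
if ! echo "$stat_output" | grep -qw "$CSID"; then
    echo "CS $CSID is no longer listed"
fi
```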