Channel: VMware Communities : Discussion List - Backup & Recovery

Recovering "inaccessible" VMs on datastore


In short, I am looking for help recovering "inaccessible" VMs.

 

Background:

My operating theory is that during maintenance on the UPS connected to our SAN rack, I accidentally tripped the power on one of three fibre channel switches. When this happened, roughly 30 VMs were in vMotion through that switch, and as a result of the power loss several datastores hosted on the SAN were disconnected and listed as inaccessible. The datastores did not reconnect to the affected hosts as expected (the FC connections are redundant, so there was always a path from host to storage array). The roughly 30 VMs previously mentioned were unresponsive. Attempting to open VMRC for any of the affected VMs failed with the error: "Unable to connect to the MKS: Could not connect to pipe \\.\pipe\vmware-authpipe within retry period." These VMs reported VMware Tools not running and 0 bytes of allocated storage, but showed a static CPU load (seemingly stuck at whatever the last recorded value was). No commands issued to the affected VMs completed successfully; most timed out or ran indefinitely.

 

I migrated off unaffected VMs to unaffected hosts. This process completed without issue. Afterwards, all affected hosts were restarted and successfully reconnected to the datastores. At this point, I expected the inaccessible VMs would be identified on the datastores and return to normal operation. This did not happen, and the VMs are still listed as inaccessible.

 

Brief Version/Config Info:

ESXi 5.5.0 update 3 (VMKernel Release Build 3248547)

vCenter Server Appliance 5.5.0.30500 Build 4180648

All hosts are part of a single cluster which has vSphere DRS enabled (automated)

All datastores are part of a single datastore cluster with Storage DRS enabled (automated) and Storage I/O Control, VMFS5

 

Steps Taken:

  1. Checked logs; noted "nvram write failed" and datastore timeout errors on the affected hosts (read "Powering on a virtual machine fails with the error: NVRAM write failure (2097213) | VMware KB", but was unable to power the affected VMs off/on to allow creation of a new .nvram file)
  2. Verified the files are still present on the datastore; noted the presence of a *.vmx.lck file for each affected VM (tried the steps listed here: can't register/add to inventory a vm because of locked file)
  3. A colleague attempted to remove the lock file but was unsuccessful; I believe they followed this: Investigating virtual machine file locks on ESXi (10051) | VMware KB
  4. Attempted removing an affected VM from inventory and re-registering it with vSphere (tried the steps listed here: How to register/add a VM to the Inventory in vCenter Server (1006160) | VMware KB)
  5. Attempted creating a new VM and attaching a .vmdk from an affected VM; this was not successful
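For reference, the lock-investigation and re-registration steps above can be sketched roughly as below. This is a hedged outline only; the datastore and VM names (`datastore1`, `myvm`) are placeholders, and any `kill` should be a last resort after confirming which host owns the lock:

```shell
# Run in an SSH session on one of the affected ESXi 5.5 hosts.
# Paths and names below are hypothetical -- substitute the real ones.

VMDIR=/vmfs/volumes/datastore1/myvm   # placeholder path to an affected VM

# 1. Ask VMFS which host owns the lock on each file. The "owner" field
#    in the output is the MAC address of the locking host's management
#    interface; all zeros generally indicates a stale lock.
vmkfstools -D "$VMDIR/myvm.vmx"
vmkfstools -D "$VMDIR/myvm-flat.vmdk"

# 2. Check whether a VMX process for the stuck VM is still running on
#    this host, and kill its world if so (soft first, hard only if needed).
esxcli vm process list
# esxcli vm process kill --type=soft --world-id=<WorldID>

# 3. Once nothing holds the lock, re-register the VM with the host.
vim-cmd solo/registervm "$VMDIR/myvm.vmx"
```

Capturing the `vmkfstools -D` output for one of the inaccessible VMs would show whether the lock owner is one of the restarted hosts or a stale entry left over from the outage.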

 

Questions:

  • What other actions can I take to restore these VMs? The files exist on the datastore, but something is stopping me at every step. I am not against retrying all steps taken previously to record more specific info on why they failed if this would help.
  • None of these VMs were backed up--this was a gap in our deployment that I intended to correct later this year. What is the recommended method to back up VMs? I have researched several third-party products, but I'm leaning towards VDP (I really appreciated this paper: https://www.vmware.com/files/pdf/vsphere/vmware-vsphere-data-protection-overview.pdf ).
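Until a proper backup product is in place, even a crude snapshot-and-clone copy would have helped here. A minimal sketch (all names and datastore paths are placeholders, and a real product like VDP handles retention, scheduling, and change tracking for you):

```shell
# Minimal manual backup of a running VM: snapshot, then clone the base disk.
# "myvm", "datastore1", and "backup_ds" are hypothetical names.

# Look up the VM's inventory ID on the host.
VMID=$(vim-cmd vmsvc/getallvms | awk '/myvm/ {print $1}')

# 1. Take a quiesced snapshot (name, description, includeMemory=0, quiesce=1)
#    so the base .vmdk stops changing while we copy it.
vim-cmd vmsvc/snapshot.create "$VMID" "backup" "pre-copy snapshot" 0 1

# 2. Clone the now read-only base disk to a backup datastore as thin.
vmkfstools -i /vmfs/volumes/datastore1/myvm/myvm.vmdk \
           /vmfs/volumes/backup_ds/myvm/myvm.vmdk -d thin

# 3. Keep a copy of the .vmx so the backup can be re-registered later,
#    then consolidate the snapshot back into the base disk.
cp /vmfs/volumes/datastore1/myvm/myvm.vmx /vmfs/volumes/backup_ds/myvm/
vim-cmd vmsvc/snapshot.removeall "$VMID"
```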

 

Thank you for taking the time out of your day to read this. I was fortunate that the failed VMs were not critical, but one of them is important and I would like to understand why I cannot restore it from the datastore.

 

- Oliver

