Opened 9 years ago
Last modified 5 years ago
#15374 new defect
Virtual HDD becomes unavailable for guest : with AHCI#0: Port x reset
Reported by: | sylvain Gplservice | Owned by: | |
---|---|---|---|
Component: | other | Version: | VirtualBox 5.0.14 |
Keywords: | AHCI port reset hdd unavailable | Cc: | |
Guest type: | Linux | Host type: | Linux |
Description
It looks like #9975 in symptoms, but with a Linux host and a Linux guest and no Windows involved. Also, the error logs are not as verbose.
The guest runs fine for a seemingly random duration, ranging from a few weeks to one year. Then, without any clear cause, one virtual disk becomes unavailable and produces sda I/O errors on the guest, with only one line in the VBox.log file:
1900:51:41.454153 AHCI#0: Port 2 reset
A kill $pid followed by VBoxManage startvm restores everything to normal until the next time.
I have seen this behavior on VirtualBox versions as far back as 4.3.6r91406 and up to 5.0.14. When the crash occurs, the system load on the host isn't particularly high, and I/O isn't particularly high on either the host or the guest. I've seen it on rotating SATA disks in RAID 1 and RAID 5, and on SATA SSDs in RAID 1.
Guest Linux kernels range from 2.6.18 up to 3.2.x. Guest hosts are Debian Linux 6.0, 7.0 or 8.0. All VDI files are used as SATA-controlled drives.
I keep a log of crashed VMs to try to discover common criteria and narrow it down, but nothing really stands out besides:
- low-load/low-I/O VMs don't seem to crash
- on some host servers with different hardware I have never had any crash
I know my report is quite poor in details, but I'll try to keep it updated. In the meantime, is there a way to make those "virtual disk reset" messages more verbose?
Attachments (4)
Change History (12)
by , 9 years ago
Attachment: | debian-start added |
---|
comment:1 by , 9 years ago
Please ignore or remove the "debian-start" attachment; it's a mistake I can't seem to revert myself.
comment:3 by , 9 years ago
Doing a "VBoxManage debugvm <VM name> info ahci0" might be helpful too as it gives a quick overview about the state of the AHCI controller emulation.
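Because the failure can take months to show up, it may help to capture that output automatically as soon as a reset line appears in VBox.log. The following is only a sketch under assumptions: the regular expression is modelled on the single log line quoted in the description, the function names are made up for illustration, and only the "ahci0" info item is taken from this ticket.

```python
import re
import subprocess

# Modelled on the log line from the description:
#   1900:51:41.454153 AHCI#0: Port 2 reset
RESET_RE = re.compile(
    r"^\s*(\d+:\d{2}:\d{2}\.\d+)\s+AHCI#(\d+): Port (\d+) reset\s*$"
)


def find_port_resets(log_text):
    """Return (timestamp, controller, port) for every AHCI port reset line."""
    hits = []
    for line in log_text.splitlines():
        match = RESET_RE.match(line)
        if match:
            hits.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return hits


def ahci_info_command(vm_name, item="ahci0"):
    """Build the debugvm command suggested above; nothing is executed here."""
    return ["VBoxManage", "debugvm", vm_name, "info", item]


def capture_ahci_state(vm_name, item="ahci0"):
    """Run the command and return its output (needs VBoxManage on PATH)."""
    result = subprocess.run(
        ahci_info_command(vm_name, item),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A cron job or small loop could call find_port_resets() on the current VBox.log and, when the list is non-empty, save the result of capture_ahci_state() next to a copy of the log before the VM is killed and restarted.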
comment:4 by , 9 years ago
I have restarted my VMs after issuing :
$ echo -n 1 > /proc/sys/fs/suid_dumpable
I'll wait (might take a few months !) until it happens again and then I'll provide a core dump and the output of "VBoxManage debugvm <VM name> info ahci0"
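On Linux, whether a core file gets written also depends on the core size limit (RLIMIT_CORE, the value shown by "ulimit -c") of the crashing process, not only on /proc/sys/fs/suid_dumpable. A minimal sketch to check both is below. It is a general Linux check, not a VirtualBox-specific procedure, and the function names are made up for illustration.

```python
import resource


def read_suid_dumpable(path="/proc/sys/fs/suid_dumpable"):
    """Return the current suid_dumpable value, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def core_limit_allows_dump(soft_limit):
    """A soft RLIMIT_CORE of 0 suppresses core files; any other value allows them."""
    return soft_limit != 0


def core_dump_report():
    """Summarise the core dump settings seen by the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return {
        "suid_dumpable": read_suid_dumpable(),
        "core_soft_limit": "unlimited" if soft == resource.RLIM_INFINITY else soft,
        "core_limit_allows_dump": core_limit_allows_dump(soft),
    }
```

The limit is per process and inherited from the parent, so what matters is the limit of the process that actually spawns the VM, which may differ from the interactive shell where this check is run.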
comment:5 by , 9 years ago
The crash happened on one of the VMs, and here is the output of VBoxManage debugvm <VM name> info ahci0
However, no "core" file was dumped anywhere. The procedure at https://www.virtualbox.org/wiki/Core_dump explains how to create one when running a VM with the following command: $ "virtualbox -startvm <vm name>" and I assumed that it would be the same with $ VBoxManage startvm <vm name> --type headless
but that doesn't seem to be the case (the core file seems to be created only when running virtualbox).
I'm using VirtualBox in a headless environment and it is impractical to have the VM output displayed. Is there anything else I can do to get a core dump with $ VBoxManage startvm <vm name> --type headless ?
by , 8 years ago
Attachment: | ahci-virtualbox-disconnect.txt added |
---|
comment:6 by , 8 years ago
A new output of VBoxManage debugvm <VM name> info ahci0
The virtualbox software was upgraded to 5.0.20r106931 and the bug still occurs (rarely)
Any news on the way to get a core dump on a headless setup ?
comment:7 by , 5 years ago
For the record, in VirtualBox 5.1 (at least) the problem still persists (I haven't tested with 6.0 or 6.1 yet). I also have (fewer) Windows guests on Linux hosts, and using SATA for their disk drives triggers the problem as well, which makes me think the problem is not guest related.
On a Windows guest, using an old VirtualBox:
VirtualBox VM 4.2.10 r84104 linux.amd64 (Mar 5 2013 13:37:15) release log
The log shows:
783:09:48.334496 AHCI#0P1: Cancelled task 6
783:11:20.549345 AHCI#0: Port 1 reset
783:11:20.564215 AHCI#1: Canceled write at offset 3093704704 (4096 bytes left) returned rc=VINF_SUCCESS
783:17:56.028435 AHCI#0: Port 1 reset
783:18:56.528014 AHCI#0: Port 1 reset
783:24:51.371946 AHCI#0: Port 1 reset
783:28:51.559754 AHCI#0: Port 1 reset
783:29:52.059510 AHCI#0: Port 1 reset
783:31:00.809397 AHCI#0: Port 1 reset
783:38:04.715720 AHCI#0: Port 1 reset
783:39:25.590470 AHCI#0: Port 1 reset
783:40:26.090818 AHCI#0: Port 1 reset
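To compare crashes across VMs, the reset lines in a log like the one above can be turned into intervals. This is a rough sketch that assumes the timestamp is hours:minutes:seconds since VM start, as it appears in these excerpts; the function names are made up for illustration.

```python
import re

# Matches e.g. "783:17:56.028435 AHCI#0: Port 1 reset" anywhere in the text.
LINE_RE = re.compile(r"(\d+):(\d{2}):(\d{2}\.\d+) AHCI#(\d+): Port (\d+) reset")


def reset_times(log_text):
    """Return reset times in seconds since VM start, keyed by (controller, port)."""
    times = {}
    for match in LINE_RE.finditer(log_text):
        hours, minutes, seconds, controller, port = match.groups()
        stamp = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        times.setdefault((int(controller), int(port)), []).append(stamp)
    return times


def gaps(stamps):
    """Return the time in seconds between consecutive resets."""
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]
```

For example, gaps(reset_times(text)[(0, 1)]) lists the seconds between successive resets of port 1 on AHCI#0, which makes it easy to see whether the resets come in bursts or at regular intervals.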
But the good news is that I found a workaround: assuming the bug is indeed in the SATA/AHCI code, I switched to a SAS disk controller, and now the problem is gone.
Fortunately, Linux has support for such controllers and switching works without modification. Unfortunately, that isn't the case for my Windows Host
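The comment does not say how the controller was switched. One possible way, assuming the VBoxManage command-line tool and a powered-off VM, is sketched below. The VM name, controller names, port numbers and disk path are placeholders to adapt, and the functions only build the command lines without running them.

```python
def detach_command(vm_name, old_controller, port):
    """Detach the disk from its current (SATA) controller; names are placeholders."""
    return [
        "VBoxManage", "storageattach", vm_name,
        "--storagectl", old_controller,
        "--port", str(port), "--device", "0",
        "--medium", "none",
    ]


def sas_switch_commands(vm_name, disk_path, sas_controller="SAS", port=0):
    """Create an LSI Logic SAS controller and attach the existing disk to it."""
    add_controller = [
        "VBoxManage", "storagectl", vm_name,
        "--name", sas_controller,
        "--add", "sas",
        "--controller", "LSILogicSAS",
    ]
    attach_disk = [
        "VBoxManage", "storageattach", vm_name,
        "--storagectl", sas_controller,
        "--port", str(port), "--device", "0",
        "--type", "hdd",
        "--medium", disk_path,
    ]
    return [add_controller, attach_disk]
```

Each returned list can be passed to subprocess.run(..., check=True) once the values have been checked against the output of "VBoxManage showvminfo" for the VM in question. Inside the guest, device names or boot settings may need attention after the change, depending on how the disks are referenced.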
comment:8 by , 5 years ago
I have also had a (very) similar problem with my Windows 7 guest(s). For me, it has only occurred on my ThinkCentre machines. I have other machines with the same host/guest combination but no errors. In my case, whilst the external symptom was the same, the behaviour within the Windows 7 guest depended on which SATA driver I had installed in the guest. With the default Microsoft driver, the guest would hang completely. With the latest Intel drivers (11.2.0.1006) for that virtual SATA controller type, the drives would go offline within the guest but there was no hang. I too changed to using the LSI SAS 1068 controller in the guest and I have never had any further problems. I have been running with that configuration since March 2018.
The LSI SAS 1068 drivers are still available for Windows 7 from the Broadcom website. Use the following URL to search for the Legacy products section. The drivers are actually listed under the LSI SAS 3800X section. For Windows 7 the latest driver is version 1.34.3.0
VBox log file with AHCI port reset