Opened 15 years ago
Closed 14 years ago
#6013 closed defect (fixed)
SLES 10 Linux guest hangs -> retry with 3.1.4
Reported by: | Predrag | Owned by: | |
---|---|---|---|
Component: | guest smp | Version: | VirtualBox 3.1.4 |
Keywords: | SLES guest hangs | Cc: | |
Guest type: | Linux | Host type: | Linux |
Description
I have a SLES 10 Linux host and two SLES 10 Linux guests. I installed an Oracle 11g DB (DB size = 300 GB+) and an application on both (testing environment). During high I/O (e.g. some batch jobs) one of the guests hangs and becomes totally unresponsive. RAM size is 12 GB, using dynamic disks. I also tried allocating all RAM (except 2 GB for the host) to one guest. When I used physical machines, everything worked OK with 4 GB RAM. I attached logs for both VMs. Any solution?
Attachments (6)
Change History (30)
by , 15 years ago
Attachment: | test5 logs.rar added |
---|
by , 15 years ago
Attachment: | VB logs.rar added |
---|
follow-up: 2 comment:1 by , 15 years ago
follow-ups: 5 7 comment:3 by , 15 years ago
It would be helpful if you could tell us which of the two VMs hangs (which log file). Furthermore, you could check whether the same hang occurs if you decrease the number of guest CPUs to 1.
And what exactly did you do in the guest to provoke the hang? I/O from/to the virtual disk, network I/O, and/or I/O over shared folders?
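For reference, the guest CPU count can also be changed from the command line while the VM is powered off. A minimal sketch, assuming a VM registered under the hypothetical name `sles10-test`; the helper name is made up for illustration, and it only builds the `VBoxManage modifyvm ... --cpus` command line rather than running it:

```python
import shlex


def set_cpu_count_cmd(vm_name, cpus):
    """Build the VBoxManage argv that sets the number of guest CPUs."""
    if cpus < 1:
        raise ValueError("a VM needs at least one CPU")
    return ["VBoxManage", "modifyvm", vm_name, "--cpus", str(cpus)]


if __name__ == "__main__":
    # "sles10-test" is a hypothetical VM name; substitute your own.
    print(shlex.join(set_cpu_count_cmd("sles10-test", 1)))
```

Pass the resulting list to `subprocess.run(..., check=True)` on the host to apply the change.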
follow-up: 6 comment:4 by , 15 years ago
And: Does the whole VM process hang? If so, forcing a core dump of that VM as described here (Forcing VirtualBox to terminate with a core dump) and sending the core dump to us could help find the problem. Let me know if you have such a core dump and I can tell you a server for uploading the file.
by , 15 years ago
by , 15 years ago
Attachment: | VBox.log.1 added |
---|
by , 15 years ago
Attachment: | VBox.log.2 added |
---|
by , 15 years ago
Attachment: | VBox.log.3 added |
---|
comment:5 by , 15 years ago
Replying to frank:
> It would be helpful if you could tell us which of the two VMs hangs (which log file). Furthermore, you could check whether the same hang occurs if you decrease the number of guest CPUs to 1.
> And what exactly did you do in the guest to provoke the hang? I/O from/to the virtual disk, network I/O, and/or I/O over shared folders?
Both VMs hung; two kinds of high I/O provoke it.
I uploaded logs from the VM that hung this morning, around 9:30. It happened during a backup operation (network I/O). Hangs also happened during some batch jobs (I/O from/to the virtual disk). There are no shared folders.
I decreased the number of CPUs from 4 to 1 and will inform you about the result.
comment:6 by , 15 years ago
Replying to frank:
> And: Does the whole VM process hang? If so, forcing a core dump of that VM as described here (Forcing VirtualBox to terminate with a core dump) and sending the core dump to us could help find the problem. Let me know if you have such a core dump and I can tell you a server for uploading the file.
When the hang occurs, it happens on one VM; the second works OK.
I forced a core dump. It is a file of 2 GB+, around 450 MB compressed. Should I upload it, and where?
comment:7 by , 15 years ago
Replying to frank:
> It would be helpful if you could tell us which of the two VMs hangs (which log file). Furthermore, you could check whether the same hang occurs if you decrease the number of guest CPUs to 1.
> And what exactly did you do in the guest to provoke the hang? I/O from/to the virtual disk, network I/O, and/or I/O over shared folders?
It seems that decreasing the number of guest CPUs to 1 works very well (multiprocessing doesn't work). VT-x is still enabled. We are still testing, but the hang didn't occur in situations where it happened with 4 processors.
The processor is an Intel Xeon CPU E5405 @ 2.00GHz. Am I wrong, or does this mean that the VM can use only 25% of the CPU?
Maybe I made a mistake on one of the checkboxes for the CPU settings?
follow-up: 9 comment:8 by , 15 years ago
The core dump you sent me is useless as you set one guest CPU for that VM. Reading your comments above, I assume that the hang only occurs on high I/O with more than one guest CPU enabled.
Such a core dump only makes sense if you take it from a hanging VM session! So if you really want to help debug this problem, set up 4 guest CPUs, make the VM hang with your I/O operations, and then send me the core dump the same way as you already did.
And regarding your last question: On a 4-core host I would never activate 4 guest cores, as the virtualization needs some overhead and there are other applications on the host requiring CPU time as well. A better choice would be 3 or 2 cores, but nevertheless the guest VM shouldn't hang.
follow-up: 10 comment:9 by , 15 years ago
Replying to frank:
I hope I uploaded a useful core dump file this time for the 4-CPU VM (still uploading at the time of writing this, 1.5 GB). Maybe I made one mistake with the VM log files. After forcing a dump of the hung VM I rebooted it, and I couldn't find the right VM log file for the hanging session, so I uploaded all 3 logs. I sent the file names by email.
It seems that another VM that is set to work with 2 CPUs doesn't work well either, but we are still testing and will also try to provoke a hang.
comment:10 by , 15 years ago
Tested and confirmed that VirtualBox hangs every time during higher I/O load on a VM with more than 1 CPU activated. Also noticed that the system time is inaccurate. On a VM with 1 CPU there's no such problem. Any solution from you?
follow-up: 12 comment:11 by , 15 years ago
Your last core dump was better, but currently there is no solution. The wrong guest time will most probably be fixed in the next VBox maintenance release. For now I suggest using only one guest CPU for that VM. VirtualBox will still benefit from multiple host cores, as the VMM itself and the virtual devices are multithreaded.
follow-up: 13 comment:12 by , 15 years ago
Is the issue related to the guest OS (SLES 10.3 64-bit)? SMP on the guest VM is very important to us.
comment:14 by , 15 years ago
Component: | other → guest smp |
---|---|
Summary: | SLES 10 Linux guest hangs → SLES 10 Linux guest hangs -> retry with 3.1.4 |
Retry with 3.1.4. That version will include an important stability fix for SMP guests.
follow-up: 16 comment:15 by , 15 years ago
Please check if 3.1.4 beta 1 solves the problem: http://forums.virtualbox.org/viewtopic.php?f=15&t=27300
comment:16 by , 15 years ago
Replying to sandervl73:
Please check if 3.1.4 beta 1 solves the problem: http://forums.virtualbox.org/viewtopic.php?f=15&t=27300
VirtualBox 3.1.4 beta doesn't solve the problem. The VM hangs in the same way with SMP enabled (2 CPUs), and the system time is more inaccurate than with version 3.1.3.
follow-up: 18 comment:17 by , 15 years ago
In that case, did you really test an unofficial 3.1.3 test build and if yes, which exact build was it?
follow-up: 19 comment:18 by , 15 years ago
Replying to frank:
In that case, did you really test an unofficial 3.1.3 test build and if yes, which exact build was it?
I tested VirtualBox-3.1-3.1.2_56127_sles10.1-1.x86_64. After post from sandervl73 on 2010-01-29 I downloaded and installed
VirtualBox 3.1-3.1.4_BETA1_57050_sles10.1-1.x86_64
comment:19 by , 15 years ago
SMP doesn't work with version 3.1.4 r57640 either.
Also, the guest system time is not accurate (it runs about 1 sec per minute fast before sync).
I will upload a core dump file to ftp://ftp.innotek.de/incoming in a few minutes. The file name is core.1246.tar.gz.
follow-up: 21 comment:20 by , 15 years ago
Version: | VirtualBox 3.1.2 → VirtualBox 3.1.4 |
---|
Analyzing the core dump I saw that the E1000 ethernet card waits for the guest to free more network descriptors. One of the guest CPUs is currently executing code, the other is in halt state. This could be a problem with the E1000 network card emulation. Could you test if your guest works better if you change the network card to PCNet (VM network settings / advanced)?
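The network card type can also be switched from the command line instead of the GUI, with the VM powered off. A minimal sketch, assuming the hypothetical VM name `sles10-test` and that the affected card is adapter 1; the helper and the lookup table are illustrative only, and the code builds the `VBoxManage modifyvm ... --nictype<N>` command without executing it:

```python
import shlex

# GUI names mapped to the identifiers VBoxManage expects for --nictype<N>.
NIC_TYPES = {
    "PCnet-PCI II": "Am79C970A",
    "PCnet-FAST III": "Am79C973",
    "Intel PRO/1000 MT Desktop": "82540EM",
}


def set_nic_type_cmd(vm_name, adapter, nic_name):
    """Build the VBoxManage argv that changes the emulated NIC of one adapter."""
    if adapter < 1:
        raise ValueError("adapter numbers start at 1")
    return ["VBoxManage", "modifyvm", vm_name,
            f"--nictype{adapter}", NIC_TYPES[nic_name]]


if __name__ == "__main__":
    # "sles10-test" and adapter 1 are assumptions; adjust to your setup.
    print(shlex.join(set_nic_type_cmd("sles10-test", 1, "PCnet-FAST III")))
```

The guest needs a driver for the newly selected card (`pcnet32` on Linux for the PCnet types), so check the guest's network configuration after the switch.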
follow-up: 22 comment:21 by , 15 years ago
Replying to frank:
It seems that setting the network card to PCnet_Fast_III solves the problem with hanging. With 2 processors, the machine worked under load for 2 days without problems and with much better performance than with 1 CPU.
There's just one problem left: the system time.
Mar 5 09:19:02 test6 ntpdate[31432]: step time server 10.0.0.x offset -0.615278 sec
Mar 5 09:20:01 test6 ntpdate[31484]: step time server 10.0.0.x offset -0.856136 sec
Mar 5 09:21:01 test6 ntpdate[31541]: step time server 10.0.0.x offset -1.297398 sec
Mar 5 09:22:00 test6 ntpdate[31598]: step time server 10.0.0.x offset -2.188469 sec
Mar 5 09:23:02 test6 ntpdate[31654]: step time server 10.0.0.x offset -1.373936 sec
Mar 5 09:24:00 test6 ntpdate[31711]: step time server 10.0.0.x offset -1.646268 sec
Mar 5 09:25:00 test6 ntpdate[31758]: step time server 10.0.0.x offset -1.749338 sec
Mar 5 09:26:01 test6 ntpdate[31810]: step time server 10.0.0.x offset -0.745577 sec
Mar 5 09:27:02 test6 ntpdate[31917]: step time server 10.0.0.x offset -0.839320 sec
Mar 5 09:28:01 test6 ntpdate[31972]: step time server 10.0.0.x offset -0.545628 sec
Sync with the time server is set to run every minute.
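To put a number on the drift, the offsets in the ntpdate lines above can be summarised with a few lines of Python. This is only a sketch over the ten log lines posted in this comment; the regular expression assumes the `offset <seconds> sec` wording shown in them. Each negative offset is a backward step, which is consistent with the guest clock running fast between syncs.

```python
import re
import statistics

# The ten ntpdate lines from this comment, copied verbatim.
LOG = """\
Mar 5 09:19:02 test6 ntpdate[31432]: step time server 10.0.0.x offset -0.615278 sec
Mar 5 09:20:01 test6 ntpdate[31484]: step time server 10.0.0.x offset -0.856136 sec
Mar 5 09:21:01 test6 ntpdate[31541]: step time server 10.0.0.x offset -1.297398 sec
Mar 5 09:22:00 test6 ntpdate[31598]: step time server 10.0.0.x offset -2.188469 sec
Mar 5 09:23:02 test6 ntpdate[31654]: step time server 10.0.0.x offset -1.373936 sec
Mar 5 09:24:00 test6 ntpdate[31711]: step time server 10.0.0.x offset -1.646268 sec
Mar 5 09:25:00 test6 ntpdate[31758]: step time server 10.0.0.x offset -1.749338 sec
Mar 5 09:26:01 test6 ntpdate[31810]: step time server 10.0.0.x offset -0.745577 sec
Mar 5 09:27:02 test6 ntpdate[31917]: step time server 10.0.0.x offset -0.839320 sec
Mar 5 09:28:01 test6 ntpdate[31972]: step time server 10.0.0.x offset -0.545628 sec
"""

OFFSET_RE = re.compile(r"offset (-?\d+\.\d+) sec")


def parse_offsets(log):
    """Return the per-sync clock corrections in seconds, in log order."""
    return [float(m.group(1)) for m in OFFSET_RE.finditer(log)]


offsets = parse_offsets(LOG)
mean_offset = statistics.mean(offsets)   # average correction per one-minute sync
worst_offset = min(offsets)              # largest backward step

print(f"{len(offsets)} syncs, mean {mean_offset:.3f} s, worst {worst_offset:.3f} s")
```

With a one-minute sync interval, the mean correction is a direct estimate of the drift per minute.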
comment:22 by , 15 years ago
Sorry for bad formatting, but it looks OK in Mozilla Firefox
Mar 5 09:19:02 test6 ntpdate[31432]: step time server 10.0.0.x offset -0.615278 sec
Mar 5 09:20:01 test6 ntpdate[31484]: step time server 10.0.0.x offset -0.856136 sec
Mar 5 09:21:01 test6 ntpdate[31541]: step time server 10.0.0.x offset -1.297398 sec
Mar 5 09:22:00 test6 ntpdate[31598]: step time server 10.0.0.x offset -2.188469 sec
Mar 5 09:23:02 test6 ntpdate[31654]: step time server 10.0.0.x offset -1.373936 sec
Mar 5 09:24:00 test6 ntpdate[31711]: step time server 10.0.0.x offset -1.646268 sec
Mar 5 09:25:00 test6 ntpdate[31758]: step time server 10.0.0.x offset -1.749338 sec
Mar 5 09:26:01 test6 ntpdate[31810]: step time server 10.0.0.x offset -0.745577 sec
Mar 5 09:27:02 test6 ntpdate[31917]: step time server 10.0.0.x offset -0.839320 sec
Mar 5 09:28:01 test6 ntpdate[31972]: step time server 10.0.0.x offset -0.545628 sec
comment:23 by , 14 years ago
Retry with 3.2.10. It contains an SMP performance fix that might apply to your case as well.
This is becoming very urgent...