Yesterday I manually renamed a few virtual machines (their names contained special characters) and reconfigured the backup jobs with the new names.
During the night no backups ran at all.
This morning I rebooted the appliance and found that my changes had not been saved, i.e. the virtual machines I had added were no longer in the backup jobs.
I reconfigured the backup jobs again and launched a simple test job (containing only one VM). It DID NOT WORK. The task got stuck at about 30% and never even got as far as taking the VM snapshot.
After checking the logs I found an out-of-memory (OOM) condition, which sounds funny because I had already upgraded the appliance to 6GB of RAM.
- VDP has:
  - 2TB disk
  - 6GB RAM
  - 4GB swap
- Datacenter has:
  - ~200 VMs in total
  - ~100 VMs covered by backup jobs
  - 10 backup jobs
This is the beginning of the OOM logs in /var/log/messages:
May 10 11:45:55 vdp -- MARK --
May 10 11:49:18 vdp kernel: [ 1899.735068] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
May 10 11:49:18 vdp kernel: [ 1899.735077] java cpuset=/ mems_allowed=0
May 10 11:49:18 vdp kernel: [ 1899.735081] Pid: 11286, comm: java Tainted: G X 2.6.32.49-0.3.1.3755.0.PTF-default #1
May 10 11:49:18 vdp kernel: [ 1899.735083] Call Trace:
May 10 11:49:18 vdp kernel: [ 1899.735149] [<ffffffff810061dc>] dump_trace+0x6c/0x2d0
May 10 11:49:18 vdp kernel: [ 1899.735167] [<ffffffff8139b076>] dump_stack+0x69/0x73
May 10 11:49:18 vdp kernel: [ 1899.735189] [<ffffffff810b8e3c>] oom_kill_process+0xcc/0x2f0
May 10 11:49:18 vdp kernel: [ 1899.735204] [<ffffffff810b94c0>] __out_of_memory+0x50/0xa0
May 10 11:49:18 vdp kernel: [ 1899.735208] [<ffffffff810b96a8>] out_of_memory+0x198/0x210
May 10 11:49:18 vdp kernel: [ 1899.735213] [<ffffffff810bcc66>] __alloc_pages_slowpath+0x4b6/0x5f0
May 10 11:49:18 vdp kernel: [ 1899.735220] [<ffffffff810bceda>] __alloc_pages_nodemask+0x13a/0x140
May 10 11:49:18 vdp kernel: [ 1899.735227] [<ffffffff810c039e>] __do_page_cache_readahead+0xce/0x220
May 10 11:49:19 vdp kernel: [ 1899.735235] [<ffffffff810c050c>] ra_submit+0x1c/0x30
May 10 11:49:19 vdp kernel: [ 1899.735239] [<ffffffff810b70f3>] filemap_fault+0x3c3/0x3d0
May 10 11:49:19 vdp kernel: [ 1899.735244] [<ffffffff810cfd77>] __do_fault+0x57/0x520
May 10 11:49:19 vdp kernel: [ 1899.735252] [<ffffffff810d46f9>] handle_mm_fault+0x199/0x430
May 10 11:49:19 vdp kernel: [ 1899.735260] [<ffffffff813a07df>] do_page_fault+0x1bf/0x3e0
May 10 11:49:19 vdp kernel: [ 1899.735268] [<ffffffff8139e0ff>] page_fault+0x1f/0x30
May 10 11:49:19 vdp kernel: [ 1899.736442] DWARF2 unwinder stuck at page_fault+0x1f/0x30
May 10 11:49:19 vdp kernel: [ 1899.736444]
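
In case anyone wants to check their own appliance for the same thing, here is a minimal Python sketch that picks the "invoked oom-killer" header lines out of a syslog file. It assumes the line format shown in the excerpt above; the function names (`find_oom_events`, `scan_file`) are my own, and the regular expression may need adjusting for other kernels or syslog configurations.

```python
import re

# Matches the header line the kernel writes when the OOM killer is invoked,
# in the format seen in the excerpt above (syslog timestamp, host, uptime).
OOM_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+kernel:\s+"
    r"\[\s*(?P<uptime>[\d.]+)\]\s+(?P<proc>\S+) invoked oom-killer: "
    r"gfp_mask=(?P<gfp>0x[0-9a-fA-F]+), order=(?P<order>\d+), "
    r"oom_adj=(?P<adj>-?\d+)"
)


def find_oom_events(lines):
    """Return one dict per 'invoked oom-killer' line found in `lines`."""
    events = []
    for line in lines:
        m = OOM_RE.match(line)
        if m is None:
            continue
        events.append({
            "timestamp": m.group("ts"),
            "host": m.group("host"),
            "uptime_s": float(m.group("uptime")),
            "process": m.group("proc"),        # process whose allocation failed
            "gfp_mask": int(m.group("gfp"), 16),
            "order": int(m.group("order")),
            "oom_adj": int(m.group("adj")),
        })
    return events


def scan_file(path="/var/log/messages"):
    """Hypothetical helper: scan a log file (default path is an assumption)."""
    with open(path, errors="replace") as fh:
        return find_oom_events(fh)


# Two lines copied from the excerpt above, used as sample input.
SAMPLE = [
    "May 10 11:45:55 vdp -- MARK --",
    "May 10 11:49:18 vdp kernel: [ 1899.735068] java invoked oom-killer: "
    "gfp_mask=0x201da, order=0, oom_adj=0",
]

for event in find_oom_events(SAMPLE):
    print(event["timestamp"], event["process"], "order", event["order"])
```

Note that the process named on that line is the one whose allocation triggered the OOM killer, which is not necessarily the process that gets killed afterwards.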