beastyfahr
February 27th, 2007, 03:49 PM
Hi,
I hope someone can shed some light on a problem I am experiencing with my AMANDA setup. Every time 'amdump' or 'amflush' is run, the server runs out of memory and the OOM killer (out-of-memory killer, http://linux-mm.org/OOM_Killer) steps in and kills a process; most often it is a process that was started right before amdump (e.g. postgres). Dumps are occasionally left on the holding disk, and running 'amflush' to clear them also gets a process killed due to memory issues.
The server:
- RHEL AS 4, kernel 2.6.9-5.ELsmp
- 8 GB memory
- 50 GB holding disk
- 4x Xeon 3.2 GHz CPUs
I believe 8 GB of physical memory (the server is under no other significant load during the backups) should be more than enough to handle all DLEs (disk list entries).
DLEs:
CLIENT    PARTITION           DUMP-TYPE        APPROX SIZE
========  ==================  ===============  ===========
CLIENT1 /SVR1PART1 comp-user-tar 230M
CLIENT1 / root-tar 4G
CLIENT2 /opt comp-user-tar 500M
CLIENT2 /boot comp-user-tar 80M
CLIENT2 /SVR2PART3 user-tar 7G
CLIENT2 / root-tar-estsvr 6G
CLIENT3 /opt comp-user-tar 500M
CLIENT3 / root-tar-estsvr 5G
CLIENT3 /SVR3PART3 user-tar 1G
CLIENT3 /SVR3PART4 user-tar 15G
CLIENT3 /home user-tar 15G
CLIENT4 / root-tar-estsvr 2G
CLIENT4 /home comp-user-tar 100M
CLIENT4 /var/log comp-user-tar 150M
CLIENT4 /var/spool/mqueue comp-user-tar 40M
CLIENT4 /var/spool/mail comp-user-tar 800M
NB: root-tar-estsvr is a dump type where estimates are done by the Amanda server, not the client.
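For context, server-side estimates are selected per dumptype in amanda.conf. The fragment below is only a sketch of how a dumptype like root-tar-estsvr could be defined (the name comes from this post; the actual options in the real config may differ):

```
# Sketch of a dumptype using server-side estimates (hypothetical options)
define dumptype root-tar-estsvr {
    root-tar                  # inherit everything from the root-tar dumptype
    comment "root partitions, size estimated on the Amanda server"
    estimate server           # server guesses size from history, no client sendsize run
}
```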
Amanda dump log files show the following (the process IDs differ on each run, obviously):
/usr/sbin/amdump: line 116: 15691 Done $libexecdir/planner$SUF $conf "$@"
15692 Killed | $libexecdir/driver$SUF $conf "$@"
I thought possible memory leaks (although none are reported on the Zmanda web site) could be causing this, so I put the actual backup command into a wrapper script that tries to jail the backup process with ulimit, with memory boundaries set to 2 GB. This has not brought the desired result: processes still get killed each time amdump is run.
#!/bin/bash
# ulimit takes values in kB, so 2097152 kB = 2 GB
ulimit -m 2097152   # max resident set size
ulimit -v 2097152   # max virtual memory
/usr/sbin/amdump BACKUPCONFIG
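For reference, a quick way to confirm that limits set this way actually propagate to child processes such as amdump (a sketch; the value is in kB, so 2097152 kB = 2 GB):

```shell
#!/bin/bash
# Sketch: verify that a ulimit set in a wrapper is inherited by children.
ulimit -v 2097152          # cap virtual memory at 2 GB (value in kB)
bash -c 'ulimit -v'        # a child shell should report the same limit: 2097152
```

Note that ulimit applies per process, not to the process tree as a whole, so several amdump children can each grow toward the limit at the same time.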
At present I have no idea what is happening. I have done the usual googling around but found no posts relating Amanda to OOM kills or memory leaks.
If anyone could shed some light on this, even rough hints would be much appreciated.
Many thanks
Dmitri
ADDITIONAL INFO:
Zmanda package installed: amanda-backup_server-2.5.1p3-1.rhel4
DMESG output:
----------------------------------------------
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
cpu 2 hot: low 32, high 96, batch 16
cpu 2 cold: low 0, high 32, batch 16
cpu 3 hot: low 32, high 96, batch 16
cpu 3 cold: low 0, high 32, batch 16
cpu 4 hot: low 32, high 96, batch 16
cpu 4 cold: low 0, high 32, batch 16
cpu 5 hot: low 32, high 96, batch 16
cpu 5 cold: low 0, high 32, batch 16
cpu 6 hot: low 32, high 96, batch 16
cpu 6 cold: low 0, high 32, batch 16
cpu 7 hot: low 32, high 96, batch 16
cpu 7 cold: low 0, high 32, batch 16
Free pages: 3353720kB (3353024kB HighMem)
Active:11102 inactive:1015760 dirty:364692 writeback:3210 unstable:0 free:838430 slab:41662 mapped:7072 pagetables:477
DMA free:16kB min:16kB low:32kB high:48kB active:0kB inactive:44kB present:16384kB
protections[]: 0 0 0
Normal free:680kB min:936kB low:1872kB high:2808kB active:2112kB inactive:2272kB present:901120kB
protections[]: 0 0 0
HighMem free:3353024kB min:512kB low:1024kB high:1536kB active:42380kB inactive:4060640kB present:8257536kB
protections[]: 0 0 0
DMA: 2*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 16kB
Normal: 0*4kB 1*8kB 0*16kB 1*32kB 0*64kB 1*128kB 2*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 680kB
HighMem: 11302*4kB 9711*8kB 4413*16kB 1113*32kB 24785*64kB 9187*128kB 1087*256kB 127*512kB 16*1024kB 1*2048kB 0*4096kB = 3353024kB
Swap cache: add 47642, delete 47557, find 302822/305199, race 0+0
Out of Memory: Killed process 14533 (postmaster).
----------------------------------------------