Amanda triggers OOM killer



beastyfahr
February 27th, 2007, 02:49 PM
Hi

I hope someone can shed some light on a problem I am experiencing with my AMANDA setup. Every time 'amdump' or 'amflush' runs, the server runs out of memory and the OOM killer (out-of-memory killer, http://linux-mm.org/OOM_Killer) steps in and kills a process; most often it is a process that was started right before amdump (e.g. postgres). Dumps are occasionally left on the holding disk (I then run 'amflush', which also gets a process killed due to memory issues).

The server:
- RHEL AS 4, kernel 2.6.9-5.ELsmp
- 8 GB memory
- 50 GB holding disk
- 4x Xeon 3.2 GHz CPUs

I believe 8 GB of physical memory (the server is not under other significant load during the backups) should be more than enough to handle all DLEs (disk list entries).

DLEs:
CLIENT PARTITION DUMP-TYPE APPROX SIZE
=========================================================
CLIENT1 /SVR1PART1 comp-user-tar 230M
CLIENT1 / root-tar 4G
CLIENT2 /opt comp-user-tar 500M
CLIENT2 /boot comp-user-tar 80M
CLIENT2 /SVR2PART3 user-tar 7G
CLIENT2 / root-tar-estsvr 6G
CLIENT3 /opt comp-user-tar 500M
CLIENT3 / root-tar-estsvr 5G
CLIENT3 /SVR3PART3 user-tar 1G
CLIENT3 /SVR3PART4 user-tar 15G
CLIENT3 /home user-tar 15G
CLIENT4 / root-tar-estsvr 2G
CLIENT4 /home comp-user-tar 100M
CLIENT4 /var/log comp-user-tar 150M
CLIENT4 /var/spool/mqueue comp-user-tar 40M
CLIENT4 /var/spool/mail comp-user-tar 800M

NB: root-tar-estsvr is a dump type where estimates are done by the Amanda server, not the client.
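For readers unfamiliar with that setup, such a dumptype can be defined with Amanda's "estimate" parameter. The actual definition was not posted, so the inheritance below is only a sketch:

```
# Sketch only - the poster's real definition was not shown.
define dumptype root-tar-estsvr {
    root-tar            # inherit the usual root-tar settings
    estimate server     # compute size estimates on the Amanda server
}
```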

Amanda dump log files show the following (process id differs each time, obviously):
/usr/sbin/amdump: line 116: 15691 Done $libexecdir/planner$SUF $conf "$@"
15692 Killed | $libexecdir/driver$SUF $conf "$@"

I thought possible memory leaks (although none are reported on the Zmanda web site) could cause this issue, so I put the actual backup command into a wrapper that attempts to jail the backup process with ulimit, with memory boundaries set to 2 GB. This has not brought the desired results - processes still get killed each time amdump is run.

#!/bin/bash
# ulimit values are in kilobytes: 2097152 KB = 2 GB
ulimit -m 2097152   # cap resident set size
ulimit -v 2097152   # cap virtual memory (address space)
/usr/sbin/amdump BACKUPCONFIG
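As a side note, one quick way to confirm that such limits actually reach child processes is a check like the following (my own sketch, not from the original post) - cap the shell's address space in a subshell and watch a child's large allocation fail:

```shell
#!/bin/sh
# Sanity check that 'ulimit -v' is inherited by child processes.
# Run inside a subshell so the cap does not stick to the login shell.
(
    ulimit -v 32768      # values in KB: 32768 KB = 32 MB cap
    # dd allocates a buffer of size bs; a 64 MB buffer cannot be
    # allocated under a 32 MB address-space cap, so dd should fail.
    if dd if=/dev/zero of=/dev/null bs=64M count=1 2>/dev/null; then
        echo "allocation succeeded"
    else
        echo "allocation failed (limit inherited by child)"
    fi
)
```

If the limit were not inherited, dd would succeed and print the first message instead.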

At present I have no idea what is happening - I have done the usual googling around, but found no posts relating Amanda to OOM kills or memory leaks.

Please if anyone could shed some light on this - even rough hints would be helpful.
Much appreciated.

Many thanks
Dmitri

ADDITIONAL INFO:
Zmanda package installed: amanda-backup_server-2.5.1p3-1.rhel4
DMESG output:
----------------------------------------------
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
cpu 2 hot: low 32, high 96, batch 16
cpu 2 cold: low 0, high 32, batch 16
cpu 3 hot: low 32, high 96, batch 16
cpu 3 cold: low 0, high 32, batch 16
cpu 4 hot: low 32, high 96, batch 16
cpu 4 cold: low 0, high 32, batch 16
cpu 5 hot: low 32, high 96, batch 16
cpu 5 cold: low 0, high 32, batch 16
cpu 6 hot: low 32, high 96, batch 16
cpu 6 cold: low 0, high 32, batch 16
cpu 7 hot: low 32, high 96, batch 16
cpu 7 cold: low 0, high 32, batch 16

Free pages: 3353720kB (3353024kB HighMem)
Active:11102 inactive:1015760 dirty:364692 writeback:3210 unstable:0 free:838430 slab:41662 mapped:7072 pagetables:477
DMA free:16kB min:16kB low:32kB high:48kB active:0kB inactive:44kB present:16384kB
protections[]: 0 0 0
Normal free:680kB min:936kB low:1872kB high:2808kB active:2112kB inactive:2272kB present:901120kB
protections[]: 0 0 0
HighMem free:3353024kB min:512kB low:1024kB high:1536kB active:42380kB inactive:4060640kB present:8257536kB
protections[]: 0 0 0
DMA: 2*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 16kB
Normal: 0*4kB 1*8kB 0*16kB 1*32kB 0*64kB 1*128kB 2*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 680kB
HighMem: 11302*4kB 9711*8kB 4413*16kB 1113*32kB 24785*64kB 9187*128kB 1087*256kB 127*512kB 16*1024kB 1*2048kB 0*4096kB = 3353024kB
Swap cache: add 47642, delete 47557, find 302822/305199, race 0+0
Out of Memory: Killed process 14533 (postmaster).
----------------------------------------------

tuhin
February 27th, 2007, 08:43 PM
Dear Sir,

I want to ask something regarding Amanda. To what address should I send mail? I am getting some problem. Please help me.

Thanks,

tuhin

paddy
March 2nd, 2007, 08:46 AM
Hi

I hope someone can shed some light on a problem I am experiencing with my AMANDA setup. Every time 'amdump' or 'amflush' runs, the server runs out of memory and the OOM killer (out-of-memory killer, http://linux-mm.org/OOM_Killer) steps in and kills a process; most often it is a process that was started right before amdump (e.g. postgres). Dumps are occasionally left on the holding disk (I then run 'amflush', which also gets a process killed due to memory issues).
....


Can you provide more information about your configuration?

1. Which version of Amanda are you using on the server?
2. Does the problem go away when you reduce the number of DLEs?
3. Some details of your amanda.conf (http://wiki.zmanda.com/index.php/Amanda.conf) parameters would be useful - in particular the values of inparallel and maxdumps, and whether tape spanning is used.
4. Use the "top" command to find out which process is using up the memory.
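For reference, the parameters in question appear in amanda.conf along these lines (values below are illustrative, not the poster's actual settings):

```
# Illustrative only - not the actual configuration from this thread.
inparallel 4    # how many dumpers may run at once, across all clients
maxdumps 1      # how many dumps may run at once on a single client
```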

Paddy

paddy
March 2nd, 2007, 08:47 AM
Dear Sir,

I want to ask something regarding Amanda. To what address should I send mail? I am getting some problem. Please help me.

Thanks,

tuhin

You can post your question here. If you need services support (with SLAs), please contact [email protected]

Paddy

beastyfahr
March 4th, 2007, 02:35 PM
Can you provide more information about your configuration?

1. Which version of Amanda are you using on the server?
2. Does the problem go away when you reduce the number of DLEs?
3. Some details of your amanda.conf (http://wiki.zmanda.com/index.php/Amanda.conf) parameters would be useful - in particular the values of inparallel and maxdumps, and whether tape spanning is used.
4. Use the "top" command to find out which process is using up the memory.

Paddy

Paddy, thanks for your response.

1. Amanda version: amanda-backup_server-2.5.1p3-1.rhel4

2. Yes, the problem seems to have gone away after removing the biggest DLEs (approx. 40 GB and 60 GB). However, my Nagios monitoring still reports low memory during Amanda dumps (free memory down to 0.5%, 1%, etc.). So far I have had one successful dump without any processes being killed: on 2007-03-02, even with the largest DLEs off the list, one process was still killed; on 2007-03-03, however, things went reasonably OK, apart from continuous high memory usage.

3. Please see attached amanda.conf.txt

4. During the "amflush" runs (I do not monitor with "top" during nightly backups), taper seems to use quite a lot of CPU and memory; MegaCtrl (Dell management console) also uses quite a bit (3.5% CPU, but not much memory). Should you require more information, I will do a few "top" dumps during the nightly backups.
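For the nightly runs, something like the following could capture the top memory consumers unattended (my own sketch - the log path and intervals are made up):

```shell
#!/bin/sh
# Hypothetical helper, not from the original post: log the processes
# with the largest resident set at intervals during a backup window,
# so the memory hog is visible in a log the next morning.
snapshot_mem() {
    log=$1 count=$2 interval=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        {
            date
            ps aux --sort=-rss | head -n 10   # largest RSS first
            echo
        } >> "$log"
        i=$((i + 1))
        sleep "$interval"
    done
}

# e.g. started just before amdump: one snapshot per minute for an hour
# snapshot_mem /tmp/amdump-mem.log 60 60
```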

martineau
March 5th, 2007, 09:34 AM
Did your biggest DLEs go to the holding disk, or did they go directly to tape?
Do you have a tape_splitsize defined?

beastyfahr
March 5th, 2007, 12:34 PM
Did your biggest DLEs go to the holding disk, or did they go directly to tape?
Do you have a tape_splitsize defined?
martineau, thank you for your response, please see answers below:

1. That would vary from time to time; in most cases one or both of the biggest DLEs (again, approx. 40 GB and 60 GB respectively) would be left on the holding disk. I recall having to run amflush almost every morning, because either amdump or amflush had been killed by the OOM killer.
2. My "tape_splitsize" is unchanged from the default Amanda config. The dump types that make use of this parameter are not used in my DLE list (/etc/amanda/CONF/disklist). Also, our LTO-2 tapes' native capacity is 200 GB and the dumps would not exceed this size even if we did full dumps every night, so as far as I understand there is no need to define it.
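For completeness, if tape spanning ever did become necessary (e.g. a DLE larger than one tape), a dumptype would gain parameters along these lines - the name, path, and sizes below are purely illustrative:

```
# Illustrative only - this setup does not use tape spanning.
define dumptype user-tar-span {
    user-tar
    tape_splitsize 10 Gb               # write the dump in 10 GB chunks
    split_diskbuffer "/amanda/split"   # scratch space for splitting (hypothetical path)
    fallback_splitsize 64 Mb           # in-memory buffer if no disk buffer is available
}
```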