
Thread: Amanda triggers OOM killer

  1. #1
    Join Date
    Feb 2007
    Location
    Sydney, Australia
    Posts
    3

    Default Amanda triggers OOM killer

    Hi

    I hope someone can shed some light on a problem I am experiencing with my AMANDA setup. Every time 'amdump' or 'amflush' is run, the server runs out of memory and the OOM killer (out-of-memory killer, [url]http://linux-mm.org/OOM_Killer[/url]) steps in and kills a process - most often a process that was started right before amdump (e.g. postgres). Dumps are occasionally left on the holding disk; when I then run 'amflush', it too gets a process killed due to memory pressure.

    The server:
    - RHEL AS 4, kernel 2.6.9-5.ELsmp
    - 8 GB memory
    - 50 GB holding disk
    - 4x Xeon 3.2 GHz CPUs

    I believe 8 GB of physical memory (the server is not under other significant load during the backups) should be more than enough to handle all DLEs (disk list entries).

    DLEs:
    CLIENT PARTITION DUMP-TYPE APPROX SIZE
    =========================================================
    CLIENT1 /SVR1PART1 comp-user-tar 230M
    CLIENT1 / root-tar 4G
    CLIENT2 /opt comp-user-tar 500M
    CLIENT2 /boot comp-user-tar 80M
    CLIENT2 /SVR2PART3 user-tar 7G
    CLIENT2 / root-tar-estsvr 6G
    CLIENT3 /opt comp-user-tar 500M
    CLIENT3 / root-tar-estsvr 5G
    CLIENT3 /SVR3PART3 user-tar 1G
    CLIENT3 /SVR3PART4 user-tar 15G
    CLIENT3 /home user-tar 15G
    CLIENT4 / root-tar-estsvr 2G
    CLIENT4 /home comp-user-tar 100M
    CLIENT4 /var/log comp-user-tar 150M
    CLIENT4 /var/spool/mqueue comp-user-tar 40M
    CLIENT4 /var/spool/mail comp-user-tar 800M

    NB: root-tar-estsvr is a dump type where estimates are done by the Amanda server, not the client.

    Amanda dump log files show the following (process id differs each time, obviously):
    /usr/sbin/amdump: line 116: 15691 Done $libexecdir/planner$SUF $conf "$@"
    15692 Killed | $libexecdir/driver$SUF $conf "$@"

    I thought that a possible memory leak (although none is reported on the Zmanda web site) could be causing this, so I put the actual backup command into a wrapper that attempts to jail the backup process with ulimit, with the memory boundaries set to 2 GB. This has not brought the desired result - processes still get killed each time amdump is run.

    #!/bin/bash
    # Limit the maximum resident set size to 2 GB (2097152 KB).
    # (As far as I know, -m / RLIMIT_RSS is not actually enforced on 2.6 kernels.)
    ulimit -m 2097152
    # Limit the virtual address space to 2 GB (2097152 KB).
    ulimit -v 2097152
    # Both limits are inherited by amdump and everything it spawns.
    /usr/sbin/amdump BACKUPCONFIG

    At present I have no idea what is happening - I have done the usual googling around, but found no posts relating Amanda to OOM or memory-leak issues.

    If anyone could shed some light on this, even rough hints would be helpful.
    Much appreciated.

    Many thanks
    Dmitri

    ADDITIONAL INFO:
    Zmanda package installed: amanda-backup_server-2.5.1p3-1.rhel4
    DMESG output:
    ----------------------------------------------
    HighMem per-cpu:
    cpu 0 hot: low 32, high 96, batch 16
    cpu 0 cold: low 0, high 32, batch 16
    cpu 1 hot: low 32, high 96, batch 16
    cpu 1 cold: low 0, high 32, batch 16
    cpu 2 hot: low 32, high 96, batch 16
    cpu 2 cold: low 0, high 32, batch 16
    cpu 3 hot: low 32, high 96, batch 16
    cpu 3 cold: low 0, high 32, batch 16
    cpu 4 hot: low 32, high 96, batch 16
    cpu 4 cold: low 0, high 32, batch 16
    cpu 5 hot: low 32, high 96, batch 16
    cpu 5 cold: low 0, high 32, batch 16
    cpu 6 hot: low 32, high 96, batch 16
    cpu 6 cold: low 0, high 32, batch 16
    cpu 7 hot: low 32, high 96, batch 16
    cpu 7 cold: low 0, high 32, batch 16

    Free pages: 3353720kB (3353024kB HighMem)
    Active:11102 inactive:1015760 dirty:364692 writeback:3210 unstable:0 free:838430 slab:41662 mapped:7072 pagetables:477
    DMA free:16kB min:16kB low:32kB high:48kB active:0kB inactive:44kB present:16384kB
    protections[]: 0 0 0
    Normal free:680kB min:936kB low:1872kB high:2808kB active:2112kB inactive:2272kB present:901120kB
    protections[]: 0 0 0
    HighMem free:3353024kB min:512kB low:1024kB high:1536kB active:42380kB inactive:4060640kB present:8257536kB
    protections[]: 0 0 0
    DMA: 2*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 16kB
    Normal: 0*4kB 1*8kB 0*16kB 1*32kB 0*64kB 1*128kB 2*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 680kB
    HighMem: 11302*4kB 9711*8kB 4413*16kB 1113*32kB 24785*64kB 9187*128kB 1087*256kB 127*512kB 16*1024kB 1*2048kB 0*4096kB = 3353024kB
    Swap cache: add 47642, delete 47557, find 302822/305199, race 0+0
    Out of Memory: Killed process 14533 (postmaster).
    ----------------------------------------------

  2. #2

    Default Want to ask something regarding Amanda (new user)

    Dear Sir,

    I want to ask something regarding Amanda. To what address should I send mail? I am having some problems. Please help me.

    Thanks,

    tuhin

  3. #3

    Default

    Quote Originally Posted by beastyfahr View Post
    Hi

    I hope someone can shed some light on a problem I am experiencing with my AMANDA setup. Every time 'amdump' or 'amflush' is run, the server runs out of memory and the OOM killer (out-of-memory killer, [url]http://linux-mm.org/OOM_Killer[/url]) steps in and kills a process - most often a process that was started right before amdump (e.g. postgres). Dumps are occasionally left on the holding disk; when I then run 'amflush', it too gets a process killed due to memory pressure.
    ....
    Can you provide more information about your configuration?

    1. Which version of Amanda are you using on the server?
    2. Does the problem go away when you reduce the number of DLEs?
    3. Some details of the [URL=http://wiki.zmanda.com/index.php/Amanda.conf]amanda.conf[/URL] parameters would be useful - the values of inparallel and maxdumps, and whether tape spanning is used.
    4. Use the "top" command to find out which process is using up the memory (a rough way of gathering 3 and 4 is sketched below).

    Paddy

  4. #4

    Default

    Quote Originally Posted by tuhin View Post
    Dear Sir,

    I want to ask something regarding Amanda. To what address should I send mail? I am having some problems. Please help me.

    Thanks,

    tuhin
    You can post your question here. If you need services support (with SLAs), please contact [email]services@zmanda.com[/email].

    Paddy

  5. #5
    Join Date
    Feb 2007
    Location
    Sydney, Australia
    Posts
    3

    Default

    Quote Originally Posted by paddy View Post
    Can you provide more information about your configuration?

    1. Which version of Amanda are you using on the server?
    2. Does the problem go away when you reduce the number of DLEs?
    3. Some details of the [URL=http://wiki.zmanda.com/index.php/Amanda.conf]amanda.conf[/URL] parameters would be useful - the values of inparallel and maxdumps, and whether tape spanning is used.
    4. Use the "top" command to find out which process is using up the memory.

    Paddy
    Paddy, thanks for your response.

    1. Amanda version: amanda-backup_server-2.5.1p3-1.rhel4

    2. Yes, the problem seems to have gone away after removing the biggest DLEs (approx. 40 GB and 60 GB). However, my Nagios monitoring still reports low memory during the AMANDA dumps (free memory down to 0.5%, 1%, etc.). So far I have had one successful dump without any processes being killed: on 2007-03-02, even with the largest DLEs off the list, one process was still killed; on 2007-03-03, however, things went reasonably OK, apart from continuously high memory usage.

    3. Please see attached amanda.conf.txt

    4. During the "amflush" runs (I do not monitor with "top" during the nightly backups), taper seems to use quite a lot of CPU and memory; MegaCtrl (the Dell management console) also uses quite a bit of CPU (3.5%, but not much memory). Should you require more information, I will do a few "top" dumps during the nightly backups, roughly along the lines of the sketch below.
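
    This is the kind of thing I have in mind for the overnight snapshots - a rough, untested sketch; the log path and the eight-hour window are placeholders:

    #!/bin/bash
    # Rough sketch: log memory usage every 5 minutes across the backup window.
    LOG=/var/log/amanda-mem-watch.log      # placeholder path

    for i in $(seq 1 96); do               # 96 x 5 minutes = 8 hours
        {
            date
            free -m
            top -b -n 1 | head -25
            echo "----"
        } >> "$LOG"
        sleep 300
    done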
    Attached Files: amanda.conf.txt

  6. #6
    Join Date
    Nov 2005
    Location
    Canada
    Posts
    1,049

    Default

    Did your biggest DLEs go to the holding disk, or did they go directly to tape?
    Do you have a tape_splitsize defined?

  7. #7
    Join Date
    Feb 2007
    Location
    Sydney, Australia
    Posts
    3

    Default

    Quote Originally Posted by martineau View Post
    Did your biggest DLEs go to the holding disk, or did they go directly to tape?
    Do you have a tape_splitsize defined?
    martineau, thank you for your response; please see my answers below:

    1. That would vary from time to time; in most cases one or both of the biggest DLEs (again, approx. 40 GB and 60 GB respectively) would be left on the holding disk (roughly how I check what is left there is sketched below). I recall having to run amflush almost every morning, because either amdump or amflush had been killed by the OOM killer.
    2. My "tape_splitsize" is unchanged as in the default AMANDA config. Those dump types that make use of this parameter are not used in my DLE list (/etc/amanda/CONF/disklist). Also, our LTO-2 tape native size is 200G and the dumps do not exceed this size even if we'd do full dumps evry night, therefore there's no need to define it, as far as I understand.
