Results 1 to 4 of 4

Thread: Split diskbuffer problem

  1. #1
    Join Date
    Nov 2009
    Location
    Philadelphia
    Posts
    2

    Thumbs up Split diskbuffer problem

    Hi,

    I am using Amanda v2.6.1p2 compiled from source on Centos 5.4 (kernel 2.6.18-164.9.1.el5). Things have been working fine until I added some really large DLEs to the disklist (4 of them are ~700Gb). Because those DLEs don't fit on the holding disk, Amanda uses the "split diskbuffer" feature to split the large DLEs into smaller chunks and store them on disk. Amanda's "taper" in the meantime writes these chunks to tape. Once they are written to tape, Amanda should delete/release the "split diskbuffer" file. The problem I'm seeing is that the "split diskbuffer" files fill up the partition they are created on, apparently because Amanda's "dumper" doesn't release them after they are marked as deleted. This causes the partition on which the "split diskbuffer" files are stored to become full and the "dumper" hangs. As soon as this happens, the "taper" hangs as well.

    Here is my complete Amanda configuration which causes the above to happen:
    Code:
    org      "myorg"
    mailto   "amanda@my.org"
    dumpuser "amanda"
    inparallel 2
    dumporder "sS"
    taperalgo smallest
    displayunit "m"
    netusage  75000 Kbps
    dumpcycle 1 week
    runspercycle 7
    tapecycle 30 tapes
    bumpsize 20 Mb 
    bumppercent 20
    bumpdays 1
    bumpmult 4
    etimeout 432000
    dtimeout 21600
    ctimeout 30 
    device_output_buffer_size 1280k
    usetimestamps yes
    flush-threshold-dumped 0
    flush-threshold-scheduled 0
    taperflush 0
    autoflush no
    runtapes 7
    tpchanger "chg-zd-mtx"
    changerdev "/dev/changer"
    tapedev "/dev/nst0"
    changerfile "/var/lib/amanda/Daily1/changer-zd-mtx"
    maxdumpsize -1
    tapetype LTO-4
    labelstr "^PCBI*"
    amrecover_do_fsf yes
    amrecover_check_label yes
    amrecover_changer "chg-zd-mtx" 
    
    holdingdisk hd1 {
        comment "main holding disk"
        directory "/data/amanda/holdingdisk/Daily1/"
        use 50Gb
    }
    
    logdir   "/var/lib/amanda/Daily1/log"
    infofile "/var/lib/amanda/Daily1/curinfo"
    indexdir "/var/lib/amanda/Daily1/index"
    tapelist "/var/lib/amanda/Daily1/tapelist"
    
    define tapetype LTO-4 {
        comment "LTO-4"
        length 772096 mbytes
        filemark 0 kbytes
        speed 31368 kps
    }
    
    define dumptype global {
        index yes
    }
    
    define dumptype nocomp-tar {
        global
        program "GNUTAR"
        compress none
        index yes
        record yes
    }
    
    define dumptype nocomp-tar-split {
        global
        program "GNUTAR"
        compress none
        index yes
        record yes
        tape_splitsize 50 Gb
        split_diskbuffer "/data/amanda/diskbuffer"
        fallback_splitsize 2 gb
        #estimate server
        estimate calcsize
    }
    
    define interface local {
        comment "a local disk"
        use 8000 kbps
    }
    The partition on which the holding disk resides is 148Gb (which has about 140Gb of free space when Amanda is not running):
    Code:
    Filesystem                           Size  Used Avail Use% Mounted on
    /dev/mapper/SystemVG-data 148G  141G     0 100% /data
    As you can see, it is currently full because of the multiple Amanda "split diskbuffer" files that are deleted but not released by the "dumper" process (as shown by lsof):

    Code:
    lsof | grep taper|grep diskbuffer|cat -n
         1  taper     23660    amanda    6u      REG              253,5       10240    3850286 /data                                                              /amanda/diskbuffer/amanda-split-buffer-HQM26U (deleted)
         2  taper     23660    amanda    7u      REG              253,5       10240    3850285 /data                                                              /amanda/diskbuffer/amanda-split-buffer-G7RS6U (deleted)
         3  taper     23660    amanda    8u      REG              253,5   456775680    3850287 /data                                                              /amanda/diskbuffer/amanda-split-buffer-065X6U (deleted)
         4  taper     23660    amanda    9u      REG              253,5     1720320    3850288 /data                                                              /amanda/diskbuffer/amanda-split-buffer-5T2O6U (deleted)
         5  taper     23660    amanda   11u      REG              253,5  1349068800    3850289 /data                                                              /amanda/diskbuffer/amanda-split-buffer-5PG56U (deleted)
         6  taper     23660    amanda   12u      REG              253,5  1797795840    3850290 /data                                                              /amanda/diskbuffer/amanda-split-buffer-D9IM6U (deleted)
         7  taper     23660    amanda   13u      REG              253,5 53687091200    3850291 /data                                                              /amanda/diskbuffer/amanda-split-buffer-NKCS6U (deleted)
         8  taper     23660    amanda   14u      REG              253,5 53687091200    3850292 /data                                                              /amanda/diskbuffer/amanda-split-buffer-LC456U (deleted)
         9  taper     23660    amanda   15u      REG              253,5 38789267456    3850293 /data                                                              /amanda/diskbuffer/amanda-split-buffer-LL756U (deleted)
        10  taper     23660    amanda   16u      REG              253,5           0    3850294 /data                                                              /amanda/diskbuffer/amanda-split-buffer-V26X6U (deleted)
    2 days ago when I first noticed that the partition is getting full, I decreased the number of dumpers from 4 to 2, thinking that, if each dumper creates a 50Gb "split diskbuffer" file, the partition won't get full (this would require ~100Gb and the /data partition has 140Gb available). Nevertheless, this didn't help, because the dumper doesn't release the "split diskbuffer" files which I think are written to tape.

    After the /data partition gets full, Amanda writes the following to 'amdump' and 'log' log files:
    Code:
    -bash-3.2$ pwd
    /var/lib/amanda
    -bash-3.2$ tail Daily1/log/log
    PART taper PCBI27L4 999 hostname /mnt/files/pbic 20100115143301 867/-1 0 [sec 0.000106 kb 0 kps 0.000000]
    PART taper PCBI27L4 1000 hostname /mnt/files/pbic 20100115143301 868/-1 0 [sec 0.000090 kb 0 kps 0.000000]
    PART taper PCBI27L4 1001 hostname /mnt/files/pbic 20100115143301 869/-1 0 [sec 0.000101 kb 0 kps 0.000000]
    PART taper PCBI27L4 1002 hostname /mnt/files/pbic 20100115143301 870/-1 0 [sec 0.000097 kb 0 kps 0.000000]
    PART taper PCBI27L4 1003 hostname /mnt/files/pbic 20100115143301 871/-1 0 [sec 0.000094 kb 0 kps 0.000000]
    PART taper PCBI27L4 1004 hostname /mnt/files/pbic 20100115143301 872/-1 0 [sec 0.000092 kb 0 kps 0.000000]
    PART taper PCBI27L4 1005 hostname /mnt/files/pbic 20100115143301 873/-1 0 [sec 0.000104 kb 0 kps 0.000000]
    PART taper PCBI27L4 1006 hostname /mnt/files/pbic 20100115143301 874/-1 0 [sec 0.000093 kb 0 kps 0.000000]
    PART taper PCBI27L4 1007 hostname /mnt/files/pbic 20100115143301 875/-1 0 [sec 0.000094 kb 0 kps 0.000000]
    PART taper PCBI27L4 1008 hostname /mnt/files/pbic 20100115143301 876/-1 0 [sec 0.000097 kb 0 kps 0.000000]
    -bash-3.2$ tail Daily1/log/amdump
    driver: hdisk-state time 103332.747 hdisk 0: free 52428800 dumpers 0
    driver: result time 103332.747 from taper: PARTDONE 00-00100 PCBI27L4 1008 0 "[sec 0.000097 kb 0 kps 0.000000]"
    driver: state time 103337.429 free kps: 34942 space: 52428800 taper: writing idle-dumpers: 1 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: not-idle
    driver: interface-state time 103337.429 if default: free 34942
    driver: hdisk-state time 103337.429 hdisk 0: free 52428800 dumpers 0
    driver: result time 103337.429 from taper: PARTDONE 00-00100 PCBI27L4 1009 0 "[sec 0.000102 kb 0 kps 0.000000]"
    driver: state time 103342.171 free kps: 34942 space: 52428800 taper: writing idle-dumpers: 1 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: not-idle
    driver: interface-state time 103342.171 if default: free 34942
    driver: hdisk-state time 103342.171 hdisk 0: free 52428800 dumpers 0
    driver: result time 103342.171 from taper: PARTDONE 00-00100 PCBI27L4 1010 0 "[sec 0.000099 kb 0 kps 0.000000]"
    Hence, I'm wondering why do the "dumpers" store more than 2 "split diskbuffer" files on the /data partition? Isn't the "dumper" supposed to create a "split diskbuffer" file and release it once it's written to tape? Why doesn't this happen above?

    ps: I just killed the "taper" process and saw that the "split diskbuffer" files were finally released and the /data partition looks like this now:
    Code:
    Filesystem                          Size  Used Avail Use% Mounted on
    /dev/mapper/SystemVG-data 148G  543M  140G   1% /data
    Maybe the "taper" has a bug?

    Best,
    Valeriu
    Last edited by vlr; January 18th, 2010 at 11:27 AM. Reason: Solved

  2. #2
    Join Date
    Mar 2007
    Location
    Chicago, IL
    Posts
    688

    Default

    Amazing - this is a bug that has been present for over a year now, and I've only seen two reports of it - yours and another about 5 days ago.

    Yes, the taper has a bug. If you've compiled from source, the fix is actually quite simple:
    Code:
    Index: server-src/taper-disk-port-source.c
    ===================================================================
    --- server-src/taper-disk-port-source.c (revision 2169)
    +++ server-src/taper-disk-port-source.c (working copy)
    @@ -94,6 +94,9 @@
         if(G_OBJECT_CLASS(parent_class)->finalize)
             (* G_OBJECT_CLASS(parent_class)->finalize)(obj_self);
     
    +    if (self->buffer_fd >= 0)
    +       amclose(self->buffer_fd);
    +
         amfree(self->buffer_dir_name);
         amfree(self->_priv->excess_buffer);
         amfree(self->_priv);
    This code has been completely erased (the taper has been rewritten) in the next release, so there's no fix needed in trunk.

  3. #3
    Join Date
    Mar 2007
    Location
    Chicago, IL
    Posts
    688

    Default

    I just committed the fix to the 2.6.1 branch, so if and when we release a final 2.6.1 patch version, it will include this fix.

  4. #4
    Join Date
    Nov 2009
    Location
    Philadelphia
    Posts
    2

    Smile

    Hi Dustin,

    I applied the patch and I think it fixed the problem. I no longer see multiple "diskbuffer" files with 'lsof', but only one as shown below:

    Code:
    [root@cronos ~]# date
    Mon Jan 18 12:03:42 EST 2010
    [root@cronos ~]# lsof | grep "diskbuffer"
    taper     23281    amanda    6u      REG              253,5 53687091200    3850285 /data/amanda/diskbuffer/amanda-split-buffer-O5TS6U (deleted)
    Code:
    [root@cronos ~]# date
    Mon Jan 18 15:16:31 EST 2010
    [root@cronos ~]# lsof|grep diskbuffer
    taper     23281    amanda    6u      REG              253,5 53687091200    3850285 /data/amanda/diskbuffer/amanda-split-buffer-TPQR6U (deleted)
    Also, Amanda doesn't use more than the imposed limit of 50Gb on the /data partition:
    Code:
    [root@cronos ~]# df -h /data
    Filesystem                          Size  Used Avail Use% Mounted on
    /dev/mapper/SystemVG-data 148G   51G   90G  37% /data
    I'm glad Amanda works as expected now. Thanks for the quick fix!

    Cheers,
    Valeriu

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •