Thread: dumps of Amanda server's local root partition failing

  1. #1
    Join Date
    Aug 2006
    Posts
    34

    dumps of Amanda server's local root partition failing

    Hi all,
    I am trying to troubleshoot repeated dump failures that occur only when my Amanda server tries to back up its local root partition. Similar failures also occasionally occur on /var/lib/amanda, but usually the disk is retried and completes later in the run.

    middenheap.xxx.edu / lev 1 FAILED [cannot read header: got 0 instead of 32768]
    middenheap.xxx.edu / lev 1 FAILED [cannot read header: got 0 instead of 32768]
    middenheap.xxx.edu / lev 1 FAILED [too many dumper retry: "[request failed: timeout waiting for ACK]

    Yesterday I made the following adjustments to amanda.conf and disklist in an effort to troubleshoot:
    * increased "inparallel" value from 12 to 14
    * gave middenheap's DLEs the following spindle numbers so that the largest disks won't try to back up simultaneously:

    middenheap.xxx.edu / {
    server-encrypt-root-fast
    exclude list ".am_exclude"
    } 2 local
    middenheap.xxx.edu /etc/amanda amanda-config-backup 2 local
    middenheap.xxx.edu /var/lib/amanda amanda-config-backup 3 local
    middenheap.xxx.edu /etc include 2 local
    middenheap.xxx.edu /var server-encrypt-fast 3 local

    The associated backup types:
    Code:
    define dumptype server-encrypt-fast {
          global
          program "GNUTAR"
          comment "dump with fast client compression and server openssl asymmetric encryption"
          compress client fast
          encrypt  server
          index
          server_encrypt "/usr/sbin/amcrypt-ossl-asym"
          server_decrypt_option "-d"
          priority medium
    }
    
    # high priority for user data
    define dumptype server-encrypt-user-fast {
      server-encrypt-fast
      priority high
    }
    
    # low priority for root partitions
    define dumptype server-encrypt-root-fast {
      server-encrypt-fast
      priority low
    }
    # amanda config/bootstrap file backups
    define dumptype amanda-config-backup {    #root-tar
       server-encrypt-fast
       comment "force Level 0 backups for Amanda config files"
       strategy noinc
    }
    define dumptype include {
       amanda-config-backup
       comment "force Level 0 backups and only backup files specified in the include list"
       priority medium
       include list "/var/lib/amanda/srv_configs"
    }
    The .am_exclude file for / lists /amandatapes, which is a symbolic link to a separate partition. The entry is probably unnecessary, but I added it for good measure.

    Excerpts from /tmp/amanda/client/ISDaily2.5/sendbackup*.debug:
    sendbackup: time 318.720: started backup
    sendbackup: time 2665.645: 118: strange(?): sed: couldn't write 72 items to stdout: Broken pipe
    sendbackup: time 2677.876: 118: strange(?):
    sendbackup: time 2677.876: 118: strange(?): gzip: stdout: Broken pipe
    sendbackup: time 2678.156: index tee cannot write [Broken pipe]
    sendbackup: time 2678.156: pid 30607 finish time Wed Jan 10 04:53:34 2007

    sendbackup: time 0.450: started backup
    sendbackup: time 927.562: 118: strange(?): sed: couldn't write 47 items to stdout: Broken pipe
    sendbackup: time 928.079: 118: strange(?):
    sendbackup: time 928.079: 118: strange(?): gzip: stdout: Broken pipe
    sendbackup: time 928.088: index tee cannot write [Broken pipe]

    Why is the amanda client on middenheap losing connectivity with the amanda server process, also on middenheap, after only 900 seconds? There is a line in my iptables rules explicitly allowing all traffic on lo, so that would seem to rule out firewall issues.
    -A RH-Firewall-1-INPUT -i lo -j ACCEPT

    I've already made the suggested changes to /etc/modprobe.conf:
    options ip_conntrack_amanda master_timeout=3600

    And here's an error I've never noticed before in ~amandabackup/ISDaily2.5/amdump, "timeout waiting for REP":
    Code:
    driver: result time 1004.310 from chunker13: FAILED 14-00016 "[cannot read header: got 0 instead of 32768]"
    driver: state time 1004.310 free kps: 82966 space: 322408352 taper: idle idle-dumpers: 0 qlen tapeq: 0 runq: 31 roomq: 0 wakeup: 0    driver-idle: no-dumpers
    driver: interface-state time 1004.310 if default: free 72566 if LOCAL: free 10000 if LE0: free 400
    driver: hdisk-state time 1004.310 hdisk 0: free 322408352 dumpers 14
    driver: result time 1004.310 from dumper13: TRY-AGAIN 14-00016 "[request failed: timeout waiting for REP]"
    Most of the other related errors in amdump.1 look like this with "timeout waiting for ACK":
    Code:
    driver: hdisk-state time 3108.065 hdisk 0: free 322012874 dumpers 14
    driver: result time 3108.065 from dumper13: TRY-AGAIN 13-00017 "[request failed: timeout waiting for ACK]"
    driver: dump failed 13-00017 middenheap.xxx.edu /, too many dumper retry: "[request failed: timeout waiting for ACK]"
    rename_tmp_holding: /amandahold/20070110031502/middenheap.xxx.edu._.1: empty file?
    driver: adjust_diskspace: time 3108.460: middenheap.xxx.edu:/ /amandahold/20070110031502/middenheap.xxx.edu._.1
    driver: adjust_diskspace: time 3108.460: hdisk HD1 done, reserved 1344 used 0 diff -1344 alloc 71148758 dumpers 13
    driver: adjust_diskspace: time 3108.460: after: disk middenheap.xxx.edu:/ used 0
    I'm really stumped! All backups on all other clients (49 DLEs total) are working fine. Any ideas?

  2. #2
    Join Date
    Apr 2006
    Posts
    116

    iptables

    I think this could be caused by the firewall not allowing the client to connect back to the server. Since you are using an FQDN (middenheap.xxx.edu) and not "localhost", the traffic may not be going through "lo". I would suggest trying "localhost" instead.

    Thanks

  3. #3
    Join Date
    Aug 2006
    Posts
    34

    Quote Originally Posted by ppragin View Post
    I think this could be caused by the firewall not allowing the client to connect back to the server. Since you are using a FQDN middenheap.xxx.edu and not "localhost" the traffic may not be going through "lo". I would suggest trying to use "localhost" instead.
    Hi Ppragin - I tried this for last night's dumps, but both / and /var failed with the same errors. From amstatus:

    localhost:/ 0 driver: (aborted:"[request failed: timeout waiting for ACK]")(too many dumper retry)
    localhost:/var 0 driver: (aborted:"[request failed: timeout waiting for ACK]")(too many dumper retry)

    in sendbackup.20070112032821.debug:
    Code:
    sendbackup: time 65.848: started backup
    sendbackup: time 2442.495: 118: strange(?): sed: couldn't write 72 items to stdout: Broken pipe
    sendbackup: time 2460.388: 118: strange(?):
    sendbackup: time 2460.388: 118: strange(?): gzip: stdout: Broken pipe
    sendbackup: time 2460.684: index tee cannot write [Broken pipe]
    sendbackup: time 2460.684: pid 7933 finish time Fri Jan 12 04:10:14 2007
    sendbackup: time 2460.685: 118: strange(?): sendbackup: index tee cannot write [Broken pipe]
    sendbackup: time 2460.731:  47:    size(|): Total bytes written: 798720 (780KiB, ?/s)
    sendbackup: time 2460.731: 118: strange(?): gtar: -: Wrote only 8192 of 10240 bytes
    sendbackup: time 2460.731: 118: strange(?): gtar: Error is not recoverable: exiting now
    sendbackup: time 2460.732: error [compress returned 1, /bin/tar returned 2]
    ~/amandabackup/*/amdump.1:
    Code:
    driver: state time 880.426 free kps: 91408 space: 324362304 taper: idle idle-dumpers: 0 qlen tapeq: 0 runq: 32 roomq: 0 wakeup: 0 driver-idle: no-dumpers
    driver: interface-state time 880.426 if default: free 81008 if LOCAL: free 10000 if LE0: free 400
    driver: hdisk-state time 880.426 hdisk 0: free 324362304 dumpers 14
    driver: result time 880.426 from chunker10: FAILED 12-00018 "[cannot read header: got 0 instead of 32768]"
    driver: state time 880.426 free kps: 91408 space: 324362304 taper: idle idle-dumpers: 0 qlen tapeq: 0 runq: 32 roomq: 0 wakeup: 0 driver-idle: no-dumpers
    driver: interface-state time 880.426 if default: free 81008 if LOCAL: free 10000 if LE0: free 400
    driver: hdisk-state time 880.426 hdisk 0: free 324362304 dumpers 14
    driver: result time 880.426 from dumper10: TRY-AGAIN 12-00018 "[request failed: timeout waiting for ACK]"
    rename_tmp_holding: /amandahold/20070112031502/localhost._.0: empty file?
    I'm going to change it back to the FQDN, since amcheck prefers it that way:

    Amanda Backup Client Hosts Check
    --------------------------------
    WARNING: Usage of fully qualified hostname recommended for Client localhost.

    Any other ideas or troubleshooting techniques I can use other than looking through log files?

  4. #4
    Join Date
    Oct 2005
    Location
    Bay Area, CA
    Posts
    124

    All the DLEs are using amcrypt-ossl-asym, which eliminates the possibility of an OpenSSL setup problem.
    My only suggestion is to add "compress none" to the dumptype server-encrypt-root-fast.
    Also look at the iptables log to see whether any Amanda packets have been rejected.

    Hope this helps!

    --Kevin Till

  5. #5
    Join Date
    Aug 2006
    Posts
    34

    "compress none"

    Yesterday I changed my server-encrypt-root-fast dumptype to use "compress none", and so far so good - last night's dumps completed without a hitch. Thanks so much!
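
    For reference, the change described above presumably amounts to one extra line in the dumptype. This is only a sketch reconstructed from the definitions posted earlier in the thread, not a copy of the actual amanda.conf, and the inline comment is illustrative. It assumes that a "compress none" line placed after the inherited server-encrypt-fast entry overrides that dumptype's "compress client fast" setting:

    ```
    # low priority for root partitions
    define dumptype server-encrypt-root-fast {
      server-encrypt-fast
      compress none    # assumed to override the inherited "compress client fast"
      priority low
    }
    ```

    In the disklist shown earlier, only the / DLE uses server-encrypt-root-fast, so the other DLEs (including /var, which uses server-encrypt-fast directly) would keep client-side compression under this sketch.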
