Results 1 to 4 of 4

Thread: Full Backup of Windows server failing.

  1. #1
    Join Date
    Jun 2013
    Posts
    11

    Default Full Backup of Windows server failing.

    Hello,

    I'm using Amanda server 3.3.3 on an Ubuntu Server 12.04 machine, and I'm trying to backup to disk an Archive folder on our fileshare server, roughly 1TB in size.
    When I run amcheck, i get the following:


    Code:
    amandabackup@amanda:~$ amcheck DailySet
    Amanda Tape Server Host Check
    -----------------------------
    Holding disk /holding: 489893888 kB disk space available, using 489893888 kB
    slot 1: contains an empty volume
    Will write label 'DailySet01' to new volume in slot 1.
    NOTE: skipping tape-writable test
    NOTE: host info dir /amanda/state/curinfo/pcas-data.pcasinc.com does not exist
    NOTE: it will be created on the next run.
    NOTE: index dir /amanda/state/index/pcas-data.pcasinc.com/F__Archives does not exist
    NOTE: it will be created on the next run.
    Server check took 0.262 seconds
    
    Amanda Backup Client Hosts Check
    --------------------------------
    Client check: 1 host checked in 0.237 seconds.  0 problems found.
    
    (brought to you by Amanda 3.3.3)
    When I run amdump, it takes 2 hours and 11 minutes to do its estimate, then it fails. I receive the automated email that looks like the following:

    Code:
    Hostname: amanda
    Org     : DailySet
    Config  : DailySet
    Date    : June 21, 2013
    
    The next tape Amanda expects to use is: 1 new tape.
    FAILURE DUMP SUMMARY:
      pcas-data.pcasinc.com "F:/Archives" lev 0  FAILED [Request to pcas-data.pcasinc.com failed: recv error: Connection timed out]
    
    
    
    STATISTICS:
                              Total       Full      Incr.   Level:#
                            --------   --------   --------  --------
    Estimate Time (hrs:min)     2:11
    Run Time (hrs:min)          2:11
    Dump Time (hrs:min)         0:00       0:00       0:00
    Output Size (meg)            0.0        0.0        0.0
    Original Size (meg)          0.0        0.0        0.0
    Avg Compressed Size (%)      --         --         --
    DLEs Dumped                    0          0          0
    Avg Dump Rate (k/s)          --         --         --
    
    Tape Time (hrs:min)         0:00       0:00       0:00
    Tape Size (meg)              0.0        0.0        0.0
    Tape Used (%)                0.0        0.0        0.0
    DLEs Taped                     0          0          0
    Parts Taped                    0          0          0
    Avg Tp Write Rate (k/s)      --         --         --
    
    NOTES:
      planner: Adding new disk pcas-data.pcasinc.com:"F:/Archives".
      driver: WARNING: got empty schedule from planner
      taper: Slot 1 without label can be labeled
    
    
    DUMP SUMMARY:
                                                      DUMPER STATS               TAPER STATS
    HOSTNAME              DISK          L ORIG-kB  OUT-kB  COMP%  MMM:SS   KB/s MMM:SS   KB/s
    ------------------------------------- ------------------------------------- -------------
    pcas-data.pcasinc.com "F:/Archives"                              FAILED                  
    
    (brought to you by Amanda version 3.3.3)

    I've looked at the logs at /amanda/state/log/amdump.* and I get the following:

    Code:
    amdump: start at Fri Jun 21 08:03:39 EDT 2013
    amdump: datestamp 20130621
    amdump: starttime 20130621080339
    amdump: starttime-locale-independent 2013-06-21 08:03:39 EDT
    driver: pid 4270 executable /usr/libexec/amanda/driver version 3.3.3
    planner: pid 4269 executable /usr/libexec/amanda/planner version 3.3.3
    planner: build: VERSION="Amanda-3.3.3"
    planner:        BUILT_DATE="Thu Jan 10 00:50:35 PST 2013" BUILT_MACH=""
    planner:        BUILT_REV="5099" BUILT_BRANCH="community_3_3_3"
    planner:        CC="x86_64-linux-gnu-gcc"
    planner: paths: bindir="/usr/bin" sbindir="/usr/sbin"
    planner:        libexecdir="/usr/libexec"
    planner:        amlibexecdir="/usr/libexec/amanda" mandir="/usr/share/man"
    planner:        AMANDA_TMPDIR="/tmp/amanda"
    planner:        AMANDA_DBGDIR="/var/log/amanda" CONFIG_DIR="/etc/amanda"
    planner:        DEV_PREFIX="/dev/" RDEV_PREFIX="/dev/" DUMP="/sbin/dump"
    planner:        RESTORE="/sbin/restore" VDUMP=UNDEF VRESTORE=UNDEF
    planner:        XFSDUMP="/sbin/xfsdump" XFSRESTORE="/sbin/xfsrestore"
    planner:        VXDUMP=UNDEF VXRESTORE=UNDEF
    planner:        SAMBA_CLIENT="/usr/bin/smbclient" GNUTAR="/bin/tar"
    planner:        COMPRESS_PATH="/bin/gzip" UNCOMPRESS_PATH="/bin/gzip"
    planner:         LPRCMD=UNDEF  MAILER=UNDEF
    planner:        listed_incr_dir="/var/lib/amanda/gnutar-lists"
    planner: defs:  DEFAULT_SERVER="localhost" DEFAULT_CONFIG="DailySet1"
    planner:        DEFAULT_TAPE_SERVER="localhost" DEFAULT_TAPE_DEVICE=""
    planner:        NEED_STRSTR AMFLOCK_POSIX AMFLOCK_FLOCK AMFLOCK_LOCKF
    planner:        AMFLOCK_LNLOCK SETPGRP_VOID ASSERTIONS AMANDA_DEBUG_DAYS=4
    planner:        BSD_SECURITY USE_AMANDAHOSTS CLIENT_LOGIN="amandabackup"
    planner:        CHECK_USERID HAVE_GZIP COMPRESS_SUFFIX=".gz"
    planner:        COMPRESS_FAST_OPT="--fast" COMPRESS_BEST_OPT="--best"
    planner:        UNCOMPRESS_OPT="-dc"
    READING CONF INFO...
    planner: timestamp 20130621080339
    planner: tape_length is set from tape length (618073856 KB) * runtapes (1) == 618073856 KB
    planner: time 0.000: startup took 0.000 secs
    
    SENDING FLUSHES...
    ENDFLUSH
    
    SETTING UP FOR ESTIMATES...
    planner: time 0.000: setting up estimates for pcas-data.pcasinc.com:"F:/Archives"
    pcas-data.pcasinc.com:"F:/Archives" overdue 15878 days for level 0
    setup_estimate: pcas-data.pcasinc.com:"F:/Archives": command 0, options: none    last_level -1 next_level0 -15878 level_days 0    getting estimates 0 (-3) -1 (-3) -1 (-3)
    planner: time 0.000: setting up estimates took 0.000 secs
    
    GETTING ESTIMATES...
    driver: tape size 618073856
    driver: adding holding disk 0 dir /holding size 489893888 chunksize 1048576
    reserving 489893888 out of 489893888 for degraded-mode dumps
    driver: started dumper0 pid 4272
    driver: send-cmd time 0.012 to dumper0: START 20130621080339
    driver: started dumper1 pid 4273
    driver: send-cmd time 0.013 to dumper1: START 20130621080339
    driver: started dumper2 pid 4274
    driver: send-cmd time 0.013 to dumper2: START 20130621080339
    driver: started dumper3 pid 4275
    driver: send-cmd time 0.014 to dumper3: START 20130621080339
    driver: started dumper4 pid 4276
    driver: send-cmd time 0.014 to dumper4: START 20130621080339
    driver: started dumper5 pid 4277
    driver: send-cmd time 0.015 to dumper5: START 20130621080339
    driver: started dumper6 pid 4278
    driver: send-cmd time 0.015 to dumper6: START 20130621080339
    driver: started dumper7 pid 4279
    driver: send-cmd time 0.016 to dumper7: START 20130621080339
    driver: started dumper8 pid 4280
    driver: send-cmd time 0.016 to dumper8: START 20130621080339
    driver: started dumper9 pid 4281
    driver: send-cmd time 0.017 to dumper9: START 20130621080339
    driver: send-cmd time 0.017 to taper: START-TAPER worker0 20130621080339
    driver: start time 0.017 inparallel 10 bandwidth 80000 diskspace 489893888  dir OBSOLETE datestamp 20130621080339 driver: drain-ends tapeq FIRST big-dumpers ttt
    driver: state time 0.017 free kps: 80000 space: 489893888 taper: idle idle-dumpers: 10 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: not-idle
    driver: interface-state time 0.017 if default: free 80000
    driver: hdisk-state time 0.017 hdisk 0: free 489893888 dumpers 0
    dumper: pid 4273 executable dumper1 version 3.3.3
    dumper: pid 4280 executable dumper8 version 3.3.3
    dumper: pid 4278 executable dumper6 version 3.3.3
    dumper: pid 4272 executable dumper0 version 3.3.3
    dumper: pid 4275 executable dumper3 version 3.3.3
    dumper: pid 4281 executable dumper9 version 3.3.3
    dumper: pid 4276 executable dumper4 version 3.3.3
    dumper: pid 4274 executable dumper2 version 3.3.3
    dumper: pid 4279 executable dumper7 version 3.3.3
    dumper: pid 4277 executable dumper5 version 3.3.3
    driver: state time 0.383 free kps: 80000 space: 489893888 taper: idle idle-dumpers: 10 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: not-idle
    driver: interface-state time 0.383 if default: free 80000
    driver: hdisk-state time 0.383 hdisk 0: free 489893888 dumpers 0
    driver: result time 0.383 from taper: TAPER-OK worker0
    error result for host pcas-data.pcasinc.com disk "F:/Archives": Request to pcas-data.pcasinc.com failed: recv error: Connection timed out
    planner: time 7878.436: getting estimates took 7878.436 secs
    FAILED QUEUE:
      0: pcas-data.pcasinc.com "F:/Archives"
    DONE QUEUE: empty
    
    ANALYZING ESTIMATES...
    planner: FAILED pcas-data.pcasinc.com "F:/Archives" 20130621080339 0 "[Request to pcas-data.pcasinc.com failed: recv error: Connection timed out]"
    INITIAL SCHEDULE (size 64):
    
    DELAYING DUMPS IF NEEDED, total_size 64, tape length 618073856 mark 0
      delay: Total size now 64.
    
    PROMOTING DUMPS IF NEEDED, total_lev0 0, balanced_size 0...
    planner: time 7878.437: analysis took 0.000 secs
    
    GENERATING SCHEDULE:
    --------
    --> Generated empty schedule! <--
    --------
    driver: flush size 0
    driver: state time 7878.441 free kps: 80000 space: 489893888 taper: idle idle-dumpers: 10 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: no-dumpers
    driver: interface-state time 7878.441 if default: free 80000
    driver: hdisk-state time 7878.441 hdisk 0: free 489893888 dumpers 0
    driver: QUITTING time 7878.441 telling children to quit
    driver: send-cmd time 7878.441 to dumper0: QUIT ""
    driver: send-cmd time 7878.441 to dumper1: QUIT ""
    driver: send-cmd time 7878.441 to dumper2: QUIT ""
    driver: send-cmd time 7878.441 to dumper3: QUIT ""
    driver: send-cmd time 7878.441 to dumper4: QUIT ""
    driver: send-cmd time 7878.441 to dumper5: QUIT ""
    driver: send-cmd time 7878.441 to dumper6: QUIT ""
    driver: send-cmd time 7878.441 to dumper7: QUIT ""
    driver: send-cmd time 7878.441 to dumper8: QUIT ""
    driver: send-cmd time 7878.441 to dumper9: QUIT ""
    driver: send-cmd time 7878.441 to taper: QUIT
    driver: FINISHED time 7879.442
    amdump: end at Fri Jun 21 10:14:58 EDT 2013
    I've been goggling the key error: "planner: FAILED [Request to failed: recv error: Connection timed out" and I just can't find any help. Would anyone here be able to help me out?

    Thank you

    Alex

  2. #2
    Join Date
    Jun 2013
    Posts
    11

    Default

    After about two weeks now of testing and tweaking our Amanda server setup, occasionally searching for an answer to this problem, I think I've found the solution. I found a vague post on a forum archive mentioning a network firewall timeout factor closing the connection started by the Amanda server before the client has finished running its estimates. Since the Amanda server is reaching clients on separate networks and going through at least one router, I brought it up to our network architect: "do you think our firewalls are closing TCP connections after about 2 hours?" He agreed that might actually be very possible.

    Being the networking pro he is, we monitored the routers and found out this might actually be the problem. He added a new rule on the routers in question, to permit a timeout of 86400 seconds (1 day) instead of the default 7200 seconds (2 hours), specifically for sessions started by the Amanda server through port 10080. I left yet another test backup running last night, and it failed at 3 hours, due to the etimeout I had left in the amanda.conf.

    I just changed the value to 86400 and started it again now. I'll post back when it's done, hopefully with good news.

  3. #3
    Join Date
    Jun 2013
    Posts
    11

    Default

    My first test after these changes failed after two hours, but this was because I had forgotten to change the setting at /proc/sys/net/ipv4/tcp_keepalive_time from the default setting of 7200 (2 hours) back to my own setting of 86400 (24 hours). I was having some problems with amcheck, and in an attempt to revert back all change I had changed this setting back to default, but I forgot to update this one. I've also added the line "net.ipv4.tcp_keepalive_time=86400" to the config file /etc/sysctl.conf, to make the previous setting persistent after a reboot.

    Now with the changes in place, my backups still failed. The amanda server sent me a failed email after 24 hours, with the error:

    Code:
    Hostname: amanda
    Org     : Archiver
    Config  : Archiver
    Date    : July 5, 2013
    
    The next tape Amanda expects to use is: 1 new tape.
    FAILURE DUMP SUMMARY:
      pcas-data.pcasinc.com "F:/Archives" lev 0  FAILED [Some estimate timeout on pcas-data.pcasinc.com, using server estimate if possible]
    
    
    
    STATISTICS:
                              Total       Full      Incr.   Level:#
                            --------   --------   --------  --------
    Estimate Time (hrs:min)    24:00
    Run Time (hrs:min)         24:00
    Dump Time (hrs:min)         0:00       0:00       0:00
    Output Size (meg)            0.0        0.0        0.0
    Original Size (meg)          0.0        0.0        0.0
    Avg Compressed Size (%)      --         --         --
    DLEs Dumped                    0          0          0
    Avg Dump Rate (k/s)          --         --         --
    
    Tape Time (hrs:min)         0:00       0:00       0:00
    Tape Size (meg)              0.0        0.0        0.0
    Tape Used (%)                0.0        0.0        0.0
    DLEs Taped                     0          0          0
    Parts Taped                    0          0          0
    Avg Tp Write Rate (k/s)      --         --         --
    
    NOTES:
      planner: Adding new disk pcas-data.pcasinc.com:"F:/Archives".
      driver: WARNING: got empty schedule from planner
      taper: Slot 9 without label can be labeled
    
    
    DUMP SUMMARY:
                                                      DUMPER STATS               TAPER STATS
    HOSTNAME              DISK          L ORIG-kB  OUT-kB  COMP%  MMM:SS   KB/s MMM:SS   KB/s
    ------------------------------------- ------------------------------------- -------------
    pcas-data.pcasinc.com "F:/Archives"                              FAILED                  
    
    (brought to you by Amanda version 3.3.3)
    On the client side, I noticed the estimates had finished after about 5 hours. The error logs end in the following errors:

    Code:
    11232:13128:5/7/2013:14:46:16:721::FS: NTFS: FindClose: Closing Find for F:\Archives\
    11232:13128:5/7/2013:14:46:16:721::CZWCJobHandler : Leaving DoBackup
    11232:13128:5/7/2013:14:46:16:736::FS: UnloadPlugin: Unloaded Plugin for FileSystem NTFS
    11232:13128:5/7/2013:14:46:16:736::CZWCJobHandler : Leaving DoEstimateFromFS
    11232:13128:5/7/2013:14:46:16:736::CZWCJobHandler::GetDLESize : DLE size = 1319636900970
    11232:13128:5/7/2013:14:46:16:736::SNAPSHOT: zwcSnapUninitializeModule: Invalid Context
    11232:13128:5/7/2013:14:46:16:736::Catalog:DisconnectDb: Disconnecting DB
    11232:13128:5/7/2013:14:46:16:736::Catalog:DisconnectDb: Connection to DB is disconnected successfully
    11232:13128:5/7/2013:14:46:16:736::CZWCJobHandler : Leaving GetDLESize
    11232:13128:5/7/2013:14:46:16:736::CZWCJobHandler : Leaving ExecuteGetDLESizeJob
    11232:13600:5/7/2013:14:46:16:752::11232:13128:5/7/2013:14:46:16:752::Got error while reading from socket 
    CWZCDispatcher : Leaving ExecuteTask
    11232:13600:5/7/2013:14:46:16:752::ZWC Failure error code: 64
    11232:13600:5/7/2013:14:46:16:752::RemoveStaleClient : Last error code is 64
    11232:13600:5/7/2013:14:46:16:767::The specified network name is no longer available.
    
    11232:13600:5/7/2013:14:46:16:767::Communication Layer : Connection closed by server.
    I found this post here: [URL="http://forums.zmanda.com/archive/index.php/t-3369.html"]http://forums.zmanda.com/archive/index.php/t-3369.html[/URL]
    He has very similar errors, but he said changing the tcp_keepalive_time value that I've already changed solved his problem. I've run 3 test backups since I've made these changes, and they've all failed.

    So far, amanda backups is running perfectly for the last 3 weeks on another windows server and an NFS share. I've spent so much time getting this far, and I'm really lost here. I really like this software, expecially compared to the Microsoft DPM system that we kinda sorta use at the moment. If I can consider this backup system reliable and show it to the rest of my team, I would be on the front lines to push our company to buy enterprise support for this software. Please, can anyone here help me?

  4. #4
    Join Date
    Jun 2013
    Posts
    11

    Default

    Last week I decided to use a dumptype that included "estimate server" instead of "estimate client". At this point who cares if it's estimating, I just want it backed up. Had anyone told me this, they would have saved me over 2 weeks of over thinking everything.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •