Results 1 to 6 of 6

Thread: "waiting for flush", "wait for writing to tape" and tape being ready?

  1. #1
    Join Date
    Jan 2007
    Location
    Richland, WA [USA]
    Posts
    78

    Default "waiting for flush", "wait for writing to tape" and tape being ready?

    [Executive summary: two days of backups from roughtly 300 DLEs stuck on holding disks, no longer writing to virtual tapes for some reason, amflush NOT flushing to tape. Details below.]

    Hello folks. Yesterday morning, my Amanda backup died for some reason (as I got blown out of my amandabackup user shells as well, I'm guessing I might have hit a resource limit on the box - odd, as it's handled a load of 44 before without dying).

    Anyhow, the backups that started at 00:59 in the morning died somewhere aroung 02:15 or 02:31 (not sure, those were the last timestamps in the /etc/amanda/XXXXset1/log directory). When I got in to the office in the morning and looked for my daily e-mail and didn't find it, I checked with an 'amstatus XXXXset1' and saw that things weren't finished. Doing a 'ps -ef|grep am' showed that no Amanda processes were still running, and the holding disks had data in them.

    I tried doing an 'amflush XXXXset1' and got a message that I needed to run amcleanup first. I did that, it seems to have succeeded, and I did another amflush which appeared to work just fine.

    Since I was expecting a large set of level-0 backups that night, I decided to rerun the backup during the day. I started it, and it ran until about 21:30 or so, when I got my completion e-mail. I then looked, as I noticed it didn't write anything to the tape (XXXXset1-51 showed 0% used). I figured that it needed another amflush (although I don't know why), and tried a "amflush -f XXXXset1" so it would run in the foreground and I could see how long it took. The amflush found several sets of data to flush, I answered 'all' and 'y' and it completed immediately, e-mailing me a AMFLUSH report that showed all 0's, and said that it wrote to tape XXXXset1-52. I tried again, this time using a "amflush -b XXXXset1" so it would just flush all the data that it found. I got another "all 0's" mail report back, so it seems that amflush is not in fact flushing anything.

    After these problems (and noticing that there were NO Amanda processes running in the background), and trying a few more times to run amflush (just picking one of the sets that it found, letting it run in the background, etc.) I just looked in the virtual disk "data" directories and saw that there was no actual data, did "amrmtape" and "amlabel" commands on the four tapes it said it used but did not actually write anything too, and figured that the screwup was running more than one backup on the same date (thinking maybe I'm tickling a bug from the conversion from "one run per day" to using timestamps.)

    So, fast forward to this morning. The backups completed, saying they only wrote backups to one tape, but in fact writing NO data to any tapes. The "amstatus XXXXset1" shows all the DLEs like this:

    Code:
    ...
    walleye:/              1       32m waiting to flush
    walleye:/              1       58m dump done (2:23:05), wait for writing to tape
    walleye:/home          1        0m waiting to flush
    walleye:/home          1        0m dump done (2:21:01), wait for writing to tape
    
    SUMMARY          part      real  estimated
                               size       size
    partition       : 592
    estimated       : 281               199169m
    flush           : 290    218752m
    failed          :  23                 4247m           (  2.13%)
    wait for dumping:   0                    0m           (  0.00%)
    dumping to tape :   0                    0m           (  0.00%)
    dumping         :   0         0m         0m (  0.00%) (  0.00%)
    dumped          : 279     92493m    194921m ( 47.45%) ( 46.44%)
    wait for writing: 279     92493m    194921m ( 47.45%) ( 46.44%)
    wait to flush   : 289    218752m    218752m (100.00%) (  0.00%)
    writing to tape :   1         0m         0m (100.00%) (  0.00%)
    failed to tape  :   0         0m         0m (  0.00%) (  0.00%)
    taped           :   0         0m         0m (  0.00%) (  0.00%)
      tape 1        :   0         0m         0m (  0.00%) XXXXset1-55
    15 dumpers idle : no-dumpers
    taper idle
    network free kps:   2252800
    holding space   :    214695m ( 69.89%)
    chunker0 busy   :  0:00:00  (  0.00%)
    chunker1 busy   :  0:00:00  (  0.00%)
    -sh-3.00$ amtape XXXXset1 current
    amtape: scanning current slot in tape-changer rack:
    slot 55: date 20070216005901 label XXXXset1-55
    -sh-3.00$
    -sh-3.00$ cat changerstatus
    # multi-changer state cache: DO NOT EDIT!
    curslot 55
    curloaded 1
    -sh-3.00$
    So, it seems to want to write to tape XXXXset1-55, that IS the tape in the current tape drive slot, and yet it's doing nothing. The holding disk space is about full, and I don't want to touch it until I can figure out what is the right thing to do here. ALL the DLEs are both waiting for flushing (from yesterday?) as well as writing to tape (today's backups).

    I did update from v2.5.1p2 to 2.5.1p3 (on the 4GB 4 CPU Redhat AW4 backup server), on the off chance that some small bug was fixed that affects this problem. I updated using the Zmanda.com binary RPMs. That hope doesn't seem to have panned out.

    So, once again, what is it that I'm doing wrong here? I'm getting a little leery of trusting our systems to Amanda when I can't get past these simple problems. Granted, I am aiming for 400 to 500 DLEs worth of backups, using virtual tapes on 7.2TB worth of RAID-5 storage to start, with more growth to additional storage arrays after we get a feel for how much "churn" there is on these systems.

    Best regards (and hopes for yet another magic bullet).

    -Andrew

  2. #2
    Join Date
    Jan 2007
    Posts
    56

    Default

    can you post the contents of the following:


    /etc/amanda/<backup-set in question>/amdump.1
    /etc/amanda/<backup-set in question>/amdump.2
    /etc/amanda/<backup-set in question>/amdump.3

    Also, corresponding to the time date of the 3 most current amdump files, can you post the 3 most recent log files. from the same path.

    /etc/amanda/<backup-set in question>/log.200702......

    Thank You
    -Shailen

  3. #3
    Join Date
    Nov 2005
    Location
    Canada
    Posts
    1,049

    Default

    The amdump.1 ar amflush.1 file is needed to debug this problem.
    Could you also post the taper.<timestamp>.debug file?

  4. #4
    Join Date
    Jan 2007
    Location
    Richland, WA [USA]
    Posts
    78

    Default

    Quote Originally Posted by martineau View Post
    The amdump.1 ar amflush.1 file is needed to debug this problem.
    Could you also post the taper.<timestamp>.debug file?
    The log files that you and Shailen have requested amount to just over 35MB of files, which gzip's down to about 2.8MB at maximum compression. If you have a place I can upload it, I can do that. Otherwise, if there are particular things to look for, I can look around and/or snip out sections of particular interest. I didn't realize that the amflush process left log files behind, and I'll see if there's anything obvious in there.

    I didn't find any files starting with the string "taper" anywhere in the /etc/amanda directory tree.

    You can contact me off the forum at:

    ambackup.rakowski 'at' spamgourmet.com

    to arrange for some place where I can put these files for you to look at.

    Thanks for helping out...

    -Andrew

    ps: Is it the weekend yet? Somehow, it feels like a Monday!

  5. #5
    Join Date
    Jan 2007
    Posts
    56

    Default

    i will email you the ftp information

    -shailen

  6. #6
    Join Date
    Jan 2007
    Location
    Richland, WA [USA]
    Posts
    78

    Default

    Quote Originally Posted by shailen View Post
    i will email you the ftp information

    -shailen
    I worked with Shailen and Jean-Louis offline and discovered the problem. After deleting the exactly 32KB sized file in the holding disk directory, I was able to run a flush successfully (filled seven tapes). I suspect this file was created just as the system crashed/amanda processes died, and amflush didn't know what to do with it.

    Thanks for all the help folks! 8^)

    -Andrew

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •