Thread: amanda server: chunker takes forever, backup never finishes.

  1. #1

    Hello all!

    Recently we had a power outage, and my backup server crashed in the middle of a backup run.

    The server is configured with virtual tapes (15 x 500 GB on a 20 TB disk).

    After the crash I removed the failed incremental from disk and forced a full backup. I also added 5 new virtual tapes (there is plenty of space available). Everything started as it should, and I saw some activity on the client side as well, but...

    Right now the Amanda server seems to be taking forever: the backup has been running for 5 days straight, and it never took more than 2 days before. When I look at the processes, there is only one owned by amanda, chunker, and it has been up for many, many minutes.

    Where should I dig, and what should I check?

    Another problem is that when I run amstatus right now it never finishes: it hangs in the middle of the status report, right after some messages about chunks 'waiting for flush'.

    Please help!

  2. #2

    Update: I finally decided to kill the process, but could not (all CPUs were in I/O wait and nothing could be terminated, despite the DEFUNCT status). So I had to reboot the server, and then I ran amcleanup.
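
For context on why the kill failed: on Linux, a process blocked in uninterruptible sleep (state D, usually waiting on disk or NFS I/O) does not act on signals, including SIGKILL, until the kernel call it is stuck in returns, and a defunct process (state Z) has already exited and is only waiting for its parent to reap it. Below is a minimal Python sketch for listing such processes by reading /proc/<pid>/stat. The function names are made up for illustration, and it assumes a Linux /proc layout.

```python
import os


def parse_stat_line(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    pid = int(stat_line[:open_idx].strip())
    comm = stat_line[open_idx + 1:close_idx]
    state = stat_line[close_idx + 1:].split()[0]
    return pid, comm, state


def stuck_processes(proc_root="/proc"):
    """List (pid, comm, state) for processes in state D or Z."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # not a Linux-style /proc, nothing to scan
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable mid-scan
        if state in ("D", "Z"):
            found.append((pid, comm, state))
    return found


if __name__ == "__main__":
    for pid, comm, state in stuck_processes():
        print(pid, comm, state)
```

If chunker shows up here in state D, the thing to investigate is probably the storage it is blocked on (the holding disk or the virtual tape disk) rather than Amanda itself.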

    Right now amstatus gives the following:

    Code:
    Using /etc/amanda/cpst/log/amdump
    From Thu Nov 22 18:49:27 CET 2012
    
    nfs02.cluster1:/data/cockpit/cockpit 1         2m flushing to tape (18:49:27), waiting for a new tape
    nfs02.cluster1:/data/cockpit/cockpit 1         2m waiting to flush
    nfs02.cluster1:/data/cockpit/cockpit 1         2m waiting to flush
    nfs02.cluster1:/data/cockpit/cockpit 1         3m waiting to flush
    nfs02.cluster1:/data/cockpit/cockpit 1         3m waiting to flush
    nfs02.cluster1:/data/cockpit/cockpit 1       165m waiting to flush
    nfs02.cluster1:/data/cockpit/cockpit 1       168m waiting to flush
    nfs02.cluster1:/data/cockpit/cockpit 1       168m waiting to flush
    nfs02.cluster1:/data/cockpit/cockpit 1       168m waiting to flush
    nfs02.cluster1:/data/cockpit/cockpit 0      1973m wait for dumping
    nfs02.cluster1:/data/cpsteam/cpsdata 1     14619m waiting to flush
    nfs02.cluster1:/data/cpsteam/cpsdata 1     14619m waiting to flush
    nfs02.cluster1:/data/cpsteam/cpsdata 1     14619m waiting to flush
    nfs02.cluster1:/data/cpsteam/cpsdata 1     14891m waiting to flush
    nfs02.cluster1:/data/cpsteam/cpsdata 1     44111m waiting to flush
    nfs02.cluster1:/data/cpsteam/cpsdata 1     45301m waiting to flush
    nfs02.cluster1:/data/cpsteam/cpsdata 1     45353m waiting to flush
    nfs02.cluster1:/data/cpsteam/cpsdata 1     45360m waiting to flush
    nfs02.cluster1:/data/cpsteam/cpsdata 1     45360m waiting to flush
    nfs02.cluster1:/data/cpsteam/cpsdata 0   1644024m dumping   707639m ( 43.04%) (18:57:53)
    
    SUMMARY          part      real  estimated
                               size       size
    partition       :  20
    estimated       :   2              1645997m
    flush           :  18    284923m
    failed          :   0                    0m           (  0.00%)
    wait for dumping:   1                 1973m           (  0.12%)
    dumping to tape :   0                    0m           (  0.00%)
    dumping         :   1    707639m   1644024m ( 43.04%) ( 42.99%)
    dumped          :   0         0m         0m (  0.00%) (  0.00%)
    wait for writing:   0         0m         0m (  0.00%) (  0.00%)
    wait to flush   :  17    284920m    284920m (100.00%) (  0.00%)
    writing to tape :   1         2m         2m (100.00%) (  0.00%)
    failed to tape  :   0         0m         0m (  0.00%) (  0.00%)
    taped           :   0         0m         0m (  0.00%) (  0.00%)
      tape 1        :   0         0m         0m (  0.00%) cpst-11
    5 dumpers idle  : runq
    taper writing, tapeq: 17
    network free kps:         0
    holding space   :   8719761m ( 84.14%)
     dumper0 busy   :  0:00:00  (  0.00%)
     0 dumpers busy :  0:08:25  ( 99.88%)                runq:  0:08:25  (100.00%)
     1 dumper busy  :  0:00:00  (  0.00%)
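
To keep an eye on how much data is piling up in each state between amstatus runs, the per-dump lines in a listing like the one above can be tallied with a short script. This is a rough Python sketch, not an Amanda tool: the line pattern ("host:disk level <size>m <state>") is inferred only from this listing, and the names `LINE_RE`, `KNOWN_STATES` and `tally_by_state` are made up for illustration.

```python
import re
from collections import defaultdict

# Pattern inferred from the listing: "host:disk level <size>m <state...>"
LINE_RE = re.compile(r"^\s*(\S+?):(\S+)\s+(\d+)\s+(\d+)m\s+(.+?)\s*$")

# States seen in the listing; anything else is counted as "other".
KNOWN_STATES = (
    "waiting to flush",
    "flushing to tape",
    "wait for dumping",
    "dumping",
)

# A few lines copied from the amstatus output in this post.
SAMPLE = """\
nfs02.cluster1:/data/cockpit/cockpit 1         2m flushing to tape (18:49:27), waiting for a new tape
nfs02.cluster1:/data/cockpit/cockpit 1       165m waiting to flush
nfs02.cluster1:/data/cockpit/cockpit 1       168m waiting to flush
nfs02.cluster1:/data/cockpit/cockpit 0      1973m wait for dumping
nfs02.cluster1:/data/cpsteam/cpsdata 0   1644024m dumping   707639m ( 43.04%) (18:57:53)
"""


def tally_by_state(text):
    """Return {state: (entry_count, total_size_mb)} for per-dump lines."""
    totals = defaultdict(lambda: [0, 0])
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # header, summary and blank lines do not match
        size_mb = int(match.group(4))
        rest = match.group(5)
        state = next((s for s in KNOWN_STATES if rest.startswith(s)), "other")
        totals[state][0] += 1
        # For a "dumping" line this first size appears to be the estimate.
        totals[state][1] += size_mb
    return {state: tuple(pair) for state, pair in totals.items()}


if __name__ == "__main__":
    for state, (count, size_mb) in tally_by_state(SAMPLE).items():
        print(f"{state:18s} {count:3d} entries {size_mb:10d}m")
```

Saving the amstatus output to a file every so often and running it through `tally_by_state` would show whether the "waiting to flush" total is shrinking or the taper is stuck.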


    My problem is that it looked exactly the same the last time I started the dump, and at some point it hung. amstatus just stopped giving results as well...


    Any ideas, anyone? I really do not want to kill all the backups and start from scratch. What if such a power outage happens again? (Yes, we are looking into getting a UPS.)


    Oh, btw, here is the last amdump log, just in case.
    Last edited by spad; November 23rd, 2012 at 12:14 AM.
