Results 1 to 10 of 12

Thread: Taper process hanging

Hybrid View

  1. #1
    Join Date
    May 2017
    Location
    UK
    Posts
    8

    Default Taper process hanging

    Hi,

    Have problems with the backup run hanging and it appears to be the taper process.

    Sometimes the run will complete, but more often than not most of the dumps will remain on the holding disk and the taper process will stop updating the log. I have to kill taper to get amdump to complete and send the mail report.

    Running amflush on the remaining dumps will sometimes complete, but usually hangs as well.

    I've pushed large quantities of data to the tape drive via tar, without issue, so can't see a problem there.

    The taper always hangs on this point in the log,

    taper: Building type SPLIT_FILE header of 65536-65536 bytes with name='mgt-ams-satdb-01.XXX' disk='/var/log/audit' dumplevel=1 and blocksize=65536

    But with a different DLE each time.

    Running strace on the taper I see this repeated indefinitely.

    poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}], 2, 5000) = 0 (Timeout)
    lseek(4, 0, SEEK_SET) = 0
    lseek(4, 0, SEEK_CUR) = 0
    write(4, "0", 1) = 1
    ftruncate(4, 1) = 0
    poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}], 2, 5000) = 0 (Timeout)
    lseek(4, 0, SEEK_SET) = 0
    lseek(4, 0, SEEK_CUR) = 0
    write(4, "0", 1) = 1
    ftruncate(4, 1) = 0
    poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}], 2, 5000) = 0 (Timeout)
    lseek(4, 0, SEEK_SET) = 0
    lseek(4, 0, SEEK_CUR) = 0
    write(4, "0", 1) = 1
    ftruncate(4, 1) = 0

    I was on 3.4, upgraded to 3.4.4 to see if that resolved it, but still remains the same.

    Any assistance appreciated.

    Thanks,

    Alan

  2. #2
    Join Date
    Nov 2005
    Location
    Canada
    Posts
    1,045

    Default

    Can you get the stacktrace of all threads,
    in gdb, run: thread apply all bt
    Also post the taper debug file

  3. #3
    Join Date
    May 2017
    Location
    UK
    Posts
    8

    Default

    Hi,

    Here is the stack trace against taper process. It complained about a number of missing debug pkgs, so not sure if this is useful without them?

    Code:
    Thread 4 (Thread 0x7f3798b7d700 (LWP 11709)):
    #0  0x00007f37a1894e2d in poll () from /lib64/libc.so.6
    #1  0x00007f37a016204c in g_main_context_iterate.isra.24 () from /lib64/libglib-2.0.so.0
    #2  0x00007f37a016216c in g_main_context_iteration () from /lib64/libglib-2.0.so.0
    #3  0x00007f37a01621a9 in glib_worker_main () from /lib64/libglib-2.0.so.0
    #4  0x00007f37a01870f5 in g_thread_proxy () from /lib64/libglib-2.0.so.0
    #5  0x00007f37a1b70dc5 in start_thread () from /lib64/libpthread.so.0
    #6  0x00007f37a189f76d in clone () from /lib64/libc.so.6
    
    Thread 3 (Thread 0x7f3793fff700 (LWP 11760)):
    #0  0x00007f37a1899c29 in syscall () from /lib64/libc.so.6
    #1  0x00007f37a01a494f in g_cond_wait () from /lib64/libglib-2.0.so.0
    #2  0x00007f379ac47a8a in device_thread_wait_for_block (eof_flag=<synthetic pointer>, self=0x31d7080) at xfer-dest-taper-splitter.c:389
    #3  device_thread_write_part (self=<optimized out>) at xfer-dest-taper-splitter.c:565
    #4  device_thread (data=<optimized out>) at xfer-dest-taper-splitter.c:745
    #5  0x00007f37a01870f5 in g_thread_proxy () from /lib64/libglib-2.0.so.0
    #6  0x00007f37a1b70dc5 in start_thread () from /lib64/libpthread.so.0
    #7  0x00007f37a189f76d in clone () from /lib64/libc.so.6
    
    Thread 2 (Thread 0x7f37937fe700 (LWP 11761)):
    #0  0x00007f37a1899c29 in syscall () from /lib64/libc.so.6
    #1  0x00007f37a01a494f in g_cond_wait () from /lib64/libglib-2.0.so.0
    #2  0x00007f379aea55fa in holding_thread (data=<optimized out>) at xfer-source-holding.c:157
    #3  0x00007f37a01870f5 in g_thread_proxy () from /lib64/libglib-2.0.so.0
    #4  0x00007f37a1b70dc5 in start_thread () from /lib64/libpthread.so.0
    #5  0x00007f37a189f76d in clone () from /lib64/libc.so.6
    
    Thread 1 (Thread 0x7f37a3095740 (LWP 11707)):
    #0  0x00007f37a1894e2d in poll () from /lib64/libc.so.6
    #1  0x00007f37a016204c in g_main_context_iterate.isra.24 () from /lib64/libglib-2.0.so.0
    #2  0x00007f37a016216c in g_main_context_iteration () from /lib64/libglib-2.0.so.0
    #3  0x00007f37a0ce6555 in event_loop_wait (wait_eh=0x0, nonblock=0, return_when_empty=0) at event.c:427
    #4  0x00007f37a0ce6958 in event_loop_run () at event.c:336
    #5  0x00007f3799f6b8a5 in run_c () at Amanda/MainLoop.c:1545
    #6  0x00007f3799f6b900 in _wrap_run_c (my_perl=<optimized out>, cv=<optimized out>) at Amanda/MainLoop.c:1981
    #7  0x00007f37a2bb439f in Perl_pp_entersub (my_perl=0xdf7010) at pp_hot.c:2778
    #8  0x00007f37a2bacb16 in Perl_runops_standard (my_perl=0xdf7010) at run.c:41
    #9  0x00007f37a2b49925 in S_run_body (oldscope=<optimized out>, my_perl=<optimized out>) at perl.c:2402
    ---Type <return> to continue, or q <return> to quit---
    #10 perl_run (my_perl=0xdf7010) at perl.c:2320
    #11 0x0000000000400d99 in main (argc=7, argv=0x7ffe9a070938, env=0x7ffe9a070978) at perlmain.c:120

  4. #4
    Join Date
    May 2017
    Location
    UK
    Posts
    8

    Default

    And here is the debug log.
    Attached Files Attached Files

  5. #5
    Join Date
    May 2017
    Location
    UK
    Posts
    8

    Default

    Adding stack trace again as it didn;t seem to upload first time

    Code:
    Thread 4 (Thread 0x7f3798b7d700 (LWP 11709)):
    #0  0x00007f37a1894e2d in poll () from /lib64/libc.so.6
    #1  0x00007f37a016204c in g_main_context_iterate.isra.24 () from /lib64/libglib-2.0.so.0
    #2  0x00007f37a016216c in g_main_context_iteration () from /lib64/libglib-2.0.so.0
    #3  0x00007f37a01621a9 in glib_worker_main () from /lib64/libglib-2.0.so.0
    #4  0x00007f37a01870f5 in g_thread_proxy () from /lib64/libglib-2.0.so.0
    #5  0x00007f37a1b70dc5 in start_thread () from /lib64/libpthread.so.0
    #6  0x00007f37a189f76d in clone () from /lib64/libc.so.6
    
    Thread 3 (Thread 0x7f3793fff700 (LWP 11760)):
    #0  0x00007f37a1899c29 in syscall () from /lib64/libc.so.6
    #1  0x00007f37a01a494f in g_cond_wait () from /lib64/libglib-2.0.so.0
    #2  0x00007f379ac47a8a in device_thread_wait_for_block (eof_flag=<synthetic pointer>, self=0x31d7080) at xfer-dest-taper-splitter.c:389
    #3  device_thread_write_part (self=<optimized out>) at xfer-dest-taper-splitter.c:565
    #4  device_thread (data=<optimized out>) at xfer-dest-taper-splitter.c:745
    #5  0x00007f37a01870f5 in g_thread_proxy () from /lib64/libglib-2.0.so.0
    #6  0x00007f37a1b70dc5 in start_thread () from /lib64/libpthread.so.0
    #7  0x00007f37a189f76d in clone () from /lib64/libc.so.6
    
    Thread 2 (Thread 0x7f37937fe700 (LWP 11761)):
    #0  0x00007f37a1899c29 in syscall () from /lib64/libc.so.6
    #1  0x00007f37a01a494f in g_cond_wait () from /lib64/libglib-2.0.so.0
    #2  0x00007f379aea55fa in holding_thread (data=<optimized out>) at xfer-source-holding.c:157
    #3  0x00007f37a01870f5 in g_thread_proxy () from /lib64/libglib-2.0.so.0
    #4  0x00007f37a1b70dc5 in start_thread () from /lib64/libpthread.so.0
    #5  0x00007f37a189f76d in clone () from /lib64/libc.so.6
    
    Thread 1 (Thread 0x7f37a3095740 (LWP 11707)):
    #0  0x00007f37a1894e2d in poll () from /lib64/libc.so.6
    #1  0x00007f37a016204c in g_main_context_iterate.isra.24 () from /lib64/libglib-2.0.so.0
    #2  0x00007f37a016216c in g_main_context_iteration () from /lib64/libglib-2.0.so.0
    #3  0x00007f37a0ce6555 in event_loop_wait (wait_eh=0x0, nonblock=0, return_when_empty=0) at event.c:427
    #4  0x00007f37a0ce6958 in event_loop_run () at event.c:336
    #5  0x00007f3799f6b8a5 in run_c () at Amanda/MainLoop.c:1545
    #6  0x00007f3799f6b900 in _wrap_run_c (my_perl=<optimized out>, cv=<optimized out>) at Amanda/MainLoop.c:1981
    #7  0x00007f37a2bb439f in Perl_pp_entersub (my_perl=0xdf7010) at pp_hot.c:2778
    #8  0x00007f37a2bacb16 in Perl_runops_standard (my_perl=0xdf7010) at run.c:41
    #9  0x00007f37a2b49925 in S_run_body (oldscope=<optimized out>, my_perl=<optimized out>) at perl.c:2402
    ---Type <return> to continue, or q <return> to quit---
    #10 perl_run (my_perl=0xdf7010) at perl.c:2320
    #11 0x0000000000400d99 in main (argc=7, argv=0x7ffe9a070938, env=0x7ffe9a070978) at perlmain.c:120

  6. #6
    Join Date
    Nov 2005
    Location
    Canada
    Posts
    1,045

    Default

    Some post are moderated, they must be approved before you can see them.
    I need you increase the debugging level to understand what's going on. Can you add the following line in amanda.conf

    debug-taper 9

    retry an amflush and post the resulting taper debug file.
    I would also like to get the value of the mem_ring structure from the thread that is hang in the holding_thread function.
    In gdb:
    thread 2
    up
    up
    print *self->mem_ring

  7. #7
    Join Date
    May 2017
    Location
    UK
    Posts
    8

    Default

    It ran for about two hours this time before hanging. Unfortunately the zipped log will not upload, just gives a generic file upload error. So I'm sharing it from Dropbox.

    https://www.dropbox.com/s/8cwhjz5vusg1ok7/taper-debug-9.zip?dl=0

    The trace

    Code:
    $1 = {write_offset = 1307440, written = 1307440, eof_flag = 0, padding1 = '\000' <repeats 235 times>, read_offset = 0, readx = 0, padding2 = '\000' <repeats 239 times>, 
      buffer = 0x7f62cc151e00 "Salted__\254\227\022f\377\066\373C\367t\021\032\263\071Д\021\220\070A*\227\255\355\343\214\v\240$\307\346F\211M\f<\246\346\333\350\037|\242U>+\270\303\373\206\262\366\320\300.\371\325\357Ѝ_\200#7;\006\331V\316)\177\241\205\272\313\035\201\231\315\004\234\302;\367\367\037\230\312\021\322}2!^ȯ\031--;", ring_size = 1310720, add_cond = 0x7f62c80312a0, free_cond = 0x7f62c80312b0, 
      mutex = 0x7f62c8031290, consumer_block_size = 65536, producer_block_size = 32768, consumer_ring_size = 1310720, producer_ring_size = 1048576, data_avail = 62256}

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •