
Thread: amrecover with S3 tape stops after 2 blocks

  1. #11 (Join Date: Nov 2005, Location: Canada, Posts: 1,031)

    Since you run amrecover on the server, can you try with the 'local' auth?
    $ amrecover -oauth=local ...

  2. #12 (Join Date: Mar 2017, Posts: 14)

    OK, just tried that; no difference, it hangs in the same place.

  3. #13 (Join Date: Nov 2005, Location: Canada, Posts: 1,031)

    Try with:
    Code:
    device-property "nb-threads-backup" "1"
    device-property "nb-threads-recovery" "1"
    device-property "VERBOSE" "YES"

    And post the complete amidxtaped debug files (add a '.txt' extension so they can be attached).
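
For context, these device properties normally live in the S3 device/changer definition in amanda.conf. A minimal sketch of where they would go; the changer name, bucket path, and credentials below are placeholders, not values from this thread:

```
# Hypothetical S3 changer definition; names and keys are placeholders.
define changer s3_store {
    tpchanger "chg-multi:s3:example-bucket/slot-{01,02,03}"
    device-property "S3_ACCESS_KEY" "PLACEHOLDER"
    device-property "S3_SECRET_KEY" "PLACEHOLDER"
    # Single-threaded transfers plus verbose logging, as suggested above,
    # make the debug files much easier to follow.
    device-property "nb-threads-backup" "1"
    device-property "nb-threads-recovery" "1"
    device-property "VERBOSE" "YES"
}
```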

  4. #14 (Join Date: Mar 2017, Posts: 14)

    I've uploaded the logfile as a zip (it wouldn't upload as a plain txt file). I've had to alter some elements of the log (IPs, hostnames, etc.) but it should be structurally unchanged.

    Thanks for continuing to look at this.

    Dave
    Attached Files

  5. #15 (Join Date: Nov 2005, Location: Canada, Posts: 1,031)

    You said you can restore it with amfetchdump; can you restore it again and post the amfetchdump debug file?
    I would like to compare them.

  6. #16 (Join Date: Mar 2017, Posts: 14)

    I have attached a similarly redacted amfetchdump log. I'm coming to the end of our day here, so I'll pick this up again tomorrow morning (UK) if you need any more info.

    Once again, many thanks for looking into this.

    Kind Regards
    Dave
    Attached Files

  7. #17 (Join Date: Nov 2005, Location: Canada, Posts: 1,031)

    Everything looks good.
    In the amrecover console, did it show that it read some data?
    Do you see anything at the same time in the system error log?

  8. #18 (Join Date: Mar 2017, Posts: 14)

    amrecover shows the directories which are created on the recovery filesystem, but then it just stops.

    I've checked /var/log/messages, and nothing is shown. The OS is CentOS 6.8 (64-bit).

    I was looking at the stack traces for amfetchdump, and there were three threads running when it was pulling data from S3. In amidxtaped there were only two. Is it possible that something has caused the retrieving thread to die without telling its parent, so the parent is just left waiting for data that's never going to come?

    I'm at home now - but can send you the amfetchdump stacks tomorrow if that helps?
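
The "producer thread dies without telling its parent" failure mode Dave suspects is easy to demonstrate in miniature. This is a generic sketch of the pattern, not Amanda code; the worker/queue names are made up for illustration:

```python
import queue
import threading

def worker(q):
    # Simulate the suspected failure: the retrieval thread exits early
    # without producing any data and without pushing a sentinel or error
    # marker that its consumer could notice.
    return

q = queue.Queue()
t = threading.Thread(target=worker, args=(q,))
t.start()
t.join()  # the producer is already gone

# The consumer now waits for data that will never arrive; without a
# timeout (or a sentinel put from an error handler) it would block forever,
# which is consistent with a hung amidxtaped/amrecover pair.
try:
    q.get(timeout=0.5)
    starved = False
except queue.Empty:
    starved = True

print("consumer starved:", starved)
```

The usual defence is for the producer to always push a sentinel (or exception object) in a finally block, so the consumer wakes up even when the producer dies.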

  9. #19 (Join Date: Nov 2005, Location: Canada, Posts: 1,031)

    There is one more thread in amfetchdump because it does the decompression, which is done by amrecover in the amidxtaped case.

    Add the following line to amanda.conf:
    Code:
    debug-recovery 9
    The new debugging information might help us to understand what is going on.

  10. #20 (Join Date: Mar 2017, Posts: 14)

    Hi Jean-Louis

    Thanks for that, I've enabled it. I've been working on this today, and have now managed to reproduce this on a desktop VM using AWS S3 (not EMC Atmos).

    The steps I followed were:

    1) Create a simple backup of localhost / (5G), /var (3G), /boot (75M)
    2) Backup to S3
    3) Backup again to S3 (force an L1 backup if necessary)

    Code:
    [amandabackup@localhost ~]$ amadmin ProdSet info localhost
    
    Current info for localhost /boot:
      Stats: dump rates (kps), Full:  16246.5, 12997.2,  -1.0
                        Incremental:   -1.0,  -1.0,  -1.0
              compressed size, Full:  93.3%, 93.3%,-100.0%
                        Incremental: -100.0%,-100.0%,-100.0%
      Dumps: lev datestmp  tape             file   origK   compK secs
              0  20170309  ProdSet-6        3 69680 64986 4
    
    Current info for localhost /var:
      Stats: dump rates (kps), Full:  14263.8,  -1.0,  -1.0
                        Incremental:  2117.5, 1849.0, 1580.0
              compressed size, Full:  51.2%,-100.0%,-100.0%
                        Incremental:  39.7%, 37.3%, 34.4%
      Dumps: lev datestmp  tape             file   origK   compK secs
              0  20170309  ProdSet-1        1 3176180 1626071 114
              1  20170309  ProdSet-6        1 10660 4235 2
    
    Current info for localhost /:
      Stats: dump rates (kps), Full:  11181.8, 11512.3, 11685.0
                        Incremental:  159.7,  -1.0,  -1.0
              compressed size, Full:  53.1%, 53.1%, 53.1%
                        Incremental:   7.1%,-100.0%,-100.0%
      Dumps: lev datestmp  tape             file   origK   compK secs
              0  20170309  ProdSet-5        3 4397330 2337000 209
              1  20170309  ProdSet-6        2 13500 958 6
    4) Attempt a recovery of /

    Code:
    [root@localhost data]# amrecover -oauth=local
    AMRECOVER Version 3.4.3. Contacting server on localhost ...
    220 localhost AMANDA index server (3.4.3) ready.
    Setting restore date to today (2017-03-09)
    200 Working date set to 2017-03-09.
    200 Config set to ProdSet.
    501 Host localhost.localdomain is not in your disklist.
    Trying host localhost ...
    200 Dump host set to localhost.
    Use the setdisk command to choose dump disk to recover
    amrecover> setdisk /
    200 Disk set to /.
    amrecover> add .
    Added dir /. at date 2017-03-09-16-34-49
    Added dir /. at date 2017-03-09-17-12-06
    amrecover> extract
    
    Extracting files using tape drive changer on host localhost.
    The following tapes are needed: ProdSet-5
                                    ProdSet-6
    
    Extracting files using tape drive changer on host localhost.
    Load tape ProdSet-5 now
    Continue [?/Y/n/s/d]? y
    Restoring files into directory /data
    All existing files in /data can be deleted
    Continue [?/Y/n]? y
    
    ./ kb
    ./bin/
    ./boot/
    ./cgroup/
    ./data/
    ./dev/
    ./etc/
    ./etc/ConsoleKit/
    ./etc/ConsoleKit/run-seat.d/
    ./etc/ConsoleKit/run-session.d/
    ./etc/ConsoleKit/seats.d/
    ./etc/NetworkManager/
    ./etc/NetworkManager/dispatcher.d/
    ./etc/X11/
    ./etc/X11/applnk/
    ./etc/X11/fontpath.d/
    ./etc/abrt/
    ./etc/abrt/plugins/
    ./etc/acpi/
    ./etc/acpi/actions/
    ./etc/acpi/events/
    ./etc/alsa/
    ./etc/alternatives/
    ....
    ./lib/modules/2.6.32-642.3.1.el6.x86_64/kernel/drivers/media/video/cx88/
    ./lib/modules/2.6.32-642.3.1.el6.x86_64/kernel/drivers/media/video/em28xx/
    ./lib/modules/2.6.32-642.3.1.el6.x86_64/kernel/drivers/media/video/gspca/
    Note: The console just hangs at this point.

    The amidxtaped log ends with:

    Code:
    [amandabackup@localhost ProdSet]$ tail -f amidxtaped.20170309174247.debug
    Thu Mar 09 17:42:50.234280886 2017: pid 6934: thd-0x1133400: amidxtaped: CTL >> USE-DATAPATH AMANDA
    Thu Mar 09 17:42:50.234310691 2017: pid 6934: thd-0x22733d0: amidxtaped: pull_and_write
    Thu Mar 09 17:42:50.235815562 2017: pid 6934: thd-0x1133400: amidxtaped: CTL << DATAPATH-OK
    Thu Mar 09 17:42:50.235944829 2017: pid 6934: thd-0x1133400: amidxtaped: CTL >> DATA-SEND
    Thu Mar 09 17:42:50.236185131 2017: pid 6934: thd-0x1133400: amidxtaped: CTL << DATA-READY
    Thu Mar 09 17:42:50.236321503 2017: pid 6934: thd-0x1133400: amidxtaped: Amanda::Recovery::Clerk: starting recovery
    Thu Mar 09 17:42:50.238149965 2017: pid 6934: thd-0x1133400: amidxtaped: Amanda::Recovery::Clerk: reading file 3 on 'ProdSet-5'
    Thu Mar 09 17:42:50.238194434 2017: pid 6934: thd-0x1133400: amidxtaped: XSR: start_part called
    Thu Mar 09 17:42:50.238205540 2017: pid 6934: thd-0x1133400: amidxtaped: XSR: triggering condition variable
    Thu Mar 09 17:42:50.238284491 2017: pid 6934: thd-0x22733d0: amidxtaped: XSR: first pull_buffer of new part
    Thu Mar 09 17:43:05.250453791 2017: pid 6934: thd-0x1133400: amidxtaped: /usr/local/share/perl5/Amanda/Restore.pm:1698:info:4900000 67768 kb
    Thu Mar 09 17:43:20.266313401 2017: pid 6934: thd-0x1133400: amidxtaped: /usr/local/share/perl5/Amanda/Restore.pm:1698:info:4900000 102400 kb
    Thu Mar 09 17:43:35.281422591 2017: pid 6934: thd-0x1133400: amidxtaped: /usr/local/share/perl5/Amanda/Restore.pm:1698:info:4900000 102400 kb
    ...
    As this is now on a local VM, I have compiled the RPMs from source and installed the debug RPMs, so I can now get full backtraces:

    amrecover:

    Code:
    (gdb) thread apply all bt
    
    Thread 2 (Thread 0x7f1d20820700 (LWP 7168)):
    #0  0x00000034f4e0e7cd in write () at ../sysdeps/unix/syscall-template.S:82
    #1  0x00007f1d273ffd46 in safe_write (fd=17, buf=0x7f1d2081bd90, count=16384) at safe-read.c:66
    #2  0x00007f1d273ff5ae in full_write (fd=17, buf=<value optimized out>, count=16384) at full-write.c:65
    #3  0x0000000000409e16 in handle_crc_thread (data=0x625b40) at extract_list.c:3589
    #4  0x00000034f626a374 in g_thread_create_proxy (data=0xc905c0) at gthread.c:1955
    #5  0x00000034f4e07aa1 in start_thread (arg=0x7f1d20820700) at pthread_create.c:301
    #6  0x00000034f4ae8aad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
    
    Thread 1 (Thread 0x7f1d268c27c0 (LWP 7158)):
    #0  0x00000034f4e0e7cd in write () at ../sysdeps/unix/syscall-template.S:82
    #1  0x00007f1d273ffd46 in safe_write (fd=8, buf=0xcb16e0, count=32768) at safe-read.c:66
    #2  0x00007f1d273ff5ae in full_write (fd=8, buf=<value optimized out>, count=32768) at full-write.c:65
    #3  0x000000000040e15b in read_amidxtaped_data (cookie=0x625b80, buf=0xcb16e0, size=32768) at extract_list.c:3313
    #4  0x00007f1d273ec736 in stream_read_callback (arg=0xca15a0) at security-util.c:2246
    #5  0x00007f1d273ee475 in sec_tcp_conn_read_callback (cookie=0xc628f0) at security-util.c:2382
    #6  0x00007f1d273dbca6 in event_handle_callback (user_ptr=0xc89600) at event.c:118
    #7  0x00000034f6240642 in g_main_dispatch (context=0xc62fb0) at gmain.c:2441
    #8  g_main_context_dispatch (context=0xc62fb0) at gmain.c:3014
    #9  0x00000034f6244c98 in g_main_context_iterate (context=0xc62fb0, block=1, dispatch=1, self=<value optimized out>) at gmain.c:3092
    #10 0x00000034f6244e4c in g_main_context_iteration (context=0xc62fb0, may_block=1) at gmain.c:3155
    #11 0x00007f1d273dbe49 in event_loop_wait (wait_eh=0xc8fbe0, nonblock=0, return_when_empty=1) at event.c:427
    #12 0x00007f1d273ef0a5 in tcpm_stream_read_sync (s=0xc99500, buf=0x7ffcfdc350a8) at security-util.c:447
    #13 0x00000000004082aa in get_amidxtaped_line () at extract_list.c:2943
    #14 0x000000000040e835 in writer_intermediary (elist=<value optimized out>) at extract_list.c:2274
    #15 0x000000000040f7ee in extract_files () at extract_list.c:2633
    #16 0x000000000041419a in yyparse () at uparse.y:258
    #17 0x0000000000415189 in process_line (line=<value optimized out>) at uscan.l:199
    #18 0x0000000000406e93 in main (argc=<value optimized out>, argv=<value optimized out>) at amrecover.c:644
    amidxtaped:

    Code:
    (gdb)  thread apply all bt
    
    Thread 2 (Thread 0x7f1af5ab9700 (LWP 7166)):
    #0  0x00000034f4e0e7cd in write () at ../sysdeps/unix/syscall-template.S:82
    #1  0x00007f1afae01d46 in safe_write (fd=6, buf=0x7f1aecdff010, count=52428800) at safe-read.c:66
    #2  0x00007f1afae015ae in full_write (fd=6, buf=<value optimized out>, count=52428800) at full-write.c:65
    #3  0x00007f1afb07f9b9 in pull_and_write (self=0x33ed020 [XferElementGlue]) at element-glue.c:411
    #4  0x00007f1afb082c16 in worker_thread (data=<value optimized out>) at element-glue.c:1380
    #5  0x00000034f626a374 in g_thread_create_proxy (data=0x33ecf00) at gthread.c:1955
    #6  0x00000034f4e07aa1 in start_thread (arg=0x7f1af5ab9700) at pthread_create.c:301
    #7  0x00000034f4ae8aad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
    
    Thread 1 (Thread 0x7f1afc0de700 (LWP 7163)):
    #0  0x00000034f4adf283 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
    #1  0x00000034f62449f9 in g_main_context_poll (context=0x2ed9c30, block=1, dispatch=1, self=<value optimized out>) at gmain.c:3405
    #2  g_main_context_iterate (context=0x2ed9c30, block=1, dispatch=1, self=<value optimized out>) at gmain.c:3087
    #3  0x00000034f6244e4c in g_main_context_iteration (context=0x2ed9c30, may_block=1) at gmain.c:3155
    #4  0x00007f1afaddde49 in event_loop_wait (wait_eh=0x0, nonblock=0, return_when_empty=0) at event.c:427
    #5  0x00007f1af966141d in _wrap_run_c (my_perl=<value optimized out>, cv=<value optimized out>) at Amanda/MainLoop.c:2017
    #6  0x0000003a926a6815 in Perl_pp_entersub () from /usr/lib64/perl5/CORE/libperl.so
    #7  0x0000003a926a4b06 in Perl_runops_standard () from /usr/lib64/perl5/CORE/libperl.so
    #8  0x0000003a9264d0d8 in perl_run () from /usr/lib64/perl5/CORE/libperl.so
    #9  0x0000000000400e74 in main ()
    gzip:

    Code:
    (gdb) thread apply all bt
    
    Thread 1 (process 7167):
    #0  0x00000034f4adb6d0 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:82
    #1  0x000000000040aade in write_buffer (fd=1, buf=0x64b440, cnt=32768) at util.c:190
    #2  write_buf (fd=1, buf=0x64b440, cnt=32768) at util.c:233
    #3  0x000000000040ab88 in flush_window () at util.c:216
    #4  0x00000000004062a2 in inflate_codes (tl=0x1b20020, td=0x1b225c0, bl=9, bd=6) at inflate.c:612
    #5  0x0000000000406b37 in inflate_dynamic () at inflate.c:891
    #6  0x00000000004072d2 in inflate () at inflate.c:969
    #7  0x000000000040a22d in unzip (in=<value optimized out>, out=<value optimized out>) at unzip.c:138
    #8  0x0000000000403fb7 in treat_stdin () at gzip.c:807
    #9  0x0000000000405b9a in main (argc=2, argv=0x7fff49f13358) at gzip.c:714
    tar:

    Code:
    (gdb) thread apply all bt
    
    Thread 1 (Thread 0x7fdda816a7a0 (LWP 7169)):
    #0  0x00000034f4adb6d0 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:82
    #1  0x00000034f4a71a93 in _IO_new_file_write (f=0x34f4d8f040, data=<value optimized out>, n=58) at fileops.c:1268
    #2  0x00000034f4a73045 in new_do_write (fp=0x34f4d8f040, data=0x7fdda8173000 "./opt/puppetlabs/puppet/lib/ruby/2.1.0/rubygems/core_ext/\nlate/json_index/js/\n/\ninttypes/\n", to_do=58) at fileops.c:522
    #3  _IO_new_do_write (fp=0x34f4d8f040, data=0x7fdda8173000 "./opt/puppetlabs/puppet/lib/ruby/2.1.0/rubygems/core_ext/\nlate/json_index/js/\n/\ninttypes/\n", to_do=58) at fileops.c:495
    #4  0x00000034f4a72248 in _IO_new_file_sync (fp=0x34f4d8f040) at fileops.c:897
    #5  0x00000034f4a709f3 in fflush_unlocked (fp=<value optimized out>) at iofflush_u.c:40
    #6  0x0000000000416398 in simple_print_header (st=<value optimized out>, blk=0xfbe400, block_ordinal=<value optimized out>) at list.c:1314
    #7  0x000000000040dd38 in extract_archive () at extract.c:1400
    #8  0x00000000004178eb in read_and (do_something=0x40db60 <extract_archive>) at list.c:186
    #9  0x000000000041f247 in main (argc=<value optimized out>, argv=<value optimized out>) at tar.c:2692
    
    
    [root@localhost ProdSet]# strace -s 255 -p 7169
    Process 7169 attached
    write(1, "./opt/puppetlabs/puppet/lib/ruby/2.1.0/rubygems/core_ext/\n", 58^CProcess 7169 detached
     <detached ...>
    
    [root@localhost ProdSet]# ls -l /proc/7169/fd/1
    l-wx------. 1 root root 64 Mar  9 18:00 /proc/7169/fd/1 -> pipe:[716533]
    It looks to me like tar is stuck writing to a pipe?

    The reader for the pipe is proc 7158:

    Code:
    [root@localhost ProdSet]# ls -l /proc/*/fd/* 2>/dev/null | grep 716533
    lr-x------. 1 root root 64 Mar  9 18:01 /proc/7158/fd/9 -> pipe:[716533]
    l-wx------. 1 root root 64 Mar  9 18:00 /proc/7169/fd/1 -> pipe:[716533]

    which is amrecover:

    Code:
    [root@localhost ProdSet]# ps -fp 7158
    UID        PID  PPID  C STIME TTY          TIME CMD
    root      7158  4966  0 17:48 pts/0    00:00:00 amrecover -oauth=local
    But the backtrace above shows two threads that are both blocked writing; nothing appears to be reading from that pipe.
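
For reference, the "writer blocked on a pipe nobody reads" state is easy to reproduce with a plain pipe: once the kernel pipe buffer fills (64 KiB by default on Linux), every further write sleeps in write(), which matches the backtraces. A minimal sketch, unrelated to Amanda itself, using a non-blocking write end so the demo observes the limit instead of hanging:

```python
import fcntl
import os

# A pipe that nothing ever reads from, mimicking the stalled reader side.
r, w = os.pipe()

# Make the write end non-blocking so we can observe the capacity limit
# rather than blocking the way tar does in the stack traces above.
flags = fcntl.fcntl(w, fcntl.F_GETFL)
fcntl.fcntl(w, fcntl.F_SETFL, flags | os.O_NONBLOCK)

total = 0
chunk = b"x" * 4096  # writes of <= PIPE_BUF bytes are all-or-nothing
try:
    while True:
        total += os.write(w, chunk)
except BlockingIOError:
    # The kernel pipe buffer is full; a blocking writer would now sleep
    # in write() indefinitely until a reader drains the pipe.
    pass

print("pipe filled after", total, "bytes")
```

This is why a single stuck reader (here, amrecover) is enough to freeze the whole tar/gzip/amidxtaped chain behind it.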

    Let me know if you need more info. I'll take my laptop back with me tonight, so I can recreate this at home.

    Kind Regards
    Dave
