Hi Jean-Louis
Thanks for that. I've enabled it and have been working on this today, and I've now managed to reproduce the problem on a desktop VM using AWS S3 (not EMC Atmos).
The steps I followed were:
1) Create a simple backup of localhost / (5G), /var (3G), /boot (75M)
2) Backup to S3
3) Backup again to S3 (force an L1 backup if necessary)
Code:
[amandabackup@localhost ~]$ amadmin ProdSet info localhost
Current info for localhost /boot:
Stats: dump rates (kps), Full: 16246.5, 12997.2, -1.0
Incremental: -1.0, -1.0, -1.0
compressed size, Full: 93.3%, 93.3%,-100.0%
Incremental: -100.0%,-100.0%,-100.0%
Dumps: lev datestmp tape file origK compK secs
0 20170309 ProdSet-6 3 69680 64986 4
Current info for localhost /var:
Stats: dump rates (kps), Full: 14263.8, -1.0, -1.0
Incremental: 2117.5, 1849.0, 1580.0
compressed size, Full: 51.2%,-100.0%,-100.0%
Incremental: 39.7%, 37.3%, 34.4%
Dumps: lev datestmp tape file origK compK secs
0 20170309 ProdSet-1 1 3176180 1626071 114
1 20170309 ProdSet-6 1 10660 4235 2
Current info for localhost /:
Stats: dump rates (kps), Full: 11181.8, 11512.3, 11685.0
Incremental: 159.7, -1.0, -1.0
compressed size, Full: 53.1%, 53.1%, 53.1%
Incremental: 7.1%,-100.0%,-100.0%
Dumps: lev datestmp tape file origK compK secs
0 20170309 ProdSet-5 3 4397330 2337000 209
1 20170309 ProdSet-6 2 13500 958 6
4) Attempt a recovery of /
Code:
[root@localhost data]# amrecover -oauth=local
AMRECOVER Version 3.4.3. Contacting server on localhost ...
220 localhost AMANDA index server (3.4.3) ready.
Setting restore date to today (2017-03-09)
200 Working date set to 2017-03-09.
200 Config set to ProdSet.
501 Host localhost.localdomain is not in your disklist.
Trying host localhost ...
200 Dump host set to localhost.
Use the setdisk command to choose dump disk to recover
amrecover> setdisk /
200 Disk set to /.
amrecover> add .
Added dir /. at date 2017-03-09-16-34-49
Added dir /. at date 2017-03-09-17-12-06
amrecover> extract
Extracting files using tape drive changer on host localhost.
The following tapes are needed: ProdSet-5
ProdSet-6
Extracting files using tape drive changer on host localhost.
Load tape ProdSet-5 now
Continue [?/Y/n/s/d]? y
Restoring files into directory /data
All existing files in /data can be deleted
Continue [?/Y/n]? y
./ kb
./bin/
./boot/
./cgroup/
./data/
./dev/
./etc/
./etc/ConsoleKit/
./etc/ConsoleKit/run-seat.d/
./etc/ConsoleKit/run-session.d/
./etc/ConsoleKit/seats.d/
./etc/NetworkManager/
./etc/NetworkManager/dispatcher.d/
./etc/X11/
./etc/X11/applnk/
./etc/X11/fontpath.d/
./etc/abrt/
./etc/abrt/plugins/
./etc/acpi/
./etc/acpi/actions/
./etc/acpi/events/
./etc/alsa/
./etc/alternatives/
....
./lib/modules/2.6.32-642.3.1.el6.x86_64/kernel/drivers/media/video/cx88/
./lib/modules/2.6.32-642.3.1.el6.x86_64/kernel/drivers/media/video/em28xx/
./lib/modules/2.6.32-642.3.1.el6.x86_64/kernel/drivers/media/video/gspca/
Note: The console just hangs at this point.
The amidxtaped log ends with:
Code:
[amandabackup@localhost ProdSet]$ tail -f amidxtaped.20170309174247.debug
Thu Mar 09 17:42:50.234280886 2017: pid 6934: thd-0x1133400: amidxtaped: CTL >> USE-DATAPATH AMANDA
Thu Mar 09 17:42:50.234310691 2017: pid 6934: thd-0x22733d0: amidxtaped: pull_and_write
Thu Mar 09 17:42:50.235815562 2017: pid 6934: thd-0x1133400: amidxtaped: CTL << DATAPATH-OK
Thu Mar 09 17:42:50.235944829 2017: pid 6934: thd-0x1133400: amidxtaped: CTL >> DATA-SEND
Thu Mar 09 17:42:50.236185131 2017: pid 6934: thd-0x1133400: amidxtaped: CTL << DATA-READY
Thu Mar 09 17:42:50.236321503 2017: pid 6934: thd-0x1133400: amidxtaped: Amanda::Recovery::Clerk: starting recovery
Thu Mar 09 17:42:50.238149965 2017: pid 6934: thd-0x1133400: amidxtaped: Amanda::Recovery::Clerk: reading file 3 on 'ProdSet-5'
Thu Mar 09 17:42:50.238194434 2017: pid 6934: thd-0x1133400: amidxtaped: XSR: start_part called
Thu Mar 09 17:42:50.238205540 2017: pid 6934: thd-0x1133400: amidxtaped: XSR: triggering condition variable
Thu Mar 09 17:42:50.238284491 2017: pid 6934: thd-0x22733d0: amidxtaped: XSR: first pull_buffer of new part
Thu Mar 09 17:43:05.250453791 2017: pid 6934: thd-0x1133400: amidxtaped: /usr/local/share/perl5/Amanda/Restore.pm:1698:info:4900000 67768 kb
Thu Mar 09 17:43:20.266313401 2017: pid 6934: thd-0x1133400: amidxtaped: /usr/local/share/perl5/Amanda/Restore.pm:1698:info:4900000 102400 kb
Thu Mar 09 17:43:35.281422591 2017: pid 6934: thd-0x1133400: amidxtaped: /usr/local/share/perl5/Amanda/Restore.pm:1698:info:4900000 102400 kb
...
As this is now on a local VM, I have compiled the RPMs from source and installed the debug RPMs, so I can now get full backtraces:
amrecover:
Code:
(gdb) thread apply all bt
Thread 2 (Thread 0x7f1d20820700 (LWP 7168)):
#0 0x00000034f4e0e7cd in write () at ../sysdeps/unix/syscall-template.S:82
#1 0x00007f1d273ffd46 in safe_write (fd=17, buf=0x7f1d2081bd90, count=16384) at safe-read.c:66
#2 0x00007f1d273ff5ae in full_write (fd=17, buf=<value optimized out>, count=16384) at full-write.c:65
#3 0x0000000000409e16 in handle_crc_thread (data=0x625b40) at extract_list.c:3589
#4 0x00000034f626a374 in g_thread_create_proxy (data=0xc905c0) at gthread.c:1955
#5 0x00000034f4e07aa1 in start_thread (arg=0x7f1d20820700) at pthread_create.c:301
#6 0x00000034f4ae8aad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
Thread 1 (Thread 0x7f1d268c27c0 (LWP 7158)):
#0 0x00000034f4e0e7cd in write () at ../sysdeps/unix/syscall-template.S:82
#1 0x00007f1d273ffd46 in safe_write (fd=8, buf=0xcb16e0, count=32768) at safe-read.c:66
#2 0x00007f1d273ff5ae in full_write (fd=8, buf=<value optimized out>, count=32768) at full-write.c:65
#3 0x000000000040e15b in read_amidxtaped_data (cookie=0x625b80, buf=0xcb16e0, size=32768) at extract_list.c:3313
#4 0x00007f1d273ec736 in stream_read_callback (arg=0xca15a0) at security-util.c:2246
#5 0x00007f1d273ee475 in sec_tcp_conn_read_callback (cookie=0xc628f0) at security-util.c:2382
#6 0x00007f1d273dbca6 in event_handle_callback (user_ptr=0xc89600) at event.c:118
#7 0x00000034f6240642 in g_main_dispatch (context=0xc62fb0) at gmain.c:2441
#8 g_main_context_dispatch (context=0xc62fb0) at gmain.c:3014
#9 0x00000034f6244c98 in g_main_context_iterate (context=0xc62fb0, block=1, dispatch=1, self=<value optimized out>) at gmain.c:3092
#10 0x00000034f6244e4c in g_main_context_iteration (context=0xc62fb0, may_block=1) at gmain.c:3155
#11 0x00007f1d273dbe49 in event_loop_wait (wait_eh=0xc8fbe0, nonblock=0, return_when_empty=1) at event.c:427
#12 0x00007f1d273ef0a5 in tcpm_stream_read_sync (s=0xc99500, buf=0x7ffcfdc350a8) at security-util.c:447
#13 0x00000000004082aa in get_amidxtaped_line () at extract_list.c:2943
#14 0x000000000040e835 in writer_intermediary (elist=<value optimized out>) at extract_list.c:2274
#15 0x000000000040f7ee in extract_files () at extract_list.c:2633
#16 0x000000000041419a in yyparse () at uparse.y:258
#17 0x0000000000415189 in process_line (line=<value optimized out>) at uscan.l:199
#18 0x0000000000406e93 in main (argc=<value optimized out>, argv=<value optimized out>) at amrecover.c:644
amidxtaped:
Code:
(gdb) thread apply all bt
Thread 2 (Thread 0x7f1af5ab9700 (LWP 7166)):
#0 0x00000034f4e0e7cd in write () at ../sysdeps/unix/syscall-template.S:82
#1 0x00007f1afae01d46 in safe_write (fd=6, buf=0x7f1aecdff010, count=52428800) at safe-read.c:66
#2 0x00007f1afae015ae in full_write (fd=6, buf=<value optimized out>, count=52428800) at full-write.c:65
#3 0x00007f1afb07f9b9 in pull_and_write (self=0x33ed020 [XferElementGlue]) at element-glue.c:411
#4 0x00007f1afb082c16 in worker_thread (data=<value optimized out>) at element-glue.c:1380
#5 0x00000034f626a374 in g_thread_create_proxy (data=0x33ecf00) at gthread.c:1955
#6 0x00000034f4e07aa1 in start_thread (arg=0x7f1af5ab9700) at pthread_create.c:301
#7 0x00000034f4ae8aad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
Thread 1 (Thread 0x7f1afc0de700 (LWP 7163)):
#0 0x00000034f4adf283 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x00000034f62449f9 in g_main_context_poll (context=0x2ed9c30, block=1, dispatch=1, self=<value optimized out>) at gmain.c:3405
#2 g_main_context_iterate (context=0x2ed9c30, block=1, dispatch=1, self=<value optimized out>) at gmain.c:3087
#3 0x00000034f6244e4c in g_main_context_iteration (context=0x2ed9c30, may_block=1) at gmain.c:3155
#4 0x00007f1afaddde49 in event_loop_wait (wait_eh=0x0, nonblock=0, return_when_empty=0) at event.c:427
#5 0x00007f1af966141d in _wrap_run_c (my_perl=<value optimized out>, cv=<value optimized out>) at Amanda/MainLoop.c:2017
#6 0x0000003a926a6815 in Perl_pp_entersub () from /usr/lib64/perl5/CORE/libperl.so
#7 0x0000003a926a4b06 in Perl_runops_standard () from /usr/lib64/perl5/CORE/libperl.so
#8 0x0000003a9264d0d8 in perl_run () from /usr/lib64/perl5/CORE/libperl.so
#9 0x0000000000400e74 in main ()
gzip:
Code:
(gdb) thread apply all bt
Thread 1 (process 7167):
#0 0x00000034f4adb6d0 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1 0x000000000040aade in write_buffer (fd=1, buf=0x64b440, cnt=32768) at util.c:190
#2 write_buf (fd=1, buf=0x64b440, cnt=32768) at util.c:233
#3 0x000000000040ab88 in flush_window () at util.c:216
#4 0x00000000004062a2 in inflate_codes (tl=0x1b20020, td=0x1b225c0, bl=9, bd=6) at inflate.c:612
#5 0x0000000000406b37 in inflate_dynamic () at inflate.c:891
#6 0x00000000004072d2 in inflate () at inflate.c:969
#7 0x000000000040a22d in unzip (in=<value optimized out>, out=<value optimized out>) at unzip.c:138
#8 0x0000000000403fb7 in treat_stdin () at gzip.c:807
#9 0x0000000000405b9a in main (argc=2, argv=0x7fff49f13358) at gzip.c:714
tar:
Code:
(gdb) thread apply all bt
Thread 1 (Thread 0x7fdda816a7a0 (LWP 7169)):
#0 0x00000034f4adb6d0 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1 0x00000034f4a71a93 in _IO_new_file_write (f=0x34f4d8f040, data=<value optimized out>, n=58) at fileops.c:1268
#2 0x00000034f4a73045 in new_do_write (fp=0x34f4d8f040, data=0x7fdda8173000 "./opt/puppetlabs/puppet/lib/ruby/2.1.0/rubygems/core_ext/\nlate/json_index/js/\n/\ninttypes/\n", to_do=58) at fileops.c:522
#3 _IO_new_do_write (fp=0x34f4d8f040, data=0x7fdda8173000 "./opt/puppetlabs/puppet/lib/ruby/2.1.0/rubygems/core_ext/\nlate/json_index/js/\n/\ninttypes/\n", to_do=58) at fileops.c:495
#4 0x00000034f4a72248 in _IO_new_file_sync (fp=0x34f4d8f040) at fileops.c:897
#5 0x00000034f4a709f3 in fflush_unlocked (fp=<value optimized out>) at iofflush_u.c:40
#6 0x0000000000416398 in simple_print_header (st=<value optimized out>, blk=0xfbe400, block_ordinal=<value optimized out>) at list.c:1314
#7 0x000000000040dd38 in extract_archive () at extract.c:1400
#8 0x00000000004178eb in read_and (do_something=0x40db60 <extract_archive>) at list.c:186
#9 0x000000000041f247 in main (argc=<value optimized out>, argv=<value optimized out>) at tar.c:2692
[root@localhost ProdSet]# strace -s 255 -p 7169
Process 7169 attached
write(1, "./opt/puppetlabs/puppet/lib/ruby/2.1.0/rubygems/core_ext/\n", 58^CProcess 7169 detached
<detached ...>
[root@localhost ProdSet]# ls -l /proc/7169/fd/1
l-wx------. 1 root root 64 Mar 9 18:00 /proc/7169/fd/1 -> pipe:[716533]
It looks to me like tar is stuck writing to a pipe. The read end of that pipe is held by PID 7158:
[root@localhost ProdSet]# ls -l /proc/*/fd/* 2>/dev/null | grep 716533
lr-x------. 1 root root 64 Mar 9 18:01 /proc/7158/fd/9 -> pipe:[716533]
l-wx------. 1 root root 64 Mar 9 18:00 /proc/7169/fd/1 -> pipe:[716533]
which is amrecover:
Code:
[root@localhost ProdSet]# ps -fp 7158
UID PID PPID C STIME TTY TIME CMD
root 7158 4966 0 17:48 pts/0 00:00:00 amrecover -oauth=local
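Incidentally, the /proc inode-matching trick above can be scripted. A minimal Python sketch of the same idea (Linux-only; it inspects a pipe in its own process rather than the tar/amrecover one):

```python
import os

# Create a pipe, then find every fd in this process that points at it
# by matching the pipe inode -- the same idea as
# `ls -l /proc/*/fd | grep <inode>` in the session above.
r, w = os.pipe()
target = "pipe:[%d]" % os.fstat(r).st_ino

ends = []
for fd in os.listdir("/proc/self/fd"):
    try:
        if os.readlink("/proc/self/fd/" + fd) == target:
            ends.append(int(fd))
    except OSError:
        pass  # fd closed between listdir() and readlink()

print("fds on that pipe:", sorted(ends))  # both the read and write end
os.close(r)
os.close(w)
```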
But the amrecover backtrace above shows both of its threads blocked in write(), so nothing is left reading from that pipe.
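That matches classic pipe-backpressure behaviour: once nothing drains a pipe, a writer blocks as soon as the kernel buffer (64 KiB by default on recent Linux) fills. A small Python sketch of the same state, using a non-blocking fd so it reports the stall instead of hanging the way tar does here:

```python
import errno
import fcntl
import os

# A pipe that nothing ever reads from, like the tar -> amrecover pipe above.
r, w = os.pipe()

# Make the write end non-blocking so a full pipe raises EAGAIN
# instead of sleeping in write() forever.
flags = fcntl.fcntl(w, fcntl.F_GETFL)
fcntl.fcntl(w, fcntl.F_SETFL, flags | os.O_NONBLOCK)

written = 0
while True:
    try:
        written += os.write(w, b"x" * 4096)
    except OSError as e:
        if e.errno != errno.EAGAIN:
            raise
        break  # with a blocking fd, this write() would block indefinitely

print("pipe filled after", written, "bytes")  # typically 65536 on Linux
os.close(r)
os.close(w)
```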
Let me know if you need more info. I'll take my laptop back with me tonight, so I can also recreate this at home.
Kind Regards
Dave