PDA

View Full Version : amrecover stucks when choice for "set owner/mode for '.' is made



ZORN
May 10th, 2012, 06:59 AM
Hi there,

facing a weird issue with amrecover and I am running out of ideas, so all help/ideas on this problem are more than welcome.

I am using amanda to restore files on the backup server directly (but I also tried over network restore, the problem is the same).
When using amrecover for small size file restores all works fine and as expected.
When I use amrecover to restore a medium size file (~ 1 GB) I come to the point where amrecover asks:
--> set owner/mode for '.'? [yn] n
After that nothing happens anymore, although I waited for hours (I made one over night test).
And it doesn't matter if I answer with y or n, effect is always the same.

As soon as I make the choice at:"set owner/mode for '.'? [yn]" I get two defunct processes:

[root@mybackupsrv1 ~]# ps aux | grep 'Z'
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 27961 0.6 0.0 0 0 pts/0 Z+ 16:22 0:09 [gzip] <defunct>
root 27962 0.4 0.0 0 0 pts/0 Z+ 16:22 0:07 [restore] <defunct>

If I then quit amrecover wit CTRL-C the zombi-processes disappear and all seems fine.
And the recovered file is there too, completely.

I tried the same thing on another server (bit different hardware, same software) to exclude any machine issues, same result :-(
I really wonder why amrecover does not quit as expected.

Here is some information about the environment I have:
- Linux version 2.6.18-274.18.1.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-51))
- amrecover-3.3.1
- tar (GNU tar) 1.15.1
- dump 0.4b41
- gzip 1.3.5

Any help would be really welcome,

thanks a million.

martineau
May 10th, 2012, 01:46 PM
It is the 'restore' process that print the message, but you can't reply, amrecover do not allow iteractivity with the restore process.
All the data are restored, but you must check onwership of '.'.

ZORN
May 10th, 2012, 08:20 PM
This morning I got a new situation when trying to amrecover a small file:

amanda 15720 0.0 0.0 0 0 ? Z 06:10 0:00 [amandad] <defunct>
root 15722 0.2 0.0 0 0 pts/0 Z+ 06:10 0:00 [gzip] <defunct>
root 15723 0.6 0.0 0 0 pts/0 Z+ 06:10 0:02 [restore] <defunct>

The restore itself worked and finished restoring the file, but amrecover stucked.

ZORN
May 13th, 2012, 11:01 PM
As the trace showed some broken pipes in the gzip system call I updatet to version gzip 1.4 - but nothing changed. Amrecover still stucks after answering "set owner/mode for '.'
Any ideas what else I could try?

Thanks

ZORN
May 14th, 2012, 04:10 AM
Here is also some logging/debugging output untill the moment where amrecover stucks, maybe it helps debugging:

==> ./amandad/amandad.20120514140223.debug <==
Mon May 14 14:04:05 2012: thd-0x2b27258fd930: amandad: security_stream_close(0x2b2725941700)
Mon May 14 14:04:05 2012: thd-0x2b27258fd930: amandad: security_stream_seterr(0x2b2725941700, write error to: Broken pipe)
Mon May 14 14:04:06 2012: thd-0x2b27258fd930: amandad: security_stream_seterr(0x2b2725920ed0, SOCKET_EOF)
Mon May 14 14:04:06 2012: thd-0x2b27258fd930: amandad: security_stream_seterr(0x2b2725920ed0, write error to: Bad file descriptor)
Mon May 14 14:04:06 2012: thd-0x2b27258fd930: amandad: sending NAK pkt:
<<<<<
ERROR write error on stream 499999: write error to: Bad file descriptor
>>>>>
Mon May 14 14:04:06 2012: thd-0x2b27258fd930: amandad: security_stream_close(0x2b2725920ed0)
Mon May 14 14:04:07 2012: thd-0x2b27258fd930: amandad: pid 27095 finish time Mon May 14 14:04:07 2012

==> ./client/mybackup/amrecover.20120514140223.debug <==
Mon May 14 14:03:35 2012: thd-0x2b002ea6b920: amrecover: send_to_tape_server: DATAPATH-OK

Mon May 14 14:03:40 2012: thd-0x2b002ea6b920: amrecover: stream_read_callback: data is still flowing
Mon May 14 14:03:51 2012: thd-0x2b002ea6b920: amrecover: stream_read_callback: data is still flowing

gzip: stdout: Broken pipe
Mon May 14 14:04:05 2012: thd-0x2b002ea6b920: amrecover: sending: QUIT


Mon May 14 14:04:05 2012: thd-0x2b002ea6b920: amrecover: tcpm_send_token: data is still flowing

==> ./server/mybackup/amidxtaped.20120514140330.debug <==
Mon May 14 14:03:35 2012: thd-0x40c7580: amidxtaped: ..with decompression applied
Mon May 14 14:03:35 2012: thd-0x40c7580: amidxtaped: Building type FILE header of 32768-32768 bytes with name='buserver.mybackup.net' disk='/share/buserver/disk02' dumplevel=1 and blocksize=0
Mon May 14 14:03:35 2012: thd-0x40c7580: amidxtaped: CTL << AVAIL-DATAPATH AMANDA
Mon May 14 14:03:35 2012: thd-0x40c7580: amidxtaped: CTL >> USE-DATAPATH AMANDA
Mon May 14 14:03:35 2012: thd-0x40c7580: amidxtaped: CTL << DATAPATH-OK
Mon May 14 14:03:35 2012: thd-0x40c7580: amidxtaped: Starting <Xfer@0x502e4f0 (<XferSourceRecovery@0x5063860> -> <XferDestFd@0x5064c10>)>
Mon May 14 14:03:35 2012: thd-0x40c7580: amidxtaped: Final linkage: <XferSourceRecovery@0x5063860> -(PULL_BUFFER)-> <XferElementGlue@0x5060920> -(WRITEFD)-> <XferDestFd@0x5064c10>
Mon May 14 14:03:35 2012: thd-0x40c7580: amidxtaped: started xfer; datapath=amanda
Mon May 14 14:03:35 2012: thd-0x40c7580: amidxtaped: Amanda::Recovery::Clerk: starting recovery
Mon May 14 14:03:35 2012: thd-0x40c7580: amidxtaped: Amanda::Recovery::Clerk: reading file 410 on 'DAILY-11'

Interesting is the: "write error to: Broken pipe", and the "write error to: Bad file descriptor" in the amandad.20120514140223.debug.
Any idea what causes this?
Any help is welcome, I really do not want to downgrade to a former amanda version :-(

Thanks.

martineau
May 14th, 2012, 07:39 AM
Wait that the 'set owner/mode for' finish, it can take a lot of time, you should get the amrecover prompt as soon as amidxtaped exit.

ZORN
May 14th, 2012, 07:46 AM
Wait that the 'set owner/mode for' finish, it can take a lot of time, you should get the amrecover prompt as soon as amidxtaped exit.

o.k., I'll start a recover job over night and see how far it comes. Thanks for now.

ZORN
May 14th, 2012, 08:16 PM
Hello Martineau,

thanks for your answer.
The long time test unfortunately failed again.
As before, I came to the point where choice is made: "set owner/mode for '.'
Again, as soon as I type y or n, I get two defunct processes (gzip and restore), and nothing happens (I waited 8 hours this time).

What could help debugging this, would more logs be useful?

Thanks a lot for your help here.

DebB57
January 8th, 2013, 02:41 PM
Hey ZORN -- I ran into the exact same problem. I'm running amanda v3.3.2.

A complete fix seems to occur (only did a brief test) if I switch to auth=bsdtcp instead of auth=bsd. This also requires/enables switching the xinetd process (I'm on Linux) to use tcp and stream instead of udp and dgram. <3/28/2013: nope, this doesn't fix it Deb>

I can't do this change until all my clients are ready, but it worked on a brief test.

You might also try a work-around which seems to solve it: have the dumptype do its compression on the server instead of on the client. This fixed the problem for me, except that older backups still hang with the broken pipe. <This is still my best bet; move compression to the server.>

Deb B (new here)