PDA

View Full Version : Amcheck OK, amdump timeout waiting for ACK



bluethumb
July 12th, 2007, 04:51 PM
I've seen a lot of discussion about the problems that cause amcheck to timeout waiting for and ACK. I have a new wrinkle. Amcheck works just fine, but amdump times out because of waiting for ACK.

We have a system that has been running for years with a Solaris tape host and a mixture of Solaris and Linux clients. The old tape host has Amanda version 2.4.5. Recently we got a new tape server. It has Scientific Linux (a clone of RHEL5) and Amanda version 2.5.0p2-4.x86_64. The idea is to have this machine take over for the old server.

First I got the new server set up properly so it would back up itself. Then I chose one of the clients to be backed up. It runs a clone of RHEL4 with amanda version 2.4.4p3-1.i386. I modified the iptables and .amandahosts files to allow access from both servers. Tcpd isn't used for amanda.

As I said, amcheck runs quickly with no errors. That would seem to indicate that communication is working properly. Yet when amdump runs, the client produces estimates and then quits. All the amanda-related programs on the client are completed well before amdump times out.

I have attached my amandad.debug file. Apparently I can attach only one file, so I will include some other possibly-relevant files here.

excerpt from client iptables -L:
ACCEPT tcp -- 171.64.108.0/23 anywhere state NEW tcp dpt:amanda
ACCEPT udp -- 171.64.108.0/23 anywhere state NEW udp dpt:amanda
ACCEPT tcp -- 171.64.108.0/23 anywhere state NEW tcp dpt:kamanda
ACCEPT udp -- 171.64.108.0/23 anywhere state NEW udp dpt:kamanda
(Yes, that range of IP addresses is OK for both servers.)

Client services:
# grep amanda /etc/services
amanda 10080/tcp # amanda backup services
amanda 10080/udp # amanda backup services
kamanda 10081/tcp # amanda backup services (Kerberos)
kamanda 10081/udp # amanda backup services (Kerberos)
amandaidx 10082/tcp # amanda backup services
amidxtape 10083/tcp # amanda backup services

Server iptables:
ACCEPT udp -- 171.64.108.0/23 anywhere state NEW udp dpt:amanda
ACCEPT tcp -- 171.64.108.0/23 anywhere state NEW tcp dpt:amandaidx
ACCEPT tcp -- 171.64.108.0/23 anywhere state NEW tcp dpt:amidxtape

Client xinet:
# cat /etc/xinetd.d/amanda
service amanda
{
socket_type = dgram
protocol = udp
wait = yes
user = amanda
group = disk
server = /usr/lib/amanda/amandad
disable = no
}

client .amandahosts:
# cat ~amanda/.amandahosts
localhost amanda
localhost.localdomain amanda
server1 amanda
server1.domain.com amanda
server2 amanda
server2.domain.com amanda
(names altered for security)

sendsize.debug:
sendsize: debug 1 pid 18036 ruid 33 euid 33: start at Thu Jul 12 17:20:42 2007
sendsize: version 2.4.4p3
sendsize[18038]: time 0.016: calculating for amname '/boot', dirname '/boot', spindle -1
sendsize[18038]: time 0.016: getting size via gnutar for /boot level 0
sendsize[18038]: time 0.018: spawning /usr/lib/amanda/runtar in pipeline
sendsize[18038]: argument list: /bin/tar --create --file /dev/null --directory /boot --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/glastcolor_boot_0.new --sparse --ignore-failed-read --totals .
sendsize[18036]: time 0.019: waiting for any estimate child: 1 running
sendsize[18038]: time 0.120: Total bytes written: 5662720 (5.5MiB, ?/s)
sendsize[18038]: time 0.121: .....
sendsize[18038]: estimate time for /boot level 0: 0.103
sendsize[18038]: estimate size for /boot level 0: 5530 KB
sendsize[18038]: time 0.121: waiting for /bin/tar "/boot" child
sendsize[18038]: time 0.122: after /bin/tar "/boot" wait
sendsize[18038]: time 0.122: done with amname '/boot', dirname '/boot', spindle -1
sendsize[18036]: time 0.123: child 18038 terminated normally
sendsize[18036]: time 0.124: waiting for any estimate child: 1 running
sendsize[18040]: time 0.124: calculating for amname '/home', dirname '/home', spindle -1
sendsize[18040]: time 0.124: getting size via gnutar for /home level 0
sendsize[18040]: time 0.126: spawning /usr/lib/amanda/runtar in pipeline
sendsize[18040]: argument list: /bin/tar --create --file /dev/null --directory /home --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/glastcolor_home_0.new --sparse --ignore-failed-read --totals .
sendsize[18040]: time 20.245: /bin/tar: ./pln/.Devhelp.pln: socket ignored
sendsize[18040]: time 20.956: Total bytes written: 23387013120 (22GiB, 1.1GiB/s)
sendsize[18040]: time 20.957: .....
sendsize[18040]: estimate time for /home level 0: 20.831
sendsize[18040]: estimate size for /home level 0: 22838880 KB
sendsize[18040]: time 20.957: waiting for /bin/tar "/home" child
sendsize[18040]: time 20.958: after /bin/tar "/home" wait
sendsize[18040]: time 20.959: done with amname '/home', dirname '/home', spindle -1
sendsize[18036]: time 20.960: child 18040 terminated normally
sendsize[18044]: time 20.961: calculating for amname '/usr/local', dirname '/usr/local', spindle -1
sendsize[18044]: time 20.961: getting size via gnutar for /usr/local level 0
sendsize[18044]: time 20.963: spawning /usr/lib/amanda/runtar in pipeline
sendsize[18044]: argument list: /bin/tar --create --file /dev/null --directory /usr/local --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/glastcolor_usr_local_0.new --sparse --ignore-failed-read --totals .
sendsize[18036]: time 20.964: waiting for any estimate child: 1 running
sendsize[18044]: time 44.049: Total bytes written: 2716508160 (2.6GiB, 113MiB/s)
sendsize[18044]: time 44.050: .....
sendsize[18044]: estimate time for /usr/local level 0: 23.087
sendsize[18044]: estimate size for /usr/local level 0: 2652840 KB
sendsize[18044]: time 44.051: waiting for /bin/tar "/usr/local" child
sendsize[18044]: time 44.051: after /bin/tar "/usr/local" wait
sendsize[18044]: time 44.052: done with amname '/usr/local', dirname '/usr/local', spindle -1
sendsize[18036]: time 44.053: child 18044 terminated normally
sendsize[18048]: time 44.054: calculating for amname '/db', dirname '/db', spindle -1
sendsize[18048]: time 44.054: getting size via gnutar for /db level 0
sendsize[18036]: time 44.055: waiting for any estimate child: 1 running
sendsize[18048]: time 44.057: spawning /usr/lib/amanda/runtar in pipeline
sendsize[18048]: argument list: /bin/tar --create --file /dev/null --directory /db --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/glastcolor_db_0.new --sparse --ignore-failed-read --totals .
sendsize[18048]: time 225.302: Total bytes written: 20804198400 (20GiB, 110MiB/s)
sendsize[18048]: time 225.327: .....
sendsize[18048]: estimate time for /db level 0: 181.270
sendsize[18048]: estimate size for /db level 0: 20316600 KB
sendsize[18048]: time 225.327: waiting for /bin/tar "/db" child
sendsize[18048]: time 225.327: after /bin/tar "/db" wait
sendsize[18048]: time 225.353: done with amname '/db', dirname '/db', spindle -1
sendsize[18036]: time 225.353: child 18048 terminated normally
sendsize: time 225.354: pid 18036 finish time Thu Jul 12 17:24:27 2007

bluethumb
July 12th, 2007, 05:53 PM
I'm a moron. A minute after posting that big mess, I realized that I hadn't checked the effects of the firewalls. So I turned off iptables on both server and client, and the problem went away. I haven't figured out exactly what the problem is yet, but it should be solvable.

I see that there are 3 IP connections, using high port numbers, between sendbackup on the client and dumper on the server. Is that the problem with iptables? In my first post I edited out part of the iptables for simplicity. Both of them have this line:
-A RH-Firewall-1-INPUT -s 171.64.108.0/23 -p tcp --dport 1024:65535 -j ACCEPT
That looks a bit like overkill, but it used to work. What do you think?