Server blocks data from client on Fiber IP connection



mjoop
April 8th, 2008, 03:33 PM
Hi. I have an Amanda server running on an x86 Solaris 10 box. Amanda is 2.4.5 from the Solaris companion disk. There are several clients. One, a CentOS 4.6 box, is connected with a direct cable using an extra gigabit Ethernet port. Two are also Solaris 10 boxes running the same version of Amanda, but connected over a fiber IP network (using Sun's fcip driver). The server backs up itself and the CentOS box fine. The other two always fail with data timeouts or a lost message channel.

I watched them doing a backup with netstat. I see the three channels opened between the boxes and the gtar processes start. The TX queue on the clients soon reaches somewhere around 270000 and stays there until the clients drop one of the three channels and the other two go into the CLOSE_WAIT state. The clients fail with a broken pipe on the index process. The server keeps the channels open with nothing in the TX and RX queues. After the clients close the channels, the server dutifully re-establishes them, and the same story plays out again until the server finally gives up.

Here is the failure on the server side:

FAILURE AND STRANGE DUMP SUMMARY:
fiber-ldap /fs1 lev 0 FAILED [mesg read: Connection timed out]
fiber-sql /fs1 lev 0 FAILED [mesg read: Connection timed out]
fiber-ldap / lev 0 FAILED [mesg read: Connection timed out]
fiber-sql / lev 0 FAILED [mesg read: Connection timed out]
fiber-sql /fs2 lev 0 FAILED 20080408 [too many dumper retry]
fiber-sql /fs3 lev 0 FAILED 20080408 [too many dumper retry]
fiber-sql /fs4 lev 0 FAILED 20080408 [too many dumper retry]
fiber-sql /fs5 lev 0 FAILED 20080408 [too many dumper retry]
fiber-ldap /fs2 lev 0 FAILED [mesg read: Connection timed out]
fiber-ldap /fs3 lev 0 FAILED [mesg read: Connection timed out]
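
To make the pattern in the summary easier to see, here is a minimal sketch that tallies the failures by host and reason. It assumes the summary lines keep exactly the layout shown above (host, disk, "lev N", "FAILED", an optional date, then the reason in brackets); the name `tally_failures` is made up for illustration and is not part of Amanda.

```python
import re
from collections import Counter

# Matches summary lines such as:
#   fiber-sql /fs2 lev 0 FAILED 20080408 [too many dumper retry]
#   fiber-ldap / lev 0 FAILED [mesg read: Connection timed out]
# The 8-digit date after FAILED is optional.
FAIL_RE = re.compile(
    r"^(\S+)\s+(\S+)\s+lev\s+(\d+)\s+FAILED\s+(?:\d{8}\s+)?\[(.+)\]\s*$"
)

def tally_failures(summary_text):
    """Count FAILED lines per (host, reason) pair; hypothetical helper."""
    counts = Counter()
    for line in summary_text.splitlines():
        m = FAIL_RE.match(line.strip())
        if m:
            host, _disk, _level, reason = m.groups()
            counts[(host, reason)] += 1
    return counts
```

Run over the ten lines above, this groups them into timeouts on both fiber hosts plus the "too many dumper retry" entries on fiber-sql, which matches the retry behaviour described earlier.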

On the client, the sendbackup.debug is:

sendbackup: debug 1 pid 3200 ruid 102 euid 102: start at Tue Apr 8 10:19:55 2008
/opt/sfw/libexec/sendbackup: version 2.4.5
parsed request as: program `GNUTAR'
disk `/usr'
device `/usr'
level 0
since 1970:1:1:0:0:0
options `|;bsd-auth;srvcomp-fast;index;exclude-list=amandaexclude;'
sendbackup: try_socksize: send buffer size is 65536
sendbackup: time 0.000: stream_server: waiting for connection: 0.0.0.0.32956
sendbackup: time 0.001: stream_server: waiting for connection: 0.0.0.0.32957
sendbackup: time 0.001: stream_server: waiting for connection: 0.0.0.0.32958
sendbackup: time 0.001: waiting for connect on 32956, then 32957, then 32958
sendbackup: time 0.003: stream_accept: connection from 192.168.178.21.35139
sendbackup: time 0.004: stream_accept: connection from 192.168.178.21.35140
sendbackup: time 0.005: stream_accept: connection from 192.168.178.21.35141
sendbackup: time 0.005: got all connections
sendbackup-gnutar: time 0.006: doing level 0 dump as listed-incremental to /opt/sfw/var/amanda/gnutar-lists/fiber-sql_usr_0.new
sendbackup-gnutar: time 0.010: doing level 0 dump from date: 1970-01-01 0:00:00 GMT
sendbackup: time 0.014: spawning /opt/sfw/libexec/runtar in pipeline
sendbackup: argument list: gtar --create --file - --directory /usr --one-file-system --listed-incremental /opt/sfw/var/amanda/gnutar-lists/fiber-sql_usr_0.new --sparse --ignore-failed-read --totals --exclude-from /tmp/amanda/sendbackup._usr.20080408101955.exclude .
sendbackup-gnutar: time 0.017: /opt/sfw/libexec/runtar: pid 3203
sendbackup: time 0.022: started index creator: "/usr/sfw/bin/gtar -tf - 2>/dev/null | sed -e 's/^\.//'"
sendbackup: time 1283.197: index tee cannot write [Broken pipe]
sendbackup: time 1283.197: pid 3202 finish time Tue Apr 8 10:41:18 2008
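
The stall is easiest to spot from the "time N.NNN:" stamps. Below is a minimal sketch that finds the largest gap between consecutive stamped lines in a sendbackup debug file. It assumes every stamped line starts with "sendbackup" (optionally with a suffix like "-gnutar") followed by "time <seconds>:", as in the log above; `largest_gap` is a hypothetical helper, not an Amanda tool.

```python
import re

# Matches stamped debug lines such as:
#   sendbackup: time 0.022: started index creator: ...
#   sendbackup-gnutar: time 0.017: /opt/sfw/libexec/runtar: pid 3203
TIME_RE = re.compile(r"^sendbackup(?:-\w+)?: time (\d+\.\d+): (.*)$")

def largest_gap(log_text):
    """Return (gap_seconds, message_before, message_after) for the longest
    pause between consecutive stamped lines, or None if fewer than two
    stamped lines are found. Unstamped lines are ignored."""
    prev = None   # (timestamp, message) of the previous stamped line
    best = None   # best (gap, message_before, message_after) so far
    for line in log_text.splitlines():
        m = TIME_RE.match(line.strip())
        if not m:
            continue
        t, msg = float(m.group(1)), m.group(2)
        if prev is not None:
            gap = t - prev[0]
            if best is None or gap > best[0]:
                best = (gap, prev[1], msg)
        prev = (t, msg)
    return best
```

On the log above, the largest gap falls between "started index creator" at 0.022 and "index tee cannot write [Broken pipe]" at 1283.197, so the whole run of roughly 21 minutes is spent in the stalled transfer.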

While the backup is going on, the Amanda server does not respond to pings on the fiber IP address from either of the two clients, but the clients can ping each other fine. When the backup is over, the server responds to pings again. Everything else I have tried to do over the fiber IP works without a problem.

Any ideas?

Thanks. Mike.