amcheck, amdump failing when a single client is unavailable



bethany
October 31st, 2006, 08:24 AM
Hi all -
Since I changed to bsdtcp auth, amcheck and amdump fail/never finish when one of my backup clients is unavailable. This means that if one Amanda client is down, my entire backup fails. :(

For example, if I run amcheck against all clients, or just the client that is unavailable:

-sh-3.00$ amcheck mybackupset
I get the expected results, then it just hangs after "Server check took 0.069". Here is a snippet of the amcheck log in /tmp/amanda/server/(backupset)/amcheck.*.debug:


amcheck-clients: time 4000.034: connect_port: Try port 1922: Available - amcheck-clients: time 4003.035: connect_portrange: connect from 0.0.0.0.1922 failed
amcheck-clients: time 4003.035: connect_portrange: connect to XXX.XXX.XXX.XX.10080 failed: No route to host
amcheck-clients: time 4003.035: connect_port: Try port 1923: Available - amcheck-clients: time 4006.036: connect_portrange: connect from 0.0.0.0.1923 failed
amcheck-clients: time 4006.036: connect_portrange: connect to XXX.XXX.XXX.XX.10080 failed: No route to host
amcheck-clients: time 4006.036: connect_port: Try port 1924: Available - amcheck-clients: time 4009.036: connect_portrange: connect from 0.0.0.0.1924 failed
amcheck-clients: time 4009.036: connect_portrange: connect to XXX.XXX.XXX.XX.10080 failed: No route to host
amcheck-clients: time 4009.036: connect_port: Try port 1925: Available - amcheck-clients: time 4012.036: connect_portrange: connect from 0.0.0.0.1925 failed
amcheck-clients: time 4012.036: connect_portrange: connect to XXX.XXX.XXX.XX.10080 failed: No route to host
amcheck-clients: time 4012.037: connect_port: Try port 1926: Available -

It looks as if it's enumerating ports on the unavailable client or server. It will run for hours before I give up and kill the process. I thought bsdtcp is only supposed to use 10080/tcp or 512,1023/tcp? Do I need to specify the reserved-tcp-port in amanda.conf?

Server: amanda-backup_server-2.5.1b2-1.rhel4 RPM
thanks!
B.

ktill
October 31st, 2006, 03:49 PM
Hi,

I cannot reproduce the problem here. I purposely brought down one of the systems in the disklist. Amcheck reports:
WARNING: ultra2.zmanda.com: selfcheck request failed: No route to host

amdump reports "result missing" from ultra2 while the other system on the disklist got backed up correctly.

What are the ctimeout, dtimeout and etimeout settings?

--Kevin Till

bethany
November 1st, 2006, 06:32 AM
Hi Kevin,
My timeout values in amanda.conf are as follows:

etimeout 300 # number of seconds per filesystem for estimates.
dtimeout 3600 # number of idle seconds before a dump is aborted.
ctimeout 30

bethany
November 1st, 2006, 07:39 AM
P.S. - I tried specifying the "reserved-tcp-port" variable as mentioned here: http://wiki.zmanda.com/index.php/Amanda.conf by putting the following in amanda.conf:

reserved-tcp-port "512,1023"

But Amanda doesn't seem to like it:

-sh-3.00$ amcheck ISDaily2.5
"/etc/amanda/ISDaily2.5/amanda.conf", line 65: configuration keyword expected
"/etc/amanda/ISDaily2.5/amanda.conf", line 65: end of line is expected
amcheck: errors processing config file "/etc/amanda/ISDaily2.5/amanda.conf"

bethany
November 1st, 2006, 08:04 AM
Here I run amcheck against a host that is in fact up and running:


-sh-3.00$ amcheck ISDaily2.5 cfdev.xxx.xxxx.xxx
Amanda Tape Server Host Check
-----------------------------
Holding disk /amandahold: 385996 MB disk space available, using 395258080 MB
slot 28: read label `ISDaily27', date `20060906030001'
NOTE: skipping tape-writable test
Tape ISDaily27 label ok
(snip)

Amanda Backup Client Hosts Check
--------------------------------
ERROR: NAK cfdev.xxx.xxxx.xxx: host middenheap-dev.xxx.xxxx.xxx: port 1025 not secure
Client check: 18 hosts checked in 188.978 seconds, 1 problem found

(brought to you by Amanda 2.5.1b2)

On the client, a ps -aux turns up two amandad processes still hanging around:

504 23785 0.0 0.1 2204 924 ? Ss Oct29 0:00 amandad -auth=bsdtcp amdump
504 27214 0.0 0.1 2204 924 ? Ss Oct30 0:00 amandad -auth=bsdtcp amdump

I'm guessing those are from the last two nights' backups, which I had to kill on the server because they were timing out due to the single host being unavailable. I did a "sudo killall amandad" on the client, then ran amcheck on the server again, and it ran fine.
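
A minimal sketch for spotting leftover amandad processes like the two above, assuming the usual "ps aux" column layout (user, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND). The function name find_amandad_processes and the sample text are illustrative only and are not part of Amanda:

```python
def find_amandad_processes(ps_output):
    """Return (pid, start, command) tuples for amandad lines in `ps aux` style output.

    Assumes 11 whitespace-separated columns, with the command last.
    """
    found = []
    for line in ps_output.splitlines():
        fields = line.split(None, 10)  # keep the full command as one field
        if len(fields) < 11:
            continue  # blank or truncated line
        pid, start, command = fields[1], fields[8], fields[10]
        if not pid.isdigit():
            continue  # header row
        # Match both "amandad" and a full path such as /usr/lib/amanda/amandad
        if command.split()[0].endswith("amandad"):
            found.append((int(pid), start, command))
    return found


# Sample taken from the ps listing in this post, plus one unrelated line.
SAMPLE_PS = """\
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 1000 500 ? Ss Oct01 0:01 init [3]
504 23785 0.0 0.1 2204 924 ? Ss Oct29 0:00 amandad -auth=bsdtcp amdump
504 27214 0.0 0.1 2204 924 ? Ss Oct30 0:00 amandad -auth=bsdtcp amdump
"""

if __name__ == "__main__":
    for pid, start, command in find_amandad_processes(SAMPLE_PS):
        print(pid, start, command)
```

The START column is the useful part here: an amandad process whose start date is earlier than the current run is a candidate for a stale connection like the ones that caused the "port 1025 not secure" NAK.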

bethany
November 1st, 2006, 08:09 AM
And, one more time, an amcheck run on just the host that is down:

cat /tmp/amanda/server/ISDaily2.5/amcheck.20061101104702.debug

amcheck: debug 1 pid 14416 ruid 502 euid 0: start at Wed Nov 1 10:47:02 2006
amcheck: debug 1 pid 14416 ruid 502 euid 502: rename at Wed Nov 1 10:47:02 2006
security_getdriver(name=bsdtcp) returns 0xf57140
security_handleinit(handle=0x8ff4670, driver=0xf57140 (BSDTCP))
security_streaminit(stream=0x8ff4ef0, driver=0xf57140 (BSDTCP))
amcheck-clients: time 0.005: connect_port: Skip port 512: Owned by exec.
amcheck-clients: time 0.005: connect_port: Skip port 513: Owned by login.
amcheck-clients: time 0.005: connect_port: Skip port 514: Owned by shell.
amcheck-clients: time 0.005: connect_port: Skip port 515: Owned by printer.
amcheck-clients: time 0.006: connect_port: Try port 516: Available - changer_query: changer return was 60 1
changer_query: searchable = 0
changer_find: looking for ISDaily27 changer is searchable = 0
amcheck-clients: time 3.006: connect_portrange: connect from 0.0.0.0.516 failed
amcheck-clients: time 3.006: connect_portrange: connect to XXX.XXX.XXX.99.10080 failed: No route to host
amcheck-clients: time 3.007: connect_port: Try port 517: Available - amcheck-clients: time 6.008: connect_portrange: connect from 0.0.0.0.517 failed
amcheck-clients: time 6.008: connect_portrange: connect to XXX.XXX.XXX.99.10080 failed: No route to host
amcheck-clients: time 6.008: connect_port: Try port 518: Available - amcheck-clients: time 9.008: connect_portrange: connect from 0.0.0.0.518 failed
amcheck-clients: time 9.008: connect_portrange: connect to XXX.XXX.XXX.99.10080 failed: No route to host
amcheck-clients: time 9.008: connect_port: Skip port 519: Owned by utime.
amcheck-clients: time 9.008: connect_port: Skip port 520: Owned by efs.
amcheck-clients: time 9.008: connect_port: Skip port 521: Owned by ripng.
amcheck-clients: time 9.009: connect_port: Try port 522: Available - amcheck-clients: time 12.009: connect_portrange: connect from 0.0.0.0.522 failed
amcheck-clients: time 12.009: connect_portrange: connect to XXX.XXX.XXX.99.10080 failed: No route to host
amcheck-clients: time 12.009: connect_port: Try port 523: Available - amcheck-clients: time 15.009: connect_portrange: connect from 0.0.0.0.523 failed
amcheck-clients: time 15.009: connect_portrange: connect to XXX.XXX.XXX.99.10080 failed: No route to host
amcheck-clients: time 15.010: connect_port: Try port 524: Available - amcheck-clients: time 18.010: connect_portrange: connect from 0.0.0.0.524 failed
amcheck-clients: time 18.010: connect_portrange: connect to XXX.XXX.XXX.99.10080 failed: No route to host
amcheck-clients: time 18.010: connect_port: Skip port 525: Owned by timed.
amcheck-clients: time 18.010: connect_port: Skip port 526: Owned by tempo.
amcheck-clients: time 18.010: connect_port: Try port 527: Available - amcheck-clients: time 21.011: connect_portrange: connect from 0.0.0.0.527 failed
amcheck-clients: time 21.011: connect_portrange: connect to XXX.XXX.XXX.99.10080 failed: No route to host
amcheck-clients: time 21.011: connect_port: Try port 528: Available - amcheck-clients: time 24.012: connect_portrange: connect from 0.0.0.0.528 failed
amcheck-clients: time 24.012: connect_portrange: connect to XXX.XXX.XXX.99.10080 failed: No route to host
amcheck-clients: time 24.012: connect_port: Try port 529: Available - amcheck-clients: time 27.013: connect_portrange: connect from 0.0.0.0.529 failed
amcheck-clients: time 27.013: connect_portrange: connect to XXX.XXX.XXX.99.10080 failed: No route to host
amcheck-clients: time 27.014: connect_port: Skip port 530: Owned by courier.
amcheck-clients: time 27.014: connect_port: Skip port 531: Owned by conference.
amcheck-clients: time 27.014: connect_port: Skip port 532: Owned by netnews.

Why does it keep trying the host even though it has no route to it? I tried turning off iptables on the server, thinking it might be preventing the server from getting the hint that the host is completely unavailable, but it made no difference.
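
The retry pattern in the debug log above can be summarized with a short script. This is only a sketch: the regular expressions are written against the log lines pasted in this thread, not against any documented Amanda log format, and the function name summarize_connect_attempts is made up for illustration.

```python
import re
from collections import Counter

# Patterns based on the amcheck debug lines shown in this thread (an assumption,
# not a documented format).
TRY_RE = re.compile(r"time (\d+\.\d+): connect_port: Try port (\d+): Available")
FAIL_RE = re.compile(r"time (\d+\.\d+): connect_portrange: connect to \S+ failed: (.+)$")


def summarize_connect_attempts(log_text):
    """Summarize which local ports were tried and how long each failure took."""
    tries = []     # (timestamp, local port)
    failures = []  # (timestamp, error text)
    for line in log_text.splitlines():
        m = TRY_RE.search(line)
        if m:
            tries.append((float(m.group(1)), int(m.group(2))))
        m = FAIL_RE.search(line)
        if m:
            failures.append((float(m.group(1)), m.group(2).strip()))
    # Pair each attempt with the failure that follows it, in order.
    delays = [fail_t - try_t for (try_t, _), (fail_t, _) in zip(tries, failures)]
    return {
        "ports_tried": [port for _, port in tries],
        "errors": dict(Counter(err for _, err in failures)),
        "mean_delay": sum(delays) / len(delays) if delays else None,
    }


SAMPLE_LOG = """\
amcheck-clients: time 0.006: connect_port: Try port 516: Available - changer_query: changer return was 60 1
amcheck-clients: time 3.006: connect_portrange: connect from 0.0.0.0.516 failed
amcheck-clients: time 3.006: connect_portrange: connect to XXX.XXX.XXX.99.10080 failed: No route to host
amcheck-clients: time 3.007: connect_port: Try port 517: Available - amcheck-clients: time 6.008: connect_portrange: connect from 0.0.0.0.517 failed
amcheck-clients: time 6.008: connect_portrange: connect to XXX.XXX.XXX.99.10080 failed: No route to host
"""

if __name__ == "__main__":
    print(summarize_connect_attempts(SAMPLE_LOG))
```

Run against the full log, this makes the behavior easier to see: the destination is always port 10080 on the client, and what changes on each retry is the local source port on the server, with about 3 seconds spent per attempt. At that rate, the timestamps in the first log (port 1922 at around 4000 seconds) are consistent with amcheck stepping through local ports one at a time rather than giving up after the first "No route to host".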

ktill
November 1st, 2006, 11:43 AM
Hi,

try
reserved-tcp-port 512,1023 # no quotes

bethany
November 1st, 2006, 12:10 PM
Without quotes doesn't work either. Here's the paste from my amanda.conf:


tapebufs 20 # A positive integer telling taper how many
# 32k buffers to allocate. The default is 20 (640k).

reserved-tcp-port 512,1023 # ports used by bsdtcp auth

And the results:

-sh-3.00$ amcheck ISDaily2.5 cfdev.XX.XXX.XXX
"/etc/amanda/ISDaily2.5/amanda.conf", line 65: configuration keyword expected
"/etc/amanda/ISDaily2.5/amanda.conf", line 65: end of line is expected
"/etc/amanda/ISDaily2.5/amanda.conf", line 66: configuration keyword expected
"/etc/amanda/ISDaily2.5/amanda.conf", line 66: end of line is expected
amcheck: errors processing config file "/etc/amanda/ISDaily2.5/amanda.conf"

Does it need to be in a specific location in amanda.conf?

ktill
November 1st, 2006, 12:17 PM
reserved-tcp-port support was added on 9/22/06. I suspect your amanda software version was built before that date.

bethany
November 1st, 2006, 12:25 PM
amdump: start at Wed Nov 1 03:15:01 EST 2006
amdump: datestamp 20061101
planner: pid 7183 executable /usr/lib/amanda/planner version 2.5.1b2
planner: build: VERSION="Amanda-2.5.1b2"
planner: BUILT_DATE="Tue Aug 22 12:12:18 PDT 2006"
planner: BUILT_MACH="Linux rocky.zmanda.com 2.6.9-22.0.2.ELsmp #1 SMP Thu Jan 5 17:13:01 EST 2006 i686 i686 i386 GNU/Linux"

OK, I guess I need to uninstall and re-install with a newer RPM. Is it normal to add new features without incrementing the version/build ID?
Edit: maybe I'm a little confused about versions - should I be working with the 2.5.1p RPMs available on the download page?

bethany
November 1st, 2006, 01:30 PM
Upgraded server and all clients to 2.5.1p1-1. It still doesn't like the reserved-tcp-port directive in amanda.conf:

-sh-3.00$ rpm -qa | grep amanda
amanda-backup_server-2.5.1p1-1.rhel4

-sh-3.00$ grep reserved /etc/amanda/ISDaily2.5/amanda.conf
reserved-tcp-port 512,1023 # ports used by bsdtcp auth
# non-reserved portion of the holding disk.

-sh-3.00$ amcheck ISDaily2.5
"/etc/amanda/ISDaily2.5/amanda.conf", line 65: configuration keyword expected
"/etc/amanda/ISDaily2.5/amanda.conf", line 65: end of line is expected
"/etc/amanda/ISDaily2.5/amanda.conf", line 66: configuration keyword expected
"/etc/amanda/ISDaily2.5/amanda.conf", line 66: end of line is expected
amcheck: errors processing config file "/etc/amanda/ISDaily2.5/amanda.conf"

HOWEVER, the good news is that amcheck is no longer timing out on the unavailable host! :)

ktill
November 1st, 2006, 02:55 PM
>HOWEVER, the good news is that amcheck is no longer timing out on the unavailable host!

That's good to know.
As for reserved-tcp-port, maybe the fix didn't get picked up for the build. However, the default is 512,1023, which is what you want anyway.