PDA

View Full Version : FreeBSD 6.4-RELEASE amtape DailySet2 eject



David.Boyd@insightbb.com
February 5th, 2009, 10:36 AM
Operating system level FreeBSD 6.4-RELEASE-p3

Amanda server upgraded from 2.6.0p2 to 2.6.1

Library hardware is Exabyte EZ17

Applied recommended glib level patch to common-src/event.c (__FreeBSD__).

Everything seems to work until the last tape is left in the drive...amtape DailySet2 eject results
in "amtape: slot <none> is ejected." and the tape is not ejected.

This exact same configuration works without a hitch on 2.6.0p2 and before.

mtx level is 1.3.11.

perl level is 5.8.9.

tape changer is chg-zd-mtx.

I can continue to use this configuration in production by issuing "mt offline && mtx unload" after the amdump is complete.

Debugging is possible during the daytime hours...all dumps run at night.

Thanks, in advance, for any assistance that you can provide in this matter.

David.

dustin
February 5th, 2009, 01:00 PM
Yikes -- the piece of Amanda::Changer::compat that should handle ejecting volumes is missing. I'll have a patch (just to the perl, so something you can apply to your already-built Amanda) shortly.

David.Boyd@insightbb.com
February 5th, 2009, 01:18 PM
Great! I look forward to deploying 2.6.1 on all of our tape servers!

dustin
February 5th, 2009, 05:04 PM
OK, patch is here:
http://github.com/djmitche/amanda/commit/compat-eject-fix.patch

dustin
February 6th, 2009, 08:03 AM
Don't apply the installchecks (they have a line of debugging code that won't work except on my machine), but the perl/* changes should be fairly straightforward to apply either to your source tree or to the in-place Amanda perl modules.

David.Boyd@insightbb.com
February 6th, 2009, 01:43 PM
I applied your patch (tried both applying it to the installed system and to the source during the install).

The result seems exactly the same ... "amtape: slot <none> is ejected.

I have tested as many amtape commands as I use in production ... slot #, slot next, slot advance, clean, show, label xxxxxx, etc. and only "eject" fails to perform.

I had an opportunity to install amanda-2.6.1-20090206 ( which includes you patch).

I still needed to apply the recommended glib fix (common-src/event.c).

The result is the same...only eject fails.

I fell back to amanda-2.6.0p2 and the problem(s) went away.

I have access to the server over the weekend. If there is anything that I can do to provide additional information, please let me know.

David.Boyd@insightbb.com
February 6th, 2009, 01:56 PM
Also, if it helps, /usr/local/libexec/amanda/chg-zd-mtx -eject works ... I haven't changed any configuration files/options since upgrading, but chg-zd-mtx seems to indicate (to me) that the configs are OK.:(

dustin
February 6th, 2009, 02:09 PM
I'll commit the event.c patch in a moment.

Can you attach the chg-glue debug log?

David.Boyd@insightbb.com
February 6th, 2009, 02:46 PM
Please don't assume I know what I'm doing!

amgetconf DailySet2 dbopen.chg-glue
export DEBUG=9
amtape DailySet2 eject
amgetconf DailySet2 dbclose.chg-glue:/tmp/amanda/server/chg_glue.<timestamp>.debug

1233959309.538653: chg_glue: pid 74919 ruid 90 euid 90 version 2.6.1: start at Fri Feb 6 17:28:29 2009
1233959354.143300: unknown: pid 74922 ruid 90 euid 90 version 2.6.1: at Fri Feb 6 17:29:14 2009
1233959354.143386: unknown: pid 74922 finish time Fri Feb 6 17:29:14 2009


I'm also including /tmp/amanda/server/amtape.<timestamp>.debug

1233959315.094604: amtape: pid 74920 ruid 90 euid 90 version 2.6.1: start at Fri Feb 6 17:28:35 2009
1233959315.097366: amtape: pid 74920 ruid 90 euid 90 version 2.6.1: rename at Fri Feb 6 17:28:35 2009
1233959315.098488: amtape: changer: >> -eject
1233959315.333440: amtape: changer: << EXITSTATUS 0
1233959315.333545: amtape: changer: << <none> OK: no volume loaded
1233959315.333656: amtape: pid 74920 finish time Fri Feb 6 17:28:35 2009

I'm not good, but I'm persistent.

David.Boyd@insightbb.com
February 7th, 2009, 11:30 AM
I apparently misspoke when I said that all the amptape commands except "eject" were working.

The "clean" command also has problems:

amtape DailySet2 clean

amtape: device <error> not clean: unknown command '-clean

Yes, there was no trailing apostrophe.

This is with amanda-2.6.1-20090207 ... didn't have to apply any extra patches ... thanks.

/usr/local/libexec/amanda/chg-zd-mtx -clean returned "7 tape:/dev/nrsa0" correctly.

We're doing okay, though, with the new amanda-2.6.1.

30 of 52 clients are running 2.6.1 (I'll have 50 upgraded by 2/15/2009).

2 clients are runing 2.5.2.p1 due to build restrictions of older OS levels.

1 of 5 servers is running 2.6.1 (4 servers are running 2.6.0p2).

dustin
February 7th, 2009, 06:29 PM
Wow, you're "old school", if you know about dbopen and dbclose!

There's some funny business going on here -- "clean" also seems to be missing from chg-glue, and chg-glue doesn't seem to be producing a debug log. I'll get back to you.

dustin
February 7th, 2009, 07:33 PM
As background, what's happened in 2.6.1 is that we added a new (Perl-based) changer API (Amanda::Changer), and a backward-compatible wrapper (chg-glue). I'm currently working on rewriting the C applications that access a changer in Perl, using the new changer API. Apparently there were some components omitted from Amanda::Changer and chg-glue.

I've added two fixes here:
http://github.com/djmitche/amanda/commits/changerfixes

The specific patches are:
http://github.com/djmitche/amanda/commit/27cd0ee270f46e53364145fce29e758ba66e574a.patch
http://github.com/djmitche/amanda/commit/28e2521ae88ea069babdf39ae54677e64b7da103.patch

David.Boyd@insightbb.com
February 8th, 2009, 10:39 AM
I applied both patches.

amtape DailySet2 clean worked perfectly.

here is the contents of my chg-zd-mtx.conf. I don't know that it helps.

#
#
driveslot=0
firstslot=1
lastslot=7
cleanslot=7
autoclean=0
cleancycle=120
offline_before_unload=1
havereader=0
#
slotinfofile=SlotInfo
#
#
#
#

I'm ready to try anything you think might help with this problem.

I have updated another server to amanda-2.6.1 ... this "eject" problem is annoying but not a show stopper.

I have built a work-around into our "AmandaDump" scripts.

dustin
February 8th, 2009, 05:23 PM
Great! You should now see a debug file matching
$AMANDA_DBGDIR/server/DailySet2/chg-glue*
that will hopefully have some information about the "eject" operation. You'll also see an
$AMANDA_DBGDIR/server/DailySet2/amtape*
logfile from about the same time. Please attach or excerpt both? Thanks for your patience, by the way!

David.Boyd@insightbb.com
February 9th, 2009, 08:38 AM
I may have included more information than you need but, hopefully, not less.

################################################## #############################
INPUT: amcheck DailySet2

OUTPUT:
Amanda Tape Server Host Check
-----------------------------
Holding disk /u/2/dumps/amanda: 67880 MB disk space available, using 67880 MB
Holding disk /u/1/dumps/amanda: 67880 MB disk space available, using 67880 MB
slot 3:read label `890623', date `X'.
NOTE: skipping tape-writable test
Tape 890623 label ok
WARNING: tapecycle (6) <= runspercycle (6)
Server check took 25.603 seconds

Amanda Backup Client Hosts Check
--------------------------------
Client check: 4 hosts checked in 2.245 seconds. 0 problems found.

(brought to you by Amanda 2.6.1-20090207)

################################################## #############################
INPUT: mtx status

OUTPUT:
Storage Changer /dev/changer:1 Drives, 7 Slots ( 0 Import/Export )
Data Transfer Element 0:Full (Storage Element 3 Loaded)
Storage Element 1:Full
Storage Element 2:Full
Storage Element 3:Empty
Storage Element 4:Full
Storage Element 5:Full
Storage Element 6:Full
Storage Element 7:Full

################################################## #############################
INPUT: amtape DailySet2 eject

OUTPUT:
amtape: slot <none> is ejected.

################################################## #############################
/tmp/amanda/server/DailySet2/amtape.20090209112836.debug

1234196916.002922: amtape: pid 60888 ruid 90 euid 90 version 2.6.1: start at Mon Feb 9 11:28:36 2009
1234196916.005727: amtape: pid 60888 ruid 90 euid 90 version 2.6.1: rename at Mon Feb 9 11:28:36 2009
1234196916.006797: amtape: changer: >> -eject
1234196916.231954: amtape: changer: << EXITSTATUS 0
1234196916.232043: amtape: changer: << <none> OK: no volume loaded
1234196916.232156: amtape: pid 60888 finish time Mon Feb 9 11:28:36 2009

################################################## #############################
/tmp/amanda/server/DailySet2/chg-glue.20090209112836.debug

1234196916.131227: chg-glue: pid 60889 ruid 90 euid 90 version 2.6.1: start at Mon Feb 9 11:28:36 2009
1234196916.196727: chg-glue: pid 60889 ruid 90 euid 90 version 2.6.1: rename at Mon Feb 9 11:28:36 2009
1234196916.231732: chg-glue: got command '-eject'

################################################## #############################

dustin
February 9th, 2009, 09:20 AM
And I assume the tape wasn't ejected?

The code that should handle that is in Amanda::Changer::compat:


501 sub do_release {
502 my $self = shift;
503 my %params = @_;
504
505 if (exists $params{'eject'} && $params{'eject'}) {
506 $self->{'chg'}->_simple_op("eject", %params);
507 } else {
508 $self->{'chg'}->{'reserved'} = 0;
509
510 if (exists $params{'finished_cb'}) {
511 Amanda::MainLoop::call_later($params{'finished_cb' }, undef);
512 }
513 }
514 }


which, now that I look more closely, has a bug (it doesn't set 'reserved' back to zero), but not a bug that would cause the eject to fail.

Does chg-zd-mtx's debug log show its -eject actually being invoked? I'll have some more debugging patches on the branch in a moment.

dustin
February 9th, 2009, 09:30 AM
OK, two new patches in http://github.com/djmitche/amanda/commits/changerfixes. You can just download the result at http://github.com/djmitche/amanda/raw/changerfixes/perl/Amanda/Changer/compat.pm

David.Boyd@insightbb.com
February 9th, 2009, 01:04 PM
With the new compat.pm copied in, the resulting logs were identical (except for timestamp, of course) to the previous logs.

Does that indicate a code path issue?

You we're expecting to hit one or more Amanda::Debug::debug items, right?

1234213046.935879: amtape: pid 66586 ruid 90 euid 90 version 2.6.1: start at Mon Feb 9 15:57:26 2009
1234213046.938592: amtape: pid 66586 ruid 90 euid 90 version 2.6.1: rename at Mon Feb 9 15:57:26 2009
1234213046.939720: amtape: changer: >> -eject
1234213047.165936: amtape: changer: << EXITSTATUS 0
1234213047.166024: amtape: changer: << <none> OK: no volume loaded
1234213047.166128: amtape: pid 66586 finish time Mon Feb 9 15:57:27 2009

1234213047.064356: chg-glue: pid 66587 ruid 90 euid 90 version 2.6.1: start at Mon Feb 9 15:57:27 2009
1234213047.130333: chg-glue: pid 66587 ruid 90 euid 90 version 2.6.1: rename at Mon Feb 9 15:57:27 2009
1234213047.165719: chg-glue: got command '-eject'

1234213056.390471: chg_zd_mtx: pid 66598 ruid 90 euid 90 version 2.6.1: start at Mon Feb 9 15:57:36 2009
15:57:37 Using config file /usr/local/etc/amanda/DailySet2/chg-zd-mtx.conf
15:57:37 Arg info:
$# = 1
$0 = "/usr/local/libexec/amanda/chg-zd-mtx"
$1 = "-eject"
15:57:37 Running: /usr/local/sbin/mtx status
15:57:37 Exit code: 0
Stdout:
Storage Changer /dev/changer:1 Drives, 7 Slots ( 0 Import/Export )
Data Transfer Element 0:Full (Storage Element 4 Loaded)
Storage Element 1:Full
Storage Element 2:Full
Storage Element 3:Full
Storage Element 4:Empty
Storage Element 5:Full
Storage Element 6:Full
Storage Element 7:Full
15:57:37 Config info:
firstslot = "1"
lastslot = "7"
cleanslot = "7"
cleancycle = "60"
offline_before_unload = "1"
unloadpause = "0"
autoclean = "0"
autocleancount = "99"
havereader = "0"
driveslot = "0"
poll_drive_ready = "3"
initial_poll_delay = "0"
max_drive_wait = "120"
15:57:37 EJECT -> ejecting tape from /dev/nsa0
15:57:37 STATUS -> currently loaded slot = 4
-> currently loaded barcode = ""
15:57:37 EJECT -> moving tape from drive 0 to storage slot 4
15:57:37 Running: try_eject_device /dev/nsa0
15:57:51 Exit code: 0
15:57:51 Running: /usr/local/sbin/mtx unload 4 0
15:57:58 Exit code: 0
Stdout:
Unloading drive 0 into Storage Element 4...done
15:57:58 -> status 0, result "Unloading drive 0 into Storage Element 4...done"
15:57:58 Exit (0) -> 4 tape:/dev/nsa0
1234213078.823608: unknown: pid 66812 ruid 90 euid 90 version 2.6.1: at Mon Feb 9 15:57:58 2009
1234213078.823694: unknown: pid 66812 finish time Mon Feb 9 15:57:58 2009

Just let me know what to try next.

I have this system until 2:00 A.M. EST.

dustin
February 9th, 2009, 02:25 PM
Ah, I see it now. The problem is that the new changer API only allows you to eject a volume you've reserved, but 'amtape' is not reserving a volume. I'll add a new eject method to changer objects to take care of this.

dustin
February 9th, 2009, 02:29 PM
OK, the changes are pushed to the github branch. Want to give them a try?

David.Boyd@insightbb.com
February 9th, 2009, 03:50 PM
Okay, I must be doing something wrong ... the results are the same.

I copied a new version of compat.pm to Amanda/Changer.

Was there something else I needed to do?

dustin
February 10th, 2009, 01:09 PM
Well, there's one way to tell -- introduce a syntax error into the compat.pm that you're overwriting. If it doesn't cause Amanda's behavior to change, then you're overwriting the wrong file.

David.Boyd@insightbb.com
February 10th, 2009, 02:13 PM
I've done that.

I inserted an Amanda::Debug::debug("processing EJECT") line in compat.pm to verify correct execution.

I've created a test server with the same library hardware (an Exabyte EZ17).

now I don't have to worry about leaving anything in a "not quite ready for prime time" state.

Can you briefly explain the interfaces between Changer.pm, compat.pm, chg-glue and chg-zd-mtx?

Also, "amtape <config> clean" seems to insist on loading the "current" tape, even when no tape was loaded previously. Is this desired/expected/required behavior?

dustin
February 10th, 2009, 02:23 PM
The changer operational pathway is as follows:

An Amanda application (amtape, taper, amfetchdump, etc.) invokes C-level changer functions. The C source (server-src/changer.{c,h}) spawns chg-glue and communicates Changer-API-1.0 commands to it using a simple line-based protocol. chg-glue interfaces those commands to the 2.0 perl API (Amanda::Changer). In this case, the changer in use is an old-style shell script (chg-zd-mtx), so chg-glue uses the Amanda::Changer::compat class, which spawns a new chg-zd-mtx process for each operation. Yeah, it's pretty convoluted, but contains *two* compatibility bridges which will soon be eliminated.

Did you see any results from your "processing EJECT" line?

As for the clean operation subsequently loading the "current" slot -- yeah, I'm not sure why I did that:


149 sub do_clean {
150 my $do_clean = sub {
151 $chg->clean(
152 finished_cb => sub {
153 my ($error) = @_;
154 if ($error) {
155 print "EXITSTATUS 1\n";
156 print "<error> $error\n";
157 Amanda::MainLoop::call_later(\&getcmd);
158 } else {
159*** do_slot("current");
160 }
161 },
162 drive => '',
163 );
164 };
165 release_and_then([], $do_clean);
166 }

David.Boyd@insightbb.com
February 10th, 2009, 03:11 PM
The "processing EJECT" line verifies that I did, in fact, install compat.pm properly.

Unfortunately, as I state last night, the resulting debug logs look identical to the earlier ones (with updated timestamp, of course).

So, I am still confused.

What's next?

dustin
February 10th, 2009, 03:21 PM
Oh! I bet you didn't see the changes to chg-glue. You can download it here:
http://github.com/djmitche/amanda/raw/changerfixes/changer-src/chg-glue.pl
and replace your existing chg-glue. The only trick is to look at the 'use lib' line near the beginning of the original, and copy/paste it into the new version.

I don't know how familiar you are with 'git', but if you're interested, I can help you fork the repository (http://wiki.zmanda.com/index.php/Fork_Amanda_on_Github) and set up a test-build (http://wiki.zmanda.com/index.php/Testing) environment for you. You need a bit more equipment (http://wiki.zmanda.com/index.php/Minimum_Versions) to build Amanda from the version control rather than a tarball, but it's not too much.

David.Boyd@insightbb.com
February 10th, 2009, 04:11 PM
You're right.

I didn't see chg-glue.pl.

I copied chg-glue.pl to chg-glue.

command: amtape DailySet2 eject

result: "amtape: slot <none> is ejected."

the actual slot was 4.

The tape was ejected.

debug logs:

1234310356.708838: amtape: pid 22029 ruid 90 euid 90 version 2.6.1: start at Tue Feb 10 18:59:16 2009
1234310356.711559: amtape: pid 22029 ruid 90 euid 90 version 2.6.1: rename at Tue Feb 10 18:59:16 2009
1234310356.712635: amtape: changer: >> -eject
1234310382.440514: amtape: changer: << EXITSTATUS 0
1234310382.440603: amtape: changer: << <none> volume ejected
1234310382.440710: amtape: pid 22029 finish time Tue Feb 10 18:59:42 2009

1234310356.837977: chg-glue: pid 22030 ruid 90 euid 90 version 2.6.1: start at Tue Feb 10 18:59:16 2009
1234310356.903244: chg-glue: pid 22030 ruid 90 euid 90 version 2.6.1: rename at Tue Feb 10 18:59:16 2009
1234310356.938596: chg-glue: got command '-eject'
1234310356.938683: chg-glue: processing EJECT
1234310356.938716: chg-glue: running simple op 'eject'
1234310382.440310: chg-glue: simple op 'eject' ok

1234310357.097171: chg_zd_mtx: pid 22041 ruid 90 euid 90 version 2.6.1: start at Tue Feb 10 18:59:17 2009
18:59:17 Using config file /usr/local/etc/amanda/DailySet2/chg-zd-mtx.conf
18:59:17 Arg info:
$# = 1
$0 = "/usr/local/libexec/amanda/chg-zd-mtx"
$1 = "-eject"
18:59:17 Running: /usr/local/sbin/mtx status
18:59:18 Exit code: 0
Stdout:
Storage Changer /dev/changer:1 Drives, 7 Slots ( 0 Import/Export )
Data Transfer Element 0:Full (Storage Element 4 Loaded)
Storage Element 1:Full
Storage Element 2:Full
Storage Element 3:Full
Storage Element 4:Empty
Storage Element 5:Full
Storage Element 6:Full
Storage Element 7:Full
18:59:18 Config info:
firstslot = "1"
lastslot = "7"
cleanslot = "7"
cleancycle = "60"
offline_before_unload = "1"
unloadpause = "0"
autoclean = "0"
autocleancount = "99"
havereader = "0"
driveslot = "0"
poll_drive_ready = "3"
initial_poll_delay = "0"
max_drive_wait = "120"
18:59:18 EJECT -> ejecting tape from /dev/nsa0
18:59:18 STATUS -> currently loaded slot = 4
-> currently loaded barcode = ""
18:59:18 EJECT -> moving tape from drive 0 to storage slot 4
18:59:18 Running: try_eject_device /dev/nsa0
18:59:32 Exit code: 0
18:59:32 Running: /usr/local/sbin/mtx unload 4 0
18:59:40 Exit code: 0
Stdout:
Unloading drive 0 into Storage Element 4...done
18:59:40 -> status 0, result "Unloading drive 0 into Storage Element 4...done"
18:59:40 Exit (0) -> 4 tape:/dev/nsa0
1234310380.434458: unknown: pid 22255 ruid 90 euid 90 version 2.6.1: at Tue Feb 10 18:59:40 2009
1234310380.434544: unknown: pid 22255 finish time Tue Feb 10 18:59:40 2009

dustin
February 10th, 2009, 04:18 PM
Great!

As far as which slot was actually ejected, that's kind of a messy notion. The *drive* was ejected, but what the changer did with that volume is unspecified -- it may still be halfway-inserted in the drive; it may be back in slot 4; it may be in the robot arm .. so the new changer API prefers not to make any representations.

David.Boyd@insightbb.com
February 24th, 2009, 09:37 AM
Will these patches (chg-glue and compat.pm) be included in a near future snapshot?

dustin
February 24th, 2009, 10:41 AM
They have already been committed.

David.Boyd@insightbb.com
February 24th, 2009, 12:58 PM
Maybe I'm missing something here.

The amanda-2.6.1-20090223 that I just configured and installed is missing subroutines "do_clean" and "do_eject" in /usr/local/libexec/amanda/chg-glue and is missing subroutine "eject" in /usr/local/lib/perl5/site_perl/5.8.9/Amanda/Changer/compat.pm.

The source file amanda-20090223/changer-src/chg-glue.pl is dated December 16.

The source file amanda-20090223/perl/Amanda/Changer/compat.pm is dated February 19.

Please forgive me if I'm just too dumb to figure this out.

I restored the aforementioned subroutines to the two programs and now things appear to work.

dustin
February 24th, 2009, 01:44 PM
sorry, I forgot to merge it to the 2.6.1 branch. Merged now - r1706

E-RPM SOFT COM
August 9th, 2009, 03:32 PM
Great! I look forward to deploying 2.6.1 on all of our tape servers!