I recently upgraded my server from 3.3.2 to 3.4.1. Although backups seem to be proceeding without incident, the mail report has scores of messages per the following pattern:
planner: Last full dump of client:disk on tape DAILYNN overwritten in 1 run
My disklist has roughly 110 entries and I received 112 such messages after the first run of 3.4.1 and 87 on the second. Dumps are to vdisk. I'm running RHEL 6.
I tracked this down to planner.c (which recently had changes to relevant routines, per the ChangeLog). It looks like when_overwrite() is being called to determine if a dump is to be overwritten within the next 5 days. The return value of when_overwrite() is used verbatim, and since it is often 1 this is duly logged as a warning. However, when_overwrite() will also return 1 to report certain warning or error conditions. These appear to relate to lookup failures related to the tape database, but I haven't done enough digging to really understand what's going on.
It seems to me that when_overwrite() ought to do a better job of reporting errors it encounters, or else error conditions ought to be checked in some other fashion. I'm happy to provide further information, but perhaps this is enough to identify an obvious problem. I'm also inclined to suspect that, due to the reduced number of messages tonight, once every dle has a level 0 performed by 3.4.1 whatever database issue their might be will be resolved. But it's an obnoxious and unnerving bug...
Also, it's really not clear which support venues are active these days. Forgive me (and please advise me) if I should really be posting this elsewhere.
The build is currently fighting me. Why the hell does amanda need glib? And --disable-glibtest doesn't work. RHEL 6 includes only glib 1.2, and the configure test indicates it wants 2.2. Any idea how to get around this idiotic requirement? I certainly don't need any graphical components.
Oh, right. glib2. I'm still irritated at how Gnome has infiltrated everything. Why does Amanda require the Gnome libraries? Is there some graphical component I'm unaware of?
A build without the patch using identical configure flags seems to work. The planner is currently running and has passed the point at which it was breaking.
Incidentally, I couldn't determine why I wasn't getting a core dump. I can generate a core dump with an appropriate C program, so they're not disabled in any system- or user-wide way. But nothing from Amanda. Any ideas? I can provide a stack backtrace if I could get a core file.
I encountered two additional bugs and have rolled back to 3.3.9.
The first is a HEADER_DONE not set error on a single dataset of a ZFS pool. We're dumping in the neighborhood of 20 small datasets (one per user) on a server running zfsonlinux, and each run resulted in precisely one of these failing with the HEADER_DONE not set error. It was not the same dataset each time. I've never encountered this error before, and digging into the code it wasn't immediately apparent why this was happening. The error did not occur with last night's 3.3.9 run.
The other error, which was a (literal) showstopper, was a taper hang. The run two nights ago never finished. At the time of investigation there were 8 idle dumper threads and 3 taper threads, with the driver reporting that the taper was writing. Two of the tapers were in a futex wait channel and the third was in a poll_s... wchan. I ultimately sent sigterm to this last taper and amanda immediately wrapped things up with a taper protocol error. The logs didn't report anything unusual prior to the hang and there's no evidence of any i/o problems with the hardware or network. And the problem did not occur with 3.3.9.
So it seems 3.4 is not ready for primetime, at least not for us.