
Thread: Bug in when_overwrite() usage in planner.c

  1. #1

Bug in when_overwrite() usage in planner.c

    I recently upgraded my server from 3.3.2 to 3.4.1. Although backups seem to be proceeding without incident, the mail report has scores of messages per the following pattern:

    planner: Last full dump of client:disk on tape DAILYNN overwritten in 1 run

    My disklist has roughly 110 entries and I received 112 such messages after the first run of 3.4.1 and 87 on the second. Dumps are to vdisk. I'm running RHEL 6.

I tracked this down to planner.c (which recently had changes to the relevant routines, per the ChangeLog). It looks like when_overwrite() is being called to determine whether a dump is to be overwritten within the next 5 days. The return value of when_overwrite() is used verbatim, and since it is often 1 this is duly logged as a warning. However, when_overwrite() will also return 1 to report certain warning or error conditions. These appear to relate to lookup failures in the tape database, but I haven't done enough digging to really understand what's going on.

It seems to me that when_overwrite() ought to do a better job of reporting the errors it encounters, or else error conditions ought to be checked in some other fashion. I'm happy to provide further information, but perhaps this is enough to identify an obvious problem. Given the reduced number of messages on the second night, I'm also inclined to suspect that once every DLE has had a level 0 performed by 3.4.1, whatever database issue there might be will resolve itself. But it's an obnoxious and unnerving bug...

    Also, it's really not clear which support venues are active these days. Forgive me (and please advise me) if I should really be posting this elsewhere.

    Thanks for any help.

    Corey

  2. #2
    Join Date
    Nov 2005
    Location
    Canada
    Posts
    1,044


    Can you try the untested attached patch?

    You can test it by running only the planner:
    /path/.../to/.../planner CONFIG --log-filename /tmp/planner.log

    Then look at /tmp/planner.log; there should be no warnings.
    Attached Files

  3. #3


    The build is currently fighting me. Why the hell does Amanda need glib? And --disable-glibtest doesn't work. RHEL 6 includes only glib 1.2, and the configure test indicates it wants 2.2. Any idea how to get around this idiotic requirement? I certainly don't need any graphical components.

  4. #4


    Please report the exact error you get.

    I think you need to install the glib2 and glib2-devel packages.

  5. #5


    Oh, right. glib2. I'm still irritated at how Gnome has infiltrated everything. Why does Amanda require the Gnome libraries? Is there some graphical component I'm unaware of?

    The build is now proceeding.

  6. #6


    glib2 is not only for graphics; it has a lot of useful things.

  7. #7


    Doesn't look good.

    -bash-4.1$ /usr/local/amanda/3.4.1p1/libexec/amanda/planner DAILY --log-filename /tmp/planner.log.3.4.1p1
    planner: pid 2851 executable /usr/local/amanda/3.4.1p1/libexec/amanda/planner version 3.4.1
    planner: build: VERSION="Amanda-3.4.1"
    planner: BUILT_DATE="Thu Dec 1 12:43:54 MST 2016" BUILT_MACH=""
    planner: BUILT_REV="6945" BUILT_BRANCH="tags" CC="gcc"
    planner: paths: bindir="/usr/local/amanda/3.4.1p1/bin"
    planner: sbindir="/usr/local/amanda/3.4.1p1/sbin"
    planner: libexecdir="/usr/local/amanda/3.4.1p1/libexec"
    planner: amlibexecdir="/usr/local/amanda/3.4.1p1/libexec/amanda"
    planner: mandir="/usr/local/amanda/3.4.1p1/share/man"
    planner: AMANDA_TMPDIR="/tmp/amanda" AMANDA_DBGDIR="/tmp/amanda"
    planner: CONFIG_DIR="/usr/local/amanda/3.4.1p1/etc/amanda"
    planner: DEV_PREFIX="/dev/" DUMP="/sbin/dump"
    planner: RESTORE="/sbin/restore" VDUMP=UNDEF VRESTORE=UNDEF
    planner: XFSDUMP="/sbin/xfsdump" XFSRESTORE="/sbin/xfsrestore"
    planner: VXDUMP=UNDEF VXRESTORE=UNDEF
    planner: SAMBA_CLIENT="/usr/bin/smbclient" GNUTAR="/bin/tar"
    planner: COMPRESS_PATH="/bin/gzip" UNCOMPRESS_PATH="/bin/gzip"
    planner: LPRCMD=UNDEF MAILER=UNDEF
    planner: listed_incr_dir="/usr/local/amanda/3.4.1p1/var/amanda/gnutar-lists"
    planner: defs: DEFAULT_SERVER="jilamanda.colorado.edu"
    planner: DEFAULT_CONFIG="DailySet1"
    planner: DEFAULT_TAPE_SERVER="amanda.XXX"
    planner: DEFAULT_TAPE_DEVICE="" NEED_STRSTR AMFLOCK_POSIX
    planner: AMFLOCK_FLOCK AMFLOCK_LOCKF AMFLOCK_LNLOCK SETPGRP_VOID
    planner: AMANDA_DEBUG_DAYS=4 BSD_SECURITY USE_AMANDAHOSTS
    planner: CLIENT_LOGIN="amandabackup" CHECK_USERID HAVE_GZIP
    planner: COMPRESS_SUFFIX=".gz" COMPRESS_FAST_OPT="--fast"
    planner: COMPRESS_BEST_OPT="--best" UNCOMPRESS_OPT="-dc"
    planner: CONFIGURE_ARGS=" '--prefix=/usr/local/amanda/3.4.1p1' '--without-rest-server' '--with-group=disk' '--with-user=amandabackup'"
    READING CONF INFO...
    DATE 20161201131505
    planner: timestamp 20161201131505
    planner: tape_length is set from maxdumpsize (768000000 KB)
    planner: time 0.125: startup took 0.125 secs

    SENDING FLUSHES...
    Cleaning up holding disk '/holdA'
    Cleaning up holding disk '/holdB'
    ENDFLUSH
    ENDFLUSH

    SETTING UP FOR ESTIMATES...
    planner: time 0.156: setting up estimates for ancilla-2:/dev/sda1
    Segmentation fault

  8. #8


    I do not understand why this patch fails with a segmentation fault.

    Maybe the problem is with the build; can you try building without the patch to see if it succeeds?

  9. #9


    A build without the patch using identical configure flags seems to work. The planner is currently running and has passed the point at which it was breaking.

    Incidentally, I couldn't determine why I wasn't getting a core dump. I can generate a core dump with an appropriate C program, so they're not disabled in any system- or user-wide way. But nothing from Amanda. Any ideas? I can provide a stack backtrace if I could get a core file.

  10. #10


    I encountered two additional bugs and have rolled back to 3.3.9.

    The first is a HEADER_DONE not set error on a single dataset of a ZFS pool. We're dumping in the neighborhood of 20 small datasets (one per user) on a server running zfsonlinux, and each run resulted in precisely one of these failing with the HEADER_DONE not set error. It was not the same dataset each time. I've never encountered this error before, and digging into the code it wasn't immediately apparent why this was happening. The error did not occur with last night's 3.3.9 run.

    The other error, which was a (literal) showstopper, was a taper hang. The run two nights ago never finished. At the time of investigation there were 8 idle dumper threads and 3 taper threads, with the driver reporting that the taper was writing. Two of the tapers were in a futex wait channel and the third was in a poll_s... wchan. I ultimately sent SIGTERM to this last taper and Amanda immediately wrapped things up with a taper protocol error. The logs didn't report anything unusual prior to the hang, and there's no evidence of any I/O problems with the hardware or network. The problem did not occur with 3.3.9.

    So it seems 3.4 is not ready for primetime, at least not for us.
