index files owned as root, not amandabackup, plus 0-len holding disk files?



Andrew Rakowski
July 24th, 2007, 05:03 PM
Hi folks - I'm finding some index files lying around in the configuration/index directory that are owned by root (as opposed to the great majority, which are owned by user amandabackup). Additionally, I'm seeing occasional zero-length files left in the holding disk. I've recently upgraded to Amanda server version 2.5.2p1 on my Linux backup server.

When I noticed the root-owned files over the weekend, I thought another admin had gotten onto the backup server and run something as root rather than amandabackup, so I did a:

find index/ -user root -exec chown amandabackup {} \;

to set all the file ownership to the amandabackup user. I found more of these files this morning. I believe that I might have found a problem in Amanda that's related to (virtual) tape changes, in the case of the root-owned files (details provided below). I don't know why the zero-length files are getting left behind in the holding disk locations, as other systems that are having backup failures aren't leaving them behind.

I've changed the ownership on root-owned files (to avoid problems with Amanda, since everything else is owned by user amandabackup), and deleted the zero-length files from the holding disk. I'll look for more tomorrow morning.

Details follow. Examples are shown relative to a configuration directory /etc/amanda/XXXXset1 on the Amanda backup server (a Red Hat Linux box).



[amandabackup@amanda01 XXXXset1]$ cat /etc/redhat-release
Red Hat Enterprise Linux WS release 4 (Nahant Update 5)
[amandabackup@amanda01 XXXXset1]$
[amandabackup@amanda01 XXXXset1]$ uname -a
Linux amanda01 2.6.9-55.ELsmp #1 SMP Fri Apr 20 17:03:35 EDT 2007 i686 i686 i386 GNU/Linux
[amandabackup@amanda01 XXXXset1]$
[amandabackup@amanda01 XXXXset1]$ rpm -qa|grep amanda
amanda-backup_server-2.5.2p1-1.rhel4
[amandabackup@amanda01 XXXXset1]$
[amandabackup@amanda01 XXXXset1]$ find index -user root -ls
1245434 76 -rw------- 1 root disk 72789 Jul 23 13:43 index/portland-0/_/20070723005902_1.gz
1524193 4 -rw------- 1 root disk 419 Jul 24 03:38 index/pump/_home_d3mNNN_diskb/20070724005902_1.gz
1524196 88 -rw------- 1 root disk 83772 Jul 24 03:41 index/pump/_/20070724005902_1.gz
1376622 112 -rw------- 1 root disk 109705 Jul 24 04:06 index/raman/_/20070724005902_1.gz
1245459 160 -rw------- 1 root disk 159416 Jul 24 03:41 index/foresail/_/20070724005902_1.gz
1425858 124 -rw------- 1 root disk 120579 Jul 24 04:43 index/pleiades-0/_/20070724005902_1.gz
1360267 968 -rw------- 1 root disk 987071 Jul 24 03:52 index/ccn/_export/20070724005902_0.gz
[amandabackup@amanda01 XXXXset1]$
[amandabackup@amanda01 XXXXset1]$
[amandabackup@amanda01 XXXXset1]$ find /backup0/amanda-hdset1a -size 0 -ls
5193860 0 -rw------- 1 amandaba disk 0 Jul 23 03:34 /backup0/amanda-hdset1a/20070723005902/foresail._.1
5193851 0 -rw------- 1 amandaba disk 0 Jul 23 03:15 /backup0/amanda-hdset1a/20070723005902/foresail._boot_efi.1
5193740 0 -rw------- 1 amandaba disk 0 Jul 24 03:09 /backup0/amanda-hdset1a/20070724005902/foresail._home.0
5210201 0 -rw------- 1 amandaba disk 0 Jul 24 03:30 /backup0/amanda-hdset1a/20070724005902/foresail._boot_efi.1
[amandabackup@amanda01 XXXXset1]$

These are systems that reported trouble doing backups. For instance, from the morning summary report:



FAILURE AND STRANGE DUMP SUMMARY:
...snip...
foresail /home lev 0 FAILED [data timeout]
foresail /home lev 0 FAILED [cannot read header: got 0 instead of 32768]
foresail /home lev 0 FAILED [too many dumper retry: "[request failed: Connection
timed out]"]
mitcluster1 /export lev 2 STRANGE
foresail /boot/efi lev 1 FAILED [cannot read header: got 0 instead of 32768]
mitp5 /home lev 1 STRANGE
foresail /boot/efi lev 1 FAILED [too many dumper retry: "[request failed: Connection
timed out]"]
foresail /boot/efi lev 1 FAILED [cannot read header: got 0 instead of 32768]
...snip...


and further down:


/-- foresail /home lev 0 FAILED [data timeout]
sendbackup: start [foresail:/home level 0]
sendbackup: info BACKUP=/bin/tar
sendbackup: info RECOVER_CMD=/bin/tar -f - ...
sendbackup: info end
\--------
...snip...
/-- ccn /export lev 0 STRANGE
sendbackup: start [ccn:/export level 0]
sendbackup: info BACKUP=/bin/tar
sendbackup: info RECOVER_CMD=/bin/gzip -dc |/bin/tar -f - ...
sendbackup: info COMPRESS_SUFFIX=.gz
sendbackup: info end
? gtar: ./home/bb/bb18d4/BBOUT: file changed as we read it
| Total bytes written: 17533736960 (16GB, 8.4MB/s)
sendbackup: size 17122790
sendbackup: end
\--------
...snip...


and still further down (in the "NOTES:" section):



NOTES:
...snip...
planner: Full dump of skor:/home promoted from 1 day ahead.
taper: tape XXXXset1-43 kb 50332640 fm 368 writing file: No space left on device
taper: continuing baker:/usr1.1 on new tape from 0kb mark: [writing file: No space left on
device]
taper: tape XXXXset1-13 kb 50332640 fm 18 writing file: No space left on device
taper: continuing pump:/home.1 on new tape from 0kb mark: [writing file: No space left on
device]
taper: tape XXXXset1-14 kb 42967872 fm 14 writing file: short write
taper: continuing raman:/home.2 on new tape from 5242880kb mark: [writing file: short write]
taper: tape XXXXset1-15 kb 50332640 fm 11 writing file: No space left on device
taper: continuing pleiades-0:/export.0 on new tape from 20971520kb mark: [writing file: No
space left on device]
taper: tape XXXXset1-16 kb 50332640 fm 11 writing file: No space left on device
taper: continuing malina:/.2 on new tape from 15728640kb mark: [writing file: No space left on
device]
taper: tape XXXXset1-17 kb 32436224 fm 7 [OK]
big estimate: bethe / 1


Notice that the root-owned files are all indexes from systems that happened to have their backups split between (virtual) tapes. The zero-length files (all system "foresail" related in this case) just seem to be related to failed backups, but not vtape changes.
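
To check that correlation mechanically rather than by eye, something like the following sketch could be used. It assumes the report's taper notes keep the "taper: continuing host:/disk.N on new tape" form shown above, and that index paths look like index/host/disk/file; the function names are made up for illustration. It should be fed the full report, since the excerpts posted here are snipped.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, and the line formats are
# assumed to match the report and "find -ls" excerpts shown in this post.

# Print hosts whose dumps were continued onto a new (v)tape.
# Reads an Amanda report on stdin.
split_hosts() {
    sed -n 's/^ *taper: continuing \([^:]*\):.*/\1/p' | sort -u
}

# Print hosts that own the index files in a listing on stdin.
# Assumes each line ends in a path of the form index/<host>/<disk>/<file>.
root_index_hosts() {
    sed -n 's|^.*index/\([^/]*\)/.*|\1|p' | sort -u
}

# Print hosts that have root-owned index files but no tape split in the report.
# $1 = saved report file, $2 = saved output of "find index -user root -ls"
unexplained_hosts() {
    split=$(mktemp) && owned=$(mktemp) || return 1
    split_hosts < "$1" > "$split"
    root_index_hosts < "$2" > "$owned"
    # comm -13 keeps only lines unique to the second (sorted) file
    comm -13 "$split" "$owned"
    rm -f "$split" "$owned"
}
```

With the report saved to a file and the find output saved to another (say report.txt and root-owned.txt, both hypothetical names), "unexplained_hosts report.txt root-owned.txt" lists any host that has root-owned index files without a matching tape split. An empty result would support the vtape-change theory for that run.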

If desired, I can provide unadulterated logs, config files, etc. to support individuals, but can't post them here. Please send a private message to me with the e-mail address you'd like logs or other files sent to.

Best regards,

-Andrew [in the desert of eastern Washington state, USA]

Andrew Rakowski
July 30th, 2007, 01:49 PM

Hmmm, I guess either nobody else is seeing this problem or knows what to do with it. I'll assume it's a coding error, hope it gets fixed in the next release and move on.

So, is there any problem with running cron jobs to unscrew the ownership of files in the index/hostname/ directories when they get left as root-owned, rather than owned by the amandabackup user? Also, since there are zero-length files left behind in the holding disk directory, is there any problem with just deleting them (assuming backups are complete)?

My thought is to do something (as root) like this:



find /etc/amanda/*/index -user root -exec chown amandabackup:disk {} \;

find /backup*/*hd* -size 0c -exec /bin/rm {} \;
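
A slightly more defensive variant of the two commands above, as a sketch rather than a tested fix. The function names are made up, the age threshold is an assumption, and -mmin is a GNU/BSD find extension rather than POSIX. The idea is to touch only regular files, and to skip empty holding files that were modified recently, in case a dump that is still running has just created them.

```shell
#!/bin/sh
# Sketch only: function names, the owner/group and the age threshold are
# assumptions to be adjusted to the local setup.

# Hand root-owned regular files under an index directory back to the Amanda user.
# $1 = index directory, $2 = owner[:group], e.g. amandabackup:disk
# Needs to run as root, like the original chown command.
fix_index_owner() {
    find "$1" -type f -user root -exec chown "$2" {} \;
}

# Delete empty regular files under a holding disk directory that were last
# modified more than $2 minutes ago; newer empty files are left alone.
# $1 = holding disk directory, $2 = age threshold in minutes
purge_empty_holding() {
    find "$1" -type f -size 0 -mmin +"$2" -exec rm -f {} \;
}
```

From a root cron job this might be called as "fix_index_owner /etc/amanda/XXXXset1/index amandabackup:disk" and "purge_empty_holding /backup0/amanda-hdset1a 720", scheduled for a time when no Amanda run is active. The 720-minute threshold is only an example value.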


Any issues anyone?

-Andrew