After updating from 2.5.0p2 to 2.6.0p2 and running fine for a few days, I'm seeing that the planner's compression estimates are totally off, which skews which DLEs are chosen for backup on each run. For example, right now amadmin balance shows:
due-date  #fs    orig KB               out KB            balance
----------------------------------------------------------------
9/26 Fri    1  137375330                    0                ---
9/27 Sat    0          0                    0                ---
9/28 Sun    3  127146210  4051325553137561658           +3693.9%
9/29 Mon    6  183484280  7829679975848763634           +7232.2%
9/30 Tue    3  132909620  3833724574305892912           +3490.1%
10/01 Wed   3  130272160             13951584            -100.0%
10/02 Thu   1  107617510  3906644189318756912           +3558.4%
----------------------------------------------------------------
TOTAL      17  818805110  1174630218915375084 106784565355943189
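As a sanity check on the table, the balance column looks like each day's out KB measured against the per-run average printed as the last value on the TOTAL line. That relationship is inferred from the numbers above, not taken from the Amanda source. The sketch below reproduces the percentages, which shows that the bogus out KB values alone account for the wild balance figures:

```python
# Values copied from the amadmin balance output above.
# ASSUMPTION (inferred from the numbers, not from Amanda's code):
#   balance = (day out KB / average - 1) * 100,
# where the average is the last value on the TOTAL line.
AVERAGE_OUT_KB = 106784565355943189


def balance_percent(day_out_kb: int, average_kb: int = AVERAGE_OUT_KB) -> float:
    """Deviation of one day's out KB from the per-run average, in percent."""
    return (day_out_kb / average_kb - 1.0) * 100.0


if __name__ == "__main__":
    for day, out_kb in [("9/28 Sun", 4051325553137561658),
                        ("9/29 Mon", 7829679975848763634),
                        ("10/01 Wed", 13951584)]:
        print(f"{day}: {balance_percent(out_kb):+.1f}%")
```

This prints +3693.9%, +7232.2% and -100.0%, matching the table.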
When I look at the curinfo files, I see weird numbers, like:
Other than this, the backups seem to run normally and the daily Amanda report includes correct compression rates for every DLE, which somehow do not get written to the curinfo files. Is there a fix for this? How can I safely reset the compression rate calculation by editing or rebuilding the curinfo database without ditching my backups?
I've determined that my problem is that Amanda is not getting the correct compressed sizes to write into curinfo. It basically writes garbage compressed dump sizes into curinfo: either 0 or a very large number. As a result, the compression rates are totally off, and in the next cycle the planner uses these bogus rates and schedules seemingly random DLEs.
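To illustrate how one garbage size propagates, here is a rough model of the planner's estimate: expected output size equals original size times an average of recent compression ratios (compressed size over original size). Averaging the last few ratios is an assumption about planner internals made only for illustration, and the 100 GB DLE is hypothetical.

```python
from statistics import mean
from typing import List


def ratio(orig_kb: int, out_kb: int) -> float:
    """Compression ratio as compressed size over original size."""
    return out_kb / orig_kb


def estimated_out_kb(orig_kb: int, recent_ratios: List[float]) -> float:
    """ASSUMED model: estimated out KB = orig KB * mean of recent ratios."""
    return orig_kb * mean(recent_ratios)


orig = 100_000_000  # hypothetical DLE of about 100 GB, in KB
good = [ratio(orig, 45_000_000), ratio(orig, 50_000_000)]

if __name__ == "__main__":
    # Sane history: the estimate stays near half the original size.
    print(estimated_out_kb(orig, good + [ratio(orig, 55_000_000)]))
    # A recorded size of 0 drags the estimate down.
    print(estimated_out_kb(orig, good + [ratio(orig, 0)]))
    # A huge garbage size makes the estimate explode.
    print(estimated_out_kb(orig, good + [ratio(orig, 4051325553137561658)]))
```

Under this model, a single bad entry is enough to make the DLE look either nearly free or impossibly large, so the planner moves it around unpredictably.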
I glanced at driver.c, and it seems the compressed dump size is only read from the chunker output, but my logs don't show any chunker output. Maybe that's the source of the problem: dumpsize never gets the right value. For now, I've written a script that runs after amdump and fixes the curinfo files using the correct dump sizes read from the dumper output in the amdump log file. This works, but I would like a better solution.
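The workaround script itself isn't shown, so the following is only a sketch of its core idea: pull original and compressed sizes out of the dumper lines of the amdump log and compute per-DLE compression ratios that could then be written back into curinfo. The `DUMPER_LINE` pattern and its `orig-kb=`/`out-kb=` fields are hypothetical placeholders, not Amanda's real log format, and the write-back step is omitted because the curinfo layout isn't covered here.

```python
import re
from typing import Dict, Tuple

# HYPOTHETICAL line format, for illustration only: check the real dumper
# lines in your amdump log and adapt this pattern to them.
DUMPER_LINE = re.compile(
    r"dumper: (?P<host>\S+) (?P<disk>\S+) orig-kb=(?P<orig>\d+) out-kb=(?P<out>\d+)"
)


def ratios_from_log(log_text: str) -> Dict[Tuple[str, str], float]:
    """Collect out KB / orig KB per (host, disk) from the matched dumper lines."""
    ratios: Dict[Tuple[str, str], float] = {}
    for m in DUMPER_LINE.finditer(log_text):
        orig_kb, out_kb = int(m.group("orig")), int(m.group("out"))
        if orig_kb > 0:  # skip empty dumps to avoid dividing by zero
            ratios[(m.group("host"), m.group("disk"))] = out_kb / orig_kb
    return ratios


def looks_bogus(ratio: float, max_ratio: float = 1.5) -> bool:
    """Flag the symptoms described here: a zero size, or one far above the
    original. The 1.5 cutoff is an arbitrary choice for this sketch."""
    return ratio <= 0.0 or ratio > max_ratio


if __name__ == "__main__":
    sample = (
        "dumper: client1 /home orig-kb=1000 out-kb=400\n"
        "dumper: client2 /var orig-kb=2000 out-kb=500\n"
    )
    for (host, disk), r in sorted(ratios_from_log(sample).items()):
        print(host, disk, f"{r:.2f}", "bogus" if looks_bogus(r) else "ok")
```

Checking each ratio with `looks_bogus` before writing it back keeps a later run from reintroducing the same 0 or huge values.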
I'm also posting my amanda.conf, just in case it's relevant. I've been using essentially the same configuration for a couple of years with Amanda 2.5.0p2. In short, I use many 5 GB tapes on hard disk, each split into 511 MB chunks, without a holding disk. This worked fine until I updated to 2.6.0p2, which I downloaded from the Amanda website and built from source without any changes or patches.
I had to upgrade Amanda because I was unable to recover large compressed DLEs with 2.5.0p2 (the default on CentOS 5), not even by stitching the chunks together without using amfetchdump. Now I run amcheckdump every night and it passes. I use CentOS 5 on the server and most of my client machines, except for one CentOS 4 machine. All CentOS 5 machines use Amanda 2.6.0p2 and the CentOS 4 client machine uses the default Amanda 2.4.4p3.
Your patch fixed the problem in my last run. Thanks a lot for your prompt and effective answer.
I'm just curious about why this problem was triggered, because I assume it is not a mainstream bug affecting most users. Is it something in my configuration? If you know the reason and can tell me, to satisfy my curiosity, thanks again. Otherwise, don't spend time finding out; I'm already very satisfied with the prompt fix.
It would be great if you could share your Amanda setup, e.g. how many systems (and of what types) you are protecting, the data size, and any performance numbers you can share.