Hello zmanda forums,
I am a new amanda backup admin and the amanda server that i have built has been functioning well for about a week at this point. But today I have many 'missing results' failures in the amreports and in the amstatus output there are errors that read "failed: process terminated while waiting for dumping".
amstatus failures 
Code:
nemo.example.com:/home 0 20885m failed: process terminated while waiting for dumping
nemo.example.com:/root 1 0m finished (19:19:29)
nemo.example.com:/var/client 0 11527m failed: process terminated while waiting for dumping
oscar.example.com:/etc 1 0m finished (19:17:09)
oscar.example.com:/home 0 20198m failed: process terminated while waiting for dumping
oscar.example.com:/root 1 0m finished (19:16:34)
oscar.example.com:/var/client 1 0m finished (19:16:16)
peach.example.com:/etc 1 0m finished (19:19:48)
peach.example.com:/home 0 44822m failed: process terminated while waiting for dumping
amreport failures
Code:
nemo.acadaca /etc 1 0 0 7.7 0:01 208.1 0:00 170.0
nemo.acadaca /home MISSING ---------------------------------------------
nemo.acadaca /root 1 0 0 16.7 0:01 28.5 0:00 50.0
nemo.acadaca /var/client MISSING ---------------------------------------------
I've checked the logs and this is what I've found:
Code:
pondering nemo.example.com:/root... next_level0 7 last_level 0 (not due for a full dump, picking an incr level)
pondering nemo.example.com:/etc... next_level0 7 last_level 0 (not due for a full dump, picking an incr level)
nemo.example.com /home pri 15129 lev 0 nsize 42774490 csize 21387245
nemo.example.com /var/client pri 15129 lev 0 nsize 23608070 csize 11804035
nemo.example.com /etc pri 1 lev 1 nsize 220 csize 110
nemo.example.com /root pri 1 lev 1 nsize 30 csize 15
DUMP nemo.example.com ffffffff9efefbffffffffff01 /home 20110602185326 15129 0 1970:1:1:0:0:0 42774490 21387245 20885 1024 "Skipping: new disk can't be dumped in degraded mode"
DUMP nemo.example.com ffffffff9efefbffffffffff01 /var/client 20110602185326 15129 0 1970:1:1:0:0:0 23608070 11804035 11527 1024 "Skipping: new disk can't be dumped in degraded mode"
DUMP nemo.example.com ffffffff9efefbffffffffff01 /etc 20110602185326 1 1 2011:6:2:20:17:34 220 110 0 1024 "Skipping: a full is not planned, so can't dump in degraded mode"
DUMP nemo.example.com ffffffff9efefbffffffffff01 /root 20110602185326 1 1 2011:6:2:20:17:34 30 15 0 1024 "Skipping: a full is not planned, so can't dump in degraded mode"
driver: send-cmd time 1562.140 to chunker0: PORT-WRITE 00-00172 /var/bak/amanda/20110602185326/nemo.example.com._root.1 nemo.example.com ffffffff9efefbffffffffff01 /root 1 2011:6:2:20:17:34 1048576 GNUTAR 96 |;auth=ssh;srvcomp-fast;encrypt-serv-cust=/usr/sbin/amcrypt;server-decrypt-option=-d;index;exclude-file="*/*log* /var/lib/pgsql/data/postmaster.pid";
driver: send-cmd time 1562.158 to dumper0: PORT-DUMP 00-00172 48469 nemo.example.com ffffffff9efefbffffffffff01 /root NODEVICE 1 2011:6:2:20:17:34 GNUTAR "" "" "" /var/lib/amanda/.ssh/id_rsa ssh AMANDA 127.0.0.1:48470 |" <auth>ssh</auth>\n <compress>SERVER-FAST</compress>\n <encrypt>SERVER-CUSTOM<custom-encrypt-program>/usr/sbin/amcrypt</custom-encrypt-program>\n <decrypt-option>-d</decrypt-option>\n </encrypt>\n <record>YES</record>\n <index>YES</index>\n <datapath>AMANDA</datapath>\n <exclude>\n <file encoding=\"raw\" raw=\"Ki8qbG9nKiAvdmFyL2xpYi9wZ3NxbC9kYXRhL3Bvc3RtYXN0ZXIucGlk\">*/*log*_/var/lib/pgsql/data/postmaster.pid</file>\n </exclude>\n"
driver: finished-cmd time 1563.688 dumper0 dumped nemo.example.com:/root
driver: finished-cmd time 1563.690 chunker0 chunked nemo.example.com:/root
driver: send-cmd time 1563.712 to taper: FILE-WRITE worker0 01-00175 /var/bak/amanda/20110602185326/nemo.example.com._root.1 nemo.example.com /root 1 20110602185326 "" "" "" "" 53687091200 "" "" "" 30
driver: startaflush: LARGEST nemo.example.com /root 38 524155092
taper: status file nemo.example.com /root:/tmp/amanda/taper_status_file_mk8SyL
driver: finished-cmd time 1563.782 taper wrote nemo.example.com:/root
driver: send-cmd time 1581.739 to chunker0: PORT-WRITE 00-00192 /var/bak/amanda/20110602185326/nemo.example.com._etc.1 nemo.example.com ffffffff9efefbffffffffff01 /etc 1 2011:6:2:20:17:34 1048576 GNUTAR 192 |;auth=ssh;srvcomp-fast;encrypt-serv-cust=/usr/sbin/amcrypt;server-decrypt-option=-d;index;exclude-file="*/*log* /var/lib/pgsql/data/postmaster.pid";
driver: send-cmd time 1581.758 to dumper0: PORT-DUMP 00-00192 48734 nemo.example.com ffffffff9efefbffffffffff01 /etc NODEVICE 1 2011:6:2:20:17:34 GNUTAR "" "" "" /var/lib/amanda/.ssh/id_rsa ssh AMANDA 127.0.0.1:48735 |" <auth>ssh</auth>\n <compress>SERVER-FAST</compress>\n <encrypt>SERVER-CUSTOM<custom-encrypt-program>/usr/sbin/amcrypt</custom-encrypt-program>\n <decrypt-option>-d</decrypt-option>\n </encrypt>\n <record>YES</record>\n <index>YES</index>\n <datapath>AMANDA</datapath>\n <exclude>\n <file encoding=\"raw\" raw=\"Ki8qbG9nKiAvdmFyL2xpYi9wZ3NxbC9kYXRhL3Bvc3RtYXN0ZXIucGlk\">*/*log*_/var/lib/pgsql/data/postmaster.pid</file>\n </exclude>\n"
driver: finished-cmd time 1583.301 dumper0 dumped nemo.example.com:/etc
driver: finished-cmd time 1583.302 chunker0 chunked nemo.example.com:/etc
driver: send-cmd time 1583.332 to taper: FILE-WRITE worker0 01-00195 /var/bak/amanda/20110602185326/nemo.example.com._etc.1 nemo.example.com /etc 1 20110602185326 "" "" "" "" 53687091200 "" "" "" 220
driver: startaflush: LARGEST nemo.example.com /etc 50 524131080
taper: status file nemo.example.com /etc:/tmp/amanda/taper_status_file_yhqZGw
driver: finished-cmd time 1583.418 taper wrote nemo.example.com:/etc
Here are some entries in the chunker logs pertaining to the failed host:
Code:
Thu Jun 2 19:19:30 2011: chunker: getcmd: PORT-WRITE 00-00172 /var/bak/amanda/20110602185326/nemo.example.com._root.1 nemo.example.com ffffffff9efefbffffffffff01 /root 1 2011:6:2:20:17:34 1048576 GNUTAR 96 |;auth=ssh;srvcomp-fast;encrypt-serv-cust=/usr/sbin/amcrypt;server-decrypt-option=-d;index;exclude-file="*/*log* /var/lib/pgsql/data/postmaster.pid";
Thu Jun 2 19:19:31 2011: chunker: Building type FILE header of 32768-32768 bytes with name='nemo.example.com' disk='/root' dumplevel=1 and blocksize=32768
here is a tail -1 of the drnsize logs on the client as per the zmanda wiki:
Code:
[[email protected]:~] $ tail -1 /var/log/amanda/client/acadaca/sendsize.20110602185337.debug
Thu Jun 2 18:53:41 2011: sendsize: pid 10130 finish time Thu Jun 2 18:53:41 2011
And here is a grep of 'estimate size' of the sendsize logs:
Code:
[[email protected]:~] $ grep 'estimate size' /var/log/amanda/client/acadaca/sendsize.20110602185337.debug
Thu Jun 2 18:53:37 2011: sendsize: estimate size for /etc level 0: 65780 KB
Thu Jun 2 18:53:37 2011: sendsize: estimate size for /etc level 1: 220 KB
Thu Jun 2 18:53:37 2011: sendsize: estimate size for /root level 0: 72710 KB
Thu Jun 2 18:53:37 2011: sendsize: estimate size for /root level 1: 30 KB
Thu Jun 2 18:53:40 2011: sendsize: estimate size for /home level 0: 42774490 KB
Thu Jun 2 18:53:41 2011: sendsize: estimate size for /var/client level 0: 23608070 KB
This is the etimeout value on the server:
Code:
etimeout -2400 # total number of seconds for estimates.
I am enclosing a grep of the amdump logs for one of the failed hosts in the hopes it sheds some light on the situation
I would appreciate any advice you may have to help get to the resolution of this issue. thanks in advance!