PDA

View Full Version : Windows Client backup user



georgeeye
April 3rd, 2009, 04:43 AM
Has anyone had issues setting the amandabackup user up as a domian user instead of a local user? Our setup on windows 2003 server using the local backup user works fine but not as a domain user. The domain backup user is a member of the following groups:

Administrators
Backup Operators
Server Operators
Domain Users

The local backup user is a member of the following groups:

Administrators
Backup Operators
Users

Amanda's logs for the failures always list the error code 260

Has anyone seen similar behavior?

paddy
April 3rd, 2009, 10:51 AM
Amanda's logs for the failures always list the error code 260

Has anyone seen similar behavior?

Error code 260 means backup failed multiple times. Are you seeing VSS errors?

vssadmin list writers output should be looked at.

BTW, amandabackup as domain user should work.

Paddy

georgeeye
April 6th, 2009, 12:02 PM
BTW, amandabackup as domain user should work.

Paddy

As it turns out, the domain vs local user was not the sure fire fix. We still have one cluster that repeatedly fails after this change was made. Below is the output from vssadmin list writers from one of the failed cluster nodes:

vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
(C) Copyright 2001 Microsoft Corp.

Writer name: 'System Writer'
Writer Id: {e8132975-6f93-4464-a53e-1050253ae220}
Writer Instance Id: {606aea0f-60fa-4e56-ab9e-286ab5a6cbee}
State: [1] Stable
Last error: No error

Writer name: 'MSDEWriter'
Writer Id: {f8544ac1-0611-4fa5-b04b-f7ee00b03277}
Writer Instance Id: {40ebf33a-621d-45b5-bea0-d245936106d0}
State: [1] Stable
Last error: No error

Writer name: 'Registry Writer'
Writer Id: {afbab4a2-367d-4d15-a586-71dbb18f8485}
Writer Instance Id: {ef4a8a65-f16c-49cd-ab17-3a277c1c8f54}
State: [1] Stable
Last error: No error

Writer name: 'Event Log Writer'
Writer Id: {eee8c692-67ed-4250-8d86-390603070d00}
Writer Instance Id: {feeee0e6-df81-41b3-ab61-3dd98d7deb1d}
State: [1] Stable
Last error: No error

Writer name: 'COM+ REGDB Writer'
Writer Id: {542da469-d3e1-473c-9f4f-7847f01fc64f}
Writer Instance Id: {434af70d-6d08-42f9-8a0d-55ff5d33353b}
State: [1] Stable
Last error: No error

Writer name: 'WMI Writer'
Writer Id: {a6ad56c2-b509-4e6c-bb19-49d8f43532f0}
Writer Instance Id: {40cc761c-1c76-4549-833f-3e06d5165db1}
State: [1] Stable
Last error: No error

Writer name: 'BITS Writer'
Writer Id: {4969d978-be47-48b0-b100-f328f07ac1e0}
Writer Instance Id: {1d107231-5928-4aa0-9f69-3889ed426901}
State: [1] Stable
Last error: No error

Writer name: 'IIS Metabase Writer'
Writer Id: {59b1f0cf-90ef-465f-9609-6ca8b2938366}
Writer Instance Id: {1a1d8514-615c-474e-8656-e49051077c0e}
State: [1] Stable
Last error: No error

Writer name: 'Cluster Service Writer'
Writer Id: {41e12264-35d8-479b-8e5c-9b23d1dad37e}
Writer Instance Id: {0e00d21f-ba09-446c-8382-82484bdc3cc5}
State: [1] Stable
Last error: No error


I don't see anything wrong here, but I am unfamiliar with this utility.

Here is a little more info about our DLE setup here:

clusterservice_name "R:/" zwc-compress-span
cluster_node01 "C:/" zwc-compress-span
cluster_node02 "C:/" zwc-compress-span

Basically, the active node in the cluster has the name clusterservice_name and the following DLE should be backed up:: C:/ & R:/
the inactive node cluster_node0x should be backing up C:/




I am curious if we are encoutering a race condition between C: and R: on the active cluster node. I can't find any logs to indicate this, it's just a theory for now.

I can include the output of any amanda commands needed, or any of the log files ie. amstatus, etc.

Thanks!!

paddy
April 6th, 2009, 02:59 PM
Here is a little more info about our DLE setup here:

clusterservice_name "R:/" zwc-compress-span
cluster_node01 "C:/" zwc-compress-span
cluster_node02 "C:/" zwc-compress-span

Basically, the active node in the cluster has the name clusterservice_name and the following DLE should be backed up:: C:/ & R:/
the inactive node cluster_node0x should be backing up C:/




I am curious if we are encoutering a race condition between C: and R: on the active cluster node. I can't find any logs to indicate this, it's just a theory for now.

I can include the output of any amanda commands needed, or any of the log files ie. amstatus, etc.

Thanks!!

amdump log files will help. Are you having problems backing R:/ DLE only or backing up all?

Paddy

georgeeye
April 7th, 2009, 04:52 AM
The issue is with C:/ and R:/ on the active cluster node and C:/ on the inactive node. I cannot attach or copy the contents of today's amdumplog, it exceeds the max file size :(, could I email it to you?

Thanks!

paddy
April 7th, 2009, 10:49 AM
I would like to look at log.<datestamp> file for the backup run. Hopefully, this is smaller. If we confirm that this is an issue with Windows client, we can look at Windows log files.

Paddy

georgeeye
April 8th, 2009, 04:44 AM
OK I trimmed the log to exclude all the linux boxes, (they are backing up fine), so only our windows servers are shown. The cluster that we are having the most failures on is the mlrpwinapp cluster, (includes mlrpwinapp01 & 02), and mlrpwinstg, (only C:/ is failing).

See log.20090408020002.0 contents below:


DISK planner mlrpwinstg-b.materialogic.com "R:/"
DISK planner mlrpwinstg01-b.materialogic.com "C:/"
DISK planner mlrpwinapp-b.materialogic.com "R:/"
DISK planner mlrpwinapp01-b.materialogic.com "C:/"
DISK planner mlrpwinapp02-b.materialogic.com "C:/"
DISK planner mlwinrnd01-b.materialogic.com "C:/"

START planner date 20090408020002
START driver date 20090408020002
STATS driver hostname mladmin01
STATS driver startup time 0.077
WARNING planner Last full dump of mlrpwinapp-b.materialogic.com:"R:/" on tape 100104 overwritten in 2 runs.
WARNING planner Last level 1 dump of mlrpwinapp02-b.materialogic.com:"C:/" on tape 100103 overwritten in 1 run.
ERROR planner Request to mlrpwinapp01-b.materialogic.com failed: timeout waiting for ACK
ERROR planner Request to mlrpwinapp-b.materialogic.com failed: timeout waiting for ACK
INFO planner Incremental of mlwinrnd01-b.materialogic.com:"C:/" bumped to level 3.
INFO planner Incremental of mlrpwinstg-b.materialogic.com:"R:/" bumped to level 2.
FINISH planner date 20090408020002 time 5936.336

SUCCESS dumper mlwinrnd01-b.materialogic.com "C:/" 20090408020002 3 [sec 48.580 kb 67428 kps 1388.0 orig-kb 526800]
SUCCESS chunker mlwinrnd01-b.materialogic.com "C:/" 20090408020002 3 [sec 50.188 kb 67428 kps 1344.1]
STATS driver estimate mlwinrnd01-b.materialogic.com "C:/" 20090408020002 3 [sec 22 nkb 262362 ckb 42560 kps 1882]
PART taper 100102 15 mlwinrnd01-b.materialogic.com "C:/" 20090408020002 1/1 3 [sec 0.888735 kb 67427 kps 75868.974538]
DONE taper mlwinrnd01-b.materialogic.com "C:/" 20090408020002 1 3 [sec 0.888735 kb 67427 kps 75868.974538]
FAIL chunker mlrpwinstg01-b.materialogic.com "C:/" 20090408020002 0 [cannot read header: got 0 bytes instead of 32768]
FAIL chunker mlrpwinstg01-b.materialogic.com "C:/" 20090408020002 0 [cannot read header: got 0 bytes instead of 32768]
FAIL driver mlrpwinstg01-b.materialogic.com "C:/" 20090408020002 0 [too many dumper retry: "[request failed: timeout waiting for REP]"]
SUCCESS dumper mlrpwinstg-b.materialogic.com "R:/" 20090408020002 2 [sec 805.174 kb 989873 kps 1229.4 orig-kb 14259605]
SUCCESS chunker mlrpwinstg-b.materialogic.com "R:/" 20090408020002 2 [sec 806.779 kb 989873 kps 1227.0]
STATS driver estimate mlrpwinstg-b.materialogic.com "R:/" 20090408020002 2 [sec 986 nkb 13967118 ckb 2239456 kps 2271]
PART taper 100102 36 mlrpwinstg-b.materialogic.com "R:/" 20090408020002 1/1 2 [sec 14.902670 kb 989872 kps 66422.513536]
DONE taper mlrpwinstg-b.materialogic.com "R:/" 20090408020002 1 2 [sec 14.902670 kb 989872 kps 66422.513536]
FAIL dumper mlrpwinapp02-b.materialogic.com "C:/" 20090408020002 0 [return error code 260]
sendbackup: start [mlrpwinapp02-b.materialogic.com:C:/ level 0]
sendbackup: info BACKUP=pkzip
sendbackup: info RECOVER_CMD=Extract with zmanda windows client or unzip program
sendbackup: info end
| Total bytes written: 14420351 (14082KiB, 165751KiB/s)
sendbackup: error [return error code 260]
sendbackup: size 14082
sendbackup: end
FAIL chunker mlrpwinapp02-b.materialogic.com "C:/" 20090408020002 0 [dumper returned FAILED]
FAIL chunker mlrpwinapp02-b.materialogic.com "C:/" 20090408020002 0 [dumper returned FAILED]
FAIL dumper mlrpwinapp02-b.materialogic.com "C:/" 20090408020002 0 [return error code 260]
sendbackup: start [mlrpwinapp02-b.materialogic.com:C:/ level 0]
sendbackup: info BACKUP=pkzip
sendbackup: info RECOVER_CMD=Extract with zmanda windows client or unzip program
sendbackup: info end
| Total bytes written: 14421487 (14083KiB, 169664KiB/s)
sendbackup: error [return error code 260]
sendbackup: size 14083
sendbackup: end

paddy
April 8th, 2009, 01:40 PM
Can you please check Zmanda Windows Client logs?

http://wiki.zmanda.com/index.php/Zmanda_Windows_Client#Log_Files

Paddy

georgeeye
April 13th, 2009, 05:16 AM
The windows client log files on both cluster nodes are larger than 6MB for the most recent ones, is there anything inparticular I should be looking for?

georgeeye
April 22nd, 2009, 05:44 AM
We are still having issues with our windows clusters. Below is the failure summary from amoverview:

FAILURE DUMP SUMMARY:
staging_cluster_activenode "C:/Perl" RESULTS MISSING
production_cluster_activenode "C:/" RESULTS MISSING
planner: ERROR Request to production_cluster_activenode failed: timeout waiting for REP
planner: ERROR Request to staging_cluster_activenode failed: timeout waiting for REP


I could not post the full logs from these boxes before due to thier size. I have used the zwcconfig.exe to reduce the max file size to 1MB in hopes that I could post them here for review. I will have the reduced output tomorrow, and will post the Windows logs then.

Thanks for all the help!

georgeeye
April 24th, 2009, 04:47 AM
Here is the contents of the log file from amanda windows client on an active cluster node. I have trimmed it to only include the sections that reference the C:/ DLE


2512:5160:03:48:550:CZWCJobHandler : Bytes to be written 65536
2512:5160:03:48:550:CZWCJobHandler : Bytes written 65536
2512:5160:03:48:550:CZWCJobHandler : Bytes to be written 4373
2512:5160:03:48:550:CZWCJobHandler : Bytes written 4373
2512:3712:03:48:550:Query:ReadDOSEntries: GetHandle
2512:3712:03:48:550:Query:ReadDOSEntries: Executing Query
2512:3712:03:48:550:SELECT RelativePath FROM ZWCCatalog.ZWCDataObjectSet WHERE DiskListId = 19 AND Backup = 1 ORDER BY `dosentryid` LIMIT 0,5000000 INTO OUTFILE 'C:/Program Files/Zmanda/Zmanda Client for Windows Community Edition/misc//IndexRecords' FIELDS ESCAPED BY '';
2512:3712:03:48:253:Query:ReadDOSEntries: GetHandle
2512:3712:03:48:253:Query:ReadDOSEntries: Executing Query
2512:3712:03:48:253:SELECT RelativePath FROM ZWCCatalog.ZWCDataObjectSet WHERE DiskListId = 19 AND Backup = 1 ORDER BY `dosentryid` LIMIT 5000000,5000000 INTO OUTFILE 'C:/Program Files/Zmanda/Zmanda Client for Windows Community Edition/misc//IndexRecords' FIELDS ESCAPED BY '';
2512:4216:03:48:957:Coming out of CompressBackupJobInThread
2512:5772:03:48:957:SF: Pkzip: StopFormatting: Invalid context
2512:4720:03:48:957:Coming out of Async Thread
2512:3712:03:48:128:zwcUnInitQSnapshot: VSS Service Stopped
2512:3712:03:48:128:Unitialize Snapshot failed 0
2512:3712:03:48:128:CServiceSessionMgr : Disk list Entry = 05C1E980
2512:3712:03:48:128:CZWCJobHandler::ExecuteJob : Job Type = 21
2512:3712:03:48:128:CZWCJobHandler::ExecuteJob : Job Type = 6
2512:3712:03:48:128:ZWC Failure error code: 2
2512:3712:03:48:128:FS: NTFS: GetAttributes: GetAttributes of C:\
2512:3712:03:48:128:FS: NTFS: GetAttributes: GetAttributes is Sucessful
2512:3764:03:48:128:RemoveStaleClient : Last error code is
2512:3764:03:48:128:The operation completed successfully.

2512:3712:03:48:128:2512:CAmandaStateManager : Changing the service state to SRV_SERVER_ACK_WAIT
3764:03:48:128:Communication Layer : Connection closed by server.
2512:3712:03:48:128:CZWCJobHandler::ExecuteJob : Job Type = 21
2512:3712:03:48:128:CZWCJobHandler::ExecuteJob : Job Type = 6
2512:3712:03:48:128:ZWC Failure error code: 2
2512:3712:03:48:128:FS: NTFS: GetAttributes: GetAttributes of C:\
2512:3712:03:48:128:FS: NTFS: GetAttributes: GetAttributes is Sucessful
2512:3712:03:48:128:CAmandaStateManager : Changing the service state to SRV_SERVER_ACK_WAIT
2512:3768:07:14:542:Got error while reading from socket
2512:3768:07:14:542:ZWC Failure error code: 64
2512:3768:07:14:542:RemoveStaleClient : Last error code is
2512:3768:07:14:542:The specified network name is no longer available.

2512:3768:07:14:542:Communication Layer : Connection closed by server.
2512:3768:07:14:542:CServiceSessionMgr : Cancelling the command
2512:3768:07:14:542:CServiceSessionMgr : Cancelling the command
2512:584:07:14:542:CServiceSessionMgr : In connecting state Error 109.
2512:3768:07:14:542:CServiceSessionMgr : Invalid session in response
2512:3768:07:14:542:CServiceSessionMgr : Cancelling the command
2512:3768:07:14:542:CServiceSessionMgr : Cancelling the command
2512:8096:07:14:542:CServiceSessionMgr : In connecting state Error 109.
2512:3768:07:14:542:CServiceSessionMgr : Invalid session in response

Thanks!

paddy
April 24th, 2009, 02:20 PM
2512:3712:03:48:128:CAmandaStateManager : Changing the service state to SRV_SERVER_ACK_WAIT
2512:3768:07:14:542:Got error while reading from socket

What is the dtimeout value in amanda.conf? How big is the backup?

Paddy

georgeeye
April 27th, 2009, 04:46 AM
dtimeout 3600
etimeout 1800

C:/ = 7.95GB
R:/ = 4.57GB