Page 2 of 2 FirstFirst 12
Results 11 to 17 of 17

Thread: tape spanning issue with large DLE

  1. #11
    Join Date
    Aug 2008
    Location
    Sunnyvale, CA
    Posts
    306

    Default

    I assume you are configuring Amanda for tape blocksize 256k because you know this is what your drive uses? It makes sense that being in the tapetype definition is enough.

  2. #12

    Default

    Thanks Dustin. I'll do some more investigation.

    Attached in the taper debug log with some of the middle snipped out to reduce its size. One reoccurring entry at the failure point strikes me as something I can dig into tonight.

    taper: changer_find: looking for NULL changer is searchable = 1
    "/dev/nst0" uses deprecated device naming convention;

    I thought I had completely eliminated the "/dev/nst0" entries in favor of "tape:/dev/nst0". Just a nit but might lead me to something. The "looking for NULL ..." is something I'll look into also but don't know quite how to interpret it at the moment.

    I'll post anything I find over the weekend.

    Cheers,

    Mike
    Attached Files Attached Files

  3. #13

    Default

    Paul,

    I looked at the manufacturers web page and they suggested 256 and other posts I read in the forum suggested improved performance. As I mentioned, 32K buffers actually was a bit better performance but I could clearly hear much more noise from the drive so I opted for the 256. It appears either work and if all else fails I would certainly consider trying 32k buffers. Seems like an easy thing to do now since I apparently don't have to re-compile amanda. Remove one entry in modprobe.conf, change the tapetype, reboot the computer and of course relabel the tapes.

    I'll look into Dustin's suggestions and if necessary try 32k buffers too although I might have to compete for bandwidth this weekend.

    I appreciate the help and suggestions.

    Mike

  4. #14

    Default

    Got some very interesting results that makes me believe we're definitely in the hunt but more tuning is required.

    I did some investigation into the suggestions Dustin made and found what looked to be a smoking gun. I had the initial_poll_delay set to 20 seconds and the max_drive_wait set to 60 seconds. To validate these numbers I timed the robot to see how much time it actually takes to change a tape. Much to my horror I discovered that it takes about 60 seconds just to change tapes plus another 30 to 35 before the lcd panel on the front reads ready (threading and getting to load point). If amanda was waiting the 60 plus 20 it's entirely possible the tape wasn't ready since I clocked it 5 times in a row to be ~90+ seconds. This could account for the system error of the tape not ready.

    So I started a run with only those two parameters changed: initial_poll_delay from 20 to 60 and max_drive_wait from 60 to 120.

    The first observation is that the time to write a tape went up dramatically. Using the original parameters it took 5.3 hrs to write out one full tape. I looked in the amdump log and calculated the time from the first block until I saw the Request New Tape entry. Using the same method with these new parameters writing one tape took 11.1 hrs.

    However, it did successfully transition from the first tape to the second where it proceeded to write the remaining portion of the second DLE entry. When it filled the second tape it crashed with the same Kernel error message for each of the next three tapes left in runtapes.

    So digging into the logs again looking carefully at the transition points where tapes were being changed. From one to two here's the entry I focused in on:

    driver: result time 39629.482 from taper: PARTDONE 00-00002 MCNET-001 183 2097152 "[sec 213.594621 kb 2097152 kps 9818.374593]"
    driver: state time 39848.753 free kps: 798976 space: 36700160 taper: writing idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: not-idle
    driver: interface-state time 39848.753 if default: free 798976
    driver: hdisk-state time 39848.753 hdisk 0: free 36700160 dumpers 0
    driver: result time 39848.870 from taper: REQUEST-NEW-TAPE 00-00002
    driver: send-cmd time 39848.870 to taper: NEW-TAPE
    taper: using label `MCNET-002' date `20081111162944'
    driver: state time 39988.694 free kps: 798976 space: 36700160 taper: writing idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: not-idle
    driver: interface-state time 39988.694 if default: free 798976
    driver: hdisk-state time 39988.694 hdisk 0: free 36700160 dumpers 0
    driver: result time 39988.694 from taper: NEW-TAPE 00-00002 MCNET-002
    driver: state time 40035.641 free kps: 798976 space: 36700160 taper: writing idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: not-idle
    driver: interface-state time 40035.641 if default: free 798976
    driver: hdisk-state time 40035.641 hdisk 0: free 36700160 dumpers 0
    driver: result time 40035.642 from taper: PARTDONE 00-00002 MCNET-002 1 1933829 "[sec 44.103476 kb 1933829 kps 43847.554740]"
    driver: state time 40248.174 free kps: 798976 space: 36700160 taper: writing idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: not-idle
    driver: interface-state time 40248.174 if default: free 798976
    driver: hdisk-state time 40248.174 hdisk 0: free 36700160 dumpers 0
    driver: result time 40248.309 from taper: PARTDONE 00-00002 MCNET-002 2 2097152 "[sec 211.333224 kb 2097152 kps 9923.437311]"

    and the from the second to the third tape:

    driver: result time 63047.842 from taper: PARTDONE 00-00002 MCNET-002 108 2097152 "[sec 212.410113 kb 2097152 kps 9873.126898]"
    driver: state time 63097.127 free kps: 798976 space: 36700160 taper: writing idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: not-idle
    driver: interface-state time 63097.127 if default: free 798976
    driver: hdisk-state time 63097.127 hdisk 0: free 36700160 dumpers 0
    driver: result time 63097.127 from dumper0: DONE 00-00002 293894550 293894550 30321 "[sec 30320.848 kb 293894550 kps 9692.8 orig-kb 293894550]"
    driver: finished-cmd time 63097.127 dumper0 dumped MCS1:/export/M-P
    Kernel gave unexpected write() result of "Invalid argument" on device /dev/nst0.
    driver: state time 63179.363 free kps: 798976 space: 36700160 taper: writing idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: not-idle
    driver: interface-state time 63179.363 if default: free 798976
    driver: hdisk-state time 63179.363 hdisk 0: free 36700160 dumpers 0
    driver: result time 63179.363 from taper: REQUEST-NEW-TAPE 00-00002
    driver: send-cmd time 63179.363 to taper: NEW-TAPE
    taper: using label `MCNET-003' date `20081111162944'
    driver: state time 63319.975 free kps: 798976 space: 36700160 taper: writing idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: not-idle
    driver: interface-state time 63319.975 if default: free 798976
    driver: hdisk-state time 63319.975 hdisk 0: free 36700160 dumpers 0
    driver: result time 63319.975 from taper: NEW-TAPE 00-00002 MCNET-003
    Kernel gave unexpected write() result of "Invalid argument" on device /dev/nst0.
    driver: state time 63343.367 free kps: 798976 space: 36700160 taper: writing idle

    It appears to me that the amount of time is about the same, ~200 seconds. The only thing I can figure is that I'm on the ragged edge possibly.

    From this I'm thinking max_drive_wait needs to get pushed to 180 seconds and initial_poll_delay set back to 20 or even less to speed up the process.

    Suggestions and thoughts always welcome.

    Thanks to all,

    Mike

  5. #15
    Join Date
    Aug 2008
    Location
    Sunnyvale, CA
    Posts
    306

    Default

    Sounds like a great guess to me. I'm interested to see the [positive] results.

  6. #16

    Default

    Success!

    The complete 1.6T run spanning 5 tapes finished normally. I haven't had a chance to check the results and try a restore on a couple of files yet but I'd be surprised if there was a problem.

    Notes to take away:

    The error posted in the amdump log was:

    Kernel gave unexpected write() result of "Invalid argument" on device /dev/nst0

    This was caused because the max_tape_wait value in the changer.conf file was incorrectly set to a value that was insufficient to allow the robot to unload a tape, store it, fetch the next tape, load it and allow it to get to load point thus resulting in a ready condition not being observed prior to expiration of the max_tape_wait. Neither amlabel or amtape ever complained despite my frequent use of the "slot n" arguments that would cause the changing of tapes prior to labeling or reset.

    It is worthy of note that the error posted in the client log file was a broken pipe error. I believe this was caused because the client lost contact with the server when the server stopped the process after exhausting all available tapes. Obviously a symptom but not the root cause.

    One thing I still don't understand is the additional time to write a tape. Here's some data from three successive runs. The only thing changed were these setting in changer.conf. Note that the runs on 1105 and 1111 both failed but the last run succeeded.

    Run on date 20081112 poll=15, max tape = 180 (just finished successfully)
    Run on date 20081111 poll=60, max tape = 120 (failed)
    Run on date 20081105 poll=20, max tape = 60 (failed)

    PART taper MCNET-005 9 MCS1 /export/ 20081112155621 9/-1 0 [sec 218.504344 kb 2097152 kps 9597.758844]
    PART taper MCNET-001 9 MCS1 /export/ 20081111162944 9/-1 0 [sec 223.447785 kb 2097152 kps 9385.423087]
    PART taper MCNET-001 9 MCS1 /export/ 20081105192309 9/-1 0 [sec 105.692900 kb 2097152 kps 19841.938295]

    note that when max_tape_delay was set to 60 the time to write a 2G block (which was set with the tape_splitsize setting in amanda.conf) was 105s and the speed was 19842kps. From then on the time to write the same size block took ~220s at ~9500kps. I'll play with things a bit and if I make any discoveries I'll post accordingly. This makes the time to write a single LTO3 tape ~8.5 hours compared to 5.3hr. This is a significant increase and doesn't seem to make sense so I'll dig some more to see what I can figure out but hey, it worked so I'm not unhappy.

    Thanks to Paul and Dustin for extremely helpful comments and suggestions. Obviously couldn't have gotten here without your help.

    I noticed a few nits but will post that in the suggestion box.

    Regards and appreciation to all.

    Mike
    Last edited by Mike McConnell; November 14th, 2008 at 07:09 PM.

  7. #17
    Join Date
    Aug 2008
    Location
    Sunnyvale, CA
    Posts
    306

    Default

    That's great! I don't know if Dustin or Jean-Louis have any insight into the change in speed.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •