
Thread: Virtual tapes and Amazon S3 resource efficiency questions

  1. #1

    Virtual tapes and Amazon S3 resource efficiency questions

    Dear Amanda community,

    I would like to make backups to disk and to Amazon's S3 service. I have a few questions regarding tape size and its possible impact on resource usage and the simplicity of restore operations.

    1. For local vtapes: Should I define my virtual tapes so big that the (virtual) tape length is never reached? This would be more resource-efficient since no chunk would be duplicated when the (arbitrary) end of a vtape is hit. A single vtape would be used per backup run. Of course I would need to find a good oversubscription factor, based on the average amount of data actually written per tape.

    2. S3 has a file-size limit of 5GB so I assume I should use a corresponding tape_splitsize? However, that should not affect the considerations in (1) even if the same tape_splitsize is used locally.

    3. Will restore operations work on the level of a) backup runs, b) tapes, c) DLEs on a tape or d) chunks when using tape_splitsize? I am wondering how much data Amanda would need to pull from S3 when a file needs to be restored. This has a direct impact on the time needed for recovery and the cost incurred. If I want to keep the data transfer volume low, should I use a small tape_splitsize? Should I use short vtapes (contrary to (1) above)?

    4. Let's assume the risk of losing a vtape is zero. Then one could choose a rather long dumpcycle, because that keeps traffic low (more incrementals are done per full backup). But I assume Amanda would have to keep all tapes since the last level 0 backup? Since disk space on S3 has to be paid for, there might be a cost-optimal balance between transferring a lot of data (short dumpcycle) and storing a lot of "outdated" incremental data (long dumpcycle). Which tapes/files/chunks would Amanda need to transfer (as in the preceding question) when a long dumpcycle is used and the data is spread widely over backup runs?

    I hope I can come up with a good strategy once I understand the way Amanda works :-)

    Kind regards
    Matthias

  2. #2
    Join Date: Mar 2007 | Location: Chicago, IL | Posts: 688


    First, you'll find better answers to questions like these on the amanda-users mailing list.

    Quote Originally Posted by mpdude
    1. For local vtapes: Should I define my virtual tapes so big that the (virtual) tape length is never reached? This would be more resource-efficient since no chunk would be duplicated when the (arbitrary) end of a vtape is hit. A single vtape would be used per backup run. Of course I would need to find a good oversubscription factor, based on the average amount of data actually written per tape.
    It sounds like you've already read the appropriate article. This is one strategy, but amounts to lying to Amanda about how much space is available, so it can lead to surprises when you're caught in your lie.
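
    If you do go that route, it generally just means an inflated length in the tapetype definition. A rough sketch, where the name, length and path are purely illustrative and not recommendations:

        define tapetype OVERSIZED-VTAPE {
            comment "deliberately oversubscribed virtual tape"
            length 500 gbytes   # far more than any single run is expected to write
        }
        tapetype OVERSIZED-VTAPE
        tapedev "file:/space/vtapes"    # plus whatever vtape changer setup you already use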

    Quote Originally Posted by mpdude
    2. S3 has a file-size limit of 5GB so I assume I should use a corresponding tape_splitsize? However, that should not affect the considerations in (1) even if the same tape_splitsize is used locally.
    No, the S3 device writes one S3 object per block, with a default blocksize of 10M, so there is no need to use a split size. Also, note that the S3 device does not impose a hard length -- it will keep writing data as long as Amanda sends data.
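
    For what it's worth, a minimal sketch of pointing a config at S3; the bucket name and credentials below are placeholders, not real values:

        tapedev "s3:my-backup-bucket/backups-"
        device_property "S3_ACCESS_KEY" "YOUR-ACCESS-KEY"
        device_property "S3_SECRET_KEY" "YOUR-SECRET-KEY"
        # no tape_splitsize needed here: each ~10M block is written as its own S3 object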

    Quote Originally Posted by mpdude
    3. Will restore operations work on the level of a) backup runs, b) tapes, c) DLEs on a tape or d) chunks when using tape_splitsize? I am wondering how much data Amanda would need to pull from S3 when a file needs to be restored. This has a direct impact on the time needed for recovery and the cost incurred. If I want to keep the data transfer volume low, should I use a small tape_splitsize? Should I use short vtapes (contrary to (1) above)?
    Restore operations currently pull all dumpfiles containing files that are required, in their entirety. So if you need two files from a particular DLE, and one of those files is on a full backup on 7/29 and the other on an incremental on 8/1, then Amanda will pull the entire full backup of that DLE and the entire incremental. If you want to minimize this time/transfer cost, use small DLEs.
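
    To illustrate, carving one big filesystem into several smaller DLEs in the disklist could look like this (hostname, paths and dumptype are just examples):

        # one entry per subtree instead of a single giant /home DLE, so a
        # restore only has to fetch the dumps for the subtree that is needed
        client1  /home/alice  /home/alice  comp-user-tar
        client1  /home/bob    /home/bob    comp-user-tar
        client1  /srv/www     /srv/www     comp-user-tar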

    Quote Originally Posted by mpdude
    4. Let's assume the risk of losing a vtape is zero. Then one could choose a rather long dumpcycle, because that keeps traffic low (more incrementals are done per full backup). But I assume Amanda would have to keep all tapes since the last level 0 backup? Since disk space on S3 has to be paid for, there might be a cost-optimal balance between transferring a lot of data (short dumpcycle) and storing a lot of "outdated" incremental data (long dumpcycle). Which tapes/files/chunks would Amanda need to transfer (as in the preceding question) when a long dumpcycle is used and the data is spread widely over backup runs?
    Yes, this is a tradeoff. Amanda can usually figure out when a single-file recovery only requires the latest incremental to capture that file, but when restoring more than a few files, Amanda will generally end up downloading the full backup (level 0) and any intervening incrementals (level 1, level 2, ...).
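
    As a rough illustration of the knobs involved in that tradeoff (the numbers are placeholders to tune against your own change rate and S3 pricing, not recommendations):

        dumpcycle 14 days     # longer cycle = fewer level 0 uploads, more stored incrementals
        runspercycle 14       # one amdump run per day
        tapecycle 30 tapes    # enough tapes to retain everything back to the oldest level 0 still needed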
