December 18th, 2012, 03:47 AM
Limit disk read speed on a per-client basis
Although Amanda has an overall network bandwidth throttle config option, it's quite surprising that there isn't a per-client disk read throttle setting too. In our situation, we're using the same SAN for almost all our Amanda clients. If you can't wrap your head around that, think of a setup where all clients mount unique NFS disks from the same NFS server, so there's a lot of "disk read" contention (which uses the network behind the scenes) amongst the (NFS) clients and the NFS server, particularly if multiple parallel reads are taking place.
Hence, if we set inparallel above 1 (which we will need to because there's 20+ clients), the SAN goes berserk contending for disk reads and we start to see timeouts or very long dump times for the clients. And yet the "overall network bandwidth" is quite low between the Amanda server and all the clients combined - now can you see why an overall network throttle is of no use to us?
What I propose is that each Amanda client can be "disk read limited" - a value in Kbytes/sec that no single second's reading could exceed. This applies before any client-side encryption/compression - it's purely the rate of reading from whatever "disk" medium the client is backing up. Essentially, it caps the amount one read operation can fetch in a second, sleeping for the rest of the second if the data arrives faster. The first read would have to run at the full speed limit, with subsequent reads scaled down as needed (i.e. if the previous read took more than a second) - you'd always look at the speed/size of the previous read and adjust, since reads will slow as more clients come in and contend. Remember, it needs to be per-client - some DLEs won't be using the SAN and will need maximum read speed.
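To make the proposal concrete, here's a minimal Python sketch of the adaptive limiter described above (the function name and parameters are hypothetical - Amanda has no such option today): read in chunks, compare actual elapsed time against the time the bytes copied so far *should* have taken at the limit, and sleep off any surplus.

```python
import time

def throttled_read(src, dst, limit_bps, chunk=64 * 1024):
    """Copy src to dst, capping the read rate at limit_bps bytes/sec.

    Hypothetical sketch of the proposed per-client "disk read limit":
    never read more than limit_bps in one go, and if reads complete
    ahead of the allowed rate, sleep away the remainder of the second.
    """
    copied = 0
    start = time.monotonic()
    while True:
        data = src.read(min(chunk, limit_bps))
        if not data:
            break
        dst.write(data)
        copied += len(data)
        # Time this many bytes *should* have taken at the limit,
        # versus time actually elapsed; sleep off any surplus.
        expected = copied / limit_bps
        elapsed = time.monotonic() - start
        if expected > elapsed:
            time.sleep(expected - elapsed)
    return copied
```

Because the pacing is based on the cumulative total rather than a single read, a slow read (e.g. a contended SAN) automatically earns back time for later reads, which is roughly the self-adjusting behaviour described above.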
I've just had a thought - can client-side values be set on the Amanda server and then passed through to the client? It would be much easier to set up client values that I propose on the central server than having to change each value on the client (and then subsequently forget which client has which values).
December 18th, 2012, 04:29 AM
You can set the interface (in the disklist) for all clients of the SAN to a specific interface that has a low 'use' value.
You can try using ionice on the Amanda client - search Google for 'amanda ionice'.
You can also run a 'pv' script to limit a client's bandwidth. To limit the rate to 300k/sec, create a small script, /usr/bin/pv-300k, containing:
    #!/bin/sh
    exec /usr/bin/pv -L300k -q
then in the dumptype set:
    compress client custom
Reduce inparallel - why can't you set it to 1?
You could set 'estimate server' in the dumptype.
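Tying the 'pv' pieces together, a dumptype using the wrapper might look like the sketch below (untested; `client_custom_compress` is the dumptype parameter that names the custom client-side filter program, and the dumptype name is made up here):

```
define dumptype throttled-client {
    global
    compress client custom
    client_custom_compress "/usr/bin/pv-300k"
}
```

The wrapper script itself must be executable (chmod +x /usr/bin/pv-300k) and pv must be installed on the client for this to work.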
December 18th, 2012, 04:48 PM
It's a single SAN (OK, 2 SANs with one of them on standby as an automatic failover using drbd and heartbeat) presented as a set of volumes to the clients that ultimately use the same RAID container. I guess we just didn't expect this much contention on just reads otherwise we might have split it into two or more separate RAID containers (but then may have "lost" some disk capacity).
Originally Posted by martineau
> You can try to use ionice on the amanda client, search on google for 'amanda ionice'
Apologies that I wasn't more specific about the clients - they are a mix of both Windows and Linux clients, so I'm not sure how easy it would be to do an ionice on the Windows clients. Ditto with the "pv" script I guess.
We've also been having timeouts on the estimate pass on Windows clients - one culprit might be directories containing hundreds of thousands of files that take a long time to scan on Windows (maybe NTFS is particularly poor in this regard). Further investigation is needed (e.g. purge any unwanted files, or even skip the dir they're in with an exclude option to see if that's the issue).
> Reduce inparallel, why you can't set it to 1?
Yes, that's what we're trying at the moment. Interestingly, the Linux client is massively quicker and more reliable (no pauses, no timeouts, fast estimates [I left the default "estimate client" on for those and it estimates in seconds] and constant read speeds from the SAN). The Windows client is virtually the exact opposite (even in the same conditions with inparallel=1).
Ultimately, though, you really want inparallel=2 if at all possible, because if any client "stalls" mid-backup (which the Windows client seems to do regularly, eventually leading to a timeout on some occasions), you'd want another client running its dump too. It would have been nice if Amanda offered per-dumptype parallelism options as well (e.g. inparallel=3, inparallel-<dumptype_name1>=1, inparallel-<dumptype_name2>=2).
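For what it's worth, Amanda does offer two related knobs, though neither is per-dumptype: `maxdumps` (a dumptype parameter) caps the number of parallel dumps on a single host, and the spindle column in the disklist serializes DLEs on the same host that share a positive spindle number. Neither helps with *cross-host* SAN contention, which is the problem here, but a sketch (hostnames and dumptype name hypothetical):

```
# disklist: host  disk  dumptype  spindle
sanclient1  /data1  comp-user-tar  1    # same spindle number:
sanclient1  /data2  comp-user-tar  1    # never dumped in parallel
sanclient1  /local  comp-user-tar  -1   # -1 = no restriction (default)
```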
> You could set 'estimate server' in the dumptype.
Yes, done that now too - "estimate client" on the Windows client is appallingly slow (and it's the default - it really does put Amanda in a bad light on the Windows platform) and for some reason it wouldn't even run the faster "estimate calcsize" option on the Windows clients either.
We are doing some general housekeeping on the Windows servers - identifying trees that can be excluded or purged, since on Windows it does seem important not to keep thousands of files in one dir. Our clients are mostly VMs (using proxmox+kvm) with virtio disk/network - maybe they need the latest versions of the virtio disk and network drivers to improve performance? I can't believe I'm seeing only 1 Mbyte/sec compressed (which could be 2-5 Mbytes/sec uncompressed) from the Zmanda Windows client, even with inparallel=1. I'm seeing at least 7-10 Mbytes/sec compressed from the equivalent Linux clients, so the Windows client side is the bottleneck when inparallel=1.