A buffer-cache friendly mirror selection algorithm
There is one question that comes up from time to time when trying to optimize content delivery via mirrors -- especially when really pushing the mirrors to their limits.
For the most capable mirrors, this means that their disk I/O bandwidth is saturated, while their network bandwidth isn't.A busy mirror doesn't have enough RAM to keep all popular files in its buffer cache, so it is constantly loading data from the storage systems. Now the question is:
For the efficiency of the cache effect, it would be good if mirrorbrain [the mirroring system] could for each file request prefer a server to whom this request recently already got redirected. Is that possible?
It could be done. The same has been suggested by Christoph Thiel, my predecessor and former colleague at SUSE, some years ago. The idea was to make better use of the mirrors buffer caches.
I brainstormed a little bit and came up with the following conclusions.
It is principally an old problem to optimize the utilization of caches which are located between users and servers. The goal is to distribute requests into "buckets" equally, close both to the users and to the servers. Ideally in a stable fashion, that if one bucket is added, or one drops out, not all caches are flushed by a redistribution.
Characteristically, the classic problem is about a large number of objects (say, web pages) cached, compared to a relatively small number of caches (intermediaries). In todays mirroring, we often have a few particular files (the hot spots) that are comparably large, but infrequent (let's say 2 OOo files, or 5 Linux iso images), and 1-10 mirrors per country. However, the most obvious approach, a simple hash function that distributes the requests, won't work here. Why not? I'll demonstrate below.
First, there are conditions that must be met.
- The mirrors considered must be in *equal proximity* to the user. It would obviously not make much sense to send all requests to DVD #1 to national mirrors, whereas DVD #2 is served from another continent.
- The mirrors considered must have *equal capacity*. If there are two mirror nearby, but one of them has 10x the bandwidth, the requests can't be split fifty-fifty. If there are two DVDs to be served, it wouldn't be good to serve one from a strong mirror and one from a much weaker one.
- If all the users run after just *one* file, the whole idea doesn't apply.
- The files to be distributed intelligently must be similar in size and popularity. It would not work to send all requests on DVDs to mirror A and all requests to the release notes to mirror B...
Fifth, it is worth asking the question, how many mirrors are really disk-bound in their operation, and how many are more bandwidth-bound. Maybe the situation of some largest mirrors, whose bottleneck is disk throughput, is not representative for everyone. There are mirrors like in India for instance, where sheer bandwidth seems to be the biggest bottleneck. So, the first question is, which percentage of mirrors would benefit from better buffer cache utilization at all? Is it worth putting time into this?
(minus side)
- Currently, mirror selection tries to alternate between available mirrors; this means, users have a certain chance to work around a broken or slow mirror simply by repeating their request. The technique proposed here would avoid that, and mirror selection would be stuck to a one mirror, or to a smaller set of mirrors; users had no chance to get to a better mirror with the next request.
- Assuming the method works as designed, there can still always be requests that "disturb" the cache nevertheless: like people accessing the mirror directly, and "local" requests that are currently rightfully routed to a certain mirror because they originate from the network or the autonomous system of a mirror. Depending on the frequency of such requests, they might kill the whole effort. I am not sure about the effect of these requests. It probably depends on their absolute frequency.
(plus side)
- Even if only a small effect is achieved in the attempt to split up the requests, some mirrors will benefit very much from it.
- Byterange requests (partial GETs), which can reach mirrors in random order, are alleviated, because it is obviously better for mirrors they are limited to fewer files.
- Intermediary proxy caches also benefit from the technique, automatically, and for free.
- The technique should not require too much manual tweaking. Ideally, none. A need to configure manually which files are redirected to which mirror would not be very usable.
- Obviously, mirrors that are considered for splitting requests up between them need to be "grouped": Mirrors of similar location (same country) and similar capacity (same priority) can be put together into a group. Inside such an entity, there is room for the "fanciness".
- A classic mechanism to distribute requests would be to calculate a quick hash of the URL of filename, and take the modulo (remainder) of a division by the number of eligible mirrors. However, this alone will result in uneven distribution among mirrors, up to the extreme that if there is only one really popular file, all requests would be sent to a single mirror, instead of all mirrors. Or when onsidering 2 popular files, and 7 mirrors, only 2 mirrors would get requests. Thus, the usual "hash % number of buckets" algorithm is not suitable. (It would have been easy to just calculate "inode % # mirrors" and be done...)
- With only 1 file, the exercise is pointless.
- With only 1 mirror, the exercise is pointless as well.
- With 2 files, and an even number of mirrors, a perfect distribution is possible.
- With 2 files, and an odd number of mirrors, the requests can be split between n-1 mirrors, leaving one mirror unconsidered, which should receive a fair share of requests on all files.
- With 3 files it's getting a little bit more complicated:
- With 2 mirrors, 30% buffer cache can be saved by splitting 2 files and sending the 3rd file to both mirrors.
- With 3 mirrors, we have the ideal case where every mirror gets requests for one file.
- With 4 mirrors, we need to ensure that mirror #4 does not get 3 times as many requests as the other mirrors. 1/4 (one forth) of the requests needs to go to this "remaining" mirror.
- With 5 mirrors, it's principally the same as with 4 mirrors: split requests between 3 of them, and give the remaining two a fair share of the requests (two fifths = 40%).
- With 6 mirrors, we have an "ideal case" again.
- With 7 mirrors, ... you get the pattern by now:
7 mirrors % 3 files = 1
which can be transcribed by stating that 1 of the 7 mirrors needs to get 1/7 of the requests, and the rest of the requests is distributed round-robin to the other 6 mirrors.
Stable round-robin distribution to a "fitting" number of mirrors is straightforward if the mirrors are internally ordered by some internal id that doesn't change.
So to cast that into pseudo code, the mirror selection could be implemented like this: nF = number of files
nM = number of mirrors
f = # of the file in question
selected = # of the selected mirror (integer between 0 and nM-1) selected_mirror = int( rand(0..1) * (nm/nf) + f * (nm/nf) ) This simple, and light-weight computation does the job. It is not even necessary to take separate care of the remaining mirrors (nR = nM % nF), because they are dealt with automatically.
To see if this pencil-and-paper exercise is workable, I wrote a little script. It is attached to this mail. It runs random "requests" to a defined number of files, and uses the above algorithm to assign them to a defined number mirrors. It appears to work wonderfully to me. Below, you see the output of the attached test script with various parameters - let's start with 1 file. The output is as expected. (Sorry, no colourful diagrams ;-)
% ./ra.py --files 1 --mirrors 1 100000 requests (100%) to mirror0 for file0
% ./ra.py --files 1 --mirrors 2
50095 requests (50%) to mirror0 for file0
49905 requests (49%) to mirror1 for file0
% ./ra.py --files 1 --mirrors 3
33459 requests (33%) to mirror0 for file0
33170 requests (33%) to mirror1 for file0
33371 requests (33%) to mirror2 for file0
50237 requests (50%) to mirror0 for file0
49763 requests (49%) to mirror1 for file1
% ./ra.py --files 2 --mirrors 3
33329 requests (33%) to mirror0 for file0
33381 requests (33%) to mirror1 for file1, file0
33290 requests (33%) to mirror2 for file1
% ./ra.py --files 2 --mirrors 4
25081 requests (25%) to mirror0 for file0
24968 requests (24%) to mirror1 for file0
25054 requests (25%) to mirror2 for file1
24897 requests (24%) to mirror3 for file1
% ./ra.py --files 2 --mirrors 5
19851 requests (19%) to mirror0 for file0
20040 requests (20%) to mirror1 for file0
20008 requests (20%) to mirror2 for file1, file0
20019 requests (20%) to mirror3 for file1
20082 requests (20%) to mirror4 for file1
% ./ra.py --files 2 --mirrors 6
16645 requests (16%) to mirror0 for file0
16808 requests (16%) to mirror1 for file0
16804 requests (16%) to mirror2 for file0
16759 requests (16%) to mirror3 for file1
16554 requests (16%) to mirror4 for file1
16430 requests (16%) to mirror5 for file1 Perfect! Now, how does it cope with 3 files?
% ./ra.py --files 3 --mirrors 1
100000 requests (100%) to mirror0 for file2, file1, file0
% ./ra.py --files 3 --mirrors 2
49806 requests (49%) to mirror0 for file1, file0
50194 requests (50%) to mirror1 for file2, file1
% ./ra.py --files 3 --mirrors 3
33284 requests (33%) to mirror0 for file0
33323 requests (33%) to mirror1 for file1
33393 requests (33%) to mirror2 for file2
% ./ra.py --files 3 --mirrors 4
25095 requests (25%) to mirror0 for file0
25024 requests (25%) to mirror1 for file1, file0
24912 requests (24%) to mirror2 for file2, file1
24969 requests (24%) to mirror3 for file2
% ./ra.py --files 3 --mirrors 5
20026 requests (20%) to mirror0 for file0
20102 requests (20%) to mirror1 for file1, file0
19871 requests (19%) to mirror2 for file1
19991 requests (19%) to mirror3 for file2, file1
20010 requests (20%) to mirror4 for file2
% ./ra.py --files 3 --mirrors 6
16613 requests (16%) to mirror0 for file0
16626 requests (16%) to mirror1 for file0
16598 requests (16%) to mirror2 for file1
16876 requests (16%) to mirror3 for file1
16743 requests (16%) to mirror4 for file2
16544 requests (16%) to mirror5 for file2
% ./ra.py --files 3 --mirrors 7
14251 requests (14%) to mirror0 for file0
14125 requests (14%) to mirror1 for file0
14257 requests (14%) to mirror2 for file1, file0
14447 requests (14%) to mirror3 for file1
14308 requests (14%) to mirror4 for file2, file1
14252 requests (14%) to mirror5 for file2
14360 requests (14%) to mirror6 for file2
And with 4 files? % ./ra.py --files 4 --mirrors 1
100000 requests (100%) to mirror0 for file3, file2, file1, file0
% ./ra.py --files 4 --mirrors 2
49845 requests (49%) to mirror0 for file1, file0
50155 requests (50%) to mirror1 for file3, file2
% ./ra.py --files 4 --mirrors 3
33213 requests (33%) to mirror0 for file1, file0
33212 requests (33%) to mirror1 for file2, file1
33575 requests (33%) to mirror2 for file3, file2
% ./ra.py --files 4 --mirrors 4
24978 requests (24%) to mirror0 for file0
25034 requests (25%) to mirror1 for file1
25137 requests (25%) to mirror2 for file2
24851 requests (24%) to mirror3 for file3
% ./ra.py --files 4 --mirrors 5 20206 requests (20%) to mirror0 for file0
19819 requests (19%) to mirror1 for file1, file0
20070 requests (20%) to mirror2 for file2, file1
20103 requests (20%) to mirror3 for file3, file2
19802 requests (19%) to mirror4 for file3
% ./ra.py --files 4 --mirrors 6
16518 requests (16%) to mirror0 for file0
16533 requests (16%) to mirror1 for file1, file0
16755 requests (16%) to mirror2 for file1
16761 requests (16%) to mirror3 for file2
16602 requests (16%) to mirror4 for file3, file2
16831 requests (16%) to mirror5 for file3
Excellent!
And with 5 files (is anyone still following this far?): % ./ra.py --files 5 --mirrors 1100000 requests (100%) to mirror0 for file3, file2, file1, file0, file4
% ./ra.py --files 5 --mirrors 2
50059 requests (50%) to mirror0 for file2, file1, file0
49941 requests (49%) to mirror1 for file3, file2, file4
% ./ra.py --files 5 --mirrors 3
33404 requests (33%) to mirror0 for file1, file0
33249 requests (33%) to mirror1 for file3, file2, file1
33347 requests (33%) to mirror2 for file3, file4
% ./ra.py --files 5 --mirrors 4
24904 requests (24%) to mirror0 for file1, file0
25022 requests (25%) to mirror1 for file2, file1
25109 requests (25%) to mirror2 for file3, file2
24965 requests (24%) to mirror3 for file3, file4
% ./ra.py --files 5 --mirrors 5
20192 requests (20%) to mirror0 for file0
19836 requests (19%) to mirror1 for file1
20202 requests (20%) to mirror2 for file2
19869 requests (19%) to mirror3 for file3
19901 requests (19%) to mirror4 for file4
% ./ra.py --files 5 --mirrors 6
16880 requests (16%) to mirror0 for file0
16654 requests (16%) to mirror1 for file1, file0
16731 requests (16%) to mirror2 for file2, file1
16593 requests (16%) to mirror3 for file3, file2
16592 requests (16%) to mirror4 for file3, file4
16550 requests (16%) to mirror5 for file4
Fantastic.
To my surprise, it seems to work beautifully for larger number of files even! Or am I shooting over the roof now? See:
20083 requests (20%) to mirror0 for file3, file12, file1, file10, file7, file15, file5, file4, file8, file11, file9, file18, file16, file0, file6, file13, file17, file2, file19, file14
19777 requests (19%) to mirror1 for file32, file23, file27, file35, file36, file34, file39, file38, file28, file29, file24, file31, file22, file30, file20, file21, file26, file33, file37, file25
19823 requests (19%) to mirror2 for file57, file59, file56, file47, file40, file41, file42, file43, file44, file45, file46, file58, file48, file49, file55, file54, file53, file52, file51, file50
20059 requests (20%) to mirror3 for file75, file63, file73, file67, file65, file61, file64, file66, file74, file77, file76, file62, file70, file60, file72, file71, file79, file78, file68, file69
20258 requests (20%) to mirror4 for file87, file94, file80, file82, file96, file93, file99, file98, file88, file89, file86, file92, file84, file85, file91, file90, file97, file81, file95, file83
It seems fair and efficient.
When mirrors drop out or are added, the distribution changes -- it is not stable against such changes, but I suppose we can live with that. (After all, we live with complete entropy at present!) This all works only under a further condition: files that should be treated this way need to be indexed in some way. Maybe one would want to apply the special handling only to certain "hotspots" in the tree, like freshly released DVDs of a new distro. Practically, these files could be (automatically) assigned an integer as index during configuration. That would be the missing number "f" in the formula. Or it could be the n-th file inside a directory. That's an implementation-specific detail I guess. I don't know for other redirectors, but for MirrorBrain I would have a rough idea how to solve it in an easy way. Tagging only certain "hotspot" files for this extra treatment would allow to stick to a simpler mechanism for the mass of other files. Because, not to forget, for the buffer-cache-friendly request splitting the mirrors need to be grouped in a sensible way. Which is not a big task in itself, but the little things add up. In the picture that forms in front of my eye, I would expect the code to be not very costly, though.
Would be fun to implement. Anyone wants to fund a little bit of development, for the good of everybody?
