In my previous blog post, I wrote about a bottleneck I ran into that caused my application to pause when I felt it should’ve been working hard.
My workers were all stuck waiting for data from one source while other sources of the same data were available.
This is how I fixed it.
First, it’s good to see exactly what the problem looked like.
There are four workers and they all need a mix of data of type
b data sets are replicated so any given record
can come from one of two of these servers.
If there are equal tasks for
b and servers are all generally
available, but one of the servers is slower than the others, then you
will invariably end up with all of the workers stuck retrieving data
a rep 1 even though
a rep 2 contains the same data.
But why is it blocked on that node? Because that node is slow.
When randomly selecting workers and one worker takes considerably longer to complete its work, you will inevitably be blocked waiting for slower workers to complete their tasks.
If this isn’t intuitive to you, think about it this way:
Imagine just two servers containing the same information from which you randomly choose for any given request. Let’s say one server is 8x slower than the other.
Assume a reasonable random number generator such that there’s a 50% chance that any given request will be issued against the fast one, and a 50% chance it will be issued against the slow one.
Now imagine you’ve got two clients that are wanting to grab a bunch of data from the two servers.
The scenario illustrated above will occur frequently:
If you can express your work in the form of
pairs, you can use go-saturate to help resolve this
type of problem with a double-fanout as illustrated and discussed
Internally, the resources are each represented by a channel that has one or more goroutines servicing it (user-specified). A resource is “available” when a worker is idle and just waiting for new work on a channel. If the resource is slow, the worker spends more time working and less time accepting new work (see the red in the sequence diagram above).
The diagram to the right represents a producer feeding into two tiers
of workers. The first tier workers, e.g.
worker 1 are higher level
(e.g. copy a file from the internet to local disk) while the second
tier workers, e.g.
a rep 1 are lower-level (e.g. read this file from
It’s important to understand that the first-tier workers are in the same process as the second tier workers, the distinction being that the second tier workers are operating on a resource identified by the first tier workers and the duration of that work isn’t necessarily predictable.
In the illustrated case, workers 1 and 3 are concurrently performing
tasks that need data from
Let’s break down the specific example shown in that diagram.
a rep 1and
a rep 2workers as being possible sources of
selectfrom worker 1 chooses
a rep 2
selectfrom worker 3 chooses
a rep 1
Note that steps 4 and 5 can occur at approximately the same time, but
a rep 1 worker can only pick up one job and the first-tier
worker 1) can only have one worker pick up its task.
By asking for one of either and choosing the most available resource,
they’re able to do their work without blocking on each other. For
a rep 1 is slow, tasks requiring
a will continue to
a rep 2 as fast as that replica can keep up.
In a sense, the resources are self-selecting.
Here is a concrete example that illustrates go-saturate in the real world. It’s being used in the cbfs client code. (cbfs is a distributed blob store for Couchbase Server).
As you can see in the right (a cbfs scenario where one server has 100Mbps ethernet and the others have gig-e), slow resources are still used, but when a slow resource is being slow and a fast resource is available, we won’t choose the slow one.
In long-running tasks, this is great – if I can keep all the fast nodes and the slow nodes busy, then I get a little more out of the network as a whole.
You can read the go-saturate docs to get an overview of the API.
Also, I gave a presentation at work earlier this week on this topic which goes a lot further into basic go channel semantics interactively including a range of information from basics like how to send and receive on a channel to how send a message to exactly one of a dynamically built set of recipients.
That presentation is captured in the following 20 minute video: