Posts Tagged ‘monospace’
What’s asynchrony good for?
In my previous post, "Tasked to get Results", I covered the use of DReAM's Result and the TPL's Task as wait handles for asynchronously completed work. However, all of the examples blocked to await the completion of the other task, which made the async illustration rather academic.
Blocking on async work does have its applications. The canonical example is a controlling thread that fires off a number of workers and waits for all of them to complete so it can gather the results. For this scenario, Task and Result can be thought of as convenient alternatives to WaitHandle.WaitAll(), Thread.Join() or EndInvoke() calls:
// Summing up Results
var results = new List<Result<int>>();
for(var i = 0; i < 100; i++) {
    results.Add(SomeAsyncWork(i));
}
var total = results.Sum(x => x.Wait());

// Summing up Tasks
var tasks = new List<Task<int>>();
for(var i = 0; i < 100; i++) {
    tasks.Add(SomeMoreAsyncWork(i));
}
var taskTotal = tasks.Sum(x => x.Result);
Both rely on implicit blocking behavior to perform the summation. We could just as easily have explicitly waited on all the handles with the C# 5.0 Async CTP's TaskEx.WhenAll() or DReAM's Result.Join(), and then summed the results. One interesting feature of the join constructs in TPL and DReAM is that both return synchronization handles themselves. I'll explain why this is useful in the next article. Either way, though, our main thread was blocked until the workers completed.
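Here is a minimal sketch of the join idea on the TPL side only (DReAM's Result.Join is not shown), using Task.WhenAll, which is what the Async CTP's TaskEx.WhenAll became in .NET 4.5. SomeMoreAsyncWork is a hypothetical stand-in that squares its input on the thread pool. Note that WhenAll itself does not block: it hands back yet another handle, and we block only once, on that handle.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for SomeMoreAsyncWork: squares its input on the thread pool.
static Task<int> SomeMoreAsyncWork(int i) => Task.Run(() => i * i);

// Fire off 100 workers without waiting on any of them yet.
Task<int>[] tasks = Enumerable.Range(0, 100)
    .Select(i => SomeMoreAsyncWork(i))
    .ToArray();

// Task.WhenAll does not block; it returns a new handle that completes once
// every worker has completed, carrying all of their results in input order.
Task<int[]> all = Task.WhenAll(tasks);

// Only here does the main thread block, once, on the combined handle.
int total = all.Result.Sum();
Console.WriteLine(total); // 328350
```

Because the joined result is itself a Task, it can be passed around, joined again or given a continuation, just like any of the individual worker handles.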
Finding opportunities for asynchrony
The parallel worker example is a great way to show the power of Task and Result for managing many workers, but in itself it is not asynchronous. Asynchrony implies that we do not block while waiting for work to complete, whereas the above example explicitly blocks the main thread. Asynchronous behavior may be combined with parallelism to manage resources better, but asynchronous behavior does not mean that we are performing the work in parallel. Quite the opposite: the primary goal of asynchrony is to suspend the caller, without tying up its thread, until the asynchronous operation has completed. This means that most asynchronous workflows are inherently serial, even if the individual steps may execute on different threads.
To properly show the benefits of asynchrony, let's use something that we deal with every day and that is inherently asynchronous: I/O. Whenever we are reading from or writing to a file, making a database call or calling a web service, we are making an I/O request and waiting for the response. We don't think of I/O as asynchronous because most I/O APIs expose these operations as synchronous method calls, i.e. they block the thread while waiting for the out-of-context operation to complete:
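A minimal sketch of that serial-but-non-blocking shape, assuming a hypothetical three-step workflow (fetch, transform, store) chained with Task.ContinueWith. Each step starts only once the previous one has finished, yet the steps may run on different thread pool threads, and no thread is parked between them. The final .Result is there only so the demo can print an outcome.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Records the order in which the (hypothetical) steps ran.
var steps = new ConcurrentQueue<string>();

void Step(string name)
{
    steps.Enqueue(name);
    // Thread ids vary from run to run; steps may or may not share a thread.
    Console.WriteLine($"{name} on thread {Environment.CurrentManagedThreadId}");
}

// Each continuation is scheduled to run only after its antecedent completes,
// so the workflow is strictly serial without anyone blocking in between.
Task<int> workflow = Task.Run(() => { Step("fetch"); return 21; })
    .ContinueWith(prev => { Step("transform"); return prev.Result * 2; }) // prev is done: no blocking
    .ContinueWith(prev => { Step("store"); return prev.Result; });

// Task.Run and ContinueWith returned immediately; we wait here only for the demo.
Console.WriteLine(workflow.Result); // 42
```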
var command = new SqlCommand(
    "SELECT CategoryID, CategoryName FROM Categories;",
    connection);
// Blocking!
connection.Open();
// Blocking!
var reader = command.ExecuteReader();
// Blocking!
while(reader.Read()) {
    // read row
}
reader.Close();
Most of the time, we don't even interact with a DB at this low level, but that doesn't change the fact that in order to get data back from the database, we have to:
- Open a socket to the DB and wait for the connection to be established
- Send our SQL and wait for a cursor to become readable
- Iterate over the cursor and wait for data to be sent
Each operation sends data over the wire and waits on a response. That waiting time is blocking the current thread.
Why so slow, I/O?
Most I/O operations are quite fast, and much of our optimization work goes into reducing the time of database queries, file reads and web service calls to single-digit milliseconds. However, compared to in-process memory access, which is measured in nanoseconds, even the best optimized I/O operation is orders of magnitude slower. Every time we treat I/O as synchronous, we rely on thread scheduling to put the CPU time left idle by the blocked thread to use, which in turn costs us thread context switches and the memory footprint of every extra thread.
But we've been doing this for ages and it hasn't been a problem, right? What's changed?
For one thing, it has actually been a problem; for the most part, we've just been throwing larger and larger machines at our server clusters to make up for the wasted resources. And all optimization of I/O aside, in the real world we will always encounter the occasional slow DB query, a file read/write blocked by a lock, or a web service request over a congested network.
For another, on top of those real world concerns, we've received a wake-up call in the form of evented I/O: node.js and Python's Tornado web server have suddenly shown us that a single server can handle thousands of simultaneous connections, a load that would make traditional servers fall over and die. This incredible capacity is not due to some magic optimization, nor have Python and JavaScript secretly gotten faster than other languages. What they do is treat I/O as non-blocking. It is easy to keep thousands of sockets open at the same time when each isn't tied to a process or thread. When node.js boasts 100k+ connections at once, it is still only processing one request at a time, but every time an I/O operation is required, it schedules a callback for completion rather than blocking until the I/O has completed. That this one-at-a-time model ends up serving more requests per second shows how much overhead we incur when we rely on thread management. But there's nothing about node.js performance that couldn't be replicated in C# (and Manos de Mono is doing exactly that), as long as we break our addiction to blocking.
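To get a feel for this in C#, here is a rough sketch. It is not a model of how node.js or Manos de Mono work internally: 10,000 hypothetical "connections" each wait about 100 ms, with Task.Delay standing in for the I/O wait. Because Task.Delay completes from a timer, no thread is tied up per pending wait; doing the same with Thread.Sleep would require 10,000 blocked threads, each with its own stack.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for 10,000 open connections, each waiting ~100 ms on I/O.
const int connections = 10_000;
var clock = Stopwatch.StartNew();

// No thread is parked per wait: the timer completes the delay, and the
// continuation just reports which "connection" finished.
Task<int>[] pending = Enumerable.Range(0, connections)
    .Select(id => Task.Delay(100).ContinueWith(_ => id))
    .ToArray();

// Block once, for the demo only, until every simulated wait has completed.
int[] finished = Task.WhenAll(pending).Result;
clock.Stop();

Console.WriteLine($"{finished.Length} waits completed in {clock.ElapsedMilliseconds} ms");
```

On a typical machine the total elapsed time stays close to a single 100 ms wait rather than growing with the number of connections, since the waits overlap instead of each occupying a thread.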
…and after you are done with that, could you do this?
Using an asynchronous approach lets us mitigate this speed differential between in-process and I/O operations and stop blocking our threads. What we want is to be notified when the asynchronous operation has completed. While most I/O APIs provide some type of asynchronous calling convention (ADO.NET only provides it on the Execute* members, which is why connection.Open() below still blocks), Task and Result provide a unifying interface for all operations and a much simpler way to handle the continuation of work in a non-blocking form:
// Result
connection.Open();
Async.From(command.BeginExecuteReader,
           command.EndExecuteReader,
           null,
           new Result<SqlDataReader>())
    .WhenDone(
        result => {
            var reader = result.Value;
            while(reader.Read()) {
                // read row
            }
            reader.Close();
        }
    );
// Task
connection.Open();
Task.Factory.FromAsync(command.BeginExecuteReader,
                       command.EndExecuteReader, null)
    .ContinueWith(
        task => {
            var reader = task.Result;
            while(reader.Read()) {
                // read row
            }
            reader.Close();
        }
    );
The above introduces the continuation constructs Result.WhenDone and Task.ContinueWith, which allow us to chain operations to execute once the asynchronous method's callback signals completion. Both Result and Task provide a way to convert the standard Begin*/End* pattern of any API into a Result or Task, to which we can then attach a continuation. I should note that in the C# 5.0 Async CTP, Microsoft has added extension methods for virtually all of its async patterns to further simplify invocation.
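Since the SqlCommand example needs a live database, here is a self-contained sketch of the same FromAsync/ContinueWith shape. A MemoryStream stands in for the connection, which is purely an assumption for the demo; its BeginRead/EndRead pair follows the same Begin*/End* pattern as BeginExecuteReader/EndExecuteReader.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// A MemoryStream stands in for the network or database stream.
var stream = new MemoryStream(Encoding.UTF8.GetBytes("hello, async"));
var buffer = new byte[64];

// Wrap the Begin*/End* pair in a Task<int> holding the number of bytes read...
Task<int> read = Task<int>.Factory.FromAsync(
    stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null);

// ...and attach a continuation that runs once the read completes,
// instead of blocking the calling thread on EndRead.
Task<string> text = read.ContinueWith(t => Encoding.UTF8.GetString(buffer, 0, t.Result));

// Blocking here only so the demo can print the outcome.
Console.WriteLine(text.Result); // hello, async
```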
Next time, we'll take a look at a more complex workflow using the continuation passing style possible with Result.WhenDone and Task.ContinueWith and what complications it may introduce.
Tasked to get Results
The second Monospace conference has finally been scheduled for July in Boston, and I'm honored to have been selected to speak at it again. Back in 2009, I covered the gamut of concurrency tooling available in DReAM. This year, I'm going to concentrate on asynchronous programming, in particular covering both DReAM coroutines and the remarkably similar async/await constructs going into C# 5.0. In preparation for my talk, I thought it would be interesting to run a series of posts presenting my experience working with both async programming models.
Keeping things synchronized
Traditionally, when work is being completed outside of the current thread, there are a number of different patterns on offer. You could spawn a new thread, use the threadpool or use a BackgroundWorker. To synchronize, a WaitHandle is used to signal completion to the waiting thread, and a shared reference is commonly used to communicate the result. In addition, various asynchronous APIs offer Begin* and End* methods to start asynchronous work and be notified of its completion, passing along an untyped state object and retrieving the outcome from the End* call.
Trying to simplify and unify these patterns, DReAM introduced Result back in 2006 to support asynchronous operations for its REST framework running on .NET 2.0. A very similar construct called Task was added in .NET 4.0 via the Task Parallel Library. While there are some semantic differences, both provide virtually identical capabilities. At the heart of both constructs are synchronization handles that serve simultaneously as a completion signal and as a means of marshaling the outcome to the interested party. Both can be queried for completion, used to block execution, or given a continuation to capture the outcome.
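For contrast, a minimal sketch of the traditional shape: a spawned thread, a ManualResetEvent acting as the WaitHandle that signals completion, and a shared local variable carrying the result. The work itself (6 * 7) is just a placeholder.

```csharp
using System;
using System.Threading;

// Shared reference used to communicate the outcome back to the waiting thread.
int sharedResult = 0;

// The WaitHandle that signals completion.
using var done = new ManualResetEvent(false);

var worker = new Thread(() =>
{
    sharedResult = 6 * 7; // ... some time-consuming work
    done.Set();           // signal the waiting thread
});
worker.Start();

done.WaitOne();                  // block until the worker signals
Console.WriteLine(sharedResult); // 42
```

Task and Result fold the event and the shared variable into a single object, which is exactly the simplification described in the next paragraph.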
Let me know when you're done
The basic usage of Result as the party wanting to receive information about some asynchronous work is as follows:
Result<int> r = SomeAsyncWork();
// check if work is done
if(r.HasValue) { ... }
// check if work was cancelled
if(r.IsCanceled) { ... }
// block until it's done
r.Block();
// check if an exception occurred
if(r.HasException) {
    Console.WriteLine("work errored out: {0}", r.Exception);
} else {
    // get the resultant value
    var value = r.Value;
}
And the equivalent code using Task:
Task<int> t = SomeMoreAsyncWork();
// check if work is done
if(t.IsCompleted) { ... }
// check if work was cancelled
if(t.IsCanceled) { ... }
// block until it's done (unlike t.Wait(), Task.WaitAny does not rethrow a fault)
Task.WaitAny(t);
// check if an exception occurred
if(t.IsFaulted) {
    Console.WriteLine("work errored out: {0}", t.Exception);
} else {
    // get the resultant value
    var value = t.Result;
}
Aside from simple semantic differences in usage, Result and Task differ in how blocking behavior is treated. With Result, accessing .Value before work has completed will throw, while with Task, accessing the analogous .Result will invoke its .Wait() for you and block. If you prefer Task's behavior, Result provides its own .Wait(), which blocks and returns the value of the result upon completion. Unlike Result.Block() but like Task.Result, the Result.Wait() call will throw if the work is faulted.
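A small sketch of the Task side of this, with a hypothetical Explode() method as the failing work: reading .Result blocks until the task is done and then rethrows the fault wrapped in an AggregateException, while the status flags report the task as both completed and faulted.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical unit of work that fails.
static int Explode() => throw new InvalidOperationException("boom");

Task<int> t = Task.Run(() => Explode());

string? error = null;
try
{
    // Task.Result blocks until the task finishes and, if it faulted,
    // rethrows the failure wrapped in an AggregateException.
    _ = t.Result;
}
catch (AggregateException ex)
{
    error = ex.InnerException?.Message;
}

Console.WriteLine(t.IsCompleted); // True: a faulted task still counts as completed
Console.WriteLine(t.IsFaulted);   // True
Console.WriteLine(error);         // boom
```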
Go do some work
Now that we know how to synchronize with work being done asynchronously, let's get that work started ourselves.
Task is built around being provided an expression of the work to perform. There are numerous ways of setting up the task and determining how its execution is to be scheduled, but the simplest method is to use Task.Factory:
var task = Task.Factory.StartNew(() => {
    var x = 1;
    // ... do some time consuming work
    return x;
});
Firing off work is one of the largest differences between Task and Result. Where Task is fundamentally built around wrapping a unit of work to be executed, Result is just the synchronization handle; firing off the unit of work is left as an execution detail. Since starting work on a threadpool thread is one of the most common patterns, DReAM provides this functionality via the Async static class:
var result = Async.Fork(() => {
    var x = 1;
    // ... do some time consuming work
    return x;
}, new Result<int>());
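The TPL does have a counterpart to a bare Result: TaskCompletionSource<T> produces a Task that is only a synchronization handle, with no unit of work attached. As a sketch, here is a hypothetical Fork helper (not part of either library) that mimics the shape of Async.Fork by taking the handle as its last argument and queuing the work on the thread pool.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper in the spirit of Async.Fork: the caller supplies the
// handle, and the helper only decides how the work is executed.
static Task<T> Fork<T>(Func<T> work, TaskCompletionSource<T> handle)
{
    ThreadPool.QueueUserWorkItem(_ =>
    {
        try
        {
            handle.TrySetResult(work());
        }
        catch (Exception ex)
        {
            handle.TrySetException(ex); // fault the handle instead of crashing the pool thread
        }
    });
    return handle.Task;
}

Task<int> task = Fork(() =>
{
    var x = 1;
    // ... do some time consuming work
    return x;
}, new TaskCompletionSource<int>());

Console.WriteLine(task.Result); // 1
```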
The signature of Async.Fork shows another philosophical difference between Task and Result: With Result, we use a pattern in which any method returning a Result will take that result as its last argument. This is done primarily because with Result the concept of timeout is fundamental and attached to the handle, whereas with Task it is usually an argument on blocking operations. By requiring that a Result is passed into the method, the opportunity to initialize the handle with timeout and cleanup behaviors is provided before the result is attached to a unit of work.
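To illustrate why handing the handle in first matters, here is a sketch that uses a TaskCompletionSource<T> in place of a Result. HandleWithTimeout is a hypothetical helper that arms a timeout on the handle before any work exists, so the deliberately slow work attached afterwards loses the race and the handle ends up canceled. This only approximates the idea; it is not how DReAM's Result implements timeouts.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: create a handle and attach a timeout to it *before*
// any work is associated with it. TrySet* makes whichever side loses the race harmless.
static TaskCompletionSource<T> HandleWithTimeout<T>(TimeSpan timeout)
{
    var handle = new TaskCompletionSource<T>();
    _ = Task.Delay(timeout).ContinueWith(_ => handle.TrySetCanceled());
    return handle;
}

var reply = HandleWithTimeout<int>(TimeSpan.FromMilliseconds(50));

// Only now is (hypothetical, deliberately slow) work attached to the handle.
_ = Task.Run(() =>
{
    Thread.Sleep(2000);
    reply.TrySetResult(1); // too late: returns false, the handle is already canceled
});

try
{
    Console.WriteLine(reply.Task.Result);
}
catch (AggregateException ex) when (ex.InnerException is TaskCanceledException)
{
    Console.WriteLine("timed out"); // expected: the 50 ms timeout wins the race
}
```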
Everything's synced up
With what I've shown so far, Result and Task might be handy constructs, but we really haven't strayed far from the usual blocking behaviors that are so troublesome and resource intensive. Yes, we've parallelized some things, but we're still doing synchronous computing. Asynchrony doesn't actually imply any parallelism; it just refers to our work not blocking a thread, instead suspending execution and resuming it once we're notified of the completion of work.
Next time, I'll cover how continuations can be used to chain operations to create non-blocking, sequential operations.
MindTouch @ Monospace: Going Concurrent & Keeping your Sanity
Last week, on October 28th, Steve and I had the pleasure of presenting at the Monospace conference. I picked a topic close to the API Team’s heart, concurrency, and specifically covered our use of the asynchronous method pattern and our coroutine framework. We’ve covered a lot of the motivation and reasoning for this a number of times on the Concurrent Podcast. I captured the talk as a screencast because I think it’s a useful primer on concurrency in MindTouch 2009 and our Dream framework, and it’s especially useful as a hands-on companion for Concurrent Podcast 3: Coroutines.
I’d like to thank Scott Bellware for putting on a great conference, as well as everyone from the Mono team and the Mono/.NET community who attended. Monospace was incredibly insightful and I met a lot of very smart people there. Let’s hope this becomes a regular conference.