Wednesday, November 7, 2007

Concurrent Thinking, Part 3

When I decided to dust off and publish the old post I had written, I wasn't expecting it to turn into a week-long series. And it's not. This is the last.
If you'll remember, gentle reader, I've been foreshadowing trouble. Today's the payoff. Recall that the reason I didn't just slurp the entire file into memory at once and build lists from it is that I need to be able to process files much larger than will fit in memory. If the process reading the file can read all of it before the transformation process has handled much of it, or before the output process has written the output, the file will end up in memory in its entirety, more or less, and this entire exercise has failed.

To see how this could happen, suppose the input comes from a local file, but the output is being written across a network. The input process could read the whole file, and the transformer process could handle the data, but the messages would still pile up in the output process's mailbox, filling memory and bringing the system to its knees. Or suppose the transformation being applied involves a time-intensive computation. Its mailbox would fill up memory as the input process sends it the entire file while the first few messages are still being transformed. Again, the system dies, horribly, horribly.

I won't tell you how I solved this when it happened to me. Looking back, it was a nasty, nasty kludge that I used only because I was under time pressure. What I should have done was use the functional solution someone mentioned in a comment to Concurrent Thinking, Part 1. That solution has the input and transformation functions return a tuple with two values: data and another function. The new function returns more data and yet another function. Eventually, when there's no more data, the function simply returns a flag indicating that. I won't rewrite the entire example from yesterday in this style, but the input function would look like this:
input(Filename) ->
    {ok, F} = file:open(Filename, [read]),
    input_continuation(F).

input_continuation(F) ->
    fun() ->
        case file:read(F, 1024) of
            eof ->
                file:close(F),
                eof;
            {ok, Data} ->
                {Data, input_continuation(F)}
        end
    end.
Here, input/1 opens the file and calls input_continuation/1, which returns a function, A. When you call A, you get the first block of data and a new function, B. B returns the second block of data and a function, C. This continues until there is no more data, at which point the file is closed and eof is returned. This solution is very Erlang-ish, and it throttles nicely, so it doesn't accidentally fill up memory.

End of series.
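As a postscript, here's a sketch of what a consumer driving that chain of continuations might look like. The pump/1 name is mine, not from the original series; this just makes the A-calls-B-calls-C sequence concrete:

```erlang
%% Repeatedly invoke a continuation, process each block of data,
%% and recurse on the next continuation until eof.
pump(Continuation) ->
    case Continuation() of
        eof ->
            done;
        {Data, Next} ->
            %% Replace this with the real transformation or output step.
            io:format("got a block of ~p characters~n", [length(Data)]),
            pump(Next)
    end.
```

You'd kick it off with pump(input("some_file.txt")). Because each block is only read when its continuation is called, the consumer's pace sets the reader's pace, which is exactly the throttling the mailbox-based version lacked.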
