Monday, November 5, 2007

Concurrent Thinking, Part 1

Note: I originally wrote this back in June, but never finished it. An now, presenting part one.... Concurrency is the key idea to come to terms with in Erlang and the key tool that it gives you. It's the modeling tool to use when solving a problem. I realized this when I was implementing a code transformation pipeline. The pipeline would go input to transformation to output. I wanted to be able to interchange any of those pieces, so I could read from multiple formats, transform the data in different ways, and write the data out in different formats. And it needed to be able to handle a lot of data. On my first attempt, I approached the problem the way I might with Python: Input, transformation, and output are iterators which are just chained together. For example, the input would read CSV, the transformation would drop every other item, and the output would write it back to CSV::
  from __future__ import with_statement

  import csv

  def input(filename):
      with open(filename, 'rb') as f:
          for row in csv.reader(f):
              yield row

  def transform(rows):
      rows = iter(rows)
      while True:
          rows.next()
          yield rows.next()

  def output(rows, filename):
      with open(filename, 'wb') as f:
          writer = csv.writer(f)
          writer.writerows(rows)
That's not very useful, but you get an idea of the problem and what I was trying to do. Using iterators/generators is nice here because I don't have to worry about too much input filling up memory. Also, I don't build a list I don't need or use. Unfortunately, Erlang doesn't have iterators. I thought I would use lists instead and work from there. I could have had the input function read everything into a list, the transformation function transforms it into another list, and the output function writes a list's contents to disk. It wouldn't have been pretty, but it would have worked. Part way through coding this, I realized that Erlang has a great solution to this: concurrency.
Stay tuned tomorrow, when the hero attempts to use powerful Erlang concurrency and ends up hoisted on his own petard.

2 comments:

Anonymous said...

Iterators can be easily defined in
Erlang - either with strict of lazy
semantics, or using processes.

for example:

map(F, [H|T]) -> [F(H) | map(T)];
map(F, []) -> [].

a lazy map


map(F, [H|T]) ->
     {F(H), fun() ->
         map(F, T)
     end}.


so map(fun(X) -> 2*X end, [2,4,7])
returns {4, F1}
and F1() returns {8, F2}
and F2() return {14, F3}
and so on

Or you can just use chains of processes

Eric Rochester said...

Right. I was considering using a post to look at the functional style you describe here, but you've done such a good job that I think I'll pass on a longer explanation.