Tuesday, September 24, 2013

Anonymous Methods and Closures in C#

In the last two posts we looked at Action, Func, delegate and callbacks in C#. In this post we look at anonymous methods and closures.

Anonymous Methods

A method is anonymous if it is not bound to a name. There are a couple of ways to write anonymous methods in C#:

1. The 'new' way, using lambda expressions:

System.Func< int, int > doubler = x => x + x;

2. The 'old' way, using delegate explicitly:

delegate int DoublerDelegate( int d );
class Test
{
  public DoublerDelegate Foo()
  {
    return delegate( int x ) { return x + x; };
  }
}

//In the client, we can call the anonymous method:
Test t = new Test();
int i = (t.Foo())( 2 );

As expected, both ways give the same result.
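
A minimal, self-contained sketch that puts the two forms side by side. The Demo class and its Main method are scaffolding added here for illustration; they are not part of the snippets above.

```csharp
using System;

delegate int DoublerDelegate(int d);

class Test
{
    // The 'old' way: an anonymous method returned as a named delegate type.
    public DoublerDelegate Foo()
    {
        return delegate(int x) { return x + x; };
    }
}

class Demo
{
    static void Main()
    {
        // The 'new' way: a lambda expression assigned to Func<int, int>.
        Func<int, int> doubler = x => x + x;

        Test t = new Test();
        Console.WriteLine(doubler(2));   // prints 4
        Console.WriteLine(t.Foo()(2));   // prints 4
    }
}
```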

Under the hood

Looking at the IL in the Ildasm disassembler for the 'old' way (explicitly using delegates), we see two things.

First, as expected, there is a DoublerDelegate type which extends System.MulticastDelegate.
Second, inside the Test class, the compiler has generated a static method with a compiler-chosen name, corresponding to our anonymous method, along with a static field that caches the delegate:

.field private static class anonymous.DoublerDelegate 
'CS$<>9__CachedAnonymousMethodDelegate1'
//...

.custom instance void 
[mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() 
= ( 01 00 00 00 )
 
.method private hidebysig static int32
        'b__0'(int32 x) cil managed
{
  //... Implements our anonymous doubler method
} // end of method Test::'b__0'

The DoublerDelegate instance is created from a pointer to this static method:

IL_0009:  ldftn    int32 anonymous.Test::'b__0'(int32)

When the client invokes the delegate, it is this static method that runs, in lieu of our anonymous method.

If we looked at the IL for the 'new' way with Func, we would see that the IL generated for the anonymous method is similar: a static method is generated, and the Func delegate points to it.
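
Roughly, the generated code for the 'old' way behaves like the hand-written C# below. This is only a sketch for illustration: the names TestLowered, DoublerImpl and cachedDoubler are made up (the real generated names are not valid C# identifiers), and the exact output differs between compiler versions.

```csharp
using System;

delegate int DoublerDelegate(int d);

class TestLowered
{
    // Stands in for the compiler-generated cache field.
    private static DoublerDelegate cachedDoubler;

    // Stands in for the compiler-generated static method 'b__0'.
    private static int DoublerImpl(int x)
    {
        return x + x;
    }

    public DoublerDelegate Foo()
    {
        // Create the delegate once, then reuse it on later calls.
        if (cachedDoubler == null)
        {
            cachedDoubler = new DoublerDelegate(DoublerImpl);
        }
        return cachedDoubler;
    }
}

class Demo
{
    static void Main()
    {
        TestLowered t = new TestLowered();
        Console.WriteLine(t.Foo()(2));                          // prints 4
        Console.WriteLine(ReferenceEquals(t.Foo(), t.Foo()));   // prints True
    }
}
```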

Non-local variables and closure(?)

Our anonymous doubler methods only accessed the variable x. x is local to the anonymous method and not influenced by the world outside the method once it is bound to a value.
This is why the compiler can generate just a static method for the anonymous method. The method does not need external state.

Let us introduce a variable which is non-local to the anonymous method.

public DoublerDelegate Foo()
{
  int i = 10;

  return delegate( int x ) {
    i = i + x;
    return i * x; 
  };
}

i is non-local to the anonymous method, but used by it (in other words, i is a 'free variable'). The compiler now has to account for the variable i whenever the anonymous method is called.

It does this by generating a new class.

.class auto ansi sealed nested private 
beforefieldinit '<>c__DisplayClass1'
       extends [mscorlib]System.Object
{
  .custom instance void 
  [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() 
  = ( 01 00 00 00 ) 
} // end of class '<>c__DisplayClass1'

This generated class encapsulates both the non-local variable i, and the generated anonymous method which uses it.
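
In hand-written C#, the transformation looks roughly like the sketch below. The names DisplayClass, Body and TestLowered are assumptions chosen for readability; the compiler's own names and details vary by version.

```csharp
using System;

delegate int DoublerDelegate(int d);

// Stands in for the generated '<>c__DisplayClass1'.
sealed class DisplayClass
{
    public int i;              // the captured variable, hoisted to a field

    public int Body(int x)     // the body of the anonymous method
    {
        i = i + x;
        return i * x;
    }
}

class TestLowered
{
    public DoublerDelegate Foo()
    {
        DisplayClass closure = new DisplayClass();
        closure.i = 10;        // 'int i = 10;' now writes to the field
        return new DoublerDelegate(closure.Body);
    }
}

class Demo
{
    static void Main()
    {
        DoublerDelegate d = new TestLowered().Foo();
        Console.WriteLine(d(2));   // i becomes 12, prints 24
    }
}
```

Because the delegate targets an instance method, it holds a reference to the DisplayClass object, and that reference is what keeps i alive.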

Raymond Chen's post from years ago elegantly describes how anonymous methods are generated.

BTW, there seems to be some disagreement, among people who know more than I do, about whether the example above is a true 'lexical closure'. In Scheme, for example, the closure would consist of the anonymous function together with the environment it references. In C#, this is simulated by generating classes and methods.

From my point of view, as just another application programmer, the above code is an example of a closure in C#.

Garbage Collection is important

In the example, the variable i has to live on even after the enclosing method Foo() has returned. Why? Because the anonymous method references it, and the client code may keep the delegate around for as long as it likes. We saw that the compiler generates an internal class which holds the non-local variable i. But who frees the instance of that generated class? The answer is obvious: the garbage collector. The instance becomes eligible for collection once the delegate that references it is no longer reachable. While this is a trivial question in C#, it would not be so simple if the closure had to be freed in the absence of an omnipresent GC.
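
A small sketch of that lifetime, reusing the Foo() from the example above. The Demo class is scaffolding added for illustration.

```csharp
using System;

delegate int DoublerDelegate(int d);

class Test
{
    public DoublerDelegate Foo()
    {
        int i = 10;
        return delegate(int x)
        {
            i = i + x;
            return i * x;
        };
    }
}

class Demo
{
    static void Main()
    {
        Test t = new Test();

        DoublerDelegate first = t.Foo();
        Console.WriteLine(first(2));    // i: 10 -> 12, prints 24
        Console.WriteLine(first(2));    // i: 12 -> 14, prints 28

        // A second call to Foo() creates a fresh 'i'.
        DoublerDelegate second = t.Foo();
        Console.WriteLine(second(2));   // i: 10 -> 12, prints 24

        // Dropping the last reference makes the delegate, and the captured
        // 'i' along with it, eligible for garbage collection.
        first = null;
    }
}
```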

Gotchas

In C#, an anonymous method cannot capture the ref or out parameters of its enclosing method; the compiler rejects the attempt (error CS1628). Captured variables are hoisted into a heap object which the GC manages, but a ref parameter is an alias to storage owned by the caller, which may well live on the stack, so it cannot be hoisted into the closure. If we do need a ref parameter inside the anonymous method, we have to make a local copy ourselves and write it back afterwards. In other words, we have to handle the copy-in/copy-out.

This post from one of the implementers of anonymous methods in C# explains the copy-in/out semantics. There are other posts on his blog which explain deeper nuances of anonymous methods in C#. The matter of ref and out parameters, and of stack versus heap, is also explained in detail by Eric Lippert.
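
A minimal sketch of manual copy-in/copy-out. The AddTo method is a hypothetical helper invented for this illustration.

```csharp
using System;

class Demo
{
    // Hypothetical helper: adds 'amount' to 'total' by way of a closure.
    static void AddTo(ref int total, int amount)
    {
        // Action bad = () => total += amount;   // would not compile: error CS1628

        int local = total;                        // copy in
        Action add = () => local += amount;       // captures the local copy
        add();
        total = local;                            // copy out
    }

    static void Main()
    {
        int sum = 5;
        AddTo(ref sum, 3);
        Console.WriteLine(sum);   // prints 8
    }
}
```

Note that the copy-out only helps if the closure runs before the method returns; a delegate that escapes and runs later will update the local copy, not the caller's variable.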

In Practice

Delegates which simulate functions-as-first-class-citizens are a useful feature in C#. They allow us to better express verbs (functions) in what is essentially a kingdom of nouns (classes). This helps an application programmer in terms of expressiveness, composability and terseness.

But how useful are anonymous methods and closures to just another application programmer like me?

Anyone who has used LINQ, threading or the task library in C# in some depth will be familiar with this topic. Here we use an anonymous method which forms a closure over the non-local variable 'extra'.

var extra = 10;
var result = myList.Select( x => x + extra );
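
A runnable version of the snippet, with sample data assumed for myList. It also shows that the lambda captures the variable itself rather than a snapshot of its value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Demo
{
    static void Main()
    {
        List<int> myList = new List<int> { 1, 2, 3 };   // sample data, assumed
        int extra = 10;

        // The lambda closes over the variable 'extra', not its current value.
        IEnumerable<int> result = myList.Select(x => x + extra);

        Console.WriteLine(string.Join(", ", result));   // prints 11, 12, 13

        // Select is lazily evaluated, so a later change to 'extra' is
        // visible the next time 'result' is enumerated.
        extra = 100;
        Console.WriteLine(string.Join(", ", result));   // prints 101, 102, 103
    }
}
```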

Consider another example. We format a string based on some inputs. We need to insert a comma into the string based on the inputs.

public string SerializeToStr( string subject, string status )
{
  StringBuilder output = new StringBuilder();
  output.Append( "{'ticket':{" );
  bool aFieldExists = false;

  Action< string, string, string > Append = ( string item, string header, string trailer ) => {
    if ( item != null ) {
      if ( aFieldExists ) {
        output.Append( ", " );
      }
      aFieldExists = true;
      output.Append( header ).Append( item ).Append( trailer );
    }
  };

  Append( subject, "'subject':'" + DateTime.Today.ToString() + " ", "'" );
  Append( status, "'status':'", "'" );
  output.Append( "}}" );

  return output.ToString();
}

This needlessly long example could just as well have been implemented with a helper method and a state variable for aFieldExists. However, the anonymous method keeps things more local from the application code's point of view: the helper and the state it touches sit right next to their only use.
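
For comparison, a sketch of the helper-method alternative. The class name TicketSerializer and the choice to keep the state in instance fields are assumptions made for this illustration.

```csharp
using System;
using System.Text;

class TicketSerializer
{
    // State that the closure version kept as locals now lives in fields.
    private readonly StringBuilder output = new StringBuilder();
    private bool aFieldExists = false;

    private void Append(string item, string header, string trailer)
    {
        if (item == null) return;
        if (aFieldExists) output.Append(", ");
        aFieldExists = true;
        output.Append(header).Append(item).Append(trailer);
    }

    public string SerializeToStr(string subject, string status)
    {
        // Reset the shared state so the method can be called repeatedly.
        output.Clear();
        aFieldExists = false;

        output.Append("{'ticket':{");
        Append(subject, "'subject':'" + DateTime.Today.ToString() + " ", "'");
        Append(status, "'status':'", "'");
        output.Append("}}");
        return output.ToString();
    }
}

class Demo
{
    static void Main()
    {
        TicketSerializer s = new TicketSerializer();
        // With a null subject, only the status field is emitted.
        Console.WriteLine(s.SerializeToStr(null, "open"));   // prints {'ticket':{'status':'open'}}
    }
}
```

The state now outlives a single call and has to be reset by hand, which is the bookkeeping the closure version avoids.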

Anonymous methods and closures seem to be syntactic sugar in C# from a user's point of view. However, they do improve expressiveness, and can be quite addictive to use.


1 comment:

  1. I like your post.
    i guess we have the same taste

    http://frederictorres.blogspot.com/2013/05/closure-in-c-versus-javascript.html
