« What Makes Good Code Good? | Main | A truly lazy OrderBy in LINQ »

March 8, 2010

The Visitor Pattern and dynamic in C# 4

Introduction

The Visitor Pattern allows new functionality to be added to a class hierarchy without modifying the hierarchy. It accomplishes this by having one (virtual) "accept" method that can call back many different visitor implementations.

The Visitor Pattern is a way of implementing double dispatch in languages (like C#) that don't support it natively; as a result, many have argued (1, 2, 3) that the existence of this pattern is an indication of a missing language feature.

C# 4 addresses this limitation with the addition of the dynamic keyword, which allows method calls to be dispatched dynamically at runtime based on the runtime types of the objects involved. This can be used to implement the visitor pattern more simply.

Example

Consider a typical pedagogical class hierarchy:

Animal Class Diagram 
We implement the Visitor Pattern by defining an IAnimalVisitor interface, adding an abstract Accept(IAnimalVisitor) method to Animal, then implementing it on (usually only) the concrete derived types.

    interface IAnimalVisitor

    {

        void Visit(Bear bear);

        void Visit(Lion lion);

        void Visit(Snake snake);

        void Visit(Tiger tiger);

    }

 

    abstract class Animal

    {

        public abstract void Accept(IAnimalVisitor visitor);

    }

 

    abstract class Mammal : Animal { }

 

    abstract class Feline : Mammal { }

 

    sealed class Lion : Feline

    {

        public override void Accept(IAnimalVisitor visitor) { visitor.Visit(this); }

    }

 

    sealed class Tiger : Feline

    {

        public override void Accept(IAnimalVisitor visitor) { visitor.Visit(this); }

    }

 

    sealed class Bear : Mammal

    {

        public override void Accept(IAnimalVisitor visitor) { visitor.Visit(this); }

    }

 

    abstract class Reptile : Animal { }

 

    sealed class Snake : Reptile

    {

        public override void Accept(IAnimalVisitor visitor) { visitor.Visit(this); }

    }

I'm going to assume that iterating through the collection of Animal objects is simple; in this example, they're just stored in a List. The code to visit them (in the "classic" way) is as follows:

    class ClassicVisitor : IAnimalVisitor

    {

        public void VisitAll(IEnumerable<Animal> animals)

        {

            foreach (Animal animal in animals)

                animal.Accept(this);

        }

 

        public void Visit(Bear bear) { Bears++; }

        public void Visit(Lion lion) { Lions++; }

        public void Visit(Snake snake) { Snakes++; }

        public void Visit(Tiger tiger) { Tigers++; }

    }

By using "dynamic" in C# 4, we can write the code as follows. At runtime, the system will pick the best matching overload from the available methods and call it. There's no need for the object to provide a visitor interface so that it can call us back; indeed, this lets us process class hierarchies (XNode, Expression, etc.) that don't implement the visitor pattern. (In this example, Animal is no longer required to expose the Accept(IAnimalVisitor) method.)

    class DynamicVisitor

    {

        public void VisitAll(IEnumerable<Animal> animals)

        {

            foreach (Animal animal in animals)

                Visit((dynamic) animal);

        }

 

        // Visit methods defined as before (but are now private)

    }

Another advantage is that we can visit at a level in the hierarchy not supported by the visitor interface:

    class DynamicVisitor2

    {

        public void VisitAll(IEnumerable<Animal> animals)

        {

            foreach (Animal animal in animals)

                Visit((dynamic) animal);

        }

 

        void Visit(Mammal mammal) { Mammals++; }

        void Visit(Reptile reptile) { Reptiles++; }

    }

In order to see what the relative performance of these methods is like, I wrote three other visitors that use different strategies to accomplish the same goal: the "Reflection" visitor uses reflection and MethodInfo to invoke the correct method based on the type of object being visited; the "Type Checking" visitor uses hard-coded type tests to dispatch to the correct method; the "Delegate" visitor uses a hard-coded list of delegates to dispatch to the desired method. I also wrote a "Null" visitor, which does nothing but enumerate the list of animals (to establish a baseline).

    class ReflectionVisitor

    {

        static readonly Dictionary<Type, MethodInfo> s_methods = new Dictionary<Type, MethodInfo>();

        public void VisitAll(IEnumerable<Animal> animals)

        {

            foreach (Animal animal in animals)

            {

                Type type = animal.GetType();

                MethodInfo mi;

                if (!s_methods.TryGetValue(type, out mi))

                {

                    mi =

                        (from m in GetType().GetMethods(BindingFlags.Instance | BindingFlags.NonPublic)

                        where m.Name == "Visit"

                        let p = m.GetParameters()[0]

                        where p.ParameterType == type

                        select m).Single();

                    s_methods.Add(type, mi);

                }

 

                mi.Invoke(this, new object[] { animal });

            }

        }

 

        // Visit methods defined as before

    }

 

    class TypeCheckingVisitor

    {

        public void VisitAll(IEnumerable<Animal> animals)

        {

            foreach (Animal animal in animals)

            {

                Bear bear = animal as Bear;

                if (bear != null)

                {

                    Visit(bear);

                    continue;

                }

 

                Lion lion = animal as Lion;

                if (lion != null)

                {

                    Visit(lion);

                    continue;

                }

 

                Snake snake = animal as Snake;

                if (snake != null)

                {

                    Visit(snake);

                    continue;

                }

 

                Tiger tiger = animal as Tiger;

                if (tiger != null)

                {

                    Visit(tiger);

                    continue;

                }

 

                throw new InvalidOperationException();

            }

        }

 

        // Visit methods defined as before

    }

 

    class DelegateVisitor

    {

        readonly Dictionary<Type, Action<Animal>> m_methods;

        public DelegateVisitor()

        {

            m_methods = new Dictionary<Type, Action<Animal>>

            {

                { typeof(Bear), a => Visit((Bear) a) },

                { typeof(Lion), a => Visit((Lion) a) },

                { typeof(Snake), a => Visit((Snake) a) },

                { typeof(Tiger), a => Visit((Tiger) a) },

            };

        }

 

        public void VisitAll(IEnumerable<Animal> animals)

        {

            foreach (Animal animal in animals)

                m_methods[animal.GetType()](animal);

        }

 

        // Visit methods defined as before

    }

Performance

In order to get nice big numbers, all timing was performed on a Pentium 4 3.4GHz computer with .NET 4 RC1. The number in the chart is the time (in milliseconds) that it took to invoke VisitAll one million times. (This data is fairly unscientific; I claim no statistical validity.)

For each method, I visited a different collection of objects: "Animals" was a mix of all four concrete types; "Mammals" had only types derived from Mammal, and "Lions" (obviously) contained only Lion objects.

Method/Collection Animals Mammals Lions
Null 326 338 388
Type Checking 502 484 484
Classic 661 625 546
Delegate 1467 1509 1543
Dynamic 2 3526 1471 1362
Dynamic 3589 1483 1290
Reflection 30952 32180 31559

chart

One obvious result is that using MethodInfo.Invoke is extremely slow; I had to truncate the bars in the chart to make them fit. (And this is after improvements that must have been made in .NET 4; the same code compiled for .NET 3.5 took 108s--more than three times longer--to run, while the other methods stayed the same.)

Using "dynamic" objects incurs a substantial startup cost: the first call to Visit took 106ms, while the second took 0.007ms (which is barely measurable). Reflection was quicker to initialize (6ms) but more costly(0.037ms) for each call. (These costs are not included in the table above.)

I used the three collections (Animals, Mammals, Lions) to test if the DLR optimises for situations where the runtime type is the same as the last time this method was called; the results seem to indicate that it does do this type of optimisation: the more homogenous the collection, the faster the method ran. In a typical visitor situation, the type of the visited object will be changing a lot, so be aware that there is a penalty for this type of input.

Maintenance

What happens if we add a new type (e.g., Eagle, derived from Bird, derived from Animal) to the hierarchy? Obviously all the visitors need to be updated to handle this new type and process it appropriately. The Classic, Dynamic, and Reflection visitors only need to have a new "Visit" method added; the Type Checking and Delegate visitors also need additional dispatching code. However, if the new Visit method isn't added, the visitors will fail in the following ways (* means the exception is dependent on the implementation of the code; the listed exception is based on my sample code shown earlier):

Summary

The 'dynamic' keyword in C# 4 allows us to eliminate a lot of boilerplate code that was necessary because C# 3.0 couldn't dispatch methods differently at runtime based on the types of their arguments, with an insignificant (for most applications) performance penalty (30ms on a collection of 10,000 objects). (Note that I haven't profiled the impact on memory of using this technique; it may or may not be significant.)

A major drawback is the loss of compile-time type checking and early detection of errors if the class hierarchy is modified. By using 'dynamic', we also lose refactoring support, "find usages", and IntelliSense (if the dynamic object is used for more than the simple cast shown above). To some degree, the stability of the class hierarchy and the sophistication of accompanying test suites mitigate these problems, but they are still significant problems to introduce. Using 'dynamic' in the test suites themselves seems like a good way to improve productivity, but the overhead of creating the additional tests necessary to detect what would otherwise be compile-time errors may proscribe its use as a replacement for the visitor pattern in production code.

The full source code file for the examples in this post is available at http://gist.github.com/324732.

Posted by Bradley Grainger at March 8, 2010 5:01 PM

Trackback Pings

TrackBack URL for this entry:
http://ancientblogs.logos.com/mt-cgi/mt-tb.cgi/329