The beauty of the yield statement

How many of you have played with the C# yield statement? I guess that most of you have.
Anyway, like most of the MSDN examples out there, it can introduce very bad behavior into your code when used incorrectly. Consider the following:

public IEnumerable<User> GetUsers(int count)
{
    using (MySqlConnection connection = new MySqlConnection("…"))
    {
        // MySqlDataReader reader = create a MySqlCommand and execute it
        while (reader.Read())
            yield return new User(/*… fill parameters from the reader … */);
    }
}

Looks pretty harmless, right? Not quite. The compiler actually transforms a method that uses yield into a “state machine”, which means that every time we yield a result back to the client (the caller of GetUsers in our example), the method pauses and waits for the client to ask for the next item (via IEnumerator<T>.MoveNext()). The code above will hold the connection open until the client is done iterating over all of the User items. This will lead to major scalability issues very quickly! You should always keep your DB connections open for as short a period as possible to prevent connection exhaustion (threads waiting a long time for an available DB connection in the pool, until they time out). Because yield returns control to the caller, the caller might “take his time”, thus leading to connection exhaustion.
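To see the effect without a real database, here is a minimal sketch. FakeConnection, LazyDemo, Lazy and Eager are all made-up names for illustration (nothing here comes from a MySQL library); the fake connection only counts how many instances are currently open. Eager shows one possible way out, assuming you can afford to hold the results in memory: finish reading before returning anything.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a DB connection: it only counts how many are open.
public sealed class FakeConnection : IDisposable
{
    public static int OpenCount;
    public FakeConnection() { OpenCount++; }
    public void Dispose() { OpenCount--; }
}

public static class LazyDemo
{
    // Same shape as GetUsers: the using block wraps the yield, so the
    // "connection" stays open for as long as the caller keeps iterating.
    public static IEnumerable<int> Lazy(int count)
    {
        using (new FakeConnection())
        {
            for (int i = 0; i < count; i++)
                yield return i;
        }
    }

    // One possible fix (an assumption, not the only option): read everything
    // first, so the connection is closed before the caller sees any item.
    public static List<int> Eager(int count)
    {
        List<int> result = new List<int>(count);
        using (new FakeConnection())
        {
            for (int i = 0; i < count; i++)
                result.Add(i);
        }
        return result;
    }
}
```

With Lazy, nothing runs until the first MoveNext(); from that point FakeConnection.OpenCount stays at 1 until the caller finishes (or disposes) the enumerator. With Eager, OpenCount is already back to 0 when the method returns, at the price of allocating the whole list.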

On the other hand, used wisely, yield yields (lame joke, sorry) HUGE benefits:
[note: code written in notepad, stupidity won’t compile]

  • Avoid useless memory allocations

#1:
How many times have you ended up creating something like this?

    public List<T> Filter<T>(List<T> input, Predicate<T> predicate)
    {
        List<T> output = new List<T>(input.Count / 2);
        foreach (T item in input)
            if (predicate(item))
                output.Add(item);

        return output;
    }

We allocate a whole second list, much more memory than we need, only to hold the output during the calculation. A better approach would be:

    // IEnumerable<T> (not IEnumerator<T>) so that foreach works on the input and on the result
    public IEnumerable<T> Filter<T>(IEnumerable<T> input, Predicate<T> predicate)
    {
        foreach (T item in input)
            if (predicate(item))
                yield return item;
    }

This way we hold only one T at a time (it is saved in the generated state machine) instead of a whole second list. In addition, the client could choose to produce each input item via yield as well, thus saving the need to build the “input” collection before calling our Filter method.
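A minimal sketch of that chaining, assuming a made-up Numbers producer and a Pipeline class to hold both methods (both names are hypothetical). The producer yields its items one by one, so no input list is ever built:

```csharp
using System;
using System.Collections.Generic;

public static class Pipeline
{
    // Hypothetical producer: yields numbers one by one instead of filling a list.
    public static IEnumerable<int> Numbers(int from, int to)
    {
        for (int i = from; i <= to; i++)
            yield return i;
    }

    // Lazy filter: pulls one item from the input, tests it, and passes it on.
    public static IEnumerable<T> Filter<T>(IEnumerable<T> input, Predicate<T> predicate)
    {
        foreach (T item in input)
            if (predicate(item))
                yield return item;
    }
}
```

A caller would write something like foreach (int n in Pipeline.Filter(Pipeline.Numbers(1, 1000000), x => x % 250000 == 0)) and only ever have one number "in flight" between the producer and the filter.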


#2:

Another oh-(gosh-why)-so-common example is the following:

public void Save(T item)
{
    Save(new T[] { item });
}

public void Save(ICollection<T> items)
{
    // do your magic here to save items
}

Assuming you call Save with a single item quite a lot, you’re allocating A LOT of one-item arrays. A better approach would be:

public void Save(T item)
{
    Save(Yield.One(item));
}

public void Save(IEnumerable<T> items)
{
    // do your magic here to save items
}

public static class Yield
{
    // Wraps a single item in a sequence without creating an array
    public static IEnumerable<T> One<T>(T item)
    {
        yield return item;
    }
}

No array is allocated here; the only allocation left is the small iterator object the compiler generates for Yield.One.

  • Avoid “impossible” memory allocation

Let’s say you want to read a 20 GB file of emails, where every line holds a single email. Trying to declare a List<string> and fill it up will obviously blow up your memory. You simply can’t hold that much in memory. Instead, you can use StreamReader.ReadLine and yield each row back to your client, until all of the emails are taken care of. Yes, you can try to read the file in chunks yourself (keeping a pointer), but remembering where you stopped between calls is exactly what yield does under the hood. Reminder: the DRY principle is gold (or Don’t Repeat .Net Framework, in our case).
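Something along these lines would do it. This is a rough sketch; EmailFile and ReadLines are names I made up, and the file path is whatever you pass in:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class EmailFile
{
    // Yields the file one line at a time; only the current line
    // (plus the reader's internal buffer) is held in memory.
    public static IEnumerable<string> ReadLines(string path)
    {
        using (StreamReader reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }
}
```

Keep in mind that the file stays open for as long as the caller keeps iterating, which is the same trade-off as the DB connection example, only with a much cheaper resource. If you are on .NET 4 or later, File.ReadLines should give you the same lazy behavior out of the box.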

  • Execute synchronous code asynchronously (nicely achieved via CCR)

A bit advanced, but you can read all about it here. The great benefit is you can transform (almost) any “yield based” code to run async, if needed/wanted.
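To get a feeling for the idea (and only the idea: this is NOT the CCR API, just a toy I made up), here is a tiny round-robin scheduler. ToyScheduler, RunAll and Work are hypothetical names. Every yield is a point where a piece of work hands control back, so several “synchronous looking” methods can be interleaved on one thread:

```csharp
using System;
using System.Collections.Generic;

public static class ToyScheduler
{
    // Runs several iterator-based tasks round-robin. Each MoveNext() executes
    // one step (up to the next yield), then the next task gets its turn.
    // Returns the yielded messages in the order they were produced.
    public static List<string> RunAll(params IEnumerator<string>[] tasks)
    {
        List<string> log = new List<string>();
        Queue<IEnumerator<string>> queue = new Queue<IEnumerator<string>>(tasks);
        while (queue.Count > 0)
        {
            IEnumerator<string> task = queue.Dequeue();
            if (task.MoveNext())
            {
                log.Add(task.Current);
                queue.Enqueue(task);   // not finished yet, schedule it again
            }
            else
            {
                task.Dispose();        // finished, release its state machine
            }
        }
        return log;
    }

    // Hypothetical unit of work: written as a plain loop, paused at every yield.
    public static IEnumerator<string> Work(string name, int steps)
    {
        for (int i = 1; i <= steps; i++)
            yield return name + " step " + i;
    }
}
```

Calling ToyScheduler.RunAll(ToyScheduler.Work("A", 2), ToyScheduler.Work("B", 1)) produces "A step 1", "B step 1", "A step 2". A real runtime resumes the iterator when an I/O operation completes rather than in a simple loop, but the “pause at yield, resume later” mechanics are the same.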


Recap:
Prefer using the yield statement as long as calculating the values doesn’t require holding an expensive resource, like a DB connection or a file handle, for a long period of time.

 

Oren Ellenbogen