Running dotTrace 3.1 on multi-core machine

I’ve spent around 2 hours figuring this out, so the least I can do is write it down.
I’m using dotTrace 3.1 to profile a .NET web application running on Windows Server 2008 R2 (IIS 7.5).

I got strange numbers and was missing some data (method calls) when running “profile web application”, until I set the affinity of the w3wp.exe process to a single core. This is easily done via Task Manager: right-click the w3wp process –> Set Affinity –> pick just one core.

Only then should you record the required page(s) and take the snapshot.
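If you’d rather script the affinity change than click through Task Manager, something like this sketch should work (it assumes the worker process is named w3wp and that you have permission to change its affinity):

```csharp
using System;
using System.Diagnostics;

class PinToSingleCore
{
    static void Main()
    {
        // Find the IIS worker process(es) by name.
        foreach (Process p in Process.GetProcessesByName("w3wp"))
        {
            // Affinity is a bitmask of allowed cores: 0x1 pins the process to core 0 only.
            p.ProcessorAffinity = (IntPtr)0x1;
            Console.WriteLine("Pinned {0} (PID {1}) to core 0", p.ProcessName, p.Id);
        }
    }
}
```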

 

Keeping team’s knowledge base

As a team grows, keeping the knowledge base sound starts to look like mission impossible. There are simply too many concerns to write down and maintain: picking a proper (= “searchable”) title for each knowledge base item, keeping the history of design changes and why they were made, adding some drawings etc. Text notes (and pretty pictures), sadly, become obsolete extremely fast.

“How can we keep the team’s knowledge base without running after our tail?”

My team came up with this suggestion:

“Let’s keep just the titles and put an owner next to each one; the rest can be solved with good communication and tests!”

Here are some examples of such “knowledge base items”, from our real list:

  • Supporting Contextual Sign-In scenarios (Avish)
  • Common CSS Classes for our beautiful look-and-feel (Moran)
  • Creating and using pretty-URL routes (Ken)

We broke it into domains like “basic architecture”, “common features” and “cross-cutting concerns”. Now the list (kept on our wiki) is easy to search and fun to maintain.

This works extremely well if:

  • Code-reviews are done on a regular basis. This means that at least 2 developers can talk about each knowledge base item.
  • Tests are written to build great confidence in the code (so we can refactor easily). I’m talking about unit testing, integration testing, UI testing etc.
    • Great tests explain how to use the API and what should be expected from it.
    • Tests (mostly integration tests) explain the latest architecture.
  • Comments in code should include the “why” behind the logic. Team members can learn even more, if needed, by talking with each other.
  • New developers are exposed to the knowledge base by talking with the relevant owners. It’s a great reference to look at.

Simple, short, effective!

 

Estimations spin the wheel

Great dishes. Great timing. Every single day. This is one amazing place to eat at!

In many ways, running a great team of developers is just like running a top-quality kitchen.
You’re measured by your ability to serve great-quality (CI? well tested?), super-tasty (nice UI?) dishes (features?) to hungry people (clients?) in reasonable time (before they leave for another site).

Without a solid understanding of ETAs, there is no way to predict quality or release on time.

Most of us are driven by what life throws at us, be it a personal event, a new book that makes you want to refactor everything, or simply a lot of context switches between projects and bugs.
Add to this our constant urge to excel at what we do, and you get a “miraculous” skill of losing control over our time. We’re built this way. Life pushes us around!

How does this affect us, or the “wheel” I’m talking about? I want to introduce you to Joe.
Joe is a nice guy in my imaginary world, working as a superstar developer for the “TheBestOutThere” company. Joe was assigned to complete a feature and asked to estimate when it would be done. “This feature is quite simple”, he shoots fast, “a few changes here, a few there and then the UI to nail the whole thing. Hmmm… I guess no more than 2 days of work”. So our story begins…

The first day goes pretty well: Joe adds the needed fields to the database and even manages to complete some of the tasks around the logic. On the second day (reminder: the last day according to the ETA) things start to bite. The feature turns out to be a bit more difficult than he anticipated, and he feels it will take 2 extra hours to complete the work. Keeping that in mind, he notifies his wife that he’s going to be late today. “Damn, late again?”, he ponders, “oh well, at least I know what’s left to do now”.

Knowing he has to stay late anyway, he continues to work and suddenly detects a really ugly piece of code in his path. “Refactoring time, baby!”, he smiles to himself. “Well, I already have a few spare hours, why not?”, he convinces himself once more. It works. The refactoring takes an additional 4 hours. Joe, in a panic attack, understands that it took more than he planned, hurries up, skips a few critical tests and cancels the “code review” meeting, claiming “this feature is simple enough!”. The code review still takes place, as this is simply a must in top-quality kitchens. Fixing the code review comments, including adding the missing tests (which were needed to begin with), costs a few more hours, and so the days start to fly by.

After 5 days, Joe moves the feature to “code-ready”, now waiting for QA to test it. QA opens about 7 bugs, which take 6 more hours to fix (with solid unit tests to reproduce them). On top of that, some more fixes are needed for easy deployment later on in production. The feature takes a total of about 7 days.

Where did the missing 5 days go? Is Joe simply a poor estimator? Not quite; he just suffers from the bad “context” we all do: the constant battle between delivering fast and delivering with high quality. The problem is that it’s really hard to understand how estimation and quality work together.

If ETA is Jesus, Quality is God. Which is more tangible?

People waste time trying to convince themselves they can outperform on a daily basis. We can’t. Our best is probably not as great as we imagine. So let’s not try to improve our ability to code faster, think faster or drink more coffee. Even if you manage to get better at it (or consume more coffee and urinate into a cup), it won’t last for long (urine tends to stink up the room).

Instead, think carefully: “where the heck did my 5 extra days go? Could I have seen it coming?”
Here is the “context” Joe needed in order to produce a complete feature, with great quality, on time:

  1. Did I spare enough time reading the spec and talking with the Product Manager about it? (yes, this is part of the feature!)
  2. Did I make sure there are no open issues left?
  3. Did I spare time thinking about how this feature will be tested? How can we automate these tests easily?
  4. Did I spare time to design the solution and do the required review with other people? What about the hours needed to fix the solution based on the reviews?
  5. Did I spare time to read the QA test cases and give feedback on them?
  6. Do I have everything I need to make the feature a huge success?
    1. What about people helping me out where needed?
  7. Did I spare enough time to write the code itself? Am I sure?
  8. Did I spare time for testing? I’m talking about really great testing (unit tests, integration tests, automated UI tests, manual UI tests).
    1. Again, the goal here is great confidence in quality, not 100% test coverage.
  9. Did I spare time for code reviews? For code review fixes?
  10. Did I spare time thinking about how this feature will be deployed?
  11. Did I spare time to sit with the Product Manager again on the final result before merging the work back to “trunk”? (do it now while it’s “hot”)
  12. Did I spare time for a “bug fixing buffer”? (saving ~10% of the feature time for bug fixing is a good start; try to reduce it later)

Estimate each one of these steps (use “buffers” if certainty is low) and you’re a bit closer to understanding where the missing 5 days vanished.
Joe was thinking only about the coding effort, not the entire picture! This, by nature, means an inability to consistently predict when something will end. He was pricing the wrong thing!
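To make the checklist concrete, here is a toy sketch of pricing a feature step by step; every step name and hour figure below is made up for illustration, not a real estimate:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class EtaSketch
{
    static void Main()
    {
        // Hypothetical per-step estimates in hours, mirroring the checklist above.
        var steps = new Dictionary<string, double>
        {
            { "spec reading + PM talk", 3 },
            { "design + design review + fixes", 4 },
            { "coding", 16 },
            { "tests (unit/integration/UI)", 8 },
            { "code review + fixes", 4 },
            { "deployment planning", 2 },
            { "final PM review", 2 },
        };

        double baseHours = steps.Values.Sum();
        double bugBuffer = baseHours * 0.10; // the ~10% bug-fixing buffer suggested above
        double total = baseHours + bugBuffer;

        Console.WriteLine("ETA: {0} hours (~{1:0.#} work days)", total, total / 8);
    }
}
```

Notice how “coding” is only a fraction of the total; pricing the whole list is what makes the ETA consistent.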

What does it take to make your team a great unit of brilliant minds producing great dishes, every time?

  1. Hire the best. They’re simply worth it. (oh well, that was easy, right? :))
  2. Trust them to do the best work one can do! Help them get there, but don’t think they can’t get there by themselves with the right set of tools and context.
  3. Start by explaining the meaning of a great ETA – without it, prediction is impossible. Consistency is a lost cause. This is one bad kitchen.
  4. Ask people to stand behind their ETA. They are responsible for it!
  5. In the same breath, remind them that quality work is the right path to a correct ETA. Quality work will lead to shorter cycles eventually!
  6. But (here comes the tricky part!), they should also remember that over-refactoring leads to high quality of nothing important (as nothing ever reaches production).
  7. So, define together what “quality work” is. Set it as “team context”.
  8. Constantly hear what your people have to say about “I wish we could fix it!”. Eliminate the big things (lack of tools, too many context switches, not enough testing etc).
  9. Try to inject a lot of honesty, communication and great motivation into the team. This is the basic engine oil no one can live without.
  10. Measure delivery cycle length: spec –> design –> getting ETA –> feature is code-ready –> solving all bugs –> deployed in production.
  11. Talk with your people to understand waste again. Eliminate the big things now! Seriously!
  12. Keep hiring more people, only the best; you’ll need them soon enough.

Over time, the delivery cycle should get much shorter while quality gets even higher. The trick is that people will feel more confident in the flow, “saving” time to write quality code and making features amazingly stable on the 1st attempt. This is the consistent rhythm you should dream of. The wheel spins…

 

Do you believe in God?

Can you count the number of times you’ve been asked this innocent question?

I tried to think about and analyze the “motivation” behind this question, considering why people are so intrigued to hear my answer when putting me on the spot. Is it really that important for others to know what Oren has to say about God? I can’t ignore the simple fact that religion is a huge cornerstone of our lives, and finding more people with similar beliefs gives us great comfort and a sense of power. People are looking for acceptance, to become part of a group they can define themselves by and feel “right” in. This is a social aspect common to almost everything we do in life (love, friends, work, faith etc). I already know all of that, and so do you. So, where am I going with this?

I’m trying to muse on the notion of acceptance and where I fit in. What is my “do you believe in God” question?

Personally, it strikes me as plain crazy that our lives are so driven into what I grasp as the “wrong” groups. Do you believe in God? Do you eat Kosher food? Do you fast on Yom Kippur? Are you a rightist? The list goes on and on. Are these the questions we should really ask ourselves? Is this how we want to measure ourselves or find others to connect with? I’m not sure I know the answers, but I feel there are some deep feelings, clear ideas and emotional borders that I’m so passionate about, they must represent my scales. They must represent my inner truth. Writing allows one to shoot ideas and thoughts onto paper, so I tried to explain my truths to myself. Hopefully it will give me common ground to fit myself into a world I truly believe in, and to measure myself by what I see as right. Trying to synthesize my list, I came up with the 3 questions I have a definite feeling about. These are only my big questions, so read them with the appropriate suspicion:

  1. Are you making someone else proud?
    I’m blessed with the best parents one can wish for and the greatest family one can be born into. The best thing they have done for me is giving me their complete trust, always stating how much faith they had in me making the right call. It was really hard to mess things up with that kind of love and support. Make sure you’ve got someone in life you can make proud, be it your parents, family, girlfriend, close friend or the girl from the cafeteria. “I’m proud of you” is a huge empowerment and a constant motivator to drive your life beyond the easy and all-too-common mediocrity. This source should be your impregnable place for positive reinforcement, keeping you down to earth and giving you the confidence to make tough choices in life.

    Be careful not to confuse making someone proud with making them satisfied. You must own your path, thus there is no place for blind acceptance. You need someone who trusts your best efforts and pushes you forward in your struggles.

  2. Do you see yourself as a good person?
    Let’s avoid defining “good” for a second, as it doesn’t really matter. Although some of us are really cynical about the meaning (and no doubt some are plain bad), we all know that Mother Teresa was a good soul. The pure notion of bad and good is embedded deep inside us. Try to be truthful with yourself: do you see yourself as a good person? How would you define good? Do you think you can measure “how good” you are? Do you think it’s important? Are you doing what it takes to get “better”? Does it get easier to look in the mirror?

    You can call me whatever you might feel, but there is a romantic notion behind “being good”. The way I see it, my time here is limited, so my constant question is “what am I going to leave behind?”. I would like to think that I’ve touched some people during my short life, that I’ve managed to teach something, as my great family and friends have taught me. I’m not sure I’ve made a great change (or even a small one), but at least I feel I’ve done the best I can with my belief in sincere communication. I’ll never be Mother Teresa, but at least I’ll do the best I can with what I feel is right for me, trying to focus on my strengths and talents to make others pick their brains and wonder about themselves. Being there for others will make you feel less alone, less “on a road to nowhere”. What do you have to lose?

  3. Is there a way for you to help others?
    Do you feel lucky? If you do, you somehow played your cards right. Is there a way for you to show your tricks to the world? To make others think of a different way to look at things? Helping others makes a great closure between making others proud and feeling like a good person. It’s a powerful tool you should use wisely. The purpose is not to “convert” someone to your beliefs or bring them into your “group”. Everyone should reach their own right and wrong; you can only help by asking the right questions and sharing your thoughts and experience. This way you are part of the journey, part of a legacy in others’ lives. This is a great accomplishment to leave behind you.

The order of the questions is important. Without making others proud and having their support, it’s impossible to stay highly motivated all the time. Without constant motivation you won’t allow yourself to grow into the person you can become, making it easier to avoid helping others and to seek the “wrong kind” of acceptance. Giving up is easy; fitting in just for the sake of acceptance will not make you happier (at least not for long).

Don’t allow yourself to give up just because you were born into the wrong environment, had troubles as a child or feel like the world is against you. The world doesn’t owe you anything; you can blame your luck or your God (or lack of one), but bottom line, it’s really up to you. Don’t hurt others to pay back your awful sense of luck. Find the positive people around you and let them pick you up when needed. There is no shame in needing others’ help; they would love to give it, to feel good about themselves and make others proud. You’ll do the same for them one day, I promise you that.

Everyone who knows me is familiar with my reiterated phrase “I firmly believe in natural growth”, and it applies here as well. Find your own sources of strength and make sure you keep your “good standards” before trying to influence others to ponder their path in life. Once you have confidence in your answers, you’ll be ready to open your view to the world and see what it has to offer.

Do I believe in God?
We’re probably ants in a giant’s world, but does it really matter? I would like to think that this God cares more about us as living beings, trying to grow together, than about following strict rules without understanding the real truth behind them. Isn’t this the whole purpose of the bible? Of the stories we’ve been taught since we were born? I wish our discussions would address more of the “are we doing the best we can?” and “how can we make it better?” rather than such a superficial scratch at our real purpose.

I would love to hear what your truth is and which questions drive you on your path. Drop me a comment if you feel like sharing.

 

ReaderWriterLock, my naive implementation

Ron challenged me to write my own implementation of a reader/writer lock. He did a great job on his own implementation.

I came up with this:
(note: this is a really naive implementation; it’s far from ideal in terms of fairness and possible race conditions. Just take it as a brain-teaser)

using System;
using System.Threading;

public class StateReaderWriterLock : IDisposable
{
    private readonly AutoResetEvent _changeStateAutoLock = new AutoResetEvent(false);
    private readonly AutoResetEvent _writerDone = new AutoResetEvent(false);
    private readonly object _readersLock = new object();
    private readonly object _writersLock = new object();
    private int _readers;
    private int _state; // 0 is "neutral", 1 is read, 2 is write
    private int _writers;

    public void LockReader()
    {
        while (true)
        {
            // try to bring the state from "neutral" or "read" to "read". If the current state is "write", let's wait.
            while (Interlocked.CompareExchange(ref _state, 1, 0) == 2)
                _changeStateAutoLock.WaitOne();

            // an interesting case here where the last reader is now in ReleaseReader and we're trying to read as well; we might be too late (the writer might have changed _state already)
            // if that happens - we're back to square one, but we still want to avoid recursive locking!
            bool loseInRace = false;
            lock (_readersLock)
            {
                if (Interlocked.CompareExchange(ref _state, 1, 0) == 2)
                    loseInRace = true;
                else
                    _readers++;
            }
            if (!loseInRace)
                return; // success!
        }
    }

    public void ReleaseReader()
    {
        lock (_readersLock)
        {
            _readers--;

            // if I am the last reader, let's reset the state so any given reader/writer can take it
            if (_readers == 0)
            {
                Thread.VolatileWrite(ref _state, 0);
                _changeStateAutoLock.Set();
            }
        }
    }

    public void LockWriter()
    {
        while (true)
        {
            // try to bring the state from "neutral" or "write" to "write". If the current state is "read", let's wait.
            while (Interlocked.CompareExchange(ref _state, 2, 0) == 1)
                _changeStateAutoLock.WaitOne();

            bool loseInRace = false;
            lock (_writersLock)
            {
                if (Interlocked.CompareExchange(ref _state, 2, 0) == 1)
                    loseInRace = true;
                else
                    _writers++;
            }

            if (!loseInRace)
                break; // great success!
        }

        // allow 1 writer only
        if (Thread.VolatileRead(ref _writers) > 1)
        {
            _writerDone.WaitOne();
        }
    }

    public void ReleaseWriter()
    {
        lock (_writersLock)
        {
            _writers--;

            // if I am the last writer, let's reset the state so any given reader/writer can take it
            if (_writers == 0)
            {
                Thread.VolatileWrite(ref _state, 0);
                _changeStateAutoLock.Set();
            }
            else
            {
                _writerDone.Set();
            }
        }
    }

    public void Dispose()
    {
        _changeStateAutoLock.Close();
        _writerDone.Close();
    }
}

Not sure it’s the greatest job interview question, but it is indeed challenging and fun to play with.
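A quick usage sketch, assuming the StateReaderWriterLock class above is in scope (the shared field and the 42 are made up for the example; a real caller would protect real shared state):

```csharp
int shared = 0;
using (StateReaderWriterLock rwLock = new StateReaderWriterLock())
{
    Thread writer = new Thread(delegate()
    {
        rwLock.LockWriter();
        try { shared = 42; }                               // exclusive access
        finally { rwLock.ReleaseWriter(); }
    });

    Thread reader = new Thread(delegate()
    {
        rwLock.LockReader();
        try { Console.WriteLine("read: " + shared); }      // shared access
        finally { rwLock.ReleaseReader(); }
    });

    writer.Start();
    writer.Join();   // let the writer finish first, so the reader sees 42
    reader.Start();
    reader.Join();
}
```

The try/finally pairs matter: if the protected work throws, the lock must still be released or every other thread will block forever.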

 

On Interlocked.Increment and volatile

Imagine you’ve got the following code, running in 2 separate threads T1 and T2:

private bool _go = true;

T1                                      T2
while (_go)                        // … some code here …
{                                         _go = false; // something happened, let’s stop
   // … do work
}

Even if T2 sets _go to false, this code might lead to an endless loop in T1. Why is that?
Each CPU has local memory called the L1 cache, which might cache the value of _go. Multiple threads running on multiple CPUs can (and will) cache data and reorder instructions to optimize code. So if, for example, T1 is running on processor A and T2 on processor B, we might have an endless loop here. A simple solution is to add the volatile keyword to the _go field definition, to ensure ordering and avoid caching at the CPU level:

“The volatile modifier is usually used for a field that is accessed by multiple threads without using the lock statement to serialize access. Using the volatile modifier ensures that one thread retrieves the most up-to-date value written by another thread.” (MSDN)
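Here is a minimal runnable sketch of the fixed first example; the 100 ms sleep is just to let T1 spin for a moment before T2 (the main thread) flips the flag:

```csharp
using System;
using System.Threading;

class VolatileFlagDemo
{
    // volatile forces every read of _go to fetch the latest written value,
    // so the loop below cannot spin forever on a stale cached copy.
    private static volatile bool _go = true;

    public static void Main()
    {
        Thread t1 = new Thread(delegate()
        {
            while (_go) { /* ... do work ... */ }
            Console.WriteLine("T1 observed _go == false, exiting");
        });
        t1.Start();

        Thread.Sleep(100); // let T1 spin for a bit
        _go = false;       // something happened, let's stop
        t1.Join();         // terminates, because the write is visible to T1
    }
}
```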

Now let’s examine another example, again 2 threads T1 and T2:

private int _writers = 0;

T1                                                         T2
while (_writers > 1)                              ….
{                                                           Interlocked.Increment(ref _writers);
    _someAutoEvent.WaitOne();
}

You can see that we’re using an atomic increment via the Interlocked.Increment method, so we should be thread-safe here, right? Not quite.
We’ve got one thread reading (T1) and another thread writing (T2) to _writers. If T1 is running on processor A and T2 on processor B, the _writers value may be cached at the CPU level for some time, which means T1 might see a different value than T2. If T2 had used the value returned from Interlocked.Increment and was the only reader of that field, it would have been thread-safe. This is not the case here.
The solution is to use Thread.VolatileRead(ref _writers) to make sure T1 gets the latest value. We could, of course, also use the lock keyword to serialize access to the _writers field.
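A small runnable sketch of the counter case (note that Join already acts as a memory barrier here, so the final VolatileRead is mostly to illustrate the API; inside a live loop like T1’s above, it is what guarantees a fresh value):

```csharp
using System;
using System.Threading;

class CounterDemo
{
    private static int _writers;

    public static void Main()
    {
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.Length; i++)
        {
            threads[i] = new Thread(delegate()
            {
                // atomic read-modify-write: no increments are lost,
                // even with all 4 threads racing on the same field
                Interlocked.Increment(ref _writers);
            });
            threads[i].Start();
        }
        foreach (Thread t in threads)
            t.Join();

        // VolatileRead makes sure we observe the latest value written by the workers
        Console.WriteLine(Thread.VolatileRead(ref _writers)); // prints 4
    }
}
```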

Summary:

  • Although setting a word-size variable is atomic (as in our first example), that doesn’t mean it is thread-safe! For “boolean flags” I would recommend using volatile as a clean and simple solution.
  • For “counter scenarios”, I would use Interlocked together with Thread.VolatileRead. This should outperform lock usage and still keep your code neat and shiny.
  • The lock keyword is probably the safest way to avoid dangerous race conditions, so unless you’re sure about your solution, keep it simple and use lock.
 

Talking about Microsoft CCR on ALT.NET Israel

It seems that someone uploaded a 10-minute video of it a while back (thanks TristanIce!).
This is far from a well-prepared lecture (alt.net is all about “everyone can get up and talk”); I just tried to show the charm behind CCR after people started sharing their pain developing multi-threaded applications. I’m talking crazy-fast and the material is not well organized, so I had to think while talking/writing. The results are hard to watch. Multi-threading is a bitch, eh?

All in all, the vibe was positive and people looked intrigued by it. Anyway, here it is:

Oren explains how to use CCR framework (MSN video, the movie is playing on the right)

 

Makes me want to prepare a lecture on the subject some day…

 

FileStreamer

Requirements:

  1. Be able to read all the lines in a given file.
  2. Be able to do so even if the file is HUGE ( == don’t load it all at once).
  3. Control the number of items I want to receive and whether or not the enumerator ignores empty lines. Always nice to have.
  4. Thread safety should be supported easily. Think about 50 threads, each reading the next line and processing it.
  5. Nice performance is a plus.

Playing with the API with my teammate Ron gave the following (code written in Notepad, so it may not even compile):

“Common foreach” usage:

foreach (string line in FileStreamer.GetLines(@"c:\temp\myfile.txt", true, 1000)) { /* .. code */ } // read 1000 items from the file while ignoring empty lines.

Reading from multiple threads usage:

using (FileStreamer streamer = new FileStreamer(@"c:\temp\myfile.txt", true, -1)) // -1 means no limit, read all non-empty lines
{
    Thread[] threads = new Thread[10];
    for(int i=0; i<threads.Length; i++)
    {
        threads[i] = new Thread((ThreadStart)delegate {
            string line;
            while (streamer.TryGetNextLine(out line)) // thread safe!
            {
                // do work …
            }
            // TryGetNextLine returned false: end of file, this thread can exit
        });

        threads[i].Start();
    }

    // join the threads + whatever …
}

 

After reading a few ideas on Stack Overflow, I thought I’d share my solution:

// written by bogen (30/12/2008)
 
#region using
 
using System;
using System.Collections.Generic;
using System.IO;
 
#endregion
 
namespace Semingo.Common.Utils
{
    /// <summary>
    /// Return a stream of lines for the specified file.
    /// This class is thread safe by design!
    /// Use the static method FileStreamer.GetLines for not thread safe usage (via foreach)
    /// </summary>
    public class FileStreamer : IDisposable
    {
        #region fields
 
        private readonly object _locker = new object();
        private readonly string _path;
        private readonly bool _ignoreEmptyLines;
        private readonly int _limit;
        private readonly IEnumerator<string> _enumerator;
        private int _linesGiven;
        private bool _disposed;
 
        #endregion
 
        #region ctors
 
        /// <summary>
        /// Create a file streamer instance
        /// </summary>
        /// <param name="path">File path</param>
        public FileStreamer(string path) : this(path, false, -1)
        {
        }
 
        /// <summary>
        /// Create a file streamer instance
        /// </summary>
        /// <param name="path">File path</param>
        /// <param name="ignoreEmptyLines">Should the streamer avoid empty lines</param>
        /// <param name="limit">Number of maximum lines the streamer should return. Send -1 for no limit</param>
        public FileStreamer(string path, bool ignoreEmptyLines, int limit)
        {
            if (!File.Exists(path))
                throw new ArgumentException("Cannot find the file: " + path);
            if (limit != -1 && limit <=0 )
                throw new ArgumentException("Limit must be bigger than 0 (or -1 for no limit) but was: " + limit + ". File given was: " + path);
 
            _path = path;
            _ignoreEmptyLines = ignoreEmptyLines;
            _limit = limit;
            
            _enumerator = CreateStream().GetEnumerator();
        }
 
        #endregion
 
        #region public API
 
        public bool TryGetNextLine(out string nextItem)
        {
            lock (_locker)
            {
                return TryGetNextLineAssumingInsideLock(out nextItem);
            }
        }
 
        public bool TryGetNextLines(out ICollection<string> nextItems, int howMany)
        {
            if (howMany <= 0)
                throw new ArgumentException("'howMany' parameter must be > 0 but was " + howMany, "howMany");
 
            nextItems = new List<string>(howMany);
            lock (_locker)
            {
                string nextItem;
                for(int i=0; i<howMany; i++)
                {
                    if (!TryGetNextLineAssumingInsideLock(out nextItem))
                        break; // no more lines (EOF)
                    
                    nextItems.Add(nextItem);
                }
            }
 
            return nextItems.Count > 0;
        }
       
        public static IEnumerable<string> GetLines(string path)
        {
            return GetLines(path, false, -1);
        }
 
        /// <summary>
        /// Return the lines of the specified file as a lazy stream (via foreach; not thread safe)
        /// </summary>
        /// <param name="path">File path</param>
        /// <param name="ignoreEmptyLines">Should the streamer skip empty lines</param>
        /// <param name="limit">Maximum number of lines to return; send -1 for no limit</param>
        /// <returns>A lazy stream of lines</returns>
        public static IEnumerable<string> GetLines(string path, bool ignoreEmptyLines, int limit)
        {
            using (FileStreamer streamer = new FileStreamer(path, ignoreEmptyLines, limit))
            {
                string nextItem;
                while (streamer.TryGetNextLine(out nextItem))
                    yield return nextItem;
 
                yield break; // EOF
            }
        }
 
        ///<summary>
        ///Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.
        ///</summary>
        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);
        }
 
        #endregion
 
        #region private API
 
        /// <summary>
        /// Get the next line in the file.
        /// dev: assume that the lock is from the outside, by the caller (this is why it's a private method)
        /// </summary>
        private bool TryGetNextLineAssumingInsideLock(out string nextItem)
        {
            nextItem = null;
            if (_linesGiven == _limit)
                return false; // we reached the limit, no more please.
 
            if (!_enumerator.MoveNext())
                return false; // end of stream (EOF)
 
            nextItem = _enumerator.Current;
            _linesGiven++;
            return true;
        }
 
        private IEnumerable<string> CreateStream()
        {
            using (FileStream fs = new FileStream(_path, FileMode.Open, FileAccess.Read, FileShare.Read, 1024, FileOptions.SequentialScan))
            using (StreamReader reader = new StreamReader(fs))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    if (_ignoreEmptyLines && line == string.Empty)
                        continue; // skip empty lines if needed
 
                    yield return line;
                }
 
                yield break;
            }
        }
 
        protected void Dispose(bool disposing)
        {
            if (_disposed)
                return;
 
            if (disposing)
            {
                _enumerator.Dispose();
            }
 
            _disposed = true;
        }
 
        #endregion
    }
}
 

Part 3: SpawnEnumerator, dealing with billions of independent items

[ Part 1, Part 2, this is Part 3 ]

In part 1 I talked about the general notion behind SpawnEnumerator. In part 2 I implemented it and talked a bit about CCR and how it works under the hood.
In this third and last part I’ll show how we can actually execute billions of tasks very easily.

Iterating over billions of tasks:

In my previous post I tried to show one of the greatest yet most underused features of C# 2.0 – the yield statement. I closed the post with:
“Prefer using the yield statement as long as calculating values doesn’t require holding an expensive resource like a DB connection, or a FileHandler for long period of time.”

Now, let’s assume that we have 100,000,000 Tasks to pull from a repository (DB, files, doesn’t really matter) and execute them all via SpawnEnumerator.
Should we define a list of Task, fill it with 100,000,000 items and then iterate over it? Of course not! We couldn’t even if we wanted to.
Assuming that reading 1000 Tasks (for example) is much faster than executing them (it usually is), and that it’s safe to hold them in memory, we can “bulk read & yield” the entire thing:

public delegate T Func<T>();

public static class Yield
{
    public static IEnumerable<T> Bulked<T>(Func<IEnumerable<T>> bulkYielder)
    {
        while (true)
        {
            IEnumerable<T> yielder = bulkYielder();
            if (yielder == null)
                throw new ArgumentException("bulkYielder cannot return a null enumerable", "bulkYielder");

            int itemsGiven = 0;
            foreach (T t in yielder)
            {
                itemsGiven++;
                yield return t;
            }

            if (itemsGiven == 0)
                break;
        }
    }
}

This can be used to return a stream of task bulks:

Yield.Bulked<Task>(delegate { return tasksRepository.Dequeue(1000); }) // returns the next 1000 items from the 100,000,000-item queue, until no more tasks exist in the repository.

Important note:
In this scenario, the Dequeue method should return 1000 fully materialized items (without using yield) to avoid holding expensive resources for a long period of time (as discussed).
Once we have those 1000 items, we can yield each one of them to the caller (very cheap, no resources are held).
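For illustration, here is a minimal sketch of such a repository (the internals are hypothetical; a Queue<string> stands in for the real DB so the sketch is self-contained):

```csharp
using System.Collections.Generic;

public class TasksRepository
{
    // Stands in for the real backing store (DB, files, etc.).
    private readonly Queue<string> _source;

    public TasksRepository(IEnumerable<string> items)
    {
        _source = new Queue<string>(items);
    }

    // Materializes up to 'count' items into a list before returning, so any
    // expensive resource (a DB connection in real life) can be released here
    // instead of staying open while the caller iterates.
    public List<string> Dequeue(int count)
    {
        List<string> bulk = new List<string>(count);
        while (bulk.Count < count && _source.Count > 0)
            bulk.Add(_source.Dequeue());

        return bulk; // an empty list signals "no more items" to Yield.Bulked
    }
}
```

An empty result from Dequeue is exactly what makes Yield.Bulked stop (itemsGiven == 0), so the two compose naturally.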

Finally, we can execute billions of tasks in parallel via SpawnEnumerator:

IEnumerator<Task> tasksEnumerator = Yield.Bulked<Task>(delegate { return tasksRepository.Dequeue(1000); }).GetEnumerator();
Action<Task> taskHandler = delegate(Task t) { /* execute single task */ };

// execute the tasks in parallel, using 50 threads and holding ~1000 items in memory.
_tasksExecutor = SpawnEnumerator<Task>.Start(50, "mypool", 1000, tasksEnumerator, taskHandler);

// note: don't forget to dispose _tasksExecutor when killing/stopping the app!

Using Yield.Bulked guarantees that we won’t hold expensive resources for too long while allowing us to generate a “stream” of Tasks to run in parallel.
The code is easy to read and follow (I hope) and we gain a simple method for executing billions of tasks very effectively (CPU & memory wise).

 

The beauty of yield statement

How many of you have played with the C# yield statement? I guess most of you have.
Still, like many of the MSDN examples out there, when used incorrectly it can introduce very bad behavior into your code. Consider the following:

public IEnumerable<User> GetUsers(int count)
{
    using (MysqlConnection connection = new MysqlConnection("..."))
    {
        // MysqlDataReader reader = create a MysqlCommand and execute it
        while (reader.Read())
            yield return new User(/* ... fill parameters from the reader ... */);
    }
}

Looks pretty harmless, right? Not quite. The yield statement is transformed into a “state machine”, which means that every time we yield a result back to the client (the caller of GetUsers in our example), we wait for the client to ask for the next item (via IEnumerator<T>.MoveNext()). The code above will hold the connection open until the client is done iterating over all of the User items. This can lead to major scalability issues very quickly! You should always keep your DB connections open for short periods to prevent connection exhaustion (threads waiting for an available DB connection in the pool for a long period, until timeout). Because yield returns control to the caller, the caller might “take his time”, leading to connection exhaustion.
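One safer shape, sketched in the same placeholder style as the original: materialize the users while the connection is open, then return the completed list, so the connection is already closed by the time the caller starts iterating.

```csharp
public IEnumerable<User> GetUsers(int count)
{
    List<User> users = new List<User>(count);
    using (MysqlConnection connection = new MysqlConnection("..."))
    {
        // MysqlDataReader reader = create a MysqlCommand and execute it
        while (reader.Read())
            users.Add(new User(/* ... fill parameters from the reader ... */));
    }

    return users; // the connection is closed before the caller iterates
}
```

The trade-off is obvious: we pay the memory for the full list, but we never hold the connection hostage to a slow caller.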

On the other hand, used wisely, yield yields (lame joke, sorry) HUGE benefits:
[note: the code below was written in Notepad, so it may not compile as-is]

  • Avoid useless memory allocations

#1:
How many times have you ended up writing something like this:

    public List<T> Filter<T>(List<T> input, Predicate<T> predicate)
    {
        List<T> output = new List<T>(input.Count / 2);
        foreach (T item in input)
            if (predicate(item))
                output.Add(item);

        return output;
    }

We allocate much more memory than we need just to hold the output during the calculation. A better approach would be:

    public IEnumerable<T> Filter<T>(IEnumerable<T> input, Predicate<T> predicate)
    {
        foreach (T item in input)
            if (predicate(item))
                yield return item;
    }

This way we allocate only one T at a time (kept in the generated state machine). In addition, the caller could choose to produce each input item via yield as well, removing the need to materialize the entire “input” before calling our Filter method.
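A self-contained demo of that chaining (assuming the IEnumerable<T> signature for Filter; the even-number predicate and Numbers source are made up for illustration):

```csharp
using System;
using System.Collections.Generic;

static class FilterDemo
{
    public static IEnumerable<T> Filter<T>(IEnumerable<T> input, Predicate<T> predicate)
    {
        foreach (T item in input)
            if (predicate(item))
                yield return item;
    }

    // A lazy source: numbers are generated on demand, never stored in a list.
    public static IEnumerable<int> Numbers(int count)
    {
        for (int i = 1; i <= count; i++)
            yield return i;
    }

    static void Main()
    {
        int evens = 0;
        // Both the source and the filter are lazy, so even for 1,000,000
        // numbers we hold only one int at a time.
        foreach (int n in Filter(Numbers(1000000), delegate(int x) { return x % 2 == 0; }))
            evens++;

        Console.WriteLine(evens); // 500000
    }
}
```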


#2:

Another oh-(gosh-why)-so-common example is the following:

public void Save(T item)
{
    Save(new T[] { item });
}

public void Save(ICollection<T> items)
{
    // do your magic here to save items
}

Assuming you call Save with a single item quite a lot, you’re allocating a lot of memory just to create one-item arrays. A better approach would be:

public void Save(T item)
{
    Save(Yield.One(item));
}

public void Save(IEnumerator<T> items)
{
    // do your magic here to save items
}

public static class Yield
{
    public static IEnumerator<T> One<T>(T item)
    {
        yield return item;
    }
}

~Zero memory allocation here (just the small generated state-machine instance).

  • Avoid “impossible” memory allocation

Let’s say you want to read a 20 GB file of emails, where every line holds a single email address. Declaring a List<string> and filling it up will obviously blow up your memory; you simply can’t hold that much at once. Instead, you can use StreamReader.ReadLine and yield each line back to your client, until all of the emails are taken care of. Yes, you could read the file in chunks yourself (keeping a pointer), but this is exactly what yield does under the hood. Reminder: the DRY principle is gold (or “Don’t Repeat .Net Framework”, in our case).
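A minimal sketch of that idea (the file path is supplied by the caller; nothing here is specific to emails):

```csharp
using System.Collections.Generic;
using System.IO;

static class EmailReader
{
    // Streams the file line by line: only one line lives in memory at a
    // time, no matter how big the file is.
    public static IEnumerable<string> ReadEmails(string path)
    {
        using (StreamReader reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }
}
```

Note that the file handle stays open while the caller iterates (the same trade-off discussed above for DB connections), which is usually acceptable for a local file.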

  • Execute synchronous code asynchronously (nicely achieved via CCR)

A bit advanced, but you can read all about it here. The great benefit is that you can transform (almost) any yield-based code to run asynchronously, if needed or wanted.


Recap:
Prefer using the yield statement as long as calculating the values doesn’t require holding an expensive resource, like a DB connection or a file handle, for a long period of time.