HttpWebRequest, avoiding the pitfalls

It seems that there are MANY ways to perform an HTTP web request poorly. This is a huge problem in today's world, where web services are more common than bankrupt banks. Here is a quick pattern for doing it right:

public string Fetch(Uri requestUri)
{
    HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(requestUri);

    webRequest.Timeout = requestConnectTimeoutInMs; // take timeout from config
    webRequest.ReadWriteTimeout = requestReadWriteTimeoutInMs; // take timeout from config

    using (WebResponse webResponse = webRequest.GetResponse())
    using (StreamReader streamReader = new StreamReader(new TimeoutStream(webResponse.GetResponseStream(), fetchTimeoutInMs))) // take timeout from config
        return streamReader.ReadToEnd();
}

Details:

  1. Setting the Timeout property: makes sure we don't wait the default 100 seconds for the "ACK" from the server. WAY too much.
  2. Setting ReadWriteTimeout: This is crucial to understand. StreamReader reads data in chunks under the hood, and this timeout determines how long to wait for a single chunk to be read. 100 seconds, the default value, is again WAY too much.
  3. Using TimeoutStream (you need to implement your own; a minimal sketch follows right after this list, or let me know if you're interested and I'll send it to you): Alright, let's say you're willing to wait 500ms for the ACK (Timeout), up to 500ms for reading every chunk (ReadWriteTimeout), but not more than 5 seconds for the entire read to complete. There is no way to achieve that without TimeoutStream. It is a very simple wrapper around Stream: it starts a timer internally and overrides Seek/Read/Write (etc.) by checking the timer before calling the inner stream's method. For example:

          public override int Read(byte[] buffer, int offset, int count)
          {
              CheckTimeout(); // throw TimeoutException if the timeout was reached
              return _stream.Read(buffer, offset, count);
          }
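
Here is a minimal sketch of what such a TimeoutStream might look like. The class name and CheckTimeout come from the text above; the constructor, the Stopwatch and the rest of the members are my own assumptions, not necessarily identical to the version I use:

using System;
using System.Diagnostics;
using System.IO;

// A minimal sketch of the TimeoutStream idea: wrap an inner Stream, start a
// stopwatch on construction and fail any further I/O once the total budget is exceeded.
public sealed class TimeoutStream : Stream
{
    private readonly Stream _stream;
    private readonly int _timeoutInMs;
    private readonly Stopwatch _watch = Stopwatch.StartNew();

    public TimeoutStream(Stream stream, int timeoutInMs)
    {
        _stream = stream;
        _timeoutInMs = timeoutInMs;
    }

    private void CheckTimeout()
    {
        if (_watch.ElapsedMilliseconds > _timeoutInMs)
            throw new TimeoutException("Total read/write budget exceeded");
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        CheckTimeout(); // throw TimeoutException if the total timeout was reached
        return _stream.Read(buffer, offset, count);
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        CheckTimeout();
        _stream.Write(buffer, offset, count);
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        CheckTimeout();
        return _stream.Seek(offset, origin);
    }

    // The rest simply delegates to the inner stream:
    public override bool CanRead { get { return _stream.CanRead; } }
    public override bool CanSeek { get { return _stream.CanSeek; } }
    public override bool CanWrite { get { return _stream.CanWrite; } }
    public override long Length { get { return _stream.Length; } }
    public override long Position
    {
        get { return _stream.Position; }
        set { _stream.Position = value; }
    }
    public override void Flush() { _stream.Flush(); }
    public override void SetLength(long value) { _stream.SetLength(value); }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _stream.Dispose();
        base.Dispose(disposing);
    }
}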

Multiple HttpWebRequest limitation:
By default, you can't perform more than 2-3 concurrent HttpWebRequests against the same host (the exact number depends on the OS/edition). In order to override it (the easiest way, IMHO), don't forget to add this under the <configuration> section in the application's config file:

<system.net>
  <connectionManagement>
     <add address="*" maxconnection="65000" />
  </connectionManagement>
</system.net>
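
If you'd rather do it from code instead of the config file (say, in a quick console tool), the same limit can be raised programmatically; set it before the first request is created:

// Equivalent to the connectionManagement config entry above:
// raise the per-host connection limit for the whole process.
System.Net.ServicePointManager.DefaultConnectionLimit = 65000;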

Why should you follow these guidelines:

  1. Never trust 3rd party components: avoid excuses like “my site is not responsive because 1000 threads are waiting for web-service-X to respond”. By setting those parameters you’re safe to make your own choices of how much time to wait. Log and monitor these things to adjust your application and alert your suppliers.
  2. Be able to determine your own SLA for the world: again, if internally you need to call a web-service, make sure you’re able to control the time you’re willing to spend. You’ve got clients to serve and they want you to meet the SLA as you promised!

Important note about recycling HttpWebRequest.GetResponse()
Simply put, it doesn't work, by design. That means that if you fail to get a response on time (due to 1, 2 or 3), don't call webRequest.GetResponse() again, as the response is cached internally (you'll get the same HttpWebResponse). What you should do is re-create the HttpWebRequest and try again. I don't agree with Microsoft's design choice for this method, but at least it's good to be aware of it.

   from MSDN:

   ” Multiple calls to GetResponse return the same response object; the request is not reissued. ”

Final note:
You should obviously consider writing an HttpWebRequestHelper class (or extension method) and using it instead of copy&pasting this code all over your codebase.
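
For example, here is a minimal sketch of such a helper, combining the timeouts above with the "re-create the request on retry" rule. The class name, the hard-coded timeout values and the retry count are placeholders for whatever your config holds, and it relies on the TimeoutStream sketch from earlier:

using System;
using System.IO;
using System.Net;

public static class HttpWebRequestHelper
{
    // Hypothetical defaults - in real code these should come from configuration.
    private const int ConnectTimeoutInMs = 500;
    private const int ReadWriteTimeoutInMs = 500;
    private const int TotalFetchTimeoutInMs = 5000;

    public static string Fetch(Uri requestUri, int retries)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                // A *new* HttpWebRequest per attempt - GetResponse() results are cached,
                // so reusing the old request would just return the same response object.
                HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(requestUri);
                webRequest.Timeout = ConnectTimeoutInMs;
                webRequest.ReadWriteTimeout = ReadWriteTimeoutInMs;

                using (WebResponse webResponse = webRequest.GetResponse())
                using (StreamReader streamReader = new StreamReader(
                           new TimeoutStream(webResponse.GetResponseStream(), TotalFetchTimeoutInMs)))
                    return streamReader.ReadToEnd();
            }
            catch (Exception) // WebException / TimeoutException
            {
                if (attempt >= retries)
                    throw;
                // log and retry with a brand new request
            }
        }
    }
}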

 

Part2: SpawnEnumerator, dealing with billions of independent items

[ Part 1, this is Part 2, Part 3 ]

In Part 1 I talked about the general notions behind SpawnEnumerator and played with the API. If you're not familiar with Microsoft's CCR, this post might require a 2nd and 3rd read to understand completely. CCR changes the way you should think about and address async code. It's a game worth playing, and studying the rules is only to your advantage. Alright, enough chit-chat, let's make it happen (complete code attached at the end).

Class definition:

  public sealed class SpawnEnumerator<T> : IDisposable
  {
       private SpawnEnumerator(int threadsCount, string threadsPoolName, IEnumerator<T> filler) { /* initialize the fields from the parameters, nothing more */ }

Now, let’s have a look at some of the fields:

  private const double DefaultLowThresholdPrcentage = 0.1; // when the "items to process" queue reaches 10%, we want to re-fill. Should be exposed, of course.

  private event Action<DateTime> _enumeratorDepleted = delegate { }; // triggered when the enumerator is empty
  private event Action<DateTime> _allTasksAreCompleted = delegate { }; // triggered when all tasks are completed

  private readonly Dispatcher _dispatcher; // our "threadpool"
  private readonly DispatcherQueue _dispatcherQueue; // holds actual ITasks, waiting for the dispatcher (aka the worker threads) to handle them. More about it soon.

  private readonly Port<T> _itemsToProcessPort;  // holds wannabe tasks, currently a queue of items of T we want to process.
  private readonly Port<EmptyValue> _itemCompleteNotificationPort; // soon…
  private readonly Port<EmptyValue> _initializePort;  // soon…

  private readonly IEnumerator<T> _enumerator; // the enumerator we'll use to fill the _itemsToProcessPort

Deeper look on what we have so far:

  • _itemsToProcessPort: will act as the queue of items we want to process. In CCR's world, Port<T> is actually a "smart queue" (more on that later).
  • _itemCompleteNotificationPort: will be used to get a notification for every completed item. Assuming we fill 100 items into the port and our lower limit is 10%, we want to re-fill _itemsToProcessPort after every 90 completed items. Notice we're using EmptyValue as T; EmptyValue is a CCR type that exposes EmptyValue.SharedInstance to avoid memory allocations.
  • _initializePort: will be used to initialize _itemsToProcessPort with the 1st bulk of items.

Step back, what's going on?! Why so many Ports?
Well, the idea behind CCR is all about messaging. You post messages to ports (Port<T> is thread-safe, of course) and by doing so you take advantage of the "smart queue" implementation behind Port<T>. When a message is posted to a port, the CCR tries to apply some "predicates" on the port; if a "predicate" returns true, it dequeues the item(s) from the Port that matched that "predicate", creates an ITask out of them and pushes it into the DispatcherQueue as an "actual task".

Using Ports makes it easier for us to define complex async code. Instead of putting locks all over the place, I can simply post a message to _itemCompleteNotificationPort and, after X messages are posted to this port, ask to re-fill the queue when a thread is available. This is much easier than counting each completed item and, if ((counter % X) == 0), locking some object and re-filling. Both will work, but in the CCR world you don't have to think about technical async problems/solutions, but rather about the logical async operations you want to perform. You'll write much less code, zero locks of your own, and mostly think about "this could run concurrently", "this must run exclusively", and let the CCR schedule everything for you.
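
To make the post/receive idea concrete before diving into SpawnEnumerator itself, here is a tiny, standalone CCR sketch (not part of the attached code): we post strings to a Port<string> and register a persistent receiver that handles each message on a pool thread.

using System;
using Microsoft.Ccr.Core;

static class PortDemo
{
    static void Main()
    {
        // Minimal post/receive example: every string posted to the port is
        // turned into a task and executed by one of the dispatcher's threads.
        using (Dispatcher dispatcher = new Dispatcher(2, "demo-pool"))
        using (DispatcherQueue queue = new DispatcherQueue("demo-queue", dispatcher))
        {
            Port<string> port = new Port<string>();

            // "true" = persistent receiver: handle every message ever posted.
            Arbiter.Activate(queue,
                Arbiter.Receive(true, port,
                    delegate(string message) { Console.WriteLine(message); }));

            port.Post("hello");
            port.Post("ccr");

            Console.ReadLine(); // give the pool threads a chance to run
        }
    }
}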

Start method: (as discussed in Part 1)
public static SpawnEnumerator<T> Start(int threadsCount, string threadsPoolName, int upperQueueSize, IEnumerable<T> filler, Action<T> handler)
{
  // .. validate parameters, nothing interesting…
  SpawnEnumerator<T> sw = new SpawnEnumerator<T>(threadsCount, threadsPoolName, filler.GetEnumerator());
  sw.Initialize(handler, upperQueueSize);
  return sw;
}

API design: why a Start method with a private constructor instead of a public constructor alone:
The client of this method should understand that once she supplies the arguments, things start happening: we immediately begin processing items from the enumerator. You'll soon find out that Start is a non-blocking method. This "Start" method, so I feel, makes it more explicit, as it should be.
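
For example (a hypothetical caller; GetHugeSequence and Process are placeholders, not part of the attached code), note that the line after Start runs immediately while items are already being handled in the background:

// Hypothetical usage: processing starts the moment Start returns.
using (SpawnEnumerator<int> spawner = SpawnEnumerator<int>.Start(
           10, "my-pool", 100, GetHugeSequence(), delegate(int n) { Process(n); }))
{
    // Start is non-blocking: we get here right away, while the worker
    // threads are already handling items pulled from the enumerable.
    Console.ReadLine();
}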

Initialize method:
private void Initialize(Action<T> handler, int upperQueueSize)
{
  RegisterRecievers(handler, upperQueueSize); // where CCR *magic* happens. soon…
  _initializePort.Post(EmptyValue.SharedInstance); // post a message to let “someone” know we want to fill the 1st bulk of items
}

RegisterRecievers method:
Before we look at the code, here is a reminder of the main things we want to accomplish:

  1. We should fill the _itemsToProcessPort with the 1st bulk, and again whenever the queue reaches its lower limit, which we detect by counting how many items were completed. Keeping in mind that _enumerator is not thread-safe and we don't want to start locking access to it on our own, we must make sure that re-filling is done exclusively, from one thread only.
  2. We want to handle each one of the items posted to _itemsToProcessPort with the supplied "handler" (given in the Start method). Each item is independent, so obviously we want to process items concurrently, according to the number of threads in the Dispatcher.

private void RegisterRecievers(Action<T> handler, int upperQueueSize)
{
  int numberOfItemsToDepleteBeforePushingNewBulk = (int)Math.Ceiling((1 - DefaultLowThresholdPrcentage) * upperQueueSize);

  Arbiter.Activate(_dispatcherQueue,

     Arbiter.Interleave(
         new TeardownReceiverGroup(),// nothing here
         new ExclusiveReceiverGroup(
                // 1st bulk:
                Arbiter.Receive(false, _initializePort, delegate { FillItemsToProcessQueueWithNextBulk(upperQueueSize); }), // only once, that’s why the “false” is here.

                // enough items were completed which means “items to process” queue reached lower limit:
                Arbiter.MultipleItemReceive(true, _itemCompleteNotificationPort, numberOfItemsToDepleteBeforePushingNewBulk,
                                          delegate { FillItemsToProcessQueueWithNextBulk(upperQueueSize); })
            ),
         new ConcurrentReceiverGroup(
                // process items concurrently
                Arbiter.Receive(true, _itemsToProcessPort, // listen to every post, that’s why the “true”
                     delegate(T item)
                     {
                         try
                         {
                             handler(item);
                         }
                         catch (Exception err) { /* log error */ }
                         finally
                         {
                             HandleCompletedItem();
                         }
                     })
            )
        )
    );
}

Alright, this is a bit harder to read but let’s try to simplify it by reading it from inside-out.

  • Under ExclusiveReceiverGroup you'll see 2 receivers. One listens to _initializePort: once a message is posted, it creates an ITask that internally (when a thread is available) calls the FillItemsToProcessQueueWithNextBulk method. The second receiver listens to _itemCompleteNotificationPort and does the same as the first, but only after every numberOfItemsToDepleteBeforePushingNewBulk messages posted to the Port.
  • Under ConcurrentReceiverGroup you'll see a receiver listening to _itemsToProcessPort; for every message it creates an ITask that internally (again, once a thread is available) runs the given "item handler" with the item dequeued from the Port.
  • We use Arbiter.Activate(_dispatcherQueue, …) to register each ITask created by the receivers into the internal queue: the queue of actual tasks.
  • This method is non-blocking; we only register receivers that know how to create and enqueue an ITask from messages posted to some Port. That's it.
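
The two helpers referenced above, FillItemsToProcessQueueWithNextBulk and HandleCompletedItem, are part of the attached code. Roughly, they might look like the simplified sketch below; the real version also does the bookkeeping needed to raise _allTasksAreCompleted:

private void FillItemsToProcessQueueWithNextBulk(int bulkSize)
{
    // Runs under the ExclusiveReceiverGroup, so touching the
    // non-thread-safe _enumerator here is safe without any locks.
    for (int i = 0; i < bulkSize; i++)
    {
        if (!_enumerator.MoveNext())
        {
            _enumeratorDepleted(DateTime.Now); // nothing left to pull
            return;
        }
        _itemsToProcessPort.Post(_enumerator.Current); // becomes a "wannabe task"
    }
}

private void HandleCompletedItem()
{
    // Every completed item posts a notification; once enough of them pile up,
    // the MultipleItemReceive above triggers the next re-fill.
    _itemCompleteNotificationPort.Post(EmptyValue.SharedInstance);
}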

Recap:
We post messages to different Port<T>s and write receivers that "listen" to those messages based on some rules (one receiver listens to every message, one listens to only one message, one listens to X messages, etc.). When a receiver's rules apply, the CCR dequeues all the relevant messages (those that matched the rule) from the Port<T> and wraps them, together with the supplied delegate, as an ITask. That ITask instance is enqueued in the DispatcherQueue until the Dispatcher has a free thread to handle it. The Dispatcher, according to the requested scheduling (some ITasks must run exclusively, some can run concurrently, as we've seen), will execute those tasks.

You can download the complete code (with some extra features) here: SpawnEnumerator.txt (12KB) – you’ll need CCR & log4net dll in order to compile it.

 

Part1: SpawnEnumerator, dealing with billions of independent items

[ this is Part 1, Part 2, Part 3 ]

I promised to write more about CCR and how to use it in practice.
There is no better way to explain how infrastructure works than writing some code and playing with it, so let's toy with a simple utility, based on CCR under the hood.

Scenario: you’ve got billions of *independent* tasks you need to execute as fast as possible.
Requirements:

  1. The number of "worker threads" should be easy to define. Obviously.
  2. Memory consumption – don't eat more than you can chew. We don't want to hold everything in memory, and this should be easy to configure.
  3. The process should *not* shut down due to an un-handled exception in one of the worker threads.
  4. We want to know when the enumerator is depleted and/or all items were completed. This is crucial for testing and adjusting parameters.

 

The general notion behind SpawnEnumerator is something like this:

[Image: SpawnEnumerator, the general idea]

 

We're pulling a bulk of items from the enumerator; each one of them is actually a "wannabe" item. Combining the value of each T with the given "item handler", it can be turned into an actual task, waiting for its turn in the "threads pool". Once the port/queue of "items to execute" reaches its lower limit, we'll pull another bulk of items from the enumerator, until the enumerator is depleted.

API playground:

SpawnEnumerator<T>.Start(int numberOfThreads, string threadPoolName, int numberOfItemsToHoldInMemory, IEnumerator<T> enumerator, Action<T> singleItemHandler)

T = the type of items we would like to run. If we have 10,000,000 items of type int, we'll supply "int" as T.
numberOfThreads = how many worker threads do we want to run?
threadPoolName = a name for the worker threads, for easier debugging.
numberOfItemsToHoldInMemory = a rough number of items we want to hold in memory, to make sure memory consumption won't blow up in our face.
enumerator = the enumerator of "items" to execute, assuming it will hold a *huge* amount of tasks.
singleItemHandler = a delegate that receives 1 item and handles it.

event Action<DateTime> EnumeratorDepleted

event Action<DateTime> AllTasksAreCompleted

Test playground:

(written in notepad, sorry for stupid mistakes)

[Test]
public void Execute_LargeEnumeratorOfNumbers_ExecuteAllItems()
{
    // arrange:
    int numberOfThreads = 10;
    string threadPoolName = "mypool";
    int numberOfItemsToHoldInMemory = 100;

    IEnumerator<int> items = Yield.For(1, 10000); // yield return the numbers 1-10000
    Action<int> singleItemHandler = delegate(int item) { /* sum the given numbers, via Interlocked.Add */ };

    ManualResetEvent trigger = new ManualResetEvent(false);

    // act:
    _executor = SpawnEnumerator<int>.Start(numberOfThreads, threadPoolName, numberOfItemsToHoldInMemory, items, singleItemHandler);
    _executor.AllTasksAreCompleted += delegate { trigger.Set(); };

    // assert:
    bool signaled = trigger.WaitOne(TimeSpan.FromSeconds(1), false);
    Assert.IsTrue(signaled, "timeout reached, that shouldn't happen!");
    // assert that all of the items were handled, by matching the sum of the 1..N sequential numbers (simple formula) to what we collected in the handler
}

* In the "test teardown" we can check whether _executor is not null and, if so, Dispose it to shut down the "worker threads".
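
By the way, Yield.For above is not a framework method, just a tiny helper from our own codebase; a trivial sketch of it could be:

using System.Collections.Generic;

public static class Yield
{
    // yield return the numbers from 'from' up to and including 'to'
    public static IEnumerator<int> For(int from, int to)
    {
        for (int i = from; i <= to; i++)
            yield return i;
    }
}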

 

Next post – Implementing SpawnEnumerator via CCR.

 

Building search queries in Bugzilla

This is mostly a self-note but heck, maybe someone will reach this post via Delver (or Google :)), so I’m all about sharing.

Anyway, we’re using (for now) Bugzilla and I tried to get all the “open” bugs for my team with status equal to X,Y,Z.

Sounds easy, right? Well, you are… sadly mistaken.
After investing 15 minutes banging my head against the nearest wall with Bugzilla's "Advanced Search" (well, I'll be polite and say it's "advanced" alright), I gave up on that one.
Instead, I hacked the URL a bit to understand the dark voodoo of Bugzilla and voila, 2 minutes later:

http://qabugz/cgi-bin/bugzilla/buglist.cgi
?bug_status=NEW
&bug_status=ASSIGNED
&bug_status=REOPENED
&assigned_to=Joe
&assigned_to=Joe2
&assigned_to=Joe3
&query_format=specific
&field0-0-0=bug_status
&field0-0-1=assigned_to
&type0-0-0=anyexact
&type0-0-1=anyexact
&order=bugs.bug_severity

Just remove the line breaks, of course (they're only there to make it easier to read), and replace "Joe" with your favorite developer's name.

If you want to make it a bit more complex, here is a nice site with the supported field names:
http://pkp.sfu.ca/bugzilla/page.cgi?id=quicksearchhack.html

 

Anyway, I hope it will help someone.

* Geeky note, btw: notice the funny "binary index" of the fields! It made me laugh quite a bit, being the geek that I am.

 

Self Organization inside a Team

One of the biggest challenges in management is being able to track your own rhythm, making sure your plans are executed and things go smoothly.

In my previous post, Driven By Self Organization, I stressed the importance of making things visible and planning for the future. How can one set up the team's environment so that it drives the entire team to success?

Thinking about it lately, I realized I have my own ideas about what should be visible, how the team should interact internally and how one should behave in such a team as a Team Lead. Don't get me wrong, I don't have 30 years of experience leading small<->giant teams for small<->giant companies; these ideas are based solely on my gut feeling: make things SIMPLE (KISS) so the entire team is driven by self-organization, without the "burden" of self-management. If things are easy to do, it's easier to get better at them.

The trick is to allow all the members of the team to be part of the organize->execute game. Some will play, some won't, but they will all be affected by the always-ready environment and notice how the inner interaction changes their day. By making things visible, they will be motivated to take action (it will feel natural). I came up with this drawing to expose our sprint plans and progress:

 

[Image: the sprint board]

 

This is a very SIMPLE presentation of the current sprint features and every-day progress – “cards” will move between columns as people work on them.
There are some magic “self organization” tricks integrated into this 1-Visible-view:

  1. Achievement-based planning: Before arranging the features/cards, we try to set the "deliveries" for the week. This is free text (don't be tempted to make it a list of features), ideas that "set the mood" for the week. It will describe features we want to finish, some design we need to handle before next week, some quality check we want to pass ("make sure no critical/major bugs are open"), etc. These help plan the week, as they define measurable goals, driving the team to achievement-based planning rather than like-best-do-first development.
  2. Visibility is key: Every member of the team knows exactly what's left for the entire team on a weekly level. No more "but I finished all my tasks two days earlier than we talked about! What could I do about Joe being new here and not being able to continue as you thought? Oh, I didn't know we depend on his task before we can release the package…"
  3. Help each other: remind yourselves that things need to be completed, and help one another when someone is behind schedule (alright Joe, I'll take Feature A, don't worry). The board makes it easier to know "where can I help?"
  4. We want Quality: Nothing is DONE until it's tested and fixed. The idea of splitting "coded" tasks from "tested" tasks is to set the mood for "production ready" code.
  5. Small goals are easier to achieve: Splitting the sprint into smaller chunks makes it easier to win small battles. Each week defines small trophies – the "deliveries" we promised for that week.
  6. Plan leftovers for tomorrow: at the end of every week/sprint, you can easily see what was left. Discuss why it slipped and plan it for tomorrow (= next week or next sprint).

 

A Team Lead in such a team will mostly act as a coach: helping the team members split the features into tasks, removing obstacles, encouraging cooperation and taking notes about "how can we get better?". Most importantly, it allows her (or him) to be productive and feel she can help the team's effort, instead of the constant-chaos feeling managers tend to have when things go poorly. The team members are aware of the plans and can balance the effort to break loose of this chaos-like feeling.

 

Driven by Self Organization

Think about a young fellow wanting to get into shape. Here are 2 scenarios of how one could tackle that wish:

1. Create a plan and “manage yourself” to keep it.

   20:00 – set the clock to 06:00
   06:00 – wake up, make sure you’re not falling back to sleep! get dressed etc
   06:15 – make sure to eat something small and drink some water
   06:35 – run 5km
   08:00 – make sure your bag is ready for school
   08:15 – go to school
   …
   20:00 – set the clock to 06:00


2. Organize things to drive your day:

   20:00 – put your shoes next to your bed, set your clock to 06:00 and set it FAR from the bed, prepare a little something to eat for tomorrow morning, prepare your bag for (tomorrow’s) school.
   06:00 – wake up – go turn off the damn clock (you have to stand up in order to do it), get dressed, etc.
   06:15 – eat something small and drink some water
   06:35 – run 5km
   08:00 – relax for a few minutes and then go to school
   …
   20:00 – put your shoes next to your bed, set your clock to 06:00 and set it FAR from the bed, prepare a little something to eat for tomorrow morning, prepare your bag for (tomorrow’s) school.

Where do most of us fail?

I might be wrong, but it seems that a lot of us (myself included) simply can't manage our time wisely during the day, for long periods of time. It's too easy to forget something you need to do TODAY when you had to force yourself to plan it TODAY. It's too easy to fail. It's too easy to lose the rhythm.

This is why most of us can’t lose weight, can’t get into shape, can’t read 5 books every month etc.


Driven by Self-organization

To me, it means that I want to set up my environment to drive me to success. If I take the time to prepare my tomorrow, there is little chance I'll fail due to laziness: (1) I'm planning for TOMORROW, so what do I care about investing the time? I only plan things and set the environment/mood. I don't need to run the 5km now, I only want to make sure it will be easier to achieve tomorrow; and (2) tomorrow morning, surprise surprise, everything is ready for me! I don't need to wake up and find out that I forgot a stupid thing like the fact that my shoes are in the washing machine. The rules are pretty easy: de-couple planning from performing, and make sure everything you need for those actions is visible and available for the time you'll need it.

Although it seems like planning is more tedious than actually doing the task, planning for the future is quite relaxing. Try it: plan your tomorrow at the end of today, and think about what should be set up so your tomorrow goes smoothly. It will make it easier for you to come in in the morning and simply perform, without the burden of having to plan quickly just to perform now:

"hhmmm… alright, it's Wednesday, what should I do today… gosh, so many things to complete! Maybe I'll start with sending those emails… naaa… I don't have the energy for that right now… maybe I'll finish that task I promised yesterday! naa… she doesn't need it today anyway… crap! I don't have the energy to deal with it! Alright, the first thing is to grab a cup of coffee."

This leads to “do what I like best first” syndrome.

At the end of the day, before going home, go over the things you've accomplished and see what is left for tomorrow. Relax, you don't need to do those chores now. Write down everything you should accomplish tomorrow (with time estimations) and go home smiling, knowing that your tomorrow is planned for achievements rather than for your personal whims.

 

Deep dive into CCR and Parallel Extensions: kick-off

I'm preparing a set of articles about Microsoft's CCR and Parallel Extensions, trying to explain the multi-threading libraries supplied by MS and how they are going to change our lives in the multi-core world.

I thought about starting with an architectural point of view for each library, going over the data structures and then demonstrating usage in real-life scenarios. I'll start with level-100 articles and dive deeper, all the way to complex scenarios and how to harness each framework's full power for your needs.

We've had the pleasure of working with CCR here at Delver, so I'll share some of the code base we created on top of it, some tips for using it correctly and my take on using these libraries in mainstream applications.

 

Some nice articles to get you started if you didn’t hear about these libraries so far:

Parallel Programming with .NET : Most Common Performance Issues in Parallel Programs

Concurrent Affairs: Concurrency and Coordination Runtime — MSDN Magazine, September 2006

Channel9 Wiki: Concurrency Runtime

Parallel Programming with .NET : Coordination Data Structures Overview

Parallel Programming with .NET : Fork/Join parallelism with .NET CountdownEvent

Parallel Programming with .NET : Wrapping an APM implementation with Future<T>

 

Is there anything specific you're interested in reading about regarding these topics?

 

Hosting an application source on an external repository

As I mentioned before, Pasha and I are in the process of developing a small application, and we're still thinking about where to place our Source Control repository. At the beginning we thought about hosting it on my computer, but after a few hours of fighting the SubVersion configuration (I need to read a little, I thought it would be simpler) and some attempts to configure VSS 2005, I realized (actually, Dror realized, I just nodded) that my router is not suited to serve as a host.

We now have 2 options: (1) order a new router (about 100$), buy a static IP and configure it properly (as I told Dror: "I'm a code hacker, not a network hacker"), which means a lot of time on my side (and 4-7 days of waiting for the router to arrive), or (2) pay for a repository server and let someone else take care of the static IP and the configuration. Oren Eini proposed working with www.hosted-projects.com, which seems like a nice service – they provide SubVersion & BugTracking for a nice fee. Lacking a solid Source Control really keeps us from coding at full speed, so it's urgent to set up our repository in the next couple of days. Any ideas?


p.s – no, it’s not Open Source :-).