    The basic Azure Worker Role consists of a Run method, an endless loop, and a sleep statement. Earlier this week, Magnus Martensson walked through implementing a more sophisticated wait object than the generic Thread.Sleep call. That reminded me of a problem inherent in the basic Microsoft template.

    Every exit is a crash.

    The basic Worker Role is a while(true) statement that alternates between doing work and sleeping for a period of time. When it's time for Azure to recycle the instance, deploy a new one, scale...what happens to this while(true) statement?

    It's killed.

    The more critical it was, the higher our chances it was in the work side of the work/sleep loop.

    (Ouch.)

    The Basic Worker Role

    Here is the worker class that Visual Studio generates when creating a new Worker Role project:

    // Namespaces the class relies on
    using System.Diagnostics;
    using System.Net;
    using System.Threading;
    using Microsoft.WindowsAzure.ServiceRuntime;

    public class WorkerRole : RoleEntryPoint
    {
        public override void Run()
        {
            // This is a sample worker implementation. Replace with your logic.
            Trace.WriteLine("WorkerRole1 entry point called", "Information");

            while (true)
            {
                Thread.Sleep(10000);
                Trace.WriteLine("Working", "Information");
            }
        }

        public override bool OnStart()
        {
            // Set the maximum number of concurrent connections
            ServicePointManager.DefaultConnectionLimit = 12;

            // For information on handling configuration changes
            // see the MSDN topic at http://go.microsoft.com/fwlink/?LinkId=166357.

            return base.OnStart();
        }
    }

    Azure calls OnStart when the instance starts, then calls the Run method. This sample will hard crash whenever Azure scales it out of existence, swaps in new instances, decides it's Windows patch time, or we press the Stop button.

    Let's see it in action. I've added a DoWork() method that sleeps for 10 seconds to simulate important work being done. I've also added Trace.WriteLine calls to the existing methods and to an override of the OnStop method, so we can see what's happening.

    public override void Run()
    {
        // This is a sample worker implementation. Replace with your logic.
        Trace.WriteLine("BasicWorker - Entry point called", "Information");

        while (true)
        {
            Thread.Sleep(10000);
            Trace.WriteLine("BasicWorker - Starting some work", "Information");
            DoWork();
            Trace.WriteLine("BasicWorker - Finished some work", "Information");
        }
    }

    public void DoWork()
    {
        Thread.Sleep(10000);
    }

    public override void OnStop()
    {
        Trace.WriteLine("BasicWorker - OnStop", "Information");
    }

    If we run this in the emulator and suspend the worker role in the middle of our important work, it exits right on cue, in the middle of the work.

    Diagnostic Trace Output

    Information: 000.1s - BasicWorker - Entry point called
    Information: 010.1s - BasicWorker - Starting some work
    Information: 020.1s - BasicWorker - Finished some work
    Information: 030.1s - BasicWorker - Starting some work
    Information: 040.1s - BasicWorker - Finished some work
    Information: 050.1s - BasicWorker - Starting some work
    Information: 056.1s - BasicWorker - OnStop

    note: I later added timestamps to the Trace output for readability

    If this were a real worker role, we could have been doing just about anything in that step when it was killed. Is our system still in a good state?

    A Cancel-able Worker Role

    The base class for a WorkerRole is RoleEntryPoint. As we saw above, it offers an OnStop method that is called when the instance is suspended. More importantly, though, we are allowed to hold off returning from OnStop for up to 30 seconds while we finish what we are working on.

    Note: Early last year (2012) this was extended to 5 minutes, though it's not reflected in the documentation above.

    The first change we want to make is to replace the while(true) construct with a method that we can cancel. Using a CancellationTokenSource, we can instead loop while that token is not cancelled.

    private CancellationTokenSource _cancellationTokenSource;

    public override void Run()
    {
        Trace.WriteLine("SafeWorker - Entry point called", "Information");
        _cancellationTokenSource = new CancellationTokenSource();
        var token = _cancellationTokenSource.Token;

        while (!token.IsCancellationRequested)
        {
            Trace.WriteLine("SafeWorker - Starting some work", "Information");
            DoWork();
            Trace.WriteLine("SafeWorker - Finished some work", "Information");
            token.WaitHandle.WaitOne(10000);    // sleep 10s or exit early if cancellation is signalled
            // ...

    Replacing the Thread.Sleep with a WaitOne() call on the token's WaitHandle lets us reduce the time it takes to cancel. Unless the handle is signaled (cancellation), the call blocks for the specified number of milliseconds before continuing; if cancellation is signaled, it returns right away. Moving the WaitOne to the end of the loop ensures that if a cancellation is signaled during the wait, we won't pick up one last bit of work before exiting.
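
    To see the early wake-up in isolation, here is a minimal console sketch that is not part of the original sample. It assumes .NET 4.5 or later (for CancelAfter), and the WaitDemo class and SleepOrCancel helper are hypothetical names used only for illustration. WaitOne returns true when the handle was signaled (cancellation) and false when the timeout elapsed.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical demo class; not part of the worker role sample.
public static class WaitDemo
{
    // Waits on the token's handle for up to timeoutMs. Returns true if the
    // wait ended because cancellation was signaled, false if it timed out.
    public static bool SleepOrCancel(CancellationToken token, int timeoutMs)
    {
        return token.WaitHandle.WaitOne(timeoutMs);
    }

    public static void Main()
    {
        using (var cts = new CancellationTokenSource())
        {
            cts.CancelAfter(200);   // simulate a stop request arriving 200 ms in

            var stopwatch = Stopwatch.StartNew();
            bool cancelled = SleepOrCancel(cts.Token, 10000);
            stopwatch.Stop();

            // The wait should end shortly after the 200 ms mark, not at 10 s.
            Console.WriteLine("Cancelled: {0}, waited about {1} ms",
                cancelled, stopwatch.ElapsedMilliseconds);
        }
    }
}
```

    A Thread.Sleep(10000) in the same spot would always run the full ten seconds, which is time taken out of the shutdown window.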

    The other piece of the equation is making OnStop wait until we have safely exited the loop. We can achieve this by creating a "Safe to exit" WaitHandle that is only set after successfully exiting the loop. OnStop will cancel via the CancellationTokenSource, then wait for the "Safe to exit" handle to be set before returning.

    private CancellationTokenSource _cancellationTokenSource;
    private ManualResetEvent _safeToExitHandle;

    public override void Run()
    {
        Trace.WriteLine("SafeWorker - Entry point called", "Information");
        _cancellationTokenSource = new CancellationTokenSource();
        _safeToExitHandle = new ManualResetEvent(false);
        var token = _cancellationTokenSource.Token;

        while (!token.IsCancellationRequested)
        {
            Trace.WriteLine("SafeWorker - Starting some work", "Information");
            DoWork();
            Trace.WriteLine("SafeWorker - Finished some work", "Information");
            token.WaitHandle.WaitOne(10000);    // sleep 10s or exit early if cancellation is signalled
        }

        Trace.WriteLine("SafeWorker - Ready to exit", "Information");
        _safeToExitHandle.Set();    // cleanly exited the main loop
    }

    // ...

    public override void OnStop()
    {
        Trace.WriteLine("SafeWorker - OnStop Called", "Information");
        _cancellationTokenSource.Cancel();
        _safeToExitHandle.WaitOne();
        Trace.WriteLine("SafeWorker - OnStop Complete, Exiting Safely", "Information");
    }

    Now if we run this like the BasicWorker above, hitting the Suspend button in the middle of the DoWork call, we see the system takes the time to exit out safely:

    Diagnostic Trace Output

    Information: 000.0s - SafeWorker - Entry point called
    Information: 000.1s - SafeWorker - Starting some work
    Information: 010.1s - SafeWorker - Finished some work
    Information: 020.1s - SafeWorker - Starting some work
    Information: 030.1s - SafeWorker - Finished some work
    Information: 040.1s - SafeWorker - Starting some work
    Information: 046.4s - SafeWorker - OnStop Called
    Information: 050.1s - SafeWorker - Finished some work
    Information: 050.1s - SafeWorker - Ready to exit
    Information: 050.1s - SafeWorker - OnStop Complete, Exiting Safely

    note: I later added timestamps to the Trace output for readability

    We can see the OnStop call come in between the "Starting some work" and "Finished some work" messages, but instead of exiting immediately, the worker calmly finished up its work and then announced it was ready to exit (by setting the _safeToExitHandle WaitHandle).
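
    The same handshake can be tried outside the emulator. The sketch below is a hypothetical console stand-in, not the code from the sample project: SimulatedWorker does not derive from RoleEntryPoint, the sleeps are shortened to half a second, and the fields are created up front rather than inside Run. OnStop also takes a maximum wait, an addition not in the sample above, so that it cannot block past whatever shutdown window the platform allows.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the worker role so the handshake can run
// in a plain console application without the Azure SDK.
public class SimulatedWorker
{
    private readonly CancellationTokenSource _cancellationTokenSource = new CancellationTokenSource();
    private readonly ManualResetEvent _safeToExitHandle = new ManualResetEvent(false);
    private readonly List<string> _log = new List<string>();

    // Snapshot of the messages written so far.
    public IList<string> Log { get { lock (_log) { return new List<string>(_log); } } }

    public void Run()
    {
        var token = _cancellationTokenSource.Token;
        while (!token.IsCancellationRequested)
        {
            Write("Starting some work");
            Thread.Sleep(500);                 // stands in for DoWork()
            Write("Finished some work");
            token.WaitHandle.WaitOne(500);     // sleep, or wake early on cancellation
        }
        Write("Ready to exit");
        _safeToExitHandle.Set();               // cleanly exited the main loop
    }

    // Returns true if Run exited cleanly before maxWait elapsed.
    public bool OnStop(TimeSpan maxWait)
    {
        Write("OnStop Called");
        _cancellationTokenSource.Cancel();
        bool exitedCleanly = _safeToExitHandle.WaitOne(maxWait);
        Write(exitedCleanly ? "OnStop Complete, Exiting Safely" : "OnStop timed out");
        return exitedCleanly;
    }

    private void Write(string message)
    {
        lock (_log) { _log.Add(message); }
        Console.WriteLine(message);
    }
}

public static class Program
{
    public static void Main()
    {
        var worker = new SimulatedWorker();
        var runThread = new Thread(worker.Run);
        runThread.Start();

        Thread.Sleep(200);                     // aim to land inside the first unit of work
        bool clean = worker.OnStop(TimeSpan.FromSeconds(5));
        runThread.Join();

        Console.WriteLine("Exited cleanly: {0}", clean);
    }
}
```

    With these timings the "OnStop Called" message should appear between the start and finish of the first unit of work, followed by "Ready to exit" and then the OnStop completion message, mirroring the trace above. If the work ever ran longer than the maximum wait, OnStop would return false instead of blocking forever.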

    Wrapping Up

    Letting your application die in the middle of an operation is typically not a good idea.

    The example code for this post is available on GitHub at tarwn/AzureWorkerRole_Cancellation. After finishing the code samples above, I went back and added seconds elapsed to the trace output messages. I didn't update the code samples above because it would only have served to distract from the real code.

    I haven't posted on Azure as much as I probably should have, given how much of my time I spend working with it. Expect to see more posts on this in the upcoming months.

    About the Author

    Eli delivers software and technology solutions for a living. His roles have included lone developer, accidental DBA, team lead, and even unintentional Solaris consultant once. With experience in ad hoc, Lean, and Agile environments across NSF grants, SaaS products, and in-house IT groups, he is just as willing to chat about the principles of Lean or Continuous Delivery as he is to dive into Azure, SQL Server, or the last ATDD project he created.

    1 comment

    Comment from: Mark [Visitor]
    Thanks for this - it's just what I was looking for.
    Do you have a source for this:
    "Note: Early last year (2012) this was extended to 5 minutes, though it's not reflected in the documentation above."?
    04/04/13 @ 11:26
