Visual Studio 2010 and .NET 4.0 are almost released, and one of the new things shipping with this release is Parallel Programming. Since you can hardly buy a machine with just one core anymore, it is time that we developers get intimate with concurrent programming. I decided to play around with this a little today. This is not a deeply technical post; I mostly show you how to get started and which new tools are available.

## Getting to know Parallel Programming with the .NET Framework 4

The best way to learn about new additions to a framework is to look at some code. There are 22 samples for Parallel Programming with the .NET Framework 4 available on the MSDN code library. You can download them here: Samples for Parallel Programming with the .NET Framework 4

Here is a screen shot of the Mandelbrot Fractals sample.

If you run the sample on a dual-core machine, you will see that the parallel execution is about 60% faster than the sequential one.

After I was done with the Mandelbrot Fractals sample, I decided to open up the ComputePi sample. After all, it is Pi day today (2010.3.14).

Here is the code from the ComputePi sample for the different ways of calculating Pi.
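It helps to know what the code is computing. As far as I can tell from reading it, every method approximates the same integral with the midpoint rule, where N stands for `num_steps` and each x_i is the midpoint of the i-th slice of the interval from 0 to 1:

$$
\pi = \int_0^1 \frac{4}{1+x^2}\,dx = 4\arctan(1) \approx \frac{1}{N}\sum_{i=0}^{N-1}\frac{4}{1+x_i^2}, \qquad x_i = \frac{i+0.5}{N}
$$

In the code, `step` is 1/N, `x` is x_i, and the final multiplication by `step` is the 1/N in front of the sum. The methods differ only in how the sum is split up.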

Estimates the value of PI using a LINQ-based implementation.

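```csharp
// num_steps is a field defined elsewhere in the sample class.
static double SerialLinqPi()
{
    // Width of each slice of the interval [0, 1].
    double step = 1.0 / (double)num_steps;
    // Evaluate 4 / (1 + x^2) at each slice midpoint, sum, and scale by the width.
    return (from i in Enumerable.Range(0, num_steps)
            let x = (i + 0.5) * step
            select 4.0 / (1.0 + x * x)).Sum() * step;
}
```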

Estimates the value of PI using a PLINQ-based implementation.

```csharp
static double ParallelLinqPi()
{
    double step = 1.0 / (double)num_steps;
    // Same query as SerialLinqPi; ParallelEnumerable.Range makes it a PLINQ query.
    return (from i in ParallelEnumerable.Range(0, num_steps)
            let x = (i + 0.5) * step
            select 4.0 / (1.0 + x * x)).Sum() * step;
}
```
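As a side note, `ParallelEnumerable.Range` is not the only way into PLINQ. Calling `AsParallel()` on an existing sequence does the same job. Here is a minimal sketch of that variant; the class `PlinqPiSketch`, the method `AsParallelPi` and its `numSteps` parameter are names I made up for illustration and are not part of the ComputePi sample.

```csharp
using System;
using System.Linq;

static class PlinqPiSketch
{
    // Hypothetical helper (not from the ComputePi sample): the same midpoint-rule
    // sum, but starting from an ordinary Enumerable.Range and opting into PLINQ.
    public static double AsParallelPi(int numSteps)
    {
        double step = 1.0 / numSteps;
        return Enumerable.Range(0, numSteps)
                         .AsParallel()                       // everything after this runs as PLINQ
                         .Select(i => (i + 0.5) * step)      // midpoint of slice i
                         .Select(x => 4.0 / (1.0 + x * x))   // integrand at that midpoint
                         .Sum() * step;
    }

    static void Main()
    {
        // Prints an approximation of pi; the last digits can differ between runs
        // because the order of the floating-point additions is not fixed.
        Console.WriteLine(AsParallelPi(10000000));
    }
}
```

I have not timed this variant against the sample's version, so treat it as an alternative spelling rather than a faster one.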

Estimates the value of PI using a for loop.

```csharp
static double SerialPi()
{
    double sum = 0.0;
    double step = 1.0 / (double)num_steps;
    for (int i = 0; i < num_steps; i++)
    {
        // Midpoint of slice i, then the integrand at that point.
        double x = (i + 0.5) * step;
        sum = sum + 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```

Estimates the value of PI using a Parallel.For.

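```csharp
static double ParallelPi()
{
    double sum = 0.0;
    double step = 1.0 / (double)num_steps;
    object monitor = new object();
    // Each worker task starts a private subtotal at 0.0, adds its terms without
    // locking, and merges into the shared sum once, under the lock, when it finishes.
    Parallel.For(0, num_steps, () => 0.0, (i, state, local) =>
    {
        double x = (i + 0.5) * step;
        return local + 4.0 / (1.0 + x * x);
    }, local => { lock (monitor) sum += local; });
    return step * sum;
}
```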
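The `Parallel.For` overload used here takes three delegates: one that creates a private starting value for each worker, one that does the work for a single index, and one that runs once per worker at the end to merge its private value into the shared result. To make the pattern easier to see, here is a small sketch of my own that uses the same overload to sum integers, so the answer can be checked exactly. The class `LocalStateSketch` and the method `SumOfSquares` are hypothetical names. I also swapped the `lock` for `Interlocked.Add`, which works here because the total is a `long`; there is no `Interlocked.Add` overload for `double`, which may be why the sample locks instead.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class LocalStateSketch
{
    // Hypothetical helper (not from the ComputePi sample): sums i * i for i in [0, n).
    public static long SumOfSquares(int n)
    {
        long total = 0;
        Parallel.For(0, n,
            () => 0L,                                     // localInit: private subtotal per worker
            (i, state, local) => local + (long)i * i,     // body: no shared state touched here
            local => Interlocked.Add(ref total, local));  // localFinally: one atomic merge per worker
        return total;
    }

    static void Main()
    {
        Console.WriteLine(SumOfSquares(1000)); // prints 332833500
    }
}
```

The point of the private subtotal is that the shared variable is touched only once per worker instead of once per iteration.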

Estimates the value of PI using a Parallel.ForEach and a range partitioner.

```csharp
static double ParallelPartitionerPi()
{
    double sum = 0.0;
    double step = 1.0 / (double)num_steps;
    object monitor = new object();
    // Partitioner.Create splits [0, num_steps) into ranges; each range arrives as a
    // Tuple<int, int> (inclusive start, exclusive end) and is walked with a plain for loop.
    Parallel.ForEach(Partitioner.Create(0, num_steps), () => 0.0, (range, state, local) =>
    {
        for (int i = range.Item1; i < range.Item2; i++)
        {
            double x = (i + 0.5) * step;
            local += 4.0 / (1.0 + x * x);
        }
        return local;
    }, local => { lock (monitor) sum += local; });
    return step * sum;
}
```
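`Partitioner.Create` also has an overload that takes an explicit range size, which lets you control how many indices each chunk holds. Below is a minimal sketch of that overload, again with an integer sum so the result is exact. The class `RangePartitionerSketch`, the method `SumRange` and the chunk size of 50000 are my own choices for illustration, not something taken from the sample.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

static class RangePartitionerSketch
{
    // Hypothetical helper (not from the ComputePi sample): sums the integers in [0, n),
    // handing the work out in ranges of at most rangeSize indices.
    public static long SumRange(int n, int rangeSize)
    {
        long total = 0;
        Parallel.ForEach(
            Partitioner.Create(0, n, rangeSize),   // yields Tuple<int, int> ranges
            () => 0L,                              // private subtotal per worker
            (range, state, local) =>
            {
                // Item1 is the inclusive start, Item2 the exclusive end of the range.
                for (int i = range.Item1; i < range.Item2; i++)
                    local += i;
                return local;
            },
            local => Interlocked.Add(ref total, local));
        return total;
    }

    static void Main()
    {
        Console.WriteLine(SumRange(1000000, 50000)); // prints 499999500000
    }
}
```

Which range size works best will depend on the workload and the machine, so I would measure before settling on one.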

Here are the results when you run the sample. The first column is the time in seconds and the second column has the method name.

```
03.3369869: SerialLinqPi
02.1516130: ParallelLinqPi
01.0522855: SerialPi
00.6457441: ParallelPi
00.7098180: ParallelPartitionerPi
```

Again, these numbers are from my dual-core laptop. You might get a bigger difference if you run it on a quad-core box.
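Working from the numbers above, SerialPi divided by ParallelPi is roughly 1.63x, and SerialLinqPi divided by ParallelLinqPi is roughly 1.55x. If you want to collect timings like these for your own code, here is a rough sketch of how that could be done with `Stopwatch`. This is my own harness, not the one that ships with the sample: the class `TimingSketch`, the helper `Time` and the smaller `NumSteps` value are assumptions made for the example.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

static class TimingSketch
{
    // Assumed step count for this sketch; the sample defines its own num_steps.
    const int NumSteps = 10000000;

    // Hypothetical helper: runs the work once, prints elapsed time and label,
    // and returns the elapsed time so the caller can compute a ratio.
    public static TimeSpan Time(string label, Func<double> work)
    {
        Stopwatch sw = Stopwatch.StartNew();
        double result = work();
        sw.Stop();
        Console.WriteLine("{0}: {1} (result {2})", sw.Elapsed, label, result);
        return sw.Elapsed;
    }

    public static double SerialPi()
    {
        double sum = 0.0;
        double step = 1.0 / NumSteps;
        for (int i = 0; i < NumSteps; i++)
        {
            double x = (i + 0.5) * step;
            sum += 4.0 / (1.0 + x * x);
        }
        return step * sum;
    }

    public static double ParallelPi()
    {
        double sum = 0.0;
        double step = 1.0 / NumSteps;
        object monitor = new object();
        Parallel.For(0, NumSteps, () => 0.0, (i, state, local) =>
        {
            double x = (i + 0.5) * step;
            return local + 4.0 / (1.0 + x * x);
        }, local => { lock (monitor) sum += local; });
        return step * sum;
    }

    static void Main()
    {
        TimeSpan serial = Time("SerialPi", SerialPi);
        TimeSpan parallel = Time("ParallelPi", ParallelPi);
        Console.WriteLine("Speedup: {0:F2}x", serial.TotalSeconds / parallel.TotalSeconds);
    }
}
```

A single run like this is noisy (JIT compilation and thread pool warm-up land in the first measurement), so for anything beyond a quick look I would run each method several times and compare the best or median times.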

## Profiling Performance

Visual Studio 2010 ships with a couple of tools that will make your life easier if you do parallel programming. Launch the Performance Wizard from Tools -> Launch Performance Wizard.

After that, a wizard will launch and you will see a window like the one in the screenshot below.

Pick the concurrency option and choose to visualize the behaviour of a multithreaded application.

After you are done, start your app. When you close the app, the reports will be generated.
If you don’t see the reports, click on View -> Other Windows -> Performance Explorer.

FYI, you need to run as admin to generate these reports, and if you are on a 64-bit machine you need to set the platform target to x86 in order to be able to generate them. I was greeted with the following message: To enable complete call stacks on x64 platforms, executive paging must be disabled. A reboot is then required. To make this change, click “Yes”, save your work, and then reboot. For more information, see http://go.microsoft.com/fwlink/?LinkId=157265 . After I rebooted, everything worked.

There are 3 types of reports that you will see. Here is what the CPU Utilization report looks like.

There is a report for threads.

Finally, there is also a report for cores.

Instead of embedding 20 images, I decided that a video would be more useful. The video is about 1 minute and 52 seconds long and shows what the tool looks like when I am clicking around in it.

Here is the HD video version. I would suggest you click on the video and watch it on YouTube in 720p full-screen format.

## Learning more about Concurrency Profiling and Parallel Programming

To wrap up this post, here are some links to technical resources that will help you with Concurrency Profiling and Parallel Programming.

Below is a 10-part blog post series by Reed Copsey, Jr. about parallelism in .NET:

Introduction

Part 2, Simple Imperative Data Parallelism

Part 3, Imperative Data Parallelism: Early Termination

Part 4, Imperative Data Parallelism: Aggregation

Part 5, Partitioning of Work

Part 6, Declarative Data Parallelism

Part 7, Some Differences between PLINQ and LINQ to Objects

Part 8, PLINQ’s ForAll Method

Part 9, Configuration in PLINQ and TPL

Part 10, Cancellation in PLINQ and the Parallel class

Hopefully this post will spark your interest and you will take a look at these interesting technologies and tools.

Happy π day, and today is also Albert Einstein’s birthday.