I'm James Reinders and, as I've travelled around talking to programmers about parallel programming, I've come up with a set of rules of thumb: recurring themes about what makes programmers more successful doing parallel programming.
The first one is: think parallel. By this I mean think about the parallelism in your program first. Don't think about how to implement your program using traditional methods first and then try to jam parallelism in.
Now, the reality is that you probably already have some programs you're trying to introduce parallelism into. But I'd still encourage you to step back, think about what the parallelism in those programs really is and how you're going to get at it, and have a strategy for that before you get busy coding.
Doing this really helps. Even if you can't take advantage of all the parallelism right away, you have a strategy for it: you can introduce it a little at a time and follow a roadmap for taking advantage of the rest.
The second rule of thumb is: don't try to program to the number of processors there are. Think about tasks, think about pieces of parallelism, and program to those; don't think about threads. So I call this: program to tasks, not to threads.
To me, a thread is something you typically have one of per processor. The best parallel programs don't query the number of processors and write all their algorithms around that. That may happen at the lower levels, but the higher level of the program really should be producing tasks, lots of them, and then letting some underlying runtime map those onto however many processors happen to be there.
So, obviously, a program that produces thousands of tasks of available parallelism will scale a lot better than one that produces only two. So focus on programming to tasks, not to threads.
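To make the task-versus-thread distinction concrete, here is a minimal Python sketch. It assumes the standard-library `concurrent.futures` thread pool plays the role of the underlying runtime that maps tasks onto workers; `parallel_sum`, `chunk_sum` and the `grain` parameter are hypothetical names for illustration. The code creates one task per chunk, thousands of tasks for a large input, and never asks how many processors the machine has. Because of CPython's global interpreter lock, threads won't actually speed up pure-Python arithmetic like this, so treat it as a picture of the program structure rather than a performance recipe.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_sum(data, start, stop):
    """One task: sum a small slice of the input."""
    return sum(data[start:stop])


def parallel_sum(data, grain=1000):
    """Split the work into many tasks of `grain` elements each.

    The code never queries the processor count; the executor decides
    how many worker threads to run and maps the tasks onto them.
    """
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(chunk_sum, data, i, min(i + grain, len(data)))
            for i in range(0, len(data), grain)
        ]
        # Combine the per-task results once every task has finished.
        return sum(f.result() for f in futures)
```

Calling `parallel_sum(list(range(10_000)))` creates ten tasks with the default grain and returns 49995000. In recent Python versions the pool's default size is derived from the processor count, so on a bigger machine the same tasks are simply spread over more workers without any change to the program.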
The third one, which you don't want to overlook, is: take a look at the tools you're using. Take a look at the compilers, the libraries, the debugging tools, and ask yourself: were these designed with parallelism in mind?
If not, take a look at what is available, and think very hard about whether you're making your life a lot more difficult as you approach parallelism by not getting the proper tools in place.
As a professional writing programs and worrying about parallelism, you really deserve proper tools for the job, so take a look at what you're using. You're probably used to having tools that support you really well in the programming you do now, and as you add parallelism you should have the same attitude about looking for tools that support it.
The fourth rule of thumb is perhaps my favourite, because it turns out to be incredibly important: I see people do it all the time in parallel programming, yet it is rarely talked about. And that is to make sure that your program can run sequentially.
So if you have a multi-threaded program that can run with four threads, eight threads and so on, make sure it can also run with one thread. It doesn't need to be efficient, but if you can kick your program back into single-threaded mode, you'll find yourself doing a lot of debugging in that mode. You can debug the ordinary errors in your program, the same kinds you debug in your sequential programs today, without also having to debug issues that are specific to parallelism. Then you kick it back into two-thread or four-thread mode.
If your program has a failure running in four threads and you make it run in one thread and the failure goes away, it gives you a hint that the error has something to do with the way you specified parallelism. If the issue is still there, the good news is that you can use the traditional debugging tools you already have today to debug the program in single-threaded mode.
It's quite possible to write programs that can only run in parallel. It's a bad idea; avoid techniques that require that. In general it is easy to write a program so that it can also run in single-threaded mode, and doing so makes debugging much easier.
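One way to build in the sequential option is a single switch that decides whether tasks go through a thread pool or run inline on the calling thread. The sketch below assumes Python's `concurrent.futures`; `run_tasks` and its `workers` parameter are hypothetical. With `workers=1` no pool is created at all, so the tasks run one after another in input order, and a conventional debugger or print statements behave exactly as they would in a sequential program.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tasks(fn, items, workers=None):
    """Apply fn to every item and return the results in input order.

    workers=1  -> sequential mode: run inline on the calling thread.
    otherwise  -> hand the same tasks to a thread pool
                  (None lets the pool choose its own size).
    """
    if workers == 1:
        # Plain loop: no threads, deterministic order, easy to debug.
        return [fn(x) for x in items]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map also returns results in input order.
        return list(pool.map(fn, items))
```

Because both paths return results in the same order, you can run identical input with `workers=1` and with `workers=4` and compare. If a failure only shows up with more workers, the bug most likely lies in how the parallelism was specified.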
Fifth rule of thumb: really limit the use of locks. Often when people learn about parallelism, they learn about the need to synchronise between multiple threads, that is, between the pieces of their program that execute concurrently. They learn about locks and they start adding locks. Lots and lots of locks.
And then they start to learn the problems with locks. One is that they are inefficient: they limit the scalability of a program. Another is that in a real program, with lots of dynamics going on and libraries being called, it is very easy for one lock to interfere with another, and you get deadlocks and all sorts of problems. The fewer locks you have, the better.
We can go back to the first rule of thumb, think parallel, and connect the two. If you keep in mind while designing that you don't want to use a lot of locks, you'll often find that the best way to extract parallelism from your program doesn't need explicit locking, or doesn't need much of it. If the algorithm you come up with requires a lot of locks and you're using locks everywhere to protect things, it's worth taking a bit of time to think about whether you can come up with an algorithm that needs far fewer locks, or perhaps none at all, by relying on implicit synchronisation.
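The difference between a lock-heavy and a lock-free design can be sketched with a word count. Both functions below are hypothetical Python examples built on `concurrent.futures` and `collections.Counter`. `count_locked` has every task update one shared counter, taking the lock once per word; `count_lock_free` gives each task its own private counter and merges the partial results after all tasks have finished, so the only synchronisation is the implicit wait for the tasks to complete.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def _chunks(words, grain):
    # Split the input into many small tasks of `grain` words each.
    return [words[i:i + grain] for i in range(0, len(words), grain)]


def count_locked(words, grain=100):
    """Lock-heavy version: all tasks update one shared Counter."""
    totals = Counter()
    lock = threading.Lock()

    def task(chunk):
        for w in chunk:
            with lock:  # one lock acquisition per word
                totals[w] += 1

    with ThreadPoolExecutor() as pool:
        # list() waits for every task and re-raises any task error.
        list(pool.map(task, _chunks(words, grain)))
    return totals


def count_lock_free(words, grain=100):
    """Each task counts into its own private Counter; nothing is
    shared while the tasks run, so no lock is needed."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(Counter, _chunks(words, grain)))
    total = Counter()
    for part in partials:  # merge only after all tasks have finished
        total.update(part)
    return total
```

Both functions return the same counts; the second simply never makes tasks wait on each other while they run, which is the kind of restructuring this rule of thumb is asking for.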
The sixth rule of thumb is: don't forget about memory allocation. Memory allocation turns out to be a very common bottleneck for programs as they move to parallelism, if you keep using the memory allocation techniques you've used before.
It's much more efficient to use a scalable memory allocator, one that keeps multiple pools of memory, than to have every thread contend for a single pool. And there are quite a few of them available on the market. The rule of thumb here is: don't forget to take a look at those. Look at whether memory allocation is something your program does a lot of and, if it is, when you move to parallelism, make sure your memory allocation is parallel and scalable as well.
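In C or C++ you would normally reach for an existing scalable allocator (jemalloc and tcmalloc are two well-known examples) rather than write one yourself. To show the multiple-pools idea on its own, here is a hypothetical Python sketch, `PerThreadPool`, that keeps a separate free list of fixed-size buffers for each thread using `threading.local`. CPython already manages its own small-object memory, so this illustrates the design rather than something you would need in Python.

```python
import threading


class PerThreadPool:
    """Hypothetical fixed-size buffer pool with one free list per thread.

    Each thread only ever touches its own free list, so acquire() and
    release() need no lock and threads never contend with each other.
    """

    def __init__(self, buf_size):
        self.buf_size = buf_size
        self._local = threading.local()

    def _free_list(self):
        # Lazily create the calling thread's private free list.
        if not hasattr(self._local, "free"):
            self._local.free = []
        return self._local.free

    def acquire(self):
        free = self._free_list()
        # Reuse a buffer this thread released earlier, else allocate.
        return free.pop() if free else bytearray(self.buf_size)

    def release(self, buf):
        # The buffer goes back to the *calling* thread's pool.
        self._free_list().append(buf)
```

A buffer released on a different thread from the one that acquired it simply joins the releasing thread's pool; production allocators handle that cross-thread case with extra bookkeeping.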
The final rule of thumb goes back to Amdahl's Law and to Gustafson's observations, made in 1988. To get true scalability in a program, you need to expect the workload to increase. Over time, as we get more and more processors and more powerful systems, you should plan to get your performance benefits from handling those bigger workloads.
Maybe another way to look at this rule of thumb is: don't get too hung up on expecting great big speedups on today's program with today's dataset. Get yourself lined up for scalability. Get some benefit from the current processors, but be sure to look at how scalability is going to come as your workloads increase, as your programs are used to process more data or to do more tasks in the future. Keep the scaling in mind.
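The contrast between the two viewpoints can be written down directly. With a serial fraction s, Amdahl's Law gives the speedup of a fixed workload on n processors as 1 / (s + (1 - s) / n), which can never exceed 1 / s. Gustafson's scaled speedup, where the parallel part of the work grows with n and s is measured on the parallel system, is s + (1 - s) n. The Python functions below are a direct transcription of those two formulas; the function names are hypothetical.

```python
def amdahl_speedup(s, n):
    """Amdahl's Law: fixed workload; s is the serial fraction of the
    single-processor run time, n the number of processors."""
    return 1.0 / (s + (1.0 - s) / n)


def gustafson_speedup(s, n):
    """Gustafson's scaled speedup: the parallel work grows with n;
    s is the serial fraction of the run time measured on n processors."""
    return s + (1.0 - s) * n


if __name__ == "__main__":
    # Compare the two views for a program with 10% serial code.
    for n in (4, 16, 64, 256):
        print(f"n={n:4d}  Amdahl={amdahl_speedup(0.1, n):6.2f}  "
              f"Gustafson={gustafson_speedup(0.1, n):7.2f}")
```

With 10% serial code and n = 16, the two give 6.4 and 14.5. However large n gets, `amdahl_speedup(0.1, n)` stays below 10, while `gustafson_speedup(0.1, n)` keeps growing roughly in proportion to n, which is why it pays to plan around bigger workloads.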
So these are seven rules of thumb that I think work very well for people who are doing parallel programming today.