I did a lecture today on fork/join parallelism using the JSR 166 fork/join framework. As an example, I parallelized merge sort in the obvious way and got some decent performance improvements:
Here's the speedups for increasing numbers of elements for the 2-way and 4-way cases:
This was on an Intel Core 2 Quad Q6600. The sequential data is measuring the time to call the built-in Collections.sort(), which is an optimized quick sort implementation.
Overall, the implementation took about 25 minutes, much of which was consumed with figuring out that the ForkJoinPool class's execute() method doesn't wait for the top-level task to complete. (I should probably go back and read Doug Lea's book.)
Not bad for a language whose programming paradigm is anti-parallel by its very nature!