Thinking in parallel is really, really, really... different.
I'm currently working on the Standard Template Adaptive Parallel Library (STAPL) project at Texas A&M this summer as part of a Comp Sci undergraduate research program. STAPL is an adaptive library for parallel computing systems that's a superset of the sequential ANSI C++ STL. Basically, its goal is to provide a generic C++ library with all the same functionality as the C++ STL and in some cases more (i.e. STAPL's pContainers, pAlgorithms, and pRanges are the parallel equivalents of STL containers, algorithms, and iterators, respectively). STAPL aims to let programmers write programs that take advantage of the power of parallel computing systems without necessarily having to be aware of the additional complexity that parallelism adds. At the same time, those who do want to deal with that complexity have the option of using the framework at a lower level. Because parallel architectures and systems vary so widely, STAPL relies on optimization done at run time, at compile time, and at install time (i.e. user-specific options and database updating). The particular parallel model STAPL supports at the moment is the Single Program, Multiple Data (SPMD) model, though other models may be added in the future.
This is just a simple description of the STAPL project. The actual CS@TAMU STAPL project page is somewhat out of date, but it should suffice for a general idea. That said, the project has undergone a lot of design changes at both the conceptual and the implementation level in recent months, which require re-implementing the majority of the code as well as adding new code. To be brief, the most important changes involve how data spread among multiple processors, and its locality, is viewed, as well as how work is done and how that relates to data locality.
But anyway, the specific part of the STAPL project I'm working on at the moment is the reimplementation of part of p_sort(), which is STAPL's version of the generic STL sort() algorithm. What's different about p_sort() is that sorting in parallel is a whole different ballgame from sorting sequentially... and it's still being actively researched. p_sort() actually consists of multiple parallel sorting algorithms, including but not limited to Column-Sort, Sample-Sort, Radix-Sort, and Bitonic-Sort. The "best" parallel sorting algorithm is chosen depending on certain criteria (e.g. architecture, type of data being worked with). I've been assigned the Column-Sort algorithm... which I've already implemented sequentially, but it's a different arena in parallel. Considering I haven't taken a parallel computing course and I'm doing graduate-level work... it's tough, but very educational and interesting.
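For anyone curious what Column-Sort looks like sequentially, here's a minimal sketch. To be clear, this is not STAPL's code and not my actual implementation; the function name `column_sort` and the parameters `r` and `s` are made up for illustration. It treats the input as an r x s matrix stored column by column, and it assumes the usual textbook preconditions (r even, s divides r, r >= 2s^2). The last three steps of the algorithm (shift by half a column, sort, shift back) are folded into sorting windows that straddle neighboring columns, which should be equivalent for this flat layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential sketch of Column-Sort on n = r * s values (illustrative only).
// `a` is viewed as an r x s matrix stored column by column:
// column j occupies a[j*r] .. a[j*r + r - 1].
// Assumed preconditions: r even, s divides r, r >= 2*s*s.
template <typename T>
void column_sort(std::vector<T>& a, std::size_t r, std::size_t s) {
    assert(a.size() == r * s);
    assert(r % 2 == 0 && r % s == 0 && r >= 2 * s * s);

    const std::size_t n = r * s;

    // Sort every column independently. In a parallel version, each
    // column would be a natural unit of work for one processor.
    auto sort_columns = [&](std::vector<T>& m) {
        for (std::size_t j = 0; j < s; ++j)
            std::sort(m.begin() + j * r, m.begin() + (j + 1) * r);
    };

    std::vector<T> b(n);

    // Step 1: sort the columns.
    sort_columns(a);

    // Step 2: "transpose": read column by column, write row by row.
    // The k-th value read lands at row k / s, column k % s.
    for (std::size_t k = 0; k < n; ++k)
        b[(k % s) * r + k / s] = a[k];

    // Step 3: sort the columns.
    sort_columns(b);

    // Step 4: undo the permutation from step 2.
    for (std::size_t k = 0; k < n; ++k)
        a[k] = b[(k % s) * r + k / s];

    // Step 5: sort the columns.
    sort_columns(a);

    // Steps 6-8: shift by r/2, sort the columns, shift back. On the flat
    // column-major array this amounts to sorting each window of r values
    // that covers the bottom half of one column and the top half of the next.
    const std::size_t h = r / 2;
    for (std::size_t j = 0; j + 1 < s; ++j)
        std::sort(a.begin() + h + j * r, a.begin() + h + (j + 1) * r);
}
```

Calling `column_sort(v, 32, 4)` on a 128-element vector, for instance, leaves `v` sorted in ascending order. The interesting part for the parallel version is that every step is either "sort the columns" (independent work per column) or a fixed permutation of the data (communication), which is presumably why it's a candidate for p_sort() in the first place.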