Vector Processing (SIMD) and Related Execution Models
Posted by Jeff Disher
A quick discussion of what vector processing is, how it contrasts with scalar (and superscalar) processing, and how it compares with other processing mechanisms like co-processors and graphics processors.

A simple way to think about this is: if "scalar" processing operates on a single piece of data at a time, "vector" processing operates on a list of data elements at the same time. This requires that the same operation be performed on every element of the list, since it is all done by a single instruction (hence "SIMD": Single Instruction, Multiple Data).
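
To make that concrete, here is a minimal C sketch, assuming an x86 machine with SSE and the compiler-provided <immintrin.h> intrinsics (the function names add_scalar and add_vector are just illustrative). It adds two float arrays first one element at a time, then four elements per instruction:

    #include <immintrin.h>  /* x86 SIMD intrinsics (SSE and later) */
    #include <stdio.h>

    /* Scalar: one addition per loop iteration. */
    static void add_scalar(const float *a, const float *b, float *out, int n)
    {
        for (int i = 0; i < n; i++)
            out[i] = a[i] + b[i];
    }

    /* Vector (SIMD): each _mm_add_ps is a single instruction that
     * adds four floats at once. */
    static void add_vector(const float *a, const float *b, float *out, int n)
    {
        int i = 0;
        for (; i + 4 <= n; i += 4) {
            __m128 va = _mm_loadu_ps(&a[i]);
            __m128 vb = _mm_loadu_ps(&b[i]);
            _mm_storeu_ps(&out[i], _mm_add_ps(va, vb));
        }
        for (; i < n; i++)          /* scalar tail if n isn't a multiple of 4 */
            out[i] = a[i] + b[i];
    }

    int main(void)
    {
        float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
        float out[8];

        add_scalar(a, b, out, 8);   /* one element at a time */
        add_vector(a, b, out, 8);   /* same result, four elements per instruction */

        for (int i = 0; i < 8; i++)
            printf("%.0f ", out[i]);    /* prints: 9 9 9 9 9 9 9 9 */
        printf("\n");
        return 0;
    }

Modern compilers will often auto-vectorize the scalar loop on their own at higher optimization levels, but writing the intrinsics out makes the "one instruction, four elements" idea explicit.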

As the vector sizes get larger, moving the data in and out of the CPU becomes the bottleneck, rather than the computation itself. As a result, there is some benefit to using different, dedicated memory arrangements. This is part of why the Cell SPE worked on its own dedicated piece of memory: it didn't need to compete with the rest of the system for access to main memory, and it could optimize the memory layout and access patterns for its own purposes.
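
As a rough sketch of that staging idea (plain C here, not the actual Cell SDK; the memcpy calls stand in for the DMA transfers an SPE would issue into its local store), the compute loop only ever touches a small local buffer:

    #include <stdio.h>
    #include <string.h>

    #define LOCAL_STORE_BYTES 256                        /* stand-in for a small on-chip memory */
    #define CHUNK (LOCAL_STORE_BYTES / sizeof(float))

    /* Process a large array in local-store-sized chunks:
     * "transfer in", compute on the local copy, "transfer out". */
    static void scale_chunked(float *data, size_t n, float factor)
    {
        float local[CHUNK];                              /* the only memory the compute loop touches */

        for (size_t base = 0; base < n; base += CHUNK) {
            size_t count = (n - base < CHUNK) ? (n - base) : CHUNK;

            memcpy(local, &data[base], count * sizeof(float));   /* transfer in  */
            for (size_t i = 0; i < count; i++)
                local[i] *= factor;                              /* compute locally */
            memcpy(&data[base], local, count * sizeof(float));   /* transfer out */
        }
    }

    int main(void)
    {
        float data[100];
        for (int i = 0; i < 100; i++)
            data[i] = (float)i;

        scale_chunked(data, 100, 2.0f);
        printf("%.0f %.0f\n", data[1], data[99]);        /* prints: 2 198 */
        return 0;
    }

On a real SPE the transfers are asynchronous DMA, so the next chunk can be fetched while the current one is being processed (double-buffering), which is how it avoids fighting the rest of the system for main memory.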

Further generalizing this, we see how GP-GPU (General Purpose Graphics Processing Unit) approaches work. After all, a programmable GPU (the kind we run shaders on, these days) is basically just a big vector processor, often with its own dedicated memory.

As a result of using GPUs for this, much of the interest in specialized vector co-processors seems to have fallen by the wayside. On-chip vector processing is still heavily used, however, since it fits into the general CPU programming model and delivers high performance with low overhead compared to something like a co-processor or GPU.

Vector processing is quite common in modern computer hardware and you likely recognize it by its various trademarked names: AltiVec, VMX, MMX, SSE, NEON, etc.

Jeff.