An instruction set extension designed to accelerate multimedia applications is presented and evaluated. In the proposed complex streamed instruction (CSI) set, a single instruction processes vector data streams of arbitrary length and stride, combining complex memory accesses (with implicit prefetching), program control for vector sectioning, and complex computations on multiple data in one operation. CSI thereby eliminates the overhead instructions (for data sectioning, alignment, reorganization, and packing/unpacking) that applications using MMX-like extensions often need, and it accelerates key multimedia kernels. Simulation results show that a superscalar processor extended with CSI outperforms the same processor enhanced with Sun's VIS extension by a factor of up to 7.77 on key multimedia kernels and by up to 35% on full applications.
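To make the contrast concrete, the following is a minimal software sketch in C, not the CSI definition. The abstract does not give instruction names or semantics, so `stream_add_sat_u8` and `sectioned_add_sat_u8`, the saturating byte add, the 8-element section width, and the convention that strides are counted in elements are all assumptions chosen for illustration. The first function models what one streamed operation might express: a whole stream, with its length and strides as operands. The second shows the same work written for a fixed-width, MMX-like extension, where sectioning, packing of strided elements, and a scalar tail loop appear as explicit overhead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturating unsigned 8-bit add, a common multimedia primitive. */
static uint8_t sat_add_u8(uint8_t x, uint8_t y)
{
    unsigned sum = (unsigned)x + (unsigned)y;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Hypothetical model of one streamed operation: length and strides
   (in elements) are operands, so no sectioning code is needed.
   This is an illustrative assumption, not an actual CSI instruction. */
void stream_add_sat_u8(uint8_t *dst, size_t dst_stride,
                       const uint8_t *a, size_t a_stride,
                       const uint8_t *b, size_t b_stride,
                       size_t len)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < len; i++) {
        dst[i * dst_stride] = sat_add_u8(a[i * a_stride], b[i * b_stride]);
    }
}

/* The same computation structured for a fixed-width extension
   (assumed width: 8 bytes). The gather loops stand in for the
   packing/reorganization instructions, and the tail loop for the
   leftover elements that do not fill a whole section. */
void sectioned_add_sat_u8(uint8_t *dst, size_t dst_stride,
                          const uint8_t *a, size_t a_stride,
                          const uint8_t *b, size_t b_stride,
                          size_t len)
{
    enum { W = 8 };
    size_t i = 0;

    assert(dst != NULL && a != NULL && b != NULL);
    for (; i + W <= len; i += W) {          /* sectioning overhead */
        uint8_t va[W], vb[W];
        for (size_t k = 0; k < W; k++) {    /* pack strided data */
            va[k] = a[(i + k) * a_stride];
            vb[k] = b[(i + k) * b_stride];
        }
        for (size_t k = 0; k < W; k++) {    /* models one packed add, then unpack */
            dst[(i + k) * dst_stride] = sat_add_u8(va[k], vb[k]);
        }
    }
    for (; i < len; i++) {                  /* scalar tail */
        dst[i * dst_stride] = sat_add_u8(a[i * a_stride], b[i * b_stride]);
    }
}
```

Both functions produce identical results. The difference is only in how much bookkeeping surrounds the arithmetic, which is the kind of overhead the abstract says CSI removes.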