Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency (Synthesis Lectures on Computer Architecture)
by Kunle Olukotun
2007 / English / PDF
5.6 MB
Chip multiprocessors - also called multi-core microprocessors or
CMPs for short - are now the only way to build high-performance
microprocessors, for a variety of reasons. Large uniprocessors are
no longer scaling in performance, because it is only possible to
extract a limited amount of parallelism from a typical instruction
stream using conventional superscalar instruction issue techniques.
In addition, one cannot simply ratchet up the clock speed on
today's processors, or the power dissipation will become
prohibitive in all but water-cooled systems. Compounding these
problems is the simple fact that with the immense numbers of
transistors available on today's microprocessor chips, it is too
costly to design and debug ever-larger processors every year or
two. CMPs avoid these problems by filling up a processor die with
multiple, relatively simpler processor cores instead of just one
huge core. The exact size of a CMP's cores can vary from very simple
pipelines to moderately complex superscalar processors, but once a
core has been selected, the CMP's performance can easily scale across
silicon process generations simply by stamping down more copies of
the hard-to-design, high-speed processor core in each successive
chip generation. In addition, parallel code execution, obtained by
spreading multiple threads of execution across the various cores,
can achieve significantly higher performance than would be possible
using only a single core. While parallel threads are already common
in many useful workloads, there are still important workloads that
are hard to divide into parallel threads. The low inter-processor
communication latency between the cores in a CMP helps make a much
wider range of applications viable candidates for parallel
execution than was possible with conventional, multi-chip
multiprocessors; nevertheless, limited parallelism in key
applications is the main factor limiting acceptance of CMPs in some
types of systems.