



Tsung-Han Tsai

## **Technology Trends**

|        | Capacity      | Speed (latency) |
|--------|---------------|-----------------|
| Logic: | 2x in 3 years | 2x in 3 years   |
| DRAM:  | 4x in 3 years | 2x in 10 years  |
| Disk:  | 4x in 3 years | 2x in 10 years  |

DRAM (1980 to 1995: size 1000:1, cycle time 2:1)

| Year | Size   | Cycle Time |
|------|--------|------------|
| 1980 | 64 Kb  | 250 ns     |
| 1983 | 256 Kb | 220 ns     |
| 1986 | 1 Mb   | 190 ns     |
| 1989 | 4 Mb   | 165 ns     |
| 1992 | 16 Mb  | 145 ns     |
| 1995 | 64 Mb  | 120 ns     |

- … perform a read or write operation. For non-… write mechanism at the desired location
- $T_N = T_A + N/R$, w… $T_A$ = Average access time, $N$ = Number of bits
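The trend rates and the transfer-time formula above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that $R$ in $T_N = T_A + N/R$ is a transfer rate in bits per second (the definition is cut off in the slide), and the helper names and the sample values passed to `transfer_time` are hypothetical.

```python
def growth_factor(years, factor, period_years):
    """Improvement after `years` if a metric improves by `factor` every `period_years`."""
    return factor ** (years / period_years)


def transfer_time(t_access, n_bits, rate_bits_per_s):
    """T_N = T_A + N/R, with R assumed to be a transfer rate in bits per second."""
    return t_access + n_bits / rate_bits_per_s


years = 15  # 1980 to 1995, the span of the DRAM table

logic_speed = growth_factor(years, 2, 3)    # logic: 2x in 3 years
dram_speed = growth_factor(years, 2, 10)    # DRAM latency: 2x in 10 years
dram_capacity = growth_factor(years, 4, 3)  # DRAM capacity: 4x in 3 years

print(f"logic speed-up over {years} years: {logic_speed:.0f}x")
print(f"DRAM speed-up over {years} years:  {dram_speed:.2f}x")
print(f"DRAM capacity growth:              {dram_capacity:.0f}x")
print(f"logic/DRAM speed gap:              {logic_speed / dram_speed:.1f}x")

# Hypothetical numbers: 120 ns access time, 1024 bits, 1 Gbit/s transfer rate.
print(f"T_N = {transfer_time(120e-9, 1024, 1e9) * 1e6:.3f} us")
```

The capacity figure (1024x) lines up with the 1000:1 annotation on the DRAM table, while the table's measured cycle times (250 ns to 120 ns) improve by only about 2x over the same period.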
















## **Direct Mapped Cache: Multiword Block**

- To take advantage of spatial locality, make the cache block larger than one word.
- Example: Fig. 7.10 and p. 556.
- Reads work the same way as with one-word blocks, except that a miss replaces the whole block.
- Writes are different: a write miss requires a read from memory, because the write supplies only one word of the block.
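A minimal sketch of how an address maps onto a direct-mapped cache with multiword blocks, and why larger blocks help sequential reads. It assumes 4-byte words and byte addresses; the function names and cache sizes are hypothetical choices for illustration, and only read hits and misses are modeled (no write handling).

```python
WORD_BYTES = 4  # assumed word size


def split_address(addr, num_blocks, words_per_block):
    """Split a byte address into (tag, index, word offset within the block)."""
    block_bytes = words_per_block * WORD_BYTES
    block_addr = addr // block_bytes
    word_offset = (addr % block_bytes) // WORD_BYTES
    index = block_addr % num_blocks  # which cache entry the block maps to
    tag = block_addr // num_blocks   # identifies which block occupies that entry
    return tag, index, word_offset


def count_misses(addresses, num_blocks, words_per_block):
    """Count read misses; a miss replaces the whole block at that index."""
    tags = [None] * num_blocks
    misses = 0
    for addr in addresses:
        tag, index, _ = split_address(addr, num_blocks, words_per_block)
        if tags[index] != tag:
            misses += 1
            tags[index] = tag
    return misses


# 64 consecutive words, read once each.
sequential = [WORD_BYTES * i for i in range(64)]

print(split_address(0x1234, num_blocks=64, words_per_block=4))  # (4, 35, 1)
# Two caches of the same capacity (64 words):
print(count_misses(sequential, num_blocks=64, words_per_block=1))  # 64
print(count_misses(sequential, num_blocks=16, words_per_block=4))  # 16
```

With four-word blocks, each miss brings in three neighbouring words that the sequential scan then hits, so the miss count drops from 64 to 16.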









| Microprocessor                             | Alpha 21064             | MIPS R4400             | PA7100              | Pentium             | PowerPC 601                 | SuperSPARC                                        | 68040                  | 80486               |
|--------------------------------------------|-------------------------|------------------------|---------------------|---------------------|-----------------------------|---------------------------------------------------|------------------------|---------------------|
| Company                                    | Digital Equipment Corp. | MIPS Technologies Inc. | Hewlett-Packard Co. | Intel Corp.         | IBM Corp. and Motorola Inc. | Sun Microsystems Corp. and Texas Instruments Inc. | Motorola Inc.          | Intel Corp.         |
| Introduction date                          | 2/92                    | 11/92                  | 2/92                | 3/93                | 4/93                        | 5/92                                              | 1989                   | 6/91                |
| Architecture and experiments               |                         |                        |                     |                     |                             |                                                   |                        |                     |
| Type                                       | RISC                    | RISC                   | RISC                | CISC                | RISC                        | RISC                                              | CISC                   | CISC                |
| Width, bits (a)                            | 64                      | 64                     | 32                  | 32                  | 32                          | 32                                                | 32                     | 32                  |
| On-chip cache, kB<br>(instruction / data)  | 8/8                     | 16/16                  | None                | 8/8                 | 32 unified                  | 20/16                                             | 4/4                    | 8 unified           |
| Off-chip cache, MB<br>(instruction / data) | 16                      | 4                      | 1/2                 | External controller | External controller         | External controller                               | External controller    | External controller |
| No. of registers<br>(general-purpose / FP) | 32/32                   | 32/32                  | 32/32               | 8/8                 | 32/32                       | 136/32                                            | 16/8                   | 8/8                 |
| Instruction issue rate per cycle           | 2                       | 1                      | 2                   | 2                   | 3                           | 3                                                 | 1                      | 1                   |
| No. of independent units                   | 4                       | N.A.                   | 3                   | 3                   | 3                           | 5                                                 | N.A.                   | N.A.                |
| No. of pipeline stages<br>(integer / FP)   | 7/10                    | 7/10                   | 5/6                 | 5/8                 | 4/6                         | 4/5                                               | 3/6                    | 5/N.S.              |
| Endian (b)                                 | Little                  | Big/little             | Big                 | Little              | Big/little                  | Big                                               | Big                    | Little              |
| Typical latency (integer / FP)             | 1/6                     | 1/4                    | 1/2                 | 1/3                 | 1/3                         | 1/3                                               | 1/3                    | N.S.                |
| Multiprocessing support?                   | Yes                     | Yes                    | Yes                 | Yes                 | Yes                         | Yes                                               | No                     | No                  |
| Technology and performance                 |                         |                        |                     |                     |                             |                                                   |                        |                     |
| Technology                                 | 0.68 µm CMOS            | 0.6 µm CMOS            | 0.8 µm CMOS         | 0.8 µm BiCMOS       | 0.65 µm CMOS                | 0.7 µm BiCMOS                                     | 0.65 µm CMOS           | 0.8 µm CMOS         |
| Die size, mm                               | 15.3 by 12.7            | 12 by 15.5             | 14.2 by 14.2        | 17.2 by 17.2        | 11 by 11                    | 16 by 16                                          | 10.8 by 11.7           | N.S.                |
| Transistors, millions                      | 1.68                    | 2.3                    | 0.65                | 3.1                 | 2.8                         | 3.1                                               | 1.2                    | 1.2                 |
| Metallization layers                       | 3                       | 2                      | 3                   | 3                   | 4                           | 3                                                 | 2                      | 3                   |
| Operating voltage, V                       | 3.3                     | 5/3.3                  | 5                   | 5                   | 3.6                         | 5.3                                               | 5                      | 5                   |
| Clock, MHz                                 | 200                     | 150                    | 100                 | 66                  | 80                          | 60                                                | 25                     | 50                  |
| SPECint 92 (c)                             | 130                     | 85                     | 81                  | 67.4                | 85                          | 80                                                | 21                     | 27.9                |
| SPECfp 92 (c)                              | 184                     | 97                     | 150                 | 63.6                | 105                         | 100                                               | 15                     | 13.1                |
| Power, packaging, and price                |                         |                        |                     |                     |                             |                                                   |                        |                     |
| Peak power, W                              | 30                      | 15                     | 23                  | 16                  | 9.1                         | 14.2                                              | 6                      | 5                   |
| Cooling                                    | Heat sink               | Heat sink              | Heat sink           | Fan plus heat sink  | Ambient plus heat sink      | Forced air plus heat sink                         | Ambient plus heat sink | Fan or heat sink    |
| Ceramic package (pins / style)             | 431 / PGA               | 447 / PGA              | 504 / PGA           | 273 / PGA           | 304 / QFP                   | 293 / PGA                                         | 179 / PGA              | 168 / …             |
| US \$ price per 1000                       | \$505                   | \$1100                 | N.S.                | \$896               | \$545/\$557                 | \$999                                             | \$233                  | \$43…               |











…ted memory systems:

| Characteristic   | Intel Pentium Pro                         | PowerPC 604                               |
|------------------|-------------------------------------------|-------------------------------------------|
| Virtual address  | 32 bits                                   | 52 bits                                   |
| Physical address |                                           | 32 bits                                   |
| Page size        | 4 KB, 4 MB                                | 4 KB, selectable, and 256 MB              |
| TLB organization | A TLB for instructions and a TLB for data | A TLB for instructions and a TLB for data |
|                  | Both four-way set associative             | Both two-way set associative              |
|                  | Pseudo-LRU replacement                    | LRU replacement                           |
|                  | Instruction TLB: 32 entries               | Instruction TLB: 128 entries              |
|                  | Data TLB: 64 entries                      | Data TLB: 128 entries                     |
|                  | TLB misses handled in hardware            | TLB misses handled in hardware            |

| Characteristic      | Intel Pentium Pro                 | PowerPC 604                       |
|---------------------|-----------------------------------|-----------------------------------|
| Cache organization  | Split instruction and data caches | Split instruction and data caches |
| Cache size          | 8 KB each for instructions/data   | 16 KB each for instructions/data  |
| Cache associativity | Four-way set associative          | Four-way set associative          |
|                     | Approximated LRU replacement      |                                   |
|                     | 32 bytes                          |                                   |
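The table entries fix the geometry of each structure. As a rough sketch (the helper names are hypothetical, and the 32-byte figure is taken to be the Pentium Pro's cache block size), the number of sets and the width of the virtual page number follow directly from the listed sizes:

```python
def num_sets(entries, ways):
    """Sets in a set-associative structure with `entries` total entries."""
    return entries // ways


def vpn_bits(virtual_bits, page_bytes):
    """Virtual page number width; page_bytes is assumed to be a power of two."""
    offset_bits = page_bytes.bit_length() - 1  # log2 of the page size
    return virtual_bits - offset_bits


def cache_sets(size_bytes, block_bytes, ways):
    """Sets in a cache of the given capacity, block size and associativity."""
    return size_bytes // (block_bytes * ways)


KB = 1024

# Pentium Pro: four-way TLBs, 32 (instruction) and 64 (data) entries.
print(num_sets(32, 4), num_sets(64, 4))    # 8 16
# PowerPC 604: two-way TLBs, 128 entries each.
print(num_sets(128, 2))                    # 64
# Virtual page number width with 4 KB pages.
print(vpn_bits(32, 4 * KB), vpn_bits(52, 4 * KB))  # 20 40
# Pentium Pro L1: 8 KB, 32-byte blocks, four-way.
print(cache_sets(8 * KB, 32, 4))           # 64
```

The PowerPC 604 cache is left out because the table does not list its block size.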

## **Some Issues**

- Processor speeds continue to increase very fast, much faster than either DRAM or disk access times
- Design challenge: dealing with this growing disparity
- Trends:
  - synchronous DRAMs (provide a burst of data), DDR SDRAM, RAMBUS
  - redesign DRAM chips to provide higher bandwidth or processing
  - restructure code to increase locality
  - use prefetching (make cache visible to ISA)
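To illustrate the "restructure code to increase locality" item, the sketch below counts misses for two traversal orders of the same row-major matrix in a small direct-mapped cache. Everything here is a hypothetical model: the cache size (8 blocks of 4 words), the 16 by 16 matrix, and the function name are arbitrary choices, and addresses are word addresses.

```python
def misses(addresses, num_blocks=8, block_words=4):
    """Count misses in a direct-mapped cache; addresses are word addresses."""
    tags = [None] * num_blocks
    count = 0
    for a in addresses:
        block = a // block_words
        index, tag = block % num_blocks, block // num_blocks
        if tags[index] != tag:
            tags[index] = tag  # the whole block is brought in on a miss
            count += 1
    return count


ROWS, COLS = 16, 16  # row-major layout: element (i, j) lives at word i*COLS + j

# Inner loop over j: consecutive accesses touch consecutive words.
row_order = [i * COLS + j for i in range(ROWS) for j in range(COLS)]
# Inner loop over i: consecutive accesses are COLS words apart.
col_order = [i * COLS + j for j in range(COLS) for i in range(ROWS)]

print(misses(row_order))  # 64
print(misses(col_order))  # 256
```

Both loops touch the same 256 elements, but in this model the row-order traversal misses once per four-word block, while the column-order traversal misses on every access because each block is evicted before its other words are used.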