# DECchip 21164-AA (EV5 CPU) Functional Specification

The DECchip 21164-AA CPU Chip is a high-performance, single-chip implementation of the Alpha Architecture.

#### **Revision/Update Information:**

This is Revision 1.9 of this specification. Relative to Revision 1.5, all chapters are updated. This is the last release expected prior to tape-out.

Product Manager: John Fortune, RICKS::FORTUNE Engineering Manager: Paul Rubinfeld, ROCK::RUBINFELD

#### DIGITAL RESTRICTED DISTRIBUTION

This information shall not be disclosed to persons other than DIGITAL employees or generally distributed within DIGITAL. Distribution is restricted to persons authorized and designated by the originating organization. This document shall not be transmitted electronically, copied unless authorized by the originating organization, or left unattended. When not in use, this document shall be stored in a locked storage area. These restrictions are enforced until this document is reclassified by the originating organization.

Semiconductor Engineering Group Digital Equipment Corporation, Hudson, Massachusetts

This is copy number | 6

#### December 1992

The drawings and specifications in this document are the property of Digital Equipment Corporation and shall not be reproduced or copied or used in whole or in part as the basis for the manufacture or sale of items without written permission.

The information in this document may be changed without notice and is not a commitment by Digital Equipment Corporation. Digital Equipment Corporation is not responsible for any errors in this document.

This specification does not describe any program or product that is currently available from Digital Equipment Corporation, nor is Digital Equipment Corporation committed to implement this specification in any program or product. Digital Equipment Corporation makes no commitment that this document accurately describes any product it might ever make.

Copyright ©1992 by Digital Equipment Corporation All Rights Reserved Printed in U.S.A.

The following are trademarks of Digital Equipment Corporation:

DEC DECnet DECUS MicroVAX MicroVMS PDP ULTRIX ULTRIX-32 UNIBUS VAX VAXBI VAXCluster VAXstation VMS VT



# Contents

| CHAPTER 1 | INTRODUCTION                                                                                                                                                                                                                              | 1–1                                                                   |
|-----------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|
| 1.1       | SCOPE                                                                                                                                                                                                                                     | 1–1                                                                   |
| 1.2       | CHIP FEATURES                                                                                                                                                                                                                             | 1–1                                                                   |
| 1.3       | TERMINOLOGY AND CONVENTIONS1.3.1Numbering1.3.2UNPREDICTABLE And UNDEFINED1.3.3Data Field Size1.3.4Ranges And Extents1.3.5Register Format Notation                                                                                         | 1–2<br>1–2<br>1–3<br>1–3<br>1–3                                       |
| 1.4       | CHIP SUMMARY                                                                                                                                                                                                                              | 1–5                                                                   |
| 1.5       | REVISION HISTORY                                                                                                                                                                                                                          | 1–6                                                                   |
| CHAPTER 2 | DECCHIP 21164-AA MICRO-ARCHITECTURE                                                                                                                                                                                                       | 2–1                                                                   |
| 2.1       | INTRODUCTION                                                                                                                                                                                                                              | 2–1                                                                   |
| 2.2       | OVERVIEW                                                                                                                                                                                                                                  | 2-1                                                                   |
| 2.3       | IBOX2.3.1Instruction Prefetch2.3.2Branch Execution2.3.3ITB2.3.4Interrupt Logic2.3.5Performance Counters                                                                                                                                   | 24<br>25<br>2-5<br>2-6<br>2-8                                         |
| 2.4       | EBOX                                                                                                                                                                                                                                      | 2–8                                                                   |
| 2.5       | MBOX2.5.1Big Endian Support2.5.2DTB2.5.3Replay Traps2.5.4Load Instruction Execution and the Miss Address File2.5.5Store Execution2.5.6Write Buffer and the WMB Instruction2.5.7MB Instruction2.5.8Ibox Read Requests2.5.9Mbox Arbitration | 28<br>28<br>29<br>210<br>2-12<br>2-13<br>2-15<br>2-15<br>2-15<br>2-15 |
| 2.6       | THE CBOX                                                                                                                                                                                                                                  | 2–15                                                                  |
| 2.7       | FBOX                                                                                                                                                                                                                                      | 2–17                                                                  |
| 2.8       | CACHE ORGANIZATION<br>2.8.1 Data Cache<br>2.8.2 Instruction Cache<br>2.8.3 Second Level Cache<br>2.8.4 External Cache - Bcache                                                                                                            | 2–18<br>2–18<br>2–18<br>2–18<br>2–18<br>2–19                          |
| 2.9       | PIPELINE ORGANIZATION                                                                                                                                                                                                                     | 2–19                                                                  |

# DIGITAL RESTRICTED DISTRIBUTION

iii .

## Contents

| 2.10      | SCHEDUL  | ING AND IS           | SUING RULES                                                                                               | 2–24 |
|-----------|----------|----------------------|-----------------------------------------------------------------------------------------------------------|------|
|           | 2.10.1   |                      | Class Definition and Instruction Slotting<br>Slotting • 2-25                                              | 2–24 |
|           | 2.10.2   | Instruction          | -                                                                                                         | 2–26 |
|           | 2.10.3   | Producer-F           | Producer Latency                                                                                          | 2–28 |
|           | 2.10.4   | DECchip 2            | 1164-AA Issue Rules                                                                                       | 2–28 |
| 2.11      | REVISION | HISTORY              |                                                                                                           | 2–30 |
| CHAPTER 3 | PALCODE  | AND IPRS             |                                                                                                           | 3–1  |
| 3.1       | OVERVIEV | N                    |                                                                                                           | 3-1  |
| 3.2       | PALCODE  | ENTRY PO             | NTS                                                                                                       | 3–1  |
|           | 3.2.1    | CALL_PAL             |                                                                                                           | 32   |
|           | 3.2.2    | Traps                |                                                                                                           | 3–3  |
| 3.3       | PAL OPCO | DDES                 |                                                                                                           | 3–3  |
|           | 3.3.1    | HW_LD                |                                                                                                           | 3–4  |
|           | 3.3.2    | HW_ST                |                                                                                                           | 35   |
|           | 3.3.3    | HW_REI               |                                                                                                           | 36   |
|           | 3.3.4    | HW_MFPR              | and HW_MTPR                                                                                               | 3-7  |
| 3.4       | PAL STOP | RAGE REGIS           | STERS                                                                                                     | 39   |
| 3.5       | SRM DEF  | INED STATE           | - OPENVMS                                                                                                 | 3–9  |
| 3.6       | SRM DEF  | INED STATE           | - OSF                                                                                                     | 3–10 |
| 3.7       | PERFORM  | IANCE                | ·                                                                                                         | 3–11 |
| 3.8       | TBMISS F | LOWS                 |                                                                                                           | 3–12 |
| 3.9       | IPRS     |                      |                                                                                                           | 3–13 |
|           | 3.9.1    | Ibox IPRs            |                                                                                                           | 3–13 |
|           |          | 3.9.1.1              | ITB_TAG • 3–14                                                                                            |      |
|           |          | 3.9.1.2<br>3.9.1.3   | ITB_PTE • 3–14<br>Address Space Number, ITB_ASN • 3–15                                                    |      |
|           |          | 3.9.1.3              | ITB PTE TEMP • 3–15                                                                                       |      |
|           |          | 3.9.1.5              | Istream TB Invalidate All Process, ITB_IAP • 3-16                                                         |      |
|           |          | 3.9.1.6              | IStream TB Invalidate All, ITB_IA • 3-16                                                                  |      |
|           |          | 3.9.1.7              | ITB_IS • 3–16                                                                                             |      |
|           |          | 3.9.1.8              | Formatted Faulting VA register, IFAULT_VA_FORM • 3-16                                                     |      |
|           |          | 3.9.1.9              | Virtual Page table Base register, IVPTBR • 3-17                                                           |      |
|           |          | 3.9.1.10             | Icache Parity Error Status register, ICPERR_STAT • 3-17                                                   |      |
|           |          | 3.9.1.11             | ICache Flush Control register, IC_FLUSH_CTL • 3–17                                                        |      |
|           |          | 3.9.1.12<br>3.9.1.13 | Exception Address register, EXC_ADDR • 3–18<br>Exception Summary register, EXC_SUM • 3–18                 |      |
|           |          | 3.9.1.13             | Exception Mask Register, EXC_MASK • 3–19                                                                  |      |
|           |          | 3.9.1.15             | PAL Base Register, PAL_BASE • 3–19                                                                        |      |
|           |          | 3.9.1.16             | Processor Status, PS • 3–20                                                                               |      |
|           |          | 3.9.1.17             | Ibox Control/Status Register, ICSR • 3-20                                                                 |      |
|           |          | 3.9.1.18             | Interrupt Priority Level Register, IPL • 3–21                                                             |      |
|           |          | 3.9.1.19             | Interrupt Id Register, INTID • 3-22                                                                       |      |
|           |          | 3.9.1.20             | Aynchronous System Trap Request Register, ASTRR • 3–22                                                    |      |
|           |          | 3.9.1.21<br>3.9.1.22 | Aynchronous System Trap Enable Register, ASTER • 3–23<br>Software Interrupt Request Register. SIRR • 3–23 |      |

DIGITAL RESTRICTED DISTRIBUTION

iv

| 3.10<br>CHAPTER 4<br>4.1 | EXTERN                                         | Possible                                                                                                              | Definitions • 3–56                                                                                                                                                  | 3–56<br>3–60<br>4–1<br>4–2<br>4–3<br>4–3<br>4–3<br>4–4 |
|--------------------------|------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------|
| CHAPTER 4                | REVISIO<br>EXTERN<br>CHIP IN<br>4.1.1<br>4.1.2 | 3.9.3.8<br>3.9.3.9<br>3.9.3.10<br>PAL Rest<br>3.9.4.1<br>ON HISTORY<br>NAL JNTERF<br>ITERFACE<br>Overview<br>Physical | Bcache Tag Address Register, BC_TAG_ADDR • 3–53<br>Fill_Syndrome Register, FILL_SYN • 3–54<br>Load Lock Register, LD_LOCK • 3–56<br>rrictions<br>Definitions • 3–56 | 3–60<br>4–1<br>4–2<br>4–3                              |
| CHAPTER 4                | REVISIO<br>EXTERN<br>CHIP IN<br>4.1.1          | 3.9.3.8<br>3.9.3.9<br>3.9.3.10<br>PAL Rest<br>3.9.4.1<br>ON HISTORY<br>NAL JNTERF<br>ITERFACE<br>Overview             | Bcache Tag Address Register, BC_TAG_ADDR • 3–53<br>Fill_Syndrome Register, FILL_SYN • 3–54<br>Load Lock Register, LD_LOCK • 3–56<br>rrictions<br>Definitions • 3–56 | 3–60<br>4–1<br>4–1<br>4–2                              |
| CHAPTER 4                | REVISIO<br>EXTERN<br>CHIP IN                   | 3.9.3.8<br>3.9.3.9<br>3.9.3.10<br>PAL Rest<br>3.9.4.1<br>ON HISTORY<br>NAL JNTERF                                     | Bcache Tag Address Register, BC_TAG_ADDR • 3–53<br>Fill_Syndrome Register, FILL_SYN • 3–54<br>Load Lock Register, LD_LOCK • 3–56<br>rictions<br>Definitions • 3–56  | 3–60<br>4–1<br>4–1                                     |
| CHAPTER 4                | REVISIO<br>EXTERN                              | 3.9.3.8<br>3.9.3.9<br>3.9.3.10<br><b>PAL Rest</b><br>3.9.4.1<br><b>DN HISTORY</b>                                     | Bcache Tag Address Register, BC_TAG_ADDR • 3–53<br>Fill_Syndrome Register, FILL_SYN • 3–54<br>Load Lock Register, LD_LOCK • 3–56<br>rictions<br>Definitions • 3–56  | 3–60<br>4–1                                            |
|                          | REVISIO                                        | 3.9.3.8<br>3.9.3.9<br>3.9.3.10<br><b>PAL Rest</b><br>3.9.4.1<br><b>DN HISTORY</b>                                     | Bcache Tag Address Register, BC_TAG_ADDR • 3–53<br>Fill_Syndrome Register, FILL_SYN • 3–54<br>Load Lock Register, LD_LOCK • 3–56<br>rictions<br>Definitions • 3–56  | 3–60                                                   |
| 3.10                     |                                                | 3.9.3.8<br>3.9.3.9<br>3.9.3.10<br><b>PAL Rest</b><br>3.9.4.1                                                          | Bcache Tag Address Register, BC_TAG_ADDR • 3–53<br>Fill_Syndrome Register, FILL_SYN • 3–54<br>Load Lock Register, LD_LOCK • 3–56<br>rictions                        |                                                        |
|                          | 3.9.4                                          | 3.9.3.8<br>3.9.3.9<br>3.9.3.10<br><b>PAL Rest</b>                                                                     | Bcache Tag Address Register, BC_TAG_ADDR • 3–53<br>Fill_Syndrome Register, FILL_SYN • 3–54<br>Load Lock Register, LD_LOCK • 3–56<br>rictions                        | 3–56                                                   |
|                          | 0.0.4                                          | 3.9.3.8<br>3.9.3.9<br>3.9.3.10                                                                                        | Bcache Tag Address Register, BC_TAG_ADDR • 3–53<br>Fill_Syndrome Register, FILL_SYN • 3–54<br>Load Lock Register, LD_LOCK • 3–56                                    |                                                        |
|                          |                                                | 3.9.3.8<br>3.9.3.9                                                                                                    | Bcache Tag Address Register, BC_TAG_ADDR • 3–53<br>Fill_Syndrome Register, FILL_SYN • 3–54                                                                          |                                                        |
|                          |                                                | 3.9.3.8                                                                                                               | Bcache Tag Address Register, BC_TAG_ADDR • 3-53                                                                                                                     |                                                        |
|                          |                                                |                                                                                                                       |                                                                                                                                                                     |                                                        |
|                          |                                                |                                                                                                                       |                                                                                                                                                                     |                                                        |
|                          |                                                | 3.9.3.6                                                                                                               | External Interface Status Register, EI_STAT • 3-51                                                                                                                  |                                                        |
|                          |                                                | 3.9.3.5                                                                                                               | Bcache Configuration Register, BC_CONFIG • 3-49                                                                                                                     |                                                        |
|                          |                                                | 3.9.3.4                                                                                                               | Bcache Control Register, BC_CONTROL • 3-46                                                                                                                          |                                                        |
|                          |                                                | 3.9.3.3                                                                                                               | Scache Address Register, SC_ADDR • 3-45                                                                                                                             |                                                        |
|                          |                                                | 3.9.3.2                                                                                                               | Scache Status Register, SC_STAT • 3-44                                                                                                                              | -                                                      |
|                          |                                                | 3.9.3.1                                                                                                               | Scache Control Register, SC_CTL • 3-42                                                                                                                              |                                                        |
|                          | 3.9.3                                          | Cbox IPR                                                                                                              | •                                                                                                                                                                   | 3-42                                                   |
|                          |                                                | 0.0.2.20                                                                                                              | Register • 3–40                                                                                                                                                     |                                                        |
|                          |                                                | 3.9.2.23                                                                                                              | DC_TEST_TAG_TEMP, Dcache Test TAG Temp                                                                                                                              |                                                        |
|                          |                                                | 3.9.2.21                                                                                                              | DC_TEST_TAG, Dcache Test TAG Register • 3–39                                                                                                                        |                                                        |
|                          |                                                | 3.9.2.20                                                                                                              | DC_TEST_CTL, Dcache Test TAG Control Register • 3–38                                                                                                                |                                                        |
|                          |                                                | 3.9.2.19<br>3.9.2.20                                                                                                  | CC, Cycle Counter • 3–37<br>CC_CTL, Cycle Counter Control • 3–37                                                                                                    |                                                        |
|                          |                                                | 3.9.2.18                                                                                                              | ALT_MODE, Alternate mode • 3–36                                                                                                                                     |                                                        |
|                          |                                                | 3.9.2.17                                                                                                              | DC_FLUSH, Dcache Flush Register • 3–36                                                                                                                              |                                                        |
|                          |                                                | 3.9.2.16                                                                                                              | MAF_MODE, MAF Mode Register • 3–35                                                                                                                                  |                                                        |
|                          |                                                | 3.9.2.15                                                                                                              | DC_MODE, Dcache Mode Register • 3-34                                                                                                                                |                                                        |
|                          |                                                | 3.9.2.14                                                                                                              | MCSR, Mbox Control Register • 3–33                                                                                                                                  |                                                        |
|                          |                                                | 3.9.2.13                                                                                                              | DTBIS, Dstream TB Invalidate Single • 3-32                                                                                                                          |                                                        |
|                          |                                                | 3.9.2.12                                                                                                              | Dstream TB Invalidate All, DTBIA • 3–32                                                                                                                             |                                                        |
|                          |                                                | 3.9.2.11                                                                                                              | Dstream TB Invalidate All Process, DTBIAP • 3-32                                                                                                                    |                                                        |
|                          |                                                | 3.9.2.10                                                                                                              | DC_PERR_STAT, Dcache Parity Error Status • 3-31                                                                                                                     |                                                        |
|                          |                                                | 3.9.2.9                                                                                                               | MVPTBR, Mbox Virtual Page Table Base Register • 3-31                                                                                                                |                                                        |
|                          |                                                | 3.9.2.8                                                                                                               | VA_FORM, Formatted Virtual Address • 3-30                                                                                                                           |                                                        |
|                          |                                                | 3.9.2.7                                                                                                               | VA, Faulting Virtual Address • 3-30                                                                                                                                 |                                                        |
|                          |                                                | 3.9.2.6                                                                                                               | MM_STAT, Dstream MM Fault Status Register • 3-29                                                                                                                    |                                                        |
|                          |                                                | 3.9.2.5                                                                                                               | DTB_PTE_TEMP • 3–28                                                                                                                                                 |                                                        |
|                          |                                                | 3.9.2.4                                                                                                               | Dstream TB PTE, DTB PTE • 3–27                                                                                                                                      |                                                        |
|                          |                                                | 3.9.2.2<br>3.9.2.3                                                                                                    | DTB_CM, Dstream TB Current Mode • 3–26<br>DTB_TAG, Dstream TB TAG • 3–27                                                                                            |                                                        |
|                          |                                                | 3.9.2.1                                                                                                               | DTB_ASN, Dstream TB Address Space Number • 3–26                                                                                                                     |                                                        |
|                          | 3.9.2                                          |                                                                                                                       | Dcache IPRs                                                                                                                                                         | 3–26                                                   |
|                          |                                                | 3.9.1.27                                                                                                              | Serial line receive, SL_RCV • 3–26                                                                                                                                  | 0.00                                                   |
|                          |                                                | 3.9.1.26                                                                                                              | Serial line transmit, SL_XMIT • 3–25                                                                                                                                |                                                        |
|                          |                                                | 3.9.1.25                                                                                                              | Interrupt Summary register, ISR • 3-24                                                                                                                              |                                                        |
|                          |                                                | 3.9.1.24                                                                                                              | HW Interrupt Clear register, HWINT_CLR • 3-24                                                                                                                       |                                                        |
|                          |                                                | 3.9.1.23                                                                                                              | Software Interrupt Clear Register. SICR • 3–23                                                                                                                      |                                                        |

۷

|       |      | 4.1.6    | •                     | 1164-AA Interface                                                      | 46   |
|-------|------|----------|-----------------------|------------------------------------------------------------------------|------|
|       |      |          | 4.1.6.1               | System Interface • 4-7                                                 |      |
|       |      |          | 4.1.6.2               | Bcache Interface • 4–10                                                |      |
|       |      | 4.1.7    | •                     | 1164-AA Interface Command Descriptions                                 | 4-12 |
|       |      | 4.1.8    | Transactio<br>4.1.8.1 | ns<br>Read Miss • 4–14                                                 | 4–14 |
|       |      |          | 4.1.8.2               | Read Miss • 4–14<br>Read Miss with victim • 4–15                       |      |
|       |      |          | 4.1.8.2.1             | Without a Victim Buffer • 4–15                                         |      |
|       |      |          | 4.1.8.2.2             | With a Victim Buffer • 4–16                                            |      |
|       |      |          | 4.1.8.3               | Fill • 4–16                                                            |      |
|       | •    |          | 4.1.8.4               | Write Block • 4–17                                                     |      |
|       |      |          | 4.1.8.5               | Set Dirty, Lock • 4–18                                                 |      |
|       |      |          | 4.1.8.6               | Flush • 4–19                                                           |      |
|       |      |          | 4.1.8.7               | Read Dirty, and Read Dirty/INV • 4-19                                  |      |
|       |      |          | 4.1.8.8<br>4.1.8.9    | Invalidate • 4–20<br>Set Shared • 4–21                                 |      |
|       |      |          | 4.1.8.10              | Non-cached Reads • 4–21                                                |      |
|       |      |          | 4.1.8.11              | Non-cached Writes • 4-21                                               |      |
|       |      |          | 4.1.8.12              | Locks • 4-22                                                           |      |
|       |      | 4.1.9    | Clocks                |                                                                        | 4–22 |
|       |      |          | 4.1.9.1               | CPU Clock • 4-22                                                       |      |
|       |      |          | 4.1.9.2               | System Clock • 4-23                                                    |      |
|       |      |          | 4.1.9.3               | Reference Clock • 4-23                                                 |      |
|       |      |          | 4.1.9.4               | Sysclock to Bcache cycle time ratios • 4-23                            |      |
|       |      | 4.1.10   | Tri-state C           | •                                                                      | 4-24 |
|       |      | 4.1.11   | Restrictio            |                                                                        | 425  |
|       |      |          | 4.1.11.1<br>4.1.11.2  | Fills after other transactions • 4–25                                  |      |
|       |      |          | 4.1.11.2              | Sending System commands • 4–25<br>CACK for WRITE BLOCK commands • 4–25 |      |
|       |      |          | 4.1.11.4              | No Bcache Systems • 4-25                                               |      |
|       |      |          | 4.1.11.5              | Scache duplicate tag store • 4-25                                      |      |
|       |      | 4.1.12   | ECC/Parit             | · –                                                                    | 4-26 |
|       | 4.2  |          |                       | -                                                                      | 4.00 |
|       | 4.2  | REVISION | N HISTORY             |                                                                        | 4–28 |
| СНАРТ | FR 5 | RESET A  | ND INITIAL            | IZATION                                                                | 5-1  |
| ••••• |      |          |                       |                                                                        |      |
|       | 5.1  |          | SET_L AND             |                                                                        | 51   |
|       |      | 5.1.1    | •                     | Requirements                                                           | 5–2  |
|       |      | 5.1.2    | Pin State             | with DC_OK_H Not Asserted                                              | 5–2  |
|       | 5.2  | SYSCLO   | CK RATIO A            | ND DELAY                                                               | 5–2  |
|       | 5.3  | BIST     |                       |                                                                        | 5–3  |
|       | 5.4  | SERIAL I | ROM                   |                                                                        | 5–3  |
|       | 5.5  | CACHE I  | NITIALIZATI           | ON                                                                     | 5-4  |
|       | 5.6  |          | ALIZATION             |                                                                        | 5-4  |
|       | 5.7  | UNITIALI | ZED STATE             |                                                                        | 54   |
|       | 5.8  | TIMEOUT  | <b>FRESET</b>         | •                                                                      | 5-4  |
|       | 5.9  | CLOCK F  |                       |                                                                        | 5-4  |
|       |      |          |                       |                                                                        |      |
|       | 5.10 | IEEE 114 | 9.1 TEST P            | DRT RESET                                                              | 5–5  |

| 5.11       | REVISION HISTORY                                                                                                                                                                                                                                                                   | 5–6                                           |
|------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|
| CHAPTER 6  | ERROR HANDLING                                                                                                                                                                                                                                                                     | 6–1                                           |
| 6.1        | OVERVIEW                                                                                                                                                                                                                                                                           | 6–1                                           |
| 6.2        | REVISION HISTORY                                                                                                                                                                                                                                                                   | 6–3                                           |
| CHAPTER 7  | DC CHARACTERISTICS                                                                                                                                                                                                                                                                 | 7–1                                           |
| 7.1        | OVERVIEW<br>7.1.1 Power Supply<br>7.1.2 Input Clocks<br>7.1.3 Signal pins                                                                                                                                                                                                          | 7–1<br>7–1<br>7–1<br>7–2                      |
| 7.2        | ECL 100K MODE7.2.1Power Supply7.2.2Reference Supply7.2.3Inputs7.2.4Outputs7.2.5Bidirectionals                                                                                                                                                                                      | 7-3<br>7-3<br>7-3<br>7-3<br>7-3<br>7-3<br>7-3 |
| 7.3        | POWER DISSIPATION                                                                                                                                                                                                                                                                  | 7–4                                           |
| 7.4        | REVISION HISTORY                                                                                                                                                                                                                                                                   | 7–5                                           |
| CHAPTER 8  | AC CHARACTERISTICS                                                                                                                                                                                                                                                                 | 8-1                                           |
| 8.1        | REVISION HISTORY                                                                                                                                                                                                                                                                   | 8–1                                           |
| CHAPTER 9  | PINOUT                                                                                                                                                                                                                                                                             | 9–1                                           |
| 9.1        | DECCHIP 21164-AA PINOUT OVERVIEW                                                                                                                                                                                                                                                   | 9–1                                           |
| 9.2        | DECCHIP 21164-AA SIGNAL PINS                                                                                                                                                                                                                                                       | 9–1                                           |
| 9.3        | REVISION HISTORY                                                                                                                                                                                                                                                                   | 9–6                                           |
| CHAPTER 10 | ) THE PACKAGE                                                                                                                                                                                                                                                                      | 10–1                                          |
| 10.1       | REVISION HISTORY                                                                                                                                                                                                                                                                   | 10–1                                          |
| CHAPTER 11 | TEST INTERFACE AND TESTABILITY FEATURES                                                                                                                                                                                                                                            | 11–1                                          |
| 11.1       | INTRODUCTION                                                                                                                                                                                                                                                                       | 11–1                                          |
| 11.2       | TEST PORT           11.2.1         Normal Test Interface Mode           11.2.1.1         SROM Port • 11–2           11.2.1.2         Serial Terminal Port • 11–3           11.2.1.3         IEEE 1149.1 Test Access Port • 11–3           11.2.1.4         Test Status Pins • 11–5 | 11–1<br>11–2                                  |
|            | 11.2.2 Manufacturing Test Interface Mode<br>11.2.3 Debug Test Interface Mode                                                                                                                                                                                                       | 11–6<br>11–9                                  |
|            | 11.2.4 Activating Debug/Manufacturing Port Modes in a System                                                                                                                                                                                                                       | 11–10                                         |
| 11.3       | ICACHE BIST                                                                                                                                                                                                                                                                        | 11–10                                         |

.

vii

# Contents

| 11.4  | ICACHE SERIAL WRITE AND READ OPERATIONS                                            | 11–12                   |
|-------|------------------------------------------------------------------------------------|-------------------------|
| 11.5  | SCACHE/DCACHE TEST FEATURES                                                        | 11–13                   |
| 11.6  | OBSERVABILITY LFSRS (OBLS)<br>11.6.1 Organization<br>11.6.2 On-line LFSR Operation | 11–14<br>11–14<br>11–15 |
| 11.7  | OBSERVABILITY SCAN REGISTERS (OBSS)                                                | 11–15                   |
| 11.8  | CONTROLLABILITY FEATURES                                                           | 11–16                   |
| 11.9  | BOUNDARY SCAN REGISTER                                                             | 11 <b>–17</b>           |
| 11.10 | TESTABILITY IPRS                                                                   | 11–19                   |
| 11.11 | TEST FEATURE RESET AND INITIALIZATION                                              | 11–19                   |
| 11.12 | OPEN ISSUES                                                                        | 1120                    |
| 11.13 | REVISION HISTORY                                                                   | 11–21                   |

# FIGURES

| 2–1             | Abstract CPU Block Diagram                      | 23   |
|-----------------|-------------------------------------------------|------|
| 2–2             | Pipeline Examples                               | 2–20 |
| 3–1             | HW_LD instruction                               | 3-4  |
| 32              | HW_ST instruction                               | 3–5  |
| 3–3             | HW_REI instruction                              | 3–6  |
| 3–4             | HW_MFPR, HW_MTPR instruction                    | 3–7  |
| 3–5             | Istream TBmiss flow                             | 3–12 |
| 3–6             | Dstream TBmiss flow                             | 3–13 |
| 3–7             | Istream TB Tag, ITB_TAG                         | 3–14 |
| 3–8             | Istream TB PTE Write Format, ITB_PTE            | 3–14 |
| 3– <del>9</del> | Istream TB PTE Read Format, ITB_PTE             | 3–15 |
| 3–10            | Address Space Number Read/Write Format, ITB_ASN | 3–15 |
| 3–11            | Istream TB PTE Temp Read Format, ITB_PTE_TEMP   | 3–15 |
| 3–12            | ITB_IS                                          | 3–16 |
| 3–13            | IFAULT_VA_FORM in non NT mode                   | 3–16 |
| 3–14            | IFAULT_VA_FORM in NT mode                       | 3–17 |
| 3–15            | IVPTBR                                          | 3–17 |
| 3–16            | ICPERR_STAT Read format                         | 3–17 |
| 3–17            | EXC_ADDR Read/Write format                      | 3–18 |
| 3–18            | Exception Summary register Read Format, EXC_SUM | 3–18 |
| 3–19            | Exception Mask register Read Format, EXC_MASK   | 3–19 |
| 3–20            | PAL_BASE                                        | 3–20 |
| 3–21            | Processor Status, PS                            | 3–20 |
| 3-22            | Ibox Control/Status Register ICSR               | 320  |
| 3–23            | Interrupt Priority Level Register, IPL          | 3–22 |
| 3–24            | Interrupt Id Register, INTID                    | 3–22 |

DIGITAL RESTRICTED DISTRIBUTION

•

viii

.

| 3–25 | Asynchronous System Trap Request Register, ASTRR       | 3–23 |
|------|--------------------------------------------------------|------|
| 326  | Asynchronous System Trap Enable Register, ASTER format | 3–23 |
| 327  | Software Interrupt Request Register, SIRR write format | 3–23 |
| 3–28 | Software Interrupt Clear Register, SICR write format   | 3–24 |
| 3–29 | Hardware Interrupt Clear Register, HWINT_CLR           | 3–24 |
| 3–30 | Interrupt Summary Register, ISR read format            | 3–24 |
| 331  | Serial line transmit Register, SL_XMIT                 | 3–25 |
| 3–32 | Serial line receive Register, SL_RCV                   | 3–26 |
| 3–33 | DTB_ASN                                                | 3–26 |
| 3–34 | DTB_CM                                                 | 3–27 |
| 3–35 | DTB_TAG, Dstream TB Tag                                | 3–27 |
| 3–36 | DTB_PTE, Dstream TB PTE                                | 3–28 |
| 3–37 | DTB_PTE_TEMP                                           | 329  |
| 3–38 | MM_STAT, Dstream MM Fault Register                     | 3–29 |
| 3–39 | VA, Faulting VA Register                               | 330  |
| 3-40 | VA_FORM, Formatted VA Register for NT_Mode=0           | 3–30 |
| 3-41 | VA_FORM, Formatted VA Register, NT_Mode=1              | 3–31 |
| 3-42 | MVPTBR                                                 | 3–31 |
| 3–43 | DC_PERR_STAT, Dcache Parity Error Status               | 3–32 |
| 3-44 | DTBIS                                                  | 3–33 |
| 3–45 | MCSR, Mbox Control Register                            | 3–33 |
| 3–46 | DC_MODE, Dcache Mode Register                          | 3–34 |
| 3-47 | MAF_MODE, MAF Mode Register                            | 3–35 |
| 348  | ALT_MODE                                               | 3–36 |
| 3-49 | CC, Cycle Counter Register                             | 3–37 |
| 3–50 | CC_CTL, Cycle Counter Control Register                 | 3–38 |
| 3–51 | DC_TEST_CTL, Dcache Test TAG Control Register          | 3–38 |
| 3–52 | DC_TEST_TAG, Dcache Test TAG Register                  | 3–39 |
| 353  | DC_TEST_TAG_TEMP, Dcache Test TAG Temp Register        | 3-40 |
| 4–1  | DECchip 21164-AA System Interface                      | 4–1  |
| 4–2  | Read Miss                                              | 4–15 |
| 4–3  | Read Miss with Victim                                  | 4–16 |
| 44   | Read Miss with Victim Buffer                           | 4–16 |
| 4–5  | Fill                                                   | 4–17 |
| 46   | Write Block                                            | 4–18 |
| 4-7  | Set Dirty, and Lock                                    | 4–19 |
| 48   | Flush                                                  | 4–19 |
| 49   | Read Dirty                                             | 420  |
| 4–10 | Invalidate                                             | 4–20 |
| 4-11 | Set Shared                                             | 421  |
| 4–12 | Reference Clock Timing                                 | 4–23 |
| 4–13 | Driving the CMD/ADDRESS Bus                            | 4–24 |
| 4-14 | ECC code                                               | 4-26 |

ix

| 415  | x4 bit arrangement                | 4–27  |
|------|-----------------------------------|-------|
| 5-1  | Serial ROM Load Timing            | 5–3   |
| 9–1  | DECchip 21164-AA Pinout           | 9–1   |
| 11-1 | IEEE 1149.1 Test Access Port      | 11–4  |
| 11–2 | Manufacturing Test Interface Mode | 11–7  |
| 113  | Tes Port Connections on Module    | 11–11 |
| 11–4 | SROM Fill Scan Path Bit Order     | 11–13 |
| 11–5 | Read Scan Path Bit Order          | 11–13 |
| 11–6 | LFSR Chain Organization           | 11–14 |

# TABLES

| 1    | REVISION History                                     | 1    |
|------|------------------------------------------------------|------|
| 1-1  | Register Field Type Notation                         | 1–4  |
| 1–2  | Register Field Notation                              | 1–4  |
| 1–3  | DECchip 21164-AA Chip Summary and Micro-architecture | 1–5  |
| 1-4  | Revision History                                     | 1–6  |
| 2–1  | Interrupt Priority Level Effect                      | 2–6  |
| 2–2  | Pipeline Examples - All Cases                        | 2–20 |
| 2–3  | Pipeline Examples - Integer Add                      | 220  |
| 2–4  | Pipeline Examples - Floating Add                     | 2–21 |
| 25   | Pipeline Examples - Load (Dcache hit)                | 2–21 |
| 26   | Pipeline Examples - Load (Dcache miss)               | 2–21 |
| 2–7  | Pipeline Examples - Store (Dcache hit)               | 2–21 |
| 28   | Instruction Classes and Slotting                     | 2–24 |
| 2–9  | Instruction Latencies                                | 2–26 |
| 2–10 | Revision History                                     | 2–30 |
| 3-1  | PALcode Trap Entry Points                            | 3–3  |
| 3–2  | HW_LD Format description                             | 3–4  |
| 3–3  | HW_ST Format description                             | 3–5  |
| 3-4  | HW_REI Format description                            | 3–6  |
| 3-5  | HW_MTPR and HW_MFPR Format description               | 3–7  |
| 3–6  | IPR Encodings                                        | 3–7  |
| 3–7  | OpenVMS SRM defined State                            | 3–9  |
| 3–8  | OSF SRM defined State                                | 3–11 |
| 3–9  | Description of GHD bits in ITB_PTE_TEMP read format  | 3–15 |
| 3-10 | ICPERR_STAT Field Descriptions                       | 3–17 |
| 3–11 | EXC_SUM Field Descriptions                           | 3–19 |
| 3–12 | ICSR Field Descriptions                              | 3–20 |
| 3–13 | Performance Counter 0 Programming information        | 3–21 |
| 3–14 | Performance Counter 1 Programming information        | 3–21 |
| 315  | HWINT_CLR Field Descriptions                         | 3–24 |
| 3–16 | ISR read format Field Descriptions                   | 3–25 |

| 3–17             | DTB_CM Mode Bits                                        | 3–27  |
|------------------|---------------------------------------------------------|-------|
| 3–18             | MM_STAT Field Descriptions                              | 3–29  |
| 3–19             | VA_FORM Field Descriptions                              | 3–31  |
| 3–20             | DC_PERR_STAT Field Descriptions                         | 3–32  |
| 3–21             | MCSR Field Descriptions                                 | 3–33  |
| 3–22             | DC_MODE Field Descriptions                              | 3–34  |
| 3–23             | MAF_MODE Field Descriptions                             | 3–35  |
| 3–24             | ALT Mode                                                | 3–37  |
| 3–25             | CC_CTL Field Descriptions                               | 3–38  |
| 3–26             | DC_TEST_CTL Field Descriptions                          | 3–38  |
| 327              | DC_TEST_TAG Field Descriptions                          | 3–39  |
| 3–28             | DC_TEST_TAG_TEMP Field Descriptions                     | 3–40  |
| 3–2 <del>9</del> | CBOX_IPRS Descriptions                                  | 3-42  |
| 3–30             | SC_CTL Field Descriptions                               | 3–43  |
| 331              | SC_TAG_STAT Field Description                           | 3–44  |
| 3–32             | SC_STAT Field Descriptions                              | 3–45  |
| 3–33             | SC_CMD Field Descriptions                               | 3-45  |
| 3–34             | BC_CONTROL Field Descriptions                           | 3–47  |
| 3–35             | BC_TAG_STAT Field Description                           | 349   |
| 3–36             | BC_CONFIG Field Descriptions                            | 3-49  |
| 3–37             | BC_SIZE Field Descriptions                              | 3–50  |
| 3–38             | Loading/Locking Rules for External Interface Registers  | 351   |
| 339              | EI_STAT Field Descriptions                              | 3–52  |
| 3–40             | Syndromes For Single-Bit Errors                         | 3–54  |
| 3–41             | PAL Restrictions Table                                  | 3–57  |
| 3-42             | Cbox IPR Restrictions Table                             | 3-59  |
| 3–43             | Revision History                                        | 3–60  |
| 4-1              | Physical Memory Regions                                 | 4–3   |
| 42               | System Designs                                          | 4–4   |
| 4–3              | Cache States                                            | 4–5   |
| 4-4              | System Actions                                          | 4–5   |
| 4–5              | Processor Actions                                       | 46    |
| 46               | DECchip 21164-AA Commands to the System                 | 4–7   |
| 4–7              | System Commands to DECchip 21164-AA                     | 4-8   |
| 4-8              | DECchip 21164-AA Responses to System Commands           | 4-9   |
| 49               | Revision History                                        | 4–28  |
| 5-1              | System Clock Divisor                                    | 5–2   |
| 5–2              | System Clock Delay                                      | 53    |
| 5–3              | Revision History                                        | 56    |
| 6–1              | Revision History                                        | 6–3   |
| 7-1              | CMOS DC Characteristics                                 | . 7–2 |
| 72               | DECchip 21164-AA Estimated Power Dissipation @Vdd=3.45V | 7-4   |
| 7–3              | Revision History                                        | 7-5   |

.

Xİ

Contents

| 8-1             | Revision History                               | 8–1   |
|-----------------|------------------------------------------------|-------|
| 9–1             | Clock Pins                                     | 9-2   |
| <del>9</del> –2 | System Interface Pins                          | 9–3   |
| 9-3             | Bcache Pins                                    | 9-4   |
| 94              | Interrupt and Misc. Pins                       | 9-5   |
| <del>9</del> –5 | Revision History                               | 9–6   |
| 10-1            | Revision History                               | 10–1  |
| 11-1            | DECchip 21164-AA Test Port Pins and Port Modes | 11–2  |
| 11–2            | Instruction Register                           | 11–5  |
| 11–3            | Test Command Register                          | 11–8  |
| 11–4            | FRCAM Scan Register Organization               | 11–12 |
| 11–5            | Observability LFSR Organization                | 11–14 |
| 116             | Observability Scan Register Organization       | 11–16 |
| 11–7            | Boundary Scan Register Organization            | 11–17 |
| 118             | Revision History                               | 11–21 |

| Revision | Date             | Description                                 |
|----------|------------------|---------------------------------------------|
| 1.0      | 2-March-1992     | Initial release.                            |
| 1.5      | 8-May-1992       | Updates including some significant changes. |
| 1.9      | 21-December-1992 | Further updates.                            |

Table 1: REVISION History

.

# Chapter 1

# Introduction

## 1.1 Scope

This document describes the DECchip 21164-AA chip, a microprocessor that implements the Alpha architecture. This specification describes the external interface and programming information specific to the actual implementation. It does not describe the detailed implementation of the chip nor the Alpha architecture. The reader is referred to the Alpha System Reference Manual for the architectural specification.

### 1.2 Chip Features

The DECchip 21164-AA microprocessor is a CMOS-5 (.5 micron) super-scalar super-pipelined implementation of the Alpha architecture. It will be the basis of a family of Alpha products. The DECchip 21164-AA chip is designed to meet the requirements of a wide variety of systems, ranging from uni-processor workstations to multiprocessors. DECchip 21164-AA is intended to integrate well into a certain style of system environment, one with a particular kind of cache coherence protocol and a pipelined or lock-step style of bus and memory subsystem operation. A number of configuration options allow its use in a range of system designs ranging from extremely simple systems with minimum component count to high-performance systems with very high cache and memory bandwidth. DECchip 21164-AA design compromises are made with the intention of achieving maximum performance in high-performance systems while offering competitive performance and reasonable implementation constraints in lower cost systems.

DECchip 21164-AA features:

- Alpha instructions to support byte, word, longword, quadword, DEC F\_floating, G\_floating and IEEE S\_floating and T\_floating data types. Limited support is provided for DEC D\_ floating operations. Partial implementation of the architecturally optional instructions: FETCH and FETCH\_M.
- Demand paged memory management unit which in conjunction with properly written PALcode fully implements the Alpha memory management architecture appropriate to the operating system running on the processor. The translation buffer can be used with alternative PALcode to implement a variety of page table structures and translation algorithms.
- On-chip 48-entry I-stream TB and 64-entry D-stream TB in which each entry maps one 8Kbyte page or a group of 8, 64, or 512 8Kbyte pages, with the size of each TB entry's group specified by hint bits stored in the entry.

#### DIGITAL RESTRICTED DISTRIBUTION

Introduction 1–1

- World class performance.
- Low average cycles per instructions (CPI). The DECchip 21164-AA chip can issue four Alpha instructions in a single cycle, thereby minimizing the average CPI. A number of low-latency and/or high-throughput features in the instruction issue unit and the on-chip components of the memory subsystem further reduce the average CPI.
- On-chip high-throughput floating point units, capable of executing both DEC and IEEE floating point data types.
- On-chip 8Kbyte virtual instruction cache with seven-bit ASNs (MAX\_ASN=127).
- On-chip dual-read-ported 8Kbyte data cache (implemented as two 8Kbyte data caches containing identical data).
- On-chip write buffer with six 32-byte entries.
- On-chip 96Kbyte 3-way set associative writeback second level cache.
- Bus interface unit, which contains logic to directly access an optional external third-level cache without CPU module action. The size and access time of the external third-level cache is programmable.
- On-chip performance counters to measure and analyze CPU and system performance.
- An instruction cache diagnostic interface to support chip and module level testing.
- An internal clock generator which generates both a high-speed clock needed by the chip itself, and a pair of system clocks for use by the CPU module.
- The DECchip 21164-AA chip is packaged in 503 pin IPGA packages. The heat sinks are separable and application specific.

## **1.3 Terminology and Conventions**

#### 1.3.1 Numbering

All numbers are decimal unless otherwise indicated. Where there is ambiguity, numbers other than decimal are indicated with the name of the base following the number in parentheses, e.g., FF(hex).

# **1.3.2 UNPREDICTABLE And UNDEFINED**

Throughout this specification, the terms UNPREDICTABLE and UNDEFINED are used. Their meanings are quite different and must be carefully distinguished. One key difference is that only privileged software (that is, software running in kernel mode) may trigger UNDEFINED operations, whereas either privileged or unprivileged software may trigger UNPREDICTABLE results or occurrences. A second key difference is that UNPREDICTABLE results and occurrences do not disrupt the basic operation of the processor; the processor continues to execute instructions in its normal manner. In contrast, UNDEFINED operation may halt the processor or cause it to lose information.

A result specified as UNPREDICTABLE may acquire an arbitrary value subject to a few constraints. Such a result may be an arbitrary function of the input operands or of any state information that is accessible to the process in its current access mode. UNPREDICTABLE results may be unchanged from their previous values. UNPREDICTABLE results must not be security holes. Specifically, UNPREDICTABLE results must not do any of the following:

1–2 Introduction

- Depend on or be a function of the contents of memory locations or registers which are inaccessible to the current process in the current access mode.
- Write or modify the contents of memory locations or registers to which the current process in the current access mode does not have access.
- Halt or hang the system or any of its components.

For example, a security hole would exist if some UNPREDICTABLE result depended on the value of a register in another process, on the contents of processor temporary registers left behind by some previously running process, or on a sequence of actions of different processes.

An occurrence specified as UNPREDICTABLE may happen or not based on an arbitrary choice function. The choice function is subject to the same constraints as are UNPREDICTABLE results and, in particular, must not constitute a security hole.

Results or occurrences specified as UNPREDICTABLE may vary from moment to moment, implementation to implementation, and instruction to instruction within implementations. Software can never depend on results specified as UNPREDICTABLE.

Operations specified as UNDEFINED may vary from moment to moment, implementation to implementation, and instruction to instruction within implementations. The operation may vary in effect from nothing to stopping system operation. UNDEFINED operations must not cause the processor to hang, i.e., reach an unhalted state from which there is no transition to a normal state in which the machine executes instructions. Only privileged software (that is, software running in kernel mode) may trigger UNDEFINED operations.

## 1.3.3 Data Field Size

The term INTnn, where nn is one of 2, 4, 8, 16, 32, or 64, refers to a data field of nn contiguous naturally aligned bytes. INT4 refers to a naturally aligned longword, for example.

# 1.3.4 Ranges And Extents

Ranges are specified by a pair of numbers separated by a ".." and are inclusive, e.g., a range of integers 0..4 includes the integers 0, 1, 2, 3, and 4.

Extents are specified by a pair of numbers in angle brackets separated by a colon and are inclusive, e.g., bits <7:3> specify an extent of bits including bits 7, 6, 5, 4, and 3.

# 1.3.5 Register Format Notation

This specification contains a number of figures that show the format of various registers, followed by a description of each field. In general, the fields on the register are labeled with either a name or a mnemonic. The description of each field includes the name or mnemonic, the bit extent, and the type.

The "Type" column in the field description includes both the actual type of the field, and an optional initialized value, separated from the type by a comma. The type denotes the functional operation of the field, and may be one of the values shown in Table 1–1. If present, the initialized value indicates that the field is initialized by hardware to the specified value at powerup. If the initialized value is not present, the field is not initialized at powerup.

#### DIGITAL RESTRICTED DISTRIBUTION

Introduction 1–3

| Notation | Description                                                                                                                                                                                                                                                                                                                         |  |
|----------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| RW       | A read-write bit or field. The value may be read and written by software.                                                                                                                                                                                                                                                           |  |
| RO       | A read-only bit or field. The value may be read by software. It is written by hardware software writes are ignored.                                                                                                                                                                                                                 |  |
| WO       | A write-only bit or field. The value may be written by software. It is used by hardware and reads by software return an UNPREDICTABLE result.                                                                                                                                                                                       |  |
| WZ       | A write bit or field. The value may be written by software. It is used by hardware and reads by software return a 0.                                                                                                                                                                                                                |  |
| W1C      | A write-one-to-clear bit. If reads are allowed to the register then the value may be<br>read by software. If it is a write-only register then a read by software returns ar<br>UNPREDICTABLE result. Software writes of a 1 cause the bit to be cleared by hard<br>ware. Software writes of a 0 do not modify the state of the bit. |  |
| WOC      | A write-zero-to-clear bit. If reads are allowed to the register then the value may<br>be read by software. If it is a write-only register then a read by software returns<br>an UNPREDICTABLE result. Software writes of a 0 cause the bit to be cleared by<br>hardware. Software writes of a 1 do not modify the state of the bit. |  |
| WA       | A write-anything-to-the-register-to-clear bit. If reads are allowed to the register the<br>the value may be read by software. If it is a write-only register then a read by softwar<br>returns an UNPREDICTABLE result. Software write of any value to the register caus<br>the bit to be cleared by hardware.                      |  |
| RC       | A read-to-clear field. The value is written by hardware and remains unchanged unti<br>read. The value may be read by software at which point, hardware may write a new<br>value into the field.                                                                                                                                     |  |

 Table 1–1:
 Register Field Type Notation

In addition to named fields in registers, other bits of the register may be labeled with one of the four symbols listed in Table 1–2. These symbols denote the type of the unnamed fields in the register.

| Notation | Description                                                                                                                                                                                                          |  |
|----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| RAZ      | Fields specified as Read As Zero (RAZ) return a zero when read.                                                                                                                                                      |  |
| RAO      | Fields specified as Read As One (RAO) return a one when read.                                                                                                                                                        |  |
| IGN      | Fields specified as Ignore (IGN) are ignored when written and UNPREDICTABLE when read if not otherwise specified.                                                                                                    |  |
| MBZ      | Fields specified as Must Be Zero (MBZ) must never be filled by software with a non-<br>zero value. If the processor encounters a non-zero value in a field specified as MBZ, a<br>Reserved Operand exception occurs. |  |
| SBZ      | Fields specified as Should Be Zero (SBZ) should be filled by software with a zero value.<br>These fields may be used at some future time. Non-zero values in SBZ fields produce<br>UNPREDICTABLE results.            |  |

Table 1–2: Register Field Notation

.

# 1.4 Chip Summary

| Feature                       | Description                                                                                                      |
|-------------------------------|------------------------------------------------------------------------------------------------------------------|
| Estimated Cycle Time Range    | 4.4ns to 3.2ns†                                                                                                  |
| Product Speed Bin Points      | To Be Determined                                                                                                 |
| Process Technology            | CMOS5 (0.5 micron CMOS) and CMOS5S (TBD micron CMOS)                                                             |
| Transistor count              |                                                                                                                  |
| Die Size                      |                                                                                                                  |
| Package                       | 503 pin IPGA (interstitial pin grid array)                                                                       |
| No. Chip Pads                 | 581                                                                                                              |
| No. Signal Pins               | 289                                                                                                              |
| Typ Maximum Power Dissipation | approx. 60W @ 3.5ns cycles, Vdd=3.45V‡                                                                           |
| Clocking input                | two times the internal clock speed. E.g., 571.4 Mhz at a 3.5ns cycle time.                                       |
| Virtual address size          | 43 bits                                                                                                          |
| Physical address size         | 40 bits                                                                                                          |
| Page size                     | 8Kbytes                                                                                                          |
| Issue rate                    | 4 instructions per cycle                                                                                         |
| Integer Pipeline              | 7 stage                                                                                                          |
| Floating Pipeline             | 9 stage                                                                                                          |
| On-chip Dcache                | 8Kbyte, physical, direct-mapped, write-thru, 32-byte block, 32-byte fill                                         |
| On-chip Icache                | 8Kbyte, virtual, direct-mapped, 32-byte block, 32-byte fill, 128 ASN<br>(MAX_ASN=127)                            |
| On-chip Scache                | 96Kbyte, physical, 3-way set associative, writeback, 32 or 64-byte block<br>32 or 64-byte fill                   |
| On-chip DTB                   | 64-entry, fully-associative, NLU replacement, 8K pages, 128 ASNs (MAX<br>ASN=127), full granularity hint support |
| On-chip ITB                   | 48-entry, fully-associative, NLU replacement, 128 ASNs (MAX_ASN=127) full granularity hint support               |
| FPU                           | On-chip FPU supports both IEEE and DEC floating point                                                            |
| Bus                           | Separate data and address bus. 128-bit                                                                           |
| Serial ROM Interface          | Allows the chip to access a serial ROM                                                                           |

 Table 1–3:
 DECchip 21164-AA Chip Summary and Micro-architecture

†This range should not be interpreted as implying any particular production speed bin point. Speed bin ranges will not be known until characterization of production CMOS5 parts has been completed. The highest performance system designs should be designed to accept 3.2ns DECchip 21164-AA parts, though it is not known if or when production parts that fast will be available.

<sup>‡</sup>Power consumption scales linearly wih frequency over the frequency range 225Mhz to 312Mhz.

# 1.5 Revision History

| Who | When                 | Description of change              | **** |
|-----|----------------------|------------------------------------|------|
| JHE | 9-Feb-1992           | Initial version.                   |      |
| JHE | 1-March-1992         | Add chip summary. Initial release. |      |
| JHE | 29-November-<br>1992 | Updates for new revision.          |      |

Table 1–4: Revision History

# Chapter 2

# **DECchip 21164-AA Micro-Architecture**

## 2.1 Introduction

This chapter gives a programmer and system designer a view of the DECchip 21164-AA microarchitecture. It is intended to be sufficient for almost all purposes. More detailed hardware descriptions of the chip exist in the internal specification and the behavioral model.

DECchip 21164-AA can issue four instructions in a single cycle. Scheduling and issue rules are given at the end of the chapter. DECchip 21164-AA is a pipelined CPU with 4  $Ibox^1$  stages, 3 integer operate stages and 4 floating point operate stages. The pipeline is presented later in this chapter.

The combination of DECchip 21164-AA and its PALcode implements the Alpha architecture. Parts of the hardware design assume specific PAL functionality. This functionality is described in the next chapter. If a certain piece of hardware is "architecturally incomplete", the missing functionality must be implemented in PALcode.

# 2.2 Overview

The DECchip 21164-AA microprocessor consists of five functional units:

- The Ibox fetches, decodes, and issues instructions. It manages the pipelines (data bypassing), the PC, instruction caching (Icache), prefetching, and instruction stream memory management. It also contains interrupt and trap handling hardware.
- The Ebox contains the two integer execution units which execute all integer instructions. It also partially executes all memory instructions by calculating the effective address, if there is one.
- The Mbox processes all load and store operations after the Ebox produces the address. It implements data stream memory management, executes loads, stores, the memory barrier instruction, and some other instructions. It manages outstanding load misses, the write buffer, and the data cache (Dcache). It enforces any reference ordering required for correct operation or by the Alpha shared memory model. It also buffers physical instruction stream requests sent by the Ibox.

#### DIGITAL RESTRICTED DISTRIBUTION

DECchip 21164-AA Micro-Architecture 2–1

<sup>&</sup>lt;sup>1</sup> The Ibox is the unit which fetches, decodes and issues instructions.

- The Cbox processes all accesses sent by the Mbox and implements all memory related external interface functions, particularly the coherence protocol functions for writeback caching. It controls the Scache, a 96 Kbyte, 3 way set-associative, writeback, data and instruction cache. The Cbox also manages the optional external direct mapped Bcache. It handles all instruction and data primary cache read misses, performs the function of writing data from the write buffer into the shared coherent memory subsystem, and has a major role in executing the memory barrier instruction.
- The Fbox contains the two floating point execution units, one which basically executes floating multiply instructions and another which executes all other floating point instructions, particularly floating point add and subtract. Both units execute the CPYS instruction.

The Ebox and Fbox can each accept one or two instructions per cycle. If code is properly scheduled, DECchip 21164-AA can issue up to four instructions per cycle.

Figure 2–1 is a block diagram of DECchip 21164-AA showing the major functional elements and their positions in the pipeline.

DIGITAL RESTRICTED DISTRIBUTION

I

**DECchip 21164-AA Micro-Architecture** 

2-3



DEC Chip 21164-AA (EV5 CPU) Specification, Revision 1.9, December 1992

### 2.3 Ibox

The primary function of the Ibox is to issue instructions to the Ebox, Mbox and Fbox. In order to provide those instructions, the Ibox also contains the prefetcher, PC pipeline, 48-entry ITB, abort logic, register conflict or dirty logic, and interrupt and exception logic. The Ibox decodes four instructions in parallel and checks that the required resources are available for each instruction. If resources are available and multiple issue is possible, then up to four instructions may be issued. Section 2.10 give the detailed rules governing multiple instruction issue. The Ibox issues only the instructions for which all required resources are available. The Ibox does NOT issue instructions out of order, even if the resources are available for a later instruction and not for an earlier one.

The Ibox controls the primary instruction cache, the Icache. See Section 2.8.2 for more detail.

The Ibox does not advance to a new group of four instructions until all instructions in the current group have been issued. The Ibox only handles naturally aligned groups of four instructions (INT16). If a branch to the middle of such a group occurs, the Ibox attempts issuing the instructions from the branch target to the end of the INT16, proceeding to the next INT16 of instructions only when all the instructions in the target INT16 have been issued. This implies that achieving maximum issue rate requires that code be be scheduled properly and NOPs (floating or integer) be used to fill empty slots in the schedule.

#### 2.3.1 Instruction Prefetch

The Ibox contains an aggressive instruction prefetcher and a four entry prefetch buffer (called the refill buffer). Each Icache miss is checked in the refill buffer. If the refill buffer contains the instruction data, it fills the Icache and instruction buffer simultaneously. If the refill buffer does not contain the necessary data, a fetch and a number of prefetches are sent to the Mbox. If these requests are all Scache hits, it is possible for instruction data to stream into the Ibox at the rate of one INT16, four instructions, per cycle. The Ibox can sustain up to quad-instruction issue from this Scache fill stream, filling the Icache simultaneously. The refill buffer holds all returned fill data until the data is required by the Ibox pipeline.

Each fill occurs when the instruction buffer stage in the Ibox pipeline requires a new INT16. The INT16 is written into the Icache and the instruction buffer simultaneously. This can occur at a maximum rate of one Icache fill per cycle. The actual rate depends on how frequently the instruction buffer stage requires a new INT16 and on availability of data in the refill buffer.

Once an Icache miss occurs, the Icache enters fill mode. When it is both in fill mode and awaiting a fill, the Icache is checked for hit. If the instruction data is found in the Icache, the Icache returns to access mode and the prefetcher stops sending fetches to the Mbox. When a new PC is loaded (e.g., taken branches) the Icache returns to access mode until the first miss. The refill buffer receives and holds instruction data from prefetches initiated before before the Icache returned to access mode.

2-4 DECchip 21164-AA Micro-Architecture

DIGITAL RESTRICTED DISTRIBUTION

# 2.3.2 Branch Execution

When a branch or jump instruction is fetched from the Icache, the Ibox takes one cycle to calculate the target PC before it is ready to fetch the target instruction stream. In the second cycle after the fetch, the Icache is accessed at the target address. Branch and PC prediction are necessary to predict and begin fetching the target instruction stream before the branch or jump instruction is issued.

The Icache records the outcome of branch instructions in a two-bit history state provided for each instruction location in the cache. This information is used as the prediction for the next execution of the branch instruction. The history status is not initialized on Icache fill, so it may "remember" a branch which is evicted from the Icache and subsequently reloaded.

DECchip 21164-AA does not limit the number of branch predictions outstanding to one; it predicts branches even while waiting to confirm the prediction of previously predicted branches. There can be one branch prediction pending for each of stages 3 and 4 plus up to four in stage 2.

When a predicted branch is issued, the Ebox or Fbox checks the prediction. The branch history table is updated accordingly. On branch mispredict, a mispredict trap occurs and the Ibox restarts execution from the correct PC.

DECchip 21164-AA provides a twelve-entry subroutine return stack which is controlled by decoding the opcode (BSR, HW\_REI and JMP/JSR/RET/JSR\_COROUTINE), and disp<15:14> in JMP/JSR/RET/JSR\_COROUTINE. The stack stores an Icache index in each entry. (Note that the stack is implemented as a circular queue which wraps around in the overflow and underflow cases.)

DECchip 21164-AA uses the Icache index hint in the JMP and JSR instructions to predict the target PC. The Icache index hint in the instruction's displacement field is used to access the direct mapped Icache. The upper bits of the PC are formed from the data in the Icache tag store at that index. Later in the pipeline, the PC prediction is checked against the actual PC generated by the Ebox. A mismatch causes a PC mispredict trap and restart from the correct PC. This is similar to branch prediction.

The RET, JSR\_COROUTINE, and HW\_REI instructions predict the next PC using the index from the subroutine return stack. The upper bits of the PC are formed from the data in the Icache tag at that index. These predictions are checked against the actual PC in exactly the same way that JMP and JSR predictions are checked.

Note that changes from PAL mode to native mode and vice versa are predicted on all PC predictions that use the subroutine return stack. If the opcode isn't HW\_REI, this might not seem to make sense, but if the PC prediction is correct, the mode prediction will be as well.

As noted above, Istream prefetching is disabled when a PC prediction is outstanding.

# 2.3.3 ITB

I

The Ibox contains a 48-entry fully associative translation buffer to cache recently used instructionstream address translations and protection information for pages ranging from 8 Kbytes to 512 Kbytes in size. The ITB uses a not-last-used replacement algorithm. The ITB is filled and maintained by PALcode. Each entry supports all four granularity hint bit combinations permitting translation for up to 512 contiguously mapped 8 Kbyte pages using a single ITB entry.

The operating system, via PALcode, must ensure that virtual addresses can only be mapped through a single ITB entry or super page mapping at one time. Multiple simultaneous mapping can cause UNDEFINED results.

While not executing in PAL mode, the 43-bit virtual program counter (PC) is presented each cycle to the ITB. If the PTE associated with the PC is cached in the ITB, the protection bits for the page which contains the PC are used by the Ibox to do the necessary access checks. If there is an Icache miss and the PC is cached in the ITB, the PFN and protection bits for the page which contains the PC are used by the Ibox to do the address translation and access checks.

The DECchip 21164-AA ITB supports 128 ASNs (MAX\_ASN=127) via a seven-bit ASN field in each ITB entry. PALcode which supports writes to the architecturally-defined TBIAP register does so by using the hardware-specific HW\_MTPR instruction to write to a specific hardware register. This has the effect of invalidating ITB entries which do not have their ASM bit set.

DECchip 21164-AA provides two optional translation extensions referred to as super pages. They are enabled via ICSR<SPE>. One super page mapping maps virtual address bits <39:13> one-to-one to physical address bits <39:13> when virtual address bits <42:41> = 2. This maps the entire physical address space four times over to the quadrant of the virtual address space with virtual address bits <42:41> = 2. The second super page mapping maps virtual address bits <29:13> one-to-one to physical address bits <29:13> with physical address bits <39:30> set to 0. This mapping occurs for virtual addresses with bits <42:30> = 1FFE(Hex), mapping a 30-bit region of physical address space to a single region of the virtual address space defined by virtual address bits <42:30> = 1FFE(Hex). Access to either super page mapping is only allowed while executing in kernel mode.

# 2.3.4 Interrupt Logic

The DECchip 21164-AA chip supports three sources of interrupts: hardware, software and asynchronous system trap (AST). There are seven level-sensitive hardware interrupts sourced by pins, 15 software interrupts sourced by an on-chip IPR (SIRR), and 4 AST interrupts sourced by a second on-chip IPR (ASTRR). Interrupts are masked by the hardware interrupt priority level register (IPL). In addition, AST interrupts are qualified by the current processor mode and the performance counter interrupts, the serial line interrupt, and the internally-detected correctable error interrupt are all maskable by bits in the IPR, ICSR (see Chapter 3). All interrupts are disabled when the processor is executing PALcode.

Table 2–1 shows which interrupts are enabled for a given IPL. An interrupt is enabled if the current IPL is less than the target IPL of the interrupt.

| Interrupt Source             | Target IPL (decimal) |  |
|------------------------------|----------------------|--|
| Software Interrupt Request 1 | 1                    |  |
| Software Interrupt Request 2 | 2                    |  |
| Software Interrupt Request 3 | • 3                  |  |
| Software Interrupt Request 4 | 4                    |  |
| Software Interrupt Request 5 | 5                    |  |

Table 2–1: Interrupt Priority Level Effect

#### 2–6 DECchip 21164-AA Micro-Architecture

| Interrupt Source                                                                              | Target IPL (decimal)                  |
|-----------------------------------------------------------------------------------------------|---------------------------------------|
| Software Interrupt Request 6                                                                  | 6                                     |
| Software Interrupt Request 7                                                                  | 7                                     |
| Software Interrupt Request 8                                                                  | 8                                     |
| Software Interrupt Request 9                                                                  | 9                                     |
| Software Interrupt Request 10                                                                 | 10                                    |
| Software Interrupt Request 11                                                                 | 11                                    |
| Software Interrupt Request 12                                                                 | 12                                    |
| Software Interrupt Request 13                                                                 | 13                                    |
| Software Interrupt Request 14                                                                 | 14                                    |
| Software Interrupt Request 15                                                                 | 15                                    |
| AST pending (for current or more privileged mode)                                             | 2                                     |
| Performance counter interrupt                                                                 | 29                                    |
| Power fail interrupt§                                                                         | 30                                    |
| System machine check interrupt§; Internally detected correctable error in-<br>terrupt pending | 31                                    |
| External interrupt 20§ (I/O interrupt at IPL 20; corrected system error interrupt)            | 20                                    |
| External interrupt 21§ (I/O interrupt at IPL 21)                                              | 21                                    |
| External interrupt 22§ (I/O interrupt at IPL 22; interprocessor interrupt; timer interrupt)   | 22                                    |
| External interrupt 23§ (I/O interrupt at IPL 23)                                              | 23                                    |
| Halt§                                                                                         | Masked only by executing in PAL mode. |
|                                                                                               |                                       |

Table 2–1 (Cont.): Interrupt Priority Level Effect

§These interrupts are from external sources. In some cases, the system environment provides the logic-or of multiple interrupt sources at the same IPL.

When the processor receives an interrupt request and that request is enabled, an interrupt is reported or delivered to the exception logic if the processor is not currently executing PALcode. Before vectoring to the interrupt service PAL dispatch address, the pipeline is completely drained to the point that instructions issued before entering the PALcode can not trap (implied DRAINT).

The restart address is saved in the Exception Address IPR (EXC\_ADDR) and the processor enters PALmode. The cause of the interrupt may be determined by examining the state of the INTID and ISR registers.

Note that hardware interrupt requests are level sensitive and therefore may be removed before an interrupt is serviced. PALcode must verify the interrupt actually indicated in INTID is to be serviced at an IPL higher that the current IPL. If it is not, PALcode should ignore the spurious interrupt.

## 2.3.5 Performance Counters

#### TBD FUNCTIONALITY

We have yet to define our performance monitoring features completely.

#### 2.4 Ebox

The Ebox contains two 64-bit integer execution pipelines, a total of 2 adders, 2 logic boxes, 1 barrel shifter, 1 byte zapper, and 1 integer multiplier. Almost all useful bypass paths are implemented; the result of any completed integer operation is available for use by instructions other than integer multiply issuing into either pipeline. (The integer multiplier is unable to recieve data from certain bypass paths. This is reflected in the latency specification at the end of this chapter.) The integer multiplier retires 8 bits per cycle. Table 2–9 lists all instruction latencies. The Ebox also contains the 40-entry 64-bit integer register file containing the 32 integer registers defined by the Alpha architecture and 8 PAL shadow registers. The register file has four read ports and two write ports which provide operands to both integer execution pipelines and accept results from both pipes. The register file also accepts load instruction results (memory data) on the same two write ports. Arbitration implemented by the Ibox reserves the write ports for fills from the Mbox when appropriate.

#### 2.5 Mbox

The Mbox contains the address translation buffer for all loads and stores, the write buffer address file, the miss address file, the Dcache interface, and Mbox IPRs. It executes up to two loads per cycle, though a load can not be issued simultaneously with a store or certain other Mbox instructions (see Section 2.10 for detailed issue rules). The address translation datapath receives a virtual address every cycle from each adder in the Ebox. A translation buffer with two read ports generates the corresponding physical addresses and access control information.

#### 2.5.1 Big Endian Support

DECchip 21164-AA provides limited support for big endian data formats via MCSR<BIG\_ ENDIAN>. When this bit is set, physical address bit <2> is inverted for all longword D-stream references. It is intended that this mode be set during initialization PALcode and not changed during operation.

#### 2.5.2 DTB

DECchip 21164-AA contains a 64-entry fully associative dual read-ported translation buffer which caches recently used data-stream page table entries for 8 Kbyte pages. Each entry supports all four granularity hint bit combinations permitting translation for up to 512 contiguously mapped 8 Kbyte pages using a single DTB entry. The translation buffer uses a not-last-used replacement algorithm.

#### 2–8 DECchip 21164-AA Micro-Architecture

DIGITAL RESTRICTED DISTRIBUTION

Р. 2. The DECchip 21164-AA DTB supports 128 ASNs (MAX\_ASN=127) via a seven-bit ASN field in each DTB entry. PALcode which supports writes to the architecturally-defined TBIAP register does so by using the hardware-specific HW\_MTPR instruction to write to a specific hardware register. This has the effect of invalidating DTB entries which do not have their corresponding ASM bit set.

For load and store instructions and other Mbox instructions requiring address translation, the effective 43-bit virtual address is presented to the DTB. If the PTE of the supplied virtual address is cached in the DTB, the PFN and protection bits for the page which contains the address are used by the Mbox to complete the address translation and access checks.

DECchip 21164-AA provides two optional translation extensions referred to as super pages. They are enabled via MCSR<SP<1:0>>. One super page mapping maps virtual address bits <39:13> one-to-one to physical address bits <39:13> when virtual address bits <42:41> = 2. This maps the entire physical address space four times over to the quadrant of the virtual address space with virtual address bits <42:41> = 2. The second super page mapping maps virtual address bits <29:13> one-to-one to physical address bits <29:13> with physical address bits <39:30> set to 0. This mapping occurs for virtual addresses with bits <42:30> = 1FFE(Hex), mapping a 30-bit region of physical address space to a single region of the virtual address space defined by virtual address bits <42:30> = 1FFE(Hex). Access to either super page mapping is only allowed while executing in kernel mode.

The DTB is filled and maintained by PALcode. Figure 3–6 shows the DTB miss flow. In general, the operating system, via PALcode, must ensure that virtual addresses can only be mapped through a single DTB entry or super page mapping at one time. Multiple simultaneous mapping can cause UNDEFINED results. The only exception to this rule is that one virtual page may be mapped twice with identical data in two different DTB entries. This occurs in operating systems utilizing virtually accessible page tables like those used by VMS. If the level 1 page table is accessed virtually, PALcode ends up loading the translation information twice, once in the double-miss handler, and once again in the primary handler. The PTE mapping the level 1 page table must remain constant during accesses to this page to meet this requirement.

# 2.5.3 Replay Traps

For implementation reasons, there are no stalls after the instruction issue point in the pipeline. For certain cases, an Mbox instruction can not be executed because of insufficient resources or some other reason. These instructions trap and the Ibox restarts their execution from the beginning of the pipeline. This is called a replay trap. Replay traps occur in the following cases:

- Write buffer full when a store is executed and there are already six write buffer entries allocated. The trap occurs regardless of whether the entry would have merged in the write buffer.
- A load issued in E0 when all six miss address file entries are valid (not available) or a load issued in E1 when five of the six miss address file entries are valid. The trap occurs regardles of whether the load would have hit in the Dcache merged with a miss address file entry.
- Alpha shared memory model order trap (Litmus test 1 trap): If a load issues that addressmatches with any miss in the miss address file, the load is aborted via a replay trap regardless of whether the newly-issued load hits or misses in the Dcache. The address match is precise except that it includes the case in which a longword access matches within a quadword access. This ensures that the two loads execute in issue order.

- Load-after-store trap: If a load is issued in the cycle immediately following a store that hits in the Dcache, and both access the same memory location, a replay trap occurs. The address match is exact with respect to low order bits of the address, but it is TBD whether it ignores address bits <42:13>.
- When a load is followed within one cycle by any instruction which uses the result of that load and the load misses in the Dcache, the consumer instruction traps and is restarted from the beginning of the pipeline. This happens because the consumer instruction is issued speculatively while Dcache hit is being evaluated. If the load misses in the Dcache, the speculative issue of the consumer instruction was incorrect. The replay trap brings the consumer instruction to the issue point before or simultaneously with the availability of fill data.

## 2.5.4 Load Instruction Execution and the Miss Address File

The Mbox begins execution of each load instruction by translating the virtual address and accessing the Dcache. Translation and Dcache tag read occur in parallel. If the addressed location is found in the Dcache (a hit), the data from the Dcache is formatted and written to either the integer or floating point register file. The formatting required depends on the particular load instruction executed. If the data is not found in the Dcache (a miss), the address, target register number, and formatting information are entered in the miss address file.

The miss address file (MAF) performs a load merging function. When a load miss occurs, each MAF entry is checked to see if it contains a load miss addressing the same Dcache (32 byte) block. If it does, and if certain merging rules are met, the new load miss is merged with an existing MAF entry. This allows the Mbox to service two or more load misses with one data fill from the Cbox. The merging rules are as follows:

- Merging only occurs if the new load miss addresses a different INT8 from all loads previously entered or merged to that miss address file entry.
- Merging only occurs if the new load miss is the same access size as the loads previously entered in that miss address file entry. I.e., quadword loads only merge with other quadword loads and longword loads only merge with other longword loads.
- In the case of longword loads, address bit<2> must be the same. I.e., longword loads with even addresses merge only with other even longword loads and longword loads with odd addresses merge only with other odd longword loads.
- The miss address file does not merge floating point and integer load misses in the same entry.
- Merging is prevented for the MAF entry a certain number of cycles after the Scache access corresponding to the MAF entry begins. Merging is prevented for that entry only if the Scache access hits. The minimum number of cycles of merging is three, the cycle in which the first load is issued and the two subsequent cycles. This corresponds to the most optimistic case of a load miss being forwarded to the Scache without delay (accounting for the cycle saved by the bypass which sends new load misses directly to the Scache when there is nothing else pending).

Note that merging is allowed for loads to non-cacheable space (physical address bit  $\langle 39 \rangle = 1$ ). At the pins, these reads will tell the system environment which INT32 is addressed and which INT8s within the INT32 are actually accessed. (Merging stops for a load to non-cacheable space as soon as the Cbox accepts the reference.) This permits the system environment to access only those INT8s actually requested by load instructions. For memory mapped INT4 registers, the system

2–10 DECchip 21164-AA Micro-Architecture

environment must return the result of reading each register within the INT8 since DECchip 21164-AA only indicates which INT8s are accessed, not the exact length and offset of the access within each INT8. Systems implementing memory mapped registers with side effects from reads should place each such register in a separate INT8 in memory.

When merging does not occur, a new MAF entry is allocated for the new load miss. Merging is done for two loads issued simultaneously which both miss as if they were issued sequentially with the load from Ebox pipe E0, in effect, first. The Mbox sends a read request to the Cbox for each MAF entry allocated.

A bypass is provided so that if the load issues in Ebox pipe E0, and no MAF requests are pending, that load's read request is sent to the Cbox immediately. Similarly, if a load from Ebox pipe E1 misses and there was no load instruction in E0 at all, the E1 load miss is sent to the Cbox immediately. In either case, the bypassed read request is aborted if the load hits in the Dcache or merges in the MAF.

There are six MAF entries for load misses and four more for Ibox instruction fetches and prefetches. Normally load misses are the highest priority Mbox request.

If the MAF is full and a load issues in E0 or if five of the six MAF entries are valid and a load issues in E1, an MAF full trap occurs causing the Ibox to restart execution with the load that caused the MAF overflow. When the load arrives at the MAF the second time, an MAF entry may have become available. If not, the MAF full trap occurs again.

Eventually, the Cbox provides the data requested for a given MAF entry (a fill). If the fill is integer data (and not floating point data), the Cbox requests that the Ibox allocate two consecutive "bubble" cycles in the Ebox pipelines. The first bubble prevents any instruction from issuing. The second bubble prevents only Mbox instructions (particularly loads and stores) from issuing. The fill uses the first bubble cycle as it progresses down the Ebox/Mbox pipelines to format the data and load the register file. It uses the second bubble cycle to fill the Dcache.

Referring to Figure 2–2, note that an instruction typically writes the register file in stage 6. Because there is only one register file write port per integer pipeline, a no-instruction bubble cycle is required to reserve a register file write port for the fill. Again refering to Figure 2–2, note that a load or store accesses the Dcache in the second half of stage 4 and the first half of stage 5. The fill operation writes the Dcache, making it unavailable for other accesses at that time. Relative to the register file write, the Dcache (write) access for a fill occurs a cycle later than the Dcache access for a load hit. This is because the fill data arrives just in time to be bypassed to the consuming instruction. Since only loads and stores use the Dcache in the pipeline, the second bubble reserved for a fill is a no-Mbox-instruction bubble. See Section 2.9 for more details of pipeline.

The second bubble is a subset of the first bubble. When two fills are in consecutive cycles (as they are for Scache hit) then three total bubbles are allocated, two no-instruction bubbles followed by one no-Mbox-instruction bubble. Note that the bubbles are requested before it is known whether the Scache (and similarly, the Bcache) will hit. In other words, bubble allocation is speculative.

For fills from the Cbox to floating point registers, no cycle is allocated. Loads which conflict in the pipeline with the fill are forced to miss. Stores which conflict in the pipeline force the fill to be aborted in order to keep the Dcache available to the store operation. In all cases, the floating point register(s) are filled as dictated by the associated MAF entry. A single store can block up to four consecutive fills. (Note that the Fbox has separate write ports for fill data as is necessary for this fill scheme.)

Up to two floating or integer registers may be written for each Cbox fill cycle. Fills deliver 32 bytes in two cycles, two INT8s per cycle. The MAF merging rules ensure that there is no more than one register to write for each INT8, so there is a register file write port available for each INT8. After appropriate formatting, data from each INT8 is written into the integer or floating point register file provided there is a miss recorded for that INT8.

Loads misses are all checked against the write buffer contents for conflicts between new loads and previously issued stores. See Section 2.5.6 for more detail.

LDL\_L and LDQ\_L instructions always allocate a new MAF entry. No loads that follow a LDL\_L or LDQ\_L are allowed to merge with it. After LDL\_L or LDQ\_L is issued, the Ibox does not issue any more Mbox instructions until the Mbox has successfully sent the LDL\_L or LDQ\_L to the Cbox. This guarantees correct ordering between a LDL\_L or LDQ\_L and a subsequent STL\_C or STQ\_C even if they access different addresses.

### 2.5.5 Store Execution

Stores execute in the Mbox by reading the Dcache tag store in the pipeline stage in which a load would read the Dcache, checking for a hit in the next stage, and writing the Dcache data store if there is a hit in the second following pipeline stage. See Section 2.9 for pipeline details.

Loads are not allowed to issue in the second cycle after a store (1 bubble cycle). Other instructions can be issued in that cycle. Stores can issue at the rate of one per cycle because stores streaming down the pipeline do not conflict in their use of resources (the Dcache tag store and Dcache data store are the principal resources). However, a load uses the Dcache data store in the same early stage that it uses the Dcache tag store. Therefore a load would conflict with a store if it were issued in the second cycle after any store. Section 2.9 gives details on store execution in the pipeline.

A load which is issued one cycle after a store in the pipeline creates a conflict if both access  $exactly^1$  the same memory location; the store hasn't updated the location when the load reads it. This conflict is handled by forcing the load to trap (a replay trap). The Ibox flushes the pipeline and restarts execution from the load instruction. By the time the load arrives at the Dcache the second time, the conflicting store has written the Dcache and the load is executed normally.

It is recommended that software not load data immediately after storing it. The replay trap that is incurred is fairly expensive. The best solution is to schedule the load to issue three cycles after the store. No issue stalls or replay traps will occur in that case. If the load is scheduled to issue two cycles after the store, it will be issue-stalled for one cycle for the reasons given above. This is not optimal but is much better than incurring a replay trap on the load.

For three cycles during store execution, fills from the Cbox are not placed in the Dcache. Register fills are unaffected. There are conflicts which make it impossible to fill the Dcache in each of these cycles. Fills are prevented in cycles in which a store is in pipeline stage 4, 5, or 6. Note that this applies most strongly to fills of floating point data. Fills of integer data allocate bubble cycles such that an integer fill never conflicts with a store in pipeline stages 4 or 5. A store which would have conflicted in stage 4 or 5 is issue-stalled instead.

2–12 DECchip 21164-AA Micro-Architecture

DIGITAL RESTRICTED DISTRIBUTION

1

<sup>&</sup>lt;sup>1</sup> It is TBD if this address check will include the most significant bits of the address. It will be precise over bits <12:0>.

If a store is stalled at the issue point for any reason, it interferes with fills just as if it had been issued. Again, this applies only to fills of floating point data. If, when a store issues, subsequent instructions at the issue point do not issue, then a "shadow" of the store remains in the pipeline latches at the issue point. The Mbox has special logic which detects that the stalled "shadow" of the store is not a new store and will never issue, so the store "shadow" is prevented from interfering with concurrent fills.

For each store, a search of the MAF is done to detect load-before-store hazards. If a store is executed and a load of the same address is present in the MAF, two things happen:

- 1. Bits are set in each conflicting MAF entry to prevent its fill from being placed in the Dcache when it arrives and to prevent subsequent loads from merging with that MAF entry.
- 2. Conflict bits are set with the store in the write buffer to prevent the store from being issued until all conflicting loads have been issued to the Cbox.

This ensures proper results from the loads and prevents incorrect data from being cached in the Dcache.

A check is done for each new store against stores in the write buffer that have already been sent to the Cbox but have not been completed. This is described in the next section.

#### 2.5.6 Write Buffer and the WMB Instruction

The write buffer address file is contained in the Mbox. The write buffer data store is contained in the Cbox. It contains six fully associative 32-byte entries. The purpose of the write buffer is to minimize the number of CPU stall cycles by providing a high bandwidth (but finite) resource for receiving store data. This is required since DECchip 21164-AA can generate store data at the peak rate of one INT8 every CPU cycle which is greater than the average rate at which the Scache can accept the data if Scache misses occur.

In addition to store instructions (including HW\_ST), STQ\_C, STL\_C, FETCH and FETCH\_M instructions are also written into the write buffer and sent off-chip. Unlike stores, however, these write buffer-directed instructions are never merged into a write buffer entry with other instructions.

A write buffer entry is invalid if it does not contain one of the commands listed above.

The WMB instruction has a special effect on the write buffer. When it is executed, a bit is set in every write buffer entry containing valid store data that will prevent future stores from merging with any of the entries. Also, the next entry to be allocated is marked with a WMB flag. (Note that the entry marked with the WMB flag does not yet have any valid data in it). When an entry marked with a WMB flag is ready to issue to the Cbox, it is not issued until every previously issued write is completely finished. This ensures correct ordering between stores issued before the WMB instruction and stores issued after it.

Each write buffer entry contains a CAM for holding physical address bits <39:5>, 32 bytes of data, eight INT4 mask bits which indicate which of the eight INT4s in the entry contain valid data, and miscellaneous control bits. Among the control bits are a WMB flag, already described, and a no-merge bit which indicates the entry is closed to further merging.

Two entry pointer queues are associated with the write buffer, a free entry queue and a pending request queue. The free entry queue contains pointers to available invalid write buffer entries. The pending request queue contains pointers to valid write buffer entries that have not yet been issued to the Cbox. The pending request queue is ordered in allocation order.

Each time the write buffer is presented with a store instruction the physical address generated by the instruction is compared to the address in each valid write buffer entry that is open for merging. If the address is in the same INT32 as an address in a valid write buffer entry which also contains a store and the entry is open for merging, then the new store data is merged into that entry and the entry's INT4 mask bits are updated. If no matching address is found or all entries are closed to merging, then the store data is written into the entry at the top of the free entry queue, that entry is validated, and pointer to the entry is moved from the free entry queue to the pending request queue. Note this scheme does not maintain write ordering.

When two or more entries are in the pending request queue, the Mbox requests that the Cbox process the write buffer entry at the head of the pending request queue. It then removes the entry from the pending request queue (without placing it in the free entry queue). When the Cbox has completely processed the write buffer entry, it notifies the Mbox and the now invalid write buffer entry is placed in the free entry queue. The Mbox may request a second write buffer entry be processed while waiting for the Cbox to finish the first. The write buffer entries are invalidated and placed in the free entry queue in the order that the requests complete. That order may be different than the order in which the requests were made.

The Mbox also requests a write buffer entry be processed every 64 cycles if there is even one valid entry. This ensures writes do not wait forever to be written to memory. Note that the timer which spurs this is free running.

When a LDL\_L or LDQ\_L is processed by the Mbox, the Mbox requests processing of the next pending write buffer request. This increases the chances of the write buffer being empty when a STL\_C or STQ\_C is issued.

The Mbox continues to request that write buffer entries be processed as long as one contains a STQ\_C, STL\_C, FETCH, FETCH\_M instruction or as long as one is marked by a WMB flag or there is an MB being executed by the Mbox. This insures that these instructions are finished as quickly as possible.

Every store that does not merge in the write buffer is checked against every valid entry. If any is an address match, then the WMB flag is set on the newly allocated write buffer entry. This prevents the Mbox from sending two writes to exactly the same block to the Cbox. The Cbox does not necessarily complete writes in the order in which they were issued, and reordering two writes to the same block can lead to an incorrect final result.

Load misses are checked in the write buffer for conflicts. The granularity of this check is an INT32; any load matching any write buffer entry's address is considered a hit even if it does not access an INT4 marked for update in that write buffer entry. If a load hits in the write buffer, a conflict bit is set in the load's MAF entry which prevents the load from being issued to the Cbox before the conflicting write buffer entry has been issued (and completed). At the same time, the no-merge bit is set in every write buffer entry with which the load hit. A write buffer flush flag is also set. The Mbox continues to request that write buffer entries be processed until all the entries which were ahead of the conflicting write(s) at the time of the load hit have been processed.

Some writes can not be processed in the Scache without external environment involvement. To support this, the Mbox retransmits a write at the Cbox's request. This situation arises when the Scache block is not dirty when the write is issued or when the access misses in the Scache.

I

#### 2.5.7 MB Instruction

The Mbox processes the MB instruction by first completing all outstanding loads and flushing the write buffer. It delays issuing the MB until all loads in the MAF and all writes in the write buffer have completed. The Mbox then issues the MB to the Cbox and waits until the Cbox signals that the MB has been processed before signaling the Ibox that the MB is complete. BC\_CTL<EI\_OPT\_CMD> determines whether the Cbox processes the MB by issuing it on the pins and waiting for acknowledgement. If BC\_CTL<EI\_OPT\_CMD> is not set, the Cbox retires the MB and immediately signals the Mbox that it has been processed. The Ibox stops issuing Mbox instructions after issuing the MB until the signal telling it to start again.

#### 2.5.8 Ibox Read Requests

The Mbox has a four entry file of Ibox read requests. There is a strict one-for-one mapping between these request file entries and the four entries in the refill buffer in the Ibox. Allocation of these entries is controlled by the Ibox. The Ibox never reuses an entry until the previous read has completed. For Istream reads in non-cacheable space, the Mbox marks all INT8s as accessed in the request to the Cbox.

# 2.5.9 Mbox Arbitration

The Mbox arbitrates among the pending Ibox requests, load misses, and write buffer requests to decide which is the next request to be sent to the Scache and Cbox. The Cbox overrides Mbox arbitration to handle fills and system bus requests (invalidates and probes) and to force a write buffer request to reissue when required by shared block write processing in the Cbox. Normally, load misses are the highest priority Mbox request, followed by Ibox requests and write buffer requests. Write buffer requests become higher priority than reads when a write buffer flush condition exists.

In some cases a request is refused by the Cbox due to lack of resources or a conflict. The Mbox places these refused requests in a replay queue. When arbitrating for an entry in the replay queue, the Mbox uses a priority higher than any other Mbox source. However, when only one replay queue entry is allocated, the Mbox delays arbitrating for the replay queue entry such that other Mbox requests can slip in between replays of refused commands. Sometimes the Cbox will be able to process the other request despite the conflict associated with the replayed request. Once the Mbox has two or more commands in the replay queue, it stops sending new references (because those too might be refused).

#### 2.6 The Cbox

The Cbox controls the Scache and the interface to the DECchip 21164-AA pin bus. It responds to all Mbox generated requests: load misses, instruction fetches and prefetches, and write buffer requests. It also implements a generic writeback cache protocol for the Scache and Bcache (external cache). Chapter 4 describes the DECchip 21164-AA pin bus and coherence protocol.

Internal data transfers between the Mbox (and Ibox) and the Cbox are made via 16-byte buses. Since the internal cache fill block size is 32 bytes, cache fill operations result in two data transfers from the Cbox to the appropriate cache. Since each write buffer entry is 32 bytes in size, write

transactions may result in two data transfers from the write buffer to the Scache and/or the external caches.

The Scache is fully pipelined and is able to provide fill data at a sustained rate of two INT8s per CPU cycle indefinitely. It is writeback and write allocate. Writes which hit in a private-dirty block are processed in a pipelined fashion at a rate of 1 INT16 per CPU cycle. Thus, extremely high data bandwidths are supported by the Scache.

The Scache and Bcache block sizes are selected to be 32 or 64 bytes by SC\_CTL<SC\_BLK\_SIZE>. The Scache and Bcache block sizes are always the same.

The optional Bcache supports high data bandwidth as well. It can provide fill data at a rate as high as one INT16 every 4 CPU cycles if pipelined, though the Bcache in many systems operates at a significantly slower rate. Bandwidth of Scache writebacks into the Bcache is the same, one INT16 per 4 CPU cycles. Writeback bandwidth into the Bcache is optimized by maintaining a modified bit for each INT16 in each Scache block. Only those INT16s that have actually been modified since the block was allocated in the Scache are written back to the Bcache. Scache victim writebacks can therefore take one to four Bcache cycles or not occur at all, depending on the state of the modified bits.

Programs which organize (block) their data such that it fits in the Scache for phases of execution will benefit most significantly from the high data bandwidths available from the DECchip 21164-AA Cbox. Data blocked to fit in the Bcache will benefit from the high Bcache bandwidth supported, but only to the degree that the particular system's Bcache has high bandwidth and never as much as for data blocked to fit in the Scache.

The Scache is set associative but is kept a subset of the larger externally implemented Bcache which is always direct mapped. Logic associated with the Scache tag comparators detects the case in which an Scache miss will cause a block in the Scache to be evicted from both the Bcache and Scache. If the Scache victim is dirty, it is copied from the Scache to the Bcache before the new read is allowed to access the Bcache and cause the Bcache block to be copied back to main memory. In other cases, Scache victims are buffered and written back to the Bcache after reading the new block from the Bcache.

The Cbox detects Scache references to INT64 blocks that have already missed in the Scache. They effectively stall the Scache until the fill occurs. When they proceed they should Scache hit. A special case occurs when the Scache block size is 64 bytes and the second Scache miss is an access to the other INT32 within an outstanding INT64 reference. Such a miss is merged in the Cbox such that the Scache pipeline does not stall. The INT64 fill will service both of the original INT32 references when it arrives. Only one such merge can occur for a particular Scache miss; once both halves of an INT64 Scache block have been requested, no additional merging is done.

#### NOTE

The Cbox never merges two INT32 references in non-cacheable space (physical address bit <39>=1). This is required so that the Cbox can inform the environment precisely which INT8s are accessed for each non-cacheable space read reference.

• Up to two Scache misses can be processed by the Cbox. These can be Bcache hits or misses. Once one of any two Scache misses is resolved, a new Scache miss can be accepted. Once two Scache misses are outstanding, the Cbox and Scache stop accepting new transactions until one of the outstanding misses is completed. Merging of the kind described in the previous paragraph affects this by effectively condensing two misses into one. Merging can not occur if two misses

are outstanding already, so with merging of INT32s into INT64s, up to three misses can be outstanding.

The Cbox implements a writeback coherence protocol characterized by write allocate, write invalidate, and snooping for dirty data in all coherent caches in the system for each bus read issued by each processor. The Cbox facilitates this protocol by:

- Interacting with the external bus interface so that it may maintain an accurate Bcache duplicate tag store (or Scache duplicate tag store in the absence of a Bcache). An accurate duplicate tag store always has the correct dirty status for each cache block.
- Maintaining shared and dirty status bits for each Scache block. Writes to private-dirty blocks occur without external activity. Writes to shared blocks are broadcast externally. Writes to blocks not shared and not dirty require interface acknowledgment to transition into the dirty state.
- Fulfilling reads to dirty blocks in the Scache or Bcache by providing the data directly from the appropriate cache. Reads from the system bus are processed at highest priority. If the block is dirty, the data is transmitted (under external control) from the appropriate cache.

Normally, the Mbox's arbiter determines the next request that enters the Scache pipeline. The Cbox causes override of the Mbox arbiter in the following cases:

- Scache fills from the Bcache or system environment.
- Processing of system probes and invalidates.
- Write broadcast data transmission or write to a private block after receiving acknowledgment from the interface.

### 2.7 Fbox

DECchip 21164-AA has an on-chip pipelined Fbox capable of executing both DEC and IEEE floating point instructions. IEEE floating point datatypes S and T are supported with all rounding modes. DEC floating point datatypes F and G are fully supported. There is limited support for D floating point format. The Fbox contains a 32-entry 64-bit floating point register file and a user accessible control register, FPCR, containing round mode controls and exception flag information. The Fbox contains two execution pipelines, a floating point multiply pipeline and a floating point add pipeline (which executes all Fbox instructions except multiply operations). The floating point divide unit is associated with the floating point add pipeline but is not itself pipelined. The Fbox can accept a multiply instruction and a non-multiply instruction every cycle, with the exception of floating point divide instructions. The latency for all instructions except divide is four cycles. Bypassers are provided to allow issue of instructions which are dependent on prior results while those results are written to the register file. For detailed information on instruction timing, refer to Section 2.10.

The floating point multiply pipeline and floating point add pipeline are both capable of executing the CPYS instruction. This is important for two reasons. It allows floating point NOPs to be executed in either floating point pipe and it allows floating point data to be moved from register to register simultaneously with execution of any floating point operation. (Recall that floating point NOP is CPYS F31,F31,F31.)

#### DIGITAL RESTRICTED DISTRIBUTION

DECchip 21164-AA Micro-Architecture 2–17

The floating point register file has five read ports and four write ports. Four of the read ports are used by the two pipelines to source operands. The remaining read port is used by floating point stores. Two of the write ports are used to write results from the two pipelines. The other two write ports are used to write fills from floating point loads. The Mbox arbitrates between floating point loads that hit in the Dcache and floating point fills from the Cbox, making certain that only one register need be written per fill port in each cycle. Floating point loads that conflict with Cbox fills for use of these write ports are forced to miss in the Dcache so that the Cbox fill can occur. The purpose of this is to maximize the available bandwidth for floating point loads.

### 2.8 Cache Organization

DECchip 21164-AA includes three on-chip caches. All memory cells are fully static CMOS 6T structures. Parity protection is implemented in all on-chip caches.

### 2.8.1 Data Cache

The DECchip 21164-AA data cache, the Dcache, is a dual-ported cache implemented as two 8 Kbyte cache banks. It is a write-through, read-allocate direct mapped physical cache with 32-byte blocks. One bank is associated with each of the two Ebox execution pipelines, E0 and E1. The cache banks contain exactly the same data. The Cbox keeps the Dcache coherent and keeps it a subset of the Scache.

A load that misses in the Dcache will result in a Dcache fill. The two banks are filled at the same time with identical data.

### 2.8.2 Instruction Cache

The DECchip 21164-AA instruction cache, the Icache, is an 8 Kbyte virtual direct-mapped cache. Icache blocks contain 32-bytes of instruction stream data, associated predecode data, the corresponding tag, a seven-bit ASN field (MAX\_ASN=127), a one-bit ASM field and a 1 bit PALcode indication per block. Coherency with memory is not maintained by Ibox hardware. The virtual instruction Icache is kept coherent with memory via the IMB PAL call, as specified in the Alpha SRM.

The DECchip 21164-AA virtual instruction cache is kept coherent with changes to PTEs via the IMB PAL call or by assigning a new ASN to the affected process. The TBIA, TBIAP, and TBIS PAL calls do not affect the contents of the Icache in any way.

### 2.8.3 Second Level Cache

The DECchip 21164-AA second level cache, Scache, is a 96 Kbyte, 3-way set associative, physical, writeback, write-allocate cache with 32 or 64 byte blocks (configured by SC\_CTL<SC\_BLK\_SIZE>). It is a mixed data and instruction cache. The Scache is fully pipelined; it processes reads and writes at the rate of 1 INT16 per CPU cycle and can alternate between read and write accesses without "bubble" cycles.

If the Scache block size is configured to 32 bytes, the Scache is organized as three sets of 512 blocks where each block consists of two 32-byte subblocks. Otherwise the Scache is three sets of 512 64-byte blocks.

#### 2–18 DECchip 21164-AA Micro-Architecture

Scache tags contain the following special bits for each 32-byte sub-block: one dirty bit, one shared bit, two INT16 modified bits, and one valid bit. Dirty and shared are the coherence state of the subblock required for the cache coherence protocol. The modified bits are used to prevent unnecessary writebacks from the Scache to the Bcache. The valid bit indicates the subblock is valid. In 64-byte block mode, the block is made up of two 32-byte subblocks and the valid, shared, and dirty bits in one subblock always match the corresponding bit in the other subblock

The Scache tag compare logic contains extra logic to check for blocks in the Scache which map to the same Bcache block as a new reference. This allows the Scache block to be moved to the Bcache (if dirty) before the block is evicted because of the new reference missing in the Bcache.

The Scache supports write broadcast by merging write data with Scache data in preparation for a write broadcast as required by the coherence protocol.

### 2.8.4 External Cache - Bcache

The Cbox implements control for an optional external, direct mapped, physical, writeback, write allocate cache with 32 or 64 byte blocks. (The block size is configured by SC\_CTL<SC\_BLK\_SIZE>). It is a mixed data and instruction cache. Bcache sizes of 1, 2, 4, 8, 16, 32, and 64 Mbytes are supported. See Chapter 4.

### 2.9 **Pipeline Organization**

DECchip 21164-AA has an seven stage pipeline for integer operate and memory reference instructions. Floating point operate instructions progress through a nine stage pipeline. The Ibox maintains state for all pipeline stages to track outstanding register writes. The pipeline diagrams below show the DECchip 21164-AA pipeline for several significant examples. The first four cycles are executed in the Ibox and the later stages are executed in the Ebox, Fbox, Mbox, and Cbox. There are bypass paths that allow the result of one instruction to be used as a source operand of a following instruction before it is written to the register file.

| F | igure | 2-2: | Pipeli | ne Examp | les |
|---|-------|------|--------|----------|-----|
|---|-------|------|--------|----------|-----|

|              | 0 1           | 2    | 3 4          | 5        | 6         | 7      | 8             | 9       | 10      | 11        |
|--------------|---------------|------|--------------|----------|-----------|--------|---------------|---------|---------|-----------|
| Integer Add  | access buffer | 1    | dirty        | 1        | 1         | 1      |               |         |         |           |
| 2            | Icacheland    | slot | check   add  | silo     | wrt RF    | 1      |               |         |         |           |
|              | decode        | i    | ard RF       | i        | i         | Ì      |               |         |         |           |
| Floating Add | access buffer | 1    | dirty        | Fbox     | Fbox      | Fbox   | Fbox          |         |         |           |
|              | Icacheland    |      | check   silo | stage    | stage     | stage  | st 4          |         |         |           |
|              | decode        | i    | rd RF        | 11       | 12        | 3      | wrt RF        |         |         |           |
| Load         | access buffer | 1    | dirty  addr  | I        | (use)     | 1      |               |         | •       |           |
| (Dcache hit) | [Icache] and  | slot | check  calc  | 1        | 1         | 1      |               |         |         |           |
|              | decode        | 1    | srd RF   ac  | cess det | ect   wrt | 1      |               |         |         |           |
|              | 1             | Ì    | dc           | ache i r | nit  RF   | i      |               |         |         |           |
|              | i i           | i    | i i          | for      |           | i      |               |         |         |           |
|              |               | l    | i i          | Imat     | :         | Ì      |               |         |         |           |
|              | i i           | i    | i i          | i        | 1         | i      | + (Bcad       | he acc  | ess beq | ins here) |
|              |               | •    |              |          |           |        |               |         | -       |           |
| Load         | access buffer | I    | dirty addr   | 1        | 1         | 1      | İ <b>v</b> I  | 1       | 1 (     | use)      |
|              | Icacheland    |      |              | i        | i         | i      | i i           | i       | Dcach   | • •       |
| (Scache hit) | l decode      |      | &rd RF   ac  | cessidet | ectlaco   | esside | tectlacces    | si send |         |           |
| (,           | 1 1           | i    |              |          |           |        | ache   Scache |         |         | 1 1       |
| •            | i i           | i    | 1 1          | 1        | •         |        | nit   data    |         | mati    | i i       |
|              | 1             |      | i i          | i        | 1         | 1      | 1 1           | ́ т     | 1       | 1         |
| Store        | access buffer | 1    | dirty  addr  | 1        | I         | ł      | 1             |         |         |           |
| (Dcache hit) | Icache and    | slot | check  calc  | 1        | 1         | 1      | 1             |         |         |           |
| . ,          | decode        | i i  | srd RF   ac  | cessidet | ectiwri   | te     | 1             |         |         |           |
|              | 1             | i i  | Dc           | ache   1 | nit  Dca  | che    | 1             |         |         |           |
|              |               | 1    |              |          |           | •      |               |         |         |           |

 Table 2–2:
 Pipeline Examples - All Cases

| Pipe Stage | Events                                                                                                                                                                                                                                                                                                                         |
|------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0          | Access Icache tag and data.                                                                                                                                                                                                                                                                                                    |
| 1          | Buffer 4 instructions, check for branches, calculate branch displacements, check for Icache hit.                                                                                                                                                                                                                               |
| 2          | slot - swap instructions around so they are headed for pipelines capable of executing<br>them. Stall preceding stages if all instructions in this stage can not issue simultane-<br>ously because of function unit conflicts.                                                                                                  |
| 3          | Check the operands of each instruction to see that the source is valid and available<br>and that no write-write hazards exist. Read the integer register file. Stall preceding<br>stages if any instruction can not be issued. All source operands must be available at<br>the end of this stage for the instruction to issue. |

| Table 2–3: | Pipeline Examples | - Integer Add |
|------------|-------------------|---------------|
|            |                   |               |

| Pipe Stage | Events                                                                              |
|------------|-------------------------------------------------------------------------------------|
| 4          | Do the add.                                                                         |
| 5          | Result available for use by an operate this cycle.                                  |
| 6          | Write the integer register file. Result available for use by an operate this cycle. |

## 2–20 DECchip 21164-AA Micro-Architecture

| Pipe Stage | Events                                                                     |
|------------|----------------------------------------------------------------------------|
| 4          | Read the floating register file.                                           |
| 5          | First cycle of Fbox add pipeline.                                          |
| 6          | Second cycle of Fbox add pipeline.                                         |
| 7          | Third stage of Fbox add pipeline.                                          |
| 8          | Fourth stage of Fbox add pipeline. Write the floating point register file. |
| 9          | Result available for use by an operate this cycle.                         |

Table 2-4: Pipeline Examples - Floating Add

### Table 2–5: Pipeline Examples - Load (Dcache hit)

| Pipe Stage | Events                                                                                                                                                             |
|------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 4          | Calculate the effective address. Begin the Dcache data and tag store access.                                                                                       |
| 5          | Finish the Dcache data and tag store access. Detect Dcache hit. Format the data as required. Scache arbitration defaults to E0 in anticipation of a possible miss. |
| 6          | Write the integer or floating register file - data available for use by an operate this cycle.                                                                     |

| Pipe Stage | Events                                                                                                                                                                                                                                           |  |  |  |  |
|------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|--|--|
| 4          | Calculate the effective address. Begin the Dcache data and tag store access.                                                                                                                                                                     |  |  |  |  |
| 5          | Finish the Dcache data and tag store access. Detect Dcache miss. Scache arbitration defaults to E0 in anticipation of a possible miss. A load in E1 would be delayed at least one more cycle since default arbitration speculatively selects E0. |  |  |  |  |
| 6          | Begin Scache tag read.                                                                                                                                                                                                                           |  |  |  |  |
| 7          | Finish Scache tag read. Begin detecting Scache hit.                                                                                                                                                                                              |  |  |  |  |
| 8          | Finish detecting Scache hit. Begin accessing the correct Scache data bank. (Bcache index at pins; Bcache access begins)                                                                                                                          |  |  |  |  |
| 9          | Finish Scache data bank access. Begin sending fill data from Scache.                                                                                                                                                                             |  |  |  |  |
| 10         | Finish sending fill data from Scache. Begin Dcache fill. Format the data as required.                                                                                                                                                            |  |  |  |  |
| 11         | Finish Dcache fill. Write the integer or floating register file - data available for use by an operate this cycle.                                                                                                                               |  |  |  |  |

| Table 2–7: | Pipeline | <b>Examples</b> - | Store | (Dcache | hit) |
|------------|----------|-------------------|-------|---------|------|
|            |          |                   |       |         |      |

| Events                                                                                                |
|-------------------------------------------------------------------------------------------------------|
| Calculate the effective address. Begin the Dcache tag store access.                                   |
| Finish the Dcache tag store access. Detect Dcache hit. Send store to the write buffer simultaneously. |
| Write the Dcache data store if hit (write begins this cycle).                                         |
|                                                                                                       |

## DIGITAL RESTRICTED DISTRIBUTION

### DECchip 21164-AA Micro-Architecture 2–21

The DECchip 21164-AA pipeline divides instruction processing into four static and a number of dynamic stages of execution. The first four stages consist of the instruction fetch, buffer and decode, slotting, and issue check logic. These stages are static in that instructions may remain valid in the same pipeline stage for multiple cycles while waiting for a resource or stalling for other reasons. Dynamic stages always advance state and are unaffected by any stall in the pipeline. A pipeline stall may occur while zero instructions issue, or while some instructions of a set of four issue and the others are held at the issue stage. A pipeline stall implies that a valid instruction or instructions is (are) presented to be issued but can not proceed.

Upon satisfying all issue requirements, instructions are issued into their slotted pipeline. After issuing, instructions cannot stall in a subsequent pipe stage. It is up to the issue stage to ensure that all resource conflicts are resolved before an instruction is allowed to continue. The only means of stopping instructions after the issue stage is an abort condition. Note that the term abort as used here is different from its use in the Alpha SRM.

Aborts may result from a number of causes. In general, they may be grouped into two classes, namely exceptions (including interrupts) and non exceptions. The basic difference between the two is that exceptions require that the pipeline be drained of all outstanding instructions before restarting the pipeline at a redirected address. In either case, the pipeline must be flushed of all instructions which were fetched subsequent to the instruction which caused the abort condition. This includes aborting some instructions of a multiply-issued set in the case of an abort condition on the one instruction in the set. The non-exception case, however, does not need to drain the pipeline of all outstanding instructions ahead of the aborting instruction. The pipeline can be immediately restarted at a redirected address. Examples of non exception abort conditions are branch mispredictions, subroutine call/return mispredictions, and replay traps. Data cache misses can cause aborts or issue stalls depending on the cycle-by-cycle timing.

In the event of an exception other than an arithmetic exception, the processor aborts all instructions issued after the exceptional instruction as described above. Due to the nature of some exception conditions, this may occur as late as the integer register file write cycle. (In the case of an arithmetic exception, the processor may execute instructions issued after the exceptional instruction.) Next, the address of the exceptional instruction is latched in the EXC\_ADDR IPR. (In the case of an arithmetic exception, the address latched in the EXC\_ADDR IPR is that of the lats instruction executed which may be a later instruction than the exceptional instruction.) When the pipeline is fully drained, the processor begins instruction execution at the address given by the PALcode dispatch. The pipeline is drained when all outstanding writes to both the integer and floating point register file have completed and all outstanding instructions have passed the point in the pipeline such that all instructions are guaranteed to complete without an exception in the absence of a machine check.

Replay traps are aborts that occur when an instruction requires a resource that is not available at some point in the pipeline. Generally these are Mbox resources whose availability could not be anticipated accurately at issue time. If the necessary resource is not available when the instruction requires it, the instruction is aborted and the Ibox begins fetching at exactly that instruction, thereby replaying the instruction in the pipeline. A slight variation on this is the load-miss-and-use replay trap in which an operate is issued just as Dcache hit is being evaluated to determine if one of the instructions operands is valid. If it turns out that there is a Dcache miss, then the operate is aborted and replayed.

2–22 DECchip 21164-AA Micro-Architecture

It should be noted that there are two basic reasons for non-issue conditions. The first is a pipeline stall wherein a valid instruction or set of instructions are prepared to issue but cannot due to a resource conflict (register conflict or function unit conflict). These type of non-issue cycles can be minimized through code scheduling. The second type of non-issue conditions consist of pipeline bubbles where there is no valid instruction in the pipeline to issue. Pipeline bubbles result from the abort conditions described above. In addition, a single pipeline bubble is produced whenever a branch type instruction is predicted to be taken, including subroutine calls and returns. Pipeline bubbles are reduced directly by the instruction buffer hardware and through bubble squashing, but can also be effectively minimized through careful coding practices. Bubble squashing involves the ability of the first four pipeline stages to advance whenever a bubble or buffer slot is detected in the pipeline stage immediately ahead of it while the pipeline is otherwise stalled.

### 2.10 Scheduling and Issuing Rules

### 2.10.1 Instruction Class Definition and Instruction Slotting

It is important to note that the following scheduling and multiple issue rules are only performance related. There are no functional dependencies related to scheduling or multiple issuing. The scheduling and issuing rules are defined in terms of instruction classes. The table below specifies all of the instruction classes and the pipeline which executes the particular class. With a few additional rules, Table 2–8 gives the information necessary to determine the functional resource conflicts that determine the which instructions can issue in a given cycle.

| Class Name | Pipeline                      | Instruction List                                                                                                                                             |
|------------|-------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|
| LD         | $E0^1$ or $E1^2$              | all loads except LDx_L                                                                                                                                       |
| ST         | EO                            | all stores except STx_C.                                                                                                                                     |
| MBX        | EO                            | LDx_L, MB, WMB, STx_C, HW_LD-lock, HW_ST-<br>cond, FETCH •                                                                                                   |
| RX         | EO                            | RS, RC                                                                                                                                                       |
| MXPR       | E0 or E1 depending on the IPR | HW_MFPR, HW_MTPR                                                                                                                                             |
| IBR        | E1                            | integer conditional branches                                                                                                                                 |
| FBR        | FA <sup>3</sup>               | floating point conditional branches                                                                                                                          |
| JSR        | E1                            | jump to subroutine instructions JMP, JSR, RET, or JSR_COROUTINE, BSR, BR, HW_REI, CALLPAL                                                                    |
| IADD       | E0 or E1                      | ADDL ADDL/V ADDQ ADDQ/V SUBL SUBL/V SUBQ<br>SUBQ/V S4ADDL S4ADDQ S8ADDL S8ADDQ S4SUB<br>S4SUBQ S8SUBL S8SUBQ LDA LDAH                                        |
| ILOG       | E0 or E1                      | AND BIS XOR BIC ORNOT EQV                                                                                                                                    |
| SHIFT      | E0 ·                          | SLL SRL SRA EXTQL EXTLL EXTWL EXTBL EXTQ<br>EXTLH EXTWH MSKQL MSKLL MSKWL MSKBL<br>MSKQH MSKLH MSKWH INSQL INSLL INSWL<br>INSBL INSQH INSLH INSWH ZAP ZAPNOT |
| CMOV       | E0 or E1                      | CMOVEQ CMOVNE CMOVLT CMOVLE CMOVGT<br>CMOVGE CMOVLBS CMOVLBC                                                                                                 |
| ICMP       | E0 or E1                      | CMPEQ CMPLT CMPLE CMPULT CMPULE CMPBO                                                                                                                        |
| IMULL      | EO                            | MULL MULL/V                                                                                                                                                  |
| IMULQ      | EO                            | MULQ MULQ/V                                                                                                                                                  |
| IMULH      | EO                            | UMULH                                                                                                                                                        |
| FADD       | FA .                          | floating point operates except multiply and CPYS (but including CPYSN and CPYSE).                                                                            |

 Table 2–8:
 Instruction Classes and Slotting

<sup>1</sup>Ebox pipeline 0.

<sup>2</sup>Ebox pipeline 1.

<sup>3</sup>Fbox "add" pipeline.

2–24 DECchip 21164-AA Micro-Architecture

| Class Name | Pipeline        | Instruction List              |
|------------|-----------------|-------------------------------|
| FDIV       | FA              | floating point divide.        |
| FMUL       | $\mathbf{FM}^4$ | floating point multiply       |
| FCPYS      | FM or FA        | CPYS (but not CPYSN or CPYSE) |
| MISC       | EO              | RPCC, TRAPB                   |
| UNOP       | none            | UNOP                          |

| Table 2–8 (Cont.): | Instruction | Classes | and Slotting |
|--------------------|-------------|---------|--------------|
|--------------------|-------------|---------|--------------|

#### 2.10.1.1 Slotting

The slotting function in Ibox determines which instructions will be sent forward to attempt to issue. The slotting function detects and removes all static functional resource conflicts. The set of instructions output by the slotting function will issue if no register or other dynamic resource conflict is detected in stage 3 of the DECchip 21164-AA pipeline.

The basic slotting algorithm is simple. Starting from the first (lowest addressed) valid instruction in the INT16 in stage 2 of the DECchip 21164-AA Ibox pipeline, attempt to assign that instruction to one of the four pipelines (E0, E1, FA, FM). If it is an instruction which can issue in either of E0 or E1, put it in E0 except that if one the following is true, put it in E1.

- E0 isn't free and E1 is free.
- The next integer instruction<sup>‡</sup> in this INT16 can only issue in E0.

If the current instruction is one which can issue in either FA or FM, put it in FA unless FA isn't free. Mark the pipeline selected by this process as taken and begin again with the next sequential instruction. Stop when an instruction can not be allocated an execution pipeline because any pipeline it can use is already taken. The slotting logic also enforces the special rules listed below, stopping the slotting process when a rule would be violated by allocating the next instruction an execution pipeline. Note that the slotting logic doesn't send instructions forward out of logical instruction order because DECchip 21164-AA always issues instructions in order.

- 1. An instruction of class LD can not be simultaneously issued with an instruction of class ST.
- 2. All instructions are discarded at the slotting stage after a predicted-taken IBR or FBR class instruction, or a JSR class instruction.
- 3. After a predicted not-taken IBR or FBR, no other IBR, FBR, or JSR class can be slotted together.
- 4. The following cases are detected by the slotting logic:
  - from lowest address to highest within an INT16, the arrangement I-instruction, F-instruction, I-instruction, I-instruction, where I-instruction is any instruction that can issue in one or both of E0 or E1.and F-instruction is any instruction that can issue in one or both of FA or FM.
  - from lowest address to highest within an INT16, the arrangement F-instruction, I-instruction, I-instruction.

### DIGITAL RESTRICTED DISTRIBUTION

DECchip 21164-AA Micro-Architecture 2–25

<sup>‡</sup> In this context, an integer instruction is one which can issue in one or both of E0 or E1, not FA or FM.

When this type of case is detected, the first two instructions are forwarded to the issue point in one cycle, and the second two are sent only when the first two have both issued, provided no other slotting rule would prevent the second two from being slotted in the same cycle. This makes a code sequence that was optimally scheduled for EV4 perform at least as well on DECchip 21164-AA.

### 2.10.2 Instruction Latencies

After slotting, instruction issue is governed by the availability of registers for read or write and the availability of the floating divide unit and the integer multiply unit. There are producer-consumer dependencies, producer-producer dependencies (also known as write after write conflicts) and dynamic function unit availability dependencies (integer multiply and floating divide). Ibox logic in stage 3 of the DECchip 21164-AA pipeline detects all these conflicts.

For most instructions the latency to produce a valid result is fixed. The exceptions are loads which miss, floating point divides, and integer multiplies. Table 2–9 gives the latencies for each instruction class. A latency of 1 means that the result may be used by an instruction issued one cycle after the producing instruction. Note that most latencies are a property of the producer only; except for integer multiply latencies, there are no variations in latency due to which particular unit produces a given result relative to the particular unit that consumes it. Even in the case of integer multiply, the instruction is issued at the time determined by the standard latency numbers, but the multiply's latency is dependent on which previous instructions produced its operands and when they executed.

| Class | Latency                                                                                                                                                               | Additional time before<br>result available to inte-<br>ger multiply unit <sup>†</sup> |
|-------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------|
| LD    | Dcache hits, latency=2; Dcache miss/Scache hit, latency=7 or<br>longer§                                                                                               | 1 cycle                                                                               |
| ST    | Stores produce no result                                                                                                                                              | -                                                                                     |
| MBX · | LDx_L always Dcache misses, latency depends on memory<br>subsystem state; STx_C, latency depends on memory subsys-<br>tem state; MB, WMB, and FETCH produce no result | -                                                                                     |
| RX    | RS, RC, latency=1                                                                                                                                                     | 2 cycles                                                                              |
| MXPR  | HW_MFPR, latency=1, 2 or longer depending on the IPR; HW_<br>MTPR, produces no result                                                                                 | 1 or 2 cycles                                                                         |
| IBR   | produce no result                                                                                                                                                     |                                                                                       |

#### Table 2–9: Instruction Latencies

§When idle, Scache arbitration predicts a load miss in E0. If a load actually does miss in E0, it is sent to the Scache right away. If this hits and no other event in the Cbox affects the operation, the requested data is available for bypass in 7 cycles. Otherwise, the request takes longer, possibly much longer depending on the state of the Scache and Cbox. It should be possible to schedule some unrolled code loops for Scache using a data access pattern that takes advantage of the Mbox load merging function, achieving high throughput with large data sets.

†The multiplier is unable to receive data from Ebox bypass paths. The instruction issues at the expected time, but its latency is increased by the time it takes for the input data to become available to the multiplier. For example, an IMULL issued one cycle later than an ADDL which produced one of its operands has a latency of 10 (8 + 2). If the IMULL issued two cycles later than the ADDL, the latency is 9 (8 + 1).

#### 2–26 DECchip 21164-AA Micro-Architecture

| Class | Latency                                                                                                                                                                                                                                                       | Additional time before<br>result available to inte-<br>ger multiply unit† |
|-------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------|
| FBR   | produce no result                                                                                                                                                                                                                                             | -                                                                         |
| JSR   | all but HW_REI, latency=1; HW_REI produces no result                                                                                                                                                                                                          | 2 cycles                                                                  |
| IADD  | latency=1‡                                                                                                                                                                                                                                                    | 2 cycles                                                                  |
| ILOG  | latency=1‡                                                                                                                                                                                                                                                    | 2 cycles                                                                  |
| SHIFT | latency=1                                                                                                                                                                                                                                                     | 2 cycles                                                                  |
| CMOV  | latency=2                                                                                                                                                                                                                                                     | 1 cycle                                                                   |
| ICMP  | latency=1                                                                                                                                                                                                                                                     | 2 cycles                                                                  |
| IMULL | latency=8 plus up to 2 cycles of added latency depending on<br>the source of the data <sup>†</sup> ; latency until next IMULL, IMULQ, or<br>IMULH can issue if there are no data dependencies is 4 cycles<br>plus the number of cycles added to the latency.  | 1 cycle                                                                   |
| IMULQ | latency=12 plus up to 2 cycles of added latency depending on<br>the source of the data <sup>†</sup> ; latency until next IMULL, IMULQ, or<br>IMULH can issue if there are no data dependencies is 8 cycles<br>plus the number of cycles added to the latency. | 1 cycle                                                                   |
| IMULH | latency=14 plus up to 2 cycles of added latency depending on<br>the source of the data <sup>†</sup> ; latency until next IMULL, IMULQ, or<br>IMULH can issue if there are no data dependencies is 8 cycles<br>plus the number of cycles added to the latency. | 1 cycle                                                                   |
| FADD  | latency=4                                                                                                                                                                                                                                                     | •                                                                         |
| FDIV  | data dependent latency is preliminary, 2.4 bits per cycle aver-<br>age rate; next floating divide can be issued in the same cycle<br>the result of the previous divide's result is available, regardless<br>of data dependencies.                             | -                                                                         |
| FMUL  | latency=4                                                                                                                                                                                                                                                     | -                                                                         |
| FCPYS | latency=4                                                                                                                                                                                                                                                     | -                                                                         |
| MISC  | RPCC, latency=2; TRAPB produces no result                                                                                                                                                                                                                     | 1 cycle                                                                   |
| UNOP  | UNOP produces no result                                                                                                                                                                                                                                       | -                                                                         |

#### Table 2–9 (Cont.): Instruction Latencies

†The multiplier is unable to receive data from Ebox bypass paths. The instruction issues at the expected time, but its latency is increased by the time it takes for the input data to become available to the multiplier. For example, an IMULL issued one cycle later than an ADDL which produced one of its operands has a latency of 10 (8 + 2). If the IMULL issued two cycles later than the ADDL, the latency is 9 (8 + 1).

‡A special bypass provides an effective latency of 0 (zero) cycles for an ICMP or ILOG producing the test operand of an IBR or CMOV. This is only true when the IBR or CMOV issues in the same cycle as the ICMP or ILOG which produces the test operand of the IBR or CMOV. In all other cases the effective latency of ICMP and ILOG is 1 cycle

### ISSUE

The actual issue times of floating divides after floating divides is still open. The above statement is approximately correct.

#### DIGITAL RESTRICTED DISTRIBUTION

DECchip 21164-AA Micro-Architecture 2–27

### 2.10.3 Producer-Producer Latency

Producer-producer latency, also known as write after write conflicts, cause issue-stalls to preserve write order. If two instructions write the same register, they are by the Ibox forced to do so in different cycles. This is necessary to ensure that the correct result is left in the register file after both instructions have executed. For most instructions, the order in which they write the register file is dictated by issue order, however IMUL, FDIV and LD instructions may require more time than other instructions to complete. Subsequent instructions that write the same destination register are issue-stalled to preserve write ordering at the register file.

Cases involving an intervening producer-consumer conflict are of interest. They can occur commonly in a multiple-issue situation when a register is re-used. In these cases, producer-consumer latencies are equal to or greater than the required producer-producer latency as determined by write ordering and therefore dictate the overall latency.

An example of this case is shown in the code:

LDQ R2,D(R0) ; R2 destination ADDQ R2,R3,R4 ; wr-rd conflict stalls execution waiting for R2 LDQ R2,D(R1) ; wr-wr conflict may dual issue when addq issues

In general, producer-producer latency are determined by applying the rule that register file writes must occur in the correct order (which is enforced by Ibox hardware). Two IADD or ILOG class instructions that write the same register will issue at least one cycle apart. The same is true of a pair of CMOV class instructions, even though their latency is 2. For IMUL, FDIV and LD, producer-producer conflicts with any subsequent instruction results in the second instruction being issue-stalled until the IMUL, FDIV, or LD is about to complete. The second instruction is issued as soon as it is guaranteed to write the register file after the IMUL, FDIV, or LD, at least one cycle afterwards.

If a load writes a register and within two cycles a subsequent instructions writes the same register, the subsequent instruction is issued speculatively assuming the load hits. If the load misses, a load-miss-and-use trap is generated, causing the second instruction to be replayed by the Ibox. When the second instruction again reaches the issue point, it is issue-stalled until the load fill occurs.

### 2.10.4 DECchip 21164-AA Issue Rules

The following is a list of conditions that prevent DECchip 21164-AA from issuing an instruction.

1. No instruction can be issued until all of it's source and destination registers are clean, i.e. all outstanding writes to the destination register are guaranteed to complete in issue order and there are no outstanding writes to the source registers or those writes can be bypassed.

Technically, load-miss-and-use replay traps are an exception to this rule. The consumer of the load's result issues and is aborted because a load was predicted to hit and discovered to miss just as the consumer instruction issued. In practice, the only difference is that the latency of the consumer may be longer than it would have been had the issue logic known the load would miss in time to prevent issue.

- 2. An instruction of class LD can not be issued in the second cycle after an instruction of class ST is issued.
- 3. No LD, ST, LDX\_L, MXPR (to an Mbox register), or MBX class instruction after an MB instruction has been issued until until the MB has been acknowledged on the external pin bus.

2–28 DECchip 21164-AA Micro-Architecture

- 4. No LD, ST, LDX\_L, MXPR (to an Mbox register), or MBX class instruction after a STx\_C (or HW\_ST-cond) instruction has been issued until the Mbox writes the success/failure result of the STx\_C (HW\_ST-cond) in its destination register.
- 5. No IMUL instructions can be issued if the integer multiplier is busy.
- 6. No floating point divide instructions can be issued if the floating point divider is busy.
- 7. No instruction can be issued to pipe E0 exactly two cycles before an integer multiplication completes.
- 8. No instruction can be issued to pipe FA exactly TBD cycles before an floating point divide completes.
- 9. No instruction can be issued to pipe E0 or E1 exactly two cycles before a integer register fill is requested (speculatively) by the Cbox, except IMULL, IMULQ, IMULH instructions and instructions which do not produce a result at all.
- 10. No LD, ST, LDX\_L, or MBX class instruction can be issued to pipe E0 or E1 exactly one cycle before a integer register fill is requested (speculatively) by the Cbox.
- 11. No instruction issues after a TRAPB instruction until all previously issued instructions areguaranteed to finish without generating a trap other than a machine check.

Subject to the above rules, all instructions sent to the issue stage (stage 3) by the slotting logic (stage 2) are issued. If issue is prevented for a given instruction at the issue stage, all logically subsequent instructions at that stage are prevented from issuing automatically. DECchip 21164-AA only issues instructions in order.

# 2.11 Revision History

| Who            | When                 | Description of change  |       |
|----------------|----------------------|------------------------|-------|
| John Edmondson | 9-Feb-1992           | Initial release.       | · · · |
| John Edmondson | 1-May-1992           | Update to version 1.5. |       |
| John Edmondson | 29-November-<br>1992 | Update to version 1.8. |       |

 Table 2–10:
 Revision History

2-30 DECchip 21164-AA Micro-Architecture

## Chapter 3

## PALcode and IPRs

### 3.1 Overview

PALcode is macrocode that runs with privileges enabled, instruction stream mapping disabled, and interrupts disabled. PALcode has privilege to use five "special" opcodes which allow functions such as physical data stream references and Internal Processor Register (IPR) manipulation. In DECchip 21164-AA, these opcodes are: HW\_LD, HW\_ST, HW\_MFPR, HW\_MTPR, and HW\_REI. PALmode is the CPU state that distinguishes between native macrocode and PALcode.

Hardware calculates PALcode entry points as offsets to the PAL\_BASE IPR. Hardware loads the EXC\_ADDR IPR with a return PC when a PALcode flow is begun. EXC\_ADDR can also be directly read and written using the HW\_MFPR and HW\_MTPR instructions. The HW\_REI instruction returns instruction flow to the PC stored in EXC\_ADDR. The Return Prediction Stack is used to speed execution by predicting the PC to be executed after HW\_REI.

PC<0> is used as the PALmode flag both to the hardware and to PALcode itself. When the CPU enters a PAL flow, the Ibox sets PC<0>, and this bit remains set as we move through the PAL Istream. The Ibox hardware ignores this and behaves as if the PC were still longword aligned for the purposes of Istream fetch and execute. On HW\_REI, the new state of PALmode is copied from EXC\_ADDR<0>.

The DECchip 21164-AA Ebox register file has eight extra registers that are called the PALshadow registers. The PALshadows overlay R8, R9, R10, R11, R12 and R25 when the CPU is in PALmode and ICSR<SDE> is asserted. For additional PAL scratch, the Ibox has a register bank of 24 PALtemps, which are accessable via HW\_MTPR and HW\_MFPR.

The DECchip 21164-AA architecture group will provide PALcode to support both the OpenVMS and OSF operating systems. We will also provide a DECchip 21164-AA PALcode violation checker (PVC).

### 3.2 PALcode Entry Points

There are two different types of PALcode entry points: CALL\_PAL and traps.

### 3.2.1 CALL\_PAL

The CALL\_PAL entry points are used whenever the Ibox encounters a CALL\_PAL instruction in the Istream. The CALL\_PAL itself is issued into pipe E1 and the Ibox stalls for the minimum number of cycles necessary to perform an implicit TRAPB. The PC of the instruction immediately following the CALL\_PAL is loaded into EXC\_ADDR and is pushed onto the Return Prediction Stack.

The Ibox contains special hardware to minimize the number of cycles in the TRAPB at the start of a CALL\_PAL. Software can benefit from this by scheduling CALL\_PALs such that they do not fall in the shadow of:

- IMUL
- Any Floating Point operate, especially FDIV

The Microarchitecture chapter describes the latency of these instructions.

Each CALL\_PAL instruction includes a function field that will be used in the calculation of the next PC. The PAL OPCDEC flow will be started if the CALL\_PAL function field is:

- in the range 40(hex) to 7F(hex) inclusive.
- is greater than BF(hex).
- between 00 and 3F(hex) inclusive, AND PS<CUR\_MOD> is not equal to kernel.

If no OPCDEC is detected on the CALL\_PAL function, then the PC of the instruction to execute after the CALL\_PAL is calculated as follows:

- PC<63:14> = PAL\_BASE IPR<63:14>
- PC<13> = 1
- PC<12> = CALL\_PAL function field<7>
- PC<11:6> = CALL\_PAL function field<5:0>.
- PC < 5:1 > = 0
- PC < 0 > = 1 PALmode

The minimum number of cycles for a CALL\_PAL execution is 5:

- 1 issue the CALL\_PAL instruction.
- 1 minimum TRAPB for empty pipe. More typically this will be 4 cycles.
- 2 The minimum length of a PAL flow. More typically, of course, there will be more than 2 cycles of work for the CALL\_PAL.
- 1 return bubble to do Icache fetch.

#### 3–2 PALcode and IPRs

### 3.2.2 Traps

PALcode is started up on a subset of the DECchip 21164-AA traps. (No PALcode assist is required for replay and mispredict type traps). EXC\_ADDR is loaded with the return PC and the Ibox performs a TRAPB in the shadow of the trap. The Return Prediction Stack is pushed with the PC of the trapping instruction for precise traps, and with some later PC for imprecise traps.

Table 3-1 shows the PALcode trap entry points, and their offset from the PAL\_BASE IPR. The table lists the entry points from highest to lowest priority. (Prioritization among the Dstream traps works because DTB miss is not asserted when there is a sign check error. The priority of ITBmiss and Interrupt is reversed if the is an Icache miss.)

| Entry Name     | Offset(hex) | Description                                                             |
|----------------|-------------|-------------------------------------------------------------------------|
| RESET          | 0000        | Reset                                                                   |
| MCHK           | 0080        | Uncorrected hardware error                                              |
| ARITH          | 0100        | Arithmetic exception                                                    |
| INTERRUPT      | 0580        | Interrupt: hardware, software, and AST                                  |
| ITBMISS        | 0400        | Istream TBmiss                                                          |
| IACCVIO        | 0180        | Istream access violation or sign check error on PC                      |
| FEN            | 0200        | Floating Point Operation attempted with:                                |
|                |             | - FP Instructions(LD, ST and Operates) disabled through FPE bit in ICSR |
|                |             | - FP IEEE operation with datatype other than S, T or Q                  |
| OPCDEC         | 0280        | Illegal Opcode                                                          |
| DTBMISS_SINGLE | 0480        | Dstream TBmiss                                                          |
| DTBMISS_DOUBLE | 0500        | Dstream TBmiss during Virtual PTE fetch                                 |
| UNALIGN        | 0300        | Dstream unaligned reference                                             |
| DFAULT         | 0380        | Dstream fault or sign check error on VA                                 |

#### Table 3–1: PALcode Trap Entry Points

### 3.3 PAL Opcodes

This section describes the DECchip 21164-AA mapping of the 5 PALRES opcodes. In normal mode, the execution of a PALRES opcode causes an OPCDEC exception if PALmode is not asserted. In addition, ICSR<HWE> is provided as a hook to allow the execution of the PAL opcodes by kernel mode software. Any software executing with ICSR<HWE> set must use extreme care to obey all restrictions listed in this chapter.

## 3.3.1 HW\_LD

The HW\_LD instruction is used by PALcode to do special forms of Dstream loads. Figure 3-1 and Table 3-2 describe the format and fields of the HW\_LD instruction. Data alignment traps are inhibited for HW\_LD instructions.

## Figure 3–1: HW\_LD instruction

| 3<br>1   | 22<br>65 |    | 22 |    | 1 1<br>6 5      | 43                                    | 2                      | 1                   | 0                 | 9 |      | 0 |
|----------|----------|----|----|----|-----------------|---------------------------------------|------------------------|---------------------|-------------------|---|------|---|
| I OPCODE |          | RA |    | RB | P <br> H <br> Y | A   W<br>L   R<br>T   T<br>  C<br>  K | Q <br> U <br> A <br> D | V <br>P <br>T <br>E | C I<br>C I<br>K I | I | DISP |   |

| Table | 3-2: | HW | LD | Format | description |
|-------|------|----|----|--------|-------------|
|-------|------|----|----|--------|-------------|

| Field  | Description                                                                                                                                    |
|--------|------------------------------------------------------------------------------------------------------------------------------------------------|
| OPCODE | The OPCODE field contains 1B (hex).                                                                                                            |
| RA     | Destination register number.                                                                                                                   |
| RB     | Base register for memory address.                                                                                                              |
| PHYS . | 0 - The effective address for the HW_LD is virtual.                                                                                            |
|        | 1 - The effective address for the HW_LD is physical. Translation and memory manage-<br>ment access checks are inhibited.                       |
| ALT    | 0 - Memory management checks use Mbox IPR DTB_CM for access checks.                                                                            |
|        | 1 - Memory management checks use Mbox IPR ALT_MODE for access checks.                                                                          |
| WRTCK  | 0 - Memory management checks FOR and read access violations.                                                                                   |
|        | 1 - Memory management checks FOR, FOW, read and write access violations.                                                                       |
| QUAD   | 0 - Length is longword.                                                                                                                        |
|        | 1 - Length is quadword.                                                                                                                        |
| VPTE   | 1 - Flags a virtual PTE fetch. Used by trap logic to distinguish single TBmiss from double TBmiss. Access checks are performed in kernel mode. |
| LOCK   | 1 - Load_lock version of HW_LD. PAL must slot to E0 pipe.                                                                                      |
| DISP   | Holds a 10-bit signed byte displacement.                                                                                                       |

#### 3–4 PALcode and IPRs

## 3.3.2 HW\_ST

The HW\_ST instruction is used by PALcode to do special forms of Dstream stores. Figure 3-2 and Table 3-3 describe the format and fields of the HW\_ST instruction. Data alignment traps are inhibited for HW\_ST instructions.

The Ibox logic will always slot HW\_ST to pipe E0.

### Figure 3–2: HW\_ST instruction

|                  | 2            | 2 2<br>1 0             | 1 1 1 1 1 1 1 0<br>6 5 4 3 2 1 0 9                                       | 0 |
|------------------|--------------|------------------------|--------------------------------------------------------------------------|---|
| <br>  OPCODE<br> | <br>  RA<br> | <br> <br>  RB<br> <br> | P A M Q M C <br> H L B U B O <br> Y T Z A Z N  DISP<br> S     D   D <br> | + |

| Table 3–3: | HW ST | Format | description |
|------------|-------|--------|-------------|
|            |       |        |             |

| Field  | Description                                                                                                              |
|--------|--------------------------------------------------------------------------------------------------------------------------|
| OPCODE | The OPCODE field contains 1F (hex).                                                                                      |
| RA     | Write data register number.                                                                                              |
| RB     | Base register for memory address.                                                                                        |
| PHYS   | 0 - The effective address for the HW_ST is virtual.                                                                      |
|        | 1 - The effective address for the HW_ST is physical. Translation and memory manage-<br>ment access checks are inhibited. |
| ALT    | 0 - Memory management checks use Mbox IPR DTB_CM for access checks.                                                      |
|        | 1 - Memory management checks use Mbox IPR ALT_MODE for access checks.                                                    |
| QUAD   | 0 - Length is longword.                                                                                                  |
|        | 1 - Length is quadword.                                                                                                  |
| COND   | 1 - Store_conditional version of HW_ST. In this case, RA will be written with the value of LOCK_FLAG.                    |
| DISP   | Holds a 10-bit signed byte displacement.                                                                                 |
| MBZ    | Bits 13 and 11 must be zero.                                                                                             |

### 3.3.3 HW\_REI

The HW\_REI instruction is used to return instruction flow to the PC pointed to by the EXC\_ ADDR IPR. The value in EXC\_ADDR<0> will be used as the new value of PALmode after the HW\_REI.

The Ibox uses the Return Prediction Stack to speed the execution of HW\_REI. We have two different types of HW\_REI:

- Prefetch: In this case, the Ibox will begin fetching the new Istream as soon as possible. This is the version of HW\_REI that is normally used.
- Stall Prefetch: This encoding of HW\_REI inhibits Istream fetch until the HW\_REI itself is issued. Thus, this is the method used to synchronize Ibox changes (such as ITB writes) with the HW\_REI. There is a rule that PALcode can only have one such HW\_REI in an aligned block of four instructions.

Figure 3-3 and Table 3-4 describe the format and fields of the HW\_REI instruction.

The Ibox logic will slot HW\_REI to pipe E1.

#### Figure 3–3: HW\_REI instruction

|              | 22<br>65 |    | 2 2<br>1 0 |    | 1 1 1 1<br>6 5 4 3  |     | 0 |
|--------------|----------|----|------------|----|---------------------|-----|---|
| <br>  OPCODE |          | RA | .          | RB | T  <br>  Y  <br>  P | MBZ |   |

| Field  | Description                                                  |
|--------|--------------------------------------------------------------|
| OPCODE | The OPCODE field contains 1E (hex).                          |
| RA/RB  | Register numbers, should be R31 to avoid unnecessary stalls. |
| TYP    | 10 - normal version                                          |
|        | 11 - stall version                                           |
| MBZ    | Bits 13 - 0 Must Be Zero                                     |

#### Table 3–4: HW\_REI Format description

#### 3–6 PALcode and IPRs

### 3.3.4 HW\_MFPR and HW\_MTPR

The HW\_MFPR and HW\_MTPR instructions are used to access internal state from the Ibox, Mbox, and Dcache. The data for Ibox IPRs and PALtemps will be moved to and from the Ebox via the PC buses. These HW\_MFPRs have a latency of one cycle (HW\_MFPR in cycle x results in data available in the Ebox in cycle x+1). For Mbox and Dcache IPRs, the data will be moved to and from the Ebox over the normal load and store datapaths. HW\_MFPR from Mbox and Dcache IPRs have a latency of 2 cycles. Ibox hardware slots each type of MXPR to the correct Ebox pipe, see Table 3-6.

Figure 3-4 and Table 3-5 describe the format and fields of the HW\_MFPR and HW\_MTPR instruction.

### Figure 3–4: HW\_MFPR, HW\_MTPR instruction

| 3               | _      | 2<br>5      | 22<br>10       | 1<br>6 | 1<br>5               |       | 0              |
|-----------------|--------|-------------|----------------|--------|----------------------|-------|----------------|
| <br> <br> <br>+ | OPCODE | RA<br> <br> | <br> <br> <br> | RB     | <br> <br> <br> <br>+ | Index | <br> <br> <br> |

| Table 3–5: | HW MTPR and | <b>HW MFPR Format</b> | description |
|------------|-------------|-----------------------|-------------|
|            |             |                       |             |

| Field  | Description                                                                                            |
|--------|--------------------------------------------------------------------------------------------------------|
| OPCODE | The OPCODE field contains 19 (hex) for HW_MFPR, 1D (hex) for HW_MTPR.                                  |
| RA/RB  | Must be the same. Source register for HW_MTPR. Destination register for HW_MFPR.                       |
| Index  | Specifies the IPR. See Table 3–6 for encodings. See Section 3.9 for more details about a specific IPR. |

Table 3-6: IPR Encodings

| IPR          | Access | Index(hex) | Ibox slots to Pipe |
|--------------|--------|------------|--------------------|
| ISR          | R      | 100        | E1                 |
| ITB_TAG      | W      | 101        | E1                 |
| ITB_PTE      | R/W    | 102        | E1                 |
| ITB_ASN      | R/W    | 103        | E1                 |
| ITB_PTE_TEMP | R      | 104        | E1                 |
| ITB_IA       | W      | 105        | E1                 |

| IPR            | Access | Index(hex) | Ibox slots to Pipe |
|----------------|--------|------------|--------------------|
| ITB_IAP        | W      | 106        | E1                 |
| ITB_IS         | W      | 107        | E1                 |
| SICR           | W      | 108        | E1                 |
| IFAULT_VA_FORM | R      | 109        | E1                 |
| IVPTBR         | R/W    | 10A        | E1                 |
| EXC_ADDR       | R/W    | 10B        | E1                 |
| EXC_SUM        | R/WC   | 10C        | E1                 |
| EXC_MASK .     | R      | 10D        | E1                 |
| PAL_BASE       | R/W    | 10E        | E1                 |
| PS             | R/W    | 10F        | E1                 |
| IPL            | R/Ŵ    | 110        | E1                 |
| INTID          | R      | 111        | E1                 |
| ASTSR          | R/W    | . 112      | E1                 |
| ASTER          | R/W    | 113        | E1                 |
| SIRR           | W      | 114        | E1                 |
| HWINT_CLR      | W      | 115        | E1                 |
| SL_XMIT        | W      | 116        | E1                 |
| SL_RCV         | R      | 117        | E1                 |
| ICSR           | R/W    | 118        | E1                 |
| IC_FLUSH       | W      | 119        | E1                 |
| IC_PERR_STAT   | R/WC   | 11A        | E1                 |
| PALtemp[0:23]  | R/W    | 140-157    | <b>E1</b>          |
| DTB_ASN        | W      | 200        | EO                 |
| DTB_CM         | W      | 201        | EO                 |
| DTB_TAG        | W      | 202        | EO                 |
| DTB_PTE        | R/W    | 203        | EO                 |
| DTB_PTE_TEMP   | R      | 204        | EO                 |
| MM_STAT        | R      | 205        | EO                 |
| VA             | R      | 206        | EO                 |
| VA_FORM        | R      | 207        | EO                 |
| MVPTBR         | W      | 208        | EO                 |
| DTBIAP         | W      | 209        | EO                 |
| DTBIA          | W      | 20A        | EO                 |
| DTBIS          | W      | 20B        | EO                 |
| ALT_MODE       | W      | 20C        | EO                 |
| CC             | W      | 20D        | EO                 |

Table 3–6 (Cont.): IPR Encodings

3–8 PALcode and IPRs

| IPR              | Access | Index(hex) | Ibox slots to Pipe |
|------------------|--------|------------|--------------------|
| CC_CTL           | W      | 20E        | E0                 |
| MCSR             | R/W    | 20F        | EO                 |
| DC_FLUSH         | w      | 210        | E0                 |
| DC_PERR_STAT     | R/W1C  | 212        | E0                 |
| DC_TEST_CTL      | R/W    | 213        | EO                 |
| DC_TEST_TAG      | R/W    | 214        | EO                 |
| DC_TEST_TAG_TEMP | R/W    | 215        | EO                 |
| DC_MODE          | R/W    | 216        | EO                 |
| MAF_MODE         | R/W    | 217        | EO                 |

| Table 3-6 ( | Cont.): | IPR Encodings |
|-------------|---------|---------------|
|-------------|---------|---------------|

### 3.4 PAL storage registers

The DECchip 21164-AA Ebox register file has eight extra registers that are called the PALshadow registers. The PALshadows overlay R8 - R14 and R25 when the CPU is in PALmode and ICSR<SDE> is set. Thus, PALcode can consider R8 - R14 and R25 as local scratch. PALshadow registers cannot be written in the last 2 cycles of a PALcode flow, as the Ibox does not implement complete dirty logic on these registers. The normal state of the CPU is ICSR<SDE> = ON. PALcode disables SDE for the unaligned trap and for error flows.

The Ibox holds a bank of 24 PALtemp registers. The PALtemps are accessed with the HW\_MTPR and HW\_MFPR instructions. The latency from a PALtemp read to availability is one cycle.

### 3.5 SRM defined State - OpenVMS

This table is an accounting of the DECchip 21164-AA storage used to implement the SRM defined state for OpenVMS.

| Register Name        | Mnemonic | Access | Internal Storage                                                  |
|----------------------|----------|--------|-------------------------------------------------------------------|
| Processor Status     | PS       | R/W    | PALtemp /Ibox-PS /Mbox-DTB_CM<br>/ Interrupt logic-IPL/ PALshadow |
| Program Counter      | PC       |        | Ibox                                                              |
| AST Enable           | ASTEN    | R/W    | Interrupt logic-ASTER                                             |
| AST Summary          | ASTSR    | R/W    | Interrupt logic-ASTRR                                             |
| Interproc. Interrupt | IPIR     | W      | _                                                                 |

| Register Name                 | Mnemonic | Access | Internal Storage                                                               |
|-------------------------------|----------|--------|--------------------------------------------------------------------------------|
| Interrupt Priority Level      | IPL      | R/W    | Interrupt Logic–IPL                                                            |
| Machine Check Error Summary   | MCES     | R/W    | PALtemp                                                                        |
| Privileged Context Block Base | PCBB     | R      | PALtemp                                                                        |
| Processor Base Register       | PRBR     | R/W    | PALtemp                                                                        |
| Page Table Base Register      | PTBR     | R      | PALtemp                                                                        |
| System Control Block Base     | SCBB     | R/W    | PALtemp                                                                        |
| SW Interrupt Request Register | SIRR     | w      | _                                                                              |
| SW Interrupt Summary Register | SISR     | R      | Interrupt logic–ISR                                                            |
| TB Check                      | TBCHK    | R      | Not implemented                                                                |
| TB Invalidate All             | TBIA     | W      |                                                                                |
| TB Invalidate All Process     | TBIAP    | w      |                                                                                |
| TB Invalidate All Dstream     | TBIAD    | w      |                                                                                |
| TB Invalidate All Istream     | TBIAI    | W      |                                                                                |
| TB Invalidate Single          | TBIS     | w      |                                                                                |
| Kernel Stack Pointer          | KSP      | None   | PCB                                                                            |
| Executive Stack Pointer       | ESP      | R/W    | PCB                                                                            |
| Supervisor Stack Pointer      | SSP      | R/W    | PCB                                                                            |
| User Stack Pointer            | USP      | R/W    | PALtemp                                                                        |
| Virtual Page Table Base       | VPTB     | R/W    | PALtemp / Ibox–IVPTBR/ Mbox–<br>MVPTBR                                         |
| Who Am I                      | WHAMI    | R      | PALtemp                                                                        |
| Floating Point Enable         | FEN      | W      | Ibox-ICSR                                                                      |
| Address Space Number          | ASN      | w      | Ibox-ITB_ASN/ Mbox-DTB_ASN                                                     |
| Cycle Counter                 | CC       | R/W    | MboxCC,CC_CTL. Read with RPC                                                   |
| Unique                        | UNQ      | R/W    | PCB                                                                            |
| lock_flag                     | -        | R/W    | Cbox/System. Access with LDx_I<br>and STx_C, and HW_LD and HW_<br>ST variants. |

| Table 3–7 (Cont.): | OnenVMS | SRM  | defined | State |
|--------------------|---------|------|---------|-------|
|                    |         | OTIN | ucinicu | Juaie |

## 3.6 SRM defined State - OSF

This table is an accounting of the DECchip 21164-AA storage used to implement the SRM defined state for OSF.

3–10 PALcode and IPRs

| IPR Name                       | Mnemonic | Access | Internal Storage                                                               |
|--------------------------------|----------|--------|--------------------------------------------------------------------------------|
| Processor Status               | PS       | R/W    | PALtemp /Ibox-PS /Mbox-DTB_CM<br>/ Interrupt logic-IPL/ PALshadow              |
| Program Counter                | PC       | -      | Ibox                                                                           |
| Interrupt Entry Address        | entINT   | W      | PALtemp                                                                        |
| Arith Trap Entry Address       | entARITH | w      | PALtemp                                                                        |
| MM Fault Entry Address         | entMM    | W      | PALtemp                                                                        |
| Unaligned Access Entry Address | entUNA   | W      | PALtemp                                                                        |
| Instruction Fault Address      | entIF    | W      | PALtemp                                                                        |
| Call System Entry Address      | entSys   | W      | PALtemp                                                                        |
| User Stack Pointer             | USP      | R/W    | PALtemp                                                                        |
| Kernel Stack Pointer           | KSP      | W      | PALtemp                                                                        |
| Kernel Global Pointer          | Kgp      | W      | PALtemp                                                                        |
| System Value                   | sysval   | R/W    | PALtemp                                                                        |
| Page Table Base Register       | ptptr    | R/W    | PALtemp                                                                        |
| Virtual Page Table Base        | vptbr    | W      | Ibox-IVPTBR/ Mbox-MVPTBR                                                       |
| Process Control Block Base     | PCBB     | R/W    | PALtemp                                                                        |
| Address Space Number           | ASN      | w      | Ibox-ITB_ASN/ Mbox-DTB_ASN                                                     |
| Cycle Counter                  | CC       | R/W    | Mbox-CC,CC_CTL. Read with RPC                                                  |
| Floating Point Enable          | FEN      | w      | Ibox-ICSR                                                                      |
| lock_flag                      | -        | R/W    | Cbox/System. Access with LDx_L<br>and STx_C, and HW_LD and HW_<br>ST variants. |
| Unique                         | UNQ      | R/W    | PCB                                                                            |
| Who Am I                       | WHAMI    | R      | PALtemp                                                                        |

## Table 3–8: OSF SRM defined State

## 3.7 Performance

This list is a summary of DECchip 21164-AA features that improve PAL performance:

- PALshadows save cycles that would have been spent stashing and restoring GPRs.
- Ibox performs minimum TRAPB on CALL\_PAL entry.
- Return Prediction Stack is used to speed HW\_REI.
- Ibox and Mbox hardware calculate the Virtual Address of the PTE entry needed on a TBmiss.
- Ibox and Mbox support distinct trap entry points for single and double TBmiss.
- The design of the interrupt hardware is specifically tailored to speed up OpenVMS CALL\_PALs like MTPR\_IPL.

### DIGITAL RESTRICTED DISTRIBUTION

### PALcode and IPRs 3-11

- The PALtemps have a 1 cycle latency.
- The more frequent PAL trap entry points are grouped together to improve Icache hit on traps.

### 3.8 **TBmiss flows**

#### Figure 3–5: Istream TBmiss flow

```
Assumptions, info, etc.
  This is the entry for Istream TBmiss. A virtual fetch of the PTE will
       be attempted. If the virtual PTE fetch TBmisses, a trap will
       be taken to the double miss routine, which will fill the TB for
       the PTE fetch and HW_REI back to this routine.
   Instruction pairs show E0/E1.
   Best case timing: 16 cycles (6 in, 8 execute, 2 out)
ITBMISS:
   nop
  mfpr
           r8, ev5$_ifault_va_form ; Get virtual address of PTE.
  ຕດຕ
           r10, exc_addr
                                  ; Get PC of faulting instruction.
  mfpr
   ld_vpte r8, 0(r8)
                                 ; Get PTE, traps to DTBMISS_DOUBLE in case of TBmiss ; Restore exc_address if there was a trap.
          r10, exc_addr
   mtpr
   mfpr
           r31, ev5$ va
                                   ; Unlock VA in case there was a double miss
   nop
   and
          r8, #pte$m foe, r25
                                    ; Look for FOE set.
          r8, INVALID_OR_FOE_IPTE_HANDLER ; PTE not valid.
   blbc
   nop
           r25, INVALID_OR_FOE_IPTE_HANDLER
   bne
   nop
           r8, ev5$_itb_pte
                                  ; Ibox remembers the VA, load the PTE into the ITB.
   mtpr
   hw rei stall
                                    ; Done, synch and return.
```

#### 3–12 PALcode and IPRs

#### Figure 3–6: Dstream TBmiss flow

```
Assumptions, info, etc.
  This is the entry for Dstream TBmiss (from native or PALmode).
      A virtual fetch of the PTE will be attempted. If the virtual
      PTE fetch TBmisses, a trap will be taken to the double miss routine,
      which will fill the TB for the PTE fetch and HW_REI back to this routine.
   Instruction pairs show E0/E1.
   Best case timing: 18 cycles (8 trap shadow, 9 execute, 1 out)
DTBMISS SINGLE:
  mfpr
          r8, ev5$_va_form
                                  ; Get virtual address of PTE.
           rl0, exc addr
                                  ; Get PC of faulting instruction.
  mfpr
  mfpr
           r9, ev5$_mm_stat
                                  ; Get read/write bit.
           r10, pt6
                                  ; Stash exc addr away.
  mtpr
   ld_vpte r8, 0(r8)
                                  ; Get PTE, traps to DTBMISS_DOUBLE in case of TBmiss
   nop
                                  ; Pad MFPR VA
   mfpr
           r10, ev5$_va
                                  ; Get original faulting VA for TB load.
   nop
           r8, ev5$ dtb pte
                                  ; Write DTB PTE part.
   mtpr
           r8, INVALID DPTE HANDLER ; Handle invalid PTE
   blbc
           rl0, ev5$_dtb_tag
                                  ; Write DTB TAG part, completes DTB load. No virt ref for 3 cycles.
   mtpr
   mfpr
           r10, pt6
                                  ; Following 2 instructions take 2 cycles
           r10, exc_addr
                                  ; Return linkage in case we trapped.
   mtpr
           r31, pt0
                                  ; Pad the write to dtb tag,
   mfpr
   hw_rei
                                   ; Done, return
```

### 3.9 **IPRs**

This section describes, on a box by box basis, all the DECchip 21164-AA Internal Processor Registers. Ibox, Mbox, and Dcache IPRs are accessable to PALcode via the HW\_MTPR and HW\_MFPR instructions. Table 3–6 lists the IPR numbers. Cbox, Scache, and Bcache IPRs are accessable in the physical address region FFFFF00000 to FFFFFFFFFF. Table 3–29 summarizes the Cbox, Scache, and Bcache IPRs. Table 3–41 lists restrictions on the IPRs.

### 3.9.1 Ibox IPRs

#### NOTE

Unless explicitly stated, IPRS are not cleared or set by hardware on chip or on timeout reset.

### 3.9.1.1 ITB\_TAG

The ITB\_TAG register is a write only register. This register is written by hardware on an ITBMISS/IACCVIO, with the tag field of the faulting VA. To ensure the integrity of the ITB, the TAG and PTE fields of an ITB entry are updated simultaneously by a write to the ITB\_PTE register. This write causes the contents of the ITB\_TAG register to be written into the tag field of the ITB location, which is determined by a NLU algorithm. The PTE field is obtained from the MTPR ITB\_PTE instruction.

#### Figure 3–7: Istream TB Tag, ITB\_TAG

#### 3.9.1.2 ITB\_PTE

The ITB\_PTE register is a read/write register. A write to this register, writes both the PTE and TAG fields of an ITB location determined by a not-last-used algorithm. The TAG and PTE fields are updated simultaneously to insure the integrity of the ITB. A write to the ITB\_PTE register increments the NLU pointer, which allows for writing the entire set of ITB PTE and TAG entries. The TAG field of the ITB location is determined by the contents of the ITB\_TAG register. The PTE field is available in the MTPR ITB\_PTE instruction. Writes to this register use the memory format bits as described in the Open VMS memory management chapter of the Alpha SRM.

#### Figure 3–8: Istream TB PTE Write Format, ITB PTE

|   | 63   | 59 |       |           | 32 | - |     |          |          |         |          |         |         |        | 03        |       |
|---|------|----|-------|-----------|----|---|-----|----------|----------|---------|----------|---------|---------|--------|-----------|-------|
|   | IGN  |    | 1     | PFN[3913] |    |   | IGN | U<br>  R | S<br>  R | E<br> R | K<br>  R | I<br> G | ян      | A<br>S | <br>  IGN | Í     |
| • | <br> |    | <br>+ |           |    | + |     |          |          |         |          |         | ~ ~ ~ ~ |        |           | <br>+ |

A read of the ITB\_PTE requires two instructions. A read of the ITB\_PTE register, returns the PTE pointed to by the NLU pointer to the ITB\_PTE\_TEMP register and updates the NLU pointer according to the not-last-used algorithm. A zero value is returned to the integer register file. A second read of the ITB\_PTE\_TEMP register returns the PTE the the general purpose integer register file.

### Figure 3–9: Istream TB PTE Read Format, ITB\_PTE

|     | 59 |       |           | 32 |   |      |   |     |     | <br> |   |   |   |     |     |   |   |     | - |
|-----|----|-------|-----------|----|---|------|---|-----|-----|------|---|---|---|-----|-----|---|---|-----|---|
|     |    | +<br> |           |    |   |      |   |     |     |      |   |   |   |     |     |   |   |     |   |
| RAZ |    | 1     | PFN[3913] |    | < | 2:0> | • | l i | RAZ | R    | Т | R | R | R   | RAZ | S | Ţ | RAZ | 1 |
|     |    | 1     |           |    |   |      |   | 1   |     | . –  |   | - | • | . – | 1   |   |   |     |   |

#### 3.9.1.3 Address Space Number, ITB\_ASN

The ITB\_ASN register is a read/write register which contains the Address space number (ASN) of the current process.

Figure 3–10: Address Space Number Read/Write Format, ITB\_ASN

| 63 |         | <br>10 |          |     | 00   |
|----|---------|--------|----------|-----|------|
| 1  | RAZ/IGN | i –    | ASN<6:0> | RAZ | /IGN |
| +  |         | <br>+  |          | +   | +    |

#### 3.9.1.4 ITB\_PTE\_TEMP

The ITB\_PTE\_TEMP register is a read-only holding register for ITB\_PTE read data. A read of the ITB\_PTE register returns data to this register. A second read of the ITB\_PTE\_TEMP register returns data to the integer general purpose register file.

## Figure 3–11: Istream TB PTE Temp Read Format, ITB\_PTE\_TEMP

| 63  | • • | 58 |           |   |   | 30   | <br> | 22 |   |   |   |   |     |     | 12  | - |
|-----|-----|----|-----------|---|---|------|------|----|---|---|---|---|-----|-----|-----|---|
| 1   |     | 1  |           |   |   |      |      |    |   |   |   |   | -   |     |     |   |
| RAZ |     | 1  | PFN[3913] | 1 | < | 2:0> | RAZ  |    | R | R | R | R | RAZ | I S | RAZ | 1 |
| 1   |     | I  |           | I |   |      |      |    |   |   | • |   |     | • • | 1   |   |

| Table 3–9: Description of GHD bits in ITB PTE TEMP read for |
|-------------------------------------------------------------|
|-------------------------------------------------------------|

.

| Name | Extent | Туре | Description                               |
|------|--------|------|-------------------------------------------|
| GHD  | 31     | RO   | Is set if GH(granularity hint) equals 11. |
| GHD  | 30     | RO   | Is set if GH(granularity hint) equals 10. |
| GHD  | 29     | RO   | Is set if GH(granularity hint) equals 01. |

### DIGITAL RESTRICTED DISTRIBUTION

PALcode and IPRs 3-15

#### 3.9.1.5 Istream TB Invalidate All Process, ITB\_IAP

This is a write-only register. Any write to this register invalidates all ITB entries, whose ASM bit equals zero.

#### 3.9.1.6 IStream TB Invalidate All, ITB\_IA

This is a write-only register. Any write to this register invalidates all ITB entries, and resets the ITB NLU pointer to its initial state. RESET Palcode must execute an MTPR ITB\_IA instruction in order to initialize the NLU pointer.

#### 3.9.1.7 ITB\_IS

This is a write-only register. Writing a virtual address to this IPR invalidates the ITB entry that meets any one of the following criteria:

- An ITB entry whose VA field matches ITB\_IS<42:13> and whose ASN field matches ITB\_ ASN<10:4>.
- An ITB entry whose VA field matches ITB\_IS<42:13> and whose ASM bit is set.

#### Figure 3–12: ITB\_IS



#### 3.9.1.8 Formatted Faulting VA register, IFAULT\_VA\_FORM

This is a read-only register which contains the formatted faulting virtual address on an ITBMiss/IACCVIO. The formatted faulting address generated depends on whether NT super page mapping is enabled through the SPE <0> bit of the ICSR.

#### Figure 3–13: IFAULT\_VA\_FORM in non NT mode

### Figure 3–14: IFAULT\_VA\_FORM in NT mode

| 63               |             | 30 |  | 22 | <br>      | 03 |     |  |
|------------------|-------------|----|--|----|-----------|----|-----|--|
| + <b>-</b> -<br> | VPTB[63:30] |    |  |    | VA[31:13] |    | RAZ |  |

### 3.9.1.9 Virtual Page table Base register, IVPTBR

This is a read-write register.

### Figure 3–15: IVPTBR

| 63    |             | 30 |     | 00     |
|-------|-------------|----|-----|--------|
| +<br> | VPTB[63:30] |    | IGN | ++<br> |

### 3.9.1.10 Icache Parity Error Status register, ICPERR\_STAT

This is read/write register that contains information about an Icache Parity error. The error status bits may be cleared by writing a 1 to the appropriate bits.

### Figure 3–16: ICPERR\_STAT Read format

| 63             |         | 12              | 11    |         | 00 |
|----------------|---------|-----------------|-------|---------|----|
| <br> <br> <br> | RAZ/IGN | T<br>  P<br>  E | I P I | RAZ/IGN |    |

| Table 3–10: | STAT | Field | Descriptions |
|-------------|------|-------|--------------|
|             |      |       |              |

| Name | Extent | Туре | Description        |
|------|--------|------|--------------------|
| DPE  | 11     | W1C  | Data parity error. |
| TPE  | 12     | W1C  | Tag parity error.  |

### 3.9.1.11 ICache Flush Control register, IC\_FLUSH\_CTL

This is a write-only register. Writing any value to this register flushes the entire Icache.

#### 3.9.1.12 Exception Address register, EXC\_ADDR

The EXC\_ADDR register is a read-write register used to restart the machine after exceptions or interrupts. The HW\_REI instruction causes a return to the instruction pointed to by the EXC\_ADDR register. This register can be written both by hardware and software. Hardware writes happen as a result of exceptions/interrupts and CALLPAL instructions. Hardware writes which occur as a result of exceptions/interrupts take precedence over all other writes.

In case of an exception/interrupt, hardware writes a PC to this register in S6 of the execution pipeline. In case of precise exceptions, this is the PC of the instruction that caused the exception. In case of imprecise exceptions/interrupts, this is the PC of the next instruction that would have issued if the exception/interrupt was not reported.

In case of a CALLPAL instruction, the PC of the instruction after the CALLPAL is written to EXC\_ADDR in S5. Software writes of the register through the HW\_MTPR instruction also take place in S5. At a given time only a CALLPAL or HW\_MTPR instruction will attempt to write EXC\_ADDR as both these instructions are slotted to the E1 pipe.

BIT <0> of this register is used to indicate PAL mode. On a HW\_REI the mode of the machine is determined by BIT <0> of the EXC\_ADDR register.

#### Figure 3–17: EXC\_ADDR Read/Write format



#### 3.9.1.13 Exception Summary register, EXC\_SUM

The exception summary register records the different arithmetic traps that have occurred since the last time EXC\_SUM was written. Any write to this register clears bits <16:10>.

#### Figure 3–18: Exception Summary register Read Format, EXC\_SUM

| 63        |         | 16 15 14 13 12 11 10 09 00                                                                                |
|-----------|---------|-----------------------------------------------------------------------------------------------------------|
| <br>  · · | RAZ/IGN | I   I   U   F   D   I   S      <br>  O   N   N   O   Z   N   W   RAZ/IGN  <br>  V   E   F   V   E   V   C |

#### 3–18 PALcode and IPRs

| Name | Extent | Туре | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|------|--------|------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| SWC  | 10     | WA   | Indicates Software completion possible. This bit is set after<br>a floating point instruction containing the /S modifier com-<br>pletes with an arithmetic trap and all previous floating point<br>instructions that trapped since the last MTPR EXC_SUM also<br>contained the /S modifier. The SWC bit is cleared whenever<br>a floating point instruction without the /S modifier completed<br>with an arithmetic trap. The bit remains cleared regardless<br>of additional arithmetic traps until the register is written via<br>an MTPR instruction. The bit is always cleared upon any<br>MTPR write to the EXC_SUM register. |
| INV  | 11     | WA   | Indicates invalid operation.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| DZE  | 12     | WA   | Indicates divide by zero.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| FOV  | 13     | WA   | Indicates floating point overflow.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| UNF  | 14     | WA   | Indicates floating point underflow.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| INE  | 15     | WA   | Indicates floating inexact error.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| IOV  | 16     | WA   | Indicates Fbox convert to integer overflow or Integer Arithmetic<br>Overflow.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |

Table 3–11: EXC\_SUM Field Descriptions

#### 3.9.1.14 Exception Mask Register, EXC\_MASK

The exception mask register records the destinations of instructions that have caused an arithmetic trap, since the last time EXC\_MASK was cleared. The destination is recorded as a single bit mask in the 64 bit IPR representing F0-F31 and I0-I31. A write to EXC\_SUM clears the EXC\_MASK register.

#### Figure 3–19: Exception Mask register Read Format, EXC\_MASK



### 3.9.1.15 PAL Base Register, PAL\_BASE

The PAL\_BASE register is a read/write register which contains the base address for PALcode. The register is cleared by hardware on reset.

Figure 3–20: PAL\_BASE

| 63 |         | 40 39 |                 | 14 | 13      | 00 |
|----|---------|-------|-----------------|----|---------|----|
|    | RAZ/IGN |       | PAL_BASE[39:14] |    | RAZ/IGN |    |

#### 3.9.1.16 Processor Status, PS

The processor\_status register is a read/write register containing the current mode bits of the architecturally defined PS.

#### Figure 3–21: Processor Status, PS

| 63      |     | 03     | 02           | 00  |
|---------|-----|--------|--------------|-----|
| RAZ/IGN | i c | C<br>M | <br>  RAZ/IC | , i |

#### 3.9.1.17 Ibox Control/Status Register, ICSR

This is a read-write register which contains Ibox related control and status information.

#### Figure 3–22: Ibox Control/Status Register ICSR

 63
 39
 38
 37
 36
 35
 34
 33
 32
 31
 30
 29
 28
 27
 26
 25
 23
 22
 19
 18
 17
 16
 00

 I
 IT
 ISICRIF
 IF
 IF
 IPCIPCIS
 S
 IS
 IS
 PC
 IP
 IP
 I

 I
 IS
 IT
 ID
 IB
 IM
 IO
 I
 IL
 ID
 IC10]
 W
 IP
 MUX0
 IC
 IC
 IRAZ/IGNI

 I
 IS
 IT
 ID
 IB
 IM
 IO
 I
 IL
 ID
 IC10]
 W
 IP
 MUX1
 MUX0
 IC
 IC
 IRAZ/IGNI

 I
 T
 IA
 IE
 ID
 IT
 IS
 IE
 IE
 IE
 IE
 IC20]
 IS10]
 IO
 I

| Name | Extent | Туре         | Description                                                                                                       |
|------|--------|--------------|-------------------------------------------------------------------------------------------------------------------|
|      | 17     | RW,0         | TBD                                                                                                               |
|      | 18     | RW,0         | TBD                                                                                                               |
|      | 22:19  | <b>RW</b> ,0 | TBD                                                                                                               |
|      | 25:23  | RW,0         | TBD                                                                                                               |
| FPE  | 26     | RW,0         | If set floating point instructions may be issued. When clear<br>floating point instructions cause FEN exceptions. |
| HWE  | 27     | RW,0         | If set, allows PALRES instructions to be issued in kernel mode.                                                   |

### Table 3–12: ICSR Field Descriptions

| Table 3-12 (Cont.): | ICON FIEID | Descri       |                                                                                                                                                                                                                                                    |
|---------------------|------------|--------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Name                | Extent     | Туре         | Description                                                                                                                                                                                                                                        |
| SPE                 | 29:28      | RW,0         | If SPE<1> is set, it enables super page mapping of istream virtual addresses VA<39:13> directly to physical address PA<39:13: if VA<42:41> = 10. Virtual address bit VA<40> is ignored in this translation. Access is allowed only in kernel mode. |
|                     |            |              | SPE<0> when set, enables super page mapping of istream<br>virtual addresses VA<42:30>=1FFE (Hex) directly to physical<br>address PA<39:30>= 0(Hex). VA<30:13> is mapped directly<br>to PA<30:13>. Access is allowed only in kernel mode.           |
| SDE                 | 30         | RW,0         | If set, enables PAL shadow registers.                                                                                                                                                                                                              |
| SLE                 | 31         | RW,0         | If set, enables serial line interrupts.                                                                                                                                                                                                            |
|                     | 32         | RW,0         | TBD                                                                                                                                                                                                                                                |
|                     | 33         | RW,0         | TBD                                                                                                                                                                                                                                                |
| FMS                 | 34         | RW,0         | If set, forces miss on Icache references.                                                                                                                                                                                                          |
| FBT                 | 35         | RW,0         | If set, forces bad Icache tag parity.                                                                                                                                                                                                              |
| FBD                 | • 36       | RW,0         | If set, forces bad Icache data parity.                                                                                                                                                                                                             |
| CRDE                | 37         | <b>RW</b> ,0 | If set, enable correctable error interrupts.                                                                                                                                                                                                       |
| ISTA                | 38         | RO           | Reading this bit indicates ICACHE BIST status. If set, ICACHE BIST was successful.                                                                                                                                                                 |
| TST                 | 39         | RW,0         | Writing a 1 to this bit causes the TEST_STATUS_H pin of the chip to be asserted.                                                                                                                                                                   |

### Table 3–12 (Cont.): ICSR Field Descriptions

Table 3–13: Performance Counter 0 Programming information

| PCMUX0<3:0> | Input | Comment |
|-------------|-------|---------|
|             |       | TBD     |
|             |       |         |

### Table 3–14: Performance Counter 1 Programming information

| PCMUX1<2:0> | Input | Comment |
|-------------|-------|---------|
|             |       | TBD     |
|             |       |         |

### 3.9.1.18 Interrupt Priority Level Register, IPL

This is a read/write register containing the value of the architecturally specified IPL register. Whenever hardware detects an interrupt whose target IPL level is greater than the value in IPL<4:0>, an interrupt is taken.

Figure 3–23: Interrupt Priority Level Register, IPL

| RAZ/IGN   IPL<4:0> |
|--------------------|
|                    |

#### 3.9.1.19 Interrupt Id Register, INTID

This is a read only register. It is written by hardware with the target IPL of the highest priority pending interrupt. The hardware recognizes an interrupt if this IPL is greater than the IPL given by IPL<4:0>. Interrupt service routines may use the value of this register to determine the cause of the interrupt. PAL code, for the interrupt service, must ensure that the IPL level in INTID is greater than the IPL level specified by the IPL register. This restriction is required because a level sensitive hardware interrupt may disappear before the interrupt service routine is entered (passive release).

The contents of INTID are not correct on a HALT interrupt, as this particular interrupt does not have a target IPL at which it can be masked. When a HALT interrupt occurs INTID indicates the next highest priority pending interrupt. PAL code for interrupt service must check the ISR to determine if a HALT interrupt has occured.

### Figure 3-24: Interrupt Id Register, INTID



#### 3.9.1.20 Aynchronous System Trap Request Register, ASTRR

The Asynchronous System Trap Request Register is a read/write register which contains bits to request AST interrupts in each of the four processor modes(USEK). In order to generate an AST interrupt, the corresponding enable bit in the ASTER must be set and the current processor mode given in PS<4:3> should be equal or higher than the mode associated with the AST request.

#### 3–22 PALcode and IPRs

Figure 3–25: Asynchronous System Trap Request Register, ASTRR

| <br>    | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
|---------|-----------------------------------------------------------|
|         | U IS E K                                                  |
| RAZ/IGN | A A A A                                                   |
|         |                                                           |

# 3.9.1.21 Aynchronous System Trap Enable Register, ASTER

The Asynchronous System Trap Enable Register is a read/write register which contains bits to enable corresponding AST interrupt requests.

Figure 3–26: Asynchronous System Trap Enable Register, ASTER format

| 63 |         | 03 02 01 00                                  |
|----|---------|----------------------------------------------|
|    | RAZ/IGN | U  S  E  K  <br> A  A  A  A  <br> E  E  E  E |
| +  |         |                                              |

## 3.9.1.22 Software Interrupt Request Register. SIRR

The Software Interrupt Request Register is a write only register used to control software interrupt requests. A software request for a particular IPL may be requested by setting the appropriate bit in SIRR<15:1>. The internal hardware representation consists of a 15 bit register, known as the Software Interrupt Summary register(SISR). When any bit in SIRR<15:1> is set, the corresponding bit in SISR gets set. SISR is cleared on RESET.

#### Figure 3–27: Software Interrupt Request Register, SIRR write format



#### 3.9.1.23 Software Interrupt Clear Register. SICR

The Software Interrupt Clear Register is a write only register used to control the clearing of software interrupt requests. A software request for a particular IPL may be disabled by setting the appropriate bit in SICR<15:1>. This causes the corresponding interrupt request bit in SISR to be cleared.

Figure 3–28: Software Interrupt Clear Register, SICR write format

| 63             |     | 18               | 04 03       | 00  |
|----------------|-----|------------------|-------------|-----|
| <br> <br> <br> | IGN | <br>  SICR<1<br> | .5:1>  <br> | IGN |

## 3.9.1.24 HW Interrupt Clear register, HWINT\_CLR

This is a write-only register, used to clear edge-sensitive hardware interrupt requests.

## Figure 3-29: Hardware Interrupt Clear Register, HWINT\_CLR

| I     ICRIS  PC PC      I       I     ID     IL     I       I     ICRIS  PC PC      I |            |         |         |          | 32 | 00              |
|---------------------------------------------------------------------------------------|------------|---------|---------|----------|----|-----------------|
|                                                                                       | +<br> <br> | CR<br>D | S<br> L | PC<br> 1 |    | +<br> <br> <br> |

| Name | Extent | Туре | Description                                      |
|------|--------|------|--------------------------------------------------|
| PCOC | 32     | WO   | Clears perf counter 0 interrupt requests.        |
| PC1C | 33     | wo   | Clears perf counter 1 interrupt requests.        |
| SLC  | 34     | wo   | Clears serial line interrupt requests.           |
| CRDC | 35     | wo   | Clears correctable read data interrupt requests. |

## Table 3–15: HWINT\_CLR Field Descriptions

## 3.9.1.25 Interrupt Summary register, ISR

The Interrupt Summary register is a read only register which contains information about all pending hardware/software/AST interrupt requests. The SISR section of the ISR, is cleared on reset.

Figure 3–30: Interrupt Summary Register, ISR read format

| 63            |           | 29 28                          |                |               |          |        |        |          |          | <br>04 | 03        | 00                                      |
|---------------|-----------|--------------------------------|----------------|---------------|----------|--------|--------|----------|----------|--------|-----------|-----------------------------------------|
| <br>  RAZ<br> | C  <br> R | S  I  <br>L  2  <br>I  0  <br> | I   I<br>2   2 | I  P<br> 2  C | IP<br>IC | P<br>F | M<br>C | H<br>  L | A<br>  T |        | U<br> AS' | SEK  <br>IRR<3:0> <br>AND  <br>IER<3:0> |

3–24 PALcode and IPRs

| Name       | Extent | Туре | Description                                                                                                                                    |
|------------|--------|------|------------------------------------------------------------------------------------------------------------------------------------------------|
| ASTRR[3:0] | 3:0    | RO   | AST requests 3 through 0 (USEK) at IPL 2.                                                                                                      |
| SIRR[15:1] | 18:4   | RO,0 | Software interrupt requests 15 through 1 corresponding to IPL 15 through 1.                                                                    |
| ATR        | 19     | RO   | Is set if any AST request and corresponding enable bit is set<br>and if the processor mode is equal to or higher than the AST<br>request mode. |
| HLT        | 20     | RO   | External Hardware interrupt - halt .                                                                                                           |
| MCK        | 21     | RO   | External Hardware interrupt - system machine check (IPL 31).                                                                                   |
| PFL        | 22     | RO   | External Hardware interrupt - Powerfail (IPL 30).                                                                                              |
| PC1        | 23     | RO   | External hardware interrupt - Performance counter 1 (IPL 29).                                                                                  |
| PC0        | 24     | RO   | External hardware interrupt - Performance counter 0 (IPL 29).                                                                                  |
| I23        | 25     | RO   | External hardware interrupt at IPL 23.                                                                                                         |
| I22        | 26     | RO   | External hardware interrupt at IPL 22.                                                                                                         |
| I21        | 27     | RO   | External hardware interrupt at IPL 21.                                                                                                         |
| I20        | 28     | RO   | External hardware interrupt at IPL 20.                                                                                                         |
| SLI        | 29     | RO   | Serial line interrupt.                                                                                                                         |
| CRD        | 30     | RO   | Correctable ECC errors (IPL 31).                                                                                                               |

 Table 3–16:
 ISR read format Field Descriptions

## 3.9.1.26 Serial line transmit, SL\_XMIT

The serial line transmit register is a write-only register used to transmit bit-serial data off chip under the control of a software timing loop. In order to transmit data, the SLE bit in the ICSR must be set enabling the serial line. If the serial-line is enabled, the value of the TMT bit is transmitted on the SROM\_CLK\_H pin. In normal operation mode (not in test-mode), the SROM\_CLK\_H pin is overloaded and serves both the serial line transmission and the Icache serial ROM interface. This bit is cleared on RESET.





#### 3.9.1.27 Serial line receive, SL\_RCV

The serial line receive register is a read-only register used to receive bit-serial data under the control of a software timing loop. The RCV bit in the SL\_RCV register is functionally connected to the SROM\_DAT\_H pin. A serial line interrupt is requested whenever a transition is detected on the SROM\_DAT\_H pin and the SLE bit in the ICSR is set. During normal operations (not in test-mode), the SROM\_DAT\_H pin is overloaded and serves both the serial line reception and the ICache serial ROM interface.

#### Figure 3–32: Serial line receive Register, SL\_RCV

| ++-+   | ,<br> |
|--------|-------|
| i iv i | 1     |

# 3.9.2 Mbox and Dcache IPRs

# NOTE

Traps are factored into MBOX IPR write operations unless noted otherwise. Unless explicitly stated, IPRs are not cleared or set by hardware on chip or on timeout reset.

#### 3.9.2.1 DTB\_ASN, Dstream TB Address Space Number

The DTB\_ASN register is a write-only register which, when not in PALmode, must be written with an exact duplicate of the ITB\_ASN register's ASN field.

Figure 3–33: DTB\_ASN

6 5 5 0 3 7 6 0 +-----+ | ASN <6:0> | IGN | +----+

#### 3.9.2.2 DTB\_CM, Dstream TB Current Mode

The DTB\_CM register is a write-only register which, when not in PALmode, must be written with an exact duplicate of the Ibox Processor Status (IPS) register's CM field. These bits indicate the Current Mode of the machine.

3–26 PALcode and IPRs

#### Figure 3–34: DTB\_CM

| 6 |     | 00000     |
|---|-----|-----------|
| 3 |     | 5 4 3 2 0 |
|   |     | 10101     |
|   | IGN | M M  IGN  |
|   |     | 11101     |

#### Table 3–17: DTB\_CM Mode Bits

| CM<1> | CM<0> | Current Mode    |
|-------|-------|-----------------|
| 0     | 0     | Kernel Mode     |
| 0     | 1     | Executive Mode  |
| 1     | 0     | Supervisor Mode |
| 1     | 1     | User Mode       |

#### 3.9.2.3 DTB\_TAG, Dstream TB TAG

The DTB\_TAG register is a write-only register which writes the DTB tag and the contents of the DTB\_PTE register to the DTB. To insure the integrity of the DTBs, the DTB's PTE array is updated simultaneously from the internal DTB\_PTE register when the DTB\_TAG register is written. The entry to be written is chosen at the time of the DTB\_TAG write operation by a not-last-used algorithm implemented in hardware. A write to the DTB\_TAG register increments the TB entry pointer of the DTB which allows writing the entire set of DTB PTE and TAG entries. The TB entry pointer is initialized to entry zero on both chip reset and timeout reset.

#### Figure 3–35: DTB\_TAG, Dstream TB Tag



#### 3.9.2.4 Dstream TB PTE, DTB\_PTE

The DTB\_PTE register is a read/write register representing the 64-entry DTB page table entries. The entry to be written is chosen by a not-last-used algorithm implemented in hardware. Writes to the DTB\_PTE use the memory format bit positions as described in the Alpha SRM with the exception that some fields are ignored. In particular the PFN valid bit is not stored in the DTB.

To ensure the integrity of the DTB, the PTE is actually written to a temporary register and not transferred to the DTB until the DTB\_TAG register is written. As a result, writing the DTB\_PTE and then reading without an intervening DTB\_TAG write will not return the data previously written to the DTB\_PTE register.

Reads of the DTB\_PTE require two instructions. First, a read from the DTB\_PTE sends the PTE data to the DTB\_PTE\_TEMP register. A zero value is returned to the integer register file on a DTB\_PTE read. A second instruction reading from the DTB\_PTE\_TEMP register returns the PTE entry to the register file. Reading the DTB\_PTE register increments the TB entry pointer of the DTB which allows reading the entire set of DTB PTE entries.



Figure 3–36: DTB\_PTE, Dstream TB PTE

## 3.9.2.5 DTB\_PTE\_TEMP

The DTB\_PTE\_TEMP register is a read-only holding register for DTB\_PTE read data. Reads of the DTB\_PTE require two instructions to return the PTE data to the register file. The first reads the DTB\_PTE register to the DTB\_PTE\_TEMP register and returns zero to the register file. The second returns the DTB\_PTE\_TEMP register to the integer register file.

#### Figure 3–37: DTB\_PTE\_TEMP

| 63 |     | 39 |            | 12   |   |     |     |     |   |    |    |   |    |   |                 |           |
|----|-----|----|------------|------|---|-----|-----|-----|---|----|----|---|----|---|-----------------|-----------|
|    | RAZ | •  | PFN<39:13> | RA   | Z | I   | ŧ   | 1   | ł | l  | ł  | 1 | Í. | 1 | + <b></b> +<br> |           |
|    |     |    |            | <br> |   |     |     |     |   |    |    |   |    | 1 | 1               |           |
|    |     |    | Ì          |      |   | - i | - i | - i | Ì | Ì  | i  | Ì | Ì  | Ì | Í               |           |
|    |     |    |            |      |   | 1   | l   | I   | 1 | 1  | 1  | 1 | 1  | 1 | 1               |           |
|    |     |    | i<br>L     |      |   | 1   |     | 1   |   | i  |    |   | 1  |   | <br>+>          | FOR       |
|    |     |    | 1          |      |   | 1   |     |     | i | Ì  | 1  | i | i  | + | >               |           |
|    |     |    | l.         |      |   | ł   | Ì   | - i | 1 | Ì  | Ì  | 1 | +  |   | >               | KRE       |
|    |     |    |            |      |   | 1   | 1   | ١   | 1 | ł  |    | + |    |   | >               |           |
|    |     |    |            |      |   |     |     | 1   | ł | 1  | +. |   |    |   | >               |           |
|    |     |    |            |      |   |     | 1   | 1   | 1 | +• |    |   |    |   | >               | URE       |
|    |     |    | · 1        |      |   | 1   | 1   | 1   | + |    |    |   |    |   | >               | KWE       |
|    |     |    | I.         |      |   | 1   | 1   | +   |   |    |    |   |    |   | >               | EWE       |
|    |     |    | 1          |      |   | i   | +   |     |   |    |    |   |    |   | >               | SWE       |
|    |     |    | 1          |      |   | +   |     |     |   |    |    |   |    |   | >               | UWE       |
|    |     |    | +          | <br> |   |     |     |     |   |    |    |   |    |   | >               | PFN<3913> |

#### 3.9.2.6 MM\_STAT, Dstream MM Fault Status Register

When D-stream faults or Dcache parity errors occur the information about the fault is latched and saved in the MM\_STAT register. The VA, VA\_FORM and MM\_STAT registers are locked against further updates until software reads the VA register. MM\_STAT bits are only modified by hardware when the register is not locked and a memory management error, DTB miss, or Dcache parity error occurs. The MM\_STAT is not unlocked or cleared on reset.

| 63 |     | 17 | 16 |             | 10 |                               |                           |  |                | 01 00                                                          |
|----|-----|----|----|-------------|----|-------------------------------|---------------------------|--|----------------|----------------------------------------------------------------|
|    | RAZ |    |    |             | I  |                               |                           |  | 1 1            |                                                                |
|    |     |    | +  |             | +  | <br> <br> <br> <br> <br> <br> | +<br> <br> <br> <br> <br> |  | <br>   <br>  + | +-+<br>   >WR<br>+>ACV-<br>>FOR<br>>FOW<br>>DTB_MIS<br>>BAD_VA |
|    |     |    |    | ;<br>;<br>+ |    | +                             |                           |  |                | >RA<br>>OPCODE                                                 |

## Figure 3–38: MM\_STAT, Dstream MM Fault Register

| Table 3–18: | MM_STAT | Field | Descrip | tions |       |
|-------------|---------|-------|---------|-------|-------|
| 37          |         |       |         | ~     | • • • |

| Name     | Extent | Туре | Description                                                   |
|----------|--------|------|---------------------------------------------------------------|
| WR       | 0      | RO   | Set if reference which caused error was a write.              |
| ACV      | 1      | RO   | Set if reference caused an access violation. Includes bad VA. |
| FOR      | 2      | RO   | Set if reference was a read and the PTE's FOR bit was set.    |
| FOW      | 3      | RO   | Set if reference was a write and the PTE's FOW bit was set.   |
| DTB_MISS | 4      | RO   | Set if reference resulted in a DTB miss.                      |
|          |        |      |                                                               |

| Table 3-16 (Coll.): |        |      |                                             |  |  |  |  |
|---------------------|--------|------|---------------------------------------------|--|--|--|--|
| Name                | Extent | Туре | Description                                 |  |  |  |  |
| BAD_VA              | 5      | RO   | Set if reference had a bad virtual address. |  |  |  |  |
| RA                  | 10:6   | RO   | Ra field of the faulting instruction.       |  |  |  |  |
| OPCODE              | 16:11  | RO   | Opcode field of the faulting instruction.   |  |  |  |  |

|  | Table 3-18 ( | (Cont.): | MM_STAT Field Desc | riptions |
|--|--------------|----------|--------------------|----------|
|--|--------------|----------|--------------------|----------|

## 3.9.2.7 VA, Faulting Virtual Address

When D-stream faults, DTB misses, or Dcache parity errors occur the effective virtual address associated with the fault, miss, or error is latched in the read-only VA register. The VA, VA\_ FORM, and MM\_STAT registers are locked against further updates until software reads the VA register. The VA IPR is not unlocked on reset.





#### 3.9.2.8 VA FORM, Formatted Virtual Address

VA\_FORM contains the virtual page table entry address calculated as a function of the faulting VA and the Virtual Page Table Base (VA and MVPTBR registers). This is done as a performance enhancement to the Dstream TBmiss PALflow. The VA is formatted as a 32-bit PTE when the NT\_Mode bit, MCSR<SP0>, is set. VA\_FORM is a read-only IPR, and is locked on any D-stream fault, DTB miss, or Dcache parity error. The VA, VA\_FORM, and MM\_STAT registers are locked against further updates until software reads the VA register. The VA\_FORM IPR is not unlocked on reset. Figure 3-40 describes VA\_FORM when MCSR<SP0> is clear. Figure 3-41 describes VA\_FORM when MCSR<SP0> is set.

| Figure 3-40: | VA FORM | . Formatted VA | Register for NT | Mode=0 |
|--------------|---------|----------------|-----------------|--------|
|--------------|---------|----------------|-----------------|--------|

| 1 | VPTB<63:33> |   | i | VA<42:13> |   | I | 0 | · 1 |
|---|-------------|---|---|-----------|---|---|---|-----|
| + |             |   | + |           |   | + |   | +   |
| 3 |             | 3 | 2 |           | 3 | 2 |   | 0   |
| 6 |             | 3 | 3 |           | 0 | 0 |   | 0   |

#### 3–30 PALcode and IPRs

| Figure 3–41: VA_FORM, Formatted VA Register, NT_Mode= | Figure 3-41: | 1: VA FOR | M. Formatted | VA Register | NT Mode=1 |
|-------------------------------------------------------|--------------|-----------|--------------|-------------|-----------|
|-------------------------------------------------------|--------------|-----------|--------------|-------------|-----------|

| 6           | 3 | 2      | 2 | 2 | . 0       | 0 | 0   |
|-------------|---|--------|---|---|-----------|---|-----|
| 3           | 0 | 9<br>+ | 2 | 1 | . 3       | 2 | 0   |
| VPTB<63:30> |   | i      | 0 | i | VA<31:13> |   | 0 1 |

| Table 3–19: | VA FORM | <b>Field Descriptions</b> |
|-------------|---------|---------------------------|
|-------------|---------|---------------------------|

| Name      | Extent | Туре | Description                                                        |
|-----------|--------|------|--------------------------------------------------------------------|
| VA<42:13> | 32:03  | RO   | Subset of the original faulting Virtual Address,NT_Mode=0.         |
| VPTB      | 63:33  | RO   | Virtual Page Table Base address as stored in MVPTBR,NT_<br>Mode=0. |
| VA<31:13> | 21:03  | RO   | Subset of the original faulting Virtual Address,NT_Mode=1.         |
| VPTB      | 63:30  | RO   | Virtual Page Table Base address as stored in MVPTBR,NT_<br>Mode=1. |

#### 3.9.2.9 MVPTBR, Mbox Virtual Page Table Base Register

MVPTBR contains the virtual address of the base of the page table structure. It is stored in the Mbox to be used in calculating the VA\_FORM IPR for the Dstream TBmiss PAL flow. Unlike the VA register, the MVPTBR is not locked against further updates when a Dstream fault, DTB Miss or Dcache parity error occurs. The MVPTBR is a write-only IPR that looks like this:

#### Figure 3–42: MVPTBR



#### 3.9.2.10 DC\_PERR\_STAT, Dcache Parity Error Status

When a Dcache parity error occurs, the error status is latched and saved in the DC\_PERR\_STAT register. The VA, VA\_FORM and MM\_STAT registers are locked against further updates until software reads the VA register. If a Dcache parity error is detected while the Dcache parity error status register is unlocked, the error status is loaded into DC\_PERR\_STAT<6:2>. The LOCK bit is set and the register is locked against further updates (except for the SEO bit) until software writes a "one" to clear the LOCK bit. The SEO bit is set when a Dcache parity error occurs while the Dcache parity error status register is locked. Once the SEO bit is set it is locked against further updates until the software writes a "one" to DC\_PERR\_STAT<0> to unlock and clear the bit. The DC\_PERR\_STAT is not unlocked or cleared on reset.

## Figure 3-43: DC\_PERR\_STAT, Dcache Parity Error Status



| Name | Extent | Туре | Description                                                                                                                                                            |
|------|--------|------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| SEO  | 0      | W1C  | Set if second Dcache parity error occurred after register was locked.                                                                                                  |
| LOCK | 1      | W1C  | Set if parity error detected in Dcache. Bits <5:2> are locked<br>against further updates when this bit is set. Bits <5:2>*are<br>cleared when the LOCK bit is cleared. |
| DP0  | 2      | RO   | Set on data parity error in Dcache bank 0.                                                                                                                             |
| DP1  | 3      | RO   | Set on data parity error in Dcache bank1.                                                                                                                              |
| TP0  | 4      | RO   | Set on tag parity error in Dcache bank 0.                                                                                                                              |
| TP1  | 5      | RO   | Set on tag parity error in Dcache bank 1.                                                                                                                              |

#### Table 3–20: DC\_PERR\_STAT Field Descriptions

#### 3.9.2.11 Dstream TB Invalidate All Process, DTBIAP

This is a write-only register. Any write to this register invalidates all DTB entries in which the ASM bit is equal to zero.

#### 3.9.2.12 Dstream TB Invalidate All, DTBIA

This is a write-only register. Any write to this register invalidates all 64 DTB entries, and resets the DTB NLU pointer to its initial state.

#### 3.9.2.13 DTBIS, Dstream TB Invalidate Single

This is a write-only register. Writing a virtual address to this IPR invalidates the DTB entry that meets any one of the following criteria:

- A DTB entry whose VA field matches DTBIS<42:13> and whose ASN field matches DTB\_ASN<63:57>.
- A DTB entry whose VA field matches DTBIS<42:13> and whose ASM bit is set.

# DEC Chip 21164-AA (EV5 CPU) Specification, Revision 1.9, December 1992





# NOTE

The DTBIS is written before the normal IBOX trap point. The DTB invalidate single operation will be aborted by the IBOX only for the following trap conditions: ITB miss, PC mispredict, or when the HW\_MTPR DTBIS is executed in user mode.

#### 3.9.2.14 MCSR, Mbox Control Register

The MCSR register is a read/write register that controls features and records status in the Mbox. This register is cleared on chip reset but not on timeout reset.

## Figure 3–45: MCSR, Mbox Control Register



| Table 3- | 21: | MCSR | Field | Descri | ptions |
|----------|-----|------|-------|--------|--------|
|----------|-----|------|-------|--------|--------|

| Name       | Extent | Туре | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
|------------|--------|------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| BIG_ENDIAN | 0      | RW,0 | Big Endian mode enable. When set, bit 2 of the physical address is inverted for all longword Dstream references.                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| SP<1:0>    | 2:1    | RW,0 | Super page mode enables. SP<1> enables one-to-one super<br>page mapping of D-stream virtual addresses with VA<39:13><br>directly to physical addresses PA<39:13>, if virtual address<br>bits VA<42:41> = 2. Virtual address bit VA<40> is ignored<br>in this translation. SP<0> enables one-to-one super page<br>mapping of D-stream virtual addresses with VA<42:30> =<br>1FFE(Hex) to physical addresses with PA<39:30> = 0(Hex).<br>SP<0> is the NT_Mode bit that is used to control VA format-<br>ting on a read from the VA_FORM IPR. Superpage access is |

only allowed in kernel mode.

| Table 3-21 (Cont.): | MCSR Field Descriptions |      |                                                                                                                                                                                                                                                        |  |  |
|---------------------|-------------------------|------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|
| Name                | Extent                  | Туре | Description                                                                                                                                                                                                                                            |  |  |
| MBOX_TEST_SEL       | 3                       | RW,0 | MBOX Test Select. This bit is used to control the MBOX/CBOX<br>parallel port mux selection. When set, the MBOX P_PORT_<br>DATA<8:0> bus is sent to the DECchip 21164-AA parallel test<br>port. This bit is used for diagnostic and test purposes only. |  |  |

| Table 3-21 (Cont.): | MCSR F | ield | Descri | ptions |
|---------------------|--------|------|--------|--------|
|---------------------|--------|------|--------|--------|

# 3.9.2.15 DC\_MODE, Dcache Mode Register

The DC\_MODE register is a read/write register that controls diagnostic and test modes in the Dcache. This register is cleared on chip reset but not on timeout reset.

| Figure 3-46: | DC MODE. | Dcache | Mode Register | 1 |
|--------------|----------|--------|---------------|---|
|--------------|----------|--------|---------------|---|



| Table 3-22: | DC_MO | DE Field | Descriptions |
|-------------|-------|----------|--------------|
|-------------|-------|----------|--------------|

| Name            | Extent | Туре         | Description                                                                                                                                                                                                                                                                                                                                                                     |
|-----------------|--------|--------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DC_ENA          | 0      | RW,0         | Software Dcache enable. Unless the Dcache has been dis-<br>abled in hardware (DC_DOA is set), the DC_ENA bit enables<br>the Dcache. (The Dcache is enabled if DC_ENA=1 AND DC_<br>DOA=0). When clear, the Dcache command will be set to NOP,<br>all D-stream references will be forced to miss in the Dcache,<br>and outstanding fills will be blocked from filling the Dcache. |
| DC_FHIT         | 1      | <b>RW</b> ,0 | Dcache force hit. When set, this bit forces all D-stream references to hit in the Dcache.                                                                                                                                                                                                                                                                                       |
| DC_BAD_PARITY   | 2      | RW,0         | When set, this bit inverts the data parity inputs to the Dcache. This will have the effect of putting bad data par-<br>ity into the Dcache on stores that hit in the Dcache. This bit<br>will have no effect on the tag parity written to the Dcache or<br>the data parity written to the CBOX Write Data Buffer.                                                               |
| DC_PERR_DISABLE | 3      | RW,0         | When set, this bit disables Dcache parity error reporting.<br>When clear, this bit enables all Dcache tag and data parity<br>errors.                                                                                                                                                                                                                                            |

# DEC Chip 21164-AA (EV5 CPU) Specification, Revision 1.9, December 1992

| Name   | Extent | Туре | Description                                                                                                                                                                                                                                                                                                                                                                                                                     |
|--------|--------|------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DC_DOA | 4      | RO   | Hardware Dcache Disable. When set, the Dcache is fautly and<br>has been disabled under hardware control (a programmable/readab<br>fuse resides in the MBOX). The Dcache command will be set<br>to NOP, all D-stream references will be forced to miss in the<br>Dcache, and outstanding fills will be blocked from filling the<br>Dcache. When DC_DOA is clear, the Dcache can be enabled<br>under software control (DC_ENA=1). |

# Table 3-22 (Cont.): DC\_MODE Field Descriptions

# NOTE

The DC\_MODE bits are only used for diagnostics and test. For normal operation, they will only be supported in the following configuration:

DC\_ENA = 1 DC\_FHIT = 0 DC\_BAD\_PARITY = 0 DC\_PERR\_DISABLE = 0

## 3.9.2.16 MAF\_MODE, MAF Mode Register

The MAF\_MODE register is a read/write register that controls diagnostic and test modes in the MBOX Miss Address File. This register is cleared on chip reset but not on timeout reset.

# Figure 3-47: MAF\_MODE, MAF Mode Register



| TADIE J-23. WAT WODE FIELD DESCRIPTION | Table 3–23: | MAF MODE | <b>Field Descriptions</b> |
|----------------------------------------|-------------|----------|---------------------------|
|----------------------------------------|-------------|----------|---------------------------|

| Name          | Extent | Туре | Description                                                                                                                         |
|---------------|--------|------|-------------------------------------------------------------------------------------------------------------------------------------|
| DREAD_NOMERGE | 0      | RW,0 | Miss Address File DREAD Merge Disable. When set, this<br>bit disables all merging in the DREAD portion of the miss<br>address file. |

# DIGITAL RESTRICTED DISTRIBUTION

#### PALcode and IPRs 3-35

| Name            | Extent | Туре | Description                                                                                                                                                                                 |
|-----------------|--------|------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| WB_FLUSH_ALWAYS | 1      | RW,0 | When set, this bit forces the write buffer to always flush whenever an entry is made to it.                                                                                                 |
| WB_NOMERGE      | 2      | RW,0 | When set, this bit disables all merging in the write buffer.                                                                                                                                |
| MAF_NO_BYPASS   | 3      | RW,0 | When set, this bit disables Dread bypass requests in the MAF<br>arbiter. All Dread requests will be loaded into the MAF pend-<br>ing queue before arbitration takes place.                  |
| DREAD_PENDING   | 4      | R,0  | This bit indicates the status of the MAF Dread file. When set,<br>there are one or more outstanding Dread requests in the MAF<br>file. When clear, there are no outstanding Dread requests. |

| Table 3-23 | (Cont.): | MAF | <b>MODE Field Descriptions</b> |
|------------|----------|-----|--------------------------------|
|------------|----------|-----|--------------------------------|

# NOTE

The following bits are only used for diagnostics and test. For normal operation, they will only be supported in the following configuration:

DREAD\_NOMERGE = 0 WB\_FLUSH\_ALWAYS = 0 WB\_NOMERGE = 0 MAF\_NO\_BYPASS = 0

#### 3.9.2.17 DC\_FLUSH, Dcache Flush Register

A write to this register clears all the valid bits in both banks of the Dcache.

## 3.9.2.18 ALT\_MODE, Alternate mode

ALT\_MODE is a write-only IPR. The AM field specifies the alternate processor mode used by HW\_LD and HW\_ST instructions.

| Fi | igure | 348: | ALT_MODE |  |
|----|-------|------|----------|--|
|    |       |      |          |  |

| 6 |     | 0 | 00 | 0 | 0     |
|---|-----|---|----|---|-------|
| 3 |     | 5 |    | 2 | 0     |
|   | IGN |   |    | • | SN SN |

#### 3–36 PALcode and IPRs

| Table 3-24: ALT | Table 3-24: ALT Mode |  |  |  |  |  |
|-----------------|----------------------|--|--|--|--|--|
| ALT_MODE<4:3>   | Mode                 |  |  |  |  |  |
| 0 0             | Kernel               |  |  |  |  |  |
| 01              | Executive            |  |  |  |  |  |
| 10              | Supervisor           |  |  |  |  |  |
| 11              | User                 |  |  |  |  |  |

#### 3.9.2.19 CC, Cycle Counter

DECchip 21164-AA supports a cycle counter as described in the Alpha SRM. The low half of the counter, when enabled, increments once each CPU cycle. The upper half of the CC register is the counter offset. CC<63:32> is written on a HW\_MTPR to the CC IPR; bits <31:0> are unchanged. CC\_CTL<32> is used to enable or disable the cycle counter. The lower half of the cycle counter is written on a HW\_MTPR to the CC\_CTL IPR.

The CC register is read by the RPCC instruction as defined in the Alpha SRM (The RPCC instruction returns a 64-bit value) The cycle counter is disabled on reset.

The write-only CC Register looks like this:

#### Figure 3-49: CC, Cycle Counter Register

63 32 31 0 +-----+ | CC, offset | IGN | +----+

## 3.9.2.20 CC\_CTL, Cycle Counter Control

The CC\_CTL register is a write-only register that is used to write the low 32 bits of the cycle counter and to enable or disable the counter. Bits CC<31:4> are written with the value CC\_CTL<31:4> on a HW\_MTPR to the CC\_CTL register. Bits CC<3:0> are written with zero; bits CC<63:32> are not changed. If CC\_CTL<32> is set then the counter is enabled, otherwise the counter is disabled.





| Table 3–25: CC_CTL Field Descriptions |        |      |                                                                        |  |
|---------------------------------------|--------|------|------------------------------------------------------------------------|--|
| Name                                  | Extent | Туре | Description                                                            |  |
| Count<31:4>                           | 31:4   | W0   | Cycle count. This value is loaded into bits <31:4> of the CC register. |  |
| CC_ENA                                | 32     | WO   | Cycle Counter enable. When set, this bit enables the CC register.      |  |

## 3.9.2.21 DC\_TEST\_CTL, Dcache Test TAG Control Register

The DC\_TEST\_CTL register is a read/write IPR used exclusively for test and diagnostics.

An address written to this register will be used to index into the Dcache array when reading or writing the DC\_TEST\_TAG register. See Section 3.9.2.22 for a description of how this register is used.

#### Figure 3–51: DC\_TEST\_CTL, Dcache Test TAG Control Register



| Tab | le : | 32( | 6: | DC | TEST | CTL | Field | Descrip | tions |
|-----|------|-----|----|----|------|-----|-------|---------|-------|
|     |      |     |    |    |      |     |       |         |       |

| Name  | • | Extent | Туре | Description                                                                                                                                                                                                                 |
|-------|---|--------|------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| BANK0 |   | 0      | RW   | Dcache Bank0 enable. When set, reads from DC_TEST_TAG will return the tag from Dcache bank0 and writes to DC_TEST_TAG will write to Dcache bank0. When clear, reads from DC_TEST_TAG will return the tag from Dcache bank1. |

#### 3–38 PALcode and IPRs

| Name  | Extent | Туре | Description                                                                                                                       |
|-------|--------|------|-----------------------------------------------------------------------------------------------------------------------------------|
| BANK1 | 1      | RW   | Dcache Bank1 enable. When set, writes to DC_TEST_TAG will write to Dcache bank1. This bit has no effect on reads.                 |
| INDEX | 12:3   | RW   | Dcache tag index. This field is used on reads/writes from/to<br>the DC_TEST_TAG register to index into the Dcache tag ar-<br>ray. |

| Table 3-26 ( | (Cont.): | DC TEST | CTL | Field | Descriptions |
|--------------|----------|---------|-----|-------|--------------|
|              |          |         |     |       |              |

# 3.9.2.22 DC\_TEST\_TAG, Dcache Test TAG Register

The DC\_TEST\_TAG register is a read/write IPR used exclusively for test and diagnostics.

When DC\_TEST\_TAG is read, the value in the DC\_TEST\_CTL register is used to index into the Dcache and the value in the tag, tag parity, valid and data parity bits for that index are read out of the Dcache and loaded into the DC\_TEST\_TAG\_TEMP IPR register. A zero value is returned to the integer register file. If BANK0 is set, the read is from Dcache bank0. Otherwise it is from Dcache bank1.

When DC\_TEST\_TAG is written, the value written to DC\_TEST\_TAG is written to the Dcache index referenced by the value in the DC\_TEST\_CTL register. The tag, tag parity, and valid bits are affected by this write. Data parity bits are not affected by this write (use DC\_MODE<DC\_BAD\_PARITY> and force hit modes). If BANK0 is set, the write is to Dcache bank0. If BANK1 is set, the write is to Dcache bank1. If both are set, the write will occur to both banks.

| Figure 3–52: DC_TEST_TAG, Dcache Test TAC | G Register |
|-------------------------------------------|------------|
|-------------------------------------------|------------|

| 63  | 39 38  |   | 13 12 11 | 03 02 01 00                  |
|-----|--------|---|----------|------------------------------|
| IGN | +<br>! |   |          | IGN                          |
|     |        |   |          | +<br>!                       |
|     |        | 1 |          | +> TAG_PARITY<br>> OWO VALID |
|     |        |   | +        | > OW1_VALID<br>> TAG<38:13>  |

| Table | 3-27: | DC | TEST | <b>TAG Field</b> | Descriptions |
|-------|-------|----|------|------------------|--------------|
|-------|-------|----|------|------------------|--------------|

| Name       | Extent | Туре | Description                                                                                                            |
|------------|--------|------|------------------------------------------------------------------------------------------------------------------------|
| TAG_PARITY | 2      | WO   | Tag Parity. This bit refers to the Dcache tag parity bit which covers tag bits 38 through 13 (valid bits not covered). |
| OW0_VALID  | 11     | WO   | Octaword valid bit 0. This bit refers to the Dcache valid bit for the low order octaword within a Dcache 32B block.    |
| OW1_VALID  | 12     | WO   | Octaword valid bit 1. This bit refers to the Dcache valid bit for the high order octaword within a Dcache 32B block.   |

| Table 3–27 (Cont.): | DC_TEST_ | TAG FI | eld Descriptions                                                                                           |
|---------------------|----------|--------|------------------------------------------------------------------------------------------------------------|
| Name                | Extent   | Туре   | Description                                                                                                |
| TAG                 | 38:13    | wo     | Tag<38:13>. This refers to the tag field in the Dcache array.<br>(Note: Bit 39 is not stored in the array) |
|                     |          |        |                                                                                                            |

DO TEOT TAO ENAN

#### 3.9.2.23 DC\_TEST\_TAG\_TEMP, Dcache Test TAG Temp Register

The DC\_TEST\_TAG\_TEMP register is a read-only IPR used exclusively for test and diagnostics.

Reading the Dcache tag array requires a 2 step process. First, a read from DC\_TEST\_TAG reads the tag array and data parity bits and loads them into the DC\_TEST\_TAG\_TEMP register. An undefined value is returned to the integer register file. A second read of the DC\_TEST\_TAG\_ TEMP register will return the Dcache test data to the register file.

#### Figure 3–53: DC\_TEST\_TAG\_TEMP, Dcache Test TAG Temp Register



| Name         | Extent | Туре | Description                                                                                                                                             |
|--------------|--------|------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
| TAG_PARITY   | 2      | RO   | Tag Parity. This bit refers to the Dcache tag parity bit which<br>covers tag bits 38 through 13 (valid bits not covered).                               |
| DATA_PAR0<0> | 3      | RO   | Data Parity. This bit refers to the Bank0 Dcache data parity<br>bit which covers the lower longword of data indexed by dc_<br>test_ctl <index>.</index> |
| DATA_PAR0<1> | 4      | RO   | Data Parity. This bit refers to the Bank0 Dcache data parity<br>bit which covers the upper longword of data indexed by DC_<br>TEST_CTL <index>.</index> |
| DATA_PAR1<0> | 5      | RO   | Data Parity. This bit refers to the Bank1 Dcache data parity<br>bit which covers the lower longword of data indexed by DC_<br>TEST_CTL <index>.</index> |

| Table 3-28: | DC TEST | TAG TEM | IP Field Descriptions |
|-------------|---------|---------|-----------------------|
|-------------|---------|---------|-----------------------|

| Name         | Extent | Туре | Description                                                                                                                                             |
|--------------|--------|------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
| DATA_PAR1<1> | 6      | RO   | Data Parity. This bit refers to the Bank1 Dcache data parity<br>bit which covers the upper longword of data indexed by DC_<br>TEST_CTL <index>.</index> |
| OW0_VALID    | 11     | RO   | Octaword valid bit 0. This bit refers to the Dcache valid bit for the low order octaword within a Dcache 32B block.                                     |
| OW1_VALID    | 12     | RO   | Octaword valid bit 1. This bit refers to the Dcache valid bit<br>for the high order octaword within a Dcache 32B block.                                 |
| TAG          | 38:13  | RO   | Tag<38:13>. This refers to the tag field in the Dcache array.<br>(Note: Bit 39 is not stored in the array)                                              |

| Table 3-28 ( | (Cont.): | DC TEST | TAG TEMP | Field | Descriptions |
|--------------|----------|---------|----------|-------|--------------|
|              |          |         |          |       |              |

## **DIGITAL RESTRICTED DISTRIBUTION**

PALcode and IPRs 3-41

# 3.9.3 Cbox IPRs

DECchip 21164-AA specific IPRs for controlling Scache, Bcache, System Configuration, and logging error information are listed below. These IPR's cannot be read or written from the system. These IPRs have been placed in the 1MB region of DECchip 21164-AA specific I/O address space ranging from FFFFF00000 to FFFFFFFFF. Any read or write to undefined IPR in this address space will produce **UNDEFINED** behavior.

| Register    | Address      | Туре          | Description                                                         |
|-------------|--------------|---------------|---------------------------------------------------------------------|
| SC_CTL      | FF FFF0 00A8 | ( <b>RW</b> ) | Controls Scache behavior.                                           |
| SC_STAT     | FF FFF0 00E8 | (R)           | Logs Scache related errors.                                         |
| SC_ADDR     | FF FFF0 0188 | ( <b>R</b> )  | Contains the address for Scache related errors.                     |
| BC_CONTROL  | FF FFF0 0128 | ( <b>W</b> )  | Controls Bcache/System Interface and Bcach testing.                 |
| BC_CONFIG   | FF FFF0 01C8 | ( <b>W</b> )  | Contains Bcache configuration parameters.                           |
| BC_TAG_ADDR | FF FFF0 0108 | ( <b>R</b> )  | Contains tag and control bits for fills from Bcache.                |
| EI_STAT     | FF FFF0 0168 | (R)           | Logs Bcache/system related errors.                                  |
| EI_ADDR     | FF FFF0 0148 | ( <b>R</b> )  | Contains the address for Bcache/system re-<br>lated errors.         |
| FILL_SYN    | FF FFF0 0068 | ( <b>R</b> )  | Contains fill syndrome or parity bits for fills from Bcache/memory. |
| LD_LOCK     | FF FFF0 01E8 | ( <b>R</b> )  | Contains the address for LDx_L commands.                            |

 Table 3–29:
 CBOX\_IPRS Descriptions

## 3.9.3.1 Scache Control Register, SC\_CTL

SC\_CTL is a read/write register which controls the behavior of the Scache.



#### 3–42 PALcode and IPRs

| Field       | Extent | Туре            | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
|-------------|--------|-----------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| SC_FHIT     | 0      | (RW,0)          | When set, this bit can be used to force cacheable Ld's and<br>ST's to hit in the Scache, irrespective of the tag status<br>bits. Non-cacheable references will not be forced to hit ir<br>the Scache and will be driven off-chip. In this mode, only<br>one Scache set may be enabled. The Scache tag and data<br>parity checking will be disabled.<br>For STx, value of status, parity and tag bits specified by<br>SC_TAG_STAT field will be used to write the Scache tag<br>Scache tag will be the STx address received by the Cbox<br>Scache tags will be written with the STx address. SC_<br>FHIT bit will be cleared on reset. |
| SC_FLUSH    | 1      | ( <b>RW,0</b> ) | When set, this bit can be used to flush all the valid bits in<br>the Scache tag store. It will be cleared on reset.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| SC_TAG_STAT | 7:2    | ( <b>RW</b> )   | This bit field can be only used in the SC_FHIT mode to<br>write any combination of tag status and parity bits in the<br>Scache. The parity bit can be used to write bad tag parity<br>The correct value of tag parity is even. These bits will be<br>cleared on reset. See Table 3–31 for the encodings.                                                                                                                                                                                                                                                                                                                              |
| SC_FB_DP    | 11:08  | ( <b>RW,0</b> ) | <ul> <li>This field can be used to write bad data parity for the selected LW's within the OW when writing the Scache. If any one of these bits is set to one, then the corresponding LW's computed parity value will be inverted when writing the Scache.</li> <li>For Scache writes, the Cbox allocates two consecutive cy cles to write up to two OW's based on the LW valid bits received from the Mbox. Therefore, the same LW parity control bits will be used for writing both OWs. For exam ple, Bit 8 corresponds to LW0 and LW4. This bit field will be cleared on reset.</li> </ul>                                         |
| SC_BLK_SIZE | 12     | ( <b>RW,1</b> ) | This bit can be used to select the Scache and Bcache block<br>size to be either 64 byte or 32 byte. The Scache and Bcache<br>will always have identical block size. All the Bcache and<br>memory fills or write transactions will be of the selected<br>block size. At the power up time this bit will be set and the<br>default block size will be 64 byte. When clear, the block<br>size will be 32 byte. This bit must be set to the desired<br>value to reflect the correct Scache/Bcache block size before<br>DECchip 21164-AA does the first cacheable read or write<br>from Bcache or system.                                  |

Table 3–30: SC\_CTL Field Descriptions

| Field     | Extent | Туре            | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|-----------|--------|-----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| SC_SET_EN | 15:13  | ( <b>RW,1</b> ) | This field will be used to enable the Scache sets. Only one<br>or all three sets may be enabled at a time. Enabling any<br>combination of two sets at a time results in unpredictable<br>behavior.<br>During chip test, bad sets in the Scache will be per-<br>manently disabled by blowing away their fuses. "Fuse<br>Blower" mechanism will enable either all sets or one set.<br>Any write to enable permanently disabled set will have no<br>effect. Power-up code must first read this ipr to find out<br>the good sets before enabling them. |

| Table 3-30 | (Cont.): | SC CTL | Field | Descriptions |
|------------|----------|--------|-------|--------------|
|            |          |        |       |              |

Table 3–31: SC\_TAG\_STAT Field Description

| Scache Tag Status<7:2> | Description                                                    |
|------------------------|----------------------------------------------------------------|
| SC_TAG_STAT<7:4>       | Tag Parity, Valid, Shared, Dirty; bits 7, 6, 5, 4 respectively |
| SC_TAG_STAT<3:2>       | OW Modified bits                                               |

# 3.9.3.2 Scache Status Register, SC\_STAT

The Scache status register is read only. It is not cleared or unlocked by reset. Any PAL code read of this register unlocks SC\_ADDR and SC\_STAT and clears SC\_STAT.

If an Scache tag or data parity error is detected during an Scache lookup, the SC\_STAT register is locked against further updates from subsequent transactions.



# 3-44 PALcode and IPRs

| Field       | Extent | Туре | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
|-------------|--------|------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| SC_TPERR    | 2:0    | (RO) | These bits, when set indicate that Scache tag lookup re-<br>sulted in a tag parity error and identify the set that had<br>the tag parity error.                                                                                                                                                                                                                                                                                                                     |
| SC_DPERR    | 10:3   | (RO) | These bits, when set indicate that Scache read resulted<br>in a data parity error and indicate which LW within the<br>two OWs had the data parity error. These bits are loaded<br>if any LW within two OW's read from the Scache during<br>lookup had a data parity error. Bit 3 corresponds to LWO<br>as shown in the diagram above.<br>If SC_CTL <sc_fhit> is set, this field will be used for<br/>loading the LW parity bits read out from the Scache.</sc_fhit> |
| CBOX_CMD    | 15:11  | (RO) | This field indicates the Scache transaction that resulted<br>in a Scache tag or data parity error. This field will be<br>written at the time the actual Scache error bit is written.<br>The Scache transaction may be D-read, I-read, or Write<br>command from the Mbox or any one of the fill or victim<br>commands from the BIU or the system command being<br>serviced. See Table 3-33 for the encodings.                                                        |
| SC_SCND_ERR | 16     | (RO) | This bit, when set indicates that an Scache transaction<br>resulted in a parity error while the SC_TPERR or SC_<br>DPERR bit was already set from the earlier transaction.<br>This bit is not set for two errors in different octawords of<br>the same transaction.                                                                                                                                                                                                 |

| Table 3–32: SC_STAT Field Descrip | ptions |
|-----------------------------------|--------|
|-----------------------------------|--------|

Table 3–33: SC\_CMD Field Descriptions

| SC CMD Source<15:14> | SC CMD Encoding<13:11> | Description                    |
|----------------------|------------------------|--------------------------------|
| 1x                   | 100                    | Set Shared from System         |
|                      | 101                    | Read Dirty from System         |
|                      | 010                    | Invalidate from System         |
|                      | 111                    | Scache Fill from Bcache/memory |
| 00                   | 001                    | Scache I-read                  |
| 01                   | 001                    | Scache D-read                  |
|                      | 011                    | Scache D-write                 |

## 3.9.3.3 Scache Address Register, SC\_ADDR

The Scache Address register is read only. It is not cleared or unlocked by reset. The address gets loaded in this register every time the Scache is accessed if one of the error bits in the SC\_STAT register is not set. If an Scache tag or data parity error is detected, then this register gets locked preventing further updates. This register is unlocked whenever SC\_STAT is read.

For Scache Reads the address bits <39:4> are valid to identify the address being driven to the Scache. Address bit <4> identifies which OW was accessed first. For each Scache lookup, there is one tag access and two data access cycles. If there is a hit, two OWs are read out in consecutive CPU cycles. Tag parity error is detected only while reading the first OW. However, data parity error can be detected on either of the two OWs.

If SC\_CTL<SC\_FHIT> is set, SC\_ADDR is used for storing the tag and status bits. Each block in the Scache tag store has unique valid, shared, dirty and modify bits on a sub-block (32B) basis. Tag and parity bits are common for both sub-blocks. For LD's, Scache drives these bits from the set which is enabled. In this mode, tag and data parity checking are disabled and the SC\_ADDR and SC\_STAT iprs are not locked on a error.



RAO --> Read As One

#### 3.9.3.4 Bcache Control Register, BC\_CONTROL

The Bcache control register is write only.



#### 3–46 PALcode and IPRs

| Field            | Extent | Туре   | Description                                                                                                                                                                                                                                                                                                                                                                                                        |  |
|------------------|--------|--------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| BC_ENABLED       | 0      | (WO,0) | When set, the external Bcache is enabled. When clear, the<br>Bcache is disabled. When the Bcache is disabled, the BIU<br>will neither do external cache reads nor writes. This bit<br>will be cleared on reset.                                                                                                                                                                                                    |  |
| RESERVED         | 1      | (WO,0) | Must be zero.                                                                                                                                                                                                                                                                                                                                                                                                      |  |
| EI_OPT_CMD       | 2      | (WO,0) | When set, the optional commands, LOCK, SET DIR<br>and MEMORY BARRIER will be driven to the DECo<br>21164-AA external interface command pins to be acknow<br>edged by the system interface. When clear, these commands will be internally acknowledged by DECchip 211<br>AA and will not be driven off-chip to the system interface.<br>This bit will be cleared on reset.                                          |  |
| RESERVED         | 3      | (WO,0) | Must be zero.                                                                                                                                                                                                                                                                                                                                                                                                      |  |
| CORR_FILL_DAT    | 4      | (WO,1) | Correct fill data from Bcache or memory, in ECC mode.<br>When this bit is set, fill data from Bcache or memory will<br>first go through error correction logic before being driven<br>to the Scache or Dcache. If the error is correctable, it will<br>be transparent to the machine.<br>When this bit is clear, fill data from Bcache or memory                                                                   |  |
|                  |        |        | will be directly driven to the Dcache before ECC error is<br>detected. If the error is correctable, corrected data will be<br>returned again, Dcache will be invalidated, and error trap<br>will be taken. This bit will be set on reset.                                                                                                                                                                          |  |
| VTM_FIRST        | 5      | (WO,1) | Set for systems without a victim buffer. On a Be<br>miss, DECchip 21164-AA will first drive out the vi<br>ized block's address on the system address bus, foll<br>by the read miss address and command. Cleared for<br>tems with a victim buffer. If clear, on a Bcache miss<br>victim, DECchip 21164-AA will first drive out the<br>miss followed by the victim address and command.<br>bit will be set on reset. |  |
| EI_ECC_OR_PARITY | 6      | (WO,1) | This bit determines whether to operate the external inter-<br>face in QW ECC or Byte parity mode. When set, DECchip<br>21164-AA generates/expects QW ECC on the data check<br>pins. When clear, DECchip 21164-AA generates/expects<br>even byte parity on the data check pins. This bit will be<br>set on reset.                                                                                                   |  |

| Table 3–34: | BC | CONTROL | Field | Descriptions |
|-------------|----|---------|-------|--------------|
|             |    |         |       |              |

DIGITAL RESTRICTED DISTRIBUTION

.

| Table 3-34 (Cont.): | BC_CONTROL Field Descriptions |                 |                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |  |  |
|---------------------|-------------------------------|-----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|
| Field               | Extent                        | Туре            | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                             |  |  |
| BC_FHIT             | 7                             | (WO,0)          | Bcache force hit. When this bit is set and the Bcache is<br>enabled, all external LD's and ST's are forced to hit in the<br>Bcache, irrespective of the tag status bits. In this mode,<br>all the addresses must be in non-cacheable address space<br>(<39>=1). Bcache tag parity and data ecc/parity checking<br>is disabled.                                                                                                                                          |  |  |
|                     | •                             |                 | For STx, value of status, parity and tag bits specified by<br>BC_TAG_STAT field will be used to write the Bcache tags.<br>Bcache tag and index will be the STx address received by<br>the BIU. It will write the Bcache tag RAM's with the STx<br>address minus the Bcache index. BC_FHIT bit will be<br>cleared on reset.                                                                                                                                              |  |  |
| BC_TAG_STAT         | 12:8                          | (WO)            | This bit field can be only used in BC_FHIT mode to<br>write any combination of tag status and parity bits in the<br>Bcache. The parity bit can be used to write bad tag parity.<br>These bits will be undefined on reset. See Table 3-35 for<br>the encodings.                                                                                                                                                                                                          |  |  |
| BC_BAD_DAT          | 14:13                         | (WO,0)          | When set, this field can be used to write bad data with<br>correctable or uncorrectable error in ECC mode. When bit<br><13> is set, data bit <0> and <64> are inverted. When<br>bit <14> is set, data bit <1> and <65> are inverted. When<br>the same OW is read from the Bcache, DECchip 21164-<br>AA will detect correctable/uncorrectable ECC error on both<br>the QWs based on the value of bits <14:13> used when<br>writing. This field will be cleared on reset. |  |  |
| EI_DIS_ERR          | 15                            | (WO,1)          | When set, this bit causes the DECchip 21164-AA to ignore<br>any ECC or parity error on a fill data received from the<br>Bcache or memory and no machine check is taken. It will<br>also ignore any Bcache tag or control parity error. This bit<br>will be set on reset.                                                                                                                                                                                                |  |  |
| TL_PIPE_LATCH       | 16                            | (WO,0)          | When set, this bit causes DECchip 21164-AA to pipe the<br>system control pins (ADDR_BUS_REQ_H, CACK_H, and<br>DACK_H) for one cpu cycle. This bit will be cleared on<br>reset.                                                                                                                                                                                                                                                                                          |  |  |
| RESERVED            | 18:17                         | ( <b>WO,0</b> ) | This bit field is reserved. It is a place holder for potential features.                                                                                                                                                                                                                                                                                                                                                                                                |  |  |

Table 3–34 (Cont.): BC\_CONTROL Field Descriptions

| Bcache Tag Status<12:8> | Description                       |  |  |
|-------------------------|-----------------------------------|--|--|
| BC_TAG_STAT<12>         | Parity for Bcache tag             |  |  |
| BC_TAG_STAT<11>         | Parity for Bcache tag status bits |  |  |
| BC_TAG_STAT<10>         | Bcache tag valid bit              |  |  |
| BC_TAG_STAT<9>          | Bcache tag shared bit             |  |  |
| BC_TAG_STAT<8>          | Bcache tag dirty bit              |  |  |

Table 3–35: BC TAG STAT Field Description

# 3.9.3.5 Bcache Configuration Register, BC\_CONFIG

The Bcache configuration register is write only.

| Field        | Extent | Туре            | Description                                                                                                                                                                                                                                                                                                                                                                                                               |  |
|--------------|--------|-----------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| BC_SIZE      | 2:0    | (WO,1)          | This field is used to indicate the size of the Bcache. On<br>power-up, this field will be initialized to a value of 1MB<br>Bcache. See Table 3–37 for the encodings.                                                                                                                                                                                                                                                      |  |
| RESERVED     | 3      | ( <b>WO,0</b> ) | Must Be Zero.                                                                                                                                                                                                                                                                                                                                                                                                             |  |
| BC_RD_SPD    | 7:4    | (WO,4)          | This field is used to indicate to the BIU the read acc<br>time of the Bcache, measured in CPU cycles, from the st<br>of a read until data is valid at the input pins. The Bcar<br>read speed must be with in four to ten CPU cycles.<br>power-up, this field will be initialized to a value of for<br>CPU cycles.<br>For systems without a Bcache, the read speed must<br>equal to SYS clock to CPU clock ratio.          |  |
| BC_WR_SPD    | 11:8   | (WO,4)          | This field is used to indicate to the BIU the write time<br>the Bcache, measured in CPU cycles. The Bcache wr<br>speed must be with in four to ten CPU cycles. On pow<br>up, this field will be initialized to a value of four Cl<br>cycles.<br>For systems without a Bcache, the write speed must<br>equal to SYS clock to CPU clock ratio.                                                                              |  |
| BC_RD_WR_SPC | 14:12  | (WO,1)          | This field is used to indicate to the BIU the number of<br>CPU cycles to wait when switching from a private read to<br>a private write Bcache transaction. For other data move<br>ment commands, such as Read Dirty or Fill from memory<br>it is up to the system to direct system wide data move<br>ment in a way that is safe. On power-up, this field will be<br>initialized to a read/write spacing of one CPU cycle. |  |
| RESERVED     | 15     | (WO,0)          | Must Be Zero.                                                                                                                                                                                                                                                                                                                                                                                                             |  |

| Table 3-36 (Cont.): | BC_CON       | FIG FIEID D                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | escriptions                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |  |
|---------------------|--------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| Field               | Extent       | Туре                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |  |
| FILL_WE_OFFSET      | 18:16 (WO,1) | Bcache write enable pulse offset, from the Sysclock edge<br>for fills from the system. This field does not affect privat<br>writes to Bcache. It is used during fills from the system<br>when writing the Bcache to determine the number of CPU<br>cycles to wait before driving out the write pulse value a<br>programmed in the BC_WE_CTL field.<br>This field is programmed with a value in the range of on<br>to seven CPU cycles. It must never exceed the syscloc<br>ratio. (E.g., if the sysclock ratio is 3, this field must no<br>be larger than 3.) On power-up, this field is initialized to<br>a write offset value of one CPU cycle. |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |  |
| RESERVED            | 19           | (WO,0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | Must Be Zero.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |  |
| BC_WE_CTL           | 28:20        | (WO,0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | Bcache write enable control. This field is used to control<br>the timing of the write enable during write or fill. If the<br>bit is set the write pulse is asserted. If the bit is clear<br>the write pulse is not asserted. Each bit corresponds to<br>a CPU cycle. At the start of a Bcache write cycle, write<br>pulse will always be de-asserted for one CPU cycle. After<br>the first cycle, bit <20> of the register is used to assert the<br>write pulse. Each cycle, the next bit will be used to assert<br>the write pulse. On power-up, all bits in this field will be<br>cleared. |  |
| RESERVED            | 33:29        | (WO)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | Must Be Zero.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |  |

| Table 3-36 | (Cont.):               | BC | CONFIG | Field  | Descriptions |
|------------|------------------------|----|--------|--------|--------------|
|            | <b>OOIIIIIIIIIIIII</b> | ~~ |        | 1 1010 | Boooriptiono |

| Tabl | <b>ie 3-</b> | -37: | BC_S | SIZE | Field | Description | ns |
|------|--------------|------|------|------|-------|-------------|----|
|      |              |      |      |      |       |             |    |

| Bcache Size<2:0> | Size                |
|------------------|---------------------|
| 000              | Invalid Bcache size |
| 001              | 1 <b>M</b>          |
| 010              | 2M                  |
| 011              | 4M <sup>1</sup>     |
| 100              | 8M <sup>1</sup>     |
| 101              | 16M                 |
| 110              | 32M                 |
| 111              | 64M                 |

<sup>1</sup> Preferred Bcache size for DECchip 21164-AA verification

#### 3–50 PALcode and IPRs

.

## 3.9.3.6 External Interface Status Register, EI\_STAT

The External Interface (EI) Status Register is read only. Any PAL code read of this register unlocks and clears it. Read of EI\_STAT will also unlock EI\_ADDR, BC\_TAG, and FILL\_SYN registers subject to some restrictions listed below. This register is not unlocked or cleared by reset.

Fill data from Bcache or memory could have correctable (c) or uncorrectable (u) errors in ECC mode. In parity mode, fill data parity errors are treated as uncorrectable hard errors. System address/cmd parity errors are always treated as uncorrectable hard errors irrespective of the mode. The sequence for reading, unlocking, and clearing EI\_ADDR, BC\_TAG, FILL\_SYN, and EI\_STAT are as follows:

- 1. Read EI\_ADDR, BC\_TAG, FILL\_SYN: Can be read in any order. Doesn't unlock or clear any register.
- 2. Read EI\_STAT register: Reading of this register will unlock EI\_ADDR, BC\_TAG, FILL\_SYN registers as described below. EI\_STAT will also be unlocked and cleared on read subject to conditions listed below.

| Corr. Error           | Uncorr. Error | 2nd Hard Error | Load Reg | Lock Reg       | Action when EI_STA is read                                       |
|-----------------------|---------------|----------------|----------|----------------|------------------------------------------------------------------|
| 0                     | 0             | not possible   | no       | no             | clear and unlock ev-<br>erything                                 |
| 1                     | 0             | not possible   | yes      | no             | clear and unlock ev-<br>erything                                 |
| 0                     | 1             | 0              | yes      | yes            | clear and unlock ev-<br>erything                                 |
| <b>1</b> <sup>1</sup> | 1 ·           | 0              | yes      | yes            | clear (c) bit don't un-<br>lock. Transition to<br>(0,1,0) state. |
| 0                     | 1             | 1              | no       | already locked | clear and unlock ev-<br>erything                                 |
| 11                    | 1             | 1              | no       | already locked | clear (c) bit don't un-<br>lock. Transition to<br>(0,1,1) state. |

#### Table 3–38: Loading/Locking Rules for External Interface Registers

<sup>1</sup> These are special cases. It is possible that when EI\_ADDR was read, only correctable error bit is set and registers are not locked. By the time EI\_STAT is read, uncorrectable error is detected and the registers get loaded again and locked. The value of EI\_ADDR read earlier is no longer valid. Therefore, for the (1, 1, x) case, when EI\_STAT is read correctable error bit is cleared and registers are not unlocked or cleared. Software must re-do the ipr read sequence. On the second read, error bits are in (0, 1, x) state, all the related iprs are unlocked, and EI\_STAT is cleared.

- If the first error is correctable, the registers are loaded but NOT locked. On second correctable error, registers are neither loaded nor locked.
- Registers are locked on first uncorrectable error except the second hard error bit.

• The second hard error bit is set ONLY for an uncorrectable error followed by uncorrectable error. If correctable error follows an uncorrectable, it will not be logged as a second error. Note that Bcache tag parity errors are uncorrectable in this context.

| 63                                    | 36 35 34 33 32 31 30 29 28 27 0 |
|---------------------------------------|---------------------------------|
| RAO                                   | R0 R0 R0 R0 R0 R0 R0  RA0       |
| • • • • • • • • • • • • • • • • • • • | ++++++++++++                    |
|                                       | +> BC TPERR                     |
|                                       | +> BC TC PERR                   |
|                                       | +> EI_ES                        |
|                                       | +> COR_ECC_EF                   |
|                                       | +> UNC_ECC_EF                   |
|                                       | +> EI_PAR_ERF                   |
|                                       | +> FIL_IRD                      |
|                                       | +> SEO HRD EF                   |

| Table 3-39: | EI STAT | ' Field | Descriptions |
|-------------|---------|---------|--------------|
|-------------|---------|---------|--------------|

| Field       | Extent | Туре | Description                                                                                                                                                                                                                                                                         |
|-------------|--------|------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| BC_TPERR    | 28     | RO   | This bit, when set, indicates that a Bcache read encountered bad parity in the tag address RAM.                                                                                                                                                                                     |
| BC_TC_PERR  | 29     | RO   | This bit, when set, indicates that a Bcache read encountered bad parity in the tag control RAM.                                                                                                                                                                                     |
| EI_ES       | 30     | RO   | External interface error source. This field indicates if the error<br>source for fill data is Bcache or memory, or system for ad-<br>dress/command parity error. When set, it indicates that the<br>error source is memory or system. If not set, it is Bcache.                     |
| COR_ECC_ÈRR | 31     | RO   | Correctable ECC error. This bit, when set, indicates that a fill data received from outside the CPU contained a correctable ECC error.                                                                                                                                              |
| UNC_ECC_ERR | 32     | RO   | Uncorrectable ECC error. This bit, when set, indicates that<br>a fill data received from outside the CPU contained an un-<br>correctable ECC error. In the parity mode it indicates parity<br>error.                                                                                |
| EI_PAR_ERR  | 33     | RO   | External Interface address/command parity error. This bit,<br>when set, indicates that an address and command received by<br>the CPU has a parity error.                                                                                                                            |
| FIL_IRD     | 34     | RO   | This bit is only meaningful when one of the ECC or parity<br>error bits is set. FIL_IRD is set to indicate that the error<br>which caused one of the error bits to get set occurred during an<br>Icache fill and clear to indicate that the error occurred during<br>a Dcache fill. |

#### 3–52 PALcode and IPRs

| Field       | Extent | Extent Type Description |                                                                                                                                                                                                                                                    |  |  |
|-------------|--------|-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|
| SEO_HRD_ERR | 35     | RO                      | Second external interface hard error. This field indicates that<br>the fill from Bcache or memory or the system address/command<br>received by the CPU has a hard error while one of the hard<br>error bit in the EI_STAT register is already set. |  |  |

| Table 3-39 ( | Cont. | ): | EI STAT | Field | Descriptions |
|--------------|-------|----|---------|-------|--------------|
|              |       |    |         |       |              |

## 3.9.3.7 External Interface Address Register, El\_ADDR

External Interface Address register. This read-only register contains the physical address associated with errors reported by EI\_STAT register. Its content is meaningful only when one of the error bits is set. Read of EI\_STAT unlocks EI\_ADDR register.

| 63 | 3      | 40 | <br>    | - | 3   | -         |
|----|--------|----|---------|---|-----|-----------|
| i  | RAO    | I  | EI_ADDR | 1 | RAO | Ì         |
| •  | 0>Read |    | <br>    |   |     | <b></b> T |

## 3.9.3.8 Bcache Tag Address Register, BC\_TAG\_ADDR

BC\_TAG\_ADDR is a read-only IPR. Unless locked, the BC\_TAG\_ADDR register is loaded with the results of every Bcache tag read. When a tag or tag control parity error occurs this register is locked against further updates. Software may read this register by using the DECchip 21164-AA specific I/O space address instruction. This register is unlocked whenever EI\_STAT register is read. This register is not unlocked by reset.

| 63  | 39 38<br>++   | 20 19 18 17 16 15 14 13 12 11 0                                                                                                                        |
|-----|---------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|
| RAO | 0  TAG [3820] | RAO RO RO RO RO RO RO RO RAO                                                                                                                           |
|     |               | ++<br>             <br>          +> HIT<br>        +> TAGCTL_P<br>      +> TAGCTL_P<br>    +> TAGCTL_D<br>    +> TAGCTL_S<br>  +> TAGCTL_V<br>+> TAG_P |

RAO --> Read As One

Unused tag bits in the TAG field of this register are always zero, based on the size of the Bcache as determined by the BC\_SIZE field of the BC\_CTL register.

#### 3.9.3.9 Fill\_Syndrome Register, FILL\_SYN

The FILL\_SYN register is a 16-bit read-only register. It is loaded but not locked on a correctable ECC error, except that another correctable error does not re-load it. It gets loaded and locked if an uncorrectable ECC error or parity error is recognized during a fill from Bcache or memory as shown in Table 3–38. The FILL\_SYN register is unlocked when the EI\_STAT register is read. This register is not unlocked by reset.

If the chip is in ECC mode and an ECC error is recognized during a cache fill operation, the syndrome bits associated with the bad quadword are loaded in the FILL\_SYN register. FILL\_SYN[7..0] contains the syndrome associated with the lower quadword of the octaword, and FILL\_SYN[15..8] contain the syndrome associated with the higher quadword of the octaword. A syndrome value of zero means that no errors where found in the associated quadword. See Table 3-40 for a list of syndromes associated with correctable single-bit errors.

If the chip is in parity mode and a parity error is recognized during a cache fill operation, the FILL\_SYN register indicates which of the bytes in the octaword got bad parity. FILL\_SYNDROME[7..0] is set appropriately to indicate the bytes within the lower quadword that were corrupted and FILL\_SYN[15..8] is set to indicate the corrupted bytes within the upper quadword.

| 6<br>3 |     | 1<br>6 | +      | 0<br>8 | 0<br>7                      | 0<br>0                  |
|--------|-----|--------|--------|--------|-----------------------------|-------------------------|
| +      | RAZ |        | HI[70] | ]      | +<br> <br>  LO[70<br> <br>+ | +<br> <br>]  <br> <br>+ |

|          | Synuromes For Sing |           |               |   |
|----------|--------------------|-----------|---------------|---|
| Data Bit | Syndrome(Hex)      | Check Bit | Syndrome(Hex) |   |
| 00       | CE                 | 00        | 01            | · |
| 01       | CB .               | 01        | 02            |   |
| 02       | D3                 | 02        | 04            |   |
| 03       | D5                 | 03        | 08            |   |
| 04       | D6                 | 04        | 10            |   |
| 05       | D9                 | 05        | 20            |   |
| 06       | DA                 | 06        | 40            |   |
| 07       | DC                 | 07        | 80            |   |
| 08       | 23                 |           |               |   |
| 09       | 25                 |           |               |   |
| 10       | 26                 |           |               |   |
| 11       | 29                 |           |               |   |
| 12       | 2A                 |           |               |   |
| 13       | 2C                 |           |               |   |
| 14       | 31                 |           |               |   |

Table 3–40: Syndromes For Single-Bit Errors

| Table 3-40 ( | Cont.): Syndromes F | or Single-Bit Erro | Drs           |
|--------------|---------------------|--------------------|---------------|
| Data Bit     | Syndrome(Hex)       | Check Bit          | Syndrome(Hex) |
| 15           | 34                  |                    |               |
| 16           | 0E                  |                    |               |
| 17           | 0B                  |                    |               |
| 18           | . 13                |                    |               |
| 19           | 15                  |                    |               |
| 20           | 16                  |                    |               |
| 21           | 19                  |                    |               |
| 22           | 1A                  |                    |               |
| 23           | 1C                  |                    |               |
| 24           | E3                  |                    |               |
| 25           | <b>E</b> 5          |                    |               |
| 26           | <b>E</b> 6          |                    |               |
| 27           | E9                  |                    |               |
| 28           | EA                  |                    |               |
| 29           | EC                  |                    |               |
| 30           | F1                  |                    |               |
| 31           | F4                  |                    | · · ·         |
| 32           | <b>4F</b>           |                    |               |
| 33           | 4A                  |                    |               |
| 34           | 52                  |                    |               |
| 35           | 54                  |                    |               |
| 36           | 57                  |                    |               |
| 37           | 58                  |                    |               |
| 38           | 5B                  |                    |               |
| 39           | 5D                  |                    |               |
| 40           | A2                  |                    |               |
| 41           | A4                  |                    |               |
| 42           | A7                  |                    |               |
| 43           | A8                  |                    |               |
| 44           | AB                  |                    |               |
| 45           | AD                  |                    |               |
| 46           | <b>B</b> 0          |                    |               |
| 47           | B5                  |                    |               |
| 48           | 8F                  |                    |               |
| 49           | 8A.                 |                    |               |
| 50           | 92                  |                    |               |

 Table 3–40 (Cont.):
 Syndromes For Single-Bit Errors

| Table 3-40 | (Cont.): Syndromes F | or Single-Bit Erro | ors           |   |
|------------|----------------------|--------------------|---------------|---|
| Data Bit   | Syndrome(Hex)        | Check Bit          | Syndrome(Hex) |   |
| 51         | 94                   |                    |               |   |
| 52         | 97                   |                    |               |   |
| 53         | 98                   |                    |               |   |
| 54         | 9B                   |                    |               |   |
| 55         | 9D                   |                    |               |   |
| 56         | 62                   |                    |               |   |
| 57         | 64                   |                    |               |   |
| 58         | 67                   |                    |               |   |
| 59         | 68                   |                    |               | • |
| 60         | 6B                   |                    |               |   |
| 61         | 6D                   |                    |               |   |
| 62         | 70                   |                    |               |   |
| 63         | 75                   |                    |               |   |
|            |                      |                    |               |   |

Table 2 10 (Capt ). Sundramas For Single Bit Errors

#### 3.9.3.10 Load Lock Register, LD\_LOCK

The Load Lock register is read only. It can be read by PAL code for diagnostic purpose. It is not cleared by reset.

The address gets loaded in this register whenever the Cbox receives LDx\_L command and the Scache is accessed. This register is always loaded irrespective of hit/miss or any parity error in the Scache.

| 63  | •   | 0 39 |            |   | 3   | • |
|-----|-----|------|------------|---|-----|---|
| i i | RAO | i    | LDx_L ADDR | ł | RAO | i |
|     |     |      |            |   | +   | + |

RAO -->Read As One

# 3.9.4 PAL Restrictions

#### 3.9.4.1 Definitions

The list of Mbox instructions is: LDx, LDQ\_U, LDx\_L, HW\_LD, STx, STQ\_U, STx\_C, HW\_ST, and FETCHx. The list of virtual Mbox instructions is: LDx, LDQ\_U, LDx\_L, HW\_LD (virtual), STx, STQ\_U, STx\_C, HW\_ST (virtual), and FETCHx. Load instructions are LDx, LDQ\_U, LDx\_ L, and HW\_LD. Store instructions are STx, STQ\_U, STx\_C, HW\_ST.

#### 3–56 PALcode and IPRs

# DEC Chip 21164-AA (EV5 CPU) Specification, Revision 1.9, December 1992

| The following in cycle 0:                                                  | Restrictions:                                             |
|----------------------------------------------------------------------------|-----------------------------------------------------------|
| CALL_PAL entry                                                             | No HW_REI or HW_REI_STALL in cycle 0                      |
|                                                                            | No MFPR EXC_ADDR in cycle 0                               |
| PALshadow write                                                            | No HW_REI or HW_REI_STALL in 0, 1                         |
| HW_LD, lock bit set                                                        | PAL must slot to E0                                       |
|                                                                            | No other Mbox instruction in 0                            |
| HW_LD, VPTE bit set                                                        | No other virtual reference in 0                           |
| Any Load instruction                                                       | No Mbox MTPR or MFPR in 0                                 |
|                                                                            | No MFPR MAF_MODE in 1,2                                   |
|                                                                            | No MFPR DC_PERR_STAT in 1,2                               |
| Any Store instruction                                                      | No MFPR DC_PERR_STAT in 1,2                               |
| Any Mbox instruction, if it traps                                          | MTPR any Ibox IPR not aborted in 0,1                      |
|                                                                            | MTPR DTBIS not aborted in 0,1                             |
| Any Ibox trap except pc mispred,<br>itbmiss, or OPCDEC due to user<br>mode | MTPR DTBIS not aborted in 0,1 .                           |
| HW_REI_STALL                                                               | Only 1 HW_REI_STALL in an aligned block of 4 instructions |
| MTPR Any Ibox IPR (including<br>PALtemps)                                  | No MFPR same IPR in cycle 1,2                             |
|                                                                            | No Floating Point conditional branch in 0                 |
|                                                                            | No FEN or OPCDEC instruction in 0                         |
| MTPR ASTRR, ASTER, SIRR, SICR                                              | No MFPR INTID in 0,1,2,3,4                                |
| MTPR EXC_ADDR                                                              | No HW_REI in cycle 0,1                                    |
| MTPR IC_FLUSH_CTL                                                          | Must be followed by 33 NOPs                               |
| MTPR ICSR: HWE, FPE                                                        | No HW_REI in 0,1,2                                        |
| MTPR ICSR: SPE, FMS                                                        | If HW_REI_STALL, then no HW_REI_STALL in 0,1              |
|                                                                            | If HW_REI, then no HW_REI in 0,1,2,3,4                    |
| MTPR ICSR: SDE                                                             | No PALshadow read/write in 0,1,2,3                        |
|                                                                            | No HW_REI in 0,1,2                                        |
| MTPR ITB_ASN                                                               | Must be followed by HW_REI_STALL                          |
|                                                                            | No HW_REI_STALL in cycle 0,1                              |
| MTPR ITB_PTE                                                               | Must be followed by HW_REI_STALL                          |
| MTPR ITB_IAP, ITB_IS, ITB_IA                                               | Must be followed by HW_REI_STALL                          |
|                                                                            | HW_REI_STALL must be in the same Istream octaword         |
| MTPR IVPTBR                                                                | No MFPR IFAULT_VA_FORM in 0,1,2.                          |
| MTPR PAL_BASE                                                              | No CALL_PAL in 0,1,2,3,4,5,6,7                            |
|                                                                            | No HW_REI in 0,1,2,3,4,5,6                                |
| MTPR PS                                                                    | No HW_REI in 0,1,2                                        |

# Table 3–41: PAL Restrictions Table

# DIGITAL RESTRICTED DISTRIBUTION

PALcode and IPRs 3-57

| The following in cycle 0: | Restrictions:                                                                                                                     |
|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------|
| MTPR CC, CC_CTL           | No RPCC in 0,1,2                                                                                                                  |
| MTPR DC_FLUSH             | No Mbox instructions in 1,2                                                                                                       |
|                           | No outstanding fills in 0                                                                                                         |
| MTPR DC_MODE              | No Mbox instructions in 1,2,3,4                                                                                                   |
|                           | No MFPR DC_MODE in 1,2                                                                                                            |
|                           | No outstanding fills in 0                                                                                                         |
| MTPR DC_PERR_STAT         | No load or store instructions in 1                                                                                                |
|                           | No MFPR DC_PERR_STAT in 1,2                                                                                                       |
| MTPR DC_TEST_CTL          | No MFPR DC_TEST_TAG in 1,2,3                                                                                                      |
| MTPR DC_TEST_TAG          | No outstanding DC fills in 0                                                                                                      |
|                           | No MFPR DC_TEST_TAG in 1,2,3                                                                                                      |
| MTPR DTB_ASN              | No virtual Mbox instructions in 1,2,3                                                                                             |
| MTPR DTB_CM, ALT_MODE     | No virtual Mbox instructions in 1,2                                                                                               |
| MTPR DTB_PTE              | No virtual Mbox instructions in 2                                                                                                 |
|                           | No MTPR DTB_ASN, DTB_CM, ALT_MODE, MCSR, MAF_MODE, DC_MODE, DC_PERR_STAT, DC_TEST_CTL, DC_TEST_TAG in 2                           |
| MTPR DTB_TAG              | No virtual Mbox instructions in 1,2,3                                                                                             |
|                           | No MTPR DTB_TAG in 1                                                                                                              |
|                           | No MFPR DTB_PTE in 1,2                                                                                                            |
|                           | No MTPR DTBIS in 1,2                                                                                                              |
| MTPR DTBIAP, DTBIA        | No virtual Mbox instructions in 1,2,3                                                                                             |
| MTPR MAF_MODE             | No Mbox instructions in 1,2,3                                                                                                     |
|                           | No MFPR MAF_MODE in 1,2                                                                                                           |
| MTPR MCSR                 | No virtual Mbox instructions in 0,1,2,3,4                                                                                         |
|                           | No MFPR MCSR in 1,2                                                                                                               |
|                           | No MFPR VA_FORM in 1,2,3                                                                                                          |
| MTPR MVPTBR               | No MFPR VA_FORM in 1,2                                                                                                            |
| MFPR DC_TEST_TAG          | No outstanding DC fills in 0                                                                                                      |
|                           | No MFPR DC_TEST_TAG_TEMP in 1                                                                                                     |
| MFPR DTB_PTE              | No Mbox instructions or Mbox MTPR DTB_ASN, DTB_CM, ALT_MODE, MCSR, MAF_MODE, DC_MODE, DC_PERR_STAT, DC_TEST_CTL, DC_TEST_TAG in 1 |
|                           | No MFPR PTE_TEMP in 1,2,3                                                                                                         |
|                           | No MFPR DTB_PTE in 1                                                                                                              |
| MFPR VA                   | Must be done in ARITH, MACHINE CHECK, DTBMISS_SINGLE, UNALIGN, and DFAULT traps                                                   |

Table 3-41 (Cont.): PAL Restrictions Table

3–58 PALcode and IPRs

| Store to SC_CTL, BC_CTL, BC_<br>CONFIG | Must be preceded by MB                                   |
|----------------------------------------|----------------------------------------------------------|
|                                        | No concurrent cacheable Istream references               |
| Load from SC_STAT                      | Unlocks SC_ADDR and SC_STAT                              |
| Load from EI_STAT                      | Unlocks EI_ADDR, EI_STAT, FILL_SYN, and BC_TAG_ADDR      |
| Store to any Cbox IPR                  | Must synch with following references, probably with a MB |
| Any Cbox IPR address                   | No Ldx_L or Stx_C                                        |

# Table 3–42: Cbox IPR Restrictions Table

# DIGITAL RESTRICTED DISTRIBUTION

•

# 3.10 Revision History

| Who      | When                 | Description of change             |  |  |
|----------|----------------------|-----------------------------------|--|--|
| EV5 team | March 2, 1992        | First pass.                       |  |  |
| EV5 team | November 24,<br>1992 | Update to reflect current design. |  |  |

Table 3–43: Revision History

# **DIGITAL RESTRICTED DISTRIBUTION**

•

# Chapter 4

**External Interface** 

# 4.1 Chip Interface

# Figure 4–1: DECchip 21164-AA System Interface



# DIGITAL RESTRICTED DISTRIBUTION

External Interface 4-1

# 4.1.1 Overview

The DECchip 21164-AA chip is contained in the 503 pin package. All of the extra pins, compared to EV4, are used for power and ground. This means that the system interface will remain a 128 bit bi-directional data bus. The only way to improve the bandwidth of the system interface is to cycle it faster and to use it more often.

The cycle time of the system interface will be some integer multiple of the DECchip 21164-AA cycle time. The minimum multiple is 3x. The maximum multiple is 15x. The tested points between the min. and max. are TBD. The DECchip 21164-AA team will focus on the testing of values that our SYSTEM partners plan to use. Some testing of all possible values will be done.

DECchip 21164-AA can be used to build systems with or without a module level Bcache. The read and write speed of the Bcache can be programmed independantly of the Sysclock ratio and each other. Some care must be taken to make fills and read/read dirty transactions work. The cache system supports a block size of 32 bytes or 64 bytes. The block size is selected by mode bit.

Section 9.1 lists the DECchip 21164-AA signal pins. Figure 4-1 shows a simple picture of the system interface.

Chapter 8 describes the AC requirements for DECchip 21164-AA.

DECchip 21164-AA can take one command/address from the SYSTEM at a time. The Scache and/or Bcache will be probed to determine what must be done with the command. If nothing will be done, the command is ACKed and removed. If a Bcache read, set shared, or invalidate is required it will be done as soon as the Bcache is free. The command will be ACKed at the start of the Bcache transaction.

In general, the DECchip 21164-AA BIU can hold one or two misses and one or two Scache victim address. These four addresses along with the SYSTEM request will ARB for the Bcache. Data movement for the SYSTEM is the highest priority for the Bcache. This includes fill, reading dirty data, invalidates, and set shared. If there are no SYSTEM requests for the Bcache, a DECchip 21164-AA command will be selected.

All transactions between DECchip 21164-AA and the SYSTEM are non-pended, except for fills. DECchip 21164-AA may request up to two fills from memory (if the SYSTEM allows two). Any read or write transaction in the cache must be completed once it is started.

Blocks in the Scache/Bcache that have data movement pending to them will not be read or written by the CPU until the data movement is completed. The SYSTEM will not be prevented from reading or writing blocks in the Scache/Bcache. For example if the CPU has requested a write to a clean block, it will not be allow to access that block until the block until the write completes, but the SYSTEM will always be able to access the block.

The SYSTEM may have one or more Bcache victim buffers. Each time a Bcache victim is produced, DECchip 21164-AA will stop reading the Bcache until the SYSTEM takes the current victim. Bcache operations will then resume.

DECchip 21164-AA requires wrapped reads on INT16 boundaries. The valid wrap orders for 64 byte blocks are selected by bits PA<5:4>, they are:

- 0, 1, 2, 3
- 1, 0, 3, 2
- 2, 3, 0, 1

4–2 External Interface

• 3, 2, 1, 0

For 32 byte blocks the valid wrap orders are selected by PA<4>, they are:

- 0,1
- 1,0

WRITE BLOCK and WRITE BLOCK LOCK commands from DECchip 21164-AA will not be wrapped. They will always write INT16 zero, one, two, and three. BCACHE VICTIM commands will provide the data with the same wrap order as the read miss that produced them.

# 4.1.2 Physical Memory Regions

DECchip 21164-AA physical memory is divided into three regions. The first region is the first half of the physical address space. It is treated by DECchip 21164-AA as memory-like. The second region is the second half of the physical address space except for a 1MByte region reserved for Cbox IPRs. It is treated by DECchip 21164-AA as non-cachable. The third region is the 1Mbyte region reserved for Cbox IPRs.

In the first region, writeback caching, write merging and load merging are all permitted. All DECchip 21164-AA accesses in this region are 32-byte or 64-byte depending on the block size.

DECchip 21164-AA does not cache data accessed in the second and third region of the physical address space. DECchip 21164-AA read accesses in these regions are always 32-byte requests. Load merging is permitted, but the request includes a mask to tell the SYSTEM environment which INT8s are accessed. Write accesses are 32-byte requests, with a mask indicating which INT4s are actually modified. DECchip 21164-AA will never write more than 32-bytes at a time in non-cached space.

DECchip 21164-AA does not emit accesses to the Cbox IPR region if they map to a Cbox IPR. Accesses in this region that are not to a defined Cbox IPR produce UNDEFINED results.

| Region                                        | Address Range                  | Description                                                                                                                   |  |  |
|-----------------------------------------------|--------------------------------|-------------------------------------------------------------------------------------------------------------------------------|--|--|
| memory-like                                   | 0000000000-<br>7FFFFFFFFF(hex) | Writeback cached, load and store merging allowed                                                                              |  |  |
| non-cacheable 800000000-<br>FFFFEFFFF(hex)    |                                | not cached, load merging limited                                                                                              |  |  |
| Cbox IPR region FFFFF00000-<br>FFFFFFFFF(hex) |                                | Cbox IPRs, accesses do not appear on the pins u<br>less an undefined location is accessed (which produc<br>UNDEFINED results) |  |  |

| Table 4–1: Physical Memory Regions | Table | 4–1: | Physical | Memory | Regions |
|------------------------------------|-------|------|----------|--------|---------|
|------------------------------------|-------|------|----------|--------|---------|

# 4.1.3 **Possible Configurations**

The DECchip 21164-AA cache system allows for several system configurations. They can be broken into two classes: those that use the write invalidate cache coherence protocol and those that use the flush based protocol. Table 4–2 shows the components that would make up the system designs that are possible with DECchip 21164-AA.

| System Type      | Scache | Scache Du<br>Tag | plicate<br>Bcache | Bcache Duplic<br>Tag | ate<br>Lock Reg. |
|------------------|--------|------------------|-------------------|----------------------|------------------|
| Write Invalidate | Yes    | No               | No                | No                   | No               |
| Write Invalidate | Yes    | Yes              | No                | No                   | Required         |
| Write Invalidate | Yes    | No               | Yes               | Required             | Required         |
| Flush            | Yes    | No               | No                | No                   | No               |
| Flush            | Yes    | No               | Yes               | No                   | No               |
| Flush            | Yes    | No               | Yes               | Yes                  | Required         |

#### Table 4–2: System Designs

In a write invalidate based design, DECchip 21164-AA will expect the SYSTEM to use the READ DIRTY, READ DIRTY/INVALIDATE, INVALIDATE, and SET SHARED, commands to keep the state of each block up to date.

In a flush based design, DECchip 21164-AA will expect the READ and FLUSH commands to be used to remove blocks from the cache.

# 4.1.4 Maintaining Cache Coherence

In a coherent design, DECchip 21164-AA requires the SYSTEM to have some properties to make things work.

DECchip 21164-AA requires the SYSTEM to allow only one change to a block at a time. This means that if DECchip 21164-AA wins the bus to read or write a block, no other node on the bus will be allowed to access that block until the data has been moved.

If DECchip 21164-AA attempts to write a clean/private block of memory, it will send a SET DIRTY command to the SYSTEM. At the same time the SYSTEM might be sending a SET SHARED or INVALIDATE command to DECchip 21164-AA for the same block. The bus is the coherence point in the SYSTEM, so if the bus has already changed the state of the block to shared, setting the dirty bit is the wrong thing to do. DECchip 21164-AA will not resend the SET DIRTY command when the ownership of the ADDRESS/CMD bus is returned. The write will be restarted and use the new tag state to generate a new system request.

It is also possible for the SYSTEM to send an INVALIDATE at the same time DECchip 21164-AA is attempting to do a WRITE BLOCK or WRITE BLOCK LOCK. In this case DECchip 21164-AA will abort the WRITE BLOCK transaction, service the INVALIDATE, and then restart the WRITE BLOCK transaction.

In both of these cases if the SET DIRTY or WRITE BLOCK is started by DECchip 21164-AA, and then interrupted by the SYSTEM, DECchip 21164-AA will resume the same transaction unless the SYSTEM request was to the same block as the request DECchip 21164-AA had started. In this case the DECchip 21164-AA request will be restarted internally by the CPU and it is unpredictible what transaction DECchip 21164-AA will next present to the system.

DECchip 21164-AA will maintain the processors Dcache as a subset of the Scache. If a Bcache is present, the Scache will be maintained as a subset of the Bcache.

The processors Icache is not a subset of any cache and is incoherent with the rest of the cache system.

# 4.1.5 Cache State

The following tables describe the DECchip 21164-AA multiprocessor cache coherence protocol, a modification of the protocol described in the Laser System Bus Specification Revision 1.2. DECchip 21164-AA will not take an update to a shared block, the block will always be invalidated.

Table 4–3: Cache States

| v | S | D | State of cache line assuming tag match                                                                                                                                                                |
|---|---|---|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0 | x | x | Not valid                                                                                                                                                                                             |
| 1 | 0 | 0 | Valid for read or write. This cache line contains the only cached copy of<br>the block and the copy in memory is identical to this line.                                                              |
| 1 | 0 | 1 | Valid for read or write. This cache line contains the only cached copy of<br>the block. The contents of the block have been modified more recently than<br>the copy in memory.                        |
| 1 | 1 | 0 | Valid for read or write but writes must be broadcast on the bus. This block MAY be in some other CPUs cache.                                                                                          |
| 1 | 1 | 1 | Valid for read or write but write must be broadcast on the bus. This block<br>MAY be in some other CPUs cache. The contents of the block have been<br>modified more recently than the copy in memory. |

# Table 4-4: System Actions

| System<br>Command | Tag Probe<br>Results | Bus Response    | New Cache State | Comments                     |
|-------------------|----------------------|-----------------|-----------------|------------------------------|
| Read              | Miss                 | ~Shared, ~Dirty | No change       |                              |
| Rd_ex             | Miss                 | ~Shared, ~Dirty | No change       |                              |
| Write             | Miss                 | ~Shared, ~Dirty | No change       |                              |
| Read              | Hit, ~Dirty          | Shared, ~Dirty  | Shared, ~Dirty  |                              |
| Read              | Hit, Dirty           | Shared, Dirty   | Shared, Dirty   | This cache supplies the data |
| Rd_ex             | Hit, ~Dirty          | ~Shared, ~Dirty | Invalid         |                              |
| Rd_ex             | Hit, Dirty           | ~Shared, Dirty  | Invalid         | This cache supplies the data |
| Write             | Hit                  | ~Shared, ~Dirty | Invalid         |                              |

| Table 4–5:           | Processor Actions      |                              |              |                 |  |  |
|----------------------|------------------------|------------------------------|--------------|-----------------|--|--|
| Processor<br>Command | Cache Probe<br>Results | DECchip 21164-AA<br>ADDR CMD | Bus Response | New Cache State |  |  |
| Read                 | Invalid                | Read Miss                    | ~Shared      | ~Shared, ~Dirty |  |  |
| Read                 | Invalid                | Read Miss                    | Shared       | Shared, ~Dirty  |  |  |
| Write                | Invalid                | Read Miss Mod                | ~Shared      | ~Shared, Dirty  |  |  |
| Read                 | Miss, ~Dirty           | Read Miss                    | ~Shared      | ~Shared, ~Dirty |  |  |
| Read                 | Miss, ~Dirty           | Read Miss                    | Shared       | Shared, ~Dirty  |  |  |
| Write                | Miss, ~Dirty           | Read Miss Mod                | ~Shared      | ~Shared, Dirty  |  |  |
| Read                 | Miss, Dirty            | Victim, ~Shared<br>Read Miss |              | ~Shared, ~Dirty |  |  |
| Read                 | Miss, Dirty            | Victim,<br>Read Miss         | Shared       | Shared, ~Dirty  |  |  |
| Write                | Miss, Dirty            | Victim,<br>Read Miss Mod     | ~Shared      | ~Shared, Dirty  |  |  |
| Read                 | Hit                    | NOP                          | NOP          | No change       |  |  |
| Write                | Hit, Dirty, ~Shared    | NOP                          | NOP          | ~Shared, Dirty  |  |  |
| Write                | Hit, Dirty, Shared     | Write Block                  | ~Shared      | ~Shared, ~Dirty |  |  |
| Write                | Hit, ~Dirty, ~Shared   | Set Dirty                    | NOP          | ~Shared, Dirty  |  |  |
| Write                | Hit, ~Dirty, Shared    | Write Block                  | ~Shared      | ~Shared, ~Dirty |  |  |

| ľ | al | Die | 4-5 | 5: I | Pro | ces | sor | ACti | ons |
|---|----|-----|-----|------|-----|-----|-----|------|-----|
|   |    |     |     |      |     |     |     |      |     |

If DECchip 21164-AA requests a READ MISS MOD, DECchip 21164-AA expects the block to be returned ~shared, dirty. However, if the system returns the data shared, ~dirty DECchip 21164-AA will follow with a WRITE BLOCK command. Doing this might expose the system to livelock problems.

#### 4.1.6 **DECchip 21164-AA Interface**

The interface can be divided into two parts. The SYSTEM interface and the Bcache interface. Both parts share the data bus.

The SYSTEM interface is made up of a bi-directional command and address bus, and several control signals. They are described in Section 4.1.6.1. The Bcache interface signals are described in Section 4.1.6.2.

# 4.1.6.1 System Interface

These are the signals that make up the SYSTEM interface. All are driven and received by DECchip 21164-AA on the rising edge of Sysclock.

• ADDR\_H<39:4>

**Bi-directional** 

This is the address of the requested data or operation. If bit 39 is asserted, the reference is to non-cached memory.

• CMD\_H<3:0>

**Bi-directional** 

Table 4-6 lists the encodings for the commands that DECchip 21164-AA can drive on the CMD bus. Optional commands can be disabled in systems that do not require them. It is unpredictable if DECchip 21164-AA will drive a disabled command to the SYSTEM, however, no CACK should ever be sent for a disabled command.

| CMD<3:0> | Command          | Optional | Comments                                         |
|----------|------------------|----------|--------------------------------------------------|
| 0000     | NOP              | No       | Nothing                                          |
| 0001     | LOCK             | Yes      | New lock register address                        |
| 0010     | FETCH            | No       | DECchip 21164-AA passing a FETCH to the system   |
| 0011     | FETCH_M          | No       | DECchip 21164-AA passing a FETCH_M to the system |
| 0100     | MEMORY BARRIER   | Yes      | MB instruction                                   |
| 0101     | SET DIRTY        | Yes      | Dirty bit will be set if shared is still clear   |
| 0110     |                  |          | spare                                            |
| 0111     |                  |          | spare                                            |
| 1000     | READ MISS0       | No       | Request for data                                 |
| 1001     | READ MISS1       | No       | Request for data                                 |
| 1010     | READ MISS MOD0   | No       | Request for data, modify intent                  |
| 1011     | READ MISS MOD1   | No       | Request for data, modify intent                  |
| 1100     | BCACHE VICTIM    | No       | Bcache victim should be removed                  |
| 1101     |                  |          | spare                                            |
| 1110     | WRITE BLOCK      | No       | Request to write a block                         |
| 1111     | WRITE BLOCK LOCK | No       | Request to write a block with lock               |

| Table 4–6: | DECchip | 21164-AA | Command | s to t | he S | System |
|------------|---------|----------|---------|--------|------|--------|
|------------|---------|----------|---------|--------|------|--------|

Table 4-7 lists the encodings for the commands that the SYSTEM can drive on the CMD bus.

| Iavie 4-1. | System commands to DEccmp 21104-AA |                                             |  |  |
|------------|------------------------------------|---------------------------------------------|--|--|
| CMD<3:0>   | Command                            | Comments                                    |  |  |
| 0000       | NOP                                | Nothing                                     |  |  |
| 0001       | FLUSH                              | Remove block from caches, return dirty data |  |  |
| 0010       | INVALIDATE                         | Remove the block                            |  |  |
| 0011       | SET SHARED                         | Block goes to the shared state              |  |  |
| 0100       | READ                               | Read a block                                |  |  |
| 0101       | READ DIRTY                         | Read a block, set shared                    |  |  |
| 0111       | <b>READ DIRTY/INV</b>              | Read a block, invalidate                    |  |  |
|            |                                    |                                             |  |  |

| Table 4-7: | System | Commands to | ) DECchip | 21164-AA |
|------------|--------|-------------|-----------|----------|
|------------|--------|-------------|-----------|----------|

#### • ADDR\_CMD\_PAR\_H

#### **Bi-directional**

This is the odd parity on the current command and address bus. DECchip 21164-AA will take a machine check if a parity error is detected. The SYSTEM should do the same if it detects an error.

• VICTIM\_PENDING\_H

#### Output

Indicates that the current read miss had generated a victim. Systems may want to hold off requesting the command/address bus until the victim has been removed.

- ADDR\_BUS\_REQ\_H
  - Input

If this signal is asserted before the rising edge of a Sysclock, DECchip 21164-AA will not drive the ADDRESS or CMD busses during the next cycle.

- CACK\_H
  - Input

If this signal is asserted before the rising edge of a Sysclock, DECchip 21164-AA will drive the next address and cmd during the next cycle.

CFAIL\_H

Input

CFAIL has two uses. It should be used during the CACK cycle of a WRITE\_BLOCK\_LOCK command to indicate that the write has failed. It can also be used in cycles were CACK is not asserted to force an Ibox timeout event which, in turn, causes a partial reset of DECchip 21164-AA and will trap to the MCHK PAL code entry point.

• RES\_H<1:0>

Output

Table 4–8 lists the encoding of DECchip 21164-AA responses to SYSTEM requests.

| RES<1:0> | Command    | Comments                |
|----------|------------|-------------------------|
| 00       | NOP        | Nothing                 |
| 01       | NOACK      | Data not found or clean |
| 10       | ACK/Scache | Data from Scache        |
| 11       | ACK/Bcache | Data from Bcache        |

Table 4–8: DECchip 21164-AA Responses to System Commands

#### INT4\_VALID\_H<3:0>

Output

During writes, these wires are used to indicate which INT4 of data are valid. This is useful for non-cached writes that have been merged in the write buffer. During reads, these wires indicate which INT8 bytes of a 32 byte block need to be read and returned to the processor. This is useful for reads to non-cached memory.

• SCACHE\_SET\_H<1:0>

#### Output

During a read miss request, these pins will indicate the Scache set number that will be filled when the data is returned. This information can be used by the SYSTEM to maintain a duplicate copy of the Scache tag store.

• FILL\_H

Input

If this signal is asserted in Sysclock N, DECchip 21164-AA will provide the address indicated by the FILL ID to the Bcache in Sysclock N+2. The Bcache will begin to write in that Sysclock. At the end of the write, DECchip 21164-AA will wait for the next Sysclock and then begin the write again (It may take more than one Sysclock to write the Bcache).

• FILL\_ID\_H

Input

If this signal is asserted in Sysclock N, DECchip 21164-AA will provide the address from miss register 1. If it is deasserted, the address in miss register zero will be used for the fill.

• FILL\_ERROR\_H

Input

If this signal is asserted while a fill is pending from memory, it will indicate to DECchip 21164-AA that system has detected an invalid address or hard error. System will still provide an apparently normal fill sequence with correct ECC/parity though the data is not valid. DECchip 21164-AA will trap to the MCHK PAL code entry point.

• DACK H

Input

For Fills, if this signal is asserted before the Sysclock edge, it will indicate to DECchip 21164-AA that fill data was valid that Sysclock and DECchip 21164-AA should switch to the next address at the Sysclock edge.

For writes, if this signal is asserted before the Sysclock edge, it indicates that DECchip 21164-AA should provide the next address and data at the Sysclock edge.

• FILL\_NOCHECK\_H

Input

Do not check the parity or ECC for the current data cycle on a fill.

#### DIGITAL RESTRICTED DISTRIBUTION

External Interface 4-9

# • SYSTEM\_LOCK\_FLAG\_H

Input

This wire indicates the state of the system lock flag. During Fills, DECchip 21164-AA will AND the value of the system copy with its own copy to produce the true value of the lock flag.

IDLE\_BC\_H

Input

When this wire is asserted, DECchip 21164-AA will finish the current Bcache read or write. The CPU will not be allowed to start a new read or write until the wire is deasserted. Systems must assert this wire in time to idle the Bcache before a fill arrives. It can also be used to improve the response time of DECchip 21164-AA to SYSTEM requests.

The time required to idle the Bcache is a function of the internal design of DECchip 21164-AA, the block size, the read and write speed of the Bcache, the amount of tri-state overlap that must be avoided, and the Sysclock ratio. Take the larger of:

read\_idle = 3 + (block\_size/16)\*BC\_RD\_SPD + tri-state\_ram\_turn\_off
or
write\_idle = 5 + (block\_size/16)\*BC\_WRT\_SPD + tri-state\_cpu\_turn\_off

and round up to the next Sysclock value. This is the number of Sysclocks required between DECchip 21164-AA receiving IDLE\_BC until the Bcache will be idle.

For example if the Sysclock ratio is 6, BC\_RD\_SPD is 4, BC\_WRT\_SPD is 5, block size is 32B, and two idle CPU cycle are required to turn off the RAM drivers, for reads, and zero are required to turn off DECchip 21164-AA's write drivers, then it will take  $\max(3+2^*4+2, 5+2^*5+0)/6 = 3$  Sysclocks to idle the cache. If IDLE\_BC is asserted in Sysclock N, then the first fill data could be written in Sysclock N+3.

For FILL requests, IDLE\_BC can be de-asserted any time after the fill starts.

DATA\_BUS\_REQ\_H

Input

If this signal is asserted in Sysclock N, DECchip 21164-AA will not drive the data bus in Sysclock N+2. Before asserting this signal the system should assert IDLE\_BC for the correct number of cycles. If this signal is deasserted in Sysclock N, DECchip 21164-AA will drive the data bus in Sysclock N+2.

# 4.1.6.2 Bcache Interface

These signals make up the Bcache interface. Reads and writes of the Bcache that do not involve the SYSTEM will begin on any CPU clock. If the Bcache read or write involves receiving or sending data to the SYSTEM, then the access will begin on a rising Sysclock edge.

• INDEX\_H<25:4>

Output

These wire are used to index the Bcache.

• DATA\_H<127:0>

**Bi-directional** 

This bus is used to move data between DECchip 21164-AA, the Bcache, and the SYSTEM.

DATA\_CHECK\_H<15:0>

**Bi-directional** 

Either even byte parity or INT8 ECC for the current data cycle.

4–10 External Interface

# DIGITAL RESTRICTED DISTRIBUTION

- TAG\_DATA\_H<38:22>
- **Bi-directional**

Bcache tag data bits. This allows for Bcaches in the 4MB to 64MB range.

• TAG\_DATA\_PAR\_H

Bi-directional Odd parity for TAG\_DATA\_H<38:22>, the SYSTEM should force unused bits to zero.

• TAG\_VALID\_H

Bi-directional

The current tag contains a valid block. DECchip 21164-AA will assert this pin during fills.

• TAG\_SHARED\_H

# **Bi-directional**

The block is in the shared state. During fills the SYSTEM should drive TAG\_SHARED\_H with the correct value.

• TAG\_DIRTY\_H

**Bi-directional** 

The block is in the dirty state. During fills the SYSTEM should assert this bit if the DECchip 21164-AA request was a READ MISS MOD, and the shared bit is not asserted.

- TAG\_CTL\_PAR\_H
  - **Bi-directional**

Odd parity for TAG\_VALID\_H, TAG\_SHARED\_H, and TAG\_DIRTY\_H. During fills the system should drive the correct parity based on the state of the V, S and D bits.

TAG\_RAM\_OE\_H

Output

This signal will be asserted by DECchip 21164-AA during any Bcache read.

• TAG\_RAM\_WE\_H

#### Output

This signal will be asserted by DECchip 21164-AA, using the write pulse register contents, during any tag write. During the first CPU cycle of a write, the write pulse will be de-asserted. In the second and following CPU cycles of the write, the write pulse will be asserted if the corresponding bit in the write pulse register is asserted.

• DATA\_RAM\_OE\_H

Output

This signal will be asserted by DECchip 21164-AA during any Bcache read.

DATA\_RAM\_WE\_H

Output

This signal will be asserted by DECchip 21164-AA, using the write pulse register contents, during any data write. During the first CPU cycle of a write, the write pulse will be deasserted. In the second and following CPU cycles of the write, the write pulse will be asserted if the corresponding bit in the write pulse register is asserted.

# 4.1.7 DECchip 21164-AA Interface Command Descriptions

#### • FETCH/FETCH\_M

From DECchip 21164-AA

These commands are issued by DECchip 21164-AA when the FETCH and FETCH\_M instructions are executed.

# • FLUSH

From SYSTEM

The FLUSH command will cause a block to be removed from the DECchip 21164-AA cache system. If the block is not found, DECchip 21164-AA will respond with NOACK. If the block is found and the block is clean, DECchip 21164-AA will respond with NOACK. The block will be invalidated in the Dcache, Scache, and Bcache. If the block is found and dirty, DECchip 21164-AA will respond with ACK/Scache or ACK/Bcache. If the data was found dirty in the Scache it will be driven at the pins in the same Sysclock as the ACK/Scache. If the data is found dirty in the Bcache, the Bcache read will start on the same Sysclock as ACK. The block will be invalidated in the Dcache, Scache, and Bcache.

• LOCK

#### From DECchip 21164-AA

This command is used to load the System lock register. The state of the SYSTEM lock register flag is used on each fill to update the DECchip 21164-AA copy of the lock flag. See Section 4.1.8.12 for the full story.

#### • MEMORY BARRIER

#### From DECchip 21164-AA

This command is issued by DECchip 21164-AA to synchronize read and write accesses with other processors in the SYSTEM. DECchip 21164-AA issues this command when a MB instruction is executed. DECchip 21164-AA will stop issuing memory reference instructions and wait for the command to be acknowledged before continuing.

• NOP

#### From DECchip 21164-AA or SYSTEM

Nothing. This command should be driven by the owner of the CMD bus if it has nothing to do.

• READ

#### From SYSTEM

The READ command will probe the Scache and Bcache to see if the requested block is present. If the block is present, DECchip 21164-AA will respond with ACK/Scache or ACK/Bcache. If the data is in Scache, the data will be driven on the DATA bus in the same Sysclock as the ACK. If the data is in the Bcache, a Bcache read will begin in the same Sysclock as the ACK. If the block is not present in either cache, DECchip 21164-AA will assert NOACK on the RES wires.

#### • READ DIRTY

#### From SYSTEM

The READ DIRTY command will probe the Scache to see if the requested block is present and dirty. If the block is not found, or the block is clean, and the SYSTEM does not contain a Bcache, DECchip 21164-AA will respond with a NOACK. If the block is found and dirty in the Scache, DECchip 21164-AA will respond with ACK/Scache and drive the data on the DATA bus. If the block is not found in the Scache, and the SYSTEM contains a Bcache, it is

4–12 External Interface

assumed to be in the Bcache. DECchip 21164-AA will respond with ACK/Bcache, index the Bcache to read the block, and will change the block status to the shared dirty state.

• READ DIRTY INVALIDATE

#### From SYSTEM

This command is identical to the READ DIRTY command except if the block is present it will be invalidated from the caches.

- READ MISSn
  - From DECchip 21164-AA

This command is used to indicate that DECchip 21164-AA has probed its caches and that the addressed block was not present.

READ MISS MODIFYn

# From DECchip 21164-AA

This command is used to indicate that DECchip 21164-AA plans to write to the returned cache block. Normally the dirty bit should be set when the tag status is returned to DECchip 21164-AA.

• SET SHARED

#### From SYSTEM

The SET SHARED command is used by the SYSTEM to change the state of a block in the cache system to shared. The shared bit in the Scache will be set if the block is present. The Bcache tag will be written to the shared not dirty state. DECchip 21164-AA assumes that this is ok, because the SYSTEM would have sent a READ DIRTY if the dirty bit were set.

If the block is found in the Scache, DECchip 21164-AA will respond with ACK/Scache. Otherwise, if the SYSTEM contains a Bcache, the block is assumed to be in the Bcache and DECchip 21164-AA will respond with ACK/Bcache. If the SYSTEM does not contain a Bcache and the block is not found in the Scache, DECchip 21164-AA will respond with a NOACK.

SET DIRTY

#### From DECchip 21164-AA

DECchip 21164-AA wants to write a clean, private block in its Scache and wants the dirty bit set in the duplicate tag store. The CPU will not proceed with the write until an CACK response is received from the SYSTEM. When the CACK is received, DECchip 21164-AA will attempt to set the dirty bit. If the shared bit is still clear the dirty bit will be set and the write completed. If the shared bit is set the dirty bit will not be set, and DECchip 21164-AA will request a WRITE BLOCK. The copy of the dirty bit in the Bcache will not be updated until the block is removed from the Scache.

#### • INVALIDATE

#### From SYSTEM

DECchip 21164-AA will probe the Scache and invalidate the block if it is present. If the Bcache is present the block will be changed to the invalid state without probing.

If the block is found in the Scache, DECchip 21164-AA will respond with ACK/Scache. Otherwise, if the SYSTEM contains a Bcache, the block is assumed to be in the Bcache and DECchip 21164-AA will respond with ACK/Bcache. If the SYSTEM does not contain a Bcache and the block is not found in the Scache, DECchip 21164-AA will respond with a NOACK.

• BCACHE VICTIM

### From DECchip 21164-AA

If there is a victim buffer in the SYSTEM, this command is used to pass the address of the victim to the SYSTEM. The read miss that produced the victim will preceed the BCACHE VICTIM command. The VICTIM\_PENDING wire will be asserted during the read miss command to indicate that a BCACHE victim command is waiting, and that the Bcache is starting the read of the victim data.

If the SYSTEM does not have a victim buffer the BCACHE VICTIM command will preceed the read miss commands. The BCACHE VICTIM command will be driven, along with the address of the victim. At the same time the Bcache will be read to provide the victim data.

• WRITE BLOCK

#### From DECchip 21164-AA

DECchip 21164-AA wants to write a block of data back to memory. DECchip 21164-AA will drive the command, address, and first INT16 of data on a Sysclock edge. DECchip 21164-AA will output the next INT16 of data when a DACK is received. When the SYSTEM asserts CACK, DECchip 21164-AA will remove the command and address from the pins and begin the write of the Scache. CACK can be asserted before all the data is removed.

• WRITE BLOCK LOCK

From DECchip 21164-AA

This command is the same as a WRITE BLOCK except that a CFAIL may be asserted by the SYSTEM to indicate that the data can not be written. this command is only used for STx\_C in non-cached space.

# 4.1.8 Transactions

This section will describe how the commands are used to move data in and out of DECchip 21164-AA and its cache system.

Figure 4–1 shows the resources that can be used by the CPU and SYSTEM. They are listed here.

- 2 CPU commands and addresses
- 2 Scache victim address
- 2 System command and address

#### 4.1.8.1 Read Miss

DECchip 21164-AA will start a Bcache read on any CPU clock. The index will be asserted to the RAM for a programmable number of CPU cycles in the range of 4 to 10. The tag will be accessed at the same time. At the end of the first read, DECchip 21164-AA will latch the data and tag information and begin the read of the next 16 bytes of data. The tag will be checked for a hit. If there is a miss, a READ MISS or READ\_MISS\_MOD command along with the address will be queued to the CMD/ADDRESS bus. It will appear on the pins at the next Sysclock edge. Figure 4–2 shows the timing of a Bcache read and the resulting READ MISS request.

Figure 4–2 shows the READ MISS command being CACKed as soon as it is sent. This will allow DECchip 21164-AA to make additional READ MISS requests. It is also possible for the SYSTEM to defer the CACK until the fill data is returned. This allows the SYSTEM to use CMD<0> for the value of FILL\_ID. The CACK should arrive no later than the last fill DACK.





#### 4.1.8.2 Read Miss with victim

DECchip 21164-AA supports two models for removing displaced dirty blocks from the Bcache. The first assumes that the SYSTEM does not contain a victim buffer. In this case the victim must be read from the Bcache before the new block can be requested. In the second case, if the SYSTEM does have a victim buffer, DECchip 21164-AA will request the new block from memory while it starts to read the victim from the Bcache. The victim command and address will follow the miss request.

In either case, DECchip 21164-AA treates a miss/victim as single transaction. If the assertion of ADDR\_BUS\_REQ or IDLE\_BC causes the BIU sequencer to reset, both the miss and victim transactions will be restarted from the begining. For example if DECchip 21164-AA is operating in victim first mode and it sends a BCACHE VICTIM command to the SYSTEM and then the system sends an INVALIDATE to DECchip 21164-AA, DECchip 21164-AA will restart the Bcache read and resend the BCACHE VICTIM command and data and then the READ\_MISS.

The next two sections describe each of these methods of victim processing.

# 4.1.8.2.1 Without a Victim Buffer

If the SYSTEM does not contain a victim buffer, DECchip 21164-AA will stop reading the Bcache as soon as the miss is detected. This will be sometime during the second read. A BCACHE VICTIM command will be asserted at the next Sysclock along with the victim address. A Bcache read of the victim will also be started at the Sysclock edge. When the DACK is received for the first part of the victim, DECchip 21164-AA will begin reading the next part of the victim. CACK can be sent anytime during the processing of the victim. DECchip 21164-AA will send out the READ MISS command in the Sysclock after the CACK is received. Figure 4–3 shows the timing of a victim being removed.



#### Figure 4–3: Read Miss with Victim

#### 4.1.8.2.2 With a Victim Buffer

When the miss is detected, if the SYSTEM has a victim buffer, DECchip 21164-AA will wait for the next Sysclock edge and then assert a READ MISS command, the read miss address, the VICTIM\_PENDING wire, and index the Bcache to begin the read of the victim. When the SYSTEM asserts CACK, DECchip 21164-AA will send out the BCACHE VICTIM command along with the victim address. Each assertion of DACK will cause the Bcache index to advance to the next part of the block. Figure 4-4 shows the timing of a read miss with a victim.



#### Figure 4-4: Read Miss with Victim Buffer

#### 4.1.8.3 Fill

The fill wires are used to control the return of fill data to DECchip 21164-AA and the Bcache if it is present. The IDLE\_BC\_H wire must be used to stop CPU requests in the Bcache in such a way that the Bcache will be idle when the fill data arrives (but not the fill command). FILL\_H should be asserted at least two Sysclocks before the fill data arrives. The FILL\_ID\_H wire should

4-16 External Interface

be asserted at the same time to indicate if the fill will be for a READ MISS0 or READ MISS1. DECchip 21164-AA will use this information to select the correct fill address. If FILL and FILL\_ID are asserted at the end of Sysclock N, then DECchip 21164-AA will assert the Bcache index and begin a Bcache write during Sysclock N+2. The SYSTEM should drive the data onto the DATA bus and assert DACK before the end of the Sysclock cycle. This will cause DECchip 21164-AA to move on to the next fill address and begin another write of the Bcache. The SYSTEM must allocate the right number of Sysclock cycles to allow the writing of the Bcache if it is present. For example if the Bcache requires 17ns to write and the Sysclock is 12ns, two Sysclock cycles will be required for each write.

During the first fill of a block, the SYSTEM should also drive the correct values on the TAG\_SHARED, TAG\_DIRTY, and, TAG\_PARITY wires. DECchip 21164-AA will assert TAG\_VALID and write the Bcache tag store during the first fill.





#### 4.1.8.4 Write Block

The WRITE BLOCK command will be used to complete writes to shared data, to remove Scache victims in Bcache-less systems, and to complete writes to non-cached memory.

The WRITE BLOCK LOCK command follows the same protocol. The LOCK qualifier might allow the SYSTEM to be more aggressive on non-interlocked writes.

DECchip 21164-AA will assert the WRITE BLOCK command along with the address and the first 16 bytes of data at the start of a Sysclock. If the SYSTEM takes away the ownership of the CMD and ADDRESS bus, DECchip 21164-AA will hold on to the write and wait for the ownership of the bus to be returned. If the block in question is invalidated, the write will be restarted by the CPU and will result in the READ MISS MOD request instead.

When the SYSTEM has taken the first part of the data it should assert DACK. This will cause DECchip 21164-AA to drive the next 16 bytes of data at the next Sysclock edge.

If the SYSTEM asserts CACK, DECchip 21164-AA will output the next command in the next Sysclock. Receiving the CACK indicates to DECchip 21164-AA that the write will be taken and that it is safe to update the Scache with write data.

During each cycle the INT4\_VALID\_H<3:0> wires will indicate which INT4 parts of the write are really being written by the processor. For writes to cached memory, all of the data will be valid. For writes to non-cached memory, only those INT4 with the INT4\_VALID\_H<n> signal asserted are valid.

Figure 4–6 shows the timing of a write block command.

| FI | gure | <b>9 4</b> ( | 6: | Write | Block |
|----|------|--------------|----|-------|-------|
|----|------|--------------|----|-------|-------|

| SYSCLOCK = 5 CPU                                   | CYCLES, | BCACHE - | 4 CPU CLO | OCKS | <br>  |     |  |
|----------------------------------------------------|---------|----------|-----------|------|-------|-----|--|
| SYSCLOCK<br>CMD<br>Addr<br>CACK                    |         |          |           |      |       |     |  |
| EV5_RES<br>CFAIL<br>FILL<br>DACK<br>VICTIM PENDING |         |          |           |      |       |     |  |
| BUS REQ<br>CPUCLOCK<br>INDEX<br>DATA               |         | ·····    |           |      | ····· | uuu |  |

#### 4.1.8.5 Set Dirty, Lock

Figure 4-7 shows the timing of a SET DIRTY command and a LOCK command.

The SET DIRTY command is used by DECchip 21164-AA to inform a duplicate tag store that a cached block is changing from the not-shared clean state to the not-shared dirty state. When the CACK is received from the SYSTEM, DECchip 21164-AA will attempt to set the dirty bit. If the shared bit has been set since the original probe of the Scache, or the block has been invalidated, DECchip 21164-AA will restart the write. This will produce a new request which reflects the new state of the block. If the block is still in the not-shared clean state, the dirty bit will be set and the write completed.

The LOCK command is used by DECchip 21164-AA to pass the address of a LDx\_L to the SYSTEM. A system lock register is required in any system that filters write traffic with a duplicate tag store. If the locked block is displaced from the DECchip 21164-AA caches, DECchip 21164-AA will use the value of the system lock register to determine if the LDx\_L/STx\_C sequence should pass or fail.

#### 4-18 External Interface





# 4.1.8.6 Flush

The FLUSH command can be used to remove blocks from the DECchip 21164-AA cache system. If the block is dirty, the block will be read from the caches to allow the updating of memory. Figure 4–8 shows the timing of a FLUSH transaction.

#### Figure 4–8: Flush



#### 4.1.8.7 Read Dirty, and Read Dirty/INV

The READ DIRTY command is used to read modified data from the cache system. The block is also transitioned into the shared state. Figure 4–9 shows the timing of a READ\_DIRTY transaction. The Scache will be probed and the data read if it is found. The state will also be set to shared. If the data is not found in the Scache, it is assumed to be in the Bcache. DECchip 21164-AA will start the read of the Bcache and write the tag to the shared state.

The READ DIRTY/INV command is identical to the READ DIRTY command except the block is transitioned to the invalid state instead of the shared state.



# Figure 4–9: Read Dirty

# 4.1.8.8 Invalidate

The INVALIDATE command can be used to remove a block from the cache system. Unlike the FLUSH command, any modified data will not be read. The Scache will be probed and invalidated if the block is found. The Bcache will be invalidated without probing. Figure 4-10 shows the timing of an INVALIDATE transactions.

| CLOCK       |                     |            |      |            |    |   |   |          |
|-------------|---------------------|------------|------|------------|----|---|---|----------|
| 1           | $ \longrightarrow $ | INVALIDATE |      |            |    |   |   |          |
| R           | $ \longrightarrow $ | INVLD_ADDR | <    |            |    |   |   |          |
| ĸ           |                     | i<br>      |      | !<br>      |    |   |   | <u>i</u> |
| _RES        | 0.000               |            | l    | ACK_SCACHE |    |   |   |          |
| ALL         |                     |            |      |            |    |   |   |          |
| L           |                     | <u> </u>   | 1    |            |    |   |   | <u> </u> |
| ĸ           |                     |            |      |            |    |   |   | <u> </u> |
| TIM PENDING | ·····               | i          | į    | l          |    |   |   | <u> </u> |
| REQ         |                     | 1<br>!     | ·    |            |    |   |   | <u> </u> |
| CLOCK       | ா                   | h          | hunn | hnnn       | mm | h | h | h        |
| EX          |                     |            |      | WLD ADDH   |    |   |   | ÷        |

#### Figure 4–10: Invalidate

4-20 External Interface

#### 4.1.8.9 Set Shared

When DECchip 21164-AA revieves a SET\_SHARED command, it will probe the Scache and change the state of the block to shared if it is found. DECchip 21164-AA will assume that the block is in the Bcache and write the state of the tag to shared, not-dirty. Figure 4–11 shows the timing of a SET\_SHARED command.



| SYSCLOCK = 5 CPL               | J CYC | LES     | , BCACHE              | - 4 | CPU CLC | OCKS | •                                     |     |     |          |
|--------------------------------|-------|---------|-----------------------|-----|---------|------|---------------------------------------|-----|-----|----------|
| SYSCLOCK<br>CMD<br>ADDR        |       |         | SET_SHARE<br>SHR_ADDR |     |         |      |                                       |     |     |          |
| CACK<br>EV5_RES<br>CFAIL       |       |         |                       |     |         |      | · · · · · · · · · · · · · · · · · · · |     |     |          |
| FILL<br>DACK<br>VICTIM PENDING |       |         |                       |     |         |      |                                       |     |     |          |
| BUS REQ<br>CPUCLOCK<br>INDEX   | קרוו  | rt<br>T | uuu                   |     |         |      | www                                   | www | uuu |          |
| DATA                           |       | -       |                       |     |         |      | 1                                     |     |     | <u>.</u> |

#### 4.1.8.10 Non-cached Reads

Reads to physical addresses that have bit 39 asserted will not be cached in the Dcache, Scache, or Bcache. They will be merged like any other read in the miss address file. To prevent several reads to non-cached memory from being merged into a single 32 byte bus request, software must insert MB instructions. The miss address file will merge as many Dstream reads together as it can and send the request to the BIU via the Scache. The BIU will not merge two 32 byte requests into a single 64 byte request. The BIU will request a READ MISS from the SYSTEM. DATA\_VALID<3:0>\_H will indicate which of the four quadwords are being requested by software. The SYSTEM should return the fill data to DECchip 21164-AA in the normal way. DECchip 21164-AA will not write the Dcache, the Scache, or the Bcache with the refill. The requested data will be written in the register file or Icache.

#### 4.1.8.11 Non-cached Writes

Writes to physical addresses that have bit 39 asserted will not be written to any of the caches. They will be merged in the write buffer before being sent to the SYSTEM. If software does not want writes to merge it must insert MB or WMB instructions between them.

When the write buffer decides to dump data to non-cached memory the BIU will request a WRITE BLOCK. Each data cycle, DATA\_VALID<3:0> will indicate which INT4s within the INT16 were really written.

### 4.1.8.12 Locks

The LDx\_L instructions will be forced to miss in the Dcache. When the Scache is read, the Lock register in the BIU will be loaded with the physical address and the lock flag set. The BIU will send a LOCK command to the SYSTEM so it can load its lock register. The SYSTEM lock register will only be used if the locked block is displaced from the cache system. The lock flag will be cleared if any of the following things happen:

- Any write from the bus occurs to the locked block (FLUSH, INVALIDATE, or READ\_DIRTY\_INV).
- A STx\_C by the processor.

The SYSTEM copy of the lock register is required on systems that have a duplicate tag store to filter write traffic. The direct mapped Icache, Dcache, and Bcache along with the sub-setting rules, branch prediction, and Istream prefetching can cause a lock to always fail because of constant Scache thrashing of the locked block. Each time a block is loaded into the Scache, the value of the lock register will be ANDed with the value of the SYSTEM\_LOCK\_FLAG signal. If the locked block is displaced from the cache system, DECchip 21164-AA will not see bus writes to the locked block, in this case the SYSTEM's copy of the lock register will correct the processor copy of the lock flag when the block is filled into the cache via the signal SYSTEM\_LOCK\_FLAG\_H.

Systems that do not have a duplicate tag stores, and send all probe traffic to DECchip 21164-AA are not required to have a copy of the lock flag. They should wire the SYSTEM\_LOCK\_FLAG\_H to TRUE.

When the STx\_C is issued the Ibox will stop issuing memory type instructions. The store will update the Dcache in the normal way, and be placed in the write buffer by itself. It will not be merged with other pending writes. The write buffer will be flushed.

When the write buffer gets to a STx\_C in cached memory, it will probe the Scache to check the block state. When the STx\_C passes through the Scache, an invalidate will be sent to the Dcache. If the Lock flag is clear, the STx\_C will fail. If the block is not-shared dirty, the write buffer will write the STx\_C data into the Scache. Success will be written to the register file and the Ibox will begin issuing memory instructions again. If the block is in the shared state, the BIU will request a WRITE BLOCK LOCK. If the WRITE BLOCK LOCK is CACKed, the Scache will be written and the Ibox started as above. If the WRITE BLOCK LOCK is CFAILed, the STx\_C will fail. No data will be written.

When the write buffer gets to a STx\_C in non-cached memory it will probe the Scache to check the block state. It will miss. The state of the Lock flag will be ignored. The BIU will request a WRITE BLOCK LOCK. If the WRITE BLOCK LOCK is CACKed, the Ibox is started as above. If the WRITE BLOCK LOCK is CFAILed the STx\_C will fail. No data will be written.

# 4.1.9 Clocks

#### 4.1.9.1 CPU Clock

External logic will supply DECchip 21164-AA with a differential clock at twice the desired internal clock frequency via the CLK\_IN\_H and CLK\_IN\_L pins. DECchip 21164-AA divides this clock by two to generate the internal chip clock.

### 4.1.9.2 System Clock

The CPU clock is divided by a programmable value between 3 and 15 to generate a system clock, which is supplied to the external interface via the SYS\_CLK\_OUT1\_H,L pins. Table 5–1 for the valid ratios of System clock to CPU clock.

SYS\_CLK\_OUT1 is delayed by a programmable number of CPU cycles between 0 and 7 to produce SYS\_CLK\_OUT2\_H, L.

The output of the programmable divider is symmetric if the divisor is even, and asymmetric with SYS\_CLK\_OUT1\_H and SYS\_CLK\_OUT2\_H TRUE for one extra CPU cycle if the divisor is odd.

The false-to-true transition of the SYS\_CLK\_OUT1\_H is the "Sysclock" used as a timing reference throughout the specification.

### 4.1.9.3 Reference Clock

The SYSTEM may supply a reference clock to which DECchip 21164-AA will synchronize SYS\_ CLK\_OUT1\_H. To do this the frequency of SYS\_CLK\_OUT1 must be ever so slightly higher than that of REF\_CLK\_IN. This will cause the rising edge of SYS\_CLK\_OUT1 to drift back towards the rising edge of REF\_CLK\_IN. DECchip 21164-AA will detect when the edges meet and stall the internal clock generator for one CLK\_IN cycle. This will move the rising edge of SYS\_CLK\_OUT back in front of REF\_CLK\_IN. Figure 4-12 attempts to show this timing.

#### Figure 4–12: Reference Clock Timing

#### 4.1.9.4 Sysclock to Bcache cycle time ratios

The Bcache cycle time may be faster, the same, or slower than the Sysclock.

Reads and writes that are private to DECchip 21164-AA and the Bcache may start on any CPU clock. There is no relation between the Sysclock and the Bcache accesses.

If the SYSTEM is involved in a Bcache transaction, each read or write will start on a Sysclock. It is up to the SYSTEM to control the rate of the Bcache transactions using the DACK wire.

The Bcache will be written during WRITE BLOCK, WRITE BLOCK LOCK, READ DIRTY, and READ DIRTY INV commands that source data from the Scache. The write of the first part of the block will start in the Sysclock that drove the command/response and address to the SYSTEM. The SYSTEM must allow enough time for the write to complete before asserting DACK. The next write will start on the Sysclock edge that DACK was asserted on.

When DECchip 21164-AA receives the fill indication for the SYSTEM it will start writing the Bcache in the N+2 Sysclock. At the end of the write time, DECchip 21164-AA will wait for the next Sysclock edge. If DACK is not asserted, the Bcache write will begin again at the same index. If DACK is asserted, the index will advance to the next part of the fill and the write will begin again. The SYSTEM must provide the data and DACK signal at the correct Sysclock edges to complete the fill correctly.

# 4.1.10 Tri-state Overlap

The ADDRESS/CMD bus and the DATA/TAG bus must be operated in a way that prevents more than one driver from driving the bus at a time. This section will describe the features in DECchip 21164-AA that might be used to prevent tri-state overlap.

The owner of each bus must drive the bus to some value each cycle.

In general DECchip 21164-AA assumes that its drivers turn on and off very fast (0.5ns to 1ns range). SRAMs turn on and off slowly. System drivers fall someplace in the middle.

Figure 4–13 shows DECchip 21164-AA and the SYSTEM taking turns driving the CMD/ADDRESS bus. If ADDR\_BUS\_REQ is asserted at the end of a Sysclock 0, the next cycle on the CMD/ADDRESS bus belongs to the SYSTEM. DECchip 21164-AA will turn off it's drivers at the start of the Sysclock 1. The SYSTEM must turn on it's drivers during Sysclock 1, but must insure that the driver doesn't turn on before DECchip 21164-AA turns off. DECchip 21164-AA will sample the state of the CMD/ADDRESS bus at the end of Sysclock 1.

If ADDR\_BUS\_REQ remains asserted, the SYSTEM should continue to drive the CMS/ADDRESS bus.

To pass the bus back to DECchip 21164-AA, the SYSTEM should turn off its drivers during a Sysclock and de-assert ADDR\_BUS\_REQ. DECchip 21164-AA will not sample the state of the bus if ADDR\_BUS\_REQ is de-asserted. At the next Sysclock edge, DECchip 21164-AA will drive the bus.

#### Figure 4–13: Driving the CMD/ADDRESS Bus

|                      | 0        | 1           | 2      | 3  |
|----------------------|----------|-------------|--------|----|
| SYS_CLK_OUT1         | 11       |             |        |    |
| ADDR_BUS_REQ         | 1111111  | 11111111111 | 111111 |    |
| DECchip 21164-AA DRV | ON (     | OFF         |        | ON |
| SYSTEM DRV           | OFF      | -   ON      | OFF    |    |
| DECchip              | 21164-AA | samples he  | re ^   |    |

The DATA bus can be driven by DECchip 21164-AA, the Bcache, or the SYSTEM.

For DECchip 21164-AA Bcache Writes followed by DECchip 21164-AA Bcache Reads, we assume that DECchip 21164-AA stops driving the DATA bus well in advance of the Bcache turning on.

For DECchip 21164-AA Bcache Reads followed by DECchip 21164-AA Bcache Writes, DECchip 21164-AA will insert a programmable number of CPU cycles between the read and the write. This will allow time for the Bcache drivers to turn off before turning on the DECchip 21164-AA data drivers. These rules apply to WRITE BLOCK, WRITE BLOCK LOCK, READ, READ DIRTY, and FLUSH commands as well.

DECchip 21164-AA will not prevent tri-state overlap at the start of a fill. The SYSTEM must assert IDLE\_BC early enough to allow all the drivers to turn off before the SYSTEM turns on its drivers.

At the end of the Fill, DECchip 21164-AA will wait READ->WRITE programmable number of CPU cycles before starting a read or write. This time should allow the SYSTEM to turn off it's drivers. If this is not enough time, the system may assert DATA\_BUS\_REQ to gain addition cycles.

# 4.1.11 **Restrictions**

This section will document restrictions on the use of DECchip 21164-AA interface features.

# 4.1.11.1 Fills after other transactions

If the system is removing data from DECchip 21164-AA with any of the system commands, or if the system is removing a Bcache victim from the Bcache and it wants to follow any of these transactions with a fill, then the earliest assertion of the FILL signal is the Sysclock after the last DACK.

Fills followed by Fills is a special case. Fills can be pipelined back to back to use 100% of the data bus bandwidth.

This restriction may be lifted in the future.

#### 4.1.11.2 Sending System commands

A SYSTEM can send up to TWO commands to DECchip 21164-AA. It must then wait for the assertion of the RES\_H signal for the first command before it can send the third command.

# 4.1.11.3 CACK for WRITE BLOCK commands

When DECchip 21164-AA requests a WRITE BLOCK or WRITE BLOCK LOCK, the SYSTEM can DACK the data before asserting CACK. The SYSTEM must assert CACK no later than the last DACK.

# 4.1.11.4 No Bcache Systems

SYSTEMs without a Bcache must have a block size of 64 bytes and all three sets in the Scache must be enabled.

### 4.1.11.5 Scache duplicate tag store

SYSTEMs without a Bcache that do have an Scache duplicate tag store are also required to maintain tags for the two blocks in the DECchip 21164-AA Scache victim buffer.

# NOTE

FETCH and FETCH\_M commands will no longer be auto acked by DECchip 21164-AA. They will always be driven to the SYSTEM for acknowledgement.

SET DIRTY, LOCK, and MB commands have been merged in to a single command group in the BC\_CONTROL<EI\_OPT\_CMD> ipr.

# 4.1.12 ECC/Parity

The chip will support INT8 ECC for the external Bcache and memory system. ECC will be provided by the CPU for each INT8 that is written into the Bcache. Fill data read from the Bcache and memory will be checked by hardware. Uncorrected data will be sent to the Dcache, and register files. Single bit errors will be corrected by hardware. The Scache and Icache will be filled with corrected data. Double bit errors will be detected. If the SYSTEM has indicated that the data should not be checked, no checking or correcting will be performed.

Each data bus cycle will deliver one INT16 worth of data. ECC is calculated as ECC(data<63:0>) and ECC(data<127:64>). This allows ECC to be calculated on each side of the chip. Figure 4–14 shows the code. Two IDT49C460 or AMD29C660 parts can be cascaded to produce this ECC code. A single IDT49C466 will also support this ECC code.

The code provides single bit correct, double bit detect, and all 1's and all 0's detect.

If the DECchip 21164-AA is in parity mode, it will generate byte parity and place it on the DATA\_ CHECK\_H<15:0> for writes. Parity will be checked for reads. Parity for data<7:0> will be driven on DATA\_CHECK\_H<0> and so on.

#### Figure 4–14: ECC code

 11
 11111
 1111
 222
 222
 223
 3333
 3333
 4444
 4445
 5555
 5555
 6666
 cccc
 cccc

 0123
 4567
 8901
 2345
 6789
 0123
 4567
 8901
 2345
 6789
 0123
 4567
 8901
 2345
 6789
 0123
 4567

 CB0
 .111
 .1
 .1
 .1
 1
 .1
 1
 .1
 1
 .1
 1
 .1
 1
 .1
 1
 .1
 1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1
 .1

CB2 and CB3 are calculated for ODD parity (an odd number of "1"s counting the CB)

CBO, CB1, CB4, CB5, CB6, CB7 are calculated for EVEN parity (an even number of "1"s counting the CB)

4-26 External Interface

For x4 RAMs, Dave Hartwell has provide the following bit arrangement that will detect nibble errors.

| CB0 | CB1 | CB5 | CB6 |  |  |  |   |  |
|-----|-----|-----|-----|--|--|--|---|--|
| CB2 | D0  | D4  | D5  |  |  |  |   |  |
| CB3 | CB4 | D7  | D8  |  |  |  |   |  |
| CB7 | D2  | D3  | D11 |  |  |  | • |  |
| D1  | D6  | D10 | D13 |  |  |  |   |  |
| D9  | D14 | D18 | D21 |  |  |  |   |  |
| D12 | D16 | D17 | D22 |  |  |  |   |  |
| D15 | D19 | D20 | D23 |  |  |  |   |  |
| D24 | D25 | D27 | D30 |  |  |  |   |  |
| D26 | D28 | D29 | D31 |  |  |  |   |  |
| D32 | D34 | D35 | D37 |  |  |  |   |  |
| D33 | D36 | D38 | D40 |  |  |  |   |  |
| D39 | D41 | D43 | D46 |  |  |  |   |  |
| D42 | D44 | D45 | D47 |  |  |  |   |  |
| D48 | D50 | D51 | D53 |  |  |  |   |  |
| D49 | D52 | D54 | D56 |  |  |  |   |  |
| D55 | D57 | D59 | D62 |  |  |  |   |  |
| D58 | D60 | D61 | D63 |  |  |  |   |  |

# Figure 4–15: x4 bit arrangement

# 4.2 Revision History

| Table | 4-9: | Revision | History |
|-------|------|----------|---------|
|-------|------|----------|---------|

| Who         | When     | Rev | <b>Description of change</b>    |
|-------------|----------|-----|---------------------------------|
| Pete Bannon | 12/16/91 | 0.8 | DRAFT 0.8 text                  |
| Pete Bannon | 12/31/91 | 0.9 | DRAFT 0.9 text                  |
| Pete Bannon | 3/ 1/92  | 1.0 | FILL ERROR, new non-cached read |
| Pete Bannon | 3/27/92  | 1.2 | New WS focus interface          |
| Pete Bannon | 3/27/92  | 1.3 | New victim sequence             |
| Pete Bannon | 4/21/92  | 1.4 | New ECC code                    |
| Pete Bannon | 11/30/92 | 1.5 | general update                  |

# Chapter 5

# **Reset and Initialization**

# 5.1 SYS\_RESET\_L and DC\_OK\_H

The DECchip 21164-AA reset process starting from a powered off state uses two input signals, SYS\_RESET\_L and DC\_OK\_H. Until power has reached the proper operating point, DC\_OK\_H must be deasserted and SYS\_RESET\_L must be asserted. After power has reached the proper operating point, DC\_OK\_H is asserted. After that, SYS\_RESET\_L is deasserted.

From a powered on state, the reset sequence begins with SYS\_RESET\_L assertion. In any case, after SYS\_RESET\_L is deasserted, DECchip 21164-AA begins a sequence of operations: Icache BiSt, followed by an optional automatic Icache initialization via an external serial ROM interface, and finally dispatching to the RESET PALcode trap entry point.

If DC\_OK\_H is not asserted, SYS\_RESET\_L is forced asserted internally.

SYS\_RESET\_L forces the CPU into a known state. Chapter 3 gives the reset state of each IPR and Section 9.1 gives the reset state of the pins.

While DC\_OK\_H is deasserted, DECchip 21164-AA provides its own internal clock source from an on-chip ring oscillator. When DC\_OK\_H is asserted, the DECchip 21164-AA clock source is the differential clock input pins, CLK\_IN\_H and CLK\_IN\_L.

SYS\_RESET\_L must remain asserted while DC\_OK\_H is deasserted and for a period of time after DC\_OK\_H assertion which is at least TBD internal CPU cycles in length and at least TBD Sysclock cycles in length. After that, SYS\_RESET\_L is deasserted. SYS\_RESET\_L deassertion generally should be synchronous with respect to Sysclock.

# ISSUE

Does DECchip 21164-AA have to support asynchronous deassertion of SYS\_RESET\_L?

When DECchip 21164-AA is running off the internal ring oscillator, the internal clock frequency is in the range TBD. Also the Sysclock divisor ratio is forced to TBD and the SYS\_CLK\_OUT2\_x delay is forced to TBD. After DC\_OK\_H is asserted, the Sysclock divisor and SYS\_CLK\_OUT2\_x delay are determined by input pins while SYS\_RESET\_L remains asserted. See Section 5.2.

# 5.1.1 **Power Up Requirements**

The DECchip 21164-AA chip uses a 3.3V power supply. This 3.3V power supply must be stable before any input or bidirectional pin rises above 4V.

The VREF\_H input pin must have reached the correct stable operating point before DC\_OK\_H is asserted. See Chapter 7.

# 5.1.2 Pin State with DC\_OK\_H Not Asserted

While DC\_OK\_H is not asserted (and SYS\_RESET\_L is asserted), every output and bidirectional DECchip 21164-AA pin is tristated and pulled weakly to ground by a small pull-down transistor.

# 5.2 Sysclock Ratio and Delay

While in reset, DECchip 21164-AA reads Sysclock configuration parameters from the interrupt pins. Table 5–1 shows how the Sysclock divisor is determined and Table 5–2 shows how the SYS\_CLK\_OUT2\_x delay is determined. These inputs should be driven with the correct configuration whenever SYS\_RESET\_L is asserted. When these inputs change while SYS\_RESET\_L is asserted, it takes TBD internal CPU cycles before the new Sysclock behavior is correct.

| IRQ_H<3>     | IRQ_H<2>  | IRQ_H<1> | IRQ_H<0> | Ratio              |
|--------------|-----------|----------|----------|--------------------|
| L            | L         | Н        | Н        | 3                  |
| $\mathbf{L}$ | H         | L        | L        | 4                  |
| L            | н         | L        | Н        | 5                  |
| L            | н         | Н        | L        | 6                  |
| L            | н         | H        | H        | 7                  |
| н            | L         | L        | L        | 8                  |
| H            | L         | L        | H        | 9.                 |
| н            | L         | н        | L        | 10                 |
| н            | H         | H        | H        | 15                 |
|              | all other | values   |          | unspecified effect |

 Table 5–1:
 System Clock Divisor

| Table 5-2: Syste      | Table 5–2: System Clock Delay |                   |       |  |  |  |  |  |  |
|-----------------------|-------------------------------|-------------------|-------|--|--|--|--|--|--|
| SYS_MCH_CHK_<br>IRQ_H | PWR_FAIL_IRQ_<br>H            | MCH_HLT_IRQ_<br>H | Delay |  |  |  |  |  |  |
| L                     | L                             | L                 | 0     |  |  |  |  |  |  |
| L                     | L                             | Н                 | 1     |  |  |  |  |  |  |
| L                     | H                             | L                 | 2     |  |  |  |  |  |  |
| L                     | Н                             | Н                 | 3     |  |  |  |  |  |  |
| Н                     | L                             | L                 | 4     |  |  |  |  |  |  |
| H                     | L                             | Н                 | 5     |  |  |  |  |  |  |
| H                     | Н                             | L                 | 6     |  |  |  |  |  |  |
| Н                     | Н                             | Н                 | 7     |  |  |  |  |  |  |

# 5.3 BiSt

Normally upon deassertion of SYS\_RESET\_L, DECchip 21164-AA automatically executes Icache BiSt (Built in Self-test). If PORT\_MODE\_H<1> is asserted, the test port is in debug test interface mode and BiSt is bypassed. Otherwise, the Icache is automatically tested and the result is made available in ICSR and on TEST\_STATUS\_H<0>. Internally, the CPU chip reset continues to be asserted throughout the BiSt test process.

# 5.4 Serial ROM

After Icache BiSt completes, an optional serial ROM Icache load sequence begins. If SROM\_ PRESENT\_L was not asserted when SYS\_RESET\_L transitioned to deasserted, the serial ROM load process is skipped, internal CPU reset is deasserted, and PALcode execution begins at the RESET trap entry point. If SROM\_PRESENT\_L was asserted when SYS\_RESET\_L transitioned to deasserted, the serial ROM load sequence is completed prior to deassertion of internal CPU reset and PALcode execution at the RESET trap entry point.

Figure 5-1 gives a timing diagram of a serial ROM load sequence. Chapter 11 describes the format of the Icache data. Every data and tag bit in the Icache is loaded by this sequence.

#### Figure 5–1: Serial ROM Load Timing

| SYS_RESET_L       | /~~~~~                                  | ~~~~~~  |        | ~~~~~~ | ~~~~~ | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|-------------------|-----------------------------------------|---------|--------|--------|-------|-----------------------------------------|
| SROM_OE_L         | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ~~~~~\_ |        |        |       | / ~~~~~                                 |
| SROM_CLK_H        |                                         |         | /~~~\_ | /~~~   | \/~   | ~\                                      |
| sample SROM_DAT_H | •                                       |         | ^      | ^      | ^     |                                         |
|                   |                                         |         |        |        |       |                                         |

# 5.5 Cache Initialization

Regardless of whether Icache BiSt is executed, the Icache is flushed during the reset sequence prior to serial ROM load. If serial ROM load is bypassed, the Icache is initially in the flushed state.

The Scache is flushed and enabled by internal reset. This is required if serial ROM load is bypassed. The initial Istream reference after reset is location 0. Since that is a cacheable-space reference, it will probe the Scache.

The Bcache is disabled by reset.

The Dcache is disabled by reset. It is not initialized or flushed by reset.

# 5.6 **BIU** initialization

After reset, the Cbox is in the default configuration dictated by the reset state of the IPR bits which select the configuration options. (Note that the Bcache configuration registers are not initialized by reset.) The Cbox response to system commands and internally generated memory accesses will be determined by this default configuration. Systems should be compatible with ths default configuration or arrange to change it before initiating any accesses to cacheable space. Since the initial PALcode trap entry point is in cacheable space, system environmennts which are not compatible with the default configuration must utilize the serial ROM Icache load feature to initially load and execute a PALcode program which will configure Cbox IPRs as needed.

### 5.7 Unitialized state

A number of IPR bits are not initialized by reset. These are error reporting registers and some other IPR states. These must be initialized by initialization PALcode.

### 5.8 Timeout Reset

The Ibox contains a timeout timer which times out when a very long period of time passes with not one instruction completing. When this timeout occurs, an internal reset event occurs which clears sufficient internal state to allow the CPU to begin executing again. Registers, IPRs, and Caches are not affected. Dispatch to the PALcode MCHK trap entry point occurs immediately.

# 5.9 Clock Reset

A TBD method will exist which allow a chip tester to initialize the Sysclock divider logic. This allows for deterministic operation during chip test. Due to the size of internal logic propagation delays as compared to the normal speed of the internal CPU clock, it will be necessary to run the internal CPU clock at a low speed while initializing the Sysclock divider.

# 5.10 IEEE 1149.1 Test Port Reset

TRST\_L must be asserted whenever SYS\_RESET\_L is asserted or DC\_OK\_H is deasserted.

Continuous TRST\_L assertion during normal operation can be used to prevent the IEEE 1149.1 Test Port from affecting DECchip 21164-AA operation.

# DIGITAL RESTRICTED DISTRIBUTION

**Reset and Initialization 5-5** 

# 5.11 Revision History

| When                 | Description of change        |                                                              |  |  |  |  |
|----------------------|------------------------------|--------------------------------------------------------------|--|--|--|--|
| 1-March-1992         | Brief statement of plan.     |                                                              |  |  |  |  |
| 30-November-<br>1992 | Update                       |                                                              |  |  |  |  |
|                      | 1-March-1992<br>30-November- | 1-March-1992 Brief statement of plan.<br>30-November- Update |  |  |  |  |

Table 5-3: Revision History

# Chapter 6

# **Error Handling**

# 6.1 Overview

This is an overview of DECchip 21164-AA's error handling strategy. Each internal cache (Icache, Dcache, and Scache) implements parity protection for tag and data. ECC protection is implemented for memory and Bcache data. (The implementation provides detection of all double-bit errors and correction of all single bit errors.) Correctable Istream and Dstream ECC errors are corrected in hardware without PALcode intervention. Bcache tags are parity protected. The Ibox implements logic which detects when no progress has been made for a very long time (a TBD number of CPU cycles of issue stall or infinitely repeated traps) and forces a machine check trap.

PALcode handles error traps. If the error is destructive, PALcode attempts retry when it is reasonable. If retry fails, PALcode posts a machine check exception. If retry succeeds, PALcode posts a correctable error interrupt. PALcode builds a logout frame in the HWRPB at the time the error is handled (speculatively, before retry).

Where possible, the address of affected data is reported to the operating system. In some cases, the system may be able to recover from an error by terminating all processes which had access to the affected memory location.

- Icache data or tag parity error: A trap occurs before the erroneous instruction is executed. PALcode retries once. (Note: the Icache is not flushed in this event. If an Icache parity occurs early in the PALcode routine at the machine check entry point, an infinite loop may result.)
- Dcache data or tag parity error. A machine check occurs. Generally no retry is possible. The Mbox records the Virtual Address of the INT16 with the error. A second error bit prevents multiple errors from going undetected.
- Scache tag or data parity error: A machine check occurs. Generally no retry is possible. The Cbox records the physical address of the INT16 with the error.
- Istream or Dstream correctable ECC error (Bcache or memory): DECchip 21164-AA hardware corrects the data. A separately maskable correctable error interrupt occurs at IPL 31 (same as machine check). PALcode can scrub the location. (Using LDxL, STxC. If the STxC fails, the location can be assumed to be scrubbed.) Note that there will be performance degradation in systems when extremely high rates of correctable ECC errors are present due to the internal handling of this error (the implementation utilizes a replay trap and automatic Dcache flush to prevent use of the incorrect data).
- Istream uncorrectable ECC errors (Bcache or memory): A trap prevents the erroneous instruction from executing. PALcode may retry once.

#### DIGITAL RESTRICTED DISTRIBUTION

Error Handling 6–1

- Dstream uncorrectable ECC errors (Bcache or memory): A machine check occurs. No retry is possible. The Cbox records the address of the INT16 with the error.
- System reads of Scache: Parity errors accessing the Scache cause a machine check trap. The Cbox records the address. Recovery is generally not possible.
- System reads of the Bcache: DECchip 21164-AA does not check the ECC on outgoing Bcache data. If it is bad, the receiving processor will detect it.
- Bcache tag parity errors: These are generally not recoverable. DECchip 21164-AA detects the error and posts a machine check trap. Bcache hit is determined based on the tag alone, not the parity bit. The victim is processed according to the status bits in the tag, ignoring the control field parity. The Cbox records the probe address and actual tag value read by the Cbox. PALcode can distinguish fatal from non-fatal occurrences by checking for the case in which a potentially dirty block is replaced without the victim being properly written back and the case of false hit when the tag parity is incorrect.
- For systems in which fill timeout can occur, the system environment should detect fill timeout and cleanly terminate the reference to DECchip 21164-AA. If the system environment expects fill timeouts to occur, it should detect them. If it does not expect them (as might be true in small systems with fixed memory access timing), it is likely that the internal Ibox timeout will eventually detect a stall if a fill fails to occur. To properly terminate a fill in an error case, the FILL\_ERROR\_H pin is asserted for one cycle and the normal fill sequence involving the FILL\_H, FILL\_ID\_H, and DACK pins is generated by the system environment. FILL\_ ERROR\_H assertion forces a PALcode trap to the MCHK entry point, but has no other effect.
- System machine check: DECchip 21164-AA has a maskable machine check interrupt input pin. It is used by system environments to signal fatal errors which are not directly connected to a read access from DECchip 21164-AA. It is masked at IPL 31 and anytime DECchip 21164-AA is in PALmode.
- Ibox timeout: When the Ibox detects a timeout, it causes a PALcode trap to the MCHK entry point. Simultaneously, a partial internal reset occurs: most state except IPR state is reset. This should not be depended on by systems in which fill timeouts occur in typical use (e.g., operating system or console code probing locations to determine if certain hardware is present). The purpose of this error detection mechanism is to attempt to prevent system hang and to attempt to write a machine check stack frame.
- Assertion of CFAIL\_H in a sysclock cycle in which CACK\_H is not asserted causes DECchip 21164-AA to immediately execute a partial internal reset and take a PALcode trap to the MCHK entry point. This is exactly the same result as an Ibox timeout, only a timeout did not occur. This can be used to restore DECchip 21164-AA and the external environment to a consistent state after the external environment detects a command or address parity error.
- When DECchip 21164-AA detects a command or address parity error, the command is unconditionally NOACKed and a PALcode trap to the MCHK entry point occurs.

# 6.2 Revision History

| Table 6-1: | Revision | History |
|------------|----------|---------|
|------------|----------|---------|

| Who | When         | Description of change     |  |
|-----|--------------|---------------------------|--|
| JHE | 1-March-1992 | Brief strategy statement  |  |
| JEM | 13-Nov-1992  | Overview for new release. |  |
| JHE | 19-Dec-1992  | Edits for new release.    |  |

## DIGITAL RESTRICTED DISTRIBUTION

Error Handling 6–3

6-4 Error Handling

# Chapter 7

# **DC Characteristics**

## 7.1 Overview

DECchip 21164-AA is capable of running in a CMOS/TTL environment or an ECL environment. The chips will be tested and characterized in a CMOS environment. The specifications below assume a CMOS/TTL environment. Differences for an ECL environment are noted in Section 7.2.

# 7.1.1 Power Supply

In CMOS mode the VSS pins are connected to 0.0V, and the VDD pins are connected to 3.3V, +/- 5%.

The VREF\_H analog input should be connected to a 1.4V +/-10% reference supply.

### 7.1.2 Input Clocks

CLK\_IN (\_H,\_L) is expected to be a differential signal generated from an ECL oscillator circuit, although non-ECL circuits may also be used. It may be AC coupled, with a nominal DC bias of VDD/2 set by a high-impedence (i.e. >1K) resistive network on chip. It need not be AC coupled if VDD is used as the VCC supply to the ECL oscillator. See the AC Characteristics chapter for more detail.

## 7.1.3 Signal pins

Input pins are ordinary CMOS inputs with standard TTL levels, see Table 7–1. Once power has been applied and VREF\_H has met its hold time, the majority of input pins can be driven by 5.0V (nominal) signals without harming DECchip 21164-AA. There are some signals that are sampled before VREF\_H is stable, and these signals can not be driven above the power supply. These signals are:

- DC\_OK\_H
- ECL\_OUT\_H
- TRST\_L
- TDI\_H
- TDO\_H
- TMS\_H
- TCK\_H

Output pins are ordinary 3.3V CMOS outputs. Although output signals are rail-to-rail, timing is specified to standard TTL levels, see Table 7–1.

Bidirectional pins are ordinary 3.3V CMOS bidirectional. On input, they act like input pins. On output, they drive like output pins.

Once power has been applied, input (except noted above) and bidirectional pins can be driven to a maximum DC voltage of 5.5V without harming DECchip 21164-AA (it is not necessary to use static RAMS with 3.3V outputs).

| Parameter |                                          | Requirements |     |       |                                    |
|-----------|------------------------------------------|--------------|-----|-------|------------------------------------|
| Symbol    | Description                              | Min          | Max | Units | <b>Test Conditions</b>             |
| TTL Inpu  | its/Outputs                              |              |     |       |                                    |
| Vih       | High level input voltage                 | 2.0          |     | v     |                                    |
| Vil       | Low level input voltage                  |              | 0.8 | v     |                                    |
| Voh       | High level output voltage                | 2.4          |     | v     | Ioh = -100uA                       |
| Vol       | Low level output voltage                 |              | 0.4 | V     | Iol = 3.2mA                        |
| Power/Le  | akage                                    |              |     |       |                                    |
| Icin      | Clock input Leakage                      | -50          | 50  | uA    | -0.5 <vin<5.5v< td=""></vin<5.5v<> |
| Iil       | Input leakage current                    | 10           | 10  | uA    | 0 <vin<vdd td="" v<=""></vin<vdd>  |
| Iol       | Output leakage current (three-<br>state) | -10          | -10 | uA    |                                    |

### Table 7–1: CMOS DC Characteristics

# 7.2 ECL 100K Mode

In ECL 100K mode a combination of on-chip and off-chip circuits provide ECL 100K compatible interfaces.

# 7.2.1 Power Supply

In ECL 100K mode the VDD pins are connected to 0.0V, and the VSS pins are connected to -3.3V, +/- 5%.

# 7.2.2 Reference Supply

In ECL 100K mode the VREF\_H input is connected to a reference supply at VDD-1.3V. The best way to generate the reference supply is to use the VBB output provided by several chips, such as the ECLinPS MC100E111.

### 7.2.3 Inputs

In ECL 100K mode inputs appear to be ordinary ECL 100K inputs, with the exception that they lack the pull down resistor that is normally present in ECL 100K circuits.

### 7.2.4 Outputs

In ECL 100K mode external resistors create the correct ECL 100K levels. The following stylized circuit is used.



### 7.2.5 Bidirectionals

In ECL 100K mode the bidirectional pins should be converted into unidirectional input and output busses as close to DECchip 21164-AA as possible. The DECchip 21164-AA chip bidirectional bus is buffered and driven onto the system output bus. The system input bus is driven onto DECchip 21164-AA's bidirectional bus using cut-off drivers controlled by the CPU's output enables.

The same resistor network used on output pins is used on bidirectional pins.

# 7.3 Power Dissipation

Table 7-2 Shows the estimated power maximum consumption at 286Mhz. Power consumption scales linearly with frequency in the frequency range 225Mhx to 312Mhz.

| Table 7–2: DECchip 21164-AA Estimated Power Dissipation @Vdd=3 | .45V |
|----------------------------------------------------------------|------|
|----------------------------------------------------------------|------|

| Speed   | Min | Тур | Max | Units |  |
|---------|-----|-----|-----|-------|--|
| 286 Mhz | TBD | TBD | 60  | Watts |  |

.

# 7.4 Revision History

| Who         | When                 | Description of change |  |  |  |
|-------------|----------------------|-----------------------|--|--|--|
| Pete Bannon | December 16,<br>1991 | Include EV4 text      |  |  |  |
| JHE         | December 16,<br>1992 | Updates               |  |  |  |

Table 7–3: Revision History

7–6 DC Characteristics

.

# Chapter 8

# **AC Characteristics**

This chapter is TBD.

# 8.1 Revision History

# Table 8–1: Revision History

| Who | When | Description of change |
|-----|------|-----------------------|
|     |      | TBD                   |

TBD

DEC Chip 21164-AA (EV5 CPU) Specification, Revision 1.9, December 1992

8–2 AC Characteristics

٠

# **Chapter 9**

# Pinout

# 9.1 DECchip 21164-AA Pinout Overview

The DECchip 21164-AA chip is contained in the 503 pin package. 289 of these pins are used for signals, the remaining pins are used for power and ground. Looking down at the top of the chip the crude pinout of DECchip 21164-AA will look like this:

## Figure 9–1: DECchip 21164-AA Pinout



# 9.2 DECchip 21164-AA Signal Pins

The following table is a list of the signal pins on the DECchip 21164-AA chip. In the table, all the pins listed as "O" are output only, those listed as "I" are input only, and those listed as "B" are bidirectional and tri-statable.

# DEC Chip 21164-AA (EV5 CPU) Specification, Revision 1.9, December 1992

| CLK_IN_H,L<br>CPU_CLK_OUT_H | I     | CPU clock input     | must be clocking | 2 |
|-----------------------------|-------|---------------------|------------------|---|
| CPU_CLK_OUT_H               |       |                     |                  | - |
|                             | 0     | CPU clock output    | clock ouptut     | 1 |
| SYS_CLK_OUT1_H,L            | 0     | System clock output | clock output     | 2 |
| SYS_CLK_OUT2_H,L            | 0     | System clock output | clock output     | 2 |
| REF_CLK_IN_H,L              | 0     | System clock input  |                  | 2 |
| SYS_RESET_L                 | $I^1$ | Reset               | •                | 1 |

Table 9–1: Clock Pins

<sup>1</sup>This input may be driven asynchronously; an internal synchronizer is implemented.

| Signal Name        | Туре | Function                                                               | Reset State              | Number |
|--------------------|------|------------------------------------------------------------------------|--------------------------|--------|
| ADDR_H<39:4>       | В    | Address bus                                                            | unspecified <sup>1</sup> | 36     |
| CMD_H<3:0>         | В    | Command bus                                                            | NOP <sup>2</sup>         | 4      |
| ADDR_CMD_PAR_H     | В    | Odd parity for address and CMD                                         | NOP <sup>2</sup>         | 1      |
| VICTIM_PENDING_H   | 0    | This miss produced a victim                                            | unspecified              | 1      |
| ADDR_BUS_REQ_H     | I    | System wants to use the ad-<br>dress and command busses                | -                        | 1      |
| CACK_H             | I    | DECchip 21164-AA command taken                                         | must be deasserted       | 1      |
| CFAIL_H            | I    | WRITE_BLOCK_LOCK command<br>failed or request to force Ibox<br>timeout | must be deasserted       | 1      |
| FILL_ERROR_H       | Ι    | request for machine check trap                                         | must be deasserted       | 1      |
| ADDR_RES_H<1:0>    | 0    | DECchip 21164-AA response to CMD                                       | NOP                      | 2      |
| INT4_VALID_H<3:0>  | 0    | write data valid/INT8 read re-<br>quest                                | unspecified              | 4      |
| SCACHE_SET_H<1:0>  | 0    | Scache set allocated                                                   | unspecified              | 2      |
| FILL_H             | I    | Fill warning                                                           | must be deasserted       | 1      |
| FILL_ID_H          | I    | Which fill?                                                            | should be deasserted     | 1      |
| DACK_H             | I    | Data ready or taken                                                    | must be deasserted       | 1      |
| FILL_NOCHECK_H     | I    | Don't check ECC/parity                                                 | should be deasserted     | 1      |
| SYSTEM_LOCK_FLAG_H | Ι    | Current state of the lock flag                                         | should be deasserted     | 1      |
| IDLE_BC_H          | I    | No more CPU accesses to the Bcache                                     | must be deasserted       | 1      |
| DATA_BUS_REQ_H     | I    | System wants to use the data<br>bus                                    | -                        | 1      |
| Section Total      |      |                                                                        |                          | 61     |

 Table 9–2:
 System Interface Pins

<sup>1</sup>Driven or tristated depending on ADDR\_BUS\_REQ\_H at most recent Sysclock edge. If driven, the value is unspecified. <sup>2</sup>Driven or tristated depending on ADDR\_BUS\_REQ\_H at most recent Sysclock edge. If driven, the command is NOP.

| Signal Name        | Туре | Function                                       | <b>Reset State</b> | Number |
|--------------------|------|------------------------------------------------|--------------------|--------|
| INDEX_H<25:4>      | 0    | Bcache index                                   | unspecified        | 22     |
| DATA_H<127:0>      | В    | Data Bus                                       | tristated          | 128    |
| DATA_CHECK_H<15:0> | В    | INT8 ECC check bits or byte parity             | tristated          | 16     |
| TAG_DATA_H<38:20>  | В    | B-cache tag (1MB min)                          | tristated          | 19     |
| TAG_DATA_PAR_H     | В    | Tag parity                                     | tristated          | 1      |
| TAG_VALID_H        | В    | Tag valid                                      | tristated          | 1      |
| TAG_SHARED_H       | В    | Tag shared                                     | tristated          | 1      |
| TAG_DIRTY_H        | В    | Tag dirty                                      | tristated          | 1      |
| TAG_CTL_PAR_H      | В    | Tag V/S/D parity                               | tristated          | 1      |
| TAG_RAM_OE_H       | 0    | Tag RAM output enable as-<br>serted for reads  | asserted           | 1      |
| TAG_RAM_WE_H       | 0    | Tag RAM write enable                           | deasserted         | 1      |
| DATA_RAM_OE_H      | 0    | Data RAM output enable as-<br>serted for reads | asserted           | 1      |
| DATA_RAM_WE_H      | 0    | Data RAM write enable                          | deasserted         | 1      |
| Section Total      |      |                                                |                    | 194    |

Table 9–3: Bcache Pins

| Signal Name        | Туре                  | Function                            | Reset State                       | Number |
|--------------------|-----------------------|-------------------------------------|-----------------------------------|--------|
| IRQ_H<3:0>         | I1                    | Interrupt requests                  | Sysclock divisor ra-<br>tio input | 4      |
| SYS_MCH_CHK_IRQ_H  | <b>I</b> <sup>1</sup> | System machine check inter-<br>rupt | Sysclock delay input <sup>2</sup> | 1      |
| PWR_FAIL_IRQ_H     | I1                    | Power failure interrupt             | Sysclock delay input <sup>2</sup> | 1      |
| MCH_HLT_IRQ_H      | I1                    | Halt request                        | Sysclock delay input <sup>2</sup> | 1      |
| PORT_MODE_H<1:0>   | Ι                     | Test port mode                      | -                                 | 2      |
| TDI_H              | Ι                     | IEEE 1149.1 Serial Data Input       | -                                 | 1      |
| TDO_H              | 0                     | IEEE 1149.1 Serial Data Output      | -                                 | 1      |
| TMS_H              | I                     | IEEE 1149.1 Test Mode Select        | -                                 | 1      |
| TCK_H              | Ι                     | IEEE 1149.1 Test Clock              | •                                 | 1      |
| TRST_L             | I                     | IEEE 1149.1 Test Reset              | should be asserted <sup>3</sup>   | 1      |
| TEST_STATUS_H<1:0> | 0                     | Test status/handshake for BiST      | deasserted                        | 2      |
| SROM_PRESENT_L     | Ι                     |                                     | -                                 | 1      |
| SROM_OE_L          | 0                     | Serial ROM output enable            | $deasserted^4$                    | 1      |
| SROM_CLK_H         | 0                     | Serial ROM clock/Tx data            | $deasserted^4$                    | 1      |
| SROM_DAT_H         | Ι                     | Serial ROM data/Rx data             | -                                 | 1      |
| DC_OK_H            | $I^1$                 | Power and clocks ok                 | -                                 | 1      |
| VREF_H             | Ι                     | Input reference                     | -                                 | 1      |
| ECL_OUT_H          | Ι                     | ECL outputs                         | -                                 | 1      |
| PERF_MON_H         | I1                    | Performance monitor input           | -                                 | 1      |
| Section Total      |                       |                                     |                                   | 24     |

| <b>Table 9-4:</b> | Interrupt | and | Misc. | Pins |
|-------------------|-----------|-----|-------|------|
|-------------------|-----------|-----|-------|------|

| Chip Total | 289 |
|------------|-----|
|            |     |

<sup>1</sup>This(These) input(s) may be driven asynchronously; an internal synchronizer is implemented.

<sup>2</sup>Input for SYS\_CLK\_OUT2\_H,L delay relative to SYS\_CLK\_OUT1\_H,L.

<sup>3</sup>TRST\_L can be asserted during normal operation to ensure the IEEE 1149.1 port remains inactive. The pin has special functions in test modes. See Chapter 11.

<sup>4</sup>If PORT\_MODE\_H<0> is asserted, this output is unspecified.

# 9.3 Revision History

Table 9–5: Revision History

| Who         | When       | Description of change              |
|-------------|------------|------------------------------------|
| Pete Bannon | 3/22/92    | New pinout                         |
| Pete Bannon | 4/22/92    | Change assertion of TAG_WE, TAG_OE |
| Pete Bannon | 10/22/92   | Update test pins                   |
| JHE         | 4-DEC-1992 | Add reset information.             |

# Chapter 10

The Package

TBS

This chapter is To Be Supplied.

# **10.1 Revision History**

 Table 10–1:
 Revision History

| Who | When                                   | Description of change |   |
|-----|----------------------------------------|-----------------------|---|
| -   | ······································ |                       | - |

.

.

10-2 The Package

•

•

# Chapter 11

# **Test Interface and Testability Features**

# 11.1 Introduction

The DECchip 21164-AA CPU chip's testability features address broad issues of providing costeffective and thorough testing of DECchip 21164-AA through its life cycle. Some specific goals supported by DECchip 21164-AA testability features include:

- Chip debug.
- Efficient and thorough testing of embedded RAM arrays.
- Built-in Self Repair (BiSr) of instruction cache (ICache) and support for reduced probe test for efficient and low cost wafer probe testing.
- High fault coverage chip manufacturing test.
- Effective burn-in test.
- Module assembly verification test via IEEE 1149.1 architecture.
- Automatic power-on Built-in Self-test (BiSt) of the ICache.
- Limited support for concurrent fault detection in fault tolerant system that employ duplicate DECchip 21164-AAs.

The testability features included on DECchip 21164-AA include ICache self-test and self-repair, internal Linear Feedback Shift Registers (LFSRs) and scan observability registers, support for reduced probe count wafer probe test, IEEE 1149.1 test access port and boundary scan register, and several other test features. DECchip 21164-AA also includes a comprehensive test interface port that permits efficient access to the chip's testability and diagnosability features during debug and manufacturing testing phases.

# 11.2 Test Port

Test Interface Port on DECchip 21164-AA consists of 13 dedicated pins that support three port interface modes: 1) Normal mode, 2) Manufacturing test mode, and 3) Debug test mode. Table 11–1 summarizes the test port pins and their functions in the three modes.

|                  |     | <b>Normal Function</b>    |     | Manufacturing                    |     | Debug          |     |
|------------------|-----|---------------------------|-----|----------------------------------|-----|----------------|-----|
| Pin Name         | Тур | Signal                    | Тур | Signal                           | Тур | Signal         | Тур |
| PORT_MODE_H<1>   | I   | LOW                       | I   | LOW                              | I   | HIGH           | I   |
| PORT_MODE_H<0>   | В   | LOW                       | Ι   | HIGH                             | I   | dbg_data_h< 8> | 0   |
| SROM_PRESENT_L   | В   | <pre>srom_present_l</pre> | Ι   | test control                     | I   | dbg_data_h< 7> | 0   |
| SROM_DATA_H      | Ι   | srom_data_h/Rx            | Ι   | srom_data_h                      | Ι   | srom_data_h/Rx | I   |
| BROM_CLK_H       | 0   | srom_clk_h/Tx             | 0   | obs_data_h< 8>                   | 0   | srom_clk_h/Tx  | 0   |
| SROM_OE_L        | 0   | srom_oe_l                 | 0   | obs_data_h< 7>                   | 0   | srom_oe_l      | 0   |
| TDI_H            | В   | tdi_h                     | I   | obs_data_h< 6>                   | 0   | dbg_data_h< 6> | 0   |
| rdo_h            | 0   | tdo_h                     | 0   | obs_data_h< 5>                   | 0   | dbg_data_h< 5> | 0   |
| TMS_H            | В   | tms_h                     | I   | obs_data_h< 4>                   | 0   | dbg_data_h< 4> | 0   |
| ICK_H            | В   | tck_h                     | Ι   | obs_data_h< 3>                   | 0   | dbg_data_h< 3> | 0   |
| TRST_L           | В   | trst_l                    | I   | obs_data_h< 2>                   | 0   | dbg_data_h< 2> | 0   |
| TEST_STATUS_H<0> | 0   | test status               | 0   | test status / obs_<br>data_h< 1> | 0   | dbg_data_h< 1> | 0   |
| TEST_STATUS_H<1> | 0   | test status               | 0   | test status / obs_<br>data_h< 0> | 0   | dbg_data_h< 0> | 0   |

Table 11-1: DECchip 21164-AA Test Port Pins and Port Modes

### 11.2.1 Normal Test Interface Mode

The test port is in normal test interface mode when the PORT\_MODE\_H<1:0> are tied to 00. This is the default mode. In this mode the test port supports a serial ROM interface, a serial diagnostic terminal interface, and an IEEE 1149.1 test access port.

#### 11.2.1.1 SROM Port

SROM\_PRESENT\_L, SROM\_DATA\_H, SROM\_OE\_L, SROM\_CLK\_H constitute the SROM interface.

If serial ROMs (such as an AMD Am1736) are present in the system, the pin SROM\_PRESENT\_ L may be pulled down on the board. DECchip 21164-AA samples this pin during the system reset. If the pin is pulled down during the system reset, then the DECchip 21164-AA's reset sequence automatically loads its ICache from serial ROMs before executing its first instruction. If SROM\_PRESENT\_L is pulled-up during system reset, the SROM load is disabled. In this case the ICache valid bits are cleared by the reset sequence, causing the first instruction fetch to miss the ICache and seek the instructions from the off chip memory.

During SROM load:

• SROM\_OE\_L signal supplies the output enable to the serial ROM, serving both as an output enable and as a reset (refer to the serial ROM specifications for details).

DECchip 21164-AA asserts this signal low for the duration of ICache load from serial ROM. Once the load is complete, the signal is deasserted.

- SROM\_CLK\_H output signal supplies the clock to the ROM that causes it to advance to the next bit. The cycle time of this clock is 128 times the cpu clock rate.
- SROM\_DATA\_H pin reads the serial ROM data.

The serial ROMs can contain enough ALPHA code to complete the configuration of the external interface (e.g. setting the timing on the external cache RAMs, and diagnose the path between the CPU chip and the real ROM).

DECchip 21164-AA is in PALmode following the deassertion of system reset and the conclusion of ICache self-test - this gives the code loaded into the ICache access to all of the visible state within the chip.

See Section 11.4 for the details of the ICache fill operation from SROMs.

#### 11.2.1.2 Serial Terminal Port

Once the data in the serial ROM has been loaded into the ICache, the three SROM Port pins turn into a simple parallel I/O pins that can be used to drive a diagnostic terminal such a RS422.

When the serial ROM is not being read, the SROM\_OE\_L output signal is false. The serial diagnostic terminal port is enabled if this pin is wired to the active high enable of an RS422 (or 26LS32) receiver driving onto SROM\_DATA\_H and to the active high enable of an RS422 (or 26LS31) driver driven from srom\_clk\_h pin. The CPU allows SROM\_DATA\_H to be read and SROM\_CLK\_H to be written by PALcode. This supports a bit-banged serial interface.

IPRs associated with this interface are described in the chapter on PAL Code and IPRs.

#### 11.2.1.3 IEEE 1149.1 Test Access Port

TDI\_H, TDO\_H, TCK\_H, TMS\_H and TRST\_L make up the IEEE 1149.1 test access port. This port accesses DECchip 21164-AA chip's boundary scan register and chip tri-state functions for board level manufacturing test. The port also allows access to the die identification code. The port is compliant with all requirements of IEEE 1149.1 test access port. See IEEE Std. 1149.1 "A Test Access Port and Boundary Scan Architecture" for the full description of the specification.

Figure 11–1 shows the user-visible features from this port.

#### **TAP Controller**

The TAP Controller contains a state machine. It interprets IEEE 1149.1 protocols received on TMS\_H signal and generates appropriate clocks and control signals for the testability features under its jurisdiction.

#### **Bypass Register**

The Bypass Register is a 1-bit shift register. It provides a short single-bit scan path through the port (chip).



#### **Instruction Register**

The Instruction Register (IR) is 3-bits wide. It supports EXTEST, SAMPLE, BYPASS, HIGHZ and DIE\_ID instructions. Table 11–2 summarizes the instructions and their functions.

During the capture operation, the shift register stage of IR is loaded with '001'. This automatic load feature is useful for testing the integrity of the IEEE 1149.1 scan chain on module.

| IR< 2:0> Name |        | Scan Register Selected | Remarks                             |
|---------------|--------|------------------------|-------------------------------------|
| 111           | BYPASS | Bypass Register        | Default.                            |
| 110           | HIGHZ  | Bypass Register        | Tristates all I/O and output pins   |
| 101           | BYPASS | Bypass Register        | Duplicate BYPASS                    |
| 100           | HIGHZ  | Bypass Register        | Duplicate HIGHZ                     |
| 011           | DIE_ID | Die ID Register        |                                     |
| 010           | SAMPLE | Boundary Scan          |                                     |
| 001           | DIE_ID | Die ID Register        | Duplicate DIE_ID                    |
| 000           | EXTEST | Boundary Scan Register | BSR drives chip I/O and output pins |

Table 11–2: Instruction Register

Note that the SAMPLE, BYPASS and DIE\_ID instructions are non-intrusive. That is, they could be operated while chip is doing its normal functions. EXTEST and HIGHZ instructions force chip's internal logic to a reset state.

#### **Die-ID Register**

Die-ID Register is 32-bit scan register. It shifts out fuse-programmed die information. The format and content of the information to be programmed will be determined by the manufacturing.

#### **Boundary Scan Register**

Boundary Scan Register on DECchip 21164-AA is approx. 286 **TBD** bits long. It supports SAMPLE and EXTEST instructions. See Section 11.9 for the organization of this register.

#### Effects of EXTEST and HIGHZ instruction

The effect of EXTEST or HIGHZ instruction on DECchip 21164-AA chip is as follows

- EXTEST instruction allows the boundary scan register to have complete control over the output and bidirectional pins. HIGHZ instruction forces all output and bidirection pins to a high impedance state.
- The effect on clock input and output pins is **TBD**.
- The internal chip logic is forced to a reset state. This prevent the cpu from reacting to irrelevant test data that may appear at the chip's inputs.

### 11.2.1.4 Test Status Pins

Two test status pins TEST\_STATUS\_H<1:0> pins are used for extracting of test status information from the chip. System reset drives both test status pins low.

• During ICache BiSt

TEST\_STATUS\_H<<0> is asserted high to indicates that the ICache BiSt has failed. TEST\_ STATUS\_H<1> is asserted high to indicate presence of more than two failing rows.

The start of ICache BiSt forces TEST\_STATUS\_H<0> pin to go high. If the ICache BiSt passes, TEST\_STATUS\_H<0> is deasserted, otherwise it remains asserted. TEST\_STATUS\_ H<1> is asserted as high as soon as third bad ICache row is detected. This nay be used to detect unrepairable ICache early, thus reducing average test time. System users may ignore this pin.

During On-Line LFSR mode

When the internal LFSRs are turned on in on-line mode (ON\_OBL\_1 command described later), the TEST\_STATUS\_H<0> outputs the quotient generated by the observability LFSRs. A new quotient bit is observed with every system clock rising edge. This feature is useful to people implementing fault tolerant systems. Also, the feature can be exploited for the burn-in and life test for monitoring failures. See Section 11.6.2 for more details.

• IPR Read/writes to Test Status Pins PALcode can write to TEST\_STATUS\_H<1> and can read the TEST\_STATUS\_H<0> via hardware IPR access. See Chapter 3.

The default operation for TEST\_STATUS\_H<0> pin is to output the BiSt result. The default operation for TEST\_STATUS\_H<1> pin is to output the IPR written value.

# **11.2.2 Manufacturing Test Interface Mode**

The DECchip 21164-AA test port is in Manufacturing Test Interface Mode when PORT\_MODE\_H<1:0> are tied to 01 (binary). This mode allows control of ICache test features, internal LFSR and Scan Observability Registers, and efficient byte-serial read-out of observability features, including ICache bit map. Figure 11-2 shows the user-visible features during manufacturing test interface mode.

The SROM\_PRESENT\_L pin is used for test control. Asserting a high on this pin initiates a test operation state. In this state, DECchip 21164-AA chip automatically loads the 8-bit Test Command Register and executes all required test actions, including any additional shift operations. Input test data is serially fed at the SROM\_DATA\_H pin. Test results from chip are shifted out byte-serially (9 bits at a time) on the test pins.

The SROM\_PRESENT\_L pin may be returned to low once test shift operation has been initiated. A new test command may be loaded by once again asserting a high on SROM\_PRESENT\_L pin after all actions of the previous command have been completed.

When the manufacturing test interface mode is activated, all inputs to the IEEE 1149.1 port are driven with their default values.

#### Test Command Register (TCR)

Test Command Register is 8 bits wide. Table 11–3 summarizes the test commands and their actions.

# DEC Chip 21164-AA (EV5 CPU) Specification, Revision 1.9, December 1992





|            | Command      |                                                                                                                                                                                                                                                                                 |
|------------|--------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| TCR< 7:0>  | Mnemonic     | Action                                                                                                                                                                                                                                                                          |
| 00 000 XXX | RD_ICache    | Reads out ICache contents on test port. Useful for debug/bit mapping etc.                                                                                                                                                                                                       |
| 00 001 00X | WR_IC_F0     | Writes ICache serially. Data shifts at system clock rate. Internal<br>chip reset extended. Used for subsequent read out for ICache test<br>purposes.                                                                                                                            |
| 00 001 01X | WR_IC_F1     | Writes ICache serially. Data shifts at system clock rate. Internal<br>chip reset NOT extended. Used for speedier ICache fill during man-<br>ufacturing.                                                                                                                         |
| 00 001 10X | WR_IC_S0     | Writes ICache serially. Data shifts at slow rate. (cycle time = cpu clock cycle * 128) rate. Internal chip reset extended.                                                                                                                                                      |
| 00 001 11X | WR_IC_S1     | Writes ICache serially. Data shifts at slow rate. (cycle time = cpu clock cycle * 128) rate. Internal chip reset not extended. This instruction is forced by CPU during power-on/reset sequence to automatically load from SROM.                                                |
| 00 100 XXX | LD_BKG       | Loads ICache fill Scan path with background pattern. This instruc-<br>tion is forced by the BiSt logic.                                                                                                                                                                         |
| 00 101 XXX | SC_FRCAM     | Scans out Failing Row CAM on test port. This instruction is forced<br>by the BiSt logic.                                                                                                                                                                                        |
| 00 110 XXX | SC_BIST      | Scans out portions of BiSt logic for testing the BiSt logic.                                                                                                                                                                                                                    |
| 00 010 XXX | RU_BIST      | Runs ICache BiSt. This instruction is forced by the power-on/reset sequence.                                                                                                                                                                                                    |
| 00 011 XXX | RU_RETENT    | Runs ICache Retention BiSt.                                                                                                                                                                                                                                                     |
| 00 111 XXX | IC_NOP       | No ICache action. However, forces internal chip reset.                                                                                                                                                                                                                          |
| 01 ss dddd | SC_src_delay | Scans out selected register. $src = 0X$ selects LFSR scan path. $src = 1X$ selects internal scan register. delay selects cycle (0 to 15) to be captured for observation. The command performs the capture-scan out sequence continuously, until another test command is loaded. |
| 10 X0 0XXX | OFF_OBL      | Turns off Observability LFSR data compression mode.                                                                                                                                                                                                                             |
| 10 X0 10XX | ON_OBL_0     | Turns On the Observability LFSR data compression in off-line mode                                                                                                                                                                                                               |
| 10 X0 11XX | ON_OBL_1     | Turns On the Observability LFSR data compression in On-line mode                                                                                                                                                                                                                |
| 10 X1 0XXX | OFF_CBL      | Turns Off (if previously turned on) the intrusive controllability fea<br>tures, LFSR pattern generators, etc on the chip.                                                                                                                                                       |
| 10 X1 1XXX | ON_CBL       | Turns On the intrusive controllability features (such as LFSR pattern generators) on the chip.                                                                                                                                                                                  |
| 11 XX XXXX |              | Reserved.                                                                                                                                                                                                                                                                       |
| 11 11 1111 | PRT_NOP      | No test port actions.                                                                                                                                                                                                                                                           |

 Table 11–3:
 Test Command Register

11–8 Test Interface and Testability Features

#### Notes:

- 1. The internal chip logic is forced to reset during all ICache test commands.
- 2. The cycle time for shifting data during SROM load is 128 \* cpu clock cycle. Assuming the fastest DECchip 21164-AA cycle time of 3ns, this translates to the fastest shift rate of 384ns..
- 3. The scan and LFSR observability registers can be operated and read out without interfering with normal system operation.

#### **ICache Fill Scan Register**

This is a 200-bit long scan register used for filling the ICache serially from SROMS or tester. See section Section 11.4 for the details of the serial fill operation.

### ICache Read Scan Register

This is 100-bit long read scan register path used for dumping the ICache contents. See section Section 11.4 for the details of the serial read operation.

### **Observability LFSRs**

This is TBD-bit register used for enhancing fault coverage of manufacturing test. See section Section 11.6 for details.

#### **Observability Scan Registers**

This is TBD-bit register used primarily for chip debug. See section Section 11.7 for details.

#### **Controllability Features**

Test Port also has the provision for supporting internal controllability features. If these features are provided, they are turned on and off via the ON\_CBL and OFF\_CBL test commands.

#### **FRCAM Scan Register**

This scan register is 13-bit long. It consists of 12 bits of failing row CAM and unrepairable\_ic flag. See Section 11.3 for more details.

#### Port Observability Register

. This is 9-bit serial-in Parallel-out observability register. The parallel outputs of the register update the corresponding test port pins every system clock cycle. This allows tester to observe 9-bits of scan data simultaneously. This reduces the vector depth requirement on chip tester's failure capture memory (DFM) by a factor of eight.

The internal observability LFSRs and the Internal Scan Registers shift at the chip's internal clock rate. The scan paths in ICache test logic shift at the system clock rate.

#### 11.2.3 Debug Test Interface Mode

DECchip 21164-AA test port is in Debug Test Interface Mode when PORT\_MODE\_H<1> is tied to 1. Debug Test Interface Mode allows the critical chip nodes to be monitored in parallel.

Signals to be observed on parallel port are selected by TBD IPR bits. (See chapter on PALCode and IPRs for the details.)

#### **Restrictions of parallel debug test port**

- 1. When parallel debug port is activated, all inputs corresponding to the normal test input pins are fed with their default values.
- 2. The PORT\_MODE\_H<1> pin allows to switch back and forth between the normal test port and the parallel debug port.
- 3. Parallel debug port is designed to support chip/system debugging in prototype system environments only. Some small logic may be required to ensure that there is no interference with other chips connected to the test port.

## 11.2.4 Activating Debug/Manufacturing Port Modes in a System

Both Debug and Manufacturing port modes can be activated in a system by incorporating a few jumpers, and if necessary, some support logic. Jumpers are required as some of the test pins are shared for outputing the debug/observability information from the chip. Jumpers prevent observability data from interfering with the operation of the other chips connected to the shared test pins. Support logic is required only if system wants to load test commands automatically through the manufacturing test port mode, for example, to turn on/off the observability LFSRs in on-line mode.

Figure 11-3 shows a typical module and the places where jumpers may be necessary to activate the debug and manufacturing test port modes.

# 11.3 ICache BiSt

The DECchip 21164-AA ICache is tested by Built-in Self-test that implements a full march algorithm. The self test logic covers all three (Data, BHT, and TAG) ICache arrays.

ICache BiSt is invoked automatically upon deassertion of system reset if the BiSt is not bypassed. BiSt is bypassed if the PORT\_MODE\_H<1> pin is asserted high during system reset.

BiSt Bypass feature allows ICache BiSt as well as the Built-in Self Repair to be bypassed during. debug and in between pattern runs on testers, if so desired.

BiSt runs for **TBD** cycles.

The Go/NoGo result of BiSt is made available on TEST\_STATUS\_H<0> pin. TEST\_STATUS\_ H<0> is forced low by the system reset and high with the start of the BiSt. If at the end of BiSt, any of the ICache rows are bad, the pin remains asserted high, otherwise it is deasserted. Software can read this status through an IPR. If ICache fails in more than two rows, TEST\_ STATUS\_H<1> is asserted high. This pin is cleared by the the system reset.

11–10 Test Interface and Testability Features

#### DEC Chip 21164-AA (EV5 CPU) Specification, Revision 1.9, December 1992



Figure 11–3: Tes Port Connections on Module

#### **Built-in Self-Repair**

When the BiSt is invoked on wafers that have not gone through the fuse repair process, the ICache BiSt sequence automatically performs the following steps.

- Perform the BiSt. Store up to two the failing row addresses in the failing row CAMs.
- Self repair the ICache data array.
- Repeat BiSt.
- Dump the content of the failing row CAMs on the test port.

The repair information shifted out consists of the following bits.

| Field Name           | Extent |                                 |  |  |  |  |
|----------------------|--------|---------------------------------|--|--|--|--|
| unrepairable_ic_flag | 0      | High = unrepairable cache       |  |  |  |  |
| CAM_0 Valid Flag     | 1      | High = 1st repair address valid |  |  |  |  |
| CAM_0 Reg            | 2:6    | 1st repairable row-pair address |  |  |  |  |
| CAM_1 Valid Flag     | 7      | High = 2nd repair address valid |  |  |  |  |
| CAM_1 Reg            | 8:12   | 2nd repairable row pair address |  |  |  |  |
|                      |        |                                 |  |  |  |  |

| Table 11-4 | : FRCAN | Scan Re | eaister ( | Organization |
|------------|---------|---------|-----------|--------------|
|------------|---------|---------|-----------|--------------|

#### Notes:

- The automatic BiSt and BiSR run identically under the normal and the manufacturing test interface modes.
- Built-in self-repair feature is available only prior to laser repair process. BiSt logic uses a fuse programmed internal signal to determine whether the BiSR is required.

### 11.4 ICache Serial Write and Read Operations

#### **Serial Write Operation**

The ICache can be written serially from the SROM or for testing purposes from the SROM port pins. On DECchip 21164-AA, all ICache bits, including each block's tag, ASN, ASM, valid and branch history bits can be loaded serially. Once the serial load has been invoked (either automatically by the chip reset sequence, or via the IC\_WR\_xx command from the manufacturing test port), the entire cache is loaded automatically from the lowest to the highest addresses.

The serial bits are received in a 200-bit long Fill Scan Path from which they are written in parallel into the ICache address. The Fill scan path is organized as shown in Figure 11-4. The farthest bit (tag < 42) is shifted in first and the nearest bit (BHT < 7>) is shifted in last. Note that the data and predecode bits in the data array are interleaved.

The automatic serial fill invoked by the chip reset sequence occurs at the slower SROM clock rate (period = cpu clock rate \* 128). The serial invoked by IC\_WR\_xx can occur at the SROM rate or at the system clock rate. In either case, the ICache fill operation is automatic.

#### Serial Read Operation

All three ICache arrays can be read out serially for testing purposes. Manufacturing port's IC\_RD command initiates the serial read operation. Arrays are dumped from the lowest to the highest address. The data is first received into a READ Scan Path (RSP), from which it is serially shifted to the test port's Port Observability Register at the system clock rate. The data can be read out at the test port pins 9-bits at a time.

#### Figure 11–4: SROM Fill Scan Path Bit Order

```
SROM DATA H
                   serial input ->
BHT Array
                    7 ->
                         6 -> ... ->
                                       0 ->
                  127 -> 95 -> 126 -> 94 -> ... -> 96 -> 64 ->
Data
Predecodes
                   19 -> 14 -> 18 -> 13 -> ... -> 15 ->
                                                           9 ->
                   b ->
Data parity
Predeocde parity
                    b ->
                                8 -> 3 -> ... -> 5 ->
Predecodes
                   9 ->
                          4 ->
                                                           0 ->
                   63 -> 31 -> 62 -> 30 -> ... -> 32 ->
Data
                                                          0 ->
Tag Parity
                   b ->
Tag Valids
                    0 ->
                          1 ->
TAG ASM
                   b ->
TAG ASN
                    0 ->
                          1 -> ... ->
                                      7 ->
TAGS
                   13 -> 14 -> ... -> 42
  b = Single bit signal
```

Figure 11–5: Read Scan Path Bit Order

```
Serial out
                    serial out <-
BHT array leader dmy <- err <- rfl <- rf0 <-
                      7 <- 6 <- ... <-
BHT Bits
                                            0 <-
Data array leader dmy <- err <- rfl <- rf0 <-
                   d37 <- 36 <- ... <-
Data Bits
                                           0 <--
Tag array leader dmy <- err <- rfl <- rf0 <-
Tag Parity
                      b <-
Tag Valids
                     0 <-
                            1 <--
TAG ASM
                      b <-
                      0 <-
                            1 <- ... <- 7 <-
TAGS
                     13 <- 14 <- ... <- 42
    ъ
            = Single bit signal
    dmy = Dummy bit. Makes RSP for the array even bit length
err = Error bit. Useful for BiSt logic testability
    rfl,rf0 = Used by BiSt logic to store reference patterns
```

The RSP is 100-bits long and consists of three segments: 12-bit BHT segment, 42-bit Data array segment, and 46-bit Tag array segment. Besides the bits that capture data from ICache array, each segment has 4 extra bits used by the BiSt logic.

The 150 bits of data from the data array are read into the 38 bits of Read Scan Path via a multiplexer which selects one of the four physically adjacent data bits. The entire array is read by making four passes through the ICache addresses. (Note that this causes the BHT and tag arrays to be read four times!) This necessitates that the data dumped by the serial ICache read operation must be carefully reconstructed before interpreting them.

The organization of the bits in the read scan path is shown in Figure 11-5.

### 11.5 SCache/DCache Test Features

See PALCode and IPR chapter and the cache section in chapter on DECchip 21164-AA Microarchitecture. Also, see Section 11.7 for description of SCache scan chain.

SCache Test and Repair Algorithm TBD.

# 11.6 Observability LFSRs (OBLs)

### 11.6.1 Organization

DECchip 21164-AA implements several Observability LFSRs (OBLs) to enhance the fault coverage. The OBLs are turned and off on by the ON\_OBL\_x and OFF\_OBL test commands described in Section 11.2.2. LFSRs also operate as ordinary scan registers. They are read out by the SC\_ src\_delay test command.

All LFSRs in DECchip 21164-AA are accessed from a single scan chain. Figure 11-6 summarizes the LFSR organization. The details of the signals captured and the LFSR design (feedback taps) are given in Table 11-5.

#### Figure 11–6: LFSR Chain Organization



#### Table 11–5: Observability LFSR Organization

LFSR Name: Backup Cache Index Pins

Size: 28 bits

Feedback polynomial: 220000001(Octal, Taps bits 28 and 25)

Access Chain Number: ....

| Bit # | Signal name           | Remarks          |  |
|-------|-----------------------|------------------|--|
| 28    | feedback              |                  |  |
| 27:26 | p%ev5_sc_set_h< 01:1> | unprobed outputs |  |
| 25 .  | feedback              |                  |  |
| 24:3  | p%bc_index_h< 4:25>   | Unprobed outputs |  |
| 2:1   | p%ev5_adr_res_h< 1:0< | Unprobed outputs |  |

11–14 Test Interface and Testability Features

| Table 11 | -5 (Cont.): Observability | LFSR Organization |
|----------|---------------------------|-------------------|
| 0        | available                 |                   |
| LFSR N   | ame:                      |                   |
| Size:    |                           |                   |
| Feedbac  | ek polynomial:            |                   |
| Access   | Chain Number:             |                   |
| Bit #    | Signal name               | Remarks           |
| 0        | Tbd                       | TBD               |
|          |                           |                   |

Table 11–5 (Cont.): Observability LFSR Organization

(As the design work progresses, more details of LFSR operation will be added here.)

### 11.6.2 On-line LFSR Operation

DECchip 21164-AA supports an on-line testing mode via its observability LFSRs. The quotient bit generated by the observability LFSR in IBOX is brought out to the TEST\_STATUS\_H<0> pin when the LFSRs are turned on in an on-line mode (ON\_OBL\_1 test command). Monitoring and comparing this pin with the expected serial stream can provide an indication of DECchip 21164-AA health on the fly.

This feature can be exploited by the fault tolerant systems that employ multiple redundant DECchip 21164-AAs. They can compare the TEST\_STATUS\_H<0> on two or more DECchip 21164-AAs performing identical tasks. The same principle can be extended in other test applications such as burn-in test for monitoring failures.

During the on-line test mode, a new quotient bit is observed with every system clock rising edge. Since the observability LFSRs work at the CPU clock rate, not every quotient bit is observed on TEST\_STATUS\_H<0>. This is generally acceptable since typically an error in an input to an LFSR produces a multitude of erroneous quotient bits.

Note that the LFSRs must be turned on only after DECchip 21164-AA initialization has been completed.

# 11.7 Observability Scan Registers (OBSs)

Internal Scan Registers offer observability of debug-critical signals. They are accessed from the test port under the manufacturing test interface mode as described in Section 11.2.2. The capture action of internal scan register occurs **TBD** cpu cycles after the Test Command Register is loaded with the appropriate SCAN command. Table 11–6 gives organization of the DECchip 21164-AA's Observability Scan Registers.

| Scan Ch  | ain Name: SCache                     |                                 |
|----------|--------------------------------------|---------------------------------|
| Size: 16 | 4 bits                               |                                 |
|          | Scan (                               | Chain for Part 1 of SCache:     |
| Bit #    | Signal name                          | Remarks                         |
| 0        | S%IFB_PAR_H<0>                       | LW Parity for Data<31:0>        |
| 1:32     | S%IFB_H<0:31>                        | Data<0:31>                      |
| 33       | S_DIR_CTL%LSEL_WSC_H                 | LW write enable for Data<31:0>  |
| 34:44    | S_DCR%ADDR_7A_L<14:4>                | Address driven to SCache        |
| 45       | S_DCR%HIT_H<2>                       | SET_HIT signal, set 2           |
| 46       | S_DCR%HIT_H<1>                       | SET_HIT signal, set 1           |
| 47       | s_DCR%HIT_H<0> SET_HIT signal, set 0 |                                 |
| 48       | S_DIR_CTL%RSEL_WSC_H                 | LW write enable for Data 63:32  |
| 49:80    | <b>S%IFB_H&lt;32:63&gt;</b>          | Data<32:63>                     |
| 81       | S%IFB_PAR_H<1>                       | LW Parity for Data<63:32>       |
|          | Scan                                 | Chain for Part 2 of SCache:     |
| Bit #    | Signal name                          | Remarks                         |
| 0        | S%IFB_PAR_H<2>                       | LW Parity for Data<95:64>       |
| 1        | S%IFB_H<64:95>                       | Data<64:95>                     |
| 33       | S_DIL_CTL%LSEL_WSC_H                 | LW write enable for Data<95:64> |
| 34:44    | S_DCL%ADDR_7A_L<14:4>                | Address driven to SCache        |
| 45       | S_DCL%HIT_H<2>                       | SET_HIT signal, set 2           |
| 46       | S_DCL%HIT_H<1>                       | SET_HIT signal, set 1           |
| 47       | S_DCL%HIT_H<0>                       | SET_HIT signal, set 0           |
|          |                                      |                                 |

LW write enable for Data<127:96>

LW Parity for Data<127:96>

Data<96:127>

#### Table 11–6: Observability Scan Register Organization

# 11.8 Controllability Features

S\_DIL\_CTL%RSEL\_WSC\_H

S%IFB\_H<96:127>

S%IFB\_PAR\_H<3>

TBD.

48

81

49:80

11–16 Test Interface and Testability Features

# **11.9 Boundary Scan Register**

DECchip 21164-AA Boundary Scan Register is approx. 286 bits long. Table 11-7 gives the boundary scan register organization. The Boundary scan register begins at the TDI\_H pin and traverses in clock-wise direction and ends at TDO\_H pin.

#### **NOT FINAL**

The list below is based on the DECchip 21164-AA die size and pad assignments as of 11/23/92.

| Signal Name          | Туре | Count | BSR Cell     | Remarks          |
|----------------------|------|-------|--------------|------------------|
| P%TDI                | В    | 1     | None         | <u>,</u>         |
| P%SROM_OE_L          | 0    | 1     | out_bcell    |                  |
| P%SROM_CLK_H         | 0    | 1     | out_bcell    |                  |
| P%SROM_DATA_H        | В    | 1     | in_bcell     |                  |
| P%SROM_PRESENT_L     | В    | 1     | in_bcell     | was SROM_DISABLE |
| P%PORT_MODE_H< 0:1>  | Ι    | 2     | in_bcell     |                  |
| P%SYS_RESET_L        | · I  | 1     | in_bcell     |                  |
| P%DC_OK_H            | I    | 1     | in_bcell     |                  |
| P%SYS_MCH_CHK        | I    | 1     | in_bcell     |                  |
| P%PWR_FAIL_IRQ       | Ι    | 1     | in_bcell     |                  |
| P%MCH_HALT_IRQ       | Ι    | 1     | in_bcell     |                  |
| P%IRQ< 3:0>          | I    | 4     | in_bcell     |                  |
| P%CLK_IN_H, _L       | Ι    | 2     | in_bcell     |                  |
| P%CPU_CLK_OUT_H      | 0    | 1     | $out\_bcell$ |                  |
| P%SYS_CLK_OUT_H, _L  | 0    | 2     | out_bcell    |                  |
| P%SYS_CLK2_OUT_H, _L | 0    | 1     | out_bcell    |                  |
| P%ECL_OUT_H          | I    | 1     | in_bcell     |                  |
| P%VREF_H             | Ι    | 1     | in_bcell     |                  |
| P%REF_CLK_IN_H, L    | Ι    | 2     | in_bcell     |                  |
| P%PERF_MON< 0>       | I    | 1     | in_bcell     |                  |
| P%ADDR< 21:5>        | В    | 17    | io_bcell     | U-R corner       |
| P%ADDR< 4>           | В    | 1     | io_bcell     | U-R corner       |
| P%DATA< 063:0>       | В    | 64    | io_bcell     |                  |
| P%DATA_CHECK< 0:7>   | В    | 8     | io_bcell     |                  |
| P%DATA_VALID< 1:0>   | 0    | 2     | out_bcell    |                  |
| P%EV5_SC_SET< 1:0>   | 0    | 2     | out_bcell    | L-R Corner       |
| P%BC_INDEX< 25:4>    | 0    | 22    | out_bcell    | L-R Corner       |

 Table 11–7:
 Boundary Scan Register Organization

| Signal Name           | Туре | Count | BSR Cell  | Remarks       |
|-----------------------|------|-------|-----------|---------------|
| P%EV5_ADDR_RES< 1:0>  | 0    | 2     | out_bcell |               |
| P%IDLE_BC             | Ι    | 1     | in_bcell  |               |
| P%SYS_LCK_FLG         | Ι    | 1     | in_bcell  |               |
| P%DATA_BUS_REQ_H      | Ι    | 1     | in_bcell  |               |
| P%ADDR_BUS_REQ_H      | I    | 1     | in_bcell  |               |
| P%FILL_NOCHK          | Ι    | 1     | in_bcell  | •             |
| P%FILL_ERR            | I ·  | 1     | in_bcell  |               |
| P%FILL_ID_H           | Ι    | 1     | in_bcell  |               |
| P%FILL_H              | Ι    | 1     | in_bcell  | •             |
| P%DACK_H              | I    | 1     | in_bcell  |               |
| P%CFAIL_H             | I    | 1     | in_bcell  |               |
| P%CACK_H              | I    | 1     | in_bcell  |               |
| P%ADDR_CMD_PAR_H      | В    | 1     | io_bcell  |               |
| P%VTM_PENDING         | 0    | 1     | in_bcell  |               |
| P%DATA_RAM_WE         | 0    | 1     | out_bcell |               |
| P%DATA_RAM_OE         | 0    | 1     | out_bcell |               |
| P%TAG_RAM_WE          | 0    | 1     | out_bcell |               |
| P%TAG_RAM_OE          | 0    | 1     | out_bcell |               |
| P%EV5_CMD< 0:3>       | В    | 4     | io_bcell  |               |
| P%TAG_DAT_PAR         | В    | 1     | io_bcell  |               |
| P%TAG_CTL_PAR         | В    | 1     | io_bcell  |               |
| P%TAG_DIRTY           | В    | 1     | io_bcell  |               |
| P%TAG_SHARED          | В    | 1     | io_bcell  |               |
| P%TAG_VALID           | В    | 1     | io_bcell  |               |
| P%BC_TAG< 20:38>      | В    | 19    | io_bcell  | L-L Corner    |
| P%DATA_VALID< 2:3>    | 0    | 2     | out_bcell |               |
| P%DATA_CHECK< 15:8>   | В    | 8     | out_bcell |               |
| P%DATA< 064:127 >     | В    | 64    | io_bcell  |               |
| P%ADDR_H< 39:37>      | В    | 3     | io_bcell  | U-L Corner    |
| P%ADDR_H< 36:22>      | В    | 15    | io_bcell  | U-L Corner    |
| P%spare               |      | 1     | io_bcell  | Captures zero |
| P%TEST_STATUS_H< 1:0> | 0    | 2     | out_bcell |               |
| P%TRST_L              | Ι    | 1     | None      |               |
| P%TCK                 | В    | 1     | None      |               |
| P%TMS                 | В    | 1     | None      |               |
| P%TDO                 | 0    | 1     | None      |               |

Table 11 7 (Cont.), D ..... -\_ alatan Organizati

11–18 Test Interface and Testability Features

**DIGITAL RESTRICTED DISTRIBUTION** 

.

| Signal Name       | Туре | Count | BSR Cell  | Remarks        |
|-------------------|------|-------|-----------|----------------|
| en_for_left_data  | sig  | 1     | out_bcell | tbd            |
| en_for_right_data | sig  | 1     | out_bcell | $\mathbf{tbd}$ |
| en_for_bc_tag     | sig  | 1     | out_bcell | $\mathbf{tbd}$ |
| en_for_??         | sig  | 1     | out_bcell | $\mathbf{tbd}$ |

Table 11–7 (Cont.): Boundary Scan Register Organization

### 11.10 Testability IPRs

The following is the list of IPRs connected to testability features. See chapter on PALCode and IPRs for more details.

- 1. TEST\_STATUS\_H<0> read and TEST\_STATUS\_H<1> write (ICSR).
- 2. Debug port visibility select bits in IPRs (TBD).
- 3. Serial Terminal Port IPRs (SL\_RCV, SL\_XMIT)
- 4. Scache IPRs (SC\_CTL, SC\_ADDR)
- 5. Dcache IPRs (DC\_MODE, DC\_TEST\_CTL, DC\_TEST\_TAG\_TEMP)
- 6. Bcache IPRs (BC\_CONTROL, BC\_TAG\_ADDR)

# 11.11 Test Feature Reset and Initialization

Reset, initialization and defaults of testability features are described through-out this chapter and in the chapter on Reset and Initialization. For convenience, this section summarizes the power-on reset sequence, as it pertains to the testability features for the normal operation. The sequence of events is as follows:

- 1. SYS\_RESET\_L is asserted.
- 2. The values on the SROM\_PRESENT\_L and PORT\_MODE\_H<1> pins are sampled on SYS\_ RESET\_L deassertion.
- 3. If BiSt is bypassed (indicated by a '1' sampled on PORT\_MODE\_H<1>), go to the next step. If the BiSt is not bypassed, keep rest of the chip in reset state. Perform ICache BiSt (and BiSR, if BiSR is required). Clear ICache Tag valid bits at the end of BiSt.
- 4. If SROMs are not present, (indicated by '1' sampled on SROM\_PRESENT\_L), go to the next step.

If SROMs are present, keep rest of the chip in reset state. Load ICache from the SROMs.

5. Deassert internal reset. Fetch the first instruction.

# 11.12 Open Issues

- 1. Should we make chip run w/o external oscillator and with internal PLL during EXTEST and HIGHZ instructions?
- 2. Details of bits in OBL and OBS chains to be defined.
- 3. Details of signals brought to the parallel debug port need to be defined.
- 4. The following additional test feature enhancements on boundary scan are currently being considered
  - CLAMP\_IO Instruction.
  - A ring oscillator mode for the boundary scan.

# 11.13 Revision History

| Who           | When     | Description of change                                                                                                                                     |  |
|---------------|----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| Dilip Bhavsar | 2/13/92  | Working draft                                                                                                                                             |  |
| Dilip Bhavsar | 6/25/92  | Working draft                                                                                                                                             |  |
| Dilip Bhavsar | 7/1/92   | Working draft                                                                                                                                             |  |
| Dilip Bhavsar | 9/16/92  | Rev 0.1 Changes: Second test_status_h pin added. TCR siz<br>changed from 4 to 8 bits to program cpu cycle to be capture<br>during scan. Opcodes redefined |  |
| Dilip Bhavsar | 11/23/92 | Rev 1.0 Clean-up and updates                                                                                                                              |  |

 Table 11–8:
 Revision History

DEC Chip 21164-AA (EV5 CPU) Specification, Revision 1.9, December 1992

11–22 Test Interface and Testability Features