# **Chapter Six**

Tsung-Han Tsai

## **Pipelining**

- Improve performance by increasing instruction throughput
- Multiple instructions are overlapped in execution

[Figure: task order plotted against time, 6 PM to 2 AM; multicycle version]

## **Pipelining**

- Ideal speedup is the number of stages in the pipeline. Do we achieve this?
- Example on page 438: pipelining improves throughput, not the execution time of an individual instruction.

[Figure: timing diagram with 2 ns intervals]

## **Pipelining**

- What makes it easy (as in MIPS)?
  - all instructions are the same length (some computers have instructions of different lengths in bits)
  - just a few instruction formats
  - memory operands appear only in loads and stores
- What makes it hard?
  - structural hazards: suppose we had only one memory
  - control hazards: need to worry about branch instructions
  - data hazards: an instruction depends on a previous instruction
- We'll build a simple pipeline and look at these issues
- We'll talk about modern processors and what really makes it hard:
  - exception handling
  - trying to improve performance with out-of-order execution, etc.

## **Graphically Representing Single-Cycle Instruction Execution**

- Place datapaths on a time line. This can help answer questions like:
  - How many cycles does it take to execute this instruction?
  - What is the ALU doing during cycle 4?
  - What is the relationship among the different instructions being executed?

## **Pipelined Datapath**

- Pipeline registers are used to separate the pipeline stages
- Each bus with n lines (n bits) uses n D flip-flops as its pipeline register
- All the registers are triggered by the same global clock

## **Pipeline Control**

- Pass control signals along just like the data
- We have 5 stages. What needs to be controlled in each stage?
  - Instruction Fetch and PC Increment
  - Instruction Decode / Register Fetch
  - Execution
  - Memory Stage
  - Write Back

Control lines grouped by stage: execution/address calculation (RegDst, ALUOp1, ALUOp0, ALUSrc); memory access (Branch, MemRead, MemWrite); write-back (RegWrite, MemtoReg).

| Instruction | RegDst | ALUOp1 | ALUOp0 | ALUSrc | Branch | MemRead | MemWrite | RegWrite | MemtoReg |
|-------------|--------|--------|--------|--------|--------|---------|----------|----------|----------|
| R-format    | 1      | 1      | 0      | 0      | 0      | 0       | 0        | 1        | 0        |
| lw          | 0      | 0      | 0      | 1      | 0      | 1       | 0        | 1        | 1        |
| sw          | X      | 0      | 0      | 1      | 0      | 0       | 1        | 0        | X        |
| beq         | X      | 0      | 1      | 0      | 1      | 0       | 0        | 0        | X        |

## **Dependencies**

- Problem with starting the next instruction before the first is finished
  - dependencies that "go backward in time" are data hazards

## **Software Solution**

- Have the compiler guarantee no hazards
- Where do we insert the "nops"?

```
sub $2, $1, $3
and $12, $2, $5
or  $13, $6, $2
add $14, $2, $2
sw  $15, 100($2)
```

- Problem: this really slows us down!

## **Forwarding**

- Use temporary results; don't wait for them to be written
  - register file forwarding to handle a read and a write to the same register
  - ALU forwarding
- In `sw $15, 100($2)`: what if this `$2` was `$13`?

## **Can't Always Forward**

- Load word can still cause a hazard:
  - an instruction tries to read a register immediately after a load instruction that writes to the same register
- Thus, we need a hazard detection unit to "stall" the instruction that follows the load

## **Stalling**

- We can stall the pipeline by keeping an instruction in the same stage

## **Hazard Detection Unit**

- Stall by letting an instruction that won't write anything go forward

## **Branch Hazards**

- When we decide to branch, other instructions are in the pipeline!
- We are predicting "branch not taken"
  - need to add hardware for flushing instructions if we are wrong

## **Dynamic Scheduling**

- Dynamic pipeline scheduling: the hardware reorders instruction execution to avoid pipeline hazards
- The hardware performs the "scheduling"
  - hardware tries to find instructions to execute
  - out-of-order execution is possible
  - speculative execution and dynamic branch prediction
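
Returning to the data hazards covered under Software Solution, Forwarding, and Stalling, the three mechanisms can be sketched in a few lines of Python. This is a minimal illustration, not the hardware design itself. The names `Instr`, `insert_nops`, `forward_select`, and `load_use_stall` are hypothetical. The nop padding assumes no ALU forwarding and a register file that writes in the first half of a cycle and reads in the second half, so a reader must sit at least three slots after its writer. The forwarding and stall checks follow the comparisons usually given for the 5-stage MIPS pipeline.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record used only for this sketch."""
    text: str              # assembly text, for display only
    dest: Optional[int]    # register written, or None (sw, beq, nop)
    srcs: Tuple[int, ...]  # registers read


NOP = Instr("nop", None, ())


def insert_nops(program: List[Instr], min_gap: int = 3) -> List[Instr]:
    """Software solution: pad so each reader is >= min_gap slots after its writer.

    min_gap=3 is an assumption: no ALU forwarding, and the register file
    writes in the first half of a cycle and reads in the second half.
    """
    out: List[Instr] = []
    for ins in program:
        needed = 0
        # Look back over the slots that are still too close to read from.
        for back in range(1, min_gap):
            if back > len(out):
                break
            prev = out[-back]
            if prev.dest not in (None, 0) and prev.dest in ins.srcs:
                needed = max(needed, min_gap - back)
        out.extend([NOP] * needed)
        out.append(ins)
    return out


def forward_select(id_ex_src: int,
                   ex_mem_regwrite: bool, ex_mem_rd: int,
                   mem_wb_regwrite: bool, mem_wb_rd: int) -> str:
    """ALU forwarding: choose where one ALU operand comes from."""
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_src:
        return "EX/MEM"   # the most recent result takes priority
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_src:
        return "MEM/WB"
    return "REGFILE"      # no hazard: use the value read in ID


def load_use_stall(id_ex_memread: bool, id_ex_rt: int,
                   if_id_rs: int, if_id_rt: int) -> bool:
    """Hazard detection: the instruction in ID reads what a load in EX writes."""
    return id_ex_memread and id_ex_rt in (if_id_rs, if_id_rt)


# The sequence from the Software Solution slide.
program = [
    Instr("sub $2, $1, $3", 2, (1, 3)),
    Instr("and $12, $2, $5", 12, (2, 5)),
    Instr("or $13, $6, $2", 13, (6, 2)),
    Instr("add $14, $2, $2", 14, (2, 2)),
    Instr("sw $15, 100($2)", None, (15, 2)),
]
padded = insert_nops(program)
for ins in padded:
    print(ins.text)
```

Under these assumptions the padding places both nops directly after `sub`, since only `and` and `or` would otherwise read `$2` too early. With forwarding enabled, `forward_select` removes the need for those nops, and `load_use_stall` flags the one case forwarding cannot cover, for example a hypothetical `lw $2, 20($1)` followed immediately by an instruction that reads `$2`.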