Node-to-Network Interface in Scalable Multiprocessors
CS 258, Spring 99
• Input buffer overflow
– N-1 queue over-commitment => must slow sources
– reserve space per source (credit)
» when available for reuse? Ack or higher level
– Refuse input when full
» backpressure in reliable network
» tree saturation
» deadlock free
» what happens to traffic not bound for congested dest?
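The credit scheme in the bullets above can be sketched in a few lines of Python. This is only an illustrative model, not any particular machine's mechanism: the class names `CreditedReceiver` and `CreditedSender`, the slot counts, and the choice to return one credit per consumed message are all assumptions made for the example.

```python
from collections import deque

class CreditedReceiver:
    """Input buffer with a fixed number of slots reserved per source."""
    def __init__(self, sources, slots_per_source):
        self.queues = {s: deque() for s in sources}
        self.slots = slots_per_source

    def accept(self, src, msg):
        # A sender that respects its credits can never overflow its reservation.
        assert len(self.queues[src]) < self.slots, "sender exceeded its credits"
        self.queues[src].append(msg)

    def consume(self, src):
        """Remove one message; the freed slot goes back as one credit (the ack)."""
        self.queues[src].popleft()
        return 1

class CreditedSender:
    def __init__(self, name, receiver, credits):
        self.name, self.receiver, self.credits = name, receiver, credits
        self.stalled = 0

    def send(self, msg):
        if self.credits == 0:
            self.stalled += 1          # out of credits: the source is slowed
            return False
        self.credits -= 1
        self.receiver.accept(self.name, msg)
        return True

    def ack(self, n):
        self.credits += n              # space is available for reuse

def demo():
    rx = CreditedReceiver(["A", "B"], slots_per_source=2)
    a = CreditedSender("A", rx, credits=2)
    results = [a.send(i) for i in range(3)]   # third send has no credit left
    a.ack(rx.consume("A"))                    # receiver frees a slot
    results.append(a.send(3))
    return results, a.stalled
```

Here the ack is generated when the receiver consumes a message; returning it from a higher-level protocol event instead would only change where `ack` is called.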
» Each may generate a response, which cannot be sent!
» What happens when internal buffering is full?
• logically independent request/reply networks
– physical networks
– virtual channels with separate input/output queues
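To see why separate request and reply queues help, the following Python sketch simulates two nodes that send requests to each other over bounded queues. It is a toy model under stated assumptions: the function `run`, the queue capacity, and the per-step ordering of actions are hypothetical and chosen only to make the behaviour visible.

```python
from collections import deque

def run(split, capacity=2, requests_per_node=4, max_steps=100):
    """Two nodes exchange requests; returns replies received, or None on deadlock."""
    kinds = ("req", "rep") if split else ("any",)
    # link[src][kind] is the bounded FIFO leaving node src.
    link = {n: {k: deque() for k in kinds} for n in (0, 1)}
    to_issue = [requests_per_node, requests_per_node]
    done = [0, 0]

    def q(node, kind):
        # With a shared network, requests and replies use the same queue.
        return link[node][kind if split else "any"]

    for _ in range(max_steps):
        progress = False
        for me in (0, 1):
            peer = 1 - me
            # Issue a new request if the outgoing request queue has room.
            if to_issue[me] and len(q(me, "req")) < capacity:
                q(me, "req").append("req")
                to_issue[me] -= 1
                progress = True
            # Sink a reply: always possible, since it generates no new traffic.
            if q(peer, "rep") and q(peer, "rep")[0] == "rep":
                q(peer, "rep").popleft()
                done[me] += 1
                progress = True
            # Service a request only if its response can be queued.
            if (q(peer, "req") and q(peer, "req")[0] == "req"
                    and len(q(me, "rep")) < capacity):
                q(peer, "req").popleft()
                q(me, "rep").append("rep")
                progress = True
        if done == [requests_per_node] * 2:
            return done
        if not progress:
            return None          # nobody can move: deadlock
    return None
```

With `split=False` both outgoing queues fill while a request sits at the head of each, so neither node can queue its response and the run stalls. With `split=True` replies travel on their own queues, which always drain, so every request completes.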
[Figure: Node Architecture. Nodes, each with a processor (P), memory (M), and CA, exchange a Message across a Scalable Network.]
• Key Design Issue:
• How much interpretation of the message?
[Figure: design spectrum, with example machines for each design point]
– nCUBE, iPSC, . . .
– CM-5, *T
– J-Machine, . . .
– Paragon, Meiko
– RP3, BBN, T3D
[Figure: DMA channels between Memory and the network on the sending and receiving nodes. Labels: Data, Cmd, Addr, Length, Rdy, interrupt, P, Memory.]
• DMA controlled by regs, generates interrupts
[Figure label: dest addr]
• Receive
– must receive into system buffer, since no interpretation in CA
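The receive path described above can be sketched as follows. This is a simplified model, not a description of any real controller: the `DmaChannel` class, the buffer addresses, and the one-byte message header are assumptions made for illustration. The point it shows is that the channel copies bytes blindly into a preposted system buffer and raises an interrupt, and only software interprets the message.

```python
SYSTEM_BUF, USER_BUF, BUF_SIZE = 0, 64, 64   # hypothetical memory layout

class DmaChannel:
    """Receive-side DMA channel programmed through Addr/Length/Rdy registers."""
    def __init__(self, memory, on_interrupt):
        self.memory = memory
        self.on_interrupt = on_interrupt
        self.addr = 0
        self.length = 0
        self.rdy = False

    def program(self, addr, length):
        self.addr, self.length, self.rdy = addr, length, True

    def packet_arrives(self, payload):
        if not self.rdy or len(payload) > self.length:
            return False                       # no buffer posted: input refused
        # Blind copy: the hardware does not look inside the payload.
        self.memory[self.addr:self.addr + len(payload)] = payload
        self.rdy = False
        self.on_interrupt(self.addr, len(payload))
        return True

def demo():
    memory = bytearray(128)
    delivered = []

    def interrupt_handler(addr, nbytes):
        # Software interprets the message: strip the 1-byte header
        # (assumed format) and copy the body to the user buffer.
        body = memory[addr + 1:addr + nbytes]
        memory[USER_BUF:USER_BUF + len(body)] = body
        delivered.append((memory[addr], len(body)))
        chan.program(SYSTEM_BUF, BUF_SIZE)     # repost the system buffer

    chan = DmaChannel(memory, interrupt_handler)
    chan.program(SYSTEM_BUF, BUF_SIZE)
    ok = chan.packet_arrives(bytes([7]) + b"hello")
    return ok, delivered, bytes(memory[USER_BUF:USER_BUF + 5])
```

The extra copy from the system buffer to the user buffer is the cost of having no interpretation at the interface.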
[Figure: two Memory units connected through a Switch, with Addr and Length registers on each side.]
[Figure: Host Memory and NIC. Host memory holds Data buffers and chained descriptors, each with Addr, Len, Status, and Next fields; the NIC has TX and RX sides with addr and len registers and DMA; Proc is also shown.]
User Level Network Ports
[Figure: the network port appears in the virtual address space, alongside the processor's registers and program counter.]
[Figure: machine organization. Labels: Diagnostics network, Processing partition (twice), PM nodes, processors.]
• tag per message
[Figure: node organization. Labels: SPARC, $, SRAM, NI, Data networks, DRAM ctrl, Vector unit, DRAM.]
[Figure labels: Data, Address, Mem, Mem]
• Hardware support to vector to address specified in message
– message ports in registers
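Vectoring to an address carried in the message can be mimicked in software with a dispatch table. The sketch below is only an analogy for the hardware support named above: the handler addresses `0x100` and `0x104`, the `(handler_address, argument)` message layout, and the two handlers are hypothetical.

```python
def make_node():
    """Build per-node state and a table mapping handler addresses to code."""
    state = {"sum": 0, "log": []}

    def add_handler(value):
        state["sum"] += value

    def log_handler(text):
        state["log"].append(text)

    # Stand-in for instruction addresses; real hardware would jump directly.
    handlers = {0x100: add_handler, 0x104: log_handler}
    return state, handlers

def dispatch(handlers, message):
    """Message = (handler_address, argument); vector straight to the handler."""
    handler_addr, arg = message
    handlers[handler_addr](arg)

def demo():
    state, handlers = make_node()
    for msg in [(0x100, 5), (0x104, "hi"), (0x100, 2)]:
        dispatch(handlers, msg)
    return state
```

Because the message itself names its handler, the receiver needs no intermediate buffering or tag matching before it starts acting on the data.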
*T: Network Co-Processor
Interface unit
• Nodes integrate communication with computation on a systolic basis
Dedicated processing without dedicated hardware design
[Figure: two nodes, each with Mem, MP, and P. Labels: Mem, MP, P, User.]
Levels of Network Transaction
[Figure: nodes attached to the Network, each with Mem, P, MP, and NI.]
• User Processor stores cmd / msg / data into shared output queue
– must still check for output queue full (or make elastic)
• Communication assists make transaction happen
– checking, translation, scheduling, transport, interpretation
• Effect observed on destination address space and/or events
• Protocol divided between two layers
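The division of work in the bullets above can be sketched as a bounded output queue filled by the user processor and drained by the assist. This is a minimal model under assumptions: `OutputQueue`, `message_processor`, the command tuple layout, and the `page_table` mapping are hypothetical names, and scheduling and interpretation are left out.

```python
from collections import deque

class OutputQueue:
    """Bounded queue shared by the user processor and the assist."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def try_put(self, cmd):
        if len(self.items) >= self.capacity:
            return False               # queue full: the user must retry later
        self.items.append(cmd)
        return True

def message_processor(queue, page_table, network):
    """Drain the queue: check, translate, and transport each command."""
    while queue.items:
        op, dest_node, vaddr, data = queue.items.popleft()
        if (dest_node, vaddr) not in page_table:       # checking
            continue                                   # reject unmapped target
        paddr = page_table[(dest_node, vaddr)]         # translation
        network.append((dest_node, op, paddr, data))   # transport

def demo():
    q = OutputQueue(capacity=2)
    page_table = {(1, 0x1000): 0x8000}
    network = []
    cmds = [("write", 1, 0x1000, b"a"),
            ("write", 1, 0x2000, b"b"),    # not mapped: fails the check
            ("write", 1, 0x1000, b"c")]
    accepted = [q.try_put(c) for c in cmds]   # third put finds the queue full
    message_processor(q, page_table, network)
    accepted.append(q.try_put(cmds[2]))       # retry after the queue drains
    message_processor(q, page_table, network)
    return accepted, network
```

The user side only stores commands and tests for a full queue; everything that needs protection or addressing knowledge happens in the assist.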
[Figure: node organization. Labels: i860xp (50 MHz; 16 KB $, 4-way, 32B block), Mem, $, NI, MP, MP handler, rDMA, 2048 B, rte, 16.]