3/14/12 (Solution)
Problem 1: Hyper-Cube
Interconnection Network (6 points)
Answer the following questions based on the Hyper-Cube
interconnection network
1)
What is the percentage of links in an 8-cube
used to implement an equivalent 2-D mesh (16x16) interconnection network? (3 points)
There are totally 8*256/2 links.
There are 15*16*2 links of a 16X16
mesh.
Percentage =(15*16*2)/(4*256)=15/32=46.875%
2)
If the links are duplex (i.e., 1->3, and
3->1 can be done concurrently), and each node cannot receive more than 1
message in a step (i.e., 1->3 and 2->3 cannot be done concurrently). Show
how to use 3 steps to complete the following communication permutation. (3 points)
Sender -> Receiver
|
Step 1
|
Step 2
|
Step 3
|
0->1
|
0->1
|
|
|
1->2
|
1->0
|
0->2
|
|
2->3
|
2->3
|
|
|
3->4
|
3->2
|
2->0
|
0->4
|
4->5
|
4->5
|
|
|
5->6
|
5->4
|
4->6
|
|
6->7
|
6->7
|
|
|
7->0
|
7->6
|
6->4
|
4->0
|
Problem 2: Omega Network (4 points)
Given a multiprocessor with 4096
nodes, an Omega Network is used for the interconnection of the multiprocessor.
16X16 switches are used.
1)
How many switches are required for this Omega
network? (2 points)
Log16(4096)=3 (stages)
4096/16=256
Totally
256*3=768
2)
Show how Node 1073 can send a message to Node
2055. (Hint: You may use Hexadecimal representation of the Node IDs.) (2 points)
1073= 0100 0011 0001two=
411 Hex, 2055= 1000 0000 0111 two= 807 Hex
EX SH EX SH EX
431 ->
438 -> 384 -> 380
-> 803 -> 807
Problem 3:
Synchronization (5 points)
The following C
program can be parallelized.
C programming
int A[5];
int index=0;
while(index != 4) {
index=index+1;
A[index-1]=
A[index];
}
Assume that the initial values of the shared variable
array are A[5]={1, 2, 3, 4, 5}. With a single processor, the resulting values
of the array after executing the program will become A[5]= {2, 3, 4, 5, 5}.
1)
Give an example to show that the above code
results in a racing if two processors are involved. (2 points)
See the following events:
The key problem is
that the index increments and the A matrix updates could be performed in any
order.
For instance, if P0
reads index=0, then writes index=1. Immediately following that, P1 reads
index=1 and writes index=2. Then, if P1 performs A[1]=A[2] before P0 performs
A[0]=A[1], then, the resulting A[0]=3 which is not the same as it from
sequential execution. Thus, there is a racing problem.
2)
The implementation of the program using
assembly is shown below:
Assume that the base address of
array A is in register R1, so, A[0], A[1], …, A[4] can be accessed by
0(R1), 4(R1), … 16(R1) respectively
since each integer number is 4 bytes long. Also, assume that the variable index
is stored at 0(R2). The initial value of 0(R2) is 0 since index is initialized
to 0.
li R0, #4 #
load immediate R0=4
Loop: lw R3, 0(R2) #
R3=index
beq R3, R0, Exit # if(index= =4) goto Exit
addi R3, R3, 1 #
R3=index+1
sw R3, 0(R2) # index= index+1;
mult R4, R3, 4 #
R4=4*R3. Note that each integer number is
#4 bytes long!!
add R5, R1, R4 #
R5=R1+R4, i.e., R5 is the address of A[index]
lw R3, 0(R5) # R3=A[index]
sw R3, -4 (R5) # A[index-1] =A[index];
j Loop #
goto Loop
Exit:
How do you resolve the problem using the atomic exchange, ll
& sc so the resulting values of the array will be the same as the execution
on a single processor? (3 points)
A
possible solution is.
li R6, #4 #
load immediate R6=4
Loop: ll
R3, 0(R2) # R3=index
beq R3, R6, Exit # if(index= =4) goto Exit
addi R3, R3, 1 # R3=index+ 1
mult
R4, R3, 4 # R4=4*R3. Note that
each integer number is 4 bytes long!!
add R5, R1, R4 # R5=R1
+R4, i.e., R5 is the address of A[index]
lw R4, 0(R5) #
R4=A[index] R3 buffers the value of A[index] before it
#
could be probably updated by the other processor.
sc R3, 0 (R2) #
A[index-1] =A[index]; Only one successful write will occur.
beqz R3, Loop # if R3 is 0, index has been modified by the other
processor.
#
Then, abort array update operation
sw R4,-4(R5) #
A[index-1] = A[index];
j Loop #
next iteration
The trick here is
that the shared variable, index, is not updated until A[index] is loaded. Thus,
given the previous example, if P0 reads index=0, then neither of the processors
will be able to read index=1 before A[1] is buffered by R4.
Thus,
the right value of A[1] is preserved by P0 even if A[1]=A[2] is performed
before A[0]=A[1]! (Note that the access order of these two instructions is not
guaranteed.
No comments:
Post a Comment