UCES
Course: Methods and Analysis in UCES
Prerequisites:
Objectives:
General Information: In this lecture we present the need for 3D models and high performance computing. Some elementary versions of SOR for vector pipelines and multiprocessors are presented, and they are used for the diffusion of heat in three directions.
Contact Person: R. E. White, NCSU,white@math.ncsu.edu
Revision Date: 8-4-94
Copyright ©1994, R. E. White. All rights reserved.
The Maple codes are executed very slowly, and for larger problems such as in the 2D models for heat and fluid flow this can become a hindrance to the scientific investigation. Several alternatives are to use faster programming languages, faster algorithms, or faster computers. In this section we will use the Fortran programming language and a computer which has more than one processor and has vector pipes.
We will consider the cooling fin problem where there is diffusion in all three directions. When each direction is discretized, say with N unknowns in each direction, then there will be N3 total unknowns. So, if the N is doubled, then the total number of unknowns will increase by a factor of 8! Moreover, if one uses the full version of Gaussian elimination, the number of floating point operations will be of order (N3 )3 /3. And, a doubling of N will increase the number of floating point operations to execute the Gaussian elimination algorithm by a factor of 64! This is known as the curse of dimensionality, and it requires the use of faster computers, languages and algorithms.
Many problems are not only 3D problems, but often have more than one physical quantity associated with them. Two examples are aircraft modeling and weather prediction models. In the case of an aircraft, the lift forces are determined by the velocity with three components, the pressure and in many cases the temperatures. So, there are five quantities which vary with 3D space. Weather forecasting models are much more complicated because (1) there are more 3D quantities, (2) often one does not precisely know the boundary conditions and (3) there are chemical and physical changes in system. Such problems require very complicated models, and the need for faster languages, algorithms and computing hardware is essential to give realistic numerical simulations.
Consider an electric transformer that is used on a power line. Heat is generated by the electrical current flowing through the wire inside the transformer. In order to cool the transformer, fins which are not long or very thin in any direction are attached to the transformer. Thus, there will be significant temperature distributions in each of the three directions, and consequently, there will be heat diffusion in all three directions. The problem is to determine the steady state heat distribution in the 3D fin so that one can determine the fin's cooling effectiveness.
![]() |
| Figure: 3D Fin on an Electrical Transformer |
In order to model the temperature, we will first assume temperature is given
along the 3D boundary of the volume
![]() |
| Figure: Heat Flow in Small Volume |
The heat flowing through these six surfaces will be given by the Fourier heat law applied to each of the three directions. A steady state model is
| change in heat of (dx dy dz) = 0 |
f(x,y,z) (dx dy dz) dt |
| + (dx dy) dt (Kuz(x, y, z+dz) - Kuz(x,y,z)) | |
| + (dx dz) dt (Kuy(x, y+dy, z) - Kuy(x,y,z)) | |
| + (dy dz) dt (Kux(x+dx, y, z) - Kux(x,y,z)). |
This approximation gets more accurate as dx, dy and dz go to zero. So, divide
by
| 0 = f(x,y,z) + (Kuz(x,y,z))z + (Kux(x,y,z))x + (Kuy(x,y,z))y | (1) |
In order to keep the notation as simple as possible, we will assume that the
number of cells in each direction, nx, ny and nz, are such that
Finite Difference Model of (1).
| 0 = f(ih,jh,lh) | + [(ui,j,l+1 - uijl )/h - (uijl - ui,j,l-1 )/h]/h | |
| + [(ui+1,j,l - uijl )/h - (uijl - ui-1,j,l )/h]/h | ||
| + [(ui,j+1,l - uijl )/h - (uijl - ui,j-1,l )/h]/h | (2) |
where 1
i
nx - 1,
1
j
ny - 1 and
1
l
nz - 1.
Equation (2) simplifies upon multiplication by h2 and collection of like terms to
| 6uijl = | f(ih,jh,lh)h2 + ui-1,j,l + ui+1,j,l + ui,j-1,l | |
| + ui,j+1,l + ui,j,l-1 + ui+1,j,l+1 . | (3) |
Equation (3) suggests the use of the SOR algorithm where there are three nested loops within the SOR loop. The uijl are now stored in a 3D array, and either f(ih,jh,lh) can be computed every SOR sweep, or f(ih,jh,lh) can be computed once and stored in a 3D array.
choose nx, ny , nz such that h= L/nx = H/ny = T/nz
for m = 1,maxit
for l = 1,nz
for j = 1,ny
for i = 1,nx
utemp = (f(i*h, j*h, l*h)*h*h
+ u(i-1,j,l) + u(i+1,j,l)
+ u(i,j-1,l) + u(i,j+1,l)
+ u(i,j,l-1) + u(i,j,l+1))/6
u(i,j,l) = (1 - w)*u(i,j,l) + w*utemp
endloop
endloop
endloop
test for convergence
endloop.
The computation of each new u(i,j,l) requires the six nearest values u as depicted in the next figure. Let the nodes be partitioned into two large blocks and a smaller third block separating the two large blocks of nodes. Thus, if ijl is in block 1 ( or 2), then only input from block 1 (or 1) and block 3 will be required to do the SOR computation. This suggests that one can reorder the nodes so that disjoint blocks of nodes, which are separated by a plane of nodes, can be computed concurrently in the SOR algorithm. This is known as domain decompositon or substructuring when the block can be identified with a physical substructure such as a wing of an aircraft.
![]() |
| Figure: Domain Decomposition and SOR |
In the algorithm below we have used the following arrays to define the blocks:
Domain Decomposition and 3D SOR Algorithm.
choose nx, ny, nz such that h= L/nx = H/ny = T/nz define blocks 1, 2 and 3 for m = 1,maxit concurrently do SOR on blocks 1 and 2 update u do SOR on block 3 test for convergence endloop.SOR on block k:
for l = nblock(k,1),nblock(k,2)
for j = 1,ny
for i = 1,nx
utemp = (f(i*h, j*h, l*h)*h*h +
u(i-1,j,l) + u(i+1,j,l)+
u(i,j-1,l) + u(i,j+1,l) +
u(i,j,l-1) + u(i,j,l+1))/6
u(i,j,l) = (1 - w)*u(i,j,l) + w*utemp
endloop
endloop
endloop.
The first code illustrates the 3D steady state cooling fin problem with the
finite difference discrete model given in (3) where
| L = 2 | H = 1 | T = 4 | w = 1.8 |
| nx = 40 | ny = 20 | nz = 80 | eps = 0.001. |
There were 58,539 unknowns, and the number of iterations required for
convergence was 62. The SOR parameter was
= 1.80 and is near optimal for the
convergence criteria of eps = 0.001.
The Alliant FX/40 vector/multiprocessing was used. With no use of the vector pipe the serial computation required 28.99 sec., and the two processor computation with domain decomposition required 15.26 sec. The speedup is almost equal to two. The speedup is not two because of (1) the time needed to exchange data between the two processors, and (2) the time needed to do the SOR on block 3 which is done by just one of the processors. The Alliant FX/40 is a loop based parallel version of Fortran, and the concurrent calculations are done in loop 50 where the SOR subroutine is concurrently called for blocks 1 and 2.
Fortran Code For 3D Cooling Fin With Doman Decompostion and SOR.
program pde
implicit double precision (a-d,f-h,o-z)
real tt1(2),tt2(2)
dimension u(0:40,0:20,0:80)
integer nblock(3,2),numi(1:3)
common u,w,h,eps,nblock,numi,nx,ny,nz
nx = 40
ny = 20
nz = 80
nn = (nx-1)*(ny-1)*(nz-1)
w = 1.80
dw = 0.025
h = 1.0/ny
eps = 0.001
maxit = 200
u(0:nx,0:ny,0:nz) = 70
u(0,1:ny-1,1:nz-1) = 160
nblock(1,1) = 1
nblock(1,2) = int((nz-1)/2)
nblock(2,1) = nblock(1,2) + 2
nblock(2,2) = nz-1
nblock(3,1) = nblock(1,2) + 1
nblock(3,2) = nblock(3,1)
t1 = etime(tt1)
do 100 iter = 1,maxit
numi(1:3) = 0
cvd$l concur
cvd$l nodepchk
cvd$l cncall
do 50 k=1,2
call sor(k)
50 continue
call sor(3)
nnumi = numi(1) + numi(2) + numi(3)
if (nnumi.eq.nn) goto 101
100 continue
101 continue
t2 = etime(tt2)
time = t2 - t1
print*, time
print*, iter,nnumi,nn,w
print*, u(5,5,40),u(10,5,40), u(20,5,40),u(30,5,40)
print*, u(5,15,40),u(10,15,40), u(20,15,40),u(30,15,40)
print*, u(5,10,70),u(10,10,70), u(20,10,70),u(30,10,70)
print*, u(5,10,60),u(10,10,60), u(20,10,60),u(30,10,60)
print*, u(5,10,40),u(10,10,40), u(20,10,40),u(30,10,40)
print*, u(5,10,20),u(10,10,20), u(20,10,20),u(30,10,20)
print*, u(5,10,10),u(10,10,10), u(20,10,10),u(30,10,10)
w = w + dw
stop
end
c
Output Data:
15.26
62 58539 58539 1.80
109.12 087.09 073.49 070.69
109.12 087.09 073.49 070.69
115.00 089.26 073.27 070.55
118.55 092.92 074.57 070.85
118.93 093.46 074.93 070.97
118.55 092.92 074.57 070.55
115.00 089.25 073.26 070.55
subroutine sor(k)
implicit double precision (a-d,f-h,o-z)
dimension u(0:40,0:20,0:80)
integer nblock(3,2),numi(1:3)
common u,w,h,eps,nblock,numi,nx,ny,nz
cvd$l nodepchk
cvd$l novector
cvd$l noconcur
do 10 l=nblock(k,1),nblock(k,2)
do 10 j = 1,ny-1
do 10 i = 1,nx-1
up = u(i,j,l)
uu = (0.0 +
$ u(i-1,j,l) + u(i+1,j,l)
$ +u(i,j-1,l) + u(i,j+1,l)
$ +u(i,j,l-1) + u(i,j,l+1))*0.166666667
u(i,j,l) = (1.0 -w)*up + w*uu
error = abs(u(i,j,l) - up)
if (error.lt.eps) numi(k) = numi(k) + 1
10 continue
return
end
The next code shows how one can reorder nodes so that the vector pipelines in each of the two processors can be used to execute the SOR algorithm. We do this for the the 1D diffusion model given by
| -(Kux )x = f(x) | (continuum version) |
| K(-ui-1 + 2ui - ui+1 ) = h2 f(ih) | (discrete version). |
The SOR method requires input of ui-1 and ui+1 in order to compute the new SOR value of ui. Thus, if i is even, then only the u with odd subscripts are required as input. The vector version of each SOR iteration is to group all the even nodes and all the odd nodes:
This is some times called red-black ordering.
The following Fortran code illustrates this and is coupled with the domain decomposition reordering. The domain decomposition blocks are two large left and right blocks listed first, and the third block is a single node in the center. Loop 50 does the concurrent calculation over the large left and right blocks. Within the large blocks even and odd nodes are grouped together, and the vector pipes are used to execute loops 10 and 11 in the subroutine SOR. The serial time with no vector pipe was .3111 sec., and with one processor and vector pipe the time was .1930 sec. When two processors were used as well as the vector pipe, the computation time was .1056 sec.
Fortran Code in 1D with Vector and Domain Decomposition For SOR.
program ode
implicit double precision (a-d,f-h,o-z)
real tt1(2),tt2(2)
dimension u(500),up(500)
integer nblock(3,2)
common u,w,h,nblock
n = 240
nn = n-1
w = 1.98
dw = 0.005
h = 1.0/n
eps = 0.00001
maxit = 2000
c do 999 itw = 1,1
u(0:500) = 0.0
up(0:500) = 0.0
nblock(1,1) = 1
nblock(1,2) = int((n-1)/2)
nblock(2,1) = nblock(1,2) + 2
nblock(2,2) = n-1
nblock(3,1) = nblock(1,2) + 1
nblock(3,2) = nblock(3,1)
t1 = etime(tt1)
do 100 iter = 1,maxit
cvd$l concur
cvd$l nodepchk
cvd$l cncall
do 50 k=1,2
call sor(k)
50 continue
call sor(3)
numi = 0
do 60 i=1,n-1
if (abs(u(i)-up(i)).lt.eps) numi = numi + 1
up(i) = u(i)
60 continue
if (numi.eq.nn) goto 101
100 continue
101 continue
t2 = etime(tt2)
time = t2 - t1
print*, iter,numi,n,w
print*, time,u(nblock(3,1))
w = w + dw
c 999 continue
stop
end
c
subroutine sor(k)
implicit double precision (a-d,f-h,o-z)
dimension u(500)
integer nblock(3,2)
common u,w,h,nblock
cvd$l nodepchk
cvd$l vector
cvd$l noconcur
do 10 i=nblock(k,1),nblock(k,2),2
uu = (10.0*h*h*(1.0 - i*h)*i*h +
$ u(i-1) + u(i+1))*0.5
u(i) = (1.0 -w)*u(i) + w*uu
10 continue
cvd$l nodepchk
cvd$l vector
cvd$l noconcur
do 11 i=nblock(k,1)+1,nblock(k,2),2
uu = (10.0*h*h*(1.0 - i*h)*i*h +
$ u(i-1) + u(i+1))*0.5
u(i) = (1.0 -w)*u(i) + w*uu
11 continue
return
end
The output from the 3D code gives variable temperatures in all three directions. This indicates that a 1D or a 2D model is not applicable for this particular fin. A possible problem with the present 3D model is the given boundary condition for the portion of the surface which is between the fin and the surrounding region. Here the alternative is a convective boundary condition
Both the surrounding temperature, usur , and the temperature of the transformer (we used u = 160 at x = 0) will vary with time. Thus, this really is a time dependent problem, and furthermore, one should consider the entire transformer and not just one fin.
. When will the given boundary condition
be acceptable?