UNDERGRADUATE COMPUTATIONAL ENGINEERING AND SCIENCE

UCES

"High Performance Computing and Diffusion in 3D"

Course:   Methods and Analysis in UCES

Prerequisites:

  1. Math:   Calculus 3 and matrices
  2. Computing:   Fortran
  3. Science:   Physics and 2D heat diffusion

Objectives:

  1. Math:   Partial differential equation with three variables
  2. Computing:   Use of vector pipelines and multiprocessors
  3. Science:   Model which require 3D

General Information: In this lecture we present the need for 3D models and high performance computing. Some elementary versions of SOR for vector pipelines and multiprocessors are presented, and they are used for the diffusion of heat in three directions.

Contact Person:   R. E. White, NCSU,white@math.ncsu.edu

Revision Date:   8-4-94

Copyright ©1994, R. E. White. All rights reserved.

Return to Table of Contents


3.8   High Performance Computing and Diffusion in 3D

3.8.1.   Introduction.

The Maple codes are executed very slowly, and for larger problems such as in the 2D models for heat and fluid flow this can become a hindrance to the scientific investigation. Several alternatives are to use faster programming languages, faster algorithms, or faster computers. In this section we will use the Fortran programming language and a computer which has more than one processor and has vector pipes.

We will consider the cooling fin problem where there is diffusion in all three directions. When each direction is discretized, say with N unknowns in each direction, then there will be N3 total unknowns. So, if the N is doubled, then the total number of unknowns will increase by a factor of 8! Moreover, if one uses the full version of Gaussian elimination, the number of floating point operations will be of order (N3 )3 /3. And, a doubling of N will increase the number of floating point operations to execute the Gaussian elimination algorithm by a factor of 64! This is known as the curse of dimensionality, and it requires the use of faster computers, languages and algorithms.

Many problems are not only 3D problems, but often have more than one physical quantity associated with them. Two examples are aircraft modeling and weather prediction models. In the case of an aircraft, the lift forces are determined by the velocity with three components, the pressure and in many cases the temperatures. So, there are five quantities which vary with 3D space. Weather forecasting models are much more complicated because (1) there are more 3D quantities, (2) often one does not precisely know the boundary conditions and (3) there are chemical and physical changes in system. Such problems require very complicated models, and the need for faster languages, algorithms and computing hardware is essential to give realistic numerical simulations.


3.8.2.   Applied Area.

Consider an electric transformer that is used on a power line. Heat is generated by the electrical current flowing through the wire inside the transformer. In order to cool the transformer, fins which are not long or very thin in any direction are attached to the transformer. Thus, there will be significant temperature distributions in each of the three directions, and consequently, there will be heat diffusion in all three directions. The problem is to determine the steady state heat distribution in the 3D fin so that one can determine the fin's cooling effectiveness.

Figure: 3D Fin on an Electrical Transformer


3.8.3.   Model.

In order to model the temperature, we will first assume temperature is given along the 3D boundary of the volume (0,L)×(0,H)×(0,T). Consider a small mass within the above fin whose volume is (dx dy dz). This volume will have heat sources or sinks via the two (dx dz) surfaces, two (dy dz) surfaces, and two (dx dy) surfaces as well as any internal heat source given by f with units of heat/(vol. time).

Figure: Heat Flow in Small Volume

The heat flowing through these six surfaces will be given by the Fourier heat law applied to each of the three directions. A steady state model is

change in heat of (dx dy dz) = 0 ~ f(x,y,z) (dx dy dz) dt

  + (dx dy) dt (Kuz(x, y, z+dz) - Kuz(x,y,z))

    + (dx dz) dt (Kuy(x, y+dy, z) - Kuy(x,y,z))

      + (dy dz) dt (Kux(x+dx, y, z) - Kux(x,y,z)).

This approximation gets more accurate as dx, dy and dz go to zero. So, divide by (dx dy dz) dt and let dx, dy and dz go to zero.

Steady State 3D Diffusion.

0 = f(x,y,z) + (Kuz(x,y,z))z + (Kux(x,y,z))x + (Kuy(x,y,z))y (1)

  for (x,y,z) in (0,L)×(0,H)×(0,T),

    f(x,y,z) is the internal heat source,
    K is the thermal conductivity and
    u(x,y,z) given on the surface of (0,L)×(0,H)×(0,T).


3.8.4.   Method.

In order to keep the notation as simple as possible, we will assume that the number of cells in each direction, nx, ny and nz, are such that dx = dy = dz = h. Let uijl be the approximation of u(ih,jh,lh). Approximate the second order derivatives by the centered finite differences. There are nn = (nx - 1)(ny - 1)(nz - 1) equations \ for nn unknowns uijl.

Finite Difference Model of (1).

0 = f(ih,jh,lh) + [(ui,j,l+1 - uijl )/h - (uijl - ui,j,l-1 )/h]/h

  + [(ui+1,j,l - uijl )/h - (uijl - ui-1,j,l )/h]/h

    + [(ui,j+1,l - uijl )/h - (uijl - ui,j-1,l )/h]/h (2)

where 1 i nx - 1,   1 j ny - 1 and   1 l nz - 1.

Equation (2) simplifies upon multiplication by h2 and collection of like terms to

6uijl = f(ih,jh,lh)h2 + ui-1,j,l + ui+1,j,l + ui,j-1,l

    + ui,j+1,l + ui,j,l-1 + ui+1,j,l+1 . (3)

Equation (3) suggests the use of the SOR algorithm where there are three nested loops within the SOR loop. The uijl are now stored in a 3D array, and either f(ih,jh,lh) can be computed every SOR sweep, or f(ih,jh,lh) can be computed once and stored in a 3D array.

3D SOR Algorithm.

        choose nx, ny , nz such that h= L/nx = H/ny = T/nz
        for m = 1,maxit
            for l = 1,nz
                for j = 1,ny
                    for i = 1,nx
                        utemp = (f(i*h, j*h, l*h)*h*h 
                                     + u(i-1,j,l) + u(i+1,j,l)
                                         + u(i,j-1,l) + u(i,j+1,l) 
                                            + u(i,j,l-1) + u(i,j,l+1))/6
                        u(i,j,l) = (1 - w)*u(i,j,l) + w*utemp
                    endloop
                endloop
            endloop
	    test for convergence
	endloop.

The computation of each new u(i,j,l) requires the six nearest values u as depicted in the next figure. Let the nodes be partitioned into two large blocks and a smaller third block separating the two large blocks of nodes. Thus, if ijl is in block 1 ( or 2), then only input from block 1 (or 1) and block 3 will be required to do the SOR computation. This suggests that one can reorder the nodes so that disjoint blocks of nodes, which are separated by a plane of nodes, can be computed concurrently in the SOR algorithm. This is known as domain decompositon or substructuring when the block can be identified with a physical substructure such as a wing of an aircraft.

Figure: Domain Decomposition and SOR

In the algorithm below we have used the following arrays to define the blocks:

nblock(k,1) = min l     for uijl in block k and
nblock(k,2) = max l     for uijl in block k.

Domain Decomposition and 3D SOR Algorithm.

	choose nx, ny, nz such that h= L/nx = H/ny = T/nz
	define blocks 1, 2 and 3
	for m = 1,maxit
		concurrently do SOR on blocks 1 and 2
		update u
		do SOR on block 3
		test for convergence
	endloop.

SOR on block k:

        for l = nblock(k,1),nblock(k,2)
            for j = 1,ny
                for i = 1,nx
                    utemp = (f(i*h, j*h, l*h)*h*h +
                                 u(i-1,j,l) + u(i+1,j,l)+
                                       u(i,j-1,l) + u(i,j+1,l) +
                                             u(i,j,l-1) + u(i,j,l+1))/6
                    u(i,j,l) = (1 - w)*u(i,j,l) + w*utemp
                endloop
            endloop
        endloop.


3.8.5.   Implementation.

The first code illustrates the 3D steady state cooling fin problem with the finite difference discrete model given in (3) where f(x,y,z) = 0.0. We used the following parameters:

L = 2 H = 1 T = 4 w = 1.8
nx = 40 ny = 20 nz = 80 eps = 0.001.

There were 58,539 unknowns, and the number of iterations required for convergence was 62. The SOR parameter was omega = 1.80 and is near optimal for the convergence criteria of eps = 0.001.

The Alliant FX/40 vector/multiprocessing was used. With no use of the vector pipe the serial computation required 28.99 sec., and the two processor computation with domain decomposition required 15.26 sec. The speedup is almost equal to two. The speedup is not two because of (1) the time needed to exchange data between the two processors, and (2) the time needed to do the SOR on block 3 which is done by just one of the processors. The Alliant FX/40 is a loop based parallel version of Fortran, and the concurrent calculations are done in loop 50 where the SOR subroutine is concurrently called for blocks 1 and 2.

Fortran Code For 3D Cooling Fin With Doman Decompostion and SOR.

      program pde
      implicit double precision (a-d,f-h,o-z)
      real tt1(2),tt2(2)
      dimension u(0:40,0:20,0:80)
      integer nblock(3,2),numi(1:3)
      common u,w,h,eps,nblock,numi,nx,ny,nz 
      nx = 40
      ny = 20
      nz = 80
      nn = (nx-1)*(ny-1)*(nz-1)
      w = 1.80
      dw = 0.025 
      h = 1.0/ny
      eps = 0.001
      maxit = 200 
      u(0:nx,0:ny,0:nz) = 70
      u(0,1:ny-1,1:nz-1) = 160
      nblock(1,1) = 1
      nblock(1,2) = int((nz-1)/2)
      nblock(2,1) = nblock(1,2) + 2
      nblock(2,2) = nz-1
      nblock(3,1) = nblock(1,2) + 1
      nblock(3,2) = nblock(3,1)
      t1 = etime(tt1)
      do 100 iter = 1,maxit
         numi(1:3) = 0
cvd$l concur
cvd$l nodepchk
cvd$l cncall
      do 50 k=1,2
         call sor(k)
   50 continue
      call sor(3)
      nnumi = numi(1) + numi(2) + numi(3)
      if (nnumi.eq.nn) goto 101
  100 continue
  101 continue
      t2 = etime(tt2)
      time = t2 - t1
      print*, time
      print*, iter,nnumi,nn,w        
      print*, u(5,5,40),u(10,5,40), u(20,5,40),u(30,5,40)
      print*, u(5,15,40),u(10,15,40), u(20,15,40),u(30,15,40)
      print*, u(5,10,70),u(10,10,70), u(20,10,70),u(30,10,70)
      print*, u(5,10,60),u(10,10,60), u(20,10,60),u(30,10,60)
      print*, u(5,10,40),u(10,10,40), u(20,10,40),u(30,10,40)
      print*, u(5,10,20),u(10,10,20), u(20,10,20),u(30,10,20)
      print*, u(5,10,10),u(10,10,10), u(20,10,10),u(30,10,10)
      w = w + dw  
      stop
      end
c

   Output Data:
		15.26
		62		58539	 	58539		1.80
		109.12	087.09	073.49	070.69
		109.12	087.09	073.49	070.69
		115.00	089.26	073.27	070.55
		118.55	092.92	074.57	070.85
		118.93	093.46	074.93	070.97
		118.55	092.92	074.57	070.55
		115.00	089.25	073.26	070.55

      subroutine sor(k)
      implicit double precision (a-d,f-h,o-z)
      dimension u(0:40,0:20,0:80)
      integer nblock(3,2),numi(1:3)
      common u,w,h,eps,nblock,numi,nx,ny,nz
cvd$l nodepchk
cvd$l novector
cvd$l noconcur
      do 10 l=nblock(k,1),nblock(k,2)
         do 10 j = 1,ny-1
            do 10 i = 1,nx-1
            up = u(i,j,l)
            uu = (0.0 +
    $              u(i-1,j,l) + u(i+1,j,l)
    $             +u(i,j-1,l) + u(i,j+1,l)
    $             +u(i,j,l-1) + u(i,j,l+1))*0.166666667
            u(i,j,l)  = (1.0 -w)*up + w*uu
            error = abs(u(i,j,l) - up)
            if (error.lt.eps) numi(k) = numi(k) + 1
   10 continue
      return
      end

The next code shows how one can reorder nodes so that the vector pipelines in each of the two processors can be used to execute the SOR algorithm. We do this for the the 1D diffusion model given by

-(Kux )x = f(x) (continuum version)
K(-ui-1 + 2ui - ui+1 ) = h2 f(ih)       (discrete version).

The SOR method requires input of ui-1 and ui+1 in order to compute the new SOR value of ui. Thus, if i is even, then only the u with odd subscripts are required as input. The vector version of each SOR iteration is to group all the even nodes and all the odd nodes:

This is some times called red-black ordering.

The following Fortran code illustrates this and is coupled with the domain decomposition reordering. The domain decomposition blocks are two large left and right blocks listed first, and the third block is a single node in the center. Loop 50 does the concurrent calculation over the large left and right blocks. Within the large blocks even and odd nodes are grouped together, and the vector pipes are used to execute loops 10 and 11 in the subroutine SOR. The serial time with no vector pipe was .3111 sec., and with one processor and vector pipe the time was .1930 sec. When two processors were used as well as the vector pipe, the computation time was .1056 sec.

Fortran Code in 1D with Vector and Domain Decomposition For SOR.


      program ode
      implicit double precision (a-d,f-h,o-z)
      real tt1(2),tt2(2)
      dimension u(500),up(500)
      integer nblock(3,2)
      common u,w,h,nblock
      n = 240 
      nn = n-1
      w = 1.98
      dw = 0.005 
      h = 1.0/n
      eps = 0.00001
      maxit = 2000 
c      do 999 itw = 1,1
      u(0:500) = 0.0
      up(0:500) = 0.0
      nblock(1,1) = 1
      nblock(1,2) = int((n-1)/2)
      nblock(2,1) = nblock(1,2) + 2
      nblock(2,2) = n-1
      nblock(3,1) = nblock(1,2) + 1
      nblock(3,2) = nblock(3,1)
      t1 = etime(tt1)
      do 100 iter = 1,maxit
cvd$l    concur
cvd$l    nodepchk
cvd$l    cncall
         do 50 k=1,2
            call sor(k)
   50    continue
         call sor(3)
         numi = 0
         do 60 i=1,n-1
            if (abs(u(i)-up(i)).lt.eps) numi = numi + 1
            up(i) = u(i)
   60    continue
         if (numi.eq.nn) goto 101
  100 continue
  101 continue
      t2 = etime(tt2)
      time = t2 - t1
      print*, iter,numi,n,w
      print*, time,u(nblock(3,1))
      w = w + dw  
c 999  continue  
      stop
      end
c
      subroutine sor(k)
      implicit double precision (a-d,f-h,o-z)
      dimension u(500)
      integer nblock(3,2)
      common u,w,h,nblock
cvd$l nodepchk
cvd$l vector
cvd$l noconcur
      do 10 i=nblock(k,1),nblock(k,2),2
         uu = (10.0*h*h*(1.0 - i*h)*i*h +
     $             u(i-1) + u(i+1))*0.5 
         u(i)  = (1.0 -w)*u(i) + w*uu
   10 continue
cvd$l nodepchk
cvd$l vector
cvd$l noconcur
      do 11 i=nblock(k,1)+1,nblock(k,2),2
         uu = (10.0*h*h*(1.0 - i*h)*i*h +
     $             u(i-1) + u(i+1))*0.5 
         u(i)  = (1.0 -w)*u(i) + w*uu
   11 continue
      return
      end


3.8.6.   Assessment.

The output from the 3D code gives variable temperatures in all three directions. This indicates that a 1D or a 2D model is not applicable for this particular fin. A possible problem with the present 3D model is the given boundary condition for the portion of the surface which is between the fin and the surrounding region. Here the alternative is a convective boundary condition

Both the surrounding temperature, usur , and the temperature of the transformer (we used u = 160 at x = 0) will vary with time. Thus, this really is a time dependent problem, and furthermore, one should consider the entire transformer and not just one fin.


3.8.7.   Homework.

  1. Experiment with different L, H and T. Determine when a 1D or 2D model would be acceptable. Why is this important?

  2. Modify the 3D code to include the above convective boundary condition. Experiment with the convective coefficient c as it ranges from 0 to infinity. When will the given boundary condition be acceptable?

  3. Experiment with the domain decomposition and determine the speedups.

  4. Use a variation of the red-black SOR reordering to do the 3D SOR problem using a vector pipeline.