
Class 2: Writing programs for the Transputer Machine

Announcements:

  1. Remember that the mid-semester report documents are due next Tuesday. In that class, each group will also be asked to give a five-minute presentation of what they are doing.
  2. The HTML documents should also be up by then (most of them are already up, but some are not).
  3. The homework is also due on Tuesday, next week.

We will start today's class by completing the discussion on the synchronization primitives available on the Power Challenge.

The Power Challenge User Guide also recommends the following strategy for porting applications to the Power Challenge:

  1. Port your code to the Power Challenge and make sure it gives the right answers on one processor.
  2. Tune the code to improve serial performance. Pay special attention to cache utilization and software pipelining.
  3. Use the most aggressive optimization appropriate for your code.
  4. Profile the code using pixie and prof (see Profiling and Timing) or CASEVision Workshop.
  5. Parallelize those loops and routines that use the largest percentage of CPU time. This can be done by hand or using:
    1. -pfa option for f77
    2. -pca option for cc
  6. Run the parallelized code to make sure it gives the right answers and to measure speed-up.
  7. Profile again with pixie and prof or Workshop to examine for load imbalance.
  8. If load imbalance is detected, experiment with different -mp schedule types or a different number of threads. For information on threads, refer to the SGI C Programmer's Guide (Chapter 5) available using InSight (see InSight).
Repeat steps 4 through 8 until you get the speedup you expect.
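
To see why the schedule type in step 8 matters, here is a minimal sketch in C that simulates how loop iterations of unequal cost are divided among threads. It is not SGI code. The cost model (iteration i costs i + 1 units, as in a triangular loop nest) and all function names are hypothetical, and it assumes that a "simple" schedule gives each thread one contiguous block of iterations while an "interleave" schedule deals iterations out round-robin.

```c
#include <assert.h>

/* Hypothetical cost model: iteration i costs (i + 1) units of work. */
static long iteration_cost(int i) { return (long)i + 1; }

/* Total work in the loop, summed over all n iterations. */
long total_work(int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += iteration_cost(i);
    return sum;
}

/* Heaviest per-thread load when each thread gets one contiguous block. */
long max_load_block(int n, int nthreads)
{
    assert(n >= 0 && nthreads > 0);
    int chunk = (n + nthreads - 1) / nthreads;   /* ceiling division */
    long worst = 0;
    for (int t = 0; t < nthreads; t++) {
        long load = 0;
        for (int i = t * chunk; i < n && i < (t + 1) * chunk; i++)
            load += iteration_cost(i);
        if (load > worst)
            worst = load;
    }
    return worst;
}

/* Heaviest per-thread load when thread t gets iterations t, t + nthreads, ... */
long max_load_interleaved(int n, int nthreads)
{
    assert(n >= 0 && nthreads > 0);
    long worst = 0;
    for (int t = 0; t < nthreads; t++) {
        long load = 0;
        for (int i = t; i < n; i += nthreads)
            load += iteration_cost(i);
        if (load > worst)
            worst = load;
    }
    return worst;
}
```

With 8 iterations and 4 threads the total work is 36 units, so a perfect split would give each thread 9. The contiguous split leaves the last thread with 15 units, while the interleaved split brings the heaviest load down to 12. The parallel loop finishes only when the most heavily loaded thread does, which is why a profile showing imbalance is a reason to try another schedule type.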

Pixie divides your program into basic blocks and reports how often each block executes. Use it together with prof to profile your serial and parallel code and see which parts are the most resource intensive.

The library routines etime and dtime can be called from within your program to time sections of code.
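
If you are working in C rather than Fortran, the same idea can be sketched with the standard clock() function. This is only an illustration of timing a section of code, not a replacement for etime or dtime. The workload and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* CPU seconds used so far by this process, from the standard C clock(). */
double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Hypothetical workload: sum of i*i for i = 1..n. */
double sum_of_squares(long n)
{
    double sum = 0.0;
    for (long i = 1; i <= n; i++)
        sum += (double)i * (double)i;
    return sum;
}

/* Run the workload once, store its result, and return the CPU seconds it took. */
double time_sum_of_squares(long n, double *result)
{
    assert(result != NULL);
    double start = cpu_seconds();
    *result = sum_of_squares(n);
    return cpu_seconds() - start;
}

/* Speedup as used in step 6: serial time divided by parallel time. */
double speedup(double t_serial, double t_parallel)
{
    assert(t_parallel > 0.0);
    return t_serial / t_parallel;
}
```

Note that clock() reports CPU time for the whole process. When you measure the speedup of a parallel run, you want elapsed (wall-clock) time instead, since CPU time adds up the work of all threads.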

You can run programs in interactive mode (for short jobs) or batch mode (for longer jobs). If need be, you can also request that the machine be dedicated to your job for portions of time on the weekend (see /usr/news).


Transputer-based machines:
The PPL transputer-based machine is an example of a MIMD distributed-memory machine in which processors exchange information by message passing. It is composed of 64 T-805 transputers built by INMOS (the machine itself was put together by Alta Technologies). The 64 transputers are divided into two groups of 32, which can be used by two users simultaneously. The two groups can also be hooked together to form a 64-PE array.

A transputer is a single-chip computer with four links that make it easy to hook transputers together in a two-dimensional array for parallel processing. The T-805 has a 32-bit CPU, a 64-bit floating point unit, 4 KB of on-chip RAM (our machine also provides each transputer with access to 4 MB of RAM), and 4 bidirectional communication links (20 Mbit/s each), which can connect one transputer to four other transputers.
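
To get a feel for what a 20 Mbit/s link means for your programs, here is a rough back-of-the-envelope sketch in C. It treats the quoted link speed as the full data rate and ignores startup cost and protocol overhead, so the result is a lower bound on the transfer time, not a measurement. The function name is hypothetical.

```c
#include <assert.h>

/* Nominal speed of one transputer link, as quoted above. */
#define LINK_BITS_PER_SECOND 20.0e6

/* Lower bound on the seconds needed to move `bytes` bytes in one
   direction, spread evenly over `links` of the four links. */
double link_transfer_seconds(double bytes, int links)
{
    assert(bytes >= 0.0 && links >= 1 && links <= 4);
    return bytes * 8.0 / (LINK_BITS_PER_SECOND * links);
}
```

For example, 2,500,000 bytes is 20 million bits, so it needs at least one second on a single link. Shipping the entire 4 MB of a PE's memory to a neighbor would take well over a second, which is a good reason to keep messages small and to overlap communication with computation where you can.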

Remember that the T-805 is mid-eighties technology, so you shouldn't expect the transputer machine to compete with a state-of-the-art machine like the Power Challenge, in which each PE is a MIPS R8000 chip (1994 technology). However, you can learn valuable concepts about programming MIMD distributed-memory message-passing machines by writing programs for the transputer machine. (As an aside, INMOS has come out with a new family of transputers called the T9000, a 32-bit processor with a 64-bit floating point unit that allows for much faster computation and communication.)

The PPL machine is structured such that the PEs are connected as a two-dimensional mesh (see the description in Andrew Hustrulid's project).
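
When you write programs for a mesh, you often need to know which PEs are your neighbors and how many hops away another PE is. The sketch below assumes the PEs are numbered in row-major order on a mesh without wraparound links (for example, 8 rows by 8 columns for all 64 PEs). The numbering scheme, the 8 x 8 shape, and the function names are assumptions for illustration, not a description of how the PPL machine is actually wired.

```c
#include <assert.h>
#include <stdlib.h>

/* Write the neighbors of `pe` into out[] in the order north, south,
   west, east (skipping any that fall off the edge of the mesh) and
   return how many there are. */
int mesh_neighbors(int pe, int rows, int cols, int out[4])
{
    assert(rows > 0 && cols > 0 && pe >= 0 && pe < rows * cols);
    int r = pe / cols;
    int c = pe % cols;
    int n = 0;
    if (r > 0)        out[n++] = pe - cols;  /* north */
    if (r < rows - 1) out[n++] = pe + cols;  /* south */
    if (c > 0)        out[n++] = pe - 1;     /* west  */
    if (c < cols - 1) out[n++] = pe + 1;     /* east  */
    return n;
}

/* Minimum number of link hops between two PEs on the mesh. */
int mesh_hops(int a, int b, int cols)
{
    assert(cols > 0 && a >= 0 && b >= 0);
    return abs(a / cols - b / cols) + abs(a % cols - b % cols);
}
```

On an 8 x 8 mesh numbered this way, corner PE 0 has only two neighbors (8 and 1), interior PE 9 uses all four links (1, 17, 8, and 10), and a message from PE 0 to PE 63 must cross 14 links. Every hop costs time, so it pays to place processes that talk to each other on nearby PEs.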

The compiler on the transputer machine is the Logical Systems C compiler (LSC). It extends C with library calls for process creation, allocation of communication channels, and message passing between concurrent processes.
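
The three ingredients named above (creating a process, allocating channels, and passing messages) can be illustrated on an ordinary Unix workstation. The sketch below does not use the LSC library at all. It stands in POSIX fork() for process creation and pipe() for channels, and the names chan_send, chan_recv, and square_on_worker are hypothetical. The actual LSC calls are different.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Blocking send of one int over a channel (here, a pipe). 0 on success. */
static int chan_send(int fd, int value)
{
    return write(fd, &value, sizeof value) == (ssize_t)sizeof value ? 0 : -1;
}

/* Blocking receive of one int from a channel. 0 on success. */
static int chan_recv(int fd, int *value)
{
    return read(fd, value, sizeof *value) == (ssize_t)sizeof *value ? 0 : -1;
}

/* Send x to a worker process, let it compute x*x, and receive the
   answer back. Returns the square, or -1 if anything fails. */
int square_on_worker(int x)
{
    assert(x >= -46340 && x <= 46340);     /* keep x*x within a 32-bit int */
    int to_worker[2], to_parent[2];
    int result = 0;

    if (pipe(to_worker) != 0)              /* allocate the two channels */
        return -1;
    if (pipe(to_parent) != 0) {
        close(to_worker[0]); close(to_worker[1]);
        return -1;
    }

    pid_t pid = fork();                    /* create the concurrent process */
    if (pid < 0)
        return -1;
    if (pid == 0) {                        /* worker: receive, compute, reply */
        int v;
        close(to_worker[1]); close(to_parent[0]);
        if (chan_recv(to_worker[0], &v) == 0)
            chan_send(to_parent[1], v * v);
        _exit(0);
    }

    close(to_worker[0]); close(to_parent[1]);
    int ok = chan_send(to_worker[1], x) == 0 &&
             chan_recv(to_parent[0], &result) == 0;
    close(to_worker[1]); close(to_parent[0]);
    waitpid(pid, NULL, 0);                 /* wait for the worker to finish */
    return ok ? result : -1;
}
```

The point to notice is that the two processes share no variables. The only way the worker learns the value of x, and the only way the parent learns the answer, is through an explicit send matched by a receive. That is the programming model you will use on the transputer machine.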

A README file explains how to set up your environment for compiling programs using LSC. For more information on LSC, please see Dan Hyde's LSC Handbook.





mmisra@mines.edu
Tue Dec 5 07:44:03 MST 1995