
Class 2: Logical Systems C Programming

Since there was some confusion about how distributed memory machines are structured, I've included some discussion here:

As mentioned in class, distributed memory machines are easier to design and scale than shared memory machines, but programming them is somewhat harder. One also has to pay close attention to communication overhead when using these machines.

A distributed memory machine provides each processor with its own local memory, and there is no shared global address space. Therefore, processing elements (PEs) that want to share information do so by message passing, that is, by sending messages back and forth. Each data structure has a home process associated with it, and a process requiring the data structure must request it from the home process. The appeal of this kind of approach:

  1. There is no longer a need for explicit synchronization.
  2. Portability of code: all message passing machines have similar primitives to exchange data. The basic primitives are the send-receive pair.

Examples of this kind of architecture are the transputer machines, hypercube-based machines like the iPSC/2 and nCUBE, the Intel Paragon, etc. Programming these machines requires a completely different programming paradigm from the shared memory programming we have seen on the Power Challenge.

One of the earliest programming languages for these kinds of machines was CSP (which stands for Communicating Sequential Processes). CSP works on the model that there are a number of sequential processes running concurrently, and they exchange information by passing messages. These processes could run on one processor, but the true effectiveness emerges when the processes run on different processors. A number of later languages adopted CSP's philosophy.

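CSP's communication primitives consist of:

  1. Output (send): P ! e sends the value of the expression e to process P.
  2. Input (receive): Q ? x receives a value from process Q into the variable x.

The primitives are double-blocking: each process waits until both processes have executed their respective send and receive statements.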

Transputers and Occam:
Transputers are microprocessors produced by INMOS that are specially designed for parallel processing. Each transputer has four physical links, which can be used to connect it to four other transputers. Each chip also has on-chip RAM, which is used as local memory. See figure 5.4. The latest version of this family of chips is the T9000, although not many of them are being produced at this time. Our machines have 30MHz T805 chips in them, with 4MB of RAM each.

Occam is the native language for use on transputers.

Returning to our discussion of Logical Systems C (LSC), we will look at ld-one and ld-net again. As mentioned before, you can choose to use either of the two partitions of 32 transputers. The first partition can be selected by setting
setenv LINKNAME /dev/hsidrv2
while the other partition can be used by setting
setenv LINKNAME /dev/hsidrv6
ld-one is used to load one executable onto a single processor, and run it in sequential mode. It might be worthwhile to use ld-one to debug your program before you run it in parallel. ld-one will reset the transputer, load a primary bootstrap onto it, then load the executable, and run it on the single transputer. An example of the use of ld-one is:
ld-one exam1 cio
where exam1.tld is the transputer executable, and cio is the input/output library.

To run different executables on different transputers, it is necessary to use ld-net to load the various executables onto the processors. To get a good idea of how .nif files can be set up for virtual channels and real channels, read Andrew Hustrulid's project report from last year. The report also contains an example .nif file for actual channels.

To get started, it might be instructive to copy exam1.c, exam5.c, and pipe.nif from /export/home/molly_brown8/tputer, and to compile and run these programs in single-processor and multiprocessor modes.


Lecture Notes not complete yet.





mmisra@mines.edu
Tue Dec 5 07:44:03 MST 1995