Tune File Connection Behavior

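This section describes how to use the assign command to tune the way the Fortran I/O library connects to files: alternative file names, file structures, buffering, foreign file formats, memory-resident files, and truncation.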

Use Alternative File Names

The -a option specifies the actual file name to which a connection is made. This option allows files to be created in different directories without changing the FILE= specifier on an OPEN statement.

For example, consider the following assign command issued to open unit 1:
assign -a /tmp/mydir/tmpfile u:1
The program then opens unit 1 with any of the following statements:
WRITE(1) variable          ! implicit open
OPEN(1)                    ! unnamed open
OPEN(1,FORM='FORMATTED')   ! unnamed open

Unit 1 is connected to file /tmp/mydir/tmpfile. Without the -a attribute, unit 1 would be connected to file fort.1.

When the -a attribute is associated with a file, any Fortran open that is set to connect to the file causes a connection to the actual file name. For example, with the following assign command in effect:
assign -a $FILENV/joe ftfile
the following OPEN statement connects the unit to file $FILENV/joe instead of ftfile:
OPEN(IUN,FILE='ftfile')
If the following assign command is issued and in effect, any Fortran INQUIRE statement whose FILE= specification is foo refers to the file named actual instead of the file named foo for purposes of the EXIST=, OPENED=, or UNIT= specifiers:
assign -a actual f:foo
If the following assign command is issued and in effect, the -a attribute does not affect INQUIRE statements with a UNIT= specifier:
assign -a actual ftfile
When the following OPEN statement is executed, INQUIRE(UNIT=n,NAME=fname) returns a value of ftfile in fname, as if no assign had occurred:
OPEN(n,file='ftfile')

The I/O library routines use only the actual file (-a) attributes from the assign environment when processing an INQUIRE statement. During an INQUIRE statement that contains a FILE= specifier, the I/O library searches the assign environment for a reference to the file name that the FILE= specifier supplies. If an assign-by-filename exists for the file name, the I/O library determines whether an actual name from the -a option is associated with the file name. If the assign-by-filename supplied an actual name, the I/O library uses that name to return values for the EXIST=, OPENED=, and UNIT= specifiers; otherwise, it uses the file name. The name returned for the NAME= specifier is the file name supplied in the FILE= specifier. The actual file name is not returned.
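The behavior described above can be observed with a small test program. The following is a minimal sketch in standard Fortran, not taken from the Cray documentation; the unit number 10 and the program name are arbitrary choices. According to the text above, NAME= is expected to return ftfile whether or not assign -a actual ftfile is in effect, even though the data is then written to the file named actual.

```fortran
program inquire_name_demo
  implicit none
  integer, parameter :: n = 10          ! arbitrary unit number (assumption)
  character(len=256) :: fname
  logical :: file_exists, is_open
  integer :: connected_unit

  ! Connect unit n by file name. With "assign -a actual ftfile" in effect,
  ! the text above says the connection goes to the file named actual.
  open(n, file='ftfile', form='formatted', status='unknown')
  write(n, '(a)') 'test record'

  ! Inquire by unit: NAME= returns the name given in the OPEN statement.
  inquire(unit=n, name=fname)
  print '(2a)', 'NAME= returned: ', trim(fname)

  ! Inquire by file: EXIST= and OPENED= are answered for the file that the
  ! name resolves to. NUMBER= is the standard specifier that returns the
  ! connected unit for an inquire by file.
  inquire(file='ftfile', exist=file_exists, opened=is_open, &
          number=connected_unit)
  print '(a,l2,a,l2,a,i0)', 'EXIST=', file_exists, '  OPENED=', is_open, &
        '  unit=', connected_unit

  ! Remove the test file again.
  close(n, status='delete')
end program inquire_name_demo
```

Running the program once with and once without the assign command in effect shows which values depend on the actual file name.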

Specify File Structure

A file structure defines the way records are delimited and how the end-of-file is represented. The assign command supports two mutually exclusive file structure options:
  • To select a structure using an FFIO layer, use assign -F
  • To select a structure explicitly, use assign -s

    Using FFIO layers is more flexible than selecting structures explicitly. FFIO allows nested file structures, buffer size specifications, and support for file structures not available through the -s option. Better I/O performance is realized by using the -F option and FFIO layers.

    The remainder of this section covers the -s option.

    Fortran sequential unformatted I/O uses four different file structures: f77 blocked structure, text structure, unblocked structure, and COS blocked structure. By default, the f77 blocked structure is used unless a file structure is selected at open time. If an alternative file structure is needed, the user can select a file structure by using the -s or -F option on the assign command.

    The -s and -F options are mutually exclusive. The following examples show how to use different assign command options to select different file structures.

    Structure      assign Command
    F77 blocked    assign -F f77
    text           assign -F text
                   assign -s text
    unblocked      assign -F system
                   assign -s unblocked
    COS blocked    assign -F cos
                   assign -s cos
The following examples show how to adjust blocking:
  • To select an unblocked file structure for a sequential unformatted file:
    IUN = 1
    CALL ASNUNIT(IUN,'-s unblocked',IER)
    OPEN(IUN,FORM='UNFORMATTED',ACCESS='SEQUENTIAL')
  • The assign -s u command can also be used to specify the unblocked file structure for a sequential unformatted file. When this option is selected, I/O is unbuffered: each Fortran READ or WRITE statement results in a read or write system call. For example:
    CALL ASNFILE('fort.1','-s u',IER)
    OPEN(1,FORM='UNFORMATTED',ACCESS='SEQUENTIAL')
  • To assign unit 10 a COS blocked structure:
    assign -s cos u:10
The full set of options allowed with the assign -s command is as follows:
  • bin (not recommended)
  • blocked
  • cos
  • sbin
  • text
  • unblocked
Table 1. Fortran Access Methods and Options
Access and Form              assign -s ft Defaults    assign -s ft Options
Sequential unformatted,      blocked / cos / f77      bin, sbin, u, unblocked
BUFFER IN and BUFFER OUT
Direct unformatted           unblocked                bin, sbin, u, unblocked
Sequential formatted         text                     blocked, cos, sbin/text
Direct formatted             text                     sbin/text

A file with an unblocked file structure contains undelimited records. Because it does not contain any record control words, it does not have record boundaries. The unblocked file structure can be specified for a file opened with either unformatted sequential access or unformatted direct access. It is the default file structure for a file opened as an unformatted direct-access file.

Do not attempt to use a BACKSPACE statement to reposition a file with an unblocked file structure. Since record boundaries do not exist, the file cannot be repositioned to a previous record.

BUFFER IN and BUFFER OUT statements can specify a file having an unbuffered and unblocked file structure. If the file is specified with assign -s u, BUFFER IN and BUFFER OUT statements can perform asynchronous unformatted I/O.

There are several ways to use the assign command to specify an unblocked file structure. All of them produce a similar file structure, but they differ in library buffering style, use of truncation on a file, alignment of data, and recognition of an end-of-file record in the file. The following unblocked data file structure specifications are available:

Specification          Structure
assign -s unblocked    Library-buffered
assign -F system       No library buffering
assign -s sbin         Buffering that is compatible with standard I/O; for example, both library and system buffering

The type of file processing for an unblocked data file structure depends on the assign -s ft option that is declared or assumed for a Fortran file.

For more information about buffering, see Specify Buffer Behavior.

An I/O request for a file specified using the assign -s unblocked command does not need to be a multiple of a specific number of bytes. Such a file is truncated after the last record is written to the file. Padding occurs for files specified with the assign -s bin command and the assign -s unblocked command. Padding usually occurs when noncharacter variables follow character variables in an unformatted direct-access file.

No padding is done in an unformatted sequential access file. An unformatted direct-access file created by a Fortran program on CLE systems contains records that are the same length. The end-of-file record is recognized in sequential-access files.

Use an assign -s sbin specification for a Fortran file opened with either unformatted direct access or unformatted sequential access. The file does not contain record delimiters. The file created for assign -s sbin in this instance has an unblocked data file structure and uses unblocked file processing.

The assign -s sbin option can be specified for a Fortran file that is declared as formatted sequential access. Because the file contains records that are delimited with the new-line character, it is not an unblocked data file structure. It is the same as a text file structure.

The assign -s sbin option is compatible with the standard C I/O functions.

Cray discourages the use of assign -s sbin because it typically yields poor I/O performance. If an FFIO layer cannot be used, using assign -s text for formatted files and assign -s unblocked for unformatted files usually produces better I/O performance than using assign -s sbin.

An I/O request for a file that is specified with assign -s bin does not need to be a multiple of a specific number of bytes. Padding occurs when noncharacter variables follow character variables in an unformatted record.

The I/O library uses an internal buffer for the records. If opened for sequential access, a file is not truncated after each record is written to the file.

The assign -s u command specifies undefined or unknown file processing. An assign -s u specification can be specified for a Fortran file declared as unformatted sequential or direct access. Because the file does not contain record delimiters, it has an unblocked data file structure. Both synchronous and asynchronous BUFFER IN and BUFFER OUT processing can be used with u file processing.

Fortran sequential files declared by using assign -s u are not truncated after the last word written. The user must execute an explicit ENDFILE statement on the file.

The text file structure consists of a stream of 8-bit ASCII characters. Every record in a text file is terminated by a newline character (\n, ASCII 012). Some utilities may omit the newline character on the last record, but the Fortran library treats such an occurrence as a malformed record. This file structure may be specified for a file that is declared as either formatted sequential access or formatted direct access. It is the default file structure for formatted sequential access and formatted direct access files.
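The newline rule can be checked directly. The following sketch, in standard Fortran, writes two formatted records and then reopens the file as a byte stream to inspect its final byte. The file name records.txt and the unit number 10 are hypothetical, and the check assumes a system where the file storage unit is one byte and formatted records end with a single newline character, as described above.

```fortran
program text_structure_demo
  implicit none
  character(len=*), parameter :: fname = 'records.txt'  ! hypothetical name
  character(len=1) :: last_byte
  integer :: fsize

  ! Write two records with formatted sequential access.
  open(10, file=fname, form='formatted', access='sequential', status='replace')
  write(10, '(a)') 'first record'
  write(10, '(a)') 'second record'
  close(10)

  ! Reopen the same file as an unformatted byte stream and read the last byte.
  open(10, file=fname, form='unformatted', access='stream', status='old')
  inquire(unit=10, size=fsize)
  read(10, pos=fsize) last_byte
  close(10, status='delete')

  ! ASCII 012 (octal) is character code 10 (decimal).
  if (last_byte == achar(10)) then
    print '(a)', 'last record ends with a newline'
  else
    print '(a)', 'last record is missing its newline'
  end if
end program text_structure_demo
```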

The assign -s text command specifies the library-buffered text file structure. Both library and system buffering are done for all text file structures.

An I/O request for a file using assign -s text does not need to be a multiple of a specific number of bytes.

BUFFER IN and BUFFER OUT statements cannot be used with this structure. Use a BACKSPACE statement to reposition a file with this structure.

The cos or blocked file structure uses control words to mark the beginning of each sector and to delimit each record. Specify this file structure for a file that is declared as unformatted sequential access. Synchronous BUFFER IN and BUFFER OUT statements can create and access files with this file structure.

Specify this file structure with one of the following assign commands:

assign -s cos
assign -s blocked
assign -F cos
assign -F blocked        

These four assign commands result in the same file structure.

An I/O request on a blocked file is library buffered.

In a cos file structure, one or more ENDFILE records are allowed. BACKSPACE statements can be used to reposition a file with this structure.

A blocked file is a stream of words that contains control words, called Block Control Words (BCWs) and Record Control Words (RCWs), to delimit records. Each record is terminated by an EOR (end-of-record) RCW. At the beginning of the stream, and every 512 words thereafter (including any RCWs), a BCW is inserted. An end-of-file (EOF) control word marks a special record that is always empty. Fortran considers this empty record to be an endfile record. The end-of-data (EOD) control word is always the last control word in any blocked file. The EOD is always immediately preceded by either an EOR, or by an EOF and a BCW.

Each control word contains a count of the number of data words to be found between it and the next control word. In the case of the EOD, this count is 0. Because there is a BCW every 512 words, these counts never point forward more than 511 words.

A record always begins at a word boundary. If a record ends in the middle of a word, the rest of that word is zero filled; the ubc field of the closing RCW contains the number of unused bits in the last word.
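As a worked example of this rule, the following sketch computes how many words a record occupies and the ubc value for its closing RCW. It assumes 64-bit words (the control word fields described in this section span bits 0-63) and a record length given in bytes; the program and variable names are illustrative.

```fortran
program ubc_demo
  implicit none
  integer, parameter :: word_bits = 64   ! assumed word size in bits
  integer :: nbytes, nbits, nwords, ubc

  nbytes = 10                            ! sample record length in bytes
  nbits  = 8 * nbytes

  ! Round the record up to a whole number of words.
  nwords = (nbits + word_bits - 1) / word_bits

  ! Bits in the last word that carry no data and are zero filled.
  ubc = nwords * word_bits - nbits

  print '(a,i0)', 'data words in record: ', nwords
  print '(a,i0)', 'unused bit count:     ', ubc
end program ubc_demo
```

For the 10-byte sample record this gives 2 words and an unused bit count of 48, which fits in the 6-bit ubc field.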

The following layout and table show the structure of a BCW.

Field          m    unused   bdf   unused   bn    fwi
Width (bits)   4    7        1     19       24    9

Field   Bits     Description
m       0-3      Type of control word; 0 for BCW
bdf     11       Bad data flag (1 bit; 1 = bad data)
bn      31-54    Block number (modulo 2^24)
fwi     55-63    Forward index; the number of words to the next control word

The following layout and table show the structure of an RCW.

Field          m    ubc   tran   bdf   srs   unused   pfi   pri   fwi
Width (bits)   4    6     1      1     1     7        20    15    9

Field   Bits     Description
m       0-3      Type of control word; octal 10 for EOR, octal 16 for EOF, and octal 17 for EOD
ubc     4-9      Unused bit count; number of unused low-order bits in last word of previous record
tran    10       Transparent record field (unused)
bdf     11       Bad data flag (unused)
srs     12       Skip remainder of sector (unused)
pfi     20-39    Previous file index; offset modulo 2^20 to the block where the current file starts (as defined by the last EOF)
pri     40-54    Previous record index; offset modulo 2^15 to the block where the current record starts
fwi     55-63    Forward index; the number of words to the next control word
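
To make the layouts concrete, the following sketch packs a sample EOR control word and then extracts its fields with the standard IBITS intrinsic. It assumes that bit 0 in the tables is the most significant bit of a 64-bit word, so a field that starts at table bit first and is width bits wide begins at IBITS position 64 - first - width. The helper name field and the sample values are hypothetical.

```fortran
program control_word_demo
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer(int64) :: rcw

  ! Pack a sample RCW: type octal 10 (EOR), 16 unused bits, and the next
  ! control word 5 words ahead. MVBITS positions count from the low-order bit.
  rcw = 0_int64
  call mvbits(8_int64,  0, 4, rcw, 60)   ! m   (table bits 0-3)
  call mvbits(16_int64, 0, 6, rcw, 54)   ! ubc (table bits 4-9)
  call mvbits(5_int64,  0, 9, rcw, 0)    ! fwi (table bits 55-63)

  ! Extract each field by its table position and width.
  print '(a,o0)', 'm (octal) = ', field(rcw, 0, 4)
  print '(a,i0)', 'ubc       = ', field(rcw, 4, 6)
  print '(a,i0)', 'pfi       = ', field(rcw, 20, 20)
  print '(a,i0)', 'pri       = ', field(rcw, 40, 15)
  print '(a,i0)', 'fwi       = ', field(rcw, 55, 9)

contains

  ! Return the field that starts at table bit "first" (0 = most significant,
  ! an assumption) and is "width" bits wide.
  pure function field(word, first, width) result(val)
    integer(int64), intent(in) :: word
    integer, intent(in) :: first, width
    integer(int64) :: val
    val = ibits(word, 64 - first - width, width)
  end function field

end program control_word_demo
```

The extracted values should match the values that were packed, with pfi and pri reported as zero because the sample word leaves them unset.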

Specify Buffer Behavior

A buffer is a temporary storage location for data while the data is being transferred. Buffers are often used for the following purposes:
  • Small I/O requests can be collected into a buffer, and the overhead of making many relatively expensive system calls can be greatly reduced.
  • Many data file structures such as cos contain control words. During the write process, a buffer can be used as a work area where control words can be inserted into the data stream (a process called blocking). The blocked data is then written to the device. During the read process, the same buffer work area can be used to remove the control words before passing the data on to the user (called deblocking).
  • When data access is random, the same data may be requested many times. A cache is a buffer that keeps old requests in the buffer in case these requests are needed again. A cache that is sufficiently large or efficient can avoid a large part of the physical I/O by having the data ready in a buffer. When the data is often found in the cache buffer, it is referred to as having a high hit rate. For example, if the entire file fits in the cache and the file is present in the cache, no more physical requests are required to perform the I/O. In this case, the hit rate is 100%.
  • Running the I/O devices and the processors in parallel often improves performance; therefore, it is useful to keep processors busy while data is being moved. To do this when writing, data can be transferred to the buffer at memory-to-memory copy speed. Use an asynchronous I/O request. The control is then immediately returned to the program, which continues to execute as if the I/O were complete (a process called write-behind). A similar process called read-ahead can be used while reading; in this process, data is read into a buffer before the actual request is issued for it. When it is needed, it is already in the buffer and can be transferred to the user at very high speed.
  • When direct I/O is enabled (assign -B on), data is staged in the system buffer cache. While this can yield improved performance, it also means that performance is affected by program competition for system buffer cache. To minimize this effect, avoid public caches when possible.
  • In many cases, the best asynchronous I/O performance can be realized by using the FFIO cachea layer (assign -F cachea). This layer supports read-ahead, write-behind, and improved cache reuse.

    The size of the buffer used for a Fortran file can have a substantial effect on I/O performance. A larger buffer size usually decreases the system time needed to process sequential files. However, large buffers increase a program's memory usage; therefore, optimizing the buffer size for each file accessed in a program on a case-by-case basis can help increase I/O performance and minimize memory usage.

    The -b option on the assign command specifies a buffer size, in blocks, for the unit. The -b option can be used with the -s option, but it cannot be used with the -F option. Use the -F option to provide I/O path specifications that include buffer sizes; the -b and -u options do not apply when -F is specified.

    For more information about the selection of buffer sizes, see the assign(1) man page.

The following examples of buffer size specification illustrate using the assign -b and assign -F options:
  • If unit 1 is a large sequential file for which many Fortran READ or WRITE statements are issued, increase the buffer size to a large value, using the following assign command:
    assign -b buffer-size u:1
  • If the file foo is a small file or is accessed infrequently, minimize the buffer size using the following assign command:
    assign -b 1 f:foo

The Fortran I/O library automatically selects default buffer sizes according to file access type as shown in the table, Default Buffer Sizes for Fortran I/O Library Routines. Override the defaults by using the assign command. The following subsections describe the default buffer sizes on various systems.

One block is 4,096 bytes on CLE systems.
Table 2. Default Buffer Sizes for Fortran I/O Library Routines
Access Type               Default Buffer Size
Sequential formatted      16 blocks (65,536 bytes)
Sequential unformatted    128 blocks (524,288 bytes)
Direct formatted          The smaller of:
                            • The record length in bytes + 1
                            • 16 blocks (65,536 bytes)
Direct unformatted        The larger of:
                            • The record length
                            • 16 blocks (65,536 bytes)

Four buffers of default size are allocated. For more information, see the description of the cachea layer in the intro_ffio(3F) man page.
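
The defaults in Table 2 can be expressed as a small function. This is an illustrative sketch only, not the library's implementation: the function name, its argument conventions, and the assumption that the direct unformatted record length is measured in bytes are all hypothetical.

```fortran
program default_buffer_demo
  implicit none
  integer, parameter :: block_bytes = 4096   ! one block on CLE systems

  print '(a,i0)', 'sequential formatted:    ', &
        default_bytes('sequential', 'formatted', 0)
  print '(a,i0)', 'sequential unformatted:  ', &
        default_bytes('sequential', 'unformatted', 0)
  print '(a,i0)', 'direct formatted, 80:    ', &
        default_bytes('direct', 'formatted', 80)
  print '(a,i0)', 'direct unformatted, 100000: ', &
        default_bytes('direct', 'unformatted', 100000)

contains

  ! Default buffer size in bytes for one access type, following Table 2.
  ! recl is the record length in bytes (assumed unit); it is ignored for
  ! sequential access.
  pure function default_bytes(access, form, recl) result(nbytes)
    character(len=*), intent(in) :: access, form
    integer, intent(in) :: recl
    integer :: nbytes

    if (access == 'sequential') then
      if (form == 'formatted') then
        nbytes = 16 * block_bytes
      else
        nbytes = 128 * block_bytes
      end if
    else
      if (form == 'formatted') then
        nbytes = min(recl + 1, 16 * block_bytes)   ! the smaller of the two
      else
        nbytes = max(recl, 16 * block_bytes)       ! the larger of the two
      end if
    end if
  end function default_bytes

end program default_buffer_demo
```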

The term library buffering refers to a buffer that the I/O library associates with a file. When a file is opened, the I/O library checks the access, form, and any attributes declared on the assign command to determine the type of processing that should be used on the file. Buffers are an integral part of the processing.

If the file is assigned with one of the following assign options, library buffering is used:
  • -s blocked
  • -F spec (buffering as defined by spec)
  • -s cos
  • -s bin
  • -s unblocked

The -F option specifies flexible file I/O (FFIO), which uses library buffering if the specifications selected include a need for buffering. In some cases, more than one set of buffers might be used in processing a file. For example, the -F bufa,cos option specifies two library buffers for a read of a blank compressed COS blocked file. One buffer handles the blocking and deblocking associated with the COS blocked control words, and the second buffer is used as a work area to process blank compression. In other cases (for example, -F system), no library buffering occurs.

The operating system uses a set of buffers in kernel memory for I/O operations. These are collectively called the system cache. The I/O library uses system calls to move data between the user memory space and the system buffer. The system cache ensures that the actual I/O to the logical device is well formed, and it tries to remember recent data in order to reduce physical I/O requests.

The following assign command options can be expected to use system cache:
  • -s sbin
  • -F spec (FFIO, depends on spec)

For the assign -F cachea command, a library buffer ensures that the actual system calls are well formed and the system buffer cache is bypassed. This is not true for the assign -s u option. To use assign -s u to bypass the system cache, all requests must be well formed.

The simplest form of buffering is none at all; this unbuffered I/O is known as direct I/O. For sufficiently large, well-formed requests, buffering is not necessary and can add unnecessary overhead and delay. The following assign command specifies unbuffered I/O:
assign -s u  ...

This assign command bypasses both library buffering and the system cache for all well-formed requests. The data is transferred directly between the user data area and the logical device. Requests that are not well formed result in I/O errors.

Specify Foreign File Formats

The Fortran I/O library can read and write files with record blocking and data formats native to operating systems from other vendors. The assign -F command specifies a foreign record blocking; the assign -C command specifies the type of character conversion; the -N option specifies the type of numeric data conversion. When -N or -C is specified, the data is converted automatically during the processing of Fortran READ and WRITE statements. For example, assume that a record in file fgnfile contains the following character and integer data:
character*4 ch
integer int
open(iun,FILE='fgnfile',FORM='UNFORMATTED')
read(iun) ch, int
Use the following assign command to specify foreign record blocking and foreign data formats for character and integer data:
assign -F ibm.vbs -N ibm -C ebcdic fgnfile
One of the most common uses of the assign command is to swap big-endian for little-endian files. To access big-endian unformatted files on a little-endian system such as the Cray XE, use the following command:
assign -N swap_endian fgnfile

This assumes the file is a normal f77 unformatted file with 32-bit record control images with a byte count. The library routines swap both the control images and the data when reading or writing the file.
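
To illustrate what swapping a control image means, the following sketch reverses the byte order of a 32-bit byte count using the standard TRANSFER intrinsic. It is not how the library performs the conversion; the program and variable names are illustrative.

```fortran
program swap_endian_demo
  use, intrinsic :: iso_fortran_env, only: int8, int32
  implicit none
  integer(int32) :: count_in, count_out
  integer(int8)  :: bytes(4)

  count_in = 16_int32                  ! sample byte count of a record

  ! View the 32-bit value as four bytes, reverse them, and reassemble.
  bytes     = transfer(count_in, bytes)
  count_out = transfer(bytes(4:1:-1), count_out)

  print '(a,z8.8)', 'byte count (hex):        ', count_in
  print '(a,z8.8)', 'after swapping (hex):    ', count_out
end program swap_endian_demo
```

For the sample value 16 (hexadecimal 00000010), the swapped value is hexadecimal 10000000. Reading a file of the opposite endianness without conversion would misinterpret the record length in the same way.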

If all unformatted sequential files are the opposite endianness, use the following command:
assign -N swap_endian g:su

Specify Memory Resident Files

The assign -F mr command specifies that a file will be memory resident. Because the mr flexible file I/O layer does not define a record-based file structure, it must be nested beneath a file structure layer when record blocking is needed.

For example, if unit 2 is a sequential unformatted file that is to be memory resident, the following Fortran statements connect the unit:
CALL ASNUNIT (2,'-F cos,mr',IER)
OPEN(2,FORM='UNFORMATTED')

The -F cos,mr specification selects COS blocked structure with memory residency.

Use and Suppress File Truncation

The assign -T option activates or suppresses truncation after the writing of a sequential Fortran file. The -T on option specifies truncation; this behavior is consistent with the Fortran standard and is the default setting for most assign -s ft specifications.

The assign(1) man page lists the default setting of the -T option for each -s ft specification. It also indicates if suppression or truncation is allowed for each of these specifications.

FFIO layers that are specified by using the -F option vary in their support for suppression of truncation with -T off.

The following figure, Access Methods and Default Buffer Sizes, summarizes the available access methods and the default buffer sizes.

Figure: Access Methods and Default Buffer Sizes