Tune File Connection Behavior
An overview for tuning file connection behavior.
Use Alternative File Names
The -a option specifies the actual file name to which a connection is made. This option allows files to be created in different directories without changing the FILE= specifier on an OPEN statement.
For example, after the following command is issued:
assign -a /tmp/mydir/tmpfile u:1
each of the following statements connects unit 1 to file /tmp/mydir/tmpfile:
WRITE(1) variable           ! implicit open
OPEN(1)                     ! unnamed open
OPEN(1,FORM='FORMATTED')    ! unnamed open
Without the -a attribute, unit 1 would be connected to file fort.1.
The following command causes an OPEN of ftfile to connect to file $FILENV/joe:
assign -a $FILENV/joe ftfile
OPEN(IUN,FILE='ftfile')
The following commands both assign an actual name by file name, the first using the f: prefix and the second without it:
assign -a actual f:foo
assign -a actual ftfile
With the second command in effect, the following statement connects to file actual rather than to ftfile:
OPEN(n,file='ftfile')
The I/O library routines use only the actual file (-a) attributes from the assign environment when processing an INQUIRE statement. During an INQUIRE statement that contains a FILE= specifier, the I/O library searches the assign environment for a reference to the file name that the FILE= specifier supplies. If an assign-by-filename exists for the file name, the I/O library determines whether an actual name from the -a option is associated with the file name. If the assign-by-filename supplied an actual name, the I/O library uses that name to return values for the EXIST=, OPENED=, and UNIT= specifiers; otherwise, it uses the file name. The name returned for the NAME= specifier is the file name supplied in the FILE= specifier; the actual file name is not returned.
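The INQUIRE lookup described above can be modeled in a few lines. The following Python sketch is a hypothetical model, not the library's implementation: inquire_file, the dictionary shapes used for the assign environment and for connected units, and the use of -1 for "no unit" are all assumptions made for illustration.

```python
import os


def inquire_file(file_name, assign_env, connected, exists=os.path.exists):
    """Hypothetical model of INQUIRE(FILE=file_name, ...) under an assign environment.

    assign_env: assigned file name -> attributes, e.g. {"ftfile": {"a": "/data/joe"}}
    connected:  unit number -> actual path that the unit is connected to
    exists:     callable used to test whether a path exists (injectable for testing)
    """
    attrs = assign_env.get(file_name, {})
    # Only the -a (actual name) attribute is consulted; without one, use the FILE= name.
    actual = attrs.get("a", file_name)
    units = [u for u, path in connected.items() if path == actual]
    return {
        "EXIST": exists(actual),
        "OPENED": bool(units),
        "UNIT": units[0] if units else -1,  # -1 stands in for "no unit connected"
        "NAME": file_name,                  # the FILE= name, never the actual name
    }
```

For example, with {"ftfile": {"a": "/data/joe"}} as the assign environment and unit 7 connected to /data/joe, inquiring on ftfile reports OPENED=True and UNIT=7, while NAME= still returns ftfile.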
Specify File Structure
- To select a structure using an FFIO layer, use assign -F
- To select a structure explicitly, use assign -s
Using FFIO layers is more flexible than selecting structures explicitly. FFIO allows nested file structures, buffer size specifications, and support for file structures not available through the -s option. Better I/O performance is realized by using the -F option and FFIO layers.
The remainder of this section covers the -s option.
Fortran sequential unformatted I/O uses four different file structures: f77 blocked structure, text structure, unblocked structure, and COS blocked structure. By default, the f77 blocked structure is used unless a file structure is selected at open time. If an alternative file structure is needed, the user can select a file structure by using the -s or -F option on the assign command.
The -s and -F options are mutually exclusive. The following examples show how to use different assign command options to select different file structures.
| Structure | assign Command |
|---|---|
| F77 blocked | assign -F f77 |
| text | assign -F text or assign -s text |
| unblocked | assign -F system or assign -s unblocked |
| COS blocked | assign -F cos or assign -s cos |
- To select an unblocked file structure for a sequential unformatted file:
IUN = 1
CALL ASNUNIT(IUN,'-s unblocked',IER)
OPEN(IUN,FORM='UNFORMATTED',ACCESS='SEQUENTIAL')
- The assign -s u command can also be used to specify the unblocked file structure for a sequential unformatted file. When this option is selected, I/O is unbuffered, and each Fortran READ or WRITE statement results in a read or write system call. For example:
CALL ASNFILE('fort.1','-s u',IER)
OPEN(1,FORM='UNFORMATTED',ACCESS='SEQUENTIAL')
- To assign unit 10 a COS blocked structure:
assign -s cos u:10
Values for ft in the assign -s ft option include the following:
- bin (not recommended)
- blocked
- cos
- sbin
- text
- unblocked
| Access and Form | assign -s ft Defaults | assign -s ft Options |
|---|---|---|
| Sequential unformatted, BUFFER IN and BUFFER OUT | blocked / cos / f77 | |
| Direct unformatted | unblocked | |
| Sequential formatted | text | |
| Direct formatted | text | |
A file with an unblocked file structure contains undelimited records. Because it does not contain any record control words, it does not have record boundaries. The unblocked file structure can be specified for a file opened with either unformatted sequential access or unformatted direct access. It is the default file structure for a file opened as an unformatted direct-access file.
Do not attempt to use a BACKSPACE statement to reposition a file with an unblocked file structure. Since record boundaries do not exist, the file cannot be repositioned to a previous record.
BUFFER IN and BUFFER OUT statements can specify a file having an unbuffered and unblocked file structure. If the file is specified with assign -s u, BUFFER IN and BUFFER OUT statements can perform asynchronous unformatted I/O.
There are several ways to use the assign command to specify unblocked file structure. All ways result in a similar file structure but with different library buffering styles, use of truncation on a file, alignment of data, and recognition of an end-of-file record in the file. The following unblocked data file structure specifications are available:
| Specification | Structure |
|---|---|
| assign -s unblocked | Library-buffered |
| assign -F system | No library buffering |
| assign -s sbin | Buffering that is compatible with standard I/O; for example, both library and system buffering |
The type of file processing for an unblocked data file structure depends on the assign -s ft option that is declared or assumed for a Fortran file.
For more information about buffering, see Specify Buffer Behavior.
An I/O request for a file specified using the assign -s unblocked command does not need to be a multiple of a specific number of bytes. Such a file is truncated after the last record is written to the file. Padding occurs for files specified with the assign -s bin command and the assign -s unblocked command. Padding usually occurs when noncharacter variables follow character variables in an unformatted direct-access file.
No padding is done in an unformatted sequential access file. An unformatted direct-access file created by a Fortran program on CLE systems contains records that are the same length. The end-of-file record is recognized in sequential-access files.
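To illustrate where padding comes from, the following Python sketch computes item offsets within one unformatted record. It assumes, purely for illustration, that each noncharacter item is aligned to a multiple of its own size; the actual alignment rule used by the I/O library is not stated here, and layout_record is a hypothetical helper.

```python
def layout_record(items):
    """Compute byte offsets for the items of one unformatted record.

    items: list of (name, kind, size_in_bytes) in I/O-list order.
    Returns (offsets, record_length).
    Assumption (illustrative only): noncharacter items are aligned to a
    multiple of their own size; character items are never aligned.
    """
    offsets = {}
    pos = 0
    for name, kind, size in items:
        if kind != "character":
            pos += (-pos) % size  # padding bytes needed to reach alignment
        offsets[name] = pos
        pos += size
    return offsets, pos


# A CHARACTER*3 item followed by an 8-byte INTEGER: 5 padding bytes appear.
print(layout_record([("ch", "character", 3), ("x", "integer", 8)]))
```

Reversing the order (the INTEGER first) produces no padding under the same assumption, which is consistent with padding usually occurring when noncharacter variables follow character variables.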
Use an assign -s sbin specification for a Fortran file opened with either unformatted direct access or unformatted sequential access. The file does not contain record delimiters. The file created for assign -s sbin in this instance has an unblocked data file structure and uses unblocked file processing.
The assign -s sbin option can be specified for a Fortran file that is declared as formatted sequential access. Because the file contains records that are delimited with the new-line character, it is not an unblocked data file structure. It is the same as a text file structure.
The assign -s sbin option is compatible with the standard C I/O functions.
Cray discourages the use of assign -s sbin because it typically yields poor I/O performance. If an FFIO layer cannot be used, using assign -s text for formatted files and assign -s unblocked for unformatted files usually produces better I/O performance than using assign -s sbin.
An I/O request for a file that is specified with assign -s bin does not need to be a multiple of a specific number of bytes. Padding occurs when noncharacter variables follow character variables in an unformatted record.
The I/O library uses an internal buffer for the records. If opened for sequential access, a file is not truncated after each record is written to the file.
The assign -s u command specifies undefined or unknown file processing. An assign -s u specification can be specified for a Fortran file declared as unformatted sequential or direct access. Because the file does not contain record delimiters, it has an unblocked data file structure. Both synchronous and asynchronous BUFFER IN and BUFFER OUT processing can be used with u file processing.
Fortran sequential files declared by using assign -s u are not truncated after the last word written. The user must execute an explicit ENDFILE statement on the file.
The text file structure consists of a stream of 8-bit ASCII characters. Every record in a text file is terminated by a newline character (\n, ASCII 012). Some utilities may omit the newline character on the last record, but the Fortran library treats such an occurrence as a malformed record. This file structure may be specified for a file that is declared as either formatted sequential access or formatted direct access. It is the default file structure for formatted sequential access and formatted direct access files.
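The newline rule can be expressed as a small record splitter. This Python sketch (split_text_records is a hypothetical helper) treats a file whose last byte is not a newline as malformed, mirroring how the Fortran library treats such a record.

```python
def split_text_records(data: bytes) -> list:
    """Split text-structured file contents into records.

    Every record must be terminated by a newline (ASCII 012); a missing
    terminator on the last record is reported as a malformed record.
    """
    if not data:
        return []
    if not data.endswith(b"\n"):
        raise ValueError("malformed last record: missing newline terminator")
    # Drop the final terminator, then split on the remaining ones.
    return data[:-1].split(b"\n")
```

For example, b"abc\ndef\n" yields the two records b"abc" and b"def", while b"abc\ndef" raises an error.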
The assign -s text command specifies the library-buffered text file structure. Both library and system buffering are done for all text file structures.
An I/O request for a file using assign -s text does not need to be a multiple of a specific number of bytes.
BUFFER IN and BUFFER OUT statements cannot be used with this structure. Use a BACKSPACE statement to reposition a file with this structure.
The cos or blocked file structure uses control words to mark the beginning of each sector and to delimit each record. Specify this file structure for a file that is declared as unformatted sequential access. Synchronous BUFFER IN and BUFFER OUT statements can create and access files with this file structure.
Specify this file structure with one of the following assign commands:
assign -s cos
assign -s blocked
assign -F cos
assign -F blocked
These four assign commands result in the same file structure.
An I/O request on a blocked file is library buffered.
In a cos file structure, one or more ENDFILE records are allowed. BACKSPACE statements can be used to reposition a file with this structure.
A blocked file is a stream of words that contains control words called Block Control Word (BCW) and Record Control Words (RCW) to delimit records. Each record is terminated by an EOR (end-of-record) RCW. At the beginning of the stream, and every 512 words thereafter (including any RCWs), a BCW is inserted. An end-of-file (EOF) control word marks a special record that is always empty. Fortran considers this empty record to be an endfile record. The end-of-data (EOD) control word is always the last control word in any blocked file. The EOD is always immediately preceded by either an EOR, or by an EOF and a BCW.
Each control word contains a count of the number of data words to be found between it and the next control word. In the case of the EOD, this count is 0. Because there is a BCW every 512 words, these counts never point forward more than 511 words.
A record always begins at a word boundary. If a record ends in the middle of a word, the rest of that word is zero filled; the ubc field of the closing RCW contains the number of unused bits in the last word.
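The word-alignment rule and the BCW spacing can be sketched as follows. The sketch assumes 8-byte (64-bit) words, which matches the 64-bit control-word layouts; pack_record and is_bcw_position are hypothetical helper names.

```python
WORD_BYTES = 8      # assumed 64-bit words, matching the 64-bit control words
BLOCK_WORDS = 512   # a BCW starts the stream and recurs every 512 words


def pack_record(data: bytes) -> tuple:
    """Zero-fill a record to a whole number of words.

    Returns (padded_bytes, ubc), where ubc is the number of unused bits in
    the last word, as stored in the closing RCW.
    """
    unused_bytes = (-len(data)) % WORD_BYTES
    return data + b"\x00" * unused_bytes, unused_bytes * 8


def is_bcw_position(word_index: int) -> bool:
    """True if the word at this index in the stream is a BCW."""
    return word_index % BLOCK_WORDS == 0
```

A 3-byte record therefore occupies one word with 5 zero-filled bytes, giving ubc = 40; an 8-byte record fills its word exactly and has ubc = 0.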
The following illustration and table show the structure of a BCW.
| m | unused | bdf | unused | bn | fwi |
|---|---|---|---|---|---|
| (4) | (7) | (1) | (19) | (24) | (9) |
| Field | Bits | Description |
|---|---|---|
| m | 0-3 | Type of control word; 0 for BCW |
| bdf | 11 | Bad Data flag (1-bit, 1=bad data) |
| bn | 31-54 | Block number (modulo 2^24) |
| fwi | 55-63 | Forward index; the number of words to the next control word |
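The following Python sketch packs and unpacks the BCW fields listed above. It assumes bit 0 is the most significant bit of the 64-bit word, consistent with m occupying bits 0-3 and fwi occupying bits 55-63; make_bcw and decode_bcw are hypothetical names, not library routines.

```python
WORD_BITS = 64


def field(word: int, first: int, last: int) -> int:
    """Extract bits first..last, numbering bit 0 as the most significant bit."""
    width = last - first + 1
    return (word >> (WORD_BITS - 1 - last)) & ((1 << width) - 1)


def put(value: int, first: int, last: int) -> int:
    """Place value into bits first..last, truncating it to the field width."""
    width = last - first + 1
    return (value & ((1 << width) - 1)) << (WORD_BITS - 1 - last)


def make_bcw(bn: int, fwi: int, bdf: int = 0) -> int:
    """Build a BCW; m (bits 0-3) is 0, so it contributes nothing."""
    if not 0 <= fwi <= 511:
        raise ValueError("fwi never points forward more than 511 words")
    # Masking bn to 24 bits gives the block number modulo 2^24.
    return put(bdf, 11, 11) | put(bn, 31, 54) | put(fwi, 55, 63)


def decode_bcw(word: int) -> dict:
    return {
        "m": field(word, 0, 3),
        "bdf": field(word, 11, 11),
        "bn": field(word, 31, 54),
        "fwi": field(word, 55, 63),
    }
```

For example, decode_bcw(make_bcw(bn=7, fwi=511)) returns m = 0, bdf = 0, bn = 7, and fwi = 511.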
The following illustration and table show the structure of an RCW.
| m | ubc | tran | bdf | srs | unused | pfi | pri | fwi |
|---|---|---|---|---|---|---|---|---|
| (4) | (6) | (1) | (1) | (1) | (7) | (20) | (15) | (9) |
| Field | Bits | Description |
|---|---|---|
| m | 0-3 | Type of control word; 10 (octal) for EOR, 16 (octal) for EOF, and 17 (octal) for EOD |
| ubc | 4-9 | Unused bit count; number of unused low-order bits in last word of previous record |
| tran | 10 | Transparent record field (unused) |
| bdf | 11 | Bad data flag (unused) |
| srs | 12 | Skip remainder of sector (unused) |
| pfi | 20-39 | Previous file index; offset modulo 2^20 to the block where the current file starts (as defined by the last EOF) |
| pri | 40-54 | Previous record index; offset modulo 2^15 to the block where the current record starts |
| fwi | 55-63 | Forward index; the number of words to the next control word |
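A similar Python sketch covers the RCW. It uses the same assumption that bit 0 is the most significant bit, and reads the type codes in the m field as octal 10, 16, and 17; make_rcw and decode_rcw are hypothetical names chosen for this example.

```python
WORD_BITS = 64
TYPE_CODES = {"EOR": 0o10, "EOF": 0o16, "EOD": 0o17}   # octal values of the m field
TYPE_NAMES = {code: name for name, code in TYPE_CODES.items()}


def field(word: int, first: int, last: int) -> int:
    """Extract bits first..last, numbering bit 0 as the most significant bit."""
    width = last - first + 1
    return (word >> (WORD_BITS - 1 - last)) & ((1 << width) - 1)


def put(value: int, first: int, last: int) -> int:
    """Place value into bits first..last, truncating it to the field width."""
    width = last - first + 1
    return (value & ((1 << width) - 1)) << (WORD_BITS - 1 - last)


def make_rcw(kind: str, ubc: int = 0, pfi: int = 0, pri: int = 0, fwi: int = 0) -> int:
    # Masking pfi to 20 bits and pri to 15 bits gives the modulo 2^20 / 2^15 offsets.
    return (put(TYPE_CODES[kind], 0, 3) | put(ubc, 4, 9) | put(pfi, 20, 39)
            | put(pri, 40, 54) | put(fwi, 55, 63))


def decode_rcw(word: int) -> dict:
    return {
        "type": TYPE_NAMES.get(field(word, 0, 3), "unknown"),
        "ubc": field(word, 4, 9),
        "pfi": field(word, 20, 39),
        "pri": field(word, 40, 54),
        "fwi": field(word, 55, 63),
    }
```

An EOD built with make_rcw("EOD") decodes with fwi = 0, matching the rule that the EOD count is always 0.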
Specify Buffer Behavior
Buffering affects I/O performance in the following ways:
- Small I/O requests can be collected into a buffer, and the overhead of making many relatively expensive system calls can be greatly reduced.
- Many data file structures such as cos contain control words. During the write process, a buffer can be used as a work area where control words can be inserted into the data stream (a process called blocking). The blocked data is then written to the device. During the read process, the same buffer work area can be used to remove the control words before passing the data on to the user (called deblocking).
- When data access is random, the same data may be requested many times. A cache is a buffer that retains data from earlier requests in case it is needed again. A sufficiently large or efficient cache can avoid much of the physical I/O by having the data ready in a buffer. When requested data is often found in the cache buffer, the cache is said to have a high hit rate. For example, if the entire file fits in the cache and is already present there, no further physical requests are needed to perform the I/O; in this case, the hit rate is 100%.
- Running the I/O devices and the processors in parallel often improves performance, so it is useful to keep processors busy while data is being moved. When writing, data can be copied to a buffer at memory-to-memory speed and then written to the device by an asynchronous I/O request. Control returns immediately to the program, which continues to execute as if the I/O were complete (a process called write-behind). A similar process called read-ahead can be used while reading: data is read into a buffer before the program actually requests it, so when it is needed it is already in the buffer and can be transferred to the user at very high speed.
- Unless direct I/O is enabled (assign -B on), data is staged in the system buffer cache. While this can yield improved performance, it also means that performance is affected by competition among programs for the system buffer cache. To minimize this effect, avoid public caches when possible.
- In many cases, the best asynchronous I/O performance can be realized by using the FFIO cachea layer (assign -F cachea). This layer supports read-ahead, write-behind, and improved cache reuse.
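The first point, collecting small requests into a buffer, can be demonstrated with a toy model. In this Python sketch, CountingDevice stands in for a file and counts each write as one system call, and LibraryBuffer is a hypothetical library buffer; neither reflects the actual Cray I/O library code.

```python
import io


class CountingDevice:
    """Hypothetical stand-in for an open file; counts write system calls."""

    def __init__(self) -> None:
        self.data = io.BytesIO()
        self.calls = 0

    def write(self, chunk: bytes) -> None:
        self.calls += 1
        self.data.write(chunk)


class LibraryBuffer:
    """Collects small writes and passes them to the device in large chunks."""

    def __init__(self, device: CountingDevice, capacity: int) -> None:
        self.device = device
        self.capacity = capacity
        self.buf = bytearray()

    def write(self, chunk: bytes) -> None:
        if len(self.buf) + len(chunk) > self.capacity:
            self.flush()                     # one call empties the whole buffer
        if len(chunk) >= self.capacity:
            self.device.write(bytes(chunk))  # too large to buffer: write directly
        else:
            self.buf += chunk

    def flush(self) -> None:
        if self.buf:
            self.device.write(bytes(self.buf))
            self.buf.clear()


def count_calls(nwrites: int, size: int, capacity: int) -> int:
    """Issue nwrites writes of size bytes each; return the device call count."""
    dev = CountingDevice()
    out = LibraryBuffer(dev, capacity)
    for i in range(nwrites):
        out.write(i.to_bytes(size, "little"))
    out.flush()
    return dev.calls
```

With 1,000 eight-byte writes and a 4,096-byte buffer, count_calls(1000, 8, 4096) reports 2 device calls; with a capacity of 0, every write goes straight to the device, giving 1,000 calls.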
The size of the buffer used for a Fortran file can have a substantial effect on I/O performance. A larger buffer size usually decreases the system time needed to process sequential files. However, large buffers increase a program's memory usage; therefore, optimizing the buffer size for each file accessed in a program on a case-by-case basis can help increase I/O performance and minimize memory usage.
The -b option on the assign command specifies a buffer size, in blocks, for the unit. The -b option can be used with the -s option, but it cannot be used with the -F option. Use the -F option to provide I/O path specifications that include buffer sizes; the -b and -u options do not apply when -F is specified.
For more information about the selection of buffer sizes, see the assign(1) man page.
- If unit 1 is a large sequential file for which many Fortran READ or WRITE statements are issued, increase the buffer size to a large value using the following assign command, where buffer-size is the buffer size in blocks:
assign -b buffer-size u:1
- If the file foo is a small file or is accessed infrequently, minimize the buffer size using the following assign command:
assign -b 1 f:foo
The Fortran I/O library automatically selects default buffer sizes according to file access type, as shown in the table, Default Buffer Sizes for Fortran I/O Library Routines. Override the defaults by using the assign command.
| Access Type | Default Buffer Size |
|---|---|
| Sequential formatted | 16 blocks (65,536 bytes) |
| Sequential unformatted | 128 blocks (524,288 bytes) |
| Direct formatted | The smaller of: |
| Direct unformatted | The larger of: |
Four buffers of default size are allocated. For more information, see the description of the cachea layer in the intro_ffio(3F) man page.
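The table implies a block size of 4,096 bytes (16 blocks = 65,536 bytes; 128 blocks = 524,288 bytes). Assuming that block size, the following Python sketch converts between bytes and the block counts used by assign -b; blocks_to_bytes and bytes_to_blocks are hypothetical helper names.

```python
BLOCK_BYTES = 4096  # inferred from the table: 16 blocks = 65,536 bytes


def blocks_to_bytes(blocks: int) -> int:
    """Buffer size in bytes for a given block count."""
    return blocks * BLOCK_BYTES


def bytes_to_blocks(nbytes: int) -> int:
    """Smallest whole number of blocks that holds nbytes (a candidate -b value)."""
    return -(-nbytes // BLOCK_BYTES)  # ceiling division
```

For example, a buffer intended to hold about 1,000,000 bytes needs bytes_to_blocks(1_000_000), which is 245 blocks.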
The term library buffering refers to a buffer that the I/O library associates with a file. When a file is opened, the I/O library checks the access, form, and any attributes declared on the assign command to determine the type of processing that should be used on the file. Buffers are an integral part of the processing. The following assign options use library buffering:
- -s blocked
- -F spec (buffering as defined by spec)
- -s cos
- -s bin
- -s unblocked
The -F option specifies flexible file I/O (FFIO), which uses library buffering if the specifications selected include a need for buffering. In some cases, more than one set of buffers might be used in processing a file. For example, the -F bufa,cos option specifies two library buffers for a read of a blank compressed COS blocked file. One buffer handles the blocking and deblocking associated with the COS blocked control words, and the second buffer is used as a work area to process blank compression. In other cases (for example, -F system), no library buffering occurs.
The operating system uses a set of buffers in kernel memory for I/O operations. These are collectively called the system cache. The I/O library uses system calls to move data between the user memory space and the system buffer. The system cache ensures that the actual I/O to the logical device is well formed, and it tries to remember recent data in order to reduce physical I/O requests. The following assign options use the system cache:
- -s sbin
- -F spec (FFIO, depends on spec)
For the assign -F cachea command, a library buffer ensures that the actual system calls are well formed and the system buffer cache is bypassed. This is not true for the assign -s u option: if assign -s u is used to bypass the system cache, all requests must be well formed.
Use the following assign command to bypass both library buffering and the system cache for all well-formed requests:
assign -s u ...
The data is transferred directly between the user data area and the logical device. Requests that are not well formed result in I/O errors.
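Specify Foreign File Formats
The following program reads a record containing a character variable and an integer from the unformatted file fgnfile:
character*4 ch
integer int
open(iun,FILE='fgnfile',FORM='UNFORMATTED')
read(iun) ch, int
If fgnfile is an IBM variable blocked spanned (VBS) file containing IBM numeric data and EBCDIC character data, use the following assign command before running the program:
assign -F ibm.vbs -N ibm -C ebcdic fgnfile
If fgnfile instead uses the opposite byte order from the system running the program, use the following assign command:
assign -N swap_endian fgnfile
This assumes the file is a normal f77 unformatted file with 32-bit record control images containing a byte count. The library routines swap both the control images and the data when reading or writing the file.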
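The swap_endian case can be illustrated by reading an f77 sequential unformatted record by hand. This Python sketch assumes 4-byte byte-count control images before and after each record and a 4-byte default INTEGER; read_f77_records is a hypothetical helper. It shows that the control images must be interpreted in the writer's byte order before the data can even be located.

```python
import struct


def read_f77_records(data: bytes, byteorder: str) -> list:
    """Return the payloads of an f77 sequential unformatted byte stream.

    Each record is: 4-byte byte count, payload, 4-byte byte count.
    byteorder is a struct prefix: ">" (big-endian) or "<" (little-endian).
    """
    records = []
    pos = 0
    while pos < len(data):
        (count,) = struct.unpack_from(byteorder + "i", data, pos)
        if count < 0 or pos + 8 + count > len(data):
            raise ValueError(f"bad record length {count} at offset {pos}")
        payload = data[pos + 4 : pos + 4 + count]
        (trailer,) = struct.unpack_from(byteorder + "i", data, pos + 4 + count)
        if trailer != count:
            raise ValueError(f"leading/trailing byte counts differ at offset {pos}")
        records.append(payload)
        pos += 8 + count
    return records


# A big-endian record holding CHARACTER*4 'ABCD' and a 4-byte INTEGER 258.
payload = b"ABCD" + struct.pack(">i", 258)
stream = struct.pack(">i", len(payload)) + payload + struct.pack(">i", len(payload))
(rec,) = read_f77_records(stream, ">")  # interpret in the writer's byte order
ch, n = rec[:4].decode("ascii"), struct.unpack(">i", rec[4:])[0]
```

Reading the same stream with "<" misinterprets the leading byte count as 134,217,728 and fails, which is the situation assign -N swap_endian avoids.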
The following command applies the same conversion to all sequential unformatted files:
assign -N swap_endian g:su
Specify Memory Resident Files
The assign -F mr command specifies that a file will be memory resident. Because the mr flexible file I/O layer does not define a record-based file structure, it must be nested beneath a file structure layer when record blocking is needed.
For example:
CALL ASNUNIT (2,'-F cos,mr',IER)
OPEN(2,FORM='UNFORMATTED')
The -F cos,mr specification selects COS blocked structure with memory residency.
Use and Suppress File Truncation
The assign -T option activates or suppresses truncation after the writing of a sequential Fortran file. The -T on option specifies truncation; this behavior is consistent with the Fortran standard and is the default setting for most assign -s fs specifications.
The assign(1) man page lists the default setting of the -T option for each -s fs specification. It also indicates if suppression or truncation is allowed for each of these specifications.
FFIO layers that are specified by using the -F option vary in their support for suppression of truncation with -T off.
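The effect of truncation can be demonstrated with ordinary file operations. This Python sketch (rewrite_first_record is a hypothetical helper) overwrites the first record of a three-record file. With truncation the stale records are discarded; without it they remain after the new data, which is why sequential files assigned with assign -s u need an explicit ENDFILE.

```python
import os
import tempfile


def rewrite_first_record(truncate: bool) -> bytes:
    """Overwrite a 3-record file with 1 record; return the resulting contents."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "seqfile")
        with open(path, "wb") as f:
            f.write(b"AAAA" b"BBBB" b"CCCC")  # original file: three 4-byte records
        with open(path, "r+b") as f:          # reopen at the start without truncating
            f.write(b"XXXX")                  # write one new record
            if truncate:
                f.truncate()                  # cut the file at the current position
        with open(path, "rb") as f:
            return f.read()
```

rewrite_first_record(True) returns b"XXXX", while rewrite_first_record(False) returns b"XXXXBBBBCCCC", leaving the old records readable after the new one.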
The following figure, Access Methods and Default Buffer Sizes, summarizes the available access methods and the default buffer sizes.
