.TOC
.STL C Compiler & Library ***** Software Support Manual

DECUS C LANGUAGE SYSTEM
Compiler & Library
Software Support Manual

by Martin Minow
Edited by Robert B. Denny
DECUS Structured Languages SIG
Version of 7-Aug-80
.INT
This document describes the RSX/VMS/RSTS/RT11 DECUS C language system runtime support library. It also contains all internal functions and such compiler internal information as is available.
.PAGE
.MID Copyright (C) 1980, DECUS
.BLN 1
General permission to copy or modify, but not for profit, is hereby granted, provided that the above copyright notice is included and reference made to the fact that reproduction privileges were granted by DECUS.
.BLN 1
The information in this document is subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation or by DECUS.
.BLN 1
Neither Digital Equipment Corporation, DECUS, nor the authors assume any responsibility for the use or reliability of this document or the described software.
.BLN 1
This software is made available without any support whatsoever. The person responsible for an implementation of this system should expect to have to understand and modify the source code if any problems are encountered in implementing or maintaining the compiler or its run-time library. The DECUS `Structured Languages Special Interest Group' is the primary focus for communication among users of this software.

UNIX is a trademark of Bell Telephone Laboratories. RSX, RSTS/E, RT11 and VMS are trademarks of Digital Equipment Corporation.
.CPT Introduction
.BLN 1
The C support library contains three major groups of functions:
.SWT 1
o Functions called by the compiler to perform certain mathematical operations (such as floating-point multiply or function entrance) that are not performed in-line.
o Functions that may be called by C programs to perform operations, such as copying a string, that are more efficiently performed by optimized assembly language subroutines.
o An implementation of (most of) the Unix Standard I/O library.
.BLN 1
The standard installation procedure yields two documents: CLIB.DOC, containing normal user documentation, and WIZARD.DOC, containing information on all library routines, even those which are not directly called by user programs. This is WIZARD.DOC.
.CPT The Standard Library
.BLN 1
The RSX standard run-time library is in C:C.OLB. The RT11 standard run-time library is in C:CLIB.OBJ.
.MID WARNING
.MID This version of the library is somewhat incompatible
.MID with previous versions of the Decus C compiler and with
.MID Unix Version 6. This incompatibility allows greater
.MID compatibility with Unix Version 7.
.BLN 1
The standard I/O interface header file is in C:STDIO.H and should be included in all C programs as follows:
.MID 1 #include <stdio.h>
.HLV 1 Introduction to Unix-style Stream I/O
.LIN
This section presents an overview of the `standard' I/O interface for C programs. These routines have proven easy to use and, while they do not provide the full functionality of RSX or RMS, they give the programmer essentially all that is needed for everyday use. The discussion includes the following:
.AND 0 Opening and closing files
.AND0 Reading data from files
.AND0 Writing data to files
.LIN 1
Note that this file system is limited: files are sequential with only a limited amount of random-access capability. Also, little of the power of RMS/FCS is available. On the other hand, the limitations make C programs easily transportable to other operating systems.
.BLN
When a C program is started, three files are predefined and opened:
.SWT 1
stdin   The `standard' input file
stdout  The `standard' output file
stderr  The `standard' error file
.LIN 1
Stderr is always assigned to the command terminal. Stdin and stdout are normally assigned to the command terminal, but may be reassigned or closed when the program starts.
.HLV 1 Opening and Closing Files
.BLN 1
The fopen and fclose functions control access to files.
They are normally used as follows:
.SWT 1
#include <stdio.h>              /* Define standard I/O  */
FILE *ioptr;                    /* Pointer to file area */
...
if ((ioptr = fopen("filename", "mode")) == NULL)
        error("Can't open file");
.LIN 1
All information about an open file is stored in a buffer whose address is returned by fopen(). The buffer is dynamically allocated when the file is opened and deallocated when the file is closed. Its format is described under the heading IOV. The mode string contains flags defining how the file is to be accessed:
.SWT 1
r       Read only
w       Write new file
.HLV 1 Reading Data from a File
.BLN 1
The simplest way to read a file is to call the getchar() or getc() routines, which read the next character from the standard input or from a specified file, respectively. For example, the following program counts the number of characters and lines in a file:
.SWT 1
#include <stdio.h>

main()
{
        register int    ccount;         /* Character count      */
        register int    lcount;         /* Line count           */
        register int    c;              /* Current character    */
        FILE            *ioptr;

        if ((ioptr = fopen("foo.bar", "r")) == NULL)
                error("Cannot open file");
        ccount = 0;
        lcount = 0;
        while ((c = getc(ioptr)) != EOF) {
                ccount++;               /* Count every character */
                if (c == '\n')
                        lcount++;       /* Newline ends a line   */
        }
        printf("%d characters, %d lines.\n", ccount, lcount);
}
.BLN 1
Other input routines include:
.SWT 1
gets    Read a line from the standard input
fgets   Read a line from a file
fgetss  Read a line from a file, remove terminator
ungetc  Push one character back on a file
fseek   Position the file to a specific record
.LIN 1
These routines are used together with the ferr() and feof() functions, which allow testing for error and end of file conditions, respectively. The package assumes that all error conditions are lethal and force end of file.
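The read loop and the error test described above can be combined in one small routine. The following is only a sketch in present-day standard C, not code from the library: the name count_stream() is hypothetical, and the error test is spelled ferror() as in standard C, where this library calls it ferr().

```c
#include <stdio.h>

/*
 * Count characters and newline characters on an already-open stream.
 * Returns 0 if reading stopped at a clean end of file, or -1 if it
 * stopped because of a read error.  count_stream() is a hypothetical
 * name used only for this illustration.
 */
int count_stream(FILE *fp, long *chars, long *lines)
{
    int c;

    *chars = 0;
    *lines = 0;
    while ((c = getc(fp)) != EOF) {
        (*chars)++;
        if (c == '\n')
            (*lines)++;
    }
    /* Standard C uses ferror(); the library in this manual names it ferr(). */
    return ferror(fp) ? -1 : 0;
}
```

A caller would open the file with fopen(), pass the returned pointer to count_stream(), and print the two counts with printf(), much as the program above does.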
.HLV 1 Writing Data to a File
.BLN1
There are several routines for data output which are directly analogous to the data input routines:
.SWT 1
putchar Write a character to standard output
putc    Write a character to a specified file
puts    Write a string to the standard output
fputs   Write a string to a specified file
fputss  Write a record to a specified file
ftell   Return record location (for fseek)
.BLN 1
In addition, the printf() or fprintf() routine is used to format and print data (as was shown in the previous example). Printf() is flexible, powerful, and easy to use, and is perhaps the single most important routine in the file system.
.HLV 1 Interaction with the Operating System
.BLN 1
The support library attempts to provide a consistent operating system interface on several implementations of quite different operating systems. The possibilities include:
.SWT 1
o Native RT11
o Native RSX11-M or IAS
o RT11 emulated on RSTS/E
o RSX11-M emulated on RSTS/E
o RSX11-M emulated on VAX/VMS
.BLN 1
This section mentions the inconsistencies and random bugs that are more or less intentionally present.
.HLV 2 Logical Unit Numbers -
.LIN 1
RT11, RSTS/E, and RSX11-M use small integers to number all I/O channels. Unfortunately, the numbering is not consistent and channel usage differs among the various systems. This has resulted in several quirks:
.SWT 1
o On RT11, console terminal interaction uses the .ttyout and .print monitor functions. While no device is opened by the fopen() function, a channel number is reserved.
o On RSX11-M, a channel must be allocated to the console terminal. When a C program starts, the `command terminal' is opened on logical unit 1 and assigned to stderr. This is not done using the fopen() routine (although the stderr IOV structure is defined as if fopen() were called). Also, fclose() cannot close stderr. In addition to the standard I/O routines, there are two routines, msg() and regdmp(), that direct output to logical unit 1.
These routines were used to debug the library, and are otherwise useful for disasters. Note that, on RT11, msg() and regdmp() use the .print monitor function.
o On both systems, the first true files are stdin and stdout. These are opened on logical units 0 and 1 (RT11) or 2 and 3 (RSX). Code in fclose() double-checks that the logical unit number of the file being closed corresponds to the proper slot in the logical unit table.
o Since the standard I/O routines know little about the operating system, they do not deal with certain special features. For example, on RT11, the `user service routine' (USR) is used to open files. The file open routine does not attempt to locate IOV areas so that the USR does not swap over them. This can be a problem for very large programs. On RSX, logical unit numbers are assigned dynamically. This means that `LUN assignment' cannot be reliably performed by task-build control files (or task initiation).
.HLV 2 Wild-Card Files -
.LIN 1
On native RSX11-M and RSX11-M emulated on VMS or RSTS/E, the fwild() and fnext() routines can be used to process `wild-card' files. This functionality is not (currently) supported on RT11 modes.
.LIN 1
On RSX11-M and VMS/RSX emulation, the file name, extension, and/or file version may be expressed as `*' to indicate wild-card support. Also, fwild() properly handles version numbers `;0' and `;-1'. Wild-card devices, units, or UIC codes are not supported. Note that, on RSX11 emulated on VMS, the `;-1' (earliest version) requires special processing by the user program, as noted in the description of fwild().
.LIN 1
On RSX11-M emulated on RSTS/E, the file name and/or extension may contain `*' or `?' wild-cards. Wild-card devices, units, or UIC (PPN) codes are not supported.
.HLV 2 Memory allocation -
.LIN1
Memory is allocated using the malloc() function. It works correctly on all operating systems, expanding memory on RSX and RT11/RSTS.
On `native' RT11, all available memory is allocated by invoking the monitor .settop function with a `-2' argument. Neither library supports extended memory (PLAS) directives.
.LIN 1
The sbreak() routine may be used for permanent allocation of memory. Once memory has been allocated using sbreak(), it cannot be freed.
.CPT 1 Support Information
.LIN 1
The C system has been implemented on VMS V2.0, RSX11-M V3.2, RSTS/E V7.0, and RT11 V03B. All installations should expect to have to modify the command procedures to suit their specific needs. While the compiler has been built from scratch on RT11, the procedures unrealistically assume a system with no disk memory limitations. The C system has not been tested on IAS, RSX11-D, RSX11-PLUS, or HT11.
.LIN
The C system is distributed on 9-track magtape in `DOS-11' format. The full distribution requires one 2400 foot tape (or three 600 foot tapes).
.MID 1 Please Note
.BLN 1
This is a DECUS `product'; there is no support available from the authors. The person responsible for an implementation of this system should expect to have to understand and modify the source code if any problems are encountered in implementing or maintaining the compiler or its run-time library. The DECUS `Structured Languages Special Interest Group' is the primary focus for communication among users of this software.
.LIN 1
This section contains information on the internals of the compiler, assembler, and run-time libraries. It is not complete; it is probably not accurate either.
.HLV 1 The C compiler
.BLN 1
The compiler source is distributed on account [5,3]. There are four groups of files in this account:
.SWT 1
o The C compiler source is in modules named CC???.MAC. The root code is in CC0RT.MAC, while overlays are in CC00?.MAC, CC10?, CC20?, and CC3??. The overlays have the following functions:
.SWT 1
  0. Pass 0 reads the input source file, writing a temporary file after processing #define, #include, and #ifdef statements.
  1. Pass 1 reads the expanded source file, parses the language and writes an `intermediate code' file. In the intermediate file all operations have been converted into `low-level' constructions. Variable and structure references are compiled into absolute expressions. All information needed by the code generator is contained in this file. Except for compiler flags and file names, nothing is retained in memory.
  2. Pass 2 reads the code file, writing PDP-11 assembly language (in the format requested by the AS assembler).
  3. Pass 3 contains some end of processing code to delete files.

  All internal files are in Ascii. In order to make the compiler portable between RSX, RSTS/E, and RT11, the file system is simple and inefficient. RT11 users may run into intermediate file size problems. The only (software) solution requires invoking the compilation with explicit size arguments for the output files:

        #out.s[N1],interm.tm1[N2],expand.tmp[N3]=in.c

  The expanded source file should be less than the size of the input file plus all #include files. The intermediate file size may be as much as twice the size of the expanded source file.
o C programs are assembled into PDP-11 object code by the AS assembler, whose source files are named `AS????.MAC'. The AS assembler is described in AS.DOC.
o A support library for the two compilers is included in a group of files labeled A?????.MAC. The C compiler build process also builds the support library. This library is similar -- but not identical -- to the run-time support library.

The compiler and assembler are installed by executing command files. There are several groups of command files supplied to assist in implementation on the various operating systems. It is highly likely that the files will have to be edited to suit each individual installation's needs. VMS, RSX, and RT11 files are run by the indirect command file processor, while RSTS/E files are run by the ATPK system program.
RSTS/E installation must be run from a privileged account. The following file conventions are used: VMS command files begin with the letter `V', RSTS/E RT11 mode with the letter `R', and RSTS/E RSX mode with the letter `X'. Native RT11 command files begin with `T' and native RSX11-M files begin with the letter `M'. Thus, XMAKCC.CMD builds the CC compiler for RSTS/E RSX11-M mode. The `.TKB' files are indirect command files for the task-builder.
.MID 1 Warning
.BLN 1
As distributed, the compiler and run-time libraries assume hardware support for the SXT (sign-extend) instruction. This is present on all PDP-11 CPUs with EIS, and on all versions of the LSI-11 (11/03, LSI-11, 11/23). If the compiler is to generate code that will run on the 11/05, 11/10, or 11/20, all RSX.MAC and RT11.MAC configuration files must be edited to define symbol C$$SXT equal to zero. Note that the compiler must be built with C$$SXT set correctly.
.LIN 1
As distributed, the RSX configuration files assume hardware EIS support. If this is not the case (hardware EIS is not needed for RSX11-M), edit the RSX.MAC control files to define symbol C$$EIS equal to zero. There are two RT11 configuration files, and the RSTS/E library build procedure builds an EIS library, CLIB.OBJ, as well as a library without EIS support, CLIBN.OBJ.
.LIN 1
The default `native' RT11 command file builds a non-EIS library, naming it CLIB.OBJ. Programs linked against the non-EIS library will run on machines with EIS. Also, save-images linked on RT11 will run on RSTS/E and vice-versa without relinking.
.LIN 1
There are certain differences in task-image format between native RSX11-M and RSX11-M as emulated on RSTS/E. Thus, only source or object files are transportable between the two operating systems.
.HLV 1 The C Run-time Support Library
.BLN 1
The C support library is distributed on two accounts, [5,4] and [5,5].
Account [5,4] contains only those routines which have no I/O dependencies, while account [5,5] contains the standard I/O library. Many routines are conditionally compiled to select RT11 or RSX mode. All routines should be compiled together with either RT11.MAC (no EIS support), RT11.EIS (hardware EIS support), or RSX.MAC, identical copies of which are present in each account.
.BLN 1
To build the libraries, execute one of the following command files:
.SWT 1
RMAKLB.CMD      RT11 library on RSTS/E
VMAKLB.COM      RSX library on VMS
XMAKLB.CMD      RSX library on RSTS/E
MMAKLB.CMD      RSX library on RSX11-M
TMAKLB.COM      RT11 library on RT11
.BLN 1
Account [5,1] contains a program, GETCMD.C, which was used to build assembler command files. On RSTS/E, a command file, [5,1]RLBCMD.CMD, may be used to rebuild all library command files. All library source code is in PDP-11 MACRO.
.HLV 1 The RSTS/E Interface Library
.BLN
Account [5,6] contains a library of subroutines that allow C programs access to all RSTS/E executive functionality. There is no documentation other than the source code and [5,6]README.506.
.HLV 1 The RSX-11 Extensions Library
.BLN 1
Account [5,7] contains a library of subroutines that allow C programs access to all RSX-11M executive functionality. Refer to [5,7]CX.DOC for more information. This library has not been tested on `emulated' RSX-11M modes (under VMS or RSTS/E), nor on IAS or RSX11-PLUS.
.HLV 1 RT11 Extensions Library
.LIN 1
Access to RT11 executive functionality may be had by calling the system library (SYSLIB). The C library call() routine may be used to generate the necessary calling sequences. Note that SYSLIB routines that depend on Fortran I/O conventions must not be called.
.HLV 1 C Tools and Toys
.BLN 1
The distribution kit contains several accounts [6,*] with C programs and subroutines. Account [6,1] contains a library of `software tools' which should be useful for all C installations.
Among these are the following:
.SWT 1
echo    Echo arguments onto the standard output. This program serves as the primary test of the run-time library.
grep    `Get regular expression pattern.' This program reads one or more files, printing text lines that match a given pattern.
kwik    `Key word in context' utility.
sortc   A file sort utility.
sorts   Sort-merge subroutines for incorporation into other programs.
wc      Count bytes, words, and lines in one or more files.
ccxrf   Cross-reference listings for C source files.
.BLN1
Please refer to README.??? files in the appropriate accounts for more information.
.HLV 1 Building the Documentation
.LIN
The source of the library documentation is included within the library source (Macro) files. Before building the library documentation, compile and task-build the `Macro to RNO' conversion program, [5,1]GETRNO.C and use it to process files. Note that GETRNO.C only runs in RSX mode. See the source of GETRNO for further details. The control files [5,1]RGTRNO.CMD and [5,1]RGTDOC.CMD illustrate the process on RSTS/E.
.CPT Compiler Internal Information
.LIN
As noted above, compiler work files are in human-readable Ascii. The parser generates a temporary file with an internal pseudo-code. The code generator converts this pseudo-code to PDP-11 assembly language. Except for compiler flags, all communication between passes is via intermediate files.
.HLV 1 Intermediate Code File Format
.LIN
The intermediate code file consists of single line entries for each operation. The first byte is an operator, followed by parameter entries as needed. Numeric values are transmitted in octal. The following operators are defined:
.SWT 1
Location counter control:
C               Switch to the string P-section (.STNG.)
O               Switch to the data P-section (.DATA.)
P               Switch to the program P-section (.PROG.)
Flow control:
L [number]      Define a local label
J [number]      Jump to a local label
X               Function exit
N               Function entry

Data reservation, etc.:
A               Generate a .even pseudo-op
B [number]      Generate a .blkb [number]
D               Generate an external label
G               Generate a .globl
H               Generate a .byte
I               Generate a .word as a local pointer
Y               Generate a .word

Special operations:
W               Generate a switch table
Q               Make a symbol table entry
M               Transmit a line number between passes

Tree (expression) operators:
R               Return
S               Switch
E               Compile for effect
T               Jump if true
F               Jump if false
K               Initialize variable
Z               Tree item

Note that `trees' are the result of compiling expressions, such as

        (a + b * (c & 4)) * d

Each non-terminal node of a tree generally consists of a left part, an operation, and a right part. Operator precedence has been resolved. The above example thus becomes:

                      |
              ------------------
              |                |
          ---------            |
          |       |            |
          |    --------        |
          |    |      |        |
          |    |    -----      |
          |    |    |   |      |
        ((a + (b * (c & 4))) * d)

For example, here is a simple subroutine and the intermediate code it generates:

fact(n)                         D fact                  | define function fact
{                               N 0 4                   | enter function
if (n == 0)                     F 1                     | if false, goto L1
                                Z [n == 0]              | compile expr. value
return(1);                      R                       | return
                                Z 1                     | compile expr. value
                                J 2                     | goto L2
return(n * fact(n - 1));        L 1                     | Label L1, N != 0
                                R                       | return
                                Z [n * fact(n - 1)]     | compile expr. value
                                L 2                     | label L2
}                               X                       | Exit from function
.LIN 1
The C source code is parsed by a top-down, recursive parser. Expressions are parsed bottom-up, by priority. Casts of types are processed by special routines. The parser does some optimizations, such as constant folding. Also, all automatic variables (local to a function) are assigned absolute offsets from the argument frame (Register 5), while structure elements are assigned absolute offsets from the base of the structure.
.HLV 1 The Code Generator
.LIN
The code generator consists of a top-level input file reader, which processes `easy things' directly. It calls subroutines to compile switch statements and expressions.
The expression compiler performs various optimizations, calls a Sethi-Ullman register allocator, then a code generator. The Sethi-Ullman algorithm was described in the Journal of the ACM, Volume 17, No. 4, October 1970, pp. 715-728.
.HLV 1 Optimizations and Register Allocation -
.BLN 1
The following optimizations are performed:
.SWT 1
o Expressions are reordered.
o Constant expressions are folded.
o Dummy operations are eliminated, including -(-A), !(!A), ~(~A), (A & A) and (A | A).
o Dummy address expressions are eliminated: *(&A) and &(*A) both become A.
o Constant address arithmetic is performed in expressions such as &A + const.
o Constant offsets from registers become index operations in expressions such as *(reg + constant) or *(reg - constant).
o PDP11 autoincrement and autodecrement are used in expressions such as *p++ and *--p.
o Conversions between integer and pointer datatypes become multiply and divide operations.
o Multiply by 1 is deleted.
o The and operator is converted to BIC.
o Null operations are deleted, including +0, -0, &-1, |0, ^0, |=0, etc.
.BLN 1
Register allocation is as follows:
.SWT 1
o R0 and R1 are always scratch registers.
o R2, R3, and R4 may be reserved by the C program by using the `register ...' construction. The parser tells the coder the highest register it can use (in the function initialization intermediate code operation).
o The Sethi-Ullman algorithm stores the number of registers needed into each tree node. This number is always >= 2 to ensure that R0 and R1 are available for scratch usage.
o `A tree is easy if the other tree uses few enough registers and there is a free register available.'
o The GETREG subroutine claims registers. If the code table is modified, be sure that GETREG isn't called on a tree that isn't easy.
.HLV 1 Code generation -
.LIN 1
Almost all code is generated by expanding macro operators out of the code tables. There are four tables (CTAB, ETAB, STAB, RTAB) and a pseudo-table, TTAB.
Only RTAB is complete. If an expansion which was attempted in CTAB or ETAB fails, it is retried in RTAB. If an expansion which was attempted in STAB fails, it is done by:
.SWT 1
        [Rtab]
        mov     Reg,-(sp)
.BLN 1
The code body is in Ascii; bytes with the parity (eighth) bit set are used as macro operators: `address of left', `push right', etc. Code table entries are selected by the type of the arguments (int, char, long, unsigned, float, double, and pointer) and by kind (constant, address, easy, and any). Here is a typical code table (for addition):
.SWT 1
int any int con1                ; Add 1 to any integer
        [LL]                    ; Compile left (subtree)
        inc     [R]             ; INC Rx
int any int addr                ; Add var. to anything
        [LL]                    ; Compile left (subtree)
        add     [AR],[R]        ; ADD var,Rx
int any int easy
        [SRV]                   ; NOP if it's an address
        [LL]                    ; Compile left (subtree)
        add     [AR],[R]        ; Add var,Rx
int any int any
        [PR]                    ; Compile right to stack
        [LL]                    ; Compile left to reg.
        add     (sp)+,[R]       ; ADD (sp)+,Rx
.BLN 1
Here is another example, `= for effect' (as in `A = B;'):
.SWT 1
int any int con0                ; I = 0;
char any int con0               ; C = 0;
        [SLAC]                  ; Get left op. address
        clr[TL] [AL]            ; CLR(B) var
int addr int any
char addr int any
        [LR]                    ; Compile right subtree
        mov[TL] [R],[AL]        ; MOV(B) Rx,var
int any int easy
char any int easy
        [SRV]
        [SLAC]
        mov[TL] [AR], [AL]
int any int any
char any int any
        [PLA]
        [LR]
        mov[TL] [R],@(sp)+
.HLV 1 Macros Used in Code Generation -
.BLN 1
The following are defined in the code generator:
.SWT 1
[M]     Set modulo return
[F]     Set function return
[R]     Current register
[R+1]   Current register + 1
[AL]    Address of left
[ALN]   Address of left, no side effect
[AR]    Address of right
[ARN]   Address of right, no side effect
[OP0]   Opcode
[OP1]   Opcode
[AL+2]  Address of left, long
[AR+2]  Address of right, long
[TL]    Type of left
[T]     Type of right or left
[SRVA]  Set right value anywhere
[SRV]   Set right value
[SRAA]  Set right address anywhere
[SRA]   Set right address
[SLVA]  Set left value anywhere
[SLV]   Set left value
[SLAA]  Set left address anywhere
[SLA]   Set left address
[SLAC]  Set left address current reg.
[LL]    Load left
[LL+1]  Load left into [R+1]
[LR]    Load right
[PL]    Push left
[PLA]   Push left address
[PR]    Push right
[V]     ADC or SBC for longs
.HLV 1 Extended Hardware Support
.LIN 1
This version of the compiler contains some of the code necessary for generation of inline EIS (hardware integer multiply-divide) and FPU (hardware floating-point) operations. The code does not currently work: EIS instructions are always compiled by calling subroutines and all attempts to use floating-point result in a `fatal compiler abort.' To add EIS support, you must write code table entries (in CC206 at label EISTAB) and debug pass 2. This is a fair chore. Adding floating-point support requires writing code table entries as well as a fair amount of code to perform conversions. This is liable to be a somewhat difficult undertaking. Do not assume that any of the code that currently exists actually works correctly -- it has never been tested.
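
As a companion to the register allocation notes under `Optimizations and Register Allocation', the register-need numbering can be sketched in C. This is a simplified textbook form of Sethi-Ullman labelling in which every leaf is assumed to need one register; it is not taken from the compiler source. The node layout and the names su_label() and MIN_REGS are hypothetical. The only detail drawn from this manual is that the number stored in each node is never less than 2, so that R0 and R1 remain available as scratch registers.

```c
#include <stddef.h>

/* Hypothetical expression-tree node; not the compiler's own layout. */
struct node {
    struct node *left;      /* NULL for a leaf                       */
    struct node *right;     /* NULL for a leaf or a unary operator   */
    int need;               /* Stored register need, always >= 2     */
};

#define MIN_REGS 2          /* R0 and R1 are always scratch registers */

/*
 * Label a tree bottom-up.  Returns the unclamped Sethi-Ullman number
 * so that parents combine true needs; stores the clamped value in
 * each node, following the ">= 2" rule stated in this manual.
 */
int su_label(struct node *n)
{
    int l, r, raw;

    if (n->left == NULL) {
        raw = 1;                        /* Leaf: assume one register  */
    } else if (n->right == NULL) {
        raw = su_label(n->left);        /* Unary: same as its operand */
    } else {
        l = su_label(n->left);
        r = su_label(n->right);
        /* Equal needs cost one extra register; otherwise the larger wins. */
        raw = (l == r) ? l + 1 : (l > r ? l : r);
    }
    n->need = (raw < MIN_REGS) ? MIN_REGS : raw;
    return raw;
}
```

With this numbering, the tree for ((a + (b * (c & 4))) * d) is labelled 2 at its root, because each operator has one subtree that is a single leaf. A balanced tree such as (a + b) * (c + d) is labelled 3.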