*****************************************************************************

ANSI-ISO PASCAL FAQ

Welcome to the FAQ ! I will answer here various questions about ANSI-ISO Pascal, and compilers of that language. This FAQ is not limited to any one machine, operating system or language level. Any language that is based on the original Wirth language or the standards that came from it may be covered here. Requests to add information are welcome, submissions are encouraged.

*****************************************************************************

========= "CLASSIC" PASCAL ===========

This section concentrates on the original Pascal as defined by N. Wirth, and utilized in the early standards.

*****************************************************************************

Q. WHY IS THE NAME PASCAL ?

A. This should be a trivia question. Pascal was named after the French mathematician Blaise Pascal, who created a calculating machine (not a true computer).

*****************************************************************************

Q. WHAT IS ANSI-ISO PASCAL ?

A. Pascal is one of a series of languages put forth by one of the most prolific computer language creators, Niklaus Wirth, a professor at the Institut für Informatik, ETH, Zurich, Switzerland.

Professor Wirth participated in various versions of Algol, a language put forth by international cooperation that introduced the basic concepts of structured programming to the world. Wirth describes Pascal as a descendant of Algol 60 (for Algol, 1960 standard). The "official" descendant of Algol 60 was Algol 68, famous for having assignment as an expression operator (a basic feature of the later language C). Wirth felt that the design committee for Algol, after Algol 60, was losing focus and creating an unnecessarily complex language. While Algol 68 has had its fans, the language Pascal was considered to be a new high of consistent language design.

The first draft of Pascal was created in 1968.
The first compiler was operational in 1970, and the language was generally published in 1971. In 1973, after two years of testing and use, the language was revised into its final form.

The first compiler for Pascal was implemented on a CDC 6000 computer at ETH, for "unrevised" Pascal. After the language was revised, a new, high optimization compiler for the new language was created using the old compiler, then the source for that compiler itself changed to "revised" format, so that it could compile itself (known as "bootstrapping" a compiler).

In 1974 there were 10 compilers running on various systems. By 1979 there were at least 80.

In 1977, various committees began the work to standardize the language. In 1983, the ISO (International Organization for Standardization) issued ISO 7185, the official Pascal standard. In the same year, the US ANSI committee issued ANSI/IEEE770X3.97-1983, the US standard for Pascal. In addition, several countries around the world issued their own national standard for Pascal.

*****************************************************************************

Q. WHAT ARE THE DIFFERENT PASCAL STANDARDS ?

A. There are currently 3 different documents that can be classified as Pascal standards: Unextended Pascal, Extended Pascal, and the Object-Oriented Extensions to Pascal.

*****************************************************************************

Q. WHAT IS THE CURRENT STATUS OF PASCAL STANDARDS ?

A. Originally, unextended Pascal was actually 2 standards: an ANSI/IEEE standard (ANSI/IEEE770X3.97-1983) and an ISO standard (ISO 7185 : 1983). There were 2 standards mostly for political reasons that I won't get into here. For the most part, the ISO standard was a superset of the ANSI/IEEE standard and included the conformant array feature. See the foreword of Extended Pascal for some additional history of the development of unextended Pascal.

In 1989, ISO 7185 was revised to correct various errors and ambiguities found in the original document.
Also, the ANSI/IEEE770X3.97 standard was replaced with a "pointer" to the ISO 7185 standard. So finally in 1989, there was only 1 unextended Pascal standard in the world.

The unextended Pascal standard (ISO 7185 : 1990) is still in force as a valid ISO language standard.

*****************************************************************************

Q. WHAT ARE THE BASIC FEATURES OF PASCAL ?

A. Pascal is a structured language, using if-then-else, while, repeat-until, and for-to/downto control structures. It differs primarily from preceding languages in that data structures were also included, with records (a feature borrowed from COBOL), arrays, files, sets and pointers.

Pascal is also unusual for forging an effective compromise between language simplicity, power, and matching of language structures to underlying machine implementation.

Pascal also has many features for compiler writers. The language is constructed to have a minimum of ambiguity. Pascal, with few exceptions, can be processed "forward" with all of the smaller elements (like constants, types, etc.) being defined before they are used. Pascal requires the types and exact sizes of operands to be known before they are operated on, again leading to simplified language processing and efficient output code (although this feature has often been called a problem). For this reason, Pascal still remains a popular language to implement compilers for as part of a computer science class.

*****************************************************************************

Q. WHAT IS J&W (OR THE "REPORT") ?

A. This refers to the "Pascal user manual and report", by Kathleen Jensen and Niklaus Wirth. This is the original bible of Pascal. The second edition contained the finalized language under Wirth. It is no longer available. The current edition is the third, which has almost twice as many pages and contains the second edition extensively revised to meet the ISO Pascal standard.
*****************************************************************************

Q. WHAT ARE THE DIFFERENCES BETWEEN STANDARD PASCAL AND THE ORIGINAL PASCAL ? WAS IT CHANGED EXTENSIVELY ?

A. [This is one of the common myths. In fact, Microsoft customer service explained to me by phone conversation that they believed that the reason they were not compatible with ANSI or ISO Pascal was that their compilers were based on Wirth's original Pascal, and the ANSI and ISO language had in fact been changed extensively. This continues to be repeated as fact around the internet]

The stated goal of the standards committees was to keep Pascal unchanged, but simply address the insecurities and ambiguities that had been discovered by users of the language.

The MAJOR changes are:

1. Procedure and function parameters (where the procedure or function itself was passed as a reference) appeared without a parameter list in the declared procedure or function. The standard requires that the parameter list appears as well, so that it can be checked against any call of that procedure or function. For example, the original form:

      procedure junk(function y: real);

      begin

         x := y(z);
         ...

      end;
      ...
      junk(sin);

becomes, in the standard:

      procedure junk(function y(x: real): real);

etc.

2. The original language only allowed procedure and function parameters to have value parameters. The standard allows value or VAR parameters.

2a. In conjunction with (2), standard procedures and functions (those defined by the compiler itself) are no longer acceptable as procedure or function parameters in the standard. The REPORT shows several examples of passing such functions.

3. In the original language, the exact rules of whether type x was compatible with type y were left implementation defined.
In fact, the first implementations at ETH (which were not documented in the REPORT) were based on "best effort", such that:

      var rx: record x, y: integer; c: char end;

and

      var ry: record x: integer; y: 0..10; c: char end;

were considered compatible because they had the same basic structure. The standard tightened these rules up considerably. In the standard, types are compatible, with a few exceptions, only if they are the same type or "aliases" of the same type, as in:

      type a = b;

The standard also exactly defines the rules for assignment and other compatibility modes.

4. The REPORT defined symbol lengths to be implementation defined. The standard defines them as "unlimited", which for practical purposes means that if the program lines will fit through the compiler, and a symbol fits on one of those lines, it should work.

5. The REPORT leaves the rules for intra procedure gotos as implementation defined. The standard says they must only target the OUTER level of the block.

6. The control variable in a "for" statement must be a variable local to the procedure, function or program block in which it appears. This change was to allow a more efficient implementation with better checking.

In fact, most of what the standard did was simply acknowledge what were already good coding practices. The original REPORT method of assuring portability could be stated as:

APPLICATION: Stay within the guidelines as far as possible. Don't rely on implementation dependent features, such as the compiler's ability to recognize the similarity between types, etc.

COMPILER: Implement the language as fully as possible, and always try to do the most reasonable thing for implementation dependent features, such as attempting to determine whether types are compatible as best as possible.

The idea being that a program will not fail unless it is a poorly written program run on a poorly written compiler.

The standard changed that to a much more exact set of rules that all compilers and programs must follow.
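
As a rough illustration of changes 1 and 6 in the list above, the following is a minimal sketch written for an ISO 7185 conforming compiler. The names "funcparam", "apply" and "square" are made up for this example and do not come from the REPORT or the standard. The function parameter f carries its own parameter list, and the "for" control variable is declared in the block that uses it. A user-written function is passed rather than a standard one such as sin, since the standard no longer allows the latter.

```pascal
program funcparam(output);

{ Hypothetical names for illustration only. }

var
   i: integer;   { control variable, local to the block that uses it }

function square(x: real): real;
begin
   square := x * x
end;

{ Standard form: the function parameter f is declared with its own
  parameter list, so the call f(v) can be checked by the compiler. }
function apply(function f(x: real): real; v: real): real;
begin
   apply := f(v)
end;

begin
   { i is an integer; it is promoted to real when passed as v }
   for i := 1 to 3 do
      writeln(apply(square, i):8:2)
end.
```

The program should print the squares of 1, 2 and 3. Compilers that follow the Borland dialect rather than the standard may reject the parameter list inside the declaration of apply.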
As an example of the compatibility between the REPORT language and the standard, I moved several thousand lines of my own Pascal source from the "old" to a standard compiler without A SINGLE CHANGE because of the standard. The only error I found was that the compiler would not accept:

      var s: array [1..10] of char;
      ...
      writeln(s);

because such Pascal strings must be "packed". This was actually also required in the REPORT; I just had not read it correctly (or well enough).

*****************************************************************************

Q. WHAT ARE THE DIFFERENCES BETWEEN BORLAND PASCAL AND THE STANDARD ?

A. Because Borland Pascal is arguably the most prevalent version of Pascal in existence, it is useful to compare the two languages. Note that I compare here only the differences between Borland and the basic standard. Undiscussed are any extensions provided by Borland. In other words, this section answers the question "why doesn't my standard Pascal program run under Borland ?", and perhaps "what can I write in Borland that will also be compatible with the standard ?".

Borland originally claimed to be compatible with the ANSI version of the standard (the first CP/M Turbo Pascal). As to whether or not the omissions in Borland cause portability problems or are easily surmounted, this is for you to decide.

As to why these differences exist, this is certainly a story on its own. Borland's C compiler did not originally match the C language either (as detailed by Kernighan and Ritchie's "white book"), but Borland corrected their compiler to meet the ANSI standard after it (the standard) was issued. Call it a political issue.

1. Lack of file buffer variable handling. Standard Pascal has file "buffer variables", and "get" and "put" procedures to operate on them. This functionality was entirely omitted in Borland Pascal.

2. Lack of intraprocedural gotos.
UCSD introduced this convention, which was designed to both discourage use of "goto"s and keep language implementation simple. Unfortunately, intraprocedural gotos are the most useful type of gotos:

      program test;

      label 99;

      procedure alpha;

      begin

         if error then goto 99 { exit }

      end;

      begin

         ...
         99: { clean up files and exit }
         ...

      end.

Intraprocedural gotos are used to implement error "bailout" in Pascal, similar to "exception handling" in other languages. Borland later added a goto method to do this that was completely incompatible with the standard (patterned after the C language).

3. Lack of procedure and function parameters. Borland Pascal provides a much more general and powerful mechanism of procedure and function "types", which are then used to create procedure and function parameters:

      type CompareFunction = function(Key1, Key2 : string) : integer;

      procedure Sort(Compare : CompareFunction);

      begin

         ...

      end;

Great. But Borland does not also implement the method from original Pascal, requiring source changes.

4. Lack of "sized" dynamic variable allocation. Standard Pascal allows the tag fields of a variant record to be specified as a parameter to the "new" and "dispose" procedures:

      type r = record

                  case b: boolean of

                     false: (i: integer);
                     true:  (c: char)

               end;

      var p: ^r;
      ...
      new(p, true);

This allows variant records to take up less space in memory.

It should also be noted that Borland Pascal allows a considerable number of operations that would be errors in the standard language. This leaves the opportunity for a programmer to inadvertently create programs that break the standard in many ways.

Also, Borland requires the use of a non-standard integer type ("longint") to get the maximum precision of an integer.

*****************************************************************************

Q. WHAT IS THE DIFFERENCE BETWEEN PASCAL AND C OR SIMILAR LANGUAGES ?

A. C is called a "low level" language because it operates on the kinds of units that the CPU itself deals with, such as integers and pointers. In particular, the most powerful (and dangerous) feature of C is the ability to treat any array reference as a pointer reference, and vice versa. C can also translate one type to another at will.

This creates an "insecure" language, which means that it is not possible for the compiler to check if the program will go wild and start writing all over variables, programs, or the operating system, hard disk, and perhaps that unbacked up copy of your big project.

Before I used high level languages, I worked in assembly exclusively (there were few or no compilers available in the early days of microprocessors). I was debating a friend about which was better, assembly language or HLLs, and told him that if I were to use an HLL, I would still potentially have a program that could write on itself, but would be in a worse position to do something about it, since I would essentially be reduced to disassembling generated code to find the problem. He pointed out something that I did not believe at the time, that it was indeed possible to design an HLL so that the program could never write or read anything but data, and could do no more damage than going into an infinite loop (while staying in the program).

In fact, that level of security is possible, and not even very expensive. And when a program is well tested and mature, even that level of security can be dropped out by compiler option, meaning that a secure language can be just as efficient as a non-secure one.

Working with a secure language is a true pleasure. If the program "halts" (goes into an infinite loop), I can just hit a key and find out where it is stuck. Rebooting the machine is not required, and because the program cannot destroy data arbitrarily, debugging the problem is much less difficult.
In fact, I don't think it is an overstatement to say that at least 50% of debugging time is saved using this arrangement.

In the meantime, C has taken a dramatic 180 degree turn back to type security. Most compilers now do extensive type checking, and complain (sometimes to excess) about any bad use of types in C. The C++ language attempts to bring C all the way back to type security, including the ability to check for out of bounds array (pointer) references.

When translating from one to the other, pointers and type conversion are typically the central issue. Pascal cannot arbitrarily point to anything, nor translate any type to another. In my experience, program translations can be done only one way. Pascal programs can be translated to C, but most C programs cannot be translated to Pascal, because such a program will just contain too many broken rules to be corrected without a complete rewrite.

*****************************************************************************

Q. WHY IS C MORE EFFICIENT THAN PASCAL ?

A. It isn't. This is one of the more common myths. The old proof goes something like this:

      a++

means "add one" in C, and:

      a := a+1

means "add one" in Pascal. Anyone who knows assembly language knows that there is usually a special machine instruction for "add one" or "increment", and that to add a constant one may take two instructions.

This gets more complex. In C, pointers are synonyms for arrays, so you can pick up an array and walk through it:

      int strlen(char s[])

      {

         char *l;

         l = s;
         while (*l) l++;
         return l-s;

      }

gives the length of a zero terminated string. The program relies on being able to directly index into the string with a machine address (pointer), and perform math with two pointers.

So C is more efficient than Pascal, right ? Well, assuming you have two compilers as dumb as posts, yes, that would be true.
And in the early days of desktop computing, compilers that dumb were common (in fact, a HARDWARE feature of the 286 and later CPUs, and of the operating systems, was built around stupid compilers[&]).

But any modern compiler knows that "a+1" can be changed to an increment. And compilers know how to translate array references to pointer references. In fact, to a good optimizing compiler, the source language is irrelevant. By the time the program is encoded, it has been remade extensively, to the point it may no longer even resemble the original source. At this level, the only effect the source language has on the output is that one language may give the compiler more information about the program than another. And Pascal is better at this than any other language in common usage.

So the fact that a C programmer has taken great pains to use pointers instead of arrays may just be wasted effort. A higher level description of the problem will probably end up producing the exact same code with less programmer effort.

The C language was a very clever design in that it could be implemented with little or no optimization and still yield acceptable code, by placing the burden of optimization upon the programmer. This was a very important characteristic of C when microprocessors were still limited to 64kb of address space.

[&] I refer to the fact that the 286 and later processors, and OS/2 and Windows (even Win32), are built to use stack parameter passing, which most PC compilers no longer use.

*****************************************************************************

Q. WHAT GOOD BOOKS ARE AVAILABLE ON THE STANDARD LANGUAGE ?

A. Many books have been published on Pascal. I will be happy to collect reviews here.

TITLE: Pascal user manual and report, third edition. Kathleen Jensen and Niklaus Wirth, Revised by Andrew Mickel and James Miner. Published by Springer-Verlag.

COMMENTS: A definitive reference on standard Pascal and a must have book.

TITLE: Standard Pascal: User reference manual. Doug Cooper. Published by W. W. Norton and Company.

COMMENTS: Doug Cooper is a Professor at UC Berkeley. If you buy just ONE book on standard Pascal, this would be it. Contains ALL of the points in the standard, in the most readable format anywhere.

TITLE: Oh! Pascal. Doug Cooper. Published by W. W. Norton and Company.

COMMENTS: Another Doug Cooper blockbuster, this is probably the most used classroom book on Pascal. Recommended if you are learning standard Pascal.

*****************************************************************************

Q. WHERE CAN I FIND BOOKS ?

A. On the Web, I found a server with an amazing number of books available for order at:

      http://www.amazon.com

The drawback is that no real book descriptions are included, and shipping is expensive. But this would seem the way to go to get hard to find books. Submissions of other interesting online catalogs are encouraged.

*****************************************************************************

Q. WHAT ARE THE RULES OF ANSI/ISO PASCAL ?

A. It is unusual to describe a language completely in a FAQ, but books on standard Pascal are sufficiently rare that I feel it is warranted. Also, many books introduce themselves as "books on Pascal", without specifying (in an obvious manner) what language they use. I have seen several such books that are really based on non-standard Pascals. You can match the features in the book to the actual standard language here.

Note that the following description could be wrong or incomplete.

LEXOGRAPHY

Pascal source consists of identifiers, keywords, numbers and special character sequences. A Pascal identifier must begin with a letter 'a' to 'z', and may continue with letters 'a' to 'z' and digits '0' to '9' (the case of letters is not significant). There is no length limit on identifiers, but there may be a practical limit. If the compiler cannot process a source line longer than N, you cannot have an identifier longer than N, since identifiers may not cross lines.

Keywords (or reserved words) look just like identifiers, but have special meaning wherever they appear, and may never be used as identifiers:

      and array begin case const div do downto else end file for
      function goto if in label mod nil not of or packed procedure
      program record repeat set then to type until var while with

A number can appear in both integer and real form. Integers will appear as a sequence of digits:

      83
      00004

are valid integer numbers. For a number to be taken as "real" (or "floating point") format, it must either have a decimal point, or use scientific notation:

      1.0
      1e-12
      0.000000001

are all valid reals. At least one digit must exist on either side of a decimal point.

Strings are made up of a sequence of characters between single quotes:

      'string'

The single quote itself can appear as two single quotes back to back in a string:

      'isn''t'

Finally, special character sequences are one of the following:

      + - * / = < > [ ] . , : ; ^ ( )
      <> <= >= := .. @ { } (* *) (. .)

Note that these are just aliases for the same character sequence:

      @  and ^ (or the "up arrow" if allowed in the typeface)
      (. and [
      .) and ]
      (* and {
      *) and }

Spaces and line endings in the source are ignored except that they may act as "separators". No identifier, keyword, special character sequence or number may be broken by a separator or other object. No two identifiers, keywords or numbers may appear in sequence without an intervening separator:

      MyLabel            - Valid
      My Label           - Invalid
      begin farg := 1    - Valid
      beginfarg := 1     - Invalid
      1.0e-12            - Valid
      1.e-122e-3         - Invalid

PROGRAM STRUCTURE

A Pascal program appears as a nested set of "blocks", each of which has the following form:

      block_type name(parameter [, parameter]...);

      label x[, y]...;

      const x = y;
            [q = r;]...

      type x = y;
           [q = r;]...

      var x[,y]...: z;
          [x[,y]...: z;]...

      [block]...

      begin

         statement[; statement]...

      end[. | ;]

Note that:

      [option]    means optional.
      [repeat]... means can appear 0 or more times.
      [x | y]     means one or the other.
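
The block form above can be seen in a small, complete program. This is only a sketch under the assumption of an ISO 7185 conforming compiler; the names "blockdemo", "limit", "table" and "fill" are invented for the example. It shows the declaration sections in their required order (label, const, type, var, nested block) followed by the statement section.

```pascal
program blockdemo(output);

label 99;                      { label declarations come first }

const
   limit = 10;                 { constants may be used to build types }

type
   index = 1..limit;
   table = array [index] of integer;

var
   t: table;
   i: index;

{ a nested procedure block with its own local declarations }
procedure fill(var a: table);
var
   k: index;
begin
   for k := 1 to limit do
      a[k] := k * k
end;

begin
   fill(t);
   for i := 1 to limit do
      begin
         if t[i] > 50 then
            goto 99;           { leave the loop early }
         writeln(i:3, t[i]:5)
      end;
99:
   writeln('done')
end.
```

The program should list the squares that do not exceed 50 and then print "done". Some non-standard compilers need an option to be switched on before they accept labels and gotos.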

There are three types of blocks: program, procedure and function. Every program must contain a program block, and exactly one program block exists in the source file.

Each block has two distinct sections, the declaration and statements sections. The declarations immediately before a statement section are considered "local" to that section.

The declaration section builds a description of the data used by the coming statement section in a logical order. For example, constants are usually used to build type declarations, and type declarations are used to build variables, and all of these may be used by nested blocks.

LABEL DECLARATION

The first declaration, labels, are numeric sequences that denote the target of any gotos appearing in the block:

      label 99, 1234;

are valid labels. Labels "appear" to be numbers, and must be in the range 0 to 9999. The "appearance" of a number means that:

      label 1, 01,

are the same label.

CONSTANT DECLARATION

Constant declarations introduce fixed valued data as a specified identifier:

      const x = 10;
            q = -1;
            y = 'hi there';
            r = 1.0e-12;
            z = x;

are all valid constant declarations. Only integer, real, character and string constants may be so defined (no sets may appear).

TYPES

The type declaration allows types to be given names, which are then used to create variables later:

      type x = array [1..10] of integer;
           i = integer;
           z = x;

Types can be new types, aliases of old types, etc.

VARIABLE DECLARATION

Variables set aside computer storage for an element of the given type:

      var x, y: integer;
          z: array [1..10] of char;

BLOCK DECLARATION

A block can be declared within a block, and that block can declare blocks within it, etc. There is no defined limit as to the nesting level. Because only one program block may exist, by definition all "sub blocks" must be either procedure or function blocks. Once defined, a block may be accessed by the block it was declared in.
But the "surrounding" block cannot access blocks that are declared within such blocks:

      program test;

      procedure junk;

      procedure trash;

      begin { trash }

         ...

      end; { trash }

      begin { junk }

         trash;
         ...

      end; { junk }

      begin { test }

         junk;
         ...

      end. { test }

Here test can call junk, but only junk can call trash. Trash is "hidden" from the view of test.

Similarly, a subblock can access any of the variables or other blocks that are defined in surrounding blocks:

      program test;

      var x: integer;

      procedure q;

      begin

      end;

      procedure y;

      begin

         q;
         x := 1

      end;

      begin

         y;
         writeln(x)

      end.

The variable "x" can be accessed from all blocks declared within the same block. It is also possible for a block to call itself, or another block that calls it. This means that recursion is allowed in Pascal.

DECLARATION ORDER

Every identifier must be declared before it is used, with only one exception, pointers, which are discussed later. But there is a way to declare procedures and functions before they are fully defined to get around problems this may cause.

PREDEFINED TYPES

Several types are predeclared in Pascal. These include integer, boolean, char, real and text. Predeclared types, just as predeclared functions and procedures, exist in a conceptual "outer block" around the program, and can be replaced by other objects in the program.

BASIC TYPES

Types in Pascal can be classed as ordinal, real and structured. The ordinal and real types are referred to as the "basic" types, because they have no complex internal structure. Ordinal types are types whose elements can be numbered, and there are a finite number of such elements.

INTEGER TYPES

The basic ordinal type is "integer", and typically it represents the accuracy of a single word on the target machine:

      var i: integer;

A predefined constant exists, "maxint", which tells you what the maximum integral value of an integer is. So:

      type integer = -maxint..maxint;

would be identical to the predefined type "integer".
Specifically, the results of any operation involving ordinals will only be error free if they lie within -maxint to +maxint. Although other ordinal types exist in Pascal, all such types have a mapping into the type "integer", and are bounded by the same rules. The "ord" function can be used on any ordinal to find the corresponding integer.

ENUMERATED TYPES

Enumerated types allow you to specify an identifier for each and every value of an ordinal:

      type x = (one, two, three, four);

introduces four new identifiers, each one having a constant value in sequence from the number 0. So for the above:

      one   = 0
      two   = 1
      three = 2
      four  = 3

Enumerated types may have no relationship to numbers whatever:

      type y = (red, green, blue);

or some relationship:

      type day = (mon, tue, wed, thur, fri, sat, sun);

Here the fact that "day"s are numbers (say, isn't that a lyric ?) is useful because the ordering has real world applications:

      if mon < fri then writeln('yes');

And of course, subranges of enumerated types are quite possible:

      type workday = mon..fri;

Enumerated types are fundamentally different from integer and subrange types in the fact that they cannot be freely converted to and from each other. There is only one conversion direction defined, to integer, and that must be done by a special predefined function:

      var i: integer;
          d: day;
      ...
      i := ord(d); { find integral value of d }

BOOLEAN TYPES

The only predefined enumerated type is "boolean", which could be declared:

      type boolean = (false, true);

However, because booleans cannot be cross converted (being enumerated types), this user created type could not in fact be used in place of the predeclared one. Booleans are special in that several predefined functions, and all of the comparison operators ("=", ">", etc.) give boolean results. In addition, several special operators are defined just for booleans, such as "and", "or" etc.
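
The enumerated and boolean rules above can be tried out with a short program. This is a sketch only, assuming an ISO 7185 conforming compiler; the program name "enumdemo" and the variable names are invented for the example. It uses "ord" to convert enumerated values to integers, stores the boolean result of a comparison, and runs a "for" loop over part of an enumerated type.

```pascal
program enumdemo(output);

type
   day = (mon, tue, wed, thur, fri, sat, sun);
   workday = mon..fri;          { subrange of an enumerated type }

var
   d: day;
   w: workday;
   later: boolean;

begin
   { ord is the only defined conversion: enumerated value to integer }
   writeln('ord(mon) = ', ord(mon):1);
   writeln('ord(sun) = ', ord(sun):1);

   { comparison operators give boolean results }
   later := fri > mon;
   if later and not (sat < fri) then
      writeln('fri comes after mon, and sat is not before fri');

   { list the days from w up to the end of the working week }
   w := wed;
   for d := w to fri do
      writeln('working day number ', ord(d):1)
end.
```

Since the values are numbered from 0, the two "ord" lines should report 0 and 6, and the loop should report 2, 3 and 4.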

CHARACTER TYPES

Character types in Pascal hold the values of the underlying character set, usually ISO single byte encoded (including ASCII). The Pascal standard makes no requirements as to what characters will be present or what order they will appear in. However, as a practical matter, most Pascal programs rely on the characters of the alphabet and the digits '0'-'9' being present, and that these are numbered sequentially (which leaves out EBCDIC, for example).

A character declaration appears as:

      var c: char;

Character values can also be converted to and from integers at will, but only by using the special functions to do so:

      ord(c); { find integer value of character }
      chr(i); { find character value of integer }

SUBRANGE TYPES

Subrange types are simply a voluntary programmer restriction of the values an ordinal type may hold:

      type constrained = -10..50;

(The notation x..y means all values from x to y inclusive.) It is an error to assign a value outside of the corresponding range to a variable of that type:

      var x: constrained;
      ...
      x := 100; { invalid! }

But note that there are no restrictions on the USE of such a type:

      writeln('The sum is: ', x+100);

Here, even though the result of x+100 is outside the range of the type of x, it is not an error. When used in an expression, a subrange is directly equivalent to the type "integer".

Subranges can be declared of any ordinal type:

      type enum = (one, two, three, four, five, six, seven, eight, nine, ten);

      var e: three..seven;

      var c: 'a'..'z';

Etc.

REAL TYPES

Real types, or "floating point", allow approximations of a large range of numbers to be stored. The tradeoff is that reals have no direct ordinality (cannot be counted), and so have no direct relationship with integers. Real types are the only basic type which is not ordinal.

      var r: real;

Integers are considered "promotable" to reals. That is, it is assumed that an integer can always be represented as a real.
However, there may be a loss of precision when this is done (because the mantissa of a real may not be as large as an integer). Reals are never automatically promoted to integer, however, and the programmer must choose between finding the nearest whole number to the real, or simply discarding the fraction. This choice must be made explicitly by predefined function.

STRUCTURED TYPES

A structured type is a type with a complex internal structure. In fact, the structured types all have one thing in common: they can hold more than one basic type object at one time. They are structured because they are "built up" from basic types, and from other structured types.

PACKING

Structured types can also be "packed", which is indicated by the keyword "packed" before the type declaration. Packing isn't supposed to change the function of the program at all. Stripping the "packed" keywords out of a program will not change the way it works (with the exception of "strings", below). Packing means that (if implemented: it's optional) the program should conserve space by placing the values in as few bits as possible, even if this takes more code (and time) to perform.

Packing is better understood if you understand the state of computers before microprocessors (the jurassic age of computers ?). Most mainframe computers access memory as a single word size only, and not even a neat multiple of 8 bits either (for example, 36 bit computers; the CDC 6000 has 60 bit words). The machine reads or writes in words only. There is no byte access, no even/odd addressing, etc. Because storage of small items on such a machine could be wasteful (especially characters), programs often pack many single data items into a single word.

The advent of the minicomputer changed that. DEC started with an 8 bit machine (just as microprocessors did), and when they changed to 16, then 32 bits, the ability to address single bytes was maintained.
For this reason, many people refer to such a machine as "automatically packed", or say that Pascal's packing feature is unnecessary on such machines. However, quantizing data by 8 bit bytes is not necessarily the most extreme packing method available. For example, a structure of boolean values, which take up only 1 bit per element, would waste 7/8ths of the storage allocated if left to byte packing.

SET TYPES

Set types are perhaps the most radical feature of Pascal. A set type can be thought of as an array of bits indicating the presence or absence of each value in the base type:

   var s: set of char;

Would declare a set containing a yes/present or no/not present indicator for each character in the computer's character set. The base type of a set must be ordinal.

ARRAY TYPES

The most basic structured type is the array. Pascal is unusual in that both the upper and lower bounds of arrays are declared (instead of just the upper bound or length), and that the index type can be any ordinal type:

   var a: array [1..10] of integer;

Would declare an array of 10 integers with indexes from 1 to 10. You may recognize the index declaration as a subrange, and indeed any subrange type can be used as an index type:

   type sub = 0..99;
   var a: array [sub] of integer;

Arrays can also be declared as multidimensional:

   var a: array [1..10] of array [1..10] of char;

There is also a shorthand form for array declarations:

   var a: array [1..10, 1..10] of char;

Is equivalent to the last declaration.

A special type of array definition is a "string". Strings are packed arrays of characters, with integer indexes, whose lower bound is 1:

   var s: packed array [1..10] of char;

String types are special in that any two strings with the same number of components are compatible with each other, including constant strings.

RECORD TYPES

Records give the ability to store completely different component types together as a unit. There they can be manipulated, copied and passed as a unit.
It is also possible to create different typed objects that occupy the same storage space.

   var r: record
             a: integer;
             b: char
          end;

Gives a single variable with two completely different components, which can be accessed independently, or used as a unit.

   var vr: record
              a: integer;
              case b: boolean of { variant }
                 true:  (c: integer; d: char);
                 false: (e: real)
              { end }
           end;

Variant records allow the same "collection of types", but introduce the idea that not all of the components are in use at the same time, and thus can occupy the same storage area. In the above definition, a, b, c, d, and e are all elements of the record, and can be addressed individually. However, there are three basic "types" of record elements in play:

   1. "Base" or normal fixed record elements, such as a.
   2. The "tagfield" element, such as b.
   3. The "variants", such as c, d, and e.

All the elements before the case variant are normal record elements and are always present in the record. The tagfield is also always present, but has a special function with regard to the variant. It must be an ordinal type, and ALL of its possible values must be accounted for by a corresponding variant. The tagfield gives both the program and the compiler the chance to tell what the rest of the record holds (i.e., what case variant is "active"). The tagfield can also be omitted:

   var vr: record
              a: integer;
              case boolean of { variant }
                 true:  (c: integer; d: char);
                 false: (e: real)
              { end }
           end;

In this case, the variant can be anything the program says it is, without checking.

The variants introduce what essentially is a "sub record" definition that gives the record elements that are only present if the selecting variant is "active". A variant can hold any number of such elements. If the compiler chooses to implement variants, the total size of the resulting record will be no larger than the fixed record parts plus the size of the largest variant.
It is possible for the compiler to treat the variant as a normal record, allocating each record element normally, in which case the variant record would be no different from a normal record.

FILE TYPES

Files are similar to arrays in that they store a number of identical components. Files are different from arrays in that the number of components they may store is not limited or fixed beforehand. The number of components in a file can change during the run of a program. A file can have any type as a component type, with the exception of other file types. This rule is strict: you may not even have a file whose components are structures which contain files. A typical file declaration is:

   var f: file of integer;

Would declare a file with standard integer components. A special predefined file type exists:

   var f: text;

Text files are supposedly equivalent to:

   type text = file of char;

But there are special procedures and functions that apply to text files only.

POINTER TYPES

Pointers are indirect references to variables that are created at runtime:

   var ip: ^integer;

Pointers are neither basic nor structured types (they are not structured because they do not have multiple components). Any type can be pointed to. In practice, pointers allow you to create a series of unnamed components which can be arranged in various ways. The type declaration for pointers is special in that the type specified to the right of "^" must be a type name, not a full type specification. Pointer declarations are also special in that a pointer type can be declared using base types that have not been declared yet:

   type rp = ^rec;
        rec = record
                 next: rp;
                 val: integer
              end;

The declaration for rp contains a reference to an undeclared type, rec. This "forward referencing" of pointers allows recursive definition of pointer types, essential in list processing.

TYPE COMPATIBILITY

Type compatibility (the ability to use two different objects in relation to each other) occurs on three different levels:

   1. Two types are identical.
   2. Two types are compatible.
   3. Two types are assignment compatible.

Two types are identical if the exact same type definition was used to create the objects in question. This can happen in several different ways. Two objects can be declared in the same way:

   var a, b: array [1..10] of record a, b: integer end;

Here a and b are the same (unnamed) type. They can also be declared using the same type name:

   type mytype = record a, b: integer end;
   var a: mytype;
       b: mytype;

Finally, an "alias" can be used to create types:

   type mytype = array [1..10] of integer;
        myother = mytype;
   var a: mytype;
       b: myother;

Even though an alias is used, these objects still have the same type.

Two types are considered compatible if:

   1. They are identical types (as described above).
   2. Both are ordinal types, and one or both are subranges of an identical type.
   3. Both are sets with compatible base types and the same "packed" status.
   4. Both are string types with the same number of components.

Finally, two types are assignment compatible if:

   1. The types are compatible, as described above.
   2. Neither is a file, or has components of file type.
   3. The destination is real, and the source is integer (because integers can always be promoted to real, as above).
   4. The source "fits" within the destination. If the types are subranges of the same base type, the source must fall within the destination's range:

         var x: 1..10;
         ...
         x := 1;  { legal }
         x := 20; { not legal }

   5. Both are sets, and the source "fits" within the destination. If the base types of the sets are subranges, all the source elements must also exist in the destination:

         var s1: set of 1..10;
         ...
         s1 := [1, 2, 3]; { legal }
         s1 := [1, 15];   { not legal }

EXPRESSIONS

The basic operands in Pascal are:

   xxx - Integer constant. A string of digits, without sign, whose value must not exceed maxint.

   x.xex - Real constant.

   'string' - String constant.

   [set] - Set constant.
   A set constant consists of zero or more elements separated by ",":

      [1, 2, 3]

   A range of elements can also appear:

      [1, 2..5, 10]

   The elements of a set must be of the same type, and the "apparent" base type of the set is the type of the elements. The packed or unpacked status of the set is whatever is required for the context where it appears.

   ident - Identifier. Can be a variable or constant from a const declaration.

   func(x, y) - A function call. Each parameter is evaluated, and the function called. The result of the function is then used in the encompassing expression.

The basic construct built on these operands is a "variable access". In the following forms, "a" is any variable access:

   ident - A variable identifier.

   a[index] - Array access. It is also possible to access any number of dimensions by listing multiple indexes separated by ",": [x, y, z, ...]

   a.off - Record access. The "off" will be the element identifier as used in the record declaration.

   a^ - Pointer reference. The resulting reference will be to the variable that the pointer indexes. If the variable reference is a file, the result is a reference to the "buffer variable" for the file.

Note that a VAR parameter only allows a variable reference, not a full expression.

The rest of the expression operators are given here in order of precedence, with the operators appearing in groups according to priority (highest first). "a" and "b" are operands.

   (a) - A subexpression.

   not a - The boolean "not" of the operand, which must be boolean.

   a*b - Multiplication/set intersection. If the operands are real or integer, the product is found. If either operand is real, the result is real. If the operands are sets, the intersection is found, that is, a new set with the elements that exist in both sets.
   a/b - Divide. The operands are real or integer. The result is a real representing a divided by b.
   a div b - Integer divide. The operands must be integer. The result is an integer giving a divided by b with no fractional part.
   a mod b - Integer modulo. The operands must be integer. The result is an integer giving the modulo of a divided by b.
   a and b - Boolean "and". Both operands must be boolean. The result is a boolean, giving the "and" of the operands.

   +a - Identity. The operand is real or integer. The result is the same type as the operand, and has the same sign as the operand (essentially a no-op).
   -a - Negation. The operand is real or integer. The result is the same type as the operand, and gives the negation of the operand.
   a+b - Add/set union. If the operands are real or integer, finds the sum of the operands. If either operand is real, the result is real. If both operands are sets, finds a new set which contains the elements of both.
   a-b - Subtract/set difference. If the operands are real or integer, finds a minus b. If either operand is real, the result is real. If both operands are sets, finds a new set which contains the elements of a that are not also elements of b.
   a or b - Boolean "or". Both operands must be boolean. The result is boolean, giving the boolean "or" of the operands.

   a < b - Finds if a is less than b, and returns a boolean result. The operands can be basic or string types.
   a > b - Finds if a is greater than b, and returns a boolean result. The operands can be basic or string types.
   a <= b - Finds if a is less than or equal to b, and returns a boolean result. The operands can be basic, string or set types. For sets, finds if a is a subset of b.
   a >= b - Finds if a is greater than or equal to b, and returns a boolean result. The operands can be basic, string or set types. For sets, finds if a contains every element of b.
   a = b - Finds if a is equal to b, and returns a boolean result. The operands can be basic, string, set or pointer types.
   a <> b - Finds if a is not equal to b, and returns a boolean result. The operands can be basic, string, set or pointer types.
   a in b - Set membership. a is an ordinal, b is a set with the same base type as a. Returns true if there is an element matching a in the set.
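
A few of the operators above can be seen at work in the short program below. It is only a sketch: the program name and the variables i, j, r, s and t are invented for the example, and the exact layout of the numbers printed depends on the implementation's default field widths.

```pascal
program opdemo(output);

{ Illustrative only: all names here are made up for this example. }

var i, j: integer;
    r: real;
    s, t: set of 1..10;

begin
   i := 17;
   j := 5;
   writeln(i div j); { integer divide, gives the value 3 }
   writeln(i mod j); { integer modulo, gives the value 2 }
   r := i / j;       { "/" always gives a real, here 3.4 }
   writeln(r);
   s := [1, 2..5];   { the elements 1 through 5 }
   t := [4, 5, 6];
   { "in" is in the lowest priority group, so s * t is found first }
   if 4 in s * t then writeln('4 is in the intersection');
   if not (6 in s - t) then writeln('6 is not in the difference');
   { "<=" applied to sets tests for a subset }
   if s * t <= s + t then writeln('intersection is a subset of union')
end.
```

Note the parentheses in "not (6 in s - t)": since "not" has the highest priority, leaving them out would apply "not" to the integer 6, which is an error.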

PREDEFINED FUNCTIONS

The following predefined functions exist:

   sqr(x) - Finds the square of x, which can be real or integer. The result is the same type as x.

   sqrt(x) - Finds the square root of x, which can be real or integer. The result is always real.

   abs(x) - Finds the absolute value of x, which can be real or integer. The result is the same type as x.

   sin(x) - Finds the sine of x, which can be real or integer. x is expressed in radians. The result is always real.

   cos(x) - Finds the cosine of x, which can be real or integer. x is expressed in radians. The result is always real.

   arctan(x) - Finds the arctangent of x, which can be real or integer. The result is always real, and is expressed in radians.

   exp(x) - Finds the exponential of x, which can be real or integer. The result is always real.

   ln(x) - Finds the natural logarithm of x, which can be real or integer. The result is always real.

   ord(x) - Finds the integer equivalent of any ordinal type x.

   succ(x) - Finds the next value of any ordinal type x.

   pred(x) - Finds the previous value of any ordinal type x.

   chr(x) - Finds the char type equivalent of any integer x.

   trunc(x) - Finds the whole number part of the given real x, discarding the fraction (converts a real to an integer, truncating toward zero).

   round(x) - Finds the nearest integer to the given real x.

STATEMENTS

Pascal uses "structured statements". This means you are given a few standard control flow methods to build a program with.

ASSIGNMENT

The fundamental statement is the assignment statement:

   v := x;

There is a special operator for assignment, ":=" (or "becomes"). Only a single variable reference may appear to the left, and any expression may appear to the right. The operands must be assignment compatible, as defined above.

IF STATEMENT

The if statement is the fundamental flow of control structure:

   if cond then statement [else statement]

In Pascal, only boolean type expressions may appear for the condition (not integers).
The if statement specifies a single statement to be executed if the condition is true, and an optional statement if the condition is false. You must beware of the "bonding problem" (often called the "dangling else") if you create multiple nested if statements:

   if a = 1 then if b = 2 then writeln('a = 1, b = 2')
   else writeln('a <> 1');

Here the else clause is attached to the very last if statement that appeared (the test of b), which may not be the one we want.

WHILE STATEMENT

Just as if is the fundamental flow of control statement, while is the fundamental loop statement:

   while cond do statement

The while statement continually executes its single statement as long as the condition is true. It may not execute the statement at all if the condition is never true.

REPEAT STATEMENT

A repeat statement executes a block of statements one or more times:

   repeat statement [; statement]... until cond

It will execute the block of statements as long as the condition is false. The statement block will always be executed at least once.

FOR STATEMENT

The for statement executes a statement a fixed number of times:

   for i := lower to upper do statement
   for i := upper downto lower do statement

The for statement executes the target statement once for each value of the "control variable" within the range lower..upper. It may not execute at all if lower > upper. The control variable in a for is special, and it must obey several rules:

   1. It must be ordinal.
   2. It must be local to the present block (declared in the present block).
   3. It must not be "threatened" in the executed statement. To threaten means to modify, or give the potential to modify, as in passing as a VAR parameter to a procedure or function (see below).

CASE STATEMENT

The case statement defines an action to be executed on each of the values of an ordinal:

   case x of
      c1: statement;
      c2: statement;
      ...
   end;

The "selector" is an expression that must result in an ordinal type. Each of the "case labels" must be type compatible with the selector.
The case statement will execute one, and only one, statement that matches the current selector value. If the selector matches none of the cases, then an error results. It is NOT possible to assume that execution simply continues if none of the cases are matched. A case label MUST match the value of the selector.

GOTO STATEMENT

The goto statement directly branches to a given labeled statement:

   goto 123
   ...
   123:

Several requirements exist for gotos:

   1. The goto label must have been declared in a label declaration.
   2. A goto cannot jump into any one of the structured statements above (if, while, repeat, for or case statements).
   3. If the target of the goto is in another procedure or function, that target label must be in the "outer level" of the procedure or function. That means that it may not appear inside any structured statement at all.

COMPOUND STATEMENT

A statement block gives the ability to make any number of statements appear as one:

   begin statement [; statement]... end

All of the above statements control only one statement at a time, with the exception of repeat. The compound statement allows the inclusion of a whole substructure to be controlled by those statements.

PROCEDURES AND FUNCTIONS

When you need to use a block of the same statements several times, a compound block can be turned into a procedure or function and given a name:

   procedure x;
   begin
      ...
   end;

   function y: integer;
   begin
      ...
   end;

Then, the block of statements can be called from anywhere:

   var i: integer;
   ...
   x;      { calls the procedure }
   i := y; { calls the function }

The difference between a procedure and a function is that a function returns a result, which can only be a basic or pointer type (not structured). This makes it possible to use a function in an expression. In a function, the result is returned by a special form of the assignment statement:

   function y: integer;
   begin
      ...
      y := 1 { set function return }
   end;

The assignment is special because only the name of the function appears on the left hand side of ":=". It does not matter where the function return assignment appears in the function, and it is even possible to have multiple assignments to the function, but AT LEAST one such assignment must be executed before the function ends.

If the procedure or function uses parameters, they are declared as:

   procedure x(one: integer; two, three: char);
   begin
      ...
   end;

The declaration of a parameter is special in that only a type name may be specified, not a full type specification. Once appearing in the procedure or function header, parameters can be treated as variables that just happen to have been initialized to the value passed to the procedure or function. The modification of parameters has no effect on the original parameters themselves. Any expression that is assignment compatible with the parameter declaration can be used in place of the parameter during the call (here i is an integer variable):

   x(i*3, 'a', succ('a'));

If it is desired that the original parameter be modified, then a special form of parameter declaration is used:

   procedure x(var y: integer);
   begin
      y := 1
   end;

Declaring y as a VAR parameter means that y will stand for the original parameter, including taking on any values given it:

   var q: integer;
   ...
   x(q);

Would change q to have the value 1. In order to be compatible with a VAR parameter, the passed parameter must be of identical type to the parameter declaration, and must be a variable reference.

Finally, Pascal provides a special mode of parameter known as a procedure or function parameter which passes a reference to a given procedure or function:

   procedure x(procedure y(x, q: integer));
   ...
   procedure z(function y: integer);
   ...

To declare a procedure or function parameter, you must give its full parameter list, including a function result if it is a function.
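
The difference between the ordinary (value) and VAR parameter modes described above can be sketched as follows. The program and the procedure names byvalue and byref are invented for this illustration.

```pascal
program paramdemo(output);

{ Illustrative only: byvalue and byref are made up names. }

var q: integer;

procedure byvalue(n: integer);
begin
   n := n + 1 { changes only the procedure's own copy }
end;

procedure byref(var n: integer);
begin
   n := n + 1 { changes the variable that was passed }
end;

begin
   q := 1;
   byvalue(q);
   writeln(q); { q still has the value 1 }
   byref(q);
   writeln(q)  { q now has the value 2 }
end.
```

A call such as byref(q + 1) would be rejected by the compiler, because q + 1 is an expression and not a variable reference.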
A procedure or function is passed to a procedure or function by just its name:

   procedure r(a, b: integer);
   begin
      ...
   end;

   begin
      x(r); { pass procedure r to procedure x }
      ...

The parameter list for the procedure or function passed must be "congruent" with the declared procedure or function parameter declaration. This means that all its parameters, and all of the parameters of its procedure or function parameters, etc., must match the declared parameter. Once the procedure or function has been passed, it is then ok for the procedure or function that accepts it to use it:

   procedure x(procedure y(x, q: integer));
   begin
      y(1, 2);
      ...

Would call r with parameters 1 and 2.

Procedures and functions can be declared in advance of the actual appearance of the procedure or function block using the forward keyword:

   procedure x(a, b: integer); forward;

   procedure y;
   begin
      x(1, 2)
      ...
   end;

   procedure x;
   begin
      ...

The forward keyword replaces the appearance of the block in the first appearance of the declaration. In the second appearance, only the name of the procedure appears, not its header parameters. Then the block appears as normal. The advance declaration allows recursive structuring of procedure and function calls that would otherwise not be possible.

PREDEFINED PROCEDURES AND FILE OPERATIONS

A file is not accessed directly (as an array is). Instead, Pascal automatically declares one component of the file's component type, which is accessed by special syntax:

   f^

So that:

   f^ := 1;

Assigns to the file "buffer" component, and:

   v := f^;

Reads the file buffer. Unless the file is empty or you are at the end of the file, the file buffer component will contain the contents of the component at the file location you are currently reading or writing. Other than that, the file buffer behaves as an ordinary variable, and can even be passed as a parameter to routines.
The way to actually read or write through a file is by using the predeclared procedures:

   get(f);

Advances the file position by one element, and loads the buffer variable with the element found there, and:

   put(f);

Outputs the contents of the buffer variable to the file and advances the file position by one. These two procedures are really all you need to implement full reading and writing on a file. They also have the advantage of keeping the next component in the file available as a "lookahead" mechanism. However, it is much more common to access files via the predefined procedures read and write:

   read(f, x);

Is equivalent to:

   x := f^; get(f);

And:

   write(f, x);

Is equivalent to:

   f^ := x; put(f);

Read and write are special in that any number of parameters can appear:

   read(f, x, y, z, ...);
   write(f, x, y, z, ...);

The parameters to read must be variable references. The parameters to write can be expressions of matching type, except for the file parameter (files must always be VAR references).

Writing to a file is special in that you cannot write to a file unless you are at the end of the file. That is, you may only append new elements to the end of the file, not modify existing components of the file.

Files are said to exist in three "states":

   1. Inactive.
   2. Read.
   3. Write.

All files begin life in the inactive state. For a file to be read from, it must be placed into the read state. For a file to be written, it must be placed in the write state. The reset and rewrite procedures do this:

   reset(f);

Places the buffer variable at the 1st element of the file (if it exists), and sets the file mode to "read".

   rewrite(f);

Clears any previous contents of the file, and places the buffer variable at the start of the file. The file mode is set to "write".

A file can be tested for only one kind of position, that is if it has reached the end:

   eof(f);

Is a function that returns true if the end of the file has been reached. eof must be true before the file can be written.
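
Putting these procedures together, the sketch below writes a few integers to a file, then resets it and reads them back through the buffer variable. The program and variable names are invented for the example. It assumes that a file variable which does not appear in the program header is treated as a temporary file that lasts for the run of the program; some compilers need extra steps to tie a file to the outside world.

```pascal
program filedemo(output);

{ Illustrative only: f is assumed to be a temporary (local) file. }

var f: file of integer;
    i, sum: integer;

begin
   rewrite(f);                          { clear the file, set write state }
   for i := 1 to 5 do write(f, i * i);  { append 1, 4, 9, 16, 25 }
   reset(f);                            { back to the 1st element, set read state }
   sum := 0;
   while not eof(f) do begin
      sum := sum + f^;                  { look at the current component }
      get(f)                            { move on to the next component }
   end;
   writeln('sum of squares: ', sum)     { the value printed is 55 }
end.
```

The loop body could equally be written with read(f, i), which is just the x := f^; get(f) pair given above.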

PREDEFINED PROCEDURES AND TEXT FILES

As alluded to before, text files are treated specially under Pascal. First, the ends of lines are treated specially. If the end of a line is reached, a read call will just return a space. A special function is required to determine if the end of the line has been reached:

   eoln(f);

Returns true if the current file position is at the end of a line. Pascal strictly enforces the following structure to text files:

   line 1
   line 2
   ...
   line N

There will always be an eoln terminating each line. If the file being read does not have an eoln on the last line, it will be added automatically.

Besides the standard read and write calls, two procedures are special to text files:

   readln(f...);
   writeln(f...);

Readln behaves as a normal read, but after all the items in the list are read, the rest of the line is skipped until eoln is encountered. Writeln behaves as a normal write, but after all the items in the list are written, an eoln is appended to the output.

Text files can be treated as simple files of characters, but it is also possible to read and write other types to a text file. Integers and reals can be read from a text file, and integers, reals, booleans, and strings can be written to text files. These types are written or read from the file by converting them to or from a character based format. The format for integers on read must be:

   [+/-]digit[digit]...

Examples:

   9
   +56
   -19384

The format for reals on read is:

   [+/-]digit[digit]...[.digit[digit]...][e[+/-]digit[digit]...]

Examples:

   -1
   -356.44
   7e9
   +22.343e-22

All blanks are skipped before reading the number. Since eolns are read as blanks, this means that even eoln is skipped to find the number. This can lead to an interesting situation when a number is read from the console. If the user presses return without entering a number (on most systems), nothing will happen until a number is entered, no matter how many times return is hit !
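
The line structure of text files is usually handled with a pair of nested loops, as in the following sketch, which copies the standard input to the standard output and counts the lines. The program and variable names are invented for the example.

```pascal
program copytext(input, output);

{ Illustrative only: copies input to output a line at a time. }

var c: char;
    lines: integer;

begin
   lines := 0;
   while not eof(input) do begin
      while not eoln(input) do begin { copy the characters of one line }
         read(input, c);
         write(output, c)
      end;
      readln(input);                 { skip over the eoln }
      writeln(output);               { and reproduce it on the output }
      lines := lines + 1
   end;
   writeln('lines copied: ', lines)
end.
```

Testing eoln before each read is what keeps the end of a line from being copied as a space.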
Write parameters to text files are of the format:

   write(x[:field[:fraction]]);

The field states the number of character positions that you expect the object to occupy. The fraction is special to reals. The output format that occurs in each case is:

integer: The default field for integers is implementation defined, but is usually the number of digits in maxint, plus a position for the sign. If a field is specified, and is larger than the number of positions required to output the number and sign, then blanks are added to the left side of the output until the total size equals the field width. If the field width is less than the required positions, the field width is ignored.

real: The default field for reals is implementation defined. There are two different format modes depending on whether the fraction parameter appears. If there is no fraction, the format is:

   -0.0000000e+000

Starting from the left, the sign is either a "-" sign if the number is negative, or blank if the number is positive or zero. Then comes the first digit of the number, then the decimal point, then the fraction of the number, then either 'e' or 'E' (the case is implementation defined), then the sign of the exponent, then the digits of the exponent. The number of digits in the exponent is implementation defined, as is the number of digits in the fraction if no field width is given. If the field width appears, and it is larger than the total number of required positions in the number (all the characters in the above format without the fraction digits), then the fraction is expanded until the entire number fills the specified field, using right hand zeros if required. Otherwise, the minimum required positions are always printed.
If a fraction appears (which means the field must also appear), the format used is:

   [-]00...00.000..00

The number is converted to its whole number equivalent, and all of the whole number portion of the number is printed, regardless of the field size, preceded by "-" if the number is negative. Then, a decimal point appears, followed by the number of fractional digits specified in the fraction parameter. If the field is greater than the number of required positions and specified fraction digits, then leading spaces are added until the total size equals the field width. The minimum positions and the specified fractional digits are always printed.

HEADER FILES

The header files feature was originally designed to be the interface of Pascal to the external file system, and as such is implementation defined. It is also (unfortunately) ignored in most implementations. The header files appear as a simple list of identifiers in the program header:

   program test(input, output, source, object);

Each header file automatically assumes the type text. If the file needs to be another type, it should be declared again in the variables section of the program block:

   program test(intlist);

   var intlist: file of integer;

Two files are special, and should not be redeclared. These are input and output. These files are understood to represent the main input and main output of the program, and are present in all Pascal programs. In addition, they are the default files in special forms of these procedures and functions:

   This form         Is equivalent to this form
   --------------------------------------------
   write(...)        write(output, ...)
   writeln(...)      writeln(output, ...)
   writeln           writeln(output)
   read(...)         read(input, ...)
   readln(...)       readln(input, ...)
   readln            readln(input)
   eof               eof(input)
   eoln              eoln(input)

PACKING PROCEDURES

Because arrays whose packing status differs are incompatible with each other, even when they are otherwise of the same type, two procedures allow a packed array to be copied to a non-packed array and vice versa:

   unpack(PackedArray, UnpackedArray, Index);

Unpacks the packed array and places the contents into the unpacked array. The index gives the starting index of the unpacked array where the data is to be placed. Interestingly, the two arrays need not have the same index type or even be the same size ! The unpacked array must simply have enough elements after the specified starting index to hold the number of elements in the packed array.

   pack(UnpackedArray, Index, PackedArray);

Packs part of the unpacked array into the packed array. The index again gives the starting position to copy data from in the unpacked array. Again, the arrays need not be of the same index type or size. The unpacked array simply needs enough elements after the index to provide all the values in the packed array.

DYNAMIC ALLOCATION

In Pascal, pointer variables are limited in the kind of variable they can index. The objects indexed by pointer types are anonymous, and are created or destroyed by the programmer at will. A pointer variable is undefined when it is first used, and it is an error to access the variable it points to unless that variable has been created:

   var p: ^integer;
   ...
   new(p);  { create a new integer variable }
   p^ := 1; { place value }

Would create a new variable. Variables can also be destroyed:

   dispose(p);

Would release the storage allocated to the variable. It is an error (a very serious one) to access the contents of a variable that has been disposed.

A special syntax exists for the allocation of variant records:

   type rec = record
                 a: integer;
                 case b: boolean of
                    true:  (c: integer);
                    false: (d: char)
                 { end }
              end;
   var p: ^rec;
   ...
   new(p, true);
   ...
   dispose(p, true);

For each of new and dispose, the values of the tagfields we want to discriminate are given as parameters to the procedure. The appearance of the tagfield values allows the compiler to allocate a variable with only the amount of space required for the record with that variant. This can allow considerable storage savings if used correctly.

The appearance of a discriminant in a new procedure does not also automatically SET the value of the tagfield. You must do that yourself. For the entire life of the variable, you must not set the tagfield to any other value than the value used in the new procedure, nor access any of the variants in the record that are not active. The dispose statement should be called with the exact same tagfield values and number. Note that ALL the tagfields in a variable need not appear, just all the ones, in order, that we wish to allocate as fixed.

*****************************************************************************

Q. WHAT ARE SOME STANDARD METHODS AND HINTS FOR PASCAL ?

A. There are several techniques in Standard Pascal that are not obvious.

STRINGS

Pascal and C have one feature in common: they include no basic support for handling of strings of characters. Strings, as implemented in Basic and other languages, are a "high cost" data element, mainly because a lot of character copying must occur within the string functions. Most "professional" (i.e., used for paid programming) languages choose to leave creation of string handling up to the programmer. The reason is that in many or even most cases, applying primitives to simple arrays of characters can achieve the same result at less expense.

However, one big difference between C and Pascal is that C allows strings of varying length to be passed to routines, which greatly facilitates creation of general purpose string handling routines, and manipulation of string constants.
In Pascal, by contrast, you must declare a string as a fixed length array:

     var string: packed array [1..50] of char;

Which means that all of your strings must have the same length as the handler routines expect. Further, for any reasonable size of string, assigning string constants to strings is prohibitive, because the constant must be padded out to the full declared length:

     string := 'hello, world                                      ';

And this is a short example ! More commonly, strings must be 100-200 characters in length, so the assignment of string constants is just impractical.

USING "SPACE PADDED" STRINGS

No matter what the length of string, the first and best trick is to make extensive use of space padded strings:

     var s: packed array [1..10] of char;

     ...

     s := 'hello     ';

For example, to read a word from the input:

     var i: integer;
         c: char;

     ...

     s := '          ';
     i := 1; { set 1st character in string }
     while input^ <> ' ' do begin { read characters }

        read(c); { get next character }
        if i <= 10 then begin { not overflow }

           s[i] := c; { place character }
           i := i+1 { next character }

        end

     end;

If the user inputs "one more", s would be "one       ", or the first word with blank padding. Characters beyond the tenth are read and discarded, so an overlong word cannot stall the loop.

Note the trick used to find the end of the word. Eoln is returned as a space, which also happens to be the word delimiter, and in standard Pascal every line must be terminated by an eoln.

Once all your strings are in space padded form, operations on them are easy:

     var s1, s2: packed array [1..10] of char;

     ...

     s1 = s2; { find if strings are equal }
     s1 < s2; { find if s1 is lexicographically smaller than s2 }
     s1 := s2; { assign strings }

Note that comparing strings for order depends on the value of the space character. If space has a smaller ordinal value than all other characters, then "ab " is going to be less than "abc ". If space has a larger value than others, the opposite would be true. In ASCII, the space is the lowest valued printing character, and this indeed gives the lexicographical sort that is most popular (i.e., that shorter character sequences go first).

Space padded strings even work for strings with embedded spaces !
A string like:

     s := 'this is a very long string          ';

Would be equal to "this is a very long string" without the trailing blanks. This works because, reading left to right as we do, any spaces after the text are unimportant (and typically will not make any difference to printout). Putting text through processing using blank padded strings will have the effect of trimming all the trailing blanks off the text, often a desirable side effect.

To find the length of a blank padded string:

     var e: integer;
         s: packed array [1..100] of char;

     ...

     e := 100; { set maximum }
     { find end of string }
     while (e > 1) and (s[e] = ' ') do e := e-1;

Will set e to the index of the last character of the string, or to 1 if the string is empty.

A check for a blank string need not be:

     s = '          ';

or similar, but simply:

     s[1] = ' ';

Because if the first character is blank, the entire string is usually blank as well.

USE "CUSTOM" STRING SIZES

The best way to achieve "tight" code in standard Pascal is to use custom strings for given tasks. For instance, if you are going to input a command from the user, and use that to look up a string in a table, you can create a string type based on the maximum length of command string that you are going to need. Then you can create a few custom handling procedures for that string type. This works best if you use constant declarations to declare the length of the string:

     const strlen = 100;

     type mystring = packed array [1..strlen] of char;

This way, if you must change the string size later, there is less difficulty. Although these kinds of solutions may create a larger program listing, the result is usually faster than using a general purpose string library, and the cost saved by not including the general string library may pay for any duplication of effort.

CLEARING STRINGS

Clearing long strings is best done as:

     const strlen = 100;

     var i: integer;
         s: packed array [1..strlen] of char;

     ...
     for i := 1 to strlen do s[i] := ' ';

This code will usually take less space than spelling out a long string of blanks, take about the same amount of CPU time (after all, a string assign uses a copy loop as well), and most importantly, won't have to be changed if we change the string lengths in the program.

INITIALIZING STRINGS

If you need long strings and must also occasionally set them to a known constant, it helps to create a procedure to initialize them. You simply find the length of the longest constant string you are going to assign to the variable strings, then create a special string type for that. Finally, you create a custom procedure to do the copy:

     const strlen = 250; { our "big" string }
           cstlen = 12; { our constant strings }

     type string = packed array [1..strlen] of char;
          cstring = packed array [1..cstlen] of char;

     var s: string;

     procedure inistr(var s: string; c: cstring);

     var i: integer;

     begin

        for i := 1 to strlen do s[i] := ' '; { clear result }
        for i := 1 to cstlen do s[i] := c[i] { place string }

     end;

     ...

     inistr(s, 'hello, world');

"REAL" STRINGS

The fact that Pascal does not have built in strings does not prevent you from implementing them yourself:

     type string = record

                      len: integer;
                      str: packed array [1..100] of char

                   end;

Then you can go ahead and define a full set of procedures to concatenate, find substrings, etc., within a string. The principal drawback here is, again, initializing strings. But using the "initialize procedure" method, in combination with the "find end of blank string" method, it is easy to create a procedure that will do the job:

     var s: string;

     ...

     inistr(s, 'mystring    ');

Here inistr would find the exact length of the string by blank termination, then assign that to the string length, and place the string body.

FINDING AN ENUMERATED VALUE FROM AN INTEGER

One of the more irritating things in Pascal is converting an integer to an enumerated type.
Even though there is a one to one correspondence between integers and enumerated values, and you can find the integer value of any enumerated value with ord, you cannot easily go the other way. You might use a case statement or similar kludge. But there is a trick you can use that takes a bit of space, but incurs very little speed penalty and is as easy to use sourcewise as the hypothetical "unord" function would be:

     type enum = (one, two, three, four, five, six, seven, eight, nine, ten);

     var ei: enum;
         etran: array [0..9] of enum; { indexed by ord, which starts at 0 }

     begin

        { initialize translation array }
        for ei := one to ten do etran[ord(ei)] := ei;

        ...

        ei := etran[5]; { ord(six) = 5, so this gives six }

Now translating an integer back to the enumerated type is simply a fast array lookup. The price for this is the size of the translation array, and the fact that you must declare the translation array with bounds that depend on the size of enum.

CREATING CONSTANT TABLES

Standard Pascal has no capability to create tables of fixed data at compile time. But your compiler may be able to turn assigns into preinitialized data if it can determine that the assign will ABSOLUTELY happen before anything else. The best way to do this is to perform all such assigns first:

     var table: record

                   a: integer;
                   b: integer;
                   c: char

                end;

     begin

        table.a := 1;
        table.b := 45;
        table.c := 'x';

        ...

Typically this should only be done at the program block level. Doing this gives the compiler the maximum chance to perform this optimization.

CREATING YOUR OWN DYNAMIC VARIABLE RECYCLING

You may be able to do a better job of dynamic variable recycling than the standard "dispose" routine. Getting and putting a lot of variable length blocks tends to create "fragmentation" difficulties. You can illustrate this problem fairly simply.
If you are using two different data types in dynamic storage, one 100 bytes in length and the other 200 bytes in length, and these are created and disposed at random, the standard new and dispose procedures may take back the 200 byte blocks and break them down into 100 byte blocks to satisfy the calls for that size of variable. The result is that eventually, the call to new for a 200 byte block may fail, even though plenty of storage still exists, because it is all broken into 100 byte blocks.

I have found that in many cases, it is better to hold on to unused blocks, and recycle them yourself:

     program mine(input, output);

     type byte = 0..255; { predefined on many compilers }
          blkptr = ^block;
          block = record

                     next: blkptr;
                     data: array [1..100] of byte

                  end;

     var freblk: blkptr; { the free block list }

     { get a new block }

     procedure getblk(var p: blkptr); { returns the block }

     begin

        if freblk <> nil then begin { recycle existing block }

           p := freblk; { index the top block }
           freblk := freblk^.next { remove from list }

        end else new(p) { otherwise create a new one }

     end;

     { put an unused block }

     procedure putblk(p: blkptr); { block to dispose of }

     begin

        p^.next := freblk; { insert to free list }
        freblk := p

     end;

     begin

        freblk := nil; { clear free block list }

        ...

This system reduces fragmentation, because blocks are reserved for a particular use, and not broken down into smaller parts. It tends to be storage efficient, because programs typically do the same sort of work over and over again. That is, if you needed to get N blocks of X type, this means that X type blocks will be used a lot in the run of the program (although obviously there are programs that break that rule). I add "meters" to count the number of blocks going in and out of the free list to tell me how the system is performing in real life.

CREATING VARIABLE LENGTH ARRAYS

If you require variable length arrays in standard Pascal, what do you do ?
For example, suppose you are going to create a text editor, and want to store variable length lines without placing a limit on the length of a line. You need a variable length string type to contain each line.

The answer is that you can create variable length arrays yourself ! The secret is to chain dynamically allocated arrays together to make a larger array:

     type blkptr = ^block;
          block = record

                     next: blkptr;
                     data: packed array [1..10] of char

                  end;

     var p: blkptr;

     { allocate string in terms of blocks }

     procedure alcblk(var p: blkptr; { returns string allocated }
                          l: integer); { length of string }

     var t: blkptr;

     begin

        p := nil; { clear result list }
        while l > 0 do begin { allocate blocks }

           new(t); { get new block }
           t^.next := p; { link into list }
           p := t;
           l := l-10 { count that block }

        end

     end;

     { get character from string }

     function getblk(p: blkptr; { string to fetch character from }
                     i: integer) { index to get character from }
                     : char; { returns character from index }

     begin

        while i > 10 do begin p := p^.next; i := i-10 end; { index proper block }
        getblk := p^.data[i] { return resulting character }

     end;

     { place string character }

     procedure putblk(p: blkptr; { string to put character to }
                      i: integer; { index to put character to }
                      c: char); { character to place }

     begin

        while i > 10 do begin p := p^.next; i := i-10 end; { index proper block }
        p^.data[i] := c { place character }

     end;

This technique could be used on any array type. Although it looks pretty horrible, the system can keep any number of strings of any length, as an advanced editor would need to do, and the fact that all strings are broken down into fixed length "quanta" keeps storage fragmentation to a minimum, or even eliminates it entirely (an important attribute for an editor).

The processing cost of the system can be lessened by pulling the variable length data to/from a large buffer, and working on it there. But of course, this would limit the length of data you can work on.
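The buffer idea just mentioned can be sketched as a small standard Pascal program. This is only an illustration under stated assumptions: the names getbuf and putbuf, the buffer type and the constants blklen and buflen are invented for this sketch and are not part of any library. The block layout and alcblk follow the listing above.

```pascal
program bufdemo(output);

const blklen = 10; { characters per block, as in the text }
      buflen = 40; { work buffer size: an assumption for this sketch }

type blkptr = ^block;
     block  = record
                 next: blkptr;
                 data: packed array [1..blklen] of char
              end;
     buffer = packed array [1..buflen] of char;

var s: blkptr; { chained (variable length) string }
    b: buffer; { flat work buffer }
    i: integer;

{ allocate string in terms of blocks (same idea as alcblk above) }
procedure alcblk(var p: blkptr; l: integer);
var t: blkptr;
begin
   p := nil;
   while l > 0 do begin
      new(t); t^.next := p; p := t; l := l-blklen
   end
end;

{ hypothetical helper: copy the first l characters of a chained
  string into a blank padded buffer; l must not exceed buflen }
procedure getbuf(p: blkptr; l: integer; var b: buffer);
var i, j: integer;
begin
   for i := 1 to buflen do b[i] := ' '; { blank pad the buffer }
   i := 0; { characters copied so far }
   while (p <> nil) and (i < l) do begin
      j := 1;
      while (j <= blklen) and (i < l) do begin
         i := i+1; b[i] := p^.data[j]; j := j+1
      end;
      p := p^.next { next block in chain }
   end
end;

{ hypothetical helper: copy the first l characters of the buffer
  back into the chained string; l must not exceed buflen }
procedure putbuf(p: blkptr; l: integer; var b: buffer);
var i, j: integer;
begin
   i := 0; { characters copied so far }
   while (p <> nil) and (i < l) do begin
      j := 1;
      while (j <= blklen) and (i < l) do begin
         i := i+1; p^.data[j] := b[i]; j := j+1
      end;
      p := p^.next { next block in chain }
   end
end;

begin
   alcblk(s, 25); { a 25 character string takes 3 blocks }
   for i := 1 to buflen do b[i] := ' ';
   for i := 1 to 25 do
      if odd(i) then b[i] := '+' else b[i] := '-';
   putbuf(s, 25, b); { store the buffer into the chained string }
   getbuf(s, 25, b); { pull it back out, blank padded }
   writeln(b)
end.
```

Working in the flat buffer means each character access is a plain array index instead of a walk down the block chain; the price, as noted, is that nothing longer than buflen can be handled this way.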
PERFORMING BITWISE BOOLEAN FUNCTIONS

If you have to perform bitwise boolean operations on a standard compiler that does not have bitwise boolean operators on integers, you can create them. First, if you know that the same bits are not set in two words, then you can just add them to get "or" functionality:

     i := i+64; { set bit 7 }

You can also mask off bits in a non-negative integer using "div" and "mod":

     i := i mod 256; { is equivalent to i and $ff }
     i := i div 256*256; { is equivalent to i and not $ff }

For generalized boolean operations you can use:

     function bor(a, b: integer): integer;

     var i, r, p: integer;

     begin

        r := 0; { clear result }
        p := 1; { set 1st power }
        for i := 1 to maxbit do begin

           if odd(a) or odd(b) then r := r+p; { add in power }
           a := a div 2; { move bits down }
           b := b div 2;
           if i < maxbit then p := p*2 { find next power }

        end;
        bor := r { return result }

     end;

     function band(a, b: integer): integer;

     var i, r, p: integer;

     begin

        r := 0; { clear result }
        p := 1; { set 1st power }
        for i := 1 to maxbit do begin

           if odd(a) and odd(b) then r := r+p; { add in power }
           a := a div 2; { move bits down }
           b := b div 2;
           if i < maxbit then p := p*2 { find next power }

        end;
        band := r { return result }

     end;

     function bxor(a, b: integer): integer;

     var i, r, p: integer;

     begin

        r := 0; { clear result }
        p := 1; { set 1st power }
        for i := 1 to maxbit do begin

           if odd(a) <> odd(b) then r := r+p; { add in power }
           a := a div 2; { move bits down }
           b := b div 2;
           if i < maxbit then p := p*2 { find next power }

        end;
        bxor := r { return result }

     end;

These work for any integer length, given maxbit, the number of bits in an integer not counting the sign (31 for a 32 bit integer). Note that the sign bit is specifically left out of the operation, so the operands must not be negative. The power is not doubled on the last pass, which keeps it from overflowing maxint.

You can find the value of maxbit as:

     var i, x: integer;

     ...

     x := maxint;
     i := 0;
     while x <> 0 do begin x := x div 2; i := i+1 end;

This leaves the count in i. It won't count the sign bit, which is correct for the above routines. This should be done only once, when the program starts up.

READING AND WRITING INTEGERS TO A BYTE FILE

Often you must deal with files that randomly mix different types of data that are not amenable to declaration as a file of records.
If you read and write large integers using only standard constructs (and not "type changing" using variant records), you may find it easier to use a format known as "signed magnitude" than to try to write the number using the 2's complement format used by the CPU. In signed magnitude, the sign bit is determined and written out separately from the value of the integer: { write integer to byte file } procedure wrtint(var f: bytfil; { file to write to } i: integer); { integer to write } var s: byte; { sign holder } begin { remove sign and save } if i < 0 then s := 128 else s := 0; i := abs(i); { remove sign } write(f, i div 16777216+s); { output high byte with sign } write(f, i div 65536 mod 256); { output high middle } write(f, i div 256 mod 256); { output low middle } write(f, i mod 256) { output low byte } end; { read integer from byte file } procedure rdint(var f: bytfil; { file to read from } var i: integer); { integer to read } var s: boolean; { sign holder } b: byte; begin s := false; { set no sign } read(f, b); { get high byte } if b >= 128 then begin s := true; b := b-128 end; i := b*16777216; read(f, b); { get high middle } i := i+b*65536; read(f, b); { get low middle } i := i+b*256; read(f, b); { get low byte } i := i+b; if s then i := -i { add back sign } end; For 32 bit integers written in "big endian" format (high order bytes first). Note that the sign is written and read as bit 32, the same place as the normal sign. Note that the only value you lose this way is $80000000, which is an invalid value under standard Pascal anyways. ***************************************************************************** Q. WHAT ARE SOME GOOD CHARACTERISTICS OF A STANDARD PASCAL COMPILER ? A. Just following the standard is not enough to create a useable compiler. Obviously freedom from limits (except the limit of available memory) is a good attribute of a compiler, as well as producing the best code possible. Other points to look for: 1. 
Should be able to represent "set of char" without problem. This is really a must. For a while, it was under consideration to add this requirement to the standard. It wasn't, but most character handling programs I have seen rely on being able to use character sets, so you might as well consider it part of the standard.

2. Represents 8 bit characters. I have found it best if the compiler leaves it entirely up to you what is done with the 8th bit (of ASCII or ISO character sets). This allows you to either deal with parity, or manipulate 256 valued extended character sets (like IBM ASCII).

3. "file of char" is not the same as "text". This is a somewhat obscure point. Normal "text" files are "filtered" by Pascal I/O. Line endings are made regular, and eolns are converted to spaces. But there may be times that you want to talk in terms of the exact characters themselves, read and write carriage returns and line feeds directly, etc. For example, when reading direct from the computer console, you may want to see directly if the user hits the "return key". There is no requirement in the standard that "file of char" do the same filtering that "text" does, and better compilers consider "file of char" a clue to completely get out of the way and pass raw characters to and from the file. Also, it is common for Pascal systems to buffer up whole lines when reading from the console (see below). File of char will allow this buffering to be bypassed.

4. Input from the computer console is buffered. If you have a program like:

     program readit(input, output);

     var i: integer;

     begin

        writeln('Input the number: ');
        readln(i);
        writeln('The number was: ', i:1)

     end.

If the console is read directly, and the user makes a mistake on entry, it will not be possible to back up and correct typing. If the line is buffered, the user can back up, correct the error, and continue without the simple program above needing to do anything about it.
Without this capability, you would have to write such "line editing" features in yourself. A REALLY GOOD Pascal system would implement a complete set of line editing features, such as back up, insert characters, etc.

5. Ability to represent a file of bytes. One thing that really amazes me about some compilers I have seen is an inability to read or write files of byte values:

     type byte = 0..255; { a predefined type on many compilers }
          bytfil = packed file of byte;

If your compiler decides that bytes are integers, at whatever size integers are on your machine (16, 32 or 64 bits), it may just read or write integer size values to the file ! This forces you to break apart the bytes from an integer yourself, which is not only tedious, but creates a nonportable program.

6. Does something with the program parameters. Somewhere, at some time, it became standard to just ignore the program parameters. This is a real shame, because when implemented, they are really useful for creating quick, short programs:

     program copy(source, destination);

     var source, destination: text;
         c: char;

     begin

        reset(source);
        rewrite(destination);
        while not eof(source) do begin

           if eoln(source) then begin

              readln(source);
              writeln(destination)

           end else begin

              read(source, c);
              write(destination, c)

           end

        end

     end.

Now if the compiler ignores the program header parameters, this program does nothing, and will probably terminate with an error. But the compiler can also automatically attach program parameters other than the standard "input" or "output" to command line parameters, like:

     > copy myfile.txt thatfile.txt

So the compiler opens source to "myfile.txt", and destination to "thatfile.txt". The result is a program that is completely standard, and yet performs named file manipulation. Further, this method is much simpler than using the "open by name" extensions present in most Pascals, and allows you to create simple example programs faster.

7. Ability to tolerate control characters in source.
If the compiler ignores control characters in the source, such as tabs and form feeds (treats them as spaces), you will have the freedom to format your source so that it can be directly printed. 8. Don't have "extended" modes that break the standard rules. The Pascal standards basically require that the compiler have a "switch" (command line option) that causes the compiler to accept only standard Pascal, and rejects any nonstandard constructs. Because some compilers were originally nonstandard, and were brought into compliance with the standard after the fact, it is quite possible that they may not accept standard programs with this switch off ! This is truly unfortunate, since it forces you to write nonstandard Pascal just to take advantage of Extensions to the language provided by your compiler. 9. Accept Pascal strings for extended file naming procedures. A common extension to Pascal is a Basic string construct (as opposed to a pascal string, which is just an array of characters). Unfortunately, it has become common to REQUIRE that such strings be used with extended file "open by name" procedures. A good compiler should accept standard pascal strings as arguments to such procedures, which allows you to use the string types you choose. These procedures should also ignore trailing spaces in such strings, which allows use of the typical Pascal "space padded string". 10. Implements "lazy I/O". Pascal as originally specified assumes that all files are batch files (ie., not "interactive", or connected to a console). This creates problems, say, reading from the console keyboard, because Pascal assumes that the first character in a text file is always available. Lazy I/O will only request the first character from a file when it is actually used, and is completely compatible with the standard. 11. Can provide "strict packing". 
Strict packing means that if a record is marked as "packed": var r: packed record a: boolean; b: integer; c: char end; Then, for example, "a" will only occupy a single bit, and all fields will be packed into as few bits as possible. This has implications beyond simple storage savings, because you can use strict packing to emulate ANY data structure from any language. for example, if you receive the date as packed into a single word: ------------------------------------------- | year (0-99) | month (1-12) | day (1-31) | ------------------------------------------- 7 bits 4 bits 5 bits This exactly fits in a single 16 bit word ! Then, this can be declared as a packed record: var date: packed record year: 0..99; month: 1..12; day: 1..31 end; And no "format conversion" is required. Note that if the alignment of packed records to the data structure desired is not perfect, you can insert "shims" in the form of single bits (boolean values). ***************************************************************************** Q. WHAT ARE COMMON EXTENSIONS AND METHODS TO PASCAL ? A. Most compilers include extensions to the basic language, the most popular (and necessary) extensions being manipulation of external files by name, and ability to separate program parts into modular form. In fact, a standard of sorts was created by the popular UCSD Pascal system for these two items. This is a list of popular extensions, and the form they usually take. Note that extensions that are part of the extended Pascal standard are discussed in the extended standard section. 1. Specification of hex constants. Allows you to directly specify hex numbers in the source: var a: integer; ... a := $56; "$" as the leading character seems to be the most common. Some compilers also allow binary radix (base 2) and octal radix (base 8). 2. Integer bit boolean operations. Many or even most compilers allow you to use the "and" and "or" operators on integers: var a, b: integer; ... 
     a := a and b;

The result is a bitwise "and" of the two integers. Note that the results when one or more of the operands is negative vary from compiler to compiler, and use of these operators with negative integers should be avoided. Many compilers also include an "xor" operator.

3. File open and close by name. This allows you to open an external file by name. Many compilers use the UCSD plan:

     var f: text;

     ...

     assign(f, 'myfile.txt');
     reset(f);

     ...

In this method, an ASCII name can be "bonded" to the file by the procedure "assign". If no assign is performed on a file, it is opened as a "temporary" file (i.e., the system just coins a name for the file, and deletes it when complete) upon "reset" or "rewrite". This allows the system to be completely backward compatible with standard Pascal. If your system also has a "close" procedure:

     close(f)

You should use it. Closing open files releases system space (used to keep track of files), and allows a series of files to be opened under one file variable.

4. Modular compilation. Again the most common plan is UCSD's, which adds an extra declaration section before "label":

     program junk(input, output);

     uses trash, mylib;

     label 1, 2;

     ...

This tells the compiler to include all the (public) constants, types and blocks in the files specified within the uses list in the present program. The exact details of the format needed to compile the module (called a "unit" in UCSD) vary quite a bit from compiler to compiler.

5. Include statements. Most compilers allow the appearance of an "include" statement or marker in the source, that specifies that another file is to be included in line. Most common is UCSD's "control comment":

     {$I file.pas}

Which would include the file "file.pas" inline, replacing the entire comment.

6. Flexible declaration order. Many or most compilers allow the declarations to occur in any order, and for declaration sections to occur multiple times:

     program test;

     var this, that: integer;

     const x = 10;

     var mystring: packed array [1..x] of char;

     ...

The reason for this is that when using modular compilation or include statements, each program section must have its own set of declarations. Note that the rule that objects must be declared before use is still in effect even though the declarations can appear in any order.

7. Strings. A full Basic style string type is added to Pascal, which allows constructs such as:

     var s: string;

     s := 'hi there'; { any length can be assigned }
     s := s+' george'; { concatenation }
     writeln('Length is: ', length(s)); { find string length }

     ...

This also was in UCSD Pascal.

8. "_" allowed in identifiers. This convention comes (mainly) from C, where identifiers like:

     my_label

Are common. Since Pascal must live in systems where these names are common, this is allowed in most implementations.

9. "goto" labels as normal identifiers. The appearance of goto labels as integers might be considered a sort of arcane plot to punish people for using "goto"s. Many or most implementations allow goto labels to appear in the same form as identifiers:

     label destination;

     ...

     destination:

     ...

     goto destination;

*****************************************************************************

Q. WHAT COMPILERS EXIST FOR STANDARD PASCAL ?

A. This is a VERY incomplete list of known standard Pascals, with my notes. There are no machine or operating system limits on which compilers I intend to list.

FREEWARE

The following compilers are free, and available over the net.

1. GNU Pascal. GPC is a front end to the very extensive GCC system, which is a modular compiler that generates code for a truly amazing number of different operating systems and machines. Because GPC fits into that system, it should be able to go anywhere that GCC goes.
However, as of this writing, only a Linux compiler has officially been released. Compilers for other operating systems and machines are under way. GPC is stated to be compatible with the unextended standard, and work is under way to add the extended standard to it. However, exceptions to the standard reportedly exist in the language. The exact details are unknown. I will hopefully be checking this out personally when I get a dedicated Linux box going. Location: ftp://kampi.hut.fi/jtv/gnu-pascal Start by reading the GPC.GUIDE file, it will tell you everything. GPC states that it should work on any computer or operating system where GCC is implemented (which would make it the most widely available Pascal version anywhere). However, I have had people attempting the port tell me that this involves a lot of work, and as a result I have yet to see a port outside of Linux. As more ports appear, I will add them to this list. 2. Willhelm J. Withagen's project compiler for OS/2. I have tested this compiler, and noted one deviance from the standard, and the author informs me that intraprocedural gotos are not completely implemented. Location: ftp://ftp.cdrom.com/pub/os2/dev32/pasos2b.zip PAIDWARE These compilers are available for cost. Unfortunately, I will only be listing compilers that are currently shipping, which leaves several compilers out whose makers have dropped the product or had a business failure. 1. Prospero Pro Pascal. In existence for quite some time, the DOS compiler supports small, large and huge memory models. Prospero compilers show an extra effort to conform to the existing standards, and have excellent documentation. There is a windows version of this compiler, as well as one for the old (1.x) OS/2. The only drawback to the compiler is the less than stellar code speed. Price: ??? Email: prospero@prospero.demon.co.uk 2. Prospero ep32 Pascal. This is prospero's new 32 bit compiler for OS/2. 
From what I saw, Prospero did an excellent job of integrating the compiler to OS/2, including providing routine headers for the OS/2 system calls, a translator from C to Pascal for any other functions, and a full IDE. This compiler supports the full original and extended Pascal standards. The only drawbacks I found were the (again) slow code speed, and the fact that intraprocedural gotos don't work across module lines (which makes error recovery very difficult).

Price: about $250 (US).
Email: prospero@prospero.demon.co.uk

*****************************************************************************

======== EXTENDED PASCAL ==========

Now we concentrate on the official "extended" version of the Pascal standard. This information was provided by John Reagan of DEC, who serves on the Pascal standards committees, and also programs Pascal compilers for DEC.

*****************************************************************************

Q. WHAT IS EXTENDED STANDARD PASCAL ?

A. The Extended Pascal standard was completed in 1989 and is a superset of ISO 7185. The Extended Pascal standard is both an ANSI/IEEE and ISO/IEC standard. Both standards are identical in content, with minor editorial differences between the IEEE and ISO/IEC style guides. The ANSI/IEEE number is ANSI/IEEE 770X3.160-1989. The ISO/IEC number is ISO/IEC 10206 : 1991.

Here is part of the foreword from the Extended Pascal standard to provide a short summary of the new features in Extended Pascal.

- Modularity and Separate Compilation. Modularity provides for separately-compilable program components, while maintaining type security.

  o Each module exports one or more interfaces containing entities (values, types, schemata, variables, procedures, and functions) from that module, thereby controlling visibility into the module.

  o A variable may be protected on export, so that an importer may use it but not alter its value. A type may be restricted, so that its structure is not visible.
o The form of a module clearly separates its interfaces from its internal details. o Any block may import one or more interfaces. Each interface may be used in whole or in part. o Entities may be accessed with or without interface-name qualification. o Entities may be renamed on export or import. o Initialization and finalization actions may be specified for each module. o Modules provide a framework for implementation of libraries and non-Pascal program components. Example: module employee_sort interface; export employee_sort = (sort_by_name,sort_by_clock_number, employee_list); import generic_sort; type employee = record last_name,first_name : string(30); clock_number : 1..maxint; end; employee_list(num_employees : max_sort_index) = array [1..num_employees] of employee; procedure sort_by_name(employees : employee_list; var something_done : Boolean); procedure sort_by_clock_number(employees : employee_list; var something_done : Boolean); end. - Schemata. A schema determines a collection of similar types. Types may be selected statically or dynamically from schemata. o Statically selected types are used as any other types are used. o Dynamically selected types subsume all the functionality of, and provide functional capability beyond, conformant arrays. o The allocation procedure NEW may dynamically select the type (and thus the size) of the allocated variable. o A schematic formal-parameter adjusts to the bounds of its actual-parameters. o The declaration of a local variable may dynamically select the type (and thus the size) of the variable. o The with-statement is extended to work with schemata. o Formal schema discriminants can be used as variant selectors. 
  Example:

    type
      SWidth = 0..1023;
      SHeight = 0..2047;
      Screen(width: SWidth; height: SHeight) =
        array [0..height, 0..width] of boolean;
      Matrix(M,N: integer) = array [1..M,1..N] of real;
      Vector(M: integer) = array [1..M] of real;
      Color = (red,yellow);
      Color_Map(formal_discriminant: color) =
        record
          case formal_discriminant of
            red: (red_field : integer);
            yellow : (yellow_field : integer);
        end;

    function bound : integer;
    var s : integer;
    begin
      write('How big?');
      readln(s);
      bound := s;
    end;

    var
      My_Matrix : Matrix(10,10);
      My_Vector : Vector(bound); { Notice the run-time expression! }
      Matrix_Ptr : ^Matrix;
      X,Y : integer;

    begin
      readln(x,y);
      new(Matrix_Ptr,X,Y);
    end

- String Capabilities. The comprehensive string facilities unify
  fixed-length strings and character values with variable-length strings.

    o All string and character values are compatible.
    o The concatenation operator (+) combines all string and character
      values.
    o Strings may be compared using blank padding via the relational
      operators, or using no padding via the functions EQ, LT, GT, NE, LE,
      and GE.
    o The functions LENGTH, INDEX, SUBSTR, and TRIM provide information
      about, or manipulate, strings.
    o The substring-variable notation makes accessible, as a variable, a
      fixed-length portion of a string variable.
    o The transfer procedures READSTR and WRITESTR process strings in the
      same manner that READ and WRITE process textfiles.
    o The procedure READ has been extended to read strings from textfiles.

- Binding of Variables.

    o A variable may optionally be declared to be bindable. Bindable
      variables may be bound to external entities (file storage, real-time
      clock, command lines, etc.). Only bindable variables may be so bound.
    o The procedures BIND and UNBIND, together with the related type
      BINDINGTYPE, provide capabilities for connection and disconnection of
      bindable internal (file and non-file) variables to external entities.
    o The function BINDING returns current or default binding information.

- Direct Access File Handling.
    o The declaration of a direct-access file indicates an index by which
      individual file elements may be accessed.
    o The procedures SEEKREAD, SEEKWRITE, and SEEKUPDATE position the file.
    o The functions POSITION, LASTPOSITION, and EMPTY report the current
      position and size of the file.
    o The update file mode and its associated procedure UPDATE provide
      in-place modification.

- File Extend Procedure. The procedure EXTEND prepares an existing file for
  writing at its end.

- Constant Expressions. A constant expression may occur in any context
  needing a constant value.

- Structured Value Constructors. An expression may represent the value of
  an array, record, or set in terms of its components. This is particularly
  valuable for defining structured constants.

- Generalized Function Results. The result of a function may have any
  assignable type. A function result variable may be specified, which is
  especially useful for functions returning structures. [A function call may
  be directly array-indexed, field-selected, or pointer-dereferenced without
  having to use an intermediate variable.]

- Initial Variable State. The initial state specifier of a type [or record
  field] can specify the value that variables [, or fields, or variant
  selectors] are to be created with.

- Relaxation of Ordering of Declarations. There may be any number of
  declaration parts (labels, constants, types, variables, procedures and
  functions) and in any order. The prohibition of forward references in
  declarations is retained.

- Type Inquiry. A variable or parameter may be declared to have the type of
  another parameter or another variable.

- Implementation Characteristics. The constant MAXCHAR is the largest value
  of type CHAR. The constants MINREAL, MAXREAL, and EPSREAL describe the
  range of magnitude and the precision of real arithmetic.

- Case-Statement and Variant Record Enhancements.
  Each case-constant-list may contain ranges of values.
  An OTHERWISE clause represents all values not listed in the
  case-constant-lists.

- Set Extensions.

    o An operator (><) computes the set symmetric difference.
    o The function CARD yields the number of members in a set.
    o A form of the for-statement iterates through the members of a set.

- Date and Time. The procedure GETTIMESTAMP and the functions DATE and
  TIME, together with the related type TIMESTAMP, provide numeric
  representations of the current date and time and convert numeric
  representations to strings.

- Inverse Ord. A generalization of SUCC and PRED provides an inverse ORD
  capability.

- Standard Numeric Input. The definition of acceptable character sequences
  read from a textfile includes all standard numeric representations defined
  by ANSI X3.42-1975.

- Non-Decimal Representation of Numbers. Integer numeric constants may be
  expressed using bases two through thirty-six.

- Underscores in Identifiers. The underscore character (_) may occur within
  identifiers and is significant to their spelling.

- Zero Field Widths. The total field width and fraction digits expressions
  in write parameters may be zero.

- Halt. The procedure HALT causes termination of the program.

- Complex Numbers.

    o The simple-type COMPLEX allows complex numbers to be expressed in
      either Cartesian or polar notation.
    o The monadic operators + and - and dyadic operators +, -, *, /, =,
      [and] <> operate on complex values.
    o The functions CMPLX, POLAR, RE, IM, and ARG construct or provide
      information about complex values.
    o The functions ABS, SQR, SQRT, EXP, LN, SIN, COS, [and] ARCTAN operate
      on complex values.

- Short Circuit Boolean Evaluation. The operators AND_THEN and OR_ELSE are
  logically equivalent to AND and OR, except that evaluation order is
  defined as left-to-right, and the right operand is not evaluated if the
  value of the expression can be determined solely from the value of the
  left operand.

- Protected Parameters.
  A parameter of a procedure or a function can be protected from
  modification within the procedure or function.

- Exponentiation. The operators ** and POW provide exponentiation of
  integer, real, and complex numbers to real and integer powers.

- Subrange Bounds. A general expression can be used to specify the value of
  either bound in a subrange.

- Tag Fields of Dynamic Variables. Any tag field specified by a parameter
  to the procedure NEW is given the specified value.

- Conformant Arrays. Conformant arrays provide upward compatibility with
  level 1 of ISO 7185, Programming languages - PASCAL.

*****************************************************************************

Q. HOW EASY IS IT TO CONVERT BORLAND PROGRAMS TO THE EXTENDED PASCAL
STANDARD ?

A. As mentioned earlier, Turbo Pascal does not conform to any of the Pascal
standards. If you carefully choose a subset of unextended Pascal, you may be
able to port code if you're lucky/careful.

To be fair, Turbo Pascal has some wonderful features that make it very
powerful in the environments in which it runs. However, those same features
are of little use on non-Windows/DOS platforms and probably are not good
candidates for standardization.

There are several Turbo Pascal features which are semantically similar to
features in unextended Pascal or Extended Pascal. Here is a list of mappings
between Turbo Pascal features and Extended Pascal features:

- Case constructs

  a. Extended Pascal uses otherwise instead of else.

     Borland Pascal:

       case c of
         'A' : ;
         'B' : ;
         else ...;
       end;

     Extended Pascal:

       case c of
         'A' : ;
         'B' : ;
         otherwise ...;
       end;

  b. A missing case is an error in Extended Pascal. In the case statement
     above, if there were no otherwise clause and c had the value 'C', you
     would get a run-time error (note, this would go unnoticed in Borland
     Pascal).

- Procedure and function types and variables

  Here is an area of subtle differences.
  Turbo Pascal has true procedure/function types but doesn't have standard
  Pascal's procedural/functional parameters.

  Borland Pascal:

    type
      CompareFunction = function(Key1, Key2 : string) : integer;

    procedure Sort(Compare : CompareFunction);
    begin
      ...
    end;

  Extended Pascal:

    procedure Sort(Compare : function(Key1, Key2 : string) : integer);
    begin
      ...
    end;

  Moving from Turbo Pascal to Extended Pascal might be difficult if the
  Turbo Pascal program saves, compares, trades, etc. procedure values. For
  example, an array of procedure values isn't possible in Extended Pascal.
  Moving the other way is a little easier, as shown by the above examples.

- Strings

  a. Borland Pascal's string type has a special case, namely string without
     a length meaning the same as string[255]. There is no default in
     Extended Pascal, so you have to change all string types to string(255).
     Example:

       var
         s : string;

     becomes:

       var
         s : string(255);

     Note also that you have to use parentheses instead of brackets.

  b. A nice pitfall is the pointer to string, as in:

       type
         PString = ^String;

     In Extended Pascal this is a pointer to a schema type!! Don't forget to
     translate this to:

       type
         string255 = string(255);
         PString = ^string255;

     If you indeed want to use String as a schema pointer you can define
     things like:

       var
         MyStr : ^String;
       begin
         New(MyStr, 1024);
       end;

     to allocate 1024 bytes of string space.

  c. As you could see above, Extended Pascal has no 255 byte limit for
     strings. It is however safe to assume a limit of about 32000 bytes. At
     least Prospero's Extended Pascal limits strings to 32760 bytes. GNU
     Pascal seems to allow larger strings. DEC Pascal limits strings to
     65535 bytes.

- Constant variables

  a. Extended Pascal translates Borland's gruesome:

       const
         i : integer = 0;

     to:

       var
         i : integer value 0;

     Much nicer, ain't it?

  b. Even nicer is that you can assign initialization values to types.
     Like:

       type
         MyInteger = integer value 0;

       var
         i : MyInteger;

     All variables of type MyInteger are automatically initialized to 0
     when created.

  c. Constant arrays of type string are translated from:

       const
         MyStringsCount = 5;
       type
         Ident = string[20];
       const
         MyStrings : array [1..MyStringsCount] of Ident = (
           'EXPORT', 'IMPLEMENTATION', 'IMPORT', 'INTERFACE', 'MODULE');

     to:

       const
         MyStringsCount = 5;
       type
         Ident = string(20);
       var
         MyStrings : array [1..MyStringsCount] of Ident value [
           1:'EXPORT'; 2:'IMPLEMENTATION'; 3:'IMPORT';
           4:'INTERFACE'; 5:'MODULE'];

     There seem to be pros and cons to each style. Some folks don't like
     having to specify an index, since it requires renumbering if you want
     to add a new item to the middle. However, if you index by an enumerated
     type, you might be able to avoid major renumbering by hand.

- Variant records

  The following construction is not allowed in Extended Pascal:

    type
      PersonRec = record
        Age : integer;
        case EyeColor : (Red, Green, Blue, Brown) of
          Red, Green : (Wears_Glasses : Boolean);
          Blue, Brown : (Length_of_lashes : integer);
      end;

  The variant field needs an explicit type. Code this as:

    type
      EyeColorType = (Red, Green, Blue, Brown);
      PersonRec = record
        Age : integer;
        case EyeColor : EyeColorType of
          Red, Green : (Wears_Glasses : Boolean);
          Blue, Brown : (Length_of_lashes : integer);
      end;

- Units

  a. You can translate units almost automatically to Extended Pascal
     modules, taking into account some differences of course. Extended
     Pascal does not automatically export everything named in a module;
     instead you have to create separate export clauses. For example,
     translate the following unit:

       unit A;

       interface

       uses
         B, C;

       procedure D;

       implementation

       procedure D;
       begin
       end;

       end.

     to this module:

       module A interface;

       export
         A = (D);

       import
         B;
         C;

       procedure D;

       end.

       module A implementation;

       procedure D;
       begin
       end;

       end.

     You can have one or more export clauses, and the name of an export
     clause doesn't have to be equal to the name of the module.
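
To illustrate the point that a module may carry more than one export clause,
with names that differ from the module's own, here is a minimal sketch. The
module Counter, its interfaces CounterUser and CounterAdmin, and the program
Client are hypothetical names invented for this illustration. The procedure
headings in the implementation part are abbreviated in the same way as in
the module A translation.

```pascal
module Counter interface;

{ Two interfaces onto the same module: only CounterAdmin exports Reset. }
export
  CounterUser  = (Increment, Current);
  CounterAdmin = (Increment, Current, Reset);

procedure Increment;
function Current : integer;
procedure Reset;

end.

module Counter implementation;

var
  count : integer value 0;   { initial state specifier }

procedure Increment;
begin
  count := count + 1
end;

function Current;
begin
  Current := count
end;

procedure Reset;
begin
  count := 0
end;

end.

program Client(output);

import
  CounterUser;   { Reset is not visible through this interface }

begin
  Increment;
  Increment;
  writeln(Current)
end.
```

A program that needs Reset would import CounterAdmin instead. Splitting the
exports this way lets one module offer a narrow interface to ordinary
clients and a wider one to privileged code, something a Borland unit with
its single interface section cannot express directly.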
     In the unit A example above you also see how to translate the Borland
     Pascal "uses" clause to the Extended Pascal "import" clause.

  b. Borland Pascal allows you to have code in a unit that is executed
     once, at startup, to initialize things. You can translate this to
     Extended Pascal's "to begin do ... end" structure.

     Borland Pascal:

       unit A;

       interface

       implementation

       begin
         { do something }
       end.

     Extended Pascal:

       module A interface;
       end.

       module A implementation;

       to begin do begin
         { do something }
       end;

       end.

     Extended Pascal also has a "to end do ... end" structure, so you can
     translate exit handlers as well.

- Files

  Extended Pascal treats files quite differently than Borland Pascal. I'm
  not going to treat file pointers, Get and Put here, but instead I focus on
  special Extended Pascal features.

  In Borland Pascal you can read any text file as follows:

    var
      t : text;
      Line : string;
    begin
      Assign(t, 'MYTEXT.TXT');
      Reset(t);
      while not eof(t) do begin
        readln(t, Line);
        writeln(Line);
      end;
    end;

  The Assign procedure associates the text file t with the file MYTEXT.TXT.

  In Extended Pascal, files are considered entities external to your
  program. External entities, which don't need to be files, need to be bound
  to a variable in your program. Any variable to which external entities can
  be bound needs to be declared bindable. So the variable declaration of t
  becomes:

    var
      t : bindable text;

  Extended Pascal has the bind procedure, which binds a variable to an
  external entity. Here is an Extended Pascal procedure that emulates the
  Assign procedure in Turbo Pascal.

    procedure Assign(var t : text; protected Name : string);
    var
      b : BindingType;
    begin
      unbind (t);
      b := binding (t);
      b.Name := Name;
      bind (t, b);
      b := binding (t);
    end;

  Comments: the unbind procedure unbinds a bindable variable from its
  external entity. If it is not bound, nothing happens. The binding function
  initializes b. We call binding to set some fields of the BindingType
  record. Next we set the name field to the name of the file.
  Calling bind will bind t to the external entity. If we now call binding
  again, we get the current state of t's binding type. We can now check, for
  example, whether the bind has succeeded by:

    if not b.bound then
      { do error processing }

  Note that Prospero's Pascal defaults to creating the file if it does not
  exist! You need to use Prospero's local addition of setting b.existing to
  true to work around this.

  I've not worked with binary files enough, so no advice yet on how to
  access them, but you access them much the same.

  Finally, an example of getting the size of a file.

    function FileSize(filename : String) : integer;
    var
      f : bindable file [0..MaxInt] of char;
      b : BindingType;
    begin
      unbind(f);
      b := binding (f);
      b.Name := filename;
      bind(f, b);
      b := binding(f);
      SeekRead(f, 0);
      if empty(f) then
        filesize := 0
      else
        filesize := LastPosition(f) + 1;
      unbind(f);
    end(*file_size*);

*****************************************************************************

Q. WHAT IS THE OBJECT ORIENTED PASCAL STANDARD ?

A. After the Extended Pascal standard was completed, the committee took up
the task of adding object-oriented features to Extended Pascal. Actually,
the work applies to both Extended Pascal and unextended Pascal, but to get
the full benefit from the object-oriented features, certain features from
Extended Pascal must also be utilized.

This work was done as an ANSI Technical Report. Unlike the 2 previous
standards, the technical report is not full of "standardese", but rather is
more informal and readable than a full-bodied standard. This report was
completed in 1993.

Features from the technical report include:

- A CLASS definition with support for ABSTRACT and PROPERTY classes.
  Abstract classes are place holders in the class hierarchy. An object
  cannot be created for an abstract class. Property classes provide
  characteristics, attributes, or properties of another class. An object can
  be tested to see whether or not it has a property.
- Multiple inheritance from at most one concrete or abstract class and zero
  or more property classes.

- Method definitions can be marked with ABSTRACT or OVERRIDE.

- Constructors and destructors with zero or more parameters.

- A predefined ROOT class containing a predefined constructor (CREATE), a
  predefined destructor (DESTROY), and two predefined methods (CLONE and
  EQUAL).

- Predefined property class TEXTWRITABLE with methods READOBJ and WRITEOBJ.

- CLASS VIEWs provide a new class type that is a partially opaque view of
  an existing class type. Views are used to provide visibility, security,
  and protocol between public and private uses of the class.

- The object model is a reference model. This means that you create and
  destroy objects explicitly. Objects can be thought of as if they were
  accessed indirectly through references.

- A membership operator, IS.

*****************************************************************************

Q. WHO CREATES THE PASCAL STANDARDS ?

A. Would you believe magic elves? No, I guess not...

The Pascal standards are developed and maintained by the American
committee, X3J9, and the International working group, WG2. Most of the
standardizers actually belong to both committees.

During its peak, the committee met at least 4 times a year for a week at a
time. The location of the meetings floated around from member to member,
trying to alternate east-coast vs west-coast. The WG2 meetings tried to
alternate between North America and UK/Europe. Recently, the committee's
work has slowed down and the meetings average about 1 or 2 per year, with
very rare meetings in the UK.

The committee is open to the public. To become a voting member on X3J9 you
just have to satisfy some attendance/participation requirements (very easy)
and be a member of X3 by paying dues (X3 dues can be waived for those unable
to pay).
Basically, anybody with a brain may attend (and if you've been to some
previous meetings, you'll note that we sometimes waive that requirement
too :-) )

If you are interested in attending a Pascal meeting, drop me a note at
reagan@hiyall.enet.dec.com and I can give you information on the next
meeting's location.

Over the past years, many different vendors, user-groups, academics, etc.
have participated in X3J9. Here is a quick list (not intended to be a total
list) of participants:

Digital Equipment Corporation, Apple Computer, Microsoft, Tandem Computers,
Sun Microsystems, Intel, Siemens Nixdorf, IBM, Hewlett Packard, Edinburgh
Portable Compilers, Prospero Software Ltd., University of Bochum, Borland
International, US Air Force, University of Teesside, Visible Software, US
Census Bureau, Symantec, Unisys, GTE, Control Data Corporation, Cray
Research Inc., E-Systems, ACE - Associated Computer Experts bv, Stanford
Linear Accelerator Center, Central Michigan University, Pace University, St.
Peter's College, Prime Computer, Queens University, Research Libraries
Group, Florida International University, Apollo Computer, NCR Corporation,
Data General, and various individuals representing themselves.

(Well, technically almost all members of the committee represent themselves
and not their employers, but I thought people would recognize the names of
the companies and not the names of the individuals.)

*****************************************************************************

Q. ARE THERE ANY MORE STANDARDS IN PROGRESS ?

A. Recently, the committee has been working on an exception handling model
that is similar to what you might see in C++ or Modula-3. We hope to produce
another ANSI Technical Report on this in the future. However, we're in need
of more people to participate in our work.
In the past, most of the work was physically done at the meetings, but I
think that in the future, we'll have to do work by e-mail or newsgroups in a
more informal fashion and only have physical meetings to consolidate the
final work.

*****************************************************************************

Q. HOW DO I GET COPIES OF THE STANDARDS ?

A. The unextended Pascal and Extended Pascal standards are both copyrighted
by the IEEE in the US and ISO/IEC in other countries. You can obtain copies
from the IEEE, your country's standards body, university libraries,
corporate libraries, or the ISO/IEC in Geneva, Switzerland.

Also, there is a Standards FAQ posted monthly in News.Announce which helps
in finding who to contact. It can also be found at
ftp.uni-erlangen.de:/pub/doc/ISO/. You can also Telnet to info.itu.ch with
name "gopher".

Here is some contact information:

  Philips Business Information    1-800-777-5006
  Document Center                 1-415-591-7600

  ANSI                            1-212-642-4900
  Attention: Customer Service
  11 West 42nd Street
  New York, NY 10036

  ISO Sales
  Case Postale 56
  CH-1211 Geneve 20
  Switzerland
  email: sales@isocs.iso.ch

  http://www.iso.ch/welcome.html    ! in English
  http://www.iso.ch/welcomef.html   ! en Français

I do not think that the IEEE has republished the unextended Pascal standard
since it was essentially replaced with a pointer to the ISO 7185 standard.
The IEEE may still have old copies lying around. The IEEE order number is
SH08912. The ISBN number is 0-471-88944-X. I know that the revised ISO 7185
is available (I have a copy from BSI in the UK).

For Extended Pascal, the IEEE order number is SH13243 and the ISBN number is
1-55937-031-9. Last time I checked, the IEEE's price was around US$55. I'm
not sure what the ISO/IEC charges.

In addition, I've been told that the GNU Pascal kit at
kampi.hut.fi:/jtv/gnu-pascal contains a LeX script for Extended Pascal. I
have no idea about its accuracy.
The Object-Oriented Extensions to Pascal technical report is ANSI technical
report number ANSI/X3-TR-13-1994 (is 13 an omen?). I'm not sure of its
copyright status, but it also isn't online in its final form (the editor was
using some variant of Word Perfect and said he was unable to provide a
readable text form). You can get the technical report from CBEMA in
Washington DC. I have no idea at present how to get it outside the United
States. I personally have a few hardcopies lying around my office, and if
you drop me a line at reagan@hiyall.enet.dec.com, I'll see what I can do.

I should point out that the unextended Pascal and Extended Pascal standards
are written in a very legalistic form and are not light reading. They really
aren't suitable as an implementation guide or for learning how to use
features from unextended or Extended Pascal. On the other hand, the
Object-Oriented Extensions to Pascal technical report is written in a less
formal style. The committee is planning on producing "standardese" for the
technical report to include in a future revision of the unextended and
Extended Pascal standards.

*****************************************************************************

Q. WHAT EXTENDED COMPILERS EXIST ?

A. For Extended Pascal, only Prospero Pascal claims complete acceptance of
Extended Pascal source code. Other vendors' compilers, like Digital's,
accept portions of the Extended Pascal standard. You'll have to ask your
favorite vendor about support.

The publicly available GPC (GNU Pascal Compiler) project includes many of
the features of the extended standard, with the goal of implementing the
entire standard.

At present, there is no official Extended Pascal Validation Suite. Now that
Prospero has obtained the rights to the Pascal Validation Suite, there is a
chance for a future EPVS. You'll have to ask Prospero about their plans.

For the Object-Oriented Extensions to Pascal, I know of no compiler that yet
claims support of the document.
Again, ask your vendor to make sure.

*****************************************************************************

ADDITIONS TO THE FAQ

Submissions to this FAQ are encouraged. Just send me a Q & A formatted
letter with a problem that is of common interest to Pascal programmers, and
I will include it.

sam@value.net

*****************************************************************************