1% BEGIN LICENSE BLOCK 2% Version: CMPL 1.1 3% 4% The contents of this file are subject to the Cisco-style Mozilla Public 5% License Version 1.1 (the "License"); you may not use this file except 6% in compliance with the License. You may obtain a copy of the License 7% at www.eclipse-clp.org/license. 8% 9% Software distributed under the License is distributed on an "AS IS" 10% basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See 11% the License for the specific language governing rights and limitations 12% under the License. 13% 14% The Original Code is The ECLiPSe Constraint Logic Programming System. 15% The Initial Developer of the Original Code is Cisco Systems, Inc. 16% Portions created by the Initial Developer are 17% Copyright (C) 2006 Cisco Systems, Inc. All Rights Reserved. 18% 19% Contributor(s): 20% 21% END LICENSE BLOCK 22 23\section{Input/Output} 24 25%---------------------------------------------------------------------- 26\subsection{Streams} 27%---------------------------------------------------------------------- 28 29The {\eclipse} input/output system (io.c, bip_io.c) is based on the 30concept of streams. These are abstracted and buffered I/O channels, 31implemented on top of the operating systems low-level I/O operations. 32On the implementation level, each stream is described by a stream 33descriptor, holding among others information about 34\begin{itemize} 35\item operating system I/O handle 36\item type of stream, and method table 37\item various stream properties 38\item buffers and associated bookkeeping data 39\end{itemize} 40On the {\eclipse} language level, anonymous streams are identified by 41a small integers (the system maintains a global table mapping numbers 42to stream descriptors), or by atomic names (implemented via atom 43properties, see \ref{secproperty}). 44 45The system strives to present a unified stream interface for different 46kinds of underlying 'devices': 47\begin{description} 48\item[file] operating system files 49\item[tty] interactive character based I/O devices, consoles 50\item[pipe] inter-process channels with limited buffer size, unidirectional 51\item[socket] inter-process or -machine channels, bidirectional 52\item[string] in-memory buffers similar to files 53\item[queue] in-memory buffers with a read and a write end, similar to pipes 54\item[null] pseudo device with no input and discarding output 55\end{description} 56While obviously not all operations make sense on all types of stream, 57the implementation relies on method tables providing appropriate functionality 58for different operations on different stream types. 59 60All stream I/O is buffered, therefore usually a flush operation is required 61to force written data to go to the underlying device (if any). 62Buffers for real devices are of fixed size, and automatically flushed 63when full. For the in-memory streams (string, queue) the buffers expand 64as needed, since they represent the container itself. 65 66To support lexical analysis and parser-style code in general, stream 67support 4 bytes lookahead, by keeping that amount of already read data 68in the buffer, so the input can be backed up as much. 69 70Some types of streams (queues and sockets) 71support data-driven operation with their ability to raise a synchronous 72event when input data arrives on a previously empty stream. The event 73to be raised can be attached as a property to the stream. 74For queues, this is implemented simply using the post_event functionality 75of the embedding interface. For sockets, the implementation relies 76on the Unix SIGIO feature which is then mapped to a synchronous event. 77On Windows, this is achieved by an auxiliary thread that waits for socket 78input and posts the event to the {\eclipse} engine when input arrives. 79 80Queue streams are in particular meant to support communication with a 81host program in a setting where {\eclipse} is embedded as a library 82into a host program. The queue stream can be used to transfer control 83back (yield/resume model, see Embedding Manual) to the host program 84whenever input is required, or output action requested. 85 86 87%---------------------------------------------------------------------- 88\subsection{Lexical Analyzer} 89%---------------------------------------------------------------------- 90 91The lexical analyser (lex.c) breaks up input from I/O streams into 92lexical tokens. The definition of tokens can be found in the User 93Manual appendix on syntax. In contrast, in the implementation, the 94definition of tokens is somewhat refined, in order to enable the 95parser to detect special situations. In particular, the lexical 96analyser produces special tokens for 97\begin{itemize} 98\item opening brackets when preceded by blank space 99\item numbers when preceded by blank space 100\item identifiers when quoted 101\end{itemize} 102The syntax of tokens is defined in terms of character classes. The 103assignment of characters to classes can be modified in {\eclipse} by 104means of a character table (chtab/2 declaration). Therefore this 105 character table serves as input to the lexical analyser (it is part 106 of the module-specific syntax descriptor). 107A number of other syntax options are configurable as well, e.g.\ 108\begin{itemize} 109\item adjacent string concatenation 110\item different escape sequence standards 111\item different syntax for radix numbers 112\item nested comments 113\end{itemize} 114The lexical analyser provides a C and a Prolog API (read_token/3), the 115former is used by the parser, the latter is as a built-in predicate. 116In addition, the subroutine for number-string conversion can be used 117independently, e.g.\ for the number_string/2 built-in. 118 119Up to 3 characters of lookahead are needed in the lexical analyser, e.g.\ 120in the input "3e+a" is a sequence of 4 individual tokens (integer, atom, 121atom, atom), while "3e+2" is a floating point number, but this only 122becomes apparent after reading the forth character. The necessary 123lookahead facility is provided by the stream implementation. 124 125 126%---------------------------------------------------------------------- 127\subsection{Parser} 128%---------------------------------------------------------------------- 129 130The parser (read.c) is implemented as a recursive descent parser in C, 131takes its input from the lexical analyzer, and constructs {\eclipse} 132terms directly on the global stack. The main difficulty is the 133handling of operators with user-defined precedences. The algorithm 134used is similar to the one in O'Keefe's public domain Prolog parser 135read.pl (which can be found as an {\eclipse} third-party library). 136 137The main difference with the Prolog parser is that the {\eclipse} 138parser is completely deterministic and resolves all ambiguities with a 139maximal lookahead of two tokens, where the Prolog parser employs 140unlimited backtracking to resolve ambiguous input. As a consequence, 141{\eclipse} will reject some otherwise valid constructs as ambiguous, 142notably "not/foo" because of the conflict between low-precedence 143prefix operator not/1 and infix operator / /2. In such cases, extra 144parentheses are required to resolve the ambiguity, i.e.\ "(not)/foo". 145See the User Manual appendix on syntax for more ambiguity resolution 146rules. It can be argued that this is a disadvantage, but constructs 147which cannot be parsed with a finite lookahead are also hard to 148understand for a human reader, so little harm is done by disallowing 149them. 150 151The {\eclipse} syntax supports a number of extensions compared to plain 152Prolog syntax. The parser is also highly configurable and accepts syntax 153flags to emulate the behaviour of other Prolog systems and standards. 154The syntax extensions and variations that affect parser implementation are: 155\begin{itemize} 156\item array subscript syntax, e.g.\ M\[3\] 157\item variable attribute syntax, e.g.\ X\{A\} 158\item structure syntax, e.g.\ emp\{sal:10\} 159\item binary prefix operators 160\item treatment of vertical bars 161\item significance of blanks 162\end{itemize} 163 164The parser also keeps track of subterms that need macro expansion 165(done in a subsequent pass), and has a mode in which it constructs 166a parse term annotated with source layout information. 167 168 169%---------------------------------------------------------------------- 170\subsection{Term Writer} 171%---------------------------------------------------------------------- 172 173The term writer (write.c) is a C routine that traverses an {\eclipse} 174terms and produces a textual representation on an output stream. 175It supports various options, essentially corresponding to the options 176that can be given to the write_term/2,3 built-in (see User Manual, section 177Term Output Formats). 178 179A few points are of interest in the implementation: 180\begin{itemize} 181\item an important distinctions is between output formats that are for 182 human consumption, and formats that can be read back by the parser, 183 yielding an exact copy of the original term. 184\item the context module is relevant for the output syntax created 185\item most actual formatting is done by C printf, except for floating 186 point numbers (where we have certain accuracy requirements), 187 and for strings containing NUL characters. 188\item write-macros are optionally applied if the term type, or its 189 functor, has the corresponding portray-property. The transformation 190 predicates are executed using a recursive emulator. 191\item external data handles are printed using their own to_string conversion 192 function, if one is provided. 193\item special syntax (operators, subscript, attributes, structures) is 194 re-created when the corresponding syntax options are set, and 195 canonical output is not requested. 196\item quoting of atoms depends on the character class definitions in effect 197 (compare lexical analyzer) 198\item variables can be printed in different ways. The source variable name 199 can be used, a unique number can be used (we use the offset of 200 the variable's address from the start of the stack, in pword units), 201 a combination of these, or just a simple underscore. 202\item cyclic terms will lead to looping in the output routine, unless a 203 depth limit is in effect. Better treatment of this would require 204 defining a syntax for cyclic terms. 205\item for meta-cycles, i.e.\ cycles that occur because variable attributes 206 are printed which indirectly contain the variable itself, there is 207 special provision: the attribute is only printed once with the first 208 variable occurrence, so no looping occurs. 209\item floating point variables require some special treatment to produce 210 the required precision, and the required digits on both sides of 211 the decimal point, and the {\eclipse} syntax for infinities 212 (1.0Inf). For some host systems, workaround are required to handle 213 negative zeroes correctly. 214\end{itemize} 215 216 217 218