1% BEGIN LICENSE BLOCK
2% Version: CMPL 1.1
3%
4% The contents of this file are subject to the Cisco-style Mozilla Public
5% License Version 1.1 (the "License"); you may not use this file except
6% in compliance with the License.  You may obtain a copy of the License
7% at www.eclipse-clp.org/license.
8% 
9% Software distributed under the License is distributed on an "AS IS"
10% basis, WITHOUT WARRANTY OF ANY KIND, either express or implied.  See
11% the License for the specific language governing rights and limitations
12% under the License. 
13% 
14% The Original Code is  The ECLiPSe Constraint Logic Programming System. 
15% The Initial Developer of the Original Code is  Cisco Systems, Inc. 
16% Portions created by the Initial Developer are
17% Copyright (C) 2006 Cisco Systems, Inc.  All Rights Reserved.
18% 
19% Contributor(s): 
20% 
21% END LICENSE BLOCK
22
23\section{Input/Output}
24
25%----------------------------------------------------------------------
26\subsection{Streams}
27%----------------------------------------------------------------------
28
29The {\eclipse} input/output system (io.c, bip_io.c) is based on the
30concept of streams.  These are abstracted and buffered I/O channels,
31implemented on top of the operating systems low-level I/O operations.
32On the implementation level, each stream is described by a stream
33descriptor, holding among others information about
34\begin{itemize}
35\item operating system I/O handle
36\item type of stream, and method table
37\item various stream properties
38\item buffers and associated bookkeeping data
39\end{itemize}
40On the {\eclipse} language level, anonymous streams are identified by
41a small integers (the system maintains a global table mapping numbers
42to stream descriptors), or by atomic names (implemented via atom
43properties, see \ref{secproperty}).
44
45The system strives to present a unified stream interface for different
46kinds of underlying 'devices':
47\begin{description}
48\item[file] operating system files
49\item[tty] interactive character based I/O devices, consoles
50\item[pipe] inter-process channels with limited buffer size, unidirectional
51\item[socket] inter-process or -machine channels, bidirectional
52\item[string] in-memory buffers similar to files
53\item[queue] in-memory buffers with a read and a write end, similar to pipes
54\item[null] pseudo device with no input and discarding output
55\end{description}
56While obviously not all operations make sense on all types of stream,
57the implementation relies on method tables providing appropriate functionality
58for different operations on different stream types.
59
60All stream I/O is buffered, therefore usually a flush operation is required
61to force written data to go to the underlying device (if any).
62Buffers for real devices are of fixed size, and automatically flushed
63when full. For the in-memory streams (string, queue) the buffers expand
64as needed, since they represent the container itself.
65
66To support lexical analysis and parser-style code in general, stream
67support 4 bytes lookahead, by keeping that amount of already read data
68in the buffer, so the input can be backed up as much.
69
70Some types of streams (queues and sockets)
71support data-driven operation with their ability to raise a synchronous
72event when input data arrives on a previously empty stream. The event
73to be raised can be attached as a property to the stream.
74For queues, this is implemented simply using the post_event functionality
75of the embedding interface.  For sockets, the implementation relies
76on the Unix SIGIO feature which is then mapped to a synchronous event.
77On Windows, this is achieved by an auxiliary thread that waits for socket
78input and posts the event to the {\eclipse} engine when input arrives.
79
80Queue streams are in particular meant to support communication with a
81host program in a setting where {\eclipse} is embedded as a library
82into a host program. The queue stream can be used to transfer control
83back (yield/resume model, see Embedding Manual) to the host program
84whenever input is required, or output action requested.
85
86
87%----------------------------------------------------------------------
88\subsection{Lexical Analyzer}
89%----------------------------------------------------------------------
90
91The lexical analyser (lex.c) breaks up input from I/O streams into
92lexical tokens.  The definition of tokens can be found in the User
93Manual appendix on syntax.  In contrast, in the implementation, the
94definition of tokens is somewhat refined, in order to enable the
95parser to detect special situations.  In particular, the lexical
96analyser produces special tokens for
97\begin{itemize}
98\item opening brackets when preceded by blank space
99\item numbers when preceded by blank space
100\item identifiers when quoted
101\end{itemize}
102The syntax of tokens is defined in terms of character classes.  The
103assignment of characters to classes can be modified in {\eclipse} by
104means of a character table (chtab/2 declaration). Therefore this
105 character table serves as input to the lexical analyser (it is part
106 of the module-specific syntax descriptor).
107A number of other syntax options are configurable as well, e.g.\
108\begin{itemize}
109\item adjacent string concatenation
110\item different escape sequence standards
111\item different syntax for radix numbers
112\item nested comments
113\end{itemize}
114The lexical analyser provides a C and a Prolog API (read_token/3), the
115former is used by the parser, the latter is as a built-in predicate. 
116In addition, the subroutine for number-string conversion can be used
117independently, e.g.\ for the number_string/2 built-in.
118
119Up to 3 characters of lookahead are needed in the lexical analyser, e.g.\
120in the input "3e+a" is a sequence of 4 individual tokens (integer, atom,
121atom, atom), while "3e+2" is a floating point number, but this only
122becomes apparent after reading the forth character. The necessary
123lookahead facility is provided by the stream implementation.
124
125
126%----------------------------------------------------------------------
127\subsection{Parser}
128%----------------------------------------------------------------------
129
130The parser (read.c) is implemented as a recursive descent parser in C,
131takes its input from the lexical analyzer, and constructs {\eclipse}
132terms directly on the global stack.  The main difficulty is the
133handling of operators with user-defined precedences.  The algorithm
134used is similar to the one in O'Keefe's public domain Prolog parser
135read.pl (which can be found as an {\eclipse} third-party library).
136
137The main difference with the Prolog parser is that the {\eclipse}
138parser is completely deterministic and resolves all ambiguities with a
139maximal lookahead of two tokens, where the Prolog parser employs
140unlimited backtracking to resolve ambiguous input.  As a consequence,
141{\eclipse} will reject some otherwise valid constructs as ambiguous,
142notably "not/foo" because of the conflict between low-precedence
143prefix operator not/1 and infix operator / /2.  In such cases, extra
144parentheses are required to resolve the ambiguity, i.e.\ "(not)/foo". 
145See the User Manual appendix on syntax for more ambiguity resolution
146rules.  It can be argued that this is a disadvantage, but constructs
147which cannot be parsed with a finite lookahead are also hard to
148understand for a human reader, so little harm is done by disallowing
149them.
150
151The {\eclipse} syntax supports a number of extensions compared to plain
152Prolog syntax. The parser is also highly configurable and accepts syntax
153flags to emulate the behaviour of other Prolog systems and standards.
154The syntax extensions and variations that affect parser implementation are:
155\begin{itemize}
156\item array subscript syntax, e.g.\ M\[3\]
157\item variable attribute syntax, e.g.\ X\{A\}
158\item structure syntax, e.g.\ emp\{sal:10\}
159\item binary prefix operators
160\item treatment of vertical bars
161\item significance of blanks
162\end{itemize}
163
164The parser also keeps track of subterms that need macro expansion
165(done in a subsequent pass), and has a mode in which it constructs
166a parse term annotated with source layout information.
167
168
169%----------------------------------------------------------------------
170\subsection{Term Writer}
171%----------------------------------------------------------------------
172
173The term writer (write.c) is a C routine that traverses an {\eclipse}
174terms and produces a textual representation on an output stream.
175It supports various options, essentially corresponding to the options
176that can be given to the write_term/2,3 built-in (see User Manual, section
177Term Output Formats).
178
179A few points are of interest in the implementation:
180\begin{itemize}
181\item an important distinctions is between output formats that are for
182	human consumption, and formats that can be read back by the parser,
183	yielding an exact copy of the original term.
184\item the context module is relevant for the output syntax created
185\item most actual formatting is done by C printf, except for floating
186	point numbers (where we have certain accuracy requirements),
187	and for strings containing NUL characters.
188\item write-macros are optionally applied if the term type, or its
189	functor, has the corresponding portray-property. The transformation
190	predicates are executed using a recursive emulator.
191\item external data handles are printed using their own to_string conversion
192	function, if one is provided.
193\item special syntax (operators, subscript, attributes, structures) is
194	re-created when the corresponding syntax options are set, and
195	canonical output is not requested.
196\item quoting of atoms depends on the character class definitions in effect
197	(compare lexical analyzer)
198\item variables can be printed in different ways. The source variable name
199	can be used, a unique number can be used (we use the offset of
200	the variable's address from the start of the stack, in pword units),
201	a combination of these, or just a simple underscore.
202\item cyclic terms will lead to looping in the output routine, unless a
203	depth limit is in effect.  Better treatment of this would require
204	defining a syntax for cyclic terms.
205\item for meta-cycles, i.e.\ cycles that occur because variable attributes
206	are printed which indirectly contain the variable itself, there is
207	special provision: the attribute is only printed once with the first
208	variable occurrence, so no looping occurs.
209\item floating point variables require some special treatment to produce
210	the required precision, and the required digits on both sides of
211	the decimal point, and the {\eclipse} syntax for infinities
212	(1.0Inf). For some host systems, workaround are required to handle
213	negative zeroes correctly.
214\end{itemize}
215
216
217
218