History log of /seL4-l4v-10.1.1/HOL4/src/parse/base_tokens.sml
Revision Date Author Comments
# a3643037 28-Jul-2017 Michael Norrish <Michael.Norrish@nicta.com.au>

Allow trailing semicolons in datatype declarations

Also:
- factor out datatype declarations for related AST types so that they
print nicely in Poly/ML interactive loops
- share more code between parsers of old and new syntax

With test-cases.

Closes #446


# 19456733 28-Jul-2011 Michael Norrish <Michael.Norrish@nicta.com.au>

Fix tokenizing of decimals when they're concatenated with various symbols.

Also allow decimals to include eye-helping underscores.


# e9f6ecad 28-Jul-2011 Michael Norrish <Michael.Norrish@nicta.com.au>

Use polymorphism to get rid of base_token0 type.


# d3e399bb 28-Jul-2011 Michael Norrish <Michael.Norrish@nicta.com.au>

Parse decimal fractions, turning them into divisions.

Client theories need to overload the magic string

GrammarSpecials.decimal_fraction_special

to their division constant in order to get this to work.


# 29268290 31-Jul-2009 Michael Norrish <Michael.Norrish@nicta.com.au>

More corrections required to our sources when I do the migration to
Basis 97 properly.


# 5c771b3d 15-Jan-2009 Michael Norrish <Michael.Norrish@nicta.com.au>

Modify lexing so that "mixed" tokens are possible if a user puts them
into the grammar. For example, one might do

set_fixity "-->a" (Infix(NONASSOC, 450))

This then causes ``p -->a q`` to be parsed as an infix application of
-->a to p and q. If you wrote ``p --> a q``, the -->a token would not
be seen. Without -->a being in the grammar, it will be lexed as
before, a symbolic token (-->) followed by an alphanumeric one (a).
(Note also that things like --a--> where the mixing up happens in the
middle of the token, are just as possible.)

This change also allows traditional alphanumeric identifiers to
combine with Unicode subscripts. Using LaTeX notation, a_1 is now
possible.

The system builds with selftest level 1, so I believe I have preserved
backwards compatibility. (I have also added a test to
src/bool/selftest.sml to check for _1 working as a case expression
pattern properly.)


# 1e693f02 14-Apr-2005 Michael Norrish <Michael.Norrish@nicta.com.au>

Changed the lexing implementation in src/parse to use our own copy of mllex.
Speed is comparable, code is conforming SML, and mllex is also something
of a standard for the other implementations too. I will eventually remove
all uses of mosmllex and mosmlyac from the distribution, perhaps starting
with tools/Holmake.
Functionality is slightly altered: it is now not possible to have
antiquotations inside comments. If there is hue and cry about this, I can
restore this functionality though in rather a hackish way. None of the
core distribution relies on being able to do this.


# fefe1d48 25-Jan-2005 Michael Norrish <Michael.Norrish@nicta.com.au>

Allow differently-based numeric literals. Can now write
0xAF
or
0b1010011
or
063
The latter is octal and requires the flag in base_tokens to be set to
true. (This is a backwards compatibility bodge, just in case there are
people up there who liked leading decimal zeroes.)

Also, all numeric constants can be intermingled with underscores, which
are ignored. These can be used to space numbers for extra legibility.
For example, 12_345_678 is easier to spot as 12 million than 12345678.

Note the convention that numeric type identifier characters don't overlap
with hex digits and are lowercase. For example, 0xAAw would be the word
170. In the potentially ambiguous case of 0xaa, the a is taken to be a
hex digit, not a numeric type specifier.


# 275a997c 03-Sep-2002 Keith Wansbrough <keith@lochan.org>

This commit contains the initial version of the error-location
reporting code I have been working on over the last couple of weeks.

Basically, error messages will now report the source location they
relate to, enabling users to locate bugs more easily within large
definitions.

So far, errors are reported by the lexer, the term parser, and the
datatype parser. Type parser errors will follow, and (hopefully)
typecheck errors.

src/0/locn is a small location-handling library. locn.locn is the
type of locations, and 'a locn.located is a synonym for 'a *
locn.locn. There is a location, Loc_None, for compiler-generated
text, and another, Loc_Unknown, for text of unknown location.

src/0/Feedback now includes some functions for generating located errors.

The lexer now returns located tokens, term_token and type_token
maintain the locations, and parse_term maintains them *internally* on
each terminal and nonterminal. When the system is complete, locations
will persist into absyn and preterm (if I understand the code
correctly!) to enable the typechecker to yield located errors.

I would appreciate any feedback:

* Is the performance impact significant?
* Are any error messages incorrect? (there's a lot of
location-manipulation code in parse_term, some of which I don't
fully understand, and surely some bugs)
* Have I broken anything for anyone, or am I likely to?

The actual output format of the location information is provisional
only, and I intend to generate absolute file positions (rather than
fragment offsets) shortly.

--KW 8-)

NB: tags pre-locn-change and post-locn-change cover this commit.


# a1dfcb9b 26-Jul-2002 Michael Norrish <Michael.Norrish@nicta.com.au>

A complete re-implementation of lexing. Speeds things up considerably
through the use of mosmllex. (The example at the bottom of term_tokens.sml
goes through four to five times faster than it used to.) The basic
design is now more stateful than it used to be. There is a new type,
called the qbuf, which provides buffered access to a quotation. The
implementation of parse_type changes completely because it can no longer
do the monadic-parsing trick of backing up over failed parses. The
implementation of the term parser doesn't really change at all.