History log of /seL4-l4v-10.1.1/HOL4/src/parse/term_tokens.sml
Revision Date Author Comments
# 793c19de 17-Sep-2018 Michael Norrish <Michael.Norrish@nicta.com.au>

Get chained Unicode negations to print properly

Selftests for both the pretty-printing behaviour and the underlying
token-splitting function that this depends on.


# 6a81a039 21-May-2018 Michael Norrish <Michael.Norrish@nicta.com.au>

Remove TABs from src

Will also make selftest to check that they aren't introduced


# f179922a 14-May-2018 Michael Norrish <Michael.Norrish@nicta.com.au>

Lex tokens consisting of only $-signs as identifiers

This adds behaviour to the system: previously it was impossible to get
anything out of quotations containing bare $-signs. Now these are
lexed as term identifiers. (The type parser was already capable of
lexing such things sensibly.)

I am strongly inclined to now use single dollar-signs as in Haskell,
as an alternative form of adding parentheses. For example,

f $ g $ x + 1

is a lighter way of writing f (g (x + 1)). The effect is achieved by
making $ a right-associative infix with precedence looser than normal
function application.

Closes #512


# ac4806d9 18-Feb-2014 Michael Norrish <Michael.Norrish@nicta.com.au>

Implement "ignore overloads for printing" feature.

Demonstrate with change to list, so that ``~MEM a l`` prints that way
rather than as ``a NOTIN set l``. Selftest checks this, and also that
``x NOTIN {2;3;4}`` continues to print the right way.

Closes #95.


# c5b67214 28-Jul-2011 Michael Norrish <Michael.Norrish@nicta.com.au>

Fix some fiddly lexing problems to do with the new decimals syntax.


# 19456733 28-Jul-2011 Michael Norrish <Michael.Norrish@nicta.com.au>

Fix tokenizing of decimals when they're concatenated with various symbols.

Also allow decimals to include eye-helping underscores.


# 61dad220 28-Jul-2011 Michael Norrish <Michael.Norrish@nicta.com.au>

Provide entry point for testing the term lexer.


# d3e399bb 28-Jul-2011 Michael Norrish <Michael.Norrish@nicta.com.au>

Parse decimal fractions, turning them into divisions.

Client theories need to overload the magic string

GrammarSpecials.decimal_fraction_special

to their division constant in order to get this to work.


# cb36afe9 16-Nov-2009 Michael Norrish <Michael.Norrish@nicta.com.au>

Fix for bug of failing to parse num$0. Also more test-cases.


# 29268290 31-Jul-2009 Michael Norrish <Michael.Norrish@nicta.com.au>

More corrections required to our sources when I do the migration to
Basis 97 properly.


# 60d4c007 14-May-2009 Michael Norrish <Michael.Norrish@nicta.com.au>

Forgot that it was reasonable to have constant names include digits,
though they can't begin with digits.


# e8d172d6 14-May-2009 Michael Norrish <Michael.Norrish@nicta.com.au>

A better fix for the problem with lexing things like cl$-->. Now we
just make some reasonable assumptions about the forms of possible
constant names. In particular, they have to be ASCII, can't
include a right parenthesis and must also not mix lexical categories,
so "a-b" is not a valid constant name as far as the lexer is
concerned. Of course, you can still do these and stranger things at
the syntax level.


# b50415e4 14-May-2009 Michael Norrish <Michael.Norrish@nicta.com.au>

A tentative fix for the problem discovered in
examples/ind_def/clScript.sml (which arose when milScript.sml
attempted to load the generated theory file).

The issue is that the tokeniser applied its standard rules for classes
of character to strings following the dollar in cl$-->, and split
after the minus sign, creating the token cl$--. The issue is that the
constants in a particular theory could
1. be named just about anything
2. won't appear in the "specials" list that the lexer uses to ensure
that it doesn't 'split' known-to-be-good tokens in half.


# 12b17695 12-Feb-2009 Michael Norrish <Michael.Norrish@nicta.com.au>

Be smarter about splitting tokens up with whitespace. Now get the
following:
- load "arithmeticTheory";
> val it = () : unit
- ``- -x``;
<<HOL message: inventing new type variable names: 'a>>
> val it = ``--x`` : term
- load "wordsLib";
> val it = () : unit
- ``- -x``;
<<HOL message: inventing new type variable names: 'a>>
> val it = ``- -x`` : term
Would have previously got --x printed in both circumstances, even
though there is a constant -- in wordsTheory.


# d653fb1c 11-Feb-2009 Michael Norrish <Michael.Norrish@nicta.com.au>

Implement unary minus (the old ~ still also works). Selftest level 2
passes. Changes in the "words world" were replacing $- with -. In
the two quotient changes, I made sure that the infix operator
that used the - character was part of the grammar before it was
parsed. This is necessary because - is now a non-aggregating
character, mainly so that ``--p`` parses as three tokens, not two. Of
course, if you added -- to your grammar, then it would go back to
parsing as two.


# fcde1b9f 18-Jan-2009 Michael Norrish <Michael.Norrish@nicta.com.au>

Some more fiddling with character-classes for the purposes of lexing
and pretty-printing. In particular, change Unicode lambda to be
treated as a 'symbol' rather than a letter, so that one can write
λx.x
(that's a Unicode lambda followed by x.x) and have it lex correctly.

Also treat Unicode logical negation as non-aggregating so that one can
have two of them in a row treated as two tokens not one.


# 5c771b3d 15-Jan-2009 Michael Norrish <Michael.Norrish@nicta.com.au>

Modify lexing so that "mixed" tokens are possible if a user puts them
into the grammar. For example, one might do

set_fixity "-->a" (Infix(NONASSOC, 450))

This then causes ``p -->a q`` to be parsed as an infix application of
-->a to p and q. If you wrote ``p --> a q``, the -->a token would not
be seen. Without -->a being in the grammar, it will be lexed as
before, a symbolic token (-->) followed by an alphanumeric one (a).
(Note also that things like --a--> where the mixing up happens in the
middle of the token, are just as possible.)

This change also allows traditional alphanumeric identifiers to
combine with Unicode subscripts. Using LaTeX notation, a_1 is now
possible.

The system builds with selftest level 1, so I believe I have preserved
backwards compatibility. (I have also added a test to
src/bool/selftest.sml to check for _1 working as a case expression
pattern properly.)


# 54bd1058 13-Jan-2009 Anthony Fox <anthony.fox@cl.cam.ac.uk>

Restore special treatment of "_" in variable names.
That is, treat ``_x`` as a single token.


# f8646a44 23-Dec-2008 Michael Norrish <Michael.Norrish@nicta.com.au>

Modify the lexer to handle Unicode tokens properly. For example, you
can type the equivalent of \forall\tau_1 and it will split this blob
of three Unicode characters (and some larger number of bytes) into two
tokens (\forall followed by \tau_1). The pretty-printer also knows
about token categories.


# 36c48f38 02-Aug-2006 Michael Norrish <Michael.Norrish@nicta.com.au>

Fix the earlier problem that was causing the lexer to get confused when
a non-aggregating character like [ was followed by a character literal.
It is now possible to write [#"c"].


# 8800e335 30-Jul-2006 Michael Norrish <Michael.Norrish@nicta.com.au>

Fix for a bug found by Mike in handling of character literals. Bug
added to regression test in src/string/selftest.sml.


# 1e693f02 14-Apr-2005 Michael Norrish <Michael.Norrish@nicta.com.au>

Changed the lexing implementation in src/parse to use our own copy of mllex.
Speed is comparable, code is conforming SML, and mllex is also something
of a standard for the other implementations too. I will eventually remove
all uses of mosmllex and mosmlyac from the distribution, perhaps starting
with tools/Holmake.
Functionality is slightly altered: it is now not possible to have
antiquotations inside comments. If there is hue and cry about this, I can
restore this functionality though in rather a hackish way. None of the
core distribution relies on being able to do this.


# fefe1d48 25-Jan-2005 Michael Norrish <Michael.Norrish@nicta.com.au>

Allow differently-based numeric literals. Can now write
0xAF
or
0b1010011
or
063
The latter is octal and requires the flag in base_tokens to be set to
true. (This is a backwards compatibility bodge, just in case there are
people up there who liked leading decimal zeroes.)

Also, all numeric constants can be intermingled with underscores, which
are ignored. These can be used to space numbers for extra legibility.
For example, 12_345_678 is easier to spot as 12 million than 12345678.

Note the convention that numeric type identifier characters don't overlap
with hex digits and are lowercase. For example, 0xAAw would be the word
170. In the potentially ambiguous case of 0xaa, the a is taken to be a
hex digit, not a numeric type specifier.


# 275a997c 03-Sep-2002 Keith Wansbrough <keith@lochan.org>

This commit contains the initial version of the error-location
reporting code I have been working on over the last couple of weeks.

Basically, error messages will now report the source location they
relate to, enabling users to locate bugs more easily within large
definitions.

So far, errors are reported by the lexer, the term parser, and the
datatype parser. Type parser errors will follow, and (hopefully)
typecheck errors.

src/0/locn is a small location-handling library. locn.locn is the
type of locations, and 'a locn.located is a synonym for 'a *
locn.locn. There is a location, Loc_None, for compiler-generated
text, and another, Loc_Unknown, for text of unknown location.

src/0/Feedback now includes some functions for generating located errors.

The lexer now returns located tokens, term_token and type_token
maintain the locations, and parse_term maintains them *internally* on
each terminal and nonterminal. When the system is complete, locations
will persist into absyn and preterm (if I understand the code
correctly!) to enable the typechecker to yield located errors.

I would appreciate any feedback:

* Is the performance impact significant?
* Are any error messages incorrect? (there's a lot of
location-manipulation code in parse_term, some of which I don't
fully understand, and surely some bugs)
* Have I broken anything for anyone, or am I likely to?

The actual output format of the location information is provisional
only, and I intend to generate absolute file positions (rather than
fragment offsets) shortly.

--KW 8-)

NB: tags pre-locn-change and post-locn-change cover this commit.


# ae96f1d2 14-Aug-2002 Michael Norrish <Michael.Norrish@nicta.com.au>

Following a suggestion of Konrad's, allow term tokens to be multiple
apostrophes (so ' '' ''' etc are all possible term tokens).


# 4e6a0987 01-Aug-2002 Michael Norrish <Michael.Norrish@nicta.com.au>

Allow single apostrophes to appear as term identifiers. I.e., ' is now a
valid term token. Getting your levels confused and trying 'a is still
an error. I have designs on the "tick" for the finite map theory, where
I think it will make a nice infix function application symbol.


# a1dfcb9b 26-Jul-2002 Michael Norrish <Michael.Norrish@nicta.com.au>

A complete re-implementation of lexing. Speeds things up considerably
through the use of mosmllex. (The example at the bottom of term_tokens.sml
goes through four to five times faster than it used to.) The basic
design is now more stateful than it used to be. There is a new type,
called the qbuf, which provides buffered access to a quotation. The
implementation of parse_type changes completely because it can no longer
do the monadic-parsing trick of backing up over failed parses. The
implementation of the term parser doesn't really change at all.


# fd9df862 02-Nov-2000 Michael Norrish <Michael.Norrish@nicta.com.au>

Dollared identifiers now get properly translated into constants.


# a2511ab7 02-Nov-2000 Michael Norrish <Michael.Norrish@nicta.com.au>

Changes to bring about Kananaskis parser. Known bug exists in
treatment of qualified identifiers.


# d5716d2a 10-Jul-2000 Michael Norrish <Michael.Norrish@nicta.com.au>

Slight update to a comment, making it more accurate.


# 437135ef 25-Nov-1999 Michael Norrish <Michael.Norrish@nicta.com.au>

Fixed a lexical bug; "," was considered an aggregating character, so
writing ($+, 3) was parsed as the application of a function called +,
to the argument 3. By making "," non-aggregating, this behaviour goes
away.


# b6a0e462 11-Oct-1999 Michael Norrish <Michael.Norrish@nicta.com.au>

Two bugs fixed:
1. `foo bar baz)` was parsing OK, because I wasn't requiring that the
lookahead component of the PStack be empty
2. `foo $` and `foo $^t` were both raising Subscript errors; this has
been improved to raise appropriate LEX_ERRORs.


# 99a265f3 06-Oct-1999 Michael Norrish <Michael.Norrish@nicta.com.au>

Modified implementation of lexer, seeming to make it 40% faster.
Minor fix made to fragstr.sml


# dd5f687d 07-Sep-1999 Michael Norrish <Michael.Norrish@nicta.com.au>

Lexer now reports lexing error by spotting that there is input still to
be grabbed that it can't make any sense of.


# d28425f8 06-Sep-1999 Michael Norrish <Michael.Norrish@nicta.com.au>

Quite radical changes which allow for "." to be used as the record
field selection operator. Lexer has changed quite dramatically:
it is no longer responsible for determining whether or not a token
is a "keyword" or not. Instead this is done at the parsing level.

The lexer is also now unable to cope with identifiers that are a mix
of alphabetically acceptable and symbolically acceptable characters.


# dc18dcf9 03-Sep-1999 Michael Norrish <Michael.Norrish@nicta.com.au>

Many changes, mainly to do with record syntax. Also implemented long IDs
as theory-name$id. All of HOL doesn't quite build because field selection
operator is ->, which is used in the real library. Will change this to be
just a simple "." Have also started on concrete syntax for updates.


# cf9ed16b 07-Jul-1999 Michael Norrish <Michael.Norrish@nicta.com.au>

Numerous changes allowing for printing and parsing of numerals and
treating them as members of different families depending on the contents
of the numeral_info part of the term_grammar. Tested inasmuch the
distribution builds correctly, and with integerTheory loaded.


# c34bf0c0 29-Jun-1999 Michael Norrish <Michael.Norrish@nicta.com.au>

Changes in Parse to call now-zeroed versions of export_theory etc in Theory.
Changes elsewhere to implement string literals.


# 0058df84 28-Jun-1999 Michael Norrish <Michael.Norrish@nicta.com.au>

Fixed to bring it up to date with Taupo release 0.
(Changes have come across from parse_branch development.)