# BEGIN LICENSE BLOCK # Version: CMPL 1.1 # # The contents of this file are subject to the Cisco-style Mozilla Public # License Version 1.1 (the "License"); you may not use this file except # in compliance with the License. You may obtain a copy of the License # at www.eclipse-clp.org/license. # # Software distributed under the License is distributed on an "AS IS" # basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See # the License for the specific language governing rights and limitations # under the License. # # The Original Code is The ECLiPSe Constraint Logic Programming System. # The Initial Developer of the Original Code is Cisco Systems, Inc. # Portions created by the Initial Developer are # Copyright (C) 2006 Cisco Systems, Inc. All Rights Reserved. # # Contributor(s): # # END LICENSE BLOCK ---------------------------------------------------------------------- Format of trail entries ---------------------------------------------------------------------- To make it suitable for the GC, the addresses inside trail frames have to point to proper Prolog words, even if it is not the Prolog word itself that is trailed. For this purpose, the offsetted trail has been introduced. Standard address trail frame: +--------+--------+--------+--------+ | address 00| +--------+--------+--------+--------+ <-- TT Tag trail frame: +--------+--------+--------+--------+ | address 00| +--------+--------+--------+--------+ | shifted tag |01| +--------+--------+--------+--------+ <-- TT Multiple word trail with offset: +--------+--------+--------+--------+ \ | data | | . . | . . > nnnn + 1 32 bit words . . | | data | | +--------+--------+--------+--------+ / | address 00| +--------+--------+--------+--------+ | word offset |nnnntt10| +--------+--------+--------+--------+ <-- TT For the GC it is essential that the address points to a tagged double word. If there is a need by some extension, the offset field could be shortened in favour of the nnnn field. The two bits tt specify what kind of data has been trailed. This information is not needed when untrailing, but for the garbage collector (the trailed data may have to be updated). tt meaning 00 TRAILED_PWORD trailed data consists of pwords (val,tag) 01 TRAILED_REF trailed data consists of pointers that may point to prolog data areas (especially stacks) A reference tag is assumed for the pointer. This means, you can't pointer-trail the value part of a TLIST or TCOMP. 10 TRAILED_WORD32 trailed words are general data, the garbage collector can ignore them 11 TRAILED_COMP trailed data consists of structure pointers Some rules have to be followed when this offset-trail is used: - the address must always point to a tagged pword - no pword may be trailed that is also independently reachable (ie. that may have a reference to it) ---------------------------------------------------------------------- Extension trail frames ---------------------------------------------------------------------- The code 11 in the low 2 bits allows for another 16 kinds of trail frames. These trail frames all have the following common format: +--------+--------+--------+--------+ | extension defined data | . . . . . . | extension defined data | +--------+--------+--------+--------+ | address 00| +--------+--------+--------+--------+ | word size of trail frame |eeeett11| +--------+--------+--------+--------+ <-- TT The eeee field indentifies the exact type of frame: eeee 0000 simple undo frame (ECLiPSe >= 5.5) 0001 stamped undo frame (ECLiPSe >= 5.5) .... others unused If a new frame type is defined, it must be dealt with in the untrailing routine and, provided it contains pointers to global stack objects, in the garbage collector. Keep in mind that pointers to the global stack must always point to tagged Prolog words in order to be updateable when the stack is compacted. ---------------------------------------------------------------------- Extension trail frame: Undo frame (up to ECLiPSe 5.4, later see below): ---------------------------------------------------------------------- +--------+--------+--------+--------+ | & untrail_function | +--------+--------+--------+--------+ | address 00| +--------+--------+--------+--------+ | 3 |00000011| +--------+--------+--------+--------+ <-- TT ---------------------------------------------------------------------- Extension trail frame: Stamped Undo frame (ONLY ECLiPSe 5.3-5.4!) ---------------------------------------------------------------------- +--------+--------+--------+--------+ \ | data | | . . | . . > size + 1 32 bit words . . | | data | | +--------+--------+--------+--------+ / | old stamp value | +--------+--------+--------+--------+ | & untrail_function | +--------+--------+--------+--------+ | (stamp) address 00| +--------+--------+--------+--------+ | frame size |0001tt11| +--------+--------+--------+--------+ <-- TT The tt field indicates the type of trailed data, as in the value trail. The undo function is called as follows: undo(&data, data_size_in_words, redundancy_flags) The redundancy_flags indicate whether - the trail is redundant according to the timestamp - undo is being called from the garbage collector rather than in a failure The frame is created by calling: ec_trail_undo(&stamp, &function, &data, data_size_in_words, data_type) this function checks the timestamp and does nothing if it is new. Otherwise the trail is done and the timestamp updated. Note that the location of the stamp does not matter for deciding the redundancy of the frame, only the stamp value is important. This trail could also be used to trail external data structures (their address would have to be stored within the data block) while having timestamps for them in a separate memory block. The size of the trail frames is limited by the length of the size field, if that is not enough, indirection has to be used. If the data consists of pwords, early untrailing during garbage collection is tricky: the data may be marked (and thus unusable) at the time the untrail function is called. If the undo function needs to look at the data even for redundant untrails, we must make sure at trail time that the data that is referenced by the trail frame is not referenced by anybody else. o what happens when the timestamp location becomes garbage? (is the trail a strong or weak pointer to it?) It is a strong pointer. The lifetime of the trail depends on the stamp value only, not the location. Instances of timestamped undo trail: request_fail_event(Struct, StampArg, Event) raise event on failure. do nothing on redundancy. request_fail_write(Struct, StampArg, Stream, Term) write Term on failure. do nothing on redundancy. ---------------------------------------------------------------------- Extension trail frame: Simple and Stamped Undo frame (from ECLiPSe 5.5) ---------------------------------------------------------------------- The two types have been largely unified. Simple and stamped undo frame differ only in that the stamped frame has two extra fields, the stamp address and the old stamp value. Both frames have a pointer to an "item" which is the related data structure that determines the lifetime of the frame. Simple undo frame: +--------+--------+--------+--------+ \ | data | | . . | . . > size + 1 32 bit words . . | | data | | +--------+--------+--------+--------+ / | & untrail_function | +--------+--------+--------+--------+ | item address | +--------+--------+--------+--------+ | frame size |0000tt11| +--------+--------+--------+--------+ <-- TT Stamped undo frame: +--------+--------+--------+--------+ \ | data | | . . | . . > size + 1 32 bit words . . | | data | | +--------+--------+--------+--------+ / | old stamp value | +--------+--------+--------+--------+ | stamp address | +--------+--------+--------+--------+ | & untrail_function | +--------+--------+--------+--------+ | item address | +--------+--------+--------+--------+ | frame size |0001tt11| +--------+--------+--------+--------+ <-- TT The tt field indicates the type of trailed data, as in the value trail. The frames are created by a single function: ec_trail_undo(&function, &item, &stamp, &data, data_size_in_words, trail_flags) The arguments are &function address of the undo function &item address of the data structure to which the undo applies. This can be a global stack address (in which case it must point to a properly tagged pword) or an arbitrary heap address (in which case the content does not matter), or it can be NULL. If item is a global stack item, and this item is about to be garbage collected, the undo function will be called early. If item is NULL or outside the global stack, the undo function will only be called on failure. &stamp address of a timestamp. This should be NULL if you don't want to use timestamping. Using timestamping implies that multiple trails within the same choicepoint segment are redundant, and the system will suppress all but the first. The location of the timestamp is immaterial. It can be part of the item or not. It will be kept alive by the existence of the trail frame. &data,size address and size (in words) of data to be passed to the undo function. This data will be copied into the trail frame. trail_flags the type of the data (TRAILED_PWORD or TRAILED_WORD) The undo function is called as follows: undo(pword *item, word *data, int data_size_in_words, int undo_context) Untrailing is done: undo_context == UNDO_FAIL on failure (unless redundant according to stamp) undo_context == UNDO_GC on gc, when the untrail can be done early (when the item is inaccessible in the post-trailed state) If the data consists of pwords, early untrailing during garbage collection is tricky: the data may be marked (and thus unusable) at the time the untrail function is called. If the undo function needs to look at the data even for redundant untrails, we must make sure at trail time that the data that is referenced by the trail frame is not referenced by anybody else. CAUTION1: The data fields are blindly copied and when the untrail function is executed, the data may not be properly aligned if they require >4 byte boundary alignment (e.g. doubles). Accessing these misaligned field directly can cause a segmentation violation on some archectures. In these cases, the data should be copied (using memcpy()) to a properly aligned data structure first. CAUTION2: The GC does NOT mark the data pointed to by the item address field. This item is only used to determine the lifetime of the trail frame. In particular, this undo-trail CANNOT be used to implement a general value-trail because it does not assume (during GC) that the item has been modified at trailing time, so the current value would not be marked and updated properly during GC. It is in principle a pure side-effect trail. However, simple modifications in the item, like resetting counters or bits, should be ok. NOTE: A trail that would combine value trail and side-effect trail would need to store the tag of the item pointer as well, in order to be able to do proper marking. Another difference between a reset-trail and side-effect trail is that the reset-trail is redundant when the item is about to disappear anyway during failure. The side effect still needs to be done in that case (e.g. cleanup/deallocation for external handles). Overview of different cases during GC: age Reset-trail Sideeffect-trail ^ | trail/newstamp redundant redundant | | trail/oldstamp early early | + chp | o item, not marked | - ^ | trail/newstamp redundant redundant | | trail/oldstamp keep keep | + chp | o item, marked | - ^ | trail/newstamp redundant redundant | | trail/oldstamp redundant early | o item, not marked | + chp | - ^ | trail/newstamp redundant redundant | | trail/oldstamp redundant keep | o item, marked | + chp | - Unstamped trails must be treated like oldstamp trails. ---------------------------------------------------------------------- Ideas ---------------------------------------------------------------------- Modified multiple pword trail with timestamp: (not implemented as of 12/2001) mainly for trailing a data block one of whose (p)words is a timestamp. +--------+--------+--------+--------+ \ | data | | . . | . . > size + 1 32 bit words . . | | data | | +--------+--------+--------+--------+ / | address 00| +--------+--------+--------+--------+ | stamp | data | frame |tt000001| | offset | offset | size | | +--------+--------+--------+--------+ <-- TT (stamp only if pwords) Can we have a dense array of timestamps, i.e. a header tag and N tagless timestamps? On global: needs to be trailed as base+offset, no individual lifetime. Timestamps on the heap (not global stack) would normally need to be tagged because of the garbage collector's ALREADY_MARKED_FROM bit. But we could also use the offsetted trail and get away with one mark for the whole array in a header, if they all get marked together. Bit trail (not implemented) +--------+--------+--------+--------+ | buffer address | +--------+--------+--------+--------+ | bit number |00100011| +--------+--------+--------+--------+ <-- TT Bit sequence trail (not implemented) +--------+--------+--------+--------+ | buffer address | +--------+--------+----+---+--------+ | start bit number | n |00110011| +--------+--------+----+---+--------+ <-- TT ---------------------------------------------------------------------- Garbage Collection issues related to the trail ---------------------------------------------------------------------- Joachim, 10/2001 There are 2 passes through the trail (plus a compaction pass). mark_fom_trail() ---------------- This is one top-down pass through the whole of the collected trail segment (TT down to GCB->tt) and is done right at the beginning of the marking phase of the collection. TRAIL_ADDRESS, TRAIL_TAG Mark from all trailed locations outside of the collection segment. These potentially hold new references into the collection segment because they have been bound while the c.s. was grown. TRAIL_SINGLE_VALUE Mark from all trailed locations outside of the collection segment. Because of possible multiple value trails, set ALREADY_MARKED_FROM bit. Do not yet mark from the old value in the trail! TRAIL_MULTI_VALUE This is a bit dodgy: Mark from all trailed locations outside of the collection segment. The ALREADY_MARKED_FROM bit is currently not set here (but should)!!! The old value in the trail is immediately marked from (this is suboptimal). Thoughts: - multi-non-pword trail implies that base address has to be taken as representatnt for the whole thing (deciding early untrail, markedness) - multi-pword trail: can lead to conflicts, when some pwords marked, some unmarked we don't know whether to early untrail (don't to be safe). Better not to have it at all? - TRAILED_WORD32 currently only used to trail value-only modifications of non-pointers or tag modifications. Only single WORD32 used. TRAIL_UNDO Nothing early_untrail() ---------------- This is an incremental pass choicepoint for choicepoint. The segment above choicepoint 'prev' is processed. At the time of processing, we have already marked from states that correspond to creation time of the next choicepoint above, and now we simulate what would happen on failure to 'prev'. Everything that's not marked at this time can be early-untrailed because it is only possibly needed after the failure. TRAIL_ADDRESS, TRAIL_TAG If the trailed item is above prev->tg, just delete the trail entry, it is cut garbage (either didn't need to be trailed in the first place or has been rendered unnecessary due to subsequent cut). The trailed item will anyway disappear on backtracking. If the trailed item is in the collection segment, but below prev->tg: If not marked, we can early untrail now and delete the trail entry. If already marked, only add the trail entry to relocation chain (but in case of TMETA, the attribute might not be marked yet, so do it now (this will normally be the case because the bound variable was marked earlier, which does not lead to marking the attribute)). TRAIL_SINGLE_VALUE If the trailed item is above prev->tg, as above. (we seem to have missed the case of cut garbage on the grounds that the trailed old value is a pointer above prev->tg FIXIT!) If the trailed item is in the collection segment, but below prev->tg: If not marked, we can early untrail now and delete the trail entry. If already marked, add the trail entry to relocation chain. Also, mark from the old values in the trail frame. If the trailed item is older than the collection segment (but see FIXIT above), we try to detect unnecessary pointer trails: if the _old_value_ is newer than prev->tg (i.e. any pointer type pointing above prev->tg), the trail entry is also cut garbage. If the trailed item is older than the collection segment, we still need to mark from the old value in the trail. We also reset the ALREADY_MARKED_FROM bit which was set earlier. TRAIL_MULTI_VALUE As above, but we cannot remove unnecessary pointer trails because they are batched up with other stuff. TRAIL_UNDO If the trailed item is in the collection segment: If not marked, we can early untrail now and delete the trail entry. If already marked, enter into relocation chain. If the trailed item is older than the collection segment, do nothing. General trail frame scheme -------------------------- trailed_location (pword *) if above prev->tg: delete trail frame (cut garbage) if between gcb->tg and prev->tg: unmarked: early untrail marked: enter trailed_location* into relocation chain, mark from timestamp_location* mark from trailed_data if below gcb->tg: mark from timestamp_location* (but probably also below gcb->tg) mark from trailed_data reset ALREADY_MARKED_FROM bit timestamp_location (pword *) if above prev->tg: delete trail frame (cut garbage) (may still need to reset ALREADY_MARKED_FROM bit -- FIXIT) trailed_data unless the frame is deleted: mark from trailed_data Instance: Single pword trail --------------------- trailed_location: address timestamp_location: address (current data = timestamp, if pointer) trailed_data: old value Instance: Multi pword trail --------------------- trailed_location: base address timestamp_location: base address + timestamp_offset trailed_data: old value(s), offset that would mean that the trail can be deleted based on the timestamp, i.e. reset information is redundant. E.g. lib(eplex) has a 3-pword trail that could be improved by indicating which of them is the timestamp. Similarly, the related undo-trail should know the timestamp. The two trail frames semantically belong together and should have identical lifetimes. Instance: Multi word trail --------------------- trailed_location: base address timestamp_location: base address + timestamp_offset trailed_data: old value(s), offset This would normally have to go together with a pword trail of the timestamp. Instance: Equivalent of old Undo trail -------------- trailed_location: base address timestamp_location: base address + timestamp_offset trailed_data: old value(s), offset Instance: cut_fail_action ------------------------ Here the trailed_location (the cut action frame) and the trail entry are always in the same segment, could be trailed value. trailed_location: 0 timestamp_location: 0 trailed_data: cut action frame This frame will not be affected by gc (only relocated). Instance: free-trail for external handles ----------------------- The anchor and the trail are in the same segment. trailed_location: the global stack anchor timestamp_location: 0 trailed_data: optional will be early-untrailed (freed) when trailed_location becomes garbage. Instance: fail-event-trail ----------------------- UNTRAIL_REDUNDANT_TRAILS no UNTRAIL_WHEN_LOCATION_DISAPPEARS no (needed? don't really know when location disappears, unless tail frame is in same segment) trailed_location: related data structure timestamp_location: its timestamp field trailed_data: event name The untrail function will be called with the data structure address and the event name, and post the named event. The events should never be posted at gc time! How to avoid that? By setting the flags to 'no'. If the trail gets redundant (by timestamp), delete (don't untrail). Instance: fail-write-request ----------------------- UNTRAIL_REDUNDANT_TRAILS no UNTRAIL_WHEN_LOCATION_DISAPPEARS no (needed?) trailed_location: related data structure, e.g. containing stream timestamp_location: its timestamp field trailed_data: data to be written The untrail function will be called with the data structure address and the trailed_data.