.\" $Id: mandoc_escape.3,v 1.1 2014/08/05 05:48:56 schwarze Exp $
| .\" $Id: mandoc_escape.3,v 1.2 2014/10/28 14:06:31 schwarze Exp $
|
.\"
.\" Copyright (c) 2014 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
| .\"
| .\" Copyright (c) 2014 Ingo Schwarze <schwarze@openbsd.org>
| .\"
| .\" Permission to use, copy, modify, and distribute this software for any
| .\" purpose with or without fee is hereby granted, provided that the above
| .\" copyright notice and this permission notice appear in all copies.
| .\"
| .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
| .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
| .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
| .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
| .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
| .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
| .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
| .\"
|
.Dd $Mdocdate: August 5 2014 $
| .Dd $Mdocdate: October 28 2014 $
|
.Dt MANDOC_ESCAPE 3
.Os
.Sh NAME
.Nm mandoc_escape
.Nd parse roff escape sequences
.Sh LIBRARY
.Lb libmandoc
.Sh SYNOPSIS
.In sys/types.h
.In mandoc.h
.Ft "enum mandoc_esc"
.Fo mandoc_escape
.Fa "const char **end"
.Fa "const char **start"
.Fa "int *sz"
.Fc
.Sh DESCRIPTION
This function scans a
.Xr roff 7
escape sequence.
.Pp
An escape sequence consists of
.Bl -dash -compact -width 2n
.It
an initial backslash character
.Pq Sq \e ,
.It
a single ASCII character called the escape sequence identifier,
.It
and, with only a few exceptions, an argument.
.El
.Pp
Arguments can be given in the following forms; some escape sequence
identifiers only accept some of these forms as specified below.
The first three forms are called the standard forms.
.Bl -tag -width 2n
.It \&In brackets: Ic \&[ Ns Ar argument Ns Ic \&]
The argument starts after the initial
.Sq \&[ ,
ends before the final
.Sq \&] ,
and the escape sequence ends with the final
.Sq \&] .
.It Two-character argument short form: Ic \&( Ns Ar ar
This form can only be used for arguments
consisting of exactly two characters.
It has the same effect as
.Ic \&[ Ns Ar ar Ns Ic \&] .
.It One-character argument short form: Ar a
This form can only be used for arguments
consisting of exactly one character.
It has the same effect as
.Ic \&[ Ns Ar a Ns Ic \&] .
.It Delimited form: Ar C Ns Ar argument Ns Ar C
The argument starts after the initial delimiter character
.Ar C ,
ends before the next occurrence of the delimiter character
.Ar C ,
and the escape sequence ends with that second
.Ar C .
Some escape sequences allow arbitrary characters
.Ar C
as quoting characters, some restrict the range of characters
that can be used as quoting characters.
.El
.Pp
Upon function entry,
.Fa end
is expected to point to the escape sequence identifier.
The values passed in as
.Fa start
and
.Fa sz
are ignored and overwritten.
.Pp
By design, this function cannot handle those
.Xr roff 7
escape sequences that require in-place expansion, in particular
user-defined strings
.Ic \e* ,
number registers
.Ic \en ,
width measurements
.Ic \ew ,
and numerical expression control
.Ic \eB .
These are handled by
.Fn roff_res ,
a private preprocessor function called from
.Fn roff_parseln ,
see the file
.Pa roff.c .
.Pp
The function
.Fn mandoc_escape
is used
.Bl -dash -compact -width 2n
.It
recursively by itself, because some escape sequence arguments can
in turn contain other escape sequences,
.It
for error detection internally by the
.Xr roff 7
parser part of the
.Lb libmandoc ,
see the file
.Pa roff.c ,
.It
above all externally by the
.Xr mandoc
formatting modules, in particular
.Fl Tascii
and
.Fl Thtml ,
for formatting purposes, see the files
.Pa term.c
and
.Pa html.c ,
.It
and rarely externally by high-level utilities using the mandoc library,
for example
.Xr makewhatis 8 ,
to purge escape sequences from text.
.El
.Sh RETURN VALUES
Upon function return, the pointer
.Fa end
is set to the character after the end of the escape sequence,
such that the calling higher-level parser can easily continue.
.Pp
For escape sequences taking an argument, the pointer
.Fa start
is set to the beginning of the argument and
.Fa sz
is set to the length of the argument.
For escape sequences not taking an argument,
.Fa start
is set to the character after the end of the sequence and
.Fa sz
is set to 0.
Both
.Fa start
and
.Fa sz
may be
.Dv NULL ;
in that case, the argument and the length are not returned.
.Pp
For sequences taking an argument, the function
.Fn mandoc_escape
returns one of the following values:
.Bl -tag -width 2n
.It Dv ESCAPE_FONT
The escape sequence
.Ic \ef
taking an argument in standard form:
.Ic \ef[ , \ef( , \ef Ns Ar a .
Two-character arguments starting with the character
.Sq C
are reduced to one-character arguments by skipping the
.Sq C .
More specific values are returned for the most commonly used arguments:
.Bl -column "argument" "ESCAPE_FONTITALIC"
.It argument Ta return value
.It Cm R No or Cm 1 Ta Dv ESCAPE_FONTROMAN
.It Cm I No or Cm 2 Ta Dv ESCAPE_FONTITALIC
.It Cm B No or Cm 3 Ta Dv ESCAPE_FONTBOLD
.It Cm P Ta Dv ESCAPE_FONTPREV
.It Cm BI Ta Dv ESCAPE_FONTBI
.El
.It Dv ESCAPE_SPECIAL
The escape sequence
.Ic \eC
taking an argument delimited with the single quote character
and, as a special exception, the escape sequences
.Em not
having an identifier, that is, those where the argument, in standard
form, directly follows the initial backslash:
.Ic \eC' , \e[ , \e( , \e Ns Ar a .
Note that the one-character argument short form can only be used for
argument characters that do not clash with escape sequence identifiers.
.Pp
| .Dt MANDOC_ESCAPE 3
| .Os
| .Sh NAME
| .Nm mandoc_escape
| .Nd parse roff escape sequences
| .Sh LIBRARY
| .Lb libmandoc
| .Sh SYNOPSIS
| .In sys/types.h
| .In mandoc.h
| .Ft "enum mandoc_esc"
| .Fo mandoc_escape
| .Fa "const char **end"
| .Fa "const char **start"
| .Fa "int *sz"
| .Fc
| .Sh DESCRIPTION
| This function scans a
| .Xr roff 7
| escape sequence.
| .Pp
| An escape sequence consists of
| .Bl -dash -compact -width 2n
| .It
| an initial backslash character
| .Pq Sq \e ,
| .It
| a single ASCII character called the escape sequence identifier,
| .It
| and, with only a few exceptions, an argument.
| .El
| .Pp
| Arguments can be given in the following forms; some escape sequence
| identifiers only accept some of these forms as specified below.
| The first three forms are called the standard forms.
| .Bl -tag -width 2n
| .It \&In brackets: Ic \&[ Ns Ar argument Ns Ic \&]
| The argument starts after the initial
| .Sq \&[ ,
| ends before the final
| .Sq \&] ,
| and the escape sequence ends with the final
| .Sq \&] .
| .It Two-character argument short form: Ic \&( Ns Ar ar
| This form can only be used for arguments
| consisting of exactly two characters.
| It has the same effect as
| .Ic \&[ Ns Ar ar Ns Ic \&] .
| .It One-character argument short form: Ar a
| This form can only be used for arguments
| consisting of exactly one character.
| It has the same effect as
| .Ic \&[ Ns Ar a Ns Ic \&] .
| .It Delimited form: Ar C Ns Ar argument Ns Ar C
| The argument starts after the initial delimiter character
| .Ar C ,
| ends before the next occurrence of the delimiter character
| .Ar C ,
| and the escape sequence ends with that second
| .Ar C .
| Some escape sequences allow arbitrary characters
| .Ar C
| as quoting characters, some restrict the range of characters
| that can be used as quoting characters.
| .El
| .Pp
| Upon function entry,
| .Fa end
| is expected to point to the escape sequence identifier.
| The values passed in as
| .Fa start
| and
| .Fa sz
| are ignored and overwritten.
| .Pp
| By design, this function cannot handle those
| .Xr roff 7
| escape sequences that require in-place expansion, in particular
| user-defined strings
| .Ic \e* ,
| number registers
| .Ic \en ,
| width measurements
| .Ic \ew ,
| and numerical expression control
| .Ic \eB .
| These are handled by
| .Fn roff_res ,
| a private preprocessor function called from
| .Fn roff_parseln ,
| see the file
| .Pa roff.c .
| .Pp
| The function
| .Fn mandoc_escape
| is used
| .Bl -dash -compact -width 2n
| .It
| recursively by itself, because some escape sequence arguments can
| in turn contain other escape sequences,
| .It
| for error detection internally by the
| .Xr roff 7
| parser part of the
| .Lb libmandoc ,
| see the file
| .Pa roff.c ,
| .It
| above all externally by the
| .Xr mandoc
| formatting modules, in particular
| .Fl Tascii
| and
| .Fl Thtml ,
| for formatting purposes, see the files
| .Pa term.c
| and
| .Pa html.c ,
| .It
| and rarely externally by high-level utilities using the mandoc library,
| for example
| .Xr makewhatis 8 ,
| to purge escape sequences from text.
| .El
| .Sh RETURN VALUES
| Upon function return, the pointer
| .Fa end
| is set to the character after the end of the escape sequence,
| such that the calling higher-level parser can easily continue.
| .Pp
| For escape sequences taking an argument, the pointer
| .Fa start
| is set to the beginning of the argument and
| .Fa sz
| is set to the length of the argument.
| For escape sequences not taking an argument,
| .Fa start
| is set to the character after the end of the sequence and
| .Fa sz
| is set to 0.
| Both
| .Fa start
| and
| .Fa sz
| may be
| .Dv NULL ;
| in that case, the argument and the length are not returned.
| .Pp
| For sequences taking an argument, the function
| .Fn mandoc_escape
| returns one of the following values:
| .Bl -tag -width 2n
| .It Dv ESCAPE_FONT
| The escape sequence
| .Ic \ef
| taking an argument in standard form:
| .Ic \ef[ , \ef( , \ef Ns Ar a .
| Two-character arguments starting with the character
| .Sq C
| are reduced to one-character arguments by skipping the
| .Sq C .
| More specific values are returned for the most commonly used arguments:
| .Bl -column "argument" "ESCAPE_FONTITALIC"
| .It argument Ta return value
| .It Cm R No or Cm 1 Ta Dv ESCAPE_FONTROMAN
| .It Cm I No or Cm 2 Ta Dv ESCAPE_FONTITALIC
| .It Cm B No or Cm 3 Ta Dv ESCAPE_FONTBOLD
| .It Cm P Ta Dv ESCAPE_FONTPREV
| .It Cm BI Ta Dv ESCAPE_FONTBI
| .El
| .It Dv ESCAPE_SPECIAL
| The escape sequence
| .Ic \eC
| taking an argument delimited with the single quote character
| and, as a special exception, the escape sequences
| .Em not
| having an identifier, that is, those where the argument, in standard
| form, directly follows the initial backslash:
| .Ic \eC' , \e[ , \e( , \e Ns Ar a .
| Note that the one-character argument short form can only be used for
| argument characters that do not clash with escape sequence identifiers.
| .Pp
|
If the argument consists of more than one character
and starts with the character
.Sq u ,
.Dv ESCAPE_UNICODE
is returned as described below.
If the argument is just the single character
.Sq u ,
.Dv ESCAPE_ERROR
is returned.
| If the argument matches one of the forms described below under
| .Dv ESCAPE_UNICODE ,
| that value is returned instead.
|
.Pp
The
.Dv ESCAPE_SPECIAL
special character escape sequences can be rendered using the functions
.Fn mchars_spec2cp
and
.Fn mchars_spec2str
described in the
.Xr mchars_alloc 3
manual.
.It Dv ESCAPE_UNICODE
Escape sequences of the same format as described above under
.Dv ESCAPE_SPECIAL ,
| .Pp
| The
| .Dv ESCAPE_SPECIAL
| special character escape sequences can be rendered using the functions
| .Fn mchars_spec2cp
| and
| .Fn mchars_spec2str
| described in the
| .Xr mchars_alloc 3
| manual.
| .It Dv ESCAPE_UNICODE
| Escape sequences of the same format as described above under
| .Dv ESCAPE_SPECIAL ,
|
but with an argument starting with the character
.Sq u :
| but with an argument of the forms
| .Ic u Ns Ar XXXX ,
| .Ic u Ns Ar YXXXX ,
| or
| .Ic u10 Ns Ar XXXX
| where
| .Ar X
| and
| .Ar Y
| are hexadecimal digits and
| .Ar Y
| is not zero:
|
.Ic \eC'u , \e[u .
As a special exception,
.Fa start
is set to the character after the
| .Ic \eC'u , \e[u .
| As a special exception,
| .Fa start
| is set to the character after the
|
.Sq u ,
| .Ic u ,
|
and the
.Fa sz
return value does not include the
| and the
| .Fa sz
| return value does not include the
|
.Sq u
| .Ic u
|
either.
.Pp
Such Unicode character escape sequences can be rendered using the function
.Fn mchars_num2uc
described in the
.Xr mchars_alloc 3
manual.
.It Dv ESCAPE_NUMBERED
The escape sequence
.Ic \eN
followed by a delimited argument.
The delimiter character is arbitrary except that digits cannot be used.
If a digit is encountered instead of the opening delimiter, that
digit is considered to be the argument and the end of the sequence, and
.Dv ESCAPE_IGNORE
is returned.
.Pp
Such ASCII character escape sequences can be rendered using the function
.Fn mchars_num2char
described in the
.Xr mchars_alloc 3
manual.
.It Dv ESCAPE_IGNORE
.Bl -bullet -width 2n
.It
The escape sequence
.Ic \es
followed by an argument in standard form or by an argument delimited
by the single quote character:
.Ic \es' , \es[ , \es( , \es Ns Ar a .
As a special exception, an optional
.Sq +
or
.Sq \-
character is allowed after the
.Sq s
for all forms.
.It
The escape sequences
.Ic \eF ,
.Ic \eg ,
.Ic \ek ,
.Ic \eM ,
.Ic \em ,
.Ic \en ,
.Ic \eV ,
and
.Ic \eY
followed by an argument in standard form.
.It
The escape sequences
.Ic \eA ,
.Ic \eb ,
.Ic \eD ,
.Ic \eo ,
.Ic \eR ,
.Ic \eX ,
and
.Ic \eZ
followed by an argument delimited by an arbitrary character.
.It
The escape sequences
.Ic \eH ,
.Ic \eh ,
.Ic \eL ,
.Ic \el ,
.Ic \eS ,
.Ic \ev ,
and
.Ic \ex
followed by an argument delimited by a character that cannot occur
in numerical expressions.
However, if any character that can occur in numerical expressions
is found instead of a delimiter, the sequence is considered to end
with that character, and
.Dv ESCAPE_ERROR
is returned.
.El
.It Dv ESCAPE_ERROR
Escape sequences taking an argument but not matching any of the above patterns.
In particular, that happens if the end of the logical input line
is reached before the end of the argument.
.El
.Pp
For sequences that do not take an argument, the function
.Fn mandoc_escape
returns one of the following values:
.Bl -tag -width 2n
.It Dv ESCAPE_SKIPCHAR
The escape sequence
.Qq \ez .
.It Dv ESCAPE_NOSPACE
The escape sequence
.Qq \ec .
.It Dv ESCAPE_IGNORE
The escape sequences
.Qq \ed
and
.Qq \eu .
.El
.Sh FILES
This function is implemented in
.Pa mandoc.c .
.Sh SEE ALSO
.Xr mchars_alloc 3 ,
.Xr mandoc_char 7 ,
.Xr roff 7
.Sh HISTORY
This function has been available since mandoc 1.11.2.
.Sh AUTHORS
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
.An Ingo Schwarze Aq Mt schwarze@openbsd.org
.Sh BUGS
The function doesn't cleanly distinguish between sequences that are
valid and supported, valid and ignored, valid and unsupported,
syntactically invalid, or undefined.
For sequences that are ignored or unsupported, it doesn't tell
whether that deficiency is likely to cause major formatting problems
and/or loss of document content.
The function is already rather complicated and still parses some
sequences incorrectly.
.
.ig
For these sequences, the list given below specifies a starting string
and either the length of the argument or an ending character.
The argument starts after the starting string.
In the former case, the sequence ends with the end of the argument.
In the latter case, the argument ends before the ending character,
and the sequence ends with the ending character.
..
| either.
| .Pp
| Such Unicode character escape sequences can be rendered using the function
| .Fn mchars_num2uc
| described in the
| .Xr mchars_alloc 3
| manual.
| .It Dv ESCAPE_NUMBERED
| The escape sequence
| .Ic \eN
| followed by a delimited argument.
| The delimiter character is arbitrary except that digits cannot be used.
| If a digit is encountered instead of the opening delimiter, that
| digit is considered to be the argument and the end of the sequence, and
| .Dv ESCAPE_IGNORE
| is returned.
| .Pp
| Such ASCII character escape sequences can be rendered using the function
| .Fn mchars_num2char
| described in the
| .Xr mchars_alloc 3
| manual.
| .It Dv ESCAPE_IGNORE
| .Bl -bullet -width 2n
| .It
| The escape sequence
| .Ic \es
| followed by an argument in standard form or by an argument delimited
| by the single quote character:
| .Ic \es' , \es[ , \es( , \es Ns Ar a .
| As a special exception, an optional
| .Sq +
| or
| .Sq \-
| character is allowed after the
| .Sq s
| for all forms.
| .It
| The escape sequences
| .Ic \eF ,
| .Ic \eg ,
| .Ic \ek ,
| .Ic \eM ,
| .Ic \em ,
| .Ic \en ,
| .Ic \eV ,
| and
| .Ic \eY
| followed by an argument in standard form.
| .It
| The escape sequences
| .Ic \eA ,
| .Ic \eb ,
| .Ic \eD ,
| .Ic \eo ,
| .Ic \eR ,
| .Ic \eX ,
| and
| .Ic \eZ
| followed by an argument delimited by an arbitrary character.
| .It
| The escape sequences
| .Ic \eH ,
| .Ic \eh ,
| .Ic \eL ,
| .Ic \el ,
| .Ic \eS ,
| .Ic \ev ,
| and
| .Ic \ex
| followed by an argument delimited by a character that cannot occur
| in numerical expressions.
| However, if any character that can occur in numerical expressions
| is found instead of a delimiter, the sequence is considered to end
| with that character, and
| .Dv ESCAPE_ERROR
| is returned.
| .El
| .It Dv ESCAPE_ERROR
| Escape sequences taking an argument but not matching any of the above patterns.
| In particular, that happens if the end of the logical input line
| is reached before the end of the argument.
| .El
| .Pp
| For sequences that do not take an argument, the function
| .Fn mandoc_escape
| returns one of the following values:
| .Bl -tag -width 2n
| .It Dv ESCAPE_SKIPCHAR
| The escape sequence
| .Qq \ez .
| .It Dv ESCAPE_NOSPACE
| The escape sequence
| .Qq \ec .
| .It Dv ESCAPE_IGNORE
| The escape sequences
| .Qq \ed
| and
| .Qq \eu .
| .El
| .Sh FILES
| This function is implemented in
| .Pa mandoc.c .
| .Sh SEE ALSO
| .Xr mchars_alloc 3 ,
| .Xr mandoc_char 7 ,
| .Xr roff 7
| .Sh HISTORY
| This function has been available since mandoc 1.11.2.
| .Sh AUTHORS
| .An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
| .An Ingo Schwarze Aq Mt schwarze@openbsd.org
| .Sh BUGS
| The function doesn't cleanly distinguish between sequences that are
| valid and supported, valid and ignored, valid and unsupported,
| syntactically invalid, or undefined.
| For sequences that are ignored or unsupported, it doesn't tell
| whether that deficiency is likely to cause major formatting problems
| and/or loss of document content.
| The function is already rather complicated and still parses some
| sequences incorrectly.
| .
| .ig
| For these sequences, the list given below specifies a starting string
| and either the length of the argument or an ending character.
| The argument starts after the starting string.
| In the former case, the sequence ends with the end of the argument.
| In the latter case, the argument ends before the ending character,
| and the sequence ends with the ending character.
| ..
|
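The argument forms documented in this manual can be illustrated with a small standalone sketch in C.
It does not use libmandoc and is not the code in mandoc.c; the helper names std_arg and is_unicode_arg are hypothetical.
The sketch assumes that a bracketed argument ends at the first closing bracket and that hexadecimal digits may be given in either case; neither detail is stated in the manual text.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libmandoc: locate an argument
 * given in one of the three standard forms, starting at p.
 * On success, store the argument position in *start and its length
 * in *sz and return a pointer to the character after the sequence.
 * Return NULL if the input ends before the argument is complete.
 * Assumption: the bracketed form ends at the first ']'.
 */
const char *
std_arg(const char *p, const char **start, int *sz)
{
	const char *close;

	assert(p != NULL && start != NULL && sz != NULL);
	switch (*p) {
	case '\0':
		return NULL;
	case '[':		/* In brackets: [argument] */
		close = strchr(p + 1, ']');
		if (close == NULL)
			return NULL;
		*start = p + 1;
		*sz = (int)(close - *start);
		return close + 1;
	case '(':		/* Two-character short form: (ar */
		if (p[1] == '\0' || p[2] == '\0')
			return NULL;
		*start = p + 1;
		*sz = 2;
		return p + 3;
	default:		/* One-character short form: a */
		*start = p;
		*sz = 1;
		return p + 1;
	}
}

/*
 * Hypothetical helper: report whether arg[0..len-1] has one of the
 * forms uXXXX, uYXXXX or u10XXXX, where X and Y are hexadecimal
 * digits and Y is not zero.
 * Assumption: upper- and lower-case hex digits are both accepted.
 */
int
is_unicode_arg(const char *arg, size_t len)
{
	size_t i;

	assert(arg != NULL);
	if (len < 5 || len > 7 || arg[0] != 'u')
		return 0;
	/* Every character after the 'u' must be a hex digit. */
	for (i = 1; i < len; i++)
		if (!isxdigit((unsigned char)arg[i]))
			return 0;
	/* uYXXXX: the leading digit Y must not be zero. */
	if (len == 6 && arg[1] == '0')
		return 0;
	/* u10XXXX: the two leading digits are fixed. */
	if (len == 7 && (arg[1] != '1' || arg[2] != '0'))
		return 0;
	return 1;
}
```

A caller would first locate the argument with std_arg and then pass the resulting position and length to is_unicode_arg to decide between the ESCAPE_SPECIAL and ESCAPE_UNICODE style of handling.
The real function additionally handles the delimited form, nested escape sequences, and the per-identifier rules listed under RETURN VALUES, none of which this sketch attempts.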