NAME
    DateTime::Format::Builder - Create DateTime parser classes and objects.

SYNOPSIS
        package DateTime::Format::Brief;
        our $VERSION = '0.07';
        use DateTime::Format::Builder
        (
            parsers => {
                parse_datetime => [
                    {
                        regex => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
                        params => [qw( year month day hour minute second )],
                    },
                    {
                        regex => qr/^(\d{4})(\d\d)(\d\d)$/,
                        params => [qw( year month day )],
                    },
                ],
            }
        );

DESCRIPTION
    DateTime::Format::Builder creates DateTime parsers. Many string formats
    of dates and times are simple and just require a basic regular
    expression to extract the relevant information. Builder provides a
    simple way to do this without writing reams of structural code.

    Builder provides a number of methods, most of which you'll never need,
    or at least rarely need. They're provided mainly to expose the module's
    innards to any subclasses, or for when you need to do something
    slightly beyond what I expected.

TUTORIAL
    See DateTime::Format::Builder::Tutorial.

ERROR HANDLING AND BAD PARSES
    Often, I will speak of `undef' being returned; however, that's not
    strictly true.

    When a simple single specification is given for a method, the method
    isn't given a single parser directly. It's given a wrapper that will
    call `on_fail()' if the single parser returns `undef'. The single parser
    must return `undef' so that a multiple parser can work nicely and actual
    errors can be thrown from any of the callbacks.

    Similarly, a multiple parser will only call `on_fail()' right at the
    end, once it has tried every parser it could.

    `on_fail()' (see later) is defined, by default, to throw an error.

    Multiple parser specifications can also specify `on_fail' with a coderef
    as an argument in the options block. This will take precedence over the
    inheritable and over-ridable method.
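
    As a rough sketch of the options-block form just described, a
    per-method `on_fail' might look like the following. The class name
    My::Format::Lenient and the do-nothing callback are illustrative
    assumptions rather than anything Builder ships, and the sketch assumes
    that when the callback returns quietly instead of throwing, the method
    simply hands back `undef'.

```perl
package My::Format::Lenient;    # hypothetical class name
use strict;
use warnings;

use DateTime::Format::Builder (
    parsers => {
        parse_datetime => [
            # Options block: an arrayref as the first element.
            # This on_fail replaces the default error-throwing one
            # for this method only; it ignores its arguments.
            [ on_fail => sub { return } ],
            {
                regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
                params => [qw( year month day )],
            },
            {
                regex  => qr/^(\d{4})-(\d\d)-(\d\d)$/,
                params => [qw( year month day )],
            },
        ],
    },
);

package main;

# Parsed by the first specification.
my $ok = My::Format::Lenient->parse_datetime('20030716');

# Matches neither specification, so the custom on_fail runs.
my $bad = My::Format::Lenient->parse_datetime('not a date');

print $ok->ymd, "\n";
print defined $bad ? "parsed\n" : "no parse\n";
```

    Other methods built by the same class, and any subclass `on_fail()'
    method, are left untouched by this; only the one multiple parser
    specification carries the override.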

    That said, don't throw real errors from callbacks in multiple parser
    specifications unless you really want parsing to stop right there and
    not try any other parsers.

    In summary: calling a method will result in either a `DateTime' object
    being returned or an error being thrown (unless you've overridden
    `on_fail()' or `create_method()', or you've specified an `on_fail' key
    to a multiple parser specification).

    Individual parsers (be they multiple parsers or single parsers) will
    return either the `DateTime' object or `undef'.

SINGLE SPECIFICATIONS
    A single specification is a hash ref of instructions on how to create a
    parser.

    The precise set of keys and values varies according to parser type.
    There are some common ones though:

    * length is an optional parameter that can be used to specify that
      this particular *regex* is only applicable to strings of a certain
      fixed length. This can be used to make parsers more efficient. It's
      strongly recommended that any parser that can use this parameter
      does.

      You may happily specify the same length twice. The parsers will be
      tried in order of specification.

      You can also specify multiple lengths by giving it an arrayref of
      numbers rather than just a single scalar. If doing so, please keep
      the number of lengths to a minimum.

      If any specifications without *length*s are given and the particular
      *length* parser fails, then the non-*length* parsers are tried.

      This parameter is ignored unless the specification is part of a
      multiple parser specification.

    * label provides a name for the specification and is passed to some of
      the callbacks about to be mentioned.

    * on_match and on_fail are callbacks. Both routines will be called
      with parameters of:

      * input, being the input to the parser (after any preprocessing
        callbacks).

      * label, being the label of the parser, if there is one.

      * self, being the object on which the method has been invoked
        (which may just be a class name). Naturally, you can then invoke
        your own methods on it to get information you want.

      * args, being an arrayref of any passed arguments, if any. If
        there were no arguments, then this parameter is not given.

      These routines will be called depending on whether the regex match
      succeeded or failed.

    * preprocess is a callback provided for cleaning up input prior to
      parsing. It's given a hash as arguments with the following keys:

      * input being the datetime string the parser was given (if using
        multiple specifications and an overall *preprocess* then this is
        the date after it's been through that preprocessor).

      * parsed being the state of parsing so far. Usually empty at this
        point unless an overall *preprocess* was given. Items may be
        placed in it and will be given to any postprocessor and
        `DateTime->new' (unless the postprocessor deletes them).

      * self, args, label as per *on_match* and *on_fail*.

      The return value from the routine is what is given to the *regex*.
      Note that this is the last code stop before the match.

      Note: mixing *length* and a *preprocess* that modifies the length of
      the input string is probably not what you meant to do. You probably
      meant to use the *multiple parser* variant of *preprocess* which is
      done before any length calculations. This `single parser' variant of
      *preprocess* is performed after any length calculations.

    * postprocess is the last code stop before `DateTime->new()' is
      called. It's given the same arguments as *preprocess*. This allows
      it to modify the parsed parameters after the parse and before the
      creation of the object.
      For example, you might use:

          {
              regex => qr/^(\d\d) (\d\d) (\d\d)$/,
              params => [qw( year month day )],
              postprocess => \&_fix_year,
          }

      where `_fix_year' is defined as:

          sub _fix_year
          {
              my %args = @_;
              my ($date, $p) = @args{qw( input parsed )};
              $p->{year} += $p->{year} > 69 ? 1900 : 2000;
              return 1;
          }

      This will cause two digit years to be corrected according to the
      cut off. If the year was '69' or lower, then 2000 is added to it
      (giving 2069, or 2045, or whatever the year was parsed as).
      Otherwise it is assumed to be 19xx. The DateTime::Format::Mail
      module uses code similar to this (only it allows the cut off to be
      configured and it doesn't use Builder).

      Note: It is very important to return an explicit value from the
      *postprocess* callback. If the return value is false then the parse
      is taken to have failed. If the return value is true, then the parse
      is taken to have succeeded and `DateTime->new()' is called.

    See the documentation for the individual parsers for their valid keys.

    Parsers at the time of writing are:

    * DateTime::Format::Builder::Parser::Regex - provides regular
      expression based parsing.

    * DateTime::Format::Builder::Parser::Strptime - provides strptime
      based parsing.

  Subroutines / coderefs as specifications.
    A single parser specification can be a coderef. This was added mostly
    because it could be and because I knew someone, somewhere, would want to
    use it.

    If the specification is a reference to a piece of code, be it a
    subroutine, anonymous, or whatever, then it's passed more or less
    straight through. The code should return `undef' in the event of
    failure (or any false value, but `undef' is strongly preferred), or a
    true value in the event of success (ideally a `DateTime' object or some
    object that has the same interface).
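
    A minimal sketch of a coderef sitting alongside an ordinary regex
    specification might look like this. The class name My::Format::Epoch
    and the `epoch:NNN' input format are made up for illustration, and the
    sketch assumes the coderef receives the invocant first and the input
    string second (which is how the tutorial's coderef examples read
    `$_[1]').

```perl
package My::Format::Epoch;    # hypothetical class name
use strict;
use warnings;
use DateTime;

use DateTime::Format::Builder (
    parsers => {
        parse_datetime => [
            # Ordinary regex specification, tried first.
            {
                regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
                params => [qw( year month day )],
            },
            # Coderef specification: return undef on failure,
            # a DateTime object on success.
            sub {
                my ( $self, $input ) = @_;    # assumed argument order
                return undef
                    unless defined $input && $input =~ /^epoch:(\d+)$/;
                return DateTime->from_epoch( epoch => $1 );
            },
        ],
    },
);

package main;

my $from_regex = My::Format::Epoch->parse_datetime('20030716');
my $from_code  = My::Format::Epoch->parse_datetime('epoch:86400');

print $from_regex->ymd, "\n";    # 2003-07-16
print $from_code->ymd,  "\n";    # 1970-01-02
```

    Returning `undef' rather than dying is what lets the multiple parser
    move on to the next specification when the coderef doesn't recognise
    its input.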

    This all said, I generally wouldn't recommend using this feature unless
    you have to.

  Callbacks
    I mention a number of callbacks in this document.

    Any time you see a callback being mentioned, you can, if you like,
    substitute an arrayref of coderefs rather than having the straight
    coderef.

MULTIPLE SPECIFICATIONS
    These are very easily described as an array of single specifications.

    Note that if the first element of the array is an arrayref, then you're
    specifying options.

    * preprocess lets you specify a preprocessor that is called before any
      of the parsers are tried. This lets you do things like strip off
      timezones or any unnecessary data. The most common use people have
      for it at present is to get the input date to a particular length so
      that the *length* is usable (DateTime::Format::ICal would use it to
      strip off the variable length timezone).

      Arguments are as for the *single parser* *preprocess* variant with
      the exception that *label* is never given.

    * on_fail should be a reference to a subroutine that is called if the
      parser fails. If this is not provided, the default action is to call
      `DateTime::Format::Builder::on_fail', or the `on_fail' method of the
      subclass of DTFB that was used to create the parser.

EXECUTION FLOW
    Builder allows you to plug in a fair few callbacks, which can make
    following how a parse failed (or succeeded unexpectedly) somewhat
    tricky.

  For Single Specifications
    A single specification will do the following:

    User calls parser:

        my $dt = $class->parse_datetime( $string );

    1 *preprocess* is called. It's given `$string' and a reference to the
      parsing workspace hash, which we'll call `$p'. At this point, `$p'
      is empty. The return value is used as `$date' for the rest of this
      single parser. Anything put in `$p' is also used for the rest of
      this single parser.

    2 *regex* is applied.

    3 If *regex* did not match, then *on_fail* is called (and is given
      `$date' and also *label* if it was defined). Any return value is
      ignored and the next thing is for the single parser to return
      `undef'.

      If *regex* did match, then *on_match* is called with the same
      arguments as would be given to *on_fail*. The return value is
      similarly ignored, but we then move to step 4 rather than exiting
      the parser.

    4 *postprocess* is called with `$date' and a filled out `$p'. The
      return value is taken as an indication of whether the parse was a
      success or not. If it wasn't a success then the single parser will
      exit at this point, returning `undef'.

    5 `DateTime->new()' is called and the user is given the resultant
      `DateTime' object.

    See the section on error handling regarding the `undef's mentioned
    above.

  For Multiple Specifications
    With multiple specifications:

    User calls parser:

        my $dt = $class->complex_parse( $string );

    1 The overall *preprocess*or is called and is given `$string' and the
      hashref `$p' (identically to the per parser *preprocess* mentioned
      in the previous flow).

      If the callback modifies `$p' then a copy of `$p' is given to each
      of the individual parsers. This is so parsers won't accidentally
      pollute each other's workspace.

    2 If an appropriate length specific parser is found, then it is called
      and the single parser flow (see the previous section) is followed,
      and the parser is given a copy of `$p' and the return value of the
      overall *preprocess*or as `$date'.

      If a `DateTime' object was returned, we go straight back to the
      user.

      If no appropriate parser was found, or the parser returned `undef',
      then we progress to step 3!

    3 Any non-*length* based parsers are tried in the order they were
      specified.

      For each of those, the single specification flow above is performed,
      with the parser given a copy of the output from the overall
      preprocessor.

      If a real `DateTime' object is returned then we exit back to the
      user.

      If no parser could parse, then an error is thrown.

    See the section on error handling regarding the `undef's mentioned
    above.

METHODS
    In the general course of things you won't need any of the methods. Life
    often throws unexpected things at us so the methods are all available
    for use.

  import
    `import()' is a wrapper for `create_class()'. If you specify the *class*
    option (see documentation for `create_class()') it will be ignored.

  create_class
    This method can be used as the runtime equivalent of `import()'. That
    is, it takes the exact same parameters as when one does:

        use DateTime::Format::Builder ( blah blah blah )

    That can be (almost) equivalently written as:

        use DateTime::Format::Builder;
        DateTime::Format::Builder->create_class( blah blah blah );

    The difference being that the first is done at compile time while the
    second is done at run time.

    In the tutorial I said there were only two parameters at present. I
    lied. There are actually five of them.

    * parsers takes a hashref of methods and their parser specifications.
      See the tutorial above for details.

      Note that if you define a subroutine of the same name as one of the
      methods you define here, an error will be thrown.

    * constructor determines whether and how to create a `new()' function
      in the new class. If given a true value, a constructor is created.
      If given a false value, one isn't.

      If given an anonymous sub or a reference to a sub then that is used
      as `new()'.

      The default is `1' (that is, create a constructor using our default
      code which simply creates a hashref and blesses it).

      If your class defines its own `new()' method it will not be
      overwritten. If you define your own `new()' and also tell Builder to
      define one, an error will be thrown.

    * verbose takes a value. If the value is undef, then logging is
      disabled. If the value is a filehandle then that's where logging
      will go. If it's a true value, then output will go to `STDERR'.

      Alternatively, call `$DateTime::Format::Builder::verbose()' with the
      relevant value. Whichever value is given more recently is adhered
      to.

      Be aware that verbosity is a global setting.

    * class is optional and specifies the name of the class in which to
      create the specified methods.

      If using this method in the guise of `import()' then this field will
      cause an error so it is only of use when calling as
      `create_class()'.

    * version is also optional and specifies the value to give `$VERSION'
      in the class. It's generally not recommended unless you're combining
      with the *class* option. An `ExtUtils::MakeMaker' / `CPAN' compliant
      version specification is much better.

    In addition to creating any of the methods it also creates a `new()'
    method that can instantiate (or clone) objects.

SUBCLASSING
    In the rest of the documentation I've often lied in order to get some of
    the ideas across more easily. The thing is, this module's very flexible.
    You can get markedly different behaviour from simply subclassing it and
    overriding some methods.

  create_method
    Given a parser coderef, returns a coderef that is suitable to be a
    method.

    The default action is to call `on_fail()' in the event of a non-parse,
    but you can make it do whatever you want.

  on_fail
    This is called in the event of a non-parse (unless you've overridden
    `create_method()' to do something else).

    The single argument is the input string. The default action is to call
    `croak()'.
    Above, where I've said parsers or methods throw errors, this is the
    method that is doing the error throwing.

    You could conceivably override this method to, say, return `undef'.

USING BUILDER OBJECTS aka USERS USING BUILDER
    The methods listed in the METHODS section are all you generally need
    when creating your own class. Sometimes you may not want a full blown
    class to parse something just for this one program. Some methods are
    provided to make that task easier.

  new
    The basic constructor. It takes no arguments, merely returns a new
    `DateTime::Format::Builder' object.

        my $parser = DateTime::Format::Builder->new();

    If called as a method on an object (rather than as a class method), then
    it clones the object.

        my $clone = $parser->new();

  clone
    Provided for those who prefer an explicit `clone()' method rather than
    using `new()' as an object method.

        my $clone_of_clone = $clone->clone();

  parser
    Given either a single or multiple parser specification, sets the object
    to have a parser based on that specification.

        $parser->parser(
            regex  => qr/^ (\d{4}) (\d\d) (\d\d) $/x,
            params => [qw( year month day )],
        );

    The arguments given to `parser()' are handed directly to
    `create_parser()'. The resultant parser is passed to `set_parser()'.

    If called as an object method, it returns the object.

    If called as a class method, it creates a new object, sets its parser
    and returns that object.

  set_parser
    Sets the parser of the object to the given parser.

        $parser->set_parser( $coderef );

    Note: this method does not take specifications. It also does not take
    anything except coderefs. Luckily, coderefs are what most of the other
    methods produce.

    The method return value is the object itself.

  get_parser
    Returns the parser the object is using.

        my $code = $parser->get_parser();

  parse_datetime
    Given a string, it calls the parser and returns the `DateTime' object
    that results.

        my $dt = $parser->parse_datetime( "19790716" );

    The return value, if not a `DateTime' object, is whatever the parser
    wants to return. Generally this means that if the parse failed an error
    will be thrown.

  format_datetime
    If you call this function, it will throw an error.

LONGER EXAMPLES
    Some longer examples are provided in the distribution. These implement
    some of the common parsing DateTime modules using Builder. Each of them
    is, or was, a drop-in replacement for the corresponding module at the
    time of writing.

THANKS
    Dave Rolsky (DROLSKY) for kickstarting the DateTime project, writing
    DateTime::Format::ICal and DateTime::Format::MySQL, and some much needed
    review.

    Joshua Hoblitt (JHOBLITT) for the concept, some of the API, impetus for
    writing the multilength code (both one length with multiple parsers and
    single parser with multiple lengths), blame for the Regex custom
    constructor code, spotting a bug in Dispatch, and more much needed
    review.

    Kellan Elliott-McCrea (KELLAN) for even more review, suggestions,
    DateTime::Format::W3CDTF and the encouragement to rewrite these docs
    almost 100%!

    Claus Färber (CFAERBER) for having me get around to fixing the
    auto-constructor writing, providing the 'args'/'self' patch, and
    suggesting the multi-callbacks.

    Rick Measham (RICKM) for DateTime::Format::Strptime which Builder now
    supports.

    Matthew McGillis for pointing out that `on_fail' overriding should be
    simpler.

    Simon Cozens (SIMON) for saying it was cool.

SUPPORT
    Support for this module is provided via the datetime@perl.org email
    list. See http://lists.perl.org/ for more details.

    Alternatively, log bugs via the CPAN RT system, by web or email:

        http://rt.cpan.org/NoAuth/ReportBug.html?Queue=DateTime%3A%3AFormat%3A%3ABuilder
        bug-datetime-format-builder@rt.cpan.org

    This makes it much easier for me to track things and thus means your
    problem is less likely to be neglected.

LICENCE AND COPYRIGHT
    Copyright © Iain Truskett, 2003. All rights reserved.

    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself, either Perl version 5.000 or, at
    your option, any later version of Perl 5 you may have available.

    The full text of the licences can be found in the Artistic and COPYING
    files included with this module, or in perlartistic and perlgpl as
    supplied with Perl 5.8.1 and later.

AUTHOR
    Originally written by Iain Truskett <spoon@cpan.org>, who died on
    December 29, 2003.

    Maintained by Dave Rolsky <autarch@urth.org>.

SEE ALSO
    `datetime@perl.org' mailing list.

    http://datetime.perl.org/

    perl, DateTime, DateTime::Format::Builder::Tutorial,
    DateTime::Format::Builder::Parser