package PerlIO;

our $VERSION = '1.03';

# Map layer name to package that defines it
our %alias;

# Called as "use PerlIO 'foo'": try to load the package behind each layer name.
sub import
{
 my $class = shift;
 while (@_)
  {
   my $layer = shift;
   if (exists $alias{$layer})
    {
     # An alias names the defining package directly
     $layer = $alias{$layer};
    }
   else
    {
     # Default: layer 'foo' lives in PerlIO::foo
     $layer = "${class}::$layer";
    }
   eval "require $layer";
   warn $@ if $@;
  }
}

sub F_UTF8 () { 0x8000 }

1;
__END__

=head1 NAME

PerlIO - On demand loader for PerlIO layers and root of PerlIO::* name space

=head1 SYNOPSIS

  open($fh,"<:crlf", "my.txt"); # portably open a text file for reading

  open($fh,"<","his.jpg");      # portably open a binary file for reading
  binmode($fh);

  Shell:
    PERLIO=perlio perl ....

=head1 DESCRIPTION

When an undefined layer 'foo' is encountered in an C<open> or
C<binmode> layer specification then C code performs the equivalent of:

  use PerlIO 'foo';

The perl code in PerlIO.pm then attempts to locate a layer by doing

  require PerlIO::foo;

Otherwise the C<PerlIO> package is a place holder for additional
PerlIO related functions.

The following layers are currently defined:

=over 4

=item :unix

Lowest level layer which provides basic PerlIO operations in terms of
UNIX/POSIX numeric file descriptor calls
(open(), read(), write(), lseek(), close()).

=item :stdio

Layer which calls C<fread>, C<fwrite> and C<fseek>/C<ftell> etc.  Note
that as this is "real" stdio it will ignore any layers beneath it and
go straight to the operating system via the C library as usual.

=item :perlio

A from scratch implementation of buffering for PerlIO. Provides fast
access to the buffer for C<sv_gets> which implements perl's readline/E<lt>E<gt>
and in general attempts to minimize data copying.

C<:perlio> will insert a C<:unix> layer below itself to do low level IO.

=item :crlf

A layer that implements DOS/Windows like CRLF line endings.
On read, it converts pairs of CR,LF to a single "\n" newline character.
On write, it converts each "\n" to a CR,LF pair.  Note that this layer
likes to be one of its kind: it silently ignores attempts to be pushed
into the layer stack more than once.

It currently does I<not> mimic MS-DOS in treating Control-Z
as an end-of-file marker.

(Gory details follow) To be more exact, what happens is this: after
pushing itself to the stack, the C<:crlf> layer checks all the layers
below itself to find the first layer that is capable of being a CRLF
layer but is not yet enabled to be a CRLF layer.  If it finds such a
layer, it enables the CRLFness of that other deeper layer, and then
pops itself off the stack.  If not, fine, use the one we just pushed.

The end result is that a C<:crlf> means "please enable the first CRLF
layer you can find, and if you can't find one, here would be a good
spot to place a new one."

Based on the C<:perlio> layer.

=item :mmap

A layer which implements "reading" of files by using C<mmap()> to
make the (whole) file appear in the process's address space, and then
using that as PerlIO's "buffer". This I<may> be faster in certain
circumstances for large files, and may result in less physical memory
use when multiple processes are reading the same file.

Files which are not C<mmap()>-able revert to behaving like the C<:perlio>
layer. Writes also behave like the C<:perlio> layer, as C<mmap()> for write
needs extra house-keeping (to extend the file) which negates any advantage.

The C<:mmap> layer will not exist if the platform does not support C<mmap()>.

=item :utf8

Declares that the stream accepts perl's internal encoding of
characters.  (Which really is UTF-8 on ASCII machines, but is
UTF-EBCDIC on EBCDIC machines.)  This allows any character perl can
represent to be read from or written to the stream.
The UTF-X encoding
is chosen to render simple text parts (i.e.  non-accented letters,
digits and common punctuation) human readable in the encoded file.

Here is how to write your native data out using UTF-8 (or UTF-EBCDIC)
and then read it back in.

	open(F, ">:utf8", "data.utf");
	print F $out;
	close(F);

	open(F, "<:utf8", "data.utf");
	$in = <F>;
	close(F);

=item :bytes

This is the inverse of the C<:utf8> layer. It turns off the flag
on the layer below so that data read from it is considered to
be "octets" i.e. characters in range 0..255 only. Likewise
on output perl will warn if a "wide" character is written
to such a stream.

=item :raw

The C<:raw> layer is I<defined> as being identical to calling
C<binmode($fh)> - the stream is made suitable for passing binary data
i.e. each byte is passed as-is. The stream will still be
buffered.

In Perl 5.6 and some books the C<:raw> layer (previously sometimes also
referred to as a "discipline") is documented as the inverse of the
C<:crlf> layer. That is no longer the case - other layers which would
alter the binary nature of the stream are also disabled.  If you want UNIX
line endings on a platform that normally does CRLF translation, but still
want UTF-8 or encoding defaults, the appropriate thing to do is to add
C<:perlio> to the PERLIO environment variable.

The implementation of C<:raw> is as a pseudo-layer which when "pushed"
pops itself and then any layers which do not declare themselves as suitable
for binary data. (Undoing :utf8 and :crlf are implemented by clearing
flags rather than popping layers but that is an implementation detail.)

As a consequence of the fact that C<:raw> normally pops layers
it usually only makes sense to have it as the only or first element in
a layer specification.  When used as the first element it provides
a known base on which to build e.g.


    open($fh,":raw:utf8",...)

will construct a "binary" stream, but then enable UTF-8 translation.

=item :pop

A pseudo layer that removes the top-most layer. Gives perl code
a way to manipulate the layer stack. Should be considered
as experimental. Note that C<:pop> only works on real layers
and will not undo the effects of pseudo layers like C<:utf8>.
An example of a possible use might be:

    open($fh,...)
    ...
    binmode($fh,":encoding(...)");  # next chunk is encoded
    ...
    binmode($fh,":pop");            # back to un-encoded

A more elegant (and safer) interface is needed.

=item :win32

On Win32 platforms this I<experimental> layer uses native "handle" IO
rather than unix-like numeric file descriptor layer. Known to be
buggy as of perl 5.8.2.

=back

=head2 Custom Layers

It is possible to write custom layers in addition to the above builtin
ones, both in C/XS and Perl.  Two such layers (and one example written
in Perl using the latter) come with the Perl distribution.

=over 4

=item :encoding

Use C<:encoding(ENCODING)> either in open() or binmode() to install
a layer that transparently does character set and encoding transformations,
for example from Shift-JIS to Unicode.  Note that under C<stdio>
an C<:encoding> also enables C<:utf8>.  See L<PerlIO::encoding>
for more information.

=item :via

Use C<:via(MODULE)> either in open() or binmode() to install a layer
that does whatever transformation (for example compression /
decompression, encryption / decryption) to the filehandle.
See L<PerlIO::via> for more information.

=back

=head2 Alternatives to raw

To get a binary stream an alternate method is to use:

    open($fh,"whatever");
    binmode($fh);

This has the advantage of being backward compatible with how such things
have had to be coded on some platforms for years.

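
The C<open> plus C<binmode> idiom above can be sketched as a small
round trip. This is only an illustration: the temporary file created with
C<File::Temp> and the names C<$payload>, C<$out>, C<$in> and C<$data> are
assumptions made for the example and are not part of PerlIO.

```perl
    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # Hypothetical scratch file, used only for this illustration.
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close($tmp_fh);

    # Every byte value once, including CR, LF and Control-Z.
    my $payload = join '', map { chr } 0 .. 255;

    # Write: plain open() followed by binmode() gives a binary stream.
    open(my $out, '>', $path) or die "open for write failed: $!";
    binmode($out);
    print {$out} $payload;
    close($out) or die "close failed: $!";

    # Read back the same way and slurp the whole file.
    open(my $in, '<', $path) or die "open for read failed: $!";
    binmode($in);
    my $data = do { local $/; <$in> };
    close($in);

    print $data eq $payload ? "round trip ok\n" : "round trip failed\n";
```

On a platform that does CRLF translation by default, leaving out the two
C<binmode> calls would be expected to change the bytes that reach the file.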

To get an un-buffered stream specify an unbuffered layer (e.g. C<:unix>)
in the open call:

    open($fh,"<:unix",$path);

=head2 Defaults and how to override them

If the platform is MS-DOS like and normally does CRLF to "\n"
translation for text files then the default layers are:

  unix crlf

(The low level "unix" layer may be replaced by a platform specific low
level layer.)

Otherwise if C<Configure> found out how to do "fast" IO using system's
stdio, then the default layers are:

  unix stdio

Otherwise the default layers are:

  unix perlio

These defaults may change once perlio has been better tested and tuned.

The default can be overridden by setting the environment variable
PERLIO to a space separated list of layers (C<unix> or platform low
level layer is always pushed first).

This can be used to see the effect of/bugs in the various layers e.g.

  cd .../perl/t
  PERLIO=stdio  ./perl harness
  PERLIO=perlio ./perl harness

For the various values of PERLIO see L<perlrun/PERLIO>.

=head2 Querying the layers of filehandles

The following returns the B<names> of the PerlIO layers on a filehandle.

   my @layers = PerlIO::get_layers($fh); # Or FH, *FH, "FH".

The layers are returned in the order an open() or binmode() call would
use them.  Note that the "default stack" depends on the operating
system and on the Perl version, and both the compile-time and
runtime configurations of Perl.

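
As a minimal sketch of the call described above, the following lists the
layer names before and after C<:utf8> is enabled on a handle. The
temporary file from C<File::Temp> and the names C<@before> and C<@after>
are assumptions made for the example. No output is shown because the
default stack differs between platforms and builds.

```perl
    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # Hypothetical scratch handle, used only for this illustration.
    my ($fh, $path) = tempfile(UNLINK => 1);

    # Layer names as the default stack on this platform provides them.
    my @before = PerlIO::get_layers($fh);

    # Enable the utf8 flag, then ask again.
    binmode($fh, ':utf8');
    my @after = PerlIO::get_layers($fh);

    print "before: @before\n";
    print "after:  @after\n";
```

The second list is expected to end with C<utf8>, reported as a name even
though it is a flag on a real layer rather than a layer of its own.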

The following table summarizes the default layers on UNIX-like and
DOS-like platforms, depending on the setting of C<$ENV{PERLIO}>:

 PERLIO     UNIX-like                DOS-like

 unset / "" unix perlio / stdio [1]  unix crlf
 stdio      unix perlio / stdio [1]  stdio
 perlio     unix perlio              unix perlio
 mmap       unix mmap                unix mmap

 # [1] "stdio" if Configure found out how to do "fast stdio" (depends
 # on the stdio implementation) and in Perl 5.8, otherwise "unix perlio"

By default the layers from the input side of the filehandle are
returned; to get the output side use the optional C<output> argument:

   my @layers = PerlIO::get_layers($fh, output => 1);

(Usually the layers are identical on either side of a filehandle but
for example with sockets there may be differences, or if you have
been using the C<open> pragma.)

There is no set_layers(), nor does get_layers() return a tied array
mirroring the stack, or anything fancy like that.  This is not
accidental or unintentional.  The PerlIO layer stack is a bit more
complicated than just a stack (see for example the behaviour of C<:raw>).
You are supposed to use open() and binmode() to manipulate the stack.

B<Implementation details follow, please close your eyes.>

The arguments to layers are by default returned in parentheses after
the name of the layer, and certain layers (like C<utf8>) are not real
layers but instead flags on real layers: to get all of these returned
separately use the optional C<details> argument:

   my @layer_and_args_and_flags = PerlIO::get_layers($fh, details => 1);

The result will be up to three times the number of layers:
the first element will be a name, the second element the arguments
(unspecified arguments will be C<undef>), the third element the flags,
the fourth element a name again, and so forth.

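
A rough sketch of walking the C<details> list three elements at a time
follows. The temporary file, the names C<@details>, C<$count> and C<@rows>,
and the local C<F_UTF8> constant are assumptions made for the example. The
constant repeats the value of C<PerlIO::F_UTF8> from the code at the top
of this file, and testing that bit in the flags is itself an assumption
about the implementation.

```perl
    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # Assumed to match PerlIO::F_UTF8 as defined at the top of this file.
    use constant F_UTF8 => 0x8000;

    # Hypothetical scratch handle, used only for this illustration.
    my ($fh, $path) = tempfile(UNLINK => 1);
    binmode($fh, ':utf8');

    my @details = PerlIO::get_layers($fh, details => 1);
    my $count   = @details;    # remembered before the list is consumed

    # Consume the flat list as (name, arguments, flags) triples.
    my @rows;
    while (my ($name, $args, $flags) = splice(@details, 0, 3)) {
        push @rows, [ $name, $args, $flags ];
        printf "%-10s args=%-8s flags=%-10s utf8=%s\n",
            $name // 'undef',
            $args // 'undef',
            defined $flags ? sprintf('0x%08x', $flags) : 'undef',
            (defined $flags && ($flags & F_UTF8)) ? 'yes' : 'no';
    }
```

Unlike the default call, this form does not append C<utf8> as a separate
name; the flag has to be read from the flags element instead.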

B<You may open your eyes now.>

=head1 AUTHOR

Nick Ing-Simmons E<lt>nick@ing-simmons.netE<gt>

=head1 SEE ALSO

L<perlfunc/"binmode">, L<perlfunc/"open">, L<perlunicode>, L<perliol>,
L<Encode>

=cut