package encoding::warnings;
$encoding::warnings::VERSION = '0.14';

use strict;
use 5.007;

=head1 NAME

encoding::warnings - Warn on implicit encoding conversions

=head1 VERSION

This document describes version 0.13 of encoding::warnings, released
June 20, 2016.

=head1 NOTICE

As of Perl 5.26.0, this module has no effect.  The internal Perl feature
that was used to implement this module has been removed.  In recent years,
much work has been done on the Perl core to eliminate discrepancies in the
treatment of upgraded versus downgraded strings.  In addition, the
L<encoding> pragma, which caused many of the problems, is no longer
supported.  Thus, the warnings this module produced are no longer
necessary.

Hence, if you load this module on Perl 5.26.0 or later, you will get one
warning that the module is no longer supported, and the module will do
nothing thereafter.

=head1 SYNOPSIS

    use encoding::warnings; # or 'FATAL' to raise fatal exceptions

    utf8::encode($a = chr(20000));  # a byte-string (raw bytes)
    $b = chr(20000);                # a unicode-string (wide characters)

    # "Bytes implicitly upgraded into wide characters as iso-8859-1"
    $c = $a . $b;

=head1 DESCRIPTION

=head2 Overview of the problem

By default, there is a fundamental asymmetry in Perl's unicode model:
implicit upgrading from byte-strings to unicode-strings assumes that
they were encoded in I<ISO 8859-1 (Latin-1)>, but unicode-strings are
downgraded with UTF-8 encoding.
This happens because the first 256
code points in Unicode happen to agree with Latin-1.

However, this silent upgrading can easily cause problems if you happen
to mix unicode-strings with non-Latin1 data -- i.e. byte-strings encoded
in UTF-8 or other encodings.  The error will not manifest until the
combined string is written to output, at which point it is impossible
to see where the silent upgrading occurred.

=head2 Detecting the problem

This module simplifies the process of diagnosing such problems.  Just put
this line at the top of your main program:

    use encoding::warnings;

Afterwards, implicit upgrading of high-bit bytes will raise a warning.
Ex.: C<Bytes implicitly upgraded into wide characters as iso-8859-1 at
- line 7>.

However, strings composed purely of ASCII code points (C<0x00>..C<0x7F>)
will I<not> trigger this warning.

You can also make the warnings fatal by importing this module as:

    use encoding::warnings 'FATAL';

=head2 Solving the problem

Most of the time, this warning occurs when a byte-string is concatenated
with a unicode-string.  There are a number of ways to solve it:

=over 4

=item * Upgrade both sides to unicode-strings

If your program does not need compatibility with Perl 5.6 and earlier,
the recommended approach is to apply appropriate IO disciplines, so all
data in your program become unicode-strings.  See L<encoding>, L<open> and
L<perlfunc/binmode> for how.


=item * Downgrade both sides to byte-strings

The other way works too, especially if you are sure that all your data
are under the same encoding, or if compatibility with older versions
of Perl is desired.

You may downgrade strings with C<Encode::encode> and C<utf8::encode>.
See L<Encode> and L<utf8> for details.

=item * Specify the encoding for implicit byte-string upgrading

If you are confident that all byte-strings will be in a specific
encoding like UTF-8, I<and> need not support older versions of Perl,
use the C<encoding> pragma:

    use encoding 'utf8';

Similarly, this will silence warnings from this module, and preserve the
default behaviour:

    use encoding 'iso-8859-1';

However, note that C<use encoding> actually had three distinct effects:

=over 4

=item * PerlIO layers for B<STDIN> and B<STDOUT>

This is similar to what the L<open> pragma does.

=item * Literal conversions

This turns I<all> literal strings in your program into unicode-strings
(equivalent to a C<use utf8>), by decoding them using the specified
encoding.

=item * Implicit upgrading for byte-strings

This will silence warnings from this module, as shown above.

=back

Because literal conversions also work on empty strings, it may surprise
some people:

    use encoding 'big5';

    my $a = pack("C*", 0xA4, 0x40);
    print length $a;    # 2 here.
    $a .= "";           # concatenating with a unicode string...
    print length $a;    # 1 here!

In other words, do not C<use encoding> unless you are certain that the
program will not deal with any raw, 8-bit binary data at all.

However, the C<Filter =E<gt> 1> flavor of C<use encoding> will I<not>
affect implicit upgrading for byte-strings, and is thus incapable of
silencing warnings from this module.  See L<encoding> for more details.

=back

=head1 CAVEATS

For Perl 5.9.4 or later, this module's effect is lexical.

For Perl versions prior to 5.9.4, this module affects the whole script,
not just its enclosing lexical block.

=cut

# Constants.
sub ASCII  () { 0 }
sub LATIN1 () { 1 }
sub FATAL  () { 2 }

sub import {
    if ($] >= 5.025003) {
        require Carp;
        Carp::cluck(
            "encoding::warnings is not supported on Perl 5.26.0 and later"
        );
        return;
    }

    # Install a ${^ENCODING} handler if no other one is already in place.
    my $class = shift;
    my $fatal = shift || '';

    local $@;
    return if ${^ENCODING} and ref(${^ENCODING}) ne $class;
    return unless eval { require Encode; 1 };

    my $ascii  = Encode::find_encoding('us-ascii') or return;
    my $latin1 = Encode::find_encoding('iso-8859-1') or return;

    # Have to undef explicitly here
    undef ${^ENCODING};

    # Install a warning handler for decode()
    my $decoder = bless(
        [
            $ascii,
            $latin1,
            (($fatal eq 'FATAL') ?
                'Carp::croak' : 'Carp::carp'),
        ], $class,
    );

    no warnings 'deprecated';
    ${^ENCODING} = $decoder;
    use warnings 'deprecated';
    $^H{$class} = 1;
}

sub unimport {
    my $class = shift;
    $^H{$class} = undef;
    undef ${^ENCODING};
}

# Don't worry about source code literals.
sub cat_decode {
    my $self = shift;
    return $self->[LATIN1]->cat_decode(@_);
}

# Warn if the data is not purely US-ASCII.
sub decode {
    my $self = shift;

    DO_WARN: {
        if ($] >= 5.009004) {
            my $hints = (caller(0))[10];
            $hints->{ref($self)} or last DO_WARN;
        }

        local $@;
        my $rv = eval { $self->[ASCII]->decode($_[0], Encode::FB_CROAK()) };
        return $rv unless $@;

        require Carp;
        no strict 'refs';
        $self->[FATAL]->(
            "Bytes implicitly upgraded into wide characters as iso-8859-1"
        );

    }

    return $self->[LATIN1]->decode(@_);
}

sub name { 'iso-8859-1' }

1;

__END__

=head1 SEE ALSO

L<perlunicode>, L<perluniintro>

L<open>, L<utf8>, L<encoding>, L<Encode>

=head1 AUTHORS

Audrey Tang

=head1 COPYRIGHT

Copyright 2004, 2005, 2006, 2007 by Audrey Tang E<lt>cpan@audreyt.orgE<gt>.

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

See L<http://www.perl.com/perl/misc/Artistic.html>

=cut
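
=head1 EXAMPLE

The following is a minimal sketch, not part of the module itself, of the
mismatch described under L</Overview of the problem> and of the explicit
decoding suggested under L</Solving the problem>.  It assumes only the core
L<Encode> module and does not load encoding::warnings, so it should behave
the same on Perl versions where this module has no effect.  The variable
names are illustrative.

    use strict;
    use warnings;
    use Encode qw(decode encode);

    my $chars = chr(20000);                     # unicode-string: one wide character
    my $bytes = encode('UTF-8', chr(20000));    # byte-string: three raw bytes

    # Implicit upgrade: each raw byte is read as one Latin-1 character,
    # so the three UTF-8 bytes become three separate characters.
    my $mixed = $bytes . $chars;
    print length($mixed), "\n";                 # 4

    # Explicit decode first, so both sides are unicode-strings.
    my $fixed = decode('UTF-8', $bytes) . $chars;
    print length($fixed), "\n";                 # 2

Decoding at the boundary keeps the choice of encoding visible in the code
instead of leaving it to the implicit Latin-1 assumption.

=cut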