package encoding::warnings;
$encoding::warnings::VERSION = '0.14';

use strict;
use 5.007;

=head1 NAME

encoding::warnings - Warn on implicit encoding conversions

=head1 VERSION

This document describes version 0.13 of encoding::warnings, released
June 20, 2016.

=head1 NOTICE

As of Perl 5.26.0, this module has no effect.  The internal Perl feature
that was used to implement this module has been removed.  In recent years,
much work has been done on the Perl core to eliminate discrepancies in the
treatment of upgraded versus downgraded strings.  In addition, the
L<encoding> pragma, which caused many of the problems, is no longer
supported.  Thus, the warnings this module produced are no longer
necessary.

Hence, if you load this module on Perl 5.26.0 or later, you will get one
warning that the module is no longer supported, and the module will do
nothing thereafter.
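
Code that must run on both older and newer Perls can avoid that warning
by loading the module conditionally.  The following is only a minimal
sketch, not something this module provides; the version threshold is an
assumption taken from the check in this module's own C<import>:

    use strict;
    use warnings;

    BEGIN {
        # Assumption: 5.025003 is the cut-off used by import() in this
        # version of the module; adjust it if your copy differs.
        if ($] < 5.025003) {
            require encoding::warnings;
            encoding::warnings->import;
        }
    }

On Perl 5.26.0 and later the block does nothing, so the module is never
loaded and no warning is emitted.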

=head1 SYNOPSIS

    use encoding::warnings; # or 'FATAL' to raise fatal exceptions

    utf8::encode($a = chr(20000));  # a byte-string (raw bytes)
    $b = chr(20000);                # a unicode-string (wide characters)

    # "Bytes implicitly upgraded into wide characters as iso-8859-1"
    $c = $a . $b;

=head1 DESCRIPTION

=head2 Overview of the problem

By default, there is a fundamental asymmetry in Perl's unicode model:
implicit upgrading from byte-strings to unicode-strings assumes that
they were encoded in I<ISO 8859-1 (Latin-1)>, but unicode-strings are
downgraded with UTF-8 encoding.  This happens because the first 256
codepoints in Unicode happen to agree with Latin-1.

However, this silent upgrading can easily cause problems if you happen
to mix unicode-strings with non-Latin-1 data, i.e. byte-strings encoded
in UTF-8 or other encodings.  The error will not manifest until the
combined string is written to output, at which point it is impossible
to see where the silent upgrading occurred.
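
The effect can be reproduced without this module.  The sketch below is
illustrative only; the variable names are made up, and the byte values
are simply the UTF-8 encoding of one CJK character:

    use strict;
    use warnings;

    # Raw bytes: the UTF-8 encoding of U+4E16, never decoded.
    my $bytes = "\xE4\xB8\x96";
    # A unicode-string holding one wide character, U+754C.
    my $chars = "\x{754C}";

    # Concatenation upgrades $bytes as if it were Latin-1, so each of
    # its three bytes becomes a separate character.
    my $mixed = $bytes . $chars;
    print length($mixed), "\n";     # 4, not the intended 2

    # Encoding the result for output double-encodes the original bytes.
    my $out = $mixed;
    utf8::encode($out);
    print length($out), "\n";       # 9 bytes instead of 6

Nothing in the output step hints at which concatenation introduced the
damage, which is the diagnostic gap described above.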

=head2 Detecting the problem

This module simplifies the process of diagnosing such problems.  Just put
this line at the top of your main program:

    use encoding::warnings;

Afterwards, implicit upgrading of high-bit bytes will raise a warning.
For example: C<Bytes implicitly upgraded into wide characters as iso-8859-1 at
- line 7>.

However, strings composed purely of ASCII code points (C<0x00>..C<0x7F>)
will I<not> trigger this warning.
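
On Perls where this module has no effect, a similar check can be made by
hand at the points where strings are combined.  This is a rough sketch
only: C<looks_risky> is a hypothetical helper, not part of this module,
and it relies on C<utf8::is_utf8>, which inspects Perl's internal string
flag, so it merely approximates the condition the warning reported:

    use strict;
    use warnings;

    # Hypothetical helper: true for a byte-string that contains
    # high-bit bytes, i.e. one whose meaning could change if it were
    # implicitly upgraded as Latin-1.
    sub looks_risky {
        my ($str) = @_;
        return 0 if utf8::is_utf8($str);         # already a unicode-string
        return $str =~ /[^\x00-\x7F]/ ? 1 : 0;   # any byte above 0x7F?
    }

    print looks_risky("plain ASCII"), "\n";      # 0
    print looks_risky("\xE4\xB8\x96"), "\n";     # 1
    print looks_risky("\x{4E16}"), "\n";         # 0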

You can also make the warnings fatal by importing this module as:

    use encoding::warnings 'FATAL';

=head2 Solving the problem

Most of the time, this warning occurs when a byte-string is concatenated
with a unicode-string.  There are a number of ways to solve it:

=over 4

=item * Upgrade both sides to unicode-strings

If your program does not need compatibility with Perl 5.6 and earlier,
the recommended approach is to apply appropriate IO disciplines, so that
all data in your program become unicode-strings.  See L<encoding>, L<open>
and L<perlfunc/binmode> for how.

=item * Downgrade both sides to byte-strings

The other way works too, especially if you are sure that all your data
are under the same encoding, or if compatibility with older versions
of Perl is desired.

You may downgrade strings with C<Encode::encode> and C<utf8::encode>.
See L<Encode> and L<utf8> for details.

=item * Specify the encoding for implicit byte-string upgrading

If you are confident that all byte-strings will be in a specific
encoding like UTF-8, I<and> need not support older versions of Perl,
use the C<encoding> pragma:

    use encoding 'utf8';

Similarly, this will silence warnings from this module, and preserve the
default behaviour:

    use encoding 'iso-8859-1';

However, note that C<use encoding> actually had three distinct effects:

=over 4

=item * PerlIO layers for B<STDIN> and B<STDOUT>

This is similar to what the L<open> pragma does.

=item * Literal conversions

This turns I<all> literal strings in your program into unicode-strings
(equivalent to a C<use utf8>), by decoding them using the specified
encoding.

=item * Implicit upgrading for byte-strings

This will silence warnings from this module, as shown above.

=back

Because literal conversions also apply to empty strings, the result may
surprise some people:

    use encoding 'big5';

    my $byte_string = pack("C*", 0xA4, 0x40);
    print length $byte_string;    # 2 here.
    $byte_string .= "";           # concatenating with a unicode string...
    print length $byte_string;    # 1 here!

In other words, do not C<use encoding> unless you are certain that the
program will not deal with any raw, 8-bit binary data at all.

However, the C<Filter =E<gt> 1> flavor of C<use encoding> will I<not>
affect implicit upgrading for byte-strings, and is thus incapable of
silencing warnings from this module.  See L<encoding> for more details.

=back
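
The first two approaches in the list above can be sketched with explicit
conversions at the point where the strings meet.  This is an
illustration, not the only way to do it; it assumes the byte-string
holds UTF-8 data, and the variable names are made up:

    use strict;
    use warnings;
    use Encode qw(decode encode);

    my $bytes = "\xE4\xB8\x96";     # UTF-8 bytes for U+4E16
    my $chars = "\x{754C}";         # unicode-string, U+754C

    # Upgrade both sides: decode the bytes explicitly before joining.
    my $as_chars = decode('UTF-8', $bytes) . $chars;
    print length($as_chars), "\n";  # 2 characters

    # Downgrade both sides: encode the unicode-string before joining.
    my $as_bytes = $bytes . encode('UTF-8', $chars);
    print length($as_bytes), "\n";  # 6 bytes

Either way, no implicit upgrade takes place, because both operands are
of the same kind when they are concatenated.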

=head1 CAVEATS

For Perl 5.9.4 or later, this module's effect is lexical.

For Perl versions prior to 5.9.4, this module affects the whole script,
instead of inside its lexical block.
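
The lexical behaviour relies on the hints hash C<%^H>, which this
module's C<import> sets and its C<decode> reads through C<caller>.  The
sketch below shows that mechanism in isolation.  C<Demo::pragma> is a
hypothetical package invented for the example, and Perl 5.10 or later
is assumed:

    use strict;
    use warnings;

    package Demo::pragma;       # hypothetical, for illustration only

    sub import   { $^H{'Demo::pragma'} = 1 }
    sub unimport { $^H{'Demo::pragma'} = 0 }

    # Report whether the pragma is enabled in the caller's scope.
    sub enabled {
        my $hints = (caller(0))[10];
        return $hints && $hints->{'Demo::pragma'} ? 1 : 0;
    }

    package main;

    my @seen;
    BEGIN { Demo::pragma->import }          # same as "use Demo::pragma"
    push @seen, Demo::pragma::enabled();
    {
        BEGIN { Demo::pragma->unimport }    # same as "no Demo::pragma"
        push @seen, Demo::pragma::enabled();
    }
    push @seen, Demo::pragma::enabled();
    print "@seen\n";                        # 1 0 1

The setting made inside the inner block ends with that block, which is
the scoping this module follows on Perl 5.9.4 and later.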

=cut

# Constants.
sub ASCII  () { 0 }
sub LATIN1 () { 1 }
sub FATAL  () { 2 }

sub import {
    if ($] >= 5.025003) {
        require Carp;
        Carp::cluck(
            "encoding::warnings is not supported on Perl 5.26.0 and later"
        );
        return;
    }

    # Install a ${^ENCODING} handler if no other one is already in place.
    my $class = shift;
    my $fatal = shift || '';

    local $@;
    return if ${^ENCODING} and ref(${^ENCODING}) ne $class;
    return unless eval { require Encode; 1 };

    my $ascii  = Encode::find_encoding('us-ascii') or return;
    my $latin1 = Encode::find_encoding('iso-8859-1') or return;

    # Have to undef explicitly here
    undef ${^ENCODING};

    # Install a warning handler for decode()
    my $decoder = bless(
        [
            $ascii,
            $latin1,
            (($fatal eq 'FATAL') ? 'Carp::croak' : 'Carp::carp'),
        ], $class,
    );

    no warnings 'deprecated';
    ${^ENCODING} = $decoder;
    use warnings 'deprecated';
    $^H{$class} = 1;
}

sub unimport {
    my $class = shift;
    $^H{$class} = undef;
    undef ${^ENCODING};
}

# Don't worry about source code literals.
sub cat_decode {
    my $self = shift;
    return $self->[LATIN1]->cat_decode(@_);
}

# Warn if the data is not purely US-ASCII.
sub decode {
    my $self = shift;

    DO_WARN: {
        if ($] >= 5.009004) {
            my $hints = (caller(0))[10];
            $hints->{ref($self)} or last DO_WARN;
        }

        local $@;
        my $rv = eval { $self->[ASCII]->decode($_[0], Encode::FB_CROAK()) };
        return $rv unless $@;

        require Carp;
        no strict 'refs';
        $self->[FATAL]->(
            "Bytes implicitly upgraded into wide characters as iso-8859-1"
        );

    }

    return $self->[LATIN1]->decode(@_);
}

sub name { 'iso-8859-1' }

1;

__END__

=head1 SEE ALSO

L<perlunicode>, L<perluniintro>

L<open>, L<utf8>, L<encoding>, L<Encode>

=head1 AUTHORS

Audrey Tang

=head1 COPYRIGHT

Copyright 2004, 2005, 2006, 2007 by Audrey Tang E<lt>cpan@audreyt.orgE<gt>.

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

See L<http://www.perl.com/perl/misc/Artistic.html>

=cut
