Re: output ampersand using XML::Twig
From: Michel Rodriguez (mirod_at_xmltwig.com)
Date: 10/27/03
- In reply to: Dave Roe: "output ampersand using XML::Twig"
Date: Mon, 27 Oct 2003 13:49:27 +0100
Dave Roe wrote:
> I am using XML::Twig to generate HTML output during an Apache request.
> How can I output '&nbsp;' without it being converted into '&amp;nbsp'?
> (the ampersand is converted into &amp; and I lose the last semi-colon.)
> Is it an encoding issue or something that can be resolved with a CDATA
> section?
> #!/usr/bin/perl -w
>
> use strict;
> use XML::Twig;
>
> my $box = new XML::Twig::Elt('box');
> $box->set_content('&nbsp');
>
> # I've also tried this:
> # my $box = new XML::Twig::Elt('#CDATA' => '&nbsp;')->wrap_in('box');
>
> my $twig = new XML::Twig();
> $twig->set_root($box);
>
> my $xml = $twig->sprint();
> print STDERR "$xml\n";
Hi,
The easy answer first: the reason you lose the semicolon is... because
it was never there: you wrote $box->set_content('&nbsp') ;--)
Now for the real problem:
When you create the element with XML::Twig::Elt, the string is stored
as-is as the text content of the element, and it is only escaped when it
is output. sprint outputs XML, so the ampersand is escaped and '&nbsp'
comes out as '&amp;nbsp', which is what sprint does.
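To make the escaping step concrete, here is a simplified sketch of what happens to element text on output. The xml_escape sub is hypothetical (it is not XML::Twig's own code) and only handles &, < and >, which is enough to show where '&amp;nbsp' comes from.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, simplified escaper for element text; NOT XML::Twig's
# own code. '&' is replaced first so that the '&' introduced by
# '&lt;' and '&gt;' is not escaped a second time.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# set_content stores plain text, so its '&' is just another character.
for my $content ('&nbsp', 'a < b & c') {
    printf "%-10s => %s\n", $content, xml_escape($content);
}
```

For the string from the question this prints &amp;nbsp, i.e. the output Dave describes.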
BTW, the reason for this is that when you use XML::Twig to process
existing XML data, it receives unescaped UTF-8 strings from the parser
(expat, through XML::Parser): if you have &#160; in the original XML,
then XML::Twig receives the non-breaking space character, and if you
have &amp; it receives just &. So the text has to be escaped again on
output (unless you use the keep_encoding option, in which case no
escaping is done).
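The parser side can be sketched the same way. The decode_refs sub below is a hypothetical, deliberately incomplete stand-in for the reference handling expat does before XML::Twig sees the text (only numeric character references and &amp; are handled); it shows that &#160; arrives as the single character U+00A0 and &amp; as a bare &.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, deliberately incomplete stand-in for the parser:
# resolves hex (&#xA0;) and decimal (&#160;) character references
# and &amp;. A real XML parser handles much more than this.
sub decode_refs {
    my ($xml_text) = @_;
    $xml_text =~ s/&#x([0-9a-fA-F]+);/chr(hex $1)/ge;
    $xml_text =~ s/&#([0-9]+);/chr($1)/ge;
    # &amp; is resolved last, so '&amp;#160;' becomes the literal
    # text '&#160;' instead of a non-breaking space.
    $xml_text =~ s/&amp;/&/g;
    return $xml_text;
}

for my $source ('&#160;', '&#xA0;', '&amp;') {
    my $text = decode_refs($source);
    printf "%-7s => %d character(s), code point U+%04X\n",
        $source, length($text), ord($text);
}
```

Once decoded, the string no longer contains any entity, which is why escaping on output is needed to get valid XML back.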
There are many ways to deal with this. Except for the first two
solutions, the basic idea is to put the Unicode character for &nbsp;
(U+00A0) in the string and then, in most cases, use an output filter to
convert it to an entity when sprint is called. Pick the one you like
best (all code below was tested with Perl 5.8.1; where noted it was also
tested with a stock 5.6.1):
#!/usr/bin/perl -w
use strict;
use XML::Twig;
my $tag= 'box';
{ # the hackish way: turn off XML escapes for the element content
  # also works on 5.6.1
  my $box = new XML::Twig::Elt( $tag => '&nbsp;');
  $box->set_asis( 1);
  my $twig = new XML::Twig();
  $twig->set_root( $box);
  my $xml = $twig->sprint();
  printf STDERR "%-35s: %s\n", "turn off xml escape for the element", $xml;
}
{ # another hackish way: use the keep_encoding option
  # also works on 5.6.1
  my $box = new XML::Twig::Elt( $tag => '&nbsp;');
  my $twig = new XML::Twig(keep_encoding => 1);
  $twig->set_root( $box);
  my $xml = $twig->sprint();
  printf STDERR "%-35s: %s\n", "use the keep_encoding mode", $xml;
}
{ # just output the character; Unicode-aware browsers
  # will display it properly
  # also works on 5.6.1
  my $box = new XML::Twig::Elt( $tag => "\x{a0}");
  my $twig = new XML::Twig();
  $twig->set_root( $box);
  my $xml = $twig->sprint();
  printf STDERR "%-35s: %s\n", "output character", $xml;
}
{ # use an Encode output filter that encodes (using decimal
  # character entities) anything outside the pure ascii range
  use Encode;
  my $filter= sub { return encode( "ascii", $_[0], Encode::FB_HTMLCREF) };
  my $twig = new XML::Twig( output_filter => $filter);
  my $box = new XML::Twig::Elt( $tag => "\x{a0}");
  $twig->set_root( $box);
  my $xml = $twig->sprint();
  printf STDERR "%-35s: %s\n", "using html character entities", $xml;
}
{ # use an Encode output filter that encodes (using hexadecimal
  # character entities) anything outside the pure ascii range
  use Encode;
  my $filter= sub { return encode( "ascii", $_[0], Encode::FB_XMLCREF) };
  my $twig = new XML::Twig( output_filter => $filter);
  my $box = new XML::Twig::Elt( $tag => "\x{a0}");
  $twig->set_root( $box);
  my $xml = $twig->sprint();
  printf STDERR "%-35s: %s\n", "using xml character entities", $xml;
}
{ # use charnames ':full' to enter the special character by name
  use Encode;
  use charnames ':full';
  my $filter= sub { return encode( "ascii", $_[0], Encode::FB_XMLCREF) };
  my $twig = new XML::Twig( output_filter => $filter);
  my $box = new XML::Twig::Elt( $tag => "\N{NO-BREAK SPACE}");
  $twig->set_root( $box);
  my $xml = $twig->sprint();
  printf STDERR "%-35s: %s\n", "using named entity input", $xml;
}
{ # use HTML::Entities to get the entity name
  # the second argument to encode_entities ensures that only
  # high-bit characters are escaped, and not <, >, & and ",
  # which must be output unchanged (any of those characters in the
  # content are escaped by XML::Twig when needed, see below).
  # also works on 5.6.1
  use HTML::Entities;
  use charnames ':full';
  my $filter= sub { return encode_entities( $_[0], "\x80-\xff") };
  my $twig = new XML::Twig( output_filter => $filter);
  my $box = new XML::Twig::Elt( $tag => "\N{NO-BREAK SPACE}");
  $twig->set_root( $box);
  my $xml = $twig->sprint();
  printf STDERR "%-35s: %s\n", "using named entity output", $xml;
  $box = new XML::Twig::Elt( $tag => "< \N{NO-BREAK SPACE} > &");
  $twig->set_root( $box);
  $xml = $twig->sprint();
  printf STDERR "%-35s: %s\n", "same, checking escapes", $xml;
}
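The Encode fallbacks behind the filters above can also be checked in isolation, without XML::Twig. The sketch below (the helper names decimal_refs, hex_refs and named_refs and the %entity_name table are hypothetical) also shows a core-only alternative to the HTML::Entities filter. It assumes Encode 2.12 or later, where the CHECK argument of encode may be a code reference that receives the code point of each character that cannot be encoded and returns its replacement.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Hypothetical, deliberately tiny table: code point => entity name.
my %entity_name = (
    0xA0 => 'nbsp',
    0xE9 => 'eacute',
);

# Same fallback as the "html character entities" filter: &#160;
sub decimal_refs {
    my ($text) = @_;
    return Encode::encode('ascii', $text, Encode::FB_HTMLCREF());
}

# Same fallback as the "xml character entities" filter: &#xa0;
sub hex_refs {
    my ($text) = @_;
    return Encode::encode('ascii', $text, Encode::FB_XMLCREF());
}

# Core-only named entities: encode() calls the code reference with the
# code point of each character ASCII cannot represent and inserts the
# string it returns. Plain ASCII, including the '&' of an already
# escaped '&amp;', passes through unchanged.
sub named_refs {
    my ($text) = @_;
    return Encode::encode('ascii', $text, sub {
        my ($code_point) = @_;
        return exists $entity_name{$code_point}
            ? "&$entity_name{$code_point};"
            : sprintf('&#x%x;', $code_point);
    });
}

# Input as it might look after the content has been escaped.
my $escaped = "<box>&lt; \x{a0} &gt; &amp; \x{263a}</box>";
printf "%-8s: %s\n", 'decimal', decimal_refs($escaped);
printf "%-8s: %s\n", 'hex',     hex_refs($escaped);
printf "%-8s: %s\n", 'named',   named_refs($escaped);
```

Unlike the "\x80-\xff" range given to encode_entities, named_refs also covers characters beyond Latin-1 by falling back to a hexadecimal reference. Since each helper takes the text as its first argument and returns the converted string, it should be usable as an output_filter in the same way as $filter in the script above.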