html_entity_decode + regex = profit?
- From: ReGenesis0 <ReGenesis0@xxxxxxx>
- Date: Tue, 28 Oct 2008 11:10:47 -0700 (PDT)
*bangs head against wall*
I'm stuck on a caret (^), and it's driving me nuts.
I've got an incoming string with a caret that's been escaped to
&#710;. The caret isn't a legal character where it's going (a text
string that will be used to generate a graphic with a font) so I have
to get rid of it.
My default solution was to check each character in the string against
a regex to see if it's one of the "allowed" characters. (The font
being used to generate the image only has ~60 chars, it's a display
font used for titles.) This regex, naturally, does NOT include a
caret as a legal character.
....and so, naturally, I get the number "710" showing up in the
resulting graphic.
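The "710" survives because the whitelist runs over the raw, still-escaped text. That drops `&`, `#` and `;` and keeps the digits. Here is a minimal sketch of that failure mode. Python is used purely for illustration, and `ALLOWED` is a hypothetical stand-in for the font's real ~60-character list:

```python
import re

# Hypothetical whitelist standing in for the display font's glyph list.
ALLOWED = re.compile(r"[A-Za-z0-9 .,!?'-]")

def filter_raw(s: str) -> str:
    # Test each character of the still-escaped string against the whitelist.
    return "".join(ch for ch in s if ALLOWED.fullmatch(ch))

# '&', '#' and ';' are rejected, but '7', '1', '0' are legal digits.
print(filter_raw("Caf&#710; Title"))  # Caf710 Title
```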
html_entity_decode leaves the entity alone unless I pass it a different
character set such as UTF-8, which then eliminates some of the legal
characters I want to keep.
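One plausible cause, offered as an assumption and not a diagnosis of this exact script: once text is in UTF-8, one character can occupy two or more bytes. A byte-by-byte test therefore sees pieces of characters, and a whitelist built around single-byte ISO-8859-1 characters no longer matches their UTF-8 form. A short Python illustration of the byte counts:

```python
# The same characters, viewed as bytes under different encodings.
circ = "\u02c6"     # MODIFIER LETTER CIRCUMFLEX ACCENT, code point 710
e_acute = "\u00e9"  # e with acute accent

print(ord(circ))                  # 710
print(circ.encode("utf-8"))       # b'\xcb\x86'  (two bytes, one character)
print(e_acute.encode("latin-1"))  # b'\xe9'      (one byte)
print(e_acute.encode("utf-8"))    # b'\xc3\xa9'  (two bytes)
```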
I can obviously just replace '&#710;' in the original string... but
the /point/ is... there are probably OTHER problem characters slipping
through my net. I want a universal solution.
I feel like this shouldn't be this complicated. I have a friggin LIST
of allowable characters-- but even if I TEST AGAINST THAT LIST, one by
one, garbage from these encoded characters slips through.
I GATHER that this is a legacy of PHP's grotty character encoding. I
understand that. Is there ANYTHING I can do to convert an incoming
string so that each character == 1 character? Because (and if you'll
pardon my metaphorical black rage) THAT'S WHAT html_entity_decode is
SUPPOSED to do.
All I want to do is drop the whole damn thing into UTF-16 or
something so that NO MATTER WHAT character I'm dealing with, be it a
circumflex, a greater-than, a euro, a bullet, a ~n, a Cyrillic
backwards R, or a Japanese 'ko,' ...is logically regarded BY PHP as a
SINGLE CHARACTER that can be subjected to logical scrutiny without
TEARING MY HAIR OUT.
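The "universal" version of the idea is to decode every entity into a real Unicode string first, then test one code point at a time against the allowed list. Here is a minimal Python sketch under those assumptions. The allowed set is hypothetical, and `html.unescape` is Python's decoder, not PHP's:

```python
import html
import string

# Hypothetical allowed set; substitute the font's actual glyphs.
ALLOWED = set(string.ascii_letters + string.digits + " .,!?'-")

def sanitize_title(raw: str) -> str:
    # 1. Decode every entity (named or numeric) into a real character.
    text = html.unescape(raw)
    # 2. A Python str is a sequence of code points, so each test below
    #    sees one whole character, never a fragment of a byte sequence.
    return "".join(ch for ch in text if ch in ALLOWED)

print(sanitize_title("Caf&#710; &euro;5"))  # Caf 5
```

The order is the point: decode first, filter second. In PHP, the analogous route would be to decode to UTF-8 and then test characters with the `mb_*` functions or a regex using the `u` modifier, rather than byte by byte.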
RICH TEXT BITCHES. RICH TEXT!
*runs out of steam, panting*
*takes a breath, pushes back hair*
Sooo... am I missing something? Is this actually /possible/ and it's
just not working?
(And god DAMNIT, I can understand why PHP doesn't want to change the
existing behavior of html_entity_decode for all those legacy coders...
but why does there not seem to even be an OPTIONAL PARAMETER to force
it to convert ALL HTML entities instead of its baffling behavior of
just converting the 'most common'? That's insane.)
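For comparison, this is what a decoder with a complete entity table does. Named, decimal, and hexadecimal references to the same character all collapse to one code point. This is shown in Python only as an illustration; it says nothing about what PHP's own function supports:

```python
import html
import html.entities

# Three spellings of the same character: named, decimal, hexadecimal.
for ref in ("&circ;", "&#710;", "&#x2C6;"):
    ch = html.unescape(ref)
    print(ref, "->", repr(ch), ord(ch))

# The named-entity table consulted by the decoder (the full HTML5 list).
print(len(html.entities.html5))
```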
-Derik