Re: Convert html diacritics to unicode



On Thu, 30 Oct 2008 12:33:51 -0700, deech wrote:

Hi all,
I am trying to convert an HTML page that includes accented characters into
Unicode. Is there a way to do this in Common Lisp?

Yes. Unless you need to verify the correctness of the input or need some
output format other than HTML, a simple algorithm that replaces strings
using a table (e.g. "&eacute;" -> "é") should suffice.
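
A minimal sketch of that table-driven replacement, in case it helps. The
names *entity-table* and decode-entities are made up for this example, and
the table is only a small illustrative subset of the HTML named entities.
It assumes an implementation whose characters cover Unicode code points, and
it builds the replacements with code-char so that the source file encoding
does not matter.

```lisp
;; Hypothetical table: entity string -> one-character replacement string.
;; Only a handful of entities are listed; extend it as needed.
(defparameter *entity-table*
  (loop for (name code) in '(("&eacute;" #xE9)
                             ("&egrave;" #xE8)
                             ("&agrave;" #xE0)
                             ("&uuml;"   #xFC)
                             ("&ntilde;" #xF1)
                             ("&ccedil;" #xE7))
        collect (cons name (string (code-char code)))))

(defun decode-entities (text &optional (table *entity-table*))
  "Return a copy of TEXT with every entity listed in TABLE replaced.
Anything not found in TABLE is copied through unchanged."
  (with-output-to-string (out)
    (let ((pos 0)
          (len (length text)))
      (loop while (< pos len)
            do (let ((hit (and (char= (char text pos) #\&)
                               ;; Look for a table entry that matches at POS.
                               (find-if
                                (lambda (entry)
                                  (let ((end (+ pos (length (car entry)))))
                                    (and (<= end len)
                                         (string= (car entry) text
                                                  :start2 pos :end2 end))))
                                table))))
                 (cond (hit
                        ;; Emit the replacement and skip past the entity.
                        (write-string (cdr hit) out)
                        (incf pos (length (car hit))))
                       (t
                        (write-char (char text pos) out)
                        (incf pos))))))))
```

For example, (decode-entities "caf&eacute;") should give "café", while an
entity that is not in the table, such as "&amp;", is left as it is. An alist
scanned with find-if is fine for a short table; for the full entity list a
hash table keyed on the entity name would probably be a better fit.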

Search for "replace string" in the c.l.l archives (e.g. using
Google Groups).

HTH,

Tamas