Re: How to strip HTML tags and just get the text




You posted a function that isn't in my copy of Delphi: PosR. Where is it declared?
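
PosR is not part of the standard Delphi RTL. Judging by how StripTags calls it, it is expected to work like Pos but return the position of the last occurrence of the substring (0 if there is none). The PosR body below is a hypothetical sketch under that assumption, not the original poster's code. For a single character such as '<', SysUtils.LastDelimiter already returns the same index, so it can stand in for PosR in StripTags.

```pascal
program PosRDemo;

// Delphi mode under Free Pascal; console app under Delphi
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

{ Hypothetical PosR: 1-based index of the LAST occurrence of SubStr in S,
  or 0 if SubStr does not occur. Mirrors Pos, but searches from the right. }
function PosR(const SubStr, S: string): Integer;
var
  I: Integer;
begin
  Result := 0;
  if SubStr = '' then
    Exit;
  // Try each candidate start position from right to left
  for I := Length(S) - Length(SubStr) + 1 downto 1 do
    if Copy(S, I, Length(SubStr)) = SubStr then
    begin
      Result := I;
      Exit;
    end;
end;

begin
  WriteLn(PosR('<', 'a<b<c>'));          // 4
  WriteLn(PosR('<', 'no tags here'));    // 0
  WriteLn(LastDelimiter('<', 'a<b<c>')); // 4, same answer for one character
end.
```

LastDelimiter treats its first argument as a set of characters rather than a substring, so it only replaces PosR when searching for a single character, which is the case inside StripTags.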



On Tue, 28 Jun 2005 17:37:37 +0300, "Aleksey Kuznetsov"
<nospam@xxxxxxxxxx> wrote:

>I decided to post this function. Any additions and comments welcome...
>
>{ Removes the tags (all text within <> brackets) from the string and
>  returns the text without tags. Needs SysUtils (StringReplace, Trim,
>  LastDelimiter). }
>function StripTags(const St: String): String;
>var
>  B, E, SB, T: Integer;
>begin
>  // todo: smarter parsing... " > " and " < " occurrences should be
>  // allowed as ordinary text
>  Result := StringReplace(St, '&quot;', '"', [rfReplaceAll, rfIgnoreCase]);
>  Result := StringReplace(Result, '&copy;', '(c)', [rfReplaceAll, rfIgnoreCase]);
>  Result := StringReplace(Result, '&nbsp;', ' ', [rfReplaceAll, rfIgnoreCase]);
>  Result := StringReplace(Result, '&#151;', '--', [rfReplaceAll]);
>  // literal non-breaking space character -> ordinary space
>  Result := StringReplace(Result, #160, ' ', [rfReplaceAll]);
>  repeat
>    B := Pos('<', Result);
>    E := Pos('>', Result);
>    if E = 0 then // there is no ">" anymore
>    begin
>      if B <> 0 then
>        Delete(Result, B, MaxInt); // this was the last occurrence of "<"
>      // else -- no more tags
>
>      { Result := StringReplace(Result, #13#10#13#10, #13#10, [rfReplaceAll]);
>        Result := StringReplace(Result, #10#10, #10, [rfReplaceAll]); }
>      Result := Trim(Result);
>      Break;
>    end;
>
>    if (B = 0) or (E < B) then
>      // ">" without a preceding "<": remove everything up to and including ">"
>      Delete(Result, 1, E)
>    else
>    begin
>      T := E - B + 1;
>      // PosR is not in the RTL; LastDelimiter finds the last "<" before ">"
>      SB := LastDelimiter('<', Copy(Result, B + 1, T - 1));
>      if SB <> 0 then // there is another "<" before ">"
>        Delete(Result, SB + B, T - SB)
>      else // normal tag
>        Delete(Result, B, T);
>    end;
>  until False;
>
>  // Decode &amp; last so that "&amp;lt;" stays as the literal text "&lt;"
>  Result := StringReplace(Result, '&lt;', '<', [rfReplaceAll, rfIgnoreCase]);
>  Result := StringReplace(Result, '&gt;', '>', [rfReplaceAll, rfIgnoreCase]);
>  Result := StringReplace(Result, '&amp;', '&', [rfReplaceAll, rfIgnoreCase]);
>end;
>
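
On entity order: decoding &amp; before &lt; and &gt; turns an escaped "&amp;lt;" into "<" instead of the literal text "&lt;". A minimal sketch of a safer order, assuming the decoding is done after the tags have been stripped (the same point where StripTags decodes &lt; and &gt;). DecodeBasicEntities is a hypothetical helper name, and it only covers the handful of entities StripTags knows about.

```pascal
program EntityOrderDemo;

// Delphi mode under Free Pascal; console app under Delphi
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

{ Hypothetical helper: decodes the few entities StripTags handles.
  Call it on text whose tags are already removed. &amp; is decoded LAST,
  so "&amp;lt;" becomes "&lt;" rather than being decoded twice into "<". }
function DecodeBasicEntities(const S: string): string;
begin
  Result := StringReplace(S, '&quot;', '"', [rfReplaceAll, rfIgnoreCase]);
  Result := StringReplace(Result, '&copy;', '(c)', [rfReplaceAll, rfIgnoreCase]);
  Result := StringReplace(Result, '&nbsp;', ' ', [rfReplaceAll, rfIgnoreCase]);
  Result := StringReplace(Result, '&#151;', '--', [rfReplaceAll]);
  Result := StringReplace(Result, '&lt;', '<', [rfReplaceAll, rfIgnoreCase]);
  Result := StringReplace(Result, '&gt;', '>', [rfReplaceAll, rfIgnoreCase]);
  Result := StringReplace(Result, '&amp;', '&', [rfReplaceAll, rfIgnoreCase]);
end;

begin
  WriteLn(DecodeBasicEntities('a &amp;lt; b'));      // a &lt; b
  WriteLn(DecodeBasicEntities('x &lt; y &amp; z'));  // x < y & z
end.
```

Decoding &lt; and &gt; only after stripping also matters: if they were decoded first, the resulting "<" and ">" would be mistaken for tag brackets and the text between them deleted.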
