Re: Filter mime/multipart E-Mail message to text/plain
- From: "Peter J. Holzer" <hjp-usenet2@xxxxxx>
- Date: Sat, 20 Jun 2009 22:40:48 +0200
On 2009-06-20 16:54, Marc Haber <mh+usenetspam0827@xxxxxxxxxx> wrote:
Marc Haber <mh+usenetspam0827@xxxxxxxxxx> wrote:
"Peter J. Holzer" <hjp-usenet2@xxxxxx> wrote:
On 2009-05-31 10:37, Marc Haber <mh+usenetspam0827@xxxxxxxxxx> wrote:
I need to write a filter (stdin->stdout) which can handle
MIME-formatted E-Mail messages of arbitrary size and will remove all
mime/multipart stuff, leaving only the first text/plain part intact
(including its encoding, my native language needs at least
ISO-8859-1). Message headers should stay completely unchanged
(including wrapped headers with continuation lines) modulo the changes
needed for the MIME adaptation.
Are there any modules I should be looking at before writing the stuff
myself? MIME is sufficiently nasty that I'd like to avoid meddling
with it myself.
MIME-Tools (http://search.cpan.org/dist/MIME-tools/) are the standard
modules for MIME parsing and manipulation.
I was actually hoping for something less complex, but if that's not
available, I'll try to acquaint myself with MIME-Tools.
I now have the following code (sans error handling):
|use MIME::Parser;
|
|my $parser = MIME::Parser->new;
|$parser->output_to_core(1);
|my $entity=$parser->parse(\*STDIN);
|
|foreach my $part( $entity->parts ) {
|    if( ($part->effective_type) =~ m|^text/plain| ) {
|        #print $part->stringify_body. "\n";
|        my $text=$part->bodyhandle->as_string;
|        chomp $text;
|        $text =~ s/^\s*//; $text =~ s/\s*$//;
|        print "$text\n";
|        last;
|    }
|}
This works fine unless the text part is
|Content-Type: text/plain; charset="ISO-8859-1"
|Content-Transfer-Encoding: quoted-printable
in which case the umlauts are printed verbatim, which doesn't show
too well on a UTF-8 terminal. If the input charset is UTF-8,
everything seems fine on the UTF-8 terminal.
Does MIME-Tools offer a way to re-code a MIME entity? How do I do
that?
I'm not quite sure what you are trying to achieve. I understood from
your first posting that your filter should write a MIME message to stdout.
MIME messages aren't supposed to be printed directly to a terminal, they
are supposed to be parsed and interpreted by a MIME-conforming program
(usually a MUA or NUA). So it doesn't matter whether the encoding is
ISO-8859-1 or UTF-8 or GB18030 - the Content-Type header and the body
need to match in any case.
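To illustrate why the declared charset and the body bytes have to match,
here is a minimal sketch using only the core Encode module. The byte
strings are made-up sample data, not taken from a real message. The same
character comes out right only when the bytes are decoded with the
charset they were written in:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Sample data (assumed for illustration): the letter "ü" as raw bytes.
my $latin1_bytes = "\xFC";        # ISO-8859-1 encoding of U+00FC
my $utf8_bytes   = "\xC3\xBC";    # UTF-8 encoding of U+00FC

# Decoding with the matching charset yields the same one-character string.
my $from_latin1 = decode("iso-8859-1", $latin1_bytes);
my $from_utf8   = decode("UTF-8",      $utf8_bytes);

# Decoding UTF-8 bytes as ISO-8859-1 yields two wrong characters instead.
my $mismatch = decode("iso-8859-1", $utf8_bytes);

printf "match: %s, mismatch length: %d\n",
    ($from_latin1 eq $from_utf8 ? "yes" : "no"), length $mismatch;
```

The mismatch case is the mirror image of ISO-8859-1 bytes being sent
unchanged to a UTF-8 terminal: the bytes are fine, but they are
interpreted with the wrong charset.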
If you just want to print the text body of your mail to stdout, you need
to get the charset from the content-type header and decode the body
properly:
|use MIME::Parser;
+use Encode;    # exports decode(); also provides Encode::resolve_alias()
|
+binmode STDOUT, ":encoding(UTF-8)"; # we want UTF-8 output
|
|my $parser = MIME::Parser->new;
|$parser->output_to_core(1);
|my $entity=$parser->parse(\*STDIN);
|
|foreach my $part( $entity->parts ) {
|    if( ($part->effective_type) =~ m|^text/plain| ) {
|        #print $part->stringify_body. "\n";
|        my $text=$part->bodyhandle->as_string;
+        my $cs = $part->head->mime_attr("content-type.charset") ;
+        $cs = "iso-8859-1" unless ($cs && Encode::resolve_alias($cs));
+        $text = decode($cs, $text);
|        chomp $text;
|        $text =~ s/^\s*//; $text =~ s/\s*$//;
|        print "$text\n";
|        last;
|    }
|}
You may want to adjust the fallback (iso-8859-1 for a missing or
unknown charset) to your needs.
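One way to make that fallback adjustable is sketched below, using only
the core Encode module. The helper name pick_charset and the sample
bytes are hypothetical, chosen for illustration. The helper returns the
first candidate name that Encode recognises and otherwise the default
passed as its first argument:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return the first candidate charset name that
# Encode knows about, or $default if none of them is usable.
sub pick_charset {
    my ($default, @candidates) = @_;
    for my $cs (@candidates) {
        next unless defined $cs && length $cs;     # missing charset
        return $cs if Encode::find_encoding($cs);  # known charset
    }
    return $default;
}

# Sample data (assumed): "Grüße" encoded as ISO-8859-1 bytes.
my $bytes = "Gr\xFC\xDFe";

# A missing charset and an unknown one both fall through to the default.
my $cs   = pick_charset("iso-8859-1", undef, "x-no-such-charset");
my $text = Encode::decode($cs, $bytes);

binmode STDOUT, ":encoding(UTF-8)";
print "$cs: $text\n";
```

In the filter, the candidate would be the value returned by
mime_attr("content-type.charset"), and the default could be whatever
charset is most common in the mail being processed.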
hp