Re: Question concerning this list



Steven D'Aprano wrote:
On Sun, 31 Dec 2006 02:03:34 +0100, Thomas Ploch wrote:

Hello fellow pythonists,

I have a question concerning posting code on this list.

I want to post source code of a module, which is a homework for
university (yes yes, I know, please read on...).

So long as you understand your university's policy on collaborations.

Well, my prof wants collaboration, but I think he actually meant it as
a way of getting students to bond with each other and establish social
contacts. He just said that he will reject copy & paste stuff and work
that has nothing to do with the topic (when we laughed, he said we
couldn't imagine what sometimes gets handed in).

It is a web crawler (which I will *never* let out into the wide world)

If you post it on Usenet, you will have let it out into the wide world.
People will see it. Some of those people will download it. Some of them
will run it. And some of them will run it, uncontrolled, on the WWW.

Out of curiosity, if your web crawler isn't going to be used on the web,
what were you intending to use it on?

It's a final homework, as I mentioned above, and it shouldn't be used
anywhere but on our university server for testing (unless throttling of
requests (i.e. only two fetches per second) and handling of 'robots.txt'
are implemented). But you are right about the Usenet thing; I hadn't
thought about that, so I won't post the whole code.
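
For illustration only, the two safeguards mentioned above (at most two
fetches per second, and honouring 'robots.txt') could be sketched roughly
like this in present-day Python 3. The class name PoliteFetcher, the user
agent string and the example URLs are hypothetical and not taken from the
homework code; only urllib.robotparser and time come from the standard
library.

```python
import time
import urllib.robotparser


class PoliteFetcher:
    """Hypothetical helper: throttles requests and consults robots.txt rules."""

    def __init__(self, robots_lines, user_agent="homework-crawler",
                 max_per_second=2.0):
        self.user_agent = user_agent
        # Minimum number of seconds that must pass between two fetches.
        self.min_interval = 1.0 / max_per_second
        self._last_fetch = None
        # Parse robots.txt rules that were already downloaded as text lines.
        self.parser = urllib.robotparser.RobotFileParser()
        self.parser.parse(robots_lines)

    def allowed(self, url):
        """Return True if the parsed robots.txt rules permit fetching url."""
        return self.parser.can_fetch(self.user_agent, url)

    def wait_turn(self):
        """Sleep just long enough to stay under the configured request rate."""
        now = time.monotonic()
        if self._last_fetch is not None:
            remaining = self.min_interval - (now - self._last_fetch)
            if remaining > 0:
                time.sleep(remaining)
        self._last_fetch = time.monotonic()


if __name__ == "__main__":
    rules = ["User-agent: *", "Disallow: /private/"]
    fetcher = PoliteFetcher(rules)
    for url in ("http://example.edu/index.html",
                "http://example.edu/private/notes.html"):
        if fetcher.allowed(url):
            fetcher.wait_turn()
            print("would fetch", url)
        else:
            print("skipping", url)
```

The sketch only decides whether and when a fetch may happen; the actual
download is left out on purpose.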

which uses regular expressions (and yes, I know, that's not good either).

Regexes are just a tool. Sometimes they are the right tool for the job.
Sometimes they aren't.

Alright, my prof said '... to process documents written in structural
markup languages using regular expressions is a no-no.' (Because of
nested elements? I can't remember.) So I think he wants us to use regexes
in order to learn them. He did point us to HTMLParser, though.
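
As a rough sketch of the HTMLParser route, link extraction without
regexes might look like the following. This uses the Python 3 module
html.parser (the module was simply called HTMLParser in 2006). The names
LinkCollector and extract_links, and the example URLs, are made up for
illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical parser that collects absolute URLs from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in html, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    page = '<p><a href="/a.html">A</a> <a class="x" href="b.html">B</a></p>'
    print(extract_links(page, "http://example.edu/dir/index.html"))
```

Because the parser tracks tags itself, links inside nested elements are
found and links inside HTML comments are ignored, which is the part that
tends to go wrong with regexes.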

I have finished it (as far as I can), but since I need a good mark to
actually finish the course, I am wondering if I can post the code, and
whether any of you could review it and give me hints on how to improve
things.


It probably isn't a good idea to post a great big chunk of code and expect
people to read it all. If you have more specific questions than "how can
I make this better?", that would be good. Unless the code is fairly
short, it might be better to just post a few extracted functions and see
what people say about them, and then you can extend that to the rest of
your code.

You are probably right. For me it boils down to these problems:
- Implementing a stack for large queues of documents which is faster
than list.pop(index) (Is there a lib for this?)
- Getting handlers for different MIME types / Content-Types and
specifying callbacks only for specific types (a lot of work and complex
checks)
- Handling different encodings correctly.
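
On the first point, one standard-library candidate is collections.deque:
list.pop(0) has to shift every remaining element, whereas deque.popleft()
and deque.append() run in constant time. Below is a minimal sketch of a
crawl queue built on it; the class name Frontier and the "seen" set are
assumptions for illustration, not part of the homework code.

```python
from collections import deque


class Frontier:
    """Hypothetical FIFO queue of URLs that ignores URLs it has seen before."""

    def __init__(self, seeds=()):
        self._queue = deque()
        self._seen = set()
        for url in seeds:
            self.push(url)

    def push(self, url):
        """Enqueue url unless it was queued before; return True if added."""
        if url in self._seen:
            return False
        self._seen.add(url)
        self._queue.append(url)
        return True

    def pop(self):
        """Remove and return the oldest URL; raises IndexError when empty."""
        return self._queue.popleft()

    def __len__(self):
        return len(self._queue)


if __name__ == "__main__":
    frontier = Frontier(["http://example.edu/"])
    frontier.push("http://example.edu/a.html")
    frontier.push("http://example.edu/")  # duplicate, silently ignored
    while frontier:
        print(frontier.pop())
```

If stack (last in, first out) behaviour is really what is wanted, the
same deque also offers pop() on the right-hand end.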

I will follow your suggestions and post my code concerning specifically
these problems, and not the whole chunk.
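
For the handler and encoding points, one possible shape is a small
registry keyed by MIME type plus a decode step with fallbacks. This is a
sketch under assumptions: handles, HANDLERS, decode_body, dispatch and
the two example handlers are invented names, and the fallback order
(declared charset, then UTF-8, then Latin-1) is just one plausible
policy. The header parsing relies on email.message.Message from the
standard library.

```python
from email.message import Message

HANDLERS = {}  # maps a lowercase MIME type to a callback taking decoded text


def handles(*mime_types):
    """Decorator that registers a callback for one or more MIME types."""
    def register(func):
        for mime in mime_types:
            HANDLERS[mime] = func
        return func
    return register


def parse_content_type(header_value):
    """Split a Content-Type header into (mime_type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


def decode_body(raw, charset, fallback="utf-8"):
    """Decode bytes with the declared charset, then the fallback, then Latin-1."""
    for candidate in (charset, fallback):
        if not candidate:
            continue
        try:
            return raw.decode(candidate)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown codec name or undecodable bytes: try the next one
    # Latin-1 maps every byte to a character, so this cannot fail,
    # although the result may contain wrong characters.
    return raw.decode("latin-1")


@handles("text/html", "application/xhtml+xml")
def handle_html(text):
    return ("html", text)


@handles("text/plain")
def handle_text(text):
    return ("text", text)


def dispatch(content_type_header, raw):
    """Call the handler registered for the body's MIME type, if there is one."""
    mime, charset = parse_content_type(content_type_header)
    handler = HANDLERS.get(mime)
    if handler is None:
        return None  # no callback registered: skip this document
    return handler(decode_body(raw, charset))


if __name__ == "__main__":
    print(dispatch("text/html; charset=ISO-8859-1", b"caf\xe9"))
    print(dispatch("image/png", b"\x89PNG"))
```

Registering callbacks by decorator keeps the per-type checks in one
dictionary lookup instead of a chain of if/elif tests.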

Thanks,
Thomas

