Re: Getting URLs



"webmasterATflymagnetic.com" <webmaster@xxxxxxxxxxxxxxx> writes:

> Hi,
>
> I have this function:
>
> (setf *wget* "C:/Program Files/GnuWin32/bin/wget.exe")
> (setf *oput* "C:/temp/lisp.out")
>
> (defun get-url (str)
>   "Executes wget to put the search result into *oput*."
>   (when str
>     (ext:run-program *wget*
>                      :arguments (list (concatenate 'string "--output-document=" *oput*)
>                                       str))))
>
> to retrieve an HTML file and save it to disk. I then open the file as
> a stream and manipulate the data there.
>
> I have two questions:
>
> 1. Is this how you should execute external applications, and is it
>    portable across Lisp systems (OK, this is a Windows executable, but
>    is the approach portable)? I'm using CLisp on Windows, but I also use
>    CLisp and ABCL on Linux; however, I've not tried executing external
>    programs on those yet.
>
> 2. Can this particular task (i.e. getting HTML content) be done
>    entirely in Lisp?

Yes. However, you have already grasped a fundamental concept: that
of abstraction. You've hidden the implementation of that feature
behind the GET-URL function. This one works; that's all you need for
now.

However, if you change the program used to get the web resources
(e.g. you could (setf *wget* "/usr/bin/curl")), your GET-URL function
won't work anymore, because curl doesn't take the same options as
wget. Similarly, the direct use of EXT:RUN-PROGRAM makes it depend on
the CL implementation having an EXT package with a function called
RUN-PROGRAM in that package that has the same API (the :ARGUMENTS
keyword argument, etc.). Its documentation string is too specific
too, its parameter is ill-named, and it lacks another parameter. You
could redesign the GET-URL function to be resilient to these changes.
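A minimal sketch of such a redesign, assuming only wget and curl need to be supported and only CLISP and SBCL are targets. The names *FETCHER*, *FETCHER-PROGRAMS*, FETCHER-COMMAND and RUN-EXTERNAL are hypothetical, introduced here for illustration. The wget `--output-document=FILE` and curl `--output FILE` options are the tools' own options for naming the output file; the programs are assumed to be found on the PATH unless full paths are put in *FETCHER-PROGRAMS*.

```lisp
;;; Hypothetical redesign: the choice of downloader and the
;;; implementation-specific process API are kept out of GET-URL.

(defparameter *fetcher* :wget
  "Which external downloader to use, :WGET or :CURL (assumed setting).")

(defparameter *fetcher-programs*
  '(:wget "wget" :curl "curl")
  "Program to run for each fetcher. Assumed to be on the PATH; on Windows
you may need full paths such as \"C:/Program Files/GnuWin32/bin/wget.exe\".")

(defun fetcher-command (fetcher uri file)
  "Return the argument list that makes FETCHER save URI into FILE."
  (ecase fetcher
    (:wget (list (concatenate 'string "--output-document=" file) uri))
    (:curl (list "--output" file uri))))

(defun run-external (program arguments)
  "Run PROGRAM with the list of strings ARGUMENTS and wait for it to exit.
Only CLISP and SBCL are handled in this sketch."
  #+clisp (ext:run-program program :arguments arguments)
  ;; :SEARCH T asks SBCL to look PROGRAM up in the PATH.
  #+sbcl (sb-ext:run-program program arguments :search t :wait t)
  #-(or clisp sbcl)
  (error "RUN-EXTERNAL is not ported to ~A." (lisp-implementation-type)))

(defun get-url (uri file)
  "Download URI into FILE using the downloader selected by *FETCHER*."
  (run-external (getf *fetcher-programs* *fetcher*)
                (fetcher-command *fetcher* uri file)))
```

With this, `(get-url "http://www.example.com/" "C:/temp/lisp.out")` runs wget, and `(setf *fetcher* :curl)` switches to curl without touching GET-URL. Porting to another implementation only means adding a branch to RUN-EXTERNAL.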

I would do something like:

(defun get-resource (uri &key (file-pathname nil in-file-p))
  "
DO:            Fetches a web resource at URI.
URI:           A string containing the URI of the resource to fetch.
FILE-PATHNAME: When present, it should be a pathname string.
               It indicates the resource should be stored in this file.
RETURN:        * if FILE-PATHNAME is given,
                 then the pathname where the resource is stored,
                 else a byte vector containing the resource;
               * the MIME type of the resource (the Content-Type).
"
  ...
  )
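When FILE-PATHNAME is not given, GET-RESOURCE is to return a byte vector. One way to do that, assuming the resource has first been downloaded to a temporary file (for instance with an external tool as above), is to read the file back as octets. FILE-OCTETS is a hypothetical helper name, and this is only a sketch of that one branch.

```lisp
(defun file-octets (pathname)
  "Return the contents of PATHNAME as a fresh (UNSIGNED-BYTE 8) vector."
  (with-open-file (in pathname :element-type '(unsigned-byte 8))
    (let* ((octets (make-array (file-length in)
                               :element-type '(unsigned-byte 8)))
           ;; READ-SEQUENCE returns the index of the first element it did
           ;; not fill; that is below the length only if the file was
           ;; shortened after FILE-LENGTH was taken.
           (end (read-sequence octets in)))
      (if (< end (length octets))
          (subseq octets 0 end)
          octets))))
```

Opening the file with an `(unsigned-byte 8)` element type avoids any character decoding, so the vector holds exactly the bytes the server sent.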


Otherwise, it is possible to connect directly from Lisp to the web
server and, by implementing the HTTP protocol and its options, safely
download a resource directly into the Lisp image. This may or may not
be worth the work (either implementing it yourself, or finding a
library and integrating it). Notably, if you are fetching big
resources, it might be better to store them in a file and to process
the file from disk, without loading it wholly into memory.
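A sketch of two small pieces of doing this in Lisp, under stated assumptions. Standard Common Lisp has no sockets, so opening the connection is left to an implementation extension or a portability library such as usocket and is not shown. HTTP-GET-REQUEST (hypothetical name) builds the text of a minimal HTTP/1.0 GET request; COPY-STREAM-TO-FILE (also hypothetical) copies an octet stream, assumed to be positioned just after the response headers, into a file a fixed-size buffer at a time, so a big resource never sits wholly in memory.

```lisp
(defun http-get-request (host path)
  "Return the text of a minimal HTTP/1.0 GET request for PATH on HOST.
Lines are terminated by carriage return + line feed, as HTTP requires."
  (let ((crlf (coerce (list (code-char 13) (code-char 10)) 'string)))
    (concatenate 'string
                 "GET " path " HTTP/1.0" crlf
                 "Host: " host crlf
                 crlf)))          ; empty line ends the request headers

(defun copy-stream-to-file (in pathname &key (buffer-size 8192))
  "Copy the octet stream IN into PATHNAME, BUFFER-SIZE octets at a time.
Returns PATHNAME."
  (let ((buffer (make-array buffer-size :element-type '(unsigned-byte 8))))
    (with-open-file (out pathname :direction :output
                                  :element-type '(unsigned-byte 8)
                                  :if-exists :supersede)
      ;; READ-SEQUENCE fills the buffer unless end of file is reached;
      ;; END is the number of octets actually read, zero at end of file.
      (loop for end = (read-sequence buffer in)
            while (plusp end)
            do (write-sequence buffer out :end end)))
    pathname))
```

The fixed buffer bounds memory use regardless of the resource size; the file can then be processed from disk the same way.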


Notice also that some get-resource or similar functions already exist
in various libraries.


> I'm still relatively new to Lisp, and as it's such a big language I
> thought I'd tackle it in chunks. It is pretty damned impressive how
> quickly you can get something together, *and* see virtually the
> entire program on the screen at the same time!
>
> Cheers!

--
__Pascal Bourguignon__
.


