Re: Getting URLs
- From: pjb@xxxxxxxxxxxxxxxxx (Pascal J. Bourguignon)
- Date: Wed, 12 Aug 2009 13:25:14 +0200
"webmasterATflymagnetic.com" <webmaster@xxxxxxxxxxxxxxx> writes:
> Hi,
>
> I have this function:
>
> (defparameter *wget* "C:/Program Files/GnuWin32/bin/wget.exe")
> (defparameter *oput* "C:/temp/lisp.out")
>
> (defun get-url (str)
>   "Executes wget to put the search result into *oput*."
>   ;; CLISP-specific: EXT:RUN-PROGRAM receives the command-line
>   ;; arguments as a list of strings via the :ARGUMENTS keyword.
>   (when str
>     (ext:run-program *wget*
>                      :arguments (list (concatenate 'string "--output-document=" *oput*)
>                                       str))))
>
> to retrieve an HTML file and save it to disk. I then open the file as
> a stream and manipulate the data there.
>
> I have two questions:
>
> 1. Is this how you should execute external applications, and is it
>    portable across Lisp systems (OK, this is a Windows executable, but is
>    the approach portable)? I'm using CLisp on Windows, but I also use
>    CLisp and ABCL on Linux; however, I've not tried executing external
>    programs on those yet.
>
> 2. Can this particular task (i.e. getting HTML content) be done entirely
>    in Lisp?

Yes. However, you have already understood a fundamental concept, that
of abstraction. You've abstracted away the implementation of that
feature behind the GET-URL function. It works, and that's all you
need for now.

However, if you change the program used to fetch the web resources
(e.g. (setf *wget* "/usr/bin/curl")), your GET-URL function won't work
anymore (curl doesn't take the same options as wget). Similarly, the
direct use of EXT:RUN-PROGRAM makes it dependent on CL implementations
that have an EXT package containing a function called RUN-PROGRAM with
the same API (the :ARGUMENTS keyword argument), etc. Its documentation
string is too specific too, its parameter is ill-named, and it lacks
another parameter (the output file, currently hard-wired as *OPUT*).
You could redesign the GET-URL function to be resilient to these
changes.
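A minimal sketch of such a redesign, assuming only CLISP and SBCL need to be supported: the wget/curl option differences live in one function, and the implementation-specific process call in another. DOWNLOADER-ARGUMENTS and RUN-DOWNLOADER are hypothetical names; the curl options used (-L to follow redirects, -o to write to a file) are the usual ones. A portability layer such as UIOP (shipped with ASDF) provides UIOP:RUN-PROGRAM, which accepts a list of strings and could replace the reader conditionals.

```lisp
;;; Hypothetical helpers: DOWNLOADER-ARGUMENTS and RUN-DOWNLOADER are
;;; illustrative names, not part of any library.

(defun downloader-arguments (tool uri file)
  "Return the argument list that makes downloader TOOL save URI into FILE.
URI and FILE are strings."
  (ecase tool
    ;; wget: --output-document=FILE URI
    (:wget (list (concatenate 'string "--output-document=" file) uri))
    ;; curl: -L follows redirects, -o FILE writes the body to FILE
    (:curl (list "-L" "-o" file uri))))

(defun run-downloader (program tool uri file)
  "Run the executable PROGRAM (a downloader of kind TOOL) to save URI into FILE.
PROGRAM is assumed to be the full path of the executable."
  (let ((args (downloader-arguments tool uri file)))
    ;; Process creation is not standardized, so pick the API per implementation.
    #+clisp (ext:run-program program :arguments args)
    #+sbcl (sb-ext:run-program program args :output nil)
    #-(or clisp sbcl)
    (error "RUN-DOWNLOADER: no process API configured for ~A"
           (lisp-implementation-type))))
```

Switching from wget to curl then only changes the call site, e.g. (run-downloader "/usr/bin/curl" :curl "http://www.example.com/" "/tmp/lisp.out"), while porting to another implementation only touches RUN-DOWNLOADER.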
I would do something like:
(defun get-resource (uri &key (file-pathname nil in-file-p))
  "
DO:            Fetches the web resource at URI.
URI:           A string containing the URI of the resource to fetch.
FILE-PATHNAME: When present, it should be a string or a pathname.
               It indicates that the resource should be stored in this file.
RETURN:        * if FILE-PATHNAME is given,
                 then the pathname where the resource is stored,
                 else a byte vector containing the resource;
               * the MIME type of the resource (the Content-Type).
"
  ...
  )
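The transport-independent half of that contract (file or byte vector, plus the MIME type) can be sketched on its own. This assumes some transport (wget, curl, or a Lisp HTTP client) has already produced the resource as a vector of octets and its Content-Type string; DELIVER-RESOURCE is a hypothetical name. Unlike the lambda list above, it tests FILE-PATHNAME itself rather than a supplied-p flag, so that an explicit :FILE-PATHNAME NIL simply means "return the bytes".

```lisp
;;; DELIVER-RESOURCE is a hypothetical helper covering only the
;;; "file or byte vector" part of GET-RESOURCE's contract.  It assumes the
;;; transport has already produced OCTETS and the Content-Type MIME-TYPE.

(defun deliver-resource (octets mime-type &key file-pathname)
  "If FILE-PATHNAME is non-NIL, write OCTETS there and return
(values pathname mime-type); otherwise return (values octets mime-type)."
  (if file-pathname
      (with-open-file (out file-pathname
                           :direction :output
                           :element-type '(unsigned-byte 8)
                           :if-exists :supersede
                           :if-does-not-exist :create)
        (write-sequence octets out)
        ;; WITH-OPEN-FILE returns the values of its last body form.
        (values (pathname out) mime-type))
      (values octets mime-type)))
```

A caller would then pick up both results with MULTIPLE-VALUE-BIND, whichever way the resource was delivered.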

Otherwise, it is possible to connect directly from Lisp to the web
server and, by implementing the HTTP protocol and its options, safely
download a resource directly into the Lisp image. This may or may not
be worth the work (either implementing it yourself, or finding a
library and integrating it). Notably, if you are fetching big
resources, it might be better to store them in a file and process the
file from disk, without loading it wholly into memory.
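Processing a big file from disk without loading it wholly can be done with a single reusable buffer and READ-SEQUENCE, so memory use is bounded by the buffer size rather than the file size. A sketch; MAP-FILE-CHUNKS and FILE-OCTET-COUNT are hypothetical names, and the 64 KiB default chunk size is an arbitrary choice.

```lisp
;;; MAP-FILE-CHUNKS and FILE-OCTET-COUNT are hypothetical names.

(defun map-file-chunks (function pathname &key (chunk-size 65536))
  "Call FUNCTION with (BUFFER END) for each chunk of the file at PATHNAME.
Only the first END octets of BUFFER are valid, and BUFFER is reused
between calls, so FUNCTION must copy anything it wants to keep."
  (let ((buffer (make-array chunk-size :element-type '(unsigned-byte 8))))
    (with-open-file (in pathname :element-type '(unsigned-byte 8))
      ;; READ-SEQUENCE returns the index of the first element not filled;
      ;; it is zero once the end of the file has been reached.
      (loop for end = (read-sequence buffer in)
            while (plusp end)
            do (funcall function buffer end)))))

;; Example: count the octets of a file without reading it all at once.
(defun file-octet-count (pathname)
  (let ((total 0))
    (map-file-chunks (lambda (buffer end)
                       (declare (ignore buffer))
                       (incf total end))
                     pathname)
    total))
```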

Note also that functions like GET-RESOURCE already exist in various
libraries.

> I'm still relatively new to Lisp, and as it's such a big language I
> thought I'd tackle it in chunks. It is pretty damned impressive how
> you can put something together very quickly, *and* see virtually the
> entire program on the screen at the same time!
>
> Cheers!

--
__Pascal Bourguignon__