Re: Lets think who will like to say delphi is dying?
- From: "Adem" <adem.meda@xxxxxxxxx>
- Date: 31 Mar 2008 16:14:38 -0700
L wrote:
Yep. Most of it is "implied", like "<b> means bold".
So I see you wish to be more "explicit".
No, not that.
The fact that "<b> means bold" is something else: it is (how shall I
put it) /implicit/ knowledge. I.e. neither SGML nor XML tells you what
exactly '<b>' should do, just as in, say, Pascal, where parsing a
token (say, 'const') does not tell you anything that is implicit in 'const'.
That bit is part of the intelligence you have to build into your compiler
(or whatever works with the parsed stuff).
What I am interested in at this stage, about, say, '<b>', is where it is
allowed and where it is not, or where it is redundant.
Take this HTML snippet, for example:
<b>
<body>
</body>
</b>
Would this be legal/allowed?
If not, why not, and how do you know that?
The answer is:
And the DTD tells you that.
That is, if that particular DTD rules it out.
W3C's DTDs don't allow it (I know that by intuition), but some other
body's DTD may.
The only way I can be sure is by parsing the DTD specified in the
document itself, and the DTD is written in SGML, which means I need to be
able to parse the darn SGML... <g>
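To make the point about the DTD concrete, here is a minimal sketch in Python (not Delphi, just to keep it short) of the kind of containment check a DTD enables. The CONTENT_MODELS table is a hypothetical, flattened stand-in for real content models, loosely modelled on HTML 4.01 Strict; a real checker would have to derive it by parsing the actual DTD, which is exactly the SGML-parsing problem above.

```python
# Hypothetical, heavily simplified content models, loosely modelled on
# the HTML 4.01 Strict DTD. Each entry maps an element to the set of
# child elements it may directly contain. The real DTD expresses this
# with parameter entities (%inline;, %block;) and a full content-model
# grammar (order, required children), none of which is checked here.
INLINE = {"#PCDATA", "B", "I", "EM", "STRONG", "SPAN", "A"}
BLOCK = {"P", "DIV", "UL", "OL", "TABLE"}

CONTENT_MODELS = {
    "HTML": {"HEAD", "BODY"},
    "BODY": BLOCK | {"SCRIPT"},
    "DIV": BLOCK | INLINE,
    "P": INLINE,
    "B": INLINE,
    "I": INLINE,
}


def violations(node, errors=None):
    """Walk a (name, children) tree and report children the model forbids."""
    if errors is None:
        errors = []
    name, children = node
    # Elements missing from the table are treated as allowing no children.
    allowed = CONTENT_MODELS.get(name, set())
    for child in children:
        if child[0] not in allowed:
            errors.append(f"<{child[0]}> not allowed inside <{name}>")
        violations(child, errors)
    return errors


snippet = ("B", [("BODY", [])])
print(violations(snippet))  # ['<BODY> not allowed inside <B>']
```

The tree here is hand-built; in practice it would come out of the parser, and the table would come out of the DTD named in the document's DOCTYPE.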
Since XML is designed to be such a flexible language, it is not so
specific...
Actually, there are two basic ways of using XML.
If you use it as a metalanguage that defines the rules for something
else, it seems somewhat insufficient.
If, on the other hand, you use XML to transfer some data from one end
to another, XML is quite sufficient/usable, as long as you are
in control at both ends.
Years ago I wrote something called TXMLDataset to
export/import/transfer data. As long as you were using TXMLDataset at
both ends, it was very, very useful. [I ran out of patience supporting
it publicly and it died of neglect.]
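For the "in control at both ends" case, here is a sketch of what that looks like: both sides agree on one fixed layout, so a round trip is trivial. The <dataset>/<field>/<row>/<col> layout and the to_xml/from_xml names are hypothetical, made up for this illustration; this is not TXMLDataset's actual format.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed layout both ends agree on:
# <dataset><field name=".." type=".."/>...<row><col name="..">value</col>...</row></dataset>
CONVERTERS = {"int": int, "float": float, "str": str}


def to_xml(fields, rows):
    """Serialize field definitions and row tuples into one XML string."""
    root = ET.Element("dataset")
    for name, type_name in fields:
        ET.SubElement(root, "field", name=name, type=type_name)
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for (name, _), value in zip(fields, row):
            ET.SubElement(row_el, "col", name=name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Rebuild the field list and typed rows, trusting the agreed layout."""
    root = ET.fromstring(text)
    fields = [(f.get("name"), f.get("type")) for f in root.findall("field")]
    rows = []
    for row_el in root.findall("row"):
        cols = row_el.findall("col")
        rows.append(tuple(CONVERTERS[type_name](col.text or "")
                          for (_, type_name), col in zip(fields, cols)))
    return fields, rows


fields = [("id", "int"), ("name", "str")]
rows = [(1, "Ada"), (2, "Lars")]
print(from_xml(to_xml(fields, rows)) == (fields, rows))  # True
```

Nothing here validates anything; it works only because the writer and the reader are the same code.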
until someone specifies some rules that somehow make it
more explicit. I.e. it is kind of too flexible, as HTML is in
many cases where many things are implied. At the same time, making
everything explicit is very hard in a flexible system. Pascal and other
strongly typed languages have a fairly fixed system, but with the ability
to extend the type system, so there is still flexibility even with restrictions.
Actually, HTML is (or can be) quite strict too, except that it had an
unfortunate start: there were no DTDs in the beginning, so people wrote
freehand HTML code which they expected all browsers to handle the
same way. So we have these browsers with quirks modes to allow for sloppy
HTML code.
The fact that history is like that does not mean we should still
produce sloppy HTML. We have DTDs and we should make use of them.
But a Pascal program produces an EXE or ELF executable, and that
format is known. HTML and XML are not compiled into a known
end output unless we know for sure what the HTML and XML are doing,
e.g. HTML is displayed in a GUI, but XML can be doing anything.
Since XML can do anything, it is hard for XML to be anything
concrete.
HTML and XML on their own are too flexible, even
slimy, I agree. But with DTDs, they are all fixed.
<sidepost>
I believe it was you who wrote a page (if only I could locate a link to
it now) saying that a better form of HTML would have to be something like
Borland's DFMs.
And I agree, DFM is a more suitable format/protocol/language for the
world than HTML, except that in DFM everything is fixed size (fixed
to the pixel), whereas HTML is geared towards a more flexible layout.
That means that, until DFM (ever) handles a similarly flexible layout,
HTML is the winner.
</sidepost>
As you know, EXE and ELF are output formats of compilers, not their
internal structures; they exist so that the OS's 'loader' can load and
execute them.
When I say 'compiling' a .html file (or stream), what I mean is this:
i) Pre-parse it until it tells you what DTD it is using.
ii) Load the corresponding VCL (or whatever you'd call that object
structure), which does the rest of the parsing to produce a well-formed
and fully object-oriented DOM. In this DOM each and every object
is a well-defined one, and there is a different object for each and
every different 'tag' (and 'attribute', if necessary).
iii) If you do not have a pre-made VCL for that DTD, you
will need to parse that DTD (i.e. SGML code), generate (Pascal)
source code for it and then compile it. This is not as hard as it
sounds, because the primitives in the DTD are finite, which means that
most of the time you'll be linking to a basic library. You may need to do
some hand coding (once) for that DTD's VCL, and then you can use (and
share) it for the rest of your coding life.
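Steps i and ii above might be sketched roughly like this, in Python standing in for Delphi. Assumptions: the REGISTRY dict plays the role of the pre-made "VCLs", the four node classes are hypothetical placeholders for a full per-tag class set, and, for brevity, end tags must be explicit (HTML 4 actually allows some to be omitted). Step iii is only marked by the LookupError.

```python
import re
from html.parser import HTMLParser

# Step i: find the DTD's public identifier in the DOCTYPE line.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+\w+\s+PUBLIC\s+"([^"]+)"', re.IGNORECASE)


class Element:
    """Base class for compiled nodes; one subclass per declared tag."""
    tag = ""

    def __init__(self, attrs):
        self.attrs = attrs
        self.children = []


class Html(Element): tag = "HTML"
class Body(Element): tag = "BODY"
class Paragraph(Element): tag = "P"
class Bold(Element): tag = "B"


# Hypothetical registry of pre-built "VCLs": DTD public id -> tag -> class.
REGISTRY = {
    "-//W3C//DTD HTML 4.01//EN": {c.tag: c for c in (Html, Body, Paragraph, Bold)},
}


class Compiler(HTMLParser):
    """Step ii: builds typed nodes, rejecting tags the DTD's classes lack."""

    def __init__(self, classes):
        super().__init__()
        self.classes = classes
        self.root = Element({})
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        cls = self.classes.get(tag.upper())
        if cls is None:
            raise ValueError(f"<{tag}> has no class for this DTD")
        node = cls(dict(attrs))
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Simplification: end tags must be explicit and properly nested.
        if self.stack[-1].tag != tag.upper():
            raise ValueError(f"unexpected </{tag}>")
        self.stack.pop()

    def handle_data(self, data):
        if data.strip():  # drop whitespace-only text between tags
            self.stack[-1].children.append(data)


def compile_html(source):
    match = DOCTYPE_RE.search(source)
    if match is None:
        raise ValueError("no PUBLIC DOCTYPE found")
    classes = REGISTRY.get(match.group(1))
    if classes is None:
        # Step iii would go here: parse the DTD and generate the classes.
        raise LookupError(f"no pre-built classes for {match.group(1)!r}")
    compiler = Compiler(classes)
    compiler.feed(source)
    compiler.close()
    return compiler.root
```

Because each node is an instance of its own class, downstream code can dispatch on the type (isinstance(node, Bold)) instead of comparing tag strings.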
All this is doable. Very doable. But only if someone comes up with a
proper parser for SGML.
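Just to show the shape of step iii, here is a toy in Python. It handles only <!ELEMENT ...> declarations (no <!ENTITY> expansion, no <!ATTLIST>, no marked sections), so it is nowhere near the proper SGML parser being asked for. It turns each declaration into generated source for one class and then compiles that source, producing Python rather than Pascal. The El_ class prefix and the attribute names are made up for illustration.

```python
import re

# Toy subset of SGML DTD syntax: only <!ELEMENT ...> declarations, with
# comments stripped and parameter entities (%inline;) left unexpanded.
COMMENT_RE = re.compile(r"--.*?--", re.DOTALL)
ELEMENT_RE = re.compile(
    r"<!ELEMENT\s+(\([^)]*\)|[\w.-]+)\s+([-O])\s+([-O])\s+(.*?)\s*>",
    re.IGNORECASE | re.DOTALL,
)


def parse_elements(dtd):
    """Return (name, start_tag_optional, end_tag_optional, model) tuples."""
    decls = []
    for names, start, end, model in ELEMENT_RE.findall(COMMENT_RE.sub("", dtd)):
        # A name group such as (SUB|SUP) declares several elements at once.
        for name in names.strip("()").split("|"):
            decls.append((name.strip().upper(), start.upper() == "O",
                          end.upper() == "O", model))
    return decls


def generate_source(decls):
    """Emit Python source with one class per declared element."""
    lines = ["class Element:", "    pass", ""]
    for name, start_opt, end_opt, model in decls:
        safe = re.sub(r"\W", "_", name)  # e.g. a dotted name -> valid identifier
        lines += [
            f"class El_{safe}(Element):",
            f"    tag = {name!r}",
            f"    start_tag_optional = {start_opt}",
            f"    end_tag_optional = {end_opt}",
            f"    content_model = {model!r}",
            "",
        ]
    return "\n".join(lines)


dtd = """
<!ELEMENT B - - (%inline;)* -- bold text -->
<!ELEMENT (SUB|SUP) - - (%inline;)* -- subscript, superscript -->
<!ELEMENT BR - O EMPTY -- forced line break -->
"""
namespace = {}
exec(compile(generate_source(parse_elements(dtd)), "<generated>", "exec"), namespace)
```

The "compile" step here is Python's own compile(); in the Delphi version it would be running the generated .pas units through the compiler once and keeping them.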
[[
Interestingly, I did a Google search the other day, and I came across
someone I think is familiar from these NGs.
http://groups.google.com/group/comp.text.sgml/browse_thread/thread/657158dcf5946e44/1b9469e8386be41e?lnk=st&q=#1b9469e8386be41e
I am not sure, but it seems very likely that it is Marko Binic of
BergSoft. http://www.bergsoft.net/
If so, I hope he is reading this. He is my last hope :)
]]
Once you have compiled/objectified your HTML, what format you save it
to disk in is your own business <g>; what I am interested in is getting to
that point :)
I'm not too familiar with XML, but what about strict "schemas"? Are
they competing?
See this:
http://www.w3.org/TR/NOTE-sgml-xml-971215
I have looked at some XML DTDs. The ones written from scratch seem to
be OK to work with, but those that are translated from SGML are far too
convoluted (hence unworkable) for my taste.
[All these lines just to summarize the situation. Now, let's move
on a little.]
Mere parsing of a .html file is the most primitive (elemental)
thing we do. And by doing so, all we end up with is a meaningless tag
tree.
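To illustrate the "meaningless tag tree": a generic tag-tree builder with no DTD knowledge, sketched here on top of Python's standard html.parser module, happily accepts the <b><body></body></b> snippet from earlier. The TagTree class and tag_tree helper are hypothetical names.

```python
from html.parser import HTMLParser


class TagTree(HTMLParser):
    """Builds a bare (name, children) tree with no DTD knowledge at all."""

    def __init__(self):
        super().__init__()
        self.root = ("#root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Naive: pops whatever is open, without checking names or nesting rules.
        if len(self.stack) > 1:
            self.stack.pop()


def tag_tree(source):
    parser = TagTree()
    parser.feed(source)
    parser.close()
    return parser.root


print(tag_tree("<b><body></body></b>"))  # ('#root', [('b', [('body', [])])])
```

Nothing in that tree says the nesting is illegal; that knowledge lives only in the DTD.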
Well, we make some implied guesses, such as bold means bold, but I
see what you mean.
Thanks. I was afraid you'd snap back (I don't know why I felt that,
though), saying I was talking through my hat :)
And then, it would generate the whole .html file on the fly. All
clean and rule-obeying, etc.
I have to think about it more before I reply. Some definitely
interesting thoughts to grok here.
About time I got my own back :P [*] I have read almost all of your
site, a lot of which seemed weird/unconventional at first but made me
think, and I ended up agreeing most of the time.
[* That is, if you're the same Lars as in Lars Olson (L505) of
Z505.com.]
If the .html file is statically generated at the server, you could
of course get away with a hand-coded parser,
All parsers are kind of hand coded in a manner; it is just a matter
of making them stricter. For example, Pascal compilers are strict
and are still hand coded. A parser is still a parser.
(There are parsers which are less directly hand coded and use YACC or
similar tools, but my point is that a parser is still a parser and
still requires hand coding at some point.)
The thing I was trying to get across is this: parsing HTML is NOT a
process of extracting what's between '<' and '>'. There's a definition
of what passes as an HTML token, and that definition is in the DTD file
referred to in the '!DOCTYPE' line. Since each DTD can be different, not
taking the DTD into account does not count as parsing in my book.
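Finding which DTD a document refers to is the easy first step. A rough sketch, assuming the DOCTYPE uses double-quoted literals (SGML also allows single quotes, which this regex ignores); the doctype helper is a hypothetical name. It returns the root element name, the public identifier and the system identifier (usually the DTD's URL), where the last two may be None, or None overall if there is no DOCTYPE.

```python
import re

DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<root>[\w.-]+)'
    r'(?:\s+PUBLIC\s+"(?P<public>[^"]*)")?'        # optional public identifier
    r'(?:\s+(?:SYSTEM\s+)?"(?P<system>[^"]*)")?',  # optional system identifier
    re.IGNORECASE,
)


def doctype(source):
    """Return (root, public_id, system_id) from the DOCTYPE, or None."""
    match = DOCTYPE_RE.search(source)
    if match is None:
        return None
    return match.group("root"), match.group("public"), match.group("system")
```

The public identifier is what a registry of pre-built object sets would be keyed on; the system identifier is where the DTD itself could be fetched from when no pre-built set exists.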
The alternative I am proposing (if that is the right word) is a lot
simpler. Instead of simply parsing the .html file, you compile it.
And, the funny thing is, it is a lot faster: faster to use for the
developer and faster at run time. I know because I have done
it; I compared a number of available parsers to the one I wrote.
Parsing and compiling are kind of the same thing, but I think you
just mean stricter compilation/parsing/"pre-paration".
Parsing and compiling are somewhat different. By 'parsing' you get a
token tree (and what counts as a token is defined in the DTD). If you find
a token that is not defined in the DTD, you may ignore it or raise an
error; that is up to you.
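A sketch of that "ignore or raise" choice, assuming the set of declared element names has already been extracted from the DTD. The regex scanner here is only a stand-in for a real tokenizer; the point is that what counts as a token comes from the declared set, not from whatever sits between '<' and '>'. The tokens function, UnknownElement and the on_unknown parameter are hypothetical.

```python
import re

# Stand-in scanner for start/end tags; skips <!...> declarations and comments.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


class UnknownElement(Exception):
    """Raised when a tag is not declared in the DTD and policy is 'raise'."""


def tokens(source, declared, on_unknown="ignore"):
    """Return ('start'|'end', NAME) pairs for tags the DTD declares."""
    if on_unknown not in ("ignore", "raise"):
        raise ValueError("on_unknown must be 'ignore' or 'raise'")
    out = []
    for match in TAG_RE.finditer(source):
        closing, name = match.group(1) == "/", match.group(2).upper()
        if name not in declared:
            if on_unknown == "raise":
                raise UnknownElement(name)
            continue  # 'ignore': drop the undeclared tag silently
        out.append(("end" if closing else "start", name))
    return out
```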
By 'compiling' you apply the rules defined in the DTD. The DTD dictates
which token may surround/contain which token (i.e. a hierarchical list) and
how you convert the values in its (string) attributes to native types.
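The attribute side of "compiling" might look like this: each declared attribute type decides which native type the string becomes. A sketch assuming a hypothetical ATTLIST table loosely modelled on HTML 4.01 (TD's COLSPAN declared as NUMBER, TD's ALIGN as an enumerated list, A's HREF as CDATA); in Delphi the enumeration would naturally be an enumerated type, here a Python Enum.

```python
from enum import Enum


class CellHAlign(Enum):
    LEFT = "left"
    CENTER = "center"
    RIGHT = "right"
    JUSTIFY = "justify"
    CHAR = "char"


# Hypothetical ATTLIST excerpt: (element, attribute) -> declared type.
ATTLIST = {
    ("TD", "COLSPAN"): "NUMBER",
    ("TD", "ALIGN"): CellHAlign,
    ("A", "HREF"): "CDATA",
}


def convert(element, attr, raw):
    """Turn a raw attribute string into a native value per its declaration."""
    decl = ATTLIST.get((element.upper(), attr.upper()))
    if decl is None:
        raise KeyError(f"{attr} is not declared for <{element}>")
    if decl == "NUMBER":
        return int(raw)              # NUMBER: convert to int (no stricter check)
    if isinstance(decl, type) and issubclass(decl, Enum):
        return decl(raw.lower())     # ValueError if not one of the listed values
    return raw                       # CDATA stays a plain string
```

Doing this once at compile time means the rest of the program never has to re-parse "3" or "center" out of a string.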
<from child post>
All parsers are essentially hand coded at some point. If you need to
make it flexible, then consider some sort of text file that holds a
ruleset or a "definition", or even a database of some sort (if I can
abuse the term database), or an embedded definition of some sort
somewhere in the file (but since you are compiling, it may be
better kept external).
That text file that holds the ruleset *is* the DTD.
But, what format are you COMPILING it into?
[also see far above]
I am compiling it into native objects (a DOM tree made of native
objects) that Delphi can use without any further work.
</from child post>
Well, software engineering is in a sad state and many people don't
understand a lot of serious concepts that need to be addressed. I
have similar problems getting messages across to people on these
newsgroups and other places, so don't feel alone.
Frankly, I am not claiming I understand it all either, but there are
things I believe are in my grasp and can be tackled to make everyone's
life a little better.