Re: gotchas.html: Missing Hex

From: Boudewijn Dijkstra (usenet_at_bdijkstra.tmfweb.nl)
Date: 07/06/04


Date: Tue, 6 Jul 2004 20:49:03 +0200


"V S Rawat" <vsrawat@hclinfinet.com> wrote in message
news:40EA4E06.5030006@hclinfinet.com...
> Boudewijn Dijkstra wrote:
> >
> > There is a difference between entering characters into
> > your source code and entering characters into a char or
> > String. All \uXXXX escapes are treated the same and all
> > other escapes are treated the same. What more could you
> > want?
>
> OK.
> From my viewpoint, it is like this.
>
> If whatever is programmed in a software is known to a user,
> there is no problem. Every software has some compromises, and
> users quickly adjust to them, as long as a user finds some
> other exciting features in that software that are not there
> in others.
>
> Now that this "control chars not allowed in hex" is known to
> me, I will adjust in future, though it will irritate me that
> I have to code them separately.

It is just \u000A and \u000D that you have to replace with \n and \r,
respectively. The other control characters can be used in either form
without causing problems for the compiler.
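A minimal sketch of what this means in practice (the class and field names are only illustrative): LF and CR go in as \n and \r, other control characters such as TAB (U+0009) and ESC (U+001B) can still be written as \uXXXX escapes, and a plain hex int literal is another way to keep writing codes in hex, because it is a number and not an escape.

```java
public class ControlCharEscapes {
    // LF (U+000A) and CR (U+000D) must be written with the character escapes
    // \n and \r. Their Unicode escape form would be turned into a real line
    // break before the compiler looks for literals, ending the literal early.
    static final String CRLF = "\r\n";

    // Other control characters are safe as Unicode escapes, because a raw
    // TAB or ESC is allowed inside a literal.
    static final String TAB = "\u0009";
    static final char ESC = '\u001B';

    // To keep writing codes in hex, an int literal works for every code,
    // LF and CR included: it is a number, not an escape, so nothing is
    // translated early.
    static final char LF = 0x000A;
    static final char CR = 0x000D;

    public static void main(String[] args) {
        System.out.println((int) CRLF.charAt(0) + " " + (int) CRLF.charAt(1)); // 13 10
        System.out.println((int) TAB.charAt(0) + " " + (int) ESC);            // 9 27
        System.out.println((int) LF + " " + (int) CR);                        // 10 13
    }
}
```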

> I wonder how Mr Green found out about that difference. If he had
> to spend hours puzzling over why his code was not
> working, or had to read grizzalions of words in the docs to
> find out about it in some remote, unlikely corner, wouldn't you
> say that it is too taxing?
> ---------

Have you seen his website? I think he has written more words since Java
came into existence than the JLS and the VMS (VM specification) contain
combined. Besides that, the docs aren't that big once you have grasped the
basics, so you can read what you need as your skills and needs advance.

> On a related note, I feel that it is also an unnecessary
> restriction to differentiate between String str = "x"; and
> char chr = 'x'; (double quotes for one type, and single quotes
> for the other).
>
> When the reference is clear, the software itself should
> understand and allow String str = 'x'; and char chr = "x";
> allowing both types of quote to be used interchangeably.
> Every other software does that.

The thing with Java is that it is, in fact, a necessary restriction
(CMIIAW: correct me if I am wrong). Java is a strongly typed language that
differentiates between primitives and objects, unlike some other languages.
Consider the problems with your proposal when you want an anonymous String
of one character, when you want to split an arbitrary String into two
halves, and when concatenating a mix of Strings and characters.
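A small sketch of the concatenation problem, assuming standard Java semantics (class and method names are made up for illustration): + on two chars adds their numeric codes, while a + that has a String operand concatenates, so 'x' and "x" cannot simply be made interchangeable.

```java
public class CharVersusString {
    // char is a 16-bit numeric primitive, so + on two chars adds their codes.
    static int addChars() {
        return 'a' + 'b'; // 97 + 98
    }

    // String is an object, so + on Strings concatenates.
    static String addStrings() {
        return "a" + "b";
    }

    // Evaluation is left to right: the chars are summed before the String appears.
    static String charsThenString() {
        return 'a' + 'b' + "c";
    }

    // Once a String is on the left, every following + is concatenation.
    static String stringThenChars() {
        return "c" + 'a' + 'b';
    }

    public static void main(String[] args) {
        System.out.println(addChars());        // 195
        System.out.println(addStrings());      // ab
        System.out.println(charsThenString()); // 195c
        System.out.println(stringThenChars()); // cab

        // Explicit conversions between the two types.
        char first = "xyz".charAt(0);
        String single = String.valueOf('x');
        System.out.println(first + " " + single); // x x
    }
}
```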

> >> they should do it consistently and should convert all
> >> characters, or should not convert a single character.
>
> > That would make it impossible to create a character
> > encoding that excludes ASCII control characters and write
> > Java source in it. Shouldn't it be possible to create an
> > arbitrary character encoding?
>
> It seems that, being a newbie, I am not able to see the
> consequences, or what difference it makes.

In fact, I was wrong. It is not even possible now to write Java source in
an arbitrary character encoding, because the encoding would, for starters,
have to provide for the backslash. So, you can forget about that remark of
mine.

> Could you give me some idea why it is required to keep these
> different? Why can't it treat a quoteless \u000a differently
> from "\u000a"?

It could have been treated differently, but its internal design makes that
impossible. The escapes are processed before things like quotes are
recognized and counted. This is a fundamental language feature, which you
can't change overnight.
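A well-known illustration of that ordering, as a sketch (class and method names are illustrative): the two \u0022 escapes inside what looks like one string literal are double quotes, and because they are translated before literals are recognized, the compiler actually sees two separate literals.

```java
public class EarlyEscapeTranslation {
    // Both escapes inside the "literal" below stand for U+0022, the double quote.
    // They are translated before the compiler tokenizes the line, so it sees
    //     "a".length() + "b".length()
    // i.e. two one-character strings, not one longer string.
    static int puzzle() {
        return "a\u0022.length() + \u0022b".length();
    }

    public static void main(String[] args) {
        System.out.println(puzzle()); // 2
    }
}
```

The same early translation is why \u000A inside quotes breaks the literal: it becomes a real line break before the quotes are ever matched.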

> Aren't all other softwares processing one type of
> encoding consistently in one single manner all over?

First of all, 'software' is an uncountable word; you cannot say "softwares",
"a software", "every software" or "these software". IMHO you should use
the words "(programming) language", "(software) program" and "compiler"
more often. Back to your question, I think the answer is 'no', but I may
have misunderstood the question.

> >> I am using the Unicode technique "\u003f" etc. because I am
> >> working on Unicode characters \u0600 through \u0D7F.
> >> With these, it is only natural that I write all other
> >> characters, even lower and upper ASCII including
> >> control characters, like that for ease of conversion to
> >> chars in the program after reading.
> >
> > You can avoid this by using a fancy editor that inserts
> > the characters directly into the source, without \uXXXX
> > escaping.
>
> Actually, I AM WRITING a fancy editor :-)
>
> What you say would have been really easy.
>
> But I am not writing these in the Java program. I have kept all
> such things in separate files, which I then read into
> the program. This saves me from changing the program whenever
> the input changes, and keeps the program short. Users can
> also analyze it by going through the files, and need not
> look into the code (though the source is open).

In that case, you don't have to use those ugly escapes at all.
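A minimal sketch of that, assuming the data files are saved as UTF-8 (the class name, method name and temp-file prefix are hypothetical; the program writes the file itself only to keep the example self-contained, where normally a Unicode-capable editor would produce it).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class Utf8DataFile {
    // Writes the lines to a temporary file as UTF-8 and reads them back.
    // The file holds the characters themselves, so no escapes are involved.
    static List<String> roundTrip(List<String> lines) {
        try {
            Path file = Files.createTempFile("layout", ".txt");
            try {
                Files.write(file, lines, StandardCharsets.UTF_8);
                return Files.readAllLines(file, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // DEVANAGARI LETTER KA (U+0915) and ARABIC COMMA (U+060C). The escapes
        // exist only in this source file; the data file gets real characters.
        List<String> back = roundTrip(Arrays.asList("\u0915\u060C", "abc"));
        System.out.println(back.get(0).length()); // 2
    }
}
```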

> Further, I like the *nix concept that input files should be
> kept simple. Thus, I have kept them in lower ASCII text. That is
> the whole reason why I need to write them as "\uxxxx" strings.

Other approaches can be simple, too.
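For instance, the standard java.util.Properties format keeps the file in plain ASCII and decodes \uXXXX escapes when the file is loaded, at run time, where the compiler's rule about \u000A and \u000D does not apply. A sketch, with made-up keys and class name; the string stands in for the contents of such a file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class EscapedPropertiesFile {
    // Parses properties-format text. Properties.load decodes backslash-u
    // escapes at run time, so a line feed (code 000A) is as acceptable as
    // any other code.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of a plain-ASCII data file. The doubled
        // backslash keeps the compiler from translating the escapes itself.
        String fileText = "ka=\\u0915\n"
                        + "newline=\\u000A\n";
        Properties props = parse(fileText);
        System.out.println((int) props.getProperty("ka").charAt(0));      // 2325
        System.out.println((int) props.getProperty("newline").charAt(0)); // 10
    }
}
```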

> >> Writing in hex is convenient compared to writing in decimal
> >> or octal, because the entire Unicode documentation uses
> >> hex.
> >
> > If you know the keyboard layout of the script you are
> > writing in, using that keyboard layout is more
> > convenient, because you don't have to look up the code
> > point. I use it frequently for Greek characters. But in
> > your case, I don't expect you know all these 12(?)
> > alphabets and their keyboard layouts. :-)
>
> I HAVE CREATED keyboard layouts for 8 Indian scripts and 1 Persian
> script. That was my first Java program using Swing.

Creating them is one thing. Efficiently using them is another.

> Remember
> all those 8x40 array threads plaguing this ng for the last
> month? Thanks a lot to all of you.
>
> My first hand observation is that you guys do have a lot of
> patience with newbies. :-)

Some of us have ideals, some of us have nothing better to do.

> I am starting a new thread about keyboard layouts.
>
> >> Hope I do not face more quirks like above.
> >
> >
> > If it was a quirk, it wouldn't have been described in the
> > JLS.
>
> I have been programming in Java for almost a month, and I
> have not read the JLS yet. Actually, would you tell me what the
> JLS is and where to find it?

JLS means 'Java Language Specification'. It is what defines the Java
language.
http://java.sun.com/docs/books/jls/second_edition/html/j.title.doc.html

> It seems I am one of those following the invariant assertion of
> programmers that "When nothing that you can think of works,
> read the manual" :-)
>
> >> Thanks a lot for advance warning.
> >
> > If you had read the JLS before doing complex stuff like
> > escaping, the warning would have come in advance.
>
> Wouldn't you agree that escaping has been "kept" complex in
> Java by the above quirks? If this exception had not
> been there, this thread would not have come up and I could
> have gone ahead using complex things without having to refer
> to the JLS.

I agree that it is a design flaw. But I doubt whether a different and
possibly less flawed design decision would have been possible, viewed
from either the political or the technical side.


