Re: one interview question, 17 lines in java, 3 lines in ruby.



Lew wrote:

Well, mine is stored in bytes -- see the "Content-Type" field of my message. :-)

That tells how your message is sent, not how your source is stored.

Well, not necessarily stored as such. But believe me, on my disk a copy of my message is stored exactly as it was transmitted, that is, as bytes representing the source characters (encoded using charset ISO-8859-2) that you (and others) received later.

The original source file (C.java) of the published piece of code is similar; its size is exactly 177 bytes. The only minor difference between the posted message and the file is that the source code was converted into bytes using the Cp1250 charset, which for this particular source code gives exactly the same sequence of bytes as ISO-8859-2 does.
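To illustrate why the two charsets can give identical bytes, here is a minimal sketch. It assumes the running JDK provides both "ISO-8859-2" and "windows-1250" (the canonical name behind Cp1250); the class name and the sample source text are made up for the example and are not the actual C.java.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class CharsetCompare {
    // Assumption: both charsets are available in the running JDK.
    static final Charset LATIN2 = Charset.forName("ISO-8859-2");
    static final Charset CP1250 = Charset.forName("windows-1250");

    // True when both charsets encode the text to the same byte sequence.
    static boolean sameBytes(String text) {
        return Arrays.equals(text.getBytes(LATIN2), text.getBytes(CP1250));
    }

    public static void main(String[] args) {
        // Hypothetical ASCII-only source text, standing in for C.java.
        String ascii = "class C{public static void main(String[] a){}}";
        // Polish a-with-ogonek, written as an escape so this file stays ASCII.
        String polish = "\u0105";

        System.out.println(sameBytes(ascii));   // ASCII range is shared: true
        System.out.println(sameBytes(polish));  // encoded differently: false
        // For ASCII-only text, byte count and character count coincide.
        System.out.println(ascii.length() == ascii.getBytes(CP1250).length);
    }
}
```

Both charsets are supersets of ASCII, so an ASCII-only source file comes out byte-for-byte the same; the byte sequences diverge only for some of the characters above the ASCII range.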


According to the Java Language Specification, Java source files are in characters:
Programs are written in Unicode ...

and
Programs are written using the Unicode character set.

You see? Are *written*, not necessarily *stored* as such.

Other way around. JLS Chapter 3:
lexical translations are provided (§3.2) so that Unicode escapes (§3.3) can be used to include any Unicode character using only ASCII characters.

The /Unicode escapes/ are completely unrelated to what we are talking about. They are processed after the conversion of a source file's bytes into characters (ASCII or Unicode). In other words, the input is already characters -- called a /raw Unicode character stream/ -- which are translated into other Unicode characters. Translation into a sequence of input tokens begins just after that translation.
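A small sketch of that ordering, with a class name invented for the example: because escapes are translated before tokenization, an identifier spelled with an escape is the same identifier as its plain spelling.

```java
class EscapeDemo {
    // The field name is spelled with the Unicode escape for the letter a.
    static int \u0061 = 42;

    // Refers to the same field by its plain name.
    static int viaPlainName() {
        return a;
    }

    // The escape inside the literal becomes a single character.
    static int escapedLength() {
        return "\u0041".length();
    }

    public static void main(String[] args) {
        System.out.println(viaPlainName());        // 42
        System.out.println(escapedLength());       // 1
        System.out.println("\u0041".equals("A"));  // true
    }
}
```

None of this depends on how the file is stored on disk; the escapes are plain ASCII characters by the time the compiler looks at them.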


But you're right, to avoid confusion in our small contest, it is better to count characters. :)

The JLS requires it.

Nope. AIUI, I can _store_ the source code in whatever form I like, and the JLS cannot prevent me from doing that. The only requirement is to instruct my compiler (normally using the -encoding option) on how Unicode (or ASCII) characters are encoded (as bytes) in my Java source files.
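With javac that would be something like `javac -encoding ISO-8859-2 C.java`. As a rough sketch of why the instruction matters (the class and method names are invented for the example, and both charsets are assumed to be available in the JDK), the same stored byte decodes to different characters depending on the charset named:

```java
import java.nio.charset.Charset;

class DecodeDemo {
    // Decodes one stored byte with the named charset and returns its code point.
    static int codePointOf(byte stored, String charsetName) {
        String decoded = new String(new byte[] { stored }, Charset.forName(charsetName));
        return decoded.codePointAt(0);
    }

    public static void main(String[] args) {
        byte stored = (byte) 0xB9;
        // Cp1250 maps this byte to U+0105 (a with ogonek).
        System.out.println(Integer.toHexString(codePointOf(stored, "windows-1250"))); // 105
        // ISO-8859-2 maps the same byte to U+0161 (s with caron).
        System.out.println(Integer.toHexString(codePointOf(stored, "ISO-8859-2")));   // 161
    }
}
```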


piotr