Re: String literals in Java

From: Harri Pesonen (fuerte_at_sci.fi)
Date: 05/28/04


Date: Fri, 28 May 2004 19:01:07 +0300

Ryan Stewart wrote:

> "Harri Pesonen" <fuerte@sci.fi> wrote in message
> news:30Btc.4351$%n5.2498@reader1.news.jippii.net...
>
>>It is nice to have SQL formatted in readable form, in all phases: Query
>>Analyzer, Java source, Profiler, log file. """ is raw string as well, so
>>that \ does not need to be doubled, and it can contain tab characters as
>>well. It would be really great for database development. Of course, if
>>you have some data-aware control/component, you can write the SQL there.
>>But unfortunately you can't do it in Java.
>
> True, but at least all it takes is a simple System.out.println or log.debug
> to see if your SQL is correct.

I have to correct myself. In Python, """ can have escape sequences like
any normal string. Only the r prefix limits escape processing:

"When an "r" or "R" prefix is present, a character following a backslash
is included in the string without change, and all backslashes are left
in the string. For example, the string literal r"\n" consists of two
characters: a backslash and a lowercase "n". String quotes can be
escaped with a backslash, but the backslash remains in the string; for
example, r"\"" is a valid string literal consisting of two characters: a
backslash and a double quote; r"\" is not a valid string literal (even a
raw string cannot end in an odd number of backslashes). Specifically, a
raw string cannot end in a single backslash (since the backslash would
escape the following quote character). Note also that a single backslash
followed by a newline is interpreted as those two characters as part of
the string, not as a line continuation."
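
A small Python sketch of the rules quoted above may help. The variable
names are made up for illustration only:

```python
# Raw strings: backslashes stay in the string unchanged.
raw_newline = r"\n"           # two characters: backslash, 'n'
raw_quote = r"\""             # two characters: backslash, double quote
windows_path = r"C:\Windows"  # no doubling of the backslash needed

# A normal string processes the same escape.
cooked_newline = "\n"         # one character: line feed

print(len(raw_newline), len(raw_quote), len(cooked_newline))
print(windows_path)
```

Note that r"\" on its own would not even compile, as the quoted text says.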

So an r" string can have " in it, but only together with the backslash
that escapes it. Also:

"When an "r" or "R" prefix is used in conjunction with a "u" or "U"
prefix, then the \uXXXX escape sequence is processed while all other
backslashes are left in the string. For example, the string literal
ur"\u0062\n" consists of three Unicode characters: `LATIN SMALL LETTER
B', `REVERSE SOLIDUS', and `LATIN SMALL LETTER N'. Backslashes can be
escaped with a preceding backslash; however, both remain in the string.
As a result, \uXXXX escape sequences are only recognized when there are
an odd number of backslashes. "

"In triple-quoted strings, unescaped newlines and quotes are allowed
(and are retained), except that three unescaped quotes in a row
terminate the string. (A ``quote'' is the character used to open the
string, i.e. either ' or ".)"
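
For the SQL case, the triple-quoted rule looks roughly like this in Python.
This is only a sketch; the table and column names are invented for the
example:

```python
# Triple-quoted string: newlines and quotes are kept, escapes still work.
sql = """SELECT name, "quoted column"
FROM customers
WHERE path = 'C:\\Windows'"""

# Adding the r prefix turns off escape processing as well.
raw_sql = r"""SELECT name
FROM customers
WHERE path = 'C:\Windows'"""

print(sql)
print(raw_sql)
```

Both strings end up containing a single backslash in the path; only the
plain triple-quoted one needs it doubled in the source.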

All this is a bit confusing compared to how C# handles this:

"A verbatim string literal consists of an @ character followed by a
double-quote character, zero or more characters, and a closing
double-quote character. A simple example is @"hello". In a verbatim
string literal, the characters between the delimiters are interpreted
verbatim, the only exception being a quote-escape-sequence. In
particular, simple escape sequences and hexadecimal and Unicode escape
sequences are not processed in verbatim string literals. A verbatim
string literal may span multiple lines."

This sounds good, except that " has to be doubled. The ideal Java
verbatim string would be something like:

        String s = @"can have anything here,
including line feeds, file paths like:
        C:\Windows
even tabulators. How to embed " then? Simple, the terminating " must
have @ after it."@;

This way the only character sequence that is not allowed inside the
string is the closing "@.
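
To show that such a rule would be simple to scan, here is a minimal sketch
in Python of a lexer step for the proposed literal. The function name
scan_verbatim and its error handling are assumptions made up for this
example; the @"..."@ syntax is only a proposal and does not exist in Java.

```python
def scan_verbatim(source, start=0):
    """Scan a proposed @"..."@ literal beginning at source[start].

    Returns (contents, index just past the closing "@).
    Hypothetical helper: the syntax is only a proposal.
    """
    if not source.startswith('@"', start):
        raise ValueError('literal must start with @"')
    body_start = start + 2
    # The first "@ after the opening delimiter ends the literal.
    end = source.find('"@', body_start)
    if end == -1:
        raise ValueError('missing closing "@')
    return source[body_start:end], end + 2


# Source text containing a backslash, embedded quotes and a line feed.
text = '@"C:\\Windows says "hi"\nsecond line"@;'
contents, next_index = scan_verbatim(text)
print(contents)
print(text[next_index:])  # the rest of the source after the literal
```

Everything between the delimiters is taken as-is, so no escape processing
is needed at all.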

Perhaps there could be better terminators, like {"How about this then?"}
or <"And this?">. There are also a couple of characters that are not
normally used: ´Would this be cool?´ `Probably not.`

Harri


