Re: Unrecognized escape sequences in string literals



On Mon, 10 Aug 2009 00:37:33 -0700, Carl Banks wrote:

On Aug 9, 11:10 pm, Steven D'Aprano
<ste...@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
On Sun, 09 Aug 2009 18:34:14 -0700, Carl Banks wrote:
Why should a backslash in a string literal be an error?

Because the behavior of \ in a string is context-dependent, which
means a reader can't know if \ is a literal character or escape
character without knowing the context, and it means an innocuous
change in context can cause a rather significant change in \.

*Any* change in context is significant with escapes.

"this \nhas two lines"

If you change the \n to a \t you get a significant difference. If you
change the \n to a \y you get a significant difference. Why is the
first one acceptable but the second not?

Because when you change \n to \t, you haven't changed the meaning of
the \ character;

I assume you mean the \ character in the literal, not the (non-existent)
\ character in the string.


but when you change \n to \y, you have, and you did so
without even touching the backslash.

Not at all.

'\n' maps to the string chr(10).
'\y' maps to the string chr(92) + chr(121).

In both cases the backslash in the literal has the same meaning: grab
the next token (usually a single character, but not always), look it up
in a mapping somewhere, and insert the result in the string object being
built.

(I don't know if the *implementation* is precisely as described, but
that's irrelevant. It's still functionally a mapping.)
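
The two mappings above are easy to check. What follows is only a sketch: evaluate_literal is a helper name made up for illustration, and it silences warnings on the assumption that the Python version in use may warn about unrecognized escapes when compiling the literal.

```python
import ast
import warnings


def evaluate_literal(source):
    """Turn the source text of a string literal into the string it denotes.

    Hypothetical helper: warnings are silenced because some Python
    versions warn about unrecognized escapes.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


newline = evaluate_literal(r"'\n'")   # recognized escape
unknown = evaluate_literal(r"'\y'")   # unrecognized escape

print(newline == chr(10))             # one character: line feed
print(unknown == chr(92) + chr(121))  # two characters: backslash, then y
print(len(newline), len(unknown))
```

Either way the backslash in the source text is consumed by the same lookup; only the result differs.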



IOW it's an error-prone mess.

I've never had any errors caused by this.

Thank you for your anecdotal evidence. Here's mine: This has gotten me
at least twice, and a compiler complaint would have reduced my bug-
hunting time from tens of minutes to ones of seconds. [Aside: it was
when I was using Python on Windows for the first time]

Okay, that's twice in... how many years have you been programming?

I've mistyped "xrange" as "xrnage" two or three times. Does that make
xrange() "an error-prone mess" too? Probably not. Why is my mistake my
mistake, but your mistake the language's fault?


[...]

Oh, wait, no, I tell a lie -- I *have* seen people reporting "bugs" here
caused by backslashes. They're invariably Windows programmers writing
pathnames using backslashes, so I'll give you that one: if you don't know
that Python treats backslashes as special in string literals, you will
screw up your Windows pathnames.

Interestingly, the problem there is not that \y resolves to literal
backslash followed by y, but that \t DOESN'T resolve to the expected
backslash-t. So it seems to me that the problem for Windows coders is not
that \y doesn't raise an error, but the mere existence of backslash
escapes.
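
A small illustration of the Windows pathname trap described above, assuming a made-up path. Both \t and \n are recognized escapes, so nothing here involves an unrecognized one; the two usual workarounds are shown next to the naive spelling.

```python
# Hypothetical Windows path, written three ways.
naive = "C:\temp\new"        # \t becomes a tab, \n becomes a newline
doubled = "C:\\temp\\new"    # every backslash escaped
raw = r"C:\temp\new"         # raw string literal: no escape processing

print("\\" in naive)                  # False: no backslash survived
print("\t" in naive, "\n" in naive)   # True True
print(doubled == raw)                 # True
print(len(naive), len(raw))           # 9 11
```

An error for \y would not have helped with this path at all, since the damage is done by escapes that are perfectly valid.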



Someone (obviously not you because you have perfect knowledge of the
language and 100% situational awareness at all times) might have a string
like "abcd\stuv" and change it to "abcd\tuvw" without even thinking
about the fact that the s comes after the backslash.

Deary me. And they might type "4+15" instead of "4*51", and now
arithmetic is an "error-prone mess" too. If you know of a programming
language which can prevent you making semantic errors, please let us all
know what it is.

If you edit code without thinking, you will be burnt, and you get *zero*
sympathy from me.


Worst of all: they might not even notice the error, because the repr of
this string is:

'abcd\tuvw'

They might not notice that the backslash is single, because (unlike you)
mortal fallible human beings don't always register tiny details like a
backslash being single when it should be double.

"Help help, 123145 looks too similar to 1231145, and now I calculated my
taxes wrong and will go to jail!!!"
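
For anyone who wants to see the repr point concretely, here is a rough sketch. evaluate_literal is a hypothetical helper, and the warning suppression assumes a Python version that may warn about the unrecognized \s escape.

```python
import ast
import warnings


def evaluate_literal(source):
    """Hypothetical helper: evaluate a string literal given as source text."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # some versions warn about \s
        return ast.literal_eval(source)


before = evaluate_literal(r'"abcd\stuv"')  # backslash kept, followed by s
after = evaluate_literal(r'"abcd\tuvw"')   # \t turned into a tab

print(repr(before))              # 'abcd\\stuv'
print(repr(after))               # 'abcd\tuvw'
print(len(before), len(after))   # 9 8
```

The first repr doubles the backslash and the second does not, which is the single-versus-double detail under discussion.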


Point is, this is a very bad inconsistency. It makes the behavior of \
impossible to learn by analogy; instead you have to memorize a list of
situations where it behaves one way or another.

No, you don't "have" to memorize anything, you can go right ahead and
escape every backslash, as I did for years. Your code will still work
fine.

You already have to memorize what escape codes return special characters.
The only difference is whether you learn "...and everything else raises
an exception" or "...and everything else is returned unchanged".

There is at least one good reason for preferring an error, namely that it
allows Python to introduce new escape codes without going through a long,
slow process. But the rest of these complaints are terribly unconvincing.
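
To make the "everything else raises an exception" alternative concrete, here is a minimal sketch of a strict check. It is not how Python's compiler works: unknown_escapes, check_strict, and the RECOGNIZED set are names invented for illustration, and the set only approximates the characters that may follow a backslash in a non-raw str literal (it does not validate the digits after \x, \u, and so on).

```python
# Characters assumed to start a recognized escape after a backslash.
RECOGNIZED = set("\n\\'\"abfnrtv01234567xNuU")


def unknown_escapes(literal_body):
    """Return the unrecognized escapes in the text between the quotes."""
    found = []
    i = 0
    while i < len(literal_body):
        if literal_body[i] == "\\" and i + 1 < len(literal_body):
            follower = literal_body[i + 1]
            if follower not in RECOGNIZED:
                found.append("\\" + follower)
            i += 2  # skip the backslash and the character it governs
        else:
            i += 1
    return found


def check_strict(literal_body):
    """Raise instead of passing unrecognized escapes through unchanged."""
    bad = unknown_escapes(literal_body)
    if bad:
        raise ValueError("unrecognized escape(s): " + ", ".join(bad))
    return literal_body


print(unknown_escapes(r"abcd\stuv"))   # ['\\s']
print(unknown_escapes(r"abcd\tuvw"))   # []
print(unknown_escapes(r"C:\\yes\y"))   # ['\\y']
```

Under a rule like this, adding a new escape later only means growing the recognized set, since no existing valid literal could have been relying on the old pass-through behaviour.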



--
Steven