Entering strings as user input but interpreting as Python input (sort of)



Hi:

I'm writing a Python program, a hex line editor, which takes in a line of input from the user such as:

>>> cmd = raw_input('-').split()
-e 01 02 "abc def" 03 04
>>> cmd
['e', '01', '02', '"abc', 'def"', '03', '04']

Trouble is, I don't want to split the quoted part where the space occurs.

So I would prefer the resulting list to contain:

['e', '01', '02', '"abc def"', '03', '04']
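
For this simpler case (no escapes inside the quotes), one possibility is the standard library's shlex module. This is a minimal sketch, assuming its non-POSIX mode is acceptable; the helper name split_keep_quotes is made up for illustration and is not part of the editor.

```python
import shlex

def split_keep_quotes(line):
    # posix=False leaves the surrounding double quotes on quoted tokens
    # and does not split on spaces inside them.
    return shlex.split(line, posix=False)

print(split_keep_quotes('e 01 02 "abc def" 03 04'))
```

As far as I can tell, shlex does not treat a backslash as an escape character in this mode, so a \" inside the quoted part would still end the token early.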

Furthermore, if the user entered:

-e 01 02 "abc \"def\"\r\n" 03 04

I would want the quoted part to be interpreted as if I entered it into Python itself (recognize escape sequences, and not split at spaces) as:

>>> s = '"abc \"def\"\r\n"'
>>> print s
"abc "def"
"
>>>

In other words, if a quoted string occurs in the user input, I want only that part to be treated as a Python string. Even more horrifying is that I want the outer quotes to remain as is (which Python doesn't do, of course).
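
One possible way to get this behaviour is to pick out the tokens with a regular expression and hand only the quoted ones to ast.literal_eval, which parses them as Python string literals. This is a sketch under assumptions, not a definitive implementation: the names TOKEN_RE and split_line are invented here, only double-quoted strings are recognized, and an unterminated quote simply falls through as an ordinary word.

```python
import ast
import re

# Group 1: a double-quoted run in which a backslash escapes the next character.
# Otherwise: any run of non-whitespace characters.
TOKEN_RE = re.compile(r'("(?:\\.|[^"\\])*")|\S+')

def split_line(line):
    tokens = []
    for m in TOKEN_RE.finditer(line):
        quoted = m.group(1)
        if quoted is not None:
            # Decode the escapes as Python would, then put the outer quotes back.
            tokens.append('"' + ast.literal_eval(quoted) + '"')
        else:
            tokens.append(m.group(0))
    return tokens

line = r'e 01 02 "abc \"def\"\r\n" 03 04'
print(split_line(line))
```

Because the decoding is done by Python's own literal parser, escapes such as \x41 are handled too, while a malformed escape (for example a truncated \x) raises an exception that the caller would need to catch.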

I have begun to solve this problem by writing what amounts to a custom split() function (I call it hsplit()), a DFA that implements some of Python's string lexical analysis. The code is shown below.

The point of this in the context of the hex editor is that the user should be able to enter hex bytes without a prefix such as "0xXX", simply as "0A 1B 2C" etc., and also be able to enter a string without having to type in hex ASCII codes. Hence the following input would be valid (the 'e' is the editor's edit command):

-e 01 02 "a string with newline\n" 3d 4e 5f
-p
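
To illustrate what the edit command would then do with the tokens, here is a rough sketch that turns an already split argument list into bytes. It assumes the tokens still carry their outer quotes and have had their escapes decoded; the name tokens_to_bytes and the choice of latin-1 for string characters are assumptions for illustration only. It is written for Python 3.

```python
def tokens_to_bytes(tokens):
    out = bytearray()
    for tok in tokens:
        if len(tok) >= 2 and tok[0] == '"' and tok[-1] == '"':
            # Quoted token: drop the outer quotes, one byte per character.
            out.extend(tok[1:-1].encode('latin-1'))
        else:
            # Bare token: a hex byte such as 0A or 3d, without any 0x prefix.
            value = int(tok, 16)
            if not 0 <= value <= 0xFF:
                raise ValueError('not a byte: %r' % tok)
            out.append(value)
    return out

print(tokens_to_bytes(['01', '02', '"a string with newline\n"', '3d', '4e', '5f']))
```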


Is there a simpler way?

----------------------------------------------------------------
HSTRIP_NONE = 0
HSTRIP_IN_WORD = 1
HSTRIP_IN_QUOTE = 2
HSTRIP_IN_ESC = 3

def hsplit(string):
    lst = []
    word = []
    state = HSTRIP_NONE # not in word
    for c in string:

        if state == HSTRIP_NONE:
            if c == '"':
                word.append(c)
                state = HSTRIP_IN_QUOTE
            elif c != ' ':
                word.append(c)
                state = HSTRIP_IN_WORD
            # else c == ' ', so pass
        elif state == HSTRIP_IN_QUOTE:
            if c == '"':
                word.append(c)
                lst.append(''.join(word))
                word = []
                state = HSTRIP_NONE
            elif c == '\\':
                state = HSTRIP_IN_ESC
            else:
                word.append(c)
        elif state == HSTRIP_IN_ESC:
            if c == '\\':
                word.append(c)
                state = HSTRIP_IN_QUOTE
            elif c == '"':
                word.append(c)
                state = HSTRIP_IN_QUOTE
            elif c == 'n':
                word.append('\n')
                state = HSTRIP_IN_QUOTE
            else: # c == non escape or quote
                # for unrecognized escape, just put in verbatim
                word.append('\\')
                word.append(c)
                state = HSTRIP_IN_QUOTE
        else: # if state == HSTRIP_IN_WORD
            if c == ' ' or c == '"':
                lst.append(''.join(word))
                if c == '"':
                    word = [c]
                    state = HSTRIP_IN_QUOTE
                else:
                    word = []
                    state = HSTRIP_NONE
            else:
                word.append(c)
    # this only happens if you run out of chars in string before a state change:
    if word: lst.append(''.join(word))
    return lst



----------------------------------------------------------------


--
Good day!

________________________________________
Christopher R. Carlen
Principal Laser&Electronics Technologist
Sandia National Laboratories CA USA
crcarleRemoveThis@xxxxxxxxxxxxxxx
NOTE, delete texts: "RemoveThis" and
"BOGUS" from email address to reply.