Re: A Unicode problem -HELP



OK, I apologise for not being clearer.

1. Here is my input data file, line 2:
gn1:1,1.2 R")$I73YT R")$IYT@ncfsa

2. Here is my output data file, line 2:
u'gn', u'1', u'1', u'1', u'2', u'-', u'R")$I73YT', u'R")$IYT',
u'R")$IYT', u'@', u'ncfsa', u'nc', '', '', '', u'f', u's', u'a', '',
'', '', '', '', '', '', '', u'B.:R")$I^YT', u'b.:cv)cv^yc', '\xc9\x94'

3. Here is my main program:
# -*- coding: UTF-8 -*-
import codecs

import splitFunctions
import surfaceIPA

# Constants for file location

# Working directory constants
dir_root = 'E:\\'
dir_relative = '2 Core\\2b Data\\Data Working\\'

# Input file constants
input_file_name = 'in.grab.txt'
input_file_loc = dir_root + dir_relative + input_file_name
# Initialise input file
input_file = codecs.open(input_file_loc, 'r', 'utf-8')

# Output file constants
output_file_name = 'out.grab.txt'
output_file_loc = dir_root + dir_relative + output_file_name
# Initialise output file
output_file = codecs.open(output_file_loc, 'w', 'utf-8') # unicode

i = 0
for line in input_file:
    if line[0] != '>': # Ignore headers
        i += 1
        if i != 1:
            word_info = splitFunctions.splitGrab(line, i)
            parse = splitFunctions.splitParse(word_info[10])
            gloss = surfaceIPA.surfaceIPA(word_info[6], word_info[8], word_info[9], parse)
            a = str(word_info + parse + gloss).encode('utf-8')
            a = a[1:len(a)-1]
            output_file.write(a)
            output_file.write('\n')

input_file.close()
output_file.close()

print 'done'


4. Here is my problem:
At the end of each output line, where my Unicode character \u0254 (OPEN
O) should appear, the file contains the escaped bytes '\xc9\x94' instead.
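
A small sketch of what seems to produce those bytes, assuming the cause is that str() is called on a list holding an already-encoded string. That is a guess from the output above, not something confirmed, and the names open_o, encoded, shown and decoded are only for illustration:

```python
# -*- coding: utf-8 -*-
open_o = u'\u0254'                 # the OPEN O character as text
encoded = open_o.encode('utf-8')   # two raw bytes: 0xC9 0x94

# str() on a list calls repr() on each element, and repr() of a byte
# string spells non-ASCII bytes out as backslash escapes.
shown = str([encoded])

# Decoding the bytes recovers the original character.
decoded = encoded.decode('utf-8')
```

If that is right, the escapes come from the repr() step and not from the file encoding.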

What I want is an output file like:

'gn', '1', '1', '1', '2', '-', ..... 'ɔ'

where ɔ is an open O, and would display correctly in the appropriate
font.
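
One possible way to get that kind of line, sketched under assumptions: the fields are joined by hand instead of going through str() on the list, and any byte strings are assumed to be UTF-8. The helpers to_text, format_row and write_rows are hypothetical names, not part of my program:

```python
# -*- coding: utf-8 -*-
import io


def to_text(field):
    """Return field as text, decoding byte strings (assumed to be UTF-8)."""
    if isinstance(field, bytes):
        return field.decode('utf-8')
    return field


def format_row(fields):
    """Build "'a', 'b', ..." by hand so repr() never escapes anything."""
    return u', '.join(u"'" + to_text(f) + u"'" for f in fields)


def write_rows(path, rows):
    """Write one formatted row per line; io.open does the UTF-8 encoding."""
    with io.open(path, 'w', encoding='utf-8') as out:
        for fields in rows:
            out.write(format_row(fields) + u'\n')
```

With this, format_row([u'gn', u'1', u'\u0254'.encode('utf-8')]) gives the text 'gn', '1', 'ɔ', and the file is encoded once, when it is written.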

Once I can get it to display properly, I will rewrite gloss so that it
returns a proper translation of 'R")$I73YT', which will be a string of
unicode characters.

Is this clearer? The other two functions are basic. splitGrab turns
'gn1:1,1.2 R")$I73YT R")$IYT@ncfsa' into 'gn 1 1 1 2 R")$I73YT R")$IYT
@ ncfsa', and splitParse turns the final piece of this, 'ncfsa', into
'n c f s a'. They have to be done separately because splitParse involves
some translation and program logic. surfaceIPA reads in 'R")$I73YT' and
other data to produce the Unicode string. At the moment it just returns
two dummy strings and u'\u0254'.encode('utf-8').
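
To make the splitting concrete, here is a rough sketch of the behaviour described above. It is not the real code: split_grab_sketch, split_parse_sketch and GRAB_LINE are hypothetical names, the line layout is assumed from the single sample line, and the real functions do more (the output in point 2 has extra fields, and splitParse also translates):

```python
import re

# Assumed layout: <letters><n>:<n>,<n>.<n> <form> <form>@<parse>
GRAB_LINE = re.compile(
    r'^([A-Za-z]+)(\d+):(\d+),(\d+)\.(\d+)\s+(\S+)\s+(\S+)@(\S+)$')


def split_grab_sketch(line):
    """Split one data line into its pieces, keeping '@' as its own field."""
    match = GRAB_LINE.match(line.strip())
    if match is None:
        raise ValueError('unrecognised line: %r' % line)
    fields = list(match.groups())
    fields.insert(-1, u'@')  # put the separator back before the parse code
    return fields


def split_parse_sketch(parse):
    """Split a parse code such as 'ncfsa' into single characters."""
    return list(parse)
```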

All help is appreciated!

Thanks
