Re: Filenames in Ada



Let's see if I understand the problem. Windows has two functions for each file operation: an -A version that expects or returns a file name in some 8-bit encoding like Windows-1252, and a -W version that expects or returns a file name in UTF-16 (or maybe UCS-2)? And all the file operations in the Ada library take and return file names as String, that is, Latin-1? And GNAT's implementation pretends that Latin-1 is identical to whatever 8-bit encoding Windows is using, and passes these Strings to Windows' -A functions, leaving you with no way to handle file names that can't be expressed in said 8-bit encoding? Is that right?
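To make the suspected mismatch concrete, here is a minimal Python sketch (not Ada, and not GNAT's actual code). It assumes the 8-bit encoding in question is Windows-1252; the function names are made up for illustration. It shows that bytes 0x80..0x9F mean different things in Latin-1 and Windows-1252, and that some file names cannot be expressed in Windows-1252 at all.

```python
def latin1_as_cp1252(name: str) -> str:
    """Reinterpret the Latin-1 bytes of `name` as Windows-1252.

    This mimics pretending that the two encodings are identical.
    Bytes that Windows-1252 leaves undefined become U+FFFD.
    """
    return name.encode("latin-1").decode("cp1252", errors="replace")


def fits_in_cp1252(name: str) -> bool:
    """True if every character of `name` exists in Windows-1252."""
    try:
        name.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Characters in the range 0xA0..0xFF agree between the two encodings.
    print(latin1_as_cp1252("blåbär.txt"))
    # Byte 0x80 is a control character in Latin-1 but the euro sign in Windows-1252.
    print(repr(latin1_as_cp1252("\x80")))
    # A Cyrillic name cannot be passed through an -A function under this code page.
    print(fits_in_cp1252("файл.txt"))
```

Names made only of ASCII and the 0xA0..0xFF range survive the pretence unchanged, which would explain why the problem often goes unnoticed.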

It is my intention to add an encoding-aware interface to Ada.Directories under EAstrings.OS. For that to work reasonably on Windows, this problem needs to be solved. I suppose I also need to fix this in EAstrings.IO. I will need help from a Windows programmer to do this. (Of course I also need to get transcoding implemented on Windows before EAstrings will be of any use there.)

It seems that the right thing to do would be to tap into the GNAT library and make UTF-16 (or UCS-2) versions of the file operations. It could be as easy as changing the parameter type and replacing the calls to the Windows functions with their -W equivalents, or it could be very hairy.
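As a rough illustration of the data a -W function would need, the following Python sketch builds a NUL-terminated buffer of 16-bit little-endian code units from a file name. The helper names are hypothetical, and nothing here reflects how the GNAT library is actually structured; it only shows the conversion step that a wide-string version of a file operation would have to perform.

```python
def to_wide_buffer(name: str) -> bytes:
    """Encode `name` as UTF-16LE and append a 16-bit NUL terminator."""
    return name.encode("utf-16-le") + b"\x00\x00"


def from_latin1_string(raw: bytes) -> bytes:
    """Convert the bytes of a Latin-1 String into a wide buffer.

    Every Latin-1 byte maps to the Unicode code point with the same
    value, so this conversion can never fail.
    """
    return to_wide_buffer(raw.decode("latin-1"))


if __name__ == "__main__":
    print(from_latin1_string(b"bl\xe5b\xe4r.txt").hex(" "))
```

Going from Latin-1 to 16-bit units is lossless, so existing String-based callers could in principle be routed through the same wide path.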

We'll need to determine whether it is UTF-16 or UCS-2. This page lists code page numbers for a whole lot of encodings, but UTF-16 is missing:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_81rn.asp

I take that as a hint that UTF-16 is Windows' idea of wide strings, and that all the others are considered "multi-byte character sets" or whatever the term is.
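For reference, the practical difference between the two shows up only for characters outside the Basic Multilingual Plane: UTF-16 encodes them as a surrogate pair of two 16-bit units, while UCS-2 has no way to represent them. A small Python sketch (function names are made up for illustration):

```python
from typing import List


def utf16_code_units(text: str) -> List[int]:
    """Return the 16-bit code units of `text` when encoded as UTF-16."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def representable_in_ucs2(text: str) -> bool:
    """True if every character lies in the Basic Multilingual Plane."""
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    # U+00E5 (å) is a single unit in both encodings.
    print([hex(u) for u in utf16_code_units("å")])
    # U+1D11E (musical G clef) needs a surrogate pair in UTF-16.
    print([hex(u) for u in utf16_code_units("\U0001D11E")])
    print(representable_in_ucs2("\U0001D11E"))
```

For file names restricted to the Basic Multilingual Plane the two encodings produce identical 16-bit units, so the distinction matters only for the rarer characters beyond it.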

--
Björn Persson                              PGP key A88682FD
                   omb jor ers @sv ge.
                   r o.b n.p son eri nu