Re: Filenames in Ada



Björn Persson wrote:

> Let's see if I understand the problem. Windows has two functions for
> each file operation, one -A version that expects or returns a file name
> in some 8-bit encoding like Windows-1252, and one -W version that
> expects or returns a file name in UTF-16 or maybe UCS-2?

Well, the Windows APIs in question were designed at a time when UTF-16 and
UCS-2 were still the same, that is, when Unicode had no code points defined
above 65535. At that time programmers did not care about, or understand, the
difference between the two.
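
To make the difference concrete, here is a minimal sketch in Python (not Ada,
and not tied to any Windows call; the helper names and the sample file names
are made up for illustration). A name that stays inside the 16-bit range
needs one 16-bit unit per character, so UCS-2 and UTF-16 agree. A character
above 65535 needs two units (a surrogate pair) in UTF-16 and cannot be
stored in UCS-2 at all.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def fits_in_ucs2(text: str) -> bool:
    """True if every character has a code point of at most 0xFFFF."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# Hypothetical file names, chosen only for illustration.
bmp_name = "Björn.txt"            # all characters below 65536
astral_name = "\U0001D11E.txt"    # U+1D11E MUSICAL SYMBOL G CLEF

for name in (bmp_name, astral_name):
    print(len(name), utf16_code_units(name), fits_in_ucs2(name))
```

For the first name the character count and the unit count are equal; for the
second the unit count is one higher, which is exactly the case a pure UCS-2
implementation gets wrong.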

VFAT-32 is most likely a UCS-2 filesystem (can anyone from China confirm
that?). I remember an article about the "new" VFAT technology wasting an
"enormous" amount of storage by using UCS-2 for character encoding.

Obviously the article came from a Latin-1 based country ;-) .

> And all the
> file operations in the Ada library take and return file names as String,
> that is, Latin-1? And Gnat's implementation pretends that Latin-1 is
> identical to whatever 8-bit encoding Windows is using, and passes these
> Strings to Windows' -A functions, leaving you with no way to handle
> filenames that can't be expressed in said 8-bit encoding? Is that right?

Yes indeed. But I take it that on a Russian system the Windows-1251 code
page is active and all filenames are expressed using that and not Latin-1.
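
A small Python sketch of what that means in practice (again only an
illustration: the byte values and file names are my own examples, and
`representable` is a made-up helper, not part of any library). The same six
bytes read as a Russian word under Windows-1251 and as accented Latin
letters under Latin-1, and a name outside the active code page cannot be
passed through an 8-bit interface at all.

```python
# Six bytes as an 8-bit file API might hand them to the program.
raw = bytes([0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2])

as_cp1251 = raw.decode("cp1251")    # interpretation on a Russian system
as_latin1 = raw.decode("latin-1")   # interpretation if we pretend it is Latin-1


def representable(name: str, codec: str) -> bool:
    """True if name can be encoded in the given 8-bit code page."""
    try:
        name.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(as_cp1251, as_latin1)
print(representable(as_cp1251, "latin-1"))    # Cyrillic name, Latin-1 interface
print(representable("日本.txt", "cp1251"))     # Japanese name, Russian code page
```

Treating the bytes as Latin-1 is lossless as long as they are only passed
back unchanged, but anything that inspects or displays the characters sees
the wrong text.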

> It is my intention to add an encoding-aware interface to Ada.Directories
> under EAstrings.OS. For that to work reasonably on Windows, this problem
> needs to be solved. I suppose I also need to fix this in EAstrings.IO. I
> will need help from a Windows programmer to do this. (Of course I also
> need to get transcoding implemented on Windows before EAstrings will be
> of any use there.)

It is sad that XML/Ada has no UCS-2 and UCS-4 conversion available, but
AdaCL already has that, so no problem for you really.

> It seems that the right thing to do would be to tap into the Gnat
> library and make UTF-16 (or UCS-2) versions of the file operations. It
> could be as easy as changing the parameter type and replacing calls to
> the Windows functions with their -W equivalents, or it could be very
> hairy.

I had that idea as well and did take a look. Lots of "pragma Import" there.

> We'll need to determine whether it is UTF-16 or UCS-2. This page lists
> code page numbers for a whole lot of encodings, but UTF-16 is missing:
>
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_81rn.asp
>
> I take that as a hint that UTF-16 is Windows' idea of wide strings, and
> that all the others are considered "multi-byte character sets" or
> whatever the term is.

Well, there seems to be a better article:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/fs/naming_a_file.asp

I wonder about that \\?\ stuff and what it really means.

Martin
--
mailto://krischik@xxxxxxxxxxxxxxxxxxxxx
Ada programming at: http://ada.krischik.com