Re: Filenames in Ada

Björn Persson wrote:

> Martin Krischik wrote:
>> But I take it that on a Russian system the Windows-1251 code
>> page is active and all filenames are expressed using that and not Latin
>> 1.
> Speaking of that, do you know how to find out which code page is active?
> Can I get it from C's nl_langinfo like in Unix?

Depends. Cygwin and MinGW do indeed support those functions - but MS-C does
not. MS is a real pain when it comes to standard functionality. I guess
they don't want to support full ANSI/ISO C: it costs money and may lead to
fewer customers, as porting away becomes easier.
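
For anyone who just wants to see what their own machine reports, here is a
minimal sketch in Python (not Ada, only because it is quick to run). The
helper names active_encoding and windows_ansi_code_page are made up for
illustration. It assumes that Python's locale module exposes nl_langinfo
only on Unix-like systems, and that the Win32 call GetACP returns the
active ANSI code page number:

```python
import locale
import sys


def active_encoding():
    """Return the character encoding name of the user's current locale."""
    if hasattr(locale, "nl_langinfo"):
        # Unix-like systems: the same query as C's nl_langinfo(CODESET).
        try:
            locale.setlocale(locale.LC_CTYPE, "")
        except locale.Error:
            pass  # keep the default "C" locale if the environment is broken
        return locale.nl_langinfo(locale.CODESET)
    # No nl_langinfo available (e.g. native Windows): ask the locale module.
    return locale.getpreferredencoding(False)


def windows_ansi_code_page():
    """Return the active ANSI code page number on Windows, else None."""
    if sys.platform != "win32":
        return None
    import ctypes  # imported lazily; windll exists only on Windows
    return ctypes.windll.kernel32.GetACP()


if __name__ == "__main__":
    print("locale encoding:", active_encoding())
    print("ANSI code page:", windows_ansi_code_page())
```

On a Russian Windows installation the second value would presumably be
1251; an Ada binding would have to import GetACP itself.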

>> It is sad that XML/Ada has no UCS-2 and UCS-4 conversion available - but
>> AdaCL already has that - so no problem for you really.
> My main problem is lack of time. I found a job a year ago. Darn! ;-)

Damn - I've got the bloody same problem ;-) .

>>>It seems that the right thing to do would be to tap into the Gnat
>>>library and make UTF-16 (or UCS-2) versions of the file operations. It
>>>could be as easy as changing the parameter type and replacing calls to
>>>the Windows functions with their -W equivalents, or it could be very
>> I had that idea as well and did take a look. Lots of "pragma Import"
>> there.
> I take it you mean that's a complication. You can't just import other
> functions?

Well, it usually is something like

pragma Import (C, ....., "_gnat_.....");

So we have Ada -> C -> libC, and we have to replace not only the Ada
functions but the C functions as well.

>> Well, there seems to be a better article:


> Well, it says "Windows stores the long file names on disk in Unicode",
> so now we have to guess which encoding it is they call "Unicode". I'm
> guessing UTF-16, because UTF-16 was defined by Unicode while I think
> UCS-2 was defined by ISO.

I am more pessimistic here. If MS had been willing to implement a
variable-length encoding, why would they have used a 16-bit encoding for long
filenames in VFAT? At that time MP3 libraries were uncommon and almost all
filenames were plain ASCII.

> Maybe you can do an experiment? Create a file with a surrogate pair in
> the name and see how it's shown in the file manager. You may get boxes
> or something if Windows doesn't have the glyphs, but if you see only one
> box it's obviously been interpreted as UTF-16. If you see two boxes or
> you get some error then Windows seems to expect UCS-2.

Anybody got an example? If not, I'll get one from Wikibooks. There is a good
book on "Mandarin Chinese" there which should have enough examples ;-).
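
Any character outside the Basic Multilingual Plane will do for the test.
A minimal sketch in Python (again not Ada, just for brevity), assuming
U+20000, the first CJK Extension B ideograph, as the test character. The
names TEST_CHAR, utf16_code_units and create_test_file are made up for
illustration:

```python
import os

# U+20000 lies outside the Basic Multilingual Plane, so UTF-16 must
# encode it as a surrogate pair; UCS-2 cannot represent it at all.
TEST_CHAR = "\U00020000"


def utf16_code_units(text):
    """Return the 16-bit code units of text when encoded as UTF-16."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def create_test_file(directory):
    """Create an empty file whose name contains TEST_CHAR; return its path."""
    path = os.path.join(directory, "pair_" + TEST_CHAR + ".txt")
    with open(path, "w"):
        pass  # the content does not matter, only the name
    return path


if __name__ == "__main__":
    print([hex(u) for u in utf16_code_units(TEST_CHAR)])  # ['0xd840', '0xdc00']
```

Calling create_test_file on a directory you can browse should give a file
to look at in the file manager. Whether it shows up as one box or two is
exactly the open question, so no expected result is claimed here.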

>> I wonder about that \\?\ stuff and what it really means
> It looks like a crude hack to allow longer paths by bypassing parts of
> the library.

Well, Windows is full of hacks. And so is Unix. Just yesterday I read
that the "Uni" in Unix stands for "single user". Originally Unix was a
scaled-down single-user alternative to "Multics". Well, that explains a lot.
Indeed there is only one real user on Unix - the user with the ID 0.

> Anyway it's not relevant to the question of character
> encodings.

The question is: which encoding is used for \\?\ filenames? If it is UTF-8
it would solve some of our problems.

