Re: Filenames in Ada



Björn Persson wrote:

> Martin Krischik wrote:
>> But I take it that on a Russian system the Windows-1251 code
>> page is active and all filenames are expressed using that and not Latin
>> 1.
>
> Speaking of that, do you know how to find out which code page is active?
> Can I get it from C's nl_langinfo like in Unix?

Depends. Cygwin and MinGW do indeed support those functions - but MS-C does
not. MS is a real pain when it comes to standard functionality. I guess
they don't want to support full ANSI/ISO C: it costs money and may lead to
fewer customers as porting away becomes easier.
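
Just to illustrate the two routes, here is a minimal sketch (in Python rather
than Ada or C, purely for brevity). The helper name active_codeset is made up
for this example. The POSIX branch is the same call as C's
nl_langinfo(CODESET); the Windows branch asks the Win32 function GetACP for
the active ANSI code page, which is my assumption of a suitable substitute
and not something suggested earlier in this thread.

```python
import locale
import sys


def active_codeset() -> str:
    """Return a best-effort name for the active character encoding.

    Hypothetical helper, for illustration only.
    """
    if sys.platform == "win32":
        # Native Windows Python has no nl_langinfo; ask the Win32 API instead.
        import ctypes
        return "cp%d" % ctypes.windll.kernel32.GetACP()
    if hasattr(locale, "nl_langinfo"):
        # POSIX route: equivalent to C's nl_langinfo(CODESET).
        name = locale.nl_langinfo(locale.CODESET)
        if name:
            return name
    # Last resort: let Python work it out from the environment.
    return locale.getpreferredencoding(False)


print(active_codeset())
```

The returned name depends on the environment, so treat it as a hint and not
as a guarantee of how filenames are actually encoded.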

>> It is sad that XML/Ada has no UCS-2 and UCS-4 conversion available - but
>> AdaCL already has that - so no problem for you really.
>
> My main problem is lack of time. I found a job a year ago. Darn! ;-)

Damn - I've got the very same problem ;-)

>>>It seems that the right thing to do would be to tap into the Gnat
>>>library and make UTF-16 (or UCS-2) versions of the file operations. It
>>>could be as easy as changing the parameter type and replacing calls to
>>>the Windows functions with their -W equivalents, or it could be very
>>>hairy.
>>
>> I had that idea as well and did take a look. Lots of "pragma Import"
>> there.
>
> I take it you mean that's a complication. You can't just import other
> functions?

Well, it is usually something like

pragma Import (C, ....., "_gnat_.....");

So we have Ada -> C -> libC, and we would have to replace not only the Ada
functions but the C functions as well.

>> Well, there seems to be a better article:

>>http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/fs/naming_a_file.asp

> Well, it says "Windows stores the long file names on disk in Unicode",
> so now we have to guess which encoding it is they call "Unicode". I'm
> guessing UTF-16, because UTF-16 was defined by Unicode while I think
> UCS-2 was defined by ISO.

I am more pessimistic here. If MS had wanted to implement a variable-length
encoding, why would they have used a 16-bit encoding for long filenames
in VFAT? At that time MP3 libraries were uncommon and almost all filenames
were plain ASCII.

> Maybe you can do an experiment? Create a file with a surrogate pair in
> the name and see how it's shown in the file manager. You may get boxes
> or something if Windows doesn't have the glyphs, but if you see only one
> box it's obviously been interpreted as UTF-16. If you see two boxes or
> you get some error then Windows seems to expect UCS-2.

Anybody got an example? If not, I'll get one from Wikibooks. There is a good
book on "Mandarin Chinese" there which should have enough examples ;-).
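
For what it's worth, a test name can also be built by hand. The following is
a minimal sketch (Python, for illustration only) that takes one character
outside the Basic Multilingual Plane and shows the two 16-bit code units it
becomes in UTF-16. The helper names and the file name are invented for this
example, and the choice of U+20000 is arbitrary.

```python
from typing import List


def utf16_code_units(text: str) -> List[int]:
    """Return the 16-bit code units of text when encoded as UTF-16."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def has_surrogate_pair(text: str) -> bool:
    """True if any character lies outside the BMP and so needs two code units."""
    return any(ord(ch) > 0xFFFF for ch in text)


# U+20000 is a CJK ideograph outside the Basic Multilingual Plane.
# The file name is hypothetical; any non-BMP character would do.
test_name = "surrogate_\U00020000.txt"

units = utf16_code_units("\U00020000")
print([hex(u) for u in units])  # ['0xd840', '0xdc00']
```

Creating a file called test_name on the Windows box and looking at it in the
file manager would then be the actual experiment; the sketch only prepares the
name and does not touch the file system.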

>> I wonder about that \\?\ stuff and what it really means
>
> It looks like a crude hack to allow longer paths by bypassing parts of
> the library.

Well, Windows is full of hacks as well. And so is Unix. Just yesterday I read
that the "Uni" in Unix stands for "single user". Originally Unix was a
scaled-down single-user alternative to "Multics". Well, that explains a lot.
Indeed there is only one real user on Unix - the user with the ID 0.

> Anyway it's not relevant to the question of character
> encodings.

The question is: which encoding is used for \\?\ filenames? If it is UTF-8,
it would solve some of our problems.

Martin

--
mailto://krischik@xxxxxxxxxxxxxxxxxxxxx
Ada programming at: http://ada.krischik.com