Re: Filenames in Ada



Martin Krischik wrote:
But I take it that on a Russian system the Windows-1251 code
page is active and all filenames are expressed using that and not Latin 1.

Speaking of that, do you know how to find out which code page is active? Can I get it from C's nl_langinfo like in Unix?
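
A minimal sketch of such a lookup, written in Python rather than Ada so that it stays short. It assumes that `nl_langinfo` is POSIX-only (as far as I know the Microsoft C runtime does not provide it) and that on Windows the Win32 function `GetACP` reports the active ANSI code page. The helper name `active_codeset` is made up for this illustration.

```python
import locale
import sys


def active_codeset():
    """Return the name of the character encoding the system reports as active."""
    if sys.platform == "win32":
        import ctypes
        # GetACP returns the ANSI code page as an integer, e.g. 1251 or 1252.
        return "cp%d" % ctypes.windll.kernel32.GetACP()
    # Adopt the locale from the environment; keep the default "C" locale
    # if the environment names a locale that is not installed.
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass
    # POSIX route: same information as C's nl_langinfo(CODESET).
    return locale.nl_langinfo(locale.CODESET)


print(active_codeset())
```

On a Russian Windows installation this would presumably print `cp1251`; on a Unix system the result depends on the locale settings in the environment.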


It is sad that XML/Ada has no UCS-2 and UCS-4 conversion available - but
AdaCL already has that - so no problem for you really.

My main problem is lack of time. I found a job a year ago. Darn! ;-)

It seems that the right thing to do would be to tap into the GNAT
library and make UTF-16 (or UCS-2) versions of the file operations. It
could be as easy as changing the parameter type and replacing calls to
the Windows functions with their -W equivalents, or it could be very
hairy.

I had that idea as well and did take a look. Lots of "pragma Import" there.

I take it you mean that's a complication. You can't just import other functions?


Well, there seems to be a better article:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/fs/naming_a_file.asp

Well, it says "Windows stores the long file names on disk in Unicode", so now we have to guess which encoding it is they call "Unicode". I'm guessing UTF-16, because UTF-16 was defined by Unicode while I think UCS-2 was defined by ISO.


Maybe you can do an experiment? Create a file with a surrogate pair in the name and see how it's shown in the file manager. You may get boxes or something if Windows doesn't have the glyphs, but if you see only one box it's obviously been interpreted as UTF-16. If you see two boxes or you get some error then Windows seems to expect UCS-2.
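
To make the experiment concrete, here is a small Python sketch of what such a file name looks like as 16-bit code units. The choice of U+1D11E (MUSICAL SYMBOL G CLEF) and the helper names `utf16_code_units` and `is_surrogate` are only for illustration; any character outside the Basic Multilingual Plane would do.

```python
import struct


def utf16_code_units(text):
    """Encode text as UTF-16 (little-endian, no BOM) and return its 16-bit code units."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def is_surrogate(unit):
    """True if a 16-bit code unit lies in the surrogate range D800..DFFF."""
    return 0xD800 <= unit <= 0xDFFF


name = "\U0001D11E.txt"  # one character outside the BMP, then ".txt"
units = utf16_code_units(name)

print([hex(u) for u in units])
# ['0xd834', '0xdd1e', '0x2e', '0x74', '0x78', '0x74']
print(len(name), "characters when read as UTF-16")        # 5
print(len(units), "code units when read as UCS-2")        # 6
print(sum(is_surrogate(u) for u in units), "surrogates")  # 2
```

A UTF-16 reader combines the first two units into a single character, which matches the "one box" outcome. A strict UCS-2 reader treats them as two separate, unassigned 16-bit values, which matches the "two boxes or an error" outcome.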

I wonder about that \\?\ stuff and what it really means.

It looks like a crude hack to allow longer paths by bypassing parts of the library. Anyway, it's not relevant to the question of character encodings.


--
Björn Persson                              PGP key A88682FD
                   omb jor ers @sv ge.
                   r o.b n.p son eri nu


