Re: reading filenames from stdin - with umlauts?



Dan Stromberg wrote:
However, to my disappointment, the Java version of the program can't seem to deal with filenames that have umlauts in them. Filenames using only characters in the English alphabet seem fine.

I suspect the problem is that the file name, as it appears in a Linux ext3 filesystem, has an 8-bit-per-character representation, but Java wants to convert the string I read from stdin to a 16-bit-per-character representation, and then doesn't reverse the conversion when I go to open the file by its name.

No. Java /always/ uses 16-bit characters internally; if it failed to convert names back to the filesystem's encoding, it couldn't open files at all.
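
To make the 8-bit versus 16-bit distinction concrete, here is a minimal sketch. The class name UmlautBytes and the sample name "über.txt" are made up for illustration. It shows that an umlaut is a single 16-bit char inside a Java String, and only becomes one or two bytes once a particular charset is chosen to encode it:

```java
import java.nio.charset.StandardCharsets;

public final class UmlautBytes {

    // Render each byte as two hex digits, separated by spaces.
    static String hex(final byte[] bytes) {
        final StringBuilder sb = new StringBuilder();
        for (final byte b : bytes) {
            if (sb.length() > 0)
                sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(final String[] args) {
        // "über.txt", written with an escape so the source file's own
        // encoding cannot affect the result (hypothetical sample name).
        final String name = "\u00FCber.txt";
        System.out.println(name.length());                                  // 8
        System.out.println(hex(name.getBytes(StandardCharsets.UTF_8)));      // C3 BC 62 65 72 2E 74 78 74
        System.out.println(hex(name.getBytes(StandardCharsets.ISO_8859_1))); // FC 62 65 72 2E 74 78 74
    }
}
```

If the bytes stored in the directory are in one of these encodings and the program decodes or encodes with the other, the name no longer matches, even though names made only of ASCII characters still work.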

Try running this program:

import java.io.File;

public final class DirScan {

    public static void main(final String[] args) {
        for (final String dirName : args) {
            System.out.println(dirName);
            final File dir = new File(dirName);
            // listFiles() returns null if dirName is not a readable directory.
            final File[] files = dir.listFiles();
            if (files == null)
                continue;
            for (final File file : files) {
                final String fileName = file.toString();
                System.out.printf(" %-25s ", fileName);
                // Dump each 16-bit char of the name in hex.
                for (int i = 0; i < fileName.length(); ++i)
                    System.out.printf(" %04X", (int) fileName.charAt(i));
                System.out.println();
            }
        }
    }

}

...specifying one or more directories as arguments.
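
If the hex dump shows the umlauts as the code points you expect (00FC for ü, for instance), the directory side is fine and the place to look is how the names are decoded from stdin. A rough sketch of reading them with an explicit charset follows. It assumes the names arrive as UTF-8, one per line; the class name StdinNames and the readNames helper are hypothetical, not from the original program:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public final class StdinNames {

    // Decode the stream with the given charset and return one name per line.
    static List<String> readNames(final InputStream in, final Charset charset) {
        final List<String> names = new ArrayList<>();
        try {
            final BufferedReader reader =
                    new BufferedReader(new InputStreamReader(in, charset));
            String line;
            while ((line = reader.readLine()) != null)
                names.add(line);
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(final String[] args) {
        // Assumption: stdin carries UTF-8. Without an explicit charset,
        // InputStreamReader falls back to the platform default instead.
        for (final String name : readNames(System.in, StandardCharsets.UTF_8))
            System.out.printf("%s exists=%b%n", name, new File(name).exists());
    }
}
```

Whether the file is then found also depends on how the JVM encodes the name when it hands it to the filesystem, which as far as I know follows the locale, so it is worth checking that the locale the JVM runs under matches the encoding the names were created in.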


--
John W. Kennedy
"Never try to take over the international economy based on a radical feminist agenda if you're not sure your leader isn't a transvestite."
-- David Misch: "She-Spies", "While You Were Out"