Re: Unicode conversion



djake@xxxxxxxxx wrote:
How can I convert char to wchar_t?
And how can I convert wchar_t to char?

Thanks to anyone

Basically, set the LC_CTYPE category of the locale to tell the implementation which encoding your char strings use, then call wcstombs (wide to multibyte) or mbstowcs (multibyte to wide) to perform the conversion.


The following code demonstrates their use.

(Can an expert here please check my code? It seems to work fine, but there may be off-by-one errors in the memory allocation and/or error checking.)

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

char *   /// convert wchar_t string to char in given encoding
wchar_t_string_to_char(const wchar_t *src,
                       const char    *encoding)
{
  if(setlocale(LC_CTYPE, encoding) == NULL)
  {
    fprintf(stderr, "requested encoding unavailable\n");
    return NULL;
  }
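  // With a NULL destination, POSIX wcstombs only counts the bytes needed
  // (excluding the terminator) and returns (size_t)-1 if src cannot be converted.
  size_t n = wcstombs(NULL, src, 0);
  if(n == (size_t)-1)
  {
    fprintf(stderr, "conversion failed\n");
    return NULL;
  }
  char *dst = malloc(n + 1);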
  if(dst == NULL)
  {
    fprintf(stderr, "memory allocation failed\n");
    return NULL;
  }
  if(wcstombs(dst, src, n + 1) != n)
  {
    fprintf(stderr, "conversion failed\n");
    free(dst);
    return NULL;
  }
  return dst;
}

wchar_t *  /// convert char string in given encoding to wchar_t
char_string_to_wchar_t(const char *src,
                       const char *encoding)
{
  if(setlocale(LC_CTYPE, encoding) == NULL)
  {
    fprintf(stderr, "requested encoding unavailable\n");
    return NULL;
  }
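  // With a NULL destination, POSIX mbstowcs only counts the wide characters
  // needed (excluding the terminator) and returns (size_t)-1 on invalid input.
  size_t n = mbstowcs(NULL, src, 0);
  if(n == (size_t)-1)
  {
    fprintf(stderr, "conversion failed\n");
    return NULL;
  }
  wchar_t *dst = malloc((n + 1) * sizeof *dst);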
  if(!dst)
  {
    fprintf(stderr, "memory allocation failed\n");
    return NULL;
  }
  if(mbstowcs(dst, src, n + 1) != n)
  {
    fprintf(stderr, "conversion failed\n");
    free(dst);
    return NULL;
  }
  return dst;
}

int  /// test the above functions
main(void)
{
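  // Hex escapes avoid out-of-range initializers where plain char is signed.
  char utf8[] = "\xE4\xBD\xA0\xE5\xA5\xBD";  // UTF-8 bytes for U+4F60 U+597D
  // Assumes wchar_t holds Unicode code points (true for glibc, not guaranteed by C).
  wchar_t unicode[] = {0x4F60, 0x597D, 0};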

  const char *encoding = "en_US.UTF-8";  // or try en_US.ISO-8859-1 etc.

  char    *converted1 = wchar_t_string_to_char(unicode, encoding);
  wchar_t *converted2 = char_string_to_wchar_t(utf8,    encoding);

  if(converted1)
  {
    printf("Unicode converted to UTF8: ");
    for(char *p = converted1; *p; p++)
      printf("%X ", (unsigned)(unsigned char)*p);
    free(converted1);
  }

  if(converted2)
  {
    printf("\nUTF8 converted to Unicode: ");
    for(wchar_t *p = converted2; *p; p++)
      printf("%X ", (unsigned)*p);
    free(converted2);
  }

  putchar('\n');
  return 0;
}
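
One caveat with the functions above: setlocale changes LC_CTYPE for the whole program and leaves it changed after the conversion. If that matters to you, the following is a minimal sketch of one way to save and restore the previous setting around the call. The helper name mbs_to_wcs_in_locale is my own invention for illustration, not a library function, and like the code above it relies on the POSIX behaviour of mbstowcs with a NULL destination.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a library function): convert src to a newly
   allocated wchar_t string under the given LC_CTYPE locale, then restore
   the locale that was active before the call.
   Returns NULL on any failure; the caller frees the result. */
wchar_t *
mbs_to_wcs_in_locale(const char *src, const char *locale_name)
{
  /* setlocale(LC_CTYPE, NULL) only queries the current setting. The
     returned string may be overwritten by a later setlocale call, so
     keep a private copy. */
  const char *current = setlocale(LC_CTYPE, NULL);
  if(current == NULL)
    return NULL;
  char *saved = malloc(strlen(current) + 1);
  if(saved == NULL)
    return NULL;
  strcpy(saved, current);

  wchar_t *dst = NULL;
  if(setlocale(LC_CTYPE, locale_name) != NULL)
  {
    size_t n = mbstowcs(NULL, src, 0);  /* POSIX: count only, no output */
    if(n != (size_t)-1)
    {
      dst = malloc((n + 1) * sizeof *dst);
      if(dst != NULL && mbstowcs(dst, src, n + 1) != n)
      {
        free(dst);
        dst = NULL;
      }
    }
  }

  setlocale(LC_CTYPE, saved);  /* put the caller's locale back */
  free(saved);
  return dst;
}
```

A call such as mbs_to_wcs_in_locale(utf8, "en_US.UTF-8") would then leave the program's locale as it found it. Note that setlocale is still not safe to call while other threads are using locale-dependent functions, so this only helps in single-threaded code.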