
The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] C Obfuscator



Oh, okay.  Here's the excerpt from "The Design and Evolution of C++" by
Bjarne Stroustrup:

== BEGIN EXCERPT ==

6.5.3.2 Extended Character Sets

Support for a restricted character set representation for C++ is
essentially backward-looking.  A more interesting and difficult problem is
how to support extended character sets; that is, how to take advantage of
character sets with more characters than ASCII.  There are two distinct
problems:

   [1] How to support manipulation of extended character sets?
   [2] How to allow extended character sets in the source text of a C++
       program?

The C standards committee approached the former problem by defining a type
wchar_t to represent multi-byte characters.  In addition, a multi-byte
string type wchar_t[] and printf-family I/O for wchar_t were provided.
C++ continues in this direction by making wchar_t a proper type (rather
than merely a synonym for another type defined using typedef as it is in
C), by providing a standard string of wchar_t class called wstring, and by
supporting these types in stream I/O.

This supports only a single "wide character" type.  If a programmer needs
more types, say a Japanese character, a string of Japanese characters, a
Hebrew character, or a string of Hebrew characters, there are at least two
alternative approaches.  One can map these characters into a common
character set large enough to hold both, say, Unicode, and write code that
handles that using wchar_t.  Alternatively, one can define classes for
each kind of character and string, say, Jchar, Jstring, Hchar, and
Hstring, and have these classes supply the correct behavior for each.
Such classes ought to be generated from a common template.  My experience
is that either approach can work, but that any decision that touches
internationalization and multiple character sets becomes controversial and
emotional faster than any other kind of problem.

The question of if and how to allow extended character sets to be used in
C++ program text is no less tricky.  Naturally, I would like to use the
Danish words for apple, tree, boat, and island in programs dealing with
such concepts.  Allowing {ae}ble, tr{ae}, b{a^o}d, and {o/} in comments is
not difficult, and comments in languages other than English are indeed not
uncommon.  Allowing extended character sets in identifiers is more
problematic.  In principle, I'd like to allow identifiers written in
Danish, Japanese, and Korean in a C or C++ program.  There are no serious
technical problems in doing that.  In fact, a local C compiler written by
Ken Thompson allows all Unicode characters with no special meaning in C in
identifiers.

I worry about portability and comprehension, though.  The technical
portability problem can be handled.  However, English has an important
role as a common language for programmers, and I suspect that it would be
unwise to abandon that without serious consideration.  To most
programmers, a systematic use of Hebrew, Chinese, Korean, etc., would be a
significant barrier to comprehension.  Even my native Danish could cause
some headaches for the average English-speaking programmer.

The C++ committee hasn't made any decisions on this issue so far, but I
suspect it will have to and that every possible resolution will be
controversial.

- The Design and Evolution of C++, Bjarne Stroustrup, pages 161 to 162.
  (Typos mine.  Also, some unprintable special characters were creatively
  reproduced by yours truly.)

== END EXCERPT ==
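
The first half of the excerpt names three C++ facilities: wchar_t as a proper type, the wstring class, and stream I/O for wide characters.  As a minimal illustration (not from the book), the sketch below assumes a C++11 or later compiler and uses the hypothetical function names `kind` and `label`.  The `static_assert` and the overloads show that wchar_t is a type of its own rather than a typedef, and `label` builds a `std::wstring` through a wide string stream.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <type_traits>

// In C, wchar_t is a typedef for some integer type.  In C++ it is a
// distinct built-in type, so it can never be the same type as int.
static_assert(!std::is_same<wchar_t, int>::value,
              "wchar_t is a type of its own in C++");

// Because the type is distinct, overload resolution can tell the two apart.
const char* kind(int)     { return "int"; }
const char* kind(wchar_t) { return "wchar_t"; }

// std::wstring is std::basic_string<wchar_t>; a wide string stream formats
// values into one, just as std::ostringstream does for std::string.
std::wstring label(const std::wstring& word, int count) {
    std::wostringstream out;
    out << word << L" x" << count;
    return out.str();
}
```

With these definitions, `kind(L'a')` selects the wchar_t overload and `label(L"apple", 3)` returns `L"apple x3"`.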

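Stroustrup's second alternative, separate character and string classes generated from a common template, could be sketched roughly as follows.  Only the names Jchar, Jstring, Hchar, and Hstring come from the excerpt; the `Char`, `String`, and `ScriptTraits` templates, the empty tag types, and storing each character as a `char32_t` Unicode code point are assumptions made for this illustration.  The one piece of per-script behavior shown is writing direction.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical empty tag types, one per script.
struct Japanese {};
struct Hebrew {};

// Per-script behavior is supplied by specializing a traits template.
template <typename Script> struct ScriptTraits;
template <> struct ScriptTraits<Japanese> { static constexpr bool right_to_left = false; };
template <> struct ScriptTraits<Hebrew>   { static constexpr bool right_to_left = true; };

// One template generates a distinct character type per script.
// Storing a Unicode code point is an assumption of this sketch.
template <typename Script>
struct Char {
    char32_t code;
};

// One template generates the matching string type.
template <typename Script>
class String {
public:
    void push_back(Char<Script> c) { text_.push_back(c.code); }
    std::size_t size() const { return text_.size(); }
    Char<Script> operator[](std::size_t i) const { return Char<Script>{text_[i]}; }
    static constexpr bool right_to_left() { return ScriptTraits<Script>::right_to_left; }

private:
    std::u32string text_;
};

using Jchar = Char<Japanese>;
using Jstring = String<Japanese>;
using Hchar = Char<Hebrew>;
using Hstring = String<Hebrew>;

// The generated types are unrelated to each other.
static_assert(!std::is_same<Jchar, Hchar>::value, "distinct character types");
```

Because Jchar and Hchar are unrelated types, passing an Hchar to `Jstring::push_back` fails to compile, which is the kind of type safety the separate-classes approach offers over a single wchar_t.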
The language Bjarne uses isn't as strong as I remembered from reading it two
summers ago.  And I guess I conveniently decided to forget the last sentence in
the discussion ("The C++ committee hasn't made any decisions on this issue
so far...").  Anyway, I hope the excerpt is as interesting for someone
else on the list to read as it was for me!

-Mark

PS: This book is a really good, interesting read.  Some of you know I've
quoted from it many times on the list, and I've also recommended it as a
must read to many.  There are so many features to C++ that sometimes you
gotta ask, "why was this designed this way?"  This book answers all those
questions.  Check it out!

On Fri, 11 May 2001, Micah Cowan wrote:

> On Fri, May 11, 2001 at 05:05:38PM -0700, Mark K. Kim wrote:
> > On Fri, 11 May 2001, Henry House wrote:
> >
> > > A slightly oblique question: does anyone know how to turn on national
> > > language character support in GCC? ANSI C++ allows all sorts of national
> > > characters in identifiers, but a simple test program:
> >
> > Hmmm... I'm fairly certain ANSI/ISO C++ doesn't allow international
> > characters in identifiers.  In "Design and Evolution of C++" (written just
> > before the official standardization of C++), Bjarne made it very clear
> > that he was against using international characters in identifiers, and
> > explained why, and stated that this is why they aren't allowed.  So unless some
> > change occurred at the last minute, I'm pretty sure international
> > characters aren't allowed as identifiers in ANSI/ISO C++.
> >
> > -Mark
>
> It does - I just checked; and in the same manner as C supports it.
>
> The relevant quote from the standard seems to be (from a description
> of the first preprocessing phase):
>
>   Physical source file characters are mapped, in an
>   implementation-defined manner, to the basic source character set
>   (introducing new-line characters for end-of-line indicators) if
>   necessary.
>
> Since it's implementation-defined, that pretty much means it can map
> foreign characters however it wants - so chances are, it's
> locale-dependent.
>
> Micah
>
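
Standard C++ has accepted universal-character-names (\uXXXX) in identifiers since C++98, for the character ranges listed in the standard's Annex E, and current GCC and Clang accept them.  Here is a minimal sketch that uses Stroustrup's four Danish words from the excerpt as member names; the `Landscape` struct and `total` function are hypothetical.

```cpp
#include <cassert>

// Danish identifiers written with universal-character-names:
// \u00e6 is 'æ', \u00e5 is 'å', and \u00f8 is 'ø'.
struct Landscape {
    int \u00e6ble;  // æble (apple)
    int tr\u00e6;   // træ (tree)
    int b\u00e5d;   // båd (boat)
    int \u00f8;     // ø (island)
};

// Member access uses the same spelling as the declaration.
int total(const Landscape& l) {
    return l.\u00e6ble + l.tr\u00e6 + l.b\u00e5d + l.\u00f8;
}
```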

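The phase-1 rule Micah quotes continues in C++98: any source character outside the basic source character set is replaced by the universal-character-name that designates it, and an extended character typed directly must be handled the same as that character written as \uXXXX.  A minimal sketch, assuming a compiler that accepts UTF-8 in identifiers (GCC 10 or later, or Clang) and a source file read as UTF-8 (GCC's `-finput-charset` option selects the input encoding); `apple_count` is a hypothetical name.

```cpp
#include <cassert>

// The variable is declared with a literal UTF-8 'æ' and then read back
// through the universal-character-name \u00e6.  Phase 1 makes both
// spellings the same identifier.
int apple_count() {
    int æble = 3;
    return \u00e6ble;
}
```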
---
Mark K. Kim
http://www.cbreak.org/mark/
PGP key available upon request.



Hosting provided by:
Sunset Systems
Sunset Systems offers preconfigured Linux systems, remote system administration and custom software development.

LUGOD: Linux Users' Group of Davis
PO Box 2082, Davis, CA 95617
Contact Us

LUGOD is a 501(c)7 non-profit organization
based in Davis, California
and serving the Sacramento area.
"Linux" is a trademark of Linus Torvalds.
