From: herbs@cntc.com (Herb Sutter)
Subject: Guru of the Week #29: Solution
Date: 22 Jan 1998 00:00:00 GMT
Message-ID: <6a8q26$9qa@netlab.cs.rpi.edu>
Newsgroups: comp.lang.c++.moderated

.--------------------------------------------------------------------.
|  Guru of the Week problems and solutions are posted regularly on   |
|   news:comp.lang.c++.moderated. For past problems and solutions    |
|            see the GotW archive at http://www.cntc.com.            |
| Is there a topic you'd like to see covered? mailto:herbs@cntc.com  |
`--------------------------------------------------------------------'
_______________________________________________________

GotW #29: Strings

Difficulty: 7 / 10
_______________________________________________________

>Write a ci_string class which is identical to the
>standard 'string' class, but is case-insensitive in the
>same way as the C function stricmp():

The "how can I make a case-insensitive string?"
question is so common that it probably deserves its own
FAQ -- hence this issue of GotW.

Note 1: The stricmp() case-insensitive string
comparison function is not part of the C standard, but
it is a common extension on many C compilers.

Note 2: What "case insensitive" actually means depends
entirely on your application and language. For
example, many languages do not have "cases" at all, and
for languages that do, you have to decide whether you
want accented characters to compare equal to unaccented
characters, and so on. This GotW provides guidance on
how to implement case-insensitivity for standard
strings in whatever sense applies to your particular
situation.

Here's what we want to achieve:

> ci_string s( "AbCdE" );
>
> // case insensitive
> assert( s == "abcde" );
> assert( s == "ABCDE" );
>
> // still case-preserving, of course
> assert( strcmp( s.c_str(), "AbCdE" ) == 0 );
> assert( strcmp( s.c_str(), "abcde" ) != 0 );

The key here is to understand what a "string" actually
is in standard C++. If you look in your trusty string
header, you'll see something like this:

    typedef basic_string<char> string;

So string isn't really a class... it's a typedef of a
template. In turn, the basic_string<> template is
declared as follows, in all its glory:

    template<class charT,
             class traits = char_traits<charT>,
             class Allocator = allocator<charT> >
        class basic_string;

So "string" really means "basic_string<char,
char_traits<char>, allocator<char> >". We don't need
to worry about the allocator part, but the key here is
the char_traits part because char_traits defines how
characters interact and compare(!).

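If you'd like to check that claim against your own
compiler, here is a minimal sketch. It assumes C++11 or
later (static_assert and <type_traits> did not exist
when this was posted), and the names
uses_default_char_traits and check_string_typedef are
made up for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>

// std::string is only a typedef, so spelling out every defaulted
// template argument names exactly the same type.
static_assert(
    std::is_same<std::string,
                 std::basic_string<char,
                                   std::char_traits<char>,
                                   std::allocator<char> > >::value,
    "string is basic_string<char> with default traits and allocator");

// Hypothetical helper: true when a string type takes its character
// rules from the default char_traits<char>.
template <class String>
constexpr bool uses_default_char_traits() {
    return std::is_same<typename String::traits_type,
                        std::char_traits<char> >::value;
}

// Hypothetical run-time check: the long and short spellings are
// interchangeable because they are one and the same type.
inline void check_string_typedef() {
    std::basic_string<char, std::char_traits<char>,
                      std::allocator<char> > long_form("AbCdE");
    std::string short_form = long_form;  // plain copy, no conversion
    assert(short_form == long_form);
    assert(uses_default_char_traits<std::string>());
}
```
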
basic_string supplies useful comparison functions that
let you test whether a string is equal to another,
less than another, and so on. These string comparison
functions are built on top of character comparison
functions supplied in the char_traits template. In
particular, the char_traits template supplies character
comparison functions named eq(), ne(), and lt() for
equality, inequality, and less-than comparisons, and
compare() and find() functions to compare and search
sequences of characters.

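To make that concrete, here is a small sketch of how
the default traits behave for plain char. The function
name check_default_traits is made up for illustration.
ne() is not called because, as far as the published
standard goes, char_traits has no such member;
inequality there is simply !eq().

```cpp
#include <cassert>
#include <string>

// Hypothetical demo of the default, case-sensitive character rules.
inline void check_default_traits() {
    typedef std::char_traits<char> traits;

    // Single-character comparisons are exact, so case matters.
    assert(traits::eq('a', 'a'));
    assert(!traits::eq('a', 'A'));
    assert(traits::lt('a', 'b'));

    // compare() orders the first n characters of two sequences.
    assert(traits::compare("abc", "abc", 3) == 0);
    assert(traits::compare("abc", "abd", 3) < 0);
    assert(traits::compare("ABC", "abc", 3) != 0);

    // find() returns a pointer to the first match, or null if none.
    const char* text = "AbCdE";
    assert(traits::find(text, 5, 'C') == text + 2);
    assert(traits::find(text, 5, 'c') == 0);
}
```
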
If we want these to behave differently, all we have to
do is provide a different char_traits template! Here's
the easiest way:

    struct ci_char_traits : public char_traits<char>
                  // just inherit all the other functions
                  // that we don't need to override
    {
        static bool eq( char c1, char c2 ) {
            return tolower(c1) == tolower(c2);
        }

        static bool ne( char c1, char c2 ) {
            return tolower(c1) != tolower(c2);
        }

        static bool lt( char c1, char c2 ) {
            return tolower(c1) < tolower(c2);
        }

        static int compare( const char* s1,
                            const char* s2,
                            size_t n ) {
            return strnicmp( s1, s2, n );
                   // if available on your compiler,
                   // otherwise you can roll your own
        }

        static const char*
        find( const char* s, int n, char a ) {
            while( n-- > 0 && tolower(*s) != tolower(a) ) {
                ++s;
            }
            return n >= 0 ? s : 0;
        }
    };

[N.B. A bug in the original code has been fixed for the
GCC documentation; the corrected code was taken from
Herb Sutter's book, Exceptional C++.]

And finally, the key that brings it all together:

    typedef basic_string<char, ci_char_traits> ci_string;

All we've done is create a typedef named "ci_string"
which operates exactly like the standard "string",
except that it uses ci_char_traits instead of
char_traits<char> to get its character comparison
rules. Since we've handily made the ci_char_traits
rules case-insensitive, we've made ci_string itself
case-insensitive without any further surgery -- that
is, we have a case-insensitive string without having
touched basic_string at all!

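For anyone who wants to try this on a compiler without
strnicmp(), here is one possible self-contained sketch.
It is not the article's exact code, and it makes these
assumptions: compare() is hand-rolled on top of eq()
and lt(), find() takes a size_t count as the standard
char_traits does, ne() is dropped, and tolower() is
called through unsigned char because passing it a
negative char value is undefined. The names lower and
check_ci_string are made up for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: lowercase via unsigned char, since handing a
// negative char value to std::tolower is undefined behaviour.
inline int lower(char c) {
    return std::tolower(static_cast<unsigned char>(c));
}

struct ci_char_traits : public std::char_traits<char> {
    static bool eq(char c1, char c2) { return lower(c1) == lower(c2); }
    static bool lt(char c1, char c2) { return lower(c1) < lower(c2); }

    // Hand-rolled stand-in for the non-standard strnicmp().
    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (; n != 0; --n, ++s1, ++s2) {
            if (lt(*s1, *s2)) return -1;
            if (lt(*s2, *s1)) return 1;
        }
        return 0;
    }

    // Pointer to the first case-insensitive match, or null if none.
    static const char* find(const char* s, std::size_t n, char a) {
        for (; n != 0; --n, ++s) {
            if (eq(*s, a)) return s;
        }
        return 0;
    }
};

typedef std::basic_string<char, ci_char_traits> ci_string;

// The goals stated at the top of the article, as run-time checks.
inline void check_ci_string() {
    ci_string s("AbCdE");

    // case insensitive
    assert(s == "abcde");
    assert(s == "ABCDE");

    // still case-preserving
    assert(std::strcmp(s.c_str(), "AbCdE") == 0);
    assert(std::strcmp(s.c_str(), "abcde") != 0);

    // searching goes through ci_char_traits::find() as well
    assert(s.find('c') == 2);
}
```

Keep in mind that ci_string and string are different
types, so a ci_string cannot be streamed to cout
directly; printing s.c_str() is one simple way around
that.
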
This GotW should give you a flavour for how the
basic_string template works and how flexible it is in
practice. If you want different comparisons than the
ones stricmp() and tolower() give you, just replace the
five functions shown above with your own code that
performs character comparisons the way that's
appropriate in your particular application.

Exercise for the reader:

Is it safe to inherit ci_char_traits from
char_traits<char> this way? Why or why not?