source: vbox/trunk/src/libs/xpcom18a4/xpcom/string/doc/string-guide.html@ 85855

1<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
2<html>
3 <head>
4 <title>an incomplete guide to mozilla/string</title>
5
6 <link rel="stylesheet" href="http://www.mozilla.org/projects/string/string-guide.css" title="remote stylesheet" type="text/css">
7 <link rel="alternate stylesheet" href="string-guide.css" title="local stylesheet" type="text/css">
8 </head>
9 <body>
10<!-- ----|---------|---------|---------|---------|---------|---------|---------| -->
11<!-- ...............................................................Front Matter -->
12<h1>an incomplete guide to <a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/string/">mozilla/string</a></h1>
13 <h1><font color="red">This document is now deprecated in favor of <a href="http://www.mozilla.org/projects/xpcom/string-guide.html">The new string guide</a>.</font></h1>
14<div class="author-note">
15 <p>by <a href="http://ScottCollins.net/">Scott Collins</a><!-- /p -->
16 <p>last modified 8 April 2001<!-- /p -->
17</div>
18
19<div class="abstract">
20 <p>
21 <h1>Abstract</h1>
22 This document <span class="LXRSHORTDESC">provides
23 an <a href="#users_guide">introduction</a> to the design and use of the string classes in mozilla,
24 <a href="#implementors_guide">detailed information</a> on their implementation and how one may extend them,
25 and <a href="#faq">answers</a> to frequently asked questions about strings</span>.
26 </p>
27</div>
28
29
30
31<h2><a name="contents">contents</a></h2>
32
33<div class="contents">
34 <ul>
35 <li><a href="#users_guide" >user's guide</a></li>
36 <li><a href="#implementors_guide">implementor's guide</a></li>
37 <li><a href="#faq" >frequently asked questions</a></li>
38 </ul>
39</div>
40
41<p>
42 Please direct all comments, requests, and contributions to,
43 in order of preference,
44 the tracking bug <a href="http://bugzilla.mozilla.org/show_bug.cgi?id=70076">#70076</a> for this document,
45 the author <a class="exact-uri" href="mailto:[email protected]?subject=string-guide">[email protected]</a>, and/or
46 the newsgroup <a class="exact-uri" href="news:netscape.public.mozilla.xpcom">news:netscape.public.mozilla.xpcom</a>
47 (should there be a strings newsgroup?)
48</p>
49
50<div class="author-note">
51 <p>
52 A note to potential editors:
53 don't even <strong>consider</strong> modifying this document with an HTML editor.
54 That would destroy the internal formatting,
 and make patches unmanageable.
56 </p>
57</div>
58
59
60
61
62<!-- ...............................................................User's Guide -->
63<hr>
64<h1><a name="users_guide">user's guide</a></h1>
65
66<div class="author-note">
67 <p>
68 Strings in mozilla are a world apart from <span class="code">char*</span>s.
69 If you don't know why they are different,
70 this section is the place for you to start.
71 If you're already familiar with the hierarchy of string classes in mozilla,
72 then you might want to skip ahead to the <a href="#implementors_guide">implementor's guide</a>
73 or the <a href="#faq">FAQ</a>.
74 </p>
75</div>
76
77<div class="contents">
78 <ul>
79 <li><a href="#users_guide_introduction">introduction</a></li>
80 <li><a href="#users_guide_how_to" >using the string classes correctly; using the correct string class</a></li>
81 <li><a href="#users_guide_iterators" >using string iterators</a></li>
82 <li><a href="#users_guide_summary" >summary</a></li>
83 </ul>
84</div>
85
86<h2><a name="users_guide_introduction">introduction</a></h2>
 <h3>what is and what isn't a string?</h3>
<p>
 A string is an opaque container holding a linear sequence of characters, possibly of zero length.
90 Understanding the implications of this statement is the foundation for understanding all mozilla's string classes.
91</p>
92
93 <h3>readable and writable</h3>
94 <h3>dependent strings</h3>
95 <h3>flat strings</h3>
96 <h3>encoding</h3>
97 <h3>sharing</h3>
98
99<h2><a name="users_guide_how_to">using the string classes correctly; using the correct string class</a></h2>
100 <h3>basic string operations</h3>
101 <h4>comparison</h4>
102 <h4>concatenation</h4>
103 <h4>substrings</h4>
104 <h4>find and replace</h4>
105 <h3>conversions</h3>
106 <h4>calling a function that expects a different kind of string</h4>
107 <h4>converting between string classes</h4>
108 <h4>converting between encodings</h4>
109 <h3>selecting the right string class</h3>
110 <h4>user string classes</h4>
111 <h4>selecting the right string class for a parameter</h4>
112 <h4>selecting the right string class for a local variable</h4>
113 <h4>selecting the right string class for a member variable</h4>
114 <h4>selecting the right string class for a return value</h4>
115 <h4>selecting the right string class in IDL</h4>
 <h3>don'ts</h3>
117
118<h2><a name="users_guide_iterators">using string iterators</a></h2>
119 <h3>what is an iterator?</h3>
120 <h3>reading iterators and writing iterators</h3>
121 <h3>`chunky' iterating for efficiency</h3>
122 <h3><span class="code">copy_string</span>, character sources and sinks</h3>
123 <h3>encoding conversion iterators</h3>
124
125<h2><a name="users_guide_summary">summary</a></h2>
126
127
128<!-- ........................................................Implementor's Guide -->
129<hr>
130<h1><a name="implementors_guide">implementor's guide</a></h1>
131
132<div class="author-note">
133 <p>
134
135 </p>
136</div>
137
138<div class="contents">
139 <ul>
140 <!-- li></li -->
141 </ul>
142</div>
143
144
145
146<!-- ........................................................................FAQ -->
147<hr>
148<h1><a name="faq">frequently asked questions</a></h1>
149
150<div class="author-note">
151</div>
152
153<div class="contents">
154 <ul>
<!--
 <li>
 I have a wide string, i.e., an instance of a class derived from <span class="code">nsAString</span>
 <ul>
 <li>I want a pointer to the characters</li>
 <li>I want a narrow string</li>
 <li>I want to <span class="code">printf</span> it</li>
 </ul>
 </li>
 <li>
 I have a <span class="code">PRUnichar*</span>
 <ul>
 <li>I want a wide string</li>
 <li>I want a narrow string</li>
 <li>I want to <span class="code">printf</span> it</li>
 </ul>
 </li>
 <li>
 I have a narrow string, i.e., an instance of a class derived from <span class="code">nsACString</span>
 <ul>
 <li>I want a pointer to the characters</li>
 <li>I want a wide string</li>
 <li>I want to <span class="code">printf</span> it</li>
 </ul>
 </li>
 <li>
 I have a <span class="code">char*</span>
 <ul>
 <li>I want a wide string</li>
 <li>I want a narrow string</li>
 </ul>
 </li>
 <li>
 I have a literal character sequence, e.g., <span class="code">"Hello, World!\n"</span>
 <ul>
 <li>I want a wide string</li>
 <li>I want a narrow string</li>
 </ul>
 </li>
 <li>What's the best way to return a string?</li>
 <li>How can I get a pointer to the characters in a string?</li>
 <li>How can I <span class="code">printf</span> a string?</li>
-->
 </ul>
199</div>
200
201
202<table class="chart">
203 <tr>
204 <th></th>
205 <th colspan="5">you have some <span class="code">char</span>s</th>
206 </tr>
207 <tr>
208 <th>you want</th>
209 <th><span class="code">'x'</span></th>
210 <th><span class="code">char c</span></th>
211 <th><span class="code">"foo"</span></th>
212 <th><span class="code">char* cp</span></th>
 <th><span class="code">nsACString&amp; cs</span></th>
214 </tr>
215 <tr>
216 <th class="row-label"><span class="code">char</span></th>
217 <td colspan="2">.</td>
218<!-- "foo" --> <td><span class="code">[]</span></td>
219<!-- char* cp --> <td><span class="code">[]</span></td>
220<!-- nsACString& cs --> <td><a href="#faq_how_to_extract_a_character">extract a character</a></td>
221 </tr>
222 <tr>
223 <th class="row-label"><span class="code">PRUnichar</span></th>
224<!-- 'x' --> <td><span class="code">PRUnichar('x')</span></td>
225<!-- char c --> <td><span class="code">PRUnichar(c)</span></td>
226 <td colspan="3"><a href="#faq_how_to_convert_encoding">convert encoding</a>, <a href="#faq_how_to_extract_a_character">extract a character</a></td>
227 </tr>
228 <tr>
229 <th class="row-label"><span class="code">char*</span></th>
230<!-- 'x' --> <td><span class="code">&amp;</span></td>
231<!-- char c --> <td><span class="code">&amp;</span></td>
232<!-- "foo" --> <td><span class="code">&amp;</span></td>
233<!-- char* cp --> <td>.</td>
234<!-- nsACString& cs --> <td><a href="#faq_how_to_get_a_pointer">get a pointer</a></td>
235 </tr>
236 <tr>
237 <th class="row-label"><span class="code">PRUnichar*</span></th>
238 <td colspan="5"><a href="#faq_how_to_convert_encoding">convert encoding</a>, <a href="#faq_how_to_get_a_pointer">get a pointer</a></td>
239 </tr>
240 <tr>
241 <th class="row-label"><span class="code">nsACString</span></th>
242<!-- 'x' --> <td><span class="code">NS_LITERAL_CSTRING("x")</span></td>
243<!-- char c --> <td><a href="#faq_how_to_make_a_string">make a string</a></td>
<!-- "foo" --> <td><span class="code">NS_LITERAL_CSTRING("foo")</span></td>
245<!-- char* cp --> <td><a href="#faq_how_to_make_a_string">make a string</a></td>
246<!-- nsACString& cs --> <td>.</td>
247 </tr>
248 <tr>
249 <th class="row-label"><span class="code">nsAString</span></th>
250<!-- 'x' --> <td><span class="code">NS_LITERAL_STRING("x")</span></td>
251<!-- char c --> <td><a href="#faq_how_to_convert_encoding">convert encoding</a></td>
252<!-- "foo" --> <td><span class="code">NS_LITERAL_STRING("foo")</span></td>
253 <td colspan="2"><a href="#faq_how_to_convert_encoding">convert encoding</a></td>
254 </tr>
255 <tr>
256 <th class="row-label">to call <span class="code">printf</span></th>
257 <td colspan="4">.</td>
258<!-- nsACString& cs --> <td><a href="#faq_how_to_call_printf">call <span class="code">printf</span></a></td>
259 </tr>
260</table>
261
262<table class="chart">
263 <tr>
264 <th></th>
265 <th colspan="3">you have some <span class="code">PRUnichar</span>s</th>
266 </tr>
267 <tr>
268 <th>you want</th>
269 <th><span class="code">PRUnichar w</span></th>
270 <th><span class="code">PRUnichar* wp</span></th>
 <th><span class="code">nsAString&amp; s</span></th>
272 </tr>
273 <tr>
274 <th class="row-label"><span class="code">char</span></th>
275<!-- PRUnichar w --> <td></td>
276<!-- PRUnichar* wp --> <td></td>
277<!-- nsAString& s --> <td></td>
278 </tr>
279 <tr>
280 <th class="row-label"><span class="code">PRUnichar</span></th>
281<!-- PRUnichar w --> <td></td>
282<!-- PRUnichar* wp --> <td><span class="code">[]</span></td>
283<!-- nsAString& s --> <td><a href="#faq_how_to_extract_a_character">extract a character</a></td>
284 </tr>
285 <tr>
286 <th class="row-label"><span class="code">char*</span></th>
287<!-- PRUnichar w --> <td></td>
288<!-- PRUnichar* wp --> <td></td>
289<!-- nsAString& s --> <td></td>
290 </tr>
291 <tr>
292 <th class="row-label"><span class="code">PRUnichar*</span></th>
293<!-- PRUnichar w --> <td><span class="code">&amp;</span></td>
294<!-- PRUnichar* wp --> <td></td>
295<!-- nsAString& s --> <td><a href="#faq_how_to_get_a_pointer">get a pointer</a></td>
296 </tr>
297 <tr>
298 <th class="row-label"><span class="code">nsACString</span></th>
299<!-- PRUnichar w --> <td></td>
300<!-- PRUnichar* wp --> <td></td>
301<!-- nsAString& s --> <td></td>
302 </tr>
303 <tr>
304 <th class="row-label"><span class="code">nsAString</span></th>
305<!-- PRUnichar w --> <td></td>
306<!-- PRUnichar* wp --> <td></td>
307<!-- nsAString& s --> <td></td>
308 </tr>
309 <tr>
310 <th class="row-label">to call <span class="code">printf</span></th>
311<!-- PRUnichar w --> <td></td>
312<!-- PRUnichar* wp --> <td></td>
313<!-- nsAString& s --> <td><a href="#faq_how_to_call_printf">call <span class="code">printf</span></a></td>
314 </tr>
315</table>
316
317<div class="faq">
318 <dl>
319 <dt>
320 is there any string doc?
321 </dt>
322 <dd>
323 Yes, you're soaking in it!
324 </dd>
325
326
327
328<!-- getting a pointer -->
329 <dt>
330 <a name="faq_how_to_get_a_pointer">I have a string, how do I get a pointer to the characters?</a>
331 </dt>
332 <dd>
333 You want to avoid this situation.
334 In your own interfaces, prefer string types over raw pointers.
335 Any interface that wants to process a string using a single pointer is making two expensive assumptions.
336 First, that the string is stored in one contiguous hunk; and
337 second, that the string is zero-terminated.
338 If this isn't the case,
339 then to get a pointer, storage must be allocated and the entire string must be copied to it and zero-terminated.
340 You may not be able to avoid needing a pointer when interacting with system calls.
341 </dd>
342 <dd>
343 Some string classes guarantee that they are `flat'.
344 That is, that their data is stored in one contiguous zero-terminated hunk.
345 This <strong>does not</strong> imply that there are no embedded nulls. Caveat emptor.
346 All strings that explicitly promise flatness
347 inherit from the class <span class="code">nsAFlatString</span>
348 or <span class="code">nsAFlatCString</span>
349 and can produce a constant pointer to their data with the <span class="code">get()</span> member function.
350 Even strings that don't explicitly promise to be flat
351 may happen to be flat.
352 The helper function <span class="code">PromiseFlatString</span> will produce
353 a <span class="code">const</span> dependent string that is guaranteed to be flat.
354 If you use this on a string that already happens to be flat,
355 the result is simply a reference through to that string.
356 Otherwise,
357 <span class="code">PromiseFlatString</span> does the work to allocate, copy, terminate, and manage
358 a temporary flat string.
359 Since the result of <span class="code">PromiseFlatString</span> is a temporary,
360 you must be careful not to get and hold a pointer to its data for longer than the temporary itself lives.
361 </dd>
362 <dd>
363<div class="source-code">
364<pre>
365 /* I have a string, how do I get a pointer to the characters? */
366
extern void EvilNarrowOSFunction( const char* ); // evil OS routines that want a pointer
368extern void EvilWideOSFunction( const PRUnichar* );
369
370void func( const nsAString&amp; aString, const nsACString&amp; aCString )
371 {
372 EvilWideOSFunction( NS_LITERAL_STRING("Hello, World!").<span class="notice">get()</span> );
373 // literal strings are flat already (as are |nsString|s, et al), just use |.get()|
374
375 EvilWideOSFunction( <span class="notice">PromiseFlatString(</span>aString<span class="notice">).get()</span> );
376 // for strings that don't explicitly guarantee flatness, use |PromiseFlatString|
377
378
379 // beware holding the pointer for longer than the life of the promise
380 <span class="warning">const PRUnichar* wp = PromiseFlatString(aString).get(); // BAD! |wp| dangles
381 EvilWideOSFunction(wp);</span>
382
383 // if you really need to use the pointer from |PromiseFlatString| in more than one expression...
384 const nsAFlatString&amp; flat = <span class="notice">PromiseFlatString(</span>aString<span class="notice">)</span>;
385 EvilWideOSFunction(flat.<span class="notice">get()</span>);
386 SomeOtherFunction(flat.<span class="notice">get()</span>);
387
388 // similarly for |char| strings
389 EvilNarrowOSFunction( <span class="notice">PromiseFlatCString(</span>aCString<span class="notice">).get()</span> );
390 }
391</pre>
392</div>
393 </dd>
394
395
396
397<!-- extracting a character -->
398 <dt>
399 <a name="faq_how_to_extract_a_character">How do I get a particular character out of a string?</a>
400 </dt>
401 <dd>
402 Flat strings provide <span class="code">operator[]</span> and <span class="code">CharAt()</span>.
403 All strings provide <span class="code">First()</span>, <span class="code">Last()</span>, and access with iterators.
404 <strong>Don't</strong> promise a string flat just to do character indexing.
405 Prefer, instead, to get an iterator and <span class="code">advance</span> it to the position you care about.
406 </dd>
407 <dd>
408<div class="source-code">
409<pre>
410 /* How do I get a particular character out of a string? */
411
PRUnichar Get5thCharacterOf( const nsAString&amp; aString )
 {
 if ( aString.Length() >= 5 )
 {
 nsAString::const_iterator iter;
 aString.BeginReading(iter); // make |iter| point to the beginning of |aString|
 iter.advance(4); // the 5th character is four positions past the first
 return *iter;
420 }
421
422 return PRUnichar(0);
423 }
424</pre>
425</div>
426 </dd>
427 <dd>
428 Using iterators isn't as bad as the example above makes it feel.
429 The typical use is for advancing through a string, examining many characters.
430 </dd>
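 <dd>
 <p>
 To give a feel for that more typical use, here is a minimal sketch of a single pass over a string with a reading iterator.
 It is an analogy only: <span class="code">std::u16string</span> stands in for a wide string class,
 and <span class="code">CountOccurrences</span> is a hypothetical helper, not part of mozilla/string.
 </p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: walk the string once with a reading (const) iterator
// and count how many code units equal |target|.
// |std::u16string| is only a stand-in for a wide string class.
std::size_t CountOccurrences(const std::u16string& s, char16_t target) {
    std::size_t count = 0;
    for (auto iter = s.begin(), end = s.end(); iter != end; ++iter) {
        if (*iter == target) {
            ++count;
        }
    }
    return count;
}
```

 <p>
 The set-up cost of getting the iterator is paid once, and every character after that is examined with a plain dereference and increment.
 </p>
 </dd>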
431
432
433
434<!-- how to convert encoding -->
435 <dt>
436 <a name="faq_how_to_convert_encoding">How do I convert from one encoding to another?</a>
437 </dt>
438 <dd>
439 </dd>
440
441
442
443<!-- how to make a string -->
444 <dt>
445 <a name="faq_how_to_make_a_string">How do I create a string?</a>
446 </dt>
447 <dd>
448 </dd>
449
450
451<!-- how to return a string -->
452 <dt>
453 What is the best way to return a string?
454 </dt>
455 <dd>
456 <p>
457 There are several reasonable ways to produce a string result from a function.
458 If you are already holding the answer as a sharable string,
459 you can simply return that string (pass-by-value).
460 Otherwise,
461 the most efficient and flexible way to return a string is
462 to assign your result into a non-<span class="code">const</span> reference parameter.
463 Don't bother to create a sharable string from scratch with your generated result.
464 </p>
465 <p>
466 Why?
467 The two things you want to minimize in string manipulation are,
468 in order of importance,
469 heap allocation, and
470 moving characters around.
471 </p>
472 </dd>
473 <dd>
474<div class="source-code">
475<pre>
476 /* What is the best way to return a string? */
477
478class foo
479 {
480 public:
481 // ...
482 void GetShortName( nsAString&amp; aResult ) const;
483 nsCommonString GetFullName() const;
484
485 private:
486 nsCommonString mFullName;
487
488 const PRUnichar* mShortName;
489 PRUint32 mShortNameLength;
490
491 };
492
493nsCommonString
494foo::GetFullName() const
495 {
496 return mFullName;
497 }
498
499void
500foo::GetShortName( nsAString&amp; aResult ) const
501 {
502 aResult = DependentString(mShortName, mShortNameLength);
503 }
504</pre>
505</div>
506 </dd>
507
508
509 <dt>
 <a name="faq_how_to_call_printf">How do I <span class="code">printf</span> a string, e.g., for debugging?</a>
511 </dt>
512 <dd>
513 If your string is already narrow, you just have to worry about <a href="#faq_how_to_get_a_pointer">making it flat, and then getting a pointer</a>.
514 </dd>
515 <dd>
516 If your string happens to be wide,
517 you'll need to convert it before you can <span class="code">printf</span> something reasonable.
518 If it's just for debugging,
519 you probably wouldn't care if something odd was printed in the case of a Unicode character that didn't have
520 an ASCII equivalent. (If you have a UTF-8 terminal, the result is
521 perfectly legible and nothing odd is printed.)
522 The simplest thing in this case is to make a temporary conversion using <span class="code">NS_ConvertUTF16toUTF8</span>.
523 The result is conveniently flat already, so getting the pointer is simple.
 Remember not to hold onto the pointer you get out of this beyond the lifetime of the temporary.
525 </dd>
526 <dd>
527<div class="source-code">
528<pre>
529 /* How do I |printf| a string? */
530
531
void PrintSomeStrings( const nsAString&amp; aString, const PRUnichar* aKey, const nsACString&amp; aCString )
533 {
534 // |printf|ing a narrow string is easy
535 printf("%s\n", <span class="notice">PromiseFlatCString(</span>aCString<span class="notice">).get()</span>); // GOOD
536
537 // the simplest way to get a |printf|-able |const char*| out of a string
538 printf("%s\n", <span class="notice">NS_ConvertUTF16toUTF8(</span>aKey<span class="notice">).get()</span>); // GOOD
539
 // works just as well with a formal wide string type...
541 printf("%s\n", <span class="notice">NS_ConvertUTF16toUTF8(</span>aString<span class="notice">).get()</span>);
542
543
544 // But don't hold onto the pointer longer than the lifetime of the temporary!
545 <span class="warning">const char* cstring = NS_ConvertUTF16toUTF8(aKey).get(); // BAD! |cstring| is dangling
546 printf("%s\n", cstring);</span>
547 }
548</pre>
549</div>
550 </dd>
551
552 </dl>
553
554<p>
555 Here are the email answers I have yet to format into the FAQ.
556 Some of the URLs may be out-dated or moved.
557 The messages are in order from oldest to newest.
558</p>
559<p class="editnote">[Note : In June, 2003, these emails were modified
560to better reflect what is stored in 'wide' string
561classes (UTF-16 string instead of UCS-2) and what
562related methods do as a part of the patch for <a href=
563"http://bugzilla.mozilla.org/show_bug.cgi?id=183156"
564title="replace UCS2 in function/class/method names with UTF16">bug 183156</a>.
565Therefore, they're a little different from the original emails
566written by <a href="http://ScottCollins.net/">Scott Collins</a>]
567</p>
568<hr>
569<pre>
570Date: Thu, 13 Apr 2000 19:41:47 -0400
571</pre>
572
573<p>Encoding Wars
574
575<p>This message is all about strings and the various encodings that might
576be used to interpret their contents, the ramifications of that, and
577where we're heading. The point of this message is to say what we're
578currently thinking, and get feedback. I apologize in advance for the
579rambling, and for the fact that this message may accidentally mix
580discussion of how things <strong>are</strong> and how they will be.
581
582<p>There are many different possible encodings. Three in common use in
583the Mozilla source base are: ASCII, UTF-16, and UTF-8. In ASCII, every
584<!--the Mozilla source base are: ASCII, UCS2, and UTF8. In ASCII, every-->
585character fits in 7-bits and is typically stored in an 8-bit byte. We
586usually represent ASCII strings with <span class="code">nsCString</span>s, <span class="code">nsXPIDLCString</span>s,
or <span class="code">char</span> string literals. In UTF-16, characters occupy one 16-bit code unit
(<a href="http://www.unicode.org/glossary/index.html#BMP_character">
<abbr title="Basic Multilingual Plane">BMP</abbr> characters</a>)
590or two 16-bit code units
591(<a href="http://www.unicode.org/glossary/index.html#supplementary_character">
592<abbr title="Supplementary Plane : Plane 1 through 16">non-BMP</abbr> characters</a>).
593We usually represent UTF-16 strings as <span class="code">nsString</span>s, etc., i.e., two-byte
594or `wide' strings. UTF-8 is a multi-byte encoding. A character might
595occupy one, two, three, or four bytes. It is easiest to store and
596manipulate such a string within a single-byte or `narrow' string
597implementation.
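
<p>To make the storage sizes above concrete, here is a minimal sketch
that encodes a single Unicode scalar value both ways. It uses only the
standard library; <span class="code">EncodeUtf16</span> and <span class="code">EncodeUtf8</span> are hypothetical helper names, not
functions from the Mozilla tree, and they assume a valid scalar value as input.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode scalar value as UTF-16.
// BMP characters take one 16-bit code unit; everything else takes a
// surrogate pair (two code units).
std::u16string EncodeUtf16(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    return out;
}

// Hypothetical helper: encode one Unicode scalar value as UTF-8,
// which takes one, two, three, or four bytes.
std::string EncodeUtf8(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return out;
}
```

<p>An ASCII letter comes out as one unit in either encoding, while a
non-BMP character such as U+1F600 needs two UTF-16 code units and four
UTF-8 bytes.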
598
599<p>None of our current string implementations know the encoding of the
600data they hold at any given moment. An <span class="code">nsCString</span> might legitimately
601hold data encoded in ASCII, UTF-8 or even EBCDIC for that matter.
602
603<p>Operations that convert from one encoding to another, or operations
604that are encoding sensitive (e.g., <span class="code">to_upper</span>), rightly belong in
605i18n. The fact that our current string interfaces automatically and
606implicitly convert between wide and narrow strings is actually the
607source of many errors in two particular categories: (1) unintended
608extra work, (2) mistaken re-encoding, e.g., accidentally `converting'
609a UTF-8 string to UTF-16 by pretending the UTF-8 string is ASCII and then
610padding with <span class="code">'\0'</span>s.
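
<p>The second category is easy to reproduce outside the tree. The
following is a rough sketch, not Mozilla code: <span class="code">WidenBytes</span> imitates an
implicit narrow-to-wide conversion by zero-extending each byte, and
<span class="code">DecodeUtf8</span> is a deliberately small decoder that assumes well-formed
input. Both names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Zero-extend every byte to 16 bits. Correct for ASCII, wrong for
// anything else: each byte of a multi-byte UTF-8 sequence becomes its
// own (unrelated) UTF-16 code unit.
std::u16string WidenBytes(const std::string& narrow) {
    std::u16string wide;
    for (unsigned char byte : narrow) {
        wide.push_back(static_cast<char16_t>(byte));
    }
    return wide;
}

// Minimal UTF-8 to UTF-16 decoder. Assumes well-formed input; a real
// converter must also reject malformed and overlong sequences.
std::u16string DecodeUtf8(const std::string& utf8) {
    std::u16string wide;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::size_t extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        assert(i + extra < utf8.size());
        char32_t cp = (extra == 0) ? lead : (lead & (0x3F >> extra));
        for (std::size_t k = 1; k <= extra; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        }
        i += extra + 1;
        if (cp < 0x10000) {
            wide.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;
            wide.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            wide.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return wide;
}
```

<p>Feeding both functions the UTF-8 bytes for a four-character word
ending in U+00E9 gives five code units from the padding version and
four from the decoding version. Only the latter is the string the
caller meant.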
611
612<p>We've known these were bad for a long time, and have been trying to
find the right way to fix them. The current thinking is to just bite
the bullet and eliminate implicit conversions. That has interesting
615ramifications.
616
617<div class="source-code">
618<pre>
619void foo( const nsString&amp; aUTF16string );
620
621foo("hello"); // works! constructs a temporary |nsString| by
622 // converting the ASCII literal with padding.
623 // Note: this requires an allocation
624</pre>
625</div>
626
<p>We've always hated this form, though, since it requires a heap
allocation. In current code, we recommend
629
630<div class="source-code">
631<pre>
632foo( nsAutoString("hello") );
633</pre>
634</div>
635
636<p>which still copy/converts, but at least it probably doesn't need to do
637a heap allocation. In the best of all worlds, no conversion, copying,
638or allocation would be necessary. To do that, you would need to be
639able to directly specify a UTF-16 string, e.g., with the <span class="code">L"hello"</span>
640notation, and wrap that in an interface that just held a pointer.
641E.g., something like
642
643<div class="source-code">
644<pre>
645void foo( const nsAReadableString&amp; aUTF16string );
646
647foo( nsLiteralString(L"hello") );
648</pre>
649</div>
650
651<p>There are problems with this example, however. The <span class="code">L</span> notation
652specifically makes objects that are arrays of <span class="code">wchar_t</span>, which under
653GCC is a 4-byte element. This leads to incompatibility with JS, and
654the annoyance of possibly bloated storage (I'm sort of minimizing the
situation here. It's worse than I make it sound). More about tricks
656to get around this in a bit, but first, let me talk about what to do
657in the meantime while we're just getting rid of implicit constructors.
658 Initially to get around this problem (what problem? The problem that
659<span class="code">foo("hello")</span> stopped compiling on my machine when I threw the
660switch) I made a routine called <span class="code">NS_ConvertToString</span> which looked like
661this
662
663<div class="source-code">
664<pre>
665inline
666nsAutoString
667NS_ConvertToString( const char* anASCIIstring )
668 {
 nsAutoString aUTF16string;
 aUTF16string.AssignWithConversion(anASCIIstring);
 return aUTF16string;
672 }
673</pre>
674</div>
675
676<p>Which lets me write
677
678<div class="source-code">
679<pre>
680foo( NS_ConvertToString("hello") );
681</pre>
682</div>
683
684<p>This was <strong>OK</strong>, but in discussion there were concerns about performance
685on machines that didn't <span class="code">inline</span> well, and issues about naming. In
686that meeting we came up with an alternate naming strategy that we
687think has room for growth and an implementation more likely to be
688efficient on every platform. The implementation is to define a new
689class that derives from <span class="code">nsAutoString</span>, but allows construction from a
690<span class="code">char*</span>
691
692<div class="source-code">
693<pre>
694class NS_ConvertASCIItoUTF16 : public nsAutoString
695 {
696 public:
697 NS_ConvertASCIItoUTF16( const char* );
698 // ...
699 };
700</pre>
701</div>
702
703<p>Which gives identical (though renamed) notation for calling <span class="code">foo</span>:
704
705<div class="source-code">
706<pre>
707foo( NS_ConvertASCIItoUTF16("hello") );
708</pre>
709</div>
710
711<p>It looks like a function call to an explicit encoding conversion. It
712acts like a function call to an explicit encoding conversion. It <strong>is</strong>
713a function call to an explicit encoding conversion. We think that
714this naming pattern has room for growth. In the meeting, we concluded
715that the best representation for encoding conversions is a family of
716functions, and <span class="code">NS_ConvertASCIItoUTF16</span> fits right in. We think that
717XPCOM probably can't live without the ASCII to UTF-16 conversion (though
718as explicit as possible) but that all others rightly belong in i18n
719land.
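
<p>For readers without the tree at hand, the idea of "no implicit
conversion, one named conversion" can be sketched with standard C++
alone. Everything here is hypothetical and only an analogy: <span class="code">WideStringSketch</span>
stands in for a wide string class, and <span class="code">ConvertAsciiToUtf16Sketch</span> plays the
role of the explicit conversion; neither is the real implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a wide string class. There is deliberately
// no constructor taking |const char*|, so a narrow literal can never be
// converted behind the caller's back.
class WideStringSketch {
public:
    WideStringSketch() = default;
    explicit WideStringSketch(std::u16string units) : mUnits(std::move(units)) {}
    const std::u16string& Units() const { return mUnits; }

private:
    std::u16string mUnits;
};

// The one visible, named conversion. It only accepts ASCII, which is
// the case where zero-extending each byte is actually correct.
WideStringSketch ConvertAsciiToUtf16Sketch(const char* ascii) {
    std::u16string units;
    for (const char* p = ascii; *p != '\0'; ++p) {
        assert(static_cast<unsigned char>(*p) < 0x80);
        units.push_back(static_cast<char16_t>(*p));
    }
    return WideStringSketch(std::move(units));
}

// A function that wants a wide string, like |foo| above.
std::size_t CountUnits(const WideStringSketch& s) {
    return s.Units().size();
}
```

<p>With these definitions, <span class="code">CountUnits("hello")</span> does not compile, while
<span class="code">CountUnits( ConvertAsciiToUtf16Sketch("hello") )</span> does, and the cost of
the conversion is visible at the call site.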
720
721<p>You can probably deduce from the clues in <span class="code">NS_ConvertToString</span>, above,
722that constructors weren't the only thing that became explicit.
723Assignment, appending, comparison, et al, got renamed so that when
724assigning, appending, or comparing to a value in a different encoding
725the `WithConversion' form must be used. E.g.,
726
727<div class="source-code">
728<pre>
729nsString aUTF16string;
730nsCString anASCIIstring;
731// ...
732
733aUTF16string += anASCIIstring; // Currently legal, but not for long
734aUTF16string.Append(anASCIIstring); // same
735
736aUTF16string.AppendWithConversion(anASCIIstring); // the new way
737
738if ( aUTF16string == anASCIIstring ) // Sorry, this is going away too
739 // ...
740
741if ( aUTF16string.EqualsWithConversion(anASCIIstring) )
742 // ...
743</pre>
744</div>
745
746<p>Yes, it's long and annoying. Just like the extra work you were
747implicitly asking to have done, perhaps incorrectly. There are other
748reasons to rename these functions. When <span class="code">nsString</span> and <span class="code">nsCString</span>
749defined a ton of, e.g., <span class="code">Append</span>s each there was no problem, because
750nobody wanted to override <span class="code">Append</span>. Now, with strings inheriting from
751abstract base classes we immediately run into the problem that
752overriding and overloading don't mix very well in C++. Because of a
753feature of C++ called name hiding, it is problematic to override only
754a single signature of a name overloaded in a base class. The base
755<span class="code">nsAWritableString</span> provides several <span class="code">Append</span>s, all for objects of
756(hopefully) the same encoding. <span class="code">nsString</span> can't easily add a bunch of
757new <span class="code">Append</span>s (the converting ones) without running face first into
758the name hiding problem. The discussion of the fix for this is mostly
759unrelated to encoding issues, so I'll defer it to another post.
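
<p>In case name hiding is unfamiliar, here is a small self-contained
illustration in plain C++. The class names are made up for the
example, and the <span class="code">using</span> declaration shown is simply one standard C++
remedy; it is not necessarily the fix chosen for the string classes.

```cpp
#include <cassert>
#include <string>

// A base class with an overloaded |Append|, one signature virtual.
struct WritableBase {
    virtual ~WritableBase() = default;
    virtual void Append(const std::string& s) { data += s; }
    void Append(char c) { data += c; }
    std::string data;
};

// Overriding a single signature hides *every* |Append| from the base
// when the call is made through the derived type.
struct HidingDerived : WritableBase {
    void Append(const std::string& s) override { data += s; ++appendCalls; }
    int appendCalls = 0;
    // HidingDerived d; d.Append('x');   // would not compile: the |char|
    //                                   // overload is hidden
};

// A using-declaration brings the hidden overloads back into scope.
struct UsingDerived : WritableBase {
    using WritableBase::Append;
    void Append(const std::string& s) override { data += s; ++appendCalls; }
    int appendCalls = 0;
};
```

<p>With <span class="code">UsingDerived</span>, both <span class="code">Append('x')</span> and <span class="code">Append(std::string("yz"))</span>
resolve as expected. With <span class="code">HidingDerived</span>, the <span class="code">char</span> overload is only
reachable through a reference to the base class.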
760
761<p>In hindsight, after the meeting, it seemed clear that all the
762`WithConversion' forms would be better named
763
764<div class="source-code">
765<pre>
766xxxConvertingASCIItoUTF16
767xxxConvertingUTF16toASCII
768</pre>
769</div>
770
771<p>however, the <strong>real</strong> goal (probably) is to move most such conversions
772into i18n. Just bringing attention to the previously implicit
773conversions is a good first step. Renaming these conversions as just
774suggested is probably the right thing to do, though it sort of
775validates them, which I'm not sure we really want. This is a decision
776we need to discuss further.
777
778<p>Now, back to the string literal problem above. One possible solution
779is to use a macro. Imagine
780
781<div class="source-code">
782<pre>
783NS_LITERAL_STRING("Hello")
784</pre>
785</div>
786
787<p>which on a machine where the <span class="code">L</span> trick works, turns into
788
789<div class="source-code">
790<pre>
791nsLiteralString(L"Hello")
792</pre>
793</div>
794
795<p>but on a machine where there is trouble, turns into something less
796appealing, but more likely to work, like
797
798<div class="source-code">
799<pre>
800NS_ConvertASCIItoUTF16("Hello")
801</pre>
802</div>
803
804<p>Another solution is to add a compilation step that fixes <span class="code">L</span> strings
805on bad platforms to be non-<span class="code">L</span> strings, but padded with <span class="code">\0</span>s. E.g.,
806<span class="code">L"Hello"</span> gets preprocessed into <span class="code">"\000H\000e\000l\000l\000o\000"</span>.
This solution is more annoying to the developer, whereas the prior
solution costs more at run time.
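
<p>As a rough idea of what such a preprocessing step would have to do,
here is a sketch of the rewriting for a single literal. <span class="code">PadAsciiLiteral</span>
is a hypothetical helper, not an existing tool. It assumes pure ASCII
input with no characters that need escaping, and it emits the
high-byte-first layout used in the example above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: given the contents of an |L"..."| literal,
// produce the source text of an equivalent narrow literal in which
// every character is preceded by a zero byte, followed by one extra
// zero byte (the compiler's own terminator supplies the last one).
std::string PadAsciiLiteral(const std::string& ascii) {
    std::string out = "\"";
    for (char c : ascii) {
        // Assumption: plain ASCII, and no quotes or backslashes.
        assert(static_cast<unsigned char>(c) < 0x80);
        assert(c != '"' && c != '\\');
        out += "\\000";
        out += c;
    }
    out += "\\000\"";
    return out;
}
```

<p>Called with <span class="code">Hello</span>, this returns the same text as the preprocessed
literal shown above, quotes included.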
809
810<p>Before we go to too much trouble on this specific feature, we will
811probably want to do more measurement to see just how much and how
812often we are converting constant literal strings, and why.
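<p>To make the runtime cost concrete, here is a minimal sketch of what a converting fallback has to do for every literal. It is written in standard C++ rather than with the Mozilla classes; <span class="code">widen_ascii</span> is a hypothetical stand-in for illustration, not the actual conversion routine.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a converting constructor: widens 7-bit ASCII
// to UTF-16 one code unit at a time. It allocates and copies on every call,
// which is exactly the work a true wide literal avoids.
std::u16string widen_ascii(const char* narrow, std::size_t length)
{
    std::u16string wide;
    wide.reserve(length);
    for (std::size_t i = 0; i < length; ++i) {
        unsigned char c = static_cast<unsigned char>(narrow[i]);
        assert(c < 0x80 && "input must be plain ASCII");
        wide.push_back(static_cast<char16_t>(c));
    }
    return wide;
}
```

<p>Each call pays for an allocation and a per-character loop, which is why measuring how often literals are converted matters before choosing between the two approaches.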
813
814
815<p>I'm currently ripping through the tree fixing things to use the
816`WithConversion' forms where appropriate. I was also converting
817things to use <span class="code">NS_ConvertToString</span> where appropriate; unless I get
818talked out of it, I want to switch midstream to
819<span class="code">NS_ConvertASCIItoUTF16</span>, then go back and fix up the
820<span class="code">NS_ConvertToString</span> instances later. I've set things up so I can
821check in as I go. After all these conversions have been done, I'll be
822able to throw the switch (what switch? NEW_STRING_APIS) which will
823make <span class="code">nsString</span> inherit from <span class="code">nsAWritableString</span>, etc. and allow us to
824start exploiting these other opportunities (e.g., for literal strings,
825shared strings, etc. See
826<a class="exact-uri" href="http://bugzilla.mozilla.org/show_bug.cgi?id=28221">http://bugzilla.mozilla.org/show_bug.cgi?id=28221</a> for details and
827reasoning.)
828
829<p>I guess I'm expecting comments on:
830
831<ul>
832 <li>how really annoying this whole topic is
833 <li>how bad <span class="code">L"xxx"</span> is
834 <li>whether to move forward with <span class="code">NS_ConvertASCIItoUTF16</span>
 <li>whether we should move to xxxConvertingASCIItoUTF16 etc. instead
 of `WithConversion'
837 <li>arguments about where encoding conversions should live
838 <li>arguments about whether going between 1 and 2 byte storage is an
839 encoding conversion
840 <li>questions about stuff I didn't mention or didn't explain well
841 <li>pointing out stuff I'm just plain wrong about, or things I forgot
842 <li>etc
843</ul>
844
845<p>So as not to jumble the discussion, I'll be separately posting other
846requests for comments about specific features of the design of the new
847string hierarchy.
848
849<p>I hope this helps keep everybody filled in on what we're thinking and
850able to point out what we're forgetting or screwing up :-)
851
852
853
854
855
856<hr>
857<pre>
858Date: Wed, 19 Apr 2000 21:12:47 -0400
859Subject: more string info
860</pre>
861
862<p> <a class="exact-uri" href="news://news.mozilla.org/[email protected]">news://news.mozilla.org/[email protected]</a>
863
864
865
866
867
868<hr>
869<pre>
870Date: Fri, 26 May 2000 15:31:37 -0400
871Subject: Re: Question on ==
872</pre>
873
874<p>I would prefer you compare with <span class="code">Equals</span> (which should really be named
875<span class="code">IsEqualTo</span>) rather than <span class="code">operator==()</span> because of this:
876
877<div class="source-code">
878<pre>
char* a;
char* b;

// ...

if ( a == b )
  // ...
886</pre>
887</div>
888
889<p>Comparing two raw `string' pointers doesn't compare the characters
890they point to, but instead compares the bits of the pointers. For
891this reason, I may eventually make comparison of a string with a
892pointer using operators just go away.
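<p>The difference can be seen in a small standalone sketch using only the C library; the two helper names are made up for this illustration.

```cpp
#include <cassert>
#include <cstring>

// Compares the characters two C strings point to, not the pointers.
bool same_characters(const char* a, const char* b)
{
    return std::strcmp(a, b) == 0;
}

// Compares only the addresses, which is what |a == b| does for raw pointers.
bool same_address(const char* a, const char* b)
{
    return a == b;
}
```

<p>Two separate buffers holding the same text satisfy the first test but not the second, which is the trap that <span class="code">operator==()</span> on raw pointers sets.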
893
894
895
896
897
898<hr>
899<pre>
900Date: Wed, 14 Jun 2000 14:38:55 -0400
901Subject: Re: Fix to XprtDefs.h
902</pre>
903
904<p>Yes, we're aware that turning off <span class="code">wchar_t</span> support makes <span class="code">wchar_t</span> be
905a synonym for <span class="code">unsigned short</span> under Metrowerks. We know that the
906current version of VC++ also makes these types equivalent. In theory,
907though, the types are distinct even when they are the same size and
908shape. By using real <span class="code">wchar_t</span> support, we are forced to recognize
909the distinction and navigate it appropriately with <span class="code">reinterpret_cast</span>
910(via <span class="code">NS_REINTERPRET_CAST</span>). The win here is that we aren't caught by
911compiler changes that suddenly make some set of compilers compliant
912and therefore break our code. We will add an autoconf test that lets
913UNIX compilers opt in to our string scheme when they have an
914appropriately shaped <span class="code">wchar_t</span>. If these happen to be compliant
915compilers, all will be well. If they don't, the casts don't hurt,
916because they are type correct. We are writing our code to meet the
917standard as we move forward.
918
919<p>The win for us is realized by the following macros
920
921<div class="source-code">
922<pre>
923#ifdef HAVE_CPP_2BYTE_WCHAR_T
924 #define NS_LITERAL_STRING(s) nsLiteralString(L##s, \
925 (sizeof(L##s)/sizeof(wchar_t))-1)
926#else
927 #define NS_LITERAL_STRING(s) NS_ConvertASCIItoUTF16(s, \
928 sizeof(s)-1)
929#endif
930</pre>
931</div>
932
<p>An <span class="code">nsLiteralString</span> points directly to the literal characters. No
copying, no conversion, and the length calculation happens at compile
time. This has turned out to be as large a savings as 15% of code
space and 8% of data space, net, in our string test harness. It's
faster as well, again by eliminating the copying, conversion, and
length calculation. We don't know yet what those numbers translate
into in our real code base, but we have high hopes.
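<p>The compile-time length idea can be sketched outside the Mozilla tree as follows. This assumes a compiler with the standard <span class="code">u"..."</span> prefix (which postdates this message); <span class="code">LiteralView</span> and <span class="code">SKETCH_LITERAL</span> are hypothetical names, not part of the string library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning view: a pointer and a length, nothing more.
struct LiteralView
{
    const char16_t* data;
    std::size_t     length;
};

// Sketch of the macro idea using the standard u"" prefix: sizeof yields
// the array size at compile time; subtract one for the zero terminator.
#define SKETCH_LITERAL(s) \
    LiteralView{ u##s, (sizeof(u##s) / sizeof(char16_t)) - 1 }
```

<p>Nothing is copied or converted: the view simply points at the literal's storage, and the length is a constant the compiler already knows.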
940
941<p>I don't want to be in the position to ask you to change your code. I
942don't think it's appropriate for me to do so. The AIM application
943that is your client is our client as well. They need to resolve this
944difference between us in whatever way they think best. That may mean
945asking you if changing your apis is the right thing to do. Or it may
946mean applying the casts. Our code-base and yours, Justin, are more
947like cousins. I don't think you should have to change just to conform
948to us. You may think my arguments for using real <span class="code">wchar_t</span> have
949merit, and adopt similar usage just because you agree; but I think the
950only obligation you have is to follow the technical solution you think
951is right for your code.
952
953<p>If you decide to make this api change, it will mean shipping a new
954binary (on Mac) for your library to clients who want to switch over to
955the new api (since the name mangling will be different, and therefore,
956the link requirements will change).
957
958<p>Hope this helps,
959
960
961
962
963
964<hr>
965<pre>
966Date: Thu, 15 Jun 2000 19:36:55 -0400
967Subject: Re: Checkin approval for bug 32336
968</pre>
969
970<div class="source-code">
971<pre>
S.Equals(NS_LITERAL_STRING("bar"), PR_TRUE, 3)
973</pre>
974</div>
975
976<p>doesn't compile because there is no three parameter form for <span class="code">Equals</span>.
977 For all definitions of <span class="code">Equals</span> on strings, see "nsAReadableString.h"
978
979<p><a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h</a>
980
981<p>There is an <span class="code">EqualsWithConversion</span> that takes three parameters.
982
983<p> <a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsString2.h#731">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsString2.h#731</a>
984
985<p>It is ``EqualsWithConversion'' because it admits the possibility of an
986encoding specific transformation, in this case to provide
987case-insensitive comparison. This also wouldn't compile, however,
988since, at the moment, an <span class="code">nsLiteralString</span> doesn't provide an operator
989to produce a <span class="code">const PRUnichar*</span> (though perhaps it should), and it
990doesn't satisfy the other interfaces that match this call, e.g., a
<span class="code">const nsString&amp;</span>.
992
<p>Perhaps I need to move case-insensitive comparison up out of
<span class="code">nsString</span> into a global encoding specific transformations and
algorithms file (which was on its way anyway, as Waterson knows); this
use is one bit of evidence to support this. In the short term, this
can be fixed (if we think the current behavior is wrong) by providing
<span class="code">operator const CharT*() const</span> on literal string.
999
<p>If you can live without case-folding, the earlier form is preferred
1001
1002<div class="source-code">
1003<pre>
S == NS_LITERAL_STRING("bar")
1005</pre>
1006</div>
1007
1008<p>if you can't, then one of the fixes I mentioned is in order.
1009
1010
1011
1012
1013
1014<hr>
1015<pre>
1016Date: Thu, 15 Jun 2000 19:47:12 -0400
1017Subject: Re: [Fwd: how to use nsString ?]
1018</pre>
1019
1020<pre class="email-quote">
1021 >I see these same examples time and again in the embedding
1022 >samples/docs, but I can't compile them.
1023</pre>
1024
1025<p>Apologies. Documentation mentioning strings is getting out of date.
1026Here are some specific answers.
1027
1028
1029<pre class="email-quote">
1030 >nsString URLString("http://www.mozilla.org");
1031</pre>
1032
1033<p>...is now perhaps best expressed as
1034
<div class="source-code">
<pre>
nsString URLString( NS_LITERAL_STRING("http://www.mozilla.org") );
</pre>
</div>
1036
1037<p>since an <span class="code">nsString</span> is a sequence of 2-byte wide characters, and the
1038routines that implicitly convert 1-byte sequences (like the literal
1039sequence you specified, "http:...") are now gone.
1040
1041<p>Up until not too long ago, one would have had to say
1042
1043<div class="source-code">
1044<pre>
nsString URLString;
URLString.AssignWithConversion("http://www.mozilla.org");
1047</pre>
1048</div>
1049
1050<p>The <span class="code">NS_LITERAL_STRING</span> construction is new machinery that has the
1051potential to make many operations much more efficient.
1052
1053<pre class="email-quote">
1054 >nsString URLString;
1055 >URLString.SetString("www.mozilla.org");
1056</pre>
1057
<p><span class="code">SetString</span> was a synonym for <span class="code">Assign</span> or assignment with
<span class="code">operator=()</span>; it too went away. The equivalent is the second
example I gave above, that is, the one with <span class="code">AssignWithConversion</span>.
1061
1062<p><span class="code">Assign</span> still exists. <span class="code">AssignWithConversion</span> takes on that
1063functionality for assignments that require encoding transformations
1064(e.g., from ASCII to UTF16). <span class="code">SetString</span> is gone, since it was always
1065a synonym for <span class="code">Assign</span>.
1066
1067<p>Learn more about the general APIs for strings that we are trying to
1068move to by examining
1069
1070<a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h</a>
1071<a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAWritableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAWritableString.h</a>
1072
1073<p>Hope this helps,
1074
1075
1076
1077
1078
1079<hr>
1080<pre>
1081Date: Thu, 15 Jun 2000 21:26:51 -0400
1082Subject: Re: Checkin approval for bug 32336
1083</pre>
1084
1085<pre class="email-quote">
1086 >I *need* the count attribute, because I need to compare only the first
1087 >chars (that's inherent to the logic).
1088</pre>
1089
1090<p>This is what substrings are for. In that case, you could use
1091
1092<div class="source-code">
1093<pre>
Substring(S, 0, 3) == NS_LITERAL_STRING("bar")
1095</pre>
1096</div>
1097
1098<p>As for case-folding, it's best if you can case-fold everything up
1099front, instead of doing it repeatedly. I'll have to get back to you
1100on a general solution to that problem, or what my schedule for getting
1101it checked in would be. I'm sorry, I know that's not what you needed
1102to hear. If the source string is an <span class="code">nsString</span>, you can continue to
1103exploit its implementation of these routines, e.g., <span class="code">ToLower</span> all
1104up-front.
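<p>As a rough illustration of both suggestions, here is a sketch using <span class="code">std::u16string</span> as a stand-in for the Mozilla classes. The function names are hypothetical, and the case fold shown handles ASCII letters only.

```cpp
#include <cassert>
#include <string>

// Compares only the leading prefix.size() characters of |s| with |prefix|,
// in place, without building a substring object.
bool starts_with_prefix(const std::u16string& s, const std::u16string& prefix)
{
    return s.compare(0, prefix.size(), prefix) == 0;
}

// ASCII-only case fold, done once up front rather than on every comparison.
std::u16string fold_ascii_lower(const std::u16string& s)
{
    std::u16string folded(s);
    for (char16_t& c : folded)
        if (u'A' <= c && c <= u'Z')
            c = static_cast<char16_t>(c - u'A' + u'a');
    return folded;
}
```

<p>Folding once and then comparing prefixes as often as needed keeps the per-comparison cost to a plain character match.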
1105
1106<p>Hope this helps,
1107
1108
1109
1110
1111
1112<hr>
1113<pre>
1114Date: Mon, 19 Jun 2000 14:23:47 -0400
1115Subject: Re: string fu
1116</pre>
1117
1118<pre class="email-quote">
1119 >It seems less convenient to have to first check path.IsEmpty, and
1120 >then if false get path.Last and test it.
1121</pre>
1122
1123<p>What would you prefer? That extracting a character not in the string
1124always return <span class="code">CharT(0)</span>? Can't do it for two reasons: (1) <span class="code">0</span> may be
1125a valid character in a particular encoding, so it can't be used in
1126general as a ``no character at that position'' marker; and (2) I can't
1127control what an individual string implementation does when asked to
1128get an out-of-bounds fragment, it's explicitly undefined. That means
1129the result of <span class="code">CharAt</span> is explicitly undefined for indexes outside the
1130defined contents of the string. As a debugging convenience, I have
1131made this assert, but it has always been the case that retrieving such
1132a character had undefined results ... even in [the old] code.
1133
1134<p>OK, you might say, well at least let me ask for a character that is
1135only off the end by one. E.g., <span class="code">Last</span> of an empty string. Reason (1)
1136from above still applies. How bad is it to say, for the case you gave
1137
1138<div class="source-code">
1139<pre>
PRBool needsDelim = PR_FALSE;
if ( !path.IsEmpty() )
  {
    PRUnichar last = path.Last();
    needsDelim = !(last == '/' || last == '\\');
  }
1146</pre>
1147</div>
1148
1149<p>In general, you probably want to opt out of a whole lot of work when
1150the source string is empty. It is slightly less convenient, but it
1151doesn't tie us to a bunch of implementation specific mojo.
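<p>The same guard reads naturally as a small function. This sketch uses <span class="code">std::string</span> in place of the Mozilla string classes, and <span class="code">needs_delimiter</span> is a name invented for the example.

```cpp
#include <cassert>
#include <string>

// Returns true when |path| is non-empty and does not already end in a
// path delimiter. The emptiness test comes first, so back() is never
// called on an empty string (which would be undefined behavior).
bool needs_delimiter(const std::string& path)
{
    if (path.empty())
        return false;
    char last = path.back();
    return !(last == '/' || last == '\\');
}
```

<p>An empty path opts out immediately, so the caller never relies on what an out-of-bounds read happens to return.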
1152
1153
1154<pre class="email-quote">
1155 >Can we fix GetUnicode in this case?
1156</pre>
1157
1158<p>This is an annoying property of auto strings, e.g., that they always
1159have an allocated buffer. I'm happy to fix this bug, however, be
1160aware that <span class="code">GetUnicode</span> and <span class="code">GetBuffer</span> are artifacts of [the old]
1161implementation that we don't want to support. They are not part of
1162the abstract interface. We will keep them no longer than we have to.
1163They don't support our multi-fragment paradigm. People who require a
1164contiguous hunk of characters in the future, and are unwilling to
1165switch over to chunky-iterators, may be forced to copy the string to
1166their own buffer. There will be an implementation of narrow character
1167string that guarantees contiguous allocation and a zero-terminator,
1168much as <span class="code">nsCString</span> does now, for compatibility with platform uses,
1169but this won't be the default string class.
1170
1171
1172
1173
1174
1175<hr>
1176<pre>
1177Date: Mon, 19 Jun 2000 17:22:31 -0400
1178</pre>
1179
<p>Clarifying String Semantics
1181
<p>Recently, I added an assert to the string operations that extract
characters, namely <span class="code">First()</span>, <span class="code">Last()</span>, <span class="code">CharAt()</span>, and
<span class="code">operator[]()</span>. This assert fires when any of these routines is used
to access a character outside the defined contents of the string. For
<span class="code">First()</span> and <span class="code">Last()</span> that means whenever they are applied to an
empty string. For <span class="code">CharAt()</span> and <span class="code">operator[]()</span>, that means whenever
they are used to access an index outside the range of
<span class="code">0</span>..<span class="code">Length()-1</span>. There have been some complaints about the new
assert; the result of such an access, however, was always undefined.
What follows is extracted from an email exchange between me and warren
on this topic. I hope it clarifies string semantics.
1193
1194<p>Warren writes:
1195<pre class="email-quote">
1196 >I hit your funky CharAt assertion tonight in this piece of code:
1197
1198 >NS_IMETHODIMP
1199 >nsIOService::ResolveRelativePath(
1200 > const char *relativePath,
1201 > const char* basePath,
1202 > char **result )
1203 > {
1204 > nsCAutoString name;
1205 > nsCAutoString path(basePath);
1206 >
1207 > PRUnichar last = path.Last();
1208 > PRBool needsDelim = !(last == '/' || last == '\\' || last ==
1209 > '\0');
1210 > ...
1211
1212 >where basePath is null. It seems less convenient to have to first
1213 >check path.IsEmpty, and then if false get path.Last and test it.
1214</pre>
1215
1216<p>I replied:
1217<pre class="email-quote">
1218 >What would you prefer? That extracting a character not in the
1219 >string always return <span class="code">CharT(0)</span>? Can't do it for two reasons:
1220 >(1) <span class="code">0</span> may be a valid character in a particular encoding, so it
1221 >can't be used in general as a ``no character at that position''
1222 >marker; and (2) I can't control what an individual string
1223 >implementation does when asked to get an out-of-bounds fragment,
1224 >it's explicitly undefined. That means the result of <span class="code">CharAt</span> is
1225 >explicitly undefined for indexes outside the defined contents of
1226 >the string. As a debugging convenience, I have made this assert,
1227 >but it has always been the case that retrieving such a character
1228 >had undefined results ... even in [the old] code.
1229
1230 >OK, you might say, well at least let me ask for a character that
1231 >is only off the end by one. E.g., <span class="code">Last</span> of an empty string.
1232 >Reason (1) from above still applies. How bad is it to say, for the
1233 >case you gave
1234
1235 > PRBool needsDelim = PR_FALSE;
1236 > if ( !path.IsEmpty() )
1237 > {
1238 > PRUnichar last = path.Last();
1239 > needsDelim = !(last == '/' || last == '\\');
1240 > }
1241
1242 >In general, you probably want to opt out of a whole lot of work
1243 >when the source string is empty. It is slightly less convenient,
1244 >but it doesn't tie us to a bunch of implementation specific mojo.
1245</pre>
1246
1247<p>Warren also asks:
1248<pre class="email-quote">
1249 >Here's another issue, perhaps more serious. If I say this:
1250
1251 > foo(const PRUnichar* s) {
1252 > nsAutoString str(s);
1253 > bar(str.get());
1254 > }
1255
1256 >where s is null, bar will get passed a zero-length PRUnichar
1257 >sequence instead of null. This makes it so that you can't just
1258 >test for the argument == null. You have to nsCRT::strlen(arg) == 0
1259 >which is much less efficient. Can we fix GetUnicode in this case?
1260</pre>
1261
1262<p>And I reply:
1263<pre class="email-quote">
1264 >This is an annoying property of auto strings, e.g., that they
1265 >always have an allocated buffer. I'm happy to fix this bug,
1266 >however, be aware that <span class="code">GetUnicode</span> and <span class="code">GetBuffer</span> are artifacts
1267 >of [the old] implementation that we don't want to support. They
1268 >are not part of the abstract interface. We will keep them no
1269 >longer than we have to. They don't support our multi-fragment
1270 >paradigm. People who require a contiguous hunk of characters in
1271 >the future, and are unwilling to switch over to chunky-iterators,
1272 >may be forced to copy the string to their own buffer. There will
1273 >be an implementation of narrow character string that guarantees
1274 >contiguous allocation and a zero-terminator, much as <span class="code">nsCString</span>
1275 >does now, for compatibility with platform uses, but this won't be
1276 >the default string class.
1277</pre>
1278
1279<p>In a later message, Chris Waterson asks a related question
1280<pre class="email-quote">
1281 >scc: should we add <span class="code">operator PRUnichar*()</span> to
1282 >NS_ConvertASCIItoUTF16?
1283</pre>
1284
1285<p>And I reply:
1286<pre class="email-quote">
 >It seems reasonable. A lot more reasonable than forcing people to
 >call <span class="code">GetUnicode()</span>. I alluded to platform specific classes in an
 >earlier message to warren that you were cc'd on, Chris. I imagine
 >that the <span class="code">...Convert...</span> routines would be required to produce
 >contiguous allocation 0-terminated strings (though the as yet
 >unimplemented <span class="code">...Copy...</span> forms, of course, wouldn't). So <span class="code">operator
 >const PRUnichar*() const</span> makes perfect sense to me here.
1294</pre>
1295
1296<p>Hope this makes sense,
1297
1298
1299
1300
1301<hr>
1302<pre>
1303Date: Tue, 20 Jun 2000 04:05:31 -0400
1304Subject: Re: NS_LITERAL_STRING is broken
1305</pre>
1306
1307<p>The behavior you describe sounds exactly like when you say
1308
1309<div class="source-code">
1310<pre>
const char* foobar = "foobar";

... NS_LITERAL_STRING(foobar).get() ...
1314</pre>
1315</div>
1316
1317<p>because in this case, the thing passed in is a <span class="code">const char*</span>.
1318<span class="code">NS_LITERAL_STRING</span> is not meant to be used in this way. It is only
1319meant to be used around a <span class="code">"</span> delimited string. The type of such is
1320<span class="code">const char[N]</span> where N is the number of characters in the string + 1
1321for the zero terminator it helpfully adds. <span class="code">sizeof</span> such a type is
1322<span class="code">N</span>.
1323
1324<p>Are you sure you had the actual string as an argument, as in your
1325example to me? Or could the actual code have been like my sample,
1326above?
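<p>The failure mode is easy to reproduce in isolation. In the sketch below, <span class="code">SIZEOF_LENGTH</span> mimics the length calculation inside the macro, and <span class="code">array_length</span> is a hypothetical stricter alternative that accepts only real arrays; neither name comes from the string library.

```cpp
#include <cassert>
#include <cstddef>

// The sizeof trick: correct for an array, silently wrong for a pointer,
// where it measures the pointer itself rather than the characters.
#define SIZEOF_LENGTH(s) (sizeof(s) - 1)

// A stricter alternative: only binds to a real array, so passing a
// |const char*| is a compile error instead of a wrong length.
template <std::size_t N>
constexpr std::size_t array_length(const char (&)[N])
{
    return N - 1;
}
```

<p>With a quoted literal both forms give 6 for <span class="code">"foobar"</span>; with a <span class="code">const char*</span> variable the macro form yields the pointer size minus one, whatever the string contains.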
1327
1328
1329
1330
1331
1332<hr>
1333<pre>
1334Date: Thu, 29 Jun 2000 13:35:10 -0400
1335Subject: Re: a fix
1336</pre>
1337
1338<pre class="email-quote">
1339 > + if (Length() == 0) { return nsnull; }
1340</pre>
1341
1342
1343<p>Dave,
1344
1345<p>please read
1346
1347 <a class="exact-uri" href="news://news.mozilla.org/[email protected]">news://news.mozilla.org/[email protected]</a>
1348
1349<p>It's just plain wrong to let people try to index into a string outside
1350its defined contents. I can't just return <span class="code">'\0'</span> or <span class="code">PRUnichar('\0')</span>
1351there as that <strong>could</strong> be a legal value to have somewhere in your
1352string for some encodings ... and the encoding is not specified. So
1353your patch has the basic problem of defeating my plan to stop people
1354from doing this bad thing.
1355
1356<p>The second problem with your patch is that you use the symbolic
1357constant <span class="code">nsnull</span>, which is ostensibly a pointer value; <span class="code">Last</span> returns
1358a character. <span class="code">nsnull</span> is not appropriate for that purpose. In fact,
1359C++ gurus pretty much eschew the use of symbolic constants for <span class="code">0</span>.
1360<span class="code">NULL</span> is to be avoided. <span class="code">nsnull</span> is wrong-headed in that it presumes
1361we could have some <strong>other</strong> application specific value for <span class="code">NULL</span>. We
1362can't, it would never work. It's just wasted brain-print. Always use
1363<span class="code">0</span> for these situations, and if you want to communicate the fact that
1364something is a pointer type, either use a comment or a
1365(construction-style) cast, like so (graded examples from worst to
1366best:)
1367
1368<ul>
1369 <li>F: FindChildByNameWithHint("Chuck", nsnull);
1370
1371 <li>D: FindChildByNameWithHint("Chuck", NULL);
1372
1373 <li>C: FindChildByNameWithHint("Chuck", /* Child* */ 0);
1374
1375 <li>B: typedef Child* Child_ptr;
1376 FindChildByNameWithHint("Chuck", Child_ptr(0));
1377
1378 <li>A: FindChildByNameWithHint("Chuck", 0);
1379</ul>
1380
1381<p>Don't let this discourage you; keep up the good work :-)
1382
1383
1384
1385
1386
1387<hr>
1388<pre>
1389Date: Tue, 8 Aug 2000 23:47:16 -0400
1390Subject: Re: nsWritingIterator?
1391</pre>
1392
1393<pre class="email-quote">
1394 >Can you give me any pointers to examples, or docs, or just some
1395 >general advice?
1396</pre>
1397
1398 <a class="exact-uri" href="http://ScottCollins.net/Journal/discussion/string_iterators.html">http://ScottCollins.net/Journal/discussion/string_iterators.html</a>
1399
1400<p>does this help?
1401
1402<p>I can personally walk you through any specific scenario you need.
1403
1404
1405
1406
1407
1408<hr>
1409<pre>
1410Date: Wed, 9 Aug 2000 02:35:03 -0400
1411Subject: Re: nsWritingIterator?
1412</pre>
1413
<p>You got it right... it's <span class="code">nsWritingIterator&lt;CharT&gt;</span> for whichever
character type you care about, either <span class="code">char</span> or <span class="code">PRUnichar</span>. You
<strong>can</strong> use this iterator like a character pointer ... that is, you can
dereference it, assign into its dereference, etc. It is more
efficient, though, to directly address a particular range of
characters around where it points by asking it for its actual
character pointer with <span class="code">get</span>, and knowing that there are
<span class="code">size_forward()</span> characters available ahead of that pointer and
<span class="code">size_backward()</span> characters available behind it. After examining
those characters by hand, you can advance the iterator beyond the
characters you have examined (and possibly into the next chunk, should
one exist) by adding into it (with <span class="code">+=</span>) the count of the characters you
have processed.
1427
1428<p>Here are three examples of running through a string and modifying some
1429of the characters in it. All use <span class="code">nsWritingIterator</span>s.
1430
1431
1432<div class="source-code">
1433<pre>
  // inefficient, but works in a pinch:
  //  iterators can hide all details of chunks by acting like
  //  a raw character pointer

nsWritingIterator&lt;PRUnichar&gt; s = S.BeginWriting();
nsWritingIterator&lt;PRUnichar&gt; done_with_string = S.EndWriting();

  // for each character in the string |S|
while ( s != done_with_string )
  {
      // if the character is lower case, capitalize it
    if ( 'a' &lt;= *s &amp;&amp; *s &lt;= 'z' )
      *s = *s - 'a' + 'A';

      // advance to the next character
    ++s;
  }




  // efficient
  //  iterators provide a mechanism by which you can process
  //  a chunk-at-a-time

nsWritingIterator&lt;PRUnichar&gt; iter = S.BeginWriting();
nsWritingIterator&lt;PRUnichar&gt; done_with_string = S.EndWriting();

  // for each chunk of the string
while ( iter != done_with_string )
  {
    size_t N = iter.size_forward(); // # of chars in this chunk
    PRUnichar* s = iter.get();
    PRUnichar* done_with_chunk = s + N;

      // for each character in this chunk
    for ( ; s &lt; done_with_chunk; ++s )
      {
          // if the character is lower case, capitalize it
        if ( 'a' &lt;= *s &amp;&amp; *s &lt;= 'z' )
          *s = *s - 'a' + 'A';
      }

      // advance the iterator past characters
      //  we examined (and into the next chunk, if any)
    iter += N;
  }



  // elegant
  //  pull your transformation into a `sink', and |copy_string|
  //  will efficiently pump any kind of string into it

struct Capitalize
  {
    // inline
    PRUint32
    write( PRUnichar* s, PRUint32 N )
        // processes one chunk, called repeatedly by |copy_string|
      {
        PRUnichar* done_with_chunk = s + N;

          // for each character in this chunk
        for ( ; s &lt; done_with_chunk; ++s )
          {
              // if the character is lower case, capitalize it
            if ( 'a' &lt;= *s &amp;&amp; *s &lt;= 'z' )
              *s = *s - 'a' + 'A';
          }

          // report how many characters were processed
        return N;
      }
  };

copy_string(S.BeginWriting(), S.EndWriting(), Capitalize());
1505</pre>
1506</div>
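<p>For readers without the Mozilla tree at hand, the chunk-at-a-time pattern can be tried with a toy model. Everything here is an assumption made for illustration: <span class="code">ChunkedString</span> is a hypothetical multi-fragment string built from standard containers, and the two functions play the roles of the sink and of the loop that feeds it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical multi-fragment string: the characters live in several
// separately allocated chunks rather than one contiguous buffer.
struct ChunkedString
{
    std::vector<std::u16string> chunks;
};

// Capitalizes ASCII lower-case letters in one contiguous run.
// Returns the number of characters processed, like a sink's write().
std::size_t capitalize_run(char16_t* s, std::size_t n)
{
    for (char16_t* end = s + n; s < end; ++s)
        if (u'a' <= *s && *s <= u'z')
            *s = static_cast<char16_t>(*s - u'a' + u'A');
    return n;
}

// Visits the string a chunk at a time, handing each non-empty chunk
// to |capitalize_run| through a raw pointer and a length.
std::size_t capitalize_all(ChunkedString& str)
{
    std::size_t total = 0;
    for (std::u16string& chunk : str.chunks)
        if (!chunk.empty())
            total += capitalize_run(&chunk[0], chunk.size());
    return total;
}
```

<p>The inner loop works on plain pointers and never asks whether the string is fragmented; only the outer loop knows about chunk boundaries.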
1507
1508
1509
1510<p>Does this show it better?
1511
1512
1513
1514
1515
1516<hr>
1517<pre>
1518Date: Thu, 17 Aug 2000 18:23:22 -0400
1519</pre>
1520
1521<pre class="email-quote">
1522 >I tried looking at the string header files but they
1523 >are awfully complicated.
1524</pre>
1525
<p>I'll explain things in a little <strong>more</strong> detail than you need, so
that some of the stuff you see in these headers will make more sense.
I'll also answer your questions out of order.
1529
1530<p>First: the string hierarchy looks like this
1531
1532<a class="exact-uri" href="http://ScottCollins.net/Journal/discussion/string_hierarchy.gif">http://ScottCollins.net/Journal/discussion/string_hierarchy.gif</a>
1533
1534<p>The two most important headers are:
1535
1536<a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h</a>
1537<a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAWritableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAWritableString.h</a>
1538
1539<p>These abstract classes, <span class="code">nsAReadable[C]String</span>, and
1540<span class="code">nsAWritable[C]String</span> are typically what you will want to use in the
1541interfaces of new code. If you write a piece of code that takes a
1542string for input, consider, e.g.,
1543
1544<div class="source-code">
1545<pre>
void consumes_a_string( const nsAReadableString&amp; aInput );
1547</pre>
1548</div>
1549
1550<p>If you write a piece of code that modifies a string, consider
1551
1552<div class="source-code">
1553<pre>
void modifies_a_string( nsAWritableString&amp; aResult );
1555</pre>
1556</div>
1557
1558
1559<p>When creating your own classes, member strings will typically be
1560<span class="code">nsString</span>s. When you can't avoid creating a short string that you
1561need only temporarily during a function, you will typically use
1562<span class="code">nsAutoString</span>. When someone passes you a raw pointer, or a raw
1563pointer and a length, representing a buffer of characters that you may
1564examine, but won't own, you can treat it like a string by wrapping it
1565in an <span class="code">nsLiteralString</span>, e.g.,
1566
1567<div class="source-code">
1568<pre>
void
reads_a_buffer( const PRUnichar* aInput, PRUint32 aInputLength )
  {
    nsLiteralString input(aInput, aInputLength);
      // doesn't allocate or copy

    // ...
  }
1577</pre>
1578</div>
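<p>What "doesn't allocate or copy" means can be shown with a small self-contained sketch. <span class="code">BufferView</span> and <span class="code">count_char</span> are hypothetical names; the class only imitates the idea of wrapping a caller-owned buffer and is not the real <span class="code">nsLiteralString</span>.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning wrapper: remembers where the caller's buffer
// is and how long it is. Constructing one neither allocates nor copies.
class BufferView
{
  public:
    BufferView(const char16_t* data, std::size_t length)
        : mData(data), mLength(length) {}

    std::size_t length() const { return mLength; }
    const char16_t* data() const { return mData; }

    char16_t char_at(std::size_t i) const
    {
        assert(i < mLength);  // out-of-range access is a caller bug
        return mData[i];
    }

  private:
    const char16_t* mData;
    std::size_t     mLength;
};

// Counts occurrences of |c| in a caller-owned buffer.
std::size_t count_char(const char16_t* input, std::size_t inputLength, char16_t c)
{
    BufferView view(input, inputLength);
    std::size_t hits = 0;
    for (std::size_t i = 0; i < view.length(); ++i)
        if (view.char_at(i) == c)
            ++hits;
    return hits;
}
```

<p>The wrapper is two words of state on the stack, so the buffer must outlive it; that is the price of not copying.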
1579
1580<p>You will use <span class="code">nsLiteralString</span> around quoted constant strings as well,
1581though typically through the <span class="code">NS_LITERAL_STRING</span> macro, to avoid doing
1582a length calculation
1583
1584<div class="source-code">
1585<pre>
NS_LITERAL_STRING("x")
1587</pre>
1588</div>
1589
1590<p>expands to
1591
1592<div class="source-code">
1593<pre>
nsLiteralString(L"x", (sizeof(L"x")/sizeof(PRUnichar) - 1))
1595</pre>
1596</div>
1597
1598<p>if <span class="code">L</span> notation works as needed on your platform.
1599
<p>Those are the basics. Now on to your questions:
1601
1602
1603<pre class="email-quote">
1604 >For example this won't compile. [...]
1605
1606 >str1 += L"abc " + str2 + L"def";
1607</pre>
1608
1609
<p><span class="code">L"abc "</span> makes an object that is a <span class="code">const wchar_t[5]</span>, and none of
the string code knows about <span class="code">wchar_t</span>. The main reason is that
<span class="code">wchar_t</span> is not necessarily the right size (it can be 4 bytes under
gcc). If you wrap these constant expressions in <span class="code">NS_LITERAL_STRING</span>,
as described above, you should get the right thing, e.g.,
1615
1616<div class="source-code">
1617<pre>
str1 += NS_LITERAL_STRING("abc ") + str2 + NS_LITERAL_STRING("def");
1619</pre>
1620</div>
1621
1622
1623<pre class="email-quote">
1624 >Another one is:
1625 >function(const PRUnichar *foo);
1626 >call function(L"abc " + str2);
1627
1628 >It won't create a temporary nsString.
1629</pre>
1630
1631<p>This one, I have a quick and easy explanation for. If <span class="code">function</span> was
1632declared like this
1633
1634<div class="source-code">
1635<pre>
function( const nsAReadableString&amp; )
1637</pre>
1638</div>
1639
<p>then, no problem, since an <span class="code">nsPromiseConcatenation</span> (which was the
result of adding those two things together) <strong>is</strong> a readable string.
No other objects need to be created; no copying needs to be performed.

<p>In all cases, we want the creation of <span class="code">nsString</span>s et al. to be
<span class="code">explicit</span>, since creation is unbelievably expensive, requiring heap
allocation, locks, copying, etc.
1647
1648<p>I hope this answers both your posts,
1649
1650
1651
1652
1653
1654<hr>
1655<pre>
1656Date: Thu, 17 Aug 2000 20:57:08 -0400
1657Subject: re our conversation
1658</pre>
1659
<div class="source-code">
<pre>
return ToNewUnicode( nsLiteralCString(buffer) );
</pre>
</div>
1661
1662
1663
1664
1665
1666
1667<hr>
1668<pre>
1669Date: Fri, 18 Aug 2000 02:52:45 -0400
1670Subject: Re: More questions and new string API
1671</pre>
1672
1673<pre class="email-quote">
1674 >1) How do I return a static string?
1675
1676 >const nsAReadableString&amp; foo() {return NS_LITERAL_STRING("x");}
1677 >errors on taking the address of a temporary variable.
1678</pre>
1679
<p>Unfortunately, <span class="code">NS_LITERAL_STRING</span>'s definition is not particularly
1681amenable to this use. Instead, you would have to say something like
1682this:
1683
1684<div class="source-code">
1685<pre>
const nsAReadableString&amp;
1687foo()
1688 {
1689#ifdef HAVE_CPP_2BYTE_WCHAR_T
1690 static nsLiteralString static_foo(L"x", 1);
1691#else
1692 static nsLiteralString static_foo;
1693 static PRBool initialized = PR_FALSE;
1694 if ( !initialized )
1695 {
1696 static_foo.AssignWithConversion("x", 1);
1697 initialized = PR_TRUE;
1698 }
1699#endif
1700 return static_foo;
1701 }
1702</pre>
1703</div>
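<p>For comparison, a minimal sketch of the same idea under the assumption of a C++11 compiler, which the code above predates. There, a function-local static is initialized exactly once, on first use, so the hand-rolled <span class="code">initialized</span> flag is not needed. <span class="code">std::u16string</span> stands in for a concrete string class here; it is not the Mozilla type.

```cpp
#include <cassert>
#include <string>

// std::u16string stands in for a concrete readable string type.
const std::u16string& foo() {
    // Constructed on the first call only; later calls return the same object.
    static const std::u16string static_foo(u"x");
    return static_foo;
}
```

<p>Returning a reference is safe because the static object lives until program exit, unlike the temporary in the original question.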
1704
1705
1706<pre class="email-quote">
1707 >2) I'm using these with the STL library in an XPCOM component.
1708 >What type should I use with map? This doesn't work...
1709
 >typedef map&lt;const nsAReadableString&amp;, myType*&gt; mapStringMyType;
1711 >mapStringMyType foo;
1712 >foo.find(nsAReadableString); - I want to find on a ReadableString
1713</pre>
1714
1715<p>I don't know what errors you are getting; but it probably doesn't work
1716because a reference isn't an assignable type. This is just a guess.
1717You may need to use
1718
1719<div class="source-code">
1720<pre>
map&lt;const nsAReadableString*, myType*&gt;
1722</pre>
1723</div>
1724
1725<p>If you actually want the map to manage ownership of the keys, then
1726you'll want to use a concrete type, e.g.,
1727
1728<div class="source-code">
1729<pre>
map&lt;nsString, myType*&gt;
1731</pre>
1732</div>
1733
1734<p>or perhaps
1735
1736<div class="source-code">
1737<pre>
map&lt;nsSharedStringPtr, myType*&gt;
1739</pre>
1740</div>
1741
1742<p>Or maybe there's something else wrong. Send me the error messages.
1743If you end up using a pointer, then of course you'll have to supply a
1744comparison function to the <span class="code">map</span> template. You won't be satisfied
1745with the default comparison of pointers :-) Sorry I couldn't answer
1746this one more completely.
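<p>A minimal sketch of the pointer-keyed variant with a content-based comparison. The assumptions are labeled: <span class="code">std::u16string</span> stands in for the string type, <span class="code">int</span> for <span class="code">myType*</span>, and <span class="code">DerefLess</span>, <span class="code">StringPtrMap</span> and <span class="code">lookup</span> are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Orders keys by the strings they point to, not by their addresses.
struct DerefLess {
    bool operator()(const std::u16string* a, const std::u16string* b) const {
        return *a < *b;
    }
};

// The map does not own its keys; the pointed-to strings must outlive it.
using StringPtrMap = std::map<const std::u16string*, int, DerefLess>;

// Looks a key up by content. Returns -1 when no equal string is stored.
int lookup(const StringPtrMap& m, const std::u16string& key) {
    auto it = m.find(&key);
    return it == m.end() ? -1 : it->second;
}
```

<p>With the default comparison, a lookup would succeed only when handed the very same pointer that was inserted. With <span class="code">DerefLess</span>, any string with equal contents finds the entry.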
1747
1748
1749<pre class="email-quote">
1750 >3) How do a get a raw PRUnichar pointer out of nsAReadableString
1751 >when I need to call something that wants 'unsigned short *'?
1752</pre>
1753
1754<p>The problem with this scenario is that an <span class="code">nsAReadableString</span> doesn't
1755promise that all its data is contiguous, nor that it is
1756zero-terminated, which is what I suspect you want in this case. If
1757the function you want to call can take {pointer, length} tuples, and
1758can consume the string in hunks without zero termination ... then you
1759can use <span class="code">copy_string</span> to pump the string into your function, see
1760
1761 <a class="exact-uri" href="http://ScottCollins.net/Journal/discussion/string_iterators.html">http://ScottCollins.net/Journal/discussion/string_iterators.html</a>
1762
1763<p>If not, and you absolutely have to have a contiguous zero-terminated
1764buffer, then there is a new facility (part of the DOMAPI branch) that
1765does what you need. It's not checked in on the trunk; it should
1766be in early next week. It is <span class="code">nsPromiseFlatString</span>. This class
1767promises a contiguous zero-terminated buffer; and has an <span class="code">operator
1768PRUnichar*</span> to produce a pointer to that buffer automatically. If the
1769underlying class <strong>is</strong> one that happens to be a single fragment and
1770zero-terminated, then, like <span class="code">nsPromiseSubstring</span> and
1771<span class="code">nsPromiseConcatenation</span>, this class merely holds a reference into the
1772original data. If, however, the underlying string is multi-fragment
1773or not zero-terminated, then <span class="code">nsPromiseFlatString</span> allocates a
1774contiguous buffer of appropriate size and copies the fragmented string
1775data to it. So given
1776
1777<div class="source-code">
1778<pre>
1779void ReadBuffer( PRUnichar* );
1780</pre>
1781</div>
1782
1783<p>You can call this as efficiently as possible with an arbitrary string
1784like so
1785
1786<div class="source-code">
1787<pre>
1788ReadBuffer( nsPromiseFlatString(aString) );
1789</pre>
1790</div>
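<p>The borrow-or-copy behaviour described above can be sketched in a few lines. This is an illustration of the idea only, not the real <span class="code">nsPromiseFlatString</span>: <span class="code">Fragments</span> and <span class="code">FlatPromise</span> are hypothetical, and a vector of <span class="code">std::u16string</span> stands in for a multi-fragment string.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical fragmented string: the logical contents are the fragments in order.
using Fragments = std::vector<std::u16string>;

class FlatPromise {
public:
    explicit FlatPromise(const Fragments& source) {
        if (source.size() == 1) {
            borrowed_ = &source[0];      // already flat: no allocation, no copy
        } else {
            for (const auto& fragment : source)
                owned_ += fragment;      // flatten into one contiguous buffer
        }
    }

    // Contiguous, zero-terminated data in either case.
    const char16_t* get() const {
        return borrowed_ ? borrowed_->c_str() : owned_.c_str();
    }

    // True when this object had to build its own buffer.
    bool copied() const { return borrowed_ == nullptr; }

private:
    const std::u16string* borrowed_ = nullptr;
    std::u16string        owned_;
};
```

<p>As with the real class, the promise must not outlive the string it was built from, since in the flat case it only refers to the original data.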
1791
1792
1793<p>If the function you are calling needs to take ownership of the buffer
1794you hand it, then you will probably call <span class="code">ToNewUnicode</span> like so
1795
1796<div class="source-code">
1797<pre>
1798void ConsumeBuffer( PRUnichar* );
1799
1800ConsumeBuffer( ToNewUnicode(aString) );
1801</pre>
1802</div>
1803
1804<p>The global function <span class="code">ToNewUnicode</span> is declared in "nsReadableUtils.h",
1805and was only recently added to the build. It is currently being used
1806in the DOMAPI branch. It is part of the build, but the file
1807"dlldeps.c" in XPCOM may need to be modified to ensure it is exported
1808on your platform if you are building the tip.
1809
<p>Needless to say, you want to avoid functions that require bare
1811pointers for several reasons: (a) they typically assume
1812zero-termination, which is not guaranteed by the normal encodings; (b)
1813they require contiguous allocation, which may not be possible; (c)
1814they scan for the end of the string, at linear cost (if the encoding
1815makes it possible at all), when the length could be known in advance.
1816If you have to do it, the above mechanisms work, but be aware of the
1817cost and the potential need to copy.
1818
1819
1820<pre class="email-quote">
1821 >4) How do I declare a local variable to hold a nsAReadableString?
1822 >and a member variable?
1823</pre>
1824
1825<p><span class="code">nsAReadableString</span> is an abstract type. So you can't have a concrete
1826instance of it. All strings in the hierarchy are readable strings.
1827If you just want a reference to a readable string, you can say, e.g.,
1828
1829<div class="source-code">
1830<pre>
1831struct foo
1832 {
1833 const nsAReadableString&amp; mString;
1834 // ...
1835
1836 foo( const nsAReadableString&amp; aString ) : mString(aString) { }
1837 };
1838</pre>
1839</div>
1840
1841<p>...similarly with pointers; but I suspect you are looking for
something more concrete. An <span class="code">nsString</span> is an <span class="code">nsAReadableString</span>, and
1843is the typical thing you want as a member variable. An <span class="code">nsAutoString</span>
1844is also an <span class="code">nsAReadableString</span> and is typically what you would use for
1845a short (in length) temporary (in lifetime) local variable, as I
1846mentioned in my previous post.
1847
1848
1849<pre class="email-quote">
1850 >5) If I call a function that returns a PRUnichar* and I want t
1851 >use it as a nsAReadableString should I wrap it in a
1852 >nsLiteralString?
1853</pre>
1854
1855<p>Yes, though remember, an <span class="code">nsLiteralString</span> assumes the lifetime of the
1856underlying data is under someone else's control. If the called
1857function gives you a buffer that you need to <span class="code">delete</span>, you will have
1858to manage that yourself. Currently, people often use <span class="code">nsXPIDLString</span>
1859to handle that. XPIDL strings are <strong>not</strong> part of the hierarchy. They
1860are only used as a sort of string-<span class="code">auto_ptr</span>. However, I'm
1861integrating their functionality into <span class="code">nsString</span>. There is no problem
1862in wrapping the same pointer in both as two separate local variables,
1863one to give you the readable interface, and one to manage the
1864lifetime.
1865
1866<p>If it's OK with you, I'd like to post this reply (including your
1867quoted questions) to n.p.m.xpcom and also put a copy near the string
1868iterator discussion I provided a link to above, so that other people
1869with similar questions can see these answers.
1870
1871<p>Hope this helps,
1872
1873
1874
1875
1876
1877<hr>
1878<pre>
1879Date: Sun, 3 Sep 2000 03:52:17 -0400
1880</pre>
1881
<p>In article &lt;8nu9m2$eo14@secnews.netscape.com&gt;, "Jon Smirl"
&lt;jonsmirl@mediaone.com&gt; wrote:

<pre class="email-quote">
 > I have the new strings up and running in my app. They work as
 > advertised and I haven't found any bugs. Thanks for the good job in
 > designing and implementing them. Here's a summary of issues I've
 > encountered so far...
</pre>
1890
1891<p>Thanks, and I appreciate your comments and insights.
1892
1893
<pre class="email-quote">
 > 1) Should there be a nsSegmentedString derived from nsString instead
 > of building segment support into nsString? None of my strings are
 > segmented but I keep executing code that supports it.
 > nsPromiseFlatString would be trivial in the non-segmented case.
</pre>
1900
1901<p>The general case is that a string does not promise to have contiguous
1902data. A specific case is that, for some implementations, it does.
1903You couldn't do it the other way around, because a segmented string
1904couldn't satisfy all the promises of a flat string. However, through
1905the use of chunky iterators, operating on strings that happen to be
1906flat is very efficient. In fact, <span class="code">nsPromiseFlatString</span> is trivial in
1907the non-segmented case. In addition, I'll be adding an abstract flat
1908class into the hierarchy, which will present additional interface ...
1909in your local routines where you actually have declared a concrete
1910string instance that happens to be flat, the compiler will give you
1911the benefit of using the flat specific routines (e.g., a substring
1912object over a flat string is simpler than the general purpose
1913substring). I need to be cautious about this, though, since I don't
1914automatically want people propagating the flat type through their
1915interfaces. That would put us in the same boat we're in right now ...
1916where routines only work on a specific kind of string, which denies
1917other parts of the code the opportunity to use an implementation
1918beneficial to its specific needs, and typically for no good reason.
1919
<pre class="email-quote">
 > 2) Should nsAWritableString have a way to get the buffer and then
 > return it? I need to get the buffer to pass it to OS calls. I'm doing
 > this now by passing around nsStrings instead of the interface. If I
 > just use the interface I incur an extra copy since I have to use a
 > temporary buffer.
</pre>
1927
1928<p>A specific string implementation could promise this, but in general, a
1929writable could not. After all, a writable doesn't even guarantee
1930contiguous storage. To some degree, this is what
1931<span class="code">nsPromiseFlatString</span> is for. However, this is a readable promise
only. It will also be the case that <span class="code">ns[C]String</span>s, in the very near
future, will be able to just assume ownership of an arbitrary buffer
1934allocated on the free store with the XPCOM allocators ... getting one
1935to give up its buffer, on the other hand, presents some problems. Do
1936you have a lot of places where the system writes into your string
1937buffer space? Or do you have a lot of system routines that return you
1938new buffers? I can imagine using <span class="code">nsPromiseFlatString</span> for this, but
1939what happens when the OS alters the underlying data? If the promise
1940had generated that flat data on behalf of a multi-fragment string,
1941should it now put the changes back? It's possible to do, I just want
1942to know if it's correct to allow this situation to happen.
1943
1944
1945
<pre class="email-quote">
 > 3) There needs to be a NS_LITERAL_CHAR() to go along with
 > NS_LITERAL_STRING().
</pre>
1949
1950<p>OK.
1951
1952
1953
<pre class="email-quote">
 > Having NS_LITERAL_STRING() all over the code clutters
 > it up and makes it hard to tell what the code is doing, could we
 > have a standard short alias for this?
</pre>
1957
1958<p>Yes, I'll try to think of something ... perhaps <span class="code">NS_LSTR</span>?
1959
1960
<pre class="email-quote">
 > 4) nsLiteralString should support n.ToInteger(&amp;error);
</pre>
1962
1963<p><span class="code">ToInteger</span> is actually a bad interface. It's only good if your
1964entire string is the number; this encourages you to edit your string
1965until it is one, or perhaps copy the numeric part to another string.
1966Better if you just <span class="code">sscanf</span> a string (don't know if I can provide
1967that in the general case, but I'm thinking about it), or else use
1968regular C++ extractors (which wouldn't be too hard for me to
1969provide), or else I could give you a <span class="code">ToInteger</span> that works on a pair
1970of iterators, extracting the integer from the digits between them.
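<p>A minimal sketch of that last option, an integer extractor that works on a pair of iterators. It is not an existing string API: <span class="code">parse_unsigned</span> and <span class="code">number_at</span> are hypothetical names, <span class="code">std::u16string</span> stands in for the string classes, and overflow is deliberately not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: reads decimal digits from [first, last) and advances
// |first| past them. Returns false, leaving |value| alone, if there is no
// digit at |first|. Overflow is not checked in this sketch.
template <class Iter>
bool parse_unsigned(Iter& first, Iter last, unsigned long& value) {
    Iter start = first;
    unsigned long result = 0;
    while (first != last && *first >= u'0' && *first <= u'9') {
        result = result * 10 + static_cast<unsigned long>(*first - u'0');
        ++first;
    }
    if (first == start)
        return false;
    value = result;
    return true;
}

// Extracts the number starting at offset |pos| without editing or copying
// the string. Returns -1 when there is no number there.
long number_at(const std::u16string& text, std::size_t pos) {
    if (pos > text.size())
        return -1;
    auto it = text.begin() + static_cast<std::ptrdiff_t>(pos);
    unsigned long value = 0;
    return parse_unsigned(it, text.end(), value) ? static_cast<long>(value) : -1;
}
```

<p>Because the iterator is advanced past the digits, the caller can keep parsing whatever follows the number.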
1971
<pre class="email-quote">
 > 5) There should be a global define for an interface to a readonly
 > empty string.
</pre>
1975
1976<p>Yes, there will be.
1977
1978
<pre class="email-quote">
 > 6) Something is wrong with concatenation....
</pre>
1981
1982<p>Hopefully I've fixed this now.
1983
1984
1985
<pre class="email-quote">
 > 8) A forward definition is missing in the h files
</pre>
1987
1988<p>I'll check it out.
1989
1990
1991
1992<p>My understanding is that you have already found the answers to your
1993other questions.
1994
1995<p>I hope this helps,
1996
1997
1998
1999
2000<hr>
2001<pre>
2002Date: Wed, 20 Sep 2000 17:32:13 -0400
2003Subject: Re: how to free an nsString::ToNewCString
2004</pre>
2005
2006<pre class="email-quote">
2007 >What's the current approved way to free an nsString::ToNewCString?
2008</pre>
2009
2010<p><span class="code">nsMemory::Free</span>
2011
2012
2013
2014
2015
2016<hr>
2017
<p>You use <span class="code">NS_ConvertASCIItoUTF16("...").get()</span> in several places; these should be

<div class="source-code">
<pre>
NS_LITERAL_STRING("...").get()
</pre>
</div>

<p>Don't do this to the very first case, where you aren't wrapping an actual literal string.
The first instance should exploit <span class="code">NS_LITERAL_STRING</span> technology as well,
around the initial declarations of the strings ... you probably want to do this with
<span class="code">NS_NAMED_LITERAL_STRING</span>.
2026
2027
2028
2029<hr>
2030<pre>
2031Date: Thu, 12 Oct 2000 00:57:28 -0400
2032Subject: string answers
2033</pre>
2034
2035<div class="source-code">
2036<pre>
nsresult
DoSomething( nsAWritableString&amp; answer )
  {
    nsresult rv = NS_OK;   // initialized so the no-colon path returns a defined value

    nsXPIDLString registry_data;
    Fetch("key", getter_Shares(registry_data));

    nsLiteralString path(not_my_string);

    PRInt32 first_colon = path.FindChar(PRUnichar(':'));
    if ( first_colon != -1 )
      {
          // convert ... extract path from |path|
        nsCOMPtr&lt;nsILocalFile&gt; localFile( do_CreateInstance(CID, &amp;rv) );
        if ( localFile )
          {
            localFile->SetPersistentDescriptor(NS_ConvertUTF16toUTF8(path));

            nsXPIDLString converted_path;
            localFile->GetUnicodePath(getter_Copies(converted_path));
            answer = converted_path.get();
          }
      }
    else
      {
        answer = path;
      }

    return rv;
  }
2071</pre>
2072</div>
2073
2074
2075
2076
2077
2078<hr>
2079<pre>
2080Date: Thu, 12 Oct 2000 02:03:49 -0400
2081Subject: Re: and the answer is ...
2082</pre>
2083
2084<p>You can see from the line of code that you're on, that this should
2085have been fine. <span class="code">nsMemory::Alloc</span> would be asked to allocate a 1 byte
2086object. But it failed trying to allocate that. Which suggests that
2087the allocator was busy and non-reentrant and the debugger tried to
2088misuse it. Yes?
2089
2090<p>Of course, this doesn't solve your problem. Perhaps we need to go
2091back to the idea of a function that returns a pointer to the first
2092hunk of the string.
2093
2094<div class="source-code">
2095<pre>
2096const char*
debug_string( const nsAReadableCString&amp; aCString )
2098 {
2099 nsReadingIterator&lt;char&gt; iter;
2100 aCString.BeginReading(iter);
2101 return aCString.IsEmpty() ? "" : iter.get();
2102 }
2103</pre>
2104</div>
2105
2106<p>This code should work regardless of what the allocator is doing. The
2107downsides are (a) it only returns the first hunk of the string, in the
2108case of a multi-fragment string; and (b) that hunk <strong>might</strong> not be
2109zero-terminated.
2110
2111<p>Hope this helps,
2112
2113
2114
2115
2116
2117<hr>
2118<pre>
2119Date: Thu, 12 Oct 2000 08:30:32 -0400
2120Subject: Re: Self healing the cache :-)
2121</pre>
2122
2123<p>At 3:04 PM -0400 10/11/00, Mike Shaver wrote:
2124<pre class="email-quote">
2125 >NS_LITERAL_STRING(NS_XPCOM_SHUTDOWN_OBSERVER_ID);
2126</pre>
2127
2128<p>Macro ugliness makes <span class="code">NS_LITERAL_STRING</span> inappropriate for use over
2129other macros. In other words:
2130
2131<div class="source-code">
2132<pre>
2133NS_LITERAL_STRING("foo")
2134</pre>
2135</div>
2136
2137<p>is <strong>good</strong>.
2138
2139<div class="source-code">
2140<pre>
2141#define FOO "foo"
2142NS_LITERAL_STRING(FOO)
2143</pre>
2144</div>
2145
2146<p>is <strong>bad</strong>. Why? Because it turns into
2147
2148<div class="source-code">
2149<pre>
2150nsLiteralString(LFOO, sizeof(LFOO)...
2151</pre>
2152</div>
2153
2154<p>and there is no <span class="code">LFOO</span>. Sorry. If you have to do this to a
2155macro-ized string, do the magic by hand, e.g.,
2156
2157<div class="source-code">
2158<pre>
nsLiteralString(FOO, sizeof(FOO)/sizeof(PRUnichar) - 1)
2161</pre>
2162</div>
2163
2164<p>or else if you don't care that <span class="code">nsLiteralString</span> will scan for the
2165length, just say
2166
2167<div class="source-code">
2168<pre>
2169nsLiteralString(FOO)
2170</pre>
2171</div>
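<p>The <span class="code">LFOO</span> effect is ordinary preprocessor behaviour: an argument that is an operand of <span class="code">##</span> is pasted before it is macro-expanded. A common idiom, shown here as a sketch and not as something the string library provided, is to add one level of indirection so the argument is expanded first. The example assumes a C++11 compiler and uses the <span class="code">u</span> prefix and <span class="code">char16_t</span> for 16-bit characters; <span class="code">WIDEN</span>, <span class="code">WIDEN_DIRECT</span> and <span class="code">WIDE_LENGTH</span> are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

#define FOO "foo"

// One level: the argument is pasted before it is expanded, so
// WIDEN_DIRECT(FOO) would become the unknown identifier uFOO.
#define WIDEN_DIRECT(s) u##s

// Two levels: WIDEN(FOO) first expands FOO to "foo", and only then
// pastes, giving the 16-bit literal u"foo".
#define WIDEN(s) WIDEN_DIRECT(s)

// Length in characters, excluding the terminator, known at compile time.
#define WIDE_LENGTH(s) (sizeof(WIDEN(s)) / sizeof(char16_t) - 1)

const char16_t* const kFoo       = WIDEN(FOO);
const std::size_t     kFooLength = WIDE_LENGTH(FOO);
```

<p>Whether this helps in practice depends on 16-bit literals being available on every platform you build for.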
2172
2173<p>Hope this helps,
2174
2175
2176
2177
2178
2179<hr>
2180<pre>
2181Date: Thu, 12 Oct 2000 08:36:14 -0400
2182Subject: Re: Self healing the cache :-)
2183</pre>
2184
2185<p>Actually, I'm not even sure you can do it by hand, since you didn't
2186
2187<div class="source-code">
2188<pre>
2189#define FOO L"foo"
2190</pre>
2191</div>
2192
2193<p>and <strong>can't</strong> do that cross-platform. The other way around this is to
2194define a global instead of a macro, that is, instead of saying
2195
2196<div class="source-code">
2197<pre>
2198#define FOO "foo"
2199</pre>
2200</div>
2201
2202<p>at the top of your file, say
2203
2204<div class="source-code">
2205<pre>
2206NS_NAMED_LITERAL_STRING(FOO, "foo")
2207</pre>
2208</div>
2209
2210<p>or else, if the macro was used only in one spot ... perhaps you could
2211just eliminate the macro in favor of <span class="code">NS_NAMED_LITERAL</span> in situ.
2212
2213<p>Arghh. In this case, you may be stuck with the extra work of
2214<span class="code">AssignWithConversion</span>.
2215
2216
2217
2218
2219
2220<hr>
2221<pre>
2222Date: Sun, 3 Dec 2000 16:38:07 -0400
2223Subject: Re: another copy_string question
2224</pre>
2225
2226<pre class="email-quote">
2227 >Is there a way to tell, inside the write() sink, if one is in the
2228 >final hunk? I need to do some special processing at the end.
2229</pre>
2230
2231<p>No, there isn't. But you could move such special processing into the
2232destructor of the sink. Remember, the sink is passed by reference, so
2233you can exactly control its lifetime.
2234
2235<div class="source-code">
2236<pre>
2237{
2238 MySink sink;
2239 nsReadingIterator&lt;PRUnichar&gt; sourceStart = aStr.BeginReading();
2240 nsReadingIterator&lt;PRUnichar&gt; sourceEnd = aStr.EndReading();
2241 copy_string(sourceStart, sourceEnd, sink);
2242 // |sink| destructor executed here
2243}
2244</pre>
2245</div>
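<p>A stand-alone sketch of that pattern, with the end-of-input work done in the sink's destructor. None of this is the real <span class="code">copy_string</span> machinery: <span class="code">CountingSink</span>, <span class="code">pump_fragments</span> and <span class="code">count_chars</span> are hypothetical, and a vector of <span class="code">std::u16string</span> stands in for a multi-fragment string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sink: receives hunks through write() and does its
// end-of-input processing in the destructor.
class CountingSink {
public:
    explicit CountingSink(std::size_t& totalOut) : totalOut_(totalOut) {}
    ~CountingSink() { totalOut_ = seen_; }   // the "final hunk" work happens here

    // Returns the number of characters consumed from this hunk.
    std::size_t write(const char16_t* hunk, std::size_t length) {
        (void)hunk;                          // this sink only counts
        seen_ += length;
        return length;
    }

private:
    std::size_t& totalOut_;
    std::size_t  seen_ = 0;
};

// Stand-in for copy_string: feeds each fragment to the sink, which is
// taken by reference so the caller controls its lifetime.
template <class Sink>
void pump_fragments(const std::vector<std::u16string>& fragments, Sink& sink) {
    for (const auto& fragment : fragments)
        sink.write(fragment.data(), fragment.size());
}

std::size_t count_chars(const std::vector<std::u16string>& fragments) {
    std::size_t total = 0;
    {
        CountingSink sink(total);
        pump_fragments(fragments, sink);
    }   // the sink's destructor runs here and publishes the result
    return total;
}
```

<p>The inner braces matter: they end the sink's lifetime, and so trigger the final processing, before the result is read.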
2246
2247<p>Hope this helps,
2248
2249
2250
2251
2252
2253<hr>
2254<pre>
2255Date: Fri, 15 Dec 2000 20:02:08 -0400
2256Subject: fragment of code
2257</pre>
2258
2259<div class="source-code">
2260<pre>
2261nsPromiseFlatString flatKey(aReadable);
2262
2263flatKey.get()
2264</pre>
2265</div>
2266
2267
2268
2269
2270
2271
2272<hr>
2273<pre>
2274Date: Tue, 16 Jan 2001 16:47:37 -0400
2275Subject: Re: a few string questions...
2276</pre>
2277
<pre class="email-quote">
 >I've accumulated a few questions I've been wanting to ask you, mostly
 >about string stuff. Nothing urgent, but I want to ask them before I
 >forget. So here goes...:
 >
 >1) Is it acceptable to use nsLiteralCString or nsLiteralString on
 >something that's not a literal? This can be useful in some places,
 >for example, to convert a char* to PRUnichar*:
 >
 >PRUnichar* new = ToNewUnicode(nsLiteralCString(myCharPtr));
</pre>
2287
2288<p>This is explicitly allowed. That's why I'm proposing to change the
2289names of those classes to <span class="code">nsLocal[C]String</span>.
2290
2291
<pre class="email-quote">
 >2) Should nsString2x.h and nsString2x.cpp go away? They look like a
 >never-completed rewrite or something...
</pre>
2294
2295<p>Yes. They should go away. They are uncompleted [old] bullshit,
2296exactly as you diagnosed.
2297
2298<p>I'll look into the other two questions.
2299
2300
2301
2302
2303
2304<hr>
2305<pre>
2306Date: Thu, 1 Feb 2001 15:12:41 -0400
2307Subject: Re: [Fwd: bad string, bad string]
2308</pre>
2309
<p>We've been removing implicit conversion operators because they
<strong>always</strong> lead to trouble. Usually they make it harder to pick the
right function when overloading is involved. In the past they have
led to huge performance suckage: the implicit operator made us pick the
wrong function, so we ended up doing conversions we didn't need.
2316
2317<p>It's borderline when the class implements something that is <strong>so</strong>
2318close, as with a guaranteed flat string or an <span class="code">nsCOMPtr</span> ... but the
2319general recommendation is to avoid implicit conversions.
2320
2321<p>See bug #53057.
2322
2323
2324
2325
2326
2327<hr>
2328<pre>
2329Date: Tue, 6 Feb 2001 18:52:23 -0400
2330Subject: seeking review for bug #57087
2331</pre>
2332
2333<p> bug:
2334 <a class="exact-uri" href="http://bugzilla.mozilla.org/show_bug.cgi?id=57087">http://bugzilla.mozilla.org/show_bug.cgi?id=57087</a>
2335
2336 patch:
2337 <a class="exact-uri" href="http://bugzilla.mozilla.org/showattachment.cgi?attach_id=24576">http://bugzilla.mozilla.org/showattachment.cgi?attach_id=24576</a>
2338
2339<p>This patch is supposed to add the ability to define very long literal
2340strings more easily by breaking lines, e.g.,
2341
2342<div class="source-code">
2343<pre>
2344NS_MULTILINE_LITERAL( NS_L("This is the start of a very long line")
2345 NS_L(" which actually continues across")
2346 NS_L(" a couple more.") )
2347</pre>
2348</div>
2349
<p>The main danger in this scheme is callers who omit the inner <span class="code">NS_L</span>
wrapping, though I believe this will be caught at compile time as an
initializer of the wrong type.
2353
2354<p>Seeking input from everybody, and waterson in particular.
2355
2356
2357
2358
2359
2360<hr>
2361<pre>
2362Date: Wed, 14 Feb 2001 16:09:10 -0400
2363Subject: Re: Question...
2364</pre>
2365
2366<p>There are some utilities in "xpcom/ds/nsReadableUtils.h". In
2367particular, if you want to get back a new heap-allocated ASCII string
2368with the minimal work, you would say
2369
2370<div class="source-code">
2371<pre>
2372PRUnichar* sourceChars = ...;
2373
2374char* destChars = ToNewCString(nsLiteralString(sourceChars));
2375</pre>
2376</div>
2377
2378
2379<p>It's more efficient if you happen to already know the length. If you
2380don't, don't bother counting, that's what I'll do in the constructor
2381for <span class="code">nsLiteralString</span>. If you do, then call like this
2382
2383<div class="source-code">
2384<pre>
2385destChars = ToNewCString( nsLiteralString(sourceChars, length) );
2386</pre>
2387</div>
2388
2389<p>Other routines in that file will help you if, for instance, you wanted
2390to translate into a buffer you had already allocated.
2391
2392<p>Hope this helps,
2393
2394
2395
2396
2397
2398<hr>
2399<pre>
2400Date: Fri, 23 Feb 2001 03:12:58 -0400
2401Subject: string snippet
2402</pre>
2403
2404<div class="source-code">
2405<pre>
nsCString aInput;

nsReadingIterator&lt;char&gt; search_start;
aInput.BeginReading(search_start);

nsReadingIterator&lt;char&gt; search_end;
aInput.EndReading(search_end);

if ( FindCharInReadable(':', search_start, search_end) )
  {
    ++search_start;   // step past the ':' itself
    return ToNewCString( Substring(aInput, search_start, search_end) );
  }
2422</pre>
2423</div>
2424
2425
2426
2427
2428
2429
2430<hr>
2431<pre>
2432Date: Wed, 7 Mar 2001 19:44:08 -0400
2433Subject: string help
2434</pre>
2435
2436<p>Here you go, Mike:
2437
 <a class="exact-uri" href="http://scottcollins.net/journal/discussion/mjudge-scratch.cpp">http://scottcollins.net/journal/discussion/mjudge-scratch.cpp</a>
2439
2440
2441
2442
2443
2444
2445<hr>
2446<pre>
2447Date: Fri, 9 Mar 2001 20:56:07 -0400
2448Subject: Re: string assertions
2449</pre>
2450
2451<p>If you get an iterator into a string and you advance it all the way to
2452the end of the string, and then <strong>keep</strong> trying to advance it, you hit
this assert. This could happen, for example, if you tried to copy 10
2454characters out of a 9 character string. I've tried to make this
2455impossible to get to. As far as I know, all my routines trim requests
2456in advance of manipulating iterators. When you see this, you should
2457get the stack. That will take you right to the bad spot.
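<p>Trimming a request before touching any iterator can be sketched as follows. This illustrates the idea only, with <span class="code">std::u16string</span> standing in for the string classes and <span class="code">copy_range</span> as a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Copies at most |count| characters starting at |start|. Both values are
// clamped to the string's bounds first, so no iterator is ever advanced
// past the end.
std::u16string copy_range(const std::u16string& source,
                          std::size_t start, std::size_t count) {
    start = std::min(start, source.size());
    count = std::min(count, source.size() - start);
    return std::u16string(
        source.begin() + static_cast<std::ptrdiff_t>(start),
        source.begin() + static_cast<std::ptrdiff_t>(start + count));
}
```

<p>Asking for 10 characters of a 9 character string then simply yields the 9 that exist.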
2458
2459
2460
2461
2462
2463<hr>
2464<pre>
2465Date: Sat, 31 Mar 2001 11:04:03 -0400
2466Subject: Re: Sun bustage and string advice
2467</pre>
2468
2469<p>You do know you are comparing two pointers now? It seems unlikely
2470those two pointers would ever be the same pointer. You probably want
2471to say something like
2472
2473<div class="source-code">
2474<pre>
2475NS_LITERAL_STRING("foo").Equals(aTopic) // or
2476
2477NS_LITERAL_STRING("foo") == nsLiteralString(aTopic)
2478</pre>
2479</div>
2480
2481<p>...so that you compare the <strong>contents</strong> of two strings. Right now,
2482you're just testing to see if two pointers both point to the same
2483location in memory. A lot of people make this mistake. I would like
2484to make it obvious to people that comparing two pointers does not
2485compare strings. Can you tell me what gave you that impression so
2486that I can figure out how to better educate people not to do this? By
the way, it's not that I don't <strong>want</strong> to make this compare two
strings; it's that in C++, you can't overload operators whose operands are all
built-in types. And pointers are built-in types. So I can't make
2490<span class="code">operator==(const PRUnichar*, const PRUnichar*)</span> do anything different
2491than it already does, which is the same thing it does for any other
2492pointer.
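<p>The difference is easy to demonstrate in isolation. In this sketch <span class="code">char16_t</span> stands in for <span class="code">PRUnichar</span>, and <span class="code">same_pointer</span> and <span class="code">same_contents</span> are hypothetical helpers, not library functions.

```cpp
#include <cassert>

// Compares addresses only: true just when both point at the same buffer.
bool same_pointer(const char16_t* a, const char16_t* b) {
    return a == b;
}

// Compares contents, character by character, up to the terminators.
bool same_contents(const char16_t* a, const char16_t* b) {
    while (*a != 0 && *a == *b) {
        ++a;
        ++b;
    }
    return *a == *b;   // equal only if both strings ended together
}
```

<p>Two separate buffers that both hold "foo" fail the first test and pass the second, which is the situation in the code under review.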
2493
2494
2495
2496
2497
2498
2499</div>
2500
2501
2502
2503<!-- .................................................................End Matter -->
2504
2505
2506
2507 </body>
2508</html>