1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
|
---|
2 | <html>
|
---|
3 | <head>
|
---|
4 | <title>an incomplete guide to mozilla/string</title>
|
---|
5 |
|
---|
6 | <link rel="stylesheet" href="http://www.mozilla.org/projects/string/string-guide.css" title="remote stylesheet" type="text/css">
|
---|
7 | <link rel="alternate stylesheet" href="string-guide.css" title="local stylesheet" type="text/css">
|
---|
8 | </head>
|
---|
9 | <body>
|
---|
10 | <!-- ----|---------|---------|---------|---------|---------|---------|---------| -->
|
---|
11 | <!-- ...............................................................Front Matter -->
|
---|
12 | <h1>an incomplete guide to <a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/string/">mozilla/string</a></h1>
|
---|
13 | <h1><font color="red">This document is now deprecated in favor of <a href="http://www.mozilla.org/projects/xpcom/string-guide.html">The new string guide</a>.</font></h1>
|
---|
14 | <div class="author-note">
|
---|
15 | <p>by <a href="http://ScottCollins.net/">Scott Collins</a><!-- /p -->
|
---|
16 | <p>last modified 8 April 2001<!-- /p -->
|
---|
17 | </div>
|
---|
18 |
|
---|
19 | <div class="abstract">
|
---|
20 | <p>
|
---|
21 | <h1>Abstract</h1>
|
---|
22 | This document <span class="LXRSHORTDESC">provides
|
---|
23 | an <a href="#users_guide">introduction</a> to the design and use of the string classes in mozilla,
|
---|
24 | <a href="#implementors_guide">detailed information</a> on their implementation and how one may extend them,
|
---|
25 | and <a href="#faq">answers</a> to frequently asked questions about strings</span>.
|
---|
26 | </p>
|
---|
27 | </div>
|
---|
28 |
|
---|
29 |
|
---|
30 |
|
---|
31 | <h2><a name="contents">contents</a></h2>
|
---|
32 |
|
---|
33 | <div class="contents">
|
---|
34 | <ul>
|
---|
35 | <li><a href="#users_guide" >user's guide</a></li>
|
---|
36 | <li><a href="#implementors_guide">implementor's guide</a></li>
|
---|
37 | <li><a href="#faq" >frequently asked questions</a></li>
|
---|
38 | </ul>
|
---|
39 | </div>
|
---|
40 |
|
---|
41 | <p>
|
---|
42 | Please direct all comments, requests, and contributions to,
|
---|
43 | in order of preference,
|
---|
44 | the tracking bug <a href="http://bugzilla.mozilla.org/show_bug.cgi?id=70076">#70076</a> for this document,
|
---|
45 | the author <a class="exact-uri" href="mailto:[email protected]?subject=string-guide">[email protected]</a>, and/or
|
---|
46 | the newsgroup <a class="exact-uri" href="news:netscape.public.mozilla.xpcom">news:netscape.public.mozilla.xpcom</a>
|
---|
47 | (should there be a strings newsgroup?)
|
---|
48 | </p>
|
---|
49 |
|
---|
50 | <div class="author-note">
|
---|
51 | <p>
|
---|
52 | A note to potential editors:
|
---|
53 | don't even <strong>consider</strong> modifying this document with an HTML editor.
|
---|
54 | That would destroy the internal formatting,
|
---|
55 | and make patches unmanageable.
|
---|
56 | </p>
|
---|
57 | </div>
|
---|
58 |
|
---|
59 |
|
---|
60 |
|
---|
61 |
|
---|
62 | <!-- ...............................................................User's Guide -->
|
---|
63 | <hr>
|
---|
64 | <h1><a name="users_guide">user's guide</a></h1>
|
---|
65 |
|
---|
66 | <div class="author-note">
|
---|
67 | <p>
|
---|
68 | Strings in mozilla are a world apart from <span class="code">char*</span>s.
|
---|
69 | If you don't know why they are different,
|
---|
70 | this section is the place for you to start.
|
---|
71 | If you're already familiar with the hierarchy of string classes in mozilla,
|
---|
72 | then you might want to skip ahead to the <a href="#implementors_guide">implementor's guide</a>
|
---|
73 | or the <a href="#faq">FAQ</a>.
|
---|
74 | </p>
|
---|
75 | </div>
|
---|
76 |
|
---|
77 | <div class="contents">
|
---|
78 | <ul>
|
---|
79 | <li><a href="#users_guide_introduction">introduction</a></li>
|
---|
80 | <li><a href="#users_guide_how_to" >using the string classes correctly; using the correct string class</a></li>
|
---|
81 | <li><a href="#users_guide_iterators" >using string iterators</a></li>
|
---|
82 | <li><a href="#users_guide_summary" >summary</a></li>
|
---|
83 | </ul>
|
---|
84 | </div>
|
---|
85 |
|
---|
86 | <h2><a name="users_guide_introduction">introduction</a></h2>
|
---|
87 | <h3>what is and what isn't a string?</h3>
|
---|
88 | <p>
|
---|
89 | A string is an opaque container holding a linear sequence of characters, possibly of zero length.
|
---|
90 | Understanding the implications of this statement is the foundation for understanding all mozilla's string classes.
|
---|
91 | </p>
|
---|
92 |
|
---|
93 | <h3>readable and writable</h3>
|
---|
94 | <h3>dependent strings</h3>
|
---|
95 | <h3>flat strings</h3>
|
---|
96 | <h3>encoding</h3>
|
---|
97 | <h3>sharing</h3>
|
---|
98 |
|
---|
99 | <h2><a name="users_guide_how_to">using the string classes correctly; using the correct string class</a></h2>
|
---|
100 | <h3>basic string operations</h3>
|
---|
101 | <h4>comparison</h4>
|
---|
102 | <h4>concatenation</h4>
|
---|
103 | <h4>substrings</h4>
|
---|
104 | <h4>find and replace</h4>
|
---|
105 | <h3>conversions</h3>
|
---|
106 | <h4>calling a function that expects a different kind of string</h4>
|
---|
107 | <h4>converting between string classes</h4>
|
---|
108 | <h4>converting between encodings</h4>
|
---|
109 | <h3>selecting the right string class</h3>
|
---|
110 | <h4>user string classes</h4>
|
---|
111 | <h4>selecting the right string class for a parameter</h4>
|
---|
112 | <h4>selecting the right string class for a local variable</h4>
|
---|
113 | <h4>selecting the right string class for a member variable</h4>
|
---|
114 | <h4>selecting the right string class for a return value</h4>
|
---|
115 | <h4>selecting the right string class in IDL</h4>
|
---|
116 | <h3>don'ts</h3>
|
---|
117 |
|
---|
118 | <h2><a name="users_guide_iterators">using string iterators</a></h2>
|
---|
119 | <h3>what is an iterator?</h3>
|
---|
120 | <h3>reading iterators and writing iterators</h3>
|
---|
121 | <h3>`chunky' iterating for efficiency</h3>
|
---|
122 | <h3><span class="code">copy_string</span>, character sources and sinks</h3>
|
---|
123 | <h3>encoding conversion iterators</h3>
|
---|
124 |
|
---|
125 | <h2><a name="users_guide_summary">summary</a></h2>
|
---|
126 |
|
---|
127 |
|
---|
128 | <!-- ........................................................Implementor's Guide -->
|
---|
129 | <hr>
|
---|
130 | <h1><a name="implementors_guide">implementor's guide</a></h1>
|
---|
131 |
|
---|
132 | <div class="author-note">
|
---|
133 | <p>
|
---|
134 |
|
---|
135 | </p>
|
---|
136 | </div>
|
---|
137 |
|
---|
138 | <div class="contents">
|
---|
139 | <ul>
|
---|
140 | <!-- li></li -->
|
---|
141 | </ul>
|
---|
142 | </div>
|
---|
143 |
|
---|
144 |
|
---|
145 |
|
---|
146 | <!-- ........................................................................FAQ -->
|
---|
147 | <hr>
|
---|
148 | <h1><a name="faq">frequently asked questions</a></h1>
|
---|
149 |
|
---|
150 | <div class="author-note">
|
---|
151 | </div>
|
---|
152 |
|
---|
153 | <div class="contents">
|
---|
154 | <ul>
|
---|
155 | <!--
|
---|
156 | <li>
|
---|
157 | I have a wide string, i.e., an instance of a class derived from <span class="code">nsAString</span>
|
---|
158 | <ul>
|
---|
159 | <li>I want a pointer to the characters</li>
|
---|
160 | <li>I want a narrow string</li>
|
---|
161 | <li>I want to <span class="code">printf</span> it</li>
|
---|
162 | </ul>
|
---|
163 | </li>
|
---|
164 | <li>
|
---|
165 | I have a <span class="code">PRUnichar*</span>
|
---|
166 | <ul>
|
---|
167 | <li>I want a wide string</li>
|
---|
168 | <li>I want a narrow string</li>
|
---|
169 | <li>I want to <span class="code">printf</span> it</li>
|
---|
170 | </ul>
|
---|
171 | </li>
|
---|
172 | <li>
|
---|
173 | I have a narrow string, i.e., an instance of a class derived from <span class="code">nsACString</span>
|
---|
174 | <ul>
|
---|
175 | <li>I want a pointer to the characters</li>
|
---|
176 | <li>I want a wide string</li>
|
---|
177 | <li>I want to <span class="code">printf</span> it</li>
|
---|
178 | </ul>
|
---|
179 | </li>
|
---|
180 | <li>
|
---|
181 | I have a <span class="code">char*</span>
|
---|
182 | <ul>
|
---|
183 | <li>I want a wide string</li>
|
---|
184 | <li>I want a narrow string</li>
|
---|
185 | </ul>
|
---|
186 | </li>
|
---|
187 | <li>
|
---|
188 | I have a literal character sequence, e.g., <span class="code">"Hello, World!\n"</span>
|
---|
189 | <ul>
|
---|
190 | <li>I want a wide string</li>
|
---|
191 | <li>I want a narrow string</li>
|
---|
192 | </ul>
|
---|
193 | </li>
|
---|
194 | <li>What's the best way to return a string?</li>
|
---|
195 | <li>How can I get a pointer to the characters in a string?</li>
|
---|
196 | <li>How can I <span class="code">printf</span> a string?</li>
|
---|
197 | </ul>
|
---|
198 | -->
|
---|
199 | </div>
|
---|
200 |
|
---|
201 |
|
---|
202 | <table class="chart">
|
---|
203 | <tr>
|
---|
204 | <th></th>
|
---|
205 | <th colspan="5">you have some <span class="code">char</span>s</th>
|
---|
206 | </tr>
|
---|
207 | <tr>
|
---|
208 | <th>you want</th>
|
---|
209 | <th><span class="code">'x'</span></th>
|
---|
210 | <th><span class="code">char c</span></th>
|
---|
211 | <th><span class="code">"foo"</span></th>
|
---|
212 | <th><span class="code">char* cp</span></th>
|
---|
213 | <th><span class="code">nsACString& cs</span></th>
|
---|
214 | </tr>
|
---|
215 | <tr>
|
---|
216 | <th class="row-label"><span class="code">char</span></th>
|
---|
217 | <td colspan="2">.</td>
|
---|
218 | <!-- "foo" --> <td><span class="code">[]</span></td>
|
---|
219 | <!-- char* cp --> <td><span class="code">[]</span></td>
|
---|
220 | <!-- nsACString& cs --> <td><a href="#faq_how_to_extract_a_character">extract a character</a></td>
|
---|
221 | </tr>
|
---|
222 | <tr>
|
---|
223 | <th class="row-label"><span class="code">PRUnichar</span></th>
|
---|
224 | <!-- 'x' --> <td><span class="code">PRUnichar('x')</span></td>
|
---|
225 | <!-- char c --> <td><span class="code">PRUnichar(c)</span></td>
|
---|
226 | <td colspan="3"><a href="#faq_how_to_convert_encoding">convert encoding</a>, <a href="#faq_how_to_extract_a_character">extract a character</a></td>
|
---|
227 | </tr>
|
---|
228 | <tr>
|
---|
229 | <th class="row-label"><span class="code">char*</span></th>
|
---|
230 | <!-- 'x' --> <td><span class="code">&</span></td>
|
---|
231 | <!-- char c --> <td><span class="code">&</span></td>
|
---|
232 | <!-- "foo" --> <td><span class="code">&</span></td>
|
---|
233 | <!-- char* cp --> <td>.</td>
|
---|
234 | <!-- nsACString& cs --> <td><a href="#faq_how_to_get_a_pointer">get a pointer</a></td>
|
---|
235 | </tr>
|
---|
236 | <tr>
|
---|
237 | <th class="row-label"><span class="code">PRUnichar*</span></th>
|
---|
238 | <td colspan="5"><a href="#faq_how_to_convert_encoding">convert encoding</a>, <a href="#faq_how_to_get_a_pointer">get a pointer</a></td>
|
---|
239 | </tr>
|
---|
240 | <tr>
|
---|
241 | <th class="row-label"><span class="code">nsACString</span></th>
|
---|
242 | <!-- 'x' --> <td><span class="code">NS_LITERAL_CSTRING("x")</span></td>
|
---|
243 | <!-- char c --> <td><a href="#faq_how_to_make_a_string">make a string</a></td>
|
---|
244 | <!-- "foo" --> <td><span class="code">NS_LITERAL_CSTRING("foo")</span></td>
|
---|
245 | <!-- char* cp --> <td><a href="#faq_how_to_make_a_string">make a string</a></td>
|
---|
246 | <!-- nsACString& cs --> <td>.</td>
|
---|
247 | </tr>
|
---|
248 | <tr>
|
---|
249 | <th class="row-label"><span class="code">nsAString</span></th>
|
---|
250 | <!-- 'x' --> <td><span class="code">NS_LITERAL_STRING("x")</span></td>
|
---|
251 | <!-- char c --> <td><a href="#faq_how_to_convert_encoding">convert encoding</a></td>
|
---|
252 | <!-- "foo" --> <td><span class="code">NS_LITERAL_STRING("foo")</span></td>
|
---|
253 | <td colspan="2"><a href="#faq_how_to_convert_encoding">convert encoding</a></td>
|
---|
254 | </tr>
|
---|
255 | <tr>
|
---|
256 | <th class="row-label">to call <span class="code">printf</span></th>
|
---|
257 | <td colspan="4">.</td>
|
---|
258 | <!-- nsACString& cs --> <td><a href="#faq_how_to_call_printf">call <span class="code">printf</span></a></td>
|
---|
259 | </tr>
|
---|
260 | </table>
|
---|
261 |
|
---|
262 | <table class="chart">
|
---|
263 | <tr>
|
---|
264 | <th></th>
|
---|
265 | <th colspan="3">you have some <span class="code">PRUnichar</span>s</th>
|
---|
266 | </tr>
|
---|
267 | <tr>
|
---|
268 | <th>you want</th>
|
---|
269 | <th><span class="code">PRUnichar w</span></th>
|
---|
270 | <th><span class="code">PRUnichar* wp</span></th>
|
---|
271 | <th><span class="code">nsAString& s</span></th>
|
---|
272 | </tr>
|
---|
273 | <tr>
|
---|
274 | <th class="row-label"><span class="code">char</span></th>
|
---|
275 | <!-- PRUnichar w --> <td></td>
|
---|
276 | <!-- PRUnichar* wp --> <td></td>
|
---|
277 | <!-- nsAString& s --> <td></td>
|
---|
278 | </tr>
|
---|
279 | <tr>
|
---|
280 | <th class="row-label"><span class="code">PRUnichar</span></th>
|
---|
281 | <!-- PRUnichar w --> <td></td>
|
---|
282 | <!-- PRUnichar* wp --> <td><span class="code">[]</span></td>
|
---|
283 | <!-- nsAString& s --> <td><a href="#faq_how_to_extract_a_character">extract a character</a></td>
|
---|
284 | </tr>
|
---|
285 | <tr>
|
---|
286 | <th class="row-label"><span class="code">char*</span></th>
|
---|
287 | <!-- PRUnichar w --> <td></td>
|
---|
288 | <!-- PRUnichar* wp --> <td></td>
|
---|
289 | <!-- nsAString& s --> <td></td>
|
---|
290 | </tr>
|
---|
291 | <tr>
|
---|
292 | <th class="row-label"><span class="code">PRUnichar*</span></th>
|
---|
293 | <!-- PRUnichar w --> <td><span class="code">&</span></td>
|
---|
294 | <!-- PRUnichar* wp --> <td></td>
|
---|
295 | <!-- nsAString& s --> <td><a href="#faq_how_to_get_a_pointer">get a pointer</a></td>
|
---|
296 | </tr>
|
---|
297 | <tr>
|
---|
298 | <th class="row-label"><span class="code">nsACString</span></th>
|
---|
299 | <!-- PRUnichar w --> <td></td>
|
---|
300 | <!-- PRUnichar* wp --> <td></td>
|
---|
301 | <!-- nsAString& s --> <td></td>
|
---|
302 | </tr>
|
---|
303 | <tr>
|
---|
304 | <th class="row-label"><span class="code">nsAString</span></th>
|
---|
305 | <!-- PRUnichar w --> <td></td>
|
---|
306 | <!-- PRUnichar* wp --> <td></td>
|
---|
307 | <!-- nsAString& s --> <td></td>
|
---|
308 | </tr>
|
---|
309 | <tr>
|
---|
310 | <th class="row-label">to call <span class="code">printf</span></th>
|
---|
311 | <!-- PRUnichar w --> <td></td>
|
---|
312 | <!-- PRUnichar* wp --> <td></td>
|
---|
313 | <!-- nsAString& s --> <td><a href="#faq_how_to_call_printf">call <span class="code">printf</span></a></td>
|
---|
314 | </tr>
|
---|
315 | </table>
|
---|
316 |
|
---|
317 | <div class="faq">
|
---|
318 | <dl>
|
---|
319 | <dt>
|
---|
320 | is there any string doc?
|
---|
321 | </dt>
|
---|
322 | <dd>
|
---|
323 | Yes, you're soaking in it!
|
---|
324 | </dd>
|
---|
325 |
|
---|
326 |
|
---|
327 |
|
---|
328 | <!-- getting a pointer -->
|
---|
329 | <dt>
|
---|
330 | <a name="faq_how_to_get_a_pointer">I have a string, how do I get a pointer to the characters?</a>
|
---|
331 | </dt>
|
---|
332 | <dd>
|
---|
333 | You want to avoid this situation.
|
---|
334 | In your own interfaces, prefer string types over raw pointers.
|
---|
335 | Any interface that wants to process a string using a single pointer is making two expensive assumptions.
|
---|
336 | First, that the string is stored in one contiguous hunk; and
|
---|
337 | second, that the string is zero-terminated.
|
---|
338 | If this isn't the case,
|
---|
339 | then to get a pointer, storage must be allocated and the entire string must be copied to it and zero-terminated.
|
---|
340 | You may not be able to avoid needing a pointer when interacting with system calls.
|
---|
341 | </dd>
|
---|
342 | <dd>
|
---|
343 | Some string classes guarantee that they are `flat'.
|
---|
344 | That is, that their data is stored in one contiguous zero-terminated hunk.
|
---|
345 | This <strong>does not</strong> imply that there are no embedded nulls. Caveat emptor.
|
---|
346 | All strings that explicitly promise flatness
|
---|
347 | inherit from the class <span class="code">nsAFlatString</span>
|
---|
348 | or <span class="code">nsAFlatCString</span>
|
---|
349 | and can produce a constant pointer to their data with the <span class="code">get()</span> member function.
|
---|
350 | Even strings that don't explicitly promise to be flat
|
---|
351 | may happen to be flat.
|
---|
352 | The helper function <span class="code">PromiseFlatString</span> will produce
|
---|
353 | a <span class="code">const</span> dependent string that is guaranteed to be flat.
|
---|
354 | If you use this on a string that already happens to be flat,
|
---|
355 | the result is simply a reference through to that string.
|
---|
356 | Otherwise,
|
---|
357 | <span class="code">PromiseFlatString</span> does the work to allocate, copy, terminate, and manage
|
---|
358 | a temporary flat string.
|
---|
359 | Since the result of <span class="code">PromiseFlatString</span> is a temporary,
|
---|
360 | you must be careful not to get and hold a pointer to its data for longer than the temporary itself lives.
|
---|
361 | </dd>
|
---|
362 | <dd>
|
---|
363 | <div class="source-code">
|
---|
364 | <pre>
|
---|
365 | /* I have a string, how do I get a pointer to the characters? */
|
---|
366 |
|
---|
367 | extern void EvilNarrowOSFunction( const char* ); // evil OS routines that want a pointer
|
---|
368 | extern void EvilWideOSFunction( const PRUnichar* );
|
---|
369 |
|
---|
370 | void func( const nsAString& aString, const nsACString& aCString )
|
---|
371 | {
|
---|
372 | EvilWideOSFunction( NS_LITERAL_STRING("Hello, World!").<span class="notice">get()</span> );
|
---|
373 | // literal strings are flat already (as are |nsString|s, et al), just use |.get()|
|
---|
374 |
|
---|
375 | EvilWideOSFunction( <span class="notice">PromiseFlatString(</span>aString<span class="notice">).get()</span> );
|
---|
376 | // for strings that don't explicitly guarantee flatness, use |PromiseFlatString|
|
---|
377 |
|
---|
378 |
|
---|
379 | // beware holding the pointer for longer than the life of the promise
|
---|
380 | <span class="warning">const PRUnichar* wp = PromiseFlatString(aString).get(); // BAD! |wp| dangles
|
---|
381 | EvilWideOSFunction(wp);</span>
|
---|
382 |
|
---|
383 | // if you really need to use the pointer from |PromiseFlatString| in more than one expression...
|
---|
384 | const nsAFlatString& flat = <span class="notice">PromiseFlatString(</span>aString<span class="notice">)</span>;
|
---|
385 | EvilWideOSFunction(flat.<span class="notice">get()</span>);
|
---|
386 | SomeOtherFunction(flat.<span class="notice">get()</span>);
|
---|
387 |
|
---|
388 | // similarly for |char| strings
|
---|
389 | EvilNarrowOSFunction( <span class="notice">PromiseFlatCString(</span>aCString<span class="notice">).get()</span> );
|
---|
390 | }
|
---|
391 | </pre>
|
---|
392 | </div>
|
---|
393 | </dd>
|
---|
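<dd>
The lifetime rule above can be sketched in standard C++, without any Mozilla headers.
This is only an illustration under stated assumptions:
<span class="code">MakeFlat</span> is a hypothetical stand-in for <span class="code">PromiseFlatCString</span>
(it always copies, where the real helper may not need to),
and <span class="code">CountBytes</span> is a hypothetical stand-in for an OS routine that wants a zero-terminated pointer.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for an OS routine that wants a zero-terminated pointer.
std::size_t CountBytes(const char* aData)
{
  return std::strlen(aData);
}

// Hypothetical stand-in for |PromiseFlatCString|: builds an owning,
// zero-terminated copy of a fragment that is not itself zero-terminated.
// The result is returned by value, so it is a temporary at the call site.
std::string MakeFlat(const char* aData, std::size_t aLength)
{
  return std::string(aData, aLength);
}

std::size_t UseInOneExpression(const char* aData, std::size_t aLength)
{
  // GOOD: the temporary lives until the end of this full expression,
  // so the pointer stays valid for the whole call.
  return CountBytes(MakeFlat(aData, aLength).c_str());
}

std::size_t UseInSeveralExpressions(const char* aData, std::size_t aLength)
{
  // GOOD: binding the temporary to a |const| reference extends its lifetime
  // to the end of the enclosing scope.
  const std::string& flat = MakeFlat(aData, aLength);
  std::size_t first = CountBytes(flat.c_str());
  std::size_t second = CountBytes(flat.c_str());
  return first + second;
}

// BAD (deliberately left as a comment): the temporary dies at the semicolon,
// so |p| would dangle before it is ever used.
//   const char* p = MakeFlat(aData, aLength).c_str();
//   CountBytes(p);
```

The same two safe shapes apply to the wide helpers: either use the pointer inside the expression that creates the temporary, or give the temporary a name.
</dd>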
394 |
|
---|
395 |
|
---|
396 |
|
---|
397 | <!-- extracting a character -->
|
---|
398 | <dt>
|
---|
399 | <a name="faq_how_to_extract_a_character">How do I get a particular character out of a string?</a>
|
---|
400 | </dt>
|
---|
401 | <dd>
|
---|
402 | Flat strings provide <span class="code">operator[]</span> and <span class="code">CharAt()</span>.
|
---|
403 | All strings provide <span class="code">First()</span>, <span class="code">Last()</span>, and access with iterators.
|
---|
404 | <strong>Don't</strong> promise a string flat just to do character indexing.
|
---|
405 | Prefer, instead, to get an iterator and <span class="code">advance</span> it to the position you care about.
|
---|
406 | </dd>
|
---|
407 | <dd>
|
---|
408 | <div class="source-code">
|
---|
409 | <pre>
|
---|
410 | /* How do I get a particular character out of a string? */
|
---|
411 |
|
---|
412 | PRUnichar Get5thCharacterOf( const nsAString& aString )
|
---|
413 | {
|
---|
414 | if ( aString.Length() >= 5 )
|
---|
415 | {
|
---|
416 | nsAString::const_iterator iter;
|
---|
417 | aString.BeginReading(iter); // make |iter| point to the beginning of |aString|
|
---|
418 | iter.advance(4); // the 5th character is at offset 4
|
---|
419 | return *iter;
|
---|
420 | }
|
---|
421 |
|
---|
422 | return PRUnichar(0);
|
---|
423 | }
|
---|
424 | </pre>
|
---|
425 | </div>
|
---|
426 | </dd>
|
---|
427 | <dd>
|
---|
428 | Using iterators isn't as bad as the example above makes it feel.
|
---|
429 | The typical use is for advancing through a string, examining many characters.
|
---|
430 | </dd>
|
---|
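<dd>
As a rough sketch of both uses, here is the same idea written against standard C++ iterators.
It assumes <span class="code">std::u16string</span> as a stand-in for a wide Mozilla string,
and the function names are hypothetical; the Mozilla iterator API itself is not shown.

```cpp
#include <cstddef>
#include <iterator>
#include <string>

// Returns the character at zero-based offset |aOffset|,
// or 0 if the string is too short.
char16_t CharacterAt(const std::u16string& aString, std::size_t aOffset)
{
  if (aOffset >= aString.size())
    return char16_t(0);

  std::u16string::const_iterator iter = aString.begin();
  std::advance(iter, static_cast<std::ptrdiff_t>(aOffset));
  return *iter;
}

// The more typical use: a single pass that examines every character.
std::size_t CountOccurrences(const std::u16string& aString, char16_t aChar)
{
  std::size_t count = 0;
  for (std::u16string::const_iterator iter = aString.begin();
       iter != aString.end(); ++iter)
  {
    if (*iter == aChar)
      ++count;
  }
  return count;
}
```

Note that the fifth character lives at offset 4, so <span class="code">CharacterAt(s, 4)</span> is the counterpart of a "get the 5th character" routine.
</dd>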
431 |
|
---|
432 |
|
---|
433 |
|
---|
434 | <!-- how to convert encoding -->
|
---|
435 | <dt>
|
---|
436 | <a name="faq_how_to_convert_encoding">How do I convert from one encoding to another?</a>
|
---|
437 | </dt>
|
---|
438 | <dd>
|
---|
439 | </dd>
|
---|
440 |
|
---|
441 |
|
---|
442 |
|
---|
443 | <!-- how to make a string -->
|
---|
444 | <dt>
|
---|
445 | <a name="faq_how_to_make_a_string">How do I create a string?</a>
|
---|
446 | </dt>
|
---|
447 | <dd>
|
---|
448 | </dd>
|
---|
449 |
|
---|
450 |
|
---|
451 | <!-- how to return a string -->
|
---|
452 | <dt>
|
---|
453 | What is the best way to return a string?
|
---|
454 | </dt>
|
---|
455 | <dd>
|
---|
456 | <p>
|
---|
457 | There are several reasonable ways to produce a string result from a function.
|
---|
458 | If you are already holding the answer as a sharable string,
|
---|
459 | you can simply return that string (pass-by-value).
|
---|
460 | Otherwise,
|
---|
461 | the most efficient and flexible way to return a string is
|
---|
462 | to assign your result into a non-<span class="code">const</span> reference parameter.
|
---|
463 | Don't bother to create a sharable string from scratch with your generated result.
|
---|
464 | </p>
|
---|
465 | <p>
|
---|
466 | Why?
|
---|
467 | The two things you want to minimize in string manipulation are,
|
---|
468 | in order of importance,
|
---|
469 | heap allocation, and
|
---|
470 | moving characters around.
|
---|
471 | </p>
|
---|
472 | </dd>
|
---|
473 | <dd>
|
---|
474 | <div class="source-code">
|
---|
475 | <pre>
|
---|
476 | /* What is the best way to return a string? */
|
---|
477 |
|
---|
478 | class foo
|
---|
479 | {
|
---|
480 | public:
|
---|
481 | // ...
|
---|
482 | void GetShortName( nsAString& aResult ) const;
|
---|
483 | nsCommonString GetFullName() const;
|
---|
484 |
|
---|
485 | private:
|
---|
486 | nsCommonString mFullName;
|
---|
487 |
|
---|
488 | const PRUnichar* mShortName;
|
---|
489 | PRUint32 mShortNameLength;
|
---|
490 |
|
---|
491 | };
|
---|
492 |
|
---|
493 | nsCommonString
|
---|
494 | foo::GetFullName() const
|
---|
495 | {
|
---|
496 | return mFullName;
|
---|
497 | }
|
---|
498 |
|
---|
499 | void
|
---|
500 | foo::GetShortName( nsAString& aResult ) const
|
---|
501 | {
|
---|
502 | aResult = DependentString(mShortName, mShortNameLength);
|
---|
503 | }
|
---|
504 | </pre>
|
---|
505 | </div>
|
---|
506 | </dd>
|
---|
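<dd>
The out-parameter advice can be tried out in standard C++.
The sketch below assumes <span class="code">std::u16string</span> in place of a Mozilla string class,
and <span class="code">Foo</span> and <span class="code">TotalShortNameLength</span> are hypothetical names.
It shows why assigning into a caller-supplied string helps: the caller can reuse one buffer across many calls.

```cpp
#include <cstddef>
#include <string>

class Foo
{
public:
  Foo(const char16_t* aShortName, std::size_t aShortNameLength)
    : mShortName(aShortName), mShortNameLength(aShortNameLength) {}

  // Assign the result into storage the caller already owns.
  void GetShortName(std::u16string& aResult) const
  {
    aResult.assign(mShortName, mShortNameLength);
  }

private:
  const char16_t* mShortName;      // not owned, not necessarily zero-terminated
  std::size_t     mShortNameLength;
};

// One scratch string serves every call, so its buffer can be reused
// instead of building a fresh result string per item.
std::size_t TotalShortNameLength(const Foo* aItems, std::size_t aCount)
{
  std::u16string scratch;
  std::size_t total = 0;
  for (std::size_t i = 0; i < aCount; ++i)
  {
    aItems[i].GetShortName(scratch);
    total += scratch.size();
  }
  return total;
}
```

Returning a freshly built string by value from <span class="code">GetShortName</span> would instead construct a new result object on every call.
</dd>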
507 |
|
---|
508 |
|
---|
509 | <dt>
|
---|
510 | <a name="faq_how_to_call_printf">How do I <span class="code">printf</span> a string, e.g., for debugging?</a>
|
---|
511 | </dt>
|
---|
512 | <dd>
|
---|
513 | If your string is already narrow, you just have to worry about <a href="#faq_how_to_get_a_pointer">making it flat, and then getting a pointer</a>.
|
---|
514 | </dd>
|
---|
515 | <dd>
|
---|
516 | If your string happens to be wide,
|
---|
517 | you'll need to convert it before you can <span class="code">printf</span> something reasonable.
|
---|
518 | If it's just for debugging,
|
---|
519 | you probably wouldn't care if something odd was printed in the case of a Unicode character that didn't have
|
---|
520 | an ASCII equivalent. (If you have a UTF-8 terminal, the result is
|
---|
521 | perfectly legible and nothing odd is printed.)
|
---|
522 | The simplest thing in this case is to make a temporary conversion using <span class="code">NS_ConvertUTF16toUTF8</span>.
|
---|
523 | The result is conveniently flat already, so getting the pointer is simple.
|
---|
524 | Remember not to hold onto the pointer you get out of this beyond the lifetime of the temporary.
|
---|
525 | </dd>
|
---|
526 | <dd>
|
---|
527 | <div class="source-code">
|
---|
528 | <pre>
|
---|
529 | /* How do I |printf| a string? */
|
---|
530 |
|
---|
531 |
|
---|
532 | void PrintSomeStrings( const nsAString& aString, const PRUnichar* aKey, const nsACString& aCString )
|
---|
533 | {
|
---|
534 | // |printf|ing a narrow string is easy
|
---|
535 | printf("%s\n", <span class="notice">PromiseFlatCString(</span>aCString<span class="notice">).get()</span>); // GOOD
|
---|
536 |
|
---|
537 | // the simplest way to get a |printf|-able |const char*| out of a string
|
---|
538 | printf("%s\n", <span class="notice">NS_ConvertUTF16toUTF8(</span>aKey<span class="notice">).get()</span>); // GOOD
|
---|
539 |
|
---|
540 | // works just as well with a formal wide string type...
|
---|
541 | printf("%s\n", <span class="notice">NS_ConvertUTF16toUTF8(</span>aString<span class="notice">).get()</span>);
|
---|
542 |
|
---|
543 |
|
---|
544 | // But don't hold onto the pointer longer than the lifetime of the temporary!
|
---|
545 | <span class="warning">const char* cstring = NS_ConvertUTF16toUTF8(aKey).get(); // BAD! |cstring| is dangling
|
---|
546 | printf("%s\n", cstring);</span>
|
---|
547 | }
|
---|
548 | </pre>
|
---|
549 | </div>
|
---|
550 | </dd>
|
---|
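<dd>
For readers without a Mozilla tree at hand, the conversion step can be approximated in standard C++.
The sketch below is an assumption-laden stand-in, not the implementation of <span class="code">NS_ConvertUTF16toUTF8</span>:
<span class="code">ToUTF8</span> and <span class="code">DebugPrint</span> are hypothetical names,
and replacing unpaired surrogates with U+FFFD is a choice made here for the example.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a UTF-16 to UTF-8 conversion helper.
std::string ToUTF8(const std::u16string& aSource)
{
  std::string result;
  for (std::size_t i = 0; i < aSource.size(); ++i)
  {
    char32_t c = aSource[i];

    // Combine a high surrogate with the low surrogate that follows it.
    if (c >= 0xD800 && c <= 0xDBFF && i + 1 < aSource.size())
    {
      char32_t low = aSource[i + 1];
      if (low >= 0xDC00 && low <= 0xDFFF)
      {
        c = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
        ++i;
      }
    }

    // Anything still in the surrogate range was unpaired; substitute U+FFFD.
    if (c >= 0xD800 && c <= 0xDFFF)
      c = 0xFFFD;

    if (c < 0x80)
    {
      result += static_cast<char>(c);
    }
    else if (c < 0x800)
    {
      result += static_cast<char>(0xC0 | (c >> 6));
      result += static_cast<char>(0x80 | (c & 0x3F));
    }
    else if (c < 0x10000)
    {
      result += static_cast<char>(0xE0 | (c >> 12));
      result += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
      result += static_cast<char>(0x80 | (c & 0x3F));
    }
    else
    {
      result += static_cast<char>(0xF0 | (c >> 18));
      result += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
      result += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
      result += static_cast<char>(0x80 | (c & 0x3F));
    }
  }
  return result;
}

// The temporary returned by |ToUTF8| lives until the end of the |printf| call.
void DebugPrint(const std::u16string& aString)
{
  std::printf("%s\n", ToUTF8(aString).c_str());
}
```

As with the Mozilla helper, the pointer from <span class="code">c_str()</span> is only good for as long as the converted temporary lives, which here is the single <span class="code">printf</span> expression.
</dd>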
551 |
|
---|
552 | </dl>
|
---|
553 |
|
---|
554 | <p>
|
---|
555 | Here are the email answers I have yet to format into the FAQ.
|
---|
556 | Some of the URLs may be outdated or may have moved.
|
---|
557 | The messages are in order from oldest to newest.
|
---|
558 | </p>
|
---|
559 | <p class="editnote">[Note: In June 2003, these emails were modified
|
---|
560 | to better reflect what is stored in 'wide' string
|
---|
561 | classes (UTF-16 strings instead of UCS-2) and what
|
---|
562 | related methods do as a part of the patch for <a href=
|
---|
563 | "http://bugzilla.mozilla.org/show_bug.cgi?id=183156"
|
---|
564 | title="replace UCS2 in function/class/method names with UTF16">bug 183156</a>.
|
---|
565 | Therefore, they're a little different from the original emails
|
---|
566 | written by <a href="http://ScottCollins.net/">Scott Collins</a>]
|
---|
567 | </p>
|
---|
568 | <hr>
|
---|
569 | <pre>
|
---|
570 | Date: Thu, 13 Apr 2000 19:41:47 -0400
|
---|
571 | </pre>
|
---|
572 |
|
---|
573 | <p>Encoding Wars
|
---|
574 |
|
---|
575 | <p>This message is all about strings and the various encodings that might
|
---|
576 | be used to interpret their contents, the ramifications of that, and
|
---|
577 | where we're heading. The point of this message is to say what we're
|
---|
578 | currently thinking, and get feedback. I apologize in advance for the
|
---|
579 | rambling, and for the fact that this message may accidentally mix
|
---|
580 | discussion of how things <strong>are</strong> and how they will be.
|
---|
581 |
|
---|
582 | <p>There are many different possible encodings. Three in common use in
|
---|
583 | the Mozilla source base are: ASCII, UTF-16, and UTF-8. In ASCII, every
|
---|
584 | <!--the Mozilla source base are: ASCII, UCS2, and UTF8. In ASCII, every-->
|
---|
585 | character fits in 7 bits and is typically stored in an 8-bit byte. We
|
---|
586 | usually represent ASCII strings with <span class="code">nsCString</span>s, <span class="code">nsXPIDLCString</span>s,
|
---|
587 | or <span class="code">char</span> string literals. In UTF-16, characters occupy one 16-bit code unit (
|
---|
588 | <a href="http://www.unicode.org/glossary/index.html#BMP_character">
|
---|
589 | <abbr title="Basic Multilingual Plane">BMP</abbr> characters</a>)
|
---|
590 | or two 16-bit code units
|
---|
591 | (<a href="http://www.unicode.org/glossary/index.html#supplementary_character">
|
---|
592 | <abbr title="Supplementary Planes: Planes 1 through 16">non-BMP</abbr> characters</a>).
|
---|
593 | We usually represent UTF-16 strings as <span class="code">nsString</span>s, etc., i.e., two-byte
|
---|
594 | or `wide' strings. UTF-8 is a multi-byte encoding. A character might
|
---|
595 | occupy one, two, three, or four bytes. It is easiest to store and
|
---|
596 | manipulate such a string within a single-byte or `narrow' string
|
---|
597 | implementation.
|
---|
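<p>The storage costs just described can be written down as two small functions.
This is an illustrative sketch in standard C++ with hypothetical names;
it takes a Unicode scalar value as a <span class="code">char32_t</span> and reports how many code units each encoding needs for it.</p>

```cpp
#include <cstddef>

// Number of 16-bit code units UTF-16 needs for one Unicode scalar value.
std::size_t UTF16UnitsFor(char32_t aChar)
{
  // BMP characters take one unit; everything above takes a surrogate pair.
  return aChar < 0x10000 ? 1 : 2;
}

// Number of bytes UTF-8 needs for one Unicode scalar value.
std::size_t UTF8BytesFor(char32_t aChar)
{
  if (aChar < 0x80)    return 1;  // the ASCII range
  if (aChar < 0x800)   return 2;
  if (aChar < 0x10000) return 3;  // the rest of the BMP
  return 4;                       // supplementary planes
}
```

<p>One consequence is that the length of a wide or narrow string counts code units, which is not necessarily the number of characters.</p>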
598 |
|
---|
599 | <p>None of our current string implementations knows the encoding of the
|
---|
600 | data they hold at any given moment. An <span class="code">nsCString</span> might legitimately
|
---|
601 | hold data encoded in ASCII, UTF-8 or even EBCDIC for that matter.
|
---|
602 |
|
---|
603 | <p>Operations that convert from one encoding to another, or operations
|
---|
604 | that are encoding sensitive (e.g., <span class="code">to_upper</span>), rightly belong in
|
---|
605 | i18n. The fact that our current string interfaces automatically and
|
---|
606 | implicitly convert between wide and narrow strings is actually the
|
---|
607 | source of many errors in two particular categories: (1) unintended
|
---|
608 | extra work, (2) mistaken re-encoding, e.g., accidentally `converting'
|
---|
609 | a UTF-8 string to UTF-16 by pretending the UTF-8 string is ASCII and then
|
---|
610 | padding with <span class="code">'\0'</span>s.
|
---|
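<p>The second category of error is easy to reproduce outside Mozilla.
The sketch below uses standard C++ and a hypothetical function, <span class="code">PadBytes</span>,
that does what an implicit narrow-to-wide conversion effectively does: it widens each byte by padding it with zero bits.
That is harmless for ASCII and wrong for UTF-8.</p>

```cpp
#include <cstddef>
#include <string>

// Widen each byte to a 16-bit unit, as if the input were ASCII.
std::u16string PadBytes(const std::string& aBytes)
{
  std::u16string result;
  for (std::size_t i = 0; i < aBytes.size(); ++i)
  {
    // Go through |unsigned char| so bytes above 0x7F are not sign-extended.
    result += static_cast<char16_t>(static_cast<unsigned char>(aBytes[i]));
  }
  return result;
}
```

<p>For example, U+00E9 is the two bytes C3 A9 in UTF-8. Padding them produces the two units 00C3 00A9, which is two unrelated characters rather than the single unit 00E9.</p>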
611 |
|
---|
612 | <p>We've known these were bad for a long time, and have been trying to
|
---|
613 | find the right way to fix them. The current thinking is to just bite
|
---|
614 | the bullet and eliminate implicit conversions. That has interesting
|
---|
615 | ramifications.
|
---|
616 |
|
---|
617 | <div class="source-code">
|
---|
618 | <pre>
|
---|
619 | void foo( const nsString& aUTF16string );
|
---|
620 |
|
---|
621 | foo("hello"); // works! constructs a temporary |nsString| by
|
---|
622 | // converting the ASCII literal with padding.
|
---|
623 | // Note: this requires an allocation
|
---|
624 | </pre>
|
---|
625 | </div>
|
---|
626 |
|
---|
627 | <p>We've always hated this form, though, since it requires a heap
|
---|
628 | allocation. In current code, we recommend
|
---|
629 |
|
---|
630 | <div class="source-code">
|
---|
631 | <pre>
|
---|
632 | foo( nsAutoString("hello") );
|
---|
633 | </pre>
|
---|
634 | </div>
|
---|
635 |
|
---|
636 | <p>which still copies and converts, but at least it probably doesn't need to do
|
---|
637 | a heap allocation. In the best of all worlds, no conversion, copying,
|
---|
638 | or allocation would be necessary. To do that, you would need to be
|
---|
639 | able to directly specify a UTF-16 string, e.g., with the <span class="code">L"hello"</span>
|
---|
640 | notation, and wrap that in an interface that just held a pointer.
|
---|
641 | E.g., something like
|
---|
642 |
|
---|
643 | <div class="source-code">
|
---|
644 | <pre>
|
---|
645 | void foo( const nsAReadableString& aUTF16string );
|
---|
646 |
|
---|
647 | foo( nsLiteralString(L"hello") );
|
---|
648 | </pre>
|
---|
649 | </div>
|
---|
650 |
|
---|
651 | <p>There are problems with this example, however. The <span class="code">L</span> notation
|
---|
652 | specifically makes objects that are arrays of <span class="code">wchar_t</span>, which under
|
---|
653 | GCC is a 4-byte element. This leads to incompatibility with JS, and
|
---|
654 | the annoyance of possibly bloated storage (I'm sort of minimizing the
|
---|
655 | situation here. It's worse than I make it sound). More about tricks
|
---|
656 | to get around this in a bit, but first, let me talk about what to do
|
---|
657 | in the meantime while we're just getting rid of implicit constructors.
|
---|
658 | Initially to get around this problem (what problem? The problem that
|
---|
659 | <span class="code">foo("hello")</span> stopped compiling on my machine when I threw the
|
---|
660 | switch) I made a routine called <span class="code">NS_ConvertToString</span> which looked like
|
---|
661 | this
|
---|
662 |
|
---|
663 | <div class="source-code">
|
---|
664 | <pre>
|
---|
665 | inline
|
---|
666 | nsAutoString
|
---|
667 | NS_ConvertToString( const char* anASCIIstring )
|
---|
668 | {
|
---|
669 | nsAutoString aUCS2string;
|
---|
670 | aUCS2string.AssignWithConversion(anASCIIstring);
|
---|
671 | return aUCS2string;
|
---|
672 | }
|
---|
673 | </pre>
|
---|
674 | </div>
|
---|
675 |
|
---|
676 | <p>Which lets me write
|
---|
677 |
|
---|
678 | <div class="source-code">
|
---|
679 | <pre>
|
---|
680 | foo( NS_ConvertToString("hello") );
|
---|
681 | </pre>
|
---|
682 | </div>
|
---|
683 |
|
---|
684 | <p>This was <strong>OK</strong>, but in discussion there were concerns about performance
|
---|
685 | on machines that didn't <span class="code">inline</span> well, and issues about naming. In
|
---|
686 | that meeting we came up with an alternate naming strategy that we
|
---|
687 | think has room for growth and an implementation more likely to be
|
---|
688 | efficient on every platform. The implementation is to define a new
|
---|
689 | class that derives from <span class="code">nsAutoString</span>, but allows construction from a
|
---|
690 | <span class="code">char*</span>
|
---|
691 |
|
---|
692 | <div class="source-code">
|
---|
693 | <pre>
|
---|
694 | class NS_ConvertASCIItoUTF16 : public nsAutoString
|
---|
695 | {
|
---|
696 | public:
|
---|
697 | NS_ConvertASCIItoUTF16( const char* );
|
---|
698 | // ...
|
---|
699 | };
|
---|
700 | </pre>
|
---|
701 | </div>
|
---|
702 |
|
---|
703 | <p>Which gives identical (though renamed) notation for calling <span class="code">foo</span>:
|
---|
704 |
|
---|
705 | <div class="source-code">
|
---|
706 | <pre>
|
---|
707 | foo( NS_ConvertASCIItoUTF16("hello") );
|
---|
708 | </pre>
|
---|
709 | </div>
|
---|
710 |
|
---|
711 | <p>It looks like a function call to an explicit encoding conversion. It
|
---|
712 | acts like a function call to an explicit encoding conversion. It <strong>is</strong>
|
---|
713 | a function call to an explicit encoding conversion. We think that
|
---|
714 | this naming pattern has room for growth. In the meeting, we concluded
|
---|
715 | that the best representation for encoding conversions is a family of
|
---|
716 | functions, and <span class="code">NS_ConvertASCIItoUTF16</span> fits right in. We think that
|
---|
717 | XPCOM probably can't live without the ASCII to UTF-16 conversion (though
|
---|
718 | as explicit as possible) but that all others rightly belong in i18n
|
---|
719 | land.
|
---|
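<p>The idea of a conversion that is spelled out at the call site can be sketched outside of Mozilla with the standard library. Everything below is an assumption made for illustration: <span class="code">DemoConvertASCIItoUTF16</span> and <span class="code">DemoLength</span> are hypothetical names, and <span class="code">std::u16string</span> merely stands in for the real string classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for NS_ConvertASCIItoUTF16: a UTF-16 string that can
// be built from a narrow string only through an explicit constructor.
class DemoConvertASCIItoUTF16 : public std::u16string
{
  public:
    explicit DemoConvertASCIItoUTF16( const char* anASCIIstring )
    {
      for ( const char* p = anASCIIstring; p && *p; ++p )
        {
          // ASCII code points (0..127) map one-to-one onto UTF-16 code units.
          assert( static_cast<unsigned char>(*p) < 0x80 );
          push_back( static_cast<char16_t>(static_cast<unsigned char>(*p)) );
        }
    }
};

// A callee that accepts only UTF-16 input.
inline std::size_t
DemoLength( const std::u16string& aUTF16string )
{
  return aUTF16string.length();
}
```

<p>With these definitions <span class="code">DemoLength( DemoConvertASCIItoUTF16("hello") )</span> compiles, while <span class="code">DemoLength("hello")</span> does not, because nothing converts a <span class="code">const char*</span> to a <span class="code">std::u16string</span> implicitly.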
720 |
|
---|
721 | <p>You can probably deduce from the clues in <span class="code">NS_ConvertToString</span>, above,
|
---|
722 | that constructors weren't the only thing that became explicit.
|
---|
723 | Assignment, appending, comparison, et al, got renamed so that when
|
---|
724 | assigning, appending, or comparing to a value in a different encoding
|
---|
725 | the `WithConversion' form must be used. E.g.,
|
---|
726 |
|
---|
727 | <div class="source-code">
|
---|
728 | <pre>
|
---|
729 | nsString aUTF16string;
|
---|
730 | nsCString anASCIIstring;
|
---|
731 | // ...
|
---|
732 |
|
---|
733 | aUTF16string += anASCIIstring; // Currently legal, but not for long
|
---|
734 | aUTF16string.Append(anASCIIstring); // same
|
---|
735 |
|
---|
736 | aUTF16string.AppendWithConversion(anASCIIstring); // the new way
|
---|
737 |
|
---|
738 | if ( aUTF16string == anASCIIstring ) // Sorry, this is going away too
|
---|
739 | // ...
|
---|
740 |
|
---|
741 | if ( aUTF16string.EqualsWithConversion(anASCIIstring) )
|
---|
742 | // ...
|
---|
743 | </pre>
|
---|
744 | </div>
|
---|
745 |
|
---|
746 | <p>Yes, it's long and annoying. Just like the extra work you were
|
---|
747 | implicitly asking to have done, perhaps incorrectly. There are other
|
---|
748 | reasons to rename these functions. When <span class="code">nsString</span> and <span class="code">nsCString</span>
|
---|
749 | defined a ton of, e.g., <span class="code">Append</span>s each there was no problem, because
|
---|
750 | nobody wanted to override <span class="code">Append</span>. Now, with strings inheriting from
|
---|
751 | abstract base classes we immediately run into the problem that
|
---|
752 | overriding and overloading don't mix very well in C++. Because of a
|
---|
753 | feature of C++ called name hiding, it is problematic to override only
|
---|
754 | a single signature of a name overloaded in a base class. The base
|
---|
755 | <span class="code">nsAWritableString</span> provides several <span class="code">Append</span>s, all for objects of
|
---|
756 | (hopefully) the same encoding. <span class="code">nsString</span> can't easily add a bunch of
|
---|
757 | new <span class="code">Append</span>s (the converting ones) without running face first into
|
---|
758 | the name hiding problem. The discussion of the fix for this is mostly
|
---|
759 | unrelated to encoding issues, so I'll defer it to another post.
|
---|
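<p>Name hiding is easy to reproduce in a few lines. The sketch below uses hypothetical types (<span class="code">DemoWritable</span>, <span class="code">DemoHiding</span>, <span class="code">DemoFixed</span>) rather than the real string hierarchy, and it shows one common remedy, a using-declaration, which is not necessarily the fix chosen for the string classes.

```cpp
#include <cassert>
#include <string>

// Hypothetical base class with two overloads of Append.
struct DemoWritable
{
  std::string mData;
  void Append( const std::string& aString ) { mData += aString; }
  void Append( char aChar )                 { mData += aChar; }
};

// Declaring any Append here hides every inherited Append ...
struct DemoHiding : DemoWritable
{
  void Append( int aNumber ) { mData += std::to_string(aNumber); }
};

// ... unless the base-class names are re-exported with a using-declaration.
struct DemoFixed : DemoWritable
{
  using DemoWritable::Append;
  void Append( int aNumber ) { mData += std::to_string(aNumber); }
};
```

<p>On a <span class="code">DemoHiding</span> object, <span class="code">Append('x')</span> silently promotes the character to <span class="code">int</span> and appends its numeric value, and <span class="code">Append(std::string("x"))</span> does not compile at all. On a <span class="code">DemoFixed</span> object all three overloads take part in overload resolution.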
760 |
|
---|
761 | <p>In hindsight, after the meeting, it seemed clear that all the
|
---|
762 | `WithConversion' forms would be better named
|
---|
763 |
|
---|
764 | <div class="source-code">
|
---|
765 | <pre>
|
---|
766 | xxxConvertingASCIItoUTF16
|
---|
767 | xxxConvertingUTF16toASCII
|
---|
768 | </pre>
|
---|
769 | </div>
|
---|
770 |
|
---|
771 | <p>however, the <strong>real</strong> goal (probably) is to move most such conversions
|
---|
772 | into i18n. Just bringing attention to the previously implicit
|
---|
773 | conversions is a good first step. Renaming these conversions as just
|
---|
774 | suggested is probably the right thing to do, though it sort of
|
---|
775 | validates them, which I'm not sure we really want. This is a decision
|
---|
776 | we need to discuss further.
|
---|
777 |
|
---|
778 | <p>Now, back to the string literal problem above. One possible solution
|
---|
779 | is to use a macro. Imagine
|
---|
780 |
|
---|
781 | <div class="source-code">
|
---|
782 | <pre>
|
---|
783 | NS_LITERAL_STRING("Hello")
|
---|
784 | </pre>
|
---|
785 | </div>
|
---|
786 |
|
---|
787 | <p>which on a machine where the <span class="code">L</span> trick works, turns into
|
---|
788 |
|
---|
789 | <div class="source-code">
|
---|
790 | <pre>
|
---|
791 | nsLiteralString(L"Hello")
|
---|
792 | </pre>
|
---|
793 | </div>
|
---|
794 |
|
---|
795 | <p>but on a machine where there is trouble, turns into something less
|
---|
796 | appealing, but more likely to work, like
|
---|
797 |
|
---|
798 | <div class="source-code">
|
---|
799 | <pre>
|
---|
800 | NS_ConvertASCIItoUTF16("Hello")
|
---|
801 | </pre>
|
---|
802 | </div>
|
---|
803 |
|
---|
804 | <p>Another solution is to add a compilation step that fixes <span class="code">L</span> strings
|
---|
805 | on bad platforms to be non-<span class="code">L</span> strings, but padded with <span class="code">\0</span>s. E.g.,
|
---|
806 | <span class="code">L"Hello"</span> gets preprocessed into <span class="code">"\000H\000e\000l\000l\000o\000"</span>.
|
---|
807 | This solution is more annoying to the developer, whereas the prior
|
---|
808 | solution is more annoying at runtime.
|
---|
809 |
|
---|
810 | <p>Before we go to too much trouble on this specific feature, we will
|
---|
811 | probably want to do more measurement to see just how much and how
|
---|
812 | often we are converting constant literal strings, and why.
|
---|
813 |
|
---|
814 |
|
---|
815 | <p>I'm currently ripping through the tree fixing things to use the
|
---|
816 | `WithConversion' forms where appropriate. I was also converting
|
---|
817 | things to use <span class="code">NS_ConvertToString</span> where appropriate; unless I get
|
---|
818 | talked out of it, I want to switch midstream to
|
---|
819 | <span class="code">NS_ConvertASCIItoUTF16</span>, then go back and fix up the
|
---|
820 | <span class="code">NS_ConvertToString</span> instances later. I've set things up so I can
|
---|
821 | check in as I go. After all these conversions have been done, I'll be
|
---|
822 | able to throw the switch (what switch? NEW_STRING_APIS) which will
|
---|
823 | make <span class="code">nsString</span> inherit from <span class="code">nsAWritableString</span>, etc. and allow us to
|
---|
824 | start exploiting these other opportunities (e.g., for literal strings,
|
---|
825 | shared strings, etc. See
|
---|
826 | <a class="exact-uri" href="http://bugzilla.mozilla.org/show_bug.cgi?id=28221">http://bugzilla.mozilla.org/show_bug.cgi?id=28221</a> for details and
|
---|
827 | reasoning.)
|
---|
828 |
|
---|
829 | <p>I guess I'm expecting comments on:
|
---|
830 |
|
---|
831 | <ul>
|
---|
832 | <li>how really annoying this whole topic is
|
---|
833 | <li>how bad <span class="code">L"xxx"</span> is
|
---|
834 | <li>whether to move forward with <span class="code">NS_ConvertASCIItoUTF16</span>
|
---|
835 | <li>whether we should move to xxxConvertingASCIItoUTF16 etc instead
|
---|
836 | of `WithConversion'
|
---|
837 | <li>arguments about where encoding conversions should live
|
---|
838 | <li>arguments about whether going between 1 and 2 byte storage is an
|
---|
839 | encoding conversion
|
---|
840 | <li>questions about stuff I didn't mention or didn't explain well
|
---|
841 | <li>pointing out stuff I'm just plain wrong about, or things I forgot
|
---|
842 | <li>etc
|
---|
843 | </ul>
|
---|
844 |
|
---|
845 | <p>So as not to jumble the discussion, I'll be separately posting other
|
---|
846 | requests for comments about specific features of the design of the new
|
---|
847 | string hierarchy.
|
---|
848 |
|
---|
849 | <p>I hope this helps keep everybody filled in on what we're thinking and
|
---|
850 | able to point out what we're forgetting or screwing up :-)
|
---|
851 |
|
---|
852 |
|
---|
853 |
|
---|
854 |
|
---|
855 |
|
---|
856 | <hr>
|
---|
857 | <pre>
|
---|
858 | Date: Wed, 19 Apr 2000 21:12:47 -0400
|
---|
859 | Subject: more string info
|
---|
860 | </pre>
|
---|
861 |
|
---|
862 | <p> <a class="exact-uri" href="news://news.mozilla.org/[email protected]">news://news.mozilla.org/[email protected]</a>
|
---|
863 |
|
---|
864 |
|
---|
865 |
|
---|
866 |
|
---|
867 |
|
---|
868 | <hr>
|
---|
869 | <pre>
|
---|
870 | Date: Fri, 26 May 2000 15:31:37 -0400
|
---|
871 | Subject: Re: Question on ==
|
---|
872 | </pre>
|
---|
873 |
|
---|
874 | <p>I would prefer you compare with <span class="code">Equals</span> (which should really be named
|
---|
875 | <span class="code">IsEqualTo</span>) rather than <span class="code">operator==()</span> because of this:
|
---|
876 |
|
---|
877 | <div class="source-code">
|
---|
878 | <pre>
|
---|
879 | char* a;
|
---|
880 | char* b;
|
---|
881 |
|
---|
882 | // ...
|
---|
883 |
|
---|
884 | if ( a == b )
|
---|
885 | // ...
|
---|
886 | </pre>
|
---|
887 | </div>
|
---|
888 |
|
---|
889 | <p>Comparing two raw `string' pointers doesn't compare the characters
|
---|
890 | they point to, but instead compares the bits of the pointers. For
|
---|
891 | this reason, I may eventually make comparison of a string with a
|
---|
892 | pointer using operators just go away.
|
---|
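<p>A minimal sketch of the difference, using plain C strings; the helper names <span class="code">DemoSameCharacters</span> and <span class="code">DemoSamePointer</span> are made up for this illustration.

```cpp
#include <cassert>
#include <cstring>

// True when the two zero-terminated sequences hold the same characters.
inline bool
DemoSameCharacters( const char* a, const char* b )
{
  return std::strcmp(a, b) == 0;
}

// True only when both pointers refer to the very same storage.
inline bool
DemoSamePointer( const char* a, const char* b )
{
  return a == b;
}
```

<p>Two separate buffers that both contain <span class="code">"hello"</span> satisfy the first test but not the second, which is what <span class="code">operator==()</span> on raw pointers actually checks.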
893 |
|
---|
894 |
|
---|
895 |
|
---|
896 |
|
---|
897 |
|
---|
898 | <hr>
|
---|
899 | <pre>
|
---|
900 | Date: Wed, 14 Jun 2000 14:38:55 -0400
|
---|
901 | Subject: Re: Fix to XprtDefs.h
|
---|
902 | </pre>
|
---|
903 |
|
---|
904 | <p>Yes, we're aware that turning off <span class="code">wchar_t</span> support makes <span class="code">wchar_t</span> be
|
---|
905 | a synonym for <span class="code">unsigned short</span> under Metrowerks. We know that the
|
---|
906 | current version of VC++ also makes these types equivalent. In theory,
|
---|
907 | though, the types are distinct even when they are the same size and
|
---|
908 | shape. By using real <span class="code">wchar_t</span> support, we are forced to recognize
|
---|
909 | the distinction and navigate it appropriately with <span class="code">reinterpret_cast</span>
|
---|
910 | (via <span class="code">NS_REINTERPRET_CAST</span>). The win here is that we aren't caught by
|
---|
911 | compiler changes that suddenly make some set of compilers compliant
|
---|
912 | and therefore break our code. We will add an autoconf test that lets
|
---|
913 | UNIX compilers opt in to our string scheme when they have an
|
---|
914 | appropriately shaped <span class="code">wchar_t</span>. If these happen to be compliant
|
---|
915 | compilers, all will be well. If they aren't, the casts don't hurt,
|
---|
916 | because they are type correct. We are writing our code to meet the
|
---|
917 | standard as we move forward.
|
---|
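<p>The distinction can be checked on any conforming compiler. In the sketch below the overload set <span class="code">DemoKind</span> is hypothetical; the point is only that <span class="code">wchar_t</span> and <span class="code">unsigned short</span> are different types regardless of their size.

```cpp
#include <cassert>
#include <type_traits>

// In standard C++ wchar_t is a distinct built-in type, so these overloads
// never collide, whatever the size of wchar_t happens to be.
inline int DemoKind( const wchar_t* )        { return 1; }
inline int DemoKind( const unsigned short* ) { return 2; }

static_assert( !std::is_same<wchar_t, unsigned short>::value,
               "wchar_t is its own type on a conforming compiler" );
```

<p>A compiler that treats <span class="code">wchar_t</span> as a mere synonym for <span class="code">unsigned short</span> would reject this as a redefinition, which is the kind of breakage that writing to the standard avoids.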
918 |
|
---|
919 | <p>The win for us is realized by the following macros
|
---|
920 |
|
---|
921 | <div class="source-code">
|
---|
922 | <pre>
|
---|
923 | #ifdef HAVE_CPP_2BYTE_WCHAR_T
|
---|
924 | #define NS_LITERAL_STRING(s) nsLiteralString(L##s, \
|
---|
925 | (sizeof(L##s)/sizeof(wchar_t))-1)
|
---|
926 | #else
|
---|
927 | #define NS_LITERAL_STRING(s) NS_ConvertASCIItoUTF16(s, \
|
---|
928 | sizeof(s)-1)
|
---|
929 | #endif
|
---|
930 | </pre>
|
---|
931 | </div>
|
---|
932 |
|
---|
933 | <p>An <span class="code">nsLiteralString</span> points directly to the literal characters. No
|
---|
934 | copying, no conversion, and the length calculation happens at compile
|
---|
935 | time. This has turned out to be as large a savings as 15% of code
|
---|
936 | space and 8% of data space, net, in our string test harness. It's
|
---|
937 | faster as well, again by eliminating the copying, conversion, and
|
---|
938 | length calculation. We don't know yet what those numbers translate
|
---|
939 | into in our real code base, but we have high hopes.
|
---|
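<p>To make the "no copying, no conversion, compile-time length" claim concrete, here is a rough sketch of a pointer-plus-length wrapper. It is not the real <span class="code">nsLiteralString</span>: the class <span class="code">DemoLiteralString</span> and the macro <span class="code">DEMO_LITERAL_STRING</span> are hypothetical, and the sketch assumes a C++11 compiler so that it can use <span class="code">u"..."</span> literals in place of the <span class="code">L"..."</span> form.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for nsLiteralString: it owns nothing and just
// remembers where the literal lives and how long it is.
class DemoLiteralString
{
  public:
    constexpr DemoLiteralString( const char16_t* aData, std::size_t aLength )
      : mData(aData), mLength(aLength) {}

    constexpr const char16_t* Data() const   { return mData; }
    constexpr std::size_t     Length() const { return mLength; }

  private:
    const char16_t* mData;
    std::size_t     mLength;
};

// u##s pastes the C++11 u prefix onto the literal. sizeof counts the
// zero terminator as well, hence the -1.
#define DEMO_LITERAL_STRING(s) \
  DemoLiteralString(u##s, (sizeof(u##s)/sizeof(char16_t))-1)

// The length is a constant expression; nothing is measured at run time.
static_assert( DEMO_LITERAL_STRING("Hello").Length() == 5,
               "length is known at compile time" );
```

<p>The wrapper is two words of storage, and constructing it never touches the characters themselves.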
940 |
|
---|
941 | <p>I don't want to be in the position to ask you to change your code. I
|
---|
942 | don't think it's appropriate for me to do so. The AIM application
|
---|
943 | that is your client is our client as well. They need to resolve this
|
---|
944 | difference between us in whatever way they think best. That may mean
|
---|
945 | asking you if changing your apis is the right thing to do. Or it may
|
---|
946 | mean applying the casts. Our code-base and yours, Justin, are more
|
---|
947 | like cousins. I don't think you should have to change just to conform
|
---|
948 | to us. You may think my arguments for using real <span class="code">wchar_t</span> have
|
---|
949 | merit, and adopt similar usage just because you agree; but I think the
|
---|
950 | only obligation you have is to follow the technical solution you think
|
---|
951 | is right for your code.
|
---|
952 |
|
---|
953 | <p>If you decide to make this api change, it will mean shipping a new
|
---|
954 | binary (on Mac) for your library to clients who want to switch over to
|
---|
955 | the new api (since the name mangling will be different, and therefore,
|
---|
956 | the link requirements will change).
|
---|
957 |
|
---|
958 | <p>Hope this helps,
|
---|
959 |
|
---|
960 |
|
---|
961 |
|
---|
962 |
|
---|
963 |
|
---|
964 | <hr>
|
---|
965 | <pre>
|
---|
966 | Date: Thu, 15 Jun 2000 19:36:55 -0400
|
---|
967 | Subject: Re: Checkin approval for bug 32336
|
---|
968 | </pre>
|
---|
969 |
|
---|
970 | <div class="source-code">
|
---|
971 | <pre>
|
---|
972 | S.Equals(NS_LITERAL_STRING("bar"), PR_TRUE, 3)
|
---|
973 | </pre>
|
---|
974 | </div>
|
---|
975 |
|
---|
976 | <p>doesn't compile because there is no three parameter form for <span class="code">Equals</span>.
|
---|
977 | For all definitions of <span class="code">Equals</span> on strings, see "nsAReadableString.h"
|
---|
978 |
|
---|
979 | <p><a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h</a>
|
---|
980 |
|
---|
981 | <p>There is an <span class="code">EqualsWithConversion</span> that takes three parameters.
|
---|
982 |
|
---|
983 | <p> <a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsString2.h#731">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsString2.h#731</a>
|
---|
984 |
|
---|
985 | <p>It is ``EqualsWithConversion'' because it admits the possibility of an
|
---|
986 | encoding specific transformation, in this case to provide
|
---|
987 | case-insensitive comparison. This also wouldn't compile, however,
|
---|
988 | since, at the moment, an <span class="code">nsLiteralString</span> doesn't provide an operator
|
---|
989 | to produce a <span class="code">const PRUnichar*</span> (though perhaps it should), and it
|
---|
990 | doesn't satisfy the other interfaces that match this call, e.g., a
|
---|
991 | <span class="code">const nsString&</span>.
|
---|
992 |
|
---|
993 | <p>Perhaps I need to move case-insensitive comparison up out of
|
---|
994 | <span class="code">nsString</span> into a global encoding specific transformations and
|
---|
995 | algorithms file (which was on its way anyway as Waterson knows); this
|
---|
996 | use is one bit of evidence to support this. In the short term, this
|
---|
997 | can be fixed (if we think the current behavior is wrong) by providing
|
---|
998 | <span class="code">operator const CharT*() const</span> on literal string.
|
---|
999 |
|
---|
1000 | <p>If you can live without case-folding, the earlier form is preferred
|
---|
1001 |
|
---|
1002 | <div class="source-code">
|
---|
1003 | <pre>
|
---|
1004 | S == NS_LITERAL_STRING("bar")
|
---|
1005 | </pre>
|
---|
1006 | </div>
|
---|
1007 |
|
---|
1008 | <p>if you can't, then one of the fixes I mentioned is in order.
|
---|
1009 |
|
---|
1010 |
|
---|
1011 |
|
---|
1012 |
|
---|
1013 |
|
---|
1014 | <hr>
|
---|
1015 | <pre>
|
---|
1016 | Date: Thu, 15 Jun 2000 19:47:12 -0400
|
---|
1017 | Subject: Re: [Fwd: how to use nsString ?]
|
---|
1018 | </pre>
|
---|
1019 |
|
---|
1020 | <pre class="email-quote">
|
---|
1021 | >I see these same examples time and again in the embedding
|
---|
1022 | >samples/docs, but I can't compile them.
|
---|
1023 | </pre>
|
---|
1024 |
|
---|
1025 | <p>Apologies. Documentation mentioning strings is getting out of date.
|
---|
1026 | Here are some specific answers.
|
---|
1027 |
|
---|
1028 |
|
---|
1029 | <pre class="email-quote">
|
---|
1030 | >nsString URLString("http://www.mozilla.org");
|
---|
1031 | </pre>
|
---|
1032 |
|
---|
1033 | <p>...is now perhaps best expressed as
|
---|
1034 |
|
---|
1035 | <div class="source-code"><pre>  nsString URLString( NS_LITERAL_STRING("http://www.mozilla.org") );</pre></div>
|
---|
1036 |
|
---|
1037 | <p>since an <span class="code">nsString</span> is a sequence of 2-byte wide characters, and the
|
---|
1038 | routines that implicitly convert 1-byte sequences (like the literal
|
---|
1039 | sequence you specified, "http:...") are now gone.
|
---|
1040 |
|
---|
1041 | <p>Up until not too long ago, one would have had to say
|
---|
1042 |
|
---|
1043 | <div class="source-code">
|
---|
1044 | <pre>
|
---|
1045 | nsString URLString;
|
---|
1046 | URLString.AssignWithConversion("http://www.mozilla.org");
|
---|
1047 | </pre>
|
---|
1048 | </div>
|
---|
1049 |
|
---|
1050 | <p>The <span class="code">NS_LITERAL_STRING</span> construction is new machinery that has the
|
---|
1051 | potential to make many operations much more efficient.
|
---|
1052 |
|
---|
1053 | <pre class="email-quote">
|
---|
1054 | >nsString URLString;
|
---|
1055 | >URLString.SetString("www.mozilla.org");
|
---|
1056 | </pre>
|
---|
1057 |
|
---|
1058 | <p><span class="code">SetString</span> was a synonym for <span class="code">Assign</span> or assignment with
|
---|
1059 | <span class="code">operator=()</span>; it too went away. The equivalent is the second
|
---|
1060 | example I gave above, that is, the one with <span class="code">AssignWithConversion</span>.
|
---|
1061 |
|
---|
1062 | <p><span class="code">Assign</span> still exists. <span class="code">AssignWithConversion</span> takes on that
|
---|
1063 | functionality for assignments that require encoding transformations
|
---|
1064 | (e.g., from ASCII to UTF-16). <span class="code">SetString</span> is gone, since it was always
|
---|
1065 | a synonym for <span class="code">Assign</span>.
|
---|
1066 |
|
---|
1067 | <p>Learn more about the general APIs for strings that we are trying to
|
---|
1068 | move to by examining
|
---|
1069 |
|
---|
1070 | <a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h</a>
|
---|
1071 | <a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAWritableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAWritableString.h</a>
|
---|
1072 |
|
---|
1073 | <p>Hope this helps,
|
---|
1074 |
|
---|
1075 |
|
---|
1076 |
|
---|
1077 |
|
---|
1078 |
|
---|
1079 | <hr>
|
---|
1080 | <pre>
|
---|
1081 | Date: Thu, 15 Jun 2000 21:26:51 -0400
|
---|
1082 | Subject: Re: Checkin approval for bug 32336
|
---|
1083 | </pre>
|
---|
1084 |
|
---|
1085 | <pre class="email-quote">
|
---|
1086 | >I *need* the count attribute, because I need to compare only the first
|
---|
1087 | >chars (that's inherent to the logic).
|
---|
1088 | </pre>
|
---|
1089 |
|
---|
1090 | <p>This is what substrings are for. In that case, you could use
|
---|
1091 |
|
---|
1092 | <div class="source-code">
|
---|
1093 | <pre>
|
---|
1094 | Substring(S, 0, 3) == NS_LITERAL_STRING("bar")
|
---|
1095 | </pre>
|
---|
1096 | </div>
|
---|
1097 |
|
---|
1098 | <p>As for case-folding, it's best if you can case-fold everything up
|
---|
1099 | front, instead of doing it repeatedly. I'll have to get back to you
|
---|
1100 | on a general solution to that problem, or what my schedule for getting
|
---|
1101 | it checked in would be. I'm sorry, I know that's not what you needed
|
---|
1102 | to hear. If the source string is an <span class="code">nsString</span>, you can continue to
|
---|
1103 | exploit its implementation of these routines, e.g., <span class="code">ToLower</span> all
|
---|
1104 | up-front.
|
---|
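<p>Both suggestions, comparing a substring and folding case once up front, can be sketched with <span class="code">std::string</span> as an analogue. The helpers <span class="code">DemoToLower</span> and <span class="code">DemoStartsWith</span> are hypothetical and handle ASCII only; they are not the Mozilla <span class="code">Substring</span> or <span class="code">ToLower</span> routines.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Fold ASCII letters to lower case once, up front.
inline std::string
DemoToLower( std::string aString )
{
  std::transform( aString.begin(), aString.end(), aString.begin(),
                  []( unsigned char c )
                    { return static_cast<char>(std::tolower(c)); } );
  return aString;
}

// Compare only the first aPrefix.length() characters of aString.
inline bool
DemoStartsWith( const std::string& aString, const std::string& aPrefix )
{
  return aString.compare(0, aPrefix.length(), aPrefix) == 0;
}
```

<p>Folding the source once and then running any number of prefix comparisons against it avoids repeating the case conversion inside every comparison.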
1105 |
|
---|
1106 | <p>Hope this helps,
|
---|
1107 |
|
---|
1108 |
|
---|
1109 |
|
---|
1110 |
|
---|
1111 |
|
---|
1112 | <hr>
|
---|
1113 | <pre>
|
---|
1114 | Date: Mon, 19 Jun 2000 14:23:47 -0400
|
---|
1115 | Subject: Re: string fu
|
---|
1116 | </pre>
|
---|
1117 |
|
---|
1118 | <pre class="email-quote">
|
---|
1119 | >It seems less convenient to have to first check path.IsEmpty, and
|
---|
1120 | >then if false get path.Last and test it.
|
---|
1121 | </pre>
|
---|
1122 |
|
---|
1123 | <p>What would you prefer? That extracting a character not in the string
|
---|
1124 | always return <span class="code">CharT(0)</span>? Can't do it for two reasons: (1) <span class="code">0</span> may be
|
---|
1125 | a valid character in a particular encoding, so it can't be used in
|
---|
1126 | general as a ``no character at that position'' marker; and (2) I can't
|
---|
1127 | control what an individual string implementation does when asked to
|
---|
1128 | get an out-of-bounds fragment, it's explicitly undefined. That means
|
---|
1129 | the result of <span class="code">CharAt</span> is explicitly undefined for indexes outside the
|
---|
1130 | defined contents of the string. As a debugging convenience, I have
|
---|
1131 | made this assert, but it has always been the case that retrieving such
|
---|
1132 | a character had undefined results ... even in [the old] code.
|
---|
1133 |
|
---|
1134 | <p>OK, you might say, well at least let me ask for a character that is
|
---|
1135 | only off the end by one. E.g., <span class="code">Last</span> of an empty string. Reason (1)
|
---|
1136 | from above still applies. How bad is it to say, for the case you gave
|
---|
1137 |
|
---|
1138 | <div class="source-code">
|
---|
1139 | <pre>
|
---|
1140 | PRBool needsDelim = PR_FALSE;
|
---|
1141 | if ( !path.IsEmpty() )
|
---|
1142 | {
|
---|
1143 | PRUnichar last = path.Last();
|
---|
1144 | needsDelim = !(last == '/' || last == '\\');
|
---|
1145 | }
|
---|
1146 | </pre>
|
---|
1147 | </div>
|
---|
1148 |
|
---|
1149 | <p>In general, you probably want to opt out of a whole lot of work when
|
---|
1150 | the source string is empty. It is slightly less convenient, but it
|
---|
1151 | doesn't tie us to a bunch of implementation specific mojo.
|
---|
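<p>The same guard reads naturally as a small helper. This sketch uses <span class="code">std::string</span>, whose <span class="code">back()</span> is likewise undefined on an empty string; <span class="code">DemoNeedsDelimiter</span> is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Decide whether a path needs a trailing delimiter. The emptiness check comes
// first because back() must not be called on an empty string.
inline bool
DemoNeedsDelimiter( const std::string& aPath )
{
  if ( aPath.empty() )
    return false;

  const char last = aPath.back();
  return !(last == '/' || last == '\\');
}
```

<p>Callers get a well-defined answer for the empty case without relying on what any particular string implementation returns past the end.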
1152 |
|
---|
1153 |
|
---|
1154 | <pre class="email-quote">
|
---|
1155 | >Can we fix GetUnicode in this case?
|
---|
1156 | </pre>
|
---|
1157 |
|
---|
1158 | <p>This is an annoying property of auto strings, e.g., that they always
|
---|
1159 | have an allocated buffer. I'm happy to fix this bug, however, be
|
---|
1160 | aware that <span class="code">GetUnicode</span> and <span class="code">GetBuffer</span> are artifacts of [the old]
|
---|
1161 | implementation that we don't want to support. They are not part of
|
---|
1162 | the abstract interface. We will keep them no longer than we have to.
|
---|
1163 | They don't support our multi-fragment paradigm. People who require a
|
---|
1164 | contiguous hunk of characters in the future, and are unwilling to
|
---|
1165 | switch over to chunky-iterators, may be forced to copy the string to
|
---|
1166 | their own buffer. There will be an implementation of narrow character
|
---|
1167 | string that guarantees contiguous allocation and a zero-terminator,
|
---|
1168 | much as <span class="code">nsCString</span> does now, for compatibility with platform uses,
|
---|
1169 | but this won't be the default string class.
|
---|
1170 |
|
---|
1171 |
|
---|
1172 |
|
---|
1173 |
|
---|
1174 |
|
---|
1175 | <hr>
|
---|
1176 | <pre>
|
---|
1177 | Date: Mon, 19 Jun 2000 17:22:31 -0400
|
---|
1178 | </pre>
|
---|
1179 |
|
---|
1180 | <p>Clarifying String Semantics
|
---|
1181 |
|
---|
1182 | <p>Recently, I added an assert to the string operations that extract
|
---|
1183 | characters, namely <span class="code">First()</span>, <span class="code">Last()</span>, <span class="code">CharAt()</span>, and
|
---|
1184 | <span class="code">operator[]()</span>. This assert fires when any of these routines are used
|
---|
1185 | to access a character outside the defined contents of the string. For
|
---|
1186 | <span class="code">First()</span> and <span class="code">Last()</span> that means whenever they are applied to an
|
---|
1187 | empty string. For <span class="code">CharAt()</span> and <span class="code">operator[]()</span>, that means whenever
|
---|
1188 | they are used to access an index outside the range of
|
---|
1189 | <span class="code">0</span>..<span class="code">Length()-1</span>. There have been some complaints; however, the
|
---|
1190 | result was always undefined. What follows is extracted from an email
|
---|
1191 | exchange between me and warren on this topic. I hope it clarifies
|
---|
1192 | string semantics.
|
---|
1193 |
|
---|
1194 | <p>Warren writes:
|
---|
1195 | <pre class="email-quote">
|
---|
1196 | >I hit your funky CharAt assertion tonight in this piece of code:
|
---|
1197 |
|
---|
1198 | >NS_IMETHODIMP
|
---|
1199 | >nsIOService::ResolveRelativePath(
|
---|
1200 | > const char *relativePath,
|
---|
1201 | > const char* basePath,
|
---|
1202 | > char **result )
|
---|
1203 | > {
|
---|
1204 | > nsCAutoString name;
|
---|
1205 | > nsCAutoString path(basePath);
|
---|
1206 | >
|
---|
1207 | > PRUnichar last = path.Last();
|
---|
1208 | > PRBool needsDelim = !(last == '/' || last == '\\' || last ==
|
---|
1209 | > '\0');
|
---|
1210 | > ...
|
---|
1211 |
|
---|
1212 | >where basePath is null. It seems less convenient to have to first
|
---|
1213 | >check path.IsEmpty, and then if false get path.Last and test it.
|
---|
1214 | </pre>
|
---|
1215 |
|
---|
1216 | <p>I replied:
|
---|
1217 | <pre class="email-quote">
|
---|
1218 | >What would you prefer? That extracting a character not in the
|
---|
1219 | >string always return <span class="code">CharT(0)</span>? Can't do it for two reasons:
|
---|
1220 | >(1) <span class="code">0</span> may be a valid character in a particular encoding, so it
|
---|
1221 | >can't be used in general as a ``no character at that position''
|
---|
1222 | >marker; and (2) I can't control what an individual string
|
---|
1223 | >implementation does when asked to get an out-of-bounds fragment,
|
---|
1224 | >it's explicitly undefined. That means the result of <span class="code">CharAt</span> is
|
---|
1225 | >explicitly undefined for indexes outside the defined contents of
|
---|
1226 | >the string. As a debugging convenience, I have made this assert,
|
---|
1227 | >but it has always been the case that retrieving such a character
|
---|
1228 | >had undefined results ... even in [the old] code.
|
---|
1229 |
|
---|
1230 | >OK, you might say, well at least let me ask for a character that
|
---|
1231 | >is only off the end by one. E.g., <span class="code">Last</span> of an empty string.
|
---|
1232 | >Reason (1) from above still applies. How bad is it to say, for the
|
---|
1233 | >case you gave
|
---|
1234 |
|
---|
1235 | > PRBool needsDelim = PR_FALSE;
|
---|
1236 | > if ( !path.IsEmpty() )
|
---|
1237 | > {
|
---|
1238 | > PRUnichar last = path.Last();
|
---|
1239 | > needsDelim = !(last == '/' || last == '\\');
|
---|
1240 | > }
|
---|
1241 |
|
---|
1242 | >In general, you probably want to opt out of a whole lot of work
|
---|
1243 | >when the source string is empty. It is slightly less convenient,
|
---|
1244 | >but it doesn't tie us to a bunch of implementation specific mojo.
|
---|
1245 | </pre>
|
---|
1246 |
|
---|
1247 | <p>Warren also asks:
|
---|
1248 | <pre class="email-quote">
|
---|
1249 | >Here's another issue, perhaps more serious. If I say this:
|
---|
1250 |
|
---|
1251 | > foo(const PRUnichar* s) {
|
---|
1252 | > nsAutoString str(s);
|
---|
1253 | > bar(str.get());
|
---|
1254 | > }
|
---|
1255 |
|
---|
1256 | >where s is null, bar will get passed a zero-length PRUnichar
|
---|
1257 | >sequence instead of null. This makes it so that you can't just
|
---|
1258 | >test for the argument == null. You have to nsCRT::strlen(arg) == 0
|
---|
1259 | >which is much less efficient. Can we fix GetUnicode in this case?
|
---|
1260 | </pre>
|
---|
1261 |
|
---|
1262 | <p>And I reply:
|
---|
1263 | <pre class="email-quote">
|
---|
1264 | >This is an annoying property of auto strings, e.g., that they
|
---|
1265 | >always have an allocated buffer. I'm happy to fix this bug,
|
---|
1266 | >however, be aware that <span class="code">GetUnicode</span> and <span class="code">GetBuffer</span> are artifacts
|
---|
1267 | >of [the old] implementation that we don't want to support. They
|
---|
1268 | >are not part of the abstract interface. We will keep them no
|
---|
1269 | >longer than we have to. They don't support our multi-fragment
|
---|
1270 | >paradigm. People who require a contiguous hunk of characters in
|
---|
1271 | >the future, and are unwilling to switch over to chunky-iterators,
|
---|
1272 | >may be forced to copy the string to their own buffer. There will
|
---|
1273 | >be an implementation of narrow character string that guarantees
|
---|
1274 | >contiguous allocation and a zero-terminator, much as <span class="code">nsCString</span>
|
---|
1275 | >does now, for compatibility with platform uses, but this won't be
|
---|
1276 | >the default string class.
|
---|
1277 | </pre>
|
---|
1278 |
|
---|
1279 | <p>In a later message, Chris Waterson asks a related question
|
---|
1280 | <pre class="email-quote">
|
---|
1281 | >scc: should we add <span class="code">operator PRUnichar*()</span> to
|
---|
1282 | >NS_ConvertASCIItoUTF16?
|
---|
1283 | </pre>
|
---|
1284 |
|
---|
1285 | <p>And I reply:
|
---|
1286 | <pre class="email-quote">
|
---|
1287 | >It seems reasonable. A lot more reasonable than forcing people to
|
---|
1288 | >call <span class="code">GetUnicode()</span>. I alluded to platform specific classes in an
|
---|
1289 | >earlier message to warren that you were cc'd on, Chris. I imagine
|
---|
1290 | >that the <span class="code">...Convert...</span> routines would be required to produce
|
---|
1291 | >contiguous allocation 0-terminated strings (though the as yet
|
---|
1292 | >unimplemented <span class="code">...Copy...</span> forms, of course, wouldn't). So <span class="code">operator
|
---|
1293 | >const PRUnichar*() const</span> makes perfect sense to me here.
|
---|
1294 | </pre>
|
---|
1295 |
|
---|
1296 | <p>Hope this makes sense,
|
---|
1297 |
|
---|
1298 |
|
---|
1299 |
|
---|
1300 |
|
---|
1301 | <hr>
|
---|
1302 | <pre>
|
---|
1303 | Date: Tue, 20 Jun 2000 04:05:31 -0400
|
---|
1304 | Subject: Re: NS_LITERAL_STRING is broken
|
---|
1305 | </pre>
|
---|
1306 |
|
---|
1307 | <p>The behavior you describe sounds exactly like when you say
|
---|
1308 |
|
---|
1309 | <div class="source-code">
|
---|
1310 | <pre>
|
---|
1311 | const char* foobar = "foobar";
|
---|
1312 |
|
---|
1313 | ... NS_LITERAL_STRING(foobar).get() ...
|
---|
1314 | </pre>
|
---|
1315 | </div>
|
---|
1316 |
|
---|
1317 | <p>because in this case, the thing passed in is a <span class="code">const char*</span>.
|
---|
1318 | <span class="code">NS_LITERAL_STRING</span> is not meant to be used in this way. It is only
|
---|
1319 | meant to be used around a <span class="code">"</span> delimited string. The type of such is
|
---|
1320 | <span class="code">const char[N]</span> where N is the number of characters in the string + 1
|
---|
1321 | for the zero terminator it helpfully adds. <span class="code">sizeof</span> such a type is
|
---|
1322 | <span class="code">N</span>.
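<p>A minimal standalone sketch of the difference, assuming nothing about the real macro beyond the <span class="code">sizeof</span> arithmetic described above. <span class="code">LITERAL_LENGTH</span> is a hypothetical stand-in, not the actual definition of <span class="code">NS_LITERAL_STRING</span>.

```cpp
#include <cstddef>

// Hypothetical stand-in for the length computation inside a literal macro.
// It is correct only when |s| is an array (a quoted literal), not a pointer.
#define LITERAL_LENGTH(s) (sizeof(s) - 1)

// A quoted literal has type const char[7]: six characters plus the terminator.
constexpr std::size_t kArrayLength = LITERAL_LENGTH("foobar");

// Through a pointer, sizeof measures the pointer itself, so the result
// depends on the platform's pointer size and not on the text.
inline std::size_t length_through_pointer()
{
  const char* foobar = "foobar";
  return LITERAL_LENGTH(foobar);
}
```

<p>The second form compiles without complaint, which is why the mistake is easy to make and hard to spot.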
|
---|
1323 |
|
---|
1324 | <p>Are you sure you had the actual string as an argument, as in your
|
---|
1325 | example to me? Or could the actual code have been like my sample,
|
---|
1326 | above?
|
---|
1327 |
|
---|
1328 |
|
---|
1329 |
|
---|
1330 |
|
---|
1331 |
|
---|
1332 | <hr>
|
---|
1333 | <pre>
|
---|
1334 | Date: Thu, 29 Jun 2000 13:35:10 -0400
|
---|
1335 | Subject: Re: a fix
|
---|
1336 | </pre>
|
---|
1337 |
|
---|
1338 | <pre class="email-quote">
|
---|
1339 | > + if (Length() == 0) { return nsnull; }
|
---|
1340 | </pre>
|
---|
1341 |
|
---|
1342 |
|
---|
1343 | <p>Dave,
|
---|
1344 |
|
---|
1345 | <p>please read
|
---|
1346 |
|
---|
1347 | <a class="exact-uri" href="news://news.mozilla.org/[email protected]">news://news.mozilla.org/[email protected]</a>
|
---|
1348 |
|
---|
1349 | <p>It's just plain wrong to let people try to index into a string outside
|
---|
1350 | its defined contents. I can't just return <span class="code">'\0'</span> or <span class="code">PRUnichar('\0')</span>
|
---|
1351 | there as that <strong>could</strong> be a legal value to have somewhere in your
|
---|
1352 | string for some encodings ... and the encoding is not specified. So
|
---|
1353 | your patch has the basic problem of defeating my plan to stop people
|
---|
1354 | from doing this bad thing.
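<p>To illustrate why a zero character can't serve as an "out of range" answer, here is a small sketch using <span class="code">std::string</span> as a stand-in for a counted string. The accessor <span class="code">char_at</span> is hypothetical and is not part of any Mozilla interface.

```cpp
#include <cstddef>
#include <string>

// A counted string may legitimately hold a zero character in its contents,
// so zero cannot double as an "out of range" marker.
inline std::string make_string_with_embedded_zero()
{
  return std::string("a\0b", 3);   // three characters: 'a', '\0', 'b'
}

// Hypothetical checked accessor: reports failure separately from the value.
inline bool char_at(const std::string& s, std::size_t i, char& out)
{
  if (i >= s.size())
    return false;                  // outside the defined contents
  out = s[i];
  return true;
}
```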
|
---|
1355 |
|
---|
1356 | <p>The second problem with your patch is that you use the symbolic
|
---|
1357 | constant <span class="code">nsnull</span>, which is ostensibly a pointer value; <span class="code">Last</span> returns
|
---|
1358 | a character. <span class="code">nsnull</span> is not appropriate for that purpose. In fact,
|
---|
1359 | C++ gurus pretty much eschew the use of symbolic constants for <span class="code">0</span>.
|
---|
1360 | <span class="code">NULL</span> is to be avoided. <span class="code">nsnull</span> is wrong-headed in that it presumes
|
---|
1361 | we could have some <strong>other</strong> application specific value for <span class="code">NULL</span>. We
|
---|
1362 | can't, it would never work. It's just wasted brain-print. Always use
|
---|
1363 | <span class="code">0</span> for these situations, and if you want to communicate the fact that
|
---|
1364 | something is a pointer type, either use a comment or a
|
---|
1365 | (construction-style) cast, like so (graded examples from worst to
|
---|
1366 | best):
|
---|
1367 |
|
---|
1368 | <ul>
|
---|
1369 | <li>F: FindChildByNameWithHint("Chuck", nsnull);
|
---|
1370 |
|
---|
1371 | <li>D: FindChildByNameWithHint("Chuck", NULL);
|
---|
1372 |
|
---|
1373 | <li>C: FindChildByNameWithHint("Chuck", /* Child* */ 0);
|
---|
1374 |
|
---|
1375 | <li>B: typedef Child* Child_ptr;
|
---|
1376 | FindChildByNameWithHint("Chuck", Child_ptr(0));
|
---|
1377 |
|
---|
1378 | <li>A: FindChildByNameWithHint("Chuck", 0);
|
---|
1379 | </ul>
|
---|
1380 |
|
---|
1381 | <p>Don't let this discourage you; keep up the good work :-)
|
---|
1382 |
|
---|
1383 |
|
---|
1384 |
|
---|
1385 |
|
---|
1386 |
|
---|
1387 | <hr>
|
---|
1388 | <pre>
|
---|
1389 | Date: Tue, 8 Aug 2000 23:47:16 -0400
|
---|
1390 | Subject: Re: nsWritingIterator?
|
---|
1391 | </pre>
|
---|
1392 |
|
---|
1393 | <pre class="email-quote">
|
---|
1394 | >Can you give me any pointers to examples, or docs, or just some
|
---|
1395 | >general advice?
|
---|
1396 | </pre>
|
---|
1397 |
|
---|
1398 | <a class="exact-uri" href="http://ScottCollins.net/Journal/discussion/string_iterators.html">http://ScottCollins.net/Journal/discussion/string_iterators.html</a>
|
---|
1399 |
|
---|
1400 | <p>does this help?
|
---|
1401 |
|
---|
1402 | <p>I can personally walk you through any specific scenario you need.
|
---|
1403 |
|
---|
1404 |
|
---|
1405 |
|
---|
1406 |
|
---|
1407 |
|
---|
1408 | <hr>
|
---|
1409 | <pre>
|
---|
1410 | Date: Wed, 9 Aug 2000 02:35:03 -0400
|
---|
1411 | Subject: Re: nsWritingIterator?
|
---|
1412 | </pre>
|
---|
1413 |
|
---|
1414 | <p>You got it right... it's <span class="code">nsWritingIterator<CharT></span> for whichever
|
---|
1415 | character type you care about, either <span class="code">char</span> or <span class="code">PRUnichar</span>. You
|
---|
1416 | _can_ use this iterator like a character pointer ... that is, you can
|
---|
1417 | dereference it, assign into its dereference, etc. It is more
|
---|
1418 | efficient, though, to directly address a particular range of
|
---|
1419 | characters around where it points by asking it for its actual
|
---|
1420 | character pointer with <span class="code">get</span>, and knowing that there are
|
---|
1421 | <span class="code">size_forward()</span> characters available ahead of that pointer and
|
---|
1422 | <span class="code">size_backward()</span> characters available behind it. After examining
|
---|
1423 | those characters by hand, you can advance the iterator beyond the
|
---|
1424 | characters you have examined (and possibly into the next chunk, should
|
---|
1425 | one exist) by adding into it (with +=) the count of the characters you
|
---|
1426 | have processed.
|
---|
1427 |
|
---|
1428 | <p>Here are three examples of running through a string and modifying some
|
---|
1429 | of the characters in it. All use <span class="code">nsWritingIterator</span>s.
|
---|
1430 |
|
---|
1431 |
|
---|
1432 | <div class="source-code">
|
---|
1433 | <pre>
|
---|
1434 | // inefficient, but works in a pinch:
|
---|
1435 | // iterators can hide all details of chunks by acting like
|
---|
1436 | // a raw character pointer
|
---|
1437 |
|
---|
1438 | nsWritingIterator<PRUnichar> s = S.BeginWriting();
|
---|
1439 | nsWritingIterator<PRUnichar> done_with_string = S.EndWriting();
|
---|
1440 |
|
---|
1441 | // for each character in the string |S|
|
---|
1442 | for ( ; s != done_with_string; ++s )
|
---|
1443 | {
|
---|
1444 | // if the character is lower case, capitalize it
|
---|
1445 | if ( 'a' <= *s && *s <= 'z' )
|
---|
1446 | *s = *s - 'a' + 'A';
|
---|
1447 | }
|
---|
1448 |
|
---|
1449 |
|
---|
1450 |
|
---|
1451 |
|
---|
1452 | // efficient
|
---|
1453 | // iterators provide a mechanism by which you can process
|
---|
1454 | // a chunk-at-a-time
|
---|
1455 |
|
---|
1456 | nsWritingIterator<PRUnichar> iter = S.BeginWriting();
|
---|
1457 | nsWritingIterator<PRUnichar> done_with_string = S.EndWriting();
|
---|
1458 |
|
---|
1459 | // for each chunk of the string
|
---|
1460 | while ( iter != done_with_string )
|
---|
1461 | {
|
---|
1462 | size_t N = iter.size_forward(); // # of chars in this chunk
|
---|
1463 | PRUnichar* s = iter.get();
|
---|
1464 | PRUnichar* done_with_chunk = s + N;
|
---|
1465 |
|
---|
1466 | // for each character in this chunk
|
---|
1467 | for ( ; s < done_with_chunk; ++s )
|
---|
1468 | {
|
---|
1469 | // if the character is lower case, capitalize it
|
---|
1470 | if ( 'a' <= *s && *s <= 'z' )
|
---|
1471 | *s = *s - 'a' + 'A';
|
---|
1472 | }
|
---|
1473 |
|
---|
1474 | // advance the iterator past characters
|
---|
1475 | // we examined (and into the next chunk, if any)
|
---|
1476 | iter += N;
|
---|
1477 | }
|
---|
1478 |
|
---|
1479 |
|
---|
1480 |
|
---|
1481 | // elegant
|
---|
1482 | // pull your transformation into a `sink', and |copy_string|
|
---|
1483 | // will efficiently pump any kind of string into it
|
---|
1484 |
|
---|
1485 | struct Capitalize
|
---|
1486 | {
|
---|
1487 | // inline
|
---|
1488 | PRUint32
|
---|
1489 | write( PRUnichar* s, PRUint32 N )
|
---|
1490 | // processes one chunk, called repeatedly by |copy_string|
|
---|
1491 | {
|
---|
1492 | PRUnichar* done_with_chunk = s + N;
|
---|
1493 |
|
---|
1494 | // for each character in this chunk
|
---|
1495 | for ( ; s < done_with_chunk; ++s )
|
---|
1496 | {
|
---|
1497 | // if the character is lower case, capitalize it
|
---|
1498 | if ( 'a' <= *s && *s <= 'z' )
|
---|
1499 | *s = *s - 'a' + 'A';
|
---|
1500 | }
|
---|
1501 | return N; } // report that the whole chunk was processed
|
---|
1502 | };
|
---|
1503 |
|
---|
1504 | copy_string(S.BeginWriting(), S.EndWriting(), Capitalize());
|
---|
1505 | </pre>
|
---|
1506 | </div>
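<p>If you want to experiment with the chunk-at-a-time idea outside the tree, the following is a standalone sketch. It assumes a multi-fragment string can be modeled as a vector of <span class="code">std::u16string</span>; <span class="code">Fragments</span>, <span class="code">CapitalizeSink</span>, and <span class="code">for_each_chunk</span> are hypothetical names and do not correspond to the real <span class="code">copy_string</span> machinery.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a multi-fragment string: each element is one chunk.
using Fragments = std::vector<std::u16string>;

// Sink in the style of |Capitalize|: processes one contiguous chunk
// and returns the number of characters it consumed.
struct CapitalizeSink
{
  std::size_t write(char16_t* s, std::size_t n)
  {
    char16_t* done_with_chunk = s + n;
    for ( ; s < done_with_chunk; ++s )
      if ( u'a' <= *s && *s <= u'z' )
        *s = static_cast<char16_t>(*s - u'a' + u'A');
    return n;
  }
};

// Pumps every non-empty chunk into the sink, one contiguous run at a time.
template <class Sink>
void for_each_chunk(Fragments& fragments, Sink sink)
{
  for (std::u16string& chunk : fragments)
    if (!chunk.empty())
      sink.write(&chunk[0], chunk.size());
}
```

<p>The inner loop runs over a raw pointer, so the per-character cost is the same as for a flat buffer; the fragment bookkeeping is paid once per chunk.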
|
---|
1507 |
|
---|
1508 |
|
---|
1509 |
|
---|
1510 | <p>Does this show it better?
|
---|
1511 |
|
---|
1512 |
|
---|
1513 |
|
---|
1514 |
|
---|
1515 |
|
---|
1516 | <hr>
|
---|
1517 | <pre>
|
---|
1518 | Date: Thu, 17 Aug 2000 18:23:22 -0400
|
---|
1519 | </pre>
|
---|
1520 |
|
---|
1521 | <pre class="email-quote">
|
---|
1522 | >I tried looking at the string header files but they
|
---|
1523 | >are awfully complicated.
|
---|
1524 | </pre>
|
---|
1525 |
|
---|
1526 | <p>I'll explain things in a little <strong>more</strong> detail than you need, so
|
---|
1527 | that some of the stuff you see in these headers will make more sense.
|
---|
1528 | I'll also answer your questions out of order.
|
---|
1529 |
|
---|
1530 | <p>First: the string hierarchy looks like this
|
---|
1531 |
|
---|
1532 | <a class="exact-uri" href="http://ScottCollins.net/Journal/discussion/string_hierarchy.gif">http://ScottCollins.net/Journal/discussion/string_hierarchy.gif</a>
|
---|
1533 |
|
---|
1534 | <p>The two most important headers are:
|
---|
1535 |
|
---|
1536 | <a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h</a>
|
---|
1537 | <a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAWritableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAWritableString.h</a>
|
---|
1538 |
|
---|
1539 | <p>These abstract classes, <span class="code">nsAReadable[C]String</span>, and
|
---|
1540 | <span class="code">nsAWritable[C]String</span> are typically what you will want to use in the
|
---|
1541 | interfaces of new code. If you write a piece of code that takes a
|
---|
1542 | string for input, consider, e.g.,
|
---|
1543 |
|
---|
1544 | <div class="source-code">
|
---|
1545 | <pre>
|
---|
1546 | void consumes_a_string( const nsAReadableString& aInput );
|
---|
1547 | </pre>
|
---|
1548 | </div>
|
---|
1549 |
|
---|
1550 | <p>If you write a piece of code that modifies a string, consider
|
---|
1551 |
|
---|
1552 | <div class="source-code">
|
---|
1553 | <pre>
|
---|
1554 | void modifies_a_string( nsAWritableString& aResult );
|
---|
1555 | </pre>
|
---|
1556 | </div>
|
---|
1557 |
|
---|
1558 |
|
---|
1559 | <p>When creating your own classes, member strings will typically be
|
---|
1560 | <span class="code">nsString</span>s. When you can't avoid creating a short string that you
|
---|
1561 | need only temporarily during a function, you will typically use
|
---|
1562 | <span class="code">nsAutoString</span>. When someone passes you a raw pointer, or a raw
|
---|
1563 | pointer and a length, representing a buffer of characters that you may
|
---|
1564 | examine, but won't own, you can treat it like a string by wrapping it
|
---|
1565 | in an <span class="code">nsLiteralString</span>, e.g.,
|
---|
1566 |
|
---|
1567 | <div class="source-code">
|
---|
1568 | <pre>
|
---|
1569 | void
|
---|
1570 | reads_a_buffer( const PRUnichar* aInput, PRUint32 aInputLength )
|
---|
1571 | {
|
---|
1572 | nsLiteralString input(aInput, aInputLength);
|
---|
1573 | // doesn't allocate or copy
|
---|
1574 |
|
---|
1575 | // ...
|
---|
1576 | }
|
---|
1577 | </pre>
|
---|
1578 | </div>
|
---|
1579 |
|
---|
1580 | <p>You will use <span class="code">nsLiteralString</span> around quoted constant strings as well,
|
---|
1581 | though typically through the <span class="code">NS_LITERAL_STRING</span> macro, to avoid doing
|
---|
1582 | a length calculation
|
---|
1583 |
|
---|
1584 | <div class="source-code">
|
---|
1585 | <pre>
|
---|
1586 | NS_LITERAL_STRING("x")
|
---|
1587 | </pre>
|
---|
1588 | </div>
|
---|
1589 |
|
---|
1590 | <p>expands to
|
---|
1591 |
|
---|
1592 | <div class="source-code">
|
---|
1593 | <pre>
|
---|
1594 | nsLiteralString(L"x", (sizeof(L"x")/sizeof(PRUnichar) - 1))
|
---|
1595 | </pre>
|
---|
1596 | </div>
|
---|
1597 |
|
---|
1598 | <p>if <span class="code">L</span> notation works as needed on your platform.
|
---|
1599 |
|
---|
1600 | <p>Those are the basics. Now on to your questions:
|
---|
1601 |
|
---|
1602 |
|
---|
1603 | <pre class="email-quote">
|
---|
1604 | >For example this won't compile. [...]
|
---|
1605 |
|
---|
1606 | >str1 += L"abc " + str2 + L"def";
|
---|
1607 | </pre>
|
---|
1608 |
|
---|
1609 |
|
---|
1610 | <p><span class="code">L"abc "</span> makes an object that is a <span class="code">const wchar_t[5]</span>, and none of
|
---|
1611 | the string code knows about <span class="code">wchar_t</span>. The main reason is that
|
---|
1612 | <span class="code">wchar_t</span> is not necessarily the right size (it can be 4 bytes under
|
---|
1613 | gcc). If you wrap these constant expressions in <span class="code">NS_LITERAL_STRING</span>,
|
---|
1614 | as described above, you should get the right thing, e.g.,
|
---|
1615 |
|
---|
1616 | <div class="source-code">
|
---|
1617 | <pre>
|
---|
1618 | str1 += NS_LITERAL_STRING("abc ") + str2 + NS_LITERAL_STRING("def");
|
---|
1619 | </pre>
|
---|
1620 | </div>
|
---|
1621 |
|
---|
1622 |
|
---|
1623 | <pre class="email-quote">
|
---|
1624 | >Another one is:
|
---|
1625 | >function(const PRUnichar *foo);
|
---|
1626 | >call function(L"abc " + str2);
|
---|
1627 |
|
---|
1628 | >It won't create a temporary nsString.
|
---|
1629 | </pre>
|
---|
1630 |
|
---|
1631 | <p>This one, I have a quick and easy explanation for. If <span class="code">function</span> was
|
---|
1632 | declared like this
|
---|
1633 |
|
---|
1634 | <div class="source-code">
|
---|
1635 | <pre>
|
---|
1636 | function( const nsAReadableString& )
|
---|
1637 | </pre>
|
---|
1638 | </div>
|
---|
1639 |
|
---|
1640 | <p>then, no problem, since a <span class="code">nsPromiseConcatenation</span> (which was the
|
---|
1641 | result of adding those two things together) <strong>is</strong> a readable string.
|
---|
1642 | No other objects need to be created; no copying needs to be performed.
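<p>The idea behind a concatenation promise can be sketched in standalone C++. This is only a model, assuming <span class="code">std::string</span> operands; <span class="code">ConcatenationPromise</span> and <span class="code">count_spaces</span> are hypothetical names, and the real <span class="code">nsPromiseConcatenation</span> exposes fragments through iterators, not through a <span class="code">CharAt</span> call.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of a concatenation promise: it remembers its two
// operands by reference and copies nothing until someone asks for a copy.
// Both operands must outlive the promise.
class ConcatenationPromise
{
  public:
    ConcatenationPromise(const std::string& left, const std::string& right)
      : mLeft(left), mRight(right) { }

    std::size_t Length() const { return mLeft.size() + mRight.size(); }

    // Reads one character, choosing the fragment that holds it.
    // |i| must be less than |Length()|.
    char CharAt(std::size_t i) const
    {
      return i < mLeft.size() ? mLeft[i] : mRight[i - mLeft.size()];
    }

    // The only place an allocation and copy happens, and it is explicit.
    std::string Flatten() const { return mLeft + mRight; }

  private:
    const std::string& mLeft;
    const std::string& mRight;
};

// A consumer written against the readable interface needs no temporary.
inline std::size_t count_spaces(const ConcatenationPromise& s)
{
  std::size_t n = 0;
  for (std::size_t i = 0; i < s.Length(); ++i)
    if (s.CharAt(i) == ' ')
      ++n;
  return n;
}
```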
|
---|
1643 |
|
---|
1644 | <p>In all cases, we want the creation of <span class="code">nsString</span>s et al, to be
|
---|
1645 | <span class="code">explicit</span>, since creation is unbelievably expensive, requiring heap
|
---|
1646 | allocation, locks, copying, etc.
|
---|
1647 |
|
---|
1648 | <p>I hope this answers both your posts,
|
---|
1649 |
|
---|
1650 |
|
---|
1651 |
|
---|
1652 |
|
---|
1653 |
|
---|
1654 | <hr>
|
---|
1655 | <pre>
|
---|
1656 | Date: Thu, 17 Aug 2000 20:57:08 -0400
|
---|
1657 | Subject: re our conversation
|
---|
1658 | </pre>
|
---|
1659 |
|
---|
1660 | <div class="source-code"><pre>return ToNewUnicode( nsLiteralCString(buffer) );</pre></div>
|
---|
1661 |
|
---|
1662 |
|
---|
1663 |
|
---|
1664 |
|
---|
1665 |
|
---|
1666 |
|
---|
1667 | <hr>
|
---|
1668 | <pre>
|
---|
1669 | Date: Fri, 18 Aug 2000 02:52:45 -0400
|
---|
1670 | Subject: Re: More questions and new string API
|
---|
1671 | </pre>
|
---|
1672 |
|
---|
1673 | <pre class="email-quote">
|
---|
1674 | >1) How do I return a static string?
|
---|
1675 |
|
---|
1676 | >const nsAReadableString& foo() {return NS_LITERAL_STRING("x");}
|
---|
1677 | >errors on taking the address of a temporary variable.
|
---|
1678 | </pre>
|
---|
1679 |
|
---|
1680 | <p>Unfortunately, <span class="code">NS_LITERAL_STRING</span>'s definition is not particularly
|
---|
1681 | amenable to this use. Instead, you would have to say something like
|
---|
1682 | this:
|
---|
1683 |
|
---|
1684 | <div class="source-code">
|
---|
1685 | <pre>
|
---|
1686 | const nsAReadableString&
|
---|
1687 | foo()
|
---|
1688 | {
|
---|
1689 | #ifdef HAVE_CPP_2BYTE_WCHAR_T
|
---|
1690 | static nsLiteralString static_foo(L"x", 1);
|
---|
1691 | #else
|
---|
1692 | static nsLiteralString static_foo;
|
---|
1693 | static PRBool initialized = PR_FALSE;
|
---|
1694 | if ( !initialized )
|
---|
1695 | {
|
---|
1696 | static_foo.AssignWithConversion("x", 1);
|
---|
1697 | initialized = PR_TRUE;
|
---|
1698 | }
|
---|
1699 | #endif
|
---|
1700 | return static_foo;
|
---|
1701 | }
|
---|
1702 | </pre>
|
---|
1703 | </div>
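<p>The underlying pattern, a function-local static returned by reference, can be tried in isolation. This sketch assumes a C++11 compiler, where <span class="code">u"x"</span> gives a 2-byte-character literal directly, and uses <span class="code">std::u16string</span> as a stand-in for the string class; <span class="code">static_x</span> is a hypothetical name.

```cpp
#include <string>

// The function-local static is constructed on the first call and lives
// until program exit, so returning a reference to it is safe. Returning a
// reference to a temporary, by contrast, would dangle immediately.
inline const std::u16string& static_x()
{
  static const std::u16string value(u"x");
  return value;
}
```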
|
---|
1704 |
|
---|
1705 |
|
---|
1706 | <pre class="email-quote">
|
---|
1707 | >2) I'm using these with the STL library in an XPCOM component.
|
---|
1708 | >What type should I use with map? This doesn't work...
|
---|
1709 |
|
---|
1710 | >typedef map<const nsAReadableString&, myType*> mapStringMyType;
|
---|
1711 | >mapStringMyType foo;
|
---|
1712 | >foo.find(nsAReadableString); - I want to find on a ReadableString
|
---|
1713 | </pre>
|
---|
1714 |
|
---|
1715 | <p>I don't know what errors you are getting; but it probably doesn't work
|
---|
1716 | because a reference isn't an assignable type. This is just a guess.
|
---|
1717 | You may need to use
|
---|
1718 |
|
---|
1719 | <div class="source-code">
|
---|
1720 | <pre>
|
---|
1721 | map<const nsAReadableString*, myType*>
|
---|
1722 | </pre>
|
---|
1723 | </div>
|
---|
1724 |
|
---|
1725 | <p>If you actually want the map to manage ownership of the keys, then
|
---|
1726 | you'll want to use a concrete type, e.g.,
|
---|
1727 |
|
---|
1728 | <div class="source-code">
|
---|
1729 | <pre>
|
---|
1730 | map<nsString, myType*>
|
---|
1731 | </pre>
|
---|
1732 | </div>
|
---|
1733 |
|
---|
1734 | <p>or perhaps
|
---|
1735 |
|
---|
1736 | <div class="source-code">
|
---|
1737 | <pre>
|
---|
1738 | map<nsSharedStringPtr, myType*>
|
---|
1739 | </pre>
|
---|
1740 | </div>
|
---|
1741 |
|
---|
1742 | <p>Or maybe there's something else wrong. Send me the error messages.
|
---|
1743 | If you end up using a pointer, then of course you'll have to supply a
|
---|
1744 | comparison function to the <span class="code">map</span> template. You won't be satisfied
|
---|
1745 | with the default comparison of pointers :-) Sorry I couldn't answer
|
---|
1746 | this one more completely.
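<p>For the pointer-keyed case, a comparison function along these lines is what I have in mind. It is a sketch only: <span class="code">std::string</span> stands in for a readable string, and <span class="code">CompareByContents</span>, <span class="code">MapByContents</span>, and <span class="code">contains</span> are hypothetical names.

```cpp
#include <map>
#include <string>

// Compares the strings the keys point at, not the pointer values.
struct CompareByContents
{
  bool operator()(const std::string* a, const std::string* b) const
  {
    return *a < *b;
  }
};

// The map does not own its keys, so each pointed-to string must
// outlive its entry.
typedef std::map<const std::string*, int, CompareByContents> MapByContents;

// Looks up by contents: |key| need not be the same object that was stored.
inline bool contains(const MapByContents& m, const std::string& key)
{
  return m.find(&key) != m.end();
}
```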
|
---|
1747 |
|
---|
1748 |
|
---|
1749 | <pre class="email-quote">
|
---|
1750 | >3) How do I get a raw PRUnichar pointer out of nsAReadableString
|
---|
1751 | >when I need to call something that wants 'unsigned short *'?
|
---|
1752 | </pre>
|
---|
1753 |
|
---|
1754 | <p>The problem with this scenario is that an <span class="code">nsAReadableString</span> doesn't
|
---|
1755 | promise that all its data is contiguous, nor that it is
|
---|
1756 | zero-terminated, which is what I suspect you want in this case. If
|
---|
1757 | the function you want to call can take {pointer, length} tuples, and
|
---|
1758 | can consume the string in hunks without zero termination ... then you
|
---|
1759 | can use <span class="code">copy_string</span> to pump the string into your function, see
|
---|
1760 |
|
---|
1761 | <a class="exact-uri" href="http://ScottCollins.net/Journal/discussion/string_iterators.html">http://ScottCollins.net/Journal/discussion/string_iterators.html</a>
|
---|
1762 |
|
---|
1763 | <p>If not, and you absolutely have to have a contiguous zero-terminated
|
---|
1764 | buffer, then there is a new facility (part of the DOMAPI branch) that
|
---|
1765 | does what you need. It's not checked in on the trunk; it should
|
---|
1766 | be in early next week. It is <span class="code">nsPromiseFlatString</span>. This class
|
---|
1767 | promises a contiguous zero-terminated buffer; and has an <span class="code">operator
|
---|
1768 | PRUnichar*</span> to produce a pointer to that buffer automatically. If the
|
---|
1769 | underlying class <strong>is</strong> one that happens to be a single fragment and
|
---|
1770 | zero-terminated, then, like <span class="code">nsPromiseSubstring</span> and
|
---|
1771 | <span class="code">nsPromiseConcatenation</span>, this class merely holds a reference into the
|
---|
1772 | original data. If, however, the underlying string is multi-fragment
|
---|
1773 | or not zero-terminated, then <span class="code">nsPromiseFlatString</span> allocates a
|
---|
1774 | contiguous buffer of appropriate size and copies the fragmented string
|
---|
1775 | data to it. So given
|
---|
1776 |
|
---|
1777 | <div class="source-code">
|
---|
1778 | <pre>
|
---|
1779 | void ReadBuffer( PRUnichar* );
|
---|
1780 | </pre>
|
---|
1781 | </div>
|
---|
1782 |
|
---|
1783 | <p>You can call this as efficiently as possible with an arbitrary string
|
---|
1784 | like so
|
---|
1785 |
|
---|
1786 | <div class="source-code">
|
---|
1787 | <pre>
|
---|
1788 | ReadBuffer( nsPromiseFlatString(aString) );
|
---|
1789 | </pre>
|
---|
1790 | </div>
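<p>The borrow-or-copy behavior can be modeled in a few lines of standalone C++. This is a sketch under assumptions, not the real class: a string is modeled as a vector of <span class="code">std::string</span> fragments, and <span class="code">Fragments</span>, <span class="code">FlatPromise</span>, and <span class="code">DidCopy</span> are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: a string stored as one or more fragments.
typedef std::vector<std::string> Fragments;

// Promises a contiguous, zero-terminated buffer. Borrows the data when
// the source is already a single fragment; copies only otherwise.
// The source must outlive the promise.
class FlatPromise
{
  public:
    explicit FlatPromise(const Fragments& source)
      : mBorrowed(source.size() == 1 ? &source[0] : 0)
    {
      if (!mBorrowed)
        for (std::size_t i = 0; i < source.size(); ++i)
          mOwned += source[i];
    }

    bool DidCopy() const { return mBorrowed == 0; }

    // |c_str()| is contiguous and zero-terminated in both cases.
    const char* get() const
    {
      return mBorrowed ? mBorrowed->c_str() : mOwned.c_str();
    }

  private:
    const std::string* mBorrowed;  // set when no copy was needed
    std::string        mOwned;     // filled only for fragmented sources
};
```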
|
---|
1791 |
|
---|
1792 |
|
---|
1793 | <p>If the function you are calling needs to take ownership of the buffer
|
---|
1794 | you hand it, then you will probably call <span class="code">ToNewUnicode</span> like so
|
---|
1795 |
|
---|
1796 | <div class="source-code">
|
---|
1797 | <pre>
|
---|
1798 | void ConsumeBuffer( PRUnichar* );
|
---|
1799 |
|
---|
1800 | ConsumeBuffer( ToNewUnicode(aString) );
|
---|
1801 | </pre>
|
---|
1802 | </div>
|
---|
1803 |
|
---|
1804 | <p>The global function <span class="code">ToNewUnicode</span> is declared in "nsReadableUtils.h",
|
---|
1805 | and was only recently added to the build. It is currently being used
|
---|
1806 | in the DOMAPI branch. It is part of the build, but the file
|
---|
1807 | "dlldeps.c" in XPCOM may need to be modified to ensure it is exported
|
---|
1808 | on your platform if you are building the tip.
|
---|
1809 |
|
---|
1810 | <p>Needless to say, you want to avoid functions that require bare
|
---|
1811 | pointers for several reasons: (a) they typically assume
|
---|
1812 | zero-termination, which is not guaranteed by the normal encodings; (b)
|
---|
1813 | they require contiguous allocation, which may not be possible; (c)
|
---|
1814 | they scan for the end of the string, at linear cost (if the encoding
|
---|
1815 | makes it possible at all), when the length could be known in advance.
|
---|
1816 | If you have to do it, the above mechanisms work, but be aware of the
|
---|
1817 | cost and the potential need to copy.
|
---|
1818 |
|
---|
1819 |
|
---|
1820 | <pre class="email-quote">
|
---|
1821 | >4) How do I declare a local variable to hold a nsAReadableString?
|
---|
1822 | >and a member variable?
|
---|
1823 | </pre>
|
---|
1824 |
|
---|
1825 | <p><span class="code">nsAReadableString</span> is an abstract type. So you can't have a concrete
|
---|
1826 | instance of it. All strings in the hierarchy are readable strings.
|
---|
1827 | If you just want a reference to a readable string, you can say, e.g.,
|
---|
1828 |
|
---|
1829 | <div class="source-code">
|
---|
1830 | <pre>
|
---|
1831 | struct foo
|
---|
1832 | {
|
---|
1833 | const nsAReadableString& mString;
|
---|
1834 | // ...
|
---|
1835 |
|
---|
1836 | foo( const nsAReadableString& aString ) : mString(aString) { }
|
---|
1837 | };
|
---|
1838 | </pre>
|
---|
1839 | </div>
|
---|
1840 |
|
---|
1841 | <p>...similarly with pointers; but I suspect you are looking for
|
---|
1842 | something more concrete. An <span class="code">nsString</span> is a <span class="code">nsAReadableString</span>, and
|
---|
1843 | is the typical thing you want as a member variable. An <span class="code">nsAutoString</span>
|
---|
1844 | is also an <span class="code">nsAReadableString</span> and is typically what you would use for
|
---|
1845 | a short (in length) temporary (in lifetime) local variable, as I
|
---|
1846 | mentioned in my previous post.
|
---|
1847 |
|
---|
1848 |
|
---|
1849 | <pre class="email-quote">
|
---|
1850 | >5) If I call a function that returns a PRUnichar* and I want to
|
---|
1851 | >use it as a nsAReadableString should I wrap it in a
|
---|
1852 | >nsLiteralString?
|
---|
1853 | </pre>
|
---|
1854 |
|
---|
1855 | <p>Yes, though remember, an <span class="code">nsLiteralString</span> assumes the lifetime of the
|
---|
1856 | underlying data is under someone else's control. If the called
|
---|
1857 | function gives you a buffer that you need to <span class="code">delete</span>, you will have
|
---|
1858 | to manage that yourself. Currently, people often use <span class="code">nsXPIDLString</span>
|
---|
1859 | to handle that. XPIDL strings are <strong>not</strong> part of the hierarchy. They
|
---|
1860 | are only used as a sort of string-<span class="code">auto_ptr</span>. However, I'm
|
---|
1861 | integrating their functionality into <span class="code">nsString</span>. There is no problem
|
---|
1862 | in wrapping the same pointer in both as two separate local variables,
|
---|
1863 | one to give you the readable interface, and one to manage the
|
---|
1864 | lifetime.
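<p>The two-wrapper arrangement looks roughly like this in standalone C++. Everything here is a hypothetical stand-in: <span class="code">make_owned_buffer</span> plays the called function, <span class="code">BufferOwner</span> plays the role <span class="code">nsXPIDLString</span> has today, and <span class="code">ReadableView</span> plays <span class="code">nsLiteralString</span>. The sketch assumes the buffer is released with <span class="code">std::free</span>.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a function that hands back a buffer the
// caller must release (here with |std::free|).
inline char* make_owned_buffer(const char* text)
{
  char* buffer = static_cast<char*>(std::malloc(std::strlen(text) + 1));
  if (buffer)
    std::strcpy(buffer, text);
  return buffer;
}

// Manages only the lifetime of the buffer.
class BufferOwner
{
  public:
    explicit BufferOwner(char* buffer) : mBuffer(buffer) { }
    ~BufferOwner() { std::free(mBuffer); }
    const char* get() const { return mBuffer; }
  private:
    BufferOwner(const BufferOwner&);             // not copyable
    BufferOwner& operator=(const BufferOwner&);  // not assignable
    char* mBuffer;
};

// Gives only a readable view; never frees what it points at.
// |data| must be non-null and zero-terminated.
struct ReadableView
{
  const char* mData;
  std::size_t mLength;
  explicit ReadableView(const char* data)
    : mData(data), mLength(std::strlen(data)) { }
};
```

<p>Declare the owner first, so that it is destroyed last and the view never outlives the buffer it reads.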
|
---|
1865 |
|
---|
1866 | <p>If it's OK with you, I'd like to post this reply (including your
|
---|
1867 | quoted questions) to n.p.m.xpcom and also put a copy near the string
|
---|
1868 | iterator discussion I provided a link to above, so that other people
|
---|
1869 | with similar questions can see these answers.
|
---|
1870 |
|
---|
1871 | <p>Hope this helps,
|
---|
1872 |
|
---|
1873 |
|
---|
1874 |
|
---|
1875 |
|
---|
1876 |
|
---|
1877 | <hr>
|
---|
1878 | <pre>
|
---|
1879 | Date: Sun, 3 Sep 2000 03:52:17 -0400
|
---|
1880 | </pre>
|
---|
1881 |
|
---|
1882 | <p>In article <8nu9m2$eo14@secnews.netscape.com>, "Jon Smirl"
|
---|
1883 | <jonsmirl@mediaone.com> wrote:
|
---|
1884 |
|
---|
1885 | > I have the new strings up and running in my app. They work as
|
---|
1886 | > advertised and
|
---|
1887 | > I haven't found any bugs. Thanks for the good job in designing and
|
---|
1888 | > implementing them. Here is a summary of issues I've encountered
|
---|
1889 | > so far...
|
---|
1890 |
|
---|
1891 | <p>Thanks, and I appreciate your comments and insights.
|
---|
1892 |
|
---|
1893 |
|
---|
1894 | >
|
---|
1895 | > 1) Should there be a nsSegmentedString derived from nsString instead
|
---|
1896 | > of building segment support into nsString? None of my strings are
|
---|
1897 | > segmented but
|
---|
1898 | > I keep executing code that supports it. nsPromiseFlatString would
|
---|
1899 | > be trivial in the non-segmented case.
|
---|
1900 |
|
---|
1901 | <p>The general case is that a string does not promise to have contiguous
|
---|
1902 | data. A specific case is that, for some implementations, it does.
|
---|
1903 | You couldn't do it the other way around, because a segmented string
|
---|
1904 | couldn't satisfy all the promises of a flat string. However, through
|
---|
1905 | the use of chunky iterators, operating on strings that happen to be
|
---|
1906 | flat is very efficient. In fact, <span class="code">nsPromiseFlatString</span> is trivial in
|
---|
1907 | the non-segmented case. In addition, I'll be adding an abstract flat
|
---|
1908 | class into the hierarchy, which will present additional interface ...
|
---|
1909 | in your local routines where you actually have declared a concrete
|
---|
1910 | string instance that happens to be flat, the compiler will give you
|
---|
1911 | the benefit of using the flat specific routines (e.g., a substring
|
---|
1912 | object over a flat string is simpler than the general purpose
|
---|
1913 | substring). I need to be cautious about this, though, since I don't
|
---|
1914 | automatically want people propagating the flat type through their
|
---|
1915 | interfaces. That would put us in the same boat we're in right now ...
|
---|
1916 | where routines only work on a specific kind of string, which denies
|
---|
1917 | other parts of the code the opportunity to use an implementation
|
---|
1918 | beneficial to its specific needs, and typically for no good reason.
|
---|
1919 |
|
---|
1920 | >
|
---|
1921 | > 2) Should nsAWritableString have a way to get the buffer and then
|
---|
1922 | > return it?
|
---|
1923 | > I need to get the buffer to pass it to OS calls. I'm doing this now
|
---|
1924 | > by passing around nsStrings instead of the interface. If I just use
|
---|
1925 | > the interface I incur an extra copy since I have to use a temporary
|
---|
1926 | > buffer.
|
---|
1927 |
|
---|
1928 | <p>A specific string implementation could promise this, but in general, a
|
---|
1929 | writable could not. After all, a writable doesn't even guarantee
|
---|
1930 | contiguous storage. To some degree, this is what
|
---|
1931 | <span class="code">nsPromiseFlatString</span> is for. However, this is a readable promise
|
---|
1932 | only. It will also be the case that <span class="code">ns[C]String</span>s, in the very near
|
---|
1933 | future will be able to just assume ownership of an arbitrary buffer
|
---|
1934 | allocated on the free store with the XPCOM allocators ... getting one
|
---|
1935 | to give up its buffer, on the other hand, presents some problems. Do
|
---|
1936 | you have a lot of places where the system writes into your string
|
---|
1937 | buffer space? Or do you have a lot of system routines that return you
|
---|
1938 | new buffers? I can imagine using <span class="code">nsPromiseFlatString</span> for this, but
|
---|
1939 | what happens when the OS alters the underlying data? If the promise
|
---|
1940 | had generated that flat data on behalf of a multi-fragment string,
|
---|
1941 | should it now put the changes back? It's possible to do, I just want
|
---|
1942 | to know if it's correct to allow this situation to happen.
|
---|
1943 |
|
---|
1944 |
|
---|
1945 |
|
---|
1946 | >
|
---|
1947 | > 3) There needs to be a NS_LITERAL_CHAR() to go along with
|
---|
1948 | > NS_LITERAL_STRING().
|
---|
1949 |
|
---|
1950 | <p>OK.
|
---|
1951 |
|
---|
1952 |
|
---|
1953 |
|
---|
1954 | > Having NS_LITERAL_STRING() all over the code clutters
|
---|
1955 | > it up and makes it hard to tell what the code is doing, could we
|
---|
1956 | > have a standard short alias for this?
|
---|
1957 |
|
---|
1958 | <p>Yes, I'll try to think of something ... perhaps <span class="code">NS_LSTR</span>?
|
---|
1959 |
|
---|
1960 |
|
---|
1961 | > 4) nsLiteralString should support n.ToInteger(&error);
|
---|
1962 |
|
---|
1963 | <p><span class="code">ToInteger</span> is actually a bad interface. It's only good if your
|
---|
1964 | entire string is the number; this encourages you to edit your string
|
---|
1965 | until it is one, or perhaps copy the numeric part to another string.
|
---|
1966 | Better if you just <span class="code">sscanf</span> a string (don't know if I can provide
|
---|
1967 | that in the general case, but I'm thinking about it), or else use
|
---|
1968 | regular C++ extractors (which wouldn't be too hard for me to
|
---|
1969 | provide), or else I could give you a <span class="code">ToInteger</span> that works on a pair
|
---|
1970 | of iterators, extracting the integer from the digits between them.
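<p>As a rough idea of the iterator-pair form, here is a standalone sketch over plain character pointers. <span class="code">extract_integer</span> is a hypothetical name and signature, not a committed interface; it handles unsigned decimal digits only and does not check for overflow.

```cpp
// Hypothetical helper: reads decimal digits in [first, last) and stops at
// the first non-digit. Returns the position where it stopped, so callers
// can tell how much was consumed; |value| is set only if a digit was read.
// Overflow is not checked in this sketch.
inline const char* extract_integer(const char* first, const char* last, int& value)
{
  const char* p = first;
  int result = 0;
  for ( ; p != last && '0' <= *p && *p <= '9'; ++p )
    result = result * 10 + (*p - '0');
  if (p != first)
    value = result;
  return p;
}
```

<p>Because the caller chooses the range, there is no need to edit the string or copy the numeric part out of it first.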
|
---|
1971 |
|
---|
1972 | >
|
---|
1973 | > 5) There should be a global define for an interface to a readonly
|
---|
1974 | > empty string.
|
---|
1975 |
|
---|
1976 | <p>Yes, there will be.
|
---|
1977 |
|
---|
1978 |
|
---|
1979 | >
|
---|
1980 | > 6) Something is wrong with concatenation....
|
---|
1981 |
|
---|
1982 | <p>Hopefully I've fixed this now.
|
---|
1983 |
|
---|
1984 |
|
---|
1985 |
|
---|
1986 | > 8) A forward definition is missing in the h files
|
---|
1987 |
|
---|
1988 | <p>I'll check it out.
|
---|
1989 |
|
---|
1990 |
|
---|
1991 |
|
---|
1992 | <p>My understanding is that you have already found the answers to your
|
---|
1993 | other questions.
|
---|
1994 |
|
---|
1995 | <p>I hope this helps,
|
---|
1996 |
|
---|
1997 |
|
---|
1998 |
|
---|
1999 |
|
---|
2000 | <hr>
|
---|
2001 | <pre>
|
---|
2002 | Date: Wed, 20 Sep 2000 17:32:13 -0400
|
---|
2003 | Subject: Re: how to free an nsString::ToNewCString
|
---|
2004 | </pre>
|
---|
2005 |
|
---|
2006 | <pre class="email-quote">
|
---|
2007 | >What's the current approved way to free an nsString::ToNewCString?
|
---|
2008 | </pre>
|
---|
2009 |
|
---|
2010 | <p><span class="code">nsMemory::Free</span>
|
---|
2011 |
|
---|
2012 |
|
---|
2013 |
|
---|
2014 |
|
---|
2015 |
|
---|
2016 | <hr>
|
---|
2017 |
|
---|
2018 | <p>You use <span class="code">NS_ConvertASCIItoUTF16("...").get()</span> in several places; these should be
|
---|
2019 |
|
---|
2020 | NS_LITERAL_STRING("...").get()
|
---|
2021 |
|
---|
2022 | <p>Don't do this to the very first case where you aren't wrapping an actual literal string.
|
---|
2023 | The first instance should exploit <span class="code">NS_LITERAL_STRING</span> technology as well,
|
---|
2024 | around the initial declarations of the strings ... probably want to do this with
|
---|
2025 | <span class="code">NS_NAMED_LITERAL_STRING</span>.
|
---|
2026 |
|
---|
2027 |
|
---|
2028 |
|
---|
2029 | <hr>
|
---|
2030 | <pre>
|
---|
2031 | Date: Thu, 12 Oct 2000 00:57:28 -0400
|
---|
2032 | Subject: string answers
|
---|
2033 | </pre>
|
---|
2034 |
|
---|
2035 | <div class="source-code">
|
---|
2036 | <pre>
|
---|
2037 | nsresult
|
---|
2038 | DoSomething( nsAWritableString& answer )
|
---|
2039 | {
|
---|
2040 | nsresult rv = NS_OK;
|
---|
2041 |
|
---|
2042 | nsXPIDLString registry_data;
|
---|
2043 | Fetch("key", getter_Shares(registry_data));
|
---|
2044 |
|
---|
2045 | nsLiteralString path(not_my_string);
|
---|
2046 |
|
---|
2047 | PRInt32 first_colon = path.FindChar(PRUnichar(':'));
|
---|
2048 | if ( first_colon != -1 )
|
---|
2049 | {
|
---|
2050 | // convert ... extract path from |path|
|
---|
2051 | nsCOMPtr<nsILocalFile> localFile( do_CreateInstance(CID, &rv)
|
---|
2052 | );
|
---|
2053 | if ( localFile )
|
---|
2054 | {
|
---|
2055 |
|
---|
2056 | localFile->SetPersistentDescriptor(NS_ConvertUTF16toUTF8(path));
|
---|
2057 |
|
---|
2058 | nsXPIDLString converted_path;
|
---|
2059 | localFile->GetUnicodePath(getter_Copies(converted_path));
|
---|
2060 | answer = converted_path.get();
|
---|
2061 | }
|
---|
2062 | }
|
---|
2063 | else
|
---|
2064 | {
|
---|
2065 | answer = path;
|
---|
2066 | }
|
---|
2067 |
|
---|
2068 |
|
---|
2069 | return rv;
|
---|
2070 | }
|
---|
2071 | </pre>
|
---|
2072 | </div>
|
---|
2073 |
|
---|
2074 |
|
---|
2075 |
|
---|
2076 |
|
---|
2077 |
|
---|
2078 | <hr>
|
---|
2079 | <pre>
|
---|
2080 | Date: Thu, 12 Oct 2000 02:03:49 -0400
|
---|
2081 | Subject: Re: and the answer is ...
|
---|
2082 | </pre>
|
---|
2083 |
|
---|
2084 | <p>You can see from the line of code that you're on, that this should
|
---|
2085 | have been fine. <span class="code">nsMemory::Alloc</span> would be asked to allocate a 1 byte
|
---|
2086 | object. But it failed trying to allocate that. Which suggests that
|
---|
2087 | the allocator was busy and non-reentrant and the debugger tried to
|
---|
2088 | misuse it. Yes?
|
---|
2089 |
|
---|
2090 | <p>Of course, this doesn't solve your problem. Perhaps we need to go
|
---|
2091 | back to the idea of a function that returns a pointer to the first
|
---|
2092 | hunk of the string.
|
---|
2093 |
|
---|
2094 | <div class="source-code">
|
---|
2095 | <pre>
|
---|
2096 | const char*
|
---|
2097 | debug_string( const nsAReadableCString& aCString )
|
---|
2098 | {
|
---|
2099 | nsReadingIterator<char> iter;
|
---|
2100 | aCString.BeginReading(iter);
|
---|
2101 | return aCString.IsEmpty() ? "" : iter.get();
|
---|
2102 | }
|
---|
2103 | </pre>
|
---|
2104 | </div>
|
---|
2105 |
|
---|
2106 | <p>This code should work regardless of what the allocator is doing. The
|
---|
2107 | downsides are (a) it only returns the first hunk of the string, in the
|
---|
2108 | case of a multi-fragment string; and (b) that hunk <strong>might</strong> not be
|
---|
2109 | zero-terminated.
|
---|
2110 |
|
---|
2111 | <p>Hope this helps,
|
---|
2112 |
|
---|
2113 |
|
---|
2114 |
|
---|
2115 |
|
---|
2116 |
|
---|
2117 | <hr>
|
---|
2118 | <pre>
|
---|
2119 | Date: Thu, 12 Oct 2000 08:30:32 -0400
|
---|
2120 | Subject: Re: Self healing the cache :-)
|
---|
2121 | </pre>
|
---|
2122 |
|
---|
2123 | <p>At 3:04 PM -0400 10/11/00, Mike Shaver wrote:
|
---|
2124 | <pre class="email-quote">
|
---|
2125 | >NS_LITERAL_STRING(NS_XPCOM_SHUTDOWN_OBSERVER_ID);
|
---|
2126 | </pre>
|
---|
2127 |
|
---|
2128 | <p>Macro ugliness makes <span class="code">NS_LITERAL_STRING</span> inappropriate for use over
|
---|
2129 | other macros. In other words:
|
---|
2130 |
|
---|
2131 | <div class="source-code">
|
---|
2132 | <pre>
|
---|
2133 | NS_LITERAL_STRING("foo")
|
---|
2134 | </pre>
|
---|
2135 | </div>
|
---|
2136 |
|
---|
2137 | <p>is <strong>good</strong>.
|
---|
2138 |
|
---|
2139 | <div class="source-code">
|
---|
2140 | <pre>
|
---|
2141 | #define FOO "foo"
|
---|
2142 | NS_LITERAL_STRING(FOO)
|
---|
2143 | </pre>
|
---|
2144 | </div>
|
---|
2145 |
|
---|
2146 | <p>is <strong>bad</strong>. Why? Because it turns into
|
---|
2147 |
|
---|
2148 | <div class="source-code">
|
---|
2149 | <pre>
|
---|
2150 | nsLiteralString(LFOO, sizeof(LFOO)...
|
---|
2151 | </pre>
|
---|
2152 | </div>
|
---|
2153 |
|
---|
2154 | <p>and there is no <span class="code">LFOO</span>. Sorry. If you have to do this to a
|
---|
2155 | macro-ized string, do the magic by hand, e.g.,
|
---|
2156 |
|
---|
2157 | <div class="source-code">
|
---|
2158 | <pre>
|
---|
2159 | nsLiteralString(FOO, sizeof(FOO)/sizeof(PRUnichar)
|
---|
2160 | - 1) // length in characters, not counting the terminating PRUnichar('\0')
|
---|
2161 | </pre>
|
---|
2162 | </div>
|
---|
2163 |
|
---|
2164 | <p>or else if you don't care that <span class="code">nsLiteralString</span> will scan for the
|
---|
2165 | length, just say
|
---|
2166 |
|
---|
2167 | <div class="source-code">
|
---|
2168 | <pre>
|
---|
2169 | nsLiteralString(FOO)
|
---|
2170 | </pre>
|
---|
2171 | </div>
|
---|
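<p>The expansion problem can be reproduced with the standard preprocessor alone. The sketch below is not the real <span class="code">NS_LITERAL_STRING</span> definition; <span class="code">WIDEN</span> and the two stringizing helpers are hypothetical stand-ins that only capture, as text, what token pasting produces in each case.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for the token-pasting step (an assumption; the real macro differs).
#define WIDEN(s) L##s

// Two-level stringizer: the argument is fully macro-expanded before it is
// turned into text, so we can inspect the result of WIDEN.
#define STRINGIZE_EXPANDED(x) STRINGIZE_RAW(x)
#define STRINGIZE_RAW(x) #x

#define FOO "foo"

// Pasting onto an actual literal yields the wide literal L"foo".
const char* const kPastedOnLiteral = STRINGIZE_EXPANDED(WIDEN("foo"));

// Pasting onto a macro name happens before FOO is expanded, so the result
// is the single identifier LFOO, which nobody has defined.
const char* const kPastedOnMacro = STRINGIZE_EXPANDED(WIDEN(FOO));
```

<p>An operand of <span class="code">##</span> is not macro-expanded before pasting, which is why the macro name itself, rather than its value, ends up glued to the <span class="code">L</span>.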
2172 |
|
---|
2173 | <p>Hope this helps,
|
---|
2174 |
|
---|
2175 |
|
---|
2176 |
|
---|
2177 |
|
---|
2178 |
|
---|
2179 | <hr>
|
---|
2180 | <pre>
|
---|
2181 | Date: Thu, 12 Oct 2000 08:36:14 -0400
|
---|
2182 | Subject: Re: Self healing the cache :-)
|
---|
2183 | </pre>
|
---|
2184 |
|
---|
2185 | <p>Actually, I'm not even sure you can do it by hand, since you didn't
|
---|
2186 |
|
---|
2187 | <div class="source-code">
|
---|
2188 | <pre>
|
---|
2189 | #define FOO L"foo"
|
---|
2190 | </pre>
|
---|
2191 | </div>
|
---|
2192 |
|
---|
2193 | <p>and <strong>can't</strong> do that cross-platform. The other way around this is to
|
---|
2194 | define a global instead of a macro, that is, instead of saying
|
---|
2195 |
|
---|
2196 | <div class="source-code">
|
---|
2197 | <pre>
|
---|
2198 | #define FOO "foo"
|
---|
2199 | </pre>
|
---|
2200 | </div>
|
---|
2201 |
|
---|
2202 | <p>at the top of your file, say
|
---|
2203 |
|
---|
2204 | <div class="source-code">
|
---|
2205 | <pre>
|
---|
2206 | NS_NAMED_LITERAL_STRING(FOO, "foo")
|
---|
2207 | </pre>
|
---|
2208 | </div>
|
---|
2209 |
|
---|
2210 | <p>or else, if the macro was used only in one spot ... perhaps you could
|
---|
2211 | just eliminate the macro in favor of <span class="code">NS_NAMED_LITERAL</span> in situ.
|
---|
2212 |
|
---|
2213 | <p>Arghh. In this case, you may be stuck with the extra work of
|
---|
2214 | <span class="code">AssignWithConversion</span>.
|
---|
2215 |
|
---|
2216 |
|
---|
2217 |
|
---|
2218 |
|
---|
2219 |
|
---|
2220 | <hr>
|
---|
2221 | <pre>
|
---|
2222 | Date: Sun, 3 Dec 2000 16:38:07 -0400
|
---|
2223 | Subject: Re: another copy_string question
|
---|
2224 | </pre>
|
---|
2225 |
|
---|
2226 | <pre class="email-quote">
|
---|
2227 | >Is there a way to tell, inside the write() sink, if one is in the
|
---|
2228 | >final hunk? I need to do some special processing at the end.
|
---|
2229 | </pre>
|
---|
2230 |
|
---|
2231 | <p>No, there isn't. But you could move such special processing into the
|
---|
2232 | destructor of the sink. Remember, the sink is passed by reference, so
|
---|
2233 | you can exactly control its lifetime.
|
---|
2234 |
|
---|
2235 | <div class="source-code">
|
---|
2236 | <pre>
|
---|
2237 | {
|
---|
2238 | MySink sink;
|
---|
2239 | nsReadingIterator<PRUnichar> sourceStart = aStr.BeginReading();
|
---|
2240 | nsReadingIterator<PRUnichar> sourceEnd = aStr.EndReading();
|
---|
2241 | copy_string(sourceStart, sourceEnd, sink);
|
---|
2242 | // |sink| destructor executed here
|
---|
2243 | }
|
---|
2244 | </pre>
|
---|
2245 | </div>
|
---|
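<p>To make the destructor idea concrete, here is a self-contained sketch in standard C++. Everything in it is hypothetical: <span class="code">CountingSink</span>, its <span class="code">write</span> signature, and <span class="code">CopyFragments</span> (a stand-in for <span class="code">copy_string</span> over a multi-fragment source) are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sink. write() is called once per hunk and has no way of
// knowing whether the current hunk is the last one.
class CountingSink {
public:
  explicit CountingSink(std::string& report) : mReport(report), mTotal(0) {}

  std::size_t write(const char* data, std::size_t length)
  {
    mBuffer.append(data, length);
    mTotal += length;
    return length;
  }

  // The end-of-input processing lives here instead of in write().
  ~CountingSink()
  {
    mReport = mBuffer + " [" + std::to_string(mTotal) + " chars]";
  }

private:
  std::string& mReport;
  std::string mBuffer;
  std::size_t mTotal;
};

// Stand-in for copy_string: hands each fragment to the sink by reference.
template <typename Sink>
void CopyFragments(const std::vector<std::string>& fragments, Sink& sink)
{
  for (const std::string& fragment : fragments)
    sink.write(fragment.data(), fragment.size());
}

std::string Summarize(const std::vector<std::string>& fragments)
{
  std::string report;
  {
    CountingSink sink(report);
    CopyFragments(fragments, sink);
  }  // |sink| destructor runs here and fills in |report|
  return report;
}
```

<p>Because the sink is passed by reference, the caller's braces decide exactly when the final step happens.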
2246 |
|
---|
2247 | <p>Hope this helps,
|
---|
2248 |
|
---|
2249 |
|
---|
2250 |
|
---|
2251 |
|
---|
2252 |
|
---|
2253 | <hr>
|
---|
2254 | <pre>
|
---|
2255 | Date: Fri, 15 Dec 2000 20:02:08 -0400
|
---|
2256 | Subject: fragment of code
|
---|
2257 | </pre>
|
---|
2258 |
|
---|
2259 | <div class="source-code">
|
---|
2260 | <pre>
|
---|
2261 | nsPromiseFlatString flatKey(aReadable);
|
---|
2262 |
|
---|
2263 | flatKey.get()
|
---|
2264 | </pre>
|
---|
2265 | </div>
|
---|
2266 |
|
---|
2267 |
|
---|
2268 |
|
---|
2269 |
|
---|
2270 |
|
---|
2271 |
|
---|
2272 | <hr>
|
---|
2273 | <pre>
|
---|
2274 | Date: Tue, 16 Jan 2001 16:47:37 -0400
|
---|
2275 | Subject: Re: a few string questions...
|
---|
2276 | </pre>
|
---|
2277 |
|
---|
2278 | >I've accumulated a few questions I've been wanting to ask you, mostly
|
---|
2279 | >about string stuff. Nothing urgent, but I want to ask them before I
|
---|
2280 | >forget. So here goes...:
|
---|
2281 | >
|
---|
2282 | >1) Is it acceptable to use nsLiteralCString or nsLiteralString on
|
---|
2283 | >something that's not a literal? This can be useful in some places,
|
---|
2284 | >for example, to convert a char* to PRUnichar*:
|
---|
2285 | >
|
---|
2286 | >PRUnichar* new = ToNewUnicode(nsLiteralCString(myCharPtr));
|
---|
2287 |
|
---|
2288 | <p>This is explicitly allowed. That's why I'm proposing to change the
|
---|
2289 | names of those classes to <span class="code">nsLocal[C]String</span>.
|
---|
2290 |
|
---|
2291 |
|
---|
2292 | >2) Should nsString2x.h and nsString2x.cpp go away? They look like a
|
---|
2293 | >never-completed rewrite or something...
|
---|
2294 |
|
---|
2295 | <p>Yes. They should go away. They are uncompleted [old] bullshit,
|
---|
2296 | exactly as you diagnosed.
|
---|
2297 |
|
---|
2298 | <p>I'll look into the other two questions.
|
---|
2299 |
|
---|
2300 |
|
---|
2301 |
|
---|
2302 |
|
---|
2303 |
|
---|
2304 | <hr>
|
---|
2305 | <pre>
|
---|
2306 | Date: Thu, 1 Feb 2001 15:12:41 -0400
|
---|
2307 | Subject: Re: [Fwd: bad string, bad string]
|
---|
2308 | </pre>
|
---|
2309 |
|
---|
2310 | <p>We've been removing implicit conversion operators because they
|
---|
2311 | <strong>always</strong> lead to trouble. They usually make it harder to pick the
|
---|
2312 | right function when overloading is involved. In the past they have also
|
---|
2313 | led to huge performance suckage: the implicit operator made us pick the
|
---|
2314 | wrong function, so we ended up doing conversions that we didn't
|
---|
2315 | need.
|
---|
2316 |
|
---|
2317 | <p>It's borderline when the class implements something that is <strong>so</strong>
|
---|
2318 | close, as with a guaranteed flat string or an <span class="code">nsCOMPtr</span> ... but the
|
---|
2319 | general recommendation is to avoid implicit conversions.
|
---|
2320 |
|
---|
2321 | <p>See bug #53057.
|
---|
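<p>The hazard is easy to show outside of Mozilla. In the sketch below, <span class="code">Text</span>, <span class="code">CountChars</span>, and the counter are all hypothetical; the point is only that an implicit operator lets a call compile against a raw-pointer function without the caller ever seeing the conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Counts how often the implicit operator was used behind the caller's back.
static int gHiddenConversions = 0;

// Hypothetical string class (not a mozilla/string class).
class Text {
public:
  explicit Text(const std::string& chars) : mChars(chars) {}

  // Implicit conversion: any function taking |const char*| now accepts a Text.
  operator const char*() const
  {
    ++gHiddenConversions;
    return mChars.c_str();
  }

  // Explicit accessor: the conversion is visible at the call site.
  const char* get() const { return mChars.c_str(); }

  std::size_t Length() const { return mChars.size(); }

private:
  std::string mChars;
};

// Only knows about raw pointers, so it must rescan for the length even
// though the Text object already knew it.
std::size_t CountChars(const char* raw)
{
  return std::strlen(raw);
}
```

<p>Calling <span class="code">CountChars(text)</span> compiles and quietly goes through the operator; writing <span class="code">CountChars(text.get())</span> does the same work, but the cost is at least spelled out where a reviewer can see it.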
2322 |
|
---|
2323 |
|
---|
2324 |
|
---|
2325 |
|
---|
2326 |
|
---|
2327 | <hr>
|
---|
2328 | <pre>
|
---|
2329 | Date: Tue, 6 Feb 2001 18:52:23 -0400
|
---|
2330 | Subject: seeking review for bug #57087
|
---|
2331 | </pre>
|
---|
2332 |
|
---|
2333 | <p> bug:
|
---|
2334 | <a class="exact-uri" href="http://bugzilla.mozilla.org/show_bug.cgi?id=57087">http://bugzilla.mozilla.org/show_bug.cgi?id=57087</a>
|
---|
2335 |
|
---|
2336 | patch:
|
---|
2337 | <a class="exact-uri" href="http://bugzilla.mozilla.org/showattachment.cgi?attach_id=24576">http://bugzilla.mozilla.org/showattachment.cgi?attach_id=24576</a>
|
---|
2338 |
|
---|
2339 | <p>This patch is supposed to add the ability to define very long literal
|
---|
2340 | strings more easily by breaking lines, e.g.,
|
---|
2341 |
|
---|
2342 | <div class="source-code">
|
---|
2343 | <pre>
|
---|
2344 | NS_MULTILINE_LITERAL( NS_L("This is the start of a very long line")
|
---|
2345 | NS_L(" which actually continues across")
|
---|
2346 | NS_L(" a couple more.") )
|
---|
2347 | </pre>
|
---|
2348 | </div>
|
---|
2349 |
|
---|
2350 | <p>The main danger in this scheme is callers who omit the inner <span class="code">NS_L</span>
|
---|
2351 | wrapping. Though I believe this will be caught at compile time as the
|
---|
2352 | wrong type initializer.
|
---|
2353 |
|
---|
2354 | <p>Seeking input from everybody, and waterson in particular.
|
---|
2355 |
|
---|
2356 |
|
---|
2357 |
|
---|
2358 |
|
---|
2359 |
|
---|
2360 | <hr>
|
---|
2361 | <pre>
|
---|
2362 | Date: Wed, 14 Feb 2001 16:09:10 -0400
|
---|
2363 | Subject: Re: Question...
|
---|
2364 | </pre>
|
---|
2365 |
|
---|
2366 | <p>There are some utilities in "xpcom/ds/nsReadableUtils.h". In
|
---|
2367 | particular, if you want to get back a new heap-allocated ASCII string
|
---|
2368 | with the minimal work, you would say
|
---|
2369 |
|
---|
2370 | <div class="source-code">
|
---|
2371 | <pre>
|
---|
2372 | PRUnichar* sourceChars = ...;
|
---|
2373 |
|
---|
2374 | char* destChars = ToNewCString(nsLiteralString(sourceChars));
|
---|
2375 | </pre>
|
---|
2376 | </div>
|
---|
2377 |
|
---|
2378 |
|
---|
2379 | <p>It's more efficient if you happen to already know the length. If you
|
---|
2380 | don't, don't bother counting; the constructor for
|
---|
2381 | <span class="code">nsLiteralString</span> will do that for you. If you do, then call it like this:
|
---|
2382 |
|
---|
2383 | <div class="source-code">
|
---|
2384 | <pre>
|
---|
2385 | destChars = ToNewCString( nsLiteralString(sourceChars, length) );
|
---|
2386 | </pre>
|
---|
2387 | </div>
|
---|
2388 |
|
---|
2389 | <p>Other routines in that file will help you if, for instance, you wanted
|
---|
2390 | to translate into a buffer you had already allocated.
|
---|
2391 |
|
---|
2392 | <p>Hope this helps,
|
---|
2393 |
|
---|
2394 |
|
---|
2395 |
|
---|
2396 |
|
---|
2397 |
|
---|
2398 | <hr>
|
---|
2399 | <pre>
|
---|
2400 | Date: Fri, 23 Feb 2001 03:12:58 -0400
|
---|
2401 | Subject: string snippet
|
---|
2402 | </pre>
|
---|
2403 |
|
---|
2404 | <div class="source-code">
|
---|
2405 | <pre>
|
---|
2406 | nsCString aInput;
|
---|
2407 |
|
---|
2408 |
|
---|
2409 |
|
---|
2410 | nsReadingIterator<char> search_start;
|
---|
2411 | aInput.BeginReading(search_start);
|
---|
2412 |
|
---|
2413 | nsReadingIterator<char> search_end;
|
---|
2414 | aInput.EndReading(search_end);
|
---|
2415 |
|
---|
2416 | if ( FindCharInReadable(':', search_start, search_end) )
|
---|
2417 | {
|
---|
2418 | ++search_start;
|
---|
2419 | return ToNewCString( Substring(aInput, search_start, search_end)
|
---|
2420 | );
|
---|
2421 | }
|
---|
2422 | </pre>
|
---|
2423 | </div>
|
---|
2424 |
|
---|
2425 |
|
---|
2426 |
|
---|
2427 |
|
---|
2428 |
|
---|
2429 |
|
---|
2430 | <hr>
|
---|
2431 | <pre>
|
---|
2432 | Date: Wed, 7 Mar 2001 19:44:08 -0400
|
---|
2433 | Subject: string help
|
---|
2434 | </pre>
|
---|
2435 |
|
---|
2436 | <p>Here you go, Mike:
|
---|
2437 |
|
---|
2438 | <a class="exact-uri" href="http://scottcollins.net/journal/discussion/mjudge-scratch.cpp">http://scottcollins.net/journal/discussion/mjudge-scratch.cpp</a>
|
---|
2439 |
|
---|
2440 |
|
---|
2441 |
|
---|
2442 |
|
---|
2443 |
|
---|
2444 |
|
---|
2445 | <hr>
|
---|
2446 | <pre>
|
---|
2447 | Date: Fri, 9 Mar 2001 20:56:07 -0400
|
---|
2448 | Subject: Re: string assertions
|
---|
2449 | </pre>
|
---|
2450 |
|
---|
2451 | <p>If you get an iterator into a string and you advance it all the way to
|
---|
2452 | the end of the string, and then <strong>keep</strong> trying to advance it, you hit
|
---|
2453 | this assert. This could happen, for example, if you tried to copy 10
|
---|
2454 | characters out of a 9 character string. I've tried to make this
|
---|
2455 | impossible to get to. As far as I know, all my routines trim requests
|
---|
2456 | in advance of manipulating iterators. When you see this, you should
|
---|
2457 | get the stack. That will take you right to the bad spot.
|
---|
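<p>"Trimming the request in advance" can be sketched as follows, using <span class="code">std::string</span> in place of the mozilla classes. <span class="code">CopyClamped</span> is a hypothetical helper, not one of the routines referred to above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copies up to |requested| characters from |source|,
// beginning at offset |start|. Both the offset and the count are trimmed to
// what the string actually holds before any iterator is moved, so no
// iterator is ever advanced past the end.
std::string CopyClamped(const std::string& source,
                        std::size_t start,
                        std::size_t requested)
{
  start = std::min(start, source.size());
  const std::size_t available = source.size() - start;
  const std::size_t count = std::min(requested, available);

  typedef std::string::difference_type Distance;
  std::string::const_iterator first = source.begin() + static_cast<Distance>(start);
  std::string::const_iterator last = first + static_cast<Distance>(count);
  return std::string(first, last);
}
```

<p>Asking for 10 characters from a 9 character string simply yields the 9 that exist.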
2458 |
|
---|
2459 |
|
---|
2460 |
|
---|
2461 |
|
---|
2462 |
|
---|
2463 | <hr>
|
---|
2464 | <pre>
|
---|
2465 | Date: Sat, 31 Mar 2001 11:04:03 -0400
|
---|
2466 | Subject: Re: Sun bustage and string advice
|
---|
2467 | </pre>
|
---|
2468 |
|
---|
2469 | <p>You do know you are comparing two pointers now? It seems unlikely
|
---|
2470 | those two pointers would ever be the same pointer. You probably want
|
---|
2471 | to say something like
|
---|
2472 |
|
---|
2473 | <div class="source-code">
|
---|
2474 | <pre>
|
---|
2475 | NS_LITERAL_STRING("foo").Equals(aTopic) // or
|
---|
2476 |
|
---|
2477 | NS_LITERAL_STRING("foo") == nsLiteralString(aTopic)
|
---|
2478 | </pre>
|
---|
2479 | </div>
|
---|
2480 |
|
---|
2481 | <p>...so that you compare the <strong>contents</strong> of two strings. Right now,
|
---|
2482 | you're just testing to see if two pointers both point to the same
|
---|
2483 | location in memory. A lot of people make this mistake. I would like
|
---|
2484 | to make it obvious to people that comparing two pointers does not
|
---|
2485 | compare strings. Can you tell me what gave you that impression so
|
---|
2486 | that I can figure out how to better educate people not to do this? By
|
---|
2487 | the way, it's not that I don't <strong>want</strong> to make this compare two
|
---|
2488 | strings; it's that in C++, you can't overload operators for built-in
|
---|
2489 | types. And pointers are built-in types. So I can't make
|
---|
2490 | <span class="code">operator==(const PRUnichar*, const PRUnichar*)</span> do anything different
|
---|
2491 | than it already does, which is the same thing it does for any other
|
---|
2492 | pointer.
|
---|
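<p>The difference shows up with nothing but standard C++. In this sketch <span class="code">char16_t</span> stands in for <span class="code">PRUnichar</span>, and both function names are made up for the example.

```cpp
#include <cassert>
#include <string>

// Built-in pointer comparison: true only if both arguments hold the same
// address. The characters are never examined.
bool SamePointer(const char16_t* a, const char16_t* b)
{
  return a == b;
}

// Content comparison: wrap the zero-terminated buffers in string objects
// and let the string class compare the characters.
bool SameContents(const char16_t* a, const char16_t* b)
{
  return std::u16string(a) == std::u16string(b);
}
```

<p>Two separate buffers that both contain "foo" have equal contents but live at different addresses, so only the second function reports them as equal.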
2493 |
|
---|
2494 |
|
---|
2495 |
|
---|
2496 |
|
---|
2497 |
|
---|
2498 |
|
---|
2499 | </div>
|
---|
2500 |
|
---|
2501 |
|
---|
2502 |
|
---|
2503 | <!-- .................................................................End Matter -->
|
---|
2504 |
|
---|
2505 |
|
---|
2506 |
|
---|
2507 | </body>
|
---|
2508 | </html>
|
---|