<?xml version="1.0"?>
<!DOCTYPE kanjidic2 [
<!-- Version 1.3
This is the DTD of the XML-format kanji file combining information from
the KANJIDIC and KANJD212 files. It is intended to be largely
self-documenting, with each field being accompanied by an explanatory
comment.

The file covers the following kanji:
(a) the 6,355 kanji from JIS X 0208;
(b) the 5,801 kanji from JIS X 0212;
(c) the 3,625 kanji from JIS X 0213 as follows:
    (i) the 2,741 kanji which are also in JIS X 0212 have
    JIS X 0213 code-points (kuten) added to the existing entry;
    (ii) the 884 "new" kanji have new entries.

At the end of the explanation for a number of fields there is a tag
with the format [N]. This indicates the leading letter(s) of the
equivalent field in the KANJIDIC and KANJD212 files.

The KANJIDIC documentation should also be read for additional
details about the contents of the file.
--><!ELEMENT kanjidic2 (header , character*)>
<!ELEMENT header (file_version , database_version , date_of_creation)>
<!--
The single header element will contain identification information
about the version of the file.
--><!ELEMENT file_version (#PCDATA)>
<!--
This field denotes the version of the kanjidic2 structure, as more
than one version may exist.
--><!ELEMENT database_version (#PCDATA)>
<!--
The version of the file, in the format YYYY-NN, where NN will be
a number starting with 01 for the first version released in a
calendar year, then increasing for each version in that year.
--><!ELEMENT date_of_creation (#PCDATA)>
<!--
The date the file was created in international format (YYYY-MM-DD).
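
Example of a complete header (an illustrative sketch only; the three
values below are hypothetical placeholders, not taken from any
released file):
    <header>
        <file_version>1</file_version>
        <database_version>2006-01</database_version>
        <date_of_creation>2006-01-15</date_of_creation>
    </header>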
--><!ELEMENT character (literal , codepoint , radical , misc , dic_number? , query_code? , reading_meaning? , nanori?)*>
<!ELEMENT literal (#PCDATA)>
<!--
The character itself in UTF8 coding.
--><!ELEMENT codepoint (cp_value)+>
<!--
The codepoint element states the code of the character in the various
character set standards.
--><!ELEMENT cp_value (#PCDATA)>
<!--
The cp_value contains the codepoint of the character in a particular
standard. The standard will be identified in the cp_type attribute.
--><!ATTLIST cp_value cp_type CDATA #REQUIRED>
<!--
The cp_type attribute states the coding standard applying to the
element. The values assigned so far are:
    jis208 - JIS X 0208-1997 - kuten coding (nn-nn)
    jis212 - JIS X 0212-1990 - kuten coding (nn-nn)
    jis213 - JIS X 0213-2000 - kuten coding (p-nn-nn)
    ucs - Unicode 4.0 - hex coding (4 or 5 hexadecimal digits)
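
Example (a sketch; the values are those usually quoted for the kanji
at U+4E9C and should be checked against the data file):
    <codepoint>
        <cp_value cp_type="ucs">4e9c</cp_value>
        <cp_value cp_type="jis208">16-01</cp_value>
    </codepoint>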
--><!ELEMENT radical (rad_value)+>
<!ELEMENT rad_value (#PCDATA)>
<!--
The radical number, in the range 1 to 214. The particular
classification type is stated in the rad_type attribute.
--><!ATTLIST rad_value rad_type CDATA #REQUIRED>
<!--
The rad_type attribute states the type of radical classification.
    classical - as recorded in the KangXi Zidian.
    nelson - as used in the Nelson "Modern Japanese-English
        Character Dictionary" (i.e. the Classic, not the New Nelson).
        This will only be used where Nelson reclassified the kanji.
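
Example (a sketch with assumed values: a kanji filed under radical 7
in the classical system which Nelson reclassified under radical 1):
    <radical>
        <rad_value rad_type="classical">7</rad_value>
        <rad_value rad_type="nelson">1</rad_value>
    </radical>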
--><!ELEMENT misc (grade? , stroke_count+ , variant* , freq* , rad_name*)>
<!ELEMENT grade (#PCDATA)>
<!--
The Jouyou Kanji grade level. 1 through 6 indicate the grade in which
the kanji is taught in Japanese schools. 8 indicates it is one of the
remaining Jouyou Kanji to be learned in junior high school, and 9
indicates it is a Jinmeiyou (for use in names) kanji. [G]
--><!ELEMENT stroke_count (#PCDATA)>
<!--
The stroke count of the kanji, including the radical. If more than
one, the first is considered the accepted count, while subsequent ones
are common miscounts. (See Appendix E. of the KANJIDIC documentation
for some of the rules applied when counting strokes in some of the
radicals.) [S]
--><!ELEMENT variant (#PCDATA)>
<!--
A cross-reference code to another kanji, usually regarded as a variant.
The type of cross-reference is given in the var_type attribute.
--><!ATTLIST variant var_type CDATA #REQUIRED>
<!--
The var_type attribute indicates the type of variant code. The current
values are:
    jis208 - in JIS X 0208 - kuten coding
    jis212 - in JIS X 0212 - kuten coding
    jis213 - in JIS X 0213 - kuten coding
    deroo - De Roo number - numeric
    njecd - Halpern NJECD index number - numeric
    s_h - The Kanji Dictionary (Spahn & Hadamitzky) - descriptor
    nelson - "Classic" Nelson - numeric
    oneill - Japanese Names (O'Neill) - numeric
--><!ELEMENT freq (#PCDATA)>
<!--
A frequency-of-use ranking. The 2,500 most-used characters have a
ranking; those characters that lack this field are not ranked. The
frequency is a number from 1 to 2,500 that expresses the relative
frequency of occurrence of a character in modern Japanese. This is
based on a survey in newspapers, so it is biased towards kanji
used in newspaper articles. The discrimination between the less
frequently used kanji is not strong.
--><!ELEMENT rad_name (#PCDATA)>
<!--
When the kanji is itself a radical and has a name, this element
contains the name (in hiragana). [T2]
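
Example of a complete misc element (a sketch; the values are
illustrative and should be checked against the data file). It shows
a junior-high-school Jouyou kanji with 7 strokes, one JIS X 0208
variant and a frequency ranking; rad_name is omitted because the
kanji is not itself a radical:
    <misc>
        <grade>8</grade>
        <stroke_count>7</stroke_count>
        <variant var_type="jis208">48-19</variant>
        <freq>1509</freq>
    </misc>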
--><!ELEMENT dic_number (dic_ref)+>
<!--
This element contains the index numbers and similar unstructured
information such as page numbers in a number of published dictionaries,
and instructional books on kanji.
--><!ELEMENT dic_ref (#PCDATA)>
<!--
Each dic_ref contains an index number. The particular dictionary,
etc. is defined by the dr_type attribute.
--><!ATTLIST dic_ref dr_type CDATA #REQUIRED>
<!--
The dr_type defines the dictionary or reference book, etc. to which
the dic_ref element applies. The initial allocation is:
    nelson_c - "Modern Reader's Japanese-English Character Dictionary",
        edited by Andrew Nelson (now published as the "Classic"
        Nelson).
    nelson_n - "The New Nelson Japanese-English Character Dictionary",
        edited by John Haig.
    halpern_njecd - "New Japanese-English Character Dictionary",
        edited by Jack Halpern.
    halpern_kkld - "Kanji Learners Dictionary" (Kodansha) edited by
        Jack Halpern.
    heisig - "Remembering The Kanji" by James Heisig.
    gakken - "A New Dictionary of Kanji Usage" (Gakken).
    oneill_names - "Japanese Names", by P.G. O'Neill.
    oneill_kk - "Essential Kanji" by P.G. O'Neill.
    moro - "Daikanwajiten" compiled by Morohashi. For some kanji two
        additional attributes are used: m_vol: the volume of the
        dictionary in which the kanji is found, and m_page: the page
        number in the volume.
    henshall - "A Guide To Remembering Japanese Characters" by
        Kenneth G. Henshall.
    sh_kk - "Kanji and Kana" by Spahn and Hadamitzky.
    sakade - "A Guide To Reading and Writing Japanese" edited by
        Florence Sakade.
    henshall3 - "A Guide To Reading and Writing Japanese" 3rd
        edition, edited by Henshall, Seeley and De Groot.
    tutt_cards - Tuttle Kanji Cards, compiled by Alexander Kask.
    crowley - "The Kanji Way to Japanese Language Power" by
        Dale Crowley.
    kanji_in_context - "Kanji in Context" by Nishiguchi and Kono.
    busy_people - "Japanese For Busy People" vols I-III, published
        by the AJLT. The codes are the volume.chapter.
    kodansha_compact - the "Kodansha Compact Kanji Guide".
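
Example (a sketch; the index numbers are illustrative and should be
checked against the data file). The "moro" entry carries the optional
m_vol and m_page attributes described above:
    <dic_number>
        <dic_ref dr_type="nelson_c">43</dic_ref>
        <dic_ref dr_type="heisig">1809</dic_ref>
        <dic_ref dr_type="moro" m_vol="1" m_page="0525">272</dic_ref>
    </dic_number>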
--><!ATTLIST dic_ref m_vol CDATA #IMPLIED>
<!--
See above under "moro".
--><!ATTLIST dic_ref m_page CDATA #IMPLIED>
<!--
See above under "moro".
--><!ELEMENT query_code (q_code)+>
<!--
These codes contain information relating to the glyph, and can be used
for finding a required kanji. The type of code is defined by the
qc_type attribute.
--><!ELEMENT q_code (#PCDATA)>
<!--
The q_code contains the actual query-code value, according to the
qc_type attribute.
--><!ATTLIST q_code qc_type CDATA #REQUIRED>
<!--
The qc_type attribute defines the type of query code. The current values
are:
    skip - Halpern's SKIP (System of Kanji Indexing by Patterns)
        code. The format is n-nn-nn. See the KANJIDIC documentation
        for a description of the code and restrictions on the
        commercial use of this data. [P]
    sh_desc - the descriptor codes for The Kanji Dictionary (Tuttle
        1996) by Spahn and Hadamitzky. They are in the form nxnn.n,
        e.g. 3k11.2, where the kanji has 3 strokes in the
        identifying radical, it is radical "k" in the SH
        classification system, there are 11 other strokes, and it is
        the 2nd kanji in the 3k11 sequence. (I am very grateful to
        Mark Spahn for providing the list of these descriptor codes
        for the kanji in this file.) [I]
    four_corner - the "Four Corner" code for the kanji. This is a code
        invented by Wang Chen in 1928. See the KANJIDIC documentation
        for an overview of the Four Corner System. [Q]
    deroo - the codes developed by the late Father Joseph De Roo, and
        published in his book "2001 Kanji" (Bojinsha). Fr De Roo
        gave his permission for these codes to be included. [DR]
    misclass - a possible misclassification of the kanji according
        to one of the code types. (See the "Z" codes in the KANJIDIC
        documentation for more details.)
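
Example (a sketch; the code values are illustrative and should be
checked against the data file):
    <query_code>
        <q_code qc_type="skip">4-7-1</q_code>
        <q_code qc_type="sh_desc">0a7.14</q_code>
        <q_code qc_type="four_corner">1010.6</q_code>
        <q_code qc_type="deroo">3273</q_code>
    </query_code>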
--><!ELEMENT reading_meaning (rmgroup* , nanori*)>
<!--
The readings for the kanji in several languages, and the meanings, also
in several languages. The readings and meanings are grouped to enable
the handling of the situation where the meaning is differentiated by
reading. [T1]
--><!ELEMENT nanori (#PCDATA)>
<!--
Japanese readings that are now only associated with names.
--><!ELEMENT rmgroup (reading* , meaning*)>
<!ELEMENT reading (#PCDATA)>
<!--
The reading element contains the reading or pronunciation
of the kanji.
--><!ATTLIST reading r_type CDATA #REQUIRED>
<!--
The r_type attribute defines the type of reading in the reading
element. The current values are:
    pinyin - the modern PinYin romanization of the Chinese reading
        of the kanji. The tones are represented by a concluding
        digit. [Y]
    korean_r - the romanized form of the Korean reading(s) of the
        kanji. The readings are in the (Republic of Korea) Ministry
        of Education style of romanization. [W]
    korean_h - the Korean reading(s) of the kanji in hangul.
    ja_on - the "on" Japanese reading of the kanji, in katakana. A
        second attribute r_status, if present, will indicate with
        a value of "jy" that the reading is approved for a
        "Jouyou kanji".
    ja_kun - the "kun" Japanese reading of the kanji, in hiragana.
        Where relevant the okurigana is also included separated by a
        ".". Readings associated with prefixes and suffixes are
        marked with a "-". A second attribute r_status, if present,
        will indicate with a value of "jy" that the reading is
        approved for a "Jouyou kanji".
--><!ATTLIST reading r_status CDATA #IMPLIED>
<!--
See under ja_on and ja_kun above.
--><!ELEMENT meaning (#PCDATA)>
<!--
The meaning associated with the kanji.
--><!ATTLIST meaning m_lang CDATA #IMPLIED>
<!--
The m_lang attribute defines the target language of the meaning. It
will be coded using the two-letter language code from the ISO 639
standard. When absent, the value "en" (i.e. English) is implied. [{}]
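
Example of a reading_meaning element (a sketch; the readings and
meanings are illustrative, and the r_status and m_lang values are
included only to show where those attributes go):
    <reading_meaning>
        <rmgroup>
            <reading r_type="pinyin">ya4</reading>
            <reading r_type="korean_r">a</reading>
            <reading r_type="ja_on" r_status="jy">ア</reading>
            <reading r_type="ja_kun">つ.ぐ</reading>
            <meaning>Asia</meaning>
            <meaning m_lang="fr">Asie</meaning>
        </rmgroup>
        <nanori>や</nanori>
    </reading_meaning>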
-->]>
<kanjidic2>
</kanjidic2>