This page details the SVG extensions used by KanjiVG to add information on components of kanji, such as the radicals, as well as information on the expected shapes of strokes.
All of the SVG extensions used by KanjiVG are XML attributes with an added kvg: prefix, a "namespace" in XML terminology.
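As a rough illustration, attributes in the kvg namespace can be read with Python's standard xml.etree.ElementTree module. This is only a sketch: the namespace URI below is an assumption (check the declaration in your copy of the data), the sample SVG is hand-written rather than copied from a KanjiVG file, and kvg_attr is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the kvg prefix; verify against your data.
KVG_NS = "http://kanjivg.tagaini.net"
SVG_NS = "http://www.w3.org/2000/svg"

# Hand-written sample, not copied from a KanjiVG file.
SAMPLE = f"""<svg xmlns="{SVG_NS}" xmlns:kvg="{KVG_NS}">
  <g kvg:element="仮">
    <g kvg:element="亻" kvg:original="人"/>
  </g>
</svg>"""

def kvg_attr(node, name):
    """Return the kvg:<name> attribute of node, or None if it is absent."""
    # ElementTree stores namespaced attributes as "{uri}localname".
    return node.get(f"{{{KVG_NS}}}{name}")

root = ET.fromstring(SAMPLE)
groups = list(root.iter(f"{{{SVG_NS}}}g"))
inner = groups[-1]  # the innermost group in this sample
print(kvg_attr(inner, "element"), kvg_attr(inner, "original"))
```

ElementTree expands the prefix into the full URI, so the lookup does not depend on the prefix actually being spelled kvg in a given file.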
There are two root SVG groups. The StrokePaths group is a set of standard SVG paths that gives the strokes of the kanji. The strokes are ordered in the stroke order given by the references. It also contains groups which describe the structure of the kanji, such as its decomposition into elements, and which strokes comprise its radical.
The StrokeNumbers group is an optional group that gives a convenient position for stroke-order numbers, useful for displaying in printed material for instance. The stroke numbers are positioned near their corresponding strokes, at their starting points.
Undocumented and unknown features, and notes on possible flaws in the data, are distinguished with a pale green background.
Kanji are often made of several components, referred to as elements in this documentation. For instance, 頑 can be seen as a combination of 元 on its left and 頁 on its right. KanjiVG uses SVG groups to reflect this structure, so the KanjiVG entry contains two groups under the parent StrokePaths group containing the left and right side of the kanji. The 元 and 頁 are encoded in these groups under their element attributes. SVG groups provide an elegant way to collect strokes into a given group.
In the case that the elements do not consist of contiguous strokes, KanjiVG uses the part and number XML attributes to distinguish them.
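The nesting described above can be walked recursively. The following sketch assumes the namespace URI shown in the code and uses a hand-written sample modelled on the 頑 example, with stroke paths left out; decompose is a hypothetical helper name, not part of KanjiVG.

```python
import xml.etree.ElementTree as ET

SVG = "{http://www.w3.org/2000/svg}"
KVG = "{http://kanjivg.tagaini.net}"  # assumed namespace URI

# Hand-written sample modelled on the 頑 example; stroke paths are omitted.
SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:kvg="http://kanjivg.tagaini.net">
  <g id="StrokePaths">
    <g kvg:element="頑">
      <g kvg:element="元"/>
      <g kvg:element="頁"/>
    </g>
  </g>
</svg>"""

def decompose(group):
    """Return (element, children) for a group, recursing into child groups."""
    children = [decompose(g) for g in group.findall(SVG + "g")]
    return (group.get(KVG + "element"), children)

root = ET.fromstring(SAMPLE)
top = root.find(SVG + "g")  # the StrokePaths group in this sample
print(decompose(top))
```

Groups without an element attribute, such as the root group in this sample, appear with None as their element.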
This section explains the SVG attributes used in the groups under the StrokePaths group which describe the structure of a kanji, such as radicals and other sub-elements.
The KanjiVG identification number for this group. It contains the Unicode value of the kanji as a five-digit lower-case hexadecimal number, followed by a hyphen and the letter "g", followed by a decimal number from one to the total number of groups.
The group ID numbers are always consecutive positive whole numbers.
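A minimal sketch of building and parsing group IDs in the format described above. The function names are hypothetical. Accepting an optional kvg: prefix is an assumption made by analogy with stroke IDs, since this section does not mention a prefix for groups.

```python
import re

# Five lower-case hex digits, "-g", then a positive decimal number.
# The optional "kvg:" prefix is an assumption, not stated for group IDs.
GROUP_ID = re.compile(r"^(?:kvg:)?(?P<code>[0-9a-f]{5})-g(?P<n>[1-9][0-9]*)$")

def group_id(kanji, n):
    """Build the ID of group number n for a single kanji character."""
    return f"{ord(kanji):05x}-g{n}"

def parse_group_id(value):
    """Return (kanji, group number), or None if value does not match."""
    m = GROUP_ID.match(value)
    if m is None:
        return None
    return chr(int(m.group("code"), 16)), int(m.group("n"))

print(group_id("圖", 2))
print(parse_group_id("05716-g3"))
```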
All these attributes are placed under the kvg namespace, for example kvg:original="家".
This attribute specifies which kanji best represents the group physically. It should be the Unicode character that resembles the group as much as possible.
The value of element on the outermost group of the strokes is the same as the kanji represented by the SVG.
This relatively rare attribute identifies an element that occurs several times in a kanji when, because of the stroke order, more than one of those occurrences is broken into parts, so that the part attribute has to be used for more than one of them. In other words, the number attribute uniquely identifies which occurrence a part belongs to when that would otherwise be ambiguous.
It is only used in a few places in KanjiVG where there are two different sets of the same element, such as 05716.svg, the character 圖, where there are four 口 elements, two of which are broken into parts one and two due to the stroke order. Please inspect the source code of that SVG file to understand what the kvg:number attribute does.
Generally, elements which can be represented by contiguous blocks of strokes do not have a number attribute, even if multiple cases of the same element occur in a character, so, for example, the 口 elements of 品 do not have a number attribute.
This attribute specifies which kanji represents the group from a semantic point of view. This attribute only needs to be present if there is a difference between the semantic and physical representation of the group.
For example, 仮 has two groups. The left one has 亻 (called ninben) for its element attribute, and 人, meaning "person", for its original attribute, because ninben is a variation of 人. However, the right side has 反 for element, which is not a variation, so an original attribute is not necessary.
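The rule above amounts to a simple fallback: use original when it is present, otherwise element. A minimal sketch, in which attrs is a hypothetical plain dictionary of a group's kvg attributes keyed by local name:

```python
def semantic_char(attrs):
    """Return the character that represents a group semantically.

    attrs is a hypothetical dict of the group's kvg attributes,
    keyed by local name (for example "element", "original").
    """
    # "original" is only present when it differs from "element".
    return attrs.get("original") or attrs.get("element")

left = {"element": "亻", "original": "人"}
right = {"element": "反"}
print(semantic_char(left), semantic_char(right))
```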
When the strokes that form a component are not consecutive in the stroke order, the component may be spread over several groups of paths in the file. The part attribute numbers these groups and marks them as belonging to the same component. There is also a number attribute which can be used in the rare cases that two groups with the same element have non-consecutive strokes within the same character.
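One way to reassemble a split component is to key the groups on their element and number values and order them by part. The sketch below assumes the namespace URI shown in the code. The sample uses placeholder letters and stroke IDs instead of real KanjiVG data, and strokes_by_component is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SVG = "{http://www.w3.org/2000/svg}"
KVG = "{http://kanjivg.tagaini.net}"  # assumed namespace URI

# Placeholder data: "B" is interrupted by "C" in the stroke order.
SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:kvg="http://kanjivg.tagaini.net">
  <g kvg:element="A">
    <g kvg:element="B" kvg:part="1"><path id="s1"/><path id="s2"/></g>
    <g kvg:element="C"><path id="s3"/></g>
    <g kvg:element="B" kvg:part="2"><path id="s4"/></g>
  </g>
</svg>"""

def strokes_by_component(root):
    """Map (element, number) to stroke IDs, merging groups split by part."""
    parts = defaultdict(list)
    for g in root.iter(SVG + "g"):
        part = g.get(KVG + "part")
        if part is None:
            continue  # only groups that are one part of a split component
        key = (g.get(KVG + "element"), g.get(KVG + "number"))
        ids = [p.get("id") for p in g.iter(SVG + "path")]
        parts[key].append((int(part), ids))
    merged = {}
    for key, chunks in parts.items():
        chunks.sort(key=lambda chunk: chunk[0])  # order by part number
        merged[key] = [i for _, ids in chunks for i in ids]
    return merged

root = ET.fromstring(SAMPLE)
print(strokes_by_component(root))
```

Including number in the key keeps two split occurrences of the same element apart; when the attribute is absent it is simply None.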
Should be present and set to true if the group only represents the element attribute partially, i.e. if not all its strokes are present.
A large number of kanji consist of a radical and a phoneticum, a component that indicates the Sino-Japanese pronunciation. The phon attribute should mark the part indicating the pronunciation.
The values of this attribute are inconsistent, and the meanings of many of them are completely undocumented. See issue 312 on Github for more details.
Defines where this group is located with respect to the other groups with the same parent. Not every element has a "position" value. Possible values are
This is set to a value if this group of strokes is considered a radical of the kanji, and by which reference. The value of the attribute depends on the reference, as follows.
general or tradit radicals. This value was added to deal with inconsistencies between KanjiVG and Kanjidic and other references.
Unicode has more than one code point which may represent each radical. The choices of radicals which have been used by KanjiVG are explained on the Radicals page.
This is set to the value true for a limited number of groups where a radical-like form of a character described by original is provided as the element.
The Kanjidic file with which Ulrich Apel worked in the beginning favored the radicals given in the Nelson character dictionary, which sometimes differ from the radicals given in "traditional" Japanese dictionaries and have mark-up as well.
Unknown, possibly used to indicate that the shape of the element is unlike the usual grapheme.
Each individual kanji stroke is represented by one SVG <path> element.
The SVG path information itself. This describes the shape of the line.
Although there is no rule disallowing various SVG elements, in practice all of the KanjiVG data consists of cubic Bézier curves. In SVG terminology, the path is made up of only M/m, C/c, and S/s commands. There are no other SVG path commands present. None of the strokes contains a path with more than one sub-path, that is to say there are no strokes with more than one "moveto" command.
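These constraints are easy to check mechanically. The sketch below scans the d attribute of a path for SVG command letters and reports anything outside M/m, C/c, S/s, or more than one moveto. The function name and the sample path strings are made up for illustration.

```python
import re

# All SVG path command letters; "e" is excluded so that exponents
# in numbers such as 1e-3 are not mistaken for commands.
ALL_COMMANDS = re.compile(r"[MmZzLlHhVvCcSsQqTtAa]")
ALLOWED = set("MmCcSs")

def check_stroke_path(d):
    """Return a list of problems; an empty list means the path conforms."""
    commands = ALL_COMMANDS.findall(d)
    problems = []
    unexpected = sorted(set(commands) - ALLOWED)
    if unexpected:
        problems.append("unexpected commands: " + "".join(unexpected))
    movetos = sum(1 for c in commands if c in "Mm")
    if movetos != 1:
        problems.append(f"expected 1 moveto, found {movetos}")
    return problems

print(check_stroke_path("M10,10c1,2,3,4,5,6s1,1,2,2"))
print(check_stroke_path("M0,0L5,5"))
```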
The KanjiVG identification number for this stroke. It contains the prefix kvg: followed by the Unicode value of the kanji as a five-digit lower-case hexadecimal number, followed by any variant information, followed by a hyphen and the letter "s", followed by a decimal number from one to the total stroke count. For example stroke 3 of the file kanji/053ec.svg has the ID number kvg:053ec-s3.
The stroke IDs are consecutive positive whole numbers starting from 1 which correspond to the stroke number of the stroke.
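A sketch of building and parsing stroke IDs in this format. The function names are hypothetical. Because the layout of the variant information is not specified here, the pattern simply captures whatever lies between the code point and the final -s part, which is an assumption.

```python
import re

# "kvg:", five lower-case hex digits, optional variant text (format
# assumed to be free-form), then "-s" and the stroke number.
STROKE_ID = re.compile(
    r"^kvg:(?P<code>[0-9a-f]{5})(?P<variant>.*?)-s(?P<n>[1-9][0-9]*)$"
)

def stroke_id(codepoint, n):
    """Build the ID of stroke n for a kanji without variant information."""
    return f"kvg:{codepoint:05x}-s{n}"

def parse_stroke_id(value):
    """Return (code point, variant text, stroke number), or None."""
    m = STROKE_ID.match(value)
    if m is None:
        return None
    return int(m.group("code"), 16), m.group("variant"), int(m.group("n"))

print(stroke_id(0x53EC, 3))
print(parse_stroke_id("kvg:053ec-s3"))
```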
These attributes are under the kvg: namespace.
The shape of the stroke. It can be used to know how the stroke should be rendered.
The values of this attribute use the keys of Unicode's CJK Strokes, which occupy code positions from U+31C0 to U+31EF. The names of these, such as D or HZ, are the initials of the Chinese names.
Please see the Stroke types page for full information on stroke types.
Stroke numbers are represented by a top-level group with an ID of the form kvg:StrokeNumbers_abcde, where abcde is the identifier of the file. This group contains text elements. Each text element is located on the diagram using a transform attribute. The text within each text element is the stroke number in digits, from one to the total number of strokes. The stroke numbers should correspond to the id value of the individual strokes.
The stroke numbers are located to the side of the beginning of the stroke whose order they indicate. Generally, they should not overlap the strokes.
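The label positions can be read back from the transform attributes. This sketch assumes each transform has the form matrix(a b c d e f), where e and f give the position; that form is an assumption, since only the attribute name is documented here. The sample is hand-written, and stroke_number_positions is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET

SVG = "{http://www.w3.org/2000/svg}"

# Hand-written sample; coordinates are made up for illustration.
SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg">
  <g id="kvg:StrokeNumbers_abcde">
    <text transform="matrix(1 0 0 1 10.5 20.25)">1</text>
    <text transform="matrix(1 0 0 1 30 40)">2</text>
  </g>
</svg>"""

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def stroke_number_positions(root):
    """Map each stroke number to the (x, y) position of its label."""
    positions = {}
    for g in root.iter(SVG + "g"):
        if not g.get("id", "").startswith("kvg:StrokeNumbers_"):
            continue
        for text in g.iter(SVG + "text"):
            values = [float(v) for v in NUMBER.findall(text.get("transform", ""))]
            if len(values) == 6:  # assumed matrix(a b c d e f) form
                positions[int(text.text)] = (values[4], values[5])
    return positions

root = ET.fromstring(SAMPLE)
print(stroke_number_positions(root))
```

Text elements whose transform does not contain exactly six numbers are skipped rather than guessed at.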