Getting the meaning out of the shape of Chinese characters
As one of the most ancient writing systems for a natural language, Chinese characters carry their own cultural and aesthetic value. They have influenced East Asian civilization for thousands of years.
Like any other natural language, Chinese did not always look the way it does today. It changes along with the people who use it. Written modern Chinese was reshaped by the dramatic upheavals of the 20th century. Although it uses the same "alphabet", the way of expressing the same meaning has changed so much that a modern Chinese speaker without proper training in classical Chinese may read classical Chinese as a foreign language. Even "the same alphabet" may not be accurate, since the characters themselves changed their written form during the early formation of Chinese and stayed relatively stable afterwards. We will glance at the changes in Chinese characters through a few examples.
Instead of representing phonemes, Chinese characters express meaning directly. The phonological information in a Chinese character is more of a byproduct: in modern Chinese, almost every character is pronounced as a single syllable, typically a consonant followed by a vowel. Homophones in Chinese can even make disambiguation harder. For example, the syllable “yì” has 179 corresponding characters. Consequently, the most common question when people learn someone's name is "which characters are in your name?". Clearly, people cannot deduce the meaning from the sound of a name alone, a short string of syllables that is usually no longer than 4 characters. It is quite reasonable to say that Chinese characters are designed more for a reader than for a listener. (At the same time, classical Chinese poems benefit from phonological ambiguity.)
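To make the homophone problem concrete, here is a minimal sketch that groups characters by their Mandarin reading. The READINGS table is a tiny hand-picked sample that I made up for illustration (it is nowhere near a full pronunciation dictionary), and the names READINGS, group_by_syllable and HOMOPHONES are my own assumptions.

```python
import unicodedata
from collections import defaultdict

# Hypothetical, hand-picked sample of characters and their Mandarin readings.
# A real count would need a complete pronunciation dictionary.
READINGS = {
    "义": "yì", "意": "yì", "易": "yì", "艺": "yì",
    "译": "yì", "异": "yì", "益": "yì", "忆": "yì",
    "一": "yī", "本": "běn", "末": "mò",
}


def group_by_syllable(readings):
    """Invert a character -> syllable table into syllable -> characters."""
    groups = defaultdict(list)
    for char, syllable in readings.items():
        # Normalize so that composed and decomposed tone marks compare equal.
        groups[unicodedata.normalize("NFC", syllable)].append(char)
    return dict(groups)


HOMOPHONES = group_by_syllable(READINGS)

if __name__ == "__main__":
    # Print the most crowded syllables first.
    for syllable, chars in sorted(HOMOPHONES.items(), key=lambda kv: -len(kv[1])):
        print(syllable, len(chars), "".join(chars))
```

Even in this toy sample, hearing “yì” alone leaves eight candidate characters, which is why a listener has to ask which one is meant.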
The story begins with the creation of Chinese characters. The earliest known Chinese characters were created for religious rituals and divination. Basically, a priest would write the question on a tortoise shell, heat it, and read the oracle from the cracks that appeared on the shell. This gives the most ancient Chinese "alphabet" its name: Oracle bone script. The oldest Oracle bone script remains discovered so far date to a bit more than 3000 years ago, in the late Shang Dynasty, and come from central China (most of them were unearthed in today's Henan Province).
Those alien-looking characters are the earliest Chinese characters. If we take a closer look at some Oracle bone script characters, we can see that they have something in common with Egyptian hieroglyphs.
Shown below is an instance of the two civilizations using similar symbols for "eye". The one on the left is an Oracle glyph and the second one is a hieroglyph. Both represent "eye", but don't get me wrong: this is not to say that hieroglyphs and Oracle bone script are the same kind of writing system. They are not. In what sense? Hieroglyphs, though they look very pictographic, are still largely a phoneme-oriented writing system, while Oracle bone script and its offspring, Chinese, are meaning-oriented writing systems (they pay less attention to pronunciation).
A few hundred years of development after the Oracle glyph age finally settled written Chinese into a relatively stable format, in which aristocrats could record history and scholars could write down their theories. Because of its intrinsically meaning-oriented design, the written form is very unlike the spoken form. The written form can live on just fine and convey the same meaning over centuries without changing, even though modern Chinese people certainly could not understand the Chinese spoken only a few hundred years ago. As evidence, after Li script (which can now be considered a font) was promoted by the sovereigns of the Han Dynasty, the whole nation kept using it officially for over 2000 years. However, the convention of written classical Chinese requires each character to represent a grammatical entity, which corresponds to a word in modern Chinese. On top of that, its grammar is very different from that of its modern Chinese descendant, spoken or written.
In order to disambiguate homophones, spoken Chinese is forced to use multi-syllable words instead of single-syllable characters. At the beginning of the 20th century, under the influence of Western languages, the written form was pushed closer to the spoken form, which made modern Chinese behave like a phoneme-oriented language almost overnight, without peeling off its meaning-oriented written clothing. In fact, romanization of Chinese was a popular topic during those radical years but was eventually (thankfully) given up. This suggests that the meaning carried by the glyphs of Chinese characters matters most for classical Chinese writing. Some idioms that originated in the classics can be understood either by simply memorizing their meaning or by reading them character by character in the classical sense. A phrase like “本末倒置” (běn mò dào zhì), which means to put the cart before the horse, cannot be rationalized if we map its meaning word by word: even though “倒置” (dào zhì), meaning upside down, is still a word, “本末” (běn mò) by itself no longer is. Further, if we map “末” to its modern Chinese meaning of "end, trivial, or debris", the whole phrase doesn't make much sense either.
This is where the classical Chinese interpretation, read from the characters themselves, comes in. The original oracle glyph of "本" is:
The oracle glyph of "末" is:
We can see that a marker stroke is placed at exactly opposite ends of a shared main component. Even in the modern, more abstract versions of these two characters we can still see this legacy: an extra bar is added to the character "木" (tree) to indicate a different part of the plant. Therefore the phrase literally means “taking the branch for the root”. The information contained in the glyphs makes the phrase's actual meaning easier to understand. And it only makes sense if we parse the phrase character by character, as classical Chinese did, instead of word by word.
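The difference between the two ways of parsing can be sketched in a few lines. This is only a toy illustration: the two small dictionaries are hypothetical glosses I picked for this one phrase, and the greedy longest-match segmenter is a simple stand-in for how a modern reader groups characters into words, not a real word segmentation method.

```python
# Hypothetical glosses, chosen only to illustrate this one phrase.
CLASSICAL_GLOSS = {"本": "root", "末": "branch", "倒": "invert", "置": "place"}
MODERN_WORDS = {"倒置": "upside down"}


def segment(phrase, lexicon):
    """Greedy longest-match segmentation; unknown characters stand alone."""
    tokens = []
    max_len = max((len(word) for word in lexicon), default=1)
    i = 0
    while i < len(phrase):
        # Try the longest candidate first and fall back to a single character.
        for size in range(min(max_len, len(phrase) - i), 0, -1):
            piece = phrase[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def word_by_word(phrase):
    """Modern reading: look up whole words; None marks a missing meaning."""
    return [(tok, MODERN_WORDS.get(tok)) for tok in segment(phrase, MODERN_WORDS)]


def char_by_char(phrase):
    """Classical reading: every character carries its own meaning."""
    return [(ch, CLASSICAL_GLOSS.get(ch)) for ch in phrase]


if __name__ == "__main__":
    print(word_by_word("本末倒置"))
    print(char_by_char("本末倒置"))
```

With this toy lexicon, the word-by-word reading finds a meaning only for “倒置” and leaves “本” and “末” unexplained, while the character-by-character reading yields root, branch, invert, place.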
Nevertheless, this may not hold for modern Chinese, written or spoken, since in modern Chinese the combination of characters determines the meaning of a word and a single character is rarely used alone. Parsing the glyph is therefore much less necessary. This brings us to an obvious conclusion: the more we need to focus on the meaning of a single character, the more useful the information contained in its glyph will be, and consequently classical Chinese relies more on the characters themselves than modern Chinese does.
As part of the observations in the paper "Glyph-aware character embedding for Chinese", my coauthor Falcon and I found that being aware of the glyphs of Chinese characters doesn't help much in modern Chinese language modeling. I personally believe that it is the phoneme orientation of modern Chinese that weakened the need to know the visual representation.
One may wonder what the point is if the written forms of almost all natural languages are more or less phoneme oriented. Here we may want to consider other possible forms of language in which the visual signal is the only accessible communication channel. For example, in the movie "Arrival", the aliens draw different patterns on a screen to communicate with humans, which suggests that their alphabet is not necessarily finite or even discrete. When communicating through hand gestures, we parse the gestures visually. In these cases, a visual representation is not only helpful but crucial.