Transliteration Design Centre

Postby **do_shahbaz** » 2021-12-16, 13:18

世界好！I am creating this thread to gather discussion about phonetic transcriptions of any form or variety of Chinese: historical or extant, systematic or ad hoc, well-established or amateurish, or even ones of your own creation (i.e. a conscript). Feel free to pitch in any ideas, whether from your own imagination or taken from elsewhere!

That being said, I would like to showcase some of the interesting transcription systems that have been posted in some of the threads in this channel, hopefully with their posters' leave. If any of them objects to the inclusion of links to their threads in this one, please do contact me and I shall have them promptly removed from this list.

Shanghainese 'New Characters': https://forum.unilang.org/viewtopic.php ... se#p976220
Xiao'erjing: https://forum.unilang.org/viewtopic.php?f=50&t=58068

I'll start by sharing a project of mine: an attempt at a simple, beginner-friendly transcription of Middle Chinese named Ciet'im (切音), which I plan to use in readings of historical Chinese texts. The transcription is technically a pair of orthographies: the etymological/scientific transliteration (源音/学音), and the simplified or common transliteration (公音). The former strives to be as faithful as possible to the phonological system of the Ciet'hiun (切韻), while the latter strips the former of features and distinctions not made in most or all contemporary Chinese topolects.

I have the 'simplified' vowel system mostly figured out:

1等     2等     3/4等
o       a       ie, i
u               iu
ou      au      iau, ieu
om      am      iem, im
on      an      ien
en              in
ong             iong
aong
eng             ing
ang             ieng
ung             iung

The 'complex system' shall build upon the 'simplified system', employing diacritics to indicate distinctions not made in the latter. Among those I have in mind are:

- the ogonek (鉤) ą ǫ : indicating an open vowel less open / more centralised than its hook-less counterpart.
- the slash or strikethrough (横) ɨ : applied only to i, this diacritic indicates the elision of the excrescent i found only in the later variants of Middle Chinese (ca. 800-1000 AD).
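Since the complex system is meant to be the simplified one plus diacritics, the common spelling could in principle be derived mechanically by stripping them. Here is a minimal Python sketch of that idea. The function name `to_common`, the decision to map ɨ back to plain i, and the sample spellings in the usage note are my own assumptions for illustration, not settled parts of the system.

```python
import unicodedata

OGONEK = "\u0328"  # combining ogonek, as in ą and ǫ

# Assumption: ɨ does not decompose under NFD, so it is mapped by hand.
# Only the barred i is mentioned in the post.
STROKED = {"ɨ": "i"}


def to_common(text: str) -> str:
    """Strip the ogonek and the bar from a 'complex' spelling (hypothetical helper)."""
    # NFD splits ą into 'a' + combining ogonek, so the mark can be dropped.
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(STROKED.get(ch, ch) for ch in decomposed if ch != OGONEK)
    # Recompose whatever other marks remain.
    return unicodedata.normalize("NFC", kept)
```

With made-up spellings, `to_common("ąng")` gives `"ang"` and `to_common("ɨem")` gives `"iem"`; anything without these two diacritics passes through unchanged.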

The three-way contrast in the stops and fricatives has left me with a conundrum for some time. The two-way contrast that the Latin alphabet offers (b d g z vs. p t k s; note that h and x, whatever sound each may represent, are by default left to their own devices) does justice to languages/topolects such as Mandarin and Cantonese, which distinguish only two types of stops, but not to Sinitic languages that make a three-way contrast between voiced, tenuis, and aspirated consonants, viz. Old and Middle Chinese and contemporary languages in the Wu / Ghu 呉, Minnam 閩南, and Old Xiang subgroups. One is thus left with two graphemes to represent three phonemes. The problem is usually solved by marking the 'third phoneme' with a device (an apostrophe, the letter h, or orthographic gemination [e.g. Bbanlam Peng'im]). In most extant systems either the aspirated phoneme (e.g. p', ph for /p^h/) or the voiced phoneme (e.g. bh, b', bb, bp for /b^(ɦ)/) gets singled out. I can't decide which of the two to follow, as each comes with its own advantages and disadvantages.
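To make the two options concrete, here is a small Python sketch that spells the same three series under each convention. The scheme names `mark_aspirated` and `mark_voiced`, the function `spell`, and the choice of h as the marking device are assumptions made for illustration; the second scheme reflects my reading that the bare voiced letter then stands for the tenuis stop, as in Pinyin.

```python
# Base letter pairs per place of articulation: (voiceless letter, voiced letter).
PLACES = {"labial": ("p", "b"), "dental": ("t", "d"), "velar": ("k", "g")}


def spell(place: str, series: str, scheme: str) -> str:
    """Return the grapheme for a stop under one of two hypothetical schemes."""
    voiceless, voiced = PLACES[place]
    if scheme == "mark_aspirated":
        # The aspirate carries the device: p = tenuis, ph = aspirated, b = voiced.
        table = {"tenuis": voiceless, "aspirated": voiceless + "h", "voiced": voiced}
    elif scheme == "mark_voiced":
        # The voiced stop carries the device: b = tenuis, p = aspirated, bh = voiced.
        table = {"tenuis": voiced, "aspirated": voiceless, "voiced": voiced + "h"}
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return table[series]


for scheme in ("mark_aspirated", "mark_voiced"):
    row = [spell("labial", s, scheme) for s in ("tenuis", "aspirated", "voiced")]
    print(scheme, row)
```

The trade-off shows up in the bare letters: under the first scheme p and b keep their voiceless/voiced values, while under the second they match what a Pinyin reader expects and only the voiced series looks unfamiliar.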

Postby **vijayjohn** » 2022-02-11, 0:58

I'm not quite sure I understand what the purpose of this thread is, but if it's of any interest to you, the very first linguistics research paper I ever wrote was a search algorithm that would take any term transcribed in Hanyu Pinyin and search for its equivalents in various other transliteration systems (Tongyong Pinyin, Wade-Giles, Gwoyeu Romatzyh, etc.) as well as the original search term.

Postby **do_shahbaz** » 2022-02-11, 1:39

vijayjohn wrote:I'm not quite sure I understand what the purpose of this thread is, but if it's of any interest to you, the very first linguistics research paper I ever wrote was a search algorithm that would take any term transcribed in Hanyu Pinyin and search for its equivalents in various other transliteration systems (Tongyong Pinyin, Wade-Giles, Gwoyeu Romatzyh, etc.) as well as the original search term.

I don't know if this is related to the topic of your research, but do you know a way to mass-extract data from Wiktionary, or from other sources that offer pan-topolectal data on Chinese characters? In other words, a neat big table containing every Chinese character in the first column and whatever data Wiktionary has on it in the subsequent columns to the right (code, English gloss, radical, pronunciations in different topolects, etc.)
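One possible source for a table of that shape, offered as an assumption rather than anything confirmed in this thread, is the Unicode Unihan database. Its data files are plain text with three tab-separated columns (code point, field name, value), with fields such as kMandarin and kCantonese; its coverage of other topolects is limited. A minimal Python sketch of folding such lines into one row per character follows. The function name `load_table` is hypothetical and the sample lines are only illustrative.

```python
import io
from collections import defaultdict

# Illustrative sample in the Unihan layout: code point, field, value.
SAMPLE = (
    "# comment lines start with a hash\n"
    "U+4E00\tkMandarin\tyī\n"
    "U+4E00\tkCantonese\tjat1\n"
    "U+4E8C\tkMandarin\tèr\n"
)


def load_table(lines):
    """Collect Unihan-style lines into {character: {field: value}}."""
    table = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        code, field, value = line.split("\t", 2)
        char = chr(int(code[2:], 16))  # "U+4E00" -> "一"
        table[char][field] = value
    return dict(table)


table = load_table(io.StringIO(SAMPLE))
for char, fields in table.items():
    print(char, fields)
```

Passing an open file handle instead of `io.StringIO(SAMPLE)` would work the same way, and the resulting dictionary can be written out as a spreadsheet with one column per field.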

vijayjohn wrote:I ever wrote was a search algorithm that would take any term transcribed in Hanyu Pinyin and search for its equivalents in various other transliteration systems (Tongyong Pinyin, Wade-Giles, Gwoyeu Romatzyh, etc.) as well as the original search term.

Does yours use a conversion algorithm, or were the data sourced from a certain database?

Postby **vijayjohn** » 2022-02-11, 1:49

do_shahbaz wrote:I don't know if this is related to the topic of your research, but do you know a way to mass extract data from Wiktionary, or other source(s) that offer pan-topolectal data of Chinese characters?

I don't, I'm sorry!

do_shahbaz wrote:Does yours use a conversion algorithm, or were the data sourced from a certain database?

It's a conversion algorithm. I came up with it myself. It's honestly pretty simple, though.
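For readers wondering what a rule-based conversion of this kind might look like, here is a minimal Python sketch. It is not the algorithm from the paper, which is not shown in this thread. It only rewrites a handful of Hanyu Pinyin initials into their Wade-Giles counterparts and leaves finals and tones untouched, so it is wrong for any syllable whose final is also spelled differently. The names `INITIALS` and `convert_initial` are made up for illustration.

```python
# Pinyin initial -> Wade-Giles initial. Digraphs come first so that
# "zh" and "ch" are matched before any single letter.
INITIALS = [
    ("zh", "ch"), ("ch", "ch'"),
    ("b", "p"), ("p", "p'"),
    ("d", "t"), ("t", "t'"),
    ("g", "k"), ("k", "k'"),
]


def convert_initial(syllable: str) -> str:
    """Rewrite only the initial of a toneless Pinyin syllable (partial sketch)."""
    for pinyin, wade_giles in INITIALS:
        if syllable.startswith(pinyin):
            return wade_giles + syllable[len(pinyin):]
    return syllable  # initials not in the table pass through unchanged


for s in ("bei", "tai", "gao", "zhang", "ma"):
    print(s, "->", convert_initial(s))
```

A fuller converter would need a second table for finals and special cases, but the lookup-and-rewrite structure would stay the same.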

Postby **do_shahbaz** » 2022-02-11, 1:57

vijayjohn wrote:It's a conversion algorithm. I came up with it myself. It's honestly pretty simple, though.

Which platform does your algorithm run on? I would like to create conversion tools for Middle Chinese (from widely used transcriptions [Karlgren, Baxter, Zhengzhang] to my own system) and Hokkien.

Postby **vijayjohn** » 2022-02-11, 2:21

I think I wrote it in Java, and IIRC, I included the source code in the paper itself.

Postby **do_shahbaz** » 2022-02-11, 2:44

vijayjohn wrote:I think I wrote it in Java, and IIRC, I included the source code in the paper itself.

I don't think I possess sufficient know-how to write Java code, unfortunately.

I've been using an Excel database to source the readings from, and have used Mark Rosenfelder's SCA to convert Baxter's system into my own. The work involved was more than tedious, I tell you.
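The same kind of ordered rewriting that a sound change applier does can be sketched in a few lines of Python, which might make batch conversion less tedious. The rules below are invented placeholders purely to show the mechanism; they are not the actual Baxter-to-Ciet'im correspondences, and the names `RULES` and `apply_rules` are hypothetical.

```python
import re

# Placeholder rules as (regex pattern, replacement), applied top to bottom.
# Order matters: the longer "tsh" must be handled before "ts".
RULES = [
    (r"^tsh", "c"),  # placeholder rule for an aspirated affricate initial
    (r"^ts", "z"),   # placeholder rule for the plain affricate initial
    (r"H$", ""),     # placeholder rule dropping a final tone letter
]


def apply_rules(word: str, rules=RULES) -> str:
    """Apply each rewrite rule in order to a single transcribed syllable."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word


for syllable in ("tshet", "tsaH"):
    print(syllable, "->", apply_rules(syllable))
```

Reading one column of the Excel sheet, running `apply_rules` on each cell, and writing the result to a new column would cover the whole database in one pass once the real rules are filled in.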

Are you taking, or have you taken, a degree in computational linguistics?

Postby **vijayjohn** » 2022-02-12, 18:32

Yes, I minored in computational linguistics but mostly just because my dad made me do it.

UniLang Language Community • Forum
