How 'short' can a language get while still being usable?

Postby **xBlackHeartx** » 2019-02-08, 3:13

The obscure auxlang Neo is known for how short its words and sentences are. The language favors monosyllabic words, and looking at sample sentences, it's rare for a word to be more than 4 or 5 phonemes long.

I've always wanted a conlang that didn't take long to say things, but how short can things actually get before they become impractical? Languages need redundancy, obviously, and it's assumed that natural languages have the minimum amount of redundancy necessary. I can't fathom how to replicate something like this in a conlang.

The only other 'short' language I know of is Ithkuil and its various derivatives. That language can easily fit an entire sentence into a single 'word'. In fact, morphemes are often just a single phoneme long! The name of the language itself is made up of 4 morphemes (or more, depending on how you count them): i-, -th-, kul, and the infix -i- (here it's inside the 'kul' morpheme). Obviously, a language like this wouldn't be practical in reality. You could literally miss an entire 'word' of a sentence if you fail to hear even a single phoneme! Not to mention that this conlang uses a ridiculously huge phoneme inventory to maximize its number of short morphemes; the original version effectively used the entire IPA, though the latest version uses phonemic tone and a far more reasonable phoneme inventory. With the original version, only someone who could pronounce and easily distinguish every single symbol in the IPA could possibly speak it. And of course, the author made it as a thought experiment; he didn't intend anyone to actually use it.

So this has me wondering: how short is too short? Neo actually looks to be shorter in expression than English, though that's hard to say since the language never became popular. As far as I'm aware, no extensive text has ever been translated into it. Also, the conlang features a VERY generous number of derivational affixes that puts even Esperanto to shame, so I strongly suspect long-winded compound words are possible.

Also, how do you measure shortness? Is a morpheme with a CCVC structure shorter than a CVCV word?

And it is clearly possible for some languages to take longer to say things than others. If you watch a Spanish speaker giving a speech, it's pretty obvious that it takes him longer to say things than it would take to say the English translation (not to mention that, on average, Spanish words tend to use far more phonemes than their English translations). And of course Japanese is highly prone to being long-winded. I mean, their primary word for 'I' is 6 phonemes long! And their verb endings can easily add 4 or more phonemes to the end of each sentence, and very often do (Korean, I believe, is also infamous for the long chains of suffixes that often appear on verbs). It's also obvious that Pirahã takes longer than normal to say things, though that appears to be mainly a consequence of the fact that it doesn't allow adjectival clauses.

Is there really any measuring stick you could use to determine this? I've noticed that English seems to have a reputation for being short, though I'm not sure if it's the shortest. Mandarin seems to have far more short syllables on average than English does, going by the average number of phonemes per word. Personally I was thinking of using my own native language as a measuring stick: if a word has as many phonemes as its English translation or fewer, then it's short. Of course, translations aren't normally that straightforward, plus I've noticed that English really likes to use affixes that are only one or two phonemes long (our plural, possessive, and past tense endings are all a single phoneme, barring the occasional epenthetic vowel). Even our auxiliary verbs and pronouns tend to be just two or three phonemes long, which, thinking about it, is rather hard to match, especially if your conlang has a smaller phoneme inventory than English.

Postby **księżycowy** » 2019-02-08, 9:26

xBlackHeartx wrote: The only other 'short' language I know of is Ithkuil and its various derivatives. That language can easily fit an entire sentence into a single 'word'. In fact, morphemes are often just a single phoneme long! The name of the language itself is actually made up of 4 morphemes (or more depending on how you count them). The morphemes are i-, -th-, kul, and the infix -i- (here it's inside the 'kul' morpheme). Obviously, a language like this wouldn't be practical in reality. You could literally miss an entire 'word' of a sentence if you fail to hear even a single phoneme! Not to mention this conlang uses a ridiculously huge phoneme inventory to maximize its number of short morphemes; the original version effectively used the entire IPA, though the latest version now uses phonemic tone and a far more reasonable phoneme inventory. With the original version, only someone who could pronounce and easily distinguish every single symbol in the IPA could possibly speak this language.

This reminds me of how verbs work in languages like Tlingit or Abkhaz. A stem or tense marker can be a single phoneme. And these languages do tend to have good-sized phoneme inventories, especially the Northern Caucasian languages. Not an exact match, but a close parallel.

Postby **xBlackHeartx** » 2019-02-08, 14:16

Yeah, makes sense. I was just looking up stuff about the Cherokee language, and it reminded me a lot of Ithkuil too. Of course, I still think Ithkuil's level of synthesis goes well beyond any natlang. The word 'ithkuil' itself is actually a sentence defining what a conlang is. I believe it translates as something like 'a set of interacting, but distinct, units of speech that are imaginary'. Also, the creator apparently counts 'kul' as two morphemes: the biconsonantal root 'k-l' and the infix 'u'. K-l is the stem for any word relating to speech. But 'u', to my knowledge, has no set meaning. Apparently all of its 'root words' are generated this way.

Postby **Linguaphile** » 2019-02-09, 0:36

xBlackHeartx wrote: the creator apparently counts 'kul' as two morphemes: the biconsonantal root 'k-l' and the infix 'u'. K-l is the stem for any word relating to speech. But 'u', to my knowledge, has no set meaning. Apparently all of its 'root words' are generated this way.

This sounds similar to roots in Arabic and other related languages. One of the "classic" examples is the root k-t-b for words relating to writing. So from the root k-t-b there is كَتَبَ‎ (kataba) "to write", كاتب (kātib) "writer, clerk", كِتَاب‎ (kitāb) "book", كتيب (kutayyib) "booklet" and so on. I would be surprised if Arabic or a related natural language weren't the inspiration for the similar feature in Ithkuil.
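The root-and-pattern idea can be sketched as a toy model in Python. This is only an illustration under stated assumptions: the digit-slot pattern notation ("1a2a3a") and the function name `apply_pattern` are hypothetical, the transliterations are the ones given in the post above, and real Arabic morphology involves much more than dropping consonants into a template.

```python
# Toy model of root-and-pattern morphology (hypothetical notation):
# digits 1, 2, 3 in a pattern mark where the root consonants go.
def apply_pattern(root: tuple[str, ...], pattern: str) -> str:
    out = []
    for ch in pattern:
        if ch.isdigit():
            out.append(root[int(ch) - 1])  # slot n takes the n-th root consonant
        else:
            out.append(ch)  # vowels and other material come from the pattern
    return "".join(out)

KTB = ("k", "t", "b")  # root for words relating to writing

for pattern, gloss in [
    ("1a2a3a", "to write"),
    ("1ā2i3", "writer, clerk"),
    ("1i2ā3", "book"),
    ("1u2ayyi3", "booklet"),
]:
    print(apply_pattern(KTB, pattern), "-", gloss)

# The same helper reproduces the analysis of Ithkuil 'kul' quoted above:
# a biconsonantal root k-l plus the infix u.
print(apply_pattern(("k", "l"), "1u2"))
```

The design point is that the root carries only the consonants and the pattern supplies everything else, which is why one short root can yield a whole family of words.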

Postby **xBlackHeartx** » 2019-02-13, 4:59

Analyzing English, I think I've found why English in general tends to be shorter than most languages.

One is the length of words. A lot of English words are only 2 to 5 phonemes long. This is made possible by English's phonology: not only does it have an above-average number of phonemes, in particular vowels, it also has a fairly liberal syllable structure. Both of these allow for an abnormally high number of words with an unusually low phoneme count. The Sino-Tibetan family seems to pull off the same trick by using phonemic tone to multiply the number of possible short words without needing an elaborate syllable structure.
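A rough way to see the inventory / syllable structure / tone trade-off is to count how many distinct syllables a template allows, and then how many syllables a root would need so that every root in a vocabulary can be distinct. This is a back-of-the-envelope sketch, not a model of any real language: the inventory sizes and the 5,000-root vocabulary are made up, every combination is assumed to be a legal syllable (no phonotactic restrictions), and the redundancy mentioned earlier in the thread is ignored. The function names are hypothetical.

```python
def syllable_count(n_consonants: int, n_vowels: int, template: str,
                   n_tones: int = 1) -> int:
    """Distinct syllables for a template like "CV" or "CVC",
    assuming every combination is a legal syllable."""
    total = 1
    for slot in template:
        if slot == "C":
            total *= n_consonants
        elif slot == "V":
            total *= n_vowels
        else:
            raise ValueError(f"unknown slot {slot!r}")
    return total * n_tones  # each syllable can carry any one of the tones

def min_syllables(vocab_size: int, per_syllable: int) -> int:
    """Smallest n such that per_syllable ** n strings cover vocab_size roots."""
    if per_syllable < 2:
        raise ValueError("need at least two distinct syllables")
    n, capacity = 1, per_syllable
    while capacity < vocab_size:
        n += 1
        capacity *= per_syllable
    return n

# Made-up inventories, for illustration only.
VOCAB = 5000
for label, count in [
    ("10 C, 5 V, CV only", syllable_count(10, 5, "CV")),
    ("20 C, 8 V, CVC", syllable_count(20, 8, "CVC")),
    ("20 C, 8 V, CVC, 4 tones", syllable_count(20, 8, "CVC", n_tones=4)),
]:
    print(f"{label}: {count} syllables, {min_syllables(VOCAB, count)} per root")
```

Under these made-up numbers the CV-only system (50 syllables) needs at least three syllables per root to keep 5,000 roots apart, while adding codas (3,200 syllables) brings that down to two, and adding four tones on top (12,800) down to one. Real languages leave many combinations unused, so actual words would be longer than these minimums.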

Another thing that makes English short is its lack of derivation. English mostly just borrows words from other languages and rarely derives them internally. That means that few English words are made up of more than one morpheme, and most of the multi-morpheme words you do see in use just have grammatical markings. This is in stark contrast to a lot of conlangs such as Esperanto, which likes to construct words out of multiple morphemes. This is especially so since just about every word is marked for part of speech, which means that everything besides personal pronouns and prepositions is at least two morphemes long. This of course increases the number of phonemes words have on average, in particular for words with multiple affixes attached. And of course a lot of these derivational affixes are made up of two or three phonemes, and the three-phoneme 'mal' prefix is severely overused.

Of course, natural languages as heavily derivational as Esperanto aren't exactly common (assuming there even is one), but it's still pretty normal for languages to have affixes everywhere. Languages with grammatical case come to mind. Verb endings also add phonemes, and in some agglutinating languages these verb endings can easily add half a dozen phonemes to a verb.

Also, English is a configurational isolating language, so we don't have grammatical markings everywhere. Yeah, there's grammatical number and the third person singular form of verbs, but that's it. And English does make use of a fair number of irregular inflections, particularly in our personal pronouns, the conjugation of 'to be', and all of the stem-changing verbs and plurals. This means that what little grammatical marking there is rarely adds phonemes to the word. E.g. 'speak' has just as many phonemes as its past tense 'spoke', 'take' has as many phonemes as 'took', 'goose' is the same length as its plural 'geese', and it can be argued that the singular 'mouse' is shorter than its plural 'mice' (depending on whether or not you count diphthongs as one phoneme or two), etc.

The point of all this is: it would be hard to make a conlang as terse as English. Even if you did have a large phoneme inventory with phonemic tone and/or a highly liberal syllable structure, most conlangs rely heavily on derivation to form the bulk of their vocabulary, just to reduce the workload on the creator. Making a conlang that's made up almost entirely of single underived morphemes would be very time-consuming, assuming it would even be possible, since I don't know of any examples. Yeah, there's Neo, but I find its shortness disputable. Yeah, its morphemes tend to be short, but it has a derivational system like Esperanto's, and on top of that it even marks its words for part of speech just like Esperanto does. Also, I can't find a copy of a complete dictionary for the language (the Wikipedia files only include the pages that detail the grammar and the derivational affixes; they stop right before the dictionary, though judging by the page count it must have been rather large). But given a derivational system that looks like it puts even Esperanto to shame, I would guess that in a long text the huge number of derived words would push the average phoneme count well above English. Even its underived words only look to be about as short as German words. And yeah, German is kinda terse, but it's not exactly the shortest out there. And honestly, the main reason German is as short as it is, is that it has both a larger phoneme inventory AND a more liberal syllable structure than even English does.

Okay, Ithkuil pulls it off, but I highly doubt a language like Ithkuil would actually be usable considering that there's no natlang out there that even vaguely resembles it.

Postby **Linguaphile** » 2019-02-13, 14:29

xBlackHeartx wrote: English in general tends to be shorter than most languages.

I don't think that's necessarily true though. With the languages I know, when translating, sometimes English ends up shorter and sometimes another language ends up shorter. It depends on the content of the passage.
Here is an interesting comparison of a single passage (the first article of the Universal Declaration of Human Rights):
https://unicode.org/udhr/assemblies/first_article_all.html
English is certainly not the shortest (although it is on the "short" end of average).

The translation forum here at Unilang is another good source for various (random) comparisons.

To take a few of the most recent translations there, comparing English and Estonian (because I know both of them and have noticed that with this pair of languages it can go either way in terms of which is more concise), you'll see what I mean by saying it depends on the passage:

The two languages are about the same here:

Smartphones are evil. The world would be better without them.

Nutitelefonid on kurjad. Ilma nendeta oleks maailm parem paik.
source

Estonian is shorter here:

There is no freedom without solidarity.

Ei ole vabadust solidaarsuseta.
source

Estonian is shorter again:

The ice is melting.

Jää sulab.
source

English is shorter here:

That is the boy who stole my watch!

See on poiss, kes varastas mu käekella!
source

English is shorter again:

She went there following a dream, but woke up before reaching it.

Ta läks oma unistuse poole püüeldes, aga ärkas üles enne kui võttis selle kinni.
source

And although those examples are evenly divided in terms of which is shorter, they are just random examples; I just took the first five sentences from the top of the Translations forum. Estonian tends to have longer words, but fewer words. But "Estonian tends to have longer words" isn't always the case either (he / ta, that / see, are / on, without / ilma, before / enne, etc.). Overall I would say these two languages tend to end up about the same in length on average, with individual words and sentences going either way.
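The five English/Estonian pairs above can also be compared mechanically. This sketch counts orthographic words and letters, which is only a crude stand-in for spoken length since spelling is not a phoneme count; the helper name `measure` is hypothetical, and punctuation and spaces are deliberately ignored.

```python
import re
import unicodedata

# [^\W\d_] matches Unicode letters (including ä, õ, ü) but not digits,
# underscores, punctuation, or whitespace.
WORD = re.compile(r"[^\W\d_]+")

def measure(sentence: str) -> tuple[int, int]:
    """Return (word count, letter count) for one sentence."""
    # NFC normalization keeps letters like 'ä' as single characters, so a
    # decomposed combining mark cannot split a word in two.
    words = WORD.findall(unicodedata.normalize("NFC", sentence))
    return len(words), sum(len(w) for w in words)

PAIRS = [
    ("Smartphones are evil. The world would be better without them.",
     "Nutitelefonid on kurjad. Ilma nendeta oleks maailm parem paik."),
    ("There is no freedom without solidarity.",
     "Ei ole vabadust solidaarsuseta."),
    ("The ice is melting.", "Jää sulab."),
    ("That is the boy who stole my watch!",
     "See on poiss, kes varastas mu käekella!"),
    ("She went there following a dream, but woke up before reaching it.",
     "Ta läks oma unistuse poole püüeldes, aga ärkas üles enne kui võttis selle kinni."),
]

for en, et in PAIRS:
    (en_words, en_letters), (et_words, et_letters) = measure(en), measure(et)
    print(f"EN {en_words:2} words {en_letters:3} letters | "
          f"ET {et_words:2} words {et_letters:3} letters")
```

On these five pairs the letter counts come out as 50/52, 33/27, 15/8, 27/31 and 52/65 (English/Estonian), matching the judgments above (about the same, Estonian shorter twice, English shorter twice). Estonian uses fewer words in four of the five (10/9, 6/4, 4/2, 8/7, 12/14), in line with "longer words, but fewer words".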

Postby **xBlackHeartx** » 2019-02-13, 15:17

Yeah, I was thinking a better way to determine how short or long-winded a language tends to be would be to take a phoneme count of a longer text. But how could you do that? Most languages aren't spelled phonemically, and some, like English and French, make use of a lot of digraphs and silent letters.

Another problem is that a phoneme count may not be a good way to measure how long utterances tend to be when spoken. Do some phonemes take up more time than others? This of course goes back to another issue I mentioned, concerning whether affricates and diphthongs count as one phoneme or two.
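Once a text is in a phonemic transcription, the one-or-two question becomes an explicit switch in the counting code. A minimal sketch under stated assumptions: the input is a broad IPA-style transcription typed by hand, the affricate and diphthong sets are a small illustrative sample (roughly General American, not exhaustive), and the function name `segments` is hypothetical. It does not settle which convention is right; it only makes the choice visible.

```python
# Illustrative (not exhaustive) two-character units, roughly General American.
AFFRICATES = {"tʃ", "dʒ"}
DIPHTHONGS = {"aɪ", "aʊ", "eɪ", "oʊ", "ɔɪ"}

def segments(transcription: str, affricates_as_one: bool = True,
             diphthongs_as_one: bool = True) -> list[str]:
    """Split a transcription into segments, greedily matching two-character units."""
    units = set()
    if affricates_as_one:
        units |= AFFRICATES
    if diphthongs_as_one:
        units |= DIPHTHONGS
    result, i = [], 0
    while i < len(transcription):
        if transcription[i].isspace():  # word boundaries are not segments
            i += 1
        elif transcription[i:i + 2] in units:
            result.append(transcription[i:i + 2])  # count the unit as one segment
            i += 2
        else:
            result.append(transcription[i])
            i += 1
    return result

for word, ipa in [("speak", "spik"), ("spoke", "spoʊk"), ("church", "tʃɝtʃ")]:
    print(word, len(segments(ipa)), len(segments(ipa, False, False)))
```

With both switches on, 'speak' and 'spoke' both count as 4 segments; counting diphthongs as two makes 'spoke' 5, and counting affricates as two turns 'church' from 3 segments into 5.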

Honestly, the main reason I asked this question is that words in my conlangs tend to be really long, mainly because I seem to only want to use CV syllables with tiny phoneme inventories, and I make very liberal use of derivation. It's not unusual in my conlangs for words to be half a dozen morphemes long, and that's not even including grammatical markings! Essentially, my languages always seem to end up filled with words like Esperanto's 'malsanulejo'. It's a trap that I just can't seem to avoid.

UniLang Language Community • Forum