%0 Journal Article
%T Morphological Analysis and Diacritical Arabic Text Compression
%A Amjad M Daoud
%J International Journal of ACM Jordan
%D 2010
%I ACM Jordan ISWSA Professional Chapter
%X Morphological analysis of Arabic words allows decreasing the storage requirements of the Arabic dictionaries, more efficient encoding of diacritical Arabic text, faster spelling and efficient optical character recognition. All these factors allow efficient storage and archival of multilingual digital libraries that include Arabic texts. This paper presents a lossless compression algorithm based on the affix analysis that takes advantage of the statistical studies of the diacritical Arabic morphological features. The algorithm decomposes a given Arabic word into its root and its affixes. The affixes (prefix, infix, and suffix) are the redundant elements of the word. The roots are stored in the root dictionary. Also, we maintain categorized affix dictionaries and their valid combinations to validate and generate the morphological forms during encoding and decoding using a list of patterns. Since our goal is lossless reproducible Arabic text, stemming is not an option and noise words (high frequency words) cannot be filtered out. The size of the obtained root dictionary is about 8000 three-character roots and 700 four-character roots. We also code the most frequently occurring diacritical bigrams (biliterals) and trigrams (triliterals) with unused codewords in ASCII, ASMO-449, and Unicode standard codes. Using combined methods of root dictionaries and the proposed coding scheme, compression ratios of proper Arabic text compare favorably with other unigram non-diacritical methods.
%K Compression
%K Affixes
%K Morphological Analysis
%K Dictionary
%K Root
%K Diacritics
%K lexicon
%U http://iswsa2010.acm.org/volumes/volume1/no1/ijjvol1no1p5.pdf