
It is based on Arabic script, but it differs in several ways. There are four extra consonants (گ چ پ ژ), some letters are different (ک vs ك), (ی vs ي), etc. Some letters simply do not exist in Persian (ڤ).

Fun fact: during the era of Windows 9x, Windows did not have good (any?) support for Persian, but it supported Arabic. Since Iran is not a signatory to any international copyright treaties, that was not a problem. A company in Iran called Borna Rayaneh essentially patched Windows 95 and later 98 to make it work with Persian. Their patched Windows version was ubiquitous in Iran. It would take a couple of years and Windows versions until Windows' default installation was good enough for everyday use.

Unfortunately, Borna made some engineering decisions that created a mess whose effects can still be felt almost a quarter century later.

To make things work for Persian, they took the Arabic version and tweaked it just enough to make it usable. One of the things they did was take an Arabic font, remove the glyphs that Persian did not have, and replace them with glyphs it did. Remember, these were the pre-Unicode days. This was the easiest way to make it work, as opposed to creating a new encoding system. Their fonts (called series B, because their names all started with B) are still widely used today, and they are far from ideal.

For example, you open a document that has all ک encoded as ك. But the font shows it as ك, so you don't know anything is wrong. You search for a word with ک and it doesn't find any matches. And if you are a non-technical person, you get the impression that search doesn't work and start looking through the 582-page document manually to find the word you are looking for.

Normalizing Arabic and Persian code points (to the best of my knowledge by manual replacement of one with the other, not built-in standard library functions, because they are actually different and the only reason they are sometimes mixed up is historical decisions) is a must if you want to implement any sort of search in a website or an app.
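To make that concrete, here is a minimal Python sketch of the manual replacement, assuming the only mix-ups you care about are Kaf/Keheh and Yeh/Farsi Yeh. The names ARABIC_TO_PERSIAN, normalize_persian and contains are made up for illustration, and a real mapping table would probably need more entries.

```python
import unicodedata

# Hypothetical mapping for illustration; extend it to match your own data.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
})


def normalize_persian(text: str) -> str:
    """Replace Arabic code points with their Persian counterparts."""
    return text.translate(ARABIC_TO_PERSIAN)


def contains(haystack: str, needle: str) -> bool:
    """Substring search that normalizes both sides first."""
    return normalize_persian(needle) in normalize_persian(haystack)


# The same word typed with Arabic Kaf (document) and Persian Keheh (query).
document = "\u0643\u062A\u0627\u0628"
query = "\u06A9\u062A\u0627\u0628"

print(query in document)          # False: the code points differ
print(contains(document, query))  # True: both sides were normalized

# Standard Unicode normalization leaves Arabic Kaf alone,
# which is why the replacement has to be done by hand.
print(unicodedata.normalize("NFKC", "\u0643") == "\u0643")  # True
```

Normalizing both the stored text and the query is the important part; normalizing only one side just moves the mismatch around.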



> Since Iran is not a signatory to any international copyright treaties

Going off on a tangent, how does that work? Are your copyrights ignored in every other country? Or do your people do something to get some kind of "international copyrights"?

I imagine it does not make much difference for patents, is that right?


> Are your copyrights ignored in every other country?

Essentially yes. If a work is produced outside of Iran, it does not have any copyright protection in Iran, and vice versa.

As an example, since Harry Potter was quite popular in Iran, multiple (at least six IIRC) publishers translated it to Persian for the Iranian market. One publisher could not take another one's translated version and re-print it—the translated version was produced in Iran and enjoyed copyright protection in Iran. But the original English version was fair game for anyone.


Just in case it is not quite clear what Borna Rayaneh did to fonts to add Persian support: they essentially took Arabic fonts and wingdinged them until they looked Persian.

Also the sentence saying "But the font shows it as ك" should read "But the font shows it as ک".


> (ی vs ي)

Arabic has both, but they're pronounced differently from Farsi. (ي) is a (y) sound (like seed) whereas (ی) is either an (a) sound (like bat) or an (ay) sound (like may).


> Arabic has both

Not really. Arabic has U+0649 (Arabic Letter Alef Maksura), while Farsi has U+06CC (Arabic Letter Farsi Yeh). They look similar, even identical depending on the font, as long as they are standalone. When they are in a word though, it gets more complicated.

The important difference between U+0649 and U+06CC is how they look when they are connected to other letters. The former is always dotless. The latter is only dotless when it is not connected to another letter from the left. Here is an example:

U+0649 (Arabic): ى لى ىد لىد

U+06CC (Farsi): ی لی ید لید
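If you want to check which code point a given letter actually is, a small Python sketch along these lines prints the official Unicode names. The helper name describe is just something I made up here; unicodedata is in the standard library.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Dotless Arabic alef maksura, dotted Arabic yeh, and Farsi yeh.
for ch in ("\u0649", "\u064A", "\u06CC"):
    print(describe(ch))
# U+0649 ARABIC LETTER ALEF MAKSURA
# U+064A ARABIC LETTER YEH
# U+06CC ARABIC LETTER FARSI YEH
```

The glyphs can look identical in isolation, so checking the code point is the only reliable way to tell them apart.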

It's kinda similar to how Turkish I's are not the same as English I's. English pairs its capital and small forms differently from Turkish, so different code points are necessary:

English: I i

Turkish (dotless): I ı

Turkish (dotted): İ i
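The same kind of check works for the Turkish letters. This is a rough Python sketch (describe is again a made-up helper) that prints the four code points involved and shows that Python's built-in, locale-independent case conversion follows the English pairing:

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in ("I", "i", "\u0131", "\u0130"):
    print(describe(ch))
# U+0049 LATIN CAPITAL LETTER I
# U+0069 LATIN SMALL LETTER I
# U+0131 LATIN SMALL LETTER DOTLESS I
# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE

# str.upper() and str.lower() do not know about Turkish:
print("I".lower() == "i")            # True: English pairing
print("\u0131".upper() == "I")       # True: dotless small i maps to plain I
print("\u0131".upper().lower())      # i  (the dotless form is lost on the round trip)
```

So the dotted and dotless forms are only distinguishable because the "odd" member of each Turkish pair has its own code point.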

Because Turkish uses separate letters for capital and small forms, only the forms that differ from English have their own codepoints. Because in Farsi and Arabic different forms of letters are implemented as ligatures, you need a different codepoint for each of them. You cannot reuse standalone U+0649 for U+06CC.

So to recap, Turkish has dotted İ and dotless I and they always retain their dot status. English has one I that will be written with or without a dot depending on how it is placed in the sentence.

Arabic has dotted ي and dotless ى and they always retain their dot status. Farsi has one ی that will be written with or without dots depending on how it is placed in the word.


Makes sense. Historically, all Arabic letters were dotless as you probably know. I wonder if this made it into Farsi script somehow, for this case at least.


Thank you for the fascinating explanation and story



