weijiacheng's comments

weijiacheng · on Jan 1, 2024

The site actually hosts several "religious books" (try filtering by the "Spirituality" tag -- I've even produced several books on religious topics myself for SE). What it doesn't host are "Religious texts from modern world religions" (what some might call "scriptures," e.g. the Bible or the Quran) which is a much narrower category than "religious books."

As a religious person myself, I actually think this policy is very sensible. Most (nearly all?) religious texts of major world religions were originally written in languages other than English, and so if SE were to try to host those texts the site would have to make an editorial call about which translations of those texts are the "best." That quickly enters very murky theological territory, where one side of a given religion might push for one particular translation, whereas another side would push for another translation.

To give the Bible as an example, Catholics and Orthodox Christians include the deuterocanonical books (e.g. Tobit, Judith, Sirach) in their canons whereas Protestants exclude these. Would the SE version of the Bible include these? Some American fundamentalist Christians claim that the King James Version is the only valid English translation of the Bible, whereas the Revised Version (also available in the public domain) is based on more reliable Greek manuscripts. But some conservative Christians reject the Revised Version and its descendants based on certain theological premises...

Do you catch my drift? IMHO it's very sensible for SE to avoid these sorts of debates entirely by sticking to books where you could argue (with some degree of handwaving) that there really is a "best version" :)

azangru · on Jan 2, 2024

> Most (nearly all?) religious texts of major world religions were originally written in languages other than English, and so if SE were to try to host those texts the site would have to make an editorial call about which translations of those texts are the "best."

Is there a technical reason to disallow multiple translations of the same text? I can see a number of translated titles on the "wanted ebooks" page[0], so the project does seem to make editorial decisions about which translations to work on. Obviously, where one translation exists, there may be others that have other advantages.

[0] - https://standardebooks.org/contribute/wanted-ebooks

robin_reala · on Jan 2, 2024

We try to pick the “best” translation that’s in the public domain in the US. Unfortunately, quite often that’s the only one, but if there are multiple we do try to evaluate them from a reader’s point of view.

mahalex · on Jan 2, 2024

> Most (nearly all?) religious texts of major world religions were originally written in languages other than English, and so if SE were to try to host those texts the site would have to make an editorial call about which translations of those texts are the "best."

The site already hosts a number of works that were originally written in languages other than English, and it had no problem making an editorial call about which translations of those texts are the "best." The obvious solution would be to just allow multiple translations of foreign-language books.

devashishp · on Jan 2, 2024

I think that makes sense, but it still seems a bit arbitrary; I don’t see bookshops having these issues.

weijiacheng · on Jan 2, 2024

Yes, bookshops will sell one version of the Bible to Catholics, another to Protestants, another to fundamentalists, another to progressives, etc. :)

In contrast, part of the SE editorial philosophy is that it tries to host the best (based on academic scholarship, translation quality, academic acclaim, etc.) version of each text available in the public domain, which excludes that "something for everyone" sort of play available to a commercial bookstore. You could rightly argue that this is losing something (it's good to have multiple translations to compare if you're reading a text for critical purposes), but the SE editorial philosophy avoids a certain amount of confusion and clutter for the general reader. So there's a deliberate (you could call it "arbitrary" in some sense, if you wish) tradeoff being made here.

opminion · on Jan 2, 2024

A US Barnes & Noble can have a few meters of shelves with different versions of the Bible, plus a buying guide. It is quite striking if you are not used to it.

pasc1878 · on Jan 2, 2024

Part of the issue would be that the books are translations, and the copyright date would be that of the translation.

So modern versions of e.g. the Bible could not be in Standard Ebooks. So easiest to not carry any translations.

Bookshops have no problem with this as part of the purchase price will go to the copyright owners of the translation.

MrDrMcCoy · on Jan 3, 2024

> So modern versions of e.g. the Bible could not be in Standard Ebooks.

There are modern translations that are permissively licensed and are of surprisingly high quality. See the NET Bible as a prime example. It's also the only one I know of with good translation notes that can be had for free.

mahalex · on Jan 2, 2024

Modern versions of e.g. Tolstoy's "War and Peace" could not be in Standard Ebooks. So easiest to not carry any translations?

weijiacheng · on Jan 2, 2024

One of the funny things about Bible translations is that more modern translations are based on older manuscripts than older translations, due to advances in archeology. SE can't carry any translations that incorporate the insights of the Dead Sea Scrolls, and having access to some of the oldest Hebrew manuscripts is a pretty big deal when it comes to translating the Tanakh.

It's true, modern versions of War and Peace can't be hosted at SE, but those modern versions generally don't reflect revolutionary leaps in archeology :)

wyclif · on Jan 3, 2024

It seems like most of the Christian books on SE are Roman Catholic in orientation (Belloc, Chesterton, etc.) Pilgrim's Progress is a Protestant work, but it would be good to see a better representation of both pre-Reformation and Protestant titles.

bentley · on Jan 3, 2024

Can you provide any specific recommendations?

wyclif · on Jan 4, 2024

Sure, how about some classics like:

The Didache

Anselm, Cur Deus Homo

Anselm, Proslogion

Augustine, City of God

Augustine, Confessions

Augustine, On Christian Doctrine

William Law, A Serious Call to a Devout and Holy Life

Luther, The Bondage of the Will

Calvin, Institutes of the Christian Religion

Pascal, Pensées

All of these are in the public domain.

weijiacheng · on Jan 1, 2024

In addition to what Alex has said, as an SE contributor I do try to submit errata to Project Gutenberg where I can find the time and energy. Part of the problem, though, is that PG's errata process (https://www.gutenberg.org/help/errata.html) is quite cumbersome since you have to write an email to their errata team with each individual error. That's a real hassle to try to keep track of and submit. Ideally, if PG had something like a pull request system, I would just be able to find those errors in their code and submit the changes directly, but unfortunately they don't have that, so far as I am aware.

That is one major advantage SE has, I think, which is that we do allow people to make pull requests against any of our ebook repositories and any PRs that get merged are automatically deployed to the site. This makes it much, much easier for tech-savvy people to submit proofreading corrections!

cxr · on Jan 1, 2024

> Part of the problem, though, is that PG's errata process (https://www.gutenberg.org/help/errata.html) is quite cumbersome since you have to write an email to their errata team with each individual error. That's a real hassle to try to keep track of and submit. Ideally, if PG had something like a pull request system, I would just be able to[...]

On the other side of the coin, Standard Ebooks's heavy endorsement of (and buy-in to) GitHub-based workflows is off-putting to broader audiences. (It's pretty off-putting to me, and I'm not even non-technical; I just recognize it as a Conway's Law + Law of the Hammer sort of thing, and it chafes.) I.e., for others what you describe is far less than "ideal".

bentley · on Jan 1, 2024

Typos can be reported by email on SE too. Git is only required when you’re publishing a new book. My observation from watching the mailing list is that emailed typos are fixed quickly. (I always fix typos using pull requests, and those are acted on quickly too.)

acabal · on Jan 1, 2024

You don't have to use GitHub if you don't want to, but you do have to use Git. We've had more than a few producers successfully produce ebooks without using GitHub or Google Groups.

starkparker · on Jan 1, 2024

> We've had more than a few producers successfully produce ebooks without using GitHub or Google Groups.

Can you share or document how? https://standardebooks.org/contribute suggests that "Technically inclined readers can produce ebooks themselves" but doesn't provide any point of entry to do so other than a link to the GitHub org, and "No technical experience is necessary. Contact the mailing list if you want to help." just links to the Google Group.

acabal · on Jan 1, 2024

It's very uncommon; if you want to do that, just email me privately and we can set something up.

weijiacheng · on Jan 1, 2024

I am one of the SE editors/regular contributors and I did play around with this a bit for a poetry collection: https://groups.google.com/g/standardebooks/c/IUvGLmvZrmM/m/s...

I'm sure someone sufficiently determined and good at prompt engineering, and integrating LLMs into a larger toolset, could come up with something even better. I'm personally very skeptical of LLMs as a technology, but even I have to admit that this was a pretty ideal and unobjectionable use of LLMs.

That being said, though it was a fun experiment, I later found that it was easier (and less wasteful of natural resources) to just do the same thing with a bit of custom markup and a search and replace script.

duskwuff · on Jan 1, 2024

I don't think that's quite what the parent had in mind.

The most natural application of a language model in proofreading is to compute perplexity across the text; if all goes well, errors should be detectable as points of unusually high perplexity. (In principle, this should even be able to spot otherwise undetectable errors like missing words.)
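A rough sketch of that perplexity idea follows. As an assumption to keep it self-contained, a tiny add-k smoothed word-bigram model stands in for a real LLM, and "unusually high" is taken to mean a hypothetical z-score cutoff (`z=1.5`) over per-word surprisal. The names `BigramModel` and `flag_suspicious` are made up for illustration; a real setup would score tokens with an actual language model trained on far more text.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; punctuation is simply dropped.
    return re.findall(r"[a-z']+", text.lower())


class BigramModel:
    """Add-k smoothed word-bigram model: a toy stand-in for a real LLM."""

    def __init__(self, corpus: str, k: float = 0.1):
        tokens = ["<s>"] + tokenize(corpus)
        self.k = k
        self.history = Counter(tokens[:-1])            # times each word precedes another
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.vocab_size = len(set(tokens)) + 1         # +1 reserves mass for unseen words

    def surprisal(self, prev: str, word: str) -> float:
        """Return -log2 P(word | prev), in bits."""
        num = self.bigrams[(prev, word)] + self.k
        den = self.history[prev] + self.k * self.vocab_size
        return -math.log2(num / den)


def flag_suspicious(model: BigramModel, text: str, z: float = 1.5):
    """Return (perplexity, flagged), where flagged lists (position, word)
    pairs whose surprisal is more than z standard deviations above the mean."""
    tokens = ["<s>"] + tokenize(text)
    scores = [model.surprisal(p, w) for p, w in zip(tokens, tokens[1:])]
    if not scores:
        return float("nan"), []
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    perplexity = 2 ** mean                             # 2^(mean surprisal in bits)
    flagged = [(i, tokens[i + 1]) for i, s in enumerate(scores) if s > mean + z * std]
    return perplexity, flagged


if __name__ == "__main__":
    corpus = "the cat sat on the mat. the dog sat on the rug. the cat sat on the rug."
    model = BigramModel(corpus)
    # An OCR-style slip ("tho" for "the") stands out as the most surprising word.
    print(flag_suspicious(model, "the cat sat on tho mat"))  # flags [(4, 'tho')]
```

Because the cutoff is relative to each passage, ordinary rare words can also cross it on short texts, so output like this would be a triage list for a human proofreader rather than an automatic fix.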

weijiacheng · on Jan 2, 2024

I could see how that would be helpful, but at least for my use case I'm more interested in seeing how LLMs integrated with computer vision can speed up transcription. Since a thorough proofread by a human is already baked into the SE production process (and is indeed one of its major selling points), having more automated tools to aid proofreading is nice but doesn't do anything fundamentally different, from my point of view. But if LLMs can be leveraged for transcription, SE producers would no longer need to depend on external projects like Project Gutenberg or Wikisource to produce texts (which can take months) or to transcribe texts from OCR results by hand (very tedious and error-prone; believe me, I'm speaking from experience!). It would drastically open up the range of books someone could reasonably produce (in a timely fashion) for SE.