I wish BioJulia[1][2] (bioinformatics ecosystem with Julia language) would get a...

jakobnissen · on March 11, 2021

I'm one of the main developers of BioJulia. I believe our main issue is lack of developer manpower, and not necessarily lack of funds.

Of course, if we got enough money to actually employ a developer, that would be amazing. It's just not very realistic. Furthermore, having BioJulia be developed by working scientists has its advantages.

If you, or anyone else, is interested in BioJulia, do think about making a contribution to your favorite package; it would be very welcome. Developing in Julia is extremely satisfying, as you get so much bang for your buck while still being able to create highly efficient code.

akudha · on March 11, 2021

Can a programmer with zero knowledge of bioinformatics be of help too? Or do you need a bio background?

jakobnissen · on March 11, 2021

You don't need to have any particular skills except familiarity with Julia, but it's obviously an advantage to have a bio background - depending on what you're going to do.

Usually, the best packages come about when people are motivated to create something specific, for example if they think the status quo in some domain is not good enough.

I'm sure we can dig up a handful of old, badly maintained projects that could use some love. Off the top of my head, it would be nice to have

* A micro-optimized version of our Smith-Waterman algorithm. That's probably fairly easy to get started with if you're not a bio person

* Better-maintained parsers. A number of ours have not been properly maintained. We use finite state automata (https://github.com/BioJulia/Automa.jl) to create parsers. That's for more advanced users

* A consolidated home for our scattered k-mer analysis code. Another developer is rewriting our k-mer iterator protocol, but we need a big toolbox for k-mer counting, minhashing, calculating k-mer spectra etc. That's also very computer-sciency and not so much biological
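
For readers new to the terms in the last item, here is a minimal sketch of k-mer counting and a k-mer spectrum. It is plain Python for illustration only, not BioJulia code and not how BioJulia implements it; the function names `count_kmers` and `kmer_spectrum` are hypothetical, and sequences are assumed to be ordinary strings.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq (hypothetical helper)."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    # A sequence shorter than k yields an empty range, hence an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_spectrum(counts: Counter) -> Counter:
    """Map each abundance to the number of distinct k-mers seen that many times."""
    return Counter(counts.values())


counts = count_kmers("ACGTACGT", 3)
spectrum = kmer_spectrum(counts)
```

A real toolbox would likely pack k-mers into integers rather than slicing strings, which is where the performance work comes in.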

Feel free to get in touch on the Julia Slack, or send me an email :)

planet-and-halo · on March 11, 2021

I'd like to second this question. I'm very interested in bioinformatics as a field, but have no background. I'd be happy to devote some free time, but I wouldn't want to be counterproductive.

ViralBShah · on March 11, 2021

The way to do this is to find non-domain-specific tasks in these projects and make useful contributions to the team, slowly learning as you go along. Website, CI, benchmarking, helping out new users, pointing out unclear docs, writing a tutorial as you learn, etc. are all great ways to get involved.

planet-and-halo · on March 11, 2021

Thanks a lot, these are great suggestions.

Ambix · on March 11, 2021

Are there signs of wider BioJulia adoption? Looks like bio frameworks for Python, Go and even Rust are still more popular. Disclaimer: I'm just checking GitHub stars, and have no clue what metrics would be more appropriate.

jakobnissen · on March 12, 2021

I can't tell how many users we have, or who they are. Unless they directly interact by e.g. raising a GitHub issue, I won't know they exist.

In my broader experience, almost no bioinformaticians use Julia. I think we, as a field, are more conservative than e.g. physicists when it comes to technology. My old institute taught me Perl as the lingua franca of bioinformatics as late as 2015 (but switched to Python the year after).

I think we, as a field, have been consistently fairly poor at choosing our programming tools. Old bioinfo scripts are a cluttered mess of write-only spaghetti-Perl. Most bioinformaticians I know don't use Biopython or Bioperl or anything similar, but rather create new programs or packages by either re-implementing the basics from scratch, or by duct-taping together static binaries and/or scripts through shell commands.

We will never get rid of having to use static binaries or external scripts, but I think BioJulia at least has a decent chance of stopping people from re-implementing the basics again and again, and of providing a central "platform" that various external scripts communicate through (e.g. an old Perl script may produce DNA as a FASTA file, which can then be fed into BioJulia). The main issue is getting bioinformaticians to understand that the current situation is problematic.

It doesn't take much to have a big impact, I think. If we had an ecosystem of the most basic data types (biosequences, kmers, phylogenetic trees and protein structures), a collection of well-known fundamental functions to operate on them, and parsers for the 20 most common formats, we would already have a very compelling ecosystem.