
I don't know about everyone else, but slow Julia compilation continues to cause me suffering to this day. I don't think they're ever going to "fix" this. On a standard GitHub Actions Windows worker, installing the public Julia packages I use, precompiling, and compiling the sysimage takes over an hour. That's not an exaggeration. I had to juice the worker up to a custom 4x sized worker to get the wall clock time to something reasonable.

It took me days to get that build to work; doing this compilation once in CI so you don't have to do it on every machine is trickier than it sounds in Julia. The "obvious" way (install packages in Docker, run container on target machine) does not work because Julia wants to see exactly the same machine that it was precompiled on. It ends up precompiling again every time you run the container on other machines. I nearly shed a tear the first time I got Julia not to precompile everything again on a new machine.

R and Python are done in five minutes on the standard worker and it was easy; it's just the amount of time it takes to download and extract the prebuilt binaries. Do that inside a Docker container and it's portable as expected. I maintain Linux and Windows environments for the three languages and Julia causes me the most headaches, by far. I absolutely do not care about the tiny improvement in performance from compiling for my particular microarch; I would opt into prebuilt x86_64 generic binaries if Julia had them. I'm very happy to take R's and Python's prebuilt binaries.


I am very interested in improving the user experience around precompilation and performance. May I ask why you are creating a sysimage from scratch?

> I would opt into prebuilt x86_64 generic binaries if Julia had them

The environment variable JULIA_CPU_TARGET [1] is what you are looking for; it controls which micro-architectures Julia emits code for, and it supports multi-versioning.

As an example, Julia itself is built with [2]: generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)

[1] https://docs.julialang.org/en/v1/manual/environment-variable...

[2] https://github.com/JuliaCI/julia-buildkite/blob/9c9f7d324c94...


I have a monorepo full of Julia analysis scripts written by different people. I want to run them in a Docker container on ephemeral Linux EC2 instances and on user Windows workstations. I don't want to sit through precompilation of all dependencies whenever a new machine runs a particular version of the Julia project for the first time because it takes a truly remarkable amount of time. For the ephemeral Linux instances running Julia in Docker, that happens on every run. Precompiling at Docker build time doesn't help you; it precompiles everything again when you run that container on a different host computer. R and Python don't work like this; if you install everything during the Docker image build, they will not suddenly trigger a lengthy recompilation when run on a different host machine.

I am intimately familiar with JULIA_CPU_TARGET; it's part of configuring PackageCompiler and I had to spend a fair amount of time figuring it out. Mine is [0]. It's not related to what I was discussing there. I am looking for Julia to operate a package manager service like R's CRAN/Posit PPM or Python's PyPI/Conda that distributes compiled binaries for supported platforms. JuliaHub only distributes source code.

[0] generic;skylake-avx512,clone_all;cascadelake,clone_all;icelake-server,clone_all;sapphirerapids,clone_all;znver4,clone_all;znver2,clone_all


My point is that if you set JULIA_CPU_TARGET during the Docker build process, you will get relocatable binaries that are multi-versioned and will work on other micro-architectures. It's not just for PackageCompiler; it also applies to Julia's native code cache.

It worked! I was able to drop the Windows install on a standard GitHub Actions worker from 1 hour to 27 minutes. Here's what worked:

    ARG JULIA_CPU_TARGET="generic;skylake-avx512,clone_all;cascadelake,clone_all;icelake-server,clone_all;sapphirerapids,clone_all;znver4,clone_all;znver2,clone_all"
    ARG JULIA_PROJECT=[...]
    ENV JULIA_PROJECT=[...]
    RUN julia -e "using Pkg; Pkg.activate(\"[...]\"); Pkg.instantiate(); Pkg.precompile();"
What I got wrong the first time: I failed to actually export JULIA_CPU_TARGET so it would take effect in the "Pkg.precompile()" command. In reality, I hadn't correctly tested with that environment variable set at all. I was only correctly setting it when running PackageCompiler.

Thank you so much for this! It's too late for me to edit my original post, but cutting the install time in half is a major win for me. Now it only needs to precompile, not also compile a sysimage.


That was the very first thing I tried, and I couldn't get it to work, but I'm sure I am doing something wrong. Everything seemed great at build time, and then it just precompiles again at runtime, without anything saying why it decided to do that. I'll give it another shot if you say it should be working. The PackageCompiler step is the longest part; if that can be removed, it'll make a big difference. I'd rather be wrong and have this working than the other way around :) I'll report back with what I find.

> It took me days to get that build to work; doing this compilation once in CI so you don't have to do it on every machine is trickier than it sounds in Julia

You may be interested in looking into AppBundler. Apart from full application packaging, it also offers the ability to make Julia image bundles. In addition to a sysimage compilation option, it can bundle an application via compiled pkgimages, which requires less RAM and is much faster to compile.


I believe it is. Just tested it. You can make the link "C:\windows\system32\cmd.exe" and clicking it will launch the Command Prompt. I noticed you can't make it "C:\windows\system32\cmd.exe /c some-nefarious-thing"; it doesn't like the space. Exploiting may require you to ship both the malicious EXE and the MD, then trick the user into clicking the link inside the MD. But then you could have just tricked them into directly clicking the EXE.

>Exploiting may require you to ship both the malicious EXE and the MD, then trick the user into clicking the link inside the MD. But then you could have just tricked them into directly clicking the EXE.

1. You can use UNC paths to access remote servers via SMB

2. Even if it's local, it's still more useful than you make it out to be. For instance, suppose you downloaded a .zip file of some github project. The .zip file contains virus.exe buried in some subfolder, and there's a README.md at the root. You open the README.md and see a link (eg. "this project requires [some-other-project](subfolder\virus.exe)". You click on that and virus.exe gets executed.


> 1. You can use UNC paths to access remote servers via SMB

Relevant article from The Old New Thing: https://devblogs.microsoft.com/oldnewthing/20060509-30/?p=31...

Programs (this is true on most mainstream operating systems) can become network-facing without realizing it. I've found that a bunch of Windows programs tend to assume that I/O completes "instantly" (even though async I/O has been available on Windows for a very long time) and don't have a good UX for cancelling long-running I/O operations.


Definitely; I didn't mean to underplay it. Here's a fun one:

    [Free AI credits](C:\windows\system32\logoff.exe)
It works. This is a real exploit that you could do things with.

What if the space is url encoded %20 ?

That wouldn't work because Windows doesn't understand URL-encoded sequences in file paths.

I won't be paying extra to use this, but Claude Code's feature-dev plugin is so slow that even when running two concurrent Claudes on two different tasks, I'm twiddling my thumbs some of the time. I'm not fast and I don't have tight deadlines, but nonetheless feature-dev is really slow. It would be better if it were fast enough that I wouldn't have time to switch off to a second task and could stick with the one until completion. The mental cost of juggling two tasks is high; humans aren't designed for multitasking.


Hmm, I've tried two modes. One is to stay focused on the task at hand, but spin up alternative sessions to do documentation, check alternative hypotheses, and second-guess things the main session is up to. The other is to do an unrelated task in another session. I find the latter gets more work done in a day but is exhausting. With better scaffolding and longer per-task run times (longer tasks in the METR sense), it could be more sustainable as a manager of agents.

Two? I'd estimate twelve (three projects x four tasks) going at peak.


3-4 parallel projects is the norm now, though I find task-parallelism still makes overlap reduction bothersome, even with worktrees. How did you work around that?


If you're hosting on a public cloud, you can use a feature like AWS Session Manager to connect "through the backdoor" (via the guest's private communication with the hypervisor) without actually opening the ssh port to the world. This should fully address the client's concerns. None of my servers have ssh exposed at all.


How does the nature of remote access address the legal concern (presumably) about there being remote access in general?


That isn't my presumption about the nature of the concern. In OP's other comment they specify that the client is specifically worried about the open port.


Well, if you allow remote access, you conceptually allow some kind of logical inbound connection, no matter how it's technically realized.


In the late 90s/early 00s, I worked at a company that bought a single license of Visual Studio + MSDN and shared it with every single employee. In those days, MSDN shipped binders full of CDs with every Microsoft product, and we had 56k modems; it was hard to pirate. I don't think that company ever seriously considered buying a license for each person. There was no copy protection so they just went nuts. That MSDN copy of Windows NT Server 4 went on our server, too.

This was true of all software they used, but MSDN was the most expensive and blatant. If it didn't have copy protection, they weren't buying more than one copy.

We were a software company. Our own software shipped with a Sentinel SuperPro protection dongle. I guess they assumed their customers were just as unscrupulous as them. Probably right.

Every employer I've worked for since then has actually purchased the proper licenses. Is it because the industry started using online activation and it wasn't so easy to copy any more? I've got a sneaking suspicion.


> In the late 90s/early 00s, I worked at a company that bought a single license of Visual Studio + MSDN and shared it with every single employee.

During roughly the same time period I worked for a company with similar practices. When a director realised what was going on, and the implications for personal liability, I was given the job of physically securing the MSDN CD binder, and tracking installations.

This resulted in everyone hating me, to the extent of my having stand-up, public arguments with people who felt they absolutely needed Visual J++, or whatever. Eventually I told the business that I wasn't prepared to be their gatekeeper anymore. I suspect practices lapsed back to what they'd been before, but it's been a while.


Primarily it's the reason you already know: restic and borg are the same model, but restic doesn't need it to be an ssh-accessible filesystem on the remote end. Restic can send backups almost anywhere, including object storage like your Backblaze B2 (that's what I use with restic, too). I agree with OP: restic is strictly better. There's no reason to use borg today; restic is a superset of its functionality.


Thanks! Then I’ll look more at Restic :)


Does restic work well with truenas?


I don't know specifically, but it's a self-contained single file Go executable. It doesn't need much from a Linux system beyond its kernel. Chances are good that it'll work.


I simply use SQLite for this. You can store the cache blocks in the SQLite database as blobs. One file, no sparse files. I don't think the "sparse file with separate metadata" approach is necessary here, and sparse files have hidden performance costs that grow with the number of populated extents. A sparse file is not all that different from a directory full of files. It might look like you're avoiding a filesystem lookup, but you're not; you've just moved it into a sparse-extent lookup, which you'll pay for on every seek/read/write, not just once on open. You can simply use a regular file and let SQLite manage it entirely at the application level; this is no worse in performance and better for ops in a bunch of ways. Sparse files have a habit of becoming dense when they leave the filesystem they were created on.
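A minimal sketch of the blob-table idea (the schema and function names here are my own invention, not from any particular project):

```python
import sqlite3

BLOCK_SIZE = 4096  # assumed fixed block size for the cache

def open_cache(path):
    # One ordinary SQLite file holds all cached blocks; no sparse files.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS blocks ("
        "  file_id INTEGER NOT NULL,"
        "  block_no INTEGER NOT NULL,"
        "  data BLOB NOT NULL,"
        "  PRIMARY KEY (file_id, block_no))"
    )
    return db

def put_block(db, file_id, block_no, data):
    db.execute(
        "INSERT OR REPLACE INTO blocks VALUES (?, ?, ?)",
        (file_id, block_no, sqlite3.Binary(data)),
    )
    db.commit()

def get_block(db, file_id, block_no):
    # Returns None on a cache miss; the caller falls back to object storage.
    row = db.execute(
        "SELECT data FROM blocks WHERE file_id = ? AND block_no = ?",
        (file_id, block_no),
    ).fetchone()
    return None if row is None else row[0]
```

The (file_id, block_no) primary key plays the role of the sparse-extent lookup, but it's a plain B-tree inside one regular file that copies and backs up without surprises.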


I don't think the author could even use SQLite for this. NULL in SQLite is stored very compactly, not as pre-filled zeros. Must be talking about a columnar store.

I wonder if attaching a temporary db on fast storage, filled with results of the dense queries, would work without the big assumptions.


I think I did a poor job of explaining. SQLite is dealing with cached filesystem blocks here, and has nothing to do with their query engine. They aren't migrating their query engine to SQLite, they're migrating their sparse file cache to SQLite. The SQLite blobs will be holding ranges of RocksDB file data.

RocksDB has a pluggable filesystem layer (similar to SQLite's virtual filesystems), so they can read blocks from the SQLite cache layer directly without needing to fake a RocksDB file at all. This is how my solution works (I've implemented this before). Mine is SQLite in both places: one SQLite file (normal) holds cached blocks and another SQLite file (with a virtual filesystem) runs queries against the cache layer. They could do the same with SQLite holding the cache and RocksDB running the queries.
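To make the pluggable-filesystem idea concrete, here's a hedged sketch of how a byte-range read could be served through such a block cache. The names, the fixed block size, and the `fetch_block` callback (standing in for a fetch from object storage) are all assumptions for illustration, not RocksDB's actual filesystem API:

```python
import sqlite3

BLOCK_SIZE = 4096  # assumed cache block size

def read_range(db, file_id, offset, length, fetch_block):
    # Serve a byte-range read from cached blocks, fetching misses via
    # fetch_block(file_id, block_no) and caching them for next time.
    out = bytearray()
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    for block_no in range(first, last + 1):
        row = db.execute(
            "SELECT data FROM blocks WHERE file_id = ? AND block_no = ?",
            (file_id, block_no),
        ).fetchone()
        if row is None:
            data = fetch_block(file_id, block_no)  # cache miss
            db.execute(
                "INSERT OR REPLACE INTO blocks VALUES (?, ?, ?)",
                (file_id, block_no, sqlite3.Binary(data)),
            )
            db.commit()
        else:
            data = row[0]
        out += data
    start = offset - first * BLOCK_SIZE
    return bytes(out[start:start + length])
```

A filesystem shim for the query engine would translate each positional read into a `read_range` call, so the engine never sees anything but ordinary file semantics.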

IMO, a little more effort would have given them a better solution.


Ah, clever. Since they chose RocksDB I wonder if Amazon supports zoned storage on NVMe. RocksDB has a zoned plugin which describes an alternative to yours.


Being specific: AWS load balancers use a 60 second DNS TTL. I think the burden of proof is on TFA to explain why AWS is following an "urban legend" (to use TFA's words). I'm not convinced by what is written here. This seems like a reasonable use case by AWS.


Not one of the downvoters, but I'd guess it's because this is only true with HATEOAS which is the part that 99% of teams ignore when implementing "REST" APIs. The downvoters may not have even known that's what you were talking about. When people say REST they almost never mean HATEOAS even though they were explicitly intended to go together. Today "REST" just means "we'll occasionally use a verb other than GET and POST, and sometimes we'll put an argument in the path instead of the query string" and sometimes not even that much. If you're really doing RPC and calling it REST, then you need something to document all the endpoints because the endpoints are no longer self-documenting.
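For illustration, a hypothetical HATEOAS-style response (resource and link names invented here) embeds the available actions as links, which is what makes the endpoints self-documenting:

    {
      "id": 42,
      "status": "pending",
      "_links": {
        "self":    { "href": "/orders/42" },
        "cancel":  { "href": "/orders/42/cancel", "method": "POST" },
        "payment": { "href": "/orders/42/payment" }
      }
    }
A client that follows "_links" never needs an out-of-band endpoint catalog; an RPC-style API with bare IDs does.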


HATEOAS won't give you the basic nouns to work with.


Right, you wouldn't need HTML at all for LLMs though. REST would work really well; self-documenting and discoverable is all we really need.

What we find ourselves doing, apparently, is bolting together multiple disparate tools and/or specs to try to accomplish the same goal.


But that is roughly the point here. If we still used REST we wouldn't need swagger, openapi, graphql (for documentation at least, it has other benefits), etc.

We solved the problem of discovery and documentation between machines decades ago. LLMs can and should be using that today instead of us reinventing bandaids yet again.


A lot of negative responses so I'll provide my own personal corroborating anecdote. I am intending to replace my low-code solutions with AI-written code this year. I have two small internal CRUD apps using Budibase. It was a nice dream and I still really like Budibase. I just find it even easier yet to use AI to do it, with the resulting app built on standard components instead of an unusual one (Budibase itself). I'm a programmer so I can debug and fix that code.

