Hacker News | new | past | comments | ask | show | jobs | submit | acidflask's comments

A port of Andrej Karpathy's llama2.c to pure Julia. It works with the Llama2 and Andrej's tinyllamas models, and a REPL mode allows interaction from the Julia console.


How long until we can have Julia wrappers?


Since jakebol is involved in both TileDB and Julia, this might already be in the works?


This question keeps me up at night, as I presume it does for my colleagues at Julia Computing. Not that switching to a fully commercial model is necessarily a bad thing, but Julia Labs, like any academic group, always has to worry about where funding will come from.


I think the problem is that the old model (proprietary development) is pretty unworkable too, but its effects are distributed less starkly. In the old days we had compilers from various vendors at various stages of obsolescence; source code would work under a particular compiler, and the compiler license was associated with that component. There was no budget for buying a new compiler for a particular project, so maintenance gradually got worse and worse, meaning things eventually had to be rebuilt. Good for developers, good for vendors, bad for the bottom line and for wider economic development. I wonder what the right size of operation is for implementing, innovating on, and supporting a project like Julia (not what people would dream of, but the operation that would just about do the job effectively), and I wonder what models could be created to sustain that kind of operation over the right kind of timescale.


See Slide 4 for why it's important to be open source.

TL;DR: Researcher A finds things he wants improved in Magma (closed source) but can't change them. Researcher B tries to write an improved FOSS implementation, but loses his job, likely in part because he spent too much time writing said code instead of doing other things like writing papers. Researcher A moves on and has a successful academic career. Moral: writing FOSS can cost you your academic job; it's safer to find something else to do.


I'll be there for moral support, if I'm allowed to crash!


Please come!


And the problem with academic funding is that it's really hard to pay people to improve the documentation with those funds.


As Stein explains in the slides: because he couldn't sustainably pay people to work with him on SAGE, and there is only so much you can do alone.


But you can be a professor and start a company simultaneously.


I have tried for a while now, and I thought I could do both. But... (1) It is difficult on a personal level -- for example, last month SageMathCloud got hit by a major DDoS attack 15 minutes before I had to teach a class. I have a family, and though I love to work, there are only so many hours in a day. (2) I am at a big old state university, and there are many complicated, byzantine conflict-of-interest and IP rules, which have been a pain to navigate, and our university commercialization office isn't the best. (3) Investors greatly prefer that the company they are investing in is not just a side project for the person running it. All that said, the mathematics department at the University of Washington is full of supportive faculty; I'm doing what I'm doing more for the people I want to hire than just for myself.


Why not take a leave of absence to at least get the company started and acquire some funding? After that, you could just have a consulting role with the company.


I did during my 2014-2015 sabbatical. Building a successful company is vastly more difficult and demanding of attention than I could have imagined. Maybe I'm just not as good at doing multiple difficult things at once as other people.


But you can't be a professor and work full-time on another project, which is what he wants to do.


He has a higher risk tolerance than I do. I would work on this 80% of the time and do the required stuff as a tenured professor the other 20%, but what do I know. Maybe I am overly enamored with becoming a tenured professor. If he was already spending his time as a professor working on this, I don't understand the difference. Still, all the best and good luck to him.


The work of a tenured professor is more than a 20% time commitment. A normal teaching load is 4-5 classes a year, plus significant committee work, student advising, etc. It takes an enormous amount of time. And it can be awesome and fun, and many of my colleagues love doing it. But it doesn't result in creating a free open source alternative to Mathematica.


I think people underestimate how much work a professor does. Everybody in my department who started a company either did it before starting at the university or they did it while on sabbatical.

Good luck!


If you do the bare minimum work as a tenured professor you're going to get an awful lot of people extremely mad at you at all levels of the academic hierarchy.


If you do the bare minimum work and you're spending a lot of time on your own company, at my university, you will almost certainly get a pink slip. Doesn't matter if you have tenure. We are not allowed to work more than one day a week on such things, and that requires approval, which probably won't happen if we're not publishing, have terrible teaching evaluations, and aren't doing a full share of service and advising.


This is precisely correct. And even if I could get away with it, I would not feel that it is morally right (for me at least).


Would you happen to have any insight into the question that brought me to go dig through old letters in a library in the first place?

https://news.ycombinator.com/item?id=11799143


Priority is tricky to nail down e.g. the EDSAC was operational a year before the Mark 1 (which actually was not operational until 1949). Because of the "B" Williams Tube which held the two index registers of the Mark 1, many other manufacturers -- e.g. Burroughs -- later called their index registers "B registers". (Also, I think the Univac I and II were successful commercially, and earlier than the 704.)

I started programming in earnest and for money in 1961 mostly in machine code on a variety of machines, and I'm pretty sure that the notions of single dimensional sequential locations of memory as structures (most often called arrays, but sometimes called lists), predated the idea of index registers. This is because I distinctly remember being first taught how to move through arrays sequentially just by direct address modification -- this was called indexing in the Air Force -- and later being introduced to the index register or registers -- depending on the machine.

My strong guess (from above) is that the array as sequential memory accessed by address modification of some kind was a primary idea in the very first useful computers in the late 40s, and that this idea pre-dated index registers.

It would be well worth your time to look at the instruction set of EDSAC, and try writing some programs.


The title "This guy’s arrogance takes your breath away" is taken directly from Backus's own description of this collection of letters. I've changed the title to make clearer that it is a direct quotation.

Unfortunately, the Library of Congress does not allow scanners without prior approval, so this was the only way I could make my own copies. It did not help that all these letters were written or typed on very thin mimeograph paper.


Thank you for going to the trouble of sharing this amazing material! I'd also like to hear about the historical paper you're working on.

We've taken the "arrogance" quote out of the HN title and replaced it with a neutral description of the letters—perhaps a bit too neutral, given how fabulous the post is. But HN readers can figure that out for themselves, especially once a post gets so high on the front page.


The paper is about, among other things, the history of the array data structure. It's far too early to advertise, but you can see a very early version on my GitHub account. :)

I was surprised to discover recently that prior to 1950 the word 'array' was used exclusively to describe two-dimensional tables of numbers of the kind one might find in a matrix or determinant. But by the advent of FORTRAN I in 1957 and ALGOL 58, 'array' referred by default to a one-dimensional entity, with higher-dimensional cases called out explicitly as 'n-dimensional arrays'. I was interested in digging through John Backus's papers from this era to see if I could find any clues.

I was able to narrow the window to 1952-1954, since the FORTRAN preliminary report of 1954 uses the word 'array' casually in the modern one-dimensional sense as interchangeable with 'subscripted variables', the latter being the more common terminology at the time. By comparison, a virtually unknown paper by Rutishauser in 1952 describing the "for" loop did not use the word 'array' at all, only 'subscripted variables'. (Rutishauser was an accomplished mathematician and quite possibly the world's first computational scientist.) A paper by Laning and Zierler at MIT in 1954 describing a formula compiler also used only the term 'subscripted variables'.

Backus's papers also contain evidence that FORTRAN I was written specifically to take advantage of the IBM 704's machine capabilities. Not only was the IBM 704 the world's first commercially successful computer, it was also an improvement over the preceding IBM 701 in providing index registers (3 of them) and floating point instructions which were fast for their era. Backus's papers describe how providing hardware support for indexing and floating point was revolutionary, as all programs up to that time had to write out these instructions by hand (and for many programs that was pretty much all they did).

So it is clear to me now that the changeover in the implied dimensionality of the word 'array' must be related to how the array developed as a data structure abstracting away indexing operations. By the time IAL (pre-ALGOL) came on the scene in 1958, the idea of indexable homogeneous containers was already well established. But I still haven't found any smoking-gun evidence introducing the one-dimensional sense of the word. I suspect further digging into the description of the IBM 704 may be necessary. The 704 was not the first to provide index registers, but it may have been the first to call them that. (The Manchester Mark I computer of 1948 appears to be the first computer with an index register, but it was called a B line. The patent claiming to cover index registers (https://www.google.com/patents/US3012724) uses the term "control instruction" - no arrays mentioned - but it very cutely describes numbers as residing in known locations or "addresses" in quotes.)


Thanks for the input!

Yes, there are at least two separate stages of historical development here. The first is when people realized it was useful to repeat the same operation on different data in memory and viewed the collection of data as a variable in its own right. The earliest term I can find for this concept is "subscripted variable" (many examples prior to 1954, e.g. Rutishauser, 1952 in the "for" loop paper; Laning and Zierler, 1954), but the idea appears to go all the way back to Burks, Goldstine and von Neumann in 1946. Quoting p. 9, paras. 3.3-4:

"In transferring information from the arithmetic organ back into the memory there are two types we must distinguish: Transfers of numbers as such and transfers of numbers which are parts of orders. The first case is quite obvious and needs no further explication. The second case is more subtle and serves to illustrate the generality and simplicity of the system. Consider, by way of illustration, the problem of interpolation in the system. Let us suppose that we have formulated the necessary instructions for performing an interpolation of order n in a sequence of data. The exact location in the memory of the (n + 1) quantities that bracket the desired functional value is, of course, a function of the argument. This argument probably is found as the result of a computation in the machine. We thus need an order which can substitute a number into a given order -- in the case of interpolation the location of the argument or the group of arguments that is nearest in our table to the desired value. By means of such an order the results of a computation can be introduced into the instructions governing that or a different computation. This makes it possible for a sequence of instructions to be used with different sets of numbers located in different parts of the memory.

"To summarize, transfers into the memory will be of two sorts:

"Total substitutions, whereby the quantity previously stored is cleared out and replaced by a new number. Partial substitutions in which that part of an order containing a _memory location-number_ -- we assume the various positions in the memory are enumerated serially by memory location-numbers -- is replaced by a new _memory location-number_.

"3.4. It is clear that one must be able to get numbers from any part of the memory at any time. The treatment in the case of orders can, however, be more methodical since one can at least partially arrange the control instructions in a linear sequence. Consequently the control will be so constructed that it will normally proceed from place n in the memory to place (n + 1) for its next instruction."

https://library.ias.edu/files/Prelim_Disc_Logical_Design.pdf

The language is of course archaic, but the ideas described are clearly those of indexing in 3.3 and arrays in 3.4. They use the word "sequence", but arguably this usage is in its ordinary mathematical sense.

The written historical evidence, at least, would confirm your strong guess that the idea of arrays itself is older than index registers. There's a missing etymological link though: when did a sequence of data stored consecutively in memory become associated with the word "array"? Still, the earliest written reference I can find for this second stage of historical development is the 1954 preliminary report on FORTRAN.

Maybe the word "array" is somehow derived from the advent of RAM, which even in its earliest form in Williams tubes had memory locations arranged physically in two dimensions. So right from the start we have two dimensions physically, but only one dimension logically, since the earliest computer instructions only dealt with (one-dimensional) offsets, if at all. Furthermore, popular science accounts of magnetic core memory describe them in terms of arrays. To give one example, the June 1955 issue of Scientific American (vol. 192, no. 6, pp. 92–100) writes about "magnetic core arrays".

http://www.nature.com/scientificamerican/journal/v192/n6/pdf...
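
The "partial substitution" mechanism of para. 3.3 above -- walking through consecutive memory cells by rewriting the address field of an order held in memory, before index registers existed -- can be illustrated with a toy simulation in Python (the memory layout and names here are invented for illustration, not taken from any real machine):

```python
# Toy stored-program memory: orders and data share one address space, so a
# loop can traverse an "array" by rewriting an order's address field
# ("partial substitution" in the 1946 report) instead of using an index register.
memory = [0] * 16
memory[8:13] = [3, 1, 4, 1, 5]   # the array, in consecutive cells 8..12
memory[0] = 8                     # cell 0 plays the address field of a LOAD order

def sum_by_address_modification(length):
    acc = 0
    for _ in range(length):
        acc += memory[memory[0]]  # execute the LOAD through the stored address
        memory[0] += 1            # modify the address field held in memory
    return acc
```

Calling `sum_by_address_modification(5)` accumulates cells 8 through 12; the "index" lives in memory cell 0, which is exactly the kind of in-memory address arithmetic that index registers later made implicit.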


Thanks for making these letters public!

I missed that the quote was Backus's, it makes a lot more sense now.

The quality of your photos is excellent and I doubt a scanner would make them much better. I wish there was a transcript though.

As a start, here is the first letter transcribed:

    to John Backus
    International Business Machines Corporation
    5600 Cottle Road
    SAN JOSE CA 95193
    U.S.A
    
                          Monday 29th of May, 1978
    
              “To interrupt one’s own researches in order to 
              follow those of another is a scientific pleasure
              which most experts delegate to their assistants.”
                    (Eric Temple Bell in 
                         “Development of Mathematics”)
    
    Dear John,
    
      I do not claim to be more noble or sensible than “most 
    experts”; perhaps it is only becauseI have only one 
    assistant to whom I can delegate no more than one 
    man can do. But when you open your letter with:
        “I am quite sure that you have never read 
        any paper I’ve sent you before”
    it is my pleasure to inform you that - although
    “quite sure” - you were very wrong. I very well 
    remember that you mailed me a sizeable paper 
    on reduction languages to which I owe my introduction
    to that subject. I didn’t only read it, parts of it 
    were even studied. I also remember that it left me 
    with mixed feelings.
    
      I can very well understand your excitement, although, 
    for lack of experience, I still cannot share it. I am 
    far from delighted with the state of the art of programming 
    today, and anyone suggesting an alternative has in 


    in [sic] principle my sympathy - until, of course, he loses it again, 
    like Terry Winograd did when he suggested natural language 
    programming - “natural descriptions”! - as an alternative-.
    In the long run I have more faith in any rephrasing of the 
    programming task that makes it more amendable to mathematical 
    treatment. But you must have lots of patience, for the 
    run will be very long. It isn’t just mental inertia - that 
    problem can be solved generally by education a new 
    generation of youngsters and waiting until the old ones 
    have died - . It is the development of a new set of 
    techniques needed to achieve a comparable degree of 
    proficiency.
    
    Could you get me a copy of G.A. Mago’s (not 
    yet published) “A network of microprocessors to execute 
    reduction languages”? That might whet my appetite! 
    From various angles I have looked into such networks 
    and I am not entirely pleased with what I have 
    seen. Firstly I suspect our techniques for proving the 
    correctness of such designs: each next proof seems to 
    differ so much from all the previous ones. I suspect 
    I discovered that all I could design were special 
    purpose networks, which, of course, made me suspect 
    the programming language in the Von Neumann style 
    which, already before you have chosen your problem, 
    seems to have set you on the wrong track.
    
       Semantically speaking the semicolon is, of course, 
    only a way of expressing functional composition: it 
    imposes the same order that can also be expressed 
    with brackets - innermost brackets first -. In combination 
    with the distribution you can generate many innermost 


    bracket pairs, thus expressing very honestly that it is 
    really only a partial order that matters. I like that, 
    it is honest.
    
    When you write “one can transform programs [....]
    by the use of laws [...] which are _part of the program-
    ming language_” etc. I am somewhat hesitant. I am 
    not convinced (yet?) that the traditional separation in 
    fixed program, variable data and assertions is a 
    mistake. The first and the last - program and assertions - 
    are somewhat merged in LUCID, in which correctness 
    proofs are carried out in (nearly) the same formalism 
    as the program is expressed in. On the one hand that 
    presents a tempting unification, on the other hand I 
    thought that mathematicians speerated carefully and 
    for good reasons the language they talk about and 
    the metalanguage in which they do so. To put it in 
    another way: given a functional program, I feel only 
    confident if I can do enough other significant things 
    to it besides easy[striked out three times] carrying it out. And those I don’t 
    see yet. The almost total absence of redundancy 
    is another aspect of the same worry. In the case 
    of a traditional program we know how to make it 
    redundant: by inserting assertions, combination 
    of text and assertions makes it into a logically 
    tightly knit whole, that gives me confidence. How 
    do we this with functional programs? By supplying 
    two of them and an argument that demonstrates 
    their equivalence?
    
      What about the following example? (My notation 
    because I lack the fluency in yours.) 

    
    (1)   Consider the function f defined by:
        f(0) = 0, f(1) = 1, f(2n) = f(n), f(2n+1) = f(n) + f(n+1)

    (2)   Consider the following program (“peven” = “positive and even”,
        so for “podd”)
    
    {N>=0} n, a, b := N, 1, 0;
    _do_ peven(n) -> a, n := a + b, n/2
      [] podd(n)  -> b, n := b + a, (n-1)/2
    _od_ {b=f(N)}
    
    Here (1) gives a recursive definition of f, (2) gives 
    a repetitive program. Both definitions can be translated 
    in a straightforward manner into functional programs. 
    What would be involved in the demonstration of their
    equivalence?
    
       The above seems to me little example of the appropriate 
    degree of sophistication to try your hands on. (I am 
    not going to try it myself, as I fear that my past 
    experience would misguide me: I am afraid that I
    wouldn’t rise above the level of translating (2)’s 
    traditional correctness proof - with which I am very
    familiar - in an unfamiliar notation. Good luck!)
    
       With my greetings and warmest regards, 
                       yours ever
                               _Edsger_
    
    P.S. Please note my new postal code: 5671 AL
    (capitals obligatory) _in front of_ the village name
                                                         EWD.

Maybe someone with OCR software at hand can give the typed ones a try?
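
A side note on Dijkstra's example in the letter above: definitions (1) and (2) translate directly into code, so the equivalence he asks about can at least be checked empirically. A minimal Python sketch (function names are my own):

```python
def f_rec(n):
    # Definition (1): f(0) = 0, f(1) = 1, f(2n) = f(n), f(2n+1) = f(n) + f(n+1)
    if n < 2:
        return n
    if n % 2 == 0:
        return f_rec(n // 2)
    return f_rec(n // 2) + f_rec(n // 2 + 1)

def f_iter(N):
    # Program (2), transcribed from the guarded-command loop:
    # peven = "positive and even", podd = "positive and odd"
    n, a, b = N, 1, 0
    while n > 0:
        if n % 2 == 0:          # peven(n) -> a, n := a + b, n/2
            a, n = a + b, n // 2
        else:                   # podd(n)  -> b, n := b + a, (n-1)/2
            b, n = b + a, (n - 1) // 2
    return b                    # postcondition: {b = f(N)}

# the equivalence Dijkstra asks about, checked empirically:
assert all(f_rec(N) == f_iter(N) for N in range(1000))
```

This checks agreement, of course, not the demonstration of equivalence the letter actually asks for.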


Thanks for contributing the transcript!

I've started a GitHub repo with a first draft of transcripts for all the letters.

https://github.com/jiahao/backus-dijkstra-letters-1979

Anyone else who is interested is welcome to help proofread and submit PRs.


As an addition to your transcription, here is the last letter transcribed:

    to John Backus
    91 Saint Germain Avenue
    SAN FRANCISCO, California 94114
    U.S.A.
                          12th July 1979
    
    Dear John,
       My schedule is very tight, and I am afraid that I
    must disappoint John Williams; Monday 20 or Tuesday 21
    August seem the only two slots that are really available.
      I shall arrive in the USA on Sunday 29th July, and
    had already committed myself for the week of Monday
    30 July to Burroughs in Mission Viejo. My wife and
    the two youngest children will join me for the first
    three weeks of the trip, and the week of 6th August
    was planned as "the real holiday." Since then Bill
    McKeenan has twisted my arm: the programming
    course to be given that week was so heavily over-
    booked that he asked me to give a second course,
    in parallel to David's. Under those circumstances
    I would like to avoid further commitments in
    the week starting on Monday 13: it is their last
    week in the USA, and I have then been working
    almost all the time!
      Immediately after WG2.3 I shall go to Austin,
    Texas (Burroughs and University), and then home!
    If I have a completely free choice, I think I
    would prefer Monday 20 slightly over Tuesday 21.
    
    New title and abstract:
    
    Title: Termination Detection for Diffusing Computations
    
    Abstract: The activity of a finite computation may
    propagate over a network of machines, when machines
    may delegate (repeatedly) subtasks to their
    neighbours; when such a computation is fired
    from a single machine, we call it "a diffusing
    computation." We present a signalling scheme
    --to be superimposed on the diffusing computation--
    enabling the machine that fired it to detect its
    termination. Our signalling scheme is perfectly
    general in the sense that, independently of the
    topology of the network, it is applicable to
    any diffusing computation. (End of abstract.)
    
      Please give my regards to John Williams and tell
    him how sorry I am that I shall miss him. I
    am not familiar with distance and transport facilities
    between Santa Cruz and San Jose. If it is better
    when I come to San Jose the day before that
    is fine with me; may I leave the logistics to
    you? (Wife and children leave on Friday 17th of August.)
    With my greetings and best wishes,
                     yours ever
                          _Edsger_
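
The abstract in this letter is for what became Dijkstra and Scholten's "Termination Detection for Diffusing Computations" (Information Processing Letters, 1980). The deficit-counting scheme it summarizes can be sketched as a toy, single-threaded Python simulation (the graph encoding and all names are mine; a real implementation is distributed and asynchronous):

```python
from collections import deque

def diffusing_termination(graph, root):
    """Toy sketch of deficit counting: every engaged machine remembers the
    edge over which it was first engaged and a deficit = messages sent minus
    acknowledgements received; it signals over its engagement edge only once
    its own deficit has returned to zero."""
    parent = {root: None}          # engagement edges form a tree rooted at root
    deficit = {n: 0 for n in graph}
    engaged = {root}
    terminated = False

    def signal_upwards(n):
        # climb the engagement tree as completed subcomputations acknowledge
        nonlocal terminated
        while deficit[n] == 0:
            p = parent[n]
            if p is None:          # the firing machine's deficit is zero:
                terminated = True  # the whole diffusing computation is done
                return
            deficit[p] -= 1
            n = p

    inbox = deque()
    for nb in graph[root]:         # the root fires the computation
        deficit[root] += 1
        inbox.append((root, nb))
    if not inbox:
        terminated = True          # degenerate case: nothing was delegated

    while inbox:
        sender, rcv = inbox.popleft()
        if rcv in engaged:         # already engaged: acknowledge at once
            deficit[sender] -= 1
            signal_upwards(sender)
        else:                      # first message engages rcv
            engaged.add(rcv)
            parent[rcv] = sender
            for nb in graph[rcv]:  # delegate subtasks to all neighbours
                deficit[rcv] += 1
                inbox.append((rcv, nb))
            signal_upwards(rcv)    # no-op unless rcv had nothing to delegate
    return terminated
```

For example, `diffusing_termination({'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}, 'a')` returns `True` once every delegated subtask along the chain has been acknowledged back to the firing machine.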


Why doesn't the LoC scan it themselves? That seems much more useful than just having on display in one location.


The government only recently scanned the US Statutes at Large, so I would say they have more important works to scan.


I'm genuinely surprised that one would say that Julia is "slowing down in development". Perhaps it's because less press is being generated about Julia? Or that the commit rate has gone down slightly, now that the easier issues have been picked off and the remaining work will take longer for the next round of incremental developments? I'm not sure what the OP meant, but from the inside, we are busier than ever.

- Both Julia Computing and the Julia Lab have grown sizably over the past two years. The Lab now houses ten full-time researchers (up from four last year), with five new students coming online over the summer and fall. We also maintain active research collaborations with more research groups at MIT and off-campus.

- Julia is a grateful recipient of 12 Google Summer of Code slots this year, compared to 8 for 2015's Julia Summer of Code program (sponsored by the Moore Foundation) and 4 for GSoC 2014.

- JuliaCon grew from 72 attendees in 2014 to 225 in 2015 and we are on track to meet or exceed last year's ticket sales for 2016.

- New packages continue to be registered on the central METADATA repository at roughly the same rate since June 2014. http://pkg.julialang.org/pulse.html

By some measures we are still a relatively small project, but I don't see any serious evidence for the imminent heat death of the Julia universe.

