Leaf: Machine learning framework in Rust (github.com/autumnai)
379 points by mjhirn on March 8, 2016 | 49 comments


The performance graph is deceptive for two reasons: (1) Leaf with CuDNN v3 is a little slower than Torch with CuDNN v3, yet the bar for Leaf is positioned to the left of the one for Torch, and (2) there's a bar for Leaf with CuDNN v4, but not for Torch.

It's good to see alternatives to Torch, Theano, and TensorFlow, but it's important to be honest with the benchmarks so that people can make informed decisions about which framework to use.


The graph in the readme is outdated; you can see the version with Torch/CuDNN v4 here: http://autumnai.com/deep-learning-benchmarks

And I don't believe the first point counts as deceptive; the bars are ordered by Forward ms, not by the sum of Forward and Backward. In both CuDNN v3 and v4, Leaf is faster than Torch by that metric (25 vs 28 for v4, 31 vs 33 for v3).


Yes, on their site they post Torch CuDNN v4 as faster than Leaf [0]. Seems exciting for an early release.

Can it get much faster than something like Torch? I would think that if CuDNN accounts for most of the computation time, it would be hard to see big improvements. Perhaps go the route of Neon and tune your GPGPU code like crazy [1, 2], or of MXNet and think about distributed computing performance [3].

[0] http://autumnai.com/deep-learning-benchmarks

[1] https://github.com/soumith/convnet-benchmarks

[2] https://github.com/NervanaSystems/neon

[3] http://alex.smola.org/talks/NIPS15.pdf


> Leaf with CuDNN v3 is a little slower than Torch with CuDNN v3, yet the bar for leaf is positioned to the left of the one for Torch

I think that's because they're sorting by forward time rather than forward+backward. That would also explain why in the Alexnet benchmark Tensorflow (cuDNN v4) is to the left of Caffe (cuDNN v3) despite having a much taller bar overall.


I think Microsoft's approach with CNTK is far preferable to this. Rather than defining all the layers in Rust or C++ it uses a DSL to specify mathematical operations as a graph.

You can easily add new layer types, and recurrent connections are easy too - you just add a delay node.

Furthermore, since the configuration file format is fairly simple, it is possible to make GUI tools to visualise it and - in future - edit it.
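To make the "graph of operations plus a delay node" idea concrete, here is a toy sketch of the general shape of the approach. This is not CNTK's actual configuration format; the node names and the tiny evaluator are made up purely for illustration.

    # Toy illustration only; NOT CNTK's real configuration language.
    # The network is described as data (a graph of mathematical ops), and
    # recurrence is expressed with a "delay" node that reads the previous
    # timestep's value of another node.
    import numpy as np

    # h = tanh(Wx @ x + Wh @ delay(h)), a minimal recurrent cell
    graph = {
        "h_prev": {"op": "delay",  "of": "h", "init": np.zeros(4)},
        "xw":     {"op": "matmul", "inputs": ["Wx", "x"]},
        "hw":     {"op": "matmul", "inputs": ["Wh", "h_prev"]},
        "pre":    {"op": "add",    "inputs": ["xw", "hw"]},
        "h":      {"op": "tanh",   "inputs": ["pre"]},
    }
    params = {"Wx": 0.1 * np.random.randn(4, 3), "Wh": 0.1 * np.random.randn(4, 4)}

    def eval_node(name, values, prev):
        """Recursively evaluate a node; the input "x" is supplied per timestep."""
        if name in params:
            return params[name]
        if name in values:
            return values[name]
        spec = graph[name]
        if spec["op"] == "delay":
            return prev.get(spec["of"], spec["init"])
        args = [eval_node(n, values, prev) for n in spec["inputs"]]
        ops = {"matmul": lambda a, b: a @ b,
               "add":    lambda a, b: a + b,
               "tanh":   lambda a: np.tanh(a)}
        values[name] = ops[spec["op"]](*args)
        return values[name]

    prev = {}
    for t, x_t in enumerate(np.random.randn(5, 3)):   # a length-5 input sequence
        h_t = eval_node("h", {"x": x_t}, prev)
        prev = {"h": h_t}                             # becomes delay(h) next step
        print(t, h_t)

The point is that adding a new layer type or a recurrent connection means editing the data describing the graph, not the framework's source.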


A DSL-based format has some advantages, as it is easy to get going with building networks. However, you are then constrained by what the program that interprets/executes the DSL supports in terms of loading/saving data, solvers, etc. If you want to do something more dynamic, e.g. AlphaGo, then you have to go back to a "real" programming language anyway.

That's not to say that Leaf won't have a DSL at some point, but we will wait until the features of the layers have stabilized a bit more and we have mapped out more clearly what goals we have for a DSL.


So you're saying prototyping is easier without a DSL?


Depends on what kind of prototyping. At the current state of neural networks DSLs are mainly helpful if you want to tune a network architecture for well-established tasks like image classification for the Imagenet dataset.

Outside of that I see more dynamic alternatives used much more.


I'm honestly skeptical that Rust is all that appealing for this type of work. It just doesn't seem like (1) concerns such as performance and type safety are the top priority in this space, or (2) that this offering is differentiated enough from what you already get from Java today.

Honestly, many modeling problems are clunky and inefficient at scale; however, that's OK. When you need to scale badly enough, you already have a significant set of libraries in Java to support this.

I'm failing to see an answer to the one question I have: "why Rust?"


Previous discussion 4 months ago https://news.ycombinator.com/item?id=10539195


> super-human image recognition

That's a bold claim. As far as I know there was one paper that reported a model beating human scores in a specific test (imagenet, I believe). Whether that translates to "superhuman" results in general is followed by a very big question mark.

In general I really struggle to see how any algorithm that learns from examples, especially one that minimises a measure of error against further examples, can ever have better performance than the entities that actually compiled those examples in the first place (in other words, humans).

I'm saying: how is it possible to learn superhuman performance in anything from examples of mere human performance at the same task? I don't believe in magic.


First of all, no one ever expected machines to beat humans at Imagenet, at least not this soon. It's an amazing accomplishment, because Imagenet consists of high-resolution pictures of many different types of objects, which is very different from tiny photos or pictures of digits.

Second, the examples were produced by scraping Flickr. Then Mechanical Turk workers were asked to confirm whether the object was in the image or not.

There are many images that are kind of ambiguous, or contain multiple objects, so humans don't do perfectly. One researcher tried to estimate human performance and got an error rate of about 5%, which computers have now beaten by a lot.


> First of all no one ever expected machines to beat humans at Imagenet.

I'm not contesting the fact that it's surprising and overall a sign of progress. I'm contesting the claim that it demonstrates "superhuman" performance.

By analogy, a good student at a bad school is "superhuman" because he or she got a good mark in an exam that most other pupils _in that school_ failed. You gotta go a lot further than that before you put on the red cape.


> how is it possible to learn superhuman performance in anything from examples of mere human performance at the same task? I don't believe in magic.

Computers could be better at assigning probabilities to ambiguous examples. In particular, for an image that is very ambiguous for most humans, maybe a computer would assign 99% probability to it (hence it would be only a little bit ambiguous).


That's not how it works. Assigning a high probability to anything is trivial: just add 90% to any probability calculation. The important thing is how close your guess is to the right answer.
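To make that concrete, proper scoring rules such as log loss punish exactly this kind of blanket overconfidence. A rough numpy sketch with made-up labels and probabilities:

    # Log loss penalizes confident wrong answers far more than it rewards
    # confident right ones, so "just add 90%" is not a free lunch.
    import numpy as np

    def log_loss(y_true, p_pred, eps=1e-12):
        p = np.clip(p_pred, eps, 1 - eps)
        return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

    y = np.array([1, 0, 1, 1, 0])                          # ground truth
    calibrated    = np.array([0.7, 0.2, 0.8, 0.6, 0.3])    # honest, hedged guesses
    overconfident = np.clip(calibrated + 0.9, 0, 1)        # "just add 90%"

    print(log_loss(y, calibrated))      # around 0.33
    print(log_loss(y, overconfident))   # blows up on the confidently wrong cases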


Ensembles of humans can outperform the average human, and in the same way an algorithm trained on data labeled by an ensemble of humans can outperform the average human.


Beating the average human does not make you "superhuman". Here's a quick proof: there exist mere humans with above-average performance who can outperform the "average human". Those people are human. Therefore, they're not superhuman.

Besides, I have no idea whether the people who tagged Imagenet are the "average human", nor whether an ensemble of them can outperform the "average human".

Also, I'm pretty sure that it doesn't necessarily follow that an algorithm trained by many X can outperform any X. Most humans are trained by an ensemble of humans and they don't necessarily outperform the "average human".

Mind you, I'm not saying I _know_ what "superhuman" is, but then again I'm not the one who claims to have created an example of it.


It is much faster than humans


That means nothing.

Also, here's an example where humans beat machines in image recognition:

http://www.pnas.org/content/113/10/2744.full

The task is the recognition of very small and blurry images. Several different models were used, including a very deep convnet.


Computers have been faster than humans for the last 40 years. That doesn't make them more intelligent.


Then by implication this task does not require intelligence ;)

Computers are faster serial processors but brains do more in parallel.

Parallel pipelines only really hit neural nets with GPUs, and the Imagenet convnet solvers like Alexnet were among the first parallel implementations. This gave a 30-300x speedup, but that is still relatively tiny compared with squishy wetware.


??


I'm completely new to ML and what real-world applications it's suitable for. Are we at the point yet where you can train a computer to look at arbitrary images and count the number of people in them? What if it was largely the same background and only the number of people changed, for example a camera shooting a queue of people to determine queue depth at a bus station?


A system like that would be surprisingly hard to build. The problem wouldn't be the ML algorithms - it would be just about everything else. A few things you need to solve robustly to build your counter:

1. The "same background" doesn't really exist for most cameras in most settings. Changes in illumination alone will make segmenting the background tricky. Moving objects in the scene will also be hard - think fountains and trees in the wind. Google for "foreground-background segmentation" to see some papers on this.

2. I haven't seen anyone use recent ML algorithms with less than high quality images. That may not matter, but it could matter a lot.

3. Extending recent ML algorithms to work with video at a high enough frame rate to be useful (10Hz at a minimum) may or may not be easy.

I'm sure that what you're proposing could be done. But I think that the number of small annoyances you'd hit would probably discourage most people who aren't treating the problem as a research exercise in Computer Vision.


In the scale of computer vision problems, the stationary camera case is relatively easy. It's not too hard to isolate moving objects from a background, it's not too hard to decide if an object is a person or not, and it's not too hard to keep track of an object once you've identified it. You would still have to handle overlapping people, scene illumination changes, etc, but these can be solved and have been done before.

If you would like to play with some of this stuff, take a look at OpenCV. http://opencv.org
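For what it's worth, here is a rough sketch of the stationary-camera case using OpenCV's built-in background subtraction. The camera index and the blob-size threshold are placeholders you would have to tune, and it deliberately ignores the hard parts mentioned above (overlapping people, lighting changes):

    # Rough sketch: count moving foreground blobs from a stationary camera.
    # Thresholds and the camera index are placeholders; overlapping people,
    # shadows and lighting changes all need more work than this.
    import cv2

    cap = cv2.VideoCapture(0)                        # placeholder camera index
    subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
    MIN_BLOB_AREA = 1500                             # pixels; depends on the scene

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)               # foreground mask (0/127/255)
        mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)[1]   # drop shadows
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)        # remove noise
        contours = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                    cv2.CHAIN_APPROX_SIMPLE)[-2]     # 3.x/4.x safe
        count = sum(1 for c in contours if cv2.contourArea(c) > MIN_BLOB_AREA)
        cv2.putText(frame, "blobs: %d" % count, (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("queue", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    cap.release()
    cv2.destroyAllWindows()

Blob counting is a weak proxy for counting people (two people standing together become one blob), so a per-person detector or tracker would be the natural next step.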


Excellent, thanks. I'll take a look at that and hack around!


I actually think this is quite do-able and has been for a while. Although deep learning has revolutionized object recognition, face detection has been working reasonably well for some time, e.g. your cell phone camera or Google street view face blurring.


Yes. The general task of looking at arbitrary images and labeling objects (from a set of known categories) in those images is called "detection." In fact the problem you described is easier, because there's only one category (people), and the system only needs to provide a count, rather than provide bounding box rectangles around each object (which is what the standard "detection" task entails).

Convolutional neural nets are the state of the art for this, specifically deep residual learning (http://arxiv.org/abs/1512.03385). It requires a good deal of background to understand what's going on and tune/implement the models, though, even if you just use the frameworks already out there. You probably don't even need that much data - you can probably grab pre-trained models and train them on a small additional dataset you collect.

They can definitely handle arbitrary backgrounds, although having a standard background makes the problem even easier, again.

Most deep learning computer vision algos are trained on 256x256 images, so having even larger images is just fine (you can downsample, or maybe even add up the predictions of different crops).
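As a rough illustration of the crop-averaging idea (`model.predict` here is a hypothetical stand-in for whatever framework you end up using, assumed to take a 256x256 crop and return class probabilities):

    # Average a model's predictions over the four corner crops plus the
    # centre crop of a larger image. `model.predict` is a hypothetical
    # stand-in, assumed to map a 256x256 crop to class probabilities.
    import numpy as np

    CROP = 256

    def crops(image):
        """Yield corner and centre 256x256 crops of an HxWxC image."""
        h, w = image.shape[:2]
        offsets = [(0, 0), (0, w - CROP), (h - CROP, 0), (h - CROP, w - CROP),
                   ((h - CROP) // 2, (w - CROP) // 2)]
        for top, left in offsets:
            yield image[top:top + CROP, left:left + CROP]

    def predict_large_image(model, image):
        """Average class probabilities over several crops."""
        probs = np.stack([model.predict(crop) for crop in crops(image)])
        return probs.mean(axis=0)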


I will take a look at it, but are the benchmarks comparable, since, to quote the site, "For now we can use C Rust wrappers for performant libraries."? Torch is LuaJIT over C, and TensorFlow has Python and C++. Is Rust making it fast, or is it the interface code to the C libraries?


The interface code to the C libraries (which is written in Rust). We are, however, optimistic that there will be Rust libraries popping up in the future that outperform the current C implementations. (Optimistic as a Rust user, not as a developer of Leaf.)


It's interesting to see "technical debt" become a more common term. Is there a rigid definition for it?

From the article: "Leaf is lean and tries to introduce minimal technical debt to your stack."

What exactly does that mean?


It's code that you write (typically quickly) that you know will need to be rewritten at a later stage. It's debt that will need to be paid at some stage in the future. You didn't do it right the first time.

Technical debt typically arises because the code was poorly structured or the programmer used the wrong tools/libraries (from a longer-term perspective) or didn't abstract when she should have. The current obsession with MVPs has led to an increase in technical debt.


It could be any kind of maintenance, I thought.


My view is that I lean towards counting only _known_ future maintenance at the time of programming. Adding a global variable because you aren't sure how it's going to tie in with someone else's current feature is a bit different from adding a global variable because you think it's how it's supposed to be done. I try to make a point of distinguishing bad code from technical debt, as it becomes pretty easy to just say "it's technical debt" as an excuse for doing something poorly. I tend to put general code maintenance in a different bin. To each their own, though.


Yes. https://en.wikipedia.org/wiki/Technical_debt

I've seen it firsthand. Basically, it's the accumulation of suboptimal code, over time, usually due to time constraints imposed by management. In short, any time you do a dirty hack just to get something working and meet a deadline, and then don't find the time to refactor that code into a working non-hack, you have piled a bunch of manure onto the technical-debt heap. But it also seems to be a side effect of normal code accretion to a codebase while on a team; in other words, there seems to be no way to avoid it entirely. It's like cancer, in biology. ;)

TD-ridden code is often not modular, not unit-tested, has many dependencies (spaghetti) which are then difficult to remove or replace and tend to trigger cascading bugs/failures, has too many responsibilities, has very long methods/functions, uses mutable state (changes global state which can then impact other parts of the codebase or make concurrency impossible), or is otherwise difficult to maintain.

An example of "working" tech debt is the "God class" in codebases, the model that the entire business depends on but which is over-laden with responsibilities. The risk to change it is too great (due to the business dependence) so it becomes a constant thorn in the side of maintaining the code.

The "debt" part comes from the fact that at some point you are expected to "repay" it (via costly man-hours of refactoring work). The benefit of doing so is potentially multifold, though: Faster/more modular/better-written code, faster tests (and therefore better productivity), better designs in general, more resilient code, more maintainable code, less buggy code, etc. etc.

The only known resolutions of tech debt are costly refactorings or global rewrites. The way to reduce the risk there is to first unit-test the existing code. These books help:

http://smile.amazon.com/Growing-Object-Oriented-Software-Gui...

http://smile.amazon.com/Refactoring-Improving-Design-Existin...


I think you can view this from a different point of view.

In my opinion, this comes down to what people think software is.

So if you see software as code which expresses what you want, the question is what you do when it does not do what you want, or when you want something additional.

So software really is our desire for some specific thing. But it is also a tool which can express arbitrary things. So it's a mirror which reflects back on us, letting us discover our real intentions and desires.

Eventually it's more of a conversation in which you expand and direct your intentions. And programming, or software, is just one way to do that.

I think eventually AI will be able to deliver such reflecting conversations to us; the question is which medium (hardware, operating system, programming language) it will use.

I do not think it will use building blocks (hardware, operating system, programming language) created by humans, because those are too incomplete and arbitrary.

Remember, the building blocks leave plenty of room for bootstrapping on multiple levels. An AI could create blocks to build a solution so simple that we can't even imagine it, something that is unthinkable for humans right now.


Think of it as 'code rot'. It's quick and dirty fixes that will have to be fixed later, taking longer overall than if you'd just done it right to start with (hence 'debt').


Tightly coupled code is how I define tech debt


This is very cool! When I presented it to my CTO, however, he said that he doesn't think this will gain traction with data scientists over Scala or Python, as Rust is even more complex than Scala (which is not the simplest language out there, even though I'm a big fan of both Scala and Rust, and I know this might start a flame war).

Do you think data scientists can write their models directly using Leaf? Or do you think there will need to be a DSL that translates from the R/Python world to something you can run on Leaf to make it happen?


By what metric does your CTO consider Rust to be more complex than Scala? A lot of Scala's complexity has to do with interfacing nicely with Java, and Scala has a lot of implicit behavior and TIMTOWTDI-ness that Rust deliberately tries to avoid. Odersky has even said that he's hoping that he can remove many features from Scala in the future.


It has less to do with complexity and more to do with REPL/Jupyter notebook support. Rust is a compiled language, and you won't get some of the ease of exploratory data analysis that you do with something like IPython.

I can use something like pandas or autograd to experiment with new optimization functions in seconds. For these big NN models it takes hours to days for your model to train, so squeezing out more performance is worth a more complex language.
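For instance, with autograd, trying out a home-grown update rule on a toy problem really is only a few lines (the data and the momentum tweak below are made up for illustration):

    # Prototype a custom update rule with autograd on a toy least-squares
    # problem; the loss, its gradient, and the tweak being tested all stay
    # in plain Python, which is what makes notebook-style exploration fast.
    import autograd.numpy as np
    from autograd import grad

    X = np.random.randn(100, 3)
    true_w = np.array([1.0, -2.0, 0.5])
    y = np.dot(X, true_w) + 0.1 * np.random.randn(100)

    def loss(w):
        return np.mean((np.dot(X, w) - y) ** 2)

    grad_loss = grad(loss)                 # autograd derives the gradient

    w, velocity = np.zeros(3), np.zeros(3)
    for step in range(200):
        g = grad_loss(w)
        velocity = 0.9 * velocity + g      # the "experiment": heavy-ball momentum
        w = w - 0.05 * velocity
    print(loss(w), w)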


The benchmarks would be a lot more useful if the context around them were more obvious. In particular, it would be nice to know if the benchmarks are for a single input, or for a batch of inputs. If for a batch, then the batch size is important too. Maybe this stuff is somewhere on their site, but it shouldn't require digging.

Without this information it's hard to make a useful comparison at all.


You are right, batch size is important, and we should make that clearer.

The numbers in the benchmark are taken from our deep-learning-benchmarks[1], which we are still in the process of building up. It might actually make sense to test the same model with different batch sizes. The current benchmarks are based on the convnet-benchmarks[2], where the Alexnet model has a batch size of 128. (Alexnet was chosen because, out of the benchmarks, that's the one I am most familiar with, since it is small enough that I can work with it on my laptop.)

In some informal tests Leaf was generally faster than other frameworks at smaller batch sizes, but we have no benchmarks that we could publish with confidence yet.

[1]: https://github.com/autumnai/deep-learning-benchmarks

[2]: https://github.com/soumith/convnet-benchmarks
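For readers wondering why batch size matters so much, a crude timing sketch (a plain numpy matmul standing in for one layer's forward pass; the sizes are arbitrary) already shows the per-sample cost dropping as the batch grows:

    # Time a single "layer" (a matmul) at different batch sizes and report
    # the per-sample cost. A crude stand-in for a real forward pass, just to
    # show why a benchmark must state the batch size it used.
    import time
    import numpy as np

    W = np.random.randn(4096, 4096).astype(np.float32)

    for batch in (1, 16, 64, 128, 256):
        x = np.random.randn(batch, 4096).astype(np.float32)
        start = time.perf_counter()
        for _ in range(10):
            _ = x.dot(W)
        elapsed = (time.perf_counter() - start) / 10
        print("batch %4d: %.3f ms per sample" % (batch, 1000 * elapsed / batch))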


That all sounds reasonable. I was surprised at how much the batch size can matter on different hardware. Maybe you've seen this post:

http://svail.github.io/rnn_perf/

It's primarily RNN-focused, but the discussion about batch sizes on GPUs is interesting.


Any recurrent layers?


We would love to have them and compare their performance with recurrent layers of other frameworks[1]. There exists an issue for the implementation of recurrent layers in Leaf (#73)[2].

[1]: http://autumnai.com/deep-learning-benchmarks.html

[2]: https://github.com/autumnai/leaf/issues/73


I'm glad that Rust has crossed the point where posts to HN that would have been "_ in Rust" are now just "_". I hope this means that Rust is starting to be used for its own merits rather than just novelty.


We changed the title to say "in Rust" because someone else complained about "for Hackers". I suppose we could take both of them out, but the project highlights its Rustiness so this seems more representative.


1. Rust warning.

2. If "for hackers" is the new "for dummies" then gentrification is complete.



