Hi folks. Nice to see our new free (and ad-free) course here on HN! This course is for folks who are already comfortable training neural nets and understand the basic ideas of SGD, cross-entropy, embeddings, etc. It will help you both understand these foundations more deeply, since you'll be creating everything from scratch (i.e. from only Python and its standard library), and understand modern generative modeling approaches.
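To give a flavor of what "from scratch, from only Python and its standard library" looks like, here's an illustrative sketch (not the course's actual code) of a matrix multiply built from plain lists:

```python
# Hypothetical sketch: matrix multiplication with no NumPy/PyTorch,
# just Python lists -- the kind of starting point "from scratch" implies.
def matmul(a, b):
    """Multiply two matrices given as lists of lists."""
    n_rows, inner, n_cols = len(a), len(b), len(b[0])
    assert len(a[0]) == inner, "inner dimensions must match"
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(n_cols)]
            for i in range(n_rows)]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Starting from something like this, each layer of abstraction (broadcasting, autograd, GPU kernels) can then be rebuilt and understood in turn.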
We study and implement a lot of papers, including many that came out during the course, which is a great way to get practice and get comfortable with reading deep learning literature.
If you need to get up to speed on the foundations first, you should start with part 1 of the course, which is here: https://course.fast.ai
If you've got any questions about the course, generative modeling, or deep learning in general, feel free to ask!
I came across your APL study session videos while exploring the other material. I used APL professionally for about a decade back in the early 80's. I am always pleasantly surprised when I see interesting work being done in APL. Interestingly enough, I always thought APL would eventually evolve into a central language for AI. There were attempts at designing hardware-based APL machines way back when. Of course, much like the language itself, they were ahead of the technology of the time.
I don't do much with APL these days. I do keep it around for quick calculations and exploration, far more so than doing any projects. In many ways, a thinking tool of sorts.
Not criticizing you, here, just asking a sincere question.
It was obvious to me on first encounter that APL would never become widespread. Its character set was too abstruse for most programmers and at the time required a special monitor. Plus it seemed hard to maintain if you weren’t either a mathematician or a full-time APL developer.
Not OP, but I'll say you're right, but for the wrong reasons. Having embarked on learning APL by doing a project in it, the character set becomes second nature in a few days' time. No special monitor needed, at least on my Mac; just a new font and it all worked flawlessly.
What caused me to hang up my glyphs were the inconsistencies and head-scratching behavior of the language in so many corner cases. Working with very nice mentors I found that one could get around them, but because of the legacy of the language they have to be kept around to support existing code bases.
I loved the notion of having a language that gave a first class experience with matrices, though, and after looking around the space, I finally came to Julia and have been very happy.
Thanks for your perspective, which is easy to understand. I hasten to add that when I talk about special monitors, I'm going from my first encounter with APL in the early 80s.
I have recently read "The Little Learner" and was quite amazed how much I knew already just because I know APL/J. With the right libraries, these vector languages would be perfect for neural network / deep learning tasks.
The math is covered in this course. We look at both ordinary and stochastic differential equations, and at Langevin dynamics / score matching, as well as the needed probability foundations. We only cover the necessary bits of these topics -- just enough to understand how diffusion models work (e.g. we don't look at the physics stuff which is where Langevin dynamics came from).
On the whole, we cover the needed math at the point where we use it in the course. But there's also a bonus lesson that focuses on just the mathematical side, if you're interested: https://youtu.be/mYpjmM7O-30
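To give a sense of one of those topics, here's a minimal sketch of unadjusted Langevin dynamics (an illustration, not code from the course): sampling from a distribution using only its score function, here a 1-D standard normal whose score is simply -x.

```python
# Hedged sketch of unadjusted Langevin dynamics: repeatedly nudge x
# along the score (gradient of log-density) plus Gaussian noise.
# For N(0, 1), the score d/dx log p(x) = -x, so no real model is needed.
import math
import random

def score(x):
    return -x  # score of the standard normal

def langevin_sample(steps=1000, eps=0.01, x0=5.0, seed=0):
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        # x_{t+1} = x_t + (eps/2) * score(x_t) + sqrt(eps) * z,  z ~ N(0, 1)
        x = x + 0.5 * eps * score(x) + math.sqrt(eps) * rng.gauss(0, 1)
    return x
```

Run many chains and the samples are approximately N(0, 1), even though the chain only ever saw the score; score-based diffusion models follow the same idea, with a learned score in place of `-x`.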
Maybe odd question, but: would you recommend taking this course if my goal is to build (and sell) products leveraging ML? (e.g. SaaS)
As in, with the pace of improvements from other AI startups and general availability of their APIs (e.g. GPT-4), is there a specific advantage (aside from maybe cost) to learning to build my own models? Or is the course more suitable for people wanting to become ML engineers (or similar) and to find a job as such?
Thanks in advance
Part 1 would probably be best for that: https://course.fast.ai . Lots of alums have gone on to do what you describe -- it's probably a good idea to have the level of understanding introduced there even if you mainly use external APIs. Part 2 is more for folks who want to go deeper in order to try new research ideas, optimize their models beyond standard approaches, etc.
Thanks for your reply, though I'm not sure I'm reading your comment correctly. As in, I agree that marketing is super important, but if I had an idea for a certain SaaS product that requires a certain ML model, I'd need to decide either to build it myself or to use somebody else's APIs.
> We found that as our AI got worse, our product got better.
I guess my last sentence was confusing. What I meant was that we fell into the common trap where scientists want to do the fanciest science possible when they leave academia and enter the startup world, out of pride and a desire for uniqueness. Whereas good business is not like art, where the mandate is to be as creative and unique as possible. The sports analogy (though I don't watch sports) that applies to business is that teams just copy each other's plays and focus on out-executing each other.
So, specifically, our startup started with the idea that high-brow tech would be a key differentiator and give the best user experience. This is the common narrative trotted out in survivorship-biased stories that make good tech news articles.
Whereas cutting our R&D time, focusing on UI/UX, and working with simpler science ultimately led to a better product.
From consulting, sales, and corporate work, one learns that the dirty secret of big-iron large tech companies is that most of the stuff sold as ML is just nicely packaged logistic regression. Or it was, five years ago. Nowadays I guess it would MAYBE be transformers, but the point being that off-the-shelf ML with solid data engineering work is what drives 99% of good products. Rarely is it truly innovative tech. I think Pete Skomoroch was the one who joked: "People say I'm a data scientist. I'm actually a data plumber."
I guess one could take this lesson from science. What you learn from a good PhD advisor is: a) read the latest work, b) note the simple baseline approach constantly trashed as scoring 2% worse than the sophisticated, intricate new things proposed, and c) implement the simple baseline. Achieve impact not by hill-climbing on a standard metric but by defining a new problem, arbitraging insights from adjacent fields, etc.
Based on the last time we did a part 2 course, time spent varied a lot. Some folks could do it in 10 hours of study (plus watch time) per lesson, and some took a whole year off to do the two parts of the course (including doing significant projects).
Yes, after we create something from scratch, we then see how to use implementations from libraries such as PyTorch. For basic operations like convolution, whilst we do create a GPU-accelerated version, it's not as fast as PyTorch/cudnn's highly-optimised version, so we use the PyTorch implementation for the rest of the course, rather than our handmade version.
So that means that nothing is mysterious, since we know how it's all made, but we also see where and how to use existing libraries as appropriate.
In practice, during the course we end up using PyTorch for stuff like gradients, matrix multiplication, and convolutions, and our own implementations for a lot of the stuff that's at a higher level than that (e.g. we use our own ResNet, U-net, etc.)
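As an illustration of the kind of reference implementation one writes before handing off to PyTorch (a hedged sketch, not the course's actual code), here's a naive "valid" 2-D convolution in pure Python:

```python
# Illustrative naive 2-D convolution (technically cross-correlation, as in
# most DL libraries): no padding, stride 1, single channel. Useful as a
# correctness reference before switching to PyTorch/cudnn's fast version.
def conv2d(img, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(img) - kh + 1
    out_w = len(img[0]) - kw + 1
    return [[sum(img[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]
```

Once something like this matches the library's output on small inputs, it's easy to see why you'd keep the optimised PyTorch version for real training while retaining full understanding of what it computes.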
Ah, I've been waiting for this, looking forward to going through it!
While I appreciate your top-down approach and buy your arguments for doing it this way in previous courses, I also like this new bottom-up approach, where one really learns what lies beneath.