If you're interested in the history of how the Poles and then the British not only cracked the German Enigma cipher machine used in World War II, but also operationalized the interception, decryption, and dissemination process - this is the place. There's quite a bit written about how the Enigma cipher system worked, but it isn't until you get to Bletchley Park that you understand the size, scope, and scale of how they turned decryption of intercepted messages into an industrial process that gave the Allies a strategic edge in WW2. A must-visit next time you're in London (35-minute train from Euston station, 5-minute walk to the museum).
Sure .. happy to help (and to be clear, I did find the paper insightful even as a senior researcher - e.g., I was familiar with OODA, but the SPAA was neat! Don't take my comments as too negative :) )
For recommendation systems, the top three examples that come to mind are TikTok, Layer 6 (a Canadian company that TD Bank acquired a few years back) and Netflix.
You may want to add NeRFs (Neural Radiance Fields) to the paper. They are the hot new algorithm out there. I am a scientist at a Canadian research lab, and my very smart colleagues tell me they are the next big thing.
Automated vision is far ahead of NLP, IMHO. NLP had its ImageNet moment only with the advent of BERT (circa 2018 or so). Also, Transformers, which BERT and its progeny rely on, are massively compute- and data-hungry. They are also slow to run on today's chips. In my opinion, one reason is that language benchmarks aren't as clear-cut as vision ones. For instance, NLP researchers use BLEU scores, which are a pretty blunt instrument. I'd say NLP is even further behind than speech processing (which is now mostly based on DL). A key person behind Siri is Adam Cheyer btw .. he did Siri and then Bixby. The way these NLP assistant systems work is pretty simple conceptually .. they break the problem into two steps: intent identification and then slot filling. You can use DL for both steps, but you don't have to. A key issue with these NLP systems is that they are extremely brittle (a ton of work to customize) - see the sketch below for why. Dialog is pretty weak today, and that is partly due to the challenge of getting a good training signal.
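To make the two-step decomposition concrete, here's a minimal, deliberately rule-based sketch in Python. Every intent name, slot name, and regex here is made up for illustration; real systems learn these (often with DL) rather than hard-coding them:

    # Minimal sketch of the two-step intent/slot decomposition.
    # All intents, slots, and patterns are hypothetical.
    import re

    INTENT_PATTERNS = {                # step 1: intent identification
        "set_alarm":  re.compile(r"\b(wake|alarm)\b", re.I),
        "play_music": re.compile(r"\b(play|listen)\b", re.I),
    }
    SLOT_PATTERNS = {                  # step 2: slot filling
        "time":   re.compile(r"\b(\d{1,2}(?::\d{2})?\s*(?:am|pm))\b", re.I),
        "artist": re.compile(r"\bby\s+([A-Z][\w ]+)", re.I),
    }

    def parse(utterance):
        # Pick the first intent whose pattern matches the utterance.
        intent = next((name for name, pat in INTENT_PATTERNS.items()
                       if pat.search(utterance)), "unknown")
        # Extract whatever typed arguments ("slots") we can find.
        slots = {name: m.group(1) for name, pat in SLOT_PATTERNS.items()
                 if (m := pat.search(utterance))}
        return intent, slots

    print(parse("Wake me up at 7:30 am"))
    # -> ('set_alarm', {'time': '7:30 am'})
    print(parse("Play something by Miles Davis"))
    # -> ('play_music', {'artist': 'Miles Davis'})

Hard-coded patterns like these also make the brittleness obvious: "get me up at seven" matches nothing, so every new phrasing means another rule or more training data.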
You say 5,000 images per class. I know that was a ballpark, but it seemed misleading. There are at least two problems I see. First, you need "different" examples .. seeing the same examples (e.g. from different viewpoints) does not help. Second, it really matters what set of classes your model is trying to discriminate among. E.g., to differentiate apples from bananas, I likely need far fewer than 5,000 examples since they are so visually distinct (see the sketch below). ImageNet was a seminal moment not just because of the number of examples per class but because of the humongous number of classes (10K+).
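To illustrate the apples-vs-bananas point, a hedged sketch of what "far fewer examples" looks like in practice: freeze a pretrained backbone and fine-tune only a new 2-way head on a small folder of images. The data path, model choice, and hyperparameters below are placeholders, not a recipe:

    # Fine-tune a pretrained ResNet-18 on two visually distinct classes.
    # Assumes a made-up layout: data/apple/*.jpg and data/banana/*.jpg,
    # with perhaps a few hundred images per class rather than 5,000.
    import torch
    import torch.nn as nn
    from torchvision import datasets, models, transforms

    tfm = transforms.Compose([transforms.Resize((224, 224)),
                              transforms.ToTensor()])
    ds = datasets.ImageFolder("data", transform=tfm)
    loader = torch.utils.data.DataLoader(ds, batch_size=32, shuffle=True)

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in model.parameters():
        p.requires_grad = False                    # keep pretrained features
    model.fc = nn.Linear(model.fc.in_features, 2)  # new 2-way head

    opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(5):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()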
For RL and robotics, there have been some neat advances. I was skeptical about RL's utility in practice (for the reasons you point out .. the simulation-to-real-world gap, and especially the need to train faster than real time) but am seeing it more and more in practice. E.g., applications in 5G/6G networks now exist.
You may want to add coverage of some important emerging topics: multi-modal learning (matching vision to text and vice versa), sensor fusion, and student-teacher training (distillation) - a minimal sketch of the last one follows.
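On the student-teacher point, here's a minimal distillation sketch: a small student is trained to match a large teacher's temperature-softened output distribution (Hinton-style). The models and batch are random stand-ins; the loss function is the part that matters:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    teacher = nn.Sequential(nn.Linear(784, 512), nn.ReLU(), nn.Linear(512, 10))
    student = nn.Sequential(nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 10))
    teacher.eval()                     # teacher is pretrained and frozen

    def distill_loss(s_logits, t_logits, labels, T=4.0, alpha=0.5):
        # KL between temperature-softened distributions, plus plain CE
        # on the hard labels; the T*T factor rescales the gradients.
        soft = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                        F.softmax(t_logits / T, dim=-1),
                        reduction="batchmean") * (T * T)
        hard = F.cross_entropy(s_logits, labels)
        return alpha * soft + (1 - alpha) * hard

    x = torch.randn(8, 784)            # stand-in batch of "images"
    y = torch.randint(0, 10, (8,))
    with torch.no_grad():
        t_logits = teacher(x)
    loss = distill_loss(student(x), t_logits, y)
    loss.backward()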
The paper didn't talk about any work from MIT's Han Lab or their startup OmniML? They just launched at the TinyML summit this year, and they are hot! Also, TVM (the startup behind it is called OctoML) is pretty important for on-device AI.
Those were some initial thoughts. If this is useful, I can add to it later.
As an ML professor, I agree with all of these comments, especially the bit about NeRFs. Take a look at Waymo's use of the technology: https://waymo.com/research/block-nerf/
LOL. At this point in my career I was the VP of Marketing for Convergent's Unix division, which included the Miniframe. Our ad for the product - which I think I still have - said, "It's not how big it is, it's how well it performs."
The Jonathan concept of modular slices was a copy of the Convergent Technologies NGEN family of computers. frogdesign sold Apple the same packaging they had done two years earlier for Convergent. See the product family here: http://bitsavers.informatik.uni-stuttgart.de/pdf/convergent/...
The concept isn't far removed from the S-100 bus [1] as seen in the Altair 8800, and you could probably find numerous other examples - weren't early IBM mainframes made up of standardised cards with a few NAND gates on each?
Brings back fond memories.
My second job in Silicon Valley was as manager of Zilog's training and education department (all five of us), where I taught customers how to design systems around the Z80 family, which included the CTC (Counter/Timer Circuit).
see: https://steveblank.com/category/zilog/
Good times. I enjoyed your thoughtful entry on the SCC at https://steveblank.com/2009/07/30/hes-only-in-field-service/ . When I was using it as a kid on Z80s with radios - back when a Z80K was a mere aspiration, like a computing Corvette - it was usually with pleasure. I'm looking forward to reading more about the Z80K and perhaps learning the causes of Zilog's demise. It, like Tech Design Labs (TDL) from New Jersey and so many others, remains in the genes, however far back it was.