If you know how to write low-level C (i.e. with direct Win32/POSIX API calls), that's a good start. If you don't know what that means, you need to master this first. So much of assembly is built to support higher-level programming, things like segments, indirect references, system calls, etc., so it pays to know the WHY of all of it (e.g. do you know what a frame pointer is? why it's useful and can sometimes be omitted)? Learn this stuff first.
Once you have a good handle on C, you need to start learning how operating systems and memory management work, at least the "client side" of them accessible from userspace.
Then you might want to dip your toe into how CPUs are built, with pipelines, registers, caches, all of that.
If you've mastered those things, gcc -S will provide everything else you need (another comment suggested this).
I learned this stuff in a traditional university computer engineering program. It helped a lot. But for context, this question feels a little like, "can someone explain quantum physics"? It's a huge topic but only like 5% is the actual thing you're asking about, the other 95% is the conceptual machinery (calculus, probability, mechanics) it's built on. Mastering all the other stuff is actually the hard part. For all intents and purposes, assembly is just a notational convenience for expressing program structure in the only way a CPU can understand it. At least 90% of the difficulty is understanding how CPUs work, and how to think at sufficiently low level that you can express human-useful work at such a low level of abstraction.
It would also be nice to know why you want to know this. Writing a device driver is going to be different from writing an operating system, which will be different from writing tight numerical loops in assembly. And in any case, dollars to donuts you won't be able to beat a modern optimizing compiler performance-wise.
How can I get started on learning the low level c stuff with direct system calls and such. I learned about assembly, pipelining, cache, and memory allocation during a systems architecture course, so I'm really interested in the low level of the machine. The problem is I don't really have an idea of a project that I could do to learn the level right above. Of course I learned a bit during the course, but I really want to get a solid grasp.
I'd recommend writing something you might have used or be familiar with, in bare-metal C. Maybe try writing a simple web server, which is something most people will be familiar with. Try doing it a couple different ways: single-threaded, prefork (processes), event-based with select/kqueue/epoll. Write your own data structures: hash tables, linked lists, etc.
This sounds simple but will give you a good tour of a lot of different areas: BSD sockets, named pipes/unix domain sockets for cross-thread communication, signals, memory management, threading, basic scheduling, and parallel programming (locks/concurrency), to name a few.
If you really want to understand modern x64 or other ASM, write a simplified compiler for C. I would recommend starting with a tiny subset of the language focused on whatever aspects you want to understand. You can then compare your output with a modern compiler to get a better understanding of the intricacies involved.
Your comment about compilers is most likely true. The exception being if he needs to vectorize or if he had the opportunity use these special-purpose instructions that are seemingly built to accelerate particular algorithms.
On x86 pretty much all of the special purpose and vector instructions are accessible from C or C++ through intrinsics. No need to drop down to assembly for that except perhaps for a very, very specialized use case.
If you know how to write low-level C (i.e. with direct Win32/POSIX API calls), that's a good start. If you don't know what that means, you need to master this first. So much of assembly is built to support higher-level programming, things like segments, indirect references, system calls, etc., so it pays to know the WHY of all of it (e.g. do you know what a frame pointer is? why it's useful and can sometimes be omitted)? Learn this stuff first.
Once you have a good handle on C, you need to start learning how operating systems and memory management work, at least the "client side" of them accessible from userspace.
Then you might want to dip your toe into how CPUs are built, with pipelines, registers, caches, all of that.
If you've mastered those things, gcc -S will provide everything else you need (another comment suggested this).
I learned this stuff in a traditional university computer engineering program. It helped a lot. But for context, this question feels a little like, "can someone explain quantum physics"? It's a huge topic but only like 5% is the actual thing you're asking about, the other 95% is the conceptual machinery (calculus, probability, mechanics) it's built on. Mastering all the other stuff is actually the hard part. For all intents and purposes, assembly is just a notational convenience for expressing program structure in the only way a CPU can understand it. At least 90% of the difficulty is understanding how CPUs work, and how to think at sufficiently low level that you can express human-useful work at such a low level of abstraction.
It would also be nice to know why you want to know this. Writing a device driver is going to be different from writing an operating system, which will be different from writing tight numerical loops in assembly. And in any case, dollars to donuts you won't be able to beat a modern optimizing compiler performance-wise.
Hope this helps.