A tutorial on Principal Components Analysis (2002) [pdf] (otago.ac.nz)
85 points by stefankuehnel on May 22, 2023 | 13 comments


I'm the author of this tutorial; crazy to see it linked here 20 years after I wrote it. It was an assignment for 400-level COSC at Otago: everyone in the class had to write a tutorial on something and then present it. The department just happened to post them on their website.

Crazily, this tutorial must have filled a niche, because I still get people contacting me about it, and it has 3000 references on Google Scholar. I'm glad it's helped so many people. I'm no expert in PCA or Maths in general; I learned what I needed to write the tutorial and the example code, and I think the writing style must have been pretty good, as lots of people seem to have been able to follow it.


PCA is really something one should use whenever one has a multivariable model. It's a very quick check, easy to implement, that avoids basic issues such as "I thought this variable was important in my model and, well, it is not", much like "don't optimize without measuring first".


Unless you are strictly talking about predictive modeling, I would disagree with this. PCA just tries to represent N-dimensional observations in a k < N dimensional subspace (for a given k) such that it captures the most variation. This does not mean that any obtained component loadings refer to anything real or meaningful.
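A minimal sketch of that idea in NumPy, assuming PCA is computed via the SVD of the mean-centered data. The function name `pca_project` and the synthetic data are made up for illustration; nothing here comes from the tutorial itself.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X (n_samples x N) onto the top-k principal directions."""
    Xc = X - X.mean(axis=0)                 # PCA works on mean-centered data
    # Rows of Vt are orthonormal directions in the original N-dim space,
    # ordered by how much variance they capture.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                     # shape (k, N)
    scores = Xc @ components.T              # shape (n_samples, k)
    explained = S**2 / np.sum(S**2)         # fraction of total variance per direction
    return scores, components, explained[:k]

# Hypothetical data: 3 features that mostly vary along a single direction.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2 * t + 0.1 * rng.normal(size=200),
    0.1 * rng.normal(size=200),
])

scores, components, ratio = pca_project(X, k=1)
print("direction:", components[0])
print("fraction of variance captured:", ratio[0])
```

The direction that comes out is just whichever unit vector maximizes variance in this particular sample; the code gives no reason to read it as a real underlying factor.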


To clarify your point, we need to distinguish between pruning dimensions and projecting onto a k < N set of new orthogonal dimensions. The name of PCA does make it sound like you are selecting dimensions.
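Roughly, the difference looks like this in NumPy. This is only a sketch on made-up two-feature data, with PCA done via SVD; the variable names are hypothetical.

```python
import numpy as np

# Hypothetical data: two correlated features (N = 2).
rng = np.random.default_rng(1)
x = rng.normal(size=500)
X = np.column_stack([x, x + 0.3 * rng.normal(size=500)])
Xc = X - X.mean(axis=0)

# Pruning: keep one of the ORIGINAL axes (here, the higher-variance column).
keep = int(np.argmax(Xc.var(axis=0)))
pruned = Xc[:, keep]

# Projecting: keep one NEW axis, the first principal direction.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]                 # unit vector that mixes both original features
projected = Xc @ pc1

print("kept column:", keep, "variance kept:", pruned.var())
print("pc1 direction:", pc1, "variance kept:", projected.var())
```

The new axis still lives in the original N-dimensional space; it is just not aligned with any single original variable, and it keeps at least as much variance as the best single column.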


> The name of PCA does make it sound like you are selecting dimensions.

I certainly thought that!

Using some terms from the previous comment, does this mean that the k < N subspace is not (necessarily) a subset of the N-space? Or is the subspace a subset of the data with a different coordinate system?

(Yes, I'm still trying to intuitively grasp these ideas.)


I recently stumbled across a truly excellent 5-part deep dive into PCA on Peter Bloem's blog, which I highly recommend to anyone who's interested. It assumes you have some level of linear algebra knowledge.

https://peterbloem.nl/blog/pca


Here’s a great explanation of PCA in visual form, from Steve Brunton:

https://www.youtube.com/watch?v=fkf4IBRSeEc


I remember reading this when I first started using PCA in grad school for some analysis. It was definitely helpful to me.

Side note for title - article is 2002


Great to hear, thanks for letting me know :).


Huh, this was a huge help to me back in 2014 when I was doing PCA to make a statistical shape model during my PhD.


Great to hear, thanks for letting me know :).


Does anybody know some interesting practical applications of PCA?


>Does anybody know some interesting practical applications of PCA?

Not sure how "interesting" you'll find it, but PCA (and more generally, Factor analysis[0][1]) has been used for decades analyzing data from market research surveys.

Much of the process behind this analysis dates back to the late 1960s/early 1970s at advertising agencies (notably Grey Advertising[2]) and market research suppliers like Grudin/Appel/Haley (all the name partners are long dead, but the firm continued as AHF Market Research. Not sure, but I don't think they exist any more).

These methods spread within the industry as computing resources became more generally available (back in the 60s/early 70s, it was all Fortran IV on punch cards, batch submitted to IBM/CDC mainframes), becoming pretty much de rigueur in the marketing/advertising industry by the mid 1980s.[3]

[0] https://en.wikipedia.org/wiki/Factor_analysis#In_marketing

[1] https://www.qualtrics.com/experience-management/research/fac...

[2] https://www.grey.com/

[3] Source: Both my parents were involved in adapting such multivariate analyses for marketing purposes back then, and I worked in the industry for five years back in the late 1980s/early 1990s.

Edit: Fixed prose.



