I believe the paper asks something like this: why should it be that certain unsupervised (naive statistical) compressions of data produce structures which map to semantic categorisations (not present in that data)?
If I've understood the question of the paper correctly, my answer is simple: we induce "semantic categorisation" into (non-semantic) data structure.
In other words: we do not take random pictures of dogs (e.g., at 1000x zoom on their skin); we make them the subjects of photographs (and so on).
We exploit non-random non-semantic structure in communication as a kind of "helpful meta-data".
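A toy version of the point (my own sketch, not from the paper; scikit-learn's curated digits set stands in for composed photographs): a naive statistical compression plus clustering recovers the human categories largely because every image was deliberately framed around its subject.

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans
    from sklearn.metrics import adjusted_rand_score

    # Human-curated images: each one deliberately centred on its subject.
    X, y = load_digits(return_X_y=True)

    # Naive statistical compression, then unsupervised clustering.
    Z = PCA(n_components=16).fit_transform(X)
    clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(Z)

    # Agreement with the human labels reflects the curated data-generating
    # process, not anything "semantic" in the statistics themselves.
    print("cluster/label agreement (ARI):", adjusted_rand_score(y, clusters))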
In ML the answer is always just "the data generating process (us) gave the data those properties" -- I struggle to see why so many people in the space fail to mention this.
I think it's highly doubtful anything in ML (or modern AI) would work if pointed "open-camera" at arbitrary parts of the world. The trick is always: we hold the camera.
> the data generating process (us) gave the data those properties
People in the space say that all the time, or else don't say it because they're talking with others who they know already understand that fact, rather than posting about it on HN in public.
HN feels like you have access to experts, but it's often the case that experts have neither the time nor the patience to teach grad-level classes in comment threads to clear up confusion. So the people who do post are either talking about something specific in a way that presumes a certain level of base knowledge, or are frankly bullshitting.
There's a certain cultural heritage on HN, maybe across broad swaths of the tech world IRL, of speaking purely speculatively but phrasing it as presumed fact. I call this behavior "high-school debate club" but the more classical name for it is Dunning-Kruger. It seems to have a regrettably outsized effect on the discourse in the more influential (if still naive) echelons of tech, either as an extension of a broad blind optimism or as a way to stave off feelings of loneliness that are pervasive in these industries by appealing to and seeking approval of the prestigious community that HN is purported to be.
You'd be surprised at the vast array of PhDs who've entered this space through conversion courses from essentially non-technical backgrounds -- and who end up repeating essentially mystical pseudoscience.
There are lots of papers, vast numbers, that very clearly miss basic premises about the data generating process, and about what exactly we're modelling and how. Premises drilled into statisticians, and so on -- but omitted from a computer science education, and especially from much of the route into these fields today.
Absent these, the process, I'm sure, appears very mystical.
This paper ought to ask: "what assumptions about the data generating process ought we to make such that these regularisation conditions work?"
I really do think that the question isn't phrased this way due to "too much kool-aid" --- it's a paper which seems, to me, to already buy into a kind of mystification.
Though, this may just be a symptom of not thinking about the problem from the POV of formal/traditional stats. The paper seems to want some sort of "cute conceptual answer" of the kind that might appear in a hand-wavy ML paper.
But the answer is to specify families of distributions whose semantic and non-semantic structures are "accessible" by these unsupervised techniques -- find which families do not work and which do... and then formulate an essentially psycho-social hypothesis as to why the people creating that data have used those distributions. (A toy version of the programme is sketched below.)
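To make that concrete (my own sketch; the two synthetic "families" are invented for illustration): one distribution whose labels coincide with its density structure, and one whose labels cut across it, so the same unsupervised technique recovers the first and not the second.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import adjusted_rand_score

    rng = np.random.default_rng(0)

    # "Accessible" family: labels coincide with well-separated modes.
    X1 = np.vstack([rng.normal([0, 0], 0.3, (200, 2)),
                    rng.normal([4, 4], 0.3, (200, 2))])
    y1 = np.repeat([0, 1], 200)

    # "Inaccessible" family: one isotropic blob; labels follow a
    # checkerboard rule that is invisible to density-based clustering.
    X2 = rng.normal(0, 1, (400, 2))
    y2 = (X2[:, 0] * X2[:, 1] > 0).astype(int)

    for X, y, name in [(X1, y1, "accessible"), (X2, y2, "inaccessible")]:
        c = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
        print(name, "ARI:", round(adjusted_rand_score(y, c), 2))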
Dunning-Kruger is a college professor who pronounces truth to the heathen in a condescending way.
>we do not take random pictures of dogs
Sure we do. The vast majority of computer vision training images come from self-driving cars’ cameras. None of the streetscapes they capture are deliberately composed, as the portrait of a pet dog might be. Yet models trained on the random streetscapes captured by self-driving cars can easily classify objects, even if they are present merely as random background entities, and not the carefully and consistently framed subject of a photo.
>I think it's highly doubtful anything in ML (, modern AI) would work if pointed "open-camera" at arbitary parts of the world.
But it does.
> computer vision training images come from self-driving cars’ cameras
Isn't this just another way of saying "we put cameras in human environments pointing at semantically relevant things"? Self driving cars aren't looking at rooftops or the inside of drains or straight down from high up. They're positioned where people are often looking from at things in human environments. Dogs, to use that example, are implicitly the subjects. We care deeply that self driving cars can see and recognize dogs (or one would hope).
If self driving cars had cameras mounted randomly at various angles and the cars drove without direction off-road wherever they pleased and took off like a drone to observe from the sky, you can be assured dogs would not be easily recognizable without supervision.
You what?
It may be like that in modern CAPTCHA mechanisms, but that is not how we got here (i.e., "The vast majority of computer vision training images come from self-driving cars’ cameras").
Are you too young to remember when CAPTCHAs were based on Google's (and others') scans of books?
For like 10+ years the baseline was "handwritten" digits / "book text" and how well models could identify them...
Google spent millions fighting lawsuits over this... I think you must be young (not that I'm insulting you, just that I'm showing my age, and I'm < 30).
Cars are usually centered on the road, and the built environment has certain regular features that induce structure in the dataset of images. The same insight applies there too and GP's argument stands.
Dogs don't often float upside down above treetops, for instance.
I believe that's what Google does when it asks you to record your surroundings so it can pinpoint your location. And it works, yes.
Computer vision in cars is trained on supervised data -- i.e., human-labelled.
> why should it be that certain unsupervised (naive statistical) compressions
They seem to be investigating the intermediate representations of each layer in deep networks, and the degree to which those representations can linearly separate classes. These aren't "naive" statistical compressions; a technique called UMAP is used to generate 2D scatter plots of the high-dimensional representations.
(As I understood it).
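For anyone unfamiliar, that kind of probe looks roughly like this (my reconstruction, not the paper's code; the choice of resnet18, its layer3, and the random stand-in batch are all my own assumptions):

    import torch
    import umap  # the umap-learn package
    import matplotlib.pyplot as plt
    from torchvision.models import resnet18, ResNet18_Weights

    model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()

    # Capture one intermediate layer's activations via a forward hook.
    feats = {}
    model.layer3.register_forward_hook(
        lambda mod, inp, out: feats.update(z=torch.flatten(out, 1).detach()))

    # Stand-in batch: substitute a real labelled batch (e.g. ImageNet).
    images = torch.randn(64, 3, 224, 224)
    labels = torch.randint(0, 10, (64,))

    with torch.no_grad():
        model(images)

    # Project the high-dimensional representation to 2D, colour by class.
    emb = umap.UMAP(n_components=2).fit_transform(feats["z"].numpy())
    plt.scatter(emb[:, 0], emb[:, 1], c=labels.numpy(), s=5)
    plt.show()

With real labelled inputs, class-coloured structure in the scatter is the separability story the paper is examining (as I read it).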
Sure, I mean naive-wrt-semantics; not naive-wrt-statistics.
The naive statistical approach fails; the regularisation approach succeeds.
I'd say this does need explanation, but the paper really omits exactly what needs explaining... it isn't so much the introduction of regularisation...
Rather, it's why it should be that our data generating processes encode metadata which can be captured by such regularisation.
Phrased like this, I think the answer is somewhat obvious.
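A minimal sketch of how that contrast usually gets measured (mine, not the paper's; scikit-learn's digits and a small trained MLP stand in for the real setup): a linear probe on the raw input versus the same probe on a learned hidden representation -- probe accuracy is the separability measure.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.linear_model import LogisticRegression

    X, y = load_digits(return_X_y=True)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

    # "Regularised" representation: hidden layer of a small trained net
    # (recomputed by hand: MLPClassifier's default activation is relu).
    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                        random_state=0).fit(Xtr, ytr)
    hidden = lambda A: np.maximum(A @ net.coefs_[0] + net.intercepts_[0], 0)

    # Linear probes on each representation.
    raw = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
    hid = LogisticRegression(max_iter=1000).fit(hidden(Xtr), ytr)
    print("raw-input probe:   ", raw.score(Xte, yte))
    print("hidden-layer probe:", hid.score(hidden(Xte), yte))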
Loving introspective methods such as this. It'll be cool when, to answer a question, a model will provide an answer and, because it knows a lot about itself, also provide annotated references.
Catchy title, by authors with catchy names (LeCun).
But it's just an empirical study; I'm not expecting to learn something that surprises me from it.
>But it's just an empirical study; I'm not expecting to learn something that surprises me from it.
What?
Empirical studies can be surprising.