
This post reminded me of a conversation I had with my cousins about language and learning. It’s interesting how (most?) languages seem inherently sequential, while ideas and knowledge tend to have a more hierarchical structure, with a “base frequency” communicating the basic idea and higher frequency overtones adding the nuances. I wonder what implications this might have in teaching current LLMs to reason?


> It’s interesting how (most?) languages seem inherently sequential, while ideas and knowledge tend to have a more hierarchical structure

Spoken and written languages are presented in a sequential medium. They still represent hierarchical trees in their structure though.

(A notable semi-exception to the linearity is sign languages, which are kinematic three-dimensional languages involving two hands, the entire upper body and facial expressions. While I don't sign myself, I've read a bit about them, and apparently the most common error for non-deaf people who learn one is to make so-called "split verb" errors. That is to say: to sign in a linear fashion like one would with a spoken language, instead of making use of all the parallel communication options available.)


In the movie Arrival, the aliens use a non-sequential language.


Hm, Italian speakers look like what you describe. :-)


I know you're joking, but since we're among nerds who like technical correctness: what Italians do is known as "gesticulation". It is an important part of their speech, for sure, just like the melody of a spoken language can add layers of depth to a sentence when compared to its written representation. As far as I know this is not, however, a sign language. Sign languages have their own grammars that are not comparable to those of spoken languages. Italians do not take their gesticulation that far AFAIK.


Ok, I get that people downvoted my comment for being a killjoy, but the point I was trying to make is a serious one: namely that sign languages are real, valid languages, that Deaf people who speak and think in them should be taken seriously, and that the consequences of not doing so are severely damaging for the Deaf.

The Deaf community has suffered a lot of discrimination throughout history, and two of the biggest issues are non-deaf people deciding on their behalf what is best for them, and forcing them to use vocal languages (which makes as much sense as forcing a blind person to communicate via colorful paintings) while denying them access to sign language. Ask any Deaf person about Milan 1880[0] and why Alexander Graham Bell is so controversial in their community. A major driver in this has always been that people who don't know sign languages tend to think of them as funny interpretive miming.

With that in mind, comparing sign languages to "haha Italian gesticulation funny" jokes without being aware of the differences can become a form of infantilization[1].

[0] https://www.deafhistory.eu/index.php/component/zoo/item/1880

[1] https://en.wikipedia.org/wiki/Infantilization


Statements can have high internal branching & nesting (clauses, referents, etc.) but it seems to hit the limits of the brain's pushdown stack pretty quickly.


Now you're making me curious why people with ADHD (me included) have a weird tendency to write longer run-on sentences with commas that, on top of that, use more parentheses than average. Often nesting them, even. Because according to research our working memory is a little lower on average than that of neurotypicals, which seems to contradict this.


Perhaps the text itself is functioning as working memory.

Both ADHD people and neurotypicals have deeply structured thoughts. "Serializing" those thoughts without planning ahead leads to the "stream of consciousness" writing style, which includes things like run-on sentences and deeply nested parentheses. This style is considered poor form, because it is hard to follow. To serialize and communicate thoughts in a way that avoids this style, it is necessary to plan ahead and rely on working memory to hold several sub-goals simultaneously, instead of simply scanning back through the text to see which parentheses have not been closed yet.

It could also be simply that ADHD people have "branchier" thoughts, hopping around a constellation of related concepts that they feel compelled to communicate despite being tangential to the main point; parentheses are the main lexical construct used to convey such asides.


It's not just "branchier" thoughts that make it hard to communicate, it's graphier thoughts, when you mean (it's important) to communicate that it's not just a tree, but that connections may also go both ways, and sometimes they even have cycles. That to see the full picture in more nuance you've got to consider those feedback loops, and that they don't necessarily have precedence one over the other but that they must all be taken into account simultaneously.

When you explain it serially you are forced to choose a spanning tree, and people usually stop listening once the spanning tree has touched all the relevant concepts. Then they persuade themselves they got the full picture, but they miss the connections that make the problem more complex and nuanced.
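
To make the spanning-tree point concrete, here is a minimal sketch. The concept graph and the names `bfs_spanning_tree` and `all_edges` are made up for illustration, and breadth-first order is just one assumed way of "explaining serially". It walks a small cyclic graph and reports which connections the walk never mentions:

```python
from collections import deque

def bfs_spanning_tree(graph, root):
    """Return the undirected edges used by a breadth-first walk from root."""
    seen = {root}
    tree_edges = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        # sorted() only makes the visiting order deterministic
        for neighbor in sorted(graph[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                tree_edges.add(frozenset((node, neighbor)))
                queue.append(neighbor)
    return tree_edges

def all_edges(graph):
    """Collect every undirected edge of the graph exactly once."""
    return {frozenset((a, b)) for a, nbrs in graph.items() for b in nbrs}

# Hypothetical concept graph containing cycles (A-B-C and B-C-D)
concepts = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}

kept = bfs_spanning_tree(concepts, "A")
dropped = all_edges(concepts) - kept
print(sorted(tuple(sorted(e)) for e in dropped))  # [('B', 'C'), ('C', 'D')]
```

Every concept gets visited, yet the B-C and C-D links never come up, which is exactly the part a listener would miss.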

When graphs have more than one loop, loopy belief propagation is no longer guaranteed to converge to the right answer, and you need another algorithm to update your beliefs without introducing bias.


This explanation resonates with me a lot. I use Logseq to store my notes in a graph now, which works pretty darned good for me, but it still bothers me that I can't have polyhierarchies in the namespaces and/or compound aliases.

I want to be able to simultaneously encode [[Computer Science]] and [[Computer]] [[Science]].

And [[Project1/Computer Science]] to at least provide a connection to [[Project2/Computer Science]].


I am not familiar with Logseq. The sort of connection you want to make can often be made automatically using embeddings. Because [[Project1/Computer Science]] and [[Project2/Computer Science]] likely have similar content, their semantic embeddings are probably close, and a neighborhood search can help find them easily.
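
A rough sketch of that neighborhood search, under loud assumptions: the note contents are invented, and `embed` is only a bag-of-words stand-in for a real semantic embedding model (nothing here is a Logseq API). It still shows the mechanics of finding the closest page by cosine similarity:

```python
import math
from collections import Counter

def embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest(page, notes):
    """Return the name of the other page whose content is most similar."""
    query = embed(notes[page])
    others = (name for name in notes if name != page)
    return max(others, key=lambda name: cosine(query, embed(notes[name])))

# Hypothetical page contents
notes = {
    "Project1/Computer Science": "algorithms data structures complexity graphs",
    "Project2/Computer Science": "graphs algorithms complexity proofs",
    "Project1/Cooking": "recipes oven bread flour",
}

print(nearest("Project1/Computer Science", notes))  # Project2/Computer Science
```

With a real embedding model the two pages would not even need to share vocabulary, only meaning.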

Communication is kind of the game of transmitting information in such a way that your interlocutor's internal representation of things ends up mapping to yours. Low-dimensional embeddings are often very useful, but sometimes graphs are not planar. Symmetry is usually useful, and a symmetric higher-dimensional embedding is often better, because the symmetry constrains it more, making it easier to be sure it was transmitted correctly.

When people end up with different concept maps, where some concepts sit near each other in one map and far apart in the other, interesting things usually happen when they communicate, ranging from cultural enlightenment to culture war.

Some of these mappings are constrained to 3D by things like memory palaces (the method of loci), but this is somewhat arbitrary, and staying more abstract and working in higher dimensions until you "feel" everything fall into the right place intuitively is often preferable (aka the Feynman method).


Yes I think embeddings using some sort of analysis is the correct answer.

I have a basic natural language processing system implemented in Neo4j (what I tried to use before Logseq). But to take notes I like plain text more than a database. Fewer dependencies.

The problem with embeddings is that I don't know how I would wire them into my workflow yet. Plain text notes have links; I would need a separate interface or mode to browse and analyze the connections.


If it exhibits in spoken language as well, that would be evidence for the "branchier thoughts" explanation.

That said, knowing when to use dashes—longer than hyphens—can help mix things up.


Well, people with ADHD often have varying degrees of pressured speech, which on the surface appears like it could have the same origins.

https://en.wikipedia.org/wiki/Pressure_of_speech


One guy (whom I (electronically more than else) know) writes (can) in (most of the times this (or deeper)) style.

He can produce whole paragraphs of this semi-regular language and it even has distinct structure and non-standard interactions like in the above sentence.


The rule of parentheses (that they only ever add context) implies that your example sentence's core message is:

"One guy writes in style"
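
That flattening can be done mechanically. A small sketch (the function name `strip_parentheses` is made up here): drop everything inside parentheses at any depth, and track how deep the nesting goes along the way.

```python
def strip_parentheses(text):
    """Drop parenthesized content at any depth; return (core, max_depth)."""
    depth = 0
    max_depth = 0
    kept = []
    for ch in text:
        if ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            # tolerate a stray closing parenthesis instead of going negative
            depth = max(depth - 1, 0)
        elif depth == 0:
            kept.append(ch)
    # collapse the runs of spaces left behind by the removed asides
    core = " ".join("".join(kept).split())
    return core, max_depth

sentence = ("One guy (whom I (electronically more than else) know) writes (can) "
            "in (most of the times this (or deeper)) style.")
core, max_depth = strip_parentheses(sentence)
print(core)       # One guy writes in style.
print(max_depth)  # 2
```

A single depth counter is enough because only one bracket type is involved; mixed brackets would need a real stack.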


GP is hitting the limits of expressiveness of sequential text. Stacked parentheses work when the flattened sentence still reads correctly, but in this case, GP has a graph-like thought, in that:

  in (most of the times <this> (or deeper)) style

is supposed to represent a graph, where "most of the times" and "or deeper" both descend from "this", and "or deeper" also descends from "most of the times". A DAG like that can't in general be flattened without back references (which would be meta-elements in the text, something natural writing generally doesn't do) or repetition, and the latter will lead to non-grammatical sentences, especially as you trim the DAG down to reduce detail.
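
A minimal sketch of why repetition creeps in. It assumes the edges run from "this" to its qualifiers (my reading of the description above), and `flatten` is a made-up helper. It renders the DAG as nested parentheses, which is a tree, so any node with two parents gets written twice:

```python
def flatten(dag, node):
    """Render the DAG rooted at node as nested parentheses (a tree)."""
    children = dag.get(node, [])
    if not children:
        return node
    inner = " ".join(f"({flatten(dag, child)})" for child in children)
    return f"{node} {inner}"

# "or deeper" has two parents: "this" and "most of the times"
dag = {
    "this": ["most of the times", "or deeper"],
    "most of the times": ["or deeper"],
}

text = flatten(dag, "this")
print(text)  # this (most of the times (or deeper)) (or deeper)
```

The shared node shows up once per parent. The only way to write it once is a back reference, which plain prose doesn't really have.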

Also: while I'm not the guy GP references, I am a guy that does that too - or rather did, at some point in the past, until I realized there's like 5 people in my life who could understand this without an issue, and even fewer who'd indulge me or enjoy communicating this way. So over time, I got back to writing like a normal person[0]; I guess conformity is just less mentally taxing.

--

[0] - Mostly - I still use semicolons and single-depth parentheses a lot, and on HN, also footnotes.


I used to do it a lot myself since it's closer to the thought. But I'm also dyslexic. Losing track of which stack depth I'm at while reading made me respect short and to-the-point writing.


Very easy to lose focus even without dyslexia. I found out that you have to “glide” through these stacks rather than trying to reconstruct the tree, because its structure often mirrors the commenter’s stream of thought and its tempo is either somewhat similar to yours or acts as a #clk.


That’s the non-standard part. His parentheses may add context and may serve as proper child nodes or just float there linking to the most semantically relevant parts.


Style is!


No filtering, I would say. More stream of thought, less structured and planned communication.


> “… languages seem inherently sequential, while ideas and knowledge tend to have a more hierarchical structure…”

Careful with your musings, or you might start thinking semiotically!

Diachrony and synchrony

https://en.wikipedia.org/wiki/Diachrony_and_synchrony



