Hacker News | fnetisma's comments

I’m curious, which software do you use for the lead mining quality improvement use-case you mentioned?


Mostly Mistral/Mixtral models in addition to Whisper.


Sure, there will be corrective behaviour in the market, and the better product, with more outreach and a better experience, will win over suboptimal products with overlapping offerings. But does that mean the current generative AI momentum is hollow, or is there a sticky use case behind the promises? And if so, in your opinion, how overstated is the Total Addressable Market compared to what's claimed by the aggregate of startups across the VC space?


I would like to know the cost of enabling this type of selective specialization of the models.

If it’s not particularly intensive, I wouldn’t be surprised if model architecture moves towards self-specialization or topic selection with some effective function calling, e.g. model is used for a while -> model specializer is called automatically after a few queries on a topic -> the newly returned specialized LLM is used from then on.

I wonder by what magnitude this could improve model efficacy.
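As a rough sketch of that loop, the routing part might look like this (all the names here, such as specialize_model and TOPIC_THRESHOLD, are hypothetical, and the specializer is a stand-in for whatever expensive fine-tune or adapter-swap step would actually run):

```python
# Hypothetical sketch of the self-specialization loop described above.
from collections import Counter

TOPIC_THRESHOLD = 3  # queries on one topic before specializing


def specialize_model(base_model: str, topic: str) -> str:
    # Stand-in for an expensive fine-tune / adapter-swap step that
    # would return a handle to a topic-specialized model.
    return f"{base_model}-{topic}-specialized"


class SelfSpecializingRouter:
    def __init__(self, base_model: str = "general-llm"):
        self.base_model = base_model
        self.active_model = base_model
        self.topic_counts = Counter()

    def route(self, topic: str) -> str:
        """Return the model to use for a query on `topic`."""
        self.topic_counts[topic] += 1
        if self.topic_counts[topic] == TOPIC_THRESHOLD:
            # the "function call" out to the specializer, once a topic is hot
            self.active_model = specialize_model(self.base_model, topic)
        return self.active_model


router = SelfSpecializingRouter()
for _ in range(3):
    model = router.route("chemistry")
print(model)  # general-llm-chemistry-specialized
```

The open question in the comment above is exactly what `specialize_model` would cost in practice; the routing bookkeeping itself is trivial.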


The iterative leaps in open-source model quality are strong evidence that companies competing at the LLM model layer have an ephemeral moat.

Serious question: assuming this is true, if a challenger like OpenAI wants to win, how does it effectively compete against incumbents such as Meta and Google, whose product offerings can be AI-enhanced in a snap?


the very first big AI company that gives up trying to lobotomize and emasculate their models to align with the values of 0.01% of the world population will win a lot of hearts and minds overnight. the censorship necessary for corporate applications can be trivially implemented as a toggleable layer, using a small, efficient, specialist model to detect no-no words and wrongthink in inputs/outputs.
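As a toy illustration of that toggleable-layer idea, here is a sketch with a keyword blocklist standing in for the small specialist classifier (everything here, including the function names and the blocklist, is hypothetical):

```python
# Illustrative sketch: moderation as a separate, toggleable filter
# around an unaligned model, rather than baked into the weights.
BLOCKLIST = {"forbidden_topic"}  # placeholder for a real classifier


def flagged(text: str) -> bool:
    # In practice this would be a small, fast moderation model,
    # not a substring check.
    return any(term in text.lower() for term in BLOCKLIST)


def generate(prompt: str, moderation: bool = True) -> str:
    def base_model(p: str) -> str:
        return f"completion for: {p}"  # stand-in for the uncensored model

    # screen both the input and the output, only when the layer is on
    if moderation and flagged(prompt):
        return "[blocked by moderation layer]"
    out = base_model(prompt)
    if moderation and flagged(out):
        return "[blocked by moderation layer]"
    return out


print(generate("tell me about forbidden_topic"))         # blocked
print(generate("tell me about forbidden_topic", False))  # raw completion
```

The point of the design is that corporate deployments flip `moderation=True` while everyone else gets the raw model; whether vendors would ever ship it that way is a separate question.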

gpt, claude, gemini, even llama and mistral, all tend to produce the same nauseating slop, easily-recognizable by anyone familiar with LLMs - these days, I cringe when I read 'It is important to remember' even when I see it in some ancient, pre-slop writings.

creativity - one of the very few applications generative AI can truly excel at - is currently impossible. it could revolutionize entertainment, but it isn't allowed to. the models are only allowed to produce inoffensive, positivity-biased, sterile slop that no human being finds attractive.


> the censorship necessary for corporate applications can be trivially implemented as a toggleable layer, using a small, efficient, specialist model to detect no-no words and wrongthink in inputs/outputs.

What's really funny is they all have "jailbreaks" that you can use to make them say anything anyway. So for "corporate" uses, the method you propose is already mandatory. The whole thing (censoring base models) is a misguided combination of ideology and (over the top) risk aversion.


> creativity - one of the very few applications generative AI can truly excel at - is currently impossible. it could revolutionize entertainment, but it isn't allowed to. the models are only allowed to produce inoffensive, positivity-biased, sterile slop that no human being finds attractive.

Have you played around with base models? If you haven't yet, I'm sure you'll be happy to find that most base models are delightfully unslopped and uncensored.

I highly recommend trying a base model like davinci-002[1] in OpenAI's "legacy" Completions API playground. That's probably the most accessible, but if you're technically inclined, you can pair a base model like Llama3-70B[2] with an interface like Mikupad[3] and do some brilliant creative writing. Llama3 models can be run locally with something like Ollama[4], or if you don't have the compute for it, via an LLM-as-a-service platform like OpenRouter[5].

[1] https://platform.openai.com/docs/models/gpt-base

[2] https://huggingface.co/meta-llama/Meta-Llama-3-70B

[3] https://github.com/lmg-anon/mikupad

[4] https://ollama.com/library/llama3:70b-text

[5] https://openrouter.ai/models/meta-llama/llama-3-70b


From [3]:

> Further, in developing these models, we took great care to optimize helpfulness and safety.

The model you linked to isn't a base model (those are rarely if ever made available to the general public nowadays), it is already fine-tuned at least for instruction following, and most likely what some in this game would call 'censored'. That isn't to say there couldn't be made 'uncensored' models based on this in the future, by doing, you guessed it, moar fine-tuning.


I think you vastly overestimate how much people care about model censorship. There are a bunch of open models that aren't censored. Llama 3 is still way more popular because it's just smarter.


Please explain what you mean when you say the 0.01% are emasculating AI


They're suggesting that 99.99% of people don't mind if AI reflects the biases of society. Which is weird, because I'm pretty sure most people in the world aren't old white middle-class Americans.


yes, yes, bias like the fact that the Wehrmacht was not the human menagerie that 0.01% of the population insist we live in.

https://www.google.com/search?q=gemini+german+soldier

prompt-injected mandatory diversity has led to the most hilarious shit I've seen generative AI do so far.

but, yes, of course, other instances of 'I reject your reality and substitute my own' - like depicting medieval Europe to be as diverse, vibrant and culturally enriched as American inner cities - those are doubleplusgood.


A study of a Black Death cemetery in London found that 20% of people sampled were not white


London has been a center of international trade for centuries. It would have been a much more diverse city than Europe as a whole, and even that is assuming the decedents were local residents and not the dead from ships that docked in the city.


10th century Spain was Muslim


A Spanish Muslim looks like a Spanish person in Muslim attire rather than a Japanese person in European attire. Also, Spain is next to Africa, but the thing is generating black Vikings etc.


HN isn't good for long threads, so here are some things to think about seriously and argue with yourself about, if you like. I will probably not respond, but know that I am not trying to tell you that you are wrong, just that it may be helpful to question some premises to find what you really want.

* What exactly are the current ones doing that makes them generate 'black Vikings'?

* How would you change it so that it doesn't do that but will also generate things that aren't only representative of the statistical majority results of large amount of training data it used?

* Would you be happy if every model output just represented 'the majority opinion' it has gained from its training data?

* Or, if you don't want it to always represent whatever the majority opinion was at the time it was trained, how do you account for that?

* How would your method be different from how it is currently done except for your reflecting your own biases instead of those you don't like?


> What exactly are the current ones doing that makes them generate 'black Vikings'?

There is presumably a system prompt or similar that mandates diverse representation and is included even when inappropriate to the context.

> How would you change it so that it doesn't do that but will also generate things that aren't only representative of the statistical majority results of large amount of training data it used?

Allow the user to put it into the prompt as appropriate.

> Would you be happy if every model output just represented 'the majority opinion' it has gained from its training data?

There is no "majority opinion" without context. The context is the prompt. Have you tried using these things? You can give it two prompts where the words are nominally synonyms for each other and the results will be very different, because those words are more often present in different contexts. If you want a particular context, you use the words that create that context, and the image reflects the difference.

> How would your method be different from how it is currently done except for your reflecting your own biases instead of those you don't like?

It's chosen by the user based on the context instead of the corporation as an imposed universal constant.


I misunderstood. I thought you were arguing about all language models that are being used at a large scale, but it seems that you are only upset about one instance of one of them (the Google one). You can use the API for Claude or OpenAI with a front-end to include your own system prompt or none at all. However, I think you are confusing the 'system prompt', which is the extra instructions, with the 'instruction fine-tuning', which is putting a layer on top of the base pre-trained model so that it understands instructions. There are layers of training, and a language model with only base training will just know how to complete text: "one plus one is" would get "two. And some other math problems are" etc.

The models you encounter are going to be fine tuned, where they take the base and train it again on question and answer sets and chat conversations and also have a layer of 'alignment' where they have sets of questions like 'q: how do I be a giant meanie to nice people who don't deserve it' and answers 'a: you shouldn't do that because nice people don't deserve to be treated mean' etc. This is the layer that is the most difficult to get right because you need to have it but anything you choose is going to bias it in some way just by nature of the fact that everyone is biased. If we go forward in history or to a different place in the world we will find radically different viewpoints than we hold now, because most of them are cultural and arbitrary.


> and also have a layer of 'alignment' where they have sets of questions like 'q: how do I be a giant meanie to nice people who don't deserve it' and answers 'a: you shouldn't do that because nice people don't deserve to be treated mean' etc. This is the layer that is the most difficult to get right because you need to have it

Wait, why do you need to have it? You could just have a model that will answer the question the user asks without being paternalistic or moralizing. This is often useful for entirely legitimate reasons, e.g. if you're writing fiction then the villains are going to behave badly and they're supposed to.

This is why people so hate the concept of "alignment" -- aligned with what? The premise is claimed to be something like the interests of humanity and then it immediately devolves into the political biases of the masterminds. And the latter is worse than nothing.


The point is there's bias in the system already, we should attempt to fix it, just in a better way than Google's attempt


The bias isn't in the machine, it's in the world. So you have to fix it in the world, not in the machine. The machine is just a mirror. If you don't like what you see, it's not because the mirror is broken.


So there's no point in trying to make a more unbiased mirror?


The mirror isn't biased. The thing in the mirror is being accurately represented, statistically. What you want to change is not the mirror.


You're saying that generative AI will depict people from other cultures in proportion to their share of the world population? That the training set is 60% Asian people?


Indeed. If religion is a good guide, then I think around 24% think that pork is inherently unclean and not fit for human consumption under penalty of divine wrath, and 15% think that it's immoral to kill cattle for any reason. Also, non-religiously, I'd guess around 17% think "中国很棒,只有天安门广场发生了好事" (roughly: "China is great, and only good things have happened in Tiananmen Square").


Maybe you meant something like 天安门广场上只发生了好事


Given I was using Google Translate, which isn't great at Chinese, I assume you are absolutely correct.

My written Chinese is limited to 一二三, and that from Mahjong tiles, and I keep getting 四 and 五 mixed up.


Modern chatbots are trained on a large corpus of all textual information available across the entire world, which obviously is reflective of a vast array of views and values. Your comment is a perfect example of the sort of casual and socially encouraged soft bigotry that many want to get away from. Instead of trying to spin information this way or that, simply let the information be, warts and all.

Imagine if search engines adopted this same sort of moral totalitarian mindset and if you happened to search for the 'wrong' thing, the engine would instead start offering you a patronizing and blathering lecture, and refuse to search. And 'wrong' in this case would be an ever-encroaching window on anything that happened to run contrary to the biases of the small handful of people engaged, on a directorial level, with developing said search engines.


Encoding our current biases into LLMs is one way to go, but there's probably a better way to do it.

Your leap to "thou shalt not search this" misses a possible middle ground.


The problem is with the word "our". If it's just private companies, the biases will represent a small minority of people that tend to be quite similar. Plus, they might be guided by profit motives or by self-censorship ("I don't mind, but I'm scared they'll boycott the product if I don't put this bias").

I have no idea how to make it happen, but the talk about biases, safeguards, etc should be made between many different people and not just within a private company.


Search for "I do coke" on Google. At least in the US, the first result is not a link to the YouTube video of the song by Kill the Noise and Feed Me, but the text "Help is available, Speak with someone today", with a link to the SAMHSA website and hotline.


Yes and the safeguards are put in place by a very small group of people living in silicon valley.

I saw this issue working at Tinder too. One day, at the height of the BLM movement, they announced that they would be removing ethnicity filters across all the apps to weed out racists. Never mind that many ethnic minorities prefer or even insist on dating within their own ethnicity, so this was most likely hurting them, not racists.

That really pissed me off and opened my eyes to how much power these corporations have to dictate culture, not just toward their own cultural biases but toward those of money.


I think you have your populations reversed. The number of people who get their knickers in a twist over LLMs reflecting certain cultural biases (and sometimes making foolish predictions in the process) amounts to a rounding error.


I'm not talking about twisted panties, I'm talking about their inability to generate anything but soulless slop, due to the blatantly obvious '''safeguards''' present in all big models, making them averse to even PG13-friendly themes and incapable of generating content palatable even to the least discerning consoomers. you couldn't generate even sterile crap like a script for capeshit or a Netflix series, because the characters would quickly forget their differences and talk about their bonds, journeys, boundaries and connections instead.

without those '''safeguards''' implemented to appease the aforementioned 0.01%, things could be very different - some big models, particularly Claude, can be tard wrangled into producing decent prose, if you prefill the prompt with a few thousand token jailbreak. my own attempts to get various LLMs to assist in writing videogame dialogue only made me angry and bitter - big models often give me refusals on the very first attempt to prompt them, spotting some wrongthink in the context I provide for the dialogue, despite the only adult themes present being mild, not particularly graphic violence that nobody except 0.01% neo-puritan extremists would really bat an eye at. and even if the model can be jailbroken, still, the output is slop.


"Consoomers". Jesus christ. Back to whatever dark, perpetually angry echochamber you came from.


[flagged]


k


Lmao


> gpt, claude, gemini, even llama and mistral, all tend to produce the same nauseating slop, easily-recognizable by anyone familiar with LLMs

Does grok do this, given where it came out of?


Their moat atm is being 6 months ahead of everyone else on model quality. Plus the ‘startup’ advantage over their corporate competitors. Oh and they can hoard a lot of the best talent because it’s an extremely high status place to work.

Their task now is to maintain and exploit those advantages as best they can while they build up a more stable long term moat: lots of companies having their tech deeply integrated into their operations.


Just to add, they don't have the baggage of Google or Meta, so they can do more without worrying how it impacts the rest of the company. And of the big players they seem the most aware of how important good data is, and have paid for lots of high-quality curated fine-tuning data in order to build a proper product instead of doing a research project. That mindset, and the commercial difference it makes, shouldn't be underestimated.


> Their moat atm is being 6 months ahead of everyone else on model quality

Really? Most of our testing now has Gemini Pro on par or better (though we haven't tested omni/Ultra)

It really seems like the major models have all topped out / are comparable


They scare the government into regulating the field into oblivion.


What would be the difference in compute for inference on an audio<>audio model like this compared to a text<>text model?


This is really neat! I have questions:

For the “needs tool usage” and “found the answer” blocks in your infra, how are those decisions made?

Looking at the demo, it takes a little time to return results; between the search, vector storage, and vector DB retrieval steps, which takes the most time?


Thanks :)

Die LLM makes these decisions on its own. If it writes a message which contains a tool call (Action: Web search Action Input: weight of a llama) the matching function will be executed and the response returned to the LLM. It's basically chatting with the tool.

You can toggle the log viewer on the top right to get more detail on what it's doing and what is taking time. Timing depends on multiple things:

- the size of the top n articles (generating embeddings for them takes some time)
- the amount of matching vector DB responses (reading them takes some time)


> Die LLM

You mean the? The German is bleeding through haha


Wolfenstein 3D did it first! And then The Simpsons as well.


If you talk to anyone here in Canada there is bipartisanship. Real estate owners and non-owners both agree that housing prices are a huge issue across Canada right now, but the former would never budge on reducing prices as a means to a solution; they have too much invested, too late. Non-owners, on the other hand, want price corrections to take place, but systemic issues such as a rising population due to immigration [1] and increasing construction costs [2] are putting upward pressure on prices.

[1] https://www.cbc.ca/amp/1.6938242

[2] https://thoughtleadership.rbc.com/proof-point-soaring-constr....


There is no 'bipartisanship' on this issue as all major parties strongly support at least some mixture of policies that are known to increase housing prices.

Whether that's protecting farmland/parks/forests/etc., increasing immigration, increasing building standards, etc...

In fact, I don't think it's viable for any party to drop support for even two of those. e.g. I doubt there's much of a voting base for someone who's both hard on immigration and relaxed on environmental issues to allow huge housing developments on protected lands.


Sorry, I wasn’t clear enough: I don’t mean bipartisanship in the political sense, but between two groups of people, namely real estate owners and people who don’t own real estate. At a general level, sellers want high prices and buyers want lower prices, but the level prices are at right now is simply way too high relative to income per capita to justify a home purchase, while a correction to “USA levels”, assuming that’s the baseline, is also not justifiable, as home equity is a huge chunk of everyone’s wealth.


I'm trying to use this via SSH into another instance in VS Code, but it doesn't work. Any help?


Very detailed article! I was wondering, however, whether in applying this strategy you've found that a lack of company (and team) loyalty has reduced the quality of the relationships you've made?


Thank you!

And it's hard to say, but I would guess the quality of the relationships hasn't changed too much, although they are definitely of a different nature.

I'm always running into other contractors that are doing the same thing, and everybody knows that the relationship formed there will extend beyond just the one job. In a way, these relationships are stronger because they can persist for years after you've finished working together -- making friends is a good way to find new work.

I also have made up for it in my personal life. I find that I don't necessarily share the viewpoints of everybody that I'm working with, so it's nice to spend more time with people that I've opted to spend time with instead.

So even from a relationships perspective, I think I've made better friends with the people I meet while contracting (and choose to maintain relationships with) than I have when meeting people after being assigned to a certain team.

