Hacker News | AustinZzx's comments

We are actively working on that. Thanks for the support.


Could you elaborate a bit on "my speech engine tied to a specific model provider"? Sorry, I might be lacking some context on what you are referring to here.


I will be in the market for a text-to-speech engine, but from looking at the website it seems the model Retell is trying to push is "use our all-in-one model + text-to-speech service," which is problematic when my choice of model and control over how that model runs is at the core of my product, and text-to-speech is a "nice to have" feature. I want an endpoint that I can fire off text to in a streaming mode, where it'll buffer that streaming text a little and then stream out a beautiful, natural-sounding voice with appropriate emotion and intonation in a voice of my design. I'm sure I'm not really Retell's ideal customer, and they're going after lucrative "all in one" customers that just want to build on top of a batteries-included product.
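The "buffer a little, then fire" step described above is simple to sketch: accumulate streamed LLM tokens and flush sentence-sized chunks to whatever TTS endpoint you choose. This is a minimal illustration of the buffering idea only; the chunking threshold and regex are assumptions, not anyone's actual API.

```python
import re

def chunk_stream(tokens, min_chars=40):
    """Buffer streamed LLM tokens and yield sentence-sized chunks
    suitable for firing at a streaming TTS endpoint."""
    buf = ""
    for tok in tokens:
        buf += tok
        # Flush at the first sentence boundary once we have enough text.
        m = re.search(r"[.!?]\s", buf)
        if m and m.end() >= min_chars:
            yield buf[:m.end()].strip()
            buf = buf[m.end():]
    if buf.strip():
        yield buf.strip()
```

Each yielded chunk would then go to the TTS provider of your choice, keeping the model and the voice layer decoupled.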


If you are looking for a text-to-speech solution, you could use ElevenLabs' Turbo model.


You certainly could, provided you play video games long enough to gather the needed data, lol.


The demo just uses GPT-3.5 Turbo.


This can be handled with function calling and other LLM features. We support an input signal for closing the call, so you can have a rule-based (timer) system or an LLM-based end-call function and use that to hang up.
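The end-call idea above can be sketched with a tool definition in the OpenAI function-calling style plus a small dispatcher. The schema and the `end_call` name here are illustrative assumptions, not Retell's actual signal.

```python
import json

# Hypothetical tool schema in the OpenAI function-calling style;
# the actual signal Retell expects may differ.
END_CALL_TOOL = {
    "type": "function",
    "function": {
        "name": "end_call",
        "description": "Hang up when the conversation has reached a natural close.",
        "parameters": {
            "type": "object",
            "properties": {
                "reason": {"type": "string", "description": "Why the call is ending."}
            },
            "required": ["reason"],
        },
    },
}

def handle_tool_call(tool_call, hang_up):
    """Dispatch an LLM tool call; invoke hang_up(reason) if the model asks to end."""
    if tool_call["name"] == "end_call":
        args = json.loads(tool_call["arguments"])
        hang_up(args.get("reason", "unspecified"))
        return True
    return False
```

A timer-based rule could call the same `hang_up` path, so both mechanisms converge on one hangup signal.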


Thanks for the support, we still have a lot of work ahead of us to make it better!


Thanks for the support. Means a lot to us.


Good point. Currently, our product does not contain an LLM, as we are purely a voice API -- the developer brings their own LLM solution and decides what to say. This would be a great guardrail to build in for all sorts of reasons; we'll see how we can suggest our users adopt it.


May I please understand your architecture?

A dev builds an app, pipes it to your API, and you spit it back out? If so, ensure that whatever you spit out identifies itself to whomever is listening....

--

Please explain the architecture of how your system works (or link me if I missed it).

----

Shortest and most important law ever written:

"an AI must identify itself as AI when asked by Humans."

The zeroth law of robotics.

------

@austin

-

Cool - so I'm on an important call with [your customer] and your system has an outage?

How is this handled? A dropped call?

(I am not being cynical - I'm being someone who is allergic to post mortems.)

----

EDIT: you need to stop using the term "user" in anything you market or describe. full stop.

the reason: in the case of your product, the USER is the motherhecker on the phone listening to anything your CUSTOMER is spewing at them VIA your API.

the USER is who is making IRL *>>>DECISIONS<<<* based on what they hear from your system.

Your CUSTOMER is from whom you receive money.

THEIR customer, is whom they get money to pay you.

The USER is the end-point Human, who doesn't even know you exist.


We handle the audio bytes in / out, and also connect to your user's server for response. We handle the interaction and decide when to listen and when to talk, and send live updates to our users. When a response is needed, we ask for it and get it from our user.

Our homepage https://www.retellai.com/ has a GIF on it that illustrates this point.
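The flow described above (the voice layer sends live updates and requests a response from the developer's server) can be sketched as a developer-side handler. The payload shape and function name here are assumptions for illustration, not Retell's actual wire protocol.

```python
def generate_response(update):
    """Called when the voice layer asks for the next thing to say.

    `update` carries the live transcript (an assumed shape); the developer
    plugs in any LLM or rule system here and returns the text to speak.
    """
    transcript = update["transcript"]
    # Find the most recent user turn to respond to.
    last_user_turn = next(
        (t["content"] for t in reversed(transcript) if t["role"] == "user"),
        "",
    )
    # Toy rule standing in for an LLM call.
    if "hours" in last_user_turn.lower():
        return "We are open nine to five, Monday through Friday."
    return "Could you tell me a bit more about that?"
```

The point of the split is that the voice layer owns turn-taking and audio, while the developer owns everything about what gets said.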


Nice catch on the wording -- customer is indeed more accurate than user.

For outage handling: we strive to maintain 99.9%+ uptime. In the case of an outage, the agent would hang up if on a phone call; on the web, error handling depends on how the customer handles it.


We strive to make conversation humanlike, so maybe less contact center ops development, but more focus on performance and customizability of voice interactions. As a startup, our edge over big tech is being nimble and executing fast.


I would keep working on positioning; I feel that your language is woolly at times:

> we focus most of our energy on innovating the AI conversation experience, making it more magical day by day. We pride ourselves on wowing our customers when they experience our product themselves.

This is not useful; you already have testimonials to show what customers think.

Maybe convert that first FAQ point about differentiation into a table comparing you against the closest competitors. Since you talk about performance you should measure it. Use a standard benchmark if there is one for your field.


Good point, note taken. Benchmarking is a great tool to show differentiation. BTW, apart from what we ourselves think is important (latency, mean opinion score, etc.), would you mind sharing what you'd want to see in such a benchmark? One key metric I like to keep an eye on is the end conversion rate of using the product, but that's very use-case specific.
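For the latency metric mentioned above, a benchmark would typically report percentiles rather than a single average. This is a minimal sketch, assuming you have collected end-to-end response latencies in milliseconds:

```python
import statistics

def latency_summary(samples_ms):
    """Summarize end-to-end response latencies (ms) with the
    percentiles a voice-agent benchmark would typically report."""
    qs = statistics.quantiles(sorted(samples_ms), n=100, method="inclusive")
    return {
        "p50": qs[49],           # median latency
        "p95": qs[94],           # tail latency, what callers actually notice
        "mean": statistics.fmean(samples_ms),
    }
```

Reporting p95 alongside the median matters for voice: an occasional two-second pause is far more jarring in a phone call than a slightly higher average.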


For TTS, we are currently integrating with different providers like ElevenLabs, OpenAI TTS, etc. We do have plans down the road to train our own TTS model.


Ah thank you! What's the lowest latency option you have found so far?


Deepgram TTS is pretty fast, but they have not publicly launched yet.


ty!

