Pronouncing numbers
- From: Don Y <not@xxxxxxx>
- Date: Tue, 15 May 2012 00:09:55 -0700
Hi,
I'm trying to settle on a pronunciation algorithm for
numeric strings (e.g., "1234567.89") for my *backup*
speech synthesizer (i.e., the synthesizer that MUST WORK
regardless of what else might *not* be working -- think
of this as the mechanism by which panic() messages are
emitted).
The backup synthesizer is more severely resource-constrained.
OTOH, it also doesn't need to be as clever/effective! The
range of messages that it has to emit is less "general".
Yet it shouldn't sound deliberately *stilted* just
for the sake of efficiency!
E.g., the above value could be spoken as:
"one two three four five six seven point eight nine"
"one two three, four five six, seven, point eight nine"
"one, two three four, five six seven, point eight nine"
"one million, two hundred thirty four thousand, five hundred sixty seven, point eighty nine"
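The first three renderings differ only in where the pauses fall: none, every three digits counted from the left, or every three digits counted from the right (where thousands separators would go). A minimal Python sketch of that family, assuming the input is plain digits with an optional single "." and that a comma in the output stands for a short pause. `speak_digits` and its parameters are illustrative names, not an existing API:

```python
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def speak_digits(numeral: str, group: int = 0, from_right: bool = False) -> str:
    """Read `numeral` digit by digit (hypothetical helper).

    group=0 gives one unbroken run; group=3 inserts a pause (",") every
    three digits of the whole part, counted from the left by default or
    from the right (thousands alignment) when from_right is True.
    """
    whole, _, frac = numeral.partition(".")
    words = [DIGITS[int(d)] for d in whole]
    if group:
        first = len(words) % group if from_right else group
        if first == 0:  # whole part splits evenly into groups
            first = group
        chunks = [words[:first]]
        chunks += [words[i:i + group] for i in range(first, len(words), group)]
        spoken = ", ".join(" ".join(c) for c in chunks)
    else:
        spoken = " ".join(words)
    if frac:
        pause = ", " if group else " "
        spoken += pause + "point " + " ".join(DIGITS[int(d)] for d in frac)
    return spoken


print(speak_digits("1234567.89", group=3, from_right=True))
```

Grouping from the right lines the pauses up with the magnitude (the length of the first group plus the number of groups that follow), which may help a listener recall the size of the number even when individual digits are lost.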
There are two issues here. The first is the
cost of emitting the "translation". The second is the
"intelligibility" of the spoken output.
I.e., if someone *read* these four different interpretations
to you, which could you most easily "write down"? Which
could you most easily *remember*? Even if you can't recall
all of the actual digits, could you recall the magnitude of
the number? (Obviously, my 1234... example is artificial
and easy to remember.)
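The verbose rendering is the costly one to emit: it needs word tables for units, teens, tens and scale names, plus a split into groups of three. A sketch in Python, assuming American-style wording without "and" and, following the example above, a two-digit fraction read as a number ("point eighty nine"); other fractions are read digit by digit. `speak_verbose` and `below_1000` are hypothetical names:

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty",
        "fifty", "sixty", "seventy", "eighty", "ninety"]
# Scale word for each group of three digits, least significant first.
SCALES = ["", " thousand", " million", " billion", " trillion"]


def below_1000(n: int) -> str:
    """Words for 1..999, e.g. 567 -> 'five hundred sixty seven'."""
    parts = []
    if n >= 100:
        parts.append(ONES[n // 100] + " hundred")
        n %= 100
    if n >= 20:
        parts.append(TENS[n // 10])
        n %= 10
    if n:
        parts.append(ONES[n])
    return " ".join(parts)


def speak_verbose(numeral: str) -> str:
    """Verbose rendering; a comma marks a pause between groups."""
    whole, _, frac = numeral.partition(".")
    n = int(whole)
    if n >= 1000 ** len(SCALES):
        raise ValueError("too many digits for the scale table")
    if n == 0:
        spoken = "zero"
    else:
        groups = []
        for scale in SCALES:
            n, chunk = divmod(n, 1000)
            if chunk:  # skip empty groups: "one million", not "... zero thousand"
                groups.append(below_1000(chunk) + scale)
            if n == 0:
                break
        spoken = ", ".join(reversed(groups))
    if frac:
        if len(frac) <= 2 and frac[0] != "0":
            tail = below_1000(int(frac))  # "89" -> "eighty nine"
        else:
            tail = " ".join(ONES[int(d)] for d in frac)  # "05" -> "zero five"
        spoken += ", point " + tail
    return spoken


print(speak_verbose("1234567.89"))
```

Most of the cost is in the tables: a few dozen short strings instead of the ten digit names the digit-only forms need.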
Might there be a hybrid approach that improves intelligibility
(at some increased implementation expense)? E.g., recite
individual digits for values greater than 1,000 and more
"verbose" representations for smaller values (like "two hundred
and five").
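The hybrid suggested above can be sketched as a threshold test: whole parts below a limit get the verbose, British-style form ("two hundred and five"), and anything larger is recited digit by digit. A minimal Python sketch; the default `limit=1000` (so 1,000 itself goes to digits) and the names `verbose_small` and `speak_hybrid` are assumptions for illustration:

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty",
        "fifty", "sixty", "seventy", "eighty", "ninety"]


def verbose_small(n: int) -> str:
    """0..999 in words with a British-style 'and': 205 -> 'two hundred and five'."""
    if n == 0:
        return "zero"
    words = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        words.append(ONES[hundreds] + " hundred")
        if rest:
            words.append("and")
    if rest >= 20:
        words.append(TENS[rest // 10])
        rest %= 10
    if rest:
        words.append(ONES[rest])
    return " ".join(words)


def speak_hybrid(numeral: str, limit: int = 1000) -> str:
    """Verbose below `limit` (assumed <= 1000), digit by digit otherwise."""
    whole, _, frac = numeral.partition(".")
    if int(whole) < limit:
        spoken = verbose_small(int(whole))
    else:
        spoken = " ".join(ONES[int(d)] for d in whole)
    if frac:  # fractions are always read digit by digit in this sketch
        spoken += " point " + " ".join(ONES[int(d)] for d in frac)
    return spoken


print(speak_hybrid("205"), "/", speak_hybrid("1234567.89"))
```

Only the 0 to 999 tables are needed, so the verbose path stays small; the threshold is the single knob to tune for intelligibility.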
Or are there ways to pass hints to the synthesizer
*without* explicit directives? For example:
[say_as_digits] 1234567.89
vs.
[say_verbose] 1234567.89
are less desirable than counterparts like:
1234567.89
1,234,567.89
(the latter encoding the "verbose" flag by the presence of the
group separators)
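Inferring the mode from the notation amounts to a lexical check before synthesis: a numeral with well-formed group separators selects the verbose form, and a bare one selects digits. A sketch using Python's `re` module, assuming "," as the group separator and "." as the decimal point (locale handling is out of scope); `classify` and the mode labels are hypothetical:

```python
import re

# Well-formed grouping: 1-3 leading digits, then one or more ",ddd" groups.
GROUPED = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?")
PLAIN = re.compile(r"\d+(?:\.\d+)?")


def classify(token: str) -> tuple[str, str]:
    """Return (mode, bare_numeral), inferring the mode from the notation."""
    if GROUPED.fullmatch(token):
        return "verbose", token.replace(",", "")
    if PLAIN.fullmatch(token):
        return "digits", token
    raise ValueError(f"not a numeral: {token!r}")


print(classify("1,234,567.89"), classify("1234567.89"))
```

One consequence: values below 1,000 carry no separator, so this convention alone cannot request a verbose "two hundred and five"; pairing it with a size threshold would cover that case.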
Consider, also, how inflection might be affected by these
different presentations (e.g., the verbose presentation tends
to carry more inflection than the "as digits" one, which might
end up sounding monotonous).
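One cheap way to get some inflection is to let the pauses in a rendering mark phrase boundaries: a rising pitch on each non-final phrase signals "more to come" and a falling pitch ends the number. A minimal sketch, assuming the synthesizer can accept a per-phrase pitch hint; `phrase_contours` and the "rise"/"fall" labels are hypothetical, not any real synthesizer's interface:

```python
def phrase_contours(spoken: str) -> list[tuple[str, str]]:
    """Split a rendering at its pauses (commas) and tag each phrase with a
    hypothetical pitch hint: 'rise' (more to come) or 'fall' (done)."""
    phrases = [p.strip() for p in spoken.split(",") if p.strip()]
    return [(p, "fall" if i == len(phrases) - 1 else "rise")
            for i, p in enumerate(phrases)]


print(phrase_contours("one million, two hundred thirty four thousand, "
                      "five hundred sixty seven, point eighty nine"))
```

The undelimited digit string comes back as a single phrase with one fall at the very end, which is one reason it can sound monotonous.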
Also, think about how different the audio channel is from
the visual channel. While I can include provisions to let the
user repeat/review the message, you really don't want to
*need* to do this, since needing it indicates a communication
deficiency in the design!