255 word Speech IC

hippy

Technical Support
Staff member
Think you may have lost some digits in your URL ...

https://www.ebay.com/itm/VINTAGE-GI-SPEECH-VOICE-SYNTHESIZER-IC-SP0256A-AL2-CTS256A-AL2-Text-to-Speech/273360299040

As you say it was cool back in its day. There was a Speech ROM for the BBC Micro which was very impressive. Have occasionally looked for the code for that but have never found it.

Modern speech engines are a lot better, actually human sounding, though most of the stuff for the Pi is rather robotic. Google's AIY Voice Kit stuff is probably good if one can figure out how to use it.

Another 'free' option is to use Google's picoTTS which is included in Android phones. MIT App Inventor 2 can easily interface to that and it sounds excellent. The problem there is how to interface to an Android phone.
 

Buzby

Senior Member
I furtled around in a dark crevice, and found my antique talkingbox.

Plugged it in to hear it speak after 30 years of silence.

Pressed the reset button, but instead of hearing "Ready" all I got was a quiet hiss.

Should I bother trying to find out why it's gone dumb ?

20180808_202817[1].png
 

tmfkam

Senior Member
The engineer in me would insist that I at least determined the cause of failure, preferably repairing it before another 30 years of silence.

I couldn't "just" put it back knowing it wasn't working. It would keep me awake at night!
 

Buzby

Senior Member
Hi All,

Well, I tried to find the fault, and it looks like one of the port pins on the CTS256 is only dropping about 0.5v below Vcc, not all the way to 0v like the others.

The pin is used as part of the address for a 74138 which decodes the chip selects for the extra RAM and the protocol switches.

I could maybe remove the 74138, or just lock its inputs, to hard-code the single address needed for the SP0256. This would lose the RAM buffer, so no long phrases, and limit the serial port to the default setting.

Or I could buy a new CTS256, if I could find one at a decent price.

Or, and this is the off-the-wall solution, upload the code from the CTS256 and programme a new chip !.

The CTS256 is a PIC7041, but I can't find much data on this device. I don't know if it's even a PIC as we know, or some other device family.

If the code could be uploaded it would be a challenging job to translate to BASIC, but this would then allow a PICAXE to replace the CTS256 !.

What does the panel think ?

Cheers,

Buzby
 

hippy

Technical Support
Staff member
The CTS256 is a PIC7041, but I can't find much data on this device. I don't know if it's even a PIC as we know, or some other device family.
Seems to be a clone or equivalent of a Texas TMS7041. Texas chips were used a lot in products which supported speech such as Speak and Spell.

There should still be a fair amount of info out there on those.

If the code could be uploaded it would be a challenging job to translate to BASIC, but this would then allow a PICAXE to replace the CTS256 !.
I am not sure if the chip would even allow program extraction. Most were Masked ROM designs back in the day.

But it may be possible to discover the algorithm and I recall a good deal of that was in the public domain.

It is basically pattern matching but, because it's processing words of text, it might not be well suited to a PICAXE which doesn't have good string handling. Something probably could be done though.

I would probably start with a PC Basic or Python program to prove the algorithm for converting text to allophones, and then port that over to a PICAXE.

"Twos", as in "Blues and Twos", was always one I found tripped simple TTS up. Usually coming out as T-WOZE rather than TOOZ. "Isle" often ended up as IZ-UL.

The easy option is to have all your text in a file, have a pre-processor which converts that to allophones, then include those and send them straight to the SP0256 or whichever allophone chip is being used.
 

hippy

Technical Support
Staff member
Here's the "Automatic Translation of English Text to Phonetics by Means of Letter-to-Sound Rules", the Naval Research Lab paper which the CTS256 algorithm is based upon ...

www.dtic.mil/dtic/tr/fulltext/u2/a021929.pdf

Haven't studied it in any depth but it seems to include the rules they used. The CTS256 looks to be an on-the-fly lookahead matching algorithm so it could be fairly easy to implement on a PICAXE.
 

Buzby

Senior Member
Hi hippy,

Thanks for the links. After reading through the NRL document I think I'll just do the simple hardware fix !.

There could be another way to build a text-to-allophone translator, use Machine Learning. That's how Google Translate works nowadays, and I've looked at the Google Cloud AI Service before and thought 'What could I do with that ?'. ( See : https://cloud.google.com/products/ai/ and https://www.tensorflow.org/ )

Unfortunately the learning curve is a bit beyond me. I'd only take it on if I had a real need, or lots of spare time !.

Cheers,

Buzby
 

hippy

Technical Support
Staff member
Bodging a fix is probably the most pragmatic solution.

I have now read the NRL paper and it is quite neat. The algorithm is quite simple and is possible to implement on a PICAXE. The problem is more the number of rules and the number of data bytes needed to store them. It would probably be necessary to use external I2C EEPROM to hold the rules or to use a 28X2 which has multiple internal slots.

Not having much program and data memory seems to be a problem the NRL also faced. Not surprising as it was back in 1976. They developed some rules, tested those, checked for failures, and altered the rules to fix them. The telling phrase is "The additions and alterations continued until the accumulation of changes made the interactions between rules hard to keep track of".

A more modern approach would be to have raw test input and the pronunciation dictionary, run the program and see what the results were, have the computer itself decide what rules need to be added, removed or altered to make the results better. Just keep doing that until it's as good as it's going to get. That I would imagine is the basis of the Machine Learning approach, though that probably also allows going outside a fixed algorithm.

More room for rules allows more whole words to be defined when needed so it probably tends more towards a dictionary look-up with only those not found having to be determined. That would likely be what the good systems do when having multi-megabyte data isn't a problem.
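
As a rough sketch of that "dictionary first, rules second" idea, and of letting the computer report which words still come out wrong, something like the following Python would do. The function names, the one-line stand-in for the rules and the respellings are all made up for illustration; none of it is taken from the NRL paper.

```python
def naive_rules(word):
    """Stand-in for a real letter-to-sound rule set (hypothetical)."""
    return word.replace("C", "K")

def pronounce(word, exceptions, rules):
    """Look the whole word up first; fall back to the rules if not found."""
    word = word.upper()
    if word in exceptions:
        return exceptions[word]
    return rules(word)

def failures(reference, exceptions, rules):
    """List the words whose output differs from the reference dictionary."""
    return sorted(word for word, wanted in reference.items()
                  if pronounce(word, exceptions, rules) != wanted)

# Illustrative reference pronunciations, not real dictionary data.
REFERENCE = {"CATS": "KATS", "TWOS": "TOOZ"}

print(failures(REFERENCE, {}, naive_rules))
print(failures(REFERENCE, {"TWOS": "TOOZ"}, naive_rules))
```

Each pass shows what still needs a new rule or a whole-word entry. Deciding automatically which rule to change is the hard part and is not attempted here.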

Back when NRL were doing their research the problem was how to do it without requiring huge amounts of memory.

It is tempting to try it with a limited set of rules to see how well a PICAXE could do it. The framework is pretty simple ...

Code:
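SetText( " Hello PICAXE ." )      ' leading space marks the start of the first word

charPtr = 1                       ' skip over the leading space
Do
  Select Case @charPtr
    Case "." : SerTxd( "Done" )   ' a full stop ends the text
               End
    Case " " : SerTxd( " " )      ' pass word separators straight through
               charPtr = charPtr + 1
    Else     : Gosub MatchThis    ' anything else has to match a rule
  End Select
Loop

MatchThis:
  rulePtr = 0
  Gosub CheckMatch                ' leaves matched = characters consumed, 0 if no match
  Do Until matched <> 0
    Gosub NextRule
    Gosub CheckMatch
  Loop
  Gosub ShowPhoneme
  charPtr = charPtr + matched     ' step past the letters the rule used up
  Return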
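
For anyone wanting to try the same framework off-chip first, here is a minimal Python sketch of the loop. The two rules and their outputs are invented purely for illustration and are not NRL rules. Rules are tried in order, first match wins, and a catch-all (an assumption added here) echoes the letter so the scan always moves forward.

```python
# Hypothetical mini rule table: (letters to match, output). NOT the NRL rules.
RULES = [
    ("HELLO", "HEH-LOW"),
    ("PICAXE", "PIK-AKS"),
]

def match_this(text, pos):
    """Return (output, characters consumed) for the first rule matching at pos."""
    for letters, output in RULES:
        if text.startswith(letters, pos):
            return output, len(letters)
    # No rule matched: echo the letter so the scan cannot get stuck.
    return text[pos], 1

def translate(text):
    """Scan the text: spaces pass through, '.' ends, the rest goes to the rules."""
    text = text.upper()
    out = []
    pos = 1  # position 0 holds the leading space
    while pos < len(text):
        ch = text[pos]
        if ch == ".":
            break
        if ch == " ":
            out.append(" ")
            pos += 1
        else:
            phonemes, used = match_this(text, pos)
            out.append(phonemes)
            pos += used
    return "".join(out).strip()

print(translate(" Hello PICAXE ."))
```

The catch-all matters: without a rule that always matches, the search for a matching rule would never finish on an unknown letter.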
 

Buzby

Senior Member
I'm sure a PICAXE could do this, but it will be quite a challenge.

One thing that will be needed for development is a tool to work with rules easily. Whatever form the rules are stored in within the PICAXE memory, they will be a pain to work with. Maybe have an Excel spreadsheet for editing rules, with a macro to output the rule table in a form that can be pasted into a PICAXE prog.

This would be really good, as it would mean PICAXE users in different countries could tweak the rules to suit their local accents, or even ( with lots of work ) different languages !. ( I noticed in the NRL document, pages 3 & 11, that the NRL work is based on previous work done at Keele University, but tweaked for American accents. )

This could be a cool project, if there was a ready supply of SP0256 chips to do the talking. Unfortunately, according to http://www.smbaker.com/counterfeitfakejustplainbad-sp0256a-al2-chips there appears to be a lot of rubbish on eBay.

Cheers,

Buzby
 

hippy

Technical Support
Staff member
I'm sure a PICAXE could do this, but it will be quite a challenge.
Depends how occupied one's mind is while out Saturday shopping :)

Tada! PICAXE Text-to-Speech. Ish.

Runs in the PE6 simulator so doesn't need a PICAXE to test. It doesn't implement the full NRL algorithm. Certainly not all their or anyone else's rules. Nor generate IPA. But it is enough to turn "Catacomb Combing Combs" into "KAT-A-KOOM KOME-ING KOMEZ".

The biggest problem is the rule size. I cannot even fit all the single letter rules into PICAXE internal EEPROM. Though storage could be optimised it wouldn't really gain much.

So Text-To-Speech turns out to be a five part problem -

1) Having an algorithm to turn text into phonemes, graphemes, or allophones.
2) Having the rules to do that.
3) Having something to implement that on.
4) Having something to speak those phonemes, graphemes, or allophones.
5) Doing that well.
 


Buzby

Senior Member
Depends how occupied one's mind is while out Saturday shopping :)
Well, I've been out for a nice Italian meal with the Mrs and lots of wine, and am now packing cases for our holiday tomorrow, so I think I can be excused for not producing such an impressive piece of code like wot you've done !.

I'm still looking at page 51 of NRL, trying to determine what each rule is actually saying. E.g. What is the difference between :
' [ARR] =/AX R/\'
and
'[ARR]=/AE R/\'

Anyway, before I went out, I did start looking into how to store the rules. Most rules can be stored in 7 or 8 bytes without too much complex compression, so one #slot should hold about 200 rules.

I'm concentrating on the rule storage format, as that will be one of the core elements of a PICAXE-TTS.

A PICAXE-TTS ?. Are we going to build one ?

Cheers,

Buzby
 

hippy

Technical Support
Staff member
E.g. What is the difference between :
' [ARR] =/AX R/\'
and
'[ARR]=/AE R/\'
Leading and trailing space, so "ARE" as a standalone word sounds different to when it is used within a word; "these ARE spAREs", "ah" versus "air", or AX versus AE as they'd have it.

A PICAXE-TTS ?. Are we going to build one ?
Good question. I'm tempted to steal the SAM rules and see what I can do with the PICAXE code ported to run on a PC.
 

Buzby

Senior Member
Leading and trailing space.
I thought there was no point in leading/trailing spaces, as each word is separated by a space, so a rule with no prefix means 'the start of a word'

I'm tempted to steal the SAM rules and see what I can do with the PICAXE code ported to run on a PC.
I looked at SAM, but couldn't get my head round it. I'm still more in tune with 1975 and NRL, and I want it to run on a PICAXE, not a PC.

Cheers,

Buzby
 

hippy

Technical Support
Staff member
I thought there was no point in leading/trailing spaces, as each word is separated by a space, so a rule with no prefix means 'the start of a word'
No, it's the other way round. The leading and trailing spaces match with those separators. Without any prefix or postfix in the rule they match anything starting from where the text ptr is.

So " [ARE] " matches only "<space>ARE<space>". "#[ARE]" matches any "<vowel>ARE", "[ARE]#" matches any "ARE<vowel>", while just a lonesome "[ARE]" matches "<any>ARE<any>".

The other way of thinking of it is the text ptr is at "ARE" in the buffer, the leading space in the rule checks there was a space before the text ptr ( ie, ARE is start of word ) and the trailing space in the rule checks there's a space after the ARE in the buffer ( ie, ARE is the end of the word ). With both, must be start and end of a word, it's a standalone word.

I must admit I didn't think it was clearly explained. It was the need for the leading space in the text buffer which made me realise what was going on. Twigging the rule prefix referred to what is prior to text ptr. It's effectively 'if this word has followed one of these; it changes to this' modifier.
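
To make the prefix and postfix checks concrete, here is a small Python sketch of how a rule such as " [ARE] " could be tested against the text buffer. It handles only literal characters, the space, and "#", which is taken here to mean one or more vowels; that reading, the vowel set and the function names are assumptions for illustration rather than anything lifted from the NRL listing.

```python
VOWELS = set("AEIOU")  # assumption: Y is left out to keep the sketch small

def match_context(pattern, text, pos, step):
    """Match a rule prefix or postfix, walking away from the bracketed letters.

    step is +1 for a postfix (reading right) or -1 for a prefix (reading left).
    """
    symbols = pattern if step == 1 else pattern[::-1]
    for sym in symbols:
        if sym == "#":  # one or more vowels
            if not (0 <= pos < len(text)) or text[pos] not in VOWELS:
                return False
            while 0 <= pos < len(text) and text[pos] in VOWELS:
                pos += step
        else:  # a space or a literal letter must match exactly
            if not (0 <= pos < len(text)) or text[pos] != sym:
                return False
            pos += step
    return True

def rule_matches(rule, text, pos):
    """rule is written in the bracketed style, e.g. ' [ARE] ' or '#[ARE]'."""
    prefix, rest = rule.split("[")
    body, postfix = rest.split("]")
    if not text.startswith(body, pos):
        return False
    return (match_context(prefix, text, pos - 1, -1)
            and match_context(postfix, text, pos + len(body), +1))

text = " THESE ARE SPARES "
print(rule_matches(" [ARE] ", text, text.index("ARE")))   # the standalone word
print(rule_matches(" [ARE] ", text, text.rindex("ARE")))  # inside SPARES
```

Note the buffer needs a space at both ends, otherwise the first and last words can never satisfy a leading or trailing space in a rule.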

Last para on "page 9" ( page 14 of PDF ) onwards shows how it works though that was as clear as mud on my first reading. And I'm not convinced "ratio" is pronounced "ray-show" rather than "ray-she-oh".
 

Buzby

Senior Member
Thanks for the clarification. Now I can see I've got the very first line of my Excel wrong !.

The space between ] and = in the first A.rule matters, it's not just whitespace.

( Whitespace ?. Have you seen the Whitespace programming language, where only tabs and spaces are valid, and all alphanumeric is ignored ?. Long gone now, it's only available on the Wayback Machine archive : http://web.archive.org/web/20150424165140/http://compsoc.dur.ac.uk/whitespace/index.php )

Too late tonight for me to do any more.

Cheers,

Buzby
 

hippy

Technical Support
Staff member
I ported the PICAXE code to Python and that runs well on my PC. I also found "Deep Throat" which also appears to be based on NRL. That has rules which are more easily extracted -

https://pypi.org/project/deep_throat

Try to ignore the single example some might consider offensive which the author appears to have decided must be included.

I am not sure if Deep Throat rules are in the right order for my algorithm and I have not yet implemented the rule modifier handling. It's not exactly clear to me what some rule modifiers even mean. For example ":" is defined as "Zero or more consonants". One could take that to mean "anything or nothing" but it surely doesn't. Does it mean "skip any consonants", "must be start/end of word or only consonants to start/end of word", or something else ?

For "#" being "One or more vowels", one can ask how that is any different to one vowel ? I'm guessing that means a vowel must follow and then consume all sequential vowels. There's some work to do yet.
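
One way to pin down what the modifiers could mean is to read them as regular-expression fragments. The Python sketch below does that for ":" and "#" only, on the assumption that ":" is "zero or more consonants" and "#" is "one or more vowels" in the regex sense; whether that is what the rule authors intended is exactly the open question, and counting Y as a vowel is a further assumption.

```python
import re

VOWELS = "AEIOUY"  # assumption: Y counted as a vowel
CONSONANTS = "BCDFGHJKLMNPQRSTVWXZ"

SYMBOLS = {
    ":": "[%s]*" % CONSONANTS,  # zero or more consonants
    "#": "[%s]+" % VOWELS,      # one or more vowels
}

def postfix_regex(postfix):
    """Turn a rule postfix such as ': ' into a compiled regular expression."""
    parts = [SYMBOLS.get(sym, re.escape(sym)) for sym in postfix]
    return re.compile("".join(parts))

def postfix_matches(postfix, text, pos):
    """True if the postfix pattern matches the text starting at pos."""
    return postfix_regex(postfix).match(text, pos) is not None

print(postfix_matches(":", " COMBING ", 4))   # ':' on its own
print(postfix_matches(": ", " COMBS ", 4))    # consonants up to end of word
print(postfix_matches(": ", " COMBING ", 4))  # a vowel gets in the way
```

Read this way, ":" alone accepts anything, so it only earns its keep when something follows it in the rule, such as a space ( "only consonants up to the end of the word" ). Likewise "#" differs from a single vowel only in how much text it swallows before the next symbol is tested.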

One other thing I did find is: don't copy and paste from that NRL document without checking the result. Quite often something other than what is visually shown will be pasted!
 

Buzby

Senior Member
Hi hippy,

Sorry, not been working on this for last few days, but have been thinking !.

The parts of the rules, such as '1 or more vowels', need to be evaluated in order to match the rule. The NRL rules use ten of these parts. If each possible test is assigned a bit in the rule, then the relevant tests can be called easily as each rule is decoded.

I envisage a memory structure something like '<prefix flags byte><string><postfix flags byte><allophone bytes>'.
The 'prefix byte' can also hold some length flags, so the algorithm can rapidly move the pointer to the next rule.
Even just looking at the 'A' rules shows that there are a lot of sub rules triggered by the prefix/postfix requirements, so optimising these is a must.
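
As a minimal sketch of that sort of layout, here is some Python which packs rules into bytes and walks the table. The exact layout is an assumption made up for illustration: a leading length byte, a prefix flags byte, a byte holding the two counts, the letters, a postfix flags byte, then the allophone codes. The bit assignments and the allophone numbers in the example are placeholders, not real values.

```python
# Hypothetical bit assignments, one bit per context test.
FLAG_BITS = {" ": 0x01, "#": 0x02, ":": 0x04, "%": 0x08}

def flags_for(context):
    """Fold the context symbols of one side of a rule into a flags byte."""
    flags = 0
    for sym in context:
        flags |= FLAG_BITS[sym]  # literal letters are not handled in this sketch
    return flags

def pack_rule(prefix, letters, postfix, allophones):
    """Pack as: length, prefix flags, counts, letters, postfix flags, allophones."""
    if len(letters) > 15 or len(allophones) > 15:
        raise ValueError("letters and allophones must fit in 4-bit counts")
    body = bytes([flags_for(prefix), (len(letters) << 4) | len(allophones)])
    body += letters.encode("ascii")
    body += bytes([flags_for(postfix)])
    body += bytes(allophones)
    return bytes([len(body) + 1]) + body

def iter_rules(table):
    """Walk a packed table, using each length byte to hop to the next rule."""
    pos = 0
    while pos < len(table):
        length = table[pos]
        yield table[pos:pos + length]
        pos += length

rule = pack_rule(" ", "ARE", " ", [1, 2])  # placeholder allophone codes
print(len(rule), rule.hex())
```

A plain bitmask cannot record the order of the tests or any literal letters in the context, so a real format would need more than this; the sketch is only meant to show the length byte letting the scanner skip rules quickly.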

On another tack, I've thought of how my 'broken' CTS256 can still play a part in this task.

Use a simple PICAXE prog to output strings to the CTS, and then the same PICAXE captures the allophone addresses from the CTS. This would be a useful tool to compare a 'real' CTS with the PICAXE-TTS.

I've not been able to see the rules in Deep Throat, my PC has stopped unzipping .gz files. Can you post them ?

Regarding 'copy & paste' from the NRL doc, I got just loads of rubbish. It looked like the function was using some kind of OCR which couldn't handle the poor scan quality. If I can't find a machine readable version I'll just have to type them all in !.

Cheers,

Buzby
 

hippy

Technical Support
Staff member
Sorry, not been working on this for last few days, but have been thinking !.
It is rather addictive isn't it.

I think I have got most of the rule handling figured out except for "%" being a suffix, "One of ER, E, ES, ED, ING, ELY". Deep Throat uses "EL" rather than "ELY". I figure that has an implied 'only applies when end of word' or "E<anything>" would match with "E", wouldn't need "ER" etc being in there, but haven't decided yet.

One thing I am going to do is substitute rule spaces with underscore because those invisible spaces do start to become a bit of a pain.

On another tack, I've thought of how my 'broken' CTS256 can still play a part in this task.

Use a simple PICAXE prog to output strings to the CTS, and then the same PICAXE captures the allophone addresses from the CTS. This would be a useful tool to compare a 'real' CTS with the PICAXE-TTS.
That is a really good idea.

I've not been able to see the rules in Deep Throat, my PC has stopped unzipping .gz files. Can you post them ?
No problem ...
 


hippy

Technical Support
Staff member
A PICAXE-TTS ?. Are we going to build one ?
As interesting, entertaining and educational as all this is and has been; I think the answer for me is no.

There appear to be two good text-to-speech solutions for the Raspberry Pi. The first is the 'pico2wave' utility, which I understand is based on Google's Pico TTS engine; it is standalone but not as good as the Pico TTS included with Android. The second is the 'gTTS' libraries, which use Google's on-line API; that is about as good as it gets but does require an on-line connection, and I recall access and usage is not unlimited.

As the PICAXE can easily interface to a Pi, through an AXE027 cable, or potentially its UART, it would seem to make more sense to use a Pi as a CTS256/SP0256 combo than try to create a CTS256 oneself. It's easier, simpler, not much more expensive and probably cheaper than the original chips are these days, and with better results.

A simple Python program should be able to read the serial, split that into sentences, and convert words like PICAXE to "pickaxe" so they get pronounced as desired, not as "pick-ix" for example. It can then concatenate the audio files generated and play the lot via the analogue output.

That's a whole lot easier than trying to figure out numerous rules and algorithms; let someone else do the heavy lifting. It might be possible to use the 'gTTS' on-line API when available and fall-back to 'pico2wave' when not which is perhaps the best of all worlds.
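
A minimal sketch of the text-handling half of that Python program might look like this. It only prepares the sentences and builds the 'pico2wave' command lines; reading the serial port, joining the audio files, playback and the 'gTTS' fall-back are left out. The respelling table, function names and file paths are made up for illustration, and the "-w" option is how I understand 'pico2wave' is told where to write its output, so check that against the installed version.

```python
import re

# Hypothetical respellings so words are spoken as intended.
RESPELL = {"PICAXE": "pickaxe"}

def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace, keeping the punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def respell(sentence):
    """Swap whole words found in RESPELL, ignoring case."""
    def swap(match):
        word = match.group(0)
        return RESPELL.get(word.upper(), word)
    return re.sub(r"[A-Za-z]+", swap, sentence)

def pico2wave_command(sentence, wav_path):
    """Build the command line; run it with subprocess.run(cmd, check=True)."""
    return ["pico2wave", "-w", wav_path, sentence]

def plan(text):
    """Return one pico2wave command per sentence of the incoming text."""
    return [pico2wave_command(respell(s), "/tmp/tts_%d.wav" % i)
            for i, s in enumerate(split_sentences(text))]

for cmd in plan("Hello. This is a PICAXE speaking!"):
    print(cmd)
```

Keeping the command building separate from actually running it means the text handling can be tried on any PC before moving to the Pi.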

I've included a short snippet of 'pico2wave' generated speech. Not as good as 'gTTS' but it isn't bad.
 


newplumber

Senior Member
Hello all

I finally have some time to read/learn (pretend to learn) from the best
(I've been busy pretending to know plumbing hoping people pay the bills)
also thanks hippy for the picaxe speech zip i will check it out sometime
and also thanks for the help coding my rgb clock which everyone that comes over thinks I'm smart but
little do they(my friends) know I use the best info backup called picaxe forum
so hopefully I will have a chance to gain more on text to speech when work slows down around here
your most expensive plumber friend
Mark
 