Emulating musical tones using cassette out - help!

kpalser
Posts: 80
Joined: Sun Jun 03, 2012 2:18 pm
Location: Dundee, Scotland

Post by kpalser »

Hi All,

I’m in the process of adding sound support to my ZX81 iOS emulator app so that it is compatible with programs like Kelly Murta's 2005 ZX81 Music Interpreter. The emulator has had AY support for about two years now. To be frank, there are too many holes in my understanding of how simply driving the ZX81 cassette out high or low ends up producing musical pitches.

So the purpose of this post is to organise my thoughts and see if you can fill in the gaps in my knowledge. The two other emulators I have been reviewing, EightyOne and SZ81, are both based on variants of the sound code source file by Ian Collier. From the comments in the file, it looks like the code was originally written for the Spectrum beeper.


Basics:
cassette out low = IN FEh
cassette out high = any OUT

I see that the EightyOne emulator has this condition on IN:
if (!(Address&1)) THEN sound_beeper(0);

Whereas SZ81 has:
if (IN FEh && !NMI_generator && VSYNC_state == 0) THEN sound_beeper(1);

Related questions:
1. It looks like the SZ81 implementation has the sound_beeper(0) / sound_beeper(1) calls inverted. But I don’t think this would have much impact on creating musical tones so long as the frequency (pitch) of the wave is correct. Would you concur?
2. For the cassette out low trigger, would it be more correct to say that the NMI generator also needs to be off? In addition to the SZ81 emulator, I have seen other documentation suggesting that it does.
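
As a rough illustration of the two trigger rules under discussion, here is a minimal sketch in C. The names `CassetteOut` and `cassette_io` are hypothetical and do not come from EightyOne or SZ81. Gating the IN on the NMI generator being off reflects the suspicion raised in question 2; it is an assumption here, not confirmed hardware behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the cassette-out level (not taken from any emulator). */
typedef struct {
    bool level_high;        /* current logic level driving cassette out */
    bool nmi_generator_on;  /* assumed: true while the NMI generator is enabled */
} CassetteOut;

/* Apply one I/O access and report whether the level actually flipped.
 * Assumed rules: any OUT drives the level high; an IN from a port with
 * address line A0 low (e.g. FEh) drives it low, but only while the NMI
 * generator is off. */
static bool cassette_io(CassetteOut *c, bool is_out, uint16_t port)
{
    bool new_level = c->level_high;

    if (is_out) {
        new_level = true;
    } else if ((port & 1) == 0 && !c->nmi_generator_on) {
        new_level = false;
    }

    bool changed = (new_level != c->level_high);
    c->level_high = new_level;
    return changed;
}
```

An emulator could call something like `sound_beeper()` only when `cassette_io()` returns true, which also makes repeated INs or repeated OUTs harmless.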


Spectrum Beeper / ZX81 Cassette out:
From the code and comments in Ian Collier's source file, it looks like once the level has been set low or high, the wave amplitude returns to a silent level. In most emulators of 8-bit systems the silent level is 0x80 (128); anything below this is the low state and anything above it is the high state. Then, as emulated cycle time passes (so long as there is no further low/high triggering), the amplitude is incremented or decremented until it returns to the 0x80 silent position.

Related questions:
3. If a ZX81 user program does not alternate between low and high triggers and instead performs a series of consecutive IN FEh (cassette low) operations with no cassette high triggers in between, will a real ZX81 keep resetting the amplitude to a low displacement? I ask because Ian Collier’s code seems to ignore consecutive low or high triggers and instead lets the amplitude return to the silent level based on when it was last flipped.
4. I assume that the number of cycles between low and high triggers is critical for determining the pitch. But since the rate at which the wave amplitude naturally returns to the silent level cannot be controlled, wouldn’t the waveform look more like the one in my attached figure if the high / low triggers are spaced closely together to control the pitch?
[Attached figure: IMG_1610.jpg]

Ian Collier’s code:
There are a few things that don’t make sense to me in the beeper code; I found the AY code far easier to follow. So I was wondering if you could shed some light on it. My own comments use the '//' prefix, as opposed to the existing '/*' enclosed comments. The code that I do not understand is enclosed between /* START ????? */ and /* END ????? */ markers (I cannot use text colour inside a code block to make it clearer).

void sound_beeper(int on)
{
        unsigned char *ptr;
        int newpos,subpos;
        int val,subval;
        int f;

	// straightforward - just set the amplitude above or below 128/0x80 (the halfway,
	// silence level) by the volume; emulator_beeper_volume could be in a range of 0 to 31.
        val = (on ? 128 + emulator_beeper_volume : 128 - emulator_beeper_volume);

	// see question three above - ignore a trigger that repeats the current level
        if (val == sound_oldval_orig) {
		return;
	}

	// frametstates = the number of emulated cycles executed in the current emulated 
	// video display frame.

	// sound_framesiz = sound_freq / 50, where sound_freq is the output sample rate of the
	// emulated device. It’s only 22050 for EightyOne but on modern iOS devices for my
	// emulator this would be 48000, i.e. 48.0 kHz

        // 65000 = the number of clock cycles that you would expect per video frame on a 50Hz 
        // PAL display system, i.e.
	// 3.25 MHz / 50 = 3250000 / 50 = 65000
	
	// The following locates the position in the data buffer for the current sound frame
	// based on the number of cycles that have passed. It is worth noting that in a tight loop
	// with closely spaced IN FEh or OUT triggers the same newpos value may be calculated
	// because one position in the sound buffer could cover a number of instructions being
	// executed by the CPU, e.g.:
	// 3.25 MHz = 3250000 so 3250000 / 48000 = 67.7 T-states can be executed for every
	// sound data point
        newpos = (frametstates * sound_framesiz) / 65000;

	
	/* START ????? */
	
	// this I really don’t understand - especially the significance of the beeper volume
	
        subpos = (frametstates * sound_framesiz * emulator_beeper_volume) / 65000 - emulator_beeper_volume * newpos;
        
        /* if we already wrote here, adjust the level. */
        if (newpos == sound_oldpos) {
                /* adjust it as if the rest of the sample period were all in
                 * the new state. (Often it will be, but if not, we'll fix
                 * it later by doing this again.)
                 */
                if (on) {
                        beeper_last_subpos += emulator_beeper_volume - subpos;
                } else {
                        beeper_last_subpos -= emulator_beeper_volume - subpos;
		}
        } else {
                beeper_last_subpos = (on ? emulator_beeper_volume - subpos:subpos);
	}

        // AMPL_BEEPER = 31
        subval = 128 - AMPL_BEEPER + beeper_last_subpos;
        
        /* END ????? */
        
        if (newpos >= 0) {

                // fill gap from previous position - this is easy enough to follow
                ptr = sound_buf + sound_fillpos;
                for (f=sound_fillpos; f<newpos && f <sound_framesiz; f++) {
			// BEEPER_OLDVAL_ADJUST is a simple macro that gradually returns sound_oldval back to
			// the silent level (0x80) as per the Spectrum Beeper / ZX81 Cassette out section above
                        BEEPER_OLDVAL_ADJUST;
                        *ptr++ = sound_oldval;
                }

                /* START ????? */

		// this again I really don’t get - surely all the above code to arrive at the subval
		// would only be applied to a single data position in the sound buffer and its importance
		// would be negligible in the long stream of thousands of sound data points that would
		// follow. I could understand if its value updated the sound_oldval because it would
		// impact all the subsequent sound points.
                if (newpos < sound_framesiz) {

                        /* newpos may be less than sound_fillpos, so... */
                        ptr = sound_buf + newpos;

                        /* limit subval in case of faded beeper level,
                         * to avoid slight spikes on ordinary tones.
                         */
                        if ((sound_oldval < 128 && subval < sound_oldval) || (sound_oldval >= 128 && subval > sound_oldval)) {
                        		subval=sound_oldval;
			}

                        /* write subsample value */
                        *ptr = subval;
                }
                
                /* END ????? */
        }

	// subval is not passed into the next phase of processing
        sound_oldpos = newpos;
        sound_fillpos = newpos + 1;
        sound_oldval = sound_oldval_orig = val;
}
Related questions:
5. What is the purpose of the code between /* START ????? */ and /* END ????? */?

Thanks in advance for any response to any of the above points.
Kevin
olofsen
Posts: 189
Joined: Wed Jan 08, 2014 12:29 pm

Post by olofsen »

Some quick thoughts:
  1. Indeed with IN the signal should go low, and with OUT high, which would suggest the sound_beeper argument in sz81 is reversed.
  2. And indeed INs (with an even port number) are prevented from causing VSYNCs while a program is running, that is, with NMIs on in SLOW mode.
  3. The IN/OUT works as a flip-flop, and multiple INs or OUTs would then seem to work as single ones, apart from timing.
  4. The waveform created by the beeper code is probably meant to reflect an RC network between the TTL signal and cassette OUT (a high-pass filter) rather than trying to have a certain and controllable pitch.
  5. If the signal is taken to decrease linearly (rather than exponentially) to the silence level, the beeper volume may perhaps be used to efficiently calculate the range of positions in the waveform where the level needs to be adjusted.
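
To make the idea of a linear return to the silence level concrete, here is a minimal sketch in C. The thread does not show the body of the `BEEPER_OLDVAL_ADJUST` macro, so `decay_toward_silence` and its `step` parameter are hypothetical stand-ins, and the real code may well use a different rate or shape.

```c
/* Silence (mid) level for unsigned 8-bit samples, as described in the thread. */
#define SILENCE_LEVEL 128

/* Move a sample value one linear step toward the silence level without
 * overshooting it. Hypothetical stand-in for a BEEPER_OLDVAL_ADJUST-style
 * decay; the step size is an assumption. */
static int decay_toward_silence(int value, int step)
{
    if (value > SILENCE_LEVEL) {
        value -= step;
        if (value < SILENCE_LEVEL) {
            value = SILENCE_LEVEL;
        }
    } else if (value < SILENCE_LEVEL) {
        value += step;
        if (value > SILENCE_LEVEL) {
            value = SILENCE_LEVEL;
        }
    }
    return value;
}
```

Calling this once per output sample while no IN or OUT arrives would give the straight-line fade that point 5 speculates about; an exponential fade would instead scale the distance from 128 by a constant factor each sample.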
1024MAK
Posts: 5103
Joined: Mon Sep 26, 2011 10:56 am
Location: Looking forward to summer in Somerset, UK...

Post by 1024MAK »

In the ZX Spectrum, the beeper output is a single digital output (from the ULA) controlling the speaker. For issue 3 and later boards, the ULA controls an NPN transistor, which in turn controls the speaker. Hence it is simply an ‘on’ / ‘off’ control. There is no mid-state. Sound waves are only formed if the program continually toggles the digital output. The time taken for one complete waveform cycle (LOW, HIGH, LOW) sets the frequency or pitch of the note.

With a ZX80 or ZX81, there is no dedicated digital output port for the mic output. Instead (to keep costs low) the sync signal is used. This comes from a flip-flop, the output of which can only be a TTL logic high or a logic low. Again, there is no mid-state. Again, sound waves are only formed if the program continually toggles the flip-flop output. The time between changes of state of the flip-flop output sets the frequency or pitch of the note. The NMI generator needs to be switched off.
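
The relationship between toggle spacing and pitch can be sketched as follows. This is only an illustration, assuming an evenly toggled square wave and the 3.25 MHz CPU clock quoted earlier in the thread; `tone_frequency_hz` is a hypothetical helper name.

```c
/* CPU clock quoted in the thread: 3.25 MHz. */
#define ZX81_CLOCK_HZ 3250000.0

/* Frequency of a square wave whose level flips every
 * `tstates_between_toggles` T-states. One full cycle needs two flips
 * (low-to-high and high-to-low), hence the factor of 2.
 * Returns 0.0 for a non-positive spacing. */
static double tone_frequency_hz(double tstates_between_toggles)
{
    if (tstates_between_toggles <= 0.0) {
        return 0.0;
    }
    return ZX81_CLOCK_HZ / (2.0 * tstates_between_toggles);
}
```

For example, flipping the level every 3250 T-states gives a 500 Hz tone under these assumptions.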

To get the SYNC signal to go low (and hence the mic output to go low), the CPU has to read an I/O port where address line A0 is low (so IN 0xFE to avoid affecting other I/O hardware). To get the SYNC signal to go high (and hence the mic output to go high), the CPU has to write to any output port (so OUT 0xFF to avoid affecting other I/O hardware).

Now, between the SYNC TTL signal and the mic socket, there is a filter network of resistors and capacitors. These both round off the sharp edges of the TTL signal as it changes state (high to low or low to high) and act as a crude frequency filter. This is why the emulator code includes a time delay before it goes to a ‘silent’ level.

Does this help?

Mark
ZX81 Variations
ZX81 Chip Pin-outs
ZX81 Video Transistor Buffer Amp

:!: Standby alert :!:
There are four lights!
Step up to red alert. Sir, are you absolutely sure? It does mean changing the bulb :!:
Looking forward to summer later in the year.

Post by kpalser »

Hi Erik & Mark,

Thank you both for responding. I see things much more clearly now.
olofsen wrote: Mon Jan 20, 2020 11:52 pm Some quick thoughts:
5. If the signal is taken to decrease linearly (rather than exponentially) to the silence level, the beeper volume may perhaps be used to efficiently calculate the range of positions in the waveform where the level needs to be adjusted.
Answers 1 to 3 were clear and helped confirm that my understanding is correct. For 4, I'd need to sit down and review the implications. For 5, that is interesting, but there is still one aspect of the beeper code that is troubling me - see the end of this reply.

1024MAK wrote: Tue Jan 21, 2020 12:50 am With a ZX80 or ZX81, there is no dedicated digital output port for the mic output. Instead (to keep costs low) the sync signal is used. This comes from a flip-flop. The output of which can only be a TTL logic level high or a logic low. Again, there is no mid-state. Again, sound waves are only formed if the program continually toggles the flip-flop output. The time delay between the output of the flip-flop changing state gives the frequency or pitch of the note. The NMI generator needs to be switched off.
I get that the ZX81 only sets a high or low state. So which of the following would it be more accurate to say for the real non-emulation world?:
  1. The silent mid-state is only ever reached in passing, during transitions between the high and low states, and the signal would then naturally fall back to a silent state. OR
  2. The sound wave would eventually fall back to the silent mid-state if there are no further high or low state alternations (i.e. all INs and OUTs were to cease).
1024MAK wrote: Tue Jan 21, 2020 12:50 am Does this help?
My remaining question is about the sound_beeper(int) function. The strangest aspect is that a large portion of its code is concerned with calculating subval, yet subval is never fed back into the variables used to fill the sound data positions as the wave amplitude returns to the silent mid-state value. In fact it only seems to be used once per high / low state change, at the newpos point. I would understand if its value were then used as the basis for the rest of the intermediate fill data, but it seems quite disconnected from BEEPER_OLDVAL_ADJUST and val / sound_oldval.

One leading sound data amplitude point amongst hundreds of others does not make much sense. Or am I reading the code incorrectly?

Thanks again,
Kevin

Post by 1024MAK »

In a real ZX80 or ZX81, it does not matter if the digital output ends up at a logic high or a logic low. Due to the series wired ‘coupling’ capacitor (which blocks DC voltage levels and attenuates low frequency AC signals) and the parallel connected 1k resistor, the output at the mic socket will always change towards the 0V / ground level (silence as far as audio systems are concerned) shortly after any activity on the digital output stops. By activity, I mean a low to high or a high to low.

Also due to this ‘coupling’ capacitor, when the digital output IS toggling, it changes the signal from a logic low - logic high - logic low - logic high - logic low etc... signal, to an AC waveform that goes to a positive voltage, then it passes through 0V / ground to a negative voltage, then again through 0V to positive, etc...
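
The effect of a series coupling capacitor on a one-bit signal can be approximated with a first-order high-pass filter. The sketch below is an illustration only: the capacitor value is not given in the thread, so the time constant passed to `highpass_coeff` is a placeholder, and both function names are hypothetical.

```c
#include <stddef.h>

/* Coefficient of a discrete one-pole RC high-pass filter:
 * a = RC / (RC + dt), where dt is the sample period.
 * rc_seconds is an assumed time constant, not a measured ZX81 value. */
static double highpass_coeff(double rc_seconds, double sample_rate_hz)
{
    double dt = 1.0 / sample_rate_hz;
    return rc_seconds / (rc_seconds + dt);
}

/* Filter a stream of logic levels (0 or non-zero) into a signal centred
 * on zero: y[n] = a * (y[n-1] + x[n] - x[n-1]).
 * The filter starts settled, so a constant input produces silence. */
static void highpass_1bit(const int *levels, double *out, size_t n, double a)
{
    double prev_x = (n > 0 && levels[0]) ? 1.0 : 0.0;
    double prev_y = 0.0;

    for (size_t i = 0; i < n; i++) {
        double x = levels[i] ? 1.0 : 0.0;
        double y = a * (prev_y + x - prev_x);
        out[i] = y;
        prev_x = x;
        prev_y = y;
    }
}
```

A rising edge produces a positive spike that decays toward zero, a falling edge produces a negative one, and a steady level of either polarity fades to silence, which matches the behaviour described above. Note that this decay is exponential rather than linear.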

I haven’t got a clue what is going on with that code (I’m more of a hardware guy), so can’t really help there. But the mic output is closer to a one (single) bit DAC than a conventional sound chip. But with the limitation of the capacitors and resistors forming a filter.

The audio output level will depend on the output frequency (due to the filter). It may be possible that some clever programmers use a high frequency pulse width modulation (PWM) to affect the volume level. The parallel capacitor across the mic connector would demodulate (remove) the high frequency component leaving an audio waveform. Longer high frequency pulses giving a higher volume compared to shorter high frequency pulses, but without changing the perceived audio frequency (pitch). However, code to do this would use up lots and lots of CPU time.

Mark

Post by kpalser »

Hi Mark,

All very well described. Thank you.
1024MAK wrote: Tue Jan 21, 2020 10:12 am In a real ZX80 or ZX81, it does not matter if the digital output ends up at a logic high or a logic low ... the output at the mic socket will always change towards the 0V / ground level (silence as far as audio systems are concerned) shortly after any activity on the digital output stops. By activity, I mean a low to high or a high to low.
So my key takeaway from your reply is that the current emulation approach of returning to a silent state after activity has a real-world justification (with the accuracy of the modelled rate being another subject for debate).
1024MAK wrote: Tue Jan 21, 2020 10:12 am ... However, code to do this would use up lots and lots of CPU time.
Which is part of the reason I don’t want to include the extra questionable code I’ve highlighted. Though, at this stage after 2 weeks of contemplating it on and off I’d appreciate anyone else’s appraisal before continuing.

Thanks for your input on the theory side.

Kevin

Post by olofsen »

Indeed, the code is used only for the first value in the sound buffer after an IN or OUT. The "newpos" and "subpos" variables are calculated like an integer division and a remainder. So I guess the "subpos" variable smooths things a bit by not taking the initial value itself, but taking into account where the IN/OUT occurred within the sound sampling interval. Furthermore, if multiple INs or OUTs occur, only the first one matters. But if an IN and an OUT occur within the same sample, this may be taken into account by adjusting "beeper_last_subpos", which is a global variable. So perhaps this makes the beeper sound a little bit better, and perhaps mainly if the sampling rate of the sound is low.
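
The quotient-and-remainder reading of "newpos" and "subpos" can be checked with a small sketch. `BeeperPos` and `beeper_position` are hypothetical names; the arithmetic is intended to match the two expressions in the posted listing for non-negative inputs, using a 64-bit intermediate to stay clear of overflow.

```c
/* T-states per 50 Hz frame, as used in the posted listing. */
#define TSTATES_PER_FRAME 65000LL

typedef struct {
    int newpos;  /* index of the sample in the frame buffer (quotient) */
    int subpos;  /* position inside that sample, scaled to 0..volume-1 */
} BeeperPos;

/* Split the frame time into a whole sample index and a fractional part.
 * With scaled = frametstates * framesiz:
 *   newpos = scaled / 65000
 *   subpos = (scaled % 65000) * volume / 65000
 * which equals (scaled * volume) / 65000 - volume * newpos. */
static BeeperPos beeper_position(int frametstates, int framesiz, int volume)
{
    long long scaled = (long long)frametstates * framesiz;
    BeeperPos p;

    p.newpos = (int)(scaled / TSTATES_PER_FRAME);
    p.subpos = (int)((scaled % TSTATES_PER_FRAME) * volume / TSTATES_PER_FRAME);
    return p;
}
```

For instance, with 1000 T-states elapsed, 960 samples per frame (48000 / 50) and a volume of 31, the toggle lands in sample 14 and roughly 23/31 of the way through it. That fraction is what lets the single sample at newpos sit between the old and new levels instead of snapping to one of them.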

Post by kpalser »

olofsen wrote: Thu Jan 23, 2020 7:52 am Indeed, the code is used only for the first value in the sound buffer after an IN or OUT. The "newpos" and "subpos" variables are calculated like an integer division and a remainder. So I guess the "subpos" variable smoothes things a bit by not taking the initial value itself, but taking into account where the IN/OUT occurred within the sound sampling times. Furthermore, if multiple INs or OUTs occur, only the first one matters. But if an IN and OUT occur at the same sample, this may be taken into account by adjusting "beeper_last_subpos" which is a global variable.
Thank you for taking the time to also review the code. You make some interesting points.
olofsen wrote: Thu Jan 23, 2020 7:52 am So perhaps this makes the beeper sound a little bit better, and perhaps mainly if the sampling rate of the sound is low.
I ran a trace on my emulator using Kelly Murta’s demo program, and two triggers only landed on the same newpos position a couple of times a second at most, sometimes with a few seconds between occurrences. I think the sampling rate, as you say, is quite important, and my emulator follows the standard output rate of the device, up to 48 kHz on recent iOS devices. So 3.25 MHz / 48 kHz = 3250000 / 48000 = 67.7 CPU cycles per sound amplitude data position, which is quite a small window in which to fit IN and OUT instructions with any separation.
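
The window size quoted above, and how it changes with the output rate, can be reproduced with a one-line helper. `tstates_per_sample` is a hypothetical name; the 3.25 MHz clock and the 48000 Hz and 22050 Hz rates are the figures mentioned in this thread.

```c
/* Number of CPU T-states that elapse during one output sample.
 * Returns 0.0 if the sample rate is not positive. */
static double tstates_per_sample(double cpu_hz, double sample_rate_hz)
{
    if (sample_rate_hz <= 0.0) {
        return 0.0;
    }
    return cpu_hz / sample_rate_hz;
}
```

At 48000 Hz this gives about 67.7 T-states per sample, while at the 22050 Hz rate mentioned for EightyOne it gives about 147.4, so two triggers are more than twice as likely to share a sample at the lower rate. That fits the suggestion that the sub-sample adjustment matters mainly at low sampling rates.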