A Higher Resolution Division Subroutine

AllyCat

Senior Member
Hi,

This is the first of a family of subroutines to calculate some more advanced mathematical operations (such as trignometric functions) that are not supported by M2 chips, and/or to calculate values to a higher resolution than is supported by the X2 chips. I posted this "double-word" division routine several years ago as a Code Snippett (where more details can be found), but have now revised it to include some error-handling capabilities and options to select the resolution and scaling of the Result, etc..

All these subroutines (will) use a consistent variable-naming convention that I have adopted for most of my programs now: b1 and w1 are used as "Temporary" or "Local" variables and to pass parameters to and from subroutines. They are named explicitly, rather than by symbol declarations, to reduce the risk of accidentally using a register for more than one purpose at the same time. Also, they may be subdivided into their individual bit or byte components when this is useful.

The routines are optimised to have a minimum "footprint", in terms of the number of variable/RAM bytes required (usually a total of 7 bytes). Also, to keep the program code relatively compact, some (fairly minor) restrictions are placed on the range of numerical values which may be employed. This division routine (and most others to be described) needs only two additional Word variables symbolised as WX and WY. These names are not particularly relevant to a division process, but most of the other subroutines relate to graphical or trigonomtric functions, where X and Y axes are employed.

In this division subroutine, WX is allocated to the High (or eXtension) word of the Numerator and then returns the Remainder. WY is the Divisor which is returned unchanged, unless the "Auto-Scaling" facility has been applied. WX and WY can be allocated to any PICaxe word variable, (i.e. from w2 upwards) but here, they are defined as W2 and W3 so that all the subroutine variables appear on the top line of the simulator RAM/variable table.

Below is a typical listing where the Numerator (up to 31 bits) is stored in "WX:W1 and the Divisor (up to 15 bits) in WY. The Result (up to 16 bits) is returned in W1, the Remainder in WX and the Divisor in WY. PICaxe Basic doesn't support a "Carry" flag, so the "top" bit of some words is used for this purpose, which is the reason why the maximum Numerator value is limited to 31 bits and the Divisor normally to 15 bits. However, this fits well with its application to signed numbers: Typically the (signed) input values are made positive (i.e. the sign bits are removed and negative values complemented) before starting the division process. Then the appropriate sign is reintroduced after the (unsigned) division procedure.

Code:
; Double-Word by single-word division subroutine with (optional) range-checking and auto-scaling
; AllyCat April 2017    Basic subroutine 30 - 65 bytes  (depending on the validity testing)

#picaxe 08m2        						; And probably any other
#no_data								; No lookup data required for this application

#define DDSCALE10						; Autoscale Mode (0=No check,1=Not scaled,else DDSCALE)
symbol DDSCALE = 10						; Autoscale step value (divisor multiplier)
symbol RESBITS = 16 * 16					; Number of bits in result * 16
symbol DDFLAGS = 256 + %0000				; Double-word Division Flags (256 to complement RESBITS)
symbol DDINIT = DDFLAGS - RESBITS			; Preload value for loop counter
symbol DDLIM = 65535 / DDSCALE				; Maximum divisor value that can accept auto-scalin
symbol flags = b0							; Available for Global Flags and Test values
symbol index = b1							; Multiply loop counter and optional flags (bit8 - bit11)
symbol tempb = b1						; Temporary (local) and parameter-passing byte 
symbol tempw = w1						; Numerator low word and returned Result
symbol wx 	= w2							; Numerator High word and Remainder from Division
symbol wy  	= w3							; Divisor, unchanged on return unless auto-scaled 
symbol carry = bit31						; Top bit of w1

test:
	w1 = 0							; Numerator Low word
	wx = 100							; Numerator High word
	wy = 100							; Divisor
	sertxd(cr,lf,#wx,":",#w1," / ")		
	call ddivi
	sertxd(#wy," = ",#w1," e",#b1," R=",#wx)
end

ddivi:				; Double-word division subroutine with initialisation and overflow checking
	b1 = DDINIT						; Set the number of Result bits and any flags
ddiv0:								; Double-word division with test for zero divisor
	if wy = 0 then divzero					; Optional, Otherwise Result = 65535 (or 2^number of loops)
ddivM:								; Double-word division with test for register overflows
	if wx > 32767 then numover				; Not required if converted from twos complement value
	if wy > 32767 then divover				; Not required if converted from twos complement value
ddivs:
#ifdef DDSCALE1
	if wx => wy then overflow				; Result will be more than 16 bits
#else
#ifNdef DDSCALE0						; Note: NOT Defined as zero
	do while wx => wy					; Divisor is too small so need to re-scale
 		if wx > DDLIM then overflow			; Scaling will overflow the register	
		wy = wy * DDSCALE				; Scale divisor up
		inc b1						; Optional (scaling is also shown by change in wy)		
	loop
#endif	
#endif

ddiv:									; Basic double-word division routine WX:W1/WY	
	for b1 = b1 to 255 step 16
		wx = wx + wx + carry				; Shift Numerator left (top bit lost,carry from w1)
		w1 = w1 + w1					; Shift Num/Result left (low bit = 0)
		if wx => wy then					; Can subtract divisor
			wx = wx - wy
			inc w1					; Set lowest bit in result register
		endif
	next b1							; Exit with Result=W1,Remainder=WX,Divisor=WY,
done:
return
; Error reports:	
divzero:
	sertxd(" Division by Zero ")
	goto done
numover:
	sertxd(" Numerator too Large ")
	goto done
divover:
	sertxd(" Divisor too Large ")
	goto done	
overflow:
	sertxd(" Result too Large ")
	goto done
The complete subroutine includes several additional entry points which check for some data errors (if required) but the conditional compilation is a little "clunky" because it is compatible with PE5.

Firstly, division by zero can be checked, but this could just as easily be performed in the main (calling) program. Also, as with PICaxe Basic, division by zero doesn't produce a "catastrophic" result, it simply fills the result word (or byte) with binary '1's (i.e. the largest available value). Next, the numerator can be rejected if it exceeds 31 bits (because it would overflow on the first left-shift). Then, if the Result would exceed 16 bits, the subroutine can automatically introduce a scaling factor to keep the Result within range. Conditional compilation (using DDSCALE..) allows this Result Overflow test to be omitted, or it can jump to an error label, or the Result may be auto-scaled by a power of 2 or 10 (ten).

The number of applications of the scaling factor can be recorded, for example in the lower flag bits of b1. But this may not be needed because on exit, the divisor will have been scaled to the actual value used (otherwise, the divisor is returned unchanged). With most microcontrollers, a scaling factor of 2 is the obvious choice but with PICaxe, a division by 10 (ten) is as easy/fast and produces a Result that may be more useful to an end-user. However, a disadvantage of multiplying the divisor by 10 is that the scaling (and thus some loss of resolution) must be introduced for any divisor greater than 3162 (rather than 16383 for the binary option), to prevent the divisor itself overflowing.

The main loop uses a STEP 16 to determine the number of iterations (i.e. possible binary subtractions), so the four lowest bits are ignored. Thus b1 may carry up to four flags through (or created within) the subroutine, in addition to defining the number of iterations for the core calculation (which must be at least as large as the number of bits in the RESULT).

Therefore, if the Result is known to fit into say 8 bits (as is the case in an interpolation routine to be described later) then only 8 iterations are required and b1 would be initialised with $80 (or %1000xxxx). Otherwise, b0 would be initialised with zero (plus any flag bits) for a maximum 16 bit Result, or for example $60 for a maximum 10-bit Result. On exit from the FOR loop, the high nibble of b1 contains zero, but if the number of iterations was less than 16, then the Result is scaled down by a factor of 2, for each loop that was omitted.

More subroutines are planned to follow in due course. ;)

Cheers, Alan.
 
Top