A double-word by single-word division subroutine (32 bits / 16 bits)

AllyCat · Jun 17, 2012

PICAXE Basic doesn't support ANY double-word (32-bit) arithmetic but it does offer two separate multiplication operators (* and **) which can be combined to form a double-word result. Subroutines for 32-bit Addition and Subtraction are simple enough (if you can't write those then you probably shouldn't be attempting 32-bit maths on a PICAXE) but a request for a 32-bit by 16-bit division routine made an interesting challenge for several "serious" programmers in this thread. I've previously posted a simplified ("31 / 15 bits") version here, which should meet most needs, but this extended version accepts a full 16-bit Divisor and can deliver a 32-bit Result (and 16-bit Remainder). The general aim has been to minimise the memory used, with a reasonably understandable program structure, so details common to both are not repeated here.

PICAXE Basic splits its Multiplication and Division processes into two separate operations (generating either a Result or a Remainder in the case of Division) but this isn't sensible for a (much slower) subroutine because the calculations are almost identical. However, there is some merit in performing individual calculations for "High" and "Low" word outputs (rather as done with PICAXE multiplication) but in this case they must be executed in the correct sequence (High then Low). Below are four routines (or really just one subroutine with four entry points), two for the High or Low word results individually (div3216H and div3216L), one to always use both in sequence (div3216A) and one which checks the input data (div3216C) and uses the appropriate path. The latter returns an Error Code (zero in pass) for invalid data, but as the only possible error is Division by Zero, the calling program could just as easily directly test the Divisor before execution.

Calculation of the High-Result word is trivial (it's just the High-Numerator / Low-Divisor in Basic) and the High-Remainder is automatically zero (because the Divisor is a single word). So the routine is extended slightly to prepare the Numerator for the Low-words routine, where most of the computation takes place. This latter routine shifts the Numerator and Result (in a shared double-word register) repeatedly leftwards, subtracting the Divisor when possible. A single byte variable counts the shifts and uses its top bit as a "Carry/Borrow flag" extension of the Numerator. On return, the High-Numerator is replaced by the Remainder and the Low-Numerator by the Low-Result words.

The "High" and "Low" routines individually need 7 bytes of RAM, but the Low-Numerator would normally be stored before executing the High routine (although it is not required for this calculation) and the High-Result will generally need to be stored until after the Low routine has been executed. So in practice 9 bytes of RAM are required, or 13 bytes if the original Numerator must be retained. The subroutine uses about 80 bytes of Program Memory and executes in a fairly consistent 150ms with a 4MHz clock.

Code:

#picaxe 20m2					; And most others
#no_data
; AllyCat 17/06/12

Symbol NUMERATOR = 2109876543 			; TEST DATA	
Symbol DIVISOR  = 10				; Maximum 16 bits (i.e. < 65536)
Symbol numerhi  = NUMERATOR / $10000		; NB: Program Editor (v5.5.1) Maths
Symbol numerlo  = NUMERATOR & $0FFFF		; is only valid up to 31 bits 	
Symbol divsor   = DIVISOR  & $0FFFF			; (i.e. maximum 2,147,483,647)
symbol topbit   = $8000				; "Weight" of carry bit when left-shifting

symbol reshi = w4 				; Result High word
symbol numhi = w3				; Numerator High, Returns Remainder (Low)
symbol numlo = w2				; Numerator Low, Retuns Result Low
symbol divis = w1				; Divisor, Unchanged on return
symbol pass  = b0				; Loop counter, Carry flag and Error report

divtest:	
	divis = divsor			; Divisor (maximum 16 bits)
	numhi = numerhi  			; Numerator High -> Remainder on exit
	numlo = numerlo 			; Numerator Low -> Result Low
	gosub	div3216C			; Check data and divide 32 by 16 bits
	if pass = 0 then
		sertxd ("Cannot divide by zero",cr,lf) 
		do : loop	
		endif
	sertxd ("Result Hi=",#reshi," Lo=",#numlo)
	sertxd (cr,lf,"Remainder=",#numhi)		; Value NOT stored beyond here
	sertxd (cr,lf,"Dec.Result=")
	numhi = reshi				; Move Result back to Numerator 
	gosub showascii				; Now Show the Result in ASCII
	do : loop					; (~145 bytes to here)

div3216H:	; Calculate High Result word for 32 by 16 bits division
; numhi = Numerator High. Returns with Numerator High ready for "Low Word" calculation 
; divis = Divisor;  reshi = High Result (High Remainder will be zero)
; numlo = Numerator Low (optional, not required for "High" calculation)

	reshi = numhi / divis * divis			; Divisible part of High byte 
	numhi = numhi - reshi			; Subtract to give numerator for Low calc
	reshi = reshi / divis				; High Result (Low not done yet)
done:
	return	
	
div3216C:	; Check input data and select entry point for 32 by 16 bits division
		; Return immediately with pass = 0 if divisor is zero

	pass = 0					; Pre-set the error code	
	if divis = 0 then done			; Can't divide by zero
	reshi = 0					; Pre-load zero High Result
	if divis > numhi then div3216L		; Jump to use 16-bit Result routine...
						; else fall through for full calculation
							
div3216A:	; Force a full 32 by 16 bit division (regardless of input data)

	gosub div3216H				; Calculate High word Result, then .. 
						; fall through to calculate Low words
							
div3216L:	; Calculate Low Result and Low Remainder of 32 by 16 bits division
; numhi = Numerator High word -> Remainder on return
; numlo = Numerator Low word -> Result (Low word) on return
; divis = Divisor, unchanged on return.  pass = 0 returned if input data error
	 
	for pass = 0 to 15			; Repeat for each bit position
	if numhi >= topbit then 		; Carry from top bit
		pass = pass OR 128	; Carry flag
		endif	
	numhi = numhi + numhi		; Start shifting numerator left
	if numlo >= topbit then 		; Carry bit
		numhi = numhi + 1		; Add in the carry
		endif
	numlo = numlo + numlo		; End of Shifting
	if numhi < divis AND pass < 128 then nosub	; Jump if can't subtract
	numhi = numhi - divis		; Subtract divisor from (current) numerator
	numlo = numlo + 1			; Add digit into result
	pass = pass AND 127		; Cancel the "subtract" flag
nosub:		
	next pass
	return		; (~80 bytes)

The subroutines are preceded by a short test harness to display the High- and Low-Result words and Remainder. However, for additional testing and demonstration it also calls the following subroutine (showascii) which employs the division routines twice more (so the total execution time is around 500ms at 4MHz) to convert the original (32-bit) result into a decimal number (up to 10 digits) and then display it as a formatted string of ASCII characters. A particular feature is that the final division forces use of the "Low-words-only" routine (since the Result is now one millionth of the original Numerator) to avoid over-writing the High-Result register (being used to store digits which cannot yet be sent for display). Also the dec3ascii routine does NOT use the PICAXE bintoascii, since the customised code is more efficient in memory usage (Program and RAM). It sends the digits with leading zeros suppressed and a comma ',' as a thousands separator. Note that this conversion routine corrupts (over-writes) all the previously calculated values so they may need to be copied/stored first.

Code:

symbol diblo = b2			; Low byte of divisor (used by ascii routine)
symbol dibhi = b3			; High byte of divisor (used by ascii routine)
			
showascii:	; Send Long Word (32 bits in numhi:numlo) as ASCII decimals
; NB: Registers divis, numlo, numhi, reshi and pass are ALL corrupted on return. 
	divis = 1000 			; Start by removing the final three digits
	gosub div3216C			; Returns Result in reshi, Remainder in numhi
	divis = numhi			; Shuffle the words for the next division
	numhi = reshi			; Result into Numerator ready for next division
	reshi = divis			; Remainder (Final 3 digits to send) now in reshi
	divis = 1000 			; Prepare to remove another three digits
	gosub div3216L			; Middle 3 digits in numhi, First 4 digits in numlo	
	pass = numlo / 1000		; First digit to be sent
	numlo = numlo // 1000		; Next 3 digits
	if pass > 0 then
		sertxd(#pass)		; Show billions digit
		endif
	gosub dec3ascii			; Show millions digits
	numlo = numhi
	gosub dec3ascii			; Show thousands digits
	numlo = reshi
	gosub dec3ascii			; Show units digits
	if pass = 0 then		; Result was zero so send that
		sertxd("0")	
		endif	
	return		; (~75 bytes)
	
dec3ascii:	 ; Send numlo as 3 digits with ',' and leading zeros (when appropriate)
	if pass > 0 then			; Flag indicates some digits have been sent
		dibhi = numlo /100 // 10	; Hundreds digit   )  Uses less
		diblo = numlo / 10 // 10	; Tens digit         )- memory than
		numlo = numlo // 10		; Units digit       ) BINTOASCII
		sertxd(",",#dibhi,#diblo,#numlo)	;  Send comma and leading zeros
	else if numlo > 0 then
		sertxd (#numlo)			; Send without leading zeros
		pass = 1			; Set the send-zeros flag
		endif
	return		; (~45 bytes)

AllyCat · Jul 29, 2012

32-bit Addition and Subtraction subroutines.

Hi,

As requested in another thread, here are some similar double-word (32-bit) Addition and Subtraction subroutines, since their implementation is perhaps not quite as "trivial" as I implied above. I've appended them to this thread because I doubt if they will be used except with a 32-bit division and/or decimal conversion routine. But IMHO PICaxe is still not really very suitable for 32-bit maths; for example, a single Long-Word calculation (e.g. LW3 = LW2 + LW1) requires 12 bytes, or almost half of all the directly accessible memory registers in most PICaxes.

In view of the limited memory, I set two restrictions for myself; firstly not to use any additional registers or flags in the subroutines and secondly to allow the Result to "overlay" one of the input Long Words (e.g. LW1 = LW2 + LW1). Thus the subroutines can use just 8 bytes if required.

The need for additional registers or flags can be avoided by putting the "core" calculation into a single line. Due to PICaxe's (lack of) operator precedence, it is necessary to calculate the Carry flag first, and then add the two input variables. Using hippy's formula for the Carry, the calculation of the High Word (for LW3 = LW2 + LW1) becomes:

LW3Hi = LW3Lo max LW1Lo - LW1Lo max 1 + LW2Hi + LW1Hi

The Carry is calculated from the (Low) Result Word and one of the Input Words, so it is not possible to overlay the Result onto that particular input word (so LW1 = LW2 + LW1 cannot be used). But addition is commutative so the variables can be simply rearranged, i.e. use LW2 = LW1 + LW2.

A similar routine can be created for subtraction, but with the need to invert the Borrow flag and incorporate the twos-complement of 1 (I couldn't find a better way) then the "one line" code becomes rather bloated and obscure. So I adopted a "conditional" structure (in simple words, an IF statement). IMHO this format is more intuitive, a "Borrow" is necessary (only) if the second variable (the subtrahend) is larger than the first (the minuend). In this case the Borrow is calculated directly from the two input variables, so the Result can overlay either one of these if required (i.e. LW1 = LW2 - LW1 and LW2 = LW2 - LW1 are both permissible).

Finally, I applied the conditional format back to the addition routine. It may be marginally larger and slower, but is more intuitive; an overflow (Carry) must have occurred if the Result word is smaller than (either) one of the input variables.

So, here are the three subroutines, preceded by a test and demonstration harness. Note that I developed and tested these routines simply using byte variables (because all the testing can then employ the PE's Word variables) and only increased the variables to full words (and thus the complete values to Double-Words) for the final listing.

Code:

; Double-Word (32-bits) Addition and Subtraction subroutines
; AllyCat, London, 29/7/2012

#picaxe 20m2		; And most others
#no_data		; Don't load EEPROM

#define notest		; Use {no}test for {double-}word calculation

#ifdef test
; TEST CONFIGURATION (using byte variables)
symbol LW3Hi = b7		; Word 3 (result) Hi byte
symbol LW3Lo = b6		; Word 3 (result) Lo byte
symbol LW2Hi = b5		; Word 2 Hi byte
symbol LW2Lo = b4		; Word 2 Lo byte	
symbol LW1Hi = b3		; Word 1 Hi byte
symbol LW1Lo = b2		; Word 1 Lo byte

	w1 = 255
	w2 = 257
#else	
symbol LW3Hi = w7		; Long-Word 3 (result) Hi word
symbol LW3Lo = w6		; Lond-Word 3 (result) Lo word
symbol LW2Hi = w5		; Lond-Word 2 Hi word
symbol LW2Lo = w4		; Lond-Word 2 Lo word	
symbol LW1Hi = w3		; Long-Word 1 Hi word
symbol LW1Lo = w2		; Long-Word 1 Lo word

	LW1Hi = 0
	LW1Lo = 65535	
	LW2Hi = 1
	LW2Lo = 1
#endif

demo:
	gosub add32			; 32 bit Add using Carry flag
	sertxd("Flag Add ")
	gosub report
	gosub add32a			; 32 bit Add using Conditional test
	sertxd("Cond.Add ")
	gosub report
	gosub sub32			; 32 bit Subtract
	sertxd ("Subtract ")
	gosub report
	do : loop
	
;SUBROUTINES

report:
	#ifdef test
		sertxd(#w3,cr,lf)
	#else
		sertxd("HiWord= ",#LW3Hi," LoWord= ",#LW3Lo,cr,lf)
	#endif
	return

add32:      ; Calculate LW3 = LW2 + LW1 (LW3 may overlay LW2, but not LW1)
	LW3Lo = LW2Lo + LW1Lo	; Next include the Carry using Hippy's snippet:
	LW3Hi = LW3Lo max LW1Lo - LW1Lo max 1    + LW2Hi + LW1Hi
	return	; (~16 bytes)
	
add32a:     ; Calculate LW3 = LW2 + LW1 (LW3 may overlay LW1, but not LW2)
	LW3Lo = LW2Lo + LW1Lo
	LW3Hi = LW2Hi + LW1Hi	; Next Carry if result < component: 
	if LW3Lo < LW2Lo then inc LW3Hi endif ; Carry from low word
	return 	; (~19 bytes)
	
sub32:      ; Calculate LW3 = LW2 - LW1 (LW3 may overlay LW1 or LW2)	
	LW3Hi = LW2Hi - LW1Hi	; Next Borrow if minuend < subtrahend:
	if LW2Lo < LW1Lo then dec LW3Hi endif   ; Borrow from low word 
	LW3Lo = LW2Lo - LW1Lo
	return 	; (~18 bytes)

Now I wonder if anyone will find a valid application for them.

Cheers, Alan.

srnet · Jul 31, 2012

Well the division one sure was useful, it worked rather well in the lost model locator.

The completed device has application for lost model locators and for use in balloons. I have been asked if I would make and sell the devices, as in the RC model world there is not really any similar device that does the same, has such a long range, and at such low cost. And as its PICAXE based anyone can customize to their own needs.

A double-word by single-word division subroutine (32 bits / 16 bits)

AllyCat

Senior Member

AllyCat

Senior Member

srnet

Senior Member