Dividing a 32 bit number by a 16 bit one???

I have a tipping bucket ranguage and need to convert tips to rain in mm . Here is the start of my code::



; Now calibrate the rain pulses from tipping bucket and converts them to cumulative rain mm x 0.1
; each tip is around 6.25 cc and we get around 160 pulses per litre so pulses represents around 0.28 mm of rain
;
the input pulse count is between 0 and 3000
the output of rainfall is in range 0 to 10000 with 1/ 0th of a mm resolution

;Formula::: note: rain (in mm x 0.1) = rain_pulsecount*10000/(rain_pulsesperlitre*rain_mlpercm) x 10
;
let w1 = rain_pulsesperlitre * rain_mlpercm ; this fits into 16 bit word (around about 35040)
let w2 = rain_pulsecount ** 10000 ; needs 32 bit, so store high word here
let w3 = rain_pulsecount * 10000 ; needs 32 bit, so store low word here

I now need to divide the 32 bit number held in w2 (High) and W3 ( low) by the 16 bit number in w1 but despite my eforts cant semm to get my head around doing it. Can anyone help with the code to do this please
 

hippy

Technical Support
Staff member
I don't have any clever maths to do it, but you can do it by repeated subtraction and counting how many done. That can take a while to process though.

There may be some other way to do things so you don't need to do complex maths. For example, if each pulse represents 0.28 mm of rain simply keep a running total, add 0.28 to that every pulse.

Maybe a more detailed specification of what results you actually need will help.
 
Thanks BeanieBots and hippy for your code and suggestions

I am still hoping that someone will come up with a smart and compact long division mechanism that will to divide a 32 bit number by a 16 bit word where the values are set such that the answer will never be more that a 16 bit word. I suspect it will not be too difficult using the existing picaxe divide function with a carry process between words.. kind of the reverse of multiply which exits with a 32 buit number ...... but for the life of me with all the other things cramming my brain right now nothing is emerging!
 

marks

Senior Member
Hi Peter Goldsbury,
just alittle confused from your formula
hopefully this is all your after 1 pulse = 0.28mm

w1 = rain_pulses * 14 / 5

ie. 1 pulse = 2 (0.2mm)
3572 pulses = 10001 (1000.1mm)
 

AllyCat

Senior Member
Hi Peter,

As the rain sensor is probably only accurate to around 5 - 10% you could just scale all of your data down to single words. ;) But:

There are no doubt many "elegant" solutions, but a normal "Long Division" as you'd do on paper (I hope) is quite easy in binary. I've just tried some simple code using bytes, to keep testing simple (using a word to represent your 32 bit field can use PICaxe's normal conversions), but it should work fine with word variables instead of bytes. The lack of direct support for overflow/carry flags with PICaxe basic is rather a pain, so a "cheat" I've done is to restrict the denominator to one bit less i.e. 127 (32767 for a word). You could probably "bodge" the top bit with extra code, or just extend the code to 3 words/2 words as required. The subroutine takes about 50 bytes of code.

Code:
#picaxe 20m2
; Divide w1 (b3,b2) by b4 using "Long Division"
; result will be in b5
main:
	w1 = 30000		; Test Numerator
	b4 = 125			; Test Denominator (; b4 < 128)
	gosub divide
	sertxd (#b5,cr,lf)		; Print result
	goto main
divide:
	b5 = 0			; Clear result register
	for b1 = 0 to 8		; Repeat for each bit position (16 for words)
	b5 = b5 + b5		; Shift result left
	if b3 < b4 then shift            ; Jump if can't subtract
	b5 = b5 + 1		; Store the databit 
	b3 = b3 - b4		; Subtract the denominator
shift:
	b3 = b3 + b3		; Shift numerator high left
	if b2 > 127 then 		; "Carry" bit (use 32767 if a word)
	b3 = b3 + 1	             ; Add carry to high byte
	endif
	b2 = b2 + b2		; Shift numerator low left
	next b1
	return
Cheers, Alan.
 
Last edited:
Thanks Alan and Co

I will try to see if I can get your code working with 16 bit words when I can find some spare time. He 15 bit linitqation to get the carry would not be much of a problem. For our application we keep a cumulative pulse count of up to a16 bit which we transmit by one way wireless regualy, then process it hourly at the other end. We are using 18M2 doing a lot of work so we dont have the word registers free the Q16 code

Any other suggestions for somthing smart fast and compacrt would be appreciated please I was trying to use native picaxe to divive the high word first to give athe quotient vai / and the remainder using // then process on through the lower word, but the logic as beyond me. If this divide function could be available, it seems picaxe divide maths would extent to 16 bit integer resolution rather than 8, which I suspect many would appreciate.
 

AllyCat

Senior Member
Hi Peter,

Yes I looked at Jeremy's excellent documentation with interest and have filed it away for another day, but think it might be rather resource hungry for your specific requirement.

I've been tidying up my above code, using word variables and a few extra lines to emulate an overflow/carry flag for the numerator. So the subroutine now supports full 32/16 bits, amounts to 52 bytes and should run on any PICaxe; I'll post the modified version if you wish. However, the speed is a little disappointing, it typically executes about 180 lines taking around 150ms (at 4 MHz). Given that most of the instructions are quite elementary (almost native assembler) using "variable + variable" rather than " *2 " to shift left, etc., I was hoping for better than 0.5ms per instruction. It currently uses 10 bytes of RAM, but one is only as the flag.

I've been wondering if it's possible to use much the same code structure at the byte level rather than the bit level (just as one uses decimal digits for normal long division), obviously moving bytes between registers rather than the "shift lefts". However, I think I've tried that before and encountered complications, but maybe only with overflow/underflow checking, which (I assume) we're ignoring here.

Cheers, Alan.
 
Hi Alan

We are using 18m2 at 32 Mhz so can probably get away with the time and resister use for your new 32 bit version if you dont mind posting it please

Your suggestiom to move from bit bashing (base2) to bytes ( base256) and maybe even words ( base 65535) to do the long division might have some merit ( given that we would constrain the values so that the answer always fits into a word to prevent overflow) I tried to think how to do this starting by dividing the high word by the divisor (/) which should almost always give a result of zero and maybe a remainder (//) which if not zero will always be less than the divisor. Then I got stuck as I could not see how to carry the remainder into the long division calculation into do the second word.? Does what you did with the byte base get around this somehow?

We have other school projects underway that need this expanded PICAXE maths function so are interested in finding an elegant solution.
http://www.whirinakirainforest.info/ecosystem_services_value/Rainforest research/forest sensor/Forest Environment Monitoring Station.doc


Thanks again for your help, its much appreciated , Peter
 

AllyCat

Senior Member
Hi Peter,

Thinking more about the repeated use of "partial" divisions (byte, word or any other base), I now rather doubt that it's a viable solution. It's probably easiest to consider the denary equivalent: we want to divide say ABCD by EF but only have the capability of AB by EF. Sometimes it can work, for example:

6543 / 21 starts with 65/21 = 3, remainder 2 ; 24/21 = 1, remainder 3 ; 33/21 = 1, remainder 12, so the final answer is 311, remainder 12.

However, there are plenty of examples where it doesn't work, for example:

9876 / 54 starts 98/54 = 1, remainder 44 ; but now we need 447/54 and we don't have the routine to handle the three digits of 447.

Even the simple binary bit bashing needs to handle an extra digit (i.e. 17 bits) but this can be bodged as we know that if a digit has reached the 17th (overflow) position then it's only necessary to (always) perform a single subtraction of the denominator.

Therefore, for the task you originally stated, the code below is perhaps the easiest solution. However, for more mathematically intensive processes there may be other/better solutions. But it has to be accepted that one of the weaknesses of the PICaxe system is that number-crunching is not its forte.

Code:
#picaxe 20m2        ; Or any other
#no_data
; Divide w3,w2 by w4 using "Long Division", result will be in w5
symbol numlo = w2
symbol numhi = w3
symbol numtop = b7		; Highest byte of numerator
symbol denom = w4
symbol result = w5
symbol carry = b0                  ; Only a single bit flag required (used as zero / non-zero)
symbol pass  = b1
main:	
	numlo = 40000 * 60000	; Test Numerator Low word
	numhi = 40000 ** 60000	; Test Numerator Hi word
	denom = 50000		; Test Denominator
	gosub divide
	sertxd (#result,cr,lf)	; Print result
	goto main
divide:				; 52 bytes in subroutine
	result = 0			; Clear result register
	carry = 0			; Clear carry "flag"
	for pass = 0 to 16	; Repeat for each bit position
	result = result + result         ; Shift left (do BEFORE the final bit is added)
	if numhi < denom & carry = 0 then shift ;  Jump if can't subtract
	result = result + 1	             ; Store the databit 
	numhi = numhi - denom	; Subtract the denominator
shift:
	carry = numtop & $80	; Carry flag from numerator top bit
	numhi = numhi + numhi	; Shift numerator high left
	if numlo > $7FFF then 	; Carry bit from low word
	      numhi = numhi + 1	; Add carry to high word
	      endif
	numlo = numlo + numlo	; Shift numerator low left
	next pass
	return
Note that the "overflow/carry" uses a full byte, but obviously a single bit flag would be better if other parts of the program also employ flags.

Cheers, Alan.
 
Last edited:

hippy

Technical Support
Staff member
Thinking more about the repeated use of "partial" divisions (byte, word or any other base), I now rather doubt that it's a viable solution.
That was generally my conclusion. Consider dividing 16-bits by 8-bits where only 8-bit by 8-bit division is supported -

12345 / 100 = 123

12345 = $3039, $30 in msb, $39 in lsb.

msb = 48, lsb = 57, divisor = 100

You can't go anywhere because the divisor is greater than msb and greater than lsb, any division result is always zero and remainders are the same as the numbers you started with.

However, PICAXE does allow 16-bit by 8-bit divide ( and 16-bit by 16-bit ) and there are three overlapping groups of 16-bits in a 32-bit number. You can create a 32-bit result from a 16-bit times 16-bit multiplication by multiplying only 8-bit values and getting 16-bit results, so it may be possible to somehow reverse that process.
 

womai

Senior Member
Try this one. I've tested it with several different combinations, but you should check it out further before you fully trust it.

It will work for 15-bit divisors, i.e. for a/b where a is a 32-bit number and b is a 16 bit number up to 32767. The result can be full 32 bit (if you don't need that then you can remove all the lines that handle result_hi).

Code:
' Long Divide
'
' Divides a 32-bit number by a 16-bit number
' Result is 32 bit
'
' 32-bit number gets modified, so you should save it before calculation
' if you still need it after the division
'
' IMPORTANT: As it is now, results are only correct of divisors up to 0x7fff = 32767,
'            so it's more a 32 bit divided by a 15-bit number

symbol val32_lo = w0
symbol val32_hi = w1
symbol val16 = w2
symbol result_lo = w3
symbol result_hi = w4
symbol dummy = w5

symbol i = b12

' test value = 12345678 (dez.)
'val32_hi = 0xbc
'val32_lo = 0x614e

val32_hi = 0x0001
val32_lo = 0x0000

val16 = 0xffff

result_lo = 0
result_hi = 0

for i = 0 to 16

    dummy = val32_hi / val16
    
    result_hi = result_hi * 2
    if result_lo >= 0x8000 then ' most significant bit is 1
        result_hi = result_hi + 1 ' bit15 becomes carry from LSB
    endif
    result_lo = result_lo * 2 + dummy
    
    ' calculate remainder
    dummy = dummy * val16
    val32_hi = val32_hi - dummy
    
    ' shift 32 bit number left by 1 bit
    val32_hi = val32_hi * 2
    if val32_lo >= 0x8000 then ' most significant bit is 1
        val32_hi = val32_hi + 1 ' bit15 becomes carry from LSB
    endif
    val32_lo = val32_lo * 2
    
next
 

hippy

Technical Support
Staff member
One handy thing with SYMBOL commands is that they can do 32-bit maths, so can be used to determine the two 16-bit most significant and least significant 16-bit words of a value. Saves a lot of time going to Calculator and converting decimal to hex ...

Code:
Symbol VALUE     = 12345678

Symbol VALUE_MSW = VALUE / $10000
Symbol VALUE_LSW = VALUE & $0FFFF

w1 = VALUE_MSW
w0 = VALUE_LSW
 
Thanks folks, it seems your collective wisdom is getting us somewhere . When this is sorted perhaps Rev Ed might like to consider putting the logic of this fuction into native code for the future to improve PICAXE maths ability and range and simplify life for people like me?
 
Hi Alan,

I have tried putting your code up on an 18M2 and get a syntax error on the following line.
if numhi < denom & carry = 0 then shift ; Jump if can't subtract
Did that work for you or was what you posted an older undebugged version - if so what should it be please?
 

AllyCat

Senior Member
Hi Peter,

I originally wrote (and tested) using "and" instead of the ampersand "&" (which I understood from the manual was an alternative) in two of the lines. After posting, I thought the "&" might be clearer, tested it in one line and it still worked, so edited the post. But for some reason the Program Editor is happy with the second instance but not the first. :confused: So change the two "&"s to "and"s and it should be fine.

I know you only originally requested a single word result, but womai's version with a two word result does overcome the possibility of undetected overflow errors (and makes the routine more "universal"). I wonder if our two versions can be combined (although his does employ two more bytes of RAM)?

Cheers, Alan.
 
Last edited:

womai

Senior Member
If you only need the low word of the result, then here's the code stripped down:

Code:
symbol val32_lo = w0
symbol val32_hi = w1
symbol val16 = w2
symbol result_lo = w3
symbol dummy = w5

symbol i = b12

test value = 12345678 (dez.)
val32_hi = 0xbc
val32_lo = 0x614e

val16 = 1000

result_lo = 0
result_hi = 0

for i = 0 to 16

    dummy = val32_hi / val16
    result_lo = result_lo * 2 + dummy
    
    ' calculate remainder
    dummy = dummy * val16
    val32_hi = val32_hi - dummy
    
    ' shift 32 bit number left by 1 bit
    val32_hi = val32_hi * 2
    if val32_lo >= 0x8000 then ' most significant bit is 1
        val32_hi = val32_hi + 1 ' bit15 becomes carry from LSB
    endif
    val32_lo = val32_lo * 2
    
next
 

hippy

Technical Support
Staff member
When this is sorted perhaps Rev Ed might like to consider putting the logic of this fuction into native code for the future to improve PICAXE maths ability and range and simplify life for people like me?
If we can come up with a simple sequential solution ( no looping ) execution at the PICAXE program speed should not be that slow.

The problem with a long looping solution implemented in firmware is that it would need the space, plus internal variables, and it's a rarely used requirement.

I originally wrote (and tested) using "and" instead of the ampersand "&" (which I understood from the manual was an alternative) in two of the lines.
In a LET assignment & and AND are synonymous ( bit-wise operators ), but in IF and DO-LOOP the AND is a boolean/logical operator, and only boolean/logical and comparison operators are allowed. Allowing AND in assignments is more convenience where allowing & in IF could potentially be confusing.

I don't know if this may help or inspire anyone, but here's the code to calculate a 32-bit result from a 16-bit x 16-bit multiplication, "result = lhs * rhs", using only word and byte values.

The 'lhs' variable is taken as two bytes '$aabb' and 'rhs' as '$ccdd' and from that there are four multiplications which are summed together to give a result ...

----BBDD = --bb * --dd
--BBCC-- = --bb * cc--
--AADD-- = aa-- * --dd
AACC---- = aa-- * cc--

The code runs on a PICAXE or in the simulator ...

Code:
#Picaxe 08M2
#No_Data

Symbol lhs            = w0
Symbol lhs.lsb        = b0
Symbol lhs.msb        = b1

Symbol rhs            = w1
Symbol rhs.lsb        = b2
Symbol rhs.msb        = b3

Symbol result.lsw     = w2
Symbol result.lsw.lsb = b4
Symbol result.lsw.msb = b5

Symbol result.msw     = w3
Symbol result.msw.lsb = b6
Symbol result.msw.msb = b7

Symbol temp           = w4
Symbol temp.lsb       = b8
Symbol temp.msb       = b9

' Some values to multiply together

lhs            = 12345                       ' aabb
rhs            = 54321                       ' ccdd

' Do it the hard way, result = lhs * rhs

temp           = lhs.lsb * rhs.lsb           ' ----BBDD
result.msw     = 0
result.lsw     = temp

temp           = lhs.lsb * rhs.msb           ' --BBCC--
result.msw     = result.msw     + temp.msb
temp           = result.lsw.msb + temp.lsb
result.lsw.msb =                  temp.lsb
result.msw     = result.msw     + temp.msb

temp           = lhs.msb * rhs.lsb           ' --AADD--
result.msw     = result.msw     + temp.msb
temp           = result.lsw.msb + temp.lsb
result.lsw.msb =                  temp.lsb
result.msw     = result.msw     + temp.msb

temp           = lhs.msb * rhs.msb           ' AACC----
result.msw     = result.msw     + temp

SerTxd( "Hard", 9, #result.msw, 9, #result.lsw, CR, LF )

' Do it the easy way

result.msw     = lhs ** rhs
result.lsw     = lhs *  rhs

SerTxd( "Easy", 9, #result.msw, 9, #result.lsw, CR, LF )

Do:Loop
 
ALLAN, I tried your code in my run time environment hoping it would just work , but got some unpredictable results which I have no time to look at just now. (You may like to just check to see that putting a divisor of 0 works OK with your code if you have not already done so?.) I have a practice of renaming the registers to make software configuration more flexible and easily followed (see example below) , so perhaps PICAXE does not allow us to nest assignment under the variable symbols you define for the divide routine, in which case I might need to point these back to the absolute register values (bo, b1 w3 etc) ( HIPPY - is that the case??

Thanks HIPPY for the mind trigger that shows how effectively the picaxe word*word = 32 bit result does that task in firmware. ( And also letting us know about 32 bit editor symbols which point ialong th same track) If only someone had thought to do the compliment for 32 /16 divide in firmware, but I quess in the old days memory space was absolutely premium.

However I am sure with the brains of ALAN, WOMAI , Q Maths and others an optimal picaxe subroutine can be configured and agreed on to give an immediate solution. I agree Alan that it would be safer to include the 23 bit result and wondered if it could be better returned in the 32 bit (twin words) used to hold the 32 bit number to be divided, as that might allow better chaining of maths functions?

If so, it would be great if this subroutine function was included in the Variable - Mathmatic section of the manual please with a couple of of good application examples as it takes a lot of time to get your head around maths with what is there now. We are using PICAXE in schools where its great for nurturing numeracy, so limiting to integers and having students need to think about signs, magnitude of answers, overflow etc may actually be a teaching benefit. For me I have found the range limits and resolution of the current 16/8 divide are a massive hinderence and very confusing for most even when trying to do very simple things.

I am certain if this impediment of PICAXE could be "skirted around" somehow, someone would later rethink about porting it to firmware to speed it up by orders of magnitude as doing things at the PICAXE Basic level is a waste of time and energy (as well as power) in a environment that is already snails pace. I guess here must already be internal storage allocated for the 32 bit internal accumulator etc for multiply anyway.

The far bigger memory and speed of the M2 chips means that posiible applications are extending far beyond the simple electronic input output programming space of the past into applications that probe deeper into applied science where its necessary for example to do things like scaling or calibrating ADC10. Obviously the floating point processor chip is available to extend beyond the above enhancements when needed.

Thanks so much for your help, I could never in my wildest dreams have writen this code myself.

Code:
;*********************************************************************************************************
;Register use (28 bytes  available  b0 - b27)    (note ;b0 = bit7:bit6:bit5:bit4:bit3:bit2:bit1:bit0)

;*********************************************************************************************************


symbol pointer = w0				;pointer for indirect addressing 
;symbol bptr = bO					;variable memory  or external
		symbol pointerl = b0		;low byte to point to bottom 256
		symbol pointerh = b1		;high byte to point to larger memory
 	symbol statscap_start = bit23		; set if its time up to collect stats
			
		symbol flags24up = b3			; status flags that get transmitted by the station 
		
		
;symbol flags = w1 
		symbol flags16up = b2   		;single bit flags 16-31 

			symbol waitsunset = bit16		; set on startup to wait for sunset	
			symbol rain_pulseprev = bit17		; prevoius state of tipping bucket 	      
			symbol wind_pulseprev = bit18 	; previous state of windflow impellor
			symbol logging_title = bit19	 	; set to inset title line in logging download
			symbol alarm_delay = bit20		; set to hold up alarm state TX for 1-2 hours
			symbol button_prev = bit21		; used to record prev button scan state
			symbol button_sequence = bit22	; set to indicate input sequence started
			symbol flag_restart = bit24		; set on startup or reboot
			symbol flag_night = bit25		; set to indicate local time is on night 
			symbol error_clocksync = bit26	; set if the local clock is out of sync 
			symbol error_spare = bit27		;  
					
			symbol mode_transmit = bit28		; set if remote transmission on
			symbol mode_logging = bit29		; set if local logging on 
			symbol mode_statistics = bit30	; set if local statistics collection on
			symbol mode_button = bit31		; set if buttons presssed locally (stays for about 1 hour)
			
symbol ww6 = w2					; two byte registers for global use (wb1-wb2)
		symbol wb1=b4			; (can use as a word ww6 if really needed)
		symbol wb2=b5
		
symbol ww5 = w3					; two byte registers for local use (wb3-wb4)
		symbol wb3=b6			; (can use as a word ww5 if needed)
		symbol wb4=b7


symbol ww1 = w4					; 4 words of working registers (ww1-ww4)			
		symbol ww1l=b8			; (used to pass parameters to maths suroutines etc)
		symbol ww1h=b9
		
symbol ww2 = w5
		symbol ww2l=b10
		symbol ww2h=b11
		
symbol ww3 = w6
		symbol ww3l=b12
		symbol ww3h=b13
		
symbol ww4 = w7
		symbol ww4l=b14
		symbol ww4h=b15		


		 
;symbol tba = w8					
		symbol button_counter = b16	; high = first bcd (command) digit. low = 2nd BCD (data) digit	 
		symbol button_timer = b17	; this is incremented every scan to time button presses	
		
symbol rain_pulsecount = w9			; rain pulse count resister
		symbol rain_pulsecountl = b18		;  
		symbol rain_pulsecounth = b19
		
symbol wind_pulsecount = w10			; wind pulse count register
		symbol wind_pulsecountl = b20		;  
		symbol wind_pulsecounth = b21
		
symbol tba = w11				;	
		symbol time_tweek	= b22		; tweek factor to adjust hour timing re  sunrise-sunset
		symbol time_sunset = b23	; daily sunset time hrs/10  to sync clock (0 at startup)
				
symbol daytimer = w12				; rough hour timing from mid sunset-sunrise 
		symbol time_day = b24			; pointer to days since last data / reset;  
		symbol time_hour = b25			; pointer to current hour / 10  ( hours and decimal)
		
symbol looptimer = w13
		symbol looptimerl = b26		; loop timing mainly with processor asleep to save power 	; 
		symbol looptimerh = b27		;
;
 
Last edited by a moderator:

westaust55

Moderator
I am certain if this impediment of PICAXE could be "skirted around" somehow, someone would later rethink about porting it to firmware to speed it up by orders of magnitude as doing things at the PICAXE Basic level is a waste of time and energy (as well as power) in a environment that is already snails pace. I guess here must already be internal storage allocated for the 32 bit internal accumulator etc for multiply anyway.

The far bigger memory and speed of the M2 chips means that posiible applications are extending far beyond the simple electronic input output programming space of the past into applications that probe deeper into applied science where its necessary for example to do things like scaling or calibrating ADC10. Obviously the floating point processor chip is available to extend beyond the above enhancements when needed.
Please note that the PIC chips (PIC12xxxx some earlier and PIC16xxxx current ) used for the PICAXE range of microcontrollers are in fact only 8-bit chips in terms of data width.
See the Mircochip website. As a simple diagram see: http://www.microchip.com/stellent/idcplg?IdcService=SS_GET_PAGE&nodeId=2551

If you look at the datasheet for say a 14M2 (http://ww1.microchip.com/downloads/en/DeviceDoc/41440B.pdf section 29) you will see that the instructions are for bit and byte manipulation with only add and subtract math functions so Rev Ed already have to impliment multiply and divide plus 16-bit math in firmware.
 

westaust55

Moderator
Here is my attempt at a PICAXE routine for a 32-bit dividend and and 16 bit divisor.
The quotient (answer) can be a 32 bit (2 word number if the divisor is very small) but usually the entire answer will be in the low word of the quotient.
The Remainder/modulus is also available.
The routine uses the lowest 14 bytes of the PICAXE varaibles. Be careful in changing to other variables than those used here as some of the consituent bits of word variables w1 and w0 are used to simplify the maths.

Seems to be working correctly with limited test example done in the PE (V5.5.1) simulator.
Code size including sample dividend and divisor values is 147 bytes

No error checking such as divisor = 0 - leave it to those interested to include that "safety net".

Code:
SYMBOL DivrHiWord	= w1	; high word of 32 bit dividend
SYMBOL DivrLoWord	= w0	; low  word of 32 bit dividend
SYMBOL Divisor	= w2
SYMBOL QuotHiWord	= w4
SYMBOL QuotLoWord	= w3
SYMBOL Leftover	= w5 	; remainder word to allow for up to 17 bits  - starting with  "Rem" gives poor PE syntax results
SYMBOL Counter	= b12	; just a loop counter
SYMBOL Temp		= b13
Init:
	Leftover = 0
	Counter = 32

; Test case values here
; dividend = 456789
	DivrHiWord = 6
	DivrLoWord = 63573
; divisor = 123
	Divisor = 123


Main:
	DO
		Leftover = Leftover * 2 + bit31
		DivrHiWord = DivrHiWord * 2 + bit15
		DivrLoWord = DivrLoWord * 2
		
		
		QuotHiWord = QuotHiWord * 2
		Temp = QuotLoWord / $8000
		IF Temp = 1 THEN
			INC QuotHiWord
		ENDIF
		QuotLoWord = QuotLoWord * 2

	
		IF Leftover >= divisor THEN
			INC QuotLoWord
			Leftover = Leftover - divisor
		ENDIF

		DEC Counter
	LOOP Until Counter = 0	

	SERTXD ("Quotient/Answer = ", #QuotHiWord,"  ",#QuotLoWord, cr,lf)
	SERTXD ("The Remainder   = ", #Leftover, cr,lf)
	PAUSE 1000	
	DO
	LOOP
 

AllyCat

Senior Member
You may like to just check to see that putting a divisor of 0 works OK with your code if you have not already done so?.
Hi Peter,

There is at most only one valid test for a divisor of zero and that's 0/0 My code (and womai's) gives a result of zero because the numerator is zero. The naive might expect a result of 1 but any (or no) answer is valid for 0/0. All other numerators should gnererate an out-of-range value and as I said earlier my code has no range testing. In general, PICaxe has no error reporting facility so there's little point in testing input validity, it is the responsibility of the programmer to perform range testing. Therefore, I've modified my header a little to read:
Code:
; Divide w3,w2 by w4 using "Long Division", w3 should be < w4
; result is in w5, remainder in w2, w3 = 0
As indicated by westaust, PICaxe basic does not support 16-bit by 16-bit multiplication so I certainly wouldn't expect native 32-bit divison. What PICaxe gives is a partial result for word * word, i.e.either a "high" word or "low" word. If the programmer chooses to use both * and ** then he can concatenate the results to a "32-bit" result, but that is his responsibility. Maybe you consider that to be nit-picking, however the point is that the PICaxe commands are of the form:

word (assigment) = word (function) word , with the input words remaining unchanged.

That is very different from a "32 bit" division with 3 "input" words and perhaps as many (assigned) output words (result, remainder). IMHO the best you can expect is a documented subroutine, but even defining what that should do is far from trivial.

Actually, PICaxe basic doesn't even fully support 16-bit addition and subtraction, because there is no carry/overflow/borrow/error flag. That is where I personally would like to see PICaxe basic extended since IMHO the carry flag is so fundamental to microcontroller maths. But perhaps the point is that PICaxe basic is intended to "hide" the underlying micro structure, and also hardly supports bit/flag operations at all.

I believe the original request was for a (maximum) 16-bit result, but in principle I do quite like the idea of a 32-bit result as offered by womai and westaust. However, I believe that womai's code takes about 30% longer to execute (probably because of the / and * instructions) and I suspect (not tested) westaut's is longer still because of the 32 iterations. Both also use more RAM.

So, if we were writing a "specification" for a "32-bit divison" subroutine and agree that it's to be a 32-bit numerator divided by a 16 (or 15) bit divisor, we still have to decide whether the result should be 16 or 32 bits and whether the input variable values should be retained, corrupted or maybe overwritten by result data. Normally, the "remainder" will be left in the numerator registers, so it must be decided whether to complicate the subroutine in restoring the input values or swapping around result and remainder data, etc. (which all takes time/space and may be of no value to some/many users).

I'm intrigued that womai's routine achieves a 32 bit result with 16-17 iterations and have considered attempting to create a "hybrid" routine, but am concerned by the additional resources required. The main use I can see is for converting a 32-bit number to a denary (decimal) value by repeatedly dividing by 10, but the execution time and the need to reverse the order of the 9 or so result digits before "printing" makes me wonder about its usefulness.

Cheers, Alan.
 

hippy

Technical Support
Staff member
I don't know if this helps at all but it's the long hand way to binary divide.

Say we want to divide 1000 by 3.

We shift the divisor left until it's the largest number less than the quotient, keeping track of by how much we have shifted it by.

Code:
0 -> 3
1 -> 6
2 -> 12
3 -> 24
4 -> 48
5 -> 96
6 -> 192
7 -> 384
8 -> 768
We then compare and subtract, setting bits in the result as we go which reflect how much we shifted the original divisor by.

Code:
8 : 1000 >= 768 -> Yes, Set result 2[sup]8[/sup], 1000-768 = 232
7 : 232  >= 384 -> No
6 : 232  >= 192 -> Yes, Set result 2[sup]6[/sup], 232-192 = 40
5 : 40   >= 96  -> No
4 : 40   >= 48  -> No
3 : 40   >= 24  -> Yes, Set result 2[sup]3[/sup], 40-24 = 16
2 : 16   >= 12  -> Yes, Set result 2[sup]2[/sup], 16-12 = 4
1 : 4    >= 6   -> No
0 : 4    >= 3   -> Yes, Set result 2[sup]0[/sup], 4-3 = 1
So the result is %101001101 = 333

The quotient is now the remainder = 1

So the algorithm is -

Code:
bitMask = 1
Do While (divisor*2) <= quotient
  bitMask = bitMask * 2
  divisor = divisor * 2
Loop

result = 0
Do
  If quotient >= divisor Then
    result = result | bitMask
    quotient = quotient - divisor
  End If
  bitMask = bitMask / 2
  divisor = divisor / 2
Loop Until bitMask = 0
That "(divisor*2) <= quotient" can probably be "divisor <= (quotient/2)" which is easier to handle as no possibility of overflow.

It's also possible to optimise it all into a single loop and then run through the bit shift of 31 down to 0 which is likely what most software implementations will do. Note that "divisor<<bit" is notionally 64-bit.

Code:
result = 0
For bit = 31 To 0 Step -1
  If quotient >= ( divisor << bit) Then
    result = result | ( 1 << bit )
    quotient = quotient - ( divisor << bit )
  End If
Next
In theory it's easy to do and the algorithm simple, but the implementation is the complicated thing.
 

hippy

Technical Support
Staff member
My first attempt at implementing the above algorithm. Not perfect in the shifting the divisor and bitMask but seems to work for 1000/3 = 333 and 123456789/100000 = 1234, so with a bit of tweaking should work ...

Code:
Symbol quotient.lsw = w2
Symbol quotient.msw = w3

Symbol divisor.lsw  = w4
Symbol divisor.msw  = w5

Symbol bitMask.lsw  = w6
Symbol bitMask.msw  = w7

Symbol result.lsw   = w8
Symbol result.msw   = w9

quotient.msw = 0 : quotient.lsw = 1000
divisor.msw  = 0 : divisor.lsw  = 3
Gosub Divide
SerTxd(#result.msw," ",#result.lsw,CR,LF)

Symbol QUOTIENT = 123456789
Symbol DIVISOR  =    100000

Symbol QUOTIENT_MSW  = QUOTIENT / $10000
Symbol QUOTIENT_LSW  = QUOTIENT & $0FFFF
Symbol DIVISOR_MSW   = DIVISOR  / $10000
Symbol DIVISOR_LSW   = DIVISOR  & $0FFFF

quotient.msw = QUOTIENT_MSW
quotient.lsw = QUOTIENT_LSW
divisor.msw  = DIVISOR_MSW
divisor.lsw  = DIVISOR_LSW
Gosub Divide
SerTxd(#result.msw," ",#result.lsw,CR,LF)

Do:Loop

End

Divide:

  bitmask.msw = $0000
  bitMask.lsw = $0001
  Do While quotient.msw > divisor.msw
    divisor.msw = divisor.msw * 2
    divisor.msw = divisor.lsw / $8000 + divisor.msw
    divisor.lsw = divisor.lsw * 2
    bitMask.msw = bitMask.msw * 2
    bitMask.msw = bitMask.lsw / $8000 + bitMask.msw
    bitMask.lsw = bitMask.lsw * 2
  Loop 
  Do While quotient.msw = divisor.msw And quotient.lsw >= divisor.lsw
    divisor.msw = divisor.msw * 2
    divisor.msw = divisor.lsw / $8000 + divisor.msw
    divisor.lsw = divisor.lsw * 2
    bitMask.msw = bitMask.msw * 2
    bitMask.msw = bitMask.lsw / $8000 + bitMask.msw
    bitMask.lsw = bitMask.lsw * 2
  Loop 

  result.msw = 0
  result.lsw = 0
  Do
    If quotient.msw > divisor.msw Then
      result.msw = result.msw | bitMask.msw
      result.lsw = result.lsw | bitMask.lsw
      If quotient.lsw < divisor.lsw Then
        quotient.msw = quotient.msw - 1
      End If
      quotient.lsw = quotient.lsw - divisor.lsw
      quotient.msw = quotient.msw - divisor.msw
    Else
      If quotient.msw = divisor.msw And quotient.lsw >= divisor.lsw Then
        result.msw = result.msw | bitMask.msw
        result.lsw = result.lsw | bitMask.lsw
        If quotient.lsw < divisor.lsw Then
          quotient.msw = quotient.msw - 1
        End If
        quotient.lsw = quotient.lsw - divisor.lsw
        quotient.msw = quotient.msw - divisor.msw
      End If
    End If
    bitMask.lsw = bitMask.lsw / 2
    bitMask.lsw = bitMask.msw & 1 * $8000 + bitMask.lsw
    bitMask.msw = bitMask.msw / 2
    divisor.lsw = divisor.lsw / 2
    divisor.lsw = divisor.msw & 1 * $8000 + divisor.lsw
    divisor.msw = divisor.msw / 2
  Loop Until bitMask.msw = 0 And bitMask.lsw = 0
  
  Return
 

hippy

Technical Support
Staff member
Second version, should work completely but not exhaustively tested. 32-bit divide by 32-bit giving 32-bit result and 32-bit remainder. N/0 = 0 remainder N.

The /2, /$8000, *2, *$8000 can all be replaced by << and >> on X2's which will increase speed. The initial divisor and bitMask shift is sub-optimal, simply shifts until the msb of divisor is set then lets subsequent compares fail so some wasted loops in both cases but will always work.

I've also corrected 'quotient' to be 'dividend'. Not sure what had got into my head earlier :)

Code:
Symbol dividend.lsw = w2
Symbol dividend.msw = w3

Symbol divisor.lsw  = w4
Symbol divisor.msw  = w5

Symbol bitMask.lsw  = w6
Symbol bitMask.msw  = w7

Symbol result.lsw   = w8
Symbol result.msw   = w9

dividend.msw = 0 : dividend.lsw = 1000
divisor.msw  = 0 : divisor.lsw  = 3
Gosub Divide
SerTxd(#result.msw," ",#result.lsw,CR,LF)

Symbol DIVIDEND = 123456789
Symbol DIVISOR  =    100000

Symbol DIVIDEND_MSW  = DIVIDEND / $10000
Symbol DIVIDEND_LSW  = DIVIDEND & $0FFFF
Symbol DIVISOR_MSW   = DIVISOR  / $10000
Symbol DIVISOR_LSW   = DIVISOR  & $0FFFF

dividend.msw = DIVIDEND_MSW
dividend.lsw = DIVIDEND_LSW
divisor.msw  = DIVISOR_MSW
divisor.lsw  = DIVISOR_LSW
Gosub Divide
SerTxd(#result.msw," ",#result.lsw,CR,LF)

Do:Loop

End

Divide:

  result.msw = 0
  result.lsw = 0

  If divisor.msw <> 0 Or divisor.lsw <> 0 Then

    bitmask.msw = $0000
    bitMask.lsw = $0001

    Do While divisor.msw < $8000
      divisor.msw = divisor.msw * 2
      divisor.msw = divisor.lsw / $8000 | divisor.msw
      divisor.lsw = divisor.lsw * 2
      bitMask.msw = bitMask.msw * 2
      bitMask.msw = bitMask.lsw / $8000 | bitMask.msw
      bitMask.lsw = bitMask.lsw * 2
    Loop 

    Do
      If dividend.msw >= divisor.msw Then
        If dividend.msw = divisor.msw Then
          If dividend.lsw >= divisor.lsw Then
            result.lsw = result.lsw | bitMask.lsw
            dividend.lsw = dividend.lsw - divisor.lsw
            dividend.msw = 0
          End If
        Else
          result.msw = result.msw | bitMask.msw
          result.lsw = result.lsw | bitMask.lsw
          If dividend.lsw < divisor.lsw Then
            dividend.msw = dividend.msw - 1
          End If
          dividend.lsw = dividend.lsw - divisor.lsw
          dividend.msw = dividend.msw - divisor.msw
        End If
      End If
      bitMask.lsw = bitMask.lsw / 2
      bitMask.lsw = bitMask.msw * $8000 | bitMask.lsw
      bitMask.msw = bitMask.msw / 2
      divisor.lsw = divisor.lsw / 2
      divisor.lsw = divisor.msw * $8000 | divisor.lsw
      divisor.msw = divisor.msw / 2
    Loop Until bitMask.msw = 0 And bitMask.lsw = 0
  
  End If
  Return
 
Last edited:

AllyCat

Senior Member
Hi,

I don't claim to understand (yet) how hippy's code works, but have spotted a couple of minor typos, "DIVIDENT" in the assignments and the remainder is not actually printed, but is correctly calculated in dividend.lsw (and potentially dividend.msw). Also, there appears to be a bug in the Program Editor which doesn't generate any value higher than 2,147,483,647 (31 bits).

I can't compete with hippy's 32 bit divisor, but have refined the code in the thread above to give a subroutine for 32 bits divided by a full 16 bits with a rather smaller "footprint" (Program and RAM size and execution time). It uses womai's technique of calculating the high result word with PICaxe's * and / , but only for the first pass, then it's all shifting and conditional subtraction. As often used in assembler versions, the result is (optionally) rotated into the numerator register to save space and time.

The basic subroutine is about 75 bytes and uses 9 bytes of RAM, or a "stripped down" version (with only a 16 bit result word) uses about 55 bytes of code, and 7 bytes of RAM. On exit, the divisor is unchanged with result and remainder words replacing the numerator. Obviously the two numerator words could be stored / poked on entry and reinstated on exit if required.

The speed of calculation (measured over 10 seconds, with a 4MHz clock) is an almost constant 5.7 results per second (i.e. about 175ms) regardless of the data. In comparison the 32 / 32 bit code varies from about 1.6 to 3.6 per second, the lowest speed when the divisor is small. I can't test the X-series shift commands, but the w2 = w2 + w2 format (to shift left) appears to be about 10% faster than w2 = w2 * 2 and twice the speed of w2 = w2 / 2 (shift right).

Code:
#picaxe 20m2				; Or any other
#no_data

symbol NUMERATOR = 2109876543 	; EDITOR ONLY VALID UP TO 2,147,483,647
symbol DIVISOR  = 10000
symbol numerhi  = NUMERATOR / $10000	
symbol numerlo  = NUMERATOR & $0FFFF	
symbol divsor   = DIVISOR   & $0FFFF

symbol topbit   = $8000
symbol reshi = w2				; Result Hi (w5 if separate register)
symbol reslo = w4				; Result Lo
symbol numhi = w3				; Numerator Hi, Returns Remainder
symbol numlo = w2				; Numerator Lo, Retuns Result Hi
symbol divis = w1				; Divisor, Unchanged on return
symbol pass  = b0				; Loop counter & carry flag

divtest:
	divis = divsor			; Divisor -> unchanged
	numhi = numerhi 			; Numerator High -> Remainder
	numlo = numerlo 			; Numerator Low -> Result Hi (optional)
	gosub	div32r32v	
	if pass = 0 then
		sertxd ("Cannot divide by zero",cr,lf) 
		endif
	sertxd("HiWord=",#reshi," LoWord=",#reslo," Remainder=",#numhi,cr,lf)
	do : loop

div32r32v:					; Entry point to test data validity
	if divis = 0 then 
		pass = 0 : return		; Can't divide by zero
		endif
div32r32:					; Divide 32 bits by 16 -> 32 (divisor <> 0)
	reslo = numhi / divis * divis	; Divisible part of High byte 
	numhi = numhi - reslo		; Subtract
	reslo = reslo / divis		; High byte of result
	for pass = 0 to 15		; Repeat for each bit position
	if numhi >= topbit then 	; Carry from top bit
		pass = pass or 128	; Add Carry flag
		endif 
	numhi = numhi + numhi		; Rotate numerator and result left
	if numlo >= topbit then 	; "Carry" bit
		numhi = numhi + 1		; Add the carry
		endif
	numlo = numlo + numlo		; Shift left
;;	reshi = reshi + reshi		; Use ONLY if reshi & numlo have separate registers
	if reslo >= topbit then
		reshi = reshi + 1
 		endif
	reslo = reslo + reslo		; Shifting done
	if numhi < divis and pass < 128 then skipsub	; Jump if can't subtract
	numhi = numhi - divis		; Subtract divisor from numerator
	reslo = reslo + 1			; Add digit to result
	pass = pass & 127			; Remove the flag from loop counter
skipsub:	
	next pass
	return
I've used this division routine in a further subroutine to "Print" 32 bit double-words in denary format (i.e. dividing by multiples of ten for up to 10 digits) but the extra code is far larger than the basic division subroutine(s)! Of course "formatting" output data can be very space-hungry, or is the bintoascii command rather inefficient?

Cheers, Alan.
 

hippy

Technical Support
Staff member
I don't claim to understand (yet) how hippy's code works, but have spotted a couple of minor typos, "DIVIDENT" in the assignments
Yes, DIVIDENT was a typo; should have corrected the source and re-copied it all rather than editing the post !

The theory and algorithm is in post #25.

The footprint probably is quite large, but using << and >> rather than any divides or multiply there will be no other multiplies or divides in the code so it should be quite fast.
 

AllyCat

Senior Member
Hi womai,

Yes indeed, I was thinking the same, but wasn't sure if a "blog" might be more appropriate. My test/demo code gives full 32 bits/16 bits with conversion/printing of results back to (up to) 10 digits of decimal (denary). However, it runs to over 350 bytes of code, hardly a "snippet".

So I was planning to start with my simplest version (maximum 15 bit divisor with maximum 16 bit result) which only uses about 37 bytes of code but looks useful for calibrating/scaling readadc10 and calibadc10 data with good resolution, etc.. But I was going to try porting it into my "battery tester" project first.

Cheers, Alan.
 
Hi Alan,

Sorry i have been away for a week. Could you post the code of your cut down version as its finally worked out and I will try it in my run time application this weekend to check it out. The 15 bit limitation should not be a burden.

Thanks to all of you for your contribution. The code section is one place to keep this but a bettter one would be in the manual itself under the maths function section, with an example (and for many of us an escape from what looked like an otherwise impossible challenge / picaxe limitation)
regards
Peter
 

AllyCat

Senior Member
Hi Peter,

That's quite a timely request, I've just been drafting the explanatory text to go with the code, which I hope to post in the "code snippets" section later this evening.

EDIT: Yes it's there now.

Cheers, Alan.
 
Last edited:
Thnaks Alan,

I actually got your older 32/16 routine going perfectly in my application thanks, which takes around 10 msecs on an 18M2 running at a MHz clock rate and makes the logic of maths so much more hassle free with the headroom it gives.

I will wait until I see the magic of your improved 32/16 version before I decide which to to finally use. I like the way you have reduced the word and byte registers required and the great documentation to go with it - very smart.

Peter
 

AllyCat

Senior Member
Hi All,

Just an update that I've now posted my "full" 32 bit by 16 bit division routine in the code snippets section.

However, it does seem that I've started "playing" with PICaxe at about the right time. When checking for "device compatibility" of the code, I was rather horrified to discover that my "little bit of code" posted there (which IMHO is written quite efficiently) is too large for devices like the 14M. :eek: Of course, it is often the output formatting (i.e. making it look pretty) that takes up much of the program space, but isn't another name for that "User Friendly"?

Cheers, Alan.
 

elektriker

New Member
Hi hippy.
In your program for '32bit div 32bit....' from 05-06-2012, 22:40 is an error in the calculation.

For example: FFFFFFFF / 0000FFFF = hi-lo 0 1 R = 0 0 --> error!
or FFFFFFFF / 1 = hi-Lo 65534 65535 R = 0 0 --> error!
Do you have a update ?

Best regards
Werner
 

hippy

Technical Support
Staff member
Do you have a update ?
Afraid not. It's all from quite some time ago so I don't have any ready answers as to why there may be bugs or where they may be.

One would have to go back to Post #25 to see if the theory and algorithm there works for those cases, and then move forward through posts #26 and #27 to try and determine where the implementation doesn't match the algorithm.

It might be necessary to step through the code a statement at a time to see where things were going astray. But I would likely work out on paper what each step of the algorithm should produce and then compare what stepping through the program produces.
 

hippy

Technical Support
Staff member
Actually, it might be that there was just a line which had gone AWOL, marked in the updated code below.

$FFFFFFFF / $FFFF -> 1 1 which is 65536*1 + 1 = 65537 which seems to be correct

$FFFFFFFF / 1 -> 65535 65535 which also seems to be correct

Code:
Symbol dividend.lsw = w2
Symbol dividend.msw = w3

Symbol divisor.lsw  = w4
Symbol divisor.msw  = w5

Symbol bitMask.lsw  = w6
Symbol bitMask.msw  = w7

Symbol result.lsw   = w8
Symbol result.msw   = w9

dividend.msw = $FFFF : dividend.lsw = $FFFF
divisor.msw  = 0     : divisor.lsw  = $FFFF
Gosub Divide
SerTxd(#result.msw," ",#result.lsw,CR,LF)

Do:Loop

End

Divide:

  result.msw = 0
  result.lsw = 0

  If divisor.msw <> 0 Or divisor.lsw <> 0 Then

    bitmask.msw = $0000
    bitMask.lsw = $0001

    Do While divisor.msw < $8000
      divisor.msw = divisor.msw * 2
      divisor.msw = divisor.lsw / $8000 | divisor.msw
      divisor.lsw = divisor.lsw * 2
      bitMask.msw = bitMask.msw * 2
      bitMask.msw = bitMask.lsw / $8000 | bitMask.msw
      bitMask.lsw = bitMask.lsw * 2
    Loop 

    Do
      If dividend.msw >= divisor.msw Then
        If dividend.msw = divisor.msw Then
          If dividend.lsw >= divisor.lsw Then
            [b]result.msw = result.msw | bitMask.msw[/b] ; <--- Was missing
            result.lsw = result.lsw | bitMask.lsw
            dividend.lsw = dividend.lsw - divisor.lsw
            dividend.msw = 0
          End If
        Else
          result.msw = result.msw | bitMask.msw
          result.lsw = result.lsw | bitMask.lsw
          If dividend.lsw < divisor.lsw Then
            dividend.msw = dividend.msw - 1
          End If
          dividend.lsw = dividend.lsw - divisor.lsw
          dividend.msw = dividend.msw - divisor.msw
        End If
      End If
      bitMask.lsw = bitMask.lsw / 2
      bitMask.lsw = bitMask.msw * $8000 | bitMask.lsw
      bitMask.msw = bitMask.msw / 2
      divisor.lsw = divisor.lsw / 2
      divisor.lsw = divisor.msw * $8000 | divisor.lsw
      divisor.msw = divisor.msw / 2
    Loop Until bitMask.msw = 0 And bitMask.lsw = 0
  
  End If
  Return
Well done and thanks for spotting the bug.
 
Top