Multiple precision arithmetic (32bit , 48bit, 64bit etc.), e.g for AD9850

matherp · Nov 18, 2013

This has been on my todo list for some time.

Sometimes we need more "bits" than the 16 in the standard picaxe word. One of the chips that has been frequently discussed is the AD9850 which needs the calculation
val = (n * 2^32)/125,000,000 solving, where n is the desired output frequency, to program the frequency on the chip.

So today I bit the bullet and have produced a series of routines that can do arbitrary precision arithmetic with routines for add, subtract, multiply, divide, increment, decrement, shift left and shift right.

In the code below this is set to 64 bits. It can be any multiple of 8 below this settable with a single symbol 'precision' and with a minor modification to the storage registers (these currently use ram locations 56 to 79) can be more.

The code has been optimised to some extent and tested on the simulator against excel and the AD9850 design calculator.

Please test and provide any suggestions for further optimisation. I'll post to the code snippets once this is finalised.

The use of the code modules should be self-explanatory but I'll try and answer any questions.

The code has been ported from the assembler given by http://avtanski.net/projects/math/

SEE NEXT POST FOR REVISED CODE

Hope this is helpful

Peter

matherp · Nov 20, 2013

I've discovered a small bug when using less than 64 bits in multiplication this has now been rectified.

I've also now written binary-to-ascii and ascii-to-binary routines which are included in the test program.

Overall the routines now allow for addition, subtraction, multiplication and division of up to 20 digit numbers (18446744073709551616 is the maximum!!) including conversion to and from text strings but also work well for 32 bit arithmetic. The example now uses 56 bit arithmetic which is what is needed for the AD9850.

All calculations can be checked on the website http://www.javascripter.net/math/calculators/100digitbigintcalculator.htm

The code is now too long to include in line so is attached. The code has now been tested on a 14M2 and 20X2 so should work on any of the more modern chips. As before any feedback would be useful

AllyCat · Nov 20, 2013

Hi Peter,

(Written before your second post).

That's interesting, I'm sorry there have been no other replies. It's probably worth copying the post (or a moderator moving the thread) to the "completed projects" or "code snippets" section, before it vanishes under the weight of "Active" threads. I'm sure that somebody (maybe me) will find it useful in the future.

However, I guess the code runs rather slowly (which of course is not necessarily a problem); I tried your posted code in the simulator and it took about 3 minutes for the multiply and 5 minutes for the divide (of 64 bit words). For comparison, the 32 bit by 16 bit divide routine which I posted last year executes in about 10 seconds in the simulator and 100 ms on a real PICaxe at 4 MHz.

As far as I can see the code doesn't use any "high level" PICaxe instructions, so in addition to the Basic Compiler overhead of about 300 times (slower) than a "real" PIC using machine code (or Assembler), it also needs to emulate the Carry/Borrow/Overflow flags.

As you asked for suggestions (so this is not a criticism), where speed might be an issue, a trivial improvement is that k = k + k is marginally faster than k = k * 2 (but there's no equivalent for the time-intensive right-shifts). A more significant "improvement" might be to work with words rather than bytes (to run about 4 times faster), taking advantage of PICaxe's higher level instructions. At first sight it needs little more than replacing the @BPTRDEC with 256 * @BPTRDEC + @BPTRDEC etc., (of course I've chosen the easy part, INCs etc. are not quite as convenient). I'm not sure how much more variable space would be required, but there are the 6 or 7 extra S_Wx variables with M2s, if needed.

Cheers, Alan.

matherp · Nov 20, 2013

Thanks Alan

I've tidied up the code, optimised as much as I am able, and much reduced the variable usage and posted in code snippets as suggested.

Run times for 64bit multiply and divide on a 20X2 at 64Meg are now something less than 0.5 seconds

best regards

Peter

AllyCat · Nov 21, 2013

Hi Peter,

Wow, you've been busy! I note that on an M2 the latest code is 33 bytes smaller than for an X2. Of course it would nearly fill an M2, but with the second slot now available in PE6, that might not be an issue.

Sometime, I might try to adapt your code to use word variables, particularly as it appears that the only current use of word variables is to calculate the Carry flags.

However, the reason for my particular interest is that, IMHO, the greatest "weakness" of PICaxe Basic maths is with division, so I really must look some more at Jeremy Leach's algorithm which appears to converge stunningly quickly and make us "bit-shifters" look like real amateurs.

Cheers, Alan.

matherp · Nov 22, 2013

Alan

I've written a word based version as an experiment which I've posted in the code snippets.

Timings in milli-seconds on a 64Meg 20X2 are:

Bits Word-based(mul/div) Byte-based(mul/div)
64 195/339 238/401
56 NA 211/295
48 152/160 184/238
32 95/78 115/76

Obviously all timings are test-data dependent to some extent as the number of add/subs change with the data.

Multiple precision arithmetic (32bit , 48bit, 64bit etc.), e.g for AD9850

matherp

Senior Member

matherp

Senior Member

Attachments

AllyCat

Senior Member

matherp

Senior Member

AllyCat

Senior Member

matherp

Senior Member