Serial Controller for KS0108B GLCD

nick12ab

Senior Member
I need help getting text onto a KS0108B 128x64 GLCD, as sold by Techsupplies, at a decent speed. The trouble is that this controller doesn't seem to have a built-in capability for text. Putting text onto the display takes up much of the PICAXE (40X2) memory just for the numbers that define which pixels need to be set for which letters.

I decided that to overcome this I could use a serial driver chip for the display which is capable of putting text on it. I checked out the FGC thing offered on Techsupplies but it has flaws and is expensive. Its flaws are slow serial, and only allowing 7 lines of text when the display is capable of 8. I started producing my own version on the PICAXE-20X2 which lacks the circle-drawing and other features of the GLIC-K1 but has text on 8 lines, a PWM pin used for controlling the backlight (command: 249, byte for level), and allows direct control of the LCD as well through the high speed serial interface.

Only problem - it is very slow at printing the characters on screen... well, not very slow by human standards but slow by computer standards. When the serial connection is disconnected, it continues printing the text, which shows that all the data has already been received through hardware serial.

The code is attached as it is too big to show here. I'm thinking I may need to go for a lookup table with the command of the same name but typing that out will take an eternity so I'm looking for advice first. I have also included a video to demonstrate it. Those who own a GLIC-K1 may be able to tell me whether it provides a faster speed or not. Either way, I want this to look more professional.

The latest version of the code can be found at the end of this thread. The video has a PDF extension so that it can be uploaded here but you must change the extension back to 3gp to watch it. An easy way to do this is to right-click on the attachment link, click Save Link As, in the save dialog that appears, change the file extension filter to 'All Files' and add '.3GP' to the end of the file name in the file name box (after removing the '.PDF')
 

Attachments


MartinM57

Moderator
Looks like the beginnings of a good job ;)

It's tricky to suggest performance improvements to uncommented spaghetti code (sorry, but that's what it looks like).

Things that jump out to me straight away involve writesmallchar:
- every IF statement is done every time, whatever the incoming value. Use ELSEIF (if you don't blow some limit), a SELECTCASE or even a GOTO (maybe ;)) to stop further IFs being tested when you've got the answer already
- ...or maybe there's a lookup table just dying to be set up and used that will make it about two lines of code

You're also using what I would call "high overhead" instructions (it may be that they are super efficient, but not all of them are):
- DIG
- LOOKUP

There are also some tricks such as instead of
Code:
	if pinC.3 = 1 then
		let b54 = b54 + 4
	end if
you can use
Code:
let b54 = 4 * pinC.3 + b54
to avoid an IF test (and 2 bytes of code space)
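
The order of the terms matters here because PICAXE BASIC evaluates expressions strictly left to right, with no operator precedence. A minimal sketch of the difference, assuming a 20X2 and the same b54 and pinC.3 as above (the starting value of 10 is only for illustration):

```basic
#picaxe 20x2
b54 = 10
' Evaluated left to right as (4 * pinC.3) + b54:
' adds 4 only when C.3 reads high
let b54 = 4 * pinC.3 + b54
' Writing b54 + 4 * pinC.3 instead would be evaluated as
' (b54 + 4) * pinC.3, which zeroes b54 whenever the pin is low
sertxd(#b54, cr, lf)
```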

You'll have to do some timing yourself - write some small test programs with 1000 iterations of a command, or a sequence of your commands, that you play with to see if there are better algorithms for certain parts of the code

I'm sure others will pitch in....I expect there's a lot that can be done....
 

hippy

Technical Support
Staff member
I'd agree with all that MartinM57 says, particularly with 'writesmallchar', using high overhead code, and I also think a lot can probably be done.

select b55
case 0,2,4,6,8 : high cs1 : low cs2
else : low cs1 : high cs2
endselect

Change b55 to b0 and you can use -

If bit0 = 0 Then
High cs1 : Low cs2
Else
Low cs1 : High cs2
End If

You can replace -

lookup b55,(0,6,12,18,24,30,36,42,48,54),b54

With -

b54 = b55 * 6

Or even -

b54 = b55 << 1 + b55 << 1

Use the same variable ( replace b54 with b55 ) and it's even quicker.

Likewise replace all multiplication and division by power-of-two with shifts.
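
As a rough illustration of the shift substitution (a sketch only; the variables are arbitrary, and the << and >> operators are assumed to be available as they are on X2 parts):

```basic
#picaxe 20x2
b0 = 5
b1 = b0 * 8		' multiply by 8 ...
b1 = b0 << 3		' ... same result using a shift
w1 = 1000
w2 = w1 / 4		' divide by 4 ...
w2 = w1 >> 2		' ... same result using a shift
sertxd(#b1, " ", #w2, cr, lf)
```

Whether the shift is actually faster on a given firmware is best confirmed with a timing loop rather than assumed.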

You can probably replace -

select charaddress
case 000 to 019 : b55 = 0
case 020 to 039 : b55 = 1
case 040 to 059 : b55 = 2
case 060 to 079 : b55 = 3
case 080 to 099 : b55 = 4
case 100 to 119 : b55 = 5
case 120 to 139 : b55 = 6
case 140 to 159 : b55 = 7
endselect
let lcddata = b55 or %10111000

With -

lcddata = charaddress / 20 Or %10111000
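
One way to convince yourself the one-liner matches the SELECT-CASE is to print its result for every address. This is only a sketch: the symbol assignments are assumptions, not the ones in the attached code. Each block of 20 addresses should give the next page command, from %10111000 for 0 to 19 up to %10111111 for 140 to 159.

```basic
#picaxe 20x2
symbol charaddress = b0		' assumed variable assignment
symbol lcddata     = b1		' assumed variable assignment

for charaddress = 0 to 159
	' Left to right: (charaddress / 20) Or %10111000
	lcddata = charaddress / 20 or %10111000
	sertxd(#charaddress, " -> ", #lcddata, cr, lf)
next
```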

Other SELECT-CASE statements and multiple IF's can be replaced by BRANCH or ON-GOTO.
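
A minimal sketch of ON-GOTO used as a dispatcher. The labels and the cmd variable are hypothetical and not taken from the attached code; BRANCH takes the same list of labels inside brackets.

```basic
#picaxe 20x2
symbol cmd = b0		' hypothetical command index, 0 to 2

main:
	cmd = 1
	on cmd goto doZero, doOne, doTwo
	' Execution falls through to here when cmd is beyond the list
	sertxd("out of range", cr, lf)
	end

doZero:
	sertxd("zero", cr, lf)
	end
doOne:
	sertxd("one", cr, lf)
	end
doTwo:
	sertxd("two", cr, lf)
	end
```

The jump goes straight to the matching label, so no further comparisons are made once the index is known.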

One thing which looks particularly odd is the way you're handling high-speed serial in. Process data as it arrives on-the-fly and you should improve performance greatly.
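
Roughly what on-the-fly processing could look like with background receive into the scratchpad. This is a sketch under several assumptions: the baud rate is a guess, handlebyte is a hypothetical stand-in for the existing execute routine, and ptr and hserptr are assumed to wrap at the same point on the 20X2.

```basic
#picaxe 20x2
symbol rxbyte = b0		' hypothetical name for the byte being handled

setfreq m64
hsersetup B9600_64, %01		' assumed baud; bit 0 = background receive to scratchpad
ptr = 0

main:
	do while ptr <> hserptr		' unread bytes are waiting in the scratchpad
		rxbyte = @ptrinc	' fetch the next byte and advance the read pointer
		gosub handlebyte
	loop
	goto main

handlebyte:
	' decode rxbyte here (text or command)
	return
```

The sender is not held up while bytes are being handled, because reception carries on in the background, but the scratchpad can still overflow if data arrives faster than it is processed for long enough.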

Inline as much code as you can -

lcddata = @ptr
gosub enable

Can become -

let pinsB = @ptr
let pinC.1 = bit30
pulsout enablepin,1

Not sure what "pinC.1 = bit30" does but if it's not required at this time then don't include it. Same too with other redundant commands such as "@ptr = 0"; no need to clear the receive buffer when you'll be overwriting it.

There are a number of minor tweaks, though they probably don't gain much in these cases.

if lcddatax = 1 then
let lcddatay = 0
elseif lcddatax = 0 then
let lcddatay = 255
end if

Can become -

let lcddatay = lcddatax - 1

And -

inc pagecounter
if pagecounter < 8 then x003
let pagecounter = 0

Can become -

pagecounter = pagecounter + 1 & 7
If pagecounter <> 0 Then x003

Optimising code ( for speed or size ) is a bit of an art form and does require a certain degree of experience. The trick is in moving out anything which doesn't need to be done, and using the most efficient means one can find.
 

westaust55

Moderator
If you have a look in the thread I ran a couple of years ago for using the gLCD from a Siemens A55 mobile/cell phone as well as the usual font set, I had developed some code for different (double) sized fonts together with drawing lines and circles.
It was all a bit slow (2 sec refresh rate using a 40X1 at 8MHz) but that could be improved with the newer X2's having faster internal speed capabilities

Posts around 12 and 14 cover the alpha-numeric fonts, and posts 18 and 19 added the Greek font. It uses pointers into EEPROM rather than lots of IF...THEN tests.
Posts 21 and 22 cover double sized fonts.
Post 23 is line drawing and posts 29 to 31 added 'rough' circles.

Thereafter a general read. Some simple applications developed (plot a varying voltage, compass etc)
 

nick12ab

Senior Member
Update

Same too with other redundant commands such as "@ptr = 0"; no need to clear the receive buffer when you'll be overwriting it.
@ptr = 0 is actually necessary. executebyte is set to @ptr and then ptr is incremented, so the next time @ptr is accessed it is the following byte, which lets a command use the number that follows it. @ptr is then set to 0 because otherwise the number used by a command would also be executed separately. For example, to set the backlight to 65 you send 249,65: 249 becomes the executebyte used in the case statement and 65 is saved as the backlight setting. Without @ptr = 0, the backlight would be changed AND a letter A would appear at the end of whatever text was on-screen.

MartinM57 said:
It's tricky to suggest performance improvements to uncommented spaghetti code (sorry, but that's what it looks like).
I have re-uploaded the code now with comments, and implemented some of Hippy's suggestions.

Hippy said:
Not sure what "pinC.1 = bit30" does but if it's not required at this time then don't include it.
Remember B.6 is used for hardware serial input so a different pin has to be used for DB6 on the LCD

Hippy said:
One thing which looks particularly odd is the way you're handling high-speed serial in. Process data as it arrives on-the-fly and you should improve performance greatly.
The high-speed serial is much faster than the execution so if they were processed on-the-fly, the master processor would be stuck sending it data for as long as it takes to show it rather than quickly sending the lot and getting on with something else. The idea is that the PICAXE has a short timeout before starting execution and the timer is reset until no new data comes in so it can receive an amount of data as big as the scratchpad without any noticeable delay before execution starts.

Hippy said:
If bit0 = 0 Then
High cs1 : Low cs2
Else
Low cs1 : High cs2
End If
Doesn't work. The case statement is not used on the least significant digit, it is used on the next digit.

Hippy said:
You can probably replace -

select charaddress
case 000 to 019 : b55 = 0
case 020 to 039 : b55 = 1
case 040 to 059 : b55 = 2
case 060 to 079 : b55 = 3
case 080 to 099 : b55 = 4
case 100 to 119 : b55 = 5
case 120 to 139 : b55 = 6
case 140 to 159 : b55 = 7
endselect
let lcddata = b55 or %10111000

With -

lcddata = charaddress / 20 Or %10111000
Your solution worked and saved over 126 bytes! So Select Case seems to be a pig when it comes to program memory.

Branching sounds like a good idea but it would take forever to type in labels...
Code:
branch character,(c001,c002,c003,c004,c005,c006,c007,c008,c009,c010,c011,c012,c013,c014,c015,c016,c017,c018,c019,c020,c021,c022,c023,c024,c025,c026)
but then again the lookup tables will take even longer so Branch sounds like a good idea.

For anyone who wonders why writesmallchar is called that and not writechar, it is because if enough program memory gets freed up, I want to add a large character set as well. That would give it a major advantage over the GLIC-K1 when it comes to text.

Instruction set has to be moved here because it makes the .BAS file bigger than the limit.
0-127 : Print text
245 : Next byte sets y address
246 : Next byte sets vertical position
247 : Set all pixels on
248 : Clear all pixels
249 : Next byte sets backlight
250 : Chip Select 1 + 2
251 : Chip Select 1
252 : Chip Select 2
253 : Set Address for text (0 to 159)
254 : Send next byte to LCD as instruction
255 : Send next byte as data for LCD
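
For anyone wanting to try it from a master PICAXE, a sketch of how the commands above might be sent. The 40X2 master, the baud rate and the start-up pause are all assumptions, and 253 is assumed to take the address as its next byte.

```basic
#picaxe 40x2
hsersetup B9600_8, %00		' assumed baud rate, must match the driver
pause 500			' give the display driver time to start

hserout 0,(248)			' clear all pixels
hserout 0,(249,65)		' next byte sets backlight: level 65
hserout 0,(253,0)		' set address for text to 0
hserout 0,("Hello")		' bytes 0-127 are printed as text
```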
 

hippy

Technical Support
Staff member
Hippy said:
If bit0 = 0 Then
High cs1 : Low cs2
Else
Low cs1 : High cs2
End If
Doesn't work. The case statement is not used on the least significant digit, it is used on the next digit.
Are you sure ? The original code sets "high cs1" for even numbers, "low cs1" for odd numbers, and so too does this.

This is one of those cases where one has to look at what is originally being done and then see another way of expressing the same thing in a different way. In this case not worrying about what the numbers are but spotting what they have in common.

I'll admit I could be wrong as I have no idea what's in b55 or what it means!
 

nick12ab

Senior Member
Are you sure ? The original code sets "high cs1" for even numbers, "low cs1" for odd numbers, and so too does this.

This is one of those cases where one has to look at what is originally being done and then see another way of expressing the same thing in a different way. In this case not worrying about what the numbers are but spotting what they have in common.

I'll admit I could be wrong as I have no idea what's in b55 or what it means!
Yes, sorry, it does work. I think I forgot to move the variable that was once in b55 to b1 (used bit8 instead here).

Also removed this unnecessary code:
Code:
let addresscounter = addresscounter + 6 AND %00111111
low rs : let lcddata = %01000000 OR addresscounter
let pinsB = lcddata
let pinC.1 = bit30
pulsout enablepin,1
All of the updates I've made with your support have had a noticeable effect on the speed of the display.

Uploaded updated code with comments.
 

nick12ab

Senior Member
Update

Spent lots of time converting the huge stack of IF statements into lookup tables and optimising other parts of the code. Converted the case statement in the execute subprocedure to an if statement and moved the subprocedure to the loop routine that executes it as nothing else executes it. It massively reduced the program size down to 1359 bytes and reduced the time it takes to show its entire character set on-screen from around 2 seconds to 1.5 seconds.

You're also using what I would call "high overhead" instructions (it may be that they are super efficient, but not all of them are):
- DIG
Why's DIG such a high overhead command? It doesn't really do much. I'm guessing result = number DIG 0 is converted to
Code:
result = bintobcd variable AND %00001111
or something like that as that code takes up the same amount of program memory and functions the same. Alternative for DIG 1 is bintobcd variable which uses one less byte and I use bit12 instead of bit8 to access the bit I need.
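
A quick way to check that the two forms agree, at least for values 0 to 99 where bintobcd fits in a byte. This is only a sketch with arbitrary variable names, and it says nothing about how the firmware really implements DIG.

```basic
#picaxe 20x2
symbol n      = b0
symbol viaDig = b1
symbol viaBcd = b2

for n = 0 to 99
	viaDig = n dig 0
	viaBcd = bintobcd n and %00001111
	if viaDig <> viaBcd then
		sertxd("mismatch at ", #n, cr, lf)
	endif
next
sertxd("check done", cr, lf)
```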

Here's new code.
 

Attachments

MartinM57

Moderator
We're not privy to how efficient the "high level" commands are, and to most people it doesn't matter, but if you're going for ultimate speed I'm sure it's something to be aware of.

You can do some experiments if you wish with a 10,000 loop 5 line program

sertxd ("started", CR, LF)
for w0 = 1 to 10000
<your statement(s)>
next
sertxd ("finished", CR, LF)

Run it first with no statements in the middle to get the looping overhead and then put whatever you want inside. Manually measure the time taken (or make it cleverer and print out the time)
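
A sketch of the "cleverer" version on an X2, using the built-in timer so the elapsed time is printed rather than measured by hand. It assumes the default 8MHz clock; the statement under test and the variable names are placeholders. The one-second tick is coarse, so use enough iterations for the difference to show.

```basic
#picaxe 20x2
symbol res     = b0
symbol testval = b1
symbol loops   = w1		' b3:b2, loop counter
symbol elapsed = w2		' b5:b4, seconds taken

testval = 123
settimer t1s_8			' one timer tick per second at 8MHz
timer = 0
for loops = 1 to 10000
	res = testval dig 0	' statement under test; remove it to time the bare loop
next
elapsed = timer
sertxd("elapsed: ", #elapsed, " s", cr, lf)
```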

Report back with what you find - it's been done before many times but maybe not for these "high level" commands

Maybe DIG 0 really is done by "result = bintobcd variable AND %00001111", but who's to say that bintoBCD is "quick"?
 

hippy

Technical Support
Staff member
Spent lots of time converting the huge stack of IF statements into lookup tables and optimising other parts of the code ... It massively reduced the program size down to 1359 bytes and reduced the time it takes to show its entire character set on-screen from around 2 seconds to 1.5 seconds.
Looking good and code size falling, speed going up, is a good sign you're on the right track. Your code's looking good.

There's another trick you can use to improve the speed with the four LOOKUP into cb1-cb4, which are similar to ...

LookUp character, ( $01, $02, $03, $04 ), cb1
LookUp character, ( $2A, $2B, $2C, $2D ), cb2

Because you can use word values as numbers you can replace the above with the following where wX is a word variable -

LookUp character, ( $012A, $022B, $032C, $042D ), wX
cb1 = wX >> 8
cb2 = wX & $FF

If cb2 is a byte variable then you can drop the & $FF. Better, if you SYMBOL define cb1 and cb2 as the msb and lsb of wX respectively you won't need either assignment; the one LOOKUP will fill both bytes automatically.
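
A sketch of the SYMBOL overlay, assuming wX is placed on w4 (which is b9:b8) purely for illustration; the attached code may use different variables.

```basic
#picaxe 20x2
symbol character = b0
symbol wX  = w4		' w4 is made up of b9 (msb) and b8 (lsb)
symbol cb1 = b9		' high byte of wX
symbol cb2 = b8		' low byte of wX

character = 2
lookup character,($012A,$022B,$032C,$042D),wX
' cb1 now holds $03 and cb2 holds $2C with no shift or mask needed
sertxd(#cb1, " ", #cb2, cr, lf)
```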

LOOKUP takes a while to execute, proportionate to the number of entries, so if you can split those in half you can gain speed. The above, indexed from 0 to 3 -

LookUp character, ( $012A, $022B, $032C, $042D ), wX

could become -

If character <= 1 Then
LookUp character, ( $012A, $022B ), wX
Else
wX = character - 2
LookUp wX, ( $032C, $042D ), wX
End If


Why's DIG such a high overhead command? It doesn't really do much. I'm guessing result = number DIG 0 is converted to result = bintobcd variable AND %00001111 or something like
You're right. "x = y DIG 1" likely translates to something which is equivalent to "x = y / 10 // 10", but both division and modulus ( and multiplication ) are pretty slow in comparison to addition, shift and mask operations. If you can arrange for the maths to only use those commands you can get a good speed improvement. Making numbers easy to manipulate this way often means having to form the numbers in the right way to start with; base 10, decimal, isn't always best.

If you have a 4 line 20 digit display ( line 0-3, column 0-19 ) it's tempting to address each character position as 0 to 79 ...

line = ?
column = ?
position = line * 20 + column

And to reverse that -

position = ?
line = position / 20
column = position // 20

However if you hold 'position' as binary '%llccccc' where 'l' is line and 'c' is character it's much quicker, no multiply, no division, no modulus ...

line = ?
column = ?
position = line << 5 + column

position = ?
line = position >> 5
column = position & %0011111
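
A worked instance of the packed form with concrete numbers (line 2, column 13). The variable names are made up for the example so as not to clash with anything in the attached code.

```basic
#picaxe 20x2
symbol txtline = b0	' 0 to 3
symbol txtcol  = b1	' 0 to 19
symbol txtpos  = b2	' packed as %llccccc

txtline = 2
txtcol  = 13
txtpos  = txtline << 5 + txtcol		' left to right: (2 << 5) + 13 = 77

txtline = txtpos >> 5			' back to 2
txtcol  = txtpos & %0011111		' back to 13
sertxd(#txtpos, ": ", #txtline, ",", #txtcol, cr, lf)
```

The packed positions are no longer consecutive from one line to the next (line 1 starts at 32, not 20), which is the price paid for dropping the multiply and divide.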

BCD numbers are simply a variation on the same theme. The inefficient (slow) decimal way ...

msd = ?
lsd = ?
number = msd * 10 + lsd

msd = number / 10 ' or msd = number DIG 1
lsd = number // 10 ' or lsd = number DIG 0

The fast way -

msd = ?
lsd = ?
number = msd << 4 | lsd

number = ?
msd = number >> 4
lsd = number & %1111

I haven't investigated how you are using 'charaddress' but applying the above will get you yet another speed gain.
 

nick12ab

Senior Member
Looking good and code size falling, speed going up, is a good sign you're on the right track. Your code's looking good.

There's another trick you can use to improve the speed with the four LOOKUP into cb1-cb4, which are similar to ...

LookUp character, ( $01, $02, $03, $04 ), cb1
LookUp character, ( $2A, $2B, $2C, $2D ), cb2

Because you can use word values as numbers you can replace the above with the following where wX is a word variable -

LookUp character, ( $012A, $022B, $032C, $042D ), wX
cb1 = wX >> 8
cb2 = wX & $FF

If cb2 is a byte variable then you can drop the & $FF. Better, if you SYMBOL define cb1 and cb2 as the msb and lsb of wX respectively you won't need either assignment; the one LOOKUP will fill both bytes automatically.
Will do as you say. Here are the byte assignments for cb1,cb2,cb3,cb4,cb5:
symbol cb1 = b7
symbol cb2 = b8
symbol cb3 = b9
symbol cb4 = b10
symbol cb5 = b11
so I will move them so if I eventually add a cb6 for 6x8 box drawings, that can also use your lookup system as word variables use an even byte then an odd-numbered byte:
w0 = b1:b0
w1 = b3:b2
w2 = b5:b4
w3 = b7:b6
w4 = b9:b8
w5 = b11:b10
w6 = b13:b12
w7 = b15:b14
 

nick12ab

Senior Member
Hippy's idea of using words in the lookup table reduces execution time from 1.5 seconds to 1.2 seconds for the entire character set (120 characters) and reduces program memory usage from 1359 to 1289. That large character set now seems much more possible. A macro assigned to the keyboard (functionality not provided by PE) made light work of converting the lookups. :D

MartinM57 said:
We're not privy to how efficient the "high level" commands are, and to most people it doesn't matter, but if you're going for ultimate speed I'm sure it's something to be aware of.

You can do some experiments if you wish with a 10,000 loop 5 line program

sertxd ("started", CR, LF)
for w0 = 1 to 10000
<your statement(s)>
next
sertxd ("finished", CR, LF)

Run it first with no statements in the middle to get the looping overhead and then put whatever you want inside. Manually measure the time taken (or make it cleverer and print out the time)

Report back with what you find - it's been done before many times but maybe not for these "high level" commands

Maybe DIG 0 really is done by "result = bintobcd variable AND %00001111", but who's to say that bintoBCD is "quick"?
Ran a modified version of the code above and used the in-built timer. I didn't want to calculate the value I'd need for a multiple of 10 seconds for the timer, so the unit used is 1/16 seconds. I used the serial LCD in development as a terminal and a second PICAXE-20X2 to perform DIG 0, DIG 1, DIG 2, then DIG 3 on the number 1705 (chosen for no particular reason), 65535 times for each one. Amazingly, they don't all take the same amount of time to execute, and their times stay the same each time the test is run.
dig 0 = 21.2 seconds
dig 1 = 21.1 seconds
dig 2 = 17.3 seconds
dig 3 = 14.4 seconds

Here's the new BASIC code and a picture of the GLCD showing that each command takes a different time to execute 65535 times. This also made me realise that the box drawings need re-designing so that the lines are in the middle of each character space, not at the edge.

For large characters (7x11), would it run faster if they were stored one byte per row of character or two bytes per column?
 

Attachments


nick12ab

Senior Member
Update: Large Characters, ProgressBar and Fill Line

Large characters now programmed and they are a set of 7x11 characters and the display can fit 16x4 characters. Currently the big set lacks lower case letters and many symbols that the small set has. Now I look at it on the display, I wonder why not make character LCDs with 7x11 characters instead of 5x7 characters? Anyway, here's the character sets:
Large:

Small:

Also added a progressbar command: 244,line,size(0 to 127). This command fills a line according to the size variable with pixels to give a progressbar 8 pixels high. Inversion also added as currently the only bit of the new settings byte set with command 192,byte. The inversion works for both text sets, the progressbar and clear screen commands so the fill screen command is no longer needed and is replaced with a clear/fill line command which also supports inversion.

If you have a 4 line 20 digit display ( line 0-3, column 0-19 ) it's tempting to address each character position as 0 to 79 ...

line = ?
column = ?
position = line * 20 + column

And to reverse that -

position = ?
line = position / 20
column = position // 20

However if you hold 'position' as binary '%llccccc' where 'l' is line and 'c' is character it's much quicker, no multiply, no division, no modulus ...

line = ?
column = ?
position = line << 5 + column

position = ?
line = position >> 5
column = position & %0011111
Implemented that on the large characters and will eventually do it on the small characters. It was perfect for the large characters as there are 16 per line so they can be numbered sequentially and use this system, making it easy and efficient.

Svejk said:
A complete 28x2 implementation for driving GLCD may be found at glcd.svejklabs.cz.cc.

The application includes dot, line, box, circle, image and customizable fonts. Have a look.
Nice suggestion but if I blindly copy someone else's work I won't learn much. If I do most of the programming myself, I learn more and it is easier to add any feature that yours doesn't have because I know which variables are which and what all the pin symbols are. Besides, I don't have any 28X2s and have no spare 40X2s which is why I'm doing it on a 20X2.
 

Attachments


westaust55

Moderator
If I may make a suggestion, since you are creating your own font,
for the lower case characters: g, j, y plus p and q
try dropping them one pixel so that you have descenders.
Compare a few words on your screen with these in typical raised position and with descended "tails" and see what you think. ;)
 

nick12ab

Senior Member
If I may make a suggestion, since you are creating your own font,
for the lower case characters: g, j, y plus p and q
try dropping them one pixel so that you have descenders.
Compare a few words on your screen with these in typical raised position and with descended "tails" and see what you think. ;)
I thought about doing that before I started the entire thing but graphic LCDs have all their rows tightly next to each other rather than being grouped into groups of 8 like on a character LCD so if there were descenders, it would join onto the character below like the box drawings do. The characters are 5x7 with space for 6x8 and these 6x8 boxes have to be right next to each other vertically as the LCD is arranged in rows that are 8 pixels tall.
 

westaust55

Moderator
I thought about doing that before I started the entire thing but graphic LCDs have all their rows tightly next to each other rather than being grouped into groups of 8 like on a character LCD so if there were descenders, it would join onto the character below like the box drawings do. The characters are 5x7 with space for 6x8 and these 6x8 boxes have to be right next to each other vertically as the LCD is arranged in rows that are 8 pixels tall.
True but in reality how often is there a descender "tail" above a character which uses the top row of pixels ?

Have a look at what I did with an A55 gLCD out of a mobile phone which has descenders on lower case characters: http://www.picaxeforum.co.uk/attachment.php?attachmentid=1468&d=1218253143
 

hippy

Technical Support
Staff member
There's no perfect answer for true-descenders, nor character tile height and width, but even the AXE033 and other text LCD's can benefit by using a single pixel descender ( see code below ).

The best solution for limited storage is to choose a few tile sizes and fonts, and allow the code to be compiled with #DEFINE's indicating which fonts to include. Users can then choose which to use or add more. Write the code so it will handle any tile size ( within reason ) and include that info with the font.

Note that inter-character and inter-line gaps don't have to be included in the font tiles themselves but can simply be set by how tiles map to GLCD x,y coordinates.

I think it's always nice to add / allow a proportional font. That will make the code more complicated but probably worth it.

Code:
#Picaxe 28X2

Symbol LCD  = C.0
Symbol BAUD = N2400

Pause 1000

SerOut LCD, BAUD, ( 254, %01000000 )
SerOut LCD, BAUD, ( %100000 )
SerOut LCD, BAUD, ( %100000 )
SerOut LCD, BAUD, ( %101101 )
SerOut LCD, BAUD, ( %110011 )
SerOut LCD, BAUD, ( %110001 )
SerOut LCD, BAUD, ( %110001 )
SerOut LCD, BAUD, ( %101111 )
SerOut LCD, BAUD, ( %100000 )

SerOut LCD, BAUD, ( 254, %01001000 )
SerOut LCD, BAUD, ( %100000 )
SerOut LCD, BAUD, ( %100000 )
SerOut LCD, BAUD, ( %110001 )
SerOut LCD, BAUD, ( %110001 )
SerOut LCD, BAUD, ( %110001 )
SerOut LCD, BAUD, ( %101111 )
SerOut LCD, BAUD, ( %100001 )
SerOut LCD, BAUD, ( %101110 )

SerOut LCD, BAUD, ( 254, %01010000 )
SerOut LCD, BAUD, ( %100000 )
SerOut LCD, BAUD, ( %100000 )
SerOut LCD, BAUD, ( %101111 )
SerOut LCD, BAUD, ( %110001 )
SerOut LCD, BAUD, ( %110001 )
SerOut LCD, BAUD, ( %101111 )
SerOut LCD, BAUD, ( %100001 )
SerOut LCD, BAUD, ( %101110 )

SerOut LCD, BAUD, ( 254, %10000000, "L", 8 ,"d", 9 ," G", 8 , 10, 8  )
SerOut LCD, BAUD, ( 254, %11000000, "L","a","d","y"," G","a","g","a" )

End
 

westaust55

Moderator
I think it's always nice to add / allow a proportional font. That will make the code more complicated but probably worth it.
The image I reference above was using a proportional font so that narrow characters such as "i" and "t" use a couple less pixels in width.

Yes, it does need slightly more program but easy to achieve if the font is set up accordingly at the start.
 

nick12ab

Senior Member
True but in reality how often is there a descender "tail" above a character which uses the top row of pixels ?
In your picture the h in 'graphics' has a y with descender growing out of the top of it and when displaying the entire character set in order, you're unlikely to do that as few lower-case letters use the top row but numbers, symbols and all capital letters use the top row.

If I add lowercase letters to the large character set, I can do that and I will because there's a 5-pixel gap between each row of characters so it will be possible to use 3 or 4 of those without making it look like they're joined on to each other.

hippy said:
I think it's always nice to add / allow a proportional font. That will make the code more complicated but probably worth it.
The problem is that the PICAXE is already very slow and getting it to change the positioning for the character addresses after a thin letter is used will be a pain especially considering that the GLCD is divided into two 64x64 areas side-by-side which are selected using Chip Select and there will need to be code that checks a certain variable after each byte is sent to the GLCD to move onto the other chip. Fixed width and a satisfactory speed is better than variable width and horribly slow speed.

Big question now... WHY WHY WHY did the designers make the PICAXE system an interpreter based system and not just a bootloader-based system? If the software claims it can convert BASIC to Assembler, then they could have been capable of just having a bootloader that can download the compiled code if download is pressed before applying power to the PICAXE and then execute the code like a normal PIC. Maybe meLabs paid Rev-Ed to not do that because their massively overpriced PicBASIC compiler would no longer get any sales.
 

MartinM57

Moderator
You're welcome to wander off to Arduino-land if you feel hard done by with Rev-Ed's technical decisions (and the background to PICAXEs that you (and many of us) may not be aware of) and you want a bootloader-based relatively easy-to-use entry point to more powerful technology.

Using the wrong tool for a job is as much/more a fault of the person choosing the tool than the deficiencies in the tool itself...;)
 

nick12ab

Senior Member
You're welcome to wander off to Arduino-land if you feel hard done by with Rev-Ed's technical decisions (and the background to PICAXEs that you (and many of us) may not be aware of) and you want a bootloader-based relatively easy-to-use entry point to more powerful technology.
That may be what I need to do if I want to make the text speed on the GLCD any faster. The standard Arduino has 28 pins so will be suitable for the GLCD driver and the chip alone costs around the same as a PICAXE-28X2. Better still, basic to c converters exist and buying a new microcontroller system will be worth it when the 16MHz Arduino is probably around the equivalent of a 3GHz PICAXE.

Back to PICAXE, it's still odd that they choose to make an awkward slow interpreter which can suffer from glitches and faults rather than a bootloader which would have probably made PICAXE less hated among proper technicians. But I should stop complaining about it and work on ways to make execution of the GLCD driver faster. The PICAXE was originally designed for education which it is good for and it is far better than that other microcontroller thing New Wave Concepts released a couple of years ago.

It may be a good idea to upgrade to a 28X2 after all. That would give me a port which is not affected by hardware serial, so no more setting a pin on another port, which will increase speed. I could also use an external crystal and overclock it to 80MHz or even 96MHz using a 20/24MHz crystal, the PLL and no loading capacitors. I have successfully run 40X2s at those speeds and the 28X2 should be even better for this as there is less distance between the pins of the package and the silicon chip inside.
 

MartinM57

Moderator
...a new microcontroller system will be worth it when the 16MHz Arduino is probably around the equivalent of a 3GHz PICAXE.

Back to PICAXE, it's still odd that they choose to make an awkward slow interpreter which can suffer from glitches and faults rather than a bootloader which would have probably made PICAXE less hated among proper technicians...
Interesting - got any real facts on either of those (genuinely interested)....
 

nick12ab

Senior Member
Interesting - got any real facts on either of those (genuinely interested)....
The technicians bit - I asked a technician which was more awful out of PICAXE, Basic Stamp and Genie (the New Wave Concepts thing), and he said that both PICAXE and Genie are equally awful but PICAXE offers the option of a high level language. I do need to find some more technicians to ask as a claim of "hated by 100% of technicians" isn't fair when I only asked one.

The execution speed bit - a microcontroller book I have which covers pic12,pic16,pic17 and pic18 claims that they execute 1,000,000 instructions a second at 4MHz and the PICAXE manual says that the PICAXE only does 1,000 basic instructions a second at 4MHz. Assembler offers commands that move data into or from accumulator and other variable storage, and performs mathematical operations. A high/low PICAXE command would probably be getting the pins-whatever value from the port in question, getting the bit that you want to set/clear, apply that to the variable and then move that variable back into the port. That makes 4 instructions, or 250,000 compiled BASIC instructions, giving an execution speed ratio of 250:1. *16 makes that 4000 or 4GHz. Derated that to 3GHz in case the PICAXE has a more efficient value for it.

And the interpreter glitches... that's why SETFREQ k31 requires a workaround and new versions of each PICAXE are often released to fix bugs, which you of course have to shell out for if you can't live with the bugs because Rev-Ed don't release the code for free, but if you had a PIC programmer, you wouldn't be using PICAXE.

Anyway, this thread was meant to be about the GLCD Serial Driver which connects to the KS0108 controller, which multiplexes 64 lines at less than 1MHz operating speed. The execution speed problems the PICAXE has only become an issue for this... it's hopeless, I'll never be able to think of something positive, but it isn't because if I can harness the PICAXE's power in the right way, I won't need to move to Arduino (possibly).

Maybe we shouldn't argue anymore as that is not the purpose of this thread. Maybe a forum thread or section for command efficiencies and harnessing the power of the PICAXE should be created and that would resolve (almost) everything.

Edit: The 3/4GHz claim - just realized that I calculated it for 64MHz and not 16MHz. So it's actually 1GHz.
 

Dippy

Moderator
"Maybe we shouldn't argue anymore "
- excellent idea. Especially as your execution speed comparisons will send you up a cul-de-sac as they cannot be compared like for like. A simple comparison like that is just simply wrong.
Do you really think that a compiled command is one 'instruction'?

Glitches?
Please tell me of a compiler that hasn't got glitches.
I use 2 C compilers and 2 BASIC compilers. All have glitches.
Some are thanks to Microchip, but there we are.
The only advantage with a compiler is that you can, in many cases work out your own workaround - especially useful after you find a 30 page Errata document :)
Most of us just get on with it.

Anyway, as you say, back to the thread....(after a few more people have leapt onto this argument I'll bet) ;)
 

hippy

Technical Support
Staff member
When optimising for speed, the key issue is identifying where things may be improved for speed. The two key areas will be on a per-character basis ( getting that character, determining its meaning, getting a character tile ) and in putting that character tile data out to the display. For the latter, your code ...

let lcddata = cb1 ^ invertbyte
let pinsB = lcddata
let pinC.1 = bit30
pulsout enablepin,1

There seems to be nothing that can be done about the "pinC.1 = bit30" using a 20X2 as there isn't a byte-wide port which has all pins available for output use. That's a design decision which has to be lived with but using 28X2 would considerably improve things there.

I'm not sure how invertbyte is used but would guess it inverts the display ( black-on-white / white-on-black ). It may be worth considering if that functionality is actually needed, and to have separate routines to output inverted and non-inverted characters if it is.

Because of the 20X2 design decision it's probably not easy to optimise the code when inverted beyond another set of LOOKUP tables for inverted characters, but the output code for non-inverted at least could change from -

let lcddata = cb1
let pinsB = lcddata
let pinC.1 = bit30
pulsout enablepin,1

To -

let pinsB = cb1
let pinC.1 = bit?
pulsout enablepin,1

with optimal placement of 'cb1' as 'b0' to 'b3'. That will get the original 15 tokens of code down to 9 for the 20X2, and would be just 6 for 28X2. That would give a notional 30% speed improvement for the 20X2, 60% speed improvement for a 28X2, for these parts of the code.

There's however a huge overhead in the long LOOKUP's still. Splitting those up as previously suggested will likely slash execution times in half or more.

There seems to be 768 bytes of character tile data per font and it may be possible to go from a 128 character font to fewer; if you can get that tile data down to 512 bytes you can put it in internal Eeprom and Table and make the lookup blindingly fast compared to the current LOOKUP's. A mix of Eeprom, Table and LOOKUP would provide higher average execution speed. There's also the option to move the character tiles to external I2C Eeprom which would still be quicker than long LOOKUP's and be consistent for all characters.
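To make the storage arithmetic concrete, here is a rough Python sketch (not PICAXE code) of how tiles could be mapped onto two stores. It assumes 6 bytes per tile (three words, as in the cw1/cw2/cw3 lookups, so 128 x 6 = 768) and assumes the Eeprom and Table stores are 256 bytes each; the names `tile_addresses` and `max_characters` are made up for illustration.

```python
# Sketch only: models where each byte of a character tile would live if the
# font were packed into two 256-byte stores. Sizes are assumptions.
BYTES_PER_TILE = 6   # three 16-bit words per character (cw1, cw2, cw3)
EEPROM_SIZE = 256    # assumed size of internal data Eeprom
TABLE_SIZE = 256     # assumed size of Table memory


def max_characters(total_bytes=EEPROM_SIZE + TABLE_SIZE):
    """Number of whole tiles that fit in the combined storage."""
    return total_bytes // BYTES_PER_TILE


def tile_addresses(char_index):
    """Return [(store, address), ...] for the 6 bytes of one tile."""
    if not 0 <= char_index < max_characters():
        raise ValueError("character index does not fit in Eeprom + Table")
    start = char_index * BYTES_PER_TILE
    addresses = []
    for offset in range(start, start + BYTES_PER_TILE):
        if offset < EEPROM_SIZE:
            addresses.append(("eeprom", offset))
        else:
            addresses.append(("table", offset - EEPROM_SIZE))
    return addresses


if __name__ == "__main__":
    print("characters that fit:", max_characters())
    print("tile 42 straddles the boundary:", tile_addresses(42))
```

With these assumed sizes one tile straddles the Eeprom/Table boundary, so a real layout might pad the Eeprom part to a whole number of tiles to keep the read routine simple.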

Traditionally, most code is developed top-down, taking what one wants to achieve and deciding how to achieve it. For a high-speed design that often needs to be more bottom-up, determining how something can be done speedily and then fitting the program to that to achieve it. The fastest code for strobing the data out would likely be ...

Read tileptr, pinsB : PulsOut enablepin, 1 : Inc tileptr
Read tileptr, pinsB : PulsOut enablepin, 1 : Inc tileptr
Read tileptr, pinsB : PulsOut enablepin, 1

The challenge is how to achieve that in practice, how to make that so in most cases with slower code more rarely used. It looks to me that there are still plenty of options to significantly improve the execution speed of the code you have.
 

nick12ab

Senior Member
hippy said:
I'm not sure how invertbyte is used but would guess it inverts the display ( black-on-white / white-on-black ). It may be worth considering if that functionality is actually needed, and to have separate routines to output inverted and non-inverted characters if it is.

Because of the 20X2 design decision it's probably not easy to optimise the code when inverted beyond another set of LOOKUP tables for inverted characters
That is correct. Invertbyte is used to invert the display for text, clear screen, clear line and progressbar. The feature does look much better on-screen than it sounds. Removing the inversion feature would give the GLIC-K1 one more advantage over mine.



28X2 seems to be the way to go though. Another thing about the 20X2 is that the Microchip datasheet shows that the port pins are arranged in a strange pattern different from the PICAXE pattern, but on the 28X2 the real pattern is the same as the PICAXE pattern, so this could also be slowing down execution on the 20X2.
 

hippy

Technical Support
Staff member
nick12ab said:
Another thing about the 20X2 is that the Microchip datasheet shows that the port pins are arranged in a strange pattern different from the PICAXE pattern, but on the 28X2 the real pattern is the same as the PICAXE pattern, so this could also be slowing down execution on the 20X2.
That is true, though it should only be a matter of a few microseconds, but where every microsecond counts it can have an impact.

Another advantage of the 28X2 may be the multiple internal slot programs. Though there will be some overhead in execution jumping between slots, the extra code space may make it easier to optimise code for speed without having to worry about memory constraints.

Most programs will use loops and subroutines to keep the program readable and smaller, but that can be ( and sometimes needs to be ) sacrificed for speed gains.
 

nick12ab

Senior Member
What are your views on over-overclocking the 28X2 to speeds such as 80 or 96MHz? I haven't been using the PICAXE system for as long as you so I won't know if it would cause premature failure of the PICAXE or other problems. I would only use the external crystal when processing the data and use the internal resonator when idle or receiving data.
 

MartinM57

Moderator
Overclocking...
Hobby project, just to see if you can/say that you can - yes
Hobby project that you want to rely on - maybe
Hobby project that you publish for others to use and rely on - no
Small commercial venture - don't even think about it
Large commercial venture - you shouldn't even be asking about it

I think Hippy has (as usual, and for nothing) put a lot of effort into giving you some strong pointers on what to look at in your code. I would suggest you give them serious consideration...
 

hippy

Technical Support
Staff member
Some timing results using a logic analyser and examining a single character output ...

Code as latest posted : 10.644ms

Cutting the LOOKUP in half : 6.644ms

Cutting in half again : 4.812ms

Replacing LOOKUP with READ : 3.082ms

Removing the high overhead maths : 2.552ms

So there's a potential for a four-fold increase in speed.

Removing the "^ invertbyte", dropping the redundant moves of data, and the extra bit setting ( ie, simulating a 28X2 using a 20X2 ), the time dropped to 1.828ms per character, a near six-fold increase in speed over the code as posted.

On over-clocking I'd agree entirely with MartinM57. For a one-off home project it's worth trying and if it works it works. I got a 28X2 up to 100MHz which should drop per character to about 1.2ms. With an effective 4 x 16 display at 2ms per character, even a full update would only take 128ms, about a tenth of a second so over-clocking wouldn't really be necessary.

With chopping LOOKUP in half, and half again, that seems to be tending towards the 3ms time of READ so focus on that and then getting rid of the high overhead maths. If you end up with 4ms per character that's still only 256ms for a full screen update, say 500ms if one line is inverted. That's still a three-fold increase over current speed and should be usable enough. You can make that faster if you only update lines which need to change.
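The screen-update figures above are just per-character time multiplied by character count. A small Python sketch of that arithmetic, using the measured per-character times listed earlier in this post and the "effective 4 x 16" layout (the function names are made up for illustration):

```python
# Sketch of the timing arithmetic: per-character time x characters per screen.
CHARS_PER_SCREEN = 4 * 16  # the "effective 4 x 16" text layout


def full_screen_ms(ms_per_char, chars=CHARS_PER_SCREEN):
    """Time to redraw every character once, in milliseconds."""
    return ms_per_char * chars


def speedup(baseline_ms, new_ms):
    """How many times faster new_ms is than baseline_ms."""
    return baseline_ms / new_ms


# Measured per-character times from the logic analyser runs above.
timings_ms = {
    "as posted": 10.644,
    "LOOKUP halved": 6.644,
    "LOOKUP quartered": 4.812,
    "READ instead of LOOKUP": 3.082,
    "READ, high overhead maths removed": 2.552,
}

if __name__ == "__main__":
    baseline = timings_ms["as posted"]
    for label, ms in timings_ms.items():
        print(f"{label}: {speedup(baseline, ms):.2f}x faster, "
              f"full screen {full_screen_ms(ms):.0f} ms")
```

Updating only the lines that changed scales the `chars` argument down accordingly, which is where the largest practical saving is likely to come from.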
 

Dippy

Moderator
I agree - not recommended.
Play by all means but, let's face it, if a PIC was safe/reliable to run overclocked like that don't you think Microchip would be shouting it from the rooftops?

Glad to see you're getting on with GLCD.
Attached is my feeble effort, though it does pictures too.
I'm on line 2343 of code - although 200 must be blanks/comments
 


nick12ab

Senior Member
I don't see how your attempt is feeble - unless it's very slow. Yours has bold and non-bold fonts with variable width characters.

When it comes to overclocking, gamers overclock their PCs and they have to rely on them for gaming, but it's their choice and for a critical component of something like a home automation, a security system or medical equipment, it wouldn't be a good idea. Will replace some of the Lookup commands with read and readtable. I'm not saying that this will be used commercially as microcontrollers used in industry often have enough memory to manage the GLCD directly.

A question - would external EEPROM on the i2c bus (after upgrading to a 28x2) be faster than the lookup tables?

MartinM57 said:
I think Hippy has (as usual, and for nothing) put a lot of effort into giving you some strong pointers on what to look at in your code. I would suggest you give them serious consideration...
Will do.
 

Dippy

Moderator
Well, it took about 200mS to do that screen. Proportional spacing does add a little to the processing.
How long did yours take to do that image you posted?

I have 5 'fonts' ; standard fixed-width font, Arial and Arial Bold and Arial-large and symbols.
All my font data is stored in programme flash space.

One comment I would make; don't get hung up too much on speed. These GLCDs aren't very fast to respond.

I did a similar version into a Densitron OLED. I used a slightly different technique of memory mapping. The result was that a page like that was written in less than 8mS. So the display hardware plays a very significant role in overall speed - as do methods and programming techniques.

And I'd be very surprised if I2C EEPROM was faster than lookup - unless you've written the code badly ;)
 

nick12ab

Senior Member
It took around 700 milliseconds. For speed of the GLCD, the clear screen command affects all of the pixels on the display and that command runs much quicker than putting text on it. Another point is that the GLCD has a 47kohm resistor called Rf which sets the frequency. It isn't discussed in the Samsung datasheet but in a datasheet for Avant's version of the same LCD driver, SBN6400, it says that values between 30k and 47k should be used and the 33k resistor is recommended, where 30k offers the fastest speed and 47k offers the slowest speed. This causes the GLCD to flicker if the screen is viewed from the side or with a lot of white. Not going to risk replacing the resistor as Avant's version may be an improvement over Samsung's version and Samsung's version may not be compatible with the higher speed.
 

Dippy

Moderator
Well it's obvious why ClearScreen is much faster yes?

Are you just writing bytes to the (C)GRAM?
In mine I read/write GraphicsRAM with bitwise operation on PIC - thus you can superimpose lines etc. onto letters.
That's my 'bottleneck'.

By storing a memory block, bitwising in RAM, then sending the block you can increase speed hugely.
In my example that method would allow a screen refresh in under 100mS.
Unfortunately, the pixel response times for the GLCD screen are not fast enough to permit acceptable 10fps animation.

On the electrical side of things; leave it alone. Tweaking could end in tears and it ain't the GLCD that is slowing your execution.
 

nick12ab

Senior Member
The GLCD is written to in vertical columns of 8 pixels using the 8 pins of the data bus. So are you saying that you store the data in the PICAXE RAM before sending it to the GLCD?
 

Dippy

Moderator
I'm afraid the mirror CGRAM technique isn't possible in PICAXE.

The vertical column Byte format is very common with GLCDs and OLEDs.

When you haven't got bundles of memory the standard technique is to Read_GRAM_Byte--->bitwise_your_new_bit--->Write_CGRAM_Byte.
When you have got oodles of RAM you can leave out the Read_CGRAM.
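As a rough illustration of that read, bitwise, write sequence, here is a Python sketch that uses a bytearray as a stand-in for the display RAM. It assumes the vertical-byte layout discussed above (8 pages of 128 column bytes) and assumes bit 0 is the top pixel of each byte; `set_pixel` and `byte_index` are hypothetical names, not part of any library.

```python
# Sketch: read-modify-write of a single pixel in a vertical-byte display RAM.
WIDTH, HEIGHT = 128, 64
PAGES = HEIGHT // 8  # each page is one row of 8-pixel-tall column bytes


def byte_index(x, y):
    """Index of the byte holding pixel (x, y): page row first, then column."""
    return (y // 8) * WIDTH + x


def set_pixel(gram, x, y, on=True):
    """Set or clear one pixel without disturbing the other 7 in its byte."""
    i = byte_index(x, y)
    mask = 1 << (y % 8)      # assumption: bit 0 is the top pixel of the byte
    value = gram[i]          # Read byte
    if on:
        value |= mask        # bitwise your new bit in
    else:
        value &= ~mask & 0xFF
    gram[i] = value          # Write byte back


if __name__ == "__main__":
    gram = bytearray(WIDTH * PAGES)
    set_pixel(gram, 5, 10)
    set_pixel(gram, 5, 11)
    print(bin(gram[byte_index(5, 10)]))
```

With a RAM mirror the "Read" step is a local array access; without one it is a read back from the display itself, which is the slow part.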
 

hippy

Technical Support
Staff member
Possible on some, but probably not convenient.

A 128x64 bit display needs 1024 bytes of storage so a 28X2 scratchpad could hold the screen image, but then you couldn't use it for received serial data. Ideally one would have two image stores; last sent and latest required and only send things which have changed.
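The "two image stores" idea can be sketched in Python as follows. This only models the comparison, not how it would be coded on a PICAXE, and it assumes the 1024 bytes are arranged as 8 pages of 128 column bytes; `changed_bytes` is a made-up name.

```python
# Sketch: compare "last sent" and "latest required" images, send only changes.
WIDTH, HEIGHT = 128, 64
SCREEN_BYTES = WIDTH * HEIGHT // 8  # 1024 bytes for a 128x64 display


def changed_bytes(last_sent, required):
    """Yield (page, column, value) for every byte that differs."""
    if len(last_sent) != SCREEN_BYTES or len(required) != SCREEN_BYTES:
        raise ValueError("both images must be exactly one screen in size")
    for i, (old, new) in enumerate(zip(last_sent, required)):
        if old != new:
            yield (i // WIDTH, i % WIDTH, new)


if __name__ == "__main__":
    last_sent = bytearray(SCREEN_BYTES)
    required = bytearray(SCREEN_BYTES)
    required[0] = 0xFF
    required[130] = 0x01
    for page, column, value in changed_bytes(last_sent, required):
        print(f"page {page}, column {column}: write {value:#04x}")
```

After sending, the required image would be copied over the last-sent one so the next comparison starts from what is actually on the display.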

Where using GLCD as a text display one needs fewer bytes to throw away unchanged characters but one still has to be able to find that storage.
 

nick12ab

Senior Member
Problem

Divided the LOOKUP command into sections:
Code:
if executebyte < 32 then
	lookup executebyte,(0,$1020,$4444,$1474,$1D15,$1515,$1516,$1038,$FE83,$2214,$0808,$7F10,$4E71,$0002,$0808,$0000,$0808,$0000,$0000,$0808,$0808,$0808,$0000,$0808,$0808,$FFFF,$3E41,$3E41,$7F7F,$7F41,$7F00,$487E,$0000,$0000),cw1
	lookup executebyte,(0,$7F01,$5F44,$1C17,$1700,$1F00,$7C16,$5410,$8183,$0814,$2A08,$100F,$0171,$0502,$0808,$FF00,$FF08,$F808,$0F08,$0F00,$F800,$FF00,$FF08,$0F08,$F808,$FFFF,$7549,$5D49,$7F7F,$4141,$7F00,$4949,$0000,$4F00),cw2
	lookup executebyte,(0,$0100,$4400,$1400,$0000,$0000,$1500,$1F00,$FE00,$2200,$0800,$1000,$4E00,$0000,$0808,$0000,$0808,$0808,$0808,$0000,$0000,$0000,$0808,$0808,$0808,$FFFF,$3E00,$3E00,$7F00,$7F00,$7F00,$4200,$0000,$0000),cw3
elseif executebyte < 64 then
	executebyte = executebyte + 32
	lookup executebyte,($0000,$0000,$0003,$147F,$242A,$2313,$3649,$0005,$001C,$0041,$1408,$0808,$0050,$0808,$0060,$2010,$7F41,$0000,$7949,$4149,$0F08,$4F49,$7F49,$0101,$7F49,$0F09,$0036,$0056,$0814,$1414,$0041,$0201,$3E41),cw1
	lookup executebyte,($0000,$4F00,$0003,$147F,$7F2A,$0864,$5522,$0300,$2241,$221C,$3E08,$3E08,$3000,$0808,$6000,$0804,$4141,$0000,$4949,$4949,$0808,$4949,$4949,$0101,$4949,$0909,$3600,$3600,$2241,$1414,$2214,$5109,$5D55),cw2
	lookup executebyte,($0000,$0000,$0000,$1400,$1200,$6200,$5000,$0000,$0000,$0000,$1400,$0800,$0000,$0800,$0000,$0200,$7F00,$7F00,$4F00,$7F00,$7F00,$7900,$7900,$7F00,$7F00,$7F00,$0000,$0000,$0000,$1400,$0800,$0600,$1E00),cw3
elseif executebyte < 96 then
	executebyte = executebyte + 64
	lookup executebyte,($0201,$3E41,$7C12,$7F49,$3E41,$7F41,$7F49,$7F09,$3E41,$7F08,$0041,$2040,$7F08,$7F40,$7F02,$7F04,$3E41,$7F09,$3E41,$7F09,$4649,$0101,$3F40,$1F20,$3F40,$6314,$0304,$6151,$007F,$0204,$0041,$0402,$4040),cw1
	lookup executebyte,($5109,$5D55,$1112,$4949,$4141,$4122,$4949,$0909,$4949,$0808,$7F41,$413F,$1422,$4040,$0C02,$0810,$4141,$0909,$5121,$1929,$4949,$7F01,$4040,$4020,$3840,$0814,$7804,$4945,$4141,$0810,$417F,$0102,$4040),cw2
	lookup executebyte,($0600,$1E00,$7C00,$3600,$2200,$1C00,$4100,$0100,$7A00,$7F00,$0000,$0100,$4100,$4000,$7F00,$7F00,$3E00,$0600,$5E00,$4600,$3100,$0100,$3F00,$1F00,$3F00,$6300,$0300,$4300,$0000,$2000,$0000,$0400,$4000),cw3
else
	executebyte = executebyte + 96
	lookup executebyte,($4040,$0000,$2054,$7F48,$3844,$3844,$3854,$087E,$0C52,$7F08,$0044,$2040,$7F10,$0041,$7C04,$7C08,$3844,$7C14,$0814,$7C08,$4854,$043F,$3C40,$1C20,$3C40,$4428,$0C50,$4464,$081C,$0402,$0808,$0804,$1020),cw1
	lookup executebyte,($4040,$0305,$5454,$4444,$4444,$4448,$5454,$0901,$5252,$0404,$7D40,$443D,$2844,$7F40,$1804,$0404,$4444,$1414,$1418,$0404,$5454,$4440,$4020,$4020,$3040,$1028,$5050,$544C,$2A08,$7F02,$2A1C,$0810,$7F20),cw2
	lookup executebyte,($4000,$0000,$7800,$3800,$2000,$7F00,$1800,$0200,$3E00,$7800,$0000,$0000,$0000,$0000,$7800,$7800,$3800,$0800,$7C00,$0800,$2000,$2000,$7C00,$1C00,$3C00,$4400,$3C00,$4400,$0800,$0400,$0800,$0800,$1000),cw3
end if
and when I try to get all the characters to show on the display like in the video as before, the first set of characters show but the £ sign replaces all the other characters. The £ sign is the last character of the first lookup group.

I cannot understand why this doesn't work.
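One possible explanation, offered as a guess rather than a confirmed diagnosis: each later branch adds 32, 64 or 96 to executebyte where a subtraction would be needed to rebase it to 0..31 for that group's list. If LOOKUP leaves its variable unchanged when the offset is past the end of the list (that behaviour is an assumption here), then cw1 to cw3 would keep whatever the first group last put in them. A small Python model of that behaviour, with a stand-in list in place of the real tile data:

```python
# Sketch: a model of LOOKUP, assuming an out-of-range offset leaves the
# target variable unchanged. The data below is a stand-in, not real tiles.
def lookup(offset, data, current):
    """Return data[offset], or the current value if offset is out of range."""
    if 0 <= offset < len(data):
        return data[offset]
    return current


group2 = list(range(1000, 1033))  # stand-in for one 33-entry LOOKUP list

cw1 = 0x487E                      # placeholder for whatever cw1 last held
executebyte = 40                  # a character in the 32..63 branch

as_posted = lookup(executebyte + 32, group2, cw1)  # offset 72: out of range
rebased = lookup(executebyte - 32, group2, cw1)    # offset 8: in range

if __name__ == "__main__":
    print("offset as posted:", executebyte + 32, "->", hex(as_posted))
    print("offset rebased:  ", executebyte - 32, "->", rebased)
```

If that is what is happening, changing the three additions to subtractions (for example executebyte - 32 in the second branch) would be the first thing to try.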
 