Now that we have an idea of what instructions we're going to implement, we have a real challenge in front of us. How do we represent these in memory so that an interpreter can execute them? (Yes, I know I short-circuited some design decisions with this -- we could generate actual machine code for the PLC processor, but we'll go with an interpreter instead, making things a tad simpler and more portable.) I spent a lot of time on the earlier projects trying to figure out what the AB PLC-5 binary representation might be, based on the external documentation including instruction size. I still can't figure out how they get some of those things packed so tight. Again, that was a swamp that slowed me way down. Besides, I've done assembly language programming for years and have a pretty good idea of what the trade-offs might be.
Trust me, there are trade-offs. Lots of them. We'll start with the size of the address space. How big do we make it? A larger address space means (generally) bigger instructions to reference those addresses. A small address space can make programming a real challenge. Back in the day, I worked on an IBM S/360 Model 22 with a whopping 64K of memory -- that was enough to run the business office of a major university, but dang was it cramped. Fortunately, PLC programs tend to be on the smaller side; we're not trying to program IBM's Watson into this. The word size of our PLC is 16 bits, to be consistent with many of the PLCs out there, which gives us a 64K address space. Many (most) of those put data and program memory in that same space; one of the design criteria for BASPLC2 is that we'll keep them separate, so that gives us a full 64K for data and, effectively, unlimited program space. That should be enough for any reasonable program. The PLC-5 allows up to 1000 files in the data space. Just so that we don't waste bits, we'll allow 2^12 (4096) files. While that's overkill, the next 4-bit increment down would be 2^8 (256), which is pushing things a bit too small.
We could go for extreme efficiency in program size by being clever in how we encode instructions and addresses. Since we've just given ourselves a lot of program space (limited to the available physical or virtual memory of the host processor), we don't need to worry too much about that. I actually started with an implementation that broke an instruction down to the nybble (4-bit unit) basis, but found that decoding that was a serious pain and added complexity where it wasn't necessary. There's a little bit of that left over, but it's now constrained. So, how do we represent the instructions? Let's start with the most complex part, addresses and literals. These are the operands for the instructions and are the most complex to implement.
There are several "addressing modes" available in PLCs. First, we could have a literal value, usually a 16-bit integer or 32-bit float. Addresses can be much more complex, though. First off, the data space is divided into files, each file being a collection of elements of the same type. So, there is an integer file that is all integers, a bit file that is all bits, a timer file that is all timers and so forth. We can have multiples of each type (except for input and output) but the data type within a file is consistent. So, we'll address a file with a type and number like this: "B3" or "N7", representing file #3 which is binary and file #7 which is integer. "T4" is a file of timers. Now, we'd like to address a single word or element in a file. "B3:2" is the 3rd word (zero based) in file B3. "T4:5" is the 6th timer in that timer file. Now we have a bit of a split between the scalar types like binary and integer and the "structure" types like timers and counters.
For scalar types we frequently want to address a single bit. This is especially true for input, output and binary files and occasionally true for integer. If you're playing with bits in a float, good luck to you. To address a bit, we add a bit number, "B3:2/7". This is now the eighth bit in the third word of the binary file. Note that there's a little confusion at times, because sometimes that bit number is addressed in octal (base 8) and sometimes in base 10. That means that "11" could mean decimal 9 (in octal) or decimal 11. In any case, there are 16 bits in each word.
For structured types, we need to be able to address both bits and certain larger elements, such as the accumulated count in a counter. We use a different notation for this. "T4:5.EN" is the enabled bit for the timer, while "C5:2.ACC" is the accumulator for the counter, a 16-bit integer word. It's important, obviously, that the symbol used match the context -- having an XIC instruction referencing T4:5.EN is fine, but referencing C5:2.ACC is not.
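The direct (non-indirect) notation so far is regular enough to pick apart with one pattern. The following is a rough Python sketch, not the BASPLC2 parser: the `Address` tuple, the `parse_address` name and the regular expression are all assumptions of mine, and bit numbers are read as decimal for simplicity.

```python
import re
from typing import NamedTuple, Optional

# Type letter, file number, optional ":element", then optionally
# either "/bit" or ".FIELD" (never both).
ADDRESS = re.compile(
    r"^(?P<type>[A-Z])(?P<file>\d+)"
    r"(?::(?P<element>\d+)"
    r"(?:/(?P<bit>\d+)|\.(?P<field>[A-Z]+))?)?$"
)


class Address(NamedTuple):
    file_type: str
    file_number: int
    element: Optional[int]
    bit: Optional[int]
    field: Optional[str]


def parse_address(text):
    """Split a direct address such as "B3:2/7" or "T4:5.EN" into parts."""
    m = ADDRESS.match(text)
    if m is None:
        raise ValueError(f"not a direct address: {text!r}")
    element = int(m["element"]) if m["element"] is not None else None
    bit = int(m["bit"]) if m["bit"] is not None else None
    if bit is not None and bit > 15:
        raise ValueError(f"bit number out of range in {text!r}")
    return Address(m["type"], int(m["file"]), element, bit, m["field"])
```

For example, `parse_address("B3:2/7")` yields type "B", file 3, element 2, bit 7, while `parse_address("T4:5.EN")` yields a field of "EN" and no bit number. Checking that a field such as ACC is legal for the instruction using it would be a separate step.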
So far, so good. We've got literal, file, word, bit and field addressing. These are all "immediate." But what if I want to use one word to indicate what word in another file should be referenced? This comes into play a lot when you have looping. For that, we have "indirect" addressing. Just to make things entertaining, we can make the file number, element number and bit number indirect (although there are issues making timer, counter and control element numbers indirect -- some PLCs forbid this.) In a ladder logic program, we use a bracket notation to indicate this. "B[N7:2]:2/5" is the sixth bit of the third word of the file whose number is stored in N7:2. Which had darn well better reference a binary file or you're messed up. Indirect addressing has its drawbacks. "B3:[N7:4]/2" references the third bit in a word whose number is stored in N7:4 in the file B3. N7:4 had better not be negative or equal to the size of B3 or greater. More things to watch out for in indirect addressing. We can do the same thing with a bit number, where the value of the indirect must be between 0 and 15. Note that we cannot do that with the fields of a structured type. At best, we can make the file or element number indirect. If we really want to make things hard to debug, we can combine any and all of these. "B[N7:2]:[N7:4]/[N7:6]" is a legitimate reference, but figuring out what it's looking at is not so much fun.
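Here is a small Python sketch of the run-time checks that indirect addressing forces on an interpreter. Everything in it is an assumption for illustration: the data space is modeled as a dict from file number to a (type, words) pair, and a reference is either a plain int (direct) or a (file number, element) pair naming the word that holds the value (indirect). It is not how BASPLC2 stores anything.

```python
class AddressFault(Exception):
    """Raised when a run-time address check fails."""


def operand(files, ref):
    """Return ref itself if it is direct, or the word it points at if indirect."""
    if isinstance(ref, tuple):
        file_number, element = ref
        return files[file_number][1][element]
    return ref


def read_bit(files, file_type, file_ref, element_ref, bit_ref):
    """Resolve something like B[N7:2]:[N7:4]/[N7:6] and return the bit (0 or 1)."""
    file_number = operand(files, file_ref)
    if file_number not in files or files[file_number][0] != file_type:
        raise AddressFault(f"file {file_number} is not a {file_type} file")
    words = files[file_number][1]

    element = operand(files, element_ref)
    if not 0 <= element < len(words):
        raise AddressFault(f"element {element} is outside file {file_number}")

    bit = operand(files, bit_ref)
    if not 0 <= bit <= 15:
        raise AddressFault(f"bit {bit} is outside 0..15")

    return (words[element] >> bit) & 1


# B3 has three words; N7:2, N7:4 and N7:6 hold 3, 2 and 5.
files = {
    3: ("B", [0, 0, 0b100100]),
    7: ("N", [0, 0, 3, 0, 2, 0, 5]),
}
fully_indirect = read_bit(files, "B", (7, 2), (7, 4), (7, 6))  # B[N7:2]:[N7:4]/[N7:6]
element_indirect = read_bit(files, "B", 3, (7, 4), 2)          # B3:[N7:4]/2
```

Each of the three checks corresponds to one of the "had better" warnings above: the file must exist and be the right type, the element must be inside the file, and the bit must be 0 through 15.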
Finally, we have "indexed" addressing. In some situations, this is a rough equivalent to an indirect element address where there is a special value used for the indirect; this value comes from the status file. "#B3:0/5" would be the word in B3 addressed by this special index in the status file. So, roughly "B3:[S2:17]/5". I should note that the index register doesn't exist in the Micrologix series of PLCs, so it clearly wasn't that useful. The syntax remains, though, for some instructions that start in a certain place and cover a set of elements. That's just syntactic sugar and doesn't even enter into the binary form of the address. Although I doubt that I'll ever implement actual indexing, I'll leave space for it in the design.
Now, what do we do with this? We'll apply a few design requirements and see what comes out. First, we want this to be compact. Second, easily decoded. Some expansion room would be a nice-to-have. We'll use a 4-bit type code (giving us 16 possible types) and then fill in the rest as needed. The whole result will be byte-aligned although we will have some nybble-alignment within an address. The following picture shows what I propose.
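As a feel for how the nybble arithmetic works out, a 4-bit type code and a 12-bit file number (the 4096 files allowed earlier) fit exactly into two bytes. The Python sketch below shows only that one packing. The function names, the type-code values and the big-endian byte order are assumptions of mine; this is not a transcription of the layout in the picture.

```python
def pack_file_ref(type_code, file_number):
    """Pack a 4-bit type code and a 12-bit file number into two bytes."""
    if not 0 <= type_code <= 0xF:
        raise ValueError("type code must fit in 4 bits")
    if not 0 <= file_number <= 0xFFF:
        raise ValueError("file number must fit in 12 bits")
    value = (type_code << 12) | file_number
    return value.to_bytes(2, "big")  # byte order is an assumption


def unpack_file_ref(data):
    """Inverse of pack_file_ref: return (type_code, file_number)."""
    value = int.from_bytes(data[:2], "big")
    return value >> 12, value & 0xFFF


# Hypothetical type code 1 with file number 3.
encoded = pack_file_ref(1, 3)
decoded = unpack_file_ref(encoded)
```

Decoding is a shift and a mask, which is about as much nybble-level work as is worth doing in an interpreter.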
When we compile this, we'll turn all direct element references into absolute words. That is, the offset of the start of the element from the beginning of the data space. Of course, this will depend on the size of each file, but it's easy to calculate during compilation and very easy to use in the interpreter. An indirect reference then contains an absolute reference. For instance, "B3:[N:7]" will have the "N:7" compiled as an absolute offset in the data space where we get the indirect value, while the B3 part is compiled as a file number. We have to add a fudge factor for structured types since the element number references a block of 3 words and we want one specific word within that. If you look at the "TF:[TF:E]" line in the table, you'll see how we encode something like "B3:[N:7]" or "T4:[N:7].ACC".
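The absolute-offset calculation can be sketched in a few lines of Python. The names below are mine, and the per-type sizes and field offsets are assumptions: one word per binary or integer element, three words per timer or counter element, with PRE at word 1 and ACC at word 2 of the block. Other file types are left out, and the files are simply laid out back to back in the order given.

```python
# Assumed sizes: scalar elements take one word, timers and counters three.
WORDS_PER_ELEMENT = {"B": 1, "N": 1, "T": 3, "C": 3}

# Assumed "fudge factor" for fields within a three-word structured element.
FIELD_OFFSET = {"PRE": 1, "ACC": 2}


def layout(file_list):
    """Assign each file a base offset in the data space.

    file_list holds (type, number, element_count) tuples.
    Returns ({number: (type, base)}, total_words).
    """
    bases = {}
    offset = 0
    for file_type, number, count in file_list:
        bases[number] = (file_type, offset)
        offset += count * WORDS_PER_ELEMENT[file_type]
    return bases, offset


def absolute(bases, number, element, field=None):
    """Compile a direct element reference into an absolute word offset."""
    file_type, base = bases[number]
    fudge = FIELD_OFFSET[field] if field else 0
    return base + element * WORDS_PER_ELEMENT[file_type] + fudge


bases, total = layout([("B", 3, 10), ("T", 4, 5), ("N", 7, 20)])
b3_2 = absolute(bases, 3, 2)             # B3:2
t4_2_acc = absolute(bases, 4, 2, "ACC")  # T4:2.ACC
n7_0 = absolute(bases, 7, 0)             # N7:0
```

The interpreter then only ever sees a word offset for direct references; the file sizes matter at compile time and for the bounds checks on indirect references.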
(Edit: Ooops... I stopped before I finished the design!)
Now let's move on to the instructions themselves. As I said earlier, I tried a nybble-based approach with the most common instructions requiring just one nybble of opcode, some taking 2, some taking 3 and some taking 4. Nice for packing, not so nice for unpacking. Really not necessary, either. A single byte would give us 256 instructions. The first two phases are just about 50 instructions. If we need more than 256 distinct instructions then we're really, really, really over-designing something. The only extra stuff we need for the first two phases is a nybble of zeros for the SBR and RET instruction parameter counts and two nybbles of zero for the SBR instruction (input and output parameters.) That's it. One byte and a touch extra is all we need.
If you've got questions, ask them in the comments. I'll go into more detail as required, including examples.