AT&T Assembly Language
(c)1997 Jeff Weeks and Code X software

Introduction

Assembly language is often thought of as the ultimate language: perfectly suited to using every last drop of your computer's power with the least amount of code. This power comes with a price, though; assembly is not easy to learn. Not only that, but it is not standardized or portable either. Different processors typically use different assembly languages. And in the case of Intel processors, there are actually two syntaxes for the same instruction set: Intel assembly and AT&T assembly.

It seems that all assembly language tutorials preach Intel assembly. Why? Quite frankly, there is no real reason. Intel and AT&T assembly are essentially the same thing, apart from a few differences. Neither one is better than the other; each has its own advantages. Intel assembly is more widely accepted, but that seems to be changing with the introduction of DJGPP and the discovery of the excellent GCC compiler suite. Intel assembly does lack the consistency that AT&T assembly has, however. Even the two biggest-name assemblers, TASM and MASM, use slightly different syntax to get things done, while AT&T syntax remains the same across platforms. Not only that, but AT&T syntax interfaces with C compilers better. And lastly, AT&T assembly is quite possibly easier to read because of its operand-size suffixes.

The CPU

Before we learn about AT&T assembly itself, we must first stop to understand the CPU we're programming for. I, of course, will concentrate on the Intel platform. The things to note are how the processor works, and how you, as a programmer, can take advantage of it.

The first thing worth mentioning is the registers. Have you ever wondered how you would send parameters to a CPU instruction? You may already know that parameters to a C/C++ function, for example, are typically passed through your computer's memory (the stack, in fact). An Intel instruction, however, usually works on data held inside the processor itself. Intel processors, like most other processors, have a small amount of special storage on the processor chip, and within it reside the CPU's registers. These registers are simply locations where data can be put to communicate with the processor.

If that doesn't make sense now, be assured that it will once we get more in depth with registers. They are really a simple concept; just think of them like variables in a C/C++ program. Anyway, your Intel processor uses special registers for special purposes. The four main general-purpose registers are simply called ax, bx, cx and dx. These will be the registers we start out with. Later on, more specialized registers will be introduced.

There is one special register that will be introduced now: the flags register. Its purpose is simple; it contains information about the current status of the CPU. We'll look into it more later.

Hexadecimal

It's unfortunate that every book or tutorial on assembly must start out with a description of the hexadecimal number system, but, quite frankly, it is unavoidable.

You may have heard of hexadecimal before. It is a number system, just like the decimal system... almost. You see, the hexadecimal number system has 16 digits, while the decimal number system has only 10. 'Why do I have to know this?' you ask. Well, hex actually does have a place in assembly language. In computer systems you will often be dealing with very large numbers, and in such situations it makes sense to use hex because it writes them more compactly.

'Okay then, how do I use it?' That may not be as easy as understanding its uses. Hexadecimal is actually a fairly simple concept, but it may take a while to fully grasp. So, without further ado, I will try to explain it.

Count to twenty! 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20. 'Simple,' you say. Okay, now, how did you know to go to '10' from 9? Most of you do this without even thinking, but what are we actually doing when we go to 10 from 9? The answer is simple. Once we've gotten to 9, we've run out of digits. The only way we can get another number is to restart from scratch, back at 0 again. However, to distinguish 10 from 0, we add a 1 in front of it. This is a reminder to us that we've run out of digits once already, so to speak.

So, what if we had a number system with 16 digits (0 1 2 3 4 5 6 7 8 9 A B C D E and F)? How would we count to 20 then? Well, we'd go like this: 1 2 3 4 5 6 7 8 9 A B C D E F 10 11 12 13 14. Of course, that may look strange to you, but if we apply our simple 9-to-10 logic, it works perfectly well. Once we pass 9 in hexadecimal, we haven't run out of digits, so we continue on until we reach F, our last digit. Now we have to restart at 0, but remember, we add a 1 in front to tell us that this isn't 0, it's 10h (note the h, which signifies we're working in hexadecimal rather than decimal).

Okay, that's pretty simple, right? Perhaps you're still a little puzzled by this, so let's do a little practicing. What does the number 4Eh look like in decimal?

Well, let's think logically here. Think back to the regular decimal system, where you have a 1's, 10's, and 100's column, and so on. Now try to apply this to the hexadecimal system. The E, which represents 14, is in the 1's column. This column never changes between number systems; the first column is always the 1's column. But what about the next column? It should be obvious that this is the 16's column. After all, it tells us how many times we've gone through all the digits, of which there are 16.

So, using standard place value, we get the following (4 in the 16's column, E in the 1's column):

    4 x 16 + 14 x 1 = 78

Okay, you should be getting the hang of it now, so let's try one more simple example. What is 25Fh in decimal? Let's check it out:

    2 x 256 + 5 x 16 + 15 x 1 = 607
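The same place-value arithmetic can be written as a short C function. This is a minimal sketch (hex_to_decimal is a hypothetical helper, not part of any library): instead of looking up 256, 16 and 1 for each column, it multiplies the running total by 16 before adding each new digit, which works out to exactly the same sum.

```c
/* hex_to_decimal: hypothetical helper for illustration.
 * Converts a string of hex digits (no "h" suffix, no "0x" prefix) to a
 * number. Multiplying the total by 16 before adding each digit moves
 * every earlier digit one column (one power of 16) to the left. */
unsigned long hex_to_decimal(const char *hex)
{
    unsigned long value = 0;

    for (; *hex != '\0'; hex++) {
        int digit;

        if (*hex >= '0' && *hex <= '9')
            digit = *hex - '0';
        else if (*hex >= 'A' && *hex <= 'F')
            digit = *hex - 'A' + 10;
        else if (*hex >= 'a' && *hex <= 'f')
            digit = *hex - 'a' + 10;
        else
            break;                      /* stop at the first non-hex character */

        value = value * 16 + digit;     /* shift one column left, add digit */
    }
    return value;
}
```

For example, hex_to_decimal("4E") returns 78 and hex_to_decimal("25F") returns 607, matching the hand calculations above.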
Okay, now let's take this one step further. Try to convert 607 back into hexadecimal. To do this, we must realize that hexadecimal is base 16 (as opposed to decimal being base 10), so each place in a place value chart is a power of 16. To prove this, just calculate it out:

    The first place  = 16^0 = 1
    The second place = 16^1 = 16
    The third place  = 16^2 = 256
    The fourth place = 16^3 = 4096
    and so on

So, to convert from decimal to hex, we must find the highest place value that will fit in 607. By looking at the chart above, we can see that 256 is the highest place value that will fit in 607. And it fits in two times (2 x 256 = 512), so we get 200h with 95 left over.

Now repeat this process with the 95 left over. What is the highest place value that will fit in 95, and how many times does it fit? Well, 16 will fit in 95 five times (5 x 16 = 80), so now we have 250h with 15 left over.

Again, repeat the process with the 15 left over. Only 1 will fit now, but it'll fit 15 times. And what is 15 in hex? Think back and you'll remember that it is F, so we finally get the answer of 25Fh.
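Here is the same repeated "highest place value" method as a small C sketch. decimal_to_hex is a hypothetical name, and the code assumes the caller passes a buffer big enough for every hex digit of an unsigned long plus the terminating zero (17 chars when unsigned long is 64 bits).

```c
/* decimal_to_hex: hypothetical helper for illustration.
 * Writes the hex digits of n into buf using the method from the text:
 * find the highest power of 16 that fits, count how many times it fits
 * (that count is the digit), keep what is left over, and repeat with the
 * next lower power of 16. */
void decimal_to_hex(unsigned long n, char *buf)
{
    const char digits[] = "0123456789ABCDEF";
    unsigned long place = 1;
    int i = 0;

    /* Find the highest place value (power of 16) that fits in n. */
    while (place <= n / 16)
        place *= 16;

    /* Walk down through the place values: 256, 16, 1 for n = 607. */
    for (; place > 0; place /= 16) {
        buf[i++] = digits[n / place];   /* how many times it fits */
        n %= place;                     /* what is left over      */
    }
    buf[i] = '\0';
}
```

With n = 607 the loop produces 2 (256 fits twice, 95 left), then 5 (16 fits five times, 15 left), then F, leaving "25F" in buf.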
Binary

'Not another number system!' you scream. Sadly, yes, binary is yet another number system. But with our experience in hexadecimal, binary should be a piece of cake, right? :)

Okay, let's take a look at binary, counting from 1 to 10: 1 10 11 100 101 110 111 1000 1001 1010. Sure, that may look strange, but just apply the same logic. What are the place values for a base 2 system?

    The first place  = 2^0 = 1
    The second place = 2^1 = 2
    The third place  = 2^2 = 4
    The fourth place = 2^3 = 8
    and so on

So this should all make sense. However, for practice, let's take two examples. Let's convert 10111b to decimal:

    1 x 16 + 0 x 8 + 1 x 4 + 1 x 2 + 1 x 1 = 23

And now let's convert 102 to binary. What is the highest place value that will fit in 102? 64 (2^6) will go once, so we get 1000000b with 38 left over. Now what will fit in 38? 32 (2^5) will fit once, giving us 1100000b with 6 left over. Now, for simplicity, I'll just finish that off as 1100110b. If you wish, you can fill in the steps, but it should be obvious that 6 breaks down into 4 and 2, which just happen to be place values in binary, so you can plug them in accordingly.
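Both directions can be sketched in C the same way as for hex, just with 2 in place of 16. The function names are hypothetical, and the buffer size assumes unsigned long is at most 64 bits (64 digits plus the terminating zero).

```c
/* decimal_to_binary: hypothetical helper for illustration.
 * Writes n as binary digits into buf (at least 65 chars): find the
 * highest power of 2 that fits, then for each place value write a 1 if
 * it fits into what is left (and subtract it) or a 0 if it doesn't. */
void decimal_to_binary(unsigned long n, char *buf)
{
    unsigned long place = 1;
    int i = 0;

    while (place <= n / 2)          /* highest power of 2 that fits in n */
        place *= 2;

    for (; place > 0; place /= 2) {
        if (n >= place) {           /* this place value fits: write a 1 */
            buf[i++] = '1';
            n -= place;
        } else {                    /* it doesn't fit: write a 0 */
            buf[i++] = '0';
        }
    }
    buf[i] = '\0';
}

/* binary_to_decimal: hypothetical helper for illustration.
 * Converts a string of binary digits (no "b" suffix) back to a number,
 * doubling the total (one column left) before adding each digit. */
unsigned long binary_to_decimal(const char *bits)
{
    unsigned long value = 0;

    for (; *bits == '0' || *bits == '1'; bits++)
        value = value * 2 + (unsigned long)(*bits - '0');
    return value;
}
```

decimal_to_binary(102, buf) produces "1100110", and binary_to_decimal("10111") returns 23, as in the worked examples.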
The only thing left to say is this: one binary digit is commonly called a bit. I'm sure you've heard the term many times, so now you know what it means. It'll come up quite often when programming in assembly.

Inline Assembly

Okay, okay, I know you're aching to actually get something done in AT&T assembly, so I'll begin now by introducing inline assembly. Inline assembly is the name given to assembly language that is 'inlined', or embedded, into an existing program, usually a C or C++ program. GCC, which is the compiler I will be using for the examples, allows this. Let's look at an example.

    int main(void)
    {
        int one = 1, two = 2;

        asm ("addl %%ebx, %%eax"   /* eax = eax + ebx */
             : "+a" (one)          /* outputs: one lives in eax */
             : "b" (two)           /* inputs: two is copied into ebx */
             : "cc");              /* registers modified: the flags */
        return 0;
    }

Well, I'm sure that's a mouthful to digest. Let's check it out step by step, though. First of all, we define our variables (one and two). Nothing new there. The next part may or may not confuse you, but its purpose is actually very simple. The "asm" keyword tells GCC that you are beginning an inline assembly block, and the "(" is simply the bracket that marks where the block begins. Simple, right?

The string that comes next ("addl %%ebx, %%eax") is actually AT&T assembly! It uses the ADD instruction, which adds two values and leaves the result in the destination operand (operands are what we call the "parameters" to instructions). The doubled percent signs are a GCC requirement that I'll explain shortly; the assembler itself just sees addl %ebx, %eax. All this will be discussed later. For now, I'm just trying to communicate the structure of an inline assembly block.

Now, after the instructions (we've only got one, of course) come the constraint fields, separated by colons. GCC's inline assembly has three of them: first the outputs, then the inputs, then the registers modified. Here, the output field holds one: the "a" says it lives in the ax register, and the "+" says it is both read and written, since it is an input to the add and also receives the result. The input field holds two, and the registers modified field lists only "cc", which tells GCC that the flags register changes (ADD updates it).

The closing ");" simply ends the inline assembly block. So, you see, the only things that really need further explanation are the constraint fields; everything else is fairly simple. So, that is what I will explain now.

The constraint fields exist to give your assembly block access to variables from your C program. If you wish to give an input to your assembly block, you add it to the input field. It should be obvious that the name in the brackets is the actual C/C++ variable, but what is the quoted part for? Well, the quoted part tells GCC where you want your input put. You can use any of the following:

    "r"   put the input into any available register
    "a"   put it into the ax register
    "b"   put it into the bx register
    "c"   put it into the cx register
    "d"   put it into the dx register
    "D"   put it into the di register (discussed later)
    "S"   put it into the si register (discussed later)

As you can see, we copied one into the ax register, and two into the bx register. However, you'll notice we referred to them as eax and ebx inside the assembly block. Why? Good question. The ax register, for example, is a 16-bit register. Remember back to the definition of a bit: a bit is a single binary digit. In other words, the ax register has room for 16 binary digits. If you calculate that out, the ax register can hold 2^16 = 65536 different values, the numbers 0 through 65535 (remember, 0 is a number too).

So, what does this have to do with ax and eax? Well, you see, the ax register can actually be split in half to form two 8-bit registers, ah and al (the high and low parts of ax). The programmer can access these in the same way as ax. Similarly, when the 386 was developed, Intel extended most registers to 32 bits, and so ax has a 32-bit version called eax (ax is the low 16 bits of eax).

GCC will actually decide which version of the ax register to use, based on the size of the variable. For example, if you put a char variable in the brackets with "a", GCC will use al; a short gets ax, and an int gets eax. So, let's revise the above table:

    "r"   put the input into any available register
    "a"   put it into the al/ax/eax register
    "b"   put it into the bl/bx/ebx register
    "c"   put it into the cl/cx/ecx register
    "d"   put it into the dl/dx/edx register
    "D"   put it into the di/edi register (discussed later)
    "S"   put it into the si/esi register (discussed later)

You'll notice that di and si don't have 8-bit versions (in 32-bit code, at least). You'll probably also notice that these constraints never give you the high parts (ah, bh, ch and dh). This is only a minor problem, really. You can, of course, still use the high parts inside the assembly block.
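If it helps to see how eax, ax, ah and al overlap, this C sketch models the layout with masks and shifts. It is only an illustration (struct reg_parts and split_eax are made-up names); it does not read any real register.

```c
#include <stdint.h>

/* Hypothetical model of how the register names overlap:
 * ax is the low 16 bits of eax, ah is bits 8-15, al is bits 0-7. */
struct reg_parts {
    uint16_t ax;
    uint8_t  ah;
    uint8_t  al;
};

struct reg_parts split_eax(uint32_t eax)
{
    struct reg_parts p;

    p.ax = (uint16_t)(eax & 0xFFFF);        /* low word of eax  */
    p.ah = (uint8_t)((eax >> 8) & 0xFF);    /* high byte of ax  */
    p.al = (uint8_t)(eax & 0xFF);           /* low byte of ax   */
    return p;
}
```

For example, split_eax(0x12345678) gives ax = 5678h, ah = 56h and al = 78h; the top half of eax (1234h) has no separate name.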
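Okay, now back to the constraints. What if we let GCC decide where to put a variable? The "g" constraint means "general": GCC may use any register, a memory location, or even an immediate value, whichever is convenient. But how do we reference the variable then, since GCC is deciding where to place it? Well, GCC does provide a mechanism for this. Let's take a look at it in action:

    int main(void)
    {
        int one = 1, two = 2;

        asm ("addl %1, %0"         /* one = one + two */
             : "+r" (one)          /* %0: any register, read and written */
             : "g" (two)           /* %1: register, memory or immediate */
             : "cc");
        return 0;
    }

So, as you can see, each variable is referenced as %n, where n is its position in the constraint fields, counting from 0: outputs first, then inputs, whatever their constraints. Here, %0 is one and %1 is two. Notice that one uses "+r" rather than "g". It is written to, so it must be an output (the + means it is read as well). And since no x86 instruction can take two memory operands, or store its result into an immediate value, letting GCC choose freely for both sides could produce an instruction that doesn't exist; keeping the destination in a register avoids that.

Here's an interesting twist, though: what if you also want to name a register directly? As soon as an asm statement has constraint fields, GCC treats a single % as the start of an operand reference, so real register names have to be written with a double percent (%%). GCC turns each %% back into a single % before the assembler sees it. Check it out:

    int main(void)
    {
        int one = 1, two = 2;

        asm ("addl %1, %%eax"      /* eax = eax + two */
             : "+a" (one)          /* %0: one, in eax */
             : "g" (two)           /* %1: two, wherever GCC likes */
             : "cc");
        return 0;
    }

So you see, the double percent sign is required. An oddity, I agree. Okay, now what about outputs? Well, outputs are pretty much the same as inputs. The variable in the brackets tells GCC where to copy the output, and the quoted part tells GCC where to find it. The only difference is that outputs need an equals sign in the quotes (or the plus sign we saw above, if the variable is also read). Check it out:

    int main(void)
    {
        int one = 1, two = 2, three;

        asm ("addl %%ebx, %%eax"   /* eax = eax + ebx */
             : "=a" (three)        /* output: copied out of eax into three */
             : "a" (one), "b" (two) /* inputs: one into eax, two into ebx */
             : "cc");
        return 0;
    }

Notice that eax and ebx are not listed as registers modified, even though eax changes. A register that is already used for an input or an output must not appear in that field; GCC does not allow the overlap.

That's pretty easy, right? Well, it gets a little more complicated when GCC picks the locations of both the inputs and the output. Here's an example:

    int main(void)
    {
        int one = 1, two = 2, three;

        asm ("movl %1, %0\n\t"     /* three = one */
             "addl %2, %0"         /* three = three + two */
             : "=&r" (three)       /* %0: the output */
             : "g" (one), "g" (two) /* %1 and %2: the inputs */
             : "cc");
        return 0;
    }

Okay, there are a few things to note here. First of all, operands are numbered straight through both fields, outputs first, so %0 refers to the output, while %1 and %2 refer to the inputs. Next, notice the "\n\t" after the first instruction. C glues adjacent strings together, so without it the assembler would see "movl %1, %0addl %2, %0" on one line. Then, notice the "=&r" on the output. It's a register because the inputs might end up in memory and movl can't copy memory to memory, and the & tells GCC that the output is written before all the inputs have been read (movl writes %0 before addl reads %2), so it must not share a register with an input. Finally, notice we used the mov instruction to copy one into the output register before adding two to it, which leaves the result in %0, the C variable 'three'. It might also be interesting to note that the mov instruction doesn't actually move one thing to another, but rather copies. Another oddity.

Okay, now to the last, and probably simplest, constraint field: the registers modified field. This field, not surprisingly, lists the registers your assembly block changes that aren't already inputs or outputs. You can simply give the regular 16-bit register name, and that covers all versions of that register; for example, "ax" covers eax, ax, al and ah (GCC also accepts "eax"). There is also the special name "cc", which tells GCC that the flags register changes, as it does after almost every arithmetic instruction.

There is one more special name you will often see in this field: "memory". It tells GCC that your assembly block reads or writes memory that isn't listed as an operand, for example through a pointer held in a register. You don't need it just because a "g" operand might be placed in memory; GCC already knows about operands it put there itself.

One last thing to note about this whole inline assembly thing: %0, %1 and so on are also required if you use the "r" constraint. For that matter, they're required whenever you don't know the exact location of an input or output. After all, how can you reference it if you don't know where it is? :)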
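To see the numbering rule with "r" in a complete function, here is a small sketch. The name add_with_asm is hypothetical, and it is written with __asm__, GCC's alternate spelling of asm that also works when compiling in a strict ISO mode such as -std=c99. It assumes GCC or Clang on an x86 or x86-64 target; on anything else the #else branch falls back to plain C so the function still compiles. The "+" on the output marks a variable that is both read and written, and "cc" says the flags register changes.

```c
/* add_with_asm: hypothetical example function, not a library API. */
int add_with_asm(int one, int two)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("addl %1, %0"     /* %0 = %0 + %1                        */
             : "+r" (one)      /* operand 0: any register, read+write */
             : "r" (two)       /* operand 1: any register, input only */
             : "cc");          /* registers modified: the flags       */
    return one;
#else
    return one + two;          /* non-x86 fallback: same result in C  */
#endif
}
```

Calling add_with_asm(1, 2) returns 3. Because GCC picks both registers, the template can only refer to them as %0 and %1.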
Arithmetic in AT&T Assembly

Okay, now that we (hopefully) know how to define an inline assembly block, let's try some arithmetic operations using AT&T assembly. In this section I will present four basic Intel instructions, ADD, SUB, MUL and DIV, as well as some variations on these, plus some optimization tips.

Addition

Okay, we've already been introduced to the ADD instruction. Not surprisingly, its purpose is to add the two operands given to it. ADD places the result in the destination operand. In AT&T assembly, the last operand is the destination operand. This is the opposite of Intel assembly.

It should also be noted that ADD doesn't just add two registers. It can add immediate data, which is simply a number written right in the instruction, or it can add the contents of memory. Let's take a look at an example of each:

    addl %ebx, %eax      # add ebx to eax, store in eax
    addw $5, %ax         # add 5 to ax, store in ax
    addl (%ebx), %eax    # add the data at the memory address in ebx to eax, store in eax

There are a few things to notice here. First of all, let's talk about instruction suffixes. The suffix is the last letter of the instruction, which can be b (byte, 8 bits), w (word, 16 bits), or l (long, 32 bits). This letter tells the assembler what size of operands to expect. You'll note that since eax is a 32-bit register, we used l. Since ax is a 16-bit register, we used w. If we were working on either ah or al, we'd use b.

Next, let's talk about the actual instructions. The first one should be fairly obvious in its intent. It simply adds the ebx and eax registers together and stores the result in the destination register, eax. The next instruction may look a little different. In AT&T assembly, we use the $ (dollar sign) to mark immediate data. Since 5 is immediate data, we must put a dollar sign in front of it. To write hexadecimal numbers, we use the usual C convention of prefixing the hex number with 0x. For example, to add 10h to eax, you would use the following syntax:

    addl $0x10, %eax     # add 10h (16) to eax, store in eax

And now the last instruction. This one may be the strangest one yet, but it doesn't have to be. You see, the brackets (parentheses) around a register tell the processor not to use the value stored in the register itself, but to use the memory location that register points to. I know that may sound a little complicated, but just think of it this way: if ebx contained 10 and was put in brackets, the processor would not add 10 to the other operand. Instead, it would fetch the 32-bit number stored at memory address 10 and add that to the other operand. It's a simple concept, but often confusing. It is analogous to a C/C++ pointer, if that helps any. It should be noted that you can't add two memory locations together. You can add a register to memory or memory to a register, and you can add an immediate value to memory (the memory must then be the destination, which only makes sense), just not memory to memory.
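A sketch of the three kinds of ADD operand inside GCC inline assembly, one asm statement per instruction. add_three_ways is a hypothetical function; it assumes GCC or Clang on an x86 or x86-64 target (other targets get the plain C fallback) and uses __asm__, GCC's alternate spelling of asm that also works in strict ISO modes. The extra "m" (*p) input never appears in the instruction text; it tells GCC that the assembly reads the memory p points to.

```c
/* add_three_ways: hypothetical example, returns total + reg + 10h + *p. */
int add_three_ways(int total, int reg, const int *p)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* register operand: total += reg */
    __asm__ ("addl %1, %0" : "+r" (total) : "r" (reg) : "cc");

    /* immediate operand ($ prefix, 0x for hex): total += 10h */
    __asm__ ("addl $0x10, %0" : "+r" (total) : : "cc");

    /* memory operand: (%1) is the 32-bit value at the address in %1 */
    __asm__ ("addl (%1), %0"
             : "+r" (total)
             : "r" (p), "m" (*p)    /* "m" (*p): the asm reads *p */
             : "cc");
    return total;
#else
    return total + reg + 0x10 + *p; /* non-x86 fallback */
#endif
}
```

With int x = 5, add_three_ways(1, 2, &x) returns 1 + 2 + 16 + 5 = 24.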
Oh, and one last thing. Those # signs mark comments: the assembler ignores everything from a # to the end of the line, so you can use them to describe parts of your program.

Subtraction

Quite frankly, subtraction isn't much different from addition. You can still use the same three kinds of operands: registers, memory, or immediate values.

    subb $0xF, %al       # subtract 0Fh (15) from al, store in al
    subl (%ebx), %eax    # subtract the data at the memory address in ebx from eax, store in eax

That's all well and good. Easy stuff, right? Well, there's just one topic of interest here: negative numbers. What if you subtract 5 from 4? How does the processor handle negative numbers? To understand that, we must first understand overflow. Let's add 5h to FFFFh:

    movw $5, %ax         # ax = 5h
    addw $0xFFFF, %ax    # ax = ax + FFFFh

What does ax equal now? Some of you might be tempted to say 10004h; after all, that is the result of adding 5h to FFFFh. However, you would be wrong. Why? Well, remember how many bits the ax register can hold? ax is a 16-bit register, and therefore can only hold numbers as high as 65535, or FFFFh. In other words, 10004h is too big to fit in ax. So what happens? Quite frankly, it gets truncated. The 1 disappears and we are left with 4h. This is the result of an overflow.
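The truncation can be modeled in C with a 16-bit unsigned type. add16 is a hypothetical helper: it computes the exact sum in 32 bits, keeps the low 16 bits as ax would, and hands back the bit that fell off (on x86 the processor records that bit in the flags register as the carry flag).

```c
#include <stdint.h>

/* add16: hypothetical model of a 16-bit add such as "addw". */
uint16_t add16(uint16_t a, uint16_t b, int *carry)
{
    uint32_t full = (uint32_t)a + b;    /* exact sum, up to 17 bits     */

    *carry = (int)((full >> 16) & 1);   /* the bit that doesn't fit     */
    return (uint16_t)full;              /* only the low 16 bits survive */
}
```

add16(0x5, 0xFFFF, &carry) returns 4h and sets carry to 1, just like the addw example above.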
So, how is this related to negative numbers? Well, just think about it. 5 plus what will yield 4? That's right, -1. So it seems that FFFFh is acting as if it were negative one! This is true, however, only for 16-bit registers. If we used 32-bit registers, then 10004h would fit inside eax, for example. What does -1 look like in a 32-bit register, then? FFFFFFFFh. And in an 8-bit register, it's FFh.

Okay, what's going on here? Well, the Intel processor uses the highest bit of the number to tell whether the number is negative: bit 7 in an 8-bit number, bit 15 in a 16-bit number, and bit 31 in a 32-bit number (remember, bits are numbered from 0, so bit 1 is not the first bit but the second). If the highest bit is set (equal to 1), the number is negative; if not, it's positive. Keep in mind, however, that this is not always how the bits are read. Have you ever heard the terms signed and unsigned? These terms say whether the high bit should be used to decide if the number is negative or positive. If the number is signed, the high bit represents the sign of the number. If the number is unsigned, the high bit is just part of the regular number, which means an unsigned number is never negative. This yields the following ranges for numbers:

    8 bit,  signed              -128 to 127
    8 bit,  unsigned               0 to 255
    16 bit, signed            -32768 to 32767
    16 bit, unsigned               0 to 65535
    32 bit, signed       -2147483648 to 2147483647
    32 bit, unsigned               0 to 4294967295
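This C sketch reinterprets the same bit pattern as a signed number for a given width, using the rule from the text: if the high bit is set, the value is negative (the unsigned value minus 2 to the power of the width). as_signed is a hypothetical helper, and it assumes width is 8, 16 or 32.

```c
#include <stdint.h>

/* as_signed: hypothetical helper. Reads the low `width` bits of `bits`
 * (width = 8, 16 or 32) as a signed two's-complement number. */
long long as_signed(uint32_t bits, int width)
{
    uint64_t mask = (1ULL << width) - 1;        /* e.g. 0xFF for width 8   */
    uint64_t value = bits & mask;               /* keep only `width` bits  */
    uint64_t high_bit = 1ULL << (width - 1);    /* bit 7, 15 or 31         */

    if (value & high_bit)                       /* sign bit set: negative  */
        return (long long)value - (long long)(1ULL << width);
    return (long long)value;                    /* sign bit clear          */
}
```

as_signed(0xFFFF, 16) gives -1, while the same bits read as unsigned are 65535; the limits in <stdint.h> (INT16_MIN, UINT16_MAX and so on) match the table above.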
But wait, that's not all there is to negative numbers. How does -1 become FFFFh? Well, let's go through the process of negating a number, using 5 (in 8 bits) as an example:

    00000101b

If we invert each of those bits (turn every 0 into a 1 and every 1 into a 0), we get the following:

    11111010b

And then if we add one:

    11111011b

That last value (FBh) is equal to -5 according to the processor. And when you take overflow into account, it will, in fact, act as if it were negative five. For instance, if you add 5, you get 100000000b, which is 9 bits, so it gets truncated to 0 (we're using 8-bit numbers/registers for this example). In other words, -5 + 5 = 0.
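The flip-the-bits-and-add-one recipe translates directly into C. negate8 is a hypothetical helper working on 8-bit values, as in the example above; the casts to uint8_t throw away anything past bit 7, just as an 8-bit register would.

```c
#include <stdint.h>

/* negate8: hypothetical helper. Two's-complement negation of an 8-bit
 * value: invert every bit, then add one, truncating to 8 bits. */
uint8_t negate8(uint8_t x)
{
    uint8_t flipped = (uint8_t)~x;      /* 00000101b -> 11111010b */
    return (uint8_t)(flipped + 1);      /* 11111010b -> 11111011b */
}
```

negate8(5) returns FBh, and adding 5 back, truncated to 8 bits, gives 0.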
This is a somewhat more difficult concept to grasp, but once you understand it, everything will become clear. In fact, you might just understand C/C++ size limits and the signed and unsigned keywords a lot better after seeing how the processor handles all these things.

Multiplication

Remember back to grade school, when you were first introduced to multiplication? Remember how much slower it was to multiply two numbers than to add or subtract them? Well, it's very much the same on the Intel processor. Multiplication, and division as well, are rather slow on the Intel processor, so it's best to avoid them whenever possible. (Newer processors have made multiplication much cheaper, but division is still slow.) After explaining the mechanics of each of these, I will explain simple tricks to do just that.

As with addition and subtraction, you can multiply by the contents of memory or of a register (again, just not memory with memory). However, there are restrictions on multiplying by immediate data: only IMUL has forms that accept an immediate value, and those keep just the low half of the result described below. Let's take a look at an example:

    imull (%ebx)         # multiply eax by the long at the memory address in ebx, store in edx:eax

Okay, there are two things to talk about here. First, the one-operand forms of IMUL and MUL (see below) always use AL, AX or EAX as the other operand. Simple enough. Now, look at where the result is stored. This requires a little bit of explaining. If we multiply a 32-bit value by a 32-bit value, what size is the answer? 32 bits? No, it can take up to 64 bits. Just think about it: the result of a multiplication can need twice as many bits as its operands, or else an overflow will occur. And because the answer is twice as wide, its upper half depends on whether the operands are signed or unsigned: FFFFFFFFh times 2 is 1FFFFFFFEh if FFFFFFFFh means 4294967295, but -2 (FFFFFFFFFFFFFFFEh) if it means -1. This is why the Intel processor has two multiply instructions, MUL and IMUL. MUL performs an unsigned multiplication, while IMUL performs a signed multiplication. Use whichever is appropriate.

Okay, back to the result size problem. If you multiply two bytes, the result is stored in AX, because multiplying two 8-bit numbers can produce a 16-bit number. If you multiply two 16-bit numbers, you might expect the result to be stored in eax, but in fact it isn't, because eax did not exist when the multiply instruction was invented. Instead, the processor puts the high word of the answer in dx and the low word in ax (written dx:ax). And, since 32-bit Intel processors don't have any 64-bit registers, if you multiply two 32-bit values, the result is split across edx and eax: the high long is stored in edx, while the low long is in eax.
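In C, the widening multiply can be modeled with a 64-bit product split into halves. mul32 and imul32 are hypothetical helpers, not the instructions themselves; they just show where the high and low longs would land, and that the signed and unsigned versions differ only in the high half.

```c
#include <stdint.h>

/* mul32: hypothetical model of "mull": unsigned 32 x 32 -> 64 bits,
 * high long in *edx, low long in *eax. */
void mul32(uint32_t a, uint32_t b, uint32_t *edx, uint32_t *eax)
{
    uint64_t product = (uint64_t)a * b;

    *edx = (uint32_t)(product >> 32);   /* high long */
    *eax = (uint32_t)product;           /* low long  */
}

/* imul32: hypothetical model of one-operand "imull": the same, but the
 * operands are treated as signed. */
void imul32(int32_t a, int32_t b, uint32_t *edx, uint32_t *eax)
{
    int64_t product = (int64_t)a * b;

    *edx = (uint32_t)((uint64_t)product >> 32);   /* high long */
    *eax = (uint32_t)product;                     /* low long  */
}
```

mul32(0xFFFFFFFF, 2, ...) leaves 1 in the high half and FFFFFFFEh in the low half, while imul32(-1, 2, ...) leaves FFFFFFFFh and FFFFFFFEh: same low half, different high half.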
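Division

Division is like multiplication in that it can divide by the contents of memory or of a register, but it can't divide by immediate data at all. Also, just as with multiplication, the Intel processor provides two versions of division: signed (IDIV) and unsigned (DIV). Here are some examples:

    idivl (%ebx)         # signed: divide edx:eax by the long at the memory address in ebx
    divl %ecx            # unsigned: divide edx:eax by ecx
    divb (%ebx)          # unsigned: divide ax by the byte at the memory address in ebx

You'll notice that I've left out where the result is put. This is because there is more than one result, really: the quotient and the remainder. When dividing by an 8-bit value, the quotient is stored in AL and the remainder in AH. When dividing by a 16-bit value, the quotient is stored in AX and the remainder in DX. And finally, when dividing by a 32-bit value, the quotient is stored in EAX, while the remainder is in EDX.

The number being divided is always twice the size of the divisor: ax for an 8-bit divisor, dx:ax for a 16-bit one, and edx:eax for a 32-bit one. So before an unsigned 32-bit divide you'll usually clear edx (xorl %edx, %edx), and before a signed one you'll copy the sign of eax into edx with cltd. That's also why you shouldn't divide by edx itself: it's part of the number being divided, and divl %edx always ends in a divide error.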
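Here's a sketch of an unsigned 32-bit divide with divl in GCC inline assembly, showing the edx:eax dividend and both results. div32 is a hypothetical function; it assumes GCC or Clang on x86 or x86-64 (other targets use the C / and % operators instead) and a nonzero divisor, since dividing by zero raises a divide error.

```c
#include <stdint.h>

/* div32: hypothetical example. Unsigned edx:eax / divisor with divl;
 * edx is loaded with 0 so the dividend is just `dividend`. */
void div32(uint32_t dividend, uint32_t divisor,
           uint32_t *quotient, uint32_t *remainder)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t q, r;

    __asm__ ("divl %4"
             : "=a" (q), "=d" (r)       /* %0 quotient (eax), %1 remainder (edx) */
             : "a" (dividend),          /* %2: low half of dividend in eax       */
               "d" (0u),                /* %3: high half of dividend, edx = 0    */
               "rm" (divisor)           /* %4: register or memory, no immediate  */
             : "cc");
    *quotient = q;
    *remainder = r;
#else
    *quotient = dividend / divisor;     /* non-x86 fallback */
    *remainder = dividend % divisor;
#endif
}
```

div32(607, 16, &q, &r) sets q to 37 and r to 15 (37 x 16 + 15 = 607). The "rm" constraint reflects the point above that DIV accepts a register or memory operand but not an immediate.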
Optimization

And with that, you have now learnt how to use the Intel processor to perform math operations. Now, as I promised, I will present some optimization tips. As I mentioned before, multiplication is slow on the Intel processor, and division is even slower. So, let's look at ways to avoid them.

The Intel processor has two instructions called SHL and SHR, which shift the bits of a number to the left or right by a certain amount. For example:

    … AH = 10110000
    If AH = 01100101, then after "shrb $2, %ah", AH = 00011001
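The same 8-bit shifts can be tried in C, where << and >> on an unsigned value behave like SHL and SHR. shl8 and shr8 are hypothetical helpers; the cast back to uint8_t throws away bits pushed past bit 7, as an 8-bit register would, and n is assumed to be less than 8.

```c
#include <stdint.h>

/* Hypothetical helpers modeling "shlb"/"shrb" on an 8-bit register:
 * bits move over, zeros fill the gaps, bits pushed off the end are lost. */
uint8_t shl8(uint8_t x, unsigned n) { return (uint8_t)(x << n); }
uint8_t shr8(uint8_t x, unsigned n) { return (uint8_t)(x >> n); }
```

shr8(0x65, 2) turns 01100101b into 00011001b (19h), matching the example above, and shl8(0x2C, 2) turns 00101100b into 10110000b.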
As you can see, shifting simply moves each bit over by a set number of positions and fills in the gaps with zeros; bits that are pushed off the end are lost. What may not be obvious, though, is that this can be used for very fast, but limited, multiplication and division.

Let's take a practical example in decimal. We can rapidly multiply any number by 10, because all we must do is add a zero. In essence, we are shifting the number one place to the left and filling in one zero. The same trick works on the Intel processor. However, just as in decimal we are limited to multiplying by powers of 10 (10, 100, 1000 and so on), on the Intel processor we are limited to multiplying by powers of 2 (2, 4, 8 and so on), since it is a binary processor.

So, to multiply by 2, we simply shift each bit to the left by 1. To multiply by 4, we shift each bit to the left by 2, and so on. And to divide, we use the same process, but shift to the right instead. This only works as-is for unsigned numbers; shifting a negative number right with SHR fills the top bit with 0, turning it into a large positive number.

    multiplying eax by 128            and   shll $7, %eax   are equal
    dividing eax by 32 (unsigned)     and   shrl $5, %eax   are equal
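A C sketch of the shift trick for unsigned 32-bit values. mul_pow2 and div_pow2 are hypothetical helpers; n is assumed to be between 0 and 31, because shifting a 32-bit value by 32 or more is undefined in C.

```c
#include <stdint.h>

/* Hypothetical helpers: multiply or divide an unsigned value by 2^n.
 * Left shift = multiply (bits shifted past bit 31 are lost, just like an
 * overflowing multiply); right shift = divide, dropping the remainder. */
uint32_t mul_pow2(uint32_t x, unsigned n) { return x << n; }
uint32_t div_pow2(uint32_t x, unsigned n) { return x >> n; }
```

mul_pow2(x, 7) equals x * 128 and div_pow2(x, 5) equals x / 32 for unsigned x; for instance, div_pow2(607, 5) is 18.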
It's a simple trick, and very limited, but it actually comes in handy fairly often, especially when you consider that many things in the computer world are powers of 2.