In some situations you may find it necessary to mix C and assembly language in the same program. For example, a program may require particular routines which are performance-critical, and which must therefore be hand-coded in order to run at optimum speed.

All you need to do is follow a few programming rules; you can then call assembly routines from C and vice versa. Public variables declared in assembly modules are available to your C program.

There may be several reasons to call an assembly routine, such as faster program execution or direct access to special function registers (SFRs).

For any assembly routine to be called from a C program, you must know how to pass parameters (arguments) to the function and how to get return values back from it.

Sometimes programming in assembly is difficult and inefficient, and we may want to use C/C++ for fast prototyping. First, we have to prepare the GNU toolchain for ARM (compiler, assembler, linker, etc.).

The ARM Software Development Toolkit enables AOF object files to be generated from C and assembly language source by the appropriate tools, and then linked with one or more libraries to produce an executable file, below:


Irrespective of the language in which they are written, routines that make cross-calls to other modules need to observe a common convention of argument and result passing. For the ARM, this convention is called the ARM Procedure Call Standard, or APCS.

Using the ARM Procedure Call Standard

The ARM Procedure Call Standard, or APCS, is a set of rules which govern calls between functions in separately compiled or assembled code fragments.

The APCS defines:

☞Constraints on the use of registers.

☞Stack conventions.

☞The format of the stack backtrace data structure.

☞Argument passing and result return.

☞Support for the ARM shared library mechanism.

Code which is produced by compilers is expected to adhere to the APCS at all times. Such code is said to be strictly conforming.

Handwritten code is expected to adhere to the APCS when making calls to externally visible functions. Such code is said to be conforming.

The ARM Procedure Call Standard comprises a family of variants. Each variant is exclusive, so that code which conforms to one cannot be used with code that conforms to another. Your choice of variant will depend on whether:

☞The Program Counter is 32-bit or 26-bit.

☞Stack limit checking is explicit (performed by code) or implicit (memory management hardware).

☞Floating point values are passed in floating point registers.

☞Code is reentrant or non-reentrant.

Register names and usage

The following table summarizes the names and uses allocated to the ARM and Floating Point registers under the APCS.


a1-a4, f0-f3 are used to pass arguments to functions. a1 is also used to return integer results, and f0 to return FP results. These registers can be corrupted by a called function.

v1-v5, f4-f7 are used as register variables. They must be preserved by called functions.

Register Number    APCS Name    APCS Role
0                  a1           argument 1 / integer result / scratch register
1                  a2           argument 2 / scratch register
2                  a3           argument 3 / scratch register
3                  a4           argument 4 / scratch register
4                  v1           register variable
5                  v2           register variable
6                  v3           register variable
7                  v4           register variable
8                  v5           register variable
9                  sb/v6        static base / register variable
10                 sl/v7        stack limit / stack chunk handle / reg. variable
11                 fp           frame pointer
12                 ip           scratch register / new-sb in inter-link-unit calls
13                 sp           lower end of current stack frame
14                 lr           link address / scratch register
15                 pc           program counter
0                  f0           FP argument 1 / FP result / FP scratch register
1                  f1           FP argument 2 / FP scratch register
2                  f2           FP argument 3 / FP scratch register
3                  f3           FP argument 4 / FP scratch register

Table. APCS Registers

sb, sl, fp, ip, sp, lr and pc each have a dedicated role in at least some variants of the APCS, at least some of the time. There are, however, times when some of these registers can be used for other purposes while still conforming to the APCS.

In some variants of the APCS sb and sl are available as additional variable registers v6 and v7 respectively.

As stated previously, hand-coded assembler routines need not conform strictly to the APCS, but need only conform. This means that any register which an assembler routine does not need in its APCS role (e.g. fp) can be used as a working register, provided that its value on entry is restored before returning.

An example of APCS registers usage: 64-bit integer addition

This example illustrates how to code a small function in ARM assembly language, such that it can be used from C modules.

First, however, we will write the function in C, and examine the compiler’s output.

The function will perform a 64-bit integer addition using a two-word data structure to store each 64-bit operand.

In assembler, the obvious way to code the addition of double-length integers would be to use the Carry flag from the low word addition in the high word addition. However, in C there is no way of specifying the Carry flag, so we have to use a workaround, as follows:

void add_64(int64 *dest, int64 *src1, int64 *src2)
{
    unsigned hibit1=src1->lo >> 31, hibit2=src2->lo >> 31, hibit3;

    dest->lo=src1->lo + src2->lo;
    hibit3=dest->lo >> 31;
    dest->hi=src1->hi + src2->hi +
             ((hibit1 & hibit2) || (hibit1!= hibit3));
}

The highest bits of the low words in the two operands are calculated (shifting them into bit 0). These are then used to determine the value of the carry bit.
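The carry can also be derived without inspecting individual bits. The following is a minimal sketch, assuming 32-bit unsigned words; the names int64_words, carry_out and add_64_wrap are hypothetical and are not part of the example sources. It relies on the fact that unsigned addition wraps around, so the sum is smaller than an operand exactly when a carry occurred. Note that the expression as printed above appears to report a carry whenever bit 31 of the sum differs from bit 31 of the first operand, which includes cases such as 0x40000000 + 0x40000000 where nothing overflows, so any such workaround is worth checking against a few known sums.

```c
#include <stdint.h>

/* Hypothetical stand-in for the int64 type used in the text:
   a 64-bit value held as two 32-bit words. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} int64_words;

/* Carry out of a 32-bit unsigned addition. The truncated sum is
   smaller than an operand exactly when the true sum needed a 33rd bit. */
static uint32_t carry_out(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint32_t)(a + b) < a);
}

/* 64-bit addition built from two 32-bit additions plus the carry.
   The carry is computed before dest is written, so dest may alias
   either source operand. */
void add_64_wrap(int64_words *dest, const int64_words *src1,
                 const int64_words *src2)
{
    uint32_t carry = carry_out(src1->lo, src2->lo);

    dest->lo = src1->lo + src2->lo;
    dest->hi = src1->hi + src2->hi + carry;
}
```

The comparison form typically compiles to fewer instructions than the bit-extraction form, although it still cannot match the two-instruction ADDS/ADC sequence that direct use of the Carry flag allows.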

Examining the compiler's output

If the addition routine were to be used a great deal, a poor implementation such as this would probably be inadequate. To see just how good or bad it is, let us look at the actual code which the compiler produces.

Copy the file add64_1.c from directory examples/candasm to your current working directory and compile it to ARM assembly language source as follows:

armcc -li -apcs 3/32bit -S add64_1.c

The -S flag tells the compiler to produce ARM assembly language source (suitable for armasm) instead of object code. The -li flag tells it to compile for little-endian memory and the -apcs option specifies the 32-bit version of APCS 3. You can omit these options if your compiler is configured to have them as defaults.

Looking at the output file, add64_1.s, we can see that this is indeed an inefficient implementation.


        STMDB   sp!, {v1, lr}
        LDR     v1, [a2, #0]
        MOV     a4, v1, LSR #31
        LDR     ip, [a3, #0]
        MOV     lr, ip, LSR #31
        ADD     ip, v1, ip
        STR     ip, [a1, #0]
        MOV     ip, ip, LSR #31
        LDR     a2, [a2, #4]
        LDR     a3, [a3, #4]
        ADD     a2, a2, a3
        TST     a4, lr
        TEQEQ   a4, ip
        MOVNE   a3, #1
        MOVEQ   a3, #0
        ADD     a2, a2, a3
        STR     a2, [a1, #4]!
        LDMIA   sp!, {v1, pc}

Modifying the compiler's output

Let us return to our original intention of coding the 64-bit integer addition using the Carry flag.

Since use of the Carry flag cannot be specified in C, we must get the compiler to produce almost the right code, and then modify it by hand. Let us start with (incorrect) code which does not perform the carry addition:

void add_64(int64 *dest, int64 *src1, int64 *src2)
{
    dest->lo=src1->lo + src2->lo;
    dest->hi=src1->hi + src2->hi;
}

Compiling this to assembly language source gives:

        LDR     a4, [a2, #0]
        LDR     ip, [a3, #0]
        ADD     a4, a4, ip
        STR     a4, [a1, #0]
        LDR     a2, [a2, #4]
        LDR     a3, [a3, #4]
        ADD     a2, a2, a3
        STR     a2, [a1, #4]
        MOV     pc, lr

Comparing this to the C source we can see that the first ADD instruction produces the low order word, and the second produces the high order word. All we need to do to get the carry from the low to the high word right is change the first ADD to ADDS (add and set flags), and the second ADD to ADC (add with carry). This modified code is available in directory examples/candasm as add64_3.s.

What effect did the APCS have?

The most obvious way in which the APCS has affected the above code is that the registers have all been given APCS names.

Here, a1 holds a pointer to the destination structure, while a2 and a3 hold pointers to the operand structures. Both a4 and ip are used as temporary registers which are not preserved. The conditions under which ip can be corrupted will be discussed later in this chapter.

This is a simple leaf function, which uses few temporary registers, so none are saved to the stack and restored on exit. Therefore a simple MOV pc, lr can be used to return.

If we wished to return a result—perhaps the carry out from the addition—this would be loaded into a1 prior to exit. We could do this by changing the second ADD to ADCS (add with carry and set flags), and adding the following instructions to load a1 with 1 or 0 depending on the carry out from the high order addition.

        MOV     a1, #0
        ADC     a1, a1, #0
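The effect of the ADDS, ADCS and final ADC sequence can be modelled in portable C for checking purposes. This is a minimal sketch only; the names int64_words and add_64_carry are hypothetical, and the comments map each statement to the instruction it stands in for rather than to actual compiler output.

```c
#include <stdint.h>

/* Hypothetical stand-in for the int64 type used in the text. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} int64_words;

/* Adds *src1 and *src2 into *dest and returns the carry out of bit 63. */
unsigned add_64_carry(int64_words *dest, const int64_words *src1,
                      const int64_words *src2)
{
    /* ADDS: add the low words; the C flag is the wrap-around test. */
    uint32_t lo = src1->lo + src2->lo;
    uint32_t c_lo = (uint32_t)(lo < src1->lo);

    /* ADCS: add the high words plus the carry, keeping the 33rd bit. */
    uint64_t hi = (uint64_t)src1->hi + src2->hi + c_lo;

    dest->lo = lo;
    dest->hi = (uint32_t)hi;

    /* MOV a1, #0 followed by ADC a1, a1, #0: result is 0 + 0 + C. */
    return (unsigned)(hi >> 32);
}
```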

Back to the first implementation

Although the first C implementation is inefficient, it shows us more about the APCS than the hand-modified version.

We have already seen a4 and ip being used as non-preserved temporary registers. However, here v1 and lr are also used as temporary registers. v1 is preserved by being stored (together with lr) on entry. lr is corrupted, but a copy is saved onto the stack and then reloaded into pc when v1 is restored.

Thus there is still only a single exit instruction, but now it is:

LDMIA sp!, {v1, pc}

A more detailed look at APCS registers usage

We stated initially that sb, sl, fp, ip, sp and lr are dedicated registers, but the example showed ip and lr being used as temporary registers. Indeed, there are times when these registers are not used for their APCS roles.

It will be useful for you to know about these situations, so that you can write efficient (but safe) code which uses as many of the registers as possible and so avoids unnecessary saving and restoring of registers.

☞ip is used only during function calls. It is conventionally used as a local code generation temporary register. At other times it can be used as a corruptible temporary register.

☞lr holds the address to which control must return on function exit. It can be used as a temporary register after pushing its contents onto the stack. This value can then be reloaded straight into the pc.

☞sp is the stack pointer, which is always valid in strictly conforming code, but need only be preserved in handwritten code. Note, however, that if any use of the stack is to be made by handwritten code, sp must be available.

☞sl is the stack limit register. If stack limit checking is explicit (i.e. is performed by code when stack pushes occur, rather than by memory management hardware causing a trap on stack overflow), sl must be valid whenever sp is valid. If stack checking is implicit, sl is instead treated as v7, an additional register variable (which must be preserved by called functions).

☞fp is the frame pointer register. It contains either zero, or a pointer to the most recently created stack backtrace data structure. As with the stack pointer this must be preserved, but in handwritten code it does not need to be available at every instant. However, it should be valid whenever any strictly conforming functions are called.

☞sb is the static base register. If the variant of the APCS in use is reentrant, this register is used to access an array of static data pointers to allow code to access data reentrantly. However, if the variant being used is not reentrant, sb is instead available as an additional register variable, v6 (which must be preserved by called functions).

sp, sl, fp and sb must all be preserved on function exit for APCS conforming code.

Passing and Returning Structures

This section covers:

☞The default method for passing structures to and from functions.

☞Cases in which this is automatically optimized.

☞Telling the compiler to return a struct value in several registers.

The default method

Unless special conditions apply (detailed in the following sections), C structures are passed in registers, overflowing onto the stack if necessary, and are returned via a pointer to the memory location of the result. For struct-valued functions, a pointer to the location where the struct result is to be placed is passed in a1 (the first argument register). The first argument is then passed in a2, the second in a3, and so on.

It is as if:

struct s f(int x)

were compiled as:

void f(struct s *result, int x)
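The equivalence can be made concrete by writing both forms by hand. The following is a minimal sketch: the contents of struct s and the name f_explicit are hypothetical, chosen only so that the two functions can be compared, and nothing here depends on a particular compiler.

```c
/* Hypothetical structure, larger than one word so that it would be
   returned through memory under the default method. */
struct s {
    int values[4];
};

/* Natural C form: the structure is returned by value. */
struct s f(int x)
{
    struct s r;
    int i;

    for (i = 0; i < 4; i++)
        r.values[i] = x + i;
    return r;
}

/* Hand-written form of the same function with the result pointer made
   explicit as the first argument, pushing x into the second position. */
void f_explicit(struct s *result, int x)
{
    int i;

    for (i = 0; i < 4; i++)
        result->values[i] = x + i;
}
```

A caller of f_explicit must provide the storage for the result itself, which is what the compiler arranges on the caller's behalf for f.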

Consider a function that takes two structures of the form struct { char ch1, ch2; } by value and returns whichever one has the larger ch1 field.

The source for such a function is available in the directory examples/candasm as two_ch.c. It can be compiled to produce assembly language source using:

armcc -S two_ch.c -li -apcs 3/32bit

where -li and -apcs 3/32bit can be omitted if armcc has been configured appropriately.

Here is the code which armcc produced (the version of armcc supplied with your release may produce slightly different output to that listed here):


        MOV     ip, sp
        STMDB   sp!, {a1-a3, fp, ip, lr, pc}
        SUB     fp, ip, #4
        LDRB    a3, [fp, #-&14]
        LDRB    a2, [fp, #-&10]
        CMP     a3, a2
        SUBLE   a2, fp, #&10
        SUBGT   a2, fp, #&14
        LDR     a2, [a2, #0]
        STR     a2, [a1, #0]
        LDMDB   fp, {fp, sp, pc}

The STMDB instruction saves the arguments onto the stack, together with the frame pointer, stack pointer, link register and current pc value.

Here, a2 and a3 are then used as temporary registers to hold the required part of the structures passed, and a1 is a pointer to an area in memory in which the resulting struct is placed—all as expected.

Returning integer-like structures

The ARM Procedure Call Standard specifies different rules for returning integer-like structures.

An integer-like structure:

☞Is no larger than one word in size.

☞Exclusively has sub-fields whose byte offset is 0.

The following structures are integer-like:



struct
{
    unsigned a:8, b:8, c:8, d:8;
}

union polymorphic_ptr
{
    struct A *a;
    struct B *b;
    int *i;
}

Whereas the structure used in the previous example is not:

struct { char ch1, ch2; }

An integer-like structure has its contents returned in a1. This means that a1 is not needed to pass a pointer to a result struct in memory, and is instead used to pass the first argument.

For example, consider the following code:

typedef struct half_words_struct
{
    unsigned field1:16;
    unsigned field2:16;
} half_words;

half_words max( half_words a, half_words b )
{
    half_words x;

    x = (a.field1>b.field1) ? a : b;
    return x;
}

Arguments a and b will be passed in registers a1 and a2, and since half_words_struct is integer-like we expect a1 to return the result structure directly, rather than a pointer to it.

The above code is available in directory examples/candasm as half_str.c. It can be compiled to produce assembly language source using:

armcc -S half_str.c -li -apcs 3/32bit

where -li and -apcs 3/32bit can be omitted if armcc has been configured appropriately.

Here is the code which armcc produced:


        MOV     a3, a1, LSL #16
        MOV     a3, a3, LSR #16
        MOV     a4, a2, LSL #16
        MOV     a4, a4, LSR #16
        CMP     a3, a4
        MOVLE   a1, a2
        MOV     pc, lr

From this we can see that the content of the half_words structure is returned directly in a1 as expected.
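What the listing does to the two argument words can be restated in C operating on raw 32-bit values. This is a sketch under an assumption the listing implies but the text does not state, namely that field1 occupies bits 0-15 of the word and field2 bits 16-31; the function name max_packed is hypothetical.

```c
#include <stdint.h>

/* Models the listing on raw 32-bit words, assuming field1 is held in
   bits 0-15 and field2 in bits 16-31 of each argument. */
uint32_t max_packed(uint32_t a, uint32_t b)
{
    /* MOV a3, a1, LSL #16 then MOV a3, a3, LSR #16: isolate a.field1. */
    uint32_t a_field1 = (a << 16) >> 16;

    /* MOV a4, a2, LSL #16 then MOV a4, a4, LSR #16: isolate b.field1. */
    uint32_t b_field1 = (b << 16) >> 16;

    /* CMP a3, a4 then MOVLE a1, a2: keep a only if its field1 is larger.
       The whole word is returned, so field2 travels with it. */
    return (a_field1 > b_field1) ? a : b;
}
```

The point to notice is that the argument word and the result word are the same kind of object: the structure never exists in memory during the call.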

Returning non integer-like structures in registers

There are occasions when a function needs to return more than one value. The normal way to achieve this is to define a structure which holds all the values to be returned, and return this.

This will result in a pointer to the structure being passed in a1, which will then be de-referenced to store the values returned.

For some applications in which such a function is time-critical, the overhead involved in “wrapping” and then “un-wrapping” the structure can be significant. However, there is a way to tell the compiler that a structure should be returned in the argument registers a1 - a4. Clearly this is only useful for returning structures that are no larger than four words.

The way to tell the compiler to return a structure in the argument registers is to use the keyword __value_in_regs.

Example: returning a 64-bit Result

To illustrate how to use __value_in_regs, let us consider writing a function which multiplies two 32-bit integers together and returns a 64-bit result.

The way this function must work is to split the two 32-bit numbers (a, b) into high and low 16-bit parts (a_hi, a_lo, b_hi, b_lo). The four multiplications a_lo * b_lo, a_hi * b_lo, a_lo * b_hi and a_hi * b_hi must be performed and the results added together, taking care to deal with carry correctly.
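As a cross-check on the arithmetic, the steps above can be modelled in portable C. This is a minimal sketch and not the optimized routine; the names int64_words and mul64_model are hypothetical, and the intermediate names follow the description (m_lo, m_mid, m_hi).

```c
#include <stdint.h>

/* Hypothetical two-word result type: low and high 32 bits. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} int64_words;

/* 32 x 32 -> 64 bit multiply built from four 16 x 16 products. */
int64_words mul64_model(uint32_t a, uint32_t b)
{
    uint32_t a_hi = a >> 16, a_lo = a & 0xFFFFu;
    uint32_t b_hi = b >> 16, b_lo = b & 0xFFFFu;

    uint32_t m_lo   = a_lo * b_lo;   /* weight 2^0  */
    uint32_t m_mid1 = a_hi * b_lo;   /* weight 2^16 */
    uint32_t m_mid2 = a_lo * b_hi;   /* weight 2^16 */
    uint32_t m_hi   = a_hi * b_hi;   /* weight 2^32 */
    uint32_t m_mid;
    int64_words r;

    /* The middle sum can overflow 32 bits. Its carry has weight
       2^(32+16) = 2^48, which is 0x10000 in the high word. */
    m_mid = m_mid1 + m_mid2;
    if (m_mid < m_mid1)
        m_hi += 0x10000u;

    /* Low word: m_lo plus the low half of m_mid shifted into place. */
    r.lo = m_lo + (m_mid << 16);

    /* High word: m_hi plus the high half of m_mid plus the carry
       out of the low-word addition. */
    r.hi = m_hi + (m_mid >> 16) + (uint32_t)(r.lo < m_lo);
    return r;
}
```

Each wrap-around test in the model corresponds to a place where the assembly version reads the Carry flag directly.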

Since the problem involves manipulation of the Carry flag, writing this function in C will not produce optimal code. We will therefore have to code the function in ARM assembly language. The following performs the algorithm just described:

; On entry a1 and a2 contain the 32-bit integers to be multiplied (a, b)
; On exit a1 and a2 contain the result (a1 bits 0-31, a2 bits 32-63)

        MOV     ip, a1, LSR #16         ; ip = a_hi
        MOV     a4, a2, LSR #16         ; a4 = b_hi
        BIC     a1, a1, ip, LSL #16     ; a1 = a_lo
        BIC     a2, a2, a4, LSL #16     ; a2 = b_lo
        MUL     a3, a1, a2              ; a3 = a_lo * b_lo (m_lo)
        MUL     a2, ip, a2              ; a2 = a_hi * b_lo (m_mid1)
        MUL     a1, a4, a1              ; a1 = a_lo * b_hi (m_mid2)
        MUL     a4, ip, a4              ; a4 = a_hi * b_hi (m_hi)
        ADDS    ip, a2, a1              ; ip = m_mid1 + m_mid2 (m_mid)
        ADDCS   a4, a4, #&10000         ; a4 = m_hi + carry (m_hi')
        ADDS    a1, a3, ip, LSL #16     ; a1 = m_lo + (m_mid<<16)
        ADC     a2, a4, ip, LSR #16     ; a2 = m_hi' + (m_mid>>16) + carry
        MOV     pc, lr

This code is fine for use with assembly language modules, but in order to use it from C we need to tell the compiler that this routine returns its 64-bit result in registers. This can be done by making the following declarations in a header file:

typedef struct int64_struct
{
    unsigned int lo;
    unsigned int hi;
} int64;

__value_in_regs extern int64 mul64(unsigned a, unsigned b);

The above assembly language code and declarations, together with a test program, are all in directory examples/candasm as the files mul64.s, mul64.h, int64.h and multest.c. To compile, assemble and link these to produce an executable image suitable for armsd, first copy them to your current directory, and then execute the following commands:

armasm mul64.s -o mul64.o -li

armcc -c multest.c -li -apcs 3/32bit

armlink mul64.o multest.o libpath/armlib.32l -o multest

where libpath is the directory in which the semi-hosted C libraries reside.

Note that -li and -apcs 3/32bit can be omitted if armcc and armasm (and armsd, below) have been configured appropriately.

multest can then be run under armsd as follows:

> armsd -li multest

A.R.M. Source-level Debugger, version 4.10 (A.R.M.) [Aug 26 1992]

ARMulator V1.20, 512 Kb RAM, MMU present, Demon 1.01, FPE, Little endian.

Object program file multest


armsd: go

Enter two unsigned 32-bit numbers in hex eg.(100 FF43D)

12345678 10000001

Least significant word of result is 92345678

Most significant word of result is 1234567

Program terminated normally at PC = 0x00008418

0x00008418: 0xef000011 .... : > swi 0x11

armsd: quit



To convince yourself that __value_in_regs is being used, try removing it from mul64.h, recompile multest.c, relink multest, and re-run armsd. This time the answers returned will be incorrect, since the result is no longer being returned in registers, but in a block of memory instead.