Getting Started with ARM Architecture

Introduction of ARM7

ARM7 is a family of 32-bit RISC (Reduced Instruction Set Computer) processor cores developed by ARM Ltd. The name ARM originally stood for Acorn RISC Machine and later for Advanced RISC Machines. ARM cores are used in mobile phones, handheld organizers (PDAs), and portable consumer devices.

The ARM7TDMI core is delivered as a hard macro cell optimized to provide the best combination of performance, power and area characteristics. The ARM7TDMI core enables system designers to build embedded devices requiring small size, low power and high performance.

The ARM processor families provide a wide range of performance, from 100 MIPS to 1000 MIPS. ARM has produced architectural families that are compatible, flexible, and encompass the full range of embedded requirements. Each product is designed to allow multi-sourcing at every level of development.

ARM processors possess a unique combination of features that makes ARM the most popular embedded architecture today. First, ARM cores are very simple compared to most other general-purpose processors, which means that they can be manufactured using a comparatively small number of transistors, leaving plenty of space on the chip for application-specific macro cells. A typical ARM chip can contain several peripheral controllers, a digital signal processor, and some amount of on-chip memory, along with an ARM core.

Second, both ARM ISA and pipeline design are aimed at minimizing energy consumption — a critical requirement in mobile embedded systems.

Third, the ARM architecture is highly modular: the only mandatory component of an ARM processor is the integer pipeline; all other components, including caches, MMU, floating point and other co-processors are optional, which gives a lot of flexibility in building application-specific ARM-based processors.

Finally, while being small and low-power, ARM processors provide high performance for embedded applications.

Basic Characteristics of ARM7

The main features of ARM7 are:

  • 32/16-bit RISC architecture.
  • 32-bit ARM instruction set for maximum performance and flexibility.
  • 16-bit Thumb instruction set for increased code density.
  • Unified bus interface: a single 32-bit data bus carries both instructions and data.
  • Three-stage pipeline.
  • 32-bit ALU.
  • Very small die size and low power consumption.
  • Fully static operation.
  • Coprocessor interface.
  • Extensive debug facilities (EmbeddedICE debug unit accessible via the JTAG interface).

The principal feature of the ARM7 microcontroller is that it is a register-based load-and-store architecture with a number of operating modes. While the ARM7 is a 32-bit microcontroller, it is also capable of running a 16-bit instruction set, known as Thumb. This helps it achieve greater code density and enhanced power saving.

While all of the register-to-register data-processing instructions are single-cycle, other instructions, such as data transfer instructions, are multi-cycle. To increase the performance of these instructions, the ARM7 has a three-stage pipeline. Due to the inherent simplicity of the design and its low gate count, the ARM7 is the industry leader in low-power processing on a watts-per-MIPS basis.

Finally, to assist the developer, the ARM core has a built-in JTAG debug port and on-chip EmbeddedICE logic that allow programs to be downloaded and fully debugged in-system.

In order to keep the ARM7 both simple and cost-effective, the code and data regions are accessed via a single bus. Thus, while the ARM7 is capable of single-cycle execution of all data-processing instructions, data transfer instructions may take several cycles, since they require at least two bus accesses (one for the instruction and one for the data). In order to improve performance, a three-stage pipeline is used that allows multiple instructions to be processed simultaneously.

The pipeline has three stages: FETCH, DECODE and EXECUTE. The hardware of each stage is designed to be independent, so up to three instructions can be processed simultaneously. The pipeline is most effective in speeding up sequential code.


Fig.1 ARM7 3-Stage Pipeline
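
The overlap of the three stages can be illustrated with a small Python sketch. It is an idealized model only: every instruction is assumed to spend exactly one cycle in each stage, with no branches and no multi-cycle transfers. The function name pipeline_schedule is made up for this example.

```python
def pipeline_schedule(instructions):
    """Return one dict per clock cycle mapping stage name -> instruction.

    Idealized model: one cycle per stage, no branches, no stalls.
    """
    stages = ["FETCH", "DECODE", "EXECUTE"]
    n = len(instructions)
    cycles = []
    # With 3 stages, n instructions need n + 2 cycles to drain completely.
    for cycle in range(n + len(stages) - 1):
        row = {}
        for depth, stage in enumerate(stages):
            idx = cycle - depth  # index of the instruction in this stage
            row[stage] = instructions[idx] if 0 <= idx < n else None
        cycles.append(row)
    return cycles


if __name__ == "__main__":
    for number, row in enumerate(pipeline_schedule(["ADD", "SUB", "MOV", "CMP"]), 1):
        print(number, row)
```

In this model a four-instruction sequence completes in six cycles instead of the twelve that would be needed without overlap, and from the third cycle onward all three stages are busy.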

About ARM7 Architecture


Fig.2 ARM7 Architecture

The architectural simplicity of ARM processors has traditionally led to very small implementations, and small implementations allow devices with very low power consumption.

Implementation size, performance, and very low power consumption remain key attributes in the development of the ARM architecture.

The ARM is a Reduced Instruction Set Computer (RISC), as it incorporates these typical RISC architecture features:

  • A large uniform register file.
  • A load/store architecture, where data-processing operations only operate on register contents, not directly on memory contents.
  • Simple addressing modes, with all load/store addresses being determined from register contents and instruction fields only.
  • Uniform and fixed-length instruction fields, to simplify instruction decode.

In addition, the ARM architecture provides

  • Control over both the Arithmetic Logic Unit (ALU) and shifter in most data-processing instructions to maximize the use of an ALU and a shifter.
  • Auto-increment and auto-decrement addressing modes to optimize program loops.
  • Load and Store Multiple instructions to maximize data throughput.
  • Conditional execution of almost all instructions to maximize execution throughput.

These enhancements to a basic RISC architecture allow ARM processors to achieve a good balance of high performance, small code size, low power consumption, and small silicon area.

ARM7 Registers

ARM has 31 general-purpose 32-bit registers. At any one time, 16 of these registers are visible. The other registers are used to speed up exception processing. All the register specifiers in ARM instructions can address any of the 16 visible registers.

The main bank of 16 registers is used by all unprivileged code. These are the User mode registers. User mode is different from all other modes as it is unprivileged, which means

  • User mode can only switch to another processor mode by generating an exception. The SWI instruction provides this facility from program control.
  • Memory systems and coprocessors might allow User mode less access to memory and coprocessor functionality than a privileged mode.

Three of the 16 visible registers have special roles

Stack pointer: software normally uses R13 as the Stack Pointer (SP). R13 is used by the PUSH and POP instructions in T variants, and by the SRS and RFE instructions from ARMv6.

Link register: R14 is the Link Register (LR). It holds the address of the next instruction after a Branch and Link (BL or BLX) instruction, which is the instruction used to make a subroutine call. It is also used for return address information on entry to exception modes. At all other times, it can be used as a general-purpose register.

Program counter: R15 is the Program Counter (PC). It can be used in most instructions as a pointer to the instruction which is two instructions after the instruction being executed. In ARM state, all ARM instructions are four bytes long (one 32-bit word) and are always aligned on a word boundary. This means that the bottom two bits of the PC are always zero, and therefore the PC contains only 30 non-constant bits.
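
The "two instructions ahead" rule can be written down as a one-line calculation. The sketch below is illustrative: the function name pc_as_operand is an assumption of this example, and the Thumb case (2-byte instructions, so an offset of 4) is added for comparison.

```python
def pc_as_operand(instruction_address, thumb=False):
    """Value an instruction sees when it reads R15 (the PC).

    The PC points two instructions ahead of the one being executed:
    +8 bytes in ARM state (4-byte instructions) and +4 bytes in
    Thumb state (2-byte instructions).
    """
    instruction_size = 2 if thumb else 4
    return (instruction_address + 2 * instruction_size) & 0xFFFFFFFF
```

For an ARM instruction at address 0x8000, reading the PC therefore yields 0x8008, and the bottom two bits of the result stay zero.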

The remaining 13 registers have no special hardware purpose. Their uses are defined purely by software.

All other execution modes are privileged and are therefore only used to run system software.

  • FIQ (fast interrupt) mode is entered whenever the processor receives an interrupt signal from the designated fast interrupt source.
  • IRQ (normal interrupt) mode is entered whenever the processor receives an interrupt signal from any other interrupt source.
  • Supervisor mode is entered when the processor encounters a software interrupt (SWI) instruction. Software interrupts are a standard way to invoke operating system services on ARM.
  • Undefined mode is entered when the processor attempts to execute an instruction that is supported neither by the main integer core nor by one of the coprocessors. This mode can be used to implement coprocessor emulation.
  • System mode is used for running privileged operating system tasks.
  • Abort mode is entered in response to memory faults.

In addition to the user-visible registers, ARM provides several registers that are available in privileged modes only. The SPSR registers store a copy of the value of the CPSR register from before an exception was raised. The privileged modes that are activated in response to exceptions have their own R13 and R14 registers, which avoids having to save the corresponding user registers on every exception.

Exception and Interrupt Modes

ARM supports seven types of exception, each handled in a privileged processing mode. The seven types of exception are

  • Reset.
  • Attempted execution of an undefined instruction.
  • Software interrupt (SWI), which can be used to make a call to an operating system.
  • Prefetch Abort, an instruction fetch memory abort.
  • Data Abort, a data access memory abort.
  • IRQ, normal interrupt.
  • FIQ, fast interrupt.

When an exception occurs, some of the standard registers are replaced with registers specific to the exception mode. All exception modes have replacement banked registers for R13 and R14. The fast interrupt mode has additional banked registers for fast interrupt processing.

When an exception handler is entered, R14 holds the return address for exception processing. Register 13 is banked across exception modes to provide each exception handler with a private stack pointer.

The fast interrupt mode also banks registers 8 to 12 so that interrupt processing can begin without the need to save or restore these registers.
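
The banking scheme described above can be modelled in a few lines of Python. This is a sketch rather than a description of the hardware: the suffixed register labels (such as R13_irq) and the function names are illustrative choices.

```python
MODES = ["User", "System", "FIQ", "IRQ", "Supervisor", "Abort", "Undefined"]

# Registers that have a private (banked) copy in each exception mode.
BANKED = {
    "FIQ": range(8, 15),          # R8-R14
    "IRQ": range(13, 15),         # R13 (SP) and R14 (LR)
    "Supervisor": range(13, 15),
    "Abort": range(13, 15),
    "Undefined": range(13, 15),
}


def physical_register(mode, n):
    """Name of the physical register that Rn refers to in the given mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if not 0 <= n <= 15:
        raise ValueError("register number must be 0..15")
    if n in BANKED.get(mode, ()):
        return f"R{n}_{mode.lower()}"
    return f"R{n}"  # shared with User mode (System mode banks nothing)


def count_physical_registers():
    """Count the distinct general-purpose registers across all modes."""
    return len({physical_register(m, n) for m in MODES for n in range(16)})
```

Counting the distinct names gives 16 shared registers, 7 FIQ copies and 2 copies in each of the other four exception modes, which adds up to the 31 general-purpose registers mentioned earlier.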

There is a sixth privileged processing mode, System mode, which uses the User mode registers. This is used to run tasks that require privileged access to memory and/or coprocessors, without limitations on which exceptions can occur during the task. Reset shares Supervisor mode with SWIs, and Prefetch Abort and Data Abort share Abort mode, so the seven exception types are handled by five exception modes.

The exception process

When an exception occurs, the ARM processor halts execution in a defined manner and begins execution at one of a number of fixed addresses in memory, known as the exception vectors. There is a separate vector location for each exception, including reset. Behavior is defined both for normally running systems and for debug events. An operating system installs a handler for every exception at initialization.
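
As a reference, the table below lists the standard low vector addresses and the mode each exception enters. The dictionary and function names are illustrative; address 0x14 is reserved and therefore not listed.

```python
# exception name -> (vector address, mode entered)
EXCEPTION_VECTORS = {
    "Reset":                 (0x00, "Supervisor"),
    "Undefined instruction": (0x04, "Undefined"),
    "Software interrupt":    (0x08, "Supervisor"),
    "Prefetch Abort":        (0x0C, "Abort"),
    "Data Abort":            (0x10, "Abort"),
    "IRQ":                   (0x18, "IRQ"),
    "FIQ":                   (0x1C, "FIQ"),
}


def vector_address(exception):
    """Address at which execution resumes when the exception is taken."""
    return EXCEPTION_VECTORS[exception][0]


def exception_modes():
    """Distinct modes used by the seven exception types."""
    return {mode for _, mode in EXCEPTION_VECTORS.values()}
```

Because each vector slot is only one word wide, it typically holds a branch to the real handler. FIQ sits last, so its handler can start directly at 0x1C without a branch.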

Status registers

All processor state other than the general-purpose register contents is held in status registers. The current operating processor status is in the Current Program Status Register (CPSR). The CPSR holds

  • Four condition code flags (Negative, Zero, Carry and Overflow).
  • One sticky (Q) flag. This encodes whether saturation has occurred in saturated arithmetic instructions, or signed overflow in some specific multiply accumulate instructions.
  • Four GE (Greater than or equal) flags. These encode the following conditions separately for each operation in parallel instructions
       - Whether the results of signed operations were non-negative.
       - Whether unsigned operations produced a carry or borrow.
  • Five bits are used to encode the current processor mode.
  • Two bits that encode whether ARM instructions, Thumb instructions, or Jazelle opcodes are being executed. On the ARM7TDMI only the Thumb (T) bit is implemented.
  • One bit controls the endianness of load and store operations.

Each exception mode also has a Saved Program Status Register (SPSR) which holds the CPSR of the task immediately before the exception occurred. The CPSR and the SPSRs are accessed with special instructions.
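
To make the layout concrete, the following sketch decodes the CPSR fields that exist on an ARMv4T core such as the ARM7TDMI: the four condition flags in bits 31 to 28, the Thumb bit in bit 5 and the mode in bits 4 to 0. It also includes the I and F interrupt-disable bits (bits 7 and 6), which writes to the CPSR can change. The Q, GE and endianness fields of later architecture versions are deliberately left out, and the function and key names are assumptions of this example.

```python
MODE_NAMES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_cpsr(value):
    """Split a 32-bit CPSR value into its ARMv4T fields."""
    def bit(position):
        return bool((value >> position) & 1)

    return {
        "N": bit(31),              # Negative
        "Z": bit(30),              # Zero
        "C": bit(29),              # Carry
        "V": bit(28),              # Overflow
        "irq_disabled": bit(7),    # I bit
        "fiq_disabled": bit(6),    # F bit
        "thumb": bit(5),           # T bit
        "mode": MODE_NAMES.get(value & 0x1F, "reserved"),
    }
```

For example, decode_cpsr(0x600000D3) reports Z and C set, both interrupts disabled, ARM state and Supervisor mode.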

ARM instruction set

The ARM instruction set can be divided into six broad classes of instruction

  • Branch instructions.
  • Data-processing instructions.
  • Status registers transfer instructions.
  • Load and store instructions.
  • Coprocessor instructions.
  • Exception-generating instructions.

Most data-processing instructions and one type of coprocessor instruction can update the four condition code flags in the CPSR (Negative, Zero, Carry and overflow) according to their result.

Almost all ARM instructions contain a 4-bit condition field. One value of this field specifies that the instruction is executed unconditionally.

Fourteen other values specify conditional execution of the instruction. If the condition code flags indicate that the corresponding condition is true when the instruction starts executing, it executes normally.

Otherwise, the instruction does nothing. The 14 available conditions allow

  • Tests for equality and non-equality.
  • Tests for <, <=, >, and >= inequalities, in both signed and unsigned arithmetic.
  • Each condition code flag to be tested individually.

The sixteenth value of the condition field encodes alternative instructions. These do not allow conditional execution. Before ARMv5 these instructions were UNPREDICTABLE.
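
The condition field can be modelled as a lookup from the 4-bit value to a test on the N, Z, C and V flags. The sketch below uses the standard condition mnemonics; the helper cmp_flags, which computes the flags a CMP instruction would set, and the other names are illustrative.

```python
# Condition field value -> (mnemonic, test on the N, Z, C, V flags)
CONDITIONS = {
    0b0000: ("EQ", lambda n, z, c, v: z),                   # equal
    0b0001: ("NE", lambda n, z, c, v: not z),               # not equal
    0b0010: ("CS", lambda n, z, c, v: c),                   # unsigned >=
    0b0011: ("CC", lambda n, z, c, v: not c),               # unsigned <
    0b0100: ("MI", lambda n, z, c, v: n),                   # negative
    0b0101: ("PL", lambda n, z, c, v: not n),               # positive or zero
    0b0110: ("VS", lambda n, z, c, v: v),                   # overflow
    0b0111: ("VC", lambda n, z, c, v: not v),               # no overflow
    0b1000: ("HI", lambda n, z, c, v: c and not z),         # unsigned >
    0b1001: ("LS", lambda n, z, c, v: (not c) or z),        # unsigned <=
    0b1010: ("GE", lambda n, z, c, v: n == v),              # signed >=
    0b1011: ("LT", lambda n, z, c, v: n != v),              # signed <
    0b1100: ("GT", lambda n, z, c, v: (not z) and n == v),  # signed >
    0b1101: ("LE", lambda n, z, c, v: z or n != v),         # signed <=
    0b1110: ("AL", lambda n, z, c, v: True),                # always
}


def condition_passed(cond, n, z, c, v):
    """True if an instruction with this condition field would execute."""
    if cond not in CONDITIONS:
        raise ValueError("0b1111 does not encode a condition")
    _, test = CONDITIONS[cond]
    return bool(test(bool(n), bool(z), bool(c), bool(v)))


def cmp_flags(a, b):
    """Flags (N, Z, C, V) produced by CMP a, b on 32-bit values."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    result = (a - b) & 0xFFFFFFFF
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                        # carry set means no borrow
    v = ((a ^ b) & (a ^ result)) >> 31     # signed overflow of a - b
    return n, z, c, v
```

After comparing 0xFFFFFFFF with 1, for instance, LT passes (the first operand is -1 when read as signed) and HI also passes (it is the larger value when read as unsigned).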

Branch instructions

As well as allowing many data-processing or load instructions to change control flow by writing the PC, a standard Branch instruction is provided with a 24-bit signed word offset, allowing forward and backward branches of up to 32MB.
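
The 32MB range follows directly from the encoding: the 24-bit offset is sign-extended, shifted left by two bits to turn it into a byte offset, and added to the PC, which reads as the branch address plus 8. A minimal sketch of that calculation (the function name is illustrative):

```python
def branch_target(instruction_address, imm24):
    """Target address of an ARM B/BL instruction.

    imm24 is the raw 24-bit offset field taken from the instruction word.
    """
    offset = imm24 & 0xFFFFFF
    if offset & 0x800000:          # sign-extend the 24-bit field
        offset -= 1 << 24
    # Word offset -> byte offset, relative to PC = instruction address + 8.
    return (instruction_address + 8 + (offset << 2)) & 0xFFFFFFFF
```

The largest forward offset is 0x7FFFFF words (32MB minus 4 bytes) and the largest backward offset is 0x800000 words (exactly 32MB). An offset field of 0xFFFFFE gives a branch to itself.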

There is a Branch and Link (BL) option that also preserves the address of the instruction after the branch in R14, the LR. This provides a subroutine call which can be returned from by copying the LR into the PC.

There are also branch instructions which can switch instruction set, so that execution continues at the branch target using the Thumb instruction set or Jazelle opcodes. Thumb support allows ARM code to call Thumb subroutines, and ARM subroutines to return to a Thumb caller. Similar instructions in the Thumb instruction set allow the corresponding Thumb-to-ARM switches.

Data-processing instructions

The data-processing instructions perform calculations on the general-purpose registers. There are five types of data-processing instructions

  • Arithmetic/logic instructions.
  • Comparison instructions.
  • Single Instruction Multiple Data (SIMD) instructions.
  • Multiply instructions.
  • Miscellaneous Data Processing instructions.

Arithmetic/logic instructions

The ARM architecture provides a range of addition, subtraction, and bit-wise logical operations that take two 32-bit operands and return a 32-bit result that can be independently specified in the 3-address instruction format.

The first operand and the result must be held in registers, while the second operand can be either a register or an immediate. When it is a register, the second operand can be shifted or rotated before being sent to the ALU. Due to the limited space available for operand specification inside the 32-bit instruction, an immediate operand must be a 32-bit binary number in which all the binary ones fall within a group of eight adjacent bit positions on a 2-bit boundary.
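
The immediate rule is equivalent to saying that the value must be an 8-bit number rotated right by an even amount (0 to 30). The following sketch checks whether a constant can be encoded this way; the function names are illustrative.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encode_immediate(value):
    """Return (imm8, rot) with ror32(imm8, 2 * rot) == value, or None.

    rot is the 4-bit rotation field; the actual rotation is 2 * rot bits.
    """
    value &= 0xFFFFFFFF
    for rot in range(16):
        # Undo the rotation: rotating right by 32 - 2*rot is a left
        # rotation by 2*rot.
        imm8 = ror32(value, 32 - 2 * rot)
        if imm8 <= 0xFF:
            return imm8, rot
    return None
```

With this check, 0xFF000000 and 0x3FC are encodable, while 0x101 (ones spread over nine bit positions) and 0x1FE (eight ones starting on an odd bit position) are not and would have to be loaded from memory or built in two instructions.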

ARM also supports several multiply and multiply-accumulate instructions that take two 32-bit register operands and return a 32-bit or 64-bit result. Finally, ARM supports binary comparison operations that do not return any value but only modify the condition flags in the CPSR.

Of the two source operands

  • One is always a register
  • The other has two basic forms
       - An immediate value.
       - A register value optionally shifted.

If the operand is a shifted register, the shift amount can be either an immediate value or the value of another register. Five types of shift can be specified.

Every arithmetic/logic instruction can therefore perform an arithmetic/logic operation and a shift operation.

As a result, ARM does not have dedicated shift instructions.
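
The five shift types are LSL, LSR, ASR, ROR and RRX. The sketch below models them for shift amounts of 0 to 31 and then combines a shift with an addition, as a single ARM instruction such as ADD R0, R1, R2, LSL #2 does. It ignores the carry-out of the shifter and the special encodings for shifts by 32; the function names are illustrative.

```python
MASK = 0xFFFFFFFF


def shift_operand(value, kind, amount=0, carry_in=0):
    """Apply one of the five ARM shift types to a 32-bit value."""
    value &= MASK
    if kind == "RRX":                       # rotate right by 1 through carry
        return ((carry_in & 1) << 31) | (value >> 1)
    if not 0 <= amount <= 31:
        raise ValueError("this sketch handles shift amounts 0..31 only")
    if kind == "LSL":                       # logical shift left
        return (value << amount) & MASK
    if kind == "LSR":                       # logical shift right
        return value >> amount
    if kind == "ASR":                       # arithmetic shift right
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK
    if kind == "ROR":                       # rotate right
        return ((value >> amount) | (value << (32 - amount))) & MASK
    raise ValueError(f"unknown shift type: {kind}")


def add_shifted(rn, rm, kind, amount=0):
    """Model of ADD Rd, Rn, Rm, <kind> #amount (flags not modelled)."""
    return (rn + shift_operand(rm, kind, amount)) & MASK
```

For example, add_shifted(10, 3, "LSL", 2) computes 10 + (3 << 2), which is how a multiply by a small constant or an array index calculation fits into one instruction.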

The Program Counter (PC) is a general-purpose register, and therefore arithmetic/logic instructions can write their results directly to the PC. This allows easy implementation of a variety of jump instructions.

Comparison instructions

The comparison instructions use the same instruction format as the arithmetic/logic instructions. These perform an arithmetic or logical operation on two source operands, but do not write the result to a register. They always update the condition flags, based on the result.

Single Instruction Multiple Data (SIMD) instructions

The Add and Subtract instructions treat each operand as two parallel 16-bit numbers, or four parallel 8-bit numbers. They can be treated as signed or unsigned. The operations can optionally be saturating, wrap around, or the results can be halved to avoid overflow.
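
The three overflow behaviours can be compared with a small model of an unsigned parallel add on two 16-bit lanes. This is only a sketch of the idea, not of any particular instruction encoding, and the function name and mode strings are made up for the example.

```python
def parallel_add16(a, b, mode="wrap"):
    """Add the two unsigned 16-bit lanes of a and b independently.

    mode: "wrap" (modulo 2**16), "saturate" (clamp to 0xFFFF) or
    "halve" (shift each sum right by one so it cannot overflow).
    """
    result = 0
    for lane in range(2):
        shift = 16 * lane
        total = ((a >> shift) & 0xFFFF) + ((b >> shift) & 0xFFFF)
        if mode == "wrap":
            total &= 0xFFFF
        elif mode == "saturate":
            total = min(total, 0xFFFF)
        elif mode == "halve":
            total >>= 1
        else:
            raise ValueError(f"unknown mode: {mode}")
        result |= total << shift
    return result
```

Adding 0xFFFF0001 and 0x00010001 gives 0x00000002 when wrapping, 0xFFFF0002 when saturating and 0x80000001 when halving. In every case the carry out of the low lane never reaches the high lane.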

Multiply instructions

There are several classes of multiply instructions, introduced at different times into the architecture.

Miscellaneous Data Processing instructions

These include Count Leading Zeros (CLZ) and Unsigned Sum of Absolute Differences with optional Accumulate (USAD8 and USADA8).

Status register transfer instructions

The status register transfer instructions transfer the contents of the CPSR or an SPSR to or from a general-purpose register. Writing to the CPSR can

  • Set the values of the condition code flags.
  • Set the values of the interrupt enable bits.
  • Set the processor mode and state.

Load and store instructions

The following load and store instructions are available

  • Load and Store Register.
  • Load and Store multiple registers.
  • Load and Store Register Exclusive.

There are also swap and swap byte instructions. It is recommended that all software migrate to using the load and store register exclusive instructions.

Load and Store Register

Load Register instructions can load a 64-bit double word, a 32-bit word, a 16-bit half word, or an 8-bit byte from memory into a register or registers. Byte and half word loads can be automatically zero-extended or sign-extended as they are loaded.

Store Register instructions can store a 64-bit double word, a 32-bit word, a 16-bit half word, or an 8-bit byte from a register or registers to memory.

Unaligned loads and stores of words and half words are supported, accessing the specified byte addresses.

Load and Store Register instructions have three primary addressing modes, all of which use a base register and an offset specified by the instruction

  • In offset addressing, the memory address is formed by adding or subtracting an offset to or from the base register value.
  • In pre-indexed addressing, the memory address is formed in the same way as for offset addressing. As a side effect, the memory address is also written back to the base register.
  • In post-indexed addressing, the memory address is the base register value. As a side effect, an offset is added to or subtracted from the base register value and the result is written back to the base register.

In each case, the offset can be either an immediate or the value of an index register. Register-based offsets can also be scaled with shift operations.
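
The difference between the three addressing modes lies in which address is used and whether the base register is updated; pre-indexed addressing writes the computed address back to the base register. The sketch below models a word load with an immediate offset. The memory and register dictionaries and the function name are assumptions of the example, and the matching assembler syntax is shown in the comments.

```python
def load_word(memory, regs, base, offset, mode):
    """Load a word using offset, pre-indexed or post-indexed addressing.

    memory: dict mapping address -> word value
    regs:   dict mapping register name -> value (updated in place)
    """
    if mode == "offset":        # LDR Rd, [Rn, #offset]
        address = regs[base] + offset
    elif mode == "pre":         # LDR Rd, [Rn, #offset]!
        address = regs[base] + offset
        regs[base] = address    # write-back of the computed address
    elif mode == "post":        # LDR Rd, [Rn], #offset
        address = regs[base]
        regs[base] = address + offset
    else:
        raise ValueError(f"unknown addressing mode: {mode}")
    return memory[address]
```

Post-indexed addressing is the natural choice for walking an array: each load reads the current element and leaves the base register pointing at the next one.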

As the PC is a general-purpose register, a 32-bit value can be loaded directly into the PC to perform a jump to any address in the 4GB memory space.

Load and Store multiple registers

Load Multiple (LDM) and Store Multiple (STM) instructions perform a block transfer of any number of the general-purpose registers to or from memory. Four addressing modes are provided

  • Pre-increment
  • Post-increment
  • Pre-decrement
  • Post-decrement.

As the subroutine return address and PC values are in general-purpose registers, very efficient subroutine entry and exit sequences can be constructed with LDM and STM

  • A single STM instruction at subroutine entry can push register contents and the return address onto the stack, updating the stack pointer in the process.
  • A single LDM instruction at subroutine exit can restore register contents from the stack, load the PC with the return address, and update the stack pointer.

LDM and STM instructions also allow very efficient code for block copies and similar data movement algorithms.
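
The subroutine entry and exit sequence can be sketched as follows, modelling STMFD SP!, {R4, R5, LR} on entry and LDMFD SP!, {R4, R5, PC} on exit with a full descending stack. Registers are a list of 16 integers and memory is a dictionary; these representations and the function names are illustrative.

```python
def stmfd(memory, regs, reg_list):
    """STMFD SP!, {reg_list}: push registers, lowest number at lowest address."""
    ordered = sorted(reg_list)
    sp = regs[13] - 4 * len(ordered)   # pre-decrement for the whole block
    for i, r in enumerate(ordered):
        memory[sp + 4 * i] = regs[r]
    regs[13] = sp                      # write-back of the new stack pointer


def ldmfd(memory, regs, reg_list):
    """LDMFD SP!, {reg_list}: pop registers, then advance the stack pointer."""
    ordered = sorted(reg_list)
    sp = regs[13]
    for i, r in enumerate(ordered):
        regs[r] = memory[sp + 4 * i]
    regs[13] = sp + 4 * len(ordered)
```

Because LR (R14) and PC (R15) occupy the same position in their respective register lists, the value pushed from LR on entry is the value loaded into the PC on exit, so the pop also performs the return.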

Load and Store Register Exclusive

These instructions support cooperative memory synchronization. They are designed to provide the atomic behavior required for semaphores without locking all system resources between the load and store phases.

Coprocessor instructions

There are three types of coprocessor instructions

Data-processing instructions

These start a coprocessor-specific internal operation.

Data transfer instructions

These transfer coprocessor data to or from memory. The address of the transfer is calculated by the ARM processor.

Register transfer instructions

These allow a coprocessor value to be transferred to or from an ARM register, or a pair of ARM registers.

Data transfer instructions

ARM supports two types of data transfer instructions: single-register transfers and multiple-register transfers.

Single-register transfer instructions provide a flexible way to move 1, 2, or 4-byte blocks between registers and memory, while multiple-register transfer instructions provide an efficient but less flexible way to move larger amounts of data.

The main addressing mode is base-plus-offset addressing. The value in the base register is added to an offset, which is either held in a register or passed as an immediate value, to form the memory address for the load or store.

An auto-indexed addressing mode writes the value of the base register incremented by the offset back to the base register, so it is possible to access the value in the next memory location in the following instruction, without wasting an additional instruction to increment the register.

Multiple-register transfer instructions allow loading or storing any subset of the sixteen general-purpose registers from/to sequential memory addresses. Auto indexed addressing modes are also supported for multiple-register transfers.

Control flow instructions

In addition to the usual conditional and unconditional branch instructions, the ARM architecture provides support for conditional execution of arbitrary instructions: almost any instruction can be predicated using the values of the CPSR condition flags.

Exception-generating instructions

Two types of instruction are designed to cause specific exceptions to occur.

Software interrupt instructions

SWI instructions cause a software interrupt exception to occur. These are normally used to make calls to an operating system, to request an OS-defined service. The exception entry caused by a SWI instruction also changes to a privileged processor mode. This allows an unprivileged task to gain access to privileged functions.

Software breakpoint instructions

BKPT instructions cause an abort exception to occur. If suitable debugger software is installed on the abort vector, an abort exception generated in this fashion is treated as a breakpoint. If debug hardware is present in the system, it can instead treat a BKPT instruction directly as a breakpoint, preventing the abort exception from occurring.

In addition to the above, the following types of instruction cause an Undefined Instruction exception to occur

  • Coprocessor instructions which are not recognized by any hardware coprocessor
  • Most instruction words that have not yet been allocated a meaning as an ARM instruction.

In each case, this exception is normally used either to generate a suitable error or to initiate software emulation of the instruction.

Thumb instruction set

The Thumb instruction set is a subset of the ARM instruction set, with each instruction encoded in 16 bits instead of 32 bits.

Data Types

The ARM instruction set supports six data types: 8-bit signed and unsigned, 16-bit signed and unsigned, and 32-bit signed and unsigned.

The ARM instruction set has been designed to support these data types in either little-endian or big-endian format; however, most ARM silicon implementations use the little-endian format.

ARM instructions typically have a three-operand format, as shown below

      ADD R1, R2, R3    ; R1 = R2 + R3

ARM processors support the following data types:

  • Byte: 8 bits
  • Half word: 16 bits
  • Word: 32 bits

ARM7 Program Flow Control

Most processors have a small group of instructions that are conditionally executed depending on a group of processor flags. These are branch instructions such as branch if not equal. Within the ARM instruction set, almost all instructions are conditionally executable.

The top four bits of each instruction contain a condition code that must be satisfied if the instruction is to be executed. This goes a long way towards eliminating small branches in the program code and the pipeline stalls they cause, thereby increasing overall program performance. Thus, for small conditional sequences of three instructions or fewer, conditional execution should be used instead of a branch.

Processor modes

The ARM architecture supports seven processor modes. Mode changes can be made under software control, or can be caused by external interrupts or exception processing.


Fig.3 ARM7 Register Organization

Most application programs execute in User mode. When the processor is in User mode, the program being executed is unable to access some protected system resources or to change mode, other than by causing an exception to occur. This allows a suitably-written operating system to control the use of system resources.

The modes other than User mode are known as privileged modes. They have full access to system resources and can change mode freely.

Five of them are known as exception modes

  • FIQ
  • IRQ
  • Supervisor
  • Abort
  • Undefined.

These are entered when specific exceptions occur. Each of them has some additional registers to avoid corrupting User mode state when the exception occurs.

The remaining mode is System mode, which is not entered by any exception and has exactly the same registers available as User mode.

However, it is a privileged mode and is therefore not subject to the User mode restrictions. It is intended for use by operating system tasks that need access to system resources, but wish to avoid using the additional registers associated with the exception modes. Avoiding such use ensures that the task state is not corrupted by the occurrence of any exception.
