Most modern CPUs — the ones we'll encounter in malware analysis — are based on the Von Neumann architecture. It defines how a processor fetches, decodes, and executes instructions. That three-step loop is the heartbeat of every program running on a system, including any malware.
Its key components are:
Control Unit: fetches the next instruction from memory, using the Instruction Pointer register to know where to look.
Arithmetic Logic Unit (ALU): actually executes the instruction (the maths, the comparisons, the logic) and puts the result in a register or back in memory.
Registers: ultra-fast, tiny storage built into the CPU itself. They hold whatever the CPU is actively working with.
Memory (RAM): where the full program, its code and data, lives while it's running. Much larger than registers but slower.
What matters for malware analysis: the Instruction Pointer (EIP in 32-bit, RIP in 64-bit) always tells the CPU what to run next. Almost all exploitation techniques — buffer overflows, ROP chains, shellcode injection — ultimately work by corrupting or redirecting this register.
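To make the fetch, decode, execute loop concrete, here is a minimal sketch of a toy CPU in Python. The three opcodes (`ADD`, `JMP`, `HALT`) and the `run` helper are invented for illustration and are not x86; the point is only to show that whatever ends up in the instruction pointer decides what runs next.

```python
# Toy fetch-decode-execute loop. The instruction set is made up for this example.
def run(program, max_steps=100):
    regs = {"ip": 0, "acc": 0}        # "ip" plays the role of EIP/RIP
    trace = []                        # addresses executed, in order
    for _ in range(max_steps):
        if regs["ip"] >= len(program):
            break
        trace.append(regs["ip"])
        opcode, operand = program[regs["ip"]]   # fetch
        regs["ip"] += 1                         # IP now points at the next instruction
        if opcode == "ADD":                     # decode + execute
            regs["acc"] += operand
        elif opcode == "JMP":
            regs["ip"] = operand                # control transfer: overwrite IP
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return regs["acc"], trace


program = [
    ("ADD", 1),     # address 0
    ("JMP", 3),     # address 1: skip the instruction at address 2
    ("ADD", 100),   # address 2: never runs
    ("ADD", 10),    # address 3
    ("HALT", 0),    # address 4
]
acc, trace = run(program)
print(acc, trace)   # 11 [0, 1, 3, 4]
```

The instruction at address 2 is skipped purely because the jump overwrote `ip`. An exploit that gains control of EIP/RIP is doing the same thing with a value the attacker chose.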
Registers are the fastest storage available — they sit directly on the CPU die. Because there are so few of them, they're split by purpose. There are four main categories: the Instruction Pointer, General Purpose Registers, Status Flags, and Segment Registers.
The Instruction Pointer holds the address of the next instruction to execute. Simple concept, enormous importance.
Called EIP in 32-bit systems and RIP in 64-bit systems.
You'll see this referenced constantly in exploit writeups — it's always the prize.
General Purpose Registers are the CPU's working hands. Each has a conventional role (accumulating results, counting loops, tracking the stack), but most can be used for any computation. Knowing their conventions is what makes reading disassembly faster.
Each register can be accessed at different widths. EAX (32-bit) contains AX (lower 16-bit), which contains AH (upper 8-bit) and AL (lower 8-bit). This sub-register access shows up constantly in assembly.
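The nesting of EAX, AX, AH and AL is just bit masking. A small sketch in Python (the helper names `sub_registers` and `write_al` are made up for this example):

```python
def sub_registers(eax):
    """Split a 32-bit EAX value into its AX, AH and AL views."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF          # lower 16 bits of EAX
    ah = (ax >> 8) & 0xFF      # upper 8 bits of AX
    al = ax & 0xFF             # lower 8 bits of AX
    return ax, ah, al


def write_al(eax, value):
    """Model `mov al, value`: only the lowest byte of EAX changes."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


ax, ah, al = sub_registers(0x12345678)
print(hex(ax), hex(ah), hex(al))          # 0x5678 0x56 0x78
print(hex(write_al(0x12345678, 0xFF)))    # 0x123456ff
```

This is why an instruction that writes AL in a disassembly listing still matters when a later instruction reads EAX: the upper 24 bits are whatever was there before.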
| Register | Nickname | What it's for | 64-bit |
|---|---|---|---|
| EAX | Accumulator | Stores arithmetic results; holds return values from function calls | RAX |
| EBX | Base Register | Base address for memory offset references | RBX |
| ECX | Counter | Loop counters, string operation counts | RCX |
| EDX | Data Register | Holds the high half of multiplication results and the remainder of divisions; I/O port operations | RDX |
| ESP | Stack Pointer | Always points to the top of the stack — updates on every push/pop | RSP |
| EBP | Base Pointer | Stable reference point for the current function's stack frame | RBP |
| ESI | Source Index | Source address for string/memory copy operations | RSI |
| EDI | Destination Index | Destination address for string/memory copy operations | RDI |
| R8–R15 | Extended GPRs | 64-bit only. Additional registers; used heavily in x64 calling conventions for passing function arguments | R8–R15 |
The EFLAGS register (RFLAGS in 64-bit mode) is a 32-bit register
made up mostly of individual 1-bit flags. After an arithmetic, logic, or comparison
instruction runs, the CPU automatically updates the relevant flags. These flags drive
conditional branches (if ZF is set, jump here; otherwise go there), so understanding
them is essential for following program logic in a disassembler.
Zero Flag (ZF): set to 1 when an instruction produces a zero result. Used constantly in comparisons: CMP sets ZF when its operands are equal, then JE (jump if equal) checks it.
Carry Flag (CF): set when a result is too large (or too small) for its destination register, that is, on an unsigned overflow or underflow. Signals a carry out of, or a borrow into, the most significant bit.
Sign Flag (SF): set when the most significant bit of a result is 1, meaning the result is negative in signed arithmetic.
Trap Flag (TF): when set, forces single-step execution, so the CPU raises an exception after every single instruction. Used by debuggers. Some malware checks or manipulates it to detect whether it is being debugged.
The Trap Flag is a classic anti-debugging trick. Malware can deliberately set TF and then check whether it causes an exception that's handled internally or by an external debugger. If a debugger is present, behaviour changes — the payload hides or the sample terminates early. Always worth checking for TF manipulation during dynamic analysis.
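To see how the flags connect a comparison to a branch, here is a rough Python model of `cmp a, b` on 32-bit operands. It tracks only CF, ZF and SF (a real CMP also updates OF, PF and AF), and the helper names are invented for this sketch. The bit positions used are the documented EFLAGS positions: CF is bit 0, ZF bit 6, SF bit 7, TF bit 8.

```python
MASK32 = 0xFFFFFFFF
# Bit masks for the flags discussed above
CF, ZF, SF, TF = 1 << 0, 1 << 6, 1 << 7, 1 << 8


def cmp_flags(a, b):
    """Return the CF/ZF/SF bits that `cmp a, b` would produce (other flags ignored)."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32      # CMP subtracts and throws the result away
    flags = 0
    if result == 0:
        flags |= ZF
    if a < b:                      # unsigned borrow
        flags |= CF
    if result & 0x80000000:        # most significant bit set
        flags |= SF
    return flags


def je_taken(flags):
    """JE / JZ branches when ZF is set."""
    return bool(flags & ZF)


def flag_names(eflags):
    """List which of the four flags are set in a raw EFLAGS value."""
    table = [("CF", CF), ("ZF", ZF), ("SF", SF), ("TF", TF)]
    return [name for name, bit in table if eflags & bit]


print(je_taken(cmp_flags(5, 5)))      # True
print(flag_names(cmp_flags(3, 5)))    # ['CF', 'SF']
print(flag_names(0x346))              # ['ZF', 'TF']
```

When a debugger shows a raw EFLAGS value, checking it against the `0x100` mask is a quick way to spot that TF is set.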
Segment registers are 16-bit registers originally designed to help address memory by
dividing the address space into segments. Modern OSes use a flat memory model,
so these are less critical than they once were, but they still appear in disassembly, and
one of them, FS (GS on 64-bit Windows), is actively used by Windows in ways malware exploits.
| Register | Segment | What it points to |
|---|---|---|
| CS | Code Segment | The executable code section of the program |
| DS | Data Segment | The data section — global and static variables |
| SS | Stack Segment | The program's stack |
| ES, FS, GS | Extra Segments | Additional data segments. On 32-bit Windows, FS points to the Thread Information Block (TIB); 64-bit Windows uses GS for the same purpose. Malware often reads this structure to get process/thread info without calling obvious APIs. |
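As one illustration of reading the TIB directly, the sketch below simulates the well-known 32-bit pattern `mov eax, fs:[0x30]` followed by `movzx eax, byte ptr [eax+2]`, which follows the TIB's pointer to the Process Environment Block (PEB) and reads its `BeingDebugged` byte. Everything here is simulated: the memory is a plain `bytearray`, the base addresses `0x1000` and `0x2000` are made up, and only the two offsets come from the 32-bit Windows layout.

```python
import struct

TEB_PEB_OFFSET = 0x30        # FS:[0x30] on 32-bit Windows: pointer to the PEB
PEB_BEING_DEBUGGED = 0x02    # one-byte BeingDebugged field inside the PEB


def build_fake_memory(being_debugged):
    """Lay out a made-up TIB at 0x1000 and a made-up PEB at 0x2000."""
    mem = bytearray(0x3000)
    tib_base, peb_base = 0x1000, 0x2000
    struct.pack_into("<I", mem, tib_base + TEB_PEB_OFFSET, peb_base)
    mem[peb_base + PEB_BEING_DEBUGGED] = 1 if being_debugged else 0
    return mem, tib_base


def is_debugged(mem, fs_base):
    # mov eax, fs:[0x30]           ; follow the TIB's pointer to the PEB
    (peb,) = struct.unpack_from("<I", mem, fs_base + TEB_PEB_OFFSET)
    # movzx eax, byte ptr [eax+2]  ; read BeingDebugged
    return mem[peb + PEB_BEING_DEBUGGED] != 0


mem, fs_base = build_fake_memory(being_debugged=True)
print(is_debugged(mem, fs_base))   # True
```

No API call appears anywhere in that sequence, which is why an `fs:[...]` access in a disassembly listing deserves a second look.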
When Windows loads a program, it doesn't give it access to all of physical RAM. The OS presents each process with its own virtual address space — an abstracted view of memory that looks complete but is isolated from other processes. Within that space, memory is divided into four sections:
Code: machine code instructions. Has execute permissions. If malware can inject shellcode into a region and get the CPU to treat it as code, that's code execution.
Data: initialised globals and constants set at compile time. Rarely changes at runtime. Often contains interesting strings that show up in static analysis.
Heap: dynamically allocated memory (think malloc), created and freed at runtime. Heap spray attacks and use-after-free bugs live here.
Stack: the most important section for malware analysis. Contains local variables, function arguments, and return addresses. Because return addresses control execution flow, the stack is the primary target for memory corruption attacks.
The stack is a Last In, First Out (LIFO) structure. Picture a stack of plates — you can only add to or remove from the top. The CPU tracks it with two registers at all times:
ESP / RSP (Stack Pointer): always points to the current top of the stack. Every push decrements it; every pop increments it.
EBP / RBP (Base Pointer): a stable reference point for the current function's stack frame. While ESP moves around, EBP stays fixed, making it easy to reference local variables and arguments by a constant offset.
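The push and pop behaviour can be sketched with a toy 32-bit stack in Python. The class name `Stack32`, the 64-byte size and the starting address `0x1000` are arbitrary choices for illustration; the part that mirrors x86 is that the stack grows toward lower addresses.

```python
import struct


class Stack32:
    """Toy 32-bit stack that grows toward lower addresses, as on x86."""

    def __init__(self, size=64, top=0x1000):
        self.base = top - size          # lowest valid address
        self.mem = bytearray(size)
        self.esp = top                  # empty stack: ESP sits just past the end

    def push(self, value):
        self.esp -= 4                   # push decrements ESP, then stores
        struct.pack_into("<I", self.mem, self.esp - self.base, value & 0xFFFFFFFF)

    def pop(self):
        (value,) = struct.unpack_from("<I", self.mem, self.esp - self.base)
        self.esp += 4                   # pop loads, then increments ESP
        return value


s = Stack32()
s.push(0x11111111)
s.push(0x22222222)
print(hex(s.esp))      # 0xff8
print(hex(s.pop()))    # 0x22222222 (last in, first out)
print(hex(s.pop()))    # 0x11111111
print(hex(s.esp))      # 0x1000
```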
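Every time a function is called, a chunk of the stack is set up for it: a stack frame. Reading from high addresses to low, a typical 32-bit frame holds the function's arguments, then the return address, then the saved EBP, then the local variables.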
There's a standard sequence of instructions at the start and end of most functions: the prologue, which sets up the stack frame, and the epilogue, which tears it down. Once you recognise these patterns in a disassembler, function boundaries become obvious.
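On 32-bit x86 the entry sequence is commonly `push ebp; mov ebp, esp; sub esp, N`, and the exit sequence `mov esp, ebp; pop ebp; ret` (compilers may emit `leave` instead, or omit the frame pointer entirely). The Python sketch below traces ESP and EBP through that pattern. The starting register values and the function name `call_with_frame` are made up for illustration.

```python
def call_with_frame(return_address, local_bytes=8):
    """Trace ESP/EBP through a typical 32-bit prologue and epilogue."""
    mem = {}                          # address -> 32-bit value
    esp, ebp = 0x1000, 0x2000         # made-up register values in the caller
    caller_ebp = ebp

    def push(value):
        nonlocal esp
        esp -= 4
        mem[esp] = value

    def pop():
        nonlocal esp
        value = mem[esp]
        esp += 4
        return value

    push(return_address)              # call func     ; pushes the return address
    push(ebp)                         # push ebp      ; prologue: save caller's EBP
    ebp = esp                         # mov  ebp, esp ; EBP now anchors the frame
    esp -= local_bytes                # sub  esp, N   ; reserve space for locals
    frame = {"ebp": ebp, "esp": esp,
             "saved_ebp_at": ebp, "return_address_at": ebp + 4}

    esp = ebp                         # mov  esp, ebp ; epilogue: discard locals
    ebp = pop()                       # pop  ebp      ; restore caller's EBP
    eip = pop()                       # ret           ; pop return address into EIP
    return frame, ebp == caller_ebp, eip, esp


frame, ebp_restored, eip, esp = call_with_frame(0x00401234)
print(hex(frame["ebp"]), hex(frame["esp"]))   # 0xff8 0xff0
print(ebp_restored, hex(eip), hex(esp))       # True 0x401234 0x1000
```

Inside the function, the saved EBP sits at `[ebp]` and the return address at `[ebp+4]`, while locals live at negative offsets from EBP.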
Notice that local variables sit right below the Saved EBP and Return Address.
If a local buffer (say, char buf[64]) is written to without bounds checking,
an attacker can write past the end of that buffer and overwrite the Return Address
with any value they want. When the function hits ret, the CPU jumps to
the attacker's address — their shellcode, a ROP gadget, anywhere. This is a Stack
Buffer Overflow, and it's one of the most fundamental techniques in binary exploitation.
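The overwrite can be simulated safely in Python by treating a frame as raw bytes. This is a simplified model under stated assumptions: the layout is exactly `buf[64]`, then the saved EBP, then the return address, with no padding, stack canary or other mitigation, and all addresses are made up.

```python
import struct

BUF_SIZE = 64


def make_frame(return_address, saved_ebp=0xBFFFF000):
    """Frame bytes from low to high address: buf[64] | saved EBP | return address."""
    return bytearray(BUF_SIZE) + struct.pack("<II", saved_ebp, return_address)


def unchecked_copy(frame, data):
    """Model a strcpy-style copy into buf that never checks the length."""
    for i, byte in enumerate(data):
        frame[i] = byte


def read_return_address(frame):
    (addr,) = struct.unpack_from("<I", frame, BUF_SIZE + 4)
    return addr


frame = make_frame(return_address=0x08048ABC)
unchecked_copy(frame, b"A" * BUF_SIZE)              # fits: return address untouched
print(hex(read_return_address(frame)))              # 0x8048abc

# 64 bytes fill buf, 4 more clobber the saved EBP, the last 4 replace the return address
payload = b"A" * BUF_SIZE + b"B" * 4 + struct.pack("<I", 0xDEADBEEF)
unchecked_copy(frame, payload)
print(hex(read_return_address(frame)))              # 0xdeadbeef
```

The 68-byte distance from the start of the buffer to the return address is specific to this toy layout. In a real binary you would measure it in a debugger.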