3.1 A Historical Perspective¶

Computers execute machine code (sequences of bytes encoding low-level operations)
A compiler generates machine code through a series of stages
- The gcc C compiler generates its output in the form of assembly code (a textual representation of the machine code giving the individual instructions in the program)
Moore's Law: the number of transistors on a microchip doubles about every 26 months

3.2 Program Encodings¶

Machine-level Code¶

To compile code using command line: gcc -O1 -o p p1.c p2.c

Two forms of abstraction important for machine level programming: 1. Instruction set architecture (ISA): defines the format and behavior of a machine-level program. Defines the processor state, the format of the instructions, and the effect each of these instructions will have on the state. 2. Virtual addresses: The memory addresses used by a machine-level program, providing a memory model that is a very large byte array.

Assembly / Machine Code View: ![[Screen Shot 2024-05-19 at 18.29.42.png]]

Machine code visible processor state: 1. Program Counter (PC): address in memory of next instruction 2. Register file: heavily used program data 1. Integer register file: contains 8 named locations storing 32-bit values 2. Floating-point registers: store floating-point data 3. Condition codes: status information about the most recently executed arithmetic or logical instruction. Used for conditional branching

Memory: - the executable machine code for the program - some information required by the operating system - a run-time stack for managing procedure calls and returns - blocks of memory allocated by the user (for example, by using the malloc library function)

Arrays and structures are represented in machine code as contiguous collections of bytes.
汇编代码不区分有符号数和无符号数，不区分指针的不同类型，不区分指针和整数。
一条机器指令只执行一个非常基本的操作。

Code Examples¶

C Code¶

code.c:

int accum = 0;
int sum(int x, int y) 
{
    int t = x + y;
    accum += t;
    return t;
}

C to Assembly¶

To see the assembly code generated by the C compiler, we can use the “-S” option on the command line:

unix> gcc -O1 -S code.c

This will cause gcc to run the compiler, generating an assembly file code.s, and go no further. (Normally it would then invoke the assembler to generate an objectcode file.)

code.s:

sum: 
    pushl %ebp 
    movl %esp, %ebp 
    movl 12(%ebp), %eax 
    addl 8(%ebp), %eax 
    addl %eax, accum 
    popl %ebp 
    ret

Assembly to Machine Code¶

If we use the ‘-c’ command-line option, gcc will both compile and assemble the code:

unix> gcc -O1 -c code.c

This will generate an object-code file code.o that is in binary format and hence cannot be viewed directly. Embedded within the 800 bytes of the file code.o is a 17-byte sequence having hexadecimal representation code.o:

55 89 e5 8b 45 0c 03 45 08 01 05 00 00 00 00 5d c3

Machine Code to Assembly - Disassemblers¶

With Linux systems, the program objdump (for “object dump”) can serve this role given the ‘-d’ command-line flag:

unix> objdump -d code.o

3.3 Data Formats¶

![[Screen Shot 2024-05-19 at 23.13.56.png]] ![[Screen Shot 2024-05-21 at 00.26.19.png]] Most assembly-code instructions generated by gcc have a single-character suffix denoting the size of the operand. Floating point involves an entirely different set of instructions and registers

3.4 Accessing Information¶

An IA32 CPU contains a set of 8 registers storing 32-bit values. ![[Screen Shot 2024-05-19 at 23.41.18.png]]
An x86-64 central processing unit (CPU) contains a set of 16 general-purpose registers storing 64-bit values. ![[Screen Shot 2024-05-19 at 23.54.18.png]]

Operand Specifiers¶

Different ways operands can be used in computer instructions: 1. Immediate: The operand is a constant value directly in the instruction. 2. Register: The operand is a value stored in a specific register. 3. Memory: We access some memory location according to a computed address (effective address). Since we view the memory as a large array of bytes, we use the notation $M_b[Addr]$ to denote a reference to the b-byte value stored in memory starting at address $Addr$. To simplify things, we will generally drop the subscript b.

![[Screen Shot 2024-05-20 at 00.13.37.png]]

General form of computing effective address: $$Imm(r_b, r_i, s)=Imm+R[r_b]+R[r_i]\cdot s$$ - immediate offset $Imm$ - base register $r_b$ - index register $r_i$ - scale factor $s$

Data Movement Instructions¶

`mov` class¶

![[Screen Shot 2024-05-20 at 15.25.26.png]] - x86-64 imposes the restriction that a move instruction cannot have both operands refer to memory locations. - The movabsq instruction can have an arbitrary 64-bit immediate value as its source operand and can only have a register as a destination.

`movz` class and `movs` class¶

movz 系列和 movs 系列可以把较小的源值复制到较大的目的，目的都是寄存器。
movz 将目的寄存器剩余字节做零扩展，movs 做符号扩展 ![[Screen Shot 2024-05-20 at 15.36.53.png]] ![[Screen Shot 2024-05-20 at 15.37.06.png]]

Data Movement Example¶

A function returns a value by storing it in register %rax, or in one of the low-order portions of this register.
Pointers are addresses, dereferencing a pointer involves copying that pointer into a register, and then using this register in a memory reference.
- Dereference a pointer in Assembly: (%rdi)
Local variables are often kept in registers rather than stored in memory locations. Register access is much faster than memory access

Pushing and Popping Stack Data¶

stack: last in, first out
The stack grows downward such that the top element of the stack has the lowest address of all stack elements.
The stack pointer %rsp holds the address of the top stack element.

![[Screen Shot 2024-05-20 at 19.05.28.png]]

3.5 Arithmetic and Logical Operations¶

Most of the operations are given as instruction classes, as they can have different variants with different operand sizes. ![[Screen Shot 2024-05-20 at 22.44.03.png]]

Load Effective Address¶

The load effective address instruction leaq is actually a variant of the movq instruction.
Reads from memory to a register
Uses
- Computing addresses without a memory reference
  - translation of p = &x[i];
- Computing arithmetic expressions of the form $x+k\cdot y$
  - k = 1, 2, 4, 8
  - if register %rdx contains value x, then the instruction leaq 7(%rdx,%rdx,4), %rax will set register %rax to 5x + 7

Unary and Binary Operations¶

Unary operations - the single operand serving as both source and destination. - This operand can be either a register or a memory location.

Binary operations - the second operand is used as both a source and a destination

Shift Operations¶

The different shift instructions can specify the shift amount either as an immediate value or with the single-byte register %cl.

3.6 Control¶

Condition Codes¶

In addition to the integer registers, CPU maintains a set of single-bit condition code registers, describing attributes of the most recent arithmetic or logical operation. - CF: Carry flag. Most recent operation has carry out bit. (Overflow for unsigned operations) - ZF: Zero flag. Most recent operation yielded zero. - SF: Sign flag. Most recent operation yielded a negative value. - OF: Overflow flag. Most recent operation caused a two's-complement overflow - either negative or positive. These flags are set implicitly by operations (Exception: leaq doesn't alter condition codes)

Comparison and test instructions ![[Screen Shot 2024-05-21 at 18.00.50.png]]

Accessing the Condition Codes¶

Three common ways of using the condition codes: 1. set a single byte to 0 or 1 depending on some combination of the condition codes 2. conditionally jump to some other part of the program 3. conditionally transfer data

SET instructions - first way - A SET instruction has either one of the low-order single-byte register elements or a single-byte memory location as its destination, setting this byte to either 0 or 1. ![[Screen Shot 2024-05-21 at 18.11.27.png]]

Jump Instructions¶

A jump instruction can cause the execution to switch to a completely new position in the program. These jump destinations are generally indicated in assembly code by a label. ![[Screen Shot 2024-05-21 at 18.52.17.png]]

Implementing Conditional Branches with Conditional Control¶

addq    $1, ge_cnt(%rip)

- ge_cnt(%rip): This specifies the memory location of the variable ge_cnt relative to the Instruction Pointer (RIP). This mode of addressing is known as RIP-relative addressing, commonly used in x86-64 assembly to access global variables or static data.

Conditional Control¶

How it works: Uses jump instructions to skip parts of code based on a condition.
Example: if statements that use goto or jump to execute different code sections.

Conditional Moves¶

How it works: Uses special instructions to move data based on a condition without jumping.
Example: cmov instructions that copy values into registers only if a condition is met.

Key Difference¶

Conditional Control: Changes the flow of the program.
Conditional Moves: Keeps the program flow linear but conditionally updates data.

Implementing Conditional Branches with Conditional Moves¶

![[Screen Shot 2024-05-22 at 02.51.20.png]] general form of if-else structure in C:

v = test-expr ? then-expr : else-expr;

general form of the conditional control using goto code:

t = test_expr;
if (!t)
    goto false;
then-statement
goto done;

false:
    else-statement
done:
    return

general form of conditional moves:

v = then-expr;
ve = else-expr;
t = test-expr;
if (!t) v = ve;

Loops¶

`do-while`¶

The general form of a do-while statement:

do 
    body-statement 
    while (test-expr);

goto statements:

loop: 
    body-statement 
    t = test-expr; 
    if (t) 
        goto loop;

`while`¶

The general form of a while statement is:

while (test-expr) 
    body-statement

goto statements:

    goto test; 
loop: 
    body-statement 
test: 
    t = test-expr; 
    if (t) 
        goto loop;

`for`¶

The general form of a for loop is:

for (init-expr; test-expr; update-expr) 
    body-statement

goto statements:

    init-expr; 
    t = test-expr; 
    if (!t) 
        goto done; 
loop: 
    body-statement 
    update-expr; 
    t = test-expr; 
    if (t) 
        goto loop; 
done:

Switch Statements¶

The key step in executing a switch statement is to access a code location through the jump table. ![[Screen Shot 2024-05-22 at 12.55.31.png]]

3.7 Procedures¶

The Run-Time Stack¶

![[Screen Shot 2024-05-24 at 16.18.21.png]]

Control Transfer¶

Passing control from function P to function Q involves simply setting the program counter (PC) to the starting address of the code for Q.
When it later comes time for Q to return, the processor must have some record of the code location where it should resume the execution of P. ![[Screen Shot 2024-05-24 at 15.50.42.png]]
call instruction: has a target indicating the address of the instruction where the called procedure starts (direct / indirect)

Data Transfer¶

When procedure P calls procedure Q, the code for P must first copy the arguments into the proper registers. Similarly, when Q returns back to P, the code for P can access the returned value in register %rax.
most of these data passing to and from procedures take place via registers.
With x86-64, up to six integral (i.e., integer and pointer) arguments can be passed via registers.![[Screen Shot 2024-05-24 at 16.12.05.png]]
When a function has more than six integral arguments, the other ones are passed on the stack.
Assume that procedure P calls procedure Q with n integral arguments, such that n > 6. Then the code for P must allocate a stack frame with enough storage for arguments 7 through n, as illustrated in Figure 3.25.
It copies arguments 1–6 into the appropriate registers, and it puts arguments 7 through n onto the stack, with argument 7 at the top of the stack.
- When passing parameters on the stack, all data sizes are rounded up to be multiples of eight.
With the arguments in place, the program can then execute a call instruction to transfer control to procedure Q. Procedure Q can access its arguments via registers and possibly from the stack. If Q, in turn, calls some function that has more than six arguments, it can allocate space within its stack frame for these, as is illustrated by the area labeled “Argument build area” in Figure 3.25.

Local Storage on the Stack¶

Common cases when local data must be stored in memory: - There are not enough registers to hold all of the local data. - The address operator ‘&’ is applied to a local variable, and hence we must be able to generate an address for it. - Some of the local variables are arrays or structures and hence must be accessed by array or structure references.

3.8 Array Allocation and Access¶

Basic Principles¶

For data type T and integer constant N, the form is: T A[N] - allocates a contiguous region of $L \cdot N$ bytes in memory, L = size (in bytes) of data type T - introduces an identifier $A$ that is used as a pointer to the beginning of the array: $x_A$ - index: $0$ to $N-1$ - array element i will stored at address $x_A+L\cdot i$ Examples: ![[Screen Shot 2024-06-03 at 22.05.57.png]]

Pointer Arithmetic¶

generation of pointer: &
dereferencing pointer: *
For an expression Expr denoting some object, &Expr is a pointer giving the address of the object.
For an expression AExpr denoting an address, *AExpr gives the value at that address.
The expressions Expr and *&Expr are therefore equivalent.
Array reference: A[i] is identical to the expression *(A+i)
- It computes the address of the $i^{th}$ array element and then accesses this memory location.
if p is a pointer to data of type T , and the value of p is xp, then the expression p+i has value $x_p + L \cdot i$, where L is the size of data type T .

![[Screen Shot 2024-06-04 at 11.44.37.png]]

Nested Arrays¶

In general, for an array declared as T D[R][C]; array element D[i][j] is at memory address $\&D[i][j] = x_D + L(C \cdot i + j)$

Fixed-Size Arrays¶

Suppose we declare data type fix_matrix to be 16 × 16 arrays of integers as follows:

#define N 16 
typedef int fix_matrix[N][N];

To compute the inner product of row i from A and column k from B
- (1) generating a pointer, which we have named Aptr, that points to successive elements in row i of A,
- (2) generating a pointer, which we have named Bptr, that points to successive elements in column k of B
- (3) generating a pointer, which we have named Bend, that equals the value Bptr will have when it is time to terminate the loop

Example: fixed matrix product ![[Screen Shot 2024-06-05 at 02.44.15.png]] ![[Screen Shot 2024-06-05 at 02.48.59.png]]![[Screen Shot 2024-06-05 at 02.50.28.png]] - salq $6, %rdx - salq stands for "Shift Arithmetic Left Quadword". It multiplies %rdx by 64 (2^6) and stores the result back in %rdx. This is used to compute the offset for accessing the correct row in matrix A based on the index i.

Variable-Size Arrays¶

variable-size arrays: allocate using malloc or calloc
declare an array: int A[expr1][expr2]

![[Screen Shot 2024-06-05 at 05.19.45.png]] Example: variable matrix product ![[Screen Shot 2024-06-05 at 05.21.49.png]]![[Screen Shot 2024-06-05 at 05.21.59.png]]

3.9 Heterogeneous Data Structures¶

Structures¶

The C struct declaration creates a data type that groups objects of possibly different types into a single object.

Structure represented as block of memory
- Big enough to hold all the fields
Fields ordered according to declaration
- Even if another ordering could be more compact
Compiler determines overall size + positions of fields
- In assembly, we see only offsets, not field names

Generating pointer to structure member ![[Screen Shot 2024-06-05 at 08.02.53.png]] Linked List ![[Screen Shot 2024-06-05 at 08.05.35.png]] ![[Screen Shot 2024-06-05 at 08.06.06.png]]

Unions¶

Data Alignment¶

![[Screen Shot 2024-06-05 at 08.45.56.png]] ![[Screen Shot 2024-06-05 at 09.14.22.png]] ![[Screen Shot 2024-06-05 at 09.16.19.png]] Saving Space: Put large data types first

Byte Ordering: ![[Screen Shot 2024-06-05 at 09.22.04.png]]

3.10 Combining Control and Data in Machine-Level Programs¶

Memory Layout¶

![[Screen Shot 2024-06-05 at 09.36.34.png]]

Buffer Overflow¶

Vulnerability¶

![[Screen Shot 2024-06-05 at 10.12.28.png]] ![[Screen Shot 2024-06-05 at 10.07.31.png]]

“buffer overflow”: When exceeding the memory size allocated for an array

Protection¶

Avoid overflow vulnerabilities
- use library routines that limit string lengths
- fgets instead of gets
- strncpy instead of strcpy
- Don’t use scanf with %s conversion specification
  - Use fgets to read the string
  - Or use %ns where n is a suitable integer
Employ system-level protections
Have compiler use “stack canaries”

3.1 A Historical Perspective¶

3.2 Program Encodings¶

Machine-level Code¶

Code Examples¶

C Code¶

C to Assembly¶

Assembly to Machine Code¶

Machine Code to Assembly - Disassemblers¶

3.3 Data Formats¶

3.4 Accessing Information¶

Operand Specifiers¶

Data Movement Instructions¶

mov class¶

movz class and movs class¶

Data Movement Example¶

Pushing and Popping Stack Data¶

3.5 Arithmetic and Logical Operations¶

Load Effective Address¶

Unary and Binary Operations¶

Shift Operations¶

3.6 Control¶

Condition Codes¶

Accessing the Condition Codes¶

Jump Instructions¶

Implementing Conditional Branches with Conditional Control¶

Conditional Control¶

Conditional Moves¶

Key Difference¶

Implementing Conditional Branches with Conditional Moves¶

Loops¶

do-while¶

while¶

for¶

Switch Statements¶

3.7 Procedures¶

The Run-Time Stack¶

Control Transfer¶

Data Transfer¶

Local Storage on the Stack¶

3.8 Array Allocation and Access¶

Basic Principles¶

Pointer Arithmetic¶

Nested Arrays¶

Fixed-Size Arrays¶

Variable-Size Arrays¶

3.9 Heterogeneous Data Structures¶

Structures¶

Unions¶

Data Alignment¶

3.10 Combining Control and Data in Machine-Level Programs¶

Memory Layout¶

Buffer Overflow¶

Vulnerability¶

Protection¶

Bypassing Protection¶

`mov` class¶

`movz` class and `movs` class¶

`do-while`¶

`while`¶

`for`¶