Linux Process Address Space

  high address   +---------------+
                 |               |
                 |     Stack     |    int local_b
                 |               |
                 +---------------+
                 |       |       |
                 |       v       |
                 |               |
                 |               |
                 |       ^       |
                 |       |       |
                 +---------------+
                 |               |
                 |     Heap      |    int * heap_c = malloc()
                 |               |
                 +---------------+
                 |     Data      |    int global_a
                 +---------------+
                 |     Code      |
  low address    +---------------+

上图是 Linux 的进程地址空间,从低位到高位地址分别为:

  • Code Segment: 程序的代码,CPU 执行的指令部分,共享只读
  • Data Segment: 可细分为初始化数据段和未初始化数据段,常用于存储全局变量等
  • Stack: 函数以及自动变量(未加 static 的自动变量又称为局部变量)
  • Heap: 动态分配内存,如 malloc() 分配的内存

更为详细的介绍请见 Anatomy of a Program in Memory


Fork

                  Parent Process             Child Process

  high address   +---------------+          +---------------+
                 |               |          |               |
                 |     Stack     |          |     Stack     |
                 |               |          |               |
                 +---------------+          +---------------+
                 |       |       |          |       |       |
                 |       v       |          |       v       |
                 |               |          |               |
                 |               |          |               |
                 |       ^       |          |       ^       |
                 |       |       |          |       |       |
                 +---------------+          +---------------+
                 |               |          |               |
                 |     Heap      |          |     Heap      |
                 |               |          |               |
                 +---------------+          +---------------+
                 |     Data      |          |     Data      |
                 +---------------+----------+---------------+
                 |                   Code                   |
  low address    +------------------------------------------+

fork 是 linux 中最重要的系统调用之一,用于创建一个新进程,它完全的复制父进程地址空间的 data segment、 heap 和 stack,但是和父进程共享一个 code segment,因为 code segment 通常为只读,从逻辑的角度来看,子进程和父进程的内存地址空间互相独立,子进程修改自己的 data segment,heap 和 stack 并不影响父进程内存空间。每次调用 fork,返回两次结果,其中父进程的返回值为子进程的 pid,子进程的返回值为 0。

#include<stdio.h>
#include<stdlib.h>
#include <unistd.h>

int global_a = 0;   // data segment

int main(void)
{
    int local_b = 0, status;    // stack
    int * heap_c = malloc(sizeof(int));     // heap
    pid_t pid;

    if(!fork()){
        global_a ++;
        local_b ++;
        *heap_c = 1;
        exit(0);
    }
    else{
        wait(&status);
    }

    printf("global_a = %d, local_b = %d, heap_c = %d\n", global_a, local_b, *heap_c);
}

程序的输出结果如下:

$ ./a.out
global_a = 0, local_b = 0, heap_c = 0

Note:为了减轻 fork 调用开销,实际采用 copy on write(COW) 技术。


Properties shared by both parent and child process

虽然父进程和子进程的地址空间是独立的,但是二者依旧共享很多其它的资源,以下摘自 Advanced Programming in the UNIX Environment, 3rd Edition

  • File descriptor
  • Real user ID, real group ID, effective user ID, effective group ID
  • Supplementary group IDs
  • Process group ID
  • Session ID
  • Controlling terminal
  • The set-user-ID and set-group-ID flags
  • Current working directory
  • Root directory
  • File mode creation mask
  • Signal mask and dispositions
  • The close-on-exec flag for any open file descriptors
  • Environment
  • Attached shared memory segments
  • Memory mappings
  • Resource limits