<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <id>https://shuye.dev/zh-Hans/blog</id>
    <title>Ye Shu Blog</title>
    <updated>2021-08-13T00:00:00.000Z</updated>
    <generator>https://github.com/jpmonette/feed</generator>
    <link rel="alternate" href="https://shuye.dev/zh-Hans/blog"/>
    <subtitle>Ye Shu Blog</subtitle>
    <icon>https://shuye.dev/zh-Hans/favicon.ico</icon>
    <entry>
        <title type="html"><![CDATA[Memory Leaks and malloc Chunks]]></title>
        <id>https://shuye.dev/zh-Hans/blog/malloc_chunk</id>
        <link href="https://shuye.dev/zh-Hans/blog/malloc_chunk"/>
        <updated>2021-08-13T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[How it all started]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="how-it-all-started">How it all started<a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#how-it-all-started" class="hash-link" aria-label="Direct link to heading" title="Direct link to heading">​</a></h2>
<p>While debugging a memory leak during my summer internship, I found that one of the APIs I was using returned a raw pointer, which transferred ownership of the object to the caller (me). In other words, I was now responsible for manually <code>delete</code>-ing that pointer once my code was done with it. This is <a href="https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#i11-never-transfer-ownership-by-a-raw-pointer-t-or-reference-t" target="_blank" rel="noopener noreferrer">very bad engineering practice</a>, but it made me curious about how memory leaks arise and how <code>delete[]</code> actually frees memory.</p>
<p>After some research and experiments, I wrote this article. It tries to answer three groups of questions:</p>
<ol>
<li>What are memory leaks?</li>
<li>How are objects allocated on the heap? How does <code>delete[]</code> know which block of memory to free?</li>
<li>How can we prevent memory leaks?</li>
</ol>
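<p>The Stack Overflow question <a href="https://stackoverflow.com/questions/197675/how-does-delete-know-the-size-of-the-operand-array" target="_blank" rel="noopener noreferrer">"How does delete[] 'know' the size of the operand array?"</a> already answers most of the second question, but I decided to dig deeper and look at what the actual memory looks like.</p>
<p>Coincidentally, my friend <a href="https://guozhen.dev/" target="_blank" rel="noopener noreferrer">@gzhding</a> and I had just worked together on a heap exploitation challenge in a recent CTF competition. Thanks to that experience, I learned how to use <code>gdb</code> to debug a program and inspect the memory on the heap, which gives us a small window into how it works.</p>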
<div class="theme-admonition theme-admonition-info admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>info</div><div class="admonitionContent_BuS1"><p>Note: I originally wrote this article in English and only afterwards translated it into Chinese. The <a href="https://shuye.dev/blog/malloc_chunk/" target="_blank" rel="noopener noreferrer">original English version</a> remains available.</p></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="what-are-memory-leaks">What are memory leaks<a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#what-are-memory-leaks" class="hash-link" aria-label="Direct link to heading" title="Direct link to heading">​</a></h2>
<p>We know that <a href="https://www.cplusplus.com/doc/tutorial/dynamic/" target="_blank" rel="noopener noreferrer">C++ can allocate memory dynamically on the heap</a>. A common example is creating an array with <code>new[]</code> and deleting it with <code>delete[]</code>.</p>
<p>A <a href="https://en.wikipedia.org/wiki/Memory_leak" target="_blank" rel="noopener noreferrer">memory leak</a> occurs when we create an array in memory (that is, allocate a block of memory to store the object) and then forget to delete it. Once the pointer to that block goes out of scope, the running code no longer has any way to reach the allocated memory. In the worst case, if the leak happens inside a loop, newly allocated memory keeps piling up without being freed, eventually slowing down or even crashing the machine.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="poc">PoC<a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#poc" class="hash-link" aria-label="Direct link to heading" title="Direct link to heading">​</a></h3>
<p>Below is a simple Proof of Concept (PoC). Its <code>main()</code> function calls <code>memory_leak()</code>, which allocates an array of 26 <code>char</code>s and fills it with the uppercase English letters.</p>
<div class="language-cpp codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_Ktv7">memory_leak.cpp</div><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-cpp codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">void</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">memory_leak</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic">// Always delete pointers created by new to avoid memory leaks!</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">char</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">*</span><span class="token plain">arr </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">new</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">char</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">26</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token keyword" style="color:#00009f">int</span><span class="token plain"> i </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"> i </span><span class="token operator" style="color:#393A34">&lt;</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">26</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"> i</span><span class="token operator" style="color:#393A34">++</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        arr</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">i</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">char</span><span class="token punctuation" style="color:#393A34">(</span><span class="token number" style="color:#36acaa">65</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> i</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"> </span><span 
class="token comment" style="color:#999988;font-style:italic">// 65 is the ascii of 'A'</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic">// The memory area is not freed!</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic">// delete[] arr;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">int</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">main</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token function" style="color:#d73a49">memory_leak</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span 
class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="复制代码到剪贴板" title="复制" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>Because the <code>delete[]</code> statement is commented out, the pointer <code>arr</code> goes out of scope when <code>memory_leak()</code> returns, and the memory it pointed to is leaked.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="a-deeper-look-into-the-memory">A deeper look into the memory<a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#a-deeper-look-into-the-memory" class="hash-link" aria-label="Direct link to heading" title="Direct link to heading">​</a></h3>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>note</div><div class="admonitionContent_BuS1"><p>I use <a href="https://github.com/hugsy/gef" target="_blank" rel="noopener noreferrer">GEF</a> (GDB Enhanced Features) instead of vanilla GDB for its prettier output and for extra commands such as <code>heap</code>.</p></div></div>
<p>Let's compile the program with <code>g++ -g3 memory_leak.cpp -o memory_leak</code> (the <code>-g3</code> flag keeps debugging information in the binary) and use <code>gdb</code> to verify the leak.</p>
<p>We set a breakpoint at the end of <code>memory_leak()</code> and run the program until it hits that breakpoint.</p>
<div class="language-console codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-console codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">$ gdb memory_leak</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">gef➤  b 11</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Breakpoint 1 at 0x1179: file memory_leak.cpp, line 11.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">gef➤  r</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[...]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">─────────────────────────────────────────────────────────────── source:memory_leak.cpp+11 ────</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      6          arr[i] = char(65 + i); // 65 is the ascii of 'A'</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      7      }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      8</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      9      // The memory area is not freed!</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     10      // delete[] arr;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">●→   
11  }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     12</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     13  int main() {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     14      memory_leak();</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     15      return 0;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     16  }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">───────────────────────────────────────────────────────────────────────────────── threads ────</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[#0] Id 1, Name: "memory_leak", stopped 0x555555555179 in memory_leak (), reason: BREAKPOINT</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">─────────────────────────────────────────────────────────────────────────────────── trace ────</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[#0] 0x555555555179 → memory_leak()</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[#1] 0x555555555186 → main()</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">──────────────────────────────────────────────────────────────────────────────────────────────</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">gef➤  info locals</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">arr = 0x55555556aeb0 "ABCDEFGHIJKLMNOPQRSTUVWXYZ"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" 
style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">gef➤  x/8xw 0x55555556aeb0</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">0x55555556aeb0: 0x44434241      0x48474645      0x4c4b4a49      0x504f4e4d</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">0x55555556aec0: 0x54535251      0x58575655      0x00005a59      0x00000000</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="复制代码到剪贴板" title="复制" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
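<p>Once the breakpoint is hit, we print the address that <code>arr</code> points to and the contents of that memory. Note that <code>x/8xw</code> displays the memory as 4-byte words and that this machine stores them in <a href="https://zh.wikipedia.org/zh-cn/%E5%AD%97%E8%8A%82%E5%BA%8F#%E5%B0%8F%E7%AB%AF%E5%BA%8F" target="_blank" rel="noopener noreferrer">little-endian</a> order, so within each printed word <code>0x44</code> (D) appears before <code>0x43</code> (C), <code>0x42</code> (B), and <code>0x41</code> (A), even though <code>A</code> sits at the lowest address.</p>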
<p>Now let's continue running the program until <code>memory_leak()</code> finishes and returns to <code>main()</code>.</p>
<div class="language-gdb codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-gdb codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">gef➤  finish</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[...]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">─────────────────────────────────────────────────────────────── source:memory_leak.cpp+15 ────</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     10      // delete[] arr;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">●    11  }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     12</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     13  int main() {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     14      memory_leak();</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"> →   15      return 0;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     16  }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">───────────────────────────────────────────────────────────────────────────────── threads ────</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[#0] Id 1, Name: "memory_leak", stopped 0x555555555186 in main (), reason: TEMPORARY BREAKPOINT</span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain">─────────────────────────────────────────────────────────────────────────────────── trace ────</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[#0] 0x555555555186 → main()</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">──────────────────────────────────────────────────────────────────────────────────────────────</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">gef➤  info locals</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">No locals.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">gef➤  x/8xw 0x55555556aeb0</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">0x55555556aeb0: 0x44434241      0x48474645      0x4c4b4a49      0x504f4e4d</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">0x55555556aec0: 0x54535251      0x58575655      0x00005a59      0x00000000</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="复制代码到剪贴板" title="复制" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>Now that <code>memory_leak()</code> has returned, we have lost <code>arr</code>, the pointer to address <code>0x55555556aeb0</code>. Yet when we print that region, the data is still there: it has not been freed, and nothing left in the program will free it. This is a memory leak.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="verification-with-valgrind">Verification with Valgrind<a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#verification-with-valgrind" class="hash-link" aria-label="Direct link to heading" title="Direct link to heading">​</a></h3>
<p>We can also use an automated tool such as <a href="https://valgrind.org/" target="_blank" rel="noopener noreferrer">Valgrind</a> to check for memory leaks.</p>
<div class="language-console codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-console codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">$ valgrind --leak-check=full ./memory_leak</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643== Memcheck, a memory error detector</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643== Command: ./memory_leak</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643==</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643==</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643== HEAP SUMMARY:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643==     in use at exit: 26 bytes in 1 blocks</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643==   total heap usage: 2 allocs, 1 frees, 72,730 bytes allocated</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643==</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643== 26 bytes in 1 blocks are definitely lost in loss record 1 of 1</span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">==382643==    at 0x484021F: operator new[](unsigned long) (vg_replace_malloc.c:579)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643==    by 0x10914A: memory_leak() (memory_leak.cpp:3)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643==    by 0x109185: main (memory_leak.cpp:14)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643==</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643== LEAK SUMMARY:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643==    definitely lost: 26 bytes in 1 blocks</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643==    indirectly lost: 0 bytes in 0 blocks</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643==      possibly lost: 0 bytes in 0 blocks</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643==    still reachable: 0 bytes in 0 blocks</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643==         suppressed: 0 bytes in 0 blocks</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643==</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643== For lists of detected and suppressed errors, rerun with: -s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">==382643== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="复制代码到剪贴板" title="复制" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path 
fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="how-are-objects-allocated-on-the-heap">How are objects allocated on the heap<a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#how-are-objects-allocated-on-the-heap" class="hash-link" aria-label="Direct link to heading" title="Direct link to heading">​</a></h2>
<p>To better understand the mechanism behind memory leaks, we need to know how C++ allocates and frees memory, in other words, how <code>new</code> and <code>delete</code> work. Let's dive into the source code of GNU's <code>libstdc++</code>, the standard library implementation that g++ uses by default.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="how-new-and-delete-works">How <code>new</code> and <code>delete</code> work<a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#how-new-and-delete-works" class="hash-link" aria-label="Direct link to heading" title="Direct link to heading">​</a></h3>
<div class="theme-admonition theme-admonition-info admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>info</div><div class="admonitionContent_BuS1"><p>Because the <code>new</code> and <code>delete</code> operators are only interfaces defined by the C++ standard, different implementations of them exist. Here I use the <a href="https://github.com/gcc-mirror/gcc/tree/releases/gcc-11.2.0" target="_blank" rel="noopener noreferrer">source code</a> of <code>libstdc++</code> as shipped by GNU with gcc 11.2.</p></div></div>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-and-delete-are-just-wrappers-of-new-and-delete"><code>new[]</code> and <code>delete[]</code> are just wrappers of <code>new</code> and <code>delete</code><a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#new-and-delete-are-just-wrappers-of-new-and-delete" class="hash-link" aria-label="Direct link to heading" title="Direct link to heading">​</a></h4>
<p>Interestingly, judging from the implementation of <code>operator new[]</code> (<a href="https://github.com/gcc-mirror/gcc/blob/releases/gcc-11.2.0/libstdc++-v3/libsupc++/new_opv.cc#L29-L33" target="_blank" rel="noopener noreferrer">source</a>), <code>new[]</code> in <code>libstdc++</code> is just a thin wrapper that forwards to <code>new</code>.</p>
<div class="language-cpp codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_Ktv7">/libstdc++-v3/libsupc++/new_opv.cc:L29-33</div><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-cpp codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">_GLIBCXX_WEAK_DEFINITION </span><span class="token keyword" style="color:#00009f">void</span><span class="token operator" style="color:#393A34">*</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">operator</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">new</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">std</span><span class="token double-colon punctuation" style="color:#393A34">::</span><span class="token plain">size_t sz</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">_GLIBCXX_THROW</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">std</span><span class="token double-colon punctuation" style="color:#393A34">::</span><span class="token plain">bad_alloc</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">{</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token double-colon punctuation" style="color:#393A34">::</span><span class="token keyword" style="color:#00009f">operator</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">new</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">sz</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="复制代码到剪贴板" title="复制" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>The same holds for <code>delete[]</code> (<a href="https://github.com/gcc-mirror/gcc/blob/releases/gcc-11.2.0/libstdc++-v3/libsupc++/del_opv.cc#L32-L36" target="_blank" rel="noopener noreferrer">source</a>): it simply forwards to <code>delete</code>.</p>
<div class="theme-admonition theme-admonition-caution admonition_xJq3 alert alert--warning"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 16 16"><path fill-rule="evenodd" d="M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"></path></svg></span>Caution</div><div class="admonitionContent_BuS1"><p>Judging from the GNU libstdc++ implementation, mixing <code>new[]</code> with <code>new</code>, or <code>delete[]</code> with <code>delete</code>, appears to be perfectly acceptable.</p><p>However, you should avoid doing so, because it only happens to work in this particular implementation. According to the <a href="https://timsong-cpp.github.io/cppwp/expr.delete#2" target="_blank" rel="noopener noreferrer">C++ Working Paper</a>, releasing memory obtained from <code>new[]</code> with <code>delete</code> (or memory obtained from <code>new</code> with <code>delete[]</code>) is undefined behavior, which can turn debugging into a mess.</p></div></div>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="and-new--delete-are-wrappers-of-malloc-and-free">And <code>new</code> and <code>delete</code> are merely wrappers around <code>malloc</code> and <code>free</code><a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#and-new--delete-are-wrappers-of-malloc-and-free" class="hash-link" aria-label="Direct link to heading" title="Direct link to heading">​</a></h4>
<p>Next, let's look at the <a href="https://github.com/gcc-mirror/gcc/blob/releases/gcc-11.2.0/libstdc++-v3/libsupc++/new_op.cc#L41-L59" target="_blank" rel="noopener noreferrer">source</a> of <code>new</code>. It is also just a wrapper around C's <code>malloc</code> with some error handling, and in the end it returns to the caller the raw pointer that <code>malloc</code> returned.</p>
<div class="language-cpp codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_Ktv7">/libstdc++-v3/libsupc++/new_op.cc:L41-59</div><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-cpp codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">_GLIBCXX_WEAK_DEFINITION </span><span class="token keyword" style="color:#00009f">void</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">*</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">operator</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">new</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">std</span><span class="token double-colon punctuation" style="color:#393A34">::</span><span class="token plain">size_t sz</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">_GLIBCXX_THROW</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">std</span><span class="token double-colon punctuation" style="color:#393A34">::</span><span class="token plain">bad_alloc</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span 
class="token keyword" style="color:#00009f">void</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">*</span><span class="token plain">p</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic">/* malloc (0) is unpredictable; avoid it.  */</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token function" style="color:#d73a49">__builtin_expect</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">sz </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">false</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    sz </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" 
style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token keyword" style="color:#00009f">while</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">p </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">malloc</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">sz</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      new_handler handler </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> std</span><span class="token double-colon punctuation" style="color:#393A34">::</span><span class="token function" style="color:#d73a49">get_new_handler</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token 
keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token operator" style="color:#393A34">!</span><span class="token plain"> handler</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">	</span><span class="token function" style="color:#d73a49">_GLIBCXX_THROW_OR_ABORT</span><span class="token punctuation" style="color:#393A34">(</span><span class="token function" style="color:#d73a49">bad_alloc</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token function" style="color:#d73a49">handler</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> p</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="复制代码到剪贴板" title="复制" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p><code>delete</code> (<a href="https://github.com/gcc-mirror/gcc/blob/releases/gcc-11.2.0/libstdc++-v3/libsupc++/del_op.cc#L46-L50" target="_blank" rel="noopener noreferrer">source</a>) is even simpler: it directly calls C's <code>free</code>.</p>
<div class="language-cpp codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_Ktv7">libstdc++-v3/libsupc++/del_op.cc:L46-50</div><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-cpp codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">_GLIBCXX_WEAK_DEFINITION </span><span class="token keyword" style="color:#00009f">void</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">operator</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">delete</span><span class="token punctuation" style="color:#393A34">(</span><span class="token keyword" style="color:#00009f">void</span><span class="token operator" style="color:#393A34">*</span><span class="token plain"> ptr</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">noexcept</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  std</span><span class="token double-colon punctuation" style="color:#393A34">::</span><span class="token function" style="color:#d73a49">free</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">ptr</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="复制代码到剪贴板" title="复制" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
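<p>One way to observe this forwarding from user code is to hand a pointer obtained from <code>new[]</code> straight to a <code>malloc</code> introspection function. The sketch below is a minimal illustration and not part of the original article: it assumes glibc (for <code>malloc_usable_size</code> from <code>&lt;malloc.h&gt;</code>), the default libstdc++ <code>operator new[]</code>, and a <code>char</code> array, for which no extra bookkeeping is placed in front of the array. The helper name <code>usable_bytes_for_new_char_array</code> is hypothetical.</p>

```cpp
#include <cstddef>
#include <malloc.h>  // glibc-specific header that declares malloc_usable_size

// Hypothetical helper: allocate `n` chars with new[], ask malloc how many
// bytes are usable behind that pointer, then release the array with delete[].
// This only makes sense if operator new[] really returns a malloc pointer,
// which is the case for the default libstdc++ implementation shown above.
std::size_t usable_bytes_for_new_char_array(std::size_t n) {
    char* arr = new char[n];
    std::size_t usable = malloc_usable_size(arr);
    delete[] arr;
    return usable;
}
```

<p>Calling <code>usable_bytes_for_new_char_array(26)</code> should return at least 26; the exact value depends on the allocator and the architecture.</p>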
<p>So it seems we need to dig all the way down into the C standard library's implementation of <code>malloc</code> and <code>free</code> to find out what really happens behind the creation and destruction of an array.</p>
<p>However, we will not cover everything related to <code>malloc</code> (that would be enough material for a separate article). Instead, we will focus on how <code>malloc</code> organizes the memory it allocates (answer: by building <code>malloc_chunk</code>s on the heap) and how <code>free</code> knows which block of memory to release.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="how-are-malloc_chunks-structured">How are <code>malloc_chunk</code>s structured?<a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#how-are-malloc_chunks-structured" class="hash-link" aria-label="Direct link to heading" title="Direct link to heading">​</a></h3>
<div class="theme-admonition theme-admonition-info admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>Info</div><div class="admonitionContent_BuS1"><p>As in the previous section, I will use GNU's implementation of the C standard library, <code>glibc</code>.
At the time of writing, the current version of glibc is <a href="https://sourceware.org/git/?p=glibc.git;a=tag;h=refs/tags/glibc-2.34" target="_blank" rel="noopener noreferrer">glibc 2.34</a>, released on August 2, 2021.</p></div></div>
<p>The following is taken from the comments in glibc's <code>malloc/malloc.c</code> (<a href="https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c;h=e065785af77af72c17c773517c15b248b067b4ad;hb=ae37d06c7d127817ba43850f0f898b793d42aea7#l1168" target="_blank" rel="noopener noreferrer">source</a>). I have lightly edited the original text to fit Markdown, the formatting syntax this site uses.</p>
<blockquote>
<p>(The following includes lightly edited explanations by Colin Plumb.)</p>
<p>Chunks of memory are maintained using a `boundary tag' method as
described in e.g., Knuth or Standish. (See the paper by Paul
Wilson <a href="ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps" target="_blank" rel="noopener noreferrer">ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps</a> for a
survey of such techniques.) Sizes of free chunks are stored both
in the front of each chunk and at the end. This makes
consolidating fragmented chunks into bigger chunks very fast. The
size fields also hold bits representing whether chunks are free or
in use.</p>
<p>An allocated chunk looks like this:</p>
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">    chunk-&gt; +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            |             Size of previous chunk, if unallocated (P clear)  |</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            |             Size of chunk, in bytes                     |A|M|P|</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      mem-&gt; +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            |             User data starts here...                          .</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            .                                                               .</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            .             (malloc_usable_size() bytes)                      .</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            .                                                               
|</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">nextchunk-&gt; +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            |             (size of chunk, but used for application data)    |</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            |             Size of next chunk, in bytes                |A|0|1|</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="复制代码到剪贴板" title="复制" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>Where "chunk" is the front of the chunk for the purpose of most of
the malloc code, but "mem" is the pointer that is returned to the
user. "Nextchunk" is the beginning of the next contiguous chunk.</p>
<p>Chunks always begin on even word boundaries, so the mem portion
(which is returned to the user) is also on an even word boundary, and
thus at least double-word aligned.</p>
<p>Free chunks are stored in circular doubly-linked lists, and look like this:</p>
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">    chunk-&gt; +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            |             Size of previous chunk, if unallocated (P clear)  |</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    `head:' |             Size of chunk, in bytes                     |A|0|P|</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      mem-&gt; +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            |             Forward pointer to next chunk in list             |</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            |             Back pointer to previous chunk in list            |</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            |             Unused space (may be 0 bytes long)                
.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            .                                                               .</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            .                                                               |</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">nextchunk-&gt; +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    `foot:' |             Size of chunk, in bytes                           |</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            |             Size of next chunk, in bytes                |A|0|0|</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="复制代码到剪贴板" title="复制" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>The P (<code>PREV_INUSE</code>) bit, stored in the unused low-order bit of the
chunk size (which is always a multiple of two words), is an in-use
bit for the <em>previous</em> chunk. If that bit is <em>clear</em>, then the
word before the current chunk size contains the previous chunk
size, and can be used to find the front of the previous chunk.
The very first chunk allocated always has this bit set,
preventing access to non-existent (or non-owned) memory. If
<code>prev_inuse</code> is set for any given chunk, then you CANNOT determine
the size of the previous chunk, and might even get a memory
addressing fault when trying to do so.</p>
<p>[...]</p>
<p>Note that the <code>foot</code> of the current chunk is actually represented
as the <code>prev_size</code> of the NEXT chunk. This makes it easier to
deal with alignments etc but can be very confusing when trying
to extend or adapt this code.</p>
<p>[...]</p>
</blockquote>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="a-verification-using-poc">A verification using PoC code<a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#a-verification-using-poc" class="hash-link" aria-label="Direct link to heading" title="Direct link to heading">​</a></h3>
<p>Now we will use <code>gdb</code> to dump the memory region and verify how the explanation above plays out in our code. Here I will use the <code>heap</code> command of <a href="https://gef.readthedocs.io/en/master/commands/heap/#heap-chunk-command" target="_blank" rel="noopener noreferrer">GEF</a> to better display the properties of the chunk allocated by <code>malloc</code>.</p>
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">gef➤  heap chunk arr</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Chunk(addr=0x55555556aeb0, size=0x30, flags=PREV_INUSE)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Chunk size: 48 (0x30)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Usable size: 40 (0x28)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Previous chunk size: 0 (0x0)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">PREV_INUSE flag: On</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">IS_MMAPPED flag: Off</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">NON_MAIN_ARENA flag: Off</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">gef➤  x/16xw 0x55555556aeb0-16</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">0x55555556aea0:	0x00000000	0x00000000	0x00000031	0x00000000</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">0x55555556aeb0:	0x44434241	0x48474645	0x4c4b4a49	0x504f4e4d</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">0x55555556aec0:	0x54535251	0x58575655	0x00005a59	0x00000000</span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">0x55555556aed0:	0x00000000	0x00000000	0x0000f131	0x00000000</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="复制代码到剪贴板" title="复制" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
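<p>Note that the chunk size is 48 bytes and the usable size (the region that actually stores user data) is 40 bytes, considerably more than what we requested (an array of 26 characters, which should occupy 26 bytes). This is because "Chunks always begin on even word boundaries ... and thus at least double-word aligned"<sup><a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#user-content-fn-1-c5d1db" id="user-content-fnref-1-c5d1db" data-footnote-ref="true" aria-describedby="footnote-label">1</a></sup>: on a 64-bit system, the 26 requested bytes plus the 8-byte size field are rounded up to the next multiple of 16 bytes, which is 48.</p>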
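<p>The rounding can be reproduced with a few lines of arithmetic. The sketch below is modeled on glibc's <code>request2size</code> macro but is not the library's code: the constant and function names are hypothetical, and the values assume a 64-bit system (8-byte words, 16-byte alignment, 32-byte minimum chunk).</p>

```cpp
#include <cstddef>

// Assumed 64-bit glibc parameters; the names are illustrative only.
constexpr std::size_t kSizeSz = 8;                     // one word: the size field
constexpr std::size_t kMallocAlignment = 2 * kSizeSz;  // chunks are 16-byte aligned
constexpr std::size_t kMinChunkSize = 32;              // smallest chunk malloc hands out

// Chunk size for a request: add one word of overhead for the size field,
// round up to the alignment, and never go below the minimum chunk size.
constexpr std::size_t request_to_chunk_size(std::size_t request) {
    const std::size_t padded =
        (request + kSizeSz + kMallocAlignment - 1) & ~(kMallocAlignment - 1);
    return padded < kMinChunkSize ? kMinChunkSize : padded;
}

// Usable bytes of an in-use, non-mmapped chunk: everything except its own
// size field (the next chunk's prev_size word is borrowed for user data).
constexpr std::size_t usable_size(std::size_t chunk_size) {
    return chunk_size - kSizeSz;
}
```

<p>With these assumptions, <code>request_to_chunk_size(26)</code> evaluates to 48 and <code>usable_size(48)</code> to 40, matching the <code>Chunk size</code> and <code>Usable size</code> lines that GEF printed.</p>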
<div class="theme-admonition theme-admonition-info admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>Info</div><div class="admonitionContent_BuS1"><p>The size of a <a href="https://zh.wikipedia.org/zh-cn/%E5%AD%97_(%E8%AE%A1%E7%AE%97%E6%9C%BA)" target="_blank" rel="noopener noreferrer">word</a> depends on the system architecture. Generally, a 64-bit system has a word size of 64 bits, or 8 bytes. In <code>gdb</code>'s <code>x/w</code> command, however, a word is fixed at 32 bits (4 bytes), which is quite confusing. I will therefore use "word" for the real, architecture-dependent word, and "32-bit word" for the word in <code>gdb</code>.</p></div></div>
<p>Because the chunk header occupies two words in front of the user data (the previous chunk's size and this chunk's size), we subtract <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>2</mn><mo>×</mo><mn>8</mn><mo>=</mo><mn>16</mn></mrow><annotation encoding="application/x-tex">2\times8=16</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7278em;vertical-align:-0.0833em"></span><span class="mord">2</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">8</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">16</span></span></span></span> bytes from the data address to obtain the <code>chunk</code> pointer. The first word here (shown as two 32-bit words in gdb) is filled with <code>0x00</code>; it will hold the size of the previous chunk once the <code>P</code> flag is cleared.</p>
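<p>The second word, <code>0x31</code> (or <code>0b110001</code>), stores the chunk's size together with 3 flags. The least significant bit (LSB), <code>0b1</code>, means that the <code>P</code> (PREV_INUSE) flag is set, so the previous chunk has not been freed. Because every chunk size must be a multiple of at least 8 bytes, the 3 least significant bits of the size are always 0, which is why they can be used as flags. To compute the chunk size, we simply mask off the three low bits and obtain <code>0b110000</code> (<code>0x30</code>, or 48) bytes.</p>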
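<p>The decoding of the size word can be written out as a small sketch. The struct and function names below are hypothetical; the bit positions follow the A|M|P layout from the glibc comment quoted earlier (P is bit 0, M is bit 1, A is bit 2).</p>

```cpp
#include <cstdint>

// Illustrative view of a chunk's size word; not a glibc type.
struct ChunkSizeField {
    std::uint64_t size;   // chunk size in bytes, flags stripped
    bool prev_inuse;      // P: previous chunk is in use
    bool is_mmapped;      // M: chunk was obtained through mmap
    bool non_main_arena;  // A: chunk belongs to a non-main arena
};

// Split the raw size word into the size and the three flag bits.
constexpr ChunkSizeField decode_size_field(std::uint64_t raw) {
    return ChunkSizeField{
        raw & ~std::uint64_t{0x7},  // clear the three low bits to get the size
        (raw & 0x1) != 0,
        (raw & 0x2) != 0,
        (raw & 0x4) != 0,
    };
}
```

<p>Feeding it the value from the dump, <code>decode_size_field(0x31)</code> yields a size of <code>0x30</code> with only <code>prev_inuse</code> set, which agrees with the flags GEF reported.</p>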
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>Note</div><div class="admonitionContent_BuS1"><p>If you look closely, you may have noticed that the chunk's usable size is 40 bytes, only 8 bytes (one word) less than the chunk size, rather than 16.</p><p>This is because the <code>chunk</code> pointer "does not point to the beginning of the chunk, but to the last word in the previous chunk"<sup><a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#user-content-fn-2-c5d1db" id="user-content-fnref-2-c5d1db" data-footnote-ref="true" aria-describedby="footnote-label">2</a></sup> (<a href="https://sourceware.org/glibc/wiki/MallocInternals#What_is_a_Chunk.3F" target="_blank" rel="noopener noreferrer">source</a>). In effect, the chunk begins one word after the address the <code>chunk</code> pointer points to (that is, at the word storing the chunk size).</p></div></div>
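<p>In other words, our "actual" chunk starts at address <code>0x55555556aea8</code> and ends just before <code>0x55555556aed8</code> (<code>0x55555556aea8</code> plus the chunk size <code>0x30</code>). The data region starts at <code>0x55555556aeb0</code> and also ends just before <code>0x55555556aed8</code> (<code>0x55555556aeb0</code> plus the usable size <code>0x28</code>). Accordingly, the next <code>chunk</code> pointer points to the last word of this chunk's data region (<code>0x55555556aed0</code>), and the word after it, <code>0xf131</code> in the dump, is the next chunk's size field.</p>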
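<p>Why, then, does the chunk pointer confusingly point to the last word of the previous chunk? The answer has to do with how <code>free</code> is designed.</p>
<p>When the previous chunk is freed, it fills its last word with its own size and clears the P flag in the next chunk (this chunk). This way, once the previous chunk has been freed, this chunk can use that size "to find the front of the previous chunk"<sup><a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#user-content-fn-3-c5d1db" id="user-content-fnref-3-c5d1db" data-footnote-ref="true" aria-describedby="footnote-label">3</a></sup>.</p>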
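<p>The footer mechanism can be illustrated with a toy model. The sketch below is not glibc code: it represents the heap as a vector of 8-byte words, uses word indices instead of pointers, and all names (<code>make_toy_heap</code>, <code>toy_free</code>, <code>prev_chunk</code>) are hypothetical. It also ignores tcache and fastbins, where glibc leaves the P flag of the next chunk set.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kPrevInuse = 0x1;  // P flag
constexpr std::uint64_t kFlagMask = 0x7;   // A|M|P bits
constexpr std::uint64_t kWordBytes = 8;    // assumed 64-bit word

// A chunk "pointer" c is a word index: heap[c] is prev_size, heap[c + 1] is
// the size word. Two adjacent in-use chunks of 0x30 bytes (6 words) each.
std::vector<std::uint64_t> make_toy_heap() {
    std::vector<std::uint64_t> heap(12, 0);
    heap[1] = 0x30 | kPrevInuse;  // chunk at index 0
    heap[7] = 0x30 | kPrevInuse;  // chunk at index 6
    return heap;
}

// Free the chunk at index c: write its size into its last word (which is the
// next chunk's prev_size) and clear the next chunk's P flag.
void toy_free(std::vector<std::uint64_t>& heap, std::size_t c) {
    const std::uint64_t size = heap[c + 1] & ~kFlagMask;
    const std::size_t next = c + static_cast<std::size_t>(size / kWordBytes);
    heap[next] = size;
    heap[next + 1] &= ~kPrevInuse;
}

// Walk back to the previous chunk; only meaningful while P is clear.
std::size_t prev_chunk(const std::vector<std::uint64_t>& heap, std::size_t c) {
    assert((heap[c + 1] & kPrevInuse) == 0);
    return c - static_cast<std::size_t>(heap[c] / kWordBytes);
}
```

<p>After <code>toy_free(heap, 0)</code>, the word at index 6 holds <code>0x30</code>, the size word at index 7 drops from <code>0x31</code> to <code>0x30</code>, and <code>prev_chunk(heap, 6)</code> returns 0, the start of the freed chunk.</p>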
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="how-does-free-work"><code>free</code> 是如何工作的？<a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#how-does-free-work" class="hash-link" aria-label="标题的直接链接" title="标题的直接链接">​</a></h3>
<p>By now, you should know the answer to "How does <code>delete[]</code> know which memory it needs to free?": the size of the chunk is stored right in its metadata.</p>
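<p>Still, a few details deserve a closer look: why does the chunk pointer point to the end of the previous chunk? Why do we need the <code>PREV_INUSE</code> (P) flag? To answer these questions, we need to understand how <code>free</code> works.</p>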
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>Note</div><div class="admonitionContent_BuS1"><p>While reading this section, you can cross-reference the section <a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#how-are-malloc_chunks-structured">How are <code>malloc_chunk</code>s structured?</a> to see what a chunk looks like before and after <code>free</code>.</p></div></div>
<p>Long story short, <code>free</code> works roughly as follows. When it is called (<a href="https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c;h=e065785af77af72c17c773517c15b248b067b4ad;hb=ae37d06c7d127817ba43850f0f898b793d42aea7#l3237" target="_blank" rel="noopener noreferrer">source code</a>), the caller passes it a pointer to the data address. <code>free</code> then calls <code>mem2chunk</code> (<a href="https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c;h=e065785af77af72c17c773517c15b248b067b4ad;hb=ae37d06c7d127817ba43850f0f898b793d42aea7#l1310" target="_blank" rel="noopener noreferrer">source code</a>) to convert it into a pointer to the chunk header.
Next, if the chunk was allocated by <code>mmap</code> (as indicated by the M flag), <code>free</code> calls <code>munmap</code> (<a href="https://man.archlinux.org/man/munmap.3p.en" target="_blank" rel="noopener noreferrer">man 3p</a> | <a href="https://man.archlinux.org/man/munmap.2.en" target="_blank" rel="noopener noreferrer">man 2</a>) to release it; otherwise, it passes the chunk pointer to <code>_int_free</code> (<a href="https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c;h=e065785af77af72c17c773517c15b248b067b4ad;hb=ae37d06c7d127817ba43850f0f898b793d42aea7#l4302" target="_blank" rel="noopener noreferrer">source code</a>) to do the actual freeing.</p>
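<p>The top-level decision described above can be modeled in a few lines. This is a simplified sketch, not glibc's code: <code>model_free</code>, <code>ReleasePath</code>, and <code>FreeDecision</code> are hypothetical names, the header size assumes two 8-byte words, and the function only reports which path would be taken instead of releasing anything.</p>

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kIsMmapped  = 0x2;  // M flag in the size field
constexpr std::uint64_t kHeaderSize = 16;   // assumption: two 8-byte words

// Which release path the model picks for a given pointer.
enum class ReleasePath { Nothing, Munmap, IntFree };

struct FreeDecision {
    std::uint64_t chunk_address;  // result of the mem2chunk-style conversion
    ReleasePath path;
};

// Hypothetical model of the dispatch: convert the data pointer to a chunk
// pointer, then choose between munmap and _int_free based on the M flag.
constexpr FreeDecision model_free(std::uint64_t data_address,
                                  std::uint64_t size_field) {
    if (data_address == 0) {
        return {0, ReleasePath::Nothing};  // free(NULL) is a no-op
    }
    const std::uint64_t chunk = data_address - kHeaderSize;
    return {chunk, (size_field & kIsMmapped) != 0 ? ReleasePath::Munmap
                                                  : ReleasePath::IntFree};
}
```

<p>For instance, <code>model_free(0x55555556aeb0, 0x31)</code> reports the <code>IntFree</code> path for chunk address <code>0x55555556aea0</code>, since the assumed size field <code>0x31</code> does not have the M flag set.</p>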
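<p>However, <code>free</code>ing a chunk "does not actually return it to the operating system for other applications to use. The <code>free()</code> call marks a chunk of memory as 'free to be reused' by the application, but from the operating system's point of view, the memory still 'belongs' to the application"<sup><a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#user-content-fn-4-c5d1db" id="user-content-fnref-4-c5d1db" data-footnote-ref="true" aria-describedby="footnote-label">4</a></sup> (<a href="https://sourceware.org/glibc/wiki/MallocInternals#Free_Algorithm" target="_blank" rel="noopener noreferrer">source</a>). In other words, the heap manager still needs to keep track of this memory and reuse it when appropriate.</p>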
<p>This is why freed chunks are organized in a circular linked list, in which each chunk stores pointers to the previous and the next chunk. In addition, the size of each chunk is stored in the last word of its memory, which is exactly where the next chunk's <code>chunk</code> pointer points. As a result, the next chunk can use this size to reach the freed chunk and its header. When the next chunk is also <code>free</code>d, we can use this property to <a href="https://cs.stackexchange.com/a/18234" target="_blank" rel="noopener noreferrer">coalesce</a> the two chunks.</p>
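<p>The backward-merging step can be illustrated with a toy heap. The sketch below is a deliberately simplified model, not glibc's algorithm: the heap is a hypothetical <code>std::vector</code> of 8-byte words, bins and the linked list of free chunks are left out, merging with a following free chunk is not handled, and <code>free_chunk</code> and <code>make_demo_heap</code> are made-up names.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kPrevInuse = 0x1;  // P flag
constexpr std::uint64_t kFlagMask  = 0x7;  // P, M and A flags
constexpr std::uint64_t kWord      = 8;    // assumption: 8-byte words

using Heap = std::vector<std::uint64_t>;   // toy heap, one entry per word

// Layout of the model: heap[c] is the word the chunk pointer points to
// (the last word of the previous chunk), heap[c + 1] is the size field.

// Marks the chunk at word index c as free and merges it with the previous
// chunk if that one is already free. Returns the index of the resulting chunk.
std::size_t free_chunk(Heap& heap, std::size_t c) {
    std::uint64_t size = heap[c + 1] & ~kFlagMask;
    if ((heap[c + 1] & kPrevInuse) == 0) {
        // The previous chunk is free, so heap[c] holds its size: walk back.
        const std::uint64_t prev_size = heap[c];
        c -= static_cast<std::size_t>(prev_size / kWord);
        size += prev_size;
        heap[c + 1] = size | (heap[c + 1] & kFlagMask);  // keep the earlier chunk's flags
    }
    const std::size_t next = c + static_cast<std::size_t>(size / kWord);
    assert(next + 1 < heap.size());  // the model expects a trailing sentinel chunk
    heap[next] = size;               // store our size in our last word
    heap[next + 1] &= ~kPrevInuse;   // tell the next chunk that we are free
    return c;
}

// Builds a demo heap: chunk A (32 bytes) at word 0, chunk B (48 bytes)
// at word 4, and a 16-byte sentinel chunk at word 10.
Heap make_demo_heap() {
    Heap heap(12, 0);
    heap[1]  = 32 | kPrevInuse;
    heap[5]  = 48 | kPrevInuse;
    heap[11] = 16 | kPrevInuse;
    return heap;
}
```

<p>Freeing chunk A writes 32 into word 4 and clears the P flag of chunk B. Freeing chunk B afterwards finds the P flag cleared, reads 32 from word 4 to walk back to chunk A, and leaves a single 80-byte free chunk at word 0.</p>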
<p>Of course, the actual <code>free</code> operation is far more complicated than this, and chunks are also placed into different bins for more efficient reallocation. You can read the official <a href="https://sourceware.org/glibc/wiki/MallocInternals" target="_blank" rel="noopener noreferrer">glibc wiki</a>, this more detailed <a href="https://azeria-labs.com/heap-exploitation-part-2-glibc-heap-free-bins/" target="_blank" rel="noopener noreferrer">blog post</a>, or the <a href="https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c;h=e065785af77af72c17c773517c15b248b067b4ad;hb=ae37d06c7d127817ba43850f0f898b793d42aea7#l4302" target="_blank" rel="noopener noreferrer">source code of <code>_int_free</code></a> for more low-level details.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="how-can-we-prevent-memory-leaks">How can we prevent memory leaks?<a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#how-can-we-prevent-memory-leaks" class="hash-link" aria-label="标题的直接链接" title="标题的直接链接">​</a></h2>
<p>Now may be a good time to return to where we started: now that we know what memory leaks are and how they happen, is there anything we can do to prevent them?</p>
<ol>
<li>Always <code>delete</code> (<code>delete[]</code>) objects created with <code>new</code> (<code>new[]</code>)<!-- -->
<ul>
<li>This is the simplest thing we can do, if you still insist on using <code>new</code></li>
</ul>
</li>
<li>Avoid calling <code>new</code> and <code>delete</code> explicitly
<ul>
<li><a href="https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#r11-avoid-calling-new-and-delete-explicitly" target="_blank" rel="noopener noreferrer">Rationale</a></li>
<li><strong>TL;DR</strong>: use resource handles rather than raw pointers, since the latter can leak.</li>
<li>Solution: use smart pointers such as <code>unique_ptr</code> and <code>shared_ptr</code>.</li>
</ul>
</li>
<li>Never transfer ownership by a raw pointer (<code>T*</code>) or reference (<code>T&amp;</code>)<!-- -->
<ul>
<li><a href="https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#i11-never-transfer-ownership-by-a-raw-pointer-t-or-reference-t" target="_blank" rel="noopener noreferrer">Rationale</a></li>
<li><strong>TL;DR</strong>: it easily becomes ambiguous who is supposed to delete the pointer.</li>
<li>Solution: return the object itself, or use a smart pointer.</li>
</ul>
</li>
</ol>
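<p>In general, requiring programmers to release resources manually is error-prone. You should consider <a href="https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rr-raii" target="_blank" rel="noopener noreferrer">managing resources automatically using resource handles and RAII (Resource Acquisition Is Initialization)</a>.</p>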
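<p>As a minimal sketch of these guidelines, the factory below returns a <code>std::unique_ptr</code> instead of a raw pointer, so the return type itself states who owns the object and no explicit <code>delete</code> is needed. <code>Tracked</code>, <code>make_tracked</code>, and <code>live_objects_after_scope</code> are hypothetical names used only for illustration; the instance counter exists just to make the automatic cleanup observable.</p>

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical resource that counts how many instances are currently alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;
};
int Tracked::alive = 0;

// The return type documents that ownership moves to the caller.
std::unique_ptr<Tracked> make_tracked() {
    return std::make_unique<Tracked>();
}

// Returns how many extra objects are still alive once the scope has ended.
int live_objects_after_scope() {
    const int before = Tracked::alive;
    {
        std::unique_ptr<Tracked> owner = make_tracked();
        // Ownership transfer is explicit and leaves the old handle empty.
        std::unique_ptr<Tracked> new_owner = std::move(owner);
        assert(owner == nullptr);
        assert(Tracked::alive == before + 1);
    }  // new_owner goes out of scope here and deletes the object
    return Tracked::alive - before;
}
```

<p>Calling <code>live_objects_after_scope()</code> yields 0: the destructor runs when the last owning handle leaves its scope, even though the code never calls <code>delete</code>.</p>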
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="references--further-readings">References &amp; Further Readings<a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#references--further-readings" class="hash-link" aria-label="标题的直接链接" title="标题的直接链接">​</a></h2>
<ul>
<li>Stroustrup, Bjarne and Sutter, Herb. <a href="https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines" target="_blank" rel="noopener noreferrer">"C++ Core Guidelines"</a>. Updated Jun 17, 2021. Accessed Aug 08, 2021.</li>
<li>glibc wiki. <a href="https://sourceware.org/glibc/wiki/MallocInternals" target="_blank" rel="noopener noreferrer">"MallocInternals"</a>. Updated May 20, 2019. Accessed Aug 08, 2021.</li>
<li>Azeria Labs. <a href="https://azeria-labs.com/heap-exploitation-part-2-glibc-heap-free-bins/" target="_blank" rel="noopener noreferrer">"Heap Exploitation Part 2: Understanding the Glibc Heap Implementation"</a>. Accessed Aug 08, 2021.</li>
<li>CTF Wiki. <a href="https://ctf-wiki.org/pwn/linux/user-mode/heap/ptmalloc2/heap-structure/" target="_blank" rel="noopener noreferrer">"堆相关数据结构"</a> (in Chinese). Accessed Aug 10, 2021.</li>
<li>glibc Contributors. <a href="https://sourceware.org/git/?p=glibc.git;a=tree;h=6eb9f63e6c9197e967a8cc12a8b235335e5a873d;hb=ae37d06c7d127817ba43850f0f898b793d42aea7" target="_blank" rel="noopener noreferrer">glibc v2.34 source code</a>. Aug 2, 2021. Accessed Aug 8, 2021.</li>
<li>gcc Contributors. <a href="https://github.com/gcc-mirror/gcc/tree/releases/gcc-11.2.0" target="_blank" rel="noopener noreferrer">gcc v11.2.0 source code</a>. Jul 28, 2021. Accessed Aug 8, 2021.</li>
<li>Stack Overflow. <a href="https://stackoverflow.com/questions/197675/how-does-delete-know-the-size-of-the-operand-array" target="_blank" rel="noopener noreferrer">"How does delete[] know the size of the operand array?"</a></li>
</ul>
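<p>By the way, while searching for memory leaks, I came across a wiki page of the Daya Bay Reactor Neutrino Experiment under Brookhaven National Lab's domain: <a href="https://wiki.bnl.gov/dayabay/index.php?title=Dealing_With_Memory_Leaks" target="_blank" rel="noopener noreferrer">"Dealing With Memory Leaks"</a>. I had no idea that the Daya Bay reactor had an international research project 😂</p>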
<!-- -->
<section data-footnotes="true" class="footnotes"><h2 class="anchor anchorWithStickyNavbar_LWe7 sr-only" id="footnote-label">Footnotes<a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#footnote-label" class="hash-link" aria-label="标题的直接链接" title="标题的直接链接">​</a></h2>
<ol>
<li id="user-content-fn-1-c5d1db">
<p>"chunks always begin on even word boundaries ... and thus at least double-word aligned." <a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#user-content-fnref-1-c5d1db" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-2-c5d1db">
<p>"does not point to the beginning of the chunk, but to the last word in the previous chunk" (<a href="https://sourceware.org/glibc/wiki/MallocInternals#What_is_a_Chunk.3F" target="_blank" rel="noopener noreferrer">source</a>) <a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#user-content-fnref-2-c5d1db" data-footnote-backref="" aria-label="Back to reference 2" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-3-c5d1db">
<p>"to find the front of the previous chunk" <a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#user-content-fnref-3-c5d1db" data-footnote-backref="" aria-label="Back to reference 3" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-4-c5d1db">
<p>"does not actually return it to the operating system for other applications to use. The <code>free()</code> call marks a chunk of memory as 'free to be reused' by the application, but from the operating system's point of view, the memory still 'belongs' to the application" (<a href="https://sourceware.org/glibc/wiki/MallocInternals#Free_Algorithm" target="_blank" rel="noopener noreferrer">source</a>) <a href="https://shuye.dev/zh-Hans/blog/malloc_chunk#user-content-fnref-4-c5d1db" data-footnote-backref="" aria-label="Back to reference 4" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>]]></content>
        <author>
            <name>Ye Shu</name>
            <uri>https://github.com/yechs</uri>
        </author>
        <category label="c++" term="c++"/>
        <category label="pwn" term="pwn"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Me and My Broken Site(s)]]></title>
        <id>https://shuye.dev/zh-Hans/blog/welcome</id>
        <link href="https://shuye.dev/zh-Hans/blog/welcome"/>
        <updated>2021-04-22T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[My personal website is finally up and running today! It has already been 821 days since I purchased this domain (huh, I'm such a big procrastinator). I also have a blog running on the subdomain blog.shuye.dev for some longer and possibly non-tech-related blog posts.]]></summary>
        <content type="html"><![CDATA[<p>My personal website is finally up and running today! It has already been 821 days since I purchased this domain (huh, I'm such a big procrastinator). I also have a blog running on the subdomain <a href="https://blog.shuye.dev/" target="_blank" rel="noopener noreferrer">blog.shuye.dev</a> for some longer and possibly non-tech-related blog posts.</p>
<p>This is not the first personal website I've ever made. I coded my very first personal website in 2014 using static HTML and CSS (that was the time when most websites were still using HTML 4.01 and you had to decide whether you wanted a "strict" version of HTML), after a bit of self-learning with W3Schools. In fact, I wasn't even learning from the true <a href="https://w3schools.com/" target="_blank" rel="noopener noreferrer">W3Schools</a>, but from <a href="https://web.archive.org/web/20140103013237/http://w3school.com.cn/" target="_blank" rel="noopener noreferrer">an imitation of it</a>.</p>
<p>That was my first step, before I got into building more complicated (and dynamic!) websites using PHP, with help from the lecture videos of <a href="http://cs75.tv/2012/summer/" target="_blank" rel="noopener noreferrer">Harvard's open course CS75</a>. Shortly afterwards, I hosted my first dynamic website on the free hosting space RedHat OpenShift by 2016 (though it served more as a proxy server for me to get around Internet censorship; that will be a topic for another day). Sadly, all of that code was lost over the years (I didn't even know how to use git at that time).</p>
<p>Since then, I have hosted different sorts of personal websites: portfolios, WordPress blogs, and many lightweight alternatives like typecho or hugo. But to be honest, I was more interested in the process of trying out new frameworks or toolsets than in actually settling down and writing something. And what use are websites if not for sharing information?</p>
<p>In fact, that's what motivates me to still build blogs and websites to this day, in the year 2021, when most people are turning to centralized social media platforms like WeChat, Instagram, Twitter ..., and when small websites have almost vanished from public sight. I became so fed up with the frequent warnings of "sensitive words" that forced me to water down and censor my comments that I finally decided to start (again) a website of my own, to at least have a place to myself where I can actually write something undisturbed. And that's why, out of the innumerable possibilities, you are reading this article right now.</p>]]></content>
        <author>
            <name>Ye Shu</name>
            <uri>https://github.com/yechs</uri>
        </author>
        <category label="events" term="events"/>
        <category label="essays" term="essays"/>
    </entry>
</feed>