QEMU With E2K User Support
Sam Eiderman 98eb9733f4 vmdk: Add read-only support for seSparse snapshots
Until ESXi 6.5 VMware used the vmfsSparse format for snapshots (VMDK3 in
QEMU).

This format was lacking in the following:

    * Grain directory (L1) and grain table (L2) entries were 32-bit,
      allowing access to only 2TB (slightly less) of data.
    * The grain size (default) was 512 bytes - leading to data
      fragmentation and many grain tables.
    * For space reclamation purposes, it was necessary to find all the
      grains which are not pointed to by any grain table - so a reverse
      mapping of "offset of grain in vmdk" to "grain table" must be
      constructed - which takes large amounts of CPU/RAM.
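
The 2TB ceiling in the first point follows from simple arithmetic. A
minimal sketch, assuming (this is not stated above) that each 32-bit
entry holds an offset counted in 512-byte sectors:

```python
# Assumption: a 32-bit entry stores an offset counted in 512-byte sectors.
SECTOR_SIZE = 512
ENTRY_BITS = 32

# Largest byte range that such an entry can address.
max_addressable_bytes = (2 ** ENTRY_BITS) * SECTOR_SIZE

print(max_addressable_bytes // 2 ** 40, "TiB")
```

Under that assumption the limit is 2**41 bytes, which matches the
"2TB (slightly less)" figure quoted above.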

The format specification can be found in VMware's documentation:
https://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf

In ESXi 6.5, to support snapshot files larger than 2TB, a new format was
introduced: SESparse (Space Efficient).

This format fixes the above issues:

    * All entries are now 64-bit.
    * The grain size (default) is 4KB.
    * Grain directory and grain tables are now located at the beginning
      of the file.
      + seSparse format reserves space for all grain tables.
      + Grain tables can be addressed using an index.
      + Grains are located at the end of the file and can also be
        addressed with an index.
      - seSparse vmdks of large disks (64TB) have huge preallocated
        headers - mainly due to L2 tables, even for empty snapshots.
    * The header contains a reverse mapping ("backmap") of "offset of
      grain in vmdk" to "grain table" and a bitmap ("free bitmap") which
      specifies for each grain - whether it is allocated or not.
      Using these data structures we can implement space reclamation
      efficiently.
    * Due to the fact that the header now maintains two mappings:
        * The regular one (grain directory & grain tables)
        * A reverse one (backmap and free bitmap)
      These data structures can lose consistency upon crash and result
      in a corrupted VMDK.
      Therefore, a journal is also added to the VMDK and is replayed
      when VMware reopens the file after a crash.

Since ESXi 6.7 - SESparse is the only snapshot format available.

Unfortunately, VMware does not provide documentation regarding the new
seSparse format.

This commit is based on black-box research of the seSparse format.
Various in-guest block operations and their effect on the snapshot file
were tested.

The only VMware provided source of information (regarding the underlying
implementation) was a log file on the ESXi:

    /var/log/hostd.log

Whenever an seSparse snapshot is created - the log is populated with
seSparse records.

Relevant log records are of the form:

[...] Const Header:
[...]  constMagic     = 0xcafebabe
[...]  version        = 2.1
[...]  capacity       = 204800
[...]  grainSize      = 8
[...]  grainTableSize = 64
[...]  flags          = 0
[...] Extents:
[...]  Header         : <1 : 1>
[...]  JournalHdr     : <2 : 2>
[...]  Journal        : <2048 : 2048>
[...]  GrainDirectory : <4096 : 2048>
[...]  GrainTables    : <6144 : 2048>
[...]  FreeBitmap     : <8192 : 2048>
[...]  BackMap        : <10240 : 2048>
[...]  Grain          : <12288 : 204800>
[...] Volatile Header:
[...] volatileMagic     = 0xcafecafe
[...] FreeGTNumber      = 0
[...] nextTxnSeqNumber  = 0
[...] replayJournal     = 0

The sizes that are seen in the log file are in sectors.
Extents are of the following format: <offset : size>
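
To make the units concrete, here is a small sketch that parses an
extent in the <offset : size> form and converts sector counts to bytes.
The helper names parse_extent and sectors_to_bytes are hypothetical, and
the 512-byte sector size is an assumption (it is consistent with a
grainSize of 8 sectors being 4KB):

```python
import re

SECTOR_SIZE = 512  # assumed bytes per sector


def parse_extent(text):
    """Parse an extent such as '<12288 : 204800>' into (offset, size), in sectors."""
    match = re.fullmatch(r"\s*<\s*(\d+)\s*:\s*(\d+)\s*>\s*", text)
    if match is None:
        raise ValueError(f"not an extent: {text!r}")
    return int(match.group(1)), int(match.group(2))


def sectors_to_bytes(sectors):
    """Convert a sector count from the log into bytes."""
    return sectors * SECTOR_SIZE


# Values taken from the log records above.
grain_offset, grain_size = parse_extent("<12288 : 204800>")
grain_extent_offset_bytes = sectors_to_bytes(grain_offset)  # 6 MiB
capacity_bytes = sectors_to_bytes(204800)                   # 100 MiB
grain_size_bytes = sectors_to_bytes(8)                      # 4 KiB
```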

This commit is a strict implementation which enforces:
    * magics
    * version number 2.1
    * grain size of 8 sectors  (4KB)
    * grain table size of 64 sectors
    * zero flags
    * extent locations
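
The checks above can be sketched as a table of expected values. This is
an illustration only, not the code in this commit: the header is
modelled as a hypothetical dict keyed by the field names from the log,
and the extent location checks are left out because their exact rules
are not spelled out here.

```python
# Expected values, taken from the list of enforced properties and the log.
EXPECTED = {
    "constMagic": 0xCAFEBABE,
    "volatileMagic": 0xCAFECAFE,
    "version": (2, 1),
    "grainSize": 8,         # sectors (4KB)
    "grainTableSize": 64,   # sectors
    "flags": 0,
}


def check_header(header):
    """Return a list of mismatch descriptions; an empty list means accepted."""
    problems = []
    for field, expected in EXPECTED.items():
        actual = header.get(field)
        if actual != expected:
            problems.append(f"{field}: expected {expected!r}, got {actual!r}")
    return problems
```

For example, check_header(dict(EXPECTED)) returns an empty list, while
a header with grainSize set to 16 yields one mismatch.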

Additionally, this commit provides only a subset of the functionality
offered by seSparse's format:
    * Read-only
    * No journal replay
    * No space reclamation
    * No unmap support

Hence, journal header, journal, free bitmap and backmap extents are
unused, only the "classic" (L1 -> L2 -> data) grain access is
implemented.
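
As a rough illustration of the L1 -> L2 -> data walk, the sketch below
splits a guest sector number into a grain directory index, a grain
table index and an offset inside the grain. It is the usual two-level
arithmetic, not a copy of this commit's code, and it assumes 512-byte
sectors and grain tables packed with 8-byte (64-bit) entries; the
function name locate is hypothetical.

```python
SECTOR_SIZE = 512              # assumed bytes per sector
ENTRY_SIZE = 8                 # 64-bit entries
GRAIN_SIZE_SECTORS = 8         # enforced grain size
GRAIN_TABLE_SIZE_SECTORS = 64  # enforced grain table size

# Number of entries in one grain table, assuming it is densely packed.
ENTRIES_PER_GRAIN_TABLE = GRAIN_TABLE_SIZE_SECTORS * SECTOR_SIZE // ENTRY_SIZE


def locate(sector):
    """Return (l1_index, l2_index, sector_offset_in_grain) for a guest sector."""
    grain_number = sector // GRAIN_SIZE_SECTORS
    l1_index = grain_number // ENTRIES_PER_GRAIN_TABLE
    l2_index = grain_number % ENTRIES_PER_GRAIN_TABLE
    return l1_index, l2_index, sector % GRAIN_SIZE_SECTORS
```

With these values one grain table holds 4096 entries and therefore
covers 16 MiB of guest data.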

However there are several differences in the grain access itself.
Grain directory (L1):
    * Grain directory entries are indexes (not offsets) to grain
      tables.
    * Valid grain directory entries have their highest nibble set to
      0x1.
    * Since grain tables are always located at the beginning of the
      file - the index can fit into 32 bits - so we can use its low
      part if it's valid.
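
A minimal sketch of the grain directory rules just listed, treating an
entry as an unsigned 64-bit integer. The function name is hypothetical,
and only the two properties stated above (top nibble 0x1, index in the
low 32 bits) are checked:

```python
def decode_grain_directory_entry(entry):
    """Return the grain table index, or None if the entry is not valid."""
    if (entry >> 60) != 0x1:
        return None
    # The index fits into 32 bits, so only the low part is used.
    return entry & 0xFFFFFFFF
```

For instance, 0x1000000000000005 decodes to grain table index 5, and an
all-zero entry is reported as not valid.
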
Grain table (L2):
    * Grain table entries are indexes (not offsets) to grains.
    * If the highest nibble of the entry is:
        0x0:
            The grain is not allocated.
            The rest of the bytes are 0.
        0x1:
            The grain is unmapped - guest sees a zero grain.
            The rest of the bits point to the previously mapped grain,
            see 0x3 case.
        0x2:
            The grain is zero.
        0x3:
            The grain is allocated - to get the index calculate:
            ((entry & 0x0fff000000000000) >> 48) |
            ((entry & 0x0000ffffffffffff) << 12)
    * The difference between 0x1 and 0x2 is that 0x1 is an unallocated
      grain which results from the guest using sg_unmap to unmap the
      grain - but the grain itself still exists in the grain extent - a
      space reclamation procedure should delete it.
      Unmapping a zero grain has no effect (0x2 will not change to 0x1)
      but unmapping an unallocated grain will (0x0 to 0x1) - naturally.
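
The grain table cases above can be sketched as a single decoding
function. This is an illustration under the assumption that an entry is
an unsigned 64-bit integer; the function name and the state labels are
hypothetical, and the index formula is the one given for the 0x3 case:

```python
def decode_grain_table_entry(entry):
    """Return (state, grain_index); grain_index is None unless allocated."""
    kind = entry >> 60  # highest nibble selects the case
    if kind == 0x0:
        return ("unallocated", None)
    if kind == 0x1:
        return ("unmapped", None)   # guest sees a zero grain
    if kind == 0x2:
        return ("zero", None)
    if kind == 0x3:
        index = (((entry & 0x0FFF000000000000) >> 48) |
                 ((entry & 0x0000FFFFFFFFFFFF) << 12))
        return ("allocated", index)
    raise ValueError(f"unexpected grain table entry: {entry:#018x}")
```

Note how the 12 bits just below the type nibble form the low part of
the index, while the low 48 bits of the entry form its high part.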

In order to implement seSparse some fields had to be changed to support
both 32-bit and 64-bit entry sizes.

Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
Message-id: 20190620091057.47441-4-shmuel.eiderman@oracle.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-06-24 15:53:02 +02:00
accel target/i386: kvm: Add support for save and restore nested state 2019-06-21 13:23:47 +02:00
audio Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
authz Include qemu/module.h where needed, drop it from qemu-common.h 2019-06-12 13:18:33 +02:00
backends Include qemu/module.h where needed, drop it from qemu-common.h 2019-06-12 13:18:33 +02:00
block vmdk: Add read-only support for seSparse snapshots 2019-06-24 15:53:02 +02:00
bsd-user Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
capstone@22ead3e0bf disas: Add capstone as submodule 2017-10-26 11:56:20 +02:00
chardev monitor: Replace monitor_init() with monitor_init_{hmp, qmp}() 2019-06-18 08:14:17 +02:00
contrib vhost-user-gpu: initialize msghdr & iov at declaration 2019-06-16 16:16:52 -04:00
crypto Normalize position of header guard 2019-06-12 13:20:20 +02:00
default-configs hw/acpi: Consolidate build_mcfg to pci.c 2019-05-29 18:00:57 -04:00
disas Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
docs i386/kvm: add support for Direct Mode for Hyper-V synthetic timers 2019-06-21 02:29:39 +02:00
dtc@88f18909db Update dtc/libfdt submodule to v1.4.7 2018-10-02 13:53:26 +10:00
fpu hardfloat: fix float32/64 fused multiply-add 2019-03-25 10:35:32 +00:00
fsdev Include qemu/module.h where needed, drop it from qemu-common.h 2019-06-12 13:18:33 +02:00
gdb-xml RISC-V: Add 64-bit gdb xml files. 2019-03-19 05:13:24 -07:00
hw nvme: do not advertise support for unsupported arbitration mechanism 2019-06-24 15:53:01 +02:00
include hw: Nuke hw_compat_4_0_1 and pc_compat_4_0_1 2019-06-21 13:25:29 +02:00
io Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
libdecnumber build: remove CONFIG_LIBDECNUMBER 2017-10-16 18:03:52 +02:00
linux-headers linux-headers: sync with latest KVM headers from Linux 5.2 2019-06-21 13:23:47 +02:00
linux-user semihosting: split console_out into string and char versions 2019-06-12 17:53:22 +01:00
migration Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
monitor block/block-backend: blk_iostatus_reset: drop usage of bs->job 2019-06-18 16:41:10 +02:00
nbd nbd/server: Nicer spelling of max BLOCK_STATUS reply length 2019-06-13 08:56:10 -05:00
net Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
pc-bios pc-bios: update the README file with edk2-stable201905 information 2019-06-14 21:48:00 +02:00
po po/Makefile: Modern shell scripting (use $() instead of ``) 2018-10-24 07:39:10 +01:00
python/qemu event_match: always match on None value 2019-06-14 14:16:57 +02:00
qapi block/null: Expose read-zeroes option in QAPI schema 2019-06-18 16:41:10 +02:00
qga Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
qobject qemu-common: Move qemu_isalnum() etc. to qemu/ctype.h 2019-06-11 20:22:09 +02:00
qom Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
replay Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
roms roms/Makefile.edk2: update input file list for "pc-bios/edk2-licenses.txt" 2019-06-14 21:47:53 +02:00
scripts decodetree: Fix comparison of Field 2019-06-13 15:14:03 +01:00
scsi Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
slirp@f0da672620 Update upstream slirp 2019-05-09 09:58:57 +02:00
stubs monitor: Replace monitor_init() with monitor_init_{hmp, qmp}() 2019-06-18 08:14:17 +02:00
target MIPS queue for June 21st, 2019 2019-06-21 15:40:50 +01:00
tcg Supply missing header guards 2019-06-12 13:20:21 +02:00
tests vmdk: Reduce the max bound for L1 table size 2019-06-24 15:53:02 +02:00
trace Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
ui ui/cocoa: Fix mouse grabbing in fullscreen mode for relative input device 2019-06-13 11:23:22 +01:00
util util/main-loop: Fix incorrect assertion 2019-06-21 13:25:29 +02:00
.cirrus.yml cirrus / travis: Add gnu-sed and bash for macOS and FreeBSD 2019-05-21 10:12:47 +02:00
.dir-locals.el Add .dir-locals.el file to configure emacs coding style 2015-10-08 19:46:01 +03:00
.editorconfig editorconfig: add setting for shell scripts 2019-06-12 17:53:22 +01:00
.exrc qemu: add .exrc 2012-09-07 09:02:44 +03:00
.gdbinit .gdbinit: load QEMU sub-commands when gdb starts 2017-06-07 14:38:45 +01:00
.gitignore Makefile: install the edk2 firmware images and their descriptors 2019-04-17 15:38:35 +02:00
.gitlab-ci.yml gitlab-ci.yml: Test the TCG interpreter in a CI pipeline 2019-05-02 16:56:33 +02:00
.gitmodules gitmodules: use qemu.org git mirrors 2019-05-03 13:56:56 +01:00
.gitpublish Add a git-publish configuration file 2018-03-05 09:03:17 +00:00
.mailmap maint: Grammar fix to mailmap 2018-12-11 18:35:54 +01:00
.patchew.yml ci: store Patchew configuration in the tree 2019-06-03 14:03:02 +02:00
.shippable.yml .shippable.yml: disable the win cross tests 2018-12-17 13:02:12 +00:00
.travis.yml Travis: print acceptance tests logs in case of job failure 2019-06-18 11:15:08 -03:00
CODING_STYLE CODING_STYLE: indent example code as all others 2019-05-02 18:12:58 +02:00
COPYING COPYING: update from FSF 2008-10-12 17:54:42 +00:00
COPYING.LIB COPYING.LIB: Synchronize the LGPL 2.1 with the version from gnu.org 2019-01-30 11:01:22 +01:00
Changelog Use HTTPS for qemu.org and other domains 2017-11-21 13:34:13 +00:00
HACKING HACKING: document preference for g_new instead of g_malloc 2018-05-20 08:32:09 +03:00
Kconfig.host kconfig: add dependencies on CONFIG_MSI_NONBROKEN 2019-03-18 09:39:57 +01:00
LICENSE vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio 2014-12-19 15:24:06 -07:00
MAINTAINERS MAINTAINERS: Consolidate MIPS disassembler-related items 2019-06-21 11:29:43 +02:00
Makefile docs: Build and install specs manual 2019-06-17 15:35:31 +01:00
Makefile.objs monitor: Split out monitor/qmp.c 2019-06-17 20:36:56 +02:00
Makefile.target Move monitor.c to monitor/misc.c 2019-06-17 20:36:56 +02:00
README README: use 'https://' instead of 'git://' 2018-11-12 11:26:02 +00:00
VERSION Open 4.1 development tree 2019-04-24 10:12:22 +01:00
arch_init.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
balloon.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
block.c block: Ignore loosening perm restrictions failures 2019-06-18 16:41:10 +02:00
blockdev-nbd.c nbd: allow authorization with nbd-server-start QMP command 2019-03-06 11:05:27 -06:00
blockdev.c blockdev: enable non-root nodes for transaction drive-backup source 2019-06-24 15:53:01 +02:00
blockjob.c block: drop bs->job 2019-06-18 16:41:10 +02:00
bootdevice.c fw_cfg: ignore suffixes in the bootdevice list dependent on machine class 2018-08-16 22:27:43 -03:00
bt-host.c all: Clean up includes 2016-02-04 17:41:30 +00:00
bt-vhci.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
configure configure: remove tpm_passthrough & tpm_emulator 2019-06-03 14:03:02 +02:00
cpus-common.c qemu/queue.h: simplify reverse access to QTAILQ 2019-01-11 15:46:55 +01:00
cpus.c hax: Honor CPUState::halted 2019-06-21 02:29:38 +02:00
device-hotplug.c hmp: Fix drive_add ... format=help crash 2019-04-08 17:42:06 +02:00
device_tree.c device_tree: Fix integer overflowing in load_device_tree() 2019-04-09 16:35:40 -07:00
disas.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
dma-helpers.c block: explicitly acquire aiocontext in bottom halves that need it 2017-02-21 11:39:39 +00:00
dump.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
exec.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
gdbstub.c monitor: Replace monitor_init() with monitor_init_{hmp, qmp}() 2019-06-18 08:14:17 +02:00
gitdm.config contrib: gitdm: add a mapping for Janus Technologies 2019-03-12 19:31:29 +00:00
hmp-commands-info.hx {hmp, hw/pvrdma}: Expose device internals via monitor interface 2019-03-16 15:52:44 +02:00
hmp-commands.hx monitor: Rename HMP command type and tables 2019-06-17 20:36:56 +02:00
hmp.h Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
ioport.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
iothread.c iothread: document about why we need explicit aio_poll() 2019-03-08 10:20:57 +00:00
job-qmp.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
job.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
memory.c qemu-common: Move tcg_enabled() etc. to sysemu/tcg.h 2019-06-11 20:22:09 +02:00
memory_ldst.inc.c exec: Fix MAP_RAM for cached access 2018-06-28 19:05:30 +02:00
memory_mapping.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
module-common.c all: Clean up includes 2016-02-04 17:41:30 +00:00
numa.c numa: improve cpu hotplug error message with a wrong node-id 2019-06-07 15:28:46 -03:00
os-posix.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
os-win32.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
qdev-monitor.c monitor: Simplify how -device/device_add print help 2019-04-18 22:18:59 +02:00
qemu-bridge-helper.c qemu-bridge-helper: Fix misuse of isspace() 2019-05-22 14:57:33 +02:00
qemu-deprecated.texi vl: Deprecate -mon pretty=... for HMP monitors 2019-06-18 08:14:17 +02:00
qemu-doc.texi docs: add Security chapter to the documentation 2019-05-10 10:53:52 +01:00
qemu-edid.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
qemu-ga.texi doc: fix the configuration path 2019-05-03 13:03:04 +02:00
qemu-img-cmds.hx qemu-img: Add salvaging mode to convert 2019-06-14 14:16:57 +02:00
qemu-img.c qemu-img: Add salvaging mode to convert 2019-06-14 14:16:57 +02:00
qemu-img.texi qemu-img: Add salvaging mode to convert 2019-06-14 14:16:57 +02:00
qemu-io-cmds.c qemu-io-cmds: use clock_gettime for benchmarking 2019-06-12 17:53:22 +01:00
qemu-io.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
qemu-keymap.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
qemu-nbd.c qemu-nbd: Do not close stderr 2019-06-13 08:50:47 -05:00
qemu-nbd.texi qemu-nbd: Add --pid-file option 2019-06-13 08:50:47 -05:00
qemu-option-trace.texi qemu-option-trace: -trace enable= is a pattern, not a file 2018-05-20 08:29:01 +03:00
qemu-options-wrapper.h qemu-img: remove references to GEN_DOCS 2018-05-20 08:35:54 +03:00
qemu-options.h Clean up ill-advised or unusual header guards 2016-07-12 16:20:46 +02:00
qemu-options.hx docs: smbios: remove family=x from type2 entry description 2019-05-29 18:00:57 -04:00
qemu-seccomp.c seccomp: report more useful errors from seccomp 2019-03-27 13:11:38 +01:00
qemu-tech.texi qemu-tech.texi: Remove "QEMU compared to other emulators" section 2019-06-17 15:35:31 +01:00
qemu.nsi Use HTTPS for qemu.org and other domains 2017-11-21 13:34:13 +00:00
qemu.sasl Default to GSSAPI (Kerberos) instead of DIGEST-MD5 for SASL 2017-05-09 14:41:47 +01:00
qtest.c Include qemu/module.h where needed, drop it from qemu-common.h 2019-06-12 13:18:33 +02:00
replication.c replication: Introduce new APIs to do replication operation 2016-09-13 11:00:56 +01:00
replication.h Include qemu/module.h where needed, drop it from qemu-common.h 2019-06-12 13:18:33 +02:00
rules.mak contrib: add vhost-user-gpu 2019-05-29 06:30:45 +02:00
thunk.c thunk: improve readability of allocation loop 2019-03-11 18:48:20 +01:00
tpm.c tpm: Clean up error reporting in tpm_init_tpmdev() 2018-10-19 14:51:34 +02:00
trace-events Move monitor.c to monitor/misc.c 2019-06-17 20:36:56 +02:00
version.rc Use HTTPS for qemu.org and other domains 2017-11-21 13:34:13 +00:00
vl.c vl: Deprecate -mon pretty=... for HMP monitors 2019-06-18 08:14:17 +02:00
win_dump.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
win_dump.h dump: move Windows dump structures definitions 2018-10-02 19:09:12 +02:00

README

         QEMU README
         ===========

QEMU is a generic and open source machine & userspace emulator and
virtualizer.

QEMU is capable of emulating a complete machine in software without any
need for hardware virtualization support. By using dynamic translation,
it achieves very good performance. QEMU can also integrate with the Xen
and KVM hypervisors to provide emulated hardware while allowing the
hypervisor to manage the CPU. With hypervisor support, QEMU can achieve
near native performance for CPUs. When QEMU emulates CPUs directly it is
capable of running operating systems made for one machine (e.g. an ARMv7
board) on a different machine (e.g. an x86_64 PC board).

QEMU is also capable of providing userspace API virtualization for Linux
and BSD kernel interfaces. This allows binaries compiled against one
architecture ABI (e.g. the Linux PPC64 ABI) to be run on a host using a
different architecture ABI (e.g. the Linux x86_64 ABI). This does not
involve any hardware emulation, simply CPU and syscall emulation.

QEMU aims to fit into a variety of use cases. It can be invoked directly
by users wishing to have full control over its behaviour and settings.
It also aims to facilitate integration into higher level management
layers, by providing a stable command line interface and monitor API.
It is commonly invoked indirectly via the libvirt library when using
open source applications such as oVirt, OpenStack and virt-manager.

QEMU as a whole is released under the GNU General Public License,
version 2. For full licensing details, consult the LICENSE file.


Building
========

QEMU is multi-platform software intended to be buildable on all modern
Linux platforms, OS-X, Win32 (via the Mingw64 toolchain) and a variety
of other UNIX targets. The simple steps to build QEMU are:

  mkdir build
  cd build
  ../configure
  make

Additional information can also be found online via the QEMU website:

  https://qemu.org/Hosts/Linux
  https://qemu.org/Hosts/Mac
  https://qemu.org/Hosts/W32


Submitting patches
==================

The QEMU source code is maintained under the GIT version control system.

   git clone https://git.qemu.org/git/qemu.git

When submitting patches, one common approach is to use 'git
format-patch' and/or 'git send-email' to format & send the mail to the
qemu-devel@nongnu.org mailing list. All patches submitted must contain
a 'Signed-off-by' line from the author. Patches should follow the
guidelines set out in the HACKING and CODING_STYLE files.

Additional information on submitting patches can be found online via
the QEMU website:

  https://qemu.org/Contribute/SubmitAPatch
  https://qemu.org/Contribute/TrivialPatches

The QEMU website is also maintained under source control.

  git clone https://git.qemu.org/git/qemu-web.git
  https://www.qemu.org/2017/02/04/the-new-qemu-website-is-up/

A 'git-publish' utility was created to make the above process less
cumbersome, and is highly recommended for making regular contributions,
or even just for sending consecutive patch series revisions. It also
requires a working 'git send-email' setup, and by default doesn't
automate everything, so you may want to go through the above steps
manually once.

For installation instructions, please go to

  https://github.com/stefanha/git-publish

The workflow with 'git-publish' is:

  $ git checkout master -b my-feature
  $ # work on new commits, add your 'Signed-off-by' lines to each
  $ git publish

Your patch series will be sent and tagged as my-feature-v1 if you need to refer
back to it in the future.

Sending v2:

  $ git checkout my-feature # same topic branch
  $ # making changes to the commits (using 'git rebase', for example)
  $ git publish

Your patch series will be sent with a 'v2' tag in the subject and the git tip
will be tagged as my-feature-v2.

Bug reporting
=============

The QEMU project uses Launchpad as its primary upstream bug tracker. Bugs
found when running code built from QEMU git or upstream released sources
should be reported via:

  https://bugs.launchpad.net/qemu/

If using QEMU via an operating system vendor pre-built binary package, it
is preferable to report bugs to the vendor's own bug tracker first. If
the bug is also known to affect latest upstream code, it can also be
reported via Launchpad.

For additional information on bug reporting consult:

  https://qemu.org/Contribute/ReportABug


Contact
=======

The QEMU community can be contacted in a number of ways, with the two
main methods being email and IRC:

 - qemu-devel@nongnu.org
   https://lists.nongnu.org/mailman/listinfo/qemu-devel
 - #qemu on irc.oftc.net

Information on additional methods of contacting the community can be
found online via the QEMU website:

  https://qemu.org/Contribute/StartHere

-- End