🔦 Demystifying the Expert Witness Compression Format (EWF)

Abstract

As a digital forensic expert, you must be able to prove the authenticity and reliability of a forensic image in court. The integrity of the data has to be maintained during the imaging process, preventing any accidental or intentional modification. The Expert Witness Compression Format (EWF) provides a way to store metadata about the image, such as the source device, the imaging tool, checksums, signatures, and other relevant information about the acquired media. The main feature of this imaging format is its compression capability, which reduces the size of the resulting image file. Compression reduces storage requirements and makes images faster to copy and transfer. This article aims to explain, in simple terms, the structures behind an EWF segment. The reader will discover the main algorithms needed to read and seek inside this image format. Finally, a proof of concept written in Rust is shared.

Filesystem layers of abstraction

Before getting into the main subject of this article, it is important to learn, or review, the filesystem concepts, the vocabulary, and the underlying layers of abstraction. Let us take the Unix filesystem as an example. Below is a simplified representation of the main layers of abstraction.

[Figure: simplified representation of the main filesystem layers of abstraction]

A storage medium (hard drive, SSD, …) contains the electronics needed to expose a Logical Block Addressing (LBA) abstraction: the medium can be viewed as a contiguous sequence of sectors. A sector is the smallest addressable unit on a drive (typically 512 bytes for disk drives). Multiple contiguous sectors can be grouped to form a block; blocks are the smallest units a filesystem allocates and accesses. Each filesystem type has its own way of representing files, directories, hardware devices, etc. The operating system exposes these human-friendly concepts through the kernel, which performs the various actions on the filesystem (read, write, seek, …).
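As a minimal illustration of these layers, the arithmetic below converts a logical block address into a byte offset on the medium, and a sector into the filesystem block that contains it. It is a sketch assuming fixed-size sectors and blocks; the function names and the example values (a filesystem starting at LBA 2048, blocks of 8 sectors) are hypothetical.

```rust
/// Byte offset of a logical block address (LBA) on the medium,
/// assuming every sector has the same size.
fn lba_to_byte_offset(lba: u64, bytes_per_sector: u64) -> u64 {
    lba * bytes_per_sector
}

/// Index of the filesystem block containing sector `lba`, for a filesystem that
/// starts at sector `fs_start_lba` and groups `sectors_per_block` sectors per block.
/// Returns None if the sector lies before the start of the filesystem.
fn sector_to_block(lba: u64, fs_start_lba: u64, sectors_per_block: u64) -> Option<u64> {
    lba.checked_sub(fs_start_lba)
        .map(|relative| relative / sectors_per_block)
}
```

With 512-byte sectors, LBA 2048 starts at byte 1,048,576 (1 MiB), and with 4 KiB blocks (8 sectors) sectors 2056 to 2063 all belong to block 1 of a filesystem starting at LBA 2048.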

Digital forensics is performed on a copy of the media under investigation. This copy can be made with various tools (FTK Imager, EnCase, dd, Falcon, and others), which produce an image in one of several formats (raw, img, EWF, VMDK, VDI, …). The image is analyzed later, so the investigation is never performed on the original media.

From a forensics perspective, once a storage medium has been acquired, the investigator needs to emulate all the necessary abstraction layers in order to extract the artifacts useful to an investigation without tampering with the data. Forensic tools provide these abstractions and capabilities, and most of them support several image formats. One of these formats is well known and widely used: the Expert Witness Compression Format (EWF).


The Expert Witness Compression Format

The Expert Witness Compression Format (EWF) is a forensic image format created by ASR Data. It can be used to create a bit-by-bit copy of a digital device. It stores both the acquired data (which, for a whole disk, includes the partition table) and metadata about the acquisition. EWF is designed to maintain the integrity of the original data, and the data can be compressed to reduce storage requirements. It is widely used by law enforcement as well as by digital forensics and incident response companies to preserve evidence for analysis and investigation. Because it is a proprietary file format, it is not easy to understand, hence this blog article. Fortunately, the open-source community provides a C library (libewf) and good documentation of this file format [1]. Let’s build a clear mental representation of the components of an EWF image.

The segments

An EWF image can be split into multiple segment files (it can also consist of a single segment file). The segment files use consecutive extensions: from “E01” to “E99”, then in alphabetical order from “EAA” to “ZZZ”. Splitting a large evidence source into multiple segments avoids producing a single huge output file, which can cause problems on some filesystems (FAT32, for instance, cannot store files of 4 GiB or more). Each segment file is composed of a header and multiple sections.
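The naming scheme can be sketched as a small function mapping a 1-based segment number to its extension. This is a sketch assuming the progression described above (E01 to E99, then EAA, EAB, … up to ZZZ); `segment_extension` is a hypothetical helper, not part of libewf or of the proof of concept.

```rust
/// Hypothetical helper: extension of the 1-based `segment_number`,
/// or None if the number is 0 or past "ZZZ".
fn segment_extension(segment_number: u32) -> Option<String> {
    match segment_number {
        0 => None,
        // Numeric range: E01 .. E99
        1..=99 => Some(format!("E{:02}", segment_number)),
        _ => {
            // Alphabetic range: treat (n - 100) as a base-26 number.
            let mut n = segment_number - 100;
            let third = (b'A' + (n % 26) as u8) as char;
            n /= 26;
            let second = (b'A' + (n % 26) as u8) as char;
            n /= 26;
            // The first letter starts at 'E' and may not go past 'Z'.
            if n > (b'Z' - b'E') as u32 {
                return None;
            }
            let first = (b'E' + n as u8) as char;
            Some(format!("{first}{second}{third}"))
        }
    }
}
```

For instance, segment 100 maps to “EAA”, segment 126 to “EBA”, and the first letter only advances after “EZZ” (segment 776 is “FAA”).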

The EWF file Header and Sections

Let’s now dive into the components of an EWF segment file.

[Figure: components of an EWF segment file]

EWF stores the sectors of the source evidence inside chunks. A chunk is simply a group of consecutive sectors. Each segment file holds a finite number of chunks, so before we can read any sector we need to know how sectors are grouped into chunks and where each chunk is stored.
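To make the chunk geometry concrete, here is a sketch that maps a byte offset of the original media to a chunk index and an offset inside that chunk. It assumes all chunks hold the same number of sectors; `locate` is a hypothetical helper, and 64 sectors of 512 bytes (32 KiB chunks) is used only as an example value.

```rust
/// Returns (chunk index, offset inside that chunk) for a byte offset of the original media,
/// assuming every chunk holds `sectors_per_chunk` sectors of `bytes_per_sector` bytes.
fn locate(media_offset: u64, sectors_per_chunk: u64, bytes_per_sector: u64) -> (u64, u64) {
    let chunk_size = sectors_per_chunk * bytes_per_sector; // e.g. 64 * 512 = 32 KiB
    (media_offset / chunk_size, media_offset % chunk_size)
}
```

With 64 sectors of 512 bytes, byte 100,000 of the media lives in chunk 3, at offset 1,696 inside it. The chunk index is global to the image; to find the chunk inside a given segment, the number of chunks stored in the previous segments still has to be subtracted.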

The segment file Header

Each segment file starts with a file header (not to be confused with the section header described later). The file header contains an 8-byte signature (or magic number) that identifies its format:

[Figure: example of a segment file header]

In this example, the signature is: “EVF\x09\x0d\x0a\xff\x00” (in hexadecimal: 45 56 46 09 0D 0A FF 00).

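The rest of the 13-byte file header holds the segment number (a 16-bit little-endian integer), surrounded by a fields-start byte and two fields-end bytes. The first section therefore starts immediately after the file header, at offset 0x0d.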
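Reading this header can be sketched as follows. The sketch assumes the 13-byte layout documented by libewf (8-byte signature, one fields-start byte, a little-endian 16-bit segment number, two fields-end bytes); `SegmentFileHeader` and `parse_ewf_header` are hypothetical names, not taken from the proof of concept.

```rust
/// EWF-E01 segment file signature: "EVF" followed by 0x09 0x0d 0x0a 0xff 0x00.
const EVF_SIGNATURE: [u8; 8] = [0x45, 0x56, 0x46, 0x09, 0x0d, 0x0a, 0xff, 0x00];
/// The file header is 13 bytes long, so the first section starts at 0x0d.
const FILE_HEADER_SIZE: usize = 0x0d;

#[derive(Debug)]
struct SegmentFileHeader {
    segment_number: u16,
}

fn parse_ewf_header(bytes: &[u8]) -> Result<SegmentFileHeader, String> {
    if bytes.len() < FILE_HEADER_SIZE {
        return Err(format!("need {} bytes, got {}", FILE_HEADER_SIZE, bytes.len()));
    }
    // Bytes 0..8: signature (magic number) identifying the format.
    if bytes[..8] != EVF_SIGNATURE {
        return Err("bad signature: not an EWF-E01 segment file".to_string());
    }
    // Byte 8 is the fields-start marker; bytes 9..11 hold the segment number (little-endian).
    let segment_number = u16::from_le_bytes([bytes[9], bytes[10]]);
    Ok(SegmentFileHeader { segment_number })
}
```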

The sections

The sections hold the evidence data itself together with the metadata that tools use to read the evidence sectors and to obtain various other information about the acquired evidence (checksums, acquisition tool used, timestamps, etc.). Each section starts with a descriptor that describes the section itself:

  • Its type (header, volume, …)
  • Its size
  • The next section offset
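A section descriptor can be decoded with a few fixed-offset reads. This sketch assumes the 76-byte (0x4c) descriptor layout documented by libewf: a 16-byte NUL-padded type string, the next section offset (counted from the start of the segment file) and the section size (which covers the descriptor itself) as little-endian 64-bit integers, 40 bytes of padding, and a 4-byte Adler-32 checksum that is not verified here. The names are hypothetical.

```rust
/// Size of a section descriptor (0x4c = 76 bytes).
const SECTION_DESCRIPTOR_SIZE: usize = 0x4c;

#[derive(Debug, PartialEq)]
struct SectionDescriptor {
    section_type: String,     // "header", "volume", "table", "sectors", "next", "done", ...
    next_section_offset: u64, // offset of the next descriptor, from the start of the segment file
    section_size: u64,        // size of the section, descriptor included
}

/// Reads a little-endian u64 from the first 8 bytes of `b`.
fn le_u64(b: &[u8]) -> u64 {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&b[..8]);
    u64::from_le_bytes(raw)
}

fn parse_section_descriptor(bytes: &[u8]) -> Option<SectionDescriptor> {
    if bytes.len() < SECTION_DESCRIPTOR_SIZE {
        return None;
    }
    // Bytes 0..16: section type as an ASCII string padded with NUL bytes.
    let raw_type = &bytes[..16];
    let len = raw_type.iter().position(|&b| b == 0).unwrap_or(raw_type.len());
    let section_type = String::from_utf8_lossy(&raw_type[..len]).into_owned();
    Some(SectionDescriptor {
        section_type,
        next_section_offset: le_u64(&bytes[16..24]),
        section_size: le_u64(&bytes[24..32]),
        // Bytes 32..72: padding; bytes 72..76: Adler-32 checksum (not verified in this sketch).
    })
}
```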

Here is the important information you can extract from each section:

  • The header section – Not to be confused with the segment file header described earlier, it contains information about the acquired media (case number, evidence number, examiner name, etc.). Each acquisition tool has its own way of describing the information stored in this section.
  • The “volume” or “disk” section – It contains critical information about the sectors and chunks of the acquired media that the investigator needs to parse the EWF file: the chunk count, the number of sectors per chunk, and the size of a sector (the chunk size is the product of the last two).
  • The “sectors” section – It contains the actual chunks of the acquired evidence. The main advantage of EWF is that chunks can be compressed with the zlib compression algorithm to save space on the destination storage. Therefore, we need to know the offset of each chunk and whether it is compressed or not.
  • The table section – This section works like a table of pointers that tells the investigator where to find each chunk and whether it is compressed. The most significant bit (MSB) of each 32-bit pointer indicates whether the chunk is compressed (1) or uncompressed (0).
  • The “next” or “done” section – A “next” section indicates that there is another segment file to parse, while a “done” section indicates that this segment file is the last one.
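Decoding the table entries then boils down to splitting each 32-bit little-endian value into its MSB (the compression flag) and its remaining 31 bits (the chunk offset). This is a sketch: it assumes the raw entries have already been located after the table header and that, as in the layout documented by libewf, the 31-bit values are relative to a base offset read from that header (pass 0 when there is none). `TableEntry` and `decode_table_entries` are hypothetical names.

```rust
/// Most significant bit of a table entry: set when the chunk is zlib-compressed.
const COMPRESSED_FLAG: u32 = 0x8000_0000;

#[derive(Debug, PartialEq)]
struct TableEntry {
    offset: u64,      // absolute offset of the chunk data in the segment file
    compressed: bool, // MSB of the raw entry
}

/// Decodes the raw 32-bit little-endian entries that follow the table header.
fn decode_table_entries(entries: &[u8], base_offset: u64) -> Vec<TableEntry> {
    entries
        .chunks_exact(4)
        .map(|raw| {
            let value = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]);
            TableEntry {
                // Lower 31 bits: chunk offset, relative to base_offset.
                offset: base_offset + u64::from(value & !COMPRESSED_FLAG),
                compressed: (value & COMPRESSED_FLAG) != 0,
            }
        })
        .collect()
}
```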

You should now better understand the image shown earlier. For more details about each section, the libewf project provides good documentation [2].

Parsing the EWF Segments

Now that you have a better understanding of this file format, we can write code that provides the abstraction layer needed to read the image like a standard disk and start extracting evidence.

Step 1: Parsing all the useful metadata from each segment.

To be able to read chunks, we first need to extract all the necessary metadata about them from each segment. We can create several structures to hold this metadata. Here is an example of what you can do.

[Figure: example of structures holding the parsed EWF metadata]

Here, purple corresponds to a structure or an object, red represents a HashMap (or dictionary) with keys and values, and blue is a vector. You’ll notice that we have created structures representing the different headers and sections of an EWF segment. Our main goals are:

  • To keep track of all the segment file descriptors (segments).
  • To store all the chunks of each segment (chunks).
  • To store the end-of-sectors offset of each segment (end_of_sectors).
  • To know which chunk the EWF structure currently points to (CachedChunk).

To make this clearer, here are the structure definitions of a Chunk and a CachedChunk:

[Figure: structure definitions of Chunk and CachedChunk]

Here is a pseudo-code algorithm to parse a segment:

Algorithm: parse_segment
Parameters: self: the EWF structure, file: the current segment file.
Return value: the EWF structure, updated with the information about the chunks of this segment

    // Parsing the EWF file header
    self.ewf_header <- new EwfHeader(file)
    current_offset <- 0xd // The first section starts right after the 13-byte file header.
    ewf_section_descriptor_size <- 0x4c
    extracted_chunks <- []

    begin loop:
        // Parsing the EWF section descriptor
        section <- new EwfSectionDescriptor(file, current_offset)
        section_offset <- section.next_section_offset
        section_size <- section.section_size
        section_type <- section.section_type_def
        self.sections.push(section) // Save the section into a vector

        // Saving header information
        if section_type == "header" or section_type == "header2":
            self.header <- new EwfHeaderSection(file, current_offset + ewf_section_descriptor_size, self.sections.last())

        // Saving volume information
        if section_type == "disk" or section_type == "volume":
            self.volume <- new EwfVolumeSection(file, current_offset + ewf_section_descriptor_size)

        // Extracting chunks from the table section
        if section_type == "table":
            extracted_chunks.extend(self.parse_table(file, current_offset + ewf_section_descriptor_size)) // Save the chunk structures

        // Saving end of sectors information
        if section_type == "sectors":
            self.end_of_sectors.insert(self.ewf_header.segment_number, current_offset + section_size)

        // Stopping when the descriptor points to itself or at the "done" section
        if current_offset == section_offset or section_type == "done":
            break

        // Moving to the next section of the segment file
        current_offset <- section_offset
    end loop

    // Saving segment and extracted chunks information
    self.segments.insert(self.ewf_header.segment_number, file)
    self.chunks.insert(self.ewf_header.segment_number, extracted_chunks)
    return self

Notice that this function calls other parsing functions and data structures that are not described in pseudo-code here. The goal is to understand the main parsing routine.

Step 2: Read an arbitrary chunk.

Now that we have saved all our chunks, we can create a function to read chunk number X from segment number Y.

Reading a chunk includes checking whether it is compressed and, if so, decompressing its data first.

To read the data of a chunk number in a segment file, follow these steps:

  1. Check that the given chunk number is valid for the given segment using the “chunks” dictionary of our EWF structure. If not, return an error.
  2. Use the following variables:
      • data: an empty byte buffer to store the data read.
      • chunk: a reference to the chunk object in the segment.
      • start_offset: the position in the segment file where the chunk data starts.
      • end_offset: the position in the segment file where the chunk data ends (needed for compressed chunks).
  3. Seek to start_offset in the segment file.
  4. If the chunk is not compressed, read the chunk data from the segment into the data buffer.
  5. If the chunk is compressed, read the bytes between start_offset and end_offset, decode them using zlib [3], and store the result in the data buffer.
  6. Return the data buffer containing the chunk data.
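These steps can be sketched in Rust as a function generic over any `Read + Seek` source. To keep the sketch dependency-free, the zlib decoder is passed in as a closure; in practice it could be built on a zlib crate such as flate2 (`flate2::read::ZlibDecoder`). The `Chunk` fields are hypothetical: `end_offset` is assumed to be the offset of the next chunk, or the end of the sectors section for the last chunk of a segment. The validity check on the chunk number is left to the caller.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical chunk record built while parsing the table section.
struct Chunk {
    offset: u64,      // where the chunk data starts in its segment file
    end_offset: u64,  // where it ends: next chunk offset, or end of the sectors section
    compressed: bool, // MSB of the table entry
}

/// Reads one chunk; `decompress` is any zlib decoder, injected to avoid external crates.
fn read_chunk<R, D>(segment: &mut R, chunk: &Chunk, chunk_size: usize, decompress: D) -> io::Result<Vec<u8>>
where
    R: Read + Seek,
    D: Fn(&[u8]) -> io::Result<Vec<u8>>,
{
    // Seek to the start of the chunk data.
    segment.seek(SeekFrom::Start(chunk.offset))?;
    if !chunk.compressed {
        // Uncompressed chunk: chunk_size bytes stored as-is
        // (a trailing checksum, if present, is not read).
        let mut data = vec![0u8; chunk_size];
        segment.read_exact(&mut data)?;
        return Ok(data);
    }
    // Compressed chunk: read the whole stored stream between the two offsets...
    let stored_len = chunk
        .end_offset
        .checked_sub(chunk.offset)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "chunk ends before it starts"))?;
    let mut stored = vec![0u8; stored_len as usize];
    segment.read_exact(&mut stored)?;
    // ...then inflate it and return the decoded chunk.
    decompress(&stored)
}
```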

We can now read the data from any chunk number in a given segment!

Step 3: Create a standard read

The last step is to create a read function that imitates the traditional read system call [4] on a POSIX system. For a given number of bytes to read, the steps are:

  1. Check whether cached chunk data is available. If not, read the first chunk of the first segment and make it the cached chunk.
  2. Loop until the remaining size to read is zero.
  3. If the remaining size to read is less than or equal to the data left in the cached chunk, copy that many bytes into the buffer, then advance the cached chunk pointer and update its remaining size.
  4. If the remaining size is greater than the data left in the cached chunk, copy all of that data into the buffer and subtract its length from the remaining size.
  5. Check whether there are more chunks to read or whether the end of the last segment has been reached. If there are more chunks, get the next chunk number, read that chunk and make it the cached chunk.
  6. Otherwise, return the buffer read so far (there is nothing more to read).
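The read loop can be sketched as follows. To keep it self-contained, the hypothetical `ChunkedReader` holds already-decoded chunks, numbered across all segments, in a vector that stands in for calls to a chunk-reading function; only the caching logic of the steps above is shown.

```rust
/// Hypothetical reader: `chunks` stands in for decoded chunks, numbered across all segments.
struct ChunkedReader {
    chunks: Vec<Vec<u8>>,
    cached: Option<(usize, Vec<u8>)>, // (chunk number, decoded data)
    pos_in_chunk: usize,              // how much of the cached chunk has been consumed
}

impl ChunkedReader {
    fn new(chunks: Vec<Vec<u8>>) -> Self {
        ChunkedReader { chunks, cached: None, pos_in_chunk: 0 }
    }

    /// Makes chunk `number` the cached chunk; returns false past the last chunk.
    fn load(&mut self, number: usize) -> bool {
        match self.chunks.get(number) {
            Some(data) => {
                self.cached = Some((number, data.clone()));
                self.pos_in_chunk = 0;
                true
            }
            None => false,
        }
    }

    /// Reads up to `size` bytes, crossing chunk boundaries as needed.
    fn read(&mut self, size: usize) -> Vec<u8> {
        let mut buffer = Vec::with_capacity(size);
        // Nothing cached yet: start with the very first chunk.
        if self.cached.is_none() && !self.load(0) {
            return buffer;
        }
        while buffer.len() < size {
            let Some((number, data)) = &self.cached else { break };
            let number = *number;
            let remaining = &data[self.pos_in_chunk..];
            // Take what is still needed, or everything left in the cached chunk.
            let take = (size - buffer.len()).min(remaining.len());
            buffer.extend_from_slice(&remaining[..take]);
            self.pos_in_chunk += take;
            // Cached chunk exhausted but more bytes wanted: move on to the next chunk,
            // or stop if there is none (end of the image).
            if buffer.len() < size && !self.load(number + 1) {
                break;
            }
        }
        buffer
    }
}
```

For example, with two chunks holding “abc” and “defg”, reading 5 bytes returns “abcde”; a following read of 10 bytes only returns the remaining “fg”, because the end of the image is reached.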

Proof of concept

Now that we have covered the theory, here is a proof of concept written in Rust. The code can be found here: https://github.com/forensicxlab/EWF


This code displays all the important metadata about the parsed segments. It can:

  • Read and seek through the sectors of an EWF image.
  • Parse the MBR.
  • Calculate the hash of the original media (MD5 of all the sectors).

Conclusion

To conclude, we were able to build the abstraction layer needed to read an EWF image. From here, we can identify partitions and build further abstraction layers to read files, reconstruct the filesystem tree, and so on, which could be the subject of future blog posts. Do not hesitate to contact me at felix.guyard@forensicxlab.com to help make this article better.

References