The Windows File System

In this paper I will take a look at the current Windows file system, NTFS, and explain the following in detail: the boot sector, the MFT, files and their attributes, folders and the B+ tree data structure, and possible attacks on NTFS.

The Boot Sector

The Windows file system begins with the boot sector. It is created whenever you format an NTFS volume and is located in the first sector of your Windows partition. The boot sector holds information about the drive, which is recorded in the BIOS parameter block (often referred to as the BPB). The BPB describes the volume, including its size and its physical parameters. The boot sector also records the locations of the Master File Table and its backup ($MFT and $MFTMirr). The MFT backup ($MFTMirr) acts as a fault tolerance mechanism; it holds a mirror copy of the first four records, or the first cluster, of the Master File Table. If any of those records in the MFT are corrupt, NTFS refers to the boot sector for the location of the mirror and uses the mirror copy not only to get the correct information but also to repair the MFT. The boot sector is also responsible for passing control from the Master Boot Record to the NT loader program. The boot process goes roughly like this: BIOS >> MBR >> Boot Sector >> the NT Loader (NTLDR) >> hardware detection >> Core OS loads (Ntoskrnl.exe) >> Services Start >> Logon.
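To make the BPB a little more concrete, here is a minimal Python sketch that pulls a few fields out of a 512-byte NTFS boot sector. The field offsets (0x0B, 0x0D, 0x28, 0x30, 0x38) follow the commonly documented NTFS boot sector layout and are not taken from this paper; the names NtfsBootSector, parse_boot_sector, mft_byte_offset and build_demo_sector are my own, and the demo sector is synthetic rather than read from a real disk. It also assumes the sectors-per-cluster byte holds a plain count.

```python
import struct
from typing import NamedTuple


class NtfsBootSector(NamedTuple):
    oem_id: bytes
    bytes_per_sector: int
    sectors_per_cluster: int
    total_sectors: int
    mft_lcn: int          # cluster number where $MFT starts
    mft_mirror_lcn: int   # cluster number where the MFT mirror starts


def parse_boot_sector(sector: bytes) -> NtfsBootSector:
    """Decode a few BPB fields from a 512-byte NTFS boot sector."""
    if len(sector) < 512:
        raise ValueError("need at least 512 bytes")
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA end-of-sector marker")
    oem_id = sector[3:11]
    if oem_id != b"NTFS    ":
        raise ValueError("not an NTFS boot sector")
    # Assumed layout: 2-byte bytes/sector at 0x0B, 1-byte sectors/cluster at 0x0D.
    bytes_per_sector, sectors_per_cluster = struct.unpack_from("<HB", sector, 0x0B)
    # Assumed layout: three 8-byte values starting at 0x28.
    total_sectors, mft_lcn, mft_mirror_lcn = struct.unpack_from("<QQQ", sector, 0x28)
    return NtfsBootSector(oem_id, bytes_per_sector, sectors_per_cluster,
                          total_sectors, mft_lcn, mft_mirror_lcn)


def mft_byte_offset(bs: NtfsBootSector) -> int:
    """Byte offset of the MFT from the start of the volume."""
    return bs.mft_lcn * bs.sectors_per_cluster * bs.bytes_per_sector


def build_demo_sector() -> bytes:
    """Build a synthetic boot sector so the sketch runs without a real disk."""
    sector = bytearray(512)
    sector[3:11] = b"NTFS    "
    struct.pack_into("<HB", sector, 0x0B, 512, 8)
    struct.pack_into("<QQQ", sector, 0x28, 1_000_000, 786432, 2)
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


demo = parse_boot_sector(build_demo_sector())
print(demo.mft_lcn, demo.mft_mirror_lcn, mft_byte_offset(demo))
```

On a real system you would read the first 512 bytes of the partition (which needs administrator rights) and pass them to parse_boot_sector instead of the synthetic sector.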

The Master File Table

The MFT is the core component of the NTFS file system. Through the MFT the NTFS file system becomes a highly organized array of records containing information describing the content of your file system. Every instance of data on your hard disk is described within these records, from the boot sector to your plain text file.

The first sixteen records of the MFT are dedicated to metadata files. The metadata files define the structure of the MFT and essentially make it a self-describing database. The use of metadata files in the MFT should not be surprising; every database uses some form of metadata to define its data structure. The metadata files that are stored within the first sixteen records of the MFT are as follows:

The MFT

Rec. | File Name  | Description
0    | $Mft       | The Master File Table
1    | $MFTMirr   | The Master File Table Mirror
2    | $LogFile   | A log file containing a list of 
                    transaction steps for NTFS 
                    recoverability.
3    | $Volume    | Information about the volume.
4    | $AttrDef   | Defines attributes
5    | .          | The root folder
6    | $Bitmap    | Cluster bitmap representing the volume.
7    | $Boot      | Boot sector (discussed above)
8    | $BadClus   | Contains bad clusters for a volume
9    | $Secure    | Contains security descriptors for all
                    files within the volume.
10   | $Upcase    | Converts lowercase characters to 
                    Unicode uppercase characters.
11   | $Extend    | Used for various option extensions
                    (Unique file Id, Quota Information, 
                    Reparse point information, etc.)
12 - 15           | Reserved for future use.
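The table can be written down as a small lookup. The following is only a sketch: the record size of 1024 bytes is a common default that I am assuming here (the real value comes from the boot sector), the offset calculation assumes the MFT data is contiguous, and the names MetaRecord, record_offset and describe are made up for illustration.

```python
from enum import IntEnum


class MetaRecord(IntEnum):
    """Record numbers of the metadata files listed in the table."""
    MFT = 0
    MFT_MIRROR = 1
    LOG_FILE = 2
    VOLUME = 3
    ATTR_DEF = 4
    ROOT = 5
    BITMAP = 6
    BOOT = 7
    BAD_CLUS = 8
    SECURE = 9
    UPCASE = 10
    EXTEND = 11


RECORD_SIZE = 1024  # assumption: a common default, not guaranteed for every volume


def record_offset(record_number: int, record_size: int = RECORD_SIZE) -> int:
    """Byte offset of a record from the start of the MFT (assumes a contiguous MFT)."""
    if record_number < 0:
        raise ValueError("record numbers start at 0")
    return record_number * record_size


def describe(record_number: int) -> str:
    """Classify a record number according to the table."""
    if 0 <= record_number <= 11:
        return MetaRecord(record_number).name
    if 12 <= record_number <= 15:
        return "reserved"
    return "beyond the metadata records"


for number in (0, 5, 13, 30):
    print(number, describe(number), record_offset(number))
```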

The location of these files is not fixed (save for the boot sector, which must be located in the first sector of the partition). NTFS is a flexible file system: in Windows XP, Microsoft moved the location of the $LogFile and $Bitmap metadata files to improve overall performance. In fact, nearly all of the system files described above can be moved if needed to avoid bad clusters.

NTFS stores every file or folder on your system as a record within the MFT, starting at either record seventeen or record twenty-four. I give two starting points because there are two different views on the subject. The Linux-NTFS project says that the MFT doesn't use records seventeen through twenty-three, while ntfs.com says that file records begin at record seventeen. I have not seen Microsoft give a specific starting point for normal file records.

Files and their Attributes

In the MFT, normal records are made up of numerous fields called file attributes. A file attribute describes some aspect of the file that is contained within the MFT record. Going into more detail, a descriptive list of attributes is as follows:

File Attributes

Standard Information:
Old school file attributes: read only, timestamp, link count etc.
Attribute List:
Almost like another metadata file. It gives the locations of all attribute records that don't fit in the file's own MFT record.
File name:
The name of the file. The long name can be up to 255 Unicode characters while the short name follows the 8.3 old-school format. Additional names (required to meet the POSIX standard), or hard links are stored here also as file name attributes.
Data:
This attribute contains the actual data (if it is a small file) or is the base file that points to the extent on the disk that contains the data. It is possible to have multiple data attributes per file.
Object ID:
A volume unique identifier. Used by the distributed link tracking service.
Logged Utility Stream:
Similar to a data stream, but operations are logged to the NTFS log file. This is used by EFS.
Reparse Point:
Used for Symbolic Links (yes, NTFS does have this capability), Junction Points, Volume Mount Points and Remote Storage Server.
Index Root:
Used to implement folders and other indexes (to be explained below).
Index Allocation:
Used to implement the B-tree structure for large folders or other large indexes (to be explained below).
Bitmap:
Used to implement the B-tree structure for large folders and other large indexes.
Volume Information:
Used only in the $Volume system file. Contains the volume version.
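To show how these attributes sit inside a record, here is a rough Python sketch that walks the attribute headers of one MFT record. The layout it relies on ("FILE" signature, a 2-byte offset to the first attribute at 0x14, a 4-byte type and 4-byte length at the start of each attribute, a 1-byte non-resident flag at offset 8, and 0xFFFFFFFF as the end marker) and the numeric type codes come from commonly published NTFS documentation, not from this paper. The dictionary covers only a subset of the attributes, the names list_attributes and build_demo_record are mine, and the record is synthetic.

```python
import struct
from typing import List, Tuple

# Assumed type codes for a subset of the attributes in the list.
ATTRIBUTE_TYPES = {
    0x10: "Standard Information",
    0x20: "Attribute List",
    0x30: "File Name",
    0x40: "Object ID",
    0x70: "Volume Information",
    0x80: "Data",
    0x90: "Index Root",
    0xA0: "Index Allocation",
    0xB0: "Bitmap",
    0xC0: "Reparse Point",
}
END_MARKER = 0xFFFFFFFF


def list_attributes(record: bytes) -> List[Tuple[str, bool]]:
    """Return (attribute name, is_non_resident) for each attribute in a record."""
    if record[:4] != b"FILE":
        raise ValueError("not an MFT file record")
    (offset,) = struct.unpack_from("<H", record, 0x14)
    found = []
    while offset + 8 <= len(record):
        (attr_type,) = struct.unpack_from("<I", record, offset)
        if attr_type == END_MARKER:
            break
        (length,) = struct.unpack_from("<I", record, offset + 4)
        if length < 16 or offset + length > len(record):
            raise ValueError("corrupt attribute header")
        non_resident = record[offset + 8] != 0
        found.append((ATTRIBUTE_TYPES.get(attr_type, hex(attr_type)), non_resident))
        offset += length
    return found


def build_demo_record() -> bytes:
    """Synthetic record: resident standard info and file name, non-resident data."""
    record = bytearray(1024)
    record[0:4] = b"FILE"
    struct.pack_into("<H", record, 0x14, 0x38)
    offset = 0x38
    for attr_type, length, flag in ((0x10, 0x60, 0), (0x30, 0x68, 0), (0x80, 0x48, 1)):
        struct.pack_into("<IIB", record, offset, attr_type, length, flag)
        offset += length
    struct.pack_into("<I", record, offset, END_MARKER)
    return bytes(record)


for name, non_resident in list_attributes(build_demo_record()):
    print(name, "non-resident" if non_resident else "resident")
```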

As mentioned, with small files (usually no more than 1 KB), the data resides in the MFT record as a resident attribute. In most cases the file is too large to fit in the MFT record. In these instances the data attribute is non-resident: it contains the VCN-to-LCN mapping information, which points to the extents on the disk where the data resides (an extent, or data run, is where the data is actually held on your hard disk). Using this map, the MFT points to the physical location of each extent by referring to its Logical Cluster Number (the LCN is simply a numbered ordering of all clusters on the volume) and the length of the extent. Each extent must consist of a contiguous set of clusters on the disk. NTFS organizes the extents of each file logically (even though they may not be physically contiguous) by assigning Virtual Cluster Numbers (VCNs).
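This paper does not go into how the mapping information is encoded on disk. As a hedged illustration, the sketch below decodes the run-list format as described in community documentation of NTFS: each run starts with a header byte whose low nibble gives the size of the length field and whose high nibble gives the size of the offset field, the offset is signed and relative to the previous run's starting LCN, and a zero byte ends the list. Treat that format, and the name decode_data_runs, as assumptions on my part.

```python
from typing import List, Optional, Tuple


def decode_data_runs(raw: bytes) -> List[Tuple[Optional[int], int]]:
    """Decode an assumed NTFS run list into (starting LCN, length) pairs.

    A starting LCN of None marks a run with no offset field (a sparse run).
    """
    runs = []
    pos = 0
    previous_lcn = 0
    while pos < len(raw) and raw[pos] != 0:
        header = raw[pos]
        length_size = header & 0x0F   # bytes used by the run length
        offset_size = header >> 4     # bytes used by the LCN offset
        pos += 1
        if pos + length_size + offset_size > len(raw):
            raise ValueError("truncated run list")
        length = int.from_bytes(raw[pos:pos + length_size], "little")
        pos += length_size
        if offset_size == 0:
            runs.append((None, length))
        else:
            delta = int.from_bytes(raw[pos:pos + offset_size], "little", signed=True)
            previous_lcn += delta     # offsets are relative to the previous run
            runs.append((previous_lcn, length))
        pos += offset_size
    return runs


# One run: header 0x21 means a 1-byte length (0x18) and a 2-byte offset (0x5634).
print(decode_data_runs(bytes.fromhex("2118345600")))
```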

For example, suppose file A is too large to fit in the MFT. NTFS writes the data attribute of file A onto the hard disk starting at LCN 127. The file takes up 5 clusters - but cluster number 130 is bad or occupied. The file on disk would look like: | data | data | data | another file | data | data |. A VCN-to-LCN description for this file would map VCNs 0, 1, 2, 3, 4 to LCNs 127, 128, 129, 131, 132; VCNs are numbered consecutively within the file, so the gap shows up only in the LCNs. The MFT would point to LCN 127 as the start of the first run, identify it as VCN 0 and record the length of the run (3 clusters). It would then point to LCN 131 as the start of the second run, identify it as VCN 3 and record the length of the run (2 clusters).
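The lookup for file A can be sketched in a few lines of Python. VCNs count the clusters of the file consecutively from 0, so the five clusters are VCN 0 through 4, stored as two runs: 3 clusters starting at LCN 127 and 2 clusters starting at LCN 131. The run list is kept as simple (starting LCN, length) pairs; vcn_to_lcn and FILE_A_RUNS are illustrative names, not anything NTFS itself defines.

```python
from typing import List, Optional, Tuple

Run = Tuple[int, int]  # (starting LCN, length in clusters)


def vcn_to_lcn(runs: List[Run], vcn: int) -> Optional[int]:
    """Translate a file-relative cluster number into a volume cluster number."""
    first_vcn = 0  # VCN of the first cluster in the current run
    for start_lcn, length in runs:
        if first_vcn <= vcn < first_vcn + length:
            return start_lcn + (vcn - first_vcn)
        first_vcn += length
    return None  # the VCN lies outside the file


# File A: three clusters at LCN 127, then two more at LCN 131 (LCN 130 is skipped).
FILE_A_RUNS = [(127, 3), (131, 2)]

for vcn in range(5):
    print("VCN", vcn, "-> LCN", vcn_to_lcn(FILE_A_RUNS, vcn))
```

Only the start and length of each run need to be stored, which is why a fragmented file costs one extra pair per fragment rather than one entry per cluster.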

Folders and the B+ Tree Data Structure

Directories under NTFS are indexes that contain the filename attribute, file reference, timestamp and file size for the files organized by that index. Indexing and sorting the files speeds up directory access, since there is no need for NTFS to organize the data every time you list the contents of the directory. The duplicate attributes in the index also save time - NTFS doesn't need to look up that information in the MFT every time the directory is accessed. Also, because the index contains the file reference (a 64-bit number identifying each file), there is no need to search through the MFT for the file.
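The 64-bit file reference is worth a quick look. As commonly documented (this detail is not from the sources quoted in this paper), the low 48 bits hold the MFT record number and the high 16 bits hold a sequence number that changes when a record is reused. A small sketch, with function names of my own choosing:

```python
from typing import Tuple

RECORD_BITS = 48  # assumption: low 48 bits are the MFT record number


def make_file_reference(record_number: int, sequence: int) -> int:
    """Pack a record number and sequence number into a 64-bit file reference."""
    if not 0 <= record_number < (1 << RECORD_BITS):
        raise ValueError("record number does not fit in 48 bits")
    if not 0 <= sequence < (1 << 16):
        raise ValueError("sequence number does not fit in 16 bits")
    return (sequence << RECORD_BITS) | record_number


def split_file_reference(reference: int) -> Tuple[int, int]:
    """Return (record number, sequence number) for a 64-bit file reference."""
    if not 0 <= reference < (1 << 64):
        raise ValueError("file reference must be a 64-bit value")
    return reference & ((1 << RECORD_BITS) - 1), reference >> RECORD_BITS


reference = make_file_reference(5, 5)
print(hex(reference), split_file_reference(reference))
```

Because the record number is part of the reference, finding the file's record is a direct index into the MFT rather than a search.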

When a directory grows too large to fit into the limited space of its MFT record, it expands from its entry onto the file system. NTFS creates child indexes on the disk, referenced by the parent index in the MFT. To expand the directory structure onto the disk NTFS implements a B+ tree data structure, expanding 'out' rather than 'deep', allowing for fast retrieval times.
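Why 'out' rather than 'deep' matters can be shown with a little arithmetic. The sketch below counts how many levels a tree needs to reach a given number of entries for a given fan-out (entries per index node). The fan-out of 100 is purely an illustrative number, not the real size of an NTFS index node, and levels_needed is a made-up helper.

```python
def levels_needed(entries: int, fan_out: int) -> int:
    """Smallest number of levels such that fan_out ** levels >= entries."""
    if entries < 1 or fan_out < 2:
        raise ValueError("need at least one entry and a fan-out of two or more")
    levels = 1
    capacity = fan_out
    while capacity < entries:
        capacity *= fan_out
        levels += 1
    return levels


# A binary tree versus a wide tree for a directory of one million files.
print("fan-out 2:  ", levels_needed(1_000_000, 2), "levels")
print("fan-out 100:", levels_needed(1_000_000, 100), "levels")
```

Each level is roughly one more disk read, so a wide tree reaches any file name in a handful of reads where a narrow one would need many more.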

Possible Attacks on NTFS

Any unauthorized modification of file attributes is an attack on the integrity of the Windows file system. This could include the modification of the security descriptors or the timestamp for a certain file. Another exploit within the Windows file system would be the abuse of alternate data streams as a quick way to hide data. The virus Win2k.Stream is an example of this kind of abuse, and so is my hide program. Security descriptors could also be completely bypassed by using another NTFS driver to read the file system. The often-cited ntpasswd utility uses this method to circumvent permissions when accessing the SAM file on an NTFS drive.

Is there a need to attack NTFS or the MFT itself? Programs rarely touch the file system directly. Any request that you issue is passed into the kernel and then to the NT I/O manager. The I/O manager then calls the NTFS file system driver, which in turn accesses the file system. Because of this approach, an attack on the file system itself becomes unnecessary. The cleaner method of attack, and one that you'll see in rootkits, is to intercept the I/O request before it reaches the file system by either hooking into the dispatch functions of the driver or setting up a file system filter driver.

tools & links of interest:
ntpasswd | ntfs tools | ntfs progs | sysinternals
