Wrangling with Filesystems

Chai Jia Xun

I’ve been on Windows since 98, so my external hard drives have always been formatted with NTFS. It was only recently that macOS entered my life and forced me to reformat my drives to a cross-platform format, namely exFAT. I also decided that I needed a backup for my backup drive in case it failed or somehow got destroyed, so I bought another drive.

It appeared straightforward enough.

  1. format K: Format the new drive to exFAT.
  2. copy J > K: Copy the entire drive’s contents.
  3. format J: Format the original drive to exFAT.
  4. copy K > J: Copy the contents back.

I had just completed step 2 when I noticed, to my horror, that simply changing the drive format to exFAT had eaten up more than 300GB of free space on my drive.

Wait what?

At first I thought it could be some random temp files lying around, so I performed a disk cleanup. Then I checked the properties of a few folders and that’s when I saw it.

The huge discrepancy was one of the most mind boggling things I’ve seen. I checked a text file and got this.

1 MB on disk to store a 45 byte file. That would explain where that phantom 300GB came from.

It turned out that the default Allocation Unit Size (AUS) when formatting was 1MB. Every file's size on disk was being rounded up to the next multiple of 1MB. If you want to know why this is the case, I have created a mini post explaining this phenomenon.
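To make the rounding concrete, here is a minimal Python sketch of the arithmetic. It assumes that "1MB" in the format dialog means 1,048,576 bytes; the function name `size_on_disk` is made up for illustration and is not part of any tool used in this post.

```python
MIB = 1024 * 1024  # assumption: "1MB" in the format dialog means 1,048,576 bytes


def size_on_disk(file_size, aus=MIB):
    """Bytes a file occupies once its size is rounded up to whole allocation units."""
    # Integer ceiling division: how many whole units are needed to hold the file.
    units = (file_size + aus - 1) // aus
    return units * aus


print(size_on_disk(45))              # the 45 byte text file at a 1MB AUS -> 1048576
print(size_on_disk(45, 256 * 1024))  # the same file at a 256KB AUS -> 262144
```

The wasted space per file is simply `size_on_disk(n) - n`, which is why thousands of tiny files can add up to hundreds of gigabytes.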

Solution and Implications?

It seems the obvious solution here is to choose as small an AUS as possible. And that was what I did.

So I managed to reclaim some of the disk space, but at what cost? They wouldn't give us the option to change the AUS if there wasn't a trade-off, right?

One quick google later...

As it turns out, a large AUS is better for storing large files, as it requires fewer table lookups for each file, so file transfer speeds should improve. A small AUS is better for storing many smaller files, as less space is wasted.
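Both sides of that trade-off can be put into rough numbers. The sketch below is only an illustration: the file sizes are invented, the function name `clusters_and_slack` is hypothetical, and the number of allocation units is used as a loose stand-in for "table lookups", not as a measurement of them.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def clusters_and_slack(file_sizes, aus):
    """Return (allocation units used, bytes wasted) for a list of file sizes."""
    total_units = 0
    wasted = 0
    for size in file_sizes:
        units = (size + aus - 1) // aus  # round up to whole allocation units
        total_units += units
        wasted += units * aus - size     # unused tail of the last unit
    return total_units, wasted


# Invented data: 1,000 files of 100KB each, and a single 1GB file.
small_files = [100 * KIB] * 1000
one_large_file = [1 * GIB]

for aus in (256 * KIB, 1 * MIB):
    for label, sizes in (("small files", small_files), ("large file", one_large_file)):
        units, wasted = clusters_and_slack(sizes, aus)
        print(f"AUS {aus // KIB:>5}KB  {label:<11}  units={units:>5}  wasted={wasted / MIB:8.1f}MB")
```

With this made-up data, the 1MB AUS wastes roughly 900MB on the small files where the 256KB AUS wastes roughly 150MB, while the single 1GB file needs 1,024 units instead of 4,096.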

So what are the actual trade-offs if we choose a smaller AUS? Sure, we know the theoretical trade-offs, but I had to know if this had any impact in actual use.

Experimentation

The Setup

In order to create a comprehensive test, I prepared four sets of test data, each with a different type of content.

The first, small-files, contains files averaging 100KB each. This is to emulate project folders or software installations. I used the ancient Android Eclipse package for this test.

The second, medium-files, contains pictures ranging from 200KB to 50MB, averaging ~9MB per file. This footprint is very similar to that of photos taken with a digital camera.

The third, large-files, contains files averaging ~220MB. These would be similar to the videos you get from a modern phone or digital camera.

Lastly, xlarge-files contains a single 1GB file, similar to what you’d get if you had a disc image or an unprocessed raw video.

Here is the test data in tabular form if you really like tables.

The Tests

The files would be copied to and from my Surface Book’s internal solid state drive. That should mean that the main bottleneck would be the read/write speed of the external hard drive under test.

I wanted the tests to be as automated and replicable as possible, so I wrote a PowerShell script that would copy the files 12 times, time each copy and average the runtimes. To get more consistent results, I ran the copy twice before the timed runs. This would serve to minimize variance from OS scheduling and prevent delays caused by the disk spinning up from sleep mode.
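The original PowerShell script is not reproduced here. As a rough equivalent, the procedure described above (two warm-up copies, then 12 timed copies that are averaged) could be sketched in Python as follows. The function name `time_copy` and its parameters are made up for illustration, and the sketch does nothing to control for OS file caching.

```python
import shutil
import statistics
import time
from pathlib import Path


def time_copy(src, dst, warmups=2, runs=12):
    """Copy the folder src to dst repeatedly and return the mean time in seconds."""
    src, dst = Path(src), Path(dst)
    timings = []
    for i in range(warmups + runs):
        if dst.exists():
            shutil.rmtree(dst)  # start every run from an empty destination
        start = time.perf_counter()
        shutil.copytree(src, dst)
        elapsed = time.perf_counter() - start
        if i >= warmups:  # warm-up runs are executed but not recorded
            timings.append(elapsed)
    return statistics.mean(timings)
```

A call such as `time_copy("C:/testdata/small-files", "K:/small-files")` would time the write direction, and swapping the two arguments would time the read direction; both paths here are placeholders, not the ones used for the results in this post.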

Finally, I kept the same programs and background processes running throughout the tests so that the OS scheduling would be consistent across the different drives.

The Results

Again, in tabular form if that’s your thing.

Now we can quite clearly see that a larger AUS does indeed give us higher read speeds, and this is good, right? Well, you’d save about 10 seconds on a 1GB file and 20 minutes on a 1TB file transfer. So this would be great if you almost exclusively stored files above 5MB. But if you have any folders with thousands of files (usually programs or coding projects), the wasted space would be huge.

Conclusion

Just go with as small an AUS as possible. I feel that the space savings are more substantial than the time savings you’d get from a large AUS.

“But what if I want to edit videos, or do work requiring faster read/write speeds?” I metaphorically hear you ask.

In that case, you really should look into getting an SSD for work. Stop editing your videos on spinning disk drives. Seriously.

“These tests were done on a spinning hard drive, what will the results be for an SSD?”

Frankly, I have no idea. I theorise that SSDs are fast enough that the file lookup time will be negligible. So I’d say still stick with a small AUS. I will probably run tests some day to see just how much the AUS affects an SSD.

Now, there were a few more things I could have done to get even more accurate results. I could have interlaced the tests, and I could have swapped the AUS on each drive and run the tests again to eliminate differences in the drive hardware. I could even have run the tests on a flash drive to completely eliminate drive access times and limit the variable to just the table lookups.

However, doing so would make the testing much more clinical and would stray from the purpose of running real world tests. Truth be told, I also had no intention of spending that much effort on testing. I believe that the numbers I got were more than enough to give you a sense of how the AUS affects the read/write speed (for a spinning disk) in a real world setting.

Saving More Space

Now, the default Windows format dialog (Windows 10) has a minimum AUS of 256KB, so we still have the problem of each file occupying a minimum of 256KB. Reducing the AUS from 1MB definitely saved us a lot of space, but there are a few more things that could be done to save even more. I have pulled this section out into another blog post as I felt this one was getting a bit long.

Check the next post: Goodbye Ghost v0.11, Hello Ghost v2 »
