This is an old revision of the document!

Everything You Ever Need to Know About Hard Drives and Linux

If Linux is good for one thing, it's good at making you actually think about what you are doing with your computer and your peripherals. When it comes to external drives, most other operating systems deal with them in a very OS-specific way since many closed-source operating systems use their own custom formats. Linux is a lot more general, and permits you to use nearly anything. As is often the case, though, that kind of freedom comes with some responsibility; if you don't understand at least a little about filesystems, or how Unix file permissions and users work, things can get complex.

In this post, I'll try to cover basically every question about dealing with external drives, including all the options you have when deciding how to use a hard drive.

Got a New Mac-Compatible Drive. How Do I Use It?

If your drive is Mac-compatible and you want to use it on both Linux and Mac OS, then it is probably formatted as HFS+, so you need to disable journaling before you do anything else.

Once that is done, you can use your drive with both Linux and Mac OS. You will not be able to use it on Windows without buying a third-party driver. The Mac-lover in you will want to blame Microsoft for this, but I'm afraid you'll need to blame Apple for this one; after years of using HFS+ to store your data, Apple to this day refuses to tell anyone else how to properly access that data.

Put another way, the HFS+ filesystem, possibly because it's the filesystem I have the most experience with, is the filesystem I trust the absolute least. I have seen my share of HFS disasters, and encourage you not to use it unless you really do need the Mac compatibility. In other words, if the drive is only going to be used on Linux, then do not use HFS+ and reformat your drive for Linux instead.

If you are finding that Linux is not recognising the drive at all, then you probably need HFS+ drivers. Look on your distribution's servers (you usually access those with apt-get or yum) for hfsutils, hfsplus, and anything else containing the string hfs. Most Linux distributions ship with this support, so you probably will not run into problems.

If you can see the files on the drive but are unable to modify them or delete them or maybe even open them up to look at them, then you probably need to fix permissions on the drive.

Just Got a New Drive. How Do I Use It?

A brand new drive just purchased from a store, marked as Windows-compatible, usually just works on Linux. Plug it in, start using it.

I do not have much experience with Microsoft filesystems, so I have no data on their reliability, but think of it this way: do you trust Microsoft enough to store your data in their hands? If you do not have to use them, avoid them.

If you absolutely require compatibility with Windows, you have to use a Microsoft format. You have three to choose from. You can use NTFS, as long as you install the ntfs-3g driver, or ExFat with the exfat driver, or FAT.

I Need Three-Way Compatibility

In some environments, you need a drive to be supported on Linux, Mac, and Windows. Sounds simple, right? I mean, after all, it's the 2000s. We have flying cars. Surely we have a hard drive format that all operating systems will recognise.

Technically, we do: all Linux filesystems are open source, so any operating system is free to include support for them or even to adopt them and use them actively.

Sadly, Apple and Microsoft both decline the opportunity to provide their users with the convenience of free and open source filesystems. The cloud runs on these filesystems, but they are still not good enough for Apple and Microsoft.

Microsoft, being the giant that it is, has force-fed us all a closed-source filesystem called ExFat. This will work on Linux, Windows, Mac, and BSD. On Linux, install the exfat driver if it is not already included.

To change the filesystem on a hard drive, skip down to reformat your drive.

Linux-Specific Drive

If you are using a drive just on Linux systems, then you can reformat a drive to a native Linux filesystem. The only practical advantage of using a native Linux format on an external drive, really, is that the format is open source. This means that I, personally, trust my data to it a lot more than I do to a closed source storage container. Why? Because people who respect my data are developing it, and they give me the source code so that even if that filesystem falls out of favour, I have recourse. Believe it or not, you can still download and use open source filesystems from 1992 (and older, if you go farther back than Linux). Look, in real life, I don't go to a storage facility, dump all of my goods into a rented garage, and then walk away without a copy of the key; I'm not going to do that in my digital life either.

The immediate disadvantage of using native formats on Linux is that you have to worry about file permissions even though they may not actually matter to you in this context. After all, you usually put something on an external drive because you intend to transfer the data from one user to another (even if both users happen to be yourself; it's still another user insofar as that user account exists on a different physical computer).

There isn't really a native, open source filesystem out there that just ignores file ownership. Some see this as an oversight. I see it as a shrug. Once you understand it, it's not an issue. To understand how users and UNIX permissions work, read the section on Unix Permissions.

To change the filesystem on a hard drive, skip down to reformat your drive.

Filesystems Explained

Hard drives are like blank tapes. They do not care what operating system you use them on, they do not care how you store data on them. You never have to worry that a drive, at least in terms of the hardware itself, will not work with one OS or another. You are the user, so you can format any drive to be a storage receptacle for your data.

The way drives store data is by putting bits into what is called a filesystem. What is a filesystem? Well, imagine that you have, for simplicity's sake, one word that you need to store on some device. That is a pretty simple requirement, so you could just write that word out on any device that accepts bits. You can actually do that; you can actually write data straight to a drive without ever formatting the drive for your computer. If you have a spare drive lying around without any data that you ever want to see again on it, you can try something like

echo 'Hey, do not actually try this at home unless you really know what you are doing.'

printf 'hello' | dd of=/dev/sdx

head -c 5 /dev/sdx

hello

And that would write raw ASCII data to that device. No filesystem, just raw bytes representing h-e-l-l-o.

So let's say you did that, and then you add the word world to the drive, again as raw bytes.

Now, we know that the word “hello” starts at byte 0 and goes to byte 4, so if we ever need to read that word back from the drive, we can just ask the drive to give us whatever it has stored from byte 0 all the way to byte 4. But what if we need to add to that word? What if you want to change it to “salutations”? In that case, you would need to tell the drive to move some bytes around, which is OK until you realise that there is data after it, so if you just start flipping bits to change “hello” to “salutations”, then you are overwriting the string “world”.

And what about when you JUST want the word “world” without the “hello” or “salutations” in front of it? How do you tell the computer to go and retrieve that data? Sure, you might remember right now that “world” starts at byte 5 or whatever it was (or 11, depending on whether we changed the first word to “salutations”), but what about a week from now? And obviously this is just a simple example with two simple raw ASCII strings. Imagine how stupid this would be with gigabytes of even moderately complex data.
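
If you want to poke at this without sacrificing a drive, here is a minimal sketch that uses an ordinary temporary file as a stand-in for the device. That substitution, and the helper name read_bytes, are my own assumptions for the sake of a safe demo. It writes both words as raw bytes, reads each one back by offset, and then shows what happens when a longer word is written over the first.

```shell
#!/bin/sh
# An ordinary temporary file stands in for a raw device (an assumption made
# so that nothing real gets overwritten).
disk=$(mktemp)

# Write "hello" at byte 0 and "world" directly after it, at byte 5.
# conv=notrunc tells dd to leave the rest of the file alone.
printf 'hello' | dd of="$disk" bs=1 seek=0 conv=notrunc 2>/dev/null
printf 'world' | dd of="$disk" bs=1 seek=5 conv=notrunc 2>/dev/null

# read_bytes OFFSET LENGTH: fetch LENGTH bytes starting at byte OFFSET.
read_bytes() {
    dd if="$disk" bs=1 skip="$1" count="$2" 2>/dev/null
}

first=$(read_bytes 0 5)
second=$(read_bytes 5 5)
echo "$first $second"

# Overwrite in place with a longer word: it runs straight over "world".
printf 'salutations' | dd of="$disk" bs=1 seek=0 conv=notrunc 2>/dev/null
clobbered=$(read_bytes 5 5)
echo "$clobbered"

rm -f "$disk"
```

The second echo no longer prints world, because the bytes where it used to live now hold the tail end of salutations. You only know where anything is because you wrote the offsets down yourself.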

It starts to get really inefficient, really quickly, so the concept of filesystems was born. A filesystem is a method for keeping track of where one file ends and another begins, and also how and where data is written when changes are made, and where all that data is located when a user wants to open a file back up.

So when an OS asks you if you would like to format a drive, what it is really asking is whether you would like to create a blank filesystem on your drive.

Just one small catch: a filesystem needs to know where on a drive it is allowed to keep stuff. These boundaries are called partitions, and they are imaginary containers that live on a hard drive so that we can fill them up with a filesystem.

Partitions Explained

Partitions are much easier to explain than a filesystem. A partition is like one of those drawer-separators that you might put in your shirt drawer to separate, say, the scary infoSec t-shirts from your cheeky-but-professional-enough geek t-shirts. It's a flimsy construct, probably just a piece of cardboard you ripped off the box that your new motherboard got shipped in, but it does its job and keeps things organised.

Separating your shirts did not result in a new drawer coming into existence, but it serves basically the same purpose. For all intents and purposes, you have created two drawers from one.

That's what a partition is. You can take a drive, partition it once for a really big drawer, partition it into two for two small drawers, three, four, whatever.

All you really need to know is that for there to be a filesystem on a drive, there must be a partition first.

Format a Drive

Most operating systems say “format a drive” meaning “put an empty filesystem on the drive” and disregard the need for a partition because they assume that users are incapable of understanding the concept of partitions. At this point, you are basically a hard drive genius by consumer tech standards because you dare acknowledge the existence of partitions and you even understand basically what a filesystem is for. Let's put your knowledge into practice and actually make a partition and a filesystem.

There are three GUI applications (off the top of my head) that you can use for this: Gnome Disk Utility, GParted, and kvpm. For this post, the shell is easiest to convey, so we will use parted, but feel free to check out the various GUI alternatives at some point if you prefer.

First, you need to know what your system is calling the drive that you want to format. Linux tracks any drive that you attach in the /dev directory. So to find out where on your system a drive lives once it has been plugged in, you can list that directory. Since formatting a drive is always risky business (you REALLY don't want to format the wrong drive by mistake), it's not a bad idea to get a before-and-after view.

So, without the drive plugged into your computer:

ls /dev/sd?

And then plug your drive in, and repeat:

ls /dev/sd?

The one that wasn't there before? That's your drive.

Let's say, for the sake of this example, that the drive you plugged in got labelled /dev/sde. Keep in mind that your drive in real life could be anything from sdb to sdz, depending on how many actual drives you have plugged in. On most laptops, the first drive you plug in is going to come up as sdb because sda is the drive inside your computer. But if you have another drive plugged in already, you would get sdc. If you are using a desktop with three drives in it, then you are looking at sdd for your first external, sde for your second, and so on.
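
If squinting at two listings feels error-prone, the before-and-after comparison can be handed to the computer. This is only a sketch: the function name new_entries is made up for this example, and it simply reports lines that appear in the second listing but not the first.

```shell
#!/bin/sh
# new_entries BEFORE AFTER: print every line of the AFTER listing that is
# missing from the BEFORE listing. Both arguments are newline-separated text.
new_entries() {
    before_file=$(mktemp)
    after_file=$(mktemp)
    printf '%s\n' "$1" | sort > "$before_file"
    printf '%s\n' "$2" | sort > "$after_file"
    # comm -13 hides lines unique to the first file and lines common to both,
    # leaving only what is new in the second file.
    comm -13 "$before_file" "$after_file"
    rm -f "$before_file" "$after_file"
}

# Intended use (left as comments so nothing runs by accident):
#   before=$(ls /dev/sd? 2>/dev/null)
#   ...plug the drive in and give it a few seconds...
#   after=$(ls /dev/sd? 2>/dev/null)
#   new_entries "$before" "$after"
```

Whatever it prints is the device you just attached. If it prints nothing, the drive was either already plugged in or not detected at all.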

Just to make sure we can proceed to work on the drive, unmount it. Unmounting a drive ensures that no data is being read from or written to that drive while you are doing surgery on it. You need to be root for all of this delicate stuff, so use either su or sudo bash to get to a root prompt.

umount /dev/sde* 2> /dev/null
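
Because that umount command throws its error messages away, it can be reassuring to confirm that nothing on the drive is still mounted before going further. Here is one way that might look, assuming a Linux system where the mount table is readable at /proc/mounts; the function name is_mounted is hypothetical.

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]: succeed if DEVICE, or any partition on it,
# appears in the first column of the mount table.
# Caveat: this is a plain prefix match, so /dev/sda would also match /dev/sdaa.
is_mounted() {
    awk -v dev="$1" 'index($1, dev) == 1 { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Example:
#   if is_mounted /dev/sde; then echo "still mounted, stop here"; fi
```

The optional second argument exists only so the function can be pointed at a sample file instead of the live mount table.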

First, create a partition table on the device. A partition table just tells a computer what kind of partition to look for when reading the drive.

parted /dev/sde mklabel msdos

I am using a partition label msdos because that is the most ubiquitous. You can use another if you prefer, but msdos is sort of the lowest common denominator. If you have an EFI or UEFI computer, you could also use a gpt partition label. Very little actually rides on this, it's just a matter of whose identifier you want to use. It has nothing to do with how your data is secured or kept, it's just an identifier so that the computer knows what to look for when it mounts a drive. The one practical exception is size: on drives with 512-byte sectors, an msdos table cannot address anything past 2TiB, so use gpt on drives larger than that.

Next, find out how big your disk is.

parted /dev/sde print | grep Disk

For the sake of this example, let's say you are using a thumb drive 8020MB (8GB) in size.

Create a partition that spans the whole drive.

parted /dev/sde mkpart primary 1 8020

You probably see what we did here; we use parted to take the sde device and make a partition beginning at the first megabyte all the way until the 8020th megabyte. Do not start your partition at the 0th megabyte or you will get an optimisation error. Specifically, you will get the dreaded Warning: The resulting partition is not properly aligned for best performance. error, and you will spend the rest of your evening searching for a solution. This is the solution: do not start your partition at the 0th megabyte, but at 1.

Now your drive has a partition, so you can drop a filesystem into it and move on with your life.

A partition is indicated by a number trailing the device location. So in this case, the location of your new partition is /dev/sde1
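
One wrinkle worth knowing if you ever script this: the "just add a number" rule holds for sdX names, but devices whose names already end in a digit (NVMe and SD card readers, which this post does not otherwise cover) get a p between the device name and the partition number. A small sketch, with partition_path being a name I made up:

```shell
#!/bin/sh
# partition_path DEVICE NUMBER: print the path of partition NUMBER on DEVICE.
partition_path() {
    case "$1" in
        *[0-9]) printf '%sp%s\n' "$1" "$2" ;;  # /dev/nvme0n1 -> /dev/nvme0n1p1
        *)      printf '%s%s\n' "$1" "$2" ;;   # /dev/sde     -> /dev/sde1
    esac
}

partition_path /dev/sde 1
partition_path /dev/nvme0n1 1
```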

For a Linux native drive, use Ext4:

mkfs.ext4 -L penguin /dev/sde1

For a three-way compatible drive, use Microsoft's ExFat:

mkfs.exfat -n penguin /dev/sde1

Unplug the drive from the computer, and then plug it back in. It should mount on your desktop with the name "penguin" and it is ready for use.
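
Since the whole sequence is destructive, one habit that can save you is a dry run: print every command first, read it, and only then run it for real. The sketch below is one way to do that, not the only way. The names run, format_ext4, and DRY_RUN are my inventions; /dev/sdX is a placeholder; and I have swapped in parted's -s (script) flag plus the 1MiB and 100% shorthand so the size does not need to be looked up by hand.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command.
# Set DRY_RUN=0, as root, to really run them.
DRY_RUN=${DRY_RUN:-1}

# run COMMAND...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# format_ext4 DEVICE LABEL: partition table, one full-size partition, ext4.
format_ext4() {
    dev=$1
    label=$2
    run parted -s "$dev" mklabel msdos
    run parted -s "$dev" mkpart primary 1MiB 100%
    run mkfs.ext4 -L "$label" "${dev}1"
}

# /dev/sdX is a placeholder; substitute the device you identified earlier.
format_ext4 /dev/sdX penguin
```

Run as-is, it prints three commands and touches nothing. Check the device name in that output twice before setting DRY_RUN=0.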

Understanding Unix Permissions

Unix filesystems are designed to govern who can or cannot see certain data. This is useful if you are sharing a computer with someone, even if just temporarily handing them a thumb drive with one file you want to share and other files you do not want to share. That works really well for a computer, but external drives very often move from one person's computer to another; that's why they are external. We want them to be portable. Since a lot of new Linux users do not understand file permissions or user identity, it's a common problem for you to take a drive from one computer to another only to find that the computer will not allow you to use the files on the drive.

The quick and dirty way to solve this is to switch to a root user, who has permission to see and use any file they please. At that point, you can grab a file, move it or copy it from the drive to your computer, open it, edit it, whatever you want. That, of course, only solves the issue for as long as you remain root, so really you want to change the ownership, and probably the permissions, of the file or directory so that it belongs to you, the user of this other computer.

$ whoami
klaatu

$ groups
users power video audio netdev

chown -R klaatu:users /path/to/file/or/directory

chmod -R 755 /path/to/file/or/directory

With that, the file or directory (and all files within it) would become owned by klaatu and the users group. The user (klaatu) gets read, write, and execute permission, the group (users) gets read and execute, and anyone else also gets read and execute. Done.

Any file on a Unix filesystem belongs to:

A user, identified by a User ID by the computer, but more often by a memorable username by us humans.
A group of users, identified by a Group ID by the computer and a memorable group name by us humans. A group is an arbitrary label, or a "tag" if you like, that we humans make up, and then add users to. If a user is a member of a group, then they inherit the privileges associated with that group.

So if I have a directory called foo, owned by klaatu and aliens, and foo has permissions set to allow the user (that's Klaatu in this case, because he is the user logged in looking at the folder, so the worldview is from his perspective) read write and execute this directory, then Klaatu will be free to enter foo and create new files within it. If the group only has permission to read foo, then all they can do really is list its contents. They will not be able to create a new file in that location, but they are free to see the contents of the foo directory. If everyone else has no permissions, then users not in the aliens group will see that the directory foo exists, but not see inside of it. If they require access to foo, they would need to be added to the aliens group.

The files and directories inside that directory, of course, have permissions of their own. So maybe Klaatu can get into foo but if there is a file there belonging to someone else and to some other group, then he would not have the same permissions. In other words, permissions do not cascade or trickle-down. They are gateways through which users and groups of users must pass. Once you are inside, that doesn't mean you can do whatever you want with anything you find there, because everything has a gateway all its own.

User and Group and Others each have a triad of permissions. The shorthand to determine a permission set is to use really simple math: 4 for read, 2 for write, and 1 for eXecute. So if I say that a file has a permission level of 7 for klaatu, we know that he has read+write+eXecute permission. If he has only 6 then we know he only has read+write. If a group has 5, the only way to get that sum is to add read (4) + eXecute (1). And so on.
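
The arithmetic is easier to trust once you have watched it unpack a few numbers. Here is a rough sketch that turns a three-digit mode back into the rwx letters that ls shows; the function names triad and mode_string are made up for this example.

```shell
#!/bin/sh
# triad DIGIT: turn one permission digit (0-7) into its rwx form by testing
# the 4 (read), 2 (write), and 1 (execute) bits in turn.
triad() {
    out=""
    if [ $(( $1 & 4 )) -ne 0 ]; then out="${out}r"; else out="${out}-"; fi
    if [ $(( $1 & 2 )) -ne 0 ]; then out="${out}w"; else out="${out}-"; fi
    if [ $(( $1 & 1 )) -ne 0 ]; then out="${out}x"; else out="${out}-"; fi
    printf '%s' "$out"
}

# mode_string MODE: decode a three-digit mode such as 755.
mode_string() {
    user=${1%??}      # first digit
    other=${1#??}     # last digit
    group=${1#?}      # drop the first digit...
    group=${group%?}  # ...and the last, leaving the middle one
    printf '%s%s%s\n' "$(triad "$user")" "$(triad "$group")" "$(triad "$other")"
}

mode_string 755
mode_string 640
```

So 755 comes out as rwxr-xr-x: everything for the user, read and execute for the group and for everyone else.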

A file's owner, or root, can change a file's permissions with, as in the above command block, chmod.

So this might raise the question: if any file permission on a drive handed to you can be overridden as long as you look at the files as root, why have permissions on an external drive at all? Well, you might not need permissions on an external drive, and you might feel perfectly safe just using FAT or ExFAT on portable media. I do not, because I don't trust FAT or ExFAT, but that might well be the solution for you.

On the other hand, this file permission thing is about more than just privacy. It also serves as a buffer against those annoying mistakes users can make, like accidentally deleting a file or moving a file instead of copying it or overwriting a file or accidentally leaving the nuclear launch codes on your thumb drive for anyone to see. Whatever. Yes, if someone has physical possession of the drive, they can get to your data (unless you encrypt it really well, etc, but this post isn't about concealing data, it's about how to make it seamless to share data when you want to do so). Unix file permissions on a portable drive are not going to save you from that, but they do act as a buffer against someone accidentally deleting your holiday photos, or accidentally opening the nuclear launch codes, or accidentally opening up your cat photos only to realise that your cat really is cuter than theirs. So it's not a bad idea to play the Unix permissions game even on those portable drives. You just have to understand how Unix sees users.

Understanding Users and Groups

When you configure a Linux system, the computer creates one user account, which we call root but the computer calls 0. Then you create yourself a user account, and you give it a fancy name like klaatu, but your computer sees you as, say, 1000. You're also added to some default group, which gets a number too.

[Image: creating a user during the install process]

This works out just fine for a single computer or a network of computers all networked together as you find in research facilities or VFX houses, but what happens with a portable drive is that you create files on it as user klaatu (or 1000) and then walk the drive to another computer and give the thumb drive to a friend who just so happens to be 1001, and suddenly the computer insists that your friend is not allowed to use any of the files you are trying to hand over. Worse yet, that other user, user 1001, could be you on that other computer, because maybe the other computer belongs to your lifePartnerFriendSpouse, so he or she has the user 1000 account and you got pushed over to the 1001 slot. Happens all the time, and it is absolutely maddening. To really rub it in, the computer tells you that the owner of the file is klaatu and yet it will not let you see the files, because what it really means is that user 1000 owns the files, but in trying to be helpful to the human brain, it is using the more human-memorable username rather than your User ID (UID).

There are a few ways to mitigate this annoyance. In order of usefulness:

This, above all else: Pick the same user ID every time.

Yes, for something so important, most modern Linux distributions seem to not make a big deal about it. In fact, they hide it away, usually in an "advanced" disclosure triangle or button. It is not advanced, it is the most basic method of identifying a user that a computer possesses. You see, for each user that gets created on a Unix system, the system assigns that user a number. So when I am configuring my computer and I create a user for myself and call it klaatu, the computer takes note of that very human-friendly name, but actually sees me as, say, 1000.

When I create a file, the user that owns that file really is 1000. Not just any user who happens by claiming to be klaatu.

So, when you set up a profile for yourself, take a moment to click that "advanced" button (or on Slackware, just proceed as usual) and assign yourself a custom user ID. Every single time, do that. And make your user ID the same on every single computer you own. This solves your problem. Done. Because every file you ever create will get THAT magic number assigned to it, and so you will always have access. It's really just that simple.

I advise choosing a rare user ID. For instance, Linux systems often start out user IDs at 1000, so picking something in that range can sometimes be inconvenient if you happen to be on a system with other actual users, or with a system that happens to use a block of numbers for something else. My usual UID is 6666, which is high enough that it is unlikely to collide with anything else. Pick your own, remember it, and use it.
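
To see the number behind your name, and to confirm that files really are stamped with the number rather than the name, something like the following should do. It is a sketch that only looks and does not change anything; the commented commands at the end are the usual tools for setting a UID, shown with the example name and number from this post.

```shell
#!/bin/sh
# Create a scratch file and compare its numeric owner with our own numeric ID.
scratch=$(mktemp)

my_uid=$(id -u)                                       # the number the system knows you by
owner_uid=$(ls -n "$scratch" | awk '{ print $3 }')    # ls -n shows IDs as numbers

echo "I am user $my_uid; the file I just made belongs to user $owner_uid"
rm -f "$scratch"

# Setting a fixed UID is a job for root, for example:
#   useradd -m -u 6666 klaatu    # when creating a new account
#   usermod -u 6666 klaatu       # for an existing account; files outside the
#                                # home directory keep the old number until
#                                # you chown them yourself
```

Run ls -n on a drive that is refusing to cooperate and compare the numbers it shows with id -u. The mismatch is usually obvious at a glance.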

Create a unique group for yourself, or join some common ones.

Since the group permissions of a file and directory are recorded along with user information, it isn't a bad idea to make a group to which you can belong and call your own. Some systems create a special primary group for you, which has the same name as your username. It has a unique number, as well (its group ID or GID). The logic here is that now your data belongs to YOU and YOU. No one else on your computer gets to see it unless you invite them to your group and then grant that group permission to see a file or directory. It makes sense in some situations; it just depends on how you work. If your system creates such a group, then you can use that as your group; take a moment to make sure that it exists on all of your systems, and that it is your primary group.

If such a group does not exist, take over an existing group or create one for yourself. I used to make a new group for myself and anyone else on the same system with whom I needed to share data but then I got lazy and just started using an existing group. Of course, any given Unix configuration can change what groups exist and what GID they get, but a common one I have noticed is the floppy group (GID 25). There are others, like dialout (20), and tape (26) which simply are not realistically used on modern systems but have the advantage of existing anyway, and usually getting the same GID (from what I have seen, anyway). So I just make floppy the primary group for all my users who need to access certain stores of data, and it works out. It generally benefits me at some point when fumbling with thumbdrives, because I know that I made the effort at install time to ensure all of my users (or myself on all machines) are members of the same group and therefore create and use data that gets stamped with that GID.
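
Since those GIDs are only what I have seen and not a guarantee, it is worth checking each machine before relying on them. A minimal sketch, assuming local groups listed in /etc/group (groups that come from a network directory will not show up there); gid_of is a made-up helper name.

```shell
#!/bin/sh
# gid_of GROUP [GROUP_FILE]: print the numeric GID of GROUP.
# Each line of the group file reads name:password:GID:members
gid_of() {
    awk -F: -v name="$1" '$1 == name { print $3 }' "${2:-/etc/group}"
}

# Example: run this on every machine and compare the answers.
#   gid_of floppy
```

If two machines give different numbers for the same group name, files stamped on one will not line up with the group on the other, no matter what the names say.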

On drives that regularly get used on other computers, I keep a directory called 777 at the top of the drive. When I need to use the portable drive for inter-computer transfers, I put the data I want to transfer in the 777 folder and run a quick chmod -R 777 on 777.

It's just a convenient place to have an always-public location that is separate from all my other "real" data.
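
Here is roughly what that routine looks like, sketched against a temporary directory that plays the part of the mounted drive (on a real system you would use wherever your drive is mounted; the file name is just a stand-in).

```shell
#!/bin/sh
# A temporary directory stands in for the mounted drive.
drive=$(mktemp -d)

mkdir -p "$drive/777"
echo "cat photo" > "$drive/777/cat.txt"   # stand-in for the data being handed over

# Open the drop-off directory, and everything inside it, to everyone.
chmod -R 777 "$drive/777"

# The first ten characters of ls -l output are the file type and permissions.
dir_mode=$(ls -ld "$drive/777" | cut -c1-10)
file_mode=$(ls -l "$drive/777/cat.txt" | cut -c1-10)
echo "$dir_mode $file_mode"

rm -rf "$drive"
```

Note that 777 also marks plain files as executable. If that bothers you, chmod -R a+rwX is a gentler variant: the capital X adds execute permission only to directories and to files that were already executable.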

The UDF Alternative

Partly as an answer to the filesystem problem, a few Standards groups came up with UDF, the Universal Disk Format. It was mostly intended as the replacement for ISO-9660, and did become the official filesystem for CD-RW, DVD-RW, and Blu-Ray.

The down side is that it does not use journaling, making data recovery after a crash or accidental unplugging a little riskier. It does not use partitions, but that is not usually an issue for an external drive.

The good news is that it is open source, can use UTF-8 filenames that are as long as 255 bytes, and supports file sizes and filesystem sizes of up to 2TB. Like FAT, UDF does not bother with permissions, making it ideal for external drives on Unix. At the very worst, even considering some of the features left out of UDF, it is a better and more flexible option than FAT, and has no patent issues to contend with (Microsoft sometimes sues companies for using random features in FAT).

Since it was primarily intended for optical media, creating a UDF volume is different from formatting a drive for any other filesystem.

The drive being formatted must have no partitions on it. This is entirely unlike any other filesystem, but it is necessary for some operating systems to accurately detect the UDF filesystem.

To get rid of the existing partition on a drive, zero out the first 4096 sectors of the drive (2MiB, at 512 bytes per sector).

dd if=/dev/zero of=/dev/sdx bs=512 count=4096

Note that the block size (bs) is not flexible. It must be 512.
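
If you are wondering how much that dd command actually wipes: dd writes bs times count bytes, so 512 times 4096 comes to 2MiB. You can check the sum safely by pointing the same options at a scratch file instead of a drive, as in this sketch.

```shell
#!/bin/sh
# Same bs and count as the real command, but aimed at a temporary file.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=512 count=4096 2>/dev/null

# wc -c counts bytes; the arithmetic expansion strips any padding wc adds.
written=$(( $(wc -c < "$img") ))
expected=$(( 512 * 4096 ))
echo "$written bytes written, $expected expected"

rm -f "$img"
```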

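Next, find out the block count for your drive, measured in 512-byte sectors:

blockdev --getsz /dev/sdx

mkudffs usually detects the size on its own, so treat this number mostly as a sanity check that you are looking at the drive you think you are.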

Finally, create the filesystem so that it spans the entire drive.

mkudffs --blocksize=512 \
--udfrev=0x0201 \
--lvid="myUdfDrive" \
--vid="myUdfDrive" \
--media-type=hd --utf8 \
/dev/sdx || echo "fail"

Now you can mount and use the drive on any platform.
