Hard Drives and Linux

Hard drives can broadly be classified as either internal or external. Internal drives tend to be easier to manage, because they are not portable; they are integrated in some way with your operating system, even if it is just extra storage space that you use on an as-needed basis.

Here is a quick profile of your internal drive:

Internal Drives

The drive inside your Linux computer is either an SSD or a traditional spinning disk drive, or possibly you are using both. Your drive(s) can be seen by Linux as one pool of available disk space, or conversely one drive can be seen as several different, artificially separated disks even though it's actually one physical drive. Computers perform this perceptual magick through “partitions”: imaginary boundaries that an OS agrees to respect as an (air-quotes) Different Disk from the rest of the drive (or, when disks actually are physically separate, the computer can see all of the disks as [air-quotes] One Big Drive).

There is nothing special about hard drives in terms of storage except that they are relatively fast and efficient in managing a lot of information. Abstractly, though, they are no different than, for instance, a tape archive (or a quarter-inch tape, or similar media). In fact, a hard drive can be used directly as, basically, a tape drive.

Do not try this at home; you could erase important data.

Assume we have attached a drive and it appears as /dev/sdx on our computer. Raw data can be written to the drive without any formatting, and independent of any file system:

# echo 'Do not try this at home unless you really know what you are doing.'
# echo "hello" | dd of=/dev/sdx
# head -c 5 /dev/sdx
hello#

The data “hello” was written as raw bytes on the drive. It was not written as a file, so if you plug the drive into another computer, it will not look like there is anything on the drive (if the computer even understands how to mount the drive), but the string “hello” is still on the drive. It's just written as raw data.
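A safer way to experiment with the same idea is to let an ordinary file stand in for the drive. The sketch below is only an illustration: the scratch file created by mktemp is an assumption of this example, not part of the procedure above, and conv=notrunc is added so that dd overwrites bytes in place instead of truncating the file.

```shell
# Stand-in for a real drive: an ordinary 1 MiB scratch file filled with zeros.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1024 count=1024 2>/dev/null

# Write raw bytes at the very start; conv=notrunc leaves the rest untouched.
printf 'hello' | dd of="$img" conv=notrunc 2>/dev/null

# Read the first five bytes back, just as was done with the real device.
raw=$(head -c 5 "$img")
echo "$raw"
```

No file system is involved at any point; the bytes simply sit at the start of the scratch "drive".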

This is actually how data was stored for a very long time, but eventually the disadvantages became too much to bear and someone invented a “file system”, which does exactly what its name suggests: it creates a system for managing files. With a file system, you don't need to know the exact byte offsets to retrieve your data from a hard drive; the file system keeps track of that for you. Rather than reading raw data, bytes 1 to 5, for the string “hello”, we can instead just ask for our “hello” file, and no matter where that file is or how many times we have revised and added to it, the computer can quickly and easily find it on the drive and show it to us.
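To get a feel for the bookkeeping a file system saves you, here is a small sketch that stores two strings as raw bytes and retrieves them again. The scratch file, the offsets (0 and 512), and the sample strings are all made up for this illustration.

```shell
# A 2 KiB scratch file plays the part of the drive.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=512 count=4 2>/dev/null

# Without a file system, we must remember where each item lives and how long it is.
printf 'hello'   | dd of="$img" bs=1 seek=0   conv=notrunc 2>/dev/null
printf 'penguin' | dd of="$img" bs=1 seek=512 conv=notrunc 2>/dev/null

# Retrieval only works if we supply the right offset and length by hand.
first=$(dd if="$img" bs=1 skip=0 count=5 2>/dev/null)
second=$(dd if="$img" bs=1 skip=512 count=7 2>/dev/null)
echo "$first $second"
```

Forget an offset or a length and the data is effectively lost, which is the record-keeping a file system does on your behalf.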

Your Linux drive is running an open source file system, probably either ext4 or jfs. Although these file systems are open source and free for anyone (person or corporation) to use, the major closed source operating systems decline to include support for them. This means that the drives are recognised as being blank or simply “un-readable” by another OS. It's advantageous to use these file systems nevertheless, because aside from being some of the most stable on the market, they have features that help Linux run efficiently.

Once a drive has a file system, it can be mounted and used by your operating system. On Linux, when you attach a drive to your computer (internally or externally), the drive (upon detection) is assigned a node in the /dev directory. Drive nodes are dynamically created on an as-needed basis. Internal drives are assigned nodes as they are detected by the system. The order is usually predictable once you notice the pattern, but it really depends on the motherboard and which slots your drives are plugged into.

The first detected drive is assigned /dev/sda. The “sd” prefix denotes the type of drive it is (actually it's historically inaccurate, but as a revisionist you can think of “sd” meaning “SATA Disk”), and the “a” is the first letter of many alphabets. That node represents the physical drive itself, which is different than what's on the drive. If a partition is found on that drive (most drives have at least one partition), then it gets a node /dev/sda1. If there is yet another partition, then it would be assigned /dev/sda2.

The second drive found gets /dev/sdb and its partition /dev/sdb1, and so on.
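The naming pattern can be taken apart mechanically. The function below is a hypothetical helper (describe_node is a name invented for this sketch) that assumes sdX-style names only; other schemes, such as NVMe device names, follow different rules and are not handled.

```shell
# Split an sdX-style node into its drive and partition parts.
describe_node() {
    name=${1#/dev/}            # strip the /dev/ prefix
    part=${name##*[!0-9]}      # trailing digits, if any
    drive=${name%"$part"}      # whatever is left is the drive
    if [ -n "$part" ]; then
        echo "$name: partition $part of drive $drive"
    else
        echo "$name: whole drive"
    fi
}

describe_node /dev/sda
describe_node /dev/sda2
describe_node /dev/sdb1
```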

These nodes only represent that drives are attached. They are not directories that you can open and view data in; they are metadata about your system.

The data on the drives is used by your Linux system, and internal drives are usually automatically mounted by Linux because those drives appear in the file /etc/fstab as drives that are to be mounted upon boot.
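To see which drives a system mounts at boot, you can read the first two columns of /etc/fstab. The sketch below runs against a made-up sample file (its contents are an assumption that mirrors the example drives used later on this page) so that it can be tried without touching the real file.

```shell
# Sample fstab in a scratch file; on a real system, read /etc/fstab instead.
fstab=$(mktemp)
cat > "$fstab" <<'EOF'
# device     mount point  type  options   dump pass
/dev/sda2    /            jfs   defaults  1 1
/dev/sda1    /boot        vfat  defaults  1 2
/dev/sdb1    /home        jfs   defaults  1 2
EOF

# Skip comment lines and blanks, then print device and mount point.
boot_mounts=$(awk '!/^[ \t]*#/ && NF >= 2 { print $1 " -> " $2 }' "$fstab")
echo "$boot_mounts"
```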

The take away points from this overview of your internal Linux drives are:

  • Use ext4 or jfs or any Linux-native format for drives that are directly a part of your operating system to avoid unexpected results in how your OS works.
  • Drives get nodes in /dev based on when they were detected, and how many partitions exist on them.
  • Internal drives are automatically mounted by the system only if they are listed in /etc/fstab.

There is no quiz on this, but now you know.

Formatting Internal Drives

If you purchase an additional internal drive for your Linux computer, you probably should be using a Linux-native file system. There is probably no advantage in using a non-native file system, because it's an internal drive; other computers are not plugging into it and asking to use data off of it (and if they are, they are doing so over your network, and TCP/IP makes basically everything universal).

(The possible exception here is that you are getting a large drive that you want to use as shared storage space between a drive running Linux and a drive running some other OS. This is not recommended, but if you do this, then treat the drive as an external drive. Slackermedia does not support this, because the other OS adds a significant variable to how your data is being managed, so you are, respectfully, on your own!)

To format an internal drive for use with Linux:

Determine the device node of the drive you are going to format by first seeing what drives are already part of your system:

$ mount | grep '^/dev/sd'
/dev/sda2 on / type jfs (rw)
/dev/sda1 on /boot type vfat (rw,fmask=177,dmask=077)
/dev/sdb1 on /home type jfs (rw)

In this example, there are two drives already in use by the system: one being used as the boot and system drive (sda), and another (sdb) used exclusively for the home directory.

Compare that list to what the computer actually has attached:

$ ls -1 /dev/sd*
/dev/sda
/dev/sda1
/dev/sda2
/dev/sdb
/dev/sdb1
/dev/sdc
/dev/sdc1

In this example, there is a third drive not in use by the system, labelled sdc. This is the new drive that needs formatting. Notice that it does have a partition on it already, but that's only because almost all drives purchased from a modern computer store are pre-formatted, presumably so that users do not have to learn about formatting themselves.
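The comparison of the two lists can also be done mechanically. This sketch works on copies of the sample output from this section (written with full /dev paths) saved in scratch files; on a real system you would feed it the output of mount and ls instead. The variable names are invented for the example.

```shell
mounted=$(mktemp); attached=$(mktemp)
cat > "$mounted" <<'EOF'
/dev/sda2 on / type jfs (rw)
/dev/sda1 on /boot type vfat (rw,fmask=177,dmask=077)
/dev/sdb1 on /home type jfs (rw)
EOF
cat > "$attached" <<'EOF'
/dev/sda
/dev/sda1
/dev/sda2
/dev/sdb
/dev/sdb1
/dev/sdc
/dev/sdc1
EOF

# Partitions are the nodes ending in a digit; keep those with no mount entry.
unused=$(grep '[0-9]$' "$attached" | while read -r node; do
    grep -q "^$node " "$mounted" || echo "$node"
done)
echo "$unused"
```

An unmounted partition is only a candidate, not proof; it is still worth looking at what is on it before formatting.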

Keep in mind that your drive in real life could be anything from sdb to sdz, depending on how many actual drives you have plugged in. Usually, the first drive you plug in is going to come up as sdb because sda is the drive running your computer, but be aware of your actual setup and use your head. You do not want to format the wrong drive.

If you are unsure whether you are targeting the correct drive, mount it and have a look at what's on it:

$ su -c 'mount /dev/sdc1 /mnt/hd'
$ cd /mnt/hd
$ ls -1a
.
..
Acme Drivers
Acme Backup Pro Plus
$ df -h /mnt/hd | awk '{print $2}'
Size
2.8T

In this example, the drive is mounted at /mnt/hd (a pre-existing directory for quickly mounting drives on Slackware) and is shown to contain basically nothing, if we ignore the obligatory drivers and bloatware bundled by the vendor on the drive.

Confirming the size of the drive provides further reinforcement: yes, this really is the 3TB drive you have purchased.

With that settled, unmount (with the umount [sic] command) the drive so that you can perform surgery on it:

$ cd ~
$ su -c 'umount /mnt/hd'

Create a fresh partition table on the device. A partition table just tells a computer what kind of partition to look for when reading the drive. Operations like re-formatting entire drives justifiably require root permissions:

$ su
# parted /dev/sdc mklabel gpt

Historically, the de facto partition label was msdos because that was (and still is) the most ubiquitous; msdos-style partitioning is universally recognised. For drives larger than 2TB, a gpt partition label must be used, because msdos partition labels cannot scale beyond 2TB.
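The 2TB ceiling follows from simple arithmetic: an msdos table stores sector addresses in 32 bits. The sketch below assumes 512-byte sectors (the common case) and a shell with 64-bit arithmetic; label_for is a hypothetical helper named for this example only.

```shell
# msdos (MBR) tables store sector addresses in 32 bits.
# Assumption: 512-byte logical sectors.
mbr_limit=$(( (1 << 32) * 512 ))

label_for() {
    # $1 = drive size in bytes
    if [ "$1" -gt "$mbr_limit" ]; then echo gpt; else echo msdos; fi
}

echo "$mbr_limit"          # bytes addressable by an msdos label
label_for 500000000000     # a 500 GB drive
label_for 3000000000000    # a 3 TB drive
```

Nothing stops you from using gpt on a smaller drive; the helper only shows where msdos stops being an option.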

Very little actually rides on this; it's just a matter of whose identifier you want to use. It has nothing to do with how your data is secured or kept. It's just an identifier so that the computer knows what to look for when it mounts a drive.

Next, find out how big your disk is:

# parted /dev/sdc unit MB print | grep Disk

For the sake of this example, assume the drive is 2834020 MB (2.8TB) in size.

Create a partition that spans the whole drive:

# parted /dev/sdc mkpart primary 1 2834020

This creates a partition that starts at the first megabyte (1) and spans all the way until the 2,834,020th megabyte.

Do not start your partition at the 0th megabyte, or parted warns: The resulting partition is not properly aligned for best performance. Start your partition at 1. You are sacrificing about one megabyte, but it's worth it.
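What “aligned” means here can be shown with a little arithmetic. The sketch assumes 512-byte sectors and a 1 MiB alignment boundary, and it assumes a gpt label, whose table occupies the first 34 sectors, so that a partition requested at 0 would really begin at sector 34. These figures are assumptions of this example, not values reported by parted on your drive.

```shell
# Assumptions: 512-byte sectors and a 1 MiB alignment boundary.
sector=512
boundary=1048576

is_aligned() {
    # $1 = partition start, in sectors
    [ $(( $1 * sector % boundary )) -eq 0 ]
}

# Sector 34: first usable sector after a gpt table.
# Sector 2048: the 1 MiB mark.
for start in 34 2048; do
    if is_aligned "$start"; then
        echo "sector $start: aligned"
    else
        echo "sector $start: not aligned"
    fi
done
```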

Now the drive has a partition; all it needs now is a file system. Remember, a partition is indicated by a number trailing the device node. In this example, the location of your new partition is /dev/sdc1.

For a Linux native drive, use ext4:

# mkfs.ext4 -L penguindrive /dev/sdc1

Or jfs:

# mkfs.jfs -L penguindrive /dev/sdc1

The drive is now formatted. It's best to create a permanent, standard place for it on your system. Assuming that it is going to be used as extra storage space:

# mkdir /storage

To make the drive automatically mount, add it to /etc/fstab. For example, to have it mount as extra storage at boot time, add a line like this (use ext4 in the third field if you formatted the drive as ext4):

LABEL=penguindrive   /storage  jfs  rw  0  2

If you do not know the label of your drive, use lsblk -f. If your drive has no label, then use the PARTUUID (use UUID if your partition is msdos) instead:

PARTUUID=7280201c-fc5d-40f2-a9b2-466611d3d49e /storage  jfs  rw  0  2
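Before touching the real /etc/fstab, it can help to try the entry out in a scratch file and check its shape. In this sketch, check_entry is a hypothetical helper that expects all six fields to be spelled out and the mount point to be an absolute path; it does not verify that the device or file system type actually exists.

```shell
# Scratch copy; the real file is /etc/fstab and needs root to edit.
fstab=$(mktemp)
entry='LABEL=penguindrive   /storage  jfs  rw  0  2'
printf '%s\n' "$entry" >> "$fstab"

check_entry() {
    # Fields: device, mount point, type, options, dump, pass.
    set -- $1
    [ $# -eq 6 ] || { echo "expected 6 fields, got $#"; return 1; }
    case $2 in
        /*) echo "ok: $1 mounts on $2 as $3" ;;
        *)  echo "mount point must be an absolute path"; return 1 ;;
    esac
}

# Check the line that was just appended.
check_entry "$(tail -n 1 "$fstab")"
```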

Now mount the drive by mounting all drives listed in /etc/fstab:

# mount -a

HFS+

HFS+, in addition to being one of the least stable file systems on the market, is crafted intentionally to be incompatible with other systems; in order to write to HFS+ from Linux, you must disable journaling on the drive. You can, however, read from it without doing anything to the drive.

Disabling the journal on an HFS+ drive must be done from within Mac OS. If you do not have access to Mac OS, then you cannot write to the HFS+ drive. Do not attempt to write to the drive; you could damage the files.

If you are stuck with a “Mac compatible” drive and want to use it as an active “normal” drive in your studio, with full read and write capabilities, then your best bet is to copy all of the data off of the drive, re-format it, and then copy the data back onto it. If you require the drive to remain compatible with a Mac as well as Linux (and, as a side benefit, Windows), then use the UDF format.

If these are not valid options for you, then use Mac OS to disable the journal on the drive. Mac OS may later re-activate the journal without notice, so you may have to do this often.

Windows Compatible

If you have a drive that claims to be “Windows compatible” and you want to use it on both Linux and Windows, then it is probably formatted as NTFS or ExFAT, with a small chance of it being FAT32. None of these filesystems are particularly good, but they are all well-supported by Linux (not, however, by Mac OS).

If you are stuck with a “Windows compatible” drive and want to use it as an active “normal” drive in your studio, you probably can, as is. There are several inconveniences that you may notice (file size limitations, permission issues), but as external drives go, everything should basically work as expected.

A better option is to copy all of the data off the drive, re-format it as a UDF file system, and put all of the data back on. UDF brings along with it several benefits, including Windows and Mac compatibility, and elimination of the quirks of Windows hard drive formats.

Linux Compatible

If you are using a drive just on Linux systems (recommended, but not always possible), then you can keep your drives in a native Linux format. The only practical advantage of using a native Linux format on an external drive is that these happen to be very robust file systems. They are comparatively difficult to corrupt or break, they are fast, well-designed and maintained, case-sensitive, and have very few limitations in a pragmatic sense.

The immediate disadvantage of using native formats for external drives on Linux is that none of them were really designed for external use. That is, most of them assume that the drive is inside the computer. This tends to not matter much until you start swapping drives with other Linux users or computers, at which point file permissions can become a problem. Permissions can be managed for external drives, but you have to be intentional and mindful of it.

Partly as an answer to the filesystem problem, a few standards groups came up with UDF, the Universal Disk Format. It was mostly intended as the replacement for ISO-9660, and did become the official filesystem for CD-RW, DVD-RW, and Blu-Ray. The down side is that it does not use journaling, making data recovery after a crash or accidental unplugging a little riskier. It does not use partitions, but that is not usually an issue for an external drive. The good news is that it is open source, supports UTF-8 filenames as long as 255 bytes, and handles file sizes and filesystem sizes of up to 2TB.

Like FAT, UDF does not bother with permissions, making it ideal for external drives on Unix. At the very worst, even considering some of the features left out of UDF, it is a better and more flexible option than FAT, and has no patent issues to contend with (Microsoft sometimes sues companies for using random features in FAT).

Since it was primarily intended for optical media, creating a UDF volume is different from formatting a drive for any other filesystem. The drive being formatted must have no partitions on it. This is entirely unlike any other filesystem, but it is necessary for some operating systems to accurately detect the UDF filesystem.

To get rid of the existing partition on a drive, zero out the first 4096 blocks (2MB) of the drive:

# dd if=/dev/zero of=/dev/sdx bs=512 count=4096

Note that the block size (bs) is not flexible. It must be 512.

Next, find out the block count for your drive, in 512-byte sectors:

# blockdev --getsz /dev/sdx

Finally, create the filesystem so that it spans the entire drive:

# mkudffs --blocksize=512 \
  --udfrev=0x0201 \
  --lvid="myUdfDrive" \
  --vid="myUdfDrive" \
  --media-type=hd --utf8 \
  /dev/sdx || echo "fail"

Now you can mount and use the drive on any platform.
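The dd invocation used for zeroing writes count blocks of bs bytes each, so it is worth knowing how much that adds up to before pointing it at a drive. The sketch below runs the same invocation against a scratch file (an assumption of this example) rather than /dev/sdx.

```shell
bs=512
count=4096
echo "$(( bs * count )) bytes"     # how much the dd command zeroes

# Same invocation, aimed at a scratch file instead of a real drive.
img=$(mktemp)
dd if=/dev/zero of="$img" bs="$bs" count="$count" 2>/dev/null
written=$(( $(wc -c < "$img") ))
echo "$written"
```

Both numbers come out to 2097152, that is, 2 MiB at the start of the drive.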