Glossary of Terms
eDiscovery and tape-specific terminology
Archive – Long-term on- and/or off-site storage.
ASCII – American Standard Code for Information Interchange. An ANSI standard character code for exchanging information between computer systems.
Backup – Copying information from a hard disk onto another data storage medium (e.g., tape).
Binary Coded Decimal – A method of encoding decimal numbers in a fixed number of bits, usually four or eight. See ‘Packed Numbers’, for example.
Bit – Smallest amount of data that can be processed by a computer; represents the binary value of either one or zero.
Block – A block is how and where data is stored and is the minimum addressable physical storage unit. A drive works in one of two modes of operation: fixed block mode or variable block mode. A block can be any length (size) but is typically in the range of 80 bytes to 512KB. Many current enterprise standard tape drives are configured to write fixed blocks in the range of 64-256KB. The contents of a data block are entirely application dependent.
BOT – Beginning of Tape. Designated by a specific mark, hole or logical means.
BPI – Bits per Inch. The number of data bits recorded per inch of tape.
Bus and Tag – A complex cabling system designed by IBM to interface external peripheral devices, including tape drives and line printers. Introduced in 1964 for the System/360 mainframe.
Byte – Consists of eight bits and represents one character of information.
Compression – (See Data Compression)
Convert – To change media or recording type from one format to another. Converting typically includes copying the data on one type of media to a different type of media.
CRC – Cyclic Redundancy Check. A complex mathematical method used to check that the data written to tape is error-free.
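As a minimal illustration of the CRC idea, the sketch below uses the CRC-32 function from Python's standard zlib module. Tape drives compute their own format-specific CRCs in hardware, so the choice of CRC-32 and the sample block are assumptions for illustration only.

```python
import zlib

def crc_matches(block: bytes, stored_crc: int) -> bool:
    """Return True if the CRC-32 of the block equals the checksum stored with it."""
    return zlib.crc32(block) == stored_crc

# Hypothetical data block and the checksum recorded when it was written.
block = b"example tape block"
stored_crc = zlib.crc32(block)

# Simulate one corrupted byte on read-back.
damaged = b"exbmple tape block"

print(crc_matches(block, stored_crc))    # intact block
print(crc_matches(damaged, stored_crc))  # corrupted block
```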
DAT – Digital Audio Tape. A relatively low-cost data storage method developed by Sony in the 1980s. Originally aimed at the professional audio recording market, and almost identical to the technology used in video recording, using a rotating head and helical scanning.
Data Block Marker – Identifies the start of user data in a block.
Data Compression – Permits increased storage capacities using a mathematical algorithm that reduces redundant strings of data; can be performed by software or hardware.
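A short software sketch of the effect, using Python's standard zlib module on a hypothetical block of repetitive record data. Tape drives use their own hardware algorithms, so this only illustrates the principle of removing redundant strings.

```python
import zlib

# Hypothetical record-based data containing many repeated strings.
original = b"SMITH     JOHN      LONDON    " * 100

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Compression is lossless: the restored data is identical to the original.
print(len(original), len(compressed), restored == original)
```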
DDS – Digital Data Storage. A data storage technology based upon digital audio tape, where minimum quality standards are set for the recording of computer data. Media and drives are often identified by the DDS logo.
Differential Backup – A backup of all files that have changed or been added since the last ‘full backup’. Slower to back up than an incremental, but faster to restore, as there is no need to load previous incrementals.
Data Integrity – Validity of recorded information.
Degausser – A machine that uses a magnetic field to remove previously-written data on tape by randomizing the magnetic orientation of the media.
Density – The amount of data stored in a given length of tape; usually expressed in bits per inch (BPI).
DLT – Digital Linear Tape. A popular ½″ tape cartridge format developed by DEC, originally as CompacTape. It is also manufactured under license by Quantum.
Double Density – See MFM
Dump – A very simple method of exporting data to tape without any descriptive volume or file headers. Simple record-based data may occasionally be handled in this way, but it is not a recommended method of transport.
EBCDIC – Extended Binary Coded Decimal Interchange Code. An IBM proprietary 8-bit character coding matrix, usually found on older mainframe and AS400 systems. Originally derived from punched card systems. In any migration or data interchange between different systems, the codes are usually converted to ASCII or ANSI before the data files can be used.
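The conversion can be sketched with Python's built-in codecs. The example assumes code page cp037 (one common EBCDIC variant); the correct code page for a given tape depends on the originating system, and the sample bytes are hypothetical.

```python
# Five EBCDIC-encoded bytes, assumed here to use code page cp037.
ebcdic_bytes = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])

# Decode from EBCDIC to text, then re-encode as ASCII.
text = ebcdic_bytes.decode("cp037")
ascii_bytes = text.encode("ascii")

print(text)               # HELLO
print(ascii_bytes.hex())  # 48454c4c4f
```

Only character fields should be converted this way; binary and packed numeric fields in the same record must be left untouched.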
ECC – Error Correction Code. Mathematical algorithm used to correct errors.
Edge Seek – Method of using the recording head to detect the edge of tape and then to reference the tracks from the edge of tape, thus assuring the tracks are positioned accurately.
Endian – The order in which a sequence of bytes is stored. There are two versions depending on the design of microprocessor: Big Endian and Little Endian. Big Endian is where the most significant bytes are stored first (in the lowest value position). For example, the number 22,042 represented by hex values 561A, would be stored as 561A; in a Little Endian system, it would be reversed, 1A56. This rather unusual term is derived from Jonathan Swift’s famous book, Gulliver’s Travels.
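The 561A example above can be reproduced with Python's built-in integer methods. This is a small sketch that assumes a two-byte field.

```python
value = 22042  # 0x561A, the number used in the definition above

# Serialise the same value in both byte orders.
big = value.to_bytes(2, byteorder="big")
little = value.to_bytes(2, byteorder="little")

print(big.hex().upper())     # 561A
print(little.hex().upper())  # 1A56

# Reading the bytes back requires knowing which order was used.
print(int.from_bytes(little, byteorder="little"))  # 22042
```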
End of Tape (EOT) – This is the logical end of tape where recording has stopped. It is the point from which additional data may be appended. Many tape drives will not allow old data to be accessed beyond EOT. If the start of a tape is overwritten, the older data beyond the new EOT may still be there, but it cannot be accessed unless the drive firmware is modified, which usually involves some data recovery expertise.
Erase – To remove previously written data by randomizing the magnetic orientation of the media (see Degausser).
Error – Loss of magnetic signal strength to a degree that data cannot be read.
Error Correction Codes (ECC) – Onboard system codes that correct errors, in real time, without the drive having to perform retries, thus not affecting processing time.
Error Recovery Procedure (ERP) – In the event of a potential defective area of tape, the drive will stop processing data and effectively process errors by performing mechanical drive retries. This does impact processing time and may lead to a block of data being written elsewhere on the tape.
ESCON (Enterprise Systems Connection) – Developed by IBM as an improved means to connect external peripheral devices, including tape drives, to their mainframe systems. Replaced the earlier Bus and Tag method that had been in use since the 1960s.
Field – The smallest unit of information within a ‘record’. It might be somebody’s surname or the first line of an address or a product code.
Files – One or more recorded blocks of data.
Filemark (or Tapemark) – These logically separate a series of blocks. They contain no user data but can be searched for very quickly by the tape drive. Typically, they separate volume headers from the start of data and between individual savesets. Some formats (IBM labelled tape) use two filemarks, with no data block between them to indicate the logical end of a tape. Most logical tapes do terminate with a filemark.
File-by-File Backup – Method of backup in which each file is stored separately and sequentially. Extremely useful if you need to restore or interchange a single file.
Flux Transition – Change in the magnetic state, which can be interpreted to represent a data bit on tape.
FM (Frequency Modulation) – Also known as Differential Manchester recording, it was one of the first encoding methods for floppy disks. Sometimes referred to as single density. Although not particularly efficient in terms of the relationship between clock and data bits, it was considered very reliable. Eventually superseded by MFM and GCR encoding methods.
Format – Defines how data is written to the tape; it defines things such as the number and position of tracks, number of bits per inch and the recording code to be used.
Form Factor – Physical size of a device; for example, the width of a data cartridge drive. If the drive is a 5-1/4″ form factor this means that the drive is the same size as a 5-1/4″ diskette drive and uses the same fixing points. The same principle applies with the 3.5″ format, where a 3.5″ diskette drive may be exchanged for a 3.5″ data cartridge drive in your computer.
FRPI – Flux Reversals Per Inch. The number of flux changes per inch of tape. This may or may not be equal to the number of bits per inch stored, depending on the recording code in use.
FTPI – Flux Transitions Per Inch that may not necessarily be a flux reversal.
Full Backup – A backup of all selected data.
GCR – Group Code Recording. A data encoding method often used on data cartridge drives. GCR combines high data density with relative ease of decoding.
Headers – Blocks of data written at the beginning of tapes or files that contain specific identification information.
Helical Scan Recording – A method of recording data at high speed on to tape wrapped around an inclined drum. Used on DAT, 8mm, AIT, and of course, domestic and professional analogue video recording machines.
Hexadecimal – A number system that uses 16 as a base, often abbreviated to ‘hex’. The digits 0 to 9 must be supplemented by letters A to F to represent the numbers 10 to 15. For example, a hex number like 3C means 3 x 16 + 12 = 60 in decimal. Hex numbers are usually denoted as 0x??, where ‘??’ is the hex value.
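The 3C example can be checked in a few lines of Python, which uses the same 0x notation for hex literals.

```python
# Parse a hex string and confirm the positional arithmetic.
value = int("3C", 16)
print(value)                  # 60
print(value == 3 * 16 + 12)   # True

# Convert back to hex notation.
print(hex(60))                # 0x3c
```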
HSM – Hierarchical Storage Management
Incremental Backup – A backup of everything that has changed or been added since the last backup of any type or scheme. Fast backup but slow restore.
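The difference between incremental and differential selection (see Differential Backup) can be sketched as follows. The function name, file names and day numbers are hypothetical; real backup software typically tracks changes through catalogues or archive bits rather than a simple timestamp comparison.

```python
def files_to_back_up(files, last_full, last_backup, mode):
    """Select files by modification time.

    differential: everything changed since the last full backup.
    incremental:  everything changed since the last backup of any type.
    """
    cutoff = last_full if mode == "differential" else last_backup
    return sorted(name for name, modified in files.items() if modified > cutoff)

# Hypothetical files with the day number on which each was last modified.
files = {"a.txt": 1, "b.txt": 5, "c.txt": 9}

# Full backup on day 2, most recent incremental on day 6.
print(files_to_back_up(files, last_full=2, last_backup=6, mode="differential"))
print(files_to_back_up(files, last_full=2, last_backup=6, mode="incremental"))
```

The differential run picks up b.txt again even though an earlier incremental already captured it, which is why differentials grow larger over time but need only the full backup plus one differential to restore.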
Initialize – To write the Volume ID in the header before a tape is used.
Interblock Gap – A term often used in conjunction with legacy tape technology to describe the physical gaps between data blocks to allow enough time for the tape drive’s mechanics to get up to speed or slow down. This was important before the days of sophisticated data buffering techniques when drives had to operate at a reliable and consistent speed.
Interchange – To remove a tape from one drive and read the data on another tape drive.
Interface – Hardware and software used to establish communication between a host and device.
KB – Kilobyte = 1,000 bytes
KIT – Kills Information on Tape. A system from Insurgo that erases the data but allows the tape to be reused, as the servo tracks are not destroyed during the process.
LaserDisc – Developed by Philips and MCA in the 1970s, and a forerunner of the modern Compact Disc and DVD we know today. The original LaserDisc was available in 12- and 8-inch diameter formats. Drives are extremely hard to find nowadays.
Labelled Tapes – Normally referred to as IBM labelled tapes, or ANSI labelled tapes, or ANSI X3.27. Typically found in legacy IBM mainframe environments and AS400 systems. The labels give details such as volume label, saveset name, creation and expiry date. They also include volume sequence, file sequence, block and record size.
Line Code – A pattern of voltages used to represent digital data.
LP – Load Point. The physical location on the tape where data recording begins.
LSB – Least Significant Bit. The bit on the right-hand side of the number is the LSB because it is the position used for units and therefore of least value.
LTO (Linear Tape Open) – Originally created in the late 1990s by the consortium of IBM, Hewlett Packard and Seagate, and was intended as an ‘open standards’ alternative to the proprietary tape technologies then available. Model generations range from LTO-1 through to LTO-8 (2017), taking individual native cartridge capacity from 100GB to 12TB.
MB – Megabyte = 1,000 kilobytes = 1 million bytes.
Media Cleaning – The physical cleaning of both sides of the tape’s surface. Normally done on a “Tape Cleaner” specifically designed for that particular media type.
MFM – Modified Frequency Modulation. A recording code also used on floppy-interface QIC drives. Also known as ‘double density’ in the floppy disk world. It is the most efficient self-clocking code but requires “good” electronics to decode.
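The relationship between FM and MFM can be sketched in a few lines. The sketch assumes the usual rules: FM writes a clock transition before every data bit, while MFM writes a clock transition only between two consecutive zero data bits. The function names, the sample bit pattern and the assumption that the bit preceding the sample is zero are illustrative.

```python
def fm_encode(bits):
    """FM: every data bit is preceded by a clock bit of 1."""
    out = []
    for b in bits:
        out += [1, b]
    return out

def mfm_encode(bits, prev=0):
    """MFM: the clock bit is 1 only when the previous and current data bits are both 0."""
    out = []
    for b in bits:
        clock = 1 if (prev == 0 and b == 0) else 0
        out += [clock, b]
        prev = b
    return out

data = [1, 0, 0, 1, 1]
fm = fm_encode(data)
mfm = mfm_encode(data)

# Each 1 in the encoded stream is a flux transition on the media.
print(sum(fm), sum(mfm))
```

Because MFM never produces two adjacent transitions, the bit cells can be made half the size for the same minimum transition spacing, which is where the ‘double density’ name comes from.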
MiC (Memory in Cartridge) – Although found on several media types, it is particularly associated with LTO technology. For LTO media it is a non-contacting, passive RF chip. The chip can be interrogated to identify tapes, assist in discriminating between different generations, and to store tape-use information.
MSB – Most Significant Bit. The bit on the left-hand side of the number is the MSB because it is the position of maximum value.
MTBF – Mean Time Between Failures. The average expected operating time between one failure and the next.
MTTR – Mean Time To Repair. Estimated time to repair a drive.
Multiplexed Tapes – A solution to slow backups across networks. Several data streams are multiplexed on to a single tape drive. This means the input speed from a disk or across the network is less critical but the restore process becomes more complex.
Multimedia Files – Computer files containing audio, video or graphics.
Native Restore – Using the original backup software to perform the restoration of data. One is limited by the number of licences available, and often the precise version must be deployed to guarantee success.
Nibble (Nybble) – Half a byte, or 4 bits. Often used to describe the amount of memory necessary to store one digit of a number in packed number format.
Non-Native Restore – Restoring data without the original backup software. This eliminates the need for the originating technology and simplifies access to the wide range of hardware and software formats that may have been used to create the original backups. Additional advantages include bypassing security and software-specific limitations, preserving file-level metadata, and being faster and therefore less costly. Use Media Merge/PC!
NRZ – Non-Return to Zero. An early telecoms/data storage Line Code representing binary data. More efficient than previous methods but now no longer used. Can be found encoded on very old open reel tapes.
Open Reel Tape (ORT) – A term used synonymously with 9-track tape. eMag was the last manufacturer of ORT.
Overwrite – Method of overwriting data on a tape without first erasing it.
Packed Numbers – A form of binary coded decimal where the values are stored in the ‘nibbles’ of the bytes, e.g. 1,234 is stored as 01 23 4F in hexadecimal. The last (right-hand) nibble ‘F’ represents the ‘sign’, which in this case is unsigned; an explicit negative number may be stored as 01 23 4D. In the days when storage was expensive, these numbers occupied less space: n packed bytes typically expand to (2n)-1 digits when unpacked.
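A minimal decoding sketch for the 01 23 4F example follows. The function name is hypothetical, and only the sign nibbles mentioned above are handled: ‘D’ is treated as negative and anything else as positive or unsigned. Real data may use other sign conventions.

```python
def unpack_decimal(packed: bytes) -> int:
    """Decode a packed number: two digits per byte, sign in the final nibble."""
    nibbles = []
    for byte in packed:
        nibbles.append(byte >> 4)     # high nibble
        nibbles.append(byte & 0x0F)   # low nibble
    sign = nibbles.pop()              # the last nibble carries the sign
    value = 0
    for digit in nibbles:
        if digit > 9:
            raise ValueError("invalid digit nibble")
        value = value * 10 + digit
    return -value if sign == 0x0D else value

print(unpack_decimal(bytes.fromhex("01234F")))  # 1234
print(unpack_decimal(bytes.fromhex("01234D")))  # -1234
```

Three packed bytes yield (2 x 3) - 1 = 5 digit positions, matching the expansion noted above.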
Packet – A set of binary bits, usually with reference to networks. The bits are often accompanied by error check bits and codes to indicate start and end of the packet.
Partitions – This is when a tape is divided into multiple logical tapes. Each logical tape acts like a single tape and can be appended to independently. Not many drives support partitions, mainly DAT, AIT and Travan. However, LTO-5 introduced a form of media partitioning for LTFS. Also, there are very few software applications that make use of partitions, with the main exception being digital voice loggers. The main advantage is directory or catalogue information can be updated and retrieved very quickly.
PE – Phase Encoding. Method of coding data; it has the advantage of being very reliable and easy to decode, but it is not particularly efficient in data density.
Protocol – A set of computing rules. A term usually applied to telecommunications, but also how data should be converted to accommodate the target system.
Punched Cards – A way of storing data on standardised thick paper cards. On early IBM systems, each card had eighty columns and therefore could store 80 bytes of information. Interestingly, the same 80-byte layout is the volume label format used on IBM labelled tapes that are still in current circulation.
QIC – Quarter-Inch Cartridge.
Quad Density – Often denoted on legacy 5.25-inch diskettes, where track density is increased over and above double density recording to 96tpi (narrower tracks). Some manufacturers introduced 100tpi formats, but these are incompatible with most drives.
Read After Write – Method of ensuring that data written to tape is correct by immediately reading the tape on a read head placed just after the writing head.
Reference Burst – Number of flux transitions written at the beginning of the tape to indicate the center line of the tape. This allows the read head of the drive to align itself correctly and improves the data integrity of the drive.
Restore – Retrieving information from a tape drive in order to replace data that was lost from a hard disk.
Retension – Winding the tape from the beginning of tape (BOT) to the end of tape (EOT), or EOT to BOT.
RLL – Run Length Limited. A family of codes used to encode data, in which the number of consecutive zero or one bits is limited to a certain value. GCR is an RLL code.
SCSI – Small Computer System Interface. A bus interface that enables many different kinds of devices, such as disk drives, CD-ROM drives and tape drives, to interface with a PC-type host computer.
Search – Method of finding a particular data file without having to read all the preceding data. Often done at a different speed than reading and writing. Usually applies to start/stop drives.
Sequential Device – Device that reads each data block sequentially as opposed to a random access device.
Signed – A way of denoting the positive or negative values of a number within the nibble. Various methods can be employed. See ‘Packed Numbers’, for example.
Single Track Error (STE) – A loss of a recorded “bit” due to anything that causes loss of signal. Single track error performance is a direct indicator of the magnetic quality of the tape.
Stiction – Age-related decay associated with older iron-oxide magnetic tape. As the read/write head heats the surface of the tape, a sticky residue forms, which upon rapid cooling causes the tape to temporarily ‘glue’ itself to the head assembly. An audible shrieking noise is often heard as the tape is read, which is likely to result in damage to the magnetic layer carrying the data.
Stacking – A method whereby multiple input tapes are combined logically by appending one to another. For IBM labelled tapes, the headers will be adjusted to be consistent. The original application was to reduce the footprint of very large numbers of legacy tapes by consolidating them onto a handful of high-capacity tapes, e.g. LTOs.
Striped Tapes – A form of RAID for tape backups. Various schemes can be applied. For example, on a three-tape system, data would be written to two tapes and the third would be a parity, so that any one of the three tapes could fail without losing data.
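The three-tape parity scheme can be sketched with XOR (see XOR). The block contents and function name are hypothetical, and real striping software works block by block across whole tapes rather than on a single pair of byte strings.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings together."""
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical blocks written to the two data tapes.
tape_a = b"BLOCK-ONE"
tape_b = b"BLOCK-TWO"

# The third tape holds the parity of the other two.
parity = xor_bytes(tape_a, tape_b)

# If either data tape is lost, it can be rebuilt from the survivor and the parity.
rebuilt_b = xor_bytes(tape_a, parity)
rebuilt_a = xor_bytes(tape_b, parity)
print(rebuilt_a, rebuilt_b)
```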
Start/Stop – Tape drives that are capable of stopping and starting before and after each data block written or read from tape.
Streamer – Tape drives that write or read blocks without stopping between blocks.
Tapemark – See Filemark.
Track – Linear area of media on which data is written.
Underrun – Action that occurs when a streaming drive runs out of data to be written to tape; it stops and repositions the tape. This occurs when your processor is too slow to keep up with the streamer.
User Data – Data recorded by the user. User data is differentiated from other information recorded by either the drive or tape format.
Unsigned – Where the MSB is used to denote the value of a number rather than the sign.
Volume – Logical division of data. Consists of a number of files.
Volume Label – Data block written at the front of a volume to identify it.
Winchester Disk – A rather old-fashioned reference to what we now call the generic ‘hard disk’. Although development of rigid random-access devices goes back to the 1950s, the IBM 3340 (codenamed ‘Winchester’) was launched in March 1973.
Word – Two bytes (or 16 bits).
WORM Tapes – Introduced on LTO-3 technology, a method preventing the erasure of data. Often used in financial and legal applications where data must be preserved over a long period of time. Cartridges are identified by a two-tone colour scheme where the bottom half is a light grey.
XOR – (Exclusive OR), a logical comparison. The action compares two bits and gives a result of 1 only if the bits that are compared are different. Extensively used in password systems.
Zeroize – To set the memory or group of variables to zero.
© eMag Solutions 2018