Disk Storage: Logical Data Storage
One of the major functions of an operating system is to provide access to
files of records on disk. There is a significant amount of work required
in order to translate between the "records" and "files" as worked with
in most high level programs and with the way the data is physically stored.
Data stored on disks is most conveniently viewed as a collection of
"records"stored in "files", in many ways analogous to paper records stored
in file folders.
Hierarchy of Logical Data Storage
File
- a collection of records associated with a
collection of related individuals and stored together
Record
- a collection of information or data values
Field
- an element of a record which can contain a
single datum of information
-
Words, Bytes, and/or Bit Strings
- elements from which
fields are constructed, but which of themselves only have meaning
when reguarded in the context of the field
-
Databases
- one or more files containing both a description
of the fields and actual data values stored in (typically) multiple
occurances of those fields
Record Access Methods
Sequential Access
- Records are accessed in the order
in which they have been stored in the file. A record will only
be read (or written) after all preceding records have been read
(or written). Until the "end of file" is reached, there is only
one possible record which can be read next (the "next record"
stored in the file). Without recreating the file, the sequence
of records can not be modified.
Random Access by Record Number
- When all records in a
file have the same, fixed length, it is generally possibe to read
or write a record based on its relative position in the file
(i.e. by its "record number").
Random Access by Byte Offset
- Although less often supported
by high-level languages, at least theoretically, a record can be
accessed without accessing its predecessors, provided the distance
to that record within the file, as a fixed number of bytes, is known.
This in fact forms the basis for being able to access records by
their relative record numbers.
Intra- and Inter-Linked Files
- Sometimes individual records
can be associated with other records either in the same file
(intra-file linkage) or with records in a different file (inter-file
linkage). To enable such an association the "primary" record contains
information which enables the "secondary" record to be located;
typically this information may be the "secondary" record's relative
record number.
Key Field Access - Binary Search
- When the records of a
file are arranged within the file based on sequentially ordering the
records on the data value in a "key field" and when all the records
of such a file have a common, fixed length, then it is possible t
search for a particular record having a specific "key field" value.
An effective way to do this is to start by looking at the record in
the middle of the file and, assuming this is not the required
record, using the half of the file containing the required key value
(the first half if the required key value is less than the mid-point
key value; the second half otherwise) as the basic group for the next
mid-point division. In this way, for example, a record could be
found within a file of a million records with no more than 20 record
accesses.
Key Field Access - Hashed Keys
- In some cases, even faster
record access can be achieved by performing some arithmetic operation
on the key field to reduce it to a record number value (this is called
"hashing"). Typically, in this environment, not all record location
are actually used, and the method is really only effective if the
arithmetic algorithm does not result in many record keys "hashing" to
the same record number.
Key Field Access - Indexed Files
- Another alternative is t
keep a separate, secondary file which contains the values of the key
field to be used (usually a different field than used for sequencing
the main file). Along with each key field value, this secondary file
stores the record number in the main file of the corresponding record.
This secondary file acts as an "index" to the records of the main
file. Typically this secondary (index) file will be sequenced so a
to permit a binary search based on the key value.