In an information environment, an organization's success is tightly coupled to its ability to store and manage information. Storage systems provide a critical part of an organization's network infrastructure. With the amount of data growing at an incredible rate, your storage strategy must keep pace. In designing a storage strategy for your organization, you must select the right technology for your primary storage system, implement solid backup procedures and ensure ongoing management of the system.
This chapter explores the critical issues related to data storage in a network and will guide you through the many options available for storage, whether yours is a small-scale departmental LAN or a very large enterprise-wide network. The available technologies differ in terms of capacity, performance, cost and reliability. The network architect's job involves selecting the right combination of hardware and software options that will meet the data storage needs of the organization.
The Need for Storage
A computer's main memory holds active programs, data and computational results. The computer's efficient operation depends on data held in memory being available in a few nanoseconds. A computer's main memory uses Dynamic RAM (DRAM): It stores data and provides almost instantaneous access to that data, but it is limited in capacity and its contents are gone once the computer is turned off.
Permanent storage holds the data and software that must be preserved even when the computer is powered down. Permanent storage needs can be immense. An organization's library of software applications can easily exceed many gigabytes, and the quantity of data can range into the terabytes. Financial systems, customer databases, electronic documents, bitmap images, digital sound and video are but some examples of the data on which organizations rely. A key part of the network infrastructure involves the hardware and software that store the organization's ever-growing data.
In designing a data storage strategy for an organization's network, the stakes are extremely high. The data stored on the network is a vital resource that cannot be re-created. Take care to provide adequate capacity, fast performance and reliable access, but above all, never allow data loss.
This chapter focuses on the hardware and software technologies that directly relate to data storage itself. We will not attempt to cover the technologies for providing access to storage on a network and will not argue the nuances of the competing network operating systems and network communications issues. Our concern lies in the hardware and software solutions that become the building blocks of a comprehensive data storage strategy.
Storage Strategy Design Issues
As you construct a storage system, keep the following goals in mind:
- Prevent data loss
- Offer adequate capacity that can easily scale as storage needs grow
- Provide fast access to data without interruptions
- Be prepared for equipment failures
- Use cost-effective technologies
Not all issues that have an impact on an organization's data strategy can be solved with technology. Individuals must follow sound practices with institutional data. Be sure that the users place their data within the supported data structures. Users cannot, for example, store vital data files on the local drives of their computers if the organization's storage strategy assumes that all data resides on network servers. End users are unlikely to perform frequent backups of their data or follow other procedures that ensure that institutional data are secure. An important part of an organization's storage strategy involves training users to trust that the network storage facilities are secure and well-managed.
Part of the design of the storage strategy includes defining what constitutes the supported data environment. Will you support data storage on desktop computers? Or must all institutional data reside on servers? Although it is possible to include desktop computers in the storage strategy to a limited extent, this practice should generally be avoided. Most backup software products will archive the contents of distributed computer hard drives; however, few of the other requirements of a well-managed storage strategy can be met with this approach. In this chapter, we will focus on storage systems designed to be integrated with network servers.
Several factors come into play when selecting storage options. Following is a survey of the basic concerns that will be discussed in detail later in the chapter.
Your storage system must be able to handle the appropriate quantity of data. Be aware of the organization's current data storage needs and the expected rates of growth. You cannot plan a storage strategy without detailed knowledge of the quantities of data involved.
The type of storage technology must be well-matched to the overall size of the organization's data needs and must be able to outpace its expected growth. Storage strategies implemented when the organization's network was relatively small often cannot be expanded beyond a certain point. Storage technologies designed for large-scale enterprise networks may be burdensome for a departmental LAN. The storage systems on the network must be designed from the beginning to scale to larger data capacities without major upheavals. Avoid the disruption and costs associated with redeploying a whole new data storage system because you outgrew the previous system. Rather, choose a system that will continue to grow as your data needs grow.
Select the least-costly approach that effectively meets the objectives. Many cost issues must be considered: the initial purchase cost of the hardware; the productivity costs related to network downtime; and ongoing hardware and software maintenance, for example. Do not ignore the personnel costs associated with each storage technology option. More complex solutions will demand the time and attention of network administrators, technicians and operators. Simpler approaches should require less ongoing support.
Storage technologies must be able to deliver information to the user rapidly. Fortunately, many current systems have very high performance capabilities. Designing a storage solution to service a relatively small number of users can be fairly straightforward. But a network with an extremely large user population will challenge the network architect to design a system that can handle an extremely high rate of simultaneous activity and still deliver rapid access.
All storage systems rely on parts that will eventually break down. It is possible to develop a data storage environment with enough redundancy to ensure that no interruptions can occur, even if individual components fail or malfunction. Such high availability comes at a price--both in terms of the cost of the equipment and in the complexity of its operation. Smaller-scale departmental networks may be satisfied with a data system that can potentially fail, provided that it can be restored with little or no data loss within a reasonable time. It is relatively simple and inexpensive to build a storage system that is available 99 percent of the time. Eliminating that last 1 percent or 2 percent of failure possibilities is complex and expensive.
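To put an availability percentage in concrete terms, it helps to convert it into hours of downtime per year. The following Python sketch does that conversion. It assumes the system is needed around the clock (8,760 hours in a non-leap year), and the function name is illustrative rather than taken from any product.

```python
HOURS_PER_YEAR = 24 * 365  # assumes round-the-clock operation, non-leap year

def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the expected hours of downtime per year for a given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (100 - availability_percent) / 100

for pct in (98.0, 99.0, 99.9):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% available -> {hours:.1f} hours down per year")
```

Under these assumptions, a system that is available 99 percent of the time is still down for about 87.6 hours a year, which is more than three and a half days.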
Once a storage system has been designed and implemented, the organization must maintain it. Aim for the system with the simplest operational concerns. As systems increase in complexity it becomes increasingly important to be able to monitor their performance, preempt failures and manage storage media with as little effort and interaction as possible. Also, this functionality must come without sacrificing the depth of management available to the administrator.
Cost Analysis
A vital part of designing a storage system involves a careful cost analysis. Be sure to include all cost components. The major ones include:
- Capital outlays for hardware: disk drives, controllers, enclosures, power supplies, jukeboxes, tape drives, CD-ROM drives, etc.
- Media costs: tape cartridges, optical platters
- Any required software: RAID utilities, backup software, HSM software, systems management utilities
- Installation costs: vendor installation costs, internal personnel costs
- Ongoing costs: hardware maintenance, software maintenance and upgrades, product/technical support
Developing the cost analysis may be an iterative process. In your initial analysis, design a system with the highest possible performance and reliability and with generous capacity. If that package does not fit within your budget, incorporate lower-cost storage technologies for selected tiers of storage.
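The cost components listed above can be added up over the planned life of the system. The Python sketch below shows one simple way to do so; the function name, the dollar figures and the five-year horizon are hypothetical and serve only to illustrate the arithmetic.

```python
def total_cost_of_ownership(hardware, media, software, installation,
                            annual_ongoing, years):
    """Sum the one-time costs plus the ongoing costs over the planned life."""
    one_time = hardware + media + software + installation
    return one_time + annual_ongoing * years

# Hypothetical figures in dollars, for illustration only.
estimate = total_cost_of_ownership(
    hardware=40_000,       # drives, controllers, enclosures, power supplies
    media=2_000,           # tape cartridges, optical platters
    software=5_000,        # RAID utilities, backup and management software
    installation=3_000,    # vendor and internal personnel costs
    annual_ongoing=6_000,  # maintenance, upgrades, technical support
    years=5,
)
print(f"Estimated five-year cost: ${estimate:,}")
```

Rerunning the calculation with a different mix of hardware is one way to carry out the iteration described above: if the first total exceeds the budget, substitute lower-cost figures for selected tiers and compare.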
The security of your organization's data reigns supreme in the design of a storage system. It is important to assess the impact of the possible failure of your organization's storage system. Be sure to consider the following factors:
- Productivity losses. Many, if not most, individuals in the organization will not be able to carry out their normal activities when key components of the technical environment--such as the network storage system--are inoperable.
- Asset recovery costs. When data are lost, the organization must channel resources into re-creating them. Recovering data from archives and backups requires effort from technical staff. The re-keying of irrecoverable data can be a massive undertaking.
- Loss of active sales. Organizations that depend on their storage environment to support sales may not be able to process financial transactions during periods of failure.
- Loss of customers. Prolonged or frequent failures diminish clients' confidence in an organization.
It is relatively cheap and easy to construct a storage system that works 98 percent of the time. Eliminating the last few points of downtime possibilities can double or triple the cost of a storage system. The investment you make toward enhancing reliability operates much like an insurance policy. The financial value of the organization's operation guides the expense that can be justified for reducing the likelihood of downtime and reducing the recovery time from failures.
As you design a storage system, carefully measure the risk of failure for each alternative. Consider a wide range of alternatives. Start with a mental model of a system that initially seems to match your organization, and then work through design changes that both increase and decrease reliability. Finally, weigh the cost implications of the design alternatives against their relative risk factors and the impact of downtime to the organization. No magic formulas apply to risk analysis, but considering all these factors will lead to a solution well-matched to your organization's needs.
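Although no formula replaces judgment here, it can help to line up the alternatives side by side. The Python sketch below adds each design's yearly cost to the expected cost of its downtime. The three designs, their costs, their downtime estimates and the cost per hour of downtime are all hypothetical numbers chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Design:
    name: str
    annual_cost: float          # hardware, software and support, per year
    expected_down_hours: float  # estimated hours of downtime per year

def annual_exposure(design: Design, cost_per_down_hour: float) -> float:
    """Yearly system cost plus the expected yearly cost of downtime."""
    return design.annual_cost + design.expected_down_hours * cost_per_down_hour

# Hypothetical alternatives and downtime cost, for illustration only.
designs = [
    Design("basic system", 8_000, 90),
    Design("moderate redundancy", 16_000, 20),
    Design("full redundancy", 30_000, 2),
]
COST_PER_DOWN_HOUR = 500.0

for d in designs:
    print(f"{d.name}: ${annual_exposure(d, COST_PER_DOWN_HOUR):,.0f} per year")

best = min(designs, key=lambda d: annual_exposure(d, COST_PER_DOWN_HOUR))
print("Lowest combined cost:", best.name)
```

With these made-up figures the middle design comes out ahead, but the ranking shifts quickly as the assumed cost of an hour of downtime changes, which is why that estimate deserves the most scrutiny.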
The development of a storage strategy involves planning for the quantity of storage and the level of performance required. Although there are no iron-clad rules for planning the capacity of your storage system, here are some general guidelines.
Measure the amount and significance of current data. Determine the extent of your current data environment. It should not be difficult to measure the amount of data in use. Look for ways to consolidate and simplify. Look for pockets of data not currently managed in your central storage system that should be. Are there important institutional data files currently being stored on local drives of desktop computers? Do you have departmental servers that have data that needs to be managed in a more secure environment? Work closely with your organization's management to clarify the scope of the project.
Plan for growth. Once you have carefully calculated current data needs, you must predict their growth rates for the next few years. Although historical information about past growth rates may be useful, if your organization is like most, plan for significantly faster data growth in the future than you experienced in the past. Make sure that you are aware of any special projects that units within your organization may be planning for the next few years. New databases, document management systems, archiving and digitizing projects and the like could all have massive implications for your future storage requirements.
Allocate excess capacity. How much excess capacity should you provide in the first year of a new storage system? You should plan for 20 percent to 30 percent reserve capacity for the first year, just to keep the system operating smoothly. You also will want to purchase additional capacity for your anticipated growth for the second and third year of the system. Depending on your growth rates, you will probably size the system for at least double your current needs.
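The sizing guideline can be worked through as a short calculation. The Python sketch below assumes that data grows at a steady compound rate, which is a simplifying assumption, and then adds the reserve on top. The starting size, growth rate and reserve fraction are hypothetical inputs.

```python
def initial_capacity_gb(current_gb, annual_growth_rate, growth_years,
                        reserve_fraction):
    """Capacity to buy: data projected after growth, plus a working reserve."""
    projected = current_gb * (1 + annual_growth_rate) ** growth_years
    return projected * (1 + reserve_fraction)

# Hypothetical inputs: 100 GB today, 30 percent growth in each of the
# second and third years, and a 25 percent reserve.
needed = initial_capacity_gb(100, 0.30, 2, 0.25)
print(f"Plan for about {needed:.0f} GB")
```

With these inputs the result is a little over 211 GB, in line with the rule of thumb of sizing the system for at least double current needs.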
The system will need to grow beyond its initial configuration, so make sure that the components needed for expansion are likely to be available in the future.
Expect to expand later. Do not purchase capacity too far in advance. Storage hardware, like other computing components, will most certainly decline in costs over time. You will probably be able to purchase higher-capacity storage units for less cost in future years.
System longevity. What is the lifecycle for the storage system you are designing? Always hedge your bets for your storage system. New technologies constantly arise, and one may come along in the next few years that will offer benefits above the system you are designing today. Keep realistic expectations about how long your storage system will last before it must be replaced. A five-year life span is typical for these systems.
Tiers of storage. While the majority of organizations can manage their data with a single medium--usually magnetic storage--large data environments may require a tiered approach. It may be impractical to build a system large enough from the highest-performing hardware. If your environment falls into this category, then you will need to analyze the various data sets and determine how much of your system must be managed with high-performance storage and what might be relegated to lower-cost media. An example of a multitiered storage environment would be one with a small store of solid-state disks, a large set of magnetic disks organized in a RAID structure, a jukebox of optical discs and a tape drive for backups and archives. Budget considerations and performance demands factor into the proportion of storage allocated to each tier. The distribution of data files among a multitiered storage system can be automated through a genre of software called Hierarchical Storage Management (HSM).
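As a rough illustration of how data might be distributed among tiers, the Python sketch below assigns each data set to a tier according to how recently it was used. The age thresholds and data set names are hypothetical, and placing data by age alone is an assumption made for simplicity; actual HSM products apply their own policies.

```python
# Hypothetical age thresholds (days since last access) for each tier,
# ordered from fastest to slowest.
TIER_THRESHOLDS = [
    (7, "solid-state disk"),
    (90, "magnetic RAID"),
    (365, "optical jukebox"),
]
SLOWEST_TIER = "tape archive"

def choose_tier(days_since_access: int) -> str:
    """Pick the fastest tier whose age threshold the data set still meets."""
    for max_days, tier in TIER_THRESHOLDS:
        if days_since_access <= max_days:
            return tier
    return SLOWEST_TIER

data_sets = [
    ("order database", 0),
    ("last quarter's reports", 60),
    ("scanned invoices", 200),
    ("closed project files", 900),
]
for name, age in data_sets:
    print(f"{name}: {choose_tier(age)}")
```

Adjusting the thresholds changes how much data lands on each tier, which is where the budget and performance trade-offs described above come into play.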
Storage Options: General Guidelines
A variety of technologies are available for network storage. Each of these hardware and software solutions forms one of the building blocks from which you will build a comprehensive storage solution. The size of your data environment, the kind of applications you run, the performance you expect, the level of reliability required and the cost expectations each factor into the picture.
The storage environment will use a hierarchy of technologies. The technologies with the best performance come at the highest cost. Organizations with data sets of a certain scale can build their storage system completely from high-performance hardware. Others have data sets so large that using only high-performance storage is impractical and unaffordable, and they must therefore place some data on slower and less expensive media. Those that can store everything on high-performance media may still want slower, removable technologies such as tape so that they can store backups of critical data off-site.
Among the technologies surveyed here, magnetic disk drives are the fastest and most expensive. The capacities of magnetic drives have historically risen faster than their costs, making this technology affordable for an increasingly large share of an organization's stored information. Currently a 9-GB drive can be purchased for around $2,000, yielding a cost of roughly $0.22 per MB. When these drives are combined into arrays, the costs expand to include redundant drives, controller hardware, redundant power supplies, specialized enclosures and management software.
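The per-megabyte figure is a simple division, and the same calculation shows how array overhead raises the effective cost. In the Python sketch below, the single-drive figures come from the text, which at 1,000 MB per GB works out to about $0.22 per MB. The five-drive array, its one drive's worth of redundancy and the $3,000 for controller and enclosure are hypothetical.

```python
MB_PER_GB = 1000  # decimal convention; using 1,024 changes the result only slightly

def cost_per_mb(total_price: float, usable_gb: float) -> float:
    """Dollars per megabyte of usable capacity."""
    return total_price / (usable_gb * MB_PER_GB)

# Single 9-GB drive at about $2,000 (figures from the text).
single = cost_per_mb(2000, 9)

# Hypothetical array: five such drives, one drive's worth of capacity given
# over to redundancy, plus an assumed $3,000 for controller and enclosure.
array = cost_per_mb(5 * 2000 + 3000, 4 * 9)

print(f"Single drive: ${single:.2f} per MB")
print(f"Array:        ${array:.2f} per MB")
```

Under these assumptions the array costs about $0.36 per usable megabyte, noticeably more than the bare drive, because the redundant capacity and supporting hardware are paid for but hold no additional data.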
Magneto-optical (MO) technologies offer a lower per-megabyte cost, if you consider only the media costs. Magneto-optical discs offer a level of performance just a notch below that of magnetic storage and are popular in environments that have extremely large data sets. Less frequently used data can be transferred from fast, expensive magnetic storage to cheaper, slower optical platters for a more cost-effective storage environment. For this reason, MO technologies are popular in HSM environments. While optical media are relatively cheap, their use may involve some fairly expensive hardware. At the very least you will need optical drives that cost a few hundred dollars each. In most cases you will also need equipment to manage the optical media. An optical minichanger may cost less than a thousand dollars, but a full-blown optical jukebox that can manage 100 to 500 discs can cost tens of thousands of dollars. Make sure that you factor in the costs of the disc-changing hardware and the additional software as you calculate the cost-effectiveness of optical technologies.
Magnetic tape stands at the lower end of the storage technology spectrum in terms of performance--it is relatively slow and has a correspondingly low media cost for high capacity. Both magnetic and optical storage allow random access to information on the media. Tape, however, is a serial-access medium. This is fine for applications that read and write data files sequentially, such as backup operations. The use of magnetic tape also comes with significant hardware costs, including the purchase of one or more tape drives, SCSI controllers and software. Several varieties of magnetic tape are available: 4-mm Digital Audio Tape (DAT), 8-mm tape and Digital Linear Tape (DLT).