NFS IN A NUTSHELL
NFS was originally designed and implemented by Sun Microsystems for use on its UNIX-based workstations. Other manufacturers now support it as well, on many operating systems including MS-DOS. NFS supports heterogeneous systems, for example, MS-DOS clients making use of UNIX servers. It is not even required that all machines use the same hardware: it is common to find MS-DOS clients running on Intel 386 CPUs getting service from UNIX file servers running on Motorola 68030 or Sun SPARC CPUs.
Three aspects of NFS are of interest: the architecture, the protocol, and the implementation. Let us look at these in turn.
NFS^2
“The main goal of NFS^2 is to build a single distributed file system from many NFS servers under a single name space.”
NFS servers are widely used to provide file service on the Internet. However, adding new servers to an existing namespace is management-intensive and in some ways inflexible. When a new server is brought online, all clients requiring access to the new server must be updated to mount any new file systems from the server, and access rights for the new file systems must be configured on the server. Additionally, the new file systems are bound to sub-trees of each client’s namespace.
The NFS^2 architecture allows standard NFS servers to be combined into a single, scalable file system. Each NFS server is essentially treated as an object store. New servers added to an NFS^2 system merely add more object storage—they are not bound to a particular location in the namespace. Clients accessing the NFS^2 file system need not be aware when NFS servers are added to or removed from the system.
PROPERTIES OF NFS^2
This architecture has some interesting properties:
1) The physical file systems that make up an NFS^2 instance, i.e., the file systems on the individual NFS servers, may be heterogeneous. This, combined with the way the NFS^2 namespace is constructed, allows files of different types (text, video, etc.) to be served from file servers (potentially) optimized for each type.
2) Its storage is strictly partitioned: each NFS server is solely responsible for allocating the resources under its control. This eliminates the resource contention and distributed lock management commonly found in cluster file systems.
3) Because the system may be constructed from standard NFS servers, it can benefit from existing high-availability (HA) solutions for individual nodes, and it improves as NFS servers improve.
SEPARATING DIRECTORY STRUCTURE FROM PHYSICAL SYSTEM
Traditionally, file systems contain directory structures that are tightly bound to a particular file system implementation. These structures may be embedded, both logically and physically, in the file system and contain data that are specific to the file system implementation. Changing the directory structure of a file system can be extremely tedious: the file system code must be changed and rebuilt, new file system initialization code (mkfs) is needed, and new recovery code (fsck) is also likely to be necessary.

Several areas of file system research could benefit from a generic directory structure that is implemented above the physical file system layer, allowing experimentation with directory contents and possibly alternative naming schemes. Each underlying file system has its own definition for the structure of a directory. Directory files provide a directory structure that is independent of the underlying file systems and allow great flexibility in customizing directory structures to particular tasks.

A directory file is simply a file in the underlying file system that contains “pieces” of the namespace for a file system built on one or many underlying file systems. Depending on implementation, directory files may be interpreted by file system clients, servers, or both. Directory files serve the same purpose as traditional directories—they completely describe the namespace of their file system, and they are used very similarly to existing directory structures. Their advantage lies in their independence from underlying file systems, and the flexibility this allows.
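The idea can be made concrete with a small sketch. This is an illustrative model only: the entry format, the `json` encoding, and all names here are assumptions for the sketch, not the format of any real NFS^2 implementation. The point is that the directory is ordinary file data that any underlying file system can store without understanding it.

```python
# Hypothetical sketch: a "directory file" is an ordinary file whose
# contents map names to objects that may live on any NFS server.
import json

def make_entry(name, server, object_id):
    """One namespace entry: a name bound to an object on some server."""
    return {"name": name, "server": server, "object_id": object_id}

def serialize_directory(entries):
    """A directory file is just serialized entries; the physical file
    system that stores it needs no knowledge of this format."""
    return json.dumps(entries)

def lookup(directory_data, name):
    """Resolve a name to (server, object_id), independently of any
    underlying file system's native directory format."""
    for entry in json.loads(directory_data):
        if entry["name"] == name:
            return entry["server"], entry["object_id"]
    return None

# Files of different types can point at different (specialized) servers.
entries = [
    make_entry("report.txt", "nfs-server-a", 1042),
    make_entry("demo.mp4", "nfs-server-b", 7),
]
blob = serialize_directory(entries)
print(lookup(blob, "demo.mp4"))  # ('nfs-server-b', 7)
```

Because the format is interpreted above the physical layer, changing it requires no new mkfs or fsck for the underlying file systems.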
LOAD BALANCER FOR NFS^2
For load balancing, chained declustering is a good fit for NFS^2. This section describes the chained-declustered data access and recovery modules. These modules give clients highly available access to data by automatically bypassing failed components, while dynamic load balancing eliminates system bottlenecks by ensuring uniform load distribution even in the face of component failures. We start by describing the basic idea behind chained declustering and then move into detailed descriptions of exactly what happens on each read and write operation.

Figure 5-1 illustrates the chained-declustered data placement scheme. The dotted rectangle emphasizes that the data on the storage servers appear as a single virtual disk to clients. Each sequence of letters represents a block of data stored in the storage system. Note that the two copies of each block of data are always stored on neighboring servers, and every pair of neighboring servers has data blocks in common. Because of this arrangement, if Server 1 fails, servers 0 and 2 will automatically share Server 1’s read load; Server 3, however, will not experience any load increase. By performing dynamic load balancing, we can do better: since Server 3 has copies of some data from servers 0 and 2, servers 0 and 2 can offload some of their normal read load onto Server 3 and achieve uniform load balancing.
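The placement rule can be sketched in a few lines. This is a minimal model under the assumption that block i's primary copy lives on server i mod N and its secondary on the next server in the chain; the function names are invented for the sketch.

```python
# Minimal sketch of chained-declustered placement: two copies of each
# block on neighboring servers, so every neighbor pair shares blocks.
def chained_placement(block, num_servers):
    """Return (primary, secondary) server indices for a block."""
    primary = block % num_servers
    secondary = (primary + 1) % num_servers
    return primary, secondary

def surviving_readers(block, num_servers, failed):
    """Servers that can still serve reads of this block."""
    return [s for s in chained_placement(block, num_servers)
            if s not in failed]

# With 4 servers and server 1 down, every block still has a live copy,
# and server 1's read load is split between servers 0 and 2.
for block in range(8):
    assert surviving_readers(block, 4, {1}), "a copy survives"
```

With this placement, a single failure never makes data unavailable, and the offloading can cascade around the chain to rebalance load.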
Chaining the data placement allows each server to offload some of its read load to the server immediately following or preceding it. By cascading the offloading across multiple servers, a uniform load can be maintained across all surviving servers. In contrast, with a simple mirrored redundancy scheme that replicates all the data stored on two servers, the failure of either would result in a 100% load increase at the other, with no opportunity for dynamic load balancing. In a system that stripes over many mirrored servers, the 100% load increase at this single server would reduce the overall system throughput by 50%.

To balance load, each client keeps track of the number of requests it has pending at each server and always sends read requests to the server with the shorter queue length. This works well if most of the requests are generated by a few clients, but it would not work well if most requests are generated by many clients that each issue I/O requests only occasionally.
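The client-side policy described above can be sketched as follows; the class and method names are assumptions for illustration, not the system's actual interfaces.

```python
# Sketch of shortest-queue read balancing: each client tracks its own
# pending requests per server and sends reads to the less loaded replica.
class ReadBalancer:
    def __init__(self, num_servers):
        self.pending = [0] * num_servers  # this client's outstanding requests

    def send_read(self, primary, secondary):
        """Pick the replica with the shorter local queue; ties favor
        the primary. Returns the server the read was sent to."""
        if self.pending[primary] <= self.pending[secondary]:
            target = primary
        else:
            target = secondary
        self.pending[target] += 1
        return target

    def on_reply(self, server):
        """A request to this server completed."""
        self.pending[server] -= 1

balancer = ReadBalancer(4)
first = balancer.send_read(1, 2)   # queues equal: tie goes to primary 1
second = balancer.send_read(1, 2)  # server 1 now busier: secondary 2 chosen
```

Note that the queue lengths are purely client-local, which is exactly why the scheme degrades when load comes from many clients that each issue only occasional requests.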
An additional advantage of chained declustering is that by placing all the even-numbered servers at one site and all the odd-numbered servers at another, we can tolerate site failures. A disadvantage of chained declustering relative to simple mirroring is that it is less reliable: with simple mirroring, if a server fails, only the failure of its mirror server results in data becoming unavailable; with chained declustering, if a server fails, the failure of either of its two neighboring servers will result in data becoming unavailable.

In the implementation of chained declustering, one of the two copies of each data block is denoted the primary and the other the secondary. Read requests can be serviced from either the primary or the secondary copy, but the servicing of write requests must always start at the primary, unless the server containing the primary is down, in which case it may start at the secondary. Because we lock copies of the data blocks before reading or writing them to guarantee consistency, this ordering guarantee is necessary to avoid deadlocks.
On a read request, the server that receives the request attempts to read the requested data. If successful, the server returns the requested data; otherwise it returns an error code and the client tries another server. If a request times out due to network congestion or because a server is down, the client will alternately retry the primary and secondary servers until either the request succeeds or both servers return error codes indicating that it is not possible to satisfy the request. Currently, this happens only if both disks containing copies of the requested data have been destroyed.
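This read path can be sketched as a retry loop. The sketch assumes a callable `issue(server)` standing in for the actual RPC, and models timeouts and hard errors with sentinel values; these names are illustrative, not part of the real protocol.

```python
# Hedged sketch of the read retry loop: alternate between primary and
# secondary on timeout; give up only when BOTH return hard errors.
TIMEOUT, ERROR = "timeout", "error"

def read_with_retry(primary, secondary, issue):
    """issue(server) returns the data, TIMEOUT, or ERROR."""
    order = [primary, secondary]
    errored = set()
    attempt = 0
    while len(errored) < 2:
        server = order[attempt % 2]  # alternate primary / secondary
        attempt += 1
        if server in errored:
            continue  # this copy already reported a hard error
        result = issue(server)
        if result == ERROR:
            errored.add(server)
        elif result != TIMEOUT:
            return result  # success on either copy
    return None  # both copies of the data have been destroyed
```

Timeouts never remove a server from consideration, only hard error codes do, so the client keeps alternating through transient congestion, matching the behavior described above.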
On a write request, the server that receives the request first checks to see if it is the primary for the specified data element. If it is, it first marks the data element as busy on stable storage. It then simultaneously sends write requests to its local copy and the secondary copy. When both requests complete, the busy bit is cleared and the client that issued the request is sent a status code indicating the success or failure of the operation. If the primary crashes while performing the update, the busy bits are used during crash recovery to ensure that the primary and secondary copies are consistent. Write-ahead logging with group commits makes updating the busy bits efficient. As a further optimization, the clearing of busy bits is done lazily, and we maintain a cache of the most recently set busy bits. Thus, if write requests display locality, a given busy bit will already be set on disk and will not require additional I/O.
If the server that received the write request is the secondary for the specified data element, then it will service the request only if it can determine that the server containing the primary copy is down. In this case, the secondary marks the data element as stale on stable storage before writing it to its local disk. The server containing the primary copy will eventually have to bring all data elements marked stale up-to-date during its recovery process. A similar procedure is used by the primary if the secondary dies.
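The write path above can be modeled in a short sketch. Everything here is an assumption for illustration: busy and stale bits are in-memory sets standing in for stable storage, replication is a direct method call standing in for an RPC, and none of the names come from the real implementation.

```python
# Illustrative model of the write path: busy bits on the primary,
# stale bits on the secondary when the primary is down.
class ReplicaServer:
    def __init__(self):
        self.disk = {}
        self.busy = set()    # stands in for busy bits on stable storage
        self.stale = set()   # blocks the primary must refresh on recovery
        self.up = True

    def primary_write(self, block, data, secondary):
        """Normal case: the write starts at the primary."""
        self.busy.add(block)           # persisted before touching either copy
        self.disk[block] = data        # local (primary) copy
        replicated = secondary.apply(block, data)
        self.busy.discard(block)       # the real system clears this lazily
        return replicated

    def apply(self, block, data):
        """Secondary applies an update forwarded by the primary."""
        if not self.up:
            return False
        self.disk[block] = data
        return True

    def secondary_write(self, block, data, primary_is_up):
        """Permitted only when the primary is known to be down."""
        if primary_is_up:
            raise RuntimeError("writes must start at the primary")
        self.stale.add(block)          # primary refreshes these on recovery
        self.disk[block] = data
        return True
```

The busy bit guards the window in which the two copies may diverge after a primary crash, and the stale bit records exactly which blocks the recovering primary must bring up to date.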