Study of Concurrency Control Techniques in Distributed DBMS

Concurrency control is one of the most important tasks of any database management system. Without a proper concurrency control technique it is infeasible to maintain the integrity of the database in a concurrent environment. Concurrency control algorithms maintain the consistency and integrity of databases by synchronizing access to shared data. In a centralized environment it is relatively simple to synchronize concurrent transactions. In a distributed framework the task becomes much more complex. Consistency and integrity must be maintained across multiple fragments and copies of the database, concurrent transactions running at different sites must be synchronized, and the complexity of the algorithms and operations must be isolated from the users. Many researchers have proposed concurrency control methods for both centralized and distributed frameworks. These methods are broadly classified as locking-based methods, timestamp-ordering-based methods and optimistic methods. Two-phase locking based methods are the most popular and widely used methods in both centralized and distributed database systems, and a number of algorithms based on two-phase locking have been proposed for Distributed DBMS. This paper consolidates and discusses various lock-based concurrency control techniques for Distributed DBMS and presents a comparative study of various two-phase locking based concurrency control techniques.


Introduction
A database system is the backbone of many organizations, as it stores and manages the operational data of the organization, and most of the organization's applications depend on it. A database system (DBS) is the combination of a Database Management System (DBMS) and the databases of the organization. Database systems can be implemented using either a centralized or a distributed approach; a database system implemented using the distributed approach is called a Distributed Database System. The shift from centralized to distributed architectures over the years is attributed to the demand for higher performance and availability, and distributed database system (DDBS) technology is a result of the same demand. This technology may be viewed as a combination of database systems, computer network technologies and distributed computing (or distributed processing) [3,12,14,17].
"A distributed database (DDB) involves a collection of number of logically related databases spread over a computer network" [9]. Özsu [9] defined distributed database management system (DDBMS) as "the software system concerned with the management of the distributed database and makes the distribution transparent to the users." The combination of DDB and DDBMS is referred to as "distributed database system" (DDBS).
Internally, a DBMS performs several functions to manage and manipulate data properly, such as transaction management, concurrency control, recovery and security. Concurrency control ensures the consistency and integrity of the database when transactions execute concurrently. A DBMS allows the database to be shared among many transactions, and uncontrolled concurrent execution of those transactions may leave the database in an inconsistent state. The concurrent execution of transactions must therefore be controlled so that the consistency and integrity of the database are preserved. Concurrency control is the mechanism through which a DBMS ensures the consistency and integrity of the database even when transactions execute concurrently, while reducing the degree of concurrency as little as possible.

Concurrency Control
In a DDBS, multiple users can access the database at the same time, and each user may perceive that he or she is working alone on a dedicated system, although this is not the case. When transactions execute concurrently, the consistency of the database can be ensured by allowing only serializable schedules, because the result of a serializable schedule is equivalent to that of some serial execution of the same transactions. Allowing only serial schedules would limit the degree of concurrency and thereby degrade performance. Therefore, concurrency control techniques are applied to guarantee that the schedules generated by the concurrent execution of transactions are serializable [1].
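The serializability requirement above can be made concrete with the standard precedence-graph test for conflict serializability, a commonly used sufficient condition for serializability. The Python sketch below is illustrative only: a schedule is assumed to be a list of hypothetical `(transaction, action, item)` tuples with action `"r"` or `"w"`, and the function names are our own.

```python
# One operation of a schedule: (transaction id, "r" or "w", data item).
Operation = tuple[str, str, str]


def precedence_graph(schedule: list[Operation]) -> dict[str, set[str]]:
    """Add an edge Ti -> Tj whenever an operation of Ti conflicts with a later one of Tj."""
    graph: dict[str, set[str]] = {}
    for i, (ti, ai, xi) in enumerate(schedule):
        edges = graph.setdefault(ti, set())  # every transaction becomes a node
        for tj, aj, xj in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and xi == xj and "w" in (ai, aj):
                edges.add(tj)
    return graph


def is_conflict_serializable(schedule: list[Operation]) -> bool:
    """A schedule is conflict serializable iff its precedence graph is acyclic."""
    graph = precedence_graph(schedule)
    state: dict[str, str] = {}  # missing = unvisited, "active" = on DFS path, "done"

    def reaches_cycle(node: str) -> bool:
        state[node] = "active"
        for succ in graph.get(node, set()):
            if state.get(succ) == "active":
                return True  # back edge: the graph has a cycle
            if succ not in state and reaches_cycle(succ):
                return True
        state[node] = "done"
        return False

    return not any(node not in state and reaches_cycle(node) for node in graph)
```

For example, the schedule `[("T1","r","X"), ("T1","w","X"), ("T2","r","X"), ("T2","w","X")]` is accepted (it is equivalent to running T1 before T2), while performing both reads before both writes creates the cycle T1 → T2 → T1 and is rejected.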
Concurrency control involves coordinating concurrent accesses to maintain the consistency and integrity of the database [1,24]. The major problem in attaining this objective is to ensure that the updates performed by one transaction do not interfere with the updates and retrievals of another transaction [9,12,13,16,25]. Concurrency control is more difficult in a DDBMS for the following reasons:
• Data may be accessed by multiple users at a number of distant sites.
• The database is fragmented and/or replicated across multiple sites.
• Synchronizing concurrent transactions executed by multiple users at multiple remote sites is complex.
• A concurrency control technique implemented at one location must ensure the consistency of the database at all other sites.

Concurrency Control Techniques
Researchers have proposed a number of concurrency control mechanisms. These mechanisms can be broadly classified into three categories: (a) locking-based protocols, (b) timestamp-ordering-based protocols and (c) optimistic protocols. The first two categories follow the pessimistic approach. Pessimistic protocols are preferred when there is high activity on the database, whereas optimistic protocols are preferred when activity is low. Implementing these protocols is simpler in a centralized environment than in a distributed environment.
Many concurrency control techniques designed for the centralized environment can easily be extended to handle distributed databases, but some of them are not suitable for a distributed environment [1,16,25]. The detailed classification of concurrency control techniques is as follows:

Pessimistic Techniques
These techniques synchronize transactions early in their execution life cycle, because they assume that conflicts among concurrent transactions are frequent. Pessimistic techniques are mainly suitable when there is high activity in the database system. In other words, if a large number of conflicting transactions frequently execute concurrently, pessimistic techniques are preferred, because detecting conflicts only at a later stage would waste the resources already consumed by the transactions that must then be rolled back. In addition, if an application performs an activity that the database cannot roll back, such as printing, a pessimistic technique prevents that activity from taking place when a conflict arises, thereby removing the need to undo it. The pessimistic techniques can be further classified as follows:

Two Phase Locking (2 PL) Based techniques
The two-phase locking technique uses locks to ensure the serializability of concurrent transactions and thereby maintain the consistency and integrity of the database system. Each data object of the database is associated with a shared variable called a lock, which records the state of the data object and is used to give concurrent transactions mutually exclusive access to it [1,9,19,21,23]. Each transaction executes in two phases. In the first, growing phase, a transaction may acquire or upgrade locks but may not release any; in the second, shrinking phase, it may only release or downgrade locks. Two-phase locking follows these rules: (a) two transactions cannot hold conflicting locks on the same data item at the same time; (b) within a transaction, no unlock operation may precede a lock operation, that is, once a transaction has released a lock it cannot acquire another one; and (c) no data are affected by a transaction until it has obtained all of its locks. This approach may result in deadlocks and starvation.
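The growing/shrinking discipline and rules (a) and (b) can be sketched in Python as follows. This is a minimal single-site sketch under stated assumptions: the class names `LockTable`, `Transaction` and `TwoPhaseViolation` are hypothetical, only shared (`"S"`) and exclusive (`"X"`) modes are modelled, and a refused request simply returns `False` instead of entering a wait queue.

```python
class TwoPhaseViolation(Exception):
    """Raised when a transaction tries to lock after it has started releasing locks."""


class LockTable:
    """Single-site lock table with shared ("S") and exclusive ("X") modes."""

    def __init__(self) -> None:
        self.holders: dict[str, dict[str, str]] = {}  # item -> {transaction: mode}

    def acquire(self, txn: str, item: str, mode: str) -> bool:
        held = self.holders.setdefault(item, {})
        others = [m for t, m in held.items() if t != txn]
        if mode == "S" and all(m == "S" for m in others):
            held.setdefault(txn, "S")  # keep an existing X lock if already held
            return True
        if mode == "X" and not others:
            held[txn] = "X"  # new lock, or upgrade of this transaction's S lock
            return True
        return False  # rule (a): another transaction holds a conflicting lock

    def release(self, txn: str, item: str) -> None:
        self.holders.get(item, {}).pop(txn, None)


class Transaction:
    """Enforces the growing/shrinking discipline of two-phase locking."""

    def __init__(self, tid: str, table: LockTable) -> None:
        self.tid = tid
        self.table = table
        self.shrinking = False  # becomes True at the first unlock

    def lock(self, item: str, mode: str) -> bool:
        if self.shrinking:
            # Rule (b): no lock operation may follow an unlock operation.
            raise TwoPhaseViolation(f"{self.tid} cannot lock {item} after an unlock")
        # False means the caller would have to wait; waiting is where deadlocks arise.
        return self.table.acquire(self.tid, item, mode)

    def unlock(self, item: str) -> None:
        self.shrinking = True  # the growing phase is over
        self.table.release(self.tid, item)
```

With two transactions sharing one table, `T2` cannot read-lock an item that `T1` holds exclusively until `T1` unlocks it, and once `T1` has unlocked anything, any further `lock` call by `T1` raises `TwoPhaseViolation`.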

Timestamp-Ordering Based techniques
In this technique, transactions are ordered by a value called a timestamp. The timestamp can be either a value generated by a global incrementing counter or the current time of a master clock; the generating mechanism must ensure that every timestamp is unique. A transaction created earlier is assigned a lower timestamp than a transaction created later, so a transaction with a lower timestamp is older than one with a higher timestamp. Conflicting operations are therefore ordered according to the timestamps of their transactions, hence the name timestamp ordering, and because late transactions are rolled back instead of waiting, deadlocks are avoided [1,19,20,22,23]. In this technique, each data item X is stored as a triplet X = {x, write-timestamp, read-timestamp}, where x is the latest value of X, write-timestamp is the timestamp of the youngest transaction that has written the latest value of X, and read-timestamp is the timestamp of the youngest transaction that has read X. Timestamps are also the basis of two well-known deadlock-prevention schemes, (a) wait-die and (b) wound-wait. The main advantage of timestamp ordering is the elimination of deadlocks, since transactions never need to wait. However, it may result in cascading rollbacks, and starvation can occur if the same transaction is repeatedly aborted and restarted.
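The read and write checks implied by the triplet {x, write-timestamp, read-timestamp}, and the decisions made by wait-die and wound-wait, can be sketched as below. This is a hedged illustration of the classic basic timestamp-ordering rules (without refinements such as Thomas's write rule); `DataItem`, `Rollback`, `to_read`, `to_write`, `wait_die` and `wound_wait` are hypothetical names, and timestamps are assumed to be positive integers.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    """The triplet X = {x, write-timestamp, read-timestamp}."""
    value: int = 0
    write_ts: int = 0
    read_ts: int = 0


class Rollback(Exception):
    """The operation arrived too late; restart the transaction with a new timestamp."""


def to_read(ts: int, item: DataItem) -> int:
    if ts < item.write_ts:
        # A younger transaction has already overwritten the value this one should see.
        raise Rollback(f"read by {ts} rejected, write_ts={item.write_ts}")
    item.read_ts = max(item.read_ts, ts)
    return item.value


def to_write(ts: int, item: DataItem, value: int) -> None:
    if ts < item.read_ts or ts < item.write_ts:
        # A younger transaction has already read or written the item.
        raise Rollback(f"write by {ts} rejected, read_ts={item.read_ts}, write_ts={item.write_ts}")
    item.value = value
    item.write_ts = ts


# Timestamp-based deadlock prevention, used together with locking: the
# requester asks for a lock that the holder currently has.
def wait_die(requester_ts: int, holder_ts: int) -> str:
    # An older requester waits; a younger one "dies" (is aborted).
    return "wait" if requester_ts < holder_ts else "abort requester"


def wound_wait(requester_ts: int, holder_ts: int) -> str:
    # An older requester "wounds" (aborts) the holder; a younger one waits.
    return "abort holder" if requester_ts < holder_ts else "wait"
```

In both schemes only younger transactions are ever made to wait for older ones, or only older ones for younger ones, so a cycle of waiting transactions cannot form.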

Integrated
Two-phase locking and timestamp ordering each have relative advantages and disadvantages. To exploit the advantages of both, some DDBMSs use an integrated approach that combines the two techniques. This approach proves to be advantageous when heterogeneous databases are connected together [1].

Optimistic techniques
In these techniques, the synchronization of concurrently executing transactions is delayed until they terminate [19,20,21]. To maximize the degree of concurrency, transactions are allowed to perform their operations freely, and only their write phase is controlled. The operations of each transaction are logically divided into three phases: (a) read phase, (b) validation phase and (c) write phase. In the read phase, every transaction freely performs its operations in its local memory. Before writing its changes to the database, the transaction must pass the validation phase, in which the DDBMS checks whether the proposed write phase will keep the database in a consistent state. If validation succeeds, the transaction performs its write phase; otherwise its write phase is denied and the transaction is rolled back. The optimistic approach can be implemented using either 2PL or timestamp-ordering techniques. Cascading rollbacks cannot occur, because the actual write operations take place only when the transaction initiating them commits, so no other transaction can read uncommitted data.
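The three phases can be sketched with a simple backward-validation rule: a committing transaction fails validation if any transaction that committed after it began wrote an item it read. This is an illustrative single-process sketch, not the validation used by any particular DDBMS; `Txn` and `OptimisticStore` are hypothetical names, values are assumed to be integers, and validation plus write phase are assumed to run atomically (a real system would put them in a critical section).

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    start: int  # number of commits already done when the transaction began
    read_set: set[str] = field(default_factory=set)
    workspace: dict[str, int] = field(default_factory=dict)  # private, uncommitted writes


class OptimisticStore:
    def __init__(self, data: dict[str, int]) -> None:
        self.data = dict(data)  # committed database state
        self.commits = 0
        self.history: list[tuple[int, set[str]]] = []  # (commit number, write set)

    def begin(self) -> Txn:
        return Txn(start=self.commits)

    def read(self, txn: Txn, key: str) -> int:
        # Read phase: record the read and prefer the transaction's own writes.
        txn.read_set.add(key)
        return txn.workspace.get(key, self.data.get(key, 0))

    def write(self, txn: Txn, key: str, value: int) -> None:
        txn.workspace[key] = value  # nothing reaches the shared database yet

    def commit(self, txn: Txn) -> bool:
        # Validation phase: fail if a transaction that committed after txn began
        # wrote an item that txn read (txn may have used a stale value).
        for number, write_set in self.history:
            if number > txn.start and write_set & txn.read_set:
                return False  # write phase denied; the caller restarts the transaction
        # Write phase: install the private copies.
        self.data.update(txn.workspace)
        self.commits += 1
        self.history.append((self.commits, set(txn.workspace)))
        return True
```

If two transactions both read and increment `X`, the first to commit succeeds and the second fails validation, because the value it read was overwritten after it began. Since the failed transaction never wrote to the shared data, nothing else has to be rolled back.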
The comparison between pessimistic and optimistic approach is presented in Table 1.

Two Phase Locking Based Techniques
Locking-based techniques are widely used in centralized database systems, in which logical and physical locks on data items are used to synchronize transactions. A transaction that wants to read and/or write any granule of the database must first acquire a lock on that granule, and the lock must be released after the operation is completed. Locking and unlocking of data items are handled by a lock manager on behalf of the transactions. There are four major categories of lock-based algorithms:

Centralized two phase locking algorithm
It is a variation of 2PL in which a single lock table for the whole distributed database is maintained at one designated site, and all lock and unlock requests are sent to that site [1,26]. Only the designated site decides whether to grant a requested lock, based on the compatibility of the request with the locks already granted. This technique is applicable to both replicated and fragmented distributed databases. However, because all lock and unlock requests are sent to a single site, the load on that site increases. The designated site is also a single point of failure: if it fails, the DDBMS cannot continue to operate.
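The routing of all lock traffic to one site, and the consequences for load and availability, can be sketched as follows. This is a hypothetical simulation: `CentralLockSite` and `Site` are invented names, only exclusive locks are modelled, and a failed site is represented by an `up` flag rather than by real network behaviour.

```python
class CentralLockSite:
    """The one designated site that keeps the lock table for the whole DDB."""

    def __init__(self) -> None:
        self.up = True
        self.owner: dict[str, str] = {}  # item -> transaction holding an exclusive lock
        self.requests_handled = 0  # every request in the system lands here

    def _check_up(self) -> None:
        if not self.up:
            raise ConnectionError("central lock site is down; no transaction can lock")

    def lock(self, txn: str, item: str) -> bool:
        self._check_up()
        self.requests_handled += 1
        if self.owner.get(item, txn) != txn:
            return False  # incompatible with a lock already granted
        self.owner[item] = txn
        return True

    def unlock(self, txn: str, item: str) -> None:
        self._check_up()
        self.requests_handled += 1
        if self.owner.get(item) == txn:
            del self.owner[item]


class Site:
    """An ordinary site: it stores data but forwards all lock traffic."""

    def __init__(self, name: str, lock_site: CentralLockSite) -> None:
        self.name = name
        self.lock_site = lock_site

    def lock(self, txn: str, item: str) -> bool:
        return self.lock_site.lock(txn, item)
```

Requests from every site increase the same `requests_handled` counter, and setting `up = False` on the central site makes every lock request in the system fail, regardless of which site issues it or which item it names.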

Primary Copy two phase locking algorithm
This approach is useful when multiple copies of the same data item are stored at different sites (replication). For each data item, one of its copies at a particular site is designated as the primary copy, and all lock and unlock requests for that data item are sent to the site holding its primary copy [1,14,15,18,20]. This approach is suitable for replicated or mixed databases. The failure of one site has less effect than in centralized two-phase locking, because each site is designated as the primary site only for specific data items.
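The key difference from the centralized scheme is that lock requests are routed per data item. The sketch below assumes a hypothetical mapping `primary_of` from each data item to the site holding its primary copy; `LockSite` and `PrimaryCopyLocking` are invented names, and only exclusive locks are modelled.

```python
class LockSite:
    """A site that manages locks for the data items whose primary copy it holds."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.owner: dict[str, str] = {}  # item -> transaction holding an exclusive lock

    def lock(self, txn: str, item: str) -> bool:
        if not self.up:
            raise ConnectionError(f"primary site {self.name} for {item} is down")
        if self.owner.get(item, txn) != txn:
            return False  # another transaction holds the lock
        self.owner[item] = txn
        return True


class PrimaryCopyLocking:
    def __init__(self, primary_of: dict[str, str]) -> None:
        # primary_of maps each data item to the site holding its primary copy.
        self.primary_of = dict(primary_of)
        self.sites = {name: LockSite(name) for name in set(primary_of.values())}

    def lock(self, txn: str, item: str) -> bool:
        # Requests for an item go only to its primary site, whichever copy is read.
        return self.sites[self.primary_of[item]].lock(txn, item)
```

With `{"X": "A", "Y": "B", "Z": "A"}`, a failure of site `B` blocks locking of `Y` only; items whose primary copy is at `A` can still be locked, which illustrates why a single failure has less effect than in the centralized scheme.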

Distributed two phase locking algorithm
Data items are locked at all sites where the transaction accesses them, and unlocking begins only after the transaction has locked all of the data items it needs [24]. Transactions must submit their lock and unlock requests to all of these sites, and a lock is granted to a transaction only when all the sites allow it. This approach is more complex than centralized and primary copy two-phase locking: network traffic increases, considerable communication overhead is involved, and deadlock handling becomes more complex. On the other hand, it is more reliable, because there is no central lock site whose failure would halt the whole system, so a site failure affects fewer transactions.
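The rule that a lock is granted only when every site storing a copy grants it can be sketched as below. This is a hypothetical illustration: `SiteLockManager` and `lock_all_copies` are invented names, only exclusive locks are modelled, and when any site refuses, the sketch releases the locks it already obtained instead of waiting, which is a simplification of real distributed 2PL.

```python
class SiteLockManager:
    """Local lock manager at one site (exclusive locks only, for brevity)."""

    def __init__(self) -> None:
        self.owner: dict[str, str] = {}  # item -> transaction holding the lock

    def try_lock(self, txn: str, item: str) -> bool:
        if self.owner.get(item, txn) != txn:
            return False
        self.owner[item] = txn
        return True

    def unlock(self, txn: str, item: str) -> None:
        if self.owner.get(item) == txn:
            del self.owner[item]


def lock_all_copies(txn: str, item: str, copy_sites: list[SiteLockManager]) -> bool:
    """Grant the lock only if every site storing a copy of the item grants it."""
    granted: list[SiteLockManager] = []
    for site in copy_sites:  # one lock-request message per site
        if not site.try_lock(txn, item):
            # Simplification: back out instead of waiting while holding partial locks.
            for g in granted:
                g.unlock(txn, item)
            return False
        granted.append(site)
    return True
```

Locking an item with n copies costs n request messages (plus the replies and later unlock messages), which is the communication overhead the text refers to.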

Majority Consensus 2 PL Algorithm
This technique is a variation of the distributed two-phase locking algorithm. Unlike distributed two-phase locking, it requires a transaction to lock a majority of the copies of a data item, that is, at least (n+1)/2 (equivalently ⌊n/2⌋ + 1) of its n copies, instead of all of them. If the data item is to be updated, the transaction must still send the updated value to all sites where the data item is stored [1,18]. Because fewer lock and unlock requests have to be made, the communication overhead is lower than in distributed two-phase locking, and the algorithm can therefore be more efficient.
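The majority rule can be sketched as follows. Because any two majorities of the same n copies share at least one site, two transactions can never both hold conflicting majority locks. As before, `SiteLockManager`, `majority` and `lock_majority` are hypothetical names and only exclusive locks are modelled; a transaction that cannot reach a majority releases the copies it did lock.

```python
class SiteLockManager:
    """Local lock manager at one site (exclusive locks only, for brevity)."""

    def __init__(self) -> None:
        self.owner: dict[str, str] = {}  # item -> transaction holding the lock

    def try_lock(self, txn: str, item: str) -> bool:
        if self.owner.get(item, txn) != txn:
            return False
        self.owner[item] = txn
        return True

    def unlock(self, txn: str, item: str) -> None:
        if self.owner.get(item) == txn:
            del self.owner[item]


def majority(n: int) -> int:
    return n // 2 + 1  # smallest integer >= (n + 1) / 2


def lock_majority(txn: str, item: str, copy_sites: list[SiteLockManager]) -> bool:
    """Succeed as soon as a majority of the sites holding copies grant the lock."""
    needed = majority(len(copy_sites))
    granted: list[SiteLockManager] = []
    for site in copy_sites:
        if site.try_lock(txn, item):
            granted.append(site)
            if len(granted) == needed:
                return True  # any competing majority must overlap these sites
        # A refusal is tolerated as long as a majority can still be reached.
    for site in granted:  # no majority: release the partial locks
        site.unlock(txn, item)
    return False
```

With five copies, a transaction needs three grants; if another transaction already holds three, at most two sites can grant the second request, so it fails and releases what it obtained.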
The comparison among above two-phase locking based techniques is presented in Table 2.

Conclusion
Several categories of concurrency control methods have been proposed and implemented for Distributed DBMS. Pessimistic methods are still widely used, but they are a cause for concern because they may lead to a number of problems, such as a reduced degree of concurrency, deadlocks, starvation and cascading rollbacks. Optimistic methods can be used to increase the degree of concurrency, but current work on optimistic methods concentrates mainly on centralized DBMS rather than distributed DBMS. The pessimistic approach is best suited to distributed database systems with a high activity ratio, whereas the optimistic approach is best suited to distributed database systems with a low activity ratio.