DTC and the 2 Phase Commit Protocol
Atomic transactions rely on automatic commit and rollback actions, so they need a system that manages these actions implicitly. This system is called a Transaction Manager (TM).
There are multiple kinds of transaction managers, each used for a certain transaction protocol. For a detailed discussion check my WCF Transactions Pluralsight course. But for the sake of this discussion, we care about the WS-AT protocol, and a particular Transaction Manager – called the Distributed Transaction Coordinator – or DTC.
DTC is familiar to most developers, and is capable of managing transactions across process and machine boundaries.
DTC and Atomicity
DTC has a tough challenge: while the Consistency, Isolation, and Durability attributes are managed by Resource Managers (RMs), DTC must preserve the Atomicity attribute, and it must do so across multiple participants that are most likely distributed over multiple machines.
For example, DTC must make sure that atomicity is maintained over two DB RMs, so that the “add” and “deduct” operations are either committed as a unit or rolled back as a unit.
DTC achieves this task by implementing the 2 Phase Commit Protocol.
2 Phase Commit Protocol
The 2 Phase Commit Protocol is made up of two phases; once both have completed, the TM can guarantee atomicity.
- Phase 1 starts when the TM sends a signal asking all RMs participating in a transaction to “vote” for either committing or rolling back their part of the transaction. Each RM decides whether it wants to commit or roll back.
- If an RM votes to commit, this does not mean that it actually commits at this stage. It only persists a copy of the transaction result to its durable storage. At this point, the RM is said to be prepared to commit.
- The TM collects the votes from all RMs and decides on the final outcome of the transaction.
- In Phase 2, if all RMs voted to commit, the TM signals them to actually commit the transaction results. If any RM voted to roll back, the TM signals all RMs to roll back.
- In either case, RMs notify the TM of the final status.
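The steps above can be sketched in a few lines of Python. This is only a minimal in-memory model, assuming a hypothetical `ResourceManager` class and `two_phase_commit` function; it is not how DTC is implemented, and a real RM would write its prepared state to durable storage rather than to a list.

```python
from dataclasses import dataclass, field
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ROLLBACK = "rollback"


@dataclass
class ResourceManager:
    """Hypothetical RM that holds pending work until the TM announces the outcome."""
    name: str
    can_commit: bool = True
    state: str = "active"
    log: list = field(default_factory=list)  # stands in for durable storage

    def prepare(self) -> Vote:
        # Phase 1: persist the pending result, then vote. Nothing is committed yet.
        if self.can_commit:
            self.log.append("prepared")
            self.state = "prepared"
            return Vote.COMMIT
        self.state = "aborted"
        return Vote.ROLLBACK

    def commit(self) -> str:
        self.state = "committed"
        return self.state

    def rollback(self) -> str:
        self.state = "rolled back"
        return self.state


def two_phase_commit(rms):
    """Play the TM role: collect votes, decide, and return each RM's final status."""
    # Phase 1: ask every RM to prepare and vote.
    votes = [rm.prepare() for rm in rms]
    # Phase 2: commit only on a unanimous vote; otherwise roll everyone back.
    if all(vote is Vote.COMMIT for vote in votes):
        return [rm.commit() for rm in rms]
    return [rm.rollback() for rm in rms]
```

Calling `two_phase_commit([ResourceManager("add"), ResourceManager("deduct", can_commit=False)])` rolls back both RMs, even though “add” was prepared to commit.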
Now let’s see an example of how DTC and the 2 Phase Commit Protocol work together to manage distributed transactions:
Here we have two resource managers running on different machines, and each machine has its own DTC. The 2 Phase Commit Protocol works as follows:
- Phase 1 starts when the DTC on each machine asks its corresponding RM to vote for committing or aborting.
- Each RM votes.
- The DTC on machine A is the root DTC because it is the one that started the transaction. As such, the root DTC is responsible for collecting the votes of the DTCs on the other machines. So it asks the DTC on machine B for the result of its voting procedure, and the DTC on machine B replies with that result.
- The Root DTC then decides on the outcome of the transaction based on the votes that it has collected from its own RM and from the DTC at B.
- In Phase 2, the Root DTC sends its command to the participating DTCs so each will then instruct its own RMs to commit or abort.
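The root/subordinate arrangement in this list can be modelled as a small tree. The sketch below is an assumption-laden illustration: `Dtc`, `collect_vote`, `decide`, and `run_transaction` are hypothetical names, RM votes are reduced to booleans, and the network between machines is replaced by direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class Dtc:
    """Hypothetical stand-in for one machine's DTC; not the real MSDTC API."""
    machine: str
    local_rm_votes: list                      # True means the local RM voted to commit
    subordinates: list = field(default_factory=list)
    outcome: str = "pending"

    def collect_vote(self) -> bool:
        # Phase 1: ask the local RMs, then ask every subordinate DTC for its result.
        local_ok = all(self.local_rm_votes)
        remote_votes = [sub.collect_vote() for sub in self.subordinates]
        return local_ok and all(remote_votes)

    def decide(self, commit: bool) -> None:
        # Phase 2: apply the decision locally and pass it down to subordinates.
        self.outcome = "committed" if commit else "aborted"
        for sub in self.subordinates:
            sub.decide(commit)


def run_transaction(root: Dtc) -> str:
    """The root DTC collects all votes, decides, and distributes the outcome."""
    decision = root.collect_vote()
    root.decide(decision)
    return root.outcome
```

With `b = Dtc("B", [False])` and `a = Dtc("A", [True], [b])`, `run_transaction(a)` aborts on both machines, because the root only commits when its own RM and the DTC on machine B both report a commit vote.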
All of this messaging between DTCs is done using what is called a Transaction Protocol. There are multiple types of transaction protocols, and these are explained in my WCF Transactions Pluralsight course. For the sake of this discussion, the protocol we’re interested in is WS-AT.
Now that you have seen all the components participating in a transaction, here is where WS-AT fits: it is the protocol that dictates the communication between all applications participating in a transaction. It defines what a transaction is made of and the criteria that determine whether the transaction completes successfully.
However, in order to understand WS-AT, you first need to understand WS-Coordination. WS-Coordination can be thought of as the generic specification that WS-AT builds on. So what is WS-Coordination?
The first step in understanding WS-Coordination is to establish a common understanding of the term ‘activity’.
An activity is made of a set of actions that work together to achieve a certain goal. A general example would be the activity of taking an exam. To accomplish this activity, actions such as exam reservation and fee payment must be taken. Similarly, a money transfer is an activity: the actions that have to be completed are add and deduct.
The key thing to note here is that these actions must not work in isolation; their work must be coordinated in some way. So there must be a coordination mechanism that gives these actions a common context.
WS-Coordination is the standard that provides such common context to a set of web service actions working towards a common web activity.
Probably the most important construct of WS-Coordination is the “CoordinationContext”. The CoordinationContext contains the information required to coordinate the actions participating in an activity. This common information ties the various actions together so that they know they are part of the same activity.
WS-Coordination also defines various web services; the most important are:
- The Activation Service is the one that creates the CoordinationContext.
- The Registration Service is the one that other web services use to register to participate in an activity.
- And finally, the Coordinator service is the one that plays the coordinator role between the various actions. Of course, the nature of this coordination is domain-specific; meaning that it differs depending on the type of activity being coordinated. For example, it can be to coordinate actions of the take exam activity, or to coordinate actions of a money transfer activity.
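To make the relationship between the context and these services concrete, here is a minimal Python sketch. The `Coordinator` class, its method names, and the `http://tm.example` URIs are all assumptions made for illustration; the real services exchange SOAP messages defined by the WS-Coordination specification.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CoordinationContext:
    identifier: str            # ties all actions to the same activity
    coordination_type: str     # names the domain-specific protocol family
    registration_service: str  # where participants go to register


@dataclass
class Coordinator:
    """Hypothetical coordinator exposing activation and registration."""
    base_uri: str
    participants: dict = field(default_factory=dict)

    def create_context(self, coordination_type: str) -> CoordinationContext:
        # Activation Service: create a new context for a new activity.
        ctx = CoordinationContext(
            identifier=f"urn:uuid:{uuid.uuid4()}",
            coordination_type=coordination_type,
            registration_service=f"{self.base_uri}/registration",
        )
        self.participants[ctx.identifier] = []
        return ctx

    def register(self, ctx: CoordinationContext, participant: str) -> str:
        # Registration Service: enlist a participant in an existing activity
        # and tell it where the Coordinator service lives.
        self.participants[ctx.identifier].append(participant)
        return f"{self.base_uri}/coordinator"
```

Every action that receives the same `CoordinationContext` registers under the same `identifier`, which is what lets the coordinator treat “add” and “deduct” as parts of one money transfer activity.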
So in summary, WS-Coordination is a generic framework. Let’s now see how these concepts map to WS-AT.
What is WS-AT?
So now you know that WS-Coordination provides a generic framework for coordinating web service actions. You also know that it contains a CoordinationContext construct.
A “Coordination Type” is part of the CoordinationContext, and it represents a protocol that is targeted at a specific problem domain. For example, WS-AtomicTransaction defines a coordination type that is used in atomic transaction scenarios.
In other words, WS-AT builds on the generic WS-Coordination framework to coordinate a set of actions for a specific type of activity: the atomic transaction.
WS-AT allows cross-machine and cross-platform calls. What this means in practice is that services running not only on different machines but also on different platforms can participate in a single atomic transaction.
This is a CoordinationContext used by WS-AT. I will explain this in the demo, but for now, just notice the CoordinationType element. Its value is set to the WS-AT namespace, which indicates that it is being used to coordinate the actions of an atomic transaction.
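As a rough illustration of what to look for, the Python sketch below parses a hand-written, trimmed CoordinationContext and reads its CoordinationType. The namespaces are the 2004 WS-Coordination, WS-Addressing, and WS-AT ones; the identifier and the `tm.example` address are made-up placeholders, and a context produced by DTC carries additional elements not shown here.

```python
import xml.etree.ElementTree as ET

WSCOOR = "http://schemas.xmlsoap.org/ws/2004/10/wscoor"
WSA = "http://schemas.xmlsoap.org/ws/2004/08/addressing"
WSAT = "http://schemas.xmlsoap.org/ws/2004/10/wsat"

# Hand-written sample: the identifier and address are placeholders.
SAMPLE = f"""
<wscoor:CoordinationContext xmlns:wscoor="{WSCOOR}" xmlns:wsa="{WSA}">
  <wscoor:Identifier>urn:uuid:00000000-0000-0000-0000-000000000001</wscoor:Identifier>
  <wscoor:CoordinationType>{WSAT}</wscoor:CoordinationType>
  <wscoor:RegistrationService>
    <wsa:Address>https://tm.example/registration</wsa:Address>
  </wscoor:RegistrationService>
</wscoor:CoordinationContext>
"""


def read_context(xml_text: str) -> dict:
    """Extract the three fields discussed in the text from a CoordinationContext."""
    ns = {"wscoor": WSCOOR, "wsa": WSA}
    root = ET.fromstring(xml_text.strip())
    return {
        "identifier": root.findtext("wscoor:Identifier", namespaces=ns),
        "coordination_type": root.findtext("wscoor:CoordinationType", namespaces=ns),
        "registration_service": root.findtext(
            "wscoor:RegistrationService/wsa:Address", namespaces=ns
        ),
    }


def is_atomic_transaction(context: dict) -> bool:
    # The coordination type is what marks the activity as a WS-AT transaction.
    return context["coordination_type"] == WSAT
```

`is_atomic_transaction(read_context(SAMPLE))` is true for this sample, since its CoordinationType equals the WS-AT namespace.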
WS-AT Coordination Protocols
There are three coordination protocols specified in the WS-AT specification: Completion, Volatile 2PC, and Durable 2PC.
- The Completion protocol is used by an entity participating in the transaction to tell the TM either to commit or to roll back an operation. This is basically equivalent to the voting concept I talked about in the 2PC protocol.
- The other two protocols are Volatile 2PC and Durable 2PC. You have already seen what the 2PC protocol is. In addition, recall that there are two types of resource managers: volatile RMs, which can only manage in-memory transaction data and cannot survive system failures, and durable RMs, such as SQL Server, which persist data and transaction state so that they can survive system failures.
Put simply, volatile RMs use Volatile2PC, while durable RMs use Durable2PC.
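The three protocols can be written down as an enumeration. In the sketch below, the values are the protocol identifiers under the 2004 WS-AT namespace, while the `Protocol` enum and the `protocol_for` helper are hypothetical names used only to express the volatile/durable rule.

```python
from enum import Enum

WSAT = "http://schemas.xmlsoap.org/ws/2004/10/wsat"


class Protocol(Enum):
    COMPLETION = f"{WSAT}/Completion"
    VOLATILE_2PC = f"{WSAT}/Volatile2PC"
    DURABLE_2PC = f"{WSAT}/Durable2PC"


def protocol_for(survives_failure: bool) -> Protocol:
    """Pick the 2PC flavour an RM registers for, based on whether it is durable."""
    return Protocol.DURABLE_2PC if survives_failure else Protocol.VOLATILE_2PC
```

Under this rule, a durable RM such as SQL Server maps to `Protocol.DURABLE_2PC`, and an in-memory RM maps to `Protocol.VOLATILE_2PC`.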
The Role of DTC
One final point before walking you through a scenario: in WS-AT, the Activation, Registration, and Coordinator services I talked about in WS-Coordination are typically implemented implicitly by a transaction manager, such as the DTC.
So for the sake of WS-AT, you do not need to know exactly how these services work or what messages are exchanged as part of the protocol, unless of course you’re implementing your own TM.
Now I will walk you through a scenario that clarifies how WS-AT operates.
In this example, a client application wants to propagate a transaction to a service so that both entities commit or abort their operations as a unit.
The client and the service are running on different machines, so each entity deals with its own TM, such as DTC.
- The client application first sends a request to its TM’s Activation Service to create a new coordination context for the transaction.
- Remember from the 2PC discussion that the TM that starts the transaction is called the root TM. In this case, because there is no existing coordination context, this TM becomes the root of a new transaction.
- The TM creates a new transaction CoordinationContext. This context contains a URI for the Registration Service. Remember that the Registration Service is used by services to participate in the activity.
- After the client receives the CoordinationContext, it uses the Registration Service URI to register for the Completion protocol. Note that this is only registration; votes to commit or roll back are not collected yet.
- The TM returns the Coordinator Service URI to the client.
- Now the client is ready to propagate the transaction (Tx) to the service. So it sends a message whose header contains the coordination context that the client already has.
- The service finds an existing coordination context in the message, so it knows that it must not start a new Tx; instead, it should join the existing transaction.
- So the service sends the coordination context to its own TM’s Activation Service.
- Now it’s time for the two TMs to talk to each other. The service TM creates a new message, specifying that it is using Durable2PC. It then looks up the Registration Service URI of the client TM, which is contained in the CoordinationContext. Finally, it sends this message to the client TM, which, remember, is the root TM.
- The root TM sends back a response message containing its Coordinator Service URI.
- The service TM then wraps the coordination context in a new one that includes the URI of its own Registration Service. It then sends this context to the service.
- The service then uses this Registration Service to register for the Completion protocol.
- This concludes the setup process needed to coordinate the entire interaction.
- Now when the client, say, commits its operation, the 2PC protocol kicks in. I have already explained 2PC, so I will skip the details here. In summary, the client is already registered for the Completion protocol, so the client tells the root TM that it wants to commit. The root TM also sees that the service TM has registered with it, so it asks the service TM to collect the votes of the entities it is managing. The service TM, in turn, gets the vote of the service and sends the result to the root TM.
- Assuming the service also decides to commit its operation, the root TM now has two votes to commit, so it commits the client operation and commands the service TM to commit the service operation. This concludes the 2PC protocol, and the root TM dismisses the Tx.
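The setup half of this scenario (everything before the commit) can be traced with a short Python sketch. It is a simplified model built on assumptions: `TransactionManager`, `Context`, the `NETWORK` dictionary standing in for HTTP calls, the fixed `tx-1` identifier, and the `*.example` hosts are all hypothetical, and the registrations follow the steps exactly as described above.

```python
from dataclasses import dataclass, field, replace

# Maps a registration URI to the TM that hosts it; stands in for the network.
NETWORK = {}


@dataclass(frozen=True)
class Context:
    tx_id: str
    registration_uri: str


@dataclass
class TransactionManager:
    """Hypothetical TM; URIs and method names are illustrative only."""
    host: str
    registered: list = field(default_factory=list)  # (participant, protocol) pairs

    def __post_init__(self):
        NETWORK[self.registration_uri] = self

    @property
    def registration_uri(self) -> str:
        return f"https://{self.host}/registration"

    def register(self, participant: str, protocol: str) -> str:
        # Registration Service: record the participant, return the coordinator URI.
        self.registered.append((participant, protocol))
        return f"https://{self.host}/coordinator"

    def activate(self, incoming=None) -> Context:
        if incoming is None:
            # No existing context: this TM becomes the root of a new transaction.
            return Context(tx_id="tx-1", registration_uri=self.registration_uri)
        # Existing context: register with the root TM for Durable2PC, then hand
        # out a context that points at this TM's own registration service.
        root = NETWORK[incoming.registration_uri]
        root.register(self.host, "Durable2PC")
        return replace(incoming, registration_uri=self.registration_uri)


client_tm = TransactionManager("client-tm.example")
service_tm = TransactionManager("service-tm.example")

ctx = client_tm.activate()                                   # client asks for a context
NETWORK[ctx.registration_uri].register("client", "Completion")
service_ctx = service_tm.activate(ctx)                       # service joins the same Tx
NETWORK[service_ctx.registration_uri].register("service", "Completion")
```

After these calls, the root TM knows about the client and the service TM, while the service itself is known only to its own TM. Both contexts carry the same transaction id but different registration URIs, which is the wrapping step described above.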
Transaction Flow in WCF
Transaction flow is how clients propagate atomic transactions to services. For this to happen, the WCF binding in use must support it. The bindings that support transactions are:
- Those that allow inter-process communication on the same machine; namely, the NetNamedPipeBinding
- Those that allow TCP-based cross machine calls, such as the NetTcpBinding and the NetMsmqBinding
- And those that allow interoperable communication through WS-AtomicTransaction, such as the WSHttpBinding
Configuring Transaction Flow
There are 3 steps you need to perform to enable transaction flow.
Step1: Enable Transaction
Even for the bindings that support transactions, transaction flow is disabled by default. You have to enable it for both the client and the service. This is done by setting the transactionFlow="true" attribute on the <binding> element, as shown in this example:
Step2: Configuring Transaction Flow for Service Operation
The second step is to configure the behavior of the flow at the WCF operation level.
Consider the following WCF service:
In order to configure transaction flow for this operation, you need to supply the TransactionFlow attribute, with one of 3 possible values for the TransactionFlowOption:
- NotAllowed is the default value. It means that the client cannot propagate its transaction to the service operation. In this case, it does not matter if the binding supports transactions and if it is configured to allow transaction flows; if the client tries to propagate its transaction, it will simply be ignored without any exception being thrown.
- Allowed, means that the operation allows transaction propagation if the client wants to. That is, if the client has created a transaction, and is using a binding which supports transactions, and has allowed transaction flows; then the transaction will be propagated to the service (provided of course that transaction flow is enabled also at the service). In this case if the service is using a binding that does not support transactions, while the client is trying to propagate a transaction, an exception will be thrown.
- Mandatory means that the client and the service must be using a binding that supports transactions, with transaction flow enabled. Any violation of these conditions will result in an exception.
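The three options can be summarized as a small decision function. The Python sketch below only models the rules as stated in this list; it is not WCF code. `resolve_flow`, `FlowError`, and the boolean parameters are hypothetical, and each side's “binding supports transactions and has flow enabled” condition is collapsed into a single flag.

```python
from enum import Enum


class TransactionFlowOption(Enum):
    NOT_ALLOWED = "NotAllowed"
    ALLOWED = "Allowed"
    MANDATORY = "Mandatory"


class FlowError(Exception):
    """Stands in for the exception WCF would raise; the real type differs."""


def resolve_flow(option, client_has_tx, client_flow_enabled, service_flow_enabled):
    """Return True if the client's transaction reaches the service operation."""
    client_sends = client_has_tx and client_flow_enabled
    if option is TransactionFlowOption.NOT_ALLOWED:
        # Any client transaction is silently ignored; no exception is raised.
        return False
    if option is TransactionFlowOption.ALLOWED:
        if client_sends and not service_flow_enabled:
            raise FlowError("client flowed a transaction the service binding cannot accept")
        return client_sends
    # MANDATORY: a transaction must flow, and both sides must be configured for it.
    if not (client_sends and service_flow_enabled):
        raise FlowError("Mandatory requires a flowed transaction on both sides")
    return True
```

For example, `resolve_flow(TransactionFlowOption.ALLOWED, False, True, True)` returns `False` without complaint, while the same arguments with `MANDATORY` raise `FlowError`.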
Step3: Configuring Transaction Management for Service Operation
The third step is to do the actual transaction coding for a WCF operation. This means the code that actually initializes the transaction, commits and aborts it.
The following operation shows how to configure transactions by using the OperationBehavior attribute.
- The “TransactionScopeRequired” property has the same meaning as the “TransactionScope” class of the .NET Framework; it wraps the code block within a transaction scope.
- The “TransactionAutoComplete” property instructs WCF to automatically commit the transaction at the end of the code block, provided no unhandled exception occurs. This is the same as calling the “Complete” method of the “TransactionScope” class.
One final important note: the above configuration means that the operation will use a transaction; however, it says nothing about the source of this transaction. If transaction flow is allowed or mandated, the client transaction is propagated to the operation and becomes the one in use. If, on the other hand, transaction flow is not configured or not allowed, the operation initializes its own transaction and uses it.
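The last note boils down to a simple rule about which transaction the operation ends up in. The Python sketch below expresses that rule with a hypothetical `ambient_transaction` helper; the `"new-local-tx"` label is a placeholder rather than anything WCF produces.

```python
def ambient_transaction(scope_required: bool, flowed_tx_id=None):
    """Return (source, transaction id) for the operation, or None if it has no transaction."""
    if not scope_required:
        # Without a required scope, the operation code runs outside any transaction.
        return None
    if flowed_tx_id is not None:
        # A client transaction was flowed: the operation joins it.
        return ("client", flowed_tx_id)
    # Nothing was flowed: the operation starts a transaction of its own.
    return ("service", "new-local-tx")
```

So `ambient_transaction(True, "tx-42")` yields the client's transaction, while `ambient_transaction(True)` yields a transaction owned by the service.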
Demo: WS-AT in WCF
Download source code from here: http://1drv.ms/1JBTxFn
The “Deduct” Service
- In the Deduct service configuration, notice that the endpoint is using the WSHttpBinding, which implements the WS-AT protocol.
- And in the binding configuration, transaction flow is enabled. As explained before, this means that the service accepts being part of a transaction propagated from the client.
- In the service contract, I have an operation called Deduct, and its TransactionFlow is set to Mandatory; so there must be a transaction propagated from the caller.
- And in the service implementation, I use the OperationBehavior attribute: with the TransactionScopeRequired property, I specify that the operation code will be implicitly wrapped in a transaction scope; and with the TransactionAutoComplete property, the transaction will commit at the end of the transaction scope.
- The Deduct operation internally calls a DBOperation method, which inserts a record into a SQL Server database.
The “Add” Service
The Add service has exactly the same configuration, so I won’t spend more time on it. The only difference is in the implementation of the Add operation.
The Add operation also calls a DBOperation method to insert a record into another SQL database; however, what I have done here is throw an exception, and the Add method returns a value to the caller to indicate whether an exception has been raised.
- In the client console, I wrap my code in a transaction scope because I need to propagate the client transaction to the services.
- Then I simply call the Deduct operation followed by the Add operation. I then check the result of the Add operation, and if I find that no exception has been raised, I commit the transaction.
- I will talk more about the DistributedIdentifier property in a moment, when I run the client.
Enable DTC and WS-AT Protocol
You need to enable DTC and the WS-AT protocol.
First, from the local services list, make sure that the Distributed Transaction Coordinator service is running.
Then, from the Component Services console, open the properties window of the Local DTC and make sure the WS-AT protocol is enabled:
For development and debugging purposes, you will also want to increase the “Default outgoing timeout” property to avoid the transaction being terminated while you are debugging.
Running the Sample
When you first run the client, a transaction has been created at the client. To verify this, get the current transaction information from the Transaction class and then retrieve the local transaction identifier. See that it has a value which means a transaction exists:
However, the important identifier for the sake of this sample is the distributed identifier. Notice that this has no value yet; this indicates that the transaction has not been propagated yet.
Next, step into the Deduct service operation and check the distributed identifier again; this time, notice that it has a value. The reason is that the transaction has been propagated by the client. DTC is now in control, and it has assigned a distributed transaction id to manage the entire transaction:
To see how our code acquires a lock on the RM (SQL Server), run a query to select records from the deduct database before returning to the client from the Deduct operation. You will see that the query just keeps running without returning. The reason is that the Deduct operation has acquired a lock on this database table as part of the atomic transaction. The record inserted by the Deduct operation has not been committed yet, so the data stays locked until the transaction completes.
One interesting thing to do is to check the traffic in Fiddler:
A couple of things to note:
- As I said before, the Activation, Registration, and Coordinator services are implemented implicitly in DTC, so you will not see any traffic related to these services here.
- This sample ran on a single machine, which means the transaction was managed by a single DTC. If I had the services distributed over multiple machines, then, as I explained, one of the DTCs would have become the root TM, and there would have been communication between the two DTCs as part of the 2PC protocol.
- The first log entry shows the request to the Deduct service. The transaction-related traffic is found in the CoordinationContext. The Identifier element has the same distributed transaction id we traced in the code. The CoordinationType is set to the WS-AT namespace. Recall that WS-AT is a coordination type of the generic WS-Coordination standard.
- We can see the address of the RegistrationService hosted within DTC:
- Remember that, in the background, when DTC created the CoordinationContext and handed it over to the client, that context contained the URI of the Registration Service. When the client propagates the transaction to the Deduct service, it sends the same CoordinationContext, including this Registration Service URI. When running over multiple machines, and therefore multiple DTCs, the service-side DTC uses this Registration Service to talk to the root DTC. If any of this is not clear, please review the WS-AT section.
- And the final important thing is the PropagationToken element. This element contains a base64 encoding of a randomly generated propagation token:
- The second log entry shows the client call to the Add service; and there is nothing new to show here. The transaction is propagated the same way, and you will again see the same coordinationcontext flowing from the client to the service.