
Reconcile Your Data with a Smarter Merge

Dino Esposito
Windows Developer Network

Posted:

Monday, September 01, 2003

Applies To:


  • ADO.NET
  • .NET Framework

Summary:

Reconciling a changes-only copy of your data can be tricky, particularly if it contains new records as well as updates.

The disconnected nature of ADO.NET brings up a number of new issues. Data conflicts, record locking, and optimistic concurrency are only the most prominent. Another, more subtle, problem can easily show up if you're building real-world, highly scalable applications and make intensive use of disconnected data. At the foundation of the ADO.NET disconnected model lies the assumption that you load and save data at different times, with no form of client/server connectivity in the meantime. How do you deal with server-generated values? How do you merge and reconcile the client copy of the data with the modified copy produced by the database server? And, more importantly, how do you accomplish all that using ADO.NET tools?

In general, when it comes to submitting a bunch of updates to a remote database, the right tool to use is batch update. ADO.NET, in particular, provides a handful of interesting features, including the capability to retrieve and post back server-generated values. Turned on by default, this feature works only if you use stored procedures or SQL batches to perform updates. The reason is that an extra SELECT statement must be appended to the original command so that ADO.NET can read those values back; once the command terminates, the ADO.NET infrastructure retrieves them and automatically updates the client copy of the DataSet. I discussed the underpinnings and implications of this technique in the May "Inside .NET" column.
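As a concrete illustration of the technique, the following sketch configures an InsertCommand as an "INSERT plus SELECT" batch. The table and column names (Employees, employeeid, lastname) are assumptions modeled loosely on the article's sample schema, not taken from its downloadable source:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class ServerValuesDemo
{
    // Builds an InsertCommand whose trailing SELECT reads the identity
    // value back from SQL Server. Table and column names are assumptions.
    public static SqlCommand BuildInsertCommand(SqlConnection conn)
    {
        SqlCommand cmd = new SqlCommand(
            "INSERT INTO Employees (lastname) VALUES (@lastname);" +
            " SELECT employeeid = SCOPE_IDENTITY()", conn);
        cmd.Parameters.Add("@lastname", SqlDbType.NVarChar, 20, "lastname");

        // Copy the first record returned by the batch back into the
        // DataRow being updated: this is the switch that makes ADO.NET
        // overwrite the client-side value with the server-side one.
        cmd.UpdatedRowSource = UpdateRowSource.FirstReturnedRecord;
        return cmd;
    }
}
```

When a SqlDataAdapter runs this command during Update, the UpdatedRowSource setting tells ADO.NET where to find the values to copy back into the row.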

In this article, we'll use the same technique in a particular scenario; specifically, when you have a separate data tier that receives a changes-only DataSet from the client.

Where the Problem Lies


Highly scalable, real-world applications have a data tier placed somewhere between the client and the remote database. Moving data from the client to the database, and vice versa, is an operation that can affect the throughput of the application and its overall performance. The DataSet is a serializable object and, as such, can be transferred from tier to tier. For performance and scalability reasons, though, you normally want to avoid transferring the whole set of data that the client holds and displays. Well-designed clients extract a subset of rows from the DataSet tables and send only those. It goes without saying that the rows extracted are the ones that have changed since the last commit to the database.

Suppose now that a user has added new records to one of those tables. Suppose also that one of those tables has an autoincrement column. ADO.NET can generate autoincrement values, but those are fake numbers that are guaranteed to be unique only in the context of the DataSet. When you submit a new row to the database, the batch update process intelligently recognizes autoincrement and read-only columns and excludes them from the commands. In this way, the ADO.NET-generated value is never passed to the server, and at the end of the batch it's automatically replaced with the actual, server-generated value. So far, so good. What happens, though, if the DataSet sent to the data tier for update is a changes-only copy? The server-side values end up in that copy of the DataSet, not in the original, which must then be reconciled.
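A common way to make the client-side placeholders obviously fake is to seed the autoincrement column with negative values, so they can never collide with the positive identities the database assigns. This is a general mitigation, not the article's workaround, and the schema below is invented for illustration:

```csharp
using System;
using System.Data;

class AutoIncrementDemo
{
    // Returns a table whose placeholder IDs run downward from -1.
    public static DataTable BuildEmployees()
    {
        DataTable employees = new DataTable("Employees");
        DataColumn id = employees.Columns.Add("employeeid", typeof(int));
        employees.Columns.Add("lastname", typeof(string));

        // Negative seed and step keep client-side IDs out of the
        // range of server-generated identity values.
        id.AutoIncrement = true;
        id.AutoIncrementSeed = -1;
        id.AutoIncrementStep = -1;
        return employees;
    }

    static void Main()
    {
        DataTable t = BuildEmployees();
        DataRow r1 = t.NewRow(); r1["lastname"] = "Smith"; t.Rows.Add(r1);
        DataRow r2 = t.NewRow(); r2["lastname"] = "Jones"; t.Rows.Add(r2);

        Console.WriteLine(r1["employeeid"]);   // -1
        Console.WriteLine(r2["employeeid"]);   // -2
    }
}
```

The negative values still must be replaced with the real ones after the update, but at least they can't silently collide with existing rows.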

At first glance, this doesn't look too difficult, because the DataSet features the Merge method, which merges two DataSet objects that have largely similar schemas. A merge is typically used on a client application to incorporate the latest changes from a database into an existing DataSet, giving the client a refreshed DataSet with the latest data from the data source. However, when you call Merge to reconcile the original copy of the DataSet with the changes-only copy modified on the server, something happens that is not exactly what you want. The Merge method has no way to recognize as the same row two otherwise identical rows that now have different values in the autoincrement field. The row in the source DataSet retains the ADO.NET-generated value; the copy of the row that was passed in the changes-only DataSet has been updated with the real value for the column. To the Merge method, they are distinct rows, and both are left in the DataSet at the end of the merge.

We'll first review the basics of the GetChanges and Merge methods and then discuss a workaround for the aforementioned problem.

Getting Changes


The DataSet object provides an ad hoc method to let you obtain a child DataSet made of changed rows only.

DataSet changes = sourceDataSet.GetChanges();

The GetChanges method walks through the collection of tables in the source DataSet and copies all the rows that have changed. The method has an overload that lets you select rows based on the type of change. For example, the following code snippet shows how to select only the rows that have been added.


DataSet added;
added = sourceDataSet.GetChanges(DataRowState.Added);


GetChanges works on all tables contained in the DataSet. There's no direct way to select only the changes that occurred on a particular table; in that case, you have to use the GetChanges method defined on the DataTable class.

Note that the changes-only DataSet contains all the tables defined in the schema, regardless of whether they have been changed. If a table is unchanged, it will simply be empty in the final DataSet. The GetChanges method works by first cloning the schema of the original DataSet (which ensures that all the tables are there) and then iterating through the tables looking for changes. To clone the schema of a DataSet, you use the Clone method; no data is copied in this case. To obtain a deep copy, in which both schema and data are duplicated, you must resort to the Copy method.

During the copy, constraints are suspended on the DataSet being created: the EnforceConstraints property is temporarily set to false. When the method terminates, constraints are restored as in the original. Finally, note that GetChanges returns null if the original DataSet has no pending changes; the HasChanges method lets you check for this condition.
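The points above can be exercised in a few lines. This is a minimal sketch with an invented one-column Employees table; the key detail is guarding with HasChanges before calling GetChanges, since the latter returns null when nothing is pending:

```csharp
using System;
using System.Data;

class GetChangesDemo
{
    // Counts the pending insertions in the Employees table, guarding
    // against the null return value of GetChanges.
    public static int CountAdded(DataSet ds)
    {
        if (!ds.HasChanges(DataRowState.Added))
            return 0;
        DataSet added = ds.GetChanges(DataRowState.Added);
        return added.Tables["Employees"].Rows.Count;
    }

    static void Main()
    {
        DataSet ds = new DataSet();
        DataTable t = ds.Tables.Add("Employees");
        t.Columns.Add("lastname", typeof(string));
        ds.AcceptChanges();                 // current state is "clean"

        t.Rows.Add("Smith");                // one pending insertion
        Console.WriteLine(CountAdded(ds));            // 1

        // Per-table variant: DataTable.GetChanges returns a DataTable.
        Console.WriteLine(t.GetChanges().Rows.Count); // 1
    }
}
```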

The Merge Operation


The Merge method is typically called at the end of a series of procedures that validate changes, reconcile errors, and update the data source. The merge operation is strictly connected with the update process, and with the GetChanges method in particular: it's by calling Merge that you finally refresh the existing DataSet with the version returned by the data tier.

The Merge method compares and manages rows as well as columns. The first operation Merge accomplishes is comparing the schemas of the DataSet objects involved, because there are situations in which the data and middle tiers also modify the schema of the original table, typically by adding columns.

If the DataSet being merged with the original contains extra columns, what happens depends on the value assigned to the MissingSchemaAction argument. The Merge method features several overloads, as shown here:


public void Merge(DataRow[]);
public void Merge(DataSet);
public void Merge(DataTable);
public void Merge(DataSet, bool);
public void Merge(DataRow[], bool, MissingSchemaAction);
public void Merge(DataSet, bool, MissingSchemaAction);
public void Merge(DataTable, bool, MissingSchemaAction);


If omitted, MissingSchemaAction defaults to Add, meaning that the merged DataSet will contain the added schema and its data. The other options are AddWithKey, Error, and Ignore. The Boolean value in the signatures indicates whether changes pending in the target DataSet should be preserved during the merge. The default is false, meaning that incoming values overwrite any pending changes in matching rows.
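The effect of MissingSchemaAction can be seen in isolation. In this sketch (schema and values invented for illustration), the incoming DataSet carries an extra column; Ignore drops it from the merge, while Add grows the target schema:

```csharp
using System;
using System.Data;

class MergeOptionsDemo
{
    // Merges 'source' into 'target' with the given schema policy and
    // reports how many columns the Employees table has afterward.
    public static int MergeAndCount(DataSet target, DataSet source,
                                    MissingSchemaAction action)
    {
        target.Merge(source, false, action);
        return target.Tables["Employees"].Columns.Count;
    }

    static void Main()
    {
        DataSet target = new DataSet();
        DataTable t = target.Tables.Add("Employees");
        t.Columns.Add("employeeid", typeof(int));
        t.PrimaryKey = new DataColumn[] { t.Columns["employeeid"] };
        t.Rows.Add(1);
        target.AcceptChanges();

        // The incoming copy carries an extra column.
        DataSet source = target.Copy();
        source.Tables["Employees"].Columns.Add("title", typeof(string));

        // Ignore: the extra column is left out of the merge.
        Console.WriteLine(MergeAndCount(target, source,
            MissingSchemaAction.Ignore));   // 1

        // Add (the default): the target schema grows to include it.
        Console.WriteLine(MergeAndCount(target, source,
            MissingSchemaAction.Add));      // 2
    }
}
```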

After the schemas are merged, the data is merged, and after that, constraints, relations, and extended properties. During a merge, constraints are disabled. If, at the end of the merge, any constraints cannot be re-enabled, a ConstraintException is thrown. In that case the merged data is retained, but the EnforceConstraints property is left set to false and all invalid rows are marked in error. The errors must be cleared before EnforceConstraints can be set back to true.
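Here is a small sketch of that failure mode, with a made-up table and a unique constraint that the merged data violates. The assumption being demonstrated is that Merge throws ConstraintException and leaves EnforceConstraints set to false rather than discarding the merged rows:

```csharp
using System;
using System.Data;

class MergeConstraintDemo
{
    // Returns true if the merge throws ConstraintException and leaves
    // the DataSet with constraints disabled.
    public static bool MergeViolates()
    {
        DataSet target = new DataSet();
        DataTable t = target.Tables.Add("Codes");
        t.Columns.Add("id", typeof(int));
        t.Columns.Add("code", typeof(string));
        t.PrimaryKey = new DataColumn[] { t.Columns["id"] };
        t.Constraints.Add(new UniqueConstraint(t.Columns["code"]));
        t.Rows.Add(1, "A");
        target.AcceptChanges();

        // The incoming row has a new key but duplicates 'code'.
        DataSet source = target.Clone();
        source.Tables["Codes"].Rows.Add(2, "A");

        try
        {
            target.Merge(source);
            return false;                   // no exception raised
        }
        catch (ConstraintException)
        {
            // The merged data is kept; the DataSet is flagged instead.
            return target.EnforceConstraints == false;
        }
    }

    static void Main()
    {
        Console.WriteLine(MergeViolates());
    }
}
```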

When merging a new source DataSet into the target, rows are matched using primary key values. When the primary key column is also an autoincrement column, you run into a snag — the same row is represented in the two DataSet objects to merge with different values. Both values are, for different reasons, unique and valid in their own context. Only one of them, though, is acceptable. We need to get rid of the original value and replace it with the more-reliable, server-generated value in the changes-only DataSet.
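The snag can be reproduced entirely in memory, with no database involved. In this sketch the schema is invented, and 24 stands in for the server-generated identity value shown in the article's figures:

```csharp
using System;
using System.Data;

class DuplicateRowDemo
{
    // Simulates the round trip: an added row gets a fake id (0), the
    // data tier commits it and rewrites the key, and the merge back
    // into the original fails to match the two copies.
    public static int MergeBack()
    {
        DataSet original = new DataSet();
        DataTable t = original.Tables.Add("Employees");
        DataColumn id = t.Columns.Add("employeeid", typeof(int));
        id.AutoIncrement = true;
        t.Columns.Add("lastname", typeof(string));
        t.PrimaryKey = new DataColumn[] { id };

        DataRow row = t.NewRow();
        row["lastname"] = "Smith";
        t.Rows.Add(row);                    // ADO.NET assigns id 0

        // Simulate the data tier returning the changes-only copy with
        // the real key and a committed row state.
        DataSet changes = original.GetChanges();
        changes.Tables["Employees"].Rows[0]["employeeid"] = 24;
        changes.AcceptChanges();

        original.Merge(changes);
        return original.Tables["Employees"].Rows.Count;
    }

    static void Main()
    {
        // Keys 0 and 24 don't match, so Merge keeps both rows.
        Console.WriteLine(MergeBack());     // 2
    }
}
```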

The Sample Application


The following snippet shows the code that runs when users click the Save button in the sample application of Figure 1:


private void button2_Click(object sender, System.EventArgs e)
{
    Merger m = new Merger();
    DataSet ds = m_dataSet.GetChanges();
    ds = m.Save(ds, "Employees");
    m_dataSet.Merge(ds);

    // bind to grid
    BindData();
}


The Merger class represents a middle-tier component that takes care of connecting to the data source and performing reads and updates. You pass it the subset of the current DataSet that contains only changes and get back a new DataSet. If no row value is modified on the server, the returned DataSet is entirely contained in the original, just as it was at the beginning of the procedure. In the case of autoincrement columns and insertions, though, there's a significant difference between the two: the value of the autoincrement column. The call to Merge in the snippet above should load any changes that occurred on the server into the original DataSet. Or at least, that is what you expect to happen.

The employeeID column is an autoincrement column. The ID for the last record has been automatically generated by ADO.NET, and that value is unlikely to match the one the database will assign to the same field. If you simply perform the merge as in the code above, what you get is exactly what Figure 2 shows.

The true autoincrement value is 24; the other record is the old copy, whose autoincrement value was generated by ADO.NET. Since the key values are different, the framework can't recognize the two records as the same row. As a result, an undesired extra record is added.

Smart Merging


Smart merging is a procedure that removes the undesired records and replaces them with the versions modified on the server. It's worth noting that the problem doesn't occur in all cases, but only when the autoincrement column is also the table's primary key. All in all, though, this is a rather common scenario.

The following code illustrates a Merge method defined on a middle-tier object. The method takes two DataSet objects and merges them. In this particular implementation, it also takes a third argument: the name of the single table to process. A more general implementation would loop through all the tables and consider other aspects of merging as well.


public DataSet Merge(DataSet source,
    DataSet changes, string member)
{
    // First, remove added
    // records from the source
    DataTable dt = source.Tables[member];
    DataRow[] rows = dt.Select(
        "", "", DataViewRowState.Added);
    foreach (DataRow row in rows)
        dt.Rows.Remove(row);

    // Second, merge as usual
    source.Merge(changes);
    return source;
}


The idea is that you first remove from the original DataSet all the records that have been added, and then merge as usual. In this way, you basically partition the original DataSet into two disjoint segments, which prevents any data duplication at the root. Figure 3 shows the final result. The source code for the sample app is available online.
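To see the workaround produce the result of Figure 3, here is a self-contained run of the same remove-then-merge logic. The schema and the key value 24 are the same stand-ins used earlier, not taken from the downloadable sample:

```csharp
using System;
using System.Data;

class SmartMergeDemo
{
    // The article's workaround: drop pending-added rows from the
    // original before merging in the reconciled changes-only copy.
    public static DataSet SmartMerge(DataSet source, DataSet changes,
                                     string member)
    {
        DataTable dt = source.Tables[member];
        DataRow[] rows = dt.Select("", "", DataViewRowState.Added);
        foreach (DataRow row in rows)
            dt.Rows.Remove(row);
        source.Merge(changes);
        return source;
    }

    public static DataTable RunDemo()
    {
        DataSet original = new DataSet();
        DataTable t = original.Tables.Add("Employees");
        DataColumn id = t.Columns.Add("employeeid", typeof(int));
        id.AutoIncrement = true;
        t.Columns.Add("lastname", typeof(string));
        t.PrimaryKey = new DataColumn[] { id };

        DataRow row = t.NewRow();
        row["lastname"] = "Smith";
        t.Rows.Add(row);                    // client-side id 0

        // The data tier commits the row and rewrites the key.
        DataSet changes = original.GetChanges();
        changes.Tables["Employees"].Rows[0]["employeeid"] = 24;
        changes.AcceptChanges();

        SmartMerge(original, changes, "Employees");
        return original.Tables["Employees"];
    }

    static void Main()
    {
        DataTable merged = RunDemo();
        Console.WriteLine(merged.Rows.Count);            // 1
        Console.WriteLine(merged.Rows[0]["employeeid"]); // 24
    }
}
```

A single row survives, carrying the server-generated key, which is exactly the outcome the naive merge failed to deliver.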

Dino Esposito is Wintellect's ADO.NET and XML expert and is a trainer and consultant based in Rome, Italy. He is a contributing editor to MSDN Magazine, writing the "Cutting Edge" column, and is the author of several books for Microsoft Press, including Building Web Solutions with ASP.NET and Applied XML Programming for .NET. Contact him at dinoe@wintellect.com.
