You sometimes find distant relatives who have a GEDCOM you want to merge into your database. You can't just add their file to the end of yours. There are 2 main problems:
You almost always have the second problem, because if you didn't have people in common, there would be no reason to read in their file. But creating duplicates is the last thing you want.
When people ask about merging in another GEDCOM, what they really want is to integrate common people and then merge in the rest.
But before you can integrate the common people, you must first match them. It may be obvious to us (us nerds) that James T. Kirk and Jim Kirk are the same person, but a computer program would see them as different people.
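To make the James vs. Jim problem concrete, here is a minimal sketch of how a program might normalize names before comparing them. The nickname table, the function names, and the decision to drop single-letter initials are all assumptions for illustration; this is not GDBI code.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; a real one would be far larger.
NICKNAMES = {"jim": "james", "jimmy": "james", "bill": "william", "bob": "robert"}


def normalize_name(name):
    """Lowercase, strip periods and GEDCOM surname slashes,
    drop single-letter initials, and expand known nicknames."""
    words = []
    for word in name.lower().replace(".", " ").replace("/", " ").split():
        if len(word) == 1:  # a middle initial such as "T"
            continue
        words.append(NICKNAMES.get(word, word))
    return " ".join(words)


def name_similarity(a, b):
    """Return a ratio from 0.0 to 1.0 for the two normalized names."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()
```

With this table, "James T. Kirk" and "Jim Kirk" both normalize to "james kirk", so they compare as identical, while unrelated names score lower. A ratio is used instead of a yes/no answer so that the user can still be asked about near misses.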
After you match everyone up, there is still one more problem. In some cases the imported details are completely wrong! Even when the differences are minor, like James vs. Jim, you still have to decide which details to accept.
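As a rough sketch of that decision step, a program could sort the imported details of a matched person into those that are simply new and those that conflict with what you already have. The dictionary layout (GEDCOM tag to value) and the function name are assumptions made for this example.

```python
def diff_details(primary, imported):
    """Compare two matched people, each a dict of tag -> value.

    Returns (additions, conflicts):
      additions - details only the imported record has
      conflicts - tag -> (primary value, imported value) where they differ
    Details that are identical in both records are ignored.
    """
    additions, conflicts = {}, {}
    for tag, value in imported.items():
        if tag not in primary:
            additions[tag] = value
        elif primary[tag] != value:
            conflicts[tag] = (primary[tag], value)
    return additions, conflicts
```

Additions can usually be accepted in bulk, while each conflict (James vs. Jim, or a birth date that is completely wrong) needs a decision from the user.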
This means that a GEDCOM merge is actually 4 main steps:
This is the hardest part of the merge process. The first problem is getting started. Once you have the same person identified in both GEDCOMs, the program can move up and down the family tree matching everyone else.
At each point in the family tree, the program can present side-by-side families, and let the user verify they match up. If a man was married twice, he might be paired up with the wrong wife.
When you reject the Wife match, it should then find the other wife and show that family. But in the 2nd GEDCOM the children may not be listed in the right birth order.
When you then reject the Child 1 match, it should rearrange the children until it finds the right match.
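One way to rearrange the children is to ignore birth order entirely and pair each child with the best-scoring candidate in the other family. The sketch below uses a greedy pairing and a made-up scoring rule (given name plus birth year); both are assumptions, and a real matcher would compare many more details.

```python
def child_score(a, b):
    """Hypothetical score: 2 points for the same given name, 1 for the same birth year."""
    score = 0
    if a["given"].lower() == b["given"].lower():
        score += 2
    if a.get("birth") and a.get("birth") == b.get("birth"):
        score += 1
    return score


def pair_children(ours, theirs):
    """Greedily pair each of our children with the best unused child from theirs.

    Returns a list of (our_child, their_child) tuples; their_child is None
    when no remaining candidate scores above zero.
    """
    pairs = []
    unused = list(theirs)
    for child in ours:
        best = max(unused, key=lambda c: child_score(child, c), default=None)
        if best is not None and child_score(child, best) > 0:
            pairs.append((child, best))
            unused.remove(best)
        else:
            pairs.append((child, None))
    return pairs
```

The same idea would apply to a man with two wives: score each candidate spouse and offer the best one first, then fall back to the next when the user rejects it.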
After this family is matched, it should look at the spouse and children for each member of this family.
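The walk described above, starting from one known pair and spreading out through parents, spouses, and children, can be sketched as a breadth-first traversal. The data layout (a dict of person ID to a dict with "parents", "spouses", and "children" lists) and the same_person callback are assumptions for this example; in practice the callback would be the user's verification or a scoring function.

```python
from collections import deque


def walk_matches(db_a, db_b, seed_a, seed_b, same_person):
    """Spread out from a known matching pair and return {id_in_a: id_in_b}."""
    matched = {seed_a: seed_b}
    queue = deque([(seed_a, seed_b)])
    while queue:
        id_a, id_b = queue.popleft()
        for relation in ("parents", "spouses", "children"):
            # Relatives in the second database that are not yet paired up.
            candidates = [r for r in db_b[id_b].get(relation, [])
                          if r not in matched.values()]
            for rel_a in db_a[id_a].get(relation, []):
                if rel_a in matched:
                    continue
                for rel_b in candidates:
                    if same_person(db_a[rel_a], db_b[rel_b]):
                        matched[rel_a] = rel_b
                        candidates.remove(rel_b)
                        queue.append((rel_a, rel_b))
                        break  # stop scanning candidates for this relative
    return matched
```

Anyone who cannot be reached from the starting pair, or whose relatives never pass same_person, is left unmatched and would be treated as a new person.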
... to do ...
... to do ...
... to do ...
It has always been a plan to add merging to GDBI, and work on it has finally begun. At this point it is just analysis code to help decide if 2 people match. Eventually you should be able to run GDBI, open your database, specify a GEDCOM text file, and begin the merge process (described above).
What GDBI has so far is a simple test program for matching people. The hard part of that program is comparing all the details for each possible pair of people (from the old and new database) to have it automatically find the matching people. (Matching them all manually is tedious.) It has a basic GUI for choosing a primary and import database, and then selecting the starting person in each database. It needs to be enhanced to choose which details to take, and then finally to merge.
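The pairwise comparison could look something like the sketch below. It is not the GDBI test program; the field names, weights, and threshold are invented for illustration. Each pair gets a score based on the details both records actually have, and only the best candidate above the threshold is proposed.

```python
# Hypothetical weights: how much agreement on each detail counts.
WEIGHTS = {"name": 3, "birth": 2, "death": 2, "place": 1}


def match_score(a, b):
    """Fraction of the comparable weight on which both records agree."""
    agreed = total = 0
    for field, weight in WEIGHTS.items():
        if a.get(field) is None or b.get(field) is None:
            continue  # a missing detail counts neither for nor against
        total += weight
        if a[field] == b[field]:
            agreed += weight
    return agreed / total if total else 0.0


def best_candidates(old_people, new_people, threshold=0.6):
    """For each old person, propose the best-scoring new person, if good enough."""
    results = {}
    for old_id, old in old_people.items():
        scored = [(match_score(old, new), new_id)
                  for new_id, new in new_people.items()]
        score, new_id = max(scored, default=(0.0, None))
        if score >= threshold:
            results[old_id] = (new_id, score)
    return results
```

Comparing every old person against every new person takes time proportional to the product of the two database sizes, which is one reason to start from a known pair and walk the tree instead.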
If you plan to merge in your relative's GEDCOM from time to time, make sure they have unique IDs for all of their records. That way you will only have to drudge through the matching process once. After that your database will have their IDs, and every time you re-merge, it will find the matching IDs.
Not all genealogy programs have this feature, but some will generate unique values for ID tags. REFN and RFN are standard tags, and _UID is a popular extension. If they can't generate these tags automatically, it helps if they add a few manually to give the matching program a starting point.
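When both files carry such IDs, finding the starting points becomes a simple lookup. The following sketch scans raw GEDCOM text for level-1 REFN, RFN, and _UID lines and pairs up records that share a value. The function names are made up for this example, and the parsing is deliberately simplistic (it ignores continuation lines and character encodings).

```python
ID_TAGS = ("REFN", "RFN", "_UID")


def index_unique_ids(gedcom_text):
    """Map (tag, value) to the record's cross-reference ID, e.g. '@I1@'."""
    index = {}
    current = None
    for line in gedcom_text.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level = parts[0]
        if level == "0":
            # Record lines look like "0 @I1@ INDI"; HEAD and TRLR have no xref.
            current = parts[1] if parts[1].startswith("@") else None
        elif level == "1" and current and parts[1] in ID_TAGS and len(parts) == 3:
            index[(parts[1], parts[2])] = current
    return index


def match_by_id(ours, theirs):
    """Return {our_xref: their_xref} for records sharing an ID tag and value."""
    ours_index = index_unique_ids(ours)
    theirs_index = index_unique_ids(theirs)
    shared = ours_index.keys() & theirs_index.keys()
    return {ours_index[key]: theirs_index[key] for key in shared}
```

Every pair found this way can seed the tree walk, so even a handful of manually added IDs saves work.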
If we expect other programs to generate unique IDs, GDBI needs to do it as well. This can either be done by GDBI itself or by the databases that it connects to (PGV, GenJ, jLL). We recently discussed adding it to PGV.
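As a rough illustration of what generating an ID could involve, the sketch below adds a random 128-bit value to any record that lacks one. This is only an assumption about one possible approach: it is not how GDBI or PGV does it, the record layout is invented, and the value does not follow any particular program's _UID format.

```python
import uuid


def ensure_uid(record):
    """Give the record (a dict of tag -> value) a _UID if it lacks one.

    An existing ID is never overwritten, since changing it would break
    matching against files that were exported earlier.
    """
    if not record.get("_UID"):
        record["_UID"] = uuid.uuid4().hex.upper()  # 32 hex digits, random
    return record["_UID"]
```

The important property is that the ID is assigned once and then travels with the record through every export and import.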
The only thing worse than having no ID is having a non-unique ID, because it defeats the purpose. We need a technique for generating good IDs. This was recently discussed on the GEDCOM-L mailing list, and these ideas were proposed:
$Header: /cvsroot/gdbi/doc/webpage/htdocs/merge.html,v 1.8 2005/02/11 03:15:01 dkionka Exp $