V BASE was compiled from all available human germline V gene sequences (V, D or J segments) taken from the literature and from the EMBL/GenBank databases. Currently, 961 sequences from 145 different publications make up the V BASE database. Where sequences from different publications have identical coding regions, there is a single V BASE entry, so there are currently 638 different sequences in V BASE. For further details on how V BASE was compiled, see below.
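The gap between the 961 source sequences and the 638 distinct entries comes from grouping records by identical coding sequence. The following minimal Python sketch shows that grouping. It assumes each source record is a hypothetical `(name, coding_sequence)` pair that has already been trimmed to its exon. The function name `collapse_identical` and the toy sequences are illustrative only and are not the tooling used to build V BASE.

```python
from collections import defaultdict


def collapse_identical(records: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (name, coding_sequence) records so each distinct sequence is one entry.

    Hypothetical helper: sequences are compared as exact strings, and the
    names are kept in the order in which the records were supplied.
    """
    entries: defaultdict[str, list[str]] = defaultdict(list)
    for name, seq in records:
        entries[seq].append(name)  # identical coding regions share one entry
    return dict(entries)


# Toy records (not real V BASE data): three sources, two distinct sequences.
toy = [("seqA", "ACGTAC"), ("seqB", "ACGTAC"), ("seqC", "GGTTCA")]
print(len(toy), "sources ->", len(collapse_identical(toy)), "entries")
```

On the real data, the number of keys in the returned dictionary would correspond to the number of distinct V BASE entries.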

V BASE was compiled as follows:

1. Individual sequences were trimmed to leave only the V, D or J exon. Where available, this includes the nucleotides immediately 3' of the last codon in the V exon and immediately 5' of the first codon in the J exon, but it excludes recombination signals and splice sites. Any primer sites have also been removed.

2. In several cases a sequence name has been truncated because of its length; this is indicated by two dots. For example, b9-12,33,35 has been shortened to b9..

3. Sequence names that contain characters not permitted in file names have been altered. For example, Greek characters are replaced by the words psi, phi, etc.

4. Because names may have been truncated or altered, once a germline match has been established, always refer to the original manuscript cited in the reference list for the correct form of the sequence name.

5. Where a single sequence has two different names, the less commonly used name is given in round brackets. For example, VKRF(humkv325).

6. Where two sequences published by different groups have the same name, or where the same sequence has later been corrected, the name is suffixed with the name of the first author of the manuscript in which it appears. For example, VH251 Shen and VH251 Sanz.

7. In some cases the sequence described in the manuscript (m) differs from the one submitted to the EMBL data libraries (e). Both sequences have been included in V BASE. For example, b28m and b28e.

8. Occasionally there is a conflict between the nucleotide and protein sequences, or between the same nucleotide sequence in different tables or figures. Wherever possible, we have attempted to resolve these conflicts.

9. Where two or more exon sequences are identical at the nucleotide level, there is a single entry in V BASE, named after the individual sequences separated by a slash. Where there are more than two identical sequences, the name of the V BASE entry is truncated (indicated by three dots), and a list of all the identical sequences can be found in the reference list. For example, the four sequences V79, VIV-4b, VH4.19 and VH4-MC4 have identical exon sequences, so in the directory the V BASE entry is called V79/VIV-4b...

10. Where 'identical' sequences are of different lengths (because some of them are partial sequences), the sequence provided is that of the first-named sequence. Occasionally, a partial sequence is 'identical' to two or more longer sequences that differ outside the region covered by the shorter sequence. In these cases the partial sequence or sequences may appear more than once. For example, HC16-16/DP-82, VHGL3.5/DP-82 and VHGL3.7/DP-82.

11. Where known, the subgroup or family of each VH, VK, VL and D sequence is provided.

12. Where known, the locus or loci to which each VH, VK, VL and D sequence corresponds is given (indicated by a + at the end of the V BASE entry). Where the same sequence corresponds to two or more different loci, they are separated by a slash. For VH, 5-a, 4-b, 1-c, 3-d, 1-e, 1-f, 3-g and 3-h correspond to the eight VH loci (from JH-proximal to JH-distal) that have been sequenced but are not precisely located on the map (see maps). If the map location is outside the functional locus, the chromosome on which it is located is given instead, and these sequences do not have a + at the end of the V BASE entry.
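Steps 3, 9 and 10 describe deterministic naming rules. The Python sketch below shows one way to apply them and is not the tooling used to build V BASE. The Greek-letter table, the function names (`file_safe_name`, `merged_entry_name`, `partial_entry_names`) and the assumption that a partial sequence counts as 'identical' when it occurs verbatim inside a longer one are all hypothetical.

```python
# Hypothetical sketch of V BASE naming rules 3, 9 and 10.

# Step 3 (assumed, incomplete mapping): spell out Greek letters that are
# not allowed in file names.
GREEK_WORDS = {"ψ": "psi", "Ψ": "psi", "φ": "phi", "Φ": "phi"}
_GREEK_TABLE = str.maketrans(GREEK_WORDS)


def file_safe_name(name: str) -> str:
    """Replace Greek characters with their spelled-out names."""
    return name.translate(_GREEK_TABLE)


def merged_entry_name(names: list[str]) -> str:
    """Step 9: name a merged entry after its identical sequences.

    Names are joined with a slash. With more than two names, only the
    first two are kept and three dots mark the truncation.
    """
    if len(names) > 2:
        return "/".join(names[:2]) + "..."
    return "/".join(names)


def partial_entry_names(partial_name: str, partial_seq: str,
                        full_length: dict[str, str]) -> list[str]:
    """Step 10: one entry per longer sequence that the partial matches.

    Assumes 'identical' means the partial sequence occurs verbatim inside
    the longer one. The longer sequence, which is the one provided, is
    named first.
    """
    return [f"{name}/{partial_name}"
            for name, seq in full_length.items()
            if partial_seq in seq]
```

With the names from step 9, `merged_entry_name(["V79", "VIV-4b", "VH4.19", "VH4-MC4"])` returns `"V79/VIV-4b..."`, which matches the directory entry described there. A partial sequence found inside three longer sequences produces three names, in the same way as the HC16-16/DP-82 example in step 10. Matching on toy strings here says nothing about how the real alignments were made.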