A method according to the present invention determines the similarity between sequences of symbols using rules that a dictionary-based compression scheme generates from the contents of database columns. Primers hybridizing to regions flanking these biallelic markers are also provided. The method further includes determining a value of a weighting factor based on the activity data.
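The text does not say how the compression rules are built or how similarity is scored. As a minimal sketch under stated assumptions, an LZ78-style phrase dictionary built from the values of one database column can stand in for the "rules", and similarity can be taken as the Jaccard index of the phrases used to greedily parse each sequence. The names `build_rules`, `tokenize`, and `similarity`, the choice of LZ78, and the Jaccard score are all hypothetical illustrations, not the claimed method.

```python
from typing import Iterable


def build_rules(column_values: Iterable[str]) -> set[str]:
    """Build an LZ78-style phrase dictionary from a column's values.

    Assumption: the dictionary is shared across values, but parsing
    restarts at each value boundary.
    """
    rules: set[str] = set()
    for value in column_values:
        current = ""
        for symbol in value:
            current += symbol
            # LZ78: a new phrase is a known phrase extended by one symbol.
            if current not in rules:
                rules.add(current)
                current = ""
    return rules


def tokenize(seq: str, rules: set[str]) -> list[str]:
    """Greedy longest-match parse of seq into dictionary phrases.

    Symbols not covered by any rule become single-symbol tokens.
    """
    max_len = max((len(r) for r in rules), default=1)
    tokens: list[str] = []
    i = 0
    while i < len(seq):
        for length in range(min(max_len, len(seq) - i), 0, -1):
            piece = seq[i:i + length]
            if piece in rules or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens


def similarity(a: str, b: str, rules: set[str]) -> float:
    """Jaccard index of the phrase sets used to parse a and b (hypothetical score)."""
    ta, tb = set(tokenize(a, rules)), set(tokenize(b, rules))
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    rules = build_rules(["abcabc", "abcd"])
    print(sorted(rules))
    print(tokenize("abcd", rules))
    print(similarity("abcab", "abd", rules))
```

Using the phrases the dictionary actually produces means two sequences score as similar when they compress with the same learned fragments; a real implementation might instead weight phrases by frequency or use a different compressor.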