About SOUNDEX
-
SOUNDEX is a phonetic code utilized to index various U.S. Census data since
1880.
It converts letters of the English alphabet only.
-
Soundex was used by the National Archives to index the U.S. censuses. It codes
together surnames that sound similar but have different spellings.
The Soundex Algorithm
Soundex codes begin with the first letter of the surname followed by a
three-digit code that represents the first three remaining consonants. Zeros
will be added to names that do not have enough letters to be coded.
Soundex Coding Guide (Consonants that sound alike have the same code)
1 - B,P,F,V
2 - C,S,G,J,K,Q,X,Z
3 - D,T
4 - L
5 - M,N
6 - R
The letters A,E,I,O,U,Y,H, and W are not coded.
Names with adjacent letters having the same equivalent number are coded as one
letter with a single number.
Surname prefixes such as La, De and Van are generally not used in the soundex.
However, Mc, Mac and O generally are not considered prefixes for soundex.
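The coding rules above (leaving prefix handling aside) can be sketched in a few lines of Python. This is only an illustration, not the source of the calculator on this page: the names `CODES` and `soundex` are made up for the example, and it assumes that every uncoded letter, including H and W, separates adjacent letters with the same code.

```python
# Letter-to-digit table taken from the coding guide above.
CODES = {}
for digit, group in {"1": "BPFV", "2": "CSGJKQXZ", "3": "DT",
                     "4": "L", "5": "MN", "6": "R"}.items():
    for letter in group:
        CODES[letter] = digit


def soundex(name):
    """Return a four-character soundex code (hypothetical helper)."""
    # Keep letters of the English alphabet only.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # code of the previous letter, "" if uncoded
    for c in letters[1:]:
        code = CODES.get(c, "")
        # Adjacent letters with the same code are coded only once.
        if code and code != prev:
            digits.append(code)
        # Assumption: an uncoded letter resets prev, so it acts as a separator.
        prev = code
    # Pad with zeros, then keep the first letter plus three digits.
    return (first + "".join(digits) + "000")[:4]


print(soundex("Lee"))     # L000
print(soundex("Powers"))  # P620
```

The padding and truncation happen in one step at the end, so short names such as Lee and long names such as Ashcraft both come out as one letter followed by three digits.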
Soundex Limitations
Names that sound alike do not always have the same soundex code. For example,
Lee (L000) and Leigh (L200) are pronounced identically, but have different
soundex codes because the silent g in Leigh is given a code.
Names that sound alike but start with a different first letter will always have
a different soundex code. Thus, names such as Carr (C600) and Karr (K600)
should be calculated separately.
Soundex is based on English pronunciation, so European names may not be coded
correctly. For example, some French surnames with silent last letters will not
code according to pronunciation. This is true of French names such as Beaux,
where the x is silent. Sometimes this surname is also spelled Beau (B000) and
is pronounced identically to Beaux (B200), yet the two have different soundex
codes. Although I have given only a French example, this could be true of any
name that does not use English pronunciation.
Sometimes names that don't sound alike have the same soundex code. When I am
searching for the surname Powers (P620), I have to wade through Pierce, Price,
Perez and Park which all have the same soundex code. Yet Power (P600), a common
way to spell Powers 100 years ago, has a different soundex code.
Surnames with prefixes were usually coded without the prefix, but not always.
If you are searching for a surname such as DiCaprio or LaBianca, you should
try the soundex both with and without the prefix.
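As a rough sketch of the "try both" advice, the Python below returns the soundex for the full surname and, when the name starts with a known prefix, for the remainder as well. The helper names and the `PREFIXES` list are assumptions made for this example: La, De and Van come from the text above, Di is added only because of DiCaprio, and a real search would need a longer list. H and W are treated as separators here.

```python
# Letter-to-digit table from the soundex coding guide.
CODES = {letter: digit
         for digit, group in {"1": "BPFV", "2": "CSGJKQXZ", "3": "DT",
                              "4": "L", "5": "MN", "6": "R"}.items()
         for letter in group}

# Illustrative prefix list only (an assumption, not an official list).
PREFIXES = ("VAN", "LA", "DE", "DI")


def soundex(name):
    """First letter plus three digits, padded with zeros (hypothetical helper)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    digits = []
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # uncoded letters act as separators
    return (letters[0] + "".join(digits) + "000")[:4]


def soundex_with_and_without_prefix(surname):
    """Return the codes worth searching: the full name, then the name minus a prefix."""
    codes = [soundex(surname)]
    upper = surname.upper()
    for prefix in PREFIXES:
        # Plain string match: it cannot tell a real prefix from a name
        # that merely starts with the same letters (e.g. Dean).
        if upper.startswith(prefix) and len(upper) > len(prefix):
            stripped = soundex(upper[len(prefix):])
            if stripped and stripped not in codes:
                codes.append(stripped)
            break
    return codes


print(soundex_with_and_without_prefix("DiCaprio"))  # ['D216', 'C160']
print(soundex_with_and_without_prefix("LaBianca"))  # ['L152', 'B520']
```

Names without a listed prefix simply come back with a single code.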
US Census soundex confusion arises with names such as Ashcraft. If the
original soundex coder did not code the H and did not treat it as a separator
between S and C, which share the same code, then the S and C were considered
adjacent letters to be coded only once, and the soundex is A261. In the 1920
NY Census, Ashcraft is found under A261.
Those who coded the soundex for the 1880*, 1900 and 1910** censuses may or may
not have used this rule. They sometimes treated the H as a separator and gave
a number code to each of S and C, instead of coding them as adjacent letters
with a single number. In this case Ashcraft would be A226, the result you
receive with the calculator on this page.
The important thing to know is that the US Census was not consistent in using
the letters H and W as separators between adjacent letters. If you are trying
to calculate the soundex for a name in which H or W separates two letters with
the same code, it is best to calculate the soundex using both methods to
locate the name in the US census. This would be true of any name that has any
of the letters C,S,G,J,K,Q,X,Z on both sides of the letter H or W, such as
SHC, SHS, CHS, KHZ, SWS, KWS, CWK.
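The two methods can be compared side by side with a small Python sketch. The `hw_separates` flag and the `census_candidates` helper are hypothetical names invented for this example. With `hw_separates=True`, H and W break up adjacent letters with the same code, as the calculator on this page does. With `hw_separates=False`, H and W are skipped entirely, so the letters on either side still count as adjacent.

```python
# Letter-to-digit table from the soundex coding guide.
CODES = {letter: digit
         for digit, group in {"1": "BPFV", "2": "CSGJKQXZ", "3": "DT",
                              "4": "L", "5": "MN", "6": "R"}.items()
         for letter in group}


def soundex(name, hw_separates=True):
    """Soundex with a switch for the H/W rule (hypothetical helper)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    digits = []
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW" and not hw_separates:
            # Skip H/W without touching prev: neighbours stay "adjacent".
            continue
        code = CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # vowels (and H/W when hw_separates) reset prev
    return (letters[0] + "".join(digits) + "000")[:4]


def census_candidates(name):
    """Return every distinct code the two methods produce, sorted."""
    return sorted({soundex(name, hw_separates=True),
                   soundex(name, hw_separates=False)})


print(soundex("Ashcraft", hw_separates=True))   # A226
print(soundex("Ashcraft", hw_separates=False))  # A261
print(census_candidates("Ashcraft"))            # ['A226', 'A261']
```

For most names the two methods agree, and `census_candidates` then returns a single code.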
A surname of more than one word, or a surname that commonly comes before a
given name, as with Native American and Chinese names, may have been coded
under the name which appears last, even though it might not be the actual
surname. In the case of multi-word surnames, only the last word may have been
coded.