opensextant.phonetics
/Users/ubaldino/workspace/opensource/Xponents-Core/src/main/python/opensextant/phonetics.py

Geocoding Phonetics Library
 
:created: Mar 15, 2012
:author: ubaldino
:copyright:  MITRE Corporation, (c) 2010-2012
 
Requirements: the advas.phonetics library is used here; a modified version of it is included in this package.

 
Classes
       
builtins.object
PhoneticMap

 
class PhoneticMap(builtins.object)
    PhoneticMap(p)
 
Convenience class that maps a single phoneme to the list of names that share that phoneme.
 
  Methods defined here:
__init__(self, p)
Initialize self.  See help(type(self)) for accurate signature.
add(self, name)

Data descriptors defined here:
__dict__
dictionary for instance variables (if defined)
__weakref__
list of weak references to the object (if defined)
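
As an illustration of the phoneme-to-names grouping described for PhoneticMap, here is a minimal sketch. It is not the library's implementation: the class name PhoneticMapSketch, the attribute names phoneme and names, the duplicate handling, and the code value "SM0" are all assumptions made for this example.

```python
class PhoneticMapSketch:
    """Hypothetical stand-in for PhoneticMap: one phoneme, many names."""

    def __init__(self, p):
        self.phoneme = p  # assumed attribute name
        self.names = []   # assumed attribute name

    def add(self, name):
        # Assumption: duplicates are skipped; the real class may keep them.
        if name not in self.names:
            self.names.append(name)


# "SM0" is only an illustrative code value, not verified library output.
entry = PhoneticMapSketch("SM0")
for name in ("smith", "smyth", "smith"):
    entry.add(name)
print(entry.phoneme, entry.names)
```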

 
Functions
       
get_phonetic_initials(phrase)
Convert a phrase into its acronym.  Do this only when you know the phrase
is a phonetic spelling, e.g., Tango Bravo = TB
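
The acronym conversion can be sketched as taking the first letter of each word. The function name initials_sketch is hypothetical, and preserving the input case is an assumption; the library function may normalize case differently.

```python
def initials_sketch(phrase):
    """Return the first letter of each whitespace-separated word."""
    # Assumption: case is preserved as given in the input.
    return "".join(word[0] for word in phrase.split())


print(initials_sketch("Tango Bravo"))  # TB
```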
get_phonetic_phrase(word)
Convert a code word into its expanded phonetic spelling,
e.g., given tb, generate tango bravo.
Input is assumed to be lowercase.
 
:param word: lower case code word
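
The expansion can be sketched as a per-character lookup. The names VISIBLE_ALPHABET, VISIBLE_NUMBERS, A2W, and phrase_sketch are hypothetical. The tables contain only the entries shown in the Data section of this page (the elided entries are left out), and passing unknown characters through unchanged is an assumption.

```python
# Only the entries visible in this page's Data section; the rest is elided there.
VISIBLE_ALPHABET = [
    "alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf",
    "hotel", "india", "juliet", "kilo", "lima", "mike", "november",
    "oscar", "papa", "quebec", "romeo", "sierra", "tango",
]
VISIBLE_NUMBERS = [
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
]

# Key each code word by its first letter, and each number word by its digit.
A2W = {word[0]: word for word in VISIBLE_ALPHABET}
A2W.update({str(i): word for i, word in enumerate(VISIBLE_NUMBERS)})


def phrase_sketch(word):
    """Expand a lowercase code word character by character."""
    # Assumption: characters without an entry are passed through unchanged.
    return " ".join(A2W.get(ch, ch) for ch in word)


print(phrase_sketch("tb"))  # tango bravo
```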
match_filter_phonetically(target, targetlen, test, testlen, max_len_diff, max_edit_dst)
Does test match target phonetically?  Usage: given target, find each test in [a, b, c, d...] that matches target.
 
For performance reasons, the caller is assumed to supply lowercase versions of target and test,
along with the lengths of both.
 
:param target:      the term you want to match against.
:param targetlen:   length of target.
:param test:        the candidate term to test.
:param testlen:     length of test.
:param max_len_diff:  maximum allowed length difference; a basic length filter.
:param max_edit_dst:  maximum allowed edit distance, assessed last.
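
The parameter list suggests a cheap length filter followed by an edit-distance check. The sketch below shows only those two stages; the phonetic comparison itself is omitted, the return type is assumed to be a boolean, and the names edit_distance and filter_sketch are hypothetical.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance, computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def filter_sketch(target, targetlen, test, testlen, max_len_diff, max_edit_dst):
    # Stage 1: cheap length filter, using the precomputed lengths.
    if abs(targetlen - testlen) > max_len_diff:
        return False
    # Stage 2: the more expensive edit-distance check runs last.
    return edit_distance(target, test) <= max_edit_dst


target = "smith"
candidates = ["smyth", "smithsonian", "jones"]
matches = [c for c in candidates
           if filter_sketch(target, len(target), c, len(c), 2, 1)]
print(matches)  # ['smyth']
```

Passing the lengths in explicitly mirrors the documented signature: when one target is compared against many candidates, its length only needs to be computed once.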
match_phonetically(a, b)
Attempt to match two words by the phonetic similarity of their initials.
Limitation:  F and PH are the intended match, but for now treating F and P as equivalent suffices.
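
Matching on initials can be sketched with the consonance sets listed under Data. This is an illustration under assumptions, not the library's code: the function name initials_match_sketch is hypothetical, and lowercasing the initials and rejecting empty strings are choices made for this example.

```python
# Copied from the ARRAY_OF_PHONETICS value shown in the Data section.
ARRAY_OF_PHONETICS = [
    {'c', 'k', 'q'}, {'f', 'p'}, {'g', 'j', 'y'},
    {'c', 's', 'z'}, {'s', 'z'}, {'v', 'w'},
]


def initials_match_sketch(a, b):
    """True if the first letters are equal or share a consonance set."""
    if not a or not b:
        return False  # assumption: empty input never matches
    x, y = a[0].lower(), b[0].lower()
    if x == y:
        return True
    return any(x in group and y in group for group in ARRAY_OF_PHONETICS)


print(initials_match_sketch("filip", "philip"))  # True: f and p share a set
print(initials_match_sketch("smith", "jones"))   # False: no set holds s and j
```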
phonetic_code(tok)
An application of the Advas phonetics library.
Metaphone appears to generate about a fourth of the matches that Caverphone does;
that is, Caverphone gives looser, noisier similarity matching.
CAVEAT:  If you change the phonetic algorithm, you must re-pickle.
 
WINNER: metaphone.
phonetic_params(termlen)
Get the matching parameters for a given term length.
:param termlen: length of the term
:return:
phonetic_redux(tok)

 
Data
        ARRAY_OF_PHONETICS = [{'c', 'k', 'q'}, {'f', 'p'}, {'g', 'j', 'y'}, {'c', 's', 'z'}, {'s', 'z'}, {'v', 'w'}]
JA_CONSONNANCE = {'g', 'j', 'y'}
KA_CONSONNANCE = {'c', 'k', 'q'}
PH_CONSONNANCE = {'f', 'p'}
REDUCE_CONSONNANCE = {'bb', 'cc', 'dd', 'ff', 'gg', 'kk', ...}
SA_CONSONNANCE = {'c', 's', 'z'}
WA_CONSONNANCE = {'v', 'w'}
XA_CONSONNANCE = {'s', 'z'}
ascii_lowercase = 'abcdefghijklmnopqrstuvwxyz'
digits = '0123456789'
phonetic_a2w = {'0': 'zero', '1': 'one', '2': 'two', '3': 'three', '4': 'four', '5': 'five', '6': 'six', '7': 'seven', '8': 'eight', '9': 'nine', ...}
phonetic_alphabet = ['alpha', 'bravo', 'charlie', 'delta', 'echo', 'foxtrot', 'golf', 'hotel', 'india', 'juliet', 'kilo', 'lima', 'mike', 'november', 'oscar', 'papa', 'quebec', 'romeo', 'sierra', 'tango', ...]
phonetic_numbers = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine']
phonetic_w2a = {'alpha': 'a', 'bravo': 'b', 'charlie': 'c', 'delta': 'd', 'echo': 'e', 'eight': '8', 'five': '5', 'four': '4', 'foxtrot': 'f', 'golf': 'g', ...}