Python: module opensextant.phonetics

opensextant.phonetics

/Users/ubaldino/workspace/opensource/Xponents-Core/src/main/python/opensextant/phonetics.py

Geocoding Phonetics Library
:created: Mar 15, 2012
:author: ubaldino
:copyright: MITRE Corporation, (c) 2010-2012
Requirements: the advas.phonetics library is used here, but a modified version of it is included in this package.

Classes

builtins.object
    PhoneticMap

class PhoneticMap(builtins.object)

    PhoneticMap(p): convenience class that maps a single phoneme to the list of names sharing that phoneme.

Methods defined here:

__init__(self, p)
Initialize self.  See help(type(self)) for accurate signature.

add(self, name)

Data descriptors defined here:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)
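The class docstring describes one phoneme paired with a list of names. A minimal sketch of that idea follows, assuming the attribute names `phoneme` and `names` and a hypothetical helper `group_by_phoneme` (none of these are confirmed by the listing above); the phonetic function is passed in, so any encoder can be plugged in.

```python
class PhoneticMapSketch:
    """Hypothetical re-sketch of PhoneticMap: one phoneme, many names."""

    def __init__(self, p):
        self.phoneme = p   # assumed attribute name
        self.names = []    # assumed attribute name

    def add(self, name):
        # Record another name that produced the same phoneme.
        self.names.append(name)


def group_by_phoneme(names, code):
    """Hypothetical helper: build one map per phonetic code returned by `code`."""
    maps = {}
    for name in names:
        p = code(name)
        if p not in maps:
            maps[p] = PhoneticMapSketch(p)
        maps[p].add(name)
    return maps
```

With a toy encoder such as `lambda s: s[:2]`, "kabul" and "kabool" land in the same map under "ka".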

Functions


get_phonetic_initials(phrase)
Convert a phonetic phrase into its acronym. Only do this if you know the phrase is a phonetic spelling, e.g., "Tango Bravo" = TB.
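A minimal sketch of the acronym step, assuming each word is looked up in a word-to-character table like the module's `phonetic_w2a`. The table here (`WORD_TO_CHAR`, built from the standard NATO alphabet) and the fallback to a word's first letter are assumptions, not the module's confirmed behavior.

```python
# Hypothetical lookup table standing in for the module's phonetic_w2a.
ALPHABET = ("alpha bravo charlie delta echo foxtrot golf hotel india juliet "
            "kilo lima mike november oscar papa quebec romeo sierra tango "
            "uniform victor whiskey xray yankee zulu").split()
NUMBERS = "zero one two three four five six seven eight nine".split()

WORD_TO_CHAR = {w: w[0] for w in ALPHABET}                        # 'tango' -> 't'
WORD_TO_CHAR.update({w: str(i) for i, w in enumerate(NUMBERS)})   # 'eight' -> '8'


def phonetic_initials_sketch(phrase):
    """Collapse a phonetic spelling into its acronym, e.g. 'Tango Bravo' -> 'tb'.

    Words missing from the table fall back to their first letter (an assumption).
    """
    return "".join(WORD_TO_CHAR.get(w, w[0]) for w in phrase.lower().split())
```

The sketch returns lowercase output, matching the module's lowercase convention elsewhere.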

get_phonetic_phrase(word)
Convert a code word into its expanded phonetic spelling, e.g., given "tb", generate "tango bravo". Input is assumed to be lowercase.
:param word: lowercase code word
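The reverse direction can be sketched with a character-to-word table in the spirit of `phonetic_a2w`. The table `CHAR_TO_WORD` and the choice to skip characters with no phonetic word are assumptions made for illustration.

```python
from string import ascii_lowercase, digits

# Hypothetical lookup table standing in for the module's phonetic_a2w.
ALPHABET = ("alpha bravo charlie delta echo foxtrot golf hotel india juliet "
            "kilo lima mike november oscar papa quebec romeo sierra tango "
            "uniform victor whiskey xray yankee zulu").split()
NUMBERS = "zero one two three four five six seven eight nine".split()

CHAR_TO_WORD = dict(zip(ascii_lowercase, ALPHABET))   # 't' -> 'tango'
CHAR_TO_WORD.update(zip(digits, NUMBERS))             # '8' -> 'eight'


def phonetic_phrase_sketch(word):
    """Expand a lowercase code word, e.g. 'tb' -> 'tango bravo'.

    Characters without a phonetic word are skipped (an assumption).
    """
    return " ".join(CHAR_TO_WORD[ch] for ch in word if ch in CHAR_TO_WORD)
```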

match_filter_phonetically(target, targetlen, test, testlen, max_len_diff, max_edit_dst)
For performance reasons, this assumes you pass lowercase versions of target and test, along with the lengths of both. Does test match target phonetically?
Usage: given target, find each test in [a, b, c, d, ...] that matches target.
:param target: the thing you want to match to
:param targetlen: length of target
:param test: a candidate to test
:param testlen: length of test
:param max_len_diff: basic length filter
:param max_edit_dst: finally, assess the edit distance of the text
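The docstring describes a cheap length filter followed by an edit-distance check. A minimal sketch of that two-stage filter, using classic Levenshtein distance, is shown below; the real function may also compare phonetic codes, and the helper name `filter_sketch` is hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance using a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def filter_sketch(target, targetlen, test, testlen, max_len_diff, max_edit_dst):
    # Stage 1: reject on length alone; lengths are precomputed by the caller.
    if abs(targetlen - testlen) > max_len_diff:
        return False
    # Stage 2: the more expensive edit-distance check.
    return edit_distance(target, test) <= max_edit_dst
```

Usage, following the docstring's "find test in [a, b, c, d...]" pattern:

```python
candidates = ["kabul", "kabol", "kandahar"]
matches = [t for t in candidates if filter_sketch("kabul", 5, t, len(t), 1, 1)]
# matches == ["kabul", "kabol"]; "kandahar" fails the length filter
```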

match_phonetically(a, b)
Attempts to match words a and b by the phonetic similarity of their initials. Limitation: F and PH are an intended match, but for now treating F as equivalent to P suffices.
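The consonance groups listed under Data (`ARRAY_OF_PHONETICS`) suggest how initials might be compared: two initials match if they are equal or fall in the same group, which also reflects the F =? P limitation noted above. This is a hypothetical sketch (`initials_sound_alike` is not a module function); only the group contents are copied from the listing.

```python
# Copied from the module's ARRAY_OF_PHONETICS data listing.
ARRAY_OF_PHONETICS = [{'c', 'k', 'q'}, {'f', 'p'}, {'g', 'j', 'y'},
                      {'c', 's', 'z'}, {'s', 'z'}, {'v', 'w'}]


def initials_sound_alike(a, b):
    """Hypothetical: do the first letters of a and b match or share a group?"""
    if not a or not b:
        return False
    x, y = a[0].lower(), b[0].lower()
    if x == y:
        return True
    return any(x in group and y in group for group in ARRAY_OF_PHONETICS)
```

Note that 'c' appears in two groups, so it can pair with both 'k' and 's'.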

phonetic_code(tok)
An application of the Advas phonetics library. Metaphone appears to generate about one fourth as many matches as Caverphone; that is, Caverphone gives looser, noisier similarity matching. WINNER: Metaphone. CAVEAT: if you change the phonetic algorithm, you must re-pickle.
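The re-pickle caveat follows from the fact that a pickled index is keyed by phonetic codes: switch the encoder and old keys can no longer be reached. The sketch below records the encoder's name next to the pickled index and refuses to load on a mismatch. `toy_code` is a stand-in encoder (it is NOT Metaphone), and `build_index`/`load_index` are hypothetical helpers.

```python
import pickle


def toy_code(tok):
    """Stand-in encoder (not Metaphone): keep the first letter, drop later vowels."""
    tok = tok.lower()
    return tok[:1] + "".join(ch for ch in tok[1:] if ch not in "aeiou")


def build_index(names, code=toy_code):
    index = {}
    for name in names:
        index.setdefault(code(name), []).append(name)
    # Store which encoder produced the keys alongside the index itself.
    return pickle.dumps({"algorithm": code.__name__, "index": index})


def load_index(blob, code=toy_code):
    data = pickle.loads(blob)  # only unpickle data you trust
    if data["algorithm"] != code.__name__:
        raise ValueError("phonetic encoder changed; rebuild (re-pickle) the index")
    return data["index"]
```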

phonetic_params(termlen)
Get matching parameters for a given term length.
:param termlen: term length
:return:

phonetic_redux(tok)

Data

ARRAY_OF_PHONETICS = [{'c', 'k', 'q'}, {'f', 'p'}, {'g', 'j', 'y'}, {'c', 's', 'z'}, {'s', 'z'}, {'v', 'w'}]
JA_CONSONNANCE = {'g', 'j', 'y'}
KA_CONSONNANCE = {'c', 'k', 'q'}
PH_CONSONNANCE = {'f', 'p'}
REDUCE_CONSONNANCE = {'bb', 'cc', 'dd', 'ff', 'gg', 'kk', ...}
SA_CONSONNANCE = {'c', 's', 'z'}
WA_CONSONNANCE = {'v', 'w'}
XA_CONSONNANCE = {'s', 'z'}
ascii_lowercase = 'abcdefghijklmnopqrstuvwxyz'
digits = '0123456789'
phonetic_a2w = {'0': 'zero', '1': 'one', '2': 'two', '3': 'three', '4': 'four', '5': 'five', '6': 'six', '7': 'seven', '8': 'eight', '9': 'nine', ...}
phonetic_alphabet = ['alpha', 'bravo', 'charlie', 'delta', 'echo', 'foxtrot', 'golf', 'hotel', 'india', 'juliet', 'kilo', 'lima', 'mike', 'november', 'oscar', 'papa', 'quebec', 'romeo', 'sierra', 'tango', ...]
phonetic_numbers = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine']
phonetic_w2a = {'alpha': 'a', 'bravo': 'b', 'charlie': 'c', 'delta': 'd', 'echo': 'e', 'eight': '8', 'five': '5', 'four': '4', 'foxtrot': 'f', 'golf': 'g', ...}
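`REDUCE_CONSONNANCE` appears to list doubled consonants ('bb', 'cc', 'dd', ...), which suggests that spelling variants such as "Kabbul" and "Kabul" are normalized by collapsing doubles. Where the module applies it is not documented (`phonetic_redux` has no docstring), so the following is a hypothetical sketch. Because the listing is truncated, `DOUBLED` here assumes every doubled consonant.

```python
from string import ascii_lowercase

# Hypothetical: every doubled consonant. The module's REDUCE_CONSONNANCE
# listing is truncated, so its exact contents are not confirmed here.
DOUBLED = {c * 2 for c in ascii_lowercase if c not in "aeiou"}


def reduce_doubles(tok):
    """Collapse any doubled consonant found in DOUBLED to a single letter."""
    out = []
    for ch in tok:
        if out and (out[-1] + ch) in DOUBLED:
            continue  # drop the second letter of a doubled pair
        out.append(ch)
    return "".join(out)
```

Doubled vowels are left alone in this sketch, so "book" is unchanged.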
