Save This Page
Home » lucene-2.3.2-src » org.apache » lucene » index » memory » [javadoc | source]
org.apache.lucene.index.memory
public class: SynonymMap [javadoc | source]
java.lang.Object
   org.apache.lucene.index.memory.SynonymMap
Loads the WordNet prolog file wn_s.pl into a thread-safe main-memory hash map that can be used for fast high-frequency lookups of synonyms for any given (lowercase) word string.

There holds: If B is a synonym for A (A -> B) then A is also a synonym for B (B -> A). There does not necessarily hold: A -> B, B -> C then A -> C.

Loading typically takes some 1.5 secs, so should be done only once per (server) program execution, using a singleton pattern. Once loaded, a synonym lookup via #getSynonyms(String) takes constant time O(1). A loaded default synonym map consumes about 10 MB main memory. An instance is immutable, hence thread-safe.

This implementation borrows some ideas from the Lucene Syns2Index demo that Dave Spencer originally contributed to Lucene. Dave's approach involved a persistent Lucene index which is suitable for occasional lookups or very large synonym tables, but considered unsuitable for high-frequency lookups of medium size synonym tables.

Example Usage:

String[] words = new String[] { "hard", "woods", "forest", "wolfish", "xxxx"};
SynonymMap map = new SynonymMap(new FileInputStream("samples/fulltext/wn_s.pl"));
for (int i = 0; i < words.length; i++) {
String[] synonyms = map.getSynonyms(words[i]);
System.out.println(words[i] + ":" + java.util.Arrays.asList(synonyms).toString());
}

Example output:
hard:[arduous, backbreaking, difficult, fermented, firmly, grueling, gruelling, heavily, heavy, intemperately, knockout, laborious, punishing, severe, severely, strong, toilsome, tough]
woods:[forest, wood]
forest:[afforest, timber, timberland, wood, woodland, woods]
wolfish:[edacious, esurient, rapacious, ravening, ravenous, voracious, wolflike]
xxxx:[]
Constructor:
 public SynonymMap(InputStream input) throws IOException 
    Constructs an instance, loading WordNet synonym data from the given input stream. Finally closes the stream. The words in the stream must be in UTF-8 or a compatible subset (for example ASCII, MacRoman, etc.).
    Parameters:
    input - the stream to read from (null indicates an empty synonym map)
    Throws:
    IOException - if an error occured while reading the stream.
Method from org.apache.lucene.index.memory.SynonymMap Summary:
analyze,   getSynonyms,   toString
Methods from java.lang.Object:
equals,   getClass,   hashCode,   notify,   notifyAll,   toString,   wait,   wait,   wait
Method from org.apache.lucene.index.memory.SynonymMap Detail:
 protected String analyze(String word) 
    Analyzes/transforms the given word on input stream loading. This default implementation simply lowercases the word. Override this method with a custom stemming algorithm or similar, if desired.
 public String[] getSynonyms(String word) 
    Returns the synonym set for the given word, sorted ascending.
 public String toString() 
    Returns a String representation of the index data for debugging purposes.