java.lang.Object
  org.annolab.tt4j.TreeTaggerWrapper<O>
Type Parameters:
O - the token type.
public class TreeTaggerWrapper<O>
Main TreeTagger wrapper class. One TreeTagger process is created and
maintained for each instance of this class. The process is terminated and
restarted automatically if the model is changed (setModel(String)).
Otherwise, once started, the process keeps running in the background, which
saves the time of restarting TreeTagger for every analysis. While idle, the
process consumes some memory but no CPU.
During analysis, two threads are used to communicate with the TreeTagger process. One thread writes tokens to the process, while the other receives the analyzed tokens.
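The two-thread arrangement can be sketched without TreeTagger itself. In the sketch below an in-memory pipe stands in for the external process, and the class and method names (TwoThreadPipeSketch, roundTrip) are hypothetical and not part of tt4j. It only illustrates why writing and reading run on separate threads: the writer can block on a full pipe while the reader keeps draining it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;

final class TwoThreadPipeSketch {

    /** Writes tokens on one thread and collects the returned lines on another. */
    static List<String> roundTrip(List<String> tokens) {
        // Assumption: the pipe replaces the stdin/stdout pair of a real process.
        final PipedWriter pipeIn = new PipedWriter();
        final PipedReader pipeOut;
        try {
            pipeOut = new PipedReader(pipeIn);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        final List<String> received = new ArrayList<>();

        Thread writer = new Thread(() -> {
            // One token per line; closing the writer signals end of input.
            try (PrintWriter out = new PrintWriter(pipeIn)) {
                for (String token : tokens) {
                    out.println(token);
                }
            }
        });
        Thread reader = new Thread(() -> {
            try (BufferedReader in = new BufferedReader(pipeOut)) {
                String line;
                while ((line = in.readLine()) != null) {
                    received.add(line);
                }
            } catch (IOException e) {
                throw new IllegalStateException(e);
            }
        });

        writer.start();
        reader.start();
        try {
            // join() makes the reader's additions visible to the caller.
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }
}
```

With a single thread, a write that fills the pipe buffer would never return, because nothing would be reading the other end.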
For easy integration into applications, this class takes any object containing
token information and uses either its Object.toString() method or
a TokenAdapter set using setAdapter(TokenAdapter) to extract
the actual token. To receive the analyzed tokens, set a custom
TokenHandler using setHandler(TokenHandler).
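The fallback between an adapter and Object.toString() can be illustrated with a small stand-alone sketch. It does not use the tt4j interfaces: a java.util.function.Function stands in for the adapter, and the names TokenTextSketch, Word and extractTexts are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class TokenTextSketch {

    /** A hypothetical application token type that carries more than its text. */
    static final class Word {
        final String text;
        final int offset;

        Word(String text, int offset) {
            this.text = text;
            this.offset = offset;
        }
    }

    /**
     * Extracts the token strings: uses the adapter if one is given,
     * otherwise falls back to toString(), as described above.
     */
    static <O> List<String> extractTexts(List<O> tokens, Function<O, String> adapter) {
        List<String> texts = new ArrayList<>();
        for (O token : tokens) {
            texts.add(adapter != null ? adapter.apply(token) : token.toString());
        }
        return texts;
    }
}
```

An adapter is useful for types like Word, whose toString() does not return the bare token text. Plain String tokens need no adapter.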
By default, the TreeTagger executable is searched for in the directories
indicated by the system property treetagger.home and the
environment variables TREETAGGER_HOME and TAGDIR,
in this order. The setModel(String) method expects a full path to a model
file, optionally followed by a : and the model encoding.
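The lookup order and the path:encoding convention can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the names TreeTaggerLookupSketch, resolveHome and splitModelSpec are hypothetical, and treating a colon at index 1 as a Windows drive letter is an assumption made for this sketch only.

```java
import java.util.Map;

final class TreeTaggerLookupSketch {

    /**
     * Returns the first non-empty value among the system property
     * treetagger.home and the environment variables TREETAGGER_HOME and
     * TAGDIR, in that order, or null if none is set.
     */
    static String resolveHome(Map<String, String> systemProperties, Map<String, String> environment) {
        String[] candidates = {
            systemProperties.get("treetagger.home"),
            environment.get("TREETAGGER_HOME"),
            environment.get("TAGDIR")
        };
        for (String candidate : candidates) {
            if (candidate != null && !candidate.isEmpty()) {
                return candidate;
            }
        }
        return null;
    }

    /**
     * Splits "path:encoding" into { path, encoding }. The encoding is null
     * when no suffix is present.
     */
    static String[] splitModelSpec(String spec) {
        int sep = spec.lastIndexOf(':');
        // Assumption: a colon at index 1 belongs to a drive letter ("C:\...").
        if (sep > 1 && sep < spec.length() - 1) {
            return new String[] { spec.substring(0, sep), spec.substring(sep + 1) };
        }
        return new String[] { spec, null };
    }
}
```

In an application the two maps would typically be filled from System.getProperty("treetagger.home") and System.getenv(). Passing them in as parameters keeps the sketch testable.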
For additional flexibility, register a custom ExecutableResolver
using setExecutableProvider(ExecutableResolver) or a custom
ModelResolver using setModelProvider(ModelResolver). Custom
providers may extract models and executables from archives or download them
from some location and temporarily or permanently install them in the file
system. A custom model resolver may also be used to resolve a language code
(e.g. en) to a particular model.
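The language-code idea can be sketched as a simple lookup table. The class below does not implement the ModelResolver interface; it only shows the mapping step such a resolver might perform, and the names LanguageModelLookupSketch, register and resolve are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class LanguageModelLookupSketch {

    // Maps a language code such as "en" to a model specification.
    private final Map<String, String> modelsByLanguage = new HashMap<>();

    /** Associates a language code with a "path:encoding" model specification. */
    void register(String languageCode, String modelSpec) {
        modelsByLanguage.put(languageCode, modelSpec);
    }

    /**
     * Returns the registered model specification for a known language code.
     * Anything else is assumed to be a model path already and is returned as is.
     */
    String resolve(String nameOrCode) {
        return modelsByLanguage.getOrDefault(nameOrCode, nameOrCode);
    }
}
```

After register("en", "/treetagger/models/english.par:iso8859-1"), a call to resolve("en") yields the full model specification, while an explicit path passes through unchanged.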
A simple illustration of how to use this class:
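
    // Assumes: import static java.util.Arrays.asList;
    TreeTaggerWrapper<String> tt = new TreeTaggerWrapper<String>();
    try {
        tt.setModel("/treetagger/models/english.par:iso8859-1");
        tt.setHandler(new TokenHandler<String>() {
            // Methods implementing an interface must be declared public.
            public void token(String token, String pos, String lemma) {
                System.out.println(token + "\t" + pos + "\t" + lemma);
            }
        });
        tt.process(asList(new String[] {"This", "is", "a", "test", "."}));
    }
    finally {
        // Always stop the TreeTagger process, even if tagging fails.
        tt.destroy();
    }
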
| Field Summary | |
|---|---|
| static boolean | TRACE |
| Constructor Summary | |
|---|---|
| TreeTaggerWrapper() | |
| Method Summary | |
|---|---|
| void | destroy() - Stop the TreeTagger process and clean up the model and executable. |
| protected void | finalize() |
| String[] | getArguments() |
| Model | getModel() - Get the currently set model. |
| PlatformDetector | getPlatformDetector() - Get platform information. |
| int | getRestartCount() - Get the number of times a TreeTagger process was started. |
| String | getStatus() |
| void | process(Collection<O> aTokenList) - Process the given list of token objects. |
| protected Collection<O> | removeProblematicTokens(Collection<O> aTokenList) |
| void | setAdapter(TokenAdapter<O> aAdapter) - Set a TokenAdapter used to extract the token string from the token objects passed to process(Collection). |
| void | setArguments(String[] aArgs) - Set the arguments that are passed to the TreeTagger executable. |
| void | setEpsilon(Double aEpsilon) - Set the minimal tag frequency to epsilon. |
| void | setExecutableProvider(ExecutableResolver aExeProvider) - Set a custom executable resolver. |
| void | setHandler(TokenHandler<O> aHandler) - Set a TokenHandler to receive the analyzed tokens. |
| void | setHyphenHeuristics(boolean hyphenHeuristics) - Turn on the heuristics for guessing the parts of speech of unknown hyphenated words. |
| void | setModel(String modelName) - Load the model with the given name. |
| void | setModelProvider(ModelResolver aModelProvider) - Set a custom model resolver. |
| void | setPerformanceMode(boolean performanceMode) - Disable some sanity checks, e.g. whether tokens contain line breaks (which is not allowed). |
| void | setPlatformDetector(PlatformDetector aPlatform) - Set platform information. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static boolean TRACE
| Constructor Detail |
|---|
public TreeTaggerWrapper()
| Method Detail |
|---|
public void setPerformanceMode(boolean performanceMode)
Disable some sanity checks, e.g. whether tokens contain line breaks (which is not allowed).
Parameters:
performanceMode -

public void setArguments(String[] aArgs)
Set the arguments that are passed to the TreeTagger executable.
Parameters:
aArgs - the arguments.

public String[] getArguments()

public void setEpsilon(Double aEpsilon)
Set the minimal tag frequency to epsilon.
Parameters:
aEpsilon - epsilon

public void setHyphenHeuristics(boolean hyphenHeuristics)
Turn on the heuristics for guessing the parts of speech of unknown hyphenated words.
Parameters:
hyphenHeuristics - use hyphen heuristics.

public void setModelProvider(ModelResolver aModelProvider)
Set a custom model resolver.
Parameters:
aModelProvider - a model resolver.

public void setExecutableProvider(ExecutableResolver aExeProvider)
Set a custom executable resolver.
Parameters:
aExeProvider - an executable resolver.

public void setHandler(TokenHandler<O> aHandler)
Set a TokenHandler to receive the analyzed tokens.
Parameters:
aHandler - a token handler.

public void setAdapter(TokenAdapter<O> aAdapter)
Set a TokenAdapter used to extract the token string from
the token objects passed to process(Collection). If no adapter
is set, the Object.toString() method is used.
Parameters:
aAdapter - the adapter.

public void setPlatformDetector(PlatformDetector aPlatform)
Set platform information.
Parameters:
aPlatform - the platform information.

public PlatformDetector getPlatformDetector()
Get platform information.

public void setModel(String modelName)
        throws IOException
Load the model with the given name.
Parameters:
modelName - the name of the model.
Throws:
IOException - if the model cannot be found.

public Model getModel()
Get the currently set model.

public void destroy()
Stop the TreeTagger process and clean up the model and executable.

protected void finalize()
        throws Throwable
Overrides:
finalize in class Object
Throws:
Throwable

public void process(Collection<O> aTokenList)
        throws IOException,
               TreeTaggerException
Process the given list of token objects.
Parameters:
aTokenList - the token objects.
Throws:
IOException - if there is a problem providing the model or executable.
TreeTaggerException - if there is a problem communicating with TreeTagger.

protected Collection<O> removeProblematicTokens(Collection<O> aTokenList)
Parameters:
aTokenList -

public String getStatus()

public int getRestartCount()
Get the number of times a TreeTagger process was started.
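The line-break check mentioned for setPerformanceMode(boolean) can be sketched as a simple filter. TreeTagger reads one token per line, so a token that itself contains a line break would be seen as several tokens. The class and method names below (LineBreakCheckSketch, dropTokensWithLineBreaks) are hypothetical. The sketch assumes that offending tokens are dropped, and the criteria that removeProblematicTokens(Collection) actually applies are not documented here and may differ.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class LineBreakCheckSketch {

    /** Returns the tokens that contain neither a line feed nor a carriage return. */
    static List<String> dropTokensWithLineBreaks(Collection<String> tokens) {
        List<String> safe = new ArrayList<>();
        for (String token : tokens) {
            // Assumption: tokens with embedded line breaks are skipped, not repaired.
            if (token.indexOf('\n') >= 0 || token.indexOf('\r') >= 0) {
                continue;
            }
            safe.add(token);
        }
        return safe;
    }
}
```

A check like this costs one scan per token, which is the kind of overhead a performance mode would skip when the caller already guarantees clean input.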