Incompatible changes
Known Issues
Major Changes
- Replaced JPedal with PDFBox as the basis for the PDF converter. This
  massively improves conversion performance. The layout recognition code has
  been ported to PDFBox as well. This affects the PDF Converter Plug-in for
  Eclipse as well as the automatic PDF conversion performed by AnnoLab when
  importing PDF files or accessing PDF files through a BibTeX-based datastore.
  Additionally, the Eclipse PDF Converter can now optionally remove
  hyphenation during the conversion. For this, it needs a UTF-8 dictionary
  file that lists the expected words, one word per line. A hyphen is only
  removed if the word resulting from the removal can be found in the
  dictionary.
- A lot of effort has gone into optimizing the performance of the PDF import
  and the 'export' command. The IntegratedRepresentationBuilder, on which the
  export command is based, now completes within 6 minutes an operation that
  previously took over 40 minutes. The performance improvement of the PDF
  importer is even more drastic, but has not been measured.
- Metadata is back! The metadata available from the BibTeXFileDatastore is now
  transferred into an eXist database or into exported files. This makes it
  possible again to query metadata. It also allows the metadata to be part of
  the DaSciTeX 0.1 release.
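The optional hyphen removal in the Eclipse PDF Converter described above boils down to a dictionary lookup on the joined word. The following Python sketch illustrates that rule only; it is not the converter's actual code. It assumes that a hyphenated word shows up as a fragment ending in "-" at the end of one line, followed by its continuation at the start of the next line, and that lookups are exact (case-sensitive) matches. The names `load_dictionary` and `dehyphenate` are hypothetical.

```python
import re

# Assumed shape of a hyphenated line break: a word fragment ending in "-"
# at the end of a line, followed by the continuation at the start of the next.
HYPHEN_BREAK = re.compile(r"(\w+)-\n(\w+)")


def load_dictionary(path):
    """Read a UTF-8 dictionary file containing one expected word per line."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def dehyphenate(text, dictionary):
    """Remove line-break hyphens only where the joined word is in the dictionary."""
    def join_if_known(match):
        joined = match.group(1) + match.group(2)
        if joined in dictionary:
            return joined          # known word: drop hyphen and line break
        return match.group(0)      # unknown word: keep the text unchanged
    return HYPHEN_BREAK.sub(join_if_known, text)


if __name__ == "__main__":
    words = {"implementation", "converter"}
    sample = "the implemen-\ntation of the con-\nverter and well-\nknown"
    print(dehyphenate(sample, words))
```

With the sample dictionary, "implemen-/tation" and "con-/verter" are joined, while "well-/known" is left alone because "wellknown" is not listed, so the script prints "the implementation of the converter and well-" followed by "known" on the next line.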
Minor Changes
- [613] New CLI module 'tueba' to load data provided by the C2 project.
- [622] AnnoLab CLI is a real Eclipse runtime application now, not just a bundle.
- [563] Tightened compiler and Javadoc error/warning settings on all projects to improve code quality.
- [537] CLI module 'export': Added 'trim' and 'drop-empty' switches.
- [538] CLI module 'export': Include 'role' attribute for integrated layers.
- [539] CLI module 'lemmatizer': Allow running without specifying any formatting XSLT.
- [595] CLI module 'export': Added missing "w " in tag for WordSmith export.
- [575] Core: Dropped the release() methods, which did nothing but eat performance.
- [614] Core: Factored the query template code out of the query CLI command so it can be used by WebLab Mk 3.
- [571] eXist module: Removed the default indexes for segments, since searching for segments is not advisable anyway.
- [625] eXist module: Upgraded to a new version of eXist, which comes with some API changes.
Bug Fixes
- [619] Core: No longer use a single AnnoLabRootSource for all contexts. Removes problems with WebLab Mk3.
- [541] Core: Segmentation parser doesn't deliver cover segment after reset().
- [543] TreeTagger module: Fixed NPE in case module is shut down without ever really running the tagger process. Thanks Mônica.
- [547] TreeTagger PEAR: Run TreeTagger directly inside the AE instead of relying on the TreeTaggerModule.
- [536] UIMA module: Updated uimautil.jar, fixing a bug that caused one-character tokens to be dropped.
- [573] UIMA module: Some fixes to the importing of annotations from XML to the CAS.
- [596] UIMA module: Added a workaround for a bug in CookXML 3.0.1 which causes it not to use the most specific add() method for a given class.
Known Issues
| Resolved in version |
| pending |
Whenever you use the AnnoLab CLI to write output to the file system, it is a
good idea to create a new directory and tell AnnoLab to write the output
there; otherwise, AnnoLab may overwrite existing data. For example:
mkdir out
annolab lemmatize process --format builtin:bnc english test.txt out/
Note that the trailing "/" after "out" tells AnnoLab to write the output
files into that directory.
| Resolved in version |
| AnnoLab DaSciTeX Companion Release 0.1.1 20080313 |
If a problem occurs, AnnoLab may simply hang instead of printing an exception
report and exiting.
Workaround: When in doubt, wait a while and then press CTRL-C to terminate the
CLI process. After that, check for a recent '.log' file in the 'configuration'
folder of your AnnoLab installation. Any error gets logged there.
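To find that log file quickly, a small helper can pick the newest '.log' file by modification time. This is a minimal sketch: the function name `most_recent_log` is hypothetical, and it only looks directly inside the 'configuration' folder, as described in the workaround above.

```python
from pathlib import Path


def most_recent_log(annolab_home):
    """Return the newest '*.log' file in <annolab_home>/configuration, or None."""
    config_dir = Path(annolab_home) / "configuration"
    # Non-recursive: only files directly inside the 'configuration' folder.
    logs = [p for p in config_dir.glob("*.log") if p.is_file()]
    if not logs:
        return None
    # The most recently modified log is the one written by the last run.
    return max(logs, key=lambda p: p.stat().st_mtime)
```

For example, `most_recent_log("/opt/annolab")` (a hypothetical installation path) returns a `Path` that you can open in any text editor, or `None` if no log has been written yet.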
| Resolved in version |
| pending |
It may happen that a TreeTagger process is suddenly lost. I have no idea what
causes this; possibly a memory shortage. So far I have not been able to find
a way to reliably reproduce the problem.
Workaround: Simply try again. Chances are good that it will run just fine the
next time you try.
| Resolved in version |
| pending |
If you use a BibTeXFileDatastore in conjunction with a Subversion
repository, you should make sure your Subversion client is configured to use
the commit times as file timestamps during checkout. If you do not do this,
the BibTeXFileDatastore may get stuck in an endless loop trying to figure
out which source file is the most recent.
You can configure this in your Subversion config file.
This file can be found on Unix in
$HOME/.subversion/config and on Windows 2000/XP in
C:\Documents and Settings\{USERNAME}\Application Data\Subversion\config
(note that the drive letter can be different and the path may vary if you use
a non-English version of Windows). Make sure you uncomment the
'use-commit-times' setting in the [miscellany] section, as shown here:
### Set use-commit-times to make checkout/update/switch/revert
### put last-committed timestamps on every file touched.
use-commit-times = yes
If you have already checked out a working copy, you need to check it out again
from scratch for the setting to take effect.
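To check the setting before doing a fresh checkout, a short script can read the config file. This is a sketch under assumptions: it relies on Python's `configparser` accepting the INI-style Subversion config (with the option in the [miscellany] section), it guesses the Windows location from the APPDATA environment variable, and the function names `default_svn_config_path` and `uses_commit_times` are hypothetical.

```python
import configparser
import os
from pathlib import Path


def default_svn_config_path():
    """Guess the per-user Subversion config file location (Unix or Windows)."""
    appdata = os.environ.get("APPDATA")
    if os.name == "nt" and appdata:
        # Assumption: APPDATA points to ...\Application Data on Windows 2000/XP.
        return Path(appdata) / "Subversion" / "config"
    return Path.home() / ".subversion" / "config"


def uses_commit_times(config_path):
    """Return True if [miscellany] use-commit-times is enabled in the file."""
    # No interpolation, since values may contain '%'; tolerate duplicate keys.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read(config_path, encoding="utf-8")  # a missing file is skipped
    # Commented-out lines ('# use-commit-times = yes') are ignored, so the
    # fallback applies until the setting is actually uncommented.
    return parser.getboolean("miscellany", "use-commit-times", fallback=False)
```

Calling `uses_commit_times(default_svn_config_path())` returns `False` as long as the line is still commented out, which is the case in which the BibTeXFileDatastore may loop.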