Applications
- OmegaT+: translation editor/processor (CAT) tool with translation memory and other features;
- bitext2tmx: bitext aligner/converter to TMX;
- bitextor: webpage to bitext creator;
- LanguageTool: language checker;
- PDFedit: Editor for manipulating PDF documents;
- pdftk: Toolkit to work with PDFs in various ways;
- Tag Aligner: webpage to bitext aligner;
- Validator: TMX format validation application.
bitext2tmx
At present, this application takes either two plain-text files (*.txt) that are translations of each other (an original and its translation) or a TMX file containing translations, and produces a TMX file for use in other translation or language software that supports the format. After the text files are loaded, the application aligns the translations as well as it can; the user can then edit the text segments and save the aligned segments as a TMX file.
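To illustrate the output side of this process, here is a minimal Python sketch that writes already-paired segments into a TMX 1.4 document. It is not bitext2tmx's code: it assumes the two segment lists are aligned one-to-one by position (there is no real alignment step), and the function names and the `creationtool` value are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaced attribute name that ElementTree serializes as xml:lang
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_segments(path):
    """Read one segment per non-empty line from a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def lines_to_tmx(src_segs, tgt_segs, src_lang, tgt_lang):
    """Return a TMX 1.4 document (as a string) pairing segments by position.

    Naive assumption: src_segs[i] is the translation of tgt_segs[i].
    """
    if len(src_segs) != len(tgt_segs):
        raise ValueError("positional pairing needs equal segment counts")
    tmx = ET.Element("tmx", version="1.4")
    # The seven attributes TMX 1.4 requires on <header>
    ET.SubElement(tmx, "header", {
        "creationtool": "tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plaintext",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in zip(src_segs, tgt_segs):
        tu = ET.SubElement(body, "tu")  # one translation unit per segment pair
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

For example, `lines_to_tmx(read_segments("en.txt"), read_segments("es.txt"), "en", "es")` would produce TMX markup for two hypothetical input files; a real aligner would replace the positional pairing with an actual alignment step.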
Status: a preliminary version, 1.0M0, is available; work is progressing toward a full 1.0 release.
Homepage:
http://bitext2tmx.sf.net
Project page:
http://sf.net/projects/bitext2tmx
bitextor
Builds parallel text corpora from webpages, using websites as the source of text and analyzing webpage text for bitexts. It currently works with Spanish (es), Catalan (ca), Galician (gl), Portuguese (pt) and English (en), and can easily be extended to support new languages.
Status: C++ source code and binary (Linux only) packages.
Homepage: N/A
Project page:
http://sf.net/projects/bitextor
LanguageTool
A language checker (current support includes English, German, Polish, Dutch and other languages). It is a rule-based tool that finds errors matching defined rules, written either in XML configuration files or in Java. It detects errors that spell checking cannot, including certain grammar mistakes.
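The idea of a rule-based checker can be sketched in a few lines of Python. The rules below are hypothetical regular expressions chosen for illustration; they are not LanguageTool's rules, and LanguageTool's own XML rule format and Java API are considerably richer than this.

```python
import re

# Hypothetical rules: (rule id, compiled pattern, message shown to the user)
RULES = [
    ("DOUBLE_WORD", re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE),
     "Possible repeated word"),
    ("A_VOWEL", re.compile(r"\ba\s+(?=[aeiou]\w)", re.IGNORECASE),
     "Use 'an' before a word starting with a vowel"),
    ("COULD_OF", re.compile(r"\bcould of\b", re.IGNORECASE),
     "Did you mean 'could have'?"),
]


def check(text):
    """Return (offset, rule_id, message) for every rule match, sorted by offset."""
    matches = []
    for rule_id, pattern, message in RULES:
        for m in pattern.finditer(text):
            matches.append((m.start(), rule_id, message))
    return sorted(matches)
```

Note that the a/an rule tests vowel letters rather than vowel sounds, so it would wrongly flag "a university"; handling cases like this is why real rule sets need more context than a single pattern.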
Status: the latest Java source code and binary packages are available, for use as a standalone application or in conjunction with OpenOffice.org 2.x and 3.x.
Homepage: http://www.languagetool.org
PDFedit
An editor for manipulating PDF documents. Users can view PDFs, investigate their structure, extract text, and so forth.
Status: the project currently provides RPMs only (source, Mandriva). Packages for Mac OS X and Windows may be available elsewhere (e.g. Fink for OS X), or the source code can be recompiled for other platforms (the Qt framework is required).
pdftk
A toolkit for working with PDFs in ways that a regular reader/viewer application usually cannot. For example, you can break up a large PDF into smaller, more manageable files, or merge small ones into a bigger one. You can also extract text programmatically for use in other tools, rather than going through your PDF reader/viewer.
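The split and merge tasks mentioned above correspond to pdftk's `cat` and `burst` operations. The Python sketch below only builds the command lines (the helper names are hypothetical, and the file names are placeholders) and runs them through `subprocess` if a `pdftk` binary is found on the PATH.

```python
import shutil
import subprocess


def merge_cmd(inputs, output):
    """pdftk a.pdf b.pdf cat output merged.pdf : concatenate whole files."""
    return ["pdftk", *inputs, "cat", "output", output]


def extract_pages_cmd(src, first, last, output):
    """pdftk in.pdf cat 3-7 output part.pdf : copy a page range to a new file."""
    return ["pdftk", src, "cat", f"{first}-{last}", "output", output]


def burst_cmd(src, pattern="page_%04d.pdf"):
    """pdftk in.pdf burst output page_%04d.pdf : write one file per page."""
    return ["pdftk", src, "burst", "output", pattern]


def run(cmd):
    """Run a pdftk command, failing clearly if pdftk is not installed."""
    if shutil.which("pdftk") is None:
        raise RuntimeError("pdftk is not installed or not on PATH")
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

For example, `run(merge_cmd(["part1.pdf", "part2.pdf"], "book.pdf"))` executes `pdftk part1.pdf part2.pdf cat output book.pdf`. Building argument lists rather than shell strings avoids quoting problems with file names that contain spaces.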
Status: the project currently provides RPMs only (source, Mandriva, SuSE 9.2). Packages for Mac OS X and Windows may be available elsewhere (e.g. Fink for OS X).
Homepage:
http://www.accesspdf.com/pdftk
Project page: N/A
Tag Aligner
A parallel text aligner that uses both webpage text and tag structure to improve alignments. Because its geometric aligner works on sentence length, it is language-independent. It generates alignments in TMX format.
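The text above says only that Tag Aligner's aligner works on sentence length; it does not describe the algorithm. The Python sketch below illustrates the general length-based idea with a simplified dynamic program in the spirit of Gale and Church's method. The cost function, the bead types and the penalty values are assumptions, not Tag Aligner's, and the sketch ignores tag structure entirely.

```python
def align_by_length(src, tgt, skip_penalty=1.0, merge_penalty=0.3):
    """Align two sentence lists using only character lengths.

    Returns a list of beads (source indices, target indices).
    Simplified, hypothetical cost model for illustration only.
    """
    n, m = len(src), len(tgt)
    ls = [len(s) for s in src]
    lt = [len(t) for t in tgt]
    INF = float("inf")

    def mismatch(a, b):
        # Relative length difference in [0, 1]; 0 means equal lengths
        return abs(a - b) / max(a + b, 1)

    # Bead types: (source sentences used, target sentences used, extra penalty)
    beads = [(1, 1, 0.0), (1, 2, merge_penalty), (2, 1, merge_penalty),
             (1, 0, skip_penalty), (0, 1, skip_penalty)]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order visits every predecessor of a cell before the cell itself
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            for di, dj, penalty in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + penalty
                if di and dj:
                    c += mismatch(sum(ls[i:ni]), sum(lt[j:nj]))
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Follow back-pointers from the end to recover the cheapest bead sequence
    result = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        result.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return result[::-1]
```

For example, with source sentences "Yes." / "I agree completely." / "See you tomorrow at noon." and target sentences "Sí, estoy totalmente de acuerdo." / "Hasta mañana al mediodía.", the first two source sentences are merged into one bead because their combined length matches the first target sentence.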
Status: C++ source code and binary (Linux only) packages.
Homepage: N/A
Project page:
http://sf.net/projects/tag-aligner
Validator
An application to validate TMX files. It is a derivative of TMXValidator that uses Java Swing for the graphical user interface instead of the SWT toolkit. Requires Java.
Status: Preliminary release candidate (RC2) is available.
Documentation
There is no real documentation at present other than a simple readme file included in the download package. The application is straightforward to use, so there is not much to say anyway: just open a TMX file to validate its segments or to clean it of invalid Unicode characters.
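A minimal Python sketch of the two tasks described here: removing characters that XML 1.0 does not allow, and running a few structural checks on a TMX file. It is not Validator's or TMXValidator's implementation; the checks cover only a small subset of the TMX specification (root `<tmx>`, `<header>`, `<body>`, and `<tuv>`/`<seg>` presence), and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 "Char" production is invalid in a document
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_text(text):
    """Remove characters that XML 1.0 does not allow."""
    return _INVALID_XML_CHARS.sub("", text)


def validate_tmx(xml_text):
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    if root.tag != "tmx":
        problems.append(f"root element is <{root.tag}>, expected <tmx>")
    if root.find("header") is None:
        problems.append("missing <header>")
    body = root.find("body")
    if body is None:
        problems.append("missing <body>")
        return problems
    for n, tu in enumerate(body.findall("tu"), start=1):
        tuvs = tu.findall("tuv")
        if not tuvs:
            problems.append(f"<tu> #{n}: no <tuv> element")
        for tuv in tuvs:
            if tuv.find("seg") is None:
                problems.append(f"<tu> #{n}: <tuv> without <seg>")
    return problems
```

Cleaning has to happen before parsing: an XML parser rejects a file containing, say, a stray U+0001 control character outright, so `validate_tmx(clean_text(raw))` is the natural order.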
Requirements
A Java Runtime Environment (JRE) is needed to use Validator. Obtain one directly from this project or separately from Sun Microsystems or another vendor (on Linux, versions called IcedTea and Blackdown are available; these may or may not work for you). Please use a recent version.
Downloads
To download the application, go to the OmegaT+ SourceForge project page. The latest package release is listed there (version 1.0); the files section on that page lists all files released to date.
Packages for download are available in a number of formats:
- .7z: 7-zip archive (Linux, Windows)
- .tar.bz2: bzip2-compressed tar archive (mostly UNIX/Linux; some MS Windows applications can work with it; Mac OS support unconfirmed)
- .tgz or .tar.gz: gzip-compressed tar archive (mostly UNIX/Linux; works with some MS Windows applications and with Mac OS)
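The two tar-based formats can be unpacked with Python's standard `tarfile` module, which detects gzip or bzip2 compression automatically in `"r:*"` mode. The standard library has no `.7z` support, so the hypothetical helper below rejects those packages and leaves them to 7-Zip.

```python
import tarfile
from pathlib import Path


def extract_package(archive, dest):
    """Extract a .tar.bz2, .tgz or .tar.gz package and return its member names.

    Hypothetical helper; .7z archives need an external tool such as 7-Zip.
    """
    if Path(archive).name.endswith(".7z"):
        raise ValueError("7z archives need an external tool such as 7-Zip")
    # Use the safer "data" extraction filter where this Python version has it
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    # "r:*" lets tarfile detect gzip or bzip2 compression by itself
    with tarfile.open(archive, "r:*") as tar:
        tar.extractall(dest, **kwargs)
        return tar.getnames()
```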
Important Note
The Zip file format is not provided by this project. The free 7-Zip tool can open most, if not all, of the packages that are provided.
Licence
Copyright (C) 2005-2009 by Raymond Martin. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0 or later (the latest version is presently available at http://www.opencontent.org/openpub/). Distribution of substantively modified versions of this document is prohibited without the explicit permission of the copyright holder. Distribution of the work or derivative of the work in any standard (paper) book form is prohibited unless prior permission is obtained from the copyright holder.



