Applications
- bitext2tmx: bitext aligner/converter to TMX;
- bitextor: webpage to bitext creator;
- extspell: simple spellchecker interface to Aspell;
- OmegaT+: translation editor with TM and other features;
- omegat: translation editor with TM and other features (the legacy predecessor of OmegaT+);
- pdftk: Toolkit to work with PDFs in various ways;
- sentseg: paragraph to sentence segment converter;
- Tag Aligner: webpage to bitext aligner;
- Validator: TMX format validation application.
bitext2tmx
At present, this application takes either two text files (*.txt) that are translations of each other (an original and its translation) or a TMX containing translations, and produces a TMX for use in other translation or language software that supports the format. After the text files are loaded, the application aligns the translations as well as it can, after which the user can edit the text segments and save the aligned segments as a TMX.
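To make the output format concrete, here is a minimal sketch of writing already-aligned segment pairs to a TMX 1.4 document. It is not bitext2tmx's own code: the function name `build_tmx`, the header values, and the sample sentences are assumptions chosen for illustration, and the alignment step itself is left out.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this qualified name as the attribute xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang, tgt_lang):
    """Return a TMX 1.4 document (as a string) for aligned (source, target) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    # Header attributes required by TMX 1.4; the values are illustrative.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-sketch",   # hypothetical tool name
        "creationtoolversion": "0.1",       # hypothetical version
        "segtype": "sentence",
        "o-tmf": "plain-text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        # One translation unit per aligned pair, one variant per language.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    print(build_tmx([("Hello.", "Hola."), ("Thank you.", "Gracias.")], "en", "es"))
```

Each aligned pair becomes one `tu` element holding two `tuv` variants, which is the shape that TMX-aware tools read back as a translation memory.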
Status: preliminary version 1.0M0, going towards a full 1.0 version, is available.
Homepage:
http://bitext2tmx.sf.net
Project page:
http://sf.net/projects/bitext2tmx
bitextor
Builds parallel text corpora from webpages. Uses websites as the source of text. Analyzes webpage text for bitexts. Presently works with es, ca, gl, pt, and en languages. Can easily be extended to support new languages.
Status: C++ source code and binary (Linux only) packages.
Homepage: N/A
Project page:
http://sf.net/projects/bitextor
extspell
This application has been removed; the original author neglected to include the GPL license with the distribution, thus making it a violation to distribute. A newer version is available, but OmegaT+ will not distribute it because a newer version of OmegaT+ will have spellchecking internally, plus Tk applications are dead ugly. Enough said.
A simple user interface to the Aspell spellchecker. It checks the spelling of the text in the clipboard when it receives the mouse focus. Requires Tcl/Tk for use.
Status: removed
OmegaT+
Step away from legacy-based omegat to a new generation tool. OmegaT+, a free MAHT tool, will have all the good features that users expect in a translation tool (translation memory, full and partial matches, glossary function, concordance search, support for various original document types, translation projects) without the inadequacies and quirks that other tools have.
Status:
OmegaT+ 1.0 (Ambrosia) is in development.
Note: version 1.0M1 has been taken down temporarily due to license violations propagated from the legacy code. It will be back soon. We want to ensure that users have all the rights they are entitled to under the GPL. The problem has been fixed; we are just updating a few small things for the next release.
Development version M1 is available. This is the first in a series of milestones that will culminate in a much improved translation tool. This version has been tested over a long period of time on real translation jobs. Many features are still missing and a few quirks exist, but it has proven to be reliable.
Development version M2 is in progress. We have a functional version, but more testing is required before release due to the many changes that were made after M1. This version is effectively frozen.
Development version M3 is in planning...
When completed, OmegaT+ will include a new user interface, sentence segmentation plus other expected functionality, and new features. A large effort is being put into revamping the underlying software architecture due to poor design from the legacy code. This will ensure that the application is more reliable and ready for future enhancements. Specific features and enhancements will be announced as the project proceeds.
Documentation
The documentation for the application will be made available through this site for convenience. Currently available:
- Quick Start - a short tutorial to get users started with OmegaT+;
- Locales - an overview of locale usage in OmegaT+, related information about locale codes, language codes, and references to some standards;
- File Filters - instructions on how to use the file filter functionality to deal with file encodings and naming.
Requirements
Please note that a Java Runtime Environment (JRE) is needed in order to use OmegaT+. This can be obtained from this project or separately from Sun Microsystems or other vendors. Please use a recent version. Check the OmegaT+ User Guide (accessible from this page) for details on installation. Mac OS X users already have a JRE installed by default.
For those doing or interested in software development in Java, a Java Software Development Kit (JDK) can be used instead, since it already includes a JRE.
Downloads
To download the application, go to the OmegaT+ SourceForge project page. The latest package release is listed there (version 1.0M1). Alternatively, open the files section on that page to see all files available to date.
Packages for download are available in a number of formats:
- .7z: 7-zip archive (Linux, Windows, Mac)
- .dmg: disk image (Mac OS X)
- .rpm: RPM package (RedHat package manager) (Linux only)
- .tar.bz2: tar bzip2 archive (All platforms)
- .tar.gz: tar gzipped archive (All platforms)
omegat
omegat: cross-platform CAT tool with translation memory, fuzzy matches, glossary function, concordance search, support for a number of source file types (OpenDocument, OpenOffice.org, MS Word [via OpenOffice], HTML, XHTML, Java resource bundles, plain text), ongoing project support, file filters.
Status: stable version 1.4.6 is available. Development version 1.0M1 of OmegaT+ (successor to omegat) is available now. OmegaT+ will include a new user interface, sentence segmentation, bug fixes, other minor changes, and most of the current functionality of OmegaT (selectively chosen to improve upon that work). Other inclusions may be announced at a later date.
Documentation
See the OmegaT+ documentation
pdftk
A toolkit to work with PDFs in various ways that usually cannot be done with a regular reader/viewer application. Very useful in different situations. Perhaps you need to break up a large PDF into smaller, more manageable ones, or merge small ones into a bigger one. Or how about programmatically extracting text for use in other tools, rather than using your PDF reader/viewer?
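As a rough illustration of the splitting and merging cases, the sketch below builds pdftk command lines from a script. The `cat` and `burst` operations and the `output` keyword belong to pdftk's command line; the helper names and the file names are assumptions made for this example, and pdftk must be installed for `run` to do anything.

```python
import shutil
import subprocess


def merge_command(inputs, output):
    """Build the argv for: pdftk a.pdf b.pdf cat output merged.pdf"""
    return ["pdftk", *inputs, "cat", "output", output]


def burst_command(source, pattern="pg_%04d.pdf"):
    """Build the argv that splits a PDF into one file per page.

    The pattern is a printf-style template that pdftk fills with the page number.
    """
    return ["pdftk", source, "burst", "output", pattern]


def run(argv):
    """Run a pdftk command, failing early if pdftk is not installed."""
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError("pdftk was not found on PATH")
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical file names; printing the commands does not require pdftk.
    print(merge_command(["part1.pdf", "part2.pdf"], "merged.pdf"))
    print(burst_command("large.pdf"))
```

Keeping command construction separate from execution makes it possible to inspect or log a command before any file is touched.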
Status: project currently has RPMs only (source, Mandriva, SuSE 9.2). Packages for Mac OS X, Windows, etc. may be included in the future.
Homepage:
http://www.accesspdf.com/pdftk
Project page: N/A
sentseg
This application has been removed; the original author neglected to include the GPL license with the distribution, thus making it a violation to distribute. A newer version of OmegaT+ will have segmentation support internally, plus Tk applications are dead ugly.
Converts paragraph-segmented files to sentence-segmented ones. This is useful when working with files that are to be translated with omegat (specifically OpenOffice.org/OpenDocument types). By default omegat segments by paragraph; use this utility to get around that. Requires Tcl/Tk for use.
Status: removed
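For readers unfamiliar with the idea, paragraph-to-sentence conversion can be sketched as below. This is a naive regular-expression splitter written for illustration, not sentseg's actual rules, which are not described on this page; the function names are assumptions. It treats every line as a paragraph and breaks after ".", "!" or "?" when the next word starts with a capital letter, a digit, a quote, or an opening parenthesis.

```python
import re

# Sentence boundary: whitespace that follows ., ! or ? and precedes a
# capital letter, digit, quote, or opening parenthesis.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9\"'(])")


def split_sentences(paragraph):
    """Split one paragraph into a list of sentences."""
    return [s.strip() for s in _BOUNDARY.split(paragraph) if s.strip()]


def to_sentence_segments(text):
    """Treat each line of text as a paragraph and emit one sentence per line."""
    sentences = []
    for paragraph in text.splitlines():
        sentences.extend(split_sentences(paragraph))
    return "\n".join(sentences)


if __name__ == "__main__":
    print(to_sentence_segments("It works. Does it? Yes!\nA second paragraph."))
```

A rule this simple breaks wrongly after abbreviations such as "e.g." or "Dr.", which is why real segmenters carry exception lists per language.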
Tag Aligner
Parallel text aligner that uses webpage text and tag structure to improve alignments. It is language-independent, due to its geometric aligner that works based on sentence length. Generates alignments in TMX format.
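The idea of aligning by sentence length can be illustrated with a small dynamic program. This sketch is not Tag Aligner's geometric algorithm and ignores tag structure entirely; it is a simplified length-based aligner in the spirit of classic length-based methods, with the function name, the allowed groupings, and the penalty values all chosen as assumptions for this example.

```python
def align_by_length(src, tgt):
    """Align two sentence lists using only character lengths.

    Allowed groupings are 1-1, 1-0, 0-1, 2-1 and 1-2. The cost of a grouping
    is the absolute difference in total characters plus a fixed penalty for
    anything other than a 1-1 match (the penalty values are arbitrary).
    """
    groupings = [(1, 1, 0), (1, 0, 10), (0, 1, 10), (2, 1, 5), (1, 2, 5)]
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in groupings:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                len_s = sum(len(s) for s in src[i:ni])
                len_t = sum(len(t) for t in tgt[j:nj])
                candidate = cost[i][j] + abs(len_s - len_t) + penalty
                if candidate < cost[ni][nj]:
                    cost[ni][nj] = candidate
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the chosen groupings.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    for source, target in align_by_length(
        ["aaaaa", "bbbbb", "cccc"], ["xxxxxxxxxx", "yyyy"]
    ):
        print(source, "<->", target)
```

Because only lengths are compared, the same code works for any language pair, which is the property the description above attributes to length-based alignment.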
Status: C++ source code and binary (Linux only) packages.
Homepage: N/A
Project page:
http://sf.net/projects/tag-aligner
Validator
An application to validate TMX. A derivative version of TMXValidator that uses Java Swing for the graphical user interface in place of the SWT toolkit. Requires Java.
Status: Preliminary release candidate (RC1) is available.
Documentation
No real documentation at present other than a simple readme file included in the download package. It's straightforward to use, so there's not that much to say anyway. Just open a TMX to validate its segments or to clean out invalid Unicode characters.
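As a sketch of what cleaning can mean, the snippet below removes characters that the XML 1.0 specification does not allow in a document, which is a common reason for a TMX failing to parse. Whether Validator uses exactly this definition of "invalid" is an assumption, and the function names are made up for this example.

```python
import re

# Anything outside the XML 1.0 "Char" production: tab, LF, CR,
# U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF are allowed.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid(text):
    """Return (index, code point) for every character XML 1.0 disallows."""
    return [
        (match.start(), f"U+{ord(match.group()):04X}")
        for match in _INVALID_XML_CHARS.finditer(text)
    ]


def clean_segment(text):
    """Return text with all XML 1.0-disallowed characters removed."""
    return _INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    segment = "Good text\x00 with a stray\x0b control character"
    print(find_invalid(segment))
    print(clean_segment(segment))
```

Reporting the offending positions before removing them lets a user review what will be lost rather than cleaning blindly.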
Requirements
A Java Runtime Environment (JRE) is needed in order to use Validator. Obtain this straight from this project or separately from Sun Microsystems or another vendor (on Linux there are versions called IcedTea and Blackdown; these may or may not work for you). Please use a recent version.
Downloads
To download the application, go to the OmegaT+ SourceForge project page. The latest package release is listed there (version 1.0). Alternatively, open the files section on that page to see all files available to date.
Packages for download are available in a number of formats:
- .7z: 7-zip archive (Linux, Windows)
- .tar.bz2: tar bzip2 archive (mostly UNIX/Linux, some MS applications can work with this, Mac?)
- .tgz or .tar.gz: tar gzipped archive (mostly UNIX/Linux, works with some MS applications and with Mac OS)
Important Note
Packages in Zip format are not provided by this project. The free 7-zip tool can be used to open most, if not all, of the packages that are provided.
Last Updated: March 8, 2008



