Logo

Essentials

Folders and Documents

Project Specific

An OmegaT+ project is made up of a number of folders(directories) and documents.

Folders

<project name>

This folder is created at project creation time (<project name> is replaced by a name entered by the user when creating a project). It is the top-level folder for the other folders and documents in a project.

OmegaT+

Contains project.tmx, metrics.txt, and timestamped versions of project.tmx in the form project.tmx.<time-stamp>.bak, with <time-stamp> replaced by the actual time it was created. Both project.tmx and project.tmx.<time-stamp>.bak are TMX documents.

glossaries

Holds any glossaries to be used in the project. Glossary documents must be placed in this directory before a project is opened. The glossary format is plain-text rows of three columns as either comma separated values (CSV) or tab separated values (TSV) and in the default operating system encoding (i.e., *.csv, or *.tsv file extensions) or UTF-8 (i.e., *.csv8, or *.tsv8 file extensions). The three columns correspond to an original term, translation term, and a comment or description, respectively. These documents can easily be created (exported) or read (imported) by most spreadsheet applications.

originals

Documents for translation are placed into this folder before a project is opened. All translatable documents in this folder and its sub-folders are included for translation in the project. Translatable documents are those that are supported by OmegaT+ with appropriate document filters, these documents must use appropriate file extensions to be used.

Note: the term original is used, in place of the de-facto term source, to refer to a document to be translated.

translations

Translation documents and non-translation documents (those copied over directly from originals without change) are located here after generating translations.

Note: the term translation is used, in place of the de-facto term target, to refer to a translated document.

tms

Legacy translation memories to be utilized for matching are placed into this folder before a project is opened. Only translation memories in TMX Level 1 (version 1.1-1.4b) format.

Folders Configuration

The project folders are created at the same time as a project and, by default, they are grouped directly under the top-level project folder in specific locations. Users may alter the locations of project folders when they are creating a new project or by choosing the project configuration dialog from the Project menu.

Changing the folder locations from the defaults may be useful for some users who, for example, may choose to keep a larger number of translation memories in a particular directory location (e.g., as a TM repository) and select this directory when they create a new project rather than having to copy translation memories into a new project's tms folder.

To change the location of directories after a project has been created it is necessary to alter a project's accepted structure from its default settings.

First, decide whether you want to move or copy folders. If you want to move these then ensure that a project is not open. Then move one or more folders and their documents to a new location.

Upon re-opening the project, OmegaT+ will detect that some folders are missing, and will present the project configuration dialog again so that new locations can be chosen.

If you decide to copy instead, then this may be done while the project is open or closed. Open the project configuration dialog after copying the folders to new locations. Change the folder locations in the dialog and apply the changes. The project will have to be closed and re-opened at this point.

Documents

Project Configuration (project.xml)

This is an XML document that is created automatically by OmegaT+ when a new project is created. It is located in the top-level folder of a project.

TMX exports

<project name>-level1.tmx
<project name>-level2.tmx
<project name>-omegatplus.tmx

These are the TMX documents exported from the project when generating translation memories or translation documents. These are saved in the top-level folder of the project. These documents have all the same segments as the latest generation of the translation documents.

<project name>-level1.tmx corresponds to a TMX level 1 version, <project name>-level2.tmx to TMX level 2, and <project name>-omegatplus.tmx is an OmegaT+ specific TMX containing certain tags that the program understands.

Note: segments from legacy TMX documents, those in tms, are not imported into the project and then exported into the project's TMX. Segments have to be used in a project for them to end up in the exported TMX. For example, by copying/pasting them as matches into an active segment and generating the translations or translation memories afterwards.

Project TMX (project.tmx)

This TMX document, located in OmegaT+, is initially created when a project has first been opened. Segments are added to it when translations or translation memories are generated. It is updated when new segments are available from attempting to generate these again (manual generation or automatic periodic generation by the program). It serves as the backing storage for the in-memory translation memory of a project. When a project is opened this document's contents are read into memory and used for matching segments.

Project TMX Backups (project.tmx.<time-stamp>.bak)

From one to ten backup files of project.tmx may exist in OmegaT+ in the form project.tmx.<time-stamp>.bak with <time-stamp> replaced by an actual time-stamp of when it was saved. A new backup is created each time a project is re-opened for use, with the oldest one past the maximum of ten backup files being deleted at that point. This occurs before changes have taken place in the project that can effect the translation memory document of the project.

Metrics (metrics.txt)

Located in OmegaT+, this document contains some basic data measurements for the project and its documents. For instance, word count, number of characters, etc.

OmegaT+

Installation and use of OmegaT+ involves some specific folders that contain the main files of the application and configuration/informational documents. Their location depends to a certain extent on how OmegaT+ was installed and on what operating system is used.

The installation will usually be in a folder named similar to OmegaT+<version number>, with <version number> replaced by the actual version number of the program. Installation on Linux via RPM will install into /opt/OmegaT+ by default and installation on Mac OS X from .dmg package will be as a OmegaT+.app folder into a user selected location.

Regardless of the name of the top-level installation directory, much of the contents is the same.

Folders

Installation

The installation folder will contain the following sub-folders. Some differences in their location may occur depending on installation method. The Mac OS X .dmg that provides a .app folder has the most differences due to Mac specific requirements needed to launch the application. In this case, the sub-folders are located in another sub-folder, Resources/Java, inside OmegaT+.app

doc

This folder is only available when using offline documentation. The recommended and default setup of OmegaT+ does not use this folder anymore. Online documentation is preferred due to the availability of faster updates and corrections. When desired, the separate package of offline documentation can be downloaded and this folder doc can be extracted into the top-level OmegaT+ install folder.

Program documentation, when used in this folder, is contained in locale specific sub-folders in HTML format (except for the default English version). These can be opened in the users web browser for view from within the application (from the Help menu when Use Offline Documentation is selected from the Settings menu) or opened in any external web browser directly.

lib

Program libraries and related needed for the application to run. Not to be deleted or altered in regular use.

localizations

Localization documents for the application using different locales/languages. These documents can be altered, but only when a new localization for the program itself is being created. Not to be deleted or altered in regular use.

Configuration/Informational

In most cases OmegaT+ creates a specific folder to hold some documents related to the overall configuration of the application, and a few informational documents. If it cannot create this folder for some reason then these documents will be located in the users home directory. The default folder used depends on the operating system.

Where <user home> and <user name> are replaced with the appropriate user home directory path and user name, respectively.

Files

Installation

The installation folder for the basic installation (i.e., OmegaT+<version>.7z) contains these files. Contents of sub-folders are left out. This information is of use to know how to start the application and to point to user documentation.

The installation folders for the Linux and Windows packages (i.e., OmegaT+<version>-Linux-x86.7z and OmegaT+<version>-Windows.7z) have a few differences. A few files are added or left out. This information is of use to know how to start the application.

The Mac OS X .app folder (i.e., from OmegaT+<version>.dmg) contains a number of differences from the basic installation or the Linux/Windows packages. These are not that relevant to the user for starting the application. Just click on the OmegaT+.app to launch the application like any other OS X program. The .dmg for installation contains the License-GPL.txt, changes.txt, and readme.txt files for informational purposes.

Configuration/Informational

OmegaT+.log: standard log file. Contains messages about errors and other actions taken by the application during use. Users can report errors to the OmegaT+ project with reference to their log file.

OmegaT+.wksp: docking desktop configuration XML file. Saved after each use of program and read in at start-up, if available. Should not be edited. Removal of this file will result in loss of desktop settings that differ from the built-in program defaults.

OmegaT+.xml: general configuration XML file. Saved after each use of program and read in at start-up, if available. Should not usually be edited. Removal of this file will result in loss of the users specific program settings that differ from the built-in defaults.

filters.xml: document filters configuration XML file. Saved after each use of program if any modifications are made. Read in when opening a project. Should not be edited. Removal of this file will result in loss of filter settings that differ from the built-in program defaults.

Translation

The primary use of OmegaT+ is in the translation of documents. This is performed within the translation editor area of the application. The editor provides the user with the text of the original (source) document divided up into segments. These are vertically space separated and are numbered, with an indicator viewable in the title bar of the editor or briefly in the status bar of the application after activating a segment.

The active segment (the one currently editable) in the editor shows the original text and the current translation below it, or, depending on the users settings, a copy of the original or a blank area if a translation does not exist yet. Whether there is a copy of the original or a blank depends upon the Insert Original setting in the Segment > Options menu. The original text is rendered in blue. The text area below the original text is where editing takes place. This area acts as an editor in the usual manner, responding to keyboard input, shortcuts, and so forth. Once a translation of a segment has been made a user simply presses <Enter> to accept the translation. The next segment is then activated and translation continues in this manner.

Note: in the current version you must move to another segment for the translation to ultimately be generated in the project's TMX or out to a document. Not doing so and quitting the application will result in loss of any translation made for the active segment. This will be fixed in a future version.

To access other document segments, other than in a strictly forward sequential order, simply click on a segment anywhere in the editor to make it the active one, use the Previous Segment, Next Segment, Next Untranslated Segment, or Go to Segment functions under the Segments menu. Access segments in other documents by selecting another document in the Documents View, using the Previous Document or Next Document functions under the Documents menu, or using the Go to Segment feature for a segment number that is out of range of the current document in view.

This is the most basic functionality of the editor. It by no means shows the full functionality, that encompasses the simple operations described here and those contributed by the translation memory integrated into OmegaT+.

Editing Functions

All the editing functions provided can be applied to the active segment’s translation text, but only a few can be used on the active segment's original text since it is not editable. Only the current segment can be edited at any particular instance. Text outside the active one can otherwise be selected, copied, or pasted. Typical mouse operations, cursor keys, cursor key shortcuts, and shortcuts (copy, cut, and paste) are supported. Specific functionality can vary between some operating systems.

Undo and Redo actions are possible using <Ctrl-Z> and <Ctrl-Alt-Z>, respectively (Mac OS X: the <Command> key is usually used in place of <Ctrl>, but note that there are more differences than this when it comes to keyboard shortcuts. The exact shortcuts are shown in the applications menus).

For matches returned for the active segment (viewable in the Matches View) it is possible to:

(see the sections on Matching for more information on inexact matches).

When working on formatted documents the translation text field will, depending on the particular segment, contain OmegaT+ specific tags that represent the formatting structure of the original document. Be aware that it is not possible to fully edit textual formatting during translation. Only a very limited degree of control can be exercised over it. Formatting of the underlying structured document format, such as OpenDocument, OpenOffice.org, XHTML, HTML, and so forth, is, and must be, retained. Specifically, while formatting tags in these document segments can be moved around or have text added or removed from them, the tags should not be removed and extra ones should not be added.

If extra formatting is required in a translated document, one way to achieve this is to add in very specific textual markers to note the locations in the document where formatting is required. Generate the translation and then format the document externally with another application by finding the markers.

Segmentation

Segmentation is the process of dividing a text up into smaller pieces, the segments. How a text is segmented depends on the level of segmentation used. In general for MAHT(CAT) tools, text is usually segmented at either the paragraph level (via the text's layout) or the sentence level. The paragraph level is quite useable, but it is not fine-grained enough for efficient matching and reuse of translations in most cases.

Sentence-level segmentation provides the possibility of increased matches and more reuse. It comes with the added overhead of needing various rules to be defined by which to perform the segmentation. These rules can ensure that certain text is not mistaken as delimiting sentences when it should not and that other text that does constitute sentences are not overlooked.

This version of OmegaT+ provides support for sentence segmentation and layout-based paragraph segmentation (that can be used for any segmentation desired, but is very rudimentary). It is also possible to have a text segmented at higher or lower levels than paragraph or sentence level; i.e. block-level and phrase-level, respectively. These levels are not dealt with in OmegaT+, a few points are given next for the sake of interest.

Block-level segments are almost useless for translation memory because they are too large and many sub-parts of the segment could better match smaller pieces of text. Phrase-level segmentation could be useful in some situations. This is almost the exact opposite of block level, with too many matches and the need for many rules by which to decide whether a portion of text is a phrase or not. Phrase-level segmentation can be mimicked in sentence segmentation by adding more fine-grained rules if desired. Doing this will lead to more smaller segments and matches. So there will eventually occur a point of diminishing returns where one expends much more effort to get work done. Overall, phrase level segmentation borders on terminology use, rather than translation. That is, it could be put to better use on phrases of a very low number of words (e.g. 1-5).

Upon opening a project, document text is subjected to a segmentation process before it is displayed. In this process, segment boundaries are set. These boundaries are immutable and cannot be changed from within the application during translation.

If bad segmentation occurs, one possible remedy is to close the project and externally modify the documents containing the errors. This may require manipulating sentence and paragraph boundaries. Basically, just make sure it displays properly in its original format and space or lay things out better.

Sentence Segmentation

In the usual sentence-level segmentation, the default setup to use when a project is created, rules are used to determine how segmentation should be performed.

OmegaT+ comes with some default sentence segmentation rules already set for a number of languages. The degree of coverage of these rules varies between languages. Customizing the rules may be necessary if you notice bad segmentation. To customize the rules, close any open project, open the Segmentation Configuration dialog under the Settings menu, and adjust the rules (create, delete, correct, etc.). Any changes will apply to the next project open.

Layout-Based Segmentation

When sentence segmentation is disabled in a project. The layout format of a document determines segmentation. This means that hard returns (carriage returns, line feeds) will indicate where segment breaks occur. Punctuation marks and other possible delimiters will not determine segmentation. This is only applicable to human-readable text, not to other document components, such as images, tables, and other non-text elements.

Layout-based segmentation originated in the legacy code that OmegaT+ is based on when it did not support sentence segmentation. Use it at your own discretion, it may or may not be useful. One caveat of the older layout-based segmentation is that users may created TMX with segmentation type set to paragraph when the segments may or may not correspond to real paragraphs. Thus, users may have created numerous non-standardized TMX in this fashion. These will need to be properly recreated by users to be compliant will the TMX standard and so that they may correctly leverage their translations for use in other documents. Bitext2tmx is one application (OmegaT+ project) that can help you to revise your TMX.

Text in tables, spreadsheets, presentations can be roughly considered to correspond to the sentence level. OmegaT+'s segmentation will reflect the original document's structure. The distinction between paragraph and sentence in this case is irrelevant. If sentence segmentation is disabled when creating a project the segmentation will depend upon document type and text layout.

Text Files Segmentation

Text file segmentation will occur at line breaks. These are sometimes located at the end of actual sentences and usually after paragraphs. The formatting of a document can alter which segments are shown when layout-based segmentation is used. In this situation, block-level, paragraph-level, sentence-level, or phrase-level segments can result. That is to say, segmentation can be forced to any level by inserting line breaks in a document at a desired location.

Please check the formatting of an original text prior to opening it in a project if it may contain carriage returns, line breaks, or other formatting at bad locations that could lead to poor segmentation or presentation.

Tagged Documents Segmentation

Tagged (formatted) documents to OmegaT+ are those that include markup, such as: OpenDocument/OpenOffice.org, HTML/XHTML. Usually, these documents possess two tag types: in line (defines the text style); block level (describes logical structure). Segmentation occurs at block level tags in these formatted documents.

Other Documents Segmentation

OmegaT+ also supports Java properties files, used for resource bundles, configuration, and other purposes by Java applications. These use one line per text item (character string) to be translated when used as resource bundles for Java application localization.

Translation Memory

The most significant feature of OmegaT+ is its incorporation of translation memory capabilities. These capabilities enable the user to reap the benefits of previous translations for reuse in their work. This feature saves much time and prevents duplication of effort.

When previous translations exists for segments they are automatically propagated into a document without the users intervention. This occurs for all identical original segments within a document and throughout all documents of a project. When a document is opened in the editor the user will not see all the original text displayed in the segments anymore. They will see the translations that have been completed along with any original segments that remain to be translated. The active segment will show both the original text and the translated text, this allows the user to easily verify that the translation is correct and edit as necessary with the original text for the segment in full view.

Translation Memory eXchange (TMX)

In OmegaT+, translation memory is stored externally in Translation Memory eXchange (TMX) format. During operation, a project's TMX is read into the program and written back out with any changes made. TMX is a standard format for translation memory data. The main purpose of TMX is to store translations so that they can be exchanged between translators and businesses and used with different tools that support the TMX standard.

TMX is maintained by the Localization Industry Standards Association (LISA) at http://www.lisa.org.

The operation of translation memory and matching is inseparable. The next section deals with matching and how it is used in OmegaT+.

Matching

Matching is the act of comparing two or more pieces of data for equality. In terms of OmegaT+ this means comparing two pieces of textual data in the form of segments.

Terminology

Exact Matches

Exact matches are just that, exact. Every character in two segments match one another. In the present case of OmegaT+ this comes into play with the translation memory function for auto-propagating previous translations into segments for reuse or listing them in the Matches View for selection and reuse. The difference between these two ways of working with exact matches is that the source of the matches are not the same. Auto-propagated matches are presently only taken from the project's translation memory store (i.e., project.tmx in the OmegaT+ folder), whereas the exact matches listed in Matches View are from external translation memory storage (i.e., TMX in tms).

Exact matches are also referred to as full or perfect matches.

Inexact Matches

Inexact, or approximate, matches are those that are less than 100% matches. That is, not all characters in two segments match one another.

In OmegaT+, these matches are not auto-propagated into a document or project and must be selected by the user for insertion. They are always listed in the Matches View for selection, along with other matches and a percentage match value.

Search

Starting with OmegaT+ version 1.0.M2.2, there is a new docking desktop view Search View to make searching easier. This is visible in the main window image given earlier on this page. This replicates the functionality of the Search Window, which still exists from previous versions.

Search View

This view provides the same functionality as the Search Window. The differences are that there is only one view (you cannot have multiple search views, unlike the Search Window) and to change the search settings you right click the mouse to bring up a context menu. Here is a shot of the view (floating) with the menu visible.

Search Window

Search Window

This window provides functionality to search through project documents and TMX to find text and to activate any result segments in the Translation Editor that are returned by double-clicked on them in the window.

Starting with OmegaT+ version 1.0.M2.2, this window has been redesigned for better usage. The window has a split pane (with little arrow heads) that separates the search settings from the search entry/results areas. Click on the appropriate arrow to expand or contract the settings area. That's nicer than the old version, less clutter and more area devoted to the search results when desired. This should make it easier to see the search results when you have overlapping windows on your desktop.

Search Window

An OmegaT+ search window can be opened with the F7 key. It is also possible to open the window with some pre-selected text from the active segment or copy and paste. Selecting text with the mouse or keyboard and pressing F7 will result in the text being automatically entered into the search text area when the window opens.

Location

By default, search is done in the project only, within the original and translation parts of that project. Using the options provided it is possible to also search in a combination of the project, external translation memories (Search TMs option), a single document or a folder of documents (Search documents option). Search will only be done on supported document types.

Type

Exact

Exact search finds ordered sequences of words. All exact matches with respect to order of the words will be displayed. This is an exclusive search.

Keyword

Keyword search finds unordered sequences of words. All matches without respect for order of the words will be displayed. Only the matches that contain all the keywords will be displayed. This is an inclusive search.

Options

There are three options for the search types provided:

Glossaries

A simple function is provided that can be used to view and select terms from user provided bilingual glossaries.

Usage

The Glossaries View shows terms found in glossaries that match terms in the active segment of the translation editor. These are shown as three columns on a line: original term, translated term, and a related description (optional). The user can select terms in the view with their mouse in the usual manner and copy/paste them into the active segment at the cursor location. There is no facility as of yet to add or edit terms from within the program.

To make use of glossaries, documents in the appropriate format containing the terms must be placed in the glossaries folder of a project, or the relevant location if it has been altered from the default.

Glossary Format

The format used for glossaries is a limited form of the standard Comma Separated Values (CSV) format that is available for use in most spreadsheet applications. The limitation is that there can only be three items/cells of data per line.

An example of terms in a glossary file are:

Monday, Lundi, first day of week
Noon, Midi, a time of day reference

Tab separated values (TSV) can also be used for glossaries. The difference is that a tab separates each value, rather than a comma.

The character encoding of a glossary file must be in UTF-8 or the default encoding under the system's locale. In Western countries this will usually be one of those in the ISO-8859 family (e.g. 8859-1, Latin1). In general, it is best to use UTF-8 when working with any bilingual or multilingual documents to avoid problems reading or displaying the contents.

Glossary Creation

Just open an editor, or better yet use a spreadsheet program. Create a new document with your terms added in the described format. Save the document with the proper file extension. Ensure that the encoding used is correct when doing this. Use .csv8 for UTF-8 and .csv for the default encoding. If you use a spreadsheet application, then you will most likely have to look for an export function to save out the data in the right format. Most programs do export to CSV. Put the glossary into the glossaries folder used for a project and open the project.

For TSV format, the difference is that the file extensions are .tsv8 for UTF-8 and .tsv for the default encoding.

Documents and File Formats

Documents

For OmegaT+, documents are the supported types of file formats containing text intended to be translated. These are the only ones available from within the application to translate. All other files (contained in originals) are not read by the program, although they will be copied to the translations folder as is when the translations are generated.

Files

From the point of view of OmegaT+, a project contains files, some of which are to be translated (documents) and others which are not (some other unsupported or non-document file types). These types are thus kept separate and there is no need to consider untranslatable files in the context of the translation process.

Supported Document Formats

OmegaT+ supports the following document formats for translation: Android Resource, DocBook, HHC (HTML Help Compiler), HTML/XHTML, Java properties(.properties), INI(Key/value pairs), ODF(OpenDocument), OOo(OpenOffice.org), OOXML(Office Open XML), PO(Portable Object) ResX, SRT(SubRip), Text, XLIFF, XTag(CopyFlow Gold).

Documents with other extensions can also be opened if they are of a supported type and their file extension are changed to one of the default extensions.

Non-translatable files in originals are copied straight into translations without modifications when translations are generated.

Tagged Documents

OmegaT+ displays text formatting in segments, of supported formats (e.g., HTML, XHTML, OpenDocument, OpenOffice.org), as tags. The displayed tags in these segments are not the ones found in the original format. The formatting is converted to OmegaT+ specific tags when the document is opened that can be manipulated by the user. The formatting is converted back to its original form when the translations are generated.

Tags in the active segment can be edited by the user. Due diligence should be exercised when doing this since errors in the formatting tags can break proper formatting of translation documents such that they may become corrupted and not open externally.

OmegaT+ cannot be used to provide custom formatting, but a certain amount editing may be done to the existing format.

OpenDocument/OpenOffice.org

Tags in each segment are paired as numbered beginning <f> and ending </f> tags (numbers start from 0). These tags must be retained as is to maintain the integrity of the output translation.

Although these tags cannot be changed in general, they can be moved to a different location in the segment (while maintaining order and number of tags in the segment).

Example: to remove formatting from one part of a segment. Given the tagged sentence A text in <f0>italics</f0>. It can be changed to A text in <f0></f0>italics. The italics on italics has effectively been removed.

HTML/XHTML

In these formats the start of formatting is indicated by a beginning tag like <x>. The finish of formatting is indicated by a corresponding ending tag </x> (x is a placeholder for the actual character shown in a tag). Tags use matching numbers that start from 0.

Tag Rules

Tag deletion is allowed. Example: to delete the italics from A text in <x0>italics</x0> take out the tags <x0> and </x0>.

Tag duplication is allowed. Example: in A text in <x0>italics</x0>, and also <x1>this</x1> this has the same italics formatting.

Tag order change is allowed. Example: A text in <x0>italics</x0>, that is <x1> underlined</x1> could become A text <x1>underlined</x1>, that is in <x0>italics</x0>.

Tag nesting is allowed. Example: A text in <x0>italics</x0>, that is <x1>underlined</x1> could become A text in <x0>italics and <x1>that is also underlined</x1></x0>

Overlapped tag nesting is not allowed. Doing so can corrupt the translation document generated. Example: A text in <x0>italics, that is also <x1>underlined</x0>, another text that is underlined and not in italics</x1> Nested tags must have their ending tags come before the ending tags of another tag.

Tag Validation

Tags in tagged documents can be validate using the Validate Tags function under the Document or Project menu. Any suspected tag errors in documents will be displayed in the Tag Validation dialog and the applications status bar will indicate that possible errors have been found. When there are no errors a simple OK message is shown in the status bar.

Automatic Tag Validation

Tag validation can be done manually, by invoking the function from the appropriate menus at any time, or by setting it to automatically be performed at generation of translations using Validate Tags (auto) under the Settings menu.

Errors

If problems occur when attempting to open a generated translation (of a tagged format), then validating tags and fixing errors usually will correct the problem.

Logo
Get OmegaT+ CAT Tools at SourceForge.net. Fast, secure and Free Open Source software downloads