
How far are we from a good Arabic/English MT system?

In this post I will try to perform some MT testing using Google Translate and the MS Windows Live Translator service in order to evaluate their accuracy and, to some extent, the quality of their translations. This evaluation is not automated; it is based on my own human translation. The reason I wrote this post is to find out how far we are from a good MT system that can provide at least minimally acceptable translated text.

I will divide the evaluation into two main aspects: English-related issues first, then Arabic-related issues.

I used two indicators: (1) translation accuracy, which measures the translation from a syntactic point of view, and (2) translation quality, which measures whether the translation conveys the message (semantics).

A. From English to Arabic:

  1. Space-Delimited Compound Words
    Source Language (SL):  Take off your jacket!
    Target Language (TL) – Google:   خلع سترته الخاصة بك! (he took off your jacket!)
    Target Language (TL) – MS:تقلع الغلاف الخارجي الخاص بك! (you took off your outer shell!)
    Translation Accuracy: Bad
    Translation Quality: Bad
  2. Compound Verbs with inserted words between the verb parts:
    SL: Take your jacket off!
    TL-Google:   خذ خارج الغلاف! (take out the shell!)
    TL-MS: تقلع الغلاف الخارجي الخاص بك! (take off your outer shell!)
    Translation Accuracy: Bad
    Translation Quality: Bad
  3. Hyphenated/Solid/Open:
    SL: I used a school-bus.
    TL-Google: كنت في حافلة المدرسة.  (I was in the school-bus)
    TL-MS:استخدام حافلة مدرسية. ( The use of the school-bus)
    Translation Accuracy: Bad
    Translation Quality: Good
  4. Hyphenated/Solid/Open:
    SL: New York is a big city in US.
    TL-Google: نيويورك هي مدينة كبيرة في الولايات المتحدة. (New York is a big city in US.)
    TL-MS:نيويورك مدينة كبيرة في الولايات المتحدة.(New York- is- a big city in US.)
    Translation Accuracy: Very good
    Translation Quality: Very good
  5. Textual Meaning:
    SL: Ali is a pure man.
    TL-Google: علي هو محض رجل.(Aly is a plain man)
    TL-MS:علي رجل محض.(Aly is a plain man)
    Translation Accuracy: Bad
    Translation Quality: Bad
  6. Textual Meaning:
    SL: I studied pure mathematics.
    TL-Google: درست الرياضيات البحتة.(I studied pure mathematics.)
    TL-MS:درست الرياضيات محض.(I studied plain mathematics.)
    Translation Accuracy: Very good
    Translation Quality: Very good
  7. Textual Meaning:
    SL: he is trying to contain acts.
    TL-Google:انه يحاول احتواء الأعمال.(he is trying to include acts)
    TL-MS:يحاول تحتوي على الأعمال.(trying including of acts.)
    Translated text meaning:
    Translation Accuracy: Bad
    Translation Quality: Bad
  8. Suggestive Meaning (Culture dependent translation):
    SL: The girl is as white as snow.
    TL-Google:الفتاة كما الثلج الأبيض(the girl is like the white ice)
    TL-MS:كما بيضاء كما الثلوج الفتاة. (like white like snow the girl)
    Translation Accuracy: Bad
    Translation Quality: Bad
  9. Suggestive Meaning (Culture dependent translation):
    SL: UNESCO is a big organization.
    TL-Google:اليونسكو منظمة كبيرة(UNESCO is a big organization)
    TL-MS:اليونسكو منظمة كبيرة.(UNESCO is a big organization)
    Translation Accuracy: Very good
    Translation Quality: Very good
  10. Proper Arabic morphological generation:
    SL: The man said they will write it.
    TL-Google:وقال الرجل أنها سوف اكتبها.
    TL-MS: وقال الرجل سوف تكتب عليه
    The correct translation: وقال الرجل أنهم سيكتبونها.
    Translation Accuracy: Bad
    Translation Quality: Bad

B. From Arabic to English:

  1. Suggestive Meaning (Culture dependent translation):
    SL: الفتاة كالقمر في جمالها.
    TL-Google:Moon girl beauty.
    TL-MS:Girls as Moon in aesthetics
    The correct translation: The girl is as fair as Snow White.
    Translation Accuracy: Bad
    Translation Quality: Bad
  2. Arabic morphological translation:
    SL: وقال الرجل انهم سيكتبونها.
    TL-Google:The man said they Sictbunha.
    TL-MS:And the man they سيكتبونها
    The correct translation: The man said they will write it.
    Translation Accuracy: Bad
    Translation Quality: Bad
Google translate service: http://translate.google.com
I plan to add more results in future posts, God willing.
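
As a rough summary of the ratings in this post, here is a minimal Python sketch that tallies them. The (accuracy, quality) pairs are copied from the twelve examples listed here, each of which was rated once for both services together; the variable and function names are only illustrative and are not part of any evaluation tool.

```python
from collections import Counter

# (accuracy, quality) ratings copied from the examples in this post.
# Names such as ENGLISH_TO_ARABIC and tally() are illustrative only.
ENGLISH_TO_ARABIC = [
    ("bad", "bad"),              # 1. Take off your jacket!
    ("bad", "bad"),              # 2. Take your jacket off!
    ("bad", "good"),             # 3. I used a school-bus.
    ("very good", "very good"),  # 4. New York is a big city in US.
    ("bad", "bad"),              # 5. Ali is a pure man.
    ("very good", "very good"),  # 6. I studied pure mathematics.
    ("bad", "bad"),              # 7. he is trying to contain acts.
    ("bad", "bad"),              # 8. The girl is as white as snow.
    ("very good", "very good"),  # 9. UNESCO is a big organization.
    ("bad", "bad"),              # 10. The man said they will write it.
]

ARABIC_TO_ENGLISH = [
    ("bad", "bad"),              # 1. Suggestive meaning example
    ("bad", "bad"),              # 2. Arabic morphology example
]


def tally(ratings):
    """Count how often each rating appears for accuracy and for quality."""
    accuracy = Counter(acc for acc, _ in ratings)
    quality = Counter(qual for _, qual in ratings)
    return accuracy, quality


all_ratings = ENGLISH_TO_ARABIC + ARABIC_TO_ENGLISH
accuracy_counts, quality_counts = tally(all_ratings)

print("Examples:", len(all_ratings))
print("Accuracy:", dict(accuracy_counts))
print("Quality: ", dict(quality_counts))
```

Counting this way, nine of the twelve examples are rated bad for accuracy and eight are rated bad for quality, while only three are rated very good on both indicators.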

Posted by on 25/09/2009 in Uncategorized

 

My first published paper

My paper, "An English-To-Arabic Web Translation Agent", is available here: imam-195. It was published at the ICENCO'2004 conference.

 

Posted by on 27/06/2009 in MT

 

Arabic/English Online Free Dictionaries

Did you ever wonder about the meaning of an English or Arabic word, or its pronunciation? There are many useful online dictionaries that can help you out. I decided to put together a list of them so anyone can find and use them easily. This is also related to my Master's thesis, as the dictionary component is important in MT research.

  1. Dicts http://dicts.info/
  2. Alburaq http://www.alburaq.net/dictionary/transform.cfm (also provides an Arabic roots dictionary: http://www.alburaq.net/mukhtar/root.cfm)
  3. tps.edu http://www.tps.edu.ee/nastik/ar-en/ (very simple)
  4. ECTACO http://www.ectaco.co.uk/English-Arabic-Dictionary/ (organized output with similar words)
  5. Dicts.info http://www.dicts.info/2/english-arabic.php (English pronunciation available)
  6. Sakhr http://dictionary.sakhr.com/ (note that you need to type the word into the website's search input box :); the output is very good and includes diacritics, i.e. tashkeel)
  7. Lexilogos http://www.lexilogos.com/english/arabic_dictionary.htm (linked to many other dictionaries)
  8. Albalqa http://dictionary.bau.edu.jo/Search.aspx
  9. Sensagent http://dictionary.sensagent.com/ (rich output)
  10. Babylon http://www.babylon.com/define/98/English-Arabic-Dictionary.html
 

Posted by on 25/08/2008 in Arabic, MT, Uncategorized

 


Arabic to English Online Machine Translation Services

As I'm working on my Master's thesis on English-to-Arabic machine translation, I decided to collect all the online English-to-Arabic (and vice versa) machine translation services. The scope of this post is only to list the translation web sites. I may write more posts evaluating the translation quality in the future, God willing.

  1. Google Translate http://translate.google.com/translate_t (free)
  2. Tarjem http://www.tarjem.com/ (free)
  3. Sakhr http://translate.sakhr.com/Sakhr/ (I think this is not free and you need to log in)
  4. Almisbar http://www.almisbar.com/ (not free)
  5. Systran http://www.systran.co.uk/ (free)
  6. WorldLingo http://www.worldlingo.com/microsoft/computer_translation.html
  7. Microsoft Windows Live Translator http://www.windowslivetranslator.com/Default.aspx (based on Systran)
  8. translated.net http://free.translated.net/ (gives you a one-time trial translation)
  9. Stars21.com http://www.stars21.com/translator/arabic_to_english.html (uses some of the translation sites above)
  10. There is another free translation service, but it doesn't support Arabic: http://www.freetranslation.com/

Please let me know your feedback.

Enjoy!

 

Posted by on 24/08/2008 in MT

 


10 Reasons to reject OOXML as an ISO standard

I have listed 10 points here, but you can actually find hundreds of reasons and thousands of issues and defects. For me, a much shorter list than this one would be enough to reject OOXML and keep a single ISO standard for office document formats, namely ODF (OpenDocument Format). From my point of view, OOXML standardization is useful only for Microsoft and not for the rest of the world, while ODF is bad for Microsoft and good for the rest of the world for the same reasons. Have a look at the following list and consider whether OOXML should be approved by ISO or rejected (even on the normal ISO track, which I think is not possible according to the ISO directives). Here is why OOXML (i.e. ISO DIS29500) should be rejected:

  1. The current ISO fast track is not suitable for OOXML DIS29500, which is more than 6000 pages long and is still not ready for the fast track. Ready means it is complete, bug-free and has sufficient consensus from all parties. This is still not true for OOXML.
  2. More than 3000 comments were submitted by the NBs (National Bodies) and later normalized to 1026 non-duplicate comments, which had no chance of careful discussion at the Ballot Resolution Meeting in Geneva last February.
  3. Many issues have been discovered related to requirements specific to Arabic and Islamic culture. These comments have not reached ISO because of the limited time allowed for review.
  4. This is a redundant standard. The current ODF ISO standard is sufficient and satisfactory for all NBs around the world. OOXML is satisfactory only for MS, because it will let MS retain its monopoly over office suites and document file formats.
  5. Approving OOXML as an ISO standard works against the interoperability of government documents, as it will restrict the exchange of documents with citizens or other countries that use open source and office suites other than MS products.
  6. If OOXML is approved as an ISO standard, this will force millions of people around the world who want to deal with their governments to use this format. This means a great increase in revenue for MS and a big loss for our country's national income. This can be avoided by rejecting OOXML as a standard and supporting the current ODF ISO standard.
  7. OOXML will create dual ISO standards for the same thing, which will divide the world and its people. Dual standards also always mean increased costs and confusion for industries, governments and citizens.
  8. OOXML didn't achieve any of the goals stated in the proposed standard draft. For example, full compatibility with the current MS Office binary format cannot be implemented using the proposed specification; only MS can implement it.
  9. There is currently no full implementation of OOXML. Even MS Office 2007 is not compliant with the proposed DIS29500 specification. How can OOXML be an ISO standard when no full implementation exists? A standard needs several full implementations from different vendors to guarantee maturity and common, repeated use.
  10. This format conflicts with existing ISO standards, such as ISO 8601 (Representation of dates and times), ISO 639 (Codes for the representation of names of languages) and ISO/IEC 10118-3 (cryptographic hash functions).

Hence, for all these reasons, and since the NBs must vote by the end of the current month on the final ISO DIS29500 standardization, I think OOXML should fail and return to the normal ISO track.

 

Posted by on 24/03/2008 in Arabic, ISO, ODF, OOXML

 

Why OOXML should be rejected

This is a technical presentation about the MS Open XML format, which is currently on the ISO fast track to standardization. The presentation covers the subject from many angles. OOXML should not be an ISO standard for many reasons, as shown in the presentation. It shows the lack of support for languages based on the Arabic script, as well as missing Islamic-specific requirements. I'm asking all National Bodies around the world to reject OOXML in the current fast track; instead of having dual standards, MS should improve the already approved ODF standard (OpenDocument Format). Please watch the presentation carefully and let me know your comments.

 

Posted by on 09/03/2008 in Arabic, ISO, ODF, OOXML

 
