In this post I will test two machine translation (MT) services, Google Translate and Microsoft's Windows Live Translator, to evaluate their accuracy and, to some extent, the quality of their translations. The evaluation is not automated: it relies on comparing the output with my own human translation. I wrote this post to find out how far we are from a good MT system that can produce translated text of at least minimally acceptable quality.
I will divide the evaluation into two main aspects: the first covers English-related issues and the second covers Arabic-related issues.
I used two indicators: (1) translation accuracy, which measures the translation from a syntactic point of view, and (2) translation quality, which measures whether the translation conveys the message (semantics).
A. From English to Arabic:
Space-Delimited Compound Words
Source Language (SL): Take off your jacket!
Target Language (TL) – Google: خلع سترته الخاصة بك! (he took off your jacket!)
Target Language (TL) – MS: تقلع الغلاف الخارجي الخاص بك! (you took off your outer shell!)
Translation Accuracy: Bad
Translation Quality: Bad
Compound Verbs with inserted words between the verb parts:
SL: Take your jacket off!
TL-Google: خذ خارج الغلاف! (take out the shell!)
TL-MS: تقلع الغلاف الخارجي الخاص بك! (take off your outer shell!)
Translation Accuracy: Bad
Translation Quality: Bad
Hyphenated/Solid/Open:
SL: I used a school-bus.
TL-Google: كنت في حافلة المدرسة. (I was in the school-bus)
TL-MS: استخدام حافلة مدرسية. (The use of the school-bus)
Translation Accuracy: Bad
Translation Quality: Good
Hyphenated/Solid/Open:
SL: New York is a big city in US.
TL-Google: نيويورك هي مدينة كبيرة في الولايات المتحدة. (New York is a big city in US.)
TL-MS: نيويورك مدينة كبيرة في الولايات المتحدة. (New York is a big city in US.)
Translation Accuracy: Very good
Translation Quality: Very good
Textual Meaning:
SL: Ali is a pure man.
TL-Google: علي هو محض رجل. (Ali is a plain man)
TL-MS: علي رجل محض. (Ali is a plain man)
Translation Accuracy: Bad
Translation Quality: Bad
Textual Meaning:
SL: I studied pure mathematics.
TL-Google: درست الرياضيات البحتة. (I studied pure mathematics.)
TL-MS: درست الرياضيات محض. (I studied plain mathematics.)
Translation Accuracy: Very good
Translation Quality: Very good
Textual Meaning:
SL: he is trying to contain acts.
TL-Google: انه يحاول احتواء الأعمال. (he is trying to include acts)
TL-MS: يحاول تحتوي على الأعمال. (trying including of acts.)
Translation Accuracy: Bad
Translation Quality: Bad
Suggestive Meaning (Culture dependent translation):
SL: The girl is as white as snow.
TL-Google: الفتاة كما الثلج الأبيض (the girl is like the white ice)
TL-MS: كما بيضاء كما الثلوج الفتاة. (like white like snow the girl)
Translation Accuracy: Bad
Translation Quality: Bad
Suggestive Meaning (Culture dependent translation):
SL: UNESCO is a big organization.
TL-Google: اليونسكو منظمة كبيرة (UNESCO is a big organization)
TL-MS: اليونسكو منظمة كبيرة. (UNESCO is a big organization)
Translation Accuracy: Very good
Translation Quality: Very good
Proper Arabic morphological generation:
SL: The man said they will write it.
TL-Google: وقال الرجل أنها سوف اكتبها.
TL-MS: وقال الرجل سوف تكتب عليه
The correct translation: وقال الرجل أنهم سيكتبونها.
Translation Accuracy: Bad
Translation Quality: Bad
B. From Arabic to English:
Suggestive Meaning (Culture dependent translation):
SL: الفتاة كالقمر في جمالها.
TL-Google: Moon girl beauty.
TL-MS: Girls as Moon in aesthetics
The correct translation: The girl is as fair as Snow White.
Translation Accuracy: Bad
Translation Quality: Bad
Arabic morphological translation:
SL: وقال الرجل انهم سيكتبونها.
TL-Google: The man said they Sictbunha.
TL-MS: And the man they سيكتبونها
The correct translation: The man said they will write it.
Translation Accuracy: Bad
Translation Quality: Bad
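The ratings recorded above can be tallied with a short script. This is only a minimal sketch and was not part of the evaluation itself: the `RATINGS` list simply transcribes the accuracy and quality labels given for the twelve test sentences in this post, and the names `RATINGS` and `tally` are illustrative.

```python
from collections import Counter

# One entry per test sentence, in the order they appear in this post:
# (direction, accuracy rating, quality rating).
RATINGS = [
    ("en-ar", "bad", "bad"),              # Take off your jacket!
    ("en-ar", "bad", "bad"),              # Take your jacket off!
    ("en-ar", "bad", "good"),             # I used a school-bus.
    ("en-ar", "very good", "very good"),  # New York is a big city in US.
    ("en-ar", "bad", "bad"),              # Ali is a pure man.
    ("en-ar", "very good", "very good"),  # I studied pure mathematics.
    ("en-ar", "bad", "bad"),              # he is trying to contain acts.
    ("en-ar", "bad", "bad"),              # The girl is as white as snow.
    ("en-ar", "very good", "very good"),  # UNESCO is a big organization.
    ("en-ar", "bad", "bad"),              # The man said they will write it.
    ("ar-en", "bad", "bad"),              # culture-dependent Arabic sentence
    ("ar-en", "bad", "bad"),              # Arabic morphological sentence
]


def tally(ratings):
    """Count how often each label was given for accuracy and for quality."""
    accuracy = Counter(acc for _, acc, _ in ratings)
    quality = Counter(qual for _, _, qual in ratings)
    return accuracy, quality


accuracy_counts, quality_counts = tally(RATINGS)
print("Accuracy:", dict(accuracy_counts))
print("Quality:", dict(quality_counts))
```

On these twelve sentences, nine of the accuracy ratings and eight of the quality ratings are "bad". The sample is far too small for statistics, but the tally makes the overall picture easy to see.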
Did you ever wonder about the meaning or pronunciation of an English or Arabic word? There are many useful online dictionaries that can help you out. I decided to put together a list of them so anyone can find and use them easily. This is also related to my Master's thesis, as the dictionary component is important in MT research.
Sakhr http://dictionary.sakhr.com/ (You need to type the word into the website's search box; the output is very good and includes diacritics, i.e. tashkeel.)
As I’m working on my Master's thesis on English-to-Arabic machine translation, I thought I would collect all the online English-to-Arabic (and vice versa) machine translation services. The scope of this post is only to collect the translation websites. God willing, I may write more posts in the future to evaluate their translation quality.
Of course, I list only 10 points here, but you can actually find hundreds of reasons and thousands of issues and defects. For me, a much shorter list than the one I’m providing here would be enough to reject OOXML and stick to a single ISO standard for office document formats, which is ODF (OpenDocument Format). From my point of view, OOXML standardization is useful only for Microsoft and not for the rest of the world, while ODF is bad for Microsoft and good for the rest of the world for the same reasons. Have a look at the following list and consider whether OOXML should be approved by ISO or rejected (even on the normal ISO track, which I think is not possible according to the ISO directives). Here is why OOXML (i.e. ISO DIS29500) should be rejected:
The current ISO fast track is not suitable for OOXML DIS29500, which is more than 6000 pages long and is still not ready for the fast track. Ready means that it is complete, bug-free and has enough consensus among all parties. This is still not true for OOXML.
More than 3000 comments were submitted by the national bodies (NBs) and later consolidated into 1026 non-duplicate comments, which had no chance of careful discussion at the ballot resolution meeting in Geneva last February.
Many issues have been discovered relating to requirements specific to Arabic and Islamic culture. None of these comments reached ISO because of the limited time allowed for review.
This is a redundant standard. The current ODF ISO standard is sufficient and satisfactory for all NBs around the world. OOXML is satisfactory only for MS, because it will let MS retain its monopoly over office suites and document file formats.
Approving OOXML as an ISO standard works against the interoperability of government documents, as it will restrict the exchange of documents with citizens or other countries that use open source software and office suites other than MS products.
If OOXML is approved as an ISO standard, millions of people around the world who want to deal with governments will be forced to use this format. This means a great increase in revenue for MS and a large loss for our country's national income. This can be avoided by rejecting OOXML as a standard and supporting the current ODF ISO standard.
OOXML will create dual ISO standards for the same thing, which will divide the world and its people. Dual standards also always mean increased costs and confusion for industries, governments and citizens.
OOXML did not achieve any of the goals stated in the proposed draft standard. For example, full compatibility with the current MS Office binary formats cannot be implemented from the proposed specification; only MS can implement it.
Currently there is no full implementation of OOXML. Even MS Office 2007 does not comply with the proposed DIS29500 specification. How can OOXML be an ISO standard when no full implementation exists? A standard needs several full implementations from different vendors to demonstrate maturity and common, repeated use.
This format conflicts with existing ISO standards, such as ISO 8601 (representation of dates and times), ISO 639 (codes for the representation of names of languages) and ISO/IEC 10118-3 (cryptographic hash).
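To make the ISO 8601 point in the last item more concrete, here is a minimal sketch contrasting an ISO 8601 calendar date (YYYY-MM-DD) with a spreadsheet-style serial day count. The epoch `ASSUMED_EPOCH` and the helper names `serial_to_iso` and `iso_to_serial` are assumptions chosen for illustration only; they are not taken from the DIS29500 text.

```python
from datetime import date, timedelta

# Hypothetical epoch for a serial day count; an assumption for this
# illustration, not a value quoted from the DIS29500 specification.
ASSUMED_EPOCH = date(1899, 12, 30)


def serial_to_iso(serial_days: int) -> str:
    """Convert a serial day count into an ISO 8601 calendar date string."""
    return (ASSUMED_EPOCH + timedelta(days=serial_days)).isoformat()


def iso_to_serial(iso_text: str) -> int:
    """Convert an ISO 8601 calendar date string into a serial day count."""
    return (date.fromisoformat(iso_text) - ASSUMED_EPOCH).days


# An ISO 8601 date is self-describing; a bare serial number means nothing
# unless the reader also knows which epoch the writer assumed.
print(iso_to_serial("2000-01-01"))
print(serial_to_iso(iso_to_serial("2000-01-01")))
```

The round trip only works because both helpers share the same assumed epoch, which is exactly the kind of out-of-band agreement that an ISO 8601 string does not need.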
Hence, for all these rational reasons, and since the NBs must vote by the end of the current month on the final standardization of ISO DIS29500, I think OOXML should fail and return to the normal ISO track.
This is a technical presentation about the MS Open XML format, which is currently on the ISO fast track to standardization. The presentation covers the subject from many angles. OOXML should not be an ISO standard, for the many reasons shown in the presentation. The presentation shows the lack of support for languages based on the Arabic script, as well as missing Islamic-specific requirements. I’m asking all national bodies around the world to reject OOXML on the current fast track; instead of having dual standards, MS should improve the already approved ODF standard (OpenDocument Format). Please watch the presentation carefully and let me know your comments.