
Author Archives: woransa

Progress Update on: Proposal for Replacing @ in Arabic email addresses with an Arabic symbol

Here is a new update on the proposal for replacing @ in Arabic email addresses with an Arabic symbol.

Current Progress:

Yes, you can use it right now!

With the help of Arabic font designers (Saud and Abdullah Aref from Maktabat Alkhotot Alarabia), I was able to modify some fonts, so we now have a font that renders @ in its new Arabic shape. The example below is from MS Word.

The new fonts can be downloaded from here.

Future work:

To Do:

Phase 1: Use this symbol in email clients and other applications that handle email addresses, and spread its usage and implementation.

Phase 2: Propose the new symbol to Unicode and other standards bodies as a new character to be used in email addresses with Arabic domain names.

 

Posted by on 10/11/2010 in Uncategorized

 

Proposal for Replacing @ in Arabic email addresses with an Arabic symbol

This is a proposal for replacing @ in Arabic email addresses with an Arabic symbol. As you know, Arabic (and other internationalized) domain names are now supported; for example, in Egypt TE-Data has started registering Arabic domains (see http://www.tedata.net/web/eg/ar/default.aspx?sec=66&pr=12).

Since both the username and the domain name of some email addresses will be in Arabic, it makes little sense to keep the "@" sign between them, as it is originally a Latin symbol used as an abbreviation for "at". This proposal therefore suggests adding a new character to Unicode to replace "@" in email addresses that use Arabic domain names. The new symbol would be used only when an Arabic email address is displayed: the memory buffer would not change, and the address stored internally should still contain "@" so as not to break the email protocols (e.g. SMTP).
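
To make the display-only rule concrete, here is a minimal Python sketch. It assumes the new symbol has been given a code point; since none exists, it uses the Private Use Area code point U+E000 as a hypothetical stand-in (`ARABIC_AT`). It also assumes that a domain containing any character from the basic Arabic block (U+0600 to U+06FF) counts as an Arabic domain. The stored address is never changed; only the string handed to the screen differs. The names and the sample address are illustrative only.

```python
# Hypothetical stand-in for the proposed Arabic "@" symbol: no code point has
# been assigned, so a Private Use Area code point is used for illustration.
ARABIC_AT = "\uE000"


def contains_arabic(text: str) -> bool:
    """Return True if any character is in the basic Arabic block U+0600..U+06FF."""
    return any("\u0600" <= ch <= "\u06FF" for ch in text)


def display_address(address: str) -> str:
    """Build the on-screen form of an email address.

    The argument is the address as stored and sent over SMTP; the caller keeps
    using it unchanged for sending and replying. Only addresses whose domain
    contains Arabic characters get the Arabic symbol in place of the last "@".
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not contains_arabic(domain):
        return address  # not an address, or not an Arabic domain: show as-is
    return local + ARABIC_AT + domain
```

For example, `display_address("user@example.com")` returns the address unchanged, while a hypothetical Arabic address such as "وليد@مثال.مصر" is shown with the stand-in symbol. With the font-based approach from the progress update, no substitution is needed at all, because the font itself draws the ordinary "@" in the new shape.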

The suggested symbol is the Arabic letter Aien "ع" with a long tail that circles it, in the same way the tail circles the letter "a" in @. The Aien is suitable because it can be read as an abbreviation of any of the following:

  1. على – means “on”
  2. عند – means “at”
  3. عنوان – means “address”

I am not very good at art, but I did my best to design a draft symbol just to illustrate the idea; its actual shape will of course vary from font to font. The draft is shown in the figure below.

Please leave comments with your feedback and opinion.

 

Posted by on 21/10/2010 in Arabic, ISO, Proposal, Unicode

 

Proposal for Hijri Date Synchronization Standard and Protocol

Hello all,

This proposal focuses on the need for a Hijri date synchronization standard and its protocol.

Note: here, synchronization means synchronizing the client's date with the actual date provided by the suggested service.

Introduction:

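1. Currently, the Hijri calendar in computer systems is based on a conversion algorithm from the Gregorian calendar. This conversion is inaccurate because the Hijri calendar is lunar: each month begins with the locally observed (or officially announced) new crescent moon, which a fixed arithmetic rule cannot predict exactly.

2. Each Islamic country may have a different local Hijri date; this date is usually synchronized with the Saudi Hijri date by the month of Zo-ElHeja because of the Day of Arafa and Eid Al-Adha.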
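
Item 1 refers to the arithmetic conversion that software uses today. One widely used rule is the tabular Islamic calendar: months alternate between 30 and 29 days, and 11 years in every 30-year cycle are leap years with a 30-day Zo-ElHeja. The Python sketch below follows the formula given in Reingold and Dershowitz's book Calendrical Calculations (civil epoch of 16 July 622, Julian). The `adjustment_days` parameter is an assumption that mimics the manual Hijri date adjustment some operating systems offer. Because the rule never looks at the moon, its result can differ by a day or more from the date actually observed in a given country, which is exactly the gap this proposal wants to close.

```python
from datetime import date

# Rata Die number of 1 Muharram 1 AH (16 July 622 Julian = 19 July 622
# Gregorian). Python's date.toordinal() counts days the same way
# (1 = 1 January of year 1 in the proleptic Gregorian calendar).
ISLAMIC_EPOCH = 227015


def tabular_hijri_to_gregorian(year: int, month: int, day: int,
                               adjustment_days: int = 0) -> date:
    """Estimate the Gregorian date of a Hijri date with the tabular calendar.

    adjustment_days shifts the result manually (e.g. -1 or +1) to match a
    country's observed month start; the arithmetic rule alone cannot know it.
    """
    if not (1 <= month <= 12 and 1 <= day <= 30):
        raise ValueError("month must be 1..12 and day 1..30")
    # Odd months have 30 days, even months 29 (before Zo-ElHeja's leap day).
    days_before_month = 29 * (month - 1) + (6 * month - 1) // 11
    # 354 days per common year plus the leap days of the 30-year cycle.
    days_before_year = (year - 1) * 354 + (3 + 11 * year) // 30
    fixed = ISLAMIC_EPOCH - 1 + days_before_year + days_before_month + day
    return date.fromordinal(fixed + adjustment_days)
```

For example, `tabular_hijri_to_gregorian(1432, 1, 1)` gives 8 December 2010; a country whose month began one day earlier would need `adjustment_days=-1`, and today that offset has to be set by hand.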

Problem description:

1. A typical end user of today's operating systems and applications cannot get an accurate Hijri date using the current infrastructure and technology.

2. Application developers have no reliable method to obtain the actual Hijri date. Instead, most application developers and web sites (e.g. Aljazeera, CNN) currently set the correct Hijri date manually every day. This manual setting consumes effort and can cause silly mistakes.

3. Currently there is no formal way to know the Hijri date in other countries. A user can of course check online newspapers from that country, but this is manual, slow, and sometimes difficult. Automation is needed to integrate the Hijri date into computer systems.

4. Currently there is no way to convert Hijri dates across countries. In other words, if it is 2 Muharram in Egypt, we sometimes need to know the current Hijri date in India, for example to allow reliable use of the Hijri calendar. Reliable use means that if someone sets a meeting on a specific Hijri date (2 Muharram), it appears to each participant in his own local Hijri calendar (say, 3 Muharram in India).

5. If the Hijri calendar is inaccurate, conversion between Gregorian and Hijri dates will be inaccurate as well. The lack of an accurate Hijri date is therefore a real problem in computer systems; it needs to be fixed to make the Hijri calendar reliable and hence usable.

6. Technology and globalization (G11n) should support and respect Islamic culture and its calendar instead of forcing people to use other calendar systems. We also need to remember the following verse (Aya) of the Quran. Our responsibility is to make technology support the Hijri calendar, at the very least so that no end user can claim that the lack of Hijri support in technology prevents its use.

{إِنَّ عِدَّةَ الشُّهُورِ عِندَ اللّهِ اثْنَا عَشَرَ شَهْرًا فِي كِتَابِ اللّهِ يَوْمَ خَلَقَ السَّمَاوَات وَالأَرْضَ مِنْهَا أَرْبَعَةٌ حُرُمٌ ذَلِكَ الدِّينُ الْقَيِّمُ فَلاَ تَظْلِمُواْ فِيهِنَّ أَنفُسَكُمْ وَقَاتِلُواْ الْمُشْرِكِينَ كَآفَّةً كَمَا يُقَاتِلُونَكُمْ كَآفَّةً وَاعْلَمُواْ أَنَّ اللّهَ مَعَ الْمُتَّقِينَ} (36) سورة التوبة

(Meaning in English: "Indeed, the number of months with Allah is twelve months in the Book of Allah, on the day He created the heavens and the earth; of them, four are sacred. That is the upright religion, so do not wrong yourselves during them. And fight the polytheists all together as they fight you all together, and know that Allah is with the righteous." Surat At-Tawbah, 9:36)

Proposal

1. Create a standard, centralized, publicly available service that allows synchronization of the Hijri date per country, and that uses the Hijri date of Makka as the standard unified date. (I suggest the name "Internet Hijri Service (IHS)" or "Internet Synchronization Hijri Service (ISHS)".)

2. We will need to write a standard protocol describing the communication between the service requester and the provider; it should also define the request format and required information, and the response format and information.

3. We may need to define various out-of-the-box formats for the request and response, such as an HTML div, XML, plain text, Java, RPC, etc.

4. It might be useful to provide a service that returns the names of the countries that share a specific Hijri date.

5. We need to request that this service be added to the Hijri date adjustment feature of operating systems such as Windows, Linux, Solaris, AIX, AS/400, etc. Eclipse SWT, .NET, and Java should also have APIs to get the Hijri date for a specific country, possibly based on the locale object.

6. We need to find a way to add this service to the Unicode CLDR, perhaps by adding a Hijri date format tag that is updated automatically using the standard protocol. (This needs more discussion in case it is not possible, and it can be postponed until we have something solid.)

7. We need to add a conversion service to the standard protocol that returns the equivalent Gregorian date of any Hijri date, and the equivalent Hijri date of any Gregorian date. This is very important to make the Hijri calendar reliable, so that its dates can be accurately converted to and from Gregorian dates.
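
The protocol itself is still to be defined, but a sketch can show what items 2, 4, and 7 imply. The Python below assumes a JSON encoding and invents the message and field names (`HijriDateRequest`, `HijriDateResponse`, `confirmed`, and so on) purely for illustration; country codes are assumed to be ISO 3166-1 alpha-2 codes such as "EG" or "SA". The response carries both calendars, so the same message can also serve the conversion service of item 7, and a small helper groups countries that share a Hijri date, as item 4 suggests.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass


# Hypothetical messages for the proposed Internet Hijri Service (IHS).
@dataclass
class HijriDateRequest:
    country: str          # ISO 3166-1 alpha-2 code, e.g. "EG"
    gregorian_date: str   # ISO 8601 date, e.g. "2010-12-08"


@dataclass
class HijriDateResponse:
    country: str
    gregorian_date: str
    hijri_year: int
    hijri_month: int
    hijri_day: int
    confirmed: bool       # True once the start of this Hijri month is announced


def encode(message) -> str:
    """Serialize a request or response dataclass as JSON text."""
    return json.dumps(asdict(message), ensure_ascii=False, sort_keys=True)


def decode_response(text: str) -> HijriDateResponse:
    """Parse JSON text produced by encode() back into a response."""
    return HijriDateResponse(**json.loads(text))


def countries_sharing_date(responses):
    """Map each (year, month, day) Hijri date to the sorted countries on it."""
    groups = defaultdict(list)
    for r in responses:
        groups[(r.hijri_year, r.hijri_month, r.hijri_day)].append(r.country)
    return {key: sorted(countries) for key, countries in groups.items()}
```

JSON is only one of the formats item 3 lists; the same fields could equally be carried in XML or plain text.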

Some applications:

1. This standard protocol and service would make some applications possible and reliable. Religious events (Eid Al-Fitr, Eid Al-Adha, the first day of the Hijri year, Ashura, Mawlid Nabawy, the 1st of Ramadan, etc.) could be integrated, based on the user's locale, into current scheduling applications such as calendar applications. These define working days but cannot recognize Islamic religious holidays and events unless they are entered manually, which is a usability issue. Operating systems like Windows and Linux could also use such religious events smartly. The ICU Hijri calendar could be updated to act as an interface that adjusts the Hijri date from the public service before returning the date to the user (when Internet access is available; otherwise it would keep using the current conversion from the Gregorian date).

2. This service will allow reliable use of the Hijri date when setting meetings with different people (in the same country, such as Egypt) using the local Hijri calendar instead of the Gregorian one; for example, I can set a meeting for the 24th of Muharram. Setting a meeting with the Hijri calendar is highly reliable for any date in the current month, because the start of that Hijri month is already determined (note that the current implementation of the Hijri calendar does not allow this, because it does not use the accurate Hijri date). The same reliability applies to past events or dates; for example, you sometimes need to record the dates of a trip or its travel expenses. Setting a date outside the current Hijri month, however, is not highly reliable. For example, setting a date of 20 Saffar (assuming Saffar is the next month) needs an etiquette between users that allows a 1-day adjustment: at the beginning of Saffar, the system automatically adjusts the date for all invitees and, if needed, sends a reschedule of the meeting, so users can accept the new, confirmed date based on the actual start of Saffar. This is unavoidable if the Hijri calendar is to be reliable for future months (note that this is not an issue for dates in the current Hijri month, as described above).

3. Web sites can use this service to reliably display the current Hijri date for a specific country, so a web site like Aljazeera.net does not have to set the date manually every day. It also becomes possible to display the Hijri date according to the visitor's country rather than the server's local Hijri date (e.g. the local date in Qatar for Aljazeera.net), which may sometimes be needed. Think of how web sites can display times in your own local time rather than the server time, or of a congratulations message shown to users on specific Islamic events according to today's Hijri date in their country.

4. Reliable conversion of Hijri dates to Gregorian and vice versa allows other countries, embassies, etc. to use the Hijri calendar in their communication (besides the Gregorian calendar) without worrying about setting a wrong Hijri date for Egypt, Saudi Arabia, etc.

5. We may think of many more applications, but these are the ones I have thought of for now. I am sure the list can be expanded to include Hijri dates in Office documents, quote-of-the-day applications, visa and accounting applications (especially for Saudi Arabia), etc.
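
The one-day etiquette in application 2 can be sketched as follows. This is a minimal Python illustration under stated assumptions: a meeting stores its Hijri month and day plus the Gregorian date estimated when it was created, and the confirmed Gregorian date of the month's first day is assumed to arrive from the proposed service. `HijriMeeting` and `reschedule_if_needed` are hypothetical names. Once the month start is known, day d of the month falls d - 1 days after it, so the system can compare that with the estimate and decide whether a reschedule notice is needed.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class HijriMeeting:
    hijri_month: int
    hijri_day: int
    estimated_gregorian: date   # best guess made when the meeting was set


def reschedule_if_needed(meeting: HijriMeeting,
                         confirmed_month_start: date) -> Optional[date]:
    """Return the corrected Gregorian date, or None if the estimate was right.

    confirmed_month_start is the Gregorian date on which the meeting's Hijri
    month actually began (assumed to come from the proposed service).
    """
    actual = confirmed_month_start + timedelta(days=meeting.hijri_day - 1)
    shift = (actual - meeting.estimated_gregorian).days
    if shift == 0:
        return None  # no change; invitees keep the original date
    if abs(shift) > 1:
        # The proposal expects at most a one-day correction.
        raise ValueError(f"unexpected shift of {shift} days; check the inputs")
    return actual    # the caller would send this new date to all invitees
```

Raising an error for larger shifts is a choice made in this sketch, not part of the proposal; a real scheduler might instead flag the meeting for manual review.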

Please let me know if you have additions, so that we can formalize this proposal and submit it to a standards organization, perhaps ISO or ECMA, or perhaps establish a separate organization for it.

 

Posted by on 17/06/2010 in Hijri Calendar, Proposal

 

My Arabic calligraphy

I love the art of Arabic calligraphy; see a sample of my work below.

 

Posted by on 16/06/2010 in Arabic, Arabic calligraphy

 

Statistical Machine Translation

I made an introductory presentation on SMT (Statistical Machine Translation) as a getting-started guide for people or researchers who would like to know what SMT is and how it works. I hope you like it!

 

Posted by on 29/05/2010 in MT, SMT, Statistical Machine Translation

 

My Master’s Thesis

Injected Linguistic Tags Approach to Improve Phrase Based Statistical Machine Translation

Thesis submitted to the Computer Science Department in partial fulfillment of the requirements for the degree of Master of Science in Computer Science.
Please find the abstract of the work in both English and Arabic below:
Abstract

Statistical machine translation (SMT) has proven to give good results between languages with highly similar morphology and grammar, such as English and French. However, SMT still needs improvement when used to translate between languages with different morphology and syntactic structure, especially between a morphologically poor and a morphologically rich language, such as English and Arabic. This thesis presents the Injected Linguistic Tags approach, which improves phrase-based statistical machine translation (PBSMT). The approach has been applied to English-to-Arabic translation, but the Injected Tags (ITs) approach is language independent and can be used with any language pair. An English-Arabic system incorporating the proposed approach on top of a state-of-the-art PBSMT system is presented. The approach also provides a method to enrich and expand the SMT parallel corpus, giving the system more capabilities and vocabulary. The proposed approach has been evaluated, and its results have been compared with those of several online MT services. It showed a good improvement in translation quality, with an increase of at least 13% in BLEU score. The experiments show that the results achieved by this approach are significant enhancements over PBSMT. Furthermore, the translation system that uses the proposed approach shows improved noun/verb gender-number agreement in the translated text.


Published Work

This thesis has resulted in original work published as follows:

Waleed Oransa, Mohamed Kouta and Mohamed Sakre. "Injected Linguistic Tags to Improve Phrase Based SMT". In the 2nd International Conference on Computer and Automation Engineering (ICCAE), Singapore, 2010. [Download from IEEE Xplore]

Thesis Abstract in Arabic (English translation)

Translation systems based on statistical translation have given good results between languages that are similar in morphology and grammar, such as English and French. These systems still need improvement when used to translate between languages that differ greatly in morphology and linguistic structure, especially between morphologically poor and morphologically rich languages such as English and Arabic. This thesis discusses a proposed linguistic enrichment method for phrase-based statistical translation systems. The method was applied to translation from English into Arabic, but it is language independent and can be used to improve translation between any two languages.

This thesis presents an English-to-Arabic machine translation system built with the latest open-source statistical translation system, "Moses". It also adds a means of enriching statistical translation systems by strengthening their capabilities and vocabulary with words from outside the parallel sentences used in their training. The proposed method was evaluated and its results compared with five online machine translation systems; the translations into Arabic were better, especially in the agreement of the noun with the verb and the adjective in gender and number.

Experiments on the translation system that uses the linguistic enrichment method showed an improvement in the text translated into Arabic compared with the baseline system that did not use the method. Automatic evaluation of translation quality with the BLEU and NIST metrics also showed an increase of at least 13% over the baseline system. An improvement over the baseline system was also recorded in the agreement of nouns with the other parts of the sentence, such as verbs and adjectives, in gender (masculine and feminine), in singular, dual, and plural forms, and in number.

 

Posted by on 24/04/2010 in Maters

 

I got my Master's degree in CS

Asalam alikom,

I would like to share some good news with you. Yesterday was my Master's seminar. The thesis title is

“Injected Linguistic Tags Approach to Improve Phrase Based Statistical Machine Translation”

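“طريقة الإثراء اللغوي لتحسين نظم الترجمة الآلية الإحصائية المعتمدة على المقاطع” (the Arabic title, literally "A Linguistic Enrichment Method to Improve Phrase-Based Statistical Machine Translation Systems")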

This was in the Master's degree program of the Arab Academy for Science and Technology & Maritime Transport, College of Computing & Information Technology.

ElhamdoleAllah (praise be to God), I got my Master's degree in computer science. It took a lot of hard work and research effort, and I feel I did something useful to improve English-to-Arabic machine translation.

Acknowledgement

I would like to express my gratitude to Prof. Dr. Mohamed Kouta for his supervision of my thesis and his great support throughout my thesis work.

I am deeply indebted to my supervisor, Dr. Mohamed Sakr, whose help, stimulating suggestions, and encouragement supported me throughout the research and writing of this thesis.

I would also like to thank my examiners, Prof. Dr. Mohamed Esam Khalifa and Prof. Dr. Mohamed Sharawy.

I would also like to thank my brother Osama Oransa for his help with J2ME-related issues.

Special thanks go to my parents and my wife, whose patient love and encouragement enabled me to complete this work.