Awesome Arabic Corpus Search Tool
Evidently, an Arabic corpus search tool has existed at BYU for some time, but a post on the Arabic LinguistList this morning brought it to my attention:
This is to announce that two new ‘sub’ corpora have been added to the newspaper section of arabiCorpus.byu.edu:

Masri2010: This is the entire year 2010 of the newspaper Al-Masri Al-Yawm. This paper was chosen partly because of its popularity, partly because it contrasts markedly in style with the Ahram, and partly because it is one of the papers that uses the new ‘quoting’ style: they actually write down what people say, even if it is in colloquial Arabic or some mixed form (look up وتعاليمها تخاخل الإنجيل using ‘string’ for a relatively hilarious example quoting Baba Shanouda during last summer’s ‘divorce controversy’). (almost 14 million words)

ShuruqColumns: This is a large set of columns from the Egyptian newspaper Al-Shuruuq. This paper is reputed to have attracted some of the best editorial writers in Egypt, and many people buy it just for the writers and columns, rather than for the news. This would be a good (small) corpus to use if you wanted samples of what is considered to be ‘fine’ current writing on politics and social life. Writers include Fahmy Huwaidi, Khaled Al-Khamissi (of Taxi fame), Alaa’ Al-Aswaani (of Yaqubian Building fame), and many others. (about 2 million words)

Enjoy.
I cannot contain my excitement. Not only does the search provide full citations, but it also shows frequency distributions of the various word forms (e.g., مكتب => مكتبهم, المكتب) and of the tokens appearing immediately before and after the query term. In the past I have used Google search as a corpus tool, but the noise that chat forums add to the Arabic web makes it hard to find meaningful linguistic examples. Bravo, BYU.
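For the curious, here is a toy Python sketch of what those two tables (word-form frequencies and left/right neighbors) boil down to. It is not how arabiCorpus actually works internally; it assumes naive whitespace tokenization and plain substring matching on the query, and the function name `word_form_profile` and the sample sentence are my own inventions.

```python
from collections import Counter

def word_form_profile(text: str, query: str):
    """Toy concordance profile (hypothetical helper): count every surface
    form containing `query`, plus the tokens right before and after each hit."""
    tokens = text.split()  # naive whitespace tokenization (assumption)
    forms, left, right = Counter(), Counter(), Counter()
    for i, tok in enumerate(tokens):
        if query in tok:  # substring match also catches affixed forms like المكتب
            forms[tok] += 1
            if i > 0:
                left[tokens[i - 1]] += 1   # token immediately before the hit
            if i + 1 < len(tokens):
                right[tokens[i + 1]] += 1  # token immediately after the hit
    return forms, left, right

sample = "ذهب إلى المكتب ثم عاد إلى مكتبهم في المكتب"
forms, left, right = word_form_profile(sample, "مكتب")
print(forms.most_common())  # [('المكتب', 2), ('مكتبهم', 1)]
print(left.most_common())
print(right.most_common())
```

Substring matching is crude: a real Arabic corpus tool has to handle clitics, diacritics, and spelling variants far more carefully, which is exactly why a curated corpus beats ad hoc web searching.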