Acta Scientific Computer Sciences

Research Article Volume 3 Issue 10

Pattern Defined Corpus Construction: A Complete Training to Building the Corpus

Sayed Majid Ali Shah*, Zeeshan Bhatti, Zulfiqar Ali Bhutto and Kamran Taj Pathan

Dr. A.H.S. Bukhari Institute of Information Communication and Technology, University of Sindh, Jamshoro, Pakistan

*Corresponding Author: Sayed Majid Ali Shah, Dr. A.H.S. Bukhari Institute of Information Communication and Technology, University of Sindh, Jamshoro, Pakistan.

Received: August 22, 2021; Published: September 14, 2021


  A corpus is considered a mandatory component for processing any language and for building Natural Language Processing (NLP) applications that perform tasks such as language analysis, manipulation, and information retrieval. This article discusses and illustrates a procedure for constructing a corpus, and serves as a tutorial on corpus construction for any language. Several approaches are sketched from different perspectives of language, using the Sindhi language as the supporting example. Applications including machine translation, spell checking, grammar checking, part-of-speech tagging, named entity recognition, and word identification are also addressed. The text was collected from various digital sources such as newspaper websites, blogs, e-books, and magazines. Procedural models for NLP applications that use the corpus are also demonstrated.

Keywords: Natural Language Processing (NLP); Corpus; Sindhi Language




Citation: Sayed Majid Ali Shah., et al. “Pattern Defined Corpus Construction: A Complete Training to Building the Corpus”. Acta Scientific Computer Sciences 3.10 (2021): 33-36.


Copyright: © 2021 Sayed Majid Ali Shah., et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


