A Comparison of Phrase-Based and Word-Based Language Models for Punjabi

Umrinderpal Singh

Abstract


A language model guides the decoding process in choosing the most probable word from the several candidates available in the knowledge base or phrase table. Such a model is typically built with the n-gram approach, and a variety of model types and smoothing procedures can be used, including unigram, bigram, and trigram models, interpolation, and backoff. We carried out experiments with different language models in which phrases, rather than words, serve as the smallest unit. The experiments show that a phrase-based language model yields more accurate results than a simple word-based model. We also experimented with a machine translation system that uses the phrase-based language model in place of the word-based one, and the system showed a marked improvement.
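A minimal sketch of the idea behind the comparison, assuming a toy corpus, a hand-picked phrase inventory, and add-one smoothing (all illustrative assumptions, not the authors' data or implementation): the same bigram model is trained twice, once with single words as units and once with greedily segmented phrases as units, and both score a test sentence.

# Sketch only: contrasts a word-based and a phrase-based bigram language model.
# The corpus, phrase inventory, and helper names (phrase_units, train_bigram,
# log_prob) are illustrative assumptions, not taken from the paper.
from collections import Counter
from math import log

corpus = [
    "the weather is very nice today",
    "the weather is very cold today",
    "he said the weather is nice",
]

# Hypothetical phrase inventory; a real system would learn it from data.
phrases = {("the", "weather", "is"), ("very", "nice")}

def word_units(sentence):
    return sentence.split()

def phrase_units(sentence, phrases, max_len=3):
    # Greedy left-to-right segmentation into the longest known phrase.
    tokens, units, i = sentence.split(), [], 0
    while i < len(tokens):
        for n in range(max_len, 0, -1):
            cand = tuple(tokens[i:i + n])
            if n == 1 or cand in phrases:
                units.append(" ".join(cand))
                i += n
                break
    return units

def train_bigram(sentences, tokenize):
    # Count unigram and bigram occurrences over the chosen unit type.
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        units = ["<s>"] + tokenize(s) + ["</s>"]
        unigrams.update(units)
        bigrams.update(zip(units, units[1:]))
    return unigrams, bigrams

def log_prob(sentence, tokenize, unigrams, bigrams):
    # Add-one smoothing: P(cur | prev) = (c(prev, cur) + 1) / (c(prev) + V)
    units = ["<s>"] + tokenize(sentence) + ["</s>"]
    V = len(unigrams)
    lp = 0.0
    for prev, cur in zip(units, units[1:]):
        lp += log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + V))
    return lp

if __name__ == "__main__":
    test = "the weather is very nice"
    for name, tok in [("word-based", word_units),
                      ("phrase-based", lambda s: phrase_units(s, phrases))]:
        uni, bi = train_bigram(corpus, tok)
        print(name, round(log_prob(test, tok, uni, bi), 3))

Because the phrase-based model has fewer, longer units per sentence, it multiplies fewer conditional probabilities and can capture multi-word dependencies in a single unit, which is the intuition behind the improvement reported in the abstract.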






DOI: https://doi.org/10.23956/ijarcsse/V7I7/0232
