Effective Focused Web Crawling Approach for Search Engines

Ashwani Kumar, Anuj Kumar, Rahul Mishra

Abstract


A focused crawler traverses the World Wide Web, selecting pages relevant to a predefined topic and discarding those that are off topic. Collecting domain-specific documents with focused crawlers is considered one of the most important strategies for finding relevant information. While crawling, it is difficult to deal with irrelevant pages and to predict which links lead to high-quality pages. Most focused crawlers use local search algorithms to traverse the web, so they can easily become trapped within a bounded subgraph surrounding the starting URLs, and they miss relevant pages that cannot be reached by links from the starting URLs. To address this problem, we design a focused crawler that calculates the absolute frequency of the topic keyword in a page together with the frequencies of the keyword's synonyms and sub-synonyms. A weight table is constructed according to the user query; it is used to check the similarity of web pages to the topic keywords and to compute the priority of each extracted link.
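
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the weight values, the related terms (synonyms, i.e. equivalent words, and sub-synonyms), the use of cosine similarity, the mixing parameter `alpha`, and all function and class names below are assumptions chosen for illustration.

```python
import heapq
import math
import re
from collections import Counter

# Hypothetical weight table for the user query "crawler". The topic keyword
# gets full weight, synonyms less, sub-synonyms least. Terms and weights are
# assumed values, not taken from the paper.
WEIGHT_TABLE = {
    "crawler": 1.0,  # topic keyword
    "spider": 0.7,   # synonym (assumed)
    "robot": 0.7,    # synonym (assumed)
    "bot": 0.4,      # sub-synonym (assumed)
}


def term_frequencies(text):
    """Count absolute (raw) occurrences of each lowercase word in text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def relevance(text, weights=WEIGHT_TABLE):
    """Cosine similarity between the text's term counts and the weight table."""
    tf = term_frequencies(text)
    dot = sum(tf[term] * w for term, w in weights.items())
    text_norm = math.sqrt(sum(c * c for c in tf.values()))
    topic_norm = math.sqrt(sum(w * w for w in weights.values()))
    if text_norm == 0 or topic_norm == 0:
        return 0.0
    return dot / (text_norm * topic_norm)


def link_priority(parent_text, anchor_text, alpha=0.5):
    """Blend parent-page and anchor-text relevance (alpha is an assumption)."""
    return alpha * relevance(parent_text) + (1 - alpha) * relevance(anchor_text)


class Frontier:
    """Priority queue of unvisited URLs; the highest priority is popped first."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, priority):
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so store the negated priority.
            heapq.heappush(self._heap, (-priority, url))

    def pop(self):
        neg_priority, url = heapq.heappop(self._heap)
        return url, -neg_priority
```

In a crawl loop, each link extracted from a fetched page would be pushed with `link_priority(page_text, anchor_text)`, and the next page to fetch would be taken with `pop()`. Links whose parent page and anchor text contain none of the table's terms receive priority 0 and are visited last, or dropped if a threshold is applied.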






DOI: https://doi.org/10.23956/ijarcsse.v7i11.459





© International Journals of Advanced Research in Computer Science and Software Engineering (IJARCSSE)| All Rights Reserved | Powered by Advance Academic Publisher.