
Local Search and Web Search Based on Lucene


Full text: about 30,000 characters  Written: 2022 or earlier

[Summary]


 

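Lucene was created by Doug Cutting in 1997. It is written in Java and has grown rapidly alongside the Java ecosystem. Lucene is an open source project that became a subproject of the Apache Software Foundation's Jakarta project in 2001. It is an open source full-text search engine toolkit: Lucene is not a finished application but a JAR library, a framework for full-text search that gives software developers a complete query engine and indexing engine. Its most important feature is that the JAR is simple to use, so it can easily be embedded in a target system to provide full-text search, or serve as the basis for building a complete full-text search engine.
Because of this rapid development, Lucene has been ported to many other programming languages and used in a wide range of applications; the target languages include C++, Perl, Python, C#, C, Ruby, and PHP. Because it is open source, its index file format is independent of the application platform, and it has a well-designed object-oriented architecture, Lucene is used not only in dedicated full-text search projects but is also integrated into various system software and even used to build Web applications.
This system is a simple but complete search system, built on the Lucene core, for local files and web pages. It uses a crawler to fetch web pages and HtmlParser to parse them. The core part, indexing, uses Lucene's core indexing classes, such as IndexWriter, to index the files. The search is then completed by running the query the user enters against that index.
[Keywords]  Lucene; full-text indexing; full-text search; open source; search engine; Jakarta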
Local Search and Web Search Based on Lucene

 
[Abstract]  Lucene was created by Doug Cutting in 1997. It is written in Java and has developed rapidly along with Java technology. It is an open source project that became a subproject of the Apache Software Foundation's Jakarta project in 2001. Lucene is an open source full-text search engine toolkit: it is not a finished application but a JAR library, a full-text search framework that provides software developers with a complete query engine and indexing engine. Its most important feature is that the JAR is easy to use, so it can readily be embedded in a target system to provide full-text search, or used as the basis for a complete full-text search engine.
Owing to this rapid development, Lucene has been ported to many other programming languages and used in a wide variety of applications; the target languages include C++, Perl, Python, C#, C, Ruby, and PHP. Because it is open source, its index file format is independent of the application platform, and it has a sound object-oriented architecture, Lucene is used not only in dedicated full-text search projects but has also been integrated into various system software and even used to build Web applications.
This system is a simple but complete search system, built on the Lucene core, for local files and web pages. It uses a Web crawler to fetch pages and HtmlParser to parse them. The core part, indexing, uses Lucene's core indexing classes, such as IndexWriter, to index the files. The search is then completed by running the query the user enters against that index.
[Key Words]  Lucene; full-text index; full-text search; open source; search engine; Jakarta
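
The pipeline described in the abstract (obtain a document, strip its HTML, index it, then answer a user query) can be illustrated with a toy inverted index. The sketch below is not Lucene's API and is not the system's actual code. `TinyIndex`, `stripTags`, `tokenize`, `add`, and `search` are hypothetical names chosen for illustration, a regular expression stands in for HtmlParser, and crawling is left out, so page content is passed in as plain strings. It only shows the index-then-search idea that Lucene implements far more completely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy inverted index illustrating index-then-search. This is NOT Lucene's API. */
class TinyIndex {
    // term -> ids of the documents that contain it (kept in insertion order)
    private final Map<String, Set<String>> postings = new LinkedHashMap<>();

    /** Crude stand-in for an HTML parser: replaces anything that looks like a tag with a space. */
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    /** Lower-cases the text and splits it on runs of characters that are neither letters nor digits. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Indexes one document (the text of a local file or of a fetched page) under the given id. */
    void add(String docId, String content) {
        for (String term : tokenize(stripTags(content))) {
            postings.computeIfAbsent(term, k -> new LinkedHashSet<>()).add(docId);
        }
    }

    /** Returns the ids of documents that contain every query term (AND semantics, no ranking). */
    List<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> docs = postings.getOrDefault(term, Set.of());
            if (result == null) {
                result = new LinkedHashSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? List.of() : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("notes.txt", "Lucene is a full-text search library");
        index.add("page.html", "<html><body><h1>Web search</h1><p>A crawler fetches pages.</p></body></html>");
        System.out.println(index.search("search"));      // prints [notes.txt, page.html]
        System.out.println(index.search("web search"));  // prints [page.html]
    }
}
```

In a real system, tokenizing, relevance scoring, and on-disk index storage would be delegated to Lucene. The toy index only reports which documents contain all of the query terms.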

 
