Architecture – Page 5 – CHEN Jian's Java Blog

How a Solr application is deployed and run

Architecture / January 20, 2013

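According to http://wiki.apache.org/solr/SolrInstall#Setup, there seem to be two things to deploy: solr.war, and the Solr application that solr.home points to. How to deploy these in a Maven + SVN environment is something that needs careful thought.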
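The war holds Solr's code, while the Solr home holds your own configuration (conf/solrconfig.xml, conf/schema.xml) and usually the index data. As a sketch of how the war finds that second artifact, assuming the lookup order the SolrInstall page describes (JNDI name solr/home, then the solr.solr.home system property, then a solr directory in the working directory), here is a hypothetical helper that models only the last two steps. It is not Solr's own code.

```java
import java.io.File;

public class SolrHomeLocator {
    // Hypothetical helper, not Solr code: models only the last two lookup steps
    // (system property "solr.solr.home", then "./solr"). The JNDI step is omitted.
    static File locateSolrHome(String systemPropertyValue) {
        if (systemPropertyValue != null && !systemPropertyValue.trim().isEmpty()) {
            return new File(systemPropertyValue.trim());
        }
        // Fallback: a "solr" directory relative to the container's working directory.
        return new File("solr");
    }

    public static void main(String[] args) {
        // e.g. start the servlet container with -Dsolr.solr.home=/opt/solr-home
        File home = locateSolrHome(System.getProperty("solr.solr.home"));
        System.out.println("solr home:  " + home.getPath());
        System.out.println("config dir: " + new File(home, "conf").getPath());
    }
}
```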

Solr is not a jar library but a Java web app

Architecture / January 18, 2013

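Solr is not a jar library but a Java web app. Your system generally does not pull in a library; instead, it talks remotely to a Solr webapp that you set up separately. Running inside a servlet container, Solr is used much like a database: it is a standalone service. If you insist on using Solr as a jar library, you can: the project provides "EmbeddedSolr", so see whether it suits you: http://wiki.apache.org/solr/EmbeddedSolr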
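Because Solr is used like a remote database, the client side mostly builds HTTP requests. A minimal sketch, assuming Solr's standard /select request handler with its q, rows and wt parameters, and a hypothetical base URL of http://localhost:8983/solr (the port of the example Jetty setup). It only builds the URL and leaves the HTTP call out; in practice SolrJ is the usual Java client for this.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SolrSelectUrl {
    // Builds a query URL for the /select handler; the query text is URL-encoded.
    static String selectUrl(String baseUrl, String query, int rows) {
        try {
            return baseUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&rows=" + rows + "&wt=json";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical base URL; point it at your own Solr webapp.
        String url = selectUrl("http://localhost:8983/solr", "title:lucene", 10);
        System.out.println(url);
        // Sending this (e.g. with java.net.HttpURLConnection) is a remote call,
        // just like querying a database server.
    }
}
```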

[Lucene] Payloads are generally used only for filtering, scoring, sorting, etc.

Architecture / January 17, 2013

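I originally thought I could fetch a particular payload directly at search time and simply print it, but after googling for a long time, there seems to be no direct API for that. Payloads may just not be meant for this use case. Lucene in Action says: "… use it during search, either to decide which documents are included in the search results or to alter how matched documents are scored or sorted"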
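To make the two uses in that quote concrete, here is a toy model in plain Java, not Lucene's payload API: a float boost is stored as a 4-byte payload, and at search time it is read back only to decide inclusion or to scale a score, never to be displayed. The byte layout and the minBoost threshold are assumptions for illustration.

```java
import java.nio.ByteBuffer;

public class PayloadScoringToy {
    // Store a float boost as a 4-byte payload.
    static byte[] encode(float boost) {
        return ByteBuffer.allocate(4).putFloat(boost).array();
    }

    static float decode(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    // Use 1: decide whether a matched document is included in the results.
    static boolean accept(byte[] payload, float minBoost) {
        return decode(payload) >= minBoost;
    }

    // Use 2: alter how a matched document is scored.
    static float score(float baseScore, byte[] payload) {
        return baseScore * decode(payload);
    }

    public static void main(String[] args) {
        byte[] important = encode(3.0f);
        byte[] minor = encode(0.5f);
        System.out.println(accept(important, 1.0f) + " " + score(1.2f, important));
        System.out.println(accept(minor, 1.0f) + " " + score(1.2f, minor));
    }
}
```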

Lucene: Snowball is not easy to use at all

Architecture / January 17, 2013

package player.kent.chen.temp.lucene.stemming;

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.util.Version;

public class MyLuceneStemmingDemo {
    private final static String allText = "The companies organized an better activity than the individuals.";

    public static void main(String[] args) throws Exception {
        Analyzer snowball = new SnowballAnalyzer(Version.LUCENE_30, "English");
        doSearch(snowball, "company");  // no hit
        doSearch(snowball, "compani");  // hit
        doSearch(snowball, "organize"); // no hit
        doSearch(snowball, "organiz");  // no hit
        doSearch(snowball, "organ");    // hit
        doSearch(snowball, "good");     // no hit
        doSearch(snowball, "act");      // no hit
        doSearch(snowball, …
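Snowball's English stemmer reduces both "companies" and "company" to "compani". So one possible explanation for the misses above, and it is only my assumption since the body of doSearch is cut off, is that the query string is looked up as-is instead of going through the same analyzer. The toy below uses a hypothetical two-rule stemmer as a stand-in for Snowball to show the difference between a raw and an analyzed query term.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class StemmedQueryToy {
    // Toy stemmer with two suffix rules. NOT Snowball; it only mimics Snowball's
    // output for "companies" and "company", which both become "compani".
    static String toyStem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.endsWith("ies")) {
            return w.substring(0, w.length() - 2);       // companies -> compani
        }
        if (w.endsWith("y")) {
            return w.substring(0, w.length() - 1) + "i"; // company -> compani
        }
        return w;
    }

    // "Index" a text: split into words and keep only the stemmed terms.
    static Set<String> indexTerms(String text) {
        Set<String> terms = new HashSet<String>();
        for (String token : text.split("\\W+")) {
            if (!token.isEmpty()) {
                terms.add(toyStem(token));
            }
        }
        return terms;
    }

    // Query term used verbatim, like a term query built from the raw string.
    static boolean rawHit(Set<String> terms, String query) {
        return terms.contains(query);
    }

    // Query term passed through the same stemmer that was used at index time.
    static boolean analyzedHit(Set<String> terms, String query) {
        return terms.contains(toyStem(query));
    }

    public static void main(String[] args) {
        Set<String> terms = indexTerms("The companies organized an better activity than the individuals.");
        System.out.println(rawHit(terms, "company"));      // false: "company" is not an indexed term
        System.out.println(analyzedHit(terms, "company")); // true: stemmed to "compani"
    }
}
```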


Code example: Lucene Highlighter

Architecture / January 17, 2013

This uses FastVectorHighlighter, which handles large files efficiently.

<!-- pom.xml -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-fast-vector-highlighter</artifactId>
    <version>3.0.0</version>
</dependency>

package player.kent.chen.temp.lucene.highlight;

import java.io.File;

import org.apache.commons.io.FileUtils;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.Field.TermVector;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class MyHighlightIndexer {
    public static void main(String[] args) throws Exception {
        String rootDir = "/home/kent/diskD/home-kent-dev/workspace/kent-temp/data/lucene-sanguo";
        File contentDir = new File(rootDir, "content");
        …
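FastVectorHighlighter works from term vectors that recorded each term's positions and offsets at index time (hence the TermVector import; the field must be indexed with TermVector.WITH_POSITIONS_OFFSETS), so it does not re-analyze the text, which is what keeps it cheap on big files. The toy below shows only the last step, wrapping known character offsets in tags. It is plain Java, not the FastVectorHighlighter API, and offsetsOf is a hypothetical stand-in that scans the text, whereas the real highlighter reads offsets from the stored term vector.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetHighlightToy {
    // A character range [start, end) in the original text, like a term-vector offset.
    static final class Offset {
        final int start;
        final int end;

        Offset(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Wraps each range in <b>...</b>. Offsets must be sorted and non-overlapping.
    static String highlight(String text, List<Offset> offsets) {
        StringBuilder sb = new StringBuilder();
        int cursor = 0;
        for (Offset o : offsets) {
            sb.append(text, cursor, o.start)
              .append("<b>").append(text, o.start, o.end).append("</b>");
            cursor = o.end;
        }
        sb.append(text.substring(cursor));
        return sb.toString();
    }

    // Hypothetical stand-in for reading stored offsets: scans for the keyword.
    static List<Offset> offsetsOf(String text, String keyword) {
        List<Offset> result = new ArrayList<Offset>();
        int from = text.indexOf(keyword);
        while (from >= 0) {
            result.add(new Offset(from, from + keyword.length()));
            from = text.indexOf(keyword, from + keyword.length());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Cao Cao met Liu Bei";
        System.out.println(highlight(text, offsetsOf(text, "Cao")));
        // <b>Cao</b> <b>Cao</b> met Liu Bei
    }
}
```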


Lucene: Query vs Filter

Architecture / January 15, 2013 (updated November 30, 2019)

Query: How well does this document match the search condition? A question of score.
Filter: Does the document match the search condition or not? A question of true or false. Filters can be used for exact matching, range queries, etc. Filtering is faster than querying because it doesn't care about scoring.
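To make the score versus true/false distinction concrete, here is a toy model in plain Java (not Lucene's Filter or Query classes): the filter produces a BitSet with a yes/no answer per document, and the query computes a score only for the documents the filter let through. The year field, the sample texts, and raw term frequency as the score are illustrative assumptions; Lucene's real scoring is more involved.

```java
import java.util.Arrays;
import java.util.BitSet;

public class QueryVsFilterToy {
    // Filter: a yes/no answer per document, no score involved.
    static BitSet yearRangeFilter(int[] years, int from, int to) {
        BitSet bits = new BitSet(years.length);
        for (int doc = 0; doc < years.length; doc++) {
            if (years[doc] >= from && years[doc] <= to) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Query: how well a document matches (toy measure: term frequency).
    static int termFrequency(String text, String term) {
        int tf = 0;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.equals(term)) {
                tf++;
            }
        }
        return tf;
    }

    // Score only the documents that passed the filter; the rest stay at 0.
    static int[] scores(String[] texts, BitSet allowed, String term) {
        int[] scores = new int[texts.length];
        for (int doc = allowed.nextSetBit(0); doc >= 0; doc = allowed.nextSetBit(doc + 1)) {
            scores[doc] = termFrequency(texts[doc], term);
        }
        return scores;
    }

    public static void main(String[] args) {
        String[] texts = {"Lucene in Action", "Lucene and Solr, Lucene again", "Solr only"};
        int[] years = {2010, 2013, 2013};
        BitSet allowed = yearRangeFilter(years, 2012, 2013); // docs 1 and 2 pass
        System.out.println(Arrays.toString(scores(texts, allowed, "lucene"))); // [0, 2, 0]
    }
}
```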

Lucene code example: using SpanQuery to find where a keyword first appears in a document

Architecture / January 15, 2013

Nothing substantial here; this is just the class that holds the location info, included for copy-paste.

package player.kent.chen.temp.lucene.span;

import org.apache.commons.lang.builder.ToStringBuilder;

public class KeywordLocation {
    private String file;

    /**
     * position in the token stream
     */
    private int position;

    private KeywordLocation() {
    }

    public static final KeywordLocation createInstance(String file, int position) {
        KeywordLocation instance = new KeywordLocation();
        instance.file = file;
        instance.position = position;
        return instance;
    }

    public String getFile() {
        …
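The "position in the token stream" stored in KeywordLocation is not simply the index of the word in the text: removed words leave gaps in the positions. A toy model in plain Java, not the SpanQuery API, of what "first occurrence" means in those terms: walk the analyzed terms, accumulate their position increments, and stop at the first match. As far as I know, the start() of the first span a SpanTermQuery yields for a document reports the same kind of 0-based position, but treat that as an assumption. The sample terms and increments are hypothetical.

```java
public class FirstOccurrenceToy {
    // Returns the 0-based token position of the first occurrence of keyword, or -1.
    // increments[i] is the position increment of terms[i] (1 normally, >1 after gaps).
    static int firstPosition(String[] terms, int[] increments, String keyword) {
        int pos = -1; // a first increment of 1 lands on position 0
        for (int i = 0; i < terms.length; i++) {
            pos += increments[i];
            if (terms[i].equals(keyword)) {
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // "the fox saw the dog" after stop-word removal: "the" left gaps behind.
        String[] terms = {"fox", "saw", "dog"};
        int[] increments = {2, 1, 2};
        System.out.println(firstPosition(terms, increments, "dog")); // 4, not array index 2
    }
}
```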


The role of tokenization, in one sentence

Architecture / January 14, 2013

If the document "中华人民" is analyzed into "中华" and "人民", then searching for "华人" will not hit this document.
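A search only hits when the query term is literally one of the terms the analyzer produced. The toy below, plain Java rather than any Lucene analyzer, contrasts the split from the sentence above with overlapping two-character tokens, the strategy Lucene's CJKAnalyzer uses. Bigrams make "华人" findable, at the price of also matching it across a word boundary the author never intended.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class TokenizationToy {
    // Overlapping two-character tokens: 中华人民 -> 中华, 华人, 人民
    static Set<String> bigrams(String text) {
        Set<String> terms = new LinkedHashSet<String>();
        for (int i = 0; i + 2 <= text.length(); i++) {
            terms.add(text.substring(i, i + 2));
        }
        return terms;
    }

    // A search hits only if the query term is exactly one of the indexed terms.
    static boolean hits(Set<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        Set<String> wordSplit = new LinkedHashSet<String>(Arrays.asList("中华", "人民"));
        Set<String> bigramSplit = bigrams("中华人民");
        System.out.println(hits(wordSplit, "华人"));   // false
        System.out.println(hits(bigramSplit, "华人")); // true
    }
}
```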

Position Increment in Lucene Analyzers

Architecture / January 14, 2013

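Speaking a little loosely, the position increment is the "gap" between tokens. Normally it is 1. For example, "Obama is a politician" is split into:

Obama - position 1
is - position 2
a - position 3
politician - position 4

The positions 1, 2, 3, 4 advance by 1 each time. If the position increment is greater than 1, some words were dropped:

Obama - position 1
politician - position 4

The position jumps straight from 1 to 4. If the position increment is 0, it is usually because the Analyzer was configured with synonyms:

Obama - position 1
politician - position 4
statesman - position 4

politician and statesman are synonyms, so both sit at position 4.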
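The three cases above come down to one rule: each token's position is the previous position plus its increment. A toy model in plain Java, not Lucene's PositionIncrementAttribute; it follows the 1-based numbering of the examples above, and the increments fed to it are taken from those examples.

```java
public class PositionIncrementToy {
    // Turns per-token position increments into absolute positions.
    // Starting from 0, a first increment of 1 gives position 1, as in the examples.
    static int[] positions(int[] increments) {
        int[] result = new int[increments.length];
        int pos = 0;
        for (int i = 0; i < increments.length; i++) {
            pos += increments[i]; // 1: next slot, >1: words skipped, 0: same slot (synonym)
            result[i] = pos;
        }
        return result;
    }

    public static void main(String[] args) {
        String[] terms = {"Obama", "politician", "statesman"};
        int[] pos = positions(new int[] {1, 3, 0});
        for (int i = 0; i < terms.length; i++) {
            System.out.println(terms[i] + " - position " + pos[i]);
        }
    }
}
```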

The basic framework of a Lucene Analyzer

Architecture / January 14, 2013

The basic input and output:

A slightly more detailed look at the TokenStream class:
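In Lucene, TokenStream has two kinds of subclasses: Tokenizer, whose input is a Reader, and TokenFilter, whose input is another TokenStream; an Analyzer assembles them into a chain. The toy below mirrors that decorator structure in plain Java. Its interface and class names are hypothetical, it passes plain strings instead of Lucene's attributes, and the stop set is illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerChainToy {
    // Toy counterpart of TokenStream: one token per call, null when exhausted.
    interface Tokens {
        String next();
    }

    // Toy "Tokenizer": the head of the chain, reads raw text.
    static final class WhitespaceTokens implements Tokens {
        private final String[] parts;
        private int i = 0;

        WhitespaceTokens(String text) {
            String t = text.trim();
            this.parts = t.isEmpty() ? new String[0] : t.split("\\s+");
        }

        public String next() {
            return i < parts.length ? parts[i++] : null;
        }
    }

    // Toy "TokenFilter": wraps another Tokens and lowercases its output.
    static final class LowerCaseTokens implements Tokens {
        private final Tokens input;

        LowerCaseTokens(Tokens input) {
            this.input = input;
        }

        public String next() {
            String t = input.next();
            return t == null ? null : t.toLowerCase(Locale.ROOT);
        }
    }

    // Toy "TokenFilter": wraps another Tokens and skips stop words.
    static final class StopTokens implements Tokens {
        private final Tokens input;
        private final Set<String> stopWords;

        StopTokens(Tokens input, Set<String> stopWords) {
            this.input = input;
            this.stopWords = stopWords;
        }

        public String next() {
            for (String t = input.next(); t != null; t = input.next()) {
                if (!stopWords.contains(t)) {
                    return t;
                }
            }
            return null;
        }
    }

    // Toy "Analyzer": tokenizer first, filters wrapped around it.
    static List<String> analyze(String text, Set<String> stopWords) {
        Tokens stream = new StopTokens(new LowerCaseTokens(new WhitespaceTokens(text)), stopWords);
        List<String> out = new ArrayList<String>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<String>(Arrays.asList("is", "a"));
        System.out.println(analyze("Obama is a politician", stop)); // [obama, politician]
    }
}
```

Each filter only knows the stream it wraps, so adding a step such as stemming or synonyms means wrapping one more layer around the chain.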