Indexing¶
Hawkore's Advanced Lucene Indexes are an implementation of Apache Ignite's Grid H2 indexes. As such, they can be created by three different mechanisms with identical results:
- QueryEntity Based Configuration (see basic concepts for Java, C# and C++)
- Java Annotations Based Configuration (see basic concepts for Java)
- SQL Data Definition Language (DDL) (see basic concepts for DDL)
Hawkore's Advanced Lucene Indexes are intended to improve on Apache Ignite's out-of-the-box Text and SQL queries, to support schema mutation at runtime, and to provide Lucene index persistence.
Sample source code is available in Hawkore's Apache Ignite extensions sample project.
Syntax¶
Hawkore's Advanced Lucene Index can be configured through a JSON Lucene index definition or by annotating your Java code with @QueryTextField. Which mechanism you use depends on the type of configuration you choose.
Using QueryEntity Based Configuration
Set the JSON Lucene index definition in the luceneIndexOptions property of each QueryEntity bean that you want to index. For convenience, it is recommended to use the Java Lucene Query builder to build the JSON Lucene index definition.
public class Person {
    private String food;
    private float latitude;
    private float longitude;
    private int number;
    private boolean bool;
    private String phrase;
    private String date;
    private String startDate;
    private String stopDate;
    [...]
}

public class PersonKey {
    @AffinityKeyMapped
    private String name;
    private String gender;
    private String animal;
    private int age;
    [...]
}
<bean class="org.apache.ignite.configuration.CacheConfiguration"> <property name="name" value="PERSONS"/> <!-- Configure query entities --> <property name="queryEntities"> <list> <bean class="org.apache.ignite.cache.QueryEntity"> <!-- Setting indexed type's key class --> <property name="keyType" value="com.hawkore.ignite.examples.entities.person.PersonKey"/> <!-- Setting indexed type's value class --> <property name="valueType" value="com.hawkore.ignite.examples.entities.person.Person"/> <!-- Defining primary key fields --> <property name="keyFields"> <set> <value>name</value> <value>gender</value> <value>animal</value> <value>age</value> </set> </property> <!-- Defining fields that will be either indexed or queryable. Indexed fields are added to 'indexes' list below.--> <property name="fields"> <map> <entry key="name" value="java.lang.String"/> <entry key="gender" value="java.lang.String"/> <entry key="animal" value="java.lang.String"/> <entry key="age" value="java.lang.Integer"/> <entry key="food" value="java.lang.String"/> <entry key="latitude" value="java.lang.Float"/> <entry key="longitude" value="java.lang.Float"/> <entry key="number" value="java.lang.Integer"/> <entry key="bool" value="java.lang.Boolean"/> <entry key="phrase" value="java.lang.String"/> <entry key="date" value="java.lang.String"/> <entry key="start_date" value="java.lang.String"/> <entry key="stop_date" value="java.lang.String"/> </map> </property> <!-- Defining advanced lucene index configuration.--> <property name="luceneIndexOptions"> <value><![CDATA[{ 'version':'0', 'refresh_seconds':'60', 'directory_path':'', 'ram_buffer_mb':'10', 'max_cached_mb':'-1', 'partitioner':'{"type":"token","partitions":10}', 'optimizer_enabled':'true', 'optimizer_schedule':'0 1 * * *', 'schema':'{ "default_analyzer":"standard", "analyzers":{ "my_custom_analyzer":{"type":"snowball","language":"Spanish","stopwords":"el,la,lo,loas,las,a,ante,bajo,cabe,con,contra"} }, "fields":{ 
"duration":{"type":"date_range","from":"start_date","to":"stop_date","validated":false,"pattern":"yyyy/MM/dd"}, "place":{"type":"geo_point","latitude":"latitude","longitude":"longitude"}, "date":{"type":"date","validated":true,"column":"date","pattern":"yyyy/MM/dd"}, "number":{"type":"integer","validated":false,"column":"number","boost":1.0}, "gender":{"type":"string","validated":true,"column":"gender","case_sensitive":true}, "bool":{"type":"boolean","validated":false,"column":"bool"}, "phrase":{"type":"text","validated":false,"column":"phrase","analyzer":"my_custom_analyzer"}, "name":{"type":"string","validated":false,"column":"name","case_sensitive":true}, "animal":{"type":"string","validated":false,"column":"animal","case_sensitive":true}, "age":{"type":"integer","validated":false,"column":"age","boost":1.0}, "food":{"type":"string","validated":false,"column":"food","case_sensitive":true} } }' }]]></value> </property> <!-- Defining regular indexed fields.--> <property name="indexes"> <list> <!-- Single field (aka. column) index sample --> <bean class="org.apache.ignite.cache.QueryIndex"> <constructor-arg value="name"/> </bean> <!-- Group index sample. --> <bean class="org.apache.ignite.cache.QueryIndex"> <constructor-arg> <list> <value>animal</value> <value>age</value> </list> </constructor-arg> <constructor-arg value="SORTED"/> </bean> </list> </property> </bean> </list> </property> </bean>
Using Java Annotations based configuration
See Java Annotation Syntax.
@QueryTextField( // Index configuration indexOptions = @IndexOptions(refreshSeconds = 60, partitions = 10, ramBufferMB = 10, defaultAnalyzer = "english", snowballAnalyzers = { @SnowballAnalyzer(name = "my_custom_analyzer", language = "Spanish", stopwords = "el,la,lo,loas,las,a,ante,bajo,cabe,con,contra") }), // this will create an additional field named "place" into Lucene Document // that will be indexed geoPointMappers = @GeoPointMapper(name = "place", latitude = "latitude", longitude = "longitude"), // this will create an additional field named "duration" into Lucene Document // that will be indexed dateRangeMappers = @DateRangeMapper(name = "duration", from = "start_date", to = "stop_date", pattern = "yyyy/MM/dd") ) public class Person { @QuerySqlField(index = true) @QueryTextField(stringMappers = @StringMapper) private String food; @QuerySqlField private float latitude; @QuerySqlField private float longitude; @QueryTextField(integerMappers = @IntegerMapper) private int number; @QueryTextField(booleanMappers = @BooleanMapper) private boolean bool; @QuerySqlField @QueryTextField(textMappers = @TextMapper(analyzer = "my_custom_analyzer")) private String phrase; @QueryTextField(dateMappers = @DateMapper(validated = true, pattern = "yyyy/MM/dd")) private String date; @QuerySqlField(name = "start_date") private String startDate; @QuerySqlField(name = "stop_date") private String stopDate; [...] }
public class PersonKey { @QuerySqlField(index = true) @QueryTextField(stringMappers = @StringMapper) @AffinityKeyMapped private String name; @QuerySqlField(index = true) @QueryTextField(stringMappers = @StringMapper(validated = true)) private String gender; @QuerySqlField(index = true) @QueryTextField(stringMappers = @StringMapper) private String animal; @QuerySqlField(index = true) @QueryTextField(integerMappers = @IntegerMapper) private int age; [...] }
<bean class="org.apache.ignite.configuration.CacheConfiguration"> <property name="name" value="PERSONS" /> <property name="rebalanceMode" value="ASYNC"/> <property name="cacheMode" value="PARTITIONED" /> <property name="indexedTypes"> <array> <value>com.hawkore.ignite.examples.entities.person2.PersonKey</value> <value>com.hawkore.ignite.examples.entities.person2.Person</value> </array> </property> </bean>
Using Extended SQL Data Definition Language (DDL)
To CREATE or UPDATE (with relaxed existence verification) an Advanced Lucene Index over any QueryEntity, use the following CREATE INDEX syntax:
CREATE INDEX (IF NOT EXISTS)? <index_name> ON ("<schema_name>".)?<table_name> (LUCENE) (PARALLEL <int_value>)? FULLTEXT '<options>';
Where <options> is a JSON Lucene index definition.
CREATE INDEX syntax restrictions
- <index_name> must be <table_name> with the _LUCENE_IDX suffix, for example ATABLE_LUCENE_IDX.
- The index must be created on the internal LUCENE column.
- PARALLEL (optional): specifies the number of threads used in parallel for index creation. The more threads are set, the faster the index is created and built. If the value exceeds the number of CPUs, it is decreased to the number of cores. If the parameter is not specified, the number of threads is calculated as 25% of the available CPU cores. See Apache Ignite SQL CREATE INDEX.
- FULLTEXT (mandatory): specifies the JSON Lucene index definition. Note that every single quote ' within <options> must be escaped as two single quotes ''.
- Other CREATE INDEX DDL parameters are not supported.
- For convenience, it is recommended to use the Java Lucene Query builder to build the CREATE INDEX DDL statement.
-- Create Person QueryEntity CREATE TABLE IF NOT EXISTS "PERSONS".PERSON ( name varchar, gender varchar, animal varchar, age int, food varchar, latitude decimal, longitude decimal, number int, bool boolean, phrase varchar, date varchar, start_date varchar, stop_date varchar, PRIMARY KEY (name, gender, animal, age) ) WITH "TEMPLATE=PARTITIONED, affinity_key=name";
-- create Advanced lucene index with DDL statement CREATE INDEX PERSON_LUCENE_IDX ON "PUBLIC".PERSON(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''directory_path'':'''', ''ram_buffer_mb'':''10'', ''max_cached_mb'':''-1'', ''partitioner'':''{"type":"token","partitions":10}'', ''optimizer_enabled'':''true'', ''optimizer_schedule'':''0 1 * * *'', ''version'':''0'', ''schema'':''{ "default_analyzer":"english", "analyzers":{"my_custom_analyzer":{"type":"snowball","language":"Spanish","stopwords":"el,la,lo,loas,las,a,ante,bajo,cabe,con,contra"}}, "fields":{ "duration":{"type":"date_range","from":"start_date","to":"stop_date","validated":false,"pattern":"yyyy/MM/dd"}, "place":{"type":"geo_point","latitude":"latitude","longitude":"longitude"}, "date":{"type":"date","validated":true,"pattern":"yyyy/MM/dd"}, "number":{"type":"integer","validated":false,"boost":1.0}, "gender":{"type":"string","validated":true,"case_sensitive":true}, "bool":{"type":"boolean","validated":false}, "phrase":{"type":"text","validated":false,"analyzer":"my_custom_analyzer"}, "name":{"type":"string","validated":false,"case_sensitive":true}, "animal":{"type":"string","validated":false,"case_sensitive":true}, "age":{"type":"integer","validated":false,"boost":1.0}, "food":{"type":"string","validated":false,"case_sensitive":true} } }'' }';
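The naming and quoting rules above can be sketched in plain Java. This is a minimal, hypothetical helper (the class `LuceneIndexDdl` and its methods are illustrative names, not part of Hawkore's or Apache Ignite's API); it only assembles the statement text, assuming the JSON options are written with single-quoted keys and values as in the example above.

```java
/** Hypothetical helper (not a library class): assembles CREATE INDEX text for a Lucene index. */
final class LuceneIndexDdl {

    /** Doubles every single quote so the JSON can sit inside a SQL string literal. */
    static String escapeSingleQuotes(String json) {
        return json.replace("'", "''");
    }

    /** The index name must be the table name followed by the _LUCENE_IDX suffix. */
    static String indexName(String table) {
        return table + "_LUCENE_IDX";
    }

    /** Builds the statement; a parallel value of 0 or less omits the optional PARALLEL clause. */
    static String createIndex(String schema, String table, int parallel, String jsonOptions) {
        StringBuilder sql = new StringBuilder("CREATE INDEX IF NOT EXISTS ")
            .append(indexName(table))
            .append(" ON \"").append(schema).append("\".").append(table)
            .append("(LUCENE) ");
        if (parallel > 0) {
            sql.append("PARALLEL ").append(parallel).append(' ');
        }
        return sql.append("FULLTEXT '")
            .append(escapeSingleQuotes(jsonOptions))
            .append("';")
            .toString();
    }

    public static void main(String[] args) {
        // Unescaped definition: single quotes outside, double quotes inside the schema value.
        String options = "{'refresh_seconds':'60','schema':'{\"fields\":{\"name\":{\"type\":\"string\"}}}'}";
        System.out.println(createIndex("PUBLIC", "PERSON", 0, options));
    }
}
```

Keeping the definition unescaped in application code and doubling the quotes only when the statement is assembled avoids hand-editing the `''` sequences.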
Update Advanced Lucene index at runtime
You can UPDATE any Advanced Lucene index at runtime, without dropping it, by running a CREATE INDEX DDL statement with the new JSON Lucene index definition. Usage limitations:
- Modifying or removing existing mappers and text analyzers is not allowed, unless you drop and re-create the Lucene index.
- Some index options, such as directory_path or partitioner, cannot be modified at runtime, unless you drop and re-create the Lucene index.
-- Adds a new Lucene indexed field 'gender2', case insensitive, on the gender column.
-- Uses 10 parallel threads for population.
CREATE INDEX PERSON_LUCENE_IDX ON "PUBLIC".PERSON(LUCENE) PARALLEL 10 FULLTEXT '{
    ''refresh_seconds'':''60'',
    ''directory_path'':'''',
    ''ram_buffer_mb'':''10'',
    ''max_cached_mb'':''-1'',
    ''partitioner'':''{"type":"token","partitions":10}'',
    ''optimizer_enabled'':''true'',
    ''optimizer_schedule'':''0 1 * * *'',
    ''version'':''0'',
    ''schema'':''{
        "default_analyzer":"english",
        "analyzers":{"my_custom_analyzer":{"type":"snowball","language":"Spanish","stopwords":"el,la,lo,loas,las,a,ante,bajo,cabe,con,contra"}},
        "fields":{
            "duration":{"type":"date_range","from":"start_date","to":"stop_date","validated":false,"pattern":"yyyy/MM/dd"},
            "place":{"type":"geo_point","latitude":"latitude","longitude":"longitude"},
            "date":{"type":"date","validated":true,"pattern":"yyyy/MM/dd"},
            "number":{"type":"integer","validated":false,"boost":1.0},
            "gender":{"type":"string","validated":true,"case_sensitive":true},
            "gender2":{"type":"string","validated":true,"column":"gender","case_sensitive":false},
            "bool":{"type":"boolean","validated":false},
            "phrase":{"type":"text","validated":false,"analyzer":"my_custom_analyzer"},
            "name":{"type":"string","validated":false,"case_sensitive":true},
            "animal":{"type":"string","validated":false,"case_sensitive":true},
            "age":{"type":"integer","validated":false,"boost":1.0},
            "food":{"type":"string","validated":false,"case_sensitive":true}
        }
    }''
}';
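The update limitations can be expressed as a small client-side pre-check. The sketch below is an assumption-laden illustration, not the library's own validation: `IndexUpdateCheck` is a hypothetical class, mapper definitions are compared as raw strings, and the version rule (the new version must not be lower than the current one) is taken from the JSON syntax description of the version option.

```java
import java.util.Map;

/** Hypothetical pre-check (not the library's validation): may an index definition be updated in place? */
final class IndexUpdateCheck {

    /**
     * Returns true when the new definition only adds mappers and does not lower the version.
     * Mapper definitions are compared as raw strings, which is a simplification.
     */
    static boolean isAllowed(int currentVersion, Map<String, String> currentMappers,
                             int newVersion, Map<String, String> newMappers) {
        if (newVersion < currentVersion) {
            return false; // version must be equal to or greater than the current one
        }
        for (Map.Entry<String, String> existing : currentMappers.entrySet()) {
            // Every existing mapper must still be present and unchanged.
            if (!existing.getValue().equals(newMappers.get(existing.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> current = Map.of("gender", "{type:string,case_sensitive:true}");
        Map<String, String> updated = Map.of(
            "gender", "{type:string,case_sensitive:true}",
            "gender2", "{type:string,column:gender,case_sensitive:false}");
        System.out.println(isAllowed(0, current, 0, updated)); // adding gender2 is accepted
        System.out.println(isAllowed(0, updated, 0, current)); // removing gender2 is rejected
    }
}
```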
JSON Syntax¶
Hawkore's Advanced Lucene Index can be configured as a JSON object with this structure:
{ 'schema':'<schema_definition>' (, 'version': '<int_value>')? (, 'refresh_seconds': '<int_value>')? (, 'ram_buffer_mb': '<int_value>')? (, 'max_cached_mb': '<int_value>')? (, 'directory_path': '<string_value>')? (, 'optimizer_enabled': '<boolean_value>')? (, 'optimizer_schedule': '<string_value>')? (, 'partitioner': '<partitioner_definition>')? }
All options take a value enclosed in single quotes:
- version (optional, default '0'): a sequential number that enforces control over index configuration updates. Index options on an already created Lucene index can be changed only if the new version is equal to or greater than the current version. Must be a non-negative integer.
- refresh_seconds (optional, default '60' seconds): number of seconds before auto-refreshing the index reader. It is the maximum time taken for writes to become searchable without forcing an index refresh. You can change this value at runtime.
- ram_buffer_mb (optional, default '5' MB): size of the write buffer. Its content is committed to disk when full. You can change this value at runtime.
- max_cached_mb (optional, default '-1'): assigned off-heap memory. Once the value is changed to > -1, you cannot change it back to the default '-1', unless you drop and re-create the index. You can change this value at runtime if the memory consumed by the Lucene index is lower than the new value.
    - Default -1: the Lucene index shares a defined percentage (default 20%) of the underlying configured cache data region with other Lucene indexes. You can change the default local shared percentage by setting the IGNITE_LUCENE_INDEX_MAX_MEMORY_FACTOR system property on Apache Ignite server nodes (default 0.2, that is, 20%).
    - value = 0: unlimited; the Lucene index has its own unlimited memory region.
    - value > 0: the Lucene index has its own memory region, limited to value MB.
- directory_path (optional): the path of the directory where the Lucene index will be stored. Once the index is created you can NOT change this value, unless you drop and re-create the Lucene index.
- partitioner (optional, default None partitioner): the optional index partitioner. Index partitioning is useful to speed up some searches to the detriment of others, depending on the implementation. It is also useful to overcome Lucene's hard limit of 2147483519 documents per index. Once the index is created you can NOT change this value, unless you drop and re-create the Lucene index.
- optimizer_enabled (optional, default 'true'): whether automatic Lucene index optimization is enabled. The optimizer merges deletions and deletes unused files to improve performance. You can change this value at runtime.
- optimizer_schedule (optional, default '0 1 * * *'): the optimizer's schedule as a CRON expression. By default, if the optimizer is enabled, it runs every day at 1:00 AM ('0 1 * * *'). See Cron-Based Scheduling. You can change this value at runtime.
- schema (mandatory): you can extend the schema at runtime by adding new mappers and/or text analyzers to the definition. Note that modifying or removing existing mappers is not allowed, unless you drop and re-create the Lucene index. See the definition below:
<schema_definition>:= { fields: { <mapper_definition> (, <mapper_definition>)* } (, analyzers: { <analyzer_definition> (, <analyzer_definition>)* })? (, default_analyzer: "<analyzer_name>")? }
Where default_analyzer defaults to "standard" (org.apache.lucene.analysis.standard.StandardAnalyzer). Once the index is created you can NOT change this value, unless you drop and re-create the Lucene index.
<analyzer_definition>:= <analyzer_name>: { type: "<analyzer_type>" (, <option>: "<value>")* }
<mapper_definition>:= <mapper_name>: { type: "<mapper_type>" (, <option>: "<value>")* }
Where <mapper_name> is used as the indexable field name in the Lucene document. It must be unique within a Lucene document. You will use it as the field parameter's value in Lucene searches.
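The three cases of `max_cached_mb` described above can be read as a small decision function. The following is a minimal sketch, assuming the data region size is known in MB; `LuceneMemoryLimit` and its method are hypothetical names used only for illustration, and for the default -1 case the result is the size of the pool shared by all Lucene indexes, not a per-index guarantee.

```java
/** Hypothetical illustration of how the max_cached_mb value can be interpreted. */
final class LuceneMemoryLimit {

    /** Marker returned when the index has its own unlimited memory region. */
    static final long UNLIMITED = Long.MAX_VALUE;

    /**
     * @param maxCachedMb  the configured max_cached_mb value (-1, 0 or a positive number of MB)
     * @param dataRegionMb size of the configured cache data region, in MB
     * @param sharedFactor shared fraction, 0.2 by default (IGNITE_LUCENE_INDEX_MAX_MEMORY_FACTOR)
     * @return the memory budget in MB; for -1 this is the pool shared with other Lucene indexes
     */
    static long effectiveLimitMb(int maxCachedMb, long dataRegionMb, double sharedFactor) {
        if (maxCachedMb < -1) {
            throw new IllegalArgumentException("max_cached_mb must be >= -1");
        }
        if (maxCachedMb == -1) {
            return (long) (dataRegionMb * sharedFactor); // shared pool
        }
        if (maxCachedMb == 0) {
            return UNLIMITED; // own unlimited region
        }
        return maxCachedMb; // own region limited to the given number of MB
    }

    public static void main(String[] args) {
        System.out.println(effectiveLimitMb(-1, 1024, 0.2)); // shared pool of a 1024 MB data region
        System.out.println(effectiveLimitMb(256, 1024, 0.2)); // dedicated 256 MB region
    }
}
```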
Java Annotation Syntax¶
Hawkore's Advanced Lucene Index can be configured from code using the extended @QueryTextField annotation. See Apache Ignite Annotation Based Configuration.
@QueryTextField(
    /** Optional. Index user-specified configuration options. Only allowed at TYPE level */
    indexOptions = @IndexOptions,
    /** Optional. Specifies whether the specified field is INVISIBLE on H2 table. */
    hidden=<boolean>,
    /** Optional. The BigDecimal mappers to apply. Supports multiple definitions */
    bigDecimalMappers={<@BigDecimalMapper>, <@BigDecimalMapper>, ...},
    /** Optional. The BigInteger mappers to apply. Supports multiple definitions */
    bigIntegerMappers={<@BigIntegerMapper>, <@BigIntegerMapper>, ...},
    /** Optional. The Bitemporal mappers to apply. Only allowed at TYPE level. Supports multiple definitions */
    bitemporalMappers={<@BitemporalMapper>, <@BitemporalMapper>, ...},
    /** Optional. The Blob mappers to apply. Supports multiple definitions */
    blobMappers={<@BlobMapper>, <@BlobMapper>, ...},
    /** Optional. The Boolean mappers to apply */
    booleanMappers={<@BooleanMapper>, <@BooleanMapper>, ...},
    /** Optional. The Date mappers to apply. Supports multiple definitions */
    dateMappers={<@DateMapper>, <@DateMapper>, ...},
    /** Optional. The DateRange mappers to apply. Only allowed at TYPE level. Supports multiple definitions */
    dateRangeMappers={<@DateRangeMapper>, <@DateRangeMapper>, ...},
    /** Optional. The Double mappers to apply. Supports multiple definitions */
    doubleMappers={<@DoubleMapper>, <@DoubleMapper>, ...},
    /** Optional. The Float mappers to apply. Supports multiple definitions */
    floatMappers={<@FloatMapper>, <@FloatMapper>, ...},
    /** Optional. The GeoPoint mappers to apply. Only allowed at TYPE level. Supports multiple definitions */
    geoPointMappers={<@GeoPointMapper>, <@GeoPointMapper>, ...},
    /** Optional. The GeoShape mappers to apply. Supports multiple definitions */
    geoShapeMappers={<@GeoShapeMapper>, <@GeoShapeMapper>, ...},
    /** Optional. The Inet mappers to apply. Supports multiple definitions */
    inetMappers={<@InetMapper>, <@InetMapper>, ...},
    /** Optional. The Integer mappers to apply. Supports multiple definitions */
    integerMappers={<@IntegerMapper>, <@IntegerMapper>, ...},
    /** Optional. The Long mappers to apply. Supports multiple definitions */
    longMappers={<@LongMapper>, <@LongMapper>, ...},
    /** Optional. The String mappers to apply. Supports multiple definitions */
    stringMappers={<@StringMapper>, <@StringMapper>, ...},
    /** Optional. The Text mappers to apply. Supports multiple definitions */
    textMappers={<@TextMapper>, <@TextMapper>, ...},
    /** Optional. The UUID mappers to apply. Supports multiple definitions */
    uuidMappers={<@UUIDMapper>, <@UUIDMapper>, ...}
)
@IndexOptions(
    /** Optional. Default 0 */
    version=<int>,
    /** Optional. Default 1 */
    partitions=<int>,
    /** Optional. Default IGNITE_WORK_DIRECTORY */
    directoryPath=<String>,
    /** Optional. Default 60 seconds */
    refreshSeconds=<int>,
    /** Optional. Default 5 (MB) */
    ramBufferMB=<int>,
    /** Optional. Default -1 */
    maxCachedMB=<int>,
    /** Optional. Default true */
    optimizerEnabled=<boolean>,
    /** Optional. Default "0 1 * * *". If Optimizer is enabled, will run every day at 1:00 AM by default */
    optimizerSchedule=<String CRON expression>,
    /** Optional. Default "standard" */
    defaultAnalyzer=<String>,
    /** Optional. Supports multiple definitions */
    classpathAnalyzers={<@ClasspathAnalyzer>, <@ClasspathAnalyzer>, ...},
    /** Optional. Supports multiple definitions */
    snowballAnalyzers={<@SnowballAnalyzer>, <@SnowballAnalyzer>, ...}
)
You can customize the Lucene index within the @IndexOptions annotation:
- version (optional, default 0): a sequential number that enforces control over index configuration updates. When using the Dynamic SQL entities functionality, index options on an already created Lucene index can be changed only if the new @IndexOptions's version is equal to or greater than the current @IndexOptions's version. Must be a non-negative integer.
- partitions (optional, default 1): number of sub-indexes into which the Lucene index is split. See Partitioners. Once the index is created you can NOT change this value, unless you drop and re-create the Lucene index.
- directoryPath (optional, default IGNITE_WORK_DIRECTORY): the path of the directory where the Lucene index will be stored. Once the index is created you can NOT change this value, unless you drop and re-create the Lucene index.
- refreshSeconds (optional, default 60 seconds): number of seconds before auto-refreshing the index reader. It is the maximum time taken for writes to become searchable without forcing an index refresh. You can change this value at runtime.
- ramBufferMB (optional, default 5 MB): size of the write buffer in MB. Its content is committed to disk when full. You can change this value at runtime.
- maxCachedMB (optional, default -1): assigned off-heap memory. Once the value is changed to > -1, you cannot change it back to the default -1, unless you drop and re-create the index. You can change this value at runtime if the consumed memory is lower than the new value.
    - Default -1: the Lucene index shares a defined percentage (default 20%) of the underlying configured cache data region with other Lucene indexes. You can change the default local shared percentage by setting the IGNITE_LUCENE_INDEX_MAX_MEMORY_FACTOR system property on Apache Ignite server nodes (default 0.2, that is, 20%).
    - value = 0: unlimited; the Lucene index has its own unlimited memory region.
    - value > 0: the Lucene index has its own memory region, limited to value MB.
- optimizerEnabled (optional, default true): whether automatic Lucene index optimization is enabled. The optimizer merges deletions and deletes unused files to improve performance. You can change this value at runtime.
- optimizerSchedule (optional, default "0 1 * * *"): the optimizer's schedule as a CRON expression. By default, if the optimizer is enabled, it runs every day at 1:00 AM ("0 1 * * *"). See Cron-Based Scheduling. You can change this value at runtime.
- defaultAnalyzer (optional, default "standard"): the default text analyzer. Once the index is created you can NOT change this value, unless you drop and re-create the Lucene index.
- classpathAnalyzers (optional): custom text analyzers that use a Lucene Analyzer class available on the classpath. Once the index is created you can change this value only by adding new analyzers.
- snowballAnalyzers (optional): custom text analyzers based on tartarus.org Snowball analyzers. Once the index is created you can change this value only by adding new analyzers.
Partitioners¶
Lucene indexes can be partitioned on a per-node basis. This means that the local index in each node can be split in multiple smaller fragments. Index partitioning is useful to speed up some searches to the detriment of others, depending on the implementation. It is also useful to overcome the Lucene's hard limit of 2147483519 documents per local index, which becomes a per-partition limit.
Partitioning is disabled by default. It can be activated by specifying partitions > 1 on @IndexOptions if you use Java Annotation Syntax, or by defining a partitioner if you use JSON Syntax. Once the index is created you cannot modify this value.
Please note that the index configuration specifies the values of some Lucene memory-related attributes, such as ramBufferMB (ram_buffer_mb in JSON Syntax). These attributes are applied to each local Lucene index or partition, so the amount of memory should be multiplied by the number of partitions. Additionally, if you configure your cache with a degree of query parallelism within a single node, this amount should also be multiplied by that degree.
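As a worked example of that multiplication, the sketch below estimates the total write-buffer memory per node. `WriteBufferEstimate` is a hypothetical helper, and the formula is only the arithmetic stated above (buffer size × partitions × query parallelism), not a measurement of actual usage.

```java
/** Hypothetical helper: estimates the per-node write-buffer memory implied by the index options. */
final class WriteBufferEstimate {

    /**
     * @param ramBufferMb      the configured write buffer size per local index or partition, in MB
     * @param partitions       number of local sub-indexes (1 when partitioning is disabled)
     * @param queryParallelism degree of query parallelism configured for the cache (1 if unset)
     */
    static long totalRamBufferMb(int ramBufferMb, int partitions, int queryParallelism) {
        return (long) ramBufferMb * Math.max(1, partitions) * Math.max(1, queryParallelism);
    }

    public static void main(String[] args) {
        // 10 MB buffer, 10 partitions, query parallelism of 4: 10 * 10 * 4 = 400 MB per node.
        System.out.println(totalRamBufferMb(10, 10, 4));
    }
}
```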
None partitioner¶
A partitioner with no action. This is the default implementation to use.
@QueryTextField ( indexOptions = @IndexOptions(refreshSeconds = 60, partitions = 1) [...] ) public class Tweet { [...] }
CREATE INDEX TWEET_LUCENE_IDX ON "PUBLIC".TWEET(LUCENE) FULLTEXT '{ ''schema'': ''{...}'', ''partitioner'': ''{type: "none"}'' }';
Token partitioner¶
A partitioner based on the cache key. Partitioning on the key guarantees good load balancing between partitions while speeding up partition-directed searches, to the detriment of key range search performance. It allows partition-directed queries to run efficiently on nodes indexing more than 2147483519 rows. The number of partitions per node should be specified as an integer greater than 1.
See the Affinity Collocation performance tip.
public class TweetKey { // annotated with @QuerySqlField(index=true) will create a Grid H2 index (TWEETKEY_ID_IDX name) for non lucene searches and publishes "id" as table column @QuerySqlField(index=true) private Integer id; // field annotated with @AffinityKeyMapped will be auto-indexed as a Grid H2 index (AFFINITY_KEY index name) for affinity collocation and non lucene searches, so set index=true is not required // field annotated with @QuerySqlField publishes "user" as table column to allow use it from a lucene index mapper @QuerySqlField @AffinityKeyMapped private String user; [...] } @QueryTextField( // index with 4 sub-partitions indexOptions = @IndexOptions(partitions=4), // this will create a field named "id" into Lucene Document that will be // indexed as integer, mapped to composed primary key TweetKey's "id" column name integerMappers = @IntegerMapper(name = "id"), // this will create a field named "user" into Lucene Document that // will be indexed as string, mapped to composed primary key TweetKey's "user" column name stringMappers = @StringMapper(name = "user") ) public class Tweet { // field annotated with @QuerySqlField publishes "body" as table column to allow use it from a lucene index mapper // field annotated with @QueryTextField(textMappers = @TextMapper(analyzer = "english")) will create an indexed field named "body" into Lucene Document @QuerySqlField @QueryTextField(textMappers = @TextMapper(analyzer = "english")) private String body; [...] }
-- as a sample, here we have decided to create tweet QueryEntity by DDL statement CREATE TABLE "PUBLIC".tweet ( id int, user varchar, body varchar, PRIMARY KEY (id, user) ) WITH "TEMPLATE=PARTITIONED, affinity_key=user"; -- lucene index creation with 4 local sub partitions by cache key CREATE INDEX TWEET_LUCENE_IDX ON "PUBLIC".TWEET(LUCENE) FULLTEXT '{ ''schema'': ''{...}'', ''partitioner'': ''{type: "token", partitions: 4}'' }';
-- Fetch from 1 node (affinity collocation due to user condition), 1 local sub partition (due to full cache key condition) SELECT * FROM "PUBLIC".tweet WHERE lucene = '{...}' AND user = 'jsmith' AND id = 5; -- Fetch from 1 node (affinity collocation due to user condition), all local sub partitions SELECT * FROM "PUBLIC".tweet WHERE lucene = '{...}' AND user = 'jsmith'; -- Fetch from all nodes, all partitions SELECT * FROM "PUBLIC".tweet WHERE lucene = '{...}';
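To illustrate why a query with the full cache key touches a single local sub-partition, here is a minimal sketch of key-based partition selection. It assumes, purely for illustration, that the partition is chosen as the key's hash modulo the number of partitions; the hash function the library actually uses is not stated here, and `TokenPartitionSketch` is a hypothetical name.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch of key-based partition selection (the real hash function is an assumption). */
final class TokenPartitionSketch {

    /** Maps a cache key to a local sub-partition index in the range [0, partitions). */
    static int partitionFor(Object key, int partitions) {
        if (partitions < 1) {
            throw new IllegalArgumentException("partitions must be >= 1");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(Objects.hashCode(key), partitions);
    }

    public static void main(String[] args) {
        // The composed key (id, user) always maps to the same sub-partition, so a query
        // that fixes both columns only needs to search that one partition.
        List<Object> key = List.of(5, "jsmith");
        System.out.println(partitionFor(key, 4) == partitionFor(List.of(5, "jsmith"), 4));
    }
}
```

A query that fixes only part of the key cannot compute this value, which is consistent with the second sample query above searching all local sub-partitions.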
Text analyzers¶
Custom analyzers must be defined within @IndexOptions (Java annotation based configuration) or within schema (DDL or QueryEntity based configuration). Details are listed in the table below.
Analyzer | Parameter (JSON) | Parameter (Annotation) | Value type | Mandatory |
---|---|---|---|---|
classpath | -- | name | String | Yes |
| class | className | String | Yes |
snowball | -- | name | String | Yes |
| language | language | String | Yes |
| stopwords | stopwords | String | No |
Custom analyzers can be referenced by Text Mappers using their name.
Additionally, there are prebuilt analyzers for:
Analyzer name | Analyzer full package name |
---|---|
standard | org.apache.lucene.analysis.standard.StandardAnalyzer |
keyword | |
stop | |
whitespace | |
simple | |
classic | |
arabic | |
armenian | |
basque | |
brazilian | |
bulgarian | |
catalan | |
cjk | |
czech | |
dutch | |
danish | |
english | |
finnish | |
french | |
galician | |
german | |
greek | |
hindi | |
hungarian | |
indonesian | |
irish | |
italian | |
latvian | |
norwegian | |
persian | |
portuguese | |
romanian | |
russian | |
sorani | |
spanish | |
swedish | |
turkish | |
thai | |
Classpath analyzer¶
A text analyzer that instantiates a Lucene analyzer class available on the classpath.
Example:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10, ramBufferMB = 10, defaultAnalyzer = "english", classpathAnalyzers = { @ClasspathAnalyzer(name = "my_classpath_standard_analyzer", className = "org.apache.lucene.analysis.standard.StandardAnalyzer"), @ClasspathAnalyzer(name = "my_classpath_french_analyzer", className = "org.apache.lucene.analysis.fr.FrenchAnalyzer") } ) ) public class MyEntity { @QueryTextField(textMappers = @TextMapper(analyzer = "my_classpath_standard_analyzer")) @QuerySqlField private String body; @QueryTextField(textMappers = @TextMapper(analyzer = "my_classpath_french_analyzer")) @QuerySqlField private String aFrenchText; //will use defaultAnalyzer ("english") @QueryTextField(textMappers = @TextMapper) @QuerySqlField private String anEnglishText; [...] }
CREATE TABLE "PUBLIC".MYENTITY ( id int, body varchar, aFrenchText varchar, anEnglishText varchar, PRIMARY KEY (id) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX MYENTITY_LUCENE_IDX ON "PUBLIC".MYENTITY(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''ram_buffer_mb'':''10'', ''partitioner'':''{"type":"token","partitions":10}'', ''schema'':''{ "default_analyzer":"english", "analyzers":{ "my_classpath_standard_analyzer":{"type":"classpath","class":"org.apache.lucene.analysis.standard.StandardAnalyzer"}, "my_classpath_french_analyzer":{"type":"classpath","class":"org.apache.lucene.analysis.fr.FrenchAnalyzer"} }, "fields":{ "body":{"type":"text","validated":false,"analyzer":"my_classpath_standard_analyzer"}, "aFrenchText":{"type":"text","validated":false, "analyzer":"my_classpath_french_analyzer"}, "anEnglishText":{"type":"text","validated":false} } }'' }';
Snowball analyzer¶
A text analyzer that uses a Snowball filter (SnowballFilter) from http://snowball.tartarus.org/.
Example:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10, ramBufferMB = 10, defaultAnalyzer = "english", snowballAnalyzers = { @SnowballAnalyzer(name = "my_snowball_spanish_analyzer", language = "Spanish", stopwords = "el,la,lo,los,las,a,ante,bajo,cabe,con,contra"), @SnowballAnalyzer(name = "my_snowball_french_analyzer", language = "French") } ) ) public class MyEntity { @QueryTextField(textMappers = @TextMapper(analyzer = "my_snowball_spanish_analyzer")) @QuerySqlField private String body; @QueryTextField(textMappers = @TextMapper(analyzer = "my_snowball_spanish_analyzer")) @QuerySqlField private String anotherSpanishText; @QueryTextField(textMappers = @TextMapper(analyzer = "my_snowball_french_analyzer")) @QuerySqlField private String aFrenchText; //will use defaultAnalyzer = "english" @QueryTextField(textMappers = @TextMapper) @QuerySqlField private String anEnglishText; [...] }
CREATE TABLE "PUBLIC".MYENTITY (
    id int,
    body varchar,
    anotherSpanishText varchar,
    aFrenchText varchar,
    anEnglishText varchar,
    PRIMARY KEY (id)
) WITH "TEMPLATE=PARTITIONED";

CREATE INDEX MYENTITY_LUCENE_IDX ON "PUBLIC".MYENTITY(LUCENE) FULLTEXT '{
    ''refresh_seconds'':''60'',
    ''ram_buffer_mb'':''10'',
    ''partitioner'':''{"type":"token","partitions":10}'',
    ''schema'':''{
        "default_analyzer":"english",
        "analyzers":{
            "my_snowball_spanish_analyzer":{"type":"snowball","language":"Spanish","stopwords":"el,la,lo,los,las,a,ante,bajo,cabe,con,contra"},
            "my_snowball_french_analyzer":{"type":"snowball","language":"French"}
        },
        "fields":{
            "body":{"type":"text","validated":false,"analyzer":"my_snowball_spanish_analyzer"},
            "anotherSpanishText":{"type":"text","validated":false,"analyzer":"my_snowball_spanish_analyzer"},
            "aFrenchText":{"type":"text","validated":false,"analyzer":"my_snowball_french_analyzer"},
            "anEnglishText":{"type":"text","validated":false}
        }
    }''
}';
Supported languages: English, French, Spanish, Portuguese, Italian, Romanian, German, Dutch, Swedish, Norwegian, Danish, Russian, Finnish, Hungarian and Turkish.
Mappers¶
Field mapping definition options specify how the SQL rows will be mapped to Lucene documents. Several mappers can be applied to the same SQL table's column/s.
All mappers have a validated option that indicates whether the mapped column values must be validated at SQL level before the distributed write operation is performed.
If this option is set, the coordinator node throws an error on writes containing values that can't be mapped, causing the whole write operation to fail and notifying the client of the cause.
If validation is not set, which is the default, writes never fail because of the index. Instead, each failing column value is silently discarded and the error message is only logged on the nodes involved.
This option is useful to avoid writes containing values that can't be searched afterwards, and can also be used as a generic data validation layer.
Note that mappers affecting several columns at a time, such as date_range, geo_point and bitemporal need to have all the involved columns to perform validation, so no partial columns update will be allowed when validation is active.
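The effect of the validated option can be pictured with a small sketch. This is not Hawkore's implementation or API: the class ValidationSketch, the method mapValue and the integer-parsing rule are hypothetical names chosen only to contrast "fail the write" with "log and discard".

```java
import java.util.Optional;
import java.util.logging.Logger;

// Hypothetical illustration only: not part of Hawkore's API.
final class ValidationSketch {
    private static final Logger LOG = Logger.getLogger("ValidationSketch");

    // Maps a raw column value to an indexable integer.
    // validated = true  -> an unmappable value makes the whole write fail.
    // validated = false -> the value is discarded and the error is only logged.
    static Optional<Integer> mapValue(String raw, boolean validated) {
        try {
            return Optional.of(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            if (validated) {
                throw new IllegalArgumentException("Unmappable value: " + raw, e);
            }
            LOG.warning("Discarding unmappable value: " + raw);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(mapValue("42", false));   // value is indexed
        System.out.println(mapValue("oops", false)); // value is dropped, write goes on
    }
}
```

With validated set to true, the second call would throw instead of returning an empty result, which mirrors a write being rejected.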
Using JSON syntax¶
Details and default values are listed in the table below for JSON Syntax:
| Mapper type | Option | Value type | Default value | Mandatory |
|---|---|---|---|---|
| Big decimal | validated | boolean | false | No |
| | column | string | mapper_name of the schema | No |
| | integer_digits | integer | 32 | No |
| | decimal_digits | integer | 32 | No |
| Big integer | validated | boolean | false | No |
| | column | string | mapper_name of the schema | No |
| | digits | integer | 32 | No |
| Bitemporal | validated | boolean | false | No |
| | vt_from | string | | Yes |
| | vt_to | string | | Yes |
| | tt_from | string | | Yes |
| | tt_to | string | | Yes |
| | pattern | string | yyyy/MM/dd HH:mm:ss.SSS Z | No |
| | now_value | object | Long.MAX_VALUE | No |
| Blob | validated | boolean | false | No |
| | column | string | mapper_name of the schema | No |
| Boolean | validated | boolean | false | No |
| | column | string | mapper_name of the schema | No |
| Date | validated | boolean | false | No |
| | column | string | mapper_name of the schema | No |
| | pattern | string | yyyy/MM/dd HH:mm:ss.SSS Z | No |
| Date range | validated | boolean | false | No |
| | from | string | | Yes |
| | to | string | | Yes |
| | pattern | string | yyyy/MM/dd HH:mm:ss.SSS Z | No |
| Double | validated | boolean | false | No |
| | column | string | mapper_name of the schema | No |
| | boost | integer | 0.1f | No |
| Float | validated | boolean | false | No |
| | column | string | mapper_name of the schema | No |
| | boost | integer | 0.1f | No |
| Geo point | validated | boolean | false | No |
| | latitude | string | | Yes |
| | longitude | string | | Yes |
| | max_levels | integer | 11 | No |
| Geo shape | validated | boolean | false | No |
| | column | string | mapper_name of the schema | No |
| | max_levels | integer | 5 | No |
| | transformations | array | | No |
| Inet | validated | boolean | false | No |
| | column | string | mapper_name of the schema | No |
| Integer | validated | boolean | false | No |
| | column | string | mapper_name of the schema | No |
| | boost | integer | 0.1f | No |
| Long | validated | boolean | false | No |
| | column | string | mapper_name of the schema | No |
| | boost | integer | 0.1f | No |
| String | validated | boolean | false | No |
| | column | string | mapper_name of the schema | No |
| | case_sensitive | boolean | true | No |
| Text | validated | boolean | false | No |
| | column | string | mapper_name of the schema | No |
| | analyzer | string | default_analyzer of the schema | No |
| UUID | validated | boolean | false | No |
| | column | string | mapper_name of the schema | No |
Using Java Annotation Syntax¶
When using Java Annotation Syntax, keep these restrictions in mind:
- Single-column mappers can be defined within the @QueryTextField annotation at type or field level.
- Multiple-columns mappers must be defined within the @QueryTextField annotation at type level only.
- The name parameter (the mapper's name) must be unique within the same Lucene document, so it is mandatory when:
  - The mapper is defined at type level.
  - The mapper affects several columns at a time (Multiple-columns mapper).
  - More than one mapper is applied to the same QueryEntity property.
  When you apply a mapper at field level, the mapper's name defaults to the name of the annotated QueryEntity property.
- The column parameter is mandatory when:
  - The name parameter differs from the name of the QueryEntity property to map.
  When you apply a mapper at field level, the mapper's column defaults to the value of the mapper's name parameter.
- Entity fields to map must be annotated with @QuerySqlField; see Apache Ignite Annotation Based Configuration.
- You must define at least one mapper within your entity.
Note that mappers affecting several columns at a time, such as @DateRangeMapper, @GeoPointMapper and @BitemporalMapper, must be defined at type level and need all the involved columns to perform validation, so partial column updates are not allowed when validation is active.
Hawkore's Dynamic SQL entities functionality allows QueryEntities to be altered transparently. Index alteration is performed in the background.
Details and default values are listed in the table below.
| Mapper type | Option | Value type | Default value | Mandatory |
|---|---|---|---|---|
| Big decimal | validated | boolean | false | No |
| | name | string | annotated QueryEntity's property name | No* |
| | column | string | mapper's name | No* |
| | integer_digits | integer | 32 | No |
| | decimal_digits | integer | 32 | No |
| Big integer | validated | boolean | false | No |
| | name | string | annotated QueryEntity's property name | No* |
| | column | string | mapper's name | No* |
| | digits | integer | 32 | No |
| Bitemporal | validated | boolean | false | No |
| | name | string | | Yes |
| | vt_from | string | | Yes |
| | vt_to | string | | Yes |
| | tt_from | string | | Yes |
| | tt_to | string | | Yes |
| | pattern | string | yyyy/MM/dd HH:mm:ss.SSS Z | No |
| | now_value | object | Long.MAX_VALUE | No |
| Blob | validated | boolean | false | No |
| | name | string | annotated QueryEntity's property name | No* |
| | column | string | mapper's name | No* |
| Boolean | validated | boolean | false | No |
| | name | string | annotated QueryEntity's property name | No* |
| | column | string | mapper's name | No* |
| Date | validated | boolean | false | No |
| | name | string | annotated QueryEntity's property name | No* |
| | column | string | mapper's name | No* |
| | pattern | string | yyyy/MM/dd HH:mm:ss.SSS Z | No |
| Date range | validated | boolean | false | No |
| | name | string | | Yes |
| | column | string | | Yes |
| | from | string | | Yes |
| | to | string | | Yes |
| | pattern | string | yyyy/MM/dd HH:mm:ss.SSS Z | No |
| Double | validated | boolean | false | No |
| | name | string | annotated QueryEntity's property name | No* |
| | column | string | mapper's name | No* |
| | boost | integer | 0.1f | No |
| Float | validated | boolean | false | No |
| | name | string | annotated QueryEntity's property name | No* |
| | column | string | mapper's name | No* |
| | boost | integer | 0.1f | No |
| Geo point | validated | boolean | false | No |
| | name | string | | Yes |
| | column | string | | Yes |
| | latitude | string | | Yes |
| | longitude | string | | Yes |
| | max_levels | integer | 11 | No |
| Geo shape | validated | boolean | false | No |
| | name | string | annotated QueryEntity's property name | No* |
| | column | string | mapper's name | No* |
| | max_levels | integer | 5 | No |
| | transformations | array | | No |
| Inet | validated | boolean | false | No |
| | name | string | annotated QueryEntity's property name | No* |
| | column | string | mapper's name | No* |
| Integer | validated | boolean | false | No |
| | name | string | annotated QueryEntity's property name | No* |
| | column | string | mapper's name | No* |
| | boost | integer | 0.1f | No |
| Long | validated | boolean | false | No |
| | name | string | annotated QueryEntity's property name | No* |
| | column | string | mapper's name | No* |
| | boost | integer | 0.1f | No |
| String | validated | boolean | false | No |
| | name | string | annotated QueryEntity's property name | No* |
| | column | string | mapper's name | No* |
| | case_sensitive | boolean | true | No |
| Text | validated | boolean | false | No |
| | name | string | annotated QueryEntity's property name | No* |
| | column | string | mapper's name | No* |
| | analyzer | string | @IndexOptions's defaultAnalyzer | No |
| UUID | validated | boolean | false | No |
| | name | string | annotated QueryEntity's property name | No* |
| | column | string | mapper's name | No* |
Big decimal mapper¶
Single-column mapper. Maps arbitrary precision signed decimal values.
Parameters:
- validated (default = false): if mapping errors should make SQL writes fail, instead of just logging the error.
- column (default = mapper's name): the name of the column storing the big decimal to be indexed.
- integer_digits (default = 32): the max number of allowed decimal digits for the integer part.
- decimal_digits (default = 32): the max number of allowed decimal digits for the decimal part.
Additional parameters for Java Annotation Syntax:
- name (default = name of the annotated QueryEntity's property): the mapper's name. It is used as the indexed field name in the Lucene document, and as the value of the field parameter in Lucene searches.
Supported Java types:
- java.lang.String (a base 10 decimal string representation; the decimal separator must be '.')
- java.lang.Byte or byte
- java.lang.Short or short
- java.lang.Integer or int
- java.lang.Long or long
- java.lang.Float or float
- java.lang.Double or double
- java.math.BigInteger
- java.math.BigDecimal
Supported SQL types:
- bigint, decimal, double, float, int, real, smallint, tinyint, char, varchar
Example:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ), // multiple mappers are supported bigDecimalMappers = { @BigDecimalMapper (name="myBigDecimalField", column="column_name_1", integer_digits= 2, decimal_digits= 2, validated=true), @BigDecimalMapper (name="aStringFloat", integer_digits= 3, decimal_digits= 3, validated=false) } ) public class MyEntity { @QuerySqlField (name = "column_name_1") private float aFloat; @QuerySqlField private String aStringFloat; [...] }
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ) ) public class MyEntity { @QueryTextField( // multiple mappers are supported bigDecimalMappers = @BigDecimalMapper (name="myBigDecimalField", column="column_name_1", integer_digits= 2, decimal_digits= 2, validated=true) ) @QuerySqlField (name = "column_name_1") private float aFloat; @QueryTextField( bigDecimalMappers = @BigDecimalMapper (integer_digits= 3, decimal_digits= 3, validated=false) ) @QuerySqlField private String aStringFloat; [...] }
CREATE TABLE "PUBLIC".MYENTITY ( id int, column_name_1 float, aStringFloat varchar, PRIMARY KEY (id) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX MYENTITY_LUCENE_IDX ON "PUBLIC".MYENTITY(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''schema'':''{ "fields": { "myBigDecimalField": { type: "bigdec", integer_digits: 2, decimal_digits: 2, validated: true, column: "column_name_1" }, "aStringFloat": { type: "bigdec", integer_digits: 3, decimal_digits: 3, validated: false } } }'' }';
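As a rough illustration of what integer_digits and decimal_digits bound, the following sketch checks whether a java.math.BigDecimal fits within given limits. The class BigDecimalDigitsSketch and the method fits are hypothetical, and how the mapper itself enforces these limits is an assumption, since this page does not describe it.

```java
import java.math.BigDecimal;

// Hypothetical illustration only: not part of Hawkore's API.
final class BigDecimalDigitsSketch {

    // Returns true when the value needs at most integerDigits digits before the
    // decimal point and at most decimalDigits digits after it.
    static boolean fits(BigDecimal value, int integerDigits, int decimalDigits) {
        BigDecimal v = value.stripTrailingZeros();
        // Digits after the decimal point (a negative scale means none).
        int fractionDigits = Math.max(v.scale(), 0);
        // Digits before the decimal point.
        int wholeDigits = Math.max(v.precision() - v.scale(), 0);
        return wholeDigits <= integerDigits && fractionDigits <= decimalDigits;
    }

    public static void main(String[] args) {
        System.out.println(fits(new BigDecimal("12.34"), 2, 2)); // true
        System.out.println(fits(new BigDecimal("123.4"), 2, 2)); // false: three integer digits
    }
}
```

Under this reading, a mapper declared with integer_digits = 2 and decimal_digits = 2 would accept 12.34 and reject 123.4 or 1.234.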
Big integer mapper¶
Single-column mapper. Maps arbitrary precision signed integer values.
Parameters:
- validated (default = false): if mapping errors should make SQL writes fail, instead of just logging the error.
- column (default = mapper's name): the name of the column storing the big integer to be indexed.
- digits (default = 32): the max number of allowed digits.
Additional parameters for Java Annotation Syntax:
- name (default = name of the annotated QueryEntity's property): the mapper's name. It is used as the indexed field name in the Lucene document, and as the value of the field parameter in Lucene searches.
Supported Java types:
- java.lang.String (a base 10 integer string representation)
- java.lang.Byte or byte
- java.lang.Short or short
- java.lang.Integer or int
- java.lang.Long or long
- java.math.BigInteger
Supported SQL types:
- bigint, int, smallint, tinyint, char, varchar
Example:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ), // multiple mappers are supported bigIntegerMappers = { @BigIntegerMapper (name="myBigIntegerField", column="column_name_1", digits= 5, validated=true), @BigIntegerMapper (name="aBigInteger", digits= 10, validated=false) } ) public class MyEntity { @QuerySqlField (name = "column_name_1") private String aInteger; @QuerySqlField private BigInteger aBigInteger; [...] }
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ) ) public class MyEntity { @QueryTextField( bigIntegerMappers = { @BigIntegerMapper (name="myBigIntegerField", column="column_name_1", digits= 5, validated=true) } ) @QuerySqlField (name = "column_name_1") private String aInteger; @QueryTextField( bigIntegerMappers = @BigIntegerMapper (digits= 10, validated=false) ) @QuerySqlField private BigInteger aBigInteger; [...] }
CREATE TABLE "PUBLIC".MYENTITY ( id int, column_name_1 varchar, aBigInteger bigint, PRIMARY KEY (id) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX MYENTITY_LUCENE_IDX ON "PUBLIC".MYENTITY(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''schema'':''{ "fields": { "myBigIntegerField": { type: "bigint", digits: 5, validated: true, column: "column_name_1" }, "aBigInteger": { type: "bigint", digits: 10, validated: false } } }'' }';
Bitemporal mapper¶
Multiple-columns mapper. Maps four columns containing the four dates defining a bitemporal fact. The mapped columns shouldn't be collections.
Parameters:
- validated (default = false): if mapping errors should make SQL writes fail, instead of just logging the error.
- vt_from (mandatory): the name of the column storing the beginning of the valid date range.
- vt_to (mandatory): the name of the column storing the end of the valid date range.
- tt_from (mandatory): the name of the column storing the beginning of the transaction date range.
- tt_to (mandatory): the name of the column storing the end of the transaction date range.
- now_value (default = Long.MAX_VALUE): a date representing now.
- pattern (default = yyyy/MM/dd HH:mm:ss.SSS Z): the date pattern for parsing non-date columns and creating Lucene fields. Note that it can be used to index dates with reduced precision. If a column value is a java.lang.Number that does not match pattern, it is parsed as the number of milliseconds since January 1, 1970, 00:00:00 GMT.
Additional parameters for Java Annotation Syntax:
- name (mandatory): the mapper's name. It is used as the indexed field name in the Lucene document, and as the value of the field parameter in Lucene searches.
Supported Java types:
- java.lang.String (string representation of a date; the format must match pattern)
- java.lang.Byte or byte
- java.lang.Short or short
- java.lang.Integer or int
- java.lang.Long or long
- java.math.BigInteger
- java.util.Date
- java.util.UUID
  - For the curious: the 100-nanosecond timestamp returned by UUID.timestamp(), counted from October 15, 1582, 00:00:00 UTC (the UUID epoch, not the Unix epoch), is converted to a Date, and then pattern is applied.
Supported SQL types:
- bigint, date, int, smallint, tinyint, timestamp, uuid, char, varchar
Example:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ), bitemporalMappers = @BitemporalMapper( name= "myBitemporal", vt_from = "valt_from", vt_to = "valt_to", tt_from = "trant_from", tt_to = "trant_to", pattern = "yyyy/MM/dd HH:mm:ss.SSS", now_value = "3000/01/01 00:00:00.000" ) ) public class MyEntity { @QuerySqlField(name = "valt_from") private String vtFrom; @QuerySqlField(name = "valt_to") private String vtTo; @QuerySqlField(name = "trant_from") private String ttFrom; @QuerySqlField(name = "trant_to") private String ttTo; [...] }
CREATE TABLE "PUBLIC".MYENTITY ( id int, valt_from varchar, valt_to varchar, trant_from varchar, trant_to varchar, PRIMARY KEY (id) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX MYENTITY_LUCENE_IDX ON "PUBLIC".MYENTITY(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''schema'':''{ "fields": { "myBitemporal": { type: "bitemporal", vt_from: "valt_from", vt_to: "valt_to", tt_from: "trant_from", tt_to: "trant_to", validated: true, pattern: "yyyy/MM/dd HH:mm:ss.SSS", now_value: "3000/01/01 00:00:00.000" } } }'' }';
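For time-based (version 1) UUID columns, the timestamp has to be shifted from the UUID epoch to the Unix epoch before it can become a java.util.Date. The sketch below shows that conversion using only the JDK. TimeUuidSketch and toDate are hypothetical names, and the mapper's actual conversion code may differ.

```java
import java.util.Date;
import java.util.UUID;

// Hypothetical illustration only: not part of Hawkore's API.
final class TimeUuidSketch {
    // Number of 100-nanosecond intervals between the UUID epoch
    // (1582-10-15T00:00:00Z) and the Unix epoch (1970-01-01T00:00:00Z).
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    // Converts a version 1 UUID into the Date encoded in its timestamp field.
    static Date toDate(UUID timeUuid) {
        if (timeUuid.version() != 1) {
            throw new IllegalArgumentException("Not a time-based UUID: " + timeUuid);
        }
        // UUID.timestamp() is expressed in 100-ns units; 10,000 of them make 1 ms.
        long millis = (timeUuid.timestamp() - UUID_EPOCH_OFFSET) / 10_000L;
        return new Date(millis);
    }

    public static void main(String[] args) {
        // This UUID carries the timestamp of the Unix epoch itself.
        UUID epoch = UUID.fromString("13814000-1dd2-11b2-8080-808080808080");
        System.out.println(toDate(epoch).getTime()); // 0
    }
}
```

Once the Date is obtained, the mapper's pattern can be applied to it as for any other date value.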
Blob mapper¶
Single-column mapper. Maps a blob value.
Parameters:
- validated (default = false): if mapping errors should make SQL writes fail, instead of just logging the error.
- column (default = mapper's name): the name of the column storing blob to be indexed.
Additional parameters for Java Annotation Syntax:
- name (default = name of the annotated QueryEntity's property): the mapper's name. It is used as the indexed field name in the Lucene document, and as the value of the field parameter in Lucene searches.
Supported Java types:
- java.lang.String (hexadecimal string, for example "FA2B" or "0xFA2B")
- byte[]
- java.nio.ByteBuffer
Supported SQL types:
- binary, char, varchar
Example:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ), // multiple mappers are supported blobMappers = { @BlobMapper (name="myByteArray", column="column_name", validated=true), @BlobMapper (name="aHexString", validated=true) } ) public class MyEntity { @QuerySqlField (name = "column_name") private byte[] aByteArray; @QuerySqlField private String aHexString; [...] }
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ) ) public class MyEntity { @QueryTextField( blobMappers = @BlobMapper (name="myByteArray", column="column_name", validated=true) ) @QuerySqlField (name = "column_name") private byte[] aByteArray; @QueryTextField( blobMappers = @BlobMapper (validated=true) ) @QuerySqlField private String aHexString; [...] }
CREATE TABLE "PUBLIC".MYENTITY ( id int, column_name binary, aHexString varchar, PRIMARY KEY (id) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX MYENTITY_LUCENE_IDX ON "PUBLIC".MYENTITY(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''schema'':''{ "fields": { "myByteArray": { type: "bytes", validated: true, column: "column_name" }, "aHexString": { type: "bytes", validated: true } } }'' }';
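The two hexadecimal string forms accepted for blobs ("FA2B" and "0xFA2B") can be decoded to bytes as sketched below. HexBlobSketch and parseHex are hypothetical names. The sketch rejects strings with an odd number of digits, which is an assumption about what counts as mappable rather than documented behaviour.

```java
// Hypothetical illustration only: not part of Hawkore's API.
final class HexBlobSketch {

    // Decodes "FA2B" or "0xFA2B" into the bytes {0xFA, 0x2B}.
    static byte[] parseHex(String hex) {
        String digits = hex.startsWith("0x") ? hex.substring(2) : hex;
        if (digits.length() % 2 != 0) {
            throw new IllegalArgumentException("Odd number of hex digits: " + hex);
        }
        byte[] out = new byte[digits.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(digits.charAt(2 * i), 16);
            int lo = Character.digit(digits.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Not a hexadecimal string: " + hex);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parseHex("0xFA2B").length); // 2
    }
}
```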
Boolean mapper¶
Single-column mapper. Maps a boolean value.
Parameters:
- validated (default = false): if mapping errors should make SQL writes fail, instead of just logging the error.
- column (default = mapper's name): the name of the column storing boolean value to be indexed.
Additional parameters for Java Annotation Syntax:
- name (default = name of the annotated QueryEntity's property): the mapper's name. It is used as the indexed field name in the Lucene document, and as the value of the field parameter in Lucene searches.
Supported Java types:
- java.lang.String (a boolean string representation: true, TRUE, false, FALSE)
- java.lang.Boolean or boolean
Supported SQL types:
- boolean, char, varchar
Example:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ), // multiple mappers are supported booleanMappers = { @BooleanMapper (name="myBoolean", column="column_name", validated=true), @BooleanMapper (name="aBooleanString", validated=true) } ) public class MyEntity { @QuerySqlField (name = "column_name") private boolean aBoolean; @QuerySqlField private String aBooleanString; [...] }
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ) ) public class MyEntity { @QueryTextField( booleanMappers = @BooleanMapper (name="myBoolean", column="column_name", validated=true) ) @QuerySqlField (name = "column_name") private boolean aBoolean; @QueryTextField( booleanMappers = @BooleanMapper (validated=true) ) @QuerySqlField private String aBooleanString; [...] }
CREATE TABLE "PUBLIC".MYENTITY ( id int, column_name boolean, aBooleanString varchar, PRIMARY KEY (id) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX MYENTITY_LUCENE_IDX ON "PUBLIC".MYENTITY(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''schema'':''{ "fields": { "myBoolean": { type: "boolean", validated: true, column: "column_name" }, "aBooleanString": { type: "boolean", validated: true } } }'' }';
Date mapper¶
Single-column mapper. Maps dates using a pattern, a timestamp (milliseconds since January 1, 1970, 00:00:00 GMT) or a time-based UUID.
Parameters:
- validated (default = false): if mapping errors should make SQL writes fail, instead of just logging the error.
- column (default = mapper's name): the name of the column storing the date to be indexed.
- pattern (default = yyyy/MM/dd HH:mm:ss.SSS Z): the date pattern for parsing non-date columns and creating Lucene fields. Note that it can be used to index dates with reduced precision. If the column value is a number that does not match pattern, it is parsed as a Date from the number of milliseconds since January 1, 1970, 00:00:00 GMT, and then pattern is applied.
Additional parameters for Java Annotation Syntax:
- name (default = name of the annotated QueryEntity's property): the mapper's name. It is used as the indexed field name in the Lucene document, and as the value of the field parameter in Lucene searches.
Supported Java types:
- java.lang.String (string representation of a date; the format must match pattern)
- java.lang.Byte or byte
- java.lang.Short or short
- java.lang.Integer or int
- java.lang.Long or long
- java.math.BigInteger
- java.util.Date
- java.util.UUID
  - For the curious: the 100-nanosecond timestamp returned by UUID.timestamp(), counted from October 15, 1582, 00:00:00 UTC (the UUID epoch, not the Unix epoch), is converted to a Date, and then pattern is applied.
Supported SQL types:
- bigint, date, int, smallint, tinyint, timestamp, uuid, char, varchar
Example: Index the column creation with a precision of minutes using the date format pattern yyyy/MM/dd HH:mm:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ), dateMappers = @DateMapper (name="creation", validated=true, pattern="yyyy/MM/dd HH:mm") ) public class MyEntity { @QuerySqlField private Date creation; [...] }
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ) ) public class MyEntity { @QueryTextField( dateMappers = @DateMapper (validated=true, pattern="yyyy/MM/dd HH:mm") ) @QuerySqlField private Date creation; [...] }
CREATE TABLE "PUBLIC".MYENTITY ( id int, creation date, PRIMARY KEY (id) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX MYENTITY_LUCENE_IDX ON "PUBLIC".MYENTITY(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''schema'':''{ "fields": { "creation": { type: "date", validated: true, pattern: "yyyy/MM/dd HH:mm" } } }'' }';
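To see why a shorter pattern reduces precision, the sketch below formats a date with a pattern and parses it back, which drops every field the pattern does not mention. It assumes the pattern follows java.text.SimpleDateFormat syntax, which the default pattern suggests but this page does not state. DatePrecisionSketch and truncate are hypothetical names, and the UTC time zone is chosen only to make the result reproducible.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical illustration only: not part of Hawkore's API.
final class DatePrecisionSketch {

    // Formats the date with the pattern and parses it back, so any field
    // finer than the pattern (seconds, milliseconds, ...) is lost.
    static Date truncate(Date date, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return format.parse(format.format(date));
        } catch (ParseException e) {
            throw new IllegalStateException("Pattern cannot round-trip: " + pattern, e);
        }
    }

    public static void main(String[] args) {
        // 1970-01-02 01:01:01.001 UTC expressed in milliseconds.
        Date date = new Date(90_061_001L);
        System.out.println(truncate(date, "yyyy/MM/dd HH:mm").getTime()); // 90060000
    }
}
```

Two dates that differ only in seconds therefore end up with the same indexed value under the pattern yyyy/MM/dd HH:mm.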
Date range mapper¶
Multiple-columns mapper. Maps a time duration/period defined by a start date and a stop date. The mapped columns shouldn't be collections.
Parameters:
- validated (default = false): if mapping errors should make SQL writes fail, instead of just logging the error.
- from (mandatory): the name of the column storing the start date of the time duration to be indexed.
- to (mandatory): the name of the column storing the stop date of the time duration to be indexed.
- pattern (default = yyyy/MM/dd HH:mm:ss.SSS Z): the date pattern for parsing non-date columns and creating Lucene fields. Note that it can be used to index dates with reduced precision. If the column value is a number that does not match pattern, it is parsed as a Date from the number of milliseconds since January 1, 1970, 00:00:00 GMT, and then pattern is applied.
Additional parameters for Java Annotation Syntax:
- name (mandatory): the mapper's name. It is used as the indexed field name in the Lucene document, and as the value of the field parameter in Lucene searches.
Supported Java types:
- java.lang.String (string representation of a date; the format must match pattern)
- java.lang.Byte or byte
- java.lang.Short or short
- java.lang.Integer or int
- java.lang.Long or long
- java.math.BigInteger
- java.util.Date
- java.util.UUID
  - For the curious: the 100-nanosecond timestamp returned by UUID.timestamp(), counted from October 15, 1582, 00:00:00 UTC (the UUID epoch, not the Unix epoch), is converted to a Date, and then pattern is applied.
Supported SQL types:
- bigint, date, int, smallint, tinyint, timestamp, uuid, char, varchar
Example 1: Index a time period defined by the columns start and stop, using the default date pattern:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ), dateRangeMappers = @DateRangeMapper (name="duration", from="start", to="stop") ) public class MyEntity { @QuerySqlField private Date start; @QuerySqlField(name = "stop") private Date end; [...] }
CREATE TABLE "PUBLIC".MYENTITY ( id int, start date, stop date, PRIMARY KEY (id) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX MYENTITY_LUCENE_IDX ON "PUBLIC".MYENTITY(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''schema'':''{ "fields": { "duration": { type: "date_range", from: "start", to: "stop" } } }'' }';
Example 2: Index a time period defined by the columns start and stop, validating values, and using a precision of minutes:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ), dateRangeMappers = @DateRangeMapper (name="duration", from="start", to="stop", validated=true, pattern="yyyy/MM/dd HH:mm") ) public class MyEntity { @QuerySqlField private Date start; @QuerySqlField(name = "stop") private Date end; [...] }
CREATE TABLE "PUBLIC".MYENTITY ( id int, start date, stop date, PRIMARY KEY (id) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX MYENTITY_LUCENE_IDX ON "PUBLIC".MYENTITY(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''schema'':''{ "fields": { "duration": { type: "date_range", from: "start", to: "stop", validated: true, pattern: "yyyy/MM/dd HH:mm" } } }'' }';
Double mapper¶
Single-column mapper. Maps a 64-bit floating-point number.
Parameters:
- validated (default = false): if mapping errors should make SQL writes fail, instead of just logging the error.
- column (default = mapper's name): the name of the column storing the double to be indexed.
- boost (default = 0.1f): Lucene's index-time boosting factor.
Additional parameters for Java Annotation Syntax:
- name (default = name of the annotated QueryEntity's property): the mapper's name. It is used as the indexed field name in the Lucene document, and as the value of the field parameter in Lucene searches.
Supported Java types:
- java.lang.String (a base 10 decimal string representation; the decimal separator must be '.')
- java.lang.Byte or byte
- java.lang.Short or short
- java.lang.Integer or int
- java.lang.Long or long
- java.lang.Float or float
- java.lang.Double or double
- java.math.BigInteger
- java.math.BigDecimal
Supported SQL types:
- bigint, decimal, double, float, int, real, smallint, tinyint, char, varchar
Example:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ), // multiple mappers are supported doubleMappers = { @DoubleMapper (name="myDoubleField", column="column_name", boost= 2.0, validated=true), @DoubleMapper (name="aStringFloat", validated=false) } ) public class MyEntity { @QuerySqlField (name = "column_name") private float aFloat; @QuerySqlField private String aStringFloat; [...] }
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ) ) public class MyEntity { @QueryTextField( // multiple mappers are supported doubleMappers = { @DoubleMapper (name="myDoubleField", column="column_name", boost= 2.0, validated=true) } ) @QuerySqlField (name = "column_name") private float aFloat; @QueryTextField( doubleMappers = @DoubleMapper (validated=false) ) @QuerySqlField private String aStringFloat; [...] }
CREATE TABLE "PUBLIC".MYENTITY ( id int, column_name float, aStringFloat varchar, PRIMARY KEY (id) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX MYENTITY_LUCENE_IDX ON "PUBLIC".MYENTITY(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''schema'':''{ "fields": { "myDoubleField": { type: "double", boost: 2.0, validated: true, column: "column_name" }, "aStringFloat": { type: "double", validated: false } } }'' }';
Float mapper¶
Single-column mapper. Maps a 32-bit floating-point number.
Parameters:
- validated (default = false): if mapping errors should make SQL writes fail, instead of just logging the error.
- column (default = mapper's name): the name of the column storing the float to be indexed.
- boost (default = 0.1f): Lucene's index-time boosting factor.
Additional parameters for Java Annotation Syntax:
- name (default = name of the annotated QueryEntity's property): the mapper's name. It is used as the indexed field name in the Lucene document, and as the value of the field parameter in Lucene searches.
Supported Java types:
- java.lang.String (a base 10 decimal string representation; the decimal separator must be '.')
- java.lang.Byte or byte
- java.lang.Short or short
- java.lang.Integer or int
- java.lang.Long or long
- java.lang.Float or float
- java.lang.Double or double
- java.math.BigInteger
- java.math.BigDecimal
Supported SQL types:
- bigint, decimal, double, float, int, real, smallint, tinyint, char, varchar
Example:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ), // multiple mappers are supported floatMappers = { @FloatMapper (name="myFloatField", column="column_name", boost= 2.0, validated=true), @FloatMapper (name="aStringFloat") } ) public class MyEntity { @QuerySqlField (name = "column_name") private float aFloat; @QuerySqlField private String aStringFloat; [...] }
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ) ) public class MyEntity { @QueryTextField( // multiple mappers are supported floatMappers = { @FloatMapper (name="myFloatField", column="column_name", boost= 2.0, validated=true) } ) @QuerySqlField (name = "column_name") private float aFloat; @QueryTextField( floatMappers = @FloatMapper ) @QuerySqlField private String aStringFloat; [...] }
CREATE TABLE "PUBLIC".MYENTITY ( id int, column_name float, aStringFloat varchar, PRIMARY KEY (id) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX MYENTITY_LUCENE_IDX ON "PUBLIC".MYENTITY(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''schema'':''{ "fields": { "myFloatField": { type: "float", boost: 2.0, validated: true, column: "column_name" }, "aStringFloat": { type: "float" } } }'' }';
Geo point mapper¶
Multiple-columns mapper. Maps a geospatial location (point) defined by two columns containing a latitude and a longitude. Indexing is based on a composite spatial strategy that stores points in a doc values field and also indexes them into a geohash recursive prefix tree with a certain precision level. The low-accuracy prefix tree is used to quickly find results, maybe producing some false positives, and the doc values field is used to discard these false positives. The mapped columns shouldn't be collections.
Parameters:
- validated (default = false): if mapping errors should make SQL writes fail, instead of just logging the error.
- latitude (mandatory): the name of the column storing the latitude of the point to be indexed. A valid latitude must be in the range [-90, 90].
- longitude (mandatory): the name of the column storing the longitude of the point to be indexed. A valid longitude must be in the range [-180, 180].
- max_levels (default = 11): the maximum number of levels in the underlying geohash search tree. False positives are discarded using the stored doc values, so a lower value does not mean a loss of precision. Higher values produce fewer false positives to be post-filtered, at the expense of creating more terms in the search index.
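To give a feel for the max_levels trade-off, the following sketch estimates the size of a prefix tree cell at a given level. It assumes the tree follows the standard geohash layout, where each level adds one base-32 character (5 bits, alternating between longitude and latitude, starting with longitude). The class and method names are hypothetical and are not part of Hawkore's API.

```java
/** Rough geohash cell size per tree level. Illustrative only, not the library's code. */
final class GeohashCellSize {

    /** Longitude bits after `levels` geohash characters (5 bits each, longitude first). */
    static int lonBits(int levels) {
        return (5 * levels + 1) / 2;
    }

    /** Latitude bits after `levels` geohash characters. */
    static int latBits(int levels) {
        return (5 * levels) / 2;
    }

    /** Cell width in degrees of longitude at the given level. */
    static double cellWidthDegrees(int levels) {
        return 360.0 / Math.pow(2, lonBits(levels));
    }

    /** Cell height in degrees of latitude at the given level. */
    static double cellHeightDegrees(int levels) {
        return 180.0 / Math.pow(2, latBits(levels));
    }

    public static void main(String[] args) {
        // Compare a few max_levels values used in this document.
        for (int levels : new int[] {5, 11, 15}) {
            System.out.printf("max_levels=%d -> cell of about %.9f x %.9f degrees%n",
                    levels, cellWidthDegrees(levels), cellHeightDegrees(levels));
        }
    }
}
```

Under this assumption every extra level shrinks the cells by a factor of 32, which is why higher values leave fewer candidates to post-filter but generate more index terms.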
Additional parameters for Java Annotation Syntax:
- name (mandatory): the mapper's name. It is used as the indexed field name in the Lucene document, and as the value of the field parameter in Lucene searches.
Supported Java types:
- java.lang.String (a base 10 decimal string representation; the decimal separator must be '.')
- java.lang.Byte or byte
- java.lang.Short or short
- java.lang.Integer or int
- java.lang.Long or long
- java.lang.Float or float
- java.lang.Double or double
- java.math.BigInteger
- java.math.BigDecimal
Supported SQL types:
- bigint, decimal, double, float, int, real, smallint, tinyint, char, varchar
Example:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ), // multiple mappers are supported geoPointMappers = @GeoPointMapper(name = "my_geo_point", latitude = "lat", longitude = "lon", max_levels = 15, validated = true) ) public class MyEntity { @QuerySqlField private double lat; @QuerySqlField (name = "lon") private double lng; [...] }
CREATE TABLE "PUBLIC".MYENTITY ( id int, lon double, lat double, PRIMARY KEY (id) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX MYENTITY_LUCENE_IDX ON "PUBLIC".MYENTITY(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''schema'':''{ "fields": { "my_geo_point": { type: "geo_point", latitude: "lat", longitude: "lon", max_levels: 15, validated: true } } }'' }';
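The range and format constraints on the latitude and longitude columns can be sketched as a small check. This is only an illustration of the documented rules (numeric or base 10 string values with '.' as separator, latitude in [-90, 90], longitude in [-180, 180]); the class and method names are hypothetical and the mapper's real validation code may differ.

```java
import java.util.Optional;

/** Illustrative check of geo point column values. Not the mapper's actual code. */
final class GeoPointCheck {

    /** Converts a column value (Number or base 10 String with '.' separator) to a double. */
    static double toDouble(Object value) {
        if (value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        // Double.parseDouble rejects a ',' decimal separator with NumberFormatException.
        return Double.parseDouble(value.toString().trim());
    }

    /** Returns a description of the problem, or empty if the point looks indexable. */
    static Optional<String> check(Object latitude, Object longitude) {
        if (latitude == null || longitude == null) {
            return Optional.of("latitude and longitude are both required");
        }
        double lat;
        double lon;
        try {
            lat = toDouble(latitude);
            lon = toDouble(longitude);
        } catch (NumberFormatException e) {
            return Optional.of("not a base 10 number: " + e.getMessage());
        }
        // Written as negated range checks so that NaN is rejected too.
        if (!(lat >= -90 && lat <= 90)) {
            return Optional.of("latitude out of range [-90, 90]: " + lat);
        }
        if (!(lon >= -180 && lon <= 180)) {
            return Optional.of("longitude out of range [-180, 180]: " + lon);
        }
        return Optional.empty();
    }
}
```

With validated = true a failed check of this kind would make the SQL write fail; with the default validated = false the error would only be logged.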
Geo shape mapper¶
Single-column mapper. Maps a geographical shape stored in a text column in Well Known Text (WKT) format or in an org.locationtech.jts.geom.Geometry column. The supported WKT shapes are point, linestring, polygon, multipoint, multilinestring and multipolygon.
It is possible to specify a sequence of geometrical transformations to be applied to the shape before indexing it. It could be used for indexing only the centroid of the shape, or a buffer around it, etc.
Indexing is based on a composite spatial strategy that stores shapes in a doc values field and also indexes them into a geohash recursive prefix tree with a certain precision level. The low-accuracy prefix tree is used to quickly find results, maybe producing some false positives, and the doc values field is used to discard these false positives.
This mapper depends on Java Topology Suite (JTS).
Parameters:
- validated (default = false): if mapping errors should make SQL writes fail, instead of just logging the error.
- column (default = mapper's name): the name of the column storing the shape.
- max_levels (default = 5): the maximum number of levels in the underlying geohash search tree. False positives are discarded using the stored doc values, so a lower value does not mean a loss of precision. Higher values produce fewer false positives to be post-filtered, at the expense of creating more terms in the search index.
- transformations (optional): sequence of geometrical transformations to be applied to each shape before indexing it.
Additional parameters for Java Annotation Syntax:
- name (default = name of the annotated QueryEntity's property): the mapper's name. It is used as the indexed field name in the Lucene document, and as the value of the field parameter in Lucene searches.
Supported Java types:
- java.lang.String (string representation of the geometry in WKT format)
- org.locationtech.jts.geom.Geometry
Supported SQL types:
- geometry, char, varchar
Example 1:
public class Block { @QueryTextField(geoShapeMappers = { @GeoShapeMapper(name = "place", column="shape") }) @QuerySqlField private String shape; ... }
CREATE TABLE IF NOT EXISTS "PUBLIC".BLOCK ( id int, shape varchar, PRIMARY KEY (id) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX BLOCK_LUCENE_IDX ON "PUBLIC".BLOCK(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''schema'':''{ "fields": { "place": { type: "geo_shape", column: "shape" } } }'' }';
-- insert some geometries within polygon POLYGON((-80.14247278479951 25.795756477689594, -66.11315588592869 18.47447597127288, -64.82713517019887 32.33019640254669, -80.14247278479951 25.795756477689594))
INSERT INTO "PUBLIC".block(id, shape) VALUES (1,'POINT(-74.72900390625 26.37218544169562)');
INSERT INTO "PUBLIC".block(id, shape) VALUES (2,'POINT(-69.89501953125 27.97499795326776)');
INSERT INTO "PUBLIC".block(id, shape) VALUES (3,'POINT(-69.89501953125 27.97499795326776)');
INSERT INTO "PUBLIC".block(id, shape) VALUES (4,'POINT(-69.89501953125 27.97499795326776)');
-- insert some geometries within a buffer of 10 kilometers around Florida's coastline (LINESTRING(-80.90 29.05, -80.51 28.47, -80.60 28.12, -80.00 26.85, -80.05 26.37)) and a polygon that intersects this buffer
INSERT INTO "PUBLIC".block(id, shape) VALUES (5,'POINT(-80.6485413219623 28.640536042944007)');
INSERT INTO "PUBLIC".block(id, shape) VALUES (6,'POINT(-80.58958155284962 27.988011858490253)');
INSERT INTO "PUBLIC".block(id, shape) VALUES (7,'POINT(-80.26109141065051 27.31654822659739)');
INSERT INTO "PUBLIC".block(id, shape) VALUES (8,'POLYGON((-80.95740306914789 27.806634697867153, -80.36001876156914 27.838020410985237, -80.60843599838408 27.602406791332584, -81.15258804093104 27.623370824383542, -80.95740306914789 27.806634697867153))');
Example 2: Index only the centroid of the WKT shape contained in the indexed column:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ) ) public class City { @QueryTextField(geoShapeMappers = { @GeoShapeMapper( name = "place_centroid", column = "shape", transformations = @GeoTransformation(type = GeoTransformationType.CENTROID), max_levels = 15) }) @QuerySqlField private Geometry shape; ... }
CREATE TABLE IF NOT EXISTS "PUBLIC".CITY ( name varchar, shape geometry, PRIMARY KEY (name) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX CITY_LUCENE_IDX ON "PUBLIC".CITY(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''schema'':''{ "fields": { "place_centroid": { type: "geo_shape", column: "shape", max_levels: 15, transformations: [{type: "centroid"}] } } }'' }';
INSERT INTO "PUBLIC".city(name, shape) VALUES ('birmingham', 'POLYGON((-2.25 52.63, -2.26 52.49, -2.13 52.36, -1.80 52.34, -1.57 52.54, -1.89 52.67, -2.25 52.63))'); INSERT INTO "PUBLIC".city(name, shape) VALUES ('london', 'POLYGON((-0.55 51.50, -0.13 51.19, 0.21 51.35, 0.30 51.62, -0.02 51.75, -0.34 51.69, -0.55 51.50))');
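To illustrate what indexing "only the centroid" means for a polygon, here is a minimal planar sketch using the standard area-weighted (shoelace) formula. It treats longitude and latitude as plain x and y coordinates and handles a single closed ring without holes. The class name is hypothetical, and the mapper itself relies on JTS, whose result may differ in details.

```java
/** Area-weighted centroid of a simple closed ring. Illustrative only. */
final class PolygonCentroid {

    /**
     * @param ring closed ring of {x, y} vertices, where the last vertex repeats the first
     * @return the centroid as {x, y}
     */
    static double[] centroid(double[][] ring) {
        double twiceArea = 0;
        double cx = 0;
        double cy = 0;
        for (int i = 0; i < ring.length - 1; i++) {
            // Cross product of consecutive vertices (shoelace term).
            double cross = ring[i][0] * ring[i + 1][1] - ring[i + 1][0] * ring[i][1];
            twiceArea += cross;
            cx += (ring[i][0] + ring[i + 1][0]) * cross;
            cy += (ring[i][1] + ring[i + 1][1]) * cross;
        }
        if (twiceArea == 0) {
            throw new IllegalArgumentException("degenerate ring: zero area");
        }
        // Centroid = sum / (6 * area) = sum / (3 * twiceArea).
        return new double[] {cx / (3 * twiceArea), cy / (3 * twiceArea)};
    }

    public static void main(String[] args) {
        // The 'birmingham' polygon from the INSERT statement above.
        double[][] birmingham = {
            {-2.25, 52.63}, {-2.26, 52.49}, {-2.13, 52.36}, {-1.80, 52.34},
            {-1.57, 52.54}, {-1.89, 52.67}, {-2.25, 52.63}
        };
        double[] c = centroid(birmingham);
        System.out.printf("POINT(%.4f %.4f)%n", c[0], c[1]);
    }
}
```

With the centroid transformation, a single point of this kind is what ends up in the index instead of the full polygon, so searches match on the city's center rather than its outline.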
Example 3: Index a buffer 50 kilometres around the area of a city:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ) ) public class City { @QueryTextField(geoShapeMappers = { @GeoShapeMapper( name = "place_buffer", column = "shape", transformations = @GeoTransformation(type = GeoTransformationType.BUFFER, max_distance = "50km")) }) @QuerySqlField private Geometry shape; ... }
CREATE TABLE IF NOT EXISTS "PUBLIC".CITY ( name varchar, shape geometry, PRIMARY KEY (name) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX CITY_LUCENE_IDX ON "PUBLIC".CITY(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''schema'':''{ "fields": { "place_buffer": { type: "geo_shape", column: "shape", max_levels: 15, transformations: [{type: "buffer", max_distance: "50km"}] } } }'' }';
INSERT INTO "PUBLIC".city(name, shape) VALUES ('birmingham', 'POLYGON((-2.25 52.63, -2.26 52.49, -2.13 52.36, -1.80 52.34, -1.57 52.54, -1.89 52.67, -2.25 52.63))'); INSERT INTO "PUBLIC".city(name, shape) VALUES ('london', 'POLYGON((-0.55 51.50, -0.13 51.19, 0.21 51.35, 0.30 51.62, -0.02 51.75, -0.34 51.69, -0.55 51.50))');
Example 4: Index a buffer 50 kilometers around the borders of a country:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ), geoShapeMappers = { @GeoShapeMapper( name = "place", column = "shape", transformations = @GeoTransformation(type = GeoTransformationType.BUFFER, max_distance = "50km")) }) public class Border { @QuerySqlField private String shape; ... }
CREATE TABLE IF NOT EXISTS "PUBLIC".BORDER ( name varchar, shape varchar, PRIMARY KEY (name) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX BORDER_LUCENE_IDX ON "PUBLIC".BORDER(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''schema'':''{ "fields": { "place": { type: "geo_shape", column: "shape", max_levels: 15, transformations: [{type: "buffer", max_distance: "50km"}] } } }'' }';
INSERT INTO "PUBLIC".border(name, shape) VALUES ('france', 'LINESTRING(-1.7692623510307564 43.33616076038274, -1.2505740047697933 43.05871685595872, -0.2260349939872593 42.77098869300744, 0.681245743448637 42.818493812398145, 2.1252193765092477 42.468351698279776, 3.1051001377164886 42.43862842055378)'); INSERT INTO "PUBLIC".border(name, shape) VALUES ('portugal', 'LINESTRING(-8.856095991604167 41.939536163204366, -8.167701975450063 42.1743738747258, -8.130491488090382 41.80099080619647, -6.642071993703126 41.967209197025284, -6.251361876426471 41.5926099563257, -6.921150648900735 41.019594111075804, -6.995571623620098 39.71552328866082, -7.516518446655637 39.68689405837003, -6.958361136260418 39.039601047883025, -7.29325552249755 38.53201174087454, -6.976966379940258 38.196488460238356, -7.4979132029757976 37.579789414825896, -7.404886984576595 37.19544025819424)');
Example 5: Index the convex hull of the WKT shape contained in the indexed column:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ) ) public class Block { @QueryTextField(geoShapeMappers = { @GeoShapeMapper( name = "place_convexhull", column = "shape", transformations = @GeoTransformation(type = GeoTransformationType.CONVEX_HULL)) }) @QuerySqlField private String shape; ... }
CREATE TABLE IF NOT EXISTS "PUBLIC".BLOCK ( id int, shape varchar, PRIMARY KEY (id) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX BLOCK_LUCENE_IDX ON "PUBLIC".BLOCK(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''schema'':''{ "fields": { "place_convexhull": { type: "geo_shape", column: "shape", transformations: [{type: "convex_hull"}] } } }'' }';
INSERT INTO "PUBLIC".block(id, shape) VALUES (20, 'POLYGON(((-73.91738891601561 40.7118739519081, -73.91738891601561 40.7118739519081, -73.92425537109375 40.70771000786732... ))');
Example 6: Index the bounding box of the WKT shape contained in the indexed column:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ) ) public class Block { @QueryTextField(geoShapeMappers = { @GeoShapeMapper( name = "place_bbox", column = "shape", transformations = @GeoTransformation(type = GeoTransformationType.BBOX)) }) @QuerySqlField private String shape; ... }
CREATE TABLE IF NOT EXISTS "PUBLIC".BLOCK ( id int, shape varchar, PRIMARY KEY (id) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX BLOCK_LUCENE_IDX ON "PUBLIC".BLOCK(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''schema'':''{ "fields": { "place_bbox": { type: "geo_shape", column: "shape", transformations: [{type: "bbox"}] } } }'' }';
INSERT INTO "PUBLIC".block(id, shape) VALUES (20, 'POLYGON(((-73.91738891601561 40.7118739519081, -73.91738891601561 40.7118739519081, -73.92425537109375 40.70771000786732... ))');
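As a rough picture of the bounding box transformation in Example 6, the sketch below computes the minimum axis-aligned rectangle that encloses all coordinates of a WKT string. It assumes that this rectangle is what bbox means, and it uses a naive regular expression instead of a real WKT parser. The class name is hypothetical; the mapper itself works on JTS geometries.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Naive bounding box of the coordinates found in a WKT string. Illustrative only. */
final class WktBoundingBox {

    // Matches one "x y" coordinate pair such as "-0.55 51.50".
    private static final Pattern PAIR =
            Pattern.compile("(-?\\d+(?:\\.\\d+)?)\\s+(-?\\d+(?:\\.\\d+)?)");

    /** @return the box as {minX, minY, maxX, maxY} */
    static double[] of(String wkt) {
        double minX = Double.POSITIVE_INFINITY;
        double minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY;
        double maxY = Double.NEGATIVE_INFINITY;
        boolean found = false;
        Matcher m = PAIR.matcher(wkt);
        while (m.find()) {
            double x = Double.parseDouble(m.group(1));
            double y = Double.parseDouble(m.group(2));
            minX = Math.min(minX, x);
            minY = Math.min(minY, y);
            maxX = Math.max(maxX, x);
            maxY = Math.max(maxY, y);
            found = true;
        }
        if (!found) {
            throw new IllegalArgumentException("no coordinates found in: " + wkt);
        }
        return new double[] {minX, minY, maxX, maxY};
    }
}
```

Indexing the box instead of the original shape trades accuracy for a simpler geometry: any search that touches the rectangle matches, even where the original shape does not reach.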
Inet mapper¶
Single-column mapper. Maps an IP address. Both IPv4 and IPv6 are supported.
Parameters:
- validated (default = false): if mapping errors should make SQL writes fail, instead of just logging the error.
- column (default = mapper's name): the name of the column storing the IP address to be indexed.
Additional parameters for Java Annotation Syntax:
- name (default = name of the annotated QueryEntity's property): the mapper's name. It is used as the indexed field name in the Lucene document, and as the value of the field parameter in Lucene searches.
Supported Java types:
- java.lang.String (an IPv4 or IPv6 address string representation)
- java.net.InetAddress
Supported SQL types:
- char, varchar
Example:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10, ramBufferMB = 10 ), inetMappers = { @InetMapper(name="ipv4", validated = true) } ) public class MyEntity { @QuerySqlField private InetAddress ipv4; [...] }
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10, ramBufferMB = 10 ) ) public class MyEntity { @QueryTextField( inetMappers = { @InetMapper(validated = true) } ) @QuerySqlField private InetAddress ipv4; [...] }
CREATE TABLE "PUBLIC".MYENTITY ( id int, ipv4 varchar, PRIMARY KEY (id) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX MYENTITY_LUCENE_IDX ON "PUBLIC".MYENTITY(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''ram_buffer_mb'':''10'', ''schema'':''{ "fields": { "ipv4": { type: "inet", validated : true } } }'' }';
Integer mapper¶
Single-column mapper. Maps a 32-bit integer number.
Parameters:
- validated (default = false): if mapping errors should make SQL writes fail, instead of just logging the error.
- column (default = mapper's name): the name of the column storing the integer to be indexed.
- boost (default = 0.1f): Lucene's index-time boosting factor.
Additional parameters for Java Annotation Syntax:
- name (default = name of the annotated QueryEntity's property): the mapper's name. It is used as the indexed field name in the Lucene document, and as the value of the field parameter in Lucene searches.
Supported Java types:
- java.lang.String (a base 10 integer string representation)
- java.lang.Byte or byte
- java.lang.Short or short
- java.lang.Integer or int
- java.lang.Long or long
- java.lang.Float or float
- java.lang.Double or double
- java.math.BigInteger
- java.math.BigDecimal
- java.util.Date
  - For the curious: for byte-order comparability, the date is taken as the number of milliseconds since January 1, 1970, 00:00:00 GMT, converted to days, shifted by Integer.MIN_VALUE and treated as an unsigned integer (see org.hawkore.ignite.lucene.schema.mapping.IntegerMapper for details).
Supported SQL types:
- bigint, decimal, double, float, int, real, smallint, tinyint, timestamp, date, char, varchar
Example:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ), // multiple mappers are supported integerMappers = { @IntegerMapper (name="myIntegerField", column="column_name", boost= 2.0, validated=true), @IntegerMapper (name="aStringInteger") } ) public class MyEntity { @QuerySqlField (name = "column_name") private int aInteger; @QuerySqlField private String aStringInteger; [...] }
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ) ) public class MyEntity { @QueryTextField( // multiple mappers are supported integerMappers = { @IntegerMapper (name="myIntegerField", column="column_name", boost= 2.0, validated=true) } ) @QuerySqlField (name = "column_name") private int aInteger; @QueryTextField( integerMappers = @IntegerMapper ) @QuerySqlField private String aStringInteger; [...] }
CREATE TABLE "PUBLIC".MYENTITY ( id int, column_name int, aStringInteger varchar, PRIMARY KEY (id) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX MYENTITY_LUCENE_IDX ON "PUBLIC".MYENTITY(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''schema'':''{ "fields": { "myIntegerField": { type: "integer", boost: 2.0, validated: true, column: "column_name" }, "aStringInteger": { type: "integer" } } }'' }';
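The note on java.util.Date values in the Integer mapper can be read as the following conversion. This is only one plausible interpretation of that note, written from the description rather than from the IntegerMapper source: the class and method names are hypothetical, and the rounding of pre-1970 dates is an assumption.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

/** One possible reading of the Date-to-integer note. Not the actual IntegerMapper code. */
final class DateAsSortableInt {

    /** Whole days since the epoch, rounded toward negative infinity (assumption). */
    static int daysSinceEpoch(Date date) {
        long days = Math.floorDiv(date.getTime(), TimeUnit.DAYS.toMillis(1));
        return Math.toIntExact(days); // throws ArithmeticException if out of int range
    }

    /**
     * Shifts the day count by Integer.MIN_VALUE, which flips the sign bit.
     * Read as an unsigned integer, the result sorts in the same order as the dates.
     */
    static int sortable(Date date) {
        return daysSinceEpoch(date) + Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        Date before = new Date(-TimeUnit.DAYS.toMillis(1)); // 1969-12-31
        Date epoch = new Date(0L);                          // 1970-01-01
        // A negative result means 'before' sorts first when compared as unsigned.
        System.out.println(Integer.compareUnsigned(sortable(before), sortable(epoch)));
    }
}
```

The point of the shift is that negative day counts (dates before 1970) would otherwise sort after positive ones when their bytes are compared as unsigned values.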
Long mapper¶
Single-column mapper. Maps a 64-bit integer number.
Parameters:
- validated (default = false): if mapping errors should make SQL writes fail, instead of just logging the error.
- column (default = mapper's name): the name of the column storing the long to be indexed.
- boost (default = 0.1f): Lucene's index-time boosting factor.
Supported Java types:
- java.lang.String (a base 10 integer string representation)
- java.lang.Byte or byte
- java.lang.Short or short
- java.lang.Integer or int
- java.lang.Long or long
- java.lang.Float or float
- java.lang.Double or double
- java.math.BigInteger
- java.math.BigDecimal
- java.util.Date
  - For the curious: parsed as the number of milliseconds since January 1, 1970, 00:00:00 GMT, as returned by Date.getTime().
Supported SQL types:
- bigint, decimal, double, float, int, real, smallint, tinyint, timestamp, date, char, varchar
Additional parameters for Java Annotation Syntax:
- name (default = name of the annotated QueryEntity's property): the mapper's name. It is used as the indexed field name in the Lucene document, and as the value of the field parameter in Lucene searches.
Example:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ), // multiple mappers are supported longMappers = { @LongMapper (name="myLongField", column="column_name", boost= 2.0, validated=true), @LongMapper (name="aStringLong", validated=false) } ) public class MyEntity { @QuerySqlField (name = "column_name") private long aLong; @QuerySqlField private String aStringLong; [...] }
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10 ) ) public class MyEntity { @QueryTextField( // multiple mappers are supported longMappers = { @LongMapper (name="myLongField", column="column_name", boost= 2.0, validated=true) } ) @QuerySqlField (name = "column_name") private long aLong; @QueryTextField( longMappers = @LongMapper ) @QuerySqlField private String aStringLong; [...] }
CREATE TABLE "PUBLIC".MYENTITY ( id int, column_name bigint, aStringLong varchar, PRIMARY KEY (id) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX MYENTITY_LUCENE_IDX ON "PUBLIC".MYENTITY(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''schema'':''{ "fields": { "myLongField": { type: "long", boost: 2.0, validated: true, column: "column_name" }, "aStringLong": { type: "long" } } }'' }';
String mapper¶
Single-column mapper. Maps a not-analyzed text value.
Parameters:
- validated (default = false): if mapping errors should make SQL writes fail, instead of just logging the error.
- column (default = mapper's name): the name of the column storing the text to be indexed.
- case_sensitive (default = true): if the text will be indexed preserving its casing.
Additional parameters for Java Annotation Syntax:
- name (default = name of the annotated QueryEntity's property): the mapper's name. It is used as the indexed field name in the Lucene document, and as the value of the field parameter in Lucene searches.
Supported Java types:
- java.lang.String
- java.lang.Byte or byte
- java.lang.Short or short
- java.lang.Integer or int
- java.lang.Long or long
- java.lang.Float or float
- java.lang.Double or double
- java.math.BigInteger
- java.math.BigDecimal
- java.util.Date
- java.util.UUID
- java.net.InetAddress
Supported SQL types:
- bigint, decimal, double, float, int, real, smallint, tinyint, timestamp, date, char, varchar, uuid
Example:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10, ramBufferMB = 10 ), stringMappers = { @StringMapper(name="myCaseSensitiveField", column="column_name", case_sensitive = true, validated = true), @StringMapper(name="myStringField", column="column_name", case_sensitive = false) } ) public class MyEntity { @QuerySqlField(name = "column_name") private String name; [...] }
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10, ramBufferMB = 10 ) ) public class MyEntity { @QueryTextField( stringMappers = { @StringMapper(name="myCaseSensitiveField", column="column_name", case_sensitive = true, validated = true), @StringMapper(name="myStringField", column="column_name", case_sensitive = false) } ) @QuerySqlField(name = "column_name") private String name; [...] }
CREATE TABLE "PUBLIC".MYENTITY ( id int, column_name varchar, PRIMARY KEY (id) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX MYENTITY_LUCENE_IDX ON "PUBLIC".MYENTITY(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''ram_buffer_mb'':''10'', ''schema'':''{ "fields": { "myCaseSensitiveField": { type: "string", validated: true, case_sensitive : true, column: "column_name" }, "myStringField": { type: "string", column: "column_name", case_sensitive : false } } }'' }';
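The example above maps the same column twice, once with case_sensitive = true and once with case_sensitive = false. The effect on the indexed term can be pictured as follows, assuming that case-insensitive indexing is implemented by lowercasing the value; that assumption and the class name are illustrative only.

```java
import java.util.Locale;

/** Pictures the effect of case_sensitive on a not-analyzed term. Illustrative only. */
final class StringTerm {

    /** Returns the single term that would be indexed for the given value. */
    static String indexedTerm(String value, boolean caseSensitive) {
        // Assumption: case-insensitive fields are normalized by lowercasing.
        return caseSensitive ? value : value.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String name = "Hawkore";
        System.out.println("myCaseSensitiveField -> " + indexedTerm(name, true));
        System.out.println("myStringField        -> " + indexedTerm(name, false));
    }
}
```

Under this reading, a search for "hawkore" on myStringField would match the row regardless of the casing stored in column_name, while myCaseSensitiveField would only match the exact casing.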
Text mapper¶
Single-column mapper. Maps a language-aware text value analyzed according to the specified analyzer.
Parameters:
- validated (default = false): if mapping errors should make SQL writes fail, instead of just logging the error.
- column (default = mapper's name): the name of the column storing the text to be indexed.
- analyzer (default = default analyzer defined on @IndexOptions or schema): the name of the text analyzer to be used. In addition to the custom analyzers defined in the classpathAnalyzers and snowballAnalyzers of @IndexOptions, there are prebuilt analyzers for Arabic, Bulgarian, Brazilian, Catalan, Sorani, Czech, Danish, German, Greek, English, Spanish, Basque, Persian, Finnish, French, Irish, Galician, Hindi, Hungarian, Armenian, Indonesian, Italian, Latvian, Dutch, Norwegian, Portuguese, Romanian, Russian, Swedish, Thai and Turkish.
Additional parameters for Java Annotation Syntax:
- name (default = name of the annotated QueryEntity's property): the mapper's name. It is used as the indexed field name in the Lucene document, and as the value of the field parameter in Lucene searches.
Supported Java types:
- java.lang.String
- java.lang.Byte or byte
- java.lang.Short or short
- java.lang.Integer or int
- java.lang.Long or long
- java.lang.Float or float
- java.lang.Double or double
- java.math.BigInteger
- java.math.BigDecimal
- java.util.Date
- java.util.UUID
- java.net.InetAddress
Supported SQL types:
- bigint, decimal, double, float, int, real, smallint, tinyint, timestamp, date, char, varchar, uuid
Example:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10, ramBufferMB = 10, snowballAnalyzers = { @SnowballAnalyzer(name = "my_custom_analyzer", language = "Spanish", stopwords = "el,la,lo,los,las,a,ante,bajo,cabe,con,contra") } ), textMappers = { @TextMapper(name="spanish_text", column="message_body", analyzer = "my_custom_analyzer", validated = true), @TextMapper(name="english_text", column="message_body", analyzer = "English") } ) public class MyEntity { @QuerySqlField(name = "message_body") private String body; [...] }
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10, ramBufferMB = 10, snowballAnalyzers = { @SnowballAnalyzer(name = "my_custom_analyzer", language = "Spanish", stopwords = "el,la,lo,los,las,a,ante,bajo,cabe,con,contra") } ) ) public class MyEntity { @QueryTextField( textMappers = { @TextMapper(name="spanish_text", column="message_body", analyzer = "my_custom_analyzer", validated = true), @TextMapper(name="english_text", column="message_body", analyzer = "English") } ) @QuerySqlField(name = "message_body") private String body; [...] }
CREATE TABLE "PUBLIC".MYENTITY ( id int, message_body varchar, PRIMARY KEY (id) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX MYENTITY_LUCENE_IDX ON "PUBLIC".MYENTITY(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''ram_buffer_mb'':''10'', ''schema'':''{ analyzers: { "my_custom_analyzer": { type: "snowball", language: "Spanish", stopwords: "el,la,lo,los,las,a,ante,bajo,cabe,con,contra" } }, "fields": { "spanish_text": { type: "text", validated: true, analyzer : "my_custom_analyzer", column: "message_body" }, "english_text": { type: "text", column: "message_body", analyzer : "English" } } }'' }';
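To show what the stopwords list of my_custom_analyzer does to a text value, here is a deliberately simplified sketch: it lowercases the text, splits it on non-alphanumeric characters and drops the stopwords. A real snowball analyzer also stems each remaining term, which this sketch does not attempt. The class name is hypothetical and no Lucene classes are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Simplified tokenizer with stopword removal. Illustrative only, no stemming. */
final class StopwordSketch {

    /** The stopword list used by my_custom_analyzer in the example above. */
    static final Set<String> SPANISH_STOPWORDS = new HashSet<>(Arrays.asList(
            "el,la,lo,los,las,a,ante,bajo,cabe,con,contra".split(",")));

    /** Returns the lowercased tokens of the text that are not stopwords. */
    static List<String> analyze(String text, Set<String> stopwords) {
        List<String> terms = new ArrayList<>();
        // Split on any run of characters that are neither letters nor digits.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !stopwords.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("El perro corre con la pelota", SPANISH_STOPWORDS));
    }
}
```

Because spanish_text and english_text map the same message_body column with different analyzers, the same text yields two different sets of indexed terms, one per field.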
UUID mapper¶
Single-column mapper. Maps a UUID value.
Parameters:
- validated (default = false): if mapping errors should make SQL writes fail, instead of just logging the error.
- column (default = mapper's name): the name of the column storing the UUID to be indexed.
Additional parameters for Java Annotation Syntax:
- name (default = name of the annotated QueryEntity's property): the mapper's name. It is used as the indexed field name in the Lucene document, and as the value of the field parameter in Lucene searches.
Supported Java types:
- java.lang.String (a UUID string representation)
- java.util.UUID
Supported SQL types:
- uuid, char, varchar
Example:
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10, ramBufferMB = 10 ), uuidMappers = { @UUIDMapper(name="uuid", column = "column_name", validated = true) } ) public class MyEntity { @QuerySqlField(name = "column_name") private UUID uuid; [...] }
@QueryTextField( // Index configuration indexOptions = @IndexOptions( refreshSeconds = 60, partitions = 10, ramBufferMB = 10 ) ) public class MyEntity { @QueryTextField( uuidMappers = { @UUIDMapper(name="uuid", column = "column_name", validated = true) } ) @QuerySqlField(name = "column_name") private UUID uuid; [...] }
CREATE TABLE "PUBLIC".MYENTITY ( id int, column_name UUID, PRIMARY KEY (id) ) WITH "TEMPLATE=PARTITIONED"; CREATE INDEX MYENTITY_LUCENE_IDX ON "PUBLIC".MYENTITY(LUCENE) FULLTEXT '{ ''refresh_seconds'':''60'', ''partitioner'':''{"type":"token","partitions":10}'', ''ram_buffer_mb'':''10'', ''schema'':''{ "fields": { "uuid": { type: "uuid", column:"column_name", validated: true } } }'' }';
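Since the UUID mapper accepts both java.util.UUID and String columns, a value of either type has to end up as the same UUID. The sketch below shows one way to normalize such values with the JDK alone. The class name is hypothetical, and the mapper's own parsing may be stricter: UUID.fromString accepts some abbreviated forms.

```java
import java.util.Optional;
import java.util.UUID;

/** Normalizes a UUID or String column value to a java.util.UUID. Illustrative only. */
final class UuidColumn {

    /** Returns the parsed UUID, or empty if the value is missing or not a UUID string. */
    static Optional<UUID> parse(Object value) {
        if (value instanceof UUID) {
            return Optional.of((UUID) value);
        }
        if (value == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(UUID.fromString(value.toString().trim()));
        } catch (IllegalArgumentException e) {
            // With validated = true a failure like this would make the SQL write fail.
            return Optional.empty();
        }
    }
}
```

UUID.toString() always produces the lowercase canonical form, so an uppercase string and its lowercase equivalent normalize to the same value.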
Example¶
You can find sample source code in Hawkore's Apache Ignite extensions sample project.