Solr Schema

Solr 7.6

Document

A document is like a row (an entity) in a database table, while a field is similar to a column of that table.
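
As a rough illustration, a document sent to Solr in the XML update format is just a list of named field values. The field names and values below are made up for the example:

```xml
<add>
  <doc>
    <!-- each field is a name/value pair, much like a column value in a row -->
    <field name="id">3</field>
    <field name="name">abc</field>
  </doc>
</add>
```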

Schema

Solr stores details about the field types and fields it is expected to understand in a schema file. The schema describes the documents you will ask Solr to index, and it defines a document as a collection of fields. The name and location of this file may vary depending on how you initially configured Solr or whether you modified it later.

  1. managed-schema is the name of the schema file Solr 7.6 uses by default to support making schema changes at runtime via the Schema API or the Schemaless Mode features.

  2. schema.xml is the traditional name for a schema file which can be edited manually by users who use the ClassicIndexSchemaFactory.

  3. If you are using SolrCloud you may not be able to find any file by these names on the local filesystem. You will only be able to see the schema through the Schema API (if enabled) or through the Solr Admin UI’s Cloud Screens.
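
Which of the two files is used is decided by the schemaFactory element in solrconfig.xml. A minimal sketch of both variants follows; only one of them would appear in a real solrconfig.xml, and the values shown are the documented defaults rather than anything specific to this post:

```xml
<!-- Variant 1: managed schema, editable at runtime through the Schema API -->
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>

<!-- Variant 2: classic schema.xml, edited by hand -->
<schemaFactory class="ClassicIndexSchemaFactory"/>
```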

class

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">

In a field type definition, the implementing class is responsible for making sure the field is handled correctly.

In the class names in managed-schema, the string solr is shorthand for org.apache.solr.schema or org.apache.solr.analysis.

Therefore, solr.TextField is really org.apache.solr.schema.TextField.
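
For context, a complete definition of this field type might look like the sketch below. The analyzer contents are an assumption modeled on the stock configsets, not taken from the line above; note that the tokenizer and filter class names use the same solr. shorthand.

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split text into tokens, then lowercase them -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```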

_root_

The _root_ field is needed for block-join support.
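
In the stock configsets shipped with Solr 7.x the field is declared roughly as follows. Treat the exact attributes as an assumption and check your own schema:

```xml
<!-- holds the uniqueKey value of the root document of each block -->
<field name="_root_" type="string" indexed="true" stored="false" docValues="false"/>
```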

_version_

Optimistic Concurrency is a feature of Solr that can be used by client applications which update/replace documents to ensure that the document they are replacing/updating has not been concurrently modified by another client application. This feature works by requiring a _version_ field on all documents in the index, and comparing that to a _version_ specified as part of the update command. By default, Solr’s schema includes a _version_ field, and this field is automatically added to each new document.
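
A sketch of how this looks in practice, assuming the field definition from the stock Solr 7.x configsets. The document values and the version number are invented for illustration. Sending an update whose _version_ is greater than 1 asks Solr to apply it only if the stored version matches exactly.

```xml
<!-- schema: the version field Solr maintains on every document -->
<field name="_version_" type="plong" indexed="false" stored="false"/>

<!-- update request: rejected with a version conflict if the stored
     _version_ of document 3 is not exactly this value -->
<add>
  <doc>
    <field name="id">3</field>
    <field name="name">kkk</field>
    <field name="_version_">1632740120218042368</field>
  </doc>
</add>
```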

uniqueKey

The uniqueKey element specifies which field is a unique identifier for documents. The default unique key is id, whose default value is generated by Solr.

You can define the unique key field by naming it: <uniqueKey>id</uniqueKey>

A uniqueKey should be defined if you will ever update a document in the index. For example, suppose Solr holds the record id=3,name=abc and that record changes to id=3,name=kkk in the database. If no uniqueKey is specified, then after a delta import Solr contains two records, id=3,name=abc and id=3,name=kkk, instead of one updated record.

Schema defaults and copyFields cannot be used to populate the uniqueKey field. The fieldType of uniqueKey must not be analyzed and must not be any of the *PointField types.
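
Putting these constraints together, a typical declaration pairs the uniqueKey element with a plain string field. This mirrors the stock configsets; adjust the field name to your own data:

```xml
<!-- string (StrField) is not analyzed and is not a *PointField type -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<uniqueKey>id</uniqueKey>
```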

copy field

As the name implies, a copy field copies one or more fields to another field. The name of the field you want to copy is the source, and the name of the copy is the destination.

Fields are copied before analysis is done, meaning you can have two fields with identical original content, but which use different analysis chains and are stored in the index differently.

If the destination field has data of its own in the input documents, the contents of the source field will be added as additional values, just as if all of the values had originally been specified by the client. Remember to configure your fields as multiValued="true" if they will ultimately get multiple values (either from a multivalued source or from multiple copyField directives).
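
A minimal sketch of both points, using hypothetical field names (title, title_exact, first_name, last_name, all_names) and the string and text_general types from the stock configsets:

```xml
<!-- same content, two analysis chains: tokenized for search, verbatim for sorting/faceting -->
<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="title_exact" type="string" indexed="true" stored="false"/>
<copyField source="title" dest="title_exact"/>

<!-- destination fed by two copyField directives, so it must be multiValued -->
<field name="first_name" type="string" indexed="true" stored="true"/>
<field name="last_name" type="string" indexed="true" stored="true"/>
<field name="all_names" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="first_name" dest="all_names"/>
<copyField source="last_name" dest="all_names"/>
```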

A related need is getting the value of your primary key into the id field so that Solr can update the index correctly according to the primary key value. Because a copyField cannot populate the uniqueKey field, do this in the import query instead, for example select customer_id as id from customer.

A common case for a copy field is to create a single "search" field that will serve as the default query field when users or clients do not specify a field to query.

For example, title, author, keywords, and body may all be fields that should be searched by default, with a copy field rule for each of them that copies to a catchall field (catchall is only an example name; it could be named anything). Later you can set a rule in solrconfig.xml to search the catchall field by default. One caveat is that your index will grow when using copy fields. Whether this becomes a problem, and how large the index ends up, depends on the number of fields being copied, the number of destination fields being copied to, the analysis in use, and the available disk space.
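
The catchall setup described above could be sketched as follows. The catchall field definition and its attributes are assumptions for illustration; df is the standard default-field request parameter:

```xml
<!-- schema: copy each searchable field into the catchall field -->
<field name="catchall" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="title" dest="catchall"/>
<copyField source="author" dest="catchall"/>
<copyField source="keywords" dest="catchall"/>
<copyField source="body" dest="catchall"/>

<!-- solrconfig.xml: make catchall the default field (df) for /select -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">catchall</str>
  </lst>
</requestHandler>
```

Leaving the destination as stored="false" keeps the extra index growth down, since only the indexed terms are duplicated.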

wildcard

A copy field can use a wildcard (*). For example, <copyField source="*_t" dest="text" /> will copy the contents of all incoming fields that match the wildcard pattern *_t to the text field.

chain

Copy fields cannot be chained, i.e., you cannot copy from here to there and then from there to elsewhere. However, the same source field can be copied to multiple destination fields.
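
To make the restriction concrete, here is a sketch with three hypothetical fields a, b, and c:

```xml
<!-- does NOT chain: c receives only the original values of b, never those of a -->
<copyField source="a" dest="b"/>
<copyField source="b" dest="c"/>

<!-- to get a into both, copy the same source to each destination -->
<copyField source="a" dest="b"/>
<copyField source="a" dest="c"/>
```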

dynamic field

Dynamic fields allow Solr to index fields that you did not explicitly define in your schema.

A dynamic field is just like a regular field except it has a name with a wildcard in it. When you are indexing documents, a field that does not match any explicitly defined fields can be matched with a dynamic field.
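
A short sketch of two dynamic field declarations, assuming the pint and text_general field types from the stock Solr 7.x configsets are present in your schema:

```xml
<!-- any incoming field whose name ends in _i (e.g. cost_i) is indexed as an integer -->
<dynamicField name="*_i" type="pint" indexed="true" stored="true"/>
<!-- any incoming field whose name ends in _t is indexed as tokenized text -->
<dynamicField name="*_t" type="text_general" indexed="true" stored="true"/>
```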

Written on January 18, 2019