Solr Schema
Solr 7.6
Document
A document is like a row (an entity) in a database table, and a field is like a column of that table.
Schema
Solr stores details about the field types and fields it is expected to understand in a schema file. The schema describes the documents you will ask Solr to index and defines a document as a collection of fields. The name and location of this file may vary depending on how you initially configured Solr or whether you modified it later.
- managed-schema is the name of the schema file Solr uses by default. It supports schema changes at runtime through the Schema API or the Schemaless Mode features.
- schema.xml is the traditional name for a schema file that users edit by hand when they use the ClassicIndexSchemaFactory.
- If you are using SolrCloud, you may not be able to find a file by either name on the local filesystem. You will only be able to see the schema through the Schema API (if enabled) or through the Solr Admin UI's Cloud screens.
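Which of the two files is used depends on the schemaFactory element in solrconfig.xml. The fragment below is a minimal sketch of the two options. The values shown are the commonly used ones and are assumptions about your setup, not settings taken from these notes.

```xml
<!-- solrconfig.xml: configure ONE of the two factories -->

<!-- Managed schema: can be changed at runtime through the Schema API -->
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>

<!-- Classic schema: schema.xml is edited by hand and reloaded.
<schemaFactory class="ClassicIndexSchemaFactory"/>
-->
```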
class
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
In a field type definition, the implementing class is responsible for making sure the field is handled correctly.
In the class names in the schema file, the string solr is shorthand for org.apache.solr.schema or org.apache.solr.analysis. Therefore, solr.TextField is really org.apache.solr.schema.TextField.
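As an illustration of the shorthand, the two declarations below should resolve to the same implementing class. The field type names text_short_form and text_long_form are hypothetical and exist only for this sketch.

```xml
<!-- Shorthand form, as normally written in the schema -->
<fieldType name="text_short_form" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Same implementing class, spelled out in full -->
<fieldType name="text_long_form" class="org.apache.solr.schema.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
```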
_root_
The _root_ field is needed for block-join support.
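Block join works on nested documents that are indexed together as one block. The update message below is a small sketch of such a block. The ids and the type_s field are made up for illustration, and type_s assumes a *_s dynamic field like the one in the stock schema. For every document in the block, Solr records the id of the top-level parent in _root_.

```xml
<add>
  <doc>
    <field name="id">order-1</field>
    <field name="type_s">order</field>
    <!-- Child document nested inside its parent -->
    <doc>
      <field name="id">order-1-item-1</field>
      <field name="type_s">item</field>
    </doc>
  </doc>
</add>
```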
_version_
Optimistic Concurrency is a feature of Solr that client applications which update or replace documents can use to ensure that the document they are changing has not been concurrently modified by another client application. This feature works by requiring a _version_ field on all documents in the index and comparing it to a version specified as part of the update command. By default, Solr's schema includes a _version_ field, and this field is automatically added to each new document.
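A client opts in to the check by sending the version it last read along with the update. The following XML update message is a sketch only: the version number is a hypothetical value, and the name field is assumed to exist in the schema.

```xml
<add>
  <doc>
    <field name="id">3</field>
    <field name="name">kkk</field>
    <!-- Hypothetical value: the _version_ this client last read for id=3.
         If the indexed document now carries a different _version_,
         Solr rejects the update with a version conflict (HTTP 409). -->
    <field name="_version_">1632740120218042368</field>
  </doc>
</add>
```

On a conflict the client would normally re-read the document, reapply its change, and retry with the new version.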
uniqueKey
The uniqueKey element specifies which field is the unique identifier for documents. In the default schema the unique key is id, and Solr can generate a value for it (for example a UUID, when the update chain is set up to do so) if a document does not supply one.
You can define the unique key field by naming it: <uniqueKey>id</uniqueKey>
A uniqueKey should be defined if you will ever update a document in the index.
For example, suppose Solr holds the record id=3,name=abc and the record is changed to id=3,name=kkk in the database. If no uniqueKey is specified, then after a delta import Solr contains two records, id=3,name=abc and id=3,name=kkk, instead of one updated record.
Schema defaults and copyFields cannot be used to populate the uniqueKey field. The fieldType of uniqueKey must not be analyzed and must not be any of the *PointField types.
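A schema fragment that satisfies these constraints might look like the sketch below. It assumes the string field type maps to solr.StrField, as it does in the stock schema.

```xml
<!-- The key field: a plain string, not analyzed, not a *PointField type -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>

<!-- Tell Solr which field identifies a document -->
<uniqueKey>id</uniqueKey>
```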
copy field
As the name implies, a copy field copies the contents of one or more fields into another field. The field you copy from is the source, and the field that receives the copy is the destination.
Fields are copied before analysis is done, meaning you can have two fields with identical original content, but which use different analysis chains and are stored in the index differently.
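For instance, the same title text can be kept once as an exact string and once as tokenized text. The field name title_search is a hypothetical name chosen for this sketch, and the string and text_general field types are assumed to be defined as in the stock schema.

```xml
<!-- Exact copy: kept verbatim, suitable for sorting and faceting -->
<field name="title" type="string" indexed="true" stored="true"/>

<!-- Tokenized copy for full-text search; not stored, because the
     original text is already stored in "title" -->
<field name="title_search" type="text_general" indexed="true" stored="false"/>

<copyField source="title" dest="title_search"/>
```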
If the destination field has data of its own in the input documents, the contents of the source field are added as additional values, just as if all of the values had originally been specified by the client. Remember to configure your fields as multiValued="true" if they will ultimately get multiple values (either from a multivalued source or from multiple copyField directives).
One use case for a copy field is to copy the value of your primary key to the id field, as in select customer_id as id from customer, so that Solr can update the index correctly according to the primary key value.
Another case is to create a single “search” field that will serve as the default query field when users or clients do not specify a field to query.
For example, title, author, keywords, and body may all be fields that should be searched by default, with a copy field rule for each one that copies it to a catch-all field (the name is arbitrary, for example text). Later you can set a rule in solrconfig.xml to search the catch-all field by default. One caveat is that your index will grow when you use copy fields. Whether this becomes a problem, and how large the index finally gets, depends on the number of fields being copied, the number of destination fields, the analysis in use, and the available disk space.
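The catch-all setup described here could be sketched as follows. The destination name text and the /select handler are assumptions for this example, and the four source fields are assumed to be defined elsewhere in the schema.

```xml
<!-- Schema file: catch-all destination; multiValued because it
     receives values from several source fields -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

<copyField source="title" dest="text"/>
<copyField source="author" dest="text"/>
<copyField source="keywords" dest="text"/>
<copyField source="body" dest="text"/>

<!-- solrconfig.xml: make "text" the default query field -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">text</str>
  </lst>
</requestHandler>
```

Leaving the catch-all field unstored keeps some of the index growth in check, since the original values are already stored in their source fields.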
wildcard
A copy field can use a wildcard (*). For example, <copyField source="*_t" dest="text" /> will copy the contents of all incoming fields that match the wildcard pattern *_t to the text field.
chain
Copy fields cannot be chained, i.e., you cannot copy from here to there and then from there to elsewhere. However, the same source field can be copied to multiple destination fields.
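The difference can be sketched with three rules. The field names title_sort and summary are hypothetical, and the comment on the last rule reflects the understanding that copy rules act on the values sent in the incoming document.

```xml
<!-- Allowed: one source copied to several destinations -->
<copyField source="title" dest="text"/>
<copyField source="title" dest="title_sort"/>

<!-- Not a chain: this rule only copies values that the input
     document itself sends in a "text" field. Values that reached
     "text" through the copyField rule above do not flow on to
     "summary". -->
<copyField source="text" dest="summary"/>
```

To get the title into summary as well, add a direct rule from title to summary.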
dynamic field
Dynamic fields allow Solr to index fields that you did not explicitly define in your schema.
A dynamic field is just like a regular field except it has a name with a wildcard in it. When you are indexing documents, a field that does not match any explicitly defined fields can be matched with a dynamic field.
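Two typical dynamic field rules are sketched below. They assume the pint and string field types from the stock schema.

```xml
<!-- Any field whose name ends in _i is indexed as an integer -->
<dynamicField name="*_i" type="pint" indexed="true" stored="true"/>

<!-- Any field whose name ends in _s is indexed as a string -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
```

With these rules in place, a document that carries fields such as price_i or color_s (hypothetical names) can be indexed even though neither field is declared explicitly.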