Apache Solr Tutorial

Solr Terminology

You will come across several terms when working with a Solr implementation. The list below explains each of them.

  • Core: A core is a running instance of a Lucene index along with all of its Solr configuration (solrconfig.xml, schema.xml, etc.). In simple terms, a core is an index of the text and fields found in documents. A single Solr application can contain zero or more cores, which run largely in isolation but can communicate with each other if necessary via the Core Container.
  • Collection: A collection is a complete logical index in a SolrCloud cluster. It is associated with a config set and is made up of one or more shards.
  • Shards: A shard is a logical division of a collection. It contains a subset of the collection's documents, such that every document in the collection is contained in exactly one shard. Each shard is made up of one or more replicas.
  • Nodes: A node is a Java Virtual Machine instance running Solr in cloud mode. It is also called a server. A single node can host one or more Solr cores.
  • Replicas: A replica is one copy of a shard. Each replica exists within Solr as a core.
  • Cluster: A set of Solr nodes is called a cluster. It is managed by ZooKeeper as a single unit. You can send requests to the cluster as a whole, and once a request is acknowledged you can be sure that it will be handled as a unit and be durable, i.e., you won't lose data.
  • Index: A Solr index is the object used for indexing and retrieving information. It can contain data from many different sources, including XML files, comma-separated value (CSV) files, data extracted from a database, and file formats such as Microsoft Word or PDF.
  • Field: Solr's basic unit of information is a document, and each document is composed of fields. When you add a document, Solr takes the information in the document's fields and adds that information to an index.
  • Schema: The schema tells Solr how it should build indexes from input documents.
  • Port: The default port for running Solr is 8983.
  • Client-Server: In network architecture, client/server is a program relationship in which one program, the client, requests a service from another program, the server.
  • ZooKeeper: ZooKeeper is a distributed coordination service for maintaining configuration information, providing distributed synchronization, and providing group services to a large number of hosts.
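
To make the relationship between collections, shards, replicas, and nodes more concrete, here is a minimal Python sketch of that hierarchy. It is only an illustration of the terminology, not Solr code: the class names, the build_collection helper, the host names, the core naming pattern, and the hash-and-modulo routing are all assumptions made for this example. Solr's real document router works differently, but the sketch preserves the property described above, namely that every document lands in exactly one shard.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List

DEFAULT_PORT = 8983  # Solr's default port


@dataclass
class Replica:
    """One copy of a shard; inside Solr it exists as a core on some node."""
    core_name: str  # illustrative naming pattern, not necessarily Solr's
    node: str       # hypothetical "host:port" label of the hosting node


@dataclass
class Shard:
    """A logical slice of a collection, backed by one or more replicas."""
    name: str
    replicas: List[Replica] = field(default_factory=list)


@dataclass
class Collection:
    """A complete logical index made up of one or more shards."""
    name: str
    shards: List[Shard] = field(default_factory=list)

    def shard_for(self, doc_id: str) -> Shard:
        # Simplified stand-in for Solr's router (an assumption):
        # hash the document id and pick a shard by modulo.
        digest = hashlib.md5(doc_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.shards)
        return self.shards[index]


def build_collection(name: str, num_shards: int,
                     replication_factor: int, nodes: List[str]) -> Collection:
    """Create shards and spread each shard's replicas across the nodes."""
    shards = []
    for s in range(1, num_shards + 1):
        replicas = []
        for r in range(1, replication_factor + 1):
            # Rotate through the nodes so copies of one shard land on
            # different nodes when there are enough nodes available.
            node = nodes[(s + r) % len(nodes)]
            replicas.append(Replica(f"{name}_shard{s}_replica{r}", node))
        shards.append(Shard(f"shard{s}", replicas))
    return Collection(name, shards)


# A two-node cluster (hypothetical hosts) holding one collection.
cluster_nodes = [f"host1:{DEFAULT_PORT}", f"host2:{DEFAULT_PORT}"]
books = build_collection("books", num_shards=2,
                         replication_factor=2, nodes=cluster_nodes)

for shard in books.shards:
    for replica in shard.replicas:
        print(shard.name, replica.core_name, replica.node)

print("doc-42 is stored in", books.shard_for("doc-42").name)
```

Running the sketch prints one line per replica, showing which core lives on which node, followed by the single shard chosen for the sample document id. Routing the same id again always returns the same shard, which is what lets a cluster find a document later.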