Elasticsearch is a real-time, distributed, open source full-text search and analytics engine. It is used in Single Page Application (SPA) projects. Elasticsearch is developed in Java and is used by many large organizations around the world. It is licensed under the Apache License, version 2.0.
In this tutorial, you will learn in detail the basics of Elasticsearch and its important features.
This tutorial is designed for software professionals who want to learn the basics of Elasticsearch and its programming concepts in simple and easy steps. It describes the components of Elasticsearch with suitable examples.
Prerequisites
Before you begin this tutorial, you should have a basic understanding of Java, JSON, search engines, and web technologies. Interaction with Elasticsearch happens through a RESTful API, so familiarity with RESTful APIs is recommended.
If you are new to any of these concepts, we suggest working through tutorials on these topics before you start with Elasticsearch.
Elasticsearch is an Apache Lucene-based search server. It was developed by Shay Banon and published in 2010. It is now maintained by Elasticsearch BV. Its latest version is 7.0.0.
Elasticsearch is a real-time, distributed, open source full-text search and analytics engine. It is accessible through a RESTful web service interface and uses schema-less JSON (JavaScript Object Notation) documents to store data. It is built on the Java programming language, so it can run on different platforms. It enables users to explore very large amounts of data at very high speed.
General Features
The general features of Elasticsearch are as follows −
Elasticsearch is scalable up to petabytes of structured and unstructured data.
Elasticsearch can be used as a replacement for document stores like MongoDB and RavenDB.
Elasticsearch uses denormalization to improve the search performance.
Elasticsearch is one of the popular enterprise search engines, and is currently being used by many big organizations like Wikipedia, The Guardian, StackOverflow, GitHub etc.
Elasticsearch is open source and available under the Apache License, version 2.0.
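The denormalization mentioned in the list above can be illustrated with a small sketch. It assumes two hypothetical relational tables (songs and artists) and copies the artist details into each song, so that one document can be searched without a join. The table contents and the denormalize helper are made up for illustration and are not part of Elasticsearch.

```python
import json

# Hypothetical normalized data, as it might sit in two relational tables.
artists = {1: {"name": "Crush", "genre": "kpop"}}
songs = [
    {"id": 10, "title": "Beautiful Life", "artist_id": 1, "year": 2017},
]


def denormalize(song, artists_by_id):
    """Return one self-contained document with the artist copied inline."""
    # Keep every song field except the foreign key.
    doc = {key: value for key, value in song.items() if key != "artist_id"}
    # Embed a copy of the referenced artist row instead of pointing at it.
    doc["artist"] = dict(artists_by_id[song["artist_id"]])
    return doc


documents = [denormalize(song, artists) for song in songs]
print(json.dumps(documents[0], indent=2))
```

The trade-off is the usual one: searches avoid joins, while a change to an artist has to be written to every document that embeds it.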
Key Concepts
The key concepts of Elasticsearch are as follows −
Node
It refers to a single running instance of Elasticsearch. A single physical or virtual server can accommodate multiple nodes, depending on the capabilities of its physical resources such as RAM, storage, and processing power.
Cluster
It is a collection of one or more nodes. A cluster provides collective indexing and search capabilities across all of its nodes for the entire data set.
Index
It is a collection of different types of documents and their properties. An index also uses the concept of shards to improve performance. For example, a set of documents may contain the data of a social networking application.
Document
It is a collection of fields defined in a specific manner in JSON format. Every document belongs to a type and resides inside an index. Every document is associated with a unique identifier called the UID.
Shard
Indexes are horizontally subdivided into shards. This means each shard contains all the properties of a document but holds fewer JSON objects than the index as a whole. The horizontal separation makes each shard an independent unit that can be stored on any node. A primary shard is an original horizontal part of an index, and these primary shards are then replicated into replica shards.
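The horizontal split can be pictured as hashing a routing value (by default the document id) and taking it modulo the number of primary shards. The sketch below is a simplified illustration of that idea only. The pick_shard helper is hypothetical, and zlib.crc32 is a stand-in hash chosen because it is stable across runs; it is not the hash Elasticsearch uses internally.

```python
import zlib


def pick_shard(doc_id, num_primary_shards):
    """Map a document id to a primary shard number.

    zlib.crc32 is an assumption of this sketch, used only as a stable
    stand-in for the real routing hash.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


num_shards = 3
shards = {number: [] for number in range(num_shards)}

# Spread six document ids over the shards.
for doc_id in ["1", "2", "3", "4", "5", "6"]:
    shards[pick_shard(doc_id, num_shards)].append(doc_id)

for number, ids in shards.items():
    print(f"shard {number}: {ids}")
```

Because the shard number depends on the shard count, the same id would land elsewhere if the count changed, which is one way to see why the number of primary shards is normally fixed when an index is created.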
Replicas
Elasticsearch allows a user to create replicas of their indexes and shards. Replication not only helps in increasing the availability of data in case of failure, but also improves the performance of searching by carrying out a parallel search operation in these replicas.
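As a rough sketch of why replicas improve availability, the code below places each primary shard and one replica on two different nodes, then checks which copies survive when one node fails. The round-robin placement, the node names, and both helper functions are assumptions made for illustration; the real allocation logic in Elasticsearch is considerably more involved.

```python
def allocate(num_shards, nodes):
    """Place each primary and one replica on two different nodes (round-robin)."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to hold a replica")
    layout = {}
    for shard in range(num_shards):
        primary = nodes[shard % len(nodes)]
        # The next node in the ring is always different from the primary's node.
        replica = nodes[(shard + 1) % len(nodes)]
        layout[shard] = {"primary": primary, "replica": replica}
    return layout


def surviving_copies(layout, failed_node):
    """List the nodes that still hold each shard after one node fails."""
    return {
        shard: [node for node in copies.values() if node != failed_node]
        for shard, copies in layout.items()
    }


layout = allocate(num_shards=3, nodes=["node-a", "node-b", "node-c"])
after_failure = surviving_copies(layout, failed_node="node-b")
print(after_failure)
```

Every shard keeps at least one copy after the failure, and while all nodes are healthy a search can be served by either copy of a shard.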
Advantages
Elasticsearch is developed in Java, which makes it compatible with almost every platform.
Elasticsearch is near real time; in other words, by default an added document becomes searchable within about one second.
Elasticsearch is distributed, which makes it easy to scale and integrate in any big organization.
Creating full backups is easy by using the concept of gateway, which is present in Elasticsearch.
Handling multi-tenancy is very easy in Elasticsearch when compared to Apache Solr.
Elasticsearch uses JSON objects as responses, which makes it possible to invoke the Elasticsearch server with a large number of different programming languages.
Elasticsearch supports almost every document type except those that do not support text rendering.
Disadvantages
Elasticsearch does not have multi-language support in terms of handling request and response data (only possible in JSON) unlike in Apache Solr, where it is possible in CSV, XML and JSON formats.
Occasionally, Elasticsearch runs into split-brain situations.
Comparison between Elasticsearch and RDBMS
In Elasticsearch, an index is similar to a table in an RDBMS (Relational Database Management System). Every table is a collection of rows, just as every index is a collection of documents in Elasticsearch.
The following table gives a direct comparison between these terms−
Elasticsearch RDBMS
Cluster Database
Shard Shard
Index Table
Field Column
Document Row
In my previous blog, What is Elasticsearch, I introduced Elasticsearch, talked about its advantages, and walked through the installation on Windows. I also discussed the basic concepts and the different API conventions present in Elasticsearch. But what I covered there is just the tip of the iceberg. In this Elasticsearch tutorial blog, I will introduce the features that make Elasticsearch fast and popular among its competitors. I will also introduce the different APIs present in Elasticsearch and show how you can perform different searches using them.
Below are the topics that I will be discussing in this Elasticsearch tutorial blog:
Elasticsearch APIs
Query DSL
Mapping
Analysis
Modules
So, let’s get started with the very first topic of this Elasticsearch tutorial blog.
Elasticsearch APIs – Elasticsearch Tutorial
This section of the Elasticsearch tutorial blog talks about the various kinds of APIs supported by Elasticsearch. Let’s understand each of them in detail.
Document API
Elasticsearch provides both single document APIs and multi-document APIs.
SINGLE DOCUMENT API
Index API
Get API
Update API
Delete API
MULTI-DOCUMENT API
Multi Get API
Bulk API
Delete By Query API
Update By Query API
Reindex API
Now that you know about the different types of Document APIs, let’s try to implement CRUD operations with them.
Index API
The index API is responsible for adding and updating a typed JSON document in a specific index and then making it searchable. The following example inserts the JSON document into the “playlist” index, under a type called “kpop” with an id of 1:
PUT /playlist/kpop/1
{
  "title" : "Beautiful Life",
  "artist" : "Crush",
  "album" : "Goblin",
  "year" : 2017
}
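The same index call can be prepared from a program. The sketch below uses only the Python standard library and builds the request without sending it, so it runs even when no node is available. The base URL (a local node on the default port) and the build_index_request helper are assumptions made for illustration.

```python
import json
import urllib.request

# Assumption: a local node listening on the default HTTP port.
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) the PUT request for the index API."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/{doc_type}/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


song = {"title": "Beautiful Life", "artist": "Crush", "album": "Goblin", "year": 2017}
index_request = build_index_request("playlist", "kpop", 1, song)
print(index_request.get_method(), index_request.full_url)

# To actually send it against a running node:
#     with urllib.request.urlopen(index_request) as response:
#         print(json.load(response))
```

Keeping the request construction separate from the network call makes it easy to inspect the method, URL, and body before anything is written to the index.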
Get API
The get API is responsible for fetching a typed JSON document from the index based on its unique id. The following example gets a JSON document from the “playlist” index, under a type called “kpop”, with an id of 2:
GET /playlist/kpop/2
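A get call returns the stored document wrapped in metadata, with the original JSON under the _source key and a found flag that tells you whether the id exists. The sketch below parses such a response. The sample body and its field values are hypothetical and only meant to show the general shape; extract_source is an illustrative helper, not a library function.

```python
import json

# Hypothetical response body for a document that exists; the values
# inside "_source" are made up for illustration.
sample_response = """
{
  "_index": "playlist",
  "_type": "kpop",
  "_id": "2",
  "_version": 1,
  "found": true,
  "_source": {"title": "Example Song", "artist": "Example Artist"}
}
"""


def extract_source(response_text):
    """Return the stored document, or None when the id was not found."""
    body = json.loads(response_text)
    if not body.get("found", False):
        return None
    return body["_source"]


print(extract_source(sample_response))
print(extract_source('{"_index": "playlist", "_id": "99", "found": false}'))
```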
Update API
The update API is responsible for updating a document based on a script provided. The operation fetches the document from the index, runs the script, and then indexes the result back. To make sure no other updates happen between the “get” and the “reindex”, it uses versioning. A typical use is adding a new field, such as “time”, to a JSON document in the “playlist” index under the “kpop” type.
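A scripted update that adds a “time” field could be prepared as sketched below. The document id (1), the value (5), and the build_update_body helper are assumptions made for illustration, since the text names only the field. The script is written in Painless, the default scripting language in recent Elasticsearch versions, and the value is passed as a parameter rather than pasted into the script text.

```python
import json


def build_update_body(field, value):
    """Build a scripted-update body that sets one field on the stored document."""
    return {
        "script": {
            "lang": "painless",
            # ctx._source is the stored document the script is allowed to modify.
            "source": f"ctx._source.{field} = params.value",
            "params": {"value": value},
        }
    }


# Hypothetical target and value: document 1 of playlist/kpop, "time" set to 5.
path = "/playlist/kpop/1/_update"
body = build_update_body("time", 5)

print("POST", path)
print(json.dumps(body, indent=2))
```

Sending this body with a POST to the printed path on a running node would apply the script to the stored document and index the result back.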
In this chapter, we will understand the installation procedure of Elasticsearch in detail.
To install Elasticsearch on your local computer, you will have to follow the steps given below −
Step 1 − Check the version of Java installed on your computer. It should be Java 7 or higher. You can check by doing the following −
In Windows Operating System (OS) (using command prompt) −
> java -version
In UNIX OS (using terminal; the second command shows where Java is installed) −
$ java -version
$ echo $JAVA_HOME
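If you want to automate this check, the version can be read from the text that java -version prints. The sketch below is one possible parser, not an official tool. It assumes the two common output styles, the legacy "1.8.0_31" scheme and the newer "11.0.2" scheme; parse_java_major is a hypothetical helper name.

```python
import re


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output.

    Handles the legacy "1.8.0_31" scheme (major is 8) and the newer
    "11.0.2" scheme (major is 11). Returns None if no version is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)  # legacy scheme: 1.<major>.x
    return int(first)


print(parse_java_major('java version "1.8.0_31"'))               # 8
print(parse_java_major('openjdk version "11.0.2" 2019-01-15'))   # 11
```

Note that java -version writes to standard error rather than standard output, so a script that runs the command has to capture that stream, for example with subprocess.run(["java", "-version"], capture_output=True, text=True).stderr.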
Step 2 − Depending on your operating system, download Elasticsearch from www.elastic.co as mentioned below −
For Windows OS, download the ZIP file.
For UNIX OS, download the TAR file.
For Debian OS, download the DEB file.
For Red Hat and other Linux distributions, download the RPM file.
APT and Yum utilities can also be used to install Elasticsearch in many Linux distributions.
Step 3 − Installation process for Elasticsearch is simple and is described below for different OS −
Windows OS − Unzip the zip package and Elasticsearch is installed.
UNIX OS − Extract the tar file in any location and Elasticsearch is installed.
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.0.0-linux-x86_64.tar.gz
$ tar -xzf elasticsearch-7.0.0-linux-x86_64.tar.gz
Using APT utility for Linux OS− Download and install the Public Signing Key
$ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
Save the repository definition as shown below −
$ echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
Run update using the following command −
$ sudo apt-get update
Now you can install by using the following command −
$ sudo apt-get install elasticsearch
Download and install the Debian package manually using the command given here −
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.0.0-amd64.deb
$ sudo dpkg -i elasticsearch-7.0.0-amd64.deb
Using YUM utility for Red Hat-based Linux OS
Download and install the Public Signing Key −
$ rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
Add the following text to a file with the .repo suffix in your “/etc/yum.repos.d/” directory, for example elasticsearch.repo −
[elasticsearch-7.x]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
You can now install Elasticsearch by using the following command
sudo yum install elasticsearch
Step 4 − Go to the Elasticsearch home directory and then into the bin folder. On Windows, run the elasticsearch.bat file, either directly or from the command prompt. On UNIX, run the elasticsearch file from the terminal.
In Windows
> cd elasticsearch-7.0.0/bin
> elasticsearch
In Linux
$ cd elasticsearch-7.0.0/bin
$ ./elasticsearch
Note − On Windows, you might get an error stating that JAVA_HOME is not set. In that case, set it in the environment variables to “C:\Program Files\Java\jre1.8.0_31” or to the location where you installed Java.
Step 5 − The default port for the Elasticsearch web interface is 9200. You can change it by setting http.port in the elasticsearch.yml file, which is located in the config directory. You can check whether the server is up and running by browsing http://localhost:9200. It returns a JSON object containing information about the installed Elasticsearch, similar to the following (this sample is from a 2.1.0 installation) −
{
  "name" : "Brain-Child",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.1.0",
    "build_hash" : "72cd1f1a3eee09505e036106146dc1949dc5dc87",
    "build_timestamp" : "2015-11-18T22:40:03Z",
    "build_snapshot" : false,
    "lucene_version" : "5.3.1"
  },
  "tagline" : "You Know, for Search"
}
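A script can run the same check by reading this JSON. The sketch below parses the sample response shown above, trimmed to the fields it uses, and pulls out the node name, cluster name, and version number. The summarize helper is hypothetical, and the commented-out lines show how the text could be fetched from a running node at the default address.

```python
import json

# The sample response shown above, trimmed to the fields used here.
sample = """
{
  "name" : "Brain-Child",
  "cluster_name" : "elasticsearch",
  "version" : { "number" : "2.1.0", "lucene_version" : "5.3.1" },
  "tagline" : "You Know, for Search"
}
"""


def summarize(response_text):
    """Pick the node name, cluster name, and version out of the root response."""
    body = json.loads(response_text)
    return {
        "node": body["name"],
        "cluster": body["cluster_name"],
        "version": body["version"]["number"],
    }


info = summarize(sample)
print(info)

# Against a running node (assumes the default address):
#     import urllib.request
#     with urllib.request.urlopen("http://localhost:9200") as response:
#         print(summarize(response.read().decode("utf-8")))
```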
Step 6 − In this step, let us install Kibana. Follow the respective code given below for installing on Linux and Windows −
For Installation on Linux −
wget https://artifacts.elastic.co/downloads/kibana/kibana-7.0.0-linux-x86_64.tar.gz
tar -xzf kibana-7.0.0-linux-x86_64.tar.gz
cd kibana-7.0.0-linux-x86_64/
./bin/kibana
For Installation on Windows −
Download Kibana for Windows from https://www.elastic.co/products/kibana. Clicking the link takes you to the Kibana home page.