Elasticsearch field mapping customisation

Changing the field mapping Elasticsearch uses

With Elasticsearch, you don't need to explicitly define everything (field names, field types, indices, ...). It will try to do it automatically.
When you upload data through the REST API to an index that does not exist yet, a new index with the provided name is created, and a default mapping (the types to use for each field) and default settings are applied.
Most of the time, Elasticsearch is able to determine the type of a given field by itself and to create the corresponding field mapping. But sometimes it cannot, or you need to force another type.
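For example, indexing a single document into a not-yet-existing index is enough to see this behaviour; Elasticsearch creates the index and infers the types. The index, type and field names below are made up for illustration:

```shell
# Index a document into an index that does not exist yet:
curl -XPUT http://localhost:9200/demo-2017.01.01/logline/1 -d '{
  "message" : "disk almost full",
  "severity" : 3
}'
# Ask Elasticsearch which field types it inferred automatically:
curl -XGET http://localhost:9200/demo-2017.01.01/_mapping?pretty
```

Here "severity" will typically be detected as a numeric type and "message" as a string type.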

The easiest way is to create a default mapping template. Default templates are automatically applied by Elasticsearch to any newly created index whose name matches the "template" field, which contains a name or wildcard pattern matching the index name.
For example:

Template for Syslog entries:
es1(admin)~$ curl -XDELETE http://localhost:9200/_template/syslog
es1(admin)~$ curl -XPUT http://localhost:9200/_template/syslog -d '{
  "template" : "syslog-*",
  "settings" : {
      "number_of_shards" : 1,
            "number_of_replicas" : 1
  },
  "mappings" : {
    "_default_" : {
      "properties" : {
        "timestamp" : {
          "type" : "date",
          "format" : "yyyy-MM-dd HH:mm:ss Z"
        },
        "severity" : {
          "type" : "integer"
        },
        "facility" : {
          "type" : "integer"
        }
      }
    }
  }
}'


The first command deletes any existing template named "syslog".
The second command uses a PUT request to upload the new template definition. This template does the following:
  • it will be applied to any index named syslog-<anything>
  • indices created with this template will have 1 shard and 1 replica (so there will be 2 copies of the data)
  • a field called "timestamp" will be of type "date", and its values must match the given format. Date formats follow the Joda-Time (Java) pattern syntax.
  • fields called "severity" or "facility" will be of type "integer".
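To verify that the template was registered, and that a newly created matching index actually picks it up, you can use calls like these (the index name is an example):

```shell
# Show the stored template definition:
curl -XGET http://localhost:9200/_template/syslog?pretty
# Create an index whose name matches "syslog-*" ...
curl -XPUT http://localhost:9200/syslog-2017.01.01
# ... and confirm the mapping from the template was applied:
curl -XGET http://localhost:9200/syslog-2017.01.01/_mapping?pretty
```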

Template for Tweets entries:
es1(admin)~$ curl -XDELETE http://localhost:9200/_template/tweets
es1(admin)~$ curl -XPUT http://localhost:9200/_template/tweets -d '{
  "template" : "tweets-*",
  "settings" : {
      "number_of_shards" : 2
  },
  "mappings" : {
    "_default_" : {
      "properties" : {
        "created_at" : {
          "type" : "date",
          "format" : "yyyy-MM-dd HH:mm:ss.SSS Z"
        }
      }
    }
  }
}'


The first command deletes any existing template named "tweets".
The second command uses a PUT request to upload the new template definition. This template does the following:
  • it will be applied to any index named tweets-<anything>
  • indices created with this template will have 2 shards and 1 replica (because 1 replica is the default)
  • a field called "created_at" will be of type "date", and its values must match the given format. Date formats follow the Joda-Time (Java) pattern syntax.
Remark: you don't need this mapping if you are ingesting tweets via Logstash, because it will automatically create a valid date timestamp from the tweet data it receives.
   
Mappings must be defined before the index is created. Once an index exists and fields have been added to it, you cannot change their mapping anymore.
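If you do need to change the type of an existing field, the usual workaround is to create a new index with the desired mapping and copy the documents over with the _reindex API, then point your clients (or aliases) to the new index. The index names below are examples:

```shell
# Copy all documents from the old index into a new, correctly mapped one.
# Create "syslog-new" (or let a fixed template create it) beforehand.
curl -XPOST http://localhost:9200/_reindex -d '{
  "source" : { "index" : "syslog-old" },
  "dest"   : { "index" : "syslog-new" }
}'
```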

A default mapping template, created once before you start your ingestion process, is useful when the application that will feed in the data doesn't give you the ability to associate a mapping template.
Software like Logstash lets you define a mapping template file to be used each time a new index is created.
These mapping files are defined in the same way as default templates: a JSON structure containing the settings and mappings you want to differ from the defaults.
Here is a default mapping for an index created by Logstash from Apache2 Access logs:
{
  "template" : "apache-*",
  "version" : 50001,
  "settings" : {
    "index.refresh_interval" : "5s",
    "index.number_of_shards" : 1,
    "index.number_of_replicas" : 1
  },
  "mappings" : {
    "_default_" : {
      "_all" : {"enabled" : true, "norms" : false},
      "dynamic_templates" : [ {
        "message_field" : {
          "path_match" : "message",
          "match_mapping_type" : "string",
          "mapping" : {
            "type" : "text",
            "norms" : false
          }
        }
      }, {
        "string_fields" : {
          "match" : "*",
          "match_mapping_type" : "string",
          "mapping" : {
            "type" : "text", "norms" : false,
            "fields" : {
              "keyword" : { "type": "keyword" }
            }
          }
        }
      } ],
      "properties" : {
        "@timestamp": { "type": "date", "include_in_all": false },
        "@version": { "type": "keyword", "include_in_all": false },
        "geoip"  : {
          "dynamic": true,
          "properties" : {
            "ip": { "type": "ip" },
            "location" : { "type" : "geo_point" },
            "latitude" : { "type" : "half_float" },
            "longitude" : { "type" : "half_float" }
          }
        },
        "bytes" : { "type" : "long" }
      }
    }
  }
}


In this mapping, by default, every string field will be mapped as "text" (with a "keyword" sub-field for exact matching), except for the ones defined under "properties":
    @timestamp
    @version
    geoip
    bytes

These are field names as created by Logstash when parsing Apache2 access logs.
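Because each string field also gets a "keyword" sub-field, you can still run exact-match aggregations and sorting on it. As a sketch, counting requests per HTTP method could look like this ("verb" is a field Logstash's Apache grok pattern typically produces):

```shell
# Aggregate on the non-analysed "keyword" sub-field of "verb":
curl -XPOST "http://localhost:9200/apache-*/_search?pretty" -d '{
  "size" : 0,
  "aggs" : {
    "methods" : { "terms" : { "field" : "verb.keyword" } }
  }
}'
```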

Just save the above text in a file that your Logstash configuration will point to; more details on the next page, which covers Logstash configuration and usage.
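As a sketch, the elasticsearch output section of a Logstash pipeline referencing such a template file could look like this (the file path and index pattern are examples):

```
output {
  elasticsearch {
    hosts         => ["localhost:9200"]
    index         => "apache-%{+YYYY.MM.dd}"
    # Upload this template and apply it to every new matching index:
    template      => "/etc/logstash/templates/apache.json"
    template_name => "apache"
  }
}
```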