Backing Up Elasticsearch Indices with elasticsearch-curator

Preface

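In day-to-day operations, keeping an Elasticsearch cluster running stably takes a number of planned, recurring tasks: periodically purging old data, merging segments, taking backups and restoring them, and so on. If you can program, most of this work can be done by calling the Elasticsearch API from whichever language fits your needs.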
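
As a rough illustration of the "call the API yourself" route, here is a minimal Python sketch that only builds the HTTP requests for three such chores (deleting an index, force-merging segments, starting a snapshot) without sending them. The helper names and the sample index names are made up for this example; the node address is the one used later in this post.

```python
import json
from urllib.request import Request

ES_URL = "http://10.9.3.16:9200"  # node address reused from the examples in this post


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(f"{ES_URL}/{path.lstrip('/')}", data=data, headers=headers, method=method)


def delete_index(index):
    # DELETE /<index> removes the index and its data
    return build_request("DELETE", index)


def force_merge(index, max_num_segments=1):
    # POST /<index>/_forcemerge merges segments down to the requested count
    return build_request("POST", f"{index}/_forcemerge?max_num_segments={max_num_segments}")


def create_snapshot(repository, snapshot, indices):
    # PUT /_snapshot/<repository>/<snapshot> starts a snapshot of the listed indices
    body = {"indices": ",".join(indices)}
    return build_request("PUT", f"_snapshot/{repository}/{snapshot}", body)
```

To actually run one of these, pass the returned object to `urllib.request.urlopen`. Scheduling, error handling, and index selection would still have to be written by hand.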

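The Elasticsearch ecosystem is already mature. Curator, a tool from elastic.co written in Python, provides ready-made solutions for most of these operational scenarios, and in most cases Curator alone is enough for routine needs.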

The rest of this post shows how to back up Elasticsearch indices with Curator.

Preparation

This walkthrough uses NFS as the example; the process is similar if you use a cloud provider's storage (for example, AWS S3).

  1. Create the Elasticsearch backup directory

    Create the directory on every machine in the cluster:

     $ mkdir -p /data/backup/elasticsearch_backup
    
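  2. Mount the shared storage

    Mount the share on every machine in the cluster. Because the share is mounted over /data/backup, the elasticsearch_backup directory must also exist on the share itself; create it after mounting if it is missing.

    NFSv4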

     $ mount -t nfs 10.9.0.6:/share-7f4ef504-3ddb-40e9-853b-15d495cc9fb1 /data/backup
    

    NFSv3

     $ mount -t nfs -o vers=3,nolock,proto=tcp,noresvport 10.9.0.6:/share-7f4ef504-3ddb-40e9-853b-15d495cc9fb1 /data/backup
    
  3. Update the Elasticsearch cluster configuration

    Add the path.repo setting to elasticsearch.yml on every machine in the Elasticsearch cluster:

     path.repo: ["/data/backup/elasticsearch_backup"]
    

    After changing the configuration, restart the Elasticsearch cluster (one node at a time).

  4. Install elasticsearch-curator

    There is no need to install it on every machine. pip is not the only option; yum or apt packages work as well.

     $ pip install elasticsearch-curator
    
  5. Register the backup repository

    From a web console (for example, elasticsearch-head or cerebro):

     PUT _snapshot/elasticsearch_backup
     {
         "type": "fs", 
         "settings": {
             "location": "/data/backup/elasticsearch_backup",
             "compress": true
         }
     }
    

    From a shell:

     $ curl -X PUT "10.9.3.16:9200/_snapshot/elasticsearch_backup" -H 'Content-Type: application/json' -d'
     {
         "type": "fs",
         "settings": {
             "location": "/data/backup/elasticsearch_backup",
             "compress": true
         }
     }'
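
Steps 3 and 5 are linked: a shared-filesystem ("fs") repository can only be registered if its location falls under one of the path.repo entries. The sketch below builds the same request body as above in Python and checks that constraint first. The function names are our own for illustration, and the check is a plain path comparison that does not resolve symlinks or "..".

```python
from pathlib import PurePosixPath


def is_under_path_repo(location, path_repo):
    """Return True if `location` equals or is nested under one of the path.repo entries."""
    loc = PurePosixPath(location)
    roots = [PurePosixPath(root) for root in path_repo]
    return any(loc == root or root in loc.parents for root in roots)


def fs_repository_body(location, path_repo, compress=True):
    """Build the JSON body for PUT _snapshot/<name> with a shared-filesystem repository."""
    if not is_under_path_repo(location, path_repo):
        raise ValueError(f"{location} is not inside any path.repo entry: {path_repo}")
    return {"type": "fs", "settings": {"location": location, "compress": compress}}
```

Calling `fs_repository_body("/data/backup/elasticsearch_backup", ["/data/backup/elasticsearch_backup"])` yields the body used in the PUT request above, while a location outside path.repo fails early instead of being rejected by the cluster.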
    

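Taking a snapshot

  • Edit curator.yml

    Pay attention to the master_only: False parameter. If Curator is installed on every node of the Elasticsearch cluster, change it to master_only: True so that only the copy running on the elected master does the work. Note that master_only: True only works when hosts lists a single host, so in that setup each node's config should point at the node itself.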

      $ vim ./curator/curator.yml
    
      ---
      # Remember, leave a key empty if there is no value.  None will be a string,
      # not a Python "NoneType"
      client:
        hosts:
          - 10.9.3.16
          - 10.9.3.19
        port: 9200
        url_prefix:
        use_ssl: False
        certificate:
        client_cert:
        client_key:
        aws_key:
        aws_secret_key:
        aws_region:
        ssl_no_validate: False
        http_auth:
        timeout: 60
        master_only: False

      logging:
        loglevel: DEBUG
        logfile: /data/elasticsearch/elasticsearch_plugins/curator/curator.log
        logformat: default
        blacklist: ['elasticsearch', 'urllib3']
    
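  • Edit action.yml

    This file is the key piece.

    The example below snapshots indices whose names start with sdk_ or game_ and that are more than 31 days old, naming each snapshot with the pattern
    'es-%Y%m%d%H%M%S'. It waits for the snapshot to complete and skips the repository filesystem access check.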

      ---
      # Remember, leave a key empty if there is no value.  None will be a string,
      # not a Python "NoneType"
      #
      # Also remember that all examples have 'disable_action' set to True.  If you
      # want to use this action as a template, be sure to set this to False after
      # copying it.
      actions:
        1:
          action: snapshot
          description: >-
            Snapshot sdk_|game_ prefixed indices older than 31 days (based on index
            creation_date) with the snapshot name pattern of
            'es-%Y%m%d%H%M%S'.  Wait for the snapshot to complete.  Skip
            the repository filesystem access check.  Use the other options to create
            the snapshot.
          options:
            repository: elasticsearch_backup
            # Leaving name blank will result in the default 'curator-%Y%m%d%H%M%S'
            name: es-%Y%m%d%H%M%S
            ignore_unavailable: False
            include_global_state: True
            partial: True
            wait_for_completion: True
            skip_repo_fs_check: True
            ignore_empty_list: True
            continue_if_exception: False
            disable_action: False
          filters:
          - filtertype: pattern
            kind: regex
            value: '^(sdk_|game_).*$'
          - filtertype: age
            source: creation_date
            direction: older
            unit: days
            unit_count: 31
    
  • Run the backup

    Dry run

      $ curator --dry-run --config ./curator/curator.yml ./curator/action.yml
    

    Real run

      $ curator --config ./curator/curator.yml ./curator/action.yml
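
To make it easier to predict what a run will pick up, here is a small Python approximation of the two filters in action.yml (prefix regex, then age by creation date) and of the snapshot name pattern. It is a sketch of the selection logic as configured above, not Curator's own code; the function names and the sample index names in the usage note are invented for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

PATTERN = re.compile(r"^(sdk_|game_).*$")  # same regex as the pattern filter
MAX_AGE = timedelta(days=31)               # unit: days, unit_count: 31


def select_indices(creation_dates, now):
    """Return names that match the prefix regex and were created more than 31 days before `now`."""
    cutoff = now - MAX_AGE
    return sorted(
        name
        for name, created in creation_dates.items()
        if PATTERN.match(name) and created < cutoff  # both filters must pass
    )


def snapshot_name(now, pattern="es-%Y%m%d%H%M%S"):
    """Render the snapshot name from the strftime-style pattern in the `name` option."""
    return now.strftime(pattern)
```

For example, with `now = datetime(2020, 2, 24, 10, 9, 0, tzinfo=timezone.utc)`, an index `sdk_login` created on 2020-01-01 is selected, `game_pay` created on 2020-02-20 is too recent, and `nginx_access` is skipped regardless of age because it does not match the prefix. The --dry-run output remains the authoritative way to see what Curator itself would select.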
    

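Follow-up

Once the backup finishes, you can inspect it through the API, substituting the name of your own snapshot; "state": "SUCCESS" in the response means it succeeded.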

$ curl -XGET 'http://10.9.3.16:9200/_snapshot/elasticsearch_backup/squirrel-es-202002241009'
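
If you want to script this check instead of reading the JSON by eye, a minimal sketch could look like the following. It assumes the response body has the usual shape of a top-level "snapshots" list whose entries carry "snapshot" and "state" fields; the function names are our own.

```python
import json


def snapshot_states(response_text):
    """Map snapshot name -> state from a GET _snapshot/<repo>/<snapshot> response body."""
    payload = json.loads(response_text)
    return {s["snapshot"]: s["state"] for s in payload.get("snapshots", [])}


def all_succeeded(response_text):
    """True only if at least one snapshot is listed and every one reports SUCCESS."""
    states = snapshot_states(response_text)
    return bool(states) and all(state == "SUCCESS" for state in states.values())
```

Feed it the output of the curl command above. Since the action file sets partial: True, a snapshot can also end in the PARTIAL state, which this check treats as a failure.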

References

https://elasticsearch.cn/article/560

How to use Elasticsearch snapshot backups

https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html

https://www.elastic.co/guide/en/elasticsearch/client/curator/current/examples.html

https://github.com/elastic/curator

https://www.elastic.co/guide/cn/elasticsearch/guide/current/backing-up-your-cluster.html