The Ceph MDS daemon damaged problem

An effective way to handle a Ceph MDS daemon reported as damaged

System information

Ceph version: Luminous 12.2.7

OS version: Ubuntu 16.04.2 LTS


The problem

Our CephFS runs multiple active MDS daemons. While the underlying OSDs were being expanded, one of the active MDS daemons suddenly reported damaged.

ceph status reports the following:

            1 filesystem is degraded
            1 mds daemon damaged
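
Besides the two-line summary, ceph health detail expands each warning and names the affected filesystem and rank (only the command is shown here; the exact wording of the health message varies between clusters):

# ceph health detail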

ceph fs status reports the following:

mycephfs - 100 clients
=======
+------+--------+----------------+---------------+-------+-------+
| Rank | State  |      MDS       |    Activity   |  dns  |  inos |
+------+--------+----------------+---------------+-------+-------+
|  0   | active |    mds-node6   | Reqs:   18 /s |  579k |  578k |
|  1   | active |    mds-node7   | Reqs:    0 /s | 31.8k | 31.8k |
|  2   | failed |                |               |       |       |
+------+--------+----------------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 2798M | 7762G |
|   cephfs_data   |   data   | 7987G | 7762G |
+-----------------+----------+-------+-------+

+----------------+
|  Standby MDS   |
+----------------+
|    mds-node4   |
|    mds-node3   |
|    mds-node5   |
+----------------+
MDS version: ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)

Check the damage information for the CephFS rank:

# ceph tell mds.2 damage
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
Aborted (core dumped)
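
The abort looks like a client-side problem rather than additional cluster damage: rank 2 is failed, so there is no daemon behind mds.2 for the tell to reach. Against a daemon that is still running, the damage table can be listed with the damage ls admin command (a sketch only; each daemon reports damage entries for its own rank, so this will not show rank 2's entries while that rank has no daemon):

# ceph tell mds.mds-node6 damage ls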


The effective fix

Repair the MDS rank:

# ceph mds repaired 2
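
ceph mds repaired clears the damaged flag on the rank in the MDS map so that a standby daemon is allowed to take the rank over. Whether the flag is really gone can be checked in the filesystem's MDS map dump (a minimal check, assuming the Luminous dump format, which lists a damaged field):

# ceph fs get mycephfs | grep damaged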

Then check the CephFS status:

+------+--------+----------------+---------------+-------+-------+
| Rank | State  |      MDS       |    Activity   |  dns  |  inos |
+------+--------+----------------+---------------+-------+-------+
|  0   | active |    mds-node6   | Reqs:   15 /s |  523k |  522k |
|  1   | active |    mds-node7   | Reqs:    0 /s | 31.8k | 31.8k |
|  2   | replay |    mds-node4   |               |       |       |
+------+--------+----------------+---------------+-------+-------+

...



+------+-----------+----------------+---------------+-------+-------+
| Rank |   State   |      MDS       |    Activity   |  dns  |  inos |
+------+-----------+----------------+---------------+-------+-------+
|  0   |   active  |    mds-node6   | Reqs:   15 /s |  523k |  522k |
|  1   |   active  |    mds-node7   | Reqs:    0 /s | 31.8k | 31.8k |
|  2   | reconnect |    mds-node4   |               |  164k |  164k |
+------+-----------+----------------+---------------+-------+-------+
...
+------+--------+----------------+---------------+-------+-------+
| Rank | State  |      MDS       |    Activity   |  dns  |  inos |
+------+--------+----------------+---------------+-------+-------+
|  0   | active |    mds-node6   | Reqs:    0 /s |  242k |  242k |
|  1   | active |    mds-node7   | Reqs:    0 /s | 31.9k | 31.8k |
|  2   | active |    mds-node4   | Reqs:    0 /s |  320k |  319k |
+------+--------+----------------+---------------+-------+-------+
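
While rank 2 works its way from replay through reconnect to active, the status can be polled instead of re-running the command by hand (a minimal sketch; the 5-second interval is arbitrary):

# watch -n 5 ceph fs status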



An unexpected side effect

After repairing the damaged MDS, we found that all previously connected clients had been disconnected. The cause is unknown; this does not appear to be the intended behaviour.

As a result, CephFS becomes unusable on nodes that already had it mounted; those mounts have to be unmounted and then mounted again before the filesystem can be used.
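
A minimal remount sketch for an affected client; the mount point, monitor address, and authentication options below are placeholders and should match whatever was used for the original mount (or the entry in /etc/fstab):

# umount -l /mnt/cephfs
# mount -t ceph 192.168.1.10:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret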

