Rebooting Pacemaker and Corosync

Wambui · Post by **Wambui** » 07.02.2016 14:57:57

Hello,
I'm currently working with Pacemaker and Corosync. How can I reliably reboot such a two-node cluster? If I simply run shutdown -h now or reboot, I have problems starting the two services afterwards. For example, after running

Code:

service corosync start
service pacemaker start

crm status no longer works: I get "Connection to cluster failed: connection failed", or both nodes are shown as OFFLINE. So before the reboot/shutdown I ran

Code:

service pacemaker stop

first. But that just churns away without finishing:

Code:

service pacemaker stop
Signaling Pacemaker Cluster Manager to terminate: [  OK  ]
Waiting for cluster services to unload:......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

So how do I reliably stop an HA cluster like Pacemaker, for example to swap hardware?

Regards
Wambui
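
One commonly suggested order for taking a single node down for maintenance is to put it into standby first, then stop Pacemaker, and only then stop Corosync. The sketch below assumes the crm shell and the init scripts used in this thread; the `run` helper and the `DRY_RUN` variable are made up for illustration, and nothing in the thread confirms that this order avoids the hang shown above.

```shell
#!/bin/sh
# Sketch only: take one node out of the cluster in a fixed order.
# DRY_RUN=1 (hypothetical switch) prints the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

stop_cluster_node() {
    node="$1"
    # 1. Move resources away while the cluster can still coordinate.
    run crm node standby "$node" || return 1
    # 2. Stop Pacemaker before the messaging layer it depends on.
    run service pacemaker stop || return 1
    # 3. Stop Corosync last.
    run service corosync stop
}
```

With `DRY_RUN=1` set, `stop_cluster_node jake` only prints the three commands, which is a cheap way to review the order before touching a live node.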

Colttt · Post by **Colttt** » 07.02.2016 15:14:19

Show us the config and the IP settings. You have surely misconfigured something there.

Wambui · Post by **Wambui** » 07.02.2016 16:57:44

corosync.conf for both nodes

Code:

# Please read the corosync.conf.5 manual page
#compatibility: whitetank

#aisexec {
#       user: root
#       group: root
#}

service {
        name: pacemaker
        ver: 1
        use_mgmtd: yes
}

totem {
        ttl: 255
        clear_node_high_bit: yes
        rrp_mode: passive
        version: 2
        secauth: on
        threads: 2
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.0.0
                mcastaddr: 226.94.1.1
                mcastport: 5404
        }
        interface {
                ringnumber: 1
                bindnetaddr: 10.0.00.0
                mcastaddr: 226.94.12.1
                mcastport: 5406
        }
}

logging {
        fileline: off
        to_stderr: yes
        to_logfile: yes
        to_syslog: yes
        logfile: /var/log/corosync.log
        debug: off
        timestamp: on
}

based on "Linux Hochverfügbarkeit" by Oliver Siebel, 2013

elwood is the DC:

Code:

auto lo
iface lo inet loopback
allow-hotplug eth0
iface eth0 inet static
        address 192.168.0.101
        netmask 255.255.255.0
        network 192.168.0.0
        broadcast 192.168.0.255
        gateway 192.168.0.1
        # dns-* options are implemented by the resolvconf package, if installed
        dns-nameservers 192.168.0.1
        dns-search local.site
auto eth1
iface eth1 inet static
        address 10.0.0.101
        netmask 255.0.0.0
        network 10.0.0.0

and jake:

Code:

auto lo
iface lo inet loopback
allow-hotplug eth0
iface eth0 inet static
        address 192.168.0.102
        netmask 255.255.255.0
        network 192.168.0.0
        broadcast 192.168.0.255
        gateway 192.168.0.1
        # dns-* options are implemented by the resolvconf package, if installed
        dns-nameservers 192.168.0.1
        dns-search local.site
auto eth1
iface eth1 inet static
        address 10.0.0.102
        netmask 255.0.0.0
        network 10.0.0.0
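
The corosync.conf(5) man page describes bindnetaddr as usually being the network address of the interface to bind to, so each ring's value should match what the node's address and netmask yield. A minimal sketch for computing that in plain shell follows; `network_addr` is a hypothetical helper name, and the inputs are the addresses posted above.

```shell
#!/bin/sh
# Sketch only: derive the IPv4 network address from an address and a netmask.

network_addr() {
    # Split both dotted quads on "." so that $1..$4 hold the address
    # octets and $5..$8 hold the netmask octets.
    old_ifs="$IFS"
    IFS=.
    set -- $1 $2
    IFS="$old_ifs"
    # A bitwise AND of each octet pair gives the network address.
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}
```

For the second ring, `network_addr 10.0.0.101 255.0.0.0` and `network_addr 10.0.0.102 255.0.0.0` both print 10.0.0.0, which can then be compared with the bindnetaddr lines in corosync.conf.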

cibadmin -Q

Code:

<cib epoch="33" num_updates="28" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Sun Feb  7 16:49:17 2016" crm_feature_set="3.0.6" update-origin="elwood" update-client="crm_attribute" have-quorum="1" dc-uuid="jake">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
        <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
        <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
        <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="poweroff"/>
        <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="100"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="elwood" type="normal" uname="elwood">
        <instance_attributes id="nodes-elwood">
          <nvpair id="nodes-elwood-standby" name="standby" value="off"/>
        </instance_attributes>
      </node>
      <node id="jake" type="normal" uname="jake">
        <instance_attributes id="nodes-jake">
          <nvpair id="nodes-jake-standby" name="standby" value="false"/>
        </instance_attributes>
      </node>
    </nodes>
    <resources>
      <primitive class="ocf" id="Service_IP" provider="heartbeat" type="IPaddr2">
        <instance_attributes id="Service_IP-instance_attributes">
          <nvpair id="Service_IP-instance_attributes-ip" name="ip" value="192.168.0.150"/>
          <nvpair id="Service_IP-instance_attributes-cidr_netmask" name="cidr_netmask" value="24"/>
        </instance_attributes>
        <operations>
          <op id="Service_IP-monitor-10" interval="10" name="monitor" timeout="20"/>
        </operations>
        <meta_attributes id="Service_IP-meta_attributes">
          <nvpair id="Service_IP-meta_attributes-target-role" name="target-role" value="started"/>
        </meta_attributes>
      </primitive>
      <primitive class="ocf" id="Service_IP2" provider="heartbeat" type="IPaddr2">
        <instance_attributes id="Service_IP2-instance_attributes">
          <nvpair id="Service_IP2-instance_attributes-ip" name="ip" value="192.168.0.151"/>
          <nvpair id="Service_IP2-instance_attributes-cidr_netmask" name="cidr_netmask" value="24"/>
        </instance_attributes>
        <operations>
          <op id="Service_IP2-monitor-10" interval="10" name="monitor" timeout="20"/>
        </operations>
      </primitive>
    </resources>
    <constraints/>
  </configuration>
  <status>
    <node_state id="jake" uname="jake" ha="active" in_ccm="true" crmd="online" join="member" expected="member" crm-debug-origin="do_update_resource" shutdown="0">
      <lrm id="jake">
        <lrm_resources>
          <lrm_resource id="Service_IP2" type="IPaddr2" class="ocf" provider="heartbeat">
            <lrm_rsc_op id="Service_IP2_last_0" operation_key="Service_IP2_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.6" transition-key="11:0:0:77c1a508-bdbb-4fbf-b2d6-727e76d74206" transition-magic="0:0;11:0:0:77c1a508-bdbb-4fbf-b2d6-727e76d74206" call-id="4" rc-code="0" op-status="0" interval="0" last-run="1454860185" last-rc-change="1454860185" exec-time="100" queue-time="0" op-digest="f59bf798eed8f6364cb446d88b1166b0"/>
            <lrm_rsc_op id="Service_IP2_monitor_10000" operation_key="Service_IP2_monitor_10000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.6" transition-key="12:0:0:77c1a508-bdbb-4fbf-b2d6-727e76d74206" transition-magic="0:0;12:0:0:77c1a508-bdbb-4fbf-b2d6-727e76d74206" call-id="5" rc-code="0" op-status="0" interval="10000" last-rc-change="1454860185" exec-time="30" queue-time="0" op-digest="26e47fc921703013010c1b38bab54f9b"/>
          </lrm_resource>
          <lrm_resource id="Service_IP" type="IPaddr2" class="ocf" provider="heartbeat">
            <lrm_rsc_op id="Service_IP_last_0" operation_key="Service_IP_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.6" transition-key="7:0:7:77c1a508-bdbb-4fbf-b2d6-727e76d74206" transition-magic="0:7;7:0:7:77c1a508-bdbb-4fbf-b2d6-727e76d74206" call-id="2" rc-code="7" op-status="0" interval="0" last-run="1454860185" last-rc-change="1454860185" exec-time="150" queue-time="0" op-digest="d75c4388bc0d9b932e63bbe807f9d105"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
      <transient_attributes id="jake">
        <instance_attributes id="status-jake">
          <nvpair id="status-jake-probe_complete" name="probe_complete" value="true"/>
        </instance_attributes>
      </transient_attributes>
    </node_state>
    <node_state id="elwood" uname="elwood" ha="active" in_ccm="true" crmd="online" join="member" expected="member" crm-debug-origin="do_update_resource" shutdown="0">
      <lrm id="elwood">
        <lrm_resources>
          <lrm_resource id="Service_IP" type="IPaddr2" class="ocf" provider="heartbeat">
            <lrm_rsc_op id="Service_IP_last_0" operation_key="Service_IP_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.6" transition-key="9:0:0:77c1a508-bdbb-4fbf-b2d6-727e76d74206" transition-magic="0:0;9:0:0:77c1a508-bdbb-4fbf-b2d6-727e76d74206" call-id="4" rc-code="0" op-status="0" interval="0" last-run="1454860185" last-rc-change="1454860185" exec-time="80" queue-time="0" op-digest="d75c4388bc0d9b932e63bbe807f9d105"/>
            <lrm_rsc_op id="Service_IP_monitor_10000" operation_key="Service_IP_monitor_10000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.6" transition-key="10:0:0:77c1a508-bdbb-4fbf-b2d6-727e76d74206" transition-magic="0:0;10:0:0:77c1a508-bdbb-4fbf-b2d6-727e76d74206" call-id="5" rc-code="0" op-status="0" interval="10000" last-rc-change="1454860185" exec-time="30" queue-time="0" op-digest="ab9d849a0cc63056be0d083c011bbdfc"/>
          </lrm_resource>
          <lrm_resource id="Service_IP2" type="IPaddr2" class="ocf" provider="heartbeat">
            <lrm_rsc_op id="Service_IP2_last_0" operation_key="Service_IP2_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.6" transition-key="5:0:7:77c1a508-bdbb-4fbf-b2d6-727e76d74206" transition-magic="0:7;5:0:7:77c1a508-bdbb-4fbf-b2d6-727e76d74206" call-id="3" rc-code="7" op-status="0" interval="0" last-run="1454860185" last-rc-change="1454860185" exec-time="160" queue-time="0" op-digest="f59bf798eed8f6364cb446d88b1166b0"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
      <transient_attributes id="elwood">
        <instance_attributes id="status-elwood">
          <nvpair id="status-elwood-probe_complete" name="probe_complete" value="true"/>
        </instance_attributes>
      </transient_attributes>
    </node_state>
  </status>
</cib>

This time both nodes could be started manually. On elwood, however, I had to start pacemaker twice before it stopped showing FAILED.
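
To see which node the CIB records as DC without reading the whole XML dump, the dc-uuid attribute of the top-level cib element can be pulled out with sed. This is only a sketch: `current_dc` is a hypothetical helper, and it assumes the single-line opening cib tag seen in the output above.

```shell
#!/bin/sh
# Sketch only: print the dc-uuid recorded in a `cibadmin -Q` dump.

current_dc() {
    # Reads the XML on stdin; prints the dc-uuid attribute of the
    # opening <cib ...> tag, or nothing if the attribute is absent.
    sed -n 's/^<cib [^>]*dc-uuid="\([^"]*\)".*/\1/p'
}
```

Fed the dump above (`cibadmin -Q | current_dc`), it prints jake.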

heisenberg · Post by **heisenberg** » 07.02.2016 17:05:58

Hi,

could you also post a crm configure show and a crm status from both nodes as a pastebin?

I'm not yet enough of a Pacemaker ace to read the Matrix, er, the cluster configuration, in its XML source.

Wambui · Post by **Wambui** » 07.02.2016 17:20:07

No problem.
crm configure show

Code:

node elwood \
        attributes standby="off"
node jake \
        attributes standby="false"
primitive Service_IP ocf:heartbeat:IPaddr2 \
        params ip="192.168.0.150" cidr_netmask="24" \
        op monitor interval="10" timeout="20" \
        meta target-role="started"
primitive Service_IP2 ocf:heartbeat:IPaddr2 \
        params ip="192.168.0.151" cidr_netmask="24" \
        op monitor interval="10" timeout="20"
property $id="cib-bootstrap-options" \
        dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        stonith-action="poweroff" \
        default-resource-stickiness="100"
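
The two nodes carry differently spelled standby values ("off" and "false"). Assuming Pacemaker treats both as boolean false, a small filter can list only the nodes that are actually in standby. The helper names `is_true` and `standby_nodes` are invented for this sketch, the list of accepted spellings is an assumption, and the parsing only targets the crm configure show layout shown above.

```shell
#!/bin/sh
# Sketch only: list nodes whose standby attribute evaluates to true.

is_true() {
    # Lower-case the value, then compare it against the spellings that
    # are assumed here to mean "true".
    case "$(printf '%s' "$1" | tr 'A-Z' 'a-z')" in
        true|on|yes|y|1) return 0 ;;
        *) return 1 ;;
    esac
}

standby_nodes() {
    # Reads `crm configure show` output on stdin. awk remembers the most
    # recent node name and emits "<node> <value>" for each standby attribute.
    awk '
        $1 == "node" { node = $2 }
        match($0, /standby="[^"]*"/) {
            print node, substr($0, RSTART + 9, RLENGTH - 10)
        }' |
    while read -r node val; do
        if is_true "$val"; then
            echo "$node"
        fi
    done
}
```

Run as `crm configure show | standby_nodes`, it prints nothing for the configuration above, since neither "off" nor "false" counts as true.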

crm status

Code:

============
Last updated: Sun Feb  7 17:18:23 2016
Last change: Sun Feb  7 16:49:17 2016 via crm_attribute on elwood
Stack: openais
Current DC: jake - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ elwood jake ]

 Service_IP     (ocf::heartbeat:IPaddr2):       Started elwood
 Service_IP2    (ocf::heartbeat:IPaddr2):       Started jake

As additional info: I'm running all of this on two VirtualBox instances.

Wambui · Post by **Wambui** » 07.02.2016 17:32:27

Sorry,
here is the whole thing from jake's perspective:

crm configure show

Code:

node elwood \
        attributes standby="off"
node jake \
        attributes standby="false"
primitive Service_IP ocf:heartbeat:IPaddr2 \
        params ip="192.168.0.150" cidr_netmask="24" \
        op monitor interval="10" timeout="20" \
        meta target-role="started"
primitive Service_IP2 ocf:heartbeat:IPaddr2 \
        params ip="192.168.0.151" cidr_netmask="24" \
        op monitor interval="10" timeout="20"
property $id="cib-bootstrap-options" \
        dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        stonith-action="poweroff" \
        default-resource-stickiness="100"

crm status

Code:

============
Last updated: Sun Feb  7 17:32:05 2016
Last change: Sun Feb  7 16:49:17 2016 via crm_attribute on elwood
Stack: openais
Current DC: jake - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ elwood jake ]

 Service_IP     (ocf::heartbeat:IPaddr2):       Started elwood
 Service_IP2    (ocf::heartbeat:IPaddr2):       Started jake
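
After a restart it can help to check mechanically that both nodes are back instead of reading the status by eye. The following sketch scans crm status output for the Online line; `nodes_online` is a hypothetical helper and assumes the output format of the Pacemaker version shown in this thread.

```shell
#!/bin/sh
# Sketch only: succeed if every named node is listed as online.

nodes_online() {
    # Reads `crm status` output on stdin and extracts the node names
    # between the brackets of the "Online: [ ... ]" line.
    online="$(sed -n 's/^Online: \[ \(.*\) ].*$/\1/p')"
    for n in "$@"; do
        # Pad with spaces so that only whole node names match.
        case " $online " in
            *" $n "*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}
```

A call such as `crm status | nodes_online elwood jake` returns success for the status shown above and failure as soon as one of the two names is missing from the Online line.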

debianforum.de
