-
Wambui
- Posts: 120
- Registered: 03.08.2014 10:06:10
- License for own posts: GNU Free Documentation License
Post
by Wambui » 07.02.2016 14:57:57
Hello,
I am currently getting to grips with Pacemaker and Corosync. How can I reliably reboot such a two-node cluster? If I simply run shutdown -h now or reboot, I have problems when the two services start up again afterwards: crm status then either fails with "Connection to cluster failed: connection failed" or both nodes show as OFFLINE. So before the reboot/shutdown I ran the following instead, but it just grinds away:
Code: Select all
service pacemaker stop
Signaling Pacemaker Cluster Manager to terminate: [ OK ]
Waiting for cluster services to unload:.................................................. [long run of dots trimmed]
So how do I reliably stop an HA cluster like Pacemaker, for example in order to swap hardware?
Regards,
Wambui
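A sequence that is often suggested for taking a single node out of a two-node cluster cleanly (a sketch only, using the node name elwood from this thread; the thread itself does not prescribe this) is to move the resources away first, then stop the cluster stack, and only then reboot:
Code: Select all
# run on the node that is about to go down (here: elwood)
crm node standby elwood      # move resources off this node
service pacemaker stop       # stop Pacemaker first (it runs as its own service with ver: 1)
service corosync stop        # then stop the Corosync membership layer
reboot
# once the node is back up and has rejoined:
crm node online elwood       # allow resources to move back
crm status                   # both nodes should show as Online again
The order matters: with ver: 1 in corosync.conf Pacemaker runs as a separate service, so it should be stopped before Corosync is taken away underneath it.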
-
Colttt
- Posts: 3012
- Registered: 16.10.2008 23:25:34
- Location: Brandenburg
-
Contact:
Post
by Colttt » 07.02.2016 15:14:19
Show us your config and the IP settings... you've probably got something wrong there.
Debian user
ZABBIX Certified Specialist
-
Wambui
- Posts: 120
- Registered: 03.08.2014 10:06:10
- License for own posts: GNU Free Documentation License
Post
by Wambui » 07.02.2016 16:57:44
corosync.conf for both nodes:
Code: Select all
# Please read the corosync.conf.5 manual page
#compatibility: whitetank
#aisexec {
#    user: root
#    group: root
#}
service {
    name: pacemaker
    ver: 1
    use_mgmtd: yes
}
totem {
    ttl: 255
    clear_node_high_bit: yes
    rrp_mode: passive
    version: 2
    secauth: on
    threads: 2
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.0.0
        mcastaddr: 226.94.1.1
        mcastport: 5404
    }
    interface {
        ringnumber: 1
        bindnetaddr: 10.0.0.0
        mcastaddr: 226.94.12.1
        mcastport: 5406
    }
}
logging {
    fileline: off
    to_stderr: yes
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/corosync.log
    debug: off
    timestamp: on
}
gemäß "Linux Hochverfügbarkeit" von Oliver Siebel, 2013
elwood is the DC; here is its /etc/network/interfaces:
Code: Select all
auto lo
iface lo inet loopback

allow-hotplug eth0
iface eth0 inet static
    address 192.168.0.101
    netmask 255.255.255.0
    network 192.168.0.0
    broadcast 192.168.0.255
    gateway 192.168.0.1
    # dns-* options are implemented by the resolvconf package, if installed
    dns-nameservers 192.168.0.1
    dns-search local.site

auto eth1
iface eth1 inet static
    address 10.0.0.101
    netmask 255.0.0.0
    network 10.0.0.0
and jake:
Code: Select all
auto lo
iface lo inet loopback

allow-hotplug eth0
iface eth0 inet static
    address 192.168.0.102
    netmask 255.255.255.0
    network 192.168.0.0
    broadcast 192.168.0.255
    gateway 192.168.0.1
    # dns-* options are implemented by the resolvconf package, if installed
    dns-nameservers 192.168.0.1
    dns-search local.site

auto eth1
iface eth1 inet static
    address 10.0.0.102
    netmask 255.0.0.0
    network 10.0.0.0
cibadmin -Q
Code: Select all
<cib epoch="33" num_updates="28" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Sun Feb 7 16:49:17 2016" crm_feature_set="3.0.6" update-origin="elwood" update-client="crm_attribute" have-quorum="1" dc-uuid="jake">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
<nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
<nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
<nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="poweroff"/>
<nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="100"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="elwood" type="normal" uname="elwood">
<instance_attributes id="nodes-elwood">
<nvpair id="nodes-elwood-standby" name="standby" value="off"/>
</instance_attributes>
</node>
<node id="jake" type="normal" uname="jake">
<instance_attributes id="nodes-jake">
<nvpair id="nodes-jake-standby" name="standby" value="false"/>
</instance_attributes>
</node>
</nodes>
<resources>
<primitive class="ocf" id="Service_IP" provider="heartbeat" type="IPaddr2">
<instance_attributes id="Service_IP-instance_attributes">
<nvpair id="Service_IP-instance_attributes-ip" name="ip" value="192.168.0.150"/>
<nvpair id="Service_IP-instance_attributes-cidr_netmask" name="cidr_netmask" value="24"/>
</instance_attributes>
<operations>
<op id="Service_IP-monitor-10" interval="10" name="monitor" timeout="20"/>
</operations>
<meta_attributes id="Service_IP-meta_attributes">
<nvpair id="Service_IP-meta_attributes-target-role" name="target-role" value="started"/>
</meta_attributes>
</primitive>
<primitive class="ocf" id="Service_IP2" provider="heartbeat" type="IPaddr2">
<instance_attributes id="Service_IP2-instance_attributes">
<nvpair id="Service_IP2-instance_attributes-ip" name="ip" value="192.168.0.151"/>
<nvpair id="Service_IP2-instance_attributes-cidr_netmask" name="cidr_netmask" value="24"/>
</instance_attributes>
<operations>
<op id="Service_IP2-monitor-10" interval="10" name="monitor" timeout="20"/>
</operations>
</primitive>
</resources>
<constraints/>
</configuration>
<status>
<node_state id="jake" uname="jake" ha="active" in_ccm="true" crmd="online" join="member" expected="member" crm-debug-origin="do_update_resource" shutdown="0">
<lrm id="jake">
<lrm_resources>
<lrm_resource id="Service_IP2" type="IPaddr2" class="ocf" provider="heartbeat">
<lrm_rsc_op id="Service_IP2_last_0" operation_key="Service_IP2_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.6" transition-key="11:0:0:77c1a508-bdbb-4fbf-b2d6-727e76d74206" transition-magic="0:0;11:0:0:77c1a508-bdbb-4fbf-b2d6-727e76d74206" call-id="4" rc-code="0" op-status="0" interval="0" last-run="1454860185" last-rc-change="1454860185" exec-time="100" queue-time="0" op-digest="f59bf798eed8f6364cb446d88b1166b0"/>
<lrm_rsc_op id="Service_IP2_monitor_10000" operation_key="Service_IP2_monitor_10000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.6" transition-key="12:0:0:77c1a508-bdbb-4fbf-b2d6-727e76d74206" transition-magic="0:0;12:0:0:77c1a508-bdbb-4fbf-b2d6-727e76d74206" call-id="5" rc-code="0" op-status="0" interval="10000" last-rc-change="1454860185" exec-time="30" queue-time="0" op-digest="26e47fc921703013010c1b38bab54f9b"/>
</lrm_resource>
<lrm_resource id="Service_IP" type="IPaddr2" class="ocf" provider="heartbeat">
<lrm_rsc_op id="Service_IP_last_0" operation_key="Service_IP_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.6" transition-key="7:0:7:77c1a508-bdbb-4fbf-b2d6-727e76d74206" transition-magic="0:7;7:0:7:77c1a508-bdbb-4fbf-b2d6-727e76d74206" call-id="2" rc-code="7" op-status="0" interval="0" last-run="1454860185" last-rc-change="1454860185" exec-time="150" queue-time="0" op-digest="d75c4388bc0d9b932e63bbe807f9d105"/>
</lrm_resource>
</lrm_resources>
</lrm>
<transient_attributes id="jake">
<instance_attributes id="status-jake">
<nvpair id="status-jake-probe_complete" name="probe_complete" value="true"/>
</instance_attributes>
</transient_attributes>
</node_state>
<node_state id="elwood" uname="elwood" ha="active" in_ccm="true" crmd="online" join="member" expected="member" crm-debug-origin="do_update_resource" shutdown="0">
<lrm id="elwood">
<lrm_resources>
<lrm_resource id="Service_IP" type="IPaddr2" class="ocf" provider="heartbeat">
<lrm_rsc_op id="Service_IP_last_0" operation_key="Service_IP_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.6" transition-key="9:0:0:77c1a508-bdbb-4fbf-b2d6-727e76d74206" transition-magic="0:0;9:0:0:77c1a508-bdbb-4fbf-b2d6-727e76d74206" call-id="4" rc-code="0" op-status="0" interval="0" last-run="1454860185" last-rc-change="1454860185" exec-time="80" queue-time="0" op-digest="d75c4388bc0d9b932e63bbe807f9d105"/>
<lrm_rsc_op id="Service_IP_monitor_10000" operation_key="Service_IP_monitor_10000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.6" transition-key="10:0:0:77c1a508-bdbb-4fbf-b2d6-727e76d74206" transition-magic="0:0;10:0:0:77c1a508-bdbb-4fbf-b2d6-727e76d74206" call-id="5" rc-code="0" op-status="0" interval="10000" last-rc-change="1454860185" exec-time="30" queue-time="0" op-digest="ab9d849a0cc63056be0d083c011bbdfc"/>
</lrm_resource>
<lrm_resource id="Service_IP2" type="IPaddr2" class="ocf" provider="heartbeat">
<lrm_rsc_op id="Service_IP2_last_0" operation_key="Service_IP2_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.6" transition-key="5:0:7:77c1a508-bdbb-4fbf-b2d6-727e76d74206" transition-magic="0:7;5:0:7:77c1a508-bdbb-4fbf-b2d6-727e76d74206" call-id="3" rc-code="7" op-status="0" interval="0" last-run="1454860185" last-rc-change="1454860185" exec-time="160" queue-time="0" op-digest="f59bf798eed8f6364cb446d88b1166b0"/>
</lrm_resource>
</lrm_resources>
</lrm>
<transient_attributes id="elwood">
<instance_attributes id="status-elwood">
<nvpair id="status-elwood-probe_complete" name="probe_complete" value="true"/>
</instance_attributes>
</transient_attributes>
</node_state>
</status>
</cib>
Now both nodes could be started manually again. However, on elwood I had to start pacemaker twice before it stopped showing FAILED.
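To find out why a resource comes up FAILED after such a start, and to clear the state once the cause is fixed, something along these lines usually helps (a sketch using the resource name Service_IP from this thread; the actual failure reason is not shown here):
Code: Select all
crm_mon -1 -f                          # one-shot status including per-resource fail counts
grep -i error /var/log/corosync.log    # the corosync.conf above logs to this file
crm resource cleanup Service_IP        # clear the fail count once the cause is fixed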
-
heisenberg
- Posts: 4146
- Registered: 04.06.2015 01:17:27
- License for own posts: MIT license
Post
by heisenberg » 07.02.2016 17:05:58
Hi,
could we also get a
crm configure show and a
crm status from both nodes, as a pastebin perhaps?
I'm not yet enough of a Pacemaker high-flyer to read the Matrix, er, cluster configuration straight from the XML source.
:)
-
Wambui
- Posts: 120
- Registered: 03.08.2014 10:06:10
- License for own posts: GNU Free Documentation License
Post
by Wambui » 07.02.2016 17:20:07
No problem.
crm configure show
Code: Select all
node elwood \
    attributes standby="off"
node jake \
    attributes standby="false"
primitive Service_IP ocf:heartbeat:IPaddr2 \
    params ip="192.168.0.150" cidr_netmask="24" \
    op monitor interval="10" timeout="20" \
    meta target-role="started"
primitive Service_IP2 ocf:heartbeat:IPaddr2 \
    params ip="192.168.0.151" cidr_netmask="24" \
    op monitor interval="10" timeout="20"
property $id="cib-bootstrap-options" \
    dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    stonith-action="poweroff" \
    default-resource-stickiness="100"
crm status
Code: Select all
============
Last updated: Sun Feb 7 17:18:23 2016
Last change: Sun Feb 7 16:49:17 2016 via crm_attribute on elwood
Stack: openais
Current DC: jake - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ elwood jake ]
Service_IP (ocf::heartbeat:IPaddr2): Started elwood
Service_IP2 (ocf::heartbeat:IPaddr2): Started jake
As additional information: I'm running all of this on two VirtualBox instances.
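For a planned reboot of both nodes at once (for example to swap hardware under the VirtualBox hosts), another option with a configuration like the one shown above is to freeze resource management before shutting anything down (a sketch; maintenance-mode is a standard Pacemaker cluster property, not something taken from this thread):
Code: Select all
crm configure property maintenance-mode=true    # Pacemaker stops starting/stopping/monitoring resources
# ... stop pacemaker and corosync on both nodes, reboot, start them again ...
crm configure property maintenance-mode=false   # hand control back to the cluster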
-
Wambui
- Posts: 120
- Registered: 03.08.2014 10:06:10
- License for own posts: GNU Free Documentation License
Post
by Wambui » 07.02.2016 17:32:27
Sorry,
here is the whole thing from jake's perspective:
crm configure show
Code: Select all
node elwood \
    attributes standby="off"
node jake \
    attributes standby="false"
primitive Service_IP ocf:heartbeat:IPaddr2 \
    params ip="192.168.0.150" cidr_netmask="24" \
    op monitor interval="10" timeout="20" \
    meta target-role="started"
primitive Service_IP2 ocf:heartbeat:IPaddr2 \
    params ip="192.168.0.151" cidr_netmask="24" \
    op monitor interval="10" timeout="20"
property $id="cib-bootstrap-options" \
    dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    stonith-action="poweroff" \
    default-resource-stickiness="100"
crm status
Code: Select all
============
Last updated: Sun Feb 7 17:32:05 2016
Last change: Sun Feb 7 16:49:17 2016 via crm_attribute on elwood
Stack: openais
Current DC: jake - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ elwood jake ]
Service_IP (ocf::heartbeat:IPaddr2): Started elwood
Service_IP2 (ocf::heartbeat:IPaddr2): Started jake