OpenSIPS High Availability
Our company is growing, and our infrastructure is growing with it. Not a bad problem to have. We wanted to bring more FreeSWITCH servers online to test out the new version, but this was proving difficult, so we decided to put OpenSIPS in front of FreeSWITCH to more easily control the ‘edge’ of our voice network. OpenSIPS now operates as a SIP signaling proxy between our vendors and our media/feature servers (FreeSWITCH).
I’m currently bringing two more OpenSIPS nodes online to handle our endpoint traffic (web dialers and SIP clients) via WebRTC and SIPS/TLS, and we need to make these nodes highly available. OpenSIPS offers a “clusterer” module that allows us to create a “Federated Cluster Topology” from two OpenSIPS nodes. Big words, I know. I wanted to understand it better since there is a lot of ‘magic’ involved. The gist is that the cluster shares a common NoSQL database store, while each node retains its own in-memory AoR data. In this case we used MongoDB, which stores the user location data centrally so that either server can query it when a record is not found in memory.
First, the setup
Install and load the correct modules (usrloc, clusterer, cachedb_mongodb, json (for raw query parsing))
...
#### USeR LOCation module
loadmodule "usrloc.so"
modparam("usrloc", "nat_bflag", "NAT")
modparam("usrloc", "use_domain", 1)
modparam("usrloc", "working_mode_preset", "federation-cachedb-cluster")
modparam("usrloc", "location_cluster", 1)
#### CLUSTERER module
loadmodule "clusterer.so"
modparam("clusterer", "my_node_id", 1) ## CHANGE THIS ON THE SECOND NODE
modparam("clusterer", "seed_fallback_interval", 5)
modparam("clusterer", "db_url", "mysql://opensips:opensips@localhost/opensips")
#### JSON module
loadmodule "json.so"
#### CACHEDB_MONGODB module
loadmodule "cachedb_mongodb.so"
modparam("usrloc", "cachedb_url","mongodb://10.0.1.244:27017/opensipsDB.userlocation")
modparam("cachedb_mongodb", "cachedb_url","mongodb:instance1://10.0.1.244:27017/opensipsDB.userlocation")
modparam("cachedb_mongodb", "compat_mode_3.0", 1)
...
Configure the clusterer module by inserting the correct node data in each node’s respective config database
mysql> select * from clusterer;
+----+------------+---------+----------------+-------+-----------------+----------+------------+-------+-------------+
| id | cluster_id | node_id | url | state | no_ping_retries | priority | sip_addr | flags | description |
+----+------------+---------+----------------+-------+-----------------+----------+------------+-------+-------------+
| 1 | 1 | 1 | bin:10.0.1.100 | 1 | 3 | 50 | 10.0.1.100 | seed | NULL |
| 2 | 1 | 2 | bin:10.0.2.100 | 1 | 3 | 50 | 10.0.2.100 | seed | NULL |
+----+------------+---------+----------------+-------+-----------------+----------+------------+-------+-------------+
2 rows in set (0.00 sec)
Federated Cluster Topology - Default using cluster_check_addr
However, the default recommended configuration for Federated Clusters seemed a little inefficient from a signaling standpoint. If we have two nodes (X and Y), where Alice is registered to X, but FreeSWITCH randomly sends a SIP INVITE to Y then Y will determine if it’s part of a cluster, realize that Alice is not registered locally, but registered to X, then Y will forward the INVITE to X. So, 50% of the time this inefficiency will exist. If FreeSWITCH sent an INVITE to X (looking for Alice), it would find Alice’s AoR registered in memory locally, then send an INVITE to Alice’s phone. This is not the end of the world, and would scale fairly well.
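To make the extra hop concrete, here is a minimal Python sketch of the behaviour described above. It is a simplified model, not OpenSIPS code: the `Node` class, the `route_invite` function, the contact URI and the `shared_store` dictionary (standing in for MongoDB) are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one OpenSIPS node and its in-memory AoR table."""
    name: str
    local_contacts: dict = field(default_factory=dict)  # AoR -> contact URI


def route_invite(entry, aor, shared_store, nodes):
    """Return the hops an INVITE takes, starting at the node FreeSWITCH picked."""
    hops = [entry.name]
    if aor in entry.local_contacts:
        # Registered on the entry node: relay straight to the endpoint.
        hops.append(entry.local_contacts[aor])
        return hops
    # Not local: ask the shared store which node is the AoR's home.
    home_name = shared_store.get(aor)
    home = nodes.get(home_name)
    if home is None or aor not in home.local_contacts:
        hops.append("404")
        return hops
    # Forward to the home node, which relays to the endpoint.
    hops.append(home.name)
    hops.append(home.local_contacts[aor])
    return hops


alice = "alice@sip.domain.com"
x = Node("X", {alice: "sip:alice@192.0.2.10"})
y = Node("Y")
nodes = {"X": x, "Y": y}
shared_store = {alice: "X"}  # assumed shape: AoR -> name of the home node

print(route_invite(x, alice, shared_store, nodes))  # entry node, then endpoint
print(route_invite(y, alice, shared_store, nodes))  # Y forwards to X first
```

When FreeSWITCH picks Y, the path gains one node-to-node hop before the INVITE reaches Alice, which is the inefficiency in question.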
When we register we can see MongoDB being updated:
> db.userlocation.find().pretty()
{
"_id" : "tylerdurden@sip.domain.com10.0.1.100",
"aor" : "tylerdurden@sip.domain.com",
"home_ip" : "10.0.1.100"
}
Here’s what the routing logic looks like:
route {
...
# if from a node in OpenSIPS cluster
if (cluster_check_addr(1, "$si")) {
# do local lookup
lookup("location","m");
switch ($retcode) {
case 1:
xlog("LOCAL CONTACT FOUND\n");
# relay to endpoint
if (!t_relay()) {
xlog("UNABLE TO RELAY TO CONTACT\n");
sl_reply_error();
} else {
xlog("RELAY TO CONTACT SUCCESSFUL\n");
}
exit;
case -1:
case -3:
sl_send_reply(404, "Not Here");
exit;
case -2:
sl_send_reply(405, "Not Allowed");
exit;
};
exit;
# if from FreeSWITCH
} else if ( $si == "10.0.1.100" || $si == "10.0.2.100" ) {
# do global lookup to find where the endpoint is registered
lookup("location","g");
switch ($retcode) {
case 1:
xlog("CONTACT FOUND, RELAY TO ENDPOINT OR OPENSIPS NODE\n");
if (!t_relay()) {
xlog("UNABLE TO RELAY TO CONTACT\n");
sl_reply_error();
} else {
xlog("RELAY TO CONTACT SUCCESSFUL\n");
}
exit;
case -1:
case -3:
sl_send_reply(404, "Not Found");
exit;
case -2:
sl_send_reply(405, "Not Allowed");
exit;
};
exit;
}
...
}
Federated Cluster Topology - Custom config with redirect
What if we needed to scale more than two nodes? It was an interesting thought experiment even if unlikely since OpenSIPS is extremely efficient. We introduced a SIP 302 Redirect so that OpenSIPS can get out of the signaling path if the user isn’t registered to that particular node. If switches were to blindly accept 302 Redirects that could be a security issue but in this case we’re fine since we control these servers as well.
route {
...
# remove the cluster_check_addr block
# if from FreeSWITCH
if ( $si == "10.0.1.100" || $si == "10.0.2.100" ) {
# build AoR without "sip:"
$avp(short_aor) = $tU + "@" + $td;
# query mongo for the home_ip (the IP address of the server that is registered)
# this was a little tricky to get right, the third param needs wrapping quotes in OpenSIPS v3.0.X
cache_raw_query("mongodb:instance1",
"{\"find\":\"userlocation\", \"query\":{\"aor\":\"$avp(short_aor)\"}}", "$avp(aor_doc)");
$json(aor_doc) := $avp(aor_doc);
$avp(home_ip) = $json(aor_doc/home_ip);
# send a 404 or 405 if we can't find the contact in the database so FreeSWITCH doesn't retry
if ($avp(home_ip) == null) {
xlog("CONTACT NOT REGISTERED IN MONGO\n");
sl_send_reply(405, "Not Allowed");
exit;
}
# if contact is registered, but not on this server then send 302 Redirect.
# this might be better as a 305 redirect because the 302 rewrites the Contact Header
# with the IP address
if ($avp(home_ip) != $avp(private_ip)) {
xlog("LOCAL CONTACT NOT FOUND\n");
# add the redirect destinations as branch
$branch = "sip:" + $tU + "@" + $avp(home_ip);
# sending a 3xx reply will automatically push all
# existing branches as Contact URIs
send_reply(302,"Moved Temporarily");
exit;
}
# we need to recreate the original AoR with the domain name since it was replaced by the 302
$avp(original_aor) = "sip:" + $tU + "@sip.domain.com";
# this performs in-memory lookup of contact and t_relay sends an INVITE to registered contact
lookup("location","m",$avp(original_aor));
switch ($retcode) {
case 1:
xlog("LOCAL CONTACT FOUND IN MEMORY\n");
if (!t_relay()) {
xlog("UNABLE TO RELAY TO CONTACT\n");
sl_reply_error();
} else {
xlog("RELAY TO CONTACT SUCCESSFUL\n");
}
exit;
case -1:
case -3:
sl_send_reply(404, "Not Here");
exit;
case -2:
sl_send_reply(405, "Not Allowed");
exit;
};
}
...
}
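The decision logic in the route block above can be summarised as a small Python sketch. This is only a model of the three outcomes (reject, redirect, relay), not a replacement for the OpenSIPS script: `handle_invite`, the `home_ips` dictionary (standing in for the MongoDB query result) and the contact URI are hypothetical.

```python
def handle_invite(local_ip, aor, home_ips, local_contacts):
    """Return (status, target) for an INVITE arriving at the node with local_ip."""
    home_ip = home_ips.get(aor)
    if home_ip is None:
        # No record in the shared store: reject so the caller does not retry.
        return (405, None)
    if home_ip != local_ip:
        # Registered elsewhere: step out of the path and point at the home node.
        user = aor.split("@", 1)[0]
        return (302, "sip:" + user + "@" + home_ip)
    contact = local_contacts.get(aor)
    if contact is None:
        return (404, None)
    # Registered here: relay to the contact held in memory.
    return ("relay", contact)


aor = "tylerdurden@sip.domain.com"
home_ips = {aor: "10.0.1.100"}
contacts_on_home = {aor: "sip:tylerdurden@192.0.2.20"}

print(handle_invite("10.0.2.100", aor, home_ips, {}))                # redirected
print(handle_invite("10.0.1.100", aor, home_ips, contacts_on_home))  # relayed
```

The second call is what happens after the caller follows the 302: the home node finds the contact in memory and relays the INVITE itself.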
The redirect setup seemed overly complex given that we will probably never need to scale these nodes horizontally, considering how efficient OpenSIPS is. Instead, what we ended up with was this:
Serialized OpenSIPS Failover - no Cluster, no Mongo, no Mess!
Alice randomly registers to OpenSIPS node X or Y. FreeSWITCH will always INVITE node X first when looking for Alice. X will return a 404 if Alice is not registered there, and FreeSWITCH will then try node Y, which will also return a 404 if Alice is not registered. So the first attempt has a 50/50 chance of getting it right. This eliminates the need for us to use the “clusterer” module (Federated Cluster Topology), which means we don’t have to manage a MongoDB cluster (another three nodes) on top of that.
Have FreeSWITCH always bridge to X first. If it receives a 404, it will try the next destination:
session:execute("bridge","sofia/gateway/X/tylerdurden|sofia/gateway/Y/tylerdurden");
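The sequential behaviour of that dial string can be modelled in a few lines of Python. This is a sketch for illustration only: the `bridge` function, the per-node registration dictionaries and the contact URI are hypothetical, and a miss on a node stands in for the 404 that node would return.

```python
def bridge(gateways, user):
    """Try gateways in order; return (contact or None, list of gateways tried)."""
    tried = []
    for name, registered in gateways:
        tried.append(name)
        contact = registered.get(user)
        if contact is not None:
            return contact, tried  # this node relayed the INVITE
        # Otherwise the node answered 404 and we move on to the next one.
    return None, tried


gateways = [
    ("X", {}),                                             # nothing registered on X
    ("Y", {"tylerdurden": "sip:tylerdurden@192.0.2.20"}),  # registered on Y
]

print(bridge(gateways, "tylerdurden"))  # found on the second attempt
print(bridge(gateways, "marla"))        # both nodes miss
```

In the worst case the call costs one extra INVITE and one 404 before it reaches the right node, with no shared state between X and Y.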
Now our OpenSIPS code has been reduced to this:
route {
...
if ( $si == "10.0.1.100" || $si == "10.0.2.100" ) {
lookup("location","m");
switch ($retcode) {
case 1:
xlog("LOCAL CONTACT FOUND\n");
if (!t_relay()) {
xlog("UNABLE TO RELAY TO CONTACT\n");
sl_reply_error();
} else {
xlog("RELAY TO CONTACT SUCCESSFUL\n");
}
exit;
case -1:
case -3:
sl_send_reply(404, "Not Found");
exit;
case -2:
sl_send_reply(405, "Not Allowed");
exit;
};
exit;
}
...
}
We are not registering multiple devices with the same endpoint username. If you are, you may want to use OpenSIPS’ default Federated Configuration instead, since the above won’t ring all devices at once.