Monitoring your server cluster
The server’s web interface shows the current state of the system. If the system is degraded in any way, an alert is highlighted on the server’s home page. If you cannot access the web interface on one of your servers, try accessing it through another server in the cluster.
Clicking the alert link takes you to the Multi-Site Resilience (MSR) page, which gives more information about the issue.
This page shows the status of each server in the cluster. The top section shows the status of the video server components and the bottom section shows the status of the database server components.
Green indicates that the component is online and working.
Grey indicates that the component is online but in a degraded state. For example, a database will be presented like this when it is syncing data from the cluster after a restart.
Red indicates that the component is offline.
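The colour scheme above can be written down as a small lookup, which may help if you record component states by hand during an incident. This is only an illustrative sketch: the component names are hypothetical, and the values are entered manually rather than read from the server.

```python
from enum import Enum


class ComponentStatus(Enum):
    """Colour shown on the MSR page and the state it stands for."""
    GREEN = "online and working"
    GREY = "online but degraded (for example, a database that is syncing)"
    RED = "offline"


def needs_attention(statuses):
    """Return the names of all components that are not green."""
    return [name for name, status in statuses.items()
            if status is not ComponentStatus.GREEN]


# Hypothetical component names, entered by hand for illustration only.
example = {
    "site-a video": ComponentStatus.GREEN,
    "site-a database": ComponentStatus.GREY,
    "site-b video": ComponentStatus.RED,
}
print(needs_attention(example))
```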
Clicking on a component will give additional information. For video components, this includes a list of the encoders and viewers attached to it. You can also change the name of the component through this page to make it more relevant to your deployment.
Temporarily disabling a server in an MSR deployment
Keeping a particular server active can sometimes cause issues for the deployment as a whole. For example, a site may develop a fault that degrades the bandwidth available to the rest of the system without disconnecting the server. In this situation, any encoders or clients that connect to that site may struggle to stay connected or transmit video.
You can manually ‘fence’ a site to prevent it from participating in the deployment. This is done from the MSR page and does not require physical access to the site being fenced, so the site can be disabled temporarily while remedial work takes place.
EdgeVis Servers and their databases can be fenced independently. To remove a site from the cluster, it is recommended that both the server and the database be fenced.
Fencing a site
With an administrator-level account, navigate to the Multi-Site Resilience (MSR) page (on any server).
To fence a server, select it to open its status page and click the Fence server button from the menu on the right. The server will immediately become unavailable to new and existing encoders/viewers.
To fence a database, select it to open its status page and click the Fence database button from the menu on the right.
Repeat, if necessary, for any additional servers/databases at the affected site.
Any servers/databases that have been fenced will now report as FENCED on the main MSR status page.
Unfencing a site
With an administrator-level account, navigate to the Multi-Site Resilience (MSR) page (on any active server).
To unfence a server, select it to open its status page and click the Rejoin server button from the menu on the right.
To unfence a database, select it to open its status page and click the Rejoin database button from the menu on the right.
Repeat, if necessary, for any additional servers/databases at the affected site.
After rejoining the cluster, the servers/databases will report as ONLINE on the main MSR status page.
Identifying and troubleshooting common failures
Both video and database components for a server are marked as offline
This indicates that the other server is completely disconnected from the cluster.
This state is normal while one of the servers is restarting and will resolve itself once the server returns.
If the state doesn’t return to normal, check the following:
If you can access the other server’s UI and it reports the opposite status, the cause is likely to be a network issue. Check that the network connection between the servers is functional and that any firewall rules are correct.
Check that the other server’s hardware is powered on and functional.
Check that the EdgeVis Server service is running on the other server.
Check that the time on the servers is synchronised.
Once the issue has been resolved, the offline server will reconnect to the cluster and its database will synchronise with the other server. After the synchronisation is complete, the cluster will be in a healthy state again.
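As a rough aid for the network check, the sketch below tests whether a TCP connection can be opened from one machine to another. It is a generic reachability test, not an EdgeVis tool. The host name and port in the example comment are hypothetical; substitute the addresses and ports that your deployment and firewall rules actually use.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example call (hypothetical host and port, replace with your own):
#   can_connect("server-b.example.com", 443)
```

A result of False from one server towards the other, while both web interfaces are reachable from elsewhere, points towards a routing or firewall problem between the sites.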
Video component is marked as offline
This indicates that while both servers are running, the video components of the cluster have disconnected from each other.
Normally, this should resolve itself after no more than 15 minutes.
If the servers remain in this state, check the following:
If you can access the other server’s UI and it reports the opposite status, the cause is likely to be a network issue. Check that the network connection between the servers is functional and that any firewall rules are correct.
Check that the time on the servers is synchronised.
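One rough way to compare clocks without logging in to each server is to read the Date header of an HTTP response. The sketch below assumes that the server’s web interface returns a standard HTTP Date header, which is common but is not confirmed here, and the URL you pass in is your own. The header has one-second resolution, so this only reveals large differences.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.request import urlopen


def skew_seconds(date_header, reference):
    """Seconds by which the clock in an HTTP Date header is ahead of reference."""
    remote = parsedate_to_datetime(date_header)
    return (remote - reference).total_seconds()


def fetch_skew(url, timeout=5.0):
    """Fetch url and compare its Date header with this machine's clock."""
    with urlopen(url, timeout=timeout) as response:
        header = response.headers["Date"]
    if header is None:
        raise ValueError("response has no Date header")
    return skew_seconds(header, datetime.now(timezone.utc))


# Offline example with fixed values, so the result is reproducible.
reference = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(skew_seconds("Mon, 01 Jan 2024 12:00:42 GMT", reference))  # 42.0
```

Running fetch_skew against each server from the same machine and comparing the two results gives an approximate difference between the servers’ clocks. A server with a self-signed certificate will cause urlopen to raise an error unless the certificate is trusted on the machine running the check.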
Database is marked as syncing
This occurs when a server is recovering from a failure.
When the server comes back online, it may need to re-synchronise the database with the other server before it is fully healthy and redundant again.
In some cases, this requires a full copy of the database, and the server may report this state for some time, depending on the speed of the link between the sites.
If this state doesn’t resolve itself, check the available bandwidth on the link between the sites, as it may not be fast enough to complete the synchronisation.
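To judge whether a long sync is reasonable, you can estimate the time a full database copy would take from the database size and the link speed. The figures below are hypothetical, and the efficiency factor is an assumption that allows for protocol overhead and other traffic on the link; it is not an EdgeVis parameter.

```python
def estimated_sync_seconds(database_bytes, link_bits_per_second, efficiency=0.8):
    """Estimate the transfer time for a full database copy, in seconds.

    efficiency is the assumed usable fraction of the link (0 < efficiency <= 1).
    """
    if link_bits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    return database_bytes * 8 / (link_bits_per_second * efficiency)


# Hypothetical figures: a 2 GB database over a 10 Mbit/s link.
seconds = estimated_sync_seconds(2 * 10**9, 10 * 10**6)
print(f"about {seconds / 60:.0f} minutes")  # roughly 33 minutes
```

If the syncing state lasts far longer than an estimate of this kind, the link is probably delivering much less bandwidth than expected.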