Ensuring Availability through System Design

MDS, in conjunction with K8T Ltd, has developed, to ITIL standards, an integrated suite of management tools to enable ongoing, real-time Assessment, Management, Alerting and Reporting (AMAR™):

  • Assessment of design alternatives
    • to assist effective planning and modification of the hardware configuration, taking into consideration the potential impact on heat loadings (associated fire risk), power consumption (carbon footprint and distribution limitations) and cooling dynamics (temperature design limitations) in particular
    • to enable easier and more intuitive asset management of equipment from a variety of perspectives, i.e. from the individual server, switch, etc, through racks, rows, vaults and data centres, up to multiple centres
  • Monitoring of operational status
    • to monitor and visually represent a range of critical information sensed and captured within the Data Centre by a variety of devices, including but not limited to:
      • building management status (biometric/digital controls, security alarms, visitor management, climate management, water detection, etc)
      • environmental status (temperature of vaults/racks/server/CPU, climate control performance, humidity, vibration levels, noise, etc)
      • power consumption (distribution of power against design tolerance to centre/vault/rack/server, UPS status, back-up system status, current, phasing volatility, etc)
      • hardware performance (CPU usage, disk space/status, swap space, fan usage, response times, etc)
      • application performance (data table usage/response, o/s monitoring, etc)
      • network performance (intrusion detection, bandwidth peak/average/trend, data transfer, packet size, IP connectivity uptime, VLAN uptime, web page response/uptime, etc)
  • Alerting of pre-critical states
    • Notification to engineers (MDS's own and clients') of a breach of pre-established tolerance levels for any status reading that is monitored, via the appropriate system (automated phone call, paging, SMS, email, etc) depending on the criticality of the alert and the desired response timescale (linked to the specific SLA)
  • Reporting of historical events to assist redesign
    • Reporting of information related to assessment, monitoring and alert triggers in a format (Word document, PDF, spreadsheet, data export) and timescale (real-time via extranet, near-real-time via overnight, weekly or exception reporting, etc) as appropriate. This includes extranet access to the progress of support activities in real time.
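
The alerting step above can be illustrated with a minimal sketch. It assumes three tolerance levels per reading and a fixed mapping from severity to notification channels; the level names, the channel mapping and the function names are hypothetical and do not describe the actual RESPOND rules or any client SLA.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical mapping of severity to notification channels.
# The real escalation rules are tied to each SLA and are not given in the text.
CHANNELS = {
    "warning": ["email"],
    "critical": ["sms", "email"],
    "emergency": ["phone", "pager", "sms", "email"],
}


@dataclass
class Tolerance:
    """Pre-established tolerance levels for one monitored reading."""
    warning: float
    critical: float
    emergency: float


def classify(reading: float, tol: Tolerance) -> Optional[str]:
    """Return the most severe level breached, or None if within tolerance."""
    if reading >= tol.emergency:
        return "emergency"
    if reading >= tol.critical:
        return "critical"
    if reading >= tol.warning:
        return "warning"
    return None


def route_alert(name: str, reading: float, tol: Tolerance) -> List[Tuple[str, str]]:
    """Build (channel, message) pairs for a reading; empty if no breach."""
    severity = classify(reading, tol)
    if severity is None:
        return []
    message = f"{severity.upper()}: {name} = {reading}"
    return [(channel, message) for channel in CHANNELS[severity]]
```

For example, with Tolerance(27, 32, 38) a rack temperature reading of 33 would be classed as critical and routed to SMS and email, while a reading of 20 would raise no alert.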

MDS has been utilising AMAR™ for some time and has seen an identifiable improvement in efficiency and availability. This required:

  • Development of K8Tram, which integrates a drag-and-drop asset management solution with a "lite" computational fluid dynamics (CFD) application, providing a scenario planning tool to assess the impact of design changes or rack reconfiguration on the heat loading within the data suite.
  • Configuration of a wide range of sensing and monitoring points using Nagios® and integration into MDS's existing RESPOND alerting system
  • Development of appropriate reporting options, configurable for different clients.
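
The idea of scenario planning for rack reconfiguration can be sketched very roughly as follows. This is not K8Tram's method and it is not a CFD calculation: it simply assumes that all electrical power drawn by a server is dissipated as heat in its rack, and compares the resulting rack totals against a single per-rack limit. All names and figures are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rack_heat_loads(placements: Dict[str, str], power_w: Dict[str, float]) -> Dict[str, float]:
    """Sum server power per rack (server -> rack, server -> watts).

    Assumption: every watt drawn is released as heat in the same rack.
    """
    loads = defaultdict(float)
    for server, rack in placements.items():
        loads[rack] += power_w[server]
    return dict(loads)


def scenario(
    placements: Dict[str, str],
    power_w: Dict[str, float],
    moves: Dict[str, str],
    rack_limit_w: float,
) -> Tuple[Dict[str, float], List[str]]:
    """Apply proposed moves to a copy of the layout and flag overloaded racks."""
    proposed = {**placements, **moves}  # the current layout is left untouched
    loads = rack_heat_loads(proposed, power_w)
    overloaded = sorted(rack for rack, watts in loads.items() if watts > rack_limit_w)
    return loads, overloaded
```

A planner could call scenario() with a proposed move before making any physical change, and reject the move if the returned list of overloaded racks is not empty.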

Technical

K8Tram is a Microsoft-based application, written in .NET, running on a SQL database. The CFD calculations are based on algorithms written by K8T Ltd and based upon its experience.

BMS sensing and status is achieved by the following:

  • biometric/digital controls (Axiom3)
  • security alarms (BWS and Redcare)
  • visitor management (Eventum)
  • climate management (Airedale weblan)
  • water detection (Nagios®)

Environmental sensing and status is achieved by:

  • temperature of vaults/racks (Airedale weblan, K8Tram)
  • heat loadings (K8Tram)
  • temperature of server/CPU (Nagios®)
  • climate control performance (Nagios®)
  • humidity (APC Environmental)

Power management sensing, control and billing is achieved through the following:

  • data centre power usage (Merlin electronic meter)
  • vault power usage (Merlin electronic meter)
  • distribution of power against design tolerance to rack (Nagios®)
  • UPS status (Nagios®/APC)
  • power usage to server (K8Tram theoretical, Nagios® actual)
  • back-up system status (Nagios®)
  • current (Nagios®)
  • phasing volatility (Nagios®)
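
The comparison of theoretical and actual power usage against design tolerance can be sketched as below. The function and field names are hypothetical, and the figures would come from the planning model and from monitoring respectively; the sketch only shows the arithmetic.

```python
from typing import Dict, Union


def power_report(
    design_limit_w: float, theoretical_w: float, actual_w: float
) -> Dict[str, Union[float, bool]]:
    """Compare modelled and measured power draw against a design limit (watts)."""
    return {
        # Fraction of the design limit used according to the planning model
        "theoretical_utilisation": theoretical_w / design_limit_w,
        # Fraction of the design limit used according to measurement
        "actual_utilisation": actual_w / design_limit_w,
        # Positive when the equipment draws more than the model predicted
        "model_error_w": actual_w - theoretical_w,
        # True when the measured draw exceeds the design limit
        "over_limit": actual_w > design_limit_w,
    }
```

A persistent positive model error suggests that the theoretical figures used for planning should be revised before further equipment is added.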

Hardware performance (absolute and relative to design tolerances) sensing and status is achieved by:

  • CPU usage
  • disk space/status
  • swap space
  • fan usage
  • response times, etc

Application performance status is achieved by:

  • data table usage/response time
  • o/s monitoring

Network performance sensing and status is achieved by:

  • intrusion detection/prevention (Cisco)
  • bandwidth peak/average/trend (Cisco/Nagios®)
  • data transfer (Cisco/Nagios®)
  • packet size (Cisco/Nagios®)
  • IP connectivity uptime (WhatsUp Gold)
  • VLAN uptime and individual application connectivity (WhatsUp Gold)
  • web page response/uptime (Siteseer)

Nagios® is an open source host, service and network monitoring program designed to inform Data Centre Operators of problems before clients, end-users or managers notice them. The monitoring daemon runs intermittent checks on hosts and services using external "plugins". With more than 2 million services monitored, Nagios® is a proven application for all aspects of a data centre monitoring environment.
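
As an illustration of the plugin approach, the sketch below follows the standard Nagios® plugin convention: a plugin prints one line of status text (optionally followed by performance data after a "|") and exits with 0 for OK, 1 for WARNING, 2 for CRITICAL or 3 for UNKNOWN. The temperature check itself, its thresholds and the function name are hypothetical and are not taken from the MDS configuration.

```python
from typing import Optional, Tuple

# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_temperature(value: Optional[float], warn: float, crit: float) -> Tuple[int, str]:
    """Return (exit_code, status_line) for a temperature reading in Celsius.

    value is None when the sensor could not be read (hypothetical convention).
    """
    if value is None:
        return UNKNOWN, "TEMPERATURE UNKNOWN - no reading from sensor"
    if value >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif value >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text after "|" is performance data: label=value;warn;crit
    line = f"TEMPERATURE {label} - {value:.1f}C | temp={value:.1f};{warn};{crit}"
    return code, line


# A real plugin script would print the line and exit with the code, e.g.:
#   code, line = check_temperature(reading, 27.0, 32.0)
#   print(line)
#   sys.exit(code)
```

The monitoring daemon only interprets the exit code and the printed line, so the same pattern applies to any of the readings listed above.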

MDS's RESPOND system triggers a variety of alerts 24x7 via Vodafone UK's paging and SMS text services as well as email alerts to engineers and clients alike.

Summary

Through the AMAR™ integrated suite, MDS achieve the following to the benefit of all our clients, the availability of their applications and the peace of mind for their business continuity

