Report written by Rémy Vandepoel
DBaaS Time Series: OVH storage team use case
Today, a dashboard is necessary to get an accurate view of activity, alongside a monitoring system that reports on the state of health of the entire server farm. These metrics tools are indispensable for keeping thousands of servers running across different zones.
Over the past several months, the growing volume of data has led the Storage team to identify new needs. The team reviewed the technologies behind the three main aspects of metrology: collection, storage and exploitation.
These aspects make it possible to have a view of the entire server farm via:
• Graphite: displays daily information concerning the tens of thousands of servers and disks which make up the server farm. It combines simplicity and flexibility for aggregation and summarization.
• Dashing: complementary to Graphite, it displays critical values in real time within dashboards.
We can define metrology as a way to evaluate, visualize and analyze data provided by applications or hardware resources.
For this, we must be able to store and retrieve information easily, generally through an API, or through a web interface such as Grafana for more advanced features.
This solution is generally combined with monitoring (for example Shinken, Nagios or Icinga), which uses probes and alerts to notify administrators about the status of various services and components.
Time Series, the next Eldorado?
Information and data about an infrastructure have become the new raw material to exploit. This is just one facet of something larger called data science, which comprises solutions, capabilities and the manipulation of large volumes of data.
One of the trends on this subject has been big data, which permits the collection and analysis of large volumes of data, making them accessible and reusable.
Time Series analysis, another branch of data science, is highly coveted among startups that work closely with connected devices.
IoT users are not the only ones that handle large volumes of information on a daily basis; Business Intelligence is another example.
Storage team usages in numbers
1 TB – Volume of storage per day
57,000 – Metrics provided per second
152 billion – Stored points for the last 30 days
OVH has developed a dedicated solution: DBaaS Time Series
It acts as a point of entry into this universe, to manipulate and process the terabytes (or for some, petabytes) of identified metrics.
The release of DBaaS Time Series simplifies access to storage and analysis technologies in multiple use cases. A major advantage of this solution is that administrators do not have to worry about the platform itself, which allows them to concentrate on their work. Internally at OVH, many teams have integrated this solution into their production environments as a metrology tool. This report discusses one of those use cases: the Storage team's.
This choice brings the Storage team several advantages: unlimited expansion in DBaaS mode, time savings, and no longer being responsible for maintaining the metrology infrastructure. On a daily basis, the improvements are twofold: rapid integration of DBaaS TS, which is based on OpenTSDB standards, and simpler requests through these open protocols.
Switching to DBaaS Time Series allows them to be even better equipped and to be constantly informed when monitoring the infrastructure.
The OVH offer includes several building blocks that we combine to form the platform delivered to our customers. The Metrics Gateway, the real point of entry to DBaaS TS, is designed to be multi-protocol; for now it handles OpenTSDB, with Graphite and InfluxDB envisioned later. To be stored, the collected metrics must be relayed quickly and without any loss of data. Kafka was chosen as the Event Bus at this central point to manage queueing. It is also the solution adopted by OVH PaaS Logs, which allows us to take advantage of internal expertise.
Received data is then stored in the Metrics Warehouse, which acts as one gigantic database, specialized and optimized for temporal (time series) data, built on the Apache big data stack. The choice of HBase provides horizontal scalability and multi-site replication. The last of the components managed by OVH for this platform is the Dashboard. The benefit of using Grafana is immediate: data is accessible in nearly real time.
How DBaaS TS is powered
Upstream, it is necessary to collect the information, that is, the values which will be stored. For this, scollector has been deployed.
DBaaS TS is OpenTSDB compatible and respects its conventions. Scollector, for its part, has the particular advantage of communicating natively with the OpenTSDB API. This is one of the major assets that pushed the team to adopt it, because it simplifies integration.
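To make the OpenTSDB compatibility mentioned above more concrete, here is a minimal sketch of what a single data point looks like in the JSON form accepted by the standard OpenTSDB `/api/put` call (metric, timestamp, value, tags). The helper `build_put_payload`, the metric name and the tag values are hypothetical and chosen for illustration only. The gateway URL and authentication are left out because they are not given here.

```python
import json
import time


def build_put_payload(metric, value, tags, timestamp=None):
    """Return one data point shaped like an OpenTSDB /api/put JSON object."""
    if not tags:
        # OpenTSDB requires at least one tag per data point.
        raise ValueError("at least one tag is required")
    return {
        "metric": metric,
        # Seconds since the Unix epoch; defaults to "now".
        "timestamp": int(time.time()) if timestamp is None else int(timestamp),
        "value": value,
        "tags": dict(tags),
    }


# Hypothetical metric name and tags, for illustration only.
point = build_put_payload(
    "os.disk.fs.space_free",
    1234567890,
    {"host": "storage-01", "mount": "/data"},
    timestamp=1456789000,
)

# /api/put accepts a single object or a list of objects.
body = json.dumps([point], sort_keys=True)
print(body)
```

In practice, a collector such as scollector builds and sends this kind of body itself. The sketch only shows the shape of the data that an OpenTSDB-compatible entry point is expected to receive.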
Collection is carried out via scripts: some directly cover classic system metrics (e.g. processor and RAM consumption, load average, available disk space on each partition) and others have been developed internally by employees using Python. The project is available on GitHub: scollector. One of the biggest advantages of scollector is rapid adaptability. Our business-specific scripts were deployed in record time.
Many specific needs surround storage, requiring the team to develop their own add-ons:
• S.M.A.R.T.: reports the state of health of each disk in the server farm
• Iostat: the reads and writes executed on each disk
• ZFS: widely used internally, so its metrics must be collected, such as the remaining capacity of the pools, the cache (ARC, L2ARC) and the number of snapshots
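As an illustration of what such an add-on can look like, the sketch below formats ZFS pool readings as text lines of the form `metric timestamp value tag=value`, which is the tcollector-style format that scollector's external collectors are assumed to print on standard output. It is not the Storage team's actual script: the metric names, the `pool` tag and the hard-coded sample readings are hypothetical, and a real collector would read the values from the system instead.

```python
import time


def format_line(metric, value, tags, timestamp=None):
    """Format one data point as `metric timestamp value tag=value ...`."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric} {ts} {value} {tag_str}".rstrip()


def pool_lines(pools, timestamp=None):
    """Turn per-pool readings into output lines, one line per metric."""
    lines = []
    for name, stats in sorted(pools.items()):
        for key, value in sorted(stats.items()):
            lines.append(
                format_line(f"zfs.pool.{key}", value, {"pool": name}, timestamp)
            )
    return lines


# Hypothetical sample readings; a real collector would query the system.
SAMPLE_POOLS = {"tank": {"free_bytes": 500_000_000_000, "snapshots": 42}}

for line in pool_lines(SAMPLE_POOLS):
    print(line)
```

Keeping the formatting separate from the data gathering means the same helper could be reused for the S.M.A.R.T. and iostat add-ons, with only the readings changing.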
Use of Grafana
To analyze the billions of points collected each day, a visualization tool is essential: Grafana.
This choice was made to meet the teams' needs for flexibility and responsiveness, and it permits cross-referencing of different sources of information. This is why OVH made Grafana directly available within DBaaS TS.
The metric visualization interface is of great importance: when an administrator analyzes the metrics, they become added value and a real benefit during extended diagnostics.
DBaaS TS used in a metrology scenario is an asset for a massive infrastructure. Combined with a monitoring solution, it becomes a precise, reliable and robust tool.
Daily use of Grafana, with DBaaS TS on the backend, makes it possible to view in real time data that was made accessible just minutes earlier. The ability to step back in time and look at the metrics displayed provides the opportunity to see a trend over the previous hours or days and to better distinguish one element from another.
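Behind a dashboard panel, "stepping back in time" comes down to a time-range query. The sketch below builds a request body in the shape used by the standard OpenTSDB `/api/query` call, with a relative start time and a downsampling interval to smooth the trend. It assumes the OpenTSDB-compatible interface accepts this form. The helper `build_query`, the metric name and the tag are hypothetical, and Grafana normally generates such queries on the user's behalf.

```python
import json


def build_query(metric, tags, start="24h-ago", downsample="1h-avg", aggregator="avg"):
    """Return a body shaped like an OpenTSDB /api/query request."""
    return {
        # Relative start time: look back over the previous hours or days.
        "start": start,
        "queries": [
            {
                "aggregator": aggregator,  # how matching series are combined
                "metric": metric,
                "downsample": downsample,  # e.g. one averaged point per hour
                "tags": dict(tags),
            }
        ],
    }


# Hypothetical metric and tag: one week of free disk space for one host.
query = build_query("os.disk.fs.space_free", {"host": "storage-01"}, start="7d-ago")
print(json.dumps(query, indent=2, sort_keys=True))
```

Widening `start` while coarsening `downsample` keeps the number of returned points manageable when looking at days rather than hours.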
Innovation is part of our DNA, and facing daily challenges is, without question, what we adore. The current challenges revolve around search by pattern recognition, which increases our ability to make predictions. The goal of DBaaS TS is to be a game changer by making these concepts available to all.
You can contribute to improving this service alongside our experts by signing up to the dedicated mailing list: firstname.lastname@example.org.