vRack 3.0: the OVH private network, intensively redesigned to continue anticipating IT project requirements
After the WikiLeaks incident, Anonymous exposed some of the weaknesses of the Internet. By democratizing DDoS attacks, they showed how easy it is to take down a website. Some teenagers thus discovered a new kind of video game in which the weapon is LOIC (Low Orbit Ion Cannon) and the target is a prestigious website.
Before 2011, DDoS attacks were very often launched by groups of hackers trying to seize a network of robots (a botnet) by taking control of its IRC server. We have also had clients targeted by groups of cybercriminals paid by their competitors, whose goal was to take the client’s website down and capture its visitors.
As in real life, DDoS attacks on the Internet were, and always will be, linked to banditry. These attacks are carried out by people paid to do them, and against them there are other people protecting the clients’ infrastructures. The Internet simply mirrors the world as it is today.
Up until 2011, anti-DDoS protection systems were based on limiting certain kinds of traffic by source and/or destination IP address, which stopped 99 out of 100 attacks. These protections were available to everyone, free of charge for customers. After 2011, many newcomers discovered DDoS and took an interest in this type of activity.
Attacks became democratized. New types of attacks were written and spread across the Internet. Eventually, websites appeared that allowed anyone to buy the resources needed to attack a target for five or ten minutes. For whatever reason, anyone with a PayPal account can now launch an attack. On our side, we noticed at the end of 2012 that our protections were no longer effective enough against these new types of attacks.
We have always considered anti-DDoS protection a standard feature, not an option: being protected against every kind of attack should not be a luxury that only a few customers can afford. At the same time, even with effective anti-DDoS protection, our clients are targeted by ever more severe, diversified and unlimited attacks, because attackers adapt the size of their attacks to the defenses in place. We therefore had to go all out from the start.
At first sight, yes, it seems impossible to resolve. But if the investment is pooled across all of our clients, the cost of the anti-DDoS service per customer stays fairly low. The idea is therefore to do the opposite of all of our competitors: not to sell an expensive option, but to include anti-DDoS protection in the overall cost of the service.
If we used only off-the-shelf market solutions, we would have to invest around $2.75 million for 100 Gbps/100 Mpps of protection. Nowadays, a protection capacity of 100 Gbps is not enough. Moreover, our datacenters cover five zones: Paris (P19, GSW, DC1), Roubaix (RBX), Strasbourg (SBG) and Gravelines (GRA), that is, four zones in France, plus the Montreal zone (BHS) in North America.
In total, we would have to invest over $10 million to get 500 Gbps/500 Mpps of protection capacity across the five zones covered by our datacenters. Adding our 24/7 support team, this would mean a price increase of around $10 a month. That is far too much for us.
Since 2011, we have been developing some network functions unrelated to DDoS on Tilera processors. These processors have 48 or 60 programmable cores and reach the same level of performance as algorithms hardwired into an ASIC* (Application-Specific Integrated Circuit).
In terms of investment, instead of $2.75 million for 100 Gbps/100 Mpps, we can get the same level of protection for $103,000! The investment cost is divided by 26! That is enormous. But Tilera platforms are just hardware able to handle a very large number of packets per second. There is no software on them, so we have to write everything from scratch with specialized developers.
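A quick back-of-the-envelope check of the figures quoted above. It uses only the prices given in the text and assumes, for simplicity, that each of the five zones gets its own 100 Gbps/100 Mpps of capacity priced linearly (that per-zone split is our assumption):

```python
# Figures quoted in the text. Scaling the 100 Gbps price linearly to
# 100 Gbps per zone is our simplifying assumption.
MARKET_COST_PER_100G = 2_750_000  # USD, market appliances, 100 Gbps/100 Mpps
TILERA_COST_PER_100G = 103_000    # USD, Tilera-based equivalent
ZONES = 5  # Paris, Roubaix, Strasbourg, Gravelines, Montreal (BHS)


def total_cost(cost_per_100g: int, zones: int) -> int:
    """Cost of 100 Gbps of protection in each zone, i.e. 500 Gbps for 5 zones."""
    return cost_per_100g * zones


def cost_ratio() -> float:
    """How many times cheaper the Tilera platform is for the same capacity."""
    return MARKET_COST_PER_100G / TILERA_COST_PER_100G


if __name__ == "__main__":
    print(f"Market appliances, {ZONES} zones: ${total_cost(MARKET_COST_PER_100G, ZONES):,}")
    print(f"Tilera platforms, {ZONES} zones: ${total_cost(TILERA_COST_PER_100G, ZONES):,}")
    print(f"Cost ratio: {cost_ratio():.1f}x")
```

The market total comes out at $13,750,000, consistent with "over $10 million", and the ratio at about 26.7, in line with the "divided by 26" quoted above.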
Two years. That is a long time, and our clients could not afford to wait that long for anti-DDoS protection. We therefore focused on the protections that block the most vicious bandwidth attacks, which reduced the development time to two months.
For the rest, the protections that do not require a large bandwidth capacity, we use standard market solutions. Our added value was to design and deploy this combination of technologies, which reduces the investment and makes anti-DDoS protection available to all.
In short, we created a mix of technologies to get around 500 Gbps/500 Mpps of protection, available to each of our clients within 3 to 6 months and without affecting prices. We also decided to deploy anti-DDoS infrastructures in three of our zones and to protect the remaining two zones through these three.
We set up 160 Gbps of protection in Strasbourg (SBG), Roubaix (RBX) and Montreal (BHS), for a total of 480 Gbps. When an IP in Paris or Gravelines is attacked, and likewise an IP in Roubaix, Strasbourg or Beauharnois (BHS), the traffic is cleaned on all three infrastructures at the same time.
Each of these 160 Gbps infrastructures is called a “VAC”, from the word vacuum. We have VAC1, VAC2 and VAC3 in three different locations in Europe and North America. All three VACs work in parallel to handle every attack on each of our clients.
If the attack comes from Eastern or Central Europe (Russia, Poland, Romania, Italy, Switzerland or Germany, for example), it is vacuumed by VAC2 in Strasbourg. If it comes from Western or Northern Europe (France, Belgium, the Netherlands, the United Kingdom or Spain), it is handled by VAC1 in Roubaix. Finally, if it comes from North or South America, Asia or Australia, it goes through VAC3 in the Montreal (BHS) zone, in Quebec.
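The regional split above can be summarized as a lookup table. This is purely illustrative: in practice the VAC is determined by where the traffic enters the network, not by a country table, and the table names, region keys and the extra example countries for the last region are hypothetical:

```python
# Illustrative only: which VAC cleans an attack depending on its source
# region, as described in the text. Both tables are hypothetical.
VAC_BY_REGION = {
    "eastern_central_europe": "VAC2 (Strasbourg, SBG)",
    "western_northern_europe": "VAC1 (Roubaix, RBX)",
    "americas_asia_australia": "VAC3 (Montreal, BHS)",
}

# Example countries from the text (ISO 3166-1 alpha-2 codes); the last
# region's countries are illustrative picks.
REGION_BY_COUNTRY = {
    **dict.fromkeys(["RU", "PL", "RO", "IT", "CH", "DE"], "eastern_central_europe"),
    **dict.fromkeys(["FR", "BE", "NL", "GB", "ES"], "western_northern_europe"),
    **dict.fromkeys(["US", "CA", "BR", "JP", "AU"], "americas_asia_australia"),
}


def vac_for(country_code: str) -> str:
    """Return the VAC that would clean an attack from the given source country."""
    region = REGION_BY_COUNTRY.get(country_code.upper())
    if region is None:
        raise KeyError(f"no region mapping for {country_code!r}")
    return VAC_BY_REGION[region]


if __name__ == "__main__":
    for cc in ("de", "BE", "AU"):
        print(cc, "->", vac_for(cc))
```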
Before we can even talk about the size of the attacks, the network needs surplus capacity to absorb the attack and redirect it to the VAC. If the attack is very powerful, even a good anti-DDoS infrastructure is useless, because the network will already be saturated upstream and service will be degraded for all of our clients.
Today, our average interconnection capacity between our network and the Internet is about 2.5 Tbps. By the end of the year, we will have 3.5 Tbps and we believe that by the end of 2014, we should be at 5 Tbps.
Yes, but our network is mostly used in the “OVH towards Internet” direction, whereas attacks come in the “Internet towards OVH” direction. In other words, we really do have more than 2 Tbps of surplus capacity in the direction we need to absorb DDoS attacks.
We sometimes see attacks of over 70-80 Gbps targeting our customers. Attacks of this size are stopped without any problem, and the customer sees no impact on their services.
This surplus capacity of over 2 Tbps is spread across 20 PoPs (points of presence) in Europe and North America: Paris, Frankfurt, Warsaw, Amsterdam, Madrid, New York, Chicago, Los Angeles, Miami… wherever our network is physically present. In these PoPs, we are connected to various other networks.
A DDoS attack coming from thousands of source IP addresses arrives from all over the world. Unlike our competitors, we do not funnel the attack into our network through a single PoP in one city. Here, a DDoS attack enters our network at the location closest to its source, for instance Miami, Los Angeles, Toronto, Prague, Vienna, Madrid or Paris. The attack is therefore spread across our entire network capacity around the world, so it overloads neither the networks we connect with nor our own. It is then vacuumed and cleaned by the closest VAC.
The VAC is made up of several pieces of equipment in series that inspect each IP packet and decide, based on its content, whether it is legitimate, whether it is part of an attack, or whether it needs to go through an authentication algorithm.
A legitimate packet is accepted and passed on to the next piece of equipment, and the last piece of equipment of the VAC forwards it to the client’s server. A packet that is clearly part of an attack is dropped. The third case covers packets that need to be authenticated because they could be either legitimate or part of an attack; for these, the VAC runs an authentication algorithm to decide what to do.
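The three-way decision described above (accept, drop, or authenticate) can be sketched as a chain of stages. The stage functions, packet fields and the `authenticate` callback are hypothetical stand-ins for illustration, not the VAC’s actual equipment:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, FrozenSet, Sequence


class Verdict(Enum):
    ACCEPT = "accept"          # legitimate: pass to the next stage
    DROP = "drop"              # clearly part of an attack
    AUTHENTICATE = "auth"      # ambiguous: run a challenge (e.g. Syn Auth)


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    proto: str
    flags: FrozenSet[str] = frozenset()
    checksum_ok: bool = True


Stage = Callable[[Packet], Verdict]


def run_pipeline(pkt: Packet, stages: Sequence[Stage],
                 authenticate: Callable[[Packet], bool]) -> bool:
    """Return True if the packet reaches the client's server."""
    for stage in stages:
        verdict = stage(pkt)
        if verdict is Verdict.DROP:
            return False
        if verdict is Verdict.AUTHENTICATE and not authenticate(pkt):
            return False
        # ACCEPT, or authenticated successfully: hand over to the next stage.
    return True  # the last stage forwards it to the server


# Two hypothetical stages.
def checksum_stage(pkt: Packet) -> Verdict:
    return Verdict.ACCEPT if pkt.checksum_ok else Verdict.DROP


def syn_stage(pkt: Packet) -> Verdict:
    if pkt.proto == "tcp" and "SYN" in pkt.flags:
        return Verdict.AUTHENTICATE
    return Verdict.ACCEPT


STAGES = [checksum_stage, syn_stage]
```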
For example, we use an algorithm called “Syn Auth” to block SYN flood attacks, which consist of sending large numbers of SYN packets from spoofed source IPs. Here is how it works: when the VAC receives a SYN packet, it drops it and sends an RST packet back to the source IP, which resets that connection. The source handles this through the standard connection recovery mechanism and replies with a new SYN packet carrying the same sequence number as the first one. The VAC remembers the first packet and recognizes that the second one repeats the same sequence number. As a result, the source IP address is added to the VAC’s “white list” for one hour, and all connections from that source IP to that destination IP are automatically accepted. If, however, the source IP sends SYN packets unrelated to the previous ones, the VAC applies the same algorithm to each of them and, since none is ever confirmed, drops them all. The attack is blocked while legitimate packets keep flowing.
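A minimal sketch of the Syn Auth logic as described above, keeping state per (source IP, destination IP) pair. The class and method names are hypothetical, the RST is represented by a return value rather than actually sent, and this illustrates the described behavior rather than reproducing the Tilera code:

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

WHITELIST_TTL = 3600.0  # one hour, as described in the text
Key = Tuple[str, str]   # (source IP, destination IP)


@dataclass
class SynAuth:
    # Sequence number of the last SYN dropped for each (src, dst) pair.
    pending: Dict[Key, int] = field(default_factory=dict)
    # Expiry time of each whitelisted (src, dst) pair.
    whitelist: Dict[Key, float] = field(default_factory=dict)

    def on_syn(self, src: str, dst: str, seq: int,
               now: Optional[float] = None) -> str:
        """Return "accept" or "drop+rst" for an incoming SYN packet."""
        now = time.monotonic() if now is None else now
        key = (src, dst)

        expiry = self.whitelist.get(key)
        if expiry is not None:
            if now < expiry:
                return "accept"
            del self.whitelist[key]  # entry expired: authenticate again

        if self.pending.get(key) == seq:
            # Same sequence number replayed after our RST: a real TCP
            # stack answered, so the address is not spoofed.
            del self.pending[key]
            self.whitelist[key] = now + WHITELIST_TTL
            return "accept"

        # New or unrelated SYN: remember it, drop it and answer with an RST.
        self.pending[key] = seq
        return "drop+rst"
```

Note that `pending` grows with every spoofed source, so an implementation running at line rate would need bounded or stateless storage; this sketch ignores that.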
Apart from the SSH client, which replays the sequence correctly but does not automatically re-establish the new connection, all other TCP clients handle this algorithm without any problem.
You simply have to rerun the SSH client, and the connection is established.
It is a consequence of the client’s choice to activate the “permanent mitigation” option. If a client chooses to be protected 24/7, mitigation runs all the time. And since SSH is used exclusively by the customer to manage the server, it is a compromise he has chosen for his server.
Yes. With auto-mitigation, OVH detects the attack and then starts vacuuming the traffic through the VAC. Once the attack is over, OVH stops mitigation after 15 minutes. Instead of 15 minutes, the client can choose “immediately”, “1h”, “6h” or “26h”. With “permanent mitigation”, the client uses the VAC 24/7.
If the client has enabled permanent mitigation and a real attack occurs, we detect it and activate auto-mitigation on our side, in addition to permanent mitigation. The process does not change, but if the client disables permanent mitigation during the attack, mitigation continues thanks to auto-mitigation.
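The interplay between the two modes can be modeled as a small state object: mitigation is on whenever permanent mitigation is enabled or auto-mitigation is active, and auto-mitigation stays on during the attack plus the chosen hold-down period. The class and field names are hypothetical; the hold-down values are the ones listed in the text:

```python
from dataclasses import dataclass
from typing import Optional

# Hold-down options quoted in the text, in seconds (default: 15 minutes).
HOLD_DOWN = {
    "immediately": 0,
    "15min": 15 * 60,
    "1h": 60 * 60,
    "6h": 6 * 60 * 60,
    "26h": 26 * 60 * 60,
}


@dataclass
class IpMitigation:
    permanent: bool = False
    hold_down: int = HOLD_DOWN["15min"]
    attack_in_progress: bool = False
    attack_ended_at: Optional[float] = None

    def attack_detected(self) -> None:
        self.attack_in_progress = True
        self.attack_ended_at = None

    def attack_ended(self, now: float) -> None:
        self.attack_in_progress = False
        self.attack_ended_at = now

    def auto_active(self, now: float) -> bool:
        """Auto-mitigation runs during the attack and for the hold-down period after."""
        if self.attack_in_progress:
            return True
        return (self.attack_ended_at is not None
                and now - self.attack_ended_at < self.hold_down)

    def mitigating(self, now: float) -> bool:
        """Traffic goes through the VAC if either mode requires it."""
        return self.permanent or self.auto_active(now)
```

Disabling `permanent` in the middle of an attack leaves `auto_active` true, which is the behavior described above.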
With Pro, the client automatically gets permanent mitigation. He also has access to the Firewall Network, which lets him authorize certain IP addresses or applications to connect to his server and block everything else. Finally, for each mitigated IP, we store information on all the flows accepted and dropped by the VAC for seven days. This is extremely valuable for analyzing an attack after the fact.
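The Firewall Network’s “authorize some sources or applications, block everything else” behavior is a default-deny allowlist. The sketch below shows that generic idea; the rule fields are hypothetical and not the Firewall Network’s actual rule format:

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    source: str              # CIDR block, e.g. "203.0.113.0/24"; "0.0.0.0/0" = anyone
    protocol: str            # "tcp" or "udp"
    dst_port: Optional[int]  # None = any destination port

    def matches(self, src_ip: str, protocol: str, dst_port: int) -> bool:
        return (ipaddress.ip_address(src_ip) in ipaddress.ip_network(self.source)
                and protocol == self.protocol
                and (self.dst_port is None or self.dst_port == dst_port))


def allowed(rules: Sequence[Rule], src_ip: str, protocol: str, dst_port: int) -> bool:
    """Default-deny: a packet passes only if some rule explicitly authorizes it."""
    return any(rule.matches(src_ip, protocol, dst_port) for rule in rules)


# Example policy: HTTPS open to everyone, SSH only from an admin network.
RULES = [
    Rule("0.0.0.0/0", "tcp", 443),
    Rule("203.0.113.0/24", "tcp", 22),
]
```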
Yes, that is all, but it is huge. OVH is the only provider on the market offering permanent mitigation for all of its services. Permanent mitigation is very expensive because it uses VAC resources 24/7, even when there is no attack, whereas with auto-mitigation the client only uses the anti-DDoS protection during an attack.
As far as the service itself is concerned, they are exactly the same. Whether on the default offer or Pro, the client is protected against every attack, whatever its duration, size or type. The service includes alerts at the start and end of mitigation, along with mitigation reports showing traffic curves for what entered and what left the VAC. We also include a 10-line sample of the traffic that was dropped and accepted.
Our surplus network capacity is over 2 Tbps. We have three VACs in production, so we can handle up to 480 Gbps/480 Mpps.
Testing VAC1 and developing on the Tilera took three months. Once the first VAC was validated, we put the other two VACs into production. The whole project took five months.
We have been handling attacks at every level since OVH’s first day, 14 years ago, whether on the backbone, the dedicated servers or shared hosting. We have long experience with L7 protection of websites against all kinds of application-layer attacks. We develop the L3/L4 protection software ourselves and run it on the Tilera. To write an algorithm that runs up to 10 million times per second on each 10G Tilera port, you need to know how to code, but also how the Internet works down to the detail of each packet. Finally, if a client uses auto-mitigation and it is slow, poorly tuned or simply does not work very well, the client will not necessarily notice, since mitigation is only active during an attack, meaning a few minutes here and there. With permanent mitigation, we deal with very complicated cases, for many clients running a wide variety of services. This is experience our competitors lack, because they only offer auto-mitigation.
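A worked check of the "10 million times per second on each 10G port" figure, using standard Ethernet framing (our addition, not from the text): a 10 Gbps link full of minimum-size frames carries about 14.88 Mpps, and at 10 Mpps a packet processed sequentially has a budget of only 100 ns. Spreading packets over the Tilera’s many cores is what stretches that budget; how packets are distributed across cores is not described here.

```python
# Standard Ethernet framing overhead on the wire, in bytes.
MIN_FRAME = 64       # smallest Ethernet frame, including FCS
PREAMBLE_SFD = 8     # preamble + start-of-frame delimiter
INTER_FRAME_GAP = 12


def max_pps(link_bps: float, frame_bytes: int = MIN_FRAME) -> float:
    """Packets per second on a saturated link carrying frames of this size."""
    return link_bps / ((frame_bytes + PREAMBLE_SFD + INTER_FRAME_GAP) * 8)


def ns_per_packet(pps: float) -> float:
    """Time available per packet if packets are processed one after another."""
    return 1e9 / pps


if __name__ == "__main__":
    print(f"10G worst case: {max_pps(10e9):,.0f} pps")
    print(f"Budget at 10 Mpps: {ns_per_packet(10e6):.0f} ns")
```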
During the betas, we searched for the right settings and verified that the VAC worked as a whole. We then adjusted the settings for particular cases involving some clients and integrated them into the final configuration. We also fixed some flaws. The VAC now works perfectly well.
The VAC is paired with the API v6, which lets us automate mitigation. Adding new Firewall Network rules, or activating or deactivating mitigation on one IP address or on all of them, is very easy with the API v6. The Manager, in turn, builds on the functions available in the API v6 to make them usable in a few clicks.
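The article does not document the endpoints, so the sketch below only builds requests, assuming API v6 paths of the form `/ip/{ipBlock}/mitigation` with an `ipOnMitigation` body field (an assumption to verify in the API v6 console before use). Each tuple could then be sent with an authenticated client, for example the `ovh` Python wrapper’s `client.post(path, **body)` or `client.delete(path)`:

```python
from typing import Any, Callable, Dict, List, Optional, Tuple
from urllib.parse import quote

# ASSUMPTION: these paths are our reading of the API v6 layout for
# mitigation; check them against the API console before relying on them.
Request = Tuple[str, str, Optional[Dict[str, Any]]]


def _enc(value: str) -> str:
    # IP blocks contain "/", which must be percent-encoded inside a path.
    return quote(value, safe="")


def add_permanent_mitigation(ip_block: str, ip: str) -> Request:
    return ("POST", f"/ip/{_enc(ip_block)}/mitigation", {"ipOnMitigation": ip})


def remove_permanent_mitigation(ip_block: str, ip: str) -> Request:
    return ("DELETE", f"/ip/{_enc(ip_block)}/mitigation/{_enc(ip)}", None)


def for_all(ip_block: str, ips: List[str],
            builder: Callable[[str, str], Request]) -> List[Request]:
    """Build the same action for every listed IP of a block."""
    return [builder(ip_block, ip) for ip in ips]
```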
The current VAC provides unidirectional, or asymmetrical, protection, and some attacks require bidirectional, or symmetrical, protection.
With unidirectional protection, the VAC only handles packets in the “Internet towards the client’s server” direction and lets the server respond directly in the opposite direction. For L3/L4 protections, such as checking a packet’s checksum, this is perfect. At layer 7 (L7), however, we are no longer talking only about the IP packet’s header or the content of the first packet, but about a connection made up of tens of packets.
Sometimes, the decision as to whether it is an attack can only be made once the connection has been established. To mitigate at L7 unidirectionally, the VAC has to let the attack reach the client’s server in order to determine whether the connection is legitimate. The server’s resources are therefore tied up during the attack and only freed afterwards.
Bidirectional, or symmetrical, protection. We are currently setting up L7 Web and DNS protections that can be activated “on the go”, without any reconfiguration on the client’s server. They will block application attacks, for instance a DDoS on a site’s URL, or Slowloris, which consists of opening as many connections as possible to a server and then sending the requests very slowly. We know how to mitigate these kinds of attacks on our infrastructure and let only legitimate connections reach the server.
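A bidirectional device sitting in front of the server can defuse Slowloris by holding each connection’s request headers itself and only passing a request on once the headers are complete. The sketch below shows that generic buffering idea with a hypothetical deadline; it is not a description of OVH’s L7 implementation, and a real proxy would also cap buffer sizes and per-source connection counts:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

HEADER_DEADLINE = 10.0  # seconds allowed to send complete headers (assumed value)


@dataclass
class PendingRequest:
    opened_at: float
    buffer: bytearray = field(default_factory=bytearray)


class SlowlorisGuard:
    """Buffer request headers in front of the server; forward only complete ones."""

    def __init__(self, deadline: float = HEADER_DEADLINE) -> None:
        self.deadline = deadline
        self.pending: Dict[int, PendingRequest] = {}

    def on_open(self, conn_id: int, now: float) -> None:
        self.pending[conn_id] = PendingRequest(opened_at=now)

    def on_data(self, conn_id: int, data: bytes) -> Optional[bytes]:
        """Return the request head once the blank line ending it arrives, else None."""
        req = self.pending[conn_id]
        req.buffer += data
        if b"\r\n\r\n" in req.buffer:
            del self.pending[conn_id]
            return bytes(req.buffer)  # only now does the server see the request
        return None

    def expire(self, now: float) -> List[int]:
        """Close connections that did not finish their headers in time."""
        stale = [cid for cid, req in self.pending.items()
                 if now - req.opened_at > self.deadline]
        for cid in stale:
            del self.pending[cid]
        return stale
```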
The betas are underway, and we expect the service to be stable by the end of September and available for Web/SSL. We will then work on DNS and possibly other protocols.
We are indeed challenging the norms of this market by making anti-DDoS protection, usually a very expensive service, accessible to all of our clients. Once again, we did it by questioning established technologies and creating a completely different, innovative technical solution that better fits our clients’ needs in terms of price and mitigation capacity.
* ASIC (Application-Specific Integrated Circuit): an electronic circuit that integrates on a single chip all the active elements needed to build a function or an electronic assembly.
An ASIC can integrate many of the functions of a board, or replace it entirely, and can even add new functions in a smaller package.