Load balancing is a common solution for distributing web applications horizontally across multiple hosts while providing users with a single point of access to the service. HAProxy is one of the most popular pieces of open-source load balancing software, and it also offers high availability and proxy functionality.
HAProxy aims to optimise resource usage, maximise throughput, minimise response time, and avoid overloading any single resource. It is available for installation on many Linux distributions, such as the Ubuntu 16 used in this guide, as well as on Debian 8 and CentOS 7 systems.
HAProxy is particularly suited to very high traffic websites and is therefore often used to improve web service reliability and performance in multi-server configurations. This guide lays out the steps for setting up HAProxy as a load balancer on Ubuntu 16 on its own cloud host, which then directs the traffic to your web servers.
As a prerequisite for the best results, you should have a minimum of two web servers plus a server for the load balancer. The web servers need to be running at least a basic web service, such as Apache2 or nginx, so you can test the load balancing between them.
HAProxy is a fast-developing open-source application, so the version available for installation in the default Ubuntu repositories might not be the latest release. To find out which version is offered through the official channels, enter the following command.
sudo apt show haproxy
HAProxy always has three active stable releases: the two latest versions in development, plus a third, older version that still receives critical updates. You can check the currently newest stable version listed on the HAProxy website and then decide which version you wish to go with.
While the latest stable version 1.7 of HAProxy is not yet available in the package manager by default, it can be found in a third-party repository. To install HAProxy from an outside repository, you will need to add the new repository with the following command.
sudo add-apt-repository ppa:vbernat/haproxy-1.7
Confirm adding the new PPA by pressing the Enter key.
Next, update your sources list.
sudo apt update
Then install HAProxy as you normally would.
sudo apt install -y haproxy
Afterwards, you can double check the installed version number with the following command.
haproxy -v
HA-Proxy version 1.7.8-1ppa1~xenial 2017/07/09
Copyright 2000-2017 Willy Tarreau <willy@haproxy.org>
The installation is then complete. Continue below with the instructions on how to configure the load balancer to redirect requests to your web servers.
Configuring the load balancer
Setting up HAProxy for load balancing is quite a straightforward process. Basically, all you need to do is tell HAProxy what kind of connections it should be listening for and where they should be relayed to.
This is done by editing the configuration file /etc/haproxy/haproxy.cfg with the defining settings. You can read about the configuration options on the HAProxy documentation page if you wish to find out more.
Load balancing on layer 4
Once installed, HAProxy should already include a template for configuring the load balancer. Open the configuration file, for example using nano, with the command underneath.
sudo nano /etc/haproxy/haproxy.cfg
Add the following sections to the end of the file. Replace <server name> with whatever you want to call your servers on the statistics page, and <private IP> with the private IPs of the servers you wish to direct the web traffic to. You can check the private IPs in your UpCloud Control Panel on the Private network tab under the Network menu.
frontend http_front
   bind *:80
   stats uri /haproxy?stats
   default_backend http_back

backend http_back
   balance roundrobin
   server <server1 name> <private IP 1>:80 check
   server <server2 name> <private IP 2>:80 check
This defines a layer 4 load balancer with a frontend named http_front listening on port 80, which directs the traffic to the default backend named http_back. The additional stats uri /haproxy?stats enables the statistics page at that specified address.
Different load balancing algorithms
Configuring the servers in the backend section allows HAProxy to use those servers for load balancing, according to the roundrobin algorithm, whenever they are available.
The balancing algorithms are used to decide which server at the backend each connection is transferred to. Some of the useful options include the following:
Roundrobin: Each server is used in turns according to their weights. This is the smoothest and fairest algorithm when the servers’ processing time remains equally distributed. This algorithm is dynamic, which allows server weights to be adjusted on the fly.
Leastconn: The server with the lowest number of connections is chosen. Round-robin is performed between servers with the same load. Using this algorithm is recommended with long sessions, such as LDAP, SQL, TSE, etc, but it is not very well suited for short sessions such as HTTP.
First: The first server with available connection slots receives the connection. The servers are chosen from the lowest numeric identifier to the highest, which defaults to the server’s position on the farm. Once a server reaches its maxconn value, the next server is used.
Source: The source IP address is hashed and divided by the total weight of the running servers to designate which server will receive the request. This way the same client IP address will always reach the same server while the servers stay the same.
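As an illustration, switching the example backend from round robin to the least-connections algorithm with per-server weights could be sketched as below; the server names and private IP addresses are placeholders, not values from this guide.

```
backend http_back
   balance leastconn
   server server1 <private IP 1>:80 check weight 2
   server server2 <private IP 2>:80 check weight 1
```

With these weights, server1 would be preferred roughly twice as often when connection counts are otherwise equal.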
Configuring load balancing for layer 7
Another possibility is to configure the load balancer to work on layer 7, which is useful when parts of your web application are located on different hosts. This can be accomplished by making the connection transfer conditional, for example on the URL.
Open the HAProxy configuration file with a text editor.
sudo nano /etc/haproxy/haproxy.cfg
Then set the frontend and backend segments according to the example below.
frontend http_front
   bind *:80
   stats uri /haproxy?stats
   acl url_blog path_beg /blog
   use_backend blog_back if url_blog
   default_backend http_back

backend http_back
   balance roundrobin
   server <server name> <private IP>:80 check
   server <server name> <private IP>:80 check

backend blog_back
   server <server name> <private IP>:80 check
The frontend declares an ACL rule named url_blog that applies to all connections with paths that begin with /blog. The use_backend directive specifies that connections matching the url_blog condition should be served by the backend named blog_back, while all other requests are handled by the default backend.
On the backend side, the configuration sets up two server groups: http_back, like before, and a new one called blog_back, which specifically serves connections to example.com/blog.
After making the configurations, save the file and restart HAProxy with the next command.
sudo systemctl restart haproxy
If you get any errors or warnings at start up, check the configuration for any mistypes and then try restarting again.
Testing the setup
With the HAProxy configured and running, open your load balancer server’s public IP in a web browser and check that you get connected to your backend correctly. The parameter stats uri in the configuration enables the statistics page at the defined address.
http://<load balancer public IP>/haproxy?stats
When you load the statistics page and all of your servers are listed in green, your configuration was successful!
The statistics page contains some helpful information to keep track of your web hosts including up and down times and session counts. If a server is listed in red, check that the server is powered on and that you can ping it from the load balancer machine.
In case your load balancer does not reply, check that HTTP connections are not getting blocked by a firewall. Also, confirm that HAProxy is running with the command below.
sudo systemctl status haproxy
Password protecting the statistics page
However, with the statistics page simply listed in the frontend, it is publicly open for anyone to view, which might not be such a good idea. Instead, you can set it up on its own port number by adding the example below to the end of your haproxy.cfg file. Replace the username and password with something secure.
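The referenced listen section is not shown in this text; a minimal sketch follows, assuming port 8181 (the port this guide uses later) and placeholder credentials.

```
listen stats
   bind *:8181
   stats enable
   stats uri /
   stats realm HAProxy\ Statistics
   stats auth username:password
```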
After adding the new listen group, remove the old reference to the stats uri from the frontend group. When done, save the file and restart HAProxy again.
sudo systemctl restart haproxy
Then open the load balancer again with the new port number, and log in with the username and password you set in the configuration file.
http://<load balancer public IP>:8181
Check that your servers are still reporting all green and then open just the load balancer IP without any port numbers on your web browser.
http://<load balancer public IP>/
If your backend servers have at least slightly different landing pages you will notice that each time you reload the page you get the reply from a different host. You can try out different balancing algorithms in the configuration section or take a look at the full documentation.
Conclusions
Congratulations on successfully configuring HAProxy! With a basic load balancer setup, you can considerably increase your web application's performance and availability. This guide is, however, just an introduction to load balancing with HAProxy, which is capable of much more than could be covered in a first-time setup instruction. We recommend experimenting with different configurations with the help of the extensive documentation available for HAProxy before planning the load balancing for your production environment.
While using multiple hosts protects your web service with redundancy, the load balancer itself can still be a single point of failure. You can further improve high availability by setting up a floating IP between multiple load balancers. You can find out more about this in our article on Floating IPs on UpCloud.
http-server is a simple, zero-configuration command-line HTTP server. It is powerful enough for production usage, but it's simple and hackable enough to be used for testing, local development, and learning.
Installing globally:
Installation via npm. If you don't have npm yet:
curl https://npmjs.org/install.sh | sh
Once you have npm:
npm install http-server -g
This will install http-server globally so that it may be run from the command line.
Usage:
http-server [path] [options]
[path] defaults to ./public if the folder exists, and ./ otherwise.
Installing as a node app
mkdir myapp
cd myapp/
jitsu install http-server
If you do not have jitsu installed, you can install it via npm install jitsu -g
Usage
Starting http-server locally
node bin/http-server
Now you can visit http://localhost:8080 to view your server
Deploy http-server to nodejitsu
jitsu deploy
You will now be prompted for a subdomain to deploy your application on
Available Options:
-p Port to listen for connections on (defaults to 8080)
-a Address to bind to (defaults to '0.0.0.0')
-d Show directory listings (defaults to 'True')
-i Display autoIndex (defaults to 'True')
-e or --ext Default file extension (defaults to 'html')
-s or --silent In silent mode, log messages aren't logged to the console.
-h or --help Displays a list of commands and exits.
-c Set cache time (in seconds) for cache-control max-age header, e.g. -c10 for 10 seconds. To disable caching, use -c-1.
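As an illustration, several of these options can be combined on one command line; the ./public path and port 8081 below are examples, not values from this document.

```shell
# serve ./public on port 8081, bound to localhost only, with caching disabled
http-server ./public -p 8081 -a 127.0.0.1 -e html -c-1
```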
Cradle's API builds right on top of Node's async API. Every async method takes a callback as its last argument. The return value is an events.EventEmitter, so listeners can also be optionally added.
You can check if a database exists with the exists() method.
db.exists(function (err, exists) {
  if (err) {
    console.log('error', err);
  } else if (exists) {
    console.log('the force is with you.');
  } else {
    console.log('database does not exist.');
    db.create();
    /* populate design documents */
  }
});
If you want to get a specific revision for that document, you can pass it as the 2nd parameter to get().
Cradle is also able to fetch multiple documents if you have a list of ids, just pass an array to get:
db.get(['luke', 'vader'], function (err, doc) { ... });
Querying a view
db.view('characters/all', function (err, res) {
  res.forEach(function (row) {
    console.log('%s is on the %s side of the force.', row.name, row.force);
  });
});
With forEach you can access the key and value of each row using two parameters. An optional third parameter provides the id, as in this example.
db.view('characters/all', function (err, res) {
  res.forEach(function (key, row, id) {
    console.log('%s has view key %s.', row.name, key);
  });
});
To use view generation options, you can call the view() method with three parameters (viewname, options, callback):
db.view('characters/all', { group: true, reduce: true }, function (err, res) {
  res.forEach(function (row) {
    console.log('%s is on the %s side of the force.', row.name, row.force);
  });
});
Querying a row with a specific key
Let's suppose that you have a design document that you've created:
Suppose you want all the cars made by Ford with a model name of Rav4 or later (alphabetically sorted). In CouchDB you could query this view directly by making an HTTP request to:
In the options object you can also optionally specify whether or not to group and reduce the output. In this example reduce must be false, since there is no reduce function defined for the cars/byMakeAndModel view. With grouping and reducing, the options object would look like:
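Neither the HTTP request nor the options object is shown in this text; a sketch follows, assuming the hypothetical cars/byMakeAndModel view emits [make, model] keys and that the database is named cars_db:

```javascript
// equivalent HTTP request (database name assumed):
// GET /cars_db/_design/cars/_view/byMakeAndModel?startkey=["Ford","Rav4"]&endkey=["Ford",{}]
db.view('cars/byMakeAndModel', {
  startkey: ['Ford', 'Rav4'],
  endkey: ['Ford', {}],
  reduce: false // no reduce function is defined for this view
}, function (err, res) {
  // res contains the matching rows
});
```

For a view that does define a reduce function, the options object could instead carry { group: true, reduce: true }, as in the characters/all example above.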
Note that when saving a document this way, CouchDB overwrites the existing document with the new one. If you want to update only certain fields of the document, you have to fetch it first (with get), make your changes, then resave the modified document with the above method.
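The fetch-modify-resave cycle described above can be sketched as follows; the document id and the changed field are illustrative:

```javascript
db.get('luke', function (err, doc) {
  if (err) return console.log('error', err);
  doc.jedi = true; // change only what you need
  // resave the full, modified document along with its latest revision
  db.save('luke', doc._rev, doc, function (err, res) {
    if (err) console.log('error', err);
  });
});
```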
If you only want to update one or more attributes, and leave the others untouched, you can use the merge() method:
db.merge('luke', { jedi: true }, function (err, res) {
  // Luke is now a jedi,
  // but remains on the dark side of the force.
});
Note that we didn't pass a _rev; this only works because we previously saved a full version of 'luke', and the cache option is enabled.
bulk insertion
If you want to insert more than one document at a time, for performance reasons, you can pass an array to save():
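A bulk save might look like the sketch below; the documents themselves are illustrative:

```javascript
db.save([
  { name: 'Yoda', force: 'light' },
  { name: 'Vader', force: 'dark' }
], function (err, res) {
  // res holds one result (id and rev) per saved document
});
```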
Note: If you must use view generation options on your temporary view, you can use the three-parameter version of the temporaryView() method, similar to the one described above.
creating validation
When saving a design document, Cradle assumes you want to create a view; mention views explicitly to work around this.
db.save('_design/laws', {
  views: {},
  validate_doc_update: function (newDoc, oldDoc, usrCtx) {
    if (!/^(light|dark|neutral)$/.test(newDoc.force)) {
      throw { error: 'invalid value', reason: 'force must be dark, light, or neutral' };
    }
  }
});
removing documents (DELETE)
To remove a document, you call the remove() method, passing the latest document revision.
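For example, assuming doc is a previously fetched document:

```javascript
db.remove('luke', doc._rev, function (err, res) {
  // the document 'luke' has been deleted
});
```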
If remove is called without a revision, and the document was recently fetched from the database, it will attempt to use the cached document's revision, providing caching is enabled.
Or if you want to see changes since a specific sequence number:
db.changes({ since: 42 }, function (err, list) { ... });
The callback will receive the list of changes as an Array. If you want to include the affected documents, simply pass include_docs: true in the options.
Streaming
You can also stream changes by calling db.changes without the callback. This API uses the excellent follow library from IrisCouch:
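A sketch of the streaming form, built on the callback example above:

```javascript
var feed = db.changes({ since: 42 });
feed.on('change', function (change) {
  console.log(change); // one event per change in the database
});
```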
In this case, it returns an instance of follow.Feed, which behaves very similarly to node's EventEmitter API. For full documentation on the options available to you when monitoring CouchDB with .changes(), see the follow documentation.
Attachments
Cradle supports writing, reading, and removing attachments. The read and write operations can be either buffered or streaming.
Writing
You can buffer the entire attachment body and send it all at once as a single request. The callback function will fire after the attachment upload is complete or an error occurs.
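A buffered upload can be sketched as below, assuming doc is a previously saved document; the attachment name, type, and body are illustrative:

```javascript
db.saveAttachment(
  { id: doc._id, rev: doc._rev },
  {
    name: 'foo.txt',
    'Content-Type': 'text/plain',
    body: 'foo document body text' // the entire body, buffered in memory
  },
  function (err, reply) {
    if (err) console.dir(err);
    else console.dir(reply);
  }
);
```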
You can use a read stream to upload the attachment body rather than buffering the entire body first. The callback function will fire after the streaming upload completes or an error occurs.
Syntax
var doc = savedDoc // <some saved couchdb document which has an attachment>
var id = doc._id
var rev = doc._rev
var idAndRevData = {
  id: id,
  rev: rev
}
var attachmentData = {
  name: attachmentName, // something like 'foo.txt'
  'Content-Type': attachmentMimeType, // something like 'text/plain', 'application/pdf', etc.
  body: rawAttachmentBody // something like 'foo document body text'
}
var readStream = fs.createReadStream('/path/to/file/')
var writeStream = db.saveAttachment(idAndRevData, attachmentData, callbackFunction)
readStream.pipe(writeStream)
When the streaming upload is complete the callback function will fire
Example: Attach a pdf file with the name 'bar.pdf', located at path './data/bar.pdf', to an existing document.
var path = require('path')
var fs = require('fs')

// this document should already be saved in the couchdb database
var doc = {
  _id: 'fooDocumentID',
  _rev: 'fooDocumentRev'
}
var idData = {
  id: doc._id,
  rev: doc._rev
}

// this is the filename that will be used in couchdb;
// it can be different from your source filename if desired
var filename = 'bar.pdf'
var filePath = path.join(__dirname, 'data', 'bar.pdf')
var readStream = fs.createReadStream(filePath)

// note that there is no body field here since we are streaming the upload
var attachmentData = {
  name: filename,
  'Content-Type': 'application/pdf'
}
db.saveAttachment(idData, attachmentData, function (err, reply) {
  if (err) {
    console.dir(err)
    return
  }
  console.dir(reply)
}, readStream)
Reading
Buffered
You can buffer the entire attachment and receive it all at once. The callback function will fire after the download is complete or an error occurs; the second parameter in the callback will be the binary data of the attachment.
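A buffered read might be sketched as follows; the document id and attachment name are illustrative:

```javascript
db.getAttachment('fooDocumentID', 'foo.txt', function (err, reply) {
  if (err) return console.dir(err)
  // reply holds the attachment body, buffered entirely in memory
})
```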
You can stream the attachment as well. If the attachment is large, streaming can be useful to limit memory consumption. The callback function will fire once the download stream is complete. Note that there is only a single error parameter passed to the callback function: the error is null if no errors occurred, or an error object if there was an error downloading the attachment. There is no second parameter containing the attachment data as in the buffered read example.
Example: Say you want to read back an attachment that was saved with the name 'foo.txt'. However, the attachment foo.txt is very large, so you want to stream it to disk rather than buffer the entire file into memory.
var doc = ... // some saved document that has an attachment with name 'foo.txt'
var id = doc._id
var attachmentName = 'foo.txt'
var downloadPath = path.join(__dirname, 'foo_download.txt')
var writeStream = fs.createWriteStream(downloadPath)
var readStream = db.getAttachment(id, attachmentName, function (err) {
  // note no second reply parameter
  if (err) {
    console.dir(err)
    return
  }
  console.dir('download completed and written to file on disk at path', downloadPath)
})
readStream.pipe(writeStream)
Removing
You can remove uploaded attachments with an _id and an attachment name.
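For example, assuming doc is a saved document and the attachment name is illustrative:

```javascript
db.removeAttachment(doc._id, 'foo.txt', function (err, res) {
  // the attachment 'foo.txt' has been removed from the document
})
```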