Sunday, October 9, 2011

Tomcat: Clustering and Load Balancing with HAProxy under Ubuntu 10.04 - Part 3

Review


In the previous section, we implemented load balancing using HAProxy and session sharing among our Tomcat instances. In this section, we will examine the HAProxy configuration file in depth and set up its logging facilities.

Table of Contents


  1. Setting-up the Environment
    • Download Tomcat
    • Configure Tomcat
    • Run Tomcat
    • Download HAProxy
    • Configure HAProxy
  2. Load Balancing
    • Default Setup
    • Sharing Sessions
    • Configure Tomcat to Share Sessions
    • Retest Session Sharing
    • Session Sharing Caveat
    • Sharing Sessions
  3. HAProxy Configuration
    • Configuration File
    • Logging

HAProxy Configuration


When it comes to HAProxy configuration, the best source of information is its online documentation at http://haproxy.1wt.eu/#docs. It's one massive text file of technical information though.

Configuration File


Not all of the information in that document applies to our configuration. Therefore, I have copied only the relevant parts and pasted them as comments after each directive:

global
log 127.0.0.1 local0
log 127.0.0.1 local1 notice
#Adds a global syslog server. Up to two global servers can be defined. They
#will receive logs for startups and exits, as well as all logs from proxies
#configured with "log global". An optional level can be specified to filter
#outgoing messages. By default, all messages are sent.
#An IPv4 address optionally followed by a colon and a UDP port. If
#no port is specified, 514 is used by default (the standard syslog port).
maxconn 4096
#Sets the maximum per-process number of concurrent connections to <number>. It
#is equivalent to the command-line argument "-n". Proxies will stop accepting
#connections when this limit is reached. The "ulimit-n" parameter is
#automatically adjusted according to this value. See also "ulimit-n"
uid 99
#Changes the process' user ID to <number>. It is recommended that the user ID
#is dedicated to HAProxy or to a small set of similar daemons. HAProxy must
#be started with superuser privileges in order to be able to switch to another
#one. See also "gid" and "user".
gid 99
#Changes the process' group ID to <number>. It is recommended that the group
#ID is dedicated to HAProxy or to a small set of similar daemons. HAProxy must
#be started with a user belonging to this group, or with superuser privileges.
#See also "group" and "uid".
daemon
#Makes the process fork into background. This is the recommended mode of
#operation. It is equivalent to the command line "-D" argument. It can be
#disabled by the command line "-db" argument.
#debug
#NO NEED TO ENABLE - krams
#Enables debug mode which dumps to stdout all exchanges, and disables forking
#into background. It is the equivalent of the command-line argument "-d". It
#should never be used in a production configuration since it may prevent full
#system startup.
#quiet
#NO NEED TO ENABLE - krams
#Do not display any message during startup. It is equivalent to the command-
#line argument "-q".
defaults
log global
#Enable per-instance logging of events and traffic.
#global should be used when the instance's logging parameters are the
#same as the global ones. This is the most common usage. "global"
#replaces <address>, <facility> and <level> with those of the log
#entries found in the "global" section. Only one "log global"
#statement may be used per instance, and this form takes no other
#parameter.
mode http
#Set the running mode or protocol of the instance
#The instance will work in HTTP mode. The client request will be
#analyzed in depth before connecting to any server. Any request
#which is not RFC-compliant will be rejected. Layer 7 filtering,
#processing and switching will be possible. This is the mode which
#brings HAProxy most of its value.
option httplog
#Enable logging of HTTP request, session state and timers
option dontlognull
#Enable or disable logging of null connections
retries 3
#Set the number of retries to perform on a server after a connection failure
option redispatch
#Enable or disable session redistribution in case of connection failure
maxconn 2000
#Fix the maximum number of concurrent connections on a frontend
#This value should not exceed the global maxconn
contimeout 5000
#Set the maximum time to wait for a connection attempt to a server to succeed.
clitimeout 50000
#Set the maximum inactivity time on the client side.
#An unspecified timeout results in an infinite timeout, which
#is not recommended. Such a usage is accepted and works but reports a warning
#during startup because it may results in accumulation of expired sessions in
#the system if the system's timeouts are not configured either.
srvtimeout 50000
#Set the maximum inactivity time on the server side.
#balance roundrobin
#NO NEED TO ENABLE. IT'S THE DEFAULT - krams
#The load balancing algorithm of a backend is set to roundrobin when no other
#algorithm, mode nor option have been set
frontend http-in
bind *:80
#Define one or several listening addresses and/or ports in a frontend
default_backend servers
#Specify the backend to use when no "use_backend" rule has been matched
backend servers
option httpchk OPTIONS /
#Enable HTTP protocol to check on the servers health
option forwardfor
#Enable insertion of the X-Forwarded-For header to requests sent to servers
#Since HAProxy works in reverse-proxy mode, the servers see its IP address as
#their client address. This is sometimes annoying when the client's IP address
#is expected in server logs. To solve this problem, the well-known HTTP header
#"X-Forwarded-For" may be added by HAProxy to all requests sent to the server.
stats enable
#Enable statistics reporting with default settings
stats refresh 10s
#Enable statistics with automatic refresh
stats hide-version
#Enable statistics and hide HAProxy version reporting
stats scope .
# Enable statistics and limit access scope
stats uri /admin?stats
#Enable statistics and define the URI prefix to access them
stats realm Haproxy\ Statistics
#Enable statistics and set authentication realm
#<realm> is the name of the HTTP Basic Authentication realm reported to
#the browser. The browser uses it to display it in the pop-up
#inviting the user to enter a valid username and password.
stats auth admin:pass
#Enable statistics with authentication and grant access to an account
cookie JSESSIONID prefix
#Enable cookie-based persistence in a backend
#server <name> <address>[:port] [param*]
#Please refer to section 5 for more details.
server tomcat1 127.0.0.1:8080 cookie JSESSIONID_SERVER_1 check inter 5000
server tomcat2 127.0.0.1:8180 cookie JSESSIONID_SERVER_2 check inter 5000
#Declare a server in a backend
#server <name> <address>[:port] [param*]
#<param*> is a list of parameters for this server. The "server" keywords
#accepts an important number of options and has a complete section
#dedicated to it. Please refer to section 5 for more details.

Take note of the following parts:
  • frontend http-in: Declares a frontend named http-in. Together with bind *:80, it tells HAProxy to listen for HTTP requests on port 80
  • default_backend servers: Sends requests to the backend named servers, which holds our set of Tomcat servers
  • stats uri /admin?stats: This is the URL of the stats page, relative to your hostname
  • stats realm Haproxy\ Statistics: This is the realm name shown in the login prompt when you log in to the stats page
  • server tomcat1 127.0.0.1:8080 cookie JSESSIONID_SERVER_1 check inter 5000: Defines a server, in this case a Tomcat instance. Here we assign its IP address and port number, the cookie value that identifies it, and a health check every 5000 ms
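To see how the parts listed above fit together, here is a minimal Python sketch that groups the directives of a haproxy.cfg under their section headers and lists the servers of a backend. It is only an illustration, not how HAProxy itself parses the file: the function names (parse_sections, list_servers) and the SAMPLE text are made up for this example, and the parser only understands the simple layout used in this tutorial.

```python
# Section keywords that start a new block in haproxy.cfg
SECTION_KEYWORDS = {"global", "defaults", "frontend", "backend", "listen"}


def parse_sections(cfg_text):
    """Group directives under the section header that precedes them."""
    sections = {}
    current = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        words = line.split()
        if words[0] in SECTION_KEYWORDS:
            current = " ".join(words)  # e.g. "backend servers"
            sections[current] = []
        elif current is not None:
            sections[current].append(words)
    return sections


def list_servers(sections, backend):
    """Return (name, address, cookie value) for each server in a backend."""
    servers = []
    for words in sections.get("backend " + backend, []):
        if words[0] == "server":
            cookie = None
            if "cookie" in words:
                cookie = words[words.index("cookie") + 1]
            servers.append((words[1], words[2], cookie))
    return servers


# A trimmed-down copy of the configuration discussed above
SAMPLE = """
frontend http-in
bind *:80
default_backend servers
backend servers
cookie JSESSIONID prefix
server tomcat1 127.0.0.1:8080 cookie JSESSIONID_SERVER_1 check inter 5000
server tomcat2 127.0.0.1:8180 cookie JSESSIONID_SERVER_2 check inter 5000
"""

if __name__ == "__main__":
    for server in list_servers(parse_sections(SAMPLE), "servers"):
        print(server)
```

Running it against your own file (read with open("/etc/haproxy/haproxy.cfg").read()) is a quick way to confirm which servers a backend really contains.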

HAProxy Logging


Logging is crucial in any serious application, and HAProxy has facilities to log its activities.
However, setting it up requires extra effort: to enable logging in HAProxy we need to understand
Linux's logging facilities (the Syslog server) and take into account the Syslog implementation in Ubuntu Lucid (10.04), which is rsyslog.

What is Syslog?
syslog is a utility for tracking and logging all manner of system messages from the merely informational to the extremely critical. Each system message sent to the syslog server has two descriptive labels associated with it that makes the message easier to handle. - Source: Quick HOWTO : Ch05 : Troubleshooting Linux with syslog

To enable logging, we need to:
  • Add a logging facility in haproxy.cfg
  • Add the logging facility to Syslog server

Add a logging facility in haproxy.cfg
Edit the haproxy.cfg file:
sudo gedit /etc/haproxy/haproxy.cfg

And declare the following:
global
log 127.0.0.1 local0
log 127.0.0.1 local1 notice


We declared two log targets under the global section. Both send their output to the Syslog server located at 127.0.0.1. Since no port is given, the default UDP port 514 is used. Each target is tagged with its own syslog facility: local0 and local1.

Why are they named such? These are local facilities reserved for the user to log specific daemons (see What is LOCAL0 through LOCAL7 ?).

Remember that an optional level can be specified to filter outgoing messages. Hence, local1 has an extra argument: notice. This means local1 only receives messages of notice level or more severe (warnings, errors and so on), whereas local0 receives everything, including info and debug messages.
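To make the facility and level idea concrete, here is a small Python sketch. It assumes the standard syslog numbering (local0 is facility 16, local1 is 17, severities run from emerg = 0 to debug = 7) and reads the level argument as a threshold: a message is sent only if it is at least as severe as the given level. The function names are made up for this example.

```python
# Standard syslog facility codes for the two facilities used in haproxy.cfg
FACILITIES = {"local0": 16, "local1": 17}

# Standard syslog severities, from most severe (0) to least severe (7)
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]


def pri(facility, severity):
    """Priority value a syslog message carries: facility * 8 + severity."""
    return FACILITIES[facility] * 8 + SEVERITIES.index(severity)


def passes_level(severity, max_level=None):
    """True if a message of this severity gets through the level filter.

    max_level=None models a log line without a level (everything is sent).
    """
    if max_level is None:
        return True
    return SEVERITIES.index(severity) <= SEVERITIES.index(max_level)


if __name__ == "__main__":
    for severity in SEVERITIES:
        print(severity,
              "local0:", passes_level(severity),
              "local1:", passes_level(severity, "notice"))
```

With this reading, an err message reaches both local0 and local1, while an info message reaches local0 only.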

Reload haproxy by running the following command:
sudo haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)

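This command does not stop HAProxy and start it again from scratch. It starts a new process with the updated configuration and tells the old one (the -sf argument) to stop listening and exit once its existing connections have finished. This is good because you won't be killing active connections. If you get an error, e.g. a missing /var/run/haproxy.pid file, just kill the haproxy process and start it again:
kill -9 #####
where ##### is the process id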
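If you reload often, it can help to check the configuration file before touching the running process. The following Python sketch builds the same reload command as above, preceded by a syntax check with haproxy -c (HAProxy's check mode). The function names and default paths are assumptions for this example, and the script has to be run with sufficient privileges (e.g. via sudo).

```python
import subprocess


def check_command(cfg):
    """Command that only validates the configuration file (haproxy -c)."""
    return ["haproxy", "-c", "-f", cfg]


def reload_command(cfg, pidfile, old_pids):
    """Command that starts a new process and soft-stops the old ones (-sf)."""
    cmd = ["haproxy", "-f", cfg, "-p", pidfile]
    if old_pids:
        cmd += ["-sf"] + [str(pid) for pid in old_pids]
    return cmd


def read_pids(pidfile):
    """Read the PIDs stored in the pid file; empty list if it is missing."""
    try:
        with open(pidfile) as handle:
            return [int(token) for token in handle.read().split()]
    except FileNotFoundError:
        return []


def safe_reload(cfg="/etc/haproxy/haproxy.cfg",
                pidfile="/var/run/haproxy.pid"):
    """Reload only if the configuration passes the syntax check."""
    if subprocess.call(check_command(cfg)) != 0:
        return False  # keep the running process untouched
    return subprocess.call(
        reload_command(cfg, pidfile, read_pids(pidfile))) == 0


if __name__ == "__main__":
    print("reloaded" if safe_reload() else "reload skipped or failed")
```

The point of the design is simply that a typo in haproxy.cfg is caught while the old process is still serving requests.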


Add the logging facility to Syslog server
There are two solutions to achieve this.

Solution #1
a. Run
sudo gedit /etc/rsyslog.conf

And declare the following lines at the end:
# Custom log facilities for haproxy
local0.* /var/log/haproxy0a.log
local1.* /var/log/haproxy1a.log
# load the imudp module for rsyslog
# provides UDP syslog reception
$ModLoad imudp
# local IP address (or name) the UDP listener should bind to
# this must be set before $UDPServerRun, otherwise it has no effect
$UDPServerAddress 127.0.0.1
# start UDP server on this port
$UDPServerRun 514

b. Restart the syslog server by running:
sudo restart rsyslog

Solution #2
Instead of editing rsyslog.conf directly, we can declare a separate configuration file under the /etc/rsyslog.d/ directory. If you inspect rsyslog.conf carefully, you will see the following lines:

#
# Include all config files in /etc/rsyslog.d/
#
$IncludeConfig /etc/rsyslog.d/*.conf

This setting loads all *.conf files under the /etc/rsyslog.d/ directory.

a. Run
sudo gedit /etc/rsyslog.d/haproxy.conf

And declare the following lines:
# Custom log facilities for haproxy
local0.* /var/log/haproxy0a.log
local1.* /var/log/haproxy1a.log
# load the imudp module for rsyslog
# provides UDP syslog reception
$ModLoad imudp
# local IP address (or name) the UDP listener should bind to
# this must be set before $UDPServerRun, otherwise it has no effect
$UDPServerAddress 127.0.0.1
# start UDP server on this port
$UDPServerRun 514


b. Restart the syslog server by running:
sudo restart rsyslog
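To verify the Syslog side without waiting for HAProxy traffic, you can send a test message yourself. The Python sketch below sends a single UDP datagram to 127.0.0.1:514 tagged with facility local0 (code 16) and severity info (code 6), assuming rsyslog accepts the simple "<PRI>tag: text" format. The tag haproxy-test and the function names are made up for this example.

```python
import socket


def build_syslog_datagram(facility_code, severity_code, tag, text):
    """Build a minimal syslog datagram: <PRI>tag: text."""
    pri = facility_code * 8 + severity_code
    return "<{}>{}: {}".format(pri, tag, text).encode("utf-8")


def send_test_message(text, host="127.0.0.1", port=514,
                      facility_code=16, severity_code=6):
    """Send one UDP syslog message; defaults target local0 at info level."""
    datagram = build_syslog_datagram(
        facility_code, severity_code, "haproxy-test", text)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram, (host, port))
    return datagram


if __name__ == "__main__":
    send_test_message("hello from the logging test")
```

If the configuration above is active, the message should show up in /var/log/haproxy0a.log. Pass facility_code=17 to test local1 instead.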

Overflowing Logs


We've set up HAProxy logging. We can see the logs in the /var/log/haproxy0a.log and /var/log/haproxy1a.log files. However, we also see them in /var/log/syslog.

This is bad because now we have redundant logs that just eat up space. You don't want syslog to be polluted with HAProxy logs. That's the reason why we've set up a separate logging facility in the first place.

There are two ways to prevent this unwanted overflow:

Solution #1
a. Run
sudo gedit /etc/rsyslog.d/50-default.conf

And search for the following lines (right after the introductory comments):
auth,authpriv.* /var/log/auth.log
*.*;auth,authpriv.none -/var/log/syslog

And change them as follows:
auth,authpriv.* /var/log/auth.log
*.*;auth,authpriv,local0,local1.none -/var/log/syslog

This means messages from local0 and local1 are excluded from /var/log/syslog.

b. Restart the syslog server by running:
sudo restart rsyslog

Solution #2
a. Run
sudo gedit /etc/rsyslog.conf

And find the following lines:
# Custom log facilities for haproxy
local0.* /var/log/haproxy0a.log
local1.* /var/log/haproxy1a.log

And change them as follows:
# Custom log facilities for haproxy
local0.* -/var/log/haproxy0a.log
& ~
local1.* -/var/log/haproxy1a.log
& ~

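The addition of & ~ discards a message once it has been written, which prevents the logs sent to local0 and local1 from overflowing to other log files. Keep in mind that rsyslog processes its rules in order, so the discard only affects rules that come after it: these lines must appear before the rule that writes to /var/log/syslog (on Ubuntu, that rule is loaded from /etc/rsyslog.d/50-default.conf by the $IncludeConfig line).

Note: If you can't find those lines, maybe you've declared your configuration under /etc/rsyslog.d/haproxy.conf. If yes, follow the same steps there. Because the files in /etc/rsyslog.d/ are loaded in alphabetical order, the file name must then sort before 50-default.conf (for example, 49-haproxy.conf) for the discard to take effect.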

b. Restart the syslog server by running:
sudo restart rsyslog
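Why does the position of the & ~ lines matter? The Python sketch below is a deliberately simplified model, not rsyslog itself: it assumes rules are evaluated from top to bottom and that a discard action stops all further processing of the matching message. The function names and the rule lists are made up for this example.

```python
DISCARD = "~"


def is_local0(facility):
    """Models the selector local0.*"""
    return facility == "local0"


def not_auth(facility):
    """Models the selector *.*;auth,authpriv.none"""
    return facility not in ("auth", "authpriv")


def route(facility, rules):
    """Return the files a message is written to under this simplified model."""
    written = []
    for selector, action in rules:
        if not selector(facility):
            continue
        if action == DISCARD:
            break  # the message is dropped, later rules never see it
        written.append(action)
    return written


# local0.* -/var/log/haproxy0a.log followed by & ~
HAPROXY_RULES = [(is_local0, "/var/log/haproxy0a.log"), (is_local0, DISCARD)]
# *.*;auth,authpriv.none -/var/log/syslog
SYSLOG_RULE = [(not_auth, "/var/log/syslog")]

if __name__ == "__main__":
    print("haproxy rules first:", route("local0", HAPROXY_RULES + SYSLOG_RULE))
    print("haproxy rules last: ", route("local0", SYSLOG_RULE + HAPROXY_RULES))
```

In this model the HAProxy message stays out of /var/log/syslog only when the HAProxy rules come first. Messages from other facilities are not affected either way.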

Rotate Logs


We've set up HAProxy logging. We've kept the logs from overflowing to syslog. However, there's another problem. The HAProxy logs will soon pile up and consume precious disk space. Fortunately, Linux has a way to rotate log files on a schedule, keep only a fixed number of old copies, and compress them.

For more info on log rotation in Linux, please see Quick HOWTO : Ch05 : Troubleshooting Linux with syslog: Logrotate.

Again, there are two ways of handling this requirement:

Solution #1
a. Run
sudo gedit /etc/logrotate.d/haproxy

And add the following lines:
/var/log/haproxy*.log
{
rotate 4
weekly
missingok
notifempty
compress
delaycompress
sharedscripts
postrotate
reload rsyslog >/dev/null 2>&1 || true
endscript
}

b. Restart the syslog server by running:
sudo restart rsyslog
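To get a feel for what the logrotate settings above produce on disk, here is a small Python sketch that lists the files you would expect after a number of weekly rotations. It assumes logrotate's default numbering (.1 is the newest old log), that every rotation actually happens (notifempty skips empty logs), and that delaycompress leaves the newest rotated file uncompressed for one cycle. The function name is made up for this example.

```python
def files_after_rotations(base, rotations, keep=4, delaycompress=True):
    """List the log files expected on disk after a number of rotations.

    keep corresponds to "rotate 4": at most four old logs are retained.
    """
    files = [base]  # the active log file
    for number in range(1, min(rotations, keep) + 1):
        if delaycompress and number == 1:
            # the most recent rotated log is compressed on the next cycle
            files.append("{}.{}".format(base, number))
        else:
            files.append("{}.{}.gz".format(base, number))
    return files


if __name__ == "__main__":
    for name in files_after_rotations("/var/log/haproxy0a.log", 6):
        print(name)
```

However many weeks pass, this model never lists more than the active log plus four old ones, which is the whole point of rotating.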

Solution #2
The official rsyslog documentation describes how to do log rotation with rsyslog itself. This is something I haven't tried yet, but if you're willing to experiment, here's the link: http://www.rsyslog.com/doc/log_rotation_fix_size.html. This technique uses output channels.

However, read the following notes:
Output Channels are a new concept first introduced in rsyslog 0.9.0. As of this writing, it is most likely that they will be replaced by something different in the future. So if you use them, be prepared to change you configuration file syntax when you upgrade to a later release.
- http://www.rsyslog.com/doc/rsyslog_conf_output.html

References


The following is a compendium of references that I found interesting to read further:

R: What is LOCAL0 through LOCAL7 ?
L: http://www.linuxquestions.org/questions/linux-security-4/what-is-local0-through-local7-310637/

R: Quick HOWTO : Ch05 : Troubleshooting Linux with syslog
L: http://www.linuxhomenetworking.com/wiki/index.php/Quick_HOWTO_:_Ch05_:_Troubleshooting_Linux_with_syslog

R: rsyslog official site
L: http://www.rsyslog.com/doc/rsyslog_conf.html

R: rsyslog.conf configuration file
L: http://www.rsyslog.com/doc/rsyslog_conf.html

R: UDP Syslog Input Module
L: http://www.rsyslog.com/doc/imudp.html

R: How to keep haproxy log messages out of /var/log/syslog
L: http://serverfault.com/questions/214312/how-to-keep-haproxy-log-messages-out-of-var-log-syslog

R: HAProxy Logging in Ubuntu Lucid
L: http://kevin.vanzonneveld.net/techblog/article/haproxy_logging/

R: Install and configure haproxy, the software based loadbalancer in Ubuntu
L: http://linuxadminzone.com/install-and-configure-haproxy-the-software-based-loadbalancer-in-ubuntu/

Conclusion


That's it. We've completed our study of HAProxy and Tomcat clustering. We've learned how to set up and configure load balancing and how to handle failover. We've also learned the important points to watch when enabling session sharing, and we've studied HAProxy's configuration and logging facilities.

If you want to learn more about web development and integration with other technologies, feel free to read my other tutorials in the Tutorials section.
