About RADIUS service, and how it works

Q: We have some customers who report that they are not able to log on to our hotspot service because the login page is not displaying.  We see some radius errors in the logs and suspect that there may be some problem with the radius service.


A: Radius logon only ever acts as an 'authentication referrer' - it is not involved in any way with maintaining an online session.  It works like this:

1. When a user wants to log on, the credentials are sent to the login device (e.g. pppoe server or hotspot server).
2. The login device sends a request to the radius server: "the user sent me these credentials, what should I do about him?"
3. The radius server replies with 'allow' or 'deny', and if 'allow' it also sends a set of configuration parameters for the session, for example "allow this user to connect, set maximum up/down speeds thus, and kick him off when he exceeds that quota".
4. After the user is allowed to connect, the radius server is not involved in any further maintenance of the session.
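
To make the four steps concrete, here is a minimal Python sketch of the exchange.  It is a simulation only, not real radius code: the user store, the attribute names (`rate_limit`, `quota_bytes`) and both function names are made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory user store; a real radius server would use a database.
USERS = {
    "alice": {"password": "s3cret", "rate_limit": "2M/10M", "quota_bytes": 5_000_000_000},
}


@dataclass
class RadiusReply:
    accept: bool
    attributes: dict = field(default_factory=dict)


def radius_authenticate(username: str, password: str) -> RadiusReply:
    """Steps 2-3: answer 'allow' or 'deny', plus session parameters on 'allow'."""
    user = USERS.get(username)
    if user is None or user["password"] != password:
        return RadiusReply(accept=False)
    return RadiusReply(
        accept=True,
        attributes={"rate_limit": user["rate_limit"], "quota_bytes": user["quota_bytes"]},
    )


def login_device_handle_logon(username: str, password: str, sessions: dict) -> bool:
    """Steps 1 and 4: the login device asks radius once, then owns the session."""
    reply = radius_authenticate(username, password)
    if not reply.accept:
        return False
    # From here on the session lives only on the login device.
    sessions[username] = dict(reply.attributes, bytes_used=0)
    return True
```

Note that `sessions` belongs to the login device; nothing in the radius function holds any state about who is online.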

In fact, the radius server can crash and burn, and the user will stay connected until logged off by whatever means.  The session can disconnect for a variety of reasons, but it is always the actual logon device that decides to drop the session, for reasons including:

a. the user requests logout
b. the time/download/upload limit (perhaps set by the radius response at login) is exceeded
c. the client drops off the network
d. the inactivity time (if defined) is exceeded
e. the admin forces session termination

etc.
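
As a rough sketch of that decision, the check below runs entirely on the login device and never contacts radius.  The session fields and the reason labels are hypothetical names chosen for this example, not what any particular device uses.

```python
from typing import Optional


def termination_reason(session: dict, now: float) -> Optional[str]:
    """Return why the login device should drop the session, or None to keep it."""
    if session.get("logout_requested"):              # a. user requests logout
        return "user-request"
    quota = session.get("quota_bytes")               # b. limit set at login time
    if quota is not None and session["bytes_used"] > quota:
        return "quota-exceeded"
    if not session.get("client_reachable", True):    # c. client dropped off
        return "client-lost"
    idle = session.get("idle_timeout")               # d. inactivity, if defined
    if idle is not None and now - session["last_activity"] > idle:
        return "idle-timeout"
    if session.get("admin_kick"):                    # e. admin forces termination
        return "admin-reset"
    return None
```

The quota in (b) may have arrived in the radius reply at login, but it is still the login device that enforces it.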

There is one other function of radius: accounting.

Radius accounting works in a similar manner, in that the login server only reports the information to radius at regular intervals (as configured at the login server itself - look for 'accounting update interval', typically set to somewhere between 15 and 60 minutes).  It works something like this:

1. When the user first logs on (perhaps as a result of radius authentication) the logon server sends a 'session start' report.
2. The session start report contains the username and all the relevant details of the session - what IP address, time of session start, what server name, what username, and so on.
3. If accounting updates are enabled, then each time the defined interval passes, the login server sends an 'accounting update' report that repeats all the relevant session info, and also includes how long the user has been online and how much data has been downloaded/uploaded since the login.
4. When the session disconnects, the 'session stop' report is sent to radius and includes the final details of online time and data transfer, as well as other useful info including /why/ the session was terminated.

Radius takes all these reports and stores the data in an 'accounting table'.  This accounting table is what you are looking at when you view 'usage report' in the duxAdmin.
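
A minimal sketch of that bookkeeping might look like the following.  The report fields and the function name are assumptions made for illustration; the real accounting table has its own schema.

```python
def record_report(table: dict, report: dict) -> str:
    """File one accounting report; the return value is all radius ever sends back."""
    # One row per session. An update or stop can create the row if the
    # start report never arrived, since it repeats the session details.
    row = table.setdefault(report["session_id"], {"stop_reason": None})
    row["username"] = report["username"]
    row["ip"] = report["ip"]
    # Counters are totals since login, so they overwrite rather than add up.
    row["online_seconds"] = report.get("online_seconds", 0)
    row["bytes_down"] = report.get("bytes_down", 0)
    row["bytes_up"] = report.get("bytes_up", 0)
    if report["type"] == "stop":
        row["stop_reason"] = report.get("reason")
    return "ack"
```

Feeding it a start, an update and a stop for the same session id leaves a single row holding the final totals and the stop reason.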

Note that once again, radius has absolutely no involvement in the session other than taking what the login server reports, and storing it in the database.  It never returns anything to the login server other than an acknowledgement that the report was received and processed.  Sometimes, the login server does not know if the report was received, and so it will report it in the logs as a 'warning' message - those are the blue log lines that you show in your snapshots.

There are several reasons why this might happen.

Since radius is UDP based, there is no actual 'connection' between the login server and radius - the protocol is more like 'post mail' than a telephone call.  With a phone call, if anything happens with the communication you know about it immediately - the phone line goes dead.

With post, you write a letter to the correspondent and ask for acknowledgement by reply post.  You know it usually takes one day for your letter to get there, that it may take your correspondent 24 hours to respond, and that it will take about one day for the reply to be delivered.  Thus if you do not receive your reply letter after 3-4 days, then you can assume that either:

a. the letter was lost on the way to your correspondent - he never received it
b. your correspondent had some problem processing your request
c. the reply letter was lost on the way back to you.

Which of these is what actually happened?  You have no way of knowing.

Obviously, the third outcome is not as important - so long as the message was received and processed, do you care that you didn't receive the confirmation?

To make your best effort to be sure that the data is received and processed, you can configure your logon server to re-send the radius report if no reply is received within some defined time.  How long to make that delay will depend a lot on how long it usually takes to receive a reply.  Snail mail correspondence with a colleague in Baghdad will probably take a lot longer than with a colleague across town! ;-)
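
The re-send logic can be sketched like this.  It is a simulation: `send_once` stands in for "send the report and wait up to the time-out for a reply", and `DropFirst` is a made-up stand-in for a lossy network.  Neither is part of any real radius client.

```python
from typing import Callable, Optional


def send_with_retries(send_once: Callable[[str], Optional[str]],
                      report: str, max_attempts: int) -> Optional[int]:
    """Return the attempt number that got a reply, or None if all timed out."""
    for attempt in range(1, max_attempts + 1):
        reply = send_once(report)  # None means "no reply before the time-out"
        if reply is not None:
            return attempt
    return None


class DropFirst:
    """Pretend transport that loses the first n exchanges, then answers."""

    def __init__(self, n: int) -> None:
        self.remaining = n

    def __call__(self, report: str) -> Optional[str]:
        if self.remaining > 0:
            self.remaining -= 1
            # Lost request, failed processing and lost reply all look
            # exactly the same from the sender's side.
            return None
        return "ack"
```

With `DropFirst(2)` and five attempts allowed the report gets through on the third try; with `DropFirst(9)` and three attempts the sender gives up, which is when the login server would log its warning.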

Also, if your correspondent is busy replying to potentially hundreds or thousands of letters a day, it might take a lot longer to respond than someone who deals with just a couple per day.

Looking under the 'status' tab of your radius server entry in the winbox tool, you can see the round-trip time of the most recent packet, in milliseconds.  That time covers the full round trip: transit of your report to radius, time for the radius server to process the request, and time for the response to arrive back at the login server.

Watch that value over a period of time - perhaps write a script to keep a running log, and calculate the max, min and average times.  Set the 'radius time-out' value accordingly so that after the time-out is reached, you can be relatively sure that the accounting report was lost.
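
Once you have a list of logged round-trip times, the arithmetic is simple.  The Python sketch below assumes the samples are already collected in milliseconds; the `safety_factor` of 2 is purely my own illustrative margin, not a recommended value.

```python
from statistics import mean


def summarize_rtt(samples_ms: list) -> dict:
    """Min, max and average of the logged round-trip times."""
    return {"min": min(samples_ms), "max": max(samples_ms), "avg": mean(samples_ms)}


def suggest_timeout_ms(samples_ms: list, safety_factor: float = 2.0) -> float:
    """Time-out comfortably above the slowest reply seen so far (assumed margin)."""
    return max(samples_ms) * safety_factor
```

For samples of 20, 35 and 50 ms this gives an average of 35 ms and a suggested time-out of 100 ms.  The longer you log, the more likely the maximum reflects your busy periods.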

If you set the time too short, then the radius server might start to receive duplicate reports!  This is not usually a problem for the validity of the data (radius is smart enough to detect and deal with duplicates), but it can place undue load on the radius server and can even cause escalating congestion on the server - more packets arriving increases response time, and the increase in response time causes more packets to arrive! :-)
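
To put rough numbers on that, here is a simplified model of how many copies of one report reach the server when the time-out is shorter than the response time.  It assumes a fixed response time and a sender that re-sends at every time-out until the first reply arrives or it runs out of retries; real traffic is messier.

```python
import math


def packets_per_report(response_ms: float, timeout_ms: float, max_retries: int) -> int:
    """Copies of one report sent before the first reply arrives (simplified model)."""
    # Sends happen at 0, timeout, 2*timeout, ... while no reply has arrived yet.
    copies = max(1, math.ceil(response_ms / timeout_ms))
    return min(copies, 1 + max_retries)
```

With a 100 ms time-out, a server answering in 80 ms sees one packet per report, while one answering in 250 ms sees three.  Triple the packets means more load, which pushes the response time up further.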

All up, you will note that there is nothing in this that will force a connection to drop - therefore, if there is a problem with sessions failing, then it is highly unlikely to be related to radius.

Hope that helps clarify the radius situation - seems like those problems are something else.  Can we get some idea of the time-brackets where this login page problem is happening?