TGNS servers need serious work



Agamemnon
01-18-2008, 08:45 PM
Ok, so yet another server crash on a full server at 5:30 on 1/18. Once again, server 1 crashed and server 2 got unplayable amounts of lag immediately afterwards. Lately it seems like every time the server fills up, it crashes shortly afterwards. Is there anyone who has both the server access and the time to try to fix this? One approach would be to remove all the plugins and then start adding them back until we've found the culprit. The only problem with that approach is that it may be a cumulative problem; we may just have more going on than the server can handle. If that's the case, we may need to put some thought into which plugins we really want to keep, and which don't really add anything to the TGNS experience.

In any case, the first step is to figure out: who has the ability to add and remove plugins (or otherwise diagnose the problems) and is willing to spend some time trying to save the server?

lush
01-19-2008, 06:22 AM
Just for fun, let's shut down the second server for a bit and see if that helps. That will immediately tell us if it's a load issue (which it shouldn't be, but it's an easy test).

Agamemnon
01-19-2008, 01:41 PM
That's a good idea. Can any admin do this? I can't remember the last time there were people on both servers, and if there's any chance it may fix the problem, we should try it immediately, since it costs us nothing.


Pokerface
01-20-2008, 08:43 PM
I was thinking something similar.

Seed #1; #2 will be going down soon (tonight probably).

Agamemnon
01-21-2008, 04:52 PM
So server 2 was down last night, and we had a full server on #1 without crashing (though running balance does still generate a large lag spike, and occasionally kicks half the server). Overall I'd call the experiment successful since we had a full server for a couple of hours with no crashes for the first time in recent memory. However, today server 2 is back up. Any particular reason for this?

Morganan
01-21-2008, 07:10 PM
I thought the server was pretty laggy last night even though it didn't crash.

Dr. Pepper PhD.
01-21-2008, 10:21 PM
The only thing I remember being changed about balance was Wyz making it so that no two players could use it at the same time or within a certain interval of each other.
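The rule described here is essentially a server-wide cooldown on a single command. Below is a minimal Python sketch of that idea, not the actual plugin code (which isn't shown in this thread). The class name `BalanceGate` and the 30-second interval are hypothetical, and the sketch assumes commands are handled one at a time, so "at the same time" reduces to "within the interval".

```python
import time
from typing import Callable, Optional


class BalanceGate:
    """Server-wide cooldown for one command (hypothetical sketch).

    Once any player runs 'balance', every further attempt by anyone is
    refused until cooldown_seconds have passed.
    """

    def __init__(self, cooldown_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock  # injectable so the gate can be tested without waiting
        self.last_run: Optional[float] = None

    def try_balance(self) -> bool:
        """Return True if balance may run now, starting a new cooldown if so."""
        now = self.clock()
        if self.last_run is not None and now - self.last_run < self.cooldown_seconds:
            return False  # refused; a refusal does not restart the cooldown
        self.last_run = now
        return True


gate = BalanceGate(cooldown_seconds=30.0)  # 30 s is an assumed value
print(gate.try_balance())  # first request goes through
print(gate.try_balance())  # an immediate second request is refused
```

Because the timer is shared rather than kept per player, one player's balance blocks everyone, which matches the "no two players" wording.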

Pokerface
01-21-2008, 11:40 PM
I stopped the serverdoc process; it probably came back up on a reboot. Since the experiment went well though, I'll remove the second server's process from startup.

Dr. Pepper PhD.
01-26-2008, 03:53 AM
So I mentioned the possibility of hardware problems to agy, and he said that if so, we are ****ed... Why can't we just tell theplanet to fix it? Aren't we paying them for that?

Second question...

What are the specs on the server box being used to host TGNS???

I am looking into upgrading the line at my house for a personal business, and I might be able to set up a computer to host TGNS at no cost to the community. It all depends on the bandwidth/usage and the server specs; I might be able to throw together a machine that I could keep running for an extended period of time.

Since I have come back to being an active community player, this would make it so interested players could still play. Also, if there is any problem with the server, I would be willing to give the more active players a number to reach me at in case I'm not home. I used to run a CS server on my cable line that, I believe, ran at about a 110 ping for a regular from Denmark, so I can't imagine TGNS would take that much more.

The line I am looking at would have about 128K of upstream more than I will use, and I would be able to put the company firewall behind a main router that the TG server would be connected to. This is all just an idea for now, though, since we can't confirm a hardware problem yet.

EDIT: Also, I would not want to change anything about TGNS. As with the current server, it would be for the community and all changes would be made by TG admins, so only ideas from the community and changes made by admins would be applied. Like I said, I would want it to be TGNS, just in a different server location.

Apophis
01-26-2008, 06:30 AM
The hardware tests clean. It's a software issue.

128k is a far cry from the bandwidth we need/require for TG servers. Thanks for the offer, but we're all set.


Dr. Pepper PhD.
01-27-2008, 04:25 AM
The hardware tests clean. It's a software issue.


Great... that is really what I wanted to hear... it's just easier this way, honestly.

Agamemnon
01-27-2008, 12:54 PM
Because shutting down server two eliminated the crashes and significantly reduced the frequency of the lag spikes, I think we can be fairly certain that the "software issue" is the total amount of load being placed on the server. Is there any way to find out what other games are being run on the same server, and in particular whether something was added in late December? If something was added at that time, it's basically the cause of all our problems.

Dr. Pepper PhD.
01-27-2008, 03:51 PM
NS is its own box (for the time being).

(From the other thread.)

Pokerface
01-28-2008, 07:17 PM
Nothing changed. Two instances of NS eat maybe 30% of the box's CPU.

Dr. Pepper PhD.
01-29-2008, 02:50 AM
Ok... so agy and I spoke for a while on the server tonight and came up with a couple of ideas...

Does everyone remember the Windows updates that shut the server down a couple of months ago? Well, it seems that that is the only change we can remember in that time frame. Could you do a system restore to before that date to see if that helps, and then disable whatever update came in?

Move the current server files into a separate folder unrelated to the running server (for backup purposes), then reload HLDS and the NS server files, just to check if we can get a steady, unlaggy server going. Then, one step at a time, add back the ban list, followed by the plugins one by one, until you notice a jump in lag, or any lag at all. Hopefully a fresh, clean install will fix whatever the problem was.

Lastly would be a full format, which I know would take a lot of admin time to accomplish and which no one wants to see, but it's a foolproof software test: wiping Windows and redoing all the updates to date before loading HLDS would get us a clean install with clean updates (from time to time, updates on long-running systems can be a bit sloppy, depending on what's loaded and what the update affects).

As for that, we did notice Wyz there tonight, and we hope that we can find and fix the problem and get back to our beautiful community.

Wyzcrak
01-31-2008, 11:45 PM
we did notice wyz there tonight and hope that we can find and fix the problem
I'll be giving this more attention in the next week than I've given it in the last month. As I'm just beginning that, I have no idea what the "fix" will be (also, others will likely be more deserving of thanks than I'll be once it's fixed). No promises.

ChopStick
02-05-2008, 04:03 AM
Thanks Wyz.

Agamemnon
02-10-2008, 03:56 AM
So we discovered tonight that the commander chat is lagging the server again. Not sure if that was fixed and is now broken again, or whether it just stayed broken and we didn't have comms typing enough to see the connection.

Also, as "next week" has come and gone, has there been any progress on the reinstall yet? Or does the reappearance of the comm chat lag mean that we reverted to a backup that had that problem?

Wyzcrak
02-10-2008, 05:17 PM
I've given this more thought and no more effort in the last week.

Let's take baby steps. I've just disabled the comm chat. Given how long we've been on that, it's hard to swallow it as the cause, but what's the harm in trying? ... especially given how time-consuming it is to investigate network issues (my main suspect).

Agamemnon
02-10-2008, 06:38 PM
So of the various suggestions that have been made, the common theme seems to be that the first step in trying to do something about this is a clean reinstall, starting with as few plugins as we possibly can, and slowly re-adding whichever we feel we just can't live without.

I would say just having the Admin Icons and Goodbye plugins would be enough to start with, as that would still allow us to exert some control over what goes on in the server. Just for testing purposes, it would probably be good to have a couple days with absolutely no plugins, just the fresh reinstall, so that we can verify reasonable server performance with nothing on it.

So to summarize, the first of these "baby steps" is the clean reinstall without any plugins added.

Apophis
02-10-2008, 06:42 PM
That's not a baby step, it's treating dandruff by decapitation. It would also eliminate your ability to find the root cause of the problem and would only be effective at eliminating the effects of the root cause.

Wyzcrak
02-10-2008, 09:49 PM
I've used a clean install before to treat similar problems. Decapitation does keep you from learning the root cause, but between my limited time and the seemingly sporadic nature of troubleshooting heavily modded servers, I'd be tickled pink right now if we could just get ourselves happy, root cause designation or not. Sadly, though, that approach requires much more work than I'm hoping we'll be able to put into this to solve it.

As it is, lately I'm better at making time commitments than I am at keeping them (hence our continued problems).

"We haven't changed anything on the server" sounds good, but it doesn't really mean much (see the "sporadic" note above). Nevertheless, the server is largely in a very good (never mind complex) state (config-wise, performance problems aside), and I'd like to leave as much of that in place as we can manage, as I certainly do not have the time (short of me hitting a full-on "nerd night" the likes of which I haven't had since Evan was born) to recreate from the ground floor (even by adding files "back" one at a time -- that is an oversimplification in some -- not all -- details) the many, many tweaks that make the server what it is today.

Agamemnon
02-11-2008, 01:13 AM
Over the course of the last month and a half we've shown ourselves clearly unable to diagnose the root cause. If eliminating the effects will yield a functioning server with decent performance, I say bring out the guillotine.

If the problem is truly a software problem with one of the plugins, the quickest way to determine that would be to reinstall them half at a time, effectively doing a binary search. At each step, the process is:

If the server is functioning correctly, add half of the most recently removed plugins. (The starting set of removed plugins is all of them.)

If the server stops functioning correctly, remove half of the most recently added plugins.

If removing a plugin is as easy as adding one, this process could certainly be done in reverse starting from the full server. However, starting from a vanilla server with none of the plugins lets us immediately determine whether the problem is or is not caused by a plugin.
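The halving procedure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: exactly one plugin causes the problem on its own (not an interaction between plugins), and `server_is_healthy` is a hypothetical stand-in for "run the server with these plugins for a day or two and see whether it lags or crashes". Instead of adding plugins back incrementally, it tests each half on its own, which finds the same plugin under that single-culprit assumption. The plugin names are illustrative only.

```python
from typing import Callable, Optional, Sequence


def find_culprit(plugins: Sequence[str],
                 server_is_healthy: Callable[[Sequence[str]], bool]) -> Optional[str]:
    """Bisect a plugin list to find the one plugin that breaks the server.

    Assumes a single plugin causes the problem by itself and that
    server_is_healthy() gives a reliable verdict for a set of loaded plugins.
    """
    if not server_is_healthy([]):
        return None  # a vanilla server is already unhealthy: not a plugin problem
    if server_is_healthy(plugins):
        return None  # healthy with everything loaded: nothing to find

    candidates = list(plugins)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        if server_is_healthy(first_half):
            candidates = candidates[mid:]  # culprit must be in the other half
        else:
            candidates = first_half  # first half alone reproduces the problem
    return candidates[0]


# Hypothetical plugin names and a fake health check for illustration.
plugins = ["admin_icons", "goodbye", "balance", "comm_chat", "whichbot"]
print(find_culprit(plugins, lambda loaded: "comm_chat" not in loaded))
```

Each loop iteration halves the candidate list, so n plugins need only about log2(n) test rounds after the two baseline checks, which matters when each "test" is days of real play.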

Dr. Pepper PhD.
02-11-2008, 03:08 AM
I know my opinion here matters little since I don't have the money to donate yet, but it seems a real turnoff to the TG community to let a server go 2 months with a known problem while doing little to fix it.

I understand that our (TGNS) admins are extremely busy with their lives, and that, being the knowledgeable ones on the plugins and the like, they would be the prime ones to fix it. But if Wyz still thinks it's a networking issue, then why hasn't this been checked thoroughly?

Although I have to wonder: if the problem were outside the host location, between them and us, then why is TGNS the only server being affected? That, in my mind, leaves software, or the internal networking of our specific server within theplanet's farm. Has this been raised with theplanet? The only thing I am aware of that has been addressed with them is that the hardware is good.

I have been a member of this community for a while and love it. If I had the choice between still getting my butt handed to me for 7 hours on TGNS or dominating people of lower calibers in BAD and G4B2S, I'd pick TGNS every time. I try to do my part. Ask aga: he gets a message from me almost daily saying "SEEDING TGNS", because I want to help however I can. It just seems like this community is being put out to pasture with so little being done about an important gameplay issue.

If I can help, I will. Here are my ideas for root causes:
1. A networking issue within theplanet
2. A problem with a Windows update that came out around the beginning of December (when the lag began)
3. Other software loaded on the server box, intentional or uninvited (virus, spyware, etc.)
4. An adjustment made to plugins around December. The only one I know of is the change to the balance command (no two balance commands can be triggered close to one another, since that kicked half the server). I don't recall when this was made, but it was within the last two months.

Other than that, nothing else makes logical sense, since other causes would give different feedback (lag, error logs, etc.). One thing I have noticed is that the most recent lag I saw was while whichbot was in use: it lagged the other night with 7 or 8 people and 11 bots, and it was extremely violent lag, too.

Sorry if I'm more of a pain than a help... I just want my home back.... :(

blu.knight
02-11-2008, 07:23 AM
Maybe it's the end for NS... Maybe we should all convert to TF2 and get a TGTF2 server. :)

We should just yell at Flayra to finish NS2 faster.

kormendi
02-11-2008, 08:36 AM
get a TGTF2 server

There already is a TGTF2 server.

Agamemnon
02-11-2008, 03:15 PM
As to number 4, we tried reversing that change. It had no effect on the lag.

TheFeniX
02-11-2008, 09:46 PM
There already is a TGTF2 server.

I told you, boy, don't touch that darn thing.

FYI: I played a bit of NS last night and it did well with 14 people on. Granted, after losing three times in a row I realized it would be more productive to go to sleep, so I didn't hang around for later games.

Wyzcrak
02-11-2008, 10:20 PM
The answer to "why hasn't this been fixed yet?" can't go beyond the NS admins, given how little time we've given it and, more importantly, how little we've included anyone past Poker in the conversation. It is negligence, plain and simple.

I'm still working on this. Between my limited time and what I'm guessing is everyone else's boredom hearing about solution attempt after solution attempt that doesn't give relief, I'll not go into the details here and now.

Wyzcrak
02-12-2008, 11:17 PM
Server's down for the night while I poke it with cold needles.

TF2 is DYING.

Wyzcrak
02-13-2008, 08:35 AM
I defrag'd the server's disk last night (literally. me. bit by bit. with notepad.). On paper, the improvement was substantial, but none of that matters if you guys can't tell us that the gameplay is better. So go find out (please :)).

Slayer of Hippies
02-14-2008, 10:04 AM
Apophis, respectfully, I don't know what you know - but NS is a game plagued with the worst yo-yo style hitbox mechanics of probably any game online at this time. So if the posts here seem like overreactions they probably aren't as bad as all that, they're just very frustrated - and rightly so.

blu.knight
02-14-2008, 02:43 PM
Server's down for the night while I poke it with cold needles.

TF2 is DYING.

Is TF2's popularity really waning? I didn't know there was even a TG server for it.

I haven't had access to a gaming pc in two months either. :(
Stupid London.

kormendi
02-14-2008, 05:20 PM
Is TF2's popularity really waning?
Looks like TGU needs to run another session of "Wyz Humor 101"

Dr. Pepper PhD.
02-15-2008, 03:03 AM
I did notice improvements... Balance still kicks half of a full server, but other than that there's no really noticeable lag, though minor lag still exists... Thank you very much for what you have done so far, and I'm hoping I can get a full server most nights.

blu.knight
02-15-2008, 04:48 AM
Looks like TGU needs to run another session of "Wyz Humor 101"

Heh, I thought that might be... But then again I've been gone for two months and TF2 could be dying. :P

I'll definitely be hopping on TGNS for at least a game or two when I get back to the States. :)

Wyzcrak
02-16-2008, 06:15 PM
blu.knight is DYING.

I'll focus my attention now on whatever is causing folks to get kicked by execution of the 'balance' command (I think this is unrelated to our recent performance problems, which I'll consider resolved until I hear otherwise).

Dr. Pepper PhD.
02-17-2008, 04:49 AM
Nah, today we had lag spikes, first in co with bots, then later in co without... then again later in NS (obviously also without bots). I can't really pinpoint the trigger, if there is a client-side trigger.

lagarto
02-17-2008, 03:19 PM
It's weird. I haven't noticed any lag at all since last year when we all agreed the server was becoming a Jigglypuff wannabe, with those 250+ ping spikes for everyone. Since then I've seen people complaining about lag whereas I haven't felt any. I don't spend countless hours playing co_ trying to seed the server, but I've still played quite a bit when server gets games going, and the only issue I've found is the balance command (oh, and the commander typing lag, which again I haven't seen recently).

I remember that some time ago a bunch of random people (including me and Lost) lagged at random intervals on the server, and by running a trace from MS-DOS we found spikes at the-planet.com or something like that. Eventually it resolved itself, and Wyz kept telling us not to complain because it wasn't really the server's fault. So maybe it's the same this time, and you guys are still relating it to last year's problems?