
Sudden RIS Slowdown/Crash Problem?


jfield


I am having a new problem with my Windows 2003 SP1 RIS server, a Dell PowerEdge 2650 with a 2.8GHz CPU and 2 GB of RAM on a 100/1000 megabit switched network. I used to be able to RIS as many systems as I wanted, but now at about 15 systems the server slows to a crawl and eventually drops network connections entirely. Sometimes the "System" process will be using 100% of the CPU. I haven't made any changes other than Windows updates. Has anyone seen anything similar to this?

Thanks,

-Jeff



Symantec Antivirus Corporate Edition Server 10.1 was recently installed on there in place of version 9, and I'm leaning towards that as the cause. The machine also has Backup Exec 10d on it, and I'm making sure the latest updates are installed for that. The machine receives critical updates through WSUS automatically.

Leaning towards Norton at this point. On the last crash, the event log indicated there was a memory leak somewhere (Event 2019), so I'm going to try to reproduce the problem yet again and track the memory usage.

-Jeff


CPU time charged to the SYSTEM process is kernel-level activity, which means the culprit is most likely a driver running in the kernel. Antivirus filter drivers run in the kernel, so real-time A/V is a BIG no-no on a RIS server in general, or at the very least it should not scan the RIS volume where the SIS Groveler service is running (SIS is also implemented as a kernel filter driver). A/V on a RIS server saps performance and can cause I/O lockups, system instability, crashes, and other oddball behavior.

Again I'll say it - if you run real-time A/V on servers that are heavily used, you WILL eventually have load issues. Especially if you use those nasty /3GB or /PAE switches (or heaven forbid, both at the same time - yuck!).


A little more information and a couple more questions..

I'm certain it's a memory leak, as I'm getting Event 2019 repeatedly from Srv. My problem is that I don't see any huge increases in my paged/nonpaged pool, my thread count, or my file handles.

handle count 20892

pool nonpaged 884792

pool paged 3227756

thread count 916

pf usage .97gb

Top Process:

lsass 255mb

System handles 2207 threads 66

Lsass seems a bit big to me, but 255 MB isn't all that big.

When this problem happens, I can't even run ping because it's out of memory. Interestingly, the system still shows a gigabyte of memory "available", and only shows about 970 MB of the page file in use.

The event:

Event Type: Error

Event Source: Srv

Event Category: None

Event ID: 2019

Date: 9/11/2006

Time: 7:54:28 AM

User: N/A

Computer: xxxx

Description:

The server was unable to allocate from the system nonpaged pool because the pool was empty.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Data:

0000: 00 00 04 00 01 00 54 00 ......T.

0008: 00 00 00 00 e3 07 00 c0 ....ã..À

0010: 00 00 00 00 9a 00 00 c0 ....š..À

0018: 00 00 00 00 00 00 00 00 ........

0020: 00 00 00 00 00 00 00 00 ........

0028: 18 00 00 00 ....
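
The Data bytes above can be read as little-endian DWORDs. The following is a minimal Python sketch, assuming the usual layout for kernel driver error-log records (full event code at offset 0x0C, final NTSTATUS at offset 0x14); the function and variable names are made up for illustration. Under that assumption, the record carries event ID 2019 and status 0xC000009A (STATUS_INSUFFICIENT_RESOURCES), which matches the "pool was empty" description.

```python
import struct

# Data bytes copied from the Event 2019 record above (offsets 0x00-0x2B).
EVENT_DATA_HEX = """
00 00 04 00 01 00 54 00
00 00 00 00 e3 07 00 c0
00 00 00 00 9a 00 00 c0
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
18 00 00 00
"""

# Only the one status code seen in this record; not a complete table.
KNOWN_STATUS = {0xC000009A: "STATUS_INSUFFICIENT_RESOURCES"}


def decode_dwords(hex_text):
    """Turn a whitespace-separated hex dump into a list of little-endian DWORDs."""
    raw = bytes.fromhex("".join(hex_text.split()))
    count = len(raw) // 4
    return list(struct.unpack("<%dI" % count, raw[:count * 4]))


def summarize(hex_text):
    """Assumed layout: DWORD 3 (offset 0x0C) = event code, DWORD 5 (offset 0x14) = NTSTATUS."""
    dwords = decode_dwords(hex_text)
    event_code = dwords[3]
    status = dwords[5]
    return {
        "event_id": event_code & 0xFFFF,  # low 16 bits hold the event ID
        "status": status,
        "status_name": KNOWN_STATUS.get(status, "unknown"),
    }


if __name__ == "__main__":
    info = summarize(EVENT_DATA_HEX)
    print("Event ID:", info["event_id"])
    print("Status: 0x%08X (%s)" % (info["status"], info["status_name"]))
```

The status only confirms that a nonpaged pool allocation failed; it does not say which driver used up the pool.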

I've got that event repeated from Saturday, when we imaged about 25 systems, until this morning, when I came in and rebooted. I've downgraded the antivirus to the version we used last year. I've always excluded the "RemoteInstall" folder from scanning - do I need to exclude the entire RIS volume?

I wish I had the money to separate out our services. This box handles a lot for us, but it's never been a problem in the past: last year I could image 50-100 machines a day and the server would never flinch, so it has to be a change I made or an update that was installed. I'm still leaning towards the AV software; we'll see how the downgrade goes. The box is also running a fully updated Backup Exec 10d, and that's my next likely suspect, I think.

Any other thoughts? Should I disable AV scanning on the entire volume the RIS images are stored on, or is excluding RemoteInstall enough?

Thanks,

-Jeff

Edited by jfield

If you're still getting 2019s, let me know and I'll assist you in using a poolmon script to track down the culprit (hint - it's almost always the antivirus).
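
The idea behind the poolmon approach is to compare per-tag nonpaged pool usage at two points in time and see which tag keeps growing. This is only a rough Python sketch of that comparison, not the script mentioned above: it assumes you have already pulled per-tag nonpaged byte counts out of two poolmon snapshots into dictionaries, and the tag names and numbers in the sample data are invented placeholders.

```python
def growing_tags(before, after, min_growth=0):
    """Return (tag, growth_in_bytes) pairs, largest growth first.

    before/after map a pool tag to its nonpaged byte count in each snapshot.
    A tag missing from the first snapshot is treated as starting at 0 bytes.
    """
    growth = []
    for tag, bytes_after in after.items():
        delta = bytes_after - before.get(tag, 0)
        if delta > min_growth:
            growth.append((tag, delta))
    growth.sort(key=lambda item: item[1], reverse=True)
    return growth


if __name__ == "__main__":
    # Invented sample data: "TagA" and friends are placeholders, not real pool tags.
    snapshot_idle = {"TagA": 1000000, "TagB": 500000, "TagC": 200000}
    snapshot_busy = {"TagA": 1050000, "TagB": 9500000, "TagC": 190000, "TagD": 40000}

    for tag, delta in growing_tags(snapshot_idle, snapshot_busy):
        print(tag, delta)
```

Taking the first snapshot right after a reboot and the second while imaging is under way should make a leaking tag stand out; the tag then has to be matched to the driver that allocates it.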

By the way, have you updated symevent.sys on that machine to the latest version? Symevent.sys is Symantec Antivirus' kernel-mode filter driver, and they do update it regularly to deal with issues that arise (including kernel memory usage issues, etc). Also, if you're running Symantec Antivirus, check to see if you have a file on your system called symtdi.sys - if you do, that's the likely culprit. Uninstall the Email Scanning components of SAV to get rid of that little nasty file - it's known to cause kernel memory leaks in nonpaged pool (those 2019s you're getting indicate a leak or a huge increase at intervals in nonpaged pool).

Symevent.sys:

http://service1.symantec.com/SUPPORT/ent-s...998092408260848

Symtdi.sys:

http://service1.symantec.com/SUPPORT/ent-s...ment&seg=hm

Edited by cluberti
