Performance,Dirty Reads ,Lost Updates and Locks

Recently I came across a very interesting issue, which I feel I should write about it. During load testing of web based application, I came across an issue where in response time for couple of the business process was exceeding a lot compare to SLA’s given to us. Test was almost clean with less than 0.1% error with no failed transactions in it. Server resources were also looking good, no abnormal behavior of JVM/CPU/Memory etc. Still the transaction response time was very high for the couple of business processes.

Let’s call those business processes for the sake of this post as A, B, C. Business process A does adding (Insert) a new record to the database, Business B does editing the record in the same table (We call edit as Update in DB Language) and Process C does downloading the latest information which includes both Process A and Process B. We ran these 3 business processes for about an hour with 1 user each. All the data sets used in load testing for these 3 business processes were unique. These business processes were fairly simple to script and web navigation was also fairly easy. Our job was to test these business processes with 1 user each so as to achieve close to 100 transactions for each process in an hour. Pretty easy task and we ran lot many tests with these processes and in all these tests response time was almost same with very little variance. The results were like puzzle. Thankfully we had HP Dia installed in that environment and we had some instrumentation done for this application. So I thought let’s see what application threads are doing, while I am running the test for this application.HP Diagnostics has a very good feature of showing you live thread count along with thread trace as what each threads are doing at that point of time. So I took the thread dump of the all threads running in this JVM. Believe me, taking thread dump with HP Diagnostic is as easy as clicking the link.

Vow, for the first 10 mins, most of the thread were runnable state, and as slowly users ramped up, then few of the threads went into the waiting state and then few were oscillating between runnable and waiting state. Another 10 mins in the test run, I could see lot many threads in waiting state. Now after seeing threads in waiting state I can understood as why I was seeing a very high response time for the most simple business process which involves nothing but simple insert and update operation. Response time was high because threads were waiting for something and that something was nothing but DB Execute update calls.

Now next puzzle for me was why on earth these threads are waiting at DB Execute calls and that too for cool 10 secs and with so less load (I had close to 17 scripts with one user each per script). Now given the limited access we had on database, and after discussing my findings with application development team, we decided to engage DBA to assist us in finding out the root cause of this wait. At this point I must appreciate the honestly of the application developer who agreed to this idea of engaging the DBA. There are very rare cases where developers agree with performance engineers and that’s reason I must thank this guy.

So we engaged the DBA and ran couple of tests and he collected the database stats during the test. He did some analysis and identified couple of stored procedure and dynamic queries which was in need of tuning. In addition to this he came back and said that response time for these 3 business process were high because, row level locking was getting escalated to table level lock after certain duration of time and again after certain time, some transactions were getting dirty read and still after couple of minutes, table gets locked and does not allow any insert or update. Fair enough analysis.

Hmm I was happy that finally we know as why we were seeing high response time. Our next step was discussing this with key stakeholders about our findings. Lot many things were discussed in the meeting, but the key question which I liked in an hour’s meeting was that of DBA asking “Can this application tolerate the dirty reads “. There was long pause for some time in the meeting and finally after some time, this developer guy said this application cannot tolerate the dirty reads. Hmm there was pause again for about 30 secs, after another 20 secs, another voice said, these 3 transactions are executed less than 100 times in an hour and there exists absolutely very little chance that we see this concurrency in Production. There was again some silence and later on everyone moved on to next topic.

Now the reason as why I feel this is interesting case and deserves some writing on this topic is that we have dirty reads/lost updates/phantom reads that can be found out only by proper load testing and there exists a very high probability that if we ignore this we are sending out incorrect information to the application users in production under load and this incorrect information at times could be used by those users to sue the company back. These are typical cases where data integrity issues takes precedence to performance.So if you cannot redesign the queries, than I would suggest you to sacrifice the performance.These types of issues should become high priority fix at least in cases where we are dealing with financial applications. I also feel that at times these are critical bugs given that we have table level lock and this lock impacts all the users if one user locks the table for any reason.

Maybe later on I will write another short post on dirty read/lost updates. These are interesting cases where only proper load testing reproduces the issue and strong foundational skills of performance engineering helps to identify the root cause of the issue.

.

Comments

Know how shared is your server box and If Server runs with Default Settings

Working on the shared environments or clustered environment is quite a challenge and getting the best performance out of the servers requires that we understand as how servers are been designed or maintained in the infrastructure. If during the load testing, we are continuously seeing a very high response time which continuously exceeds the SLA’s given to us, then it’s time probably we should  start focusing on the shared environment aspects of the application.

Some of the question I often ask are as below,

  • Is the Database the only application running on the server box? If it’s sharing the server box with other enterprise applications like web server etc. , then probably it cannot perform optimally.
  • Does the Database memory/CPU/IO counters looks OK and they seems to align with hardware configuration of the server box.
  • Is the hardware of the server box like BIOS, Disk Controller etc. is up to date with latest patches and latest software. (I Know Microsoft Supports recommend this step always in case of any performance issue and I whole heartedly agree to this approach).
  • How often the virus scanning software is running in the server box. Quite often virus scanning software interrupts the smooth functioning of the database servers by interfering with database file system.Both are IO based application. It makes sense to exclude scanning of key database files to improve the performance of the database.
  • Does the database server have regular maintenance window, how often is database maintenance done. Maintenance activities could include applying patches/server reboots etc.

Having the answers to these questions often helps in eliminating the various roots cause for the slow performance of the application. Once we have eliminated these causes and have confirmed that environment or server box seems to good and still we are seeing slowness in the application, then probably it makes sense to drill down the details of the individual database server configuration,

Quite often it happens that most of the servers are installed with default settings. We all love to click next and next and next button while installing any enterprise software.This habit installing enterprise software later on starts giving nightmare to everyone once the application goes live.Every setting is factory made settings.So it times makes sense to drill and find out those default settings of the servers.

So let’s assume that we have MSSQL 2008 as the database server and we need to know the various default configuration setting that is in place for MSSQL 2008 box like what’s the memory /connection/processor setting etc.,

Assuming that you have SQL Management Studio installed, all you need to do is follow the below steps,

  • Connect to the Database server with Management Studio.
  • Using object explorer of studio, right click on the server node and select properties from the context menu.
  • Server property window will be visible to you. This window will have all the information with regard to connections/memory and various other settings that are in place for that database.

We should be able to see the something like below screen,

MSSQL2008_001

Another way of know the default setting is via system procedure sp_configure

SELECT * FROM sys.configurations

ORDER BY name ;

GO

or with

sp_configure

go

All these information will help us to correlate the high response time from the front end to the backend database and at times helps to scale application performance.When you run this procedure, it will show lot many information ,nearly 60+ settings that impacts the database.So in case if you feel any settings do not match with your application requirement then probably you need to talk to the DBA and explain him your reasoning as why that setting is impacting the performance of the applications.

For example, if the user connection value in database is set to 10 , and during your tests with 30 users, you see that response time are high, then probably highlighting this difference to the DBA might help assuming that at some point of time,it might happens that 10 connection might not be sufficient to serve the peak hour demands of the application.

Comments

Setting Custom Error Page and Internet Explorer Bug

Today I came across one interesting case where in spite of doing a proper set up for custom error page in Tomcat, Internet Explorer 9.0.6 was just refusing to display my custom page.The same custom error page was  displaying fine in other browsers like Google Chrome and Firefox without any issue.But in Internet Explorer it  was showing up its own error page which was looking like below,

 

image

Now of course it does say that its 404 error,but I believe this page is somewhat not good for user experience.Users expect that we display some informative message in case if some thing goes wrong while browsing the site and that message should give them some information about error and what they need to do next and what we are doing to resolve the issue,something like below,

 

 

I was trying to do exactly the same , set up the custom error page, trigger that page by generating some error condition and check if this works in all browsers as expected.

Unfortunately  Internet Explorer seems to have  some problems displaying custom error pages in case if they are light weight in size.In addition to weight of the page, they also has certain rules which they follow while implementing custom error page,

Below is the information I copied from Eric Lawrence blog,

A common question from web developers is: What makes IE decide to show a friendly error page?

The answer is that the server’s response must meet two criteria:

  1. The HTTP Status code must be [400, 403, 404, 405, 406, 408, 409, 410, 500, 501, 505]
  2. The HTTP Response body’s byte length must be shorter than a threshold value

If the server’s response meets both criteria, then IE will show its own Friendly HTTP Error page instead of the server’s terse response.

The byte length thresholds are stored in the registry in HKEY_LOCAL_MACHINE under the subkey \SOFTWARE\Microsoft\Internet Explorer\Main\ErrorThresholds. The default threshold is 256 bytes for the response codes [403, 405, 410] and 512 bytes for response codes [400, 404, 406, 408, 409, 500, 501, 505]. If the registry entry is missing for one of the status codes, its threshold defaults to 512 bytes.

In addition to Eric’s blog, here is another KB article from Microsoft which confirms this information about the behavior of IE.

Though I feel good that we have some workaround for resolving this situation(Thanks to ASF Konstantin for showing some direction here else I would have wasted another 8 hrs trying to figure out the fix/workaround for this ), I feel its more of the bug in Internet Explorer that needs to be fixed or else they have lot of moving parts of last century in Internet Explorer 9.0.x which has not been updated since IE 5 days.

So in case if your custom error page is not getting displayed in Internet Explorer, due to below checked settings,

image

then probably you need to add lot of comments in your custom HTML error page so that size of custom page exceeds the threshold value.

The HTML comments are looks some like,

<!–This is a comment. Comments are not displayed in the browser–>
<!–This is a comment. Comments are not displayed in the browser–>
<!–This is a comment. Comments are not displayed in the browser–>
<!–This is a comment. Comments are not displayed in the browser–>
<!–This is a comment. Comments are not displayed in the browser–>
<!–This is a comment. Comments are not displayed in the browser–>

The reason I feel this is more of the IE bug is that job of the browser is to display whatever server sends and pass the same data to the user without playing around with the response data set.In this case, IE seems to be hiding the custom error message just because the weight of the page for that http status code is less than 512 bytes.I think this behavior of IE unnecessarily increase the size of  bytes that flows in the internet.IE has  close to 70% worldwide share in browser usage, and even 10 bytes increase in response size should be significant unnecessary overhead to network traffic.

Technorati Tags: ,,

Comments

reCaptcha’s,Simple Captcha,Javascript Captcha’s and Testing/Performance Engineering

I know for sure that most of us browsing the internet has at some point of time  come across captcha images and validation.Captcha Images validation techniques are  used with HTML forms in  web sites to prevent automated spams.I had recently implemented one functionality which was using Google reCaptcha technique to prevent automated spams.I had used the Google reCaptcha Java/Jsp plugin for my functionality.While implementing this reCaptcha functionality I came across interesting observations which I feel can help Performance Engineering community since I believe there exists some knowledge gaps in the way performance engineers understands captchas.The reason why I say this is that I have come across few folks misunderstanding and  asking me as how to capture or correlate captchas or at times I have seen people claiming to have defeated the captchas during their load testing assignments , of course without disabling the captchas validation.

Google recaptcha is more of  an web service call to the Google servers and uses some interesting techniques in validating the users input.So in Google reCaptcha’s there is actually 2 things we need to take care, first displaying the captcha in the front end and this can be done using the via below code,and second is verifying the inputs given by the user at the server side by passing required number of the parameters.

<script type="text/javascript"
     src=http://www.google.com/recaptcha/api/challenge?k=yourpublickey>
  </script>
  <noscript>
     <iframe src=http://www.google.com/recaptcha/api/noscript?k=yourpublickey
         height="300" width="500" frameborder="0"></iframe><br>
     <textarea name="recaptcha_challenge_field" rows="3" cols="40">
     </textarea>
     <input type="hidden" name="recaptcha_response_field"
         value="manual_challenge">
  </noscript>

The above code will display the captcha image in your web page and will built container which will have Captcha details in it along with text area field to write the image words.I suggest you use the above code at the head section of the HTML.The reCaptcha should look like below,

 

image

The above captcha consists of 2 words here one of the word is verification word (9th) and another is scanned text from some book which might belong to some ancient book or famous book or flop book or some super technical publication stuff which only Google folks read and this scanned text is then distorted to ensure that no OCR can read it correctly, and once it is confirmed that this text has failed OCR test, only then these types of images are fed in reCaptcha’s image database.

Now in order to fetch these captcha images in the front end, we see minimum 8 calls ,

image

 

One of the calls among these 8 calls has challenge field(recaptcha_challenge_field) which corresponds to the verification word as seen in the front end.Again we just know that its a long unique string generated at server side to uniquely identify the image and its answer.

So if you believe that correlating or capturing this long string means that you have solved captcha puzzle, then probably I would say, you are terribly wrong,read this post further to know why,

Now coming to the server side of validation,Lets use servlets to understand this process,

The first thing to do in the servlet program is to import the below packages in your validating servlet,

import net.tanesha.recaptcha.ReCaptchaImpl;
import net.tanesha.recaptcha.ReCaptchaResponse;

These 2 packages has method definitions to display error message encountered during validation and has method that can show us whether our captcha validation succeeded or failed.

String remoteAddr = request.getRemoteAddr();
ReCaptchaImpl reCaptcha = new ReCaptchaImpl();
        reCaptcha.setPrivateKey(“yourprivatekey”);
        String challenge = request.getParameter(“recaptcha_challenge_field”);
        String uresponse = request.getParameter(“recaptcha_response_field”);
        ReCaptchaResponse reCaptchaResponse = reCaptcha.checkAnswer(remoteAddr, challenge, uresponse);
         System.out.println(“challenge is ” + challenge + “‘”);
         System.out.println(” uresponse is ” + uresponse + “‘”);
         System.out.println(” remoteAddr is ” + remoteAddr + “‘”);

The above piece of code can written in the doPost method or doGet method of the servlet depending on the method type used in the main form.I have added to couple of print statements to ensure that we are indeed getting back the correct values from the front end.

If you look closely at the above code, we can see that there are at least 4 parameters which we need and among these 4 there are some parameters which we are collecting from the users namely

  • Remote IP Address of the client
  • Private Key of the site
  • User Response given by the user after seeing the captcha image at the front end
  • Challenge field.(This is same long string generated at the front end)

After passing  these 4 parameters correctly to the Google reCaptcha servers in the format as expected  by them, only then we say that the user is verifed as Human and not the automated bot.

Now what this all things means,

  • If we have some captchas which are third party, then probably we need unique IP address , in case we decide to test them.
  • There exists an option where we can test captcha as an isolated component given that most captcha solution are server side programs.
  • From the front end, its impossible to read the text embedded in the image since most of the matured captcha libraries present in the market host only images or use images which are distorted and cannot be read by OCR software,so testing captcha from the front end not a technically feasible approach.

So if someone tells you that they have tested the application which had captchas functionality in it, then probably you need to ask, how did they read the text embedded in the image ?

So moving on, is it the right thing to exclude captchas from the testing effort, I would say no, and I have certain reasons for this like

  • Captcha are blocking functionality features which according to me means that failed Captcha can block other working functionality.If captcha goes down for any reasons, user cannot do the submit or just cannot do any transaction on the page that benefits the site.So its loss to the site.
  • Captcha often has the blocking UI.I have seen this at least in my environment with recaptcha where till the captcha is loaded, certain portion of the page remains blank.
  • Since Captcha are server side programs, they suffer from same performance issues as seen in most of the server programs like server unavailable, client abort exception, 503 errors etc.
  • If we are using custom pure client side captcha solution like generating captcha using Javascript, then exists a risk that some day some smart user will just disable the javascript at his end and then do multiple submits to the site.Yes its possible to bypass client side captcha solutions that uses javascript solution with no server side verification .So if we are using this type of solution, then obviously testing is required end to end.

Captcha are interesting techniques and they do help to reduce spams, but they keeping them outside the scope of testing is not the right thing to do.

Comments

Deadlocks in MSSQL 2005 Explained

Have you ever seen the below message during your load testing assignment working in the Microsoft .Net world involving ASP.NET and MSSQL Server,

Msg 1205, Level 13, State 51, Line 1

Transaction (Process ID XX) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

If yes, then probably you need to read on this complete post because either you are testing with limited set of data or you are really seeing locking issues in your database,so I am hoping  that you will find it interesting to correlate deadlocks or locking contention issues with system performance or response time of the applications,

So let’s start with Microsoft definition of deadlocks which is excellent material to refer,

A deadlock occurs when two or more tasks permanently block each other by each task having a lock on a resource which the other tasks are trying to lock. The following graph presents a high level view of a deadlock state where:

  • Task T1 has a lock on resource R1 (indicated by the arrow from R1 to T1) and has requested a lock on resource R2 (indicated by the arrow from T1 to R2).
  • Task T2 has a lock on resource R2 (indicated by the arrow from R2 to T2) and has requested a lock on resource R1 (indicated by the arrow from T2 to R1).

Diagram showing tasks in a deadlock state

Because neither task can continue until a resource is available and neither resource can be released until a task continues, a deadlock state exists.

The SQL Server Database Engine automatically detects deadlock cycles within SQL Server. The Database Engine chooses one of the sessions as a deadlock victim and the current transaction is terminated with an error to break the deadlock.

Deadlocks often add to the wait time stats and most of the time becomes a silent performance bottlenecks which are hard to identify and troubleshoot, I call them as silent bottlenecks because you never know from the front end or from your load testing scripts as why a simple form insert or search is taking so much time to retrieve a data in spite of giving sufficient filter parameters in the search.

There also exist performance counters which can help to identify whether deadlocks are happening during the test run assuming that you see slowness but not any kind of errors during your test execution.

Below are some of the counters I often see in case its required to monitor for locks,

image

  • Lock wait time (ms) – Total time spend to acquire a lock for all DB Transactions.
  • Lock waits/sec – Number of times user has to wait to acquire a lock. Ideally value of 0 is expected here.
  • Average Wait Time (ms) – Wait time each lock request as to wait. More than 500ms often indicates blocking issues.
  • Lock requests/sec – The number of lock request per sec. High value often indicates that lot of rows are being accessed. So this indicates the application is heavily dependent on DB health. There exists some optimization scope to reduce DB calls.
  • Lock Timeout/sec – Time out that occurs due to locking issues. Sometimes these timeouts manifests as application errors. If you see frequent timeouts in applications during load tests, then probably you need to monitor this counter.
  • Number of deadlocks/Sec: These are pure play deadlocks where in process with low priority is rolled back. Error message shown earlier belongs to this category of locks. If your application handles all types of exceptions, then probably you will not see the above message other than some custom error message saying something went wrong with the application.

The above counters are extremely helpful in identifying the performance issues with regard to Database. In case if you have access to application database, then probably you can also go ahead and fire the below queries in SSMS,

DBCC TRACESTATUS (1222,-1);

GO

and

DBCC TRACESTATUS (1204,-1);
GO

When you execute the above queries in SSMS, it will give you status whether trace 1204/1222 trace flags are on for your database, these are 2 trace flags which needs to turned on for identifying and fixing the root cause of deadlocks.

In order to enable these flags, we can fire the below queries in SSMS,

DBCC TRACEON (1204, -1);
GO

and

DBCC TRACEON (1222, -1);
GO

By turning on both the trace flags ,we get valuable information with regard to resources/spid which are locked and reasons/queries which are locking or creating deadlocks.

As per Microsoft SQL Team,the importance and difference between these flags are when deadlocks occur, trace flag 1204 and trace flag 1222 return information that is captured in the SQL Server 2005 error log. Trace flag 1204 reports deadlock information formatted by each node involved in the deadlock. Trace flag 1222 formats deadlock information, first by processes and then by resources. It is possible to enable both trace flags to obtain two representations of the same deadlock event.

In addition to above methods, I feel SQL Profiler also contains events that can be used to identify deadlocks quite easily and in fact does show right away the sql queries involved in deadlocks.I vividly remember that SQL Server 2000 profiler had this capability to detect deadlocks.

Now the big question,are deadlocks high priority performance defects, I personally feel that deadlocks are not high priority defects unless they are not memory locking contention issues or thread contention issues on the smaller scale application.However they can have big impact if your application has large user base and your application makes lot of DB Calls.Deadlocks are often associated with high wait times which directly contributes to the high latency on the front end side of the application.

Comments

Identify Wait Events in MS SQL 2005 that Impacts Performance of the Application

While working with MSSQL 2005, I suggest having an eye on the wait events of the database server. Wait events happening at the servers often at times are the main reasons as why we quite often see the high response time at the UI Level. If the query is waiting for something at the database level, then the transaction which was supposed to take 2 secs will end up taking 3 secs and quite often unless we have clear understanding and knowledge on the database performance tuning aspects basics, it becomes quite a hard task to identify and fix such performance issues.

Most often in database world, Wait events normally falls in 3 categories,

· Resource Wait: This often is seen when the threads wait for some resources like IO/CPU etc.

· Queue Wait: Occur when a worker thread is idle, waiting for work to fall in its plate.

· External waits: Occurs when waiting for an external event like fetching data from some other table or doing some custom stuff in the box outside DB layer.

Let’s think about what happens when we fire a query to update a row, I feel below sequence of event happens

· The optimizer uses statistics to help decide on the best way to execute the query.

· The query is executed.

· The row will be read from disk into memory.(Exactly the reason as why I always say DB action are memory bound)

· The update will be written to the transaction log.

· The row will be changed in memory.

· Confirmation will be sent back to the client.

Wait events can happen at any of these above steps and in fact they are can identified by the help of pulling data out from the DMV Wait stats tables and correlating these information with Perf Mon counters.

Wait events timing at the sql server level can be identified by querying below DMV’s in MS SQL 2005,

· sys.dm_exec_requests

· sys.dm_os_waiting_tasks

· sys.dm_os_wait_stats

Let me give a you a quick walkthrough of each of the DMV’s above and illustrate as how we can identify some issues,

sys.dm_exec_requests – Session Level Information

This DMV gives information at session level, i.e at the higher level as which SPID is consuming high CPU, or having high waits times. System level processes and queries which are running parallel are often hard to detect with just this DMV’s.However this DMV’s gives some information on Wait Events.

Below is copy pasted information from the MS SQL team site which I feel is much more easier to understand with regard to this DMV,

Each SQL Server session has a unique session_id in the system DMV sys.dm_exec_requests. The stored procedure sp_who2 provides a list of these sessions in addition to other connection information such as command, resource, wait types, wait time, and status. User queries will have a session_id > 50. Common status values are ‘running’, ‘runnable’ and ‘suspended’, as described in the Execution Model discussion. A session status of ‘Sleeping’ indicates SQL Server is waiting for the next SQL Server command.

PS: sp_who2 can be run with SSMS studio, just type sp_who2 in query window and execute the command. It gives you very valuable information of the running processes at the Database level at that particular point of time.

sys.dm_os_waiting_tasks – All Waiting Tasks

This DMV gives information about the entire waiting task that is waiting for some reason. As long as tasks are waiting for something, they can seen here by querying these DMV’s.Again below is the MS explanation of this DMV’s which I find is quite easy to understand,

The waiter list that shows all waiting sessions and the reasons for the waits can be found in the DMV sys.dm_os_waiting_tasks. The session_id, wait type, and associated wait time can be seen. In addition, if the session is blocked from acquiring a lock, the session holding (known as blocking) the lock as well as the blocked resource is shown in the columns blocking_session_id and resource.

Below is the one of the SP which I had in my library which I often use in case I have access to DB,

SELECT wt.*,

st.text

FROM sys.dm_os_waiting_tasks wt LEFT JOIN sys.dm_exec_requests er

ON wt.waiting_task_address = er.task_address

OUTER APPLY sys.dm_exec_sql_text(er.sql_handle) st

ORDER BY wt.session_id

sys.dm_os_wait_stats – Aggregated time by Wait Type

This DMV is an aggregation of all wait times from all queries since SQL Server started and is recommended approach from MS for server wide tuning.

Below is the explanation of this DMV from MS,

Sys.dm_os_wait_stats is the DMV that contains wait statistics, which are aggregated across all session ids since the last restart of SQL Server or since the last time that the wait statistics were reset manually using DBCC SQLPERF (‘sys.dm_os_wait_stats’, CLEAR). Resetting wait statistics can be helpful before running a test or workload.

Anytime a session_id waits for a resource, the session_id is moved to the waiter list along with an associated wait type. The DMV sys.dm_os_waiting_tasks shows the waiter list at a given moment in time. Waits for all session_id are aggregated in sys.dm_os_wait_stats. While collecting information from this DMV, its often requires that we clear stats and run the test with before and after change,

Stats can be cleared by running in SSMS,

DBCC sqlperf (‘sys.dm_os_wait_stats’, clear)

The stored procedures track_waitstats_2005 and get_waitstats_2005 can be used to measure the wait statistics for a given workload.

Here is the track_waitstats_2005 SP

SELECT request_session_id AS Session,

resource_database_id AS DBID,

Resource_Type,

resource_description AS Resource,

request_type AS Type,

request_mode AS Mode,

request_status AS Status

FROM sys.dm_tran_locks

Here is the get_wait stats_2005 SP,

if exists (select * from sys.objects where object_id = object_id(N’[dbo].[get_waitstats_2005]‘) and OBJECTPROPERTY(object_id, N’IsProcedure’) = 1)

    drop procedure [dbo].[get_waitstats_2005]

GO

CREATE proc [dbo].[get_waitstats_2005] (@report_format varchar(20)=’all’, @report_order varchar(20)=’resource’)

as

– this proc will create waitstats report listing wait types by percentage.  

–     (1) total wait time is the sum of resource & signal waits, @report_format=’all’ reports resource & signal

–    (2) Basics of execution model (simplified)

–        a. spid is running then needs unavailable resource, moves to resource wait list at time T0

–        b. a signal indicates resource available, spid moves to runnable queue at time T1

–        c. spid awaits running status until T2 as cpu works its way through runnable queue in order of arrival

–    (3) resource wait time is the actual time waiting for the resource to be available, T1-T0

–    (4) signal wait time is the time it takes from the point the resource is available (T1)

–          to the point in which the process is running again at T2.  Thus, signal waits are T2-T1

–    (5) Key questions: Are Resource and Signal time significant?

–        a. Highest waits indicate the bottleneck you need to solve for scalability

–        b. Generally if you have LOW% SIGNAL WAITS, the CPU is handling the workload e.g. spids spend move through runnable queue quickly

–        c. HIGH % SIGNAL WAITS indicates CPU can’t keep up, significant time for spids to move up the runnable queue to reach running status

–     (6) This proc can be run when track_waitstats is executing

– Revision 4/19/2005

– (1) add computation for CPU Resource Waits = Sum(signal waits / total waits)

– (2) add @report_order parm to allow sorting by resource, signal or total waits

set nocount on

declare @now datetime, @totalwait numeric(20,1), @totalsignalwait numeric(20,1), @totalresourcewait numeric(20,1)

    ,@endtime datetime,@begintime datetime

    ,@hr int, @min int, @sec int

if not exists (select 1 from sysobjects where id = object_id ( N’[dbo].[waitstats]‘) and OBJECTPROPERTY(id, N’IsUserTable’) = 1)

begin

        raiserror(‘Error [dbo].[waitstats] table does not exist’, 16, 1) with nowait

        return

end

if lower(@report_format) not in (‘all’,'detail’,'simple’)

    begin

        raiserror (‘@report_format must be either ”all”,”detail”, or ”simple”’,16,1) with nowait

        return

    end

if lower(@report_order) not in (‘resource’,'signal’,'total’)

    begin

        raiserror (‘@report_order must be either ”resource”, ”signal”, or ”total”’,16,1) with nowait

        return

    end

if lower(@report_format) = ‘simple’ and lower(@report_order) <> ‘total’

    begin

        raiserror (‘@report_format is simple so order defaults to ”total”’,16,1) with nowait

        select @report_order = ‘total’

    end

select  @now=max(now),@begintime=min(now),@endtime=max(now)

from [dbo].[waitstats] where [wait_type] = ‘Total’

— subtract waitfor, sleep, and resource_queue from Total

select @totalwait = sum([wait_time_ms]) + 1, @totalsignalwait = sum([signal_wait_time_ms]) + 1 from waitstats 

–where [wait_type] not in (‘WAITFOR’,'SLEEP’,'RESOURCE_QUEUE’, ‘Total’, ‘***total***’) and now = @now

where [wait_type] not in (‘CLR_SEMAPHORE’,'LAZYWRITER_SLEEP’,'RESOURCE_QUEUE’,'SLEEP_TASK’,'SLEEP_SYSTEMTASK’,'Total’,'WAITFOR’, ‘***total***’) and now = @now

select @totalresourcewait = 1 + @totalwait – @totalsignalwait

– insert adjusted totals, rank by percentage descending

delete waitstats where [wait_type] = ‘***total***’ and now = @now

insert into waitstats select ‘***total***’,0,@totalwait,0,@totalsignalwait,@now

select ‘start time’=@begintime,’end time’=@endtime

        ,’duration (hh:mm:ss:ms)’=convert(varchar(50),@endtime-@begintime,14)

        ,’report format’=@report_format, ‘report order’=@report_order

if lower(@report_format) in (‘all’,'detail’)

    begin

—– format=detail, column order is resource, signal, total.  order by resource desc

    if lower(@report_order) = ‘resource’

    select [wait_type],[waiting_tasks_count]

        ,’Resource wt (T1-T0)’=[wait_time_ms]-[signal_wait_time_ms]

        ,’res_wt_%’=cast (100*([wait_time_ms] – [signal_wait_time_ms]) /@totalresourcewait as numeric(20,1))

        ,’Signal wt (T2-T1)’=[signal_wait_time_ms]

        ,’sig_wt_%’=cast (100*[signal_wait_time_ms]/@totalsignalwait as numeric(20,1))

        ,’Total wt (T2-T0)’=[wait_time_ms]

        ,’wt_%’=cast (100*[wait_time_ms]/@totalwait as numeric(20,1))

    from waitstats 

    where [wait_type] not in (‘CLR_SEMAPHORE’,'LAZYWRITER_SLEEP’,'RESOURCE_QUEUE’,'SLEEP_TASK’,'SLEEP_SYSTEMTASK’,'Total’,'WAITFOR’)

    and now = @now

    order by ‘res_wt_%’ desc

—– format=detail, column order signal, resource, total.  order by signal desc

    if lower(@report_order) = ‘signal’

    select [wait_type],[waiting_tasks_count]

        ,’Signal wt (T2-T1)’=[signal_wait_time_ms]

        ,’sig_wt_%’=cast (100*[signal_wait_time_ms]/@totalsignalwait as numeric(20,1))

        ,’Resource wt (T1-T0)’=[wait_time_ms]-[signal_wait_time_ms]

        ,’res_wt_%’=cast (100*([wait_time_ms] – [signal_wait_time_ms]) /@totalresourcewait as numeric(20,1))

        ,’Total wt (T2-T0)’=[wait_time_ms]

        ,’wt_%’=cast (100*[wait_time_ms]/@totalwait as numeric(20,1))

    from waitstats 

    where [wait_type] not in (‘CLR_SEMAPHORE’,'LAZYWRITER_SLEEP’,'RESOURCE_QUEUE’,'SLEEP_TASK’,'SLEEP_SYSTEMTASK’,'Total’,'WAITFOR’)

    and now = @now

    order by ‘sig_wt_%’ desc

—– format=detail, column order total, resource, signal.  order by total desc

    if lower(@report_order) = ‘total’

    select [wait_type],[waiting_tasks_count]

        ,’Total wt (T2-T0)’=[wait_time_ms]

        ,’wt_%’=cast (100*[wait_time_ms]/@totalwait as numeric(20,1))

        ,’Resource wt (T1-T0)’=[wait_time_ms]-[signal_wait_time_ms]

        ,’res_wt_%’=cast (100*([wait_time_ms] – [signal_wait_time_ms]) /@totalresourcewait as numeric(20,1))

        ,’Signal wt (T2-T1)’=[signal_wait_time_ms]

        ,’sig_wt_%’=cast (100*[signal_wait_time_ms]/@totalsignalwait as numeric(20,1))

    from waitstats 

    where [wait_type] not in (‘CLR_SEMAPHORE’,'LAZYWRITER_SLEEP’,'RESOURCE_QUEUE’,'SLEEP_TASK’,'SLEEP_SYSTEMTASK’,'Total’,'WAITFOR’)

    and now = @now

    order by ‘wt_%’ desc

end

else

—- simple format, total waits only

    select [wait_type],[wait_time_ms]

            ,percentage=cast (100*[wait_time_ms]/@totalwait as numeric(20,1))

    from waitstats 

    where [wait_type] not in (‘CLR_SEMAPHORE’,'LAZYWRITER_SLEEP’,'RESOURCE_QUEUE’,'SLEEP_TASK’,'SLEEP_SYSTEMTASK’,'Total’,'WAITFOR’)

    and now = @now

    order by percentage desc

—- compute cpu resource waits

select ‘total waits’=[wait_time_ms],’total signal=CPU waits’=[signal_wait_time_ms]

    ,’CPU resource waits % = signal waits / total waits’=cast (100*[signal_wait_time_ms]/[wait_time_ms] as numeric(20,1)), now

from [dbo].[waitstats]

where [wait_type] = ‘***total***’

order by now

GO

exec [dbo].[get_waitstats_2005] @report_format=’detail’,@report_order=’resource’

 

The stored procedure get_wait stats_2005 reports the wait types that are collected by track_wait stats_2005. The get_wait stats_2005 procedure can be run during the execution of track_waitstats or after track_wait stats is completed. Running get_wait stats_2005 during the execution of track_waitstats_2005 will return a report of intermediate results while running get_waitstats_2005 at the conclusion of track_waitstats_2005 will return the final wait statistics report. The report provides a detailed picture of different wait types during the time measured, and the accumulated wait time for each.

Get_waitstats_2005 reports information about waits. Total wait time is composed of resource waits and signal waits. Resource waits are computed by subtracting signal waits from total waits. Because signal waits represent the amount of time spent waiting in the runnable queue for CPU resources, they are a measure of CPU pressure. The application blueprints identify the significance CPU pressure by comparing signal waits with total waits

PS: Before you run the above SP, make sure you have wait table in the database, or you can run the below SP to create one for you. (This SP is also pulled out from MS site),

if exists (select * from sys.objects where object_id = object_id(N’[dbo].[track_waitstats_2005]‘) and OBJECTPROPERTY(object_id, N’IsProcedure’) = 1)
    drop procedure [dbo].[track_waitstats_2005]
go
CREATE proc [dbo].[track_waitstats_2005] (@num_samples int=10
                                ,@delay_interval int=1
                                ,@delay_type nvarchar(10)=’minutes’
                                ,@truncate_history nvarchar(1)=’N’
                                ,@clear_waitstats nvarchar(1)=’Y')
as

– This stored procedure is provided “AS IS” with no warranties, and confers no rights. 
– Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm

– T. Davidson
– @num_samples is the number of times to capture waitstats, default is 10 times
– default delay interval is 1 minute
– delaynum is the delay interval – can be minutes or seconds
– delaytype specifies whether the delay interval is minutes or seconds
– create waitstats table if it doesn’t exist, otherwise truncate
– Revision: 4/19/05 
— (1) added object owner qualifier
— (2) optional parameters to truncate history and clear waitstats
set nocount on
if not exists (select 1 from sys.objects where object_id = object_id ( N’[dbo].[waitstats]‘) and OBJECTPROPERTY(object_id, N’IsUserTable’) = 1)
    create table [dbo].[waitstats] 
        ([wait_type] nvarchar(60) not null, 
        [waiting_tasks_count] bigint not null,
        [wait_time_ms] bigint not null,
        [max_wait_time_ms] bigint not null,
        [signal_wait_time_ms] bigint not null,
        now datetime not null default getdate())
If lower(@truncate_history) not in (N’y',N’n')
    begin
    raiserror (‘valid @truncate_history values are ”y” or ”n”’,16,1) with nowait    
    end
If lower(@clear_waitstats) not in (N’y',N’n')
    begin
    raiserror (‘valid @clear_waitstats values are ”y” or ”n”’,16,1) with nowait    
    end
If lower(@truncate_history) = N’y' 
    truncate table dbo.waitstats
If lower (@clear_waitstats) = N’y' 
    dbcc sqlperf ([sys.dm_os_wait_stats],clear) with no_infomsgs — clear out waitstats
 
declare @i int,@delay varchar(8),@dt varchar(3), @now datetime, @totalwait numeric(20,1)
    ,@endtime datetime,@begintime datetime
    ,@hr int, @min int, @sec int
select @i = 1
select @dt = case lower(@delay_type)
    when N’minutes’ then ‘m’
    when N’minute’ then ‘m’
    when N’min’ then ‘m’
    when N’mi’ then ‘m’
    when N’n’ then ‘m’
    when N’m’ then ‘m’
    when N’seconds’ then ‘s’
    when N’second’ then ‘s’
    when N’sec’ then ‘s’
    when N’ss’ then ‘s’
    when N’s’ then ‘s’
    else @delay_type
end
if @dt not in (‘s’,'m’)
begin
    raiserror (‘delay type must be either ”seconds” or ”minutes”’,16,1) with nowait
    return
end
if @dt = ‘s’
begin
    select @sec = @delay_interval % 60, @min = cast((@delay_interval / 60) as int), @hr = cast((@min / 60) as int)
end
if @dt = ‘m’
begin
    select @sec = 0, @min = @delay_interval % 60, @hr = cast((@delay_interval / 60) as int)
end
select @delay= right(’0′+ convert(varchar(2),@hr),2) + ‘:’ + 
    + right(’0′+convert(varchar(2),@min),2) + ‘:’ + 
    + right(’0′+convert(varchar(2),@sec),2)
if @hr > 23 or @min > 59 or @sec > 59
begin
    select ‘delay interval and type: ‘ + convert (varchar(10),@delay_interval) + ‘,’ + @delay_type + ‘ converts to ‘ + @delay
    raiserror (‘hh:mm:ss delay time cannot > 23:59:59′,16,1) with nowait
    return
end
while (@i <= @num_samples)
begin
            select @now = getdate()
            insert into [dbo].[waitstats] ([wait_type], [waiting_tasks_count], [wait_time_ms], [max_wait_time_ms], [signal_wait_time_ms], now)    
            select [wait_type], [waiting_tasks_count], [wait_time_ms], [max_wait_time_ms], [signal_wait_time_ms], @now
                from sys.dm_os_wait_stats
            insert into [dbo].[waitstats] ([wait_type], [waiting_tasks_count], [wait_time_ms], [max_wait_time_ms], [signal_wait_time_ms], now)    
                select ‘Total’,sum([waiting_tasks_count]), sum([wait_time_ms]), 0, sum([signal_wait_time_ms]),@now
                from [dbo].[waitstats]
                where now = @now
            select @i = @i + 1
            waitfor delay @delay
end
— create waitstats report
execute dbo.get_waitstats_2005
go
exec dbo.track_waitstats_2005 @num_samples=20
                                ,@delay_interval=30
                                ,@delay_type=’s’
                                ,@truncate_history=’y’
                                ,@clear_waitstats=’y’
go

 

The above queries  I hope will help you to identify the wait events that consuming more of your resources and contributing to the latency of your application.Wait events and Queues often go hand in hand, when we see queuing, it means that that something somewhere is waiting for some resources, and above SP’s from MS does the good job of identifying those events that are waiting for something.

Comments

How often do you reboot your Servers

Ok here is this post all about, Server Reboots and  benefits of periodic reboots and how it improves the performance of the system overall.

So let’s start with some background, below are some of the questions or feelings which I feel you might as a performance engineer have either consciously or subconsciously faced it at some point of your career,

1. Did you ever have gut feeling (you get this feeling when you have spent years load testing applications) that application is alright in performance, yet the performance results for the tests looks bad?

2. Did you ever feel that for some reasons you are investigating issues for which there exists no sensible root cause? Yes quite often HP Diagnostics shows us the call method or box which is failing but again when we query logs; failure rates for those URL’s are very less or negligible.

3. We are spending hell lot of time on bugs/chasing application owners to fix the performance issues and application development team most often comes back and says “It works for me in less than second in Dev environment with same set of data” or “ it’s already working in production and response time are in milliseconds”.

4. Application response time is insanely having wide variance with same test setup across various days in spite of having powerful environment that matches the Production in almost all sense.

5. Did you ever feel that environment seems to be just too complicated/big to monitor and seems to be having more than 60+ known servers or boxes in all and Y number of unknown servers with 10’s of tickets implemented on the environment on the daily basis across the set of applications.(These are sometimes called hot deployments which do not require reboots,just un deploy the app war file and redeploy it again on the server with updated version yet at times, if there is badly written code, it leaves orphaned connections JDBC Connections threads or http connector threads).There always exists some application on daily basis for which either some dll is missing or is having incorrect version or has some kind of fix to be deployed.

6. Did you ever feel that lot of software’s are installed and uninstalled on daily basis in the environment and those software’s are quite often a profilers/debugger tools /Framework patches/Security Patches etc. etc.

If your answer to all of these questions is yes, then probably you are testing in the environment where servers has been up and running since more than 3 weeks or maybe a month or might have large set of boxes which has not been rebooted since last 6 months or so or simply its an environment which is managed badly with no health check been done at all.

Well its good thing that boxes are up and running for long time which tells us that these servers can provide us 99.99% availability in terms of service availability. In fact most of the server manufactures sells the servers with these SLA’s and I feel that’s precise reason as why there is no official policy or guidelines on Server Reboots .Below is the one of the comment from the person I follow and who at some point of time I believe was associated with Sun Microsystems,

There is no official policy at Sun regarding rebooting – mainly because we sell enterprise class machines. The intent of an enterprise class machine is to stay up at all cost. Our enterprise class servers have the ability to add and remove memory, IO, and CPU without a reboot. I know it is common practice to reboot “windows” based machines on a regular schedule, but this simply does NOT apply to Enterprise class Sun servers.
For sake of discussion, I will post what I think our server reboot policy should entail.
Purposed Sun Server Reboot Policy:
“Only reboot Sun servers when installing SW or HW that requires a reboot. It is not necessary to reboot servers on a regular schedule like Windows servers.”

Well looking back at his comments, he is saying reboot only when SW/HW is installed that requires the reboot. Fair enough comment given that they sell servers which needs to be up and running all the time under all circumstances. In fact I have known some Solaris boxes which were up and running for quite a long period of time, maybe a month I can recollect. Again does this mean that Solaris boxes does not require a reboot for a month even though we are installing software’s that doesn’t require reboots ?

I would say No, We still need a periodic reboots for Solaris assuming that we are doing lot of testing on those boxes. There always exists one or other issues in TCP Stack which leaves connection thread hanging(Thread abort issues) or there always exists memory leak caused by the bad code which is not released very soon(Yes sometimes we detect memory leaks and sometimes its left over, given that memory is cheap now or we just too busy or lazy) or its just some piece of software which gets uninstalled might just leave some  settings messed up.

Now moving forward let’s talk about the Windows Boxes, Windows servers are one of the best in terms of usability and UI.Everything is so easy to find and fix in windows with sexy GUI. However I have seen most admins often saying that more the windows servers we have in environment more strictly we need to adhere to periodic maintenance of those servers. Another reason as why every install or uninstall of software quite often requires a reboot is that there exists a risk that maybe some shared DLL or registry setting might be left over which might slow down the server drastically(eg Profiler traces). This is my own observation that installing or uninstalling the software from the windows boxes often leaves back orphaned traces in the system and their impact could be major or minor depending on what software we have installed or uninstalled. One can clearly see these by installing CCleaner software which has capability to clean the registry for orphaned keys.If you are installing the software on windows boxes and if the software is using any of the windows win32 dll, then without rebooting, the installed software might not work as desired given that shared DLL’s do not accept registration in case if some process is holding lock on it.Only reboots in most of the cases resolves those issues.(You can confirm this statement with process monitor).

I would say that Server reboots needs to done at periodic intervals depending on the amount of load your systems handles and its best thing to do at all times but again I suggest don’t do it just to hide problems or apply this process as band aid.It does not hurt to have some down time and take proper care of the servers.

So next time if you find someone selling enterprise software/servers and telling you that his server/software installation/uninstallation does not require reboots, then probably you need to dig deeper and ask for some solid technical proofs to back up their statements.

As a performance engineer, I will always suggest, Periodic reboots of the servers in the environment is good thing to do.

PS: Server reboots do not mean that you shut down and restart the box, it means following right process for shutting down the box by making sure that box is ready to be shut down.

Comments

Performance Counters to Identify Memory Issues in MS SQL Database

While monitoring the MS SQL Server I suggest keeping an eye on Buffer Manager/Memory Manager counter which can help us in identifying the memory contention issues in the MSSQL Database. The memory contention issues are quite easier to find in case we have single database hosted on the box and it becomes really hard job when the box is shared by multiple database and memory consumed by each database varies. Below are some of the steps which Microsoft recommends while working with Memory issues in MS SQL Database.

The following steps should help you troubleshoot memory errors.

· Verify if the server is operating under external memory pressure. If external pressure is present, try resolving it first, and then see if the problem/errors still exist.

· Start collecting performance monitor counters for SQL Server: Buffer Manager, SQL Server: Memory Manager.

· Verify the memory configuration parameters (sp_configure), min memory per query, min/max server memory, awe enabled, and the Lock Pages in Memory privilege. Watch for unusual settings. Correct them as necessary. Account for increased memory requirements for SQL Server 2005.

· Check for any non-default sp_configure parameters that might indirectly affect the server.

· Check for internal memory pressures.

· Observe DBCC MEMORYSTATUS output and the way it changes when you see memory error messages.

· Check the workload (number of concurrent sessions, currently executing queries).

Now since in most cases performance team never has access to database, so I suggest monitoring the database specific counter to identify memory contention issues. Below is short description of Buffer Manager Performance Object as given in MSDN Site,

The Buffer Manager object provides counters to monitor how SQL Server uses:

· Memory to store data pages, internal data structures, and the procedure cache.

· Counters to monitor the physical I/O as SQL Server reads and writes database pages.

Monitoring the memory and the counters used by SQL Server helps you determine:

· If bottlenecks exist from inadequate physical memory. If it cannot store frequently accessed data in cache, SQL Server must retrieve the data from disk.

· If query performance can be improved by adding more memory, or by making more memory available to the data cache or SQL Server internal structures.

· How often SQL Server needs to read data from disk. Compared with other operations, such as memory access, physical I/O consumes a lot of time. Minimizing physical I/O can improve query performance.

We need to keep in mind that Database operations are most often are memory intensive and at OS level there exists lot pull/push efforts done by various processes in case if the box is short in memory or is shared database server.

In addition to Available Bytes, Page/second, Paging file usage patterns, below counters also helps in some analysis,

SQLServer: Buffer Manager — Page Life Expectancy: It is number of seconds a page will stay in the buffer pool without references. The longer the page life expectancy, the healthier the server looks from a memory perspective. A server suffering from memory pressure will typically see page life expectancy values of 200 seconds or below. If we find this, we’ll have reasonable grounds to suspect a low memory condition.

SQLServer: Buffer Manager — Buffer cache hit ratio: Percentage of pages found in the buffer cache (Physical Memory) without having to read from disk. The ratio is the total number of cache hits divided by the total number of cache lookups over the last few thousand page accesses. After a long period of time, the ratio moves very little. Because reading from the cache is much less expensive than reading from disk, you want this ratio to be high. Generally, you can increase the buffer cache hit ratio by increasing the amount of memory available to SQL Server. If data pages are not found in the buffer, SQL Server must read them into the buffer from disk. This is usually a slow process because of disk latency and seek times. Even on a fastest storage drives, the time to read a page from disk compared with time to read a page from memory is many multiples greater. Buffer cache hit ratio less than 90% is often considered as some issues on the database side and needs to investigated.

SQLServer: Buffer Manager: Stolen pages: Stolen pages are those pages in memory which are stolen by another process. Servers which are experiencing memory pressure will typically show high quantities of stolen pages relative to the total target pages.

SQLServer: Memory Manager — Memory Grants Pending: Memory grants pending is effectively a queue of processes awaiting a memory grant. In general, if you have any processes queuing waiting for memory, you should expect degraded performance. The ideal situation for a healthy server is no outstanding memory grants.

SQLServer: Buffer Manager — Checkpoint pages/sec: It is number of pages flushed to disk per second by a checkpoint or other operation that require all dirty pages to be flushed.The SQL Server checkpoint operation requires all dirty pages to be written to disk. The checkpoint process is expensive in terms of disk input/output (I/O). When a server is running low on memory the checkpoint process will occur more frequently than usual as SQL Server attempts to create space in the buffer pool. If you observer sustained high checkpoint pages/second compared with normal rates for your server, it’s a good indication of a low memory condition.

SQLServer: Buffer Manager — Lazy writes/sec: Number of buffers written per second by the buffer manager’s lazy writer. The lazy writer is a system process that flushes out batches of dirty, aged buffers (buffers that contain changes that must be written back to disk before the buffer can be reused for a different page) and makes them available to user processes. The lazy writer eliminates the need to perform frequent checkpoints in order to create available buffers. This counter records the number of times per second that SQL Server relocates dirty pages from the buffer pool (in memory) to disk. Again, disk I/O is expensive and you should attempt to provide SQL Server with enough space for the buffer pool that lazy writes are as close to zero as possible. If you’re seeing lazy writes of 20 per second or more, then you can be sure the buffer pool isn’t big enough.

SQLServer: Buffer Manager — Total Pages: It is number of pages in the buffer pool (includes database, free, and stolen pages).This Buffer Manager/Total Pages counter will expose the total number of pages acquired by SQL Server.

SQLServer: Buffer Manager — Target Pages: Ideal number of pages in the buffer pool. This Buffer Manager/Target Pages counter records the ideal number of pages required for the SQL Server Buffer Pool.

If the target and total pages values are the same, SQL Server has sufficient memory. If the target is greater than the total, it’s likely that some external memory pressure (usually another Windows process) is preventing SQL Server acquiring as much memory as it would like to operate.

All these information often helps in troubleshooting hard to find performance issues specially in environments where performance team do not have access to query DMV’s or system tables to gather direct and precise information from the database.

Sometimes I feel monitoring and making sense of all these data is so boring, but again most of the time as a performance engineer we don’t have a choice or any other option given that getting an direct access to system tables is almost impossible in most projects for various reasons.

Comments

Java Performance Series – 2

Coming back to Java performance series, let me write something about various x/xx (thank god they didn’t come up with xxx stuff) options that comes bundled with java virtual machine. To begin with some valuable information copy pasted from oracle site,

Below is the some key things which we need to keep in mind always as we move forward with JDK,

  • Options that begin with -X are non-standard (not guaranteed to be supported on all VM implementations), and are subject to change without notice in subsequent releases of the JDK.
  • Options that are specified with -XX are not stable and are not recommended for casual use. These options are subject to change without notice.

Oracle classifies various VM options into below 3 categories,

  • Behavioral options change the basic behavior of the VM.
  • Performance tuning options are knobs which can be used to tune VM performance.
  • Debugging options generally enable tracing, printing, or output of VM information.

There is whole lot of information that can be pulled out of virtual machine which we can use to either identify/diagnose or get information from VM and analyze as how performance of the VM looks like. Some of the options needs to be set via server xml files and some can be pulled out via JMX. Most of the monitoring tools that are in market today uses JMX to pull the information from the JVM.

So let’s understand some of the various key gc debugging options flags that can used to pull Garbage collection stats from the java virtual machine. Please note that usage and command line differs across the oracle jdk /IBM jdk that is one option flag working perfectly in oracle JDK might not work or exist in IBM JDK. (Thank God BEA was taken over by Oracle, else we might have BEA JROCKKIT JVM as well to know and understand).

Below are the Oracle JDK Flags that are used to print the GC Information

-XX:+PrintGCTimeStamps prints a time stamp representing the number of seconds since the HotSpot VM was launched until the garbage collection occurred.

-XX:+PrintGCDetails provides garbage collector-specific statistics and thus varies depending on the garbage collector in use.

-Xloggc :< filename> directs the garbage collection information to the file named <filename>.

-XX:+PrintGCDateStamps Prints the garbage collection information with current date and timestamps. Information with this flag looks like below,

2012-04-08T02:57:10.213-0200: [GC [PSYoungGen: 623648K->323422K (300226K)]

192109K->123528K (13136848K), 0.534183 secs]

[Times: user=3.23 sys=0.09, real=0.32 secs]

The date stamp field, 2012-04-08T02:57:10.213-0200, uses the ISO 8601 date and time stamp. The output has the following form: YYYY-MM-DDTHH-MM-SS.

mmm-TZ where:

YYYY is the four-digit year.

MM is the two-digit month; single-digit months are prefixed with 0.

DD is the two-digit day of the month; single-digit days are prefixed with 0.

T is a literal that denotes a date to the left of the literal and a time of day to the right.

HH is the two-digit hour; single-digit hours are prefixed with 0.

MM is the two-digit minute; single-digit minutes are prefixed with 0.

SS is the two-digit second; single-digit seconds are prefixed with 0.

SS is the two-digit second; single-digit seconds are prefixed with 0.

mmm is the three-digit milliseconds; single- and two-digit milliseconds are prefixed with 00 and 0, respectively.

TZ is the time zone offset from GMT.

IBM JDK uses the below flags for getting information about garbage collection stats

-verbose: gc

Prints out information about garbage collections to standard out.

-Xverbosegclog :< path to file><filename [, X, Y]>

Prints out information about garbage collections to a file. If the integers X and Y are specified, the output is redirected to X files each containing output from Y GC cycles.

Having all these kinds of information gives us a proper understanding as how people often collect and troubleshoot garbage collection issues.

In the next post, I will try to explain some more key options that we need to know about java performance tuning. I will try to see if I can explain as how we can also use those command lines in real time with some example. (I thought of pulling out some GC numbers from my laptop, but there exists a risk that I might corrupt my project files , I am not yet full time developer J ,need to take copy/backup of my current application).

Comments

Basic File Upload with Apache Commons

Below is the piece of servlet code which you are try out in case if you even has a requirement for implementing a basic uploads in your website with Apache Commons File Upload library.

/*
* To change this template, choose Tools | Templates
* and open the template in the editor.
*/
package test;

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Iterator;
import java.util.List;

import javax.servlet.ServletConfig;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.FileUploadException;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.ServletFileUpload;
import org.apache.commons.io.FilenameUtils;

public class UploadPDF extends HttpServlet {

    private static final String TMP_DIR_PATH = “D:\\temp”;
    private File tmpDir;
    private static final String DESTINATION_DIR_PATH = “D:\\UploadedPDF”;
    private File destinationDir;

    @Override
    public void init(ServletConfig config) throws ServletException {
        super.init(config);
        tmpDir = new File(TMP_DIR_PATH);
        if (!tmpDir.isDirectory()) {
            throw new ServletException(TMP_DIR_PATH + ” is not a directory”);
        }
        String realPath = getServletContext().getRealPath(DESTINATION_DIR_PATH);
        destinationDir = new File(DESTINATION_DIR_PATH);
        if (!destinationDir.isDirectory()) {
            throw new ServletException(DESTINATION_DIR_PATH + ” is not a directory”);
        }

    }

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        PrintWriter out = response.getWriter();
        response.setContentType(“text/plain”);

        DiskFileItemFactory fileItemFactory = new DiskFileItemFactory();
        /*
         *Set the size threshold, above which content will be stored on disk.
         */
        fileItemFactory.setSizeThreshold(1 * 1024 * 1024); //1 MB
        /*
         * Set the temporary directory to store the uploaded files of size above threshold.
         */
        fileItemFactory.setRepository(tmpDir);

        ServletFileUpload uploadHandler = new ServletFileUpload(fileItemFactory);

        try {
            /*
             * Parse the request
             */
            List items = uploadHandler.parseRequest(request);
            Iterator itr = items.iterator();
            while (itr.hasNext()) {
                FileItem item = (FileItem) itr.next();
                /*
                 * Handle Form Fields.
                 */
                String fileName = item.getName();

                if (item.isFormField()) {
                    out.println(“File Name = ” + item.getFieldName() + “, Value = ” + item.getString());
                } else {
                    //Handle Uploaded files.
                    out.println(“Field Name = ” + item.getFieldName()
                            + “, File Name = ” + item.getName()
                            + “, Content type = ” + item.getContentType()
                            + “, File Size = ” + item.getSize()
                            + “, boolean isInMemory = ” + item.isInMemory());
                    /*
                     * Write file to the location.
                     */

                    out.println(“File Name is = ” + FilenameUtils.getName(fileName));
                    File file = new File(destinationDir, FilenameUtils.getName(fileName));
                    out.println(destinationDir);
                    out.println(item.getName());                 
                    item.write(file);
                }
                out.close();
            }
        } catch (FileUploadException ex) {
            log(“Error encountered while parsing the request”, ex);
        } catch (Exception ex) {
            log(“Error encountered while uploading file”, ex);
        }

    }
}

Please make sure that you have apache commons IO library in your container lib folder or common io jar files in your application lib folder.In addition to commons IO library you will also need to download file upload jar files and place them in webapps lib folder.All these jars are the must for this code to run.More details on the location of these files can be found here.

In addition to having all the dependencies imported to your project,you also have make sure that proper encoding method is set in your form’s page that is request encoding type should be set as

enctype=”multipart/form-data”

In addition to this we also need to ensure that filename/file size/content is properly send by the client(browsers).Sometimes it might happen that due to some security policies, all these information might be blocked so I suggest to check for these possibilities in case if you find any issues.We can check this by giving couple of print statements and making sure that none of the values are coming out as null.

At the time of writing this post, I have checked that this code works with IE9/FF10/Safari5.03/Opera 10x.

Technorati Tags: ,

Comments

« Previous entries Next Page » Next Page »