Two computer performance metrics that you commonly tune for are throughput and latency. What do these terms mean? I come from a training background, and whenever possible I try to use a good analogy to describe terms that can get very detailed and technical.
I am going to use a real-life example: a chiropractor's office. I go to a chiropractor twice a month for adjustments as one of my preventive healthcare measures. He has 4 tables where incoming clients lie face down to be adjusted. Sometimes many people come in at once and have to sit in one of 7 seats, remembering the order in which they sat down so that they are serviced first in, first out (FIFO). He walks around to each table and performs some adjustments; while that person's body is given time to react, he moves on to another client, so all active clients get their individual adjustments in a roughly round-robin fashion.
Now to relate this real-life example to common queuing theory terms. The service time is the time from lying face down on a table until you get up. This varies between clients depending on the level of adjustment they need, but is generally between 7 and 20 minutes. The response time is the time from when a client enters the doctor's office until the time he leaves it. Latency is another term for response time. My latency can sometimes be 35 minutes: I came in at a busy time, had to wait in one of the 7 seats for 19 minutes, and then he worked on me for 16 minutes, for a total of 35 minutes in his office.
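The arithmetic behind that 35-minute visit can be written down as a tiny sketch. The `Visit` class and its field names are made up here purely for illustration; the only thing taken from the example is that response time is queue delay plus service time.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """One trip to the office (hypothetical helper, for illustration only)."""
    wait_minutes: float     # queue delay: time spent in a waiting-room seat
    service_minutes: float  # service time: time spent on the adjusting table

    @property
    def response_minutes(self) -> float:
        # Response time (latency) = queue delay + service time.
        return self.wait_minutes + self.service_minutes


busy_visit = Visit(wait_minutes=19, service_minutes=16)
print(busy_visit.response_minutes)  # 35
```

On a quiet day the wait would be 0 and the latency would simply equal the service time.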
One of my main considerations as a client is my latency, because I have many other things scheduled that day. The doctor's main consideration is throughput, which is how many people he can adjust in a day, because each one means money to him. If my latency were to jump to 1 hour on average, I would reconsider the benefit of going to this chiropractor. He has 4 tables in order to increase his throughput at the expense of each client's latency: there are times when your body is ready for the next adjustment but he is busy with another client.
This can be likened to the new Chip Multi-Threading (CMT) processor architectures like Sun's UltraSPARC T2 (code-named Niagara 2). This processor schedules 8 threads (clients) onto one core (servicer) in round-robin fashion. Even if a thread is ready to run, it has to wait for the core to get around to it. This is why a heavily compute-bound, single-threaded application may not run as fast on this processor as on one whose core keeps running a single thread until that thread waits for I/O.
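To make the trade-off concrete, here is a deliberately simplified toy model of one core serving its threads round-robin. It is only a sketch: the function name, the abstract "work units", and the one-unit quantum are all assumptions for illustration, and real CMT hardware is subtler (for example, it can give cycles to other threads while one is stalled).

```python
def round_robin_finish_times(work_units, quantum=1):
    """Return the clock time at which each thread finishes on a single core.

    work_units: compute units each thread needs (all threads always ready).
    The core visits threads in fixed order, running `quantum` units per visit.
    """
    remaining = list(work_units)
    finish = [None] * len(work_units)
    clock = 0
    while any(r > 0 for r in remaining):
        for i, r in enumerate(remaining):
            if r <= 0:
                continue  # this thread is already done
            run = min(quantum, r)
            clock += run
            remaining[i] -= run
            if remaining[i] == 0:
                finish[i] = clock
    return finish


alone = round_robin_finish_times([100])        # one compute-bound thread
shared = round_robin_finish_times([100] * 8)   # same thread among 8 peers

print(alone[0])    # 100
print(shared[0])   # 793
print(shared[-1])  # 800
```

In this model the core completes the same total work either way, but the individual compute-bound thread sees its latency grow roughly eightfold when it has to share.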
The difference between service time and response time (latency) is queue delay: time spent waiting in one of my chiropractor's 7 seats instead of being actively serviced. If my chiropractor's wife, who is also a licensed chiropractor, is helping him in another adjusting room, then that is like having two cores, and this would on average double the office's throughput. If the arrival rate of the clients exactly matched the service times, this would be the most efficient scenario: there would be no queue delays, throughput would be optimal, making the doctor happy, and the clients would be happy due to decreased latency (besides the adjustment itself). The 4 tables and 7 seats the doctor uses can also be viewed as buffers, which help with flow control.
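A small FIFO simulation can illustrate these three cases. This is a minimal sketch under stated assumptions: each "server" handles one client at a time (so the 4 tables are ignored), every client needs exactly 10 minutes, and the function names are invented for this example.

```python
import heapq


def simulate_fifo(arrivals, service_times, servers=1):
    """Return (wait, response) per client for a FIFO queue with `servers` servers."""
    free_at = [0.0] * servers  # min-heap: time at which each server becomes free
    heapq.heapify(free_at)
    results = []
    for arrive, service in zip(arrivals, service_times):
        earliest_free = heapq.heappop(free_at)
        start = max(arrive, earliest_free)   # wait if every server is busy
        finish = start + service
        heapq.heappush(free_at, finish)
        results.append((start - arrive, finish - arrive))
    return results


def mean_wait(results):
    return sum(wait for wait, _ in results) / len(results)


service = [10] * 6
paced = simulate_fifo([0, 10, 20, 30, 40, 50], service)   # arrivals match service
burst = simulate_fifo([0] * 6, service)                   # everyone arrives at once
burst_two = simulate_fifo([0] * 6, service, servers=2)    # the wife helps out

print(mean_wait(paced))      # 0.0
print(mean_wait(burst))      # 25.0
print(mean_wait(burst_two))  # 10.0
print(burst[-1][1], burst_two[-1][1])  # 60 30
```

When arrivals are paced to match the service time, nobody waits. When everyone shows up at once, queue delay dominates latency, and the second server clears the same six clients in half the time, which is the doubling of throughput described above.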
Computer architectures can basically be modeled as queues of requests that need to be processed. This applies to CPUs, networks, and storage subsystems. DTrace, the revolutionary observability tool in Solaris 10, can be used to find latency within many components: function call latency, disk I/O latency, and dispatch latency (the time a thread waits for a CPU in order to run).
If you look up queuing theory on Wikipedia, you will find links to more formal descriptions of this important area of computer science, including a reference to Leonard Kleinrock, the famous UCLA professor whose queuing theory work laid the mathematical groundwork for packet switching, which is the whole basis for the Internet. His lab at UCLA actually set up the first node on the Internet (then called ARPANET).