
Probability Distributions and Statistics
  1. INTRODUCTION
  2. BASIC STATISTICS
    1. STANDARD DEVIATION
    2. VARIANCE
  3. RANDOM VARIABLE
    1. DEFINITION
    2. TYPES OF RANDOM VARIABLES
    3. PROBABILITY MASS FUNCTION
    4. PROBABILITY DENSITY FUNCTION
    5. CUMULATIVE DISTRIBUTION FUNCTION
    6. EXPECTATION
    7. VARIANCE
    8. STANDARD DEVIATION
  4. DISCRETE PROBABILITY DISTRIBUTIONS
    1. BINOMIAL DISTRIBUTION
    2. POISSON DISTRIBUTION
    3. HYPERGEOMETRIC DISTRIBUTION
  5. CONTINUOUS PROBABILITY DISTRIBUTIONS
    1. UNIFORM DISTRIBUTION
    2. NORMAL DISTRIBUTION
  6. DETERMINING DISTRIBUTIONS
  7. SUMMARY OF DISTRIBUTIONS
  8. COEFFICIENT OF CORRELATION AND LINEARITY
    1. DEFINITION
    2. INTERPRETATION
  9. COEFFICIENT OF VARIATION
    1. DEFINITION
    2. APPLICATIONS

  1. INTRODUCTION
Statistics is a science that deals with the collection, classification, analysis, and interpretation of data or numerical facts.
Statistics is extensively used in business for market research and risk analysis. Probability forms an important part of statistics. We can use probability to forecast results based on past data.
Questions on probability distributions and statistics appear in exams like FMS, IIFT and JMET. Every year there are at least two or three questions from this topic, thereby making it an important one. It is also an important part of the MBA curriculum.
  2. BASIC STATISTICS
    1. STANDARD DEVIATION
The standard deviation is a "measure of dispersion" about the mean of a population, a data set or a probability distribution, i.e. it measures how widely the values are scattered around the mean.
For example, suppose the average SAT score of students at Stanford University is 1400, with a standard deviation of 20. If the scores are roughly bell-shaped (normally distributed), most students (about 68%) score within 20 marks of the mean (1380–1420), while almost all students (about 95%) score within 40 marks of the mean (1360–1440). If the standard deviation were zero, every student would have scored exactly 1400.
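For n values X1, X2, X3, …, Xn, the standard deviation σ can be calculated as:
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \mu\right)^2}$$
or
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}X_i^2 - \mu^2}$$
where μ is the mean of the sample of size n and is defined as:
$$\mu = \frac{1}{n}\sum_{i=1}^{n}X_i$$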
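The deviation form σ = √(Σ(Xi − μ)²/n) and the raw-moment form σ = √(ΣXi²/n − μ²) should always give the same number. As a quick check, the Python sketch below computes both for a made-up data set and compares them with the standard library's `statistics.pstdev`, which also divides by n. The data values and function names are illustrative only.

```python
import math
from statistics import pstdev

def sd_deviation_form(xs):
    """sigma = sqrt( sum((x - mu)^2) / n ): average squared distance from the mean."""
    n = len(xs)
    mu = sum(xs) / n
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / n)

def sd_raw_moment_form(xs):
    """sigma = sqrt( sum(x^2) / n - mu^2 ): mean of squares minus square of mean."""
    n = len(xs)
    mu = sum(xs) / n
    return math.sqrt(sum(x * x for x in xs) / n - mu ** 2)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only
print(sd_deviation_form(data), sd_raw_moment_form(data), pstdev(data))  # 2.0 2.0 2.0
```

The raw-moment form is handy for hand calculation when only Σx² is known, as in Example 1. With floating-point data it can lose precision when the mean is large compared with the spread, so the deviation form is usually the safer choice in code.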
Example 1:
Recently, smoking in public places was declared an offence. Delhi Police has started imposing a penalty for smoking in public and has eight raid teams in place. In a surprise check, a raid team caught 40 people smoking in the Connaught Place area of Delhi. The standard deviation and the sum of squares of the amounts found in their pockets were Rs. 10 and Rs. 40,000, respectively. If the total fine imposed on these offenders is equal to the total amount found in their pockets and the fine imposed is uniform, what is the amount that each offender will have to pay as fine?
[FMS 2009]

(1) Rs. 90    (2) Rs. 60
(3) Rs. 30    (4) Rs. 15

Solution:
The standard deviation of n items with arithmetic mean x̄ is given by:
$$\sigma = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} \quad \text{...(i)}$$
We have also been given that each offender was fined an equal amount.
Let each person be fined Rs. y.
Then, the total fine collected from all 40 offenders = 40y
This is given to be equal to the total amount found in the offenders' pockets:
$$\sum x = 40y$$
Hence, $\bar{x} = \frac{\sum x}{n} = \frac{40y}{40} = y$
Substituting σ = 10, Σx² = 40,000 and n = 40 in equation (i), we get:
$$10 = \sqrt{\frac{40000}{40} - y^2} = \sqrt{1000 - y^2}$$
Squaring both sides:
$$100 = 1000 - y^2$$
y² = 1000 – 100 = 900
y = 30
Every person has to pay a fine of Rs. 30.
Hence, option 3.
Example 2:
The mean salary in ICM Ltd. was Rs. 1,500 and the standard deviation was Rs. 400. A year later each employee got a Rs. 100 raise. After another year each employee’s salary (including the above mentioned raise) was increased by 20%. The standard deviation of the current salary is:
[IIFT 2008]

(1) 460    (2) 480
(3) 580    (4) None of the above

Solution:
The mean salary was Rs. 1,500 and the standard deviation (SD) was Rs. 400. After one year each employee received the same raise of Rs. 100. Adding a constant to every value shifts the mean but not the spread, so the SD remains Rs. 400.
In the following year every salary was increased by 20%, i.e. multiplied by 1.2. Multiplying every value by a constant multiplies the SD by the same constant, so the SD also goes up by 20%. (Here the absolute increase depends on each employee's salary and is not uniform, which is why the SD changes.)
Standard deviation after 20% raise = 400 + 20% of 400 = 400 + 80 = Rs. 480

The new standard deviation would be 480.
Hence, option 2.
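The two rules used in Example 2 can be checked numerically. A flat raise leaves the SD unchanged, and a percentage raise scales the SD by the same factor. The sketch below assumes a hypothetical two-employee payroll (Rs. 1,100 and Rs. 1,900), chosen only because its mean is 1500 and its population SD is 400. The helper `apply_raises` is illustrative and not from the question.

```python
from statistics import mean, pstdev

def apply_raises(salaries, flat_raise, pct_raise):
    """Add a flat raise to every salary, then increase each result by pct_raise percent."""
    return [(s + flat_raise) * (1 + pct_raise / 100) for s in salaries]

# Hypothetical payroll with mean 1500 and population SD 400 (not from the exam question)
salaries = [1100, 1900]
after_flat = apply_raises(salaries, 100, 0)   # flat Rs. 100 raise only
after_both = apply_raises(salaries, 100, 20)  # flat raise, then 20%

print(mean(salaries), pstdev(salaries))
print(pstdev(after_flat), pstdev(after_both))
```

Any other payroll with SD 400 would show the same pattern: 400 after the flat raise and 480 after the 20% increase.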
    2. VARIANCE
The variance of a random variable or a sample tells us how widely spread the values of the random variable or sample are likely to be from the mean or the average. The larger the variance, the more scattered the observations from the average.
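Variance is the square of the standard deviation and hence is always a non-negative number.
It is denoted by σ².
If the variance is σ², the values of the random variable or sample typically cluster within about one standard deviation σ = √(variance) of the mean μ, i.e. in the interval
[μ − σ, μ + σ]
For example,
for the numbers 1, 2, 3, 4 and 5 the mean is 3, so the variance is σ² = [(1 − 3)² + (2 − 3)² + (3 − 3)² + (4 − 3)² + (5 − 3)²]/5 = 10/5 = 2, and the standard deviation is σ = √2 ≈ 1.41.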
Example 3:
The mean of the numbers a, b, 8, 5, 10 is 6 and the variance is 6.80. Then which one of the following gives possible values of a and b?
[SNAP 2009]
(1) a = 0, b = 7    (2) a = 5, b = 2
(3) a = 3, b = 4    (4) a = 2, b = 4

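Solution:
$$\text{Mean} = \frac{a + b + 8 + 5 + 10}{5} = \frac{a + b + 23}{5}$$
Mean is given as 6.
$$\frac{a + b + 23}{5} = 6 \;\Rightarrow\; a + b + 23 = 30$$
a + b = 30 – 23 = 7
So, option 4 can be eliminated.
$$\text{Variance} = \frac{(a-6)^2 + (b-6)^2 + (8-6)^2 + (5-6)^2 + (10-6)^2}{5} = 6.80$$
$$(a-6)^2 + (b-6)^2 + 4 + 1 + 16 = 6.80 \times 5 = 34$$
(a − 6)² + (b − 6)² + 21 = 34
(a − 6)² + (b − 6)² = 13
Only option 3 fits the above equation, since (3 − 6)² + (4 − 6)² = 9 + 4 = 13.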
Hence, option 3.
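A brute-force check of Example 3 is straightforward. The sketch below tries each answer option and keeps those whose five numbers have mean 6 and population variance 6.80. It uses exact fractions, so 6.80 can be compared without rounding error. The `options` dictionary simply copies the four choices from the question.

```python
from fractions import Fraction

# The four answer options (a, b) from the question
options = {1: (0, 7), 2: (5, 2), 3: (3, 4), 4: (2, 4)}

def mean_and_variance(values):
    """Population mean and variance (divide by n), computed exactly with fractions."""
    n = len(values)
    mu = Fraction(sum(values), n)
    var = sum((x - mu) ** 2 for x in values) / n
    return mu, var

matches = []
for option, (a, b) in options.items():
    mu, var = mean_and_variance([a, b, 8, 5, 10])
    if mu == 6 and var == Fraction(68, 10):  # 6.80 as an exact fraction
        matches.append(option)

print(matches)  # [3]
```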
  3. RANDOM VARIABLE
    1. DEFINITION
A random variable is a function, which assigns unique numerical values to all possible outcomes of a random experiment subject to fixed conditions.
For example, if a coin is tossed the outcome can either be ‘heads’ or ‘tails’. If the random variable X is defined as the number of heads then X can take only two values, 0 and 1 as there are only two outcomes of the experiment either ‘heads’ or ‘not heads’(‘tails’).
A random variable is a function that maps events to numbers.
Example 4:
A coin is tossed three times and the sequence of heads and tails are noted. The random variable X is defined as the number of heads in three coin tosses. What are the values that X can take?
Solution:
When a coin is tossed three times, the total number of outcomes = 2 × 2 × 2 = 8
The table below lists all possible elements of the sample space and the corresponding values of X.
Outcome X
HHH 3
HHT 2
HTH 2
HTT 1
THH 2
THT 1
TTH 1
TTT 0

The random variable X can take values 0, 1, 2 and 3.
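The table in Example 4 can also be generated mechanically. Enumerate the sample space, then apply the random variable X (here, "count the heads") to each outcome. This makes concrete the idea that a random variable is a function from outcomes to numbers. The sketch below uses `itertools.product` from Python's standard library.

```python
from itertools import product

# Every sequence of three tosses, e.g. ('H', 'T', 'H')
outcomes = list(product("HT", repeat=3))

# X maps each outcome to its number of heads
x_values = {"".join(o): o.count("H") for o in outcomes}

print(len(outcomes))                   # 8
print(sorted(set(x_values.values())))  # [0, 1, 2, 3]
```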
Example 5:
Two dice are thrown and the sum of numbers on the two faces is noted. The random variable X is defined as the sum of numbers on the faces of the two dice. What are the values that X can take?
Solution:
The minimum value that X can take is 2 and the maximum value that X can take is 12 achieving all the values in between.

The random variable X can take values 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12.
    2. TYPES OF RANDOM VARIABLES
      1. DISCRETE
A discrete random variable takes only distinct, separate values (typically integers) and is usually the result of counting.
For example, if we flip a coin and count the number of heads, then the number of heads results from a random process (flipping a coin). Also, the number of heads will always be an integer between 0 and n (both inclusive), where n is the number of times the coin is tossed. Therefore, the number of heads is a discrete random variable.
A discrete random variable can take only a finite (or countably infinite) number of distinct values. Examples of discrete random variables include the number of students in a class, the Saturday night attendance at a movie theatre, the number of patients in a doctor's surgery and the number of defective goods in a box of a consignment.
      2. CONTINUOUS
A continuous random variable can take any value within a range of values. In this case the outcomes are measured, not counted. Cricket batting averages, IQ scores, the length of time a long distance telephone call lasts and the length of time a computer chip lasts are just a few examples of continuous random variables.
Probabilities for a continuous random variable are not assigned to specific values. The probability of observing any single exact value is 0, because the variable can take infinitely many values.
Instead, probability is assigned to intervals of values and is represented by the area under a curve (in higher mathematics, this area is computed as an integral).
    3. PROBABILITY MASS FUNCTION
If the discrete random variable X can take values r1, r2, r3, …, rn, then the function that gives the probability that X = ri for each of the values r1, r2, r3, …, rn is the probability mass function (pmf). That is,
$$f_X(r_i) = P(X = r_i), \quad i = 1, 2, 3, \dots, n$$
Example 6:
If we throw a die, what is the sum of the values of the probability mass function over all the outcomes?
Solution:
Total number of outcomes = 6
Let the random variable X denote the score of the die.
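Then X can take values 1, 2, 3, 4, 5 and 6.
$$f_X(1) = P(X = 1) = \frac{1}{6}, \quad f_X(2) = P(X = 2) = \frac{1}{6}, \quad \dots, \quad f_X(6) = P(X = 6) = \frac{1}{6}$$
$$\sum_{r=1}^{6} f_X(r) = 6 \times \frac{1}{6} = 1$$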
This means the probability of throwing a die and getting a score of 1 or 2 or 3 or 4 or 5 or 6 is 1.
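The same calculation can be done with exact fractions. The sketch below stores the pmf of a fair die as a dictionary from each score to its probability, then checks that the values sum to exactly 1. Using `Fraction` rather than floats avoids the small rounding error that 6 × 0.1666… would introduce.

```python
from fractions import Fraction

# pmf of a fair die: f_X(r) = P(X = r) = 1/6 for r = 1, ..., 6
pmf = {r: Fraction(1, 6) for r in range(1, 7)}

total = sum(pmf.values())
print(total)  # 1
```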

REMEMBER
  • The sum of the probabilities (i.e. the values of the probability mass function) over all elements of the sample space is always 1.
    We can also understand this concept with a simple example of flipping a coin.
    On flipping a coin we get either heads (H) or tails (T).
    P(H) = 1/2, P(T) = 1/2
    P(H) + P(T) = 1/2 + 1/2 = 1
    The probability of flipping a coin and getting 'heads' or 'tails' is 1.
    In general,
    if an experiment has the mutually exclusive and exhaustive outcomes A1, A2, A3, …, An,
    then P(A1) + P(A2) + … + P(An) = 1
    4. PROBABILITY DENSITY FUNCTION
Figure (i)
If X is a continuous random variable, then the probability density function (pdf) of X, represented by fX(x), is a function such that for any two numbers a and b with a ≤ b, the probability that the random variable X takes a value in the interval [a, b] is the area under the curve of the density function fX(x) from a to b.
That is, P(a ≤ X ≤ b) = Area of the shaded region in Figure (i) = $\int_a^b f_X(x)\,dx$
Also, total area under the probability density curve is 1 square unit.
Area of the shaded region in Figure (ii) = 1 square unit
Figure (ii)
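The "area under the curve" definition can be illustrated numerically. The sketch below assumes a hypothetical density f(x) = 2x on [0, 1] (zero elsewhere), which does not appear in the text. It approximates areas with a simple midpoint rule. For this density, P(0.5 ≤ X ≤ 1) = 1 − 0.5² = 0.75 and the total area is 1.

```python
def pdf(x):
    """Hypothetical density for illustration: f(x) = 2x on [0, 1], 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def area(f, a, b, steps=10_000):
    """Approximate the area under f from a to b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

print(round(area(pdf, 0.5, 1.0), 6))  # P(0.5 <= X <= 1) = 0.75
print(round(area(pdf, 0.0, 1.0), 6))  # total area = 1.0
```

The midpoint rule is exact for straight-line densities such as this one, so the only error is floating-point rounding. For curved densities, increasing `steps` improves the approximation.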
    5. CUMULATIVE DISTRIBUTION FUNCTION
The cumulative distribution function (cdf) of a random variable X is the function FX(x) defined, for a number x, as the probability that the observed value of X will be at most x, i.e. FX(x) = P(X ≤ x). For a continuous random variable, it is the area under the probability density curve from −∞ to x:
$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$$
    6. EXPECTATION
The expected value (or mean) of a random variable is its central or average value.
Stating the expected value gives a general impression of the behaviour of a random variable without listing all the details of its probability mass function (if it is discrete) or its probability density function (if it is continuous). In practice, the probabilities are often estimated from past data, and the expected value is then used to forecast future results.
The expected value of a random variable X is represented by E(X) or μ.
      1. EXPECTATION FOR A DISCRETE RANDOM VARIABLE
If X is a discrete random variable that can take values x1, x2, x3, …, xn and P(xi) represents P(X = xi), then the expected value of X is given by:
$$E(X) = \sum_{i=1}^{n} x_i P(x_i)$$
All we are doing is computing the weighted average of the random variable over its range, using probabilities as weights.
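For example,
suppose there are three people in a room, of whom 2 are 5 ft tall and 1 is 6 ft tall, and one person is chosen at random.
The random variable X denotes the height of the chosen person.
Then the only values X can take are 5 and 6, with P(X = 5) = 2/3 and P(X = 6) = 1/3, so
E(X) = 5 × 2/3 + 6 × 1/3 = 16/3 ≈ 5.33 ft.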
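The weighted-average idea translates directly into code. The sketch below assumes, as in the height example, that one of the three people is picked at random. It builds the pmf by counting how often each height occurs, then sums value × probability with exact fractions.

```python
from fractions import Fraction

# Heights (ft) of the three people; one person is assumed to be picked at random
heights = [5, 5, 6]
n = len(heights)

# pmf: P(X = 5) = 2/3, P(X = 6) = 1/3
pmf = {h: Fraction(heights.count(h), n) for h in set(heights)}

expected = sum(h * p for h, p in pmf.items())
print(expected, float(expected))  # 16/3 5.333333333333333
```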
      2. EXPECTATION FOR A CONTINUOUS RANDOM VARIABLE
If X is a continuous random variable with probability density function fX(x), then the expected value of X is given by:
$$E(X) = \int_{-\infty}^{\infty} x\, f_X(x)\,dx$$
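For a continuous variable, the sum becomes the integral E(X) = ∫ x fX(x) dx, which can be approximated numerically. The sketch below again assumes a hypothetical density f(x) = 2x on [0, 1], which is not from the text. For this density, E(X) = ∫₀¹ 2x² dx = 2/3. The midpoint rule used here is one simple choice among many numerical integration methods.

```python
def pdf(x):
    """Hypothetical density for illustration: f(x) = 2x on [0, 1], 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def expectation(f, lo, hi, steps=100_000):
    """Approximate E(X) = integral of x * f(x) dx over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += x * f(x)
    return total * width

print(round(expectation(pdf, 0.0, 1.0), 6))  # exact value is 2/3
```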
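      3. GENERAL EXPECTATION
In general, for a discrete random variable X and a function g,
$$E[g(X)] = \sum_{i=1}^{n} g(x_i) P(x_i)$$
For example, taking g(X) = X² gives $E(X^2) = \sum_{i=1}^{n} x_i^2 P(x_i)$.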
Example 7:
A and B throw a die for a stake of Rs. 11, which is to be won by the player who first throws a six. The game ends when the stake is won by A or B. If A has the first throw, what are their respective expectations?
[SNAP 2008]
(1) 5 and 6    (2) 6 and 5
(3) 11 and 10    (4) 10 and 1
Solution:
A wins either in the first throw of the game, or in the third throw after A and B lose in the first and second throws, respectively, or in the fifth throw after A loses in the first and third throws and B loses in the second and fourth throws and so on. The probability of a six being thrown is 1/6 and the probability of 6 not being thrown is 5/6.
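The probability of A winning
$$P(A) = \frac{1}{6} + \left(\frac{5}{6}\right)^2\frac{1}{6} + \left(\frac{5}{6}\right)^4\frac{1}{6} + \dots = \frac{1/6}{1 - (5/6)^2} = \frac{1}{6} \times \frac{36}{11} = \frac{6}{11}$$
The probability of B winning
$$P(B) = 1 - \frac{6}{11} = \frac{5}{11}$$
A's expectation = 11 × 6/11 = Rs. 6
B's expectation = 11 × 5/11 = Rs. 5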
Hence, option 2.
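The geometric-series argument in Example 7 can be verified with exact fractions. The sketch below computes P(A wins) = p / (1 − q²) in closed form and multiplies each player's winning probability by the stake. It also cross-checks the closed form against a long partial sum of the same series. The variable names are illustrative.

```python
from fractions import Fraction

p_six = Fraction(1, 6)  # probability of throwing a six on one throw
p_not = 1 - p_six       # probability of not throwing a six
stake = 11

# A throws 1st, 3rd, 5th, ...: P(A) = p + q^2 p + q^4 p + ... = p / (1 - q^2)
p_a = p_six / (1 - p_not ** 2)
p_b = 1 - p_a

# Cross-check against a long partial sum of the same geometric series
partial = sum(p_not ** (2 * k) * p_six for k in range(100))

print(p_a, p_b, stake * p_a, stake * p_b)  # 6/11 5/11 6 5
print(float(p_a - partial) < 1e-12)        # True
```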
Example 8:
A medical clinic tests blood for a certain disease from which approximately one person in a hundred suffers. People come to the clinic in groups of 50. The operator of the clinic wonders whether he can increase the efficiency of the testing procedure by conducting pooled tests. In a pooled test, the operator would pool the 50 blood samples and test them all together. If the pooled test was negative, he could pronounce the whole group healthy. If not, he could then test each person's blood individually. The expected number of tests the operator will have to perform if he pools the blood samples is:
[IIFT 2008]
(1) 47    (2) 25
(3) 21    (4) None of the above
Solution:
Probability that a random person has the disease is 1/100.
Probability that a random person does not have the disease is 99/100.
Out of a sample of 50 people, the probability that no one has the disease is (99/100)^50 ≈ 0.605
Out of a sample of 50 people, the probability that at least one person has the disease is
1 − (99/100)^50 ≈ 0.395
If no one in the sample of 50 people has the disease, then only 1 test will have to be conducted.
If at least one person in the sample of 50 people has the disease, then 51 tests will have to be conducted (1 pooled test + 50 individual tests).
The expected number of tests that the operator will have to conduct
$$= 1 \times (0.99)^{50} + 51 \times \left[1 - (0.99)^{50}\right] = 51 - 50 \times (0.99)^{50}$$
$$\approx 51 - 50 \times 0.605 \approx 51 - 30 \approx 21$$
Hence, option 3.
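The expected-value calculation in Example 8 fits in a small function. The sketch below assumes the scheme described in the question: one pooled test, followed by one test per person if the pool is positive. The function name `expected_pooled_tests` is illustrative.

```python
def expected_pooled_tests(group_size, p_disease):
    """Expected tests when one pooled test is followed by individual tests if it is positive."""
    p_all_clear = (1 - p_disease) ** group_size  # pooled test negative: 1 test in total
    p_positive = 1 - p_all_clear                 # pooled test positive: 1 + group_size tests
    return 1 * p_all_clear + (1 + group_size) * p_positive

e = expected_pooled_tests(50, 0.01)
print(round(e, 2))  # about 20.75, i.e. roughly 21
```

The same function shows how the benefit of pooling shrinks as the disease becomes more common. For example, try `expected_pooled_tests(50, 0.05)`.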
    7. VARIANCE
If a random variable X has expected value (mean) μ = E(X), then the variance Var(X) of X is given by: Var(X) = E[(X − μ)²]
If the random variable X is discrete, then
$$\text{Var}(X) = \sum_{i=1}^{n} (x_i - \mu)^2 P(x_i)$$
If the random variable is continuous, then
$$\text{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\,dx$$
where fX(x) is the probability density function of the random variable X.
      1. RELATION BETWEEN THE EXPECTED VALUE AND VARIANCE
Var(X) = E(X²) − [E(X)]²
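The definition Var(X) = E[(X − μ)²] and the shortcut Var(X) = E(X²) − [E(X)]² always agree. The sketch below checks this for a fair die, which is used here purely as an illustration. It uses exact fractions, so the two results can be compared with `==`.

```python
from fractions import Fraction

# Fair die, used here only as an illustration
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * p for x, p in pmf.items())                   # E(X)
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())  # E[(X - mu)^2]
e_x2 = sum(x ** 2 * p for x, p in pmf.items())            # E(X^2)
var_short = e_x2 - mu ** 2                                # E(X^2) - [E(X)]^2

print(mu, var_def, var_short)  # 7/2 35/12 35/12
```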
Example 9:
The return levels and associated probabilities of two securities are given below:
What are their respective expected values and variances of returns?
[JMET 2008]
(1) X: E(X) = 12 and Var(X) = 0
    Y: E(Y) = 13 and Var(Y) = 2.4
(2) X: E(X) = 12 and Var(X) = 5.4
    Y: E(Y) = 13 and Var(Y) = 2.4
(3) X: E(X) = 12 and Var(X) = 3
    Y: E(Y) = 14.3 and Var(Y) = 4.13
(4) X: E(X) = 12 and Var(X) = 4.5
    Y: E(Y) = 14.3 and Var(Y) = 6.19
Solution:



















Hence, option 2.
    8. STANDARD DEVIATION
For a random variable X, the standard deviation is
$$\sigma = \sqrt{\text{Var}(X)}$$
Example 10:
Find the expectation, variance and standard deviation of the number of heads when three coins are tossed.
Solution:
The random variable X is defined as the number of heads in three coin tosses.
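X can take values 0, 1, 2 and 3 as shown in Example 4.
From the table in Example 4, P(X = 0) = 1/8, P(X = 1) = 3/8, P(X = 2) = 3/8 and P(X = 3) = 1/8.
E(X) = 0 × 1/8 + 1 × 3/8 + 2 × 3/8 + 3 × 1/8 = 12/8 = 1.5
E(X²) = 0² × 1/8 + 1² × 3/8 + 2² × 3/8 + 3² × 1/8 = 24/8 = 3
Var(X) = E(X²) − [E(X)]² = 3 − (1.5)² = 0.75
σ = √0.75 ≈ 0.866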
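Example 10 can also be worked end to end in code. Enumerate the eight toss sequences, build the pmf of the number of heads, then apply E(X), Var(X) = E(X²) − [E(X)]² and σ = √Var(X). The sketch below is one straightforward way to do this with the standard library.

```python
import math
from fractions import Fraction
from itertools import product

tosses = list(product("HT", repeat=3))
p_each = Fraction(1, len(tosses))  # each of the 8 sequences is equally likely

# Build the pmf of X = number of heads
pmf = {}
for t in tosses:
    heads = t.count("H")
    pmf[heads] = pmf.get(heads, 0) + p_each

mean = sum(x * p for x, p in pmf.items())
variance = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2
sd = math.sqrt(variance)

print(mean, variance, round(sd, 3))  # 3/2 3/4 0.866
```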
  4. DISCRETE PROBABILITY DISTRIBUTIONS
    1. BINOMIAL DISTRIBUTION
      1. DEFINITION AND APPLICATION
When a coin is flipped, the outcome is either a head or a tail; when a student solves a sum, the answer is either correct or incorrect; when a baby is born, the baby is either born in the month of October or is not. Each of these experiments has exactly two possible outcomes. One of the outcomes can be labelled "success" and the other "failure". These are a few examples of experiments that follow the binomial distribution.
From the above examples we can generalize that an experiment that follows binomial distribution has the following properties:
  • The experiment consists of n repeated trials.
  • Each trial results in just two possible outcomes. One of these outcomes can be termed as a success and the other, a failure.
  • The probability of success, p, is constant throughout.
  • The outcome of one trial is not affected by the outcome of other trials, that is, each trial is independent.
If an experiment is repeated n times (for example, a die is tossed n times), then the binomial distribution can be used to determine the probability of obtaining exactly x successes in the n trials.
The binomial probability of obtaining x successes in n trials is given by the following function:
$$f(x) = {}^{n}C_{x}\, p^{x} q^{\,n-x}, \quad x = 0, 1, 2, \dots, n$$
We say that random variable X follows binomial distribution with parameters n and p.
Where,
x = Number of successes that result from a binomial experiment.
p = Probability of success on an individual trial.
q = Probability of failure in an individual trial.
n = Number of trials.
Also, p + q = 1
Example 11:
Two dice are tossed 5 times. Let X denote the number of throws in which the score of the 1st die exceeds the score of the 2nd die. What is the distribution of the experiment and the probability of its success?

Solution:
Let the number of successes to be obtained in 5 trials be x.
Number of trials = n = 5
In each trial, success = score on the 1st die > score on the 2nd die

Favourable outcomes = {(2, 1), (3, 1), (3, 2),(4, 1), (4, 2), (4, 3), (5, 1), (5, 2), (5, 3), (5, 4),(6, 1), (6, 2), (6, 3), (6, 4), (6, 5)}


Number of favourable outcomes = 15
Total outcomes = 6 × 6 = 36
Probability of success, p = 15/36 = 5/12 ≈ 0.417
The experiment consists of repeated trials: the two dice are tossed 5 times.
Either the score on the 1st die exceeds the score on the 2nd die (success) or it does not (failure).
Each trial can result in just two possible outcomes.
The probability of success is 5/12 ≈ 0.417 (constant) on every trial.
The trials are independent; that is, the outcome of one trial does not affect the outcome of any other trial.
The random variable X follows binomial distribution with parameters n and p.
Here, n = 5 and p = 5/12 ≈ 0.417.
q = 1 − p = 7/12 ≈ 0.583
The binomial probability of obtaining x successes in 5 trials is given by the following function:
$$f(x) = {}^{5}C_{x} \left(\tfrac{5}{12}\right)^{x} \left(\tfrac{7}{12}\right)^{5-x}, \quad x = 0, 1, \dots, 5$$
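The binomial pmf for Example 11 can be computed directly. The sketch below counts the favourable dice pairs, sets p = 15/36 = 5/12, and evaluates f(x) = 5Cx p^x q^(5 − x) for x = 0 to 5 with `math.comb` (available from Python 3.8). With exact fractions, the six probabilities sum to exactly 1.

```python
from fractions import Fraction
from math import comb

# Success: the 1st die shows more than the 2nd (15 of the 36 equally likely pairs)
favourable = sum(1 for a in range(1, 7) for b in range(1, 7) if a > b)
p = Fraction(favourable, 36)  # 5/12
q = 1 - p
n = 5

def binom_pmf(x):
    """f(x) = nCx * p^x * q^(n - x)"""
    return comb(n, x) * p ** x * q ** (n - x)

pmf = [binom_pmf(x) for x in range(n + 1)]
print(p, sum(pmf))  # 5/12 1
print([round(float(f), 4) for f in pmf])
```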
      2. PROPERTIES
        1. MEAN
The mean or expected value of the random variable X that follows binomial distribution with parameters n and p is μ = E(X) = np
        2. VARIANCE
The variance of the random variable X that follows binomial distribution with parameters n and p is Var(X) = npq
        3. STANDARD DEVIATION
The standard deviation of the random variable X that follows binomial distribution with parameters n and p is σ = √(npq)
Example 12:
The random variable X follows binomial distribution with parameters n and p. If E(X) = 5 and Var(X) = 4 then find n and p.
Solution:
For a binomial distribution,
E(X) = np
np = 5...(i)
Also, Var(X) = npq
npq = 4...(ii)
Dividing (ii) by (i) we get,
q = 4/5
We know that p = 1 − q
p = 1 − 4/5 = 1/5
Substituting the value of p in (i) we get,
n × 1/5 = 5
n = 25
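The steps of Example 12 generalise to any given mean and variance: q = Var/E, p = 1 − q, n = E/p. The sketch below packages them in a hypothetical helper `binomial_params`. It uses exact fractions so that n comes out as a whole number when the inputs are consistent with a binomial distribution.

```python
from fractions import Fraction

def binomial_params(mean, variance):
    """Recover (n, p) from E(X) = np and Var(X) = npq."""
    q = Fraction(variance) / Fraction(mean)  # npq / np
    p = 1 - q
    n = Fraction(mean) / p
    return n, p

n, p = binomial_params(5, 4)
print(n, p)  # 25 1/5
```

If the computed n is not a whole number, or q is not between 0 and 1, then no binomial distribution has that mean and variance.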
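    2. POISSON DISTRIBUTION
      1. DEFINITION AND APPLICATION
The Poisson distribution is generally used to count the number of times an event occurs in a fixed region when there are a very large number of opportunities for the event but only a small probability of it occurring at any one of them.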
An experiment that follows Poisson distribution has the following properties:
  • Each outcome of the experiment can be classified distinctly as a success or a failure.
  • The average number of successes (λ) that occur in a specified region is known.
  • The probability of success is proportional to the size of the region.
  • The probability of success in an extremely small region is approximately equal to zero.
Note: The specified region could be a length, an area, a volume, a period of time, etc.
For example, suppose that in Mumbai, on average, 20 car accidents happen in a month:
  • On a particular day either car accidents happen or they don't.
  • The average number of car accidents (successes) that occur in a specified region (a month) is known.
  • The probability that no accident takes place decreases as the number of months increases.
  • The probability of a car accident in a minute (an extremely small region compared to a month) is virtually zero.
The number of car accidents taking place over a given period is an example of a random variable that follows Poisson distribution.
Some other examples of experiments that follow Poisson distribution are:
  • Birth defects and genetic mutations.
  • Traffic flow.
  • Mortality of infants in a city.
  • The number of misprints in a book.
  • The number of bacteria on a plate.
The average number of successes in a specified region is represented by λ, and the probability of x successes is given by the following function:
$$f(x) = P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \quad x = 0, 1, 2, \dots$$
In this case we say that the random variable X follows Poisson distribution with parameter λ.
Example 13:
Suppose that flaws in plywood occur at an average rate of 1 per 50 square feet. What is the probability that a 4 × 8 feet sheet will have:
(i) No flaws
(ii) At most one flaw

Solution (i):
The average number of flaws per 50 sq ft is 1.
Area of the sheet = 4 × 8 = 32 sq ft
Average number of flaws per 32 sq ft = 32/50 = 0.64
Let the random variable X be the number of flaws.
Let us term success as finding a flaw and failure as otherwise.
This is an example of an experiment that follows Poisson distribution as the experiment results in outcomes that can be classified as successes (flaw) or failures (no flaw).
The average number of flaws (successes) that occur per 32 sq feet (specified region) is known.
As the area of the plywood increases, the probability of finding at least one flaw increases.
The expected number of flaws in an extremely small region (0.01 sq ft) is 0.01/50 = 0.0002, so the probability of success (flaw) there is virtually zero.
The probability that a success will occur is proportional to the size of the region.
Hence, the random variable X follows Poisson distribution with parameter λ = 0.64.
The probability that 32 sq feet of the plywood will have x flaws is given by the following function:
$$f(x) = \frac{e^{-0.64} (0.64)^{x}}{x!}$$
If the 32 sq ft plywood has no flaw, then x = 0.
$$P(X = 0) = f(0) = e^{-0.64} \approx 0.5273$$
Solution (ii):
If the 32 sq ft plywood has at most one flaw, then x ≤ 1, i.e. x = 0, 1
$$P(X \le 1) = f(0) + f(1) = e^{-0.64}(1 + 0.64) \approx 0.8648$$
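Example 13 can be reproduced in a few lines. The key step is scaling the rate: 1 flaw per 50 sq ft becomes λ = 32/50 = 0.64 for a 32 sq ft sheet. After that, evaluate the Poisson pmf e^(−λ) λ^x / x!. The helper `poisson_pmf` below is a hand-written sketch, not a library function.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x!"""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 4 * 8 / 50  # 32 sq ft sheet at 1 flaw per 50 sq ft on average -> 0.64
p_none = poisson_pmf(0, lam)
p_at_most_one = poisson_pmf(0, lam) + poisson_pmf(1, lam)

print(round(p_none, 4), round(p_at_most_one, 4))  # 0.5273 0.8648
```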
      2. PROPERTIES
        1. MEAN
The mean or expected value of the random variable X that follows Poisson distribution with parameter λ is μ = E(X) = λ
        2. VARIANCE
The variance of the random variable X that follows Poisson distribution with parameter λ is Var(X) = λ, which is the same as its mean.
        3. STANDARD DEVIATION
The standard deviation of the random variable X that follows Poisson distribution with parameter λ is σ = √λ
Example 14:
Random variable X follows Poisson distribution with parameter λ such that P(X = 0) = P(X = 1). Find E(X).

Solution:
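By Poisson distribution the probability of x successes is given by the following function
$$f(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$$
$$P(X = 0) = f(0) = e^{-\lambda} \quad \text{...(i)}$$
$$P(X = 1) = f(1) = \lambda e^{-\lambda} \quad \text{...(ii)}$$
Since P(X = 0) = P(X = 1), equating (i) and (ii) gives $e^{-\lambda} = \lambda e^{-\lambda}$, and since $e^{-\lambda} \neq 0$, we get λ = 1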
For Poisson distribution, E(X) = λ
E(X) = 1
      3. LIMITING CASE OF BINOMIAL DISTRIBUTION
For n ≥ 20 and p ≤ 0.05, or n ≥ 100 and np ≤ 10, the binomial distribution with parameters n and p can be approximated by the Poisson distribution by taking λ = np.
Example 15:
If 1% of Nike t-shirts are defective, what is the probability that a carton of 50 Nike t-shirts has at least 2 defective t-shirts?

Solution:
Let the random variable X represent the number of defective t-shirts.
1. The experiment consists of repeated trials.
All the t-shirts have to be checked for a defect.
n = 50
2. Either a t-shirt is defective or it’s not defective.
Each trial can result in just two possible outcomes.
3. The probability of the t-shirt being defective is 0.01 on every trial.
4. The trials are independent; that is, getting a defective t-shirt in one trial does not affect whether we get defective t-shirts in other trials.
X follows the binomial distribution with parameters n and p.
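n = 50 (≥ 20) and p = 0.01 (≤ 0.05)
B(n, p) ≈ P(λ), where λ = np = 0.5
The probability of finding x defective items out of 50 is given by the function
$$f(x) = \frac{e^{-0.5} (0.5)^{x}}{x!}$$
The probability of finding at least 2 defective items out of 50 is given by
P(X ≥ 2) = 1 − P(X < 2)
= 1 − {f(0) + f(1)}
$$= 1 - e^{-0.5}(1 + 0.5) \approx 1 - 0.6065 \times 1.5$$
≈ 0.09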
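To see how good the Poisson approximation is in Example 15, compare it with the exact binomial answer. The sketch below computes P(X ≥ 2) both ways for n = 50 and p = 0.01. Both round to 0.09, as the text states.

```python
import math

n, p = 50, 0.01
lam = n * p  # 0.5

# Exact binomial: P(X >= 2) = 1 - [P(X = 0) + P(X = 1)]
exact = 1 - sum(math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in (0, 1))

# Poisson approximation with lambda = np
approx = 1 - sum(math.exp(-lam) * lam ** x / math.factorial(x) for x in (0, 1))

print(round(exact, 4), round(approx, 4))  # 0.0894 0.0902
```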
    3. HYPERGEOMETRIC DISTRIBUTION
      1. DEFINITION AND APPLICATION
The hypergeometric distribution is used for calculating probabilities for samples drawn without replacement, typically from a relatively small population. Because the items drawn are not replaced, the probability of success changes from one trial to the next.
An experiment that follows hypergeometric distribution should have the following properties:
  • A sample of size n is randomly selected one at a time (n trials) without replacement from M items.
  • Out of the M items, k items are of one type and can be classified as successes, and M − k items are of the other type and can be classified as failures.
The discrete random variable X is used to count the number of successes in the sample.
Choosing a team of 8 from a group of 12 boys and 6 girls or a council of 5 senators from the legislature of 22 Democrats and 18 Republicans are examples of hypergeometric experiments.
If we have to choose a team of 8 with 4 boys from a group of 10 boys and 7 girls then:
  • A sample of size 8 is randomly selected one at a time (8 trials) without replacement from a population of 17 students.
  • In the population, 10 (k) students are boys and can be classified as successes, and 7 students are girls and can be classified as failures.
Note that it would not be an experiment that follows binomial distribution as the probability of success is not constant on every trial. In the beginning, the probability of selecting a boy is 10/17. If you select a boy on the first trial, the probability of selecting a boy on the second trial is 9/16. And if you select a girl on the first trial, the probability of selecting a boy on the second trial is 10/16.
In a population of M items, if k items are of one type and M − k items are of the other type, then the probability of selecting a sample of size n (n trials) without replacement in which x items are of the first type and (n − x) are of the other is given by the following function:
$$f(x) = \frac{{}^{k}C_{x} \times {}^{M-k}C_{n-x}}{{}^{M}C_{n}}$$
We say that the random variable X follows hypergeometric distribution with parameters M, k and n.
Example 16:
An urn has 10 balls out of which 4 are defective. If a sample of 5 balls is drawn, find the probability of getting exactly 2 defective balls when the sampling is done:
(i) Without replacement
(ii) With replacement

Solution (i):
Let the random variable X denote the number of defective balls selected.
If the sample is drawn without replacement then the experiment follows hypergeometric distribution as
A sample of size 5(n) is randomly selected one at a time (5 trials) without replacement from a population of 10(M) items.
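In the population, 4 items (balls) are defective and can be classified as successes, and 10 − 4 = 6 items (balls) are not defective and can be classified as failures.
The probability of getting exactly x defective balls is
$$f(x) = \frac{{}^{4}C_{x} \times {}^{6}C_{5-x}}{{}^{10}C_{5}}$$
The probability of getting exactly 2 defective balls is
$$f(2) = \frac{{}^{4}C_{2} \times {}^{6}C_{3}}{{}^{10}C_{5}} = \frac{6 \times 20}{252} = \frac{10}{21} \approx 0.476$$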
Solution (ii):
If the balls are drawn with replacement then the experiment follows binomial distribution as
  • The experiment consists of repeated trials. 5 balls are drawn.
  • Either the ball drawn is defective or it is not. Therefore, each trial can result in just two possible outcomes.
  • The probability of success is 0.4 on every trial.
  • The trials are independent; that is, a defective ball in one draw does not affect whether we get defective balls in other draws.
Here, n = 5 and p = 0.4
q = 1 − p = 0.6
The binomial probability of obtaining x successes in n trials is given by the following function:
$$f(x) = {}^{n}C_{x}\, p^{x} q^{\,n-x}, \quad x = 0, 1, 2, \dots, n$$
The binomial probability of obtaining 2 successes in 5 trials is given by,
f(2) = 5C2 (0.4)² (0.6)^(5 − 2) = 10 × 0.16 × 0.216 = 0.3456
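The two parts of Example 16 differ only in whether the balls are replaced. The sketch below computes both probabilities exactly. Without replacement, the hypergeometric formula kCx × (M − k)C(n − x) / MCn applies. With replacement, the binomial formula applies with p = k/M. The helper names are illustrative.

```python
from fractions import Fraction
from math import comb

M, k, n = 10, 4, 5  # 10 balls, 4 defective, sample of 5

def hypergeom_pmf(x):
    """Without replacement: C(k, x) * C(M - k, n - x) / C(M, n)"""
    return Fraction(comb(k, x) * comb(M - k, n - x), comb(M, n))

def binom_pmf(x):
    """With replacement: C(n, x) * p^x * (1 - p)^(n - x), with p = k / M"""
    p = Fraction(k, M)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

print(hypergeom_pmf(2), float(binom_pmf(2)))  # 10/21 0.3456
```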
      2. PROPERTIES
        1. MEAN
The mean or expected value of the random variable X that follows hypergeometric distribution with parameters M, k and n is
$$\mu = E(X) = \frac{nk}{M}$$
        2. VARIANCE
The variance of the random variable X that follows hypergeometric distribution with parameters M, k and n is
$$\text{Var}(X) = n \cdot \frac{k}{M} \cdot \frac{M-k}{M} \cdot \frac{M-n}{M-1}$$
        3. STANDARD DEVIATION
The standard deviation of the random variable X that follows hypergeometric distribution with parameters M, k and n is
$$\sigma = \sqrt{n \cdot \frac{k}{M} \cdot \frac{M-k}{M} \cdot \frac{M-n}{M-1}}$$
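The commonly quoted hypergeometric formulas are E(X) = nk/M and Var(X) = n(k/M)(1 − k/M)(M − n)/(M − 1). They can be checked against a direct calculation from the pmf. The sketch below does this for the parameters of Example 16 (M = 10, k = 4, n = 5) using exact fractions.

```python
from fractions import Fraction
from math import comb

M, k, n = 10, 4, 5  # parameters of Example 16

# pmf over every feasible number of successes x
pmf = {x: Fraction(comb(k, x) * comb(M - k, n - x), comb(M, n))
       for x in range(max(0, n - (M - k)), min(n, k) + 1)}

mean = sum(x * p for x, p in pmf.items())
var = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2

formula_mean = Fraction(n * k, M)
formula_var = n * Fraction(k, M) * (1 - Fraction(k, M)) * Fraction(M - n, M - 1)

print(mean, var)                                 # 2 2/3
print(mean == formula_mean, var == formula_var)  # True True
```

The factor (M − n)/(M − 1) is what distinguishes this variance from the binomial npq. It shrinks the spread because sampling without replacement uses up the population.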
  5. CONTINUOUS PROBABILITY DISTRIBUTIONS
    1. UNIFORM DISTRIBUTION
      1. DEFINITION AND APPLICATION
In a uniform distribution, all values of X in its range are equally likely (for a continuous random variable, the probability density is constant over the interval). It is also called a rectangular distribution.
For example, if a die is tossed we can get the score as either 1 or 2 or 3 or 4 or 5 or 6. The probability of obtaining any one of the six possible outcomes is 1/6. Since the probability of getting any of the outcomes is the same, it’s a uniform distribution. This is an example of discrete uniform distribution.
Also, if buses arrive at a given bus stop every 15 minutes and you arrive at the bus stop at a random time, the time you wait for the next bus follows uniform distribution in the interval [0, 15]. In this case the random variable X is continuous, as it can take any value between 0 and 15 and is not necessarily an integer. Whether a person arrives at the bus stop 5 minutes or 12.34 minutes after the last bus left, his probability of boarding the next bus will be the same.
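If the random variable X follows uniform distribution in the interval [a, b], its probability density function is given by the following function
$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$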
Example 17:
If we toss a die then what is the probability that the die will show a number that is smaller than 4?
Solution:
The probability function of the experiment is defined in Example 6. Each possible outcome is equally likely to occur. Thus, we have a uniform distribution.
The probability that the die will land on a number smaller than 4 is equal to:
P(X < 4) = P(X = 1) + P(X = 2) + P(X = 3) = 1/6 + 1/6 + 1/6 = 1/2
      2. PROPERTIES
        1. MEAN
The mean or expected value of the random variable X that follows uniform distribution in the interval [a, b] is
$$\mu = E(X) = \frac{a+b}{2}$$
        2. VARIANCE
The variance of the random variable X that follows uniform distribution in the interval [a, b] is
$$\text{Var}(X) = \frac{(b-a)^2}{12}$$
        3. STANDARD DEVIATION
The standard deviation of the random variable X that follows uniform distribution in the interval [a, b] is
$$\sigma = \frac{b-a}{\sqrt{12}}$$
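As a worked instance of the uniform formulas μ = (a + b)/2, Var(X) = (b − a)²/12 and σ = (b − a)/√12, the sketch below applies them to the bus example (waiting time uniform on [0, 15] minutes). It also cross-checks the mean with a midpoint-rule integral of x · 1/(b − a).

```python
import math

# Bus example: a bus arrives every 15 minutes, so the waiting time X is uniform on [0, 15]
a, b = 0.0, 15.0

mean = (a + b) / 2
variance = (b - a) ** 2 / 12
sd = (b - a) / math.sqrt(12)

# Cross-check the mean with a midpoint-rule integral of x * f(x), where f(x) = 1 / (b - a)
steps = 1000
width = (b - a) / steps
numeric_mean = sum((a + (i + 0.5) * width) / (b - a) for i in range(steps)) * width

print(mean, variance, round(sd, 3), round(numeric_mean, 6))  # 7.5 18.75 4.33 7.5
```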
Example 18:


Solution:




Solving (i) and (ii) for a and b we get,
a = −1 and b = 3
The graph of the given function is,
P(X < 0) is the area under the curve from −1 to 0, that is, the area under the curve shaded with forward-slanting lines.







Example 19:

The following probability distribution can be used to represent the waiting time (w) of the customer in a bank:




The expected waiting time of the customer in the bank E(w) is:
[JMET 2009]




Solution:

The probability density function given in the question is that of a random variable which follows uniform distribution.


Hence, option 1.

    2. NORMAL DISTRIBUTION
      1. DEFINITION AND APPLICATION
The normal distribution is used to approximately describe a random variable whose values tend to cluster around the mean. For example, suppose the weight of adult males in Africa is normally distributed with a mean of 50 kilograms. This means that the weights of most men are close to the mean, though a small number of outliers weigh significantly more or less than the mean. A histogram of male weights will look similar to a normal curve, with the correspondence becoming closer as the sample size increases.
If the random variable X follows normal distribution, then its probability density function is given by:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$$
where μ is the mean and σ is the standard deviation of the random variable X.
We say that the random variable X follows normal distribution with parameters μ and σ.
By definition, the mean, standard deviation and variance of a random variable that follows normal distribution are μ, σ and σ², respectively.
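The normal density f(x) = e^(−(x − μ)²/(2σ²)) / (σ√(2π)) is easy to evaluate directly. The sketch below codes the formula by hand and compares it with `statistics.NormalDist(mu, sigma).pdf` from Python's standard library (available from Python 3.8). The parameters μ = 50 and σ = 5 are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))"""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 50, 5  # hypothetical parameters, for illustration only
for x in (40, 50, 55):
    print(x, round(normal_pdf(x, mu, sigma), 5), round(NormalDist(mu, sigma).pdf(x), 5))
```

The density peaks at x = μ and is symmetric about it, so `normal_pdf(45, 50, 5)` equals `normal_pdf(55, 50, 5)`.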
      2. NORMAL CURVE
The graph of the probability density function of the normal distribution is bell-shaped and is known as the bell curve. It attains its maximum value at X = μ.
Figure 1:
Figure 2:
The graph of the normal distribution depends on the mean and the standard deviation. The mean determines where the peak of the graph is located, and the standard deviation determines its width and height.
For a large standard deviation, the curve is short and wide (See Figure 1). For a small standard deviation, the curve is tall and narrow (See Figure 2). All normal distribution curves/bell curves are symmetric irrespective of their mean or standard deviation.
      3. PROPERTIES OF THE NORMAL CURVE
  • The total area under the bell curve is equal to 1.
  • The probability that a normal random variable equals any particular value is 0, that is,
P(X = a) = 0.
Therefore, P(X < a) = P(X ≤ a)
This is true in case of all continuous random variables.
  • The probability that X is greater than a equals the area under the normal curve between X = a and X = ∞ (non-shaded area in the figure below).
  • The probability that X is less than a equals the area under the normal curve between X = −∞ and X = a (the shaded area in the figure below).
  1. EMPIRICAL RULE
This rule holds true for every normal curve and is independent of the standard deviation and mean of the normal distribution.
  • Approximately 68% of the area under the normal curve lies between X = μ − σ and X = μ + σ, i.e. within 1 standard deviation of the mean.
  • Approximately 95% of the area under the normal curve lies between X = μ − 2σ and X = μ + 2σ, i.e. within 2 standard deviations of the mean.
  • Approximately 99.7% of the area under the curve lies between X = μ − 3σ and X = μ + 3σ, i.e. within 3 standard deviations of the mean.
This rule is also known as the 68-95-99.7 rule.
For example,
Let the random variable X be the weight of women in London, and assume X is normally distributed with mean 55 kg and standard deviation 5 kg. X is a continuous random variable because weight is measured, not counted, so its value need not be an integer.
By the empirical rule, approximately 68% of women weigh within 5 kg of the mean (50 kg–60 kg), approximately 95% weigh within 10 kg of the mean (45 kg–65 kg), and approximately 99.7% weigh within 15 kg of the mean (40 kg–70 kg).
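The empirical-rule percentages are rounded values. This short sketch uses only `math.erf`, and `prob_within` is a hypothetical helper name. It computes the exact probability of falling within k standard deviations of the mean, which is erf(k/√2) for every normal distribution, alongside the corresponding weight ranges for the London example.

```python
import math

def prob_within(k: float) -> float:
    """P(mu - k*sigma < X < mu + k*sigma) for any normal X; equals erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))

mu, sigma = 55.0, 5.0  # London example: mean 55 kg, standard deviation 5 kg
for k in (1, 2, 3):
    low, high = mu - k * sigma, mu + k * sigma
    print(f"within {k} sd ({low:.0f}-{high:.0f} kg): {prob_within(k):.4f}")
```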
  1. STANDARD NORMAL DISTRIBUTION
A continuous random variable Z follows standard normal distribution if it follows normal distribution with mean 0 and standard deviation 1.
Its probability density function is given by
$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad -\infty < z < \infty$$
  1. READING THE STANDARD NORMAL TABLE
Using the standard normal table, we can determine the probability P(Z ≤ z), which is simply the area under the bell curve to the left of Z = z.
To calculate P(Z ≤ z) using the standard normal table, follow these steps:
Write the value of z to 2 decimal places, rounding off if necessary, so that z = a.bc for some integer a and digits b and c.
Find a.b in the column headed z.
Find 0.0c in the row whose first element is z (the header row).
P(Z ≤ z) is the value found at the intersection of the row containing a.b and the column headed 0.0c.
For example,
P(Z ≤ 0.45) = 0.6736
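The table simply lists values of the standard normal cumulative probability Φ(z) = P(Z ≤ z) = ½[1 + erf(z/√2)]. As a sketch, with `phi` and `table_lookup` being our own helper names, the lookup procedure (round z to two decimals, then read off four decimals) can be mimicked in Python:

```python
import math

def phi(z: float) -> float:
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def table_lookup(z: float) -> float:
    """Mimic the table: round z to 2 decimals, report P(Z <= z) to 4 decimals."""
    return round(phi(round(z, 2)), 4)

print(table_lookup(0.45))
```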
The following examples show how to use the standard normal tables to find the probability of Z for different values of its range.
  1. P(Z ≤ 0.57) = 0.7157
Also, P(Z ≤ 0.57) = Area under the normal curve to the left of Z = 0.57 (as shown in Figure (a))
Figure (a):
  1. P(Z ≤ 1.23) = 0.8907
Also, P(Z ≤ 1.23) = Area under the normal curve to the left of Z = 1.23 (as shown in Figure (b))
Figure (b):
  1. P(Z > 1.23) = 1 − P(Z ≤ 1.23)
= 1 − 0.8907
= 0.1093
Area of the shaded region in Figure (c)
= Total area under the normal curve − Area of the shaded region in Figure (b)
= 1 − Area of the shaded region in Figure (b)
Also, P(Z > 1.23)
= Area under the normal curve to the right of Z = 1.23 (as shown in Figure (c)).
Figure (c):
  1. P(Z ≤ −1.23) = 1 − P(Z ≤ 1.23)
= 1 − 0.8907
= 0.1093
Since the normal curve is symmetric about the mean (here 0),
Area of the shaded region in Figure (c)
= Area of the shaded region in Figure (d)
Figure (d):
Alternatively,
We can directly determine the value of P(Z < −1.23) from the normal table that gives cumulative probabilities for negative Z values.
P(Z < −1.23) = 0.1093
  1. P(Z > −1.23) = 1 − P(Z ≤ −1.23) …[As done in 3]
= 1 − {1 − P(Z ≤ 1.23)} …[As done in 4]
= P(Z ≤ 1.23)
= 0.8907
From the example above, we have
Area of the shaded region in Figure (c)
= Area of the shaded region in Figure (d)
⇒ 1 − Area of the shaded region in Figure (c)
= 1 − Area of the shaded region in Figure (d)
⇒ Area of the shaded region in Figure (b)
= Area of the shaded region in Figure (e)
Figure (e):
Alternatively,
P(Z > −1.23) = 1 − P(Z ≤ −1.23) …[As done in 3]
= 1 − 0.1093
= 0.8907
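The symmetry argument used in these examples, P(Z ≤ −z) = 1 − P(Z ≤ z), can be checked numerically. Here is a minimal sketch, with `phi` as our own helper for the standard normal cumulative probability:

```python
import math

def phi(z: float) -> float:
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = 1.23
left_tail = phi(-z)        # P(Z <= -1.23)
right_tail = 1.0 - phi(z)  # P(Z > 1.23)
print(round(left_tail, 4), round(right_tail, 4))      # equal by symmetry
print(round(1.0 - left_tail, 4), round(phi(z), 4))    # P(Z > -1.23) and P(Z <= 1.23)
```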
  1. P(0.57 < Z < 1.72) = P(Z ≤ 1.72) − P(Z ≤ 0.57)
= 0.9573 − 0.7157
= 0.2416
Area of the shaded region in Figure (g)
= Area of the shaded region in Figure (f) − Area of the shaded region in Figure (a)
Figure (f):
Figure (g):
  1. P(−0.57 < Z < 1.72) = P(Z ≤ 1.72) − P(Z ≤ −0.57)
= P(Z ≤ 1.72) − {1 − P(Z ≤ 0.57)}
= 0.9573 − 0.2843
= 0.673
Area of the shaded region in Figure (i)
= Area of the shaded region in Figure (f) − Area of the shaded region in Figure (h)
= Area of the shaded region in Figure (f) − {1 − Area of the shaded region in Figure (a)}
Figure (h):
Figure (i):
  1. P(−1.72 < Z < −0.57) = P(Z ≤ −0.57) − P(Z ≤ −1.72)
= 0.2843 − 0.0427
= 0.2416
Area of the shaded region in Figure (k)
= Area of the shaded region in Figure (h) − Area of the shaded region in Figure (j)
Figure (j):
Figure (k):
  1. P(Z < −1.72 or Z > 0.57) = P(Z > 0.57) + P(Z ≤ −1.72)
= {1 − P(Z ≤ 0.57)} + {1 − P(Z ≤ 1.72)}
= (1 − 0.7157) + (1 − 0.9573)
= 0.2843 + 0.0427
= 0.327
Area of the shaded region in Figure (m)
= Area of the shaded region in Figure (l) + Area of the shaded region in Figure (j)
Figure (l):
Figure (m)
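Every interval and tail probability in the examples above reduces to differences of Φ(z) = P(Z ≤ z). The sketch below recomputes them from the error function; the helper names `phi` and `between` are illustrative. The results can differ from the table-based answers in the fourth decimal, because the table values are rounded.

```python
import math

def phi(z: float) -> float:
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def between(a: float, b: float) -> float:
    """P(a < Z < b): area to the left of b minus area to the left of a."""
    return phi(b) - phi(a)

print(round(between(0.57, 1.72), 4))            # P(0.57 < Z < 1.72)
print(round(between(-0.57, 1.72), 4))           # P(-0.57 < Z < 1.72)
print(round(between(-1.72, -0.57), 4))          # P(-1.72 < Z < -0.57)
print(round((1 - phi(0.57)) + phi(-1.72), 4))   # P(Z < -1.72 or Z > 0.57)
```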
REMEMBER

  • For z < 0,
    P(Z ≤ z), the area under the bell curve to the left of z, is less than 0.5.
  • For z > 0,
    P(Z ≤ z), the area under the bell curve to the left of z, is greater than 0.5.
  1. CONVERTING NORMAL VARIABLE TO STANDARD NORMAL VARIABLE
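If a continuous random variable X follows a normal distribution with parameters μ and σ, then X can be converted to a standard normal variable Z by the substitution
$$Z = \frac{X - \mu}{\sigma} \qquad \text{(i)}$$
Using (i), we get the following:
$$P(X \le a) = P\left(Z \le \frac{a - \mu}{\sigma}\right), \qquad P(X > a) = 1 - P\left(Z \le \frac{a - \mu}{\sigma}\right)$$
$$P(a < X < b) = P\left(Z \le \frac{b - \mu}{\sigma}\right) - P\left(Z \le \frac{a - \mu}{\sigma}\right)$$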
Example 20:
If X follows a normal distribution with mean 50 and standard deviation 5, then find the values of the following using standard normal tables.
(i) P(X < 45)
(ii) P(X > 60)
(iii) P(48 < X < 55)

Solution:
(i) We will first find the Z value of 45.
Z = (45 − 50) / 5 = −1
    P(X < 45) = P(Z < −1) = 0.1587
(ii) We will first find the Z value of 60.
Z = (60 − 50) / 5 = 2
    P(X > 60) = 1 − P(X ≤ 60)
    = 1 − P(Z ≤ 2)
    = 1 − 0.9772
    = 0.0228
(iii) We will first find the Z values of 48 and 55.
For X = 48: Z = (48 − 50) / 5 = −0.4
For X = 55: Z = (55 − 50) / 5 = 1
    P(48 < X < 55) = P(−0.4 < Z < 1)
    = P(Z ≤ 1) − P(Z ≤ −0.4)
    = 0.8413 − 0.3446
    = 0.4967
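The same three answers can be reproduced by standardising with Z = (X − μ)/σ and evaluating Φ through the error function. This sketch uses hypothetical helper names `z_score` and `phi`. The last answer comes out as about 0.4968 rather than the table's 0.4967, because the table rounds each value to four decimals.

```python
import math

def phi(z: float) -> float:
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_score(x: float, mu: float, sigma: float) -> float:
    """Convert a normal value x to the standard normal scale: Z = (x - mu) / sigma."""
    return (x - mu) / sigma

mu, sigma = 50.0, 5.0
p_below_45 = phi(z_score(45, mu, sigma))                                   # (i)   P(X < 45)
p_above_60 = 1.0 - phi(z_score(60, mu, sigma))                             # (ii)  P(X > 60)
p_48_to_55 = phi(z_score(55, mu, sigma)) - phi(z_score(48, mu, sigma))     # (iii) P(48 < X < 55)
print(round(p_below_45, 4), round(p_above_60, 4), round(p_48_to_55, 4))
```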
Example 21:
The diameter of the toothpaste tubes manufactured by HUL is a random variable with mean 2 cm and standard deviation 0.1 cm. For a particular order, the diameter of a tube must be greater than 1.95 cm and less than 2.3 cm. What proportion of the toothpaste tubes made would meet the requirement?
Solution:
Let the random variable X be defined as the diameter of the toothpaste tube.
The random variable X follows normal distribution with mean 2 and standard deviation 0.1.
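Proportion of the toothpaste tubes with diameter between 1.95 cm and 2.3 cm = P(1.95 < X < 2.3)

We will first find the Z values of 1.95 and 2.3.
For X = 1.95: Z = (1.95 − 2) / 0.1 = −0.5
For X = 2.3: Z = (2.3 − 2) / 0.1 = 3
P(1.95 < X < 2.3)
= P(Z < 3) − P(Z < −0.5)
= 0.9987 − 0.3085 ... [From the standard normal tables]
= 0.6902
Hence, about 69% of the tubes would meet the requirement.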
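Here is a numerical check of this example, as a sketch using `math.erf`; the `phi` helper is our own. It makes the sign of the lower Z value explicit: 1.95 lies below the mean, so its Z value is −0.5 and the area to its left is about 0.3085, not 0.6915.

```python
import math

def phi(z: float) -> float:
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 2.0, 0.1           # tube diameter in cm
low, high = 1.95, 2.3          # required open interval in cm
z_low = (low - mu) / sigma     # about -0.5: below the mean, so negative
z_high = (high - mu) / sigma   # about 3
proportion = phi(z_high) - phi(z_low)
print(f"{proportion:.4f}")
```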
  1. DETERMINING DISTRIBUTIONS
Example 22:
Question 1:
The 5's, 10's and 20's biscuit packets are produced in lot sizes of 100 each. Three packets each of 5's, 10's and 20's are inspected at random. If even one broken biscuit is found in any of the three packets, the respective lot is rejected. The probability that a broken biscuit will be found in a 5's, 10's or 20's packet is estimated to be 0.10, 0.20 and 0.30 respectively. If a sample of three packets each of 5's, 10's and 20's is inspected, which probability distribution should we use to estimate the probability that all nine packets will be accepted?

(1) Normal (2) Binomial
(3) Poisson (4) Hypergeometric
Question 2:
What is the probability that all nine packets will be accepted?
(1) 0.749 (2) 0.006
(3) 0.128 (4) 0.504
[JMET 2008]

Solution 1:
The normal distribution is a continuous distribution that clusters around a mean or average, whereas the number of accepted packets is a count.
We can eliminate option 1.
The hypergeometric distribution is a discrete probability distribution that describes the number of successes in a sequence of n draws from a finite population without replacement, just as the binomial distribution describes the number of successes for draws with replacement.
In the question, the probability of finding a broken biscuit is the same for each of the 3 packets picked from the same lot, and we assume that the packets are picked with replacement. Therefore, we may use either the binomial or the Poisson distribution to evaluate the probability.
The binomial distribution is well approximated by the Poisson distribution when n ≥ 20 and p ≤ 0.05.
Here, n = 3 < 20.
We can also eliminate option 3.
Hence, option 2.

Solution 2:
Probability of not having any defective packets among the three 5's packets = (1 − 0.1)³ = 0.9³ = 0.729
Similarly, the probabilities of not having any defective packets among the 10's and 20's packets are 0.8³ = 0.512 and 0.7³ = 0.343.
Total probability of all packets being accepted
= 0.729 × 0.512 × 0.343 = 0.128
Hence, option 3.
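The calculation treats each pack size as a binomial experiment with n = 3 and asks for zero "broken" outcomes. Below is a sketch in Python, assuming (as the solution does) that the three pack sizes are inspected independently. `binom_pmf` is our own helper built on `math.comb`.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability that an inspected packet contains a broken biscuit, per pack size.
p_broken = {"5's": 0.10, "10's": 0.20, "20's": 0.30}

p_all_accepted = 1.0
for size, p in p_broken.items():
    p_accept = binom_pmf(0, 3, p)   # zero "broken" packets among the 3 inspected
    print(size, round(p_accept, 3))
    p_all_accepted *= p_accept       # independent sizes: multiply

print(round(p_all_accepted, 3))
```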

Example 23:
The number of shoppers entering a retail store in an hour is a good example of a random variable that follows:
[JMET 2009]
(1) Normal distribution
(2) Binomial distribution
(3) Poisson distribution
(4) None of the above

Solution:
If a random variable X is defined as the number of shoppers entering a retail store in an hour, then X is a discrete random variable: the number of shoppers entering the store per hour, though large, can be counted and always takes an integer value.
We can eliminate option 1, as the normal distribution is a continuous probability distribution.
Here we do not know the probability of an individual shopper entering the store, but we know the average number of shoppers entering the store per hour, i.e. λ.
Moreover,
The average number of shoppers (successes) that enter per hour (specified region) is known.
The probability of “no shopper” entering the retail store decreases as the number of hours increases.
Also, the probability of a shopper entering the store in any 5-second interval (a very small region) is very small, virtually zero.
Therefore, the number of shoppers entering a retail store in an hour is a good example of a random variable that follows the Poisson distribution.
Hence, option 3.
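The Poisson properties listed above can be illustrated with the formula P(k arrivals in t hours) = e^(−λt)(λt)^k / k!. In this sketch, the rate λ = 30 shoppers per hour is a hypothetical value chosen only for illustration, and `poisson_pmf` is our own helper.

```python
import math

def poisson_pmf(k: int, lam: float, hours: float = 1.0) -> float:
    """P(exactly k arrivals in the given number of hours) for an average of lam per hour."""
    mean = lam * hours
    return math.exp(-mean) * mean ** k / math.factorial(k)

LAM = 30.0  # hypothetical average number of shoppers per hour

# P(no shopper) shrinks as the observation window grows.
for hours in (0.1, 0.5, 1.0):
    print(f"P(no shopper in {hours} h) = {poisson_pmf(0, LAM, hours):.3g}")

# In a 5-second window the expected count is tiny, so an arrival is unlikely.
five_seconds = 5 / 3600
print(f"P(at least one shopper in 5 s) = {1 - poisson_pmf(0, LAM, five_seconds):.4f}")
```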
  1. SUMMARY OF DISTRIBUTIONS
  1. COEFFICIENT OF CORRELATION AND LINEARITY
  1. DEFINITION
Two variables p and q are said to share a linear relationship if the graph of p when plotted against q or q when plotted against p is a straight line.
Coefficient of correlation is used to measure the degree of the linear dependence between any two random variables X and Y.
It is represented by ρ and its value ranges from −1 to 1.
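The coefficient of correlation, $\rho_{X,Y}$, between two random variables X and Y is defined as
$$\rho_{X,Y} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$$
where μ_X and σ_X are the mean and standard deviation, respectively, of random variable X, and μ_Y and σ_Y are the mean and standard deviation, respectively, of random variable Y.
or
$$\rho_{X,Y} = \frac{E[XY] - E[X]\,E[Y]}{\sqrt{E[X^2] - (E[X])^2}\,\sqrt{E[Y^2] - (E[Y])^2}}$$
Note: The coefficient of correlation for two random variables is defined only if their standard deviations are non-zero.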
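The definition translates directly into code for a sample of paired observations. This sketch (the name `pearson` is our own) uses sample means in place of μ_X and μ_Y. The 1/n factors in the covariance and the standard deviations cancel, so they are omitted. Points lying exactly on a line give ρ = 1 for a positive slope and ρ = −1 for a negative slope, up to floating-point rounding.

```python
import math

def pearson(xs, ys):
    """Sample coefficient of correlation: cov(X, Y) / (sd(X) * sd(Y))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a standard deviation is zero")
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5]
print(pearson(xs, [2 * x + 1 for x in xs]))    # points on a line with positive slope
print(pearson(xs, [10 - 3 * x for x in xs]))   # points on a line with negative slope
```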
  1. INTERPRETATION
The greater the absolute value of the coefficient of correlation (i.e. the closer the value is to −1 or 1), the greater the linear dependence between the variables.
A coefficient of correlation equal to −1 or 1 indicates the strongest possible linear relationship. In this case, the data points fall exactly on a straight line, as shown in the following scatter plots.
For ρ = 1
For ρ = −1
If 0 < ρ ≤ 1, then there exists an increasing linear relation between random variables X and Y. In this case, the value of Y increases as X increases, as shown in the scatter plot below.
For ρ = 0.46
If −1 ≤ ρ < 0, then there exists a decreasing linear relation between random variables X and Y. In this case, the value of Y decreases as X increases, as shown in the following scatter plot.
For ρ = −0.54
If ρ = 0, the variables are uncorrelated: there is no linear relationship between them, although they may still be related non-linearly.
For ρ = 0
REMEMBER

  • The coefficient of correlation measures the linear relationship between variables.
    For example,
    A random variable X is uniformly distributed over the interval [−1, 1] and Y = X².
    The value of Y is completely determined by X, but the coefficient of correlation of X and Y is zero, because by symmetry Cov(X, X²) = E[X³] − E[X]E[X²] = 0. X and Y share a curvilinear, not linear, relationship.
  • Correlation is affected by outliers. One outlier can considerably reduce the value of the coefficient of correlation as shown in the scatter plot.

    For ρ = 0.71
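A quick check on evenly spaced points in [−1, 1], which stand in for the uniform distribution (an illustrative choice), shows that Y = X² has essentially zero correlation with X even though it is fully determined by X. By contrast, Y = X³ rises with X, so its correlation with X is strongly positive. The sketch repeats a small `pearson` helper (our own name) so that it runs on its own.

```python
import math

def pearson(xs, ys):
    """Sample coefficient of correlation: cov(X, Y) / (sd(X) * sd(Y))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [i / 100 for i in range(-100, 101)]   # 201 evenly spaced points on [-1, 1]
r_square = pearson(xs, [x ** 2 for x in xs])
r_cube = pearson(xs, [x ** 3 for x in xs])
print(f"corr(X, X^2) = {r_square:.6f}, corr(X, X^3) = {r_cube:.4f}")
```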
Example 24:
An investment consultant has plotted the BSE Sensex against the Indian inflation rate, as shown in the graph, to better understand their relationship.



What is the coefficient of correlation between BSE Sensex and the Indian inflation rate?
[JMET 2009]

(1) 1000 (2) 0.05
(3) −1 (4) 2000

Solution:
Since the BSE Sensex decreases linearly as the inflation rate increases, the coefficient of correlation has to be less than 0. Also, the plotted points fall exactly on a straight line, indicating a perfect linear relationship.
The coefficient of correlation is −1.
Hence, option 3.
  1. COEFFICIENT OF VARIATION
  1. DEFINITION
Coefficient of variation is defined as the ratio of the standard deviation to the mean of a data set or a random variable, i.e. CV = σ/μ.
For example,
A random variable X is defined as the share price of company P, and the mean of X is Rs. 5,000/month. If the coefficient of variation is 0.02, then the standard deviation will be 0.02 × 5,000 = Rs. 100.
Assuming the share price is approximately normally distributed, by the empirical rule the price of the share ranges between Rs. 4,800 and Rs. 5,200 about 95% of the time (see figure below).
If the coefficient of variation is 0.2, then the standard deviation will be 0.2 × 5,000 = Rs. 1,000.
By the empirical rule, the price of the share then ranges between Rs. 3,000 and Rs. 7,000 about 95% of the time (see figure below).
Thus, a smaller coefficient of variation means smaller fluctuations, and therefore a more stable trading environment.
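The arithmetic in this example, standard deviation = CV × mean and the band mean ± 2 × standard deviation, can be written as a small sketch. The helper name `cv_band` is hypothetical. Reading the band as covering about 95% of prices assumes the share price is roughly normally distributed, as the empirical rule requires.

```python
def cv_band(mean: float, cv: float, k: float = 2.0):
    """Return (sd, low, high), where sd = cv * mean and the band is mean +/- k * sd.

    With k = 2 the band holds about 95% of values only if the variable is
    roughly normal (empirical rule); that normality is an assumption.
    """
    sd = cv * mean
    return sd, mean - k * sd, mean + k * sd

for cv in (0.02, 0.2):
    sd, low, high = cv_band(5000.0, cv)
    print(f"CV={cv}: sd = Rs. {sd:,.0f}, ~95% band Rs. {low:,.0f} to Rs. {high:,.0f}")
```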
  1. APPLICATIONS
It helps in comparing one data series with another, even if their means are different.
It helps to analyse the amount of risk taken as compared to the return expected on the investment.
Example 25:
Coefficient of variation is useful to study
[SNAP 2008]
(1) Risk (2) Disparity
(3) Consistency (4) All of the above

Solution:
The coefficient of variation is useful for studying risk, disparity as well as consistency.
Hence, option 4.
