卫生统计学目录
Biostatistics
统计资料类型
统计学集中趋势
统计学离散趋势
极差;四分位数;方差variance;标准差standard
deviation;
变异系数coefficient
of dispersion (CV)-标准差与匀数之比(S/X-m),属于相对数,便于资料间的比较。
『计量性资料』
的统计推断
【正态分布
normal distribution】
正态分布Gaussian
distribution-标准正态分布standard
normal distibution(移动分布曲线的高峰至坐标O处,u分布);对数正态分布log-normal
distribution于(对一些偏态分布的数据进行对数处理,可获得正态分布进行处理)
【总体匀数的估计和假设检验
hypothesis testing/significance tset】
统计推断statistical
inference-包括二个方面:参数估计和假设检验
参数估计就是转换成样本指标(统计量)来估计总体指标(参数);
考虑抽样误差,以interval
estimation(区间估计)按一定概率估计总体匀数。
抽样误差sampling
error(样本匀数与总体匀数之间的误差);
标准误standard
error(样本匀数的标准差);
用样本的资料按u分布处理-t变换(t分布);
二样本匀数的比较用t检验或u检验;
Two difference as the d for calculating SS and
S-error [two marching data or sample vs. population
(value of theoretical, or normal or from large of group, could
assume as one as standard and two mean should be as Ho for no
difference to test)]; or the two groups have the same number
of samples (n1=n0 always)
1)Mean of sample with total
mean of population:
样本匀数与总体匀数比较的
t 检验: 山区健康男子的脉搏是否高于一般
t=(X1m-X0m) / (s/n^1/2)
i.g t=(74.2pulse1-72pulse0)/6.5/25^1/2=1.692;
v=n-1=25-1=24; t value table for t
2) Mean of two sample
marched:
注射组与非注射组;施加二种因素的尿样对比;治疗前后;这类配对研究问题。
t=(d1-d0) / (s/n^1/2; d= d1-d2;
s=[(sum-(d)^2-(sum-d)^2/n]/(n-1)
it is same as
t=(X1-X0) / (s/n^1/2) i.g. treatment before vs.
after
Others (or n1=/=n0):
3) Mean of two samples (or
two groups):
克山病的11例血磷值与13例正常值的对比检验;
t=(X1-mean-X0-mean) / Sx1-x0;
Sx1-x0=[Sc/(1/n1+1/n0)]^1/2; v=n1+n2-2
Sc(合并方差combined
or pooled variance) = [
(sum-(x1)^2-(sum-x1)^2/n1+(sum-(x0)^2-(sum-x0)^2/n0]/(n1+n0-2) ]
i.g. health vs. sick; the mean of two representative of the
each group of the population as considered.
4) Mean of two log(mean)
sample:
两样本几何匀数比较的
t 检验:抗体滴度
t=(X1-X0) / Sx1-x0; => t=(lgX1-lgX0) /
Sx1-x0 by log calculation; data is as the relationship with
times or doubles.
5)
两大样本匀数比较的
u 检验 When n is large, i.e.
>=50; Using the U-test; u=(x1m-x0m)/[(s1^2/n1+s0^2/n0)]^1/2
t-test |
|
A |
|
B |
|
Diff |
x^2 |
配对研究 |
|
|
|
|
|
|
|
|
1 |
3550 |
|
2450 |
|
1100 |
1210000 |
|
2 |
2000 |
|
2400 |
|
-400 |
160000 |
|
3 |
3000 |
|
1800 |
|
1200 |
1440000 |
|
4 |
3950 |
|
3200 |
|
750 |
562500 |
|
5 |
3800 |
|
3250 |
|
550 |
302500 |
|
6 |
3750 |
|
2700 |
|
1050 |
1102500 |
|
7 |
3450 |
|
2500 |
|
950 |
902500 |
|
8 |
3050 |
|
1750 |
|
1300 |
1690000 |
|
|
|
|
|
|
|
|
|
sum |
26550 |
|
20050 |
|
6500 |
|
|
M=6500/8 |
|
812.5 |
|
|
42250000 |
7370000 |
|
SS |
[7370000-422500/8]/(8-1)= |
298392.9 |
sqrt= |
546.2535 |
|
t=(M-u)/(s/n^1/2) |
4.207016 |
|
546.25/n^1/2 |
193.1298 |
|
|
|
|
|
|
|
|
|
|
3318.75 |
|
2506.25 |
|
812.5 |
|
二样本方差的比较用F检验
1) when camparing two small samples: 1st
F--test; F=SS-large/SS-small; if F is refused Ho checked, then
t'-test,
两小样本匀数(二组病例)比较的
F 检验(判断方差是否齐);方差不齐=>再做
t' 检验
t'= (x1-x0)/[(s1^2/n1+s0^2/n0)]^1/2;
i.g. t'= (x1-x0)/[(s1^2/n1+s0^2/n0)]^1/2=(6.21-4.34)/[(1.79)^2/10+(0.56)^2/50]=3.272
(s1^2=0.3204; s0^2=0.006272)
ta'=[s1^2 t(a,u1)+s0^2
t(a,u2)]/(s1^2+s0^2 ); i.g. v1=10-1=9; v0=50-1=49, get
t0.05=2.262 and t0.005=2.009 (two-side);
t0.05=[(0.3204)(2.262)+(0.006272)(2.009)]/(0.3204+(2.009)=2.257
ta' vs. t':
t0.05=2.257, t'=3.272; p<0.05, with a=0.05, refuse Ho,
accept H1
假设检验:hypothesis
under test (null hypothesis)
Ho (例如,二样本匀数相等)和
alternative hypothesis H1于(
例如,二样本匀数有相差,并非抽样误差引起);
a--level
to check
P值是指在由Ho所规定的总体中作随机抽样,获得等于及大于(或等于及小于)现有统计量的概率。
P<=a,结论按所取检验水准拒绝Ho,接受H1
;表示符合(预设为样本与总体之间的Ho假设的检验是小概率事件。
P>=a,即现有样本信息支持Ho,没有理由拒绝,此时只好接受它。
即所得到的检验的H0正确的概率小于检验水准a的概率,按a的概率而言,检验概率落在a的概率(分布面积)之内,因此,H0可能(正确)的概率不接受(或拒绝)。
二个样本匀数的比较用t检验或u检验:
适应于方差齐homoscedasticity(二总体方差相等的关系)。
而如果要了解二样本方差不等是否由于抽样误差所致,可用方差性检验,方法用F检验;t'检验(适当变量变换),秩和检验,近似法。
第一类错误type
I error(拒绝了实际成立的Ho);第二类错误type
II error(不拒绝了实际不成立的
Ho)。
【方差分析
analysis of variance (ANOV)】
二个样本匀数的比较用t检验或u检验,多个样本的比较用方差分析。
方差分析基本思想是把全部观察值之间的变异--总变异,按设计和需要分为二个或多个组成部分,再作分析。除用于多样本匀数的比较外,还可用于二个样本匀数的比较(可以把一个样本按二组或多组进行组间和组内匀数再作分析),分析因素间的交互作用和回归方程的线性假设检验等。
三个不同时期的大鼠染尘湿肺重比较(完全随机设计的多样本匀数的比较)
|
|
A |
|
|
|
B |
|
|
|
C |
|
|
|
SUM |
|
|
|
|
|
(X)^2 |
|
|
|
(X)^2 |
|
|
|
(X)^2 |
|
|
|
(X)^2-sum |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
3.3 |
10.89 |
|
|
4.4 |
19.36 |
|
|
3.6 |
12.96 |
|
|
11.3 |
127.69 |
|
|
1 |
3.6 |
12.96 |
|
|
4.4 |
19.36 |
|
|
4.4 |
19.36 |
|
|
12.4 |
153.76 |
|
|
1 |
4.3 |
18.49 |
|
|
3.4 |
11.56 |
|
|
5.1 |
26.01 |
|
|
12.8 |
163.84 |
|
|
1 |
4.1 |
16.81 |
|
|
4.2 |
17.64 |
|
|
5 |
25 |
|
|
13.3 |
176.89 |
|
|
1 |
4.2 |
17.64 |
|
|
4.7 |
22.09 |
|
|
5.5 |
30.25 |
|
|
14.4 |
207.36 |
|
|
1 |
3.3 |
10.89 |
|
|
4.2 |
17.64 |
|
|
4.7 |
22.09 |
|
|
12.2 |
148.84 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Exi |
22.8 |
519.84 |
|
|
25.3 |
640.09 |
|
|
28.3 |
800.89 |
|
|
76.4 |
5836.96 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
SUMi/N |
ni |
6 |
86.64 |
|
|
6 |
106.6817 |
|
|
6 |
133.4817 |
|
|
18 |
324.2756 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Mi |
3.8 |
|
|
|
4.216667 |
|
|
|
4.716667 |
|
|
|
4.244444 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
(Xa)^2 |
|
87.68 |
|
|
|
107.65 |
|
|
|
135.67 |
|
|
331 |
|
|
SUMi/N |
(Xexi)^2/6 |
|
86.64 |
|
|
|
106.6817 |
|
|
|
133.4817 |
|
|
326.8033 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
N=18 |
k=3 |
n=17,V=2,V=15 |
|
Ma-b-c |
|
|
|
|
|
|
|
|
|
|
|
|
N-k=15 |
k-1=2 (min 1
for free degagree |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
C: |
(X)^2-sum |
/N |
324.2756 |
|
|
|
|
|
|
|
|
SSsum |
331-324.2756= |
6.724444 |
|
SS-total |
=331-324 |
|
6.724444 |
|
V-total=18-1=17 |
17 |
|
|
|
|
SSbwt |
326.8033-324.2756= |
2.527778 |
|
SS-btw |
=326.8033 -C |
2.527778 |
|
Vk=k-1=3-1=2 |
2 |
|
|
|
|
SSinr |
SSsum-SSbwt= |
4.196667 |
|
SS-inner |
SSt - SSb |
|
4.196667 |
|
Vi=N-Vk=18-3=15 |
15 |
|
|
|
|
|
|
|
|
|
MSbtw |
SSbtw / Vbwt=
2.527778/2 |
|
1.263889 |
|
|
|
|
|
|
MSbwt=SSbwt/Vb.2 |
1.263889 |
|
|
MSi |
Ssi/Vi |
|
|
|
0.279778 |
|
|
|
|
|
|
MSinr=SSinr/Vi15 |
0.279778 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
F=MSbte/MSinr |
4.517474 |
|
|
F=MSbtw/Msi |
|
|
|
4.517474 |
|
|
|
|
|
|
|
|
|
|
1) 完全随机设计的多样本匀数的比较:常见比较几种不同疗法对某病的疗效,不同因素或物质造成的影响;
SS(--inner)=SS(-total)-SS(-between);
SS=[Sum(X)^2+(sumX)^2/n]/n-total or/ n-no of group or/ n-no of
individual of each group) respectively;
SS(-total)=[sum (each total
of group)^2-[(sum (each of group)]^2/n-(tatal) [C]
SS(-between)=[(group1)^2+(group2)^2+(group3)^2]/no
of each group have (i.g, each group of 3 group have 6
individual, then 6, not 3)-[(sum (each of group)]^2/n-(tatal)[C]
MS-btw=SS-btw/k;
Ms-inner/v; k=n-2(when total of 3 groups); v=n-k;
(to get the Mean or single SS-err, using n-1, v, k; otherwise,
n used)
F=MS-btw/Ms-inner;
(v-btw and v-inner to get F value) (Note: this example;
SS-bwt divided by 6, but MS-bwt divided by 2 since only 3
groups and minus 1)
2)配伍组设计的多样本匀数的比较:从总变异多分离出(增设)配伍组变异(即扣除了配伍组对总变异的影响),因而可提高实验效率。处理上和组间变异类似,只是行列关系互换而已。
多样本匀数间的两两比较,又称多重比较。方法较多,常用q检验。
SS-total=SS-treat+SS-btw-maching+SS-inner;
one more group than 1)'s handling; except SS-inner, all others
minus [C] value to get SS-...
comapring F1, if no
difference, combine it to get MS-error, and finally for
F=MS-treat/MS-error;
大鼠经五种染尘湿肺重比较:
march |
Contro |
|
|
A |
|
|
B |
|
|
C |
|
|
D |
|
|
Sum |
|
|
|
|
|
X^2 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
1.4 |
1.96 |
|
4.1 |
16.81 |
|
1.9 |
3.61 |
|
1.8 |
3.24 |
|
2 |
4 |
|
11.2 |
125.44 |
|
|
2 |
1.5 |
2.25 |
|
3.6 |
12.96 |
|
1.9 |
3.61 |
|
2.3 |
5.29 |
|
2.3 |
5.29 |
|
11.6 |
134.56 |
|
|
3 |
1.5 |
2.25 |
|
4.3 |
18.49 |
|
2.1 |
4.41 |
|
2.3 |
5.29 |
|
2.4 |
5.76 |
|
12.6 |
158.76 |
|
|
4 |
1.8 |
3.24 |
|
3.3 |
10.89 |
|
2.4 |
5.76 |
|
2.5 |
6.25 |
|
2.6 |
6.76 |
|
12.6 |
158.76 |
|
|
5 |
1.5 |
2.25 |
|
4.2 |
17.64 |
|
1.8 |
3.24 |
|
1.8 |
3.24 |
|
2.6 |
6.76 |
|
11.9 |
141.61 |
|
|
6 |
1.5 |
2.25 |
|
3.3 |
10.89 |
|
1.7 |
2.89 |
|
2.4 |
5.76 |
|
2.1 |
4.41 |
|
11 |
121 |
|
|
Sum |
9.2 |
84.64 |
|
22.8 |
519.84 |
|
11.8 |
139.24 |
|
13.1 |
171.61 |
|
14 |
196 |
|
70.9 |
5026.81 |
|
|
|
|
14.2 |
|
|
87.68 |
|
|
23.52 |
|
|
29.07 |
|
|
32.98 |
|
|
|
840.13 |
|
N,k, |
6 |
|
|
6 |
|
|
6 |
|
|
6 |
|
|
6 |
|
|
30 |
167.5603 |
168.026 |
Vc=k-1=5-1=4 |
M |
1.533333 |
|
|
3.8 |
|
|
1.966667 |
|
|
2.183333 |
|
|
2.333333 |
|
|
2.363333 |
b=6 |
0.465667 |
5 in horental griop, but data are 6 of sum in cloum u |
|
|
14.2 |
|
|
87.68 |
|
|
23.52 |
|
|
29.07 |
|
|
32.98 |
187.45 |
187-167= |
19.88967 |
|
|
|
|
14.10667 |
|
|
86.64 |
|
|
23.20667 |
|
|
28.60167 |
|
|
32.66667 |
185.2217 |
185-167= |
17.66133 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
19.8-17.6 |
2.228333 |
1.762667 |
|
|
5 total
treatment with 6 group |
|
|
|
|
|
|
|
|
|
|
|
F= |
17.7/2.2 |
7.925804 |
|
|
|
SSsum: |
167.5603 |
|
Vsum=N-1=29 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
SStreat |
17.66133 |
|
V=k-1=5-1=4 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
SScontrol |
0.465667 |
|
V=b-1=6-1=5 |
|
|
|
SSerr=Sssum-Sstreat-Sscontrl |
1.762667 |
|
Verr=Vtotal-vtreat-Vcontrl=29-4-5=20 |
|
|
|
|
|
|
|
|
MStrat=Sstreat/Vtreat |
4.415333 |
|
|
|
|
MScotrl=SScontrl/Vcontrl |
0.093133 |
|
|
|
|
MSerr=SSerr/Verr |
|
0.088133 |
|
|
|
|
|
|
|
|
|
|
|
|
F=MScontrl/MSerr |
|
1.056732 |
|
V1=5,V2=20, F
value, P>0.05) |
|
|
|
|
|
|
|
|
|
it is not
refuse the Ho, Therefore, combine Sscontrl |
|
|
|
Sserr=0.466+1,762=2.228,
Verr=5+20=25 |
|
|
|
|
Mserr-combin=2.228/25=0.0891 |
|
|
|
|
F=Mstreat/Mserr=4.416/0.00891=49.562 |
|
|
|
|
|
V1=4,v2=25
p<0.01, a=0.05, accept H1 |
|
|
|
多个样本之间的二二比较(大鼠经五种染尘湿肺重的二二比较)
q (Newman-Keuls) testing methods
q (Newman-Keuls) testing methods |
|
|
|
|
|
|
|
|
q=|Xa-Xb|/Sa-Sb |
two mean/Sderr |
|
|
|
|
|
|
|
Sa-Sb=[(MSerr/2*(1/n1+1/n2)]^1/2 |
|
|
|
|
|
|
|
#1 |
|
#2 |
|
#3 |
|
#4 |
|
#5 |
|
I.e. mean |
3.8 |
|
2.333333 |
|
2.183333 |
|
1.966667 |
|
1.533333 |
|
#1 to #5 |
3.8-1,533= |
2.266667 |
(A) |
|
|
|
|
|
|
|
|
|
|Xa-Xb| |
|
|
|
|
|
|
|
|
M.Serr= |
0.088133 |
/n(6) |
0.014689 |
sqrt E40 |
0.121198 |
(B) |
|
|
|
|
(A)/(B)= |
18.70222 |
(q) |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
t-test |
X1^2 |
(++X1)^2 |
|
X2^2 |
(++X2)^2 |
|
|
|
|
|
|
14.2 |
84.64 |
14.10667 |
87.68 |
519.84 |
86.64 |
SS |
0.113333 |
|
|
|
|
|
1.533333 |
|
|
3.8 |
MS |
2.266667 |
|
|
|
|
|
|
|
|
|
t= |
20 |
|
|
|
|
|
|
|
|
|
|
|
|
|
q值已如上面表得到,通过共10组间的二二分别比较(此处略),得出结论。
3) 多个方差的齐性检验(Bartlett):方差分析的应用条件是总体的方差相等,在方差分析前,常需进行多个方差的齐性检验。通过样本(理论上均来自正态分布)信息推断各总体方差是否相等。特别是在样本方差相差悬殊时,提醒需要注意这个问题。可有F检验,t'检验,Bartlett法。
X^2=2.3026{[lg(sumSSi/sum(ni-1)]sum(ni-1)-sum(ni-1)lgsi^2}/{1+1/3(k-1)[sum(1/(ni-1)-1/sum(ni-1)]},
v=k-1; si^2--^2 each of them; SSi--sum each of them
三组小白鼠注射同位素测得的脾蛋白放射性次数:
X^2 not same |
|
|
|
|
|
|
|
|
|
|
Bartett
Method |
A |
A^2 |
|
B |
B^2 |
|
C |
C^2 |
|
|
2.3026 value |
1 |
3.8 |
14.44 |
|
5.6 |
31.36 |
|
1.5 |
2.25 |
|
|
|
2 |
9 |
81 |
|
4 |
16 |
|
3.8 |
14.44 |
|
|
|
3 |
2.5 |
6.25 |
|
3 |
9 |
|
5.5 |
30.25 |
|
|
|
4 |
8.2 |
67.24 |
|
8 |
64 |
|
2 |
4 |
|
|
|
5 |
7.1 |
50.41 |
|
3.8 |
14.44 |
|
3 |
9 |
|
|
|
6 |
11 |
121 |
|
4 |
16 |
|
5.1 |
26.01 |
|
|
|
7 |
11.5 |
132.25 |
|
6.4 |
40.96 |
|
3.3 |
10.89 |
|
|
|
8 |
9 |
81 |
|
4.2 |
17.64 |
|
4 |
16 |
|
|
|
9 |
11 |
121 |
|
4 |
16 |
|
2.1 |
4.41 |
|
|
|
10 |
7.9 |
62.41 |
|
7 |
49 |
|
2.7 |
7.29 |
|
|
|
|
81 |
6561 |
|
50 |
2500 |
|
33 |
1089 |
|
|
|
sum |
656.1 |
737 |
|
250 |
274.4 |
|
108.9 |
124.54 |
|
sum |
|
n=10 |
SS |
8.988889 |
|
SS |
2.711111 |
|
SS |
1.737778 |
|
13.43778 |
|
|
lgSS |
0.953706 |
|
|
0.433147 |
|
|
0.239994 |
|
1.626848 |
|
X^2=2.3026(10-1)(3lg(13.4378/3)-1.6268/[1+(3+1)/(3(3)(1-1)=6.454 |
|
|
|
|
|
NO.
A
B C
1
3.8
5.6 1.5 ( should be
include 4 digits, but omit)
2
....
....
....
(v=3-1=2)
10
....
....
...
Sum:
si^2
8.98
2.71
1.73
13.4378
lgsi^2
0.95
0.43
0.24
1.6268
X^2=2.3026{[lg(sumSSi/sum(ni-1)]sum(ni-1)-sum(ni-1)lgsi^2}/{1+1/3(k-1)[sum(1/(ni-1)-1/sum(ni-1)]}=2.3026(10-1)[3lg(13.4378/3)-1.6268]/[1+(3+1)/[3(3)(10-1)]=6.454
变量变换:t检验和方差分析都要求各样本取自正态分布,各总体方差相同,如果不能满足这些条件,解决办法有二:
一是变量变换,对数变换(transformation
of logarithm),平方根变换(transformation
of square root)(Poisson分布,例如,放射性计数),百分数的平方根反正玄变换(transformation
of inverse sine)(总体百分数较小,<30%;或较大,>70%,便离正态更为明显,通过p的平方根反正玄变换,使资料接近正态分布,达到方差齐性要求)
对数变换(三组发汞含量):
n not same |
A |
|
|
B |
|
|
C |
|
|
|
|
1 |
0.61 |
0.3721 |
|
0.79 |
0.6241 |
|
0.62 |
0.3844 |
|
|
|
2 |
0.72 |
0.5184 |
|
1.01 |
1.0201 |
|
1.01 |
1.0201 |
|
|
|
3 |
0.72 |
0.5184 |
|
1.24 |
1.5376 |
|
1.18 |
1.3924 |
|
|
|
4 |
1.02 |
1.0404 |
|
1.41 |
1.9881 |
|
1.47 |
2.1609 |
|
|
|
5 |
0.88 |
0.7744 |
|
1.54 |
2.3716 |
|
1.35 |
1.8225 |
|
|
|
6 |
1.36 |
1.8496 |
|
1.75 |
3.0625 |
|
1.52 |
2.3104 |
|
|
|
7 |
1.44 |
2.0736 |
|
1.94 |
3.7636 |
|
1.81 |
3.2761 |
|
|
|
8 |
1.9 |
3.61 |
|
2.54 |
6.4516 |
|
1.86 |
3.4596 |
|
|
|
9 |
1.82 |
3.3124 |
|
3.05 |
9.3025 |
|
2.37 |
5.6169 |
|
|
|
10 |
2.16 |
4.6656 |
|
|
|
|
2.76 |
7.6176 |
|
|
|
|
12.63 |
159.5169 |
|
15.27 |
233.1729 |
|
15.95 |
254.4025 |
|
sum |
|
Sum |
15.95169 |
18.7349 |
|
25.9081 |
30.1217 |
|
25.44025 |
29.0609 |
|
26 |
|
n=26 |
10 |
2.78321 |
|
9 |
4.2136 |
|
10 |
3.62065 |
|
10.61746 |
|
|
SS |
0.309246 |
|
SS |
0.5267 |
|
SS |
0.402294 |
|
1.23824 |
|
(ni-1)x |
lgSS |
-4.58727 |
|
|
-2.22749 |
|
|
-3.5591 |
|
-10.3739 |
|
1/(ni-1) |
|
0.111111 |
|
|
0.125 |
|
|
0.111111 |
|
0.347222 |
|
X^2=2.3026[26lg(10.6064/26)-(-10.3835)]/{1+1/3(3-1)[0.3472-1/26]}=0.567 |
|
|
v=3-1=2 |
X^2 value |
|
二是秩和检验(非参数统计法)。
【正态性检验
(test of normality)】
目测法;累计频率点在正态概率纸上
矩法method
of moment(coefficient of skewnesspian
偏度系数
coefficient of kurtosis 峰度系数);
Using middle of u=0, ^3 and ^4 to get the [skewness偏态;
正态峰mesokurtosis;
尖哨峰leptokurtosis;
平阔峰platykurtosis] for
understanding.
D检验(顺序统计量D作正态性检验,用D界值表);
D=sum[(n+1)/2-i][Xi...-Xi]/(n^3[sumX^2-(sumX)^2/n]^1/2
( i for x ranking from small to large)
(Note: the methods are only for testing the
distrubution)
【相对数
relative number 】
两个有联系的指标之比(绝对数说明事物状态之一,至于是否就严重或原因重大,靠相对数);注意:比不是率(不能以比代率),相对数一般分母不宜过小,率一般不是直接相加关系,资料是否有可比性,样本率(或构成比)的比较应遵守随机抽样,要作假设检验。
常用:rate(率);constituent
ratio(构成比);
relative
ratio(相对比);
dynamic
series(动态数列);例如,发展速度,增长速度;定基(选一个特定时期为基数);环基(选上一个时期为基数
不能直接比较的率(例如年龄构成因素的影响时),用standardization标准化法;
标准化率standarized
rate/adjustment rate; 直接法(由标准人口构成比计算);间接法(由标准率计算
,i.g. 标准年龄死亡率)。
『计数资料(离散型资料)』的统计推断
【二项分布及其应用】
二项分布binominal
distribution: 二项分布的概率函数;
分布函数; 分布图形;
匀数与标准差。
应用:总体率的区间分布;样本率与总体率比较;二样本率比较的u检验。
步骤:计算概率;概率当频数;求频数(SUM
F)和及平方和(SUM
F(X)^2;
Independent and
no-relationship events; i.g. rate of death: 0.8; then the rate
of live would be 0.2 (1-0.8); if A, B, C, 3 of them for observing,
the chance would be:
[(1-p)+p]^n=[(1-0.2)+0.2]^3=(0.8+0.2)^3=0.2^3+3(0.2)^2(0.8)+3(0.8)^2(0.2)+0.8^3=0.008+0.0096+0.384+0.512=1.
If the number is large, the
rate (frequency) could be considered as U-type of distribution
for the statistic analysis.
P(x)=[(n!/(X!(n-X)!][(1-p)^n-x]
[P^x], x=1,2,...,n i.g. n=no of
sample; x=rate of frequency;
三只小白鼠的生存概率(作频数量考量),匀数,标准差
10 samples and ask how the
chance of 8 with rate of 20%:
P(8)=[10!/(8!(10-8)!][0.8)^10-8][0.2^8]
X
f=P(x) fx
f(x^2)
0
0.008
0
0
1
0.096
0.096 0.096
2
0.384
0.768 1.536
3
0.512
1.536 4.608
sum
1.000
2.400 6.240
Mean=sum fx / sum f =
2.400/1=2.4 ; Standard S={[sum f(x^2)-(sum fx)^2/sum
f]/sum f} ^1/2 = [6.24-(2.4)^2/1]/1=0.69; if rate,
S,=[p(1-p)/n]^1/2
应用:
1) 总体率的区间分布;find
the vale from the table; (p-UaSp, p+UaSp)
10献血员HB阴性8人,HB阴性率可信限区间?
n=10; X=8;
X>n/2; x=10-8=2, from table, 3-56(+可信限区间);
100-3=97; 100-56=44; 95%可信限区间44-97%
2) 样本率与总体率比较;
if the p is far from the 0.5 and the positive rate quite small
(i.e. not common diseases), it could be directly calculate by
the formula.
(1)新生儿染色体异常1%,某医院400名单只有一例异常,是否低于一般?
p=0.01;
1-p=0.99; n=400;ask not over 1%; Ho: p=0.01; H1: p<0.01
P=sum[p(X)]=(0.99)^400+[400!/1!(400-1)!](0.99)^(400-1)*(0.01)=0.0905;
a=0.05, not refuse Ho
(2) 正态近似法(n>50):胃溃疡出血20%;某医院65岁以上304例,31.6%胃溃疡出血,是否高?u=(0.316-0.20)/[(0.2(1-0.2)/304]^1/2=5.06
单侧:高
3) 二样本率比较的u检验;
u=(p1-p2)/[Pc(1-Pc)(1/n1+1/n2)],
Pc=(X1+X2)/(n1+n2), u-table for value
男生80,感染23,感染率28.75%;
女生85,感染13,感染率15.29%;感染率有无差别?
Pc=(23+13)/(80+85)=0.2182;
u=(0.2875-0.1529)/[0.2182(1-0.2182)(1/80+1/85)]^1/2=2.0921;
0.05>p>0.02, a=0.05, refuse Ho
【Poisson分布及其应用】
二项分布的特例:概率小(<0.05),每次观察单位数n很大,离散形分布。(i.g.
distribution of bacteria) 分子扩散型
概率函数p(X);
P(x)=e^-r[r^x/X!]; 展开式(容易计算些):
P(0)=e^-r(0!=1); P(X=1)=p(x) [r/(x+1)],
x=1.2.... r=mean of population.
分布图形;r
is higher, i.g. r=20, close to normal distribution.
分布特性和应用条件。离散型分布,适用于计数资料。单位时间,单位面积或容积的观察类事件。if
there are mutual effectiveness between the events, i.g.
infectious diseases or
不均匀的细菌培养,NOT
IN the scope of APPLICATION. Similar
with 二项分布,
but when P is small and n is large, they are much more close.
easy to calculation.
应用:
1)总体匀数的估计;
X+-Ua(X)^1/2 X=Mean of sample
(1)面积100CM的培养皿,一小时取样,24小时得8个菌落,求该室每100CM的细菌菌落个数的95%可信区间?
X=8(sample),
from table, r 95%: 3.4--15.8; 95%可信区间
3.4--15.8个/100cmm
2)样本匀数与总体匀数比较;
By formula: P(0)=e^-r; P(X=1)=p(x) [r/(x+1)],
x=1.2.... r=mean of population.
(1) i.g.
疫苗严重反应率1/1000,现有150人,有2人严重反应,问是否高于一般。
思路:假设从一般的反应率(1/1000)的人口中随机抽样150例,严重反应的概率小于或等于2是多少?
Ho:r=r0(假设一样或不高于正常人群);
P(0)=e^-0.15[0.15^2/0!=0.8607;
P(1)=e^-0.15[0.15^2/1!=0.1291;
P=1-[P(0)+P(1)]=1-(0.8607+0.1291)=0.0102; p=0.0102,
a=0.05, refuse Ho.
(2) 正态近似法:u=(X-r0)/(r0)^1/2
;When X>=20, ok.
某病年死亡率7.58/10万;作3年回顾调查,得29人死亡,该人口年龄结构与总体无区别,问死亡率有无差别?H0:
u=u0; u=(29-7.58*3)/(7.58*3)^1/2=1.3127.
0.20>P>0.10, a=0.05, accept Ho
3)二样本率比较的
u 检验。
When X>=20, ok.
A和B各10样本,各样本中取1ML培养,A菌落890个,A菌落785个,菌落有无差别?
u==(X1-X2)/(X1-X2)^1/2=(890-785)/(890+785)^1/2
【X^2检验
chi-square
test】
用途广。现介绍三:检验推断两个或两个以上样本率(或构成比)之间有无差异?两因素间有无相关?检验频数分布的拟合度。
1) 四格表;行X列表资料;配对资料;频数分布拟合优度;四格表的确切概率法。
二组注射对比:
X^2 = SUM [(A-T)^2/T]; T=Theoretical
frequency; A=Actual frequency;
Chi-square test |
|
|
|
|
|
|
|
|
|
|
|
|
|
A+ |
|
B- |
|
SUM |
A+/SUM |
|
|
|
|
|
|
|
a check |
52 |
57.17699 |
19 |
13.82300885 |
71 |
0.732394 |
|
|
|
|
|
|
|
b compare |
39 |
33.82301 |
3 |
8.17699115 |
42 |
0.928571 |
|
|
|
|
|
|
|
Sum |
91 |
|
22 |
|
113 |
|
|
|
|
|
|
|
|
Theory |
0.80530973 |
+rate |
0.19469 |
-rate |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
X2=Sum(A-T)^2/T |
|
|
|
|
|
x^2=(52-57.18)^2/57.18+(19-13.82)^2/13.82+39-33.82)^2/33.82+3-8.18)^2/8.18=6.48 |
or |
|
|
|
|
|
|
=[(52*3-19*39)^2*113]/(71*42*19*22)=6.48 |
|
|
|
correction
for continuity: |
|
|
|
|
1: -o.5 |
upper calc. |
|
|
|
|
|
Treat
(+)
(-)
sum
T-1
52(57.18)
(a)
19(13.82)
(b)
71
(T-no: 57.18=71*91/113; 13.82=71*22/113)
T-0
39(33.82)
(c)
3(8.18)
(d)
42
(T-no: 33.82=42*91/113; 8.18=42*22/113)
sum
91
22
113
X^2=[(52-57.18)^2/57.18+(19-13.82)^2/13.82+(39-33.82)^2/33.82+(3-8.18)^2/8.18]=6.48;
0.025>p>0.01, a=0.05, (p<a), refuse Ho, accept H1
or X^2=[(ad-bc)^2*n]/[(a+b)*(c+d)*(a+c)*(b+d)]
(1) 1<T<5, n>40; (2) T<1 or
n<40; need a correction for formula: X^2
= SUM [(|A-T|-0.5)^2/T]; X^2=[(|ad-bc|-n/2)^2*n]/[(a+b)*(c+d)*(a+c)*(b+d)]
Basically, it assumed there
were no difference. When theoretical frequency is same as actual
frequency, then Ho should be accepted. However, it is difficult
to get the real data for T-0, therefore, put some kind of data
as T-0 supposed for comparing with T-1, if difference, it
could be tested. Simply, it is used the total number as
standard to adjusted the number (rate) for comparing. To sum
all of each difference to get the test result.
2) More than 2x2:两个以上样本率(或构成比)之间有无差异
三个地区黄曲霉素情况:
mult row/clm |
A- |
|
B+ |
|
Sum |
+Rate
B+/Sum |
|
a |
6 |
|
23 |
|
29 |
0.793103 |
|
|
b |
30 |
|
14 |
|
44 |
0.318182 |
|
|
c |
8 |
|
3 |
|
11 |
0.272727 |
|
|
|
|
|
|
|
|
|
|
|
sum |
44 |
|
40 |
|
84 |
|
|
|
|
|
|
|
|
|
|
|
|
rate |
0.52380952 |
|
0.47619 |
|
v=(3-1)(2-1)=2 |
|
|
|
|
|
|
|
|
|
|
|
|
x^2=84(6*6/29*44+23*23/29*44+30*30/44*44+14*14/44*40+8*8/11*44+3*3/11*40-1)=17.907 |
X^2=n*[sum (A^2)/(nR*nC)-1]
又例如,二组病例的患者型(4型),问构成比是否有差别?不同时期病变(X光改变)有无关系?
注意事项:理论频数不宜过小;多个样本率(或构成比)的检验,只能说总体之间有差别,而不能说明彼此之间都有差别;如果是效应类的(+,
++, +++等等),宜用秩和检验。
3) 非X^2检验的类似四格表法(exact
probabilities in 2x2 table):
四格表的确切概率法
When T<1 or n<40,
this method is suitable. It is a method for early period since
the computer not so popular or available.
P=[(a+b)!(c+d)!(a+c)!(b+d)!]/[a!b!c!d!n!]
步骤:用四格表的合计部分分别列出可能性的四格表法;每个四格表按公式算出P,把所有的P加起来(大于或等于|A-T|的),然后P表求概率。
4)判断是否适合于X^2,二项式分布,POISSON分布的频数分布拟合优度的X^2检验(good-of-fit
test)
频数分布拟合优度的X^2检验
步骤:把实际频数(A)得出和(总数TOTAL),然后把这个总数按概率P(X)分布,得出理论频数(实际频数X概率),再用(A-T)^2/T得出(分别相加得和)相当于是(X^2)的值。
X |
A(sum(f) |
X*A |
r/X |
P(x) |
Acum P(x) |
T=n*P(x) |
(A-T)^2/T |
0 |
26 |
0 |
0 |
0.08291 |
0.08291 |
24.873 |
0.051065 |
1 |
51 |
51 |
2.49 |
0.206446 |
0.289356 |
61.9338 |
1.930254 |
2 |
84 |
168 |
1.245 |
0.257025 |
0.546381 |
77.1075 |
0.616108 |
3 |
70 |
210 |
0.83 |
0.213331 |
0.759712 |
63.9993 |
0.562637 |
4 |
42 |
168 |
0.6225 |
0.132798 |
0.89251 |
39.8394 |
0.117175 |
5 |
15 |
75 |
0.498 |
0.066134 |
0.958644 |
19.8402 |
1.180811 |
6 |
9 |
54 |
0.415 |
0.027445 |
0.986089 |
8.2335 |
0.071358 |
7 |
3 |
21 |
0.355714 |
0.013911 |
1 |
4.1733 |
0.329867 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
28 |
300 |
747 |
558009 |
1 |
5.515602 |
300 |
4.859275 |
|
|
|
1860.03 |
|
0.01838534 |
1 |
|
n=300 |
Mean |
2.49 |
r |
|
|
|
|
【秩和检验rank
sum test】
t检验和
F检验都是假定正态分布,推断二个或多个总体匀数(系正态总体参数)是否相等,称为参数统计parametric
statistics。
秩和检验--非参数统计nonparametric
statistics,不依赖于总体分布的具体形式,有广泛适应性。
注:适合参数统计处理的资料,若用非参数统计处理,常损失部分信息,降低效率。
1) 配对比较的符号
(T界表);
步骤: Ho,H1,a; =>
Difference in Value; => Arrangement from Small to large
=> Absolutely value for numbering; same value marked as two
number (i.g. -6, -6 for No.5 and N0.6; if +6 and -6,
marked as (|-6|+6)=12, 12/2=6; "0" has no number for
marking) => Obtained the total for negative value and the
totoal of positive value, slect the small one as T value
宇航员心率前后比较,得差值,以绝对值小的为T,当n<=50,查表法;n>50,u检验。
(1) => if n<=50, from the table to
find the T value; find the p, a=0.05 or a=0.01 for checking
(2) if n>50,
u=[(|T-n(n-1)/4|-0.5)]/[(n(n+1)(2n+1)/24)]^1/2; The similar
with u-distribution.
If the two negative and positive
number is close or equal, making an adjustment:u=[(|T-n(n-1)/4|-0.5)]/[(n(n+1)(2n+1)/24)-sum(tj^3-tj)/48]^1/2;
2) 配伍设计的多样本比较(另一种秩和检验方法,M界表);
ABC三个黄豆芽VIT-C含量抽查:
Date
A
B
C
Feb 11.4(3)
5.8(2)
3.5(1)
(No. by order)
Apr 6.4(1)
8.6(3)
7.5(2)
Jun
...
...
...
Aug ...
...
...
Oct
...
...
...
Dec
...
...
...
Ri
14
12
10
(total of No.)
R-avrg
12
12
12
(14+12+10)/3 average M
(Ri-R)^2
2^2
0
(-2)^2 =8 (M value)
(1) M value from M table
(2) X^2 method: Xr^2=12M/[bk(k+1)] (average
M), X^2 table for value, k=3-1, b=6 (Feb-Dec); if the
observing number with a lot of the same value and large n, the
adjustment is need. Xr^2 (divided by) Xr^2=Xr^2 /
{1-sum[(ti^2-tj)]/[bk(k^2-1)]}
3) 二样本比较;
有二组数据可以分别编秩,用编秩之间的小样本求T值(有n1和n2二个参数查表)。
二组血铅比值:
Sample
A
Sample B
a a' b
b'
5
(1)
17
(9)
(No. by order)
5
(2)
18 (10.5)
... ...
...
...
... ...
... ...
Sum
ni=10 Ti=59.5
nj=7
Tj=93.5
步骤:
Ho,H1,a; => like table above to get the small n for T value
(1) if ni<=20 and ni-nj<=10;
T table for the value
(2) if not as (1) above,
u={[|T1-n1(N+1)/2]-0.5}/{[n1n2(N+1)/12]^1/2}
N=n1+n2; get the u value from the u table. {n1(N+1)/2}为平均秩和,而T为样本秩和。
if a lot of same
order number, an adjustment is needed.
u={[|T1-n1(N+1)/2]-0.5}/{[n1n2/12N(N-1)][(N^3-N-sum(ti^2-tj)]}^1/2
4)频数表资料或等级资料(无明显界限的一类资料,+,++)
一般取样本量小的一组的为T值,样本量相同,则取小的T为T值。
例如,二组肝炎血清胆红素的比较:
原数据分组段(一个栏a),比较组(二个组)的实际数按原数据分组段分别(分别列成二个栏b,c),并同组段相加(为一栏d),再得出秩的相应分段分布(一个栏e),按秩的相应分段上下值得出秩的组中值(一个栏f),秩和(二组各一栏);最低部分别标出ni
nj Ti Tj Using the formula to get u value.
4)多样本间比较(H-test);
H=[12/N(N+1)]*[sum (Ri^2/ni)-3(N+1)],
if large n and a lot of the same order number. an adjustment:
/[1-sum(tj^3-tj)/(N^3-n)]; from X^2table to get value.
可以用方差检验,也可以用H检验。
H=[12/(N(N+1)] * [sum(Ri^2/ni)-3(N+1)]/[1-sum(tj^3-tj)/(N^3-N)];
[1-sum(tj^3-tj)/(N^3-N)]为调整项,用于相同观察值比较多时。
5)多样本间两两比较();
Based on H-test above, for further t-test
between the two samples.
t=(Ra-Rb)/{[N(N+1)(N-1-H)/12(n-k)]*[(1/na+1/nb)]}
Ra=Ra/na ; Rb=Rb/nb; average order number;t value from t table
【直线回归与相关】
『直线回归liner
regression:两个变量之间的关系』
I 回归
1)To get: Y=a+bX
例如,年龄或IgG浓度(自变量,independent
variable,作X轴)与血压或IgG浓度表现出的环圈大小(应变量,dependent
variable,作Y轴)
区别于一般函数而称为直线回归方程: y=a+bX;
a截距(intercept);
b回归系数(regression
coefficient),即斜率(slope)。
b=sum(X-x)(Y-y)/sum(x-x)^2=Lxy/Lxx,
a=y-bx; x, y,=mean; Lxx=X离均差平方和;
Lxy ( X and Y 离均差积和)
=sum[(X-x)(Y-y)]=sumXY-(sumX)(sumY)/n
|
a(original) |
b |
|
c(original) |
d |
|
e |
|
f |
|
g |
|
g^2 |
|
IgG(X) |
X^2 |
|
沉淀环直径(Y) |
Y^2 |
|
X*Y |
|
Y Liner |
|
Y-Y(变量的一部分) |
|
|
1 |
1 |
|
4 |
16 |
|
4 |
|
4.14 |
|
-0.14 |
|
0.0196 |
|
2 |
4 |
|
5.5 |
30.25 |
|
11 |
|
5.26 |
|
0.24 |
|
0.0576 |
|
3 |
9 |
|
6.2 |
38.44 |
|
18.6 |
|
6.38 |
|
-0.18 |
|
0.0324 |
|
4 |
16 |
|
7.7 |
59.29 |
|
30.8 |
|
7.5 |
|
0.2 |
|
0.04 |
|
5 |
25 |
|
8.5 |
72.25 |
|
42.5 |
|
8.62 |
|
-0.12 |
|
0.0144 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
sum |
15 |
55 |
|
31.9 |
216.23 |
|
106.9 |
|
31.9 |
|
-1.8E-15 |
|
0.164 |
|
225 |
|
|
1017.61 |
|
|
|
|
|
|
|
|
|
n=5 |
45 |
|
|
203.522 |
|
|
|
Y=3.02+1.12X= |
Y-Y-liner= |
|
|
|
mean |
3 |
|
|
6.38 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Lxx=sum(X-Xmean)^2=Sum(X^2)-(sumX)^2/n= |
55-(15)^2/5= |
55-45=10 |
|
|
|
|
|
|
|
|
Lyy=sum(Y-Ymean)^2=Sum(Y^2)-(sumY)^2/n= |
216.23-203.52= |
12.708 |
|
|
|
|
|
|
Lxy=106.9-(15)(31.9)/5=11.20 |
|
|
|
|
11.2 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
b=11.20/10=1.12 |
|
|
|
|
|
|
|
|
|
|
|
|
a=6.38-1.12(3)=3.02 |
|
|
SS-total=sum[(Y-Y(m)]^2=12.708 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
SS-regression=(11.2)^2/10=12.544 |
|
SS-residiual=SS-total
- SS -regression=12.708-12.544=0.164 |
|
|
|
|
|
Y=a+bX=3.02+1.12X
[It may estimate there are
the relationship of the liner regression. To get sum(X), sum(Y),
sum(X^2) and sum (Y^2), furthur to get sum(XY)]
2) 回归系数的假设检验:方差分析或t检验
应用:
两个变量之间的依存关系
;(例如,IgG vs. 环圈大小);先求方程中的a和b,列出方程,将原始数据(X)代入方程得一系列值(方程解Y),求出原始数据的(Y)与方程解)之差,得差和及其方差和。F或t检验。
进行预测forecast
(回归方程的重要方面。例如,乙脑发病率(预报量Y)与日照时间的光线,将发病率作平方根反正玄变换,在已知现有日照时间(预报因子X),预测乙脑发病率,附可信区间);
统计控制statistical
control (例如,一定关系下控制汽车流量,
一种逆运算);以Y为最大控制量,Y=Y(与X相对应的量,可用线性方程代入)+-检验水准的值*标准误(Lxy),
即一个Y的最大符合a=0.05的值逆向解出X值。
步骤:
(1) Take the two number of X (smaller one and
lager one, i.g. x1=1,y1=4.14; x2=5,y2=8.62) to draw the liner
regression.
To check: the liner regression must included
the mean of X and Y; and the extend the line would be a.
(2) To test:
SS-total sum of square (Lyy, Y离均差平方和,总平方和,
sum[(y-y(m)]^2)
=SS-regression sum of square {[Ly(y变量的一部分)-y(mean)],
总平方和中可用X解释的部分,
sum [(y(变部)-y(m)}^2}
+SS-residual sum of square {Ly-y(y变量的一部分),
sum[(y-y(变部)]^2
;
Vtotal=Vregr+Vremain; Vt=n-1, Vreg=1,
Vrem=n-2
SS-total=sum[(Y-Y(m)]^2=12.708
SS-regression=(11.2)^2/10=12.544
SS-residiual=SS-total - SS
-regression=12.708-12.544=0.164
(1)F=(SSreg/Vreg)/(SSrem/Vrem); t=(b-0)/Sb=b/(Syx/Lxx^1/2);
Syx=sum(Y-y)^2/(n-2)=[SSrem/(n-2)]^1/2; v-reg=1, v-residual=3,
p<0.01, a=0.05
n=5; v-total=5-1=4; v-regression=1,
v-residual=3, F=[(12.544/1)/(0.164/3)]=229.32; p<0.01
(2) t=(b-0)/Sb=b/[(Sxy/Lxx^1/2)]=1.12/[0.2338/(10^1/2)]=15.149;
Sxy=[0.164/(5-2)]^1/2=0.2338 v=3 the p<0.001 a=0.05
t=(F)^1/2=(229.23)^1/2=15.143, same
results even with F, p<0.01 and t, p<0.001
『直线相关liner
correlation, simple correlation:两个变量之间
bivariate normal distribution 是否有直线相关的关系』
II 回归
正相关positive
correlation;负相关negative
correlation;相关程度degree
of relationship。直线相关系数的假设检验常用t检验。
1) positive correlation;
OR negative correlation;
注意:得有实际意义;先作散点图判断趋势;相关不一定是因果关系,也可是伴随关系。
2) 直线相关系数
correlation coefficient
r={[sum(X-Xm)(Y-Ym)]/[(sum(X-Xm)^2*sum(Y-Ym)^2]^1/2}=Lxy/(lxx*lyy)^1/2
; r>0, positive correlation; r<0, negative
correlation; r=1, totally correlation;
t=(r-0)/Sy=r/{[(1-r^2)/(n-2)]^1/2}
3) r have the same analysis
value of b, but easy to calculate.
SS-regression=r^2*SS-total; r^2, coefficient of determination.
r^2=(0.2)^2=0.04, it's meaning SS-reg only have 4% of
SS-total, no actual regression significance.
3)
等级相关 rank correlation
Spearman method:
r-s=1-6Sum(d)^2/n(n^2-1); d为每对观察值X
Y秩次之差,n对子数
rank correlation |
|
|
|
|
|
|
|
|
|
|
|
|
X |
rank x |
|
Y |
rank y |
|
d=Xr-Yr |
d^2 |
|
|
|
1 |
0.7 |
1 |
|
21.5 |
3 |
|
-2 |
4 |
|
|
|
2 |
1 |
2 |
|
18.9 |
2 |
|
0 |
0 |
|
|
|
3 |
1.7 |
3 |
|
14.4 |
1 |
|
2 |
4 |
|
|
|
4 |
3.7 |
4 |
|
46.5 |
7 |
|
-3 |
9 |
|
|
|
5 |
4 |
5 |
|
27.3 |
4 |
|
1 |
1 |
|
|
|
6 |
5.1 |
6 |
|
64.6 |
9 |
|
-3 |
9 |
|
|
|
7 |
5.5 |
7 |
|
46.3 |
6 |
|
1 |
1 |
|
|
|
8 |
5.7 |
8 |
|
34.2 |
5 |
|
3 |
9 |
|
|
|
9 |
5.9 |
9 |
|
77.6 |
10 |
|
-1 |
1 |
|
|
|
10 |
10 |
10 |
|
55.1 |
8 |
|
2 |
4 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
sum= |
42 |
|
|
|
r-s=1-6(42)/10(10^2-1)=0.745 |
n=10 and r-s
Table. P=0.02, a=0.05, refuse Ho and accept H1(positive
correlation) |
|
|
if there are
too many tie rank needed an adjustment.
r'-s={[(n^3-n)/6}-(Tx+Ty)-sum(d)^2| /
[(n^3-n)/6-2T)^1/2]*[(n^3-n)/6-2Ty]^1/2 |
4) When using, must be with
actual meaning between X and Y; Before analysis, make or draw
a figure; May not be the factors of course
5) 曲线直线化
(1)曲线拟合:重要手段之一。一些非直线化相关的因素,用直线回归不足以恰当解释,而可以通过适当变量变换关系而仍然采用直线回归来表达。用直线回归求出b,a值,然后,还原为曲线方程。
Tips: variable
transformation to make rectification -> curve
fitting;
步骤:Select
a type of curve, i.g. exponent curve; => Change the Y
(Y=A*(B)^xto lgY (lgY=a+bX);=>
最小二乘法求出方程;=>将直线华方程转换为曲线式,作曲线图;(关键:求出X-mean,
Y-mean, sum(X), sum(X)^2, sum(Y) sum(Y)^2
最小二乘法原理:直线能保证各实测点至直线的纵向距离的平方和为最小。
i.g. rank
correlation above,
curve fitting |
X-distance |
X^2 |
|
Y-density |
|
lgY |
Lxy=X*lgY |
|
0.9906^x |
Y-from regression |
|
|
50 |
2500 |
|
0.687 |
|
-0.16304 |
-8.15216 |
|
0.9361 |
0.623614 |
0.583766 |
|
|
100 |
10000 |
|
0.398 |
|
-0.40012 |
-40.0117 |
|
0.9361 |
0.388895 |
0.364045 |
|
|
150 |
22500 |
|
0.2 |
|
-0.69897 |
-104.846 |
|
0.9361 |
0.242521 |
0.227023 |
|
|
200 |
40000 |
|
0.121 |
|
-0.91721 |
-183.443 |
|
0.9361 |
0.151239 |
0.141575 |
|
|
250 |
62500 |
|
0.09 |
|
-1.04576 |
-261.439 |
|
0.9361 |
0.094315 |
0.088288 |
|
|
300 |
90000 |
|
0.05 |
|
-1.30103 |
-390.309 |
|
0.9361 |
0.058816 |
0.055058 |
|
|
400 |
160000 |
|
0.02 |
|
-1.69897 |
-679.588 |
|
0.9361 |
0.022873 |
0.021412 |
|
|
500 |
250000 |
|
0.01 |
|
-2 |
-1000 |
|
0.9361 |
0.008895 |
0.008327 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
sum |
1950 |
637500 |
|
1.576 |
|
-8.2251 |
-2667.79 |
|
|
|
|
|
n=8 |
243.75 |
3802500 |
(1950)^2 |
0.197 |
|
-1.02814 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
X-mean=sum(X)/n=1950/8=243.75 |
|
Lxy=sum(X-X-mean)*(sum(Y-Y-mean)=sum(XY)-sum(x)*sum(Y)/n |
|
|
|
Y-mean=sum(Y)/n=-8.2251/8=-1.0281 |
-384.15 |
|
|
|
|
|
|
|
|
Lxx=sum(X^2)-(sumX)^2/n=637500-(1950)^2/8=162187.5 |
162187.5 |
|
|
|
|
|
|
|
Lxy=sum(Xy-(sum(X)*(sum(Y)/n=-2667.7887-(1950)(-8.2251)/8=-662.92 |
instead of
Y-density of original data, using lgY |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
b=Lxy/Lxx=-662.92/162187.5=-0.0041 |
|
|
|
|
|
|
|
|
|
a=lgY-(mean)-bX-(mean)=-1.02814-(-0.0041)(243.75)=-0.0287 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Y=a-bX=-0.0287-(0.0041)(X) |
|
|
|
|
|
|
|
|
|
|
lgY=-(0.0287+0.0041X) |
|
or |
Y=10^-(0.0287+0.0041X)=0.9361(0.9906)^x |
=POWER(0.9906,X) |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Usin X and
Y-from regression to draw a figure. |
|
|
|
|
|
|
|
(2)直接使用变量变换的直线回归:例如,正态概率纸目测法,就是将百分数(亦称频率)作概率单位变换,使S形曲线(即累计频率曲线)直线化,用于正态检验。又如,卫生检验工作中,若两变量呈曲线趋势,常用直线化回归方程,绘制标准线(工作曲线)。
【多元线性回归与相关
multistage linear regression】
一个应变量与多个自变量的关系
应用:某些因素与某一现象的数量关系(气温,湿度与发病率);进行因素分析(病因中相对重要的因素);进行预测forecast和统计控制statistical
control。
Y=bo+b1X1+b2X2+...bnXn;
b1,b2, partial regression coefficient;
例如,肺活量与身高,体重的关系;
(矩阵法得方程)
步骤:
(1) 确定系数,建立方程。
(2)
检验相关:多元回归方程的线性如若F检验不拒绝H0,应进一步检验Y与每个自变量是否有线性回归关系,还是仅与其中部分自变量有线性关系。如若检验与某自变量无关,则进一步求系数建立新回归方程并作新回归系数的检验。假设检验:
F=[SS-regression/m]/[SS-residual/(n-m-1)
(3) 如若F检验不拒绝H0,应进一步检验Y与每个自变量是否有线性回归关系,还是仅与其中部分自变量有线性关系。偏回归系数的假设检验:
ti=(bi-Bi) / { [SS-residual/(n-m-1)}^1/2} *Cii^1/2; Cii
矩阵C的主对角线上第i行i列的元素。
(4)
如若检验与某自变量无关,则进一步求系数建立新回归方程并作新回归系数的
t 检验。
Multiply Liner Regression:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Syx=(2.5800/27)^1/2=
|
0.095556
|
0.309122
|
|
t=b/Syx/(Lxx)^1/2=0.0597/(0.3091/(857.1179)^1/2=5.6
|
29.27658
|
0.010559
|
5.65296
|
(t value)
|
t-table: P<0.001, a=0.05, refuse Ho, accept H1; Y has the
liner relatinship with X2
|
|
|
The new test for the new liner regression:
|
Ho: B=0; H1, B=/=0, a=0.05;
|
SS-total=
|
5.63362069
|
|
SS-regression=Lxy/Lxx=(51.1595)^2/857.1179=3.0536
|
3.05359692
|
|
SS-resideual=
|
2.580024
|
|
|
|
|
X2 regression:
|
Y=
|
-0.00917
|
+ |
0.0596878
|
X2
|
|
( if X1, regression:
Y=
|
-2.60854
|
+ |
0.031561
|
X1
|
X1: t-value=
|
3.78105715
|
) |
V-residual=n-2=29-2
|
27
|
|
|
|
Liner Regression:
|
Y=-0.0096+0.0597X2
|
<=
|
b=51.1595/857.1179=
|
0.059688
|
|
a=2.2069-(0.0597)(37.1276)=
|
-0.00917
|
|
|
|
|
|
|
|
|
|
|
|
(if X1 tested:
|
Y=-2.60854+0.031561X1
|
Lxx=L11=67706.4-(4424.7^2/29)=
|
1957.953103
|
Lxy=L1y=9826.25-(4424.7)(64/29)=
|
61.79483
|
X-m-X1m
|
152.5758621
|
b=61.79/1957
|
0.031561
|
a=
|
-2.60854
|
)
|
|
|
|
|
Lxx=L22=40832.39-(1076.7^2/29)=857.1179
|
857.11793
|
|
Lxy=L2y=2427.325-(1076.7)(64)/29=51.1595
|
51.15948
|
|
X-m=X2-m
|
37.12758621
|
|
Y-m=
|
2.206897
|
|
|
|
|
|
Renew the formula of the liner regression be excluding the X1:
n=
|
29
|
sum(X2)=
|
1076.7
|
|
sum(X2)^2
|
40832.39
|
|
sum(X2*Y)
|
2427.325
|
|
Y^2=
|
146.875
|
|
|
|
|
|
|
|
|
(if excluding X2: n=
|
|
)
|
sum(X1)=
|
4424.7
|
|
sum(X1)^2
|
677060.4
|
|
sum(X1*Y)
|
9826.65
|
|
Y^2=
|
146.875
|
)
|
|
|
|
|
|
|
t1, P>0.50; t2, 0.005>p>0.002;
a=0.05, refuse t1, accept t2; X2(kg) has the
relationship of regression, X1(cm) has no relationship of
regree\ssion. Then
, excluded the X1 to find the new liner regrssion.
|
X1: SS-regr
|
1.950302
|
|
|
|
|
t1=0.0050/0.3134*0.001137^1/2=0.47
|
0.474357391
|
0.010575
|
0.033716655
|
|
t2=0.0541/0.3134*0.002597^1/2=3.38
|
3.382248819
|
0.01598377
|
0.05096
|
|
|
X1: SS-resideual:
|
|
|
|
Sy.12..m=[SS-residual/(n-m-1)]^1/2=
|
0.1568281
|
|
|
|
|
|
SS-redsidual=
|
2.557887
|
0.098380259
|
0.31365628
|
|
|
|
0.136419
|
3.683318
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
X1^1/2
|
0.36935
|
44.24876
|
|
|
t={(bi-Bi)/[SS-residual/(n-m-1)]^1/2 * (Cii)^1/2]}=bi/Sy.12
m
(Cii)^1/2; Sy.12
m=[SS-residual/(n-m-1)]^1/2
|
|
|
t-testing for b1, b2; Ho:
B1=0, B2=0; H1: B1=/=0, B2=/=0; a=0.05
|
|
|
X1: Syx/Lx
|
0.008347
|
|
|
|
When H1 accept, it need to find the further relationship
between Y & X1 or Y & X2 respectively. Testing the
coefficient; Ho: B1=0, t-testing.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
F=(3.0800/2) / (2.5536/26) =
|
15.63187
|
|
Testing multiply liner regression
|
|
p<0.01, a=0.05; if
refuse Ho, or accept H1; indicatred the liner regression
significant and it is effective. (Ho B1=0;B2=0)
|
|
|
|
|
|
|
|
|
|
|
|
|
(Ho: b1=0; b2=0, b3=0
the whole or all regression
coefficient was assumed zero; it means no relationship of
regression)
|
|
|
|
SS-total=SS-regression+ SS-residual
|
SS-total=sum(Y-Y-m)^2=sum(Y)^2-[sum(Y)]^2/n=Lyy=
|
5.63362069
|
|
|
|
|
144.3171
|
141.2413793
|
3.07573395
|
|
|
|
|
|
|
|
|
|
|
SS-regression=sum(Y-change-Y-m)^2=b'*B-[sum(Y)]^2/n=
(-0.565664 0.005017
0.054061)(64 9836.65 2427.325)-(64)^2/29=
|
|
3.07573395
|
|
2:X1 & X2
|
V-regression=2
|
2
|
|
|
|
|
|
SS-residual=5.63362069-3.0800=
|
|
|
2.557886739
|
|
|
|
|
|
|
|
|
|
V-residual=29-2-1=26
|
26
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Y=-0.565664+0.0050X1+0.0541X2
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
b0
|
-0.56566
|
-0.56566
|
|
a1 b1 c1
|
|
|
|
|
A1 B1
C1
|
a1 b1
|
A1*a1+B1*a2+C1*a3
|
A1*b1+B1*b2+C1*b3
|
|
|
|
|
|
|
b=
|
b1 =A^-1*B
|
0.005017
|
0.005017
|
|
a2 b2 c2
|
C=A^-1=a1b2c3+a2b3c1+a3b1c2-a3b2c1-a2b1c3-a1b3c2
|
A2 B2
C2
|
a2 b2
|
A2*a1+B2*a2+C2*a3
|
A2*b1+B2*b2+C2*b3
|
|
|
|
|
|
|
|
b2
|
0.054061
|
0.054061
|
|
a3 b3 c3
|
0
|
|
|
|
|
x
|
a3 b3
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
29
|
4424.7
|
1076.7
|
|
|
|
64
|
|
|
|
15.63236
|
-0.12611
|
0.098131
|
|
|
|
|
|
|
|
|
|
A=
|
4424.7
|
677060.4
|
165239.8
|
|
|
B=
|
9826.65
|
|
|
C=A^-1=
|
-0.12611
|
0.001137
|
-0.00128
|
|
|
|
|
|
|
|
|
|
(X' * X)
|
1076.7
|
165239.8
|
40832.39
|
|
|
(X' * Y)
|
2427.325
|
|
|
(X' * X)^-1
|
0.098131
|
-0.00128
|
0.002597
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
341801577.34
|
2757383
|
2145643
|
|
#########
|
-2757383.373
|
2145643
|
|
#########
|
-2757383
|
2145643
|
|
9.91E+09
|
-12200594211
|
2310213475
|
|
|
|
|
|
|
|
|
2757383.373
|
24856.42
|
27879.71
|
|
-2757383.4
|
24856.42
|
-27879.7
|
|
-2757383.4
|
24856.42
|
-27879.7
|
|
|
|
|
|
|
|
|
|
|
|
|
2145642.681
|
27879.71
|
56780.64
|
|
2145642.7
|
-27879.71
|
56780.64
|
|
2145642.7
|
-27879.7
|
56780.64
|
|
|
21865007.1
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
sumN^2-(sumN)^2/n
|
5.633621
|
|
|
|
1957.953103
|
|
|
|
857.1179
|
|
|
|
|
|
|
|
|
|
|
|
|
Square( N^2) / n
|
141.2414
|
|
|
|
675102.4169
|
|
|
|
39975.27
|
|
|
|
|
|
|
|
|
|
|
|
|
Average(m)
|
2.206897
|
|
|
|
152.5758621
|
|
|
|
37.12759
|
|
|
|
|
|
|
|
|
|
|
|
|
sum -basic
|
29
|
64
|
146.875
|
8.88E-16
|
|
4424.7
|
677060.4
|
4424.7
|
|
1076.7
|
40832.39
|
1076.7
|
|
|
|
|
|
165239.8
|
|
9826.65
|
|
2427.325
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
X3(L) =Y
|
Y^2
|
Y-Y-m
|
|
X1(cm)
|
X1^2
|
X1-X1-m
|
|
X2(Kg)
|
X2^2
|
X2-X2-m
|
|
|
|
|
|
X1 * X2
|
|
X1*Y
|
|
X2 * Y
|
|
1
|
1.75
|
3.0625
|
-0.4569
|
|
135.1
|
18252.01
|
135.1
|
|
32
|
1024
|
32
|
|
|
|
|
|
4323.2
|
|
236.425
|
|
56
|
|
2
|
2
|
4
|
-0.2069
|
|
139.9
|
19572.01
|
139.9
|
|
30.4
|
924.16
|
30.4
|
|
|
|
|
|
4252.96
|
|
279.8
|
|
60.8
|
|
3
|
2.75
|
7.5625
|
0.543103
|
|
163.6
|
26764.96
|
163.6
|
|
46.2
|
2134.44
|
46.2
|
|
|
|
|
|
7558.32
|
|
449.9
|
|
127.05
|
|
4
|
2.5
|
6.25
|
0.293103
|
|
146.5
|
21462.25
|
146.5
|
|
33.5
|
1122.25
|
33.5
|
|
|
|
|
|
4907.75
|
|
366.25
|
|
83.75
|
|
5
|
2.75
|
7.5625
|
0.543103
|
|
156.2
|
24398.44
|
156.2
|
|
37.1
|
1376.41
|
37.1
|
|
|
|
|
|
5795.02
|
|
429.55
|
|
102.025
|
|
6
|
2
|
4
|
-0.2069
|
|
156.4
|
24460.96
|
156.4
|
|
35.5
|
1260.25
|
35.5
|
|
|
|
|
|
5552.2
|
|
312.8
|
|
71
|
|
7
|
2.75
|
7.5625
|
0.543103
|
|
167.8
|
28156.84
|
167.8
|
|
41.5
|
1722.25
|
41.5
|
|
|
|
|
|
6963.7
|
|
461.45
|
|
114.125
|
|
8
|
1.5
|
2.25
|
-0.7069
|
|
149.7
|
22410.09
|
149.7
|
|
31
|
961
|
31
|
|
|
|
|
|
4640.7
|
|
224.55
|
|
46.5
|
|
9
|
2.5
|
6.25
|
0.293103
|
|
145
|
21025
|
145
|
|
33
|
1089
|
33
|
|
|
|
|
|
4785
|
|
362.5
|
|
82.5
|
|
10
|
2.25
|
5.0625
|
0.043103
|
|
148.5
|
22052.25
|
148.5
|
|
37.2
|
1383.84
|
37.2
|
|
|
|
|
|
5524.2
|
|
334.125
|
|
83.7
|
|
11
|
3
|
9
|
0.793103
|
|
165.5
|
27390.25
|
165.5
|
|
49.5
|
2450.25
|
49.5
|
|
|
|
|
|
8192.25
|
|
496.5
|
|
148.5
|
|
12
|
1.25
|
1.5625
|
-0.9569
|
|
135
|
18225
|
135
|
|
27.6
|
761.76
|
27.6
|
|
|
|
|
|
3726
|
|
168.75
|
|
34.5
|
|
13
|
2.75
|
7.5625
|
0.543103
|
|
153.3
|
23500.89
|
153.3
|
|
41
|
1681
|
41
|
|
|
|
|
|
6285.3
|
|
421.575
|
|
112.75
|
|
14
|
1.75
|
3.0625
|
-0.4569
|
|
152
|
23104
|
152
|
|
32
|
1024
|
32
|
|
|
|
|
|
4864
|
|
266
|
|
56
|
|
15
|
2.25
|
5.0625
|
0.043103
|
|
160.5
|
25760.25
|
160.5
|
|
47.2
|
2227.84
|
47.2
|
|
|
|
|
|
7575.6
|
|
361.125
|
|
106.2
|
|
16
|
1.75
|
3.0625
|
-0.4569
|
|
153
|
23409
|
153
|
|
32
|
1024
|
32
|
|
|
|
|
|
4896
|
|
267.75
|
|
56
|
|
17
|
2
|
4
|
-0.2069
|
|
147.6
|
21785.76
|
147.6
|
|
40.5
|
1640.25
|
40.5
|
|
|
|
|
|
5977.8
|
|
295.2
|
|
81
|
|
18
|
2.25
|
5.0625
|
0.043103
|
|
157.5
|
24806.25
|
157.5
|
|
43.3
|
1874.89
|
43.3
|
|
|
|
|
|
6819.75
|
|
354.375
|
|
97.425
|
|
19
|
2.75
|
7.5625
|
0.543103
|
|
155.1
|
24056.01
|
155.1
|
|
44.7
|
1998.09
|
44.7
|
|
|
|
|
|
6932.97
|
|
426.525
|
|
122.925
|
|
20
|
2
|
4
|
-0.2069
|
|
160.5
|
25760.25
|
160.5
|
|
37.5
|
1406.25
|
37.5
|
|
|
|
|
|
6018.75
|
|
321
|
|
75
|
|
21
|
1.75
|
3.0625
|
-0.4569
|
|
143
|
20449
|
143
|
|
31.5
|
992.25
|
31.5
|
|
|
|
|
|
4504.5
|
|
250.25
|
|
55.125
|
|
22
|
2.25
|
5.0625
|
0.043103
|
|
149.4
|
22320.36
|
149.4
|
|
33.9
|
1149.21
|
33.9
|
|
|
|
|
|
5064.66
|
|
336.15
|
|
76.275
|
|
23
|
2.75
|
7.5625
|
0.543103
|
|
160.8
|
25856.64
|
160.8
|
|
40.4
|
1632.16
|
40.4
|
|
|
|
|
|
6496.32
|
|
442.2
|
|
111.1
|
|
24
|
2.5
|
6.25
|
0.293103
|
|
159
|
25281
|
159
|
|
38.5
|
1482.25
|
38.5
|
|
|
|
|
|
6121.5
|
|
397.5
|
|
96.25
|
|
25
|
2
|
4
|
-0.2069
|
|
158.2
|
25027.24
|
158.2
|
|
37.5
|
1406.25
|
37.5
|
|
|
|
|
|
5932.5
|
|
316.4
|
|
75
|
|
26
|
1.75
|
3.0625
|
-0.4569
|
|
150
|
22500
|
150
|
|
36
|
1296
|
36
|
|
|
|
|
|
5400
|
|
262.5
|
|
63
|
|
27
|
2.25
|
5.0625
|
0.043103
|
|
144.5
|
20880.25
|
144.5
|
|
34.7
|
1204.09
|
34.7
|
|
|
|
|
|
5014.15
|
|
325.125
|
|
78.075
|
|
28
|
2.5
|
6.25
|
0.293103
|
|
154.6
|
23901.16
|
154.6
|
|
39.5
|
1560.25
|
39.5
|
|
|
|
|
|
6106.7
|
|
386.5
|
|
98.75
|
|
29
|
1.75
|
3.0625
|
-0.4569
|
|
156.5
|
24492.25
|
156.5
|
|
32
|
1024
|
32
|
|
|
|
|
|
5008
|
|
273.875
|
|
56
|
多元线性相关
用以说明自变量与应变量的相关程度。
多元线性回归与相关
multistage linear regression
复相关系数(多元相关系数,全相关系数coefficient
of multiply correlation)
R^2 measured or indicated how correlated between
the X1, X2, ...and Y.
R^2=SS-regression / SS-total
=1-SS-residual/SS-total
R = (SS-regression / SS-total)^1/2
=(1-SS-residual/SS-total)^1/2
R between 0--1; 1-indicated most correlation.
多元线性回归与相关
multistage linear regression
复相关系数(多元相关系数,全相关系数coefficient
of multiply correlation)
R^2 measured or indicated how correlated between
the X1, X2, ...and Y.
R^2=SS-regression / SS-total
=1-SS-residual/SS-total
R = (SS-regression / SS-total)^1/2
=(1-SS-residual/SS-total)^1/2
R between 0--1; 1-indicated most correlation.
i.e. R=(3.08/5.63)^1/2=0.74
复相关系数的假设检验
F=[R^2/(1-R^2)]*[(n-m-1)/m)]; n:number of sample;
m: number of X; n-m-1: v of residual;
i.e. Ho: Total coefficient=0, (no regression); H1:
Total coefficient=/=0, (with regression); a=0.05;
F=[(0.74)^2/(1-0.74^2)]*[(29-2-1)/2)]=15.679;
F-table: P<0.01, refuse Ho, accept H1;
F Meaning: with multiply liner regression
偏相关系数(Coefficient
of partial correlation)
问题:简单相关并不能一定(或往往不能)真实反映一个变量与一个因变量之间的关系,只有使其它变量固定,即扣除了其它变量的影响,计算二变量之间的关系才能反映此二变量之间的真实关系。例如,身高(X1,CM)
体重(X2,KG)
都与肺活量(Y,L)有相关,但在多回归分析中(偏相关系数检验),固定体重(X2,KG)变量,则身高(X1,CM)
变量与肺活量(Y,L)不相关,是体重(X2,KG)变量起主要作用,而身高(X1,CM)
变量与肺活量(Y,L)的相关分析中,因为身高(X1,CM)与体重(X2,KG)有关,体重(X2,KG)因素起了相关的作用。控制一个变量的,为一级相关系数;如本例为一级相关系数。
一般可用一个变量(作固定控制)的投影坐标来表示一个要分析的变能,例如,固定变量分别为1,2,3,4,5时,另一个变量X(轴上)与Y(轴上)的图形(椭圆)来表述;图形越圆,表示低度简单相关,高度偏相关;图形越偏,表示高度简单相关,低度偏相关;而偏相关比较能反映真实情况。
偏相关系数计算
(1)Xi与Y的离均差平方和;Lii和Lyy离均差积和;Lij
和 Liy的值;
(2)计算二变量的简单相关;
(3)计算偏相关系数。
1st:
r 12^3=[r12-r13*r23]/[(1-r13^2)(1-r23^2)]^1/2
L11=677060.37-(4424.4)^2/29=1957.9531
L22=857.1179
Lyy=5.6336
L12=165239.8-(4424.7)(1076.7)/29=961.3693
Ly1=9826.65-(64)(4424.7)/29=61.7948
Ly2=2427.325-(64)(1076.7)/29=51.1595
2nd:
r12=L12/(L11*L22)^1/2=961.3693/[(1957.9531)(857.1179)]^1/2=0.7421
ry1=Ly1/(Lyy*L11)^1/2=61.7948/[(5.6336)(1957.9531)]^1/2=0.5884
ry2=Ly2/(Lyy*L22)^1/2=51.1595/[(5.6336)*857.1179)]^1/2=0.7362
3rd:
r12.y=[0.7421-(0.5884)(0.7362)]/[(1-0.5884^2)(1-0.7362^2)]^1/2=0.5645
ry.12=[0.5884-(0.7421)(0.7362)]/[(1-0.7421^2)(1-0.7362^2)]^1/2=0.0927
ry2.1=[0.7362-(0.5884)(0.7421)]/[(1-0.5884^2)(1-0.7421^2)]^1/2=0.5527
偏相关系数的假设检验
t=[r/(1-r^2)^1/2]*[n-m-1]^1/2, v=n-m-1
ry1.2=0.0927 vs ry2.1=0.5527; Ho: 总统相关系数p1y.2=0,
p2y.1=0; H1: =/=0;
t1=[ry1.2/(1-ry1.2^2)^1/2]*[n-m-1]^1/2={0.0927/[(1-0.0927^2)]^1/2}*{(29-2-1)^1/2}=0.475
t2=[ry2.1/(1-ry2.1^2)^1/2]*[n-m-1]^1/2={0.5527/[(1-0.5527^2)]^1/2}*{(29-2-1)^1/2}=3.382
t1: not refuse Ho; when X2(kg) as set, X1(cm) has
no regression with Y;
t2: refuse Ho; when X1(cm) as set, X2(kg) has
regression with Y.
In sum: Liner Mutliply
X1(cm): 0.5884 0.0927
X2(kg) 0.7362 0.5527
【统计表statistical
table与统计图statistical
graph】
统计表:标题,总名称,有时间和地点-注明;横标目,横行数的意义,有单位的要注明(例如%);内容上,主语,被研究的事物,常列于左;谓语,表主语的指标,常列于右;注意:一表一中心,重点突出,简单明了;主谓(位置)分明,层次(标目)清楚;线条不宜过多(尽可能省);标目不宜有斜线;数字阿拉伯,无数字用-,0是0;备注以*于表外下方。
统计图:条图bar
graph指标数值大小;百分条图percent
bar graph指标比重关系;圆图circle
graph指标比重关系及全体所占比重关系;线图line
graph发展变化关系;半对数线图semilogarithmic
line graph发展速度(相对比);直方图histogram连续变量的频数分布;散点图scatter
diagram密集程度和趋势;统计地图statistical
map指标地域分布
【调查设计】
医学研究常是抽样观察 survey;
experiment;
field survey
调查计划:
目的和指标 sensitivity和specificity
不贪多求全;
确定对象和范围(病例/家庭/集体单位);
调查方法:case-control
和cohort
study,普查census/complete
survey;样本
sampling survey;
搜集资料:直接观察和采访(访问,调查会,信访);调查项目item要精选:
分析项目(能整理出指标的内容)和备查项目(保证分析项目的完整及便于核查);调查表questionarie:list
和/或card,调查表编码codesheet便于计算机处理,男1女2;整理分析计划:设计分组classification质量或数量方式分;整理表sorting
table;归组方法
四种抽样方法:
单纯随机抽样simple
random sampling(全体编号随机);
分层抽样stratified
sampling(分层后再随机);
系统抽样systematic
sampling(间隔或机械抽样);
整群抽样cluster
sampling(实际常为地区抽样area
sampling)
其它:二阶段抽样(例如,先地区抽样再居民抽样);
样本含量估计(匀数的抽样方面,率的抽样方面);
非抽样误差控制(常是设计上,执行中或整理过程带来的问题)
【实验设计experimental
design】
四原则:
对照control;
重复replication(样本大小估计;
随机randomization(保证实验其它因素的均衡);
双盲double
blind method。
对照:
空白对照(null);
实验对照(例如,实验组添加了维生素面包/对照组不加维生素面包);
标准对照(用现有标准或正常值作为对照);
自身对照(同一者的用药前后);
相互对照(实验组用新药旧药);
历史对照(以前的研究成果作比较,需要注意可比性)
六设计:
交互设计self-control
design(自身对照中实验因素a与b在实验组A与B再交换成Ba与Ab的方式);
随机设计completely
random design(把实验因素或实验对象完全随机处理,应用广泛);
配对设计paired
design(按一定条件配对,如性别年龄,提高效率);
配伍设计randomized
block design/placebo(配对设计扩大成配组);
拉丁方设计latin
square design(拉丁方表格,比配伍设计更进一步);
正交设计orthogonal
design(正交表,高效多因素;
盲法设计(常为双盲double
blind method,病人和医生都不知道谁在那个组,或安慰剂)。
【准实验研究】
准实验研究是社会科学研究的一种方法。相对于真正的实验研究而言,采用一定的操控程序,利用自然场景,灵活地控制实验对象。包括对照组无前测设计和非对等控制组设计。与真正的实验设计不同之处在于,没有随机分配实验对象到实验组和控制组,严谨性略低,因而所产生的因果结论的效度比真正的实验研究低,但优点在于所要求的条件灵活,在无法控制所有可能影响实验结果的无关变量时,具有广泛的应用性。
准实验研究进行的环境是现实的和自然的,与现实的联系也就密切得多。而实验研究的环境与实际生活中的情况相差很大,完全是一个人工制作的环境,与现实的联系较难。
真实验是能随机分派被试,完全控制无关干扰来源,能系统地操作自变量的实验。从而具有较高的内外在效度。真实验设计都有一个控制组,被试随机选择和随机分配到组。主要有三种表现形式:
1)实验组、控制组前后测设计
2)实验组、控制组后测设计
3)所罗门四组设计
【统计用表】
各种统计用表的使用方法
|