微生信生物 walks you through the 235-page PLS tutorial


2023-09-21 19:27 | Source: compiled from the web

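Partial Least Squares Path Modeling (PLS-PM)

So far the only tutorial available online is this one: http://blog.sciencenet.cn/home.php?do=blog&id=940864&mod=space&uid=2379401. It only provides the R example data, and its explanations are not enough to form a clear understanding of the analysis and apply it in practice. The R tutorial for this analysis runs to 235 pages and covers everything from understanding and building the model, to reading the output, to comparing and testing models. Learning it here therefore has to be split into several parts:

Part 1: Understanding the model and basic usage

Now let us take a closer look at PLS-PM.

The PLS path model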

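The PLS path model was proposed by Wold and Lohmoller, after partial least squares regression, as a method for analyzing linear statistical relationships among several blocks of variables. It is an extension of PLS regression. The main advantages of PLS path modeling are: (1) it can be estimated on small samples; (2) it does not require specific distributional assumptions for the observed variables or the errors, so the problem of an unidentifiable model does not arise; (3) it can estimate fairly complex models with many latent and manifest variables. A PLS path model has two parts: the measurement model (the relationships between manifest and latent variables, also called the outer model) and the structural model (the relationships among latent variables, also called the inner model). Each block of manifest variables Xj corresponds to a latent variable ξj. Xj together with its ξj forms the measurement model, and the relationships among the ξj of different blocks form the structural model. A schematic of the PLS path model is shown in Figure 2.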

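The measurement model usually takes one of two forms:

A is the reflective form: the latent variable is the cause of the manifest variables, and the arrows point to the manifest variables.

B is the formative form: the manifest variables are the cause of the latent variable, and the arrows point to the latent variable.

In most cases we use the latent variable to account for variation in the manifest variables, that is, the reflective form.

Model evaluation: before a PLS path model is evaluated and tested, two conditions must hold: (1) each latent variable is highly and positively correlated with its own manifest variables, with loadings above 0.4; (2) the indicators of each latent block are unidimensional (the unidimensionality check). A principal component analysis is run on each block, and the correlation matrix between manifest and latent variables is computed. Both checks have corresponding output in R.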

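Reliability of the reflective measurement model:

Cronbach's alpha: α ≥ 0.7; read it from the $unidim element of the fitted model

Composite reliability: CR ≥ 0.6

Absolute standardized loadings: ≥ 0.7; indicators below 0.4 should be removed

Validity of the reflective measurement model:

Principal component analysis (largest eigenvalue): only the first eigenvalue should be greater than 1

Average variance extracted (AVE): convergent validity, > 0.5; read it from $inner_summary

Cross-loadings: each manifest variable's loading on its own latent variable should be larger than its loadings on the other latent variables. Extract them with $crossloadings and check, row by row, that the value in the indicator's own block is larger than every other value in that row. The loading measures how well a manifest variable fits its own latent variable, and the cross-loadings measure how well it fits the other latent variables. If a cross-loading is larger than the indicator's own loading, there is a problem.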
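The reliability and unidimensionality criteria above can be sketched in base R. This is a minimal illustration on simulated data, not the plspm implementation: the block `block`, the helper `cronbach_alpha` and the simulated latent score are all assumptions made for the example, and the standardized form of Cronbach's alpha is used.

```r
# Simulated block of 4 indicators driven by one latent score (assumed data)
set.seed(1)
n <- 200
latent <- rnorm(n)
block <- sapply(1:4, function(j) 0.8 * latent + rnorm(n, sd = 0.6))
colnames(block) <- paste0("x", 1:4)

# Standardized Cronbach's alpha:
# alpha = p * rbar / (1 + (p - 1) * rbar), rbar = mean off-diagonal correlation
cronbach_alpha <- function(X) {
  R <- cor(X)
  p <- ncol(X)
  rbar <- mean(R[lower.tri(R)])
  p * rbar / (1 + (p - 1) * rbar)
}

alpha <- cronbach_alpha(block)

# Eigenvalues of the block correlation matrix (largest first)
eig <- eigen(cor(block))$values

print(round(alpha, 3))
print(round(eig, 3))
```

For a block that really is driven by a single latent variable, alpha should clear the 0.7 threshold and only the first eigenvalue should exceed 1.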

Evaluating the structural model:

Coefficient of determination of the endogenous latent variables (variance explained, R2): ≥ 0.60, good; 0.30 to 0.60, moderate; < 0.30, poor. In R these results are in the $inner_summary element. Use $inner_summary[, "R2", drop = FALSE] to extract the R2 values on their own.

For each regression in the structural model we have an R2 that is interpreted similarly as in any multiple regression analysis. R2 indicates the amount of variance in the endogenous latent variable explained by its independent latent variables. The inner model seems to be fine, although we must keep in mind that this is a very simple model. We have an R2 = 0.85 which under the PLS-PM standards can be considered as an outstanding R2. In fact, values for the R-squared can be classified in three categories (please don’t take them as absolute truth):

Low: R2 < 0.30 (although some authors consider R2 < 0.20)

Moderate: 0.30 < R2 < 0.60 (you can also find 0.20 < R2 < 0.50)

High: R2 > 0.60 (alternatively there is also R2 > 0.50)

Path effect size f2: ≥ 0.35, large; around 0.15, medium; < 0.02, very small
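As a rough illustration of how f2 relates to R2, the sketch below uses Cohen's definition, f2 = (R2 with the predictor - R2 without it) / (1 - R2 with the predictor), on simulated scores. The variables `support`, `advising` and `value` are hypothetical stand-ins for latent variable scores, and the labels follow the conventional 0.02 / 0.15 / 0.35 cut-offs.

```r
set.seed(2)
n <- 300
# Hypothetical latent variable scores (in practice, scores from a fitted model)
support  <- rnorm(n)
advising <- 0.5 * support + rnorm(n)
value    <- 0.6 * support + 0.3 * advising + rnorm(n)

r2 <- function(fit) summary(fit)$r.squared

r2_full    <- r2(lm(value ~ support + advising))  # both predictors
r2_reduced <- r2(lm(value ~ advising))            # 'support' left out

# Cohen's f2 for the path support -> value
f2 <- (r2_full - r2_reduced) / (1 - r2_full)

# Label the effect size with the conventional cut-offs
size <- cut(f2, breaks = c(-Inf, 0.02, 0.15, 0.35, Inf),
            labels = c("negligible", "small", "medium", "large"),
            right = FALSE)

print(round(c(R2_full = r2_full, R2_reduced = r2_reduced, f2 = f2), 3))
print(as.character(size))
```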

Hands-on in R:

# library(BiocManager)
# install("plspm")
# install.packages("plsdepot")
library("plspm")
library(plsdepot)

Load the data and run some basic descriptive statistics

education = read.table("./sample-data//education.txt", header = TRUE, row.names = 1)
dim(education)
summary(education[, 1:20])
# The data come from a questionnaire. The last three columns record each
# respondent's gender, income and job. Answers are coded into seven levels
# of agreement.
# Distribution of the values of the first variable
aux_distrib = table(education[, 1]) / nrow(education)
barplot(aux_distrib, border = NA, main = colnames(education)[1])

library(RColorBrewer)
# summarize the first four columns
# questions of Support indicators
sq1 = "Help when not doing well"
sq2 = "I feel underappreciated"
sq3 = "I can find a place where I feel safe"
sq4 = "Concerns about school"
# put questions in one vector
sup_questions = c(sq1, sq2, sq3, sq4)
# setting graphical parameters
op = par(mfrow = c(2, 2), mar = c(2.5, 3.2, 2, 0.8))
# bar-chart for each indicator of Support
for (j in 1:4) {
  distribution = table(education[, j]) / nrow(education)
  barplot(distribution, border = NA, col = brewer.pal(8, "Blues")[2:8],
          axes = FALSE, main = sup_questions[j], cex.main = 1)
  # add vertical axis, and rectangle around figure
  axis(side = 2, las = 2)
  box("figure", col = "gray70")
}
# reset default graphical parameters
par(op)

# check whether the first four indicators are correlated
cor(education[, 1:4])
# run a PCA to see how the first four indicators load
library(plsdepot)
# PCA of Support indicators with nipals
support_pca = nipals(education[, 1:4])
# plot
plot(support_pca, main = "Support indicators (circle of correlations)", cex.main = 1)

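Running the PLS-PM path analysis

First we specify the hypothesized paths. Because we are looking for causal relationships, the links between latent variables must be one-directional, which means the path matrix has to be triangular. The diagonal corresponds to each latent variable itself, and the hypothesized paths are set in the lower triangle. Every path model contains two types of latent variables:

Exogenous: used purely as explanatory variables for the endogenous variables.

Endogenous: can act as an explanatory variable for other endogenous variables, and can also be the outcome explained by other exogenous or endogenous variables.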

# start the path analysis
# rows of path matrix
Support = c(0, 0, 0, 0, 0, 0)
Advising = c(0, 0, 0, 0, 0, 0)
Tutoring = c(0, 0, 0, 0, 0, 0)
Value = c(1, 1, 1, 0, 0, 0)
Satisfaction = c(1, 1, 1, 1, 0, 0)
Loyalty = c(0, 0, 0, 0, 1, 0)
# matrix (by row binding)
edu_path = rbind(Support, Advising, Tutoring, Value, Satisfaction, Loyalty)
colnames(edu_path) = rownames(edu_path)
# plot the inner matrix
innerplot(edu_path, box.size = 0.1)
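The two constraints described above (paths only in the lower triangle, and the exogenous/endogenous split) can be checked directly on the path matrix. The sketch below rebuilds the same matrix so that it runs on its own. The names `is_lower_tri`, `exogenous` and `endogenous` are introduced here for illustration only.

```r
# Same path matrix as in the tutorial: a 1 in row i, column j means j -> i
Support      <- c(0, 0, 0, 0, 0, 0)
Advising     <- c(0, 0, 0, 0, 0, 0)
Tutoring     <- c(0, 0, 0, 0, 0, 0)
Value        <- c(1, 1, 1, 0, 0, 0)
Satisfaction <- c(1, 1, 1, 1, 0, 0)
Loyalty      <- c(0, 0, 0, 0, 1, 0)
edu_path <- rbind(Support, Advising, Tutoring, Value, Satisfaction, Loyalty)
colnames(edu_path) <- rownames(edu_path)

# One-directional model: nothing on or above the diagonal
is_lower_tri <- all(edu_path[upper.tri(edu_path, diag = TRUE)] == 0)

# A row without any 1 has no predictors, so that latent variable is exogenous
exogenous  <- rownames(edu_path)[rowSums(edu_path) == 0]
endogenous <- rownames(edu_path)[rowSums(edu_path) > 0]

print(is_lower_tri)
print(exogenous)
print(endogenous)
```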

Specify the manifest variables for each latent variable and the measurement mode

# outer model
edu_blocks = list(1:4, 5:8, 9:12, 13:16, 17:19, 20:23)
# modes (reflective blocks)
edu_modes = rep("A", 6)

Run the model

# apply plspm
edu_pls1 = plspm(education, edu_path, edu_blocks, modes = edu_modes)
# print edu_pls1
edu_pls1

View the full results of the model

# summary() shows the full PLS-PM results
summary(edu_pls1)

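The manifest variables within a block should point in the same direction. Here there is a clear inconsistency.

# check unidimensionality: values below 0.7 indicate a problem within the block
edu_pls1$unidim
# plot the loadings to take a look
plot(edu_pls1, what = "loadings")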

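To make the manifest variables within each latent variable consistent, we recode two variables and rerun the analysis.

# to keep each block consistent we reverse the scale of two indicators,
# so the recoded variables mean the opposite of the originals
# adding Support 'appreciated'
education$sup.appre = 8 - education$sup.under
# adding 'Loyalty' pleased
education$loy.pleas = 8 - education$loy.asha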
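Why `8 - x` works on a 1 to 7 scale can be seen in a small simulation: it maps 1 to 7, 2 to 6 and so on, which flips the sign of the item's correlation with the rest of the block while leaving its strength unchanged. The items below are simulated and the helper `to_likert` is hypothetical. Nothing here comes from the education data.

```r
set.seed(3)
n <- 150
attitude <- rnorm(n)

# Hypothetical helper: turn a continuous score into a 1-7 Likert answer
to_likert <- function(z) pmin(7, pmax(1, round(4 + 1.5 * z)))

pos_item <- to_likert(attitude + rnorm(n, sd = 0.5))   # positively worded
neg_item <- to_likert(-attitude + rnorm(n, sd = 0.5))  # negatively worded

cor_before <- cor(pos_item, neg_item)

# Reverse the scale: 1 <-> 7, 2 <-> 6, 3 <-> 5, 4 stays 4
neg_reversed <- 8 - neg_item
cor_after <- cor(pos_item, neg_reversed)

print(round(c(before = cor_before, after = cor_after), 3))
```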

# with that fixed, assign the manifest variables to each latent variable
edu_blocks2 = list(c(1, 27, 3, 4), 5:8, 9:12, 13:16, 17:19, c(20, 21, 28, 23))
# apply plspm
edu_pls2 = plspm(education, edu_path, edu_blocks2, modes = edu_modes)
# now the indicators within each block are consistent
plot(edu_pls2, what = "loadings")

# check unidimensionality again
edu_pls2$unidim
# outer model results: a loading above 0.7 is considered acceptable, which
# corresponds to a communality (squared loading) above 0.49, meaning the
# latent variable explains roughly 50% of the variance of the manifest variable
edu_pls2$outer_model
# plot the loading column with ggplot2: it shows how much of each manifest
# variable the model explains, and values above 0.7 are good
library(ggplot2)
# barchart of loadings
ggplot(data = edu_pls2$outer_model,
       aes(x = name, y = loading, fill = block)) +
  geom_bar(stat = "identity", position = "dodge") +
  # threshold line (to peek acceptable loadings above 0.7)
  geom_hline(yintercept = 0.7, color = "gray50") +
  # add title
  ggtitle("Barchart of Loadings") +
  # rotate x-axis names
  theme(axis.text.x = element_text(angle = 90))
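The link between the 0.7 loading threshold, the 0.49 communality threshold and the AVE > 0.5 criterion is simple arithmetic, sketched below. The loadings are made-up numbers for one hypothetical block, not results from the education model. For standardized indicators the communality is the squared loading, and the AVE of a block is the mean communality.

```r
# Hypothetical loadings for one block (illustrative values only)
loadings <- c(ind1 = 0.82, ind2 = 0.75, ind3 = 0.68, ind4 = 0.35)

# Communality = squared loading = share of indicator variance explained
communality <- loadings^2

# AVE of the block = mean communality
ave <- mean(communality)

# Indicators below the 0.7 guideline, and those below the 0.4 removal cut-off
weak <- names(loadings)[abs(loadings) < 0.7]
drop <- names(loadings)[abs(loadings) < 0.4]

# AVE after removing the indicators below 0.4
ave_after <- mean(communality[setdiff(names(loadings), drop)])

print(round(communality, 4))
print(round(c(AVE = ave, AVE_after_drop = ave_after), 4))
```

In this made-up block the AVE is below 0.5 with all four indicators and rises above it once the indicator under 0.4 is removed, which is the reasoning behind the removal rule.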

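# check the manifest variables: in each row, the loading on the indicator's
# own block should be larger than the other values in that row
edu_pls2$crossloadings
# path coefficients
edu_pls2$path_coefs
# loadings of the manifest variables on their latent variables
edu_pls2$outer_model
# R2 values
aa = summary(edu_pls2)
aa$inner_summary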
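Reading a cross-loadings table by eye gets tedious with many indicators, so the row-by-row comparison can be automated. The sketch below works on a small made-up table that is only assumed to resemble the layout of `$crossloadings` (a `name` column, a `block` column, then one column per latent variable). The table `xl` and its values are hypothetical.

```r
# Hypothetical cross-loadings table: one row per indicator
xl <- data.frame(
  name  = c("a1", "a2", "b1", "b2"),
  block = c("A", "A", "B", "B"),
  A     = c(0.85, 0.78, 0.40, 0.81),
  B     = c(0.35, 0.42, 0.88, 0.60)
)
lv_cols <- c("A", "B")

M <- abs(as.matrix(xl[, lv_cols]))

# Position of each indicator's own block in the matrix
own_idx <- cbind(seq_len(nrow(M)), match(xl$block, lv_cols))
own <- M[own_idx]

# Largest loading on any other latent variable
M_other <- M
M_other[own_idx] <- -Inf
max_other <- apply(M_other, 1, max)

# TRUE when the indicator loads highest on its own block
xl$ok <- unname(own > max_other)
print(xl)
```

Here `b2` is flagged because it loads more strongly on block A than on its own block B.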

# significance of the path coefficients
edu_pls2$inner_model
# goodness of fit
edu_pls2$gof
# draw the path diagram
innerplot(edu_pls2)
# extract the effects
edu_pls2$effects

Show how the latent variables contribute to one another

# plot, scaling the arrows by the size of the path coefficients
plot(edu_pls2, arr.pos = 0.35)
Paths = edu_pls2$path_coefs
arrow_lwd = 10 * round(Paths, 2)
plot(edu_pls2, arr.pos = 0.35, arr.lwd = arrow_lwd)
# visualize the effects
# rows of the effects table to keep
good_rows = c(3:5, 7:15)
# direct and indirect effects in matrix format
path_effs = as.matrix(edu_pls2$effects[good_rows, 2:3])
rownames(path_effs) = edu_pls2$effects[good_rows, 1]
# setting margin size
op = par(mar = c(8, 3, 1, 0.5))
# barplots of total effects (direct + indirect)
barplot(t(path_effs), border = NA, col = c("#9E9AC8", "#DADAEB"),
        las = 2, cex.names = 0.8, cex.axis = 0.8,
        legend = c("Direct", "Indirect"),
        args.legend = list(x = "top", ncol = 2, border = NA,
                           bty = "n", title = "Effects"))
# resetting default margins
par(op)
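How direct, indirect and total effects relate can be worked out by hand from a matrix of path coefficients. In a one-directional model, the indirect effect along a chain of paths is the product of its coefficients, so the total effects are B + B^2 + B^3 + ... where B is the lower-triangular coefficient matrix. The sketch below uses a made-up four-variable model. The coefficients are illustrative and this is not the code plspm uses internally.

```r
# Hypothetical path coefficients: B[i, j] is the direct effect of j on i
lvs <- c("Support", "Value", "Satisfaction", "Loyalty")
B <- matrix(0, 4, 4, dimnames = list(lvs, lvs))
B["Value", "Support"]        <- 0.5
B["Satisfaction", "Support"] <- 0.2
B["Satisfaction", "Value"]   <- 0.6
B["Loyalty", "Satisfaction"] <- 0.7

# Total effects: sum of the powers of B (chains of length 1, 2, 3)
total <- matrix(0, 4, 4, dimnames = list(lvs, lvs))
Bk <- B
for (k in seq_len(nrow(B) - 1)) {
  total <- total + Bk
  Bk <- Bk %*% B
}

# Indirect effects are whatever is left after removing the direct paths
indirect <- total - B

print(round(total, 3))
print(round(indirect, 3))
```

For example, Support reaches Satisfaction directly (0.2) and through Value (0.5 * 0.6 = 0.3), giving a total effect of 0.5.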

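Bootstrap validation: after bootstrapping we obtain standard errors and confidence intervals for the various estimates. These include the outer weights and loadings linking manifest and latent variables ($boot$weights and $boot$loadings), the path coefficients ($boot$paths), the R2 ($boot$rsq), and the total effects between latent variables ($boot$total.efs). Each value comes with a standard error and the lower and upper bounds of the 95% interval. Few published papers use this so far.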

What we obtain in foot_val$boot is a list with results for:

the outer weights (foot_val$boot$weights)

the loadings (foot_val$boot$loadings)

the path coefficients (foot_val$boot$paths)

the R2 (foot_val$boot$rsq)

the total effects (foot_val$boot$total.efs)

Each one of these elements is a matrix that contains five columns: the original value of the parameters, the bootstrap mean value, the bootstrap standard error, and the lower percentile and upper percentiles of the 95% bootstrap confidence interval.

# bootstrap validation
foot_val = plspm(education, edu_path, edu_blocks2, modes = edu_modes,
                 boot.val = TRUE, br = 200)
foot_val$boot
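To make the five bootstrap columns concrete, here is a minimal base-R sketch of a percentile bootstrap for a single standardized coefficient. The data are simulated, `path_coef` is a hypothetical stand-in for one path (a simple correlation), and the element names in `boot_row` are chosen only to mirror the five columns described above. It illustrates the idea, not the plspm internals.

```r
set.seed(4)
n <- 120
x <- rnorm(n)
y <- 0.6 * x + rnorm(n, sd = 0.8)

# Standardized slope of y on x: the simplest possible "path coefficient"
path_coef <- function(x, y) cor(x, y)

original <- path_coef(x, y)

# Resample rows with replacement and re-estimate, 200 times (cf. br = 200)
br <- 200
boot_vals <- replicate(br, {
  idx <- sample.int(n, replace = TRUE)
  path_coef(x[idx], y[idx])
})

boot_row <- c(
  Original  = original,
  Mean.Boot = mean(boot_vals),
  Std.Error = sd(boot_vals),
  perc.025  = unname(quantile(boot_vals, 0.025)),
  perc.975  = unname(quantile(boot_vals, 0.975))
)
print(round(boot_row, 3))
```

If the interval between the two percentiles excludes zero, the coefficient is usually read as significant at the 5% level.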

Next, testing the difference between two models. The prerequisite is that the two models are fitted to different data and are identical in every other respect. In that case we can compare them:

### -------- model comparison
library(plspm)
# load data college
data(college)
# what does the data look like
head(college, n = 5)
# path matrix (inner model)
HighSchool = c(0, 0, 0, 0)
Intro = c(1, 0, 0, 0)
Medium = c(1, 1, 0, 0)
Graduation = c(1, 1, 1, 0)
gpa_path = rbind(HighSchool, Intro, Medium, Graduation)
# list of blocks (outer model)
gpa_blocks = list(1:3, 4:7, 8:11, 12)
# vector of reflective modes
gpa_modes = rep("A", 4)
# apply plspm
gpa_pls = plspm(college, gpa_path, gpa_blocks, modes = gpa_modes, boot.val = TRUE)
summary(gpa_pls)
# plot path coefficients
plot(gpa_pls)

Next, fit the model separately for each gender

# select data of female students
female = college[college$Gender == "FEMALE", ]
# female students plspm
female_gpa_pls = plspm(female, gpa_path, gpa_blocks, modes = gpa_modes)
# select data of male students
male = college[college$Gender == "MALE", ]
# male students plspm
male_gpa_pls = plspm(male, gpa_path, gpa_blocks, modes = gpa_modes)
# plot path coefficients
plot(female_gpa_pls, box.size = 0.14)
plot(male_gpa_pls, box.size = 0.14)

We want to test whether the models for male and female students are the same. This is what the plspm.groups function is for.

# apply plspm.groups bootstrap
gpa_boot = plspm.groups(gpa_pls, college$Gender, method = "bootstrap")
# see the results
gpa_boot

# apply plspm.groups permutation
gpa_perm = plspm.groups(gpa_pls, college$Gender, method = "permutation")
# see the results
gpa_perm
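The logic of a permutation comparison between groups can be sketched in base R for a single coefficient: compute the observed difference between the groups, then shuffle the group labels many times to see how large a difference arises by chance. The data, the helper `coef_diff` and the choice of 500 permutations are assumptions for illustration. This is not how plspm.groups is implemented.

```r
set.seed(5)
n <- 100
group <- rep(c("FEMALE", "MALE"), each = n / 2)
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)   # same relationship in both groups by construction

# Absolute difference in the standardized coefficient between the two groups
coef_diff <- function(x, y, g) {
  f <- g == "FEMALE"
  m <- g == "MALE"
  abs(cor(x[f], y[f]) - cor(x[m], y[m]))
}

observed <- coef_diff(x, y, group)

# Null distribution: shuffle the group labels and recompute the difference
n_perm <- 500
perm <- replicate(n_perm, coef_diff(x, y, sample(group)))

# Share of shuffled differences at least as large as the observed one
p_value <- (sum(perm >= observed) + 1) / (n_perm + 1)

print(round(c(observed = observed, p_value = p_value), 3))
```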

# path coefficients between female and male students
barplot(t(as.matrix(gpa_boot$test[, 2:3])), border = NA, beside = TRUE,
        col = c("#FEB24C", "#74A9CF"), las = 2, ylim = c(-0.1, 1),
        cex.names = 0.8, col.axis = "gray30", cex.axis = 0.8)
# add horizontal line
abline(h = 0, col = "gray50")
# add title
title("Path coefficients of Female and Male Students",
      cex.main = 0.95, col.main = "gray30")
# add legend (fill colors match the bars)
legend("top", legend = c("female", "male"), pt.bg = c("#FEB24C", "#74A9CF"),
       ncol = 2, pch = 22, col = c("#FEB24C", "#74A9CF"), bty = "n",
       text.col = "gray40")

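Once this is done, we can see that the same paths differ between the two gender groups. These differences have been tested, although in this study they are not significant. When the approach is applied to our own research, the significance can be marked on the figure to make the result clearer.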
