竞赛圈 > [猜你喜欢]Kuhung分享 | 认识推荐系统,你只需要一场竞赛
我是参加DataCastle推荐系统的kuhung。在截止日期的测评榜中,我的团队——猜你不喜欢,以7.86565的最终成绩,位居第二。接下来我将分享我的比赛心得及才赛代码。
推荐系统如今在互联网上广泛应用。因其能给网站带来附加收益,近些年得到了很多关注,基础算法理论体系较为成熟。这次参加的推荐比赛,数据集规范,可以说是给新手一个很好认识推荐系统的机会。
主要模型:贝叶斯概率矩阵分解(Bayesian Probabilistic Matrix Factorization,BPMF)
融合模型:Keras神经网络(yinjh前辈分享)
BPMF是基于PMF的一套概率矩阵分解模型。对PMF不熟悉的同学,可以参考这篇文章:概率矩阵分解模型 PMF(http://blog.csdn.net/stdcoutzyx/article/details/21347157)
在文中,作者有介绍,PMF可以理解为奇异值分解(SVD)的扩展,而SVD则是一种特殊的PMF。SVD矩阵分解时,要对原始评分矩阵进行补全。PMF则将用户和物品都表示在一个F维的隐式特征空间,假设用户和商品的隐式特征向量以及现有的评分数据的条件概率都服从高斯先验分布。
通过这一方式,在DC的数据集上,PMF的评分为7.8331。在该阶段,位于第五。
BMPF由PMF改进而来,它从贝叶斯角度而不是传统概率角度求解问题。在实验条件下,使用Netflix数据集,准确性较PMF可提高1.7%~3.6%。
关于BPMF的详细介绍,可参考这一博文:推荐系统算法BPMF(http://blog.csdn.net/shenxiaolu1984/article/details/50405659)
更加原汁原味的是多伦多大学的这篇Bayesian Probabilistic Matrix Factorization using Markov Chain Monte
Carlo(https://www.cs.toronto.edu/~amnih/papers/bpmf.pdf)
在使用BPMF的情况下,测评分数为7.85053。在同一数据集,较PMF提升0.22%。
Keras神经网络使用yinjh前辈分享的代码,为了适配本地机型,只做了小幅更改。
本机跑出来的Keras方法效果是7.83312,略微优于PMF。
在本例中,BPMF代码以matlab形式实现。非特别声明,以下代码均在matlab中运行。 代码参照:Bayesian Probabilistic Matrix Factorization(http://www.utstat.toronto.edu/~rsalakhu/BPMF.html)
% train.csv经过处理,删掉时间一列 python可实现
M=csvread('trian.csv',1,1) % 第一行为列名,从第二行开始读取数值
save train M
M(:,1)=M(:,1)+1; % matlab索引从1开始,数据集中有0,加一消除。
M(:,2)=M(:,2)+1;
save test M
N=csvread('test.csv',1,1)
save test N
N(:,1)=N(:,1)+1;
N(:,2)=N(:,2)+1;
save test N
% Code provided by Ruslan Salakhutdinov
%
% Permission is granted for anyone to copy, use, modify, or distribute this
% program and accompanying programs and documents for any purpose, provided
% this copyright notice is retained and prominently displayed, along with
% a note saying that the original programs are available from our
% web page.
% The programs and documents are distributed without any warranty, express or
% implied. As the programs were written for research purposes only, they have
% not been tested to the degree that would be advisable in any important
% application. All use of these programs is entirely at the user's own risk.
rand('state',0);
randn('state',0);
if restart==1
restart=0;
epsilon=50; % Learning rate 学习比率
lambda = 0.01; % Regularization parameter 正则化参数
momentum=0.8;
epoch=1;
maxepoch=50; % 最大阶数
load train % Triplets: {user_id, movie_id, rating} 三个一组
mean_rating = mean( M (:,3)); % 平均评分
pairs_tr = length(M); % training data 训练数据
%%%%%%%%%%% 商品数和用户信息须提前获知,简单python可实现 %%%%%%%%%%%%
numbatches= 210; % Number of batches 批数
num_m = 14726; % Number of movies 商品数
num_p = 223970; % Number of users 用户数
num_feat = 10; % Rank 10 decomposition 10次分解?
w1_M1 = 0.1*randn(num_m, num_feat); % Movie feature vectors 商品特征矩阵
w1_P1 = 0.1*randn(num_p, num_feat); % User feature vecators 用户特征矩阵
w1_M1_inc = zeros(num_m, num_feat); % 初始化
w1_P1_inc = zeros(num_p, num_feat); % 初始化
end
for epoch = epoch:maxepoch % 迭代开始
rr = randperm(pairs_tr); % 随机置换向量
M = M(rr,:); % 训练集随机化
clear rr
for batch = 1:numbatches
fprintf(1,'epoch %d batch %d \r',epoch,batch);
N=157987; % number training triplets per batch 每一批的训练集样本数
aa_p = double(M((batch-1)*N+1:batch*N,1));
aa_m = double(M((batch-1)*N+1:batch*N,2));
rating = double(M((batch-1)*N+1:batch*N,3));
rating = rating-mean_rating; % Default prediction is the mean rating. 默认为平均评分
%%%%%%%%%%%%%% Compute Predictions 计算预测 %%%%%%%%%%%%%%%%%
pred_out = sum(w1_M1(aa_m,:).*w1_P1(aa_p,:),2); % 2代表矩阵相乘后行求和
f = sum( (pred_out - rating).^2 + ...
0.5*lambda*( sum( (w1_M1(aa_m,:).^2 + w1_P1(aa_p,:).^2),2)));
%%%%%%%%%%%%%% Compute Gradients 计算梯度 %%%%%%%%%%%%%%%%%%%
IO = repmat(2*(pred_out - rating),1,num_feat); % repmat 矩阵重复函数
Ix_m=IO.*w1_P1(aa_p,:) + lambda*w1_M1(aa_m,:);
Ix_p=IO.*w1_M1(aa_m,:) + lambda*w1_P1(aa_p,:);
dw1_M1 = zeros(num_m,num_feat);
dw1_P1 = zeros(num_p,num_feat);
for ii=1:N
dw1_M1(aa_m(ii),:) = dw1_M1(aa_m(ii),:) + Ix_m(ii,:);
dw1_P1(aa_p(ii),:) = dw1_P1(aa_p(ii),:) + Ix_p(ii,:);
end
%%%% Update movie and user features 更新商品和用户特征 %%%%%%%%%%%
w1_M1_inc = momentum*w1_M1_inc + epsilon*dw1_M1/N;
w1_M1 = w1_M1 - w1_M1_inc;
w1_P1_inc = momentum*w1_P1_inc + epsilon*dw1_P1/N;
w1_P1 = w1_P1 - w1_P1_inc;
end
%%%%%%%%%%%%%% Compute Predictions after Paramete Updates 字段更新后的再一次预测 %%%%%%%%%%%%%%%%%
pred_out = sum(w1_M1(aa_m,:).*w1_P1(aa_p,:),2);
f_s = sum( (pred_out - rating).^2 + ...
0.5*lambda*( sum( (w1_M1(aa_m,:).^2 + w1_P1(aa_p,:).^2),2)));
err_train(epoch) = sqrt(f_s/N);
%%% Compute predictions on the test set 测试集上的预测 %%%%%%%%%%%%%%%%%%%%%%
load test
pairs_pr = length(test);
NN=pairs_pr;
aa_p = double(test(:,1));
aa_m = double(test(:,2));
%rating = double(test(:,3));
pred_out = sum(w1_M1(aa_m,:).*w1_P1(aa_p,:),2) + mean_rating;
ff = find(pred_out>5); pred_out(ff)=5; % Clip predictions 预测修剪
ff = find(pred_out<1); pred_out(ff)=1;
fprintf(1, 'epoch %4i batch %4i Training RMSE %6.4f \n', ...
epoch, batch, err_train(epoch));
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
if (rem(epoch,10))==0
save pmf_weight w1_M1 w1_P1
end
end
% 结果在工作区的pred_out中,可粘贴复制或命令save pre_pmf.csv pred_out。
在使用bpmf前,先将训练集的平均评分赋给测试集,生成score列。
此外,还有两个关联函数。
makematrix.m
load train
num_m = 14726;
num_p = 223970;
count = sparse(num_p,num_m); %for Netflida data, use sparse matrix instead.
for mm=1:num_m
ff= find(M(:,2)==mm);
fprintf(1, '\n %d / %d \t \n', mm,num_m);
count(M(ff,1),mm) = M(ff,3);
end
% 将结果保存,以备下次使用 save makematrix count
pred.m
function [pred_out] = pred(w1_M1_sample,w1_P1_sample,N,mean_rating);
%%% Make predicitions on the validation data
aa_p = double(N(:,1));
aa_m = double(N(:,2));
rating = double(N(:,3));
pred_out = sum(w1_M1_sample(aa_m,:).*w1_P1_sample(aa_p,:),2) + mean_rating;
ff = find(pred_out>5); pred_out(ff)=5;
ff = find(pred_out<1); pred_out(ff)=1;
bayespmf.m主体
rand('state',0);
randn('state',0);
if restart==1
restart=0;
epoch=1;
maxepoch=50;
iter=0;
num_m = 14726;
num_p = 223970;
num_feat = 10;
% Initialize hierarchical priors 初始化分层先验
beta=2; % observation noise (precision) 观测噪声 (精度)
mu_u = zeros(num_feat,1);
mu_m = zeros(num_feat,1);
alpha_u = eye(num_feat); % 返回单位矩阵
alpha_m = eye(num_feat);
% parameters of Inv-Whishart distribution (see paper for details)
WI_u = eye(num_feat);
b0_u = 2;
df_u = num_feat;
mu0_u = zeros(num_feat,1);
WI_m = eye(num_feat);
b0_m = 2;
df_m = num_feat;
mu0_m = zeros(num_feat,1);
load train
load test
mean_rating = mean(M(:,3));
ratings_test = double(N(:,3));
pairs_tr = length(M);
pairs_pr = length(N);
fprintf(1,'Initializing Bayesian PMF using MAP solution found by PMF \n');
load makematrix
load pmf_weight
%count=count'
err_test = cell(maxepoch,1);
w1_P1_sample = w1_P1;
w1_M1_sample = w1_M1;
clear w1_P1 w1_M1;
% Initialization using MAP solution found by PMF.
%% Do simple fit
mu_u = mean(w1_P1_sample)';
d=num_feat;
alpha_u = inv(cov(w1_P1_sample));
mu_m = mean(w1_M1_sample)';
alpha_m = inv(cov(w1_P1_sample));
%count=count';
probe_rat_all = pred(w1_M1_sample,w1_P1_sample,N,mean_rating);
counter_prob=1;
end
for epoch = epoch:maxepoch
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Sample from movie hyperparams (see paper for details)
n = size(w1_M1_sample,1);
x_bar = mean(w1_M1_sample)';
S_bar = cov(w1_M1_sample);
WI_post = inv(inv(WI_m) + n/1*S_bar + ...
n*b0_m*(mu0_m - x_bar)*(mu0_m - x_bar)'/(1*(b0_m+n)));
WI_post = (WI_post + WI_post')/2;
df_mpost = df_m+n;
alpha_m = wishrnd(WI_post,df_mpost);
mu_temp = (b0_m*mu0_m + n*x_bar)/(b0_m+n);
lam = chol( inv((b0_m+n)*alpha_m) ); lam=lam';
mu_m = lam*randn(num_feat,1)+mu_temp;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Sample from user hyperparams
n = size(w1_P1_sample,1);
x_bar = mean(w1_P1_sample)';
S_bar = cov(w1_P1_sample);
WI_post = inv(inv(WI_u) + n/1*S_bar + ...
n*b0_u*(mu0_u - x_bar)*(mu0_u - x_bar)'/(1*(b0_u+n)));
WI_post = (WI_post + WI_post')/2;
df_mpost = df_u+n;
alpha_u = wishrnd(WI_post,df_mpost);
mu_temp = (b0_u*mu0_u + n*x_bar)/(b0_u+n);
lam = chol( inv((b0_u+n)*alpha_u) ); lam=lam';
mu_u = lam*randn(num_feat,1)+mu_temp;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Start doing Gibbs updates over user and
% movie feature vectors given hyperparams.
for gibbs=1:2
fprintf(1,'\t\t Gibbs sampling %d \r', gibbs);
%%% Infer posterior distribution over all movie feature vectors
count=count';
for mm=1:num_m
%fprintf(1,'movie =%d\r',mm);
ff = find(count(:,mm)>0);
MM = w1_P1_sample(ff,:);
rr = count(ff,mm)-mean_rating;
covar = inv((alpha_m+beta*MM'*MM));
mean_m = covar * (beta*MM'*rr+alpha_m*mu_m);
lam = chol(covar); lam=lam';
w1_M1_sample(mm,:) = lam*randn(num_feat,1)+mean_m;
end
%%% Infer posterior distribution over all user feature vectors
count=count';
for uu=1:num_p
%fprintf(1,'user =%d\r',uu);
ff = find(count(:,uu)>0);
MM = w1_M1_sample(ff,:);
rr = count(ff,uu)-mean_rating;
covar = inv((alpha_u+beta*MM'*MM));
mean_u = covar * (beta*MM'*rr+alpha_u*mu_u);
lam = chol(covar); lam=lam';
w1_P1_sample(uu,:) = lam*randn(num_feat,1)+mean_u;
end
end
probe_rat = pred(w1_M1_sample,w1_P1_sample,N,mean_rating);
probe_rat_all = (counter_prob*probe_rat_all + probe_rat)/(counter_prob+1);
counter_prob=counter_prob+1;
fprintf(1, '\nEpoch %d \t \n', epoch);
end
% 结果在工作区probe_rat_all。粘贴复制或save pre_bpmf.csv probe_rat_all
restart=1;
fprintf(1,'Running Probabilistic Matrix Factorization (PMF) \n');
pmf
restart=1;
fprintf(1,'\nRunning Bayesian PMF\n');
bayespmf
本次比赛,算是另一种尝试。克制住了一上来就copy现有代码的冲动,而是转而寻找另外的模型,不断调试,不断修改,也学到了很多东西。在和yinjh前辈的代码结果取平均值后,取得了不错的成绩。
在看过yes,boy!分享的代码后,发现这里面其实还有很多东西可以学习。学无止境,保持前进。
小运营问我,DC积分榜排第一是种什么样的体验?我总感觉,虚得很。因为,无论是像yinjh这样乐于分享的前辈,还是yes,boy!这样专门搞计算机的,或者是像云泛天音这样偶尔冒个泡的大神,他们都要比我强太多。两次比赛,算是做的初步的尝试,还有很多很多的东西要学,还有很多很多的问题要解决。
最后借用《银河英雄传说》里的一句话结束此次分享:我们的征途,是星辰大海。
在github上获取全部代码:
https://github.com/KuHung/DateCastle/tree/master/Recommender-systems
了解更多竞赛即时信息,关注DC微信公众号