Trying out PyTorch (22)

1. The environment was Windows 10 Home (64-bit).

2. The checks were run in Spyder under Anaconda3 (64-bit).

3. The Python version is Python 3.7.0.

4. The PyTorch version is PyTorch 0.4.1.

5. The GPU is an NVIDIA GeForce GTX 1050.

6. The CPU is an Intel Core(TM) i7-7700HQ.

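This time I checked Section 6.1, Matrix Factorization (pp. 152-159), of 現場で使える! PyTorch開発入門 深層学習モデルの作成とアプリケーションへの実装 (AI & TECHNOLOGY).

*1. For details of the program, please refer to the book (pp. 152-159).
*2. The book does not cover saving the trained model in this chapter, but the run through epoch 5 (six epochs, numbered 0 to 5) took about 1974 seconds, so I personally recommend saving the trained model.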
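
As a rough illustration of the recommendation above, saving and restoring a trained model could look like the sketch below. `TinyNet`, `save_checkpoint`, and `load_checkpoint` are hypothetical names used only for this example (the book's model is not reproduced here); only `torch.save`, `torch.load`, `state_dict`, and `load_state_dict` are actual PyTorch calls.

```python
import os
import tempfile

import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in model, not the book's matrix factorization net."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 1)

    def forward(self, x):
        return self.fc(x)


def save_checkpoint(net, folder, epoch):
    """Save only the parameters (state_dict), one file per epoch."""
    path = os.path.join(folder, "mf_{:03d}.pth".format(epoch))
    torch.save(net.state_dict(), path)
    return path


def load_checkpoint(net, path):
    """Load parameters into an already constructed model of the same shape."""
    # map_location="cpu" lets a checkpoint saved from the GPU load on a CPU-only machine.
    state = torch.load(path, map_location="cpu")
    net.load_state_dict(state)
    net.eval()  # switch to inference mode
    return net


with tempfile.TemporaryDirectory() as folder:
    trained = TinyNet()
    path = save_checkpoint(trained, folder, epoch=5)
    file_name = os.path.basename(path)
    restored = load_checkpoint(TinyNet(), path)
    # Every parameter tensor should match after the round trip.
    same = all(
        torch.equal(a, b)
        for a, b in zip(trained.state_dict().values(),
                        restored.state_dict().values())
    )
```

Saving the `state_dict` rather than the whole model object keeps the file independent of how the class is pickled, at the cost of having to rebuild the model before loading.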

■ Checking the matrix factorization training.

# -*- coding: utf-8 -*-
# 1. library import.
from __future__ import print_function
import torch
from torch import nn, optim
from torch.utils.data import (DataLoader, TensorDataset)
import os, time
import pandas as pd
from sklearn import model_selection
from tqdm import tqdm
from statistics import mean

# 2. get csv data.
# downloaded from the site below.
# MovieLens
# https://grouplens.org/datasets/movielens/
start = time.time()
folder_path = os.path.expanduser('~')
folder_path = folder_path + '\\.spyder-py3\\pytorch\\ml-20m\\'
# X is a pair of userId, movieId.
df = pd.read_csv(folder_path + 'ratings.csv', encoding = 'utf-8')

~ (omitted) ~

for epoch in range(6):
~ (omitted) ~
    test_score = eval_net(net, test_loader, device = "cuda:0")
    print(epoch, mean(loss_log), test_score, flush = True)
    # save the learning result at the end of every epoch.
    # SAVING AND LOADING MODELS
    # https://pytorch.org/tutorials/beginner/saving_loading_models.html
    # -> A common PyTorch convention is 
    # to save models using either a .pt or .pth file extension.
    torch.save(net.state_dict(), folder_path + "mf_{:03d}.pth".format(epoch), pickle_protocol = 4)

# 10. display processing time.
end = time.time()
print('--------------------------------------------------')
print('Elapsed Time: ' + str(end - start) + "[sec]")

■ Execution results (through epoch 5).

100%|██████████| 17579/17579 [05:09<00:00, 39.52it/s]
0 1.5930599859871755 0.732918381690979
100%|██████████| 17579/17579 [05:08<00:00, 57.07it/s]
1 0.878951918882846 0.7076014280319214
100%|██████████| 17579/17579 [05:07<00:00, 57.17it/s]
2 0.8339888583244646 0.6978585124015808
100%|██████████| 17579/17579 [05:07<00:00, 57.24it/s]
3 0.8118861625033018 0.6956022381782532
100%|██████████| 17579/17579 [05:06<00:00, 35.50it/s]
4 0.7995163224732024 0.6941294074058533
100%|██████████| 17579/17579 [05:06<00:00, 39.15it/s]
5 0.792301750381799 0.6923578381538391
--------------------------------------------------
Elapsed Time: 1974.9214341640472[sec]
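
Each result line above prints the epoch number, the mean of `loss_log` (presumably the training loss), and the test score, which this post treats as the MAE (mean absolute error). As a reminder of what that metric measures, here is a minimal pure-Python sketch; `mean_absolute_error` and the sample ratings are made up for illustration and are not the book's `eval_net`.

```python
def mean_absolute_error(targets, predictions):
    """Average of |target - prediction| over all pairs."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    if not targets:
        raise ValueError("at least one rating is required")
    total = sum(abs(t - p) for t, p in zip(targets, predictions))
    return total / len(targets)


# Made-up ratings on the 0.5 to 5.0 MovieLens scale.
true_ratings = [4.0, 3.5, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 2.5]
mae = mean_absolute_error(true_ratings, predicted)
print(mae)
```

Read this way, an MAE of about 0.69 means the predicted rating is off by roughly 0.7 stars on average.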

■ Checking the prediction of a specified user's movie ratings.

# -*- coding: utf-8 -*-
# 1. library import.
from __future__ import print_function
import torch
from torch import nn
from torch.utils.data import (DataLoader, TensorDataset)
import os, time
import pandas as pd
from sklearn import model_selection

# 2. get csv data.
# downloaded from the site below.
# MovieLens
# https://grouplens.org/datasets/movielens/
start = time.time()
folder_path = os.path.expanduser('~')
folder_path = folder_path + '\\.spyder-py3\\pytorch\\ml-20m\\'
# X is a pair of userId, movieId.
df = pd.read_csv(folder_path + 'ratings.csv', encoding = 'utf-8')

~ (omitted) ~

# 8. load model.
net.load_state_dict(torch.load(folder_path + "mf_005.pth"))

# 9. predict evaluation of user's movie.
net.to("cpu")

~ (omitted) ~

# 10. calculate the evaluation prediction value of all movies for 
# a certain user and extract the top five.
# query = torch.stack([
#         torch.zeros(max_item).fill_(1), 
#         torch.arange(1, max_item + 1)], 1).long()
# -> RuntimeError: Expected a Tensor of type torch.FloatTensor 
# but found a type torch.LongTensor for sequence element 1 in sequence 
# argument at position #1 'tensors'
# -> bug fix: set dtype = torch.float32 on both tensors.
query = torch.stack([
        torch.zeros(max_item, dtype = torch.float32).fill_(1), 
        torch.arange(1, max_item + 1, dtype = torch.float32)], 1).long()

# torch.topk
# https://pytorch.org/docs/stable/torch.html#torch.topk
nq = net(query)
scores, indices = torch.topk(nq, 5)
print('- top five ---------------------------------------------')
print('scores:', str(scores))
print('indices:', str(indices))
print()

scores, indices = torch.topk(nq, 5, largest = False)
print('- last five --------------------------------------------')
print('scores:', str(scores))
print('indices:', str(indices))
print()

scores, indices = torch.topk(nq, 100010)
print('- between 25001 and 25010 ------------------------------')
print('scores:', str(scores[25000 : 25010]))
print('indices:', str(indices[25000 : 25010]))
print()

print('- between 50001 and 50010 ------------------------------')
print('scores:', str(scores[50000 : 50010]))
print('indices:', str(indices[50000 : 50010]))
print()

print('- between 75001 and 75010 ------------------------------')
print('scores:', str(scores[75000 : 75010]))
print('indices:', str(indices[75000 : 75010]))
print()

print('- between 100001 and 100010 ----------------------------')
print('scores:', str(scores[100000 : 100010]))
print('indices:', str(indices[100000 : 100010]))
print()

# 11. display processing time.
end = time.time()
print('--------------------------------------------------------')
print('Elapsed Time: ' + str(end - start) + "[sec]")
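
The script above works around the `torch.stack()` error by creating both tensors as float32 and converting the result with `.long()`. An alternative, shown here only as a sketch, is to build both columns as int64 from the start so no conversion is needed. `build_query` and its `user_id` parameter are hypothetical names for this example; the script above hard-codes user 1.

```python
import torch


def build_query(user_id, max_item):
    """Return a (max_item, 2) int64 tensor of (userId, movieId) pairs."""
    # Both columns share the same dtype, so torch.stack() has nothing to reconcile.
    users = torch.full((max_item,), user_id, dtype=torch.int64)
    items = torch.arange(1, max_item + 1, dtype=torch.int64)
    return torch.stack([users, items], 1)


query = build_query(user_id=1, max_item=5)
print(query)
```

Because the embedding lookup needs integer indices anyway, keeping the query in int64 throughout also avoids any rounding concern for very large IDs.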

■ Execution results.

tensor([3.6101], grad_fn=<MulBackward>)

- top five ---------------------------------------------
scores: tensor([5.0000, 5.0000, 5.0000, 5.0000, 5.0000], grad_fn=<TopkBackward>)
indices: tensor([ 96667, 109301, 120890,  55433,  42453])

- last five --------------------------------------------
scores: tensor([4.6535e-06, 3.3581e-05, 4.9357e-05, 6.0969e-05, 6.2392e-05],
       grad_fn=<TopkBackward>)
indices: tensor([ 18571,  96152,  54252,  35498, 109398])

- between 25001 and 25010 ------------------------------
scores: tensor([4.6394, 4.6394, 4.6394, 4.6393, 4.6393, 4.6393, 4.6393, 4.6393, 4.6392,
        4.6392], grad_fn=<SliceBackward>)
indices: tensor([ 93562,  30782,  41455,  25498,  12656, 123988,  26001, 104624,  14696,
         49105])

- between 50001 and 50010 ------------------------------
scores: tensor([3.6825, 3.6824, 3.6824, 3.6824, 3.6823, 3.6823, 3.6822, 3.6821, 3.6820,
        3.6820], grad_fn=<SliceBackward>)
indices: tensor([ 88630,  92275, 123422, 121111,  78579,   3348,  66865,  44663,  56952,
         68030])

- between 75001 and 75010 ------------------------------
scores: tensor([2.2174, 2.2172, 2.2170, 2.2169, 2.2168, 2.2168, 2.2167, 2.2167, 2.2166,
        2.2166], grad_fn=<SliceBackward>)
indices: tensor([83245, 97727, 79370, 13530, 48519, 13638, 52819, 62837, 99327, 98202])

- between 100001 and 100010 ----------------------------
scores: tensor([0.6957, 0.6956, 0.6955, 0.6955, 0.6954, 0.6954, 0.6953, 0.6953, 0.6953,
        0.6953], grad_fn=<SliceBackward>)
indices: tensor([ 74712,  98505,  72104,  14471,  28726,  85159,  35487, 127431,  22152,
          5103])

--------------------------------------------------------
Elapsed Time: 38.12584853172302[sec]
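
The rank ranges above were obtained by calling `torch.topk` with a hard-coded k of 100010 and slicing the result. A sketch of the same idea without the hard-coded k is to sort all scores once in descending order and slice by rank. `rank_slice` and the sample scores are made up for illustration.

```python
import torch


def rank_slice(scores, start, stop):
    """Return (values, indices) for ranks start+1 .. stop, best first."""
    values, indices = torch.sort(scores, descending=True)
    return values[start:stop], indices[start:stop]


# Made-up scores for six items.
scores = torch.tensor([0.5, 4.0, 2.5, 5.0, 1.0, 3.0])
top_values, top_indices = rank_slice(scores, 0, 2)  # ranks 1 and 2
mid_values, mid_indices = rank_slice(scores, 2, 4)  # ranks 3 and 4
print(top_values, top_indices)
print(mid_values, mid_indices)
```

Note that the returned indices are 0-based positions in the score tensor. In the script above the query lists movie IDs starting from 1, so a position `i` would correspond to movie ID `i + 1`.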

■ From the results above, I learned the following.

1. MAE.
By epoch 5, the MAE improved to about 0.69 (*almost the same result as the book).

2. Matrix factorization training.
The run through epoch 5 took about 1974 seconds.

3. About torch.stack().
RuntimeError: Expected a Tensor of type torch.FloatTensor 
but found a type torch.LongTensor for sequence element 1 in sequence argument at position #1 'tensors'
-> To avoid the error above, I specified dtype = torch.float32 (in two places) when running the check.

4. Predicting a specified user's movie ratings.
- It took about 38 seconds.
- The top five are all rated 5.0, but, for example, ranks 50001 to 50010 drop to around 3.68, 
and ranks 100001 to 100010 drop further to around 0.695.

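■ Reference sites
[Reference URL 1] MovieLens: https://grouplens.org/datasets/movielens/
[Reference URL 2] SAVING AND LOADING MODELS: https://pytorch.org/tutorials/beginner/saving_loading_models.html
[Reference URL 3] torch.topk: https://pytorch.org/docs/stable/torch.html#torch.topk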

■ Reference book
現場で使える! PyTorch開発入門深層学習モデルの作成とアプリケーションへの実装 (AI & TECHNOLOGY)
