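Prerequisites and Torch Basics

This section summarizes commonly used Torch and Tensor operations. When consulting the torch documentation, it is best to read the English version: the Chinese documentation leaves out some parameters and still needs improvement.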

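Data Operations

In deep learning, a sample usually has multi-dimensional features, so we use a tensor (Tensor) to represent it. For a brief introduction to tensors, see my earlier article on matrix basics. We mainly use the PyTorch framework for the code demonstrations; in Torch, a Tensor supports GPU computation and automatic differentiation.

By default, a Tensor is stored in main memory and computed on the CPU; to compute on the GPU, you have to specify that explicitly.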
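As a minimal sketch of what "specifying the GPU explicitly" can look like, the snippet below picks a device and places tensors on it. The variable names are only illustrative, and the code falls back to the CPU when no CUDA device is present.

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(2, 3)                  # created on the CPU by default
y = x.to(device)                      # moved to `device` (a no-op if it is the CPU)
z = torch.ones(2, 3, device=device)   # created directly on the target device

print(x.device)                       # cpu
print(y.device.type == device.type)   # True
```

Creating the tensor directly on the target device avoids an extra host-to-device copy, which is why the `device=` argument is often preferred over a later `.to(...)`.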

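Below is a summary of common Tensor operations, based on the Torch documentation and the book:

Creating a Tensor

A tensor can be built from a Python list or sequence.

torch.Tensor() builds a tensor, and by default the result has type float32.

Building a tensor from a Python list

As the output shows, torch.Tensor() converts every element to the default type, while torch.tensor() infers the type from the data.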

import torch

num = [1, 2, 3, 4]
tensor = torch.Tensor(num)
print(tensor.dtype)  # torch.float32

tensor = torch.tensor(num)
print(tensor.dtype)  # torch.int64

To convert a tensor of size 1 into a Python scalar, we can call the item function or Python's built-in functions.


a = torch.tensor([3.5])
a, a.item(), float(a), int(a)
# (tensor([3.5000]), 3.5, 3.5, 3)

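Creating from NumPy

torch.from_numpy() returns a tensor that shares memory with the NumPy array, so an in-place change to one is visible in the other; keep this in mind whenever NumPy code still uses the same array. In the example below, torch.Tensor(a) converts the int64 array to float32, which produces a separate copy, so modifying that tensor leaves the array untouched.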

import numpy as np
a = np.array([1,2,3,4])
# from_numpy shares memory, so modifying the tensor also modifies the original numpy array
tensor_a = torch.from_numpy(a)
print(tensor_a, a.dtype, tensor_a.dtype, type(tensor_a))  # tensor([1, 2, 3, 4]) int64 torch.int64 <class 'torch.Tensor'>
tensor_a[0] = -1
print(a)  # [-1  2  3  4]
print(tensor_a)  # tensor([-1,  2,  3,  4])


# torch.Tensor also works; the int64 data is copied into a float32 tensor, so the original numpy array is not modified
tensor_b = torch.Tensor(a)
print(tensor_b, tensor_b.dtype, type(tensor_b))  # tensor([-1.,  2.,  3.,  4.]) torch.float32 <class 'torch.Tensor'>
tensor_b[0] = -2
print(a)  # [-1  2  3  4]
print(tensor_b)  # tensor([-2.,  2.,  3.,  4.])
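The sharing behavior can also be checked from the NumPy side. The sketch below contrasts torch.from_numpy with torch.tensor, which copies its input; the variable names are only illustrative.

```python
import numpy as np
import torch

a = np.array([1, 2, 3, 4])

shared = torch.from_numpy(a)   # shares the buffer of `a`
copied = torch.tensor(a)       # copies the data

a[0] = 100                     # in-place change on the NumPy side

print(shared[0].item())        # 100
print(copied[0].item())        # 1

# Going back: Tensor.numpy() on a CPU tensor shares memory as well.
back = shared.numpy()
print(np.shares_memory(a, back))  # True
```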

Other commonly used Tensor creation functions

torch.eye(3)  # creates a 2-D Tensor with ones on the diagonal and zeros elsewhere
# tensor([[1., 0., 0.],
#         [0., 1., 0.],
#         [0., 0., 1.]])

torch.ones(size=(2,3))  # creates a Tensor of the given size filled with ones
# tensor([[1., 1., 1.],
#         [1., 1., 1.]])

torch.zeros(size=(3,3))  # creates a Tensor of the given size filled with zeros
# tensor([[0., 0., 0.],
#         [0., 0., 0.],
#         [0., 0., 0.]])

torch.rand(size=(2,2))  # creates random numbers drawn from the uniform distribution on [0, 1)
# tensor([[0.4378, 0.0239],
#         [0.0968, 0.1274]])

torch.randn(size=(2,2))  # creates random numbers drawn from the standard normal distribution
# tensor([[-1.0866, -0.1296],
#         [-1.0830,  0.0376]])

torch.randperm(4)  # given n, returns a random permutation of the integers in [0, n)
# tensor([1, 2, 3, 0])

# torch.arange(start, end, step=1, out=None) → Tensor
# Returns a 1-D tensor of length ceil((end - start) / step), containing the values from start (inclusive) up to end (exclusive) in steps of step (default 1).
torch.arange(5)   #tensor([0, 1, 2, 3, 4])
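A quick check of the length rule for a step that does not divide the interval evenly. The values of start, end, and step here are chosen only for illustration.

```python
import math
import torch

start, end, step = 0, 5, 2
t = torch.arange(start, end, step)

print(t)                                  # tensor([0, 2, 4])
print(len(t))                             # 3
print(math.ceil((end - start) / step))    # 3
```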

# torch.range()  # not recommended
# Calling range prints a warning in red: torch.range is deprecated and will be removed in a future release
# because its behavior is inconsistent with Python's range builtin. 
# Instead, use torch.arange, which produces values in [start, end)

# torch.linspace(start, end, steps, out=None) → Tensor
# Returns a 1-D tensor of steps evenly spaced points on [start, end]. The output tensor has length steps.
torch.linspace(3, 10, steps=3)  # tensor([ 3.0000,  6.5000, 10.0000])

# torch.logspace(start, end, steps, base=10.0, out=None) → Tensor
# Returns a 1-D tensor of steps points spaced evenly on a logarithmic scale between 10^start and 10^end. The output tensor has length steps.
torch.logspace(start=1, end=2, steps=5)  # tensor([ 10.0000,  17.7828,  31.6228,  56.2341, 100.0000])
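One way to read logspace is as linspace applied to the exponents. The sketch below compares the two under the default base of 10; the tolerance is an arbitrary choice that allows for float32 rounding.

```python
import torch

exponents = torch.linspace(1, 2, steps=5)     # 1.00, 1.25, 1.50, 1.75, 2.00
by_hand = 10 ** exponents                     # raise the base to each exponent
built_in = torch.logspace(start=1, end=2, steps=5)

print(torch.allclose(by_hand, built_in, rtol=1e-4))   # True
```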

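Indexing, Slicing, Joining, and Transposing Tensors

In practice, we often need to apply such operations to Tensors.

Joining

Concatenates the given sequence of tensors seq along the given dimension.

A quick note on Tensor dimensions: Tensor.shape gives the size of the Tensor, and the number of entries in that size is the number of dimensions. Concatenating along dimension 0 joins the tensors along dimension 0: combining size (2,3) with size (2,3) along dimension 0 gives size (4,3).

Note: when concatenating along dimension dim, all other dimensions must match exactly.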

# torch.cat(tensors, dim=0) → Tensor
x = torch.Tensor([[1,2,3],[2,3,4]])
# tensor([[1., 2., 3.],
#         [2., 3., 4.]])

print(x.shape)  # torch.Size([2, 3])

# concatenate three copies of x along dimension 0
torch.cat((x, x, x), 0)
# tensor([[1., 2., 3.],
#         [2., 3., 4.],
#         [1., 2., 3.],
#         [2., 3., 4.],
#         [1., 2., 3.],
#         [2., 3., 4.]])

# concatenate three copies of x along dimension 1
torch.cat((x, x, x), 1)
# tensor([[1., 2., 3., 1., 2., 3., 1., 2., 3.],
#         [2., 3., 4., 2., 3., 4., 2., 3., 4.]])
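The shape constraint on cat can be seen with two tensors that differ only in dimension 0. This is a small sketch; the tensors a and b are made up for the illustration.

```python
import torch

a = torch.zeros(2, 3)
b = torch.zeros(4, 3)

# Dimension 0 may differ because it is the one being joined.
print(torch.cat((a, b), dim=0).shape)   # torch.Size([6, 3])

# Along dimension 1 the sizes of dimension 0 (2 vs 4) would have to match.
try:
    torch.cat((a, b), dim=1)
    failed = False
except RuntimeError:
    failed = True
print(failed)                           # True
```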

The stack function
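As a minimal sketch of how torch.stack differs from cat: stack joins the tensors along a new dimension, so all inputs need the same shape, while cat joins along an existing one. The tensor x is the same illustrative 2×3 example used for cat.

```python
import torch

x = torch.tensor([[1, 2, 3], [2, 3, 4]])   # shape (2, 3)

s0 = torch.stack((x, x, x), dim=0)  # new leading dimension
s1 = torch.stack((x, x, x), dim=1)  # new dimension in the middle
c0 = torch.cat((x, x, x), dim=0)    # for comparison: no new dimension

print(s0.shape)   # torch.Size([3, 2, 3])
print(s1.shape)   # torch.Size([2, 3, 3])
print(c0.shape)   # torch.Size([6, 3])
```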

Slicing

torch.chunk splits a Tensor into n chunks

torch.split splits a Tensor into pieces of size n

chunk()

torch.chunk(tensor, chunks, dim=0)

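Splits the input tensor into chunks along the given dimension (axis)

tensor (Tensor) – the input tensor to split
chunks (int) – the number of chunks
dim (int) – the dimension along which to split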

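Note:

When using chunk, try to pick a number of chunks that divides the size evenly.

If it does not divide evenly, each chunk gets ceil(size along that dimension / chunks) elements, and chunks are taken until the dimension is used up.

(I did not find an explanation on the official site, and the source it links to is not Python.)

This is what I worked out by trying 6/4 and 5/3. chunks=4 does not necessarily produce 4 tensors.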

a = torch.arange(8).reshape(2,4)
# tensor([[0, 1, 2, 3],
#         [4, 5, 6, 7]])
# split the Tensor into 2 chunks along dimension 0
a1, a2 = torch.chunk(a, 2, dim=0)
print(a1)  # tensor([[0, 1, 2, 3]])
print(a2)  # tensor([[4, 5, 6, 7]])

# split the Tensor into 2 chunks along dimension 1
a1, a2 = torch.chunk(a, 2, dim=1)
print(a1)
# tensor([[0, 1],
#         [4, 5]])
print(a2)
# tensor([[2, 3],
#         [6, 7]])

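# If the size does not divide evenly, each chunk gets ceil(size / chunks) elements (I did not find an explanation on the official site and the source is not Python, so I tried two cases and guessed. Try to pick a chunks value that divides evenly.)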
a = torch.arange(12).reshape(2,6)
torch.chunk(a, 4, dim=1)
# (tensor([[0, 1],
#          [6, 7]]),
#  tensor([[2, 3],
#          [8, 9]]),
#  tensor([[ 4,  5],
#          [10, 11]]))
# The result is only 3 tensors, each with 2 elements along dimension 1
a = torch.arange(10).reshape(2,5)
torch.chunk(a, 3, dim=1)
# (tensor([[0, 1],
#          [5, 6]]),
#  tensor([[2, 3],
#          [7, 8]]),
#  tensor([[4],
#          [9]]))
# The result is 3 tensors, and the last one has only 1 element along dimension 1
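The ceil rule can be written out by hand and compared with what torch.chunk returns. The two helper functions below are hypothetical and exist only for this check. The last line shows torch.tensor_split, which, as far as I know, always returns exactly the requested number of pieces.

```python
import math
import torch

def chunk_sizes(size, chunks):
    """Hypothetical helper: sizes that torch.chunk returns for a 1-D tensor."""
    return [len(piece) for piece in torch.chunk(torch.arange(size), chunks)]

def predicted_sizes(size, chunks):
    """The ceil rule written out by hand."""
    step = math.ceil(size / chunks)
    return [min(step, size - i) for i in range(0, size, step)]

print(chunk_sizes(6, 4), predicted_sizes(6, 4))   # [2, 2, 2] [2, 2, 2]
print(chunk_sizes(5, 3), predicted_sizes(5, 3))   # [2, 2, 1] [2, 2, 1]

# tensor_split keeps the requested count and spreads the remainder instead.
print([len(p) for p in torch.tensor_split(torch.arange(6), 4)])   # [2, 2, 1, 1]
```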

split()

torch.split(tensor, split_size, dim=0)

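Splits the input tensor into chunks of equal shape (when possible). If the size of the tensor along the given dimension is not divisible by split_size, the last chunk is smaller than the others. No surprises here.

Parameters:

tensor (Tensor) – the tensor to split
split_size (int) – the size of a single chunk
dim (int) – the dimension along which to split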

a = torch.arange(12).reshape(2,6)
torch.split(a, 3, dim=1)
# (tensor([[0, 1, 2],
#          [6, 7, 8]]),
#  tensor([[ 3,  4,  5],
#          [ 9, 10, 11]]))
torch.split(a, 4, dim=1)
# (tensor([[0, 1, 2, 3],
#          [6, 7, 8, 9]]),
#  tensor([[ 4,  5],
#          [10, 11]]))
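Besides a single integer, torch.split also accepts a list of section sizes, which is handy when the pieces should not all be the same size. A short sketch on the same 2×6 tensor; the section sizes are arbitrary.

```python
import torch

a = torch.arange(12).reshape(2, 6)

# The section sizes must add up to the size of dimension 1 (1 + 2 + 3 == 6).
parts = torch.split(a, [1, 2, 3], dim=1)

print([tuple(p.shape) for p in parts])   # [(2, 1), (2, 2), (2, 3)]
print(parts[2])
# tensor([[ 3,  4,  5],
#         [ 9, 10, 11]])
```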

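squeeze and unsqueeze

These two functions come up often when training neural networks, because data frequently has to be reshaped to add or drop a batch_size dimension. Neither operation changes the number of elements (they only remove or add dimensions of size 1); only the number of dimensions changes.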

torch.squeeze(input, dim=None, out=None)  # squeeze removes dimensions of size 1
torch.unsqueeze(input, dim, out=None)  # unsqueeze inserts a dimension of size 1 at the given position

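Removes the dimensions of size 1 from the shape of the input tensor and returns the result. If the input has shape (A×1×B×1×C×1×D), the output has shape (A×B×C×D).

When dim is given, the squeeze is applied only to that dimension. For example, if the input has shape (A×1×B), squeeze(input, 0) leaves the tensor unchanged; only squeeze(input, 1) changes the shape to (A×B).

Note: the returned tensor shares memory with the input tensor, so changing the contents of one changes the other.

Parameters:

input (Tensor) – the input tensor
dim (int, optional) – if given, input is squeezed only along this dimension
out (Tensor, optional) – the output tensor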

a = torch.ones((1,2,3,1,1,5))
print(a.shape)  # torch.Size([1, 2, 3, 1, 1, 5])
b = torch.squeeze(a)
print(b.shape)  # torch.Size([2, 3, 5])

a = torch.ones(size=(1,2,3,4))
print(a.size())  # torch.Size([1, 2, 3, 4])
b = torch.unsqueeze(a, dim=2)
print(b.size())  # torch.Size([1, 2, 1, 3, 4])

Here we use the simplest possible Tensor to help build intuition:

a = torch.ones(size=(1,1))
print(a)  # tensor([[1.]])
print(a.size())  # torch.Size([1, 1])
b = torch.squeeze(a,1)
print(b)  # tensor([1.])
print(b.size())  # torch.Size([1])
c = torch.squeeze(b,0)
print(c)  # tensor(1.)
print(c.size())  # torch.Size([])

The Tensor holds a single value: a=[[1]], b=[1], c=1. The value never changes, but each squeeze removes one dimension.
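A typical use is adding a batch dimension to a single sample before feeding it to a model, then dropping it again. The sketch below assumes a made-up image-like tensor of shape (3, 32, 32) and also shows that the results share memory with the input.

```python
import torch

image = torch.zeros(3, 32, 32)          # hypothetical single sample: (channels, height, width)

batch = torch.unsqueeze(image, dim=0)   # add a batch dimension of size 1
print(batch.shape)                      # torch.Size([1, 3, 32, 32])

restored = torch.squeeze(batch, dim=0)  # drop it again
print(restored.shape)                   # torch.Size([3, 32, 32])

# The results are views: writing through `batch` is visible in `image`.
batch[0, 0, 0, 0] = 5.0
print(image[0, 0, 0].item())            # 5.0
```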

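Transpose

torch.t transposes dimensions 0 and 1 of a 2-D tensor and is equivalent to transpose(input, 0, 1). It raises an exception for tensors with more than 2 dimensions (0-D and 1-D tensors are returned as is). You can also use tensor.T.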

torch.transpose(input, dim0, dim1, out=None) → Tensor

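Returns the transpose of the input tensor input, swapping dimensions dim0 and dim1. The output tensor shares memory with the input tensor, so changing one also changes the other.

Parameters:

input (Tensor) – the input tensor
dim0 (int) – the first dimension to swap
dim1 (int) – the second dimension to swap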

a = torch.arange(6).reshape(2,3)
print(a)
# tensor([[0, 1, 2],
#         [3, 4, 5]])
print(torch.t(a))
print(a.T)
# tensor([[0, 3],
#         [1, 4],
#         [2, 5]])
a = torch.arange(6).reshape(1,2,3)
print(a)
# tensor([[[0, 1, 2],
#          [3, 4, 5]]])
print(torch.transpose(a,0,1))
# tensor([[[0, 1, 2]],
#
#         [[3, 4, 5]]])
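The memory sharing mentioned for transpose can be observed directly. This sketch also shows that the transposed view is no longer contiguous and that calling contiguous() gives an independent copy; the variable names are only illustrative.

```python
import torch

a = torch.arange(6).reshape(2, 3)
b = torch.transpose(a, 0, 1)     # shape (3, 2), a view of `a`

b[0, 1] = 100                    # b[0, 1] refers to the same element as a[1, 0]
print(a[1, 0].item())            # 100

print(b.is_contiguous())         # False

c = b.contiguous()               # an independent, contiguous copy
c[0, 0] = -1
print(a[0, 0].item())            # 0
```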