Implementing LSTM in Different Frameworks and Converting to ONNX

2023-05-16

Contents

1. Generating LSTM with Paddle
- 1.1 time_major=False
- 1.2 time_major=True
- 1.3 sequence_lens
- 1.4 No initial state
- 1.5 Inspecting the generated ONNX models
2. Generating LSTM with PyTorch
- 2.1 batch_first=True
- 2.2 batch_first=False
- 2.3 Inspecting the generated ONNX models
3. Generating LSTM with TensorFlow 2
- 3.1 time_major=False
- 3.2 time_major=True
- 3.3 return_state=True
- 3.4 Inspecting the generated models
4. LSTM conversion tool

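This article implements LSTM in three frameworks (Paddle, PyTorch, and TensorFlow 2) in three forms: single-layer, two-layer, and bidirectional two-layer. The models produced along the way are converted to ONNX, and the structure of each ONNX model is shown. All code and execution results are available at https://gitee.com/tdddeeel/lstm_pytorch_tensorflow_paddle.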

1. Generating LSTM with Paddle

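The whole process consists of defining the model, exporting it, converting it to ONNX, and optimizing the ONNX model. The last ONNX file produced is the one we ultimately need, and its graph can be inspected. This section therefore covers both building the model in Paddle and converting it to ONNX; for more on the overall workflow, please refer to the blog post.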

import os
import sys
import paddle
from paddle import nn
import numpy as np
from onnxsim import simplify
import onnxoptimizer
import onnx
import onnxruntime


1.1 time_major=False

This does the same thing as batch_first=True in PyTorch: the input and output are laid out as [batch, time, feature].
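To make the layout change in the forward method concrete, here is a minimal sketch of the same squeeze and transpose steps. NumPy stands in for Paddle purely for illustration (an assumption of this sketch); the input shape matches the infer_shape = [1,3,1,6] used throughout this article.

```python
import numpy as np

# Input laid out as [batch, channel, height, width], matching infer_shape = [1,3,1,6].
x = np.arange(18, dtype=np.float32).reshape(1, 3, 1, 6)

# Drop the height axis (size 1): [1,3,1,6] -> [1,3,6].
x2 = np.squeeze(x, axis=2)

# Move channels last so each of the 6 time steps carries 3 features:
# [1,3,6] -> [1,6,3], i.e. [batch, time, feature].
x3 = np.transpose(x2, (0, 2, 1))

print(x3.shape)  # (1, 6, 3)
```

With time_major=False the LSTM consumes this [batch, time, feature] tensor directly, so the width axis of the original input plays the role of the time axis.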

class One_LSTM_batch(nn.Layer):
    def __init__(self,in_channels=3,out_channels=4):
        super().__init__()
        self.rnn = nn.LSTM(in_channels,out_channels,time_major=False)
    def forward(self,x):
        # b,c,h,w =x.shape
        # x1 = paddle.reshape(x,[b,c,h*w])
        h0 = paddle.zeros((1,1,4))
        c0 = paddle.zeros((1,1,4))
        x2 = paddle.squeeze(x,2)
        x3 = paddle.transpose(x2,(0,2,1))
        out,_ = self.rnn(x3,(h0,c0))
        return out # shape 1,6,4

model_path = "paddle/One_LSTM_batch"
model = One_LSTM_batch()
model.eval()
infer_shape = [1,3,1,6]
input_spec=paddle.static.InputSpec(shape=infer_shape, dtype="float32")
paddle.onnx.export(model,model_path,input_spec=[input_spec],opset_version=11,enable_onnx_checker=True)

model = onnx.load(model_path+'.onnx')
model_sim ,check = simplify(model)
assert check,"simplified onnx model could not be validated"
save_path = model_path+"_sim.onnx"
onnx.save(model_sim,save_path)

model = onnx.load(save_path)
if model.ir_version<4:
    print("Model with ir_version below 4 requires to include initializer in graph input")
    exit()
inputs = model.graph.input
name_to_input = {}
for input in inputs:
    name_to_input[input.name]=input
for initializer in model.graph.initializer:
    if initializer.name in name_to_input:
        inputs.remove(name_to_input[initializer.name])
passes=["extract_constant_to_initializer","eliminate_unused_initializer"]
optimized_model = onnxoptimizer.optimize(model,passes)
save_path = model_path+"_sim_opt.onnx"
onnx.save(optimized_model,save_path)
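The simplify and optimize steps above are repeated unchanged for every model in this article. As a sketch only, they could be factored into one helper. The function names pipeline_paths and simplify_and_optimize are hypothetical (they do not appear in the original code), and the helper raises an error instead of calling exit(), which is a deliberate deviation from the code above.

```python
def pipeline_paths(model_path):
    """File names produced for one model prefix such as "paddle/One_LSTM_batch"."""
    return {
        "exported": model_path + ".onnx",
        "simplified": model_path + "_sim.onnx",
        "optimized": model_path + "_sim_opt.onnx",
    }


def simplify_and_optimize(model_path):
    """Simplify and optimize an already exported ONNX model.

    Mirrors the steps above; imports are local so the naming helper
    stays usable without the ONNX tooling installed.
    """
    import onnx
    import onnxoptimizer
    from onnxsim import simplify

    paths = pipeline_paths(model_path)

    model_sim, check = simplify(onnx.load(paths["exported"]))
    assert check, "simplified onnx model could not be validated"
    onnx.save(model_sim, paths["simplified"])

    model = onnx.load(paths["simplified"])
    if model.ir_version < 4:
        raise ValueError("ir_version below 4 requires initializers in graph inputs")
    # Drop graph inputs that are really initializers (weights/constants).
    initializer_names = {init.name for init in model.graph.initializer}
    for graph_input in list(model.graph.input):
        if graph_input.name in initializer_names:
            model.graph.input.remove(graph_input)

    passes = ["extract_constant_to_initializer", "eliminate_unused_initializer"]
    onnx.save(onnxoptimizer.optimize(model, passes), paths["optimized"])
    return paths


print(pipeline_paths("paddle/One_LSTM_batch")["optimized"])  # paddle/One_LSTM_batch_sim_opt.onnx
```

After paddle.onnx.export has written the .onnx file, a call such as simplify_and_optimize("paddle/One_LSTM_batch") would then produce the _sim.onnx and _sim_opt.onnx files in one step.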



class Two_LSTM_batch(nn.Layer):
    def __init__(self,in_channels=3,out_channels=4):
        super().__init__()
        self.rnn = nn.LSTM(in_channels,out_channels,time_major=False,num_layers=2)
    def forward(self,x):
        # b,c,h,w =x.shape
        # x1 = paddle.reshape(x,[b,c,h*w])
        h0 = paddle.zeros((2,1,4))
        c0 = paddle.zeros((2,1,4))
        x2 = paddle.squeeze(x,2)
        x3 = paddle.transpose(x2,(0,2,1))
        out,_ = self.rnn(x3,(h0,c0))
        return out # shape 1,6,4

model_path = "paddle/Two_LSTM_batch"
model = Two_LSTM_batch()
model.eval()
infer_shape = [1,3,1,6]
input_spec=paddle.static.InputSpec(shape=infer_shape, dtype="float32")
paddle.onnx.export(model,model_path,input_spec=[input_spec],opset_version=11,enable_onnx_checker=True)

model = onnx.load(model_path+'.onnx')
model_sim ,check = simplify(model)
assert check,"simplified onnx model could not be validated"
save_path = model_path+"_sim.onnx"
onnx.save(model_sim,save_path)

model = onnx.load(save_path)
if model.ir_version<4:
    print("Model with ir_version below 4 requires to include initializer in graph input")
    exit()
inputs = model.graph.input
name_to_input = {}
for input in inputs:
    name_to_input[input.name]=input
for initializer in model.graph.initializer:
    if initializer.name in name_to_input:
        inputs.remove(name_to_input[initializer.name])
passes=["extract_constant_to_initializer","eliminate_unused_initializer"]
optimized_model = onnxoptimizer.optimize(model,passes)
save_path = model_path+"_sim_opt.onnx"
onnx.save(optimized_model,save_path)



class Bi_Two_LSTM_batch(nn.Layer):
    def __init__(self,in_channels=3,out_channels=4):
        super().__init__()
        self.rnn = nn.LSTM(in_channels,out_channels,time_major=False,direction="bidirect",num_layers=2)
    def forward(self,x):
        # b,c,h,w =x.shape
        # x1 = paddle.reshape(x,[b,c,h*w])
        h0 = paddle.zeros((4,1,4))
        c0 = paddle.zeros((4,1,4))
        x2 = paddle.squeeze(x,2)
        x3 = paddle.transpose(x2,(0,2,1))
        out,_ = self.rnn(x3,(h0,c0))
        return out # shape 1,6,8 (forward and backward outputs concatenated)

model_path = "paddle/Bi_Two_LSTM_batch"
model = Bi_Two_LSTM_batch()
model.eval()
infer_shape = [1,3,1,6]
input_spec=paddle.static.InputSpec(shape=infer_shape, dtype="float32")
paddle.onnx.export(model,model_path,input_spec=[input_spec],opset_version=11,enable_onnx_checker=True)

model = onnx.load(model_path+'.onnx')
model_sim ,check = simplify(model)
assert check,"simplified onnx model could not be validated"
save_path = model_path+"_sim.onnx"
onnx.save(model_sim,save_path)

model = onnx.load(save_path)
if model.ir_version<4:
    print("Model with ir_version below 4 requires to include initializer in graph input")
    exit()
inputs = model.graph.input
name_to_input = {}
for input in inputs:
    name_to_input[input.name]=input
for initializer in model.graph.initializer:
    if initializer.name in name_to_input:
        inputs.remove(name_to_input[initializer.name])
passes=["extract_constant_to_initializer","eliminate_unused_initializer"]
optimized_model = onnxoptimizer.optimize(model,passes)
save_path = model_path+"_sim_opt.onnx"
onnx.save(optimized_model,save_path)

2022-08-03 09:39:41 [INFO]	ONNX model generated is valid.
2022-08-03 09:39:41 [INFO]	ONNX model saved in paddle/One_LSTM_batch.onnx
2022-08-03 09:39:41 [INFO]	ONNX model generated is valid.
2022-08-03 09:39:41 [INFO]	ONNX model saved in paddle/Two_LSTM_batch.onnx
2022-08-03 09:39:41 [INFO]	ONNX model generated is valid.
2022-08-03 09:39:41 [INFO]	ONNX model saved in paddle/Bi_Two_LSTM_batch.onnx



1.2 time_major=True

This is the same as batch_first=False in PyTorch: the input and output are laid out as [time, batch, feature].
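The shapes involved are easy to mix up, so here is a small sketch that computes the expected output and state shapes for the configurations used in this section. The helper name lstm_shapes is hypothetical and exists only for illustration; it encodes the usual convention that the states have shape [num_layers * num_directions, batch, hidden_size] and that a bidirectional LSTM concatenates both directions along the feature axis.

```python
def lstm_shapes(batch, seq_len, hidden_size, num_layers=1,
                bidirectional=False, time_major=False):
    """Return (output_shape, state_shape) for an LSTM with the given settings."""
    num_directions = 2 if bidirectional else 1
    feature = num_directions * hidden_size
    if time_major:
        output_shape = (seq_len, batch, feature)
    else:
        output_shape = (batch, seq_len, feature)
    # h and c each have this shape, regardless of time_major.
    state_shape = (num_layers * num_directions, batch, hidden_size)
    return output_shape, state_shape


# The three time_major=True models in this section (batch=1, 6 steps, hidden size 4).
print(lstm_shapes(1, 6, 4, time_major=True))
# ((6, 1, 4), (1, 1, 4))
print(lstm_shapes(1, 6, 4, num_layers=2, time_major=True))
# ((6, 1, 4), (2, 1, 4))
print(lstm_shapes(1, 6, 4, num_layers=2, bidirectional=True, time_major=True))
# ((6, 1, 8), (4, 1, 4))
```

The state shapes match the h0 and c0 tensors created in the models of this section: (1,1,4), (2,1,4), and (4,1,4).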

class One_LSTM_time(nn.Layer):
    def __init__(self,in_channels=3,out_channels=4):
        super().__init__()
        self.rnn = nn.LSTM(in_channels,out_channels,time_major=True)
    def forward(self,x):
        # b,c,h,w =x.shape
        # x1 = paddle.reshape(x,[b,c,h*w])
        h0 = paddle.zeros((1,1,4))
        c0 = paddle.zeros((1,1,4))
        x2 = paddle.squeeze(x,2)
        x3 = paddle.transpose(x2,(2,0,1))
        out,_ = self.rnn(x3,(h0,c0))
        return out # shape 6,1,4 (time-major)

model_path = "paddle/One_LSTM_time"
model = One_LSTM_time()
model.eval()
infer_shape = [1,3,1,6]
input_spec=paddle.static.InputSpec(shape=infer_shape, dtype="float32")
paddle.onnx.export(model,model_path,input_spec=[input_spec],opset_version=11,enable_onnx_checker=True)

model = onnx.load(model_path+'.onnx')
model_sim ,check = simplify(model)
assert check,"simplified onnx model could not be validated"
save_path = model_path+"_sim.onnx"
onnx.save(model_sim,save_path)

model = onnx.load(save_path)
if model.ir_version<4:
    print("Model with ir_version below 4 requires to include initializer in graph input")
    exit()
inputs = model.graph.input
name_to_input = {}
for input in inputs:
    name_to_input[input.name]=input
for initializer in model.graph.initializer:
    if initializer.name in name_to_input:
        inputs.remove(name_to_input[initializer.name])
passes=["extract_constant_to_initializer","eliminate_unused_initializer"]
optimized_model = onnxoptimizer.optimize(model,passes)
save_path = model_path+"_sim_opt.onnx"
onnx.save(optimized_model,save_path)


class Two_LSTM_time(nn.Layer):
    def __init__(self,in_channels=3,out_channels=4):
        super().__init__()
        self.rnn = nn.LSTM(in_channels,out_channels,time_major=True,num_layers=2)
    def forward(self,x):
        # b,c,h,w =x.shape
        # x1 = paddle.reshape(x,[b,c,h*w])
        h0 = paddle.zeros((2,1,4))
        c0 = paddle.zeros((2,1,4))
        x2 = paddle.squeeze(x,2)
        x3 = paddle.transpose(x2,(2,0,1))
        out,_ = self.rnn(x3,(h0,c0))
        return out # shape 6,1,4 (time-major)

model_path = "paddle/Two_LSTM_time"
model = Two_LSTM_time()
model.eval()
infer_shape = [1,3,1,6]
input_spec=paddle.static.InputSpec(shape=infer_shape, dtype="float32")
paddle.onnx.export(model,model_path,input_spec=[input_spec],opset_version=11,enable_onnx_checker=True)

model = onnx.load(model_path+'.onnx')
model_sim ,check = simplify(model)
assert check,"simplified onnx model could not be validated"
save_path = model_path+"_sim.onnx"
onnx.save(model_sim,save_path)

model = onnx.load(save_path)
if model.ir_version<4:
    print("Model with ir_version below 4 requires to include initializer in graph input")
    exit()
inputs = model.graph.input
name_to_input = {}
for input in inputs:
    name_to_input[input.name]=input
for initializer in model.graph.initializer:
    if initializer.name in name_to_input:
        inputs.remove(name_to_input[initializer.name])
passes=["extract_constant_to_initializer","eliminate_unused_initializer"]
optimized_model = onnxoptimizer.optimize(model,passes)
save_path = model_path+"_sim_opt.onnx"
onnx.save(optimized_model,save_path)




class Bi_Two_LSTM_time(nn.Layer):
    def __init__(self,in_channels=3,out_channels=4):
        super().__init__()
        self.rnn = nn.LSTM(in_channels,out_channels,time_major=True,direction="bidirect",num_layers=2)
    def forward(self,x):
        # b,c,h,w =x.shape
        # x1 = paddle.reshape(x,[b,c,h*w])
        h0 = paddle.zeros((4,1,4))
        c0 = paddle.zeros((4,1,4))
        x2 = paddle.squeeze(x,2)
        x3 = paddle.transpose(x2,(2,0,1))
        out,_ = self.rnn(x3,(h0,c0))
        return out # shape 6,1,8 (time-major, forward and backward outputs concatenated)

model_path = "paddle/Bi_Two_LSTM_time"
model = Bi_Two_LSTM_time()
model.eval()
infer_shape = [1,3,1,6]
input_spec=paddle.static.InputSpec(shape=infer_shape, dtype="float32")
paddle.onnx.export(model,model_path,input_spec=[input_spec],opset_version=11,enable_onnx_checker=True)

model = onnx.load(model_path+'.onnx')
model_sim ,check = simplify(model)
assert check,"simplified onnx model could not be validated"
save_path = model_path+"_sim.onnx"
onnx.save(model_sim,save_path)

model = onnx.load(save_path)
if model.ir_version<4:
    print("Model with ir_version below 4 requires to include initializer in graph input")
    exit()
inputs = model.graph.input
name_to_input = {}
for input in inputs:
    name_to_input[input.name]=input
for initializer in model.graph.initializer:
    if initializer.name in name_to_input:
        inputs.remove(name_to_input[initializer.name])
passes=["extract_constant_to_initializer","eliminate_unused_initializer"]
optimized_model = onnxoptimizer.optimize(model,passes)
save_path = model_path+"_sim_opt.onnx"
onnx.save(optimized_model,save_path)

2022-08-03 09:40:26 [INFO]	ONNX model generated is valid.
2022-08-03 09:40:26 [INFO]	ONNX model saved in paddle/One_LSTM_time.onnx
2022-08-03 09:40:26 [INFO]	ONNX model generated is valid.
2022-08-03 09:40:26 [INFO]	ONNX model saved in paddle/Two_LSTM_time.onnx
2022-08-03 09:40:26 [INFO]	ONNX model generated is valid.
2022-08-03 09:40:26 [INFO]	ONNX model saved in paddle/Bi_Two_LSTM_time.onnx

1.3 sequence_lens

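Paddle's LSTM takes one more argument, sequence_lens (passed as sequence_length in the call), which PyTorch's LSTM does not have. sequence_length gives the valid length of each sequence in the batch: time steps not less than sequence_length are cut off and treated as padding. Here is a small experiment using only a single-layer LSTM with time_major=True.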
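As a rough illustration of what "time steps not less than sequence_length are padding" means, the following NumPy sketch builds the corresponding validity mask. The helper valid_step_mask is hypothetical and is not part of Paddle; it only shows the rule, not how Paddle applies it internally.

```python
import numpy as np


def valid_step_mask(sequence_length, max_steps):
    """mask[t, b] is True when step t of batch element b is real data."""
    lengths = np.asarray(sequence_length)
    steps = np.arange(max_steps)[:, None]   # shape [time, 1]
    return steps < lengths[None, :]         # shape [time, batch]


# The experiment in this section: one sequence of 6 steps with sequence_length = [6].
mask_full = valid_step_mask([6], max_steps=6)
print(mask_full.all())  # True: no step is treated as padding

# A shorter length would mark the trailing steps as padding.
mask_short = valid_step_mask([4], max_steps=6)
print(mask_short[:, 0].tolist())  # [True, True, True, True, False, False]
```

Note that with sequence_length equal to the full number of time steps, as in the experiment here, every step is valid, so the result should not differ from a call without sequence_length.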

class Seq_One_LSTM_time(nn.Layer):
    def __init__(self,in_channels=3,out_channels=4):
        super().__init__()
        self.rnn = nn.LSTM(in_channels,out_channels,time_major=True)
    def forward(self,x):
        # b,c,h,w =x.shape
        # x1 = paddle.reshape(x,[b,c,h*w])
        h0 = paddle.zeros((1,1,4))
        c0 = paddle.zeros((1,1,4))
        sequence_lens = paddle.to_tensor([6]) # one length per batch element, shape [batch]
        x2 = paddle.squeeze(x,2)
        x3 = paddle.transpose(x2,(2,0,1))
        out,_ = self.rnn(inputs=x3,initial_states=(h0,c0),sequence_length=sequence_lens)
        return out # shape 6,1,4 (time-major)

model_path = "paddle/Seq_One_LSTM_time"
model = Seq_One_LSTM_time()
model.eval()
infer_shape = [1,3,1,6]
input_spec=paddle.static.InputSpec(shape=infer_shape, dtype="float32")
paddle.onnx.export(model,model_path,input_spec=[input_spec],opset_version=11,enable_onnx_checker=True)

model = onnx.load(model_path+'.onnx')
model_sim ,check = simplify(model)
assert check,"simplified onnx model could not be validated"
save_path = model_path+"_sim.onnx"
onnx.save(model_sim,save_path)

model = onnx.load(save_path)
if model.ir_version<4:
    print("Model with ir_version below 4 requires to include initializer in graph input")
    exit()
inputs = model.graph.input
name_to_input = {}
for input in inputs:
    name_to_input[input.name]=input
for initializer in model.graph.initializer:
    if initializer.name in name_to_input:
        inputs.remove(name_to_input[initializer.name])
passes=["extract_constant_to_initializer","eliminate_unused_initializer"]
optimized_model = onnxoptimizer.optimize(model,passes)
save_path = model_path+"_sim_opt.onnx"
onnx.save(optimized_model,save_path)

W0803 15:08:57.771442 24847 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.4, Runtime API Version: 11.2
W0803 15:08:57.774729 24847 device_context.cc:465] device: 0, cuDNN Version: 8.1.


2022-08-03 15:09:00 [INFO]	ONNX model generated is valid.
2022-08-03 15:09:00 [INFO]	ONNX model saved in paddle/Seq_One_LSTM_time.onnx



1.4 No initial state

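"No initial state" means that we do not pass the initial states h0 and c0 by hand when calling the LSTM; internally they are set to all zeros automatically. PyTorch works the same way, but the resulting ONNX graphs differ: the graph PyTorch produces when no initial state is passed is the same as the one Paddle produces when the states are passed manually. More on this later; comparing all the graphs side by side makes the differences clear.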
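The claim that omitting the initial states is numerically the same as passing zeros can be checked with a small reference LSTM. This is a minimal NumPy sketch, not Paddle's implementation: the function lstm_forward, the random weights, and the i, f, g, o gate order are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(x, w_ih, w_hh, bias, h0=None, c0=None):
    """Single-layer, unidirectional LSTM over x of shape [time, batch, input].

    Gate order along the first weight axis is assumed to be i, f, g, o.
    """
    hidden = w_hh.shape[1]
    batch = x.shape[1]
    # Omitted initial states default to zeros.
    h = np.zeros((batch, hidden), dtype=x.dtype) if h0 is None else h0
    c = np.zeros((batch, hidden), dtype=x.dtype) if c0 is None else c0
    outputs = []
    for x_t in x:
        gates = x_t @ w_ih.T + h @ w_hh.T + bias   # [batch, 4*hidden]
        i, f, g, o = np.split(gates, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)                        # [time, batch, hidden]


rng = np.random.default_rng(0)
x = rng.standard_normal((6, 1, 3)).astype(np.float32)     # [time, batch, input]
w_ih = rng.standard_normal((16, 3)).astype(np.float32)    # [4*hidden, input]
w_hh = rng.standard_normal((16, 4)).astype(np.float32)    # [4*hidden, hidden]
bias = rng.standard_normal(16).astype(np.float32)

out_default = lstm_forward(x, w_ih, w_hh, bias)
zeros = np.zeros((1, 4), dtype=np.float32)
out_explicit = lstm_forward(x, w_ih, w_hh, bias, h0=zeros, c0=zeros)
print(out_default.shape, np.array_equal(out_default, out_explicit))  # (6, 1, 4) True
```

So any difference between the exported graphs comes from how each exporter represents the zero states, not from the numerical result.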

class Ini_One_LSTM_time(nn.Layer):
    def __init__(self,in_channels=3,out_channels=4):
        super().__init__()
        self.rnn = nn.LSTM(in_channels,out_channels,time_major=True)
    def forward(self,x):
        # b,c,h,w =x.shape
        # x1 = paddle.reshape(x,[b,c,h*w])
        x2 = paddle.squeeze(x,2)
        x3 = paddle.transpose(x2,(2,0,1))
        out,_ = self.rnn(x3)
        return out # shape 6,1,4 (time-major)

model_path = "paddle/Ini_One_LSTM_time"
model = Ini_One_LSTM_time()
model.eval()
infer_shape = [1,3,1,6]
input_spec=paddle.static.InputSpec(shape=infer_shape, dtype="float32")
paddle.onnx.export(model,model_path,input_spec=[input_spec],opset_version=11,enable_onnx_checker=True)

model = onnx.load(model_path+'.onnx')
model_sim ,check = simplify(model)
assert check,"simplified onnx model could not be validated"
save_path = model_path+"_sim.onnx"
onnx.save(model_sim,save_path)

model = onnx.load(save_path)
if model.ir_version<4:
    print("Model with ir_version below 4 requires to include initializer in graph input")
    exit()
inputs = model.graph.input
name_to_input = {}
for input in inputs:
    name_to_input[input.name]=input
for initializer in model.graph.initializer:
    if initializer.name in name_to_input:
        inputs.remove(name_to_input[initializer.name])
passes=["extract_constant_to_initializer","eliminate_unused_initializer"]
optimized_model = onnxoptimizer.optimize(model,passes)
save_path = model_path+"_sim_opt.onnx"
onnx.save(optimized_model,save_path)

2022-08-03 15:06:50 [INFO]	ONNX model generated is valid.
2022-08-03 15:06:50 [INFO]	ONNX model saved in paddle/Ini_One_LSTM_time.onnx

1.5 Inspecting the generated ONNX models

paddle_onnx = sorted(os.listdir('paddle'))
paddle_onnx_paths = sorted([os.path.join('paddle',path) for path in paddle_onnx])
print(paddle_onnx)

['Bi_Two_LSTM_batch.onnx', 'Bi_Two_LSTM_batch_sim.onnx', 'Bi_Two_LSTM_batch_sim_opt.onnx', 'Bi_Two_LSTM_time.onnx', 'Bi_Two_LSTM_time_sim.onnx', 'Bi_Two_LSTM_time_sim_opt.onnx', 'Ini_One_LSTM_time.onnx', 'Ini_One_LSTM_time_sim.onnx', 'Ini_One_LSTM_time_sim_opt.onnx', 'One_LSTM_batch.onnx', 'One_LSTM_batch_sim.onnx', 'One_LSTM_batch_sim_opt.onnx', 'One_LSTM_time.onnx', 'One_LSTM_time_sim.onnx', 'One_LSTM_time_sim_opt.onnx', 'Seq_One_LSTM_time.onnx', 'Seq_One_LSTM_time_sim.onnx', 'Seq_One_LSTM_time_sim_opt.onnx', 'Two_LSTM_batch.onnx', 'Two_LSTM_batch_sim.onnx', 'Two_LSTM_batch_sim_opt.onnx', 'Two_LSTM_time.onnx', 'Two_LSTM_time_sim.onnx', 'Two_LSTM_time_sim_opt.onnx']

# Check the size of each model
! du -sh paddle/*

16K	paddle/Bi_Two_LSTM_batch.onnx
8.0K	paddle/Bi_Two_LSTM_batch_sim.onnx
8.0K	paddle/Bi_Two_LSTM_batch_sim_opt.onnx
16K	paddle/Bi_Two_LSTM_time.onnx
8.0K	paddle/Bi_Two_LSTM_time_sim.onnx
8.0K	paddle/Bi_Two_LSTM_time_sim_opt.onnx
8.0K	paddle/Ini_One_LSTM_time.onnx
4.0K	paddle/Ini_One_LSTM_time_sim.onnx
4.0K	paddle/Ini_One_LSTM_time_sim_opt.onnx
8.0K	paddle/One_LSTM_batch.onnx
4.0K	paddle/One_LSTM_batch_sim.onnx
4.0K	paddle/One_LSTM_batch_sim_opt.onnx
8.0K	paddle/One_LSTM_time.onnx
4.0K	paddle/One_LSTM_time_sim.onnx
4.0K	paddle/One_LSTM_time_sim_opt.onnx
8.0K	paddle/Seq_One_LSTM_time.onnx
4.0K	paddle/Seq_One_LSTM_time_sim.onnx
4.0K	paddle/Seq_One_LSTM_time_sim_opt.onnx
12K	paddle/Two_LSTM_batch.onnx
4.0K	paddle/Two_LSTM_batch_sim.onnx
4.0K	paddle/Two_LSTM_batch_sim_opt.onnx
12K	paddle/Two_LSTM_time.onnx
4.0K	paddle/Two_LSTM_time_sim.onnx
4.0K	paddle/Two_LSTM_time_sim_opt.onnx

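Load the ONNX models, run inference, and compare the results pairwise within each group of three (the exported, simplified, and optimized versions of the same model).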

def onnx_infer(model_path,data):
    """Run one ONNX model with onnxruntime and return its first output.

    Args:
        model_path (str): path to the .onnx file.
        data (np.ndarray): input array fed to the model's first input.
    """
    onnx_session=onnxruntime.InferenceSession(model_path)
    input_name = onnx_session.get_inputs()[0].name
    output_name = onnx_session.get_outputs()[0].name
    result = onnx_session.run([output_name],{input_name:data})
    return result[0]

test_data = np.random.random((1,3,1,6)).astype(np.float32) # batch,channel,height,width
results={}

for i,onnx_path in enumerate(paddle_onnx_paths):

    result = onnx_infer(onnx_path,test_data)
    results[os.path.basename(onnx_path)]=result

    if i%3 ==2:
        try:
            values = list(results.values())
            np.testing.assert_allclose(values[0],values[1],rtol=1e-5)
            np.testing.assert_allclose(values[2],values[1],rtol=1e-5)
            print(f"{list(results.keys())} have same results")
        except AssertionError:
            print(f"{list(results.keys())} have different results")
        finally:
            results={}
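The loop above relies on the sorted file list placing the three variants of each model next to each other, so that every third file closes a group. A slightly more explicit alternative, sketched here with hypothetical helper names (model_stem and group_by_stem are not part of the original code), groups the files by model name first.

```python
from collections import defaultdict

# Longest suffix first, so "_sim_opt.onnx" is not mistaken for ".onnx".
SUFFIXES = ("_sim_opt.onnx", "_sim.onnx", ".onnx")


def model_stem(filename):
    """Strip the pipeline suffix, e.g. "One_LSTM_time_sim.onnx" -> "One_LSTM_time"."""
    for suffix in SUFFIXES:
        if filename.endswith(suffix):
            return filename[: -len(suffix)]
    raise ValueError(f"not an ONNX file name: {filename}")


def group_by_stem(filenames):
    """Map each model name to its exported, simplified, and optimized files."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        groups[model_stem(name)].append(name)
    return dict(groups)


names = ["One_LSTM_time_sim.onnx", "One_LSTM_batch.onnx", "One_LSTM_time.onnx",
         "One_LSTM_batch_sim_opt.onnx", "One_LSTM_time_sim_opt.onnx", "One_LSTM_batch_sim.onnx"]
groups = group_by_stem(names)
print(sorted(groups))  # ['One_LSTM_batch', 'One_LSTM_time']
print(groups["One_LSTM_batch"])
```

Each group can then be passed to onnx_infer file by file and compared with np.testing.assert_allclose, without depending on the position of a file in the directory listing.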

2022-08-03 17:03:35.322915835 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_4 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.482218895 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Constant_4 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.482224351 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_1 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.482229628 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_6 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.482235089 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_7 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.482240251 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_10 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.482245348 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Constant_44 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.482250361 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Constant_45 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.482255449 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Constant_46 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.482267806 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Constant_47 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.482273280 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Constant_48 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.482278188 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Constant_49 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.482283068 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Constant_50 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.544810887 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_4 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.544836135 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_5 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.544842815 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_8 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.544848451 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Slice_12 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.544853781 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Slice_13 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.544859396 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Constant_44 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.


['Bi_Two_LSTM_batch.onnx', 'Bi_Two_LSTM_batch_sim.onnx', 'Bi_Two_LSTM_batch_sim_opt.onnx'] have same results
['Bi_Two_LSTM_time.onnx', 'Bi_Two_LSTM_time_sim.onnx', 'Bi_Two_LSTM_time_sim_opt.onnx'] have same results
['Ini_One_LSTM_time.onnx', 'Ini_One_LSTM_time_sim.onnx', 'Ini_One_LSTM_time_sim_opt.onnx'] have same results
['One_LSTM_batch.onnx', 'One_LSTM_batch_sim.onnx', 'One_LSTM_batch_sim_opt.onnx'] have same results
['One_LSTM_time.onnx', 'One_LSTM_time_sim.onnx', 'One_LSTM_time_sim_opt.onnx'] have same results
['Seq_One_LSTM_time.onnx', 'Seq_One_LSTM_time_sim.onnx', 'Seq_One_LSTM_time_sim_opt.onnx'] have same results
['Two_LSTM_batch.onnx', 'Two_LSTM_batch_sim.onnx', 'Two_LSTM_batch_sim_opt.onnx'] have same results
['Two_LSTM_time.onnx', 'Two_LSTM_time_sim.onnx', 'Two_LSTM_time_sim_opt.onnx'] have same results


2022-08-03 17:03:35.627142152 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_4 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.627166154 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_5 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.627172672 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_8 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.627178399 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Slice_12 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.627184004 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Slice_13 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.627189596 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Constant_44 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.680342507 [W:onnxruntime:, graph.cc:3559 CleanUnusedInitializersAndNodeArgs] Removing initializer 'assign_0.tmp_0'. It is not used by any node and should be removed from the model.
2022-08-03 17:03:35.684640360 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_4 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.684662133 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_5 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.684672193 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_8 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.684680922 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Slice_12 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.684688933 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Slice_13 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.684697379 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Constant_44 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.710238396 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_4 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.710278276 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_5 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.710292209 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_8 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.710303902 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Slice_12 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.710315317 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Slice_13 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.710326920 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Constant_44 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.710337914 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_13 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.710348837 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_14 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.710359584 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_17 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.710370872 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Slice_26 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.710383823 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Slice_27 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.710395467 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Constant_87 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.810340323 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_4 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.810366323 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_5 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.810374143 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_8 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.810380074 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Slice_12 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.810385473 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Slice_13 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.810391336 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Constant_44 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.810396558 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_13 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.810401858 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_14 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.810407006 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Concat_17 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.810412186 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Slice_26 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.810417317 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Slice_27 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2022-08-03 17:03:35.810424180 [W:onnxruntime:, graph.cc:1271 Graph] Initializer Constant_87 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.

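The ONNX inference runs above show that, within a tolerance of 1e-5 (one part in a hundred thousand, i.e. agreement to about five digits), the results of each model group are identical. The many warnings come from the models with the `_sim` suffix; the original models and the models ending in `_opt` do not trigger them.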
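The "have same results" check can be sketched as follows. This is a minimal illustration and not the script used for the runs above: the helper name `have_same_results` is hypothetical, and applying 1e-5 as both relative and absolute tolerance is an assumption, since the post only states the tolerance value.

```python
import numpy as np


def have_same_results(outputs, rtol=1e-5, atol=1e-5):
    """Return True if every array in `outputs` matches the first one.

    `outputs` is a list of arrays, e.g. the first output of the original,
    `_sim` and `_sim_opt` models run on the same input.
    The rtol/atol values are assumptions, not taken from the original script.
    """
    reference = np.asarray(outputs[0])
    return all(
        np.asarray(other).shape == reference.shape
        and np.allclose(reference, other, rtol=rtol, atol=atol)
        for other in outputs[1:]
    )
```

In practice each array would come from `onnxruntime.InferenceSession(path).run(None, feeds)[0]`, with the same `feeds` dictionary passed to all three models of a group.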

The following are the structure diagrams of the generated ONNX models whose names end in `_opt`:

Type	Single-layer	Two-layer	Two-layer bidirectional
time_major=False
time_major=True

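Next comes the effect of the sequence_lens parameter, shown for a single-layer LSTM only. The resulting graph is:
(figure: ONNX graph of the single-layer LSTM with sequence_lens)
The last case is a single-layer LSTM without a user-defined initial state, shown below:

As the graph shows, extra operators are added; they are actually unnecessary.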

2 Generating LSTM with PyTorch

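Because the PyTorch export here is called with keep_initializers_as_inputs=False, only the sim (simplify) step is needed. Otherwise, as with Paddle, one additional step would have to be run.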
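For models that do list their initializers as graph inputs, the onnxruntime warnings shown earlier suggest moving them out of the inputs. The sketch below follows that idea; it is an assumption about what such a step could look like, not the step used for the Paddle models in this post. The function names `remove_initializers_from_inputs` and `strip_file` are hypothetical.

```python
def remove_initializers_from_inputs(model):
    """Drop graph inputs that are also initializers (weights/constants).

    `model` is expected to be an onnx.ModelProto. Returns the removed names.
    """
    if model.ir_version < 4:
        # Before IR version 4, initializers had to be listed as graph inputs.
        return []
    graph_inputs = model.graph.input
    by_name = {inp.name: inp for inp in graph_inputs}
    removed = []
    for init in model.graph.initializer:
        inp = by_name.pop(init.name, None)
        if inp is not None:
            graph_inputs.remove(inp)
            removed.append(init.name)
    return removed


def strip_file(src_path, dst_path):
    """Load an ONNX file, strip initializer inputs, and save the result."""
    import onnx  # imported lazily so the helper above has no hard dependency

    model = onnx.load(src_path)
    removed = remove_initializers_from_inputs(model)
    onnx.save(model, dst_path)
    return removed
```

With keep_initializers_as_inputs=False the exporter does not list the weights as inputs in the first place, which is why this pass should be unnecessary for the PyTorch models in this section.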

2.1 batch_first=True
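With batch_first=True, nn.LSTM takes input of shape (batch, seq_len, input_size) and returns output of shape (batch, seq_len, num_directions * hidden_size). The modules in this subsection feed it a 4-D tensor (N, C, 1, W) after a squeeze and a permute. The shape bookkeeping can be sketched in plain Python as below; the helper name `lstm_io_shapes` is hypothetical and only mirrors the tensor operations used in the modules.

```python
def lstm_io_shapes(input_shape, hidden_size, bidirectional=False):
    """Shapes seen by a batch_first=True LSTM fed from an (N, C, 1, W) tensor.

    Mirrors: squeeze(x, 2) then permute(x, (0, 2, 1)) then nn.LSTM.
    Returns (lstm_input_shape, lstm_output_shape).
    """
    n, c, h, w = input_shape
    if h != 1:
        raise ValueError("dimension 2 must be 1 so that squeeze(x, 2) removes it")
    # squeeze: (N, C, 1, W) -> (N, C, W); permute: (N, C, W) -> (N, W, C)
    lstm_in = (n, w, c)
    # The number of stacked layers does not change the output shape,
    # only the number of directions does.
    num_directions = 2 if bidirectional else 1
    lstm_out = (n, w, num_directions * hidden_size)
    return lstm_in, lstm_out


print(lstm_io_shapes((1, 3, 1, 6), hidden_size=4))
print(lstm_io_shapes((1, 3, 1, 6), hidden_size=4, bidirectional=True))
```

For the input (1, 3, 1, 6) and hidden size 4 used here, this gives an output of (1, 6, 4) for the unidirectional models and (1, 6, 8) for the bidirectional one.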

import os
import sys
sys.path.append('/home/tl/anaconda3/envs/pdle/lib/python3.7/site-packages')
import torch
from torch import nn
import numpy as np
from onnxsim import simplify
import onnxoptimizer
import onnx
import onnxruntime

class One_LSTM_batch(nn.Module):
    def __init__(self,in_channels=3,out_channels=4):
        super().__init__()
        self.rnn = nn.LSTM(in_channels,out_channels,batch_first=True)
    def forward(self,x):
        # b,c,h,w =x.shape
        # x1 = torch.reshape(x,[b,c,h*w])
        x2 = torch.squeeze(x,2)
        x3 = torch.permute(x2,(0,2,1))
        out,_ = self.rnn(x3)
        return out # shape 1,6,4
    
model = One_LSTM_batch()
model.to('cpu')
model.eval()

input = torch.randn(1,3,1,6)
output = model(input)
print("output shape:",output.shape)

input_shapes=[(1,3,1,6)]
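onnx_export_path = "torch/One_lstm_batch.onnx"
os.makedirs(os.path.dirname(onnx_export_path), exist_ok=True)  # make sure the output directory exists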
dummy_input=[]
for ele in input_shapes:
    dummy_input.append(torch.randn(ele))
dummy_input=tuple(dummy_input)

# torch.onnx.export(model, dummy_input, onnx_export_path,export_params=True,verbose=False, opset_version=11,do_constant_folding=True, input_names=["input"], output_names=["output"], dynamic_axes={'input' : {0 : 'batch_size'},'output' : {0 : 'batch_size'}})
torch.onnx.export(model, dummy_input, onnx_export_path,export_params=True,verbose=False, opset_version=11,do_constant_folding=True,keep_initializers_as_inputs=False,input_names=["input"], output_names=["output"])
print("export onnx to:",onnx_export_path)

onnx_model = onnx.load(onnx_export_path)
model_sim ,check = simplify(onnx_model)
assert check,"simplified onnx model could not be validated"
save_path = os.path.splitext(onnx_export_path)[0]+"_sim.onnx"
onnx.save(model_sim,save_path)

class Two_LSTM_batch(nn.Module):
    def __init__(self,in_channels=3,out_channels=4):
        super().__init__()
        self.rnn = nn.LSTM(in_channels,out_channels,batch_first=True,num_layers=2)
    def forward(self,x):
        # b,c,h,w =x.shape
        # x1 = torch.reshape(x,[b,c,h*w])
        x2 = torch.squeeze(x,2)
        x3 = torch.permute(x2,(0,2,1))
        out,_ = self.rnn(x3)
        return out # shape 1,6,4
    
model = Two_LSTM_batch()
model.to('cpu')
model.eval()

input = torch.randn(1,3,1,6)
output = model(input)
print("output shape:",output.shape)

input_shapes=[(1,3,1,6)]
onnx_export_path = "torch/Two_lstm_batch.onnx"
dummy_input=[]
for ele in input_shapes:
    dummy_input.append(torch.randn(ele))
dummy_input=tuple(dummy_input)

# torch.onnx.export(model, dummy_input, onnx_export_path,export_params=True,verbose=False, opset_version=11,do_constant_folding=True, input_names=["input"], output_names=["output"], dynamic_axes={'input' : {0 : 'batch_size'},'output' : {0 : 'batch_size'}})
torch.onnx.export(model, dummy_input, onnx_export_path,export_params=True,verbose=False, opset_version=11,do_constant_folding=True, keep_initializers_as_inputs=False,input_names=["input"], output_names=["output"])
print("export onnx to:",onnx_export_path)

onnx_model = onnx.load(onnx_export_path)
model_sim ,check = simplify(onnx_model)
assert check,"simplified onnx model could not be validated"
save_path = os.path.splitext(onnx_export_path)[0]+"_sim.onnx"
onnx.save(model_sim,save_path)

class Bi_Two_LSTM_batch(nn.Module):
    def __init__(self,in_channels=3,out_channels=4):
        super().__init__()
        self.rnn = nn.LSTM(in_channels,out_channels,batch_first=True,num_layers=2,bidirectional=True)
    def forward(self,x):
        # b,c,h,w =x.shape
        # x1 = torch.reshape(x,[b,c,h*w])
        x2 = torch.squeeze(x,2)
        x3 = torch.permute(x2,(0,2,1))
        out,_ = self.rnn(x3)
        return out # shape 1,6,8 (bidirectional: 2 * out_channels)
    
model = Bi_Two_LSTM_batch()
model.to('cpu')
model.eval()

input = torch.randn(1,3,1,6)
output = model(input)
print("output shape:",output.shape)

input_shapes=[(1,3,1,6)]
onnx_export_path = "torch/Bi_Two_lstm_batch.onnx"
dummy_input=[]
for ele in input_shapes:
    dummy_input.append(torch.randn(ele))
dummy_input=tuple(dummy_input)

# torch.onnx.export(model, dummy_input, onnx_export_path,export_params=True,verbose=False, opset_version=11,do_constant_folding=True, input_names=["input"], output_names=["output"], dynamic_axes={'input' : {0 : 'batch_size'},'output' : {0 : 'batch_size'}})
torch.onnx.export(model, dummy_input, onnx_export_path,export_params=True,verbose=False, opset_version=11,do_constant_folding=True, keep_initializers_as_inputs=False,input_names=["input"], output_names=["output"])
print("export onnx to:",onnx_export_path)

onnx_model = onnx.load(onnx_export_path)
model_sim ,check = simplify(onnx_model)
assert check,"simplified onnx model could not be validated"
save_path = os.path.splitext(onnx_export_path)[0]+"_sim.onnx"
onnx.save(model_sim,save_path)

output shape: torch.Size([1, 6, 4])
export onnx to: torch/One_lstm_batch.onnx
output shape: torch.Size([1, 6, 4])
export onnx to: torch/Two_lstm_batch.onnx


/home/tl/anaconda3/envs/ptch/lib/python3.7/site-packages/torch/onnx/symbolic_opset9.py:2192: UserWarning: Exporting a model to ONNX with a batch_size other than 1, with a variable length with LSTM can cause an error when running the ONNX model with a different batch size. Make sure to save the model with a batch size of 1, or define the initial states (h0/c0) as inputs of the model. 
  "or define the initial states (h0/c0) as inputs of the model. ")
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.


output shape: torch.Size([1, 6, 8])
export onnx to: torch/Bi_Two_lstm_batch.onnx


WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.

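The export emits a few warnings, so it is best to pass these parameters in explicitly (for example the initial states h0/c0 that the first warning mentions).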
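The first warning above suggests defining the initial states (h0/c0) as inputs of the model. A minimal sketch of that idea follows, assuming the same `nn.LSTM` setup used in this section. The names `lstm_state_shape`, `LSTMWithStates`, `export_lstm_with_states` and the output file name are hypothetical and not part of the original code. The PyTorch imports are kept inside the export function so that the shape helper can be used on its own.

```python
def lstm_state_shape(num_layers, bidirectional, batch_size, hidden_size):
    """Shape nn.LSTM expects for h0 and for c0 (assuming proj_size=0)."""
    num_directions = 2 if bidirectional else 1
    # The state layout does not change with batch_first.
    return (num_layers * num_directions, batch_size, hidden_size)


def export_lstm_with_states(onnx_export_path="torch/One_lstm_batch_states.onnx"):
    """Hypothetical variant of One_LSTM_batch with h0/c0 as explicit ONNX inputs."""
    import torch
    from torch import nn

    class LSTMWithStates(nn.Module):
        def __init__(self, in_channels=3, out_channels=4):
            super().__init__()
            self.rnn = nn.LSTM(in_channels, out_channels, batch_first=True)

        def forward(self, x, h0, c0):
            x2 = torch.squeeze(x, 2)           # (B, C, 1, W) -> (B, C, W)
            x3 = torch.permute(x2, (0, 2, 1))  # (B, C, W) -> (B, W, C)
            out, _ = self.rnn(x3, (h0, c0))
            return out

    model = LSTMWithStates().eval()
    state_shape = lstm_state_shape(num_layers=1, bidirectional=False,
                                   batch_size=1, hidden_size=4)
    dummy_input = (torch.randn(1, 3, 1, 6),
                   torch.zeros(state_shape),
                   torch.zeros(state_shape))
    torch.onnx.export(model, dummy_input, onnx_export_path, opset_version=11,
                      input_names=["input", "h0", "c0"], output_names=["output"])
    return onnx_export_path
```

Calling `export_lstm_with_states()` requires the `torch` directory to exist. Whether the warning actually disappears may depend on the PyTorch version, so treat this as a starting point rather than a guaranteed fix.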

2.2 batch_first=False

class One_LSTM_time(nn.Module):
    def __init__(self,in_channels=3,out_channels=4):
        super().__init__()
        self.rnn = nn.LSTM(in_channels,out_channels,batch_first=False)
    def forward(self,x):
        # b,c,h,w =x.shape
        # x1 = torch.reshape(x,[b,c,h*w])
        x2 = torch.squeeze(x,2)
        x3 = torch.permute(x2,(2,0,1))
        out,_ = self.rnn(x3)
        return out # shape 6,1,4
    
model = One_LSTM_time()
model.to('cpu')
model.eval()

input = torch.randn(1,3,1,6)
output = model(input)
print("output shape:",output.shape)

input_shapes=[(1,3,1,6)]
onnx_export_path = "torch/One_lstm_time.onnx"
dummy_input=[]
for ele in input_shapes:
    dummy_input.append(torch.randn(ele))
dummy_input=tuple(dummy_input)

# torch.onnx.export(model, dummy_input, onnx_export_path,export_params=True,verbose=False, opset_version=11,do_constant_folding=True, input_names=["input"], output_names=["output"], dynamic_axes={'input' : {0 : 'batch_size'},'output' : {0 : 'batch_size'}})
torch.onnx.export(model, dummy_input, onnx_export_path,export_params=True,verbose=False, opset_version=11,do_constant_folding=True, input_names=["input"], output_names=["output"])
print("export onnx to:",onnx_export_path)

onnx_model = onnx.load(onnx_export_path)
model_sim ,check = simplify(onnx_model)
assert check,"simplified onnx model could not be validated"
save_path = os.path.splitext(onnx_export_path)[0]+"_sim.onnx"
onnx.save(model_sim,save_path)

class Two_LSTM_time(nn.Module):
    def __init__(self,in_channels=3,out_channels=4):
        super().__init__()
        self.rnn = nn.LSTM(in_channels,out_channels,batch_first=False,num_layers=2)
    def forward(self,x):
        # b,c,h,w =x.shape
        # x1 = torch.reshape(x,[b,c,h*w])
        x2 = torch.squeeze(x,2)
        x3 = torch.permute(x2,(2,0,1))
        out,_ = self.rnn(x3)
        return out # shape 6,1,4
    
model = Two_LSTM_time()
model.to('cpu')
model.eval()

input = torch.randn(1,3,1,6)
output = model(input)
print("output shape:",output.shape)

input_shapes=[(1,3,1,6)]
onnx_export_path = "torch/Two_lstm_time.onnx"
dummy_input=[]
for ele in input_shapes:
    dummy_input.append(torch.randn(ele))
dummy_input=tuple(dummy_input)

# torch.onnx.export(model, dummy_input, onnx_export_path,export_params=True,verbose=False, opset_version=11,do_constant_folding=True, input_names=["input"], output_names=["output"], dynamic_axes={'input' : {0 : 'batch_size'},'output' : {0 : 'batch_size'}})
torch.onnx.export(model, dummy_input, onnx_export_path,export_params=True,verbose=False, opset_version=11,do_constant_folding=True, input_names=["input"], output_names=["output"])
print("export onnx to:",onnx_export_path)

onnx_model = onnx.load(onnx_export_path)
model_sim ,check = simplify(onnx_model)
assert check,"simplified onnx model could not be validated"
save_path = os.path.splitext(onnx_export_path)[0]+"_sim.onnx"
onnx.save(model_sim,save_path)

class Bi_Two_LSTM_time(nn.Module):
    def __init__(self,in_channels=3,out_channels=4):
        super().__init__()
        self.rnn = nn.LSTM(in_channels,out_channels,batch_first=False,num_layers=2,bidirectional=True)
    def forward(self,x):
        # b,c,h,w =x.shape
        # x1 = torch.reshape(x,[b,c,h*w])
        x2 = torch.squeeze(x,2)
        x3 = torch.permute(x2,(2,0,1))
        out,_ = self.rnn(x3)
        return out # shape 6,1,8
    
model = Bi_Two_LSTM_time()
model.to('cpu')
model.eval()

input = torch.randn(1,3,1,6)
output = model(input)
print("output shape:",output.shape)

input_shapes=[(1,3,1,6)]
onnx_export_path = "torch/Bi_Two_lstm_time.onnx"
dummy_input=[]
for ele in input_shapes:
    dummy_input.append(torch.randn(ele))
dummy_input=tuple(dummy_input)

# torch.onnx.export(model, dummy_input, onnx_export_path,export_params=True,verbose=False, opset_version=11,do_constant_folding=True, input_names=["input"], output_names=["output"], dynamic_axes={'input' : {0 : 'batch_size'},'output' : {0 : 'batch_size'}})
torch.onnx.export(model, dummy_input, onnx_export_path,export_params=True,verbose=False, opset_version=11,do_constant_folding=True, input_names=["input"], output_names=["output"])
print("export onnx to:",onnx_export_path)

onnx_model = onnx.load(onnx_export_path)
model_sim ,check = simplify(onnx_model)
assert check,"simplified onnx model could not be validated"
save_path = os.path.splitext(onnx_export_path)[0]+"_sim.onnx"
onnx.save(model_sim,save_path)

output shape: torch.Size([6, 1, 4])
export onnx to: torch/One_lstm_time.onnx
output shape: torch.Size([6, 1, 4])
export onnx to: torch/Two_lstm_time.onnx


/home/tl/anaconda3/envs/ptch/lib/python3.7/site-packages/torch/onnx/symbolic_opset9.py:2192: UserWarning: Exporting a model to ONNX with a batch_size other than 1, with a variable length with LSTM can cause an error when running the ONNX model with a different batch size. Make sure to save the model with a batch size of 1, or define the initial states (h0/c0) as inputs of the model. 
  "or define the initial states (h0/c0) as inputs of the model. ")
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.


output shape: torch.Size([6, 1, 8])
export onnx to: torch/Bi_Two_lstm_time.onnx


WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.

2.3 Inspecting the generated ONNX models

pytorch_onnx = sorted(os.listdir('torch'))
pytorch_onnx_paths = sorted([os.path.join('torch',path) for path in pytorch_onnx])
print(pytorch_onnx)

['Bi_Two_lstm_batch.onnx', 'Bi_Two_lstm_batch_sim.onnx', 'Bi_Two_lstm_time.onnx', 'Bi_Two_lstm_time_sim.onnx', 'One_lstm_batch.onnx', 'One_lstm_batch_sim.onnx', 'One_lstm_time.onnx', 'One_lstm_time_sim.onnx', 'Two_lstm_batch.onnx', 'Two_lstm_batch_sim.onnx', 'Two_lstm_time.onnx', 'Two_lstm_time_sim.onnx']

! du -sh torch/*

8.0K	torch/Bi_Two_lstm_batch.onnx
8.0K	torch/Bi_Two_lstm_batch_sim.onnx
8.0K	torch/Bi_Two_lstm_time.onnx
8.0K	torch/Bi_Two_lstm_time_sim.onnx
4.0K	torch/One_lstm_batch.onnx
4.0K	torch/One_lstm_batch_sim.onnx
4.0K	torch/One_lstm_time.onnx
4.0K	torch/One_lstm_time_sim.onnx
8.0K	torch/Two_lstm_batch.onnx
4.0K	torch/Two_lstm_batch_sim.onnx
8.0K	torch/Two_lstm_time.onnx
4.0K	torch/Two_lstm_time_sim.onnx

def onnx_infer(model_path,data):
    """Run one ONNX model on a single input and return its first output.

    Args:
        model_path (str): path to the .onnx file
        data (np.ndarray): input tensor, fed to the model's first input
    """
    onnx_session=onnxruntime.InferenceSession(model_path)
    input_name = onnx_session.get_inputs()[0].name
    output_name = onnx_session.get_outputs()[0].name
    result = onnx_session.run([output_name],{input_name:data})
    return result[0]

test_data = np.random.random((1,3,1,6)).astype(np.float32) # batch,channel,height,width
results={}

for i,onnx_path in enumerate(pytorch_onnx_paths):

    result = onnx_infer(onnx_path,test_data)
    results[os.path.basename(onnx_path)]=result

    if i%2 ==1:
        try:
            values = list(results.values())
            np.testing.assert_allclose(values[0],values[1],rtol=1e-7)
            print(f"{list(results.keys())} have same results")
        except AssertionError:
            print(f"{list(results.keys())} have different results")
        finally:
            results={}

['Bi_Two_lstm_batch.onnx', 'Bi_Two_lstm_batch_sim.onnx'] have same results
['Bi_Two_lstm_time.onnx', 'Bi_Two_lstm_time_sim.onnx'] have same results
['One_lstm_batch.onnx', 'One_lstm_batch_sim.onnx'] have same results
['One_lstm_time.onnx', 'One_lstm_time_sim.onnx'] have same results
['Two_lstm_batch.onnx', 'Two_lstm_batch_sim.onnx'] have same results
['Two_lstm_time.onnx', 'Two_lstm_time_sim.onnx'] have same results

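At a relative tolerance of 1e-7, each PyTorch-exported ONNX model gives the same results as its simplified counterpart, so the precision here is somewhat higher than in the Paddle case.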
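To see how much headroom a comparison like the one above has, one can look for the tightest relative tolerance at which two outputs still agree. The sketch below is pure Python and applies the same element-wise criterion as `np.testing.assert_allclose`, namely |actual - desired| <= atol + rtol * |desired|. The helper name `tightest_passing_rtol` and the list of candidate tolerances are illustrative assumptions.

```python
def tightest_passing_rtol(actual, desired,
                          rtols=(1e-7, 1e-6, 1e-5, 1e-4, 1e-3), atol=0.0):
    """Return the smallest rtol in `rtols` at which all elements agree, else None.

    `actual` and `desired` are flat sequences of floats of equal length;
    `rtols` is expected to be sorted from tightest to loosest.
    """
    if len(actual) != len(desired):
        raise ValueError("inputs must have the same number of elements")
    for rtol in rtols:
        if all(abs(a - d) <= atol + rtol * abs(d)
               for a, d in zip(actual, desired)):
            return rtol
    return None
```

With the arrays returned by `onnx_infer`, the inputs could be passed as `result.ravel().tolist()`. Since float32 machine epsilon is about 1.2e-7, passing at `rtol=1e-7` means the two outputs are essentially identical.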

Take a look at the ONNX graphs:

Type	Single layer	Two layers	Two-layer bidirectional
batch_first=True
batch_first=False
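Besides viewing the graphs visually, the operators in each exported file can be summarized programmatically. The following is a small sketch under the assumption that the `onnx` package is installed; the function names `count_op_types` and `summarize_onnx_file` are hypothetical helpers, not part of the original notebook.

```python
from collections import Counter


def count_op_types(nodes):
    """Count operator types; `nodes` is any iterable of objects with `op_type`."""
    return Counter(node.op_type for node in nodes)


def summarize_onnx_file(model_path):
    """Load an ONNX file and return its operator counts (needs `onnx`)."""
    import onnx  # imported lazily so count_op_types works without onnx

    model = onnx.load(model_path)
    return count_op_types(model.graph.node)
```

For example, `for path in pytorch_onnx_paths: print(path, dict(summarize_onnx_file(path)))` prints one summary per file, which makes it easy to compare an exported model with its simplified version.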

3 Building LSTMs with TensorFlow 2

Here I use TensorFlow 2.8.

import os
import tensorflow as tf
import onnx
import tf2onnx
from onnxsim import simplify
import onnxruntime
import numpy as np
from tensorflow.keras import layers as nn
# only use the CPU
devices = tf.config.list_physical_devices("CPU")
tf.config.set_visible_devices(devices)

2022-08-09 16:10:42.419611: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.

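Paddle and PyTorch return the output of every time step by default, whereas TensorFlow lets you choose between returning only the last step and returning the full sequence, controlled by return_sequences. To stay consistent with the other two frameworks, it is set to True here.
TensorFlow's input layout is B, H, W, C (channels last), so the models below are built on that basis.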
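The two conventions above (channels-last input and `return_sequences`) can be sketched with plain shape arithmetic. This is only an illustration: `NCHW_TO_NHWC`, `permute_shape` and `keras_lstm_output_shape` are hypothetical helpers, and the shape rule assumes `time_major=False` and, for the bidirectional case, `merge_mode="concat"`.

```python
NCHW_TO_NHWC = (0, 2, 3, 1)  # B,C,H,W -> B,H,W,C


def permute_shape(shape, perm):
    """Shape obtained after transposing an array of `shape` with axes `perm`."""
    return tuple(shape[axis] for axis in perm)


def keras_lstm_output_shape(input_shape, units,
                            return_sequences=True, bidirectional=False):
    """Output shape of a batch-major Keras LSTM fed (batch, timesteps, features)."""
    batch, timesteps, _ = input_shape
    features = units * 2 if bidirectional else units
    if return_sequences:
        return (batch, timesteps, features)  # one output per time step
    return (batch, features)                 # only the last time step


nhwc = permute_shape((1, 3, 1, 6), NCHW_TO_NHWC)  # (1, 1, 6, 3)
squeezed = (nhwc[0],) + nhwc[2:]                  # drop H=1, like tf.squeeze(axis=1)
print(keras_lstm_output_shape(squeezed, units=4))                          # (1, 6, 4)
print(keras_lstm_output_shape(squeezed, units=4, return_sequences=False))  # (1, 4)
```

For real data, a NumPy array in B,C,H,W order such as the earlier `test_data` could be converted with `test_data.transpose(NCHW_TO_NHWC)` before feeding it to the TensorFlow-exported models.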

3.1 time_major=False

def One_LSTM_batch():
    input = nn.Input(shape=[1,6,3],batch_size=1,name="input")
    middle = tf.squeeze(input,axis=1)
    output = nn.LSTM(4,time_major=False,return_sequences=True,name='one')(middle)
    model = tf.keras.models.Model(input,output,name="One_LSTM_batch")
    return model
model = One_LSTM_batch()
#tf.keras.utils.plot_model(model,to_file=f'tensorflow/{model.name}.png',show_shapes=True,show_layer_names=True,show_dtype=True)
model.save("tensorflow/One_LSTM_batch")
spec = (tf.TensorSpec((1,1,6,3),tf.float32,name="input"),)
output_path="tensorflow/"+model.name+'.onnx'
model_proto,_=tf2onnx.convert.from_keras(model,input_signature=spec,opset=11,output_path=output_path)
output_names=[n.name for n in model_proto.graph.output]
model = onnx.load(output_path)
model_sim ,check = simplify(model)
assert check,"simplified onnx model could not be validated"
save_path = os.path.splitext(output_path)[0]+"_sim.onnx"
onnx.save(model_sim,save_path)

def Two_LSTM_batch():
    input = nn.Input(shape=[1,6,3],batch_size=1,name="input")
    middle = tf.squeeze(input,axis=1)
    output1 = nn.LSTM(4,time_major=False,return_sequences=True,name='one')(middle)
    output = nn.LSTM(4,time_major=False,return_sequences=True,name='two')(output1)
    model = tf.keras.models.Model(input,output,name="Two_LSTM_batch")
    return model
model = Two_LSTM_batch()
#tf.keras.utils.plot_model(model,to_file=f'tensorflow/{model.name}.png',show_shapes=True,show_layer_names=True,show_dtype=True)
model.save("tensorflow/Two_LSTM_batch")
spec = (tf.TensorSpec((1,1,6,3),tf.float32,name="input"),)
output_path="tensorflow/"+model.name+'.onnx'
model_proto,_=tf2onnx.convert.from_keras(model,input_signature=spec,opset=11,output_path=output_path)
output_names=[n.name for n in model_proto.graph.output]
model = onnx.load(output_path)
model_sim ,check = simplify(model)
assert check,"simplified onnx model could not be validated"
save_path = os.path.splitext(output_path)[0]+"_sim.onnx"
onnx.save(model_sim,save_path)

def Bi_Two_LSTM_batch():
    input = nn.Input(shape=[1,6,3],batch_size=1,name="input")
    middle = tf.squeeze(input,axis=1)
    output1 = nn.Bidirectional(nn.LSTM(4,time_major=False,return_sequences=True,name='one'),merge_mode="concat")(middle)
    output = nn.Bidirectional(nn.LSTM(4,time_major=False,return_sequences=True,name='two'),merge_mode="concat")(output1)
    model = tf.keras.models.Model(input,output,name="Bi_Two_LSTM_batch")
    return model
model = Bi_Two_LSTM_batch()
#tf.keras.utils.plot_model(model,to_file=f'tensorflow/{model.name}.png',show_shapes=True,show_layer_names=True,show_dtype=True)
model.save("tensorflow/Bi_Two_LSTM_batch")
spec = (tf.TensorSpec((1,1,6,3),tf.float32,name="input"),)
output_path="tensorflow/"+model.name+'.onnx'
model_proto,_=tf2onnx.convert.from_keras(model,input_signature=spec,opset=11,output_path=output_path)
output_names=[n.name for n in model_proto.graph.output]
model = onnx.load(output_path)
model_sim ,check = simplify(model)
assert check,"simplified onnx model could not be validated"
save_path = os.path.splitext(output_path)[0]+"_sim.onnx"
onnx.save(model_sim,save_path)

WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.


WARNING:absl:Found untraced functions such as lstm_cell_191_layer_call_fn, lstm_cell_191_layer_call_and_return_conditional_losses while saving (showing 2 of 2). These functions will not be directly callable after loading.


INFO:tensorflow:Assets written to: tensorflow/One_LSTM_batch/assets


WARNING:absl:<keras.layers.recurrent.LSTMCell object at 0x7f3ac810f8e0> has the same name 'LSTMCell' as a built-in Keras object. Consider renaming <class 'keras.layers.recurrent.LSTMCell'> to avoid naming conflicts when loading with `tf.keras.models.load_model`. If renaming is not possible, pass the object in the `custom_objects` parameter of the load function.
2022-08-16 16:19:10.687955: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 4
2022-08-16 16:19:10.688047: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-08-16 16:19:10.706293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14627 MB memory:  -> device: 0, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:da:00.0, compute capability: 7.0
2022-08-16 16:19:10.707378: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14627 MB memory:  -> device: 1, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:db:00.0, compute capability: 7.0
2022-08-16 16:19:10.708381: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 14627 MB memory:  -> device: 2, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dc:00.0, compute capability: 7.0
2022-08-16 16:19:10.709447: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 14627 MB memory:  -> device: 3, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dd:00.0, compute capability: 7.0
2022-08-16 16:19:10.719888: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1164] Optimization results for grappler item: graph_to_optimize
  function_optimizer: Graph size after: 87 nodes (60), 98 edges (68), time = 2.078ms.
  function_optimizer: Graph size after: 87 nodes (0), 98 edges (0), time = 1.092ms.
Optimization results for grappler item: while_cond_1209930
  function_optimizer: function_optimizer did nothing. time = 0.004ms.
  function_optimizer: function_optimizer did nothing. time = 0ms.
Optimization results for grappler item: while_body_1209931
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  function_optimizer: function_optimizer did nothing. time = 0ms.

2022-08-16 16:19:10.780564: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 4
2022-08-16 16:19:10.780636: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-08-16 16:19:10.798557: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14627 MB memory:  -> device: 0, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:da:00.0, compute capability: 7.0
2022-08-16 16:19:10.799642: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14627 MB memory:  -> device: 1, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:db:00.0, compute capability: 7.0
2022-08-16 16:19:10.800648: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 14627 MB memory:  -> device: 2, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dc:00.0, compute capability: 7.0
2022-08-16 16:19:10.801728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 14627 MB memory:  -> device: 3, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dd:00.0, compute capability: 7.0
2022-08-16 16:19:10.812277: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1164] Optimization results for grappler item: graph_to_optimize
  constant_folding: Graph size after: 30 nodes (-23), 30 edges (-27), time = 1.377ms.
  function_optimizer: Graph size after: 30 nodes (0), 30 edges (0), time = 0.6ms.
  constant_folding: Graph size after: 30 nodes (0), 30 edges (0), time = 0.554ms.
  function_optimizer: Graph size after: 30 nodes (0), 30 edges (0), time = 0.592ms.
Optimization results for grappler item: while_cond_1209930
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.272ms.
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.182ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_body_1209931
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.766ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.646ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.



WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.


WARNING:absl:Found untraced functions such as lstm_cell_192_layer_call_fn, lstm_cell_192_layer_call_and_return_conditional_losses, lstm_cell_193_layer_call_fn, lstm_cell_193_layer_call_and_return_conditional_losses while saving (showing 4 of 4). These functions will not be directly callable after loading.


INFO:tensorflow:Assets written to: tensorflow/Two_LSTM_batch/assets


WARNING:absl:<keras.layers.recurrent.LSTMCell object at 0x7f39dc6fd2e0> has the same name 'LSTMCell' as a built-in Keras object. Consider renaming <class 'keras.layers.recurrent.LSTMCell'> to avoid naming conflicts when loading with `tf.keras.models.load_model`. If renaming is not possible, pass the object in the `custom_objects` parameter of the load function.
WARNING:absl:<keras.layers.recurrent.LSTMCell object at 0x7f39bc2a58e0> has the same name 'LSTMCell' as a built-in Keras object. Consider renaming <class 'keras.layers.recurrent.LSTMCell'> to avoid naming conflicts when loading with `tf.keras.models.load_model`. If renaming is not possible, pass the object in the `custom_objects` parameter of the load function.
2022-08-16 16:19:16.650941: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 4
2022-08-16 16:19:16.651052: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-08-16 16:19:16.669042: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14627 MB memory:  -> device: 0, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:da:00.0, compute capability: 7.0
2022-08-16 16:19:16.670119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14627 MB memory:  -> device: 1, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:db:00.0, compute capability: 7.0
2022-08-16 16:19:16.671115: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 14627 MB memory:  -> device: 2, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dc:00.0, compute capability: 7.0
2022-08-16 16:19:16.672189: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 14627 MB memory:  -> device: 3, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dd:00.0, compute capability: 7.0
2022-08-16 16:19:16.689047: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1164] Optimization results for grappler item: graph_to_optimize
  function_optimizer: Graph size after: 170 nodes (120), 193 edges (136), time = 3.802ms.
  function_optimizer: Graph size after: 170 nodes (0), 193 edges (0), time = 2.068ms.
Optimization results for grappler item: while_cond_1221920
  function_optimizer: function_optimizer did nothing. time = 0.005ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_body_1221921
  function_optimizer: function_optimizer did nothing. time = 0.003ms.
  function_optimizer: function_optimizer did nothing. time = 0ms.
Optimization results for grappler item: while_body_1221499
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_cond_1221498
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  function_optimizer: function_optimizer did nothing. time = 0ms.

2022-08-16 16:19:16.788220: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 4
2022-08-16 16:19:16.788291: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-08-16 16:19:16.812080: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14627 MB memory:  -> device: 0, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:da:00.0, compute capability: 7.0
2022-08-16 16:19:16.813181: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14627 MB memory:  -> device: 1, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:db:00.0, compute capability: 7.0
2022-08-16 16:19:16.814187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 14627 MB memory:  -> device: 2, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dc:00.0, compute capability: 7.0
2022-08-16 16:19:16.815259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 14627 MB memory:  -> device: 3, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dd:00.0, compute capability: 7.0
2022-08-16 16:19:16.832669: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1164] Optimization results for grappler item: graph_to_optimize
  constant_folding: Graph size after: 56 nodes (-46), 57 edges (-54), time = 2.381ms.
  function_optimizer: Graph size after: 56 nodes (0), 57 edges (0), time = 1.085ms.
  constant_folding: Graph size after: 56 nodes (0), 57 edges (0), time = 0.99ms.
  function_optimizer: Graph size after: 56 nodes (0), 57 edges (0), time = 1.093ms.
Optimization results for grappler item: while_cond_1221920
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.284ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.182ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_body_1221921
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.779ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.65ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_body_1221499
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.776ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.645ms.
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
Optimization results for grappler item: while_cond_1221498
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.271ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.182ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.



WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.


WARNING:absl:Found untraced functions such as lstm_cell_195_layer_call_fn, lstm_cell_195_layer_call_and_return_conditional_losses, lstm_cell_196_layer_call_fn, lstm_cell_196_layer_call_and_return_conditional_losses, lstm_cell_198_layer_call_fn while saving (showing 5 of 8). These functions will not be directly callable after loading.


INFO:tensorflow:Assets written to: tensorflow/Bi_Two_LSTM_batch/assets


WARNING:absl:<keras.layers.recurrent.LSTMCell object at 0x7f39dc655250> has the same name 'LSTMCell' as a built-in Keras object. Consider renaming <class 'keras.layers.recurrent.LSTMCell'> to avoid naming conflicts when loading with `tf.keras.models.load_model`. If renaming is not possible, pass the object in the `custom_objects` parameter of the load function.
WARNING:absl:<keras.layers.recurrent.LSTMCell object at 0x7f39f4480d30> has the same name 'LSTMCell' as a built-in Keras object. Consider renaming <class 'keras.layers.recurrent.LSTMCell'> to avoid naming conflicts when loading with `tf.keras.models.load_model`. If renaming is not possible, pass the object in the `custom_objects` parameter of the load function.
WARNING:absl:<keras.layers.recurrent.LSTMCell object at 0x7f39f44a2cd0> has the same name 'LSTMCell' as a built-in Keras object. Consider renaming <class 'keras.layers.recurrent.LSTMCell'> to avoid naming conflicts when loading with `tf.keras.models.load_model`. If renaming is not possible, pass the object in the `custom_objects` parameter of the load function.
WARNING:absl:<keras.layers.recurrent.LSTMCell object at 0x7f39c4533760> has the same name 'LSTMCell' as a built-in Keras object. Consider renaming <class 'keras.layers.recurrent.LSTMCell'> to avoid naming conflicts when loading with `tf.keras.models.load_model`. If renaming is not possible, pass the object in the `custom_objects` parameter of the load function.
2022-08-16 16:19:31.962845: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 4
2022-08-16 16:19:31.962974: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-08-16 16:19:31.980983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14627 MB memory:  -> device: 0, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:da:00.0, compute capability: 7.0
2022-08-16 16:19:31.982057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14627 MB memory:  -> device: 1, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:db:00.0, compute capability: 7.0
2022-08-16 16:19:31.983048: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 14627 MB memory:  -> device: 2, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dc:00.0, compute capability: 7.0
2022-08-16 16:19:31.984113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 14627 MB memory:  -> device: 3, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dd:00.0, compute capability: 7.0
2022-08-16 16:19:32.016037: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1164] Optimization results for grappler item: graph_to_optimize
  function_optimizer: Graph size after: 348 nodes (244), 397 edges (276), time = 8.373ms.
  function_optimizer: Graph size after: 348 nodes (0), 397 edges (0), time = 4.505ms.
Optimization results for grappler item: while_cond_1256176
  function_optimizer: function_optimizer did nothing. time = 0.004ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_body_1255323
  function_optimizer: function_optimizer did nothing. time = 0.003ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_cond_1255322
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_body_1256601
  function_optimizer: function_optimizer did nothing. time = 0.003ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_cond_1255746
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_body_1256177
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_body_1255747
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_cond_1256600
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.

2022-08-16 16:19:32.194241: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 4
2022-08-16 16:19:32.194314: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-08-16 16:19:32.212231: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14627 MB memory:  -> device: 0, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:da:00.0, compute capability: 7.0
2022-08-16 16:19:32.213308: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14627 MB memory:  -> device: 1, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:db:00.0, compute capability: 7.0
2022-08-16 16:19:32.214300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 14627 MB memory:  -> device: 2, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dc:00.0, compute capability: 7.0
2022-08-16 16:19:32.215369: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 14627 MB memory:  -> device: 3, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dd:00.0, compute capability: 7.0
2022-08-16 16:19:32.248503: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1164] Optimization results for grappler item: graph_to_optimize
  constant_folding: Graph size after: 120 nodes (-92), 125 edges (-108), time = 4.738ms.
  function_optimizer: Graph size after: 120 nodes (0), 125 edges (0), time = 2.234ms.
  constant_folding: Graph size after: 120 nodes (0), 125 edges (0), time = 2.266ms.
  function_optimizer: Graph size after: 120 nodes (0), 125 edges (0), time = 2.25ms.
Optimization results for grappler item: while_cond_1256176
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.274ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.181ms.
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
Optimization results for grappler item: while_body_1255323
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.79ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.647ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_cond_1255322
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.266ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.18ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_body_1256601
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.777ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.654ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_cond_1255746
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.276ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.181ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_body_1256177
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.78ms.
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.651ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_body_1255747
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.772ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.649ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_cond_1256600
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.277ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.181ms.
  function_optimizer: function_optimizer did nothing. time = 0ms.

3.2 time_major=True
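
The models in this subsection feed the LSTM a time-major tensor, which they build with a squeeze followed by a transpose. The following NumPy sketch reproduces only that shape handling. It assumes the same (1, 1, 6, 3) input shape as the code in this subsection; the variable names `batch_major` and `time_major` are made up for illustration.

```python
import numpy as np

# Same shape as the Keras input in this subsection: (batch, 1, time, features).
x = np.arange(1 * 1 * 6 * 3, dtype=np.float32).reshape(1, 1, 6, 3)

# Drop the singleton axis -> (batch, time, features), i.e. batch-major.
batch_major = np.squeeze(x, axis=1)

# Swap batch and time axes -> (time, batch, features), i.e. time-major.
time_major = np.transpose(batch_major, (1, 0, 2))

print(batch_major.shape)  # (1, 6, 3)
print(time_major.shape)   # (6, 1, 3)
```

Step `t` of sample `b` is `batch_major[b, t]` in the first layout and `time_major[t, b]` in the second; the values themselves do not change.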

def One_LSTM_time():
    # The input arrives as (batch=1, 1, time=6, features=3).
    input = nn.Input(shape=[1,6,3],batch_size=1,name="input")
    # Drop the singleton axis: (1, 6, 3), batch-major.
    middle1 = tf.squeeze(input,axis=1)
    # Swap the batch and time axes: (6, 1, 3), time-major.
    middle = tf.transpose(middle1,[1,0,2])
    output = nn.LSTM(4,time_major=True,return_sequences=True,name='one')(middle)
    model = tf.keras.models.Model(input,output,name="One_LSTM_time")
    return model
model = One_LSTM_time()
#tf.keras.utils.plot_model(model,to_file=f'tensorflow/{model.name}.png',show_shapes=True,show_layer_names=True,show_dtype=True)
# Save the Keras model, then convert it to ONNX with opset 11.
model.save("tensorflow/One_LSTM_time")
spec = (tf.TensorSpec((1,1,6,3),tf.float32,name="input"),)
output_path="tensorflow/"+model.name+'.onnx'
model_proto,_=tf2onnx.convert.from_keras(model,input_signature=spec,opset=11,output_path=output_path)
output_names=[n.name for n in model_proto.graph.output]
# Reload the exported file and simplify it with onnxsim.
model = onnx.load(output_path)
model_sim ,check = simplify(model)
assert check,"simplified onnx model could not be validated"
save_path = output_path.split('.')[0]+"_sim.onnx"
onnx.save(model_sim,save_path)

def Two_LSTM_time():
    input = nn.Input(shape=[1,6,3],batch_size=1,name="input")
    middle1 = tf.squeeze(input,axis=1)
    middle = tf.transpose(middle1,[1,0,2])
    # The second LSTM consumes the full output sequence of the first one.
    output1 = nn.LSTM(4,time_major=True,return_sequences=True,name='one')(middle)
    output = nn.LSTM(4,time_major=True,return_sequences=True,name='two')(output1)
    model = tf.keras.models.Model(input,output,name="Two_LSTM_time")
    return model
model = Two_LSTM_time()
#tf.keras.utils.plot_model(model,to_file=f'tensorflow/{model.name}.png',show_shapes=True,show_layer_names=True,show_dtype=True)
model.save("tensorflow/Two_LSTM_time")
spec = (tf.TensorSpec((1,1,6,3),tf.float32,name="input"),)
output_path="tensorflow/"+model.name+'.onnx'

model_proto,_=tf2onnx.convert.from_keras(model,input_signature=spec,opset=11,output_path=output_path)
output_names=[n.name for n in model_proto.graph.output]
model = onnx.load(output_path)
model_sim ,check = simplify(model)
assert check,"simplified onnx model could not be validated"
save_path = output_path.split('.')[0]+"_sim.onnx"
onnx.save(model_sim,save_path)

def Bi_Two_LSTM_time():
    input = nn.Input(shape=[1,6,3],batch_size=1,name="input")
    middle1 = tf.squeeze(input,axis=1)
    middle = tf.transpose(middle1,[1,0,2])
    # merge_mode="concat" joins the forward and backward outputs, so each layer emits 8 features.
    output1 = nn.Bidirectional(nn.LSTM(4,time_major=True,return_sequences=True,name='one'),merge_mode="concat")(middle)
    output = nn.Bidirectional(nn.LSTM(4,time_major=True,return_sequences=True,name='two'),merge_mode="concat")(output1)
    model = tf.keras.models.Model(input,output,name="Bi_Two_LSTM_time")
    return model
model = Bi_Two_LSTM_time()
#tf.keras.utils.plot_model(model,to_file=f'tensorflow/{model.name}.png',show_shapes=True,show_layer_names=True,show_dtype=True)
model.save("tensorflow/Bi_Two_LSTM_time")
spec = (tf.TensorSpec((1,1,6,3),tf.float32,name="input"),)
output_path="tensorflow/"+model.name+'.onnx'
model_proto,_=tf2onnx.convert.from_keras(model,input_signature=spec,opset=11,output_path=output_path)
output_names=[n.name for n in model_proto.graph.output]
model = onnx.load(output_path)
model_sim ,check = simplify(model)
assert check,"simplified onnx model could not be validated"
save_path = output_path.split('.')[0]+"_sim.onnx"
onnx.save(model_sim,save_path)

WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.


WARNING:absl:Found untraced functions such as lstm_cell_8_layer_call_fn, lstm_cell_8_layer_call_and_return_conditional_losses while saving (showing 2 of 2). These functions will not be directly callable after loading.


INFO:tensorflow:Assets written to: tensorflow/One_LSTM_time/assets


WARNING:absl:<keras.layers.recurrent.LSTMCell object at 0x7f3ae4727f70> has the same name 'LSTMCell' as a built-in Keras object. Consider renaming <class 'keras.layers.recurrent.LSTMCell'> to avoid naming conflicts when loading with `tf.keras.models.load_model`. If renaming is not possible, pass the object in the `custom_objects` parameter of the load function.
2022-08-16 15:43:52.434772: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 4
2022-08-16 15:43:52.434862: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-08-16 15:43:52.452892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14627 MB memory:  -> device: 0, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:da:00.0, compute capability: 7.0
2022-08-16 15:43:52.453967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14627 MB memory:  -> device: 1, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:db:00.0, compute capability: 7.0
2022-08-16 15:43:52.454957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 14627 MB memory:  -> device: 2, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dc:00.0, compute capability: 7.0
2022-08-16 15:43:52.456033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 14627 MB memory:  -> device: 3, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dd:00.0, compute capability: 7.0
2022-08-16 15:43:52.465981: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1164] Optimization results for grappler item: graph_to_optimize
  function_optimizer: Graph size after: 85 nodes (56), 96 edges (64), time = 1.955ms.
  function_optimizer: Graph size after: 85 nodes (0), 96 edges (0), time = 1.076ms.
Optimization results for grappler item: while_cond_48297
  function_optimizer: function_optimizer did nothing. time = 0.004ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_body_48298
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  function_optimizer: function_optimizer did nothing. time = 0ms.

2022-08-16 15:43:52.520244: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 4
2022-08-16 15:43:52.520309: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-08-16 15:43:52.538162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14627 MB memory:  -> device: 0, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:da:00.0, compute capability: 7.0
2022-08-16 15:43:52.539247: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14627 MB memory:  -> device: 1, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:db:00.0, compute capability: 7.0
2022-08-16 15:43:52.540262: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 14627 MB memory:  -> device: 2, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dc:00.0, compute capability: 7.0
2022-08-16 15:43:52.541338: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 14627 MB memory:  -> device: 3, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dd:00.0, compute capability: 7.0
2022-08-16 15:43:52.551627: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1164] Optimization results for grappler item: graph_to_optimize
  constant_folding: Graph size after: 28 nodes (-23), 28 edges (-27), time = 1.298ms.
  function_optimizer: Graph size after: 28 nodes (0), 28 edges (0), time = 0.575ms.
  constant_folding: Graph size after: 28 nodes (0), 28 edges (0), time = 0.512ms.
  function_optimizer: Graph size after: 28 nodes (0), 28 edges (0), time = 0.588ms.
Optimization results for grappler item: while_cond_48297
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.269ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.179ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_body_48298
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.769ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.642ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.



WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.


WARNING:absl:Found untraced functions such as lstm_cell_9_layer_call_fn, lstm_cell_9_layer_call_and_return_conditional_losses, lstm_cell_10_layer_call_fn, lstm_cell_10_layer_call_and_return_conditional_losses while saving (showing 4 of 4). These functions will not be directly callable after loading.


INFO:tensorflow:Assets written to: tensorflow/Two_LSTM_time/assets


WARNING:absl:<keras.layers.recurrent.LSTMCell object at 0x7f3ac83b8d60> has the same name 'LSTMCell' as a built-in Keras object. Consider renaming <class 'keras.layers.recurrent.LSTMCell'> to avoid naming conflicts when loading with `tf.keras.models.load_model`. If renaming is not possible, pass the object in the `custom_objects` parameter of the load function.
WARNING:absl:<keras.layers.recurrent.LSTMCell object at 0x7f3a7c209b80> has the same name 'LSTMCell' as a built-in Keras object. Consider renaming <class 'keras.layers.recurrent.LSTMCell'> to avoid naming conflicts when loading with `tf.keras.models.load_model`. If renaming is not possible, pass the object in the `custom_objects` parameter of the load function.
2022-08-16 15:43:57.663352: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 4
2022-08-16 15:43:57.663442: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-08-16 15:43:57.681413: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14627 MB memory:  -> device: 0, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:da:00.0, compute capability: 7.0
2022-08-16 15:43:57.682504: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14627 MB memory:  -> device: 1, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:db:00.0, compute capability: 7.0
2022-08-16 15:43:57.683492: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 14627 MB memory:  -> device: 2, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dc:00.0, compute capability: 7.0
2022-08-16 15:43:57.684558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 14627 MB memory:  -> device: 3, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dd:00.0, compute capability: 7.0
2022-08-16 15:43:57.701027: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1164] Optimization results for grappler item: graph_to_optimize
  function_optimizer: Graph size after: 164 nodes (112), 187 edges (128), time = 3.95ms.
  function_optimizer: Graph size after: 164 nodes (0), 187 edges (0), time = 2.055ms.
Optimization results for grappler item: while_cond_59528
  function_optimizer: function_optimizer did nothing. time = 0.004ms.
  function_optimizer: function_optimizer did nothing. time = 0ms.
Optimization results for grappler item: while_body_59937
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  function_optimizer: function_optimizer did nothing. time = 0ms.
Optimization results for grappler item: while_cond_59936
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_body_59529
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  function_optimizer: function_optimizer did nothing. time = 0ms.

2022-08-16 15:43:57.789964: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 4
2022-08-16 15:43:57.790031: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-08-16 15:43:57.807917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14627 MB memory:  -> device: 0, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:da:00.0, compute capability: 7.0
2022-08-16 15:43:57.809002: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14627 MB memory:  -> device: 1, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:db:00.0, compute capability: 7.0
2022-08-16 15:43:57.809991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 14627 MB memory:  -> device: 2, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dc:00.0, compute capability: 7.0
2022-08-16 15:43:57.811055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 14627 MB memory:  -> device: 3, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dd:00.0, compute capability: 7.0
2022-08-16 15:43:57.832990: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1164] Optimization results for grappler item: graph_to_optimize
  constant_folding: Graph size after: 50 nodes (-46), 51 edges (-54), time = 2.266ms.
  function_optimizer: Graph size after: 50 nodes (0), 51 edges (0), time = 1.037ms.
  constant_folding: Graph size after: 50 nodes (0), 51 edges (0), time = 0.898ms.
  function_optimizer: Graph size after: 50 nodes (0), 51 edges (0), time = 1.077ms.
Optimization results for grappler item: while_cond_59528
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.274ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.182ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_body_59937
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.767ms.
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.655ms.
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
Optimization results for grappler item: while_cond_59936
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.261ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.183ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_body_59529
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 2.66ms.
  function_optimizer: function_optimizer did nothing. time = 0.004ms.
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 2.102ms.
  function_optimizer: function_optimizer did nothing. time = 0.002ms.



WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.


WARNING:absl:Found untraced functions such as lstm_cell_12_layer_call_fn, lstm_cell_12_layer_call_and_return_conditional_losses, lstm_cell_13_layer_call_fn, lstm_cell_13_layer_call_and_return_conditional_losses, lstm_cell_15_layer_call_fn while saving (showing 5 of 8). These functions will not be directly callable after loading.


INFO:tensorflow:Assets written to: tensorflow/Bi_Two_LSTM_time/assets


WARNING:absl:<keras.layers.recurrent.LSTMCell object at 0x7f3a844d8160> has the same name 'LSTMCell' as a built-in Keras object. Consider renaming <class 'keras.layers.recurrent.LSTMCell'> to avoid naming conflicts when loading with `tf.keras.models.load_model`. If renaming is not possible, pass the object in the `custom_objects` parameter of the load function.
WARNING:absl:<keras.layers.recurrent.LSTMCell object at 0x7f3a7c14f400> has the same name 'LSTMCell' as a built-in Keras object. Consider renaming <class 'keras.layers.recurrent.LSTMCell'> to avoid naming conflicts when loading with `tf.keras.models.load_model`. If renaming is not possible, pass the object in the `custom_objects` parameter of the load function.
WARNING:absl:<keras.layers.recurrent.LSTMCell object at 0x7f3a7c79bf70> has the same name 'LSTMCell' as a built-in Keras object. Consider renaming <class 'keras.layers.recurrent.LSTMCell'> to avoid naming conflicts when loading with `tf.keras.models.load_model`. If renaming is not possible, pass the object in the `custom_objects` parameter of the load function.
WARNING:absl:<keras.layers.recurrent.LSTMCell object at 0x7f3a8465a160> has the same name 'LSTMCell' as a built-in Keras object. Consider renaming <class 'keras.layers.recurrent.LSTMCell'> to avoid naming conflicts when loading with `tf.keras.models.load_model`. If renaming is not possible, pass the object in the `custom_objects` parameter of the load function.
2022-08-16 15:44:12.614816: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 4
2022-08-16 15:44:12.614936: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-08-16 15:44:12.633044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14627 MB memory:  -> device: 0, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:da:00.0, compute capability: 7.0
2022-08-16 15:44:12.634138: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14627 MB memory:  -> device: 1, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:db:00.0, compute capability: 7.0
2022-08-16 15:44:12.635134: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 14627 MB memory:  -> device: 2, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dc:00.0, compute capability: 7.0
2022-08-16 15:44:12.636200: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 14627 MB memory:  -> device: 3, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dd:00.0, compute capability: 7.0
2022-08-16 15:44:12.667021: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1164] Optimization results for grappler item: graph_to_optimize
  function_optimizer: Graph size after: 334 nodes (228), 383 edges (260), time = 8.111ms.
  function_optimizer: Graph size after: 334 nodes (0), 383 edges (0), time = 4.232ms.
Optimization results for grappler item: while_body_93140
  function_optimizer: function_optimizer did nothing. time = 0.005ms.
  function_optimizer: function_optimizer did nothing. time = 0ms.
Optimization results for grappler item: while_body_93550
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  function_optimizer: function_optimizer did nothing. time = 0ms.
Optimization results for grappler item: while_cond_92313
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_cond_93139
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_cond_93549
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_body_92314
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_cond_92723
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_body_92724
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.

2022-08-16 15:44:12.837667: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 4
2022-08-16 15:44:12.837739: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-08-16 15:44:12.855667: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14627 MB memory:  -> device: 0, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:da:00.0, compute capability: 7.0
2022-08-16 15:44:12.856749: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14627 MB memory:  -> device: 1, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:db:00.0, compute capability: 7.0
2022-08-16 15:44:12.857738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 14627 MB memory:  -> device: 2, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dc:00.0, compute capability: 7.0
2022-08-16 15:44:12.858802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 14627 MB memory:  -> device: 3, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dd:00.0, compute capability: 7.0
2022-08-16 15:44:12.896488: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1164] Optimization results for grappler item: graph_to_optimize
  constant_folding: Graph size after: 106 nodes (-92), 111 edges (-108), time = 4.422ms.
  function_optimizer: Graph size after: 106 nodes (0), 111 edges (0), time = 2.035ms.
  constant_folding: Graph size after: 106 nodes (0), 111 edges (0), time = 1.936ms.
  function_optimizer: Graph size after: 106 nodes (0), 111 edges (0), time = 2.095ms.
Optimization results for grappler item: while_body_93140
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.783ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.648ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_body_93550
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.778ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.65ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_cond_92313
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.265ms.
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.181ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_cond_93139
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.258ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.179ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_cond_93549
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.893ms.
  function_optimizer: function_optimizer did nothing. time = 0.004ms.
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.633ms.
  function_optimizer: function_optimizer did nothing. time = 0.004ms.
Optimization results for grappler item: while_body_92314
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 1.257ms.
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.959ms.
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
Optimization results for grappler item: while_cond_92723
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.377ms.
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  constant_folding: Graph size after: 14 nodes (0), 4 edges (0), time = 0.274ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
Optimization results for grappler item: while_body_92724
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 1.113ms.
  function_optimizer: function_optimizer did nothing. time = 0.002ms.
  constant_folding: Graph size after: 50 nodes (0), 50 edges (0), time = 0.952ms.
  function_optimizer: function_optimizer did nothing. time = 0.002ms.

3.3 return_state=True

With return_state=True, the state returned by one layer can be used as the initial state of the next layer.

def One_LSTM_time_state():
    input = nn.Input(shape=[1,6,3],batch_size=1,name="input")
    middle1 = tf.squeeze(input,axis=1)
    middle = tf.transpose(middle1,[1,0,2])
    output,h_state,c_state = nn.LSTM(4,time_major=True,return_sequences=True,return_state=True,name='one')(middle)
    model = tf.keras.models.Model(inputs=input,outputs=[output,h_state,c_state],name="One_LSTM_time_state")
    return model
model = One_LSTM_time_state()
#tf.keras.utils.plot_model(model,to_file=f'tensorflow/{model.name}.png',show_shapes=True,show_layer_names=True,show_dtype=True)
# Save the Keras model in SavedModel format.
model.save("tensorflow/One_LSTM_time_state")
# Convert to ONNX (opset 11) with a fixed input shape of (1,1,6,3).
spec = (tf.TensorSpec((1,1,6,3),tf.float32,name="input"),)
output_path="tensorflow/"+model.name+'.onnx'
model_proto,_=tf2onnx.convert.from_keras(model,input_signature=spec,opset=11,output_path=output_path)
output_names=[n.name for n in model_proto.graph.output]
# Simplify the exported graph and save it next to the original.
onnx_model = onnx.load(output_path)
model_sim,check = simplify(onnx_model)
assert check,"simplified onnx model could not be validated"
save_path = os.path.splitext(output_path)[0]+"_sim.onnx"
onnx.save(model_sim,save_path)

def Two_LSTM_time_state():
    input = nn.Input(shape=[1,6,3],batch_size=1,name="input")
    middle1 = tf.squeeze(input,axis=1)
    middle = tf.transpose(middle1,[1,0,2])
    output1,h_state,c_state = nn.LSTM(4,time_major=True,return_sequences=True,return_state=True,name='one')(middle)
    output,h_state1,c_state1 = nn.LSTM(4,time_major=True,return_sequences=True,return_state=True,name='two')(output1,initial_state=(h_state,c_state))
    model = tf.keras.models.Model(inputs=input,outputs=[output,h_state1,c_state1],name="Two_LSTM_time_state")
    return model
model = Two_LSTM_time_state()
#tf.keras.utils.plot_model(model,to_file=f'tensorflow/{model.name}.png',show_shapes=True,show_layer_names=True,show_dtype=True)
# Save the Keras model in SavedModel format.
model.save("tensorflow/Two_LSTM_time_state")
# Convert to ONNX (opset 11) with a fixed input shape of (1,1,6,3).
spec = (tf.TensorSpec((1,1,6,3),tf.float32,name="input"),)
output_path="tensorflow/"+model.name+'.onnx'
model_proto,_=tf2onnx.convert.from_keras(model,input_signature=spec,opset=11,output_path=output_path)
output_names=[n.name for n in model_proto.graph.output]
# Simplify the exported graph and save it next to the original.
onnx_model = onnx.load(output_path)
model_sim,check = simplify(onnx_model)
assert check,"simplified onnx model could not be validated"
save_path = os.path.splitext(output_path)[0]+"_sim.onnx"
onnx.save(model_sim,save_path)

def Bi_Two_LSTM_time_state():
    input = nn.Input(shape=[1,6,3],batch_size=1,name="input")
    middle1 = tf.squeeze(input,axis=1)
    middle = tf.transpose(middle1,[1,0,2])
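    # With return_state=True, Bidirectional returns [output, forward_h, forward_c, backward_h, backward_c].
    output1,h_state,c_state,h_state1,c_state1= nn.Bidirectional(nn.LSTM(4,time_major=True,return_sequences=True,return_state=True,name='one'),merge_mode="concat")(middle)
    # The second layer also sets return_state=True, so `output` is a list of five tensors.
    output= nn.Bidirectional(nn.LSTM(4,time_major=True,return_sequences=True,return_state=True,name='two'),merge_mode="concat")(output1,initial_state=(h_state,c_state,h_state1,c_state1))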
    model = tf.keras.models.Model(inputs=input,outputs=output,name="Bi_Two_LSTM_time_state")
    return model
model = Bi_Two_LSTM_time_state()
#tf.keras.utils.plot_model(model,to_file=f'tensorflow/{model.name}.png',show_shapes=True,show_layer_names=True,show_dtype=True)
# Save the Keras model in SavedModel format.
model.save("tensorflow/Bi_Two_LSTM_time_state")
# Convert to ONNX (opset 11) with a fixed input shape of (1,1,6,3).
spec = (tf.TensorSpec((1,1,6,3),tf.float32,name="input"),)
output_path="tensorflow/"+model.name+'.onnx'
model_proto,_=tf2onnx.convert.from_keras(model,input_signature=spec,opset=11,output_path=output_path)
output_names=[n.name for n in model_proto.graph.output]
# Simplify the exported graph and save it next to the original.
onnx_model = onnx.load(output_path)
model_sim,check = simplify(onnx_model)
assert check,"simplified onnx model could not be validated"
save_path = os.path.splitext(output_path)[0]+"_sim.onnx"
onnx.save(model_sim,save_path)
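
The state hand-off used in Two_LSTM_time_state can be illustrated without TensorFlow. The following is a minimal NumPy sketch, not the Keras implementation: the function name lstm_layer, the helper init, and the random weights are assumptions made for illustration, and only the gate order (i, f, c, o) follows the Keras convention. It shows the final (h, c) of the first layer being fed in as the initial state of the second layer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_layer(x, kernel, recurrent, bias, h0=None, c0=None):
    """Run one LSTM layer over a time-major input.

    x: (timesteps, batch, features), kernel: (features, 4*units),
    recurrent: (units, 4*units), bias: (4*units,).
    Returns (outputs of all steps, final h, final c).
    """
    units = recurrent.shape[0]
    batch = x.shape[1]
    h = np.zeros((batch, units), dtype=x.dtype) if h0 is None else h0
    c = np.zeros((batch, units), dtype=x.dtype) if c0 is None else c0
    outputs = []
    for x_t in x:                      # iterate over the time axis
        z = x_t @ kernel + h @ recurrent + bias
        i, f, g, o = np.split(z, 4, axis=-1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs), h, c

rng = np.random.default_rng(0)

def init(features, units):
    # Hypothetical random weights, only used to exercise the sketch.
    return (rng.standard_normal((features, 4 * units)).astype(np.float32),
            rng.standard_normal((units, 4 * units)).astype(np.float32),
            np.zeros(4 * units, dtype=np.float32))

x = rng.standard_normal((6, 1, 3)).astype(np.float32)  # (timesteps, batch, features)
w1, w2 = init(3, 4), init(4, 4)

seq1, h1, c1 = lstm_layer(x, *w1)                     # layer 'one', zero initial state
seq2, h2, c2 = lstm_layer(seq1, *w2, h0=h1, c0=c1)    # layer 'two', initialised from layer 'one'
print(seq2.shape, h2.shape, c2.shape)                 # (6, 1, 4) (1, 4) (1, 4)
```

The last row of seq2 equals h2, which is why returning both the sequence and the state carries some redundancy when only the hidden state is needed.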

WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
WARNING:absl:Found untraced functions such as lstm_cell_151_layer_call_fn, lstm_cell_151_layer_call_and_return_conditional_losses while saving (showing 2 of 2). These functions will not be directly callable after loading.


INFO:tensorflow:Assets written to: tensorflow/One_LSTM_time_state/assets
2022-08-10 10:27:29.569744: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 4
2022-08-10 10:27:29.569887: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-08-10 10:27:29.587666: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14627 MB memory:  -> device: 0, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:da:00.0, compute capability: 7.0
2022-08-10 10:27:29.588730: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14627 MB memory:  -> device: 1, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:db:00.0, compute capability: 7.0
2022-08-10 10:27:29.589789: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 14627 MB memory:  -> device: 2, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dc:00.0, compute capability: 7.0
2022-08-10 10:27:29.590839: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 14627 MB memory:  -> device: 3, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dd:00.0, compute capability: 7.0
2022-08-10 10:27:29.657141: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 4
2022-08-10 10:27:29.657227: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-08-10 10:27:29.674878: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14627 MB memory:  -> device: 0, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:da:00.0, compute capability: 7.0
2022-08-10 10:27:29.675934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14627 MB memory:  -> device: 1, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:db:00.0, compute capability: 7.0
2022-08-10 10:27:29.676968: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 14627 MB memory:  -> device: 2, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dc:00.0, compute capability: 7.0
2022-08-10 10:27:29.678019: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 14627 MB memory:  -> device: 3, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dd:00.0, compute capability: 7.0


WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
WARNING:absl:Found untraced functions such as lstm_cell_152_layer_call_fn, lstm_cell_152_layer_call_and_return_conditional_losses, lstm_cell_153_layer_call_fn, lstm_cell_153_layer_call_and_return_conditional_losses while saving (showing 4 of 4). These functions will not be directly callable after loading.


INFO:tensorflow:Assets written to: tensorflow/Two_LSTM_time_state/assets
2022-08-10 10:27:34.989735: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 4
2022-08-10 10:27:34.989854: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-08-10 10:27:35.007588: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14627 MB memory:  -> device: 0, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:da:00.0, compute capability: 7.0
2022-08-10 10:27:35.008641: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14627 MB memory:  -> device: 1, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:db:00.0, compute capability: 7.0
2022-08-10 10:27:35.009675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 14627 MB memory:  -> device: 2, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dc:00.0, compute capability: 7.0
2022-08-10 10:27:35.010708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 14627 MB memory:  -> device: 3, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dd:00.0, compute capability: 7.0
2022-08-10 10:27:35.118860: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 4
2022-08-10 10:27:35.118956: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-08-10 10:27:35.136643: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14627 MB memory:  -> device: 0, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:da:00.0, compute capability: 7.0
2022-08-10 10:27:35.137703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14627 MB memory:  -> device: 1, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:db:00.0, compute capability: 7.0
2022-08-10 10:27:35.138736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 14627 MB memory:  -> device: 2, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dc:00.0, compute capability: 7.0
2022-08-10 10:27:35.139769: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 14627 MB memory:  -> device: 3, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dd:00.0, compute capability: 7.0


WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
WARNING:absl:Found untraced functions such as lstm_cell_155_layer_call_fn, lstm_cell_155_layer_call_and_return_conditional_losses, lstm_cell_156_layer_call_fn, lstm_cell_156_layer_call_and_return_conditional_losses, lstm_cell_158_layer_call_fn while saving (showing 5 of 8). These functions will not be directly callable after loading.


INFO:tensorflow:Assets written to: tensorflow/Bi_Two_LSTM_time_state/assets
2022-08-10 10:27:50.572328: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 4
2022-08-10 10:27:50.572459: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-08-10 10:27:50.590225: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14627 MB memory:  -> device: 0, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:da:00.0, compute capability: 7.0
2022-08-10 10:27:50.591290: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14627 MB memory:  -> device: 1, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:db:00.0, compute capability: 7.0
2022-08-10 10:27:50.592334: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 14627 MB memory:  -> device: 2, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dc:00.0, compute capability: 7.0
2022-08-10 10:27:50.593388: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 14627 MB memory:  -> device: 3, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dd:00.0, compute capability: 7.0
2022-08-10 10:27:50.800566: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 4
2022-08-10 10:27:50.800672: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-08-10 10:27:50.818458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14627 MB memory:  -> device: 0, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:da:00.0, compute capability: 7.0
2022-08-10 10:27:50.819552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14627 MB memory:  -> device: 1, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:db:00.0, compute capability: 7.0
2022-08-10 10:27:50.820605: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 14627 MB memory:  -> device: 2, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dc:00.0, compute capability: 7.0
2022-08-10 10:27:50.821643: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 14627 MB memory:  -> device: 3, name: Tesla V100-PCIE-16GB-LS, pci bus id: 0000:dd:00.0, compute capability: 7.0

3.4 Inspecting the generated models

tf_models = sorted(os.listdir('tensorflow'))
tf_models_path=[os.path.join('tensorflow',p) for p in tf_models if p.endswith('onnx')]
print(tf_models)

['Bi_Two_LSTM_batch', 'Bi_Two_LSTM_batch.onnx', 'Bi_Two_LSTM_batch_sim.onnx', 'Bi_Two_LSTM_time', 'Bi_Two_LSTM_time.onnx', 'Bi_Two_LSTM_time_sim.onnx', 'Bi_Two_LSTM_time_state', 'Bi_Two_LSTM_time_state.onnx', 'Bi_Two_LSTM_time_state_sim.onnx', 'One_LSTM_batch', 'One_LSTM_batch.onnx', 'One_LSTM_batch_sim.onnx', 'One_LSTM_time', 'One_LSTM_time.onnx', 'One_LSTM_time_sim.onnx', 'One_LSTM_time_state', 'One_LSTM_time_state.onnx', 'One_LSTM_time_state_sim.onnx', 'Two_LSTM_batch', 'Two_LSTM_batch.onnx', 'Two_LSTM_batch_sim.onnx', 'Two_LSTM_time', 'Two_LSTM_time.onnx', 'Two_LSTM_time_sim.onnx', 'Two_LSTM_time_state', 'Two_LSTM_time_state.onnx', 'Two_LSTM_time_state_sim.onnx']

tf_models_path

['tensorflow/Bi_Two_LSTM_batch.onnx',
 'tensorflow/Bi_Two_LSTM_batch_sim.onnx',
 'tensorflow/Bi_Two_LSTM_time.onnx',
 'tensorflow/Bi_Two_LSTM_time_sim.onnx',
 'tensorflow/Bi_Two_LSTM_time_state.onnx',
 'tensorflow/Bi_Two_LSTM_time_state_sim.onnx',
 'tensorflow/One_LSTM_batch.onnx',
 'tensorflow/One_LSTM_batch_sim.onnx',
 'tensorflow/One_LSTM_time.onnx',
 'tensorflow/One_LSTM_time_sim.onnx',
 'tensorflow/One_LSTM_time_state.onnx',
 'tensorflow/One_LSTM_time_state_sim.onnx',
 'tensorflow/Two_LSTM_batch.onnx',
 'tensorflow/Two_LSTM_batch_sim.onnx',
 'tensorflow/Two_LSTM_time.onnx',
 'tensorflow/Two_LSTM_time_sim.onnx',
 'tensorflow/Two_LSTM_time_state.onnx',
 'tensorflow/Two_LSTM_time_state_sim.onnx']
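
Each exported model appears in the listing together with its simplified counterpart. As a small sketch of how such pairs could be collected before comparing them, the helper below maps every original .onnx file to its _sim.onnx sibling. The function name pair_onnx_models and the sample file list are assumptions for illustration; only the "_sim" naming scheme comes from the code above.

```python
import os

def pair_onnx_models(names, suffix="_sim"):
    """Map each original .onnx file name to its simplified counterpart, if present."""
    onnx_files = {n for n in names if n.endswith(".onnx")}
    pairs = {}
    for name in sorted(onnx_files):
        stem, ext = os.path.splitext(name)
        if stem.endswith(suffix):
            continue                    # skip the simplified files themselves
        sim = stem + suffix + ext
        if sim in onnx_files:
            pairs[name] = sim
    return pairs

# Hypothetical directory listing: one SavedModel directory and three ONNX files.
names = ["One_LSTM_time", "One_LSTM_time.onnx", "One_LSTM_time_sim.onnx", "Two_LSTM_time.onnx"]
print(pair_onnx_models(names))  # {'One_LSTM_time.onnx': 'One_LSTM_time_sim.onnx'}
```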

! du -sh tensorflow/*

4.2M	tensorflow/Bi_Two_LSTM_batch
8.0K	tensorflow/Bi_Two_LSTM_batch.onnx
8.0K	tensorflow/Bi_Two_LSTM_batch_sim.onnx
4.0M	tensorflow/Bi_Two_LSTM_time
8.0K	tensorflow/Bi_Two_LSTM_time.onnx
8.0K	tensorflow/Bi_Two_LSTM_time_sim.onnx
4.0M	tensorflow/Bi_Two_LSTM_time_state
8.0K	tensorflow/Bi_Two_LSTM_time_state.onnx
8.0K	tensorflow/Bi_Two_LSTM_time_state_sim.onnx
708K	tensorflow/One_LSTM_batch
4.0K	tensorflow/One_LSTM_batch.onnx
4.0K	tensorflow/One_LSTM_batch_sim.onnx
684K	tensorflow/One_LSTM_time
4.0K	tensorflow/One_LSTM_time.onnx
4.0K	tensorflow/One_LSTM_time_sim.onnx
696K	tensorflow/One_LSTM_time_state
4.0K	tensorflow/One_LSTM_time_state.onnx
4.0K	tensorflow/One_LSTM_time_state_sim.onnx
1.4M	tensorflow/Two_LSTM_batch
4.0K	tensorflow/Two_LSTM_batch.onnx
4.0K	tensorflow/Two_LSTM_batch_sim.onnx
1.3M	tensorflow/Two_LSTM_time
4.0K	tensorflow/Two_LSTM_time.onnx
4.0K	tensorflow/Two_LSTM_time_sim.onnx
1.3M	tensorflow/Two_LSTM_time_state
4.0K	tensorflow/Two_LSTM_time_state.onnx
4.0K	tensorflow/Two_LSTM_time_state_sim.onnx

def onnx_infer(model_path,data):
    """Run an ONNX model with onnxruntime and return its first output.

    Args:
        model_path (str): path to the .onnx file.
        data (np.ndarray): input array fed to the model's first input.
    """
    onnx_session=onnxruntime.InferenceSession(model_path)
    input_name = onnx_session.get_inputs()[0].name
    output_name = onnx_session.get_outputs()[0].name
    result = onnx_session.run([output_name],{input_name:data})
    return result[0]

# tf_models_path=["tensorflow/One_LSTM_time.onnx"]  # uncomment to check a single model only

test_data = np.random.random(size=(1,1,6,3)).astype(np.float32) # batch,channel,height,width
for i,onnx_path in enumerate(tf_models_path):
    base_path = os.path.splitext(onnx_path)[0]
    if not base_path.endswith('sim'):
        results={}
        onnx_sim=base_path+'_sim.onnx'
        tf_result = tf.keras.models.load_model(base_path)(tf.convert_to_tensor(test_data))
        # print(f'base_path:{base_path} len:{len(tf_result)} type:{type(tf_result)}')
        if isinstance(tf_result,list):
            tf_result=tf_result[0].numpy()
        results[os.path.basename(base_path)]=tf_result
        onnx_result = onnx_infer(onnx_path,test_data)
        results[os.path.basename(onnx_path)]=onnx_result
        sim_result = onnx_infer(onnx_sim,test_data)
        results[os.path.basename(onnx_sim)]=sim_result
        try:
            values = list(results.values())
            np.testing.assert_allclose(values[0],values[1],rtol=1e-5)
            np.testing.assert_allclose(values[1],values[2],rtol=1e-5)
            print(f"{list(results.keys())} have same results")
        except AssertionError:
            print(f"{list(results.keys())} have different results")
        finally:
            results={}

WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.

['Bi_Two_LSTM_batch', 'Bi_Two_LSTM_batch.onnx', 'Bi_Two_LSTM_batch_sim.onnx'] have same results
['Bi_Two_LSTM_time', 'Bi_Two_LSTM_time.onnx', 'Bi_Two_LSTM_time_sim.onnx'] have same results
['Bi_Two_LSTM_time_state', 'Bi_Two_LSTM_time_state.onnx', 'Bi_Two_LSTM_time_state_sim.onnx'] have same results
['One_LSTM_batch', 'One_LSTM_batch.onnx', 'One_LSTM_batch_sim.onnx'] have same results
['One_LSTM_time', 'One_LSTM_time.onnx', 'One_LSTM_time_sim.onnx'] have same results
['One_LSTM_time_state', 'One_LSTM_time_state.onnx', 'One_LSTM_time_state_sim.onnx'] have same results
['Two_LSTM_batch', 'Two_LSTM_batch.onnx', 'Two_LSTM_batch_sim.onnx'] have same results
['Two_LSTM_time', 'Two_LSTM_time.onnx', 'Two_LSTM_time_sim.onnx'] have same results
['Two_LSTM_time_state', 'Two_LSTM_time_state.onnx', 'Two_LSTM_time_state_sim.onnx'] have same results
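
The check above only compares the first output of each model, while the return_state models also output h and c. A possible extension, sketched here under assumptions, is to compare every output pair. The helper name compare_outputs and the tolerances are illustrative; with onnxruntime, all outputs can be obtained by passing None as the output list, i.e. onnx_session.run(None, {input_name: data}).

```python
import numpy as np

def compare_outputs(reference, candidate, rtol=1e-5, atol=1e-6):
    """Compare two lists of arrays pairwise.

    Returns a list of (index, max_abs_diff, ok) tuples, one per output.
    """
    if len(reference) != len(candidate):
        raise ValueError(f"output count differs: {len(reference)} vs {len(candidate)}")
    report = []
    for idx, (ref, out) in enumerate(zip(reference, candidate)):
        ref, out = np.asarray(ref), np.asarray(out)
        same_shape = ref.shape == out.shape
        if same_shape and ref.size:
            max_diff = float(np.max(np.abs(ref - out)))
        else:
            max_diff = 0.0 if same_shape else float("inf")
        ok = same_shape and np.allclose(ref, out, rtol=rtol, atol=atol)
        report.append((idx, max_diff, bool(ok)))
    return report

# Synthetic stand-ins for (output, h_state) from two runtimes.
a = [np.ones((6, 1, 4), dtype=np.float32), np.zeros((1, 4), dtype=np.float32)]
b = [a[0] + 1e-7, a[1] + 1e-2]          # first output nearly equal, second clearly off
for idx, max_diff, ok in compare_outputs(a, b):
    print(idx, max_diff, "same" if ok else "different")
```

Reporting the maximum absolute difference per output makes it easier to see which tensor diverges than a single pass/fail message.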

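ONNX structure:
Unlike the previous two frameworks, which return the state by default, TensorFlow lets you choose whether to return it (return_state). With return_sequences=True, as used throughout this section, the layer also returns the output of every time step rather than only the last one.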

Type	Single layer	Two layers	Two-layer bidirectional
time_major=False,return_state=False
time_major=True,return_state=False
time_major=True,return_state=True
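
When reading the exported graphs, it helps to know the output shapes of the ONNX LSTM operator itself. The sketch below encodes the shapes given in the ONNX operator specification for the default layout (the only one available at opset 11). The helper name onnx_lstm_output_shapes is an assumption for illustration and is not part of any library.

```python
def onnx_lstm_output_shapes(seq_length, batch_size, hidden_size, bidirectional=False):
    """Return the shapes of the three optional outputs of an ONNX LSTM node.

    Y holds the hidden state of every time step; Y_h and Y_c hold only the
    last hidden and cell state of each direction.
    """
    num_directions = 2 if bidirectional else 1
    return {
        "Y": (seq_length, num_directions, batch_size, hidden_size),
        "Y_h": (num_directions, batch_size, hidden_size),
        "Y_c": (num_directions, batch_size, hidden_size),
    }

# Shapes for the models in this section: 6 time steps, batch 1, 4 hidden units.
print(onnx_lstm_output_shapes(6, 1, 4))
print(onnx_lstm_output_shapes(6, 1, 4, bidirectional=True))
```

Because Y carries an extra num_directions axis, converters typically add Squeeze, Transpose, or Reshape nodes around the LSTM node to recover the framework's own layout.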

4 LSTM conversion tool

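A small tool is available at https://gitee.com/tdddeeel/lstm_pytorch_tensorflow_paddle that converts a bidirectional LSTM into two unidirectional LSTMs.
The two-layer bidirectional LSTMs from PyTorch (batch_first=True) and TensorFlow (time_major=True) are used as examples. The accuracy of the ONNX models before and after conversion can be compared in the same way as in the examples above; in my tests the results matched. At present the tool only works on Paddle and PyTorch models.
Usage: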

./bilstm_opt --onnx_path ... --save_path ..

The resulting ONNX graphs:

Type	Before conversion	After conversion
pytorch
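
How bilstm_opt rewrites the graph internally is not shown in this post. The NumPy sketch below only illustrates why a bidirectional LSTM can in principle be replaced by two unidirectional ones: the backward direction is equivalent to a forward LSTM run on the time-reversed input, with its output reversed again. All function names and the random weights are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm(x, W, R, b, reverse=False):
    """Unidirectional LSTM over time-major x of shape (T, B, F); returns (T, B, H).

    With reverse=True the sequence is processed from the last step to the
    first, and each output is stored at its original time index.
    """
    T, B, _ = x.shape
    H = R.shape[0]
    h = np.zeros((B, H))
    c = np.zeros((B, H))
    y = np.zeros((T, B, H))
    steps = range(T - 1, -1, -1) if reverse else range(T)
    for t in steps:
        z = x[t] @ W + h @ R + b
        i, f, g, o = np.split(z, 4, axis=-1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        y[t] = h
    return y

def bidirectional(x, fw, bw):
    # Reference: one forward pass and one backward pass, concatenated on the feature axis.
    return np.concatenate([lstm(x, *fw), lstm(x, *bw, reverse=True)], axis=-1)

def split_bidirectional(x, fw, bw):
    # Same result using only forward LSTMs plus explicit reversal of the time axis.
    forward = lstm(x, *fw)
    backward = lstm(x[::-1], *bw)[::-1]
    return np.concatenate([forward, backward], axis=-1)

rng = np.random.default_rng(0)
T, B, F, H = 6, 1, 3, 4

def weights():
    # Hypothetical random weights, only used to exercise the sketch.
    return rng.standard_normal((F, 4 * H)), rng.standard_normal((H, 4 * H)), np.zeros(4 * H)

x = rng.standard_normal((T, B, F))
fw, bw = weights(), weights()
print(np.allclose(bidirectional(x, fw, bw), split_bidirectional(x, fw, bw)))  # True
```

Comparing the two variants with np.allclose mirrors the before/after accuracy check recommended for the converted ONNX models.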
