This error comes from how the attention axes are configured in the `MultiHeadAttention` layer. By default, attention (and its softmax) is applied over the time-step axis only, but in your code you also want the layer to attend across the feature dimension. You can control this with the `attention_axes` argument, setting it to `(1, 2)` so that attention spans both the time-step and feature axes. The corrected code is as follows:
```python
from keras.models import Model
from keras.layers import Input, Dense, Conv1D, CuDNNGRU, Dropout, ActivityRegularization
from keras.layers import Concatenate, Flatten
from keras.layers import LayerNormalization, MultiHeadAttention

def LSTNet(trainX1, trainX2, trainY, config):
    input1 = Input(shape=(trainX1.shape[1], trainX1.shape[2]))
    input2 = Input(shape=(trainX2.shape[1], trainX2.shape[2]))
    # CNN layer for temporal feature extraction
    conv1 = Conv1D(filters=config['conv_filters'], kernel_size=config['conv_kernel'], activation='relu')(input1)
    dropout1 = Dropout(config['dropout'])(conv1)
    # GRU layer for capturing temporal dependencies
    gru1 = CuDNNGRU(units=config['gru_units'], return_sequences=True)(dropout1)
    dropout2 = Dropout(config['dropout'])(gru1)
    # Attention layer for capturing dependencies across both time steps and features
    attention1_output = MultiHeadAttention(num_heads=4, key_dim=16, attention_axes=(1, 2))(dropout2, dropout2)
    attention1_output = LayerNormalization()(attention1_output)
    dropout3 = Dropout(config['dropout'])(attention1_output)
    # Concatenate with external features
    concat1 = Concatenate(axis=2)([dropout3, input2])
    # Fully connected layer for prediction
    flatten1 = Flatten()(concat1)
    dense1 = Dense(units=config['dense_units'], activation='relu')(flatten1)
    dropout4 = Dropout(config['dropout'])(dense1)
    # ActivityRegularization is a layer, not a regularizer object, so apply it
    # to the output activations instead of passing it to Dense's
    # activity_regularizer argument.
    output = Dense(units=1, activation='sigmoid')(dropout4)
    output = ActivityRegularization(l1=config['l1'], l2=config['l2'])(output)
    return Model(inputs=[input1, input2], outputs=output)
```
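To see what `attention_axes` changes in isolation, here is a minimal sketch (the shapes and layer sizes are illustrative assumptions, not taken from your model): with the default setting, the softmax runs over the time-step axis only; with `attention_axes=(1, 2)`, attention spans both the time-step and feature axes. In both cases the output keeps the shape of the query input.

```python
import numpy as np
from keras.layers import MultiHeadAttention

# Toy batch: 2 sequences, 8 time steps, 16 features (illustrative shapes).
x = np.random.rand(2, 8, 16).astype("float32")

# Default: attention is applied over the time-step axis only.
mha_time = MultiHeadAttention(num_heads=4, key_dim=16)
out_time = mha_time(x, x)

# attention_axes=(1, 2): attention spans both time steps and features,
# as in the model above.
mha_both = MultiHeadAttention(num_heads=4, key_dim=16, attention_axes=(1, 2))
out_both = mha_both(x, x)

print(out_time.shape)  # (2, 8, 16)
print(out_both.shape)  # (2, 8, 16)
```

Note that the output shape is unchanged either way; what differs is which positions each attention weight is normalized over, and therefore which dependencies the layer can model.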