Neural Networks Beginnings - page 3
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# Define the architecture of the neural network
model = keras.Sequential(
    [
        layers.LSTM(128, input_shape=(None, 13)),
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ]
)
# Compile the model
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss=keras.losses.CategoricalCrossentropy(),
    metrics=["accuracy"],
)
# Load the audio file (assumed here to be a mono 16 kHz WAV)
audio_file = tf.io.read_file("audio.wav")
audio, _ = tf.audio.decode_wav(audio_file)
audio = tf.squeeze(audio, axis=-1)  # drop the channel dimension
audio = tf.cast(audio, tf.float32)
# Split the signal into overlapping frames of 640 samples with a hop of 320;
# pad_end=True zero-pads the tail so the last frame is complete
frame_length = 640
frame_step = 320
frames = tf.signal.frame(audio, frame_length, frame_step, pad_end=True)
# Extract MFCC features: magnitude spectrum -> mel filterbank -> log -> DCT
spectrograms = tf.abs(tf.signal.rfft(frames))
num_spectrogram_bins = spectrograms.shape[-1]
mel_weight_matrix = tf.signal.linear_to_mel_weight_matrix(
    num_mel_bins=13,
    num_spectrogram_bins=num_spectrogram_bins,
    sample_rate=16000,  # 16 kHz sample rate is assumed
)
mel_spectrograms = tf.matmul(spectrograms, mel_weight_matrix)
log_mel_spectrograms = tf.math.log(mel_spectrograms + 1e-6)  # avoid log(0)
# mfccs has shape [num_frames, 13], matching the model's input_shape=(None, 13)
mfccs = tf.signal.mfccs_from_log_mel_spectrograms(log_mel_spectrograms)
# Prepare the data for training
labels = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "zero"]
label_to_index = dict(zip(labels, range(len(labels))))
index_to_label = dict(zip(range(len(labels)), labels))
# Placeholder ground truth for this toy example: assume the clip contains "one"
target = tf.one_hot(label_to_index["one"], len(labels))
X_train = mfccs[None, ...]  # add a batch dimension: (1, num_frames, 13)
y_train = target[None, ...]  # one-hot target of shape (1, 10)
# Training the model
history = model.fit(X_train, y_train, epochs=10)
# Make a prediction
predicted_probs = model.predict(X_train)
predicted_index = int(tf.argmax(predicted_probs, axis=-1)[0])
predicted_label = index_to_label[predicted_index]
# Output the result
print("Predicted label:", predicted_label)
This code implements a toy automatic speech recognition pipeline using a neural network built with TensorFlow and Keras. The first step is to define the network architecture using the Keras Sequential API. A recurrent LSTM layer takes in a sequence of frames, each described by 13 MFCC features. It is followed by two fully connected layers with the relu activation function and one output layer with the softmax activation function, which outputs a probability for each of the ten word classes.
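Because the LSTM layer is declared with input_shape=(None, 13), the model accepts sequences of any length, as long as each frame carries 13 features. A minimal sketch reusing the model defined above (the dummy shapes are illustrative):

# Dummy batches with different sequence lengths but 13 features per frame
short_batch = tf.zeros([1, 25, 13])
long_batch = tf.zeros([1, 100, 13])
print(model(short_batch).shape)  # (1, 10)
print(model(long_batch).shape)   # (1, 10)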
Next, the model is compiled using the compile method: the Adam optimizer with a learning rate of 0.001 is chosen, categorical cross-entropy is the loss function, and classification accuracy is used as the metric.
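If the targets were stored as integer class indices rather than one-hot vectors, the same model could instead be compiled with sparse categorical cross-entropy; this is an alternative sketch, not part of the original pipeline:

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss=keras.losses.SparseCategoricalCrossentropy(),  # expects integer labels
    metrics=["accuracy"],
)
# With this loss, the target is just the class index, e.g.:
# y_train = tf.constant([label_to_index["one"]])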
Then a sound file in WAV format is loaded, decoded with tf.audio.decode_wav, and converted to float32 values. The signal is then split into overlapping frames of 640 samples with a hop of 320; if the signal does not divide evenly into frames, the tail is zero-padded.
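As a concrete check of the framing arithmetic, assume a one-second clip at 16 kHz (the sample rate is an assumption, not given in the text): with a frame length of 640 and a hop of 320, tf.signal.frame produces ceil(16000 / 320) = 50 frames when the tail is zero-padded.

one_second = tf.zeros([16000])  # 1 s of silence at an assumed 16 kHz
frames = tf.signal.frame(one_second, frame_length=640, frame_step=320, pad_end=True)
print(frames.shape)  # (50, 640)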