mcfly

mcfly.find_best_architecture(X_train, y_train, X_val, y_val, verbose=True, number_of_models=5, nr_epochs=5, subset_size=100, outputpath=None, model_path=None, metric=None, class_weight=None, **kwargs)

Tries out a number of models on a subsample of the data, and outputs the best found architecture and hyperparameters.

Infers the task (classification vs. regression) automatically from the input data. For further details, see the Technical documentation.

Parameters:

X_train (Supported types:) –
- numpy array
- tf.data dataset. Should return a tuple of (inputs, targets) or (inputs, targets, sample_weights)
- generator or keras.utils.Sequence. Should return a tuple of (inputs, targets) or (inputs, targets, sample_weights)
The input dataset for training, of shape (num_samples, num_timesteps, num_channels). More details can be found in the documentation for the Keras function Model.fit() [1].
y_train (numpy array) – The output classes for the training data, in binary format, of shape (num_samples, num_classes). If the training data is a dataset, generator or keras.utils.Sequence, y_train should not be specified.
X_val (Supported types:) –
- numpy array
- tf.data dataset. Should return a tuple of (inputs, targets) or (inputs, targets, sample_weights)
- generator or keras.utils.Sequence. Should return a tuple of (inputs, targets) or (inputs, targets, sample_weights)
The input dataset for validation, of shape (num_samples_val, num_timesteps, num_channels). More details can be found in the documentation for the Keras function Model.fit() [1].
y_val (numpy array) – The output classes for the validation data, in binary format, of shape (num_samples_val, num_classes). If the validation data is a dataset, generator or keras.utils.Sequence, y_val should not be specified.
verbose (bool, optional) – Flag for displaying verbose output
number_of_models (int, optional) – The number of models to generate and test
nr_epochs (int, optional) – The number of epochs for which each model is trained
subset_size (int, optional) – The size of the subset of the data that is used for finding the optimal architecture. Default is 100. If set to None, the entire dataset is used. Subsetting is not supported for tf.data.Dataset objects or generators.
outputpath (str, optional) – File location to store the model results
model_path (str, optional) – Directory to save the models as HDF5 files
class_weight (dict, optional) – Dictionary containing class weights (example: {0: 0.5, 1: 2.})
metric (str, optional) – metric that is used to evaluate the model on the validation set. See https://keras.io/metrics/ for possible metrics
**kwargs (key-value parameters) – parameters for generating the models (see docstring for modelgen.generate_models)

Returns:

best_model (Keras model) – Best performing model, already trained on a small sample data set.
best_params (dict) – Dictionary containing the hyperparameters for the best model
best_model_type (str) – Type of the best model
knn_performance (float) – performance score for kNN prediction on validation set

[1]: https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit
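
The "binary format" expected for y_train and y_val, with one column per class, is commonly produced by one-hot encoding integer class labels. The following is a minimal sketch in plain Python, for illustration only: the helper to_one_hot and the label values are made up here and are not part of mcfly, and in practice a numpy array of the same shape would be passed.

```python
def to_one_hot(labels, num_classes):
    """Turn integer class labels into rows of 0/1 with a single 1 per row."""
    return [[1 if cls == label else 0 for cls in range(num_classes)]
            for label in labels]


# Four samples and three classes (illustrative data only).
labels = [0, 2, 1, 2]
y_train = to_one_hot(labels, num_classes=3)

# The result has shape (num_samples, num_classes).
print(len(y_train), len(y_train[0]))
```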

mcfly.generate_models(x_shape, number_of_output_dimensions, number_of_models, model_types=['CNN', 'DeepConvLSTM', 'ResNet', 'InceptionTime'], task='classification', metrics=['accuracy'], **hyperparameter_ranges)

Generate one or multiple untrained Keras models with random hyperparameters. The number of models per given model type will be roughly balanced.

Parameters:

x_shape (tuple) – Shape of the input dataset: (num_samples, num_timesteps, num_channels)
number_of_output_dimensions (int) – Number of classes for a classification task, or number of targets for regression
number_of_models (int) – Number of models to generate. Should be at least the number of given model types.
model_types (list, optional) – List containing names of mcfly default models (‘CNN’, ‘DeepConvLSTM’, ‘ResNet’, or ‘InceptionTime’), or custom model classes (see mcfly.models for examples of how such a class is built and what it must contain). Default is to use all built-in mcfly models.
task (str) – Task type, either ‘classification’ or ‘regression’
metrics (list) – Metrics to calculate on the validation set. See https://keras.io/metrics/ for possible values.
low_lr (float) – Minimum of the log range for the learning rate: the learning rate is sampled between 10^(-low_lr) and 10^(-high_lr)
high_lr (float) – Maximum of the log range for the learning rate: the learning rate is sampled between 10^(-low_lr) and 10^(-high_lr)
low_reg (float) – Minimum of the log range for the regularization rate: the regularization rate is sampled between 10^(-low_reg) and 10^(-high_reg)
high_reg (float) – Maximum of the log range for the regularization rate: the regularization rate is sampled between 10^(-low_reg) and 10^(-high_reg)

Further hyperparameter ranges can be specified according to the respective model types used.

Returns:

models (list) – List of compiled models
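
The log-range sampling described for low_lr/high_lr and low_reg/high_reg can be sketched as follows. This is an illustration of the idea rather than mcfly's own code: the helper name sample_log_rate and the bounds 1 and 4 are assumptions chosen for the example.

```python
import random


def sample_log_rate(low, high, rng):
    """Draw a rate as 10**(-e), with the exponent e uniform in [low, high]."""
    exponent = rng.uniform(low, high)
    return 10 ** (-exponent)


# Illustrative bounds: exponents between 1 and 4 give rates
# between 10^(-4) and 10^(-1).
rng = random.Random(0)
rates = [sample_log_rate(1, 4, rng) for _ in range(5)]
print(rates)
```

Because the exponent is negated, the smaller bound (low_lr) corresponds to the largest possible rate and the larger bound (high_lr) to the smallest.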

mcfly.train_models_on_samples(X_train, y_train, X_val, y_val, models, nr_epochs=5, subset_size=100, verbose=True, outputfile=None, model_path=None, early_stopping_patience='auto', batch_size=20, metric=None, class_weight=None)

Given a list of compiled models, this function trains them all on a subset of the training data. If the given subset size is larger than the size of the data, the complete data set is used.

Parameters:

X_train (Supported types:) –
- numpy array
- tf.data dataset. Should return a tuple of (inputs, targets) or (inputs, targets, sample_weights)
- generator or keras.utils.Sequence. Should return a tuple of (inputs, targets) or (inputs, targets, sample_weights)
The input dataset for training, of shape (num_samples, num_timesteps, num_channels). More details can be found in the documentation for the Keras function Model.fit() [1].
y_train (numpy array) – The output classes for the training data, in binary format, of shape (num_samples, num_classes). If the training data is a dataset, generator or keras.utils.Sequence, y_train should not be specified.
X_val (Supported types:) –
- numpy array
- tf.data dataset. Should return a tuple of (inputs, targets) or (inputs, targets, sample_weights)
- generator or keras.utils.Sequence. Should return a tuple of (inputs, targets) or (inputs, targets, sample_weights)
The input dataset for validation, of shape (num_samples_val, num_timesteps, num_channels). More details can be found in the documentation for the Keras function Model.fit() [1].
y_val (numpy array) – The output classes for the validation data, in binary format, of shape (num_samples_val, num_classes). If the validation data is a dataset, generator or keras.utils.Sequence, y_val should not be specified.
models (list of model, params, modeltypes) – List of Keras models to train
nr_epochs (int, optional) – Number of epochs to use for training one model
subset_size (int, optional) – The number of samples used from the complete training set. If set to None, the entire dataset is used. Default is 100, but this should be adjusted depending on the type and size of the dataset. Subsetting is not supported for tf.data.Dataset objects or generators.
verbose (bool, optional) – flag for displaying verbose output
outputfile (str, optional) – Filename to store the model training results
model_path (str, optional) – Directory to store the models as HDF5 files
early_stopping_patience (str, int) – Unless set to None, early stopping is used for the model training. Set to an integer to define how many epochs without improvement to wait before stopping. Default is ‘auto’, in which case the patience is set to the number of epochs divided by 10 (and no larger than 5).
batch_size (int) – Number of samples per batch
metric (str) – DEPRECATED: metric to store in the history object
class_weight (dict, optional) – Dictionary containing class weights (example: {0: 0.5, 1: 2.})

Returns:

histories (list of Keras History objects) – train histories for all models
val_metrics (list of floats) – validation metrics of the models
val_losses (list of floats) – validation losses of the models

[1]: https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit
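
The ‘auto’ rule for early_stopping_patience can be read as the small function below. This is a sketch of the rule as stated in the parameter description, not mcfly's implementation: the helper name resolve_patience is hypothetical, and the use of integer division is an assumption (the library may round differently or enforce a lower bound).

```python
def resolve_patience(early_stopping_patience, nr_epochs):
    """Translate the early_stopping_patience argument into a number of epochs."""
    if early_stopping_patience is None:
        # No early stopping requested.
        return None
    if early_stopping_patience == 'auto':
        # Number of epochs divided by 10, capped at 5.
        # Integer division is an assumption made for this sketch.
        return min(nr_epochs // 10, 5)
    return int(early_stopping_patience)


for epochs in (30, 200):
    print(epochs, resolve_patience('auto', epochs))
```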

mcfly.kNN_performance(X_train, y_train, X_val, y_val, k=1, task='classification')

Performs k-Nearest Neighbors and returns the validation performance score.

Returns accuracy if task is ‘classification’ or mean squared error if task is ‘regression’.

Parameters:

X_train (numpy array) – Train set of shape (num_samples, num_timesteps, num_channels)
y_train (numpy array) – Class labels for train set
X_val (numpy array) – Validation set of shape (num_samples, num_timesteps, num_channels)
y_val (numpy array) – Class labels for validation set
k (int) – Number of neighbors to use for classifying
task (str) – Task type, either ‘classification’ or ‘regression’

Returns:

score (float) – Performance score on the validation set
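
As a rough illustration of what such a baseline measures, here is a 1-nearest-neighbor classifier in plain Python. It assumes that each sample is flattened to a single vector and compared with Euclidean distance, and it uses integer labels for brevity. All helper names are hypothetical, and the library's own implementation may differ.

```python
def flatten(sample):
    """Flatten one (num_timesteps, num_channels) sample into a flat list."""
    return [value for timestep in sample for value in timestep]


def squared_distance(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def one_nn_accuracy(X_train, y_train, X_val, y_val):
    """Fraction of validation samples whose nearest train sample shares their label."""
    train_flat = [flatten(sample) for sample in X_train]
    correct = 0
    for sample, label in zip(X_val, y_val):
        query = flatten(sample)
        nearest = min(range(len(train_flat)),
                      key=lambda i: squared_distance(train_flat[i], query))
        if y_train[nearest] == label:
            correct += 1
    return correct / len(X_val)


# Tiny made-up data set: 2 timesteps, 1 channel per sample.
X_train = [[[0.0], [0.0]], [[1.0], [1.0]]]
y_train = [0, 1]
X_val = [[[0.1], [0.0]], [[0.9], [1.0]]]
y_val = [0, 1]
score = one_nn_accuracy(X_train, y_train, X_val, y_val)
print(score)
```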

mcfly.models

class mcfly.models.CNN(x_shape, number_of_classes, metrics=['accuracy'], cnn_min_layers=1, cnn_max_layers=10, cnn_min_filters=10, cnn_max_filters=100, cnn_min_fc_nodes=10, cnn_max_fc_nodes=2000, **base_parameters)

Bases: object

Generate CNN model and hyperparameters.

create_model(filters, fc_hidden_nodes, learning_rate=0.01, regularization_rate=0.01, task='classification')

Generate a convolutional neural network (CNN) model.

The compiled Keras model is returned.

Parameters:

filters (list of ints) – Number of filters for each convolutional layer
fc_hidden_nodes (int) – Number of hidden nodes for the hidden dense layer
learning_rate (float) – Learning rate
regularization_rate (float) – Regularization rate
task (str) – Task type, either ‘classification’ or ‘regression’

Returns:

model (Keras model) – The compiled Keras model

generate_hyperparameters()

Generate a hyperparameter set that defines a CNN model.

Returns:

hyperparameters (dict) – Parameters for a CNN model

model_name = 'CNN'

class mcfly.models.DeepConvLSTM(x_shape, number_of_classes, metrics=['accuracy'], deepconvlstm_min_conv_layers=1, deepconvlstm_max_conv_layers=10, deepconvlstm_min_conv_filters=10, deepconvlstm_max_conv_filters=100, deepconvlstm_min_lstm_layers=1, deepconvlstm_max_lstm_layers=5, deepconvlstm_min_lstm_dims=10, deepconvlstm_max_lstm_dims=100, **base_parameters)

Bases: object

Generate DeepConvLSTM model and hyperparameters.

create_model(filters, lstm_dims, learning_rate=0.01, regularization_rate=0.01, task='classification')

Generate a model with convolution and LSTM layers.

See Ordonez et al., 2016, http://dx.doi.org/10.3390/s16010115

Parameters:

filters (list of ints) – Number of filters for each convolutional layer
lstm_dims (list of ints) – Number of hidden nodes for each LSTM layer
learning_rate (float) – Learning rate
regularization_rate (float) – Regularization rate
task (str) – Task type, either ‘classification’ or ‘regression’

Returns:

model (Keras model) – The compiled Keras model

generate_hyperparameters()

Generate a hyperparameter set that defines a DeepConvLSTM model.

Returns:

hyperparameters (dict) – Parameters for a DeepConvLSTM model

model_name = 'DeepConvLSTM'

class mcfly.models.InceptionTime(x_shape, number_of_classes, metrics=['accuracy'], IT_min_network_depth=3, IT_max_network_depth=6, IT_min_filters_number=32, IT_max_filters_number=96, IT_min_max_kernel_size=10, IT_max_max_kernel_size=100, **_other)

Bases: object

create_model(filters_number, network_depth=6, use_residual=True, use_bottleneck=True, max_kernel_size=20, learning_rate=0.01, regularization_rate=0.0, task='classification')

Generate an InceptionTime model. See Fawaz et al. 2019.

The compiled Keras model is returned.

Parameters:

input_shape (tuple) – Shape of the input dataset: (num_samples, num_timesteps, num_channels)
class_number (int) – Number of classes for classification task
filters_number (int) – Number of filters for each convolutional layer
network_depth (int) – Depth of network, i.e. number of Inception modules to stack
use_residual (bool) – If True, residual connections are used. Default is True.
use_bottleneck (bool) – If True, a bottleneck layer is used at the entry of Inception modules. Default is True.
max_kernel_size (int) – Maximum kernel size for convolutions within Inception module
learning_rate (float) – Learning rate
regularization_rate (float) – Regularization rate
task (str) – Task type, either ‘classification’ or ‘regression’

Returns:

model (Keras model) – The compiled Keras model

generate_hyperparameters()

Generate a hyperparameter set for an InceptionTime model.

Returns:

hyperparameters (dict) – Hyperparameter ranges for an InceptionTime model

model_name = 'InceptionTime'

class mcfly.models.ResNet(x_shape, number_of_classes, metrics=['accuracy'], resnet_min_network_depth=2, resnet_max_network_depth=5, resnet_min_filters_number=32, resnet_max_filters_number=128, resnet_min_max_kernel_size=8, resnet_max_max_kernel_size=32, **_other)

Bases: object

Generate ResNet model and hyperparameters.

create_model(min_filters_number, max_kernel_size, network_depth=3, learning_rate=0.01, regularization_rate=0.01, task='classification')

Generate a ResNet model (see also https://arxiv.org/pdf/1611.06455.pdf).

The compiled Keras model is returned.

Parameters:

min_filters_number (int) – Number of filters for the first convolutional layer
max_kernel_size (int) – Maximum kernel size for convolutions within a residual module
network_depth (int) – Depth of network, i.e. number of residual modules to stack. Default is 3.
learning_rate (float) – Learning rate. Default is 0.01.
regularization_rate (float) – Regularization rate. Default is 0.01.
task (str) – Task type, either ‘classification’ or ‘regression’

Returns:

model (Keras model) – The compiled Keras model

generate_hyperparameters()

Generate a hyperparameter set that defines a ResNet model.

Returns:

hyperparameters (dict) – Parameters for a ResNet model

model_name = 'ResNet'
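
The four built-in classes share the same outline: a model_name attribute, a constructor taking x_shape, number_of_classes and metrics plus range keywords, a generate_hyperparameters() method, and a create_model() method. The skeleton below mirrors that outline for a custom class. It is only a sketch: the class TinyDense, its tinydense_* keywords and the placeholder return value are invented for this example, and a real class would build and compile a Keras model in create_model.

```python
import random


class TinyDense:
    """Hypothetical custom model class following the outline of the built-in ones."""

    model_name = 'TinyDense'

    def __init__(self, x_shape, number_of_classes, metrics=None,
                 tinydense_min_units=4, tinydense_max_units=16, **_other):
        self.x_shape = x_shape
        self.number_of_classes = number_of_classes
        self.metrics = metrics if metrics is not None else ['accuracy']
        self.min_units = tinydense_min_units
        self.max_units = tinydense_max_units

    def generate_hyperparameters(self):
        """Draw one random hyperparameter set within the configured ranges."""
        return {'units': random.randint(self.min_units, self.max_units),
                'learning_rate': 0.01}

    def create_model(self, units, learning_rate=0.01, task='classification'):
        """Placeholder: a real class would return a compiled Keras model here."""
        return {'input_shape': self.x_shape[1:],
                'units': units,
                'output_units': self.number_of_classes,
                'learning_rate': learning_rate,
                'task': task}


model_class = TinyDense(x_shape=(100, 50, 3), number_of_classes=4)
params = model_class.generate_hyperparameters()
model = model_class.create_model(**params)
print(model_class.model_name, params)
```

Keeping the keys returned by generate_hyperparameters() identical to the keyword arguments of create_model() lets the two be chained with `**params`, as in the last lines.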

mcfly.find_architecture module

Summary: This module provides the main functionality of mcfly: searching for an optimal model architecture. The workflow is as follows:
- Function generate_models from modelgen.py generates and compiles models.
- Function train_models_on_samples trains those models.
- Function find_best_architecture is a wrapper function that combines these steps.
Example function calls can be found in the tutorial notebook (https://github.com/NLeSC/mcfly-tutorial).
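
The final step of this workflow, choosing the best candidate once every model has a validation score, can be sketched as below. The helper pick_best is hypothetical, the models are stand-in strings, and the sketch assumes a metric where higher is better (such as accuracy); a loss-like metric would need the minimum instead.

```python
def pick_best(models, val_metrics):
    """Return the (model, params, model_type) tuple with the highest validation metric."""
    best_index = max(range(len(val_metrics)), key=lambda i: val_metrics[i])
    return models[best_index]


# Stand-ins for the (model, params, model_type) tuples that are trained.
models = [
    ('model_0', {'learning_rate': 0.01}, 'CNN'),
    ('model_1', {'learning_rate': 0.001}, 'ResNet'),
    ('model_2', {'learning_rate': 0.003}, 'InceptionTime'),
]
val_metrics = [0.71, 0.88, 0.80]  # made-up validation accuracies

best_model, best_params, best_model_type = pick_best(models, val_metrics)
print(best_model_type, best_params)
```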

mcfly.find_architecture.find_best_architecture(X_train, y_train, X_val, y_val, verbose=True, number_of_models=5, nr_epochs=5, subset_size=100, outputpath=None, model_path=None, metric=None, class_weight=None, **kwargs)

Tries out a number of models on a subsample of the data, and outputs the best found architecture and hyperparameters.

Infers the task (classification vs. regression) automatically from the input data. For further details, see the Technical documentation.

Parameters:

X_train (Supported types:) –
- numpy array
- tf.data dataset. Should return a tuple of (inputs, targets) or (inputs, targets, sample_weights)
- generator or keras.utils.Sequence. Should return a tuple of (inputs, targets) or (inputs, targets, sample_weights)
The input dataset for training, of shape (num_samples, num_timesteps, num_channels). More details can be found in the documentation for the Keras function Model.fit() [1].
y_train (numpy array) – The output classes for the training data, in binary format, of shape (num_samples, num_classes). If the training data is a dataset, generator or keras.utils.Sequence, y_train should not be specified.
X_val (Supported types:) –
- numpy array
- tf.data dataset. Should return a tuple of (inputs, targets) or (inputs, targets, sample_weights)
- generator or keras.utils.Sequence. Should return a tuple of (inputs, targets) or (inputs, targets, sample_weights)
The input dataset for validation, of shape (num_samples_val, num_timesteps, num_channels). More details can be found in the documentation for the Keras function Model.fit() [1].
y_val (numpy array) – The output classes for the validation data, in binary format, of shape (num_samples_val, num_classes). If the validation data is a dataset, generator or keras.utils.Sequence, y_val should not be specified.
verbose (bool, optional) – Flag for displaying verbose output
number_of_models (int, optional) – The number of models to generate and test
nr_epochs (int, optional) – The number of epochs for which each model is trained
subset_size (int, optional) – The size of the subset of the data that is used for finding the optimal architecture. Default is 100. If set to None, the entire dataset is used. Subsetting is not supported for tf.data.Dataset objects or generators.
outputpath (str, optional) – File location to store the model results
model_path (str, optional) – Directory to save the models as HDF5 files
class_weight (dict, optional) – Dictionary containing class weights (example: {0: 0.5, 1: 2.})
metric (str, optional) – metric that is used to evaluate the model on the validation set. See https://keras.io/metrics/ for possible metrics
**kwargs (key-value parameters) – parameters for generating the models (see docstring for modelgen.generate_models)

Returns:

best_model (Keras model) – Best performing model, already trained on a small sample data set.
best_params (dict) – Dictionary containing the hyperparameters for the best model
best_model_type (str) – Type of the best model
knn_performance (float) – performance score for kNN prediction on validation set

[1]: https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit

mcfly.find_architecture.kNN_performance(X_train, y_train, X_val, y_val, k=1, task='classification')

Performs k-Nearest Neighbors and returns the validation performance score.

Returns accuracy if task is ‘classification’ or mean squared error if task is ‘regression’.

Parameters:

X_train (numpy array) – Train set of shape (num_samples, num_timesteps, num_channels)
y_train (numpy array) – Class labels for train set
X_val (numpy array) – Validation set of shape (num_samples, num_timesteps, num_channels)
y_val (numpy array) – Class labels for validation set
k (int) – Number of neighbors to use for classifying
task (str) – Task type, either ‘classification’ or ‘regression’

Returns:

score (float) – Performance score on the validation set

mcfly.find_architecture.store_train_hist_as_json(params, model_type, history, outputfile, metric_name=None)

This function stores the model parameters, the loss and accuracy history of one model in a JSON file. It appends the model information to the existing models in the file.

Parameters:

params (dict) – Parameters for one model
model_type (Keras model object) – Keras model object for one model
history (dict) – Training history from one model
outputfile (str) – Path where the JSON file needs to be stored
metric_name (str, optional) – DEPRECATED: name of metric from history to store
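
The append-to-existing-file behaviour can be pictured with the standard json module, as in the sketch below. The record layout (keys 'modeltype', 'params', 'history') and the helper name append_history are assumptions made for illustration; the file written by mcfly may be structured differently.

```python
import json
import os
import tempfile


def append_history(params, model_type, history, outputfile):
    """Append one model's record to a JSON list stored in outputfile."""
    if os.path.isfile(outputfile):
        with open(outputfile) as f:
            records = json.load(f)
    else:
        records = []
    # Hypothetical record layout, chosen for this sketch only.
    records.append({'modeltype': model_type,
                    'params': params,
                    'history': history})
    with open(outputfile, 'w') as f:
        json.dump(records, f, indent=2)
    return records


outputfile = os.path.join(tempfile.mkdtemp(), 'models.json')
append_history({'learning_rate': 0.01}, 'CNN',
               {'val_loss': [0.9, 0.7]}, outputfile)
records = append_history({'learning_rate': 0.001}, 'ResNet',
                         {'val_loss': [0.8, 0.6]}, outputfile)
print(len(records))
```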

mcfly.find_architecture.train_models_on_samples(X_train, y_train, X_val, y_val, models, nr_epochs=5, subset_size=100, verbose=True, outputfile=None, model_path=None, early_stopping_patience='auto', batch_size=20, metric=None, class_weight=None)

Given a list of compiled models, this function trains them all on a subset of the training data. If the given subset size is larger than the size of the data, the complete data set is used.

Parameters:

X_train (Supported types:) –
- numpy array
- tf.data dataset. Should return a tuple of (inputs, targets) or (inputs, targets, sample_weights)
- generator or keras.utils.Sequence. Should return a tuple of (inputs, targets) or (inputs, targets, sample_weights)
The input dataset for training, of shape (num_samples, num_timesteps, num_channels). More details can be found in the documentation for the Keras function Model.fit() [1].
y_train (numpy array) – The output classes for the training data, in binary format, of shape (num_samples, num_classes). If the training data is a dataset, generator or keras.utils.Sequence, y_train should not be specified.
X_val (Supported types:) –
- numpy array
- tf.data dataset. Should return a tuple of (inputs, targets) or (inputs, targets, sample_weights)
- generator or keras.utils.Sequence. Should return a tuple of (inputs, targets) or (inputs, targets, sample_weights)
The input dataset for validation, of shape (num_samples_val, num_timesteps, num_channels). More details can be found in the documentation for the Keras function Model.fit() [1].
y_val (numpy array) – The output classes for the validation data, in binary format, of shape (num_samples_val, num_classes). If the validation data is a dataset, generator or keras.utils.Sequence, y_val should not be specified.
models (list of model, params, modeltypes) – List of Keras models to train
nr_epochs (int, optional) – Number of epochs to use for training one model
subset_size (int, optional) – The number of samples used from the complete training set. If set to None, the entire dataset is used. Default is 100, but this should be adjusted depending on the type and size of the dataset. Subsetting is not supported for tf.data.Dataset objects or generators.
verbose (bool, optional) – flag for displaying verbose output
outputfile (str, optional) – Filename to store the model training results
model_path (str, optional) – Directory to store the models as HDF5 files
early_stopping_patience (str, int) – Unless set to None, early stopping is used for the model training. Set to an integer to define how many epochs without improvement to wait before stopping. Default is ‘auto’, in which case the patience is set to the number of epochs divided by 10 (and no larger than 5).
batch_size (int) – Number of samples per batch
metric (str) – DEPRECATED: metric to store in the history object
class_weight (dict, optional) – Dictionary containing class weights (example: {0: 0.5, 1: 2.})

Returns:

histories (list of Keras History objects) – train histories for all models
val_metrics (list of floats) – validation metrics of the models
val_losses (list of floats) – validation losses of the models

[1]: https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit

mcfly.modelgen module

mcfly.modelgen.generate_models(x_shape, number_of_output_dimensions, number_of_models, model_types=['CNN', 'DeepConvLSTM', 'ResNet', 'InceptionTime'], task='classification', metrics=['accuracy'], **hyperparameter_ranges)

Generate one or multiple untrained Keras models with random hyperparameters. The number of models per given model type will be roughly balanced.

Parameters:

x_shape (tuple) – Shape of the input dataset: (num_samples, num_timesteps, num_channels)
number_of_output_dimensions (int) – Number of classes for a classification task, or number of targets for regression
number_of_models (int) – Number of models to generate. Should be at least the number of given model types.
model_types (list, optional) – List containing names of mcfly default models (‘CNN’, ‘DeepConvLSTM’, ‘ResNet’, or ‘InceptionTime’), or custom model classes (see mcfly.models for examples of how such a class is built and what it must contain). Default is to use all built-in mcfly models.
task (str) – Task type, either ‘classification’ or ‘regression’
metrics (list) – Metrics to calculate on the validation set. See https://keras.io/metrics/ for possible values.
low_lr (float) – Minimum of the log range for the learning rate: the learning rate is sampled between 10^(-low_lr) and 10^(-high_lr)
high_lr (float) – Maximum of the log range for the learning rate: the learning rate is sampled between 10^(-low_lr) and 10^(-high_lr)
low_reg (float) – Minimum of the log range for the regularization rate: the regularization rate is sampled between 10^(-low_reg) and 10^(-high_reg)
high_reg (float) – Maximum of the log range for the regularization rate: the regularization rate is sampled between 10^(-low_reg) and 10^(-high_reg)

Further hyperparameter ranges can be specified according to the respective model types used.

Returns:

models (list) – List of compiled models
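
One simple way to obtain a "roughly balanced" number of models per type is round-robin assignment, sketched below. This is an assumed strategy shown for illustration; the helper allocate_model_types is hypothetical, and the library may distribute or shuffle the types differently.

```python
from collections import Counter


def allocate_model_types(model_types, number_of_models):
    """Assign a model type to each model slot by cycling through the types."""
    return [model_types[i % len(model_types)] for i in range(number_of_models)]


model_types = ['CNN', 'DeepConvLSTM', 'ResNet', 'InceptionTime']
allocation = allocate_model_types(model_types, number_of_models=6)
counts = Counter(allocation)
print(counts)
```

With this scheme the counts per type never differ by more than one, and every type appears at least once as long as number_of_models is at least the number of model types.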
