Models

Building production-ready models with Raptor is easy. You can use any model framework you want, and Raptor will take care of the rest.

To do that, you'll need to define the training function and specify its input features, label (which is also a feature), and the model framework you're using.

@model(
    keys=['customer_id'],
    input_features=['total_spend+sum'],
    input_labels=['amount'],
    model_framework='sklearn',
    model_server='sagemaker-ack',
)
@freshness(max_age='1h', max_stale='100h')
def amount_prediction(ctx: TrainingContext):
    from sklearn.linear_model import LinearRegression

    df = ctx.features_and_labels()

    trainer = LinearRegression()
    trainer.fit(df[ctx.input_features], df[ctx.input_labels])

    return trainer

When should we use Feature Selectors?

You might notice that we're using a feature selector in the input_features field. Feature Selectors are a way to reference a specific representation of a feature.

When we want to use a feature that has an aggregation, we must use a selector to reference it: <name>+<aggrFn>.
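For example (the avg selector below is hypothetical, added only for illustration; the spec above uses total_spend+sum):

input_features=['total_spend+sum', 'total_spend+avg'],  # <name>+<aggrFn> for aggregated features
input_labels=['amount'],                                # plain features are referenced by name alone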

For more information, see Feature Selectors.

Getting the training data

To get the training data, we can use the features_and_labels() function of the training context.

df = ctx.features_and_labels()

This returns a pandas DataFrame that contains the features and labels.
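As a quick sanity check, we can inspect what came back (whether extra columns such as keys or timestamps appear is an assumption here):

print(df.columns.tolist())  # feature and label columns, e.g. ['total_spend+sum', 'amount', ...]
print(df.head())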

To get the features, we can use the input_features field of the training context.

df[ctx.input_features]
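Similarly, ctx.input_labels selects the label column(s):

df[ctx.input_labels]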

Or, we can split the data into a training set and a test set:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df[ctx.input_features],
    df[ctx.input_labels],
    test_size=0.2,
    random_state=42,
)
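With the split in place, a typical iteration inside the training function fits on the training portion only. A minimal sketch, reusing the LinearRegression from the spec above:

from sklearn.linear_model import LinearRegression

trainer = LinearRegression()
trainer.fit(X_train, y_train)
print(trainer.score(X_test, y_test))  # R^2 on the held-out 20%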

Training the model locally before exporting it

Sometimes, we want to train the model locally before exporting it. This is useful when we want to iterate and experiment with the model.

To do that, we can call the training function directly:

mymodel = amount_prediction()

# %%

data = amount_prediction.features_and_labels()

from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

x = data[amount_prediction.input_features]
y = data[amount_prediction.input_labels]
_, x_test, _, y_test = train_test_split(x, y, test_size=0.2, random_state=1234)

y_pred = mymodel.predict(x_test)
res = r2_score(y_test, y_pred)  # R^2 of the regression on the held-out set
print(res)

Key Feature

The key feature is the feature that we use to join the rest of the features with, e.g.:

timestamp  entity_id  feature_1  feature_2  feature_3
10:00      1          ...        ...        ...
10:01      1          ...        ...        ...
10:02      1          ...        ...        ...

As you can see in the table above, we use the timestamps of the key feature (feature_1) to join the rest of the features. That means we get the values of the other features as of the same time as the key feature.
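To make the join semantics concrete, here is a minimal sketch using pandas.merge_asof. It illustrates the idea only, not Raptor's actual implementation, and the values are made up:

import pandas as pd

# Observations of the key feature: the timeline we join against
key = pd.DataFrame({
    'timestamp': pd.to_datetime(['10:00', '10:01', '10:02']),
    'entity_id': [1, 1, 1],
    'feature_1': [0.1, 0.2, 0.3],
})

# Another feature, observed on its own schedule
other = pd.DataFrame({
    'timestamp': pd.to_datetime(['09:59', '10:01']),
    'entity_id': [1, 1],
    'feature_2': [5.0, 7.0],
})

# For each key-feature timestamp, take the latest feature_2 value at or before it
joined = pd.merge_asof(key, other, on='timestamp', by='entity_id')
print(joined)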

Setting the key feature

By default, the key feature is the first feature in the feature set. Alternatively, you can set it explicitly using the decorator's options:

@model(
    key_feature='feature_1',
    ...
)