Tutorial: Predict prices using regression with Model Builder
In this tutorial, you learn how to use ML.NET Model Builder to build a regression model to predict prices. The .NET console app that you develop in this tutorial predicts taxi fares based on historical New York taxi fare data.
The tutorial covers the following steps:
- Prepare and understand the data
- Create a Model Builder Config file
- Choose a scenario
- Load the data
- Train the model
- Evaluate the model
- Use the model for predictions
The Model Builder price prediction template can be used for any scenario requiring a numerical prediction value. Example scenarios include: house price prediction, demand prediction, and sales forecasting.
Prerequisites
For a list of prerequisites and installation instructions, visit the Model Builder installation guide.
Create a console application
Create a C# Console Application called "TaxiFarePrediction". Make sure Place solution and project in the same directory is unchecked.
Prepare and understand the data
Create a directory named Data in your project to store the data set files.
The data set used to train and evaluate the machine learning model is originally from the NYC TLC Taxi Trip data set.
To download the data set, navigate to the taxi-fare-train.csv download link.
When the page loads, right-click anywhere on the page and select Save as.
Use the Save As dialog to save the file in the Data folder you created in the previous step.
In Solution Explorer, right-click the taxi-fare-train.csv file and select Properties. Under Advanced, change the value of Copy to Output Directory to Copy if newer.
Each row in the taxi-fare-train.csv data set contains details of trips made by a taxi.
Open the taxi-fare-train.csv data set.
The provided data set contains the following columns:
- vendor_id: The ID of the taxi vendor is a feature.
- rate_code: The rate type of the taxi trip is a feature.
- passenger_count: The number of passengers on the trip is a feature.
- trip_time_in_secs: The amount of time the trip took. You want to predict the fare of the trip before the trip is completed. At that moment you don't know how long the trip would take. Thus, the trip time is not a feature and you'll exclude this column from the model.
- trip_distance: The distance of the trip is a feature.
- payment_type: The payment method (cash or credit card) is a feature.
- fare_amount: The total taxi fare paid is the label.
The label is the column you want to predict. When performing a regression task, the goal is to predict a numerical value. In this price prediction scenario, the cost of a taxi ride is being predicted. Therefore, the fare_amount is the label. The identified features are the inputs you give the model to predict the label. In this case, the rest of the columns, with the exception of trip_time_in_secs, are used as features or inputs to predict the fare amount.
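To make the feature/label split concrete, here is a minimal C# sketch that parses one CSV row into features and a label. The type and method names (TaxiTripFeatures, TaxiRowParser) are hypothetical, and the sketch assumes the file's columns appear in the order listed above. Model Builder handles this for you, so the code is only an illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical type for illustration; Model Builder generates its own input class.
public record TaxiTripFeatures(
    string VendorId,
    float RateCode,
    float PassengerCount,
    float TripDistance,
    string PaymentType);

public static class TaxiRowParser
{
    // Assumed column order (taken from the column list above):
    // vendor_id, rate_code, passenger_count, trip_time_in_secs,
    // trip_distance, payment_type, fare_amount
    public static (TaxiTripFeatures Features, float Label) Parse(string csvLine)
    {
        string[] cells = csvLine.Split(',');
        if (cells.Length != 7)
        {
            throw new FormatException($"Expected 7 columns but found {cells.Length}.");
        }

        CultureInfo inv = CultureInfo.InvariantCulture;
        var features = new TaxiTripFeatures(
            VendorId: cells[0],
            RateCode: float.Parse(cells[1], inv),
            PassengerCount: float.Parse(cells[2], inv),
            // cells[3] is trip_time_in_secs; it is skipped because it is not a feature.
            TripDistance: float.Parse(cells[4], inv),
            PaymentType: cells[5]);

        // fare_amount is the label, the value the model learns to predict.
        float label = float.Parse(cells[6], inv);
        return (features, label);
    }
}
```

For example, calling TaxiRowParser.Parse("CMT,1,1,1271,3.8,CRD,17.5") on this made-up sample row returns the five feature values and a label of 17.5, with the trip time of 1271 seconds discarded.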
Create Model Builder config file
When first adding Model Builder to the solution, it will prompt you to create an mbconfig file. The mbconfig file keeps track of everything you do in Model Builder to allow you to reopen the session.
- In Solution Explorer, right-click the TaxiFarePrediction project, and select Add > Machine Learning Model.
- Name the mbconfig project TaxiFarePrediction, and click the Add button.
Choose a scenario
To train your model, you need to select from the list of available machine learning scenarios provided by Model Builder. In this case, the scenario is Value prediction.
- In the scenario step of the Model Builder tool, select the Value prediction scenario.
Select the environment
Model Builder can run the training on different environments depending on the scenario that was selected.
- Confirm the Local (CPU) item is selected, and click the Next step button.
Load the data
Model Builder accepts data from two sources: a SQL Server database or a local file in CSV or TSV format.
- In the data step of the Model Builder tool, select File from the data source type selection.
- Select the Browse button next to the text box and use File Explorer to browse and select the taxi-fare-train.csv file in the Data directory.
- Choose fare_amount in the Column to predict (Label) dropdown.
- Click the Advanced data options link.
- In the Column settings tab, select the Purpose dropdown for the trip_time_in_secs column, and select Ignore to exclude it as a feature during training. Click the Save button to close the dialog.
- Click the Next step button.
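If you want to see roughly what these data settings correspond to in code, the following sketch loads the same file with the ML.NET API and leaves out trip_time_in_secs. It assumes the Microsoft.ML NuGet package is installed, that the program runs from a folder containing Data/taxi-fare-train.csv, and that the columns are in the order listed earlier. The TaxiTrip class is a hypothetical name, not the class Model Builder generates.

```csharp
// Requires the Microsoft.ML NuGet package.
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext(seed: 0);

// Path assumes the Data folder created earlier in this tutorial.
IDataView data = mlContext.Data.LoadFromTextFile<TaxiTrip>(
    path: "Data/taxi-fare-train.csv",
    hasHeader: true,
    separatorChar: ',');

// List the loaded columns; trip_time_in_secs is not among them.
foreach (DataViewSchema.Column column in data.Schema)
{
    Console.WriteLine($"{column.Name}: {column.Type}");
}

// Hypothetical input class for illustration.
public class TaxiTrip
{
    [LoadColumn(0)] public string VendorId = "";
    [LoadColumn(1)] public float RateCode;
    [LoadColumn(2)] public float PassengerCount;
    // Column index 3 (trip_time_in_secs) has no field here, so it is never loaded.
    [LoadColumn(4)] public float TripDistance;
    [LoadColumn(5)] public string PaymentType = "";
    // fare_amount is exposed under the name "Label", the default label column name.
    [LoadColumn(6), ColumnName("Label")] public float FareAmount;
}
```

Skipping the column at load time has the same intent as setting its Purpose to Ignore: the trainer never sees the trip time.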
Train the model
The machine learning task used to train the price prediction model in this tutorial is regression. During the model training process, Model Builder trains separate models using different regression algorithms and settings to find the best performing model for your dataset.
The time required for the model to train is proportional to the amount of data. Model Builder automatically selects a default value for Time to train (seconds) based on the size of your data source.
- Leave the default value as is for Time to train (seconds) unless you prefer to train for a longer time.
- Select Start Training.
Throughout the training process, progress data is displayed in the Training results section of the train step.
- Status displays the completion status of the training process.
- Best accuracy displays the accuracy of the best performing model found by Model Builder so far. Higher accuracy means the model predicted more correctly on test data.
- Best algorithm displays the name of the best performing algorithm found by Model Builder so far.
- Last algorithm displays the name of the algorithm most recently used by Model Builder to train the model.
Once training is complete, the mbconfig file contains the generated model, TaxiFarePrediction.zip, along with two C# files:
- TaxiFare.consumption.cs: This file has a public method that will load the model and create a prediction engine with it and return the prediction.
- TaxiFare.training.cs: This file consists of the training pipeline that Model Builder came up with to build the best model including any hyperparameters that it used.
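For orientation, a hand-written ML.NET regression pipeline might look like the sketch below. This is not the pipeline Model Builder generates. The SDCA trainer, the 80/20 split, and the TaxiTrip class are assumptions chosen for illustration, and the code assumes the Microsoft.ML NuGet package and the Data/taxi-fare-train.csv file from earlier.

```csharp
// Requires the Microsoft.ML NuGet package.
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext(seed: 0);

IDataView data = mlContext.Data.LoadFromTextFile<TaxiTrip>(
    "Data/taxi-fare-train.csv", hasHeader: true, separatorChar: ',');

// Hold out 20% of the rows to evaluate the trained model (assumed fraction).
var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);

// Encode the text columns as numbers, combine all features into one vector,
// then train a regression model. SDCA is an illustrative choice of trainer.
var pipeline = mlContext.Transforms.Categorical.OneHotEncoding("VendorIdEncoded", "VendorId")
    .Append(mlContext.Transforms.Categorical.OneHotEncoding("PaymentTypeEncoded", "PaymentType"))
    .Append(mlContext.Transforms.Concatenate(
        "Features",
        "VendorIdEncoded", "RateCode", "PassengerCount", "TripDistance", "PaymentTypeEncoded"))
    .Append(mlContext.Regression.Trainers.Sdca(
        labelColumnName: "Label", featureColumnName: "Features"));

ITransformer model = pipeline.Fit(split.TrainSet);

// Score the held-out rows and report the regression metrics.
RegressionMetrics metrics = mlContext.Regression.Evaluate(
    model.Transform(split.TestSet), labelColumnName: "Label");
Console.WriteLine($"RSquared: {metrics.RSquared:F3}");
Console.WriteLine($"RMSE: {metrics.RootMeanSquaredError:F3}");

// Hypothetical input class; trip_time_in_secs (column 3) is deliberately not loaded.
public class TaxiTrip
{
    [LoadColumn(0)] public string VendorId = "";
    [LoadColumn(1)] public float RateCode;
    [LoadColumn(2)] public float PassengerCount;
    [LoadColumn(4)] public float TripDistance;
    [LoadColumn(5)] public string PaymentType = "";
    [LoadColumn(6), ColumnName("Label")] public float FareAmount;
}
```

Model Builder automates the part this sketch fixes by hand: it tries several trainers and settings within the training time and keeps the best one.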
Click the Next step button to navigate to the evaluate step.
Evaluate the model
The result of the training step is the single model with the best performance. In the evaluate step of the Model Builder tool, the Best model section shows the algorithm used by the best performing model in the Model entry, along with that model's RSquared metric.
Additionally, in the Output window of Visual Studio, there will be a summary table containing top models and their metrics.
This section also lets you test your model by performing a single prediction. It offers text boxes to fill in values, and you can click the Predict button to get a prediction from the best model. By default, the text boxes are filled in with a random row from your dataset.
If you're not satisfied with your accuracy metrics, some easy ways to try and improve model accuracy are to increase the amount of time to train the model or use more data. Otherwise, click Next step to navigate to the consume step.
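For reference, the RSquared metric shown in the evaluate step is the coefficient of determination. The sketch below computes it by hand using the standard definition, 1 minus the ratio of the residual sum of squares to the total sum of squares. The class and method names are hypothetical, and the code is meant to explain the number rather than to replace the metrics ML.NET reports.

```csharp
using System;
using System.Linq;

public static class RegressionMetricsSketch
{
    // R^2 = 1 - SS_res / SS_tot
    // SS_res: squared error between the actual and predicted fares.
    // SS_tot: squared deviation of the actual fares from their mean.
    public static double RSquared(double[] actual, double[] predicted)
    {
        if (actual.Length == 0 || actual.Length != predicted.Length)
        {
            throw new ArgumentException("Inputs must be non-empty and of equal length.");
        }

        double mean = actual.Average();
        double ssRes = actual.Zip(predicted, (y, p) => (y - p) * (y - p)).Sum();
        double ssTot = actual.Sum(y => (y - mean) * (y - mean));

        // Note: if every actual value is identical, ssTot is 0 and the result is not meaningful.
        return 1.0 - ssRes / ssTot;
    }
}
```

With actual fares of 10, 20, and 30 and predictions of 12, 18, and 30, the residual sum of squares is 8 and the total sum of squares is 200, so RSquared is 0.96. A value closer to 1 means the predictions explain more of the variation in the fares.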
(Optional) Consume the model
This step will have project templates that you can use to consume the model. This step is optional and you can choose the method that best suits your needs on how to serve the model.
- Console App
- Web API
Console app
When adding a console app to your solution, you will be prompted to name the project.
Name the console project TaxiFare_Console.
Click Add to solution to add the project to your current solution.
Run the application.
The output generated by the program should look similar to the snippet below:
Predicted Fare: 15.020833
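The generated console project already contains the code that produces this output. As a rough idea of what such code does, the sketch below loads a saved model and runs one prediction through the general ML.NET API. The ModelInput and ModelOutput classes, their column names, and the model file location are assumptions; the generated code may differ. The sample values are the ones used in the Web API request later in this tutorial.

```csharp
// Requires the Microsoft.ML NuGet package.
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext();

// Assumes TaxiFarePrediction.zip has been copied to the working directory.
ITransformer model = mlContext.Model.Load("TaxiFarePrediction.zip", out DataViewSchema inputSchema);

// A prediction engine wraps the model for single-row predictions.
var engine = mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(model);

var sample = new ModelInput
{
    Vendor_id = "CMT",
    Rate_code = 1f,
    Passenger_count = 1f,
    Trip_distance = 3.8f,
    Payment_type = "CRD",
};

ModelOutput result = engine.Predict(sample);
Console.WriteLine($"Predicted Fare: {result.Score}");

// Hypothetical input class; column names are assumed to match the CSV header.
public class ModelInput
{
    [ColumnName("vendor_id")] public string Vendor_id { get; set; } = "";
    [ColumnName("rate_code")] public float Rate_code { get; set; }
    [ColumnName("passenger_count")] public float Passenger_count { get; set; }
    // Included only so the class covers every CSV column; the model ignores it.
    [ColumnName("trip_time_in_secs")] public float Trip_time_in_secs { get; set; }
    [ColumnName("trip_distance")] public float Trip_distance { get; set; }
    [ColumnName("payment_type")] public string Payment_type { get; set; } = "";
    // The label is unknown at prediction time and can stay at its default value.
    [ColumnName("fare_amount")] public float Fare_amount { get; set; }
}

// Hypothetical output class; regression models write their prediction to "Score".
public class ModelOutput
{
    [ColumnName("Score")] public float Score { get; set; }
}
```

If the saved pipeline expects columns with different names or types, adjust the ModelInput properties to match.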
Web API
When adding a web API to your solution, you will be prompted to name the project.
Name the Web API project TaxiFare_API.
Click Add to solution to add the project to your current solution.
Run the application.
Open PowerShell and enter the following code where PORT is the port your application is listening on.
$body = @{
    Vendor_id = "CMT"
    Rate_code = 1.0
    Passenger_count = 1.0
    Trip_distance = 3.8
    Payment_type = "CRD"
}

Invoke-RestMethod "https://localhost:<PORT>/predict" -Method Post -Body ($body | ConvertTo-Json) -ContentType "application/json"
If successful, the output should look similar to the text below:
score
-----
15.020833
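If you'd rather call the endpoint from C#, the following sketch sends the same request with HttpClient. It assumes .NET 5 or later (for System.Net.Http.Json), that the port is passed as the first command-line argument, and that the response is a JSON object with a score property as shown above. PredictionClient is a hypothetical helper name.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;

if (args.Length == 0)
{
    Console.Error.WriteLine("Usage: pass the port your Web API is listening on.");
    return;
}

// Same values as the PowerShell request above.
var body = new
{
    Vendor_id = "CMT",
    Rate_code = 1.0f,
    Passenger_count = 1.0f,
    Trip_distance = 3.8f,
    Payment_type = "CRD",
};

using var client = new HttpClient();

// PostAsJsonAsync serializes the body as JSON and sets the content type.
HttpResponseMessage response = await client.PostAsJsonAsync(PredictionClient.BuildUri(args[0]), body);
response.EnsureSuccessStatusCode();

// Read the response as raw JSON and pull out the predicted fare.
JsonElement result = await response.Content.ReadFromJsonAsync<JsonElement>();
Console.WriteLine($"score: {result.GetProperty("score").GetSingle()}");

// Hypothetical helper that builds the endpoint address from the port.
public static class PredictionClient
{
    public static Uri BuildUri(string port) => new Uri($"https://localhost:{port}/predict");
}
```

PostAsJsonAsync writes property names in camel case by default (for example, vendor_id), which works here as long as the API binds JSON property names case-insensitively, the default in ASP.NET Core. The request can also fail if the local HTTPS development certificate is not trusted on your machine.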