5  Exercise

A note on reproducibility

As taught extensively in Applied Geodata Science I, we value reproducible and open workflows. Therefore, we strongly advise you to create a suitable work environment. This includes proper version control of your code via git and GitHub, package version control via {renv}, and general best practices in organizing your files and code.

5.1 Your Project

After reading through this tutorial, you should have a solid understanding of how you can use Random Forest models for digital soil mapping. Based on the provided knowledge and code, it is now your task to improve and expand the analysis. As stated in Chapter 2, the model created in the tutorial picked random covariates for model building. This is of course nonsensical, and fixing it should be your first step to improve the model. Find a way to create a workflow that filters for the most relevant predictors (do not pick random variables, and do not just add all variables to your final model; do you understand why the latter makes no sense?). What number of variables do you find to be suitable for your final model?
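One possible starting point for such a filtering workflow is to rank covariates by permutation importance and refit on the top-ranked subset. The sketch below is only an illustration under stated assumptions: it uses the {ranger} package (an assumption, any Random Forest implementation that reports variable importance would do), simulated stand-in data instead of the tutorial's soil data, and hypothetical names (`df`, `ph`, `x1` to `x10`, `top_k`). The cutoff `top_k` is arbitrary here; choosing it well is part of the exercise.

```r
library(ranger)

set.seed(42)

# Hypothetical stand-in data: 10 covariates, of which only x1 and x2
# actually influence the target. Replace with your own training data.
n <- 200
df <- as.data.frame(matrix(rnorm(n * 10), ncol = 10))
names(df) <- paste0("x", 1:10)
df$ph <- 6 + 1.5 * df$x1 - 1.0 * df$x2 + rnorm(n, sd = 0.3)

# Fit a Random Forest on all covariates and record permutation importance
rf_full <- ranger(
  ph ~ .,
  data = df,
  importance = "permutation",
  seed = 42
)

# Rank covariates from most to least important
vi <- sort(rf_full$variable.importance, decreasing = TRUE)

# Keep the top_k covariates (arbitrary cutoff, for illustration only)
top_k <- 3
selected <- names(vi)[seq_len(top_k)]

# Refit on the reduced covariate set
rf_reduced <- ranger(
  x = df[, selected, drop = FALSE],
  y = df$ph,
  seed = 42
)

# Compare out-of-bag prediction error (MSE for regression) of both models
oob_error <- c(
  full = rf_full$prediction.error,
  reduced = rf_reduced$prediction.error
)
print(selected)
print(oob_error)
```

Repeating the refit for several values of `top_k` and comparing the out-of-bag errors is one way to approach the question of how many variables are suitable. Dedicated selection algorithms are an alternative to a fixed cutoff.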

In the AGDS Book, we explain how to conduct hyperparameter tuning and cross-validation of Random Forests via the {caret} package. Read up on how to do this and implement your own routine to predict the top-layer pH. Moreover, the AGDS Book explains how to use model-agnostic procedures to interpret your model. Apply these procedures to your model and interpret your findings.
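As a rough orientation, a tuning routine with {caret} could look like the sketch below. It is not the routine from the AGDS Book. It assumes the {ranger} backend (`method = "ranger"`), 5-fold cross-validation, a small and arbitrary hyperparameter grid, and simulated stand-in data with hypothetical names (`df`, `ph`, `x1` to `x6`).

```r
library(caret)

set.seed(42)

# Hypothetical stand-in data; replace with your own training data
n <- 200
df <- as.data.frame(matrix(rnorm(n * 6), ncol = 6))
names(df) <- paste0("x", 1:6)
df$ph <- 6 + 1.5 * df$x1 - 1.0 * df$x2 + rnorm(n, sd = 0.3)

# 5-fold cross-validation (the number of folds is an arbitrary choice)
ctrl <- trainControl(method = "cv", number = 5)

# Grid over the three parameters caret tunes for method = "ranger"
grid <- expand.grid(
  mtry = c(2, 4, 6),
  splitrule = "variance",
  min.node.size = c(2, 5, 10),
  stringsAsFactors = FALSE
)

# Train one model per grid row and fold; num.trees is passed on to ranger
mod <- train(
  ph ~ .,
  data = df,
  method = "ranger",
  trControl = ctrl,
  tuneGrid = grid,
  metric = "RMSE",
  num.trees = 200
)

# Best hyperparameter combination and the cross-validated metrics per combination
print(mod$bestTune)
print(mod$results[, c("mtry", "min.node.size", "RMSE", "Rsquared")])
```

The grid values are placeholders. Sensible ranges depend on the number of predictors and on the amount of training data available.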

Finally, you should test your model as demonstrated in the tutorial. Explain how and why your model performs differently from the one in this tutorial. Note that this exercise thrives on your curiosity to code! So, if you want to go further, you could also investigate the prediction of other soil properties, or test and compare other machine learning methods.
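For comparing models on held-out data, a minimal evaluation sketch could look as follows. It assumes the {ranger} package, a random 70/30 split, and simulated stand-in data with hypothetical names (`df`, `ph`, `x1` to `x3`). The metrics shown (RMSE, squared correlation, mean bias) are common choices and not necessarily the ones used in the tutorial.

```r
library(ranger)

set.seed(42)

# Hypothetical stand-in data; replace with your own data
n <- 300
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
df$ph <- 6 + 1.5 * df$x1 - 1.0 * df$x2 + rnorm(n, sd = 0.3)

# Random 70/30 split into training and testing sets
idx_train <- sample(seq_len(n), size = round(0.7 * n))
df_train <- df[idx_train, ]
df_test <- df[-idx_train, ]

# Fit on the training set only
rf <- ranger(ph ~ ., data = df_train, seed = 42)

# Predict on the unseen testing set
pred <- predict(rf, data = df_test)$predictions

# Evaluation metrics on the testing set
rmse <- sqrt(mean((df_test$ph - pred)^2))  # root mean squared error
r2 <- cor(df_test$ph, pred)^2              # squared Pearson correlation
bias <- mean(pred - df_test$ph)            # mean prediction bias

print(c(rmse = rmse, r2 = r2, bias = bias))
```

Evaluating every model variant on the same testing set keeps the comparison fair.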