Once a data problem has been identified, and an aligned algorithm set selected, data is collected, prepared, trained, and tested, and finally tuned to obtain optimum results.
After identifying a data problem (e.g., predicting an outcome, categorizing a likely value, etc.), an analyst chooses from classes of algorithms most aligned to the problem type (for example, if the problem is determining a customer’s LTV as low, mid, or high, a classification algorithm would apply). The analyst would then gather enough data volume to be split into a training dataset and a testing dataset. After data collection, data is prepared by cleaning, formatting (e.g., changing unstructured data to structured fields), and addressing any data features that could corrupt the training and impact algorithm performance.
Prepared data is then used to train the algorithm, so that it learns the internal logic that maps inputs to outputs. Then the algorithm is tested against the separate test dataset to measure learning and performance. Many ML projects include multiple algorithms to compare and select the highest performing.
Finally, after selecting the best-performing algorithm, the model is tuned by adjusting parameters unique to the algorithm, and testing until optimum model performance is achieved.
Learn more about Machine Learning here.