At the end of November 2000, I attended the "2nd International Conference on Applications of Machine Learning to Ecological Modelling" in Adelaide, Australia. As I cleared customs, I was thinking of the old joke about entering Australia. Customs Official: "Have you got a criminal record?" Passenger: "I didn’t know that it was still necessary to have one." Jokes aside, Australia is a nice country to visit and the people are friendly and warm (especially since on my day of arrival it was 34 °C and the next day was a sweltering 39 °C!). Adelaide is known as the city of churches and for seafood from the Spencer Gulf, and is surrounded by the best-known wine region in Australia, the Barossa Valley. Not surprisingly, the people of Adelaide are known to drink wine and eat seafood religiously.
 
The conference itself had 60 delegates, allowing ample time for questions and discussion. This fostered a stimulating scientific environment, with many cutting-edge techniques in data analysis being discussed. An interesting debate emerged during the conference about the relative merits of the two main approaches to ecological modelling: the traditional dynamic modelling approach and the machine learning or empirical approach.
 
Dynamic modelling usually involves identifying a series of equations that explicitly describe individual processes and link them together. It is based on an understanding (albeit often inadequate) of the system. An alternative approach is machine learning, which involves identifying relationships from the data themselves. The two approaches are complementary, each having advantages and disadvantages and each suited to a particular kind of problem (see Fig. 1).

Dynamic modelling is useful when the problem is relatively well understood, the complexity is relatively low, and there is a small amount of data with low noise. It has the advantage of an underlying theory and causality. By contrast, machine learning is useful when little is known about a problem and there is a large amount of data that may have considerable noise. Machine learning has the advantage of using the data optimally and may be useful in many situations in marine ecology where we attempt to identify complex relationships from noisy data. The two approaches can be complementary, as the machine learning approach can lead to identification of critical variables that can then be modelled using the dynamic modelling approach. Thus, there may be a progression from a machine learning to a dynamic modelling approach as a discipline matures and understanding increases. Interestingly, hybrid approaches are possible, where the optimal formulation of a dynamic model can be achieved through machine learning techniques such as evolutionary algorithms.
There were a number of interesting applications of machine learning presented, ranging from predicting the effects of nuclear accidents to fish stock assessment. A particularly interesting talk was on the identification of marine microalgae by an artificial neural network (ANN), a machine learning technique.
 
Using flow cytometry, data can be collected on the optical and fluorescence characteristics of up to 1000 cells per second. As a cell passes through the laser beam of the cytometer, the cytometer records fluorescence at multiple wavelengths, together with light scatter at different angles. Measurements taken at successive time intervals as a particle traverses the beam constitute a "pulse" for each parameter, so that each particle produces a set of pulses. ANNs have been trained to identify 72 species of phytoplankton based on these characteristic "signatures". In situ monitoring of this kind would allow immediate detection of harmful algal blooms.
 
There were many interesting talks on predicting toxic cyanobacterial blooms in streams and lakes. On Australia’s longest river, the Murray, cyanobacterial blooms up to 1000 km long can render water undrinkable and have a devastating effect on wildlife. The occurrence of blooms was predicted from a suite of variables describing light, temperature and water quality. An interesting technique used to solve this problem was the evolutionary algorithm, which simulates natural evolution to find solutions to a problem. The procedure begins by generating an initial random population of individuals (different model formulations with different parameter values). These are evaluated by comparison with the data, and the best individuals (models) are kept and "bred" to form child models (with some modification of model formulation and parameterization, akin to recombination in genetics). The child models are then tested and the process is repeated iteratively until the optimal model formulation is found. In this way, variables important for bloom formation and their critical ranges were identified.
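
The procedure just described can be sketched in a few lines of Python. This is only an illustrative toy, not the method used by the speakers: the data are synthetic and made up for the example, the three variable names (temperature, light, turbidity) are assumptions, each "model" is simply a choice of which variables to include plus a weight for each, and the loop runs for a fixed number of generations rather than until a true optimum is reached.

```python
import random

def make_data(rng, n=200):
    """Synthetic (made-up) data: the bloom index depends on temperature and light only."""
    rows = []
    for _ in range(n):
        temp, light, turbidity = rng.random(), rng.random(), rng.random()
        bloom = 2.0 * temp + 1.0 * light + rng.gauss(0, 0.05)
        rows.append(((temp, light, turbidity), bloom))
    return rows

def predict(model, x):
    """A model is (mask, weights): the mask is the 'formulation', the weights the parameters."""
    mask, weights = model
    return sum(w * xi for m, w, xi in zip(mask, weights, x) if m)

def error(model, data):
    """Mean squared error against the data, plus a small penalty per included variable."""
    mse = sum((predict(model, x) - y) ** 2 for x, y in data) / len(data)
    return mse + 0.01 * sum(model[0])

def random_model(rng, n_vars):
    """One random individual: random formulation and random parameter values."""
    mask = [rng.random() < 0.5 for _ in range(n_vars)]
    weights = [rng.uniform(-3, 3) for _ in range(n_vars)]
    return (mask, weights)

def breed(a, b, rng, sigma=0.2, flip=0.1):
    """Child takes each gene from either parent (recombination), then mutates slightly."""
    mask, weights = [], []
    for i in range(len(a[0])):
        parent = a if rng.random() < 0.5 else b
        include = parent[0][i]
        if rng.random() < flip:          # occasionally change the formulation
            include = not include
        mask.append(include)
        weights.append(parent[1][i] + rng.gauss(0, sigma))  # perturb the parameter
    return (mask, weights)

def evolve(data, rng, n_vars=3, pop_size=40, generations=60):
    """Keep the best half each generation and refill the population with their children."""
    population = [random_model(rng, n_vars) for _ in range(pop_size)]
    history = []                          # best error at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda m: error(m, data))
        history.append(error(population[0], data))
        survivors = population[: pop_size // 2]
        children = [breed(rng.choice(survivors), rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    best = min(population, key=lambda m: error(m, data))
    history.append(error(best, data))
    return best, history

if __name__ == "__main__":
    rng = random.Random(0)
    data = make_data(rng)
    best, history = evolve(data, rng)
    print("variables kept:", best[0])
    print("weights:", [round(w, 2) for w in best[1]])
    print("best error, first vs last generation:", history[0], history[-1])
```

Because the surviving models are carried over unchanged, the best error can never get worse from one generation to the next. The mask of the winning model plays the role of identifying which variables matter, which is the part of the approach the bloom studies relied on.
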
It was clear from the conference that machine learning methods are a powerful group of techniques that can be applied in a wide variety of fields, both to identify important variables and for prediction. Considerable intellectual momentum (less strenuous than real momentum!) was generated during the conference, culminating in the formation of a new organization, the International Society for Ecological Informatics. Its charter is to "design and apply computational techniques for ecological analysis, synthesis and forecasting". There is now an opportunity for researchers in the marine community to apply some of these techniques. A good starting point for researchers wanting to learn more about this emerging field is the upcoming special edition of Ecological Modelling containing papers from the conference. Finally, intellectually, socially and alcoholically the conference was excellent, and I would like to take the opportunity to thank the ENVIFISH programme for providing the funds.
Anthony J Richardson
Funded by the European Union-sponsored ENVIFISH programme (contract number: IC18-CT98-329).