Scientific Machine Learning
The intersection between the challenges of ML models and those of scientific applications
The development of novel methods is of little value without applications against which their effectiveness can be verified. While numerous science and engineering applications are available, and each would require its own independent and careful design, many challenges are common across domains. Data collection in most scientific applications is now enormous, with petabytes of data generated. Applications such as climate science, in particular, collect data across multiple domains, over multiple temporal and spatial scales, and with large variance in measurement errors. In such cases, methods must be developed that are not only capable of handling large quantities of data but can also capture information over multiple scales, since backpropagation across multiple scales is ill defined given the nature of the errors and uncertainties at these scales. Moreover, measurement uncertainty is low in some datasets and high in others; these scenarios lead to different fidelities in model precision, depending on the uncertainty in the data.
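One common way such fidelity differences can enter training is through inverse-variance weighting of the loss, so that high-uncertainty (low-fidelity) measurements contribute less to the fit. The NumPy sketch below is a minimal illustration under that assumption; the synthetic data, noise levels, and all variable names are hypothetical and not drawn from any particular application.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical fidelities: a few precise measurements and many noisy ones.
x_hi = rng.uniform(0, 1, 50);  sigma_hi = 0.05   # low measurement uncertainty
x_lo = rng.uniform(0, 1, 500); sigma_lo = 0.50   # high measurement uncertainty
true_fn = lambda x: 2.0 * x + 1.0
y_hi = true_fn(x_hi) + rng.normal(0, sigma_hi, x_hi.size)
y_lo = true_fn(x_lo) + rng.normal(0, sigma_lo, x_lo.size)

x = np.concatenate([x_hi, x_lo])
y = np.concatenate([y_hi, y_lo])

# Inverse-variance weights: w_i = 1 / sigma_i^2.
w = np.concatenate([np.full(x_hi.size, 1 / sigma_hi**2),
                    np.full(x_lo.size, 1 / sigma_lo**2)])

# Weighted least squares: minimize sum_i w_i * (y_i - (a*x_i + b))^2,
# solved by scaling each row of the design matrix and target by sqrt(w_i).
A = np.stack([x, np.ones_like(x)], axis=1)
coef, *_ = np.linalg.lstsq(np.sqrt(w)[:, None] * A, np.sqrt(w) * y, rcond=None)
print("slope, intercept:", coef)  # close to (2.0, 1.0)
```

With the weights in place, the fit is dominated by the low-uncertainty measurements even though the noisy ones are ten times more numerous; dropping the weights would let the noisy data pull the estimate around, which is one concrete form the fidelity questions below can take.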
How does the difference in fidelity affect the learning behavior?
What weight should be given to the different scales of the problem? How can shifts in fidelity be identified when the amount of data is huge?
How does this characterization change as the number of dimensions increases? What type of model would be required to address these challenges?
What software infrastructure is required to address these challenges? Do we need high-performance computers for them?