This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.
Service BIOPSE (0001) Subproject Description
The Biomedical Problem Solving Environment, or BioPSE, lies at the computational core of the Center for Integrative Biomedical Computing. In this environment we develop modules, design simulations, and conduct scientific studies; a carefully crafted design is required for such an environment to meet all of its users' needs. This environment provides an infrastructure that liberates the scientist from the user-intensive, mundane tasks associated with most existing software tools. By commanding a modular, extensible, interactive set of software tools, scientists are free to apply their expertise to the science at hand.
TRADITIONAL METHODS
Traditional methods for investigating biomedical applications often use multiple, non-integrated computer programs. For example, a scientist using a computer simulation to examine the effect of electrode patch placement on transcardiac current density in defibrillation would require geometric modeling, numerical simulation, and scientific visualization tools to complete the task. One program might be used to define the thoracic surfaces from medical images, and another to create a discrete mesh of the volume contained within the surfaces. Another application, such as MATLAB, might be used to run a finite element simulation of the electric current distribution from the electrodes through the thoracic volume. An alternative approach might be to write a Fortran program using a public domain numerical library such as LAPACK. Viewing the output would require a scientific visualization package.
It would be necessary to save the output of each of these steps in a format that could be read by the next step in the sequence, and this might necessitate separate file format conversion utilities. To find the optimal location, shape, and size parameters for the defibrillating electrode, the scientist would have to go back to the geometric modeling package, change the necessary parameters, re-edit the input file, possibly recompile and relink the code, manually rerun all of the subsequent steps to see how the new electrode configuration affects the current density distribution, and then manually iterate. The manual intervention required to drive this process is both tedious and time consuming.
A more efficient scenario is one in which the user defines an appropriate set of parameters for a given simulation run, executes the defined simulations, and saves the results for subsequent examination. The complete execution of the sequence might require hours or even days, but the user would be free during that time to perform other tasks. In our example of the defibrillation simulation, the scientist could select various locations and orientations for the defibrillation electrodes, choose values for the other parameters of the simulation (e.g., the number of nodes in the finite element model, boundary conditions, error tolerance for convergence, and the evaluation criteria), and leave the simulations to run as long as necessary. Viewing the results would occur only after the completion of the simulation and might be as simple as watching the animation produced by the simulation or scanning other defibrillation quality indices such as maximum and minimum current density magnitude or current density histograms from the heart. This is an example of batch processing, i.e., the automated execution process whereby the user selects all of the parameters in advance and does not control the intra- or interpackage execution.
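The batch scenario described above can be sketched as a parameter sweep. This is only an illustration under stated assumptions: `run_simulation`, its formula, and the parameter values are hypothetical placeholders, not a physical model and not part of BioPSE.

```python
from itertools import product


def run_simulation(position, node_count, tolerance):
    """Hypothetical stand-in for one simulation run (not a physical model)."""
    x, y = position
    max_density = node_count / (1.0 + x * x + y * y)
    min_density = max_density * tolerance
    return {"max_density": max_density, "min_density": min_density}


def run_batch(positions, node_counts, tolerances):
    """Run every parameter combination unattended and collect the results."""
    results = []
    for position, node_count, tolerance in product(positions, node_counts, tolerances):
        indices = run_simulation(position, node_count, tolerance)
        results.append({"position": position, "node_count": node_count,
                        "tolerance": tolerance, **indices})
    return results


# All parameters are chosen in advance; nothing is adjusted while the batch runs.
batch = run_batch(
    positions=[(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)],
    node_counts=[1000, 2000],
    tolerances=[1e-3],
)

# Results are examined only after every run has completed.
highest = max(batch, key=lambda r: r["max_density"])
```

The sweep makes the limitation of batch processing visible: the loop offers no point at which the user could inspect or redirect a run that is going wrong.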
A primary benefit of batch processing is that it allows the scientist to utilize computational resources without the need to continuously guide the process. However, with most computer programs, execution cannot be automated; that is, the package will not run without regular user intervention during execution. Even if batch processing is possible, it leaves the user with limited control of an ongoing simulation and hence little chance to divert an unsuccessful or meaningless simulation until it is completed.
BioPSE
The goal of BioPSE is to incorporate and integrate all of the steps described in the previous example as components of a single, unified, extensible problem solving environment (PSE) that permits constant monitoring and oversight. The resulting functionality includes the ability to manage each step in a sequential computing process, and also to create batch processes that execute repeated simulations. However, the functionality that sets BioPSE apart from most integrated software environments is its ability to intervene and control execution anywhere in the chain at any time during its execution, which is known as "computational steering". To provide a non-technical analogy, adding computational steering to a software environment is similar to adding the ability to change plans while traveling by train. A train passenger traditionally gets on the train and automatically gets to a destination, leaving all the details of the individual actions to the rail system machinery and staff. The route and the destination are fixed, and the passenger assumes the route is optimized for travel time. Adding steering to the train system would permit each passenger to direct the train (or at least the part of it in which the passenger is sitting) to a new route, with different stops and even a different destination, and to make these decisions at any time during the trip.
Many simulation software packages act like the traditional train ride: one seeks a solution but does not care how efficiently or robustly that solution is achieved. Some simulation software leaves all control and responsibility in the programmer's hands, achieving the ultimate in freedom, but with no guidance or assurance that the result will be efficient or even correct. Computational steering provides a middle ground in which the user determines the course of the simulation, but with guidance and support through robust, efficient algorithms and extensive, intuitive feedback. In the more rigorous example of the defibrillation simulation, computational steering allows a scientist to interactively change parameters and settings as the simulation executes, either as a single run or in batch mode. Steering interventions might include adjusting electrode locations or shapes, trying different input voltage levels, or adjusting the geometric model resolution in order to balance accuracy and execution time.
There are a number of challenges in building a powerful software architecture for biomedical applications: (1) Accessibility and Usability: How to satisfy a wide range of users (from software developers to biomedical scientists) who have applications spanning a wide variety of biomedical domains (from bioelectric fields to mouse phenotyping). (2) Integration and Extensibility: How to make the software extensible so users can extend it to fit their needs, and so it can interact with other software systems. (3) Performance and Control: How to make the software easy to use and robust, while also making it efficient and high-performance.
ACCESSIBILITY AND USABILITY
This term describes making software as easy to obtain and use as possible. Software that is accessible should be easy to find, simple to download, and straightforward to install.
Usability depends on a large number of features such as clear, comprehensive documentation, intuitive interfaces, and a general design that suits the application and the expected user. We have achieved a substantial degree of accessibility and usability in BioPSE through our web-based download process, extensive documentation, and especially the creation of dedicated biomedical applications that use the terminology and typical workflow concepts of the field. The Specific Aims related to Accessibility and Usability that we are undertaking at the Center for Integrative Biomedical Computing include: (1) Building targeted BioPSE applications that will be useful to our collaborators and biomedical scientists in the associated communities. (2) Further reducing the time required to get new users to a point of productivity with BioPSE. (3) Investigating the infrastructure required to carry out a separation of the graphical user interface (GUI) from the algorithmic component of BioPSE without a loss in functionality, flexibility, or performance. To achieve such a separation, we anticipate the need for separate state and event managers to control communication and allocation of system resources.
INTEGRATION AND EXTENSIBILITY
Integration in computing refers to including all (or most) of the features required for a task in a single program, so that the user always works in the same environment, the same commands perform consistent tasks, and all the necessary steps and options are immediately at hand. Extensibility is related to integration but includes the ability to add functionality to the environment through some sort of programming or the addition of more software components. In BioPSE, we achieve integration through a dataflow programming model in which "modules" perform tasks on data streams ("data pipes") that pass from module to module in what we call a "network".
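The dataflow model of modules, data pipes, and a network can be illustrated with a minimal sketch. The `Module` class and the three example modules are hypothetical and are not BioPSE's actual programming interface; the sketch also assumes a simple pull-based execution order, which may differ from how BioPSE schedules its modules.

```python
class Module:
    """A named processing step whose inputs come from upstream modules."""

    def __init__(self, name, func, upstream=()):
        self.name = name
        self.func = func
        self.upstream = list(upstream)  # the "data pipes" feeding this module

    def execute(self):
        # Run the upstream modules first, then apply this module's task
        # to the data they produce.
        inputs = [module.execute() for module in self.upstream]
        return self.func(*inputs)


# A three-module network: source -> scale -> total.
source = Module("source", lambda: [1.0, 2.0, 3.0])
scale = Module("scale", lambda data: [2.0 * v for v in data], upstream=[source])
total = Module("total", lambda data: sum(data), upstream=[scale])

result = total.execute()
```

Extending the network amounts to constructing another `Module` and connecting it to existing ones, which mirrors the network-level extensibility the text describes.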
Data passes from module to module, never leaving the program until the task is complete, so that all functionality is integrated and available through a consistent user interface. The system is extensible at the network level because a user can add modules and extend the capabilities of the network. BioPSE is also more globally extensible through the addition of new modules, which any user with some programming background can create (or share with others). Beyond adding modules directly to BioPSE in the form of new code that conforms to its data structures, there is a second approach to extensibility, one that is more inclusive and bidirectional. This form of extensibility makes it possible to incorporate other pieces of software through a more flexible and general interface than is required to integrate program code directly. This approach involves building bridges to other programs rather than annexing them into an ever larger monolith. Moreover, it offers the opportunity to share code from the existing BioPSE base with other existing programs, thus extending capabilities in both directions. This is the model we have adopted for the future of BioPSE and that we have begun to implement. The Specific Aims related to Integration and Extensibility that we are undertaking at the Center for Integrative Biomedical Computing include: (1) Developing infrastructure and support tools that simplify the process of linking other software systems to BioPSE. (2) Developing the necessary interfaces and modularity such that external programs can access core BioPSE functionality. (3) Investigating the computer architectural elements necessary to provide efficient bridging of BioPSE with other substantial software systems using a generalized component interface.
PERFORMANCE AND CONTROL
Performance and control are essential attributes of software design that become increasingly critical as application size and complexity grow.
Biomedical data sets continue to grow rapidly in size: MRI volumes now often contain hundreds of megabytes, CT volumes can encompass gigabytes, and confocal microscopy data sets can reach tens of gigabytes. As scientists are able to scan and simulate data sets at finer and finer resolutions, the software tools that work with those data sets must keep pace. Similarly, the complexity of simulations grows exponentially, resulting in equation systems that include up to 25 million points, with the most recent even reaching 125 million. From the outset, BioPSE was designed to support the modeling, simulation, and visualization of very large data sets. To achieve this goal, we have designed the algorithms of BioPSE to make careful use of memory, and we have hand-tuned the performance of critical portions of the BioPSE software. While efficient data structures and algorithms greatly aid BioPSE in calculating results for large data sets and simulations, the greatest efficiency in the system comes from avoiding unnecessary computation entirely. Specifically, through BioPSE we support computational steering for manual program control and incorporate "coherence accelerators" to allow automatic control and regulation of program execution. The coherence accelerators in BioPSE are run-time optimizations that take advantage of previously computed results. When a module's inputs have not changed from the previous execution, the module can reuse the output results that were computed last time. Similarly, for iterative algorithms, such as some linear system solvers, previous results can be used as a first guess to seed the algorithm, thus reducing the number of required iterations. These acceleration gains are a natural potential advantage that integrated dataflow systems have over the more traditional systems composed of disjointed components. We exploit this advantage ubiquitously throughout BioPSE.
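The two kinds of reuse described above can be sketched as follows. This is an illustration only, not BioPSE's implementation: `CachedModule` is a hypothetical wrapper, and Jacobi iteration on a small 2x2 system is an assumed stand-in for the unnamed iterative linear solvers.

```python
class CachedModule:
    """Wraps a function and reuses its last output when inputs are unchanged."""

    def __init__(self, func):
        self.func = func
        self.last_inputs = None
        self.last_output = None
        self.executions = 0  # counts real computations, not reused results

    def execute(self, *inputs):
        if self.executions > 0 and inputs == self.last_inputs:
            return self.last_output  # inputs unchanged: skip recomputation
        self.last_output = self.func(*inputs)
        self.last_inputs = inputs
        self.executions += 1
        return self.last_output


def jacobi(a, b, x0, tolerance=1e-10, max_iterations=10_000):
    """Solve a x = b by Jacobi iteration; returns (solution, iterations used)."""
    x = list(x0)
    n = len(b)
    for iteration in range(1, max_iterations + 1):
        new_x = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
        change = max(abs(new_x[i] - x[i]) for i in range(n))
        x = new_x
        if change < tolerance:
            return x, iteration
    return x, max_iterations


# Reuse of unchanged results.
smooth = CachedModule(lambda values: [v * 0.5 for v in values])
first = smooth.execute((2.0, 4.0))
second = smooth.execute((2.0, 4.0))  # same inputs: stored result is returned
third = smooth.execute((2.0, 6.0))   # inputs changed: recomputed

# Warm start: seed the solver with the solution of a nearby earlier problem.
matrix = [[4.0, 1.0], [1.0, 3.0]]
previous, _ = jacobi(matrix, [1.0, 2.0], [0.0, 0.0])
_, cold_iterations = jacobi(matrix, [1.0, 2.1], [0.0, 0.0])
solution, warm_iterations = jacobi(matrix, [1.0, 2.1], previous)
```

In this small example the warm-started solve needs fewer iterations than the cold one because the right-hand side changed only slightly, so the earlier solution is already close to the new one. The benefit would shrink if successive problems differed substantially.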
The Specific Aims related to Performance and Control that we are undertaking at the Center for Integrative Biomedical Computing include: (1) Developing and implementing automated testing measures that will improve the robustness and stability of the BioPSE software. (2) Developing automatic saving and restarting infrastructure to increase the fault tolerance of the BioPSE system. (3) Investigating the infrastructural and communications measures required to maintain high performance for large networks within BioPSE and during bridging-based integration with other software systems. (4) Exploring and creating strategies for efficient management of large data sets through measures such as streaming, caching, and database archiving.