A flexible enterprise software architecture
The ISB Informatics Infrastructure, referred to as I3, is a modular, service-oriented research enterprise architecture that is capable of integrating emerging technologies. The I3 enterprise architecture is designed for interoperability and extensibility, and uses facets of both 'top-down' and 'bottom-up' design. In I3, developers can use their own evolving data models. However, formally defined domain-specific data models and services are also provided through a number of common services. This architecture is designed to be flexible, interoperable and lightweight, while enabling the rapid development of new solutions and the integration of new technologies.
There are two sides to the architecture: data access and data analysis. The data access side uses LSIDs to provide an identity system for mapping data items to each other and to their RDF-encoded metadata. Relationship information is navigated through the RDF documents. The data analysis side is based around Web Services: an ontology describing each Web Service is stored in a registry service, so that resources can be reasoned over and discovered at run time. New services and data sources are integrated by writing lightweight wrappers. This is a "model free" architecture, in which no structured data model is directly imposed on clients (which can be written in a variety of languages). However, a standard ID mechanism coupled with the use of "meta models" and ontologies means that a formal data-centric integration strategy is available to developers if they wish to use it.
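The data access side described above can be illustrated with a minimal sketch. It assumes the standard LSID layout (urn:lsid:authority:namespace:object, with an optional revision) and stands in for RDF documents with plain (subject, predicate, object) tuples. The authority example.org, the predicate names and the helper names parse_lsid and related are hypothetical and are not taken from I3 itself.

```python
from typing import List, NamedTuple, Optional, Tuple


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Parse 'urn:lsid:<authority>:<namespace>:<object>[:<revision>]'."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not a valid LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not a valid LSID: {text!r}")
    if any(not part for part in parts[2:5]):
        raise ValueError(f"LSID has an empty component: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# Hypothetical metadata: tuples standing in for RDF statements.
Triple = Tuple[str, str, str]
TRIPLES: List[Triple] = [
    ("urn:lsid:example.org:experiment:42", "hasSample", "urn:lsid:example.org:sample:7"),
    ("urn:lsid:example.org:sample:7", "hasLabel", "example sample"),
]


def related(subject: str, predicate: str, triples: List[Triple] = TRIPLES) -> List[str]:
    """Follow one relationship from a subject identifier."""
    return [o for s, p, o in triples if s == subject and p == predicate]


experiment = "urn:lsid:example.org:experiment:42"
print(parse_lsid(experiment))
for sample in related(experiment, "hasSample"):
    print(sample, related(sample, "hasLabel"))
```

Because clients only need to understand identifiers and statements, they can navigate relationships without committing to a shared structured data model, which is the sense in which the architecture is "model free".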
Adaptable data management system
Within research there is a continual introduction of new technologies and techniques. These are often high throughput and automated, and their usage is continually evolving. To support these requirements we have built a data management system that can be rapidly adapted to new uses.
The data management system is designed to support the seamless mining and analysis of biological experiment data that is commonly used in systems biology (e.g. ChIP-chip, gene expression, proteomics, imaging, FACS). We use different content graphs to represent different views upon the data. Links between these views are dynamic and resolved at runtime. This means that the management system allows for both the rapid introduction of new types of information and the evolution of the knowledge it represents.
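One way to picture content graphs whose links are resolved at runtime is sketched below. This is an illustration under stated assumptions rather than the system's actual design: the class names ContentGraph and LinkResolver, the view names and the sample attribute are all hypothetical. Links are stored as rules and evaluated only when a query is made, so nodes added later are picked up without rebuilding anything.

```python
from typing import Callable, Dict, List, Tuple

Rule = Callable[[dict, dict], bool]


class ContentGraph:
    """One view of the data: node identifier -> attributes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.nodes: Dict[str, dict] = {}

    def add(self, node_id: str, **attrs: object) -> None:
        self.nodes[node_id] = dict(attrs)


class LinkResolver:
    """Stores link rules between views and evaluates them on demand."""

    def __init__(self) -> None:
        self.graphs: Dict[str, ContentGraph] = {}
        self.rules: Dict[Tuple[str, str], Rule] = {}

    def register(self, graph: ContentGraph) -> None:
        self.graphs[graph.name] = graph

    def link(self, source: str, target: str, rule: Rule) -> None:
        self.rules[(source, target)] = rule

    def resolve(self, source: str, node_id: str, target: str) -> List[str]:
        # Nothing is precomputed: the rule runs against the current nodes.
        rule = self.rules[(source, target)]
        src = self.graphs[source].nodes[node_id]
        return sorted(
            tid for tid, attrs in self.graphs[target].nodes.items() if rule(src, attrs)
        )


experiments = ContentGraph("experiments")
images = ContentGraph("images")
experiments.add("exp-1", sample="s-7")
images.add("img-1", sample="s-7")

resolver = LinkResolver()
resolver.register(experiments)
resolver.register(images)
resolver.link("experiments", "images", lambda e, i: e["sample"] == i["sample"])

print(resolver.resolve("experiments", "exp-1", "images"))
images.add("img-2", sample="s-7")  # a new item arrives after the link was defined
print(resolver.resolve("experiments", "exp-1", "images"))
```

The trade-off in this sketch is that every query pays the cost of evaluating the rule, in exchange for never holding stale links when a view changes.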
Rather than build a system
The management system is being extended to allow for multiple levels of integration, so that experimental results can simply be "dropped" into the system and immediately made available, and can later be migrated through a state machine to allow for more complex representations. The architecture is also being extended to provide for materialized views through dynamic data transformation, context searching, project working, history mechanisms and relationship navigation.
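The idea of migrating results through a state machine can be sketched as follows. The three levels (DROPPED, ANNOTATED, MODELED) and the permitted transitions are assumptions chosen for illustration; the actual levels of integration are not specified here.

```python
from enum import Enum
from typing import Dict, List, Set


class Level(Enum):
    DROPPED = "dropped"      # raw result, immediately available
    ANNOTATED = "annotated"  # hypothetical intermediate level
    MODELED = "modeled"      # hypothetical richer representation


# Hypothetical transition table: each level may only move one step forward.
ALLOWED: Dict[Level, Set[Level]] = {
    Level.DROPPED: {Level.ANNOTATED},
    Level.ANNOTATED: {Level.MODELED},
    Level.MODELED: set(),
}


class Result:
    """An experimental result together with its integration level."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.level = Level.DROPPED
        self.history: List[Level] = [Level.DROPPED]

    def migrate(self, target: Level) -> None:
        if target not in ALLOWED[self.level]:
            raise ValueError(f"cannot migrate {self.level.name} -> {target.name}")
        self.level = target
        self.history.append(target)


result = Result("array-run-1")
result.migrate(Level.ANNOTATED)
result.migrate(Level.MODELED)
print([level.name for level in result.history])
```

Keeping the transition table as data means a new level can be introduced by editing ALLOWED rather than the migration logic.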
BIOINFORMATICS APPLICATION DEVELOPMENT
The informatics group develops applications and algorithms for use in specific life science research areas.
Cytoscape
Members of the team work on the core development of Cytoscape. Cytoscape is the leading network analysis and visualization tool. It is an open source, community-led software development project. Cytoscape was originally developed at the ISB, and is now maintained by the Cytoscape Consortium, which consists of members from ISB, UCSD, MSKCC, Pasteur and Agilent.
Analysis and ETL Pipelines
A number of componentized pipelines have been built to enable the flexible analysis and processing of experiment data. The majority of these tools have been built for the processing of genomic and microscopy data, and are run within the GenePattern toolset environment. The use of a toolset builder allows for the rapid development and customisation of toolsets by non-software engineers.
The toolsets that have been constructed include those for the analysis of various ChIP-chip tiling and gene expression arrays, as well as for image analysis.
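The notion of a componentized pipeline can be illustrated with a small sketch in which each step is an independent function and a pipeline is simply an ordered list of steps. This does not use or imitate the GenePattern API; the step names, the build_pipeline helper and the input values are hypothetical.

```python
import math
from typing import Callable, Iterable, List

Step = Callable[[List[float]], List[float]]


def build_pipeline(steps: Iterable[Step]) -> Step:
    """Chain steps so that each one receives the previous step's output."""
    ordered = list(steps)

    def run(values: List[float]) -> List[float]:
        for step in ordered:
            values = step(values)
        return values

    return run


def drop_nonpositive(values: List[float]) -> List[float]:
    """Discard intensities that cannot be log-transformed."""
    return [v for v in values if v > 0]


def log2_transform(values: List[float]) -> List[float]:
    return [math.log2(v) for v in values]


def center_on_mean(values: List[float]) -> List[float]:
    if not values:
        return []
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical intensities; reordering or swapping steps needs no new code.
normalize = build_pipeline([drop_nonpositive, log2_transform, center_on_mean])
print(normalize([4.0, -1.0, 16.0, 1.0]))
```

Because steps share one signature, a builder tool could let a non-programmer assemble a pipeline by choosing and ordering steps from a list.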