The data pre-computing process is illustrated in Figure 1; web-based and stand-alone tools were handled separately. Web-based localization prediction tools were queried via a Web automat, a Python automatic submission workflow using the "httplib" and "urllib" libraries. A separate script was created for each tool. For web tools with no equivalent (such as "TatP" for Tat-BOX and "LIPO" for Lipoprotein-BOX) that were incompatible with automatic requests, we collected the results manually. CoBaltDB also provides a platform with automatically pre-filled forms for additional submissions to a selection of fifty recent or specific web tools (Table 4). The stand-alone tools were installed on a Unix platform (the only platform compatible with all of them) and included, together with the HTTP request scripts, in a global Python pipeline. We selected information from an up-to-date collection of 20 databases and integrated these data within CoBaltDB; the databases were retrieved either by simple downloading or by creating an appropriate script that navigates the web databases to collect all protein information. The global Python pipeline used multi-threading to speed up the pre-computation of the 784 proteomes.
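To illustrate the kind of automatic submission performed by the Web automat, the sketch below posts one protein sequence to a web-based predictor, retrieves the raw HTML result, and fans the submissions out over a thread pool. It is a minimal sketch assuming the Python 2-era "httplib" and "urllib" libraries named above; the tool host, form-field names and thread count are hypothetical placeholders, not the actual CoBaltDB scripts.

```python
# Minimal sketch of an automatic web-tool submission (hedged example).
# Assumes Python 2 "httplib"/"urllib"; URL and form fields are hypothetical.
import httplib
import urllib
from multiprocessing.dummy import Pool  # thread pool (threads, not processes)

TOOL_HOST = "predictor.example.org"     # hypothetical web tool host
TOOL_PATH = "/cgi-bin/submit"           # hypothetical form action

def submit_protein(fasta_record):
    """POST one protein sequence to the web tool and return the raw HTML page."""
    params = urllib.urlencode({"sequence": fasta_record, "format": "short"})
    headers = {"Content-type": "application/x-www-form-urlencoded",
               "Accept": "text/html"}
    conn = httplib.HTTPConnection(TOOL_HOST)
    try:
        conn.request("POST", TOOL_PATH, params, headers)
        response = conn.getresponse()
        return response.read()          # HTML output, parsed downstream
    finally:
        conn.close()

def precompute(fasta_records, n_threads=8):
    """Multi-threaded submission of a whole proteome, one request per protein."""
    pool = Pool(n_threads)
    try:
        return pool.map(submit_protein, fasta_records)
    finally:
        pool.close()
        pool.join()
```

Because each predictor exposes a different submission form, a separate script of this kind would be written per tool, varying only the host, path and field names.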
Figure 1. A schematic view of the CoBaltDB workflow. CoBaltDB integrates the results of 43 localization predictors for 784 complete bacterial and archaeal proteomes. Each complete NCBI prokaryotic genome implemented in CoBaltDB was classified as archaea, monoderm bacteria or diderm bacteria. 101 protein subcellular location predictors were evaluated and a few were rejected. Selected tools were classified as feature localization tools (Specialized), localization meta-tools (Global) or databases. The data recovery process was performed manually or via a Web automat using a Python automatic submission workflow for both stand-alone and web-based tools. Databases were downloaded. For each protein, the outputs collected were parsed and selected items were stored in dedicated CoBaltDB-formatted files (.cbt). The parsing pipeline creates one ".cbt" file per replicon to compose the final CoBaltDB repository. The client CoBaltDB Graphical User Interface communicates with the server-side repository via web services to provide graphical and tabular representations of the results.

Database Creation and Architecture

For each protein, every output collected (an HTML page for web tools and a text file for stand-alone applications) was parsed and selected items were stored in a particular format: binary "marshal" files. The object structure obtained by parsing the tool output was saved directly into a marshal file, allowing quick and easy re-opening by directly restoring the initial parsing object.
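As a rough illustration of this storage step, the sketch below dumps a parsed result to a binary file with Python's standard "marshal" module and restores it later. It is a minimal sketch under stated assumptions: the result structure, tool name and file name are hypothetical and do not reflect the actual CoBaltDB ".cbt" layout.

```python
# Minimal sketch of saving/restoring a parsed tool output with "marshal".
# The result structure and file name below are hypothetical illustrations.
import marshal

# Object produced by parsing one tool's output for one protein
# (marshal handles built-in types only: dicts, lists, strings, numbers, ...).
parsed_result = {
    "protein_id": "YP_000001",      # hypothetical protein identifier
    "tool": "SignalP",              # hypothetical predictor name
    "prediction": "signal peptide",
    "score": 0.92,
}

# Save the parsing object directly to a binary file.
with open("YP_000001.signalp.marshal", "wb") as handle:
    marshal.dump(parsed_result, handle)

# Re-open later: the initial parsing object is restored as-is.
with open("YP_000001.signalp.marshal", "rb") as handle:
    restored = marshal.load(handle)

assert restored == parsed_result
```

Because marshal round-trips built-in Python objects without any per-field serialization code, re-loading a stored result restores the original parsing object in a single call, which is the quick-opening behaviour described above.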