Webtools bundle ma

1/17/2024

We use “genomic enzymology” to describe the integration of analyses of sequence-function space in a protein family with the genome context of its members, in order to predict their enzymatic activities and the metabolic pathways in which they function. For an uncharacterized enzyme, genome context can provide information about the identity of its reaction as well as those of its neighbors.

The Enzyme Function Initiative (EFI) developed strategies and tools to facilitate experimental assignment of in vitro activities and in vivo metabolic functions to uncharacterized enzymes discovered in genome projects. We developed a publicly accessible web resource with tools that “democratize” genomic enzymology: 1) EFI-EST for generating sequence similarity networks (SSNs) and 2) EFI-GNT for mining genome contexts. Since EFI-EST and EFI-GNT were introduced in 2014, >5,200 users have submitted >45,000 jobs to EFI-EST, and >1,500 users have submitted >14,000 jobs to EFI-GNT. The tools have been cited in >300 publications; a list is available on the web resource “Training” page. We first provide brief descriptions of EFI-EST and EFI-GNT and then examples of their use for 1) surveying sequence-function space in protein families to identify candidate proteins with novel properties/functions and 2) discovering enzymes in novel metabolic pathways.

2. EFI-EST for Generating Sequence Similarity Networks (SSNs)

The EFI-EST tool was developed for generating SSNs for protein families. Using input Option B, the user specifies a Pfam and/or InterPro family with sequences from the UniProt database. For large families, the UniRef90 or UniRef50 database is used, in which sequences are conflated at 90% and 50% sequence identity thresholds, respectively. UniRef SSNs contain fewer nodes and edges than UniProt SSNs, so they can be more easily manipulated on laboratory laptop/desktop computers. Using input Option D, the UniProt IDs in clusters in UniRef SSNs are used to obtain higher-resolution SSNs.
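To make the point about UniRef SSNs having fewer nodes and edges concrete, here is a minimal sketch in Python. It is only an illustration of the bookkeeping, not how UniRef or EFI-EST actually computes anything: the IDs, the `representative` mapping, and the `collapse_network` helper are all hypothetical, and the edges are made up.

```python
def collapse_network(edges, representative):
    """Collapse a sequence-level network onto cluster representatives.

    edges: iterable of (id_a, id_b) pairs between individual sequences.
    representative: dict mapping each sequence ID to the ID that stands
    in for its conflated cluster (hypothetical data, for illustration).
    """
    nodes = set(representative.values())
    collapsed = set()
    for a, b in edges:
        rep_a, rep_b = representative[a], representative[b]
        if rep_a != rep_b:
            # Edges inside one cluster disappear; parallel edges merge.
            collapsed.add(tuple(sorted((rep_a, rep_b))))
    return nodes, collapsed


# Made-up network of six sequences conflated into three clusters.
representative = {
    "P1": "P1", "P2": "P1", "P3": "P1",
    "P4": "P4", "P5": "P4",
    "P6": "P6",
}
edges = [
    ("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
    ("P3", "P4"), ("P2", "P5"), ("P4", "P5"),
    ("P5", "P6"),
]

nodes, collapsed = collapse_network(edges, representative)
print(len(representative), "nodes and", len(edges), "edges before")
print(len(nodes), "nodes and", len(collapsed), "edges after")
```

Going the other way, from a cluster of representatives back to the individual member IDs, is the idea behind obtaining a higher-resolution network for a cluster of interest.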
The amount of information is “amazing”: it should be viewed not as overwhelming but as an opportunity for discovery. The challenge is organizing and leveraging the data. Most entries in the UniProt database (>80%) are assigned to at least one Pfam and/or InterPro family that provides a (sometimes tentative) description of function.

Sequence similarity networks (SSNs) are now widely used for analyses of sequence-function space in protein families. An SSN displays the results of an all-by-all pairwise sequence comparison (BLAST): each sequence is represented by a “node”, and nodes are connected by a line (“edge”) if they share a minimum user-specified sequence similarity. As the sequence similarity threshold increases, the nodes segregate into isofunctional clusters. Mapping experimentally established functions, e.g., SwissProt-curated, onto the SSN nodes allows identification of clusters with known functions; uncharacterized clusters may contain enzymes in novel metabolic pathways and/or identify the starting points for evolution of functions for novel applications.

In bacterial, archaeal, and fungal genomes, operons and/or gene clusters often encode functionally linked enzymes in metabolic pathways.
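The node, edge, and threshold description of an SSN can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the EFI-EST implementation: the sequence names and pairwise scores are made up, the score is treated as a generic similarity value rather than real BLAST output, and `build_ssn` and `clusters` are hypothetical helper names.

```python
def build_ssn(scores, threshold):
    """Build adjacency sets: an edge joins two nodes whose pairwise
    similarity score meets the user-specified threshold."""
    adjacency = {}
    for (a, b), score in scores.items():
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def clusters(adjacency):
    """Return connected components (clusters), largest first."""
    seen = set()
    result = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        result.append(sorted(component))
    return sorted(result, key=lambda c: (-len(c), c))


# Made-up all-by-all scores for five hypothetical sequences.
scores = {
    ("A", "B"): 90, ("A", "C"): 85, ("B", "C"): 88,
    ("A", "D"): 35, ("A", "E"): 20, ("B", "D"): 30,
    ("B", "E"): 25, ("C", "D"): 40, ("C", "E"): 22,
    ("D", "E"): 75,
}

for threshold in (30, 60, 89):
    print(threshold, clusters(build_ssn(scores, threshold)))
```

Raising the threshold removes the weaker edges, so the single cluster seen at the lowest threshold splits into smaller ones, which mirrors the segregation of nodes into clusters described in the text.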

When this Opinion was completed (August 1, 2020), the UniProt database contained 185,561,210 entries (Release 2020_03, J).
