Identification of Protein Homologs and Domain Boundaries by Iterative Sequence Alignment

Dustin Schaeffer, Nick V. Grishin

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Abstract

Evolutionary domains are protein regions with observable sequence similarity to other known domains. Here we describe how to use common sequence and profile alignment algorithms (i.e., BLAST, HHsearch) to delineate putative domains in novel protein sequences, given a reference library of protein domains. In this case, we use our database of evolutionary domains (ECOD) as a reference, but other domain sequence libraries could be used (e.g., SCOP, CATH). We describe our domain partition algorithm along with specific notes on how to avoid domain indexing errors when working with multiple data sources and software algorithms with differing outputs.
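As a rough illustration of the kind of procedure the abstract describes, the sketch below greedily partitions a query sequence into putative domains from a list of alignment hits. It is not the chapter's algorithm: the Hit record, the min_new threshold of 30 residues, and the e-value ordering are all assumptions made for this example. Coordinates are treated as 1-based and inclusive, as in BLAST tabular output, and are converted explicitly before slicing a Python string, which is one common source of off-by-one indexing errors.

```python
from typing import List, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    """One alignment hit against a reference domain (hypothetical record)."""
    domain_id: str   # identifier of the reference domain (illustrative)
    start: int       # query start, 1-based inclusive
    end: int         # query end, 1-based inclusive
    evalue: float    # lower means more significant


def to_ranges(positions: Set[int]) -> List[Tuple[int, int]]:
    """Collapse a set of 1-based positions into sorted inclusive ranges."""
    ranges: List[Tuple[int, int]] = []
    for pos in sorted(positions):
        if ranges and pos == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], pos)   # extend the current range
        else:
            ranges.append((pos, pos))           # start a new range
    return ranges


def to_slice(start: int, end: int) -> Tuple[int, int]:
    """Convert 1-based inclusive coordinates to a 0-based half-open slice."""
    return start - 1, end


def partition(hits: List[Hit], min_new: int = 30):
    """Greedy partition: accept hits in order of e-value while each one
    still covers at least min_new residues not yet assigned to a domain.
    The threshold is an assumed value, not one taken from the chapter."""
    assigned: Set[int] = set()
    domains = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        new = set(range(hit.start, hit.end + 1)) - assigned
        if len(new) >= min_new:
            assigned |= new
            domains.append((hit.domain_id, to_ranges(new)))
    return domains


hits = [
    Hit("domA", 1, 100, 1e-50),
    Hit("domB", 90, 200, 1e-30),
    Hit("domC", 10, 95, 1e-10),   # fully covered by domA, so it is skipped
]
result = partition(hits)
```

Here domA claims residues 1-100, domB is trimmed to the unassigned residues 101-200, and domC is dropped because it adds no new residues. A real pipeline would also need to reconcile query numbering with the residue numbering used by the reference library, which this sketch does not attempt.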

Language: English (US)
Title of host publication: Methods in Molecular Biology
Publisher: Humana Press Inc.
Pages: 277-286
Number of pages: 10
DOI: https://doi.org/10.1007/978-1-4939-8736-8_15
State: Published - Jan 1 2019

Publication series

Name: Methods in Molecular Biology
Volume: 1851
ISSN (Print): 1064-3745

Fingerprint

Sequence Alignment
Information Storage and Retrieval
Libraries
Software
Databases
Protein Domains
Proteins

Keywords

  • Homologs
  • Protein domains
  • Sequence alignment

ASJC Scopus subject areas

  • Molecular Biology
  • Genetics

Cite this

Schaeffer, D., & Grishin, N. V. (2019). Identification of Protein Homologs and Domain Boundaries by Iterative Sequence Alignment. In Methods in Molecular Biology (pp. 277-286). (Methods in Molecular Biology; Vol. 1851). Humana Press Inc. https://doi.org/10.1007/978-1-4939-8736-8_15

@inbook{953b80a9fbfb428683ecc728ac1c1f81,
title = "Identification of Protein Homologs and Domain Boundaries by Iterative Sequence Alignment",
abstract = "Evolutionary domains are protein regions with observable sequence similarity to other known domains. Here we describe how to use common sequence and profile alignment algorithms (i.e., BLAST, HHsearch) to delineate putative domains in novel protein sequences, given a reference library of protein domains. In this case, we use our database of evolutionary domains (ECOD) as a reference, but other domain sequence libraries could be used (e.g., SCOP, CATH). We describe our domain partition algorithm along with specific notes on how to avoid domain indexing errors when working with multiple data sources and software algorithms with differing outputs.",
keywords = "Homologs, Protein domains, Sequence alignment",
author = "Dustin Schaeffer and Grishin, {Nick V.}",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/978-1-4939-8736-8_15",
language = "English (US)",
series = "Methods in Molecular Biology",
publisher = "Humana Press Inc.",
pages = "277--286",
booktitle = "Methods in Molecular Biology",

}

TY  - CHAP
T1  - Identification of Protein Homologs and Domain Boundaries by Iterative Sequence Alignment
AU  - Schaeffer, Dustin
AU  - Grishin, Nick V.
PY  - 2019/1/1
Y1  - 2019/1/1
AB  - Evolutionary domains are protein regions with observable sequence similarity to other known domains. Here we describe how to use common sequence and profile alignment algorithms (i.e., BLAST, HHsearch) to delineate putative domains in novel protein sequences, given a reference library of protein domains. In this case, we use our database of evolutionary domains (ECOD) as a reference, but other domain sequence libraries could be used (e.g., SCOP, CATH). We describe our domain partition algorithm along with specific notes on how to avoid domain indexing errors when working with multiple data sources and software algorithms with differing outputs.
KW  - Homologs
KW  - Protein domains
KW  - Sequence alignment
UR  - http://www.scopus.com/inward/record.url?scp=85054735639&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85054735639&partnerID=8YFLogxK
U2  - 10.1007/978-1-4939-8736-8_15
DO  - 10.1007/978-1-4939-8736-8_15
M3  - Chapter
T3  - Methods in Molecular Biology
SP  - 277
EP  - 286
BT  - Methods in Molecular Biology
PB  - Humana Press Inc.
ER  - 