
Similarity Search Input Formats

Similarity searches automatically detect and accept the following formats: FASTA, bare sequence, and GCG.

Length restriction: Query sequences are limited to 50,000 characters for all input formats.

Examining your query: When you examine your query while viewing results, the input is shown exactly as it was typed or read in.
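
As a rough illustration of the rules on this page, the Python sketch below classifies a single query as FASTA, GCG, or bare sequence and enforces the 50,000-character limit. The detection heuristics (a `>` in the first column for FASTA, a line ending in `..` for GCG) are assumptions drawn from the format descriptions here, not CAS Registry BLAST's actual detection logic, and `detect_format` and `MAX_QUERY_CHARS` are hypothetical names.

```python
MAX_QUERY_CHARS = 50_000  # limit stated on this page; applied to the raw input here (assumption)


def detect_format(query: str) -> str:
    """Guess the input format of a single query (hypothetical heuristic)."""
    if len(query) > MAX_QUERY_CHARS:
        raise ValueError(f"query exceeds {MAX_QUERY_CHARS:,} characters")
    lines = [line.rstrip() for line in query.splitlines()]
    # FASTA: the first non-blank line starts with '>' in the first column
    first = next((line for line in lines if line.strip()), "")
    if first.startswith(">"):
        return "fasta"
    # GCG: a descriptor line ends with two dots
    if any(line.endswith("..") for line in lines):
        return "gcg"
    # Anything else is treated as bare sequence (optionally with numbers/spaces)
    return "bare"
```

For example, `detect_format(">seq1\nQIKDLL")` returns `"fasta"`, while `detect_format("1 qikdllvsss")` returns `"bare"`.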

FASTA Format

A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line begins with a greater-than (>) symbol in the first column.
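
A minimal Python sketch of how a single FASTA query splits into its description and sequence, assuming whitespace inside sequence lines is discarded and that a second `>` line should be rejected because only single queries are processed. `parse_fasta` is a hypothetical helper, not part of CAS Registry BLAST.

```python
def parse_fasta(text: str) -> tuple[str, str]:
    """Split one FASTA record into (description, sequence)."""
    lines = [line for line in text.splitlines() if line.strip()]
    # The description must start with '>' in the first column
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must begin with a '>' description line")
    # Only one query is accepted, so a second description line is an error
    if any(line.startswith(">") for line in lines[1:]):
        raise ValueError("only a single query is accepted")
    description = lines[0][1:].strip()  # drop the leading '>'
    # Join the sequence lines, removing any spaces inside them
    sequence = "".join("".join(line.split()) for line in lines[1:])
    return description, sequence
```

For the example record below, this would return the description without its leading `>` and a 232-residue sequence.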

Example

>gi|129295|sp|P01013|OVAX_CHICK GENE X PROTEIN (OVALBUMIN-RELATED)
QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSFNVATLPAE
KMKILELPFASGDLSMLVLLPDEVSDLERIEKTINFEKLTEWTNPNTMEKRRVKVYLPQMKIEEKYNLTS
VLMALGMTDLFIPSANLTGISSAESLKISQAVHGAFMELSEDGIEMAGSTGVIEDIKHSPESEQFRADHP
FLFLIKHNPTNTIVYFGRYWSP

Format Requirements:

Format Recommendation: All lines of text should be shorter than 80 characters in length.
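
Because the description must stay on a single line, only the sequence lines can be wrapped. The Python sketch below writes a record with 70-residue sequence lines, matching the example above. `to_fasta` and its `width` parameter are hypothetical; keeping the description itself under 80 characters is left to the caller.

```python
def to_fasta(description: str, sequence: str, width: int = 70) -> str:
    """Format one record as FASTA, wrapping the sequence to `width` characters."""
    # Every wrapped line must stay shorter than 80 characters
    if not 0 < width < 80:
        raise ValueError("width must be between 1 and 79")
    header = ">" + description  # the description is never wrapped
    body = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header, *body])
```

A 150-residue sequence, for instance, is written as lines of 70, 70, and 10 residues.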

Bare Sequence Format

Bare sequence input can consist of lines of sequence data only, or lines of sequence data interspersed with numbers and/or spaces. Data in this format has no header.

Sequence Data Only

Example

QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSFNVATLPAE
KMKILELPFASGDLSMLVLLPDEVSDLERIEKTINFEKLTEWTNPNTMEKRRVKVYLPQMKIEEKYNLTS
VLMALGMTDLFIPSANLTGISSAESLKISQAVHGAFMELSEDGIEMAGSTGVIEDIKHSPESEQFRADHP
FLFLIKHNPTNTIVYFGRYWSP

Single queries only: CAS Registry BLAST® processes single queries only. Blank lines are not treated as delimiters.
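
Since blank lines are not delimiters, two blocks of bare sequence separated by an empty line are read as one query, not two. A minimal Python sketch of that behavior, assuming all whitespace in bare input is simply discarded; `read_bare_sequence` is a hypothetical helper.

```python
def read_bare_sequence(text: str) -> str:
    """Collapse a headerless query into one sequence string."""
    # Blank lines do not start a new query: every line belongs to the same
    # sequence, so all whitespace (including empty lines) is dropped.
    return "".join(text.split())
```

Here `read_bare_sequence("QIKDLL\n\nVSSSTD")` yields the single sequence `QIKDLLVSSSTD`.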

Interspersed Data

Bare sequence input can also contain sequence data interspersed with numbers and/or spaces, such as the sequence portion of a GenBank®/GenPept flatfile report:

Example

1 qikdllvsss tdldttlvlv naiyfkgmwk tafnaedtre mpfhvtkqes kpvqmmcmnn
61 sfnvatlpae kmkilelpfa sgdlsmlvll pdevsdleri ektinfeklt ewtnpntmek
121 rrvkvylpqm kieekynlts vlmalgmtdl fipsanltgi ssaeslkisq avhgafmels
181 edgiemagst gviedikhsp eseqfradhp flflikhnpt ntivyfgryw sp
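
One way to recover the residues from a report like this is to discard every digit and whitespace character. The sketch below assumes position numbers and spaces are the only extra characters; `clean_interspersed` is a hypothetical name, and letter case is left unchanged because the query is echoed exactly as typed.

```python
import re


def clean_interspersed(text: str) -> str:
    """Remove position numbers and spacing from a GenBank-style sequence block."""
    # Drop runs of digits and whitespace, keeping only the residue letters
    return re.sub(r"[\d\s]+", "", text)
```

Applied to the four example lines above, this yields the same 232 residues as the FASTA example, in lowercase.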

GCG Format

In GCG-formatted sequences, the descriptor line ends with two dots (..). Everything up to and including the two dots is stripped automatically before the query is submitted.

Example

!!NA_SEQUENCE 1.0
    test.seq Length: 5390 October 10, 2001 13:50 Type: N Check: 8167 ..
    1 ttatataaaa aatgctgaaa acaggatcaa ggaggaagat ttaaatatag 
    51 atataatata tgggaagaaa cataaaaacg aaataagaac agctaaatat
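
The Python sketch below mimics the stripping rule on the example above: it drops every line up to and including the first one ending in `..`, then removes the position numbers and spaces as for interspersed bare input. Treating the remainder as interspersed data is an assumption, and `gcg_to_sequence` is a hypothetical helper.

```python
import re


def gcg_to_sequence(text: str) -> str:
    """Strip a GCG header through the '..' line and return the bare residues."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.rstrip().endswith(".."):
            # Everything up to and including the descriptor line is discarded
            lines = lines[i + 1:]
            break
    # Remove position numbers and spacing from what remains
    return re.sub(r"[\d\s]+", "", "\n".join(lines))
```

For the four example lines, this returns the 100 residues shown, starting with `ttatataaaa`.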

_____

GenBank® is a registered trademark of the U.S. Department of Health and Human Services for the Genetic Sequence Data Bank.

BLAST® is a registered trademark of the National Library of Medicine.

BLAST® reference information provided in whole or in part from the National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health.

Unless designated otherwise, all other information Copyright © 1997-2017 by the American Chemical Society. All rights reserved.