FASTA Substitution Codes

Before submitting a request, remove any numerical digits in the query sequence or replace them with the appropriate letter codes (e.g., N for an unknown nucleotide residue or X for an unknown amino acid residue).

Nucleotide Codes

Supported nucleotide codes are:

Nucleotide Code    Base
A                  Adenine
C                  Cytosine
G                  Guanine
T                  Thymine
U                  Uracil
R                  G or A (purine)
Y                  T or C (pyrimidine)
K                  G or T (keto)
M                  A or C (amino)
S                  G or C (strong)
W                  A or T (weak)
B                  G or T or C
D                  G or A or T
H                  A or C or T
V                  G or C or A
N                  A or G or C or T (any)
-                  Gap of indeterminate length
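
The nucleotide codes above can be checked programmatically before a query is submitted. The following is a minimal sketch only, not part of the search service: the names IUPAC_NUCLEOTIDES, invalid_nucleotide_codes, and code_matches are hypothetical, and treating lowercase input as equivalent to uppercase is an assumption.

```python
# Nucleotide codes from the table above, mapped to the bases each one stands for.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC",
    "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA",
    "N": "AGCT",
}
GAP = "-"  # gap of indeterminate length


def invalid_nucleotide_codes(sequence):
    """Return the sorted characters in `sequence` that are not in the table.

    Assumption: case is ignored, so "acgt" is treated like "ACGT".
    """
    allowed = set(IUPAC_NUCLEOTIDES) | {GAP}
    return sorted({ch for ch in sequence.upper() if ch not in allowed})


def code_matches(code, base):
    """Return True if `base` is one of the bases that `code` can represent."""
    return base.upper() in IUPAC_NUCLEOTIDES[code.upper()]


if __name__ == "__main__":
    print(invalid_nucleotide_codes("ACGTNRY-"))  # []
    print(invalid_nucleotide_codes("ACGT1X"))    # ['1', 'X']
    print(code_matches("R", "A"))                # True
```

An empty result from invalid_nucleotide_codes means every character is a supported code. Any digits it reports would need to be removed or replaced with N, as described at the top of this page.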

Amino Acid Codes

For programs that use protein query sequences (BLASTp and tBLASTn), the accepted amino acid codes are:

Amino Acid Code    Three-Letter Code    Amino Acid Name
A                  Ala                  Alanine
B                  Asx                  Aspartate or Asparagine
C                  Cys                  Cysteine
D                  Asp                  Aspartate
E                  Glu                  Glutamate
F                  Phe                  Phenylalanine
G                  Gly                  Glycine
H                  His                  Histidine
I                  Ile                  Isoleucine
J                  Xle                  Leucine or Isoleucine
K                  Lys                  Lysine
L                  Leu                  Leucine
M                  Met                  Methionine
N                  Asn                  Asparagine
O                  Pyl                  Pyrrolysine
P                  Pro                  Proline
Q                  Gln                  Glutamine
R                  Arg                  Arginine
S                  Ser                  Serine
T                  Thr                  Threonine
U                  Sec                  Selenocysteine
V                  Val                  Valine
W                  Trp                  Tryptophan
X                  Xxx                  Any - Uncommon or Unspecified
Y                  Tyr                  Tyrosine
Z                  Glx                  Glutamate or Glutamine
*                                       Translation stop
-                                       Gap of indeterminate length
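
A protein query can be screened against the one-letter codes above in the same way. This is an illustrative sketch under stated assumptions: the names AMINO_ACID_CODES, AMBIGUOUS_RESIDUES, invalid_amino_acid_codes, and possible_residues are hypothetical, and ignoring case is an assumption rather than documented behavior.

```python
import string

# The table above assigns a meaning to every letter A-Z, plus "*" and "-".
AMINO_ACID_CODES = set(string.ascii_uppercase) | {"*", "-"}

# Codes from the table that stand for one of two possible residues.
AMBIGUOUS_RESIDUES = {
    "B": ("D", "N"),  # Aspartate or Asparagine
    "Z": ("E", "Q"),  # Glutamate or Glutamine
    "J": ("L", "I"),  # Leucine or Isoleucine
}


def invalid_amino_acid_codes(sequence):
    """Return the sorted characters in `sequence` that are not in the table.

    Assumption: case is ignored.
    """
    return sorted({ch for ch in sequence.upper() if ch not in AMINO_ACID_CODES})


def possible_residues(code):
    """Return the residues a code may stand for (itself if unambiguous)."""
    code = code.upper()
    return AMBIGUOUS_RESIDUES.get(code, (code,))


if __name__ == "__main__":
    print(invalid_amino_acid_codes("QIKDLLVSSS*"))  # []
    print(invalid_amino_acid_codes("QIK1DL?"))      # ['1', '?']
    print(possible_residues("B"))                   # ('D', 'N')
```

X is left out of AMBIGUOUS_RESIDUES on purpose, because it stands for any residue rather than a choice between two.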

Bare Sequence

Bare sequence (plain text) input can be lines of sequence data only, without a FASTA definition line.

Example:

QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSFNVATLPAE
KMKILELPFASGDLSMLVLLPDEVSDLERIEKTINFEKLTEWTNPNTMEKRRVKVYLPQMKIEEKYNLTS
VLMALGMTDLFIPSANLTGISSAESLKISQAVHGAFMELSEDGIEMAGSTGVIEDIKHSPESEQFRADHP
FLFLIKHNPTNTIVYFGRYWSP

Bare sequence input can also be interspersed with numbers and/or spaces, such as the sequence portion of a GenBank®/GenPept flatfile report:

Example:

1 qikdllvsss tdldttlvlv naiyfkgmwk tafnaedtre mpfhvtkqes kpvqmmcmnn
61 sfnvatlpae kmkilelpfa sgdlsmlvll pdevsdleri ektinfeklt ewtnpntmek
121 rrvkvylpqm kieekynlts vlmalgmtdl fipsanltgi ssaeslkisq avhgafmels
181 edgiemagst gviedikhsp eseqfradhp flflikhnpt ntivyfgryw sp
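
The two examples above contain the same residues. The short sketch below shows one way to reduce flatfile-style text to a plain bare sequence by dropping position numbers and spaces. The function name to_bare_sequence is hypothetical, and converting to uppercase is an assumption made only so the result is directly comparable with the first example.

```python
def to_bare_sequence(text):
    """Drop digits and whitespace, then uppercase the remaining codes."""
    kept = (ch for ch in text if not ch.isdigit() and not ch.isspace())
    return "".join(kept).upper()


# First two lines of the flatfile-style example above.
flatfile = """\
1 qikdllvsss tdldttlvlv naiyfkgmwk tafnaedtre mpfhvtkqes kpvqmmcmnn
61 sfnvatlpae kmkilelpfa sgdlsmlvll pdevsdleri ektinfeklt ewtnpntmek
"""

if __name__ == "__main__":
    bare = to_bare_sequence(flatfile)
    print(bare[:20])  # QIKDLLVSSSTDLDTTLVLV
```

This sketch discards every digit, so it is only suitable when the digits are position numbers and not placeholders for residues.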

_____

See also

Similarity Search Input Formats

GenBank® is a registered trademark of the U.S. Department of Health and Human Services for the Genetic Sequence Data Bank.

BLAST® is a registered trademark of the National Library of Medicine.

BLAST® reference information provided in whole or in part from the National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health.

Unless designated otherwise, all other information Copyright © 1997-2017 by the American Chemical Society. All rights reserved.