An open reading frame (ORF) is a continuous series of codons in DNA or RNA that has the potential to code for a protein. It begins with a start (initiation) codon and continues in the same reading frame, uninterrupted, until the first in-frame stop codon.
Structure and Identification
Open reading frames are essential for understanding gene organization. A typical ORF consists of a start codon (usually AUG in RNA, ATG in DNA) followed by codons that are read sequentially during translation until a stop codon is reached (UAA, UAG or UGA in RNA; TAA, TAG or TGA in DNA). Genomes are scanned in all six reading frames, three on the forward strand and three on the reverse complement, to identify ORFs. Computational gene prediction favors long ORFs because random sequence also contains stretches without stop codons, but these tend to be short: three of the 64 codons are stops, so in random sequence of uniform base composition a stop codon is expected about once every 21 codons. Even so, not every ORF corresponds to a functional gene; some arise by chance or serve as regulatory elements.
In bacteria and archaea, genes are often arranged in operons, and ORFs correspond closely to protein-coding sequences. In eukaryotes, the coding sequence of a gene may be interrupted by introns, so an ORF is continuous only in the spliced mRNA, and splicing defines the final coding sequence. Some viruses and prokaryotes have overlapping ORFs, where different proteins are encoded in different reading frames of the same region. Mechanisms such as ribosomal frameshifting or leaky scanning allow translation of multiple proteins from a single mRNA. Short ORFs in untranslated regions, particularly upstream ORFs in the 5′ untranslated region, can modulate translation of downstream genes.
Identification of ORFs is a foundational step in genome annotation and helps researchers predict potential proteins and explore evolutionary relationships.
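The six-frame scan described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular annotation tool: it assumes the standard genetic code, ATG as the only start codon, and an uppercase or lowercase DNA string containing only A, C, G and T. The function names and the min_codons threshold are hypothetical choices made for this example.

```python
# Minimal six-frame ORF scan over a DNA string.
# Assumptions: standard genetic code, ATG as the only start codon,
# input contains only A/C/G/T. All names here are illustrative.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame, min_codons):
    """Yield (start, end) for each ATG...stop ORF in one reading frame.

    `end` is exclusive and includes the stop codon. `min_codons` counts
    the codons from the start codon up to, but not including, the stop.
    """
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == START:
            start = i  # first in-frame ATG opens the ORF
        elif start is not None and codon in STOPS:
            if (i - start) // 3 >= min_codons:
                yield start, i + 3
            start = None  # look for the next ORF in this frame


def find_orfs(dna, min_codons=30):
    """Scan all six frames; return (strand, frame, start, end, sequence).

    Coordinates for the "-" strand are relative to the reverse complement,
    not to the original sequence.
    """
    dna = dna.upper()
    results = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            for start, end in orfs_in_frame(seq, frame, min_codons):
                results.append((strand, frame, start, end, seq[start:end]))
    return results
```

For example, find_orfs("CCATGAAATTTTAGGG", min_codons=3) reports one ORF on the forward strand in frame 2, spanning ATG AAA TTT TAG. A length threshold such as min_codons is what lets a scan like this discard the short chance ORFs mentioned above; real gene finders add further evidence such as codon usage and alternative start codons.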
Illustrative Examples
In coronaviruses, the genome includes ORF1a and ORF1b, which together encode the replicase polyproteins; translation of ORF1b depends on a programmed ribosomal frameshift. The genome also carries shorter ORFs, such as ORF3a, ORF6 and ORF8 in SARS-CoV-2, that encode accessory proteins. Bacteriophages and small RNA viruses often use overlapping ORFs to maximize coding capacity in compact genomes. Some eukaryotic messenger RNAs carry upstream ORFs that modulate translation efficiency of the main coding sequence. Bioinformatic tools such as NCBI's ORFfinder scan sequences to highlight potential open reading frames for further analysis.
Open reading frames reflect the linear nature of genetic coding and are a starting point for identifying genes in newly sequenced genomes. By examining start and stop codon positions and reading frame continuity, researchers can distinguish potential protein-coding regions from non-coding sequences. Understanding ORFs is therefore fundamental to molecular genetics, virology and biotechnology.
Related Terms: Codon, Start Codon, Stop Codon, Ribosome, Frameshift