What is a gene? An introduction for beginners.

Intro

A gene is a functional unit of deoxyribonucleic acid (DNA) that contains basic information for the development of an individual's characteristics. During transcription, a complementary copy in the form of RNA is produced from one strand of the DNA.

There are different types of RNA. During translation, a sub-process of protein synthesis, the amino acid sequence of a protein is read from the mRNA (messenger RNA). In the body, each protein takes on specific functions, through which characteristics are expressed. The activity of a gene, that is, its expression, can be regulated differently in individual cells.

Edited by Christina Swords, Ph.D.

[Figure: Process by which genes produce proteins to perform a particular function in the body.]

Genes lie at specific positions on the chromosomes and are the carriers of the genetic information that is passed on to descendants through reproduction. The entire genetic information of a cell, consisting of thousands of genes, is called the genome. The Human Genome Project set out to sequence the complete human genome, which contains roughly 20,000 protein-coding genes.

Structure

At the molecular level, a gene consists of two different regions:

A section of DNA from which a single-stranded copy of RNA is produced by transcription.
All additional DNA segments that are involved in the regulation of this copying process.

The structure of genes differs between groups of living organisms, most notably between prokaryotes and eukaryotes.

Genes encode not only the mRNA from which proteins are translated but also rRNA, tRNA and other ribonucleic acids that have different tasks in the cell. A gene encoding a protein contains a description of the amino acid sequence of this protein. This description is written in a chemical language, the genetic code, in the form of the nucleotide sequence of the DNA molecule.

The individual “chain links” (nucleotides) of the DNA are the “letters” of the genetic code and are read in groups of three (triplets, or codons). The coding region, that is, all nucleotides that are directly involved in the description of the amino acid sequence, is called an open reading frame. A nucleotide consists of one phosphate group, one deoxyribose (sugar) and one base. The base is either adenine, thymine, guanine or cytosine.
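To make the idea of codons as “letters” concrete, here is a minimal Python sketch that reads an open reading frame three nucleotides at a time. The table is the standard genetic code in one-letter amino acid notation; the function name `translate` and the example sequence are made up for illustration, and the sketch assumes the input contains only A, C, G and T.

```python
from itertools import product

# Standard genetic code. Codons are enumerated in the order T, C, A, G
# for each position; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf: str) -> str:
    """Read an open reading frame codon by codon until a stop codon."""
    orf = orf.upper()
    protein = []
    for i in range(0, len(orf) - 2, 3):
        amino_acid = CODON_TABLE[orf[i:i + 3]]
        if amino_acid == "*":  # stop codon: the protein ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


# ATG (Met), GCC (Ala), TGG (Trp), TAA (stop)
print(translate("ATGGCCTGGTAA"))  # MAW
```

In a real cell the codons are read from the mRNA, where uracil (U) takes the place of thymine (T); the DNA alphabet is used here only to match the four bases named above.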

[Figure: A gene broken down into its component parts, highlighting the process of translation.]

In front of the transcription unit lies the promoter, and further regulatory regions such as enhancers can increase the expression of the gene. Depending on the sequence, various proteins such as transcription factors and RNA polymerase bind to these regions to begin transcription. DNA polymerases, in contrast, copy the DNA itself when it is replicated before cell division.

In addition to the directly protein-coding open reading frame, the mRNA contains untranslated, non-coding regions: the 5′ untranslated region (5′ UTR) and the 3′ untranslated region (3′ UTR). These regions serve to regulate translation initiation and to regulate the activity of the ribonucleases, which degrade the RNA.
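The three parts of an mRNA can be illustrated with a short Python sketch. It is a deliberate simplification: it assumes that the first AUG in the sequence is the start codon and that the open reading frame ends at the first in-frame stop codon, whereas real cells use additional sequence context. The function name `split_mrna`, the dictionary keys and the example sequence are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna: str):
    """Split an mRNA into 5' UTR, open reading frame and 3' UTR.

    Simplifying assumption: the first AUG is taken as the start codon.
    Returns None if no complete open reading frame is found.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return None
    # Walk codon by codon, staying in the frame set by the start codon.
    for i in range(start + 3, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3  # the stop codon belongs to the reading frame
            return {
                "five_prime_utr": mrna[:start],
                "orf": mrna[start:end],
                "three_prime_utr": mrna[end:],
            }
    return None


parts = split_mrna("GGCACCAUGGCUUAACUGAAUAAA")
print(parts["five_prime_utr"])   # GGCACC
print(parts["orf"])              # AUGGCUUAA
print(parts["three_prime_utr"])  # CUGAAUAAA
```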

[Figure: Components of mRNA, including both coding and non-coding regions.]

The genes of prokaryotes differ in structure from their eukaryotic counterparts in that they do not have introns, regions of an RNA transcript that do not code for protein. In addition, several different RNA-forming gene segments can follow one another very closely. These are referred to as polycistronic genes, and their activity can be regulated by a common regulatory element. Such clusters, called operons, are transcribed together but translated into different proteins. Operons are typical of prokaryotes.

Genes can mutate, that is, change spontaneously or through external influences (for example radiation). These changes can occur at different positions within the gene. As a result, after a series of mutations, a gene can exist in different variants called alleles.

A DNA sequence can also contain several overlapping genes. Copies produced by gene duplication can be identical in sequence yet be regulated differently, so that they are expressed differently without being alleles of one another.

Organization of genes

In all living organisms, only part of the DNA codes for RNAs. The remaining parts are called non-coding DNA. Some of this non-coding DNA functions in gene regulation and influences the architecture of the chromosomes.

The position of a gene on a chromosome is called its locus. Genes are not evenly distributed on the chromosomes but sometimes occur in so-called clusters. Such clusters can consist of genes that merely happen to lie close to each other, or they can be groups of genes that code for functionally related proteins. However, genes whose proteins have similar functions can also be located on different chromosomes.

There are sections on the DNA that code for several different proteins. The reason for this is overlapping open reading frames.
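How one stretch of DNA can hold two overlapping open reading frames is easiest to see by scanning it in each of the three possible reading frames. The Python sketch below does this for one strand only and treats every ATG followed by an in-frame stop codon as an open reading frame; the function name `find_orfs` and the example sequence are constructed for illustration and do not come from a real genome.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str):
    """Return (start, end, sequence) for ORFs in the three forward frames."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Look for the first stop codon in the same frame.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        orfs.append((i, j + 3, dna[i:j + 3]))
                        i = j  # resume the scan after this ORF
                        break
            i += 3
    return orfs


for start, end, seq in find_orfs("ATGCATGCCTAACTGA"):
    print(start, end, seq)
# 0 12 ATGCATGCCTAA
# 4 16 ATGCCTAACTGA
```

The two reading frames share positions 4 to 11 of the sequence, but because they are shifted by one nucleotide relative to each other, the shared nucleotides are grouped into different codons and describe different amino acid sequences.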

Gene activity and regulation

Genes are “active” when their information is transcribed into RNA, i.e. when transcription takes place. Depending on the gene, mRNA, tRNA, or rRNA is produced. In the case of mRNA, a protein can then be translated from the transcript.
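Transcription itself follows a simple pairing rule, which can be sketched in a few lines of Python. The sketch assumes that the template strand is written in the 3′ to 5′ direction, so that the resulting RNA reads 5′ to 3′, and it ignores promoters, RNA polymerase and any processing of the transcript. The function name `transcribe` and the example sequence are hypothetical.

```python
# Base-pairing rule for building RNA on a DNA template:
# RNA contains uracil (U) where DNA would contain thymine (T).
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand: str) -> str:
    """Build the complementary RNA for a DNA template strand.

    Assumption: the template is given 3' to 5', so the returned
    RNA reads 5' to 3'.
    """
    return "".join(PAIRING[base] for base in template_strand.upper())


print(transcribe("TACCGGACCATT"))  # AUGGCCUGGUAA
```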

Gene regulation takes place through the binding and release of proteins, called transcription factors, at specific regions of the DNA, the so-called “regulatory elements”. On a larger scale, it is achieved by methylation or by “packaging” DNA segments into histone complexes.

The regulatory elements of DNA are also subject to variation. The influence of changes in gene regulation is likely to be comparable to the influence of mutations in protein-coding sequences. With classical genetic methods (analysing inheritance and phenotypes), the effects of these two kinds of change usually cannot be told apart.

Special genes

RNA genes in viruses

Although genes are present as DNA segments in all cell-based life forms, there are some viruses whose genetic information is in the form of RNA. Many RNA viruses infect a cell, which then starts to produce proteins directly according to the instructions of the viral RNA; transcription from DNA to RNA is not necessary. Retroviruses, on the other hand, copy their RNA into DNA during infection, using the enzyme reverse transcriptase.

Pseudogenes

A gene in the narrower sense is usually a nucleotide sequence that contains the information for a directly functional protein. In contrast, pseudogenes are gene copies that do not encode a full-length functional protein. They are often the result of duplications and/or mutations that accumulate in the pseudogene in the absence of selection, so that it has lost its original function. Nevertheless, some pseudogenes appear to play a role in the regulation of gene activity.

Jumping genes

Jumping genes, also known as transposons, are mobile sections of genetic material that can move within the DNA of a cell. They cut themselves out of their original location in the genome and insert themselves at another location. Researchers have shown that these jumping genes are active not only in reproductive cells, as previously assumed, but also in nerve precursor cells.
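The “cut and paste” movement described here can be mimicked on a plain string. The following Python sketch is only a toy model: the function name `transpose`, the coordinates and the sequences are hypothetical, the target position refers to the sequence after the element has been cut out, and the enzymes and sequence signals that real transposons depend on are left out entirely.

```python
def transpose(genome: str, start: int, end: int, target: int) -> str:
    """Cut genome[start:end] out and paste it back in at `target`.

    `target` is an index into the sequence that remains after the
    element has been removed.
    """
    element = genome[start:end]
    remainder = genome[:start] + genome[end:]
    if not 0 <= target <= len(remainder):
        raise ValueError("target lies outside the remaining sequence")
    return remainder[:target] + element + remainder[target:]


genome = "AAAA" + "TTGCGCAA" + "CCCC"  # mobile element in the middle
print(transpose(genome, 4, 12, 6))     # AAAACCTTGCGCAACC
```

The total length of the sequence stays the same, since the element is moved rather than copied.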
