Dictionary


Transcription Factor

Transcription is the process by which a segment of DNA is used as a template to generate an RNA molecule. The DNA segment is 'read' by an enzyme called RNA polymerase, which produces a strand of RNA that is complementary to the DNA template strand. In this complementary RNA strand, uracil takes the place of thymine. In transcription, a portion of the double-stranded DNA gives rise to a single-stranded RNA molecule. In some cases, the RNA molecule itself is a "finished product" that serves some important function within the cell. Often, however, transcription of an RNA molecule is followed by a translation step, which ultimately results in the production of a protein molecule. A transcription factor is a protein that regulates this process by binding to specific DNA sequences.
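The base-pairing rule described above can be sketched in a few lines of Python. This is only an illustration: the function name transcribe and the convention of supplying the template strand in the 3'->5' direction (so that the RNA reads 5'->3') are assumptions made for this example, and the example sequence is made up.

```python
# Base placed in the RNA opposite each DNA template base.
# Note that adenine in the template pairs with uracil (not thymine) in RNA.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (5'->3') complementary to a DNA template given 3'->5'.

    Hypothetical helper for illustration; raises KeyError on non-ACGT input.
    """
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3_to_5.upper())


print(transcribe("TACGGT"))  # AUGCCA
```

Real transcription also involves promoters, termination signals and RNA processing, none of which this sketch attempts to model.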

Annotation

Annotation is the process of attaching biological information to sequences. More generally, an annotation (irrespective of the context) is a note added by way of explanation or commentary. Once a genome is sequenced, it needs to be annotated to make sense of it. Genome annotation first identifies portions of the genome that do not code for proteins, then identifies the elements on the genome (a process called gene prediction), and finally attaches biological information to these elements. In short, it is the process of identifying the locations of genes and all of the coding regions in a genome and determining what those genes do.

Pathways

A biological pathway is a series of actions among molecules in a cell that leads to a certain product or a change in a cell. Such a pathway can trigger the assembly of new molecules, such as a fat or protein. Pathways can also turn genes on and off, or spur a cell to move. The most common types of biological pathways are metabolic pathways, gene regulation pathways and signal transduction pathways, which are involved in metabolism, the regulation of gene expression and the transmission of signals, respectively. Pathways play a key role in advanced studies of genomics.

PubMed

PubMed is a free search engine accessing primarily the MEDLINE database of references and abstracts on life sciences and biomedical topics. The United States National Library of Medicine (NLM) at the National Institutes of Health maintains the database as part of the Entrez system of information retrieval. From 1971 to 1997, online access to MEDLINE (the MEDLARS Online computerized database) was primarily through institutional facilities, such as university libraries. PubMed, first released in January 1996, ushered in the era of private, free, home- and office-based MEDLINE searching. The PubMed system was offered free to the public in June 1997, when MEDLINE searches via the Web were demonstrated, in a ceremony, by Vice President Al Gore.

Interactions

A quantitative genetic interaction definition has two components: a quantitative phenotypic measure and a neutrality function that predicts the phenotype of an organism carrying two noninteracting mutations. Interaction is then defined by deviation of a double-mutant organism's phenotype from the expected neutral phenotype. A double mutant with a more extreme phenotype than expected defines a synergistic (or synthetic) interaction between the corresponding mutations (synthetic lethality, in the extreme case). Alleviating or 'diminishing returns' interactions, in which the double-mutant phenotype is less severe than expected, often result when gene products operate in concert or in series within the same pathway. Alleviating interactions arise, for example, when a mutation in one gene impairs the function of a whole pathway, thereby masking the consequence of mutations in additional members of that pathway.
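The definition above can be made concrete with a small Python sketch. It assumes, purely for illustration, that the phenotypic measure is relative fitness (wild type = 1.0), that both single mutations are deleterious, and that the neutrality function is the multiplicative model (expected double-mutant fitness = product of the single-mutant fitnesses). Other neutrality functions exist; the function names and the tolerance threshold are hypothetical.

```python
def expected_fitness(w_a: float, w_b: float) -> float:
    """Neutrality function (assumed multiplicative): fitness expected for
    a double mutant if the two mutations do not interact."""
    return w_a * w_b


def classify_interaction(w_a: float, w_b: float, w_ab: float,
                         tolerance: float = 0.05) -> str:
    """Classify by the deviation of observed from expected fitness.

    The tolerance is an arbitrary illustrative cutoff, not a standard value.
    """
    epsilon = w_ab - expected_fitness(w_a, w_b)
    if epsilon < -tolerance:
        return "synergistic"   # double mutant is sicker than expected
    if epsilon > tolerance:
        return "alleviating"   # double mutant is less severe than expected
    return "neutral"


# Single mutants with fitness 0.8 and 0.7 give an expected value of about 0.56.
print(classify_interaction(0.8, 0.7, 0.0))   # synergistic (synthetic lethal)
print(classify_interaction(0.8, 0.7, 0.7))   # alleviating
print(classify_interaction(0.8, 0.7, 0.56))  # neutral
```

In the alleviating case the double mutant is no worse than the sicker single mutant, which is the masking pattern expected when both genes act in the same pathway.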

PDB

The Protein Data Bank (PDB) is a repository for the three-dimensional structural data of large biological molecules, such as proteins and nucleic acids. The PDB is a key resource in areas of structural biology, such as structural genomics. Most major scientific journals, and some funding agencies, now require scientists to submit their structure data to the PDB. If the contents of the PDB are thought of as primary data, then there are hundreds of derived (i.e., secondary) databases that categorize the data differently. A 4-character PDB ID is assigned to each new structure at the time of deposition. The IDs are automatically assigned and do not have meaning. However, they serve as the unique, immutable identifier of each entry in the Protein Data Bank. As such, they are used throughout the scientific literature (e.g. in journal articles and in other databases) to refer to entries in the Protein Data Bank. Hence, if the PDB ID of an entry in the Protein Data Bank is known, it is the most direct way to retrieve it from the database.
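Because the PDB ID is the key used to look up an entry, it can be useful to check that a string has the right shape before querying anything. The sketch below assumes the classic 4-character form is a digit from 1 to 9 followed by three alphanumeric characters, and that IDs are compared case-insensitively; both points are assumptions of this example, and the function name is hypothetical.

```python
import re

# Assumed shape of a classic PDB ID: one digit (1-9) + three alphanumerics.
PDB_ID_PATTERN = re.compile(r"^[1-9][A-Za-z0-9]{3}$")


def normalize_pdb_id(raw: str) -> str:
    """Return the ID in upper case, or raise ValueError if it is malformed."""
    candidate = raw.strip()
    if not PDB_ID_PATTERN.match(candidate):
        raise ValueError(f"not a 4-character PDB ID: {raw!r}")
    return candidate.upper()


print(normalize_pdb_id(" 4hhb "))  # 4HHB
```

The check is purely syntactic: a well-formed ID is not guaranteed to correspond to an existing entry.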

Conserved Domains

Domains can be thought of as distinct functional and/or structural units of a protein. These two classifications coincide rather often, as a matter of fact, and what is found as an independently folding unit of a polypeptide chain also carries specific function. Conserved domains contain conserved sequence patterns or motifs, which allow for their detection in polypeptide sequences. The distinction between domains and motifs is not sharp, however, especially in the case of short repetitive units. Functional motifs are also present outside the scope of structurally conserved domains. The Conserved Domain Database is a resource for the annotation of functional units in proteins. Its collection of domain models includes a set curated by NCBI, which utilizes 3D structure to provide insights into sequence/structure/function relationships.
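As a toy illustration of detecting a conserved sequence pattern in a polypeptide, the sketch below scans a protein sequence with a regular expression for the Walker A (P-loop) consensus G-x(4)-G-K-[ST]. The example sequence is invented, the function name is hypothetical, and a plain regular expression is a deliberate simplification: domain resources generally rely on statistical profile models rather than exact patterns.

```python
import re

# Walker A / P-loop consensus: G, any four residues, G, K, then S or T.
WALKER_A = re.compile(r"G.{4}GK[ST]")


def find_motif(sequence: str, pattern: re.Pattern = WALKER_A):
    """Return (0-based start, matched residues) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence.upper())]


# Made-up sequence used only to exercise the pattern.
print(find_motif("MKLIGPSGSGKSTLLR"))  # [(4, 'GPSGSGKS')]
```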

Gene Ontologies

The Gene Ontology, or GO, is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species. More specifically, the project aims to maintain and develop its controlled vocabulary of gene and gene product attributes; to annotate genes and gene products, and assimilate and disseminate annotation data; and to provide tools for easy access to all aspects of the data provided by the project and for functional interpretation of experimental data using the GO, for example via enrichment analysis. The Gene Ontology project provides an ontology of defined terms representing gene product properties covering three domains: Cellular Components, Molecular Functions and Biological Processes. Each GO term within the ontology has a term name, which may be a word or string of words; a unique alphanumeric identifier; a definition with cited sources; and a namespace indicating the domain to which it belongs.
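The four parts of a GO term listed above map naturally onto a small record type. The following Python sketch is one possible representation, not the project's own data model: the class and field names are hypothetical, the identifier is assumed to have the form "GO:" followed by seven digits, and the definition text and source shown are placeholders rather than the official wording.

```python
import re
from dataclasses import dataclass

# The three domains covered by the ontology.
NAMESPACES = {"biological_process", "molecular_function", "cellular_component"}
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")  # assumed identifier format


@dataclass(frozen=True)
class GOTerm:
    go_id: str                      # unique identifier
    name: str                       # term name (a word or string of words)
    namespace: str                  # domain the term belongs to
    definition: str                 # textual definition
    definition_sources: tuple = ()  # sources cited for the definition

    def __post_init__(self):
        if not GO_ID_PATTERN.match(self.go_id):
            raise ValueError(f"malformed GO identifier: {self.go_id!r}")
        if self.namespace not in NAMESPACES:
            raise ValueError(f"unknown namespace: {self.namespace!r}")


term = GOTerm(
    go_id="GO:0006915",
    name="apoptotic process",
    namespace="biological_process",
    definition="Illustrative paraphrase only: a form of programmed cell death.",
    definition_sources=("PLACEHOLDER:source",),  # not a real citation
)
print(term.go_id, term.name, term.namespace)
```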

miRNA

A microRNA (abbreviated miRNA) is a small non-coding RNA molecule (containing about 22 nucleotides) found in plants, animals, and some viruses, which functions in RNA silencing and post-transcriptional regulation of gene expression. MicroRNAs have been shown to be involved in a wide range of biological processes such as cell cycle control, apoptosis and several developmental and physiological processes including stem cell differentiation, hematopoiesis, hypoxia, cardiac and skeletal muscle development, neurogenesis, insulin secretion, cholesterol metabolism, aging, immune responses and viral replication. In addition, highly tissue-specific expression and distinct temporal expression patterns during embryogenesis suggest that microRNAs play a key role in the differentiation and maintenance of tissue identity. Beyond their important roles in healthy individuals, microRNAs have also been implicated in a number of diseases including a broad range of cancers, heart disease and neurological diseases. Consequently, microRNAs are intensely studied as candidates for diagnostic and prognostic biomarkers and predictors of drug response.

Multiple Sequence Alignments

A Multiple Sequence Alignment (MSA) is a sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences is assumed to have an evolutionary relationship by which the sequences share a lineage and are descended from a common ancestor. From the resulting MSA, sequence homology can be inferred and phylogenetic analysis can be conducted to assess the sequences' shared evolutionary origins. Visual depictions of the alignment illustrate mutation events such as point mutations (single amino acid or nucleotide changes) that appear as differing characters in a single alignment column, and insertion or deletion mutations (indels or gaps) that appear as hyphens in one or more of the sequences in the alignment. Multiple sequence alignment is often used to assess sequence conservation of protein domains, tertiary and secondary structures, and even individual amino acids or nucleotides.
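To show how point mutations and indels appear in alignment columns, the Python sketch below labels each column of an already-aligned set of sequences. The toy alignment, the function name and the three labels are assumptions made for this example; the sketch reads an existing alignment and does not compute one.

```python
def classify_columns(alignment):
    """Label each column of an MSA as 'conserved', 'substitution' or 'indel'.

    Expects three or more aligned sequences of equal length, with '-' for gaps.
    """
    if len(alignment) < 3 or len({len(seq) for seq in alignment}) != 1:
        raise ValueError("need at least three aligned sequences of equal length")
    labels = []
    for column in zip(*alignment):      # iterate column by column
        residues = set(column)
        if "-" in residues:
            labels.append("indel")          # a hyphen marks a gap
        elif len(residues) == 1:
            labels.append("conserved")      # same character in every sequence
        else:
            labels.append("substitution")   # differing characters in a column
    return labels


toy_alignment = [
    "ACG-TA",
    "ACGTTA",
    "ATG-TA",
]
print(classify_columns(toy_alignment))
# ['conserved', 'substitution', 'conserved', 'indel', 'conserved', 'conserved']
```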

CNV (Copy Number Variation)

Copy-number variations (CNVs) are a form of structural variation that manifest as deletions or duplications in the genome. For example, a chromosome that normally has sections in the order A-B-C-D might instead have sections A-B-C-C-D (a duplication of "C") or A-B-D (a deletion of "C"). Cells with CNVs have abnormal or, for certain genes, normal variations in their copy number. This variation accounts for roughly 13% of human genomic DNA, and each variation may range from about one kilobase (1,000 nucleotide bases) to several megabases in size. CNVs affect segments of DNA and are thus different from single-nucleotide polymorphisms (SNPs), which affect only a single nucleotide base.
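The A-B-C-D example above can be reproduced with a short Python sketch that counts how often each section occurs in a sample relative to a reference. Treating a chromosome as a list of labeled sections, and the function name used here, are simplifications for illustration; real CNV detection works on sequencing or array data.

```python
from collections import Counter


def copy_number_changes(reference, sample):
    """Return {section: change in copy number} for sections that differ.

    Positive values are duplications, negative values are deletions.
    """
    ref_counts = Counter(reference)
    sample_counts = Counter(sample)
    changes = {}
    for section in sorted(set(ref_counts) | set(sample_counts)):
        delta = sample_counts[section] - ref_counts[section]
        if delta != 0:
            changes[section] = delta
    return changes


reference = "A-B-C-D".split("-")
print(copy_number_changes(reference, "A-B-C-C-D".split("-")))  # {'C': 1}
print(copy_number_changes(reference, "A-B-D".split("-")))      # {'C': -1}
```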

Somatic Mutations

A somatic mutation is a genetic alteration acquired by a cell that can be passed to the progeny of the mutated cell in the course of cell division. Somatic mutations differ from germ line mutations, which are inherited genetic alterations that occur in the germ cells (i.e., sperm and eggs). Somatic mutations are frequently caused by environmental factors, such as exposure to ultraviolet radiation or to certain chemicals. They may occur in any cell division, from the first cleavage of the fertilized egg to the cell divisions that replace cells in an aged individual. The mutation affects all cells descended from the mutated cell.