GGF Generics Library
- General – reference database
- Cancer database
- Children’s genetic diseases database
*Non-clinical reference use only
This project could be adopted by any cancer or disease research institute, or by any institution that could introduce a program encouraging patients to contribute samples to build the databases. This could involve building both a public GGF library or database and a specialized database for the sponsor institution.
The GGF algorithm analyzes DNA and produces GGF motifs, which are meaningful shapes, as opposed to statistical plots that are too ‘noisy’ to classify usefully into generic types for ‘technical analysis’. For instance, technical analysis in the stock market involves classifying statistical plots into generic shapes such as ‘candlestick patterns’. One article lists 16 candlestick patterns, analyzed by shape, shadow, tails, direction, range and frequency of movement, with colour coding for price movements up or down. Distinct patterns are even given names, including three line strike, two black gapping, three black crows, evening star and abandoned baby (per www.investopedia.com). Some genetic software does the same, i.e. produces patterns that can be classified for reference; one example is GraphClust, which clusters structural patterns of non-coding RNA for this purpose (described in the following extract). However, as explained below, GGF has greater utility for DNA analysis, using a library of generic GGF patterns obtained from prior GGF compilations.
The following extract from our patent application explains:
The GGF Algorithm “is a new method designed to extract and recognize patterns related to the geometries of DNA that may reflect transcription outcomes to portray or map DNA sequences visually which will allow new ontologies of DNA or protein structure to be created. The GGF method is intended to allow new techniques of image and signal processing to be utilized to enhance existing DNA or protein libraries or databases which have higher recognition signatures amenable to better classification methods and query systems. GGF images and motifs can provide superior signatures and motifs to existing graphical representations of DNA or other coding macromolecules which are designed to greatly improve interpretation, analysis and understanding of genomic data. For example, GGF produces meaningful images in non-coding intergenic zones of DNA providing new images that could provide a basis for discovering new gene expression mechanisms, evolutionary insights or epigenetic features otherwise unseen. The Wu Kabat variability plots can gauge non-randomness whilst the GGF can not only indicate non-randomness but produce meaningful motifs or signatures in clear images that can include generic motifs that may lead to matching other sequences which may otherwise be difficult to match, e.g. where substantial ‘noise’ obscures a generic or matched sequence. This is because GGF is not matching similar DNA sequences but producing motifs or signatures that can be compared. These sequences may be quite different but the images of the motifs are similar, giving perhaps the only way to link the two loci of DNA under study. Thus, the GGF can provide solutions to the increasing problems of massive amounts of raw code derived from sequences.
To take one example of a graphical method which specifically analyses a type of DNA sequence representing clusters of non-coding RNA sequences, the method is described in an article: Heyne S, Costa F, Rose D, Backofen R. GraphClust: alignment-free structural clustering of local RNA secondary structures. Bioinformatics. 2012 Jun 15;28(12):i224-32. doi: 10.1093/bioinformatics/bts224.
In that article the authors refer to applying their GraphClust method to >220,000 sequence fragments to obtain a “small number of probable, but sufficiently different, structures for each RNA sequence. We then encode each structure as a labelled graph preserving all information about the nucleotide type and the bond type …in this way a sequence’s structure is represented as a graph with several disconnected components. We could now compute the similarity between the representative graphs using a graph kernel.” By use of various matching techniques such as ‘nearest neighbour’, covariance and refinement analysis and so on, matches between the sample sequence and the target sequence can be made.
The GGF method has a new inventive step beyond methods such as GraphClust because, whilst GraphClust is a statistical visualization method designed to produce graphical non-recursive representations merely aiding statistical correlation between the sample sequence and the target sequence, the GGF method is more than a statistical visualization method in that it contains recursive, geometrical and formatting features designed to model actual possible biological processes, such as the possible manner in which the recursive generation of folding polypeptides occurs or the possible manner in which micro or macro molecules tile into a macromolecule or tissues of the body and so on. Generating processes in the genome and proteome are known to be recursive and so recursive representations better model biological processes.”
[1] Heyne S, Costa F, Rose D, Backofen R. GraphClust: alignment-free structural clustering of local RNA secondary structures. Bioinformatics. 2012 Jun 15;28(12):i224-32. doi: 10.1093/bioinformatics/bts224.
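The graph-kernel idea quoted from the GraphClust article can be illustrated with a toy example. The sketch below is not the kernel used in that article and is not part of the GGF method; it is a deliberately simple label-count kernel, and the graph encoding (`nodes`, `edges`, the bond names "backbone" and "pair") is assumed purely for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical encoding: node id -> nucleotide, plus (u, v, bond_type) edges.
Nodes = Dict[int, str]
Edges = List[Tuple[int, int, str]]


def graph_features(nodes: Nodes, edges: Edges) -> Counter:
    """Count node labels and (endpoint labels, bond type) edge labels."""
    feats: Counter = Counter()
    for label in nodes.values():
        feats[("node", label)] += 1
    for u, v, bond in edges:
        a, b = sorted((nodes[u], nodes[v]))  # order-independent endpoints
        feats[("edge", a, b, bond)] += 1
    return feats


def kernel(f1: Counter, f2: Counter) -> float:
    """Dot product of two feature-count vectors."""
    return float(sum(count * f2[key] for key, count in f1.items()))


def graph_similarity(g1: Tuple[Nodes, Edges], g2: Tuple[Nodes, Edges]) -> float:
    """Normalized kernel value in [0, 1]; 1.0 means identical label counts."""
    f1, f2 = graph_features(*g1), graph_features(*g2)
    denom = math.sqrt(kernel(f1, f1) * kernel(f2, f2))
    return kernel(f1, f2) / denom if denom else 0.0


# Two three-nucleotide structures differing in the middle base.
bonds = [(0, 1, "backbone"), (1, 2, "backbone"), (0, 2, "pair")]
g_a = ({0: "G", 1: "A", 2: "C"}, bonds)
g_u = ({0: "G", 1: "U", 2: "C"}, bonds)
similarity = graph_similarity(g_a, g_u)
```

In this toy case the two graphs share the G and C nodes and the G-C pair bond, so half of the label counts agree. A production kernel would also compare larger neighbourhoods of each graph rather than single nodes and edges.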
GGF Indicator Tests
This is suitable as a joint venture or other commercial arrangement where such tests might be developed for future commercial use, with a share of net revenues or a royalty paid to our company.
Once a reference library and GGF ontology exist, indicator tests can be designed and developed for clinical, industrial, and research and development use. Biological substances that can be assayed for GGF signatures would become targets for an indicator test. R&D testing should produce a list of likely candidate substances for which testing by other indicator methods is problematic or unreliable. Two possibilities are:
- Conventional DNA testing such as PCR/NGS would produce a DNA sequence for a target region in order to investigate possible DNA variants or mutations. The GGF algorithm would then be applied to the suspect DNA segment to produce a GGF motif, which would be submitted to an AI-based image recognition search engine to find the nearest generic GGF motif in the GGF database and so classify the suspect sequence, e.g. as an oncogene or a genetic mutation or defect.
- A kit or field test kit that combines existing kits with an added IC chip, which applies the GGF algorithm to the detected sample DNA sequence and then uploads the result through an app for comparison against the GGF library of GGF motifs to check for a match and, again, classify the suspect sequence.
The ideal would be a result that states: “The DNA sequence of the target region has produced a GGF motif which is an 87% match to the oncogene XYZ.” Further testing for that oncogene could then take place. Equally, viruses are expected to form part of the GGF library, so a test might return a match of X% to the ABC virus, and so on.
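The matching step of such a test can be sketched in a few lines. This is only an illustration under stated assumptions: motifs are represented here as small 0/1 pixel grids, the library labels are the placeholder names used above, and plain cosine similarity stands in for the AI-based image recognition engine, which is not specified in this document.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: a motif image as a grid of 0/1 pixels.
Motif = List[List[int]]


def flatten(motif: Motif) -> List[float]:
    """Turn a pixel grid into a flat feature vector."""
    return [float(px) for row in motif for px in row]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("motifs must have the same dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(query: Motif, library: Dict[str, Motif]) -> Tuple[str, float]:
    """Return the library label whose motif is most similar to the query."""
    if not library:
        raise ValueError("library is empty")
    q = flatten(query)
    scored = [(label, cosine_similarity(q, flatten(m))) for label, m in library.items()]
    return max(scored, key=lambda item: item[1])


def report(query: Motif, library: Dict[str, Motif]) -> str:
    """Format the result in the style of the example sentence above."""
    label, score = best_match(query, library)
    return f"The GGF motif of the target region is a {score:.0%} match to {label}"


# Placeholder library entries; real motifs would come from the GGF database.
library = {
    "oncogene XYZ": [[1, 1], [0, 1]],
    "ABC virus": [[0, 0], [1, 0]],
}
label, score = best_match([[1, 1], [0, 0]], library)
message = report([[1, 1], [0, 0]], library)
```

A real search engine would compare learned image features rather than raw pixels, but the output has the same shape: a nearest library entry and a percentage score that prompts further testing.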
Codeweaver Biologics & Pharmaceutical Design
This is also suitable as a joint venture or other commercial arrangement where such tests might be developed for future commercial use, with a share of net revenues or a royalty paid to our company.
Biologics are large medicinal molecules created from proteins from the cells of living organisms. They are much larger and more complex than conventional medicines and require a complicated manufacturing process. Biologics are approved by the FDA in the United States under a different approval regime. Biosimilars are drugs specifically designed to mimic a biologic that has already been developed. According to Pfizer:
“Biologics have helped advance patient care by delivering highly effective and targeted treatment across multiple life-threatening and chronic diseases in the fields of oncology, inflammation/immunology, rheumatology, gastroenterology, diabetes, neurology, and inherited conditions.”
Quoting from a recent paper* regarding industrial biosynthetics such as enzymes, the authors said: “Novel genetic techniques such as metabolic engineering, combinatorial biosynthesis and molecular breeding techniques ….. are contributing greatly to the development of improved industrial processes. ….. The sequencing of industrial microbial genomes is being carried out which bodes well for future process improvement and discovery of new industrial products.” Such products require the generation of novel genome sequences to create industrial enzymes or other metabolites for use in environmental, recycling, catalysis, mineral and chemical processing and many other applications.
*“Recombinant organisms for production of industrial products”, Adrio & Demain, Bioengineered Bugs 1:2, 116-131; March/April 2010; 2010 Landes Bioscience.
If the GGF reproduces the shapes, patterns and morphology of proteins, cells, tissues and so on, then a biologist designing a biosynthetic product, e.g. a new antibody or drug to fight cancer, disease or viruses, would find it of great utility to ‘draw’ the shape of the desired macromolecule and have a new ‘reverse GGF’ algorithm return the DNA sequence suggested by the GGF process. For this reason, GGF Codeweaver algorithms are under development to take the biotic profile and develop candidate ligands for new drugs or enzymes. A GGF Codeweaver algorithm might work as follows:
- The profile of the protein would be taken by laser or another nano-testing method.
- This profile is then ‘shadowed’ by a generated GGF curve, which encodes the possible GGF pixels that make up a GGF motif image ‘shadowing’ that profile.
- That GGF curve approximately captures the nearest GGF coordinates to the profile.
- Those coordinates then translate to a set of optimal oligopeptides or DNA sequences that are predicted to produce the target protein or to generate an aptamer.
- The set of candidate oligopeptides or DNA sequences can be synthesized, placed on microarray trays and then tested to determine the degree of match against the target receptor.
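The ‘shadowing’ step in the list above, capturing the nearest coordinates to a measured profile, can be sketched as a simple snap-to-grid operation. This is an assumption-laden illustration only: the square grid, the `cell_size` parameter and the function name are hypothetical, and the sketch does not attempt the translation from coordinates to oligopeptides or DNA sequences, which is specific to the Codeweaver algorithms under development.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # a measured point on the protein profile
Cell = Tuple[int, int]       # hypothetical pixel coordinate on a square grid


def shadow_profile(profile: List[Point], cell_size: float = 1.0) -> List[Cell]:
    """Snap each profile point to its nearest grid cell, in traversal order.

    Consecutive points that land in the same cell are merged, so the result
    is a coarse pixel path that approximately follows the profile.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    path: List[Cell] = []
    for x, y in profile:
        # floor(v + 0.5) rounds halves upward, avoiding banker's rounding.
        cell = (math.floor(x / cell_size + 0.5), math.floor(y / cell_size + 0.5))
        if not path or path[-1] != cell:
            path.append(cell)
    return path


# Made-up profile points, for illustration only.
outline = [(0.1, 0.2), (0.4, 0.1), (1.2, 0.9), (2.6, 1.1)]
pixels = shadow_profile(outline)
```

A smaller `cell_size` follows the profile more closely at the cost of a longer path, which is the usual trade-off when approximating a continuous outline with discrete coordinates.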
This GGF inverse method or “Codeweaver” could then design biosynthetics for:
- Oligopeptides to modify disease-causing proteins
- Gene therapy, such as modulating or inactivating defective genes that cause disease
- Immunotherapy
- Ligands or pharmacophores for the design of vaccines or medicines
- Biosynthetic products (medical, biological or industrial): creating biosynthetics or biosimilars by designing enzymes with techniques such as protein evolution, coupled with the new GGF Codeweaver algorithms and deep learning (AI)