It was 1970. I was in my laboratory at the NIH sequencing a murine myeloma protein in order to define the structure of its antibody combining region. Studies of protein conformation were at the cutting edge of science then; enthusiasm abounded. But it was clear to me that this work, in all its scientific elegance, had little to do with treating myeloma or anything else in mice or man. The reason for all the painstaking effort was the joy of pushing back the frontier of ignorance, even if only a bit. No one could foresee clinical utility then, nor would any become apparent for decades. Today such monoclonal antibodies are widely used to treat many diseases, sometimes with efficacy that justifies the costliness.
Genomics is in a bigger hurry.
Thanks to 40 years of breakthroughs, many earning Nobel Prizes, the chromosome carrying the defective gene underlying a genetic disease, Huntington’s disease, was identified in 1983 and the gene sequenced a decade later. In short order, defective genes underlying a number of single-gene diseases were defined: cystic fibrosis, hemophilia, and others. We all wait with bated breath for these elegant insights to transform into primary treatments for single-allele genetic diseases. Attempts to transfect patients with normal genes are encouraging but barely so; it has proved difficult to get the right gene to stay in the right cells. Likewise, directly modifying the abnormal genetic apparatus is still largely just promising. The fallback remains working downstream from the genetic apparatus, replacing or modifying the defective products of many of these pathogenetic genes. Nonetheless, optimism regarding modifying the genetic apparatus itself is rational, as is ever more boldness on the part of molecular biologists.
Another frontier for genetic medicine emerged from the archives of science fiction to the forefront of molecular biology in the 1990s. Are there other genes, or groups of genes, that make us susceptible to more common diseases such as coronary artery disease or diabetes? Science was primed to sequence the entire human genome in the hopes that less “abnormal” variations in DNA sequences might explain, at least in part, such susceptibilities. This required defining the variations in our genetic codes that were not associated with diseases and comparing such variations with those in patients who were afflicted. The sequencing effort was underway through the 1990s, but its productivity escalated dramatically when Francis Collins was recruited to the NIH to run the Human Genome Project in competition with Craig Venter’s Celera Corporation in the private sector. By 2001 the first human genome had been sequenced. Within a decade the technology was honed so that any human genome could be sequenced rapidly and relatively inexpensively, even the genomes of cancer cells.
We are now swimming in DNA sequence data. About 3 billion base pairs are tucked into our 23 pairs of chromosomes. But nearly all the sequences of base pairs are conserved between individuals; differences are found in only a fraction of a per cent of the DNA. One would think it would be easy to spot the individual differences that make one susceptible to rheumatoid arthritis, or multiple sclerosis, or coronary artery disease – let alone render one a talented musician or an elite athlete. One might think so, but banish the thought. First of all, most traits, including most pathological phenotypes (diseases), are polygenic and barely heritable. That means multiple genes are at play, some doing well by you and others not so well. Furthermore, nearly all phenotypes are a result of a balance between genetic and environmental influences, and for nearly all common diseases the environmental influences predominate. That’s why leading scholars, notably my colleague James P Evans, argue fervently against any of us, patient or physician, turning to whole genome sequencing to seek clinically meaningful insights. What do you learn if someone tells you that compared to the average person you have a tiny increase in the likelihood you will develop Crohn’s disease and a tiny decrease in the likelihood you will develop multiple sclerosis? Do you believe that tiny likelihoods are worth losing sleep over, or losing insurance over, or are even reproducible?
Are even reproducible? That sounds like heresy. But don’t for one moment think that either the phenotypes (diseases) or the genetic footprints (the base pair sequences) are sufficiently uniform or the heritability sufficiently robust to render associations more than tenuous. Trolling in an enormous database for weak associations is more likely to snag debris than snare a marlin. To make this point, a group of statisticians tortured all the hospital discharge data collected for the Province of Ontario, Canada to reveal associations between 223 clinical diagnoses and astrological signs. Lo and behold, those born under Leo were more likely to be hospitalized for a gastrointestinal hemorrhage and those born under Sagittarius were more likely to fracture an arm.
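The arithmetic behind that kind of fishing expedition can be sketched in a few lines of Python. This is an illustration only, not the Ontario investigators' method: the 223 diagnoses come from the study as described above, but the 12 signs, the conventional 0.05 significance threshold, the assumption that every test is independent, and all function names are assumptions made for the sketch.

```python
import random

N_DIAGNOSES = 223   # from the study as described in the text
N_SIGNS = 12        # assumption: one test per diagnosis per astrological sign
ALPHA = 0.05        # assumption: conventional significance threshold


def expected_false_positives(n_tests, alpha):
    """Average number of 'significant' results when no real association exists."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha):
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulate_null_hits(n_tests, alpha, seed=0):
    """Simulate pure noise: under a true null, p-values are uniform on [0, 1]."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_tests) if rng.random() < alpha)


def bonferroni_threshold(n_tests, alpha):
    """A stricter per-test threshold that keeps the overall error rate near alpha."""
    return alpha / n_tests


n_tests = N_DIAGNOSES * N_SIGNS  # 2676 comparisons

if __name__ == "__main__":
    print("comparisons:", n_tests)
    print("expected chance findings:", expected_false_positives(n_tests, ALPHA))
    print("probability of at least one:", prob_at_least_one(n_tests, ALPHA))
    print("simulated chance findings:", simulate_null_hits(n_tests, ALPHA))
    print("Bonferroni per-test threshold:", bonferroni_threshold(n_tests, ALPHA))
```

Under these assumptions, well over a hundred "associations" would be expected from noise alone, which is why a finding that surfaces from such a search needs to be confirmed in an independent set of data before anyone takes it seriously.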
Whenever you undertake an exercise of this nature, as an individual or as an investigator seeking a genetic “footprint” for a disease, you are at risk of coming up with weak associations on the basis of chance alone, on the basis of risk factors you don’t know to measure, or maybe even on a causal basis. Your guess is as good as mine. Investigators might be funded to hope that the associations hold promise for meaningful insights in the future. But why should an individual guess, or why listen to those who are willing to guess for you? That’s how butter and margarine could switch between good and bad in the blink of a newscast, or how you could be advised to keep a sleeping infant prone or supine or prone or….
So whole genome screening of asymptomatic individuals isn’t ready for prime time. Perhaps genotyping can help predict clinical events in individuals who are already ill. The poster child for doing this offers both a precedent and a proviso. A series of mutations in the BRCA1 and 2 alleles was defined in the 1990s, several of which are associated with susceptibility to breast and ovarian cancer in young women. It is not certain that having the BRCA1 or 2 genotype marks women as at any increased risk for breast and ovarian cancer unless there is a close relative who already developed cancer at an early age. It’s as if these genes render one susceptible to a “second hit”, another gene or carcinogen or whatever that remains unknown. Hence, no one is recommending screening all women for BRCA1 or 2; all that would accomplish is a stain on their kinship.
However, it is open season on the genomes of individuals with various diseases, cancers in particular, seeking genotypes that associate with prognosis or with efficacy (or lack thereof) of therapeutic modalities. This is called “personalized medicine.” It has risen to become an article of faith in translational medicine, in the public mind, in the halls of health policy, and in the portfolios of venture capitalists. It is such a seductive idea that all sorts of false starts are quickly forgotten. One form of false start involves fraudulent science such as the genetic test which was developed and commercialized by Anil Potti at Duke. Other false starts are examples of an investigator’s enthusiasm and presumptuousness overwhelming scientific rigor. That’s what happened to OvaSure, a blood test for ovarian cancer that was developed and commercialized by Dr. Gil Mor at Yale. This was not a genetic test; it was a test that purported to identify trace amounts of particular proteins exclusively in the blood of women with ovarian cancer. This test proved unreliable, but not because of fraudulent science. OvaSure died on the sword of scientific bias. The test was developed on blood specimens that were collected from patients with ovarian cancer and patients without ovarian cancer. However, one can never validate a test with the same sort of samples one used to develop it. One has to systematically compare the results of the test on patients who are sampled randomly from a population that includes patients who have, don’t have, might have, and might not have ovarian cancer – not just the first 2 groups. My colleague, David Ransohoff, has written extensively and elegantly about this pitfall so that repetition today is inexcusable.
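A toy simulation can make the pitfall concrete. Everything in this Python sketch is hypothetical: the four patient groups, the marker distributions, the cutoff, and the names are invented for illustration and have nothing to do with OvaSure's actual proteins or data. It only shows how a cutoff that looks excellent on clear-cut cases and healthy controls degrades once the "might have" and "might not have" patients are included.

```python
import random


def draw(rng, mean, has_cancer, n):
    """Draw n hypothetical marker values (normal, sd 1) labeled with true status."""
    return [(rng.gauss(mean, 1.0), has_cancer) for _ in range(n)]


def accuracy(samples, threshold):
    """Fraction of patients correctly classified by 'marker above threshold = cancer'."""
    correct = sum(
        1 for marker, has_cancer in samples if (marker > threshold) == has_cancer
    )
    return correct / len(samples)


rng = random.Random(42)
N = 5000  # hypothetical patients per group

# The two extremes on which the test is developed (assumed distributions).
healthy = draw(rng, 0.0, False, N)
advanced_cancer = draw(rng, 3.0, True, N)

# The ambiguous middle that a real clinic also sees (assumed distributions).
benign_mass = draw(rng, 1.5, False, N)   # "might have", but does not
early_cancer = draw(rng, 2.0, True, N)   # "might not have", but does

development_set = healthy + advanced_cancer
full_spectrum = development_set + benign_mass + early_cancer

THRESHOLD = 1.5  # midpoint between the two development groups

dev_accuracy = accuracy(development_set, THRESHOLD)
full_accuracy = accuracy(full_spectrum, THRESHOLD)

if __name__ == "__main__":
    print("accuracy on development-style samples:", round(dev_accuracy, 3))
    print("accuracy on the full clinical spectrum:", round(full_accuracy, 3))
```

In this invented setup the same cutoff performs markedly worse on the full spectrum than on the samples used to develop it, which is the gap a properly sampled validation study is designed to expose.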
Nonetheless, validating tests by testing on samples that are the same or similar to those used to develop the test is convenient, less costly, expeditious, and all too often irresistible. Furthermore, despite this bias, positive results seem similarly irresistible to regulatory agencies, investors…and patients who grasp at “information” despite the risk of being misinformed. Here are 2 particularly disconcerting examples of this dialectic:
Oncotype DX Breast Cancer Assay:
Fifty years ago BREAST CANCER was a dragon that needed slaying. Surgeons were called forth to mete out violence: mastectomies gave way to radical mastectomies and then to super-radical mastectomies in the quest to excise the beast. All this made sense until some surgeons, starting with Oliver Cope, questioned the efficacy. Ever since, the approach to the treatment of Breast Cancer has been modulated by evidence, a process that is painstakingly slow since no one is primed to abandon the standard of care. Hence, there has been an incremental withdrawal from dragon slaying, each increment supported by scientific evidence. Along the way, lumpectomy has become the standard of surgical care with various forms of adjunctive therapy invoked depending on the clinical circumstance. And there is more than one clinical circumstance. That’s because there is no Breast Cancer; there are breast cancers, each with biological properties that speak to their prognosis and each invoking its own standard of care.
One form of “breast cancer”, ductal carcinoma in situ (DCIS), is on the verge of being labeled benign and of losing the cancer connotation inherent in its “carcinoma” label. Next in the realm of less malignant than we thought is a small invasive ductal cancer that is isolated to the breast with cells that are relatively mature in that they still display normal surface receptors for estrogen and progesterone and lack abnormal surface receptors. It is clear that the treatment of these tumors is galloping toward less and less aggressiveness. Today, nearly all are cured by lumpectomy, but not all. So, following lumpectomy these patients are treated with radiation therapy and one of two forms of chemotherapy: tamoxifen, an anti-estrogen that is a well-tolerated pill with few if any long-term toxicities, or more powerful agents with a likelihood of short-term toxicity and the potential for long-term toxicity. Patients and doctors would rather avoid the latter if the former were at least as effective.
Enter tumor genotyping. A decade ago, a company named “Genomic Health” developed the Oncotype DX Breast Cancer Assay. This test, which is performed on a minute sample of the cancer removed at lumpectomy, is an assay for the expression of a panel of 21 genes in the tumor. The result is a score that is said to predict the likelihood of recurrence and the need for more aggressive therapy. The studies upon which this assertion is based were extensive but all took advantage of samples of tissues collected for other reasons, mainly as part of therapy trials where the patient either did or did not have the disease, the bias we just discussed. Nonetheless, the test is licensed and is widely used in the U.S. and to some extent abroad. The charge for an assay currently is about $13,000.
This may be the standard of care, but it is a standard that is on thin scientific ice. As a result, the National Institutes of Health commenced the TAILORx Trial in 2006, an elegant test of the Oncotype DX Breast Cancer Assay that has enrolled 10,000 patients and will be completed later this year.
For 10 years, hundreds of thousands of women have gambled that the Oncotype DX Breast Cancer Assay is not OvaSure Redux. Perhaps we should have demanded a TAILORx trial in 2004 when the assay was first developed and withheld licensing until we knew whether we were overtreating or undertreating or neither based on the test.
Afirma Gene Expression Classifier:
Lumps, bumps and nodules are the fate of many parts of our bodies, including many glands that secrete various substances. Very few female breasts are spared “cystic mastitis”. No prostates are spared benign nodules; it’s their gray hair. This is the case with the thyroid gland, too. If one were to scan normal thyroid glands with a very high resolution ultrasound, over two-thirds of people would have one or more nodules. You can feel these nodules in about 5% of people, more so in women and more so with age. No one is advocating for screening with high resolution ultrasounds, but there is de facto screening going on because of the great number of people who have imaging studies of their neck for other reasons. There is an epidemic of detected nodules of late.
Nearly all of these nodules are just that, bumps that should cause no concern. However, rarely one is a cancer. If for some reason a doctor feels inclined, any nodule that can be visualized with ordinary ultrasound can be needled and cells aspirated (an FNA, or fine needle aspirate). This is considered a standard of care, particularly if the nodules are larger and/or calcified. Hence there has been an epidemic of FNA and a cadre of pathologists profiting from reading all these samples.
Nearly all these nodules represent benign changes in the growth or function of a part of the thyroid. Most have that appearance on ultrasound. Most that seemed suspicious on ultrasound turn out to have normal pathology. These nodules are incidental findings that cause no difficulties. Very rarely they turn out to be cancerous growths. Most cancerous growths maintain enough of the characteristics of normal thyroid glands to be labeled follicular or papillary cancers. These cancers can grow and spread, but nearly always spread locally in the thyroid with little propensity for distant metastases. Some of the cancers have lost these features and are far more aggressive – but also more readily identified by pathologists. The challenge for the pathologists is nodules that are indeterminate, not normal but not clearly follicular or papillary carcinomas. These patients are often subjected to thyroid surgery, a procedure that occasionally results in severe complications. A recent paper from the Duke Cancer Center described the surgical results for 300 patients with nodules that were incidental findings on imaging studies and had indeterminate pathology on FNA; over half did not have any thyroid cancer and none had the thyroid cancer that can spread widely.
Enter the Afirma Gene Expression Classifier. This is a test, like Oncotype DX, that assesses the tissue for the expression of an array of genes thought to mark cancerous transformation in thyroid cells. The test was developed by Veracyte, a biotech startup in South San Francisco that funded a multicenter “validation” study, which was published in the New England Journal of Medicine in August 2012. Veracyte’s IPO followed in October 2013. The study is probably as good as it gets since it was prospective with all the many people involved in the care of the patients appropriately unaware of the results of the Afirma test. The majority of patients who underwent thyroid surgery for suspicious pathology on their FNA and had a negative Afirma turned out to be free of cancer, suggesting that patients with a suspicious FNA and negative Afirma can be spared surgery in the future. A suspicious Afirma test was less useful as most would also be free of tumor at surgery. The Afirma test has been widely utilized since it was introduced and the general experience seems to be consistent with the results of the formal test.
So we can spare half with suspicious pathology on FNA from unnecessary surgery but not over half the rest. Somehow, all this seems to be off point from a clinical perspective. Why are we bothering with all these incidental nodules in the first place? Thyroid cancer is rare and rarely other than a tumor that grows locally and stays locally. The epidemic of ultrasounding, aspirating, and excising has not led to an increase in the incidence of thyroid cancer or thyroid cancer mortality. If a nodule is felt or felt to be growing and the FNA is not clearly a cancer, why not just monitor with an annual ultrasound accompanied by a giant dose of reassurance? That’s all I’d let you do if it was I.
McGuire AL, McCullough LB, Evans JP. The indispensable role of professional judgment in genomic medicine. JAMA 2013;309:1465-6. Screening an asymptomatic person for genetic risk. New England Journal of Medicine 2014;370:2443-5.
Austin PC, Mamdani MM, Juurlink DN, Hux JE. Testing multiple statistical hypotheses resulted in spurious associations: a study of astrological signs and health. Journal of Clinical Epidemiology 2006;59:964-9.
Ransohoff DF. How to improve reliability and efficiency of research about molecular markers: roles of phases, guidelines, and study design. Journal of Clinical Epidemiology 2007;60:1205-19. Ransohoff DF. Bias as a threat to the validity of cancer molecular-marker research. Nature Reviews Cancer 2005;5:142-9.
Yeo B, Turner NC, Jones A. An update on the medical management of breast cancer. British Medical Journal 2014;348:g3608. doi: 10.1136/bmj.g3608 (Published 9 June 2014).
Alexander EK, Schorr M, Kim C, et al. Multicenter clinical experience with the Afirma gene expression classifier. Journal of Clinical Endocrinology and Metabolism 2014;99:119-25.