GPT-4o: What’s All The Fuss About?

By MIKE MAGEE

If you follow my weekly commentary on HealthCommentary.org or THCB, you may have noticed over the past 6 months that I appear to be obsessed with mAI, or the intrusion of Artificial Intelligence into the health sector.

So today, let me share a secret. My deep dive has been part of a long preparation for a lecture (“AI Meets Medicine”) I will deliver this Friday, May 17, at 2:30 PM in Hartford, CT. If you are in the area, it is open to the public. You can register to attend HERE.

This image is one of 80 slides I will cover over the 90-minute presentation on a topic that is massive, revolutionary, transformational, and complex. It is also a moving target, as illustrated in the final row above, which I added this morning.

The addition was forced by Mira Murati, OpenAI’s chief technology officer, who announced from a perch in San Francisco yesterday that “We are looking at the future of the interaction between ourselves and machines.”

The new application, designed for both computers and smartphones, is GPT-4o. Unlike prior members of the GPT family, which distinguished themselves by their self-learning generative capabilities and an insatiable thirst for data, this new application is focused not so much on the search space as on creating a “personal assistant” that is speedy and conversant in text, audio, and images (“multimodal”).

OpenAI says this is “a step towards much more natural human-computer interaction,” and is capable of responding to your inquiry “with an average 320 millisecond (delay) which is similar to a human response time.” And they are quick to reinforce that this is just the beginning, stating on their website this morning: “With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.”

It is useful to remember that this whole AI movement, in Medicine and every other sector, is about language. And as experts in language remind us, “Language and speech in the academic world are complex fields that go beyond paleoanthropology and primatology,” requiring a working knowledge of “Phonetics, Anatomy, Acoustics and Human Development, Syntax, Lexicon, Gesture, Phonological Representations, Syllabic Organization, Speech Perception, and Neuromuscular Control.”

The notion of instantaneous, multimodal communication with machines has seemingly come out of nowhere, but it is actually the product of nearly a century of imaginative, creative, and disciplined discovery by information technologists and human speech experts, who have only recently fully converged with each other. As paleolithic archeologist Paul Pettit, PhD, puts it, “There is now a great deal of support for the notion that symbolic creativity was part of our cognitive repertoire as we began dispersing from Africa.” That is to say, “Your multimodal computer imagery is part of a conversation begun a long time ago in ancient rock drawings.”

Throughout history, language has been a species accelerant, a secret power that has allowed us to dominate and rise quickly (for better or worse) to the position of “masters of the universe.”  The shorthand: We humans have moved “From babble to concordance to inclusivity…”

GPT-4o is just the latest advance, but it is notable not because it emphasizes the capacity for “self-learning,” which the New York Times correctly bannered as “Exciting and Scary,” but because it is focused on the speed and efficiency needed to compete on an even playing field with human-to-human language. As OpenAI states, “GPT-4o is 2x faster, half the price, and has 5x higher (traffic) rate limits compared to GPT-4.”

Practicality and usability are the words I’d choose. In the company’s words, “Today, GPT-4o is much better than any existing model at understanding and discussing the images you share. For example, you can now take a picture of a menu in a different language and talk to GPT-4o to translate it, learn about the food’s history and significance, and get recommendations.”
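To make that menu example concrete, here is a minimal sketch of how such an exchange might look in code, assuming OpenAI’s published Python SDK; the file name (menu.jpg), the prompt wording, and the exact model string are illustrative assumptions on my part, not details drawn from OpenAI’s announcement.

import base64
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Encode a (hypothetical) photo of a foreign-language menu as a data URL.
with open("menu.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Send text and image together in a single conversation turn.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Translate this menu into English and recommend one dish."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)

The request pattern is the same one used for plain text; the point of a multimodal model is that the image simply rides along in the same conversational turn.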

In my lecture, I will cover a great deal of ground, as I attempt to provide historic context, relevant nomenclature and definitions of new terms, and the great potential (both good and bad) for applications in health care. As many others have said, “It’s complicated!”

But as yesterday’s announcement in San Francisco makes clear, the human-machine interface has blurred significantly. Or as Mira Murati put it, “You want to have the experience we’re having — where we can have this very natural dialogue.”

Mike Magee MD is a Medical Historian and regular contributor to THCB. He is the author of CODE BLUE: Inside the Medical Industrial Complex (Grove/2020)

2024 Prediction: Society Will Arrive at an Inflection Point in AI Advancement

By MIKE MAGEE

For my parents, March 1965 was a banner month. First, that was the month that NASA launched the Gemini program, unleashing “transformative capabilities and cutting-edge technologies that paved the way for not only Apollo, but the achievements of the space shuttle, building the International Space Station and setting the stage for human exploration of Mars.” It also was the last month that either of them took a puff of their favored cigarette brand – L&M’s.

They are long gone, but the words “Gemini” and the L’s and the M’s have taken on new meaning and relevance now, six decades later.

The name Gemini reemerged with great fanfare on December 6, 2023, when Google CEO Sundar Pichai introduced “Gemini: our largest and most capable AI model.” Embedded in the announcement were the L’s and the M’s, as we see here: “From natural image, audio and video understanding to mathematical reasoning, Gemini’s performance exceeds current state-of-the-art results on 30 of the 32 widely-used academic benchmarks used in large language model (LLM) research and development.”

Google’s announcement also offered a head-to-head comparison with GPT-4 (Generative Pretrained Transformer-4), which is the product of a non-profit initiative, OpenAI, and was released on March 14, 2023. Microsoft’s AI search engine, Bing, helpfully informs that “OpenAI is a research organization that aims to create artificial general intelligence (AGI) that can benefit all of humanity…They have created models such as Generative Pretrained Transformers (GPT) which can understand and generate text or code, and DALL-E, which can generate and edit images given a text description.”

While “Bing” goes all the way back to a Steve Ballmer announcement on May 28, 2009, it was 14 years later, on February 7, 2023, that the company announced a major overhaul that, one month later, would allow Microsoft to broadcast that Bing (by leveraging an agreement with OpenAI) now had more than 100 million users.

Which brings us back to the other LLM (large language model) – GPT-4, which the Gemini announcement explores in a head-to-head comparison with its new offering. Google embraces text, image, video, and audio comparisons, and declares Gemini superior to GPT-4.

Mark Minevich, a “highly regarded and trusted Digital Cognitive Strategist,” writing this month in Forbes, seems to agree with this, writing, “Google rocked the technology world with the unveiling of Gemini – an artificial intelligence system representing their most significant leap in AI capabilities. Hailed as a potential game-changer across industries, Gemini combines data types like never before to unlock new possibilities in machine learning… Its multimodal nature builds on yet goes far beyond predecessors like GPT-3.5 and GPT-4 in its ability to understand our complex world dynamically.”

Expect to hear the word “multimodality” repeatedly in 2024 and with emphasis.
