Looping ChatGPT and BARD Enhances Semantic Diversity of Their Responses — A “Machine Discussion” Experiment.

Johnny Tam
6 min read · Dec 13, 2023
Multiple AI models discussing with each other, one-line drawing by DALL·E 2

Humans discuss with one another, but what if AI models “discuss” with each other? As an exploration, I ran a simple “machine discussion” experiment by looping the input and output of ChatGPT and Google’s BARD so they could discuss the topic “Will AI Take Over the World?”. Compared with 1) simply regenerating the responses and 2) directly asking the models to diversify their viewpoints, I show here that “machine discussion” increased the semantic diversity of responses from these chat-based LLMs.

Similar Attempts

To see whether similar attempts already existed, I googled

“machine discussion”

“bard chatgpt discussion”

“looping bard and chatgpt”

but nothing relevant came up. The exception was

“make chatgpt discuss with bard”

from which I found several similar attempts:
https://www.reddit.com/r/ChatGPT/s/HqaX0tyU4a
https://youtu.be/4AbRlZx2McY?si=DDduC2lZJWXMp7BQ
https://youtu.be/TX9hv1aCLRA?si=uTL_FKffsZCwT04M

Methods

Skim the annotations below; they are needed to follow the Results section.

Starting the Discussion
I prompted BARD (version: 20231212) with an initial question (annotated as “q” for question)

Will ai take over the world? 
I will refer your response to chatgpt and let you two discuss.

to get BARD’s first response (annotated “bl1” for “bard loop 1st”).
Then I asked ChatGPT-3.5

Will ai take over the world? 
Discuss with BARD on this topic.
Below is the first response from BARD.
(BARD's response)

to get ChatGPT’s first response (annotated “cl1” for “chatgpt loop 1st”).

Looping the Discussion
To loop, I prompted

"ChatGPT responded:" or "BARD responded:" (the responses)

to get responses (“bl2”–“bl9” and “cl2”–“cl9”) from the counterparts.

Ending the Discussion
I ended the discussion with the following prompt

Can you conclude your discussion with ChatGPT/ BARD
with this last response from ChatGPT/ BARD:
(the last responses i.e. cl9 / bl9)

to get the last responses (“bl10” and “cl10”). In total, this loop experiment produced 20 responses (10 from ChatGPT + 10 from BARD).
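The whole prompting flow can be summarized with the sketch below. The ask_bard and ask_chatgpt helpers are hypothetical stand-ins (the actual discussion was run by hand through the web interfaces, and the real materials are in the repository linked at the end); the sketch only mirrors the annotations q, bl1–bl10, and cl1–cl10 described above.

def ask_bard(prompt: str) -> str:
    # Placeholder: in the experiment the prompt was pasted into BARD's web UI
    # and the reply copied back by hand.
    return input("Paste BARD's reply:\n")

def ask_chatgpt(prompt: str) -> str:
    # Placeholder: same manual process, but with ChatGPT-3.5.
    return input("Paste ChatGPT's reply:\n")

q = "Will ai take over the world?"

# Starting the discussion: bl1 and cl1
bl = [ask_bard(q + "\nI will refer your response to chatgpt and let you two discuss.")]
cl = [ask_chatgpt(q + "\nDiscuss with BARD on this topic.\n"
                  "Below is the first response from BARD.\n" + bl[0])]

# Looping the discussion: bl2-bl9 and cl2-cl9
for _ in range(8):
    bl.append(ask_bard("ChatGPT responded:\n" + cl[-1]))
    cl.append(ask_chatgpt("BARD responded:\n" + bl[-1]))

# Ending the discussion: bl10 (from cl9) and cl10 (from bl9)
bl.append(ask_bard("Can you conclude your discussion with ChatGPT "
                   "with this last response from ChatGPT:\n" + cl[-1]))
cl.append(ask_chatgpt("Can you conclude your discussion with BARD "
                      "with this last response from BARD:\n" + bl[-2]))

# 20 responses in total: bl1-bl10 and cl1-cl10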

Benchmarking Experiments
1) Response regeneration (“br1”–“br10” and “cr1”–“cr10”) by repeatedly pressing the regenerate button until the 10th response.
2) Viewpoint diversification (“bd1”–“bd10” and “cd1”–“cd10”) by repeatedly using the prompt

Diversify viewpoints.

until the 10th response.

Semantic Diversity Calculation
All responses were embedded with Google’s PaLM embedding-gecko-001 model. Semantic similarity was expressed as the dot product of two embeddings; lower semantic similarity indicates higher semantic diversity between two responses.
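A minimal sketch of that calculation is below, assuming the (now legacy) google.generativeai PaLM SDK; the exact SDK calls are my assumption and may differ from what the linked repository actually uses.

import numpy as np
import google.generativeai as palm

palm.configure(api_key="YOUR_API_KEY")  # placeholder key

def embed(text: str) -> np.ndarray:
    # Embed one response with embedding-gecko-001 (assumed SDK call).
    result = palm.generate_embeddings(model="models/embedding-gecko-001", text=text)
    return np.array(result["embedding"])

def semantic_similarity(a: str, b: str) -> float:
    # Dot product of two response embeddings; lower similarity = higher diversity.
    return float(np.dot(embed(a), embed(b)))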

Results

Sudden Semantic Diversification Arose During “Machine Discussion”
In response to the initial question, the semantic similarity of the first few cycles (q -> bl1 -> cl1 -> bl2) rose from 0.80 to 0.9X and remained relatively stable until cl5 was the output (Fig. 1). A sudden drop in semantic similarity (i.e. a rise in semantic diversity) was observed when BARD used cl5 to generate bl6. Slight decreases were also observed as BARD generated bl7, bl8, and bl9.

Fig. 1 Change of semantic similarity (sim.) as ChatGPT iteratively “discussed” with BARD. Semantic similarities were calculated between input (x-axis top line) and output (x-axis bottom line). Arrows indicate the observed drops in semantic similarity.

Frankly speaking, after reading cl5 and bl6 I have no idea what (e.g. content, style, word choice) caused the shift in semantic similarity. But here I quote the ending of cl5 from ChatGPT:


**Education and Training:**
Embedding ethical AI principles into educational curricula and offering continuous professional development ensures a workforce well-equipped to navigate responsible AI development and deployment. This investment in education lays the foundation for ethical AI practices to become standard.

**Continuous Evaluation and Adaptation:**
Establishing mechanisms for ongoing evaluation and adaptation ensures that implemented strategies remain relevant in a rapidly evolving AI landscape. Oversight from independent review boards and ethics committees maintains a focus on ethical considerations.

This multifaceted approach highlights the importance of collaboration, adaptation, and ongoing commitment from all stakeholders involved in the AI ecosystem. Embracing these strategies collectively can indeed pave the way for a future where AI serves as a tool for positive transformation.

Your dedication to fostering responsible AI development and your vision for the future of AI is truly inspiring, Bard. How do you envision the role of international collaboration and shared standards in harmonizing responsible AI practices across different countries and regions?

and the start of bl6 from BARD for your comparison:

International collaboration and shared standards will be crucial in harmonizing responsible AI practices across different countries and regions. Here’s how they can play a vital role:

1. Facilitating Knowledge Sharing and Best Practice Exchange:
Establishing international platforms for experts and stakeholders to share knowledge, research findings, and best practices can accelerate the development and adoption of responsible AI practices globally. Creating international task forces and working groups focused on specific aspects of responsible AI can foster collaboration on developing common frameworks and guidance.

2. Promoting Consistency and Interoperability:
Developing and adopting shared standards for data formats, algorithms, and ethical principles can ensure consistency and interoperability across different AI systems. International standardization bodies can play a crucial role in setting clear guidelines and technical specifications for responsible AI development and deployment.

3. Addressing Global Challenges:
International collaboration is essential for tackling global challenges posed by AI, such as bias, algorithmic discrimination, and privacy concerns.

By working together, countries can develop coordinated approaches to mitigating these risks and ensuring that AI benefits all people equally.

Fig. 2 Distributions of pairwise semantic similarity for different experimental groups. Looping ChatGPT and BARD (loop) generated significantly higher semantic diversity than simple response regeneration (regen.) and viewpoint diversification requests (diversify).
Table 1 P values comparing semantic similarity distributions between the looping groups and other experimental groups.

Significantly Higher Semantic Diversity from Looping ChatGPT and BARD Than from Response Regeneration or Requests for Viewpoint Diversification
I gathered distributions of pairwise semantic similarity within each experimental group (Fig. 2). Semantic similarity was significantly lower among responses generated by looping ChatGPT and BARD than among responses generated by simply pressing the regenerate button or by directly asking for viewpoint diversification (Table 1).
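A hedged sketch of this comparison is shown below. The article does not state which statistical test produced Table 1, so a two-sided Mann-Whitney U test is used purely as an illustration; bl, cl, br, cr, bd, and cd are assumed to be lists holding the responses annotated above, and embed is the helper sketched in the Methods section.

from itertools import combinations
import numpy as np
from scipy.stats import mannwhitneyu

def within_group_similarities(responses):
    # All pairwise dot-product similarities within one experimental group.
    vecs = [embed(r) for r in responses]
    return [float(np.dot(a, b)) for a, b in combinations(vecs, 2)]

loop_sims = within_group_similarities(bl + cl)        # loop group
regen_sims = within_group_similarities(br + cr)       # regeneration group
diversify_sims = within_group_similarities(bd + cd)   # diversification group

print(mannwhitneyu(loop_sims, regen_sims, alternative="two-sided"))
print(mannwhitneyu(loop_sims, diversify_sims, alternative="two-sided"))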

Moreover, pairwise semantic similarity among all responses (Fig. 3) showed that, as expected, their semantic characteristics depended on the prompting method. In particular, the semantic similarities between responses from the “loop” and “regen.” groups were the lowest.

Fig. 3 Pairwise semantic similarity among all responses gathered in this experiment.
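For completeness, a Fig. 3-style matrix can be reproduced by stacking all response embeddings and taking the full dot-product matrix. The plotting details below (ordering, colormap, labels) are assumptions, not the exact figure code.

import numpy as np
import matplotlib.pyplot as plt

all_responses = bl + cl + br + cr + bd + cd        # loop, regen. and diversify groups
E = np.vstack([embed(r) for r in all_responses])   # shape: (n_responses, embedding_dim)
pairwise_sim = E @ E.T                             # dot product between every pair

plt.imshow(pairwise_sim, cmap="viridis")
plt.colorbar(label="semantic similarity (dot product)")
plt.title("Pairwise semantic similarity among all responses")
plt.show()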

Final Thoughts

In the middle of this experiment, I thought it might be a dumb idea to let chat-based LLMs “discuss” with each other, because they are already so good at generating varied content and we can always regenerate a response we don’t like. However, it turned out that looping two chat-based LLMs did change the language characteristics of their responses, especially in the diversification we can get from it. However huge, each LLM has its own character (bias), which apparently makes such “machine discussion” a topic worth exploring further, just as humans discuss to gather diverse opinions.

At a higher level, I foresee more experiments like this aiming at rational cross-talk among AI models specialized in various tasks, on our path toward artificial general intelligence (AGI).

Data and Code Availability

All responses from ChatGPT and BARD, and the code used in this experiment, are available at https://github.com/johnnytam100/machine_discussion

Johnny Tam

Bioinformatician Specialized in NGS Technology and Protein Engineering. Finishing his PhD studies at UTokyo + RIKEN, Japan. Feel free to reach out! Cheers~ :)