Q1-B01-Team-CS

Description: This project was developed as part of the DSC 180A Capstone with Dr. Arsanjani. We are fighting misinformation by using factuality factors and veracity vectors, then applying predictive models and large language models to determine whether a given article is misinformation. Articles are labeled with one of six levels of misinformation:

  • pants-on-fire
  • mostly-false
  • false
  • half-true
  • mostly-true
  • true
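
For modeling, these labels are often treated as a simple ordered scale. Below is a minimal sketch of one way to encode them in Python; the variable names and numeric scores are illustrative assumptions, not part of the dataset.

# Encode the six labels as integers, following the order listed above.
# The specific scores are an illustrative assumption, not part of LIAR-PLUS.
LABELS = ["pants-on-fire", "mostly-false", "false", "half-true", "mostly-true", "true"]
LABEL_TO_SCORE = {label: i for i, label in enumerate(LABELS)}

print(LABEL_TO_SCORE["half-true"])  # 3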

If you would like the data for our human responses and our generative AI responses on naive realism, here is the Google Sheet: Link


File Description

Each file in this repo has its own purpose! We explain what each file does below.

  • data: Contains our dataset, LIAR-PLUS, and other data needed to run the notebooks.
    • LIAR-PLUS dataset (test2, train2, val2): a dataframe with multiple columns of attributes extracted from different texts, including the misinformation labels above (see the loading sketch after this list).
      • Column 1: the ID of the statement ([ID].json).
      • Column 2: the misinformation label (one of the six levels above).
      • Column 3: the title of the text.
      • Column 4: the subject(s) of the text (the categories the article falls under).
      • Column 5: the speaker/author.
      • Column 6: the speaker's/author's job title.
      • Column 7: the state where the text was published, released, or took place.
      • Column 8: the party affiliation of the speaker/author.
      • Columns 9-13: the speaker's total credit history counts, including the current statement:
        • Column 9: barely true counts.
        • Column 10: false counts.
        • Column 11: half true counts.
        • Column 12: mostly true counts.
        • Column 13: pants on fire counts.
      • Column 14: the context (venue/location/medium of the text).
      • Column 15: the extracted justification for the text's misinformation label.
    • chunks.pkl: a pickled list of text chunks from an article that contains misinformation.
    • speaker_reput_dict.pkl: a pickled dictionary mapping each speaker to a list containing that speaker's reputation counts.
  • model: Contains our models.
    • XGModel.sav: an XGBoost boosted decision tree that predicts veracity based on naive realism.
    • social_cred_predAI.h5: a TensorFlow Keras neural network that predicts a social credibility score (no longer in use).
    • social_cred_predAI.keras: the Keras-format save of the same TensorFlow Keras neural network (no longer in use).
    • speaker_context_party_model_state.pth: the saved PyTorch weights (state dict) of the speaker/context/party neural network, for reloading later.
    • speaker_context_party_nn.pkl: the same model pickled via PyTorch.
  • notebooks: Contains all Jupyter notebooks we've worked on.
    • Pred_AI_notebook.ipynb: the TensorFlow Keras predictive AI and other attempted models. This was cleaned up and moved to social_credibility_predAI.py.
    • chunking.ipynb: a notebook that chunks an article. This is where chunks.pkl comes from.
    • chunking_and_chroma.ipynb: a notebook that loads those chunks into a Chroma vector database.
    • function_call_test.ipynb: a notebook we created to test function calling for our factuality factors.
    • liar_plus_to_chroma.ipynb: a notebook that loads the LIAR-PLUS dataset into a vector database.
    • scraping_politifact.ipynb: a notebook that scrapes the PolitiFact fact-check and Truth-O-Meter pages and loads the data into a vector database.
    • serp_api_testing.ipynb: a notebook where we tested our SerpApi web search, later integrated into our app.py file.
  • src: Contains all the files that must sit in the same directory to run our system.
    • app.py: the Python file with all the functionality for the Mesop interface, integrating the generative AI and predictive AI. Start here to see how our main features work!
    • fcot_prompting.py: a predefined list of Fractal Chain of Thought prompted questions to ask the generative AI.
    • function_calls.py: functions for the sensationalism factuality factor, used for function calling.
    • normal_prompting.py: a predefined list of normally prompted questions to ask the generative AI.
    • poli_stance_function_calling.py: functions for the political stance factuality factor, used for function calling.
    • questions.py: a predefined list of questions to ask the generative AI.
    • serp_api_testing.ipynb: a notebook where we tested our SerpApi web search, later integrated into our app.py file.
    • social_credibility_predAI.py: the TensorFlow Keras neural network as a Python file, saved and loaded by app.py (no longer in use).
    • social_credibility_predAI_pytorch.py: the PyTorch neural network as a Python file, saved and loaded by app.py.
  • .gitignore: tells Git which files not to track, such as env or pycache files.
  • .python-version: your Python version should be 3.11.9 for app.py to work.
  • README.md: what you are reading right now :)
  • environmental_mac.yml: used to create the conda environment with all the packages needed to make this work. This env is for Mac devices.
  • environmental_win.yml: used to create the conda environment with all the packages needed to make this work. This env is for Windows devices.
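
For reference, here is a minimal sketch of loading the data files described above. The column names follow the 15-column table for the LIAR-PLUS dataset; the .tsv file names and tab-separated layout are assumptions based on the standard LIAR-PLUS distribution, so adjust paths as needed.

import pickle

import pandas as pd

# Column names follow the 15-column description above.
COLUMNS = [
    "id", "label", "statement", "subjects", "speaker", "job_title",
    "state", "party", "barely_true_counts", "false_counts",
    "half_true_counts", "mostly_true_counts", "pants_on_fire_counts",
    "context", "justification",
]

# Assumed path/extension for the LIAR-PLUS splits (test2, train2, val2).
train = pd.read_csv("data/train2.tsv", sep="\t", header=None, names=COLUMNS)
print(train["label"].value_counts())

# The pickled objects load the same way.
with open("data/chunks.pkl", "rb") as f:
    chunks = pickle.load(f)          # list of article chunks
with open("data/speaker_reput_dict.pkl", "rb") as f:
    speaker_reput = pickle.load(f)   # dict: speaker -> reputation counts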

How to Get Started

  1. Clone this repo with the following code:
git clone {github_repo_link}
  2. Create a .env file in the root directory and paste your Google AI Studio API key inside of it:
GEMINI_API_KEY="{API_KEY}"
  3. In the same .env file, paste your SerpApi key:
SERP_API_KEY="{API_KEY}"
  4. Create the environment for your OS. Make sure you are using Python 3.11.9 or earlier, because Google AI Studio does not support Python 3.13 and wheel does not support 3.12:
conda env create -f environmental_{respective_OS}.yml
  5. Next, set up the ChromaDB database that Google Gemini grounds its responses on. Run liar_plus_to_chroma.ipynb in its entirety. Make sure you have Docker installed and running before you run this command (a quick verification sketch follows these steps):
docker run --rm --name chromadb -v chroma_volume:/chroma/chroma -e IS_PERSISTENT=TRUE -e ANONYMIZED_TELEMETRY=TRUE -p 8000:8000 chromadb/chroma
  6. Start fighting misinformation by starting the app. If you run into issues here, pip uninstall mesop and install it again. In a different terminal, run:
cd src
mesop app.py
  7. Once you are done with the tool, stop the Docker container with Ctrl+C or:
docker stop chromadb
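
Before launching the app, you can sanity-check your setup with a short script like the one below. This is a minimal sketch, assuming python-dotenv is available in the environment and ChromaDB is running on its default port 8000; the exact collection names depend on what liar_plus_to_chroma.ipynb created.

import os

import chromadb
from dotenv import load_dotenv  # assumes python-dotenv is installed

# Load the keys from the .env file created in steps 2-3.
load_dotenv()
assert os.getenv("GEMINI_API_KEY"), "GEMINI_API_KEY is missing from .env"
assert os.getenv("SERP_API_KEY"), "SERP_API_KEY is missing from .env"

# Ping the ChromaDB container started in step 5.
client = chromadb.HttpClient(host="localhost", port=8000)
print(client.heartbeat())         # raises if the container is unreachable
print(client.list_collections())  # should include the LIAR-PLUS collection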

Members
